Datajure REPL Implementation
Starting from v1.1.0, each version of Datajure will come with a standalone REPL, which can run independently on devices with Java Runtime Environment (JRE) installed. In addition to including Datajure as a library into projects, users can also use Datajure REPL for data processing.
The Datajure REPL consists of two components:
- Uberjar Server and Client
- Shell Launcher
Uberjar Server and Client
The core functionality of the server and client is provided by REPL-y and nREPL, of which Datajure REPL has only limited customization. The relevant source codes are written in repl.dtj
and dsl.dtj
.
Datajure REPL does the following:
- Prints a welcome note.
- Launches an nREPL server, which writes to
.nrepl-port
for a text editor to connect to. - Starts a REPL(-y).
After that, the function init-eval
defined in dsl.clj
will be called. It will require all the needed libraries and put their APIs to the corresponding namespaces named after their standard abbreviations. The library name and the corresponding namespace correspond as follows:
Library | Namespace |
---|---|
tech.v3.dataset | ds |
tablecloth.api | tc |
clojask.dataframe | ck |
zero-one.geni.core | g |
datajure.dsl | dtj |
By default, the APIs of each backend are also introduced. This is to facilitate users to use Datajure syntax while also calling the more basic API provided by the backend. However, users can also choose to cancel them manually.
In its command line parameters, the Datajure REPL accepts files with the suffix .dtj
that contain Datajure statements. Technically, this is implemented as input stream redirection.
Shell Launcher
We provide users with shell scripts to automatically download and launch the Datajure REPL. It is divided into Linux/macOS version and Windows version. The former is implemented based on Bash, but it is also compatible with most other shells; the latter is implemented based on PowerShell, and it is not compatible with cmd.exe
.
The launcher scripts are stored in the scripts
folder of the main repository.
When the launcher is run, it will first try to parse parameters, defined by the following syntax:
datajure [<path>] [--force-download] [--proxy <https-proxy>] [--help]
<path>
is an optional input file that should be a Datajure source file (.dtj
suffix).
Afterwards, it will call the get_latest_release()
fucntion to check the latest update.
If no Datajure REPL Uberjar is found on the local device, or the --force-download
flag is specified, the launcher will download the latest version from GitHub releases.
In particular, if the --proxy
parameter is specified, all network communications above will be conducted through the proxy. This will be convenient for users in special network environments.
Thereafter, Datajure will run the Datajure REPL using java
in the system path. For compatibility with newer JRE versions, the following JVM options are specified:
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
The launcher for Windows will also automatically checks and downloads the Windows Utilities for Hadoop to ensure that the Datajure REPL runs smoothly.