Tablecloth Backend Implementation

Tablecloth is the default backend for Datajure. It is built on top of tech.ml.dataset and reorganises its existing functions into a simpler, easier-to-use API.

Detailed documentation for Tablecloth can be found on its official website.

Dataset Construction

Datajure provides the dataset function, which constructs a Tablecloth dataset from an associative map. See Examples for specific usage.

Alternatively, you can create the dataset with tc/dataset or tc/let-dataset, both provided by Tablecloth. A dataset can be created from:

  • single values
  • sequence of maps
  • map of sequences or values
  • sequence of columns (taken from other dataset or created manually)
  • sequence of pairs: [string column-data] or [keyword column-data]
  • array of any arrays
  • file types: raw/gzipped csv/tsv, json, xls(x), taken from the local file system or a URL
  • input stream

For examples, please refer to the official Tablecloth documentation.
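As a minimal sketch of two of the input forms listed above (a map of sequences and a sequence of maps), the snippet below calls tc/dataset directly. It assumes Tablecloth is on the classpath and aliased as tc. The column names and values are made up for illustration.

```clojure
(require '[tablecloth.api :as tc])

;; Map of sequences: each key becomes a column.
(def from-map
  (tc/dataset {:name ["a" "b" "c"]
               :age  [31 25 40]}))

;; Sequence of maps: each map becomes a row.
(def from-rows
  (tc/dataset [{:name "a" :age 31}
               {:name "b" :age 25}
               {:name "c" :age 40}]))

;; Both datasets hold three rows and the same two columns.
(tc/row-count from-map)
(tc/column-names from-rows)
```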

Row Selection

Row selection is implemented with the tc/select-rows function in Tablecloth, which natively supports both "by-filter" and "by-index" selection. When multiple filters are provided, Datajure uses reduce to apply tc/select-rows once per filter.
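The following sketch illustrates both selection styles and the reduce pattern described above. The helper select-rows-by-filters is a hypothetical name used only for illustration and is not necessarily how Datajure names or structures its code. The sample data is invented.

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:x [1 2 3 4 5]
                     :y [10 20 30 40 50]}))

;; "By-index" selection: keep the rows at positions 0 and 2.
(def by-index (tc/select-rows ds [0 2]))

;; "By-filter" selection: the predicate receives each row as a map.
(def by-filter (tc/select-rows ds (fn [row] (> (:x row) 2))))

;; Hypothetical helper: apply tc/select-rows once per filter,
;; threading the narrowed dataset through reduce.
(defn select-rows-by-filters [dataset filters]
  (reduce tc/select-rows dataset filters))

;; Rows with x > 1 and y < 50, i.e. x in {2, 3, 4}.
(def filtered
  (select-rows-by-filters ds [(fn [row] (> (:x row) 1))
                              (fn [row] (< (:y row) 50))]))
```

Because each filter is applied to the output of the previous one, the result is the conjunction of all filters.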

Column Selection

Column selection is implemented with the tc/select-columns function in Tablecloth.
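A minimal sketch of column selection with tc/select-columns, assuming a small invented dataset with three columns:

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:a [1 2]
                     :b [3 4]
                     :c [5 6]}))

;; Keep only columns :a and :c; the rows are unchanged.
(def selected (tc/select-columns ds [:a :c]))

(tc/column-names selected)
```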

Optional Selection

Group by

The group-by operation is implemented with the tc/group-by function provided by Tablecloth, which accepts a regular dataset and returns a grouped dataset. The grouped dataset is then handled by the aggregate functions.
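The sketch below shows the regular-to-grouped transition using tc/group-by, with tc/grouped? and tc/ungroup used only to inspect the result. The column names and values are illustrative assumptions.

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:dept   ["x" "y" "x"]
                     :salary [10 20 30]}))

;; Grouping by :dept yields a grouped dataset, one entry per group.
(def grouped (tc/group-by ds :dept))

;; tc/grouped? distinguishes grouped datasets from regular ones.
(tc/grouped? grouped)

;; tc/ungroup concatenates the groups back into a regular dataset.
(def ungrouped (tc/ungroup grouped))
```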

Sort by

The sort-by operation is implemented with the tc/order-by function provided by Tablecloth.
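As a small sketch, tc/order-by sorts ascending by default and accepts :desc as a third argument for descending order. The dataset is invented for illustration.

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:x [3 1 2]
                     :y ["c" "a" "b"]}))

;; Ascending order on :x (the default).
(def ascending (tc/order-by ds :x))

;; Descending order on :x.
(def descending (tc/order-by ds :x :desc))

;; Inspect the sorted :x values row by row.
(mapv :x (tc/rows ascending :as-maps))
(mapv :x (tc/rows descending :as-maps))
```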

Aggregate Function

Tablecloth lacks customized support for aggregate functions. Therefore, when a group-by operation is performed, we use the tc/info function to generate summary statistics for the relevant columns, and then pick out the required columns during column selection.

We define the get-agg-key function to handle the naming of aggregated columns.
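The idea can be sketched roughly as follows: compute tc/info for each group, then keep only the statistic columns of interest. This is an illustrative approximation and not Datajure's actual implementation. The use of tc/groups->map, the helper name group-stats, and the chosen statistic columns (:col-name, :min, :mean, :max, as produced by tc/info for numeric columns) are assumptions made for this example.

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:dept   ["x" "y" "x"]
                     :salary [10 20 30]}))

;; Hypothetical helper: summary statistics of one column per group.
(defn group-stats [dataset group-col value-col]
  (into {}
        (map (fn [[group-name group-ds]]
               [group-name
                (-> group-ds
                    (tc/select-columns [value-col]) ; column to summarise
                    (tc/info)                       ; one row of stats per column
                    (tc/select-columns [:col-name :min :mean :max]))]))
        (-> dataset
            (tc/group-by group-col)
            (tc/groups->map))))

;; Map from group name to a one-row dataset of statistics.
(def per-group (group-stats ds :dept :salary))
```

Each value in per-group is itself a small dataset, so the usual column selection can be applied to extract a single statistic.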



Copyright © 2024 Datajure