Clojask: About

Talk on Clojure data-recur meeting
Benchmarks
Comparison/Advantages with other larger than memory systems
Clojask Library Ecosystem
Clojask Logic Flow Diagram

Talk on Clojure data-recur meeting

This talk is about the general information and status of the project as of Oct 2022. (From: 9:09 To: 58:02)

Benchmarks

Number of workers = 4

Operation	Dask (N=1.8M)	Dask (N=3.6M)	Dask (N=80M)*	Clojask (N=1.8M)	Clojask (N=3.6M)	Clojask (N=80M)
Element-wise operation	119.3	261.3	N/A	72.3	133.3	1836.6
Row-wise selection	115.0	232.0	N/A	67.9	145.6	1757.5
Aggregation	116.0	226.7	N/A	58.6	112.1	1236.9
Groupby-aggregate	116.7	229.3	N/A	459.4	803.1	25860.0
Left join	114.7	248.7	N/A	1174.4	2310.2	14007.9
Inner join	116.7	242.0	N/A	1138.8	2768.5	21609.3
Rolling join	-	-	-	2812.1	3943.1	> 28800

Remarks:

N = Number of lines in csv file
All benchmarks are in the unit of second (s)
All benchmarks are inclusive of the time used for importing necssary libraries, loading the dataframe from csv file and ouputting the processed dataframe to one single csv file.
*In the case of Dask (N=80M) the program could not manage to complete the operation in 7 hours

System info

'platform': 'Darwin',
'platform-release': '20.4.0',
'platform-version': 'Darwin Kernel Version 20.4.0: Thu Apr 22 21:46:47 PDT 2021; root:xnu-7195.101.2~1/RELEASE_X86_64',
'architecture': 'x86_64',
'processor': 'i386',
'ram': '8 GB'

Source code

The benchmarking code for Dask and Clojask could be found here respectively:

Comparison/Advantages with other larger than memory systems

Hadoop MapReduce

Functions	Clojask	Hadoop MapReduce
Larger-than-memory source file	✅	✅
Write intermediate results to tmp files	✅	✅
MapReduce paradigm	✅	✅
Join, filter, aggregate, etc. on large files	✅	❌

Spark

Functions	Clojask	Spark
Construct operations' DAG	❌	✅
Join, filter, aggregate, etc	✅	✅
Cache intermediate results between stages in memory	❌	✅
Minimum memory usage	✅	❌

Clojask Library Ecosystem

(Click to view a closeup of image)

Clojask Logic Flow Diagram

(Click to view a closeup of image)