Releases: cylondata/cylon

0.6.0

07 Mar 19:32

Cylon 0.6.0 is a major release. We are excited to present UCC and Gloo integration, along with more distributed operations.

Features

Cylon C++ and Python

  • Implementation of Slice, Head and Tail operations
  • adding conda docker
  • UCC integration
  • adding cylonflow as a submodule
  • Use generic operator
  • Summit fixes
  • Adding custom mpirun params cmake var
  • Adding cmake parallelism flag
  • Gloo python binding
  • Enabling gloo CI
  • Add downloading catch2 header dynamically
  • Dist sort cpu
  • Cylon Gloo integration
  • Adding distributed scalar aggregates
  • Extending datatypes
  • Allowing custom MPI_Comm for MPI
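
As a rough illustration of what a distributed scalar aggregate involves, each worker can reduce its local column to a small partial state, and the partial states are then merged, much as an all-reduce would do across ranks. The sketch below is pure Python and is not Cylon's API: the partition layout and the helper names (`local_state`, `combine`, `distributed_aggregates`) are hypothetical, and a list of lists stands in for the per-worker data.

```python
from functools import reduce

# Hypothetical stand-in for one column split across three workers (ranks).
partitions = [[4, 8, 15], [16, 23], [42]]


def local_state(values):
    """Reduce one worker's values to a (sum, count, min, max) partial state."""
    return (sum(values), len(values), min(values), max(values))


def combine(a, b):
    """Merge two partial states; the merge is associative, so it suits an all-reduce."""
    return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))


def distributed_aggregates(parts):
    """Simulate the all-reduce step by folding the partial states of all non-empty workers."""
    states = (local_state(p) for p in parts if p)
    total, count, lo, hi = reduce(combine, states)
    return {"sum": total, "count": count, "min": lo, "max": hi, "mean": total / count}


result = distributed_aggregates(partitions)
```

The mean is derived from the merged sum and count rather than by averaging per-worker means, which would give the wrong answer when partitions differ in size.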

Build

  • Updating to Arrow 9.0.x
  • Windows build support
  • MacOS build support
  • Conda build is the default build
  • Improving docker build

You can download the source code from GitHub
Conda binaries are available in Anaconda

Commits

91bdd54 Update conda-actions.yml (#645)
d1739ed Added buildable instructions for Rivanna (#643)
d9a6420 Arrow 9.0.0 and gcc-11 update (#601)
4c867b1 Summit Fixes (#623)
7f8a3b1 Fixing sample bug (#631)
ce12454 Cython binding for slice, head and tail (#619)
ef4c904 #610: SampleArray util method replaced by using arrow::compute::Take … (#612)
4694a9e Minor fixes (#608)
121b386 Fixing: Corrupted result when joining tables contain list data types #615 (#616)
68fa598 Summit fixes (#607)
de3ec7b fixing bash splitting (#606)
0a489fc adding cmake parallelism flag (#605)
035fd70 Implement Slice, Head and Tail Operation in both centralize and distr… (#592)
d99a6f2 adding custom mpirun params cmake var (#604)
f20c119 Update README-summit.md (#603)
4bc27f9 Create README-summit.md (#602)
e6b7306 Minor fixes (#596)
2e6ac80 adding conda docker (#600)
4dd359f Ucc integration (#591)
61b4a82 adding cylonflow as a submodule (#593)
e4dd38b Use generic operator (#583)
6c0dfa8 Gloo python binding (#587)
773f11f Gloo python bindings (#585)
2fc95be Add downloading catch2 header dynamically (#584)
c56ab2d Enabling gloo CI (#582)
a820ed8 Dist sort cpu (#574)
f68cc62 Adding UCC build (#579)
2759a30 Cylon Gloo integration (#576)
b2c0820 Adding distributed scalar aggregates (#570)
9c2fdc4 Extending datatypes (#568)
e3d553c Bump ua-parser-js from 0.7.22 to 0.7.31 in /docs (#566)
3bafb75 Bump ssri from 6.0.1 to 6.0.2 in /docs (#565)
814a463 minor fixes (#564)
be92253 Bump lodash from 4.17.20 to 4.17.21 in /docs (#561)
e87dd7c Bump shelljs from 0.8.4 to 0.8.5 in /docs (#562)
71bd8bf Bump nanoid from 3.1.22 to 3.2.0 in /docs (#563)
49b343d Allowing custom MPI_Comm for MPI (#559)
fa52dd4 Update contributors.md
54d4a53 added io functions (#550)
1a8c3d7 Fixing 554 (#558)
887ea18 update arrow link (#557)
1ce4c6b Fixing 552 (#553)
f5e31a1 Merging 0.5.0 release (#547)

Contributors

Ahmet Uyar
Chathura Widanage
Damitha Sandeepa Lenadora
dependabot[bot]
Hasara Maithree
Kaiying Shan
niranda perera
Supun Kamburugamuve
Vibhatha Lakmal Abeykoon
Ziyao22
Arup Kumar Sarker
Mills Wellons Staylor
Gregor von Laszewski

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

0.5.0

16 Dec 19:29

Cylon 0.5.0 is a major release. We are excited to present GCylon, a cuDF-based distributed
DataFrame for NVIDIA GPUs, along with UCX integration, Anaconda support, and much more.

Features

Cylon C++ and Python

  • Adding UCX integration with MPI
  • Adding read distribution
  • Changing join column naming convention to match SQL and pandas
  • Adding DataFrame.applymap, DataFrame.isin
  • Add iloc operation to DataFrame
  • Adding null handling to table operators and Comparators
  • Adding Equal and distributed Equal operators
  • Adding array flattening
  • Adding Repartition
  • Adding mapreduce style group-by aggregators
  • Adding table level AllGather, Gather and Broadcast operators
  • Performance improvements and bug fixes
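
To make "mapreduce style group-by" concrete, here is a minimal pure-Python sketch of the general pattern: combine locally, shuffle partial states by key, then reduce. It does not use Cylon's API; the row layout, the worker count, and the helper names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical per-worker partitions of (key, value) rows.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
]


def local_combine(rows):
    """Map/combine step: pre-aggregate a [sum, count] state per key on one worker."""
    acc = defaultdict(lambda: [0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return acc


def shuffle(partials, num_workers):
    """Route each key's partial state to the worker chosen by hashing the key."""
    inboxes = [defaultdict(list) for _ in range(num_workers)]
    for partial in partials:
        for key, state in partial.items():
            inboxes[hash(key) % num_workers][key].append(state)
    return inboxes


def reduce_means(inbox):
    """Reduce step: merge the partial states of each key and finalize the mean."""
    out = {}
    for key, states in inbox.items():
        total = sum(s[0] for s in states)
        count = sum(s[1] for s in states)
        out[key] = total / count
    return out


partials = [local_combine(p) for p in partitions]
inboxes = shuffle(partials, num_workers=2)
means = {}
for inbox in inboxes:
    means.update(reduce_means(inbox))
```

Because only one small state per key leaves each worker, the shuffle moves far less data than sending every row to the worker that owns its key.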

Build

  • Updating to Arrow 5.0.x
  • Windows build support
  • MacOS build support
  • Conda build is the default build
  • Improving docker build

GCylon

First release of GCylon, which supports distributed DataFrame processing on NVIDIA GPUs using cuDF:

  • Implemented shuffling and distributed sorting
  • Distributed Join/merge
  • Distributed GroupBy
  • DataFrame Set operations
  • Repartitioning DataFrames
  • Distributed IO for reading/writing CSV, JSON and Parquet files

You can download the source code from GitHub
Conda binaries are available in Anaconda

Commits

3344bf9 Mapreduce style group-by aggregators (#535)
50ef890 Remove minor warnings (#544)
559e8eb Adding CPU serializer (#539)
abb4404 fixed unused variable/parameter and casting warnings (#542)
62a3f08 Distributed IO (#533)
15d06d6 Bump color-string from 1.5.4 to 1.7.4 in /docs (#534)
810c4ed fixing RNG issue (#538)
fbb049b fixing build error (#536)
a10e052 Bump algoliasearch-helper from 3.3.3 to 3.6.2 in /docs (#532)
112ea97 Repartition - CPU (#526)
79c4b73 create a MacOS yml file (#530)
b9e7a8c Repartition - GPU (#528)
2191b9f fixed function name change in cudf api from gcylon test files (#529)
3e9036e Upgrading to arrow 5.0.0 (#525)
24d182a Groupby values null handling (#527)
54a5074 Null handling for Comparators (#524)
0b9516e Adding array flattening (#522)
b3fc2a2 Implemented MergeOrSort when merging sorted tables (#523)
1e061b2 Feature/equal (#499)
e378d1d reformatted gcylon codes with tab size 2, non-functional changes (#521)
8450d9b Added support for sliced tables in gather, broadcast and sorting (#520)
92b8124 Update windows.yml
1f9790d Update macos.yml
d33f9ac Update conda-actions.yml
963d491 Update c-cpp.yml
2229981 added mpi datatype dispatching for primitive data types (#519)
d9936b4 Head tail operators (#512)
ac99d00 Formatting code (#518)
fff84cc Code formatting (#517)
f32f04d Null handling in splitters and build arrays (#511)
4cab7ca Delete files from CPP example folder that are not needed (#516)
d174430 moving tutorial repo to (#514)
9cd7911 Python example cleanup (#513)
fe4caf3 Distributed sorting (#510)
2302f58 Minor improvements to the Table API (#508)
71eb80a adding new test utils (#507)
24b83dd Adding to docker docs (#498)
6f2faf8 Update conda.md
4f8f3c7 Gcylon docs (#501)
a786258 Adding contributing guide to documentation (#496)
8ab8b2d changing join column naming convention to match SQL and pandas (#487)
f18b91f improvements to ucx build from conda (#484)
912fb54 Windows build (#482)
216758a making improvements to the build (#483)
4e2894e Add functions to dataframe (#481)
1f1ddd9 Documentation update (#479)
e623315 Bump tar from 6.1.5 to 6.1.11 in /docs (#477)
1e5db7b improve docs (#476)
58c0595 removing extra examples (#474)
3c823f6 Gcylon integration (#470)
92748eb Cpp example cleanup (#475)
fa14527 Docs improvements (#469)
1306220 Bump url-parse from 1.4.7 to 1.5.3 in /docs (#473)
8234ae7 Bump path-parse from 1.0.6 to 1.0.7 in /docs (#472)
c8b435b Bump tar from 6.0.5 to 6.1.5 in /docs (#471)
1cc28dd Performance improvements (#453)
9092bbf MacOS build (#464)
d59d91e Add iloc operation to DataFrame (#465)
8d7a8dc Removed glog files from the header files (#463)
ea62eef License updates (#462)
2f56265 changed all relative Cylon header references to global (#461)
123c93c Building in conda env without using conda-build (#457)
3b3a285 Compilation document improvements (#454)
8578b1f Adding barrier at the end of the test case (#458)
e6eded5 Fix for empty df (#455)
8f14992 Fixed mpi test case (#456)
cb06998 Changes to the Docs (#451)
4ce1d7e updates to the docker readme
e011e0f enhancing readme
adfa6c0 adding read distribution (#432)
bd2e024 UCX integration (#439)
a42d04a Bump ws from 6.2.1 to 6.2.2 in /docs (#437)
710b562 Bump dns-packet from 1.3.1 to 1.3.4 in /docs (#435)
07aee74 adding new operators to DataFrame API (#429)
71e57f8 Updating to arrow 4.0 (#418)
a490dc2 changing ctx to const reference in methods (#419)
18a5447 missing docs (#428)
38534f5 0.4.1 release (#427)
10f5a6a Enabling scalars in df set_item (#425)
0be7897 Op bench refactor (#417)
ec964d8 Bug fixes in dataframe (#420)
e0ba964 Update c-cpp.yml
0200c02 adding finalize check and removing destructor finalize call. (#412)
149919c Update README.md
016c5c9 adding missing test case
5609535 Update README.md
e3ca0bf 0.4.0 release (#411)

Contributors

Ahmet Uyar
Chathura Widanage
Damitha Sandeepa Lenadora
dependabot[bot]
Hasara Maithree
Kaiying Shan
niranda perera
Supun Kamburugamuve
Vibhatha Lakmal Abeykoon
Ziyao22

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

0.4.1

04 May 16:28
1476926

Cylon 0.4.1 is a bug fix release.

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

0.4.0

21 Apr 16:49
76d150e

Cylon 0.4.0 is a major release with the following features.

Major Features

Python

  • DataFrame API similar to Pandas supporting around 40 operators commonly used in Pandas.
  • Conda build and conda-based binaries for installation on Linux.
  • Python binding to all the operators added on the C++ level.
  • Providing compute functions with both Arrow and Numpy for filtering, math operations and comparison operators.
  • Added operator benchmarks.
  • Added new options for CSV reading supporting all the options in PyArrow for reading CSV.

C++

  • Added distributed multi-column operations on tables for join, union, intersection, set difference and sort.
  • Added improved hash operations using Bytell hash maps, improving performance 2x for union, intersection, set difference and unique.
  • Added new aggregate operations for GroupBy operation (Mean, Variance, Std Dev, Quantile, NUnique, Median).
  • Implemented GroupBy aggregators using CRTP (Curiously recurring template pattern).
  • Improved indexing at the core by adding more types and improving the performance of indexed lookups.
  • Added unique distributed operator.
  • Added temporal data types like DateTime, Date32 (seconds resolution), Date64 (milliseconds resolution) and Timestamp (with time zone information).
  • Other performance improvements and bug fixes.
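
The hash-based set operations mentioned above can be pictured with a short pure-Python sketch in which Python's built-in `set` stands in for the hash map. This is an illustration only, not Cylon's implementation: the tables-as-lists-of-tuples layout is hypothetical, and "difference" is assumed here to mean rows of the left table that are absent from the right.

```python
# Hypothetical tables: each row is a tuple of column values, so multi-column
# rows hash and compare as a whole.
left = [(1, "x"), (2, "y"), (2, "y"), (3, "z")]
right = [(2, "y"), (4, "w")]


def unique(rows):
    """Drop duplicate rows while keeping first-seen order."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def union(a, b):
    """Rows that appear in either table, without duplicates."""
    return unique(a + b)


def intersection(a, b):
    """Rows that appear in both tables; the right table is hashed once, then probed."""
    lookup = set(b)
    return [row for row in unique(a) if row in lookup]


def difference(a, b):
    """Rows of `a` that do not appear in `b` (assumed semantics)."""
    lookup = set(b)
    return [row for row in unique(a) if row not in lookup]
```

Each operation makes a single pass over its inputs with constant-time expected lookups, which is why the quality of the underlying hash map has such a direct effect on their performance.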

Build

  • Compiling using external Apache Arrow installation (local/ pip).

Applications and Benchmarks

  • Implemented a subset of TPCx-BB queries (Queries 6, 7, 9, 14, 22, 23); the rest are in progress.
  • Applications with connections to deep learning.

You can download the source code from GitHub

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

0.3.1

18 Dec 04:22

Cylon 0.3.1 is a bug fix release.

You can download the source code from GitHub

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

0.3.0

12 Dec 15:15

Cylon 0.3.0 adds the following features. Please note that this release may not be backward
compatible with previous releases.

Major Features

C++

  • Adding order-by and distributed table sort operations
  • Multiple partitioning schemes (modulo, hash, and range)
  • C++ API refactoring
  • Performance improvements in the existing C++ API
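
For readers unfamiliar with the three partitioning schemes named above, the following pure-Python sketch shows how each one might map a key to a partition index. It is a sketch under assumptions, not Cylon's implementation: the function names, the splitter values, and the use of Python's built-in `hash` are all hypothetical choices.

```python
from bisect import bisect_right


def modulo_partition(key, num_partitions):
    """Modulo scheme: an integer key modulo the partition count."""
    return key % num_partitions


def hash_partition(key, num_partitions):
    """Hash scheme: hash the key first, so non-integer keys work too."""
    return hash(key) % num_partitions


def range_partition(key, splitters):
    """Range scheme: sorted splitters divide the key space into contiguous ranges."""
    return bisect_right(splitters, key)


def partition_rows(keys, assign):
    """Group keys by the partition index that `assign` returns."""
    parts = {}
    for key in keys:
        parts.setdefault(assign(key), []).append(key)
    return parts


keys = [7, 12, 3, 25, 18, 1]
by_modulo = partition_rows(keys, lambda k: modulo_partition(k, 3))
by_range = partition_rows(keys, lambda k: range_partition(k, [5, 15]))
```

With range partitioning, every key in partition i is no larger than any key in partition i + 1, so sorting each partition locally yields a globally ordered result; modulo and hash partitioning give no such ordering but only need equal keys to land together, which is what joins and group-bys require.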

Python (Pycylon)

  • Exposing table operators similar to Pandas (28 new operators).
    • Comparison operators
    • Logical Operators
    • Math operators
    • Null/NA value filtering and filling
    • Filtering and updating (including inplace ops)
    • Schema refactoring
    • Experimental indexing abstract
  • Distributed Data sorting Python bindings
  • Adding new examples for updated operations. (https://github.com/cylondata/cylon/tree/master/python/examples)

You can download the source code from GitHub

Examples

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

0.2.0

20 Oct 00:57

Cylon 0.2.0 adds the following features. Please note that this release may not be backward
compatible with v0.1.0.

Major Features

C++

  • Adding aggregates and group-by API
  • Creating tables using std::vectors or cylon::Columns
  • C++ API refactoring
  • Major performance improvements in the existing C++ API

Python (Pycylon)

  • Extending the Cython API so that other Cython/Python libraries can build on it
  • Aggregates and Groupby addition
  • Column name-based relational algebra operations and aggregate/groupby ops addition
  • Major performance improvements in the existing Python API

Java (JCylon)

  • Performance improvements

You can download the source code from GitHub

Examples

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

Cylon Release 0.1.0

18 Jul 01:01

Cylon 0.1.0 is the first open-source public release of the Cylon project. We are excited to bring a high-performance
data engineering toolkit that can work as a library as well as a standalone framework. This is the first step towards
building a complete toolkit designed to work with AI/ML systems and integrate with data processing systems, with the
vision "data engineering everywhere".

You can download the source code from GitHub

Who should use Cylon?

  • Users of Pandas dataframes or SQL interface
  • Those needing parallel data engineering
  • Those needing Python, C++ and Java interoperability
  • HPC Python (Dask) and Big Data (Kubernetes) environments

Major Features in v0.1.0

  • Introducing Cylon C++ engine based on Apache Arrow.
  • Cylon C++, Python (PyCylon) and Java language bindings
  • Seamless integration with Pandas and NumPy
  • Distributed operations using MPI
  • Local and distributed operations (Select, Project, Joins, Intersection, Union, Subtract)
  • Jupyter notebook support and experimental Google Colab support
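
For readers new to the relational operations listed above, here is a minimal pure-Python sketch of Select, Project and an inner Join on small in-memory tables. It only illustrates what the operations mean and does not use the Cylon or PyCylon API; the table contents, column names and helper functions are hypothetical.

```python
# Hypothetical tables represented as lists of dict rows.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 10, "amount": 5.0},
]
customers = [
    {"customer_id": 10, "name": "Ada"},
    {"customer_id": 11, "name": "Grace"},
]


def select(rows, predicate):
    """Select: keep the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Project: keep only the named columns of every row."""
    return [{c: row[c] for c in columns} for row in rows]


def inner_join(left, right, key):
    """Inner hash join: index the right table by key, then probe it with the left."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


large = select(orders, lambda row: row["amount"] >= 20.0)
joined = inner_join(large, customers, "customer_id")
report = project(joined, ["order_id", "name"])
```

In a distributed setting the same join would typically be preceded by a shuffle that sends rows with equal keys to the same worker, after which each worker can run the local join shown here.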

Examples

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0