Learning summary and examples about data systems.
Updated May 12, 2024 - Java
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
The Internals of Spark SQL
Extracting observatory temperature data from CSV files and generating tile images using Mercator projection for visualization
Spark with Python, including Spark Streaming, Machine Learning, Spark DataFrames and more.
A Python package to submit and manage Apache Spark applications on Kubernetes.
SQL stream processing, analytics, and management. We decouple storage and compute to offer speedy bootstrapping, dynamic scaling, time-travel queries, and efficient joins.
Big Data Docker Data Science Spark Spark3 Hadoop HDFS Scala Python Artificial Intelligence Machine Learning Jupyter Lab Notebook
Platform for Big Data & AI
DoC Spark on minikube from Mac with Docker Desktop
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Quill for Scala 3
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Created by Matei Zaharia
Released May 26, 2014