# Apache Spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 8,253 public repositories matching this topic...

LB-Yu / data-systems-learning

Learning summary and examples about data systems.

distributed-systems big-data spark hbase flink

Updated May 12, 2024
Java

japila-books / spark-sql-internals

The Internals of Spark SQL

spark apache-spark book internals spark-sql mkdocs-material

Updated May 12, 2024

tobymao / sqlglot

Python SQL Parser and Transpiler

Updated May 12, 2024
Python

apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery real-time sql database spark hive hadoop etl snowflake olap query-engine redshift dbt elt iceberg hudi delta-lake lakehouse

Updated May 12, 2024
Java

hogimn / observatory

Extracting observatory temperature data from CSV files and generating tile images using Mercator projection for visualization

java spark mercator-projection

Updated May 12, 2024
Java

Tiago-B-C-Reis / Apache_Spark

Spark with Python, including Spark Streaming, Machine Learning, Spark DataFrames and more.

machine-learning spark apache-spark pyspark

Updated May 12, 2024
Jupyter Notebook

hussein-awala / spark-on-k8s

A Python package to submit and manage Apache Spark applications on Kubernetes.

python kubernetes airflow spark

Updated May 12, 2024
Python

risingwavelabs / risingwave

SQL stream processing, analytics, and management. We decouple storage and compute to offer speedy bootstrapping, dynamic scaling, time-travel queries, and efficient joins.

Updated May 12, 2024
Rust

alvertogit / bigdata_docker

Big Data Docker Data Science Spark Spark3 Hadoop HDFS Scala Python Artificial Intelligence Machine Learning Jupyter Lab Notebook

python docker data-science machine-learning scala big-data spark jupyter-notebook jupyter-lab spark3

Updated May 12, 2024
Python

mauropelucchi / unibg_mobile_and_cloud_2024

University of Bergamo - Mobile & Cloud (Computer Engineering) 2023/2024

python aws mobile spark flutter

Updated May 12, 2024
C++

xuwenyihust / DataPulse

Platform for Big Data & AI

kubernetes spark jupyter-notebook gcp mlflow delta-lake

Updated May 12, 2024
Shell

iimeta / fastapi-admin

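智元 Fast API is a one-stop API management system that gives the APIs of various large models a unified format, unified conventions, and unified management, aiming for the best possible functionality, performance, and user experience.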

api fast spark openai glm gpt fastapi gpt-4 chatgpt ernie-bot qwen

Updated May 12, 2024
Go

iimeta / fastapi-web

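智元 Fast API is a one-stop API management system that gives the APIs of various large models a unified format, unified conventions, and unified management, aiming for the best possible functionality, performance, and user experience.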

api fast spark openai glm gpt fastapi gpt-4 chatgpt ernie-bot qwen

Updated May 12, 2024
Vue

masalinas / doc-spark-minikube

DoC Spark on minikube from Mac with Docker Desktop

kubernetes spark python3 minio spark-sql spark-operator

Updated May 12, 2024
Shell

ytsaurus / ytsaurus

YTsaurus is a scalable and fault-tolerant open-source big data platform.

sql big-data spark clickhouse distributed-database lakehouse olap-database ytsaurus

Updated May 12, 2024
C++

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

python java r scala sql big-data spark jdbc

Updated May 12, 2024
Scala

starlake-ai / starlake

Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.

bigquery scala spark etl snowflake hdfs redshift synapse

Updated May 12, 2024
Scala

zio / zio-quill

Compile-time Language Integrated Queries for Scala

mysql linq postgres scala database spark cassandra jdbc scalajs sparksql

Updated May 12, 2024
Scala

zio / zio-protoquill

Quill for Scala 3

linq scala sql spark cassandra jdbc postgresql sparksql language-integrated-query

Updated May 12, 2024
Scala

mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

python data-science data machine-learning sql spark pipeline etl pipelines orchestration artificial-intelligence data-engineering data-integration dbt elt transformation data-pipelines reverse-etl

Updated May 11, 2024
Python

Created by Matei Zaharia

Released May 26, 2014

Followers: 414
Repository: apache/spark
Website: spark.apache.org

Related Topics