Remote monitoring and predictive maintenance with Azure Databricks and CosmosDB
Updated Dec 11, 2018
Reading the Avro files created by Azure Event Hubs Capture using Spark
Notebook demonstrating data ETL use cases with Spark
Connect to an Azure Databricks workspace from a GitHub pipeline without using a PAT
A midterm on breadth-first search, MapReduce, and PySpark transformations
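As a refresher on the first of those midterm topics, here is a minimal breadth-first search sketch in plain Python (the adjacency-dict representation and the example graph are illustrative, not taken from the midterm itself):

```python
from collections import deque

def bfs_order(graph, start):
    """Return vertices in breadth-first order from `start`.

    `graph` is an adjacency dict mapping each node to a list of neighbours.
    """
    visited = {start}          # mark the start so it is never re-enqueued
    order = []
    queue = deque([start])     # FIFO queue drives the level-by-level visit
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in graph.get(node, []):
            if nb not in visited:
                visited.add(nb)
                queue.append(nb)
    return order

# Example: D is reachable via both B and C but is visited only once.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
```

The `deque` gives O(1) pops from the front, which is what distinguishes BFS from the stack-driven depth-first variant.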
An Apache Spark course based on Spark: The Definitive Guide
A fork of this template implementing the end-to-end machine learning lifecycle on the Databricks Lakehouse
Real estate sales predictions and analytics
Implementation of the "CCF: Fast and Scalable Connected Component Computation in MapReduce" paper with Spark. Study of its scalability on several datasets using various cluster sizes on Databricks and Google Cloud Platform (GCP)
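To make the CCF idea concrete, here is a minimal pure-Python simulation of the paper's CCF-Iterate map/reduce round (the Spark version in the repo distributes exactly this grouping and reduction; function names here are illustrative). Each round attaches every node to the smallest node id seen in its neighbourhood, and iteration stops when a round produces no new pairs:

```python
from collections import defaultdict

def ccf_iterate(pairs):
    """One CCF-Iterate round: returns (emitted_pairs, num_new_pairs)."""
    # Map phase: emit each pair in both directions, then group by key.
    grouped = defaultdict(set)
    for a, b in pairs:
        grouped[a].add(b)
        grouped[b].add(a)
    out = set()
    new_count = 0
    # Reduce phase: link the key and its values to the minimum id seen.
    for key, values in grouped.items():
        m = min(values | {key})
        if m < key:
            out.add((key, m))
            for v in values:
                if v != m:
                    out.add((v, m))
                    new_count += 1   # a value learned a smaller component id
    return out, new_count

def connected_components(edges):
    """Iterate CCF rounds to a fixed point.

    Returns (node, component_min) pairs for every non-minimal node.
    """
    pairs = set(edges)
    while True:
        pairs, new = ccf_iterate(pairs)
        if new == 0:
            return pairs
```

Each component ends up labelled by its smallest node id, which is the invariant the paper's dedup variant also preserves.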
Data pipeline that processes Formula 1 data with Azure Databricks, Delta Lake, and Azure Data Factory
The primary objective of this study is to explore the feasibility of using machine learning algorithms to classify health insurance plans based on their coverage for routine dental services. To achieve this, I used six different classification algorithms: logistic regression (LR), decision tree (DT), random forest (RF), gradient-boosted trees (GBT), SVM, and factorization machines (FM). (Tech: PySpark, SQL, Databricks, Zeppelin notebooks, Hadoop, Spark-Submit)
Ingestion of Olist data in CSV format into the Raw, Bronze, Silver, and Gold layers
The data engineering team focuses on establishing a robust and reliable data pipeline. We use Kafka to manage the data-streaming topics and then process and consume the data with Spark.
In this repository we work with data processing using Spark.
Batch and streaming data pipelines built with Databricks and PySpark, modeling Formula 1 racing data from multiple sources and APIs into a star schema for analysis in Power BI.
This repo contains Azure data engineering projects