Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Updated May 24, 2024 - Java
Postgres for Search and Analytics
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
World's most powerful data catalog service, providing a high-performance, geo-distributed, and federated metadata lake.
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.
An IDE and translation engine for detection engineers and threat hunters. Be faster, write smarter, keep 100% privacy.
Upserts, Deletes And Incremental Processing on Big Data.
Repository for tutorials, information and notes on technology in general.
lakeFS - Data version control for your data lake | Git for data
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
Building Information System for potential energy savings
Open Control Plane for Tables in Data Lakehouse
A collection of Apache Hudi related resources