Data sources used by the Big Data Innovation Team
Updated Jun 6, 2024 - Jupyter Notebook
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
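A minimal sketch of what "name-indexed" processing means, in plain Python rather than Miller's own CLI: records are addressed by field name instead of column position, so a cut-then-sort step survives column reordering. The sample data is invented for illustration.

```python
import csv
import io

# Hypothetical CSV input; Miller-style tools treat each row as a record
# keyed by header name, not by column index.
raw = "name,age\nalice,34\nbob,29\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# "cut" down to one field and "sort" by it, referencing the field by name
ages = sorted(int(r["age"]) for r in rows)
print(ages)  # [29, 34]
```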
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
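A generic sketch of the extract-transform-load shape such a streaming framework automates; this uses plain Python generators and is not the library's API, and the event fields are made up for illustration.

```python
# Each stage consumes a stream and yields a stream, so records flow
# through one at a time instead of being materialized in full.
def extract(events):
    for e in events:                 # source stage: emit raw events
        yield e

def transform(stream):
    for e in stream:                 # drop invalid events, normalise the value
        if e.get("value") is not None:
            yield {"id": e["id"], "value": float(e["value"])}

def load(stream, sink):
    for e in stream:                 # terminal stage: write to the sink
        sink.append(e)

sink = []
events = [{"id": 1, "value": "3.5"}, {"id": 2, "value": None}]
load(transform(extract(events)), sink)
print(sink)  # [{'id': 1, 'value': 3.5}]
```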
SQL-like interface to tabular structured data
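To illustrate what a SQL-like interface over tabular data buys you, here is a minimal sketch using an in-memory SQLite table as a stand-in; the table name and rows are invented for illustration.

```python
import sqlite3

# An in-memory table standing in for arbitrary tabular data,
# queried declaratively through SQL instead of hand-written loops.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.5), ("b", 2.5), ("a", 3.0)],
)

totals = conn.execute(
    "SELECT sensor, SUM(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(totals)  # [('a', 4.5), ('b', 2.5)]
```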
An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.
CrateDB Toolkit.
Remote Sensing and GIS Software Library; Python modules and tools for processing spatial data.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
The MDSplus data management system
Advanced and Fast Data Transformation in R
CBRAIN is a flexible Ruby on Rails framework for accessing and processing large datasets on high-performance computing infrastructures.
A collection of Python scripts to acquire and process SFDI data in order to measure the optical properties of tissue.
A simple package to abstract away the process of creating usable DataFrames for data analytics. This package is heavily inspired by the amazing Python library, Pandas.
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
A public repository for all things RAG (Retrieval Augmented Generation)
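A toy sketch of the retrieval step in RAG: given a query, pick the stored passage most similar to it, here scored by bag-of-words overlap (real systems use dense embeddings and a vector index; the passages below are invented for illustration).

```python
# Hypothetical document store: a handful of short passages.
passages = [
    "the cat sat on the mat",
    "stream processing handles unbounded data",
]

def overlap(a: str, b: str) -> int:
    # Crude similarity: number of words the two strings share.
    return len(set(a.split()) & set(b.split()))

def retrieve(query: str, passages: list[str]) -> str:
    # Return the passage with the highest overlap with the query;
    # this passage would then be fed to the LLM as grounding context.
    return max(passages, key=lambda p: overlap(query, p))

best = retrieve("what is stream processing", passages)
print(best)  # stream processing handles unbounded data
```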
Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programmatically create and apply a data policy to a processing platform like Databricks, Snowflake, or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD, and the like.
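The "policy in, dynamic view out" idea can be sketched as compiling a declarative policy into a row transform, analogous to generating a masked view on the target platform; the field names and actions below are invented for illustration, not PACE's actual schema.

```python
# Declarative policy: what to do with each field (assumed vocabulary:
# "allow" passes through, "mask" redacts, anything unlisted is dropped).
policy = {"email": "mask", "age": "allow"}

def apply_policy(row: dict, policy: dict) -> dict:
    out = {}
    for field, value in row.items():
        action = policy.get(field, "drop")
        if action == "allow":
            out[field] = value
        elif action == "mask":
            out[field] = "***"
        # "drop": omit the field entirely
    return out

row = {"email": "a@example.com", "age": 42, "ssn": "123-45-6789"}
view = apply_policy(row, policy)
print(view)  # {'email': '***', 'age': 42}
```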
Kubernetes-native platform to run massively parallel data/streaming jobs