
This repo demonstrates a comprehensive real-time analytic stack using popular open-source tools.


luatnc87/real-time-analytic-stack


Real Time Data Analytic Stack

Welcome to Real Time Analytics Stack! This repo showcases a complete real-time analytic stack using popular open-source tools.

In this tutorial, I demonstrate how to use Docker Compose to quickly set up a real-time data analytic stack with Apache SeaTunnel, Apache Doris, and Apache Superset. The pipeline uses SeaTunnel to ingest real-time CDC events from a MySQL database into the Doris data warehouse, where you can optionally transform the data with dbt, and then visualizes the data with Superset.

real time data analytic stack architecture

Components of the Real Time Data Analytic Stack

Before we set up the project, let’s briefly look at each tool used in this example of a real-time data analytic stack to make sure you understand their responsibilities.

Apache SeaTunnel

SeaTunnel is an easy-to-use, high-performance, distributed data integration platform that supports real-time synchronization of massive data volumes. It can synchronize tens of billions of records stably and efficiently every day, and has been used in production by nearly 100 companies.

Apache Doris

Apache Doris is a high-performance, real-time analytic database based on the MPP (Massively Parallel Processing) architecture, known for its speed and ease of use. It returns query results over massive amounts of data with sub-second response times, and it supports both highly concurrent point-query scenarios and high-throughput complex analytic scenarios.

Apache Superset

Apache Superset is a modern business intelligence, data exploration and visualization platform. Superset connects with a variety of databases and provides an intuitive interface for visualizing datasets. It offers a wide choice of visualizations as well as a no-code visualization builder. You can run Superset locally with Docker Compose or in the cloud using Preset. Superset sits at the end of this real time data analytics stack example and is used to visualize the data stored in Apache Doris.

Pre-requisites

To follow along, you need to:

Install Docker and Docker Compose on your machine. You can follow this guide to install Docker and this one to install Docker Compose.

Using Docker Compose to Bootstrap a Real Time Data Analytic Stack

This tutorial uses Docker Compose and a shell script to set up the required resources. Docker saves you from installing additional dependencies locally, and it lets you start and stop the instances quickly.

The shell script setup.sh provides two commands, up and down, to start and stop the instances. The compose files are stored in seatunnel/docker-compose-seatunnel.yaml, doris/docker-compose-doris.yaml, and superset/docker-compose-superset.yaml. You can go through these files and make any necessary customization, for example, changing the ports where the instances start or installing additional dependencies.
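To make the up and down behaviour concrete, here is a minimal Python sketch of the `docker compose` invocations such a script might issue for the three compose files listed above. It is not the actual content of setup.sh. The function name `compose_commands`, the detached `-d` flag, and the reverse shutdown order are assumptions made for illustration.

```python
from typing import List

# Compose file paths as listed in this README.
COMPOSE_FILES = [
    "seatunnel/docker-compose-seatunnel.yaml",
    "doris/docker-compose-doris.yaml",
    "superset/docker-compose-superset.yaml",
]


def compose_commands(action: str) -> List[List[str]]:
    """Return one `docker compose` argv per compose file for 'up' or 'down'."""
    if action not in ("up", "down"):
        raise ValueError("action must be 'up' or 'down'")
    # Assumption: 'up' starts containers detached; 'down' stops and removes them.
    tail = ["up", "-d"] if action == "up" else ["down"]
    # Assumption: services are stopped in the reverse of their start order.
    files = COMPOSE_FILES if action == "up" else list(reversed(COMPOSE_FILES))
    return [["docker", "compose", "-f", path, *tail] for path in files]


if __name__ == "__main__":
    import shlex

    # Print the commands instead of running them, so the sketch has no side effects.
    for argv in compose_commands("up"):
        print(shlex.join(argv))
```

Each returned list can be passed to `subprocess.run(argv, check=True)` if you want to execute the commands instead of printing them.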

Setting up SeaTunnel, Doris, Superset with Docker Compose

Setting up Apache SeaTunnel

The script launches the SeaTunnel instance at

Setting up Apache Doris

The script launches the Doris FE (frontend) instance at http://localhost:8030. You should see the following screen, which indicates that the FE has started successfully (screenshot: doris_fe_login.png). Note: Here we log in with the Doris built-in default user (root) and an empty password.
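If the login page does not load, a quick TCP check can tell you whether the FE is listening at all. The sketch below is a generic port probe, not part of this repo. Port 8030 comes from the URL above, while 9030 is assumed to be the FE's MySQL-protocol query port (the Doris default, which your compose file may override).

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 8030 is the FE web UI port from this README.
    # 9030 is an assumption: the default FE MySQL-protocol query port.
    for name, port in (("FE web UI", 8030), ("FE query port", 9030)):
        state = "open" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {state}")
```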

Setting up Apache Superset

Once the setup.sh command has completed, visit http://localhost:8088 to access the Superset UI. Enter admin as both the username and the password. Choose Apache Doris from the supported database drop-down, then provide the connection details to finish the configuration.
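The connection form ultimately needs a SQLAlchemy URI. As a hedged sketch, the helper below assembles one in the `doris://user:password@host:port/catalog.database` shape that Superset's Doris driver is documented to accept. The helper name, the `demo` database, and the host and port values are hypothetical placeholders. From inside the Superset container, the host is usually the Doris FE's service name or address on the Docker network, not localhost.

```python
from urllib.parse import quote_plus


def doris_sqlalchemy_uri(
    user: str,
    password: str,
    host: str,
    port: int,
    catalog: str,
    database: str,
) -> str:
    """Build a doris:// SQLAlchemy URI, URL-escaping the credentials."""
    # quote_plus escapes characters such as '@' or ':' that would otherwise
    # break URI parsing when they appear in a username or password.
    return (
        f"doris://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{catalog}.{database}"
    )


if __name__ == "__main__":
    # Assumptions: root with an empty password (as noted for the FE login),
    # query port 9030, the default 'internal' catalog, and a hypothetical
    # database named 'demo'.
    print(doris_sqlalchemy_uri("root", "", "localhost", 9030, "internal", "demo"))
```

Because Doris speaks the MySQL protocol, a `mysql://` URI with a MySQL driver is a commonly used alternative if the Doris driver is not installed in your Superset image.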

Using the Real Time Data Analytic Stack

Once the stack is up and running, you can start using it to ingest and process your data.

Sync real-time CDC events from a MySQL database into the Apache Doris DWH

Create a materialized view to aggregate data in near real time

Visualize data on dashboard using Superset

Cleaning up

Conclusion

About the author
