Software terminologies and concepts, software architecture overview
A summary of the keywords and solutions I have encountered in my learning and experience.
- Full Version
Software_Architecture_Mindmap.png
Three main pillars of software architecture:
- Modern Application Development
- Cloud Computing (AWS/Azure/GCP)
- Data Science (ML/NN)

and numerous technologies and methodologies.
© 2022. (https://github.com/kimtth) All rights reserved.
This mindmap was created with https://app.mindmapmaker.org/
- System Design 101: ByteByteGo
- Awesome Lists: Awesome lists about all kinds of interesting topics / awesome.re / github topic - Awesome Software Architecture (simskij)
- Awesome Software Architecture: A curated list of awesome articles, videos, and other resources to learn and practice software architecture, patterns, and principles
- Software Architecture Books: A comprehensive list of books on Software Architecture
- System Design: Learn how to design systems at scale and prepare for system design interviews
- Microsoft .NET Application Architecture - Reference Apps
- Software Architecture Books
- System Design Fight Club
- System Design - Neo Kim
- Awesome System Design Resources
- InfoQ: News and Articles
- Dzone: RefCards and Trend Reports
- Thoughtworks: Technology Radar
- Microsoft Learn: Documentation and Code samples
-
🔹 Latency: the response time of your application, usually expressed in milliseconds.
🔹 Throughput: how many transactions per second or minute your application can handle.
🔹 Errors: usually measured as a percentage.
🔹 Saturation: how fully your application uses the available CPU and memory.
- InfoQ minibooks: Architectures You've Always Wondered About: 2021 / 2023 / 2024
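The latency, throughput, and error metrics listed above can be computed from a request log. The following is a minimal sketch, assuming a hypothetical list of `(duration_ms, succeeded)` records and an assumed two-second observation window; saturation is left out because it needs operating-system counters rather than request records.

```python
import math

# Hypothetical request log: (duration in milliseconds, succeeded?)
requests = [(120, True), (80, True), (450, False), (95, True), (200, True)]
window_seconds = 2.0  # assumed length of the observation window


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


durations = [ms for ms, _ in requests]
p50_latency_ms = percentile(durations, 50)          # median response time
throughput_rps = len(requests) / window_seconds      # requests per second
error_rate_pct = 100 * sum(1 for _, ok in requests if not ok) / len(requests)
```

Percentiles are used here instead of the mean because a few slow requests can hide behind a healthy-looking average.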
- AWS to Azure services comparison
- Google Cloud to Azure services comparison
- Compare AWS and Azure services to Google Cloud
- Microsoft Azure Developer's Cheat Sheet: Every product, feature and service in the Azure family
- Azure Cloud Adoption Framework (CAF): organization-wide adoption guidance
- Azure Well-Architected Framework (WAF): workload-focused design and continuous improvement guidance
- Azure Architecture Center (AAC): architecture patterns and reference architectures
-
🔹 Abstractly speaking, a landing zone helps you plan for and design an Azure deployment by conceptualizing a designated area for placement and integration of resources.
There are two types of landing zones:
- Platform landing zone: provides centralized, enterprise-scale foundational services for workloads and applications.
- Application landing zone: provides services specific to an application or workload.
- Kaggle Solutions and Ideas: Collection of Kaggle Solutions and Ideas
- Best-of Machine Learning with Python: A ranked list of awesome machine learning Python libraries. Updated weekly.
- freeCodeCamp: Learn to code for free. youtube
- Ultimate Collection of 60 YouTube Courses for 21 Programming Languages
- Computer Science courses with video lectures
- Software Industry Statistics: Statista Industry Insight
- Gartner Top Strategic Technology Trends 2024
- MAD (ML/AI/Data) Landscape
- Substack Leaderboard: Newsletter
- Algorithm Visualizer: Interactive Online Platform that Visualizes Algorithms from Code
- Best Kubernetes Tools: Bluelight Consulting
- Power BI DAX Patterns
- OOP Design Patterns
- Data Engineering Wiki
- AWS Architecture Blog
- Azure Architecture Blog
- GCP Cloud Blog
- Netflix TechBlog
- Uber Blog
- The Cloudflare Blog
- Engineering at Meta
- LinkedIn Engineering
- Stripe Blog: Engineering
- Discord Blog: Engineering & Developers
- Slack Engineering
- 79 Engineering Blogs To Level Up Your System Design Skills
- General
- The Pragmatic Programmer by David Thomas and Andrew Hunt
- Modern Software Engineering by David Farley
- Code Complete by Steve McConnell
- Software Engineering at Google by Titus Winters, Tom Manshreck, and Hyrum Wright
- Good Practices
- Clean Code by Uncle Bob Martin
- Head First Design Patterns by Eric Freeman
- Refactoring by Martin Fowler
- Design Patterns by Erich Gamma and others
- Data Structures and Algorithms
- Grokking Algorithms by Aditya Bhargava
- Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
- Cracking the Coding Interview by Gayle Laakmann McDowell
- Data
- Designing Data-Intensive Applications by Martin Kleppmann
- Learning SQL by Alan Beaulieu
- Testing
- Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce
- Unit Testing Principles, Practices, and Patterns by Vladimir Khorikov
- The Art of Unit Testing by Roy Osherove
- TDD by Example by Kent Beck
- Software Architecture
- Fundamentals of Software Architecture by Mark Richards and Neal Ford
- Clean Architecture by Uncle Bob Martin
- Software Architecture The Hard Parts by Neal Ford, Mark Richards, Pramod Sadalage, and Zhamak Dehghani
- Domain Driven Design Quickly by Abel Avram and Floyd Marinescu
- A Philosophy of Software Design by John Ousterhout
- System Design Interview by Alex Xu
- Domain-Driven Design by Eric Evans
- Distributed Systems
- Understanding Distributed Systems by Roberto Vitillo
- Designing Distributed Systems by Brendan Burns
- DevOps
- DevOps Handbook by Gene Kim, Patrick Debois, John Willis, and Jez Humble
- Continuous Delivery by Jez Humble and David Farley
- Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim
- Machine Learning
- The Hundred-Page Machine Learning Book by Andriy Burkov
- Designing Machine Learning Systems by Chip Huyen
- On the Criteria To Be Used in Decomposing Systems into Modules (1972), D.L. Parnas: ref
- An Axiomatic Basis for Computer Programming (1969), C.A.R. Hoare: ref
- Time, Clocks, and the Ordering of Events in a Distributed System (1978), L. Lamport: ref
- Out of the Tar Pit (2006), B. Moseley, P. Marks: ref
- Dynamo: Amazon's Highly Available Key-value Store (2007), G. DeCandia et al.: ref
- MapReduce: Simplified Data Processing on Large Clusters (2004), J. Dean, S. Ghemawat: ref
- A Note On Distributed Computing (1994), J. Waldo, G. Wyant, A. Wollrath, S. Kendall: ref
- A Metrics Suite for Object-Oriented Design (1994), S.R. Chidamber, C.F. Kemerer: ref
- A Relational Model of Data for Large Shared Data Banks (1970), E.F. Codd: ref
- Why Functional Programming Matters (1990), J. Hughes: ref
- A reading list of 70+ distributed systems papers, mostly from conferences in the last two years: 70+ Distributed Systems papers [Jan 2024]
- ref [May 2024]
- Dynamo - Amazon's Highly Available Key Value Store ref
- Google File System: Insights into a highly scalable file system ref
- Scaling Memcached at Facebook: A look at the complexities of Caching ref
- BigTable: The design principles behind a distributed storage system ref
- Borg - Large Scale Cluster Management at Google ref
- Cassandra: A look at the design and architecture of a distributed NoSQL database ref
- Attention Is All You Need: Into a new deep learning architecture known as the transformer ref
- Kafka: Internals of the distributed messaging platform ref
- FoundationDB: A look at how a distributed database ref
- Amazon Aurora: To learn how Amazon provides high-availability and performance ref
- Spanner: Design and architecture of Google's globally distributed database ref
- MapReduce: A detailed look at how MapReduce enables parallel processing of massive volumes of data ref
- Shard Manager: Understanding the generic shard management framework ref
- Dapper: Insights into Googleโs distributed systems tracing infrastructure ref
- Flink: A detailed look at the unified architecture of stream and batch processing ref
- A Comprehensive Survey on Vector Databases ref
- Zanzibar: A look at the design, implementation and deployment of a global system for managing access control lists at Google ref
- Monarch: Architecture of Google's in-memory time series database ref
- Thrift: Explore the design choices behind Facebook's code-generation tool ref
- Bitcoin: The ground-breaking introduction to the peer-to-peer electronic cash system ref
- WTF - Who to Follow Service at Twitter: Twitter's (now X) user recommendation system ref
- MyRocks: LSM-Tree Database Storage Engine ref
- GoTo Considered Harmful ref
- Raft Consensus Algorithm: To learn about the more understandable consensus algorithm ref
- Time Clocks and Ordering of Events: The extremely important paper that explains the concept of time and event ordering in a distributed system ref
- Deep Learning - Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- Mathematics for Machine Learning - Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong
- An Introduction to Statistical Learning - Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor
- The Elements of Statistical Learning - Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie
- Probabilistic Machine Learning: An Introduction - Kevin Patrick Murphy
- Probabilistic Machine Learning: Advanced Topics - Kevin Patrick Murphy
- Understanding Machine Learning - Shai Shalev-Shwartz and Shai Ben-David
- Automated Machine Learning - Frank Hutter, Lars Kotthoff, Joaquin Vanschoren
- Applied Causal Inference - Uday Kamath, Kenneth Graham, Mitchell Naylor
- Reinforcement Learning: An Introduction - Richard S. Sutton and Andrew G. Barto
- The Hundred-Page Machine Learning Book - Andriy Burkov
- Machine Learning Engineering - Andriy Burkov
- Natural Language Processing with Python - Steven Bird, Ewan Klein, and Edward Loper
- Dive into Deep Learning - Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola
- Machine Learning Yearning - Andrew Ng
- Machine Learning for Humans - Vishal Maini, Samer Sabri
- Pattern Recognition and Machine Learning - Christopher M. Bishop
- Deep Learning on Graphs - Yao Ma and Jiliang Tang
- Approaching (Almost) Any Machine Learning Problem - Abhishek Thakur
- Feature Engineering and Selection - Max Kuhn and Kjell Johnson
- Hands-On Machine Learning with R - Bradley Boehmke & Brandon Greenwell
- Deep Learning Interviews - Shlomo Kashani and Amir Ivry
- Machine Learning Interpretability - Patrick Hall and Navdeep Gill
- Interpretable Machine Learning - Christoph Molnar
- Boosting: Foundations and Algorithms - Robert E. Schapire, Yoav Freund
- A Brief Introduction to Machine Learning for Engineers - Osvaldo Simeone
- Speech and Language Processing - Daniel Jurafsky & James Martin
- Computer Vision: Models, Learning, and Inference - Simon J.D. Prince
- Information Theory, Inference and Learning Algorithms - David J. C. MacKay
- Machine Learning For Dummies - Judith Hurwitz and Daniel Kirsch
- Machine Learning for Beginners
- Machine Learning YouTube Videos
- Mathematics for Machine Learning
- Deep Learning Book
- Machine Learning ZoomCamp
- Machine Learning Tutorials
- Awesome Machine Learning
- CS 229 Machine Learning Cheatsheets
- Machine Learning Interview Guide
- Awesome Production Machine Learning
- 365 Data Science Flashcards
-
Gartner's PACE Layered Application Strategy: A methodology for categorizing, selecting, managing and governing applications based on their characteristics and the speed of change they require.
-
JIT vs AOT: JIT and AOT are two types of compilers that differ in when they convert a program from one language to another, either at run-time or build-time.
-
SSG: Static site generator list: A tool that generates a full static HTML website based on raw data and a set of templates.
-
Popular Enterprise Architecture Frameworks: TOGAF, Zachman, Federal Enterprise Architecture (FEA), Gartner Enterprise Architecture Framework, Business Architecture Guild's BIZBOK, Department of Defense Architecture Framework (DoDAF), ArchiMate, and Sherwood Applied Business Security Architecture (SABSA).
-
Are Architecture Styles, Patterns, and Design Patterns Different?
Architecture Styles vs Patterns vs Design Patterns
1. Architectural styles
This is the highest level of abstraction, where architectural designs instruct us on structuring our code. At the coarsest level of granularity, a style describes the application's layers and high-level modules and how they relate to and interact with one another. Examples of architectural styles include:
🔹 Monolith
🔹 Layered
🔹 Event-driven
🔹 Self-contained Systems
🔹 Microservices
🔹 Space-Based
2. Architectural patterns
These patterns are reusable ways to implement an architectural style. They answer questions such as how to separate the user interface (UI) and data, how internal modules interact, and what layers we will use. They usually impact the code base and how the code inside it is structured. Examples of architectural patterns include:
🔹 Model-View-Presenter (MVP): 1:1 relationship between View and Presenter, e.g., Windows Forms
🔹 Model-View-Controller (MVC): e.g., Smalltalk, ASP.NET MVC
🔹 Model-View-ViewModel (MVVM): one-to-many relationship between View and ViewModel, e.g., Silverlight, WPF, AngularJS
🔹 Domain-Driven Design
3. Design patterns
These differ from architectural patterns in that they focus on a smaller area of the code base and have a smaller influence (they address a local problem). Examples include limiting the creation of a class to only one object, or notifying all dependent objects when the internal state of an object changes. These patterns are described in the book "Design Patterns: Elements of Reusable Object-Oriented Software" by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides from 1994. There are three groups of design patterns:
🔹 Creational: Factory Method, Builder, Singleton, ...
🔹 Structural: Adapter, Bridge, Decorator, ...
🔹 Behavioral: Command, Iterator, State, Strategy, ...
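The two design-pattern examples mentioned above (one object per class, and notifying dependents on a state change) correspond to the Singleton and Observer patterns. Below is a minimal sketch of each; the class names `Config` and `Subject` are hypothetical and chosen only for illustration.

```python
class Config:
    """Singleton: every call to Config() returns the same object."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


class Subject:
    """Observer: registered callbacks are notified when the state changes."""

    def __init__(self):
        self._observers = []
        self._state = None

    def attach(self, callback):
        self._observers.append(callback)

    def set_state(self, value):
        self._state = value
        for callback in self._observers:
            callback(value)


seen = []
subject = Subject()
subject.attach(seen.append)   # the observer simply records what it is told
subject.set_state("ready")    # seen now holds "ready"
```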
-
Memory consistency model: A Primer on Memory Consistency and Cache Coherence
SC vs TSO vs Relaxed Memory Consistency
1. Sequential Consistency (SC): Operations execute in program order.
🔹 SC preserves order for two memory operations from the same thread for all four combinations of loads and stores (Load → Load, Load → Store, Store → Store, and Store → Load).
🔹 Example: MIPS R10000
2. Total Store Order (TSO): Reads can happen before preceding writes complete.
🔹 TSO preserves the first three orders (Load → Load, Load → Store, Store → Store) but not Store → Load order.
🔹 Example: x86 CPUs
3. Relaxed Memory Consistency: Allows more reordering of operations for performance.
🔹 Examples: ARM and RISC-V
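The difference between SC and TSO can be illustrated with the classic "store buffering" litmus test: thread 1 runs `x = 1; r1 = y` and thread 2 runs `y = 1; r2 = x`. The sketch below enumerates every interleaving in plain Python. It is a deliberately simplified model (TSO is approximated by letting a load move ahead of the same thread's earlier store), not a simulation of real hardware.

```python
# Each operation is (thread, kind, variable). All variables start at 0.
T1 = [("t1", "store", "x"), ("t1", "load", "y")]
T2 = [("t2", "store", "y"), ("t2", "load", "x")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global order and return the loaded values (r1, r2)."""
    memory = {"x": 0, "y": 0}
    loaded = {}
    for thread, kind, var in schedule:
        if kind == "store":
            memory[var] = 1
        else:
            loaded[thread] = memory[var]
    return loaded["t1"], loaded["t2"]


def outcomes(allow_store_load_reorder):
    orders1, orders2 = [T1], [T2]
    if allow_store_load_reorder:
        # Simplified TSO: a load may complete before the thread's earlier store.
        orders1.append(T1[::-1])
        orders2.append(T2[::-1])
    return {run(s) for o1 in orders1 for o2 in orders2
            for s in interleavings(o1, o2)}


sc_results = outcomes(False)    # Store -> Load order preserved
tso_results = outcomes(True)    # Store -> Load order may be relaxed
```

Under the SC model the result `(0, 0)` never appears, while the relaxed Store → Load order makes it possible.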
-
API Gateway vs Load Balancer
🔹 API Gateway: Manages access to backend services and handles tasks like rate limiting, authentication, logging, and security policies.
🔹 Load Balancer: Distributes network traffic across multiple servers for high availability and even load distribution.
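To make the split in responsibilities concrete, here is a minimal sketch in which a gateway checks an API key and a per-client quota before handing the request to a round-robin load balancer. The class names, status codes as return values, and server names are assumptions for illustration, not any product's API.

```python
import itertools


class RoundRobinBalancer:
    """Load balancer: spreads requests evenly over the backends."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class Gateway:
    """API gateway: authenticates, applies a quota, then forwards."""

    def __init__(self, balancer, api_keys, max_requests):
        self._balancer = balancer
        self._api_keys = set(api_keys)
        self._max_requests = max_requests
        self._counts = {}

    def handle(self, api_key):
        if api_key not in self._api_keys:
            return 401, None              # authentication failed
        used = self._counts.get(api_key, 0)
        if used >= self._max_requests:
            return 429, None              # rate limit exceeded
        self._counts[api_key] = used + 1
        return 200, self._balancer.pick()


gateway = Gateway(RoundRobinBalancer(["srv-a", "srv-b"]), {"key-1"}, max_requests=3)
responses = [gateway.handle("key-1") for _ in range(4)]
```

The fourth call is rejected by the gateway before the balancer is ever consulted, which is the point of putting the gateway in front.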
-
Data engineering & Data Scientists Vocab 101 ref
🔹 Data engineering Vocab 101
🔹 75 Key Terms That Data Scientists Remember by Heart
🔹 A Comprehensive NumPy Cheat Sheet Of 40 Most Used Methods
🔹 15 Pandas ↔ Polars ↔ SQL ↔ PySpark Translations
🔹 11 Key Probability Distributions
🔹 6 Must-Know Types of Clustering Algorithms in Machine Learning
🔹 25 Most Important Mathematical Definitions in Data Science
-
Transfer Learning, Fine-tuning, Multitask Learning and Federated Learning ref
-
DevOps, Platform engineering and SRE (site reliability engineering) ref
SRE vs. DevOps vs. Platform Engineering
🔹 DevOps, SRE, and Platform Engineering are practices that streamline software development and maintenance. They all involve automation and collaboration.
🔹 DevOps covers the entire software development process, promoting team collaboration.
🔹 SRE focuses on system reliability, including application monitoring and emergency response.
🔹 Platform Engineering manages the infrastructure and tools needed for software development and operations.
🔹 DevOps is about the whole development process, SRE emphasizes reliability and scalability, and Platform Engineering is about infrastructure and tool management.
-
API Protocols (ref. ByteByteGo)
-
Web services and APIs (SOAP, RestAPI, GraphQL, gRPC and Kafka) ref
SOAP, RestAPI, GraphQL, gRPC and Kafka
🔹 SOAP (Simple Object Access Protocol): XML-based protocol for web services; heavyweight, favored for security and reliability.
🔹 REST (Representational State Transfer): Uses HTTP methods; simple and easy to use, but can be resource-heavy.
🔹 GraphQL: Allows flexible data queries and reduces data over-fetching.
🔹 gRPC (Google Remote Procedure Call): High-performance RPC framework, ideal for connecting microservices. Built on top of HTTP/2 and uses Protocol Buffers for data exchange.
🔹 Kafka: Distributed streaming platform that uses a publish-subscribe model for messaging. Suited to real-time data streams, with "at-least-once" delivery.
-
Real-time communication and messaging (MQTT, AMQP and WebSocket) ref
MQTT vs AMQP vs WebSocket
🔹 MQTT (Message Queuing Telemetry Transport): Lightweight messaging protocol that uses a publish-subscribe model, ideal for IoT and M2M communication. Three levels of Quality of Service (QoS): "At most once" (QoS 0), "At least once" (QoS 1), and "Exactly once" (QoS 2).
🔹 AMQP (Advanced Message Queuing Protocol): Open-standard application layer protocol with robust message delivery, routing, and security features. Two qualities of service: "At most once" (delivered once or lost) and "At least once" (delivered one or more times).
🔹 WebSocket: Enables full-duplex communication channels over a single TCP connection.
-
Reactive programming vs event-driven architecture ref
- Event-Driven: Handles user actions or system events. More general and can be used in any context where an event occurs
- Reactive: A data-driven approach that manages data streams and propagates changes, as in a spreadsheet model.
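The spreadsheet analogy can be sketched in a few lines: a derived cell recomputes itself whenever one of its inputs changes. This is an illustrative toy, assuming hypothetical `Cell` and `computed` helpers, rather than the API of any reactive library.

```python
class Cell:
    """Reactive value: subscribers are told whenever the value changes."""

    def __init__(self, value=None):
        self._value = value
        self._subscribers = []

    @property
    def value(self):
        return self._value

    def set(self, value):
        self._value = value
        for notify in self._subscribers:
            notify()

    def subscribe(self, notify):
        self._subscribers.append(notify)


def computed(func, *sources):
    """Create a cell whose value is func(*source values), kept up to date."""
    cell = Cell(func(*(s.value for s in sources)))

    def recompute():
        cell.set(func(*(s.value for s in sources)))

    for source in sources:
        source.subscribe(recompute)
    return cell


a, b = Cell(2), Cell(3)
total = computed(lambda x, y: x + y, a, b)   # like the formula =A+B
a.set(10)                                     # total updates without being asked
```

Note that the propagation itself is built on events (the `notify` callbacks); the reactive part is that callers only declare the data dependency.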
-
RBAC vs ReBAC: RBAC (Role-Based Access Control) is an authorization model that assigns permissions based on predefined roles. ReBAC (Relationship-Based Access Control), on the other hand, extends RBAC's capabilities by considering relationships between entities.
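A minimal sketch of the two models follows. The users, roles, and relationship tuples are hypothetical data; the ReBAC rule that a folder viewer can read the documents inside it is an assumption chosen to show how relationships are traversed.

```python
# RBAC: permissions come from a user's role.
ROLE_PERMISSIONS = {"editor": {"read", "write"}, "viewer": {"read"}}
USER_ROLES = {"alice": "editor", "bob": "viewer"}


def rbac_allows(user, action):
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())


# ReBAC: permissions come from relationships between entities,
# stored here as (subject, relation, object) tuples.
RELATIONS = {
    ("bob", "owner", "doc:report"),
    ("doc:report", "parent", "folder:finance"),
    ("carol", "viewer", "folder:finance"),
}


def rebac_can_read(user, resource):
    """Owners and viewers can read; viewers of a parent folder inherit access."""
    if (user, "owner", resource) in RELATIONS or (user, "viewer", resource) in RELATIONS:
        return True
    parents = [obj for subj, rel, obj in RELATIONS
               if subj == resource and rel == "parent"]
    return any(rebac_can_read(user, parent) for parent in parents)
```

For example, `rebac_can_read("carol", "doc:report")` succeeds only because of the document-to-folder relationship, something a flat role table cannot express.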
-
Conway's law: Software engineering principle that states that the structure of a system reflects the structure of the organization that designs it.
-
Data Management in Distributed systems (Partitioning, Shuffling and Bucketing)
Partitioning vs Shuffling vs Bucketing
🔹 Partitioning: The process of dividing a large dataset into smaller parts, known as partitions. In Hive, this splits a table's files into separate directories, one per partition value. For example, ../hive/warehouse/sales_table/product_id=P1.
🔹 Shuffling: The process of redistributing data across different partitions. The overhead of operations can be ranked as follows: orderby > join > groupby.
🔹 Bucketing: The process of decomposing data into manageable parts based on a certain column, thereby improving query performance and storage efficiency. It is best used when there are very few repeating values in a column (for example, a primary key column). For instance, Bucket0: ../hive/warehouse/sales_table/product_id=P1/000000_0, Bucket1: ../hive/warehouse/sales_table/product_id=P1/000001_0, and so on.
-
SSO (Single Sign-On) is an authentication scheme that allows a user to log in with a single ID and password to any of several related, yet independent, software systems.
SSO workflow, Types of SSO, SSO Implementations
🔹 SSO workflow: Identity Provider (IdP), Service Provider (SP), SSO Server
- IdP: Central authentication server, e.g., Google
- SP: Individual applications that rely on SSO, e.g., Trello
- SSO Server: Bridge between the IdP and SPs
🔹 Types of SSO: SAML, OAuth (Open Authorization) 2.0, OpenID Connect (OIDC)
| Protocol | Purpose | Token Format | Notes |
| --- | --- | --- | --- |
| OAuth 2.0 | Open standard for authorization | Access tokens | Temporary access to a third-party app |
| OpenID Connect (OIDC) | Open standard for authentication | JSON Web Token (JWT) | Newer type of SSO based on OAuth 2.0; a more straightforward protocol than SAML |
| SAML | Authentication, authorization | XML | Most common; uses the SAML protocol to exchange authentication data between the SSO server and the SP |
🔹 Some other types of SSO: Kerberos, smart card authentication
- Kerberos: Less suitable for internet-facing SSO due to the shared secret between the KDC (Key Distribution Center) and all participants.
- Smart card authentication: Physical card
🔹 SSO implementations: Microsoft Entra ID (formerly Azure Active Directory), Okta, Ping Identity, OneLogin, Auth0
-
Deployment Styles: Blue/Green, Canary, and A/B
Blue/Green, Canary, A/B
🔹 Blue/Green Deployment: Two identical environments, "Blue" and "Green". Deploy the new version in the inactive environment, test it, then switch users to it. For example, AWS services that support blue/green deployments include Elastic Beanstalk, OpsWorks, CloudFormation, CodeDeploy, and Amazon ECS.
🔹 Canary Deployment: Roll out the new version to a small group of users, monitor feedback, then do a full-scale release.
🔹 A/B Testing: Compare two versions of a webpage or app to see which performs better. A typical example of A/B testing is website usability testing.
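One common way to pick the "small group of users" for a canary release is to hash each user ID into a stable bucket, so the same user always sees the same version. The sketch below assumes hypothetical user IDs and a 10% canary share; it is an illustration, not how any particular deployment tool routes traffic.

```python
import hashlib


def canary_bucket(user_id):
    """Map a user to a stable bucket in [0, 100) using a hash of the ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def pick_version(user_id, canary_percent):
    """Send roughly canary_percent of users to the new version."""
    return "new" if canary_bucket(user_id) < canary_percent else "stable"


users = [f"user-{i}" for i in range(1000)]
canary_users = [u for u in users if pick_version(u, 10) == "new"]
```

Because the bucket depends only on the user ID, raising the percentage from 10 to 50 keeps every existing canary user on the new version and only adds more.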
-
Flaky Test: A Flaky Test is a test that sometimes passes and sometimes fails, despite no changes in the code. Causes can include poorly written tests, async waits, test order dependency, and concurrency issues. They can slow down CI/CD pipelines and cause issues for end users. ref
-
Hadoop Ecosystem
Hadoop vs Azure, AWS, GCP
🔹 1. HDFS (File Storage): Azure Data Lake Storage, Amazon S3, Google Cloud Storage
🔹 2. YARN (Resource Management): No direct equivalent in Azure, AWS, GCP
🔹 3. MapReduce (Data Processing): HDInsight, Amazon EMR, Google Cloud Dataproc
🔹 4. Spark (Fast Data Processing): Databricks, Spark in HDInsight, Azure Synapse Analytics, Amazon EMR, Google Cloud Dataproc
🔹 5. Pig, Hive (Query Data): HDInsight, Azure Synapse Analytics, Amazon EMR, Google Cloud Dataproc
🔹 6. HBase (NoSQL DB): Azure Cosmos DB, HBase on a virtual machine (VM), HBase in Azure HDInsight, Amazon DynamoDB, Google Cloud Bigtable
🔹 7. Mahout, Spark MLlib (ML Libraries): Databricks, Amazon SageMaker, No direct equivalent in GCP
🔹 8. Solr, Lucene (Search/Index): Azure Cognitive Search, Amazon CloudSearch, Google Cloud Search
🔹 9. ZooKeeper (Cluster Management): No direct equivalent in Azure, Amazon Managed Apache ZooKeeper, No direct equivalent in GCP
🔹 10. Oozie (Job Scheduling): Azure Data Factory, AWS Step Functions, Google Cloud Composer
-
Software-Defined Networking (SDN): Northbound vs Southbound
graph TD
  A[Application layer - routing, load balancing, etc] -->|Northbound APIs| B[Control layer - SDN controller]
  B -->|Southbound APIs| C[Infrastructure layer - physical switches, data plane]
🔹 The Controller is the SDN network's brain, directing traffic flows.
🔹 The Southbound Interface communicates the controller's decisions to the switches using protocols like OpenFlow.
🔹 SDN Switches direct traffic based on the controller's instructions.
🔹 Network Devices (servers, routers, etc.) send and receive data flows as directed by the SDN switches.
🔹 The Northbound Interface uses APIs to exchange data between the controller and applications.
🔹 SDN Applications use network data to perform tasks, communicating their needs to the controller.
graph LR
  A[Controller] -- API --> B[Southbound Interface]
  B -- OpenFlow --> C[SDN Switches]
  C -- Data Flow --> D[Network Devices]
  A -- API --> E[Northbound Interface]
  E -- Applications --> F[SDN Applications]
-
Cracking coding interviews
🔹 src: ref
🔹 Two Pointers: Navigating arrays with two indices. ref
🔹 Intervals: Working with ranges of values. ref / ref / ref
🔹 Dynamic Programming: Solving complex problems by breaking them down into simpler subproblems. ref / ref
🔹 Tree Traversal: Visiting all nodes in a tree. ref / ref
🔹 DFS-BFS: Depth-first and breadth-first search algorithms. ref / ref / ref / ref
🔹 Binary Search: Finding an element in a sorted array. ref
🔹 Array: A data structure holding elements. ref
🔹 Sliding Window: A contiguous range that slides across the data. ref / ref / ref / ref / ref
🔹 Backtracking: Trying out all possibilities to find a solution. ref / ref / ref
🔹 Combination: Finding all possible selections of elements. ref
🔹 Trie: A tree-like data structure for storing strings. ref
🔹 Word Break: Dividing a string into words. ref
🔹 Bit Manipulation: Performing operations on binary numbers. ref / ref
🔹 Sum: Adding numbers together. ref
🔹 Monotonic Stack: A stack that keeps its elements in sorted order. ref
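Four of the patterns listed above are short enough to sketch side by side. The function names and sample problems are illustrative choices, not taken from the linked references.

```python
def pair_with_sum(sorted_nums, target):
    """Two pointers: indices of two values in a sorted list that add to target."""
    lo, hi = 0, len(sorted_nums) - 1
    while lo < hi:
        total = sorted_nums[lo] + sorted_nums[hi]
        if total == target:
            return lo, hi
        if total < target:
            lo += 1
        else:
            hi -= 1
    return None


def binary_search(sorted_nums, target):
    """Binary search: index of target in a sorted list, or -1 if absent."""
    lo, hi = 0, len(sorted_nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_nums[mid] == target:
            return mid
        if sorted_nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def max_window_sum(nums, k):
    """Sliding window: largest sum of k consecutive items (1 <= k <= len(nums))."""
    window = sum(nums[:k])
    best = window
    for i in range(k, len(nums)):
        window += nums[i] - nums[i - k]   # add the new item, drop the oldest
        best = max(best, window)
    return best


def next_greater(nums):
    """Monotonic stack: for each item, the next larger value to its right, or -1."""
    result = [-1] * len(nums)
    stack = []  # indices whose values are in decreasing order
    for i, value in enumerate(nums):
        while stack and nums[stack[-1]] < value:
            result[stack.pop()] = value
        stack.append(i)
    return result
```

Each function runs in linear or logarithmic time, which is usually why these patterns are preferred over a nested-loop solution.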
-
Medallion architecture: A data design pattern for lakehouses. It enhances data quality across three layers: bronze (raw), silver (curated), and gold (presentation). This "multi-hop" architecture allows data to transition between layers as required. ref
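The three hops can be sketched with plain Python lists standing in for lakehouse tables. The record fields and cleaning rules below are hypothetical; a real pipeline would typically run on a data processing engine rather than in-memory lists.

```python
# Bronze: raw records exactly as they arrived (strings, mixed case, bad rows).
bronze = [
    {"order_id": "1", "amount": "10.5", "country": "us"},
    {"order_id": "2", "amount": "bad", "country": "US"},
    {"order_id": "3", "amount": "4.5", "country": "de"},
]


def to_silver(rows):
    """Silver: typed, normalized records; rows that fail parsing are dropped."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": amount,
            "country": row["country"].upper(),
        })
    return cleaned


def to_gold(rows):
    """Gold: an aggregate ready for reporting (revenue per country)."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the bronze layer untouched means the silver and gold layers can be rebuilt if a cleaning rule turns out to be wrong.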
-
Slowly changing dimensions (SCD): Dimensions whose attributes change over time, but slowly and unpredictably. For example, a customer's address in a retail business.
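One widely used way to handle such a change is to keep history by closing the old row and adding a new one (often called SCD Type 2, which the text above does not name). The sketch below assumes hypothetical column names `valid_from` and `valid_to`, with `None` marking the current row.

```python
from datetime import date


def apply_scd2(history, customer_id, new_address, change_date):
    """Close the customer's current row and append a new current row."""
    for row in history:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            if row["address"] == new_address:
                return history              # nothing changed
            row["valid_to"] = change_date   # expire the old version
    history.append({
        "customer_id": customer_id,
        "address": new_address,
        "valid_from": change_date,
        "valid_to": None,                   # None marks the current version
    })
    return history


history = [{"customer_id": 1, "address": "12 Oak St",
            "valid_from": date(2020, 1, 1), "valid_to": None}]
apply_scd2(history, 1, "34 Elm Ave", date(2023, 6, 1))
```

After the call, the old address is still queryable for orders placed before June 2023, which is the main reason to prefer this over overwriting the row.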
-
Star schema: The Star Schema is a data model for data warehouses. It has a central fact table for measurable data and surrounding dimension tables for descriptive data. ref
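A small star schema can be tried out with Python's built-in `sqlite3` module. The table and column names below (`fact_sales`, `dim_product`, `dim_date`) are hypothetical; the sketch only shows the shape of one fact table joined to its dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    -- The fact table holds the measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 1000.0), (1, 11, 1200.0), (2, 11, 600.0);
""")

# A typical analytical query: aggregate the facts, describe them via dimensions.
rows = conn.execute("""
    SELECT p.name, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.name, d.year
    ORDER BY p.name, d.year
""").fetchall()
```

Every join goes from the fact table straight to a dimension, which keeps queries one hop deep and gives the schema its star shape.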
-
OLAP vs OLTP:
OLAP: Used for complex data analysis and business reporting, such as financial analysis and sales forecasting.
OLTP: Used for real-time processing of online transactions, including everyday transactions like ATM withdrawals and in-store purchases.