Design a Text-based Search Engine

Problem Statement

Design a dead-simple text-based search engine that serves relevant results without using any tooling like Elasticsearch. The idea is to understand the internals of a search engine and the math behind TF-IDF. Extend your search engine to support boolean expressions, typo tolerance, phonetics, and anything else that you find amusing.
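
To make the TF-IDF math concrete, here is a minimal sketch in Python, assuming whitespace tokenization and one common variant of the formula: tf(t, d) is the count of t in d divided by the length of d, and idf(t) is log(N / df(t)). The helper names `tokenize` and `tf_idf_scores` are made up for illustration, and other weightings (smoothed IDF, log-scaled TF) are equally valid choices.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real tokenizer would do more."""
    return text.lower().split()


def tf_idf_scores(query, docs):
    """Score each document against the query with a basic TF-IDF sum.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where df(t) is the number of docs containing t
    score(d) = sum over query terms of tf(t, d) * idf(t)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue  # term absent from this document contributes nothing
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df[term])
            score += tf * idf
        scores.append(score)
    return scores


docs = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog jumps over the lazy fox",
]
scores = tf_idf_scores("quick fox", docs)
# Document positions ordered from most to least relevant.
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

With these three documents, the short document that contains both query terms ranks first, the longer one second, and the document with neither term scores zero. Note that a term appearing in every document gets an IDF of zero under this variant, which is why "the" would never affect the ranking.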

Requirements

The problem statement is only a starting point. Be creative, dive into the product details, and add the constraints and features you think are important.

Core Requirements

  • build a simple text-based search engine that serves relevant results
  • make the search engine as robust as possible
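
One possible reading of "robust" is tolerance to misspelled queries, which the problem statement lists as an extension. A minimal sketch, assuming Levenshtein edit distance as the similarity measure and a brute-force scan over a small vocabulary; `edit_distance`, `suggest`, and the `max_distance` default are hypothetical names and values, not part of the assignment.

```python
def edit_distance(a, b):
    """Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def suggest(term, vocabulary, max_distance=2):
    """Return vocabulary terms within max_distance edits, closest first."""
    scored = [(edit_distance(term, word), word) for word in vocabulary]
    return [word for dist, word in sorted(scored) if dist <= max_distance]


vocabulary = ["search", "engine", "index", "ranking"]
suggestions = suggest("serach", vocabulary)
```

Scanning the whole vocabulary for every query term will not hold up on a large corpus. It is worth stating in your design how you would narrow the candidates first, for example with an n-gram index over the vocabulary.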

Micro Requirements

  • ensure the data in your system never enters an inconsistent state
  • ensure your system is free of deadlocks (if applicable)
  • ensure that the throughput of your system is not affected by locking; if it is, state how
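
One way to approach these three points together is a copy-on-write index: writers serialize on a single lock and publish a new snapshot, while readers never lock at all. The sketch below is only an illustration under stated assumptions. `SnapshotIndex` is a hypothetical class, and it relies on attribute assignment being atomic in CPython; in another language an atomic pointer would fill that role.

```python
import threading


class SnapshotIndex:
    """Inverted index where readers never take a lock.

    Writers build a modified copy under a lock and then swap one reference.
    Readers grab the current reference and work on that snapshot, so they
    never observe a half-applied update.
    """

    def __init__(self):
        self._write_lock = threading.Lock()  # serializes writers only
        self._snapshot = {}                  # term -> frozenset of doc ids

    def add_document(self, doc_id, text):
        terms = set(text.lower().split())
        with self._write_lock:
            # Shallow copy is enough because the values are immutable.
            updated = dict(self._snapshot)
            for term in terms:
                updated[term] = updated.get(term, frozenset()) | {doc_id}
            # A single reference swap publishes the whole update at once.
            self._snapshot = updated

    def search(self, term):
        snapshot = self._snapshot  # read the reference once, then use it
        return snapshot.get(term.lower(), frozenset())
```

With only one lock there is no lock ordering to get wrong, so deadlock is not possible here. The tradeoff is that every write copies the term dictionary, so write throughput drops as the vocabulary grows; batching many documents per swap is one way to soften that.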

Output

Design Document

Create a design document for this system/feature stating all critical design decisions, tradeoffs, components, services, and communications. Also specify how your system behaves at scale, and what will eventually become a chokepoint.

Do not create unnecessary components just to make the design look complicated. A good design is always simple and elegant. A good way to think about it: if you had to create a separate process/machine/infra for each component and code it yourself, would you still do it?

Prototype

To understand the nuances and internals of this system, build a prototype that

  • builds a search engine on top of 100MB of text data using your favourite programming language
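
A common starting point for such a prototype is an inverted index, which also makes boolean expressions cheap to evaluate with set operations. The sketch below is written in Python for brevity and assumes whitespace tokenization and a flat query shape (must / should / must_not) instead of a parsed expression tree; `build_index` and `boolean_search` are hypothetical names.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_search(index, all_ids, must=(), should=(), must_not=()):
    """Evaluate a flat boolean query with set operations.

    must     -> AND: every term has to be present
    should   -> OR:  at least one term has to be present (if any are given)
    must_not -> NOT: no term may be present
    """
    result = set(all_ids)
    for term in must:
        result &= index.get(term, set())
    if should:
        result &= set().union(*(index.get(term, set()) for term in should))
    for term in must_not:
        result -= index.get(term, set())
    return result


docs = {
    1: "go is a compiled language",
    2: "java is a compiled language with a virtual machine",
    3: "python is an interpreted language",
}
index = build_index(docs)
hits = boolean_search(
    index, docs.keys(), must=["language", "compiled"], must_not=["java"]
)
```

Here `hits` contains only document 1. On 100MB of text the same structure still applies, but it is worth measuring how much memory the posting sets take and whether sorted lists of ids would serve better.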

Recommended Tech Stack

This is the recommended tech stack for building this prototype:

Which      Options
Language   Golang, Java, C++

Outcome

You'll learn

  • how to build a simple text-based search engine
  • the math behind TF-IDF
  • the basics of NLP: stemming, lemmatization, and phonetics
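
As a taste of the phonetics part, here is a minimal sketch of American Soundex, one classic phonetic algorithm (the assignment does not prescribe a specific one). It assumes English words in ASCII letters; `soundex` and `SOUNDEX_CODES` are names chosen for this example.

```python
# Consonant groups that sound alike share a digit; vowels, h, w, y have none.
SOUNDEX_CODES = {}
for letters, digit in (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(word):
    """Return the 4-character Soundex code, e.g. a letter plus three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = []
    prev_code = SOUNDEX_CODES.get(first, "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev_code:
            encoded.append(code)
        # Vowels reset prev_code, so a repeated code across a vowel counts twice.
        prev_code = code
    # Pad with zeros and truncate to one letter plus three digits.
    return (first.upper() + "".join(encoded) + "000")[:4]
```

Indexing documents under `soundex(term)` in addition to the raw term lets a query for "Rupert" also reach documents that mention "Robert", since both map to R163. Whether that recall is worth the extra false positives is a design decision to call out.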

Share and shoutout

If you find this assignment helpful, please

  • share this assignment with your friends and peers
  • star this repository and help it reach a wider audience
  • give me a shoutout on Twitter @arpit_bhayani, or on LinkedIn at @arpitbhayani.

This assignment is part of Arpit's System Design Masterclass - A masterclass that helps you become great at designing scalable, fault-tolerant, and highly available systems.