DanielPFlorian/Transformers-Github-Semantic-Search

Transformers-Github-Semantic-Search

Transformers-Github-Semantic-Search demonstrates how to create a dataset for an NLP application and use it to build a semantic search engine. We use the Datasets and Transformers Python libraries from Hugging Face to complete this task. Using requests and the GitHub REST API, we pull issues from the Transformers GitHub repository, then clean the dataset and augment it with comments. Next, we build a semantic search engine that helps find answers to questions and issues about the repository, using tokenizers, text embeddings, and FAISS.
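As a rough illustration of the data collection step, the sketch below pages through the GitHub REST API issues endpoint with requests and then drops pull requests, which that endpoint returns alongside issues. The function names `fetch_issues` and `drop_pull_requests`, the `GITHUB_TOKEN` environment variable, and the page sizes are assumptions made for this example and are not taken from the repository.

```python
import os

GITHUB_API = "https://api.github.com"


def fetch_issues(owner="huggingface", repo="transformers", pages=1, per_page=100, token=None):
    """Download up to pages * per_page items from the issues endpoint.

    Hypothetical helper: the endpoint returns pull requests as well as issues.
    """
    # Third-party dependency, imported here so drop_pull_requests works without it.
    import requests

    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # An access token raises the API rate limit; it is optional for public repos.
        headers["Authorization"] = f"Bearer {token}"

    issues = []
    for page in range(1, pages + 1):
        response = requests.get(
            f"{GITHUB_API}/repos/{owner}/{repo}/issues",
            params={"state": "all", "per_page": per_page, "page": page},
            headers=headers,
            timeout=30,
        )
        response.raise_for_status()
        batch = response.json()
        if not batch:  # no more pages
            break
        issues.extend(batch)
    return issues


def drop_pull_requests(issues):
    """Keep only true issues; pull requests carry a "pull_request" key."""
    return [issue for issue in issues if "pull_request" not in issue]


if __name__ == "__main__":
    raw = fetch_issues(token=os.environ.get("GITHUB_TOKEN"))
    print(len(raw), "items fetched,", len(drop_pull_requests(raw)), "are issues")
```

Comments for a single issue can be fetched in the same way from `/repos/{owner}/{repo}/issues/{issue_number}/comments`, which is one way to do the augmentation step described above.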

Techniques Used

  • NLP (Natural Language Processing)
  • Dataset creation using requests and the GitHub REST API
  • Dataset Exploration, Cleaning and Augmentation
  • Text Embedding creation using a Tokenizer and Transformers model
  • FAISS indexing for Semantic Search
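
The embedding and indexing techniques listed above could be combined roughly as follows. This is a minimal sketch, not the repository's actual code: the checkpoint name, the CLS pooling choice, the helper names, and the assumption that each row has `title`, `body`, and a `comments` list of strings are all illustrative. It relies on `Dataset.add_faiss_index` and `Dataset.get_nearest_examples` from the Datasets library, which require FAISS to be installed.

```python
def concatenate_text(example):
    """Merge title, body, and comments into one string to embed.

    Assumes "comments" is a list of strings; the body may be None.
    """
    parts = [example["title"], example["body"] or "", *example["comments"]]
    return {"text": "\n".join(part for part in parts if part)}


def cls_pooling(model_output):
    """Use the hidden state of the first token as the text embedding."""
    return model_output.last_hidden_state[:, 0]


def build_search(dataset, checkpoint="sentence-transformers/multi-qa-mpnet-base-dot-v1"):
    """Embed every row, add a FAISS index, and return a search function.

    The checkpoint is an assumed example model, not one named by the project.
    """
    # Heavy third-party imports are kept local to this function.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)

    def embed(texts):
        encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            return cls_pooling(model(**encoded)).numpy()

    dataset = dataset.map(concatenate_text)
    dataset = dataset.map(lambda row: {"embeddings": embed([row["text"]])[0]})
    dataset.add_faiss_index(column="embeddings")

    def search(question, k=5):
        # Returns the scores and the k rows closest to the question embedding.
        return dataset.get_nearest_examples("embeddings", embed([question]), k=k)

    return search
```

Calling `search = build_search(issues_dataset)` and then `scores, samples = search("How can I load a pretrained model offline?")` would return the closest matching rows, where `issues_dataset` stands for a hypothetical `datasets.Dataset` built from the cleaned issues.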