Transformers-Github-Semantic-Search

Transformers-Github-Semantic-Search demonstrates how to create a dataset for an NLP application and use it to build a semantic search engine. We use the Datasets and Transformers Python libraries from Hugging Face to complete this task. Using requests and the GitHub REST API, we pull issues from the Transformers GitHub repository, then clean up the dataset and augment it with comments. Next, we build a semantic search engine that helps find answers to questions and issues about the repository using tokenizers, text embeddings, and FAISS.
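
The issue download step described above could look roughly like the sketch below. It is not taken from the notebooks: the function names `fetch_issues` and `drop_pull_requests`, the `max_pages` limit, and the injectable `get` parameter are assumptions made for illustration. The endpoint, the `state`, `per_page`, and `page` query parameters, and the `pull_request` key on pull-request entries come from the GitHub REST API.

```python
GITHUB_ISSUES_URL = "https://api.github.com/repos/{owner}/{repo}/issues"


def fetch_issues(owner="huggingface", repo="transformers", max_pages=2,
                 per_page=100, token=None, get=None):
    """Download issues page by page and return them as a list of dicts.

    `get` is a hypothetical hook (defaults to requests.get) so the function
    can be exercised without network access.
    """
    if get is None:
        import requests  # third-party; only needed for real network calls
        get = requests.get

    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # A personal access token raises the API rate limit.
        headers["Authorization"] = f"Bearer {token}"

    url = GITHUB_ISSUES_URL.format(owner=owner, repo=repo)
    issues = []
    for page in range(1, max_pages + 1):
        params = {"state": "all", "per_page": per_page, "page": page}
        response = get(url, headers=headers, params=params)
        response.raise_for_status()
        batch = response.json()
        if not batch:  # an empty page means there is nothing left to fetch
            break
        issues.extend(batch)
    return issues


def drop_pull_requests(issues):
    """The issues endpoint also returns pull requests; those entries carry a
    'pull_request' key, so keep only the entries without it."""
    return [issue for issue in issues if "pull_request" not in issue]
```

A call such as `drop_pull_requests(fetch_issues(max_pages=1))` would return up to 100 entries with pull requests filtered out. One way to load them into the Datasets library is to write them to a JSON Lines file first.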

Techniques Used

NLP (Natural Language Processing)
Dataset creation using Requests and the GitHub REST API
Dataset Exploration, Cleaning and Augmentation
Text Embedding creation using Tokenizer
FAISS indexing for Semantic Search
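
The last three techniques in the list above can be sketched as follows. This is a minimal sketch under stated assumptions, not the notebooks' code: the checkpoint name, the use of the first-token (CLS) vector as the text embedding, the column names `title`, `body`, and `comments`, and the helper names `concatenate_text`, `build_index`, and `search` are all illustrative choices. The heavy libraries are imported inside `build_index` so that the text-preparation helper can be used on its own.

```python
def concatenate_text(example):
    """Join title, body, and comment into one string to embed.

    Assumes each row holds a single comment string; empty or missing
    fields (an issue body can be None) are skipped.
    """
    parts = [example["title"], example["body"], example["comments"]]
    return {"text": " \n ".join(part for part in parts if part)}


def build_index(issues_dataset,
                checkpoint="sentence-transformers/multi-qa-mpnet-base-dot-v1"):
    """Embed every row of a datasets.Dataset and attach a FAISS index.

    The checkpoint is an assumption; any encoder suited to semantic
    search could be swapped in.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    model.eval()

    def embed(texts):
        # Tokenize a list of strings and take the first-token vector
        # of the last hidden layer as the embedding (CLS pooling).
        encoded = tokenizer(texts, padding=True, truncation=True,
                            return_tensors="pt")
        with torch.no_grad():
            output = model(**encoded)
        return output.last_hidden_state[:, 0].numpy()

    dataset = issues_dataset.map(concatenate_text)
    dataset = dataset.map(lambda row: {"embeddings": embed([row["text"]])[0]})
    dataset.add_faiss_index(column="embeddings")  # requires faiss installed
    return dataset, embed


def search(dataset, embed, question, k=5):
    """Return the scores and rows of the k nearest neighbours of a question."""
    scores, samples = dataset.get_nearest_examples(
        "embeddings", embed([question]), k=k
    )
    return scores, samples
```

Embedding one row at a time keeps the sketch short; for a dataset of realistic size, batching the `map` call would be considerably faster.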

Folders and files

.gitignore
Create_Dataset_Notebook.ipynb
LICENSE
README.md
Semantic_Search_Notebook.ipynb

DanielPFlorian/Transformers-Github-Semantic-Search

Folders and files

Latest commit

History

Repository files navigation

Transformers-Github-Semantic-Search

Techniques Used

About

Topics

Resources

License

Stars

Watchers

Forks

Languages