Angon-pro/DS-ML_projects

DS-ML_projects

Data Science and Machine Learning projects

Article relevance classification

Description:
Creation of a relevance classifier for articles.

"Article_relevance_classification" directory contains the .ipynb file and data links (Google Drive)

Stack: Python, Jupyter Notebook, NumPy, Pandas, Scikit-learn, Matplotlib, nltk, pymorphy2, wordcloud

BERTopic model for news topic modeling

Description:
Uses the BERTopic framework to build a topic model for news.

"BERTopic_news_clf" directory contains the source files and links

Stack: Python, Jupyter Notebook, BERTopic, Pandas, NumPy, Matplotlib, SentenceTransformers, Scikit-learn

CNN MNIST classifier

Description:
The model identifies the digit in an input image using a softmax output activation function. The CNN reaches about 99% accuracy.

"CNN_MNIST_classifier" directory contains the .ipynb file
MNIST is imported from tensorflow.keras.datasets

Stack: Python, Jupyter Notebook, NumPy, Matplotlib, TensorFlow, Keras
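
The softmax output step mentioned in the description can be illustrated with a minimal pure-Python sketch. It is not taken from the notebook: the function names and the example scores are hypothetical, and the real model computes this inside a Keras output layer.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_digit(logits):
    """Return the index (digit 0-9) with the highest softmax probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical raw scores for the ten digit classes, for illustration only
example_logits = [0.1, 0.3, 0.2, 4.5, 0.0, 1.2, 0.4, 0.9, 0.3, 0.6]
probs = softmax(example_logits)
digit = predict_digit(example_logits)
print(digit, round(probs[digit], 3))
```

The predicted digit is simply the class with the largest probability, which is why the highest raw score always wins.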

CNN thermogram classifier

Description:
The CNN is designed as a fire-detection solution based on the analysis of thermograms captured from flight altitude. The model assigns each thermogram a label of 0 (no fire) or 1 (fire). Classification accuracy is about 75%.

"CNN_thermogram_classifier" directory contains the .ipynb file and link to the dataset

Stack: Python, Jupyter Notebook, NumPy, Matplotlib, TensorFlow, Keras

Embeddings and Similarity

Description:
Implementation of semantic search using cosine distance, GigaChatEmbeddings, and the Weaviate vector database.

"Embeddings_and_Similarity" directory contains source files, configs and news dataset with 1000 docs

Stack: Python, Pandas, LangChain, GigaChat SDK (GigaChain), Weaviate
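
As a rough illustration of ranking documents by cosine similarity (cosine distance is 1 minus this value), here is a minimal pure-Python sketch. The three-dimensional vectors and document names are made up for the example; in the project the embeddings would come from GigaChatEmbeddings and the search would run in Weaviate.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def search(query_vec, doc_vecs, top_k=2):
    """Return the top_k (doc_id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings, hypothetical values for illustration only
toy_docs = {
    "sports": [0.9, 0.1, 0.0],
    "finance": [0.1, 0.9, 0.2],
    "weather": [0.0, 0.2, 0.9],
}
results = search([0.8, 0.2, 0.1], toy_docs)
print(results)
```

A vector database performs the same ranking, typically with an approximate nearest-neighbour index instead of a full scan.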

Logistic regression Titanic classifier

Description:
A model that predicts whether a Titanic passenger survives, based on passenger class, sex, age, number of siblings, number of children, and ticket price. The solution uses the logistic regression implementation provided by the Scikit-learn library. Accuracy reaches up to about 80%.

"Logistic_regression_Titanic_classifier" directory contains the .ipynb file and the dataset

Stack: Python, Jupyter Notebook, NumPy, Pandas, Scikit-learn
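
To show how a fitted logistic regression turns the listed features into a survival prediction, here is a small pure-Python sketch. The coefficients, bias, feature encoding, and passengers are hand-picked assumptions for illustration, not values learned in the notebook, where Scikit-learn does the fitting and prediction.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def survival_probability(features, weights, bias):
    """Logistic regression: sigmoid of the weighted feature sum plus bias."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients, NOT learned from the dataset.
# Assumed feature order: class, sex (1 = female), age, siblings, children, fare
WEIGHTS = [-1.0, 2.5, -0.03, -0.3, -0.1, 0.002]
BIAS = 1.5

first_class_woman = [1, 1, 29, 0, 0, 80.0]
third_class_man = [3, 0, 40, 1, 0, 8.0]

p_first = survival_probability(first_class_woman, WEIGHTS, BIAS)
p_third = survival_probability(third_class_man, WEIGHTS, BIAS)

# A probability of 0.5 or more is read as "survived" (1), otherwise 0
print(int(p_first >= 0.5), int(p_third >= 0.5))
```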

MapReduce algorithm implementation

Description:
Implementation of the MapReduce algorithm to find the average title length for specified news sources, based on an open news dataset.

"Map-Reduce" directory contains the source files and data links

Stack: Python, Pandas
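
The map, shuffle, and reduce steps for this task can be sketched in plain Python as follows. This is a single-process illustration under assumed names: the record layout, source names, and function names are hypothetical and may differ from the repository's source files.

```python
from collections import defaultdict


def map_phase(records, sources):
    """Emit (source, (title_length, 1)) for each record from a wanted source."""
    for source, title in records:
        if source in sources:
            yield source, (len(title), 1)


def shuffle(pairs):
    """Group the mapped values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum lengths and counts per source, then divide to get the average."""
    averages = {}
    for source, values in groups.items():
        total_length = sum(length for length, _ in values)
        count = sum(n for _, n in values)
        averages[source] = total_length / count
    return averages


# Hypothetical (source, title) records for illustration only
records = [
    ("source_a", "Short title"),
    ("source_a", "A longer news title"),
    ("source_b", "Breaking"),
    ("source_c", "Ignored source"),
]
wanted = {"source_a", "source_b"}
averages = reduce_phase(shuffle(map_phase(records, wanted)))
print(averages)
```

Emitting a (length, count) pair rather than a bare length keeps the reduce step correct even if partial sums are combined before the final division.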