multilingual-nlp

Here are 31 public repositories matching this topic...

embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark

benchmark information-retrieval retrieval text-classification clustering sts semantic-search reranking text-embedding sgpt neural-search sentence-transformers sbert multilingual-nlp bitext-mining

Updated May 25, 2024
Python

DmitryRyumin / EMNLP-2023-Papers

EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. ⭐ support NLP!

Updated May 18, 2024
Python

ArkS0001 / IIT-Bombay-Whisper-Hindi-ASR-Model-Machine-Learning-Intern

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as

multilingual voice-recognition openai whisper nlp-machine-learning speechtotext multilingual-nlp llm openai-whisper

Updated Apr 29, 2024
Jupyter Notebook

MaLA-LM / mala-500

MaLA-500: Massive Language Adaptation of Large Language Models

multilingual-nlp large-language-models

Updated Apr 24, 2024
Python

cisnlp / Glot500

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages (ACL 2023)

multilingual nlp natural-language-processing acl dataset glot xlm multilingual-models xlm-r multilingual-nlp glot500

Updated Apr 20, 2024
Python

negar-foroutan / multiLMs-lang-neutral-subnets

[EMNLP 2022] Discovering Language-neutral Sub-networks in Multilingual Language Models.

mt5 lottery-ticket-hypothesis mbert cross-lingual-transfer multilingual-language-models multilingual-nlp

Updated Apr 1, 2024
Python

Rajarshi1001 / IITK-SemEval-2024-Task-1

IITK at SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages

semantics transformer lexical-analysis google-distance contrastive-learning sentence-transformers multilingual-nlp tsdae

Updated Mar 27, 2024
Jupyter Notebook

csebuetnlp / CrossSum

This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Summarization for 1,500+ Language Pairs" published in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23), July 9-14, 2023.

cross-lingual-summarization cross-lingual-transfer multilingual-nlp

Updated Mar 26, 2024
Python

longxudou / multispider

MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

multilingual natural-language-processing semantic-parsing text-to-sql multilingual-nlp

Updated Mar 12, 2024
Python

epfl-dlab / llm-latent-language

Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".

multilingual-nlp llm mechanistic-interpretability llama2

Updated Mar 11, 2024
Jupyter Notebook

AnanthaRajuC / AIML_NLP

AIML Natural Language Processing - Speech, Audio

python nlp translation speech-to-text multilingual-nlp openai-whisper

Updated Feb 26, 2024
Java

BatsResearch / LexC-Gen-Data-Archive

Data Repository for LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons

multilingual sentiment-analysis topic-modeling synthetic-data multilingual-nlp

Updated Feb 23, 2024

BatsResearch / LexC-Gen

Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.

multilingual sentiment-analysis topic-modeling synthetic-data synthetic-dataset-generation low-resource-languages lexicon-based multilingual-nlp llm

Updated May 1, 2024
Python

maryamteimouri / MultilingualTextClassifier

The project involves creating a transformer-based classifier for a multilingual text classification task.

multilingual nlp deep-learning text-classification transformer transfer-learning multilingual-nlp

Updated Feb 10, 2024
Jupyter Notebook

FSoft-AI4Code / TheVault

[EMNLP 2023] The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

dataset multilingual-nlp ai4code

Updated Feb 5, 2024
Jupyter Notebook

cambridgeltl / prompt4bli

On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.