MTEB: Massive Text Embedding Benchmark
EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. ⭐ support NLP!
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as
MaLA-500: Massive Language Adaptation of Large Language Models
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages (ACL 2023)
[EMNLP 2022] Discovering Language-neutral Sub-networks in Multilingual Language Models.
IITK at SemEval Task 1: Semantic Textual Relatedness for African and Asian Languages
This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Summarization for 1,500+ Language Pairs" published in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23), July 9-14, 2023.
MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
AIML Natural Language Processing - Speech, Audio
Data Repository for LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons
Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.
The project involves creating a transformer-based classifier for a multilingual text classification task.
[EMNLP 2023] The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.
[EMNLP 2023 - Findings] Breaking the Language Barrier: Improving Cross-Lingual Reasoning with Structured Self-Attention
Code Repository for Paper: Multilingual Question Answering System Utilizing Natural Language Inference.
R library for Harmony
A language-learning app built with React, Material UI, and Node.js.