This repo collects papers related to Multi-Modal Machine Learning applications. It is updated occasionally, on an ongoing basis.
This part collects fancy applications that rely on Multi-Modal Machine Learning, especially modality combinations that are NOT yet well studied. Common and popular tasks such as Visual Question Answering and Image Caption Generation are therefore not the focus here. If only one modality is listed, the paper introduces how to obtain data for that modality.
Year | Venue | Paper | Modalities | Project/Code |
---|---|---|---|---|
2020 | ECCV | Multiple Sound Sources Localization from Coarse to Fine | Vision+Sound | - |
2020 | CVPR | Music Gesture for Visual Sound Separation | Vision+Sound | Project |
2020 | ICASSP | Sight to Sound: An End-to-end Approach for Visual Piano Transcription | Vision+Sound | Project |
2019 | CVPR | Connecting Touch and Vision via Cross-Modal Prediction | Vision+Touch | Project/Code |
2019 | Nature | Learning the signatures of the human grasp using a scalable tactile glove | Touch | Project/Code |
2019 | IJCV | Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning | Vision+Sound | - |
2019 | NC | Real-time decoding of question-and-answer speech dialogue using human cortical activity | Speech+ECoG | - |
2017 | CVPR | Lip Reading Sentences in the Wild | Vision+Speech | - |
2016 | ECCV | Ambient Sound Provides Supervision for Visual Learning | Vision+Sound | - |
2016 | CVPR | Visually Indicated Sounds | Vision+Sound | Project |
This part collects papers that introduce new modalities into traditional tasks.
Year | Venue | Paper | Task | Basic | New |
---|---|---|---|---|---|
2020 | ECCV | Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision | Detection | Vision | Sound |
2019 | ICCV | Self-Supervised Moving Vehicle Tracking With Stereo Sound | Tracking | Vision | Sound |
2019 | ICCVW | DECCNet: Depth Enhanced Crowd Counting | Counting | Vision | Depth |
2019 | CVPRW | WiFi and Vision Multimodal Learning for Accurate and Robust Device-Free Human Activity Recognition | Recognition | Vision | WiFi |
This part lists some large datasets that include multi-modal annotations.
Year | Dataset | Modalities | Project | Paper |
---|---|---|---|---|
2020 | VGG-Sound | Vision+Sound | Project | ICASSP |
2017 | Lip Reading in the Wild | Vision+Speech | Project | ACCV |
2016 | Cross-Modal Places | Vision+Language | Project | CVPR & T-PAMI |
This part lists some extra resources on Multi-Modal Machine Learning.
Survey Papers
Year | Venue | Title |
---|---|---|
2018 | T-PAMI | Multimodal Machine Learning: A Survey and Taxonomy |
Workshops
Year | Venue | Title | Proceedings |
---|---|---|---|
2020 | CVPR | Workshop on Multimodal Learning | Proceedings |
2019 | ICCV | Cross-Modal Learning in Real World | Proceedings |
2019 | CVPR | 2nd Multimodal Learning and Applications Workshop (MULA) | Proceedings |
2018 | ECCV | 1st Multimodal Learning and Applications Workshop (MULA) | Proceedings |
Tutorials
Year | Venue | Title |
---|---|---|
2016 | CVPR | Multimodal Machine Learning tutorial |
This part lists researchers who are actively working on Multi-Modal Machine Learning.
Name | Affiliation | Research Interests | Google Scholar |
---|---|---|---|
Antonio Torralba | MIT | Vision+Audition+Touch | Scholar |
Andrew Zisserman | Oxford | Vision+Audio | Scholar |
Andrea Vedaldi | Oxford | Vision+Audio | Scholar |
If you are also interested in Multi-Modal Machine Learning and would like to recommend papers or projects for this repo, feel free to open an issue or make a pull request.