Awesome Chart Understanding


A curated list of recent and past chart understanding work based on our survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.

The repository will be continuously updated 📝. Don't forget to hit the ⭐️ and stay tuned!

If you find this resource beneficial for your research, please do not hesitate to cite the paper referenced in the Citation section. Thank you!

Table of Contents

Tasks and Datasets

Chart Question Answering

Factoid Questions

  • DVQA: Understanding Data Visualizations via Question Answering.

    Kushal Kafle, Brian Price, Scott Cohen, Christopher Kanan.

  • FigureQA: An Annotated Figure Dataset for Visual Reasoning.

    Samira Ebrahimi Kahou, Vincent Michalski, Adam Atkinson, Akos Kadar, Adam Trischler, Yoshua Bengio.

  • LEAF-QA: Locate, Encode & Attend for Figure Question Answering.

    Ritwick Chaudhry, Sumit Shekhar, Utkarsh Gupta, Pranav Maneriker, Prann Bansal, Ajay Joshi.

  • STL-CQA: Structure-based Transformers with Localization and Encoding for Chart Question Answering.

    Hrituraj Singh, Sumit Shekhar.

  • PlotQA: Reasoning over Scientific Plots.

    Nitesh Methani, Pritha Ganguly, Mitesh M. Khapra, Pratyush Kumar.

  • MapQA: A Dataset for Question Answering on Choropleth Maps.

    Shuaichen Chang, David Palzer, Jialin Li, Eric Fosler-Lussier, Ningchuan Xiao.

  • ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning.

    Ahmed Masry, Xuan Long Do, Jia Qing Tan, Shafiq Joty, Enamul Hoque.

  • SciGraphQA: A Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs.

    Shengzhi Li, Nima Tajbakhsh.

  • MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning.

    Fuxiao Liu, Xiaoyang Wang, Wenlin Yao, Jianshu Chen, Kaiqiang Song, Sangwoo Cho, Yaser Yacoob, Dong Yu.

  • MathVista: Evaluating Math Reasoning in Visual Contexts with GPT-4V, Bard, and Other Large Multimodal Models.

    Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao.

  • ChartBench: A Benchmark for Complex Visual Reasoning in Charts.

    Zhengzhuo Xu, Sinan Du, Yiyan Qi, Chengjin Xu, Chun Yuan, Jian Guo.

  • Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models.

    Lei Li, Yuqi Wang, Runxin Xu, Peiyi Wang, Xiachong Feng, Lingpeng Kong, Qi Liu.

  • Evaluating Task-based Effectiveness of MLLMs on Charts.

    Yifan Wu, Lutao Yan, Yuyu Luo, Yunhai Wang, Nan Tang.

Long-form Questions

  • OpenCQA: Open-ended Question Answering with Charts.

    Shankar Kantharaj, Xuan Long Do, Rixie Tiffany Leong, Jia Qing Tan, Enamul Hoque, Shafiq Joty.

Chart Captioning (Summarization)

  • Neural Data-Driven Captioning of Time-Series Line Charts.

    Andrea Spreafico, Giuseppe Carenini.

  • Figure Captioning with Relation Maps for Reasoning.

    Charles Chen, Ruiyi Zhang, Eunyee Koh, Sungchul Kim, Scott Cohen, Ryan Rossi.

  • Chart-to-Text: Generating Natural Language Descriptions for Charts by Adapting the Transformer Model.

    Jason Obeid, Enamul Hoque.

  • What Will You Tell Me About the Chart? – Automated Description of Charts.

    Karolina Seweryn, Katarzyna Lorenc, Anna Wróblewska, Sylwia Sysko-Romańczuk.

  • SciCap: Generating Captions for Scientific Figures.

    Ting-Yao Hsu, C Lee Giles, Ting-Hao Huang.

  • Chart-to-Text: A Large-Scale Benchmark for Chart Summarization.

    Shankar Kantharaj, Rixie Tiffany Leong, Xiang Lin, Ahmed Masry, Megh Thakkar, Enamul Hoque, Shafiq Joty.

  • LineCap: Line Charts for Data Visualization Captioning Models.

    Anita Mahinpei, Zona Kostic, Chris Tanner.

  • ChartSumm: A Comprehensive Benchmark for Automatic Chart Summarization of Long and Short Summaries.

    Raian Rahman, Rizvi Hasan, Abdullah Al Farhad, Md Tahmid Rahman Laskar, Md. Hamjajul Ashmafee, Abu Raihan Mostofa Kamal.

  • VisText: A Benchmark for Semantically Rich Chart Captioning.

    Benny Tang, Angie Boggust, Arvind Satyanarayan.

  • FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback.

    Ashish Singh, Prateek Agarwal, Zixuan Huang, Arpita Singh, Tong Yu, Sungchul Kim, Victor Bursztyn, Nikos Vlassis, Ryan A. Rossi.

  • Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models.

    Lei Li, Yuqi Wang, Runxin Xu, Peiyi Wang, Xiachong Feng, Lingpeng Kong, Qi Liu.

Factual Inconsistency Detection for Chart Captioning

  • Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning.

    Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi R. Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, Heng Ji.

Chart Fact-checking

  • Reading and Reasoning over Chart Images for Evidence-based Automated Fact-Checking.

    Mubashara Akhtar, Oana Cocarascu, Elena Simperl.

  • ChartCheck: An Evidence-Based Fact-Checking Dataset over Real-World Chart Images.

    Mubashara Akhtar, Nikesh Subedi, Vivek Gupta, Sahar Tahmasebi, Oana Cocarascu, Elena Simperl.

Chart Caption Factual Error Correction

  • Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning.

    Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi R. Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, Heng Ji.

Methods

Classification-based Methods

Fixed Output Vocab

  • A Simple Neural Network Module for Relational Reasoning.

    Adam Santoro, David Raposo, David G.T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap.

  • DVQA: Understanding Data Visualizations via Question Answering.

    Kushal Kafle, Brian Price, Scott Cohen, Christopher Kanan.

  • MapQA: A Dataset for Question Answering on Choropleth Maps.

    Shuaichen Chang, David Palzer, Jialin Li, Eric Fosler-Lussier, Ningchuan Xiao.

Dynamic Encoding

  • DVQA: Understanding Data Visualizations via Question Answering.

    Kushal Kafle, Brian Price, Scott Cohen, Christopher Kanan.

  • Answering Questions about Data Visualizations using Efficient Bimodal Fusion.

    Kushal Kafle, Robik Shrestha, Brian Price, Scott Cohen, Christopher Kanan.

  • LEAF-QA: Locate, Encode & Attend for Figure Question Answering.

    Ritwick Chaudhry, Sumit Shekhar, Utkarsh Gupta, Pranav Maneriker, Prann Bansal, Ajay Joshi.

  • PlotQA: Reasoning over Scientific Plots.

    Nitesh Methani, Pritha Ganguly, Mitesh M. Khapra, Pratyush Kumar.

Pre-trained

  • TaPas: Weakly Supervised Table Parsing via Pre-training.

    Jonathan Herzig, Pawel Krzysztof Nowak, Thomas Müller, Francesco Piccinno, Julian Eisenschlos.

  • STL-CQA: Structure-based Transformers with Localization and Encoding for Chart Question Answering.

    Hrituraj Singh, Sumit Shekhar.

  • ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning.

    Ahmed Masry, Xuan Long Do, Jia Qing Tan, Shafiq Joty, Enamul Hoque.

Generation-based Methods

Without Pre-training

  • SciCap: Generating Captions for Scientific Figures.

    Ting-Yao Hsu, C Lee Giles, Ting-Hao Huang.

  • Figure Captioning with Relation Maps for Reasoning.

    Charles Chen, Ruiyi Zhang, Eunyee Koh, Sungchul Kim, Scott Cohen, Ryan Rossi.

  • Chart-to-Text: Generating Natural Language Descriptions for Charts by Adapting the Transformer Model.

    Jason Obeid, Enamul Hoque.

  • Tackling Hallucinations in Neural Chart Summarization.

    Saad Obaid ul Islam, Iza Škrjanec, Ondřej Dušek, Vera Demberg.

Pre-trained

  • Enhanced Chart Understanding via Visual Language Pre-training on Plot Table Pairs.

    Mingyang Zhou, Yi Fung, Long Chen, Christopher Thomas, Heng Ji, Shih-Fu Chang.

  • MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering.

    Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Eisenschlos.

  • UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning.

    Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, Enamul Hoque, Shafiq Joty.

  • Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding.

    Kenton Lee, Mandar Joshi, Iulia Raluca Turc, Hexiang Hu, Fangyu Liu, Julian Martin Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.

  • Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA.

    Zhuowan Li, Bhavan Jasani, Peng Tang, Shabnam Ghadar.

Tool Augmentation

  • DePlot: One-shot visual language reasoning by plot-to-table translation.

    Fangyu Liu, Julian Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun.

  • Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning.

    Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi R. Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, Heng Ji.

  • Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization.

    Xuan Long Do, Mohammad Hassanpour, Ahmed Masry, Parsa Kavehzadeh, Enamul Hoque, Shafiq Joty.

  • DOMINO: A Dual-System for Multi-step Visual Language Reasoning.

    Peifeng Wang, Olga Golovneva, Armen Aghajanyan, Xiang Ren, Muhao Chen, Asli Celikyilmaz, Maryam Fazel-Zarandi.

  • StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding.

    Renqiu Xia, Bo Zhang, Haoyang Peng, Hancheng Ye, Xiangchao Yan, Peng Ye, Botian Shi, Yu Qiao, Junchi Yan.

  • SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials.

    Wonjoong Kim, Sangwu Park, Yeonjun In, Seokwon Han, Chanyoung Park.

  • OneChart: Purify the Chart Structural Extraction via One Auxiliary Token.

    Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang.

Large Vision-language Models

Tailored for Chart Understanding

  • ChartLlama: A Multimodal LLM for Chart Understanding and Generation.

    Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang.

  • MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning.

    Fuxiao Liu, Xiaoyang Wang, Wenlin Yao, Jianshu Chen, Kaiqiang Song, Sangwoo Cho, Yaser Yacoob, Dong Yu.

  • ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.

    Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo.

  • ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning.

    Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty.

  • ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.

    Renqiu Xia, Bo Zhang, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Min Dou, Botian Shi, Junchi Yan, Yu Qiao.

  • FigurA11y: AI Assistance for Writing Scientific Alt Text.

    Nikhil Singh, Andrew Head, Lucy Lu Wang, Jonathan Bragg.

  • Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs.

    Victor Carbune, Hassan Mansoor, Fangyu Liu, Rahul Aralikatte, Gilles Baechler, Jindong Chen, Abhanshu Sharma.

  • TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning.

    Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang.

General-purpose

  • Visual Instruction Tuning.

    Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee.

  • mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality.

    Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang.

  • mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.

    Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou.

  • mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding.

    Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou.

  • SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.

    Ziyi Lin, Chris Liu, Renrui Zhang, Peng Gao, Longtian Qiu, Han Xiao, Han Qiu, Chen Lin, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He, Hongsheng Li, Yu Qiao.

  • Gemini: A Family of Highly Capable Multimodal Models.

    Gemini Team Google.

  • GPT-4V.

    OpenAI.

  • Introducing the next generation of Claude (Claude 3).

    Anthropic.

Evaluation

Faithfulness / Factuality

  • Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning.

    Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi R. Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, Heng Ji.

Citation

@misc{huang-etal-2024-chart,
    title={From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models},
    author={Huang, Kung-Hsiang and Chan, Hou Pong and Fung, Yi R. and Qiu, Haoyi and Zhou, Mingyang and Joty, Shafiq and Chang, Shih-Fu and Ji, Heng},
    year={2024},
    eprint={2403.12027},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
