Depth-Aware Sparse Transformer for Video-Language Learning

[Paper] | ACM MM23

This is the code implementation of the paper "Depth-Aware Sparse Transformer for Video-Language Learning". The checkpoints and features will be released soon.

Overview

In Video-Language (VL) learning tasks, a large fraction of text annotations describe geometrical relationships between instances (e.g., 19.6% to 45.0% in MSVD, MSR-VTT, MSVD-QA, and MSRVTT-QA), and these annotations often become the bottleneck of current VL models (e.g., 60.8% vs. 98.2% CIDEr on MSVD for geometrical vs. non-geometrical annotations). Considering the rich spatial information in depth maps, an intuitive solution is to enrich conventional 2D visual representations with depth information through current SOTA models, i.e., transformers. However, computing self-attention over long-range sequences of heterogeneous video-level representations is cumbersome in terms of computation cost and flexibility across frame scales. To tackle this, we propose a hierarchical transformer, termed Depth-Aware Sparse Transformer (DAST).


Figure 1. Overview of the DAST for Video-Language Learning.
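As a rough illustration of the idea (not the released code), the sketch below shows how per-frame depth features might be fused with 2D frame features and how a windowed, block-sparse attention pattern can reduce the cost of self-attention over long frame sequences. All module names, dimensions, and the fusion/sparsity scheme here are hypothetical assumptions for illustration only.

```python
# Hypothetical sketch of depth-aware sparse attention (not the released DAST code).
# Assumptions: 2D frame features and depth-map features are pre-extracted per frame,
# fusion is a simple concatenation + linear projection, and sparsity is local
# (windowed) attention over the frame axis.
import torch
import torch.nn as nn


class DepthAwareSparseBlock(nn.Module):
    def __init__(self, dim=512, num_heads=8, window=8):
        super().__init__()
        self.window = window
        self.fuse = nn.Linear(2 * dim, dim)          # fuse RGB + depth features
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, rgb_feats, depth_feats):
        # rgb_feats, depth_feats: (batch, num_frames, dim)
        x = self.fuse(torch.cat([rgb_feats, depth_feats], dim=-1))
        b, t, d = x.shape
        # Block-sparse (windowed) attention: attend only within local frame
        # windows, avoiding quadratic cost over the full frame sequence.
        pad = (-t) % self.window
        x = nn.functional.pad(x, (0, 0, 0, pad))
        w = x.view(b * (x.shape[1] // self.window), self.window, d)
        attn_out, _ = self.attn(self.norm1(w), self.norm1(w), self.norm1(w))
        w = w + attn_out
        w = w + self.ffn(self.norm2(w))
        return w.view(b, -1, d)[:, :t]               # drop padding


# Usage: 32 frames of pre-extracted 512-d RGB and depth features.
block = DepthAwareSparseBlock()
rgb = torch.randn(2, 32, 512)
depth = torch.randn(2, 32, 512)
out = block(rgb, depth)                              # (2, 32, 512)
```

In the actual DAST, the depth-aware fusion and the sparse attention are organized hierarchically across frame scales; please refer to the paper for the precise design.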
