I'm a Ph.D. Candidate at the Department of Computing, The Hong Kong Polytechnic University.
- High-Level Video Content Analytics
- Visual Knowledge Learning
- Foundation Models
UMT is a unified and flexible framework that handles different combinations of input modalities and outputs video moment retrieval and/or highlight detection results.