In this paper, we propose a new multi-modal task, termed audio-visual instance segmentation (AVIS), which aims to simultaneously identify, segment and track individual sounding object instances in ...
Abstract: Audio-visual zero-shot learning (ZSL) leverages both video and audio information for model training, aiming to classify new video categories that were not seen during training. However, ...
This paper will be presented as an oral paper at the ICASSP Audio for Multimedia and Multimodal Processing Session on 6/6/2023 at 10:50:00 (Eastern European Summer Time). Please cite our paper if you ...