Domain-Aware Continual Zero-Shot Learning

Abstract

We introduce Domain-Aware Continual Zero-Shot Learning (DACZSL), the task of sequentially recognizing images of unseen categories in unseen domains. We construct DACZSL on top of the DomainNet dataset by dividing it into a sequence of tasks: classes are provided incrementally on seen domains during training, and evaluation is conducted on unseen domains for both seen and unseen classes. We also propose a novel Domain-Invariant CZSL Network (DIN), which outperforms state-of-the-art baseline models that we adapted to the DACZSL setting. DIN adopts a structure-based approach to alleviate forgetting of knowledge from previous tasks: a small per-task private network complements a globally shared network. To encourage each private network to capture domain- and task-specific representations, we train the model with a novel adversarial knowledge-disentanglement objective that makes the global network task- and domain-invariant across all tasks. Our method also learns a class-wise prompt to obtain better class-level text representations, which serve as side information to enable zero-shot prediction of future unseen classes.
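The shared-global / per-task-private split described above can be sketched as follows. This is a minimal illustration, not the paper's exact design: the layer sizes, MLP encoders, and the concatenation-based fusion are our assumptions.

```python
import torch
import torch.nn as nn

class DINSketch(nn.Module):
    """Illustrative sketch of a globally shared network plus small
    per-task private networks. Sizes and fusion rule are assumptions."""

    def __init__(self, feat_dim=512, hidden_dim=256, num_tasks=6):
        super().__init__()
        # Global net G: shared across tasks, trained (via adversarial
        # disentanglement, omitted here) to be task- and domain-invariant.
        self.global_net = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())
        # One small private net per task for task/domain-specific cues.
        self.private_nets = nn.ModuleList(
            [nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())
             for _ in range(num_tasks)]
        )

    def forward(self, x, task_id):
        z_g = self.global_net(x)              # invariant representation
        z_p = self.private_nets[task_id](x)   # task-specific representation
        return torch.cat([z_g, z_p], dim=-1)  # fused embedding

model = DINSketch()
feats = torch.randn(4, 512)
emb = model(feats, task_id=2)
print(emb.shape)  # torch.Size([4, 512])
```

Because only the small private net is task-indexed, knowledge consolidated in the global net is shielded from per-task overwriting, which is the structural idea behind alleviating forgetting.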

DACZSL Setting

Our DACZSL setting: the figure illustrates the difference between Zero-Shot Learning (ZSL), Continual ZSL (CZSL), and our Domain-Aware CZSL (DACZSL). CZSL extends ZSL to handle sequential tasks, while we additionally study domain shift on top of CZSL.
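The task stream implied by this setting can be sketched as below. The domain names, the uniform class split, and the dictionary layout are illustrative assumptions; the point is that each task trains on a new chunk of classes over seen domains, while evaluation uses unseen domains and covers both already-seen and not-yet-seen classes.

```python
# Hypothetical sketch of a DACZSL task stream (names are illustrative).
SEEN_DOMAINS = ["real", "painting", "infograph", "clipart"]  # training domains
UNSEEN_DOMAINS = ["sketch", "quickdraw"]                     # evaluation only

def make_task_stream(all_classes, num_tasks):
    """Split the class list into `num_tasks` incremental chunks (uniform split)."""
    per_task = len(all_classes) // num_tasks
    tasks = []
    for t in range(num_tasks):
        new_classes = all_classes[t * per_task:(t + 1) * per_task]
        tasks.append({
            # Train only on this task's new classes, seen domains.
            "train": {"domains": SEEN_DOMAINS, "classes": new_classes},
            # Evaluate on unseen domains over all classes (seen + unseen).
            "eval": {"domains": UNSEEN_DOMAINS, "classes": list(all_classes)},
        })
    return tasks

stream = make_task_stream([f"class_{i}" for i in range(30)], num_tasks=3)
print(len(stream), len(stream[0]["train"]["classes"]))  # 3 10
```

The non-uniform variant mentioned in the results below would simply assign unequal class chunks per task instead of the uniform split used here.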

Method: Domain-Invariant Continual Zero-Shot Learning Network (DIN)

Architecture of our proposed method, DIN. We first learn a class-wise prompt via contrastive learning between the output of the CLIP pre-trained Transformer text encoder, $\operatorname{PM}(\texttt{CLASS})$, and the output $z_{G}$ of the global net $G$. We then apply adversarial contrastive learning so that the global network outputs only domain- and task-invariant information.

Generalized Domain-Aware Zero-Shot Learning (GDAZSL) Results

GDAZSL experimental results. Our method performs significantly better than the baselines. Moreover, our proposed DIN even surpasses the Oracle (UB), which is trained on unseen images from unseen domains and uses the same network architecture as CNZSL.

Domain-Aware Continual Zero-Shot Learning Results

Comparative results on noise-reduced DomainNet under the DACZSL uniform and non-uniform settings. In each cell, the left value is the uniform result and the right value is the non-uniform result. "+ Tf" means the CLIP pre-trained text Transformer is used as the feature extractor. We use ResNet50 as the visual encoder backbone for fair comparison.

Citation

If you find our work useful in your research, please consider citing:
@article{yi2021domain,
  title={Domain-Aware Continual Zero-Shot Learning},
  author={Yi, Kai and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2112.12989},
  year={2021}
}