Multi-Task Learning With Shared Representations

1

MS COCO (Common Objects in Context) | Dataset | 59/100

via “multi-task dataset enabling transfer learning across detection, segmentation, captioning, and pose tasks”

330K images annotated for object detection, segmentation, and captioning.

Unique: A single dataset with annotations for 7+ vision tasks, which enables multi-task and transfer learning; because every task is annotated on the same images, models can learn task-agnostic visual representations and transfer knowledge across tasks

vs others: Covers more tasks than single-task datasets and supports multi-task learning without stitching together a separate dataset per task; because all tasks share one image set, cross-task comparisons are not confounded by differing image distributions
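
The shared image set is what makes joint training practical: annotations from different task files can be joined on the image identifier. The following is a minimal sketch, assuming heavily simplified records that mirror the `image_id` key used in COCO annotation files. The sample values and the `group_by_image` helper are hypothetical and not part of any official tooling.

```python
from collections import defaultdict

# Hypothetical, heavily simplified annotation records keyed by image_id.
detections = [
    {"image_id": 1, "category_id": 18, "bbox": [10.0, 20.0, 50.0, 40.0]},
    {"image_id": 2, "category_id": 1, "bbox": [5.0, 5.0, 30.0, 60.0]},
]
captions = [
    {"image_id": 1, "caption": "A dog lying on the grass."},
    {"image_id": 1, "caption": "A brown dog rests outdoors."},
]


def group_by_image(task_annotations):
    """Merge per-task annotation lists into one record per image."""
    merged = defaultdict(lambda: defaultdict(list))
    for task, annotations in task_annotations.items():
        for ann in annotations:
            merged[ann["image_id"]][task].append(ann)
    # Convert to plain dicts so a missing task raises KeyError
    # instead of silently creating an empty entry.
    return {img: dict(tasks) for img, tasks in merged.items()}


samples = group_by_image({"detection": detections, "captioning": captions})
```

Each entry in `samples` then carries every available label for one image, so a multi-task training loop can draw all supervision signals from a single lookup. Images that lack a given task's labels simply omit that key.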

2

Florence-2 | Model | 57/100

via “cross-task knowledge transfer through shared representations”

Microsoft's unified model for diverse vision tasks.

Unique: Achieves knowledge transfer across 6+ vision tasks through a single unified seq2seq architecture, where shared visual encoding and decoder parameters enable cross-task learning without task-specific branches or ensemble methods

vs others: Outperforms task-specific models in low-data scenarios through knowledge transfer, though with 5-10% lower peak performance than specialized models on high-data tasks
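
One way a seq2seq model can serve several vision tasks without task-specific branches is to serialize every target as text, with coordinates turned into discrete tokens. The sketch below illustrates that idea only. The prompt strings (`<DETECT>`, `<CAPTION>`), the `<loc_N>` token format, the bin count of 1000, and all function names are assumptions made for this example, not a description of the model's actual vocabulary or API.

```python
def quantize(value, size, bins=1000):
    """Map a pixel coordinate to an integer bin index in [0, bins - 1]."""
    index = int(value / size * bins)
    return min(max(index, 0), bins - 1)


def box_to_tokens(box, width, height, bins=1000):
    """Serialize an (x0, y0, x1, y1) pixel box as four location tokens."""
    x0, y0, x1, y1 = box
    coords = [
        quantize(x0, width, bins),
        quantize(y0, height, bins),
        quantize(x1, width, bins),
        quantize(y1, height, bins),
    ]
    return "".join(f"<loc_{c}>" for c in coords)


def make_example(task, image_size, target):
    """Return a (prompt, target_text) pair for one task (hypothetical format)."""
    width, height = image_size
    if task == "detection":
        text = "".join(
            label + box_to_tokens(box, width, height) for label, box in target
        )
        return "<DETECT>", text
    if task == "caption":
        return "<CAPTION>", target
    raise ValueError(f"unknown task: {task}")


detection_pair = make_example("detection", (640, 480), [("dog", (0, 0, 320, 240))])
caption_pair = make_example("caption", (640, 480), "A dog on grass.")
```

Because both tasks reduce to "prompt in, text out", the same encoder and decoder parameters receive gradients from every task, which is the mechanism the entry above credits for cross-task transfer.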

3

Flair | Repository | 55/100

via “multi-task learning with shared representations and task-specific heads”

PyTorch NLP framework with contextual embeddings.

Unique: Implements multi-task learning through a unified architecture where a shared BiLSTM encoder feeds into task-specific output heads (CRF for tagging, softmax for classification), enabling flexible combinations of different task types; supports dynamic task weighting during training to balance task contributions

vs others: More efficient than training separate models for each task while maintaining task-specific output constraints; enables knowledge transfer between related tasks, improving performance on low-resource tasks; simpler to implement than complex multi-task architectures with task-specific encoders
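
The shared-encoder pattern described above can be sketched in plain Python. This is an illustration under stated assumptions, not Flair's API: the encoder is a single tanh layer rather than a BiLSTM, both heads use softmax cross-entropy (the entry mentions a CRF for tagging), and the task weights are fixed constants that a dynamic weighting scheme could update between steps. All names and toy values are hypothetical.

```python
import math


def shared_encoder(features, weights):
    """Shared layer: one tanh-activated linear map used by every task."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def linear_head(hidden, weights):
    """Task-specific head: linear map from the shared hidden vector to logits."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example (numerically stabilized)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target]


def multitask_loss(features, targets, encoder_w, heads, task_weights):
    """Weighted sum of per-task losses computed from one shared encoding."""
    hidden = shared_encoder(features, encoder_w)  # computed once, reused by all heads
    losses = {
        task: cross_entropy(linear_head(hidden, head_w), targets[task])
        for task, head_w in heads.items()
    }
    total = sum(task_weights[task] * loss for task, loss in losses.items())
    return total, losses


# Toy parameters: 2 input features, 2 hidden units, 3 tags, 2 classes.
encoder_w = [[0.5, -0.2], [0.1, 0.3]]
heads = {
    "tagging": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "classification": [[1.0, -1.0], [-1.0, 1.0]],
}
total, losses = multitask_loss(
    [1.0, 2.0],
    {"tagging": 2, "classification": 0},
    encoder_w,
    heads,
    {"tagging": 1.0, "classification": 0.5},
)
```

Since `total` depends on `encoder_w` through every head, a gradient step on it updates the shared layer with signal from all tasks, while each head's weights are only touched by its own loss.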

4

roberta-base | Model | 52/100

via “multi-task learning and auxiliary objective training”

fill-mask model. 19,034,963 downloads.

Unique: RoBERTa's improved pretraining produces representations with stronger task-agnostic semantic content, enabling more effective multi-task learning with less task interference compared to BERT — auxiliary tasks improve primary task performance by 1-3% absolute on average

vs others: More effective for multi-task learning than single-task fine-tuning due to stronger base representations; requires more careful tuning than task-specific models but provides better generalization and inference efficiency than ensemble approaches

5

oneformer_ade20k_swin_large | Model | 44/100

via “task-conditioned-query-generation”

image-segmentation model. 90,906 downloads.

Unique: Implements task conditioning via learnable query tokens (e.g., 100 queries for panoptic, 150 for semantic) that are concatenated with positional encodings and processed through the same transformer decoder stack. This differs from multi-head approaches (separate decoder heads per task) by forcing shared feature representations while allowing task-specific query distributions.

vs others: Reduces model parameters by 25-30% vs separate task-specific decoders while maintaining within 0.5 mIoU of task-specific models, enabling efficient multi-task deployment. However, task-specific models can be independently optimized, potentially achieving 1-2 mIoU higher performance if model size is not constrained.
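
The contrast between task-specific queries and task-specific decoder heads can be illustrated with a toy attention decoder. This is a simplified sketch, not the model's implementation: the query banks are tiny fixed vectors (a real model would learn them, and the entry above cites 100 and 150 queries), the decoder is a single dot-product attention step, and every name here is hypothetical.

```python
import math


def attend(query, features):
    """Dot-product softmax attention of one query over all feature vectors."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    dim = len(features[0])
    return [
        sum(e / norm * feat[d] for e, feat in zip(exps, features))
        for d in range(dim)
    ]


def shared_decoder(queries, features):
    """One decoder for every task; only the incoming queries differ."""
    return [attend(q, features) for q in queries]


# Hypothetical per-task query banks; in a trained model these are parameters.
QUERY_BANKS = {
    "panoptic": [[1.0, 0.0], [0.0, 1.0]],
    "semantic": [[1.0, 1.0], [-1.0, 1.0], [0.5, -0.5]],
}


def decode(task, features):
    """Select the task's queries and run them through the shared decoder."""
    return shared_decoder(QUERY_BANKS[task], features)


features = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
panoptic_out = decode("panoptic", features)
semantic_out = decode("semantic", features)
```

The only per-task state is the query bank, so adding a task adds a handful of vectors rather than a whole decoder, which is the source of the parameter savings claimed above.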

6

flair | Repository | 25/100

via “multi-task-learning-with-shared-representations”

A very simple framework for state-of-the-art NLP

Unique: Flair's multi-task learning framework uses shared embedding and encoder layers with task-specific output heads, so knowledge transfers through the shared layers while each task keeps its own prediction head. This architecture allows fine-grained control over task weighting and loss functions, supporting both hard and soft parameter sharing strategies.

vs others: Flair's multi-task learning is more flexible than single-task pipelines (supports arbitrary task combinations) and more interpretable than end-to-end multi-task transformers, with explicit control over task weighting and loss functions.

7

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language... (BLIP) | Product | 25/100

via “multi-task vision-language pre-training with shared representations”

Unique: Combines multi-task learning with data bootstrapping: the same unified model is trained on both understanding tasks (retrieval) and generation tasks (captioning, VQA) using bootstrapped training data. This creates a virtuous cycle where the captioner generates training data for other tasks, and multi-task learning improves the captioner's quality.

vs others: Outperforms single-task models by leveraging shared representations and multi-task learning, achieving SOTA on multiple benchmarks simultaneously. Unlike separate task-specific models, BLIP's unified approach reduces model size and inference latency while improving generalization through positive transfer between tasks.
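
The bootstrapping cycle described above can be sketched as a data-cleaning loop: a captioner proposes synthetic text for each image, and a filter decides which texts (original or synthetic) are kept for the next round of training. The code below is a toy illustration. The `captioner` and `filter_score` callables, the 0.5 threshold, and the tag-set stand-ins for images are all assumptions of this sketch, not BLIP's actual components.

```python
def bootstrap_captions(pairs, captioner, filter_score, threshold=0.5):
    """Return cleaned (image, text) pairs from noisy web pairs.

    For each image, both the original web text and a synthetic caption
    are candidates; each is kept only if the filter scores it at or
    above the threshold.
    """
    cleaned = []
    for image, web_text in pairs:
        for text in (web_text, captioner(image)):
            if filter_score(image, text) >= threshold:
                cleaned.append((image, text))
    return cleaned


# Toy stand-ins: an "image" is a set of tags describing its content.
def toy_captioner(image):
    return " ".join(sorted(image))


def toy_filter(image, text):
    """Fraction of caption words that match a tag in the image."""
    words = text.split()
    return sum(w in image for w in words) / len(words) if words else 0.0


dog_image = frozenset({"dog", "grass"})
cat_image = frozenset({"cat"})
web_pairs = [(dog_image, "buy cheap shoes"), (cat_image, "a cat")]
cleaned = bootstrap_captions(web_pairs, toy_captioner, toy_filter)
```

In a real pipeline, the cleaned pairs would be used to retrain the model, and the improved captioner and filter could then be applied again.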

8

Mastering Diverse Domains through World Models (DreamerV3) | Product | 24/100

via “multi-task visual policy learning with task-agnostic world models”

Unique: DreamerV3's task-agnostic world model learns shared visual representations without explicit task conditioning, relying on the policy learning objective to extract task-relevant information from the shared latent space. This contrasts with task-conditioned approaches (e.g., MTRL baselines) that explicitly encode task identity, making DreamerV3 more flexible for discovering emergent task structure.

vs others: Achieves better sample efficiency and generalization than task-conditioned baselines by learning task-invariant visual dynamics, while avoiding the computational overhead of task-specific world models or explicit task embeddings.

9

mSLAM: Massively multilingual joint pre-training for speech and text (mSLAM) | Product | 23/100

via “multilingual text representation learning with shared vocabulary”

Unique: Learns text representations across 143+ languages in a single shared embedding space using a unified tokenizer, supporting cross-lingual transfer without language-specific fine-tuning; unlike text-only multilingual models such as mBERT and XLM-R, it is pre-trained jointly on speech and text

vs others: More parameter-efficient than maintaining separate models per language, and enables better cross-lingual transfer than language-specific models by learning shared semantic space across all languages

10

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks (Florence-2) | Model | 21/100

via “multi-task vision model with shared representation”

Unique: Uses a single encoder-decoder backbone with shared parameters across all vision tasks, trained on 5.4B diverse annotations to learn a unified representation that handles varying spatial hierarchies and semantic granularities. Contrasts with ensemble or task-specific approaches by consolidating capabilities into one model.

vs others: Reduces deployment complexity and memory footprint compared to maintaining separate detection (YOLO), segmentation (DeepLab), grounding (ALBEF), and captioning (BLIP) models, though how its per-task performance compares with specialized baselines is not stated.

11

Practical Deep Learning for Coders part 2: Deep Learning Foundations to Stable Diffusion - fast.ai | Product | 21/100

via “multi-task and meta-learning frameworks”

Unique: Provides practical implementations of multi-task learning with systematic task weighting strategies and meta-learning approaches (MAML, Prototypical Networks) from scratch, combined with empirical analysis of when multi-task learning helps vs hurts generalization. Includes frameworks for identifying task relatedness and designing shared representations.

vs others: More practical and implementation-focused than academic meta-learning papers by providing working code and systematic frameworks for task weighting and architecture design, while more comprehensive than generic transfer learning tutorials by covering few-shot learning and rapid adaptation.
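
As a concrete picture of one technique named above, the classification rule of Prototypical Networks can be written in a few lines: each class is represented by the mean of its support embeddings, and a query is assigned to the nearest prototype. This is a minimal sketch that assumes embeddings are already computed (the embedding network and its training are omitted). The function names and toy vectors are hypothetical and not taken from the course material.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support):
    """Map each label to the mean of its support embeddings."""
    return {label: mean_vector(vectors) for label, vectors in support.items()}


def classify(query, prototypes):
    """Assign the query embedding to the label of the nearest prototype."""
    return min(
        prototypes,
        key=lambda label: squared_distance(query, prototypes[label]),
    )


# Toy 2-D embeddings for a 2-way, 2-shot episode.
support = {
    "cat": [[0.9, 0.1], [1.1, -0.1]],
    "dog": [[-1.0, 0.2], [-0.8, 0.0]],
}
prototypes = build_prototypes(support)
prediction = classify([0.7, 0.2], prototypes)
```

New classes only require a few support examples to form a prototype, which is why this rule suits the few-shot and rapid-adaptation settings mentioned in the entry.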

12

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization (Retroformer) | Product | 19/100

via “multi-task agent learning with shared trajectory representation”

Unique: Enables multi-task learning by conditioning the language model policy on task descriptions, allowing a single agent to learn from trajectories across diverse tasks and generalize to new tasks — this is distinct from task-specific agents that require separate training for each task

vs others: More sample-efficient than single-task agents because it leverages cross-task patterns, and more flexible than fixed multi-task architectures because task conditioning is learned end-to-end
