Multimodal Foundation Models
Chunyuan Li, Zhe Gan, Zhengyuan Yang
English | 12-08-2025 | 230 pages
9781638283362
Paperback / softback
€ 106,95
Inquire about delivery time
Back cover text
This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose assistants. It covers five core topics, grouped into two classes: (i) a survey of well-established research areas, namely multimodal foundation models pre-trained for specific purposes, spanning two topics: methods of learning vision backbones for visual understanding, and text-to-image generation; (ii) recent advances in exploratory, open research areas, namely multimodal foundation models that aim to play the role of general-purpose assistants, spanning three topics: unified vision models inspired by large language models (LLMs), end-to-end training of multimodal LLMs, and chaining multimodal tools with LLMs. The target audience of the monograph is researchers, graduate students, and professionals in the computer vision and vision-language multimodal communities who are eager to learn the basics and recent advances in multimodal foundation models.
Details
EAN: 9781638283362
Publisher: Van Ditmar Boekenimport B.V.
Publication date: 12-08-2025
Binding: Paperback / softback
Language: English
Status: Not in stock, but available to order. Inquire about delivery time.
Number of pages: 230
Series: Foundations and Trends® in Computer Graphics and Vision