Mini-Gemini:

    Mining the Potential of Multi-modality Vision Language Models

    The Chinese University of Hong Kong

    Updates: Mini-Gemini is coming! We have released the paper, code, data, models, and demo for Mini-Gemini.

    Abstract

    In this work, we introduce Mini-Gemini, a simple and effective framework enhancing multi-modality Vision Language Models (VLMs). Despite the advancements in VLMs facilitating basic visual dialog and reasoning, a performance gap persists compared to advanced models like GPT-4 and Gemini. We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation. To enhance visual tokens, we propose to utilize an additional visual encoder for high-resolution refinement without increasing the visual token count. We further construct a high-quality dataset that promotes precise image comprehension and reasoning-based generation, expanding the operational scope of current VLMs. In general, Mini-Gemini further mines the potential of VLMs and empowers current frameworks with image understanding, reasoning, and generation simultaneously. Mini-Gemini supports a series of dense and MoE Large Language Models (LLMs) from 2B to 34B. It is demonstrated to achieve leading performance in several zero-shot benchmarks and even surpasses the developed private models.



    Model

    The framework of Mini-Gemini is conceptually simple: dual vision encoders provide low-resolution visual embeddings and high-resolution candidates; patch info mining performs patch-level mining between high-resolution regions and low-resolution visual queries; and an LLM marries text with images for both comprehension and generation at the same time.
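
    To make the patch-level mining step more concrete, the following is a minimal sketch, not the released implementation. It assumes that mining behaves like attention: each low-resolution visual query scores the high-resolution candidates of its own region, takes a softmax-weighted sum of them, and adds the result back to the query. The function name mine_patch_info, the toy shapes, and the omission of any learned projection layers are illustrative assumptions. The point of the sketch is that the number of output tokens equals the number of low-resolution queries, so high-resolution detail is absorbed without increasing the visual token count.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mine_patch_info(queries, candidates):
    """Refine low-resolution queries with their high-resolution regions.

    queries:    list of N vectors (low-resolution visual embeddings).
    candidates: list of N regions; region i holds the high-resolution
                feature vectors that cover the same area as query i.
    Returns N refined vectors (same count and dimension as `queries`).

    Assumption: attention-style mining with no learned projections.
    """
    if len(queries) != len(candidates):
        raise ValueError("one high-resolution region is needed per query")

    refined = []
    for query, region in zip(queries, candidates):
        scale = math.sqrt(len(query))
        # Similarity between the query and every candidate in its region.
        scores = [
            sum(q * k for q, k in zip(query, cand)) / scale for cand in region
        ]
        weights = softmax(scores)
        # Weighted sum of the candidates (the "mined" high-resolution detail).
        mined = [
            sum(w * cand[d] for w, cand in zip(weights, region))
            for d in range(len(query))
        ]
        # Residual connection keeps the original low-resolution content.
        refined.append([q + m for q, m in zip(query, mined)])
    return refined


if __name__ == "__main__":
    # Two low-resolution queries, each paired with a 2x2 high-resolution region.
    low_res = [[1.0, 0.0], [0.0, 1.0]]
    high_res = [
        [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
        [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
    ]
    tokens = mine_patch_info(low_res, high_res)
    print(len(tokens), "visual tokens out for", len(low_res), "queries in")
```

    In this toy setup each query sees four high-resolution candidates, yet the output still contains one token per query; a real system would typically add learned projections and an output layer around this core step.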

    BibTeX

    
    @article{li2024minigemini,
      title={Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models},
      author={Li, Yanwei and Zhang, Yuechen and Wang, Chengyao and Zhong, Zhisheng and Chen, Yixin and Chu, Ruihang and Liu, Shaoteng and Jia, Jiaya},
      journal={arXiv preprint arXiv:2403.18814},
      year={2024}
    }
      

    Acknowledgement

    This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
