Mini-Gemini:

    Mining the Potential of Multi-modality Vision Language Models

    The Chinese University of Hong Kong

    Updates: Mini-Gemini is coming! We release the paper, code, data, models, and demo for Mini-Gemini.

    Abstract

    In this work, we introduce Mini-Gemini, a simple and effective framework for enhancing multi-modality Vision Language Models (VLMs). Although recent VLMs support basic visual dialog and reasoning, a performance gap persists compared to advanced models like GPT-4 and Gemini. We narrow this gap by mining the potential of VLMs for better performance and an any-to-any workflow from three aspects: high-resolution visual tokens, high-quality data, and VLM-guided generation. To enhance visual tokens, we propose utilizing an additional visual encoder for high-resolution refinement without increasing the visual token count. We further construct a high-quality dataset that promotes precise image comprehension and reasoning-based generation, expanding the operational scope of current VLMs. Overall, Mini-Gemini further mines the potential of VLMs and empowers the current framework with image understanding, reasoning, and generation simultaneously. Mini-Gemini supports a series of dense and MoE Large Language Models (LLMs) from 2B to 34B parameters. It achieves leading performance on several zero-shot benchmarks and even surpasses well-developed private models.



    Model

    The framework of Mini-Gemini is conceptually simple: dual vision encoders provide low-resolution visual embeddings and high-resolution candidates; patch info mining performs patch-level mining between high-resolution regions and low-resolution visual queries; and an LLM marries text with images for comprehension and generation at the same time.
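The patch info mining step described above can be sketched conceptually as a cross-attention in which each low-resolution visual query attends only to the high-resolution sub-patches of its own region, refining every token in place so the token count fed to the LLM does not grow. This is a minimal NumPy sketch of that idea, assuming per-region attention with a residual update; the function name, shapes, and the omission of learned projection matrices are simplifying assumptions, not the released implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_info_mining(lr_queries, hr_regions):
    """Hypothetical sketch of patch info mining.

    lr_queries: (N, D) -- one low-resolution visual query per patch.
    hr_regions: (N, M, D) -- M high-resolution sub-patch features
                aligned with each low-resolution patch.
    Returns (N, D): refined tokens; the token count is unchanged.
    """
    N, D = lr_queries.shape
    scale = 1.0 / np.sqrt(D)
    # each query attends only to its own high-resolution region
    scores = np.einsum('nd,nmd->nm', lr_queries, hr_regions) * scale
    attn = softmax(scores, axis=-1)                  # (N, M)
    mined = np.einsum('nm,nmd->nd', attn, hr_regions)
    return lr_queries + mined                        # residual refinement

# toy usage: 16 low-res tokens, 4 high-res sub-patches each, dim 32
rng = np.random.default_rng(0)
out = patch_info_mining(rng.standard_normal((16, 32)),
                        rng.standard_normal((16, 4, 32)))
print(out.shape)  # (16, 32)
```

The key property the sketch illustrates is that high-resolution detail is folded into the existing low-resolution tokens rather than appended as extra tokens, which keeps the LLM's sequence length fixed.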

    BibTeX

    
    @article{li2024minigemini,
      title={Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models},
      author={Li, Yanwei and Zhang, Yuechen and Wang, Chengyao and Zhong, Zhisheng and Chen, Yixin and Chu, Ruihang and Liu, Shaoteng and Jia, Jiaya},
      journal={arXiv preprint arXiv:2403.18814},
      year={2024}
    }
      

    Acknowledgement

    This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
