DreamTalk

    Diffusion-based Expressive Talking Head Generation Framework.

    When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

    Yifeng Ma1, Shiwei Zhang2, Jiayu Wang2, Xiang Wang3, Yingya Zhang2, Zhidong Deng1

    1Tsinghua University, 2Alibaba Group, 3Huazhong University of Science and Technology

    Diffusion models have shown remarkable success in a variety of downstream generative tasks, yet remain under-explored in the important and challenging task of expressive talking head generation. In this work, we propose DreamTalk, a framework that employs meticulous design to unlock the potential of diffusion models in generating expressive talking heads. Specifically, DreamTalk consists of three crucial components: a denoising network, a style-aware lip expert, and a style predictor. The diffusion-based denoising network consistently synthesizes high-quality audio-driven face motions across diverse expressions. To enhance the expressiveness and accuracy of lip motions, we introduce a style-aware lip expert that guides lip-sync while remaining mindful of speaking styles. To eliminate the need for an expression reference video or text, an additional diffusion-based style predictor infers the target expression directly from the audio. In this way, DreamTalk harnesses powerful diffusion models to generate expressive faces effectively while reducing the reliance on expensive style references. Experimental results demonstrate that DreamTalk generates photo-realistic talking faces with diverse speaking styles and achieves accurate lip motions, surpassing existing state-of-the-art counterparts.
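    The abstract describes a diffusion denoising network conditioned on audio features and a speaking-style code, which is steered at inference time by classifier-free guidance (see Speaking Style Manipulation below). The following is a minimal sketch of what such a sampling loop could look like; the denoiser interface, the motion dimensionality, and the noise schedule are illustrative assumptions, not the released DreamTalk code.

    import torch

    @torch.no_grad()
    def sample_motions(denoiser, audio_feats, style_code, num_steps=1000, cfg_scale=2.0):
        """Iteratively denoise Gaussian noise into a face-motion sequence (sketch)."""
        betas = torch.linspace(1e-4, 0.02, num_steps)          # standard DDPM noise schedule
        alphas = 1.0 - betas
        alpha_bars = torch.cumprod(alphas, dim=0)

        batch, frames = audio_feats.shape[0], audio_feats.shape[1]
        x = torch.randn(batch, frames, 64)                     # 64-dim motion coefficients (assumed)
        null_style = torch.zeros_like(style_code)              # "no style" input for the unconditional branch

        for t in reversed(range(num_steps)):
            t_batch = torch.full((batch,), t, dtype=torch.long)
            # Classifier-free guidance: blend conditional and unconditional noise estimates.
            eps_cond = denoiser(x, t_batch, audio_feats, style_code)
            eps_uncond = denoiser(x, t_batch, audio_feats, null_style)
            eps = eps_uncond + cfg_scale * (eps_cond - eps_uncond)

            # DDPM posterior mean step; add noise at every step except the last.
            x = (x - (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
            if t > 0:
                x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
        return x

    The predicted motion sequence would then drive a renderer to produce the final talking-head video; that stage is outside the scope of this sketch.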

    The code and checkpoints have been released.

    Overview

    Generalization Capabilities: Songs
    送別 Farewell (Chinese), Love Story (English)
    More Songs
    上海灘 The Bund (Cantonese), Lemon (Japanese), All For Love (English)
    Generalization Capabilities: Out-of-domain Portraits

    Generalization Capabilities: Speech in Multiple Languages
    Speech in Chinese, French, German, Italian, Japanese, Korean, and Spanish
    Generalization Capabilities: Noisy Audio

    Speaking Style Manipulation
    Adjusting the Scale of Classifier-free Guidance; Style Code Interpolation
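    The demos above vary two inference-time knobs: the classifier-free guidance scale, which controls how strongly the generated motions follow the style code, and linear interpolation between two style codes. A minimal sketch, assuming style codes are fixed-length embedding vectors and reusing the hypothetical sample_motions() from the earlier sketch:

    import torch

    def interpolate_style(style_a: torch.Tensor, style_b: torch.Tensor, weight: float) -> torch.Tensor:
        """Linearly blend two style codes; weight=0.0 keeps style_a, weight=1.0 keeps style_b."""
        return (1.0 - weight) * style_a + weight * style_b

    def manipulation_grid(denoiser, audio_feats, style_a, style_b):
        """Sample motions over a grid of interpolation weights and guidance scales (sketch)."""
        results = {}
        for weight in (0.0, 0.5, 1.0):                  # style code interpolation
            style = interpolate_style(style_a, style_b, weight)
            for cfg_scale in (1.0, 2.0, 3.0):           # larger scale -> more pronounced speaking style
                results[(weight, cfg_scale)] = sample_motions(
                    denoiser, audio_feats, style, cfg_scale=cfg_scale)
        return results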
    Speaking Style Prediction

    If you are seeking an exhilarating challenge and the opportunity to work on AIGC and large-scale pretraining, you have come to the right place. We are looking for talented, motivated, and imaginative researchers to join our team. If you are interested, please send your resume to yingya.zyy@alibaba-inc.com.

    References

    @article{ma2023dreamtalk,
      title={DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models},
      author={Ma, Yifeng and Zhang, Shiwei and Wang, Jiayu and Wang, Xiang and Zhang, Yingya and Deng, Zhidong},
      journal={arXiv preprint arXiv:2312.09767},
      year={2023}
    }
