Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    stepfun-ai/step-audio • 17 Feb 2025

    Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following.

    Instruction Following, Voice Cloning

    3,036
    6.18 stars / hour

    Craw4LLM: Efficient Web Crawling for LLM Pretraining

    cxcscmu/crawl4llm • 19 Feb 2025

    Web crawl is a main source of large language models' (LLMs) pretraining data, but the majority of crawled web pages are discarded in pretraining due to low data quality.

    398
    4.83 stars / hour

    SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?

    openai/swelancer-benchmark • 17 Feb 2025

    We introduce SWE-Lancer, a benchmark of over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts.

    1,059
    4.71 stars / hour

    Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

    stepfun-ai/step-video-t2v • 14 Feb 2025

    We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length.

    Video Generation, Video Reconstruction

    1,947
    3.67 stars / hour

    OmniParser for Pure Vision Based GUI Agent

    microsoft/omniparser • 1 Aug 2024

    The recent success of large vision language models shows great potential in driving the agent system operating on user interfaces.

    Natural Language Visual Grounding

    15,985
    3.48 stars / hour

    MoBA: Mixture of Block Attention for Long-Context LLMs

    moonshotai/moba • 18 Feb 2025

    Scaling the effective context length is essential for advancing large language models (LLMs) toward artificial general intelligence (AGI).

    1,354
    2.96 stars / hour

    PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation

    microsoft/pike-rag • 20 Jan 2025

    Despite notable advancements in Retrieval-Augmented Generation (RAG) systems that expand large language model (LLM) capabilities through external retrieval, these systems often struggle to meet the complex and diverse needs of real-world industrial applications.

    Language Modeling, Language Modelling, +3

    773
    2.13 stars / hour

    OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia

    aslp-lab/osum • 23 Jan 2025

    Large Language Models (LLMs) have made significant progress in various downstream tasks, inspiring the development of Speech Understanding Language Models (SULMs) to enable comprehensive speech-based interactions.

    Event Detection, Gender Classification, +3

    268
    1.79 stars / hour

    Data Formulator 2: Iteratively Creating Rich Visualizations with AI

    microsoft/data-formulator • 28 Aug 2024

    To create rich visualizations, data analysts often need to iterate back and forth among data processing and chart specification to achieve their goals.

    Code Generation, Navigate

    8,022
    1.41 stars / hour

    Magma: A Foundation Model for Multimodal AI Agents

    microsoft/Magma • 18 Feb 2025

    We present Magma, a foundation model that serves multimodal AI agentic tasks in both the digital and physical worlds.

    Robot Manipulation

    437
    1.40 stars / hour