    Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.

    ragavsachdeva/magi

    Magi, The Manga Whisperer

    Table of Contents

    1. Magiv1
    2. Magiv2
    3. Datasets

    Magiv1

    Magi_teaser

    v1 Usage

    from transformers import AutoModel
    import numpy as np
    from PIL import Image
    import torch
    import os
    
    images = [
        "path_to_image1.jpg",
        "path_to_image2.png",
    ]
    
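    def read_image_as_np_array(image_path):
        with open(image_path, "rb") as file:
            # Convert to 8-bit grayscale, then back to three channels,
            # so the result is an H x W x 3 array with grayscale content.
            image = Image.open(file).convert("L").convert("RGB")
            image = np.array(image)
        return image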
    
    images = [read_image_as_np_array(image) for image in images]
    
    model = AutoModel.from_pretrained("ragavsachdeva/magi", trust_remote_code=True).cuda()
    with torch.no_grad():
        results = model.predict_detections_and_associations(images)
        text_bboxes_for_all_images = [x["texts"] for x in results]
        ocr_results = model.predict_ocr(images, text_bboxes_for_all_images)
    
    for i in range(len(images)):
        model.visualise_single_image_prediction(images[i], results[i], filename=f"image_{i}.png")
        model.generate_transcript_for_single_image(results[i], ocr_results[i], filename=f"transcript_{i}.txt")
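
The `images` list in the usage above is written out by hand. For a folder of pages, a small helper can build that list instead. The following is only a sketch: `collect_image_paths`, `natural_key`, and the `IMAGE_SUFFIXES` set are hypothetical names introduced for illustration and are not part of Magi. It sorts file names in natural order so that `page2` comes before `page10`.

```python
import re
from pathlib import Path

# Assumption: pages are stored as JPEG or PNG files; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def natural_key(name):
    # Split "page10.png" into ["page", 10, ".png"] so that 2 sorts before 10.
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", name)
    ]


def collect_image_paths(folder):
    """Return the image paths in `folder`, sorted in natural order."""
    paths = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    return [str(p) for p in sorted(paths, key=lambda p: natural_key(p.name))]
```

The returned list can be passed to `read_image_as_np_array` in place of the hard-coded paths.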

    Magiv2

    magiv2

    v2 Usage

    from PIL import Image
    import numpy as np
    from transformers import AutoModel
    import torch
    
    model = AutoModel.from_pretrained("ragavsachdeva/magiv2", trust_remote_code=True).cuda().eval()
    
    
    def read_image(path_to_image):
        with open(path_to_image, "rb") as file:
            image = Image.open(file).convert("L").convert("RGB")
            image = np.array(image)
        return image
    
    # The paths and names below are placeholders; extend the lists with your own files.
    chapter_pages = ["page1.png", "page2.png", "page3.png"]  # ...
    character_bank = {
        "images": ["char1.png", "char2.png", "char3.png", "char4.png"],  # ...
        "names": ["Luffy", "Sanji", "Zoro", "Usopp"],  # ...
    }
    
    chapter_pages = [read_image(x) for x in chapter_pages]
    character_bank["images"] = [read_image(x) for x in character_bank["images"]]
    
    with torch.no_grad():
        per_page_results = model.do_chapter_wide_prediction(chapter_pages, character_bank, use_tqdm=True, do_ocr=True)
    
    transcript = []
    for i, (image, page_result) in enumerate(zip(chapter_pages, per_page_results)):
        model.visualise_single_image_prediction(image, page_result, f"page_{i}.png")
        speaker_name = {
            text_idx: page_result["character_names"][char_idx] for text_idx, char_idx in page_result["text_character_associations"]
        }
        for j in range(len(page_result["ocr"])):
            if not page_result["is_essential_text"][j]:
                continue
            name = speaker_name.get(j, "unsure") 
            transcript.append(f"<{name}>: {page_result['ocr'][j]}")
    with open("transcript.txt", "w", encoding="utf-8") as fh:
        for line in transcript:
            fh.write(line + "\n")
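
The transcript loop above can be tried without loading the model by running it on a hand-written page result. The sketch below assumes the per-page dictionary has the shape implied by the usage code (`character_names`, `text_character_associations`, `ocr`, `is_essential_text`). The helper name `build_transcript` and the sample values in `fake_page` are made up for illustration.

```python
def build_transcript(per_page_results, unknown="unsure"):
    """Turn per-page results into '<speaker>: text' lines."""
    lines = []
    for page_result in per_page_results:
        # Map each text index to the name of its associated character.
        speaker_name = {
            text_idx: page_result["character_names"][char_idx]
            for text_idx, char_idx in page_result["text_character_associations"]
        }
        for j, text in enumerate(page_result["ocr"]):
            # Skip texts flagged as non-essential.
            if not page_result["is_essential_text"][j]:
                continue
            lines.append(f"<{speaker_name.get(j, unknown)}>: {text}")
    return lines


# Hand-written sample in the assumed result format (not real model output).
fake_page = {
    "character_names": ["Luffy", "Zoro"],
    "text_character_associations": [[0, 1], [2, 0]],
    "ocr": ["Where are we?", "BOOM", "Lost again.", "Hm."],
    "is_essential_text": [True, False, True, True],
}

print("\n".join(build_transcript([fake_page])))
```

Texts without a matched speaker fall back to the `unknown` label, mirroring the `"unsure"` default in the usage code.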

    Datasets

    Disclaimer: In adherence to copyright regulations, we are unable to publicly distribute the manga images that we've collected. The test images, however, are available freely, publicly and officially on Manga Plus by Shueisha.

    Other notes

    • Request access to the Manga109 dataset here.
    • Download a large-scale dataset from Mangadex using this tool.
    • The Manga109 test splits are available here: detection, character clustering. Be aware that some background characters share the same label even though they are different characters, see.

    License and Citation

    The provided models and datasets are available for academic research purposes only.

    @InProceedings{magiv1,
        author    = {Sachdeva, Ragav and Zisserman, Andrew},
        title     = {The Manga Whisperer: Automatically Generating Transcriptions for Comics},
        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        month     = {June},
        year      = {2024},
        pages     = {12967-12976}
    }
    
    @misc{magiv2,
          author={Ragav Sachdeva and Gyungin Shin and Andrew Zisserman},
          title={Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names}, 
          year={2024},
          eprint={2408.00298},
          archivePrefix={arXiv},
          primaryClass={cs.CV},
          url={https://arxiv.org/abs/2408.00298}, 
    }
    
