FlashFace: Human Image Personalization with High-fidelity Identity Preservation

The University of Hong Kong   Alibaba Group   Ant Group

Update on May 14

🔥🔥🔥 Online Demo 🔥🔥🔥 has been graciously provided by Sakib Ahamed. For optimal results, please use the following hyper-parameters instead of the demo's defaults to obtain stable ID Fidelity. We strongly advise reading through this tutorial / 中文教程 before jumping in; otherwise the generated results may be poor.


```Recommended hyper-parameters to obtain stable ID Fidelity
# Please include an age word in the prompt, e.g. young woman/man.
# Otherwise, FlashFace tends to produce middle-aged faces, which tend to look fuller.

positive prompt: A handsome young man / A beautiful young woman .......
face position: [0.3, 0.2, 0.6, 0.5] # avoid generating faces that are too large or too small
Reference Feature Strength: 1.2
Reference Guidance Strength: 3.2
Step Index to Launch Ref Guidance: 750

# If the face shows pasted-on artifacts, reduce these three values appropriately.
```
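
For intuition, the `face position` above is a normalized `[x1, y1, x2, y2]` box. The minimal sketch below (the helper name and the latent resolution handling are our assumptions, not part of the released API) shows how such a box can be rasterized into the binary face position mask that, as described in the Pipeline section, is concatenated to the target latent.

```python
import torch

def face_position_mask(box, h, w):
    """Rasterize a normalized [x1, y1, x2, y2] box into a binary mask.

    Illustrative helper (not part of the released API): the mask is 1
    inside the face box and 0 elsewhere, at the latent resolution h x w.
    """
    x1, y1, x2, y2 = box
    mask = torch.zeros(1, 1, h, w)
    mask[..., int(y1 * h):int(y2 * h), int(x1 * w):int(x2 * w)] = 1.0
    return mask

# The recommended box above, at SD1.5's 64x64 latent resolution (512px / 8).
mask = face_position_mask([0.3, 0.2, 0.6, 0.5], h=64, w=64)
```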


```Recommended hyper-parameters to change the age
# Please include an age word in the prompt, e.g. baby girl/boy, a very old woman/man.

positive prompt: A baby girl / A very old woman ......
face position: [0.3, 0.2, 0.6, 0.5] # avoid generating faces that are too large or too small
default_text_control_scale = 8.5

Reference Feature Strength: 0.9
Reference Guidance Strength: 2.5
Step Index to Launch Ref Guidance: 750

# If ID Fidelity is insufficient for the age-changed person, turn these three values up.
```
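
Of the three shared knobs, `Reference Feature Strength` acts inside the U-Net's reference attention layers (see the Pipeline section below), while the other two plausibly act in the sampling loop. The sketch below shows one way they could enter a standard classifier-free-guidance loop; the function, the scheduler/U-Net interfaces, and the exact guidance decomposition are our assumptions for illustration, not the released implementation.

```python
def sample(unet, scheduler, z, prompt_emb, null_emb, ref_feats,
           s_text=8.5, s_ref=3.2, ref_launch_step=750):
    """Illustrative sampling loop; names and decomposition are assumptions.

    s_text          -- text guidance scale (cf. default_text_control_scale)
    s_ref           -- cf. "Reference Guidance Strength"
    ref_launch_step -- cf. "Step Index to Launch Ref Guidance"; we assume
                       the reference term is active once the (descending)
                       timestep falls to or below this index.
    """
    for t in scheduler.timesteps:  # descending, e.g. 999 -> 0
        eps_uncond = unet(z, t, text=null_emb, ref=None)
        eps_text = unet(z, t, text=prompt_emb, ref=None)
        # Standard classifier-free guidance on the text prompt.
        eps = eps_uncond + s_text * (eps_text - eps_uncond)
        if t <= ref_launch_step:
            # Extra guidance term pulling toward the reference identity.
            eps_ref = unet(z, t, text=prompt_emb, ref=ref_feats)
            eps = eps + s_ref * (eps_ref - eps_text)
        z = scheduler.step(eps, t, z).prev_sample  # diffusers-style step
    return z
```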


Update on April 19

We are excited to announce that the inference code for FlashFace-SD1.5 is now available. This release represents a clear advance over our paper, as the checkpoint has been trained for an extended duration; it demonstrates remarkable progress in lighting and shadow effects and strong identity preservation even for non-celebrities. Please refer to the result images below for more details. Stay tuned for more versions!

Human Image Personalization Results

Diverse human image personalization results produced by our proposed FlashFace, which enjoys the following features:
(1) preserving the identity of reference faces in great detail (e.g., tattoos, scars, or even the rare face shapes of virtual characters);
(2) accurately following the instructions, especially when the text prompts contradict the reference images (e.g., customizing an adult into a "child" or an "elder").


Change the age or gender


Turn virtual characters into real people


Turn real people into artworks


Identity Mixing


Face Swapping Under Language Control


Abstract

This work presents FlashFace, a practical tool with which users can easily personalize their own photos on the fly by providing one or a few reference face images and a text prompt. Our approach is distinguished from existing human photo customization methods by higher-fidelity identity preservation and better instruction following, benefiting from two subtle designs. First, we encode the face identity into a series of feature maps instead of one image token as in prior art, allowing the model to retain more details of the reference faces (e.g., scars, tattoos, and face shape). Second, we introduce a disentangled integration strategy to balance the text and image guidance during the text-to-image generation process, alleviating the conflict between the reference faces and the text prompts (e.g., personalizing an adult into a "child" or an "elder"). Extensive experimental results demonstrate the effectiveness of our method on various applications, including human image personalization, face swapping under language prompts, turning virtual characters into real people, etc.

Pipeline

The overall pipeline of FlashFace. During training, we randomly select B ID clusters and choose N+1 images from each cluster. We crop the face region from N images as references and leave one as the target image, which is used to calculate the loss. The input latent of the Face ReferenceNet has shape (B*N) x 4 x h x w. We store the reference face features after the self-attention layers within the middle blocks and decoder blocks. A face position mask is concatenated to the target latent to indicate the position of the generated face. As the target latent passes through the corresponding positions in the U-Net, we inject the reference features through an additional reference attention layer. During inference, users obtain the desired image by providing a face position (optional), reference images of the person, and a description of the desired image.
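
The following PyTorch sketch illustrates the reference attention step described above: features cached from the ReferenceNet are appended to the target tokens as extra keys/values, and a strength scalar (the demo's `Reference Feature Strength`) scales their contribution. The module structure, names, and where the scalar is applied are our assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn

class ReferenceAttention(nn.Module):
    """Illustrative reference attention layer (names are assumptions).

    Queries come from the target latent tokens; the cached reference-face
    features are appended as extra keys/values, so the target can attend
    to the reference identity at this U-Net block.
    """
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, ref_feats, feat_strength=1.2):
        # x:         (B, L, C)       target tokens at this block
        # ref_feats: (B, N*L_ref, C) features cached from the ReferenceNet
        # feat_strength: cf. the demo's "Reference Feature Strength"; one
        # plausible placement is scaling the reference tokens' contribution.
        kv = torch.cat([x, feat_strength * ref_feats], dim=1)
        out, _ = self.attn(query=x, key=kv, value=kv)
        return x + out  # residual connection, as in standard attention blocks
```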


BibTeX

```bibtex
@misc{zhang2024flashface,
      title={FlashFace: Human Image Personalization with High-fidelity Identity Preservation},
      author={Shilong Zhang and Lianghua Huang and Xi Chen and Yifei Zhang and Zhi-Fan Wu and Yutong Feng and Wei Wang and Yujun Shen and Yu Liu and Ping Luo},
      year={2024},
      eprint={2403.17008},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```