Shilong Zhang (张士龙)

Shilong is a second-year (2023-present) Ph.D. student in the Department of Computer Science at The University of Hong Kong (HKU), under the guidance of Prof. Ping Luo. Before that, he worked at Shanghai AI Lab in the team led by Kai Chen.

Shilong completed his Bachelor's degree at the University of Science and Technology of China (USTC) in 2019 and was recognized as one of its outstanding graduates (73 of 1,824, about 4%).

His research interests focus on computer vision and deep learning, specifically object detection and large vision-language models. He is also a core developer of MMDetection and MMCV.

Email / Google Scholar / GitHub


Recent News
  • [2024/3/26] We propose FlashFace, which can generate images with high identity fidelity in seconds. Code has been released!
  • [2023/7/7] We present GPT4RoI, a vision and language model for region-level image understanding.
  • [2023/4/26] We present MultiModal-GPT, a vision and language model for multi-round dialogue with humans.
  • [2023/3/20] Two papers were accepted by CVPR 2023. DDQ-DETR achieves 52.1 AP with a ResNet-50 backbone within 12 epochs.
  • [2022/3/15] One paper was accepted by CVPR 2022.
  • [2021/11/27] We release MMFewShot, an open-source few-shot learning toolbox based on PyTorch.
  • [2021/5/8] One paper was accepted by ICML 2021.
  • [2020/2/24] One paper was accepted by CVPR 2020.
  • [2019/6/28] Recognized as an outstanding graduate by USTC.

Publications

[1] FlashFace: Human Image Personalization with High-fidelity Identity Preservation   
Shilong Zhang, Lianghua Huang, Xi Chen, Yifei Zhang, Zhi-Fan Wu, Yutong Feng, Wei Wang, Yujun Shen, Yu Liu, Ping Luo
This work presents FlashFace, a practical tool with which users can easily personalize their own photos on the fly by providing one or a few reference face images and a text prompt.
Code has been released at this repo!

[2] GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang*, Peize Sun*, Shoufa Chen, Min Xiao, Wenqi Shao, Wenwei Zhang, Kai Chen, Ping Luo
(* Equal contribution)
We present GPT4RoI, a vision and language model for region-level image understanding.
Code has been released at this repo!

[3] MultiModal-GPT: A Vision and Language Model for Dialogue with Humans   
Tao Gong*, Chengqi Lyu*, Shilong Zhang*, Yudong Wang*, Miao Zheng*, Qian Zhao*, Kuikun Liu*, Wenwei Zhang*, Ping Luo, Kai Chen
(* Random order)
We present MultiModal-GPT, a vision and language model that conducts multi-round dialogue with humans.
Code has been released at this repo!

[4] Dense Distinct Query for End-to-End Object Detection   
Shilong Zhang*, Xinjiang Wang*, Jiaqi Wang, Jiangmiao Pang, Chengqi Lyu, Wenwei Zhang, Ping Luo, Kai Chen
CVPR 2023 (* Equal contribution)
DDQ-DETR achieves 52.1 AP on the MS-COCO dataset within 12 epochs using a ResNet-50 backbone, outperforming all existing detectors in the same setting.
Code has been released at this repo!

[5] Consistent-Teacher: Towards Reducing Inconsistent Pseudo-targets in Semi-supervised Object Detection   
Xinjiang Wang*, Xingyi Yang*, Shilong Zhang, Yijiang Li, Litong Feng, Shijie Fang, Chengqi Lyu, Kai Chen, Wayne Zhang
CVPR 2023 (* Equal contribution)
Consistent-Teacher achieves 40.0 mAP with a ResNet-50 backbone using only 10% of the annotated MS-COCO data, surpassing previous pseudo-label baselines by about 3 mAP. When trained on fully annotated MS-COCO with additional unlabeled data, performance increases further to 47.2 mAP.
Code has been released !

[6] Group R-CNN for Point-based Weakly Semi-supervised Object Detection   
Shilong Zhang*, Zhuoran Yu*, Liyang Liu*, Xinjiang Wang, Aojun Zhou, Kai Chen
CVPR 2022 (* Equal contribution)
We study the problem of weakly semi-supervised object detection with points (WSSOD-P). Group R-CNN significantly outperforms the prior method Point DETR by 3.9 mAP with 5% well-labeled images.
Code has been released !

[7] Group Fisher Pruning for Practical Network Compression   
Liyang Liu*, Shilong Zhang*, Zhanghui Kuang, Jing-Hao Xue, Aojun Zhou
ICML 2021 (* Equal contribution)
We present a general channel pruning framework for networks with complicated structures.
Code has been released !

[8] Scale-Equalizing Pyramid Convolution for Object Detection
Xinjiang Wang*, Shilong Zhang*, Zhuoran Yu, Litong Zhang, Wayne Zhang
CVPR 2020 (* Equal contribution)
We propose a scale-equalizing pyramid convolution that relaxes the discrepancy between the feature pyramid and the Gaussian pyramid. The module improves single-stage object detectors by about 3.5 mAP with a negligible increase in inference time.
Code has been released !



Stolen from Jon Barron