I will join Google Research as a Research Scientist this summer. Previously, I was a PhD student of computer science at University of Washington, where I enjoyed my times with Brian Curless and Ira Kemelmacher-Shlizerman.
I was formerly a principal engineer in CyberLink and received my bachelor's and master's degree from National Taiwan University advised by Ja-Ling Wu.
My research interests broadly cover 3D vision, neural rendering, and computational photography, with expertise on human modeling.
I love pixels and believe the power of simple intuition and creative thinking.
I was wandering at Google in 2019 summer, where I found lots of fun in the Augmented Perception group led by Shahram Izadi. With Rohit Pandey, Christian Hane, Sofien Bouaziz, and Sean Fanello, we delved into the world of 3D neural rendering: the good, the bad, and the beauty. | |
Summer 2018 was an amazing experience for me at Facebook Seattle as a research intern. I have the privilege of collaborating with the computational photography team led by Michael Cohen. We worked to find innovative solutions for neural rendering and modeling human bodies. | |
CyberLink is a worldwide multimedia editing and playback software company in Taiwan. During the period, I served as Tech Lead and was fortunate to work with a group of talents, developing 13 image/video technologies and filing 9 granted patents for consumer products. |
PersonNeRF: Personalized Reconstruction from Photo Collections |
Chung-Yi Weng, Pratul P. Srinivasan, Brian Curless, Ira Kemelmacher-Shlizerman |
CVPR, 2023 |
We present PersonNeRF, a method that takes a collection of photos of a subject
captured with arbitrary viewpoints, body poses and appearances, and builds a personalized space that enables rendering the
subject in an intuitive way with novel combinations of these attributes.
Above we rebuild Roger Federer's personalized space from his photos across more than 10 years. |
project page | paper | video |
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video |
Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, Ira Kemelmacher-Shlizerman |
CVPR, 2022 (Oral Presentation) |
We introduce a free-viewpoint rendering method -- HumanNeRF -- that works on a given monocular video of a human performing complex body motions, e.g. a video from YouTube. Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints or even a full 360-degree camera path for that particular frame and body pose. |
project page | paper | video | code |
Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the Wild |
Chung-Yi Weng, Brian Curless, Ira Kemelmacher-Shlizerman |
arXiv, 2020 |
Given an "in-the-wild" video, we train a deep network with the video frames to produce an animatable human representation that can be rendered from any camera view in any body pose, enabling applications such as motion re-targeting and bullet-time rendering without the need for rigged 3D meshes.
Here we rebuild a 3D animatable Roger Federer from a video of 2015 US Open Final. Please check out the dynamic versions of the results on the project page. |
project page | paper | video |
Photo Wake-Up: 3D Character Animation from a Single Photo |
Chung-Yi Weng, Brian Curless, Ira Kemelmacher-Shlizerman |
CVPR, 2019 |
Given a single photo as input (far left), we create a 3D animatable version of the subject, which can now walk towards the viewer (middle). The 3D result can be experienced in augmented reality (right); in the result above the user has virtually hung the artwork with a HoloLens headset and can watch the character run out of the painting from different views. Please get details and dynamic versions of the results on the project page. |
project page | paper | video | MIT Tech Review |
RoleNet: Movie Analysis from the Perspective of Social Networks |
Chung-Yi Weng, Wei-Ta Chu, Ja-Ling Wu |
IEEE Transactions on Multimedia 2009 |
Inspired by idea of social network analysis, we propose a novel way to analyze movie videos from the perspective of social networks. The relationship between characters in a movie is elaborately described as a network, called RoleNet. Based on RoletNet, further network analysis is performed to extract semantic information in the movie, including leading roles, macro/micro community structures, and hidden story lines and story units. |
paper |
Tiling Slideshow |
Jun-Cheng Chen, Wei-Ta Chu, Jin-Hau Kuo, Chung-Yi Weng, Ja-Ling Wu |
ACM Multimedia 2006 (Best Paper Award) |
Tiling Slideshow is a brave new photo displaying method to arrange photos in a tile-like manner, coordinating with the pace of background music. Photo clustering is applied based on the relationship between photos; music beat detection is perfomed in order to trigger the progress of slideshow; photo importance is computed to help ROI determination. Finally, the layout organization is formulated as a constrianed optimization problem to make sure the most satisfied composition results could be produced. |
project page | paper |
MoonRiver: Deep Neural Network in C++ |
Chung-Yi Weng |
MoonRiver is a deep neural network framework built from scratch using C++. Our goal is to shed light on the complex inner working flow of network learning. We have designed MoonRiver to be (1) lightweight: no third-party dependencies, easy to compile with any standard C++ compiler; (2) scalable: effortlessly designing and learning large networks with minimal fuss. We demonstrate the effectiveness of MoonRiver by training and testing auto-encoder and LeNet. The codes used to implement these networks are concise and the experimental results are promising. |
poster | report |
Becoming Brad Pitt |
Chung-Yi Weng, Xuan Luo |
Final Project, Computer Vision (UW CSE576) |
In this project, we designed a face reenactment system that enables users to control another person's (e.g Brad Pitt) head motions and facial expressions. Our system involves implementing a high-speed tracking module, a puppetry module to control facial expression and a morphig module to enforce smooth transition when switching inbween different identities. Experiments show that our system can reenact head motions and facial expressions of a target person in real time. |
report | video |
Hyperlapse Video Creation |
Chung-Yi Weng, Xin Yang |
Final Project, Computer Graphics (UW CSE557) |
The project aims to create a hyperlapse video in real-time. To this end, we describe the input video as a graph where the nodes represent frames and the edges are cost functions that penalize undesired frame transitions. A heyperlapse video with a given target speed can be generated by finding an optimal path in the graph. |
project page |
Local Collision Avoidance for a Nano-drone |
Melanie Anderson, Chung-Yi Weng, Bindita Chaudhuri |
Final Project, Robotics (UW CSE571) |
In this project, we have come up with a system that lets a nano-drone navigate an indoor space filled with static and moving objects. We do this by creating a map of the environment using raw laser data, represented as an occupancy grid. To find the best route to the goal, we use A* search. If any moving objects get in the way during navigation, we update the map and come up with a new obstacle-free path. |
report video |
Magic Selection |
Industry Product @ CyberLink |
Magic Selection assists users in selecting objects on an image by predicting object boundaries based on user-drawn foreground/background strokes. |
Smart Lasso |
Industry Product @ CyberLink |
Smart Lasso adjusts a user-drawn boundary to fit the true object boundary, providing an alternative approach for labeling objects. |
Soft Matting |
Industry Product @ CyberLink |
Soft Matting predicts opacity values on user-labeled regions, applying image matting in an interactive manner. Matting technique is helpful for refining boundary of objects such as fur or hairs. |
Perfect Group Shot |
Industry Product @ CyberLink |
By capturing several photos in a short period, Perfect Group Shot creates a group photo by seamlessly compositing human subjects with the best facial expressions (e.g. smiling without eye blinking) from different images. |
Sequence Shot |
Industry Product @ CyberLink |
Sequence Shot creates a time-lapse effect in a single image by detecting and combining moving subjects from multiple photos. |
Panorama |
Industry Product @ CyberLink |
Panorama is an image stiching technique for creating a wide-angle view photo from multiple images. Our system allows for taking as input out-of-order images and is robust to panning, zooming, and moving objects. |
Holmes ‐ (Object Tracking by Rectangle or Point) |
Industry Product @ CyberLink |
Holmes tracks objects in a video given a point or a rectangle as input. The tracked point or rectangle are used for video editing tasks such as adding a dialog box on top of an object. |
Cupid ‐ (Object Tracking by Object Boundary) |
Industry Product @ CyberLink |
Cupid tracks an object as well as its region in a video, taken as input a user-labeled object mask at the first frame. The generated object masks are used for downstream video editing such as applying filters on the labeled object. |
Robust Face Detection |
Industry Product @ CyberLink |
Building upon sparse and high-dimension features, we train a face detector with a hierarchical architecture that performs swift detection and is robust to in-plane and out-of-plane rotations. |
Facial Landmark Localization |
Industry Product @ CyberLink |
We train a facial landmark detector in an iterative optimization process by selecting the most discriminative feature in each iteration. The detection is real-time and robust to occlusions and hard shadows. Building upon the technique, CyberLink starts a startup, Perfect Corp., that focuses on virtual makeup and goes public on the NYSE in 2022 |
Face Login |
Industry Product @ CyberLink |
Face Login takes as input video frames from an RGB webcam and recognizes human faces to determine authorized users. The system adopts online learning approach to update face models, hence robust to different light conditions. |
FaceME ‐ (Face Clustering/Recognition in Photos/Videos) |
Industry Product @ CyberLink |
FaceME recognizes and clusters faces in photos and videos with similar appearances, assisting users to effectively tag faces and organize personal datasets. |
Chaplin ‐ (Hand Tracking Based Button Control System) |
Industry Product @ CyberLink |
Chaplin tracks your hand movements to control the buttons displayed on the screen. Just wave your hand to activate the system, and it follows your hand's every move. All it takes is a webcam to provide the RGB frames as input. |
Patent No. | Country | Title |
---|---|---|
US 8,121,358 | US | C.-Y. Weng, W.-T. Tsai, and C.-M. Lee, "Method of Grouping Images by Faces" |
US 8,531,478 | US | C.-M. Lee, C.-Y. Weng, "Method of Browsing Photos based on People" |
US 8,649,602 | US | S.-M. Tang, C.-Y. Weng, and J.-H. Huang, "Systems and Methods for Tagging Photos" |
US 8,693,739 | US | C.-Y. Weng, S.-M. Tang, and H.-C. Huang , "Systems and Methods for Performing Facial Detection" |
US 8,761,519 | US | M.-H. Chang, C.-Y. Weng, "System and method for Selecting an Object Boundary in an Image" |
US 8,769,409 | US | C.-Y. Weng, H.-C. Huang , "Systems and Methods for Improving Object Detection" |
US 8,867,789 | US | H.-W. Hsiao, C.-Y. Weng, "Systems and Methods for Tracking an Object in a Video" |
US 9,336,583 | US | H.-C. Huang, H.-W. Hsiao, C.-Y. Weng, and C.-D. Chung, "Systems and Methods for Image Editing " |
JP 4,925,370 | JP | C.-Y. Weng, W.-T. Tsai, and C.-M. Lee, "Method of Grouping Images by Faces" |