Saturday, December 18, 2004
Whistler, British Columbia, Canada
Saturday Morning Session: 7:30am – 10:30am
7:30am – Introduction and Goals of the Workshop
Radek Grzeszczuk, Intel Labs
Aaron Hertzmann, University of Toronto
7:45am – Data-Driven Computer Graphics
Hanspeter Pfister, Mitsubishi Electric Research Laboratories
During its formative years, computer graphics focused largely on modeling the everyday world with analytic representations that are transformed into images and animations using efficient simulations. These analytic models are used to represent surface shape, to describe surface reflectance, and to apply physics for simulating the dynamics of elastic materials, to name just a few examples. At present, this framework is very mature and has produced some astounding results in movies and computer games. Nevertheless, much of the world is too complex to be described by analytic models, as a glance at our everyday surroundings, or a Sunday stroll, readily confirms.
In the last ten years, we have also witnessed significant technological developments in high-quality sensors and measurement devices. In addition, powerful computers have become pervasive, greatly increasing our ability to handle complexity. In this talk I will explore new, data-driven approaches to computer graphics that model the world around us directly from measurements. These approaches depart from classical analytic models and instead depend on cameras for data acquisition, machine learning for generalization, and signal processing for image synthesis.
I will discuss three specific computer graphics applications of data-driven modeling. The first, called image-based 3D photography, addresses the problem of creating and rendering high-quality computer graphics models of arbitrary real-world objects. Second, I will discuss the problem of interpolating and extrapolating new reflectance models (specifically isotropic BRDFs) from a collection of acquired samples. Finally, I will present a practical data-driven system for the animation of faces, where identity and performance parameters are lifted directly from video.
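To make the second application concrete, here is a minimal sketch of the data-driven reflectance idea: a new BRDF is formed as a convex combination of acquired reflectance tables. The 1-D half-angle parameterization and the analytic stand-ins for measured lobes are illustrative assumptions only; real acquisitions sample measured materials densely in three or more dimensions.

```python
import numpy as np

# Half-angle parameterization of an isotropic BRDF slice. Real measured
# data lives in a dense 3-D parameterization; this 1-D table is a stand-in.
theta_h = np.linspace(0.0, np.pi / 2, 90)

def glossy_lobe(roughness):
    # Analytic stand-in for an acquired reflectance table.
    return np.exp(-(theta_h / roughness) ** 2)

# Three "acquired" sample materials.
measured = np.stack([glossy_lobe(0.1), glossy_lobe(0.3), glossy_lobe(0.8)])

def interpolate_brdf(weights):
    """A new reflectance function as a convex combination of samples."""
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ measured

new_brdf = interpolate_brdf([0.2, 0.5, 0.3])
print("reflectance at a few half-angles:", np.round(new_brdf[::30], 3))
```

The design point is that interpolation happens entirely in the space of measured data, so the blended reflectance stays plausible without any analytic reflectance model.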
8:10am – Example-Based Image Analysis and Synthesis
William T. Freeman, Massachusetts Institute of Technology
8:35am – Epitome as an Image Representation
Nebojsa Jojic, Microsoft Research
We present novel, simple appearance and shape models that we call epitomes. The epitome of an image is its miniature, condensed version containing the essence of the textural and shape properties of the image. As opposed to previously used simple image models, such as templates or basis functions, the epitome is considerably smaller than the image or object it represents, yet it still contains most of the constitutive elements needed to reconstruct the image. A collection of images often shares an epitome, e.g., when the images are consecutive frames of a video sequence, or when they are photographs of similar objects. A particular image in a collection is defined by its epitome and a smooth mapping from the epitome to the image pixels. When the epitomic representation is used within a hierarchical generative model, appropriate inference algorithms can be derived to extract the epitome from a single image or a collection of images and, at the same time, perform various inference tasks such as image segmentation, motion estimation, object removal, super-resolution, image inpainting, etc.
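As a rough illustration of the idea (not the actual inference algorithm, which uses soft posteriors over mappings within a generative model), the following sketch learns a miniature epitome by hard-assignment EM: each image patch is mapped to its best epitome location, and the epitome is re-estimated as the average of the patches mapped to each pixel. The patch size, epitome size, and toy tiled image are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def extract_patches(img, p, step):
    """All p-by-p patches of img on a grid with the given step."""
    H, W = img.shape
    return np.array([img[i:i + p, j:j + p]
                     for i in range(0, H - p + 1, step)
                     for j in range(0, W - p + 1, step)])

def learn_epitome(img, size=16, p=4, iters=5):
    """Hard-assignment EM: map each patch to its best epitome location,
    then re-estimate the epitome as the average of assigned patches."""
    epi = rng.uniform(img.min(), img.max(), (size, size))
    patches = extract_patches(img, p, p)
    pos = [(i, j) for i in range(size - p + 1) for j in range(size - p + 1)]
    for _ in range(iters):
        acc = np.zeros_like(epi)
        cnt = np.zeros_like(epi)
        for patch in patches:
            # E-step (hard): best-matching epitome location.
            errs = [np.sum((epi[i:i + p, j:j + p] - patch) ** 2) for i, j in pos]
            i, j = pos[int(np.argmin(errs))]
            acc[i:i + p, j:j + p] += patch
            cnt[i:i + p, j:j + p] += 1
        # M-step: average the patches assigned to each epitome pixel.
        epi = np.where(cnt > 0, acc / np.maximum(cnt, 1), epi)
    return epi

# Toy image with repeated texture: the epitome condenses the repetition.
tile = rng.uniform(0, 1, (8, 8))
img = np.tile(tile, (8, 8))
print(learn_epitome(img).shape)  # a 16x16 miniature of a 64x64 image
```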
9:00am – Discussion
9:20am – Coffee Break
9:40am – Algorithmic and Computational Issues in 3D Modeling from Images
Radek Grzeszczuk, Intel Labs
New advances in high-quality sensing devices have simplified the acquisition and synthesis of 3D content. Despite this impressive progress, it is hard to escape the conclusion that 3D content creation faces serious challenges. The shortage of powerful data analysis algorithms limits the potential of existing acquisition devices, and requirements for increased model complexity necessitate ever more sophisticated acquisition systems that generate yet more data. In this talk, I will present examples of successful applications of data analysis tools to the generation of highly compressed, yet accurate, representations of large image-based datasets. I will then discuss some of the unique challenges that 3D content creation through data analysis poses, e.g., processing of large datasets, limited execution time, and high-visual-quality reconstruction. I will conclude by looking at the problem of data analysis from the point of view of computational efficiency and scalability, presenting examples of how to exploit special problem structure to develop algorithms with favorable computation-to-memory-bandwidth ratios and with the scalability needed to handle the large analysis problems encountered in modeling.
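One concrete instance of such a compressed representation, in the spirit of PCA-style factorizations of image-based data, is sketched below on synthetic data. The dataset, its intrinsic dimensionality, and the retained rank are all assumptions for illustration, not the talk's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for an image-based dataset: N views, each a flattened image.
# Views of the same surface are highly correlated, so a few principal
# components capture most of the variance.
basis = rng.normal(size=(4, 1024))                 # 4 underlying appearance modes
coeffs = rng.normal(size=(200, 4))
views = coeffs @ basis + 0.01 * rng.normal(size=(200, 1024))

mean = views.mean(axis=0)
U, S, Vt = np.linalg.svd(views - mean, full_matrices=False)

k = 4                                              # retained components
factors = (U[:, :k] * S[:k], Vt[:k], mean)         # store factors, not images
approx = factors[0] @ factors[1] + mean

err = np.linalg.norm(approx - views) / np.linalg.norm(views)
ratio = views.size / (factors[0].size + factors[1].size + mean.size)
print(f"relative error {err:.4f}, compression ratio {ratio:.1f}x")
```

The storage drops from N full images to N small coefficient vectors plus k basis images, which is the essence of trading acquisition data volume for analysis.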
10:05am – Computational Learning for Graphics
Demetri Terzopoulos, New York University & University of Toronto
We will investigate the promising role that computational learning can play in tackling challenging problems that arise in computer graphics. In particular, the presentation will address image-based rendering and physics-based animation:
- Learning multilinear (tensor) models of image ensembles, applied to rendering the detailed appearance of textured surfaces with complex mesostructural self-occlusion, interreflection and self-shadowing.
- Efficient neural network emulators of physical dynamics, dubbed "NeuroAnimators", which may be trained to produce physically realistic motions by observing simulated physical systems in action (a toy sketch of this idea follows the list).
- Biomechanically modeled artificial animals that, through sensor-guided reinforcement learning, are capable of acquiring motor controllers that produce lifelike, muscle-actuated locomotion.
- Support vector machine methods that learn the functional competencies of composable motor controllers for the physics-based animation of articulated, anthropomorphic figures.
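As a toy illustration of the NeuroAnimator idea referenced above, the sketch below trains a small neural network to emulate a damped spring by observing simulated state transitions, then rolls the learned emulator forward on its own. The physical system, network size, and training scheme are illustrative assumptions, far simpler than the published models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy physical system: a damped spring, x'' = -k x - c x'.
def spring_step(state, dt=0.05, k=4.0, c=0.3):
    x, v = state
    a = -k * x - c * v
    return np.array([x + dt * v, v + dt * a])

# Training pairs (state, next state) gathered by observing the simulator.
states = rng.uniform(-1, 1, size=(2000, 2))
targets = np.array([spring_step(s) for s in states])

# One-hidden-layer MLP emulator, trained by full-batch gradient descent.
H = 32
W1 = rng.normal(0, 0.5, (2, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 2)); b2 = np.zeros(2)

lr, N = 0.1, len(states)
for epoch in range(3000):
    h = np.tanh(states @ W1 + b1)        # hidden activations
    pred = h @ W2 + b2                   # predicted next state
    err = pred - targets
    # Backpropagation of the mean squared error.
    dh = (err @ W2.T) * (1 - h ** 2)
    W2 -= lr * h.T @ err / N;    b2 -= lr * err.mean(axis=0)
    W1 -= lr * states.T @ dh / N; b1 -= lr * dh.mean(axis=0)

# Roll the learned emulator forward from a new initial state.
s = np.array([0.8, 0.0])
for t in range(5):
    s = np.tanh(s @ W1 + b1) @ W2 + b2
    print(t, np.round(s, 3))
```

Once trained, each emulator step is a single network evaluation, which is the source of the speedup over stepping the physical simulator itself.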
Saturday Afternoon Session: 4:00pm – 7:00pm
4:00pm – Convergence of Graphics and Vision: A Probabilistic View
Aaron Hertzmann, University of Toronto
I will argue for a unified view of computer graphics and computer vision, in which the problems of each area correspond to estimating different unknowns within a probabilistic generative model (e.g., belief networks such as HMMs, or non-linear probabilistic PCA). This unified view eliminates the need for many of the heuristics that data-driven graphics algorithms would otherwise require, yields high-quality computer vision results because the generative models are accurate, and provides a theoretical unity to these fields. The use of probabilistic models from machine learning is central to making this unification possible. I will show examples of estimating 3D character motion, pose, and shape (vision), and of creating 3D character pose and motion (graphics).
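A minimal version of this unified view can be written down with a linear-Gaussian generative model: "graphics" runs the model forward to synthesize an observation, while "vision" inverts the same model by posterior inference. The toy renderer A and the dimensions are assumptions; the talk's models (HMMs, non-linear probabilistic PCA) are far richer.

```python
import numpy as np

rng = np.random.default_rng(3)

# Generative model: latent scene x, observed image y.
#   x ~ N(0, I),   y = A x + noise,   noise ~ N(0, sigma^2 I)
d_x, d_y, sigma = 3, 8, 0.1
A = rng.normal(size=(d_y, d_x))          # the (known) toy renderer

# Graphics: sample a scene and synthesize an observation.
x_true = rng.normal(size=d_x)
y = A @ x_true + sigma * rng.normal(size=d_y)

# Vision: the posterior over x given y is Gaussian; compute its mean
# (the MAP estimate). Same model, different unknown.
precision = A.T @ A / sigma**2 + np.eye(d_x)
x_map = np.linalg.solve(precision, A.T @ y / sigma**2)

print("true scene:", np.round(x_true, 2))
print("inferred:  ", np.round(x_map, 2))
```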
4:25pm – Learning MRFs for Surface Smoothing and Submeshing in 3D Vision and Computer Graphics
Sebastian Thrun, Stanford University
James Diebel, Stanford University
We will discuss an MRF algorithm for smoothing and submeshing range scans acquired by stereo vision systems. Our approach uses MRFs defined over surface points and surface normals to express smoothness constraints that are subsequently used for finding lower-complexity approximations. The approach learns non-linear potentials that relate surface vertices and surface normals in a way that makes it possible to denoise a 3D model while enhancing visually distinct features, such as edges. The results appear to improve on the state of the art in both computer graphics and computer vision.
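A minimal sketch of MRF-based scan smoothing follows: a quadratic data term plus robust (Huber) pairwise smoothness potentials, minimized by gradient descent on a 1-D scan. Unlike the talk's method, the potentials here are fixed rather than learned, surface normals are omitted, and the 1-D scan stands in for a mesh.

```python
import numpy as np

rng = np.random.default_rng(4)

# Noisy 1-D "range scan": a step edge plus sensor noise.
true = np.concatenate([np.zeros(50), np.ones(50)])
obs = true + 0.1 * rng.normal(size=100)

# MRF energy:  E(z) = sum_i (z_i - obs_i)^2 + lam * sum_i rho(z_{i+1} - z_i)
# A quadratic rho oversmooths edges; a robust (Huber) rho preserves them.
def huber_grad(d, delta=0.05):
    return np.where(np.abs(d) <= delta, 2 * d, 2 * delta * np.sign(d))

def smooth(obs, lam=2.0, iters=1000, step=0.05):
    z = obs.copy()
    for _ in range(iters):
        d = np.diff(z)
        g = 2 * (z - obs)                  # gradient of the data term
        g[:-1] -= lam * huber_grad(d)      # smoothness: left neighbor
        g[1:] += lam * huber_grad(d)       # smoothness: right neighbor
        z -= step * g
    return z

z = smooth(obs)
print("noise std before/after:", obs[:50].std().round(3), z[:50].std().round(3))
```

The robust potential is what lets the model flatten noise on the plateaus while leaving the step edge sharp, which is the behavior the abstract describes for visually distinct features.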
4:50pm – The Need for Strong Domain Knowledge in Learning Models for Computer Animation
Zoran Popović, University of Washington
One of the key challenges in computer graphics is the development of methods that achieve high-fidelity realism of shape and motion for humans and other living creatures. Bottom-up approaches try to develop detailed models of dynamics and musculoskeletal structure in order to synthesize realistic shape and movement. Unfortunately, the underlying complexities of natural motion are too great and, to a large extent, still unknown. Machine learning approaches in computer graphics, by contrast, capture detailed realistic nuances from large datasets, but they often use overly simplified models and thus provide little control over the synthesis process.
This talk will argue for combining real-world data with sophisticated models of natural systems to produce controllable, high-fidelity human shape and motion significantly beyond what has been seen in the input data.
Specifically, I will describe a template-based model for representing and exploring the space of human shape that can produce realistic human shapes meeting specific parameters such as height, weight, and body fat. I will then describe a reduced representation of the space of natural human poses that captures the style of an individual and can subsequently be used to solve a wide range of inverse kinematics problems. Finally, I will show how a momentum-based model of human movement can be used to parameterize the dynamics of human motion, and demonstrate how this framework enables real-time synthesis of a wide range of movement from a single input motion capture sequence.
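To make the reduced-pose idea concrete, the sketch below builds a PCA pose space from synthetic pose data and solves a toy inverse kinematics problem directly in the reduced coordinates. The linear end-effector map J is a stand-in assumption (a real skeleton is non-linear), and the published model is considerably richer.

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in pose data: 30 joint angles that actually vary along 4 directions,
# mimicking the low intrinsic dimensionality of natural human poses.
B_true = rng.normal(size=(30, 4))
poses = rng.normal(size=(500, 4)) @ B_true.T + 0.01 * rng.normal(size=(500, 30))

# PCA gives the reduced pose representation.
mean = poses.mean(axis=0)
_, _, Vt = np.linalg.svd(poses - mean, full_matrices=False)
B = Vt[:4].T                                   # 30 angles -> 4 coordinates

# Toy linear "end-effector" map e = J q.
J = rng.normal(size=(3, 30))
target = np.array([0.5, -0.2, 0.1])

# IK in the reduced space: min_z ||J(mean + B z) - target||^2 + reg ||z||^2.
JB = J @ B
z = np.linalg.solve(JB.T @ JB + 1e-3 * np.eye(4), JB.T @ (target - J @ mean))
q = mean + B @ z                               # full 30-DOF pose

print("end-effector error:", np.linalg.norm(J @ q - target).round(4))
```

Because the solve happens in the 4-D space spanned by observed poses, any solution is, by construction, a pose of the kind the data contained.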
5:15pm – Coffee Break
5:25pm – Learning Probabilistic Models of Human Motion for Inference and Synthesis
Michael J. Black, Brown University
I will describe recent work on learning probability distributions and its application to human motion analysis. Kinematic body models typically have 30 or more parameters representing the motion of the human body joints. These parameters are correlated with each other and over time. Joint distributions over them are needed to accurately model human motion, but they are highly non-Gaussian and difficult to model. I will present new tools based on maximum entropy learning for modeling such distributions from limited training data.
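The following sketch shows maximum entropy learning in its simplest form, on a single discretized joint angle: gradient ascent on exponential-family weights until the model's feature expectations match the empirical ones. The feature set, grid, and bimodal toy data are assumptions; the actual work targets high-dimensional joint distributions from limited data.

```python
import numpy as np

rng = np.random.default_rng(6)

# Samples of one joint angle (radians): a bimodal, clearly non-Gaussian mix.
data = np.concatenate([rng.normal(-0.8, 0.15, 500), rng.normal(0.9, 0.25, 500)])

# Maximum entropy on a discretized domain: find p(theta) proportional to
# exp(sum_k lam_k f_k(theta)) whose feature expectations match the data.
grid = np.linspace(-2, 2, 200)
feats = np.stack([grid, grid**2, np.cos(grid), np.sin(grid)])  # f_k(grid)
emp = np.stack([data, data**2, np.cos(data), np.sin(data)]).mean(axis=1)

lam = np.zeros(4)
for _ in range(10000):
    logits = lam @ feats
    p = np.exp(logits - logits.max())
    p /= p.sum()
    model = feats @ p                  # E_p[f_k]
    lam += 0.02 * (emp - model)        # gradient ascent on log-likelihood
print("moment mismatch:", np.round(model - emp, 4))
```

Among all distributions matching the chosen feature expectations, this one has maximum entropy, which is a principled way to avoid over-committing when training data is limited.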
5:50pm – Looking at People
David A. Forsyth, University of California, Berkeley
Deva Ramanan, University of California, Berkeley
An important, open vision problem is to describe what people are doing in a video sequence. This problem is difficult for several reasons. First, one must determine the configuration of each person's body; it is hard to track people accurately because many parts of the body are small and fast-moving. Second, one must determine motion paths from the tracks; this is tricky, because one must stitch tracker reports into a motion path that could, indeed, be human. Finally, one must describe what the person is doing; this problem is poorly understood, not least because there is no natural or canonical set of categories into which to classify activities. In this talk, I will describe progress on this problem obtained by marrying a tracker and a motion synthesis system. The tracker obtains measurements of body configuration from the image; the motion synthesis system takes those measurements and generates motions that are (a) clearly human and (b) close to the image measurements. Since the synthesis system uses labelled frames, we also obtain a labelling of each image frame.
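A deliberately minimal sketch of this marriage of tracking and synthesis: noisy tracker reports are snapped to the nearest frames of a labelled motion library, so the output motion is drawn from real human data and each image frame inherits a label. The pose vectors and library here are synthetic assumptions, and a real system would match whole subsequences with continuity costs rather than single frames.

```python
import numpy as np

rng = np.random.default_rng(7)

# Labelled motion library: pose vectors with an activity label per frame.
walk = rng.normal(0.0, 0.3, size=(100, 10))
run = rng.normal(1.5, 0.3, size=(100, 10))
library = np.vstack([walk, run])
labels = np.array(["walk"] * 100 + ["run"] * 100)

# Noisy tracker reports for an observed sequence (here: someone running).
observed = rng.normal(1.5, 0.6, size=(20, 10))

# Snap each report to the closest library frame: the synthesized motion is
# guaranteed to be human (it comes from real motion data), and the library
# frame's label transfers to the image frame.
dists = ((observed[:, None, :] - library[None, :, :]) ** 2).sum(axis=2)
nearest = dists.argmin(axis=1)
print("per-frame labels:", labels[nearest])
```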
6:15pm – Panel Discussion
All Speakers