Keynote Lectures

Silvio Savarese


Bio: Silvio Savarese is an Associate Professor of Computer Science at Stanford University and director of the SAIL-Toyota Center for AI Research at Stanford. He earned his Ph.D. in Electrical Engineering from the California Institute of Technology in 2005 and was a Beckman Institute Fellow at the University of Illinois at Urbana-Champaign from 2005 to 2008. He joined Stanford in 2013 after serving as Assistant and then Associate Professor of Electrical and Computer Engineering at the University of Michigan, Ann Arbor, from 2008 to 2013. His research interests include computer vision, robotic perception and machine learning. He is the recipient of several awards, including a Best Student Paper Award at CVPR 2016, the James R. Croes Medal in 2013, a TRW Automotive Endowed Research Award in 2012, an NSF Career Award in 2011 and a Google Research Award in 2010. In 2002 he was awarded the Walker von Brimer Award for outstanding research initiative.

Conference Day 1 

Seeing Objects and People in the 3D world: Visual Intelligence in Perspective

Abstract: Computers can now recognize objects from images, classify simple human activities or reconstruct the 3D geometry of an environment. However, these achievements are far from the kind of coherent and integrated interpretations that humans are capable of from just a quick glance at the complex 3D world. When we look at an environment, we don’t just recognize the objects in isolation, but rather perceive a rich scenery of the 3D space, its objects, the people and all the relations among them. This allows us to effortlessly navigate through the environment, to interact with objects in the scene with amazing precision, or to predict what is about to happen next. In this talk I will give an overview of the research from my group and discuss our latest work on designing visual models that can process different sensing modalities and enable intelligent understanding of the sensing data. I will also demonstrate that our models are potentially transformative in application areas related to autonomous or assisted navigation, smart environments, social robotics, augmented reality, and large-scale information management.

Stan Sclaroff

Bio: Stan Sclaroff joined the BU Department of Computer Science in 1995 after completing his PhD at MIT. He founded the Image and Video Computing research group at Boston University in 1995. He served as the Chair of the Department from 2007 to 2013. Stan’s research interests are in computer vision, pattern recognition, and machine learning.

Stan is an expert in the areas of tracking, video-based analysis of human motion and gesture, deformable shape matching and recognition, as well as image/video database indexing, retrieval, and data mining methods. He developed one of the first content-based image retrieval systems for the Internet, the ImageRover, years before Google Image Search appeared. His more recent work has focused on human tracking algorithms, analysis and identification of hand motion related to sign language, and filtering methods for multimedia retrieval.  He is a Fellow of the IEEE and IAPR.

Conference Day 2 

Saliency and Personalization in Deep Models of Human Activities

Abstract: What is visually salient in models for classification of human activities? How can we adapt and better personalize models of human movements, activities, and gestures? In this talk, I will report on our recent research related to computer vision-based tracking and analysis of human actions, interactions and communicative behaviors. I will describe new methods we have developed for top-down saliency estimation in convolutional neural networks and recurrent neural network models, with applications to space-time localization and classification of human activities in video. I will also describe our new formulation for personalizing gesture recognition using hierarchical Bayesian neural networks (HBNNs). Our HBNN models can adapt themselves to new subjects when only a small amount of subject-specific personalization data is available.

Roberto Cipolla


Bio: Roberto Cipolla obtained a B.A. (Engineering) from the University of Cambridge in 1984 and an M.S.E. (Electrical Engineering) from the University of Pennsylvania in 1985. From 1985 to 1988 he studied and worked in Japan at the Osaka University of Foreign Studies (Japanese Language) and Electrotechnical Laboratory. In 1991 he was awarded a D.Phil. (Computer Vision) from the University of Oxford and from 1991 to 1992 was a Toshiba Fellow and engineer at the Toshiba Corporation Research and Development Centre in Kawasaki, Japan. He joined the Department of Engineering, University of Cambridge in 1992 as a Lecturer and a Fellow of Jesus College. He became a Reader in Information Engineering in 1997 and a Professor in 2000. His research interests are in computer vision and robotics and include the recovery of motion and 3D shape of visible surfaces from image sequences; object detection and recognition; novel man-machine interfaces using hand, face and body gestures; real-time visual tracking for localisation and robot guidance; and applications of computer vision in mobile phones, visual inspection, image retrieval and video search. He has authored 2 books, edited 11 volumes and co-authored more than 300 papers.

Conference Day 3  

Geometry, Uncertainty and Deep Learning 

Abstract: The last decade has seen a revolution in the theory and application of computer vision and machine learning. I will begin with a brief review of some of the fundamentals with a few examples from my own research group (3R’s of computer vision – reconstruction, registration and recognition – see research videos at

I will then introduce some recent results from real-time deep learning systems that exploit geometry and compute model uncertainty.

Understanding what a model does not know is a critical part of safe machine learning systems. New tools, such as Bayesian deep learning, provide a framework for understanding uncertainty in deep learning models, aiding interpretability and safety of such systems. Additionally, knowledge of geometry is an important consideration in designing effective algorithms. In particular, we will explore the use of geometry to help design networks that can be trained with unlabelled data for stereo and for human body pose and shape recovery.