Projects

Thumbnail of Modeling Physics underlying visual inputs using Contextual RNN-GANs
Modeling Physics underlying visual inputs using Contextual RNN-GANs
August to December 2017

Abstract, Report

Understanding the motion of objects in order to predict and control their movements is one of the crucial problems in Artificial Intelligence (AI). Humans and a large number of animals evidently possess the extraordinary ability to predict and manipulate object motion easily using visual inputs. For example, it enables humans to drive vehicles safely, play games like billiards and football, and navigate in crowded and unfamiliar environments. This seemingly simple but striking ability raises a natural question: how is the visual input used to infer and understand the motion of surrounding objects? Some recent works propose an explanatory framework based on generative physics representations, which states that the brain infers and retains a noisy but detailed representation of the physics underlying the motion of objects, and uses generative simulations based on these representations to predict object motion. In view of these works, it is extremely interesting and intellectually challenging to study how such representations of the physics underlying visual inputs can be modeled. Moreover, a model of the physical constraints on the visual inputs can be used to perform simulations, forecast object interactions, and generate video sequences of moving objects, all of which are highly challenging problems in computer vision. We therefore aim to study the problem of modeling abstractions of the physics underlying given visual inputs and propose a contextual RNN-GAN based approach to learning these models.
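
The sketch below is only an illustration of what a contextual RNN-GAN for this setting could look like (a minimal PyTorch sketch; the GRU backbone, flattened-frame representation, and layer sizes are assumptions, not the report's architecture): a recurrent generator predicts the next frame from a context of past frames plus noise, while a recurrent discriminator judges whether the completed sequence looks real.

```python
# Minimal contextual RNN-GAN sketch in PyTorch (illustrative assumptions only).
import torch
import torch.nn as nn

class ContextualGenerator(nn.Module):
    def __init__(self, frame_dim=4096, hidden_dim=512, noise_dim=64):
        super().__init__()
        self.rnn = nn.GRU(frame_dim + noise_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, frame_dim)

    def forward(self, context_frames, noise):
        # context_frames: (batch, time, frame_dim), noise: (batch, time, noise_dim)
        h, _ = self.rnn(torch.cat([context_frames, noise], dim=-1))
        return self.head(h[:, -1])          # predicted next frame (flattened)

class ContextualDiscriminator(nn.Module):
    def __init__(self, frame_dim=4096, hidden_dim=512):
        super().__init__()
        self.rnn = nn.GRU(frame_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, frames):
        # frames: context frames followed by the (real or generated) next frame
        h, _ = self.rnn(frames)
        return torch.sigmoid(self.head(h[:, -1]))   # probability the sequence is real
```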

Thumbnail of Cross Modality Supervision Transfer based Depth Estimation
Cross Modality Supervision Transfer based Depth Estimation
September to December 2016

Abstract, Report, Presentation

We propose to develop a model for depth map estimation for an RGB sensor using supervision transfer across two modalities: i. RGB and ii. Depth. The problem statement can be described as follows: we want to build a depth estimation model for a given RGB sensor. For this task, we consider an additional RGBD sensor and capture image frames whose fields of view partially overlap. We aim to learn CNN-based representations of depth for the RGB sensor using information from both sensors via supervision transfer across the different image modalities. We next aim to learn “invertible” CNNs, so that a partial network can generate depth maps. If successful, this novel approach would add an extra modality, depth, to the RGB sensor. The motivations for taking up this ambitious project are twofold: i. it is an unexplored problem with significant research value in contemporary computer vision, and ii. if successful, the approach would have multiple applications of immense practical importance. One example is autonomous cars, where the camera and depth sensor do not have identical fields of view. It could also be used to build low-cost motion capture systems by replacing some of their many RGBD sensors with RGB cameras.
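
As an illustration of the supervision transfer step (a minimal sketch under assumed names; `student_cnn`, `teacher_cnn`, and `paired_loader` are hypothetical, not the report's code), a frozen network running on the paired sensor's depth frames can supervise the RGB network through a feature-matching loss:

```python
# Supervision transfer as feature-space distillation (illustrative sketch).
import torch
import torch.nn.functional as F

def supervision_transfer_loss(student_cnn, teacher_cnn, rgb_batch, depth_batch):
    with torch.no_grad():                      # teacher is fixed
        target_feats = teacher_cnn(depth_batch)
    student_feats = student_cnn(rgb_batch)     # same feature shape assumed
    return F.mse_loss(student_feats, target_feats)

# Training loop sketch:
# for rgb_batch, depth_batch in paired_loader:   # spatially aligned frame pairs
#     loss = supervision_transfer_loss(student, teacher, rgb_batch, depth_batch)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```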

Thumbnail of RGBD Material Editing using commodity depth sensors
RGBD Material Editing using commodity depth sensors

Research Intern, INRIA Sophia Antipolis, France (supervised by George Drettakis, Adrien Bousseau, and Erik Reinhard)
May to July 2016

Abstract, Presentation, Report-1, Report-2

We describe three approaches for texture synthesis based on material editing of indoor scenes using data obtained from a commodity depth sensor. In the first approach, we propose an extended version of PatchMatch for material texture synthesis. The second approach is more data-centric: we propose a combination of patch regression trained on a dataset and a PatchMatch based approach for texture synthesis. In the third and final approach, we exploit recent advances in deep learning to develop an end-to-end pipeline for texture synthesis. The input to our system is an RGB image and a corresponding depth/normal map captured by a Kinect sensor. Material editing is carried out by changing the depth/normals in a region of the image. Once editing is done, we use the proposed texture synthesis algorithms to create realistic texture in the changed region of the image. We observe that the proposed approaches capture structures effectively and are able to synthesize realistic textures. Each of the methods has its own limitations, which are discussed in the individual sections. At the moment our approach is single-view and runs offline on a PC.
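
For a sense of the patch-based component, the brute-force sketch below computes the nearest-neighbour field that PatchMatch approximates far more efficiently with propagation and random search; this is a simplification for illustration only, not the report's implementation:

```python
# Brute-force nearest-neighbour field for patch-based synthesis (illustration).
import numpy as np

def nearest_neighbour_field(target, source, patch=7):
    """For every patch in `target`, return the (y, x) of the most similar
    patch in `source` under sum-of-squared-differences."""
    r = patch // 2
    th, tw = target.shape[:2]
    sh, sw = source.shape[:2]
    nnf = np.zeros((th, tw, 2), dtype=int)
    for ty in range(r, th - r):
        for tx in range(r, tw - r):
            tpatch = target[ty - r:ty + r + 1, tx - r:tx + r + 1]
            best, best_cost = (r, r), np.inf
            for sy in range(r, sh - r):
                for sx in range(r, sw - r):
                    cost = np.sum((source[sy - r:sy + r + 1, sx - r:sx + r + 1] - tpatch) ** 2)
                    if cost < best_cost:
                        best, best_cost = (sy, sx), cost
            nnf[ty, tx] = best
    return nnf   # synthesis then copies/blends colours from source at nnf[ty, tx]
```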

Thumbnail of Realtime 3D reconstruction based Mixed Reality
Realtime 3D reconstruction based Mixed Reality
January to April 2016

Abstract, Report, Video

In this project, we develop a technique for creating mixed reality applications in which virtual objects can interact intelligently with the physical world. The aim of the project is to build a framework for developing various applications such as assisting challenged people, interactive visualizations, and gaming; one of the applications demonstrated was furniture placement. The project, developed on the Project Tango device [1], involves stitching meshes acquired from 3D point cloud data. We explore both online and offline techniques for stitching meshes. Further, we augment virtual objects with animation and comprehensive interactions to forge a mixed reality, as well as an alternate virtual reality (using a head-mounted display). The novelty of the proposed scheme is that all processes except segmentation and path planning are executed in real time.
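
As a minimal illustration of the offline stitching step (assumed data layout; not the project code), two per-frame triangle meshes already expressed in a common world frame can be merged by concatenating their vertices and offsetting the face indices of the second mesh:

```python
# Merging two triangle meshes from consecutive Tango frames (illustrative sketch).
import numpy as np

def merge_meshes(verts_a, faces_a, verts_b, faces_b):
    # verts_*: (N, 3) float arrays in a common world frame (device poses already applied)
    # faces_*: (M, 3) int arrays of vertex indices
    verts = np.vstack([verts_a, verts_b])
    faces = np.vstack([faces_a, faces_b + len(verts_a)])   # keep second mesh's faces valid
    return verts, faces
```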

Publications

Thumbnail of Reactive Displays for Virtual Reality
Reactive Displays for Virtual Reality

G S S Srinivas Rao, Neeraj Thakur, and Vinay P. Namboodiri
Proceedings of 16th IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct), Nantes, France, 2017

Abstract, DOI, Paper, Video, Poster

The feeling of presence in virtual reality has enabled a large number of applications. These applications typically deal with 360° content. However, a large amount of existing content is available as images and videos, i.e., 2D content. Unfortunately, such content does not react to the viewer's position or motion when viewed through a VR HMD. In this work, we therefore propose reactive displays for VR, which instigate a feeling of discovery while exploring 2D content. We achieve this by taking the user's position and motion into account to compute homography-based mappings that adapt the 2D content and re-project it onto the display. This gives the viewer a richer experience of interacting with 2D content, similar to the effect of viewing a scene through a window. We also provide a VR interface that uses a constrained set of reactive displays to easily browse through 360° content. The proposed interface tackles the nausea caused by existing interfaces such as photospheres by providing a natural room-like intermediate interface before switching 360° content. We perform user studies to evaluate both of our interfaces. The results show that the proposed reactive display interfaces are indeed beneficial.
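
The sketch below illustrates the reactive display idea with a hypothetical, simplified window model (the per-corner geometry and the `parallax` factor are assumptions, not the paper's calibrated mapping): the viewer's head offset determines new positions for the image corners, a homography is fitted to the corner correspondences, and the content is re-projected:

```python
# Homography-based re-projection of 2D content driven by head motion (sketch).
import cv2
import numpy as np

def reactive_reproject(image, head_dx, parallax=0.2):
    """Warp a 2D image so it reacts to the viewer's lateral head offset.
    head_dx is a normalized offset; the window-like geometry is hypothetical."""
    h, w = image.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    d = parallax * head_dx * h                 # skew amount for the left/right edges
    dst = np.float32([[0, -d], [w, d], [w, h - d], [0, h + d]])
    H = cv2.getPerspectiveTransform(src, dst)  # homography from the 4 corner pairs
    return cv2.warpPerspective(image, H, (w, h))
```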

Thumbnail of On the Development of a Dynamic Virtual Reality System using Audio and Visual Scenes
On the Development of a Dynamic Virtual Reality System using Audio and Visual Scenes

Sandeep Reddy, G S S Srinivas Rao, and Rajesh M Hegde
IEEE NCC 2016, IIT Guwahati, 2016

Abstract, DOI, Paper

Virtual reality systems have been widely used in many popular and diverse applications, including education and gaming. However, the development of a dynamic virtual reality system which combines both audio and visual scenes has hitherto not been investigated. In this work, a dynamic virtual reality system which synchronizes both audio and visual information is developed. Real-time audio and visual information is obtained from a spherical audio-visual camera with 64 microphones and 5 cameras. Subsequently, a head mounted display application is designed to render the spherical video. A three-dimensional sound rendering algorithm using head related transfer functions is developed. Finally, a virtual reality system that combines both spherical audio and video is realized. The head position of the user is also integrated into this system adaptively to make the system dynamic. Both subjective and objective evaluations of the proposed virtual reality system indicate its significance.
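
A minimal sketch of the binaural rendering step, under assumptions (the `hrir_bank` lookup table and equal-length left/right impulse responses are placeholders for a measured HRTF database, not the paper's implementation): the source signal is convolved with the head-related impulse responses for the source direction relative to the listener's current head orientation:

```python
# Binaural rendering with head-related impulse responses (illustrative sketch).
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, source_azimuth_deg, head_yaw_deg, hrir_bank):
    # hrir_bank: dict mapping an azimuth (deg) to a pair (hrir_left, hrir_right)
    rel = (source_azimuth_deg - head_yaw_deg) % 360         # direction relative to head
    nearest = min(hrir_bank, key=lambda a: min(abs(a - rel), 360 - abs(a - rel)))
    hrir_l, hrir_r = hrir_bank[nearest]
    left = fftconvolve(mono, hrir_l, mode="full")
    right = fftconvolve(mono, hrir_r, mode="full")
    return np.stack([left, right], axis=-1)                 # (samples, 2) binaural output
```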

Thumbnail of High Accuracy Optical Flow based Future Image Predictor Model
High Accuracy Optical Flow based Future Image Predictor Model

Nishchal K. Verma, Dhekane Eeshan Gunesh, G. S. S. Srinivas Rao, and Aakansha Mishra
IEEE AIPR 2015, Washington, DC, 2015

Abstract, DOI, Paper

In this paper, a High Accuracy Optical Flow (HAOF) based future image frame generator model is proposed. The aim of this work is to develop a framework capable of predicting future image frames for any given sequence of images. The requirement is to predict a large number of image frames with better clarity and better accuracy. In the first step, the vertical and horizontal components of the flow velocities of the intensities at each pixel position are estimated using the High Accuracy Optical Flow (HAOF) algorithm. The estimated flow velocities in all the image frames at all pixel positions are then modeled using separate Artificial Neural Networks (ANNs). The trained models are used to predict the flow velocities of the intensities at all pixel positions in the future image frames. The intensities at all pixel positions are mapped to new positions using the velocities predicted by the model. Bilinear interpolation is used to obtain the predicted images from the new positions of the intensities. The quality of the predicted image frames is evaluated using the Canny Edge Detection based Image Comparison Metric (CIM) and the Mean Structural Similarity Index Measure (MSSIM). The predictor model is evaluated on two image sequences: an image sequence of a fighter jet landing on a navy deck, and another of a train moving over a bridge. The proposed framework is found to give promising results with better clarity and better accuracy.
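
The sketch below illustrates the final warping step under simplifying assumptions (grayscale frames, forward mapping with bilinear splatting; not the paper's exact procedure): the predicted per-pixel velocities push each intensity to its new position, and bilinear weights distribute it over the four surrounding grid pixels to form the predicted frame:

```python
# Forward warping with bilinear interpolation from predicted flow (sketch).
import numpy as np

def forward_warp_bilinear(frame, flow_u, flow_v):
    # frame: (H, W) grayscale; flow_u/flow_v: (H, W) predicted horizontal/vertical velocities
    h, w = frame.shape
    out = np.zeros((h, w))
    weight = np.zeros((h, w))
    ys, xs = np.mgrid[0:h, 0:w]
    new_x, new_y = xs + flow_u, ys + flow_v      # new position of every intensity
    x0, y0 = np.floor(new_x).astype(int), np.floor(new_y).astype(int)
    fx, fy = new_x - x0, new_y - y0
    for dx, dy, wgt in [(0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                        (0, 1, (1 - fx) * fy), (1, 1, fx * fy)]:
        xi, yi = x0 + dx, y0 + dy
        valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        np.add.at(out, (yi[valid], xi[valid]), frame[valid] * wgt[valid])
        np.add.at(weight, (yi[valid], xi[valid]), wgt[valid])
    return np.where(weight > 0, out / np.maximum(weight, 1e-8), 0.0)
```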

Talks

10/14/17
ARCore and Tango with the Unity SDK

Google Developer Groups, Brussels | Event Page, Video

06/25/16
Project Tango - Learn and Touch

Google Developer Groups, Brussels | Slides, Video

06/11/16
Introduction to Google's Project Tango

Stuttgart VR & AR Meetup, Stuttgart, Germany | Event Page, Slides, Featured in Stuttgarter Zeitung

05/18/16
Introduction to Google's Project Tango

Google Developer Groups, Nice, France | Event Page

Contact

sairao1996@gmail.com

Other: gsssrao@iitk.ac.in, srao@fyusion.com