Keio University

Seeing Things from a Different Perspective

Published: September 10, 2018

If you search for "changing your perspective," you will find its importance discussed through all kinds of examples. It has even become a theme for collections of famous quotes and maxims. Most of these read like life lessons: by changing the way you think about or perceive things, you can arrive at new ideas and solutions.

In information and computer science, particularly in my specialty of computer vision, a "viewpoint" refers to the camera's position when capturing an image, or more precisely, the position of the center of the camera's lens. As shown in Figure 1, a camera records an image, which is a map of light created by projecting the colors of all light rays passing through the viewpoint onto a two-dimensional plane. This concept is called "perspective projection." It has been known since ancient times, and by the 14th century, although cameras did not yet exist, perspective projection was being used as a technique for creating realistic paintings that look just like what the human eye sees.
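As a rough illustration of perspective projection, the sketch below uses the standard pinhole camera model: the viewpoint sits at the origin, the camera looks along the positive z axis, and each 3D point is mapped to the image plane by dividing by its depth. The function name `project_points` and the parameters `focal_length` and `principal_point` are assumptions introduced for this example; they do not come from the article.

```python
import numpy as np

def project_points(points_3d, focal_length, principal_point):
    """Project 3D points onto a 2D image plane with a pinhole model.

    Assumptions (hypothetical, for illustration only):
    - points are given in camera coordinates, with the viewpoint at the origin
    - the camera looks along +z, so visible points have z > 0
    - focal_length is in pixels; principal_point is the (cx, cy) image center
    """
    points_3d = np.asarray(points_3d, dtype=float)
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    if np.any(z <= 0):
        raise ValueError("points must lie in front of the viewpoint (z > 0)")
    cx, cy = principal_point
    # Every point on the same ray through the viewpoint has the same x/z and y/z,
    # so it lands on the same pixel.
    u = focal_length * x / z + cx
    v = focal_length * y / z + cy
    return np.stack([u, v], axis=1)

# Two points on the same ray through the viewpoint project to the same pixel.
pixels = project_points([[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]],
                        focal_length=100.0, principal_point=(50.0, 50.0))
```

Because only the ratios x/z and y/z survive, a single image does not tell us how far along the ray a point lies, which is why changing the viewpoint requires estimating 3D positions.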

Now, what would you see if you were to capture an object or scene from a different viewpoint than the one it was originally shot from? You might think that since it's the same object or scene, you would only see the same thing, but that's not actually the case. Things that were hidden and invisible from one viewpoint can be seen clearly by changing the viewpoint. In other words, just as the collections of quotes and maxims say, when you can't see what you want to see, you should change your perspective and observe. In my laboratory, we use computer vision technology to research video generation techniques that allow for freely changing the viewpoint, as well as their applications.

One example of this is a technology called Diminished Reality. To reveal what is hidden behind an object, this technology virtually erases the occluding object and presents the imagery that should have been visible in its place, generating and displaying video as if the obstruction had disappeared. In the example in Figure 2, we are watching a baseball game, but the umpire and catcher block the view of the pitcher's motion and the ball's trajectory. To address this, the umpire and catcher are removed, and the hidden pitching motion is displayed by transforming the viewpoint of video captured from the first-base or third-base side and using it to fill in that area. Similarly, in the example shown in Figure 3, the drill occludes the workpiece during a drilling task, so a camera capturing the scene from another viewpoint is used to present the occluded area to the operator.

This technology is not simply about capturing the same phenomenon from multiple different viewpoints and presenting them to an observer. Instead, it is a video presentation method that transforms the video to the viewpoint the observer wishes to see, thereby simultaneously showing information that is not directly visible to the observer.
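The final compositing step of such a display can be sketched roughly as follows. This is a minimal illustration, not the method used in the laboratory: it assumes that the video from the other camera has already been transformed to the observer's viewpoint, and that a mask marking the occluding object is available. The function name `diminish` and all of its parameters are hypothetical.

```python
import numpy as np

def diminish(observer_view, warped_side_view, occluder_mask, warped_valid=None):
    """Replace occluded pixels in the observer's image with pixels seen from elsewhere.

    observer_view:    (H, W, 3) image from the observer's own viewpoint
    warped_side_view: (H, W, 3) image from another camera, already transformed
                      to the observer's viewpoint (assumed to be given)
    occluder_mask:    (H, W) boolean, True where the occluding object appears
    warped_valid:     optional (H, W) boolean, True where the warped image
                      actually contains data
    """
    observer_view = np.asarray(observer_view)
    warped_side_view = np.asarray(warped_side_view)
    replace = np.asarray(occluder_mask, dtype=bool)
    if warped_valid is not None:
        # Only overwrite pixels for which the other camera has information.
        replace = replace & np.asarray(warped_valid, dtype=bool)
    return np.where(replace[..., None], warped_side_view, observer_view)
```

Pixels outside the mask are left untouched, so the observer keeps their own view and sees the transformed video only where the obstruction used to be.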

The key underlying technology here is viewpoint transformation. As explained earlier, a camera records the colors of the light rays passing through the "viewpoint," which is the center of the lens. To transform the viewpoint, therefore, we need to know the colors of all the light rays that would pass through the "new viewpoint." This is where "3D geometry theory," which has long been studied in the field of computer vision, comes into play. Using this theory, we estimate, for each pixel captured by the camera, the 3D position from which the corresponding light ray was emitted. Assuming that light of the same color radiates in all directions from that point, we then synthesize the image that would be captured from the "new viewpoint" through "perspective projection."
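The procedure described above can be sketched as a simple forward warp. This is a minimal illustration under stated assumptions, not the laboratory's implementation: it assumes the depth of every pixel has already been estimated, that both viewpoints share one pinhole intrinsic matrix `K` whose last row is (0, 0, 1), and that the new viewpoint is given by a rotation `R` and translation `t` from the original camera's coordinates. All names here are hypothetical.

```python
import numpy as np

def synthesize_view(image, depth, K, R, t):
    """Render `image` as it would appear from a new viewpoint.

    image: (H, W, 3) colors; depth: (H, W) z-depth per pixel (<= 0 means unknown)
    K: 3x3 intrinsics; R, t: X_new = R @ X_src + t (source to new camera coordinates)
    Returns the synthesized image and a mask of pixels that received a color.
    """
    image = np.asarray(image)
    depth = np.asarray(depth, dtype=float)
    K = np.asarray(K, dtype=float)
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    H, W = depth.shape

    # 1. Back-project each pixel to the 3D point its light ray came from.
    v, u = np.mgrid[0:H, 0:W]
    pixels = np.stack([u.ravel(), v.ravel(), np.ones(H * W)], axis=0)
    points_src = (np.linalg.inv(K) @ pixels) * depth.ravel()

    # 2. Express those points in the coordinates of the new viewpoint.
    points_new = R @ points_src + t
    z_new = points_new[2]
    valid = (depth.ravel() > 0) & (z_new > 1e-9)

    # 3. Perspective projection into the new image.
    proj = K @ points_new[:, valid]
    u_new = np.round(proj[0] / proj[2]).astype(int)
    v_new = np.round(proj[1] / proj[2]).astype(int)
    z_kept = z_new[valid]
    # Same color in every direction (a Lambertian-surface assumption).
    colors = image.reshape(H * W, -1)[valid]

    inside = (u_new >= 0) & (u_new < W) & (v_new >= 0) & (v_new < H)
    u_new, v_new = u_new[inside], v_new[inside]
    z_kept, colors = z_kept[inside], colors[inside]

    # 4. When several points land on one pixel, keep the nearest one.
    order = np.argsort(z_kept, kind="stable")
    _, first = np.unique((v_new * W + u_new)[order], return_index=True)
    nearest = order[first]

    out = np.zeros_like(image)
    filled = np.zeros((H, W), dtype=bool)
    out[v_new[nearest], u_new[nearest]] = colors[nearest]
    filled[v_new[nearest], u_new[nearest]] = True
    return out, filled
```

Pixels where `filled` is False are regions the original camera never saw. Filling them in requires footage from additional viewpoints.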

In recent years, with the spread of smartphones, people have begun taking pictures everywhere. At tourist spots and sports venues where crowds gather, it is not uncommon for many people to photograph the same building or event simultaneously from different viewpoints. In addition, small cameras are sometimes attached to workers' heads to record collaborative work among multiple people. By taking this vast amount of video captured from many viewpoints and transforming it to any viewpoint the observer chooses, it becomes possible to obtain a wealth of information that cannot be gained simply by replaying the original footage.

As I wrote at the beginning, people often try to solve problems by "changing their perspective," and in doing so they frequently seek the opinions of others. This is because each person observes and thinks about things from a different viewpoint. In the same way, I believe that the range of applications for technology that combines many videos of the same event, shot from different viewpoints, to present a new way of seeing will continue to expand.

Figure 1: Conceptual diagram of perspective projection
Figure 2: Example of a Diminished Reality display for baseball footage
Figure 3: Example of a Diminished Reality display presenting an area occluded from the operator's viewpoint

Gakumon no susume (An Encouragement of Learning) (Research Introduction)
