
Automatic Video Camera Selection Technology

With the widespread use of the internet and faster connection speeds, video streaming of events and university lectures has become more common than ever before. At events and lectures, speakers rarely stand still. They tend to move about freely, for example walking toward a screen to point out displayed material or using a whiteboard for further explanation. It is therefore difficult to capture every move a speaker makes on video, and even when that is possible, editing the video is time-consuming. Fuji Xerox has developed Automatic Video Camera Selection Technology, which automatically detects the speaker to capture and indexes the recorded scenes so that people can select which scene to watch.

Auto switching of cameras

Fig. 1: Eight cameras installed in a room and the method of selecting a camera

Eight cameras (C0 to C7) are installed in a room in such a way that any location in the room can be recorded by at least two cameras (Fig. 1). After one of the cameras is initially selected, our technology for determining the positions of people and detecting a speaker provides the position and height of the speaker, which are used to check whether the speaker is within that camera's shooting range. For each camera, the system then checks the angle between two vectors: the vector from the center of the room to the speaker and the vector from the center of the room to the camera. A camera whose angle is close to 180 degrees lies on the opposite side of the room from the speaker, so selecting it records the speaker almost from the front (C1 in Fig. 1).
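The angle test described above can be illustrated with a minimal sketch. It is not the actual implementation: the camera layout (eight cameras on a circle of radius 5 m around the room center), the function names, and the optional `in_range` callback standing in for the shooting-range check are all assumptions made for illustration.

```python
import math

def angle_deg(u, v):
    """Angle in degrees (0 to 180) between two 2D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm = math.hypot(u[0], u[1]) * math.hypot(v[0], v[1])
    if norm == 0.0:
        return 0.0  # degenerate case: speaker or camera at the room center
    cos_a = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_a))

def select_camera(cameras, speaker, center=(0.0, 0.0), in_range=None):
    """Return (name, angle) of the camera whose angle is closest to 180 degrees.

    cameras:  dict mapping camera name to its (x, y) floor position
    in_range: optional hypothetical callback (name, speaker) -> bool that
              models the shooting-range check; cameras failing it are skipped
    """
    to_speaker = (speaker[0] - center[0], speaker[1] - center[1])
    best_name, best_angle = None, -1.0
    for name, pos in cameras.items():
        if in_range is not None and not in_range(name, speaker):
            continue
        to_camera = (pos[0] - center[0], pos[1] - center[1])
        a = angle_deg(to_speaker, to_camera)
        if a > best_angle:  # the angle never exceeds 180, so larger is better
            best_name, best_angle = name, a
    return best_name, best_angle

# Assumed layout: C0..C7 spaced 45 degrees apart on a circle of radius 5 m.
cameras = {
    f"C{i}": (5.0 * math.cos(math.radians(45 * i)),
              5.0 * math.sin(math.radians(45 * i)))
    for i in range(8)
}
best, angle = select_camera(cameras, speaker=(-2.0, -2.0))
print(best, round(angle, 1))
```

In this assumed layout, a speaker standing at (-2, -2) is diametrically opposite C1, so C1 is chosen. Because the angle between two vectors is at most 180 degrees, picking the largest angle is equivalent to picking the one closest to 180.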

Creating a video image of a speaker

Fig. 2: Creating a cropped image of a speaker

The position of a speaker is first determined in the image recorded by camera C0 (Fig. 2 (a)). The area to crop around the speaker in that image is then computed with a perspective transformation matrix, on the assumption that a person is approximately 1 m wide. Finally, the cropped image of the speaker is created (Fig. 2 (b)).
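One way to picture this step is to project a 1 m wide upright rectangle at the speaker's position into the image and take its bounding box. The sketch below assumes a standard 3x4 pinhole projection matrix `P`; the matrix values, the camera position, and the function names are hypothetical and only serve to make the example runnable.

```python
import math

def project(P, point):
    """Project a 3D world point to pixel coordinates with a 3x4 matrix P."""
    x, y, z = point
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    return (u / w, v / w)

def speaker_crop_box(P, camera_xy, speaker_xy, height, width=1.0):
    """Bounding box (left, top, right, bottom) in pixels around the speaker.

    The speaker is modeled as an upright rectangle `width` metres wide
    (1 m, as assumed in the text) and `height` metres tall, standing on the
    floor (z = 0) and facing the camera.
    """
    dx = speaker_xy[0] - camera_xy[0]
    dy = speaker_xy[1] - camera_xy[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        raise ValueError("speaker and camera share the same floor position")
    # Unit vector on the floor, perpendicular to the line of sight.
    px, py = -dy / d, dx / d
    half = width / 2.0
    corners = [
        (speaker_xy[0] + s * half * px, speaker_xy[1] + s * half * py, z)
        for s in (-1.0, 1.0)
        for z in (0.0, height)
    ]
    pixels = [project(P, c) for c in corners]
    us = [p[0] for p in pixels]
    vs = [p[1] for p in pixels]
    return (min(us), min(vs), max(us), max(vs))

# Hypothetical camera: at floor position (0, 0), 1.5 m high, looking along +y,
# focal length 800 px, principal point (640, 360), world z pointing up.
P = [
    [800.0, 640.0, 0.0, 0.0],
    [0.0, 360.0, -800.0, 1200.0],
    [0.0, 1.0, 0.0, 0.0],
]
box = speaker_crop_box(P, camera_xy=(0.0, 0.0), speaker_xy=(0.0, 4.0), height=1.75)
print(box)
```

With these assumed values, a speaker 4 m from the camera yields a crop about 200 pixels wide, which is the focal length (800 px) times the assumed 1 m width divided by the 4 m distance.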

Fig. 3: Switching cameras when other people occlude a speaker

In an actual meeting, a speaker may be occluded by other people located between the speaker and a camera (Fig. 3 (a)). In such a case, a camera that can fully capture the speaker is selected based on the positional relationships between all the participants and the cameras (Fig. 3 (b)).
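The text does not specify how occlusion is evaluated, so the following is only one plausible sketch: on the floor plan, a participant is treated as occluding a camera if they stand between the camera and the speaker and within an assumed body radius of 0.5 m from the sight line. The radius, positions, and function names are assumptions.

```python
import math

def occludes(person, camera, speaker, body_radius=0.5):
    """True if `person` stands between camera and speaker, near the sight line.

    All arguments are (x, y) floor positions in metres. `body_radius` is an
    assumed half-width of a person (0.5 m, matching a 1 m body width).
    """
    dx, dy = speaker[0] - camera[0], speaker[1] - camera[1]
    len2 = dx * dx + dy * dy
    if len2 == 0.0:
        return False
    # Fraction along the camera-to-speaker segment of the closest point.
    t = ((person[0] - camera[0]) * dx + (person[1] - camera[1]) * dy) / len2
    if t <= 0.0 or t >= 1.0:
        return False  # behind the camera or beyond the speaker
    nearest = (camera[0] + t * dx, camera[1] + t * dy)
    return math.hypot(person[0] - nearest[0], person[1] - nearest[1]) < body_radius

def unoccluded_cameras(cameras, speaker, others, body_radius=0.5):
    """Names of cameras whose view of the speaker is not blocked by `others`."""
    return [
        name
        for name, pos in cameras.items()
        if not any(occludes(o, pos, speaker, body_radius) for o in others)
    ]

# Hypothetical scene: one participant stands on the sight line of camera "A".
scene_cameras = {"A": (0.0, 0.0), "B": (10.0, 5.0)}
clear = unoccluded_cameras(scene_cameras, speaker=(5.0, 5.0), others=[(2.0, 2.0)])
print(clear)
```

The cameras that survive this filter could then be ranked by another criterion, such as the angle relative to the center of the room, to pick the one that shows the speaker most nearly from the front.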

Indexed display of a video of a meeting on the communication viewer

Fig. 4: Communication viewer

The right pane of the communication viewer (Fig. 4) displays thumbnail images of the speaker along a timeline, so users can see at a glance when the speaker changed. Video indexing generates these thumbnails from events such as a change of speaker, a change of slide, or writing on a whiteboard. Selecting a thumbnail plays the video of the meeting from the moment shown in that thumbnail.
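As a rough illustration of event-based indexing, the sketch below builds index entries from a sequence of timestamped speaker labels and merges them with other event lists. The data layout, the `IndexEntry` type, and the function names are hypothetical; the viewer's actual data model is not described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    time_s: float  # playback position the thumbnail points to, in seconds
    kind: str      # e.g. "speaker_change", "slide_change", "whiteboard"
    label: str     # e.g. speaker name or slide number

def index_speaker_changes(samples):
    """Create one entry each time the detected speaker changes.

    samples: iterable of (time_s, speaker) pairs in time order, where
    speaker is None while nobody is detected as speaking.
    """
    entries = []
    previous = None
    for time_s, speaker in samples:
        if speaker is None:
            continue  # silence does not count as a change of speaker
        if speaker != previous:
            entries.append(IndexEntry(time_s, "speaker_change", speaker))
            previous = speaker
    return entries

def merge_index(*entry_lists):
    """Combine several event lists into one timeline ordered by time."""
    merged = [e for entries in entry_lists for e in entries]
    return sorted(merged, key=lambda e: e.time_s)

samples = [(0.0, "A"), (5.0, "A"), (10.0, "B"), (15.0, None), (20.0, "B"), (25.0, "A")]
slides = [IndexEntry(12.0, "slide_change", "slide 2")]
timeline = merge_index(index_speaker_changes(samples), slides)
for entry in timeline:
    print(entry.time_s, entry.kind, entry.label)
```

Selecting a thumbnail would then correspond to seeking the player to the `time_s` value of the chosen entry.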

Case example