Television used to be such a passive experience. Lean back on your cushions and let the drama flood your brain. Not so anymore. From the red button, to SMS voting, to second screen apps like zeebox, it seems the trend is to pass control from the hidden directors and writers behind the scenes to us, the audience.
At a demonstration of a new suite of TV technologies yesterday I got some insight into where this trend might be going.
The FascinatE Project is one of those public-private international collaborations that could probably only happen in the EU (it is funded to the tune of €9.5m by the EU). It involves technology companies like Alcatel-Lucent, broadcasters like the BBC, research organisations like Fraunhofer (creators of the ubiquitous MP3 music format), and universities like Salford, location for yesterday's presentation.
FascinatE (a slightly tortured acronym accounts for that capitalised final letter) aims to provide a more interactive experience for users watching live events — concerts, dramas, sports etc. The technology it uses to do this comes in four main parts.
If you want to let users direct their own viewing, then you need to capture more of what’s going on at an event. That means more cameras and more microphones. But the ability to switch between a series of fixed shots — where the focus is still in the control of a camera operator or director — is a bit old school. Why not capture EVERYTHING and then allow users to choose their own virtual camera shot?
To do this FascinatE uses a new panoramic camera created by Fraunhofer. It features 10 Indiecam 2K (twice HD) cameras, or even ALEXA 4K cameras in a much larger form factor. Each camera is sited as close as possible to its two neighbours in a ring, pointed upwards towards a continuous circle of mirrors at 45 degrees to the floor. Siting the cameras so close together enables their images to be stitched together in software with minimal distortion, providing a complete, panoramic, high resolution capture of the event.
If you have ever taken a series of photos in a circle and then stitched them together on a computer to make a single panorama, imagine the same process but for live video, and done in real time.
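To make the stitching idea concrete, here is a minimal sketch of joining ring-camera frames into one panorama, assuming each frame overlaps its neighbour by a fixed number of columns and blending the seam with a linear crossfade. The real system has to solve for camera geometry and lens distortion as well; everything here is illustrative.

```python
import numpy as np

def stitch_ring(frames, overlap):
    """Blend a list of equal-height frames into one panorama.

    frames  -- list of HxWx3 uint8 arrays, ordered around the ring
    overlap -- number of columns shared between neighbouring frames
    """
    pano = frames[0].astype(np.float32)
    # Linear crossfade weights across the overlapping strip.
    ramp = np.linspace(0.0, 1.0, overlap)[None, :, None]
    for frame in frames[1:]:
        f = frame.astype(np.float32)
        # Fade out the panorama's right edge, fade in the new frame's left edge.
        seam = pano[:, -overlap:] * (1.0 - ramp) + f[:, :overlap] * ramp
        pano = np.concatenate([pano[:, :-overlap], seam, f[:, overlap:]], axis=1)
    return pano.astype(np.uint8)

# Toy example: four 4x8 "frames" with a 2-column overlap.
frames = [np.full((4, 8, 3), v, dtype=np.uint8) for v in (10, 20, 30, 40)]
pano = stitch_ring(frames, overlap=2)
print(pano.shape)  # (4, 26, 3): each join loses `overlap` columns
```

Doing this for 10 high-resolution streams at video frame rates is of course the hard part; the principle, though, is just this repeated seam blend.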
3D sound is captured by a single MH Acoustics Eigenmike — essentially 32 microphones in a single sphere, giving a complete picture of the soundfield in three dimensions. The 3D sound is tied to the view: focus on the drummer and the drum sounds come up in the mix. Pan to the left and the drum sounds get louder on your right.
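The view-linked panning described above can be sketched with a standard constant-power pan law: as the virtual camera turns, each source's left/right gains are recomputed from the angle between the view direction and the source. The angles, attenuation curve, and function names below are illustrative, not FascinatE's actual audio renderer.

```python
import math

def pan_gains(view_azimuth, source_azimuth):
    """Return (left, right) gains for a source, given azimuths in degrees."""
    # Source angle relative to where the viewer is looking,
    # wrapped to [-180, 180): negative means to the viewer's left.
    rel = (source_azimuth - view_azimuth + 180.0) % 360.0 - 180.0
    # Sources behind the view are attenuated; sources in front play at full level.
    distance_factor = max(0.0, math.cos(math.radians(rel / 2.0)))
    # Constant-power pan between the left and right channels.
    pan = max(-1.0, min(1.0, rel / 90.0))   # -1 = hard left, +1 = hard right
    theta = (pan + 1.0) * math.pi / 4.0     # 0 .. pi/2
    return distance_factor * math.cos(theta), distance_factor * math.sin(theta)

# Looking straight at the drummer: equal gains in both ears.
print(pan_gains(0.0, 0.0))
# Pan the view 45 degrees to the left: the drums get louder on your right.
left, right = pan_gains(-45.0, 0.0)
print(right > left)  # True
```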
Having just a single mic and camera to capture the whole event makes for a potentially very portable, easy-to-set-up package. In the demonstration, though (and likely in reality), these were supplemented with mics on all the main sound sources (instruments, singers etc.) and standard cameras for alternative views.
With the current technology you need these extra cameras, particularly for large scale events like football. The lack of focal length control on the panoramic camera means that there’s some loss of resolution when zoomed right in, even when shooting at 2K.
When you’re shooting with that many cameras at high definition, you’re going to be storing and more importantly shipping a lot of data. Since we’re talking about live events, the first issue is getting the raw video — 20Gbps of it using the 2K camera setup — to somewhere you can process it. This is just a question of raw bandwidth.
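As a back-of-envelope check on that 20Gbps figure: the exact capture format isn't stated, but assuming 10 cameras at 2048×1080, 10-bit 4:2:2 sampling (20 bits per pixel) and 50 frames per second, the raw rate lands in the right ballpark. These assumptions are mine, not the project's.

```python
# One plausible combination of parameters that yields roughly 20 Gbps.
cameras = 10
width, height = 2048, 1080
bits_per_pixel = 20   # 10-bit luma + 10 bits of shared chroma (4:2:2)
fps = 50

bits_per_second = cameras * width * height * bits_per_pixel * fps
print(f"{bits_per_second / 1e9:.1f} Gbps")  # 22.1 Gbps, i.e. on the order of 20 Gbps
```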
The next issue is getting it out to consumers across all of the many devices they may want to use: TV, smartphone, tablet, etc. You obviously can't ship them everything, so how do you ensure they can choose exactly which view they want to see?
Alcatel-Lucent tackled this problem and came up with a way of breaking the enormous 20x2K picture into a grid of smaller 'tiles'. Rather than shipping the whole picture, the network node serving the video only sends the tiles that feature in the picture the consumer has selected. For example, they might have chosen to focus on the lead singer at a gig. Only the tiles that feature the singer and their surroundings need to be sent, compressed, to the user's device.
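The core of the tiling idea is a simple intersection test: given the user's chosen viewport within the full panorama, work out which tiles it overlaps and send only those. The tile size and coordinates here are invented for illustration, not Alcatel-Lucent's actual scheme.

```python
def tiles_for_viewport(vp_x, vp_y, vp_w, vp_h, tile_w, tile_h):
    """Return (col, row) indices of every tile the viewport overlaps."""
    first_col, last_col = vp_x // tile_w, (vp_x + vp_w - 1) // tile_w
    first_row, last_row = vp_y // tile_h, (vp_y + vp_h - 1) // tile_h
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# A 1280x720 viewport positioned at (3000, 200) in the panorama,
# with 640x360 tiles: only 9 tiles need to be streamed, not the
# whole 20-camera picture.
needed = tiles_for_viewport(3000, 200, 1280, 720, 640, 360)
print(len(needed))  # 9
```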
This is clever, but on its own it doesn't scale well: what if 10,000 people choose the same view? You'd be streaming the same four tiles 10,000 times, which is tough on bandwidth and on your servers. Here the server's intelligence kicks in: it switches from unicast (for tiles only one person wants) to broadcast (for popular tiles that are shared), minimising server load and bandwidth.
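A toy version of that unicast/broadcast decision: count how many viewers currently need each tile, and move any tile over a popularity threshold onto a shared broadcast stream. The threshold and data shapes are my assumptions; the real system presumably weighs network topology and cost as well.

```python
from collections import Counter

def plan_delivery(viewer_tiles, broadcast_threshold=3):
    """viewer_tiles: dict of viewer_id -> set of tile ids that viewer needs.

    Returns (broadcast_tiles, unicast_sends), where unicast_sends is the
    number of individual per-viewer tile transmissions still required.
    """
    demand = Counter(t for tiles in viewer_tiles.values() for t in tiles)
    broadcast = {t for t, n in demand.items() if n >= broadcast_threshold}
    unicast_sends = sum(n for t, n in demand.items() if t not in broadcast)
    return broadcast, unicast_sends

viewers = {
    "a": {(4, 0), (5, 0)},
    "b": {(4, 0), (5, 0)},
    "c": {(4, 0), (6, 1)},
}
bc, uni = plan_delivery(viewers)
print(bc, uni)  # tile (4, 0) is popular enough to broadcast once; 3 unicast sends remain
```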
So you’ve got huge amounts of video and three dimensional sound. How do you choose what view to watch?
Did you know there was an established 'grammar' for cinematography? Nor did I until yesterday. But just as the 'golden ratio' seems to define aesthetically pleasing shapes, the grammar of cinematography gives us a set of rules for deciding whether to pan or cut, when, and how fast. Joanneum Research has taken these rules and applied them to the vast amount of data coming in from the cameras, giving the system the intelligence to identify people and events on screen.
The result is essentially a fully or semi-automated director, able to process vast amounts of data into a watchable programme, with or without input from human directors either in the studio or in the home.
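One rule from that grammar might look something like this in code: follow small subject movements with a pan, but handle large jumps with a cut, since a slow pan across a big angle reads badly on screen. The thresholds and function names are invented for illustration; they are not Joanneum Research's actual rules.

```python
def choose_transition(current_angle, target_angle,
                      pan_limit_deg=30.0, max_pan_speed=10.0):
    """Return ('pan', speed_in_deg_per_s) or ('cut', None)."""
    delta = abs(target_angle - current_angle)
    if delta <= pan_limit_deg:
        # Pan speed scales with the distance to travel, capped so the
        # move never looks rushed.
        return "pan", min(max_pan_speed, delta / 2.0)
    return "cut", None

print(choose_transition(10.0, 25.0))   # ('pan', 7.5)
print(choose_transition(10.0, 170.0))  # ('cut', None)
```

An automated director is essentially a stack of rules like this one, evaluated against where the detected people and events are on screen.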
Robot director or not, none of this clever technology is going to be of any value without an engaging user experience. And the demo we saw yesterday seems to have that. Via tablet, smartphone or TV the ability to pan around the performance space where students from Salford were performing a series of dance pieces was slick and seamless (barring a temporary loss of sound — no demonstration is ever going to be perfect). Swiping your finger across the tablet resulted in the video panning smoothly and responsively. Admittedly the performance was only in the next room, but the way the system is designed, network latency shouldn’t be too much of an issue.
Where it got really exciting was with the gesture-based interface. You can pan and zoom just by waving your hands in the air. Increase the volume by cupping your hand to your ear, lower it with a finger to your lips.
Imagine coupling the display to an immersive headset and having the video pan with your head movements. That would be an incredible way to remotely experience a gig or event.
According to one BBC member of the project, they are considering using some of this technology for events in 2014. Certainly it didn't seem far from consumer-ready as a technology suite, notwithstanding the usual levels of rigorous testing required for broadcast equipment. My guess is that it will be small scale tests next year before wider roll-out the following year. But you can easily see the relevant technology being added to the iPlayer client to support this sort of broadcast.