Television used to be such a passive experience. Lean back on your cushions and let the drama flood your brain. Not so anymore. From the red button, to SMS voting, to third screen apps like zeebox, it seems the trend is to pass control from the hidden directors and writers behind the scenes to us, the audience.
At a demonstration of a new suite of TV technologies yesterday I got some insight into where this trend might be going.
The FascinatE Project is one of those weird international collaborations between the public and private sectors that could probably only happen in the EU (it is funded to the tune of €9.5m by the EU). It involves technology companies like Alcatel Lucent, broadcasters like the BBC, research organisations like Fraunhofer (creators of the ubiquitous MP3 music format), and universities like Salford, location for yesterday's presentation.
FascinatE (a slightly tortured acronym accounts for that capitalised final letter) aims to provide a more interactive experience for users watching live events — concerts, dramas, sports etc. The technology it uses to do this comes in four main parts.
If you want to let users direct their own viewing, then you need to capture more of what’s going on at an event. That means more cameras and more microphones. But the ability to switch between a series of fixed shots — where the focus is still in the control of a camera operator or director — is a bit old school. Why not capture EVERYTHING and then allow users to choose their own virtual camera shot?
To do this FascinatE uses a new panoramic camera created by Fraunhofer, featuring 10 Indiecam 2K (twice HD) cameras, or even ALEXA 4K cameras in a much larger form factor. Each camera is sited as close as possible to its two neighbours in a ring, pointed upwards towards a continuous circle of mirrors at 45 degrees to the floor. Siting the cameras so close together enables their images to be stitched together in software with minimal distortion, providing a complete, panoramic, high resolution capture of the event.
If you have ever taken a series of photos in a circle and then stitched them together on a computer to make a single panorama, imagine the same process but for live video, and done in real time.
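The core of that stitching step can be sketched in a few lines. This is a deliberately simplified illustration, not Fraunhofer's actual pipeline: it assumes the camera ring is fixed and calibrated, so each frame overlaps its neighbour by a known number of pixel columns, and it blends the overlap with a plain cross-fade.

```python
def stitch_frames(frames, overlap):
    """Stitch a list of frames (each a list of pixel rows) side by side.

    Assumes a fixed, calibrated camera ring, so each frame overlaps its
    right-hand neighbour by a known number of columns. Overlapping
    columns are blended with a simple linear cross-fade; a real system
    would also correct lens distortion and exposure differences.
    """
    panorama = [row[:] for row in frames[0]]
    for frame in frames[1:]:
        for y, row in enumerate(frame):
            # Cross-fade the overlapping columns.
            for x in range(overlap):
                w = (x + 1) / (overlap + 1)          # 0 -> old frame, 1 -> new
                old = panorama[y][-overlap + x]
                panorama[y][-overlap + x] = (1 - w) * old + w * row[x]
            # Append the non-overlapping remainder of the new frame.
            panorama[y].extend(row[overlap:])
    return panorama
```

The hard part in practice is exactly what this sketch assumes away: aligning the images well enough that a straight blend doesn't show a seam, which is why the camera and mirror geometry matters so much.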
3D sound is captured by a single MH Acoustics Eigenmike — essentially 32 microphones in a single sphere, giving a complete picture of the soundfield in three dimensions. The 3D sound is tied to the view: focus on the drummer and the drum sounds come up in the mix. Pan to the left and the drum sounds get louder on your right.
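The view-locked audio described above can be illustrated with a toy pan law. The real Eigenmike processing works with a full higher-order representation of the soundfield; this sketch just applies a constant-power stereo pan to a source's angle relative to the current view, which is enough to reproduce the "pan left, drums move right" effect.

```python
import math

def stereo_gains(source_azimuth, view_azimuth):
    """Toy constant-power panning for a view-locked soundfield.

    Angles in degrees; 0 = straight ahead, positive = to the right.
    This is an illustrative sketch, not the Eigenmike's actual
    ambisonic rendering.
    """
    rel = (source_azimuth - view_azimuth + 180) % 360 - 180  # -180..180
    rel = max(-90.0, min(90.0, rel))                         # clamp to frontal arc
    theta = math.radians(rel / 2 + 45)                       # map to 0..90 degrees
    left = math.cos(theta)
    right = math.sin(theta)
    return left, right
```

With the source dead ahead the gains are equal; pan the view 90 degrees to the left and the source moves entirely into the right channel, exactly as the demonstration behaved.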
Having just a single mike and camera to do complete capture of the event makes for a potentially very portable and easy-to-set-up package, although in the demonstration (and likely in reality) these are supplemented with mics on all the main sound sources (instruments, singers etc) and standard cameras for alternative views.
With the current technology you need these extra cameras, particularly for large scale events like football. The lack of focal length control on the panoramic camera means that there’s some loss of resolution when zoomed right in, even when shooting at 2K.
When you’re shooting with that many cameras at high definition, you’re going to be storing and more importantly shipping a lot of data. Since we’re talking about live events, the first issue is getting the raw video — 20Gbps of it using the 2K camera setup — to somewhere you can process it. This is just a question of raw bandwidth.
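That 20Gbps figure stands up to a back-of-envelope check, if you assume ten 2K (2048×1080) cameras shooting 10-bit colour at 30 frames per second — the exact capture parameters are my assumption, not something the project stated.

```python
# Back-of-envelope check on the ~20 Gbps raw capture figure.
# Assumed parameters: ten 2K cameras, 10 bits per RGB channel, 30 fps.
cameras = 10
width, height = 2048, 1080
bits_per_pixel = 30      # 10 bits per channel, three channels
fps = 30

raw_bps = cameras * width * height * bits_per_pixel * fps
print(f"{raw_bps / 1e9:.1f} Gbps")   # -> "19.9 Gbps"
```

Small changes to the assumptions (25fps, 8-bit colour) move the number around, but any plausible combination lands in the tens of gigabits per second — well beyond what you can push to a consumer connection without the tiling trick described next.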
The next issue is getting it out to consumers across all of the many devices they may want to use: TV, smartphone, tablet, etc. You obviously can't ship them everything, so how do you ensure they can choose exactly which view they want to see?
Alcatel Lucent tackled this problem and came up with a way of breaking the enormous 20x2K picture into a grid of smaller ‘tiles’. Rather than shipping the whole picture, the network node serving the video only sends the tiles that feature in the picture the consumer has selected. For example, they might have chosen to focus on the lead singer at a gig. Only the tiles that feature the singer and his surrounds need to be sent, compressed, to the user device.
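Working out which tiles a given view needs is essentially a grid intersection. The sketch below is my illustration of the idea, not Alcatel Lucent's actual scheme: the tile size is arbitrary and wrap-around at the panorama's seam is ignored.

```python
def tiles_for_view(view_x, view_y, view_w, view_h, tile_w, tile_h):
    """Return the grid coordinates of every tile a viewport touches.

    Coordinates are pixels in the full panorama; tiles form a regular
    grid of tile_w x tile_h pixels. Wrap-around at the panorama seam
    is ignored in this sketch.
    """
    first_col = view_x // tile_w
    last_col = (view_x + view_w - 1) // tile_w
    first_row = view_y // tile_h
    last_row = (view_y + view_h - 1) // tile_h
    return {(col, row)
            for col in range(first_col, last_col + 1)
            for row in range(first_row, last_row + 1)}
```

So a viewer zoomed in on the lead singer might only touch two of the hundreds of tiles in the full panorama, and only those two need to be compressed and sent.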
This is clever but it doesn’t scale too well: what if 10,000 people choose the same view? You’re streaming the same four tiles 10,000 times — tough on bandwidth and your servers. Here the server’s intelligence kicks in and switches from unicast (tiles that only one person wants) to broadcast (popular tiles that are shared) ensuring it minimises server load and bandwidth.
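The switching decision can be sketched as a simple popularity count per tile. The threshold here is an assumption for illustration; a real server would weigh the cost of setting up a shared stream against the bandwidth it saves.

```python
from collections import Counter

def delivery_plan(requests, threshold=2):
    """Decide shared vs unicast delivery for each tile.

    requests: dict mapping client id -> set of tile ids that client wants.
    Tiles wanted by `threshold` or more clients go on a shared
    (broadcast) stream; the rest are unicast to the single client
    that wants them. The threshold value is an illustrative assumption.
    """
    demand = Counter(t for tiles in requests.values() for t in tiles)
    shared = {t for t, n in demand.items() if n >= threshold}
    unicast = {client: tiles - shared for client, tiles in requests.items()}
    return shared, unicast
```

In the 10,000-viewers scenario above, the four popular tiles would go out once on the shared stream instead of 10,000 times over unicast.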
So you’ve got huge amounts of video and three dimensional sound. How do you choose what view to watch?
Did you know there was an established ‘grammar’ for cinematography? Nor did I until yesterday. But like the ‘golden ratio’ seems to define aesthetically pleasing shapes, the grammar of cinematography gives us a set of rules for deciding whether to pan or cut, when, and how fast. Joanneum Research has taken these rules and applied them to the vast amount of data coming in from the cameras, and given a system intelligence to identify people and events on the screen.
The result is essentially a fully, or semi-automated director, able to process vast amounts of data into a watchable programme, with or without the input of human directors either in the studio or in the home.
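One of the simplest rules in that grammar — pan to a nearby point of interest, cut to a distant one — can be sketched as a single decision. The 30-degree threshold is my placeholder; the real rule set Joanneum Research encoded is far richer (shot length, subject motion, the 180-degree rule, and so on).

```python
def camera_move(current_angle, target_angle, pan_limit=30.0):
    """One toy rule from an automated director.

    If the new point of interest is close to the current framing, pan
    smoothly; if it is far away, cut straight to it. The pan_limit
    threshold is an illustrative assumption.
    """
    # Shortest signed angular distance, handling wrap-around at 360.
    delta = (target_angle - current_angle + 180) % 360 - 180
    return "pan" if abs(delta) <= pan_limit else "cut"
```

Stack enough rules like this on top of reliable detection of people and events in the panorama, and you get the automated director described above.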
Robot director or not, none of this clever technology is going to be of any value without an engaging user experience. And the demo we saw yesterday seems to have that. Via tablet, smartphone or TV the ability to pan around the performance space where students from Salford were performing a series of dance pieces was slick and seamless (barring a temporary loss of sound — no demonstration is ever going to be perfect). Swiping your finger across the tablet resulted in the video panning smoothly and responsively. Admittedly the performance was only in the next room, but the way the system is designed, network latency shouldn’t be too much of an issue.
Where it got really exciting was with the gesture-based interface. You can pan and zoom just by waving your hands in the air. Increase the volume by cupping your hand to your ear, lower it with a finger to your lips.
Imagine coupling the display to an immersive headset and having the video pan with your head movements. That would be an incredible way to remotely experience a gig or event.
According to one BBC member of the project, they are considering using some of this technology for events in 2014. Certainly it didn't seem far from consumer-ready as a technology suite, notwithstanding the usual levels of rigorous testing required for broadcast equipment. My guess is that it will be small scale tests next year before wider roll-out the following year. But you can easily see the relevant technology being added to the iPlayer client to support this sort of broadcast.
CES has been the subject of much debate in the tech media the last few days. For the first time I can remember it has not just been geeks salivating at the prospect of more shiny stuff to play with. Some have been genuinely questioning the point of a hardware-focused show in a social, software-driven age.
While some nice gadgets have been announced, there are no new revolutions appearing that weren't already in progress. Just about the only major stories from CES so far seem to be about TVs.
This spurred a quick chat I had with BBC Merseyside tonight about growing TVs: with the advent of 84 inch TVs at CES, the presenter wanted to know how big you really need your TV to be. Like any good analyst, I answered his question with a question: what is a television for?
What IS a Television For?
‘Television’ combines two words: ‘tele’ meaning distant and ‘vision’ meaning, well, vision. You can’t say this is inaccurate based on current usage, but the word ‘television’ conjures up very specific ideas for me. Families crowding around a flickering set for appointment viewing like Corrie or the FA Cup final. Dodgy aerials that always needed adjusting. Constant fiddling to get a better picture. That for me was ‘television’. The name describes not just the box in the corner but the programming it carried and the over-the-airwaves means by which that programming was delivered.
The modern television is very different. Appointment viewing is limited to live events and the big reality shows (though even those seem to be declining). Increasingly, what we watch through our screens is not broadcast over the airwaves and is not watched synchronously with the rest of the nation. It is piped through an internet connection and watched at our leisure. It is interactive content fed from a games console. And increasingly it will be information and applications delivered from the cloud.
For me what was the ‘television’ is really today just another screen. An interactive interface to the morass of applications and content in the cloud that increasingly hosts and defines our day-to-day lives. Less and less will the TV be restricted to video content: more and more it will be a means of accessing calendars, shopping lists, news, games and communications.
The Future Will Be Televised
So what is going to change to enable this?
One interesting development at CES was the multi-user TV, which enables two people to simultaneously watch different programmes in high definition on the same set. Today this uses glasses, but you can imagine some form of micro-mirror based system that restricts pixels to a narrow field of vision focused on individuals whose head position is tracked around the room.
Motion and voice control are already here in high-end TVs, and combined with the smartphone and tablet these herald the end of the remote control. Not a decade too soon either. Finer gesture control will give us the slickness of tablet-style touch on wall-sized screens.
And wall-sized screens are very much a reality. Once manufacturers nail down how to mass-print OLED screens on flexible substrates, 84in screens will fast become normal and even small. Why not have a fully interactive wall if it can be shipped like wallpaper, doubles as room lighting and costs little to run?
Content will not come over the airwaves. Why have a dedicated chunk of the spectrum devoted to TV when any kind of content can be delivered more efficiently over the internet? The only kind of aerial you may find on a TV will be for Wi-Fi or whatever has replaced it.
None of this is far away. Around ten years ago I bought myself a top of the range TV. One of the last CRT models, the Panasonic TX-36PD30. Plasma panels were available but they were expensive and didn’t yet deliver the best picture. It had a 36in screen and a list price of over £2000. But by today’s standards it was a relic: enormous body and bezel, small screen, all analogue connections, and totally dumb compared to the smart, svelte, internet-connected digital panels that seem to grace most homes now. Jump ten or twenty years into the future and the ‘smart’ TVs of today will look equally Neanderthal.