A LOOK AT THE FUTURE OF HANDS-FREE COMPUTING
Computer use has been a tactile experience for decades. Despite huge increases in computing power,
the two fundamental peripherals – keyboard and mouse – have remained.
Using a computer means sitting down at a desk or table and flirting with
carpal tunnel. Don’t want to do that? Too bad!
Apple’s
iPhone, and the plethora of touchscreen devices that have followed it,
have started to change the experience from the bottom up. Our smallest
devices now are the most innovative while our desktops remain static and
dull.
This
new hierarchy may not be permanent, however, as technologies
have begun to mature and converge not just on smartphones and tablets
but also laptops, desktops, and home consoles. The hands-free future
dreamed by science fiction is inching its way towards reality.
Speech
Of
all the hands-free technologies that may empower the user of tomorrow,
speech is the only one that has already arrived. Apple’s Siri and its
Android competitor, Google Now, can already be found on millions of smartphones. These services are joined by numerous competitors like Dragon Systems,
a company that makes adaptive speech-to-text software for PC and
Android. Even Intel intends to get in on the action through
partnerships and its Perceptual Computing SDK.
Mooly Eden, one of Intel’s most visionary executives, claimed at CES
2013 that “Voice could overtake touch in as little as five years.”
The
resources of a company like Intel shouldn’t be taken lightly, but there
are problems that Eden doesn’t address. Accuracy is the most serious.
Despite years of research, speech-to-text remains troublesome. A recent test of Siri found
that its comprehension of words spoken while used on a city street
stood at 83 percent. If that figure seems acceptable, think again.
Comprehension issues lead Siri to accurately answer only 62 percent of
queries.
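Some rough arithmetic shows why an 83 percent figure is worse than it sounds. If word errors were independent – a simplifying assumption of ours, not something the test measured – whole-query accuracy decays quickly with query length:

```python
# Back-of-the-envelope sketch: how per-word accuracy compounds into
# whole-query accuracy. Assumes word errors are independent, which real
# recognizers violate -- illustrative only, not how Siri is evaluated.
def query_accuracy(word_accuracy: float, words: int) -> float:
    """Probability that every word in a query is transcribed correctly."""
    return word_accuracy ** words

for n in (1, 2, 3, 4):
    print(f"{n}-word query: {query_accuracy(0.83, n):.0%}")
# → 1-word query: 83%
# → 2-word query: 69%
# → 3-word query: 57%
# → 4-word query: 47%
```

Under that toy model, even a three-word query lands near the 62 percent figure reported for Siri’s answers.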
These numbers are not much different from those recorded in a 2001 accuracy test of
commercially available voice recognition software. Why the lack of
progress? Well, the problem is difficult – and it’s not a matter of
computing power (despite what Intel may say). The latest research
suggests that current speech recognition technology is doomed to fail
because it relies on audible data alone and pays little attention to
context. A 2009 paper co-written
by several experts, including Dragon Systems co-founder Janet M. Baker,
states that “If we want our systems to be more powerful and to
understand the nature of speech itself, we must collect and label more
of it.”
Put
simply, current computers can hear us, but they have no idea what we’re
saying. They piece together words and then, in the case of Siri or
Google Now, input those words into a search engine as text. Apple claims
that “Siri not only understands what you say, it’s smart enough to know
what you mean.” To us, Siri’s just a glorified search history. The
software can create context by remembering past voice input and other
data, such as your location, but it gleans little context from the words
alone.
Does
this mean speech is doomed? Not exactly – but speech doesn’t have the
lead it appears to enjoy at first glance. Speech will continue to be a
secondary interface for many devices, and it may increasingly find its
way into the home via game consoles and PC peripherals. A lot of work is
needed to give speech recognition the accuracy users expect.
Facial recognition and eye tracking
If
speech is the faltering veteran, eye-tracking is the excitable young
upstart, bright-eyed and ready to change the world. Companies have only
begun to take notice of the concept within the last half-decade, yet its
potential is exciting.
We sampled the cutting edge of consumer eye-tracking technology at Tobii’s CES booth this year. The Swedish start-up, founded in 2007, has spent the last five years perfecting an eye-tracking peripheral that can be used with any desktop or laptop.
The
company’s demo had us select items with a stare, shoot asteroids with a
glance, and scroll through documents without touching a mouse. All of
these demonstrations worked with nearly perfect precision on the show
floor while others passed behind us or leaned over to see what we were
doing. Tobii didn’t skip a beat – and if it can handle the CES show
floor, it should have no problems at home.
Eye-tracking
works well not just because of accuracy, but also because of
compatibility. Our eyes may wander across a screen, but our gaze always
rests on a point of focus. That point is specific, so it can be used to
activate interface elements as if it were a mouse cursor. Existing user
interfaces work well with this technology, and that means it’s easier to
adopt. Humanity doesn’t have to throw out decades of UI design and
start from scratch.
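A minimal sketch of how that could work, with a dwell timer standing in for the mouse click. The widget layout, sampling rate, and threshold below are our own illustrative choices, not Tobii’s API:

```python
from dataclasses import dataclass

@dataclass
class Widget:
    """An on-screen element with a rectangular hit area."""
    name: str
    x: int
    y: int
    w: int
    h: int

    def contains(self, gx: int, gy: int) -> bool:
        return self.x <= gx < self.x + self.w and self.y <= gy < self.y + self.h

def dwell_select(samples, widgets, dwell_ms=500, sample_ms=20):
    """Treat gaze like a cursor: return the first widget the gaze rests on
    for dwell_ms, given gaze samples taken every sample_ms.
    Returns None if no widget is fixated long enough."""
    needed = dwell_ms // sample_ms
    target, streak = None, 0
    for gx, gy in samples:
        hit = next((w for w in widgets if w.contains(gx, gy)), None)
        if hit is not None and hit is target:
            streak += 1
            if streak >= needed:
                return hit
        else:
            target, streak = hit, (1 if hit is not None else 0)
    return None

play = Widget("play", 100, 100, 80, 40)
print(dwell_select([(120, 110)] * 30, [play]).name)  # → play
```

The point is that gaze needs no new interaction model: an existing hit-testing pipeline works once a dwell threshold replaces the click.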
This
technology could also change how we view content. While monitors and
televisions are very large, our eyes can only see fine detail within a
five-degree angle around our point of focus. Objects beyond are less
defined. Current computer displays don’t take advantage of this;
they instead render all content in fine detail.
Researchers working for Microsoft have discovered how to take advantage of our deficiency with a technique called Foveated Rendering.
The researchers asked users to view a 3D scene that only provided full
detail at the user’s point of focus while the background was rendered at
a lower resolution. Out of 76 participants, 71 rated image quality at 4
or 5 stars. That’s an impressive result, and it wasn’t achieved
with laboratory equipment. The eye-tracking hardware was Tobii’s X50, an
older version of the company’s hardware, and the monitor was a 24-inch
1080p display from LG.
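The core idea is easy to sketch: derive a pixel’s angular distance (eccentricity) from the gaze point and pick a render scale from it. The layer thresholds, scale factors, and pixels-per-degree figure below are illustrative assumptions of ours, not the values from Microsoft’s study:

```python
import math

def eccentricity_deg(px: float, py: float, gx: float, gy: float,
                     pixels_per_degree: float = 40.0) -> float:
    """Rough angular distance of a pixel from the gaze point, using a
    small-angle pixels-per-degree approximation (assumed, not measured)."""
    return math.hypot(px - gx, py - gy) / pixels_per_degree

def render_scale(ecc_deg: float) -> float:
    """Map eccentricity to a resolution scale across three layers;
    boundaries and scales are illustrative guesses."""
    if ecc_deg <= 5.0:      # foveal region: full detail
        return 1.0
    if ecc_deg <= 15.0:     # mid-periphery: half resolution
        return 0.5
    return 0.25             # far periphery: quarter resolution

# A pixel right under the gaze renders at full detail...
print(render_scale(eccentricity_deg(960, 540, 960, 540)))  # → 1.0
# ...while a 1080p screen corner, far from a centered gaze, does not.
print(render_scale(eccentricity_deg(0, 0, 960, 540)))      # → 0.25
```

Because most of the screen falls in the outer layers, most pixels can be rendered at a fraction of the cost.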
The
implications are significant. Despite leaps in CPU and GPU performance,
hardware remains a constraint. New 4K displays have a difficult path to
adoption because computers and game consoles struggle to handle that
many pixels. If computers didn’t have to render all the pixels in
detail, however, resolution, image quality and display size would become
nearly limitless.
Motion tracking
Hands-free
motion tracking has already infiltrated the living room via Microsoft’s
Kinect, though in a limited way. The peripheral was designed to enable
hands-free interaction and shipped with support for both
motion and speech control of the Xbox 360. To date, 24 million Kinect
sensors have been sold, which means about one in three Xbox 360 owners
have sprung for one.
Though
Kinect is innovative, its adoption hasn’t been without issue. The
device has trouble processing quick or subtle movement and can only
function in large spaces. Yes, users can scroll or zoom content via Kinect with a swiping motion,
but that motion must be slow and exaggerated. No one wants to wave
their arms around just to scroll to another picture. To catch on, motion
tracking must be more subtle. As we saw at CES, Leap Motion has shown some promise, but it only works in a limited area.
These
problems don’t imply that motion tracking is hopeless. Unlike speech
recognition, which faces fundamental issues, motion detection is well
understood. Microsoft has researched the area with the focus of a mad
scientist and has come up with solutions that range from animation of household objects to real-time, 3D tracking of movement within a room.
Hardware,
not theory, is the obstacle. Motion tracking cameras work by running
algorithms on frames to determine movement. Accuracy and speed can be
improved by increasing resolution and framerate, but this increases
processor demand. The Kinect’s resolution is 640 x 480, and it captures
at only 30 frames per second. Improving these specifications to 720p
and 60 frames per second increases the data that must be processed
six-fold.
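The six-fold figure is simple throughput arithmetic:

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second the tracking algorithms must process."""
    return width * height * fps

kinect = pixel_rate(640, 480, 30)      # original Kinect: ~9.2 million px/s
upgraded = pixel_rate(1280, 720, 60)   # a hypothetical 720p/60 sensor
print(upgraded / kinect)  # → 6.0
```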
Intel has addressed this as
another branch of its perceptual computing initiative, and here the
company has a more convincing argument. Unlike speech recognition,
motion tracking is a computer problem. Microsoft’s original Kinect
tackles it with a processor built into the camera apparatus itself, and
Intel could tackle it with specific processor features or a
sub-processor in its chipset.
Whatever
the hardware solution, the goal must be quick, accurate processing of
subtle movement within both limited and large spaces. We’re excited to
see what Microsoft announces during its NextBox unveil, as rumors point
to an improved version of Kinect bundled with the console.
Given the huge increase in power between the Xbox 360 and the next-gen
system, whatever it may be, Microsoft’s new console might be the next
leap forward in motion tracking. We have our fingers crossed.
The home computer becomes the home
Put
the pieces together and a vision of the future forms. If users don’t
need to use tactile input to manipulate a computer, a computer doesn’t
need to be an isolated device. Instead, it transforms into a network of
displays and sensors that can be accessed from anywhere in your home.
Imagine
that you’ve returned from a hard day at work. As you walk in, your
computer notices your entry. Lights automatically turn on, the
air conditioning or heating adjusts to your preference, and any
connected devices you often use come out of sleep.
You’re
hungry, so you head to the kitchen. Motion-tracking sensors notice you,
then see that you’re taking pots and pans out of the cupboard. A
display in the kitchen flickers on, displaying your recipe book, which
can be manipulated via hands-free gestures. But you’re just cooking
pasta – no recipe needed for that – so with a voice command you switch
to your personal email, then your social networking feeds, catching up
on what you missed while dinner cooks.
Now,
with dinner served, you head to the couch. This is noticed – the
television comes on with your favorite channel displayed by default. You
want to watch something on Netflix, however, so you open the app with a
voice command and then scroll through selections with a flick of your
fingertips. When you find the movie you want, you fix your gaze on it
and say “open.”
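At bottom, this scenario is an event-driven system: sensors publish events and simple rules map them to actions. A minimal sketch – every sensor, event, and action name here is hypothetical, not a real home-automation API:

```python
# Toy rule engine for the scenario above. All names are invented for
# illustration; no real smart-home product or API is implied.
rules = {
    ("entryway", "person_detected"):  ["lights_on", "climate_to_preference", "wake_devices"],
    ("kitchen", "cookware_out"):      ["kitchen_display_on", "show_recipe_book"],
    ("living_room", "person_seated"): ["tv_on_default_channel"],
}

def handle(sensor: str, event: str) -> list:
    """Return the actions triggered by a sensor event (empty if no rule matches)."""
    return rules.get((sensor, event), [])

print(handle("entryway", "person_detected"))
# → ['lights_on', 'climate_to_preference', 'wake_devices']
```

The hard part the article describes is not any single rule but wiring many vendors’ sensors and displays into one ecosystem that can run them.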
None
of this is beyond the reach of current technology. Synthesizing these
hands-free features into a computing ecosystem is the hard part, and it
will take years, maybe decades. But the foundation is finally in place.
Source: http://www.digitaltrends.com/