The Tribune, Chandigarh, India

A race to fingerprint the human voice
Mark Piesing

“YOU do not have to say anything, but it may harm your defence if you do not mention, when questioned, something you later rely on in court. Anything you do say may be given in evidence.” We have all heard this 1,000 times, yet we barely give a thought to what may happen to all the recordings that the police make of their interviews. Or indeed to the somewhat more mundane equivalent: “This call may be recorded for training purposes.” However, without your permission, or even your knowledge, your recorded voice may be about to play a key role in the race to fingerprint the human voice.

Fuelled by 9/11, spurred on by the advance of digital society and made possible by raw computing power, the development of increasingly sophisticated automated speaker recognition systems (ASRS) is now bringing the prospect of a “voiceprint” enticingly close, threatening to make the skilled voice scientist redundant. These automated systems, already widely used by police and intelligence services on the Continent, use a background population of voices to make a statistical judgement on the significance of any similarity or difference between the voice of the criminal and that of a suspect. They can do so in as little as 15 minutes, a task that could have taken a human 15 hours to complete.

“September 11 was the trigger for this as, after the attacks, the police and intelligence services realised that while there were so many recordings of the voices of terrorists, they didn’t have the technology they needed to extract information from them,” Antonio Moreno says. Moreno is the technical director of Agnitio Corp, which was spun out of the Technical University of Madrid in 2004 and provides forensic automated speaker recognition systems, such as its market-leading Batvox, to the police forces of more than 20 countries, including Germany and the US but not yet the UK.

“By the time of Spain’s own 9/11 [the Madrid train bombings of March 2004], Batvox could be used to identify some of the men behind the bombings as, although they wore masks on YouTube, they spoke naturally.”

For Prof. Peter French, founder of the UK’s leading and oldest forensic speech laboratory, JP French Associates, the bugging, recording and identification of people traffickers, drug dealers and terrorists was only the beginning of this revolution.

“The ubiquity of mobile phones means that almost the first thing you do if you are attacked is call 999, and as all 999 calls are recorded a lot of people inadvertently record their rape or mugging and capture their attackers’ voices,” French says.

Now, though, the “great quest” is to fingerprint the human voice and “many engineers keep telling me that all they need is more time to tweak the algorithms and they can achieve full accuracy”, French says. Francis Nolan, Professor of Phonetics in the Department of Theoretical and Applied Linguistics at the University of Cambridge, agrees that the balance is shifting towards these automated systems due to technical advances that have made them possible. The importance of speaker identification has grown for the simple reason that “it’s not the amount or nature of crime that has changed, it’s just the sheer amount of recorded material that is now available”.

Nolan adds that “while on the Continent the police are more likely to use an automated system, in the UK the tradition has been to use a skilled dialectologist”, who would analyse one at a time the sound of the vowels and even the rise and fall of the voice, its melody, through a complicated system of notation called the International Phonetic Alphabet.

Later, acoustic tests were introduced that allowed the dialectologist to measure the different elements of the speech signal and so extract information that was beyond the ability of humans to hear.

Now, Nolan says, “we are beginning to augment the human element still further through the introduction of automated systems such as Batvox”. By analysing the speech signal, such a system models the characteristics of each speaker’s vocal tract and produces a statistical model that can compare an unknown voice against voices from known speakers, regardless of what they are saying. Batvox, for example, then produces a likelihood ratio, much as a DNA profile does, to suggest how significant such a match is. The system depends on a reference population of hundreds of human voices from which to learn what is the norm. For Nolan, while these ASRS are “an extra tool” in the specialists’ tool box, there is a real danger that these systems hide from a jury the implications of the “complexities of the human voice and language”.
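
For readers curious how a likelihood ratio works in principle, here is a minimal sketch in Python. It is not Batvox's method, which the article does not describe. It assumes, purely for illustration, that each voice is reduced to a single number and that both the suspect's voice and the reference population follow a bell curve; real systems model many acoustic features at once. The function names and all the figures are hypothetical.

```python
import math
import statistics


def gaussian_pdf(x, mean, std):
    """Density of a normal (bell curve) distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def likelihood_ratio(questioned, suspect_samples, population_samples):
    """How much better the suspect model explains the questioned value
    than the reference-population model does.

    Above 1 favours "same speaker"; below 1 favours "someone else".
    """
    suspect_density = gaussian_pdf(
        questioned,
        statistics.mean(suspect_samples),
        statistics.stdev(suspect_samples),
    )
    population_density = gaussian_pdf(
        questioned,
        statistics.mean(population_samples),
        statistics.stdev(population_samples),
    )
    return suspect_density / population_density


# Hypothetical one-number voice measurements (illustrative values only).
suspect = [118.0, 121.0, 119.5, 120.5, 121.0]
population = [95.0, 110.0, 125.0, 140.0, 105.0, 130.0, 115.0, 150.0]

# A questioned recording that measures 120.0 sits close to the suspect's
# own recordings, so the ratio comes out greater than 1.
lr = likelihood_ratio(120.0, suspect, population)
print(f"Likelihood ratio: {lr:.1f}")
```

A ratio well above 1 says the recording is more typical of the suspect than of the population at large; it is a measure of support for one explanation over the other, not a verdict. The sketch also shows why the size and make-up of the reference population matter: change the population and the ratio changes with it.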

However, while French acknowledges that they are “unlikely ever to do away with the human altogether”, he argues that automated systems “are improving in their accuracy and objectivity and that there is some resistance to their use in the UK” that is preventing their wider adoption beyond labs such as French Associates.

“Systems such as Batvox provide centre-stage forensic evidence in court, even if just like other forensic tools it’s not ‘beyond reasonable doubt’. It can get near-100 per cent accurate at the moment depending on the quality and amount of speech input.” Even fingerprinting, he adds, “isn’t as reliable as it is portrayed on CSI.” Similarly, Moreno feels the technology is improving and the reliability of other forensic tools is overstated. “Fingerprints are great in the lab but in the actual crime scene are often blurred or incomplete. Phone calls now are pretty stable,” he says. “So it’s comparable in its accuracy to many other tools, although the main problem is that at the moment there aren’t enough voices in the databases.” Some financial institutions are now building databases of suspect fraudsters’ voices.

Perhaps the main problem these systems face, apart from background noise, is the length of the call. “Blackmail or kidnapping calls may only last five or six seconds,” Moreno says. “And you need six or seven seconds for an accurate result. Although even in these cases if the system can link the call to three or four other kidnappings then it is a great help to the police.”

In the end, for French, even though many engineers believe they can reach the holy grail of fingerprinting the human voice, “if it turns out that people’s vocal tracts and speech-producing organs don’t differ enough biologically from each other then there will be a limit to how accurate systems will be”.

Nolan feels more strongly that “an individual doesn’t have a voice, but many voices” so that a human specialist is always going to be needed to make a judgement. — The Independent

Chip converts snapshots into perfect pictures

A cutting-edge chip can instantly convert your smartphone snapshots into more realistic, professional-looking ones, an American study says.

Built by MIT’s Microsystems Technology Lab, the chip can instantly create more realistic or enhanced lighting in a shot without destroying its ambience. The technology could be integrated with any smartphone, tablet computer or digital camera.

Existing computational photography systems tend to be software applications that are installed onto cameras and smartphones, said Rahul Rithe, graduate student in MIT’s Department of Electrical Engineering and Computer Science, who led the project.

However, such systems consume substantial power, take a considerable amount of time to run, and require a fair amount of knowledge on the part of the user, according to an MIT statement.

“We wanted to build a single chip that could perform multiple operations, consume significantly less power compared to doing the same job in software, and do it all in real time,” Rithe said. — IANS

THIS UNIVERSE
PROF YASH PAL

What is the building element of energy?

I am more than a little surprised by your question, because it can be both trivial and philosophically deep. Energy is not a thing that you can carry in your arm or pocket. It does not have colour, or texture. But it can be measured; it can be positive or negative with respect to some measure. It is usually defined as the capability to do work. When I reached this point, my grandson pointed out that there are lots of good answers available on the Internet, and he went on to push some tabs. I soon discovered that almost the same words lead you to very different worlds. I could not say that most of them were false and trivial, but I could not help feeling that we might be wasting a lot of good thinking and energy (yes, energy) doing nothing, no work either. I think that you would have got the meaning of my first sentence. This is turning out to be a silly answer to a rather meaningless (or meaningful?) question.

Readers can e-mail questions to Prof Yash Pal at palyash.pal@gmail.com

Trends

The Space Exploration Technologies, or SpaceX, Dragon spacecraft stands inside a processing hangar at Cape Canaveral Air Force Station in Florida in this undated picture. NASA and its international partners are targeting March 1, 2013, as the launch date for the next cargo resupply flight to the International Space Station by SpaceX. SpaceX’s Dragon capsule will be filled with about 1,200 pounds of supplies for the space station crew and experiments being conducted aboard the orbiting laboratory. — Reuters/NASA handout

Mini planet found far beyond Earth’s solar system

CAPE CANAVERAL, Florida: Astronomers have found a mini planet beyond our solar system that is the smallest of more than 800 extra-solar planets discovered, scientists said. The planet, known as Kepler-37b, is one of three circling a yellow star similar to the Sun that is located in the constellation Lyra, about 210 light years away. One light year is about 10 trillion km. Kepler-37b as well as two sibling planets were discovered with a NASA space telescope of the same name, which studies light from about 1,50,000 sun-like stars.

NASA Mars rover ready to analyse rock powder

CAPE CANAVERAL, Florida: NASA’s Mars rover Curiosity, dispatched to learn if the planet ever had ingredients for life, drilled its first bit of powder from inside a potentially water-formed ancient rock, scientists said. The robotic geology station, which landed inside a giant impact basin on August 6 for a two-year mission, transferred about a tablespoon of rock powder from its drill into a scoop, pictures relayed by the rover Wednesday showed.

Subatomic calculations indicate finite lifespan for universe

BOSTON: Scientists are still sorting out the details of last year’s discovery of the Higgs boson particle, but add up the numbers and it’s not looking good for the future of the universe. “If you use all the physics that we know now and you do what you think is a straightforward calculation, it’s bad news,” Joseph Lykken, a theoretical physicist with the Fermi National Accelerator Laboratory in Batavia, Illinois, said. “It may be that the universe we live in is inherently unstable and at some point billions of years from now it’s all going to get wiped out,” he added.

Asteroid may have killed dinosaurs quicker than scientists thought

CAPE CANAVERAL, Florida: Dinosaurs died off about 33,000 years after an asteroid hit the Earth, much sooner than scientists had believed, and the asteroid may not have been the sole cause of extinction, according to a study. Earth’s climate may have been at a tipping point when a massive asteroid smashed into what is now Mexico’s Yucatan Peninsula and triggered cooling temperatures that wiped out the dinosaurs, researchers said. — Reuters