The shape of IT services has changed significantly in just the last few years, as Microsoft, the biggest company in office hardware and software, has pivoted ever more decisively towards artificial intelligence.

Whilst Windows 10 offered Cortana as a largely optional assistant, its successor Copilot has been integrated in increasingly non-optional ways into the Windows 11 user experience, to the point that Microsoft Office has even been renamed Microsoft 365 Copilot to symbolise the integration.

In a recent podcast discussion between Windows and Devices head Pavan Davuluri and Microsoft’s AI Product Manager Christiaan Brinkhoff, the pair hinted that the future of IT would not use a mouse and keyboard but instead be primarily voice-driven.

This is a promise that has been made countless times before, but given some of the questionable decisions and unfortunate security issues that have emerged from Copilot's integration so far, there is every reason to believe that Microsoft intends Windows 12 or 13 to be primarily voice-driven.

For most businesses, the problem with this vision is less about technological limitations and more about practical usability and the security implications it could create.

To understand why, it is worth taking a quick look at previous attempts at voice-driven computing, and then examining why the keyboard and mouse are more likely to stick around than the futurist vision at Microsoft would suggest.

The History Of The Talking Computer

The broad concept of voice computing, or a collection of systems where the human voice is used to operate computers, has historically been seen as the endpoint of computer interfaces, as conventional wisdom suggests that asking a computer to do a task like you would another human being is the most intuitive way of operating it.

The earliest attempts at speech machines predate computers, and are believed to have been attempted by Wolfgang von Kempelen, the inventor of the Mechanical Turk, an alleged early artificial chess player that turned out to be an impressively elaborate hoax.

Unlike his chess-playing con, the speaking machine took him 20 years to make and consisted of a bellows-like contraption capable of making human speech sounds, although it did require a human operator to play it like a musical instrument.

The first computer to recognise human speech rather than simply record or produce it was Bell Labs’ AUDREY computer in 1952. It could recognise ten sounds, which were all ten single digits as spoken by its inventor. It had an accuracy rate of 90 per cent.

A decade later, Shoebox by IBM could recognise 16 words, although ten of them were the same zero-to-nine digits as before.

Another decade later, Hearsay-I was the first system able to transcribe general speech as opposed to single words, and by 1976, Harpy was able to recognise 1,000 words.

Function And Frustration

Voice recognition improved substantially throughout the 1980s, with projects such as the Xanadu futurist house and the Halcyon video game console showing that voice recognition was possible in theory, even if in practice it was still too limited for the home.

The biggest issue at the time was processing delays, which made it far slower and far more frustrating than simply using a remote control, keyboard or mouse.

This was an issue even in the mid-2000s. The 2004 PlayStation 2 video game Lifeline (Operator’s Side in Japan) relied exclusively on voice controls that ultimately made the action-packed gameplay almost impossible due to a mix of poor recognition of accents and slow processing of commands.

By the start of the 2010s, voice interfaces would both improve in accuracy and become more accessible, most notably with Apple's Siri system in 2011.

Is Voice The Best Way To Use A Computer?

Much of the push for voice-driven computing is predicated on an idea rooted largely in science fiction that a voice-based interface is the most intuitive way to use any electronic device. Rick Dyer, inventor of the failed Halcyon, noted that whilst not everyone knew how to type, everyone knew how to talk.

For consumer electronics, this has demonstrably found a market through devices such as smart speakers and televisions, but would Microsoft going all-in on voice-driven computers actually be the best option for businesses?

Voice recognition has become faster, but it is still relatively slow compared to a fast typist or a quick mouse click. Asking a computer to "open a new Word document" takes longer than simply clicking the icon.

Similarly, whilst the accuracy of large language models in general and voice recognition tools in particular has improved, it is not perfect. Whilst this is fine when using a smart device at home, it can lead to time-consuming corrections, especially in a hypothetical future without a keyboard.

Finally, there are the privacy implications. Whilst the data-harvesting nature of voice-recognition devices is not exactly new, an always-listening office computer captures audio that could potentially lead to a costly data breach.

Notwithstanding the technology itself, having to dictate documents, code or accounting figures rather than using a keyboard creates the risk that people will simply overhear sensitive information.

This means that, for some offices at least, desks will retain a mouse and keyboard for the foreseeable future.
