Or use techniques like those in the article, such as playing random keypress sounds over the real ones.
For example, a Twitch streamer enters responses into their stream-chat with a live mic. Later, the streamer enters their Twitch password. Someone employing this technique could reasonably be able to learn the audio from the first scenario, and apply the findings in the second scenario.
Or possibly the exact opposite of that, I can't tell if it's a one-to-one mapping on mobile: https://www2.b3ta.com/buffyswear/
(Also, I'm feeling my age now, given how many years have elapsed since that kind of thing passed for internet culture…)
The goal is to cause the eavesdropper to totally reevaluate their life choices, and maybe even get caught up in the story.
I’m sure it depends on the application to some extent. I can type my PIN without looking at all, so I can cover it up while doing it. If I had to hunt and peck, it’d be easier for an onlooker to observe my slower motions, I think.
But if I used the same machine often enough to produce wear specific to me, this randomization would be really useful.
So, in case that “what” was intended to denote some confusion, there is the most likely source.
Do we need something similar for microphones too?
If you cared enough about the authentication in the first place to bother with 2FA, then I guess it seems like the reduction there is still something to be worried about, right?
Lots of “two factor authentication” schemes seem to involve just getting a text or something, so, not very secure at all. Of course, this is bad 2FA, but it is popular.
Otherwise, you leave behind grease where your fingers touched
https://www.theverge.com/23810061/zenith-space-command-remot...
Yes, that's also obscurity, but obscurity is actually good - it only got a (deservedly) bad reputation from when it gets used as a substitute (but I fail to see how using a nonstandard keyboard layout would even count as obscurity in the context of an audio attack, as the clear text reference would surely go through the same layout?)
"...passwords, discussions, messages, or other sensitive information..."
Is it more of a physical fingerprint of each key, such that if you swapped keys/springs the model would need to be updated? So it's produced by manufacturing inconsistencies, the way individual typewriters used to be forensically identified?
Or is it more that each key is identical, but produces a different resonance pattern within the keyboard/laptop due to the shape of all of the matter surrounding it? If you move the keyboard in the room, do you have to re-train the model?
I also wonder how much it varies depending on how hard you press each key -- not at all or a great deal? And what about by keyboard -- when you compare thin MacBook keys with an external full-height keyboard, is one easier/harder to recognize each key on than the other?
I'm pretty sure Zoom does this by default as part of its noise cancellation (it's potentially even easier since you can use keydown events to help identify, not just the audio stream).
So as long as basic default noise cancellation is on, that would at least prevent this over regular videoconferencing. And because of this, I'm having a hard time thinking of when else this would be a realistic threat, where the attacker wouldn't already have enough physical access to either install a regular keylogger or else a hidden camera.
A couple years ago for a weekend project I made a simple "audio-mnist" dataset from handwritten digit audio recordings. I never got past a few days worth of work, but open-sourcing it has been on my mind for a minute. This post kicked me into action. Getting some more data, basic CNN examples, etc. could provide a nice starting point for a lot of research and tools.
There is still separate code I'd have to find and make intelligible to create the recordings and split the audio.
Anyway, in case anyone finds part of this process interesting or useful.
Asking “what signal is it detecting” might be better framed as “what is the most information-bearing signal being used”… which would help in averting attacks.
This kind of stuff could be a real menace in all sorts of public places like airports, coffee shops, etc.
https://en.m.wikipedia.org/wiki/Tempest_(codename)
TEMPEST considered almost everything from electromagnetic leakage to exactly the attack described here.
Even if you flip a few letters from something like the above a human attacker will easily be able to fix it manually.
"horswstaplevatterucorrect" for example is still intelligible.
Offline you need the database which isn't public.
Online you usually need something else on new machines to get at the true master password.
Keystrokes should only be a problem when noise suppression is set to low/off, which you want to do for e.g. playing music.
But noise suppression is applied to sending audio, not receiving it. So you might need to tell your coworkers to re-enable their noise suppression.
You don’t need to guess every character.
I've always been a bit fascinated by this attack vector and wondered if it would get to this point.
Then you simply have the password cracker start trying passwords ordered by probability, and I bet it breaks your sentence within very few tries.
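To make the "ordered by probability" part concrete, here is a rough sketch of how a cracker could rank guesses. It assumes the audio classifier hands back a few candidate letters per keystroke with a confidence for each; the function name and the numbers below are made up for illustration and are not taken from any of the papers.

```python
import heapq
from math import prod


def ranked_guesses(per_position, limit=10):
    """Return up to `limit` (string, probability) pairs, most probable first.

    per_position: one list per keystroke of (char, probability) candidates.
    These probabilities are hypothetical classifier outputs.
    """
    # Sort each keystroke's options so index 0 is the best guess.
    options = [sorted(opts, key=lambda cp: -cp[1]) for opts in per_position]

    def score(idx):
        # Treat keystrokes as independent: joint probability is the product.
        return prod(options[i][j][1] for i, j in enumerate(idx))

    start = tuple(0 for _ in options)
    heap = [(-score(start), start)]
    seen = {start}
    out = []
    while heap and len(out) < limit:
        neg, idx = heapq.heappop(heap)
        text = "".join(options[i][j][0] for i, j in enumerate(idx))
        out.append((text, -neg))
        # Successors: demote exactly one keystroke to its next-best option.
        for i in range(len(idx)):
            if idx[i] + 1 < len(options[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))
    return out


if __name__ == "__main__":
    keystrokes = [
        [("h", 0.9), ("j", 0.1)],
        [("o", 0.6), ("p", 0.4)],
        [("r", 0.7), ("t", 0.3)],
    ]
    for text, p in ranked_guesses(keystrokes, limit=3):
        print(text, round(p, 3))
```

With those invented numbers the first three guesses come out as "hor", "hpr", "hot". A real cracker would also weight by a language model or password list, which this sketch leaves out.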
Here's a few random papers I read along the way:
https://doi.org/10.1007/s10207-019-00449-8 - SonarSnoop, which uses a phone's speaker to produce ultrasonic audio that can be used to profile the user's interaction (e.g. entering swipe-based passcodes).
https://people.eecs.berkeley.edu/~daw/papers/ssh-use01.pdf - "Timing Analysis of Keystrokes and Timing Attacks on SSH", a paper from 2001 that uses statistical models of keystroke timings to retrieve passwords from encrypted SSH traffic.
https://doi.org/10.1145/1609956.1609959 - "Keyboard acoustic emanations revisited", which uses hidden Markov models and some other English language features to recover text based on classification via cepstrum features.
https://doi.org/10.1145/2660267.2660296 - "Context-free Attacks Using Keyboard Acoustic Emanations" which uses a geometric approach, using time-difference-of-arrival to estimate physical locations probabilistically.
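For anyone wondering what time-difference-of-arrival looks like in practice, here is a toy sketch. It is not the method from the paper above, just the basic idea: slide one microphone's recording against the other and find the lag where they line up best. The two-microphone setup, the function names, and the sample data are all assumptions for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s, roughly, at room temperature


def best_lag(a, b, max_lag):
    """Return the lag (in samples) at which b best lines up with a.

    A positive result means the sound reached microphone b later than a.
    """
    def corr(lag):
        # Sum a[n] * b[n + lag] over the indices where both are valid.
        return sum(a[n] * b[n + lag]
                   for n in range(len(a))
                   if 0 <= n + lag < len(b))

    return max(range(-max_lag, max_lag + 1), key=corr)


def path_difference_m(lag_samples, sample_rate_hz):
    """Convert a sample lag into a difference in travel distance (metres)."""
    return lag_samples / sample_rate_hz * SPEED_OF_SOUND


if __name__ == "__main__":
    mic_a = [0, 0, 1, 0, 0, 0, 0, 0]  # click arrives at sample 2
    mic_b = [0, 0, 0, 0, 0, 1, 0, 0]  # same click arrives at sample 5
    lag = best_lag(mic_a, mic_b, max_lag=4)
    print(lag, path_difference_m(lag, 44100))
```

A 3-sample lag at 44.1 kHz works out to a path difference of about 2.3 cm, which gives a feel for how fine the spatial resolution could be with an ordinary sample rate.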
At ACM CCS 2005, Zhuang, Zhou and Tygar presented Keyboard Acoustic Emanations Revisited [1]:
"We examine the problem of keyboard acoustic emanations. We present a novel attack taking as input a 10-minute sound recording of a user typing English text using a keyboard, and then recovering up to 96% of typed characters. There is no need for a labeled training recording. Moreover the recognizer bootstrapped this way can even recognize random text such as passwords: In our experiments, 90% of 5-character random passwords using only letters can be generated in fewer than 20 attempts by an adversary; 80% of 10-character passwords can be generated in fewer than 75 attempts. Our attack uses the statistical constraints of the underlying content, English language, to reconstruct text from sound recordings without any labeled training data. The attack uses a combination of standard machine learning and speech recognition techniques, including cepstrum features, Hidden Markov Models, linear classification, and feedback-based incremental learning."
It builds on the work of Asonov & Agrawal [2], who came up with the idea the previous year (2004):
"We show that PC keyboards, notebook keyboards, telephone and ATM pads are vulnerable to attacks based on differentiating the sound emanated by different keys. Our attack employs a neural network to recognize the key being pressed. We also investigate why different keys produce different sounds and provide hints for the design of homophonic keyboards that would be resistant to this type of attack."
[1] https://dl.acm.org/doi/10.1145/1609956.1609959
Even better if the target uses a passphrase: "hXXXse battXXX stXXXXX cXXXXXX" becomes interpretable given a few landmark letters identified with high probability.
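A quick sketch of how little work the landmark-letter trick takes: treat each unknown keystroke as a wildcard and filter a word list. The tiny word list and the masks here are my own toy assumptions (with exact word lengths), and a real attacker would use a full dictionary.

```python
import re


def candidates(masked, wordlist, unknown="X"):
    """Return the words matching `masked`, where `unknown` is any letter."""
    pattern = re.compile(
        "".join("[a-z]" if ch == unknown else re.escape(ch) for ch in masked)
    )
    return [w for w in wordlist if pattern.fullmatch(w)]


# Hypothetical stand-in for a real dictionary.
WORDS = ["horse", "house", "hoarse", "battery", "battles",
         "staple", "stapler", "correct", "compact", "curtain"]

if __name__ == "__main__":
    for mask in ["hXXse", "battXXX", "stXXXe", "cXXXXXt"]:
        print(mask, candidates(mask, WORDS))
```

Even with this toy list, each mask narrows to one or two words, so the number of full passphrases left to try is tiny.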
I can even pick out some of my breathing from the recording.
If I turn on noise suppression and noise gate it's fine.
I suspected that the famously terrible Treasury Direct website with its on-screen keyboard was a half-assed attempt to prevent this sort of attack.
This topic has me wondering, though, whether it's possible to detect finger positioning, or for that matter screen information, from the reflection off the typist's eyeballs/eyeglasses shown in a webcam. Or perhaps, even if it's possible in principle, most webcam resolution is simply too poor for that in practice.
I'm sure customer frustration was huge.
I don't use one but I know people who swear by them.
Also this is an extremely obvious result. Typing is obviously a form of "penmanship", it was well known that telegraph operators could identify each other by how they tapped out Morse code in the 1800s.
People have been able to do this based upon key stroke latency and even identify people based on habitual mouse patterns for decades.
Audio recordings work as yet another reliable proxy? Shocked!!
I am amazed that people can do such obvious things and get published, have articles written on them... I need to get in on that, sounds easy
I can make a web demo. You turn on the microphone and type a couple of things into a box in the web browser.
Then you go to a different window and continue typing, and the model predicts what you are typing. As long as it's proper grammar you can get to effectively 100% accuracy. It'll appear to be spooky magic.
I just might take the time.
The keyboard had custom switches that were very loud. And he typed fast - it was like living on a gun range. Everyone in the office probably would have chipped in for a hitman, but alas, the CTO, whose office had a solid door, was “inspired” that the mechanical feedback helped fuel inspiration in boy wonder.
Had we thought of the security risks of the keyboard, I would have brought good scotch to the infosec dude while expressing my concerns.
[0] Okay... deep breath
Konami is a pachinko manufacturer with a side hustle making rhythm games for Japanese arcades. They have an online service that all their games connect to called e-Amusement. You can log into it using an e-Amusement Pass card, and your card is locked to a PIN number you have to set up when you first use it. Cabinets with touchscreens give you a touch keypad, except all the digits are shuffled around, which is a total pain in the ass and you have to do this for every credit.
Hacker: man, I hate typing passwords. Do you use password managers? Any reccos?
… I am become hacker, destroyer of tedunangst’s bank account.
The locations of the numbers move around to prevent mouseloggers from recording your movements.
It seems like any way of doing it would end up slowing down the typist though. If it is just for the password, I could see it being possible, but if you're dealing with lots of information that needs to be protected, then it seems impossible.
I unironically think I've seen that config recently - someone had an actually quiet keyboard but wanted the full Mechanical Keyboard Effect™ so they just... have it play the sound per keypress. (It was not 100% clear to me whether it was an elaborate joke or a real aesthetic choice)
Else, something like Mai Tais on the beach sounds more fun, maybe it's just me...
If you have mechanical switches, you want to learn to type just past the actuation point and not until the switch bottoms out. This is relatively easy with tactile switches (they have a bump and the actuation point is immediately after the bump). However, with linear switches, you don't feel when you have hit the actuation point. So the piezo speaker can be used during the first weeks to train your muscle memory of where the actuation point is, so that you can type lightly.
I had this on my Kinesis Advantage with Cherry Reds, and it was really nice during the initial days/weeks, after which I turned it off.
Ij on-tep of sentenca lentg, it's alio sentemce-bused ("corvect harse batterg stapfe") then ut would be quiti eady to guess even wits worse accurasy.
(If on top of sentence length, it's also sentence-based ("correct horse battery staple") then it would be quite easy to guess even with worse accuracy.)
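You don't even need a human for that repair step. Here is a minimal sketch using Python's standard `difflib` to snap each noisy word to the closest entry in a vocabulary; the four-word vocabulary and the `repair` helper are assumptions for illustration, and a real attack would use a large dictionary plus some language modelling.

```python
import difflib

# Hypothetical vocabulary; a real one would be a full word list.
VOCAB = ["correct", "horse", "battery", "staple"]


def repair(noisy_sentence, vocab, cutoff=0.6):
    """Replace each token with its closest vocabulary word, if any is close."""
    fixed = []
    for token in noisy_sentence.lower().split():
        match = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
        # Keep the token unchanged when nothing is similar enough.
        fixed.append(match[0] if match else token)
    return " ".join(fixed)


if __name__ == "__main__":
    print(repair("corvect harse batterg stapfe", VOCAB))
```

With one wrong letter per word, every token still lands on the intended word.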
There is enough energy in a key press/release to be usable for sending a radio signal; however, it won't be sufficient to keep transmitting while a key is held down. A combination of a solar panel, piezoelectric keys and a tiny li-ion (as backup) may be sufficient for a 'battery-less' keyboard, but it will be too expensive.
You can also use and require a hardware FIDO2 token as a second factor.
(With 1Password, the master password is not enough to do a remote account takeover, you also need the second-factor key. And you can't snoop it, since it is only required during the first login, so a user will never type it after that.)
My sense is that they profile the person more than the keyboard.
Thanks for this metaphor. I know of at least one guy to whom it could be applied as well.
Lagniappe: “To temporarily silence bucklespring, for example to enter secrets, press ScrollLock twice”
Better yet, play some white noise around you. I heard that it's actually done sometimes at really important meetings.
If you're not such a VIP, just type important things only on your phone; touch screens don't produce enough sound, hopefully.
But really, should be fun ... the laptop dock mic will be great for this. If it's external you're in trouble ... but the researchers just used the onboard so it'll be fine.
It would have its own set of problems: no two people could use it at once, eavesdropping would be really easy… but it’d have its own set of interesting applications.
On-screen PIN entry with jumbled number mappings does the same thing. It also makes the inter-stroke delay rather independent of position, because the brain has to search the screen (although repeated digits and previously occurring digits are quicker, which is why some jumble at every keystroke).
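A jumbled keypad is simple to sketch. The idea is that an eavesdropper who only learns which positions were pressed (by sound, mouse movement, or smudges) learns nothing without the layout. The function names here are hypothetical, and `secrets.SystemRandom` is used on the assumption that the shuffle should come from the OS's cryptographic randomness rather than a predictable generator.

```python
import secrets


def shuffled_keypad(digits="0123456789"):
    """Return the digits in a fresh random order, one per on-screen position."""
    keys = list(digits)
    secrets.SystemRandom().shuffle(keys)
    return keys


def digits_from_presses(layout, pressed_positions):
    """Map the positions the user pressed back to the digits shown there."""
    return "".join(layout[p] for p in pressed_positions)


if __name__ == "__main__":
    layout = shuffled_keypad()
    pin = "4821"
    # What an observer of positions would see for this layout:
    positions = [layout.index(d) for d in pin]
    print(layout, positions, digits_from_presses(layout, positions))
```

Calling `shuffled_keypad()` again before each digit would give the re-jumble-at-every-keystroke variant, at the cost of even slower entry.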
Keyboards with OLED keys (like the Apple Touchbar or the Optimus[1]) might also work.
As far as I know, Cherry blues only click once and the second sound you hear on a keypress is just the topping out sound.
This attack is about as realistic as the film: a parallel universe where million to one chances happen nine times out of ten.
"Just need to type in my password." He says a little too loudly to nobody. Then just type in the honeypot password and login with the real one that you entered with a virtual keyboard a few minutes ago.
Meanwhile you've got a prerecorded keyboard going concurrently that decodes to "I know what you're trying to do. Clever but not clever enough."
And I guess you might as well have a special keyboard that you only use for typing in passwords while you're at it.
Note that the testing data in the confusion matrix appears to have a uniformish distribution of each key being pressed. I suspect this data was not generated by someone actually typing because you would rarely see numbers and rare letters. It is possible these were simply pressed one at a time rather than in a series of rapid presses.
My guess is this approach uses the mic to identify where the sound of the key press was coming from rather than what each key press sounds like. Which does not invalidate the results but may make it seem less magical. Tbh it’s probably much worse this way because such a model could probably generalize very well across all keyboards and typing styles.
It would be nice to try to tokenize the strokes and then try to assign labels probabilistically.
I also remember typewriters and old IBM style mechanical keyboards being quite heavy to activate, subjectively needing more pressure than some chiclet style "shock" (which I can barely feel).
Quiet switches for the office, clicky switches for home. Not exactly a hard problem to solve :)