Input Lag and the Limits of Human Reflex, Part 2


The Manual Tracking Reflex—a Case Study of a Highly Skilled Quake Player

So far, we’ve been talking about a particular type of reaction. Something suddenly happens in the environment, and we respond to it with a movement. An enemy pops out from behind a corner, and we press the fire button. A loud sound is played and we explode off the starting block. But in many situations, we respond continuously to real-time changes in the environment.

Imagine there is an object hovering in your room, moving around in a complex pattern, and your task is to point at it and track it with your finger. It may change speed or direction and you have to keep your finger aimed at it as best you can. This task is very similar to what happens in many gaming situations, and people who excel at it are real forces to be reckoned with. The lightning gun (known as the “LG”) in Quake Live is a classic example.

This weapon fires a “beam of electricity” at a rate of 20 “cells” per second, with each cell dealing out a specific amount of damage. It’s a hitscan weapon, which means that, unlike projectile weapons, each shot reaches its target instantaneously. So as long as you have the crosshair aimed at a target, that target will receive damage the moment you press fire (ignoring the vagaries of internet latency and netcode; see this video for an excellent introduction to this topic, presented by two Overwatch developers). In Quake, people who are good with the LG can be particularly devastating. Here’s a nice demonstration of how powerful a good LG can be.

One of the interesting things about using the LG (or other similar weapons) is that when you are “in the zone”, it feels almost as if your hand is moving by itself to control the mouse. It’s a fascinating and somewhat strange experience, and one that is shared by a number of skilled players with whom I’ve talked about it. There’s a feeling of effortlessness, and a distinct sensation of observing the mouse moving rather than of consciously controlling it, very similar to descriptions of the ideomotor phenomenon.

Recently, I started wondering whether this experience points to a highly efficient closed loop control circuit in the brain that bypasses cortical processing normally associated with conscious perception. This would certainly explain both the sensation of this experience, and the incredibly fast reactions required for this level of performance. So I decided to run an experiment with a friend who goes by the handle ‘kukkii’, and who has excellent tracking aim.

I had him stand still and fire the LG at me while I dodged left and right. As he was standing still, the only way he could track me was by using the mouse, rather than by using the keyboard to move left and right to help track me. I did my best to make my dodge patterns as unpredictable as possible. Here is footage of a frag at full speed from his point of view (I’m the green guy dodging left and right, and the beep sounds indicate that a cell has hit me).

Here is that same frag slowed down:

The red vertical line indicates the enemy position, and the white vertical line indicates a landmark point in the environment (I extracted each frame of the video and then did some image processing to obtain these lines for each frame). These two lines allow me to find, for each frame, the position of the crosshair, relative to this landmark (crosshair is always in the middle of the frame), and the position of the dodging enemy, relative to the landmark. This then allows me to compare the position of the dodging enemy and the position of the crosshair, all across time.
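As a rough illustration of that bookkeeping, the sketch below converts the detected line positions in a single frame into positions measured from the landmark. The function name, the frame width, and the pixel values are hypothetical and are not taken from the actual analysis.

```python
def positions_relative_to_landmark(enemy_x, landmark_x, frame_width):
    """Return (enemy_pos, crosshair_pos) in pixels, both measured from the landmark."""
    # The crosshair always sits at the horizontal centre of the frame.
    crosshair_x = frame_width / 2
    return enemy_x - landmark_x, crosshair_x - landmark_x


# Hypothetical detections for one 1280-pixel-wide frame.
enemy_pos, xhair_pos = positions_relative_to_landmark(
    enemy_x=700, landmark_x=400, frame_width=1280
)
print(enemy_pos, xhair_pos)  # 300 240.0
```

Repeating this for every frame gives two position series, one for the enemy and one for the crosshair, on a common axis.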

This is what that position information looks like:

Blur Buster's Input Lag and the Limits of Human Reflex: Position Waveforms of Enemy and Xhair

These plots show the position (in pixels of the video frames) of the enemy and crosshair over time. Looking at the red curve, you can see that the enemy changes direction seven times. And if you look carefully at the peaks of these curves, you can see that the blue curve lags the red curve by a small offset. The amount of lag here is directly related to the tracking reaction time. If the player were able to lock on to the enemy position instantaneously, so that the crosshair changed direction at exactly the same moment the enemy did, then there would be no offset. By measuring this offset, we can actually measure the tracking reaction time.

But before we do that, there are a couple things we need to do to the data. First, we need to filter it to remove the high frequency noise. The image processing used to extract the position information isn’t perfect to begin with (if you look carefully in the slow motion video, the red and white vertical lines are a bit shaky). Human movement is also not perfectly smooth, and we want to discount information that isn’t relevant to the question we’re asking. Filtering out this “noise” means we have a cleaner signal to work with.
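The article does not say which filter was used, so the following is only a minimal sketch of one common choice: a centred moving average. The function name and window size are assumptions. The window is kept symmetric around each sample so that smoothing does not shift the signal in time, which matters when the goal is to measure a time offset.

```python
def moving_average(samples, window=5):
    """Smooth a sequence with a centred moving average (a simple low-pass filter)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(samples)
    smoothed = []
    for i in range(n):
        # Shrink the window near the ends so it stays symmetric around sample i.
        k = min(half, i, n - 1 - i)
        chunk = samples[i - k : i + k + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# A jittery toy signal: the frame-to-frame wobble is damped.
print(moving_average([0, 3, 0, 3, 0], window=3))  # [0.0, 1.0, 2.0, 1.0, 0.0]
```

A wider window removes more jitter but also rounds off genuine direction changes, so the choice is a trade-off.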

Second, there’s a problem with using position information as a basis for extracting reaction time. In Quake Live, enemies do not change direction instantaneously. Instead, the game engine incorporates inertia, so that when you press a key to change direction, you decelerate over a period of time before changing direction. If someone is tracking a target that is changing direction, they might be able to notice this deceleration before the target actually changes direction. And this would provide an advance warning, or cue, that could be exploited.

For example, if it takes 200 ms from the moment a target starts to decelerate to the moment the target switches direction, and it takes a player 200 ms to react to a cue, then that player would be able to change crosshair direction at exactly the same time the enemy switched direction. And in such a case, the peaks of the position waveforms would line up perfectly. A solution to this problem is to instead look at the acceleration waveforms.

Here is the same data, but now filtered and transformed into acceleration (in pixels/s²):

Blur Buster's Input Lag and the Limits of Human Reflex: Acceleration Waveforms of Enemy and Xhair
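How the position data was turned into acceleration is not described, so the sketch below assumes a standard second central difference. The function name and the synthetic test signal are illustrative only.

```python
def acceleration(position_px, fps):
    """Second central difference of position samples, in pixels/s^2.

    The result is two samples shorter than the input, because the first and
    last samples lack a neighbour on one side.
    """
    dt = 1.0 / fps
    return [
        (position_px[i + 1] - 2 * position_px[i] + position_px[i - 1]) / dt**2
        for i in range(1, len(position_px) - 1)
    ]


# Synthetic check: constant acceleration of 100 pixels/s^2, sampled at 10 frames per second.
fps = 10
position = [0.5 * 100 * (i / fps) ** 2 for i in range(6)]
print(acceleration(position, fps))  # each value is approximately 100
```

Differencing twice amplifies frame-to-frame jitter, which is one reason to filter the position data before this step.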

Now we just need to figure out a way to measure the offset between these two curves. One way to do it would be to draw a vertical line at each peak, and count the time between neighboring peaks. It’s unlikely that the offset between each peak would be identical, but we could take an average. A more elegant solution that achieves something very similar is to measure what’s called the cross correlation.

This is a very cool technique that measures the correlation between two signals as a function of the offset between them. So in the above example, if you leave the curves as they are (offset = 0), and measure the correlation between them, you’ll get a particular value. If you then “slide” the red curve to the right by one unit (offset = 1) while keeping the blue curve in the same position, and measure the correlation, you’ll get another value. You keep doing this, sliding one curve over the other, measuring the correlation each time you do this.
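The sliding procedure can be sketched in a few lines. This is a simplified illustration on synthetic data, not the code used for the analysis: the function names, the test waveform, the 120 samples per second rate, and the built-in 12-sample delay are all assumptions made for the example.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_lag(enemy, crosshair, max_lag):
    """Return the offset (in samples) at which crosshair best matches enemy.

    A positive result means the crosshair trails the enemy by that many samples.
    """
    best_offset, best_r = 0, float("-inf")
    for offset in range(-max_lag, max_lag + 1):
        # Slide one curve over the other and keep only the overlapping part.
        if offset >= 0:
            a, b = enemy[: len(enemy) - offset], crosshair[offset:]
        else:
            a, b = enemy[-offset:], crosshair[: len(crosshair) + offset]
        r = correlation(a, b)
        if r > best_r:
            best_offset, best_r = offset, r
    return best_offset


def wave(i):
    """Synthetic stand-in for an acceleration waveform."""
    return math.sin(2 * math.pi * i / 80) + 0.5 * math.sin(2 * math.pi * i / 31)


FPS = 120   # assumed sampling rate for this example
DELAY = 12  # the synthetic crosshair trails the enemy by 12 samples
enemy = [wave(i) for i in range(240)]
crosshair = [wave(i - DELAY) for i in range(240)]

lag = best_lag(enemy, crosshair, max_lag=30)
lag_ms = lag * 1000 / FPS
print(lag, lag_ms)  # 12 100.0
```

The offset with the highest correlation recovers the delay that was built into the synthetic crosshair signal, which is the same logic used to read a reaction time off real data.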

Here’s what the result of such a procedure looks like when applied to the acceleration curves:

Blur Buster's Input Lag and the Limits of Human Reflex: Cross Correlation of Acceleration Waveform

This shows the correlation for each of those offsets we talked about. The dashed vertical line shows the offset that produced the peak correlation. This is telling us how much we had to slide one curve over the other in order to achieve the best match. And for our purposes, this is a great way of determining the reaction time of the player. For this frag, this was a value of 112.5 ms, which is a very fast reaction time. (Note: cross correlation is also the technique I used to detect the enemy and landmark positions in the image processed video, except there it was a two dimensional cross correlation, where the “signal” was slid over the image to find the best match).

Here’s another example. First the full speed footage:

And now the image processed footage:

Here is the data from this frag:

Blur Buster's Input Lag and the Limits of Human Reflex: Aggregated Frag Data

The reaction time for this frag is 87.5 ms!

Using this technique, I analyzed four separate frags from kukkii, and got the following reaction times:

  • 112.5 ms
  • 116.6 ms
  • 87.5 ms
  • 87.5 ms

Keep in mind that each of these values represents something of an average to begin with, since there were multiple individual reactions during each frag (multiple peaks in the plotted curves).
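For a quick summary of those four values, a short calculation using only the numbers listed above gives the mean and the spread:

```python
from statistics import mean

# Reaction times for the four analysed frags, in milliseconds.
reaction_times_ms = [112.5, 116.6, 87.5, 87.5]

average_ms = mean(reaction_times_ms)
print(f"mean: {average_ms:.1f} ms")  # mean: 101.0 ms
print(f"range: {min(reaction_times_ms)} to {max(reaction_times_ms)} ms")  # range: 87.5 to 116.6 ms
```

With only four frags, this is a rough indication rather than a reliable estimate.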

Now there are a few limitations of this approach. First, it’s possible that kukkii was using pattern recognition rather than pure reflex to guide his aim. I was doing my best to generate unpredictable, random strafe patterns when I was dodging, and this is something I’m generally quite good at, but a proper scientific test would require that the target movement is controlled by a random number generator rather than a human. To this end, I have some interesting ideas about how to achieve this in the future.

Second, it would be much cleaner to extract position data directly from the game demo file, rather than using image processing (this can potentially be achieved using UberDemoTools). For one, a lot of that high frequency noise would be eliminated (the jitter in the white vertical line for example). Also, using the position of a single vertical line relative to image frame coordinates to estimate actual position in game world coordinates has limitations.

For example, if a player is moving towards the left in a straight line and at a constant velocity, the projection of this player onto the image is not linear. As the player reaches the edge of the frame, the projection will appear to move slower and slower to the left, even though the player is moving at a constant velocity. This is because objects that are further away from the “camera” appear smaller (we seem to take for granted that objects far away from us appear smaller, but this is really only due to the way that the lenses of our eyes manipulate light).
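One way to see that pixel position is not a linear measure is to convert a pixel offset into a viewing angle. The sketch below assumes a simple pinhole (rectilinear) projection, which is an assumption about how the footage was rendered; the function name, the frame width, and the two field-of-view values are hypothetical.

```python
import math


def pixel_to_angle_deg(offset_px, frame_width_px, hfov_deg):
    """Horizontal angle (degrees) of a pixel offset from the frame centre.

    Assumes a pinhole camera with the given horizontal field of view.
    """
    focal_px = (frame_width_px / 2) / math.tan(math.radians(hfov_deg / 2))
    return math.degrees(math.atan(offset_px / focal_px))


WIDTH = 1280  # hypothetical frame width in pixels
for hfov in (90, 20):
    # A point halfway between the frame centre and the frame edge.
    halfway = pixel_to_angle_deg(WIDTH / 4, WIDTH, hfov)
    linear_guess = hfov / 4  # what a purely linear pixel-to-angle mapping would give
    print(f"FOV {hfov}: actual {halfway:.2f} deg, linear guess {linear_guess:.2f} deg")
```

Under this assumption, the wide field of view shows a gap of several degrees between the actual angle and the linear guess, while the narrow one shows almost none.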

Because of this, there is a compression of distance as things move further away from us, whether they’re moving to our left and right, or directly away from us. This problem was reduced by using a relatively narrow field of view (FOV) when doing the image processing (the in game footage is zoomed in), but it is something worth mentioning. Finally, the measurements are limited to the precision of the tickrate of the game server (40 Hz), the frame rate and refresh rate that kukkii was running Quake Live at (250 fps @ 144 Hz), and the frame rate used to capture the action (120 fps).
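To put those rates in perspective, the following sketch converts each one into the time between successive updates. The dictionary labels are mine; the rates are the ones quoted above.

```python
def period_ms(rate_hz):
    """Time between successive updates, in milliseconds."""
    return 1000.0 / rate_hz


rates_hz = {
    "server tickrate": 40,
    "game frame rate": 250,
    "display refresh rate": 144,
    "capture frame rate": 120,
}

for name, rate in rates_hz.items():
    print(f"{name}: {period_ms(rate):.2f} ms per update")
```

The coarsest of these is the 40 Hz server tick at 25 ms per update, which is a sizeable fraction of a reaction time of around 90 ms.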

If it turns out that kukkii, and others like him, are genuinely able to achieve reaction times of ~90 ms when tracking targets, then this could change our understanding of the limits of human performance (and if this finding does indeed survive more rigorous scrutiny, then I propose the name “manual tracking reflex”).

Sound may also play a role here. When the beam is hitting the target, a series of beeps is heard. Perhaps the sudden absence of the beeps when the enemy dodges away from the beam is an auditory cue that can be used to enhance reaction time. Testing with and without sound would be important to more fully understand this phenomenon.

There is one more piece of evidence in favor of the manual tracking reflex being something distinct from a simple reaction task: when I tested kukkii in a simple detection task, his average reaction time was about 220 ms, which is a bit better than average, but certainly not spectacular. This marked difference in performance between the two tasks is suggestive of two distinct underlying neural mechanisms.

OK, so what does any of this have to do with input lag? Even if we assume that the upper limits of human reaction time are around 85 ms, this is still at least an order of magnitude greater than the input lag of many displays. Well, here’s a great video from Microsoft that shows how noticeable 10 ms of input lag is. Also, consider this observation from the Human Benchmark website:

It’s interesting to see that the recorded reaction times have actually gotten slightly slower over the years, which is almost certainly due to changes in input / display technology.

(source)

What they’re referring to here is the change from CRTs (cathode ray tubes) to LCDs (liquid crystal displays). CRTs have an incredibly low latency, essentially limited by the speed of electrons. Prad.de measured the time between the moment a CRT receives information from the VGA cable and the moment a photodiode detects light from the phosphors as 670 nanoseconds!

There are 1000 nanoseconds in a microsecond, and 1000 microseconds in a millisecond, so 670 nanoseconds is about two-thirds of a thousandth of a millisecond! CRTs are rarely used these days, although a few people, myself included, enjoy them tremendously. Modern LCDs are much faster than their predecessors, and a good gaming LCD will have less than 5 ms of display latency. But to return to the question we posed earlier, does it really matter if one display is 5 ms faster than another? To help answer this question, I’ve run two separate simulations to see how display latency affects in-game performance…



8 Comments For “Input Lag and the Limits of Human Reflex”

myCatnip

Can’t edit: Oh and in audio I’m absolutely always sub 100ms and generally sub 90ms with whatever lag my hardware has, non startle response, never tested that, don’t even know how to test that without potentially harming my ears.

Maybe the zoned out sub 120ms visual results are a form of startle; I have to kinda focus on zoning out and telling myself to just react, sort of mentally binding any change -> react (and I don’t click then, it’s more like my whole body suddenly tenses, and I’ll react to ANY change; dog barks, something happens outside, whatever, I’ll react in that state)

Why did I do this? I was curious how fast I could go and was trying to get past what appeared to be my hard limit for conscious reaction of 123ms. No matter my state, no matter the day, I never saw anything faster than 123ms, so I tried alternative approaches (without changing hardware). At first I doubted the results when I saw 109 and 103ms occasionally, assuming I had predicted it, but eventually I found I could get into this state more consistently (but never really consistent), and had runs with 3 or 4 sub 120ms results.

Interestingly there is also more variance in this zoned out response; I might get 99ms then 115ms, while my conscious reaction time is highly consistent so long as I’m not tense or exhausted.

myCatnip

I am 39 and I test 130-150ms average, assuming I’m not tired etc. I did it right now and got 143ms average with 136ms best time 147ms worst. I had 4 hours sleep today but I’m not really tired.

Sometimes I can sort of zone out and get 3 or 4 times sub 120ms. These aren’t practically useful for games, I feel like I’m somehow bypassing some part of normal processing; I’m only consciously aware of the colour change after I’ve already clicked, unlike my normal 130-150ms times where I feel the click is a conscious action after seeing the colour change.

Furthermore there is the signal path length; I am just over 183cm tall. I have a naturally longer path length for the signal than someone who is 150cm tall with proportionally shorter arms, or a small child.

My averages dropped about 10ms when upgrading from a shitty 144hz benq to 280hz with excellent response times. I also dropped another ~8ms averages when changing to an optical switch mouse, so that’s ~18ms drop going from “very good” to “even better” hardware.

The 270ms average figure is complete nonsense: I just did the test on my phone: over 300ms. The averages there mean nothing and I honestly believe that very few people have a reaction time worse than 200ms in reality.

Also reaction time does not get meaningfully worse with age until a certain point, which depending on lifestyle is typically around 60-85 years old. Forgot where I found the study, but that was my rough takeaway from it (there are exceptions ofc).

pierow

Great work and interesting article. I have a few criticisms and some insight:

For the most part you only stress reaction time in relation to input lag, but perception and cognition is also extremely important. From taking the A/B input lag test someone made on the forums here, myself and a friend were able to pass the test down to recognizing a 5ms difference. From playing games with varying levels of input lag it is clear that just feeling a small amount of input lag despite a consistently high fps plays a major factor in coordination.

Regarding your tests, I play at a similar level so I can offer some additional input. I agree that oftentimes this type of aim isn’t conscious. Pattern recognition definitely plays a part. There’s a phenomenon amongst pro play in quake and many other fps games that sometimes standing still is the best dodge to make in certain situations because it’s the least predictable. It looks like the end of frag 4 is prediction based, judging by the charts and the opponent’s movement pattern. It’s worth noting that dodging is also reaction based and you may have been subconsciously trying to move the opposite way of his beam (tough with latency) and forced into a pattern. It might be worth blinding the dodger if you don’t completely randomize it or getting someone skilled at dodging. The fact that dodging is a skill of varying degrees even amongst pros points towards patterns playing a major role as well. I’d say that movement patterns with aiming usually work in that a movement can be highly anticipated, but still reacted upon instead of proactively aiming.

One thing to note is the LG has a knockback effect that is used to control the opponent’s movement to your advantage, but I think that can be disabled in custom game settings. I’m not sure if you did that. You’re correct to note the inertia of movement plays a role, but also notice that player animations clue in the shooter of the intended direction change. I think your method of accounting for the moment deceleration starts isn’t affected by that though.

It’s also worth noting that a player’s mouse movement may intentionally not follow the opponent. Take for example a player strafing to the right of your screen, ideally you want to track the left side of their body, so when they switch directions you don’t have to move your mouse much if at all (and they’re walking into your knockback) while you see if they’re going to continue their direction change or go back in their original direction.

I hope some of this helps, and again, really interesting article.

myCatnip

Even 1ms differences are noticeable as a difference, and even sub 1ms differences are, considering some 1k vs 4k vs 8k tests. I find it too stressful to notice 2ms change consciously in an AB test. I don’t have the patience to do 20 runs like that in one sitting. However I can clearly notice 2ms change in a game because of all the timings and overall coordination; something just feels really off when going from mouse input on a separate thread in OW2 to CS hardcapped at 400fps, which in theory is a touch over 2ms higher latency… but it feels really wrong for about half an hour to an hour till I get used to it. Then the better input feels off till I get used to it (but it does feel subjectively better and snappier, even if I perform worse with it till I adapt).

Also there’s this: https://www.youtube.com/watch?v=fE-P_7-YiVM input AB testing, you don’t even need 1ms response etc to feel a 1ms difference.
