The problem arises when a game starts playing the audio for an input before actually showing the result on screen. For example, you press jump and the audio starts playing on the next frame, but the animation is delayed by another frame. In such cases you’ll have to choose between scaling back or disabling the feature and losing one frame (16.7 ms at 60 fps) worth of audio.
Of course, if a game already responds on the next frame with both audio and video during actual gameplay, this feature should not be enabled at all.
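
To make the 16.7 ms figure concrete, here is a minimal sketch of the arithmetic. It assumes a 60 Hz display and, purely for illustration, a 48 kHz audio sample rate; the function names `audio_lost_ms` and `samples_lost` are hypothetical helpers and not part of any emulator's API.

```python
def audio_lost_ms(frames_skipped: int, refresh_hz: float = 60.0) -> float:
    """Milliseconds of audio covered by `frames_skipped` video frames."""
    # One frame lasts 1000 / refresh_hz milliseconds (about 16.7 ms at 60 Hz).
    return frames_skipped * 1000.0 / refresh_hz


def samples_lost(frames_skipped: int, refresh_hz: float = 60.0,
                 sample_rate_hz: int = 48_000) -> int:
    """Audio samples per channel covered by the skipped frames.

    The 48 kHz default is an assumption; real systems vary.
    """
    return round(frames_skipped * sample_rate_hz / refresh_hz)


if __name__ == "__main__":
    print(f"{audio_lost_ms(1):.1f} ms")   # one frame at 60 Hz
    print(samples_lost(1), "samples")     # same frame expressed in samples
```

Under these assumptions one skipped frame costs about 16.7 ms, or 800 samples per channel, and the loss grows linearly with every extra frame skipped.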