You are correct. It needs to be realistic. For example, if the DSP can create the perception of 5, 7, 9, 11, or even 39 speakers placed throughout the room, anchored consistently at all times, it should be fine. However, if you are sitting perfectly still and the system moves the speakers around for no apparent reason, you will likely have a negative reaction.
I saw a demo in a lab where they placed physical speakers in the room because some of the test listeners were bothered that they could not see the speakers the VR system was presenting over the headphones.