I see what you mean, but I think I'm having difficulty explaining myself.
The geometric aspect of FOVO, which I think you refer to as "overlay of elements in the scene" and I refer to as "occluded elements", is something a lens shader can do, I think.
Hence I mentioned depth of field, which allows "scene rays" originating from scene elements that are occluded along the primary camera ray to still reach the sensor, because of the 'circle of confusion' phenomenon.
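To make that concrete, here is a minimal thin-lens sketch of how depth-of-field rays are usually generated in a renderer. It's illustrative only (the function name and the camera setup are my own assumptions, not TG's actual camera): each pixel spawns rays from different points on the lens aperture, all refocused through the same point on the focal plane, so different samples take different paths through the scene and can "see around" geometry that blocks the central pinhole ray.

```python
import math
import random

def thin_lens_ray(px, py, focal_dist, aperture_radius):
    """Hypothetical thin-lens camera ray for pixel (px, py).

    Camera looks down +z with the film plane at z = 1. A pinhole ray
    through the pixel is refocused through a random point on the lens
    aperture, which is what lets rays reach scene elements occluded
    from the central (pinhole) viewpoint.
    """
    # Pinhole ray direction through the pixel.
    dx, dy, dz = px, py, 1.0

    # Point on the focal plane that the pinhole ray would hit.
    t = focal_dist / dz
    fx, fy, fz = dx * t, dy * t, focal_dist

    # Sample a point on the circular aperture (rejection sampling).
    while True:
        lx = random.uniform(-1.0, 1.0)
        ly = random.uniform(-1.0, 1.0)
        if lx * lx + ly * ly <= 1.0:
            break
    lx *= aperture_radius
    ly *= aperture_radius

    # New ray: from the aperture sample toward the shared focal point.
    ox, oy, oz = lx, ly, 0.0
    ddx, ddy, ddz = fx - ox, fy - oy, fz - oz
    norm = math.sqrt(ddx * ddx + ddy * ddy + ddz * ddz)
    return (ox, oy, oz), (ddx / norm, ddy / norm, ddz / norm)
```

Note that every sampled ray still converges at the focal plane, which is why in-focus objects stay sharp while everything else smears into the circle of confusion.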
I think you rightfully mention that 360-degree and perhaps other lens shaders are warps on flat surfaces, a 2D matrix conversion if you will.
However, when you consider that depth of field is also a warp, then it is certainly a 3D one (the circle of confusion, but also the simple fact that it requires a focal distance setting), and depth of field is a lens shader effect.
You might still be right, but this is why I think it could basically be a lens shader: a 3D lens shader, which I guess I did not mention explicitly enough.
The FPS penalty when playing games may be attributed to the same thing that makes depth of field calculation more expensive: you no longer perform a 2D linear matrix conversion from scene space to screen space, but a non-linear 3D conversion from scene space to screen space.
Here's an example of non-linear 2D conversions:
http://paulbourke.net/miscellaneous/lens/
Looking at this, I can imagine how you could extend these principles by one extra dimension, and how that would change the occlusion of elements.
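To illustrate the kind of non-linear conversion I mean, here's a small sketch of an equidistant fisheye mapping, one of the classic warps discussed on pages like Bourke's. The function name and conventions are my own assumptions; the point is just that the mapping from screen position to ray direction is non-linear in the radius, unlike a flat perspective projection.

```python
import math

def fisheye_ray(sx, sy, fov_degrees=180.0):
    """Map normalized screen coords in [-1, 1] to a 3D ray direction
    using an equidistant fisheye projection. Returns None outside the
    circular image. Illustrative sketch only, not TG's actual camera.
    """
    r = math.sqrt(sx * sx + sy * sy)   # radial distance from center
    if r > 1.0:
        return None                    # outside the fisheye circle
    # Equidistant mapping: angle from the optical axis grows
    # linearly with r (non-linear in Cartesian screen space).
    theta = r * math.radians(fov_degrees) / 2.0
    phi = math.atan2(sy, sx)           # azimuth around the axis
    # Spherical-to-Cartesian; camera looks down +z.
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))
```

A "3D lens shader" in the sense I'm describing would generalize this: instead of only bending the direction per pixel, it would also move the ray origin or bend the ray path through the scene, which is what would change which elements occlude which.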
I'd love to have such a camera/lens/whatever it is(!) in TG, because it looks like it could really contribute to a heightened sense of scale in large-scale TG environments!