But my goals are different: keep it simple, keep it fast (in terms of
latency, but also in terms of using light and fast apps and, finally, in
terms of not running through the venue to make last-minute
configuration changes), and let only one machine be the one that has to
be configured - the main laptop at FOH.
I'm not so far away from that - the tools and the technology seem to be
there already. With ffmpeg, for example, it's possible to stream video
from point to point in real time.
My ffmpeg goals are a bit different - multiple cameras filming music performances, with audio sync based on linear timecode in post-production - but I may have some ideas that could help.
My first attempt at KISS for such a requirement would use Open Sound Control (OSC) over UDP for lower latency, with each device running as a server that drives mpv playback through an OSC plugin - basically giving mpv remote access to the typical transport controls ("play", "stop", "pause", etc.).
Because each server simply listens for UDP datagrams, with no connection setup, the added latency on a local network should be negligible, and the client is entirely text based, sending simple commands.
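
To give an idea of how small the client side could be, here is a rough sketch in Python using only the standard library. It builds an argument-less OSC message (address pattern plus an empty type tag string, each null-padded to a multiple of 4 bytes as in OSC 1.0) and sends it as a single UDP datagram. The addresses "/play", "/pause" and "/stop", the host IP and the port 9000 are all assumptions for illustration; the real names depend on whatever OSC plugin is listening on each playback device.

```python
import socket


def osc_pad(text: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address: str) -> bytes:
    """Build an OSC message with no arguments (type tag string is just ",")."""
    return osc_pad(address) + osc_pad(",")


def send_command(host: str, port: int, address: str) -> None:
    """Fire one OSC message at a playback device as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(osc_message(address), (host, port))


if __name__ == "__main__":
    # Hypothetical device address, port and OSC address pattern.
    send_command("192.168.1.20", 9000, "/play")
```

Since UDP is fire-and-forget, the FOH laptop gets no confirmation that a command arrived, so for anything critical it may be worth having the device send a small reply.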