Current MITM status for netplay

Since my failure rate connecting to rooms using MITM is 100%, I’ve decided to troubleshoot it myself using my own room.

MITM server: New York, USA (hostname and port were obtained from http://lobby.libretro.com/list)

Test 1:
>>> import socket
>>> addr = "us-east1.relay.retroarch.com"
>>> port = 34327
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.settimeout(5)
>>> s.connect((addr, port))
>>> s.getpeername()
('35.211.97.56', 34327)
>>> s.recv(24)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
socket.timeout: timed out
>>> s.send(b"RANP" + b"\x00\x04\x00\x04" + b"\x00\x00\x00\x01" + b"\x00\x00\x00\x00" + b"\x00\x00\x00\x05" + b"\x00\x00\x00\x00")
24
>>> s.recv(24)
b''
>>> s.shutdown(socket.SHUT_RDWR)
>>> s.close()

Test 2:
>>> addr = "us-east1.relay.retroarch.com"
>>> port = 34327
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.settimeout(5)
>>> s.connect((addr, port))
>>> s.getpeername()
('35.211.97.56', 34327)
>>> s.send(b"RANP" + b"\x00\x04\x00\x04" + b"\x00\x00\x00\x01" + b"\x00\x00\x00\x00" + b"\x00\x00\x00\x05" + b"\x00\x00\x00\x00")
24
>>> s.recv(24)
b''
>>> s.shutdown(socket.SHUT_RDWR)
>>> s.close()

It doesn’t look like MITM is working at all. The server never sends anything back, and it closes the connection as soon as we send data to it, even if that data is a valid netplay header.
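
For anyone who wants to repeat the probes without typing the byte strings by hand, here is a minimal sketch that packs the same 24 bytes with `struct`. The function and parameter names are mine, not RetroArch's; `dword1` and `dword5` are deliberately left unnamed, and the big-endian layout is simply what the probes above use.

```python
import struct

# Six big-endian 32-bit fields, 24 bytes in total, matching the probes above.
HEADER_FORMAT = ">4s5I"


def build_header(dword1, compression, salt, protocol, dword5, magic=b"RANP"):
    """Pack a 24-byte netplay-style header (field names are illustrative)."""
    return struct.pack(HEADER_FORMAT, magic, dword1, compression, salt,
                       protocol, dword5)


def parse_header(data):
    """Unpack 24 bytes back into (magic, five unsigned integers)."""
    return struct.unpack(HEADER_FORMAT, data)


# Same bytes as the header sent in Test 1 and Test 2.
probe = build_header(0x00040004, 1, 0, 5, 0)
print(probe.hex())
```

The result can be passed straight to `s.send(probe)` in the sessions above.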

Okay, I believe I’ve figured out what’s causing the MITM servers to force a disconnection; it’s strict about RetroArch versions, unlike the direct netplay code.

>>> addr = "us-east1.relay.retroarch.com"
>>> port = 40601
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.settimeout(5)
>>> s.connect((addr, port))
>>> s.send(b"RANP" + b"\x00\x04\x00\x04" + b"\x00\x00\x00\x01" + b"\x00\x00\x00\x00" + b"\x00\x00\x00\x05" + b"\x00\x00\x02\xd9")
24
>>> s.recv(24)
b'RANP\x00\x04\x00\x04\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x02\xd9'
>>> s.shutdown(socket.SHUT_RDWR)
>>> s.close()

Unless every peer is running the same RetroArch version, you will get forcefully disconnected (“Failed to receive header from client.” notification).

Whoever runs the MITM servers needs to change this. The only header parameters it should check are the netplay magic (RANP) and the protocol version (5). Also, the timeouts in the MITM servers seem to be really short. I managed to connect to my own room via another RetroArch instance and received a TCP FIN packet shortly after getting changed to “playing”.
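
As a rough illustration of the looser check I have in mind, here is a sketch in Python. It is not the relay's code; the function name is made up, and placing the protocol version in the fifth dword is an assumption based on where the value 5 sits in the probes above.

```python
import struct

NETPLAY_MAGIC = b"RANP"
PROTOCOL_VERSION = 5


def header_is_acceptable(data):
    """Return True when only the magic and the protocol version match."""
    if len(data) != 24:
        return False
    # The other four dwords are unpacked but intentionally ignored.
    magic, _d1, _compression, _salt, protocol, _d5 = struct.unpack(">4s5I", data)
    return magic == NETPLAY_MAGIC and protocol == PROTOCOL_VERSION
```

With a check like this, both of the headers I sent earlier (last dword 0 and last dword 0x2d9) would be accepted.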

Found the repository for the MITM code.

case STATE_HEADER:
bool word2 = memcmp(header + sizeof(uint32_t) * 2, server_header_data + sizeof(uint32_t) * 2, sizeof(uint32_t));
bool word3 = memcmp(header + sizeof(uint32_t) * 3, server_header_data + sizeof(uint32_t) * 3, sizeof(uint32_t));

These need to be removed; they are named word, but are in fact dwords. The first is whether the peer supports compression or not, and the second is the random salt for passworded rooms, which is only available AFTER the server sends it to the client.

case STATE_POST_HEADER:
bool word1 = memcmp(header + sizeof(uint32_t) * 1, server_header_data + sizeof(uint32_t) * 1, sizeof(uint32_t));

This one needs to be removed too, and again, these are dwords, not words. This is the implementation magic (a simple hash of RetroArch’s and the netplay protocol’s versions).

case STATE_RECV_INFO:
info_mismatched |= memcmp(&server_info.content_crc, &info.content_crc, sizeof(info.content_crc));

It shouldn’t be strict about the content’s CRC; RetroArch isn’t. This check should be removed as well.

This is just an initial analysis, and the code is not very readable (it’s quite the mess, to be honest). A refactoring of the MITM code would probably solve most (if not all) of the current problems. A Python prototype using asyncio can probably achieve the same results with about half the lines of code.

Let me know if there is an interest in refactoring MITM. I can do it myself and then, if Python isn’t the desired language, I can port it to C++17.

Hmm, I would normally say twinaphex would reject C++17, but since it’s not actually part of RetroArch, and it’s already C++, it may be fine. I’ll let you know what he says.

I talked to twinaphex and he’s down for whatever you want to do for refactoring. As you say, the current code is not particularly maintainable by anyone but the original guy(s) that worked on it, so… If you’re cool with it, we can just leave the current MITM repo as it is and you can start your own thing (I don’t think there’s any major preference between python and c++17 here, as long as it gets the job done) and when it’s ready, we can try switching over.

In the meantime, is it worthwhile for us to go ahead and remove those things you mentioned above?

Yeah, here is the list of problems that get solved by removing what I’ve mentioned above.

  • Passworded rooms will now work on relay servers (they don’t currently).

  • Clients not supporting zlib compression will work alongside clients that do (not 100% sure about that, need to study the code further).

  • Clients with compatible netplay code but different RetroArch versions will now work. Currently, all clients must run the same RetroArch version.

  • Clients running the same content but with different CRCs will no longer be deemed incompatible (think arcade rooms without TorrentZip).

Also, this comment inside the code (typo included):

// FIXME 2: This reqiures exact match, but direct connections are now looser.

Commenting them out and removing them from their if evaluations should be a five-minute job.

I don’t have a time frame to get it done. I want to take it slow and get everything done correctly, which means I’ll be doing it while studying both the netplay and the current MITM implementations.

I want to get in as much compatibility as possible; for example, if one client is using zlib compression while another isn’t, the relay server will decompress that data when received and send it raw (and vice-versa).
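
Roughly, the bridging could look like the sketch below. This is only an illustration under a big assumption: that each payload reaching the relay is one complete zlib stream. The real netplay framing, and which commands are actually compressed, are things I still have to study; the function name is hypothetical.

```python
import zlib


def bridge_payload(payload, sender_compresses, receiver_compresses):
    """Convert a payload between a compressing and a non-compressing peer."""
    if sender_compresses == receiver_compresses:
        # Both peers agree, so the relay forwards the bytes untouched.
        return payload
    if sender_compresses:
        # Compressed sender, raw receiver: inflate before forwarding.
        return zlib.decompress(payload)
    # Raw sender, compressed receiver: deflate before forwarding.
    return zlib.compress(payload)
```

The relay only pays the CPU cost when the two peers disagree; in the common case it stays a plain byte forwarder.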

I also want to implement it using a single TCP port, so that port exhaustion isn’t an issue (the current implementation is theoretically vulnerable to it, if netplay and relay servers were more popular).

Nice, yeah, no hurry on our end. What we have “works” (with obvious issues, as you’ve found), but if you can get something better going, we’re on board :slight_smile:

I am planning to start working on this tomorrow.

The initial (prototype) version will be written in Python 3 to reduce development and debugging times, and I would very much like the release version to remain Python 3, unless compatibility is an issue and porting it to a natively compiled language is a must. Processing performance isn’t likely to be an issue, because most of the bottleneck in a relay server is going to be network I/O.

With that being said, the range of versions I am willing to work with is [3.5, 3.6, 3.7]; 3.5 minimum because that’s when type hints were implemented (these will prove invaluable if we need to port it to a statically typed language) and 3.7 maximum because that’s the latest version on my IDE.

Should I do it supporting 3.7 and later versions, or is there a lower version I should base it on?

I talked to Twinaphex and it sounds like we’re good for python 3.7 or whatever works on your end. Hopefully, with your cleanup and using a common and well-liked language like python, we can get more people interested in contributing/maintaining it in the future :slight_smile:

In the meantime, how does this commit look for implementing your suggestions on the existing one: https://github.com/libretro/netplay-mitm-server/commit/fae1270e7f0d7434471a2719df4c826f7f60e146

Looks good; some features might still not work, like passworded rooms (it seems the current relay server code has no functionality for passwords, and it might not even be possible to implement it in the new code without breaking backwards compatibility). As for the others, if an error occurs because of an incompatibility, the clients can error out on their own, as happens with direct hosting.

My code will maintain backwards compatibility with the current implementation, but I’ll add disclaimers for things that SHOULD be changed once backwards compatibility can be ignored (maybe RetroArch 1.10 or 1.11).

EDIT:

What we can do for passwords without breaking backwards compatibility is to have the relay server generate a salt when the first client sends its header, then send that salt to the client through our header to force it to send us a password; finally, we can cache that hashed password, requiring other clients to match it. The downside is that the client creating the relay session will need to type the password in every time they create a new session.
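
A minimal sketch of that idea, just to make the flow concrete. Everything here is hypothetical: `RelaySession` is a name I made up, the relay treats the hashed password as opaque bytes, and `client_hash` is only a stand-in for whatever hash the real client computes from the salt and the password.

```python
import hashlib
import hmac
import secrets


class RelaySession:
    """Hypothetical per-session password state held by the relay."""

    def __init__(self):
        # Non-zero 32-bit salt, generated when the first client sends its header.
        self.salt = secrets.randbits(32) or 1
        self.cached_hash = None

    def check_password(self, hashed_password):
        """Cache the first client's hash; later clients must match it."""
        if self.cached_hash is None:
            self.cached_hash = hashed_password
            return True
        # Constant-time comparison against the cached hash.
        return hmac.compare_digest(self.cached_hash, hashed_password)


def client_hash(salt, password):
    # Stand-in for the client's real hashing scheme (an assumption).
    return hashlib.sha256(("%08x" % salt).encode() + password.encode()).digest()
```

The first client to answer the salt effectively sets the room password, which is why the creator has to retype it for each new session.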

Came across another problem that also needs to be patched.

(mitm.h) From:

struct info_buf_s {
  uint32_t cmd[2];
  char core_name[NICK_LEN];
  char core_version[NICK_LEN];
  uint32_t content_crc;
};

To:

struct info_buf_s {
  uint32_t cmd[2];
  uint32_t content_crc;
  char core_name[NICK_LEN];
  char core_version[NICK_LEN];
};

(mitm.cpp) From:

qint64 readBytes = sock->read(info.core_name, info_payload_size);

To:

qint64 readBytes = sock->read(reinterpret_cast<char *>(&info.content_crc), info_payload_size);

The current implementation also doesn’t send the SRAM to clients, which is likely to be a problem for games that have SRAM. It’s also not easily fixed because the current netplay implementation isn’t aware that it’s actually working with a relay server. I know Fightcade packs pre-made SRAMs together with their package, probably to avoid situations like this.

Hmm… The more I go through the code, the more I believe it’s not possible to fix some of the problems without making backwards-incompatible changes to RetroArch’s netplay itself.

Here is an idea to ponder: how about getting rid of relay servers altogether, getting rid of the NAT traversal option, and forcing UPnP to always run internally when the bound socket is on a LAN (private ranges: 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16)? Finally, we would run a test server that the client automatically connects to, asking it to test whether the client’s netplay session is visible from the WAN or not, and giving a big warning back to the client when his session isn’t on the internet.
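
The LAN check itself is simple. Here is a small Python sketch using the standard `ipaddress` module and the three ranges listed above; the function name is hypothetical, and it only covers IPv4.

```python
import ipaddress

# The three private IPv4 ranges mentioned above.
LAN_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_lan_address(addr):
    """Return True if addr falls inside one of the private LAN ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in LAN_NETWORKS)
```

If the address the socket is bound to passes this check, UPnP would be attempted before the session is announced.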

Granted, while most home routers come with UPnP enabled these days, UPnP still isn’t enough in some situations, like intranets composed of multiple LANs (I’ve one at home to isolate radio stations). In this case, UPnP only forwards a port to the lower LAN, with all subsequent LAN devices needing to do the same up to the internet gateway device, which isn’t going to happen unless you’ve something like OpenWrt with your own custom code for UPnP recursion on every device.

Still, the current situation with relay servers is… bad. I’ll postpone working on it until I’ve a bigger picture and we hold more discussions on this subject; right now it isn’t looking too bright.

I pondered upon an idea last night that involves turning RetroArch into a server daemon.

Would work like this:

  1. Add the “--server” command-line option to run RetroArch in server mode.
  2. While in server mode, RetroArch will not render a GUI, running in CLI mode instead.
  3. Create a UDP socket to accept hosting requests.
  4. A client sends a UDP packet formatted as [size of core name:uint8][core name:char*][size of core version:uint8][core version:char*][size of game name:uint8][game name:char*][size of base file without extension:uint8][base file without extension:char*][game’s crc32:uint32].
  5. Once the server validates the request, it runs the content and replies back to the client with the port where it’s hosting the content.
  6. If the server is already running a content (or core/game is not found within the server), the port returned is 0, meaning the request was denied.
  7. If the server doesn’t receive any clients for a short period of time (say 30s), the content is unloaded and it returns to accepting requests.
  8. The server won’t render frames, won’t play audio samples and won’t generate input.
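
To make the request format in step 4 concrete, here is a sketch of how it could be packed and unpacked in Python. The function names are hypothetical, and two details are my assumptions since the format above doesn't fix them: strings are UTF-8 and the CRC32 is sent in network byte order.

```python
import struct


def _pack_str(text):
    """Encode one string field as [length:uint8][bytes]."""
    raw = text.encode("utf-8")
    if len(raw) > 255:
        raise ValueError("field does not fit in a uint8 length prefix")
    return struct.pack("B", len(raw)) + raw


def pack_request(core_name, core_version, game_name, base_file, crc32):
    """Build the hosting request: four prefixed strings, then the CRC32."""
    fields = (core_name, core_version, game_name, base_file)
    return b"".join(_pack_str(f) for f in fields) + struct.pack(">I", crc32)


def unpack_request(packet):
    """Parse a request back into (core, version, game, base file, crc32)."""
    fields, offset = [], 0
    for _ in range(4):
        size = packet[offset]
        offset += 1
        fields.append(packet[offset:offset + size].decode("utf-8"))
        offset += size
    (crc32,) = struct.unpack_from(">I", packet, offset)
    return tuple(fields) + (crc32,)
```

A server would call `unpack_request` on each datagram and reply with the hosting port, or 0 when the request is denied as described in step 6.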

This allows latency from all peers to be relative to the server, rather than relative to the server AND the peer. This also allows people to run high performance dedicated servers.

Backwards compatibility is maintained with the current netplay code; we will be running the same code, just with extra code added to it for server mode.

Whoever runs a server is also the one responsible for setting up cores and games in there, no licensing/copyright infringement on RetroArch’s part.

Hi, speaking about MITM: nowadays, launchers like EmulationStation have their own UI that fills RA’s gaps by displaying the compatible sessions (with a version comparison of RA and the core, a CRC comparison of the ROM…).

I think that currently there is no way to join a relayed / MITM server from the command line. This means that all implementations for joining relayed servers in EmulationStation are completely wrong :sweat_smile: (as far as I can tell, they can’t work, because they don’t use the session ID and don’t negotiate with the relay server to get the MITM IP). Since all ES-based distribs use pretty much the same code, I think joining a relay server is impossible no matter which retrogaming distrib is tested.

Can we plan to implement a way to join a MITM session from the command line? I’d be really interested to get some feedback on this :wink:

I ran into the “failed to receive header” error yesterday (using the supercombo.gg relay), so it seems like it’s still an issue.

Is https://github.com/libretro/netplay-tunnel-server the Python replacement for the original C++ implementation?