[Mono-list] Sockets severe non-linear performance degradation

Mon Jun 24 14:58:14 UTC 2013

Platform: Linux, Ubuntu 12.04, AMD64
Mono version 2.10.8.1

I have just started with C# and have a simple client-server test app
that measures throughput and latency from the client for a typical
send+receive message scenario. As I increase the message size beyond
the socket send+receive buffer size, performance falls off a cliff,
whereas I would expect latency (round trip time) to increase roughly
linearly with round_up((message size) / (socket buf size))
and throughput (bytes/sec) to stay roughly constant.
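
To make that expectation concrete, here is a minimal sketch of the
round_up() model. The class and method names are made up for
illustration, and 16000 is the buffer size used in the tests described
in this post. It is only a model of what I expected, not a measurement.

```csharp
using System;

static class ExpectedLatency
{
    // Number of buffer-sized chunks needed to carry one message:
    // round_up(messageSize / bufSize), using integer arithmetic.
    public static long Chunks(long messageSize, long bufSize)
    {
        return (messageSize + bufSize - 1) / bufSize;
    }

    static void Main()
    {
        const long bufSize = 16000; // socket buffer size N
        foreach (long size in new long[] { 8000, 16000, 24000, 48000 })
        {
            Console.WriteLine("{0} bytes -> {1} chunk(s)",
                              size, Chunks(size, bufSize));
        }
    }
}
```

Under this model a message between N and 2 * N bytes needs two chunks,
so the round trip should cost about twice the baseline rather than
hundreds of times more.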

I set the socket send and receive buffer sizes to N.

When message size < N:
Roundtrip Latency: 1-3 milliseconds
Throughput: ~2 Mbyte/sec

This compares well with my equivalent Java/C code on the same machine:
it roughly equals Java, as I would expect, and is a bit slower than
native C sockets (I can comfortably get sub-millisecond round trips
with C, with throughput around 5 Mbyte/sec).

When N < message size < 2 * N:
Roundtrip Latency: 400-1000 milliseconds
Throughput: 50-100 Kbyte/sec

The machine shows very low CPU usage, so it appears that the IO
subsystem is not rescheduling a thread blocked on IO in a timely
manner. This puzzles me, since a direct mapping of the Socket methods
onto their C library counterparts should achieve what is required. So
have I missed some critical option, or does Mono do something with
IO/thread mapping onto the OS that is causing this?

Some more detail on my test app:
The basic loop I am timing is:
1. Client writes random sized buffer to socket
2. Server reads data
3. Server writes random sized buffer to socket
4. Client reads data and stops timer

A message consists of a small fixed header giving the size of the
data that follows.
Thus:
   write requires one system call since I coalesce header+data
   read requires two system calls, one for fixed header and one for
     variable data
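
A minimal sketch of this framing, for reference. It is not my actual
test code: the 4-byte big-endian length header and the names Frame,
ReceiveExact and ReadMessage are assumptions made for illustration.
Note that a single Receive() may return fewer bytes than requested, so
each logical read is a loop and can take more than one system call.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class Framing
{
    const int HeaderSize = 4; // assumed: 4-byte big-endian length prefix

    // Coalesce header+data into one buffer so one Send() can carry both.
    public static byte[] Frame(byte[] data)
    {
        byte[] msg = new byte[HeaderSize + data.Length];
        byte[] len = BitConverter.GetBytes(
            IPAddress.HostToNetworkOrder(data.Length));
        Buffer.BlockCopy(len, 0, msg, 0, HeaderSize);
        Buffer.BlockCopy(data, 0, msg, HeaderSize, data.Length);
        return msg;
    }

    // Keep calling Receive() until exactly 'count' bytes have arrived.
    public static void ReceiveExact(Socket s, byte[] buf, int count)
    {
        int got = 0;
        while (got < count)
        {
            int n = s.Receive(buf, got, count - got, SocketFlags.None);
            if (n == 0) // peer closed the connection mid-message
                throw new SocketException((int)SocketError.ConnectionReset);
            got += n;
        }
    }

    // Read the fixed header first, then the variable-length data.
    public static byte[] ReadMessage(Socket s)
    {
        byte[] header = new byte[HeaderSize];
        ReceiveExact(s, header, HeaderSize);
        int len = IPAddress.NetworkToHostOrder(
            BitConverter.ToInt32(header, 0));
        byte[] data = new byte[len];
        ReceiveExact(s, data, len);
        return data;
    }

    static void Main()
    {
        // Loopback demo with a small payload so one thread can do both ends.
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        Socket client = new Socket(AddressFamily.InterNetwork,
                                   SocketType.Stream, ProtocolType.Tcp);
        client.Connect(listener.LocalEndpoint);
        Socket server = listener.AcceptSocket();

        byte[] payload = new byte[1000];
        new Random(1).NextBytes(payload);
        client.Send(Frame(payload));
        byte[] received = ReadMessage(server);
        Console.WriteLine("received {0} bytes", received.Length);

        client.Close();
        server.Close();
        listener.Stop();
    }
}
```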

Both client and server sockets are given the same options:
   socket.ReceiveBufferSize = 16000;
   socket.SendBufferSize = 16000;
   socket.NoDelay = true;
   socket.SetSocketOption(SocketOptionLevel.Socket,
                          SocketOptionName.DontLinger, false);

I have tried three basic approaches to the read/write loop.
A. socket.Blocking = true and use socket.Receive()/socket.Send()
B. socket.Blocking = true/false and use the async versions of
    socket.Receive()/socket.Send(), i.e. socket.BeginReceive() followed
    immediately by socket.EndReceive()
C. socket.Blocking = false and use socket.Receive()/socket.Send() and
    socket.Poll(SelectMode.SelectRead or SelectWrite)
    to wait for data where appropriate.
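
For concreteness, approach C looks roughly like the sketch below. It is
a simplified illustration rather than the exact test code: the helper
names and the 1 second poll timeout are assumptions, and it uses the
Send()/Receive() overloads that report a SocketError instead of
throwing.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class NonBlockingIo
{
    const int PollMicros = 1000000; // assumed 1 s timeout, illustrative only

    // Send the whole buffer, waiting in Poll() whenever Send() would block.
    public static void SendAll(Socket s, byte[] buf)
    {
        int sent = 0;
        while (sent < buf.Length)
        {
            SocketError err;
            int n = s.Send(buf, sent, buf.Length - sent,
                           SocketFlags.None, out err);
            if (err == SocketError.Success)
                sent += n;
            else if (err == SocketError.WouldBlock)
                s.Poll(PollMicros, SelectMode.SelectWrite);
            else
                throw new SocketException((int)err);
        }
    }

    // Fill the whole buffer, waiting in Poll() whenever no data is ready.
    public static void ReceiveExact(Socket s, byte[] buf)
    {
        int got = 0;
        while (got < buf.Length)
        {
            SocketError err;
            int n = s.Receive(buf, got, buf.Length - got,
                              SocketFlags.None, out err);
            if (err == SocketError.Success)
            {
                if (n == 0) // peer closed the connection
                    throw new SocketException(
                        (int)SocketError.ConnectionReset);
                got += n;
            }
            else if (err == SocketError.WouldBlock)
                s.Poll(PollMicros, SelectMode.SelectRead);
            else
                throw new SocketException((int)err);
        }
    }

    static void Main()
    {
        // Loopback demo; the payload is small so one thread can do both ends.
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        Socket client = new Socket(AddressFamily.InterNetwork,
                                   SocketType.Stream, ProtocolType.Tcp);
        client.Connect(listener.LocalEndpoint);
        Socket server = listener.AcceptSocket();
        client.Blocking = false;
        server.Blocking = false;

        byte[] outgoing = new byte[1000];
        new Random(1).NextBytes(outgoing);
        SendAll(client, outgoing);
        byte[] incoming = new byte[outgoing.Length];
        ReceiveExact(server, incoming);
        Console.WriteLine("received {0} bytes", incoming.Length);

        client.Close();
        server.Close();
        listener.Stop();
    }
}
```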

All three approaches show the same problem.
Both A and C map to the same approach I have used in native sockets
programming, and B may map to C internally so I did not really expect
it to be any better.

A few more points:
1. The code is operating correctly in the sense that it reliably
    transfers data (which I CRC32 check); performance is the only
    problem.
2. I use the high resolution System.Diagnostics.Stopwatch.GetTimestamp()
    for timing.
3. I call an explicit System.GC.Collect() before starting each timed
    test to reduce the chances of GC skewing the timings.
4. I have tested over a standard TCP socket on the machine's main
    address and also on the local loopback address; no change.
5. I have not tested on Windows since I purged Microsoft from my life
    several years ago and currently have no Windows machine.
6. From the test code alone it is not possible to tell whether the
    delay is in the read, the write, or both.
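
The timing setup from points 2 and 3 can be sketched as follows. The
helper names are hypothetical and the Thread.Sleep() calls are only
stand-ins for the real socket calls. Timing each call separately like
this shows where one endpoint spends its wall-clock time, but since a
send can stall waiting on the peer's read, it cannot by itself say
which side caused the delay.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class PhaseTimer
{
    // Convert Stopwatch ticks to milliseconds.
    public static double TicksToMs(long ticks)
    {
        return ticks * 1000.0 / Stopwatch.Frequency;
    }

    // Run one phase and return its elapsed wall-clock time in ms.
    public static double TimeMs(Action phase)
    {
        long start = Stopwatch.GetTimestamp();
        phase();
        return TicksToMs(Stopwatch.GetTimestamp() - start);
    }

    static void Main()
    {
        // Collect up front so a GC is less likely to land inside the
        // timed region.
        GC.Collect();

        // Stand-ins for the send and receive halves of one round trip.
        double sendMs = TimeMs(delegate { Thread.Sleep(5); });
        double recvMs = TimeMs(delegate { Thread.Sleep(20); });

        Console.WriteLine("send: {0:F1} ms, receive: {1:F1} ms",
                          sendMs, recvMs);
    }
}
```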

Any help would be appreciated.