gRPC Troubleshooting

Here are a few quick & dirty troubleshooting tips after a couple of days in the trenches with a gRPC service (C# client, C++ server):

  • Ensure any code that creates a new alarm first cancels the last alarm it added to the queue, if that one hasn't been processed yet.  In my case, a server-side observer was receiving updates from elsewhere in the server module and writing them to the gRPC stream.  When those updates started coming in too quickly, the server crashed while adding the alarm to the queue.
  • Look for double disconnects.  I had another bug where the client got a cancellation exception from the server on the MoveNext() call, caught it, then tried to disconnect anyway by calling Cancel() on its cancellation token source.  The client then threw an exception: "Shutdown has already been called".
  • Check whether your client is setting a deadline.  When a deadline expires, the call is cancelled with a DEADLINE_EXCEEDED status, which can look like the connection being closed on you.
  • Ensure your client code handles a cancellation exception coming from MoveNext().
  • Test by adding delays in various places.  One of my crashes was a race condition and would disappear with breakpoints at different points in the process.  Specifically, I think I had a handler trying to write to a stream after it was closed.  I would simultaneously get a cancellation exception on the client and have a new handler added to the completion queue.
  • Examine keepalives.  Services can be set up so that keepalives are sent periodically from the server.  If you're going down under high load, though, this may not be your issue.
  • The common advice... Enable logging:
    • Environment.SetEnvironmentVariable("GRPC_TRACE", "all");
    • Environment.SetEnvironmentVariable("GRPC_VERBOSITY", "DEBUG");
    • I got this going on the client, but still haven't gotten it working on the server.  The client-side logs haven't been terribly helpful.
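
For the double-disconnect tip, here is a rough sketch of the kind of guard that would have saved me some time.  It is in C++ and is not gRPC code: FakeCall and GuardedCall are made-up names for illustration, and FakeCall only mimics a handle that throws when it is shut down twice.  The idea is that whichever code path disconnects first does the real work, and every later attempt becomes a no-op.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a streaming call handle; NOT a gRPC type.
// Like the client described above, it throws if it is shut down twice.
class FakeCall {
public:
    void Shutdown() {
        if (shut_down_) {
            throw std::logic_error("Shutdown has already been called");
        }
        shut_down_ = true;
    }

private:
    bool shut_down_ = false;
};

// Hypothetical wrapper that makes Disconnect() safe to call from several
// code paths (for example an exception handler and normal cleanup).
class GuardedCall {
public:
    // Returns true only for the caller that actually performed the shutdown.
    bool Disconnect() {
        // exchange() returns the previous value, so exactly one caller
        // ever sees false here, even if several threads race.
        if (disconnected_.exchange(true)) {
            return false;  // someone already disconnected; do nothing
        }
        call_.Shutdown();
        return true;
    }

private:
    FakeCall call_;
    std::atomic<bool> disconnected_{false};
};

// Two code paths both try to disconnect; only the first one does the work.
inline void DemoDoubleDisconnect() {
    GuardedCall call;
    bool first = call.Disconnect();   // e.g. the cancellation handler
    bool second = call.Disconnect();  // e.g. later cleanup, now a no-op
    assert(first && !second);
    (void)first;
    (void)second;
}
```

The same shape should carry over to the C# side: keep a flag next to whatever you call Cancel() on, and check it before disconnecting a second time.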
