gRPC Troubleshooting
Here are a few quick & dirty troubleshooting tips after a couple of days in the trenches with a gRPC service (C# client, C++ server):
- Ensure any code creating a new alarm first cancels the last alarm it added to the queue, if that alarm hasn't been processed yet. In my case, a server-side observer was receiving updates from elsewhere in the server module & writing them to the gRPC stream. When those updates started arriving too quickly, the server crashed while adding the alarm to the queue.
- Look for double disconnects. I had another bug where the client caught a cancellation exception from the server on the MoveNext() call, then went on to disconnect anyway by calling Cancel() on its cancellation token source. That second disconnect threw an exception: "Shutdown has already been called".
- Check whether your client is setting a deadline. Once the deadline expires the call is cancelled with a DEADLINE_EXCEEDED status, which can look like the connection closing for no reason.
- Ensure your client code handles a cancellation exception coming from MoveNext().
- Test by adding delays in various places. One of my crashes was a race condition and would disappear with breakpoints at different points in the process. Specifically, I think I had a handler trying to write to a stream after it was closed. I would simultaneously get a cancellation exception on the client and have a new handler added to the completion queue.
- Examine keepalives. Services can be set up so the server sends keepalive pings periodically. That said, if you're only going down under high load, keepalives probably aren't your issue.
- The common advice: enable logging.
  - Environment.SetEnvironmentVariable("GRPC_TRACE", "all");
  - Environment.SetEnvironmentVariable("GRPC_VERBOSITY", "DEBUG");
  - I got this going on the client, but still haven't gotten it working on the server. The client-side logs haven't been terribly helpful.
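
The double-disconnect and write-after-close tips above come down to the same thing: two code paths racing to touch a stream that one of them has already shut down. Here's a minimal sketch of one way to guard against both on the server side. It is plain C++ with no gRPC calls in it, and it is not the fix I actually shipped. GuardedStream, TryWrite, and Close are hypothetical names, and the two callbacks are stand-ins for whatever really writes to and finishes your stream.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical wrapper (not part of gRPC): serializes writes and shutdown so
// a late update can never reach a stream that has already been closed, and a
// second shutdown request becomes a harmless no-op.
class GuardedStream {
public:
    using WriteFn = std::function<void(const std::string&)>;
    using CloseFn = std::function<void()>;

    GuardedStream(WriteFn write, CloseFn close)
        : write_(std::move(write)), close_(std::move(close)) {
        assert(write_ && close_);  // both callbacks are required
    }

    // Returns false (and drops the update) once the stream is closed.
    bool TryWrite(const std::string& update) {
        std::lock_guard<std::mutex> lock(mu_);
        if (closed_) return false;
        write_(update);
        return true;
    }

    // Returns true only for the call that actually closed the stream.
    bool Close() {
        std::lock_guard<std::mutex> lock(mu_);
        if (closed_) return false;
        closed_ = true;
        close_();
        return true;
    }

private:
    std::mutex mu_;
    bool closed_ = false;
    WriteFn write_;
    CloseFn close_;
};
```

The observer would call TryWrite() for each update and simply stop when it returns false, and every disconnect path would go through Close(). The sketch holds the lock while the callbacks run, which keeps it simple but assumes they return quickly. With an asynchronous completion-queue server you would more likely just flip the flag under the lock and do the real work outside it.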