Distributed Tracing in Microservices

Manoj Bhagwat
4 min read · Jan 16, 2022

This is my new blog post on distributed tracing in microservices.

Let's first talk about the challenges that we are facing with modern microservices architecture.

Common problems with microservices

  1. Loss of coherence: Because the work of fulfilling a single end-user request is now broken across multiple processes, possibly written in multiple frameworks and implementation languages, it is much harder for team members to understand exactly what happened in the course of processing a request. Unlike a monolithic process, where we could gather the complete story of how a request was handled from a single process written in a single language, we no longer have an easy way of doing that in a microservices environment.
  2. Increased debugging and troubleshooting costs: The act of tracking down and fixing sources of errors inside microservice architectures can be tremendously more expensive and time-consuming. In most cases failure data isn’t propagated in an immediately useful or clear manner inside microservices; instead of an immediately understandable stack trace, we have to work backwards from status codes and vague error messages propagated across the network.

  3. Data silos and cross-team communication: Given that one request has to make multiple hops over the network and has to be handled by multiple processes developed by independent teams, figuring out exactly where an error occurred and whose responsibility it is to fix can become an exercise in futility and frustration.

What is distributed tracing?

Distributed Tracing is the process of tracking and analyzing what happens to a request (transaction) across all services it touches.

What do “tracking” and “analyzing” mean?

  • “Tracking” means generating the raw data in each service that says, “I did some processing for a request with a Trace ID abc123. Here's what I did, what other services I talked to, and how long each chunk of work took.”
  • “Analyzing” means using any of the various searching, aggregation, visualization, and other analysis tools that help you make sense of the raw tracking data.
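
To make the "tracking" half concrete, here is a minimal sketch in Python of a service recording what it did for a given Trace ID. The names `track`, `EMITTED`, and `handle_checkout` are hypothetical and only for illustration; a real service would hand these records to a tracing library rather than a list.

```python
import time
import uuid
from contextlib import contextmanager

# Hypothetical in-memory sink. A real service would ship records to a tracing backend.
EMITTED = []


@contextmanager
def track(trace_id, service, operation):
    """Time one chunk of work and emit a raw tracking record for it."""
    record = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "service": service,
        "operation": operation,
        "called": [],  # other services this chunk of work talked to
    }
    start = time.perf_counter()
    try:
        yield record
    finally:
        # Record the duration even if the work raised an exception.
        record["duration_ms"] = (time.perf_counter() - start) * 1000.0
        EMITTED.append(record)


def handle_checkout(trace_id):
    """Hypothetical request handler for a 'checkout' service."""
    with track(trace_id, "checkout", "POST /checkout") as record:
        record["called"].append("inventory")  # pretend we called another service
        time.sleep(0.01)                       # stand-in for real work


handle_checkout("abc123")
print(EMITTED[0])
```

Each record answers the three questions from the definition above: what was done, which other services were involved, and how long it took.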

Why Does Your Business Need Distributed Tracing?

Here are a few of the critical questions that distributed tracing (DT) can answer quickly and easily in a distributed system architecture:

  • What services did a request pass through? Both for individual requests and for the distributed architecture as a whole (service maps).
  • Where are the bottlenecks? How long did each hop take? Again, DT answers this for individual requests and helps point out general patterns and intermittent anomalies between services in aggregate.
  • How much time is lost due to network lag during communication between services (as opposed to in-service work)?
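
As a rough illustration of how these questions can be answered from span data, the sketch below works on a small hand-written trace. The field names (`span_id`, `parent_id`, `start_ms`, `end_ms`) and the sample values are assumptions made for this example, not the format of any particular tool, and the self-time calculation assumes child spans of the same parent do not overlap.

```python
# One hypothetical trace: gateway -> orders -> db.
# Timestamps are milliseconds relative to the start of the trace.
SPANS = [
    {"span_id": "a", "parent_id": None, "service": "gateway", "start_ms": 0, "end_ms": 100},
    {"span_id": "b", "parent_id": "a", "service": "gateway", "start_ms": 10, "end_ms": 90},  # client call to orders
    {"span_id": "c", "parent_id": "b", "service": "orders", "start_ms": 20, "end_ms": 75},   # orders handling the call
    {"span_id": "d", "parent_id": "c", "service": "db", "start_ms": 30, "end_ms": 70},
]


def duration(span):
    return span["end_ms"] - span["start_ms"]


def services_visited(spans):
    """Which services did the request pass through, in order of first appearance?"""
    seen = []
    for span in sorted(spans, key=lambda s: s["start_ms"]):
        if span["service"] not in seen:
            seen.append(span["service"])
    return seen


def self_times(spans):
    """Time spent in each span itself, excluding time spent in its children."""
    result = {s["span_id"]: duration(s) for s in spans}
    for span in spans:
        if span["parent_id"] is not None:
            result[span["parent_id"]] -= duration(span)
    return result


def network_lag(client_span, server_span):
    """Time the caller waited that the callee did not spend working."""
    return duration(client_span) - duration(server_span)


by_id = {s["span_id"]: s for s in SPANS}
times = self_times(SPANS)
bottleneck = by_id[max(times, key=times.get)]["service"]

print(services_visited(SPANS))                 # ['gateway', 'orders', 'db']
print(bottleneck)                              # db
print(network_lag(by_id["b"], by_id["c"]))     # 25
```

Because network lag is computed from two durations rather than from absolute timestamps, it does not depend on the clocks of the two services being synchronized.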

Distributed tracing is extremely useful even for a single service whose upstream and downstream services haven’t implemented DT.

Key components of a tracing system

In order to discuss the core concepts for how distributed tracing works, we first need to define some common nomenclature and explain the anatomy of a trace. Taking the Google Dapper paper as a reference, the main entities are Trace and Span. Note that distributed tracing has been around for a long time, so if you research DT you might find other tools and schemes that use different names. The concepts, however, are usually very similar:

  • Trace exposes the execution path through a distributed system. A trace is composed of one or more spans.
  • Span represents a single unit of work in the execution path, typically one operation handled by one microservice.
  • Request is how applications, microservices, and functions talk to one another.
  • Root span is the first span in a trace.
  • Child span is a subsequent span, which can be nested.
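
The relationship between these entities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the header names `X-Trace-Id` and `X-Parent-Span-Id` and the helper functions are hypothetical, while real systems typically use standardized headers such as the W3C Trace Context `traceparent` header.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

TRACE_HEADER = "X-Trace-Id"          # hypothetical header name
PARENT_HEADER = "X-Parent-Span-Id"   # hypothetical header name


@dataclass
class Span:
    trace_id: str             # shared by every span in the same trace
    span_id: str              # unique to this span
    parent_id: Optional[str]  # None for the root span
    service: str

    @property
    def is_root(self) -> bool:
        return self.parent_id is None


def new_id() -> str:
    return uuid.uuid4().hex[:16]


def start_span(service: str, headers: dict) -> Span:
    """Continue the trace named in the incoming request headers, or start a new one."""
    trace_id = headers.get(TRACE_HEADER) or new_id()
    parent_id = headers.get(PARENT_HEADER)
    return Span(trace_id, new_id(), parent_id, service)


def outgoing_headers(span: Span) -> dict:
    """Headers to attach to an outgoing request so the callee's span becomes a child."""
    return {TRACE_HEADER: span.trace_id, PARENT_HEADER: span.span_id}


# A request enters at the gateway (root span) and is passed along to two more services.
root = start_span("gateway", headers={})
child = start_span("orders", headers=outgoing_headers(root))
grandchild = start_span("db", headers=outgoing_headers(child))

for span in (root, child, grandchild):
    print(span.service, "root" if span.is_root else f"child of {span.parent_id}")
```

The request carries the trace ID and the caller's span ID, which is what lets spans created in separate processes be stitched back together into one trace.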

Selecting the right tool for implementation

Whatever tool you choose for implementation needs to have the basic features listed below:

  • In a distributed tracing scheme, there needs to be some kind of “span collector” that gathers the span data from the various services in a distributed system architecture.
  • Another important component is a good visualization UI. Developers use it to write complex queries when debugging issues.
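
To give a feel for what the collector and the query side do, here is a toy in-memory version in Python. The class `SpanCollector` and its methods are hypothetical and are not the API of any real tracing tool; production collectors also handle transport, storage, and sampling, which this sketch leaves out.

```python
from collections import defaultdict


class SpanCollector:
    """Toy collector: gathers spans from many services and groups them by trace."""

    def __init__(self):
        self._by_trace = defaultdict(list)

    def report(self, span: dict) -> None:
        """Called by each service when it finishes a span."""
        self._by_trace[span["trace_id"]].append(span)

    def trace(self, trace_id: str) -> list:
        """All spans of one trace, ordered by start time."""
        return sorted(self._by_trace.get(trace_id, []), key=lambda s: s["start_ms"])

    def find_traces(self, service=None, min_duration_ms=0) -> list:
        """IDs of traces containing at least one span that matches the filters."""
        hits = []
        for trace_id, spans in self._by_trace.items():
            for span in spans:
                if service is not None and span["service"] != service:
                    continue
                if span["end_ms"] - span["start_ms"] >= min_duration_ms:
                    hits.append(trace_id)
                    break
        return hits


collector = SpanCollector()
# Spans arrive from different services, not necessarily in order.
collector.report({"trace_id": "t1", "service": "db", "start_ms": 10, "end_ms": 110})
collector.report({"trace_id": "t2", "service": "gateway", "start_ms": 0, "end_ms": 30})
collector.report({"trace_id": "t1", "service": "gateway", "start_ms": 0, "end_ms": 120})
collector.report({"trace_id": "t2", "service": "db", "start_ms": 5, "end_ms": 15})

# The kind of query a developer might run from the UI: slow database calls.
print(collector.find_traces(service="db", min_duration_ms=50))   # ['t1']
print([s["service"] for s in collector.trace("t1")])             # ['gateway', 'db']
```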

Conclusion

In this article, we looked at the challenges we face in modern microservices architectures and how a DT system helps us overcome these problems.

Manoj Bhagwat

Trying new things. Breaking stuff. Likes open source | DevOps | Find me on LinkedIn 🔎. https://www.linkedin.com/in/manoj-bhagwat-73045082/