Problem
How can you understand the behavior of an application and troubleshoot problems?
Forces
- Any solution should have minimal runtime overhead
Solution
Use a centralized logging service that aggregates logs from each service instance. Users can search and analyze the logs, and they can configure alerts that are triggered when certain messages appear in the logs.
Log service activity and write logs into a centralized logging server, which provides searching and alerting.
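The force listed above (minimal runtime overhead) suggests that a request-handling thread should not block on the network while its logs are shipped. The following is a minimal sketch, assuming Python's standard `logging.handlers.QueueHandler` and `QueueListener`. The service thread only enqueues each record, and a background thread formats it as JSON, tagged with a service name and instance ID, and hands it to a shipper. `SERVICE_NAME`, `INSTANCE_ID`, `JsonFormatter`, and `CollectingShipper` are hypothetical; the shipper here just keeps lines in memory, whereas a real pipeline would forward them to the logging server or write them where a log agent collects them.

```python
import json
import logging
import logging.handlers
import queue

# Hypothetical identifiers; a real service would read these from its environment.
SERVICE_NAME = "order-service"
INSTANCE_ID = "order-service-1"


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so the logging server can index its fields."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "service": SERVICE_NAME,
            "instance": INSTANCE_ID,
            "level": record.levelname,
            "message": record.getMessage(),
        })


class CollectingShipper(logging.Handler):
    """Hypothetical placeholder for a handler that forwards lines to the logging server."""

    def __init__(self) -> None:
        super().__init__()
        self.shipped: list = []

    def emit(self, record: logging.LogRecord) -> None:
        self.shipped.append(self.format(record))


def configure_logging():
    log_queue: queue.Queue = queue.Queue()
    shipper = CollectingShipper()
    shipper.setFormatter(JsonFormatter())
    # The listener's background thread drains the queue and calls the shipper.
    listener = logging.handlers.QueueListener(log_queue, shipper)
    logger = logging.getLogger(SERVICE_NAME)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if configured twice
    # Request-handling code only pays for an in-memory enqueue.
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.propagate = False
    listener.start()
    return logger, listener, shipper


if __name__ == "__main__":
    logger, listener, shipper = configure_logging()
    logger.info("order %s created", 42)
    listener.stop()  # processes everything already queued, then stops the thread
    print(shipper.shipped[0])
    # {"service": "order-service", "instance": "order-service-1", "level": "INFO", "message": "order 42 created"}
```

Calling `listener.stop()` at shutdown matters: records still in the queue would otherwise be lost.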
Logs are a valuable troubleshooting tool. If you want to know what’s wrong with your application, a good place to start is the log files. But using logs in a microservice architecture is challenging.
Most of the time, the log entries you need are scattered across the log files of the API gateway and several services. The solution is to use log aggregation.
The log aggregation pipeline sends the logs of all of the service instances to a centralized logging server. Once the logs are stored by the logging server, you can view, search, and analyze them. You can also configure alerts that are triggered when certain messages appear in the logs.
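On the server side, the three capabilities described above (store entries from every instance, search them in one place, and alert on certain messages) can be sketched with an in-memory stand-in. `LogServer`, `LogEntry`, and `AlertRule` are hypothetical names, and matching is a plain substring check; production logging servers index entries instead of scanning a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LogEntry:
    service: str
    instance: str
    level: str
    message: str


@dataclass
class AlertRule:
    name: str
    pattern: str  # substring whose appearance in a log message triggers the alert


@dataclass
class LogServer:
    """Hypothetical in-memory stand-in for a centralized logging server."""
    rules: List[AlertRule] = field(default_factory=list)
    entries: List[LogEntry] = field(default_factory=list)
    alerts: List[Tuple[str, LogEntry]] = field(default_factory=list)

    def ingest(self, entry: LogEntry) -> None:
        # Store the entry, then fire every alert rule whose pattern it matches.
        self.entries.append(entry)
        for rule in self.rules:
            if rule.pattern in entry.message:
                self.alerts.append((rule.name, entry))

    def search(self, text: str, service: Optional[str] = None) -> List[LogEntry]:
        # One query covers the logs of every service instance, optionally filtered by service.
        return [
            e for e in self.entries
            if text in e.message and (service is None or e.service == service)
        ]


if __name__ == "__main__":
    server = LogServer(rules=[AlertRule("payment-declined", "payment declined")])
    server.ingest(LogEntry("api-gateway", "gateway-1", "INFO", "POST /orders for order 42"))
    server.ingest(LogEntry("payment-service", "payment-2", "ERROR", "payment declined for order 42"))
    print(len(server.search("order 42")))        # 2
    print([name for name, _ in server.alerts])   # ['payment-declined']
```

The search for "order 42" finds entries from both the API gateway and the payment service, which is exactly the scattered-log problem that aggregation solves.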
Figure: A simple use case of observability patterns in a microservice architecture.
Distributed Tracing
Assign each external request a unique ID and trace requests as they flow between services.
A good way to get insight into what your application is doing is to use distributed tracing. Distributed tracing is analogous to a performance profiler in a monolithic application: it records information (e.g., start time and end time) about the tree of service calls that are made when handling a request.
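The idea of a request-wide ID plus a tree of timed calls can be sketched in a single process. In this sketch each unit of work is a span carrying the trace ID of the external request, its own span ID, its parent's span ID, and start/end times; the parent links are what let a tracing server rebuild the call tree. `Tracer` and `Span` are hypothetical names. Across real services, the trace ID and parent span ID would travel in request headers (the W3C Trace Context `traceparent` header is one standard for this), which is not shown here.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    trace_id: str             # the unique ID assigned to the external request
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str                 # service and operation, e.g. "order-service: getOrder"
    start: float
    end: float = 0.0


class Tracer:
    """Hypothetical in-process tracer; a real one would report spans to a tracing server."""

    def __init__(self) -> None:
        self.finished: List[Span] = []

    @contextmanager
    def span(self, name: str, parent: Optional[Span] = None) -> Iterator[Span]:
        # A root span starts a new trace; child spans inherit the parent's trace ID.
        trace_id = parent.trace_id if parent is not None else uuid.uuid4().hex
        parent_id = parent.span_id if parent is not None else None
        s = Span(trace_id, uuid.uuid4().hex[:16], parent_id, name, time.monotonic())
        try:
            yield s
        finally:
            s.end = time.monotonic()  # record the end time even if the call fails
            self.finished.append(s)

    def call_tree(self, trace_id: str) -> Dict[Optional[str], List[str]]:
        # Map each parent span ID to the names of its child spans (key None = root).
        tree: Dict[Optional[str], List[str]] = {}
        for s in self.finished:
            if s.trace_id == trace_id:
                tree.setdefault(s.parent_id, []).append(s.name)
        return tree


if __name__ == "__main__":
    tracer = Tracer()
    with tracer.span("api-gateway: GET /orders/42") as root:
        with tracer.span("order-service: getOrder", parent=root):
            pass
        with tracer.span("customer-service: getCustomer", parent=root):
            pass
    print(tracer.call_tree(root.trace_id)[root.span_id])
    # ['order-service: getOrder', 'customer-service: getCustomer']
```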
Exception Tracking
Report exceptions to an exception tracking service, which de-duplicates exceptions, alerts developers, and tracks the resolution of each exception.
A service should rarely log an exception, and when it does, it’s important that you identify the root cause. The exception might be a symptom of a failure or a programming bug. The traditional way to view exceptions is to look in the logs.
You might even configure the logging server to alert you if an exception appears in the log file. A better approach is to use an exception tracking service.
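The three duties named above (de-duplicate, alert, track resolution) can be sketched as follows. The assumption here is that "the same exception" means the same exception type raised along the same stack of code locations, so the message is left out of the fingerprint because it often carries request-specific values. `ExceptionTracker` and `Issue` are hypothetical names, not a real service's API.

```python
import hashlib
import traceback
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Issue:
    fingerprint: str
    summary: str
    count: int = 0
    resolved: bool = False


@dataclass
class ExceptionTracker:
    """Hypothetical in-memory stand-in for an exception tracking service."""
    issues: Dict[str, Issue] = field(default_factory=dict)
    alerts: List[str] = field(default_factory=list)

    @staticmethod
    def fingerprint(exc: BaseException) -> str:
        # Group by exception type and the code locations in the stack trace.
        frames = traceback.extract_tb(exc.__traceback__)
        key = type(exc).__name__ + "|" + "|".join(
            f"{f.filename}:{f.name}:{f.lineno}" for f in frames)
        return hashlib.sha256(key.encode()).hexdigest()[:12]

    def report(self, exc: BaseException) -> Issue:
        fp = self.fingerprint(exc)
        issue = self.issues.get(fp)
        if issue is None:
            # First occurrence: open a new issue and alert developers once.
            issue = self.issues[fp] = Issue(fp, f"{type(exc).__name__}: {exc}")
            self.alerts.append(issue.summary)
        elif issue.resolved:
            # Recurrence after resolution: reopen the issue and alert again.
            issue.resolved = False
            self.alerts.append(issue.summary)
        issue.count += 1  # duplicates only increment the counter
        return issue

    def resolve(self, fp: str) -> None:
        self.issues[fp].resolved = True


if __name__ == "__main__":
    def load_order(order_id):
        raise KeyError(order_id)

    tracker = ExceptionTracker()
    for order_id in (1, 2, 3):
        try:
            load_order(order_id)
        except KeyError as exc:
            issue = tracker.report(exc)
    print(issue.count, tracker.alerts)  # 3 ['KeyError: 1']
```

Three failures from the same call site collapse into one issue and one alert, instead of three separate lines for someone to spot in the logs.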
Application Metrics
Services maintain metrics, such as counters and gauges, and expose them to a metrics server.
A key part of the production environment is monitoring and alerting, so it's important to have a monitoring system that gathers metrics. These metrics provide critical information about:
- The health of an application, from every part of the technology stack.
- Infrastructure-level resources, such as CPU, memory, and disk utilization.
- Application-level behavior, such as service request latency and the number of requests executed.
Examples of such monitoring systems include New Relic and Datadog.
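The difference between a counter (only ever increases, e.g. requests executed) and a gauge (goes up and down, e.g. requests in flight) is easiest to see in code. This is a minimal sketch with hypothetical `Counter`, `Gauge`, and `MetricsRegistry` classes. Its `expose()` output is loosely modeled on the Prometheus text exposition format; in practice the service would serve this text from an HTTP endpoint for the metrics server to collect, which is assumed and not shown.

```python
import threading
from typing import List, Union


class Counter:
    """A value that only goes up, e.g. the number of requests executed."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = 0
        self._lock = threading.Lock()

    def inc(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        with self._lock:
            self.value += amount


class Gauge:
    """A value that can go up or down, e.g. requests currently in flight."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = 0
        self._lock = threading.Lock()

    def inc(self, amount: int = 1) -> None:
        with self._lock:
            self.value += amount

    def dec(self, amount: int = 1) -> None:
        with self._lock:
            self.value -= amount


class MetricsRegistry:
    """Holds a service's metrics and renders them for a metrics server to collect."""

    def __init__(self) -> None:
        self._metrics: List[Union[Counter, Gauge]] = []

    def counter(self, name: str) -> Counter:
        metric = Counter(name)
        self._metrics.append(metric)
        return metric

    def gauge(self, name: str) -> Gauge:
        metric = Gauge(name)
        self._metrics.append(metric)
        return metric

    def expose(self) -> str:
        # One "# TYPE" line plus one sample line per metric.
        lines: List[str] = []
        for m in self._metrics:
            kind = "counter" if isinstance(m, Counter) else "gauge"
            lines.append(f"# TYPE {m.name} {kind}")
            lines.append(f"{m.name} {m.value}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    registry = MetricsRegistry()
    requests_total = registry.counter("orders_requests_total")
    in_flight = registry.gauge("orders_requests_in_flight")
    for _ in range(3):
        in_flight.inc()       # request starts
        requests_total.inc()
    in_flight.dec()           # one request finishes
    print(registry.expose(), end="")
    # # TYPE orders_requests_total counter
    # orders_requests_total 3
    # # TYPE orders_requests_in_flight gauge
    # orders_requests_in_flight 2
```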
Audit Logging
Log user actions.
The purpose of audit logging is to record each user’s actions. An audit log is typically used to help customer support, ensure compliance, and detect suspicious behavior.
Each audit log entry records the identity of the user, the action they performed, and the business objects involved.
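An entry with the three parts named above (user identity, action, business object) might look like the following minimal sketch. `AuditLogEntry`, `AuditLog`, and the field names are hypothetical, as is the timestamp, which is added here because an audit trail is usually read in time order. The log is append-only and kept in memory; a real audit log would need durable, tamper-resistant storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditLogEntry:
    user_id: str       # identity of the user
    action: str        # the action performed, e.g. "CANCEL_ORDER"
    object_type: str   # the business object involved, e.g. "Order"
    object_id: str
    timestamp: str     # ISO 8601, UTC (added assumption)


class AuditLog:
    """Hypothetical append-only audit log kept in memory."""

    def __init__(self) -> None:
        self._entries: List[AuditLogEntry] = []

    def record(self, user_id: str, action: str, object_type: str, object_id: str) -> AuditLogEntry:
        entry = AuditLogEntry(user_id, action, object_type, object_id,
                              datetime.now(timezone.utc).isoformat())
        self._entries.append(entry)  # entries are frozen and never updated or deleted
        return entry

    def for_user(self, user_id: str) -> List[AuditLogEntry]:
        # e.g. customer support reviewing what a customer did
        return [e for e in self._entries if e.user_id == user_id]

    def for_object(self, object_type: str, object_id: str) -> List[AuditLogEntry]:
        # e.g. a compliance review of everything done to one order
        return [e for e in self._entries
                if (e.object_type, e.object_id) == (object_type, object_id)]


if __name__ == "__main__":
    log = AuditLog()
    log.record("alice", "CREATE_ORDER", "Order", "42")
    log.record("bob", "VIEW_ORDER", "Order", "42")
    log.record("alice", "CANCEL_ORDER", "Order", "42")
    print([e.action for e in log.for_user("alice")])  # ['CREATE_ORDER', 'CANCEL_ORDER']
```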