Logging Best Practices for AWS Lambda and Microservices
Inconsistent logging in your system might be keeping you up at night, given all the nightmares it can cause.
Architects, developers, and even product owners know that the “-ilities of logging” - visibility, traceability, supportability, and telemetry - are essential for support, customer experience, and profitability.
When you think about logging, it’s pretty straightforward. But it’s the implementation and standards of logging that make it successful.
While improving IT and business in a myriad of ways, modern microservices-based architectures throw a wrench in the works regarding logging. The fact that AWS Lambdas can scale so fluidly means many logs are produced concurrently from many different locations, which complicates things.
The benefits of logging make it worth the effort, but without established best practices for logging in AWS Lambda, common pain points can lead to mistakes. Understanding these pain points and the best ways to implement AWS Lambda and microservices logging will give you access to those much sought-after “-ilities of logging.”
The Benefits of Logging
Why should you even care about logging in the first place? As mentioned above, the “-ilities of logging” are essential to keeping things up, running, easy to use, performant, productive, and easier to maintain.
Seeing into your system is not easy! Logging helps give you proper insight into your system.
Microservices have increased the number of touch points for a transaction. Logging allows you to trace those transactions across your ecosystem.
By combining visibility and traceability, logging makes supporting your system more manageable for developers, support staff, and non-technical team members.
An honorary “-ility,” telemetry is an awesome addition to the list. With proper insight into your system, many organizations are able to collect enough data to get good telemetry on the KPIs within their system.
Logging Pain Points
There is no denying that logging pain points are amplified with serverless and can potentially worsen as your serverless footprint grows.
For instance, it’s challenging to standardize logs across many different Lambdas and microservices. Ideally, you want to enforce logging standards across development teams, systems, applications, microservices, and even developers within a team.
There are pain points for each of our “-ilities of logging”:
Lack of visibility makes it difficult to determine what is occurring in your system and makes debugging extremely challenging for developers.
Lack of traceability makes it hard to search for a transaction from end-to-end across Lambdas and microservices. Without proper traceability, you lose a transaction as it crosses the boundary from one Lambda to another.
Lack of supportability drives support costs up and makes systems hard to maintain, because tracking down issues becomes difficult.
Simply implementing logging without proper design and planning is not enough and can lead to some common mistakes.
Logging Common Mistakes
Logging mistakes in AWS Lambda create situations that can be as confusing as no logging at all. The list of potential mistakes, unfortunately, is substantial:
- No design for logging or thinking of logging as an afterthought instead of addressing it from the start
- Lack of logging standards
- Hard coding error messages
- Lack of request identifiers to help you trace a request across microservice boundaries
- Logging payloads or sensitive data (this happens frequently!)
- Logging too much information, which increases the price of your logging or storage solution
- Logging too little, which doesn’t give you enough information to work from
Logging Best Practices in AWS Lambda
If you follow a few key best practices, you can alleviate the pain points without perpetuating the common mistakes.
Use whatever logging standards you like, but ensure they apply across the whole ecosystem. We recommend JSON as the log format because so many logging targets today understand it.
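As a rough illustration of a JSON logging standard, the sketch below formats every record as one JSON object per line using Python's standard `logging` module. The field names (`timestamp`, `level`, `service`, `message`) and the `build_logger` helper are assumptions made for this example, not a prescribed schema; pick whatever fields your teams agree on and apply them everywhere.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # Field names here are illustrative, not a required schema.
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            # "service" is a hypothetical field supplied through `extra`.
            "service": getattr(record, "service", "unknown"),
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(name: str) -> logging.Logger:
    """Return a logger whose only handler emits JSON lines."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on warm starts
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # keep the root logger from re-emitting the record
    return logger


logger = build_logger("orders")
logger.info("Order received", extra={"service": "order-intake"})
```

Because every line is valid JSON with the same keys, a log target can filter on `level` or `service` without custom parsing rules.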
Using informative logging messages increases both visibility and traceability by letting you see what exactly is occurring inside the system and at what time.
Using Transaction IDs and including them in every log message across Lambdas improves traceability and lets you trace your transaction from end to end.
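One way to carry a transaction ID across Lambda boundaries is sketched below. It assumes an API Gateway style event with a `headers` map and uses a hypothetical header name, `x-transaction-id`; the `log` helper and field names are also assumptions for illustration. The handler reuses an incoming ID when one is present, generates one otherwise, stamps it on every log line, and returns it so the next service can do the same.

```python
import json
import uuid

TRANSACTION_HEADER = "x-transaction-id"  # hypothetical header name


def resolve_transaction_id(event: dict) -> str:
    """Reuse the caller's transaction ID, or start a new one."""
    headers = event.get("headers") or {}
    # Header casing can vary by caller, so compare case-insensitively.
    for key, value in headers.items():
        if key.lower() == TRANSACTION_HEADER and value:
            return value
    return str(uuid.uuid4())


def log(transaction_id: str, level: str, message: str) -> str:
    """Print one JSON log line that always carries the transaction ID."""
    line = json.dumps(
        {"transactionId": transaction_id, "level": level, "message": message}
    )
    print(line)
    return line


def handler(event: dict, context: object) -> dict:
    transaction_id = resolve_transaction_id(event)
    log(transaction_id, "INFO", "Request received")
    # ... business logic would go here ...
    log(transaction_id, "INFO", "Request completed")
    return {
        "statusCode": 200,
        # Hand the ID back so downstream calls can propagate it.
        "headers": {TRANSACTION_HEADER: transaction_id},
        "body": json.dumps({"ok": True}),
    }
```

With the same ID on every line, a single search for that value in the centralized log store returns the whole transaction, regardless of how many functions handled it.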
Dynamic error messages
With informative, dynamic error messages, it’s easier to debug problems and pinpoint where an error occurred.
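To contrast with a hard-coded string such as "An error occurred", the sketch below builds the error message from the failing operation, the exception itself, and whatever identifiers are at hand. The `describe_error` and `load_order` functions and the `orderId` field are hypothetical names used only for this example.

```python
import json


def describe_error(operation: str, error: Exception, **context: object) -> str:
    """Build an error log line from runtime details instead of a fixed string."""
    return json.dumps(
        {
            "level": "ERROR",
            "operation": operation,
            "errorType": type(error).__name__,
            "errorMessage": str(error),
            "context": context,
        },
        default=str,  # fall back to str() for values JSON cannot encode
    )


def load_order(orders: dict, order_id: str) -> dict:
    """Look up an order and log a specific message if it is missing."""
    try:
        return orders[order_id]
    except KeyError as error:
        print(describe_error("load_order", error, orderId=order_id))
        raise  # re-raise so the caller still sees the failure
```

The resulting line names the operation, the exception type, and the affected order, which is usually enough to locate the failing code path without reproducing the request.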
Persist logs into a centralized system
With serverless and a large number of microservices, you’ll need a means of combining and aggregating all of your logs. We recommend tools such as CloudWatch Logs and CloudWatch Logs Insights, Elasticsearch, the ELK stack, or Splunk.
Log at appropriate levels
Determining the appropriate levels requires some thought. For example, you can log normal milestones at the INFO level, failures at the ERROR level, and detailed diagnostics at the DEBUG level.
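A common way to keep levels adjustable without redeploying code is to read the level from configuration. The sketch below assumes a hypothetical `LOG_LEVEL` environment variable and a small, fixed set of allowed levels; unknown values fall back to INFO.

```python
import logging
import os

# Allowed level names for this sketch; extend as your standard requires.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_level(logger: logging.Logger, default: str = "INFO") -> int:
    """Set the logger level from the (hypothetical) LOG_LEVEL variable."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = LEVELS.get(name, logging.INFO)  # unknown names fall back to INFO
    logger.setLevel(level)
    return level


logger = logging.getLogger("orders")
configure_level(logger)

logger.debug("Cart contents loaded")     # emitted only when LOG_LEVEL=DEBUG
logger.info("Order accepted")            # normal milestone
logger.error("Payment provider failed")  # failure that needs attention
```

Running with INFO in production and switching a single function to DEBUG while investigating an issue keeps everyday log volume low without losing the option of detail.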
Log the appropriate amount of data
Logging the right amount of data keeps log storage costs in check while also maintaining the right level of visibility.
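One possible way to control both what and how much gets logged is to pass payloads through a sanitizing step first. In the sketch below, the list of sensitive keys and the 200-character limit are assumptions chosen for illustration; a real list would come from your own data classification rules.

```python
import json

# Hypothetical key names to redact; compared in lowercase.
SENSITIVE_KEYS = {"password", "ssn", "cardnumber", "authorization"}
MAX_VALUE_LENGTH = 200  # assumed cap on logged string length
TRUNCATION_MARK = "...[truncated]"


def sanitize(value: object) -> object:
    """Return a copy that is safe to log: secrets redacted, long strings cut."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]"
            if str(key).lower() in SENSITIVE_KEYS
            else sanitize(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    if isinstance(value, str) and len(value) > MAX_VALUE_LENGTH:
        return value[:MAX_VALUE_LENGTH] + TRUNCATION_MARK
    return value


payload = {
    "user": "ann",
    "password": "hunter2",
    "items": [{"sku": "A1", "cardNumber": "4111111111111111"}],
}
print(json.dumps({"level": "INFO", "message": "Order payload", "data": sanitize(payload)}))
```

Sanitizing in one shared function, rather than at each call site, makes it harder for a sensitive field to slip into the logs when a new developer adds a log statement.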
Use Lambda Layers
If you’re using Lambdas, consider using Lambda Layers to create a reusable logging framework across your Lambdas.
Conceptually, logging is straightforward. It’s also essential. But it becomes difficult with a growing collection of microservices. Big Compass has been through logging implementations with many clients, so we’re familiar with what can go wrong and how to mitigate those risks.
We’ve shared some of that knowledge in this article, and we’ve also developed the Serverless Logging Framework to help implement the best practices we recommend. These logging best practices can apply to any microservices architecture, from AWS Lambda to custom applications on EC2, MuleSoft, Boomi, and others.