Security Logging in Cloud Environments - AWS
- Problem Statement
- Which Services Can We Leverage?
- State of the Art Security Logging Platform in AWS
If you had to architect a multi-account security logging strategy, where should you start?
This post, part of the “Continuous Visibility into Ephemeral Cloud Environments” series, will describe a design for a state of the art multi-account security-related logging platform in AWS. Later posts will also cover a similar setup for both GCP and Kubernetes.
One of the usual requirements for Security teams is to improve visibility into (production) environments. To that end, it is often necessary to design and roll out a strategy around security-related logging. This entails defining the scope for logging (resources, frequency, etc.), as well as providing an integration with existing monitoring and alerting systems.
The end goal is to deploy a security logging and monitoring solution with well established metrics and integrations with a SIEM of choice (Elasticsearch in this case). In particular, the solution should be able to:
- Collect security-related logs from all environments.
- Ingest those logs into a SIEM (e.g., Elasticsearch).
- Parse those logs and use them to generate dashboards in Kibana.
- Create alerts on anomalies.
This post is composed of two main parts. The first introduces the logging-related services AWS makes available to its customers, along with their main features. The second describes a state of the art design for a security-related logging platform, and provides the high-level architecture and best practices to follow during the implementation phase.
Which Services Can We Leverage?
AWS offers multiple services around logging and monitoring. For example, you have almost certainly heard of CloudTrail and CloudWatch, but they are just the tip of the iceberg.
CloudWatch Logs is the default logging service for many AWS resources (like EC2, RDS, etc.): it captures application events and error logs, and allows you to monitor and troubleshoot application performance. CloudTrail, on the other hand, works at a lower level, monitoring API calls for various AWS services.
Although listing (and describing) all services made available by AWS is out of scope for this blog post, there are a few brilliant resources which tackle this exact problem:
- “How to Enable Logging on Every AWS Service in Existence (Circa 2021)” from Matt Fuller tries to be the definitive guide to answer the question “how do I enable logging?” for every supported AWS service. Alongside this, Matt published a Google Sheet summarising the content of this blog post.
- “Logging in the Cloud: From Zero to (Incident Response) Hero” is the annotated slide deck (131 pages!) of a good talk delivered at RSA 2020 by the Secureworks team, which tries to answer questions like “What Should I Be Logging?”, “How Specifically Should I Configure it?”, and “What Should I Be Monitoring?”. It is especially interesting since it covers not only AWS, but also GCP and Azure.
- “What You Need to Know About AWS Security Monitoring, Logging, and Alerting” lays out the different AWS security monitoring and logging sources, and how to select the most appropriate collection technique for each of them.
- “Overview of AWS Logs” lists the main AWS logging sources with a summary table and, for each source, the format, an example, and a Grok pattern to parse the logs and ingest them into a tool like the Elastic Stack (ELK).
In the remainder of this section I’ll provide a summary of the main services we will need to design our security logging platform. Before doing so, though, it might be helpful to have a high-level overview of how these services communicate (special thanks to Scott Piper for the original idea):
AWS CloudTrail is defined as:
A service that enables governance, compliance, operational auditing, and risk auditing of AWS accounts. CloudTrail can be used to log, continuously monitor, and retain account activity related to actions across an AWS-based infrastructure.
CloudTrail provides event history of AWS account activity, including actions taken through the AWS Management Console, AWS SDKs, command line tools, and other AWS services. This event history can be leveraged for security analysis, resource change tracking, and troubleshooting. In addition, CloudTrail can be used to detect unusual activity in your AWS accounts.
In short, CloudTrail monitors AWS API calls across nearly every AWS service, recording information such as the user agent, IP address, IAM user or role ARN, and other details about the request. It delivers log files to a designated S3 bucket approximately every five minutes, along with the option of log file integrity validation. CloudTrail can also be configured to send a message via SNS when new logs are delivered, and integrates with CloudWatch Logs and Lambda for processing.
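To make the shape of these logs concrete, here is a minimal sketch of reading one CloudTrail log file after it has been downloaded from S3 and decompressed. The field names (`Records`, `eventName`, `sourceIPAddress`, `userAgent`, `userIdentity.arn`) follow the CloudTrail record format; the function name and the sample record are illustrative assumptions.

```python
import json


def summarize_cloudtrail_records(log_file_text):
    """Return one small summary dict per record in a CloudTrail log file.

    `log_file_text` is the decompressed JSON content of a log file
    delivered by CloudTrail (a top-level object with a "Records" list).
    """
    records = json.loads(log_file_text).get("Records", [])
    summaries = []
    for record in records:
        identity = record.get("userIdentity", {})
        summaries.append({
            "time": record.get("eventTime"),
            "service": record.get("eventSource"),
            "action": record.get("eventName"),
            "region": record.get("awsRegion"),
            "source_ip": record.get("sourceIPAddress"),
            "user_agent": record.get("userAgent"),
            # Not every identity type carries an ARN, hence .get()
            "principal_arn": identity.get("arn"),
        })
    return summaries


# Hypothetical sample record, for illustration only.
SAMPLE_LOG = json.dumps({
    "Records": [{
        "eventTime": "2021-01-01T12:00:00Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "DeleteBucket",
        "awsRegion": "eu-west-1",
        "sourceIPAddress": "203.0.113.10",
        "userAgent": "aws-cli/2.0.0",
        "userIdentity": {
            "type": "IAMUser",
            "arn": "arn:aws:iam::111122223333:user/example",
        },
    }]
})

if __name__ == "__main__":
    for summary in summarize_cloudtrail_records(SAMPLE_LOG):
        print(summary)
```

A flattened summary like this is roughly what a SIEM parser would extract before indexing.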
Data Events, which are not logged by default, can also be leveraged to provide visibility into the operations performed on or within a resource (also known as data plane operations). Be aware that these events are often high-volume activities.
CloudWatch is a monitoring and observability service that provides data and actionable insights to monitor applications and systems, optimize resource utilization, and get a unified view of operational health.
It collects monitoring and operational data in the form of logs, metrics, and events, and visualizes it using automated dashboards to provide a unified view of resources, applications, and services. CloudWatch can also be used to create alarms based on custom metric value thresholds, or can watch for anomalous metric behavior based on machine learning algorithms. Automated actions can be set up to notify members of staff if an alarm is triggered.
CloudWatch Logs can be manually exported to S3 for long-term storage, or streamed through subscription filters to destinations such as Lambda, a Kinesis data stream, or a Kinesis Data Firehose delivery stream.
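As an illustration of the streaming path, the sketch below decodes the payload a Lambda function receives from a CloudWatch Logs subscription: the data arrives base64-encoded and gzip-compressed under `event["awslogs"]["data"]`. The function names and the flattened output shape are assumptions made for this example.

```python
import base64
import gzip
import json


def decode_subscription_payload(event):
    """Decode a CloudWatch Logs subscription event delivered to Lambda.

    Returns a list of flattened log events, or an empty list for
    control messages (which carry no log data).
    """
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return [
        {
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
            "timestamp": log_event["timestamp"],
            "message": log_event["message"],
        }
        for log_event in payload["logEvents"]
    ]


def encode_for_demo(payload):
    """Build an event in the same envelope, for local testing only."""
    raw = gzip.compress(json.dumps(payload).encode("utf-8"))
    return {"awslogs": {"data": base64.b64encode(raw).decode("ascii")}}


if __name__ == "__main__":
    demo = encode_for_demo({
        "messageType": "DATA_MESSAGE",
        "logGroup": "/example/app",       # hypothetical log group
        "logStream": "instance-1",        # hypothetical log stream
        "logEvents": [{"id": "1", "timestamp": 1609502400000,
                       "message": "login failed for user admin"}],
    })
    print(decode_subscription_payload(demo))
```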
GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect AWS accounts and workloads.
The service uses machine learning, anomaly detection, and integrated threat intelligence (from AWS, CrowdStrike, and Proofpoint) to identify and prioritize potential threats such as:
- Crypto-currency mining.
- Credential compromise behavior.
- Communication with known command-and-control servers.
- API calls from known malicious IPs.
In addition to detecting threats, GuardDuty can perform automated remediation actions by leveraging CloudWatch Events and Lambda.
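A minimal sketch of the Lambda side of such a setup follows: it inspects the event that CloudWatch Events delivers for a GuardDuty finding and decides whether the finding is worth acting on. The `source`, `detail-type`, and `detail` fields follow the GuardDuty event format; the function name, the default threshold of 7.0, and the returned dict are assumptions, and the actual remediation step is deliberately left out.

```python
def triage_guardduty_event(event, min_severity=7.0):
    """Return a compact summary for findings at or above `min_severity`.

    Returns None for events that are not GuardDuty findings or that
    fall below the (assumed) severity threshold.
    """
    if event.get("source") != "aws.guardduty":
        return None
    if event.get("detail-type") != "GuardDuty Finding":
        return None

    detail = event.get("detail", {})
    severity = float(detail.get("severity", 0))
    if severity < min_severity:
        return None

    return {
        "finding_id": detail.get("id"),
        "type": detail.get("type"),
        "severity": severity,
        "account": detail.get("accountId"),
        "region": detail.get("region"),
    }


def lambda_handler(event, context):
    """Entry point: a real handler would notify or remediate here."""
    summary = triage_guardduty_event(event)
    if summary is not None:
        print(f"Actionable GuardDuty finding: {summary}")
    return summary
```

Keeping the triage logic in a pure function makes it easy to test without deploying anything.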
Config creates an inventory of AWS resources, including configuration history, change notifications, and the relationships between resources. It provides a timeline of resource configuration changes for specific services, stores snapshots in a specified S3 bucket, and can send SNS notifications when AWS resource changes are detected.
The main use cases for Config are tracking changes to resource configurations, answering questions about resource configurations, demonstrating compliance (either at a specific point in time or over a period of time), troubleshooting, and performing security analysis.
Access Logs deserve a particular mention, since they are generated by a variety of services:
- VPC Flow Logs
- Elastic Load Balancing (v1)
- Application Load Balancers (ALB)
- Network Load Balancers (NLB)
- Databases (Redshift, RDS, DynamoDB)
State of the Art Security Logging Platform in AWS
So how could we design a multi-account security-related logging platform in AWS?
Let’s start with a high-level architecture diagram of a solution with multiple “projects” (or customers), each with production and non-production environments (note how every project/customer will have the same setup). Here I will assume the workloads run predominantly in a Kubernetes cluster (managed EKS), but with some stateful services involved as well (i.e., RDS).
Starting from collection, logging services should be enabled in every AWS account, so as to collect logs from every environment (whether production or not).
In particular, the following information should be collected:
- API Call Logs
- Application Event Logs
- DNS Query Logs
Since CloudTrail retains its event history for a limited period of time (90 days), an Organization Trail (see “Creating a Trail for an Organization”) should be configured to store logs for extended periods, both to meet compliance obligations and for historical analysis.
In addition, an Organization Trail would prevent Member (child) accounts from disabling CloudTrail and/or modifying the trail itself.
In order to obtain a complete record of events (whether taken by a user, role, or service), each trail should be configured to log events in all AWS Regions. This ensures that all events that occur in an AWS account are logged, regardless of the AWS Region in which they occurred. This includes logging global service events, which are logged to an AWS Region specific to a service.
Logging in all AWS Regions has the added benefit that, if an AWS Region is added after a trail has been created, that new region is automatically included, and events in that region are logged by default.
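The trail settings discussed above could be expressed as follows. This sketch only builds the request parameters; the keys match the CloudTrail `CreateTrail` API (for example as exposed by boto3's `create_trail`), while the function name, trail name, and bucket name are hypothetical. It assumes the call is made from the organization's management account and that the bucket policy already allows CloudTrail to write to it.

```python
def organization_trail_params(trail_name, bucket_name, kms_key_id=None):
    """Build CreateTrail parameters for a multi-Region organization trail."""
    params = {
        "Name": trail_name,
        "S3BucketName": bucket_name,
        # Apply the trail to every account in the AWS Organization.
        "IsOrganizationTrail": True,
        # Log events in all Regions, including ones added later.
        "IsMultiRegionTrail": True,
        # Capture events from global services such as IAM.
        "IncludeGlobalServiceEvents": True,
        # Produce digest files so log integrity can be validated later.
        "EnableLogFileValidation": True,
    }
    if kms_key_id is not None:
        # Optional: encrypt log files with a KMS key instead of SSE-S3.
        params["KmsKeyId"] = kms_key_id
    return params


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    params = organization_trail_params("org-security-trail", "example-org-logs")
    print(params)
    # With boto3 installed and credentials configured, this could be sent as:
    #   client = boto3.client("cloudtrail")
    #   client.create_trail(**params)
    #   client.start_logging(Name=params["Name"])
```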
Since the integrity, completeness, and availability of the collected logs are crucial for forensic and auditing purposes, a queueing system like Kinesis should be used to receive and buffer all the logs collected.
This not only improves the resiliency of the platform, by queueing (rather than discarding) messages if a downstream component meant to consume logs fails, but also decouples log ingestion from log consumption.
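On the producer side, buffering into Kinesis could look like the sketch below, which groups log events into request bodies for the Kinesis `PutRecords` API (up to 500 records per request). The function name, the stream name, and the choice of `account_id` as partition key are assumptions; the total request size limit and retries are not handled here.

```python
import json

# PutRecords accepts at most 500 records per request.
MAX_RECORDS_PER_REQUEST = 500


def build_put_records_batches(stream_name, log_events,
                              partition_key_field="account_id"):
    """Split log events into PutRecords request bodies.

    Each event is serialized to JSON; the partition key (assumed here to
    be the originating account ID) decides which shard receives it.
    """
    entries = [
        {
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event.get(partition_key_field, "unknown")),
        }
        for event in log_events
    ]
    return [
        {
            "StreamName": stream_name,
            "Records": entries[start:start + MAX_RECORDS_PER_REQUEST],
        }
        for start in range(0, len(entries), MAX_RECORDS_PER_REQUEST)
    ]


if __name__ == "__main__":
    events = [{"account_id": "111122223333", "message": f"event {i}"}
              for i in range(1201)]
    batches = build_put_records_batches("example-security-logs", events)
    print([len(batch["Records"]) for batch in batches])
    # Each batch could be sent with boto3 as:
    #   boto3.client("kinesis").put_records(**batch)
    # The response's FailedRecordCount should be checked and failed
    # records retried, since PutRecords can partially succeed.
```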
Long-Term Storage and Audit Trail
A dedicated and highly restricted AWS account (here named Logging Account) should also be created for each project/customer for long-term (immutable) storage of the logs.
In that account a Logstash Agent can be used to pull logs directly from Kinesis (by consuming the stream) and to store them in an S3 bucket, where they will be treated as immutable files.
This can be achieved via S3 Object Lock in Compliance mode (see “Protecting data with Amazon S3 Object Lock”) to ensure that nobody, including the root user in the AWS account, would be able to delete the objects during a pre-defined retention period.
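A default retention rule of this kind could be sketched as below. The structure matches the S3 `PutObjectLockConfiguration` API; the function name, bucket name, and retention period are hypothetical. The sketch assumes Compliance mode, the Object Lock mode in which no user (including root) can delete or overwrite a protected object version before retention expires, and a bucket that has Object Lock enabled.

```python
def object_lock_params(bucket_name, retention_days):
    """Build PutObjectLockConfiguration parameters for a log bucket.

    Every new object version inherits the default retention, so it
    cannot be deleted or overwritten until `retention_days` have passed.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be a positive integer")
    return {
        "Bucket": bucket_name,
        "ObjectLockConfiguration": {
            "ObjectLockEnabled": "Enabled",
            "Rule": {
                "DefaultRetention": {
                    "Mode": "COMPLIANCE",
                    "Days": retention_days,
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket name and retention period.
    params = object_lock_params("example-logging-account-logs", 365)
    print(params)
    # With boto3, this could be applied as:
    #   boto3.client("s3").put_object_lock_configuration(**params)
```

Because a Compliance mode retention period cannot be shortened once set, it is worth trying the value on a test bucket first.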
In addition, a Data Loss Prevention (DLP) solution could be employed to prevent and detect cases of attempted data exfiltration. It should be noted that, to ensure the integrity of the logs stored in such projects, IAM controls should be put in place to limit access to these S3 buckets.
As an additional measure, log files should be encrypted.
Although by default CloudTrail encrypts all log files using S3 server-side encryption (SSE-S3), these files should be encrypted with a customer managed AWS Key Management Service key (SSE-KMS) instead (see “Encrypting CloudTrail Log Files with AWS KMS–Managed Keys (SSE-KMS)”).
To ensure durability of the logs collected in each Logging Account, MFA Delete should be enabled on the S3 bucket where the log files are stored (see “S3 MFA Delete”).
MFA Delete ensures that any attempt to change the versioning state of the bucket or permanently delete an object version requires additional authentication.
This helps prevent any operation that could compromise the integrity of the log files,
even if a malicious user acquires the password of an IAM user that has permissions to permanently delete S3 objects.
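For reference, enabling MFA Delete goes through the S3 `PutBucketVersioning` API, and the request could be built as in this sketch. The parameter names match that API; the function name, bucket name, and MFA device values are hypothetical. Note that S3 only accepts this change when the request is made with the bucket owner's root credentials.

```python
def mfa_delete_params(bucket_name, mfa_serial, mfa_code):
    """Build PutBucketVersioning parameters that turn on MFA Delete.

    The MFA value is the device serial number (or ARN), a space, and
    the current code shown on the device.
    """
    return {
        "Bucket": bucket_name,
        "MFA": f"{mfa_serial} {mfa_code}",
        "VersioningConfiguration": {
            # MFA Delete requires versioning to be enabled.
            "Status": "Enabled",
            "MFADelete": "Enabled",
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    params = mfa_delete_params(
        "example-logging-account-logs",
        "arn:aws:iam::111122223333:mfa/root-account-mfa-device",
        "123456",
    )
    print(params)
    # With boto3 and root credentials, this could be applied as:
    #   boto3.client("s3").put_bucket_versioning(**params)
```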
In case of a forensic investigation, the CloudTrail Log File Integrity Validation
(see “Validating CloudTrail Log File Integrity”)
process could be used to validate the integrity of the log files stored in each
Logging Account and detect whether the log files were unchanged, modified,
or deleted since CloudTrail delivered them.
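One step of that validation can be sketched in a few lines: comparing a delivered log file against the hash recorded for it in a CloudTrail digest file. This assumes the digest entry's `hashValue` is the hex-encoded SHA-256 of the uncompressed log content, as the digest file structure documentation describes; the function name is hypothetical. It is only a partial check, since full validation also verifies each digest file's signature and the chain between digests, which the `aws cloudtrail validate-logs` CLI command performs.

```python
import gzip
import hashlib


def log_file_matches_digest(log_file_gz_bytes, digest_entry):
    """Check a gzip-compressed log file against its digest file entry.

    `digest_entry` is one item of the digest file's "logFiles" list.
    """
    if digest_entry.get("hashAlgorithm") != "SHA-256":
        raise ValueError("unsupported hash algorithm in digest entry")
    content = gzip.decompress(log_file_gz_bytes)
    actual = hashlib.sha256(content).hexdigest()
    return actual == digest_entry["hashValue"].lower()


if __name__ == "__main__":
    # Build a self-consistent demo pair: a log file and its digest entry.
    original = b'{"Records": []}'
    entry = {
        "hashAlgorithm": "SHA-256",
        "hashValue": hashlib.sha256(original).hexdigest(),
    }
    print(log_file_matches_digest(gzip.compress(original), entry))
    tampered = gzip.compress(b'{"Records": [{"eventName": "Forged"}]}')
    print(log_file_matches_digest(tampered, entry))
```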
Monitoring and Alerting
Finally, a centralized AWS account (here called
Centralized Monitoring Account)
can then be used to aggregate logs collected from the different projects.
In this account, another Logstash Agent will have dedicated subscriptions to pull logs from the Kinesis streams defined in every account and forward them to an Elasticsearch instance used by a Security Operations Center (SOC) team to monitor and respond to threats in (near) real time.
In conjunction, the machine learning, anomaly detection, and integrated threat intelligence provided by GuardDuty can be leveraged to obtain an out of the box set of alerts with a very good signal-to-noise ratio (i.e., if a GuardDuty alert fires, you will probably want to investigate it).
In this blog post, part of the “Continuous Visibility into Ephemeral Cloud Environments” series, I described a possible approach for designing a multi-account security-related logging platform in AWS.
Later posts will also cover a similar setup for both GCP and Kubernetes.
I hope you found this post useful and interesting, and I’m keen to get feedback on it! If you found the information useful, if something is missing, or if you have ideas on how to improve it, please let me know on Twitter.