If you had to architect a multi-account security logging strategy, where should you start?

This post, part of the “Continuous Visibility into Ephemeral Cloud Environments” series, will describe a design for a state-of-the-art multi-account security-related logging platform in GCP.

A previous post covered a similar setup for AWS, so I have followed the same structure here. A later post will cover a setup for Kubernetes instead.

This is a living document, which I regularly update as new services/improvements get released.

Problem Statement

One of the usual requirements for Security teams is to improve visibility into (production) environments. In this regard, it is often necessary to design and roll out a strategy around security-related logging. This entails defining the scope for logging (resources, frequency, etc.), as well as providing an integration with existing monitoring and alerting systems.

The end goal is to deploy a security logging and monitoring solution with well-established metrics and integrations with a SIEM of choice (Elasticsearch in this case). In particular, the solution should be able to:

  • Collect security-related logs from all environments.
  • Ingest those logs into a SIEM (e.g., Elasticsearch).
  • Parse those logs and use them to generate dashboards in Kibana.
  • Create alerts on anomalies.

In this regard, this post is composed of two main parts. The first introduces the logging-related services GCP makes available to its customers, along with their main features. The second describes a state-of-the-art design for a security-related logging platform, and provides the high-level architecture and best practices to follow during the implementation phase.

Which Services Can We Leverage?

Similar to AWS, GCP offers multiple services around logging and monitoring. Cloud Operations (formerly known as Stackdriver) is defined as a suite of products to monitor, troubleshoot, and operate services at scale. It now includes Cloud Logging, Cloud Monitoring, Cloud Trace, Cloud Debugger, and Cloud Profiler.

In the remainder of this section I’ll provide a summary of the main services we will need to design our security logging platform.

Cloud Logging

Cloud Logging receives, indexes, and stores log entries from many sources, including GCP, AWS, VM instances running the fluentd agent, and user applications:

Type Description
Agent Logs
  • Application and Host/OS-level logs can be collected via the Cloud Logging Agent, an application based on fluentd that runs on supported VM instances.
  • The Agent is installed by default on VMs running in Google Kubernetes Engine or App Engine.
  • By default, it collects the following logs:
    • For Linux: Syslog, nginx, apache2, apache-error.
    • For Windows: Windows Event Logs.
  • For a list of all the monitored resource types used in the Logging API, refer to the Monitored resources and services page of the GCP documentation.
Cloud Audit Logs
  • GCP services write audit log entries to help answer the questions of "who did what, where, and when?" within Google Cloud resources.
  • Cloud Audit Logs maintains multiple types of audit logs (more on this below):
    1. Admin Activity
    2. System Event
    3. Data Access
    4. Policy Denied
  • For a list of Google Cloud services that write audit logs, see Google services with audit logs.
Access Transparency Logs
  • Logs of actions taken by Google staff when accessing your data.
  • The difference here is that, while Cloud Audit Logs provides logs about actions taken by members within your own organization, Access Transparency provides logs of actions taken by Google staff.
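Concretely, answering the “who did what, where, and when?” question means reading a handful of documented fields out of each audit log entry. The sketch below (with an illustrative, trimmed-down entry) shows where those fields live in the LogEntry/AuditLog shape; the project, instance, and user names are made up:

```python
import json

# A trimmed-down Cloud Audit Log entry, following the documented LogEntry /
# AuditLog shape (protoPayload with authenticationInfo, methodName, and
# resourceName). All values are illustrative.
sample_entry = json.loads("""
{
  "logName": "projects/my-project/logs/cloudaudit.googleapis.com%2Factivity",
  "timestamp": "2021-01-01T12:00:00Z",
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "serviceName": "compute.googleapis.com",
    "methodName": "v1.compute.instances.insert",
    "resourceName": "projects/my-project/zones/us-central1-a/instances/vm-1",
    "authenticationInfo": {"principalEmail": "alice@example.com"}
  }
}
""")

def who_did_what(entry):
    """Extract the "who did what, where, and when" from an audit log entry."""
    payload = entry.get("protoPayload", {})
    return {
        "who": payload.get("authenticationInfo", {}).get("principalEmail"),
        "what": payload.get("methodName"),
        "where": payload.get("resourceName"),
        "when": entry.get("timestamp"),
    }

print(who_did_what(sample_entry))
```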

Audit Logs

As briefly mentioned above, Google Cloud Audit Logs record the who, where, and when for activity within your environment, and ultimately help security teams maintain audit trails in GCP.

With them, it is possible to attain the same level of transparency over administrative activities and accesses to data in GCP as in on-premises environments. Every administrative activity is recorded on a hardened, always-on audit trail, which cannot be disabled by any rogue actor.

Cloud Audit Logs provides the following audit logs for each Project, Folder, and Organization within a resource hierarchy:

Type Description Retention Period
Admin Activity Audit Logs
  • Contain log entries for API calls or other administrative actions that modify the configuration or metadata of resources.
  • For example, these logs record when users create VM instances or change Cloud IAM permissions.
  • To view these logs, you must have the Cloud IAM role Logging/Logs Viewer or Project/Viewer.
  • Admin Activity audit logs are always written; they cannot be configured or disabled.
400 days
System Event Audit Logs
  • Contain log entries for Google Cloud administrative actions that modify the configuration of resources. They are generated by Google systems (they are not driven by direct user action).
  • For example, these logs record when GCE live migrates an instance to another host.
  • To view these logs, you must have the Cloud IAM role Logging/Logs Viewer or Project/Viewer.
  • System Event audit logs are always written; they cannot be configured or disabled.
400 days
Data Access Audit Logs
  • Contain API calls that read the configuration or metadata of resources, as well as user-driven API calls that create, modify, or read user-provided resource data.
  • Data Access audit logs consist of three sub-types:
    1. Admin read: reads of service metadata or configuration data (e.g., listing buckets or nodes within a cluster)
    2. Data read: reads of data within a service (e.g., listing data within a bucket)
    3. Data write: writes of data to a service (e.g., writing data to a bucket)
  • Data Access audit logs do not record the data-access operations on resources that are publicly shared (available to All Users or All Authenticated Users) or that can be accessed without logging into Google Cloud.
  • To view these logs, you must have the Cloud IAM roles Logging/Private Logs Viewer or Project/Owner.
  • Data Access audit logs are disabled by default because they can be quite large.
  • Caveat for GKE: the GKE Admin Activity logs do not include get operations on Secrets by default; to have these logged, you'll have to enable Data Access Logs.
30 days
Policy Denied Audit Logs
  • Cloud Logging records Policy Denied audit logs when a Google Cloud service denies access to a user or service account because of a security policy violation.
  • To view these logs, you must have the IAM role Logging/Logs Viewer or Project/Viewer.
30 days

For more information, see Best practices for Cloud Audit Logs.
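Since Data Access audit logs are disabled by default, they have to be turned on explicitly by adding an auditConfigs stanza to the IAM policy of the relevant project, folder, or organization. Below is a sketch of that documented JSON shape; the service-account email is an illustrative placeholder:

```python
# The "auditConfigs" stanza of an IAM policy (as passed to setIamPolicy) that
# enables Data Access audit logs. "allServices" turns them on for every
# service; exemptedMembers can exclude noisy accounts from DATA_READ logging.
audit_config = {
    "auditConfigs": [
        {
            "service": "allServices",
            "auditLogConfigs": [
                {"logType": "ADMIN_READ"},
                {"logType": "DATA_READ",
                 "exemptedMembers": ["serviceAccount:noisy-sa@my-project.iam.gserviceaccount.com"]},
                {"logType": "DATA_WRITE"},
            ],
        }
    ]
}

def enabled_log_types(policy, service):
    """Return the Data Access log types enabled for a given service."""
    for cfg in policy.get("auditConfigs", []):
        if cfg["service"] in (service, "allServices"):
            return [c["logType"] for c in cfg["auditLogConfigs"]]
    return []

print(enabled_log_types(audit_config, "container.googleapis.com"))
```

Because the stanza above uses "allServices", it also covers GKE (container.googleapis.com), which addresses the caveat about get operations on Secrets mentioned earlier.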

Cloud Monitoring

Cloud Monitoring collects metrics, events, and metadata from GCP, AWS, hosted uptime probes, and application instrumentation. It also provides dashboards, alerts, and uptime checks that can be used to ensure systems are running reliably.

In addition, Cloud Monitoring allows you to create custom alerting policies: whenever events trigger conditions in one of the defined alerting policies, Cloud Monitoring creates and displays an incident in the console. If you set up notifications, Cloud Monitoring can also send them to people or third-party notification services.

Cloud Identity

Cloud Identity is Google’s Identity as a Service (IDaaS) product, which can be used to provision, manage, and authenticate users across GCP environments. Cloud Identity is how people in an organization gain a Google identity, and it’s these identities that are granted access to Google Cloud resources.

In this regard, Cloud Identity logs track events that may have a direct impact on a GCP environment. Relevant logs include:

Type Description
Admin Audit Logs
  • Track actions performed in the Google Admin Console.
  • For example, they show when an administrator added a user or changed a setting.
Login Audit Logs
  • Track when users sign in to the domain.
  • Interesting events are:
    • Failed Login: logged every time a user fails to log in.
    • Suspicious Login: logged when a user signs in under suspicious circumstances, such as from an unfamiliar IP address.
Groups Audit Logs
  • Track changes to group settings and group memberships in Google Groups.
OAuth Token Audit Logs
  • Track third-party application usage and data access requests.
SAML Audit Logs
  • Track successful and failed logins to SAML applications.
  • Only available to G Suite/Cloud Identity Premium customers.

Security Command Center

Security Command Center is defined by Google as a risk dashboard and analytics system for surfacing, understanding, and remediating Google Cloud security and data risks across an organization.

Security Command Center enables the generation of insights that provide a unique view of incoming threats and attacks to Google Cloud resources (called “assets”), by displaying possible security risks (called “findings”) that are associated with each asset. Findings can come from security sources that include Security Command Center’s built-in services, third-party partners (like Cloudflare, CrowdStrike, Prisma Cloud, and Qualys), or even custom sources.

Security Command Center - Courtesy of Google.

Security Command Center currently focuses on asset inventory, discovery, search, and management:

Feature Description
Asset discovery and inventory
  • Cloud Asset Inventory
    • Discover and view assets in near-real time across App Engine, BigQuery, Cloud SQL, Cloud Storage, Compute Engine, Cloud IAM, Google Kubernetes Engine, and more.
    • Review historical discovery scans to identify new, modified, or deleted assets.
Threat prevention
  • Understand the security state of your Google Cloud assets.
  • Security Health Analytics:
    • Provides managed vulnerability assessment scanning that can automatically detect the highest severity vulnerabilities and misconfigurations for Google Cloud assets
  • Web Security Scanner (💰 PREMIUM):
    • Provides managed scans that identify common web application vulnerabilities (such as cross-site scripting or outdated libraries) in web applications running on App Engine, GKE, and Compute Engine
    • The full list of finding types is available on the GCP documentation
Threat detection
  • Event Threat Detection (💰 PREMIUM):
    • Monitors the Cloud Logging stream and consumes logs for one or more projects as they become available
    • It detects threats like:
      • Malware
      • Cryptomining
      • Brute force SSH
      • Outgoing DoS
      • IAM anomalous grant
      • Data exfiltration
    • The full list of Event Threat Detection rules is available on the GCP documentation
  • Container Threat Detection (💰 PREMIUM):
    • Continuously monitors the state of Container-Optimized OS node images (see supported GKE versions)
    • It evaluates all changes and remote access attempts to detect runtime attacks in near-real time
    • It includes several detection capabilities, including suspicious binaries and libraries, and uses natural language processing (NLP) to detect malicious bash scripts
    • The full list of Container Threat Detection detectors is available on the GCP documentation
  • Virtual Machine Threat Detection (💰 PREMIUM):
    • Provides threat detection through hypervisor-level instrumentation
    • Scans enabled Compute Engine projects and VM instances to detect unwanted applications, such as cryptocurrency mining software, running in VMs
  • Sensitive Actions Service (💰 PREMIUM):
    • Detects when actions are taken in your Google Cloud organization, folders, and projects that could be damaging to your business if they are taken by a malicious actor
    • Currently in Pre-GA

Alerts triggered by Security Command Center can be turned into real-time notifications via integrations with Pub/Sub.
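As a sketch of what a consumer of those notifications might do, the snippet below decodes an illustrative, trimmed-down NotificationMessage as it would arrive on a Pub/Sub subscription — the message data is base64-encoded JSON containing the finding. Field names follow the documented NotificationMessage/Finding shape; all values are made up:

```python
import base64
import json

# A minimal Security Command Center NotificationMessage, as delivered on a
# Pub/Sub subscription. The organization/source IDs and category are
# illustrative.
raw = base64.b64encode(json.dumps({
    "notificationConfigName": "organizations/123/notificationConfigs/scc-findings",
    "finding": {
        "name": "organizations/123/sources/456/findings/abc",
        "category": "ANOMALOUS_IAM_GRANT",
        "resourceName": "//cloudresourcemanager.googleapis.com/projects/my-project",
        "state": "ACTIVE",
    },
}).encode())

def parse_notification(data: bytes) -> dict:
    """Decode an SCC Pub/Sub message and pull out the fields worth alerting on."""
    msg = json.loads(base64.b64decode(data))
    finding = msg.get("finding", {})
    return {
        "category": finding.get("category"),
        "resource": finding.get("resourceName"),
        "state": finding.get("state"),
    }

print(parse_notification(raw))
```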

Access Logs

Particular mention has to be made for Access Logs, which are generated by a variety of services:

Service Description
Cloud Storage
  • Data Access logs are not recorded by default; once enabled, they provide information about all of the requests made on a specified bucket, including access requests and changes made by the Object Lifecycle Management feature.
  • Logs are created hourly, when there is activity.
  • Cloud Storage administrative activity, instead, is logged automatically, and includes operations that modify the configuration or metadata of a bucket or object.
VPC Flow Logs
  • VPC Flow Logs capture information about the traffic going to and from a VPC's network interfaces, and are enabled at the subnet level.
  • Flow log data is stored using Cloud Logging and can be exported to BigQuery or Pub/Sub for additional analytics or visualization of network traffic flows.
  • VPC Flow Logs can be useful when organizational legal or security policies require capturing network flow data.
Cloud Load Balancing
  • Logs the details of each request/connection made to the Load Balancer (i.e., HttpRequest log fields), along with information explaining why the load balancer returned the HTTP status that it did.
Cloud CDN
  • Each Cloud CDN request is logged in Cloud Logging.
  • Logs for Cloud CDN are associated with the external HTTP(S) load balancer that the Cloud CDN backends are attached to.
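For instance, VPC Flow Logs are switched on through the logConfig block of a subnetwork resource. The sketch below uses the documented Subnetwork API field names; the sampling and aggregation values are just one possible cost/fidelity trade-off:

```python
# The logConfig stanza of a Compute Engine subnetwork resource, which turns
# on VPC Flow Logs for that subnet. The specific interval/sampling values
# below are illustrative.
subnet_log_config = {
    "logConfig": {
        "enable": True,
        "aggregationInterval": "INTERVAL_5_SEC",
        "flowSampling": 0.5,          # sample half of the flows
        "metadata": "INCLUDE_ALL_METADATA",
    }
}

def flow_logs_enabled(subnet: dict) -> bool:
    """Check whether a subnetwork resource has flow logging turned on."""
    return bool(subnet.get("logConfig", {}).get("enable"))

print(flow_logs_enabled(subnet_log_config))
```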


State of the Art Security Logging Platform in GCP

So how could we design a multi-account security-related logging platform in GCP?

Let’s start with a high-level architecture diagram of a solution with multiple “projects” (or customers), each with production and non-production environments (note how every project/customer will have the same setup). Here I will assume the workloads run predominantly in a Kubernetes cluster (managed GKE), but with some stateful services involved as well (e.g., Cloud SQL).

Architecture Diagram - Security Logging Platform in GCP


Starting from collection, Cloud Logging should be enabled in every GCP project, so as to collect logs from every environment (whether production or not).

In particular, the following information should be collected:

Log Type Description
Agent Logs
  • A fluentd-based agent (that can run on supported VM instances) will collect entries from the GKE clusters and the applications running on them.
Application Event Logs
  • The same fluentd-based agent should be used to capture application event and error logs.
Audit and Access Transparency Logs
  • Both Audit and Access Transparency Logs should be collected, as described in the Audit Logs section.
Access Logs
  • VPC Flow Logs: VPC Flow Logs can be collected to comply with regulatory policies that require capturing network flow data, as they record information about IP traffic going to and from a VPC's network interfaces.
  • Cloud Storage: Cloud Storage Access Logging can be enabled to record requests made to buckets.
  • Cloud Load Balancing: Cloud Load Balancing Access Logging can be enabled to record individual requests made to load balancers.
Kubernetes Logs
  • GKE includes native integration with Cloud Monitoring and Cloud Logging: when a new GKE cluster is set up, system and application logs are enabled by default.
    • A dedicated agent is automatically deployed and managed on the GKE nodes to collect logs (along with metadata about the container, pod, and cluster) and forward them to Cloud Logging. Both system logs and app logs are then ingested and stored in Cloud Logging.
  • Control plane logs: control plane API, audit, controller, authenticator, and scheduler logs are collected by GKE itself and forwarded to Cloud Logging.
  • Worker node logs: collection depends on whether the compute plane is self-managed or GCP-managed. I'll write a follow-up post specifically on this.
  • Task container logs: the application logs. I'll write a follow-up post specifically on this.
DNS Query Logs
  • Cloud DNS logging can track queries that name servers resolve for VPC networks.
  • Queries from an external entity directly to a public zone are not logged because a public name server handles them.
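Most of the log streams above end up in Cloud Logging under well-known log IDs, so a single Logging filter can select them for export or analysis. A small helper to build such a filter (the project ID is illustrative) might look like:

```python
# Build a Cloud Logging filter that selects the security-related log streams
# discussed above. The logName suffixes follow the documented URL-encoded
# log IDs (e.g. cloudaudit.googleapis.com%2Factivity).
def security_log_filter(project: str) -> str:
    log_ids = [
        "cloudaudit.googleapis.com%2Factivity",      # Admin Activity audit logs
        "cloudaudit.googleapis.com%2Fsystem_event",  # System Event audit logs
        "cloudaudit.googleapis.com%2Fdata_access",   # Data Access audit logs
        "compute.googleapis.com%2Fvpc_flows",        # VPC Flow Logs
        "dns.googleapis.com%2Fdns_queries",          # Cloud DNS query logs
    ]
    clauses = [f'logName="projects/{project}/logs/{log_id}"' for log_id in log_ids]
    return " OR ".join(clauses)

print(security_log_filter("my-project"))
```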

In conjunction, Cloud Monitoring should be enabled to ingest events, metrics, and metadata, and to generate insights (through dashboards, charts, and alerts). In addition, Cloud Monitoring should also be used to create and manage custom alerting policies (more on this later).

On top of this, it could be useful to also collect findings coming from Security Command Center. Security Command Center, enabled at the GCP Organization level, ingests findings from Security Health Analytics, as well as Event Threat Detection and Container Threat Detection. Once ingested, Notification Configs can be used to dispatch each finding to a Pub/Sub topic hosted in the relevant GCP project (the one the finding is associated with).

Finally, Cloud Identity Logs (at least the Admin, Login and Groups Audit Logs) should be collected, as described in the Cloud Identity section.


Since the integrity, completeness, and availability of the collected logs are crucial for forensic and auditing purposes, a queueing system like Pub/Sub should be used to receive and buffer all the logs collected.

Since Cloud Logging retains app and audit logs for a limited period of time, export sinks should be configured to store logs for extended periods, both to meet compliance obligations and to allow historical analysis. Pub/Sub can then be configured to receive and buffer all the logs forwarded by Cloud Logging, so that they can be exported to any external monitoring service. In this regard, the “Design patterns for exporting from Logging” guide, together with the “Aggregated Exports” feature (which allows you to set up a sink at the Cloud IAM organization level and export logs from all the projects inside the organization), can be used as a reference for the export strategy.

Not only will this improve the resiliency of the platform, by queueing (without discarding) messages in the event of the failure of a downstream component meant to consume logs, but it will also decouple log ingestion from log consumption.
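At the resource level, an aggregated export of this kind boils down to a single organization-level LogSink with includeChildren set, pointing at the buffering Pub/Sub topic. A sketch of that resource (names, project, and topic are illustrative):

```python
# The shape of an organization-level aggregated export sink (a LogSink
# resource), routing audit logs from every project in the organization to a
# Pub/Sub topic. "includeChildren" is what makes the sink aggregated.
aggregated_sink = {
    "name": "org-security-logs",
    "destination": "pubsub.googleapis.com/projects/logging-project/topics/security-logs",
    "filter": 'logName:"cloudaudit.googleapis.com"',
    "includeChildren": True,
}

def is_aggregated(sink: dict) -> bool:
    """An aggregated sink exports logs from all children of the org/folder."""
    return bool(sink.get("includeChildren"))

print(is_aggregated(aggregated_sink))
```

The equivalent CLI invocation would use `gcloud logging sinks create` with the `--organization` and `--include-children` flags.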

Long-Term Storage and Audit Trail

A dedicated and highly restricted Project (here named Logging Project) should also be created for each project/customer for long-term (immutable) storage of the logs.

In that Project, a Logstash Agent can be used to pull logs directly from Pub/Sub topics and store them into a bucket, where they will be treated as immutable files. This can be achieved via Bucket Retention Policies and Retention Policy Locks (see “Retention policies and retention policy locks”), to ensure that nobody is able to delete the objects during a pre-defined retention period.

In addition, a Data Loss Prevention (DLP) solution could be employed to prevent and detect cases of attempted data exfiltration. It should be noted that, to ensure the integrity of the logs stored in such projects, IAM controls should be put in place to limit access to these buckets (see “Access control guide for Cloud Logging”).

Monitoring and Alerting

Finally, a centralized Account/Project (here called Centralized Monitoring Account, and hosted in another cloud provider) can then be used to aggregate the logs collected from the different Projects.

In this account, another Logstash Agent will have dedicated subscriptions to pull logs from each Pub/Sub topic defined in every Project and forward them to an Elasticsearch instance, used by a Security Operations Center (SOC) team to monitor and respond to threats in (near) real time.
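In practice that mapping lives in Logstash configuration, but as a language-agnostic sketch of the transformation step, the hypothetical normalise() helper below derives a per-project, per-log-type Elasticsearch index name from a raw Pub/Sub message, so the SOC can scope Kibana dashboards per stream (the index naming scheme is an assumption for illustration, not a GCP or Elastic convention):

```python
import json

# Map a raw log message (as pulled from Pub/Sub) onto an Elasticsearch index
# named per project and log type. Both the function and the naming scheme
# are illustrative sketches of the Logstash pipeline's job.
def normalise(message: bytes, project: str) -> tuple[str, dict]:
    entry = json.loads(message)
    log_type = entry.get("logName", "unknown").rsplit("/", 1)[-1]
    # e.g. "cloudaudit.googleapis.com%2Factivity" -> "...com-activity"
    index = f"gcp-{project}-{log_type}".lower().replace("%2f", "-")
    return index, entry

index, doc = normalise(
    b'{"logName": "projects/my-project/logs/cloudaudit.googleapis.com%2Factivity"}',
    "my-project",
)
print(index)
```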

As mentioned previously, Cloud Monitoring can also be used to create and manage alerting policies. This way, whenever events trigger conditions in one of the alerting policies, Cloud Monitoring creates and displays an incident in the Monitoring console. Notifications can be set up so that Cloud Monitoring sends them to the relevant staff members.
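For reference, such an alerting policy is just an AlertPolicy resource in the Monitoring v3 API. The sketch below assumes a user-defined log-based metric (iam_changes) counting IAM policy changes; the metric name and notification channel ID are illustrative placeholders:

```python
# The JSON shape of a Cloud Monitoring alerting policy (AlertPolicy, v3 API)
# that fires as soon as a log-based metric counting IAM changes exceeds
# zero. The metric and channel below are hypothetical examples.
alert_policy = {
    "displayName": "IAM policy changed",
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "iam-change-count above zero",
            "conditionThreshold": {
                "filter": 'metric.type="logging.googleapis.com/user/iam_changes"',
                "comparison": "COMPARISON_GT",
                "thresholdValue": 0,
                "duration": "0s",
            },
        }
    ],
    "notificationChannels": ["projects/my-project/notificationChannels/123456"],
}

print(alert_policy["displayName"])
```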


In this blog post, part of the “Continuous Visibility into Ephemeral Cloud Environments” series, I described a possible approach for designing a multi-account security-related logging platform in GCP.

A previous post covered a similar setup for AWS, while a later post will cover Kubernetes instead.

I hope you found this post useful and interesting, and I’m keen to get feedback on it! If you find the information shared was useful, if something is missing, or if you have ideas on how to improve it, please let me know on Twitter.