Semgrep for Cloud Security
Semgrep is an emerging static analysis tool that is gaining traction within the AppSec community. Its broad support for multiple programming languages, together with the ease with which rules can be created, makes it a powerful tool that can help AppSec teams scale their efforts to prevent entire classes of vulnerabilities in their codebases.
But what about cloud security? In the era of Infrastructure as Code, where tools like Terraform, CloudFormation, Pulumi (and many others) are used to provision infrastructure from (de-facto) source code, can we apply the same approach to eradicate classes of cloud-related vulnerabilities from a codebase?
I decided to spend part of my weekend experimenting with this, and to get an idea of what Semgrep can provide to cloud/platform security teams.
What is Semgrep?
Before jumping into the details, it is worth explaining what Semgrep actually is. As per their website, Semgrep is:
A fast, open-source, static analysis tool that excels at expressing code standards — without complicated queries — and surfacing bugs early at editor, commit, and CI time.
Precise rules look like the code you’re searching; no more traversing abstract syntax trees or wrestling with regexes.
The Semgrep Registry has 1,000+ rules written by the Semgrep community covering security, correctness, and performance bugs. No need to DIY unless you want to.
At a high level, Semgrep leverages Abstract Syntax Trees (ASTs) to build a model of the code you are analyzing. Unlike other tools based on ASTs, though, Semgrep lowers the entry bar by abstracting away the AST syntax itself.
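To make the AST idea a bit more concrete, here is a minimal sketch using Python's standard `ast` module. It is only an illustration of why matching on a syntax tree is more robust than matching on raw text; it is not how Semgrep is implemented, and the `find_calls` helper and the sample source are hypothetical.

```python
import ast

# Hypothetical sample source: two real calls to eval() with different
# layouts, plus a string literal that merely contains the same text.
SOURCE = '''
result = eval(user_input)
other = eval(
    user_input   # same call, different layout
)
label = "eval(user_input) in a string is not a call"
'''


def find_calls(source: str, func_name: str) -> list[int]:
    """Return the sorted line numbers of every call to `func_name`."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    )


print(find_calls(SOURCE, "eval"))  # [2, 3]
```

A plain text search for `eval(user_input)` would miss the multi-line call and wrongly flag the string literal, while the tree-based search does neither.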

Out of the box, Semgrep supports mainstream programming languages (e.g., Go, Java, Python, Ruby, JavaScript) and has a library of open source rules ready to be reused.
Explaining how to use Semgrep is out of scope for this blog post, but the official documentation is really well made, and the online playground is an excellent place to start experimenting with it (without having to spend time installing anything).
Semgrep for Infrastructure as Code
As briefly mentioned earlier, the benefit that Semgrep can bring to AppSec teams is obvious (and if you are still not convinced, I recommend watching this presentation from Clint Gibler).
What I was curious to try was how well the same approach could fit a codebase made of Terraform (HCL) and YAML files, as those languages are not currently supported by Semgrep. Hence, I relied on its Generic Pattern Matching engine.
Terraform
The official semgrep-rules repository already contains a folder dedicated to Terraform.

Within this folder, we can see 7 rules already made open source, mainly focusing on Terragoat scenarios and S3 buckets.
Unencrypted EBS Volumes
Let’s start wrapping our heads around it by picking the unencrypted-ebs-volume rule.
In the repo we can see a sample Terraform file (shown below):
resource "aws_ebs_volume" "web_host_storage" {
  availability_zone = "ap-southeast-2"
  encrypted         = false
  size              = 1
  # ruleid: unencrypted-ebs-volume
  tags = {
    Name = "abcd-ebs"
  }
}
Quite straightforward: an aws_ebs_volume resource declares an EBS volume with encryption disabled (as can be seen from encrypted = false).
So what we want to grep for here is an occurrence of encrypted = false (or the lack of encrypted = true), as shown in the corresponding rule:
rules:
- id: unencrypted-ebs-volume
  patterns:
  - pattern-either:
    - pattern: |
        {...}
  - pattern-not-inside: |
      resource "aws_ebs_volume" "..." {... encrypted=true ...}
  - pattern-inside: |
      resource "aws_ebs_volume" "..." {...}
  languages:
  - generic
  paths:
    include:
    - '*.tf'
  message: |
    An EBS volume is configured without encryption enabled.
  severity: WARNING
You can try this rule in the Semgrep playground: https://semgrep.dev/s/ZWrA/.
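For readers who find it easier to reason about the check in imperative terms, the sketch below approximates in Python what the rule expresses: flag every aws_ebs_volume block that does not contain encrypted = true. This is only an illustration of the logic, not how Semgrep evaluates the rule; the helper names and the second volume (`encrypted_storage`) are hypothetical, and the naive brace matching assumes no braces appear inside strings or comments.

```python
import re

# Sample input: the first resource comes from the example above, the
# second one is a hypothetical compliant volume added for contrast.
TERRAFORM = '''
resource "aws_ebs_volume" "web_host_storage" {
  availability_zone = "ap-southeast-2"
  encrypted         = false
  size              = 1
  tags = {
    Name = "abcd-ebs"
  }
}

resource "aws_ebs_volume" "encrypted_storage" {
  availability_zone = "ap-southeast-2"
  encrypted         = true
  size              = 1
}
'''

RESOURCE_HEADER = re.compile(r'resource\s+"aws_ebs_volume"\s+"([^"]+)"\s*\{')
ENCRYPTED_TRUE = re.compile(r'\bencrypted\s*=\s*true\b')


def block_body(text: str, open_brace: int) -> str:
    """Return the text between the brace at `open_brace` and its match."""
    depth = 0
    for i in range(open_brace, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[open_brace + 1 : i]
    return text[open_brace + 1 :]  # unbalanced input: take the rest


def unencrypted_ebs_volumes(text: str) -> list[str]:
    """Names of aws_ebs_volume resources lacking `encrypted = true`."""
    findings = []
    for match in RESOURCE_HEADER.finditer(text):
        body = block_body(text, match.end() - 1)  # match ends on the "{"
        if not ENCRYPTED_TRUE.search(body):
            findings.append(match.group(1))
    return findings


print(unencrypted_ebs_volumes(TERRAFORM))  # ['web_host_storage']
```

Note how much bookkeeping (regexes, brace matching) the Python version needs for something the Semgrep rule states in three patterns.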
Open Security Groups
As a second test, I wanted to create my first Semgrep rule to detect a Security Group open to the world (0.0.0.0/0), like the one below:
resource "aws_security_group" "allow_tls" {
  name        = "allow_tls"
  description = "Allow TLS inbound traffic"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "TLS from VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["10.0.1.0/24", "0.0.0.0/0"]
  }

  tags = {
    Name = "allow_tls"
  }
}
What we want to grep for here is any occurrence of 0.0.0.0/0 within an ingress block:
rules:
- id: open-security-group
  patterns:
  - pattern-inside: ingress { ... }
  - pattern: "0.0.0.0/0"
  languages:
  - generic
  paths:
    include:
    - '*.tf'
  message: |
    A security group is allowing inbound traffic from the public internet (0.0.0.0/0).
  severity: WARNING

You can try this rule in the Semgrep playground: https://semgrep.dev/s/ne51/.
Of course, this is a very basic case, where the offending string (0.0.0.0/0) is directly hardcoded within the security group definition. The rule will have to be extended if we want to take into account cases where the CIDR is specified indirectly, for example via variables.
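To illustrate why the variable case is harder, here is a rough Python sketch of the extra step involved: first collect the variables whose block mentions 0.0.0.0/0, then flag ingress blocks that reference one of them. This is not a Semgrep rule and not something taken from the semgrep-rules repository; the `allowed_cidrs` variable, the helper names, and the naive brace matching are all assumptions made for the example.

```python
import re

# Hypothetical Terraform where the open CIDR is hidden behind a variable.
TERRAFORM = '''
variable "allowed_cidrs" {
  type    = list(string)
  default = ["0.0.0.0/0"]
}

resource "aws_security_group" "allow_tls" {
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = var.allowed_cidrs
  }
}
'''

OPEN_CIDR = "0.0.0.0/0"


def find_blocks(text: str, header: str) -> list[tuple[re.Match, str]]:
    """Return (header match, body) for every `header { ... }` block."""
    results = []
    for match in re.finditer(header + r"\s*\{", text):
        start = match.end() - 1  # position of the opening brace
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    results.append((match, text[start + 1 : i]))
                    break
    return results


def open_ingress_via_variables(text: str) -> list[str]:
    """Variables containing 0.0.0.0/0 that an ingress block references."""
    open_vars = {
        match.group(1)
        for match, body in find_blocks(text, r'variable\s+"([^"]+)"')
        if OPEN_CIDR in body
    }
    findings = []
    for _, body in find_blocks(text, r"\bingress"):
        for name in re.findall(r"\bvar\.(\w+)", body):
            if name in open_vars:
                findings.append(name)
    return findings


print(open_ingress_via_variables(TERRAFORM))  # ['allowed_cidrs']
```

Even this sketch only covers defaults declared in the same file; values supplied through .tfvars files or module inputs would need to be resolved separately.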
Kubernetes
Next, I wanted to create a rule more focused on Kubernetes (or, more precisely, YAML files).
Let’s take as a sample the case where you might want to enforce that all your Kubernetes Ingresses are private, removing all the public ones:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-ingress
  annotations:
    kubernetes.io/ingress.class: public
spec:
  rules:
  - http:
      paths:
      - path: /testpath
        pathType: Prefix
        backend:
          service:
            name: test
            port:
              number: 80
In this example, we want to grep for the kubernetes.io/ingress.class annotation and ensure it has the approved value of nginx-internal:
rules:
- id: public-ingress
  patterns:
  - pattern: kubernetes.io/ingress.class
  - pattern-not-inside: |
      kubernetes.io/ingress.class: nginx-internal
  languages:
  - generic
  paths:
    include:
    - '*.yaml'
  message: |
    An Ingress has been made public.
  severity: WARNING
You can try this rule in the Semgrep playground: https://semgrep.dev/s/ErGE/.
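The same check can be approximated in a few lines of Python, which may help when reasoning about what the rule does and does not catch. This is a text-based sketch using only the standard library (no YAML parser), and the `internal-ingress` manifest and the helper name are hypothetical.

```python
import re

# Two trimmed-down manifests: the public one from the example above and
# a hypothetical compliant one using the approved class.
MANIFEST = '''
kind: Ingress
metadata:
  name: test-ingress
  annotations:
    kubernetes.io/ingress.class: public
---
kind: Ingress
metadata:
  name: internal-ingress
  annotations:
    kubernetes.io/ingress.class: nginx-internal
'''

APPROVED = "nginx-internal"
ANNOTATION = re.compile(
    r'^\s*kubernetes\.io/ingress\.class:[ \t]*"?([^"\s#]+)"?',
    re.MULTILINE,
)


def unapproved_ingress_classes(manifest: str) -> list[str]:
    """Return every ingress class value that is not the approved one."""
    return [value for value in ANNOTATION.findall(manifest) if value != APPROVED]


print(unapproved_ingress_classes(MANIFEST))  # ['public']
```

Like the rule above, this sketch only fires when the annotation is present, so an Ingress without any kubernetes.io/ingress.class annotation goes unreported.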
Conclusions
I have to say that the extensibility and simple syntax of Semgrep make it very promising for cloud security teams as well. In a few hours, thanks to the official documentation and the Playground, I was able to go from absolute zero to writing my first rules.
The main open question I can think of at the moment is: how much does Semgrep overlap with OPA Conftest? Although Conftest was created with cloud resources in mind, and benefits from the synergies with the rest of the OPA offering (like Gatekeeper), basically everyone in the industry has at some point complained about how cumbersome the Rego language is. In my opinion, this could be a defining factor that might help expand the adoption of Semgrep among platform teams.
I’m quite curious to hear other people’s opinions on this, so please feel free to reach out to me on Twitter.