Semgrep is an emerging static analysis tool that is gaining traction within the AppSec community. Its broad support for multiple programming languages, together with the ease with which rules can be created, makes it a powerful tool that can help AppSec teams scale their efforts to eliminate entire classes of vulnerabilities from their codebases.

But what about cloud security? In the era of Infrastructure as Code, where tools like Terraform, CloudFormation, Pulumi (and many others) are used to provision infrastructure from (de facto) source code, can we apply the same approach to eradicate classes of cloud-related vulnerabilities from a codebase?

I decided to spend part of my weekend experimenting with this, and to get an idea of what Semgrep can provide to cloud/platform security teams.

What is Semgrep?

Before jumping into the details, it is worth explaining what Semgrep actually is. As per their website, Semgrep is:

A fast, open-source, static analysis tool that excels at expressing code standards — without complicated queries — and surfacing bugs early at editor, commit, and CI time.

Precise rules look like the code you’re searching; no more traversing abstract syntax trees or wrestling with regexes.

The Semgrep Registry has 1,000+ rules written by the Semgrep community covering security, correctness, and performance bugs. No need to DIY unless you want to.

At a high level, Semgrep leverages Abstract Syntax Trees (ASTs) to build a model of the code you are analyzing. Unlike other AST-based tools, though, Semgrep lowers the barrier to entry by abstracting away the AST syntax itself.

Code as ASTs. Image courtesy of Clint Gibler.
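
To get a feel for what Semgrep abstracts away, here is a minimal sketch of a hand-written AST traversal using Python's built-in ast module. The function name find_insecure_calls and the requests.get(..., verify=False) target are illustrative choices, not something taken from Semgrep itself.

```python
import ast


def find_insecure_calls(tree):
    """Return every call shaped like <anything>.get(..., verify=False)."""
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        # The callee must be an attribute access ending in ".get".
        is_get = isinstance(node.func, ast.Attribute) and node.func.attr == "get"
        # One keyword argument must be literally verify=False.
        disables_verify = any(
            kw.arg == "verify"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is False
            for kw in node.keywords
        )
        if is_get and disables_verify:
            hits.append(node)
    return hits


tree = ast.parse("requests.get(url, verify=False)")
print(len(find_insecure_calls(tree)))
```

With Semgrep, the same intent can be written as a single pattern that looks like the code itself: requests.get(..., verify=False).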

Out of the box, Semgrep supports mainstream programming languages (e.g., Go, Java, Python, Ruby, JavaScript, etc.) and has a library of open source rules ready to be reused.

Explaining how to use Semgrep is out of scope for this blog post, but the official documentation is really well made, and the online playground is an excellent place to start playing with it (without having to spend time installing anything).

Semgrep for Infrastructure as Code

As briefly mentioned earlier, the benefit that Semgrep can bring to AppSec teams is obvious (and if you are still not convinced, I recommend watching this presentation from Clint Gibler).

What I was curious to try was how well the same approach could fit a codebase made of Terraform (HCL) and YAML files, as those languages are not currently supported by Semgrep. Hence, I relied on its Generic Pattern Matching engine.


The official semgrep-rules repository already contains a folder dedicated to Terraform.

Open source Terraform rules.

Within this folder, we can see seven rules that have already been open-sourced, mainly focusing on Terragoat scenarios and S3 buckets.

Unencrypted EBS Volumes

Let’s start wrapping our heads around it by picking the unencrypted-ebs-volume rule. In the repo we can see a sample Terraform file (shown below):

resource "aws_ebs_volume" "web_host_storage" {
  availability_zone = "ap-southeast-2"
  encrypted         = false
  size = 1
  # ruleid: unencrypted-ebs-volume
  tags = {
    Name = "abcd-ebs"
  }
}

Quite straightforward, with an aws_ebs_volume resource declaring an EBS volume with encryption disabled (as can be seen from encrypted = false).

So what we want to grep for here is an occurrence of encrypted = false (or the lack of encrypted = true), as shown in the corresponding rule:

rules:
- id: unencrypted-ebs-volume
  patterns:
    - pattern-either:
      # The body of this pattern was lost in the source; `{...}` is an assumed reconstruction.
      - pattern: |
          {...}
    - pattern-not-inside: |
        resource "aws_ebs_volume" "..." {... encrypted=true ...}
    - pattern-inside: |
        resource "aws_ebs_volume" "..." {...}
  languages:
    - generic
  paths:
    include:
      - '*.tf'
  message: |
    An EBS volume is configured without encryption enabled.
  severity: WARNING
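
To make the matching logic concrete, here is a rough Python sketch of the same check: find every aws_ebs_volume block and flag the ones that do not contain encrypted = true. This is not how Semgrep works internally. The helper names (block_body, unencrypted_ebs_volumes) are hypothetical, and the brace matching deliberately ignores braces inside strings and comments.

```python
import re

# Header of an aws_ebs_volume resource, capturing the resource name.
RESOURCE_HEADER = re.compile(r'resource\s+"aws_ebs_volume"\s+"([^"]+)"\s*\{')
# A line that explicitly enables encryption.
ENCRYPTED_TRUE = re.compile(r"^\s*encrypted\s*=\s*true\b", re.MULTILINE)


def block_body(text, open_brace):
    """Return the text between the brace at open_brace and its matching brace."""
    depth = 0
    for i in range(open_brace, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[open_brace + 1:i]
    return text[open_brace + 1:]  # unbalanced input: take the rest


def unencrypted_ebs_volumes(terraform_source):
    """Names of aws_ebs_volume resources lacking `encrypted = true`."""
    names = []
    for match in RESOURCE_HEADER.finditer(terraform_source):
        # The header regex ends with "{", so match.end() - 1 is the brace.
        body = block_body(terraform_source, match.end() - 1)
        if not ENCRYPTED_TRUE.search(body):
            names.append(match.group(1))
    return names
```

Both encrypted = false and a missing encrypted attribute end up flagged, which mirrors the combination of pattern-inside and pattern-not-inside in the rule.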

You can try this rule in the Semgrep playground.

Open Security Groups

As a second test, I wanted to create my first Semgrep rule to detect a Security Group open to the world (0.0.0.0/0), like the one below:

resource "aws_security_group" "allow_tls" {
  name        = "allow_tls"
  description = "Allow TLS inbound traffic"
  vpc_id      =

  ingress {
    description = "TLS from VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["", "0.0.0.0/0"]
  }

  tags = {
    Name = "allow_tls"
  }
}

What we want to grep for here is any occurrence of 0.0.0.0/0 within an ingress block:

rules:
- id: open-security-group
  patterns:
    - pattern-inside: ingress { ... }
    - pattern: "0.0.0.0/0"
  languages:
    - generic
  paths:
    include:
      - '*.tf'
  message: |
    A security group is allowing inbound traffic from the public internet (0.0.0.0/0).
  severity: WARNING
open-security-group rule.

You can try this rule in the Semgrep playground.

Of course this is a very basic case, where the offending string (0.0.0.0/0) is hardcoded directly within the security group definition. The rule will have to be extended if we want to take into account cases where the CIDR is specified, for example, via variables.
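
As a rough illustration of what such an extension would need to cover, the Python sketch below flags ingress blocks that either hardcode 0.0.0.0/0 or reference a variable whose definition in the same file mentions it. This is a standalone sketch, not a Semgrep rule: the function names are hypothetical, and it assumes variables are declared in the same file and not overridden elsewhere (e.g. via tfvars).

```python
import re

OPEN_CIDR = "0.0.0.0/0"


def blocks(text, header_pattern):
    """Yield (header match, body) for every brace-delimited block."""
    for match in re.finditer(header_pattern + r"\s*\{", text):
        depth, start = 0, match.end() - 1  # start is the opening brace
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    yield match, text[start + 1:i]
                    break


def open_variables(text):
    """Names of variables whose block mentions the open CIDR (e.g. as a default)."""
    return {
        match.group(1)
        for match, body in blocks(text, r'variable\s+"([^"]+)"')
        if OPEN_CIDR in body
    }


def open_ingress_count(text):
    """Count ingress blocks exposed directly or through a risky variable."""
    risky = open_variables(text)
    count = 0
    for _, body in blocks(text, r"\bingress"):
        referenced = set(re.findall(r"\bvar\.(\w+)", body))
        if OPEN_CIDR in body or referenced & risky:
            count += 1
    return count
```

The variable lookup is exactly the part that plain text matching struggles with, since the offending value and the place where it is used live in different blocks.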


Next, I wanted to create a rule more focused on Kubernetes (or, more precisely, YAML files).

Let’s take as a sample the case where you might want to enforce all your Kubernetes Ingresses to be private, removing all the public ones:

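apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test-ingress
  annotations:
    # Annotation key lost in the source; assumed to be the standard ingress class annotation.
    kubernetes.io/ingress.class: public
spec:
  rules:
  - http:
      paths:
      - path: /testpath
        pathType: Prefix
        backend:
          service:
            name: test
            port:
              number: 80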

In this example, we want to grep for the ingress class annotation (kubernetes.io/ingress.class) and ensure it has the approved value of nginx-internal:

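rules:
- id: public-ingress
  patterns:
    # Annotation key lost in the source; the standard ingress class annotation is assumed.
    - pattern: "kubernetes.io/ingress.class"
    - pattern-not-inside: "kubernetes.io/ingress.class: nginx-internal"
  languages:
    - generic
  paths:
    include:
      - '*.yaml'
  message: |
    An Ingress has been made public.
  severity: WARNING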
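
The same allow-list idea can be sketched in a few lines of Python. This assumes the annotation in question is the standard kubernetes.io/ingress.class one, and the names APPROVED and unapproved_ingress_classes are made up for the example. Like the rule, it only looks at annotations that are present, so an Ingress with no class annotation at all would go unnoticed.

```python
import re

# Matches the ingress class annotation and captures its value,
# whether or not the value is quoted.
ANNOTATION = re.compile(
    r"""^\s*kubernetes\.io/ingress\.class:\s*["']?([^"'\s#]+)""",
    re.MULTILINE,
)
APPROVED = {"nginx-internal"}


def unapproved_ingress_classes(manifest):
    """Return every ingress class value that is not on the approved list."""
    return [value for value in ANNOTATION.findall(manifest) if value not in APPROVED]
```

Calling unapproved_ingress_classes on a manifest annotated with the value public returns that value, while a manifest using nginx-internal yields an empty list.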

You can try this rule in the Semgrep playground.


I have to say that Semgrep's extensibility and simple syntax make it very promising for cloud security teams as well. In a few hours, thanks to the official documentation and Playground, I was able to go from absolute zero to writing my first rules.

The main question I can think of at the moment is: how much does Semgrep overlap with OPA Conftest? Although Conftest was created with cloud resources in mind, and benefits from the synergies with the rest of the OPA offering (like Gatekeeper), basically everyone in the industry has at some point complained about how cumbersome the Rego language is. In my opinion, this could be a defining factor that might help expand the adoption of Semgrep among platform teams.

I’m quite curious to hear other people’s opinions on this, so please feel free to reach out to me on Twitter.