When you’re running several applications, all growing, all evolving, and all built by different teams, keeping observability under control becomes a real challenge. Without clear standards, things can get messy quickly. Are the monitors you have actually doing their job? Are there critical ones missing? And how many monitors are silently aging in the background, no longer useful but still lingering in your setup?

It’s a challenge our small 7-person team faces daily, managing over 3,600 hosts and more than 6,000 monitors.

In this article, we walk through a flow integrating ServiceNow, Terraform, and GitHub Actions that reduced manual monitor creation while maintaining governance and quality.

1. ServiceNow: Request for Monitors

The first step is learning about new monitoring needs. Here, we make developers and tech leads accountable for keeping observability healthy: requests come from them, while our team remains responsible for maintaining good practices and for creating or removing monitors.

For some time, this process was manual: development teams filled out a form, our team evaluated it, and then we created the necessary monitors. It worked, but it could be improved.

We use ServiceNow to manage and automate our business processes. It allows us to create a form and use it as a trigger to start our automation. For that to work, we need to feed it high-quality inputs, meaning the form must contain well-tailored fields.

Responsible team and priority were good starting points. Since we knew query alerts represented 60% of user requests, we began with those, focusing on CPU and disk usage. The next fields then became: service, environment, metric (CPU or disk), thresholds (for warning and critical alerts), and interval for data aggregation.

Example of the Datadog monitor request form, with warning and critical CPU thresholds and a five-minute interval

2. Terraform: A Bridge to Datadog

If Terraform is new to you, the main idea is that it lets you manage resources as “infrastructure as code.” In practical terms, it means that with a single manifest file, you can ask Terraform to create, update, or destroy infrastructure assets, such as containers, serverless functions, or monitors.

There is no need for a deep dive into Terraform, though. It is enough to know we need to declare a resource and its configuration in a .tf file. Datadog already provides the resource for us; we just need to specify how we are going to use it. It is called datadog_monitor, and you can read all configuration options in the Datadog documentation.
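For reference, a standalone query alert declared directly with that resource looks roughly like the HCL below; the query, thresholds, and tags are illustrative, not our production values:

resource "datadog_monitor" "cpu_high" {
  name    = "High CPU usage on {{host.name}}"
  type    = "query alert"
  message = "CPU usage is above the threshold. Notify: @slack-team-alerts"
  query   = "avg(last_5m):avg:system.cpu.user{env:prod,service:checkout} by {host} > 90"

  monitor_thresholds {
    warning  = 80
    critical = 90
  }

  tags = ["env:prod", "service:checkout", "managed_by:terraform"]
}

In our case, though, monitors are declared through an internal module, and the manifest is generated from a template instead of being written by hand.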

Based on that, we created a function called create_monitor like this:

from jinja2 import Template


def create_monitor(args):
    monitor_template = '''
    module "{{ module_name }}" {
      source = "../../modules/terraform-datadog-monitors"

      datadog_monitors = {
        {{ monitor_name }} = {
          name                    = "{{ name }}"
          type                    = "{{ monitor_type }}"
          query                   = "{{ query }}"
          env                     = "{{ env }}"
          service                 = "{{ service }}"
          business_service        = "{{ business_service }}"
          group                   = "{{ group }}"
          priority                = "{{ priority }}"
          monitor_tags            = {{ monitor_tags }}
          thresholds = {
            critical = {{ thresholds.critical }}
            {% if thresholds.warning is defined and thresholds.warning is number %}
            warning = {{ thresholds.warning }}
            {% endif %}
          }
          notification_channel    = " {{ channel }} "
        }
      }
    }
    '''
    # Render the Jinja2 template with the values received from ServiceNow
    rendered_template = Template(monitor_template).render(**args)
    return rendered_template

However, we still need to fill its variables with the input we receive from ServiceNow and, of course, apply the manifest.
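For illustration, calling the function with values shaped like a ServiceNow request (everything below is made up) produces a ready-to-apply module block:

# Hypothetical values, shaped like the fields we collect in ServiceNow
args = {
    "module_name": "checkout_cpu_high",
    "monitor_name": "checkout_cpu_high",
    "name": "[P2] High CPU usage - checkout (prod)",
    "monitor_type": "query alert",
    "query": "avg(last_5m):avg:system.cpu.user{service:checkout,env:prod} by {host} > 90",
    "env": "prod",
    "service": "checkout",
    "business_service": "payments",
    "group": "host",
    "priority": 2,
    "monitor_tags": '["team:checkout", "managed_by:terraform"]',
    "thresholds": {"warning": 80, "critical": 90},
    "channel": "@slack-checkout-alerts",
}

print(create_monitor(args))  # prints the rendered Terraform module block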

ServiceNow allows us to send a request to GitHub to trigger a predefined workflow. That happens when a request for a monitor is approved by the team’s tech lead. The workflow needs to:

  1. Create a separate directory where all generated files will be placed. This ensures concurrent drafts don’t affect each other.
  2. Generate the main.tf file from the query-alert.tf template, filling in the service, environment, metric, interval, and threshold values received from ServiceNow.
  3. Run terraform init and terraform apply in the newly created directory (a rough sketch of such a script follows this list).
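Here is a minimal sketch of what a script covering these three steps could look like. It reuses the create_monitor function from above; the directory layout and parameter names are assumptions, not our exact implementation.

import os
import subprocess


def provision_monitor(params):
    # 1. Isolated working directory so concurrent requests don't interfere
    workdir = os.path.join("generated", params["service"], params["module_name"])
    os.makedirs(workdir, exist_ok=True)

    # 2. Render the template with the values received from ServiceNow
    with open(os.path.join(workdir, "main.tf"), "w") as manifest:
        manifest.write(create_monitor(params))

    # 3. Initialize and apply Terraform in the newly created directory
    subprocess.run(["terraform", "init", "-input=false"], cwd=workdir, check=True)
    subprocess.run(["terraform", "apply", "-auto-approve", "-input=false"], cwd=workdir, check=True)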

We wrote a Python script that performs all three steps, and the GitHub Action executes it. For clarity, here is a simplified example of the workflow we use:

name: Generate Monitor by Service
run-name: >-
  Generate ${{ github.event.client_payload.params.service }} /
  ${{ github.event.client_payload.params.monitor_name }}

on:
  repository_dispatch:
    types:
      - generate-monitor

concurrency:
  group: main

jobs:
  run-python-script:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout repository
      uses: actions/checkout@v3
      # Clones the current repository to provide access to the codebase

    - name: Create AWS credentials directory
      run:
      # Creates the AWS credentials directory structure if it doesn't exist

    - name: Configure AWS credentials
      run:        
      # Creates AWS credentials file using secrets stored in GitHub repository settings
      # This enables the workflow to authenticate with AWS services

    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v2
      with:
        terraform_version: 1.7.3
      # Installs Terraform CLI tool for infrastructure as code management

    - name: Setup terraform keys
      run:
      # Creates Terraform variables file with Datadog API credentials
      # This file is used by Terraform to authenticate with Datadog

    - name: Set up Python environment
      uses: actions/setup-python@v4
      with:
        python-version: '3.x'
      # Installs Python runtime environment

    - name: Install Python dependencies
      run:
      # Installs required Python packages:
      # - jinja2: Template engine for configuration generation
      # - datadog-api-client: Official Datadog API client
      # - requests: HTTP library for API calls

    - name: Set up Git
      run:
      # Configures Git identity for automated commits

    - name: Run main.py
      env:
        # Sets additional environment variables passed from the triggering event, including:
        # - Authentication tokens (GitHub, ServiceNow)
        # - Monitor configuration parameters from the triggering event

      run:
      # Executes the main Python script with all provided parameters
      # This script typically:
      # 1. Validates input parameters
      # 2. Generates monitor configuration using templates
      # 3. Optionally creates/updates monitors in Datadog
      # 4. Updates ServiceNow tickets if provided
      # 5. Commits configuration changes back to repository
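On the ServiceNow side, triggering this workflow boils down to a single call to GitHub's repository dispatch API. As a sketch (the repository name, token handling, and exact parameter set are illustrative), the equivalent call in Python would be:

import os

import requests

# Repository name and parameter values are made up; the payload shape matches
# what the workflow above reads from github.event.client_payload.params
response = requests.post(
    "https://api.github.com/repos/my-org/datadog-monitors/dispatches",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={
        "event_type": "generate-monitor",
        "client_payload": {
            "params": {
                "service": "checkout",
                "monitor_name": "checkout_cpu_high",
                "env": "prod",
                "metric": "cpu",
                "interval": "5m",
                "thresholds": {"warning": 80, "critical": 90},
            }
        },
    },
    timeout=30,
)
response.raise_for_status()  # GitHub returns 204 No Content on success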

Conclusion

So in the end, our complete workflow comes together like this:

Diagram showing the automation workflow connecting ServiceNow, GitHub Actions, Terraform, and Datadog for automated monitor creation

Even though creating a monitor in Datadog is an easy task, a business environment requires good standards to achieve broad observability while keeping costs under control. With this automation and smart use of tags, we were able to delegate monitor creation without losing ownership or governance.

This solution doesn’t cover every use case, but because it handles the most common scenarios it still reduced manual ticket creation by 80%. When you’re dealing with more than 3,800 services and 1,800 active users, this kind of setup becomes essential; smaller teams can usually get by without it.

We still receive requests for very specific monitors, but now they are treated as they should be: exceptions.

What We Gained from Automating Monitor Creation in Datadog

This approach brought several meaningful improvements to the way we manage monitoring:

  • Reduction of manual monitor creations
  • Good standards are guaranteed across all alerts
  • Responsibility is delegated to development teams, with tech leads’ validation
  • Tracking of all created monitors and their impact on costs

What We Learned While Automating Our Observability Workflow

Through this project, we learned that automating observability is about building a workflow that teams can trust and adopt easily. These principles made the implementation successful:

  • Start with the most common cases: we covered 60% of demands with just two metrics, CPU and disk.
  • Leave room for exceptions: some manual creations are still needed, and that’s fine.
  • Engage teams from the beginning: we had a great adoption because developers helped design the form.
  • Don’t try to automate everything: focus on the 80/20 rule for fast and meaningful results.

Take Your Monitoring Strategy One Step Further

If you’re looking to go beyond automated monitor creation, our team also helps companies build a complete SRE and observability foundation. We support organizations in gaining full visibility across their cloud environments through log centralization, performance dashboards, alerting strategy, and cost optimization.

Our SRE & Observability solution is built on Datadog and designed for teams that want to improve reliability, reduce incident response time, and keep cloud costs under control.
