Utilizing FleetDM for Malicious Package Monitoring
Introduction
In today’s security landscape, malicious packages pose a significant threat to a company's infrastructure. Identifying and mitigating these threats promptly is essential to maintaining system health and security. Automation plays a pivotal role here: it reduces manual effort, ensures consistency, and shortens the time to detect malicious packages installed on endpoints.
Understanding the Malicious Package Workflow
Key Components of the Workflow
FleetDM: We use GitOps with FleetDM to power our endpoint observability.
Datadog Malicious Packages Repo: Datadog publishes a known malicious packages repository which we’ll utilize as our source material.
Configuration Files: YAML files defining policies and queries for malicious package detection.
Python Script: We utilize a script to process and update queries based on a daily check of the manifest files published by Datadog.
GitHub Actions Workflow: Orchestrates the execution of the Python script and ensures updates are committed to the repository.
Automating Query Updates
Role of the Python Script
At the heart of this workflow are our generate_{package_type}_queries.py scripts. These scripts:
Fetch the latest manifest.json file from the Datadog repo for both npm and Python packages.
Update query files by incorporating new or modified definitions and removing any that are no longer needed.
Build a remediation script for both npm and Python packages.
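The fetch-and-parse step could look roughly like the sketch below. It assumes the manifest is a JSON object that maps each package name to a list of affected versions, with null meaning every version is affected. The URL is a placeholder, and fetch_manifest and parse_manifest are hypothetical names for illustration, not the functions in the actual scripts.

```python
import json
import urllib.request

# Placeholder: substitute the raw URL of the manifest you want to track.
MANIFEST_URL = "https://example.com/path/to/manifest.json"


def parse_manifest(raw: str) -> dict[str, list[str]]:
    """Turn manifest JSON text into {package_name: sorted list of versions}.

    Assumed shape: {"pkg": ["1.0.0", ...]}, or {"pkg": null} when every
    version is considered malicious (stored here as an empty list).
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return {name: sorted(versions or []) for name, versions in data.items()}


def fetch_manifest(url: str = MANIFEST_URL, timeout: float = 30.0) -> dict[str, list[str]]:
    """Download the manifest over HTTPS and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_manifest(response.read().decode("utf-8"))
```

Keeping the parsing separate from the download makes the parsing logic easy to test against a saved copy of the manifest without touching the network.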
Policy Files and Their Structure
The policy files, such as malicious-packages-*.policies.yml, define detection mechanisms for identifying malicious packages. These files are structured in YAML format, making them human-readable and easy to maintain. For example:
- name: Malicious-NPM-1
  platform: darwin
  description: Identifies systems affected by malicious NPM policy 1
  query: SELECT 1 WHERE NOT EXISTS (SELECT 1 FROM npm_packages WHERE (name = "package1" AND version IN ("1.0.0", "1.0.1", "1.1.0")) OR (name = "package2" AND version IN ("999.9.9")));
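A query of this shape can be generated from a mapping of package names to versions. The sketch below is one way to do it, not the actual script: build_policy_query and sql_quote are hypothetical helpers, the output uses single-quoted string literals (standard SQL) rather than the double quotes in the example above, and treating an empty version list as "any version" is an assumption.

```python
def sql_quote(value: str) -> str:
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_policy_query(packages: dict[str, list[str]], table: str = "npm_packages") -> str:
    """Build a policy query that passes only when no listed package is installed.

    An empty version list is treated as "any version" (an assumption).
    """
    clauses = []
    for name, versions in sorted(packages.items()):
        clause = f"name = {sql_quote(name)}"
        if versions:
            version_list = ", ".join(sql_quote(v) for v in versions)
            clause += f" AND version IN ({version_list})"
        clauses.append(f"({clause})")
    if not clauses:
        raise ValueError("at least one package is required")
    condition = " OR ".join(clauses)
    return f"SELECT 1 WHERE NOT EXISTS (SELECT 1 FROM {table} WHERE {condition});"
```

Generating the SQL from data rather than editing it by hand also keeps the parentheses balanced, which is easy to get wrong in a long OR chain.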
Team Files and Their Structure
Team files define your team configuration within Fleet. They control things like which policies run on devices in that team, which scripts are available to that team, and more. An example is below:
agent_options:
  path: ../lib/agent-options.yml
controls:
  scripts:
    - path: ../lib/macOS/scripts/collect-fleetd-logs.sh
name: Workstations
policies:
  - path: ../lib/macOS/policies/npm-malicious-packages-1.policies.yml
queries: null
software: null
team_settings:
  secrets:
    - secret: {Your enrollment secret here}
Automatic Updates
The Python script dynamically updates both of these file types by:
Fetching the latest malicious package names & versions, as sourced from Datadog.
Validating that the packages in each policy file match the most recent manifest, adding and removing packages and versions as needed.
Adding or removing policy files from the teams.yml file when necessary.
Committing changes back to the repository for immediate use.
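The validation step amounts to a set difference between what the policy files currently contain and what the latest manifest lists. The sketch below illustrates that comparison under stated assumptions: both inputs are mappings of package name to version list, an empty list stands for "any version" and is represented by "*", and flatten and diff_packages are hypothetical names.

```python
def flatten(packages: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Expand {name: [versions]} into a set of (name, version) pairs.

    An empty version list is recorded as (name, "*"), meaning any version.
    """
    pairs = set()
    for name, versions in packages.items():
        for version in versions or ["*"]:
            pairs.add((name, version))
    return pairs


def diff_packages(
    current: dict[str, list[str]], latest: dict[str, list[str]]
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Return (added, removed) pairs needed to bring current up to latest."""
    old, new = flatten(current), flatten(latest)
    return sorted(new - old), sorted(old - new)
```

If both lists come back empty, the policy files already match the manifest and there is nothing to commit.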
Trigger Mechanisms
We utilize a GitHub Actions workflow to ensure that updates are executed on a recurring basis. Key features include:
Scheduled Runs: Periodic checks for updates to ensure no malicious packages go unnoticed and we always have the most recent manifest.
Automatic PR Creation: The workflow automatically builds and submits a pull request, allowing a human maintainer (if desired) to review policy changes and approve or reject them. This can also be modified to allow for seamless auto-approval.
Benefits of the Workflow
This workflow offers several advantages:
Accuracy: Ensures malicious package definitions are always up-to-date.
Efficiency: Reduces manual intervention, saving time and effort.
Scalability: Adapts to growing infrastructures and larger datasets.
Consistency: Maintains uniformity across detection mechanisms.
Learnings
There were a few notable things I learned along the way while building this out:
Fleet Optimization
My initial approach used individual policies for each package, resulting in thousands of policies. This proved unwieldy, so in v2 I consolidated them into larger policies based on package type (npm vs Python). However, I then discovered Fleet's query/policy size limit when I couldn’t fit many thousands of packages into a single query. The solution was to batch approximately 300 packages per policy, which strikes a good balance between readability and performance. Special thanks to Kathy from Fleet for helping me optimize the SQL!
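The batching itself is simple to sketch. The example below splits a package mapping into groups of at most 300 names, one group per policy file; batch_packages is a hypothetical helper, and sorting by name is an assumption made to keep the output deterministic between runs.

```python
def batch_packages(
    packages: dict[str, list[str]], batch_size: int = 300
) -> list[dict[str, list[str]]]:
    """Split packages into batches of at most batch_size names each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    names = sorted(packages)  # stable order so reruns produce the same files
    return [
        {name: packages[name] for name in names[start:start + batch_size]}
        for start in range(0, len(names), batch_size)
    ]
```

One tradeoff of sorting before batching is that a newly added package can shift later packages into neighboring batches, which makes the resulting diffs larger than the actual change.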
Timing Matters
I scheduled the workflow for 3am ET daily. By chance, I ran it manually at 6am ET one day and noticed that a new PR had opened, which struck me as odd given the two runs were only 3 hours apart. Upon investigation, I discovered that Datadog most often updates its definitions around 3:30am ET. This meant our workflow was consistently running 30 minutes too early, leaving us a full day behind on updates. Small details matter when running automated workflows, and this is something we’ll look into improving as well.
Conclusion
The malicious package workflow demonstrates the power of automation in managing security threats. By leveraging Fleet, a powerful endpoint observability tool, along with open source data from Datadog, GitHub Actions, and a Python script, we ensure timely and accurate updates to detection policies, enhancing our overall security posture. While this project is still in its early stages, I anticipate many future improvements.