Introducing YARA-Forge

Florian Roth
Dec 19, 2023

Streamlined Public YARA Rule Collection

In the world of cybersecurity, alongside the traditional sharing of basic Indicators of Compromise (IOCs) like file hashes, filenames, C2 IP addresses, and mutex names, we now also have the advantage of using open signature formats such as YARA, Sigma, and Suricata. These formats enhance our capabilities by allowing for the sharing of threat information in a vendor-neutral way, and they have been a key factor in the development of detection engineering.

The rise of these open formats has led to the creation of many rule repositories. These repositories, set up by a range of people and organizations, are filled with thousands of rules. Some are crafted with great attention to detail, while others are generated by automated processes. There’s also a big difference in the amount of useful metadata that comes with these rules, and since there’s no standard for this metadata, it’s often inconsistent.

Over the last 10 years, I’ve created more than 17,000 YARA rules and have shared a good number of them with the public. My work includes tools like yarGen, a YARA rule generator; Panopticon, a performance analyzer for YARA rules; and yaraQA, a tool for ensuring the quality of YARA rules. While these tools are publicly available, at Nextron Systems, we use a different set of internal tools for our work with YARA rules.

Given the wide variety of YARA rules out there, I saw the need to create a tool that would bring some order to this space. This tool gathers, tests, organizes, and redistributes these rules in a more efficient way, making them more accessible and useful for the cybersecurity community.

The Challenges of Rule Diversity

YARA Forge was inspired by a common scenario I’ve observed in the field: people often start by compiling a list of online repositories to gather as many YARA rules as they can. However, this initial enthusiasm often turns to disappointment when they realize that the rules vary greatly in purpose, quality, metadata fields, and their tendency to trigger false positives.

Example: script I found online

Filtering and baselining can usually handle the false positives from YARA rules, but there’s a bigger issue: some rules consume too many resources and slow scans down. How much this matters depends on the use case. If you’re just scanning a few samples or checking memory dumps, the difference between a scan taking 300 milliseconds or 700 milliseconds isn’t a big deal. But if you’re scanning a whole disk image or a file server, it’s a big problem if it takes 7 hours instead of 3 hours to get results. A lot of people assume that one bad rule in a set of a thousand won’t make much difference, but that’s not true: even a single bad rule can slow down the whole scanning process considerably.
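To make that cost visible, here is a minimal sketch (assuming the yara-python package; the rule file names and the sample directory are placeholders) that times the same scan once with a clean rule set and once with a single expensive rule added:

```python
import time
from pathlib import Path

import yara  # pip install yara-python


def time_scan(rule_file: str, sample_dir: str) -> float:
    """Compile one rule file and measure the total scan time over a sample set."""
    rules = yara.compile(filepath=rule_file)
    start = time.perf_counter()
    for sample in Path(sample_dir).rglob("*"):
        if sample.is_file():
            rules.match(filepath=str(sample))
    return time.perf_counter() - start


# "fast_rules.yar" and "fast_rules_plus_one_slow.yar" are placeholder file names.
baseline = time_scan("fast_rules.yar", "./samples")
with_one_slow_rule = time_scan("fast_rules_plus_one_slow.yar", "./samples")
print(f"baseline: {baseline:.1f}s, with one slow rule: {with_one_slow_rule:.1f}s")
```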

In the rule sets I manage, we do have some rules that are less performant, like Arnim Rupp’s webshell detection rules in the signature-base repository. But we only keep such rules if A) there’s no way to make them more efficient, B) they detect generic traits that could indicate a threat, and C) they cover particularly relevant and dangerous threats.

The problem with mixing rule sets isn’t just how many CPU cycles and how much RAM they consume. Since the metadata fields don’t match up between different rule sets, it’s hard to use them in a reliable and consistent way. Usually, the only thing all rules have in common is a name, and even that’s often not enough. You see rule names like ‘lockbit’, ‘dropper_1’, ‘loader_x64’, or just ‘PE’, and figuring out which threat they actually cover or how old they are can take a lot of work. If a rule isn’t about something generic, it might be outdated, and a match with it today might not mean much.
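As a rough illustration of what aligning such metadata can involve, the following sketch parses rules with plyara and maps common field spellings onto one schema; the alias table is made up for this example and is not YARA Forge’s actual mapping:

```python
import plyara  # pip install plyara

# Illustrative alias table: map the many metadata spellings seen in public
# repositories onto one canonical schema (not YARA Forge's actual mapping).
FIELD_ALIASES = {
    "author": "author", "authors": "author", "creator": "author",
    "date": "date", "created": "date", "creation_date": "date",
    "description": "description", "desc": "description", "info": "description",
    "reference": "reference", "ref": "reference", "references": "reference",
}


def normalize_metadata(yara_source: str) -> dict:
    """Return {rule_name: {canonical_field: value}} for all rules in a source string."""
    parser = plyara.Plyara()
    normalized = {}
    for rule in parser.parse_string(yara_source):
        meta = {}
        # plyara represents rule metadata as a list of single-key dicts.
        for entry in rule.get("metadata", []):
            for key, value in entry.items():
                canonical = FIELD_ALIASES.get(key.lower())
                if canonical:
                    meta[canonical] = value
        normalized[rule["rule_name"]] = meta
    return normalized
```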

Enter YARA Forge

YARA Forge was born from a desire to impose order on chaos. Initially, I developed two distinct projects without envisioning their eventual convergence into YARA Forge. The first was yaraQA, a tool designed to report non-syntactical issues in YARA rules in a structured way. The second was an unfinished YARA Style Guide. These projects began as standalone efforts.

However, during our Sigma HQ meetings, while discussing the publication of rule set releases based on specific filters, an idea crystallized: Why not integrate every project and guide I’ve created on writing efficient and effective YARA rules? Instead of trying to teach everyone best practices, I decided to build a tool that automatically processes thousands of rules from various sources. The goal was to generate ready-to-use YARA rule sets, streamlining the entire process.

YARA Forge Overview

The home page provides a detailed explanation of the entire process. The final output consists of three distinct rule sets: core, extended, and full.

Users can select from these sets based on their specific needs (a short usage sketch follows the list):

  • Core Set: Contains only rules with high accuracy and low false positive rates, optimized for performance. Ideal for critical environments where stability is key.
  • Extended Set: Expands the Core Set with additional threat hunting rules for a wider coverage, accepting minimal increases in false positives and scan impact. Suitable for balanced security needs.
  • Full Set: Incorporates all functional rules, prioritizing breadth of threat detection. Best for scenarios where extensive coverage outweighs the cost of higher false positives and resource use.
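Whichever package you pick, loading it is straightforward. Here is a minimal usage sketch, assuming the yara-python package and a placeholder scan directory, that compiles the ‘core’ rule file from an unpacked release package and scans a directory tree:

```python
from pathlib import Path

import yara  # pip install yara-python

# Path as shipped inside the release package (see the header example further
# below); "/data/to/scan" is a placeholder for whatever you want to scan.
rules = yara.compile(filepath="./packages/core/yara-rules-core.yar")

for path in Path("/data/to/scan").rglob("*"):
    if not path.is_file():
        continue
    try:
        for match in rules.match(filepath=str(path)):
            print(f"{path}: {match.rule}")
    except yara.Error:
        # Unreadable or otherwise problematic files are skipped in this sketch.
        continue
```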

The home page also showcases examples of rules before and after field alignment, along with screenshots of performance measurements and issues.

Example: performance issues
Example: Regex performance measurements
Example: Rule before
Example: Rule after

The more issues a rule has, and the more severe these issues are, the less likely it is to be included in one of our sets. The ‘core’ rule set has the strictest criteria, featuring only high-quality rules with minimal performance impact. Every identified issue within a rule is reported back, allowing the original authors to review and possibly rectify these issues in their repositories.

Example: Output file “yara-forge-rule-issues.yml”
Example: Header of the “core” rule set file ./packages/core/yara-rules-core.yar
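To illustrate the idea of severity-weighted selection, here is a purely hypothetical sketch; the penalty weights, thresholds, and base score are invented and do not reflect YARA Forge’s actual scoring logic:

```python
from dataclasses import dataclass


@dataclass
class Issue:
    description: str
    severity: int  # e.g. 1 = cosmetic, 3 = serious performance problem


def rule_score(base_quality: int, issues: list[Issue]) -> int:
    """Subtract a penalty for every issue, weighted by its severity (weights invented)."""
    return base_quality - sum(10 * issue.severity for issue in issues)


def target_packages(score: int) -> list[str]:
    """Assign a rule to packages based on its final score (thresholds invented)."""
    packages = []
    if score >= 80:
        packages.append("core")
    if score >= 50:
        packages.append("extended")
    if score >= 0:
        packages.append("full")
    return packages
```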

An exception to our process is how we handle false positive matches. Currently, rule sets are tested using our internal infrastructure, and we keep a detailed YAML file that records negative quality values for rules that trigger a few or many false positives in these tests. Unlike other checks, this false positive testing is done manually and at irregular intervals.

Extract: yara-forge-custom-scoring.yml

The Release Packages

GitHub workflows generate weekly release packages. The release notes contain statistics on the rules included in each package: the first table summarizes each package, and the following tables list statistics for each source repository within each package.

Releases on https://github.com/YARAHQ/yara-forge/releases

The release assets contain the packages as ZIP archives, the build log, and a YAML file that lists all issues discovered in each of the rules.

Release assets with rule packages, the build log and issues noticed in the rules
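If you prefer to automate retrieval, a short script can fetch the latest release assets through the public GitHub API; this sketch uses only the Python standard library and downloads whatever assets the latest release contains:

```python
import json
from pathlib import Path
from urllib.request import urlopen, urlretrieve

RELEASES_API = "https://api.github.com/repos/YARAHQ/yara-forge/releases/latest"

with urlopen(RELEASES_API) as response:
    release = json.load(response)

out_dir = Path("yara-forge-latest")
out_dir.mkdir(exist_ok=True)

# Download every asset attached to the latest release (rule packages, build
# log, issue report) without hard-coding any asset file names.
for asset in release["assets"]:
    print(f"downloading {asset['name']} ...")
    urlretrieve(asset["browser_download_url"], str(out_dir / asset["name"]))
```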

The download links on the YARA Forge project page directly connect to the most recent versions of these rule archives.

Package Download from https://yarahq.github.io/

The Goal

The aim of YARA Forge is to develop user-friendly YARA rule sets sourced from various public repositories. In recent weeks, I have reviewed rules from over 70 YARA rule repositories and carefully chosen 20 for YARA Forge’s initial release. My search for more repositories to include is ongoing.

The process is streamlined: adding a new repository is as simple as updating the configuration file. This approach not only simplifies integration but also significantly extends the reach of smaller repositories. By including their high-quality rules in YARA Forge, these repositories gain immense exposure, contributing to a more diverse and comprehensive rule set that benefits the wider community.

YARA Forge Source Repository Configuration (yara-forge-config.yml)
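Conceptually, the pipeline only has to read that repository list and fetch each source. The sketch below illustrates the idea; the YAML keys (‘yara_repositories’, ‘name’, ‘url’, ‘branch’) are assumptions for illustration, not the actual yara-forge-config.yml schema:

```python
import subprocess
from pathlib import Path

import yaml  # pip install pyyaml

config = yaml.safe_load(Path("yara-forge-config.yml").read_text())

# The key names below are assumed for illustration; check the real config file
# in the yara-forge repository for the actual schema.
for repo in config.get("yara_repositories", []):
    target = Path("repos") / repo["name"]
    if target.exists():
        continue
    subprocess.run(
        ["git", "clone", "--depth", "1",
         "--branch", repo.get("branch", "master"),
         repo["url"], str(target)],
        check=True,
    )
```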

Acknowledgements and Credits

I want to recognize the significant effort and expertise of the repository owners and rule authors whose original YARA rules are processed through YARA Forge. Your dedication and skill in creating these rules are fundamental to the cybersecurity community. YARA Forge builds upon this foundation, enriching, filtering, and reformatting your work to enhance accessibility and functionality for a broader user base.

Below is the list of repositories included in the initial release of YARA Forge:

  1. ReversingLabs: ReversingLabs YARA Rules
  2. Elastic: Elastic Protections Artifacts
  3. Frank Boldewin: R3c0nst YARA Rules
  4. CAPE Sandbox: CAPEv2 YARA Rules
  5. AirBnB: BinaryAlert
  6. Adam Swanda: DeadBits YARA Rules
  7. DelivrTo: DelivrTo Detections
  8. ESET: ESET Malware IOC
  9. FireEye-RT: Mandiant Red Team Tool Countermeasures
  10. GCTI: Chronicle GCTI
  11. Malpedia: Malpedia Signator Rules
  12. Trellix ARC: Trellix ARC Yara Rules
  13. Arkbird SOLG: Arkbird SOLG DailyIOC
  14. Telekom Security: Telekom Security Malware Analysis
  15. Volexity: Volexity Threat Intel
  16. JPCERT CC: JPCERTCC MalConfScan
  17. Signature Base (my own collection): Signature Base
  18. SecuInfra: Detection Repo
  19. RussianPanda: Yara-Rules
  20. Mike Worth’s repository: Open-Source-YARA-rules

Links

The YARA Forge project page can be found here:
https://yarahq.github.io/

The YARA Forge program code repository:
https://github.com/YARAHQ/yara-forge

The YARA Forge rule package releases:
https://github.com/YARAHQ/yara-forge/releases
