Published on 01/03/2024 00:01 by Jacob Latonis
100 Days of Yara in 2024: Day 03
Time for some quality of life work in YARAX!
Motivation
If you’ve written rules with a large number or two, I’m sure you’ve had to count the digits at one point or another to make sure it’s the right number.
Some languages, like Rust, allow you to add underscores to numbers to improve readability and clarity like so:
// Use underscores to improve readability!
println!("One million is written as {}", 1_000_000u32);
You can find the documentation about the above here.
The motivation behind this change is not complex: Victor had already opened an issue (#14) for it on GitHub, and it makes numbers in rules nicer to look at.
Implementation
In YARAX, the parser used for the rules and grammar is the pest crate. Documentation and such can be found here. There’s also a digital book for learning more about pest.
The current parsing grammar for numbers looks like this:
Current Parsing Grammar
Integers
integer_lit = @{
  "-"? ~ "0x" ~ ASCII_HEX_DIGIT+ |
  "-"? ~ "0o" ~ ASCII_OCT_DIGIT+ |
  "-"? ~ ASCII_DIGIT+ ~ ("KB" | "MB")?
}
Breaking this down, the current grammar looks for an integer as such:
- Is it negative or positive? "-"?
- After the sign (or lack thereof), is there a hex or octal identifier? "0x" or "0o"
- If so, are the digits in the alphabet for the appropriate set (ASCII_HEX_DIGIT or ASCII_OCT_DIGIT), and is there at least one of them (+)?
- If there is no octal or hex identifier, is it all ASCII digits? ASCII_DIGIT+
- Is there a file size notation at the end? ("KB" | "MB")?
Floats
float_lit = @{
  "-"? ~ ASCII_DIGIT+ ~ DOT ~ ASCII_DIGIT+
}
Breaking this down, the current grammar looks for a float as such:
- Is it negative or positive? "-"?
- Is there at least one decimal digit? ASCII_DIGIT+
- Is there then a dot (.)? DOT
- Is there then at least one decimal digit again? ASCII_DIGIT+
Proposed Parsing Grammar to Implement Underscores
Integers
integer_lit = @{
  "-"? ~ "0x" ~ ASCII_HEX_DIGIT+ ~ ("_" | ASCII_HEX_DIGIT)* |
  "-"? ~ "0o" ~ ASCII_OCT_DIGIT+ ~ ("_" | ASCII_OCT_DIGIT)* |
  "-"? ~ ASCII_DIGIT+ ~ ("_" | ASCII_DIGIT)* ~ ("KB" | "MB")?
}
Breaking this down, the proposed grammar looks for an integer as such:
- Is it negative or positive? "-"?
- After the sign (or lack thereof), is there a hex or octal identifier? "0x" or "0o"
- If so, are the digits in the alphabet for the appropriate set, and is there at least one of them? ASCII_HEX_DIGIT+ or ASCII_OCT_DIGIT+
- Are there any underscores or digits following the first digit? ("_" | ASCII_HEX_DIGIT)* or ("_" | ASCII_OCT_DIGIT)*
- If there is no octal or hex identifier, is there at least one ASCII digit? ASCII_DIGIT+
- If there is at least one, are there any underscores or digits following the first digit? ("_" | ASCII_DIGIT)*
- Is there a file size notation at the end? ("KB" | "MB")?
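To make the matching order concrete, here is a minimal hand-written sketch of the proposed integer rule in plain Rust. It is not how YARAX does it (YARAX lets pest generate the parser from the grammar); the function names match_integer_lit and digits_with_underscores are made up for illustration, and the sketch assumes pest's usual semantics of ordered choice and greedy repetition.

```rust
/// Counts the leading bytes of `s` that satisfy `pred`.
fn take_while(s: &[u8], pred: impl Fn(u8) -> bool) -> usize {
    s.iter().take_while(|&&b| pred(b)).count()
}

/// Mirrors `DIGIT+ ~ ("_" | DIGIT)*`: the first byte must be a digit,
/// after which digits and underscores may be mixed freely.
fn digits_with_underscores(s: &[u8], is_digit: impl Fn(u8) -> bool) -> Option<usize> {
    match s.first() {
        Some(&b) if is_digit(b) => Some(take_while(s, |b| b == b'_' || is_digit(b))),
        _ => None,
    }
}

/// Hypothetical helper: returns the byte length of the prefix of `input`
/// matched by the proposed `integer_lit`, or `None` if nothing matches.
fn match_integer_lit(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    // "-"?
    let sign = usize::from(bytes.first() == Some(&b'-'));
    let rest = &bytes[sign..];

    // Alternative 1: "0x" ~ ASCII_HEX_DIGIT+ ~ ("_" | ASCII_HEX_DIGIT)*
    if rest.starts_with(b"0x") {
        if let Some(n) = digits_with_underscores(&rest[2..], |b| b.is_ascii_hexdigit()) {
            return Some(sign + 2 + n);
        }
    }
    // Alternative 2: "0o" ~ ASCII_OCT_DIGIT+ ~ ("_" | ASCII_OCT_DIGIT)*
    if rest.starts_with(b"0o") {
        if let Some(n) = digits_with_underscores(&rest[2..], |b| (b'0'..=b'7').contains(&b)) {
            return Some(sign + 2 + n);
        }
    }
    // Alternative 3: ASCII_DIGIT+ ~ ("_" | ASCII_DIGIT)* ~ ("KB" | "MB")?
    let n = digits_with_underscores(rest, |b| b.is_ascii_digit())?;
    let tail = &rest[n..];
    let suffix = if tail.starts_with(b"KB") || tail.starts_with(b"MB") { 2 } else { 0 };
    Some(sign + n + suffix)
}
```

Calling match_integer_lit("1_000_000") consumes the whole literal, while match_integer_lit("_100") returns None because the first character must be a digit. One thing the sketch makes visible: as the grammar is written, a trailing underscore such as "1_" is also accepted.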
Floats
float_lit = @{
  "-"? ~ ASCII_DIGIT+ ~ ("_" | ASCII_DIGIT)* ~ DOT ~ ASCII_DIGIT+ ~ ("_" | ASCII_DIGIT)*
}
Breaking this down, the proposed grammar looks for a float as such:

- Is it negative or positive? "-"?
- Is there at least one decimal digit? ASCII_DIGIT+
- If so, is there an underscore or another decimal digit? ("_" | ASCII_DIGIT)*
- Is there then a dot (.)? DOT
- Is there then at least one decimal digit again? ASCII_DIGIT+
- If so, is there an underscore or another decimal digit? ("_" | ASCII_DIGIT)*
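The float rule can be walked through the same way. Below is a small sketch in plain Rust that follows the steps above one by one; match_float_lit and digit_run are hypothetical names, DOT is assumed to be a literal ".", and the real parser is generated by pest rather than written by hand.

```rust
/// Mirrors `ASCII_DIGIT+ ~ ("_" | ASCII_DIGIT)*` and returns the matched length.
fn digit_run(s: &[u8]) -> Option<usize> {
    // The run must start with a digit, never with an underscore.
    if !s.first()?.is_ascii_digit() {
        return None;
    }
    Some(s.iter().take_while(|b| b.is_ascii_digit() || **b == b'_').count())
}

/// Hypothetical helper: returns the byte length of the prefix of `input`
/// matched by the proposed `float_lit`, or `None` if it does not match.
fn match_float_lit(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    // "-"?
    let mut pos = usize::from(bytes.first() == Some(&b'-'));
    // ASCII_DIGIT+ ~ ("_" | ASCII_DIGIT)*
    pos += digit_run(&bytes[pos..])?;
    // DOT (assumed to be a plain ".")
    if bytes.get(pos) != Some(&b'.') {
        return None;
    }
    pos += 1;
    // ASCII_DIGIT+ ~ ("_" | ASCII_DIGIT)*
    pos += digit_run(&bytes[pos..])?;
    Some(pos)
}
```

With this sketch, "1_000.5_0" matches in full, while "1._5" does not, since the fractional part also has to start with a digit.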
Checking the Work
I wrote a test rule to ensure the underscores are ignored when the numbers are actually parsed into values (decimal numbers, hex numbers, numbers with a file size, and more!)
rule test {
  condition:
    2_000 == 2000 and 100KB == 1_00KB and 0o12 == 1_0 and 0x2_1 == 0x21 and 0x31_1 == 7_8_5
}
It evaluates to true, indicating that YARAX parses the numbers with and without underscores to the same values. :)
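For the curious, here is one way the matched text could be turned into a value so that the comparisons in the rule hold. This is only a sketch under assumptions: integer_value is a hypothetical name, the input is assumed to have already matched integer_lit, and the post does not show how YARAX itself converts the literal. The KB and MB multipliers (1024 and 1024 * 1024) follow YARA's size postfixes.

```rust
/// Hypothetical helper: converts the text of an integer literal into its value.
/// Assumes `literal` was already accepted by the `integer_lit` grammar rule.
fn integer_value(literal: &str) -> Option<i64> {
    // Underscores carry no meaning, so drop them before doing anything else.
    let cleaned: String = literal.chars().filter(|&c| c != '_').collect();

    // Split off the optional sign.
    let (negative, body) = match cleaned.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, cleaned.as_str()),
    };

    // Split off the optional file size suffix.
    let (digits, multiplier) = if let Some(d) = body.strip_suffix("KB") {
        (d, 1024)
    } else if let Some(d) = body.strip_suffix("MB") {
        (d, 1024 * 1024)
    } else {
        (body, 1)
    };

    // Pick the radix from the prefix, defaulting to decimal.
    let magnitude = if let Some(hex) = digits.strip_prefix("0x") {
        i64::from_str_radix(hex, 16).ok()?
    } else if let Some(oct) = digits.strip_prefix("0o") {
        i64::from_str_radix(oct, 8).ok()?
    } else {
        digits.parse::<i64>().ok()?
    };

    // checked_mul returns None on overflow instead of panicking.
    let value = magnitude.checked_mul(multiplier)?;
    Some(if negative { -value } else { value })
}
```

Each pair from the test rule yields the same value under this sketch, for example integer_value("0x31_1") and integer_value("7_8_5") both give Some(785).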
Finished Work
As with previous days, there’s a PR for the work:
 #48 on YARAX
Written by Jacob Latonis