Unit-testing the Splunk Processing Language
In my previous post Githubify the SOC, I declared my undying love for continuous integration and deployment applied to Detection Engineering. Now, let’s put the theory into practice! And perhaps the best-in-class source of inspiration is Microsoft Azure Sentinel.
That is some serious CI/CD practice! How could we apply the same thing to our on-prem deployment?
In a “classic” software shop, developers rely on two levels of testing:
- Unit tests, usually completed in a few seconds, coupled with basic checks for immediate feedback (similar to the checks done by your IDE: syntax checks, undefined functions, etc.)
- Integration tests for more thorough scenarios, taking a few minutes to complete
In the context of Detection Engineering (i.e. writing detection rules for a SIEM):
- Unit tests would be checking that:
- The syntax is valid
- We are not using fields that do not exist
- The styling guideline is respected
- There are no performance traps
- While integration tests would check that:
- We are correctly alerting on a true positive
- False positives are under control
- We are not adding a hit to the platform’s performance
- We are paying attention to delayed events
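To make the “fields that do not exist” bullet concrete, here is a minimal sketch of such a check in Python. The field inventory and the extraction regex are illustrative assumptions, not our real implementation:

```python
import re

# Hypothetical field inventory: in practice it could be exported from
# the SIEM's data models or index metadata instead of being hard-coded.
KNOWN_FIELDS = {"index", "sourcetype", "src_ip", "dest_ip", "user", "action"}

# Grab every `name=` occurrence; good enough for a first unit test,
# even though it is far from a real SPL parser.
FIELD_RX = re.compile(r"\b([A-Za-z_][\w.]*)\s*=")

def unknown_fields(query: str) -> set[str]:
    """Return the fields referenced as `name=value` but absent from the inventory."""
    return {f for f in FIELD_RX.findall(query) if f not in KNOWN_FIELDS}

print(unknown_fields("index=fw src_ip=10.0.0.1 usr=bob"))  # → {'usr'}
```

Such a test runs in milliseconds, which is exactly what we want from the unit-testing tier.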
Today, this post will address only the unit-testing part, applied to Splunk.
Step 1: parsing Splunk’s Search Processing Language
Splunk’s Search Processing Language (aka SPL) is a very powerful and expressive query language. It is a very pleasant language to use, especially when you come from Elastic Query DSL.
The language is very permissive; almost everything is optional: field separators, quotes, types (string or integer, nobody cares), argument positions, etc.
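To illustrate that permissiveness with toy examples (not taken from our dataset), the following two queries are equivalent for Splunk — quotes around simple values are optional, and 404 matches the same events whether you think of it as a string or a number:

```
search index=web status=404 user=admin
search index="web" status="404" user="admin"
```

This flexibility is great for analysts typing ad-hoc searches, and terrible for anyone trying to write a strict parser.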
Initially, in early 2020, I expected to find plenty of robust SPL parsers in the open-source world but… nope (well, there is a caveat here, follow me).
I found two projects:
Here I am trying salspaugh/splparser, developed in 2013, and, to boost confidence, its README states up front: “It is capable of parsing 66 of the most common approximately 132 SPL commands”.
Of course, its Python distribution is broken (to be expected from a project whose last commit was 4 years ago), but after a quick fix, I ran it on our dataset of queries and it failed on the very first query because of one unsupported function. I had 0️⃣ knowledge of PLY/LEX/YACC, and adding an SPL command looked like an abyss to me.
I rage-quit, thinking it was a 💩 project, and moved on. BIG MISTAKE in retrospect, but 🤷‍♂️ sorry @salspaugh for having doubted you. More on that later.
splunk_antlr_spl implements SPL using ANTLR4. It was clearly incomplete, but the experience of modifying the ANTLR4 grammar was soooo nice that I could quickly hack it for my needs, and I kept going while reading The Definitive ANTLR 4 Reference.
Eventually, I had to butcher most of the code to support enough SPL commands to parse our complete dataset. This fork lives at https://github.com/airbus-cert/splunk_antlr_spl
It works mostly fine, but it has one big problem: it is unbearably slow. This is not surprising, as I am a total n00b at ANTLR4 (and at parsing in general).
For example, parsing 338 rules takes 20 minutes. (Update: while writing these lines, because I could not accept releasing such a crappy tool, I optimized my ANTLR4 grammar to make it faster.)
So having such slow tests in our CI/CD was not an option. This post is also an opportunity for me to do some introspection and see whether it was worth it: I was curious to see which commands were missing from salspaugh/splparser to handle our dataset and… only three tiny commands are missing 😢
On the other hand, the learning curve of ANTLR4 is so smooth that I had my first version in less than 5 days, and I wonder how long it would have taken me to learn Lex, Yacc, and its PLY integration, then implement those 3 commands and create a PR to salspaugh/splparser. 🤷
When there is no perfect solution satisfying all constraints, it is time to work around them with hackish solutions.
And the grossest, but quickest, way to do some basic checks is to use regular expressions all the way. As an example, I shared our setup on Twitter:
Thanks, we added this unit test to our CI/CD and... it was much needed indeed 😅 pic.twitter.com/4dI2rTebLV— Nicolas Bareil (@nbareil) February 23, 2021
As a takeaway, here is an extract from our code base: test_statically_spl.py
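Since the file above is only linked, here is a rough illustration of the regex-only approach. The patterns below are hypothetical examples, not an extract of test_statically_spl.py:

```python
import re

# Hypothetical checks: each diagnostic maps to a regex that flags a
# performance trap when it matches the query.
PERFORMANCE_TRAPS = {
    "leading wildcard (forces a full scan)": r"=\s*\*\w",
    "index=* (searches every index)": r"\bindex\s*=\s*\*",
}

def find_traps(query: str) -> list[str]:
    """Return the names of the performance traps detected in an SPL query."""
    return [name for name, rx in PERFORMANCE_TRAPS.items()
            if re.search(rx, query, re.IGNORECASE)]

print(find_traps("index=web user=*admin | stats count"))
# → ['leading wildcard (forces a full scan)']
```

Gross indeed — a regex will never understand SPL — but it catches the embarrassing mistakes in milliseconds, which is all a unit test needs to do.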
This Twitter discussion re-ignited my desire to level up our SPL parsing and I recently discovered a new project, kotlaluk/spl-parser.
This time, the project relies on an official Splunk feature: splunk btool can generate the search and datatypes BNF, so there is no need to reinvent the wheel after all! Such an epiphany!
Thanks to Lukáš’s finding, there may be a way to leverage Splunk’s EBNF… 🤔
Step 2: Now what?
Once we have a parsing engine, we will need to take a step back and write a high-level library that abstracts the details of the low-level parsing and exposes only the “big picture”.
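As a sketch of what such a high-level API could look like — every name here is hypothetical, and the parsing is a toy string-split standing in for the real engine:

```python
from dataclasses import dataclass, field

@dataclass
class ParsedQuery:
    """High-level view of an SPL query, hiding parse-tree details.

    Attribute names are hypothetical: the point is that callers ask
    "which commands does this rule use?" without ever touching the
    low-level engine (ANTLR4, splparser, or a BNF-derived parser).
    """
    raw: str
    commands: list[str] = field(default_factory=list)

def parse(query: str) -> ParsedQuery:
    # Toy stand-in backed by string splitting; a real implementation
    # would walk the parse tree produced by the low-level engine.
    commands = [seg.strip().split()[0] for seg in query.split("|") if seg.strip()]
    return ParsedQuery(raw=query, commands=commands)

q = parse("search index=web | stats count by host")
print(q.commands)  # → ['search', 'stats']
```

With such a facade, swapping the underlying parser later would not break any of the unit tests written on top of it.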