Recently, I started playing with joern after watching Fabian Yamaguchi’s excellent 31C3 talk. This post is intended as a resource for people who want to learn more about it. Put simply, joern is a C (and limited C++) parser that stores its output in a graph database. Specifically, it combines abstract syntax trees with control-flow and data-flow graphs into a single ‘code property graph.’ With this data, it’s possible to write graph searches that query for vulnerabilities across many of the bug classes common in C programs. Fabian chose the graph traversal language gremlin and implemented a number of useful program analysis primitives as gremlin steps (analogous to SQL prepared statements), so that queries can reason about higher-level program constructs instead of raw graph structures.
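To make the ‘code property graph’ idea concrete, here is a minimal sketch in plain Python. It is not joern’s schema or API: the `ToyCPG` class, the `AST`/`CFG`/`DDG` edge labels, the node IDs, and the hand-built graph for a two-statement C snippet are all assumptions made up for illustration. The point is only that once syntax, control flow, and data flow live on the same nodes, a single query can mix them.

```python
from collections import defaultdict


class ToyCPG:
    """One set of nodes, several kinds of labeled edges (hypothetical model)."""

    def __init__(self):
        self.nodes = {}
        self.out_edges = defaultdict(list)
        self.in_edges = defaultdict(list)

    def add_node(self, node_id, kind, code):
        self.nodes[node_id] = {"kind": kind, "code": code}

    def add_edge(self, src, label, dst):
        self.out_edges[(src, label)].append(dst)
        self.in_edges[(dst, label)].append(src)

    def targets(self, node_id, label):
        return list(self.out_edges.get((node_id, label), []))

    def sources(self, node_id, label):
        return list(self.in_edges.get((node_id, label), []))


def build_example():
    # Hand-built graph for:  n = strlen(src);  memcpy(buf, src, n);
    g = ToyCPG()
    g.add_node(1, "Statement", "n = strlen(src)")
    g.add_node(2, "Statement", "memcpy(buf, src, n)")
    g.add_node(3, "Argument", "buf")
    g.add_node(4, "Argument", "src")
    g.add_node(5, "Argument", "n")
    for arg in (3, 4, 5):
        g.add_edge(2, "AST", arg)   # syntax: the call has three arguments
    g.add_edge(1, "CFG", 2)         # control flow: statement 1 runs before 2
    g.add_edge(1, "DDG", 2)         # data flow: the value of n reaches the call
    return g


def memcpy_with_strlen_length(g):
    """Find memcpy calls whose length argument was assigned from strlen."""
    hits = []
    for node_id, node in g.nodes.items():
        if not node["code"].startswith("memcpy("):
            continue
        args = g.targets(node_id, "AST")
        if len(args) != 3:
            continue
        length_var = g.nodes[args[2]]["code"]
        # Follow data-dependence edges backwards to the defining statement.
        for src in g.sources(node_id, "DDG"):
            src_code = g.nodes[src]["code"]
            if src_code.startswith(length_var + " =") and "strlen(" in src_code:
                hits.append((src, node_id))
    return hits


if __name__ == "__main__":
    print(memcpy_with_strlen_length(build_example()))
```

The query uses the AST edges to pick out the third argument and the data-dependence edge to find where that value came from. The CFG edge is stored but unused here; a real query would bring it in for questions like ‘is there a bounds check on every path between these two statements?’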
To get started I would recommend watching Fabian’s talk. Additionally, I found the following resources helpful in learning about gremlin and joern:
- On the Nature of Pipes: A simple introduction to the high-level ideas of gremlin
- GremlinDocs: API documentation for built-in gremlin traversals
- python-joern: The joern-specific traversals for reasoning about code property graphs
- Modeling and Discovering Vulnerabilities with Code Property Graphs: Paper presenting joern by Fabian Yamaguchi et al. Feel free to skip the mathematical formalism; for those familiar with basic static program analysis, the meat is in sections 1, 5, and 6.
When I first began using joern, I wrote queries against dummy code to experiment. Once I began to look at real code bases, I still found it useful to test against isolated examples, so I wrote a simple wrapper to ‘unit test’ joern traversals. Since there are few public joern queries, I’ve put this wrapper together with a few queries into the joern-traversals repository to serve as examples for others. There aren’t many queries yet, so I encourage you to contribute any interesting ones you might write (especially if they’ve found real bugs!). Currently it includes a progression of queries from an experiment in which I evolved a very simple query into a fairly complex one that looks for infinite loop DoS conditions in wireshark. I think it’s a compelling example because it shows joern’s utility in searching for, and actually finding, bugs with complex control- and data-flow semantics.
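The ‘unit test’ idea can be sketched roughly as follows. This is not the wrapper from joern-traversals, just a minimal illustration of the pattern under my own assumptions: `check_traversal`, the `run_query` callable, and the function names `loop_bad` and `loop_ok` are all hypothetical, and a stub stands in for the code that would actually send a traversal to a joern database built from a small test file.

```python
import unittest


def check_traversal(run_query, query, expected_functions):
    """Run a traversal and report which expected hits are missing or extra.

    `run_query` is any callable that takes a query string and returns the
    names of the functions the traversal flagged (an assumption of this sketch).
    """
    flagged = set(run_query(query))
    expected = set(expected_functions)
    missing = expected - flagged      # false negatives
    unexpected = flagged - expected   # false positives
    return missing, unexpected


class InfiniteLoopTraversalTest(unittest.TestCase):
    @staticmethod
    def fake_run_query(query):
        # Stub: pretend the test file defines loop_bad() and loop_ok(),
        # and that the traversal flagged only the vulnerable one.
        return ["loop_bad"]

    def test_flags_only_the_vulnerable_function(self):
        missing, unexpected = check_traversal(
            self.fake_run_query, "<traversal under test>", ["loop_bad"]
        )
        self.assertEqual(missing, set())
        self.assertEqual(unexpected, set())


if __name__ == "__main__":
    unittest.main()
```

Pairing each traversal with one function it should flag and one it should not makes it easy to see whether a refinement removed a false positive or accidentally introduced a false negative.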
Lastly, I’ve added a small joern-console script to joern-tools that you might find helpful in writing traversals.