First experience with clang analyzer

Recently I’ve tried to use static analysis to find kernel bugs. I haven’t discovered any notable high-severity bugs, but I learned a fair amount about clang analyzer. I turned to clang analyzer after hitting the limits of what I could do with joern/octopus. Though I really enjoy joern and expressing analyses as graph queries, I didn’t want to implement basic analyses like alias analysis over a high-level language representation. The Checker Development Manual shows how easy it is to develop a simple checker for clang analyzer. Currently clang analyzer supports inter-procedural analysis, but only within a single C file; however, a prototype that enables cross-file analysis is being presented at the upcoming EuroLLVM developers meeting.

I decided to look for uninitialized memory disclosures as a starter project because it’s a low-severity bug class that I presumed would be easy to find and verify. I’ve uploaded results for FreeBSD, Linux, and XNU with false positives crossed out. I didn’t include Android results as they are very similar to the Linux results. Source code and build instructions are available here. On FreeBSD there are many useful findings; however, on Linux and XNU there are only a few results and all of them are uninteresting. Because the kernels are so large, most of the results end up only being reachable in uncommon hardware/software configurations or from privileged contexts. The checker results were a good low bar for quality control, but fell short for Linux and XNU since they have already been combed through for simple bugs.
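To make the bug class concrete, here is a minimal user-space sketch of an uninitialized memory disclosure. Everything in it is hypothetical and only for illustration: `struct dev_info`, the `fill_info_*` helpers, and the 0xAA poison pattern that stands in for stale kernel stack contents. It is not taken from any of the kernels above or from the checker itself. The leaky variant writes only part of a fixed-size buffer, so copying the whole structure out would expose whatever bytes were there before.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reply structure that a kernel might copy out to user space. */
struct dev_info {
    uint32_t id;
    char name[16];
};

/* Copy a NUL-terminated string, truncating if needed. Bytes of dst that lie
 * past the terminator are left untouched. */
void copy_name(char *dst, size_t dst_len, const char *src)
{
    size_t n = strlen(src);

    assert(dst_len > 0);
    if (n >= dst_len)
        n = dst_len - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Leaky: the tail of out->name keeps whatever bytes were there before. */
void fill_info_leaky(struct dev_info *out, uint32_t id, const char *name)
{
    out->id = id;
    copy_name(out->name, sizeof(out->name), name);
}

/* Safe: zero the whole object before filling in the fields. */
void fill_info_safe(struct dev_info *out, uint32_t id, const char *name)
{
    memset(out, 0, sizeof(*out));
    out->id = id;
    copy_name(out->name, sizeof(out->name), name);
}

/* Poison a structure to simulate stale memory, fill it, and count how many
 * poison bytes would still be disclosed if the whole object were copied out. */
size_t stale_bytes_after_fill(int use_safe)
{
    struct dev_info info;
    const unsigned char *p = (const unsigned char *)&info;
    size_t i, stale = 0;

    memset(&info, 0xAA, sizeof(info));
    if (use_safe)
        fill_info_safe(&info, 1, "eth0");
    else
        fill_info_leaky(&info, 1, "eth0");

    for (i = 0; i < sizeof(info); i++)
        if (p[i] == 0xAA)
            stale++;
    return stale;
}
```

In real kernel code the stale bytes would come from earlier stack or heap use rather than an explicit poison pattern, and the disclosure would happen when the structure is copied to user space in full.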

Next I tried to extend the taint checker to find integer overflows. This has been more difficult because it exercised more of clang analyzer and exposed some missing functionality. In particular, I ran into issues stemming from the fact that clang analyzer’s constraint manager is quite simple and doesn’t support reasoning about bitwise operations or integer truncation/extension. This manifests as false positives and false negatives, some of which took me a while to track down. For example, a heuristic to eliminate false positives did not fire because the analyzer was never able to inline a given routine during inter-procedural analysis. It failed to inline because a loop in the routine used bitwise logic and the analyzer couldn’t determine a loop termination point. I had to hack around the constraint manager to keep false positives to an acceptable level, so the source code is uglier than I would like; however, I’m upstreaming the useful changes that I could extract. The constraint manager limitations may not be an issue for much longer: Dominic Chen is landing changes to add a Z3 constraint manager! The results didn’t uncover anything notable so I’ve not uploaded them, but you can find some Linux patches slowly trickling into mainline here.
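The snippets below sketch the kinds of code patterns where reasoning about bitwise operations and truncation matters. They are hypothetical illustrations (all function names and `TABLE_SIZE` are made up here), not code from the checker or from any kernel. How the analyzer handles each one depends on its version and configuration, so treat them as examples of the reasoning required rather than as reproducers.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 256u /* power of two, so (TABLE_SIZE - 1) works as a mask */

/* Bounded by a mask: the result is always below TABLE_SIZE, but seeing that
 * requires understanding what '&' does to the value range. */
uint32_t masked_index(uint32_t tainted)
{
    assert((TABLE_SIZE & (TABLE_SIZE - 1u)) == 0);
    return tainted & (TABLE_SIZE - 1u);
}

/* Truncation: the high 32 bits are dropped before the bounds check, so a
 * huge 64-bit length can pass. Returns 1 if the check passes. */
int truncated_len_passes_check(uint64_t tainted_len, uint32_t limit)
{
    uint32_t len = (uint32_t)tainted_len;

    return len <= limit;
}

/* Loop whose termination depends on bitwise logic: each iteration clears the
 * lowest set bit, so it runs at most 32 times. */
unsigned count_set_bits(uint32_t mask)
{
    unsigned n = 0;

    while (mask != 0) {
        mask &= mask - 1u;
        n++;
    }
    return n;
}

/* Overflow check for count * elem_size in 32 bits, written with a division
 * so the check itself cannot wrap. Returns 1 if the product would overflow. */
int mul_would_overflow(uint32_t count, uint32_t elem_size)
{
    return elem_size != 0 && count > UINT32_MAX / elem_size;
}
```

A checker that cannot model the mask in `masked_index` may report an out-of-bounds index that cannot happen, while one that cannot model the cast in `truncated_len_passes_check` may miss a real bug. Those correspond to the false positives and false negatives described above.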

Another notable limitation of clang analyzer is that you can’t manually direct analysis. For example, to find use-after-frees you might want to find different internal interfaces to symbolically ‘fuzz’ by trying different combinations of user-controllable function calls to see if they might lead to a UaF. Clang analyzer doesn’t support that out of the box. Understanding the shortcomings of the analyzer is important in choosing what bug classes it’s suitable for finding.
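As a rough idea of what directing the analysis could look like, here is a toy sketch that enumerates every sequence of calls over a hypothetical three-operation interface and flags the sequences that use an object after it was freed. Note that this is plain concrete enumeration over a hand-written state model, not symbolic execution, and none of it is clang analyzer API: the `enum op` interface, the state machine, and the function names are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SEQ_LEN 8

/* Hypothetical user-reachable operations on a single object. */
enum op { OP_ALLOC, OP_USE, OP_FREE, OP_COUNT };

/* Lifetime states tracked by the toy model. */
enum state { ST_NONE, ST_LIVE, ST_FREED };

/* Returns 1 if the sequence uses the object after it has been freed. */
int sequence_has_uaf(const enum op *seq, size_t len)
{
    enum state st = ST_NONE;
    size_t i;

    for (i = 0; i < len; i++) {
        switch (seq[i]) {
        case OP_ALLOC:
            st = ST_LIVE;
            break;
        case OP_FREE:
            if (st == ST_LIVE)
                st = ST_FREED;
            break;
        case OP_USE:
            if (st == ST_FREED)
                return 1;
            break;
        default:
            break;
        }
    }
    return 0;
}

/* Enumerate all OP_COUNT^len call sequences and count those with a UaF. */
unsigned count_uaf_sequences(size_t len)
{
    enum op seq[MAX_SEQ_LEN];
    unsigned total = 1, code, hits = 0;
    size_t i;

    assert(len <= MAX_SEQ_LEN);
    for (i = 0; i < len; i++)
        total *= OP_COUNT;

    for (code = 0; code < total; code++) {
        unsigned c = code;

        /* Decode 'code' as a base-OP_COUNT number, one digit per call. */
        for (i = 0; i < len; i++) {
            seq[i] = (enum op)(c % OP_COUNT);
            c /= OP_COUNT;
        }
        hits += (unsigned)sequence_has_uaf(seq, len);
    }
    return hits;
}
```

A real harness would drive actual internal interfaces with symbolic arguments instead of a three-state model, which is the part clang analyzer doesn’t support out of the box.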

Lastly, I wanted to plug Artem Dergachev’s awesome Clang Analyzer Guide. It lays out the abstractions and interfaces that the analyzer exposes far more clearly than other resources I found, including the doxygen comments.

Published on 27 Mar 2017