The Weirdest Bug I Found This Week
At work right now, I’m in the process of building software made by another contractor on the project. The software was originally written for Ubuntu 12.04, and we’re building it in 14.04 and integrating it with all of our ROS-based software. I have a source tarball that the contractor delivered to our client and a somewhat vague build procedure developed by the client. I worked with an engineer from the original contractor two weeks ago to get the software as far as building in my environment, but this week I started trying to actually run code. Let me tell you about the weirdest bug I discovered during this process.
I had the software installed on the vehicle, but one of the core processes kept terminating with a SIGABRT. I ran the program in GDB, and checked out the backtrace–this was far from the only bug I’d run into, and I’d built everything with debug symbols. I looked for the offending line of code in the source, and this is what I found, more or less:
void someFunction()
{
std::map<std::string, std::string> versions = parseVersions(VERSION_STRING);
for (std::map<std::string, std::string>::const_iterator it = version.begin(); it != versions.end(); ++it)
{
std::string module = it->first;
std::string version = it->second;
version.erase(20); /* SIGABRT THROWN HERE */
/* ... */
}
}
Ok, so why does this throw a SIGABRT? Because it’s trying to erase past the end of the string version
, which is apparently less than 20 characters long. Ok, so what’s version
? Well I know it was created by parseVersions()
. So what’s parseVersions()
? I found it a few lines up in the file.
std::map<std::string, std::string> parseVersions(std::string const& version_string)
{
std::map<std::string, std::string> version_map;
std::istringstream iss(version_string);
while(iss)
{
std::string module;
iss >> module;
iss >> version_map[module]
}
return version_map;
}
Ok, so version
and module
are just tokens parsed from VERSION_STRING
, which is apparently a whitespace-delimited string, but where does VERSION_STRING
come from? Here’s where it starts getting weird. Version string comes from a header file that looks something like this:
#DEFINE VERSION_STRING "module_1 fatal: Not a git repository (or any of the parent directories): .git
+module_2 fatal: Not a git repository (or any of the parent directories): .git
+module_3 fatal: Not a git repository (or any of the parent directories): .git
+module_4 fatal: Not a git repository (or any of the parent directories): .git
+module_5 fatal: Not a git repository (or any of the parent directories): .git
+module_6 fatal: Not a git repository (or any of the parent directories): .git
+module_7 fatal: Not a git repository (or any of the parent directories): .git
+"
Oh. Ok then. Now I’m starting to see what went wrong here. Obviously, indexing 20 characters into the word “fatal:” isn’t gonna work. But why is this header file full of git error messages? The source tree I have came from a tarball, not a git repository. And that header file? It’s not in the unbuilt source tree–it’s generated as part of the build process.
This project is built using scons. scons is a built system written in Python, which is a little ridiculous, but they used it, and it works. After digging through the Sconsrcipts for the library, I found a function in the build project build utilities that constructed the version string. In essence, it stepped through every top level directory in the project and ran this Python function:
def getGitVersion(repo):
hash = subprocess.check_output(['git', 'rev-parse', 'HEAD'], cwd=repo)
status = subprocess.check_output(['git', 'status', '--porcelain'], cwd=repo)
if len(status) > 1:
hash = hash + '+'
return hash
In case you’re not git-savvy, here’s what that script is supposed to do:
- Get the git commit hash of the current
HEAD
(The currently-checked-out commit.) - Check whether there are any uncommitted modifications.
- If there are any uncommitted modifications, append a “+” to the commit hash.
As I mentioned above, my copy of the source code didn’t have any git data at all, so instead of doing what it’s supposed to, this function returns the git error message from git rev-parse HEAD
.
Since the generated version string are only ever used to display the software version (as far as I know), I just made a quick and dirty patch to getGitVersion()
to work without a git repository:
def getGitVersion(repo):
try:
hash = subprocess.check_call(['git', 'rev-parse', 'HEAD'], cwd=repo)
except subprocess.CalledProcessError:
return '0000000000000000000000000000000000000000'
status = subprocess.check_output(['git', 'status', '--porcelain'], cwd=repo)
if len(status) > 1:
hash = hash + '+'
return hash
Now if the git subprocess exits with an error code, a 40-character string of zeros is returned. For all intents and purposes, this is a valid SHA1 git hash, which is a 40-character hexadecimal string, but it’s vanishingly unlikely that a hash of zero would occur in a real repository. With that fix, I got the software up and running.
What I Learned
I learned a few lessons here. First, some lessons about the project:
- In spite of requiring source code delivery, the client never built and ran the code as-is.
- The contractor who wrote the code never tested their build scripts on the tarballs they deliver to the client. They might have built the code, but they never ran it.
I also learned a few things about what not to do in my own build scripts and code:
- Never treat a string from an external source as “safe.” This is a given in web development, where anything coming over an external interface is unclean and must be sanitized and checked, but apparently it can also apply to your own header files if you generate them dynamically at build time. This brings me to my next point…
- Avoid dynamically-generated code. Dynanmically generated code can be a good thing. It makes development a lot easier when you don’t have to write the same boilerplate over and over again, but it increases the risk of something like this.
- If you’re going to fail, fail at build time. If you are going to dynamically-generate your code, do it in the safest way possible. Check that any functions you call actually succeed. This whole mess happened because the build system silently accepted the failure of the call to
git
. If the build process couldn’t handle that failure, it should have failed the build then and there. Instead it generated perfectly valid C++ that was nonetheless incorrect.
Disclaimer
All of the source code in this post was written solely by me to illustrate roughly what I encountered. I made a few changes to make it easier to understand and because I’m writing from memory, but the overall logic is the same. I didn’t actually test the snippets above, so if anyone actually tries them, let me know. The original source code is the property of the client (who shall not be named) and the contractor who wrote it (ditto).