LCOV/GCOV

Long time, I know. I haven't been fulfilling my blogging duties lately, and although I've been busy, that's still not a good enough excuse. One post every three months or so... really not acceptable. Anyway, hopefully I can pick up the slack in the near future and make up for lost time.

Anyway, cutting to the chase. Recently I was asked at work to increase the code coverage for a bunch of regression and unit tests. The framework is based on Python's unit test framework, and the actual code under test is C/C++. Although some additional work had to go into creating the Python wrappers and bindings with boost::python, this allowed us to write fairly convoluted and advanced black-box modular tests for C/C++ code with the ease and comfort Python provides. I'm personally quite comfortable and happy with the CppUnit framework, but I have to admit Python lets you code things very quickly.

Prior to this little task I was familiar with gcc's gcov and what a helpful tool it was to evaluate code coverage for unit tests:

% (fraction) of lines executed.
% (fraction) of functions/methods executed.
% (fraction) of classes used.
branching data.
...
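
As a rough illustration of where the first of those numbers comes from: the tracefiles that lcov writes (more on lcov in a moment) keep per-file counters, with LF for the instrumented lines found and LH for the lines hit. Here's a minimal sketch that sums them into a single percentage. The function name line_coverage is made up for this example, and it only looks at line coverage, not functions or branches.

```shell
#!/bin/bash
# Print the overall line coverage of an lcov tracefile (.info) as a percentage.
# LF = instrumented lines found, LH = lines hit; lcov writes one pair per source file.
line_coverage() {
    awk -F: '
        $1 == "LF" { found += $2 }
        $1 == "LH" { hit   += $2 }
        END {
            # Avoid dividing by zero on an empty or unrelated file.
            if (found == 0) { print "no instrumented lines"; exit 1 }
            printf "%.1f%% (%d of %d lines)\n", 100 * hit / found, hit, found
        }
    ' "$1"
}
```

Pass it any .info tracefile and it prints one summary line for the whole file.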

What I wasn't quite aware of was the amount of data actually stored in gcov's data files (with the .gcda extension). This is where lcov enters the game, generating some outstanding and impressive HTML reports for the corresponding code. I will get right back to this. Gcov is really similar to gprof, and both work together very well to deliver some very detailed profiling information. They both require their corresponding flags at compile time, -fprofile-arcs -ftest-coverage (or the --coverage shorthand) and -pg respectively, to instrument the code appropriately and generate the files necessary to reconstruct all runtime information. In the case of gcov, compilation produces the .gcno files, which contain the information needed to reconstruct the basic block graphs and assign source line numbers to blocks.
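
To make the compile-time part concrete, here is a small sketch that builds a throwaway C program with coverage instrumentation and runs it once. Everything in it (the coverage_demo function, demo.c, the scratch directory you pass in) is invented for illustration, and it assumes gcc is on your PATH.

```shell
#!/bin/bash
# Build and run a tiny instrumented program inside DIR, then list the gcov files.
# usage: coverage_demo DIR
coverage_demo() {
    local dir=$1
    mkdir -p "$dir" || return 1
    (
        cd "$dir" || exit 1
        cat > demo.c <<'EOF'
#include <stdio.h>
int main(int argc, char **argv) {
    (void)argv;
    if (argc > 1)
        printf("got an argument\n");   /* not executed in this run */
    else
        printf("no arguments\n");
    return 0;
}
EOF
        # --coverage is shorthand for -fprofile-arcs -ftest-coverage when
        # compiling and for linking against libgcov when linking.
        gcc --coverage -c demo.c -o demo.o || exit 1   # compile: writes demo.gcno
        gcc --coverage demo.o -o demo || exit 1        # link
        ./demo >/dev/null || exit 1                    # run: writes demo.gcda at exit
        ls *.gcno *.gcda
    )
}
```

Running coverage_demo /tmp/covdemo should leave demo.gcno (from the compile) and demo.gcda (from the run) next to the object file, which is exactly the pair of files gcov and lcov work from.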

Once you run your tests, the .gcda files are created with the runtime info mentioned above. Lcov will then parse everything to generate reports like the following (I am unable to post reports for our code, but here are some publicly available examples):

The examples shown above are for PHP code (there's a gcov implementation for that), but I can guarantee the reports are pretty much identical for C/C++ code. But it gets better. Lcov is recursive, so if you have a medium/large project with several modules or libraries that are independent in their own right, then as long as they're all under the same source tree lcov will do a great job of finding the results for each and every one of them and creating an HTML report tree that lets you navigate them easily. Well, I actually had to hack together a little script for that to be as seamless as it sounds, so here it goes:

#!/bin/bash
#
# this script assumes your project has a source tree structure as follows:
#
#    PROJ/
#          MOD_1/
#          MOD_2/
#          ...
#          MOD_n/

usage() {
        echo "usage: `basename $0` [options]"
        echo -e "\tThis script generates gcov coverage reports.\n"
        echo -e "Options:"
        echo -e "\t-d DIR\t: Indicate project dir.\n\n"
        echo -e "\t-b\t: generate baseline info (Call before running tests)."
        echo -e "\t-t\t: generate total info (Call after running tests)."
        echo -e "\t-r DIR\t: generate HTML reports and place in specified DIR (Call after running tests).\n\n"
        echo -e "\t-c\t: clean all gcov files in project dir."
        echo -e "\t-q\t: Quiet."
        echo -e "\t-h\t: Show this help message.\n\n"
}

ERRORS=0
GCDAS=""
GCNOS=""
CLEAN=
QUIET=
BASELINE=
TOTAL=
HTML=
HTMLDIR=
PJDIR=.
while getopts "hcbtqr:d:" OPTION
do
        case $OPTION in
                h)
                        usage
                        exit 1
                        ;;
                c)
                        CLEAN=1
                        ;;
                q)
                        QUIET=1
                        ;;
                d)
                        PJDIR=$OPTARG
                        ;;
                b)
                        BASELINE=1
                        ;;
                t)
                        TOTAL=1
                        ;;
                r)
                        HTML=1
                        HTMLDIR=$OPTARG
                        ;;
                ?)
                        usage
                        exit
                        ;;
        esac
done

if [[ -z $BASELINE ]] && [[ -z $TOTAL ]] && [[ -z $HTML ]] && [[ -z $CLEAN ]];
then
        echo -e "No appropriate option flag passed!\n"
        usage
        exit 1
fi

if [[ $BASELINE == 1 ]];
then
        echo "Generating baseline info files..."
        # Module names are the immediate subdirectories of the project dir.
        for i in $(find "$PJDIR" -mindepth 1 -maxdepth 1 -type d -exec basename {} \;);
        do
#               for j in $(find $PJDIR/$i -type f -name "*.gcno" -exec dirname {} \;);
#               do
#                       GCNOS=" $GCNOS -d $j"
#               done

                # Keep the baseline in the project dir, where the -t step looks for it.
                if [[ -n $QUIET ]];
                then
                        lcov -c -i -d "$PJDIR/$i" -o "$PJDIR/${i}_baseline.info" >/dev/null 2>&1
                else
                        lcov -c -i -d "$PJDIR/$i" -o "$PJDIR/${i}_baseline.info"
                fi

        done
        echo "You may now run your regression."
        exit 0;
fi

if [[ $TOTAL == 1 ]];
then
        echo "Generating test info and gcov files..."
        # Module names are the immediate subdirectories of the project dir.
        for i in $(find "$PJDIR" -mindepth 1 -maxdepth 1 -type d -exec basename {} \;);
        do
#               for j in $(find $PJDIR/$i -type f -name "*.gcda" -exec dirname {} \;);
#               do
#                       GCDAS=" $GCDAS -d $j"
#               done
                # Flag a failure from either lcov step without aborting the loop.
                if [[ -n $QUIET ]];
                then
                        lcov -c -d "$PJDIR/$i" -o "$PJDIR/${i}_test.info" >/dev/null 2>&1 || ERRORS=1
                        lcov -a "$PJDIR/${i}_baseline.info" -a "$PJDIR/${i}_test.info" -o "$PJDIR/${i}_total.info" >/dev/null 2>&1 || ERRORS=1
                else
                        lcov -c -d "$PJDIR/$i" -o "$PJDIR/${i}_test.info" || ERRORS=1
                        lcov -a "$PJDIR/${i}_baseline.info" -a "$PJDIR/${i}_test.info" -o "$PJDIR/${i}_total.info" || ERRORS=1
                fi
                GCDAS=""
        done
fi

if [[ $HTML == 1 ]];
then
        echo "Generating html reports..."
        # Module names are the immediate subdirectories of the project dir.
        for i in $(find "$PJDIR" -mindepth 1 -maxdepth 1 -type d -exec basename {} \;);
        do
                if [[ -e $PJDIR/${i}_total.info ]];
                then
                        mkdir -p "$HTMLDIR/${i}"
                        # Flag a genhtml failure without aborting the loop.
                        if [[ -n $QUIET ]];
                        then
                                genhtml -o "$HTMLDIR/${i}" "$PJDIR/${i}_total.info" >/dev/null 2>&1 || ERRORS=1
                        else
                                genhtml -o "$HTMLDIR/${i}" "$PJDIR/${i}_total.info" || ERRORS=1
                        fi
                fi
        done
fi

if [[ $CLEAN == 1 ]];
then
        echo "Removing info files..."
        rm -f "$PJDIR"/*_baseline.info
        rm -f "$PJDIR"/*_test.info
        rm -f "$PJDIR"/*_total.info
        echo "Removing gcda files..."
        find "$PJDIR" -type f -name "*.gcda" -exec rm -f {} \;
fi

if [[ $ERRORS -ne 0 ]];
then
        echo -e "\033[1mWARNING\033[0m:\tScript completed successfully but there were non-critical errors."
        if [[ $QUIET == 1 ]];
        then
                echo -e "\t\tconsider running the script in verbose (non-quiet) mode."
        fi
fi
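
For what it's worth, a full cycle with the script could be wrapped up roughly like this. It's only a sketch: the coverage_cycle function is invented for this example, and the script path, project dir, report dir and test command are whatever you pass in.

```shell
#!/bin/bash
# Run one full coverage cycle: baseline, tests, totals, HTML.
# usage: coverage_cycle SCRIPT PROJECT_DIR HTML_DIR TEST_COMMAND [ARGS...]
coverage_cycle() {
    local script=$1 projdir=$2 htmldir=$3
    shift 3
    # Baseline first: zero-coverage data taken from the .gcno files.
    "$script" -d "$projdir" -b || return 1
    # Run the regression; this is what writes the .gcda files.
    # Its exit status is ignored on purpose, since failing tests still produce coverage.
    "$@"
    # Capture the test data and merge it with the baseline.
    "$script" -d "$projdir" -t || return 1
    # One HTML report tree per module.
    "$script" -d "$projdir" -r "$htmldir"
}
```

Something like coverage_cycle ./gcov_report.sh PROJ coverage_html make check would then do the whole round trip, with gcov_report.sh and make check standing in for the script's actual filename and your own test runner.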

A quick note: for gcov to be able to correlate the execution data, the .gcno and .gcda files must carry the same timestamp tag. This basically means that both files should be generated by objects (binaries) from the same build run. This makes perfect sense: if the .gcno and .gcda files are not for the same source, they simply cannot be used together. The problem is that in a modular project you can have partial builds where you rebuild a single module. Now, if that source references objects/functions/methods/... from any of the other modules that were not recompiled, you will start getting bogus results. The LCOV reports will still be generated, but they'll report a whole lot of 0% coverage stats. The cause is differing build runs for the modules involved. Everything should be fine if you recompile the entire project (all modules) in the same build run.
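
If you suspect you've been bitten by this, a quick check along these lines can point at the offending files. Treat it as a sketch under assumptions: it relies on the header layout described in GCC's gcov-io.h (a 4-byte magic, a 4-byte version, then the 4-byte stamp), it expects each .gcno to sit next to its .gcda, and the function names gcov_stamp and find_stale_gcda are made up here.

```shell
#!/bin/bash
# Print the 4-byte stamp stored at offset 8 of a .gcno or .gcda file, as hex.
gcov_stamp() {
    od -An -tx1 -j8 -N4 "$1" | tr -d ' \n'
}

# List every .gcda under DIR whose stamp differs from the .gcno next to it.
# usage: find_stale_gcda DIR
find_stale_gcda() {
    local dir=$1 gcda gcno
    find "$dir" -type f -name '*.gcda' | while IFS= read -r gcda; do
        gcno=${gcda%.gcda}.gcno
        # Nothing to compare against if the notes file is missing.
        [[ -f $gcno ]] || continue
        if [[ $(gcov_stamp "$gcda") != "$(gcov_stamp "$gcno")" ]]; then
            echo "$gcda"
        fi
    done
}
```

Any path printed by find_stale_gcda PROJ would be a data file left over from a different build than its notes file, in which case cleaning the .gcda files and rebuilding everything in one go is the safe way out.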

Other than the standard GCC gcov and LCOV documentation, I highly recommend this article.

You can also find this script (and others) in my github scripts repo.

Written on May 28, 2012