Analyses configuration

The Analyses configuration defines how analyses should be run in TrustInSoft CI, by describing the source files to use for the analysis, the entry point, the compilation options to preprocess the source files, and any other options to configure the TrustInSoft CI Analyzer.

The Analyses configuration file

It is recommended to first read the Configuration files section, which explains the differences between a Global configuration and a Committed configuration.

If a Global configuration is used, the Analyses configuration file can be enabled and written in the Build configuration section of the project's Project settings page.

If a Committed configuration is used, the Analyses configuration file should be written and committed in the .trustinsoft/config.json file for the branch that is going to be analyzed.

With a Committed configuration, it is possible to generate the .trustinsoft/config.json file during the Build preparation stage.

Syntax and basic usage

All examples in this section can be replayed with our demo-caesar repository, used for our Introduction tutorial. Feel free to fork this repository to try the different analysis options.

The Analyses configuration should be written using the JSON ECMA-404 standard, with a syntax extension for comments:

// ignores all characters until the end of line
/* ignores all characters until the next */
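To illustrate these two comment forms, here is a minimal C sketch that removes them from a configuration text so that what remains is plain ECMA-404 JSON. It is not the parser used by TrustInSoft CI. The function name strip_json_comments is hypothetical, and leaving string literals untouched is an assumption about how the extension behaves.

```c
#include <stddef.h>

/* Copies `src` to `dst` without line comments and block comments.
   String literals are copied verbatim, so "//" inside a string is kept.
   `dst` must have room for strlen(src) + 1 bytes. */
void strip_json_comments(const char *src, char *dst)
{
    int in_string = 0;
    while (*src != '\0') {
        if (in_string) {
            if (*src == '\\' && src[1] != '\0')
                *dst++ = *src++;          /* copy the backslash first */
            else if (*src == '"')
                in_string = 0;            /* closing quote */
            *dst++ = *src++;              /* copy the current character */
        } else if (src[0] == '/' && src[1] == '/') {
            while (*src != '\0' && *src != '\n')
                src++;                    /* skip until the end of line */
        } else if (src[0] == '/' && src[1] == '*') {
            src += 2;
            while (*src != '\0' && !(src[0] == '*' && src[1] == '/'))
                src++;                    /* skip until the next closing marker */
            if (*src != '\0')
                src += 2;
        } else {
            if (*src == '"')
                in_string = 1;            /* opening quote */
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Such a pre-processing step can be handy if you want to check a configuration with a standard JSON tool that does not accept comments.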

The Analyses configuration is a list of analysis configuration objects:

[
  {
    /* First analysis configuration */
  },
  {
    /* Second analysis configuration */
  }
  // And so on...
]

Each analysis configuration should contain at least:

the list of source files to analyze (usually .c or .cpp files)
the compilation options required to preprocess these files

[
  {
    "files": [ "main.c", "caesar.c" ],
    "cpp-extra-args": "-I ."
    // or use "cxx-cpp-extra-args" for C++ source files
  }
]

Do not add a comma before a closing bracket } or ]. A trailing comma is a JSON syntax error.

cpp-extra-args or cxx-cpp-extra-args can be omitted if the files do not need any particular preprocessing options.

It is also recommended to add the following optional information to each analysis configuration:

a name to clearly identify the analysis in the result table in TrustInSoft CI
the target architecture to use (also called machdep); if omitted, the default one is "gcc_x86_32" (see also the list of Supported architecture)
the function to use as the entry point of the analysis; if omitted, the default one is the main function

[
  {
    "name": "Test shift values 7 and -3 (gcc_x86_64)",
    "files": [ "main.c", "caesar.c" ],
    "cpp-extra-args": "-I .",
    "machdep": "gcc_x86_64",
    "main": "main"
  }
]

Paths in Analyses configuration

With a Global configuration, all filenames/paths inside an Analyses configuration are relative to the root of the repository.

With a Committed configuration, all filenames/paths inside an Analyses configuration are relative to the directory where the file is, hence relative to .trustinsoft.

[
  {
    "name": "Test shift values 7 and -3",
    "files": [ "main.c", "caesar.c" ],
    "cpp-extra-args": "-I ."
  }
]

.trustinsoft/config.json

[
  {
    "name": "Test shift values 7 and -3",
    "files": [ "../main.c", "../caesar.c" ],
    "cpp-extra-args": "-I .."
  }
]

For the Committed configuration, prefixing all paths with ../ can be tedious. To avoid this, the "prefix_path" option can be used to prefix all paths with the given value:

.trustinsoft/config.json

[
  {
    "name": "Test shift values 7 and -3",
    "prefix_path": "..",
    "files": [ "main.c", "caesar.c" ],
    "cpp-extra-args": "-I ."
  }
]

The "prefix_path" option can also be used with a Global configuration, for instance if all your source files are located in the same sub-directory.

Advanced usages

Adding inputs for the entry point function

If the entry point function has type int (int argc, char * argv[]), inputs can be given to the program with the "val-args" option.

The analysis starts with argc bound to k+1 and argv pointing to a NULL-terminated array of pointers to the strings program, arg_1, ..., arg_k, where arg_1, ..., arg_k are the arguments given to "val-args". The first character of the "val-args" value is used as the separator to split the value into the arguments arg_1, ..., arg_k.

argv[0] is set by default to program. This value can be changed with the "val-program-name" option.

[
  {
    "name": "Test from program inputs",
    "files": [ "main.c", "caesar.c" ],
    "cpp-extra-args": "-I .",
    "main": "main_with_input",
    
    // argc will be set to "3"
    // argv[0] = "a.out"
    // argv[1] = "People of Earth, your attention please"
    // argv[2] = "7"
    "val-program-name": "a.out",
    "val-args": "|People of Earth, your attention please|7"
  }
]
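The splitting rule described above can be sketched in C as follows. This is only an illustration of the rule (the first character is the separator), not the analyzer's implementation. The function name split_val_args is hypothetical, and the handling of edge cases such as a trailing separator is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Splits `spec` in place, using its first character as the separator.
   Stores at most `max` argument pointers in `out` and returns how many
   arguments were stored. `spec` is modified: separators become '\0'. */
size_t split_val_args(char *spec, char **out, size_t max)
{
    size_t count = 0;
    if (spec[0] == '\0')
        return 0;                 /* empty value: no arguments */
    char sep = spec[0];
    char *cursor = spec + 1;      /* first argument starts after the separator */
    while (count < max) {
        out[count++] = cursor;
        char *next = strchr(cursor, sep);
        if (next == NULL)
            break;                /* last argument reached */
        *next = '\0';
        cursor = next + 1;
    }
    return count;
}
```

Applied to the "val-args" value of the example, this yields two arguments, which together with the program name gives an argc of 3.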

If your entry point function does not have the type int (int argc, char * argv[]), it is not possible to give an input with "val-args". In this case, it is recommended to write a test driver function (a function written only for testing purposes) that directly calls your function with the desired input, and to use this test driver function as the entry point for the analysis.

If you want to analyze many different inputs, it is also recommended to use a test driver function. Write this test driver function to call your entry point as many times as you want, then set the entry point of the analysis to this test driver function.
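A test driver of this kind might look like the sketch below. The function caesar_shift is a hypothetical stand-in for the code under test (the real demo-caesar functions may have different names and signatures), and test_driver is an illustrative name for the driver.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function under test: shifts ASCII letters in place. */
void caesar_shift(char *text, size_t len, int shift)
{
    for (size_t i = 0; i < len; i++) {
        char c = text[i];
        if (c >= 'a' && c <= 'z')
            text[i] = (char)('a' + ((c - 'a') + shift % 26 + 26) % 26);
        else if (c >= 'A' && c <= 'Z')
            text[i] = (char)('A' + ((c - 'A') + shift % 26 + 26) % 26);
    }
}

/* Test driver: calls the function under test with several inputs.
   Returns the number of round trips that did not restore the text. */
int test_driver(void)
{
    static const int shifts[] = { 7, -3 };
    int failures = 0;
    for (size_t i = 0; i < sizeof shifts / sizeof shifts[0]; i++) {
        char buf[] = "People of Earth, your attention please";
        size_t len = strlen(buf);
        caesar_shift(buf, len, shifts[i]);    /* encrypt */
        caesar_shift(buf, len, -shifts[i]);   /* decrypt */
        if (strcmp(buf, "People of Earth, your attention please") != 0)
            failures++;
    }
    return failures;
}
```

With such a function in one of the listed source files, the analysis configuration would set "main" to "test_driver".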

Using compilation databases

If your project uses tools such as CMake or Bear, the generated compilation database JSON file(s) can be used instead of the "cpp-extra-args" and "cxx-cpp-extra-args" options to deduce the preprocessing options to use for the analyzed source files.

First, the compilation database file(s) must be generated during the Build preparation stage:

#!/bin/bash

set -e

# Generate compile_commands.json files with Bear or CMake
bear make

Then, to use these generated compilation database file(s), the "compilation-database" option should be added in your analysis configuration object with the paths to the compilation database file(s):

[
  {
    "name": "Test with a compilation database",
    "files": [ "main.c", "caesar.c" ],
    "compilation-database": [ "compile_commands.json" ]
  }
]

If the "cpp-extra-args" or "cxx-cpp-extra-args" options are given in addition to the "compilation-database" option, they are appended to the preprocessing command line (used by TrustInSoft Analyzer to parse the source files) after the preprocessing options extracted from the compilation database.

If a directory is given instead of a compilation database file in the "compilation-database" option, the analyzer will scan for all compile_commands.json files located in this directory and its sub-directories.

Selecting a C++ standard

For C++ programs, it is recommended to explicitly specify which C++ standard to use for the analysis with the "cxx-std" option. If omitted, the default C++ standard used is c++11.

The available C++ standards for TrustInSoft CI are: c++03, c++0x, c++11, c++14, c++17, c++1y, c++1z, c++20, c++2a, c++98, gnu++03, gnu++0x, gnu++11, gnu++14, gnu++17, gnu++1y, gnu++1z, gnu++20, gnu++2a, gnu++98.

Example with our C++ example repository, Cxx_matrix:

[
  {
    "name": "Matrix manipulations in C++",
    "files": [ "matrix.cpp" ],
    "cxx-cpp-extra-args": "-I.",
    "cxx-std": "c++14"
  }
]

Customizing the address alignment

In TrustInSoft CI Analyzer, the base addresses are assumed to be aligned to multiples of 1 by default.

If your analyzed program assumes the addresses to have a different alignment, it can be specified with the "address-alignment" option:

     // Base addresses are assumed to be aligned to multiples of 65536.
     "address-alignment": 65536

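As a hedged illustration of code that relies on such an assumption, the C sketch below computes the offset of a pointer inside a 64 KiB block by masking its low bits. The function names are hypothetical. The result of offset_in_block only matches the offset from the start of an object if that object's base address is a multiple of 65536, which is the kind of property this option lets the analysis assume.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `ptr` is a multiple of `alignment`, else 0.
   `alignment` must be a power of two. */
int is_aligned(const void *ptr, uintptr_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Returns the low 16 bits of the address, i.e. the offset of `ptr`
   inside the 65536-byte block that contains it. */
uintptr_t offset_in_block(const void *ptr)
{
    return (uintptr_t)ptr & (uintptr_t)0xFFFF;
}
```
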
Simulating a file system

If the analyzed program performs operations on files, the analysis may need a virtual file system to be deterministic; otherwise the analysis may be interrupted by a Bad libc call error.

This virtual file system simulates a list of files available for the analyzed program. This list of files is based on files of the real file system, hence it is recommended to either commit the files needed for the analysis in your GitHub repository or to generate them during the Build preparation stage.

The virtual file system is enabled with the "filesystem" option, which contains a list of "files". Each file should indicate the "name" used by the program and the associated "from" file of the real file system. During the analysis, the contents of the "name" file are mapped to the contents of the real "from" file, allowing a deterministic behavior of functions operating on files (such as fgetc, fread, ...).

The string in "name" needs to be exactly the same as the one used inside the program to open the file. Otherwise the file will not be found and will not be mapped to the "from" file of the virtual file system.

[
  {
    "name": "Test with file as input",
    "files": [ "caesar.c", "main.c" ],
    "cpp-extra-args": "-I.",
    "main": "main_with_filesystem",
    "filesystem": {
      "files": [
        {
          // Path used for the "fopen" in "main_with_filesystem".
          "name": "/var/demo/caesar/test-suite.txt",
          // Path to the file located in the repository.
          "from": "input.txt"
        },
        {
          // If "from" is omitted, the analyzer assumes this file
          // does not exist. 
          "name": "/var/demo/caesar/test-suite-2.txt"
        } 
      ]
    }
  }
]
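On the program side, the code that consumes such a file could look like the following C sketch. The helper count_bytes and the entry point name main_with_filesystem_sketch are hypothetical and are not taken from the demo-caesar repository. Only the path string comes from the configuration above, and it must match the "name" entry exactly.

```c
#include <stdio.h>

/* Returns the number of bytes read from `path`,
   or -1 if the file cannot be opened. */
long count_bytes(const char *path)
{
    FILE *stream = fopen(path, "r");
    if (stream == NULL)
        return -1;                /* e.g. a "name" entry without "from" */
    long count = 0;
    while (fgetc(stream) != EOF)
        count++;
    fclose(stream);
    return count;
}

/* Hypothetical entry point: opens the path declared in "filesystem". */
long main_with_filesystem_sketch(void)
{
    return count_bytes("/var/demo/caesar/test-suite.txt");
}
```

Checking the result of fopen matters here, since a file declared without "from" is treated as missing.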

Tweaking for performance issues

Some analyses can take too much time or memory according to the limits set by TrustInSoft CI.

Regarding time, a single analysis is stopped after running for 15 minutes, leading to the Timeout error. This limit can be increased up to 3 hours with the "val-timeout" option.

Regarding memory usage, TrustInSoft CI Analyzer keeps all the results of the analysis in memory. These results are needed to Inspect the analysis with the Graphical User Interface of TrustInSoft CI Analyzer (which lets you see the values of all variables at any program point).

However, it is hard to anticipate how much memory an analysis will consume. If an analysis is stopped with the Out of memory error, the "no-results" option can be used to force TrustInSoft CI Analyzer not to keep the results of the analysis in memory. As a side effect, "no-results" can also make the analysis slightly faster.

With the "no-results" option, the analysis may no longer hit the memory limit. However, you will no longer be able to Inspect the result with the Graphical User Interface of TrustInSoft CI Analyzer.

[
  {
    "name": "Test beyond limits",
    "files": [ "main.c", "caesar.c" ],
    "cpp-extra-args": "-I .",
    
    // The value is the number of seconds. 10800 is equal to 3 hours.
    // A value greater than 3 hours will still be capped to 3 hours.
    "val-timeout": 10800,
    
    // If "true", inspecting the analysis afterwards with
    // TrustInSoft CI Analyzer is no longer possible.
    "no-results": true
  }
]

More advanced usages

The options described on this page are only part of a long list. Most of the options not described here are only useful in very specific use cases.

TrustInSoft CI Analyzer shares its options with TrustInSoft Analyzer. The complete list of options for an analysis configuration can therefore be found in the TrustInSoft Analyzer documentation, and is closely related to the TrustInSoft Analyzer command line options.

However, some of these options are not available with TrustInSoft CI Analyzer.

Do not hesitate to contact us if you have trouble configuring your analyses.

