- Support for Clang compilation database [feature]

Notes
=====

* For static analysis tools it can be important to "see" across projects,
  though current Clang tools all operate on a single translation unit.

* Incremental analysis seems to be important (i.e., only re-analyse TUs that
  have changed).

* Differential analysis: see only new errors compared to some base report (a
  way to live with false positives as well as a way to gradually clean up an
  existing codebase).

Questions
=========

* Will the database be per-amalgamation?

---

A number of Clang-based C/C++ code analysis tools (such as clang-check,
clang-format, clang-tidy) require compilation flags to be provided for the
source file being analyzed. This information can be provided in the form of a
compilation database (http://clang.llvm.org/docs/JSONCompilationDatabase.html).
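The database itself is just a JSON array, one object per translation unit,
with "directory", "command" (or "arguments"), and "file" fields as defined by
the JSON Compilation Database format. A minimal sketch of emitting such a file
in Python (the path and flags below are made up for illustration):

```python
import json

# One entry per translation unit: the directory to run the command in,
# the command itself, and the source file it compiles.
entries = [
    {
        "directory": "/home/user/projects/hello",  # hypothetical path
        "command": "g++ -Wall -c hello.cpp -o hello.o",
        "file": "/home/user/projects/hello/hello.cpp",
    }
]

with open("compile_commands.json", "w") as f:
    json.dump(entries, f, indent=2)
```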

The following tools (referenced by llvm.org) generate the
compile_commands.json file:

* CMake >= 2.8.5. Running 'cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON'
  generates compile_commands.json among other files.

* The Build EAR tool (https://github.com/rizsotto/Bear) creates the JSON file
  during the build process by intercepting the compiler exec*() calls.

* A set of wrapper scripts (https://pypi.python.org/pypi/scan-build) applies
  the interception approach as well. Running 'intercept-build b clean update'
  in a build2 project directory produces the compile_commands.json file.

* The Ninja build system (https://ninja-build.org/manual.html) produces the
  file via a dedicated tool invocation (ninja -t compdb rule1 rule2 ...)
  against a specified list of rules (whatever that means).


So how would we like to support this database? Specifically, the questions are:

* At which point does it get generated and updated (if ever)?

* Where should the JSON file be placed?

* Which files should it reference?

There seems to be little sense in having a long-lived database that is
updated by consecutive build2 runs: such a database can end up containing
different source files with inconsistent compilation flags, which makes it
pretty much useless (if not to say harmful). So generating a new database on
explicit request is probably the right approach.

Generating the file as part of the update operation, but only when explicitly
asked for, would probably be the most intuitive way, especially in light of
build2's ability to change compilation options on the fly via command line
variables. Meanwhile, the resulting file should not depend on whether the
files it references were actually compiled. So running update on a cleaned-up
and on an up-to-date project directory should result in the same database
file [not sure about that: what about incremental re-analysis?]. [That, by
the way, is not true of the interception-based tools mentioned above, which
doesn't speak in their favor.] For symmetry, the clean operation could remove
the file if requested.

Are we going to provide some integration for actually running the tools? In
other words, could one configure a build with static analysis tools that are
automatically invoked either after update (we have the notion of a
post-operation which we can probably use here) or as a separate operation?
Having static analysis diagnostics as part of normal compilation diagnostics
definitely sounds appealing (though apparently analysis can take
significantly longer than compilation, plus there are false positives). If we
go this route, the question then is why we even need the compilation
database: do these tools somehow rely on the ability to see all the commands
at once, or is it just the least invasive way to retrofit them onto existing
build systems?

The location of the resulting database file can be specified as part of that
explicit request.

The resulting database file can reference the source files that are
prerequisites of the targets being built. It probably makes sense to reduce
this set to the source files belonging to the projects of those targets. We
could even try to narrow it down further based on the assumption that the
user most likely expects only the files in the current working directory (and
below) to be present in the database. But since having some "extra" files is
probably not an issue (it is unlikely anybody will read through them too
often), project-based filtering is probably a good compromise.
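The project-based filtering described above can be pictured as a
post-processing step over an already-generated database; a minimal sketch
(the project paths below are made up for illustration):

```python
import json
import os

def filter_project(entries, project_root):
    """Keep only entries whose source file lives under project_root."""
    root = os.path.abspath(project_root) + os.sep
    return [e for e in entries if os.path.abspath(e["file"]).startswith(root)]

# Hypothetical database mixing entries from two projects.
db = [
    {"directory": "/prj/hello", "command": "g++ -c hello.cpp",
     "file": "/prj/hello/hello.cpp"},
    {"directory": "/prj/libfoo", "command": "g++ -c foo.cpp",
     "file": "/prj/libfoo/foo.cpp"},
]

# Only the hello project's entry survives the filter.
print(json.dumps(filter_project(db, "/prj/hello"), indent=2))
```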

Implementation-wise, we could invent a configuration variable
(config.cc.cdbase=<path>) rather than inventing any new build2 option(s). It
could be convenient to add the variable to the project configuration so that
the JSON file reflects the last update operation and gets removed by the
clean operation. If <path> is an existing directory, then the
<path>/compile_commands.json file path is assumed.
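The proposed <path> resolution (existing directory vs. explicit file name)
could look like the following sketch; note that config.cc.cdbase itself is
only a proposal here, not an existing build2 variable:

```python
import os

def resolve_cdbase(path):
    """If path is an existing directory, assume the default database file
    name inside it; otherwise treat path as the database file itself."""
    if os.path.isdir(path):
        return os.path.join(path, "compile_commands.json")
    return path

print(resolve_cdbase("."))  # the current directory exists, so the
                            # default file name is appended
```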

It should then be possible to run something like 'b config.cc.cdbase=. bpkg/
brep/' so that a compile_commands.json common to both projects is created in
the current directory.


Example of generating/using a compilation database file:

$ mkdir hello
$ cd hello
$ cat > hello.cpp <<EOF
namespace abc
{
  const int a (10);
}

void
test(int z)
{
  if (z == 0)
    int x (1 / z);
}

int
main()
{
  int b (0);
  b += 1;
  return 0;
}
EOF

$ clang-tidy hello.cpp
LLVM ERROR: Could not auto-detect compilation database for file "hello.cpp"
No compilation database found in /home/karen/projects/hello or any parent directory
json-compilation-database: Error while opening JSON database: No such file or directory

$ cat > CMakeLists.txt <<EOF
cmake_minimum_required(VERSION 2.8.5)
add_executable(hello hello.cpp)
EOF

$ cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON .
$ cat compile_commands.json
[
{
  "directory": "/home/karen/projects/hello",
  "command": "/usr/bin/c++      -o CMakeFiles/hello.dir/hello.cpp.o -c /home/karen/projects/hello/hello.cpp",
  "file": "/home/karen/projects/hello/hello.cpp"
}
]

$ clang-tidy hello.cpp
/home/karen/projects/hello/hello.cpp:1:11: warning: namespace not terminated with a closing comment [llvm-namespace-comment]
namespace abc
          ^
/home/karen/projects/hello/hello.cpp:10:9: warning: Value stored to 'x' during its initialization is never read [clang-analyzer-deadcode.DeadStores]
    int x (1 / z);
        ^
/home/karen/projects/hello/hello.cpp:10:9: note: Value stored to 'x' during its initialization is never read
    int x (1 / z);
        ^
/home/karen/projects/hello/hello.cpp:10:14: warning: Division by zero [clang-analyzer-core.DivideZero]
    int x (1 / z);
             ^
/home/karen/projects/hello/hello.cpp:9:7: note: Assuming 'z' is equal to 0
  if (z == 0)
      ^
/home/karen/projects/hello/hello.cpp:9:3: note: Taking true branch
  if (z == 0)
  ^
/home/karen/projects/hello/hello.cpp:10:14: note: Division by zero
    int x (1 / z);
             ^
/home/karen/projects/hello/hello.cpp:17:3: warning: Value stored to 'b' is never read [clang-analyzer-deadcode.DeadStores]
  b += 1;
  ^
/home/karen/projects/hello/hello.cpp:17:3: note: Value stored to 'b' is never read
  b += 1;
  ^