// file      : doc/testscript.cli
// copyright : Copyright (c) 2014-2016 Code Synthesis Ltd
// license   : MIT; see accompanying LICENSE file

"\name=build2-testscript-language"
"\subject=Testscript language"
"\title=Testscript Language"

// NOTES
//
// - Maximum line is 70 characters.
//

// @@ Testscript vs testscript
//

"
\h1#intro|Introduction|

\h1#integration|Build System Integration|

The \c{build2} \c{test} module provides the ability to run an executable
target as a test, optionally passing options and arguments, providing
\c{stdin} input, as well as comparing the \c{stdout} output to the expected
result. For example:

\
exe{xml-parser}: test.options = --strict
exe{xml-parser}: test.input = test.xml
exe{xml-parser}: test.output = test.out
\

This works well for simple, single-run tests. In contrast, the testscript
approach allows you to perform multiple test runs of potentially multi-command
(compound) tests that can perform setup/teardown actions. It also provides
concise mechanisms for commonly used test steps such as supplying input
as well as comparing output and exit status.

The integration of testscripts into buildfiles is done using the standard
\i{target-prerequisite} mechanism. In this sense, a testscript is a
prerequisite that describes how to test the target similar to how, for
example, the \c{INSTALL} file describes how to install it. For example:

\
exe{xml-parser}: test{testscript} doc{INSTALL README}
\

By convention, a testscript file should either be called \c{testscript} (if
you only have one) or have the \c{.test} extension, for example,
\c{basics.test}. The \c{test} module registers the \c{test{\}} target type
for testscript files.

A testscript prerequisite can be specified for any target. For example, if
our directory contains a bunch of shell scripts that we want to test together,
then it makes sense to specify the testscript prerequisite for the directory
target:

\
./: test{basics}
\

During variable lookup, if a variable is not found in a testscript, then its
search continues in the buildfile starting from the testscript target. This
means a testscript can \"see\" all the existing buildfile variables and
we can use target-specific variables to pass additional information, for
example:

\
# testscript

.if ($cxx.target.class == windows)
  foo = $bar
\

\
# buildfile

test{testscript}@./: bar = baz
\

Additionally, a number of \c{test.*} variables are reused to pass specific
information to testscripts. Unless set manually as a testscript
target-specific variable, the \c{test} variable is automatically set to the
target path being tested. For example, given this \c{buildfile}:

\
exe{xml-parser}: test{testscript}
\

The value of \c{test} inside the testscript will be the absolute path to the
\c{xml-parser} executable.

The other two special variables are \c{test.options} and \c{test.arguments}.
You can use them to pass additional options/arguments to your test scripts
and together with \c{test} they form the test target command line which is
bound to a number of read-only variable aliases:

\
$* - the complete {$test $test.options $test.arguments} command line
$0 - $test
$N - (N-1)-th element in the {$test.options $test.arguments} array
\

Note that these aliases are read-only; if you need to modify any of the
values then you should use the original variable names, for example:

\
test.options += --strict

$* <\"not xml\" != 0
\

A testscript would normally contain multiple tests and sometimes it is
desirable to only run a specific test or a group of tests. For example, you
may be debugging a failing test and would like to re-run it. Each test and
test group in a testscript has an id. As a result each test has an \i{id path}
that uniquely identifies it. The id path starts with the testscript file name
(corresponds to the id of the implied outermost test group, as described
below), may include a number of intermediate test group ids, and ends with the
test id. The ids in a path are separated with a forward slash (\c{/}). Note
that this also happens to be the filesystem path to the temporary directory
where the test is executed (again, as discussed below). As an example,
consider the following testscript file called \c{basics.test}:

\
$* foo ; foo

: fox
{{
   $* fox bar ; bar
   $* fox baz ; baz
}}
\

The id paths for the three tests will then be:

\
basics/foo
basics/fox/bar
basics/fox/baz
\

To only run individual tests, test groups, or testscript files we can specify
their id paths in the \c{config.test} variable, for example:

\
$ b test config.test=basics     # Run all tests in basics.test.
$ b test config.test=basics/fox # Run bar and baz.
$ b test config.test=basics/foo # Run foo.
$ b test \"config.test=basics/foo basics/fox/bar\" # Run foo and bar.
\

\h1#lexical|Lexical Structure|

Testscript is a line-oriented language with a context-dependent lexical
structure. It \"borrows\" several building blocks (for example, variable
expansion) from the Buildfile language. In a sense, Testscript is a
specialized (for testing) continuation of Buildfile.

Blank lines are ignored except for the line count.

The backslash (\c{\\}) character followed by a newline signals the line
continuation. Both this character and the newline are removed (note: not
replaced with a whitespace) and the following line is read as if it was part
of the first line. Note that \c{'\\'} followed by EOF is invalid. For example:

\
$* foo | \
$* bar
\

An unquoted and unescaped \c{'#'} character starts a comment; everything from
this character until the end of line is ignored. For example:

\
# Setup foo.
$* foo

$* bar # Setup bar.
\

Note that there is no line continuation in comments; the trailing \c{'\\'} is
ignored except in one case: if the comment is just \c{'#\\'} followed by the
newline, then it starts a multi-line comment that spans until the closing
\c{'#\\'} comment is encountered. For example:

\
#\
$* foo
$* bar
#\
\

Similar to Buildfile, the Testscript language supports two types of quoting:
single (\c{'}) and double (\c{\"}). Both can span multiple lines.

The single-quoted string does not recognize any escape sequences (not even for
the single quote itself or line continuations) with all the characters taken
literally until the closing single quote is encountered.
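
For instance, in the following sketch (variable names are arbitrary) the
single-quoted value is taken verbatim, so neither the variable nor the
evaluation context is expanded:

\
foo = FOO
bar = '$foo ($foo == FOO)' # '$foo ($foo == FOO)'
\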

The double-quoted string recognizes escape sequences (including line
continuations) as well as expansions of variables and evaluations of contexts.
For example:

\
foo = FOO
bar = \"$foo ($foo == FOO)\" # 'FOO true'
\

Characters that have special syntactic meaning (for example \c{'$'}) can be
escaped with a backslash (\c{\\}) to preserve their literal meaning (to
specify literal backslash you need to escape it as well). For example:

\
foo = \$foo\\bar # '$foo\bar'
\

Note that quoting could often be a more readable way to achieve the same
result, for example:

\
foo = '$foo\bar'
\

Inside double-quoted strings only the \c{[\"\\$(]} character set needs to be
escaped.

A character is said to be \i{unquoted} and \i{unescaped} if it is not escaped
and is not part of a quoted string. A token is said to be unquoted and
unescaped if all its characters are unquoted and unescaped.

The lexical structure of the remainder of a line (that is, the \i{context}) is
determined by the leading (unquoted and unescaped) character after ignoring
any (unquoted and unescaped) leading whitespace. The following characters are
context-introducing:

\
':'  - description line
'.'  - directive line
'{'  - block start
'}'  - block end
'+'  - setup command line
'-'  - teardown command line
\

For the here-document lines the context is implied by the preceding line. If
none of the above determinants apply, then the line is either a variable
assignment or a test command line. Distinguishing between the two is performed
during parsing and is described below.
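
As an illustrative sketch (the commands and options are hypothetical), the
following fragment contains several of the line kinds listed above, with
the leading character determining how the rest of each line is treated:

\
# Variable assignment.
foo = bar

# Setup command line.
+ $* --create $foo

# Description line followed by a test command line.
: list-entries
$* --list $foo

# Teardown command line.
- $* --remove $foo
\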


\h1#grammar|Grammar and Semantics|

\h#grammar-notation|Notation|

The formal grammar of the Testscript language is specified using an EBNF-like
notation with the following elements:

\
foo: ...   - production rule
foo        - non-terminal
<foo>      - terminal
'foo'      - literal
foo*       - zero or more
foo+       - one or more
foo?       - zero or one
foo bar    - concatenation (foo then bar)
foo | bar  - alternation   (foo or bar)
(foo bar)  - grouping
{foo bar}  - concatenation in any order (foo then bar or bar then foo)
foo \
bar        - line continuation
\

Rule right-hand-sides that start on a new line describe the line-level syntax
and ones that start on the same line describe the syntax inside the line. For
example, from the following two rules, the first describes a single line of
text (e.g., \c{'foofoofoo'}) while the second \- multiple lines (e.g.,
\c{'foo\\nfoo\\nfoo'}):

\
text-line: 'foo'+

text-lines:
  'foo'+
\

Lines are separated with the standard sequence of newline separators (CR/LF
combinations) and components within lines \- with the standard sequence of
non-newline whitespaces (spaces and tabs). Note that in some cases components
within lines are not whitespace-separated in which case they will be written
without a space between them, for example:

\
foo: 'foo'bar

bar: fox''baz
\

You may also notice that several production rules below end with \c{-line}
while potentially spanning several physical lines. In such cases they
represent \i{logical lines}, for example, a test, its description, and its
here-document fragments.

\h#grammar-script|Script|

\
script:
  (script-block | script-line)*
\

A testscript file is a sequence of blocks and (logical) lines that are
processed in order.

\h#grammar-blocks|Blocks|

\
script-block:
  test-block | test-group-block

test-block:
  description-line?
  '{'
  script*
  '}'

test-group-block:
  description-line?
  '{{'
  script*
  '}}'
\

A block establishes a nested variable scope and a cleanup context. Any
variables set within the block will only have effect until the end of the
block. All registered cleanups are triggered at the end of the block.

Additionally, entering a block triggers the creation of a nested temporary
directory with the test/group id (see below) as its name. This directory then
becomes the current working directory (\c{CWD}). Unless instructed otherwise,
this temporary directory is removed at the end of the block and the previous
\c{CWD} value is restored. (@@ Should we expect it to be empty, i.e., no
unexpected output from the test?).

Test and test group blocks have the same semantics except that in a test block
each test line is considered to be part of the same test while in the test
group each test line is treated as an individual test. Individual test lines
in a group are treated \i{as if} they were in a test block consisting of just
that line. In particular, this means that a nested temporary directory is also
created for such individual tests and cleanup happens immediately after
executing the test line.

While test group blocks can contain other test group and test blocks, test
blocks cannot contain nested blocks of any kind.
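
For example, the following nesting would be valid under these rules (a
sketch with arbitrary ids and arguments):

\
: outer
{{
  : inner
  {{
    $* foo ; foo
  }}

  : compound
  {
    $* bar
    $* baz
  }
}}
\

Placing another block inside the \c{compound} test block, on the other
hand, would be an error.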

A testscript execution starts in \c{out_base} as \c{CWD} and \i{as if} in an
implicit test group block with the testscript file name (without the
extension) as this group's id.

For example, consider the following testscript file which we assume is called
\c{basics.test}:

\
: group1
{{
  foo = bar

  + setup1
  + setup2 &out-setup2

  test1 &out-test1 ; test1

  : test2
  {
    bar = baz

    test2a $baz &out-test2
    test2b
  }
}}
\

\h#grammar-description|Description|

\
description-line:
  (': ' <text>)*
\

Description lines start with a colon (\c{:}) and are used to document tests
(either single-line or compound) as well as test groups. In a sense, they are
formalized comments.

By convention the description has the following format with all three
components being optional.

\
: <id>
: <summary>
:
: <details>
\

If the first line in the description does not contain any whitespaces, then it
is assumed to be the test or test group id. The recommended format for an id
is \c{<keyword>-<keyword>...} with at least two keywords. The id is used in
diagnostics as well as to run individual tests or test groups.

If the next line is followed by a blank line, then it is assumed to be the test
or test group summary. The recommended style for a summary is that of the
\c{git(1)} commit summary.

After the blank line come optional details which are free-form. For example:

\
# Only id.
#
: empty-repository

# Only summary.
#
: Test handling of empty repository

# Both id and summary.
#
: empty-repository
: Test handling of empty repository

# All three: id, summary, and detailed description.
#
: empty-repository
: Test handling of empty repository
:
: This test makes sure we handle repositories without any packages.
\

The recommended way to come up with an id is to distill the summary to its
essential keywords (i.e., by removing generic words like \"test\", \"handle\",
and so on). If you do this, then both the id and summary convey essentially
the same information. As a result, you may choose to drop the summary and only
keep the id.

For single-line tests the description (either the id or summary) can also be
specified inline after a semicolon (\c{;}), for example:

\
$* empty ; Test handling of empty repository
\

If an id is not specified then it is automatically derived from the test or
test group location. If the test or test group is contained directly in the
top-level testscript file, then just its start line number is used as an id.
Otherwise, if the test or test group resides in an included file, then the
start line number is prefixed with that file name (without the extension) in
the form \c{<file>-<line>}. The start line for a block (either test or group)
is the line containing the opening curly brace (\c{{}) and for a simple test \-
the test line itself.
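
As a sketch of the automatic id derivation, assume the following is the
entire contents of a top-level testscript file called \c{basics.test} (the
file name and commands are hypothetical):

\
$* foo

{
  $* bar
  $* baz
}
\

Going by the rules above, the simple test starts on line 1 and the test
block on line 3 (the line with the opening brace), so their id paths would
be expected to be \c{basics/1} and \c{basics/3}, respectively.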


\h#grammar-directives|Directives|

\
directive-line:
  include
  if-else
\

All directive lines start with a leading dot (\c{.}). To specify a
non-directive line that starts with a dot you can either escape or quote it,
for example:

\
\.include
'.include'
\

\h2#grammar-directives-include|\c{.include}|

\
include: '.include' ( )+
\

The \c{include} directive includes one or more testscript files into
another. If the specified path is not absolute, then it is interpreted as
being relative to the including file. The semantics of inclusion is \i{as if}
the contents of the included file appeared directly in the including file
except for deriving test/group ids and displaying locations in diagnostics.

The remainder of the line after the \c{'.include'} word is expanded as a
Buildfile variable value.
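
A minimal sketch (the file names are hypothetical) that includes first one
and then two testscript files, with the relative paths resolved against the
including file:

\
.include common.test
.include setup.test ../shared/tools.test
\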


\h2#grammar-directives-if-else|\c{.if} \c{.else}|

\
if-else: ('.if' | '.if!') 
  if-else-body
  elif*
  else?

elif: ('.elif' | '.elif!') 
  if-else-body

else: '.else'
  if-else-body

if-else-body:
  script-line | script-block | directive-block

directive-block:
  '.{'
  script*
  '.}'
\

The \c{if-else} directives allow for conditional exclusion of testscript
fragments. The body of the \c{if-else} directive can be either a single
(logical) line, a single block, or multiple lines/blocks. For example:

\
.if ($foo == FOO)
  bar = BAR

.if ($cxx.target.class != windows)
  $* foo

.if ($cxx.target.class != windows)
  {
    $* foo
    $* bar
  }

.if ($foo == FOO)
.{
  $* foo

  bar = BAR
  baz = BAZ

  {
    $* $bar
    $* $baz
  }
.}
\

Note that \c{if-else} operates on logical lines/blocks, for example:

\
.if ($foo == FOO)
  : foo-bar
  : Test foo bar combination
  $* foo bar >>EOO
  foo
  bar
  EOO


.if ($foo == FOO)
  : foo-bar
  : Test foo bar combination
  : foo-bar
  {
    $* foo
    $* bar
  }
\

The remainder of the line after the \c{'.if'} and \c{'.elif'} words is expanded
as a Buildfile variable value and should evaluate to either \c{'true'} or
\c{'false'} text literals.
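
As an illustration of chaining (the \c{--format} option and its values are
hypothetical), here is a sketch that selects one of three single-line tests
depending on the target class:

\
.if ($cxx.target.class == windows)
  $* --format crlf
.elif ($cxx.target.class == linux)
  $* --format lf
.else
  $* --format auto
\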

\h#grammar-variable|Variable Assignment|

\
variable-line:  ('=' | '+=' | '=+') value-attributes? 

value-attributes: '['  ']'
\

The Testscript variable assignment semantics is equivalent to Buildfile except
that the value is expanded as \"strings\", not \"names\" (@@ clarify) and
the default value type is \c{strings}. Note that, unlike Buildfile, no variable
attributes are supported.
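
For example, assuming the Buildfile meaning of the three operators (assign,
append, and prepend):

\
foo = bar
foo += baz # 'bar baz'
foo =+ fox # 'fox bar baz'
\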

\h#grammar-test|Test|

\
test-line:
  description-line?
  command-expr command-exit? (';' )?
  here-document*

command-exit: ('==' | '!=') 
\

The test command line can specify an optional exit status check. If omitted,
then the test is expected to succeed (0 exit status).

Variable expansion and context evaluation are performed (using chunked parsing)
in \c{command-expr} and \c{command-exit} but not in the inline test
description.
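
A short sketch of the three forms of the exit status check (the arguments
and options are hypothetical):

\
# Expected to succeed (exit status 0).
$* foo

# Expected to fail with any non-zero exit status.
$* --no-such-option != 0

# Expected to exit with exactly 1; with an inline description.
$* bar == 1 ; Test handling of bar
\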

\h#grammar-setup-teardown|Setup/Teardown|

\
setup-line: '+' command-expr
  here-document*

teardown-line: '-' command-expr
  here-document*
\

The setup and teardown command lines are similar to the test command line
except that they cannot have a test description or exit status check (they are
always expected to succeed). The main motivation for distinguishing between
test and setup/teardown commands is the ability to ignore the teardown
commands in order to preserve the setup of a test, for example, of a failed
test that you are debugging. Also, the setup/teardown and test commands are shown
at different verbosity levels (\c{3/-V} and \c{2/-v} respectively).
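
As a sketch (the options are hypothetical), a test bracketed by a setup and
a teardown command could look like this:

\
+ $* --start-server

$* --ping ; ping

- $* --stop-server
\

Here, ignoring the teardown command would leave the (hypothetical) server
running so that a failed \c{ping} test could be investigated.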

\h#grammar-command-expr|Command Expression|

\
command-expr: command-pipe (('||' | '&&') command-pipe)*
\

Multiple commands can be combined with AND and OR operators. Note that the
evaluation order is always from left to right (left-associative) and both
operators have the same precedence and are short-circuiting. Note, however,
that short-circuiting does not apply to variable expansion.
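
For example (a sketch with hypothetical options):

\
# Run the second command only if the first one fails.
$* --check || $* --repair

# Run the second command only if the first one succeeds.
$* --create && $* --verify

# Same precedence, left to right: (check || repair) && verify.
$* --check || $* --repair && $* --verify
\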


\h#grammar-command-pipe|Command Pipe|

\
command-pipe: command ('|' command)*
\

Commands can also be combined with a pipe.

\h#grammar-command|Command|

\
command:  * {stdin? stdout? stderr? merge? cleanup*}
\

A command starts with a command path followed by options and arguments, if
any. We can also redirect/merge standard streams as well as register files
and directories that may be created by the command for automatic cleanup.
Note that redirects, merge, and cleanups can appear in any order but must
come after the arguments.

\h#grammar-redirect-merge-cleanup|Redirect, Merge, Cleanup|

\
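stdin:  '0'?('<' <text> | '<<' <here-end> | '<<<' <file> | '<!')
stdout: '1'?('>' <text> | '>>' <here-end> | '>>>''&'? <file> | \
             '>!' | '>?')
stderr: '2'('>' <text> | '>>' <here-end> | '>>>''&'? <file> | \
            '>!' | '>?')

merge: '1>&2' | '2>&1'

cleanup: '&'(<file> | <dir>)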
\

The \c{stdin} stream data can come from a pipe, string, the here-document
fragment, file, or \c{/dev/null} (\c{<!}). The \c{stdout} stream data can go
to a pipe, file (\c{>>>} or \c{>>>&}), or \c{/dev/null} (\c{>!}). It can also be
compared to a string or the here-document fragment. For \c{stdout} specifying
both pipe and redirect is an error. If no explicit \c{stderr} redirect is
specified and the test is expected to fail (non-zero exit status), then an
implicit \c{2>!} redirect is assumed.

If no \c{stdout} or \c{stderr} redirect is specified and the test tries to
write any data to either stream, it is considered to have failed. If you need
to allow writing to the default \c{stdout} or \c{stderr}, specify \c{>?} and
\c{2>?}, respectively.

We can also merge \c{stderr} to \c{stdout} (\c{2>&1}) or vice versa
(\c{1>&2}).

If a command creates extra files or directories then we can register them for
automatic cleanup at the end of the test. Files mentioned in redirects are
registered automatically.

Note that, unlike in a shell, no whitespace is allowed around the \c{<} and
\c{>} redirects or after the \c{&} cleanups.
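
The following sketch pulls some of these rules together (the options and
the file name are hypothetical):

\
# Supply stdin as a string and send stdout to /dev/null.
$* <\"foo\" >!

# Expected to fail, so an implicit 2>! redirect is assumed.
$* --bogus != 0

# Allow writing to the default stdout and stderr.
$* --verbose >? 2>?

# Merge stderr to stdout and discard the result.
$* --trace 2>&1 >!

# Register a file created by the command for automatic cleanup.
$* --log out.log &out.log
\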

A here-document redirect must be specified \i{literally} on the test command
line. Specifically, it must not be the result of a variable expansion or
context evaluation, which rarely makes sense anyway since the following
here-document fragment itself cannot be the result of the
expansion/evaluation either; in a sense they both are part of the syntax.

This requirement is imposed in order to be able to skip test lines and their
associated here-document fragments in the \c{if-else} directives without
performing any expansions/evaluations (which may not be valid).

The skipping procedure for a line that is either a variable assignment or a
test command is as follows: The line is lexed until the newline or EOF while
checking each token for either one of the variable assignment operators or
here-document redirects. If both kinds are present then this is an ambiguity
error which can be resolved by quoting either of the tokens, depending on the
desired semantics (variable assignment or test command). Otherwise, all the
here-document redirects are noted and the corresponding number of here-document
fragments is skipped (with \c{here-end} match/order validation).

Note also that this procedure is applied even in case of \c{if-else} with
\c{directive-block} since the block end (\c{.\}}) may appear literally in
one of the here-document fragments.

\h#grammar-here-document|Here-Document|

\
here-document:
  <text>*
  <here-end>
\

The here-document fragments can be used to supply data to \c{stdin} or to
compare output to the expected result for \c{stdout} and \c{stderr}. Note that
the order of here-document fragments must match the order of redirects, for
example:

\
: select-no-table-error
$* --interactive >>EOO <<EOI 2>>EOE
enter query:
EOO
SELECT * FROM no_such_table
EOI
error: no such table 'no_such_table'
EOE
\

The lines in a here-document are expanded as if they were double-quoted. This
means we can use variables and evaluation contexts but have to escape the
\c{[\"\\$(]} character set.
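
As a small sketch of such an expansion (assuming, purely for illustration,
that the program being tested prints its argument followed by the word
\c{world}):

\
greeting = hello

$* $greeting >>EOO
$greeting world
EOO
\

Here the expected output that \c{stdout} is compared to would be the line
\c{hello world}.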

"