// file : doc/manual.cli // license : MIT; see accompanying LICENSE file "\name=build2-repository-interface-manual" "\subject=repository interface" "\title=Repository Interface" // NOTES // // - Maximum
line is 70 characters. // " \h0#preface|Preface| This document describes \c{brep}, the \c{build2} package repository web interface. For the command line interface of \c{brep} utilities refer to the \l{brep-load(1)}, \l{brep-clean(1)}, \l{brep-migrate(1)}, and \l{brep-monitor(1)} man pages. \h1#submit|Package Submission| The package submission functionality allows uploading of package archives as well as additional, repository-specific information via the HTTP \c{POST} method using the \c{multipart/form-data} content type. The implementation in \c{brep} only handles uploading as well as basic verification (checksum, duplicates) expecting the rest of the submission and publishing logic to be handled by a separate entity according to the repository policy. Such an entity can be notified by \c{brep} about a new submission as an invocation of the \i{handler program} (as part of the HTTP request) and/or via email. It could also be a separate process that monitors the upload data directory. The submission request without any parameters is treated as the submission form request. If \c{submit-form} is configured, then such a form is generated and returned. Otherwise, such a request is treated as an invalid submission (missing parameters). For each submission request \c{brep} performs the following steps. \ol| \li|Verify submission size limit. The submission form-data payload size must not exceed \c{submit-max-size}.| \li|Verify the required \c{archive} and \c{sha256sum} parameters are present. The \c{archive} parameter must be the package archive upload while \c{sha256sum} must be its 64 characters SHA256 checksum calculated in the binary mode.| \li|Verify other parameters are valid manifest name/value pairs. The value can only contain UTF-8 encoded Unicode graphic characters as well as tab (\c{\\t}), carriage return (\c{\\r}), and line feed (\c{\\n}).| \li|Check for a duplicate submission. Each submission is saved as a subdirectory in the \c{submit-data} directory with a 12-character abbreviated checksum as its name.| \li|Save the package archive into a temporary directory and verify its checksum. A temporary subdirectory is created in the \c{submit-temp} directory, the package archive is saved into it using the submitted name, and its checksum is calculated and compared to the submitted checksum.| \li|Save the submission request manifest into the temporary directory. The submission request manifest is saved as \c{request.manifest} into the temporary subdirectory next to the archive.| \li|Make the temporary submission directory permanent. Move/rename the temporary submission subdirectory to \c{submit-data} as an atomic operation using the 12-character abbreviated checksum as its new name. If such a directory already exist, then this is a duplicate submission.| \li|Invoke the submission handler program. If \c{submit-handler} is configured, invoke the handler program passing to it additional arguments specified with \c{submit-handler-argument} (if any) followed by the absolute path to the submission directory. The handler program is expected to write the submission result manifest to \c{stdout} and terminate with the zero exit status. A non-zero exit status is treated as an internal error. The handler program's \c{stderr} is logged. Note that the handler program should report temporary server errors (service overload, network connectivity loss, etc.) via the submission result manifest status values in the [500-599] range (HTTP server error) rather than via a non-zero exit status. The handler program assumes ownership of the submission directory and can move/remove it. If after the handler program terminates the submission directory still exists, then it is handled by \c{brep} depending on the handler process exit status and the submission result manifest status value. If the process has terminated abnormally or with a non-zero exit status or the result manifest status is in the [500-599] range (HTTP server error), then the directory is saved for troubleshooting by appending the \c{.fail} extension followed by a numeric extension to its name (for example, \c{ff5a1a53d318.fail.1}). Otherwise, if the status is in the [400-499] range (HTTP client error), then the directory is removed. If the directory is left in place by the handler or is saved for troubleshooting, then the submission result manifest is saved as \c{result.manifest} into this directory, next to the request manifest and archive. If \c{submit-handler-timeout} is configured and the handler program does not exit in the allotted time, then it is killed and its termination is treated as abnormal. If the handler program is not specified, then the following submission result manifest is implied: \ status: 200 message: package submission is queued reference:\ | \li|Send the submission email. If \c{submit-email} is configured, send an email to this address containing the submission request manifest and the submission result manifest.| \li|Respond to the client. Respond to the client with the submission result manifest and its \c{status} value as the HTTP status code.| | Check violations (max size, duplicate submissions, etc) that are explicitly mentioned above are always reported with the submission result manifest. Other errors (for example, internal server errors) might be reported with unformatted text, including HTML. If the submission request contains the \c{simulate} parameter, then the submission service simulates the specified outcome of the submission process without actually performing any externally visible actions (e.g., publishing the package, notifying the submitter, etc). Note that the package submission email (\c{submit-email}) is not sent for simulated submissions. Pre-defined simulation outcome values are \c{internal-error-text}, \c{internal-error-html}, \c{duplicate-archive}, and \c{success}. The simulation outcome is included into the submission request manifest and the handler program must at least handle \c{success} but may recognize additional outcomes. \h#submit-request-manifest|Submission Request Manifest| The submission request manifest starts with the below values and in that order optionally followed by additional values in the unspecified order corresponding to the custom request parameters. \ archive: sha256sum: timestamp: [simulate]: [client-ip]: [user-agent]: \ The \c{timestamp} value is in the ISO-8601 \c{ - - T : : Z} form (always UTC). Note also that \c{client-ip} can be IPv4 or IPv6. \h#submit-result-manifest|Submission Result Manifest| The submission result manifest starts with the below values and in that order optionally followed by additional values if returned by the handler program. If the submission is successful, then the \c{reference} value must be present and contain a string that can be used to identify this submission (for example, the abbreviated checksum). \ status: message: [reference]: \ \h1#ci|Package CI| The CI functionality allows submission of package CI requests as well as additional, repository-specific information via the HTTP \c{GET} and \c{POST} methods using the \c{application/x-www-form-urlencoded} or \c{multipart/form-data} parameters encoding. The implementation in \c{brep} only handles reception as well as basic parameter verification expecting the rest of the CI logic to be handled by a separate entity according to the repository policy. Such an entity can be notified by \c{brep} about a new CI request as an invocation of the \i{handler program} (as part of the HTTP request) and/or via email. It could also be a separate process that monitors the CI data directory. The CI request without any parameters is treated as the CI form request. If \c{ci-form} is configured, then such a form is generated and returned. Otherwise, such a request is treated as an invalid CI request (missing parameters). For each CI request \c{brep} performs the following steps. \ol| \li|Verify the required \c{repository} and optional \c{package} parameters. The \c{repository} parameter is the remote \c{bpkg} repository location that contains the packages to be tested. If one or more \c{package} parameters are present, then only the specified packages are tested. If no \c{package} parameters are specified, then all the packages present in the repository (but excluding complement repositories) are tested. Each \c{package} parameter can specify either just the package name, in which case all the versions of this package present in the repository will be tested, or both the name and version in the \c{ / } form (for example, \c{libhello/1.2.3}.| \li|Verify the optional \c{overrides} parameter. The overrides parameter, if specified, must be the CI overrides manifest upload.| \li|Verify other parameters are valid manifest name/value pairs. The value can only contain UTF-8 encoded Unicode graphic characters as well as tab (\c{\\t}), carriage return (\c{\\r}), and line feed (\c{\\n}).| \li|Generate CI request id and create request directory. For each CI request a unique id (UUID) is generated and a request subdirectory is created in the \c{ci-data} directory with this id as its name.| \li|Save the CI request manifest into the request directory. The CI request manifest is saved as \c{request.manifest} into the request subdirectory created on the previous step.| \li|Save the CI overrides manifest into the request directory. If the CI overrides manifest is uploaded, then it is saved as \c{overrides.manifest} into the request subdirectory.| \li|Invoke the CI handler program. If \c{ci-handler} is configured, invoke the handler program passing to it additional arguments specified with \c{ci-handler-argument} (if any) followed by the absolute path to the CI request directory. The handler program is expected to write the CI result manifest to \c{stdout} and terminate with the zero exit status. A non-zero exit status is treated as an internal error. The handler program's \c{stderr} is logged. Note that the handler program should report temporary server errors (service overload, network connectivity loss, etc.) via the CI result manifest status values in the [500-599] range (HTTP server error) rather than via a non-zero exit status. The handler program assumes ownership of the CI request directory and can move/remove it. If after the handler program terminates the request directory still exists, then it is handled by \c{brep} depending on the handler process exit status and the CI result manifest status value. If the process has terminated abnormally or with a non-zero exit status or the result manifest status is in the [500-599] range (HTTP server error), then the directory is saved for troubleshooting by appending the \c{.fail} extension to its name. Otherwise, if the status is in the [400-499] range (HTTP client error), then the directory is removed. If the directory is left in place by the handler or is saved for troubleshooting, then the CI result manifest is saved as \c{result.manifest} into this directory, next to the request manifest. If \c{ci-handler-timeout} is configured and the handler program does not exit in the allotted time, then it is killed and its termination is treated as abnormal. If the handler program is not specified, then the following CI result manifest is implied: \ status: 200 message: CI request is queued reference: \ | \li|Send the CI request email. If \c{ci-email} is configured, send an email to this address containing the CI request manifest, the potentially empty CI overrides manifest, and the CI result manifest.| \li|Respond to the client. Respond to the client with the CI result manifest and its \c{status} value as the HTTP status code.| | Check violations that are explicitly mentioned above are always reported with the CI result manifest. Other errors (for example, internal server errors) might be reported with unformatted text, including HTML. If the CI request contains the \c{interactive} parameter, then the CI service provides the execution environment login information for each test and stops them at the specified breakpoint. Pre-defined breakpoint ids are \c{error} and \c{warning}. The breakpoint id is included into the CI request manifest and the CI service must at least handle \c{error} but may recognize additional ids (build phase/command identifiers, etc). If the CI request contains the \c{simulate} parameter, then the CI service simulates the specified outcome of the CI process without actually performing any externally visible actions (e.g., testing the package, publishing the result, etc). Note that the CI request email (\c{ci-email}) is not sent for simulated requests. Pre-defined simulation outcome values are \c{internal-error-text}, \c{internal-error-html}, and \c{success}. The simulation outcome is included into the CI request manifest and the handler program must at least handle \c{success} but may recognize additional outcomes. \h#ci-request-manifest|CI Request Manifest| The CI request manifest starts with the below values and in that order optionally followed by additional values in the unspecified order corresponding to the custom request parameters. \ id: repository: [package]: [/ ] [interactive]: [simulate]: timestamp: [client-ip]: [user-agent]: [service-id]: [service-type]: [service-data]: [service-action]: \ The \c{package} value can be repeated multiple times. The \c{timestamp} value is in the ISO-8601 \c{ - - T : : Z} form (always UTC). Note also that \c{client-ip} can be IPv4 or IPv6. Note that some CI service implementations may serve as backends for third-party services. The latter may initiate CI tasks, providing all the required information via some custom protocol, and expect the CI service to notify it about the progress. In this case the third-party service type as well as optionally the third-party id and custom state data can be communicated to the underlying CI handler program via the respective \c{service-*} manifest values. Also note that normally a third-party service has all the required information (repository URL, etc) available at the time of the CI task initiation, in which case the \c{start} value is specified for the \c{service-action} manifest value. If that's not the case, the CI task is only created at the time of the initiation without calling the CI handler program. In this case the CI handler is called later, when all the required information is asynchronously gathered by the service. In this case the \c{load} value is specified for the \c{service-action} manifest value. \h#ci-overrides-manifest|CI Overrides Manifest| The CI overrides manifest is a package manifest fragment that should be applied to all the packages being tested. The contained values override the whole value groups they belong to, resetting all the group values prior to being applied. Currently, only the following value groups can be overridden: \ build-email build-{warning,error}-email builds build-{include,exclude} *-builds *-build-{include,exclude} *-build-config \ For the package configuration-specific build constraint overrides the corresponding configuration must exist in the package manifest. In contrast, the package configuration override (\cb{*-build-config}) adds a new configuration if it doesn't exist and updates the arguments of the existing configuration otherwise. In the former case, all the potential build constraint overrides for such a newly added configuration must follow the corresponding \cb{*-build-config} override. Note that the build constraints group values (both common and build package configuration-specific) are overridden hierarchically so that the \c{[\b{*-}]\b{build-}{\b{include},\b{exclude}\}} overrides don't affect the respective \c{[\b{*-}]\b{builds}} values. Note also that the common and build package configuration-specific build constraints group value overrides are mutually exclusive. If the common build constraints are overridden, then all the configuration-specific constraints are removed. Otherwise, if any configuration-specific constraints are overridden, then for the remaining configurations the build constraints are reset to \cb{builds:\ none}. See \l{bpkg#manifest-package Package Manifest} for details on these values. \h#ci-result-manifest|CI Result Manifest| The CI result manifest starts with the below values and in that order optionally followed by additional values if returned by the handler program. If the CI request is successful, then the \c{reference} value must be present and contain a string that can be used to identify this request (for example, the CI request id). \ status: message: [reference]: \ \h1#upload|Build Artifacts Upload| The build artifacts upload functionality allows uploading archives of files generated as a byproduct of the package builds. Such archives as well as additional, repository-specific information can optionally be uploaded by the automated build bots via the HTTP \c{POST} method using the \c{multipart/form-data} content type (see the \l{bbot \c{bbot} documentation} for details). The implementation in \c{brep} only handles uploading as well as basic actions and verification (build session resolution, agent authentication, checksum verification) expecting the rest of the upload logic to be handled by a separate entity according to the repository policy. Such an entity can be notified by \c{brep} about a new upload as an invocation of the \i{handler program} (as part of the HTTP request) and/or via email. It could also be a separate process that monitors the upload data directory. For each upload request \c{brep} performs the following steps. \ol| \li|Determine upload type. The upload type must be passed via the \c{upload} parameter in the query component of the request URL.| \li|Verify upload size limit. The upload form-data payload size must not exceed \c{upload-max-size} specific for this upload type.| \li|Verify the required \c{session}, \c{instance}, \c{archive}, and \c{sha256sum} parameters are present. If \c{brep} is configured to perform agent authentication, then verify that the \c{challenge} parameter is also present. See the \l{bbot#arch-result-req Result Request Manifest} for semantics of the \c{session} and \c{challenge} parameters. The \c{archive} parameter must be the build artifacts archive upload while \c{sha256sum} must be its 64 characters SHA256 checksum calculated in the binary mode.| \li|Verify other parameters are valid manifest name/value pairs. The value can only contain UTF-8 encoded Unicode graphic characters as well as tab (\c{\\t}), carriage return (\c{\\r}), and line feed (\c{\\n}).| \li|Resolve the session. Resolve the \c{session} parameter value to the actual package build information.| \li| Authenticate the build bot agent. Use the \c{challenge} parameter value and the resolved package build information to authenticate the agent, if configured to do so.| \li|Generate upload request id and create request directory. For each upload request a unique id (UUID) is generated and a request subdirectory is created in the \c{upload-data} directory with this id as its name.| \li|Save the upload archive into the request directory and verify its checksum. The archive is saved using the submitted name, and its checksum is calculated and compared to the submitted checksum.| \li|Save the upload request manifest into the request directory. The upload request manifest is saved as \c{request.manifest} into the request subdirectory next to the archive.| \li|Invoke the upload handler program. If \c{upload-handler} is configured, invoke the handler program passing to it additional arguments specified with \c{upload-handler-argument} (if any) followed by the absolute path to the upload request directory. The handler program is expected to write the upload result manifest to \c{stdout} and terminate with the zero exit status. A non-zero exit status is treated as an internal error. The handler program's \c{stderr} is logged. Note that the handler program should report temporary server errors (service overload, network connectivity loss, etc.) via the upload result manifest status values in the [500-599] range (HTTP server error) rather than via a non-zero exit status. The handler program assumes ownership of the upload request directory and can move/remove it. If after the handler program terminates the request directory still exists, then it is handled by \c{brep} depending on the handler process exit status and the upload result manifest status value. If the process has terminated abnormally or with a non-zero exit status or the result manifest status is in the [500-599] range (HTTP server error), then the directory is saved for troubleshooting by appending the \c{.fail} extension to its name. Otherwise, if the status is in the [400-499] range (HTTP client error), then the directory is removed. If the directory is left in place by the handler or is saved for troubleshooting, then the upload result manifest is saved as \c{result.manifest} into this directory, next to the request manifest. If \c{upload-handler-timeout} is configured and the handler program does not exit in the allotted time, then it is killed and its termination is treated as abnormal. If the handler program is not specified, then the following upload result manifest is implied: \ status: 200 message: upload is queued reference: \ | \li|Send the upload email. If \c{upload-email} is configured, send an email to this address containing the upload request manifest and the upload result manifest.| \li|Respond to the client. Respond to the client with the upload result manifest and its \c{status} value as the HTTP status code.| | Check violations (max size, etc) that are explicitly mentioned above are always reported with the upload result manifest. Other errors (for example, internal server errors) might be reported with unformatted text, including HTML. \h#upload-request-manifest|Upload Request Manifest| The upload request manifest starts with the below values and in that order optionally followed by additional values in the unspecified order corresponding to the custom request parameters. \ id: session: instance: archive: sha256sum: timestamp: name: version: project: target-config: package-config: target: [tenant]: toolchain-name: toolchain-version: repository-name: machine-name: machine-summary: \ The \c{timestamp} value is in the ISO-8601 \c{ - - T : : Z} form (always UTC). \h#upload-result-manifest|Upload Result Manifest| The upload result manifest starts with the below values and in that order optionally followed by additional values if returned by the handler program. If the upload request is successful, then the \c{reference} value must be present and contain a string that can be used to identify this request (for example, the upload request id). \ status: message: [reference]: \ \h1#package-review|Package Review Submission| \h#package-review-manifest|Package Review Manifest| The package review manifest files are per version/revision and are normally stored on the filesystem along with other package metadata (like ownership information). Under the metadata root directory, a review manifest file has the following path: \ / / /reviews.manifest \ For example: \ hello/libhello/1.2.3+2/reviews.manifest \ Note that review manifests are normally not removed when the corresponding package archive is removed (for example, as a result of a replacement with a revision) because reviews for subsequent versions may refer to review results of previous versions (see below). The package review file is a manifest list with each manifest containing the below values in an unspecified order: \ reviewed-by: result- : pass|fail|unchanged [base-version]: [details-url]: \ For example: \ reviewed-by: John Doe result-build: fail details-url: https://github.com/build2-packaging/hello/issues/1 \ The \c{reviewed-by} value identifies the reviewer. For example, a deployment policy may require a real name and email address when submitting a review. The \c{result- } values specify the review results for various aspects of the package. At least one result value must be present and duplicates for the same aspect name are not allowed. For example, a deployment may define the following aspect names: \c{build} (build system), \c{code} (implementation source code), \c{test} (tests), \c{doc} (documentation). The \c{result- } value must be one of \c{pass} (the review passed), \c{fail} (the review failed), and \c{unchanged} (the aspect in question hasn't changed compared to the previous version, which is identified with the \c{base-version} value; see below). The \c{base-version} value identifies the previous version on which this review is based. The idea here is that when reviewing a new revision, a patch version, or even a minor version, it is often easier to review the difference between the two versions than to review everything from scratch. In such cases, if some aspects haven't changed since the previous version, then their results can be specified as \c{unchanged}. The \c{base-version} value must be present if at least one \c{result- } value is \c{unchanged}. The \c{details-url} value specifies a URL that contains the details of the review (issues identified, etc). It can only be absent if none of the \c{result- } values are \c{fail} (a failed review needs an explanation of why it failed). \h1#github-ci|GitHub CI Integration| This chapter describes the integration of the \l{#ci Package CI} functionality with GitHub. \h#github-ci-background|GitHub CI Background| The GitHub CI model has a number of limitations that are important to understand in order to use the provided integration correctly. To understand the limitations, however, we first need to understand how the integration works, at least at the high level. GitHub supports integration of third-party CI services into the repository workflow by allowing such third-party services to register for events (called \i{web hooks} in the GitHub terminology). \N|This mechanism should not be confused with GitHub Actions, which is a GitHub built-in CI service. As far as we understand, it uses ad hoc integration rather than the same integration mechanism as available to third-party CI services.| While there are many repository workflow events, for CI the only relevant ones are: \ol| \li|\i{Branch push} (BP), which is triggered when a new commit is pushed to a branch in your repository.| \li|\i{Pull request} (PR), which is triggered when a new pull request is created on your repository. It is also triggered when new commits are added to the existing PR.|| \N|Another relevant event is \i{Merge queue}. However, merge queues are not yet supported by this integration.| In response to these events the third-party CI service is expected to start a number of CI jobs (called \i{checks} in the GitHub terminology) and then report their progress and results back to GitHub to be shown to the user, and, in case of PRs, to prevent them from being merged in case the result is unsuccessful. Let's examine in more detail what exactly happens in case of a branch push and a pull request. The branch push (BP) case is pretty straightforward: when you push a new commit to a branch in your repository, this commit is CI'ed by the third-party service and the result is associated with this commit. If you push another commit, the process repeats and you get a new set of CI results associated with the new commit. The important point here is that the CI results for each commit are associated with that commit id (called \i{head sha} in the GitHub terminology). The pull request (PR) case is more complicated: the aim of a PR is to merge one or more commits from one branch (called \i{head branch} in the GitHub terminology) to another branch (called \i{base branch} in the GitHub terminology). If the base branch can be fast-forwarded to the head commit of the head branch, then we can CI this head commit and the result will be representative of the merge. However, if base cannot be fast-forwarded, then a general merge of the two branches must be performed, with potential conflict resolution, etc. And in this case the CI result for the head commit may not necessarily represent the result of the merge. To support the general case (when the base branch cannot be fast-forwarded) GitHub creates a tentative merge commit (called \i{test merge commit} in the GitHub terminology) and expects the CI service to test that commit rather than the head commit. See \l{https://www.kenmuse.com/blog/the-many-shas-of-a-github-pull-request/ The Many SHAs of a GitHub Pull Request} for additional details. While the PR case is more complicated, so far everything makes sense. But that ends once we understand what GitHub associates the CI result with in case of a PR. Since the CI service is expected to test the merge commit, it would make sense to associate the result of this test with the merge commit. Instead, GitHub expects the CI service to report it as associated with the head commit! This strange decision by GitHub, which we will refer to as \"head sharing\", has two serious consequences for trusting CI results when make decisions about merging PRs. Firstly, if the branch push and/or several pull requests share the same head commit, then they will share the CI result, regardless of the state of the PRs' base branches. Or, to put it another way, in the GitHub model there is a single CI result per head commit that is shared by all the BPs and PRs with this head commit. Secondly, if the base branch of a PR moves, the CI result associated with the PR does not get invalidated (because the PR head hasn't changed). Let's consider two representative examples of each case that show how the GitHub behavior can lead us to making wrong decisions. But before we do that, a last bit of terminology: we will distinguish between \i{local PRs}, those with the head branch from the same repository, and \i{remote PRs}, those with the head branch belonging to another user/organization (called \i{forked PR} in the GitHub terminology). The first representative example is a feature branch: we develop a feature in a branch of our repository and once it is ready, we create a local PR to merge it to the \c{master}/\c{main} branch. We typically go through the PR instead of merging our branch directly in order to have the changes reviewed by someone else. In this scenario, the head commit of our feature branch and of the PR we created will be the same, which means our PR will share the CI result with the feature branch push, which is presumably successful. This can lead us to merging the PR based on this result even though the merge commit of the PR may not have the same contents as the head commit of the result. For example, we may have forgotten to rebase our feature branch on the base branch (\c{master}/\c{main} in our example) before creating the PR and the base branch has moved while we developed the feature. Or the review may have taken some time and the base branch likewise has moved in the meantime. In both these cases while the changes to the base branch may not render our head commit unmergeable (for example, due to conflicts), they may render our changes uncompilable or otherwise buggy once merged. The second representative example is a single remote PR: someone creates a PR with a feature or bugfix from their fork of our repository. There is no corresponding branch push for this PR's head commit in our repository so it sounds like there is only one place (the PR) where the CI result of the head commit will be reported in our repository and so the head sharing should not be an issue, right? While it's true that \i{spatial} sharing, that is between BP and/or several PRs, is not an issue in this case, \i{temporal} sharing still is. Specifically, if the base branch moves before we examine the PR, we again may end up merging it based on the CI results that are not representative of the merge commit. Hopefully you see the underlying theme by now: the only way to ensure correctness in the GitHub CI model is to make sure the PR's head and merge commits are the same, which is only the case when the PR base branch can be fast-forwarded to head. Thankfully, GitHub provides a branch protection rule that prevents merging of a PR with the head branch behind base. And enabling of this protection rule is a prerequisite for this CI integration to work correctly. "