Friday, 28 February 2020

Git Pipelines

Git has become the de facto standard for version control, and this has given rise to many vendors hosting Git repositories. Each vendor provides Git functionality such as branching, pull requests and project membership. There is now growing competition between them to provide Continuous Integration / Continuous Delivery (CI/CD) services. One feature that extends version control beyond just hosting source repositories is pipelines. Pipelines are an extensible suite of tools to build, test and deploy source code. Even data hosting sites like Kaggle now support pipelines.
This article provides a brief summary of some pipeline features from three popular Git hosting sites: GitLab, Bitbucket and GitHub.
This article was written in GitLab flavoured Markdown, and rendered to HTML using pandoc. This provides a version controlled project that can be used to show features for each Git repository.
The features being tested are:
  • use of Docker images to provide a consistent build environment
  • customising the environment to install required tools
  • use of stages, to show how to set up the tasks that make up a stage
  • archive of generated artefacts so we can save the rendered HTML document
The work-flow is:
  1. install GNU Make
  2. install pandoc
  3. render HTML from Markdown
  4. archive rendered HTML document for later download
Here are other source repositories that offer pipelines that you may also like to try:
The source code for this project that includes the pipeline code for each Git repository is available at: https://github.com/frankhjung/article-git-pipelines

GitLab

GitLab pipelines are a well integrated tool. CI / CD pipelines are easily accessed from the sidebar:
[Figure: CI/CD on sidebar]
Viewing jobs gives you a pipeline run history:
[Figure: Pipeline job history]
The pipeline is configured in the YAML file .gitlab-ci.yml, where:
  • image - specifies a custom Docker image from Docker Hub
  • variables - define a variable to be used in all jobs
  • stages - declares the stages and the order in which they run
  • before_script - commands to run before all jobs
  • render - name of job associated with a stage. Jobs of the same stage are run in parallel
  • stage - associates a job with a stage
  • script - commands to run for job
  • artifacts - paths to objects to archive; these can be downloaded if the job completes successfully
What this pipeline configuration does is:
  • load an Alpine Docker image that includes pandoc
  • invoke the build stage which
    • initialises by updating the Alpine package index and installing GNU Make
    • runs the render job which makes the given target
    • on successful completion, the target HTML is archived for later download
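The keys and steps described above can be put together as the following minimal sketch. It is not the project's actual file (that is in the linked repository). The image name `pandoc/core`, the entrypoint override and the target name `README.html` are assumptions made for illustration, and the repository is assumed to contain a Makefile with a rule for that target.

```yaml
# Sketch of a .gitlab-ci.yml; image name and TARGET value are assumptions.
image:
  name: pandoc/core:latest   # assumed Alpine-based image that ships pandoc
  entrypoint: [""]           # clear the image entrypoint so job scripts run in a shell

variables:
  TARGET: README.html        # hypothetical Make target, available to all jobs

stages:
  - build                    # a single stage is enough for this project

before_script:
  - apk update               # refresh the Alpine package index
  - apk add make             # install GNU Make

render:
  stage: build               # associate the render job with the build stage
  script:
    - make "$TARGET"         # render the HTML from Markdown
  artifacts:
    paths:
      - $TARGET              # archive the rendered document for later download
```

Any other Alpine image that includes pandoc should work the same way; the entrypoint override is only needed for images whose entrypoint is not a shell.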
GitLab is easy to configure and easy to navigate. There are many other pipeline features including scheduling and configuring jobs by branch. One feature that I have used on Maven / Java projects is caching a local .m2 directory. This speeds up the build as you don’t have a completely new environment for each build, but can leverage previous cached artefacts. GitLab also provides a clear cache button on the pipeline page.
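As an illustration of the Maven caching mentioned above, a job can point Maven at a repository directory inside the project and list that directory under `cache`. This is a sketch under assumptions: the image tag, the per-branch cache key and the `verify` goal are illustrative choices, not taken from any project in this article.

```yaml
# Sketch only: cache a project-local Maven repository between pipeline runs.
variables:
  # GitLab caches paths inside the project directory, so tell Maven to keep
  # its local repository there instead of in the home directory.
  MAVEN_OPTS: "-Dmaven.repo.local=$CI_PROJECT_DIR/.m2/repository"

cache:
  key: "$CI_COMMIT_REF_SLUG"             # one cache per branch (an assumed policy)
  paths:
    - .m2/repository

build:
  image: maven:3.9-eclipse-temurin-17    # assumed image tag
  stage: build
  script:
    - mvn --batch-mode verify            # dependencies are reused from the cache when present
```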
GitLab also provides additional services that can be integrated with your project, for example: JIRA tracking, Kubernetes and Prometheus monitoring.

Bitbucket

The example is publicly available here. The configuration is similar to that from GitLab. The pipeline and settings are easily reached from the sidebar.
[Figure: Pipeline job history]
The pipeline is configured in bitbucket-pipelines.yml. The configuration is similar to GitLab's, but there are important differences.
Here, the pipeline is triggered automatically on commits to the master branch. A Docker image can be defined at the level of the pipeline step. Variables can be defined in, and read from, the Bitbucket settings page, which is useful for secrets that you don’t want exposed in your source code. Internal script variables, however, are set via the script language, which in this example is Bash. Finally, for the build artefacts to be preserved after the pipeline completes, you can publish them to a downloads location. This requires that a secure variable be configured, as described here. If you don’t publish them, they are lost, because the pipeline workspace is purged on completion.
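A configuration matching that description might look like the sketch below. It is not the project's actual file. The image name, the target `README.html` and the variable name `BB_AUTH_STRING` (a secured repository variable assumed to hold `username:app-password`) are assumptions; `BITBUCKET_REPO_OWNER` and `BITBUCKET_REPO_SLUG` are variables that Bitbucket sets for each build.

```yaml
# Sketch of a bitbucket-pipelines.yml; image, target and secret names are assumptions.
pipelines:
  branches:
    master:                               # run on commits to the master branch
      - step:
          name: render
          image: pandoc/core:latest       # assumed Alpine-based image that ships pandoc
          script:
            - apk update && apk add make curl
            - export TARGET=README.html   # script variable, set in the shell
            - make "${TARGET}"
            # Upload the rendered file to the repository's Downloads page,
            # because the workspace is purged when the pipeline completes.
            - >-
              curl --fail -X POST --user "${BB_AUTH_STRING}"
              "https://api.bitbucket.org/2.0/repositories/${BITBUCKET_REPO_OWNER}/${BITBUCKET_REPO_SLUG}/downloads"
              --form files=@"${TARGET}"
```

The secret itself never appears in the file; only its name does, which is why it must be created on the settings page before the first run.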
[Figure: Downloads]
Pipeline build performance is very good: this entire step takes less than a minute to complete.
[Figure: Build]
The free account limits you to 50 minutes per month with 1GB storage.
Having to configure repository settings manually, outside the pipeline file, has some benefits, such as keeping secrets out of the source code. The consequence, though, is that some settings are not recorded in version control. The pipeline therefore has external dependencies beyond what is recorded in its configuration.
Because the Docker image can be set at the step level, your build and test steps can use different images. This is great if you want to trial your application on a production-like image.
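As a sketch of per-step images, the two steps below each name their own image and pass build output from one to the other through `artifacts`. The image tags, the `build/` output directory and the `run-tests.sh` script are hypothetical and stand in for whatever your project uses.

```yaml
# Sketch only: different Docker images for the build and test steps.
pipelines:
  default:
    - step:
        name: build
        image: alpine:3.19             # small image with just the build tools
        script:
          - apk add --no-cache make
          - make                       # assumes the repository has a Makefile
        artifacts:
          - build/**                   # hypothetical output directory, passed to later steps
    - step:
        name: test
        image: ubuntu:22.04            # stand-in for a production-like image
        script:
          - ./run-tests.sh             # hypothetical test script
```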
Although the free account is limited in build time, there are some nice features, and performance is good.

GitHub

When you create a GitHub repository, there is an option to include Azure Pipelines. However, this is not integrated into GitHub directly, but is configured under Azure DevOps. Broadly, the steps to set up a pipeline are:
  • sign up to Azure Pipelines
  • create a project
  • add GitHub repository to project
  • configure pipeline job
[Figure: Azure DevOps Pipelines]
Builds can be run and managed from the Azure DevOps dashboard. There appears to be no way to manually trigger a build from the GitHub repository, though a commit will happily trigger one for you. But, again, you need to be on the Azure DevOps dashboard to monitor the pipeline steps. The interface could really be improved by adding a direct link to the pipelines in the sidebar, as the other hosting sites do.
The pipeline is configured in azure-pipelines.yml and uses an Azure-provided Ubuntu 16.04 image. The choice of images is limited, but they are maintained, with installed packages kept up to date, and many packages come pre-installed.
If the package you need is not installed, then you can install it if available in the Ubuntu package repositories. The default user profile is not root, so installation requires sudo.
[Figure: Azure DevOps Job History]
Finally, to be able to download the generated artefacts you need to invoke the specific PublishBuildArtifacts task, as described here.
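The pieces described for Azure (the hosted Ubuntu image, installing a package with sudo, and publishing the artefact) can be sketched as follows. This is not the project's actual file. The trigger branch, the target `README.html`, the artefact name and the assumption that GNU Make is already present on the hosted image are all illustrative.

```yaml
# Sketch of an azure-pipelines.yml; trigger, target and artefact names are assumptions.
trigger:
  - master                       # assumed branch; builds start on commits to it

pool:
  vmImage: 'ubuntu-16.04'        # image named in the article; hosted image labels change over time

steps:
  - script: sudo apt-get update && sudo apt-get install -y pandoc
    displayName: 'Install pandoc'     # the default user is not root, hence sudo
  - script: make README.html          # hypothetical Make target; assumes make is pre-installed
    displayName: 'Render HTML'
  - task: PublishBuildArtifacts@1     # makes the rendered file downloadable after the run
    inputs:
      PathtoPublish: '$(System.DefaultWorkingDirectory)/README.html'
      ArtifactName: 'html'
```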
[Figure: Azure DevOps Download Artefacts]
Azure is fast as it uses images that Microsoft build and host. The above job to install pandoc and render this page as HTML takes only 1 minute.
I found the biggest negative of Azure Pipelines to be its poor integration with the GitHub dashboard. Instead, you are strongly encouraged to manage pipelines using the Azure DevOps dashboard.

Summary

Git pipelines may not be suitable in all circumstances. There are, however, clear advantages to using a hosted pipeline that ensures that your project builds somewhere other than your laptop. It also removes the cost and effort of provisioning and maintaining your own CI infrastructure, which could be a great benefit to projects where time constraints limit one's ability to prepare an environment. The pipeline configuration also augments your project's documentation for build, test and deployment. It is an independent, executable description of your project that explicitly lists its dependencies.

Acknowledgements