Jenkins Test Pipeline


End goal:

Reduce test time from 65 minutes to 15 minutes.

Problem:

The entire GST product was a monolithic Java Maven project with about 20 modules and thousands of test cases. Running the full test suite took 65+ minutes, which hampered productivity and slowed down deployments.

As part of the core GST backend team (the pod was named Ultron), I picked up the task of reducing the test time.

Current situation:

Test cases were executed on an M4 large EC2 instance with a TeamCity agent to get test coverage reports.

TeamCity was primarily used for:

  • Relative code coverage thresholds based on a base branch.
  • Code coverage reports.
  • Ease of integration.
  • Low cost ($300 for an agent).

Requirements:

  • Parallelise the execution of test cases.
  • Reduce cost (with parallelisation on TeamCity, each additional agent would cost another $300 and require a standalone EC2 server).
  • Support relative code coverage thresholds against different base branches: the coverage of a branch is compared with the coverage of the branch its pull request is raised against, which TeamCity does not support.
  • Integrate with GitHub and the Spinnaker pipeline for deployments.

Implementation:

  • Test cases were distributed across several test suites. In addition, to run the test cases of smaller modules together, modules could be included or excluded using each module's inputFiles.lst (the list of test classes); see the Python script below:
import os
import re
import argparse


def includeExcludeTest(args):
    # Build the "!"-prefixed exclusions and the plain inclusions separately.
    exclude = testSelection(args, False)
    include = testSelection(args, True)

    # Join whichever selections were requested; fail early if there are none,
    # instead of hitting an undefined test_string.
    selections = [selection for selection in (exclude, include) if selection]
    if not selections:
        raise SystemExit("Pass --include and/or --exclude with at least one module")
    test_string = ",".join(selections)

    command = "mvn clean verify -Dtest='" + test_string + "' -DfailIfNoTests=false -Dmaven.wagon.http.pool=false"
    print(command)
    os.system(command)


def testSelection(args, param):
    # param=True selects the --include modules; param=False selects --exclude.
    modules = args.include if param else args.exclude
    prefix = "" if param else "!"

    if modules is None:
        return None

    tests = []
    for module in modules:
        source_root = module + "/src/test/java/in/cleartax/gst/"
        list_path = module + "/target/maven-status/maven-compiler-plugin/testCompile/default-testCompile/inputFiles.lst"
        # inputFiles.lst holds one test source path per line.
        with open(list_path) as input_test_files:
            for each_line in input_test_files:
                match = re.search(re.escape(source_root), each_line)
                if match:
                    # Keep the path relative to the package root.
                    tests.append(prefix + each_line[match.end():].strip())

    return ",".join(tests)


def setup_args(parser):
    parser.add_argument(
        "--include",
        nargs="*",
        type=str
    )
    parser.add_argument(
        "--exclude",
        nargs="*",
        type=str
    )


def main():
    parser = argparse.ArgumentParser()
    setup_args(parser)
    args = parser.parse_args()
    includeExcludeTest(args)


if __name__ == "__main__":
    main()

Usage:

  • python3 maven_test.py --exclude module1 module2
  • python3 maven_test.py --include module1 module2
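
To make the selection logic concrete, here is a minimal sketch of how lines from inputFiles.lst turn into a -Dtest pattern. The helper name build_test_pattern and the sample paths are made up for illustration; only the in/cleartax/gst package root is taken from the script above.

```python
def build_test_pattern(lines, module, exclude=False):
    """Turn inputFiles.lst lines into a comma-separated -Dtest pattern."""
    root = module + "/src/test/java/in/cleartax/gst/"
    prefix = "!" if exclude else ""
    tests = []
    for line in lines:
        idx = line.find(root)
        if idx != -1:
            # Keep the path relative to the package root.
            tests.append(prefix + line[idx + len(root):].strip())
    return ",".join(tests)


# Hypothetical inputFiles.lst content; module and class names are invented.
sample = [
    "/usr/src/app/module1/src/test/java/in/cleartax/gst/returns/ReturnServiceTest.java\n",
    "/usr/src/app/module1/src/test/java/in/cleartax/gst/auth/LoginTest.java\n",
    "/usr/src/app/module2/src/test/java/in/cleartax/gst/misc/OtherTest.java\n",
]

print(build_test_pattern(sample, "module1"))
print(build_test_pattern(sample, "module1", exclude=True))
```

The first call selects only the two module1 tests; the second produces the same entries with a "!" prefix, which is how the exclusions are passed to Maven.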

Just an example of what the pipeline looks like.

  • Creating Jenkins pipeline

Pipeline:

  • Git pull: get the latest changes of the source branch.
  • Check for the run-ci label. This was added to prevent unnecessary test runs: test cases are executed only when the developer adds the run-ci label to the pull request.
  • Build the project and push the image to the Docker registry, so that the project is not rebuilt for each parallel execution, thereby saving time.
  • Run all the test suites (groups of test cases) in parallel; in this case, there were 15 parallel executions.
  • For cost reduction, AWS Spot instances were used.
  • Upload the JaCoCo execution data (.exec) file, which holds the code coverage results, to AWS S3.
  • Consolidate code coverage reports: once all the executions are complete, the individual reports are merged to get the code coverage of all modules.
  • Validate the code coverage threshold: a similar pipeline, triggered every time a pull request is merged, computes the code coverage of the known base branches (stage, pre-production, and production).
  • Disk clean-up: all the files are deleted from the machine before it picks up the next execution.
  • View code coverage and Surefire (passed and failed test cases) results: users can view the code coverage per Jenkins job build, powered by the Jenkins JaCoCo plugin.
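
The run-ci gate boils down to finding the pull request for a commit and checking its labels. Below is a minimal sketch of that check, assuming a response shaped like the one the Jenkinsfile in this post reads (items[0].number and items[0].labels[].name). The function name find_label and the sample payload are invented for illustration.

```python
import json


def find_label(search_response, wanted="run-ci"):
    """Return (pull_number, has_label) for the first item in a search response."""
    items = json.loads(search_response).get("items", [])
    if not items:
        return None, False
    first = items[0]
    names = {label["name"] for label in first.get("labels", [])}
    return first["number"], wanted in names


# Invented payload, trimmed to the fields the pipeline reads.
payload = json.dumps({
    "items": [{"number": 42, "labels": [{"name": "bug"}, {"name": "run-ci"}]}]
})

pull_number, eligible = find_label(payload)
print(pull_number, eligible)
```

Collecting the label names into a set avoids the manual counter loop and makes the "label missing" case a simple boolean.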

Jenkins pipeline example (Jenkinsfile):

timeout(time: 60, unit: 'MINUTES') {
   def commands
   def imageDetails
   def imageName
   def uniqueName
   def buildCommand
   def gitUrl = 'git@github.com:PROJECT.git'
   def gitBranch = env.BRANCH_NAME
   def gitCredentialsId = 'XXXX-XXXX-XXXX'
   def lineCoverageJacoco
   def methodCoverageJacoco
   def classCoverageJacoco
   def branchBuildBumber
   def branchBuildName
   def pullNumber
   def githubUserToken = 'USERNAME:XXXX-XXXX-XXXX'
   def githubHeader = 'application/vnd.github.symmetra-preview+json'

   node(label : 'temp_docker_node') {

       stage('Git pull') {
           git branch: gitBranch,
           credentialsId: gitCredentialsId,
           url: gitUrl

           try {
               COMMIT_SHA = sh (script: 'git log -n 1 | grep -o -E -e "[0-9a-f]{40}"', returnStdout: true).trim()
               echo "Commit ID is ${COMMIT_SHA}"

               def labelResponse = sh(script: 'curl -u ' + githubUserToken + ' -H Accept:' + githubHeader + ' https://api.github.com/search/issues?q=SHA:' + COMMIT_SHA, returnStdout: true)
               def labelJson = readJSON text: labelResponse
               def labelList = labelJson.items[0].labels
               pullNumber = labelJson.items[0].number

               int labelCount = 0
               int labelFound = 0
               def sleepTime = 2

               while(labelCount < labelList.size()) {
                   if (labelList[labelCount].name == 'run-ci') {
                       labelFound = 1
                       echo "This PR is eligible for running parallel tests with label : ${labelList[labelCount].name}"
                   }
                   labelCount = labelCount + 1

                   echo "Sleeping for ${sleepTime} seconds to prevent rate limiting"
                   sleep(time:sleepTime, unit:"SECONDS")
               }
               if (labelFound != 1) {
                   error("LabelMismatchException")
               }
           }
           catch (e) {
               if (e.toString().contains("LabelMismatchException")) {
                   error("Add label [run-ci] to this PR to run parallel tests")
               }
               println e
           }
       }

       def jenkins_config = readJSON file: 'jenkins_config';
       commands = jenkins_config.testsToRun

       imageDetails = jenkins_config.imageDetails
       buildDetails = jenkins_config.buildDetails

       branchBuildBumber = "${currentBuild.id}"
       branchBuildName = "${env.BRANCH_NAME}"

       echo branchBuildBumber
       echo branchBuildName

       uniqueName = "${env.BRANCH_NAME}_${currentBuild.id}".replace("/", "_").replace("#", "_")
       imageName = "${imageDetails.imageName}:" + uniqueName
       buildCommand = "mvn -T 1C clean package -DskipTests=true"
       stage('Build') {
           sh 'rm -rf coverage'
           sh buildCommand
       }
       stage ('Push image') {
           retry (2) {
               docker.withRegistry('https://XXXX.amazonaws.com', 'ecr:REGION:AWS Spot Role') {
               def customImage = docker.build(imageName)
               customImage.push()
               }
           }
       }
   }
   tests = [:]
   int com = 0

   while(com < commands.size() ){
       String testName = "tests_${com}"
       String s3UploadCommand = "aws s3 cp \$WORKSPACE/<PATH>/target/coverage-report/merged.exec s3://<AWS-S3>/" + uniqueName + "/" + testName + ".exec"
       String commandToRun = commands[com] + " && rm -rf  \$WORKSPACE/coverage && cp -r ./  \$WORKSPACE/coverage"

       try {
           tests[testName] = {
               node(label : 'temp_docker_node') {
                       try {
                           stage (testName){
                               docker.withRegistry('https://XXXX.amazonaws.com', 'ecr:REGION:AWS Spot Role') {
                                   docker.image(imageName).inside('-v $HOME/.m2:/home/jenkins/.m2'){
                                   sh commandToRun
                                   }
                               }
                           }
                           try {
                               sh s3UploadCommand
                           }
                           catch (e) {
                               echo "Check for test failures in prior maven test step"
                               sh 'sudo pip3 install awscli --force-reinstall --upgrade'
                               sh s3UploadCommand
                           }
                       }
                       catch (e) {
                            // Log the exception first: error() aborts the step, so nothing after it runs.
                            println e
                            error("There are a few test failures, check the [Tests] tab or maven verify step")
                       }
                       finally {
                           try {
                               junit '**/surefire-reports/*.xml'
                           }
                           catch (e) {
                                // Log the exception first: error() aborts the step, so nothing after it runs.
                                println e
                                error("No test report files were found, check the [Tests] tab or maven verify step")
                           }
                       }
                   }
               }
           }
       catch (e) {
           println e
       }
       com = com + 1
   }

  parallel tests
  node(label : 'temp_docker_node') {
       String s3DownloadCommand = "mkdir -p /usr/src/app/<PATH>/target/suite-reports && aws s3 cp s3://<AWS-S3>/" + uniqueName + "/" + " /usr/src/app/<PATH>/target/suite-reports --recursive"

       stage ('coverage') {
           docker.withRegistry('https://XXXX.amazonaws.com', 'ecr:REGION:AWS Spot Role') {
               docker.image(imageName).inside('-v $HOME/.m2:/home/jenkins/.m2'){
                   sh s3DownloadCommand
                   sh 'cd /usr/src/app/<PROJECT-PATH> && mvn antrun:run@finalTask -P final-report &&  rm -rf  \$WORKSPACE/coverage && cp -r /usr/src/app/  \$WORKSPACE/coverage'
               }
           }
       }

       echo "Running coverage for branch ${env.BRANCH_NAME} with build number ${currentBuild.id}"

       jacoco(
             execPattern: '**/coverage/<PATH>/target/coverage-report/final.exec',
             classPattern: '**/coverage/**',
             sourcePattern: '**/coverage/**',
             inclusionPattern: '**/*.class',
             exclusionPattern: '**/*Test*.class'
       )

       stage('Threshold') {
           String baseRefBranch
           String jenkinsJobName

           try {
               retry (2) {
                   def pullResponse = sh(script: 'curl -u ' + githubUserToken + ' -H Accept:' + githubHeader + ' https://api.github.com/repos/<REPO-NAME>/pulls/' + pullNumber, returnStdout: true)
                   def pullJson = readJSON text: pullResponse
                   baseRefBranch = pullJson.base.ref

                   if (!baseRefBranch.contains("release")) {
                       baseRefBranch = "master"
                   }
               }
           } catch (e) {
               baseRefBranch = "master"
           }

           echo "Created from branch is ${baseRefBranch}"

           if (baseRefBranch == "master") {
               jenkinsJobName = "masterParallelTest"
           }
           else if (baseRefBranch == "production-release") {
               jenkinsJobName = "productionParallelTest"
           }
           else {
               jenkinsJobName = "releaseParallelTest"
           }

           def codeCoverageResponse = sh(script: 'curl http://jenkins.<CUSTOM>.co/job/' + jenkinsJobName + '/lastSuccessfulBuild/jacoco/api/json?pretty=true --user "EMAIL:TOKEN"', returnStdout: true)
           def codeCoverageResponseJson = readJSON text: codeCoverageResponse
           branchBuildName = branchBuildName.replace("/", "%2F").replace("#", "%23")

           def branchResponse = sh(script: 'curl http://jenkins.<CUSTOM>.co/job/ParallelTest/job/' + branchBuildName + '/' + branchBuildBumber + '/jacoco/api/json?pretty=true --user "EMAIL:TOKEN"', returnStdout: true)
           def branchJson = readJSON text: branchResponse

           echo "Line coverage of master : ${codeCoverageResponseJson.lineCoverage.percentageFloat} and current value is ${branchJson.lineCoverage.percentageFloat}"
           echo "Class coverage of master : ${codeCoverageResponseJson.classCoverage.percentageFloat} and current value is ${branchJson.classCoverage.percentageFloat}"
           echo "Method coverage of master : ${codeCoverageResponseJson.methodCoverage.percentageFloat} and current value is ${branchJson.methodCoverage.percentageFloat}"

           float currentLineCoverage = "${branchJson.lineCoverage.percentageFloat}"
           float currentMethodCoverage = "${branchJson.methodCoverage.percentageFloat}"
           float currentClassCoverage = "${branchJson.classCoverage.percentageFloat}"

           float baseBranchLineCoverage = "${codeCoverageResponseJson.lineCoverage.percentageFloat}"
           float baseBranchMethodCoverage = "${codeCoverageResponseJson.methodCoverage.percentageFloat}"
           float baseBranchClassCoverage = "${codeCoverageResponseJson.classCoverage.percentageFloat}"

           baseBranchLineCoverage = baseBranchLineCoverage - 0.3
           baseBranchMethodCoverage = baseBranchMethodCoverage - 0.2
           baseBranchClassCoverage = baseBranchClassCoverage - 0.2

           echo "Line coverage threshold : ${baseBranchLineCoverage} | Class coverage threshold : ${baseBranchClassCoverage} | Method coverage threshold : ${baseBranchMethodCoverage}"

           if (currentLineCoverage < baseBranchLineCoverage) {
               error("Line coverage is low")
           }
           else {
               echo "Line coverage passed"
           }

           if (currentClassCoverage < baseBranchClassCoverage) {
               error("Class coverage is low")
           }
           else {
               echo "Class coverage passed"
           }

           if (currentMethodCoverage < baseBranchMethodCoverage) {
               error("Method coverage is low")
           }
           else {
               echo "Method coverage passed"
           }
       }

       stage('Disk clean up') {
           try {
               echo "Cleaning the disk"
               sh 'docker system prune --volumes -f'
               sh 'docker image prune -a -f'
           }
           catch (e) {
               echo "Clean failed as it is already in progress"
           }
       }
   }
}
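
The threshold stage is essentially a per-metric comparison with a small tolerance. Here is a minimal sketch of the same rule in Python. The tolerances (0.3 for lines, 0.2 for methods and classes) mirror the Jenkinsfile above; the function name coverage_failures and the coverage numbers are made up for illustration.

```python
# Allowed drop (in percentage points) below the base branch, per metric.
# Values mirror the Jenkinsfile: 0.3 for lines, 0.2 for methods and classes.
TOLERANCE = {"line": 0.3, "method": 0.2, "class": 0.2}


def coverage_failures(current, base, tolerance=TOLERANCE):
    """Return the metrics whose coverage fell below base minus tolerance."""
    failures = []
    for metric, allowed_drop in tolerance.items():
        threshold = base[metric] - allowed_drop
        if current[metric] < threshold:
            failures.append(metric)
    return failures


# Made-up coverage percentages for illustration.
base = {"line": 80.0, "method": 75.0, "class": 90.0}
current = {"line": 79.8, "method": 74.5, "class": 90.0}

print(coverage_failures(current, base))
```

With these numbers only the method coverage fails: 74.5 is below the 74.8 threshold, while the line coverage of 79.8 stays above 79.7. Unlike the Jenkinsfile, which stops at the first failing metric, this variant reports every failing metric at once.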

Example of defining the number of parallel executions (the jenkins_config file): testsToRun is the list of commands, one per parallel execution.

{
    "buildDetails": {
        "buildCommand": "mvn -T 1C clean package -DskipTests=true"
    },

    "imageDetails": {
        "imageName": "<IMAGE-NAME>"
    },

    "testsToRun": [
        "cd /usr/src/app &&  mvn verify -Dtest=TestSuite1 -DfailIfNoTests=false -Dmaven.wagon.http.pool=false",
        "cd /usr/src/app &&  mvn verify -Dtest=TestSuite2 -DfailIfNoTests=false -Dmaven.wagon.http.pool=false",
        "cd /usr/src/app &&  mvn verify -Dtest=TestSuite3 -DfailIfNoTests=false -Dmaven.wagon.http.pool=false",
        "cd /usr/src/app &&  mvn verify -Dtest=TestSuite4 -DfailIfNoTests=false -Dmaven.wagon.http.pool=false",
        "cd /usr/src/app &&  mvn verify -Dtest=TestSuite5 -DfailIfNoTests=false -Dmaven.wagon.http.pool=false",
        "cd /usr/src/app &&  mvn verify -Dtest=TestSuite6 -DfailIfNoTests=false -Dmaven.wagon.http.pool=false",
        "cd /usr/src/app &&  mvn verify -Dtest=TestSuite7 -DfailIfNoTests=false -Dmaven.wagon.http.pool=false",
        "cd /usr/src/app &&  mvn verify -Dtest=UnitTestSuite -DfailIfNoTests=false -Dmaven.wagon.http.pool=false",
        "cd /usr/src/app &&  python3 maven_test.py --exclude module1 module2",
        "cd /usr/src/app &&  python3 maven_test.py --include module3 module4"
    ]
}
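
Since the number of parallel executions is simply the length of testsToRun, the config can also be generated rather than written by hand. The following is a sketch under that assumption; make_config and the suite names passed to it are hypothetical, while the keys and Maven flags are the ones used in the config above.

```python
import json

MVN_FLAGS = "-DfailIfNoTests=false -Dmaven.wagon.http.pool=false"


def make_config(suites, image_name, workdir="/usr/src/app"):
    """Build the config dict: one Maven command per suite, run in parallel."""
    commands = [
        f"cd {workdir} && mvn verify -Dtest={suite} {MVN_FLAGS}"
        for suite in suites
    ]
    return {
        "buildDetails": {"buildCommand": "mvn -T 1C clean package -DskipTests=true"},
        "imageDetails": {"imageName": image_name},
        "testsToRun": commands,
    }


config = make_config(["TestSuite1", "TestSuite2", "UnitTestSuite"], "<IMAGE-NAME>")
print(json.dumps(config, indent=2))
print(len(config["testsToRun"]), "parallel executions")
```

Writing the result with json.dumps guarantees a file that the pipeline's readJSON step can parse.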

Ant scripts were used to generate code coverage reports for each execution and merge all the individual code coverage reports stored in AWS S3.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>ARTIFACT</artifactId>
        <groupId>GROUP_ID</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>COVERAGE_MODULE</artifactId>

    <properties>
        <skip.final.report>true</skip.final.report>
        <build.directory.suite-reports>../project-coverage/target/suite-reports</build.directory.suite-reports>
        <!--All modules having test cases should be added here-->
        <!--Directories-->
        <build.directory.module1>../module1/target</build.directory.module1>
        <build.directory.module2>../module2/target</build.directory.module2>
        <build.directory.module3>../module3/target</build.directory.module3>
        <build.directory.module4>../module4/target</build.directory.module4>
        <build.directory.module5>../module5/target</build.directory.module5>
        <build.directory.module6>../module6/target</build.directory.module6>
    </properties>

    <profiles>
        <profile>
            <id>final-report</id>
            <properties>
                <skip.final.report>false</skip.final.report>
            </properties>
        </profile>
    </profiles>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <executions>
                    <!-- Copy the ant tasks jar | Needed for ts.jacoco.report-ant -->
                    <execution>
                        <id>jacoco-dependency-ant</id>
                        <goals>
                            <goal>copy</goal>
                        </goals>
                        <phase>process-test-resources</phase>
                        <inherited>false</inherited>
                        <configuration>
                            <artifactItems>
                                <artifactItem>
                                    <groupId>org.jacoco</groupId>
                                    <artifactId>org.jacoco.ant</artifactId>
                                    <version>0.7.9</version>
                                </artifactItem>
                            </artifactItems>
                            <stripVersion>true</stripVersion>
                            <outputDirectory>${basedir}/target/jacoco-jars</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-antrun-plugin</artifactId>
                <version>1.8</version>
                <executions>
                    <execution>
                        <!--merge jacoco.exec for all modules to merged.exec-->
                        <id>mergeTask</id>
                        <phase>post-integration-test</phase>
                        <goals>
                            <goal>run</goal>
                        </goals>
                        <configuration>
                            <target>
                                <echo message="Merging JaCoCo Reports" />
                                <taskdef name="merge" classname="org.jacoco.ant.MergeTask">
                                    <classpath path="${project.basedir}/target/jacoco-jars/org.jacoco.ant.jar" />
                                </taskdef>
                                <mkdir dir="${project.basedir}/target/coverage-report" />
                                <merge destfile="${project.basedir}/target/coverage-report/merged.exec">
                                    <fileset dir="${build.directory.module1}"><include name="jacoco.exec" /></fileset>
                                    <fileset dir="${build.directory.module2}"><include name="jacoco.exec" /></fileset>
                                    <fileset dir="${build.directory.module3}"><include name="jacoco.exec" /></fileset>
                                    <fileset dir="${build.directory.module4}"><include name="jacoco.exec" /></fileset>
                                    <fileset dir="${build.directory.module5}"><include name="jacoco.exec" /></fileset>
                                    <fileset dir="${build.directory.module6}"><include name="jacoco.exec" /></fileset>
                                </merge>
                            </target>
                        </configuration>
                    </execution>

                    <execution>
                        <id>finalTask</id>
                        <phase>verify</phase>
                        <goals>
                            <goal>run</goal>
                        </goals>
                        <configuration>
                            <skip>${skip.final.report}</skip>
                            <target>
                                <echo message="Generating final executable" />
                                <taskdef name="final" classname="org.jacoco.ant.MergeTask">
                                    <classpath path="${project.basedir}/target/jacoco-jars/org.jacoco.ant.jar" />
                                </taskdef>
                                <final destfile="${project.basedir}/target/coverage-report/final.exec">
                                    <fileset dir="${build.directory.suite-reports}"><include name="*.exec" /></fileset>
                                </final>
                            </target>
                        </configuration>
                    </execution>

                </executions>

                <dependencies>
                    <dependency>
                        <groupId>org.jacoco</groupId>
                        <artifactId>org.jacoco.ant</artifactId>
                        <version>0.7.9</version>
                    </dependency>
                </dependencies>
            </plugin>

        </plugins>
    </build>

</project>

A more detailed explanation of generating code coverage reports for a multi-module Maven project can be found here

The steps above cover the code, but that accounts for only about 40% of the work. A lot more effort went into:

  • Jenkins integration with AWS EC2 auto-scaling (based on disk space and the number of instances in the spot fleet).
  • Ensuring a new instance picks up the same task when an instance is killed abruptly (Spot instances can be reclaimed by AWS at any time, which is why they are cheaper).

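About 2 minutes before a Spot instance is reclaimed, AWS issues an interruption notice. In this setup, the notice shows up as a file at a defined path, signalling that the instance will be killed soon; the health check below looks for that file.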

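import java.io.File;

import com.codahale.metrics.health.HealthCheck; // assumption: Dropwizard Metrics health check
import lombok.extern.slf4j.Slf4j;

// Reports the node as unhealthy once the Spot termination marker file exists.
@Slf4j
public class SpotTerminationHealthCheck extends HealthCheck {

    private static final String TERMINATION_FILE = "/tmp/spot-shutdown-notice";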

    @Override
    protected Result check() {
        if (new File(TERMINATION_FILE).exists()) {
            final String message = String.format("Spot Instance termination notice received: %s is present." +
                            " Marking this node unhealthy",
                    TERMINATION_FILE);

            log.warn(message);

            return Result.unhealthy(message);
        }
        return Result.healthy("Spot Instance still active.");
    }
}
  • Setting up the pipeline and exposing APIs for the code coverage of the base branches used to define the threshold (powered by a Flask application, with AWS S3 as the storage, hosted on a T2 small EC2 instance).
  • Setting up Docker. We eventually started using Docker for deployments as we migrated to Kubernetes, and for an easy environment set-up for new joiners.
  • Fixing the Jenkins JaCoCo plugin.
  • Keeping the parallel test framework generic, so that it could be used with any Maven project.
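
How the /tmp/spot-shutdown-notice marker file gets created is not shown in this post. One plausible approach, sketched below, is a small poller that queries the EC2 instance metadata path for the Spot termination time and writes the marker once a timestamp appears. The function names are hypothetical, and instances that enforce IMDSv2 would also need a session token header, which is omitted here.

```python
import os
import tempfile
import urllib.error
import urllib.request

# EC2 instance metadata path that returns a timestamp once a Spot
# interruption is scheduled (and a 404 before that).
METADATA_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def fetch_termination_time(url=METADATA_URL, timeout=1):
    """Return the termination timestamp, or None if no notice is pending."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode().strip() or None
    except (urllib.error.URLError, OSError):
        return None


def write_notice_if_terminating(fetch, marker_path):
    """Create the marker file when a termination notice is present."""
    termination_time = fetch()
    if termination_time is None:
        return False
    with open(marker_path, "w") as marker:
        marker.write(termination_time + "\n")
    return True


# Demo with fake fetchers so the sketch runs anywhere, not just on EC2.
marker = os.path.join(tempfile.mkdtemp(), "spot-shutdown-notice")
print(write_notice_if_terminating(lambda: None, marker))
print(write_notice_if_terminating(lambda: "2024-01-01T00:00:00Z", marker))
print(os.path.exists(marker))
```

On a real instance, write_notice_if_terminating(fetch_termination_time, "/tmp/spot-shutdown-notice") would be called every few seconds, for example from a cron job or a small loop.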

Conclusion:

Test time was reduced to 14 minutes in the best case, 17 minutes on average, and 22 minutes in the worst case, at a cost of about $20 to $25 per week.

About Adesh Nalpet Adimurthy


Hey 👋 I'm a computer science graduate student, a budding illustrator, GSoC contributor, and previously a backend engineer at Walmart's PhonePe and YC ClearTax, here to share my experiences, tutorials, and projects.
