How to test for invalid links and images on Confluence pages

Description

This site is a Confluence Cloud site that was migrated from the former Plugin Studio site (Migration from Plugin Studio). Migration and restructuring easily leave behind many broken links and missing images. In addition, as many administrators know, a page that looks fine to an administrator may have serious problems for anonymous users or other users who do not have the same permissions: links to restricted pages, and images attached to restricted pages or spaces, do not resolve for those users on otherwise open pages. The following is a way to test that the site does not show invalid (internal) links or unknown images. An unknown image is rendered as an unknown-attachment placeholder instead of the intended image.
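
As a rough illustration of what the test looks for, the sketch below checks a saved copy of a rendered page for the two markers that the test script later on this page treats as failures: placeholder/unknown-attachment for a missing image and class="unresolved" for a missing link. The helper name find_broken_refs and the idea of saving the rendered page to a file are assumptions for illustration; this is plain grep, not part of GINT or the CLI.

```shell
#!/bin/sh
# Sketch: report which failure markers appear in a rendered page saved as HTML.
# find_broken_refs is a hypothetical helper, not part of GINT or the CLI.
find_broken_refs() {
    file="$1"
    rc=0
    # Missing attachment, like: src="/wiki/plugins/servlet/confluence/placeholder/unknown-attachment
    if grep -q -F 'placeholder/unknown-attachment' "$file"; then
        echo "unknown image"
        rc=1
    fi
    # Missing link, like: <a class="unresolved" href="#">missing-link</a>
    if grep -q -F 'class="unresolved"' "$file"; then
        echo "unresolved link"
        rc=1
    fi
    # Non-zero return means at least one marker was found
    return $rc
}
```

For example, `find_broken_refs page.html || echo "page has problems"` prints the reasons followed by the message when either marker is present, where page.html is a hypothetical file name.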

Steps

  1. Verify that the GINT installation and its dependencies are in place (see GINT Installation and Dependencies).
  2. Verify you have Atlassian Command Line Interface (CLI) or Confluence Command Line Interface (CLI) installed.
  3. Set up a CLI command to run against your site with minimal authority. Anonymous access is best if you have an open site, but Cloud sites need a specific user because Atlassian restricts anonymous remote access.
  4. Run the test. You may want to run your initial test against a small test space. To do so, set the spaces list in the script to a specific space key, as indicated in the script comments.

Schedule to run regularly with Bamboo

Use the gint task from Groovy Tasks for Bamboo to fully automate the test. This site is tested automatically every day, and the build can also be run manually when there are lots of page changes that need to be verified. The automated build takes 30+ minutes running against 2000+ pages. In addition, we automatically upload the error list (unknownErrors.csv) to this site using the Confluence CLI task from Run CLI Actions in Bamboo. The page uses the CSV Macro to show any errors, with a link to each page that has a problem. This gives page authors easy access to the errors.
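
For a quick look at the error list outside of Confluence, the following sketch counts the rows in unknownErrors.csv per space. It assumes the "Space","Title","Reason" layout written by the test script, that space keys contain no commas or quotes, and that each error occupies a single line; count_errors_by_space is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count logged errors per space in unknownErrors.csv.
# count_errors_by_space is a hypothetical helper, not part of GINT or the CLI.
count_errors_by_space() {
    # Skip the header row and blank lines, strip the quotes from the
    # first field (the space key), and tally rows per key.
    awk -F, 'NR > 1 && NF { key = $1; gsub(/"/, "", key); count[key]++ }
             END { for (k in count) print k ": " count[k] }' "$1" | sort
}
```

Only the first field is used, so commas inside page titles do not affect the counts.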

Example CLI

I prefer to customize atlassian.sh (or atlassian.bat on Windows) so that CLI commands can be run against a variety of remote systems or with different logins. For this test:

atlassian.sh customization
elif [ "$application" = "confluence-bobswift-test" ]; then
    string="confluence-cli-5.1.0.jar --server https://bobswift.atlassian.net/wiki --user someuser --password somepassword "

And then use:

CLI by reference
gant -f unknown.gant -Dverbose -Dcli="atlassian confluence-bobswift-test"

Or you can pass the equivalent command directly on the command line:

Direct use
gant -f unknown.gant -Dverbose -Dcli="java -jar .../lib/confluence-cli-5.1.0.jar --server https://bobswift.atlassian.net/wiki --user someuser --password somepassword "
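
To show where the atlassian.sh customization fragment above fits, here is a simplified sketch of an if/elif dispatch on the application name. It is not the script shipped with the CLI: the function name cli_args_for, the first branch with its example.com server, and the error handling are assumptions for illustration. Only the confluence-bobswift-test branch comes from this page.

```shell
#!/bin/sh
# Sketch: map an application name to the CLI jar and connection parameters.
# cli_args_for is a hypothetical helper; the shipped atlassian.sh may be structured differently.
cli_args_for() {
    application="$1"
    if [ "$application" = "confluence" ]; then
        # Hypothetical default entry; replace with your own server and login
        string="confluence-cli-5.1.0.jar --server https://example.com/wiki --user someuser --password somepassword"
    elif [ "$application" = "confluence-bobswift-test" ]; then
        # Entry from this page: the login used for the link and image test
        string="confluence-cli-5.1.0.jar --server https://bobswift.atlassian.net/wiki --user someuser --password somepassword"
    else
        # Unknown names are reported on stderr and signalled with a non-zero return
        echo "unknown application: $application" >&2
        return 1
    fi
    echo "$string"
}
```

Keeping one branch per target system means the test can be pointed at a different site or login by changing only the application name passed in -Dcli.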

Test Script

unknown.gant
/*
 * Test to look for unknown image attachments and unresolved internal page links for all pages in all spaces on a site
 *
 * gant -f unknown.gant -Dverbose -Dcli="atlassian confluence" -Dclean
 */

@GrabResolver(name='atlassian', root='https://maven.atlassian.com/content/groups/public/')
@Grab(group='org.swift.tools', module='gint', version='1.8.2')

import org.swift.tools.*

includeTool << org.swift.tools.Helper             // helper utilities
includeTool << org.swift.tools.GintForConfluence  // gint framework with extensions for Confluence testing

gint.initialize(this) // required

def info = gint.getServerInfoWithVerify()  // verify server is available, otherwise end test

def cmdGen = gint.getCmdGenerator()  // get default command generator - this will handle the cli command parameter

// Get a list of all spaces - produces a csv file
def fileName = gint.getOutputFile('spacelist.csv')
def cmd = cmdGen.call(action: 'getSpaceList', file: fileName)
helper.runCmd(cmd: cmd)

// Read the file into a list of rows, each row is a map to access the column data - space data in this case
def mapList = helper.convertRowListToMapList(helper.csvDataAsListOfRows(fileName: fileName))
//helper.logWithFormat('mapList', mapList)

def spaces = []  // list of spaces
mapList.each { map ->
   spaces.add(map.Key)
} 

// For testing, might just want to set spaces to a specific list
//spaces = ['recipes']  // list of space keys

helper.logWithFormat('spaces', spaces)

errorFile = new File(gint.getOutputFile('unknownErrors.csv')) // simple log of each page with a broken link
errorFile.write('"Space","Title","Reason"\n')
helper.logWithFormat('errorFile', errorFile.getAbsolutePath())

// for each space, get a list of pages and create a testcase for each page
spaces.each { space ->
    fileName = gint.getOutputFile(space + '-pagelist.csv')

    cmd = cmdGen.call(
        action: 'getPageList',
        parameters: [space: space, outputFormat: 2],  // format 2 to get a csv list
        file: fileName,
    )
    helper.runCmd(cmd: cmd)

    // Read the file into a list of rows, each row is a map to access the column data - page data in this case
    mapList = helper.convertRowListToMapList(helper.csvDataAsListOfRows(fileName: fileName))

    mapList.each { map ->
        gint.add(
            name: space + '-' + map.Title.replace(' ', '+'),
            action: 'render',  // render the page
            file: true,  // result goes to the standard file
            parameters: [
                id: map.Id, // use page id to identify the page
                clean: null, // don't need styling
            ],
            output: [
                failData: [
                    'placeholder/unknown-attachment', // missing attachment like: src="/wiki/plugins/servlet/confluence/placeholder/unknown-attachment
                    'class="unresolved"', // missing link like: <a class="unresolved" href="#">missing-link</a>
                ],
            ],
            finalClosure: { testcase ->
                helper.logWithFormat('testcase', testcase)
                if (testcase.success == false) {
                    errorFile << /"${space}",${helper.quoteString(map.Title, '"')},${helper.quoteString(testcase.failReason, '"')}/ << '\n'
                }
            }
        )
    }
}
gint.finalizeTest() // final preparations for running tests