Stephen Port

Parallel Processing in PowerShell: A Game Changer for SharePoint Automation

In the realm of scripting and automation, efficiency and speed are king. PowerShell 7 offers capabilities that are nothing short of a game changer, especially for repetitive tasks over a collection of objects, such as files or document libraries in SharePoint. One of the most exciting is the ability to run the body of a loop in parallel, either with ForEach-Object -Parallel (new in PowerShell 7) or with thread jobs via Start-ThreadJob (which ships with it). This can significantly reduce processing time. Let's dive into how this can transform your SharePoint operations, using a practical example from a SharePoint document library update I worked on with a client last week.


The Challenge

Consider the task of connecting to a SharePoint site and updating document sets across multiple document libraries. Traditionally, processing each document set sequentially can be time-consuming, especially when dealing with a large number of folders, documents and complex operations, such as fetching document sets, their sub-folders, and updating metadata fields.


I should point out that, in this scenario, the customer had a connection with another system that needed to read the metadata of folders as well as files, and document sets don't update the metadata of their sub-folders.


The script begins with connecting to the SharePoint site, a necessary step before any operations can commence.

# Connect to the SharePoint site at the script start
$siteUrl = "https://yourtenant.sharepoint.com/sites/SandPit"
Connect-PnPOnline -Url $siteUrl -Interactive

Following the initial connection, the script defines a script block intended for parallel execution. Each run of the block connects to the site itself and then performs a series of fetch and update operations on one document set and its sub-folders.
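To give a feel for that shape, here is a minimal sketch of a parameterised script block with no SharePoint dependency, so it runs anywhere. The name $demoBlock and its body are placeholders of mine rather than part of the client script; only the three parameter names mirror the real one. Calling the block directly with the call operator (&) is a handy way to check the logic on a single item before going parallel.

```powershell
# Hypothetical stand-in for the real script block: same three parameters,
# but the body only builds a string instead of calling SharePoint.
$demoBlock = {
    param (
        [string]$docSetId,
        [string]$libraryName,
        [string]$siteUrl
    )

    "Would process document set $docSetId in '$libraryName' at $siteUrl"
}

# Run it once, in the current session, to check the logic on a single item.
$result = & $demoBlock '42' 'Salesforce Testing DL' 'https://yourtenant.sharepoint.com/sites/SandPit'
$result
```

Once the block behaves for one item, the same block can be handed to Start-ThreadJob unchanged.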


When I initially created this script, it took far too long to run because the customer's document libraries were quite large.


The Game Changer: Parallel Execution

The real magic happens when we leverage PowerShell 7's ability to run code blocks in parallel. By using Start-ThreadJob, we can execute our script block in parallel for each document set, drastically reducing the total processing time.

# Parallel execution of the script block for each document set
Start-ThreadJob -ScriptBlock $scriptBlock -ArgumentList $docSetId, $libraryName, $siteUrl -ThrottleLimit 5

This part of the code is pivotal. Each document set ID, along with the library name and site URL, is passed to the script block through -ArgumentList, and the values bind to the block's parameters in order. The block then runs as a thread job, with -ThrottleLimit capping how many jobs run at once; any extra jobs wait in a queue until a slot frees up.
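If you want to see the argument passing and throttling without touching SharePoint, something like the following should do. It is only a sketch: the IDs, the 'Demo Library' name and the short sleep are all made up to simulate work.

```powershell
# Simulated work: each job sleeps briefly, then reports what it was given.
# The IDs, library name and delay are placeholders for the demo.
$demoBlock = {
    param ([string]$docSetId, [string]$libraryName)
    Start-Sleep -Milliseconds 200
    "Finished $docSetId in $libraryName"
}

# Queue six jobs; with -ThrottleLimit 2, at most two run at any moment
# and the rest wait in the queue until a slot frees up.
$jobs = foreach ($id in 1..6) {
    Start-ThreadJob -ScriptBlock $demoBlock -ArgumentList $id, 'Demo Library' -ThrottleLimit 2
}

# Block until every job has finished, then collect the output.
$results = $jobs | Wait-Job | Receive-Job
$results

# Tidy up the finished jobs.
$jobs | Remove-Job
```

The values in -ArgumentList land in the param block by position, so their order has to match the order of the parameters.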


Why This Is Revolutionary

The ability to parallelise tasks in PowerShell scripts offers several benefits:

  • Efficiency: By running tasks in parallel, the total time required to process all items is significantly reduced, especially important when dealing with large datasets.

  • Scalability: Scripts can handle more data in less time, making them more scalable and suited for enterprise-level applications.

  • Flexibility: Adjusting the throttle limit allows for fine-tuning the performance impact on systems, ensuring that resources are optimally utilised without overwhelming the system.
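For simpler loops, the same idea can be written with ForEach-Object -Parallel, which also takes a -ThrottleLimit. This is not what I used for the client script, just a rough sketch of the alternative; the library name and item IDs are placeholders.

```powershell
# Requires PowerShell 7 or later. The library name and IDs are placeholders.
$libraryName = 'Demo Library'

$results = 1..6 | ForEach-Object -Parallel {
    # $_ is the current pipeline item. $using: copies in a variable from the
    # calling scope, because each iteration runs in its own runspace.
    $name = $using:libraryName
    Start-Sleep -Milliseconds 200
    "Finished $_ in $name"
} -ThrottleLimit 3

$results
```

Raising or lowering -ThrottleLimit here is the same tuning knob as on Start-ThreadJob: more concurrency means more load on the machine and on the service being called.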


Practical Impacts

In the context of our SharePoint document library example, using parallel processing means that updates to document sets, which might have taken hours if processed sequentially, can now be completed in a fraction of the time. This is particularly beneficial for maintenance tasks performed outside of business hours, reducing downtime and increasing productivity.
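To get a feel for the difference on your own machine, a quick timing comparison along these lines can help. It is a sketch under obvious assumptions: the "work" is just a sleep, and the item count, delay and throttle limit are arbitrary, so real SharePoint calls will behave differently.

```powershell
# Eight pretend document sets, each taking about 300 ms to "update".
$items = 1..8
$work  = { param ($id) Start-Sleep -Milliseconds 300; $id }

# Sequential: one item after another.
$sequential = Measure-Command {
    foreach ($id in $items) { & $work $id | Out-Null }
}

# Parallel: the same work as thread jobs, four at a time.
$parallel = Measure-Command {
    $jobs = foreach ($id in $items) {
        Start-ThreadJob -ScriptBlock $work -ArgumentList $id -ThrottleLimit 4
    }
    $jobs | Wait-Job | Remove-Job
}

"Sequential: {0:N1}s  Parallel: {1:N1}s" -f $sequential.TotalSeconds, $parallel.TotalSeconds
```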


Conclusion

PowerShell 7's parallel processing capabilities represent a significant leap forward in scripting and automation. By allowing tasks to be executed concurrently, scripts become more efficient, scalable, and flexible. For SharePoint administrators and developers, this means that what was once a tedious and time-consuming task can now be completed more quickly and with less effort. The example provided, while specific to SharePoint, illustrates a broader principle that can be applied to various automation tasks across different platforms and services, truly making it a game changer in the world of scripting.


And finally, because sharing is caring, here's how I did it.


# Connect to the SharePoint site at the script start
$siteUrl = "https://yourtenant.sharepoint.com/sites/SandPit"
Connect-PnPOnline -Url $siteUrl -Interactive

# Define a block for parallel execution
$scriptBlock = {
    param (
        [string]$docSetId,
        [string]$libraryName,
        [string]$siteUrl
    )
    
    # Inside the parallel block, each thread must establish its own connection
    Connect-PnPOnline -Url $siteUrl -Interactive

    # Now that we're connected, proceed to fetch the document set and then its sub-folders
    $docSet = Get-PnPListItem -List $libraryName -Id $docSetId
    $metadataField = $docSet["SalesforceId"]
    $subFolders = Get-PnPFolderItem -FolderSiteRelativeUrl "$libraryName/$($docSet.FieldValues.FileLeafRef)" -ItemType Folder -Recursive

    foreach ($folder in $subFolders) {
        $folderRelativeUrl = $folder.ServerRelativeURL
        $folderItem = Get-PnPFolder -Url $folderRelativeUrl -Includes ListItemAllFields
        $folderItemId = $folderItem.ListItemAllFields.Id
        
        $existingSalesforceId = $folderItem.ListItemAllFields["SalesforceId"]
        if (-not $existingSalesforceId) {
            Set-PnPListItem -List $libraryName -Identity $folderItemId -Values @{"SalesforceId" = $metadataField}
            Write-Host "Updated folder $($folder.Name) with SalesforceId: $metadataField"
        }
    }
}

# Retrieve visible document libraries (BaseTemplate 101) with the title 'Salesforce Testing DL'
$DocumentLibraries = Get-PnPList | Where-Object {$_.BaseTemplate -eq 101 -and $_.Hidden -eq $false -and $_.Title -eq 'Salesforce Testing DL'}

foreach ($DocumentLibrary in $DocumentLibraries) {
    $libraryName = $DocumentLibrary.Title
    if ($libraryName -eq "Documents") {
        $libraryName = "Shared Documents"
    }

    # FSObjType = 1 matches folder items, which includes document sets
    $documentSets = Get-PnPListItem -List $libraryName -Query "<View><Query><Where><Eq><FieldRef Name='FSObjType'/><Value Type='Integer'>1</Value></Eq></Where></Query></View>"
    foreach ($docSet in $documentSets) {
        # Pass document set ID, library name, and site URL to the script block
        $docSetId = $docSet.Id
        Start-ThreadJob -ScriptBlock $scriptBlock -ArgumentList $docSetId, $libraryName, $siteUrl -ThrottleLimit 5
    }
}

Write-Host "Processing initiated for all libraries and document sets."

Oh, and lastly, a quick add-on if you want to check in on the status of the jobs and do a bit of a tidy-up!


# Job status checking script

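# Retrieve all jobs in the current session (run this in the same session as the main script,
# because thread jobs only exist inside the process that started them)
$jobs = Get-Job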

# Display a summary of job status
$jobs | ForEach-Object {
    Write-Output "Job ID: $($_.Id), Name: $($_.Name), State: $($_.State)"
}

# Offer the user an option to receive job results for completed jobs
$completedJobs = $jobs | Where-Object { $_.State -eq 'Completed' }
if ($completedJobs) {
    $choice = Read-Host "Do you want to display results for completed jobs? (Y/N)"
    if ($choice -eq 'Y') {
        $completedJobs | ForEach-Object {
            Write-Output "Results for Job ID: $($_.Id)"
            Receive-Job -Id $_.Id
        }
    }
}

# Cleanup option for completed jobs
$cleanupChoice = Read-Host "Do you want to remove completed jobs from the job list? (Y/N)"
if ($cleanupChoice -eq 'Y') {
    $completedJobs | Remove-Job
}
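One thing the status script above doesn't cover is jobs that end up in the Failed state. If you'd rather wait for everything to finish and then deal with failures separately, a non-interactive variation might look like the sketch below. The two jobs are stand-ins I made up (one succeeds, one throws); in practice $jobs would be the jobs started by the main script.

```powershell
# Two stand-in jobs: one succeeds, one throws. Both are made up for the demo.
$jobs = @(
    Start-ThreadJob -ScriptBlock { 'updated folder A' }
    Start-ThreadJob -ScriptBlock { throw 'simulated SharePoint error' }
)

# Wait for everything to finish, whatever the outcome.
$null = $jobs | Wait-Job

$failedJobs    = @($jobs | Where-Object { $_.State -eq 'Failed' })
$completedJobs = @($jobs | Where-Object { $_.State -eq 'Completed' })

# Show output from the jobs that worked, then remove them.
$completedJobs | Receive-Job
$completedJobs | Remove-Job

# Leave failed jobs in the list so they can be inspected with Receive-Job.
foreach ($job in $failedJobs) {
    Write-Warning "Job $($job.Id) ($($job.Name)) failed and was left in the job list."
}
```

Keeping the failed jobs around means you can run Receive-Job against them afterwards to see the error, then re-run just those document sets.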
