Tuesday, September 25, 2012

SharePoint 2010 Workflow Administration

Workflows represent the future of productivity. The ability to automate is critical to making business processes efficient and consistent, and to creating a maintainable operational business environment. As business processes demand more transparency for auditing and more expediency for productivity, the infrastructure for workflows requires all the trappings of modern IT architecture: operational controls that promote consistency, tooling that promotes error-avoidant behavior, and monitoring that promotes predictive insight and problem anticipation.

The workflow infrastructure in SharePoint 2010 leaves much to be desired from an administration and management perspective. The thought process behind the workflow infrastructure is understandable, but the way it is surfaced, both in the management tooling and in the documentation, is not. Though there are resources that give background on dealing with workflows, the infrastructure for administering the workflow environment is a bit Spartan at best.
This post is a synthesis of content surrounding SharePoint 2010 workflows, the resources for which are cited below. The main purpose is to organize some of the details related to workflow administration into a streamlined and readable form. 

Overview of Workflows in SharePoint 2010

Workflow capabilities are incorporated into SharePoint Foundation 2010 through Windows Workflow Foundation 3.0 (WWF). SharePoint Foundation uses some of the capabilities present in WWF and extends them for SharePoint-specific purposes. This tightly integrates workflow capabilities into SharePoint, and allows access to key workflow services for creating and extending workflows.
Workflows available in SharePoint typically originate from one of three sources:
  • Prepackaged with SharePoint
  • Declarative workflows created using SharePoint Designer
  • Developer workflows created in Visual Studio
Regardless of the source, they all run on the same underlying infrastructure, and undergo the same life cycle. The workflow life cycle consists of:
  • Workflow Deployment and Activation – Depending on the nature of the workflow, it may require deployment to the farm, and if it is not activated, it will require activation at the site collection.
  • Workflow Association – A workflow is connected to a list, library, or content type. General parameters for the use of the workflow are configured.
  • Workflow Initiation – The workflow is started via an event or manually by a participant. The workflow participant can supply information required for workflow execution, as required.
  • Workflow Execution – The workflow performs a series of tasks, interacting with users where necessary, to complete its function. The workflow is persisted to storage when not in active use.
  • Workflow Finalization – All tasks for the workflow are finalized, and the workflow is marked as completed.
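To make the ordering of the life cycle concrete, here is a minimal Python sketch that models the stages as an ordered enumeration. The names (`WorkflowStage`, `can_advance`) are hypothetical and not part of any SharePoint API. The only rule encoded beyond strict ordering is that execution can repeat, because a workflow is persisted and resumed between bursts of activity.

```python
from enum import IntEnum


class WorkflowStage(IntEnum):
    """Stages of the workflow life cycle, in order (illustrative model only)."""
    DEPLOYMENT_AND_ACTIVATION = 1
    ASSOCIATION = 2
    INITIATION = 3
    EXECUTION = 4
    FINALIZATION = 5


def can_advance(current: WorkflowStage, target: WorkflowStage) -> bool:
    """Allow only a move to the next stage, or a repeat of EXECUTION
    (a persisted workflow resuming for another round of work)."""
    if current is WorkflowStage.EXECUTION and target is WorkflowStage.EXECUTION:
        return True
    return target == current + 1


if __name__ == "__main__":
    stages = list(WorkflowStage)
    for a, b in zip(stages, stages[1:]):
        print(f"{a.name} -> {b.name}: {can_advance(a, b)}")
```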

SharePoint 2010 Workflow Subsystem Architecture

SharePoint 2010 workflow architecture touches nearly every aspect of SharePoint. Given below is a brief description of where the workflow system interacts with the elements of the farm hierarchy.
  • Farm
    • Global settings that govern workflow processing (more on this later)
    • Workflow Timer Job – The timer job responsible for kicking off the asynchronous processing of workflow events. By default it runs every 5 minutes.
    • Solution store for the deployment of custom workflows.
  • Server
    • Microsoft SharePoint Foundation Workflow Timer Service (spworkflowtimerV4) – The service responsible for processing workflow items.  Runs on every server by default.
    • W3WP Process – The IIS worker process responsible for running workflows synchronously.
  • Web Application
    • Workflow Auto Cleanup Timer Job – Timer job that cleans up tasks and completed workflows older than 60 days.
    • Workflow Failover – Processes events for workflows that have failed and are marked to be retried.
  • Site collection
    • Workflow feature activation.
    • Workflow dashboard for monitoring workflow status across the site collection.
  • Site
    • Sites, content types, lists, and libraries for the association of workflows.
    • Infrastructure for workflow management - This includes initiation forms, workflow status pages, task and history lists, and reporting on workflow status.
  • Database Server – Content Database
    • Workflow Table – Details workflows that are running.
    • Workflow Association Table – Provides cross reference for workflow associations with content types, lists, and libraries.
    • Scheduled Work Items Table – Work items that are queued for future processing.

Workflow Processing in SharePoint 2010

So how do workflows actually work in SharePoint 2010 once they have been deployed and activated?
The workflow logic is actually pretty simple. A user or other agent kicks off a workflow on a site, list, or library. The workflow is picked up for processing by the W3WP process. If sufficient resources exist to process the workflow, W3WP executes it until the first commit point, after which it is persisted to storage until further processing is required. If resources are not available, the workflow is queued and run later under the workflow timer service, so queued workflows depend on the workflow timer job for their execution. Below is what the user sees when a workflow is queued for execution at a later point in time.
[Figure: Queued Workflow]
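The run-until-commit-point behavior can be illustrated with a small Python model in which each `yield` stands for a commit point. This is an analogy, not how SharePoint is implemented: real workflow instances are serialized and stored in the content database, whereas here the hypothetical `persisted` dictionary simply holds the suspended generator in memory. All names are illustrative.

```python
from typing import Dict, Generator

# Hypothetical stand-in for the content database's workflow table:
# maps an instance id to its suspended ("persisted") workflow.
persisted: Dict[str, Generator[str, None, None]] = {}


def approval_workflow() -> Generator[str, None, None]:
    """A toy two-step workflow; each yield marks a commit point."""
    yield "task created, waiting for approver"
    yield "approved, waiting for publishing step"
    # Falling off the end means the workflow is finalized.


def run_until_commit(instance_id: str, wf: Generator[str, None, None]) -> str:
    """Run the instance until its next commit point, then persist it."""
    try:
        status = next(wf)
    except StopIteration:
        persisted.pop(instance_id, None)  # finalized: nothing left to store
        return "completed"
    persisted[instance_id] = wf  # idle: keep it in "storage" until the next event
    return status


def resume(instance_id: str) -> str:
    """An event (e.g. a task being completed) loads the instance and continues it."""
    return run_until_commit(instance_id, persisted[instance_id])


if __name__ == "__main__":
    print(run_until_commit("wf-1", approval_workflow()))
    print(resume("wf-1"))
    print(resume("wf-1"))
```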
Understanding this processing duality is core to administering and managing workflows. The duality has a number of implications, the most important of which are:
  1. Calibrating servers for workflow processing
  2. Scaling out a farm
For most services, scaling out simply involves turning on the requisite service on the desired server. For example, to run Excel Services on two application servers in a medium-size farm, you start the Excel Calculation Services service on the two target application servers and stop it on the remaining servers.
This is not how it works with SharePoint 2010 workflows. There are a number of settings involved in scaling out a workflow topology.  The settings are available primarily through stsadm (deprecated) and PowerShell. The issue is that there is not a single environment that presents all of the relevant settings for calibration and monitoring.

Working with Workflow Settings

SharePoint 2007 had a number of documented stsadm commands for managing the workflow environment. Though they are deprecated, they work as a baseline for understanding how workflows are processed in SharePoint 2010.
Prior to working through some of the settings related to the workflow infrastructure, I want to share some references that have assisted with understanding this whole process:

As I read through the literature, I came up with the following table, which translates the commands from the stsadm world to PowerShell and gives the best explanation I found for each setting. Note that these settings are global to the farm and affect all servers. Several other settings can also affect performance, but they are beyond the scope of this post. The key settings for managing workflow processing are:

1. Number of workflow events executed at a time
  • stsadm – Workitem-eventdelivery-batchsize
  • PowerShell – Set-SPFarmConfig -WorkflowBatchSize
  • Purpose – Determines the number of events (paging size) delivered to a single instance of a workflow. Default is 100 events.
  • Affects – Workflows executed by both W3WP and the workflow timer service.

2. Number of workflows executed at once (per content database)
  • stsadm – Workflow-eventdelivery-throttle
  • PowerShell – Set-SPFarmConfig -WorkflowPostponeThreshold
  • Purpose – Controls the number of workflows that can execute concurrently against a single content database before additional workflows are postponed (queued). Default is 15 workflows.
  • Affects – Workflows processed by W3WP.

3. Frequency of workflow timer job
  • stsadm – job-workflow
  • PowerShell – Set-SPTimerJob -Identity job-workflow
  • Purpose – Controls the interval between runs of the workflow timer job, which processes queued workflow events. Default is 5 minutes.
  • Affects – Workflows executed by the workflow timer service.
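The three settings can also be captured as a small lookup table. This Python sketch (the keys and the `settings_affecting` helper are hypothetical) transcribes the values listed above and makes one point easy to query: the batch size is the only one of the three that touches both processing paths.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

W3WP = "w3wp"
TIMER = "workflow timer service"


@dataclass(frozen=True)
class WorkflowSetting:
    stsadm: str               # deprecated stsadm property name
    powershell: str           # SharePoint 2010 PowerShell equivalent
    default: str              # default value as listed above
    affects: FrozenSet[str]   # processing paths the setting influences


SETTINGS: Dict[str, WorkflowSetting] = {
    "events_at_a_time": WorkflowSetting(
        "Workitem-eventdelivery-batchsize",
        "Set-SPFarmConfig -WorkflowBatchSize",
        "100 events",
        frozenset({W3WP, TIMER}),
    ),
    "workflows_at_a_time": WorkflowSetting(
        "Workflow-eventdelivery-throttle",
        "Set-SPFarmConfig -WorkflowPostponeThreshold",
        "15 workflows",
        frozenset({W3WP}),
    ),
    "timer_frequency": WorkflowSetting(
        "job-workflow",
        "Set-SPTimerJob -Identity job-workflow",
        "5 minutes",
        frozenset({TIMER}),
    ),
}


def settings_affecting(path: str) -> List[str]:
    """Return the keys of the settings that influence the given processing path."""
    return sorted(key for key, s in SETTINGS.items() if path in s.affects)


if __name__ == "__main__":
    print("W3WP:", settings_affecting(W3WP))
    print("Timer service:", settings_affecting(TIMER))
```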

Administering and Managing a SharePoint 2010 Farm

In light of the settings detailed above it is apparent that there are some serious considerations when planning for workflows in a SharePoint 2010 farm. Out of the box all may work well, until that fateful day when the server is slowing down, or when it is time to dedicate a single server to the task of workflow processing.
The references cited above give a good background to the expected performance of a SharePoint farm, and to the means for managing the settings on the farm. However, the interaction of the three settings above can be a bit confusing.
The most misunderstood portion of the workflow processing infrastructure is the split between the W3WP process and the workflow timer service. They work in concert to provide a real-time experience when necessary, but also to provide a means to throttle workflows when demand increases beyond a given threshold.
Real-time (synchronous) processing occurs in the W3WP process. As a user interacts with a workflow, the workflow typically runs under the W3WP process. However, when (1) the number of workflows exceeds the throttle or (2) the number of event items exceeds the batch size (by default 15 and 100, respectively), the workflow is queued.
Asynchronous processing of the workflow takes over at this point. The workflow is now executed according to the schedule of the workflow timer job, which runs every five minutes by default.
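The queuing rule can be written as a simplified decision function. This is a sketch of the behavior as described in this post, not SharePoint's actual code; `FarmConfig`, `route_workflow`, and the idea of counting running workflows and pending events as plain integers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FarmConfig:
    postpone_threshold: int = 15  # WorkflowPostponeThreshold default
    batch_size: int = 100         # WorkflowBatchSize default


W3WP = "W3WP (synchronous)"
QUEUED = "queued for workflow timer service (asynchronous)"


def route_workflow(running_in_db: int, pending_events: int, cfg: FarmConfig) -> str:
    """Decide where a newly started workflow runs.

    running_in_db:  workflows already executing against the same content database.
    pending_events: event items this workflow needs processed.
    """
    throttle_exceeded = running_in_db + 1 > cfg.postpone_threshold
    batch_exceeded = pending_events > cfg.batch_size
    if throttle_exceeded or batch_exceeded:
        return QUEUED
    return W3WP


if __name__ == "__main__":
    cfg = FarmConfig()
    print(route_workflow(3, 10, cfg))   # well under both limits
    print(route_workflow(15, 10, cfg))  # a 16th concurrent workflow
    print(route_workflow(0, 150, cfg))  # more events than one batch
```

In this model, `FarmConfig(postpone_threshold=1)` lets only the first workflow run in W3WP and queues the rest, which is the lever used when pushing work toward a dedicated workflow server.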

Calibrate the Throttle for the W3WP

If performance counters show that more real time processing is required, then the throttle setting can be boosted to allow for more workflows to run under the W3WP process, and users interacting with the workflow system will have a more ‘real time’ experience.
If performance counters show that real-time processing is saturating the server, and workflow usage tends to spike, then the throttle setting can be reduced somewhat so that more workflows run asynchronously.
The throttle can be set no lower than 1, and the recommendation is not to go above 200. Each workflow runs in its own thread and opens its own connection to the SQL server, so a higher throttle means more concurrent threads in W3WP and more concurrent SQL connections.
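A small check for a proposed throttle value, using the limits quoted above, might look like this. The helper names are hypothetical, and the connection estimate assumes, per the settings table, that the throttle is counted per content database and that each running workflow holds exactly one SQL connection.

```python
from typing import List

MIN_THROTTLE = 1
RECOMMENDED_MAX_THROTTLE = 200


def check_throttle(value: int) -> List[str]:
    """Validate a proposed WorkflowPostponeThreshold and return any warnings."""
    if value < MIN_THROTTLE:
        raise ValueError(f"throttle cannot be lower than {MIN_THROTTLE}")
    warnings: List[str] = []
    if value > RECOMMENDED_MAX_THROTTLE:
        warnings.append(f"above the recommended maximum of {RECOMMENDED_MAX_THROTTLE}")
    return warnings


def max_workflow_sql_connections(throttle: int, content_databases: int) -> int:
    """Rough upper bound on concurrent SQL connections from W3WP-run workflows.

    Assumption: each content database can have up to `throttle` workflows
    running in W3WP, each on its own thread with one connection.
    """
    return throttle * content_databases


if __name__ == "__main__":
    for proposed in (15, 250):
        print(proposed, check_throttle(proposed) or "ok")
    print("upper bound with 4 databases:", max_workflow_sql_connections(15, 4))
```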

Calibrate the Batch Size and Timer for the Workflow Timer Job

Items that are queued are eventually processed, but on a timer job basis. The math is pretty straightforward: by default the batch size is 100 and the timer job interval is five minutes. This means that 100 event items are processed every five minutes.
Note that events are single items in a workflow queue – they represent a single scheduled operation that is part of a workflow. As such, a workflow could generate tens of events in a single call.
So to manage the queued events there are two options: work with the batch size or work with the timer frequency. 
  • Batch Size: Changing the batch size will change the throughput of items queued for the workflow timer service, but remember, it also affects how W3WP processes workflows. If you change it for the workflow timer service, it also changes for the W3WP process, altering the way that workflows are processed synchronously.
  • Timer Job Frequency: Changing the timer job frequency alters how often queued items are processed, and allows for managing the workflow timer job queue without affecting the W3WP process. The timer job cannot be set to an interval shorter than 1 minute.
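The arithmetic behind the two options can be sketched in a few lines of Python. Following the post's own reasoning, it assumes one batch of `batch_size` events per timer run and no new events arriving while the queue drains; the function names are hypothetical.

```python
import math


def events_per_hour(batch_size: int, interval_minutes: int) -> float:
    """Queued events the workflow timer job can process per hour (one batch per run)."""
    if interval_minutes < 1:
        raise ValueError("the workflow timer job cannot run more often than every minute")
    return batch_size * (60 / interval_minutes)


def minutes_to_drain(queued_events: int, batch_size: int, interval_minutes: int) -> int:
    """Worst-case minutes to clear a backlog, assuming a full interval before
    each run, one batch per run, and no new events arriving."""
    if interval_minutes < 1:
        raise ValueError("the workflow timer job cannot run more often than every minute")
    runs = math.ceil(queued_events / batch_size)
    return runs * interval_minutes


if __name__ == "__main__":
    # Defaults, a doubled batch size, and a one-minute timer.
    for batch, interval in [(100, 5), (200, 5), (100, 1)]:
        print(batch, interval, events_per_hour(batch, interval),
              minutes_to_drain(1000, batch, interval))
```

With the defaults, `events_per_hour(100, 5)` is 1200.0. Running the timer every minute raises that to 6000.0 without touching W3WP, while doubling the batch size to 200 gives 2400.0 but also changes how W3WP pages events.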

Scaling Out a Farm

The scenarios posited in the referenced documentation show workflows being processed across front-end web servers. The two primary reference documents show scenarios with one to eight web servers, and depict performance scaling linearly from one to four servers and leveling off between five and eight. The scenarios are good for performance calibration and documentation, but are unrealistic in terms of a farm build-out. A reference architecture for an eight-server farm typically has perhaps four web servers, with the remaining servers distributed as application servers hosting search, BI, and other services. It is in this type of heterogeneous farm that a dedicated workflow server is likely to become necessary. Assessment, analysis of workflow types, and monitoring are the key to figuring out whether a ‘workflow server’ is necessary.
How do you scale out to a dedicated workflow server? By forcing most workflows to run under the workflow timer service. The steps would go something like this:
  • Install the server that will host the workflow timer service. Configure other services as desired.
  • On the other servers in the farm turn off the workflow timer service.

[Figure: Workflow timer service]
  • Set the throttle threshold to its lowest value. This will force most workflows to run under the timer service.
    • Set-SPFarmConfig -WorkflowPostponeThreshold 1
  • Adjust the workflow timer job setting so that it runs frequently, giving queued workflows a closer to real-time experience.
    • Set-SPTimerJob -Identity Job-Workflow -Schedule "Every 1 minutes between 0 and 59"
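To keep the two settings in step when experimenting with different values, the commands above can be generated from parameters. This Python helper is hypothetical; it only formats the same `Set-SPFarmConfig` and `Set-SPTimerJob` commands shown above and enforces the minimums mentioned earlier (a throttle of 1 and a one-minute timer interval).

```python
from typing import List


def farm_commands(postpone_threshold: int, timer_minutes: int) -> List[str]:
    """Format the PowerShell commands for a given throttle and timer interval."""
    if postpone_threshold < 1:
        raise ValueError("WorkflowPostponeThreshold cannot be lower than 1")
    if timer_minutes < 1:
        raise ValueError("the workflow timer job cannot run more often than every minute")
    return [
        f"Set-SPFarmConfig -WorkflowPostponeThreshold {postpone_threshold}",
        f'Set-SPTimerJob -Identity Job-Workflow -Schedule "Every {timer_minutes} minutes between 0 and 59"',
    ]


if __name__ == "__main__":
    # The dedicated-workflow-server configuration described above.
    for command in farm_commands(1, 1):
        print(command)
```

The timer interval also bounds the wait: a workflow that is queued just after the timer job has run waits roughly one full interval before it is picked up, which is why the one-minute schedule pairs naturally with the lowest throttle.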

Note that this is a configuration that will push workflows to the workflow application server, but will also result in some delays for users that are participating in workflow processes. Depending on the nature of the workflow environment and the type of workflows being run, this type of configuration may or may not be acceptable. Monitoring and interaction with the user base are necessary to calibrate and ascertain if the settings are optimal.
Also note that some workflows will run on all front end servers. This cannot be stopped as far as I know. The smallest increment for controlling workflows that run under the W3WP is 1 – that means the W3WP on each front end server can run one workflow at a time, and all others will be queued to the workflow timer service.

Is that all there is to it?

No.
Workflows involve a number of other settings that are integral to the management of the workflow process. Depending on the size and complexity of workflows there are a number of other elements that may require attention.  These include such things as workflow event delivery timeout, failover timer jobs, workflow failover batch size, tasks and history list size management, and more. The key is that the workflow subsystem is an important part of SharePoint 2010 and requires tuning, monitoring, and management.
This post was intended as an overview of the management landscape and an introduction to some of the tooling required to manage the workflow infrastructure. Though I have touched on some elements of this infrastructure, there is much more to be explored. Having provided links above to several resources, I leave the reader to delve into some of the nuances of workflow processing. Enjoy.
