Recovering from Lost Workflow Server in SharePoint 2013

Jan 20, 2016

Recently one of my client’s main app servers in production went belly up. We believe there was some corruption in the VM image. Whatever the reason, it was not a pretty situation to be in, and we had to rebuild the machine. As part of the rebuild it was decided to reinstall the workflow engine without reusing the previous database associated with it. First Problem. So the workflow engine was installed and started from scratch. From testing, everything seemed to be working fine, though granted, testing only involved creating a new workflow in SharePoint Designer to make sure it recognized that the scope and engine were configured.

Here is where we ran into a problem. We had a custom workflow definition that we deployed to the different site collections created for different groups. After the recovery was done we started receiving reports from users that the workflows would throw errors and not start. I did some digging in the ULS logs and found that the workflow engine would throw one of two errors: either a scope not found exception or a workflow not found exception, both with a root error of a 404 coming from the workflow engine (more on this later). So I deactivated the feature that added the workflow and then let the web part that started the workflow handle the activation and setup. The workflow would still fail.

ScopeNotFoundException

Getting Error Message for Exception System.Web.HttpUnhandledException
(0x80004005): Exception of type 'System.Web.HttpUnhandledException' was thrown.
\---\> Microsoft.Workflow.Client.ScopeNotFoundException: Scope
'/SharePoint/default/3db884ad-0b1e-48dd-a168-dfd118ecd838/8826d207-8bd8-4400-91d9-47c5f8033895'
was not found. HTTP headers received from the server - ActivityId:
fba718a5-6589-45ea-adb3-8a00632992f2. NodeId:. Scope:
/SharePoint/default/3db884ad-0b1e-48dd-a168-dfd118ecd838/8826d207-8bd8-4400-91d9-47c5f8033895.
Client ActivityId : fbe1549d-e883-f016-e080-7488597bc10f. ---\>
System.Net.WebException: The remote server returned an error: (404) Not Found.

at Microsoft.Workflow.Common.AsyncResult.End[TAsyncResult](IAsyncResult result)

at Microsoft.Workflow.Client.HttpGetResponseAsyncResult\`1.End(IAsyncResult
result)

at Microsoft.Workflow.Client.ClientHelpers.SendRequest[T](HttpWebRequest
request, T content)

\--- End of inner exception stack trace ---

at Microsoft.Workflow.Client.ClientHelpers.SendRequest[T](HttpWebRequest
request, T content)

at Microsoft.Workflow.Client.WorkflowManager.StartInternal(String workflowName,
WorkflowStartParameters startParameters)

at
Microsoft.SharePoint.WorkflowServices.FabricWorkflowManagementClient.StartInstance(String
serviceGroupName, String workflowName, String monitoringParam, String
activationKey, IDictionary\`2 payload)

at
Microsoft.SharePoint.WorkflowServices.FabricWorkflowInstanceProvider.StartWorkflow(WorkflowSubscription
subscription, IDictionary\`2 payload)

at
Microsoft.SharePoint.WorkflowServices.FabricWorkflowInstanceProvider.StartWorkflowOnListItem(WorkflowSubscription
subscription, Int32 itemId, IDictionary\`2 payload)

WorkflowNotFoundException

Exception occured in scope
Microsoft.SharePoint.WorkflowServices.WorkflowInstanceService.StartWorkflowOnListItem.
Exception=Microsoft.Workflow.Client.WorkflowNotFoundException: Workflow
'c832a00a-7ea9-4746-ab4e-bcda6b7da41e', for scope
'/SharePoint/default/3db884ad-0b1e-48dd-a168-dfd118ecd838/8826d207-8bd8-4400-91d9-47c5f8033895',
was not found. HTTP headers received from the server - ActivityId:
06f0483a-c93e-4657-82a7-fa961a333c58. NodeId:. Scope:
/SharePoint/default/3db884ad-0b1e-48dd-a168-dfd118ecd838/8826d207-8bd8-4400-91d9-47c5f8033895.
Client ActivityId : eae2549d-48f5-f016-e080-7b0159e68245. ---\>
System.Net.WebException: The remote server returned an error: (404) Not Found.

at Microsoft.Workflow.Common.AsyncResult.End[TAsyncResult](IAsyncResult result)

at Microsoft.Workflow.Client.HttpGetResponseAsyncResult\`1.End(IAsyncResult
result)

at Microsoft.Workflow.Client.ClientHelpers.SendRequest[T](HttpWebRequest
request, T content)

\--- End of inner exception stack trace ---

at Microsoft.Workflow.Client.ClientHelpers.SendRequest[T](HttpWebRequest
request, T content)

at Microsoft.Workflow.Client.WorkflowManager.StartInternal(String workflowName,
WorkflowStartParameters startParameters)

at
Microsoft.SharePoint.WorkflowServices.FabricWorkflowManagementClient.StartInstance(String
serviceGroupName, String workflowName, String monitoringParam, String
activationKey, IDictionary\`2 payload)

at
Microsoft.SharePoint.WorkflowServices.FabricWorkflowInstanceProvider.StartWorkflow(WorkflowSubscription
subscription, IDictionary\`2 payload)

at
Microsoft.SharePoint.WorkflowServices.FabricWorkflowInstanceProvider.StartWorkflowOnListItem(WorkflowSubscription
subscription, Int32 itemId, IDictionary\`2 payload)

at
Microsoft.SharePoint.WorkflowServices.WorkflowInstanceServiceServerStub.StartWorkflowOnListItem_MethodProxy(WorkflowInstanceService
target, XmlNodeList xmlargs, ProxyContext proxyContext)

at
Microsoft.SharePoint.WorkflowServices.WorkflowInstanceServiceServerStub.InvokeMethod(Object
target, String methodName, XmlNodeList xmlargs, ProxyContext proxyContext,
Boolean& isVoid)

at Microsoft.SharePoint.Client.ServerStub.InvokeMethodWithMonitoredScope(Object
target, String methodName, XmlNodeList args, ProxyContext proxyContext, Boolean&
isVoid)

Next I decided to try adding the workflow to a test site to check whether things were working. I created a test site and activated the feature, which made the workflow available to be added to the library. Then I added the workflow and tested it on a document, and everything worked fine.

So I decided to add the workflow manually instead of letting the web part handle it. I removed the workflow from the list and deactivated the feature on the site. Next I activated the feature and didn’t receive any errors. I moved on to adding the workflow to the library, and every time I tried I got a new error, ActivityNotFoundException. This error also had an underlying 404 coming from the workflow engine. It turned out the web part had been swallowing this error when adding the workflow to the list. Second Problem. Even with this error the workflow was added to the list, but it would go back to the previous errors when I tried to start it.

ActivityNotFoundException

Error publishing workflow subscription (republish or retry publish) information:
Microsoft.Workflow.Client.ActivityNotFoundException: The activity named
'WorkflowXaml_ffb80edc_19ff_472f_b578_80e875a8a8be' from scope
'/SharePoint/default/3db884ad-0b1e-48dd-a168-dfd118ecd838/8826d207-8bd8-4400-91d9-47c5f8033895'
was not found. HTTP headers received from the server - ActivityId:
365ed683-fd02-4595-b820-03c256c3bb12. NodeId:. Scope:
/SharePoint/default/3db884ad-0b1e-48dd-a168-dfd118ecd838/8826d207-8bd8-4400-91d9-47c5f8033895.
Client ActivityId : 47f4549d-2892-f016-e080-7a48e1db38ab. ---\>
System.Net.WebException: The remote server returned an error: (404) Not Found.

at Microsoft.Workflow.Common.AsyncResult.End[TAsyncResult](IAsyncResult result)

at Microsoft.Workflow.Client.HttpGetResponseAsyncResult\`1.End(IAsyncResult
result)

at Microsoft.Workflow.Client.ClientHelpers.SendRequest[T](HttpWebRequest
request, T content)

\--- End of inner exception stack trace ---

at Microsoft.Workflow.Client.ClientHelpers.SendRequest[T](HttpWebRequest
request, T content)

at
Microsoft.Workflow.Client.WorkflowManagementClient.SendRequest[T](HttpWebRequest
request, T content)

at Microsoft.Workflow.Client.WorkflowManager.Publish(WorkflowDescription
description, Boolean terminateActiveInstances)

at
Microsoft.SharePoint.WorkflowServices.FabricWorkflowManagementClient.PublishWorkflow(String
serviceGroupName, WorkflowDescription workflow)

at
Microsoft.SharePoint.WorkflowServices.WorkflowProxy.CreateDefinitionSubscription(String
subscriptionName, String eventSource, List\`1 eventTypes, String definitionName,
IDictionary\`2 metadata, WorkflowEventContext eventContext)

at
Microsoft.SharePoint.WorkflowServices.FabricSubscriptionService.\<\>c__DisplayClass1.\<CreateDefinitionSubscription\>b__0()

at
Microsoft.SharePoint.WorkflowServices.WorkflowServiceContextExtensions.InvokeWithEcosystemRetry(WorkflowServicesContext
context, EcosystemRequiredMethod method)

at
Microsoft.SharePoint.WorkflowServices.FabricSubscriptionService.CreateDefinitionSubscription(String
subscriptionName, Guid eventSourceId, String eventSource, List\`1 eventNames,
String definitionName, IDictionary\`2 metadata, WorkflowEventContext
eventContext)

at
Microsoft.SharePoint.WorkflowServices.WorkflowSubscriptionStorageEventReceiver.ItemUpdating(SPItemEventProperties
properties)

I started to get frustrated with what was happening and talked it through with a coworker over lunch (thanks, Trevor Seward). Trevor reminded me that when you deploy a workflow, the definition is added to the workflow engine and the workflow engine database. So when our app server died and the engine was rebuilt with a fresh database, this definition was no longer on the workflow engine, hence the 404 errors. Deactivating the feature did not remove the workflow definition either, so the site still thought the workflow engine should have it. The underlying problem was that the site had recorded the definition as already pushed to the workflow engine and so never pushed it again.

So now that we had found the underlying problem, how do we recover from this situation? I went into SharePoint Designer and into the workflows section, and my problem workflow was still there, even after deactivating the feature. This confirmed our theory about the site still thinking the workflow was set up. The workflow was not even editable in Designer, because the definition was lost. So I told Designer to remove the workflow. I then went through the manual steps again: activate the feature, add the workflow to the library, test starting the workflow. After this the workflow started up correctly. Hallelujah!!!
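The check I did in Designer can also be sketched from the SharePoint Management Shell. This is a minimal sketch, assuming it runs on a farm server with the SharePoint snap-in available; the site URL is a hypothetical placeholder. It lists the workflow definitions that SharePoint still has recorded for a web, whether or not the workflow engine actually knows about them.

```powershell
# Sketch only: lists the workflow definitions SharePoint has recorded for one web.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical URL; replace with an affected site.
$web = Get-SPWeb -Identity "http://sharepoint.example.com/sites/group1"
try {
    $manager = New-Object Microsoft.SharePoint.WorkflowServices.WorkflowServicesManager($web)

    # $false = include definitions that are not published as well
    $manager.GetWorkflowDeploymentService().EnumerateDefinitions($false) |
        Select-Object Id, DisplayName, Published
}
finally {
    $web.Dispose()
}
```

If the problem workflow shows up here after the feature has been deactivated, the site is still holding on to a definition the rebuilt engine has never seen.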

Now I had a new problem: how do I go about cleaning up all the site collections that rely on this workflow? The answer: PowerShell. So I created a script that removes the workflow definition from the site so that everything can continue as needed.
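A minimal sketch of that kind of cleanup, not the original script, might look like the following. It assumes it runs on a farm server with the SharePoint snap-in available, and the site URL and workflow display name are hypothetical placeholders. It walks every web in a site collection and deletes any workflow definition whose display name matches.

```powershell
# Sketch only: not the original script. Removes a stale workflow definition
# from every web in one site collection.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical values; replace with your own.
$siteUrl      = "http://sharepoint.example.com/sites/group1"
$workflowName = "My Custom Workflow"

$site = Get-SPSite -Identity $siteUrl
try {
    foreach ($web in $site.AllWebs) {
        try {
            $manager    = New-Object Microsoft.SharePoint.WorkflowServices.WorkflowServicesManager($web)
            $deployment = $manager.GetWorkflowDeploymentService()

            # $false = include definitions that are not published as well
            $stale = @($deployment.EnumerateDefinitions($false) |
                Where-Object { $_.DisplayName -eq $workflowName })

            foreach ($definition in $stale) {
                Write-Host "Removing '$($definition.DisplayName)' from $($web.Url)"
                $deployment.DeleteDefinition($definition.Id)
            }
        }
        finally {
            # Webs obtained from AllWebs must be disposed explicitly.
            $web.Dispose()
        }
    }
}
finally {
    $site.Dispose()
}
```

To cover several site collections, the same body could be wrapped in a loop over a list of URLs. After it runs, reactivate the feature and add the workflow back to the library so the definition gets pushed to the workflow engine again.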


Now keep in mind that this approach completely removes the workflow, so currently running workflows may have to start over. If your workflow engine died, though, you have this problem anyway. For this client we have proposed adding a couple more workflow engines to the farm to prevent this from happening again.