Welcome to MrBoDean.net on Github pages

Everybody out of the pool, the Application Pool

2017-08-23

Hi, my name is MrBoDean and I need to confess that I am not running a supported version of SCCM. Yes, I am migrating to Current Branch, but the majority of my systems are still on SCCM 2012 R2 without SP1. The reason why is quite boring, and I am sure it is repeated at many companies, but it takes a while to tell. So for the past year it has felt like duct tape and baling wire are all that is keeping the 2012 environment up while we try to upgrade between a string of crises. It is a shame when the Premier Support engineers are on a first-name basis with you.

So Monday night I was called at 2 AM because OSD builds were failing. It happens, and most times a quick review of the log files points you in the right direction. Not so much this time around. The builds were failing to even start; every one was failing with the error "Failed to get client identity with error 0x80072efe". I have 4 management points, and they cannot all be down. They are up and responding when I test them with:

http://servername/sms_mp/.sms_aut?mplist
http://servername/sms_mp/.sms_aut?mpcert
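To check every management point in one pass, the two URLs above can be requested in a loop. This is only a minimal sketch: the function names are hypothetical, the server names in the usage comment are assumptions (only MP4.mrbodean.net appears in the log), and it assumes PowerShell 3.0 or later for Invoke-WebRequest.

```powershell
# Hypothetical helper: build the two MP health-check URLs for one server name.
function Get-MPTestUrl {
    param([Parameter(Mandatory=$true)][string]$ServerName)
    foreach ($query in 'mplist', 'mpcert') {
        "http://$ServerName/sms_mp/.sms_aut?$query"
    }
}

# Hypothetical helper: request each URL and report the HTTP status or the error text.
function Test-ManagementPoint {
    param([Parameter(Mandatory=$true)][string[]]$ServerName)
    foreach ($server in $ServerName) {
        foreach ($url in (Get-MPTestUrl -ServerName $server)) {
            try {
                $response = Invoke-WebRequest -Uri $url -UseBasicParsing -TimeoutSec 15
                $status = [string]$response.StatusCode
            } catch {
                $status = "FAILED: $($_.Exception.Message)"
            }
            New-Object PSObject -Property @{ Server = $server; Url = $url; Status = $status }
        }
    }
}

# Example (server names are assumptions):
# Test-ManagementPoint -ServerName 'MP1.mrbodean.net','MP4.mrbodean.net' | Format-Table Server, Status, Url
```

Keep in mind that a management point can answer these two requests and still reject client traffic, which is exactly what happened here.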

OK, back to the smsts.log for the client that is failing. It starts up fine and even does its initial communication for MPLocation and gets a response: 0 HTTPS and 4 HTTP locations are returned from MP http://MP4.mrbodean.net.

It picks the 1st MP in the list and sends a Client Identity Request. That fails quickly with a timeout error: failed to receive response with winhttp; 80072efe. While it does retry, it only submits the request to one MP. The retry fails with the same error and the build fails before it even starts. Nothing stands out initially and, being a little groggy, I go for the old standby of turning it off and back on again. MP1 was the one getting the timeouts, so it gets the reboot. After the reboot we try again and get the same error. At this point a couple of hours have passed, so the overnight OSD builds are canceled and I grab a quick nap, planning to start again 1st thing in the morning.

Well, that was the plan until the day crew starts trying to do OSD builds and everything everywhere is failing. So I open a critical case with Microsoft. While waiting for the engineer to call, I keep looking at logs, trying to identify what is going on. I RDP into MP1 to check the IIS configuration and notice that the system is slow to launch applications. I take a peek at Task Manager and see that RPC requests are consuming 75% of the available memory. To reset those connections and get the system responsive quickly, down it goes for another reboot. Once it comes back up, I take a chance and try to start an OSD build. This time it works. So the good news goes out to the field techs.

Now I just need to figure out what happened to explain why. Management always needs to know why and what you are doing to keep it from happening again. About this time the Microsoft engineer calls and we lower the case to a normal severity. I capture some logs for him and, to his credit, he quickly finds that MP1 was returning a 503.2 IIS status while the overnight builds were failing. To reduce the risk of this occurring again, we set the connection pool limit to 2000 for the application pool "CCM Server Framework Pool" on the management points.

I get the task of monitoring to make sure the issue does not return and we agree to touch base the next day. Well, I am curious about what led to this and how long it has been going on. Going back through the past couple of days, I see a clear spike in the 503 errors Monday evening, starting with a few thousand and ramping up to over 300,000 by Tuesday morning. While I recommend using Log Parser to analyze the IIS logs, if you are just looking for a count of a single status code you can get it with PowerShell. This will give you the count of the 503 status with a subcode of 2. (Just be sure to update the log file name to the date you are checking.)

(get-content 'C:\inetpub\logs\LogFiles\W3SVC1\u_ex170823.log' | ?{$_ -match " 503 2 "}).count
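The one-liner gives a single total and relies on the status and sub-status columns sitting next to each other. To see when a spike starts, the same log can be bucketed by hour. This is a rough sketch, not a Log Parser replacement: the function name is hypothetical, it assumes the W3C log format with a #Fields header that includes date, time, sc-status and sc-substatus, and the timestamps in these logs are UTC.

```powershell
# Hypothetical helper: count W3C log entries with a given status/sub-status per hour.
function Get-IisStatusCountByHour {
    param(
        [Parameter(Mandatory=$true)][AllowEmptyString()][string[]]$Line,
        [int]$Status = 503,
        [int]$SubStatus = 2
    )
    $iDate = $iTime = $iStatus = $iSub = -1
    $fields = @()
    $counts = @{}
    foreach ($l in $Line) {
        if ($l.StartsWith('#Fields:')) {
            # Column names follow the "#Fields:" directive; look up the ones we need.
            $fields  = $l.Substring(8).Trim() -split ' '
            $iDate   = [array]::IndexOf($fields, 'date')
            $iTime   = [array]::IndexOf($fields, 'time')
            $iStatus = [array]::IndexOf($fields, 'sc-status')
            $iSub    = [array]::IndexOf($fields, 'sc-substatus')
            continue
        }
        # Skip other directives, blank lines, and anything before a usable header.
        if ($l.StartsWith('#') -or $l.Trim() -eq '') { continue }
        if ($iDate -lt 0 -or $iTime -lt 0 -or $iStatus -lt 0 -or $iSub -lt 0) { continue }
        $v = $l -split ' '
        if ($v.Count -lt $fields.Count) { continue }
        if ($v[$iStatus] -eq "$Status" -and $v[$iSub] -eq "$SubStatus") {
            $hour = '{0} {1}:00' -f $v[$iDate], $v[$iTime].Substring(0, 2)
            $counts[$hour] = 1 + $counts[$hour]
        }
    }
    $counts.GetEnumerator() | Sort-Object Name |
        Select-Object @{n='Hour';e={$_.Name}}, @{n='Count';e={$_.Value}}
}

# Example:
# Get-IisStatusCountByHour -Line (Get-Content 'C:\inetpub\logs\LogFiles\W3SVC1\u_ex170823.log')
```

Reading a large log with Get-Content is slow, so for anything beyond a quick look Log Parser is still the better tool.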

While I still have not found out why, at least I know what was causing the timeout error. With that knowledge, I finally get some sleep. Surprised that no one called to wake me up because the issue was occurring again, I manage to get into the office early and start looking at the logs again, and I see another large spike in the 503 errors. I do a quick test to be sure OSD is working, and it is. A quick email to the Microsoft engineer and some more log captures lead to an interesting conversation.

We check to make sure that the clients are using all the management points with this sql query

select LastMPServerName, count(*) from CH_ClientSummary group by LastMPServerName

And we see that the clients are using all the management points, but MP1 and MP4 have about twice as many clients as the other two management points. Next we check the number of web connections both of these servers have with netstat. In a command prompt:

netstat -a -n | find /c "80"

Just in case you try to run this command in PowerShell, you will find that the PowerShell parser will evaluate the quotes and cause the find command to fail. To run the command in PowerShell, escape the quotes:

netstat -a -n | find /c `"80`"

This showed that MP1 and MP4 were maintaining around 2000 connections each. With an app pool connection limit of 2000, this means that any delay in processing requests can quickly lead to the application pool limit being exceeded, and lots of 503 errors will result. So this time the connection pool limit was set to 5000. But a word of caution before you do this in your environment: when a request is waiting in the queue, by default it must complete within 5 minutes or it is kicked out and the request will have to be retried. Be sure that your servers have the CPU and memory resources to handle the additional load that this may cause.
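One caveat on the netstat count: find /c "80" counts every line containing "80", including ports such as 8080, addresses that contain 80, and connections that are not established. For a tighter number, the netstat output can be parsed. The following is only a sketch under the assumption that the management point listens on port 80; the function name is hypothetical.

```powershell
# Hypothetical helper: count ESTABLISHED TCP connections whose local port matches.
function Measure-EstablishedConnection {
    param(
        [Parameter(Mandatory=$true)][AllowEmptyString()][string[]]$NetstatLine,
        [int]$Port = 80
    )
    $count = 0
    foreach ($line in $NetstatLine) {
        # Expected columns: Proto, Local Address, Foreign Address, State
        $parts = $line.Trim() -split '\s+'
        if ($parts.Count -ge 4 -and $parts[0] -eq 'TCP' -and
            $parts[1].EndsWith(":$Port") -and $parts[3] -eq 'ESTABLISHED') {
            $count++
        }
    }
    $count
}

# Example:
# Measure-EstablishedConnection -NetstatLine (netstat -a -n)
```

The number will usually come out lower than the find /c result, so compare like with like when tracking the trend over time.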

In SCCM 2012 R2 pre-SP1 there is no preferred management point. Preferred management points were added in SP1 and improved in Current Branch so that they are preferred by boundary group. In 2012 your 1st management point is the preferred MP until the client location process rotates the MP or the client is unable to communicate with an MP for 40 minutes. In this case MP1 is the initial MP for all OSD builds because it is always 1st in an alphabetically sorted list. MP4 is the default MP for the script used for manual client installs. If my migration to Current Branch were done, I would be able to assign management points to boundary groups and better balance out the load. But until then I am tweaking the connection limit on the application pool to keep things working. Hopefully you are not in the same boat, but if you are, maybe this can help.

TP 1707 Run Scripts Thoughts

2017-08-01

I just finished kicking the tires on the Configuration Manager Technical Preview 1707 Run Scripts feature. This is a post on my personal thoughts about the functionality, and as time moves on and more improvements are made, all these thoughts may (and hopefully should) become irrelevant.

This has great potential but also poses several challenges.

  • Currently the results of the script execution are hard to report on for most users. For example, someone important wants you to run a script and report which systems returned XYZ. To do that you have to query SQL and then parse the results. But if you are not the SCCM admin, you may not have any SQL rights to the database. So building custom reports is in your future until there are some canned ones.
  • Releasing this in a large enterprise is almost a no-go until the ability to apply scopes and RBAC goodness is available. I can just imagine explaining this to compliance officers and auditors. Oh, the meetings we will have.
  • There seem to be plans to support revisions of scripts, but for the moment you have to be perfect or start over. As my wife often reminds me, I am good but not perfect. I need to correct mistakes. As admins we get to update all the other objects if needed; we will need to update scripts too.
  • A kill switch. One bad script deployed to everything could suck up resources or do worse on every system. Good review and use of the approval workflow should help but you know someone will do it.

Configuration Manager TP 1707 - Run Scripts

2017-08-01

I want to talk a bit about the new Run Script feature that was added in 1706. In Technical Preview 1707 it gained the option to add parameters to a script. This has the potential to hugely benefit many users of Config Manager and is a great example of SaaS quickly delivering functionality.

Creating a script is very straightforward. For this example it is just a query of WMI for Win32_ComputerSystem.

After the script is created, you must approve it. (There is a hierarchy setting to allow\stop authors to approve their own scripts. This should only be allowed in a test environment.) After the script has been approved, it can be run. To run a script, go to a collection with the systems you would like to target. You can run the script against the collection as a whole or against individual systems in the collection. (You must show the collection membership to target individual systems. The Run Script option is not available via the default device view.)

Next select the script to run

To view the results of the script execution you will need to use Script Status in the Monitoring view.

Any output from the script is stored in Script Output. For a good peek at what is going on behind the scenes, check out this great write-up from the 1706 TP by Tom Degreef.

Now for the new stuff. Parameters!! Create a new script using the same simple wmi query with a parameter.

If you click next you will be able to set the default value for the parameter.

BUG… errr feature alert… If you click next or back without editing the parameter value the edit button is no longer present.

Not to worry you will be able to edit the parameter at run time.

When you run a script with a parameter you get a new dialog that allows you to edit the parameter values.

If you were not able to, or chose not to, set the value when creating the script, click on the parameter name and click edit. Be sure the parameter name is highlighted or the edit button will not do anything. I spent a bit thinking how silly it was to not be able to edit a parameter more than once. Rechecking my steps proved that was not the case.

Set the parameter value and let the script run.

Hopefully this will get you started with running scripts with parameters.

Cleaning Up WSUS based on what you are not deploying in Configuration Manager

2017-07-01

Let me start with this statement: I wish I had something other than WSUS stuff to talk about. It has been another long week with more issues related to patching. Even with all the other tips I have shared, we experienced major issues getting patches applied. In case you are not aware, the Windows Update Agent can have a memory allocation error. The good news is that if you keep your systems patched, there is a hotfix to address the issue on most systems. The bad news is that the patch for the issue was not made available for the Standard editions of Windows Server 2008 and 2012. If you have these operating systems installed and they are 64-bit versions with plenty of memory, you may not see the issue, or it may just be transitory and clear up on the next update scan. I am not that lucky and have lots of Windows 2012 Standard servers with 2GB of memory.

The strange part of this is that it seemed like some systems would complete a scan and report a success, only to then report corruption of the Windows Update data store. This would cause the next update scan to be a full scan that rebuilds the local data store, and the cycle of issues would start again. The fun part is that when this is occurring, if you deploy patches via Configuration Manager, the client will fail to identify any patches to apply and report that it is compliant for the updates in the deployment. The next successful software update scan would then find the patches missing and the system would return to a non-compliant state. (This is justification for external verification of patch installs, whatever product you use to install patches. But that is a story for another day.) So back to the post from Microsoft on the issue: basically, if you cannot apply the hotfix you have 2 options.

  1. Move wuauserv (Windows Update Agent) to its own process. (But on systems with less than 4GB of memory this will not gain you much, and it can be counterproductive and impact applications running on the server.)
  2. Clean up WSUS.

For my issue, adding memory to the clients was recommended and the Server team is to make the change. But one of the joys of working in a large enterprise is that this will take a while (not weeks, months at least). In the interim, I need to do everything possible to decline updates in WSUS to reduce the catalog size. At the start of these steps I had ~6200 un-declined updates in WSUS. The guidance I got from Microsoft was to target between 4000 and 5000 updates in the catalog, but the lower the number, the better off we would be.

Step one: review the products and categories that we sync. This was easy because we already review this routinely. There was not much to change, but I did trim a few and could decline 100 or so updates. Not much, but everything helps.

Step two: review the superseded updates. Due to earlier patching issues, our patching team had requested that we keep superseded updates for 60 days. Now, this was before the updates had moved to the cumulative model, and at this point ensuring the current security patches were applying was critical. (Thank you, WannaCry and NotPetya.) So I checked to see which updates had been superseded for 30 days. I found ~1300; checking for less than 30 days only found 1 more. Big win there, so after declining those the WSUS catalog was down to ~4700 updates. That got us under the upper limit of the suggested target. After triggering scans on the systems having issues and reviewing the status, it did help, but not enough to call it a significant improvement.

Step three: break out the coffee and dig in. Wouldn't it be great to see which patches have not been declined and are not deployed in Configuration Manager? It is easy enough to see what is not deployed in the SCCM console, but you have to look up each update in WSUS to see if it has been declined. At this point I am on the hook to stay up, monitor the patch installs and help the patching team, and there are a couple of hours to kill between the end of the office day and when the bulk of our patch installs occur. So I started poking around to see what I could do to automate the comparison between Configuration Manager and WSUS. Our good friend PowerShell to the rescue. First thing is to get the patches from SCCM.

$SCCMServer = "YourServerHere"
$SiteCode = "YourSiteCodeHere"
Get-WmiObject -Namespace "root\SMS\site_$SiteCode" -class SMS_SoftwareUpdate -ComputerName $SCCMServer | select -First 1

This connects to your server and gets all the patches listed in the console and selects the first one so you can take a look at all the properties. I am excluding a few with identifying information but you will see something similar.

ArticleID                          : 949189
BulletinID                         :
CategoryInstance_UniqueIDs         : {UpdateClassification:cd5ffd1e-e932-4e3a-bf74-18bf0b1bbd83, Product:ba0ae9cc-5f01-
                                     40b4-ac3f-50192b5d6aaf, Locale:0}
CI_ID                              : 16783644
CI_UniqueID                        : 2e6d75eb-f486-4e40-909d-615e43de537f
CIType_ID                          : 8
CIVersion                          : 101
CreatedBy                          :
CustomSeverity                     : 0
CustomSeverityName                 :
DateCreated                        : 20140104043952.000000+000
DateLastModified                   : 20170629045030.980000+000
DatePosted                         : 20080408170000.000000+000
DateRevised                        : 20080408170000.000000+000
EffectiveDate                      :
EULAAccepted                       : 2
EULAExists                         : False
EULASignoffDate                    :
EULASignoffUser                    :
ExecutionContext                   : 0
IsBundle                           : True
IsContentProvisioned               : False
IsDeployable                       : False
IsDeployed                         : False
IsDigest                           : True
IsEnabled                          : True
IsExpired                          : True
IsHidden                           : False
IsLatest                           : True
IsMetadataOnlyUpdate               : False
IsOfflineServiceable               : True
IsQuarantined                      : False
IsSuperseded                       : False
IsUserDefined                      : False
IsVersionCompatible                :
LastModifiedBy                     :
LastStatusTime                     : 20170630172253.503000+000
LocalizedCategoryInstanceNames     : {Updates, Windows Server 2008, }
LocalizedDescription               : Install this update to resolve an issue where a replica domain controller may sile
                                     ntly fail to receive updates to some object attributes during the inbound replicat
                                     ion, when the replica domain controller is running Windows Server 2008 with the Ja
                                     panese Language locale installed. After you install this item, you may have to res
                                     tart your computer.
LocalizedDisplayName               : Update for Windows Server 2008 x64 Edition (KB949189)
LocalizedEulas                     :
LocalizedInformation               :
LocalizedInformativeURL            : http://go.microsoft.com/fwlink/?LinkId=111973
LocalizedPropertyLocaleID          : 9
MaxExecutionTime                   : 600
ModelID                            : 16783644
ModelName                          : Site_56FE87C5-F355-45F9-BE43-BE8D575809F4/SUM_2e6d75eb-f486-4e40-909d-615e43de537f
NumMissing                         : 0
NumNotApplicable                   : 59326
NumPresent                         : 0
NumTotal                           : 59568
NumUnknown                         : 242
PercentCompliant                   : 100
PermittedUses                      : 0
PlatformCategoryInstance_UniqueIDs : {}
PlatformType                       : 1
RequiresExclusiveHandling          : False
RevisionNumber                     : 101
SDMPackageLocalizedData            :
SDMPackageVersion                  : 101
SDMPackageXML                      :
SecuredScopeNames                  : {}
SedoObjectVersion                  :
Severity                           : 0
SeverityName                       :
Size                               :

Looks great, and there are lots of things to use to select patches to check on. However, if you use a query or filter you will find that a lot of those properties are lazy properties. If you pull all the properties for the 1000s of patches, the script will run a looooong time. However, if you do a select on the object you will get the values reported from the query, and you can select what you want using a where-object in PowerShell. I decided that the following properties would allow me to evaluate the patches: LocalizedDisplayName, CI_UniqueID, IsDeployed, NumMissing

$SCCMServer = "YourServerHere"
$SiteCode = "YourSiteCodeHere"
$cmpatches = Get-WmiObject -Namespace "root\SMS\site_$SiteCode" -class SMS_SoftwareUpdate -ComputerName $SCCMServer | select LocalizedDisplayName, CI_UniqueID, IsDeployed, NumMissing

Now to get patches that are not deployed and are not required

$cmpatches | ?{($_.IsDeployed -eq $false) -and ($_.NumMissing -eq 0)} | out-gridview

And patches that are not deployed and are required

$cmpatches | ?{($_.IsDeployed -eq $false) -and ($_.NumMissing -ne 0)} | out-gridview
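Before settling on criteria, it can help to see how many updates fall into each bucket rather than scrolling through two grid views. A small sketch, assuming objects with the IsDeployed and NumMissing properties selected above; the function name and bucket labels are made up for illustration.

```powershell
# Hypothetical helper: count updates per deployed/required bucket.
function Get-PatchBucket {
    param([Parameter(Mandatory=$true)][object[]]$Patch)
    $Patch | Group-Object -Property {
        if ($_.IsDeployed)           { 'Deployed' }
        elseif ($_.NumMissing -eq 0) { 'Not deployed, not required' }
        else                         { 'Not deployed, required' }
    } | Select-Object Name, Count
}

# Example, using the list collected above:
# Get-PatchBucket -Patch $cmpatches | Format-Table -AutoSize
```

The "Not deployed, required" bucket deserves a manual look before anything is declined, since those are updates that systems are reporting as missing.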

Using this information you can determine the criteria for selecting the patches to decline. I settled on patches that are not required, not deployed, and have been available for more than 30 days. You can download the script from https://gallery.technet.microsoft.com/Decline-Update-in-WSUS-e934565f

Function Unpublish-UnUsedCMPatches{
    [CmdletBinding()]
    Param(
        # WSUS Server Name
        [Parameter(Mandatory=$true,
                   ValueFromPipelineByPropertyName=$true)]
        [string]$WsusServer,
        # Use SSL for the WSUS connection. Default value is $False
        [Parameter()]
        [bool]$UseSSL = $False,
        # Port to use for WSUS connection. Default value is 8530
        [Parameter()]
        [int]$PortNumber = 8530,
        # Configuration Manager Site Server with SMS Provider role
        [Parameter(Mandatory=$true,
                   ValueFromPipelineByPropertyName=$true)]
        [string]$SCCMServer,
        # Configuration Manager Site Code
        [Parameter(Mandatory=$true,
                   ValueFromPipelineByPropertyName=$true)]
        [string]$SiteCode,
        # Decline Updates. Defaults to False for safety
        [Parameter()]
        [switch]$Decline
    )
    $outPath = Split-Path $script:MyInvocation.MyCommand.Path
    $outDeclineList = Join-Path $outPath "DeclinedUpdates.csv"
    $outDeclineListBackup = Join-Path $outPath "DeclinedUpdatesBackup.csv"
    if(Test-Path -Path $outDeclineList){Copy-Item -Path $outDeclineList -Destination $outDeclineListBackup -Force}
    "UpdateID, RevisionNumber, Title, KBArticle, SecurityBulletin, LastLevel" | Out-File $outDeclineList
    $cmpatchlist = Get-WmiObject -Namespace "root\SMS\site_$SiteCode" -class SMS_SoftwareUpdate -ComputerName $SCCMServer | Select-Object LocalizedDisplayName, CI_UniqueID, IsDeployed, NumMissing
    $cmpatchlistcount = $cmpatchlist.Count
    If($cmpatchlistcount -eq 0){ 
        Throw "Error Collecting patches from $SCCMServer"
        return
    }
    "Found $cmpatchlistcount updates on $SCCMServer"
    $cmpatchlist = $cmpatchlist | ?{($_.IsDeployed -eq $false) -and ($_.NumMissing -eq 0)}
    $cmpatchlistcount = $cmpatchlist.Count
    "Found $cmpatchlistcount updates on $SCCMServer that are not deployed and not required. These will be evaluated to determine if they are older then 30 days and not declined. "
    #Connect to the WSUS 3.0 interface.
    [reflection.assembly]::LoadWithPartialName("Microsoft.UpdateServices.Administration") | out-null
    If($? -eq $False)
    {       Write-Warning "Something went wrong connecting to the WSUS interface on $WsusServer server: $($Error[0].Exception.Message)"
            $ErrorActionPreference = $script:CurrentErrorActionPreference
            Return
    }
    $WsusServerAdminProxy = [Microsoft.UpdateServices.Administration.AdminProxy]::GetUpdateServer($WsusServer,$UseSSL,$PortNumber);
    if($i){Remove-Variable i}
    If($updatesDeclined){Remove-Variable updatesDeclined}
    $updatesDeclined =0
    Foreach($item in $cmpatchlist){
        Remove-Variable patch -ErrorAction silentlycontinue
        $i++
        $percentComplete = "{0:N2}" -f (($i/$cmpatchlistcount) * 100)
        Write-Progress -Activity "Decline Unused Updates" -Status "Checking if update #$i/$cmpatchlistcount - $($item.LocalizedDisplayName)" -PercentComplete $percentComplete -CurrentOperation "$($percentComplete)% complete"
        Try{
            $patch = $WsusServerAdminProxy.getUpdate([guid]$item.CI_UniqueID)
            if(($patch.IsDeclined -eq $false) -and ( ($patch.CreationDate) -lt (get-date).AddDays(-30) ) ){
                Write-Progress -Activity "Decline Unused Updates" -Status "Declining update #$i/$cmpatchlistcount - $($item.LocalizedDisplayName)" -PercentComplete $percentComplete -CurrentOperation "$($percentComplete)% complete"
                "$($patch.Id.UpdateId.Guid), $($patch.Id.RevisionNumber), $($patch.Title), $($patch.KnowledgeBaseArticles), $($patch.SecurityBulletins), $($patch.HasSupersededUpdates)" | Out-File $outDeclineList -Append
                If($Decline){$patch.Decline()}
                $updatesDeclined++
            }else{
                Write-Progress -Activity "Decline Unused Updates" -Status "Update #$i/$cmpatchlistcount - $($item.LocalizedDisplayName) is already declined or was received within the last 30 days" -PercentComplete $percentComplete -CurrentOperation "$($percentComplete)% complete"
            }
        }catch{
        #"$($item.LocalizedDisplayName) was not found in WSUS"
        Write-Progress -Activity "Decline Unused Updates" -Status "Update #$i/$cmpatchlistcount - $($item.LocalizedDisplayName) was not found in WSUS" -PercentComplete $percentComplete -CurrentOperation "$($percentComplete)% complete"
        }
    }
    If($Decline -eq $False){"$updatesDeclined updates would have been declined"}else{"$updatesDeclined updates were declined"}
}

Another ~2500ish declined and now the WSUS catalog is down to ~2200 patches. This did help improve the scans and patch deployments for all but the servers with 2GB of memory. But the patches for them can be delivered via a software distribution package until all the memory upgrades are completed.

WSUS Error Codes

2017-06-26

I have found that troubleshooting WSUS is like peeling an onion. Fix one thing only to find another problem. It is enough to make you cry\scream\drink\etc…

This post is how I approach two common issues. The error codes below come from the client logs and\or SQL. If you need some help pulling the error codes from SQL see http://www.mrbodean.net/2017/06/25/software-update-troubleshooting-finding-the-problem-children/


0x80072EE2 - The operation timed out

This can be caused by anything that impacts communication between the client and the WSUS server. Here is my list to check before asking the network guys what changed:

  • Ensure that the WSUS IIS application pool is running on the WSUS server the client is communicating with.
  • Check the CPU & memory utilization on the WSUS server. High utilization by the WSUS IIS application pool can cause timeouts for the clients. This is also a sign that you may need to do some cleanup or reindex the WSUS database, see http://www.mrbodean.net/2017/06/04/wsus-the-redheaded-step-child-of-configuration-manager/
  • Check the event logs on the WSUS server for WSUS IIS application pool crashes. This is a definite sign that you need to do some cleanup and reindex the WSUS database, see http://www.mrbodean.net/2017/06/04/wsus-the-redheaded-step-child-of-configuration-manager/
  • Make sure the WSUS server is up. Yes, I know that this should be 1st. But if you follow directions like me, it is right where it should be.
  • Ensure that the client can communicate with the WSUS server over the correct port. Use this URL and replace the server name and port to match your environment: http://<servername>:8530/ClientWebService/susserverversion.xml

    • If the xml request fails you may have a new firewall and\or acl blocking communication. Bake some cookies and ask the network team what happened. Withhold the cookies until everything works or they prove it is not the network.
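The last check in the list can be scripted so it is easy to run from a client that is timing out. A minimal sketch, assuming PowerShell 3.0 or later on the client; the function names and the server name in the usage comment are hypothetical, and 8530 is just the default port used in the URL above.

```powershell
# Hypothetical helper: build the WSUS client web service test URL.
function Get-WsusVersionUrl {
    param(
        [Parameter(Mandatory=$true)][string]$ServerName,
        [int]$Port = 8530
    )
    "http://${ServerName}:$Port/ClientWebService/susserverversion.xml"
}

# Hypothetical helper: return $true when the URL answers with HTTP 200.
function Test-WsusClientWebService {
    param(
        [Parameter(Mandatory=$true)][string]$ServerName,
        [int]$Port = 8530
    )
    $url = Get-WsusVersionUrl -ServerName $ServerName -Port $Port
    try {
        $response = Invoke-WebRequest -Uri $url -UseBasicParsing -TimeoutSec 30
        return ($response.StatusCode -eq 200)
    } catch {
        Write-Warning "Request to $url failed: $($_.Exception.Message)"
        return $false
    }
}

# Example (server name is an assumption):
# Test-WsusClientWebService -ServerName 'wsus01.mrbodean.net'
```

Run it from the affected client rather than from the WSUS server itself, since the point is to test the path between the two.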

0x80244010 - Exceeded max server round trips

This is a long standing issue with WSUS, see https://blogs.technet.microsoft.com/sus/2008/09/18/wsus-clients-fail-with-warning-syncserverupdatesinternal-failed-0x80244010/

1st step is to decline unused updates. Make sure you only sync what you are patching and decline what is not being used, see http://www.mrbodean.net/2017/06/04/wsus-the-redheaded-step-child-of-configuration-manager/ (It feels like I am beating a dead horse, but you have no idea how many times that has been the resolution.)

After doing the cleanup you may find that you need to increase the Max XML per Request. By default the XML response is capped at 5MB and limited to 200 exchanges (round trips); see the Microsoft blog post above. The SQL query below will allow for an unlimited-size response. (BE AWARE THIS CAN HAVE NEGATIVE IMPACTS! Your network team may come find you and withhold cookies until you stop saturating all the WAN links.) You may need to turn this on and off to address issues. If you have a large population of clients on the other side of a slow link and need to enable this frequently, you may need to rethink your design for WSUS or SUP for SCCM.

USE SUSDB
GO
UPDATE tbConfigurationC SET MaxXMLPerRequest = 0

To return this to the default setting

USE SUSDB
GO
UPDATE tbConfigurationC SET MaxXMLPerRequest = 5242880