
Tuesday, April 29, 2014

The Story of the 30-Second Freeze on a NetApp Filer at 9am, 2pm, 7pm and 12am

We have two NetApp filers in an active/active HA pair. The first filer holds the SAS drives, which mostly store the OS disks for our VMware VMs. The second filer, filer02, hosts the data virtual disks for our VMware VMs, the CIFS shares, and the Exchange databases and logs.

A few weeks ago, we started noticing a 30-45 second freeze in Outlook at 9:00am and 2:00pm. We only noticed it at these times because they fell during business hours. We thought it had something to do with the Exchange servers, which logged delayed-write events on the database and log LUNs. However, the issue was occurring on all the Exchange mailbox servers. We take a backup every 4 hours, so we changed the backup schedules on all the servers. The issue persisted.
We then thought it could be a network issue. However, we use dedicated switches for storage traffic, our VMs ran fine, and we did not notice the freeze on the VMs. When we checked the performance logs for the SAN, we noticed spikes in NFS and CIFS latency around 9:01am, 2:01pm, 7:01pm and 12:00am. None of the dedupe schedules ran at any of these four times. In the filer's syslogs we saw the following entries for each Exchange server's iSCSI initiator at those times, but we still couldn't work out what was causing the initiators to reset their LUNs:
Event: iscsi.notice
Severity: notice
Message: ISCSI: Initiator (iqn.1991-05.com.microsoft:server-mbx-05.domain.com) sent LUN Reset request, aborting all SCSI commands on lun 7
Triggered: Sun Apr 27 00:01:32 PDT

Event: iscsi.notice
Severity: notice
Message: ISCSI: New session from initiator iqn.1991-05.com.microsoft:srv-mbx-07.domain.com at IP addr 172.20.1.64
Triggered: Fri Apr 11 09:01:19 PDT

We changed the schedules for all our VMware backups and any other backups that ran on the hour, but the behavior did not change.
Then I learned about the bug mentioned in
about the NetApp and NFS issue. Because the issue had started appearing after we added a new host, we applied the workaround of setting NFS.MaxQueueDepth to 64 on all our hosts.
It still made no difference. Something was off and didn't make sense, because our VMs were not showing the freeze.
We went through every NetApp log and every VMware log on each host, but the only entries related to this problem were the following, in /var/log/vmkernel.log on each host at the times mentioned:

2014-04-28T21:01:42.787Z cpu9:10653)NFSLock: 608: Stop accessing fd 0x410013247ce8  3
2014-04-28T21:01:42.787Z cpu9:10653)NFSLock: 608: Stop accessing fd 0x4100132692a8  3
2014-04-28T21:01:42.787Z cpu9:10653)NFSLock: 608: Stop accessing fd 0x41001219b328  3
2014-04-28T21:01:42.787Z cpu9:10653)NFSLock: 608: Stop accessing fd 0x41001218a228  3
2014-04-28T21:01:42.787Z cpu9:10653)NFSLock: 608: Stop accessing fd 0x410013242c68  3
2014-04-28T21:01:42.787Z cpu9:10653)NFSLock: 608: Stop accessing fd 0x41001219fae8  3
2014-04-28T21:01:42.787Z cpu9:10653)NFSLock: 608: Stop accessing fd 0x41001219e5e8  3
2014-04-28T21:01:42.787Z cpu9:10653)NFSLock: 608: Stop accessing fd 0x41001219fe68  3
2014-04-28T21:01:42.787Z cpu9:10653)NFSLock: 608: Stop accessing fd 0x410012190268  3
2014-04-28T21:01:42.787Z cpu9:10653)NFSLock: 608: Stop accessing fd 0x410013234fe8  3
2014-04-28T21:01:42.787Z cpu9:10653)NFSLock: 608: Stop accessing fd 0x410012190968  3
2014-04-28T21:01:51.783Z cpu6:8198)NFSLock: 568: Start accessing fd 0x410013246468 again
2014-04-28T21:01:51.801Z cpu6:8198)NFSLock: 568: Start accessing fd 0x41001219b328 again
2014-04-28T21:01:51.801Z cpu6:8198)NFSLock: 568: Start accessing fd 0x410012190268 again
2014-04-28T21:01:51.801Z cpu6:8198)NFSLock: 568: Start accessing fd 0x410013247ce8 again
2014-04-28T21:01:51.801Z cpu6:8198)NFSLock: 568: Start accessing fd 0x41001219fe68 again
2014-04-28T21:01:51.801Z cpu6:8198)NFSLock: 568: Start accessing fd 0x410013242c68 again
2014-04-28T21:01:51.801Z cpu6:8198)NFSLock: 568: Start accessing fd 0x410012190968 again
2014-04-28T21:01:51.801Z cpu6:8198)NFSLock: 568: Start accessing fd 0x41001219fae8 again
2014-04-28T21:01:51.801Z cpu6:8198)NFSLock: 568: Start accessing fd 0x41001218a228 again
2014-04-28T21:01:51.801Z cpu6:8198)NFSLock: 568: Start accessing fd 0x4100132692a8 again
2014-04-28T21:01:51.801Z cpu6:8198)NFSLock: 568: Start accessing fd 0x410013234fe8 again
2014-04-28T21:01:51.801Z cpu6:8198)NFSLock: 568: Start accessing fd 0x41001323dda8 again
2014-04-28T21:01:51.801Z cpu6:8198)NFSLock: 568: Start accessing fd 0x41001219e5e8 again
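The Stop/Start pairs above can be matched per file handle to put a number on the pause. This is a minimal sketch, not a VMware tool: Measure-NfsLockPause is a hypothetical helper, and it assumes the vmkernel.log lines have been copied off the host to a machine with PowerShell. It only reports handles that have both a Stop line and a later Start line.

```powershell
# Hypothetical helper (not part of ESXi or PowerCLI): pairs NFSLock "Stop accessing fd"
# and "Start accessing fd ... again" lines from a copy of vmkernel.log and reports how
# long each file handle was paused. Start lines with no earlier Stop line are ignored.
function Measure-NfsLockPause {
    param([string[]]$Lines)

    $stopTimes = @{}   # fd -> time of its "Stop accessing" line
    foreach ($line in $Lines) {
        if ($line -match '^(\S+) .*NFSLock: \d+: Stop accessing fd (0x[0-9a-fA-F]+)') {
            $stopTimes[$matches[2]] = [DateTimeOffset]::Parse($matches[1], [Globalization.CultureInfo]::InvariantCulture)
        }
        elseif ($line -match '^(\S+) .*NFSLock: \d+: Start accessing fd (0x[0-9a-fA-F]+) again') {
            $fd = $matches[2]
            if ($stopTimes.ContainsKey($fd)) {
                $resumed = [DateTimeOffset]::Parse($matches[1], [Globalization.CultureInfo]::InvariantCulture)
                New-Object PSObject -Property @{
                    Fd            = $fd
                    Stopped       = $stopTimes[$fd]
                    Resumed       = $resumed
                    PausedSeconds = [math]::Round(($resumed - $stopTimes[$fd]).TotalSeconds, 3)
                }
                $stopTimes.Remove($fd)   # a later Stop/Start cycle for this fd starts fresh
            }
        }
    }
}
```

A typical call would be Measure-NfsLockPause -Lines (Get-Content .\vmkernel.log), piped to Sort-Object PausedSeconds -Descending to see the longest pauses first. On the excerpt above, every handle with both lines comes out at about 9 seconds (21:01:42.787 to 21:01:51.801).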


We were about to give up and call VMware and NetApp support when I thought of aggregate snapshots. When I checked the aggregate snapshot schedule, there it was: an aggregate snapshot was being taken at each of the times above. We moved the schedule to off-peak hours, and the iscsi.notice events moved to the new times as well. I would never have thought aggregate snapshots could cause a freeze.



Saturday, February 22, 2014

Script to monitor Exchange 2010 queue and send email alerts if threshold is reached

This script can be run as a scheduled task every few minutes on a server with the Exchange management tools installed. It monitors the Exchange queues on your Hub Transport servers (your CAS servers, if you run combined CAS/Hub Transport roles) and sends email alerts if a queue exceeds the threshold you specify. Make sure you also specify an external email address, so the alert still reaches you if internal mail flow is the problem.

# Script:   Exch2010QueueMonitor.ps1
# Purpose:  Run as a scheduled task (for example every 30 minutes) to monitor all Exchange 2010
#           Hub Transport queues. If any queue holds more than $threshold (30) messages, an
#           output file with the queue details is e-mailed to the admins listed in the e-mail settings.
# Comments: Set $threshold, $smtpServer, $msg.From and the $msg.To addresses to your own values.
# Notes:
#           - Tested with Exchange 2010 SP1 - SP3
#           - The report file is written to "C:\Support\Scripts\queue.txt"

# Load the Exchange 2010 snap-in unless it is already loaded
$snapinName = "Microsoft.Exchange.Management.PowerShell.E2010"
if (Get-PSSnapin -Name $snapinName -ErrorAction SilentlyContinue)
{
    if ($showgui) { Write-Host -ForegroundColor Green "Exchange 2010 Snapin already loaded." }
}
else
{
    Add-PSSnapin $snapinName
    if ($showgui) { Write-Host -ForegroundColor Green "Exchange 2010 Snapin had to be loaded." }
}

$threshold = 30
$filename  = "C:\Support\Scripts\queue.txt"

# Query the queues on every Hub Transport server once and keep those over the threshold
$bigQueues = Get-ExchangeServer |
    Where-Object { $_.IsHubTransportServer -eq $true } |
    ForEach-Object { Get-Queue -Server $_.Name } |
    Where-Object { $_.MessageCount -gt $threshold }

if ($bigQueues)
{
    # Write the offending queues to the report file that gets attached to the alert
    $bigQueues | Format-Table -Wrap -AutoSize | Out-File -FilePath $filename

    $smtpServer = "smtprelay.domain.com"            # your SMTP server
    $msg  = New-Object Net.Mail.MailMessage
    $att  = New-Object Net.Mail.Attachment($filename)
    $smtp = New-Object Net.Mail.SmtpClient($smtpServer)
    $msg.From = "noreply_exchange@domain.com"       # send-as address
    $msg.To.Add("admin1@domain.com")                # change to your admin address
    $msg.To.Add("admin2@externaldomain.com")        # add an external address (Gmail etc.)
    $msg.Subject = "CAS SERVER QUEUE THRESHOLD REACHED - PLEASE CHECK EXCHANGE QUEUES"
    $msg.Body = "Please see attached queue log file for queue information"
    $msg.Attachments.Add($att)
    $smtp.Send($msg)

    # Disposing the message also disposes the attachment and releases queue.txt
    $msg.Dispose()
}
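To run the script on a schedule, one option is schtasks.exe. This is a sketch only: it assumes the script was saved as C:\Support\Scripts\Exch2010QueueMonitor.ps1 (the same folder as the report file), and DOMAIN\svc-exchmon is a hypothetical account that needs permission to run the Exchange cmdlets.

```powershell
# Assumed script location and hypothetical service account; adjust both before use.
# /SC MINUTE /MO 30 runs the task every 30 minutes; /RP * prompts for the account's password.
schtasks.exe /Create /TN "Exchange Queue Monitor" /SC MINUTE /MO 30 `
    /RU "DOMAIN\svc-exchmon" /RP * `
    /TR "powershell.exe -NoProfile -ExecutionPolicy Bypass -File C:\Support\Scripts\Exch2010QueueMonitor.ps1"
```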

Citrix Xenapp 6 - Published Desktop disconnects when published application is launched from within a published desktop

When you launch a published application from within a published desktop session, the published desktop session disconnects. This is because Citrix Receiver, by default, tries to reconnect to all your open sessions when it launches, so starting the published application disconnects the published desktop session. You can change this behavior with the following registry value:

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Citrix\Dazzle]
"WSCReconnectMode"="0" 

Change this key on your published desktop. Once this is modified, relaunch the published application and your desktop session should remain intact. 
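The same registry change can be scripted, for example when several published desktop servers need it. This is a sketch, assuming 64-bit Windows (hence the Wow6432Node path from the fragment above) and an elevated PowerShell session; it writes the value as a string, matching the quoted "0" in the fragment.

```powershell
# Run in an elevated PowerShell session on the server hosting the published desktop.
$key = 'HKLM:\SOFTWARE\Wow6432Node\Citrix\Dazzle'
if (-not (Test-Path -Path $key)) {
    New-Item -Path $key -Force | Out-Null   # create the key only if it is missing
}
# -PropertyType String gives REG_SZ; -Force overwrites an existing value
New-ItemProperty -Path $key -Name 'WSCReconnectMode' -Value '0' -PropertyType String -Force | Out-Null
Get-ItemProperty -Path $key -Name 'WSCReconnectMode'   # confirm the new value
```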

Monday, February 17, 2014

SOLVED - Citrix Xenapp 6.0 hotfix rollup fails to install - Error 1904 Module C:\program files (x86)\citrix\system32\rpm.dll failed to register


When you try to install a hotfix rollup on Citrix XenApp 6.0, you receive the following error: Error 1904. Module C:\program files (x86)\citrix\system32\rpm.dll failed to register.

To resolve this, leave the error window open and browse to C:\windows\system32\. Rename the file cutildll64.dll to cutildll64.dll.old, then click Retry; the hotfix should now install successfully. After the install completes, reboot the server and copy the new file from C:\program files (x86)\citrix\system32\cutildll64.dll to C:\windows\system32\.

Restore items with a last modified date of a particular day from a NetApp snapshot


We were hit by CryptoLocker on one of our CIFS shares, and it encrypted a bunch of files. This share held around 500GB of data. We managed to restore the share from a snapshot, but the users had modified a large number of files just before the virus hit. So the dilemma was how to restore only the files that were changed on a particular day.

Well, I love PowerShell for a reason. A few things to make sure of first:
1. The .snapshot directory should be visible on the NetApp CIFS share (over CIFS it appears as ~snapshot).
2. Get the name of the snapshot for the day (or hour) you want to restore from. In this example, it is nightly.4.

The command is:

Get-ChildItem -Path Y:\~snapshot\nightly.4\share\operations -Recurse | Where-Object { -not $_.PSIsContainer -and $_.LastWriteTime.Day -eq 5 -and $_.LastWriteTime.Month -eq 9 -and $_.LastWriteTime.Year -eq 2013 } | Copy-Item -Destination Y:\share\temp\restored


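You can change the following to reflect the last modified date:
1. $_.LastWriteTime.Day to the day of the month
2. $_.LastWriteTime.Month to the month of the year
3. $_.LastWriteTime.Year to the year

Create the destination folder (Y:\share\temp\restored in this example) before running the command; otherwise Copy-Item copies each file to a file named "restored" instead of into a folder.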
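One limitation of the one-liner above is that Copy-Item puts every matching file into a single restored folder, so files with the same name in different subfolders overwrite each other. The sketch below keeps each file's folder path relative to the snapshot folder. Restore-FilesModifiedOn is a hypothetical helper, not a NetApp or Microsoft cmdlet; it compares only the date part of LastWriteTime and, like the original command, skips hidden files because Get-ChildItem runs without -Force.

```powershell
# Hypothetical helper (not a NetApp or Microsoft cmdlet): copies files whose LastWriteTime
# falls on $Day from $Source to $Destination, recreating each file's folder path relative
# to $Source so same-named files in different folders do not overwrite each other.
function Restore-FilesModifiedOn {
    param(
        [string]$Source,        # e.g. Y:\~snapshot\nightly.4\share\operations
        [string]$Destination,   # e.g. Y:\share\temp\restored
        [datetime]$Day          # only the date part is compared
    )

    $root = (Resolve-Path -LiteralPath $Source).ProviderPath
    # Length of the source prefix to strip, ignoring any trailing separator
    $prefixLength = $root.TrimEnd([char[]]'\/').Length

    Get-ChildItem -LiteralPath $root -Recurse |
        Where-Object { -not $_.PSIsContainer -and $_.LastWriteTime.Date -eq $Day.Date } |
        ForEach-Object {
            # Path of the file relative to the snapshot folder
            $relative  = $_.FullName.Substring($prefixLength).TrimStart([char[]]'\/')
            $target    = Join-Path $Destination $relative
            $targetDir = Split-Path -Parent $target
            if (-not (Test-Path -LiteralPath $targetDir)) {
                New-Item -ItemType Directory -Path $targetDir -Force | Out-Null
            }
            Copy-Item -LiteralPath $_.FullName -Destination $target
            $target   # emit the restored path so the caller can log or count it
        }
}
```

With the example from this post, the call would be Restore-FilesModifiedOn -Source Y:\~snapshot\nightly.4\share\operations -Destination Y:\share\temp\restored -Day 2013-09-05.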




Monday, February 3, 2014

How to resolve Citrix desktop delivery service console discovery process errors



The discovery process might fail with the following error:
“Errors occurred when using servername in the discovery process”

- If the local computer is a member of the farm, start the discovery process again, add the local computer to the list of servers, and run discovery again.

- If discovery still fails, check that the server (the data collector) is up, and that the MFCOM and IMA services are running on it and on the local computer.

- If the MFCOM service is not running, the server will need to be rebooted.

- Run qfarm /load to check whether the local server and the other servers are in the list. If a server is missing, run the following commands on that server and then rerun discovery:
Net stop imaservice
Net start imaservice

- If the process still fails, check that the licensing server is up and has no Citrix-related errors in its event log.

- Also check that the data store (SQL Server) is up and that the SQL instance hosting the data store is running.
Once you are sure the data store is up, open a command prompt as administrator on any of the XenApp servers and run the following command:
Dscheck
See whether any inconsistencies show up. If there are, run the following command to clear them:
Dscheck /clean

Run discovery again. This should fix the problem.

Thursday, January 30, 2014

PowerShell script to remove SMTP addresses for a domain from mailboxes in Exchange 2010

This script removes an SMTP domain from a client's mailboxes. Email address policies are how these domains get added to the mailboxes. However, email address policies are additive only: they cannot be used to remove a domain that an email address policy added, so the addresses have to be removed from each mailbox manually.

When will we need this?

A good scenario is when a client company, ABC, has changed its SMTP domain from abc.com to xyz.com and no longer wants to receive any email at the old domain: anyone who sends an email to an abc.com address should get a bounceback. The old domain abc.com has been removed from the accepted domains, and the MX records for abc.com no longer point to your Exchange server. Externally this works correctly, because DNS has been changed and abc.com is no longer an accepted domain in our Exchange organization. But internally, on the same Exchange server, users can still send and receive email at abc.com addresses, because those addresses are still stamped on the mailboxes. The abc.com addresses either need to be removed manually from each user's Exchange properties, or you can use PowerShell to do it for you.

How will this work?


1. I am using a custom attribute (CustomAttribute1) to filter the Get-Mailbox results, but you can use -OrganizationalUnit to filter by OU, or select all users. In the script, change the CustomAttribute1 value ('abc') to the client code and domain.com to the SMTP domain you want to remove, then copy the script.
2. Open the Exchange Management Shell and paste the script into the shell window (press Enter once).

All email addresses on the SMTP domain domain.com will be removed from the mailboxes whose CustomAttribute1 matches the client code.

# Script to remove email addresses for a particular SMTP domain, as EAP is additive only
# BEFORE USING - change the SMTP domain and client code as mentioned in the comments
# IMPORTANT: DOMAIN IS NOT YOUR AD DOMAIN BUT THE SMTP DOMAIN YOU WANT TO REMOVE

$smtpDomain = 'domain.com'   # CHANGE to the SMTP domain to remove
$clientCode = 'abc'          # CHANGE to the client's CustomAttribute1 value

# Gets the client mailboxes for the users with CustomAttribute1 set to $clientCode
foreach ($Clientmailbox in Get-Mailbox -ResultSize Unlimited | Where-Object { $_.CustomAttribute1 -eq $clientCode })
{
    # Collect every address on this mailbox that belongs to the SMTP domain being removed
    $toRemove = @($Clientmailbox.EmailAddresses | Where-Object { $_.AddressString -like "*@$smtpDomain" })

    # Remove them in a single Set-Mailbox call per mailbox
    if ($toRemove.Count -gt 0)
    {
        Set-Mailbox -Identity $Clientmailbox.Identity -EmailAddresses @{remove = $toRemove}
    }
}
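Before running the removal, it can be worth listing exactly which addresses it will touch. A minimal sketch, assuming the same CustomAttribute1 filter: Get-AddressesToRemove is a hypothetical function that takes mailboxes from the pipeline and outputs one row per matching address without changing anything.

```powershell
# Hypothetical dry-run helper: for each mailbox piped in, output one object per e-mail
# address on $SmtpDomain. Nothing is changed; use it to review before running the removal.
function Get-AddressesToRemove {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Mailbox,
        [string]$SmtpDomain    # the SMTP domain to remove, e.g. 'domain.com'
    )
    process {
        foreach ($address in $Mailbox.EmailAddresses) {
            # -like is case-insensitive; '*@domain.com' does not match '@sub.domain.com'
            if ($address.AddressString -like "*@$SmtpDomain") {
                New-Object PSObject -Property @{
                    Mailbox = $Mailbox.Name
                    Address = $address.AddressString
                }
            }
        }
    }
}
```

For example, pipe Get-Mailbox -ResultSize Unlimited | Where-Object { $_.CustomAttribute1 -eq 'abc' } into Get-AddressesToRemove -SmtpDomain 'domain.com', then into Export-Csv .\addresses-to-remove.csv -NoTypeInformation to review the list (the CSV name is only an example).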