Delayed Write Errors Copying Large Files to a Compressed folder

UPDATE – 2/8/2012
Edit – 2/9/2012 (changed the RAM to reflect an upgrade to 16GB. I incorrectly stated 18GB)

We purchased a new, more powerful SAN (HP P2000) and migrated our SQL Server over to it. I was excited to rerun this test on the new SAN to see whether a faster disk subsystem made any improvement. This time I took a 60GB file from one LUN and copied it to a compressed folder on another LUN. We passed the 36GB mark (previously the largest file we could copy in this scenario) with no sign of the Dirty Pages getting close to the threshold. We passed 40GB, then 50GB, and just as I was about to stop the process I noticed the Dirty Pages beginning to rise, and rise quickly. The copy got to 57GB this time before we hit the same issues as before.

Discussing this with a co-worker, I finally saw the light. We had not only changed our SAN, we had also increased the server's memory from 6GB to 16GB. This issue appears to be directly linked to the amount of memory on the server. It appears that the compression and decompression happen in memory, and when the cache fills up, that's it. You need enough memory on the machine to handle the compression/decompression, period. So our new threshold is 57GB with 16GB of memory.

———————————————————————————

In short, we had an issue that was caused by copying a large file to a compressed folder on a different LUN on our MSA1000 SAN. Basically the Windows Internal Cache Manager, which is a subsystem of the Memory Manager, gets filled up faster than the data can be written to disk. Since the Memory Manager/Cache Manager are GLOBAL to the system, this caused EVERYTHING on the system to come to a crawl while the uncommitted writes attempted to complete.
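
To make the "fills up faster than it drains" idea concrete, here is a minimal sketch in Python. It is a toy model only, not how the Cache Manager actually accounts for dirty pages: it assumes constant fill and flush rates, and the function name and the rates in the usage note are made up for illustration.

```python
def seconds_until_throttle(threshold_kb, fill_kb_per_s, flush_kb_per_s):
    """Estimate how long until dirty cache data reaches the threshold.

    Toy model: dirty data grows at (fill rate - flush rate). Returns None
    when the flush rate keeps up, because the threshold is never reached.
    """
    net_kb_per_s = fill_kb_per_s - flush_kb_per_s
    if net_kb_per_s <= 0:
        return None
    return threshold_kb / net_kb_per_s
```

For example, `seconds_until_throttle(3059336, 100 * 1024, 60 * 1024)` uses the dirty page threshold from the debugger output in this post with hypothetical rates of 100 MB/s in and 60 MB/s out. The point is only that any sustained gap between the two rates will eventually hit the threshold, no matter how large the cache is.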

This Microsoft article describes the issue that we had, but unfortunately the fix doesn't work in our situation. The issue described in the article relates to copying a large file from a fast disk to a slower disk. In our case, the source disk and destination disk are exactly the same speed. What made the writes slower at the new location was the compression. Even though the file is already compressed at the source location, during the copy it is decompressed and then re-compressed. Nothing I could find directly addressed this scenario.

To prove this issue was happening, I installed the Debugging Tools for Windows on the affected server and ran !defwrites as the article suggested. I let the copy start and kept running !defwrites every few minutes, and sure enough CcTotalDirtyPages continued to increase until it hit CcDirtyPageThreshold. You can see this in the output below. At this point the throttling kicked in and began slowing everything down so that the writes could attempt to finish.

*** Cache Write Throttle Analysis ***

CcTotalDirtyPages:                764841 ( 3059364 Kb)
CcDirtyPageThreshold:             764834 ( 3059336 Kb)
MmAvailablePages:                 413392 ( 1653568 Kb)
MmThrottleTop:                       450 (    1800 Kb)
MmThrottleBottom:                     80 (     320 Kb)
MmModifiedPageListHead.Total:     401989 ( 1607956 Kb)

CcTotalDirtyPages >= CcDirtyPageThreshold, writes throttled

Check these thread(s): CcWriteBehind(LazyWriter)
Check critical workqueue for the lazy writer, !exqueue 16
Cc Deferred Write list: (CcDeferredWrites)
File: fffffadf6c02ff40 Event: fffffadf5d0a7548
File: fffffadf6c0c2600 Event: fffffadf5cefc548
File: fffffadf6c715370 Event: fffffadf5b819708
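
To read the numbers above: each page count is followed by its size in Kb, and the two columns are consistent with 4 KB pages. The short Python sketch below redoes that conversion and the same comparison that !defwrites reports. The function names are mine, and the 4 KB page size is inferred from the figures rather than taken from the debugger.

```python
PAGE_SIZE_KB = 4  # assumption: inferred from the page/Kb columns in the output


def pages_to_kb(pages):
    """Convert a page count to kilobytes."""
    return pages * PAGE_SIZE_KB


def is_throttled(total_dirty_pages, dirty_page_threshold):
    """Mirror the check in the output: dirty pages at or above the threshold."""
    return total_dirty_pages >= dirty_page_threshold


# Values copied from the !defwrites output in this post.
cc_total_dirty_pages = 764841
cc_dirty_page_threshold = 764834

print(pages_to_kb(cc_total_dirty_pages))     # 3059364
print(pages_to_kb(cc_dirty_page_threshold))  # 3059336
print(is_throttled(cc_total_dirty_pages, cc_dirty_page_threshold))  # True
```

So at the moment of this capture roughly 2.9GB of dirty data was waiting to be written, just over the threshold.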

The workaround in our case was to turn off compression on any destination folder that holds files larger than about 36GB. Testing showed that copying files larger than this caused "insufficient resources" errors during the write operation (found by running Process Monitor) and eventually brought the server to its knees.
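
To apply that workaround you first need to know which folders hold files over the limit. Here is a rough Python sketch that walks a directory tree and lists them. It is not the tool I used; the function name and default limit are just for illustration, and the limit should be whatever your own testing shows for your amount of memory.

```python
import os

# Roughly 36GB, the limit observed on our server; adjust for your own system.
SIZE_LIMIT_BYTES = 36 * 1024**3


def find_large_files(root, limit_bytes=SIZE_LIMIT_BYTES):
    """Yield (path, size_in_bytes) for files under root larger than limit_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            if size > limit_bytes:
                yield path, size
```

Something like `for path, size in find_large_files(r"E:\Backups"): print(path, size)` (the path is hypothetical) gives a list of files whose parent folders are candidates for having compression turned off.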

Following are some other related links that I found during research of this problem.

http://www.eggheadcafe.com/software/aspnet/32252624/server-generates-delayed-write-errors-copying-very-large-files.aspx
http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1285465634286+28353475&threadId=1432253
http://www.techspot.com/blog/224/slow-system-performance-when-copying-large-files-in-xp-x64-server-2003-x64/
http://blogs.msdn.com/b/saponsqlserver/archive/2008/08/30/server-running-out-of-memory-when-copying-backup-files-to-other-server.aspx

About Brian Drab

IT Manager with an absolute love for technology!

1 Response to Delayed Write Errors Copying Large Files to a Compressed folder

  1. gil jeffer says:

    I’m running into the same problem as you describe here (but on XP), thanks for taking the time to post. I’m confused though: I see where it would bring the system to a crawl when it exceeds the threshold, but since that event supposedly kicks in “throttles” as a remedy, why does it go on to error anyway?

    I, too, could copy to uncompressed directories – but I’m copying SQL databases – they compress down to 10% – 25% of raw size (depending on content) and I really want to take advantage of that space saving in my backups!

    I’m wondering if I could stress the source disk with a higher-priority do-nothing job to slow down the reads. Do you think that may work? Seems like a stupid solution (even if it works); the fact that Microsoft hasn’t come up with a solution in the intervening years is rather disheartening.

    – Gil
