Guest CrashHunter Posted July 25, 2007

I have a Windows 2003 Server x64 Enterprise Edition machine with SP2 and 4 GB RAM, and an application writing a huge file. The write throughput is quite good in the beginning (~33 MB/s) but it keeps decreasing, while CPU usage keeps increasing. In the beginning the kernel CPU (both in Task Manager and the Processor\% Privileged Time counter in Performance Monitor) is pretty low, but it keeps climbing. At first, total CPU is ~50% with the kernel taking ~8%; later on, total CPU reaches 80-90% with the kernel accounting for almost all of it, and at that point the throughput is very low.

Other numbers: the System Cache (in Task Manager) very quickly reaches 3.5 GB and stays at that value, but the one that seems to be the problem is the Paged Pool, which keeps increasing. In poolmon I can see that MmSt is the tag that keeps growing, and it does not free the memory unless I stop writing to the file. Before starting the process MmSt uses 1.8 MB; after 1h20m it uses 200 MB (at that point, the average total CPU is 74%, with 48% in privileged mode).

I read some information about the paged pool (including KB304101), but most of it applies to the x86 version, which has a limited maximum pool size (some 460 MB). On my x64 computer the size should not be a problem (the maximum is around 120 GB), and I do not get errors, but performance steadily goes down even with the paged pool under 100 MB! I do not have anything else running on this computer and the behavior is reproducible every time. As soon as I stop the process, the System Cache and Paged Pool usage go down, so there is no memory leak.

My application writes data to disk using the regular WriteFile API with an OVERLAPPED structure, so the writes are asynchronous. It writes a buffer, processes the next one, and then waits for the previous write to complete before issuing another write request. I also tried the FILE_FLAG_WRITE_THROUGH flag when opening the file; the general behavior is similar (increasing paged pool, increasing kernel-mode CPU usage, decreasing throughput), with some differences: the starting throughput is much lower (~6 MB/s) and it drops somewhat more slowly than in the other scenario.
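For context, here is a minimal sketch of the kind of double-buffered overlapped write loop described above. The file name, buffer size, and total size are illustrative assumptions, not the actual application code:

    #include <windows.h>

    #define BUF_SIZE (1024 * 1024)   /* 1 MB per write, purely illustrative */

    int main(void)
    {
        /* Open the target file for asynchronous (overlapped) I/O. */
        HANDLE hFile = CreateFileW(L"D:\\huge.dat", GENERIC_WRITE, 0, NULL,
                                   CREATE_ALWAYS,
                                   FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED,
                                   NULL);
        if (hFile == INVALID_HANDLE_VALUE) return 1;

        static BYTE buf[2][BUF_SIZE];   /* one buffer in flight, one being filled */
        OVERLAPPED ov = {0};
        ov.hEvent = CreateEventW(NULL, TRUE, FALSE, NULL);

        ULONGLONG offset = 0;
        BOOL pending = FALSE;
        int cur = 0;

        for (int i = 0; i < 10000; i++) {   /* ~10 GB total in this sketch */
            /* Fill the current buffer while the previous write is still in flight. */
            FillMemory(buf[cur], BUF_SIZE, (BYTE)i);

            /* Wait for the previous overlapped write to complete. */
            if (pending) {
                DWORD done;
                GetOverlappedResult(hFile, &ov, &done, TRUE);
            }

            /* Issue the next write at the correct file offset. */
            ov.Offset     = (DWORD)(offset & 0xFFFFFFFF);
            ov.OffsetHigh = (DWORD)(offset >> 32);
            DWORD written;
            if (!WriteFile(hFile, buf[cur], BUF_SIZE, &written, &ov) &&
                GetLastError() != ERROR_IO_PENDING)
                break;                      /* a real error, not just "I/O pending" */

            pending = TRUE;
            offset += BUF_SIZE;
            cur ^= 1;                       /* swap buffers */
        }

        if (pending) {
            DWORD done;
            GetOverlappedResult(hFile, &ov, &done, TRUE);
        }
        CloseHandle(ov.hEvent);
        CloseHandle(hFile);
        return 0;
    }

Adding FILE_FLAG_WRITE_THROUGH to the CreateFileW flags turns this into the slower write-through scenario mentioned above.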
Guest CrashHunter Posted July 25, 2007
RE: Decreasing throughput & increasing CPU when writing a huge file

I can reproduce the same behavior even with a simple tool that just writes random data to a file (256 GB). This tool only uses synchronous WriteFile. It takes a little longer than with the async one, but the behavior follows the same pattern: after running it for ~3h, the speed is ~7.5 MB/s, processor time is 77%, of which 60% is in privileged mode, total kernel memory is 384 MB (351 MB of it paged) and the MmSt pool uses 317 MB.

Any suggestion would be greatly appreciated.
Guest CrashHunter Posted July 25, 2007
RE: Decreasing throughput & increasing CPU when writing a huge file

Another update (the simple app writing random data): after 6 hours, the average speed is ~5.5 MB/s, processor time is 81.8%, of which 69% is in privileged mode, total kernel memory is 510 MB (474 MB of it paged) and the MmSt pool uses 443.5 MB.
Guest Tony Sperling Posted July 25, 2007
Re: Decreasing throughput & increasing CPU when writing a huge file

I'm not really qualified to make assumptions - just guesses. If your HD(s) and subsystem are relatively modern and properly configured (it is many years since I've seen figures as low as 33 MB/s!) then I would suspect your application.

I don't like all this buffer shuffling; you shouldn't have to do that. If you know the size of the file, in my day you just created a file and filled it up (the system serves you the buffers it needs); if you don't know the size, you employ some recursive programming (do this - do that - do it all again until you're finished). I'm sorry, but it looks like you have been working hard to make a simple job infinitely more complicated, and succeeded. ;0)

But, I really don't feel qualified to pass judgements.

(What is your hardware?)

What figures do you get if you run a HD benchmark like HDTach or HD Tune?

Tony. . .
Guest CrashHunter Posted July 26, 2007
Re: Decreasing throughput & increasing CPU when writing a huge file

The hardware is pretty old; the specs are:
- one dynamic volume striped over 2 x 900 GB RAID5 disk subsystems
- HBA: QLogic QLA2340 PCI Fibre Channel Adapter
- SAN: MetaStore (emulating IBM 3526 0401) with 2 x LSI (INF-01-00) controllers with a total of 30 x 72 GB Seagate 10K SCSI disks

The HDTach results (read-only, since it is the trial version):
- Random access: 10.5 ms
- CPU utilization: 1%
- Average speed: 44.1 MB/s

The HD Tune results:
- Transfer rate (min/max/avg): 13.7/24.2/20.6 MB/s
- Burst speed: 51.2 MB/s
- Access time: 10.2 ms

The results were similar for each of the two disk subsystems.

In regards to making a simple job complicated: I started from the existing implementation of our application and ended up with a simple application that is just a loop generating random data and writing it to disk (plain synchronous WriteFile calls, no extra buffering or anything else).

I restarted the test writing to the local 250 GB WDC WD2500KS-00MJB0 SATA disk to eliminate a potentially un-optimized SAN configuration and, again, the behavior follows the same pattern: the write started at ~22.5 MB/s with ~1% CPU in privileged mode; after writing ~50 GB, the speed dropped to 20.2 MB/s, with 7.9% CPU in privileged mode and a paged pool of 150 MB (~123 MB in the MmSt pool).

Again, the tool does something like:

    while (size less than target) {
        generate buffer with random bytes
        write buffer to file
    }

The only thing I can point to now is the OS; it either needs some fine tuning or it has a problem...

"Tony Sperling" wrote:

> I'm not really qualified to make assumptions - just guesses. If your HD(s)
> and subsystem are relatively modern and properly configured (it is many
> years since I've seen figures as low as 33 MB/s!) then I would suspect your
> application.
>
> I don't like all this buffer shuffling; you shouldn't have to do that. If
> you know the size of the file, in my day you just created a file and filled
> it up (the system serves you the buffers it needs); if you don't know the
> size, you employ some recursive programming (do this - do that - do it all
> again until you're finished). I'm sorry, but it looks like you have been
> working hard to make a simple job infinitely more complicated, and
> succeeded. ;0)
>
> But, I really don't feel qualified to pass judgements.
>
> (What is your hardware?)
>
> What figures do you get if you run a HD benchmark like HDTach or HD Tune?
>
> Tony. . .
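For reference, the test loop described in the post above boils down to something like the following runnable sketch (the file name, buffer size, and target size are my own assumptions here):

    #include <windows.h>
    #include <stdlib.h>

    #define BUF_SIZE   (1024 * 1024)                 /* 1 MB per WriteFile call, assumed */
    #define TARGET     (256ULL * 1024 * 1024 * 1024) /* 256 GB total */

    int main(void)
    {
        static BYTE buf[BUF_SIZE];

        /* Plain buffered, synchronous file handle; no special flags. */
        HANDLE hFile = CreateFileW(L"D:\\test.dat", GENERIC_WRITE, 0, NULL,
                                   CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hFile == INVALID_HANDLE_VALUE) return 1;

        ULONGLONG total = 0;
        while (total < TARGET) {
            /* Generate a buffer of random bytes. */
            for (DWORD i = 0; i < BUF_SIZE; i++)
                buf[i] = (BYTE)rand();

            /* Write it synchronously; WriteFile returns once the data has been
               accepted by the system cache. */
            DWORD written = 0;
            if (!WriteFile(hFile, buf, BUF_SIZE, &written, NULL) || written != BUF_SIZE)
                break;

            total += written;
        }

        CloseHandle(hFile);
        return 0;
    }

The point is that there is nothing exotic in the loop: one static buffer, plain synchronous WriteFile calls, no extra flags.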
Guest CrashHunter Posted July 26, 2007
Re: Decreasing throughput & increasing CPU when writing a huge file

I have some more news: I changed PagedPoolSize to 2 GB and PoolUsageMaximum to 5 (therefore 100 MB), hoping to see a difference. Although these changes took effect (the MmSt pool was kept under 100 MB, at about 98 MB), the trend was exactly the same! After writing 160 GB in about 3h20m, CPU utilization in privileged mode is over 60% and the speed has been going down constantly. The write is still to the local SATA drive (an empty volume), with no other applications running (except the monitoring ones).

I am out of ideas here. Can anybody give me some (constructive) suggestions?

"CrashHunter" wrote:

> The hardware is pretty old; the specs are:
> - one dynamic volume striped over 2 x 900 GB RAID5 disk subsystems
> - HBA: QLogic QLA2340 PCI Fibre Channel Adapter
> - SAN: MetaStore (emulating IBM 3526 0401) with 2 x LSI (INF-01-00)
> controllers with a total of 30 x 72 GB Seagate 10K SCSI disks
>
> The HDTach results (read-only, since it is the trial version):
> - Random access: 10.5 ms
> - CPU utilization: 1%
> - Average speed: 44.1 MB/s
>
> The HD Tune results:
> - Transfer rate (min/max/avg): 13.7/24.2/20.6 MB/s
> - Burst speed: 51.2 MB/s
> - Access time: 10.2 ms
>
> The results were similar for each of the two disk subsystems.
>
> In regards to making a simple job complicated: I started from the existing
> implementation of our application and ended up with a simple application
> that is just a loop generating random data and writing it to disk (plain
> synchronous WriteFile calls, no extra buffering or anything else).
>
> I restarted the test writing to the local 250 GB WDC WD2500KS-00MJB0 SATA
> disk to eliminate a potentially un-optimized SAN configuration and, again,
> the behavior follows the same pattern: the write started at ~22.5 MB/s with
> ~1% CPU in privileged mode; after writing ~50 GB, the speed dropped to 20.2
> MB/s, with 7.9% CPU in privileged mode and a paged pool of 150 MB (~123 MB
> in the MmSt pool).
>
> Again, the tool does something like:
>
> while (size less than target) {
>     generate buffer with random bytes
>     write buffer to file
> }
>
> The only thing I can point to now is the OS; it either needs some fine
> tuning or it has a problem...
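For anyone wanting to reproduce that experiment: the two values mentioned above live under the Session Manager\Memory Management key in the registry. Assuming the standard key path, the change described corresponds roughly to the following .reg fragment (PagedPoolSize is in bytes, so 2 GB is 0x80000000; PoolUsageMaximum is a percentage), followed by a reboot for the settings to take effect:

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
    "PagedPoolSize"=dword:80000000
    "PoolUsageMaximum"=dword:00000005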
Guest Tony Sperling Posted July 27, 2007
Re: Decreasing throughput & increasing CPU when writing a huge file

As I said, I am no good in a server environment, but I would certainly expect an older HD system to be more of a bottleneck on a faster machine. The phenomenon of data throughput slowing down is pretty much standard, I believe. I've never run a benchmark for the length of time you are employing, but a 30-40% slowdown over a few minutes is what I would expect on a standard IDE system. My own current SATA/RAID0 shows a nearly flat curve over a few minutes' time, hovering around 100 MB/s.

You might consider tweaking your machine's use of resources depending on whether you are running those tests in the foreground or the background. I would make sure I had plenty of swap, and I would check the signal cables to those disks if they are of the same generation. Temperature might also be an issue with HDs working hard over long periods. I believe, too, that servers have an option to tweak the system cache far more than the Pro editions I'm used to.

In short, what you are seeing may be quite natural, but you may be able to beat more performance out of it. (I suggest paying a visit to the Knowledge Base - go there and search for "system cache"; there are quite a few hits. Something might lead you further?)

Tony. . .