vm6 sluggish


deeperbydesign
05-12-2005, 07:01 AM
Has anyone else on vm6 found things to be INCREDIBLY sluggish lately?

It seems like someone on the box is grinding the disk because the network times are fine and I'm not running anything CPU bound. But do anything that needs to access the disk and it's time to go get some coffee while waiting for it to finish.

This is a new development, I've not seen this sort of behavior since I've been on unixshell#. Maybe it's vm6 hardware? Or someone doing something disk intensive / stupid in another vm?

Anyone got any ideas?

Cheers,
Brian

jkf
05-12-2005, 02:58 PM
I can second this... Last night, something as simple as ls in a small directory was taking 5-10 seconds to complete. Everything seems OK right now, though.

matta
05-12-2005, 05:26 PM
How long did this last? I just checked the disk read speeds and was getting over 70MB/s. The only thing that should be able to cause this is multiple OS reloads and/or snapshots being performed on the host at once. I'd say with 9% accuracy that was it. I'll look into some sort of queuing for these processes so only one of each can run at a time, albeit it will take much longer for each operation. I'd like to monitor before making any changes though, just to make sure there is a real problem.

matta
05-12-2005, 05:30 PM
Next time it happens, try running "hdparm -tT /dev/sda1". I just went through the daily monitoring looking for jumps in CPU usage and load average, and it looks like the main OS was doing some heavy work at the following times:

4:10 - 4:20AM
6:40 - 6:50AM
10:10 - 10:20AM

For all other periods it was >98% idle... which doesn't correlate to your post time. The Xen disk I/O is very good.. so I just can't imagine it being that, there are many 32/64MB VM's that are constantly swapping that Xen keeps in line.

deeperbydesign
05-12-2005, 07:11 PM
I will definitely do some more forensic work next time it happens. I had actually been wondering if it was cool to use hdparm on these things. I wonder if the linux-2.6 readahead tuning would help them at all under Xen? Probably not.

Just to clarify, nearly all day yesterday there were sporadic problems. Not just at the time of my post.

I'm so happy with the increased stability too, thanks!

Elumin8
05-12-2005, 09:53 PM
We were noticing the same slowdowns and we believe it is our Apache service. We are looking into the cause of this, but there were 15 processes at 40 MB apiece, which filled all of our RAM/swap. When this happened it seemed to slow down the whole VM until our VM restarted or killed services randomly and things settled down.

Midday today is when we found out that it was ours. The wait was at 98% and the VM was acting really sluggish but still working. We terminated Apache and lowered the children to 5 with a timeout of 1 minute. This should hopefully stop Apache from acting insane like it was.

I also plan on doing a reinstall of Apache and all of its modules sometime this week since we are on VM6 now.

If you notice the slowdowns please drop me a PM and I will try to check my VM right away so that we can conclude if it is mine or not. The reason I say this is because it might have just been a fluke earlier when the VM cleared up right as we killed Apache but I doubt it.

deeperbydesign
05-12-2005, 10:00 PM
Elumin8,

I wouldn't read too much into the 98% wait, my vm showed 90-100% wait during that time as well. I imagine that everyone on vm6 would see that whenever the host machine is thrashing the disk.

However, I don't doubt that with that many apache processes of that size you were swapping pretty heavily and could very well have been the cause of the slowdown on the whole machine. Thanks for the heads up.

- Brian

deeperbydesign
05-13-2005, 11:40 PM
# hdparm -t /dev/sda1

/dev/sda1:
Timing buffered disk reads: 2 MB in 9.67 seconds = 211.79 kB/sec
BLKFLSBUF failed: Function not implemented
HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Function not implemented


So who's doing it?

matta
05-14-2005, 12:15 AM
Timing buffered disk reads: 168 MB in 3.09 seconds = 54.37 MB/sec

seems fine as of now... are you sure it's not something being performed inside your VM? Xen is very fair with how it shares resources and with 6 loaded servers now this is the first complaint we've ever had regarding performance. During the beta we had all the beta testers on vm1 along with 6 of our own VM's all constantly swapping/forking/disk thrashing and even then it was great.
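
The two hdparm readings in this thread are reported in different units (kB/sec and MB/sec), which makes them awkward to compare at a glance. Below is a minimal Python sketch for putting them on one scale. The function name is made up for illustration, and the 1024 kB per MB conversion is an assumption checked only against the numbers posted here (2 MB in 9.67 seconds works out to 211.79 kB/sec with that factor).

```python
import re

# Conversion factors to MB/sec. The 1024 factor is an assumption that
# matches the figures posted in this thread (2 MB / 9.67 s = 211.79 kB/sec).
_UNIT_TO_MB = {"kB": 1 / 1024, "MB": 1.0}

# Matches the trailing "= 211.79 kB/sec" part of an hdparm timing line.
_RATE_RE = re.compile(r"=\s*([\d.]+)\s*(kB|MB)/sec")


def hdparm_rate_mb_per_sec(line):
    """Return the throughput from one hdparm timing line in MB/sec, or None."""
    match = _RATE_RE.search(line)
    if match is None:
        return None  # not a timing line, e.g. the BLKFLSBUF error
    value, unit = match.groups()
    return float(value) * _UNIT_TO_MB[unit]


slow = hdparm_rate_mb_per_sec(
    "Timing buffered disk reads: 2 MB in 9.67 seconds = 211.79 kB/sec"
)
fast = hdparm_rate_mb_per_sec(
    "Timing buffered disk reads: 168 MB in 3.09 seconds = 54.37 MB/sec"
)
print(f"slow: {slow:.2f} MB/sec, fast: {fast:.2f} MB/sec")
```

Lines that carry no rate, such as the "Function not implemented" errors, simply return None, so the whole hdparm output can be fed through line by line.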

deeperbydesign
05-14-2005, 12:36 AM
Yeah, nothing going on in my vm other than some typing over SSH and idle postfix, apache, mysqld. Free showed about 80 MB usage, and a minute or so later the hdparm reading was much faster.

Don't get me wrong, I'm not complaining - just noticing something new. It hasn't become a problem WRT services at all.

Cheers,
Brian

zeroion
05-14-2005, 12:10 PM
I have cacti monitoring of hdparm's output here (http://mars.coldray.com/cacti/graph_view.php) on VM6.

I just started collecting the data last night. NOTE: I have a 'The 64' plan.

matta
05-14-2005, 05:17 PM
All in all that doesn't look too bad. I do see the drops, and they may correlate with VM or host server activity. Running the reload/snapshot processes at a low priority (i.e. nice 19) will not help, as they will still be waiting for I/O.

For the time being there is not much I can do, the current I/O scheduler is a combination of round robin + proportional latency which is better than any other virtual server software available. User Mode Linux, FreeVPS, Linux VServer, and Virtuozzo don't have any Disk I/O QoS in regards to bandwidth/latency. VMware ESX _might_, but even if it does... that'd raise the price of the VM's substantially due to software costs.

I believe fine grained I/O control will be in Xen 3.0. This will allow 'units' to be assigned to each VM such as with the CPU. At that time it will automatically be utilized (ie a 64 plan will get 64 disk I/O units).
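
To make the 'units' idea concrete, here is a rough Python sketch of proportional sharing: each VM gets a fraction of disk I/O equal to its units divided by the total units of all VMs competing for the disk. This is only an illustration of the arithmetic, not Xen's actual scheduler or API, and the VM names and unit values are hypothetical.

```python
def io_shares(units_by_vm):
    """Map each VM to its fraction of I/O under simple proportional sharing."""
    total = sum(units_by_vm.values())
    if total == 0:
        # No units assigned at all: nobody gets a share.
        return {name: 0.0 for name in units_by_vm}
    return {name: units / total for name, units in units_by_vm.items()}


# Hypothetical host with three busy VMs on 64, 96 and 32 plans.
shares = io_shares({"vm_a": 64, "vm_b": 96, "vm_c": 32})
for name, share in shares.items():
    print(f"{name}: {share:.1%} of disk I/O")
```

Under this model a 64 plan's share shrinks as more units compete for the same disk, so the units describe relative weight rather than a fixed bandwidth.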

I'll still see if I can catch it in a low performance state, but it looks like it is very rare and short so it will be difficult.

matta
05-14-2005, 05:30 PM
Also have to add... I don't think running hdparm every 5 minutes is going to help performance at all.

deeperbydesign
05-14-2005, 06:25 PM
[QUOTE=matta]Also have to add... I don't think running hdparm every 5 minutes is going to help performance at all.[/QUOTE]

I would definitely agree. Perhaps it's not worth investigating unless it becomes a performance issue WRT services on the box.

zeroion
05-14-2005, 08:02 PM
[QUOTE=matta]Also have to add... I don't think running hdparm every 5 minutes is going to help performance at all.[/QUOTE]

OK. I've stopped the statistics script.

deeperbydesign
06-03-2005, 04:25 PM
I am seeing < 1 MB/s from hdparm -t /dev/sda1 on vm6 right now. Anyone know why?

matta
06-03-2005, 04:41 PM
That was a few minutes ago, but I just tried now and got 44MB/s, not stellar but much higher than 1MB/s.

Elumin8
06-03-2005, 04:42 PM
Hey, I believe my Apache went AWOL. I restarted my VM and everything was back to normal for me also. I ran a PHP script that I think caused Apache to take a shit... Let me know if you see it again.

aws
06-22-2005, 08:02 PM
I've noticed that when the VM starts to use swap heavily the performance rapidly approaches zero.

It is crucial to set Apache's MaxClients and MaxSpareServers directives to something reasonable for the amount of RAM on the VM.

MySQL and other database products also tend to ship with defaults that are not optimal for small accounts.

My account is a 96 and I set 5 max clients and reduced several of the buffer sizes in MySQL. This leaves me (currently) with about 30 MB of free memory.
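
As a rough way to pick a MaxClients value, one can divide the RAM left over after everything else by the size of one Apache child. The Python sketch below shows that arithmetic. The function name is hypothetical, and the 46 MB reserve and 10 MB per-child figures are made-up assumptions for illustration; only the 96 MB plan size and the 15 x 40 MB case come from this thread.

```python
def max_clients_for_ram(total_ram_mb, reserved_mb, per_child_mb):
    """Largest number of Apache children that fits in RAM after the reserve."""
    available = total_ram_mb - reserved_mb
    if available <= 0 or per_child_mb <= 0:
        return 0
    return int(available // per_child_mb)


# Footprint from earlier in the thread: 15 children at 40 MB each.
runaway_footprint_mb = 15 * 40
print(f"15 children x 40 MB = {runaway_footprint_mb} MB")

# A 96 plan, assuming 46 MB reserved for the OS, MySQL and mail,
# and assuming 10 MB per Apache child (both figures are guesses).
print("MaxClients estimate:", max_clients_for_ram(96, 46, 10))
```

The per-child size is best measured on the VM itself, since it depends heavily on which modules are loaded.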