All these reboots
anonet
05-03-2006, 04:52 PM
All these reboots are getting old, guys. Come on, this is admin 101 stuff. You test in a TEST environment till you get it right and KNOW that it is right.
What is the deal? Xen-3.0.2 is giving you grief, correct? Well go back to the version that WASN'T causing all the reboots -- PLEASE.
I am on vm26 IP 207.210.101.103. At ~8:55am EST today (5-3-2006) vm26 was rebooted. Can I get an explanation as to why?
--Brian
ttarabula
05-04-2006, 05:12 AM
VM25
reboot @ Wed May 3 04:15
We're being fairly patient overall, and I can personally handle the downtime right now, but without an explanation or warning, I'm not very confident in putting more production stuff on my VPS.
I have to emphasise that for me the most disturbing aspect of this is the lack of communication.
peterd
05-04-2006, 08:14 AM
[QUOTE=anonet]Come on, this is admin 101 stuff. You test in a TEST environment till you get it right and KNOW that it is right. [/QUOTE]
While I agree, and as much as I love Xen and Unixshell's services, it's not ready for prime time mission critical services or applications.
Business 101 would ensure you run a business and production quality systems on high availability servers, not on unproven VM software.
peterd
matta
05-04-2006, 04:22 PM
We ran the same config for weeks on a subset of servers without problems. Many servers haven't rebooted since we performed the mass upgrade to 3.0.2-2 Xen. It seems the specific workload of a few servers causes problems. I do have a production-spec server right here where I am trying to reproduce the problem. I also have backtraces from the console logs of production servers, but I haven't found anyone interested in looking at them (it is Linux kernel specific, NOT Xen).
augustz
05-04-2006, 05:26 PM
"I haven't found anyone interested in looking at them"
There are a lot of Linux support companies out there.
If you need help finding one that would be willing to assist in return for payment, I'd be happy to help broker that. At this point it is really beyond belief that no one is available to help; Red Hat alone has a number of kernel guys on staff, though they are a bit pricey.
Or did you mean, no one is willing to help for free? Given that you are running a business here, it seems that if you want help (or your customers are asking for it) that unixshell might consider paying for it.
tracer07
05-05-2006, 09:55 AM
Or maybe the man knows his job but just didn't come up with anything strange in the logs? ^^
matta
05-05-2006, 03:05 PM
This is definitely #1 priority right now. We do have a testbed set up, but I am not able to reproduce the panics I have seen, so it's difficult to find out whether the changes fix anything.
augustz
05-05-2006, 08:00 PM
Yeah, in fairness, non-reproducible issues are maybe 10 - 100x harder to debug. With a reproducible bug and source code, there IS a way to figure it out.
One idea would be to migrate everyone off a machine when it goes down, then bring it back up again just the way it was and see if you can stress it into a panic. Hard-to-reproduce bugs can sometimes be a hardware issue as well, especially if setups are not really, really identical.
matta
05-05-2006, 09:20 PM
Our testbed is very close to production. Same processors, same motherboard, same RAID card, etc. We're running various stress tests on the same kernel / Xen config as in production. Once we find the test that can produce a panic then we can start tweaking the kernel and rebuild with full debugging (to get the file/line number, etc).
anonet
05-06-2006, 01:29 AM
I am going to keep adding to this post every time my VPS reboots.
Don't know if it will help anything, but at least it will be a log.
5-5-2006 @ ~18:30EST reboot.
VM26 - VPS 207.210.101.103
matta
05-06-2006, 01:54 AM
And what is your point in doing so? We know when they reboot, what the problem is, and are taking steps to resolve it. This seems to be some sort of method to make us look bad in public. If that's the case and you're so unhappy with the service I can't fathom why you still have an account with us.
anonet
05-06-2006, 03:32 AM
I am not trying to make you look bad in public. I was doing it so that others that happen to be on vm26 could use it for a comparison.
Are all the VMs having this problem? If so I will just let it go till you say the definitive "it is fixed".
I am not trying to twist the thorn that this has put in your side. But I have services that can't start automatically on reboot (scripting is not an option). So reboots hurt.
I'm willing to go back to kernel 2.6.12. Is there any way you can set up a box and move people like me, who don't mind staying on an older kernel, to it?
anonet
05-06-2006, 02:51 PM
boot fell off. Who was left?
reboot :)
05-06-2006 @ ~6:00am EST
matta
05-06-2006, 11:12 PM
It's affecting quite a few servers, and it seems to depend on workload. Every running server is able to trigger the bug; whether it does just depends on its workload. Since the problem is related to IP forwarding/netfilter, I've updated the hosts' kernels and disabled most functionality except for that which is specifically required. The hosts will not boot this new kernel until they reboot themselves.
americantechie
05-07-2006, 04:54 AM
That sucks. I know at our signshop I get stressed when I can't get our printers to work right, especially when I am not quite sure what the problem is. I couldn't imagine the stress of a hard to pin down problem affecting hundreds of web sites.
What OS are the servers running on?
I don't notice the reboots, but I am only using Unixshell for a personal site, development, and a general sandbox. I am barely using the 196 plan that I have. I am very happy with the service though. I just don't trust Xen for anything else until it becomes mainstream.
indigo
05-08-2006, 08:17 PM
I really do hope, for all the subtle insults that seem to be here, that you guys really know what you are doing when you give matta suggestive grief.
anonet: any **production** system should be able to be recovered via scripting. What service are you running that can't be scripted?
augustz: Red Hat doesn't support the 2.6.16-xen kernel. The only people that can help unixshell out are XenSource. Red Hat frowns on 3rd-party packages and tainted kernels....
We ALL know WHAT Xen is by now, who supports it, and its infancy in the market it's growing into. For the price we pay we get a fantastic service that is near impossible to find elsewhere currently. If you guys really want what you're asking for (a team of RHCEs stroking your server daily), go to rackspace.com and pay $200/month instead of $20.
anonet
05-09-2006, 02:48 PM
[QUOTE=indigo]I really do hope, for all the subtle insults that seem to be here, that you guys really know what you are doing when you give matta suggestive grief.
anonet: any **production** system should be able to be recovered via scripting. What service are you running that can't be scripted?[/QUOTE]
I am not going to tell you what I am or am not doing with my VPS. However, I will give you an example: an encrypted filesystem. A scenario where a key must be uploaded every time the box is rebooted to mount an encrypted fs.
There are plenty of production scenarios that can't (or you would rather not) script.
When I first got my VPS here I had 35 days of uptime. Over the past week I haven't been able to keep more than a day's worth. As has been stated by others on here, it doesn't matter that we pay less than someone who is getting a dedicated server or paying for colo space. We should have the same reliability. I don't believe it says anywhere in the TOS that: "We reserve the right to reboot your VPS when we feel like it. Oh and by the way, we have no SLA". As a matter of fact (and please someone correct me if I am wrong) I believe they tout the fact that they have 100% uptime on the network and 99% uptime on the VPS.
[QUOTE=matta]If that's the case and you're so unhappy with the service I can't fathom why you still have an account with us.[/QUOTE]
I still have an account here because I had those 35 days of uptime before the upgrades. I know eventually things will be smoothed out.
Bottom line indigo, don't pretend to know what people are and are not doing with their boxes.
matta
05-09-2006, 03:02 PM
[QUOTE=anonet] I don't believe it says anywhere in the TOS that: "We reserve the right to reboot your VPS when we feel like it. Oh and by the way, we have no SLA". As a matter of fact (and please someone correct me if I am wrong) I believe they tout the fact that they have 100% uptime on the network and 99% uptime on the VPS.[/QUOTE]
No, of course not. Although we do have a hard SLA in the TOS. It's pretty much industry standard type and details exactly what you receive in event of excessive downtime.
Of course, I'd personally rather never see another server reboot, ever. Having this current random reboot problem is not fun on my side, or for my pager either, and I can only keep saying we are working on the problem. It's just an odd one in that we need to perform trial-and-error testing.
kenjb
05-10-2006, 03:25 AM
indigo, you appear to be more insulting in one single post to more people than any post I have read on this forum.
I am happy that my server reboots and comes up on its own. I hardly notice reboots other than the fact that I get an email each time my site comes back up from a random reboot. If it were otherwise, I'd be spending as much time as possible trying to correct my own boot issues. I believe the effort in trying to keep us (customers) informed seems to have increased and I am happy to see the new posts by unixshell staff.
Thanks guys and girls at unixshell.
and to think I had an unusual day today, I've managed to squeak out some nice words to others.
I hope that a google search doesn't reveal what I've said, and authorities come to my home and bust down my door, take my computer and throw me in jail. Wait, it's a free country still, that stuff only happens in China, I hear. whew! Close one.
;-)
ztnews
05-10-2006, 02:25 PM
[QUOTE=kenjb]
I am happy that my server reboots and comes up on its own. I hardly notice reboots other than the fact that I get an email each time my site comes back up from a random reboot.
;-)[/QUOTE]
My Ubuntu server on VM22 hasn't been rebooting by itself after these outages. Any tips on how to have it do that? I'm sure I can find it somewhere, but I hadn't even thought about it until last night. Thought I'd ask here. Thx
PS to my own post: unixshell support tells me my host (VM22) did not restart, so the trouble with my server last night was something else. Possibly a network problem of some sort. But when I did get my Teknic console back (which wasn't very long, BTW) my server was down. Even if I had crashed my VPS, I should expect Teknic to work. And I'm not sure why a network problem would crash my VPS. Well, something to keep an eye on.
sebyte
05-10-2006, 04:20 PM
I think I'm right in saying VPSes reboot automatically provided everything's working correctly, i.e., there's no user-configurable setting that affects this.
kenjb, do you have an init-script in place that emails you in the event of a reboot?
sdt
limey
05-10-2006, 09:08 PM
I'm finding it very awkward at the moment. Unfortunately, when I'm at work I'm unable to restart my server with teknic due to firewall restrictions. Because my server is going down nearly every other day, and is often not bringing itself back up again, the production website I'm running on it is often down for hours until I can get home to my unfiltered internet connection and reset it.
This is very bad indeed, and although I like unixshell and have been very impressed in the past, I'm getting flak from other people for having sites down.
kenjb
05-10-2006, 10:52 PM
I'm not sure what it takes to get an Ubuntu VM to start up on its own. I hope you find out though, it sure makes life easier when it does.
http://puny1.com/temp/uptime.jpg
As far as getting the email when my VM reboots, MySQL sends out an email telling me about corrupt tables or at least about open threads when the system comes back up.
kenjb
05-10-2006, 11:16 PM
It looks like this is the last part of the script that sends out the email when my VM boots. It looks like the happy part. Could probably spend a little time on this and get it to send out a file that says "boot!", assuming you have mailx installed. Looks like it drops a note in the log too, but you could remove that really.
If you get it running, you should post it somewhere.
if [ -s "$tempfile" ]; then
    # Append an explanatory note and the current connection list to the report.
    (
        /bin/echo -e "\n" \
            "Improperly closed tables are also reported if clients are accessing\n" \
            "the tables *now*. A list of current connections is below.\n";
        $MYADMIN processlist status
    ) >> "$tempfile"
    # Mail the report to root, then drop a copy in syslog.
    mailx -e -s"$MYCHECK_SUBJECT" root < "$tempfile"
    (echo "$MYCHECK_SUBJECT"; cat "$tempfile") | logger -p daemon.warn -i -t$0
fi
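For the "boot!" mail idea, something along these lines might work. It is only a minimal sketch, assuming mailx and logger are installed; the function names and the BOOT_RCPT variable are made up for illustration and are not part of the MySQL script.

```shell
# Hypothetical boot notifier; BOOT_RCPT and both function names are assumptions.
BOOT_RCPT=${BOOT_RCPT:-root}

# Build the one-line message. Arguments: host name, time stamp.
boot_message() {
    printf 'boot! %s came back up at %s\n' "$1" "$2"
}

# Mail the message if mailx is installed, and always leave a note in syslog.
notify_boot() {
    msg=$(boot_message "$(hostname)" "$(date '+%Y-%m-%d %H:%M:%S %Z')")
    if command -v mailx >/dev/null 2>&1; then
        echo "$msg" | mailx -s "VPS rebooted: $(hostname)" "$BOOT_RCPT"
    fi
    logger -p daemon.notice -t boot-notify "$msg"
}
```

Calling notify_boot from something that runs late in startup (for example /etc/rc.local, on distributions that provide it) should then send one mail per boot.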
indigo
05-11-2006, 05:40 AM
[QUOTE=kenjb]indigo, you appear to be more insulting in one single post to more people than any post I have read on this forum.
[/QUOTE]
Sorry, all...
Rereading this, it was a little harsh, but I guess my attitude comes from hearing tons of junk about UnixShell and Matt on WHT....
I'm in love with this server and I'm a big advocate...just sometimes too extreme :-)
-matt
matta
05-11-2006, 03:06 PM
I think the random reboots should be a thing of the past. I had suspected it was related to SCTP and recently removed all SCTP support from our kernels. Over the past week most of the kernel.org bugfix releases have been related to SCTP..
Things seem to have smoothed out, though those still running on an older kernel may be affected, but only once as the host will then be on the new kernel.
anonet
05-11-2006, 05:57 PM
...do you mean the 2.6.12 kernel?
I switched my VPS back to 2.6.12 hoping it would alleviate some of the problem (yes I realize it was more a host kernel problem, but figured it wouldn't hurt). So should I move back to the 2.6.16 kernel?
--Brian
matta
05-11-2006, 06:02 PM
The guest kernel shouldn't be affected. It's the host kernel that has been modified. A guest can never make the host crash, just things on the host such as its own kernel and xenstore/tdb corruption.
achillе
05-12-2006, 04:50 AM
Ok, mine just rebooted once more 3 hours ago. (VM22) ... anything I could do to stop those reboots?
osierra
05-12-2006, 11:59 AM
My virtual host in VM12 also rebooted a few hours ago.
achillе
05-13-2006, 07:55 PM
just had yet another reboot on vm22, this one was 4 minutes ago. Can I request my shell be moved?
Do we qualify for the 5% discount yet?
> If server downtime exceeds .5% in a given month,
> the Customer will be credited 5% of their monthly
> hosting fee and an additional 5% for each additional
> 2 hours of downtime up to 50% of the Customer 's
> monthly hosting fee. (http://www.unixshell.com/tos.html )
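As a rough worked example of that clause: in a 30-day month (720 hours) the 0.5% threshold comes to 3.6 hours. The sketch below assumes that only complete 2-hour blocks past the threshold earn the extra 5%; the function name is hypothetical and the TOS may well be read differently.

```shell
# Hypothetical helper: estimate the SLA credit as a percent of the monthly fee.
# Usage: sla_credit_percent DOWNTIME_MINUTES [DAYS_IN_MONTH]
sla_credit_percent() {
    downtime_min=$1
    days=${2:-30}
    # 0.5% of the month, kept in thousandths of a minute to avoid fractions:
    # days * 1440 minutes * 0.005 * 1000 = days * 7200
    threshold=$((days * 7200))
    used=$((downtime_min * 1000))
    if [ "$used" -le "$threshold" ]; then
        echo 0
        return
    fi
    # Assumption: only complete 2-hour blocks (120 minutes) past the threshold count.
    blocks=$(( (used - threshold) / 120000 ))
    credit=$((5 + 5 * blocks))
    # The clause caps the credit at 50% of the monthly fee.
    if [ "$credit" -gt 50 ]; then
        credit=50
    fi
    echo "$credit"
}
```

Under those assumptions, 217 minutes of downtime in a 30-day month would give 5%, and 336 minutes (the threshold plus one full 2-hour block) would give 10%.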
matta
05-13-2006, 08:16 PM
Read the random reboot outages thread, I documented things there.
cdenneen
05-13-2006, 08:18 PM
[QUOTE=achillе]just had yet another reboot on vm22, this one was 4 minutes ago. Can I request my shell be moved?
Do we qualify for the 5% discount yet?
> If server downtime exceeds .5% in a given month,
> the Customer will be credited 5% of their monthly
> hosting fee and an additional 5% for each additional
> 2 hours of downtime up to 50% of the Customer 's
> monthly hosting fee. (http://www.unixshell.com/tos.html )[/QUOTE]
5% sounds good to me :D
limey
05-16-2006, 03:39 PM
Random Reboot for me. Yet again. This is now being accompanied by an ear bashing from the guy whose site I run, who is rapidly losing patience.
I'm rapidly losing patience now too........
I appreciate that this is a cheap service, but also appreciate that it's one hosted on Open Source technology, and one that is doing well judging by the explosive increase in size since I joined.
Sometimes it's a good idea to undertake managed growth periods however because otherwise this sort of thing can happen. Please try to ensure that 'upgrades' are fully tested in the future before being deployed to production use, even if new technologies look even more profitable and attractive to Unixshell.
I really used to love Unixshell, but my fond memories of your previous excellent service aren't going to be enough to stop me from moving on if the service keeps being this unstable.
Sorry
Chris
astleycoder
05-16-2006, 09:12 PM
[QUOTE=limey]Random Reboot for me. Yet again. This is now being accompanied by an ear bashing from the guy whose site I run, who is rapidly losing patience.
I'm rapidly losing patience now too........
I appreciate that this is a cheap service, [/QUOTE]
Rule 101 of web hosting: you get what you pay for. If you get a cheap service, then you may encounter problems. If your client complains to you about the level of service he or she has paid you for, then it is your responsibility to provide it. If you do want to provide your customers with a reliable service, then you need to pay for it. This is why hosting companies have clustered networks with multiple web servers, to offer this kind of redundancy. I know of several people in the UK who charged hundreds of pounds for web hosting, promising 100% uptime with SLAs, while the service they pay for only offers 99.5%. They believe that the hosting company is at fault, when they were the ones who charged for something they could not offer.
I offer my clients web space and email hosting, but I say that I can't guarantee the uptime. I try to get them to take up hosting elsewhere. I accept it's a pain when suddenly your SSH session dies, or your website or email stops working, but over the last year or so, I have come to the conclusion that I get what I pay for, and I am happy with it. Matt has been a star helping me move over from Tektonic to Unixshell, and I appreciate it.
jgillmanjr
05-17-2006, 01:20 AM
[QUOTE=astleycoder]Rule 101 of web hosting. You get what you pay for.[/QUOTE]
Yeah, but I do expect something a little more than constant typelag or downtime for what I'm paying.
astleycoder
05-17-2006, 07:09 AM
[QUOTE=jgillmanjr]Yeah, but I do expect something a little more than constant typelag or downtime for what I'm paying.[/QUOTE]
Constant typelag? There's a great big ocean between vm30 and me, and I don't get constant typelag. Maybe this is either an issue with your VPS or your ISP, or you may be exaggerating a little.
[QUOTE=astleycoder]Constant typelag? There's a great big ocean between vm30 and me, and I don't get constant typelag. Maybe this is either an issue with your VPS or your ISP, or you may be exaggerating a little.[/QUOTE]
he's on vm4 ;p
mihel
05-17-2006, 04:19 PM
[QUOTE=wjr]he's on vm4 ;p[/QUOTE]
I'm on VM4, and sitting on the opposite side of the Earth everything works great for me.
Except for that recent power issue - it was really weird.
Especially when www.unixshell.com went down along with my VM. Started to think they had decided to flee to Mexico with all our money.
do you do a lot of interactive shell operations requiring non-cached reads and writes?
10-15 seconds to open a new file with vi, or to list a directory seems excessive :)
mihel
05-17-2006, 10:47 PM
[QUOTE=wjr]10-15 seconds to open a new file with vi, or to list a directory seems excessive :)[/QUOTE]
It never takes me that long... Maybe you're out of ram?
I get lags sometimes, but only when free reports 0 memory left and reasonable swap usage. Lags are 2-3 seconds though.
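For anyone wanting to check the same thing on their own VPS, here is a minimal sketch that reads the figures from /proc/meminfo. The function name is made up for illustration, and it only looks at MemFree and swap, not at buffers or cache.

```shell
# Hypothetical helper: report free RAM and used swap in kB.
# Reads /proc/meminfo by default; pass another file in the same format to test it.
mem_summary() {
    awk '
        $1 == "MemFree:"   { free = $2 }
        $1 == "SwapTotal:" { stotal = $2 }
        $1 == "SwapFree:"  { sfree = $2 }
        END { printf "free_kb=%d swap_used_kb=%d\n", free, stotal - sfree }
    ' "${1:-/proc/meminfo}"
}
```

Running mem_summary while the shell feels laggy should show whether free memory is near zero and how much swap is in use at that moment.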
[QUOTE=mihel]It never takes me that long... Maybe you're out of ram?
I get lags sometimes, but only when free reports 0 memory left and
reasonable swap usage. Lags are 2-3 seconds though.[/QUOTE]
no. i never ever use swap. using swap is death on vm4.
maybe jgillman is out of ram, perhaps you mean.
jgillmanjr
05-17-2006, 11:38 PM
[QUOTE=wjr]no. i never ever use swap. using swap is death on vm4.
maybe jgillman is out of ram, perhaps you mean.[/QUOTE]
God knows my stuff isn't exactly optimized, but I've had this thing run like a raped ape before in the same configuration.