View Full Version : Uptime Guarantee


Dr. Bob
01-31-2006, 05:17 PM
According to the Home link on the main unixshell# web site the features of your hosting include

99.9% network uptime guarantee. [View Report]
99.5% server uptime guarantee.

Currently the "View Report" link is reporting "Device Data Currently Unavailable"

By my own records, on Jan 27th my server was offline for just over 60 minutes because a default route was accidentally removed somewhere in the chain outside of my virtual machine. Today a GNAX router (209.51.131.102) went offline for just over 45 minutes, blocking all access to the unixshell# server farm. There is a lot of fault tolerance alluded to on the Technology page on the unixshell# web site. Specifically

"Our network consists of 6 gigabit links to 5 different premium Tier-1 providers with the 7th link peering with over 20 networks at the Atlantic Internet Exchange. The servers are located in Atlanta, Georgia where they are co-located in one of the premier hosting datacenters. Currently there are connections to Global Crossing, Level (3), Telia, Savvis, and BTN Access.

The network utilizes RouteScience architecture to constantly monitor and adjust BGP routes for the best routing based on latency and performance. This is much better than traditional BGP as it is able to detect and re-route around peering problems located on the Internet."

So I have three questions:

Why were we offline for 45 minutes from a single router being down at GNAX?

Why is the "View Report" link returning "Device Data Currently Unavailable", and when can we expect to see some accurate network uptime information?

What does the phrase "uptime guarantee" mean? Again by my calculations, 99.9% uptime allows for approximately 44 minutes of down time per month and in the past week, I've been offline for over 100 minutes. Should I be expecting a refund on my hosting fee for January?
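The arithmetic behind the "approximately 44 minutes" figure can be checked with a short sketch. It assumes a 31-day month (January) and treats the two reported outages as exactly 60 and 45 minutes, although both were described as "just over" those figures; the function names are illustrative only. It makes no claim about which guarantee each outage counts against.

```python
# Downtime budget implied by an uptime percentage.
# Assumption: a 31-day month (January) unless another length is passed in.
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_percent, days_in_month=31):
    """Minutes of downtime a month may contain while still meeting the guarantee."""
    total_minutes = days_in_month * MINUTES_PER_DAY
    return total_minutes * (1 - uptime_percent / 100)


def measured_uptime_percent(downtime_minutes, days_in_month=31):
    """Uptime percentage for a month with the given minutes of downtime."""
    total_minutes = days_in_month * MINUTES_PER_DAY
    return 100 * (1 - downtime_minutes / total_minutes)


if __name__ == "__main__":
    # Budget under the 99.9% network guarantee and the 99.5% server guarantee.
    print(round(allowed_downtime_minutes(99.9), 2))
    print(round(allowed_downtime_minutes(99.5), 2))
    # Uptime if both reported outages (60 + 45 minutes) are counted together.
    print(round(measured_uptime_percent(60 + 45), 3))
```

For a 30-day month the 99.9% budget shrinks slightly, to about 43 minutes.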

matta
01-31-2006, 05:21 PM
The link on the site is outdated. I will fix it. You can view the Alertra reports for our router at http://uptime.alertra.com/uptime.php?pin=507850&abrv=BRD1&color1=White&color2=DarkSlateGrey

Dr. Bob
01-31-2006, 06:53 PM
Matt:

I find the 100.000% uptime statistic somewhat suspect. Earlier today there was a 45 minute connectivity outage between gig0-1.l3-atl-1.gnax.net (209.51.131.6) and your network (209.51.131.102). Sorry, I don't have the traceroute output logged, but I'm sure someone at GNAX can confirm this.
(*edit* My bad, I see the Alertra stat was last updated Jan 28th.)

So if someone pulls a cable out of the router or changes a routing table to break the networking, is Alertra's monitoring supposed to pick this up?

"Uptime" means nothing if the environment isn't properly configured. What I reasonably expect as your customer is 99.9% network accessibility and 99.5% server availability. If my server is offline because of network issues for 45 minutes, I don't think you can count this as "network uptime".

I'm very pleased with the server configuration you are offering, but I'm concerned about the network down time in the past week. I've only been a customer for one week, so I have only this to judge expected future performance.

matta
01-31-2006, 08:24 PM
The "network" really is rock solid. Something such as the default route being removed would be specific to the host server and would fall under the server SLA. We use Alertra to monitor our router and monitoring server from off-site; they just happen to let us publish those reports, so we do. Next time, save a traceroute if you suspect a network problem. We have thousands of clients between our companies, and if there was a real problem and our entire network was down we'd be quickly flooded with tickets.
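For anyone who wants to follow the advice about saving a traceroute, here is a minimal sketch of one way to capture the output to a timestamped file. It assumes a Unix-like system with the `traceroute` command on the PATH (on Windows, `tracert -d` would be the rough equivalent); the function name, the `traceroute-logs` directory and the `runner` parameter are made up for illustration.

```python
import datetime
import subprocess
from pathlib import Path


def save_traceroute(host, log_dir="traceroute-logs", runner=subprocess.run):
    """Run traceroute against host and write its output to a timestamped file.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without touching the network.
    """
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # -n skips reverse DNS lookups so the trace finishes faster during an outage.
    result = runner(
        ["traceroute", "-n", host],
        capture_output=True,
        text=True,
        timeout=120,
    )
    out_dir = Path(log_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{host}-{stamp}.txt"
    # Keep stderr too, since some failures are only reported there.
    path.write_text(result.stdout + result.stderr)
    return path


if __name__ == "__main__":
    # 209.51.131.102 is the router address mentioned in this thread.
    print(save_traceroute("209.51.131.102"))
```

Running this from cron or by hand at the first sign of trouble leaves a file that can be attached to a support ticket.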

There _have_ been power issues at the datacenter. Our router and many of our servers were not affected by this. Our internal servers (www/mail/dns) were on one of the racks that were affected by GNAX's power problems.

matta
01-31-2006, 08:35 PM
FYI, the Alertra data is NOT "last updated" on the 28th. It states that it has been monitoring the device BRD1 SINCE 9-28-2005 (about 4 months).

Dr. Bob
02-01-2006, 05:01 AM
[QUOTE=matta]Next time save a traceroute if you suspect a network problem. We have thousands of clients between our companies and if there was a real problem and our entire network was down we'd be quickly flooded with tickets.
[/QUOTE]

Matt: Here's the last half of a normal traceroute into my server. The failed route was identical with the obvious exception of the failure between hop 16 and 17. It was reporting no route to host.


10 124 ms 142 ms 125 ms core2-newyork83-pos0-0.in.bellnexxia.net [206.108.103.214]
11 121 ms 118 ms 122 ms 64.230.223.122
12 133 ms 135 ms 124 ms so7-0-0-2488m.ar2.nyc1.gblx.net [208.50.13.129]
13 146 ms 146 ms 143 ms so0-0-0-9953m.ar4.atl1.gblx.net [67.17.68.230]
14 279 ms 377 ms 233 ms 206.41.25.230
15 169 ms 155 ms 173 ms atl-core-a-tgi2-1.gnax.net [209.51.149.105]
16 167 ms 157 ms 172 ms gig0-1.l3-atl-1.gnax.net [209.51.131.6]
17 172 ms 179 ms 179 ms 209.51.131.102
18 802 ms 166 ms 162 ms 72.9.242.61
19 344 ms 181 ms 181 ms synysys.com [207.210.78.216]


I was unable to ping a network device with a hop count of two from my hosted server, so I'm at a loss to explain why I seem to be the only person who noticed this outage. What I can say authoritatively is that for just over 45 minutes, device 209.51.131.6 was not communicating with 209.51.131.102.

I'd like to know what route the Alertra monitoring software was finding to hit 209.51.131.102 during this period, because that route wasn't being offered to me and I was already two hops into the GNAX Atlanta premises before I hit the failure. How many paths are there into the 209.51.131.102 router that don't involve 209.51.131.6 and why wasn't my traffic getting routed around the failed connection if an alternate route existed? I should add that during this same outage, Netcraft was unable to pull any stats off my server.

I don't mean to belabor the point; I'm just trying to understand why my service was interrupted.

matta
02-01-2006, 10:42 PM
Alertra monitors from 5 geographically diverse locations to account for routing issues on the global internet.

"No route to host" suggests that the route for your VM was removed. This has been happening lately and should be fixed soon. It comes from rebooting within the VM and not from Teknic.

jbw
02-02-2006, 02:00 AM
I am seeing a similar issue but a hop later at the moment:

traceroute to vs11.n2ip.net (207.210.85.121), 64 hops max, 44 byte packets
1 98.67-19-84.reverse.theplanet.com (67.19.84.98) 0.397 ms 0.353 ms 0.298 ms
2 vl7.dsr01.dllstx4.theplanet.com (67.18.116.65) 0.370 ms 0.384 ms 0.300 ms
3 vl41.dsr01.dllstx3.theplanet.com (70.85.127.81) 0.797 ms 0.621 ms 0.554 ms
4 70.85.127.37 (70.85.127.37) 0.502 ms 0.526 ms 0.674 ms
5 ge2-4.br01.dal01.pccwbtn.net (63.218.23.25) 0.692 ms 0.757 ms 0.680 ms
6 ge3-2.br01.atl01.pccwbtn.net (63.216.31.42) 19.731 ms 19.769 ms 19.658 ms
7 gnax.ge2-3.br01.atl01.pccwbtn.net (63.216.31.158) 20.672 ms 20.365 ms 20.367 ms
8 209.51.149.109 (209.51.149.109) 20.496 ms 20.290 ms 20.164 ms
9 65.254.48.2 (65.254.48.2) 23.101 ms 21.063 ms 22.158 ms
10 209.51.131.102 (209.51.131.102) 20.716 ms 20.437 ms 20.402 ms
11 209.51.131.102 (209.51.131.102) 3020.619 ms !H 3020.937 ms !H 3021.039 ms !H
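In Unix traceroute output, a `!H` after a probe time means the replying router reported the host as unreachable (`!N` is the network equivalent). As a rough sketch, assuming output in the format pasted above, the hops carrying those flags can be picked out like this; the function name is made up for illustration.

```python
import re

# A hop line starts with the hop number, followed by the rest of the line.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
# !H = host unreachable, !N = network unreachable.
UNREACHABLE_FLAG = re.compile(r"!(H|N)\b")


def unreachable_hops(traceroute_output):
    """Return (hop_number, first_field) for each hop flagged !H or !N."""
    hits = []
    for line in traceroute_output.splitlines():
        match = HOP_LINE.match(line)
        if match and UNREACHABLE_FLAG.search(match.group(2)):
            # The first field after the hop number is the responder's name or address.
            hits.append((int(match.group(1)), match.group(2).split()[0]))
    return hits


if __name__ == "__main__":
    sample = (
        "10 209.51.131.102 (209.51.131.102) 20.716 ms 20.437 ms 20.402 ms\n"
        "11 209.51.131.102 (209.51.131.102) 3020.619 ms !H 3020.937 ms !H 3021.039 ms !H\n"
    )
    print(unreachable_hops(sample))
```

On the trace above this flags hop 11, where 209.51.131.102 itself answers that the destination host is unreachable.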

matta
02-02-2006, 02:12 AM
You should submit a support ticket for that. No one here on the forum is going to be able to help you with adding a route on the host :)


(I know I check in here often, but the forum is by no means an official form of support.)

jbw
02-02-2006, 08:16 AM
I agree and I did, even before I posted. Still waiting for a reply to it though.

jbw
02-02-2006, 07:37 PM
[QUOTE=matta]You should submit a support ticket for that. No one here on the forum is going to be able to help you with adding a route on the host :)


(I know I check in here often, but the forum is by no means an official form of support.)[/QUOTE]

I am still waiting for them to act on my support ticket, btw. If it is just adding a route to the dom0 host, shouldn't that be done by now?