Web1 Outage
Thursday, May 27th, 2010
We are aware of an outage on Web1 in London. We are currently awaiting a hard reboot. ETA 20-30 mins.
Update 1:05AM
The server returned to normal service at 1:01AM

We are aware of an outage on server daedalus which hosts both email and a small number of websites. The server stopped responding at 09:10.
A hard reboot has been performed but the box has failed to return to normal operation. We are investigating further.
Update 09:36
Service was restored at 09:33. (The system performed a chkdsk because of the hard reboot.)
We are still investigating why the box stopped responding.
We are aware that web1.gbdns.net has stopped responding to web requests. Although the box appears to be online, it is no longer responding to RDP access requests. A ticket has been raised with the DC for the box to be hard rebooted. This will be done within 10 mins.
The server is also the primary MySQL 4/5 server for our hosting cluster so all sites using MySQL will be offline until the reboot is completed.
Update 12:56
All services were restored at 12:37. At the moment the cause of the failure is not clear. No errors are showing in the logs and the RAID array is not displaying any issues. DNS requests were still being served by the server even when HTTP failed, i.e. the server was active but we couldn’t log in.
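The symptom above, where DNS kept answering while HTTP and RDP did not, is the kind of partial failure a per-service check can flag. Below is a minimal sketch in Python, assuming plain TCP connect probes. The function names and the port list are illustrative assumptions and do not describe our actual monitoring.

```python
import socket

# Ports are assumptions for illustration: DNS (53), HTTP (80), RDP (3389).
SERVICES = {"dns": 53, "http": 80, "rdp": 3389}

def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def classify(results):
    """Summarise a {service: bool} map as 'up', 'down' or 'partial'."""
    if all(results.values()):
        return "up"
    if not any(results.values()):
        return "down"
    return "partial"

def check_host(host, services=SERVICES):
    """Probe every service on a host and return (summary, per-service results)."""
    results = {name: probe(host, port) for name, port in services.items()}
    return classify(results), results
```

Calling check_host("web1.gbdns.net") during an incident like this one would be expected to report "partial" rather than "down". A successful TCP connect only shows that the port accepts connections, not that the service behind it is healthy.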
We are aware of an outage at KSP which is causing the majority of our services to be offline. We are awaiting an update on the issue.
Update 13:15
The current outage is being caused by a similar issue to the outage on the 24th November 2009.
Estimated time for a fix is 2PM GMT.
Update 14:15
Latest update from the DC
09/12/2009 @ 14:15: Power has been restored to the UPS units, although the two rooms have had to be split onto their own separate systems. As a result the UPS units are reporting a "bypass not available" fault. We have the UPS manufacturers onsite, who are going to resolve this by removing the parallel cards from both sides.
Update 14:35
Power was restored at 14:34. All servers were online by 14:40.
A full RFO will be issued later today.
We are aware of an outage at Kent causing multiple servers to be offline. We are awaiting more info on the problem.
Update 25/11/2009 00:25:
We have received an update on this issue. The current outage is power related. We do not have an ETA on a fix at the moment.
Update 25/11/2009 04:00:
Power has been restored, all servers are now operational. A full RFO will be issued later on today.
Again web1.gbdns.net is experiencing network issues. After the last outage we made the secondary NIC interface live which is still responding so we believe the primary NIC is faulty.
IPs are being moved from one NIC to the other at the moment and we are awaiting a VLAN update from the network provider.
Service should be restored within 5-10 mins at most.
Update 12:45
Service was fully restored at 12:39.
We do apologise for the length of the outage, but we chose to fix the issue fully rather than apply a partial fix and need another outage later to finish the job.
We are currently investigating an outage of web1.gbdns.net.
This looks to be NIC related.
Update 22:48
A KVM was attached to the server; the primary NIC wasn’t passing any traffic in or out. Disabling and re-enabling the NIC didn’t solve the problem, so we had to perform a reboot.
As a precaution we are going to enable the secondary NIC interface so that, if the first fails again, we can still access the server and solve the problem.
We are aware of web1 and web2 along with mail and mail2 being offline at the moment. This is currently being investigated.
This only affects Windows 2003 shared / reseller hosting customers.
Update 8:07am - Everything appears to be back online, total downtime 9mins. We are currently awaiting an RFO.
Update 8:15am - Both boxes appear to be offline again; we have escalated this back to the network provider for resolution.
Update 8:26am - Both boxes are back online, we are waiting to hear if this issue has been resolved or if it is still ongoing.
Update 8:59am - Unfortunately everything is up and down at the moment. We have received a notification from our network provider that the issues are caused by one of their routers failing to detect that the second router in the VRRP pair had failed, so the working router was still routing traffic to a dead router.
Update 10:08am - We still don’t have an ETA from our network provider on fully resolving this issue. We have been informed that this issue is known to the router manufacturer and we are awaiting a further update.
Access to our 193.219.118.X block of addresses has been restored but pings are around 17ms above normal. This means web2 and email are functioning again.
Access to our 195.74.55.X block of addresses is still unavailable.
We will continue to update the status page when we have more info.
Update 12:15 - All services have been restored. A full RFO will be sent to all customers later on today.
All email held at the backup mail servers is being processed, but it may take 1-2 hrs for this to arrive in users' inboxes.
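The routing fault described in the 8:59am update comes down to failure detection within a VRRP pair. As a rough illustration only: in VRRPv2 (RFC 3768) a backup router declares the master dead when no advertisement arrives within the Master_Down_Interval, which is three advertisement intervals plus a priority-based skew time. The sketch below computes that interval and applies it. The BackupRouter class and the example priority are hypothetical, and nothing here describes how our provider's routers are actually configured.

```python
def master_down_interval(priority, advert_interval=1.0):
    """VRRPv2 (RFC 3768): 3 * Advertisement_Interval + Skew_Time, in seconds."""
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be between 1 and 254")
    skew_time = (256 - priority) / 256
    return 3 * advert_interval + skew_time

class BackupRouter:
    """Hypothetical, simplified backup router tracking the last advertisement."""

    def __init__(self, priority, advert_interval=1.0):
        self.timeout = master_down_interval(priority, advert_interval)
        self.last_advert = 0.0
        self.state = "backup"

    def on_advertisement(self, now):
        # Record when the master was last heard from.
        self.last_advert = now

    def tick(self, now):
        # Take over once the master has been silent for longer than the timeout.
        if self.state == "backup" and now - self.last_advert > self.timeout:
            self.state = "master"
        return self.state

router = BackupRouter(priority=100)
router.on_advertisement(10.0)
print(router.tick(13.0))  # still within the timeout
print(router.tick(13.7))  # master silent for longer than the timeout
```

With a priority of 100 and the default one-second advertisement interval, the timeout works out to a little over 3.6 seconds. A takeover that never fires would be consistent with traffic continuing to flow towards a dead router, as reported above.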
We are currently experiencing an outage on all servers located in Kent. This includes primary email, all websites hosted on web2.gbdns.net and web3.gbdns.net.
Update: 00:01 - Service has been restored; we are awaiting an RFO.
Update: 11:55 - RFO Below
We suffered a loss of connectivity to the F25 data-centre on the Kent Science Park at around 23:20hrs for just over half an hour. The problems leading up to the loss of service started at around 22:00hrs when the Kent Science Park campus started receiving an Over-Voltage from the national grid of 275V-AC.
The UPS equipment in the F25 data-centre where customer services are located is capable of dealing with an Over-Voltage and has been pulling the voltage down to 230V for the last three hours without issue.
The UPS equipment located in building B300 where the fibre ring terminates with our connectivity provider is also capable of dealing with an over-voltage, but not exceeding 260V-AC. As a result their UPS system switched to battery followed shortly by generator at around 22:00hrs. Unfortunately at 23:20 they suffered a catastrophic failure of a UPS which took out connectivity. Power was restored by isolating the failed UPS from the cluster around half an hour later which restored service.
The campus is still receiving an Over-Voltage of 275V-AC as I write this announcement. As a result KSP should be considered at risk while the national grid works to restore the correct incoming voltage.
Going forward direct fibre to the F25 Data-centre is currently being dug, which will terminate in Telehouse North. Once this is completed we will no longer have electrical reliance on building B300.
We must apologise for any inconvenience caused this evening and will ensure we do everything possible to prevent a reoccurrence.
An update will be issued once incoming power voltage has returned to normal.
Update: 13:00
Incoming voltage was restored to 230V by the national grid at around 7am this morning. As a result building B300 is now running from mains, so our at-risk period has ended.
In the short term we have been assured by our connectivity provider that they are investigating, with the help of their UPS provider, how one of their UPS units failed shortly after their generator started.
Additionally we have been given an estimated completion time of approximately 8 weeks from NEOs who are installing the secondary circuit to Telehouse North. This will remove our electrical reliance on the ring which terminates in B300. This will also give diversity from our Telehouse East core.
All servers located in Kent, including primary email and managed servers, are currently offline. We are investigating the issue and will provide an update as soon as possible.
Update 22:30
Service was restored at 22:23 (17m 59s downtime).
An RFO will be added shortly.
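To put a downtime figure such as the 17m 59s above into context, it can be expressed as availability over a reporting period. This is a minimal sketch only; the 30-day period is an assumption chosen for illustration and is not a statement of any SLA term.

```python
from datetime import timedelta

def availability_percent(downtime, period=timedelta(days=30)):
    """Percentage of the period that was not lost to downtime."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    # Dividing one timedelta by another yields a plain float ratio.
    return 100.0 * (1 - downtime / period)

# Downtime taken from the 22:30 update; the 30-day period is assumed.
outage = timedelta(minutes=17, seconds=59)
print(f"{availability_percent(outage):.3f}%")
```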