30M LE2, LE3, LE4 outage all day since 0.30am


Chrysalis
07-03-2012, 16:53
Major outage which tech support said was only on my port, but the 151 message says 3 postcode areas. A lot of areas covered by 1 port?

Currently the modem connects again but all TCP is frozen, no sites load etc. Massive loss on UDP.

You can see it in my first TBB link in sig.

---------- Post added at 16:53 ---------- Previous post was at 16:44 ----------

Automated message says 6pm fix but rep says the updated status is a 24-48 hour fix. They need to replace the chassis. O_o

craigj2k12
07-03-2012, 17:14
if thats the case then maybe load balancing will be sorted after the works

Chrysalis
07-03-2012, 17:16
I suspect they were doing speed doubling work and ball'sed it up.

VM have agreed to cover the cost of me using this 3G for the duration at least.

Also rebooted the modem and it can't connect again. So TBB will go back to all red.

qasdfdsaq
07-03-2012, 17:34
Curious, that's a severe degradation of service, sure, but not a complete outage. Doesn't look like a backhaul/core problem, but doesn't look too much like line/port issues either, except possibly a modulator/amplifier fault (i.e. something on the RF level). Mind PM'ing me your IP so I can do a trace?

Andrewcrawford23
07-03-2012, 17:36
If he's offline he's unlikely to have an IP address.

Chrysalis
07-03-2012, 17:37
I dont mind but right now the ip is offline as the modem wont connect.

although the loss is not 100% on tbb and when doing pings, I can confirm no TCP works at all so its effectively a complete outage even if the modem connects.

qasdfdsaq
07-03-2012, 17:40
Well if you PM me anyway I can try see if any other modems are online in the area, or at least see if the problem's local (which I guess it is)

And yes I'll agree if you can't get anything done it "feels" like an outage, but on the DOCSIS layer, something is getting through. In the digital world where everything is binary, I sometimes forget not all failures are all-or-nothing ;)

Andrewcrawford23
07-03-2012, 17:40
I dont mind but right now the ip is offline as the modem wont connect.

although the loss is not 100% on tbb and when doing pings, I can confirm no TCP works at all so its effectively a complete outage even if the modem connects.

If TBB is pinging it then you must have an IP. Get the IP from the thinkbroadband site. I'm pretty curious myself now, like qas.

craigj2k12
07-03-2012, 17:42
The modem isn't connected (that's the same as the modem being offline) so you can try pinging it but it ain't gonna work.

Chrysalis
07-03-2012, 17:43
tbb was pinging when the modem connected, it isnt right now.

not all of LE3 is offline either as my sister's net is fine. So the affected users are scattered across those postcodes.

I can ping my ubr ip fine from a server. But the ips I have tried so far in the same subnet dont respond. Although I didnt try many.

morley04
07-03-2012, 17:45
Im from LE3 and still online but been having a few problems lately

Andrewcrawford23
07-03-2012, 17:49
tbb was pinging when the modem connected, it isnt right now.

not all of LE3 is offline either as my sister's net is fine. So the affected users are scattered across those postcodes.

I can ping my ubr ip fine from a server. But the ips I have tried so far in the same subnet dont respond. Although I didnt try many.

What's the IP of the first hop off your network?

---------- Post added at 17:49 ---------- Previous post was at 17:48 ----------

the modem isnt connected (thats the same as modem is offline) so you can try pinging it but it aint gonna work

Chrys explains below your post :p I had misunderstood, I thought he meant it was pinging him when the modem was showing no IP.

qasdfdsaq
07-03-2012, 17:58
I've got several dozen IPs on the same subnet responding, but that's not what's confusing me... Trying to trace the actual cable and line card. It's a pain.

Inbound traffic to your IP goes via 3 different last hops before dropping:
leic-cmts-14-gigaether-21.network.virginmedia.net
leic-cmts-14-gigaether-141.network.virginmedia.net
leic-cmts-14-gigaether-151.network.virginmedia.net

You're on CMTS 14 (clearly) - cpc14-leic14-2-0-custXXX.8-1.cable.virginmedia.com. Of the custXXX on supposedly the same "chassis", most are getting ~25% packet loss. There are however a few that have 0% packet loss. Certainly not the whole "chassis" having a fault as they claim, probably more like a line card or amp gone bust.

Oh, and quite a few people have Superhubs with IP flood detection turned on ;)

Chrysalis
07-03-2012, 18:01
I can run traceroutes fine, its not a routing issue. Also if it was a routing issue the modem would still connect to the UBR without blinking for hours on end as well.

When I tried the superhub that also doesnt connect, it shows no upstream channel at all and only 1 downstream channel.

The vmng300 shows the upstream channel and 1 downstream channel.

qasdfdsaq
07-03-2012, 18:13
As above, based on my limited knowledge of VM's network it looks like something on the line-card level. One channel (or channel group) gone bust, which also explains the localized and sporadic occurrences. So yes, basically just "your port".

Chrysalis
07-03-2012, 18:17
If it is just my port, the number of postcodes gives an idea how many areas may have been crammed onto it. Also interesting that it's areas far apart from each other, unless maybe some other isolated ports are affected as well.

The 151 message now says 8.25pm.

qasdfdsaq
07-03-2012, 18:22
Not sure quite how it all works to be honest. Possibly several "ports" and channel sets go to each optical node, which then fans out several dozen copper splitter cabs. Then everyone in all the postcodes would be physically connected to all the upstream ports, but get allocated onto a single one depending on how VM do these things.

Or something else. I'm just thinking out loud.

Chrysalis
07-03-2012, 18:33
modem leds show connected again so you can try my ip.

still hosed, sites sit trying to load forever till aborted.

qasdfdsaq
07-03-2012, 18:40
Lots of packet loss still.

I'm not sure what the 'cpc' in a VM RDNS means. But people on a different 'cpcXX' to you are having problems, while not everyone on the same cpc is (e.g. cpc11-leic14-2-0...etc)

Chrysalis
07-03-2012, 18:43
What's interesting is sites don't time out, they show as connected but then just stay frozen in that state for hours until I forcefully abort. Same with ssh, it sits there in a connecting state but doesn't time out or send an error back.

qasdfdsaq
07-03-2012, 18:45
I suspect that's TCP trying to cope with a severely lossy connection. Since it does get data through once in a while it's not your normal "no connection" situation. It never gets enough through to do anything useful (the packet loss occurs several times a second) but it does get enough through to reset the timeout counter each time. The actual periods of packet loss are only ~10-20ms in length but it does drop all data in that time.
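
As a rough back-of-envelope check on the numbers in this post (a sketch, not a measurement): if each dropout lasts 10-20ms, the share of time spent dropping data is just burst length times bursts per second. The function names below are made up for illustration, the bursts are assumed not to overlap, and tying this to the ~25% loss seen on other modems on the same CMTS is an assumption.

```python
def loss_fraction(burst_ms, bursts_per_second):
    """Fraction of each second spent inside a loss burst, capped at 1.0.

    Assumes bursts do not overlap (an illustrative simplification).
    """
    return min(1.0, burst_ms / 1000.0 * bursts_per_second)


def bursts_per_second_for(loss, burst_ms):
    """Bursts per second needed to reach a given loss fraction."""
    return loss * 1000.0 / burst_ms


if __name__ == "__main__":
    # ~25% loss is the figure quoted earlier in the thread for other
    # modems on the same CMTS; 10-20ms is the burst length quoted above.
    for burst_ms in (10, 15, 20):
        rate = bursts_per_second_for(0.25, burst_ms)
        print(f"{burst_ms}ms bursts -> about {rate:.1f} bursts per second")
```

With 15ms bursts that works out to roughly 17 dropouts a second for 25% loss, which would fit "several times a second" and would explain why TCP keeps getting just enough through to avoid timing out.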

Chrysalis
07-03-2012, 18:51
agreed

---------- Post added at 18:51 ---------- Previous post was at 18:46 ----------

in last 3 hours 408 T3 timeouts and 16 T4 timeouts.

qasdfdsaq
07-03-2012, 18:55
What about FEC/HEC errors? How's your line stats?

If it's amp/modem errors on the line card I'd expect you'd see a lot of corrupt data. I'm not actually sure if the CPE requests a retransmit if it receives a corrupt packet from the CMTS.

Chrysalis
07-03-2012, 19:24
the codeword stats dont seem out of the ordinary

correctable codewords for DS channels in order
2 1 0 2
uncorrectable in order
285 299 296 273

---------- Post added at 19:01 ---------- Previous post was at 18:59 ----------

the channel ids have changed again today as well.

---------- Post added at 19:24 ---------- Previous post was at 19:01 ----------

LE1 also affected now according to 151 and estimated time is removed from message.

thenry
07-03-2012, 21:11
VM are sending a message! turn your thinkbroadband monitor upside down.

Honestly, are any of the stated places apparently ready for upgrade actually ready at all?

desi112
07-03-2012, 21:47
Sorry for the late reply, only just saw this post. I've had no issues all day in LE3 6, connection has been up all day.

I'm on CPC3-LEIC16

VM status page:

You will currently be experiencing a loss of broadband internet service. Our engineers are aware of this issue and are currently working to fix it fast. We apologise for any inconvenience this may be causing.
Date Issue raised: March 07 2012, 09:25
Estimated repair time: March 08 2012, 09:00
Fault reference: F001916713

Chrysalis
07-03-2012, 23:00
Well, VM seem to have proven themselves cowboys in this affair. Why I and anyone else affected haven't simply been moved off this cursed port I would like to know.

Chrysalis
08-03-2012, 10:13
my connection is back on now.

qasdfdsaq
08-03-2012, 10:46
Over a day without internet. I'm impressed you survived :D

Chrysalis
08-03-2012, 10:51
yesterday I had a lot of stuff to do outside anyway, but when I was back I used 3G.

qasdfdsaq
08-03-2012, 11:03
Didn't you have (used to have?) an ADSL line on backup and a multi-homed router?

Chrysalis
08-03-2012, 11:10
that stopped a while ago. :)

I killed it when I realised for emergencies I have 3G and VM also had improved.

But when FTTC is launched round here I will probably start the multihomed poor man's BGP setup again :)

Chrysalis
08-03-2012, 13:10
People on different channel groups were affected. My neighbour, who is on different upstreams to me, confirmed it, and I have now seen his graph too, as well as another connection on a different upstream that was also affected. I am hoping it was due to upgrade work being bodged, but given it now seems to have affected multiple ports that seems less likely.

My neighbour's graph, a few doors down (he is on different upstreams and usually has very good upstream performance).

http://www.thinkbroadband.com/ping/share-thumb/c54909461fca7b99cfc2a28ea6e96001-08-03-2012.png (http://www.thinkbroadband.com/ping/share/c54909461fca7b99cfc2a28ea6e96001-08-03-2012.html)

http://www.thinkbroadband.com/ping/share-thumb/185ae4b1e8114822270d3675cebbd88e-06-03-2012.png (http://www.thinkbroadband.com/ping/share/185ae4b1e8114822270d3675cebbd88e-06-03-2012.html)

qasdfdsaq
08-03-2012, 14:08
Wait, is the upper graph yours or your neighbours? It's silly that they all have the same name >_<

Also is/were they on a different downstream as well or just different upstream? It looks like a downstream problem.

thenry
08-03-2012, 14:15
The top one's his, qas, the bottom one's his neighbour's.

craigj2k12
08-03-2012, 14:30
They are both his neighbour's, look at his live graph, it's different.

---------- Post added at 14:30 ---------- Previous post was at 14:29 ----------

and to add, your live graph is looking better than beforehand chrys

thenry
08-03-2012, 15:16
so it is :o: whats going on with his?

craigj2k12
08-03-2012, 15:28
read this thread???

thenry
08-03-2012, 15:31
Yeah I know it's been botched, I meant a more technical explanation if at all possible. It still hasn't settled, unless the 'port' is overcrowded?

craigj2k12
08-03-2012, 15:32
yes it has

this is chrys' live graph
http://www.thinkbroadband.com/ping/share-thumb/b57b99f28b6b0dc40e4df3690c2401fe.png

Chrysalis
08-03-2012, 16:01
Wait, is the upper graph yours or your neighbours? It's silly that they all have the same name >_<

Also is/were they on a different downstream as well or just different upstream? It looks like a downstream problem.

Both graphs are my neighbours, mine is in my sig.

One was the day before the outage showing how it normally is over 24 hours, the other is today's.

I posted showing that it affected different ports also, but also how lame it is that 2 people on the same street get such a marked difference in service.

From what I can tell based on the channel frequencies, he is on a different set of upstream channels to me but on the same downstreams.

The jitter/latency differences are upstream related, not downstream. Even with that graph he doesn't very often get full speed, as the downstream channels are congested. He is on the 100mbit service and usually gets 15-50meg at peak and 80mbit or so off peak (when I say off peak I mean in the morning, as he never uses it during the night). Although I guess even with the same frequencies it is possible his downstreams are also different to mine.

Yeah the name is wrong, I have a few all marked with the same name as I copied and pasted. I will rename it so there is less confusion, and will also PM you his IP if you want to compare traceroutes with mine.


---------- Post added at 16:01 ---------- Previous post was at 15:59 ----------

they are both his neighbors, look at his live graph its different

---------- Post added at 14:30 ---------- Previous post was at 14:29 ----------

and to add, your live graph is looking better than beforehand chrys

It was until more modems came back online. Now it's looking more like before the outage. I expect within 48 hours it will be the same as it was before, maybe even worse as the sales guy will still be working. The channel IDs have changed but none of the frequencies have, and the IP is still the same as well.

AaronCooper
08-03-2012, 16:28
No outage for me, checked my tbb charts just to double check.

LE3, cpc5-leic16-2-0-cust137.8-1.cable.virginmedia.com

Chrysalis
08-03-2012, 16:31
yeah I have only found one other guy online who was also affected, someone who replied on VMs forums.

qasdfdsaq
08-03-2012, 16:36
No outage for me, checked my tbb charts just to double check.

LE3, cpc5-leic16-2-0-cust137.8-1.cable.virginmedia.com
You're on a completely different CMTS.

Chrysalis
08-03-2012, 16:37
How are you identifying the CMTS?

If I spoof my router MAC I can easily get into a different rDNS. I can even change to a different cpcXX. So I don't think that's a reliable way of identifying it.

craigj2k12
08-03-2012, 16:39
cpc14-leic14-2-0 compared to cpc5-leic16-2-0

I presume thats the difference?

Chrysalis
08-03-2012, 16:42
I just checked some of past ips I have been on.

current is cpc14
I have also been on cpc9 and cpc15 simply from forcing a dhcp change.

Although they are all leic14.

qasdfdsaq
08-03-2012, 16:54
cpcXXX doesn't seem to relate to your CMTS. The leic14 bit though, always correlates to the leic-cmts-14-gigaether-141.network.virginmedia.net on the inbound trace. Correspondingly, traffic to leic16 always goes through leic-cmts-16-gigaether-XXX.

I would assume then as leic16 = leic-cmts-16 and that "cmts-16" is explicitly stating the CMTS number, then that would be your CMTS.
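
To make that comparison less manual, here is a minimal sketch that pulls the site and CMTS number out of both kinds of hostname and checks whether they agree. The regex patterns are guessed purely from the hostnames quoted in this thread, not from any documented VM naming scheme, and the function names are just for illustration.

```python
import re

# Patterns inferred only from hostnames quoted in this thread (an assumption,
# not a documented naming scheme).
# e.g. cpc14-leic14-2-0-custXXX.8-1.cable.virginmedia.com
CUSTOMER_RDNS = re.compile(
    r"^cpc(?P<cpc>\d+)-(?P<site>[a-z]+)(?P<cmts>\d+)-\d+-\d+-cust\w+\.",
    re.IGNORECASE,
)
# e.g. leic-cmts-14-gigaether-141.network.virginmedia.net
CMTS_HOP = re.compile(
    r"^(?P<site>[a-z]+)-cmts-(?P<cmts>\d+)-gigaether-\w+\.",
    re.IGNORECASE,
)


def _site_and_cmts(pattern, hostname):
    """Return (site, cmts_number) or None if the hostname does not match."""
    match = pattern.match(hostname)
    if match is None:
        return None
    return (match.group("site").lower(), int(match.group("cmts")))


def cmts_from_customer_rdns(hostname):
    """Site and CMTS number from a customer rDNS name; cpcXX is ignored."""
    return _site_and_cmts(CUSTOMER_RDNS, hostname)


def cmts_from_hop(hostname):
    """Site and CMTS number from a last-hop CMTS hostname."""
    return _site_and_cmts(CMTS_HOP, hostname)


def same_cmts(customer_hostname, hop_hostname):
    """True only if both names parse and point at the same site and CMTS."""
    customer = cmts_from_customer_rdns(customer_hostname)
    hop = cmts_from_hop(hop_hostname)
    return customer is not None and customer == hop
```

The cpcXX number is deliberately left out of the comparison, since it changes with a DHCP change while the leicNN part stays put.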