Frequent packet loss inside VM network


play
13-09-2011, 17:12
Hi,

I seem to be getting this on a daily basis over the past week. Of course the system status page says everything's fine, despite packet loss (up to 70% at times) inside the VM network.

Here's some mtr output to news.bbc.co.uk of the problem that's occurring right now:


    Host                                        Loss%  Snt  Last   Avg  Best   Wrst  StDev
 1. 10.99.188.1                                  0.0%  126   8.0  12.7   5.5   56.4    8.2
 2. lutn-cam-1a-v134.network.virginmedia.net     0.0%  126   9.2  14.0   6.4   49.9    8.5
 3. lutn-core-1a-ae2-0.network.virginmedia.net   0.0%  126  47.3  14.0   6.3   52.5    9.1
 4. popl-bb-1a-as5-0.network.virginmedia.net     0.0%  126  10.4  18.3   7.5  152.0   19.1
 5. nrth-bb-1b-as3-0.network.virginmedia.net    37.6%  126  39.4  58.1  23.4  187.5   29.1
 6. nrth-tmr-2-ae6-0.network.virginmedia.net    36.8%  125  51.0  57.8  26.4  123.7   17.0
 7. tele-ic-1-as0-0.network.virginmedia.net     40.3%  125  43.9  56.1  26.1   88.6   14.2
 8. pos6-1.rt0.thdo.bbc.co.uk                   36.8%  125  56.2  61.2  29.7  208.6   24.0
 9. 212.58.238.129                              32.8%  125  63.4  60.9  31.3  177.7   18.9
10. te12-1.hsw0.cwwtf.bbc.co.uk                 41.9%  125  65.9  58.0  27.4   91.2   12.8
11. 212.58.255.12                               28.2%  125  53.4  58.7  25.1   95.1   11.9
12. bbc-vip005.cwwtf.bbc.co.uk                  31.2%  125  74.8  55.7  26.5   91.0   12.6


Why does this happen so often?

Thanks.

Sephiroth
13-09-2011, 17:22
MTR doesn't tell us whether packet loss is at a destination hop or a pass-through hop. If the latter, then it doesn't matter. That said, your final destination hop (BBC) shows 31.2%.

Anyway, we need to see a number of things:

1/
The results of PATHPING WWW.BBC.CO.UK which differentiates between destination & pass-through packet loss.

2/
Your full modem stats, including the Operational Config, so we can see if there's a discernible cause.

3/
The Modem Event Log (not after a reboot but after the PATHPING) so we can see some history to correlate with the stats.

It needn't be a problem inside the VM network. It could be local congestion; it could be circuit noise spoiling the ping in one direction or other.
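
One rough way to read an mtr report like the one above: loss that shows at a single hop but clears further along is usually just that router deprioritising its own replies, whereas loss that carries through to the final hop is more likely to be real. Here is a minimal Python sketch of that rule of thumb. The function names and the 1% threshold are assumptions made for illustration, and the sample rows are copied from the report in this thread. It cannot say in which direction the loss occurs.

```python
import re

# Matches the start of an mtr report row: hop number, host, loss percentage.
MTR_ROW = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+([\d.]+)%")

# Rows copied from the mtr report posted in this thread.
SAMPLE_REPORT = """\
1. 10.99.188.1 0.0% 126 8.0 12.7 5.5 56.4 8.2
2. lutn-cam-1a-v134.network.virginmedia.net 0.0% 126 9.2 14.0 6.4 49.9 8.5
3. lutn-core-1a-ae2-0.network.virginmedia.net 0.0% 126 47.3 14.0 6.3 52.5 9.1
4. popl-bb-1a-as5-0.network.virginmedia.net 0.0% 126 10.4 18.3 7.5 152.0 19.1
5. nrth-bb-1b-as3-0.network.virginmedia.net 37.6% 126 39.4 58.1 23.4 187.5 29.1
6. nrth-tmr-2-ae6-0.network.virginmedia.net 36.8% 125 51.0 57.8 26.4 123.7 17.0
7. tele-ic-1-as0-0.network.virginmedia.net 40.3% 125 43.9 56.1 26.1 88.6 14.2
8. pos6-1.rt0.thdo.bbc.co.uk 36.8% 125 56.2 61.2 29.7 208.6 24.0
9. 212.58.238.129 32.8% 125 63.4 60.9 31.3 177.7 18.9
10. te12-1.hsw0.cwwtf.bbc.co.uk 41.9% 125 65.9 58.0 27.4 91.2 12.8
11. 212.58.255.12 28.2% 125 53.4 58.7 25.1 95.1 11.9
12. bbc-vip005.cwwtf.bbc.co.uk 31.2% 125 74.8 55.7 26.5 91.0 12.6
"""


def parse_mtr(text):
    """Return a list of (hop, host, loss_percent) tuples from mtr report rows."""
    rows = []
    for line in text.splitlines():
        match = MTR_ROW.match(line)
        if match:
            rows.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return rows


def first_sustained_loss(rows, threshold=1.0):
    """Return the earliest hop from which every later hop, including the
    final one, shows at least `threshold` percent loss; None if the final
    hop itself is below the threshold."""
    first = None
    for row in reversed(rows):
        if row[2] < threshold:
            break
        first = row
    return first


print(first_sustained_loss(parse_mtr(SAMPLE_REPORT)))
```

On the report in this thread it picks out hop 5 (nrth-bb-1b-as3-0), the first hop where loss appears and then persists all the way to the BBC.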

play
13-09-2011, 17:38
I just used the BBC as an example; the problem is with every target. I don't believe I'm misinterpreting the mtr output, though. It just happened to be particularly bad at the point I measured (it has since resolved, so I can't replicate it right now), and packets are not making the return trip. Generally the loss only occurs within the VM network. I'll try to grab some more stats next time it happens.

I suppose it could be my modem (but then I would surely see loss from my CPE address and the 10/8 hop onwards), but it doesn't appear to have a management interface. Pathping is a Windows thing; I'm on Debian and OS X, where the equivalent is mtr.

Sephiroth
13-09-2011, 17:47
It's a shame you can't run pathping. However, bearing in mind that the packet loss could be in either direction, it may well be an impairment such as noise and thus not "in the VM network".

That said, as I'm sure you know, a busy VM network can discard pings when there is real data waiting. But again, the loss occurs at unexpected points such as the interconnect (7) and BBC.

That's why we'd want to see the stats and event log. Incidentally, when you post the stats, please tell us the time of day when that MTR was taken in case there are correlating events in the Event Log.

play
13-09-2011, 17:59
Will do. How do I get to the management page of a 255 modem? I'm not seeing any port 80/443 open when I nmap it.

Sephiroth
13-09-2011, 18:12
It's on IP address 192.168.100.1 from a web browser.

play
13-09-2011, 18:29
Thanks, don't think I've ever looked at that before.

Nothing in my event log for the past few days (except one DHCP renew warning on the 11th). I will check it next time it happens, more than likely tomorrow.

Cheers.

Sephiroth
13-09-2011, 18:34
Good. Just copy & paste the stats, operational config and the event log.

If you feel like it, you could keep track of your situation on an ongoing basis by setting yourself up at www.thinkbroadband.com/ping .

I keep all my stats, event logs and TBB graphs to help me correlate and prove to VM (if it becomes necessary) that what I say has merit!
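
If you also want a local record to set alongside those graphs, a small script can log the loss figure from the system ping at intervals. This is only a sketch under a few assumptions: it relies on a Unix-style `ping` that accepts `-c` and prints a summary line containing "% packet loss" (as the Debian and OS X versions do), and the function names are made up for illustration.

```python
import re
import subprocess

# Summary lines from Unix-style ping contain e.g. "0% packet loss".
LOSS_PATTERN = re.compile(r"([\d.]+)% packet loss")


def parse_loss(ping_output):
    """Extract the packet loss percentage from ping's output, or None if absent."""
    match = LOSS_PATTERN.search(ping_output)
    return float(match.group(1)) if match else None


def measure_loss(host, count=20):
    """Run the system ping against `host` and return the reported loss percentage.

    Assumes a ping binary on PATH that supports the -c option.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    return parse_loss(result.stdout)
```

Calling `measure_loss("news.bbc.co.uk")` from cron every few minutes and appending the result with a timestamp to a file would give a history to compare against the modem event log.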

play
13-09-2011, 19:17
Shame the forum doesn't take monospace as-is :)

Cable Modem Information

Cable Modem : DOCSIS 1.0/1.1/2.0 Compliant
MAC Address : 00:14:a4:xx:xx:xx
Serial Number : 0014A4XXXXXX
Boot Code Version : 3.1.6d
Software Version : 2.94.1015
Hardware Version : 1.19


Cable Modem Status

Item Status Comments
Acquire a Downstream Channel 403000000 Hz Locked
Connectivity State OK Operational
Boot State OK Operational


Cable Modem Downstream

Downstream Lock : Locked
Downstream Channel Id : 0
Downstream Frequency : 403000000 Hz
Downstream Modulation : QAM256
Downstream Symbol Rate : 5360.537 Ksym/sec
Downstream Interleave Depth : taps32Increment4
Downstream Receive Power Level : 3.6 dBmV
Downstream SNR : 41.2 dB


Cable Modem Upstream

Upstream Lock : Locked
Upstream Channel ID : 4
Upstream Frequency : 37500000 Hz
Upstream Modulation : QPSK
Upstream Symbol Rate : 2560 Ksym/sec
Upstream transmit Power Level : 42.0 dBmV
Upstream Mini-Slot Size : 2


Cable Modem Upstream Burst

Modulation Type                     QPSK   QPSK   QPSK   QPSK   QPSK
Differential Encoding               Off    Off    Off    Off    Off
Preamble Length                     64     128    128    100    80
Preamble Value Offset               396    6      6      396    396
FEC Error Correction (T)            0      5      5      3      9
FEC Codeword Information Bytes (k)  16     34     34     78     232
Scrambler Seed                      338    338    338    338    338
Maximum Burst Size                  0      0      0      35     254
Guard Time Size                     8      48     48     25     134
Last Codeword Length                Fixed  Fixed  Fixed  Short  Short
Scrambler on/off                    On     On     On     On     On


Cable Modem Operation Configuration

Network Access : Allowed
Maximum Downstream Data Rate : 10240000
Maximum Upstream Data Rate : 1072000
Maximum Upstream Channel Burst : 8160
Maximum Number of CPEs : 1
Modem Capability : Concatenation Enabled, Fragametation Enabled, PHS Enabled

Event Log

Sun Sep 11 18:24:01 2011 Sun Sep 11 18:24:01 2011 Warning (5) DHCP RENEW WARNING - Field invalid in response
Sun Sep 04 20:54:48 2011 Sun Sep 04 20:54:48 2011 Critical (3) Started Unicast Maintenance Ranging - No Response received - ...
Sun Sep 04 18:24:01 2011 Sun Sep 04 18:24:01 2011 Warning (5) DHCP RENEW WARNING - Field invalid in response
Sat Sep 03 17:25:24 2011 Sat Sep 03 17:25:24 2011 Critical (3) Started Unicast Maintenance Ranging - No Response received - ...
Thu Sep 01 06:24:01 2011 Thu Sep 01 06:24:01 2011 Warning (5) DHCP RENEW WARNING - Field invalid in response
Tue Aug 30 21:27:37 2011 Tue Aug 30 21:27:37 2011 Critical (3) Started Unicast Maintenance Ranging - No Response received - ...
Sun Aug 28 18:24:01 2011 Sun Aug 28 18:24:01 2011 Warning (5) DHCP RENEW WARNING - Field invalid in response
Sun Aug 28 17:09:58 2011 Sun Aug 28 17:09:58 2011 Critical (3) Started Unicast Maintenance Ranging - No Response received - ...
Thu Aug 25 06:24:01 2011 Thu Aug 25 06:24:01 2011 Warning (5) DHCP RENEW WARNING - Field invalid in response
Thu Aug 25 00:02:05 2011 Thu Aug 25 00:02:05 2011 Critical (3) Started Unicast Maintenance Ranging - No Response received - ...
Tue Aug 23 09:50:35 2011 Tue Aug 23 09:50:35 2011 Information (7) The s/w filename specified in the config file is the same as ...
Tue Aug 23 09:50:35 2011 Tue Aug 23 09:50:35 2011 Information (7) A software upgrade filename was specified in the config file.
Tue Aug 23 09:50:34 2011 Tue Aug 23 09:50:34 2011 Information (7) Authorized
Tue Aug 23 09:50:34 2011 Tue Aug 23 09:50:34 2011 Information (7) Registration complete!
Tue Aug 23 09:50:34 2011 Tue Aug 23 09:50:34 2011 Information (7) We registered with a DOCSIS 1.1 config file!
Tue Aug 23 09:50:34 2011 Tue Aug 23 09:50:34 2011 Information (7) Received a REG-RSP message from the CMTS...
Tue Aug 23 09:50:34 2011 Tue Aug 23 09:50:34 2011 Information (7) Sending a REG-REQ to the CMTS...
Tue Aug 23 09:50:34 2011 Tue Aug 23 09:50:34 2011 Information (7) CableModem SNMP configure complete
Tue Aug 23 09:50:34 2011 Tue Aug 23 09:50:34 2011 Information (7) IP init completed ok
Tue Aug 23 09:50:34 2011 Tue Aug 23 09:50:34 2011 Information (7) CableModem TFTP init ok
Time Not Established Time Not Established Information (7) CableModem DHCP client init ok
Time Not Established Time Not Established Critical (3) DHCP WARNING - Non-critical field invalid in response.
Time Not Established Time Not Established Information (7) MAP w/initial maintenance region received
Time Not Established Time Not Established Critical (3) No Ranging Response received - T3 time-out
Time Not Established Time Not Established Information (7) MAP w/initial maintenance region received
Time Not Established Time Not Established Information (7) Downstream sync ok
Time Not Established Time Not Established Information (7) MAP w/initial maintenance region received
Time Not Established Time Not Established Information (7) Beginning initial ranging...
Time Not Established Time Not Established Information (7) downstream time sync acquired...
Time Not Established Time Not Established Information (7) Downstream sync ok
Time Not Established Time Not Established Information (7) starting ds time sync acquisition...
Time Not Established Time Not Established Information (7) Locked on the downstream. Waiting for UCDs...
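
For matching a log like this against the time of a bad mtr run, something along these lines might help. It is a sketch only: the function names are invented for illustration, the sample lines are copied from the log above, and it assumes the modem clock and the time you noted are in the same timezone and that the timestamps parse in an English locale.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the modem log, e.g. "Sun Sep 11 18:24:01 2011".
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
STAMP_LENGTH = 24

# A few lines copied from the event log in this thread.
SAMPLE_LOG = """\
Sun Sep 11 18:24:01 2011 Sun Sep 11 18:24:01 2011 Warning (5) DHCP RENEW WARNING - Field invalid in response
Sun Sep 04 20:54:48 2011 Sun Sep 04 20:54:48 2011 Critical (3) Started Unicast Maintenance Ranging - No Response received - ...
Time Not Established Time Not Established Critical (3) DHCP WARNING - Non-critical field invalid in response.
"""


def parse_events(log_text):
    """Return (timestamp, message) pairs, skipping lines without a usable time."""
    events = []
    for line in log_text.splitlines():
        stamp = line[:STAMP_LENGTH]
        try:
            when = datetime.strptime(stamp, STAMP_FORMAT)
        except ValueError:
            continue  # e.g. "Time Not Established" entries from boot
        rest = line[STAMP_LENGTH:].strip()
        if rest.startswith(stamp):
            # The log repeats the timestamp in a second column; drop it.
            rest = rest[STAMP_LENGTH:].strip()
        events.append((when, rest))
    return events


def events_near(events, moment, window):
    """Return the events whose timestamp lies within `window` of `moment`."""
    return [event for event in events if abs(event[0] - moment) <= window]


# Time of the first post, taken here as the time of the mtr run (an assumption).
MTR_TIME = datetime(2011, 9, 13, 17, 12)
events = parse_events(SAMPLE_LOG)
print(events_near(events, MTR_TIME, timedelta(hours=2)))
```

With these sample lines nothing falls within two hours of the mtr run, which matches what the log shows by eye.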

Sephiroth
13-09-2011, 20:00
Thanks for that.

Cable Modem Upstream

Upstream Lock : Locked
Upstream Channel ID : 4
Upstream Frequency : 37500000 Hz
Upstream Modulation : QPSK
Upstream Symbol Rate : 2560 Ksym/sec
Upstream transmit Power Level : 42.0 dBmV
Upstream Mini-Slot Size : 2

At first sight, your problem appears to be that you are on QPSK upstream modulation. On a less noisy upstream circuit, your modulation would be 16QAM, which packs 4 bits per symbol presented to the error correction circuit at either end. QPSK is more noise-tolerant but carries only 2 bits per symbol on that channel.

This potentially has two effects:

1/
Your line is so noisy that packets don't reach their destination. However, your event log doesn't contain evidence of corrupted data, which would appear as serious numbers of T3 timeouts. This suggests that your circuit is coping with noise on QPSK modulation.

2/
Your upstream channel only packs 2 bits per symbol, and everyone on that channel is on the same modulation. The capacity of the channel is only half of what it would be at 16QAM, so it congests sooner; people don't stop using it as long as it appears to work. That sort of congestion on your upstream isn't in the VM network - it's between you and your entry point to the VM network.
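
To put rough numbers on point 2/, the raw bit rate of the upstream channel is just the symbol rate times the bits per symbol. A quick Python sketch using the 2560 Ksym/sec figure from the posted stats (the function name is made up for illustration, and the result is a raw figure before error correction and protocol overhead, so usable capacity will be lower):

```python
# Bits carried per symbol by each upstream modulation.
BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4}

# Upstream symbol rate from the posted modem stats, in ksym/sec.
SYMBOL_RATE_KSYM = 2560


def raw_channel_mbps(symbol_rate_ksym, modulation):
    """Raw channel bit rate in Mbit/s, before FEC and protocol overhead."""
    return symbol_rate_ksym * 1000 * BITS_PER_SYMBOL[modulation] / 1_000_000


for modulation in BITS_PER_SYMBOL:
    print(modulation, raw_channel_mbps(SYMBOL_RATE_KSYM, modulation), "Mbit/s")
```

That works out to 5.12 Mbit/s raw on QPSK against 10.24 Mbit/s on 16QAM, shared between every modem on the channel.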

Obviously I can't rule out something in the network, but seeing QPSK in your upstream makes me raise the proverbial eyebrow.

IMO you should post all of this (MTR, stats, event log) and your story on the VM Forum and wait for a tech to check your circuit. They'll do that anyway - they'll have the history and it won't matter if you haven't seen a naff MTR recently.

Hope that helps.

play
13-09-2011, 21:46
Great, thanks a lot for your assistance, I owe you a beer or something :)

I suppose, since this has only really been happening the past week or so, that any contention below those levels wouldn't have been apparent. It doesn't rear its head that much during normal browsing (there's a squid and caching BIND on my side), but is quite obvious over a VPN or ssh session, which I use heavily.

Ignitionnet
14-09-2011, 10:57
There's nothing wrong with your modem that I can see, play. However, this is concerning:

4. popl-bb-1a-as5-0.network.virginmedia.net 0.0% 126 10.4 18.3 7.5 152.0 19.1

5. nrth-bb-1b-as3-0.network.virginmedia.net 37.6% 126 39.4 58.1 23.4 187.5 29.1

I wonder if there's an intermittent problem between Poplar and Northampton? I don't see how the issue could be with your modem or local area given everything works as it should in between Luton and Poplar then suddenly goes wrong between Poplar and Northampton.

Forget everything else but post the MTR, that's the most convincing bit and points to an issue with VM's core network.