We expect bidirectional connectivity in any modality of communication. The only kind of unidirectional connectivity we would see is when we're making copies of traffic for things like security. Otherwise we typically see bidirectional communication. And we want to have good end-to-end connectivity both between devices that are in the same subnet and between devices that are connected over large geographic distances, which might mean that they're separated by a few switches and many routers. So if we're troubleshooting end-to-end connectivity, what sorts of things are we going to be thinking about?
There are quite a few different components that we're going to have to focus on here when it comes to end-to-end connectivity. First and foremost, what devices are along that path from end to end? What is at each end? In this particular case, we have our PC and we have our server. But there'll be switches, there'll be routers, there'll be physical cables, and there might be WAN connections and Ethernet connections. So there are all these different components that come into play in the overall troubleshooting strategy.
Verifying IPv4 connectivity
For troubleshooting to be effective, you have to have a strategy. You can't just troubleshoot haphazardly. If you have a flow that you can follow and a structured approach, it really makes your life a lot easier. Troubleshooting can be a very emotional process. There have been numerous cases where you might have been working on a problem and put your head down in disgust: oh, I'm never going to solve this problem. Well, there's always a solution to a problem, and having that structured approach will help you take that emotional aspect out of it.
I'll tell you something that isn't a good recipe for success in troubleshooting and that is, thinking that your device is somehow misbehaving. That something is wrong just because, let's say, the router operating system is performing wrong.
No, generally speaking we should assume that what we're seeing is an expression of the fact that maybe our configuration is a mess, or that the routing table is showing us the signs of the problem. But if we constantly get ourselves into a situation where we think all the programming is wrong, that the protocol isn't working right, it's probably going to steer us down the wrong path. To steer us down the right path, we want to be confident that we can solve any issue. And the reality is, we may not be able to solve every single issue, but if we have the confidence to say, you know what, I can tackle any issue, that's going to help you be successful in troubleshooting, even though this is one of those touchy-feely ideas.
So here's the flow we're going to follow here. Your flow may be different and that's okay. But we're going to follow this one, and in the end, if you piece together different components of this flow and your flow is different and it works for you, that's perfectly fine.
So our first stop in this flow is, end-to-end connectivity. Is it operational? Yes, we're done, there's nothing else we have to do. But how can we verify whether we really have that end-to-end connectivity or not?
Using ICMP protocol tools for troubleshooting
Well, we can look at the PC here, and there are typical tools that we can execute from a command prompt.
We can ping, which sends out Internet Control Message Protocol, or ICMP, echoes, and we expect an echo reply in return. So when I do this ping command, I get to see replies back. And if I see 4 replies, I have good connectivity.
Let's say I'm on Ethernet. Should we be terrified if the first ping is lost and then I see 3 successful ICMP echo replies in return? Let me clarify: with Ethernet at layer 2, we need MAC addresses in the frame. Well, if I don't know the destination MAC address, what do I have to do? I have to send an ARP request. I have to ARP. So if I lose that very first ping, it's because I was ARPing for that destination MAC address. And then if the second, third, and fourth are successful, that means the ARP reply came back and I was able to complete the ping packets that were going out. And we see a sign of that going on right here, where the first ping takes substantially longer to get a reply than the others.
C:\>ping 172.16.1.100
Pinging 172.16.1.100 with 32 bytes of data:
Reply from 172.16.1.100: bytes=32 time=7ms TTL=64
Reply from 172.16.1.100: bytes=32 time=1ms TTL=64
Reply from 172.16.1.100: bytes=32 time=1ms TTL=64
Reply from 172.16.1.100: bytes=32 time=2ms TTL=64
Ping statistics for 172.16.1.100:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 7ms, Average = 3ms
C:\>tracert 172.16.1.100
Tracing route to 172.16.1.100 over a maximum of 30 hops:
  1     1 ms    <1 ms    <1 ms  10.1.10.1
  2     3 ms     1 ms     3 ms  192.168.1.2
  3     5 ms     2 ms     2 ms  172.16.1.100
Trace complete.
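To make that ARP delay easier to spot, here's a small Python sketch that pulls the round-trip times out of Windows-style ping output and compares the first reply against the rest. The function names and the sample text are just for illustration, and a slow first reply is only a hint that ARP resolution was happening, not proof of it.

```python
import re

# Sample text modeled on the Windows ping output shown in this section.
SAMPLE = """\
Reply from 172.16.1.100: bytes=32 time=7ms TTL=64
Reply from 172.16.1.100: bytes=32 time=1ms TTL=64
Reply from 172.16.1.100: bytes=32 time=1ms TTL=64
Reply from 172.16.1.100: bytes=32 time=2ms TTL=64
"""

def reply_times_ms(ping_output):
    """Return the round-trip time of each reply line, in order.

    Matches both 'time=7ms' and the 'time<1ms' form Windows prints
    for sub-millisecond replies.
    """
    return [int(ms) for ms in re.findall(r"time[=<](\d+)ms", ping_output)]

def first_reply_is_slowest(times):
    """True when the first reply took longer than every later reply."""
    return len(times) > 1 and times[0] > max(times[1:])

times = reply_times_ms(SAMPLE)
print(times, first_reply_is_slowest(times))
```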
I start to worry when I see anything less than the latter 3 coming back as successful. If you were to see one success, one failure, one success, one failure, what could that be a sign of? So I'm getting a very consistent pattern in my ping output: success/failure, success/failure, success/failure, and I do this over and over again. Up arrow, press Enter, which also works at the DOS prompt. What would you be thinking about if you saw that from a lot of different clients on a VLAN, or in an entire site spread amongst multiple VLANs, going to a distant subnet?
Well it sounds to me like we have sporadic connectivity at this point. I'm able to successfully connect only 50% of the time. It might be possible that we have multiple paths and load balancing going on and some traffic is getting through and some traffic is not getting through. So we're probably not going to troubleshoot if it's like what we see here with this ping.
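The patterns we just talked about can be written down as a rough rule of thumb. Here's a minimal Python sketch, assuming each ping result has already been reduced to True (reply) or False (no reply). The function name and the wording of the hints are my own, and the hints are starting points for investigation, not a diagnosis.

```python
def classify_ping_results(results):
    """Turn a list of booleans (True = reply received) into a rough hint."""
    if not results:
        return "no data"
    if all(results):
        return "healthy"
    if not any(results):
        return "no connectivity"
    # Only the very first probe was lost: typical of waiting on ARP.
    if not results[0] and all(results[1:]):
        return "first packet lost (likely ARP resolution)"
    # Strict success/failure alternation over at least four probes.
    alternating = all(a != b for a, b in zip(results, results[1:]))
    if alternating and len(results) >= 4:
        return "alternating loss (check load-balanced paths)"
    return "intermittent loss"

print(classify_ping_results([False, True, True, True]))
print(classify_ping_results([True, False, True, False]))
```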
What might it mean if we do a ping and we see a 'U'? We see these on routers also. We get a U in return, so what does that mean to us? A 'U' means the destination is unreachable, so some router along that path has returned to us an ICMP packet that says "I can't get there". The destination is unreachable. I do not have a route to get this packet any further on. And so we might go to the traceroute command to see how far we get before the traffic hits this fail condition, where the router is being kind enough to send us that ICMP destination unreachable.
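On a Cisco router or switch, ping prints one character per probe instead of a full line per reply. As a quick reference, here's a small Python sketch that maps the three most common characters to their meaning and tallies a result string. The function name is made up for illustration, and only the '!', '.', and 'U' codes are covered here.

```python
from collections import Counter

# Common Cisco IOS ping result characters.
PING_CODES = {
    "!": "echo reply received",
    ".": "timed out waiting for a reply",
    "U": "destination unreachable message received",
}

def summarize_ping_codes(codes):
    """Map each result character to (meaning, count) for one ping run."""
    counts = Counter(codes)
    return {ch: (PING_CODES.get(ch, "unknown code"), n) for ch, n in counts.items()}

print(summarize_ping_codes("!!!!!"))
print(summarize_ping_codes("U.U.U"))
```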
We can also see things like latency statistics. In the real world, we may use this to determine suboptimal pathways. I've certainly seen it where there are certain configurations that easily can cause suboptimal routing.
When we connect up to the internet with multiple service providers, we may want to influence which service provider we go to. A traceroute could show us that. Traceroute shows us, hey, we're going over the slow ISP even though the primary ISP is up. I've also seen it where people's VPN configuration used something like a static route to point to the other side of a site. And if someone uses a static route in conjunction with dynamic routing, what's the interaction? I'm not talking about a default route, just a plain static route that I plug in without doing anything special to it. How is that going to play with dynamic routing? It really doesn't play at all, because that static route is only available on the particular router you configured it on. It doesn't get advertised to any other device. But we also have to remember that, by default, the static route is preferred over any type of dynamic route because of the administrative distance. For a static route, it's 1. For the dynamic routing protocols, if you're using Enhanced Interior Gateway Routing Protocol, or EIGRP, it's 90, and for Open Shortest Path First, or OSPF, it's 110. So that static route is more believable, more preferred, than any type of dynamic route we have.
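The administrative distance comparison is simple enough to sketch in Python. This is a minimal illustration, assuming all the candidate routes point at the same destination prefix (a real router picks the longest matching prefix first, and only then compares administrative distance). The next-hop addresses are hypothetical, and the EIGRP value is the one for internal routes.

```python
# Default administrative distances; lower is more believable.
ADMIN_DISTANCE = {"connected": 0, "static": 1, "eigrp": 90, "ospf": 110}

def preferred_route(candidates):
    """Return the candidate route with the lowest administrative distance.

    All candidates are assumed to describe the same destination prefix.
    """
    return min(candidates, key=lambda route: ADMIN_DISTANCE[route["source"]])

candidates = [
    {"source": "ospf", "next_hop": "10.1.1.3"},    # hypothetical next hops
    {"source": "static", "next_hop": "10.1.1.2"},
    {"source": "eigrp", "next_hop": "10.1.1.4"},
]
print(preferred_route(candidates))
```

With the static route in the list, it wins. Take it out and the EIGRP route is chosen over OSPF.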
So is ping only available on our PCs or can we take that ping command as well as that traceroute command and use it elsewhere? We can use it on our switches; we can use it on our routers. So it's not just for testing connectivity from that particular PC to the destination. We could go to those intermediary devices along the path and test connectivity from there as well. Now I want you to think about something. If I issue a ping, as we see here when we look at switch one.
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.1.100, timeout is 2 seconds:
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/5/8 ms
Type escape sequence to abort.
Tracing the route to 172.16.1.100
1 10.1.1.1 8 msec 0 msec *
1 172.16.1.100 0 msec 0 msec 0 msec
If I issue a ping on switch one, what am I doing? Am I testing connectivity for the end users of this switch? So I got 20 PCs plugged into the switch. Would issuing a ping from the switch be a good troubleshooting methodology to see if our end users on that switch should have connectivity?
No, not at all. Here's the reason why. When I'm pinging from that switch, I'm testing layer 3 connectivity from the switch to whatever that destination IP address I'm pinging to. Does that mean my client successfully has layer 2 connectivity up to the switch?
No, that doesn't mean anything in regards to that. So our end station could be in the wrong VLAN. They could be connected to the wrong port, they could have port security configured and, you know, we don't have the ability to test that layer 2 connectivity with this ping from the switch. So all you're really doing here is, you're checking management connectivity for the switch itself and that's what we could be confident with if we got a successful ping. Oh, I can communicate to my next hop router. I probably can be managed via layer 3 services and traceroute would perform a similar function in that regard. So we're not going to be doing a lot of traceroute from a switch.
Now we're all pretty much familiar with the use of telnet. What do we use telnet for? Of course, remote connectivity. We use telnet to remotely connect to our switches or routers so that we could administer them from a far-off location. If we use telnet and we don't specify any particular port number, which port number is used by default for telnet? The default telnet port is 23 and that's in fact pretty much always what we see in a Cisco administered environment. What if we tack on a number afterwards like here?
Trying 172.16.1.100 ... Open
SW1#telnet 172.16.1.100 80
Trying 172.16.1.100, 80 ... Open
SW1#telnet 10.1.1.1 25
Trying 10.1.1.1, 25 ...
% Connection refused by remote host
Well, first and foremost, when we did our first telnet connecting to port 23, it really gave us two verifications. One, we could remotely connect to the device, and that's great. But two, it's telling us that port 23 is open on that device. In this case, where we've tacked on 80, we are specifying a different port to connect to using telnet. So instead of the standard port of 23, let's use port 80, and notice here it says Open. That means we had a successful connection to port 80 using telnet, so something is listening on port 80 on the device we connected to. That port is used for HTTP, so if the target is one of our switches or routers, that would indicate that it has the ip http server command enabled in global configuration mode, which is a problem. This would, in fact, be a good way to validate whether a switch had been hardened by disabling HTTP, and here it's proving to be the case that the device is not hardened, not secured.
What about port 25? That's mail. In this example, the device at 10.1.1.1 does not have port 25 open. It's not running any type of mail services. We can see that the connection was refused by the remote host. Yeah, and we don't expect our routers and switches to be listening on port 25. But we know about one other pretty common error message that you'll see if you do a telnet. Let's say you do this and it says Open. You didn't change the port, you just did a raw telnet, and it says "Password required, but none set". I want everybody to know the answer to that one.
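The port check we just did with telnet can also be sketched in Python with a plain TCP connection attempt. This is a minimal illustration using the standard socket module. The function name and the three result strings are my own, and it only tells you whether a TCP handshake completes, not what service is actually behind the port.

```python
import socket

def tcp_port_state(host, port, timeout=2.0):
    """Try a TCP connection and report 'open', 'refused', or 'no response'."""
    try:
        # create_connection performs the TCP three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # Timeouts, unreachable networks, and similar failures.
        return "no response"
```

Calling tcp_port_state("172.16.1.100", 80) is roughly what telnet 172.16.1.100 80 did for us: "open" lines up with Open, and "refused" lines up with the connection refused message.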
We could go back to our PCs and start to think about ARP. Are we going to troubleshoot ARP? Not often. We could troubleshoot ARP, or view the ARP table, if I want to see the relationship between a known IP address that might be causing problems and its MAC address, or vice versa. For instance, there are hacks that people can do that will poison your ARP cache. They can spoof the default gateway, and if you were to view the ARP table and saw the IP address of the default gateway paired with the MAC address of some device that you know isn't your Cisco router or your multilayer switch, that would be a sign of a real problem. Someone is hacking your network, or there is another device claiming to be your default gateway.
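That gateway check is easy to express in code. Here's a minimal Python sketch, assuming you've already collected the ARP table into a dictionary of IP address to MAC address and that you know what the gateway's MAC address should be. The function names and the sample addresses are hypothetical.

```python
def normalize_mac(mac):
    """Strip separators and lowercase so different MAC notations compare equal."""
    return "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")

def gateway_mac_matches(arp_table, gateway_ip, expected_mac):
    """True when the ARP entry for the gateway carries the MAC we expect."""
    learned = arp_table.get(gateway_ip)
    return learned is not None and normalize_mac(learned) == normalize_mac(expected_mac)

# Hypothetical ARP table as seen on a PC (Windows-style notation).
arp_table = {"10.1.10.1": "00-1a-2b-3c-4d-5e"}

# Expected value written the way a Cisco device would show it.
print(gateway_mac_matches(arp_table, "10.1.10.1", "001a.2b3c.4d5e"))
print(gateway_mac_matches(arp_table, "10.1.10.1", "0000.0c12.3456"))
```

A mismatch doesn't prove an attack on its own, since the gateway hardware may simply have been replaced, but it's exactly the situation you'd want to look into.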
The show mac address-table command, you know, that's different. A lot of people think that the ARP cache is the same as the MAC address table. Well, the ARP cache is the mapping of a layer 3 IP address to a layer 2 physical address. Whereas if we go down to the MAC address table, it's the mapping of a layer 2 physical MAC address to a port on that particular switch. And how does the switch learn about MAC addresses? It learns them from the frame. What part of the frame? The source MAC address of the frame. So a frame arrives on an interface, the switch looks at the source MAC address, and then it associates that MAC address with the port the frame came in on.
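The learning behavior just described fits in a few lines of Python. This is a deliberately simplified sketch, not how any real switch is implemented: the class name is made up, there's no aging timer and no VLAN awareness, and an unknown destination simply returns the string "flood".

```python
class LearningSwitch:
    """Toy model of source-MAC learning on a layer 2 switch."""

    def __init__(self):
        self.mac_table = {}  # source MAC address -> port it was seen on

    def receive(self, src_mac, dst_mac, in_port):
        """Learn the source MAC, then return the output port or 'flood'."""
        # Learning always uses the SOURCE address of the incoming frame.
        self.mac_table[src_mac] = in_port
        # Forwarding uses the DESTINATION address; unknown means flood.
        return self.mac_table.get(dst_mac, "flood")

sw = LearningSwitch()
print(sw.receive("aaaa.aaaa.aaaa", "bbbb.bbbb.bbbb", "Fa0/1"))  # B not learned yet
print(sw.receive("bbbb.bbbb.bbbb", "aaaa.aaaa.aaaa", "Fa0/2"))  # A was learned on Fa0/1
print(sw.mac_table)
```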
So why might we look at the MAC address table? And by the way, I like to tack on dynamic at the end of this, because we don't usually look at the static entries; most of the time they're just system-generated static MAC addresses. Well, I might come in here and I might see an inordinate number of MAC addresses. We're talking thousands upon thousands of MAC addresses paired up with a single port. And then I do a command similar to this one, show mac address-table count. And the show mac address-table count output that I'm envisioning in my mind, which isn't shown here, is telling us that our entire MAC address table is filled. MAC address tables have a finite size; a 3750, for example, has room for about 6100 entries in its MAC address table. I'm looking at it, and I see it filled. What should I be thinking about then?
Well, if your MAC address table is filled, it's quite possible that you have some type of attack going on within your organization. Someone is flooding your switch with an exceptional number of MAC addresses, so that the table overflows and your switch no longer behaves like a switch; it behaves more like a hub. They're spoofing a lot of MAC addresses there. So how would we deal with this attack against our network, if I want to limit the number of MAC addresses that can be paired up with a single physical port? What's the configuration that will solve that? Port security. We can set a maximum number of MAC addresses that can be learned on a port. And as a result, if there's that one station, that malicious station, that is flooding our switch with MAC addresses, and we set a maximum of two or three, then we'll only learn two or three on that port and we'll stop that type of attack from happening.
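To see why a per-port maximum stops the flood, here's a minimal Python sketch of the idea. The class name and the MAC addresses are made up, and this toy version just drops and counts frames from extra addresses. What a real switch does on a violation depends on the configured violation mode, which isn't modeled here.

```python
class SecurePort:
    """Toy model of a switch port with a limit on learned MAC addresses."""

    def __init__(self, maximum=1):
        self.maximum = maximum
        self.learned = set()
        self.violations = 0

    def accept(self, src_mac):
        """Return True if a frame from src_mac is allowed on this port."""
        if src_mac in self.learned:
            return True
        if len(self.learned) < self.maximum:
            self.learned.add(src_mac)
            return True
        # Over the limit: count the violation and refuse the frame.
        self.violations += 1
        return False

port = SecurePort(maximum=3)
# A station sending frames from five different (spoofed) source addresses.
results = [port.accept(f"0000.0000.{i:04x}") for i in range(5)]
print(results, port.violations)
```

Only the first three addresses are learned, so the attacker can't fill the MAC address table from that one port.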