In this lesson we will use the OSI model and the Cisco IOS command-line interface to identify common issues and problems in networking. We will group the issues into categories, for example media issues, such as excessive noise and collisions; port issues, such as speed and duplex; and configuration issues, such as password and configuration file management.
The Layered Approach
In laying out a troubleshooting methodology, one of the best places to start is the OSI model. The layered approach allows us to focus on certain functions for certain layers, knowing what each layer can do. This divide and conquer approach will also help us categorize the tools we are going to use to troubleshoot each layer. A traditional layer 2 switch will operate at layers 1 and 2 and so we will have to deal with physical connectivity, RJ-45 connectors, cabling, and also Ethernet access into the media. We have to make the distinction between a layer 2 switch and a multilayer switch. Here we are talking about something like the 2960 Catalyst.
The multilayer switch will also work at layer 3 and perform the routing function, and because of that it requires a different level of troubleshooting. In the case of the layer 2 switch, you may have layer 3 issues, but mostly related to the management functions of the switch. In other words, the switch ports on the layer 2 switch all deal with Ethernet problems and layer 2 components; however, the switch as a device will have an IP address and a default gateway so that you can Telnet to it, SSH into it, and use SNMP to monitor it. In that sense, you may have layer 3 issues as well, related to IP addresses and default gateways.
Switched Media Issues
In laying out the troubleshooting methodology, some people start at layer 1 and look at potential media issues like damage to wiring or interference from electromagnetic sources. The category of UTP wiring will be critical. Cat-3 is more sensitive to certain sources of EMI, or electromagnetic interference, like air-conditioning systems. Cat-5 has better jacketing and plastic around the wiring to protect it from such sources. Poor cable management could, for example, put a strain on RJ-45 connectors, causing some cables to break.
Physical security could also be a cause of media issues. If you allow people to connect hubs to your switches or connect unwanted sources of traffic into the switch, then traffic patterns may change, not necessarily related to media or physical layer, but collisions would increase if you install the hub and connect it to your switch. This is related to physical connectivity and so it could be categorized as a physical layer or media issue.
After laying out the methodology and knowing that, for example, you are going to start at layer 1 and look for errors on that layer, it is important to identify the output of the commands and relate it to the layers. The show interfaces command displays a wealth of very useful information. Again, it is important to match that information with your methodology. For example, the first line of the output shows that FastEthernet 0/1 is up and line protocol is up. That first up refers to the physical layer; the second up is related to layer 2. So if the interface is shown as down or disabled at the physical layer, you know where to go: it could be a cabling problem, a connectivity problem, simply no cable attached, or so many physical layer errors that the switch decided to disable the interface.
Switch#sh interfaces fa 0/1
FastEthernet0/1 is up, line protocol is up (connected)
Hardware is Fast Ethernet, address is 0023.aca4.f091 (bia 0023.aca4.f091)
MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, media type is 10/100BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:14, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 5000 bits/sec, 6 packets/sec
1065544 packets input, 229455974 bytes, 0 no buffer
Received 109157 broadcasts (99147 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 99147 multicast, 0 pause input
0 input packets with dribble condition detected
8430743 packets output, 1316399122 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
It could also show as administratively down, which means that it was manually shut down by an administrator and a simple command would bring it back to life. Statistics are important. The input errors shown could signal physical layer problems; for example, multiple CRC-type errors could indicate noise in the network or faulty Ethernet equipment. Similarly, overruns mean that the input rate exceeded the switch's ability to process the traffic, and ignored means that the switch is running low on internal buffers. You also have an indication of output errors and the number of collisions, which by itself is not an indication of a problem; changes in the number may be an indication of a problem. Also, the interface resets counter tells you how many times the Ethernet controller has been restarted because of errors.
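As a minimal sketch, bringing an administratively down port back to life takes nothing more than a no shutdown under the interface; FastEthernet 0/1 here simply matches the sample output above:

Switch# configure terminal
Switch(config)# interface FastEthernet 0/1
Switch(config-if)# no shutdown
Switch(config-if)# end
Switch# show interfaces FastEthernet 0/1 status

The final show command confirms whether the port has come back to the connected state.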
If you suspect excessive noise, the symptom could be the number of CRC errors, or rather changes in the number of CRC errors that are not related to collisions. In other words, CRC errors can be a result of collisions, but if the number of collisions is constant and consistent, with no change or peaks, then the CRC errors could be caused by excessive noise.
When this happens, cable inspection is probably the first step. You can use the multitude of cable testers and tools available for that purpose. Poor design, perhaps using something other than Cat-5 cabling for Fast Ethernet and 100 Mb/s networks, could be the cause, and so cable testing plus documentation can point you to the fix.
If the rate of collisions exceeds the baseline for your network, then there are several possible solutions to the problem. There are several rules of thumb as to what that baseline should be; most of them say that the number of collisions with respect to the total number of output packets should be less than 0.1 percent.
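Using the sample output above, you can pull just the relevant counters with an IOS output filter and compute the ratio yourself; here 0 collisions against 8430743 output packets is well below the 0.1 percent rule of thumb:

Switch# show interfaces fa 0/1 | include collisions|packets output
8430743 packets output, 1316399122 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets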
If collisions are a problem, it could be a defective or ill-behaved device, for example, a network interface card sending excessive garbage into the network. This typically happens when there are circuitry or logic failures, or even physical failures, on the device. This condition is known as jabbering, and again it refers to network interface cards and other devices continuously sending random or garbage data into the network. A time domain reflectometer, or TDR, can be used to find unterminated Ethernet cabling, which could be reflecting signals back into the network and causing collisions.
We know that a collision occurs when a transmitting Ethernet station detects another signal while transmitting a frame. A late collision is a special type of collision: if a collision is detected after the first 512 bits (64 bytes) of the frame have been transmitted, a late collision is said to have occurred. Most importantly, frames involved in late collisions are not resent by the network interface card; in other words, they are not retransmitted by Ethernet, unlike frames that collide within the first 64 octets or bytes. It is left to the upper layers of the protocol stack to determine that data was lost and retransmit.
Late collisions should never occur in a properly designed Ethernet network. Possible causes are usually incorrect cabling or a noncompliant number of hubs in the network; a bad network interface card could also cause late collisions. When they happen, they are typically diagnosed with protocol analyzers and by verifying cabling distances against the physical layer requirements and limitations of Ethernet.
Port Access Issues
Port access issues will most likely have very visible symptoms. Users will not be able to connect to the network, and most of the time support teams will quickly notice because the users will complain. Some of these problems are related to media and faulty equipment, such as network interface cards, but a good portion of them are related to duplex and speed settings.
One of the most common causes of performance issues on Fast Ethernet links occurs when one port of the link operates at half-duplex while the other port operates at full-duplex. This may happen when one or both ports on the link are reset and the autonegotiation process does not result in both link partners having the same configuration. It can also happen when users reconfigure one side of a link and forget to reconfigure the other side. In general terms, both sides of a link should have autonegotiation on, or both sides should have it off; however, duplex is subservient to speed in the sense that if speed is set to auto, then duplex cannot be manually set. You might even see cyclic redundancy check (CRC) error messages when the speed and duplex settings are hard-coded on the two devices.
Here is a good summary of issues related to duplex modes. One end set to full and the other set to half will result in a mismatch and a visible error message. One end set to full and the other to autonegotiation will revert to half-duplex if autonegotiation fails, which again results in a mismatch. Other combinations may not even render a mismatch or error message, but they will revert to half-duplex. Even when both ends are set to autonegotiate, they may default to different settings if negotiation fails: a Gigabit Ethernet port defaults to full-duplex while 10/100 ports default to half-duplex, and in that case the result is a mismatch. Autonegotiation is a useful feature because it allows you to have generic ports that accept any type of connectivity depending on what is connected at the other end; still, it can be a source of problems, and it is sometimes avoided in favor of static configuration of duplex settings.
Something similar happens with speed-related issues. If you set each end to a different speed, that is obviously a mismatch, but even when the ports autonegotiate, a mismatch can result if only one side is negotiating and the other is not. Again, duplex is closely related to speed, so autonegotiation of speed may also leave a port at half-duplex.
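As a minimal sketch, hard-coding both speed and duplex on a port, and doing the same on the link partner, removes autonegotiation from the picture; the interface name here is just an example:

Switch# configure terminal
Switch(config)# interface FastEthernet 0/1
Switch(config-if)# speed 100
Switch(config-if)# duplex full
Switch(config-if)# end

Remember that if speed is left at auto, IOS will not let you set duplex manually, so set speed first.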
Other issues may be related to configuration errors. Some of them are related to losing configurations and not having proper backups; some are related to change management issues. In any case, here are some guidelines for laying out a good configuration management methodology. Keeping hard copies is a good first step; save configurations in text files and centralize them on something like a TFTP server, always considering the security limitations of TFTP.
In terms of change management, keep multiple versions of the configurations from before and after a change, and be able to roll back to a previous configuration if the fix did not work or if it actually affected other areas. Always save the configuration into NVRAM so that it is available at the next bootup.
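For example, saving the running configuration to NVRAM and backing it up to a TFTP server takes two copy commands; the server address shown is hypothetical:

Switch# copy running-config startup-config
Destination filename [startup-config]?
Building configuration...
[OK]
Switch# copy running-config tftp:
Address or name of remote host []? 192.0.2.10
Destination filename [switch-confg]?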
Finally, among other things, secure the configuration by password-protecting the console, VTY lines, and other types of access into the devices. Also, protect the configuration backups on servers and in other locations.
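A minimal sketch of password-protecting console and VTY access follows; the passwords here are placeholders you would replace with your own:

Switch# configure terminal
Switch(config)# enable secret MyEnablePass
Switch(config)# line console 0
Switch(config-line)# password MyConsolePass
Switch(config-line)# login
Switch(config-line)# line vty 0 4
Switch(config-line)# password MyVtyPass
Switch(config-line)# login
Switch(config-line)# end

The enable secret protects privileged mode, while the login command on each line forces the line password to be checked at connect time.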