TSHOOT Notes

Network Maintenance Models:

ITIL – IT Infrastructure Library – Best practice for IT service management.

FCAPS – Fault, Configuration, Accounting, Performance and Security – ISO model for network management.

TMN – Telecommunications Management Network – ITU-T model that builds on FCAPS.

PPDIOO – Cisco lifecycle method: Prepare, Plan, Design, Implement, Operate, Optimize.

**********************************************************************************

– If the credentials are already set in the configuration (using “ip ftp username” / “ip ftp password” for FTP, or “ip http client {username | password}” for HTTP), we don’t need to specify them in the “copy startup-config ftp:” command.

– HTTPS and SCP are more secure than HTTP and FTP, which send the password in clear text.

**********************************************************************************

Archive configuration:

archive

                  path <flash>

                  write-memory

                  time-period <minutes>

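– Configuration rollback: use “configure replace <path to the archived configuration> list”. The running-config is replaced with the contents of that file, and the “list” keyword displays the commands applied during the replacement.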
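
A minimal sketch of the archive and rollback commands above with concrete values. The archive prefix “flash:cfg-archive” and the 1440-minute period are assumptions picked for illustration. The exact archive file name varies by IOS version, so check it with “show archive” first.

```
! Assumed values: archive prefix flash:cfg-archive, one snapshot per day
archive
 path flash:cfg-archive
 write-memory
 time-period 1440
!
! Exec mode: list the archived files, then roll back to one of them
! (<archive file> is a placeholder for a name taken from "show archive")
show archive
configure replace flash:<archive file> list
```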

**********************************************************************************

NTP: “ntp server <ip>”

Daylight saving: “clock summer-time xxxx”

Logging: “logging console <level>”. By default the console logs all severities, from level 7 (debugging) down to level 0 (emergencies).

“logging <ip>”. By default all messages except level 7 (debugging) are sent to the syslog server.
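
A short configuration sketch of the time and logging commands above. All addresses, the timezone names and the chosen severity levels are illustrative assumptions.

```
! 192.0.2.10 = assumed NTP server, 192.0.2.20 = assumed syslog server
ntp server 192.0.2.10
clock timezone EST -5
clock summer-time EDT recurring
!
! Timestamp log messages so they can be correlated across devices
service timestamps log datetime msec
!
! Limit console logging to level 4 (warnings) and more severe
logging console warnings
!
! Send level 6 (informational) and more severe to the syslog server
logging 192.0.2.20
logging trap informational
```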

**********************************************************************************

Structured troubleshooting: Define problem, gather information, analyse, *eliminate*, propose hypothesis, test hypothesis, resolve problem.

“Shoot from the Hip”: Define problem, gather information, propose hypothesis, test hypothesis, resolve problem.

Approaches:

  1. Top down
  2. Bottom up
  3. Divide and conquer
  4. Follow the path
  5. Spot the differences
  6. Move the problem/components.

**********************************************************************************

Integrating Troubleshooting into Maintenance:

  • Up-to-date documentation.
  • Create a baseline for network performance. How much CPU% is normal?
  • Has anything changed recently, before the problem occurred?

**********************************************************************************

  • Use targeted show commands to narrow the output. Eg: use “show ip route 1.1.1.1” instead of “show ip route”. Note: the default route is not displayed when it is the only match for the address given to “show ip route”. Use “show ip route 1.1.1.0 255.255.255.0 longer-prefixes” to see 1.1.1.0/24 and all of its more-specific subnets.
  • Filter show command outputs using {begin | include | exclude | section}. Regular expressions also work, like “show ip route | include ^C” to show only connected routes.
  • To send output to a file on flash/TFTP, use “| tee” or “| redirect”. “tee” displays the output on screen and also copies it to the file. “redirect” only copies it to the file.
  • To append output to an existing file, use the “| append” keyword.
  • Extended ping for options like DF bit, repeat count, source, size etc.
  • To check whether a TCP port is open, use “telnet <IP> <port number>”. Eg: “telnet 1.1.1.1 80” checks whether port 80 (HTTP) is open on 1.1.1.1.
  • Hardware diag commands: show inventory, show controllers, show platform, show diag, GOLD (Generic Online Diagnostics), TDR (Time Domain Reflectometer)
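
A few usage examples of the filtering and redirection keywords above. The TFTP server 192.0.2.5 and the file names are hypothetical.

```
! Only the OSPF part of the running configuration
show running-config | section router ospf
!
! Interfaces that are not administratively down
show ip interface brief | exclude administratively
!
! Display on screen and also copy to a TFTP server
show ip route | tee tftp://192.0.2.5/routes.txt
!
! Copy to the TFTP server without displaying
show tech-support | redirect tftp://192.0.2.5/tech.txt
!
! Append to an existing file on flash
show ip interface brief | append flash:baseline.txt
```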

**********************************************************************************

The following troubleshooting steps benefit from the use of tools:

  • Define problem: syslog, events triggered via SNMP, EEM.
  • Gather information: SPAN/RSPAN/ERSPAN. RSPAN cannot cross an L3 boundary; the RSPAN VLAN is only carried over trunks.
  • Analyze: SNMP statistics or NetFlow usage. NetFlow is enabled per interface with “ip flow {ingress | egress}”, which replaces the old “ip route-cache flow” command. Use “show ip cache flow” to check the active flows passing through the router and “ip flow-export” to export the data from the router to a collector.
  • Test hypothesis: configuration replace or rollback.
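
A minimal sketch of the SPAN and NetFlow commands mentioned above. Interface names, the session number, the collector address 192.0.2.30 and UDP port 9996 are assumptions.

```
! On the switch: local SPAN, copy traffic of Gi0/1 to the sniffer on Gi0/2
monitor session 1 source interface GigabitEthernet0/1 both
monitor session 1 destination interface GigabitEthernet0/2
!
! On the router: enable NetFlow on an interface and export to a collector
interface GigabitEthernet0/0
 ip flow ingress
 ip flow egress
!
ip flow-export version 5
ip flow-export destination 192.0.2.30 9996
!
! Verify (exec mode)
show monitor session 1
show ip cache flow
```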

**********************************************************************************

Layer-2 forwarding checks:

show mac-address-table

show vlan

show interfaces switchport

show interfaces trunk

show platform forward  <<<< Info from TCAM.

traceroute mac <<<< Layer-2 traceroute to a MAC address, provided CDP is enabled.

**********************************************************************************

STP Check:

Note: When a switch receives a BPDU, it adds the cost of the incoming interface to the root path cost before forwarding the BPDU.

If there is a loop, only flooded frames (broadcasts, multicasts and unicasts whose DST MAC is not in the mac-address-table) will loop.

show spanning-tree vlan xxx

show spanning-tree interface xxxxxxx

**********************************************************************************

  • Difference between routers and multilayer switches (MLS):
    • Routers support many interface/media types. Switches support almost only Ethernet.
    • Packet-switching throughput of a router is lower than that of an MLS.
    • Routers support more features than an MLS.
  • Control plane troubleshooting is the same for both routers and multilayer switches.
  • Data plane troubleshooting differs. “show ip cef, show adjacency, show arp” on a router. “show platform, show mls cef” on a multilayer switch.
  • MLS can perform
    • Switching within VLANs… “vlan x” to create the VLAN in the VLAN database.
    • Routing between VLANs… Using SVI. “int vlan x” and “ip routing” command required.
    • Routing between VLANs and the outside world:
      • Using SVI
      • Using “no switchport” on the interface, thereby making it a routed port.
  • On routers, subinterfaces are used to connect to downstream switches via a trunk interface. Make sure the native VLAN matches on both ends.

!

interface GigabitEthernet2/3.10

encapsulation dot1Q 10

ip address 10.10.10.254 255.255.255.0

no ip redirects

end
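
For comparison with the router subinterface above, a sketch of the two multilayer-switch options from the list (SVI and routed port). The VLAN number, interface name and addresses are assumptions.

```
! Enable routing on the multilayer switch
ip routing
!
! Option 1: SVI as gateway for VLAN 10
vlan 10
!
interface Vlan10
 ip address 10.10.10.254 255.255.255.0
 no shutdown
!
! Option 2: routed port towards the outside world
interface GigabitEthernet0/1
 no switchport
 ip address 10.20.20.1 255.255.255.252
```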

**********************************************************************************

Troubleshooting FHRP:

HSRP:

–  When the active router fails because of an administrative action (shutting down the interface or reloading the router), it sends a ‘resign’ message that causes the standby router to take over the active role immediately. There is no packet loss while waiting for the 10 sec hold timer to expire.

– If we add a router with a priority greater than the current ‘active’ router and pre-empt is enabled, this router sends a “coup” message and takes over the active role.

– It is always recommended to configure some pre-empt delay, so that the new router can converge its routing protocol and load its RIB before it takes over the active role.
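
A minimal HSRP sketch with pre-emption and a pre-empt delay, as recommended above. The group number, addresses, priority and the 60-second delay are illustrative assumptions.

```
interface GigabitEthernet0/0
 ip address 10.10.10.2 255.255.255.0
 standby 1 ip 10.10.10.1
 standby 1 priority 110
 ! Wait at least 60 seconds before taking over the active role
 standby 1 preempt delay minimum 60
```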

VRRP:

– Hello interval is 1 sec. Pre-emption is enabled by default. IETF standard. VIP can be the same as an interface IP.

show {standby | vrrp | glbp} brief   <<<< “show standby” is the HSRP command.

**********************************************************************************

Troubleshooting Performance problems in MLS:

-User expectations, business expectations and technical expectations are used as the baseline to identify performance problems.

-Typical scenarios: Duplex mismatch, TCAM limitations, high CPU load.

**********************************************************************************

-To identify packet loss at interface level use: “show int xxx counters” and “show int xxxx counters errors”

  • Align-Err:  Frames that do not have an even number of octets and have CRC errors. Possible bad port or cable.
  • FCS-Err:  Frames of valid size but with Frame Check Sequence errors. Possible bad cable or NIC. Increases on the full-duplex end in case of a duplex mismatch.
  • Xmit-Err/Rcv-Err: Internal Tx and Rx buffers are full.
  • UnderSize: Frames smaller than 64 bytes but with a valid CRC.
  • Single-Col/Multi-Col: Count the frames that hit a single collision or multiple collisions before being transmitted successfully. Possible duplex mismatch.
  • Late-Col: Collision detected on a port late in the transmission process. Possible reasons: duplex mismatch or an Ethernet cable that is too long. Increases on the half-duplex end in case of a duplex mismatch.
  • Excess-Col: Frames dropped due to excessive collisions. When a frame has collided 16 times, this counter is incremented and the frame is dropped.
  • Carri-Sen: Normal on half-duplex. Increments each time the controller senses the wire to check that it is not busy before transmitting.

– Runts: Frames smaller than 64 bytes with a bad CRC. Caused by a bad cable or port.

– Giants: Frames larger than 1518 bytes with a bad CRC. Reason: mostly a bad NIC.

– IEEE 802.3 frame size: 64 to 1518 bytes for non-jumbo Ethernet.

– Running half-duplex on both ends is better than a duplex mismatch.

**********************************************************************************

Auto-MDIX:  Automatic Medium-Dependent Interface Crossover. It is used to automatically detect whether a straight-through or crossover cable is required and configures the interface accordingly.

-If speed and duplex auto-negotiation is disabled, then MDIX will also be disabled. “mdix auto” to enable MDIX. “show interfaces xxx transceiver properties” to check the MDIX state.
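
A short sketch of an interface with auto-MDIX enabled, keeping speed and duplex on auto as noted above. The interface name is an assumption.

```
interface GigabitEthernet0/1
 speed auto
 duplex auto
 mdix auto
```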

**********************************************************************************

Troubleshooting TCAM problems:

  • The ‘decision-making logic’ in the ‘forwarding hardware’ uses a high-performance lookup memory called TCAM.
  • Control-plane information like the MAC-address table, routing table, PBR, QoS and ACLs is programmed into TCAM.
  • Frames that cannot be forwarded by TCAM are punted to the CPU. If many packets are punted to the CPU, throughput may suffer.
  • Reasons for packets to be punted to the CPU:
    • Packets destined to the CPU, like telnet/SSH/SNMP.
    • Multicast/broadcast control-protocol packets, which are sent to the CPU in addition to being flooded.
    • A feature that is not supported in hardware. Eg: GRE.
  • Due to TCAM size limitations, some entries may not be written, and hence ‘soft-forwarding’ (forwarding by the CPU) is performed.
  • To verify TCAM space on Cat 3560, use “show platform tcam utilization”. Check the used/max column.
  • TCAM space allocation depends on the Switch Database Management (SDM) profile. [Similar to CAM-profile in Force10]
  • To check TCAM allocation failures for prefixes of a specific length, use “show platform ip unicast counts”.
  • The “sw forwarding” count in the output of “show controllers cpu-interface” indicates the number of packets punted to the CPU.
  • To avoid the above problems: perform route summarization so that the number of prefixes fits the TCAM space, or change the SDM profile.
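
A sketch of checking and changing the SDM profile, assuming a Cat 3560 and that the “routing” template is the one wanted. The new template only takes effect after a reload.

```
! Exec mode: check the current SDM template and TCAM usage
show sdm prefer
show platform tcam utilization
!
! Global configuration mode: change the template, then save and reload
sdm prefer routing
```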

**********************************************************************************

Troubleshooting High CPU load:

  • On switches, packets are switched in hardware, so CPU load and traffic load are not directly related. On low-end routers, there is a direct relationship.
  • Check CPU processes using “show processes cpu sorted”. In “CPU utilization for five seconds: x%/y%”, x is the total CPU time (processes plus interrupts) and y is the time spent on interrupts only. The time spent on processes alone is therefore x minus y.
  • On switches, if more than 10% of the CPU cycles are spent on interrupts, we may need to investigate the cause, as this implies the CPU is doing packet forwarding. (Interrupts – packets punted by TCAM)

**********************************************************************************

Troubleshooting Layer-3 connectivity:

  • HDLC does not require any L3-L2 mapping.
  • Data structures used for routing: RIB, FIB (doesn’t have routing-protocol information), L3-L2 mapping table and Cisco Express Forwarding adjacency table.
  • “show ip route” for the RIB and “show ip cef” for the FIB. Use “show ip cef exact-route SRC DST” to identify which path the router will select in case of equal-cost load-balancing.
  • “show ip arp” / “show frame-relay map” for the L3-L2 mapping tables and “show adjacency detail” for the adjacency table.

**********************************************************************************

Troubleshooting EIGRP:

  • Data structures: interface table (list of active interfaces), neighbor table (all active EIGRP neighbors), topology table (stores all received routes).
  • Note: Only successors are advertised to neighbors.  “show ip eigrp {interfaces | neighbors | topology}”. “debug eigrp packets” or “debug ip eigrp”. There is no “debug ip eigrp packet” command!
  • The “Neighbor not on common subnet” log message is displayed if the SRC IP of a received EIGRP hello is not in the subnet of the receiving interface.

**********************************************************************************

Troubleshooting OSPF:

  • Data structures: interface table (list of active interfaces), neighbor table (all active OSPF neighbors), LSDB (link-state database).
  • “show ip ospf {interface | neighbor | database}”. “debug ip ospf {packet | events}”

**********************************************************************************

Troubleshooting Route Redistribution:

  • If OSPF is redistributed into EIGRP, all OSPF routes in the routing table, plus the connected networks of OSPF-enabled interfaces, are added to the EIGRP topology table.
  • For RIP and EIGRP, the default seed metric is “unreachable” (infinity). A metric must be specified for redistribution to happen.
  • “show ip route x.x.x.x y.y.y.y” should show both “Redistributing via <protocol>” and “Advertised by <protocol>”. If the latter is not displayed, redistribution has been configured but the route is not being redistributed as expected.
  • Add the “subnets” option when you redistribute from any protocol into OSPF; without it, subnetted routes are not redistributed.
  • Interface command to return the OSPF timers to default values: “default ip ospf hello-interval”.
  • “debug ip ospf events” and “debug ip ospf adj” are useful to identify why a neighborship failed to form.
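
A minimal mutual-redistribution sketch showing the seed metric and the “subnets” keyword from the list above. The OSPF process ID, the EIGRP AS number and the metric values are assumptions.

```
! Assumed: OSPF process 1, EIGRP AS 100
router eigrp 100
 ! Seed metric order: bandwidth delay reliability load MTU
 redistribute ospf 1 metric 10000 100 255 1 1500
!
router ospf 1
 redistribute eigrp 100 subnets
```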

**********************************************************************************

Troubleshooting BGP:

  • Data structures: neighbor table (“show ip bgp summary” / “show ip bgp neighbors”) and BGP table (“show ip bgp”).
  • Split-horizon is enabled: a router does not advertise its best path back to the neighbor it learned that path from.
  • Check “Advertised to update-groups” in the output of “show ip bgp <prefix>” to verify whether we advertise BGP routes to peers.
  • To check the neighbor-to-update-group relation, use “show ip bgp update-group”.
  • “debug ip bgp” to check adjacency state. There is no such command as “debug ip bgp adja”.

**********************************************************************************

Troubleshooting Performance Problems on Routers:

  • Some processes responsible for high CPU: ARP Input process (originating ARP), Net Background process (when an interface needs buffers but none are available, this process creates buffers from the main buffer pool), IP Background process (handles changes of interface status, IP address, encapsulation), TCP Timer process (TCP sessions on the router).
  • “show processes cpu history”: graphical view of CPU usage for the last 60 secs/60 min/72 hrs.
  • “show ip interface xxx” shows which switching method is enabled on that interface. “ip route-cache” enables fast-switching (for unicast).
  • CEF must be enabled for NBAR, AutoQoS, MQC-FRTS, MPLS, QoS, CBWRED and other features to work. “ip route-cache cef” in interface mode.
  • “show ip cache” to display the fast-switching cache. “show ip cef” to display CEF entries. “show {ip} cef adjacency <interface> <NH>” to see adjacency table entries.
  • “show memory allocating-process totals” to check the memory allocated to each process and the total/used memory. To check for a memory leak: “show processes memory”.
  • When we notice “Input queue: 76/75” (current/max, with current above max) in the output of “show interface”, it indicates a memory leak problem (wedged interface). Incoming packets are dropped on this interface. It is an IOS bug; upgrade or reload is the solution.
  • “show buffers” might give a clue about a memory leak: the “free list” count will be very low compared to the “total” count.
  • “show diag” to display the DRAM available on a router’s linecard.

**********************************************************************************

Troubleshooting Security Features:

  • There are options to secure the management plane, control plane and data plane.
  • On some devices, the “no service password-recovery” command is available, which blocks recovery of the configuration and password.
  • Cisco IOS firewall: “ip inspect name <> tcp/udp/http”, then apply “ip inspect <> out” on the interface connecting to the internet. “show ip inspect all”, “debug ip inspect”
  • Zone-based firewall: zones are created, and matched packets going from one zone to another can be dropped/passed/inspected. “show zone security”, “show zone-pair security”
  • AAA: “debug tacacs”, “debug aaa accounting”
  • TACACS+: Separates authentication, authorization and accounting into separate processes. Uses TCP. Encrypts the entire packet. Limited accounting. Developed by Cisco.
  • RADIUS: Combines authentication and authorization. Uses UDP. Encrypts only the password. Extensive accounting. Open standard.
  • Cisco Secure Access Control System (ACS): AAA server with a web-based GUI, used to authenticate users.
  • RSA keys can be generated using “crypto key generate rsa”. The configuration must be saved to write the key to NVRAM. The key is not visible in the running config. The key is named using the hostname and domain name.
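
A minimal sketch of generating the RSA key for SSH, as described in the last bullet. The hostname, domain name and 2048-bit modulus are illustrative assumptions.

```
! Hostname and domain name form the default key name (R1.example.com)
hostname R1
ip domain-name example.com
!
crypto key generate rsa modulus 2048
ip ssh version 2
!
! Exec mode: verify the public key
show crypto key mypubkey rsa
```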