Cisco IOS-XE IP SLA and Object Tracking: Automated Dual-ISP Failover

The Problem With Static Failover

Most dual-ISP configurations rely on a floating static route: a default route through the secondary ISP with a higher administrative distance, which installs only when the primary route leaves the routing table, typically because the primary interface goes down. That works fine until your primary ISP’s physical link stays up but loses internet reachability: a BGP flap, a prefix withdrawal, an upstream fiber cut. The interface stays green. The floating static never kicks in. Users lose internet access while every dashboard shows the link as healthy.

Cisco IOS-XE’s IP Service Level Agreements (IP SLA) engine solves this by probing actual connectivity through the primary path and driving automated failover via Object Tracking. When the probe fails, a tracking object changes state, the primary route is removed from the RIB, and the floating static installs — all without human intervention. This guide walks you through building a production-grade dual-ISP failover system on IOS-XE from scratch.

IP SLA Fundamentals

IP SLA is a built-in active monitoring framework that generates synthetic traffic to measure real network performance. For failover, four probe types matter:

  • ICMP Echo — Pings a target. Fast and low-overhead, but blocked by firewalls that filter ICMP.
  • TCP Connect — Opens a TCP connection to a specific port. More reliable than ICMP since most services require TCP and it’s rarely filtered at the transport layer.
  • HTTP — Performs an actual HTTP GET. Best for validating the full path including routing and DNS.
  • UDP Jitter — Measures latency, jitter, and packet loss. Use for VoIP and real-time application SLAs.

For ISP failover, a TCP Connect probe to a reliable destination (DNS server, CDN endpoint) gives the best signal-to-noise ratio — it validates end-to-end reachability and is hard to inadvertently block.
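To make the TCP Connect idea concrete, here is a minimal Python sketch of what such a probe does: attempt one TCP handshake and classify the outcome. The function name tcp_connect_probe and the returned labels are illustrative assumptions that loosely mirror IOS-XE return codes; this is not how the router implements IP SLA.

```python
import socket
import time


def tcp_connect_probe(host, port, timeout_s=3.0):
    """Attempt one TCP handshake and return (return_code, rtt_ms or None).

    The labels are hypothetical and only imitate router return codes.
    """
    start = time.monotonic()
    try:
        # create_connection performs the full three-way handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            rtt_ms = (time.monotonic() - start) * 1000.0
            return "OK", rtt_ms
    except socket.timeout:
        # No answer within the window: the path is probably dead.
        return "Timeout", None
    except OSError:
        # Refused, reset, or unreachable: the handshake could not complete.
        return "No connection", None
```

Running tcp_connect_probe("8.8.8.8", 53) from a host that sits behind ISP-A gives a rough manual equivalent of the router probe used in this guide.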

Scenario: Dual-ISP Failover on an ASR 1001-X

Here’s the topology we’ll build:

  • ISP-A (primary): GigabitEthernet0/0/0, next-hop 203.0.113.1, administrative distance 5
  • ISP-B (secondary): GigabitEthernet0/0/1, next-hop 198.51.100.1, administrative distance 10
  • Probe target: 8.8.8.8 TCP port 53 (Google DNS over TCP — reliable, globally reachable)
  • Failover trigger: Probe fails for 15 seconds → tracking object transitions Down → secondary route installs

Step 1: Configure the IP SLA Probe

Router# configure terminal

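! Define IP SLA probe #10: TCP connect to Google DNS on port 53
Router(config)# ip sla 10
! control disable: 8.8.8.8 is not a Cisco IP SLA responder, so skip the control-protocol handshake
! If your release rejects source-interface on tcp-connect, use source-ip with ISP-A's interface address instead
Router(config-ip-sla)# tcp-connect 8.8.8.8 53 source-interface GigabitEthernet0/0/0 control disable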
Router(config-ip-sla-tcp)# threshold 3000
Router(config-ip-sla-tcp)# timeout 3000
Router(config-ip-sla-tcp)# frequency 10
Router(config-ip-sla-tcp)# exit

! Schedule the probe to start immediately and run forever
Router(config)# ip sla schedule 10 life forever start-time now

Router(config)# end

Key parameters:

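  • source-interface GigabitEthernet0/0/0: Sources the probe from ISP-A’s interface address. Critical: without this, the probe can be sourced from and answered via the secondary link and report success even when ISP-A is down. It sets the source address only; the egress path still follows the routing table, so many deployments also pin the probe target to ISP-A with a host route.
  • threshold 3000: A round-trip time above 3000 ms is flagged as over threshold. Reachability tracking still treats that result as Up; only a failed probe counts as Down.
  • timeout 3000: The probe is declared failed after 3000 ms (3 seconds) with no response.
  • frequency 10: Run the probe every 10 seconds.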
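These three timers depend on one another. The commonly cited guideline is that the frequency (in seconds) should be longer than the timeout (in milliseconds), and the timeout should be no shorter than the threshold. The Python sketch below checks a planned probe against that guideline before you paste it into a router. The helper name check_probe_timers is hypothetical, and the sketch deliberately allows threshold to equal timeout, as the configuration above does.

```python
def check_probe_timers(threshold_ms, timeout_ms, frequency_s):
    """Return a list of problems; an empty list means the timers are consistent."""
    problems = []
    if threshold_ms > timeout_ms:
        problems.append(
            f"threshold {threshold_ms} ms exceeds timeout {timeout_ms} ms"
        )
    if timeout_ms >= frequency_s * 1000:
        problems.append(
            f"timeout {timeout_ms} ms does not fit inside frequency {frequency_s} s"
        )
    return problems


# Values from the probe above: consistent.
print(check_probe_timers(3000, 3000, 10))   # []
# Timeout lowered to 3000 ms while threshold stays at a 5000 ms default.
print(check_probe_timers(5000, 3000, 10))
```

The second call models a probe that sets timeout 3000 without lowering the threshold first, which is an easy mistake when copying a probe definition.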

Verify the probe is running:

Router# show ip sla statistics 10
IPSLAs Latest Operation Statistics

IPSLA operation id: 10
        Latest RTT: 12 milliseconds
Latest operation start time: 14:32:07.412 UTC Thu Jul 2 2026
Latest operation return code: OK
Number of successes: 1247
Number of failures: 0
Operation time to live: Forever

Return codes to watch for: OK means the probe succeeded. Timeout means no response arrived within the timeout window, which is the signature of a dead path. No connection means the TCP handshake could not be completed, for example because the target reset the connection; that points to a problem with the probe host rather than an ISP outage.

Step 2: Create a Tracking Object

A tracking object watches the IP SLA result and exposes a binary Up/Down state that other IOS-XE features can consume. This decouples probe logic from routing logic.

Router# configure terminal

! Track object 1 follows IP SLA 10's reachability state
Router(config)# track 1 ip sla 10 reachability

! Dampening: wait 15s before declaring Down, 10s before declaring Up
! Prevents route flapping from transient probe failures
Router(config-track)# delay down 15 up 10
Router(config-track)# exit

Router(config)# end

! Verify tracking object state
Router# show track 1
Track 1
  IP SLA 10 Reachability
  Reachability is Up
    1 change, last change 00:22:41
  Delay up 10 secs, down 15 secs
  Latest operation return code: OK
  Latest RTT (millisecs) 12
  Tracked by:
    Static IP Routing 0

The delay parameters are critical in production. Without dampening, a single probe failure immediately withdraws the primary default route, so any transient packet loss causes an unnecessary failover. With delay down 15, the probe result must stay failed for 15 seconds before the tracking object transitions to Down; at a 10-second probe frequency, that means at least two consecutive failed probes.
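The effect of dampening is easy to see in a small simulation. The Python sketch below is a simplified model, not the router's implementation: it assumes each probe result is known at the instant the probe runs (it ignores the timeout), that the object starts Up, and that a change still pending when the results run out is not reported. The name simulate_track is hypothetical.

```python
def simulate_track(results, frequency_s=10, delay_down_s=15, delay_up_s=10):
    """Return [(time_s, 'Up' | 'Down'), ...] transitions of the dampened state.

    results is a list of booleans, one per probe, True meaning success.
    """
    declared = True      # the tracking object starts Up
    pending_at = None    # time at which a pending state change takes effect
    transitions = []
    for i, ok in enumerate(results):
        now = i * frequency_s
        # Apply a pending change whose delay expired at or before this probe.
        if pending_at is not None and pending_at <= now:
            declared = not declared
            transitions.append((pending_at, "Up" if declared else "Down"))
            pending_at = None
        if ok == declared:
            # The probe agrees with the declared state: cancel any pending change.
            pending_at = None
        elif pending_at is None:
            # The probe disagrees: start the matching delay timer.
            pending_at = now + (delay_up_s if ok else delay_down_s)
    return transitions


print(simulate_track([True, False, True, True]))     # []
print(simulate_track([True, False, False, False]))   # [(25, 'Down')]
```

In the first run a single failed probe is absorbed because the next probe succeeds before the 15-second timer expires. In the second run the failure persists, so the object goes Down 15 seconds after the first failed probe.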

Step 3: Tie Routes to the Tracking Object

Router# configure terminal

! Primary default route via ISP-A — only in RIB when track 1 is Up
Router(config)# ip route 0.0.0.0 0.0.0.0 203.0.113.1 5 track 1

! Floating static via ISP-B — no track, always present at AD 10
! Becomes best route only when the AD 5 tracked route is withdrawn
Router(config)# ip route 0.0.0.0 0.0.0.0 198.51.100.1 10

Router(config)# end

! Verify routing table — primary route should be active
Router# show ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "static", distance 5, metric 0
  Routing Descriptor Blocks:
  * 203.0.113.1, via GigabitEthernet0/0/0
      Route metric is 0, traffic share count is 1
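How the routing table chooses between the two statics can be sketched in a few lines of Python. This is a toy model under stated assumptions: only static default routes are considered, a tracked route is eligible only while its tracking object is Up, and the lowest administrative distance wins. The StaticRoute class and best_default_route function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StaticRoute:
    next_hop: str
    distance: int
    track: Optional[int] = None   # tracking object number, None if untracked


def best_default_route(routes, track_state):
    """Pick the eligible route with the lowest administrative distance."""
    eligible = [
        r for r in routes
        if r.track is None or track_state.get(r.track, False)
    ]
    return min(eligible, key=lambda r: r.distance, default=None)


routes = [
    StaticRoute("203.0.113.1", 5, track=1),   # primary via ISP-A
    StaticRoute("198.51.100.1", 10),          # floating static via ISP-B
]
print(best_default_route(routes, {1: True}).next_hop)    # 203.0.113.1
print(best_default_route(routes, {1: False}).next_hop)   # 198.51.100.1
```

The secondary route never needs a tracking object of its own: it is always eligible and simply loses the comparison while the primary is present.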

Step 4: Test Failover

Simulate an ISP-A outage by adding a null route to the probe destination, forcing the probe to time out:

Router# configure terminal

! Block the probe target to simulate ISP-A losing internet reachability
! Note: this also blackholes transit traffic to 8.8.8.8 for the duration of the test
Router(config)# ip route 8.8.8.8 255.255.255.255 Null0
Router(config)# end

! Allow up to ~30 seconds (probe interval + 3 s timeout + 15 s down delay), then check
Router# show track 1
Track 1
  IP SLA 10 Reachability
  Reachability is Down
    2 changes, last change 00:00:18
  Latest operation return code: Timeout

Router# show ip route 0.0.0.0
  * 198.51.100.1, via GigabitEthernet0/0/1    ← secondary is now active

! Restore primary path
Router# configure terminal
Router(config)# no ip route 8.8.8.8 255.255.255.255 Null0
Router(config)# end

! After the next successful probe plus the 10-second up delay, track 1 returns Up and primary reinstalls
Router# show ip route 0.0.0.0
  * 203.0.113.1, via GigabitEthernet0/0/0    ← back to primary
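How long the failover takes depends on where in the probe cycle the outage begins. A rough worked calculation in Python, assuming a failed probe shows up as a timeout and that the outage persists: in the best case the outage starts just before a probe runs, and in the worst case just after one succeeds. The function name failover_window is hypothetical.

```python
def failover_window(frequency_s, timeout_ms, delay_down_s):
    """Return (best_case_s, worst_case_s) from outage start to route withdrawal."""
    timeout_s = timeout_ms / 1000
    best = timeout_s + delay_down_s                  # probe starts right away
    worst = frequency_s + timeout_s + delay_down_s   # wait a full interval first
    return best, worst


print(failover_window(10, 3000, 15))   # (18.0, 28.0)
```

With the values used in this guide, expect the secondary route to take over roughly 18 to 28 seconds after ISP-A stops passing traffic. Shortening frequency or delay down narrows that window at the cost of more sensitivity to transient loss.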

Enhanced Object Tracking: AND/OR Boolean Logic

What if you want failover only when both probe targets fail, not just one? Enhanced Object Tracking (EOT) lets you build boolean logic across multiple tracking objects, eliminating false positives from a single unreachable probe host.

Router# configure terminal

! Second probe: TCP to Cloudflare DNS as a backup verification
Router(config)# ip sla 11
Router(config-ip-sla)# tcp-connect 1.1.1.1 53 source-interface GigabitEthernet0/0/0 control disable
! Keep threshold at or below timeout (the default threshold is 5000 ms)
Router(config-ip-sla-tcp)# threshold 3000
Router(config-ip-sla-tcp)# timeout 3000
Router(config-ip-sla-tcp)# frequency 10
Router(config-ip-sla-tcp)# exit
Router(config)# ip sla schedule 11 life forever start-time now

! Track object 2 follows probe 11
Router(config)# track 2 ip sla 11 reachability
Router(config-track)# delay down 15 up 10
Router(config-track)# exit

! Track object 10 = track 1 OR track 2
! An OR list stays Up while either probe succeeds, so BOTH must fail to declare Down
! (an AND list would go Down as soon as either probe failed)
Router(config)# track 10 list boolean or
Router(config-track-list)# object 1
Router(config-track-list)# object 2
Router(config-track-list)# exit

! Replace single-probe route with combined tracking object
Router(config)# no ip route 0.0.0.0 0.0.0.0 203.0.113.1 5 track 1
Router(config)# ip route 0.0.0.0 0.0.0.0 203.0.113.1 5 track 10

Router(config)# end

Router# show track 10
Track 10
  List Boolean OR
  Boolean OR is Up
    2 objects, 2 up, 0 down
  object 1 Up
  object 2 Up

Now the primary route only withdraws when both Google DNS and Cloudflare DNS are unreachable via ISP-A — a strong signal that ISP-A genuinely lost connectivity rather than a single probe host being temporarily unavailable.
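The semantics of a boolean track list are worth spelling out, because the operator describes when the list is Up, not when it is Down. A minimal Python sketch (list_state is a hypothetical name) that prints the truth table for two member objects:

```python
from itertools import product


def list_state(operator, member_states):
    """Return True (Up) or False (Down) for a boolean track list."""
    if operator == "and":
        return all(member_states)   # Up only when every member is Up
    if operator == "or":
        return any(member_states)   # Up when at least one member is Up
    raise ValueError(f"unsupported operator: {operator}")


for probe1, probe2 in product([True, False], repeat=2):
    print(
        f"probe1={probe1!s:5} probe2={probe2!s:5} "
        f"and={list_state('and', [probe1, probe2])!s:5} "
        f"or={list_state('or', [probe1, probe2])}"
    )
```

With two probes, an or list stays Up until both members are Down, while an and list goes Down as soon as either member fails. Pick the operator by asking which of those two behaviors should withdraw the primary route.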

Logging Failover Events with EEM

IP SLA and object tracking report current state, and the most a state change produces on its own is a terse tracking message that names neither the circuit nor the action taken. Add EEM applets to write descriptive failover events to syslog. The Cisco EEM automation guide covers more complex event-driven scripts, but for ISP failover logging these two applets are all you need:

Router# configure terminal

! Log to syslog when ISP-A fails over to ISP-B
Router(config)# event manager applet ISP_FAILOVER_DOWN
Router(config-applet)# event track 10 state down
Router(config-applet)# action 1.0 syslog priority critical msg "ISP-A FAILOVER: Primary path down, switching to ISP-B"
Router(config-applet)# action 2.0 cli command "show ip route 0.0.0.0"
Router(config-applet)# exit

! Log when primary path recovers
Router(config)# event manager applet ISP_FAILOVER_UP
Router(config-applet)# event track 10 state up
Router(config-applet)# action 1.0 syslog priority warning msg "ISP-A RECOVERY: Primary path restored, returning to ISP-A"
Router(config-applet)# end

These syslog messages feed directly into your SIEM or LibreNMS alerting pipeline — failover events are logged automatically without anyone logging into the router.

Advanced: UDP Jitter Probes for WAN Performance

Beyond failover, IP SLA UDP Jitter probes give you continuous latency, jitter, and packet loss data for WAN circuits and VoIP paths. This requires an IP SLA responder on the far-end device.

! On the far-end router: enable the IP SLA responder
Far-Router# configure terminal
Far-Router(config)# ip sla responder
Far-Router(config)# end

! On the local router: configure UDP Jitter probe
Router# configure terminal
Router(config)# ip sla 20
Router(config-ip-sla)# udp-jitter 10.0.2.1 5000 num-packets 10 interval 20
Router(config-ip-sla-jitter)# frequency 60
Router(config-ip-sla-jitter)# exit
Router(config)# ip sla schedule 20 life forever start-time now
Router(config)# end

Router# show ip sla statistics 20 details
IPSLA operation id: 20
Type of operation: udp-jitter
        Latest RTT: 4 milliseconds
Latest operation return code: OK
RTT Values:
        RTT Min/Avg/Max: 3/4/6 milliseconds
Latency one-way time:
        Source to Destination Latency Min/Avg/Max: 1/2/3 milliseconds
        Destination to Source Latency Min/Avg/Max: 1/2/3 milliseconds
Jitter Time:
        Source to Destination Jitter Min/Avg/Max: 0/1/2 milliseconds
Packet Loss Values:
        Loss Source to Destination: 0
        Loss Destination to Source: 0

Export these statistics via SNMP (OID tree 1.3.6.1.4.1.9.9.42, the CISCO-RTTMON-MIB) into Grafana for long-term WAN performance trending. When jitter or packet loss crosses your SLA threshold, an IP SLA threshold reaction (ip sla reaction-configuration) can raise an SNMP trap, and an EEM applet can turn that into an alert or a QoS policy change.
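If you want to sanity-check exported numbers, it helps to know roughly how a jitter figure comes out of raw delay samples. The Python sketch below uses a simplified definition, the absolute change between consecutive one-way delays, and summarizes it as min/avg/max. The router's own accounting is more detailed, and the function name jitter_stats is hypothetical.

```python
def jitter_stats(delays_ms):
    """Return (min, avg, max) of the change between consecutive delay samples.

    Returns None when there are fewer than two samples.
    """
    diffs = [abs(later - earlier) for earlier, later in zip(delays_ms, delays_ms[1:])]
    if not diffs:
        return None
    return min(diffs), sum(diffs) / len(diffs), max(diffs)


# Four one-way delay samples in milliseconds.
print(jitter_stats([2, 3, 3, 1]))   # (0, 1.0, 2)
```

A steady delay produces zero jitter no matter how high the delay is, which is why latency and jitter need separate thresholds.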

Verification Cheat Sheet

Command                              What It Shows
show ip sla configuration 10         Probe parameters: type, target, frequency, timeout
show ip sla statistics 10            Latest result, RTT, success/failure counts
show ip sla statistics 10 details    Jitter, per-direction latency, packet loss (UDP Jitter)
show track 1                         Tracking object state and last change timestamp
show track brief                     All tracking objects and current states at a glance
show ip route 0.0.0.0                Which default route is currently active in RIB
debug ip sla trace                   Real-time probe trace (use carefully in production)

Common Pitfalls

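No source-interface on the probe: Without source-interface, the probe can be sourced from the secondary ISP’s address and return success even when ISP-A is completely down. Always source probes from the interface you’re monitoring, and remember that the source setting does not choose the egress path; a host route to the probe target via ISP-A does.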

Firewall blocking the probe: If you use ICMP Echo probes, upstream ACLs or provider firewalls may silently drop them. Switch to TCP Connect on port 53, 80, or 443 — much harder to inadvertently block.

Missing delay dampening: Without delay down X up Y, a single probe timeout immediately withdraws the primary route. Add dampening in production: a 15-second down delay adds only 15 seconds to a real failover and keeps transient packet loss from triggering unnecessary failovers.

Forgetting to schedule: Configuring ip sla 10 without ip sla schedule 10 life forever start-time now means the probe never starts. Verify with show ip sla statistics — if the operation count isn’t incrementing, check the schedule.

Final Thoughts

IP SLA with Object Tracking transforms dual-ISP failover from a manual procedure into an automated, probe-verified system that responds to real connectivity loss, not just physical link state. The combination of TCP Connect probes, boolean tracking logic, dampening delays, and EEM logging gives you failover that’s sensitive enough to catch real outages but resilient enough to ignore transient probe failures.

This same pattern scales beyond ISP failover: track MPLS path availability, VPN tunnel reachability, or data center connectivity. Pair it with the Python network automation toolkit to validate failover state across a fleet of routers automatically, and you’ve got a self-healing WAN edge that operates the same whether it’s 2 p.m. or 2 a.m. For a broader look at how IOS-XE builds a security foundation on top of this kind of automated policy, the Control Plane Policing guide covers how to protect the routing engine itself while your SLA probes keep tabs on the data plane.
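As a starting point for that kind of fleet check, here is a small Python sketch that turns show track output into a dictionary of object states. It assumes you already collect the command output as text with whatever tooling you use, and its regular expressions are based only on the sample output shown in this guide, so treat them as assumptions to verify against your own routers. The name parse_show_track is hypothetical.

```python
import re

# Patterns derived from the sample output in this guide (assumptions).
TRACK_HEADER = re.compile(r"^Track (\d+)\s*$")
TRACK_STATE = re.compile(
    r"^\s+(?:Reachability|Boolean (?:AND|OR)) is (Up|Down)\b", re.IGNORECASE
)


def parse_show_track(output):
    """Map tracking object number to 'Up' or 'Down'."""
    states = {}
    current = None
    for line in output.splitlines():
        header = TRACK_HEADER.match(line)
        if header:
            current = int(header.group(1))
            continue
        state = TRACK_STATE.match(line)
        if state and current is not None:
            states[current] = state.group(1).capitalize()
    return states


sample = """Track 1
  IP SLA 10 Reachability
  Reachability is Down
    2 changes, last change 00:00:18
  Latest operation return code: Timeout
"""
print(parse_show_track(sample))   # {1: 'Down'}
```

Comparing the parsed state against the next hop reported by show ip route 0.0.0.0 on each router is a simple way to confirm that tracking and routing agree.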
