Hello there! It has been a long time since my last post. Currently I am dealing with EVPN VxLAN and managed to get a demonstration running in my small, modest virtual lab environment.

One of my first posts on this blog was about how to run VxLAN over an IPsec tunnel using a Fortigate firewall. That setup has some limitations when it comes to scalability and performance. This time we are going to look at a very simple design and implementation using a spine & leaf architecture and unicast to replicate our data across the network.

There are still a lot of legacy data centers running spanning-tree out there. It is not a crime to do so, since the network should meet customer requirements without adding unnecessary complexity. But… imagine you are running a financial network and your traffic flow relies on STP. You will face two challenges:

  1. The convergence time between failure and recovery is possibly too high.
  2. You will have some interfaces in the blocking state, which prevents you from using all the bandwidth available in the network. More precisely, you will end up with a lot of unused ports.

Let us see how we can solve some of these issues using a spine and leaf architecture. There we go with the topology…

Spine and Leaf and its scalability

In a spine and leaf architecture it is much easier to expand your data center, because the devices are always arranged in two layers and therefore always the same number of hops away from the other devices in the network. Spines run independently of each other. Whenever you add a new one, you only need to connect all the desired leafs to it.

It is easy. Whenever you need more bandwidth in your network, you add more spines. This leads to more uplinks from your leafs and therefore additional bandwidth to send data over these new links.

Whenever you need more port capacity, you add more leafs to attach more hosts to your network.

Partial- vs. Full Mesh

We can see three spines and three leafs distributed across two data centers. Normally, when you google this kind of design, you will mostly (98% of the time) see a full mesh topology. But that does not mean it is automatically the right way to build it.

Let us discuss my first choice in this design. I have been taught that there is no „best“ way to build a network. When it comes to design, our choice has to meet the customer requirements.

Choosing a partial mesh topology reduces the number of links between data centers, which reduces costs without losing network reliability: we can still lose SPINE-1 and SPINE-3 and LEAF-101 will still be able to communicate with all the leafs in the network.

Another important point to mention is that this topology can prevent an increase in round-trip time when two applications running in the same data center but on different leafs want to communicate with each other. In a full mesh topology nothing prevents the packet from being sent across a spine in the remote DC, since all spines are always one hop away. This is very critical in financial networks that are very sensitive to latency.

Choosing my uplinks

It does not matter which topology you choose: when it comes to redundancy/reliability, you should always know the mapping of the interfaces to the ASICs. Two uplinks coming from the same ASIC won't offer redundancy/reliability if there is a problem with that specific ASIC.

Sadly I do not have any hardware to demonstrate it. But according to the official documentation running „show interface hardware-mappings“ in the console will show us the mapping of the interfaces.

IP-management

I configured the interfaces in a way that limits my scalability to 154 leafs or 99 spines, which could lead to issues in a large data center. It is better to assign a contiguous /31 or /30 range. Here is the scheme I used, to make every link easier to understand.

Interfaces: 10.LEAF-ID.SPINE-ID.0/30
Loopback: 10.0.0.DEVICE-ID

Example: LEAF-101 interface towards SPINE-1
On LEAF-101
10.101.1.1 255.255.255.252

On SPINE-1
10.101.1.2 255.255.255.252
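
The same convention applied to the LEAF-101 ↔ SPINE-2 link would give the following addresses (derived purely from the scheme above, not copied from a device):

On LEAF-101
10.101.2.1 255.255.255.252

On SPINE-2
10.101.2.2 255.255.255.252

Loopbacks: 10.0.0.101 (LEAF-101) and 10.0.0.2 (SPINE-2)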

Configuration

Features that need to be enabled on spine switches.

feature ospf

Features that need to be enabled on leaf switches.

feature ospf
feature nv overlay
feature vn-segment-vlan-based

First of all, I established IP connectivity between all devices using the previously mentioned IP scheme.

IP-connectivity

SPINES (Example on SPINE-1)

int e1/1
no switchport
ip address 10.101.1.2/30
no shut
ip router ospf 1 area 0.0.0.0
ip ospf network point-to-point

int e1/2
no switchport
ip address 10.102.1.2/30
no shut
ip router ospf 1 area 0.0.0.0
ip ospf network point-to-point

interface loopback1
ip address 10.0.0.1/32
ip router ospf 1 area 0.0.0.0

router ospf 1
router-id 10.0.0.1
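
Before moving on, it is worth making sure the underlay is actually up. A quick sanity check on SPINE-1 could be something like this (I am only listing the commands here; the exact output depends on the NX-OS release):

! all connected leafs should show up as FULL OSPF neighbors
show ip ospf neighbors
! the leaf loopbacks (10.0.0.101, 10.0.0.102, 10.0.0.103) should be learned via OSPF
show ip route ospf-1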

LEAFS (Example of LEAF-102)

feature nv overlay
feature vn-segment-vlan-based
feature ospf

int e1/1
no switchport
ip address 10.102.1.1/30
no shut
ip router ospf 1 area 0.0.0.0
ip ospf network point-to-point

int e1/2
no switchport
ip address 10.102.2.1/30
no shut
ip router ospf 1 area 0.0.0.0
ip ospf network point-to-point

int e1/3
no switchport
ip address 10.102.3.1/30
no shut
ip router ospf 1 area 0.0.0.0
ip ospf network point-to-point

interface loopback1
ip address 10.0.0.102/32
ip router ospf 1 area 0.0.0.0

router ospf 1
router-id 10.0.0.102
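
Since the NVE interface configured later uses loopback1 as its source, what really matters in the underlay is leaf-to-leaf loopback reachability. A simple check from LEAF-102 could look like this, sourcing the ping from the loopback just like the VxLAN traffic will be:

ping 10.0.0.101 source 10.0.0.102
ping 10.0.0.103 source 10.0.0.102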

Creating VLAN 10 and mapping it to VNI 10002 (only on leafs)

vlan 10
vn-segment 10002
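
For completeness: the routers used for the test later (R7 and R8) simply sit in access ports in VLAN 10 on their local leafs. The interface number below is only an assumption for illustration, not taken from my lab:

interface e1/5
switchport
switchport mode access
switchport access vlan 10
no shut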

Creating an NVE (Network Virtualization Edge) interface – the logical interface where the encapsulation and de-encapsulation occur. Summarized… creating the interface where the magic happens.

interface nve1
  no shut
  source-interface loopback1
  member vni 10002
    ingress-replication protocol static
      peer-ip 10.0.0.101
      peer-ip 10.0.0.103

By configuring the interface with the „ingress-replication protocol static“ statement we are using unicast: we tell the switch that broadcast/unknown-unicast/multicast traffic for the segment defined above (member vni 10002) should be replicated to the peers specified below this statement (peer-ip).
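
To verify that the overlay came up, something like the following on a leaf should list the statically configured peers and the state of the VNI (commands only, output omitted):

show nve peers
show nve vni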

Let us test it. I will ping from R7 (192.168.10.7) to R8 (192.168.10.8), which sit in different locations but share the same VLAN.

R7#ping 192.168.10.8
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.8, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 7/7/9 ms
R7#

Let us now do a continuous ping and shut down some interfaces on the leafs towards the spines.

R7#ping 192.168.10.8 repeat 1000000000
Type escape sequence to abort.
Sending 1000000000, 100-byte ICMP Echos to 192.168.10.8, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.
Success rate is 99 percent (2511/2513), round-trip min/avg/max = 6/11/281 ms
R7#

While pinging, I shut down the interface towards SPINE-2 and only one ping was lost.

We can see that the network recovered very quickly.

Conclusion

Using unicast is not a bad idea if you have a small data center. Otherwise, the number of peerings will grow with the number of leafs, and whenever a new leaf is added you will have to reconfigure all the other leafs so they can peer with the new one.
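
To illustrate that last point: if a new (hypothetical) LEAF-104 with loopback 10.0.0.104 joined the fabric, every existing leaf would need an additional peer statement like the one below, on top of the full configuration of LEAF-104 itself:

interface nve1
  member vni 10002
    ingress-replication protocol static
      peer-ip 10.0.0.104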

There are still more features that need to be enabled to achieve a faster failover, even if it already looks great, or more precisely, better than with STP.

Thanks for reading, and I hope it was helpful. Cheers…
