After announcing it had cooked up its own fast data center switch called Wedge in 2014, Facebook last week announced its engineers have now designed an even faster, next-generation switch. The first Wedge was a 40 Gigabit Ethernet switch, but the second one pushes 100 Gigs – an extremely high bandwidth for a switch of its type.
Bandwidth aside, however (you can get a 100-Gig switch from the likes of Cisco or Juniper), one of Wedge’s distinctive features is that it is software-agnostic. Wedge marks the first time Facebook has applied its philosophy of disaggregation, used in its custom server designs, to networking. Wedge disaggregates networking hardware from networking software, which is something incumbent data center network vendors have started doing only recently.
The point of disaggregation is being able to advance individual system components independently of the rest of the system. The first step toward disaggregating the network, separating hardware from software, was meant to “spur the development of more choices for each,” Facebook engineers wrote in a blog post last year, when the company announced the first Wedge switch and FBOSS. FBOSS is its Linux-based network operating system, which lets data center operators manage network switches with tools similar to those they use to manage compute and storage.
Now, as the company designed its second Wedge, as well as the Wedge-based high-capacity aggregation switch, called Six Pack, it is seeing the benefits of that disaggregation come to life. It is designing more powerful hardware independently of software.
The same FBOSS software runs on Wedge 40, Six Pack, and Wedge 100, Jay Parikh, Facebook’s VP of engineering, said at the Structure conference in San Francisco last week. Switch software development at Facebook is now on a separate cycle from switch hardware development, he said.
The company developed FBOSS to handle a weekly rate of feature roll-outs and bug fixes, Facebook engineers Zhipping Yao and Jasmeet Bagga wrote in a blog post last week. One of the key capabilities supporting that pace is the ability to update thousands of switches without traffic loss that could lead to outages.
The custom-built tool that deploys weekly FBOSS software updates across the network is called fbossdeploy and is modeled after the company’s software load balancer, Proxygen. It deploys code to a small set of switches first, and the team monitors them for problems before deploying at full scale.
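The staged approach described above, a small first batch followed by a full rollout, can be illustrated with a short sketch. This is not fbossdeploy's code or interface, which the article does not describe; the function `staged_rollout`, its parameters, and the switch names are all hypothetical, and the automated health check stands in for the monitoring Facebook's team performs.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    switches: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    canary_count: int = 2,
) -> List[str]:
    """Deploy to a small first batch, check it, then deploy to the rest.

    Hypothetical illustration only. Returns the switches that were
    updated; the rollout halts after the first batch if any switch in
    that batch reports unhealthy.
    """
    canaries = list(switches[:canary_count])
    remainder = list(switches[canary_count:])

    updated: List[str] = []

    # Stage 1: push the update to the small first batch only.
    for name in canaries:
        deploy(name)
        updated.append(name)

    # Monitoring step: stop here if any first-batch switch has a problem.
    if not all(is_healthy(name) for name in canaries):
        return updated

    # Stage 2: the first batch looks fine, so continue at full scale.
    for name in remainder:
        deploy(name)
        updated.append(name)
    return updated
```

In this sketch, calling `staged_rollout(fleet, deploy, is_healthy)` with a health check that fails for one of the first two switches leaves the rest of the fleet untouched, which is the property that keeps a bad build from reaching every switch at once.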
The software stack running on a Wedge switch is fairly similar to the one running on compute servers in Facebook data centers. Both have the Linux kernel, system tools and libraries, monitoring daemons, and configuration management. In addition to those components, the network stack has a routing daemon and an FBOSS agent.
There are now thousands of Wedge 40 switches running in production in Facebook data centers, and the goal is to have Wedge switches replace every top-of-rack switch in the company’s infrastructure. In an emailed statement, the company also said it has “hit the limits of what 40 Gbps switches can handle,” so the next step is to complete and start deploying the 100-Gig, 32-port Wedge.