Research applications perform up to 60 percent faster on Purdue University's 10 Gigabit Ethernet network.
Business Challenge
Founded in 1869, Purdue University offers undergraduate and graduate degrees and is one of the leading research institutions in the United States. Faculty have been recognized for their work on diverse research topics, such as climate modeling, DNA sequencing, geoengineering techniques to lessen the effects of global warming, and nanotechnology.
To meet constantly increasing needs for computational capacity, Purdue's central IT organization (ITaP) and faculty researchers work in partnership to build new server clusters, usually once a year. ITaP provides and maintains the racks, networking, and storage, and faculty members purchase their own servers. When faculty make a request to purchase or temporarily borrow additional server nodes, the IT department provisions them within four hours. "This model allows faculty to buy computation for their average needs instead of their peak needs," says John Campbell, associate vice president of ITaP's Rosen Center for Advanced Computing, Purdue University.
In 2009, ITaP began planning the "Coates Cluster" (named after Ben Coates, former head of Purdue's electrical engineering department), in which every server would connect over 10 Gigabit Ethernet. With 1280 nodes, it would be the world's largest academic cluster built entirely on 10 Gigabit Ethernet. To handle the resulting traffic volumes, the cluster needed a cost-effective 10 Gigabit Ethernet switch platform.
Solution and Results
The IT department chose the Cisco® Nexus platform after comparing its price-performance with other networking alternatives. "We needed a platform that would deliver the performance our researchers require and enable us to scale more cost-effectively," says Campbell.
The new cluster consists of 40 server racks, each connected over lossless 10 Gigabit Ethernet to a Cisco Nexus 5000 Switch at the top of the rack. The Cisco Nexus 5000 Switches connect to dual Cisco Nexus 7000 Switches at the core, and also to Network File System (NFS)-mounted storage. Major benefits of the Cisco Nexus platform for Purdue University's new cluster include:
• High Performance: The Cisco Nexus platform provides the combination of high bandwidth and very low latency required by compute-intensive applications. "A mechanical-engineering faculty member reported that her application performed 60 percent faster after it was moved to the new cluster," Campbell says. "The gain is attributable to the increased network bandwidth, because the difference in processing power in the new cluster is negligible."
• Avoiding Code Changes: Researchers' existing applications can operate over the 10 Gigabit Ethernet network without modification. "We didn't want to require our researchers to recompile their applications," says Campbell. "The only difference now is that they submit jobs to a different queue and experience better performance."
• Familiar Management Interface: The university's core network already includes Cisco equipment, so IT staff were experienced with Cisco hardware and needed no additional training to use Cisco NX-OS.
• Simplified IT Organization: The IT department has staff who specialize in high-performance computing, networking, and storage. "Now the networking group manages all networking on the cluster, freeing up the HPC [high-performance computing] group to focus on optimizing application performance," Campbell says.
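To give a rough sense of scale for the layout described above, the short Python sketch below works through the arithmetic implied by 1280 nodes in 40 racks. It assumes the nodes are spread evenly across the racks and that each server has a single 10 Gigabit Ethernet link; neither detail is stated in this case study, and the function and key names are illustrative only. The number of uplinks from each Cisco Nexus 5000 Switch to the core is not given, so no oversubscription ratio is calculated.

```python
# Figures taken from the case study.
TOTAL_NODES = 1280
RACKS = 40
LINK_GBPS = 10  # assumption: one 10 Gigabit Ethernet link per server


def rack_summary(total_nodes, racks, link_gbps):
    """Return per-rack and cluster-wide server-facing bandwidth in Gbps.

    Assumes nodes are distributed evenly across racks (not confirmed
    by the case study).
    """
    if total_nodes % racks != 0:
        raise ValueError("nodes do not divide evenly across racks")
    nodes_per_rack = total_nodes // racks
    return {
        "nodes_per_rack": nodes_per_rack,
        "gbps_per_rack": nodes_per_rack * link_gbps,
        "gbps_total": total_nodes * link_gbps,
    }


summary = rack_summary(TOTAL_NODES, RACKS, LINK_GBPS)
print(summary)
```

Under these assumptions, each top-of-rack switch serves 32 servers, or 320 Gbps of server-facing capacity per rack.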
ITaP is currently evaluating server virtualization, with the goal of adding checkpointing to applications that do not inherently support it. One chemistry research application, for example, runs processes that take up to 30 days, making it difficult to schedule jobs between maintenance activities. Checkpointing the virtual machine will allow the researcher to start a job at any time and resume it where it left off after an interruption for maintenance. The Cisco Nexus platform offers the bandwidth to support virtualization.
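The idea behind checkpointing can be illustrated with a minimal Python sketch. Note that this shows application-level checkpointing, in which the program saves its own progress to a file, not the virtual machine checkpointing that ITaP is evaluating, which captures the state of an unmodified application from outside. The function names, the file name, and the toy workload (summing integers) are all hypothetical.

```python
import json
import os
import tempfile


def _save(state, path):
    """Write the checkpoint to a temporary file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # replaces the old checkpoint in one step


def run_with_checkpoint(total_steps, checkpoint_path, stop_after=None):
    """Sum 0..total_steps-1, resuming from a saved checkpoint if one exists.

    stop_after simulates an interruption (for example, a maintenance
    window) after that many steps in this run; the function then
    returns None instead of a result.
    """
    state = {"step": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            return None  # interrupted; progress is already on disk
        state["total"] += state["step"]
        state["step"] += 1
        steps_this_run += 1
        _save(state, checkpoint_path)
    return state["total"]


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "job.ckpt")
    first = run_with_checkpoint(10, path, stop_after=4)  # interrupted run
    second = run_with_checkpoint(10, path)               # resumes at step 4

print(first, second)
```

The second call picks up at step 4 rather than starting over. Checkpointing a virtual machine aims for the same effect for a job that runs up to 30 days, without changing the application's code.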