High performance networks for the ATLAS Tier-1 @ TRIUMF
Payne, C., Deatrich, D., Liu, S., McDonald, S., Tafirout, R., Walker, R., Wong, A. and Vetterli, M. (2008) High performance networks for the ATLAS Tier-1 @ TRIUMF. In: 22nd International Symposium on High Performance Computing Systems and Applications, HPCS 2008, 9 - 11 June, Quebec, Canada pp. 161-166.
Networking at an ATLAS Tier-1 (T1) facility is demanding and vital to the overall performance and efficiency of the facility. External connectivity of the facility to other tiers of the Large Hadron Collider Optical Private Network (LHCOPN) is largely via dedicated lightpaths, as required to meet Memorandum of Understanding (MOU) commitments. Our primary dedicated link to CERN has an independent, although smaller-capacity, dedicated backup link for redundancy. Dedicated lightpaths to the Canadian Tier-2 facilities and to the international partner Tier-1 facilities fail over to national and international research networks in the event of failure. The distance between TRIUMF and CERN, and even between TRIUMF and some of its Tier-2 facilities in Canada, is thousands of kilometers. Transferring hundreds of terabytes of data per year over such distances and complex networks requires both dedicated bandwidth and network resiliency. Failure scenarios, including both failover and fail-back, must be handled efficiently and, for the most part, automatically. Although modern network routing protocols handle this well, monitoring processes become key to managing the infrastructure as the complexity of our Tier-1 site connectivity grows. Internal networking efficiency is also vital to the ATLAS computing model. Large data sets are moved from onsite storage to local scratch disks before analysis, and proper network scaling is vital for efficient use of compute nodes. Although 10 Gigabit network infrastructure (routers) is well established, server 10 Gigabit network components and drivers are not as mature as their 1 Gigabit counterparts, and resource issues have been observed during extreme load testing. This paper discusses internal facility networking as well as external connectivity issues.
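As a rough illustration of the scale involved, the sketch below converts an annual data volume into an average sustained rate and estimates the bandwidth-delay product of a long-distance link. The input values (100 TB per year, a 10 Gbit/s link, a 150 ms round-trip time) and the function names are illustrative assumptions, not figures or code from the paper.

```python
# Back-of-envelope link arithmetic; all inputs below are assumed, not measured.
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_mbps(terabytes_per_year: float) -> float:
    """Average rate (Mbit/s) needed to move a yearly volume, using decimal TB."""
    bits = terabytes_per_year * 1e12 * 8
    return bits / SECONDS_PER_YEAR / 1e6


def bandwidth_delay_product_mb(link_gbps: float, rtt_ms: float) -> float:
    """Data in flight (MB) needed to keep a link of this rate and RTT full."""
    bits_in_flight = link_gbps * 1e9 * (rtt_ms / 1e3)
    return bits_in_flight / 8 / 1e6


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the calculation.
    print(f"{average_rate_mbps(100):.1f} Mbit/s average for 100 TB/year")
    print(f"{bandwidth_delay_product_mb(10, 150):.1f} MB in flight at 10 Gbit/s, 150 ms RTT")
```

Under these assumptions the yearly average rate is small compared with the link capacity, while the amount of data that must be in flight to fill a long path is large. Real transfers are not spread evenly over a year, so the average understates peak demand.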
|Publication Type:||Conference Paper|
|Copyright:||© 2008 IEEE.|