Date of Award
Master of Science (MS)
Ronald A. Marsh
The ability to gather data from many types of new information sources has grown quickly using new technologies. The ability to store and retrieve large quantities of data from these new sources has created a need for computing platforms that are able to process the data for information. High Performance Computing Cluster systems have been developed to fulfill a role required for fast processing of large amounts of data for many difficult types of computing applications.
Beowulf Clusters use many separate compute nodes to create a tightly coupled parallel HPCC system. The ability for a Beowulf Cluster HPCC system to process data depends on the ability of the compute nodes within the HPCC system to be able to retrieve data, share data, and store data with as little delay as possible. With many compute nodes competing to exchange data over limited network connections, network congestion can occur that can negatively impact the speed of computations.
With concerns about network performance optimization, and uneven distribution of computational capacity, it is important for Beowulf HPCC System Administrators to be able to evaluate real-time data transfer metrics for congestion within a particular HPCC system. In this thesis, Heat-Maps will be created to identify potential issues with Infiniband network congestion due to simultaneous data exchanges between compute nodes.
Aguilar, Michael James, "Identifying Data Exchange Congestion Through Real-Time Monitoring Of Beowulf Cluster Infiniband Networks" (2015). Theses and Dissertations. 1858.