kernel and file locks. The processors without SSDs maintain page caches to serve applications' IO requests. IO requests from applications are routed to the caching nodes via message passing to reduce remote memory access. The caching nodes maintain message queues as well as a pool of threads for processing messages. Upon completion of an IO request, the data is written back to the destination memory directly and then a reply is sent to the issuing thread. This design opens opportunities to move application computation to the cache to further reduce remote memory access.

We separate IO nodes from caching nodes to balance computation. IO operations require substantial CPU, and running a cache on an IO node overloads the processor and reduces IOPS. This is a design choice, not a requirement, i.e., we can run a set-associative cache on the IO nodes at the same time. In a NUMA machine, a large fraction of IOs require remote memory transfers. This happens when application threads run on nodes other than the IO nodes. Separating the cache and IO nodes does increase remote memory transfers. However, balanced CPU utilization compensates for this effect in overall performance. As systems scale to more processors, we expect that few processors will have PCI buses, which will increase the CPU load on these nodes, so splitting these functions will remain advantageous.

Message passing creates many small requests, and synchronizing these requests can become expensive. Message passing may block sending threads if their queue is full and receiving threads if their queue is empty. Synchronization of requests often incurs cache-line invalidation on shared data and thread rescheduling. Frequent thread rescheduling wastes CPU cycles, preventing application threads from getting enough CPU. We reduce synchronization overheads by amortizing them over larger messages, batching many requests into each message, as sketched below.
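A minimal C++ sketch of this batching scheme; the names io_request, msg_queue, and worker_loop are illustrative simplifications, not our system's actual interfaces. A sender acquires the queue lock once per batch, so locking and cache-line invalidation costs are amortized over many requests, and full or empty queues block senders and receivers as described above.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <vector>

// Illustrative request type; the real messages carry file offset,
// length, the destination buffer, and the issuing thread's identity.
struct io_request {
    long offset;
    long size;
    void *buffer;
};

// A bounded message queue moving requests in batches. Senders block
// while the queue is full and receivers block while it is empty;
// taking the lock once per batch amortizes locking and cache-line
// traffic over many requests.
class msg_queue {
    std::queue<io_request> q;
    std::mutex m;
    std::condition_variable not_full, not_empty;
    size_t capacity;

public:
    explicit msg_queue(size_t cap) : capacity(cap) {}

    // Enqueue a whole batch under one lock acquisition.
    // Assumes batch.size() <= capacity.
    void send_batch(const std::vector<io_request> &batch) {
        std::unique_lock<std::mutex> lock(m);
        not_full.wait(lock,
                      [&] { return q.size() + batch.size() <= capacity; });
        for (const io_request &r : batch)
            q.push(r);
        not_empty.notify_all();
    }

    // Dequeue up to max_batch requests with one lock acquisition.
    std::vector<io_request> recv_batch(size_t max_batch) {
        std::unique_lock<std::mutex> lock(m);
        not_empty.wait(lock, [&] { return !q.empty(); });
        std::vector<io_request> batch;
        while (!q.empty() && batch.size() < max_batch) {
            batch.push_back(q.front());
            q.pop();
        }
        not_full.notify_all();
        return batch;
    }
};

// A caching-node worker: pull a batch, serve each request against the
// page cache (elided here), writing data directly to the request's
// destination buffer, then send one reply per request back toward the
// issuing threads.
void worker_loop(msg_queue &requests, msg_queue &replies) {
    for (;;) {
        std::vector<io_request> batch = requests.recv_batch(64);
        for (io_request &r : batch) {
            (void)r;  // page cache lookup and memory copy elided
        }
        replies.send_batch(batch);
    }
}
```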
5. Evaluation

We conduct experiments on a non-uniform memory architecture (NUMA) machine with four Intel Xeon E5-4620 processors, clocked at 2.2GHz, and 512GB of DDR3-1333 memory. Each processor has eight cores with hyperthreading enabled, resulting in 16 logical cores. Only two processors in the machine have PCI buses connected to them. The machine has three LSI SAS 9217-8i host bus adapters (HBAs) connected to a SuperMicro storage chassis, in which 16 OCZ Vertex 4 SSDs are installed. In addition to the LSI HBAs, there is one RAID controller that connects to the disks holding the root filesystem. The machine runs Ubuntu Linux 12.04 and Linux kernel v3.2.30.

To compare the peak performance of our system design with that of Linux, we measure the system in two configurations: an SMP architecture using a single processor and NUMA using all processors. On all IO measures, Linux performs best from a single processor; remote memory operations make using all four processors slower.

SMP configuration: 16 SSDs connect to a single processor through two LSI HBAs controlling eight SSDs each. All threads run on the same processor. Data are striped across the SSDs.

NUMA configuration: 16 SSDs are connected to two processors. Processor 0 has five SSDs attached to an LSI HBA and one through the RAID controller. Processor 1 has two LSI HBAs with five SSDs each. Application threads are evenly distributed across all four processors. Data are distributed across the SSDs.
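For concreteness, below is a minimal sketch of round-robin striping as used in the SMP configuration, mapping a logical file offset to an SSD and a per-device offset; the 4KB stripe unit is an assumption for illustration, not necessarily our system's parameter.

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative parameters: the real stripe unit size and device order
// depend on how the array is configured.
static const uint64_t STRIPE_SIZE = 4096;  // bytes per stripe unit
static const int NUM_SSDS = 16;

// Map a logical file offset to (SSD index, offset on that SSD),
// assuming simple round-robin striping of fixed-size units.
static void map_offset(uint64_t offset, int *ssd, uint64_t *ssd_offset) {
    uint64_t stripe_num = offset / STRIPE_SIZE;
    *ssd = (int)(stripe_num % NUM_SSDS);
    *ssd_offset = (stripe_num / NUM_SSDS) * STRIPE_SIZE
                  + offset % STRIPE_SIZE;
}

int main() {
    int ssd;
    uint64_t dev_off;
    map_offset(1048576, &ssd, &dev_off);  // 1MB into the file
    std::printf("offset 1MB -> SSD %d, device offset %llu\n",
                ssd, (unsigned long long)dev_off);
    return 0;
}
```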
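The NUMA configuration spreads application threads evenly across all four processors. A minimal Linux sketch of one way to do this with CPU affinity; the contiguous per-processor core numbering and the use of pthread_setaffinity_np are assumptions for illustration, not necessarily how our system places threads.

```cpp
// Compile with g++ -pthread on Linux; pthread_setaffinity_np is a GNU
// extension.
#include <pthread.h>
#include <sched.h>

// Assumed topology: four processors with 16 logical cores each,
// numbered contiguously per processor. Real core numbering varies and
// should be checked (e.g., with hwloc or /proc/cpuinfo).
static const int NUM_NODES = 4;
static const int CORES_PER_NODE = 16;

// Pin a thread to one processor, chosen round-robin by thread index,
// so application threads spread evenly across all processors.
static int pin_to_node(pthread_t thread, int thread_idx) {
    int node = thread_idx % NUM_NODES;
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int c = 0; c < CORES_PER_NODE; c++)
        CPU_SET(node * CORES_PER_NODE + c, &set);
    return pthread_setaffinity_np(thread, sizeof(cpu_set_t), &set);
}

int main() {
    // Example: pin the calling thread as if it were thread 0.
    return pin_to_node(pthread_self(), 0);
}
```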