I/O Scheduling for SAN and Virtualization

by Martin Monperrus

I recently encountered performance and reliability problems using iSCSI. One of the most effective solutions was to set the I/O scheduler to "noop". Noop is the simplest elevator/scheduler in Linux. It is based on a FIFO queue: first request in, first out. It does not reorder I/O requests, but it may merge adjacent requests as an optimization.

Noop seems to be the best I/O scheduler for Storage Area Networks (SANs, e.g. iSCSI) because 1) it makes no assumptions about the underlying physical device, and 2) unlike other I/O schedulers, it does not perform re-orderings that can compete or clash with the underlying intelligent I/O controllers (NCQ, HBA, etc.). For the same reasons, noop also seems appropriate for virtualized environments and RAID devices.

To set the scheduler to "noop" (as root):
$ echo noop > /sys/block/sda/queue/scheduler
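This change applies per device and is not persistent across reboots. To check which scheduler is currently active on each block device (the active one is shown in brackets), something like the following sketch can be used; device names such as sda are examples and will differ on your system:

```shell
# List the available I/O schedulers for each block device.
# The active scheduler appears in brackets, e.g. "[noop] anticipatory deadline cfq".
for f in /sys/block/*/queue/scheduler; do
  [ -e "$f" ] || continue            # skip if sysfs is not available
  printf '%s: %s\n' "$f" "$(cat "$f")"
done
```

If sysfs is mounted, this prints one line per device; otherwise it prints nothing and exits cleanly.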

See also: scheduler queue size and resilience to heavy I/O

Main sources on this topic:
"The NOOP scheduler is a simple FIFO queue and uses the minimal amount of CPU/instructions per I/O to accomplish the basic merging and sorting functionality to complete the I/O. It assumes performance of the I/O has been or will be optimized at the block device (memory-disk) or with an intelligent HBA or externally attached controller." (Choosing an I/O Scheduler for Red Hat® Enterprise Linux® 4 and the 2.6 Kernel)

"If the no-op I/O scheduler finds that the new request cannot be merged into the existing request because it is not near enough, the scheduler looks for a place within the queue between existing requests. If the new request calls for I/O to sectors between existing requests it is inserted into the queue at the determined position. If there are no places the request can be inserted, it is placed on the tail of the request queue." (The Linux Kernel Primer: A Top-Down Approach for x86 and PowerPC Architectures)

"The main usage of the noop scheduler revolves around non disk-based block devices (such as memory devices [sic SSD's, flash disk]), as well as specialized software or hardware environments that incorporate their own I/O scheduling [sic SAN] and (large) caching functionality [sic SAN or caching RAID controller], and therefore require only minimal assistance from the kernel. Therefore, in large I/O subsystems that incorporate RAID controllers and a vast number of contemporary physical disk drives (TCQ drives), the noop scheduler has the potential to outperform the other 3 I/O schedulers as the workload increases." (Enhancements to Linux I/O Scheduling, found in this excellent post at Sybase)

"The Noop scheduler is implemented using a simple FIFO queue and performs only basic merging and sorting. It assumes that I/O performance will be optimized in the block device. It is particularly appropriate for devices such as Storage Area Networks (SANs) that have much more knowledge of the underlying physical devices than the operating system." (Does Virtualization Make Disk Scheduling Passé?)

"By the way, a similar argument applies to virtualized guest operating systems running under a host OS. The host is probably in a better position to sort I/O than the guests, so they should also just use the noop I/O elevator." (Linux i/o elevators considered harmful)

"Redhat suggests noop in virtualized environments. This leaves i/o scheduling to ESX and/or your storage. Your guest knows very little about the actual disk layout; it's many layers of virtualization deep. Makes little sense to have your guest try to optimize in this situation." (iSCSI, Linux I/O scheduler, VMFS, citing http://kbase.redhat.com/faq/docs/DOC-5428)

"According to a citation in Dell Equallogic (2009), the Open-iSCSI group reports that sometimes the NOOP scheduler works best for iSCSI server environments." (Red Hat Enterprise Linux 5.x iSCSI and Device Mapper Multipath HOWTO).
