Scheduler queue size and resilience to heavy IO

by Martin Monperrus
My server regularly stalled, froze, and sometimes crashed under heavy write load (open-iscsi on the root filesystem). I solved the problem by changing the I/O scheduler and increasing the maximum I/O queue size (nr_requests).
$ echo noop > /sys/block/sda/queue/scheduler 
$ echo 1024 > /sys/block/sda/queue/nr_requests
While the effect of the scheduler is recognized by authoritative sources (see this post), the effect of the queue size is less well known. Apart from Yoshinori Matsunobu's post "Linux I/O scheduler queue size and MyISAM performance", few sources discuss it. As far as I understand, increasing the queue size has three effects: first, it keeps more pending requests in RAM (which is OK if you don't need to read after write); second, it increases the probability of merging requests; and third, it improves their ordering.
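Before changing anything, it helps to see what a device currently uses. The following is a minimal sketch, not a definitive tool: show_queue_settings and SYSFS_ROOT are names I made up for the example, and the only assumption about sysfs is that the active scheduler appears in square brackets in the scheduler file. Also note that, as far as I know, kernels using the multi-queue block layer call the noop scheduler "none".

```shell
#!/bin/sh
# Sketch: print the active scheduler and nr_requests of a block device.
# SYSFS_ROOT and show_queue_settings are hypothetical names for this example;
# SYSFS_ROOT can point to a scratch directory to try the function safely.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

show_queue_settings() {
    dev="$1"
    qdir="$SYSFS_ROOT/block/$dev/queue"
    if [ ! -d "$qdir" ]; then
        echo "no such queue directory: $qdir" >&2
        return 1
    fi
    # The active scheduler is the name shown in square brackets,
    # e.g. "noop deadline [cfq]" means cfq is active.
    sched=$(sed -n 's/.*\[\(.*\)\].*/\1/p' "$qdir/scheduler")
    echo "$dev scheduler=$sched nr_requests=$(cat "$qdir/nr_requests")"
}
```

Calling show_queue_settings sda before and after the two echo commands above is a quick way to check that the new values were accepted.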

Note: to get the current average I/O queue size, look at the avgqu-sz field of $ iostat -x (recent versions of sysstat name this field aqu-sz).
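To watch only that field, the iostat output can be filtered by column name. This is a rough sketch under stated assumptions: queue_size_column is a made-up helper name, and it assumes each device table starts with a header line whose first word begins with "Device" and ends with a blank line, which is what the iostat versions I have seen print.

```shell
#!/bin/sh
# Sketch: keep only the device name and the queue-size column of "iostat -x".
# queue_size_column is a hypothetical helper; it reads iostat output on stdin.
queue_size_column() {
    awk '
        # Header line of a device table: find the queue-size column by name.
        $1 ~ /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++)
                if ($i == "avgqu-sz" || $i == "aqu-sz") col = i
            next
        }
        # A blank line ends the device table.
        NF == 0 { col = 0; next }
        # Inside a device table: print device name and queue size.
        col && NF >= col { print $1, $col }
    '
}

# Usage (not run here): three reports at one-second intervals.
#   iostat -x 1 3 | queue_size_column
```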

Warning: if the queue is full, the total memory taken by pending requests is bounded by 2*nr_requests*max_sectors_kb kilobytes (the factor 2 is there because nr_requests applies to reads and to writes separately), so be careful to keep this limit reasonable to avoid out-of-memory errors.
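To make the bound concrete, here is a small sketch that computes it. The function name queue_mem_bound_kb is made up for the example, and 512 is only an illustrative value for max_sectors_kb; read the real one from /sys/block/sda/queue/max_sectors_kb. With nr_requests at 1024 and max_sectors_kb at 512, the bound is 2*1024*512 = 1048576 KB, that is, 1 GiB per device.

```shell
#!/bin/sh
# Sketch: upper bound, in kilobytes, of the memory held by pending requests
# when the queue is full: 2 * nr_requests * max_sectors_kb.
# queue_mem_bound_kb is a hypothetical helper name.
queue_mem_bound_kb() {
    nr_requests="$1"
    max_sectors_kb="$2"
    echo $(( 2 * $nr_requests * $max_sectors_kb ))
}

# Usage with the live values of a device (not run here):
#   q=/sys/block/sda/queue
#   queue_mem_bound_kb "$(cat $q/nr_requests)" "$(cat $q/max_sectors_kb)"
```

The bound applies per device, so with several busy disks the totals add up.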

Sources
-------

Suppose 100,000 random read I/O requests are coming. When I/O scheduler queue size is 128 (default in many cases), TCQ gets I/O requests by almost random order, so pretty high disk seek overhead happens for each action (requests within single queue is dispersed). When I/O scheduler queue size is 100,000, TCQ gets I/O requests by fully sorted order, so seek overhead can be much smaller. (Linux I/O scheduler queue size and MyISAM performance)

nr_requests (RW): This controls how many requests may be allocated in the block layer for read or write requests. Note that the total allocated number may be twice this amount, since it applies only to reads or writes (not the accumulated sum). (http://alinux.tv/Kernel-2.6.34/block/queue-sysfs.txt)

Increase nr_requests to 1024 (improves on most I/O workloads) (http://www.ibm.com/developerworks/wikis/display/LinuxP/Performance+Tuning)

With /sys/block/sdb/queue/nr_requests at 8192 to hide my IO ordering submission problems (http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg00493.html)

increasing the nr_requests of the queue to 512 helps the most. (http://www.gossamer-threads.com/lists/linux/kernel/420792?do=post_view_threaded)

Now give the IO scheduler more flexibility by increasing the number of schedulable requests and by increasing the maximum delay to avoid disk seeking: $ echo 4096 > /sys/block/sdX/queue/nr_requests (http://www.fhgfs.com/wiki/wikka.php?wakka=ServerTuning)

See also
-------

https://www.monperrus.net/martin/IO+scheduling+for+san+and+virtualization
Comments highly welcome!