1. 22 Feb, 2018 1 commit
    • blk-wbt: account flush requests correctly · 0528a533
      Jens Axboe authored
      commit 5235553d upstream.
      Mikulas reported a workload that saw bad performance, and figured
      out that it was due to various other types of requests being
      accounted as reads. Flush requests, for instance. Due to the
      high latency of those, we heavily throttle the writes to keep
      the latencies in balance. But they really should be accounted
      as writes.
      Fix this by checking the exact type of the request. If it's a
      read, account it as a read; if it's a write or a flush, account
      it as a write. Any other request we disregard. Previously everything
      would have been mistakenly accounted as reads.
      Reported-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 25 Dec, 2017 1 commit
  3. 20 Jun, 2017 2 commits
    • sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming · 2055da97
      Ingo Molnar authored
      So I've noticed a number of instances where it was not obvious from the
      code whether ->task_list was for a wait-queue head or a wait-queue entry.
      Furthermore, there are a number of wait-queue users where the lists are
      not for 'tasks' but other entities (poll tables, etc.), in which case
      the 'task_list' name is actively confusing.
      To clear this all up, name the wait-queue head and entry list structure
      fields unambiguously:
      	struct wait_queue_head::task_list	=> ::head
      	struct wait_queue_entry::task_list	=> ::entry
      For example, this code:
      	rqw->wait.task_list.next != &wait->task_list
      ... it was pretty unclear (to me) what it's doing, while now it's written this way:
      	rqw->wait.head.next != &wait->entry
      ... which makes it pretty clear that we are iterating a list until we see the head.
      Other examples are:
      	list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
      	list_for_each_entry(wq, &fence->wait.task_list, task_list) {
      ... where it's unclear (to me) what we are iterating, and during review it's
      hard to tell whether it's trying to walk a wait-queue entry (which would be
      a bug), while now it's written as:
      	list_for_each_entry_safe(pos, next, &x->head, entry) {
      	list_for_each_entry(wq, &fence->wait.head, entry) {
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Ingo Molnar authored
      	wait_queue_t		=>	wait_queue_entry_t
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 21 Apr, 2017 1 commit
  5. 19 Apr, 2017 1 commit
  6. 11 Apr, 2017 1 commit
    • block: Fix list corruption of blk stats callback list · 3f19cd23
      Jan Kara authored
      When CFQ calls wbt_disable_default(), it will call
      blk_stat_remove_callback() to stop gathering IO statistics for the
      purposes of writeback throttling. Later, when request_queue is
      unregistered, wbt_exit() will call blk_stat_remove_callback() again
      which will try to delete callback from the list again and possibly cause
      list corruption.
      Fix the problem by making wbt_disable_default() call wbt_exit(), which
      is properly guarded against being called multiple times.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  7. 21 Mar, 2017 2 commits
    • blk-stat: convert to callback-based statistics reporting · 34dbad5d
      Omar Sandoval authored
      Currently, statistics are gathered in ~0.13s windows, and users grab the
      statistics whenever they need them. This is not ideal for both in-tree users:
      1. Writeback throttling wants its own dynamically sized window of
         statistics. Since the blk-stats statistics are reset after every
         window and the wbt windows don't line up with the blk-stats windows,
         wbt doesn't see every I/O.
      2. Polling currently grabs the statistics on every I/O. Again, depending
         on how the window lines up, we may miss some I/Os. It's also
         unnecessary overhead to get the statistics on every I/O; the hybrid
         polling heuristic would be just as happy with the statistics from the
         previous full window.
      This reworks the blk-stats infrastructure to be callback-based: users
      register a callback that they want called at a given time with all of
      the statistics from the window during which the callback was active.
      Users can dynamically bucketize the statistics. wbt and polling both
      currently use read vs. write, but polling can be extended to further
      subdivide based on request size.
      The callbacks are kept on an RCU list, and each callback has percpu
      stats buffers. There will only be a few users, so the overhead on the
      I/O completion side is low. The stats flushing is also simplified
      considerably: since the timer function is responsible for clearing the
      statistics, we don't have to worry about stale statistics.
      wbt is a trivial conversion. After the conversion, the windowing problem
      mentioned above is fixed.
      For polling, we register an extra callback that caches the previous
      window's statistics in the struct request_queue for the hybrid polling
      heuristic to use.
      Since we no longer have a single stats buffer for the request queue,
      this also removes the sysfs and debugfs stats entries. To replace those,
      we add a debugfs entry for the poll statistics.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-stat: use READ and WRITE instead of BLK_STAT_{READ,WRITE} · fa2e39cb
      Omar Sandoval authored
      The stats buckets will become generic soon, so make the existing users
      use the common READ and WRITE definitions instead of ones internal to blk-stat.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  8. 02 Feb, 2017 1 commit
  9. 02 Jan, 2017 2 commits
  10. 09 Dec, 2016 1 commit
  11. 01 Dec, 2016 1 commit
  12. 28 Nov, 2016 3 commits
  13. 16 Nov, 2016 1 commit
  14. 11 Nov, 2016 3 commits
  15. 10 Nov, 2016 1 commit
    • blk-wbt: add general throttling mechanism · e34cbd30
      Jens Axboe authored
      We can hook this up to the block layer, to help throttle buffered
      writes.
      wbt registers a few trace points that can be used to track what is
      happening in the system:
      wbt_lat: 259:0: latency 2446318
      wbt_stat: 259:0: rmean=2446318, rmin=2446318, rmax=2446318, rsamples=1,
                     wmean=518866, wmin=15522, wmax=5330353, wsamples=57
      wbt_step: 259:0: step down: step=1, window=72727272, background=8, normal=16, max=32
      This shows a sync issue event (wbt_lat) that exceeded its time. wbt_stat
      dumps the current read/write stats for that window, and wbt_step shows a
      step down event where we now scale back writes. Each trace includes the
      device, 259:0 in this case.
      Signed-off-by: Jens Axboe <axboe@fb.com>