- 12 Feb, 2019 16 commits
-
-
Christoph Hellwig authored
bio_check_eod() should check partition size not the whole disk if bio->bi_partno is non-zero. Do this by moving the call to bio_check_eod() into blk_partition_remap(). Based on an earlier patch from Jiufei Xue. Fixes: 74d46992 ("block: replace bi_bdev with a gendisk pointer and partitions index") Reported-by:
Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit 52c5e62d)
-
Ilya Dryomov authored
Similar to blkdev_write_iter(), return -EPERM if the partition is read-only. This covers ioctl(), fallocate() and most in-kernel users but isn't meant to be exhaustive -- everything else will be caught in generic_make_request_checks(), fail with -EIO and can be fixed later. Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit a13553c7)
-
Ilya Dryomov authored
Regular block device writes go through blkdev_write_iter(), which does bdev_read_only(), while zeroout/discard/etc requests are never checked, both userspace- and kernel-triggered. Add a generic catch-all check to generic_make_request_checks() to actually enforce ioctl(BLKROSET) and set_disk_ro(), which is used by quite a few drivers for things like snapshots, read-only backing files/images, etc. Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit 721c7fc7)
-
weiping zhang authored
export these two interface for cgroup-v1. Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
weiping zhang <zhangweiping@didichuxing.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit 17534c6f)
-
Bart Van Assche authored
The __blk_mq_register_dev(), blk_mq_unregister_dev(), elv_register_queue() and elv_unregister_queue() calls need to be protected with sysfs_lock but other code in these functions not. Hence protect only this code with sysfs_lock. This patch fixes a locking inversion issue in blk_unregister_queue() and also in an error path of blk_register_queue(): it is not allowed to hold sysfs_lock around the kobject_del(&q->kobj) call. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit 2c2086af)
-
Mike Snitzer authored
Since I can remember DM has forced the block layer to allow the allocation and initialization of the request_queue to be distinct operations. Reason for this is block/genhd.c:add_disk() has requires that the request_queue (and associated bdi) be tied to the gendisk before add_disk() is called -- because add_disk() also deals with exposing the request_queue via blk_register_queue(). DM's dynamic creation of arbitrary device types (and associated request_queue types) requires the DM device's gendisk be available so that DM table loads can establish a master/slave relationship with subordinate devices that are referenced by loaded DM tables -- using bd_link_disk_holder(). But until these DM tables, and their associated subordinate devices, are known DM cannot know what type of request_queue it needs -- nor what its queue_limits should be. This chicken and egg scenario has created all manner of problems for DM and, at times, the block layer. Summary of changes: - Add device_add_disk_no_queue_reg() and add_disk_no_queue_reg() variant that drivers may use to add a disk without also calling blk_register_queue(). Driver must call blk_register_queue() once its request_queue is fully initialized. - Return early from blk_unregister_queue() if QUEUE_FLAG_REGISTERED is not set. It won't be set if driver used add_disk_no_queue_reg() but driver encounters an error and must del_gendisk() before calling blk_register_queue(). - Export blk_register_queue(). These changes allow DM to use add_disk_no_queue_reg() to anchor its gendisk as the "master" for master/slave relationships DM must establish with subordinate devices referenced in DM tables that get loaded. Once all "slave" devices for a DM device are known its request_queue can be properly initialized and then advertised via sysfs -- important improvement being that no request_queue resource initialization performed by blk_register_queue() is missed for DM devices anymore. Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit fa70d2e2)
-
Mike Snitzer authored
The original commit e9a823fb (block: fix warning when I/O elevator is changed as request_queue is being removed) is pretty conflated. "conflated" because the resource being protected by q->sysfs_lock isn't the queue_flags (it is the 'queue' kobj). q->sysfs_lock serializes __elevator_change() (via elv_iosched_store) from racing with blk_unregister_queue(): 1) By holding q->sysfs_lock first, __elevator_change() can complete before a racing blk_unregister_queue(). 2) Conversely, __elevator_change() is testing for QUEUE_FLAG_REGISTERED in case elv_iosched_store() loses the race with blk_unregister_queue(), it needs a way to know the 'queue' kobj isn't there. Expand the scope of blk_unregister_queue()'s q->sysfs_lock use so it is held until after the 'queue' kobj is removed. To do so blk_mq_unregister_dev() must not also take q->sysfs_lock. So rename __blk_mq_unregister_dev() to blk_mq_unregister_dev(). Also, blk_unregister_queue() should use q->queue_lock to protect against any concurrent writes to q->queue_flags -- even though chances are the queue is being cleaned up so no concurrent writes are likely. Fixes: e9a823fb ("block: fix warning when I/O elevator is changed as request_queue is being removed") Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit 667257e8)
-
Mike Snitzer authored
device_add_disk() will only call bdi_register_owner() if !GENHD_FL_HIDDEN, so it follows that del_gendisk() should only call bdi_unregister() if !GENHD_FL_HIDDEN. Found with code inspection. bdi_unregister() won't do any harm if bdi_register_owner() wasn't used but best to avoid the unnecessary call to bdi_unregister(). Fixes: 8ddcd653 ("block: introduce GENHD_FL_HIDDEN") Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Hannes Reinecke <hare@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit bc8d062c)
-
Christoph Hellwig authored
That we we can also poll non blk-mq queues. Mostly needed for the NVMe multipath code, but could also be useful elsewhere. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Hannes Reinecke <hare@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit ea435e1b)
-
Christoph Hellwig authored
With this flag a driver can create a gendisk that can be used for I/O submission inside the kernel, but which is not registered as user facing block device. This will be useful for the NVMe multipath implementation. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit 8ddcd653)
-
Christoph Hellwig authored
The hidden gendisks introduced in the next patch need to keep the dev field in their struct device empty so that udev won't try to create block device nodes for them. To support that rewrite disk_devt to look at the major and first_minor fields in the gendisk itself instead of looking into the struct device. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by:
Hannes Reinecke <hare@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit 517bf3c3)
-
Christoph Hellwig authored
This helpers allows to bounce steal the uncompleted bios from a request so that they can be reissued on another path. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Reviewed-by:
Hannes Reinecke <hare@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit ef71de8b)
-
Christoph Hellwig authored
This helper allows reinserting a bio into a new queue without much overhead, but requires all queue limits to be the same for the upper and lower queues, and it does not provide any recursion preventions. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Reviewed-by:
Javier González <javier@cnexlabs.com> Reviewed-by:
Hannes Reinecke <hare@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit f421e1d9)
-
Bart Van Assche authored
This patch does not change any functionality. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit 14a23498)
-
Bart Van Assche authored
These two functions are only called from inside the block layer so unexport them. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit 83d016ac)
-
Richard Zhu authored
errata: When a read command returns less data than specified in the PRDs (for example, there are two PRDs for this command, but the device returns a number of bytes which is less than in the first PRD), the second PRD of this command is not read out of the PRD FIFO, causing the next command to use this PRD erroneously. workaround - forces sg_tablesize = 1 - modified the sg_io function in block/scsi_ioctl.c to use a 64k buffer allocated with dma_alloc_coherent during the probe in ahci_imx - In order to fix the scsi/sata hang, when CD_ROM and HDD are accessed simultaneously after the workaround is applied. Do not go to sleep in scsi_eh_handler, when there is host failed. Signed-off-by:
Richard Zhu <Richard.Zhu@freescale.com>
-
- 29 Dec, 2018 2 commits
-
-
Mikulas Patocka authored
[ Upstream commit b88aef36 ] If __blkdev_issue_discard is in progress and a device mapper device is reloaded with a table that doesn't support discard, q->limits.max_discard_sectors is set to zero. This results in infinite loop in __blkdev_issue_discard. This patch checks if max_discard_sectors is zero and aborts with -EOPNOTSUPP. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Tested-by:
Zdenek Kabelac <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jens Axboe authored
[ Upstream commit af097f5d ] Don't build discards bigger than what the user asked for, if the user decided to limit the size by writing to 'discard_max_bytes'. Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Omar Sandoval <osandov@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- 21 Dec, 2018 1 commit
-
-
Jens Axboe authored
[ Upstream commit 2527d997 ] If an IO scheduler is selected via elevator= and it doesn't match the driver in question wrt blk-mq support, then we fail to boot. The elevator= parameter is deprecated and only supported for non-mq devices. Augment the elevator lookup API so that we pass in if we're looking for an mq capable scheduler or not, so that we only ever return a valid type for the queue in question. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196695Reviewed-by:
Omar Sandoval <osandov@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- 21 Nov, 2018 1 commit
-
-
Ming Lei authored
commit 8dc765d4 upstream. c2856ae2 ("blk-mq: quiesce queue before freeing queue") has already fixed this race, however the implied synchronize_rcu() in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused performance regression. Then 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") tried to quiesce queue for avoiding unnecessary synchronize_rcu() only when queue initialization is done, because it is usual to see lots of inexistent LUNs which need to be probed. However, turns out it isn't safe to quiesce queue only when queue initialization is done. Because when one SCSI command is completed, the user of sending command can be waken up immediately, then the scsi device may be removed, meantime the run queue in scsi_end_request() is still in-progress, so kernel panic can be caused. In Red Hat QE lab, there are several reports about this kind of kernel panic triggered during kernel booting. This patch tries to address the issue by grabing one queue usage counter during freeing one request and the following run queue. Fixes: 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") Cc: Andrew Jones <drjones@redhat.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Cc: stable <stable@vger.kernel.org> Cc: jianchao.wang <jianchao.w.wang@oracle.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 13 Nov, 2018 1 commit
-
-
Paolo Valente authored
[ Upstream commit cbeb869a ] BFQ schedules entities (which represent either per-process queues or groups of queues) as a function of their timestamps. In particular, as a function of their (virtual) finish times. The finish time of an entity is computed as a function of the budget assigned to the entity, assuming, tentatively, that the entity, once in service, will receive an amount of service equal to its budget. Then, when the entity is expired because it finishes to be served, this finish time is updated as a function of the actual service received by the entity. This allows the entity to be correctly charged with only the service received, and then to be correctly re-scheduled. Yet an entity may receive service also while not being the entity in service (in the scheduling environment of its parent entity), for several reasons. If the entity remains with no backlog while receiving this 'unofficial' service, then it is expired. Also on such an expiration, the finish time of the entity should be updated to account for only the service actually received by the entity. Unfortunately, such an update is not performed for an entity expiring without being the entity in service. In a similar vein, the service counter of the entity in service is reset when the entity is expired, to be ready to be used for next service cycle. This reset too should be performed also in case an entity is expired because it remains empty after receiving service while not being the entity in service. But in this case the reset is not performed. This commit performs the above update of the finish time and reset of the service received, also for an entity expiring while not being the entity in service. Signed-off-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 13 Oct, 2018 1 commit
-
-
Ilya Dryomov authored
commit 587562d0 upstream. trace_block_unplug() takes true for explicit unplugs and false for implicit unplugs. schedule() unplugs are implicit and should be reported as timer unplugs. While correct in the legacy code, this has been inverted in blk-mq since 4.11. Cc: stable@vger.kernel.org Fixes: bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers") Reviewed-by:
Omar Sandoval <osandov@fb.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 26 Sep, 2018 3 commits
-
-
Ming Lei authored
[ Upstream commit 1311326c ] SCSI probing may synchronously create and destroy a lot of request_queues for non-existent devices. Any synchronize_rcu() in queue creation or destroy path may introduce long latency during booting, see detailed description in comment of blk_register_queue(). This patch removes one synchronize_rcu() inside blk_cleanup_queue() for this case, commit c2856ae2(blk-mq: quiesce queue before freeing queue) needs synchronize_rcu() for implementing blk_mq_quiesce_queue(), but when queue isn't initialized, it isn't necessary to do that since only pass-through requests are involved, no original issue in scsi_execute() at all. Without this patch and previous one, it may take more 20+ seconds for virtio-scsi to complete disk probe. With the two patches, the time becomes less than 100ms. Fixes: c2856ae2 ("blk-mq: quiesce queue before freeing queue") Reported-by:
Andrew Jones <drjones@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Tested-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ming Lei authored
[ Upstream commit b04f50ab ] Only attempt to merge bio iff the ctx->rq_list isn't empty, because: 1) for high-performance SSD, most of times dispatch may succeed, then there may be nothing left in ctx->rq_list, so don't try to merge over sw queue if it is empty, then we can save one acquiring of ctx->lock 2) we can't expect good merge performance on per-cpu sw queue, and missing one merge on sw queue won't be a big deal since tasks can be scheduled from one CPU to another. Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Tested-by:
Kashyap Desai <kashyap.desai@broadcom.com> Reported-by:
Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mike Snitzer authored
[ Upstream commit 42c9cdfe ] Set max_discard_segments to USHRT_MAX in blk_set_stacking_limits() so that blk_stack_limits() can stack up this limit for stacked devices. before: $ cat /sys/block/nvme0n1/queue/max_discard_segments 256 $ cat /sys/block/dm-0/queue/max_discard_segments 1 after: $ cat /sys/block/nvme0n1/queue/max_discard_segments 256 $ cat /sys/block/dm-0/queue/max_discard_segments 256 Fixes: 1e739730 ("block: optionally merge discontiguous discard bios into a single request") Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 19 Sep, 2018 4 commits
-
-
Mauricio Faria de Oliveira authored
[ Upstream commit 14cb2c8a ] The if-block that sets a successful return value in aix_partition() uses 'lvip[].pps_per_lv' and 'n[].name' potentially uninitialized. For example, if 'numlvs' is zero or alloc_lvn() fails, neither is initialized, but are used anyway if alloc_pvd() succeeds after it. So, make the alloc_pvd() call conditional on their initialization. This has been hit when attaching an apparently corrupted/stressed AIX LUN, misleading the kernel to pr_warn() invalid data and hang. [...] partition (null) (11 pp's found) is not contiguous [...] partition (null) (2 pp's found) is not contiguous [...] partition (null) (3 pp's found) is not contiguous [...] partition (null) (64 pp's found) is not contiguous Fixes: 6ceea22b ("partitions: add aix lvm partition support files") Signed-off-by:
Mauricio Faria de Oliveira <mfo@canonical.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mauricio Faria de Oliveira authored
[ Upstream commit d43fdae7 ] Even if properly initialized, the lvname array (i.e., strings) is read from disk, and might contain corrupt data (e.g., lack the null terminating character for strings). So, make sure the partition name string used in pr_warn() has the null terminating character. Fixes: 6ceea22b ("partitions: add aix lvm partition support files") Suggested-by:
Daniel J. Axtens <daniel.axtens@canonical.com> Signed-off-by:
Mauricio Faria de Oliveira <mfo@canonical.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ming Lei authored
[ Upstream commit 75d6e175 ] The passed 'nr' from userspace represents the total depth, meantime inside 'struct blk_mq_tags', 'nr_tags' stores the total tag depth, and 'nr_reserved_tags' stores the reserved part. There are two issues in blk_mq_tag_update_depth() now: 1) for growing tags, we should have used the passed 'nr', and keep the number of reserved tags not changed. 2) the passed 'nr' should have been used for checking against 'tags->nr_tags', instead of number of the normal part. This patch fixes the above two cases, and avoids kernel crash caused by wrong resizing sbitmap queue. Cc: "Ewan D. Milne" <emilne@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Omar Sandoval <osandov@fb.com> Tested by: Marco Patalano <mpatalan@redhat.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konstantin Khlebnikov authored
commit d5274b3c upstream. Fix trivial use-after-free. This could be last reference to bfqg. Fixes: 8f9bebc3 ("block, bfq: access and cache blkg data only when safe") Acked-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 15 Sep, 2018 2 commits
-
-
Bart Van Assche authored
[ Upstream commit f7ecb1b1 ] This patch does not change any functionality but avoids that gcc reports the following warnings when building with W=1: block/cfq-iosched.c: In function ?cfq_back_seek_max_store?: block/cfq-iosched.c:4741:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if (__data < (MIN)) \ ^ block/cfq-iosched.c:4756:1: note: in expansion of macro ?STORE_FUNCTION? STORE_FUNCTION(cfq_back_seek_max_store, &cfqd->cfq_back_max, 0, UINT_MAX, 0); ^~~~~~~~~~~~~~ block/cfq-iosched.c: In function ?cfq_slice_idle_store?: block/cfq-iosched.c:4741:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if (__data < (MIN)) \ ^ block/cfq-iosched.c:4759:1: note: in expansion of macro ?STORE_FUNCTION? STORE_FUNCTION(cfq_slice_idle_store, &cfqd->cfq_slice_idle, 0, UINT_MAX, 1); ^~~~~~~~~~~~~~ block/cfq-iosched.c: In function ?cfq_group_idle_store?: block/cfq-iosched.c:4741:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if (__data < (MIN)) \ ^ block/cfq-iosched.c:4760:1: note: in expansion of macro ?STORE_FUNCTION? STORE_FUNCTION(cfq_group_idle_store, &cfqd->cfq_group_idle, 0, UINT_MAX, 1); ^~~~~~~~~~~~~~ block/cfq-iosched.c: In function ?cfq_low_latency_store?: block/cfq-iosched.c:4741:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if (__data < (MIN)) \ ^ block/cfq-iosched.c:4765:1: note: in expansion of macro ?STORE_FUNCTION? STORE_FUNCTION(cfq_low_latency_store, &cfqd->cfq_latency, 0, 1, 0); ^~~~~~~~~~~~~~ block/cfq-iosched.c: In function ?cfq_slice_idle_us_store?: block/cfq-iosched.c:4775:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if (__data < (MIN)) \ ^ block/cfq-iosched.c:4782:1: note: in expansion of macro ?USEC_STORE_FUNCTION? USEC_STORE_FUNCTION(cfq_slice_idle_us_store, &cfqd->cfq_slice_idle, 0, UINT_MAX); ^~~~~~~~~~~~~~~~~~~ block/cfq-iosched.c: In function ?cfq_group_idle_us_store?: block/cfq-iosched.c:4775:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if (__data < (MIN)) \ ^ block/cfq-iosched.c:4783:1: note: in expansion of macro ?USEC_STORE_FUNCTION? USEC_STORE_FUNCTION(cfq_group_idle_us_store, &cfqd->cfq_group_idle, 0, UINT_MAX); ^~~~~~~~~~~~~~~~~~~ Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Greg Edwards authored
[ Upstream commit d6c02a9b ] In commit ed996a52 ("block: simplify and cleanup bvec pool handling"), the value of the slab index is incremented by one in bvec_alloc() after the allocation is done to indicate an index value of 0 does not need to be later freed. bvec_nr_vecs() was not updated accordingly, and thus returns the wrong value. Decrement idx before performing the lookup. Fixes: ed996a52 ("block: simplify and cleanup bvec pool handling") Signed-off-by:
Greg Edwards <gedwards@ddn.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 09 Sep, 2018 3 commits
-
-
Maciej S. Szmigiero authored
commit fc8ebd01 upstream. The value that struct cftype .write() method returns is then directly returned to userspace as the value returned by write() syscall, so it should be the number of bytes actually written (or consumed) and not zero. Returning zero from write() syscall makes programs like /bin/echo or bash spin. Signed-off-by:
Maciej S. Szmigiero <mail@maciej.szmigiero.name> Fixes: e21b7a0b ("block, bfq: add full hierarchical scheduling and cgroups support") Cc: stable@vger.kernel.org Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ming Lei authored
commit b233f127 upstream. Runtime PM isn't ready for blk-mq yet, and commit 765e40b6 ("block: disable runtime-pm for blk-mq") tried to disable it. Unfortunately, it can't take effect in that way since user space still can switch it on via 'echo auto > /sys/block/sdN/device/power/control'. This patch disables runtime-pm for blk-mq really by pm_runtime_disable() and fixes all kinds of PM related kernel crash. Cc: Tomas Janousek <tomi@nomi.cz> Cc: Przemek Socha <soprwa@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: <stable@vger.kernel.org> Reviewed-by:
Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Tested-by:
Patrick Steinhardt <ps@pks.im> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
xiao jin authored
commit 54648cf1 upstream. We find the memory use-after-free issue in __blk_drain_queue() on the kernel 4.14. After read the latest kernel 4.18-rc6 we think it has the same problem. Memory is allocated for q->fq in the blk_init_allocated_queue(). If the elevator init function called with error return, it will run into the fail case to free the q->fq. Then the __blk_drain_queue() uses the same memory after the free of the q->fq, it will lead to the unpredictable event. The patch is to set q->fq as NULL in the fail case of blk_init_allocated_queue(). Fixes: commit 7c94e1c1 ("block: introduce blk_flush_queue to drive flush machinery") Cc: <stable@vger.kernel.org> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
xiao jin <jin.xiao@intel.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 24 Aug, 2018 1 commit
-
-
Dan Carpenter authored
[ Upstream commit ce042c18 ] resp->num is the number of tokens in resp->tok[]. It gets set in response_parse(). So if n == resp->num then we're reading beyond the end of the data. Fixes: 455a7b23 ("block: Add Sed-opal library") Reviewed-by:
Scott Bauer <scott.bauer@intel.com> Tested-by:
Scott Bauer <scott.bauer@intel.com> Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 17 Aug, 2018 1 commit
-
-
Paolo Valente authored
commit 4baa8bb1 upstream. This commit fixes a bug that causes bfq to fail to guarantee a high responsiveness on some drives, if there is heavy random read+write I/O in the background. More precisely, such a failure allowed this bug to be found [1], but the bug may well cause other yet unreported anomalies. BFQ raises the weight of the bfq_queues associated with soft real-time applications, to privilege the I/O, and thus reduce latency, for these applications. This mechanism is named soft-real-time weight raising in BFQ. A soft real-time period may happen to be nested into an interactive weight raising period, i.e., it may happen that, when a bfq_queue switches to a soft real-time weight-raised state, the bfq_queue is already being weight-raised because deemed interactive too. In this case, BFQ saves in a special variable wr_start_at_switch_to_srt, the time instant when the interactive weight-raising period started for the bfq_queue, i.e., the time instant when BFQ started to deem the bfq_queue interactive. This value is then used to check whether the interactive weight-raising period would still be in progress when the soft real-time weight-raising period ends. If so, interactive weight raising is restored for the bfq_queue. This restore is useful, in particular, because it prevents bfq_queues from losing their interactive weight raising prematurely, as a consequence of spurious, short-lived soft real-time weight-raising periods caused by wrong detections as soft real-time. If, instead, a bfq_queue switches to soft-real-time weight raising while it *is not* already in an interactive weight-raising period, then the variable wr_start_at_switch_to_srt has no meaning during the following soft real-time weight-raising period. Unfortunately the handling of this case is wrong in BFQ: not only the variable is not flagged somehow as meaningless, but it is also set to the time when the switch to soft real-time weight-raising occurs. This may cause an interactive weight-raising period to be considered mistakenly as still in progress, and thus a spurious interactive weight-raising period to start for the bfq_queue, at the end of the soft-real-time weight-raising period. In particular the spurious interactive weight-raising period will be considered as still in progress, if the soft-real-time weight-raising period does not last very long. The bfq_queue will then be wrongly privileged and, if I/O bound, will unjustly steal bandwidth to truly interactive or soft real-time bfq_queues, harming responsiveness and low latency. This commit fixes this issue by just setting wr_start_at_switch_to_srt to minus infinity (farthest past time instant according to jiffies macros): when the soft-real-time weight-raising period ends, certainly no interactive weight-raising period will be considered as still in progress. [1] Background I/O Type: Random - Background I/O mix: Reads and writes - Application to start: LibreOffice Writer in http://www.phoronix.com/scan.php?page=news_item&px=Linux-4.13-IO-LaptopSigned-off-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Angelo Ruocco <angeloruocco90@gmail.com> Tested-by:
Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by:
Lee Tibbert <lee.tibbert@gmail.com> Tested-by:
Mirko Montanari <mirkomontanari91@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 03 Aug, 2018 3 commits
-
-
Greg Edwards authored
commit 5151842b upstream. After the bio has been updated to represent the remaining sectors, reset bi_done so bio_rewind_iter() does not rewind further than it should. This resolves a bio_integrity_process() failure on reads where the original request was split. Fixes: 63573e35 ("bio-integrity: Restore original iterator on verify stage") Signed-off-by:
Greg Edwards <gedwards@ddn.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martin Wilck authored
commit b403ea24 upstream. If the last page of the bio is not "full", the length of the last vector slot needs to be corrected. This slot has the index (bio->bi_vcnt - 1), but only in bio->bi_io_vec. In the "bv" helper array, which is shifted by the value of bio->bi_vcnt at function invocation, the correct index is (nr_pages - 1). v2: improved readability following suggestions from Ming Lei. v3: followed a formatting suggestion from Christoph Hellwig. Fixes: 2cefe4db ("block: add bio_iov_iter_get_pages()") Reviewed-by:
Hannes Reinecke <hare@suse.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Martin Wilck <mwilck@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Filippo Muzzini authored
[ Upstream commit a12bffeb ] In bfq_requests_merged(), there is a deadlock because the lock on bfqq->bfqd->lock is held by the calling function, but the code of this function tries to grab the lock again. This deadlock is currently hidden by another bug (fixed by next commit for this source file), which causes the body of bfq_requests_merged() to be never executed. This commit removes the deadlock by removing the lock/unlock pair. Signed-off-by:
Filippo Muzzini <filippo.muzzini@outlook.it> Signed-off-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <alexander.levin@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 22 Jul, 2018 1 commit
-
-
Alan Jenkins authored
commit 1dc3039b upstream. When blk_queue_enter() waits for a queue to unfreeze, or unset the PREEMPT_ONLY flag, do not allow it to be interrupted by a signal. The PREEMPT_ONLY flag was introduced later in commit 3a0a5299 ("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI device is resumed asynchronously, i.e. after un-freezing userspace tasks. So that commit exposed the bug as a regression in v4.15. A mysterious SIGBUS (or -EIO) sometimes happened during the time the device was being resumed. Most frequently, there was no kernel log message, and we saw Xorg or Xwayland killed by SIGBUS.[1] [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979 Without this fix, I get an IO error in this test: # dd if=/dev/sda of=/dev/null iflag=direct & \ while killall -SIGUSR1 dd; do sleep 0.1; done & \ echo mem > /sys/power/state ; \ sleep 5; killall dd # stop after 5 seconds The interruptible wait was added to blk_queue_enter in commit 3ef28e83 ("block: generic request_queue reference counting"). Before then, the interruptible wait was only in blk-mq, but I don't think it could ever have been correct. Reviewed-by:
Bart Van Assche <bart.vanassche@wdc.com> Cc: stable@vger.kernel.org Signed-off-by:
Alan Jenkins <alan.christopher.jenkins@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-