| BUFFERIO(9) | Kernel Developer's Manual | BUFFERIO(9) |
BUFFERIO, biodone, biowait, getiobuf, putiobuf, nestiobuf_setup, nestiobuf_done - block I/O buffer transfers
#include <sys/buf.h>
void
biodone(struct buf *bp);
int
biowait(struct buf *bp);
struct buf *
getiobuf(struct vnode *vp, bool waitok);
void
putiobuf(struct buf *bp);
void
nestiobuf_setup(struct buf *mbp, struct buf *bp, int offset, size_t size);
void
nestiobuf_done(struct buf *mbp, int donebytes, int error);
The BUFFERIO subsystem manages block I/O buffer transfers, each described by a struct buf. The same structure serves several kinds of users: users of BUFFERIO, users of buffercache(9), and block device drivers that execute the transfers to physical disks.
A user of BUFFERIO wishing to submit a buffer for block I/O transfer must obtain a struct buf, e.g. via getiobuf(), fill in its parameters, and submit it to a block device with bdev_strategy(9), usually via VOP_STRATEGY(9).
The parameters to an I/O transfer described by bp are specified by the following struct buf fields:
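bp->b_flags
    Flags specifying the type of transfer.
    B_READ
        Transfer is read from device. If not set, transfer is write to device.
    B_ASYNC
        Asynchronous I/O transfer. The user must not set bp->b_iodone and must not call biowait(bp).
    A write may be requested explicitly with B_WRITE, which is zero.
bp->b_data
    Pointer to the memory that data is read into or written from.
bp->b_bcount
    Size of the transfer in bytes.
bp->b_blkno
    Block number at which the transfer starts.
bp->b_iodone
    Optional completion callback, or NULL. If set, B_ASYNC must not be set in bp->b_flags.
Additionally, if the I/O transfer is a write associated with a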
vnode(9)
vp, then before the user submits it to a block device,
the user must increment
vp->v_numoutput. The user
must not acquire vp's vnode lock between incrementing
vp->v_numoutput and
submitting bp to a block device — doing so will
likely cause deadlock with the syncer.
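The preparation rules above can be sketched as a small user-space model. Everything in it is hypothetical: struct mock_buf and struct mock_vnode stand in for the kernel's struct buf and struct vnode, the MOCK_B_READ and MOCK_B_ASYNC bit values are made up (only the fact that B_WRITE is zero comes from this page), and nothing is actually submitted. It shows the two checks a caller makes before handing a buffer to VOP_STRATEGY(9): B_ASYNC excludes a b_iodone callback, and a write tied to a vnode increments v_numoutput first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_B_WRITE 0x0   /* B_WRITE is zero, per this page */
#define MOCK_B_READ  0x1   /* hypothetical bit value */
#define MOCK_B_ASYNC 0x2   /* hypothetical bit value */

/* Hypothetical stand-in for struct vnode. */
struct mock_vnode {
	int v_numoutput;        /* writes submitted but not yet completed */
};

/* Hypothetical stand-in for the struct buf fields listed above. */
struct mock_buf {
	int b_flags;
	void *b_data;
	size_t b_bcount;
	long b_blkno;           /* stands in for daddr_t */
	void (*b_iodone)(struct mock_buf *);
	struct mock_vnode *b_vp;
};

/* B_ASYNC and a non-NULL b_iodone are mutually exclusive. */
static bool
mock_buf_params_ok(const struct mock_buf *bp)
{
	return !((bp->b_flags & MOCK_B_ASYNC) != 0 && bp->b_iodone != NULL);
}

/* Example callback, used only to exercise mock_buf_params_ok(). */
static void
mock_noop_iodone(struct mock_buf *bp)
{
	(void)bp;
}

/*
 * Fill in a synchronous write, then do the v_numoutput accounting that
 * must precede submission.  The kernel increments under v_interlock;
 * this single-threaded sketch does not lock.
 */
static void
mock_prepare_write(struct mock_buf *bp, struct mock_vnode *vp,
    void *data, size_t len, long blkno)
{
	bp->b_flags = MOCK_B_WRITE;   /* no B_READ, no B_ASYNC */
	bp->b_data = data;
	bp->b_bcount = len;
	bp->b_blkno = blkno;
	bp->b_iodone = NULL;          /* no callback, so biowait() is allowed */
	bp->b_vp = vp;
	assert(mock_buf_params_ok(bp));

	if (vp != NULL)
		vp->v_numoutput++;    /* no vnode lock from here until submission */
}
```

After mock_prepare_write() the model's v_numoutput is 1; setting both MOCK_B_ASYNC and a callback makes mock_buf_params_ok() fail, mirroring the field-list rule.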
Block I/O transfer completion may be notified by the
bp->b_iodone callback, by
signalling biowait() waiters, or not at all in the
B_ASYNC case.
If the user sets the bp->b_iodone callback to a non-NULL function pointer, it will be called in soft interrupt context when the I/O transfer is complete. The user may not call biowait(bp) in this case.
If B_ASYNC is set, then the I/O transfer is asynchronous and the user will not be notified when it is completed. The user may not call biowait(bp) in this case.
If bp->b_iodone is NULL and B_ASYNC is not specified, the user may wait for the I/O transfer to complete with biowait(bp).
Once an I/O transfer has completed, its struct buf may be reused, but the user must first clear the BO_DONE flag of bp->b_oflags before reusing it.
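The three completion paths can be modeled in user space, with a POSIX mutex and condition variable playing the role of the kernel's sleep and wakeup. This is a sketch under assumptions, not the vfs_bio.c code: mock_biodone(), mock_biowait(), the MOCK_* flag values and the callback counter are hypothetical, and the soft interrupt context of a real callback is not modeled. The done flag is set on every path, which is why it must be cleared before the buffer is reused.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_B_ASYNC 0x2    /* hypothetical stand-in for B_ASYNC */
#define MOCK_BO_DONE 0x1    /* hypothetical stand-in for BO_DONE */

struct mock_buf {
	int b_flags;
	int b_oflags;
	int b_error;
	void (*b_iodone)(struct mock_buf *);
	pthread_mutex_t lock;
	pthread_cond_t done_cv;
};

static void
mock_buf_init(struct mock_buf *bp, int flags, void (*iodone)(struct mock_buf *))
{
	bp->b_flags = flags;
	bp->b_oflags = 0;
	bp->b_error = 0;
	bp->b_iodone = iodone;
	pthread_mutex_init(&bp->lock, NULL);
	pthread_cond_init(&bp->done_cv, NULL);
}

/* biowait() is permitted only with no callback and no B_ASYNC. */
static bool
mock_may_biowait(const struct mock_buf *bp)
{
	return bp->b_iodone == NULL && (bp->b_flags & MOCK_B_ASYNC) == 0;
}

/* Example callback state, used to observe the callback path. */
static int mock_callbacks_seen;

static void
mock_count_iodone(struct mock_buf *bp)
{
	(void)bp;
	mock_callbacks_seen++;
}

/* Driver side: mark the transfer done, then notify through one path. */
static void
mock_biodone(struct mock_buf *bp)
{
	void (*callback)(struct mock_buf *);

	pthread_mutex_lock(&bp->lock);
	bp->b_oflags |= MOCK_BO_DONE;
	callback = bp->b_iodone;
	if (callback == NULL && (bp->b_flags & MOCK_B_ASYNC) == 0)
		pthread_cond_broadcast(&bp->done_cv);   /* wake biowait() sleepers */
	pthread_mutex_unlock(&bp->lock);

	if (callback != NULL)
		callback(bp);
	/* B_ASYNC without a callback: nobody is notified. */
}

/* User side: sleep until done, then return b_error. */
static int
mock_biowait(struct mock_buf *bp)
{
	int error;

	assert(mock_may_biowait(bp));
	pthread_mutex_lock(&bp->lock);
	while ((bp->b_oflags & MOCK_BO_DONE) == 0)
		pthread_cond_wait(&bp->done_cv, &bp->lock);
	error = bp->b_error;
	pthread_mutex_unlock(&bp->lock);
	return error;
}

/* Required before reusing a completed buffer. */
static void
mock_buf_reset_done(struct mock_buf *bp)
{
	pthread_mutex_lock(&bp->lock);
	bp->b_oflags &= ~MOCK_BO_DONE;
	pthread_mutex_unlock(&bp->lock);
}
```

Without the reset, a second mock_biowait() on the reused buffer would return immediately with stale status, which is the hazard the BO_DONE rule guards against.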
After initializing the b_flags,
b_data, and b_bcount
parameters of an I/O transfer for the buffer, called the
master buffer, the user can issue smaller transfers for
segments of the buffer using nestiobuf_setup(). When
nested I/O transfers complete, in any order, they debit from the amount of
work left to be done in the master buffer. If any segments of the buffer
were skipped, the user can report this with
nestiobuf_done() to debit the skipped part of the
work.
The master buffer's I/O transfer is complete once all nested buffers' I/O transfers have completed and, if any segments were skipped, nestiobuf_done() has been called to debit them.
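The debit accounting for nested transfers amounts to a byte counter on the master buffer that starts at b_bcount and must reach zero. The sketch below is a hypothetical user-space model: struct mock_master and mock_nest_done() are not kernel names, and keeping the first nonzero error is a choice of this sketch rather than documented behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the master buffer's completion state. */
struct mock_master {
	size_t b_bcount;   /* total bytes in the master transfer */
	size_t b_resid;    /* bytes of work not yet debited */
	int b_error;       /* first nonzero error seen (sketch's choice) */
	bool done;         /* master transfer complete */
};

static void
mock_master_init(struct mock_master *mbp, size_t bcount)
{
	mbp->b_bcount = bcount;
	mbp->b_resid = bcount;   /* work left = whole transfer */
	mbp->b_error = 0;
	mbp->done = false;
}

/*
 * Debit donebytes from the master's remaining work.  Called once per
 * completed nested transfer and once for any skipped range, in any order.
 */
static void
mock_nest_done(struct mock_master *mbp, size_t donebytes, int error)
{
	if (donebytes == 0)
		return;
	assert(!mbp->done);
	assert(donebytes <= mbp->b_resid);
	if (error != 0 && mbp->b_error == 0)
		mbp->b_error = error;
	mbp->b_resid -= donebytes;
	if (mbp->b_resid == 0)
		mbp->done = true;    /* the kernel would biodone() the master here */
}
```

For a three-segment master of 512-byte segments, completing segments 2 and 0 and then reporting segment 1 as skipped drives b_resid to zero and completes the master, regardless of the order of the calls.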
For writes associated with a vnode vp,
nestiobuf_setup() accounts for
vp->v_numoutput, so the
caller is not allowed to acquire vp's vnode lock
before submitting the nested I/O transfer to a block device. However, the
caller is responsible for accounting the master buffer in
vp->v_numoutput. This must
be done very carefully because after incrementing
vp->v_numoutput, the caller
is not allowed to acquire vp's vnode lock before
either calling nestiobuf_done() or submitting the
last nested I/O transfer to a block device.
For example:
struct buf *mbp, *bp;
struct vnode *devvp = NULL;	/* used after the loop */
size_t skipped = 0;
unsigned i;
int error = 0;

mbp = getiobuf(vp, true);
mbp->b_data = data;
mbp->b_resid = mbp->b_bcount = datalen;
mbp->b_flags = B_WRITE;

KASSERT(0 < nsegs);
KASSERT(datalen == nsegs*segsz);
for (i = 0; i < nsegs; i++) {
	daddr_t blkno;

	/* Map the segment to a block on the underlying device. */
	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
	error = VOP_BMAP(vp, i*segsz, &devvp, &blkno, NULL);
	VOP_UNLOCK(vp);
	if (error == 0 && blkno == -1)
		error = EIO;
	if (error) {
		/* Give up early, don't try to handle holes. */
		skipped += datalen - i*segsz;
		break;
	}

	/* nestiobuf_setup() accounts v_numoutput for the nested write. */
	bp = getiobuf(vp, true);
	nestiobuf_setup(mbp, bp, i*segsz, segsz);
	bp->b_blkno = blkno;
	if (i == nsegs - 1)	/* Last segment: submit after accounting. */
		break;
	VOP_STRATEGY(devvp, bp);
}

/*
 * Account v_numoutput for master write.
 * (Must not vn_lock before last VOP_STRATEGY!)
 */
mutex_enter(vp->v_interlock);
vp->v_numoutput++;
mutex_exit(vp->v_interlock);

if (skipped)
	nestiobuf_done(mbp, skipped, error);
else
	VOP_STRATEGY(devvp, bp);
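The control flow of the example, including the skip arithmetic and the deferral of the last segment until after the master's v_numoutput accounting, can be checked in user space by replacing VOP_BMAP with a lookup table. This is a hypothetical sketch: mock_plan_segments() and struct mock_plan are illustrative names, and a table entry of -1 plays the role of a hole.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_plan {
	unsigned submitted;   /* nested buffers handed to "VOP_STRATEGY" */
	size_t skipped;       /* bytes reported through "nestiobuf_done" */
	bool last_deferred;   /* last buffer submitted after v_numoutput++ */
};

/* blkmap[i] is the device block for segment i, or -1 for a hole. */
static struct mock_plan
mock_plan_segments(const long *blkmap, unsigned nsegs, size_t segsz)
{
	struct mock_plan p = { 0, 0, false };
	size_t datalen = (size_t)nsegs * segsz;
	unsigned i;

	assert(0 < nsegs);                  /* mirrors KASSERT(0 < nsegs) */
	for (i = 0; i < nsegs; i++) {
		if (blkmap[i] == -1) {
			/* Hole: give up early and skip the rest. */
			p.skipped += datalen - (size_t)i * segsz;
			break;
		}
		if (i == nsegs - 1) {
			/* Held back until after the master accounting. */
			p.last_deferred = true;
			break;
		}
		p.submitted++;              /* VOP_STRATEGY inside the loop */
	}
	if (p.last_deferred)
		p.submitted++;              /* the final VOP_STRATEGY */
	return p;
}
```

In both the full and the holey case, submitted * segsz + skipped equals datalen; that invariant is what lets the master's remaining work reach zero.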
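A block device driver implements a strategy method, in the d_strategy member of struct bdevsw (driver(9)), to queue a buffer for disk I/O. The inputs to the strategy method are:
bp->b_flags
    Flags specifying the type of transfer.
    B_READ
        Transfer is read from device. If not set, transfer is write to device.
bp->b_data
    Pointer to the memory that data is read into or written from.
bp->b_bcount
    Size of the transfer in bytes.
bp->b_blkno
    Block number at which the transfer starts.
If the strategy method uses bufq(9), it must additionally initialize the following fields before queueing bp with bufq_put(9):
bp->b_rawblkno
    Block number of the transfer on the underlying device, not relative to a partition.
When the I/O transfer is complete, whether it succeeded or failed, the strategy method must:
1. Set bp->b_error to zero on success, or to an errno(2) error code on failure.
2. Set bp->b_resid to the number of bytes remaining to transfer, whether on success or on failure. If no bytes were transferred, this must be set to bp->b_bcount.
3. Call biodone(bp).
biodone(bp)
    Notify that the I/O transfer described by bp has completed. To be called by a block device driver. The caller must first set bp->b_error to zero or an error code and bp->b_resid to the number of bytes remaining to transfer.
biowait(bp)
    Wait for the I/O transfer described by bp to complete, and return bp->b_error. To be called by a user requesting the I/O transfer. May not be called if bp has a callback or is asynchronous, that is, if bp->b_iodone is set or if B_ASYNC is set in bp->b_flags.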
getiobuf(vp, waitok)
    Allocate a struct buf for an I/O transfer. If vp is not NULL, the transfer is associated with it. If waitok is false, returns NULL if none can be allocated immediately.
    The resulting struct buf pointer must eventually be passed to putiobuf() to release it. Do not use brelse(9).
    The buffer may not be used for an asynchronous I/O transfer, because there is no way to know when the transfer has completed and the buffer can safely be passed to putiobuf(). Asynchronous I/O transfers are allowed only for buffers in the buffercache(9).
    May sleep if waitok is true.
putiobuf(bp)
    Free bp, which must have been allocated by getiobuf(). Either bp must never have been submitted to a block device, or the I/O transfer must have completed.
The BUFFERIO subsystem is implemented in sys/kern/vfs_bio.c.
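The completion contract for a strategy method (set b_error, set b_resid, then call biodone) can be expressed as a short user-space sketch. struct mock_buf, mock_strategy_finish() and the counting mock_biodone() are hypothetical stand-ins; a real driver would learn the transferred byte count from its hardware.

```c
#include <assert.h>
#include <stddef.h>

struct mock_buf {
	size_t b_bcount;   /* requested size in bytes */
	size_t b_resid;    /* bytes not transferred */
	int b_error;       /* 0 or an errno value */
	int completed;     /* number of mock_biodone() calls */
};

/* Stand-in for biodone(): only records that it ran. */
static void
mock_biodone(struct mock_buf *bp)
{
	bp->completed++;
}

/* Finish a transfer in the order the strategy contract requires. */
static void
mock_strategy_finish(struct mock_buf *bp, size_t transferred, int error)
{
	assert(transferred <= bp->b_bcount);
	bp->b_error = error;                       /* step 1 */
	bp->b_resid = bp->b_bcount - transferred;  /* step 2: b_bcount if nothing moved */
	mock_biodone(bp);                          /* step 3, always last */
}
```

A failed transfer that moved no data ends with b_resid equal to b_bcount, and a partial failure leaves only the untransferred remainder in b_resid.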
The BUFFERIO abstraction provides no way to cancel an I/O transfer once it has been submitted to a block device.
The BUFFERIO abstraction provides no way
to do I/O transfers with non-kernel pages, e.g. directly to buffers in
userland without copying into the kernel first.
The struct buf type is all mixed up with the buffercache(9).
The BUFFERIO abstraction is a totally
idiotic API design.
The v_numoutput accounting required of
BUFFERIO callers is asinine.
| March 29, 2015 | NetBSD 9.3 |