How should PUFFS deal with EDQUOT?

Discussion:

Emmanuel Dreyfus

2014-09-22 04:28:38 UTC

Hi

When a PUFFS filesystem enforces quota, a process doing a write over
quota will end up frozen in DE+ state.

The problem is that we have written data in the page cache that is
supposed to go to disk. The code path is a bit complicated, but
basically we go into genfs VOP_PUTPAGES, which leads to genfs_do_io() where
we have a VOP_STRATEGY, which causes a PUFFS write. The PUFFS write will
get EDQUOT, but genfs_do_io() ignores VOP_STRATEGY's return value and
retries forever.

In other words, when flushing the cache, the kernel ignores errors from
the filesystem and runs an endless loop attempting to flush data, during
which the process that did the over quota write is not allowed to
complete exit().
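
The difference between dropping and propagating the write error can be sketched in userspace C. This is only a toy model of the behaviour described above, not the genfs code: toy_backend_write() and toy_flush_page() are hypothetical names, and max_tries exists only so that the demonstration terminates.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the file server's write handler: it always
 * reports that the user is over quota. */
static int
toy_backend_write(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return EDQUOT;
}

/*
 * Try to flush one dirty page. If ignore_errors is true, the backend's
 * error is dropped and the page stays dirty, so the loop keeps retrying.
 * Returns the number of attempts made; *errp receives the reported error.
 */
static int
toy_flush_page(bool ignore_errors, int max_tries, int *errp)
{
	char page[16] = {0};
	int tries = 0;

	*errp = 0;
	while (tries < max_tries) {
		int error = toy_backend_write(page, sizeof(page));

		tries++;
		if (error == 0)
			break;		/* page is clean, done */
		if (!ignore_errors) {
			*errp = error;	/* report the failure, stop retrying */
			break;
		}
		/* error dropped: page is still dirty, try again */
	}
	return tries;
}
```

With ignore_errors set, the loop only stops because of the artificial max_tries cap and no error is ever reported; with it cleared, the first EDQUOT ends the flush after a single attempt.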

What is the proper way to deal with that? Is it reasonable to wipe the
page cache using puffs_inval_pagecache_node() when write gets a failure?
Any failure? Or just EDQUOT and ENOSPC? Should that happen in libpuffs
or in the filesystem (libperfuse here)?

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
***@netbsd.org

David Holland

2014-09-22 05:10:49 UTC

Post by Emmanuel Dreyfus
When a PUFFS filesystem enforces quota, a process doing a write over
quota will end up frozen in DE+ state.
The problem is that we have written data in the page cache that is
supposed to go to disk. The code path is a bit complicated, but
basically we go into genfs VOP_PUTPAGES, which leads to genfs_do_io() where
we have a VOP_STRATEGY, which causes a PUFFS write. The PUFFS write will
get EDQUOT, but genfs_do_io() ignores VOP_STRATEGY's return value and
retries forever.
In other words, when flushing the cache, the kernel ignores errors from
the filesystem and runs an endless loop attempting to flush data, during
which the process that did the over quota write is not allowed to
complete exit().
What is the proper way to deal with that? Is it reasonable to wipe the
page cache using puffs_inval_pagecache_node() when write gets a failure?
Any failure? Or just EDQUOT and ENOSPC? Should that happen in libpuffs
or in the filesystem (libperfuse here)?

The cache shouldn't discard data; that isn't cool. The filesystem
should generate EDQUOT before entering the offending data into the
cache. (Once you enter the data into the cache, the application gets
told it succeeded... so errors afterwards aren't helpful.)
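
As a rough illustration of that ordering, here is a minimal userspace sketch in which the quota check happens before anything is copied into a cache. The structure and function names (toy_file, toy_write) and the fixed-size cache are assumptions made up for the example; they do not correspond to any kernel interface.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_CACHE_SIZE 64

struct toy_file {
	size_t quota_left;		/* bytes the owner may still allocate */
	size_t cached;			/* bytes currently held in the toy cache */
	char cache[TOY_CACHE_SIZE];
};

/*
 * Accept a write only if the space can be accounted for up front.
 * On failure the cache is left untouched, so the caller sees the
 * error instead of a success that cannot be honoured later.
 */
static int
toy_write(struct toy_file *f, const void *buf, size_t len)
{
	if (len > sizeof(f->cache) - f->cached)
		return ENOSPC;
	if (len > f->quota_left)
		return EDQUOT;		/* nothing has been cached yet */

	f->quota_left -= len;		/* reserve the space first */
	memcpy(f->cache + f->cached, buf, len);
	f->cached += len;
	return 0;
}
```

Because the reservation precedes the copy, a later flush of the cached bytes has no reason to fail on quota grounds.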

Does this happen with ffs quotas, or with some fuse thing that does
quotas its own incompatible way?

--
David A. Holland
***@netbsd.org

Antti Kantee

2014-09-22 06:18:31 UTC

I'd guess the key to success would be to support genfs_ops in puffs so
that the file server is consulted about block allocations.

See also tests/vfs/t_full.c

Emmanuel Dreyfus

2014-09-22 11:57:48 UTC

Post by Antti Kantee
I'd guess the key to success would be to support genfs_ops in puffs so
that the file server is consulted about block allocations.

You mean gop_alloc()? Is it documented somewhere? Documentation would make it
easier to implement.

If I understand correctly, that would be called when the kernel is going
to need a new block for a file, and here the filesystem could say ENOSPC
or EDQUOT before the data goes into the cache, right?
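
The idea of consulting the file server through an allocation hook can be sketched with a function-pointer table. Everything below is a toy analogue with hypothetical names (toy_ops, toy_node, toy_server_alloc, toy_generic_write); it is not the real genfs_ops structure, and the per-file limit merely stands in for whatever the server would check.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct toy_node;

/* Toy analogue of an ops table: the generic layer only knows this hook. */
struct toy_ops {
	int (*alloc)(struct toy_node *, off_t off, off_t len);
};

struct toy_node {
	const struct toy_ops *ops;
	off_t size;		/* current file size */
	off_t limit;		/* hypothetical limit known to the server */
};

/* File-server side: refuse any range that would pass the limit. */
static int
toy_server_alloc(struct toy_node *n, off_t off, off_t len)
{
	if (off + len > n->limit)
		return EDQUOT;
	return 0;
}

static const struct toy_ops toy_server_ops = { .alloc = toy_server_alloc };

/* Generic side: ask the hook before accepting the write. */
static int
toy_generic_write(struct toy_node *n, off_t off, off_t len)
{
	int error = n->ops->alloc(n, off, len);

	if (error)
		return error;	/* fails before any data is cached */
	if (off + len > n->size)
		n->size = off + len;
	return 0;
}
```

The generic write path never needs to know why the allocation was refused; it only forwards the error returned by the hook.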

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
***@netbsd.org

Emmanuel Dreyfus

2014-09-27 04:36:18 UTC

Post by Antti Kantee
I'd guess the key to success would be to support genfs_ops in puffs so that
the file server is consulted about block allocations.

It seems I just have to call GOP_ALLOC in puffs_vnop_write()
(see below) and implement puffs_vnop_fallocate(). Since having
VOP_FALLOCATE() in FFS looks difficult in the near future,
libperfuse turns it into big writes to the file, which fail with
EDQUOT or ENOSPC. That way the failure is produced before data enters
the page cache, which is what we were looking for.
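
For illustration, preallocation emulated with plain writes might look like the POSIX sketch below. This is an assumption about the general technique, not the libperfuse code: toy_prealloc_by_write() is a hypothetical helper, it only extends the file past its current end, and it does not try to fill holes inside an existing sparse file.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Make sure the range [off, off + len) is backed by real writes.
 * Only the part beyond the current end of file is written, so
 * existing data is preserved. Returns 0 or an errno value such as
 * EDQUOT or ENOSPC reported by the underlying filesystem.
 */
static int
toy_prealloc_by_write(int fd, off_t off, off_t len)
{
	static const char zeros[4096];
	struct stat st;
	off_t pos, end = off + len;

	if (fstat(fd, &st) == -1)
		return errno;

	for (pos = st.st_size; pos < end; ) {
		size_t chunk = sizeof(zeros);
		ssize_t n;

		if ((off_t)chunk > end - pos)
			chunk = (size_t)(end - pos);
		n = pwrite(fd, zeros, chunk, pos);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the chunk */
			return errno;
		}
		if (n == 0)
			return EIO;		/* no progress, give up */
		pos += n;
	}
	return 0;
}
```

Since the zeros are written synchronously through the normal write path, a quota or space failure surfaces as the return value here rather than during a later cache flush.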

Opinion?

--- puffs_vnops.c.orig 2014-09-27 06:30:02.000000000 +0200
+++ puffs_vnops.c 2014-09-27 06:30:13.000000000 +0200
@@ -2342,8 +2342,14 @@
      */
     if (ap->a_ioflag & IO_APPEND)
         uio->uio_offset = vp->v_size;

+    origoff = uio->uio_offset;
+    error = GOP_ALLOC(vp, origoff, uio->uio_resid,
+        0, curlwp->l_cred);
+    if (error)
+        goto out;
+
     while (uio->uio_resid > 0) {
         oldoff = uio->uio_offset;
         bytelen = uio->uio_resid;

--
Emmanuel Dreyfus
***@netbsd.org

Emmanuel Dreyfus

2014-09-27 05:47:29 UTC

Post by Emmanuel Dreyfus
It seems I just have to call GOP_ALLOC in puffs_vnop_write()
(see below)

Wait, there are a lot of missing bits, such as this:

int
puffs_gop_alloc(struct vnode *vp, off_t off, off_t len,
    int flags, kauth_cred_t cred)
{
    return _puffs_vnop_fallocate(vp, off, len);
}

And this (also called by puffs_vnop_fallocate() after
acquiring the pn_sizemtx mutex):

int
_puffs_vnop_fallocate(struct vnode *vp, off_t pos, off_t len)
{
    PUFFS_MSG_VARS(vn, fallocate);
    struct puffs_mount *pmp = MPTOPUFFSMP(vp->v_mount);
    int error;

    PUFFS_MSG_ALLOC(vn, fallocate);
    fallocate_msg->pvnr_off = pos;
    fallocate_msg->pvnr_len = len;
    puffs_msg_setinfo(park_fallocate, PUFFSOP_VN,
        PUFFS_VN_FALLOCATE, VPTOPNC(vp));

    PUFFS_MSG_ENQUEUEWAIT2(pmp, park_fallocate, vp->v_data,
        NULL, error);
    error = checkerr(pmp, error, __func__);
    PUFFS_MSG_RELEASE(fallocate);

    return error;
}

At this point you may wonder why I do not send a clean patch with
all the changes. It is because I need to revert the changes
from hashlist to vcache as described in kern/49234 so that FUSE
works again, which produces a fuzzy patch.

--
Emmanuel Dreyfus
***@netbsd.org