dumps & shutdown order (was Re: CVS commit: src/sys/arch)
David Young
2009-06-29 23:00:06 UTC
Hi,
A1 dump core
A2 sync filesystems and TOD clock
I think this deserves a heads-up to users because I, for one, usually
hardware reboot my computer while it dumps its GBs of RAM to the disk.
You're absolutely right. Let me see what I can do with the order of
things. I already have some ideas in mind. I can give the problem some
attention on Monday.
Looks like I can move dumps back to their customary place in the
shutdown order by opening the dump device early in cpu_reboot(9) to
prevent its detachment, running the detachment/unmount loop, closing
the device, dumping, and running the detachment/unmount loop again.
Pseudo-C for cpu_reboot(9) follows.

bool postdump = false;  /* postpone dump */
int s;

if (panicstr != NULL)
        howto |= RB_NOSYNC;

switch (howto & (RB_NOSYNC|RB_DUMP|RB_HALT)) {
case RB_DUMP|RB_NOSYNC:
        s = splhigh();
        dumpsys();
        splx(s);
        break;
case RB_DUMP:
        /*
         * Dump after unmounting filesystems and detaching
         * most devices.  Open the dump device so that it
         * will not be detached.
         */
        if (dumpdev != NODEV) {
                postdump = (bdev_open(dumpdev, FWRITE,
                    S_IFBLK, l) == 0);
        }
        break;
default:
        break;
}

/* sync filesystems, detach devices */

if (postdump) {
        bdev_close(dumpdev, FWRITE, S_IFBLK, l);
        s = splhigh();
        dumpsys();
        splx(s);

        /* sync filesystems and detach devices that remain */
}

/* run PMF shutdown hooks */
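
The effect of holding the dump device open across the first detach pass can be modelled in user space. This is only a sketch under stated assumptions: struct fake_dev, its open_count field, and the helpers detach_idle() and shutdown_with_dump() are hypothetical stand-ins, not NetBSD interfaces, and real detachment involves far more than a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; not a NetBSD structure. */
struct fake_dev {
    const char *name;
    int open_count;     /* > 0 while someone holds the device open */
    bool attached;
};

/* Detach every attached device that nobody holds open; return the count. */
static int detach_idle(struct fake_dev *devs, int n)
{
    int detached = 0;
    for (int i = 0; i < n; i++) {
        if (devs[i].attached && devs[i].open_count == 0) {
            devs[i].attached = false;
            detached++;
        }
    }
    return detached;
}

/*
 * Open the dump device, detach, close, dump, detach again.
 * Returns true if the dump device was still attached at dump time.
 */
static bool shutdown_with_dump(struct fake_dev *devs, int n, int dump_idx)
{
    bool dumped;

    assert(dump_idx >= 0 && dump_idx < n);
    devs[dump_idx].open_count++;        /* models bdev_open() */
    detach_idle(devs, n);               /* first pass skips the dump device */
    devs[dump_idx].open_count--;        /* models bdev_close() */
    dumped = devs[dump_idx].attached;   /* models dumpsys() */
    detach_idle(devs, n);               /* second pass takes what remains */
    return dumped;
}
```

With two devices where the second is the dump device, shutdown_with_dump(devs, 2, 1) reports that the dump device survived the first pass, and both devices end up detached after the second.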

Dave
--
David Young OJC Technologies
***@ojctech.com Urbana, IL * (217) 278-3933
Joerg Sonnenberger
2009-06-29 23:36:12 UTC
Post by David Young
Looks like I can move dumps back to their customary place in the
shutdown order by opening the dump device early in cpu_reboot(9) to
prevent its detachment, running the detachment/unmount loop, closing
the device, dumping, and running the detachment/unmount loop again.
Pseudo-C for cpu_reboot(9) follows.
I'm not sure if it should be moved back. The current place makes it IMO
more reliable to get a dump.

Joerg
Thor Lancelot Simon
2009-06-30 00:31:47 UTC
Post by David Young
I think this deserves a heads-up to users because I, for one, usually
hardware reboot my computer while it dumps its GBs of RAM to the disk.
You're absolutely right. Let me see what I can do with the order of
things. I already have some ideas in mind. I can give the problem some
attention on Monday.
Looks like I can move dumps back to their customary place in the
shutdown order by opening the dump device early in cpu_reboot(9) to
prevent its detachment, running the detachment/unmount loop, closing
the device, dumping, and running the detachment/unmount loop again.
I don't think you should do this. It makes it unlikely we will ever get
a useful dump of a system which panicked from a filesystem problem.

Those who don't want dumps should...disable them. And the sparse dump
code makes dumping even many large-memory systems very fast.
Paul Goyette
2009-06-30 00:36:57 UTC
Post by Thor Lancelot Simon
Those who don't want dumps should...disable them. And the sparse dump
code makes dumping even many large-memory systems very fast.
The other day someone hinted that sparse-dump was sysctl'd, but I don't
seem to have it:

quicky:paul {399} uname -rsm
NetBSD 5.99.14 amd64
quicky:paul {400} sysctl -a | grep -e dump -e sparse
kern.dump_on_panic = 1
kern.coredump.setid.dump = 0
kern.coredump.setid.path = /var/crash/%n.core
kern.coredump.setid.owner = 0
kern.coredump.setid.group = 0
kern.coredump.setid.mode = 0600 (rw------- )
proc.curproc.rlimit.coredumpsize.soft = unlimited
proc.curproc.rlimit.coredumpsize.hard = unlimited
quicky:paul {401}

How do I get sparse-dump? :)


-------------------------------------------------------------------------
| Paul Goyette | PGP DSS Key fingerprint: | E-mail addresses: |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer | | pgoyette at netbsd.org |
-------------------------------------------------------------------------
Michael L. Hitch
2009-06-30 03:43:31 UTC
Post by Paul Goyette
The other day someone hinted that sparse-dump was sysctl'd, but I don't
seem to have it:
quicky:paul {399} uname -rsm
NetBSD 5.99.14 amd64
...
How do I get sparse-dump? :)
Run i386?

net3$ sysctl machdep.sparse_dump
machdep.sparse_dump = 1
net3$ uname -rsm
NetBSD 5.0 i386


--
Michael L. Hitch ***@montana.edu
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA
matthew green
2009-06-30 01:03:42 UTC
Post by David Young
I think this deserves a heads-up to users because I, for one, usually
hardware reboot my computer while it dumps its GBs of RAM to the disk.
You're absolutely right. Let me see what I can do with the order of
things. I already have some ideas in mind. I can give the problem some
attention on Monday.
Looks like I can move dumps back to their customary place in the
shutdown order by opening the dump device early in cpu_reboot(9) to
prevent its detachment, running the detachment/unmount loop, closing
the device, dumping, and running the detachment/unmount loop again.
Post by Thor Lancelot Simon
I don't think you should do this. It makes it unlikely we will ever get
a useful dump of a system which panicked from a filesystem problem.

it also means that filesystems won't get a final sync if the
dump hangs, which i see occasionally.

i would prefer the old ordering, or at least a choice.


.mrg.
David Holland
2009-06-30 02:49:35 UTC
Post by Thor Lancelot Simon
I don't think you should do this. It makes it unlikely we will ever get
a useful dump of a system which panicked from a filesystem problem.
Post by matthew green
it also means that filesystems won't get a final sync if the
dump hangs, which i see occasionally.
I was seeing this more than occasionally for a while. :-/
Post by matthew green
i would prefer the old ordering, or at least a choice.
What I'd like is an easy way to abort a dump that's in progress, like
by hitting spacebar or something. Then there's no need to push the
reset button, and the ordering becomes less important.
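
One way to picture an abortable dump is a loop that checks for an abort request before writing each chunk. The following is a user-space sketch only: the callback types, dump_chunks(), and the demo helpers are hypothetical names, and nothing here is taken from the actual dump or console code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks: a real kernel would write to the dump device
 * and poll the console; neither interface is modelled on NetBSD here. */
typedef int  (*write_chunk_fn)(size_t chunk_index, void *ctx);
typedef bool (*abort_requested_fn)(void *ctx);

enum dump_result { DUMP_DONE, DUMP_ABORTED, DUMP_ERROR };

/* Write nchunks chunks, checking for an abort request before each one.
 * *written reports how many chunks were written successfully. */
static enum dump_result
dump_chunks(size_t nchunks, write_chunk_fn write_chunk,
    abort_requested_fn abort_requested, void *ctx, size_t *written)
{
    assert(write_chunk != NULL && abort_requested != NULL && written != NULL);
    *written = 0;
    for (size_t i = 0; i < nchunks; i++) {
        if (abort_requested(ctx))
            return DUMP_ABORTED;
        if (write_chunk(i, ctx) != 0)
            return DUMP_ERROR;
        (*written)++;
    }
    return DUMP_DONE;
}

/* Demo context: pretend the key is hit once `abort_after` polls have passed. */
struct demo_ctx { size_t polls; size_t abort_after; };

static bool demo_abort(void *ctx)
{
    struct demo_ctx *d = ctx;
    return d->polls++ >= d->abort_after;
}

static int demo_write(size_t i, void *ctx)
{
    (void)i; (void)ctx;
    return 0;   /* always succeeds in the demo */
}
```

Because the caller gets DUMP_ABORTED back rather than a reset, it can carry on with whatever shutdown steps still make sense.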

Turning dumps off isn't a good solution in general because the value
of a dump depends heavily on what happened leading up to it.
--
David A. Holland
***@netbsd.org
Greg A. Woods
2009-06-30 16:23:46 UTC
At Tue, 30 Jun 2009 02:49:35 +0000, David Holland <dholland-***@netbsd.org> wrote:
Subject: Re: dumps & shutdown order (was Re: CVS commit: src/sys/arch)
Post by David Holland
Post by Thor Lancelot Simon
I don't think you should do this. It makes it unlikely we will ever get
a useful dump of a system which panicked from a filesystem problem.
Post by matthew green
it also means that filesystems won't get a final sync if the
dump hangs, which i see occasionally.
I was seeing this more than occasionally for a while. :-/
Post by matthew green
i would prefer the old ordering, or at least a choice.
Indeed. On production systems what usually matters most to me is that
the filesystems are left in the best possible state. If I can get a
crash dump as well, that's good; if not, too bad, but at least it will
hopefully be quicker and easier to get the system running again.

If you know you're getting unusable crash dumps from an ongoing problem
then you can flip to the more dangerous option and hopefully get a clean
copy.
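
A choice of ordering could be as small as one policy value consulted when the shutdown steps are laid out. This is a sketch under assumptions: enum dump_order and plan_shutdown() are hypothetical, and no such knob is known to exist in NetBSD.

```c
#include <assert.h>
#include <stddef.h>

enum shutdown_step { STEP_DUMP, STEP_SYNC_DETACH, STEP_HOOKS };

/* Hypothetical policy knob; not an existing NetBSD option. */
enum dump_order {
    DUMP_AFTER_SYNC,    /* filesystems first, dump if still possible */
    DUMP_FIRST          /* riskier for filesystems, better odds of a dump */
};

/* Fill plan[] with the steps in the selected order; return the step count. */
static size_t plan_shutdown(enum dump_order order, int want_dump,
    enum shutdown_step plan[3])
{
    size_t n = 0;

    assert(plan != NULL);
    if (want_dump && order == DUMP_FIRST)
        plan[n++] = STEP_DUMP;
    plan[n++] = STEP_SYNC_DETACH;
    if (want_dump && order == DUMP_AFTER_SYNC)
        plan[n++] = STEP_DUMP;
    plan[n++] = STEP_HOOKS;
    return n;
}
```

With DUMP_AFTER_SYNC the plan is sync/detach, dump, hooks; flipping to DUMP_FIRST moves only the dump step.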
Post by David Holland
What I'd like is an easy way to abort a dump that's in progress, like
by hitting spacebar or something. Then there's no need to push the
reset button, and the ordering becomes less important.
Indeed, that would also be very helpful, especially when the reset
button is on the other side of the house/office/city/country/world.

However unless doing so also means continuing on with filesystem syncs
(and any other cleanups still feasible) then that's kind of a separate
discussion, no?

I'm curious though whether the new sparse dumps will occur so quickly in
common cases that one no longer has the stronger urge to interrupt them.
--
Greg A. Woods
Planix, Inc.

<***@planix.com> +1 416 218-0099 http://www.planix.com/
der Mouse
2009-06-30 16:40:13 UTC
[...] the new sparse dumps [...]
While we're wishing about kernel coredumps - I think having sparse
dumps available is a good thing. I think inflicting them on every
kernel coredump, without even an option to dump everything, is not.
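
To make the sparse-versus-full distinction concrete, here is a small sketch assuming a per-page bitmap that marks which pages a sparse dump would keep. The bitmap model and the names page_marked() and pages_to_dump() are hypothetical; how a real kernel decides which pages matter is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one bit per physical page, set when the page is
 * considered worth dumping. */
static bool page_marked(const uint8_t *bitmap, size_t page)
{
    return (bitmap[page / 8] >> (page % 8)) & 1u;
}

/* Count the pages a dump would write. With sparse == false every page
 * is written, i.e. the "dump everything" option. */
static size_t pages_to_dump(const uint8_t *bitmap, size_t npages, bool sparse)
{
    size_t n = 0;

    assert(bitmap != NULL);
    if (!sparse)
        return npages;
    for (size_t p = 0; p < npages; p++) {
        if (page_marked(bitmap, p))
            n++;
    }
    return n;
}
```

Keeping the sparse flag as a parameter, rather than hard-wiring it, is what leaves room for a full dump when one is wanted.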

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
