Discussion:
Making global variables of if.c MPSAFE
Ryota Ozaki
2014-07-27 05:38:37 UTC
Hi,

This is another work toward MPSAFE networking.
sys/net/if.c contains several global variables
(e.g., ifnet_list) that should be protected
from parallel accesses somehow once we get rid
of the big kernel lock.

Currently there is a mutex for serializing
accesses to index_gen; however, that's not
enough; we have to protect the other variables
too.

The global variables are read-mostly, so I
replace the mutex with a rwlock and use it
for all of them. Unfortunately, ifnet_list
may be read from interrupt context, so I
also add a spin mutex for it; we hold both
the spin mutex and the rwlock when we
modify ifnet_list.

http://www.netbsd.org/~ozaki-r/ifnet-mutex.diff
https://github.com/ozaki-r/netbsd-src/tree/ifnet-mutex

Here is the patch. It doesn't remove any existing
locks or splnets yet.
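
To illustrate, the locking scheme looks roughly like this
(just a sketch, not the patch itself; the lock names below
are made up):

/* Sketch only; ifnet_lock and ifnet_intr_lock are illustrative names. */
static krwlock_t ifnet_lock;      /* protects if.c globals (thread context) */
static kmutex_t  ifnet_intr_lock; /* IPL_NET spin mutex for interrupt readers */

/* Reader in thread context. */
static void
example_reader(void)
{
        struct ifnet *ifp;

        rw_enter(&ifnet_lock, RW_READER);
        IFNET_FOREACH(ifp) {
                /* read-only use of ifp */
        }
        rw_exit(&ifnet_lock);
}

/* Reader in interrupt context; rwlock(9) cannot be used here. */
static void
example_intr_reader(void)
{
        struct ifnet *ifp;

        mutex_enter(&ifnet_intr_lock);
        IFNET_FOREACH(ifp) {
                /* read-only use of ifp */
        }
        mutex_exit(&ifnet_intr_lock);
}

/* Writer: hold both locks while modifying ifnet_list. */
static void
example_writer(struct ifnet *ifp)
{
        rw_enter(&ifnet_lock, RW_WRITER);
        mutex_enter(&ifnet_intr_lock);
        TAILQ_INSERT_TAIL(&ifnet_list, ifp, if_list);
        mutex_exit(&ifnet_intr_lock);
        rw_exit(&ifnet_lock);
}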

Any comments and suggestions are welcome.

Thanks,
ozaki-r
Dennis Ferguson
2014-07-29 18:33:03 UTC
Post by Ryota Ozaki
The global variables are read-mostly, so I
replace the mutex with a rwlock and use it
for all of them. Unfortunately, ifnet_list
may be read from interrupt context, so I
also add a spin mutex for it; we hold both
the spin mutex and the rwlock when we
modify ifnet_list.
I don't think this is a good way to do this. I think
it would be better to either eliminate the need to
access that list from the packet processing path or
replace the structure with one which can be read while
it is being modified so that readers don't need to take
a lock on every access to prevent something (a write)
which is hardly ever going to happen anyway. For the
latter it looks to me that most (all?) readers treat
the thing as a singly linked list, which is about
the most straightforward structure to arrange to
support lockless readers. A rwlock might be justifiable
for a structure which needs to be frequently written
as well as read (though it would be even better not
to have the packet processing path access structures
like that), but for a read-mostly structure it is way
better to eliminate all unnecessary work from the readers
even at the cost of greater complexity for the writer.
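
As a rough illustration (not NetBSD code; removal is the harder
part, since freed entries must be deferred until readers are done),
an insert into a singly linked list can be made safe for lockless
readers with a single ordered pointer store:

struct entry {
        struct entry *e_next;
        /* ... payload ... */
};

static struct entry *list_head;         /* writers serialized elsewhere */

static void
list_insert_head(struct entry *e)
{
        e->e_next = list_head;
        membar_producer();      /* publish e's contents before linking it in */
        list_head = e;          /* one pointer store; readers see old or new list */
}

static void
list_reader(void)
{
        struct entry *e;

        /* No lock: a reader never sees a half-initialized entry. */
        for (e = list_head; e != NULL; e = e->e_next) {
                /* read-only use of e */
        }
}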

There's a bigger issue here, however. Code which thinks
it is reasonable to scan the entire list of interfaces
looking for something is making the assumption that this
list is quite short. That code will fail if this isn't
true. While it is often the case that the number of
interfaces is small, sometimes it isn't; this is particularly
true since, at the protocol level, an "interface" is
a software construct that doesn't map one-to-one to hardware
so you don't need to be running on a monster piece of
hardware (though I've seen those too) to have a lot of
"interfaces". I've seen boxes with a small number of
10 Gbps ethernet ports with >100,000 interfaces
configured (think a PPPoE concentrator), and even a
single ethernet with a bunch of VLANs can get the "interface"
count up to where it really doesn't work to do linear
searches. I don't think there's a good reason to keep
code which fails on machines like that so I think at
some point it is going to be necessary to rethink what
interface structures do, and what the packet processing
path needs them to do, so it is no longer necessary to
have code like that.

Dennis Ferguson
Hisashi T Fujinaka
2014-07-29 21:40:23 UTC
The global variables are read-mostly, so I replace the mutex with a
rwlock and use it for all of them. Unfortunately, ifnet_list may be
read from interrupt context, so I also add a spin mutex for it; we
hold both the spin mutex and the rwlock when we modify ifnet_list.
This breaks BEAGLEBOARDXM, btw.
--
Hisashi T Fujinaka - ***@twofifty.com
BSEE(6/86) + BSChem(3/95) + BAEnglish(8/95) + MSCS(8/03) + $2.50 = latte
Ryota Ozaki
2014-07-30 02:52:34 UTC
Post by Hisashi T Fujinaka
The global variables are read-mostly, so I replace the mutex with a
rwlock and use it for all of them. Unfortunately, ifnet_list may be
read from interrupt context, so I also add a spin mutex for it; we
hold both the spin mutex and the rwlock when we modify ifnet_list.
This breaks BEAGLEBOARDXM, btw.
I'm sorry for the breakage. (Though the culprit is another patch of mine.)

Here is a patch: http://www.netbsd.org/~ozaki-r/ethersubr_bigpktpps_fix.diff
I'll commit it if it's appropriate.

Thanks for your report,
ozaki-r
Hisashi T Fujinaka
2014-07-30 14:49:03 UTC
Post by Ryota Ozaki
Post by Hisashi T Fujinaka
The global variables are read-mostly, so I replace the mutex with a
rwlock and use it for all of them. Unfortunately, ifnet_list may be
read from interrupt context, so I also add a spin mutex for it; we
hold both the spin mutex and the rwlock when we modify ifnet_list.
This breaks BEAGLEBOARDXM, btw.
I'm sorry for the breakage. (Though the culprit is another patch of mine.)
Here is a patch: http://www.netbsd.org/~ozaki-r/ethersubr_bigpktpps_fix.diff
I'll commit it if it's appropriate.
Thanks for your report,
It fixed my build, thanks.
--
Hisashi T Fujinaka - ***@twofifty.com
BSEE(6/86) + BSChem(3/95) + BAEnglish(8/95) + MSCS(8/03) + $2.50 = latte
Ryota Ozaki
2014-07-30 02:47:39 UTC
On Wed, Jul 30, 2014 at 3:33 AM, Dennis Ferguson
Post by Dennis Ferguson
Post by Ryota Ozaki
The global variables are read-mostly, so I
replace the mutex with a rwlock and use it
for all of them. Unfortunately, ifnet_list
may be read from interrupt context, so I
also add a spin mutex for it; we hold both
the spin mutex and the rwlock when we
modify ifnet_list.
I don't think this is a good way to do this. I think
it would be better to either eliminate the need to
access that list from the packet processing path or
replace the structure with one which can be read while
it is being modified so that readers don't need to take
a lock on every access to prevent something (a write)
which is hardly ever going to happen anyway. For the
latter it looks to me that most (all?) readers treat
the thing as a singly linked list, which is about
the most straightforward structure to arrange to
support lockless readers. A rwlock might be justifiable
for a structure which needs to be frequently written
as well as read (though it would be even better not
to have the packet processing path access structures
like that), but for a read-mostly structure it is way
better to eliminate all unnecessary work from the readers
even at the cost of greater complexity for the writer.
We may be able to use pserialize(9) instead of rwlock(9)
for lockless read accesses. Can we simply replace rwlock
with pserialize?

OTOH, in either case, we need a spin lock for interrupt
context because neither pserialize nor rwlock can be
used in interrupt context, IIUC.
Post by Dennis Ferguson
There's a bigger issue here, however. Code which thinks
it is reasonable to scan the entire list of interfaces
looking for something is making the assumption that this
list is quite short. That code will fail if this isn't
true. While it is often the case that the number of
interfaces is small, sometimes it isn't; this is particularly
true since, at the protocol level, an "interface" is
a software construct that doesn't map one-to-one to hardware
so you don't need to be running on a monster piece of
hardware (though I've seen those too) to have a lot of
"interfaces". I've seen boxes with a small number of
10 Gbps ethernet ports with >100,000 interfaces
configured (think a PPPoE concentrator), and even a
single ethernet with a bunch of VLANs can get the "interface"
count up to where it really doesn't work to do linear
searches. I don't think there's a good reason to keep
code which fails on machines like that so I think at
some point it is going to be necessary to rethink what
interface structures do, and what the packet processing
path needs them to do, so it is no longer necessary to
have code like that.
Yes, we have to think about scalability and optimize the code
at some point. Nevertheless, most of the code in question that
uses IFNET_FOREACH is not called from the performance-sensitive
packet processing path, so we can optimize it once we actually
hit a performance problem.

ozaki-r
Ryota Ozaki
2014-08-01 01:35:17 UTC
Post by Ryota Ozaki
On Wed, Jul 30, 2014 at 3:33 AM, Dennis Ferguson
Post by Dennis Ferguson
Post by Ryota Ozaki
The global variables are read-mostly, so I
replace the mutex with a rwlock and use it
for all of them. Unfortunately, ifnet_list
may be read from interrupt context, so I
also add a spin mutex for it; we hold both
the spin mutex and the rwlock when we
modify ifnet_list.
I don't think this is a good way to do this. I think
it would be better to either eliminate the need to
access that list from the packet processing path or
replace the structure with one which can be read while
it is being modified so that readers don't need to take
a lock on every access to prevent something (a write)
which is hardly ever going to happen anyway. For the
latter it looks to me that most (all?) readers treat
the thing as a singly linked list, which is about
the most straightforward structure to arrange to
support lockless readers. A rwlock might be justifiable
for a structure which needs to be frequently written
as well as read (though it would be even better not
to have the packet processing path access structures
like that), but for a read-mostly structure it is way
better to eliminate all unnecessary work from the readers
even at the cost of greater complexity for the writer.
We may be able to use pserialize(9) instead of rwlock(9)
for lockless read accesses. Can we simply replace rwlock
with pserialize?
Self-answer: no

pserialize(9) requires that we don't block (sleep) in
a critical section, while rwlock(9) allows us to do so.
And there is another undocumented constraint of pserialize,
IIUC. It requires lockless operations on data shared
between readers and a writer. (Correct me if I'm wrong.)
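
For reference, the usual pserialize(9) pattern looks something
like this (a sketch; ifnet_writer_lock and ifnet_psz are made-up
names, and note that the writer's list manipulation itself must
be safe against concurrent readers, which a plain TAILQ_REMOVE
is not):

/* Reader: must not block between enter and exit. */
static void
example_psz_reader(void)
{
        struct ifnet *ifp;
        int s;

        s = pserialize_read_enter();
        IFNET_FOREACH(ifp) {
                /* non-blocking, read-only use of ifp */
        }
        pserialize_read_exit(s);
}

/* Writer: serialized by its own lock, then drains readers. */
static void
example_psz_writer(struct ifnet *ifp)
{
        mutex_enter(&ifnet_writer_lock);
        TAILQ_REMOVE(&ifnet_list, ifp, if_list); /* NB: not reader-safe as-is */
        pserialize_perform(ifnet_psz);  /* wait until no reader can see ifp */
        mutex_exit(&ifnet_writer_lock);
        /* now safe to free ifp */
}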

In the case of ifnet_list, neither requirement is
satisfied as it is. Adapting ifnet_list is not an easy
task, so I think using rwlock(9) is reasonable
at this point.

...or rdlock(9) may be better?
http://thread.gmane.org/gmane.os.netbsd.devel.kernel/46314/focus=46339

ozaki-r
Post by Ryota Ozaki
OTOH, in either case, we need a spin lock for interrupt
context because neither pserialize nor rwlock can be
used in interrupt context, IIUC.
Post by Dennis Ferguson
There's a bigger issue here, however. Code which thinks
it is reasonable to scan the entire list of interfaces
looking for something is making the assumption that this
list is quite short. That code will fail if this isn't
true. While it is often the case that the number of
interfaces is small, sometimes it isn't; this is particularly
true since, at the protocol level, an "interface" is
a software construct that doesn't map one-to-one to hardware
so you don't need to be running on a monster piece of
hardware (though I've seen those too) to have a lot of
"interfaces". I've seen boxes with a small number of
10 Gbps ethernet ports with >100,000 interfaces
configured (think a PPPoE concentrator), and even a
single ethernet with a bunch of VLANs can get the "interface"
count up to where it really doesn't work to do linear
searches. I don't think there's a good reason to keep
code which fails on machines like that so I think at
some point it is going to be necessary to rethink what
interface structures do, and what the packet processing
path needs them to do, so it is no longer necessary to
have code like that.
Yes, we have to think about scalability and optimize the code
at some point. Nevertheless, most of the code in question that
uses IFNET_FOREACH is not called from the performance-sensitive
packet processing path, so we can optimize it once we actually
hit a performance problem.
ozaki-r
Dennis Ferguson
2014-08-01 05:24:56 UTC
Post by Ryota Ozaki
Post by Ryota Ozaki
We may be able to use pserialize(9) instead of rwlock(9)
for lockless read accesses. Can we simply replace rwlock
with pserialize?
Self-answer: no
pserialize(9) requires that we don't block (sleep) in
a critical section, while rwlock(9) allows us to do so.
And there is another undocumented constraint of pserialize,
IIUC. It requires lockless operations on data shared
between readers and a writer. (Correct me if I'm wrong.)
In the case of ifnet_list, neither requirement is
satisfied as it is. Adapting ifnet_list is not an easy
task, so I think using rwlock(9) is reasonable
at this point.
?? I'm confused. Which data shared between readers and writers
are you trying to protect, and where is it stored? ifnet_list is
the head of a linked list. About the only thing a reader can do
with a linked list is walk it, and about the only thing a writer
can do to a list is add something to it or delete something from
it. I can't think of anything that would block any of those
operations. How would a sleep happen?

Or is it the case that what you are trying to do is keep anyone
from modifying anything in a struct ifnet while someone else
is reading it? If this is the problem you are trying to address
then I guess you would have to protect all struct ifnet readers,
not just those who found the ifnet structure via the ifnet_list.
Readers mostly find ifnet structures via the pointers to them
stored in packet mbufs and in routes.

I thought I understood what it is you wanted to accomplish, but
I think I was mistaken.

Dennis Ferguson
Ryota Ozaki
2014-08-01 06:23:01 UTC
On Fri, Aug 1, 2014 at 2:24 PM, Dennis Ferguson
Post by Dennis Ferguson
Post by Ryota Ozaki
Post by Ryota Ozaki
We may be able to use pserialize(9) instead of rwlock(9)
for lockless read accesses. Can we simply replace rwlock
with pserialize?
Self-answer: no
pserialize(9) requires that we don't block (sleep) in
a critical section, while rwlock(9) allows us to do so.
And there is another undocumented constraint of pserialize,
IIUC. It requires lockless operations on data shared
between readers and a writer. (Correct me if I'm wrong.)
In the case of ifnet_list, neither requirement is
satisfied as it is. Adapting ifnet_list is not an easy
task, so I think using rwlock(9) is reasonable
at this point.
?? I'm confused. Which data shared between readers and writers
are you trying to protect, and where is it stored? ifnet_list is
the head of a linked list. About the only thing a reader can do
with a linked list is walk it, and about the only thing a writer
can do to a list is add something to it or delete something from
it. I can't think of anything that would block any of those
operations. How would a sleep happen?
Or is it the case that what you are trying to do is keep anyone
from modifying anything in a struct ifnet while someone else
is reading it? If this is the problem you are trying to address
then I guess you would have to protect all struct ifnet readers,
not just those who found the ifnet structure via the ifnet_list.
Readers mostly find ifnet structures via the pointers to them
stored in packet mbufs and in routes.
I thought I understood what it is you wanted to accomplish, but
I think I was mistaken.
I'm sorry for confusing you. That was just an explanation of
why rwlock(9) cannot simply be replaced with pserialize(9).
Using the rwlock does not protect all ifnet data; it can
protect only ifnet_list itself.

ozaki-r
Mindaugas Rasiukevicius
2014-08-03 23:21:52 UTC
Post by Ryota Ozaki
Hi,
This is another work toward MPSAFE networking.
sys/net/if.c contains several global variables
(e.g., ifnet_list) that should be protected
from parallel accesses somehow once we get rid
of the big kernel lock.
Currently there is a mutex for serializing
accesses to index_gen; however, that's not
enough; we have to protect the other variables
too.
The global variables are read-mostly, so I
replace the mutex with a rwlock and use it
for all of them. Unfortunately, ifnet_list
may be read from interrupt context, so I
also add a spin mutex for it; we hold both
the spin mutex and the rwlock when we
modify ifnet_list.
<...>
I generally agree with Dennis that this is not the way we want to take
in the long term. The cost of a read-write lock is very high. The plan
is to use passive serialisation to protect the interfaces and their
addresses. The ultimate goal would also be to use a better
data structure (linked lists are not really efficient) and to change the
way interfaces are referenced, i.e. instead of referencing ifnet_t,
the network stack should use a unique ID. Note that the code paths
looking up the interface or its address(es) should not block (if they
do, the code can be rearranged). Also, in the long run, ifnet list
should not be accessed from the hard interrupt context -- all users
ought to be running in the softintr(9) context.

We may need to take an intermediate solution, but I think we can
already switch to pserialize(9) + reference counting on ifnet_t for
the ip_input/ip_output() paths. I need to resume my work on the
routing subsystem patch-up, though.
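
Roughly, the fast-path lookup would then look like this (a
sketch; if_acquire_byindex()/if_release() and the if_refcnt
field are hypothetical, not existing code):

struct ifnet *
if_acquire_byindex(u_int idx)
{
        struct ifnet *ifp;
        int s;

        s = pserialize_read_enter();
        ifp = ifindex2ifnet[idx];       /* bounds check omitted */
        if (ifp != NULL)
                atomic_inc_uint(&ifp->if_refcnt);  /* hypothetical field */
        pserialize_read_exit(s);
        return ifp;
}

/* e.g. in ip_output(): */
ifp = if_acquire_byindex(idx);
if (ifp != NULL) {
        /* ... use ifp; may even block ... */
        if_release(ifp);        /* hypothetical: drop the reference */
}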
--
Mindaugas
Ryota Ozaki
2014-08-04 07:59:48 UTC
On Mon, Aug 4, 2014 at 8:21 AM, Mindaugas Rasiukevicius
Post by Mindaugas Rasiukevicius
Post by Ryota Ozaki
Hi,
This is another work toward MPSAFE networking.
sys/net/if.c contains several global variables
(e.g., ifnet_list) that should be protected
from parallel accesses somehow once we get rid
of the big kernel lock.
Currently there is a mutex for serializing
accesses to index_gen; however, that's not
enough; we have to protect the other variables
too.
The global variables are read-mostly, so I
replace the mutex with a rwlock and use it
for all of them. Unfortunately, ifnet_list
may be read from interrupt context, so I
also add a spin mutex for it; we hold both
the spin mutex and the rwlock when we
modify ifnet_list.
<...>
I generally agree with Dennis that this is not the way we want to take
in the long term. The cost of a read-write lock is very high. The plan
is to use passive serialisation to protect the interfaces and their
addresses. The ultimate goal would also be to use a better
data structure (linked lists are not really efficient) and to change the
way interfaces are referenced, i.e. instead of referencing ifnet_t,
the network stack should use a unique ID.
I have no objection to the direction. My concern is the intermediate
solution.
Post by Mindaugas Rasiukevicius
Note that the code paths
looking up the interface or its address(es) should not block (if they
do, the code can be rearranged).
Some code under sys/compat can block during the iteration,
for example linux_getifconf at [1], which may block due to copyout.

[1] http://nxr.netbsd.org/xref/src/sys/compat/linux/common/linux_socket.c#1134
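
(If such code has to be rearranged, one way I can think of is to
build the result in a kernel buffer while iterating and copyout
only after leaving the critical section; just a sketch, with the
buffer sizing and ifconf details omitted, and ifnet_lock standing
for whatever lock we end up with:)

static int
example_getifconf_snapshot(void *ubuf, size_t space)
{
        struct ifnet *ifp;
        char *kbuf;
        size_t len = 0;
        int error;

        kbuf = kmem_alloc(space, KM_SLEEP);

        rw_enter(&ifnet_lock, RW_READER);   /* or a pserialize read section */
        IFNET_FOREACH(ifp) {
                /* append ifp's entries to kbuf, advancing len; no copyout here */
        }
        rw_exit(&ifnet_lock);

        error = copyout(kbuf, ubuf, len);   /* may block; the lock is dropped */
        kmem_free(kbuf, space);
        return error;
}
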
Post by Mindaugas Rasiukevicius
Also, in the long run, ifnet list
should not be accessed from the hard interrupt context -- all users
ought to be running in the softintr(9) context.
The ifnet list is accessed in m_reclaim, which may be called from
hardware interrupt context via, say, MCL_GET.
Post by Mindaugas Rasiukevicius
We may need to take an intermediate solution, but I think we can
already switch to pserialize(9) + reference counting on ifnet_t for
the ip_input/ip_output() paths. I need to resume my work on the
routing subsystem patch-up, though.
I think we need to get rid of the blockable operations mentioned above.

Thanks,
ozaki-r
Mindaugas Rasiukevicius
2014-08-25 19:28:54 UTC
Post by Ryota Ozaki
Post by Mindaugas Rasiukevicius
I generally agree with Dennis that this is not the way we want to take
in the long term. The cost of a read-write lock is very high. The plan
is to use passive serialisation to protect the interfaces and their
addresses. The ultimate goal would also be to use a better
data structure (linked lists are not really efficient) and to change the
way interfaces are referenced, i.e. instead of referencing ifnet_t,
the network stack should use a unique ID.
I have no objection to the direction. My concern is the intermediate
solution.
Right, but I think we can still avoid a read-write lock for the intermediate
solution. ID-based interface referencing requires a greater revamp.
Post by Ryota Ozaki
Post by Mindaugas Rasiukevicius
Note that the code paths
looking up the interface or its address(es) should not block (if they
do, the code can be rearranged).
Some code under sys/compat can block during the iteration,
for example linux_getifconf at [1], which may block due to copyout.
[1]
http://nxr.netbsd.org/xref/src/sys/compat/linux/common/linux_socket.c#1134
Yes, but it is not a performance-sensitive path. You can acquire the lock
used on the pserialize(9) writer side. Basically, we need to optimise the
IP input/output paths; the rest can be heavy-locked for now.
Post by Ryota Ozaki
Post by Mindaugas Rasiukevicius
Also, in the long run, ifnet list
should not be accessed from the hard interrupt context -- all users
ought to be running in the softintr(9) context.
The ifnet list is accessed in m_reclaim, which may be called from
hardware interrupt context via, say, MCL_GET.
Yes, but my point was that we should eliminate access to the ifnet list,
as well as to other network stack structures, from the hard interrupt. This
means converting the existing code to use softintr(9). In the long term,
the drivers should always trigger softintr(9), and L2 should never be called
from the hard interrupt context. As for m_reclaim(9), we might work around
it with a test on cpu_intr_p() for now.
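
Something like this, perhaps (sketch only; the real m_reclaim()
does more than this):

static void
m_reclaim(void *arg, int flags)
{
        struct ifnet *ifp;

        /* ... existing protocol drain code ... */

        if (!cpu_intr_p()) {
                /* Only walk ifnet_list when not in hard interrupt context. */
                IFNET_FOREACH(ifp) {
                        /* ... drain ifp as before ... */
                }
        }
}
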
Post by Ryota Ozaki
Post by Mindaugas Rasiukevicius
We may need to take an intermediate solution, but I think we can
already switch to pserialize(9) + reference counting on ifnet_t for
the ip_input/ip_output() paths. I need to resume my work on the
routing subsystem patch-up, though.
I think we need to get rid of the blockable operations mentioned above.
Or just treat them as writers.
--
Mindaugas