This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 4/4] Mark nscd service as forking in systemd service file (#16639)
- From: Russ Allbery <eagle at eyrie dot org>
- To: "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: Rich Felker <dalias at aerifal dot cx>, Siddhesh Poyarekar <siddhesh at redhat dot com>, <libc-alpha at sourceware dot org>
- Date: Wed, 26 Feb 2014 17:42:46 -0800
"Joseph S. Myers" <joseph@codesourcery.com> writes:
> And for a build in the glibc context you'd want to use dlopen to avoid
> circular dependencies (dependencies of code built with glibc on any
> other library that needs glibc to build are best avoided where
> possible), complicating things further.
Or embed the equivalent code. It's fairly straightforward and only
differs in some details from what Rich proposes (primarily to allow
notifications to be multiplexed across one listening socket and to support
the other features of the notification protocol, which are not in play
here).
I think the question here is how much effort you want to put into
detecting nscd failures and converting them into service activation
failures, and what types of failures you want to detect that way.
In general, if a daemon starts and then dies, the command to start it
often doesn't know about the failure and still returns success (often
before the daemon dies). This has been true as long as there have been
init systems, and it's an unsolvable problem in general since the daemon
could die at any point and the start command can't wait forever for
failures. You already have to handle those failures some other way,
either by alerting someone or by attempting to restart the process or
both.
The point of a notification protocol is to decide when the service is
sufficiently up to allow other services that depend on it to be started.
This is a complex question with no clear solution that works for everyone;
to some extent it comes down to local policy. Some people want everything
to start as fast as possible provided that no queries to
correctly-configured daemons will be lost. Other people want each service
to be fully verified to be running before any services that depend on it
have started. Yet other groups may actually want to stop downstream
services if an upstream service fails unexpectedly. Some number of those
failures won't be caught by the startup command because they happen too
late. The only thing that one can do in practice is move around where
"too late" is based on what one thinks the common case is. (This is true
regardless of init system; all notification protocols have the same basic
set of tradeoffs.)
Anyway, in practice, there are five notification methods you can use that
work with current init systems:
1. None. Treat the service as ready as soon as the process starts. With
socket activation, this satisfies the requirement that no requests to
correctly-configured services will be lost, but it means that you will
not detect runtime misconfiguration at the time of service start and
will need to catch that some other way (such as, for example, asking
systemd what services have failed).
All widely-used init systems except traditional init scripts support
this method.
2. Exit of the parent process. This requires a forking service model,
which has various drawbacks and which essentially all init systems
written after the classic shell script init system have tried to move
away from. This allows you to detect all errors that can be detected
before the fork, but requires some sort of internal IPC mechanism to
tell the parent process when to exit if you want to detect errors that
happen after the fork. This is the most common historical method, but
it's usually incorrectly implemented because getting the details right
is hard.
All widely-used init systems support this method.
3. Writing of the PID file. The service is considered started when the
PID file is created. This has various problems with stale PID files,
locking concerns when two copies of the daemon are started at the same
time with the same PID file, and so forth, but is often easier to get
right than coordinating the parent process exit.
I'm not sure any init system actually supports this. Debian's
start-stop-daemon wrapper used with traditional init does not; it still
uses exit of the parent process. systemd can read the PID file but
doesn't appear to take it into account for startup notification.
However, in theory, it would be possible, and it may be that the
traditional init libraries on platforms I'm less familiar with than
Debian do use this method.
4. sd_notify, which uses a UNIX domain datagram socket (either a
filesystem path or a Linux abstract-namespace socket) to tell
the init system when the daemon is actually ready. This is the
easiest to get right since the daemon has complete control over the
notification timing without having to do things like coordinate process
exit. However, the notification protocol is the most complex of the
five options.
Of the widely-used init systems, only systemd supports this method.
5. Raising SIGSTOP when the process is ready. This makes it as easy as
sd_notify to get the timing right, for the same reasons, but uses
(or abuses, depending on how you feel about it) SIGSTOP for something
other than its documented purpose and requires the init system to send
SIGCONT afterwards; otherwise the daemon stays stopped and the results
are very confusing.
Of the widely-used init systems, only upstart supports this method.
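
For illustration only, the SIGSTOP handshake described in method 5 can be
sketched in Python (chosen for brevity; nscd itself is C). The names
signal_ready and wait_until_ready are hypothetical, and the supervisor side
is an assumption about how an init system could implement the handshake,
not a description of upstart's code.

```python
import os
import signal


def signal_ready() -> None:
    """Daemon side: stop ourselves to tell the supervisor we are ready."""
    os.kill(os.getpid(), signal.SIGSTOP)


def wait_until_ready(pid: int) -> bool:
    """Supervisor side: True once the child stops itself, False if it died.

    WUNTRACED makes waitpid report a stopped child, not only an exited one.
    """
    _, status = os.waitpid(pid, os.WUNTRACED)
    if os.WIFSTOPPED(status) and os.WSTOPSIG(status) == signal.SIGSTOP:
        # Without SIGCONT the daemon stays frozen indefinitely.
        os.kill(pid, signal.SIGCONT)
        return True
    # The child exited or was killed before it ever signalled readiness.
    return False
```

The daemon would call signal_ready() once initialization is complete; a
child that exits first is reported as a startup failure.
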
Basically, pick your poison. They all have advantages and disadvantages.
But since it's an IPC protocol, you have to pick some method that the init
system you're targeting actually supports, or there's no point.
The suggestion in the bug was to switch nscd from type 1 to type 2
notifications. That may or may not be the right thing to do. It depends
on what errors you want to catch during startup, whether catching those
errors is worth the additional complexity of a forking service, etc. It's
hard to make a general decision for everyone.
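
As a rough sketch of what type 2 involves once errors after the fork have
to be reported, the following Python uses a pipe as the internal IPC
mechanism. The names start_daemon, initialize, and serve are hypothetical,
and the usual daemonization steps (setsid, redirecting standard
descriptors, and so on) are deliberately omitted.

```python
import os


def start_daemon(initialize, serve) -> int:
    """Fork a child and return, in the parent, 0 if its setup succeeded.

    The parent's return value is meant to become its exit status, which is
    what the init system observes under the forking model.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: run setup, report success over the pipe, then serve.
        os.close(read_fd)
        code = 1
        try:
            initialize()
            os.write(write_fd, b"0")
            os.close(write_fd)
            serve()
            code = 0
        finally:
            # Never fall back into the caller's code in the child.
            os._exit(code)
    # Parent: block until the child reports success or the pipe hits EOF
    # because the child died during setup.
    os.close(write_fd)
    status = os.read(read_fd, 1)
    os.close(read_fd)
    return 0 if status == b"0" else 1
```

A failure inside initialize() closes the pipe without a success byte, so
the parent returns 1 and the start command can fail. Anything that goes
wrong after the success byte is written is still "too late" to be caught
this way.
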
In a systemd world, from the system administrator perspective, supporting
type 4 notifications is a clear win, since then they can use either
Type=notify or Type=simple based on their local requirements and both work
as expected. But the significant drawback from a glibc perspective is
that the daemon side of the sd_notify protocol is, while relatively
straightforward, not trivial.
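
A minimal sketch of the daemon side of the readiness notification, in
Python for brevity rather than the C that nscd would need. It relies on
the NOTIFY_SOCKET environment variable and the READY=1 message documented
in sd_notify(3); the function name notify_ready is hypothetical, and the
other features of the protocol are left out.

```python
import os
import socket


def notify_ready(state: bytes = b"READY=1") -> bool:
    """Send a state datagram to the socket named in NOTIFY_SOCKET.

    Returns False when no notification socket is configured, so the same
    daemon still works under Type=simple or outside systemd.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket, which is
    # addressed with a leading NUL byte instead.
    address = "\0" + path[1:] if path.startswith("@") else path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state, address)
    return True
```

Calling notify_ready() after initialization is harmless when the variable
is unset, which is what lets an administrator choose between Type=notify
and Type=simple without changing the daemon.
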
--
Russ Allbery (eagle@eyrie.org) <http://www.eyrie.org/~eagle/>