author | Andrew Clayton <a.clayton@nginx.com> | 2022-11-18 23:53:30 +0000 |
---|---|---|
committer | Andrew Clayton <a.clayton@nginx.com> | 2023-02-17 21:24:18 +0000 |
commit | b0e2d9d0a185e4e2ff4bb87e399ad89119f76d1a (patch) | |
tree | 0031787d8821b924ba86107d15f4715616c88003 /src/nodejs/unit-http/nxt_napi.h | |
parent | d98a1b0dd7c5a4105c44fa1696d4f01b9f3e0db0 (diff) | |
Isolation: Switch to fork(2) & unshare(2) on Linux.
On GitHub, @razvanphp & @hbernaciak both reported issues running the
APCu PHP module under Unit.
When using this module, they were seeing errors like
    'apcu_fetch(): Failed to acquire read lock'
However, when running APCu under php-fpm, everything was fine.
The issue turned out to be due to our use of SYS_clone breaking the
pthreads(7) API used by APCu. Even if we had been using glibc's
clone(2) wrapper we would still have run into problems due to a known
issue there.
Essentially, the problem is that when using clone, glibc doesn't update
its TID cache, so the child ends up with the same cached TID as the
parent. That TID is used in various parts of pthreads(7), such as the
locking primitives, so when APCu grabbed a lock it ended up using the
TID of the main unit process (rather than that of the PHP application
process that was grabbing the lock).
As a result, when one of the application processes went to grab either
a read or write lock, the lock was actually attributed to the main unit
process. If one process had acquired the write lock and another then
tried to acquire a read or write lock, glibc would return EDEADLK: it
detected a deadlock, thinking the calling process already held the
write lock when in fact it didn't.
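To make the TID point concrete, here is a minimal sketch (not Unit's code; `kernel_tid` and `tid_of_forked_child` are hypothetical names) that asks the kernel directly for the thread ID in a fork(2)ed child. It does not reproduce the bug. It only shows the value a correctly refreshed cache should hold in the child: its own TID, distinct from the parent's.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The kernel's view of the calling thread's ID, bypassing any libc cache. */
static pid_t kernel_tid(void)
{
    return (pid_t) syscall(SYS_gettid);
}

/*
 * fork(2) a child, have it report its kernel TID through a pipe, and
 * return that TID to the caller (-1 on error).  The child's PID, as
 * returned by fork(2), is stored in *child_pid.
 */
static pid_t tid_of_forked_child(pid_t *child_pid)
{
    int    fds[2];
    pid_t  pid, tid = -1;

    assert(child_pid != NULL);

    if (pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: report the TID the kernel assigned to us. */
        close(fds[0]);
        tid = kernel_tid();
        if (write(fds[1], &tid, sizeof(tid)) != (ssize_t) sizeof(tid))
            _exit(1);
        _exit(0);
    }

    /* Parent: collect the child's answer, then reap the child. */
    close(fds[1]);
    if (read(fds[0], &tid, sizeof(tid)) != (ssize_t) sizeof(tid))
        tid = -1;
    close(fds[0]);
    waitpid(pid, NULL, 0);

    *child_pid = pid;
    return tid;
}
```

In a single-threaded child the reported TID equals the child's PID and differs from the parent's `kernel_tid()`. The failure described above arises when the copy of the TID that glibc keeps for pthreads(7) still holds the parent's value.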
It seems the right way to do this is via fork(2) and unshare(2). We
already use fork(2) on other platforms.
This requires a few tricks to keep the essence of the processes the same
as before, when using clone:
1) We use the prctl(2) PR_SET_CHILD_SUBREAPER option (if it's
available, since Linux 3.4) to make the main unit process inherit
prototype processes after a double fork(2), rather than them being
reparented to 'init'.
This avoids needing to ^C twice to fully exit unit when running in
the foreground. It's probably also better if they maintain their
parent-child relationship where possible.
2) We use a double fork(2) technique on the prototype processes to
ensure they themselves end up in a new PID namespace as PID 1 (when
CLONE_NEWPID is being used).
When using unshare(CLONE_NEWPID), the calling process is _not_
placed in the namespace (as discussed in pid_namespaces(7)). It
only sets things up so that subsequent children are placed in a PID
namespace.
Having the prototype processes as PID 1 in the new PID namespace is
probably a good thing and matches the behaviour of clone(2). Also,
some isolation tests break if the prototype process is not PID 1.
3) Due to the above double fork(2), the main unit process loses track
of the prototype process ID, which it needs to know.
To solve this, we employ a simple pipe(2) between the main unit and
prototype processes. The parent of the second fork(2) writes the
prototype (grandchild) PID to the pipe before exiting. This needs to
be done from the parent and not the grandchild, as the grandchild
will see itself as having a PID of 1 while the main process needs its
externally visible PID.
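The three steps above can be sketched together as follows. This is an illustrative outline under stated assumptions, not the code added by this commit: `spawn_prototype` and its `new_pid_ns` flag are hypothetical, error handling is minimal, and the grandchild just sleeps in pause(2) as a stand-in for a real prototype process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start a long-lived "prototype" via a double
 * fork(2) and return its PID as seen by the caller, or -1 on error.
 * A non-zero new_pid_ns requests unshare(CLONE_NEWPID) in the
 * intermediate process, which normally needs CAP_SYS_ADMIN.
 */
static pid_t spawn_prototype(int new_pid_ns)
{
    int      fds[2], status;
    pid_t    mid, proto = -1;
    ssize_t  n;

#ifdef PR_SET_CHILD_SUBREAPER
    /* 1) Orphaned descendants are reparented to us rather than to init.
     *    Best effort: the result is ignored for kernels older than 3.4. */
    (void) prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);
#endif

    if (pipe(fds) == -1)
        return -1;

    mid = fork();
    if (mid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (mid == 0) {
        /* Intermediate process. */
        close(fds[0]);

        /* 2) Only children created after this call enter the new PID
         *    namespace; this process itself stays outside it. */
        if (new_pid_ns && unshare(CLONE_NEWPID) == -1)
            _exit(1);

        proto = fork();
        if (proto == -1)
            _exit(1);

        if (proto == 0) {
            /* Grandchild: the prototype (PID 1 if a namespace was made). */
            close(fds[1]);
            for (;;)
                pause();
        }

        /* 3) fork(2) gave us the grandchild's PID in our own namespace,
         *    which the main process shares; pass it up, then exit. */
        n = write(fds[1], &proto, sizeof(proto));
        _exit(n == (ssize_t) sizeof(proto) ? 0 : 1);
    }

    /* Main process: read the PID, then reap the intermediate process. */
    close(fds[1]);
    n = read(fds[0], &proto, sizeof(proto));
    close(fds[0]);

    while (waitpid(mid, &status, 0) == -1 && errno == EINTR)
        ;

    return n == (ssize_t) sizeof(proto) ? proto : -1;
}
```

Calling `spawn_prototype(0)` exercises the double fork and the pipe without needing privileges. With a non-zero argument the call is expected to return -1 when unshare(2) is not permitted. A prototype running as PID 1 of its own namespace has to be stopped from outside with SIGKILL unless it installs handlers for other signals.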
Link: <https://www.php.net/manual/en/book.apcu.php>
Link: <https://sourceware.org/bugzilla/show_bug.cgi?id=21793>
Closes: <https://github.com/nginx/unit/issues/694>
Reviewed-by: Alejandro Colomar <alx@nginx.com>
Signed-off-by: Andrew Clayton <a.clayton@nginx.com>