summaryrefslogtreecommitdiffhomepage
path: root/src/nxt_process.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2022-11-18Isolation: Switch to fork(2) & unshare(2) on Linux.Andrew Clayton1-9/+247
On GitHub, @razvanphp & @hbernaciak both reported issues running the APCu PHP module under Unit. When using this module they were seeing errors like 'apcu_fetch(): Failed to acquire read lock' However when running APCu under php-fpm, everything was fine. The issue turned out to be due to our use of SYS_clone breaking the pthreads(7) API used by APCu. Even if we had been using glibc's clone(2) wrapper we would still have run into problems due to a known issue there. Essentially the problem is when using clone, glibc doesn't update the TID cache, so the child ends up having the same TID as the parent and that is used in various parts of pthreads(7) such as in the various locking primitives, so when APCu was grabbing a lock it ended up using the TID of the main unit process (rather than that of the php application processes that was grabbing the lock). So due to the above what was happening was when one of the application processes went to grab either a read or write lock, the lock was actually being attributed to the main unit process. If a process had acquired the write lock, then if a process tried to acquire a read or write lock then glibc would return EDEADLK due to detecting a deadlock situation due to thinking the process already held the write lock when in fact it didn't. It seems the right way to do this is via fork(2) and unshare(2). We already use fork(2) on other platforms. This requires a few tricks to keep the essence of the processes the same as before when using clone 1) We use the prctl(2) PR_SET_CHILD_SUBREAPER option (if its available, since Linux 3.4) to make the main unit process inherit prototype processes after a double fork(2), rather than them being reparented to 'init'. This avoids needing to ^C twice to fully exit unit when running in the foreground. It's probably also better if they maintain their parent child relationship where possible. 2) We use a double fork(2) technique on the prototype processes to ensure they themselves end up in a new PID namespace as PID 1 (when CLONE_NEWPID is being used). When using unshare(CLONE_NEWPID), the calling process is _not_ placed in the namespace (as discussed in pid_namespaces(7)). It only sets things up so that subsequent children are placed in a PID namespace. Having the prototype processes as PID 1 in the new PID namespace is probably a good thing and matches the behaviour of clone(2). Also, some isolation tests break if the prototype process is not PID 1. 3) Due to the above double fork(2) the main unit process looses track of the prototype process ID, which it needs to know. To solve this, we employ a simple pipe(2) between the main unit and prototype processes and pass the prototype grandchild PID from the parent of the second fork(2) before exiting. This needs to be done from the parent and not the grandchild, as the grandchild will see itself having a PID of 1 while the main process needs its externally visible PID. Link: <https://www.php.net/manual/en/book.apcu.php> Link: <https://sourceware.org/bugzilla/show_bug.cgi?id=21793> Closes: <https://github.com/nginx/unit/issues/694> Reviewed-by: Alejandro Colomar <alx@nginx.com> Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
2022-11-18Isolation: Rename NXT_HAVE_CLONE -> NXT_HAVE_LINUX_NS.Andrew Clayton1-5/+5
Due to the need to replace our use of clone/__NR_clone on Linux with fork(2)/unshare(2) for enabling Linux namespaces(7) to keep the pthreads(7) API working. Let's rename NXT_HAVE_CLONE to NXT_HAVE_LINUX_NS, i.e name it after the feature, not how it's implemented, then in future if we change how we do namespaces again we don't have to rename this. Reviewed-by: Alejandro Colomar <alx@nginx.com> Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
2022-12-10Isolation: wired up per-application cgroup support internally.Andrew Clayton1-0/+12
This commit hooks into the cgroup infrastructure added in the previous commit to create per-application cgroups. It does this by adding each "prototype process" into its own cgroup, then each child process inherits its parents cgroup. If we fail to create a cgroup we simply fail the process. This behaviour may get enhanced in the future. This won't actually do anything yet. Subsequent commits will hook this up to the build and config systems. Reviewed-by: Alejandro Colomar <alx@nginx.com> Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
2022-08-11Fixing isolated process PID manipulation.Max Romanov1-3/+31
Registering an isolated PID in the global PID hash is wrong because it can be duplicated. Isolated processes are stored only in the children list until the response for the WHOAMI message is processed and the global PID is discovered. To remove isolated siblings, a pointer to the children list is introduced in the nxt_process_init_t struct. This closes #633 issue on GitHub.
2022-07-18Removed code used when NXT_HAVE_POSIX_SPAWN is false.Alejandro Colomar1-48/+0
posix_spawn(3POSIX) was introduced by POSIX.1d (IEEE Std 1003.1d-1999), and was later consolidated in POSIX.1-2001, requiring it in all POSIX-compliant systems. It's safe to assume it's always available, more than 20 years after its standardization. Link: <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/spawn.h.html>
2021-11-24Fixing alerts on router restart.Max Romanov1-3/+12
Splitting the process type connectivity matrix to 'keep ports' and 'send ports'; the 'keep ports' matrix is used to clean up unnecessary ports after forking a new process, and the 'send ports' matrix determines which process types expect to get created process ports. Unfortunately, the original single connectivity matrix no longer works because of an application stop delay caused by prototypes. Existing applications should not get the new router port at the moment.
2021-11-24Fixing alerts on router restart.Max Romanov1-3/+12
Splitting the process type connectivity matrix to 'keep ports' and 'send ports'; the 'keep ports' matrix is used to clean up unnecessary ports after forking a new process, and the 'send ports' matrix determines which process types expect to get created process ports. Unfortunately, the original single connectivity matrix no longer works because of an application stop delay caused by prototypes. Existing applications should not get the new router port at the moment.
2021-11-09Introducing application prototype processes.Tiago Natel de Moura1-62/+210
2021-11-09Changed nxt_process_* for reuse.Tiago Natel de Moura1-12/+148
This enables the reuse of process creation functions.
2020-10-28Preserving the app port write socket.Max Romanov1-2/+3
The socket is required for intercontextual communication in multithreaded apps.
2020-08-20Moved isolation related code to "nxt_isolation.c".Tiago Natel de Moura1-382/+0
2020-08-11Process structures refactoring in runtime and libunit.Max Romanov1-1/+0
Generic process-to-process shared memory exchange is no more required. Here, it is transformed into a router-to-application pattern. The outgoing shared memory segments collection is now the property of the application structure. The applications connect to the router only, and the process only needs to group the ports.
2020-08-11Changing router to application port exchange protocol.Max Romanov1-37/+0
The application process needs to request the port from the router instead of the latter pushing the port before sending a request to the application. This is required to simplify the communication between the router and the application and to prepare the router to use the application shared port and then the queue.
2020-08-11Libunit refactoring: port management.Max Romanov1-1/+1
- Changed the port management callbacks to notifications, which e. g. avoids the need to call the libunit function - Added context and library instance reference counts for a safer resource release - Added the router main port initialization
2020-06-23Isolation: fixed build when features aren't detected.Tiago Natel de Moura1-12/+4
2020-05-28Added "rootfs" feature.Tiago Natel de Moura1-0/+358
2020-03-09Refactor of process management.Tiago Natel de Moura1-160/+374
The process abstraction has changed to: setup(task, process) start(task, process_data) prefork(task, process, mp) The prefork() occurs in the main process right before fork. The file src/nxt_main_process.c is completely free of process specific logic. The creation of a process now supports a PROCESS_CREATED state. The The setup() function of each process can set its state to either created or ready. If created, a MSG_PROCESS_CREATED is sent to main process, where external setup can be done (required for rootfs under container). The core processes (discovery, controller and router) doesn't need external setup, then they all proceeds to their start() function straight away. In the case of applications, the load of the module happens at the process setup() time and The module's init() function has changed to be the start() of the process. The module API has changed to: setup(task, process, conf) start(task, data) As a direct benefit of the PROCESS_CREATED message, the clone(2) of processes using pid namespaces now doesn't need to create a pipe to make the child block until parent setup uid/gid mappings nor it needs to receive the child pid.
2020-04-10Resolving a racing condition while adding ports on the app's side.Max Romanov1-5/+12
An earlier attempt (ad6265786871) to resolve this condition on the router's side added a new issue: the app could get a request before acquiring a port.
2020-04-06Fixing 'find & add' racing condition in connected ports hash.Max Romanov1-14/+6
Missing error log messages added.
2019-12-06Isolation: allowed the use of credentials with unpriv userns.Tiago Natel1-6/+26
The setuid/setgid syscalls requires root capabilities but if the kernel supports unprivileged user namespace then the child process has the full set of capabilities in the new namespace, then we can allow setting "user" and "group" in such cases (this is a common security use case). Tests were added to ensure user gets meaningful error messages for uid/gid mapping misconfigurations.
2019-12-06Moved credential-related code to nxt_credential.c.Tiago Natel1-334/+1
This is required to avoid include cycles, as some nxt_clone_* functions depend on the credential structures, but nxt_process depends on clone structures.
2019-11-26Refactor of process init.Tiago Natel1-12/+12
Introduces the functions nxt_process_init_create() and nxt_process_init_creds_set().
2019-11-26Changed the group listing to run unprivileged when possible.Tiago Natel1-30/+104
Now the nxt_user_groups_get() function uses getgrouplist(3) when available (except MacOS, see below). For some platforms, getgrouplist() supports a method of probing how much groups the user has but the behavior is not consistent. The method used here consists of optimistically trying to get up to min(256, NGROUPS_MAX) groups; only if ngroups returned exceeds the original value, we do a second call. This method can block main's process if LDAP/NDIS+ is in use. MacOS has getgrouplist(3) but it's buggy. It doesn't update ngroups if the value passed is smaller than the number of groups the user has. Some projects (like Go stdlib) call getgrouplist() in a loop, increasing ngroups until it exceeds the number of groups user belongs to or fail when a limit is reached. For performance reasons, this is to be avoided and MacOS is handled in the fallback implementation. The fallback implementation is the old Unit approach. It saves main's user groups (getgroups(2)) and then calls initgroups(3) to load application's groups in main, then does a second getgroups(2) to store the gids and restore main's groups in the end. Because of initgroups(3)' call to setgroups(2), this method requires root capabilities. In the case of OSX, which has small NGROUPS_MAX by default (16), it's not possible to restore main's groups if it's large; if so, this method fallbacks again: user_cred gids aren't stored, and the worker process calls initgroups() itself and may block for some time if LDAP/NDIS+ is in use.
2019-10-29Process port refactoring.Hong Zhi Dao1-0/+11
- Introduced nxt_runtime_process_port_create(). - Moved nxt_process_use() into nxt_process.c from nxt_runtime.c. - Renamed nxt_runtime_process_remove_pid() as nxt_runtime_process_remove(). - Some public functions transformed to static. This closes #327 issue on GitHub.
2019-10-28Added clone syscall check for uid/gid mapping.Tiago Natel1-1/+1
Now it's possible to pass -DNXT_HAVE_CLONE=0 for debugging.
2019-10-22Improved error logging when uid/gid map is not properly set.Tiago Natel1-2/+30
When using "credential: true", the new namespace starts with a completely empty uid and gid ranges. Then, any setuid/setgid/setgroups calls using ids not properly mapped with uidmap and gidmap fields return EINVAL, meaning the id is not valid inside the new namespace.
2019-09-26Refactored nxt_process_create() for more explicit pipe closing.Valentin Bartenev1-40/+29
2019-09-26Fixed descriptors leak on process creation.Valentin Bartenev1-0/+12
The leak has been introduced in 325b315e48c4. This closes #322 issue in GitHub.
2019-09-19Initial applications isolation support using Linux namespaces.Tiago de Bem Natel de Moura1-62/+189
2019-03-22Ignoring EPERM error when changing application process uid/gid.Max Romanov1-16/+33
This closes #228 issue on GitHub.
2018-09-19Initializing user_cred gids and ngroups for MacOS.Max Romanov1-0/+4
2018-09-07Misspelled variable names fixed.Max Romanov1-3/+3
2018-06-18Removing Unix control socket on start failure.Igor Sysoev1-3/+1
The bug had appeared in 5cc5002a788e when process type has been converted to bitmask. This commit reverts the type back to a number. This commit is related to #131 issue on GitHub.
2018-06-18Removed unused single process type.Igor Sysoev1-12/+10
2018-03-05Reduced number of critical log levels.Valentin Bartenev1-37/+26
2018-01-24Fixed formatting in nxt_sprintf() and logging.Sergey Kandaurov1-1/+1
2017-11-20Fixing Coverity warnings.Max Romanov1-0/+6
CID 200496 CID 200494 CID 200490 CID 200489 CID 200483 CID 200482 CID 200472 CID 200465
2017-10-19Filtering process to keep connection.Max Romanov1-6/+43
- Main process should be connected to all other processes. - Controller should be connected to Router. - Router should be connected to Controller and all Workers. - Workers should be connected to Router worker thread ports only. This filtering helps to avoid unnecessary communication and various errors during massive application workers stop / restart.
2017-10-19Supporting concurrent shared memory fd receive in router.Max Romanov1-2/+2
Two different router threads may send different requests to single application worker. In this case shared memory fds from worker to router will be send over 2 different router ports. These fds will be received and processed by different threads in any order. This patch made possible to add incoming shared memory segments in arbitrary order. Additionally, array and memory pool are no longer used to store segments because of pool's single threaded nature. Custom array-like structure nxt_port_mmaps_t introduced.
2017-10-04Introducing process use counter.Max Romanov1-16/+19
This helps to decouple process removal from port memory pool cleanups.
2017-10-04Introducing use counters for port and app. Thread safe port write.Max Romanov1-7/+5
Use counter helps to simplify logic around port and application free. Port 'post' function introduced to simplify post execution of particular function to original port engine's thread. Write message queue is protected by mutex which makes port write operation thread safe.
2017-10-04Removing mem_pool from port_hash interface.Max Romanov1-13/+2
Memory pool is not used by port_hash and it was a mistake to pass it into 'add' and 'remove' functions. port_hash enrties are allocated from heap.
2017-09-15Introducing named port message handlers to avoid misprints.Max Romanov1-1/+1
2017-09-06Style fixes.Igor Sysoev1-0/+1
2017-08-31nginext has been renamed to unit.Igor Sysoev1-1/+1
2017-08-29The master process has been renamed to the main process.Igor Sysoev1-9/+9
2017-08-26Added configure option --user=USER and --group=GROUP.Igor Sysoev1-5/+25
2017-08-02Runtime processes protected with mutex.Max Romanov1-1/+1
2017-07-18Work queue thread assertions. Reset thread after fork.Max Romanov1-0/+2
2017-07-18Mem pool cleanup introduced.Max Romanov1-1/+21
Used for connection mem pool cleanup, which can be used by buffers. Used for port mem pool to safely destroy linked process.