What are the attributes of a Linux process

Processes in Linux

A program is usually understood to be a set of files, whether a set of source files, object files, or the executable file itself. Before a program can be run, the operating system must first create an environment, or execution context, for the task. This environment includes memory resources, access to input/output devices, and access to various system resources, including kernel services.

A process is only a "dead" static shell that holds accounting data and provides the environment in which a thread executes dynamically, even when that thread is the only (main) thread of the application (process), as is assumed in terminology that has nothing to do with threading concepts.
All interaction, synchronization, scheduling, and similar mechanisms make sense only when applied to threads, even when those threads live in different processes. This is where the term IPC (interprocess communication) comes from. For single-threaded applications this terminological nuance makes no difference at all, but once we move to multithreaded applications we have to reason in terms of interacting threads that live inside processes (the same one or different ones).
On systems with hardware address translation (MMU, Memory Management Unit), a process creates an additional boundary for its threads: a protected address space. Most of the difficulties that the literature associates with IPC come from the need for interacting threads to cross the address barriers that processes set up around each of them.

1.1 Process attributes

Execution priority is often listed among the attributes of a process. Strictly speaking, priority is an attribute of a thread rather than of a process, but when there is only one thread (the main one, created by the main() function), its priority is what is meant by the "priority of the process".

Lightweight Process in Linux

A lightweight process is a process that may share some resources with others. To implement a multithreaded program, one can associate a lightweight process with each thread, so that the threads share resources simply because their lightweight processes do.

Process Descriptor

A process descriptor is a data structure of type task_struct (task_t) whose fields contain all the information related to a single process. For example, it records the state of the process, its parent process, and the thread information associated with it.

Relationship between process and process descriptor

A process and its process descriptor have a one-to-one relationship.

Where process descriptors are stored and how they are referenced

Process descriptors are stored in dynamic memory in Kernel Mode and are referred to through process descriptor pointers. (Each process has a Kernel Mode stack that is separate from its User Mode stack.)
A small structure named thread_info is stored at the beginning of the Kernel Mode stack area; it contains a pointer to the process descriptor.
POSIX requires all threads of a multithreaded application to have the same PID. Linux meets this requirement with thread groups: getpid() returns the tgid field of the descriptor, which is the PID of the thread group leader.

How to identify the current process

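In Kernel Mode, since thread_info is stored at the beginning (the lowest address) of the stack area, the kernel can obtain its address directly by masking out the 13 least significant bits of the stack pointer if the thread_union structure is 8 KB, or the 12 least significant bits if it is 4 KB. The task field of thread_info then gives the descriptor of the current process.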
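The masking step can be checked with plain integer arithmetic. The sketch below is only an illustration in Python, not kernel code; the function name thread_info_base and the sample stack pointer value are made up for the example.

```python
THREAD_SIZE_8K = 8192  # thread_union of 8 KB: mask out 13 bits
THREAD_SIZE_4K = 4096  # thread_union of 4 KB: mask out 12 bits


def thread_info_base(esp: int, thread_size: int = THREAD_SIZE_8K) -> int:
    """Return the lowest address of the aligned stack area that contains esp."""
    # thread_size is a power of two, so (thread_size - 1) has exactly the
    # low 13 (or 12) bits set; clearing them rounds esp down to the base.
    return esp & ~(thread_size - 1)


esp = 0xC0153F7C  # hypothetical Kernel Mode stack pointer value
print(hex(thread_info_base(esp)))                  # base for an 8 KB area
print(hex(thread_info_base(esp, THREAD_SIZE_4K)))  # base for a 4 KB area
```

The trick works only because the stack area is aligned to its own size, so every address inside it shares the same high-order bits.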

Process List

The process list links together all existing process descriptors. It is implemented as a doubly linked list.

Each task_struct includes a tasks field of type list_head, whose prev and next fields point to the previous and next task_struct elements. To find a process, the kernel walks this linked list.

A process has a priority from 0 to 139. If runnable processes were kept in a single list sorted by priority, maintaining it would be expensive, because the kernel would have to scan the list to find the right position for each insertion.

To solve this problem, the kernel keeps a separate list of runnable processes for each priority, with one list_head per priority value; the run_list field of a descriptor links it into the list for its priority. Adding a process descriptor with priority k then means appending it to the tail of the list for priority k.
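A minimal Python model of the per-priority lists, assuming 140 priority levels as stated above. The class and method names are invented for the illustration, and the linear scan in pick_next is a simplification of however the kernel locates the first non-empty list.

```python
from collections import deque

MAX_PRIO = 140  # priorities 0..139


class PriorityLists:
    """One FIFO list of runnable processes per priority value."""

    def __init__(self):
        self.queue = [deque() for _ in range(MAX_PRIO)]

    def enqueue(self, name, prio):
        # Appending to the tail of list `prio` takes constant time,
        # no matter how many processes are already runnable.
        self.queue[prio].append(name)

    def pick_next(self):
        # A lower number means a higher priority, so take the head
        # of the first non-empty list.
        for plist in self.queue:
            if plist:
                return plist[0]
        return None


lists = PriorityLists()
lists.enqueue("editor", 120)
lists.enqueue("audio", 100)
lists.enqueue("backup", 120)
print(lists.pick_next())
```

Insertion never has to compare the new process against existing ones, which is the point of splitting the list by priority.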

Process Relationship

real_parent: points to the process descriptor of the process that created P, or to the descriptor of process 1 (init) if the parent process no longer exists.

parent: points to the current parent of P. Most of the time it is the same as real_parent; it differs when another process is tracing P.

children: the head of the list containing all children created by P.

sibling: the pointers to the next and previous elements in the doubly linked list of sibling processes.

A process can also be the leader of a process group or of a login session, and it can trace the execution of other processes. To find a descriptor quickly from such identifiers, the kernel uses hash tables.

How Processes Are Organized

Runqueue: a list of processes that are in the TASK_RUNNING state.

Wait queues: lists of processes that are in either the TASK_INTERRUPTIBLE or the TASK_UNINTERRUPTIBLE state. A wait queue represents a set of sleeping processes, which the kernel wakes up when some condition becomes true.

TASK_INTERRUPTIBLE vs TASK_UNINTERRUPTIBLE

TASK_INTERRUPTIBLE: the process is sleeping until some condition becomes true. A hardware interrupt, the release of a resource the process is waiting for, or the delivery of a signal can wake it up.

TASK_UNINTERRUPTIBLE: the process must wait until a given event occurs without being interrupted. Delivering a signal to it does not wake it up.

Wait queues are also implemented as doubly linked lists.

The head of the wait queue is wait_queue_head. It links to elements of type wait_queue_t, each of which has four fields: flags, task, func, and task_list.

The func field points to the function used to wake up the process.

The task_list field contains the pointers that link this element into the list of processes waiting for the same event.

A concurrency problem arises when all the sleeping processes linked through task_list are woken up at once and race for the same resource: only one of them can get it, and the rest must go back to sleep. To resolve this, the kernel distinguishes exclusive and nonexclusive processes.

Exclusive processes are woken up selectively by the kernel, while nonexclusive processes are always woken up when the event occurs. Processes racing for a resource that only one of them can use, as in the example above, are made exclusive.
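The difference between the two kinds of sleepers can be modeled in a few lines of Python. This is a simplified sketch, not the kernel's implementation: the Waiter class, the wake_up function, and the nr_exclusive parameter are names chosen for the illustration, and the queue is assumed to hold nonexclusive waiters ahead of exclusive ones.

```python
from dataclasses import dataclass


@dataclass
class Waiter:
    name: str
    exclusive: bool


def wake_up(wait_queue, nr_exclusive=1):
    """Wake every nonexclusive waiter and at most nr_exclusive exclusive ones.

    Woken waiters are removed from wait_queue; their names are returned.
    """
    woken, still_sleeping = [], []
    budget = nr_exclusive
    for waiter in wait_queue:
        if not waiter.exclusive:
            woken.append(waiter.name)   # always woken when the event occurs
        elif budget > 0:
            woken.append(waiter.name)   # selected exclusive waiter
            budget -= 1
        else:
            still_sleeping.append(waiter)
    wait_queue[:] = still_sleeping
    return woken


queue = [
    Waiter("logger", exclusive=False),
    Waiter("monitor", exclusive=False),
    Waiter("worker-1", exclusive=True),
    Waiter("worker-2", exclusive=True),
]
print(wake_up(queue))
print([w.name for w in queue])
```

Only one of the two workers is woken per event, so they never race for the resource and then go straight back to sleep.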

How to wake up processes

Processes wait for a specific condition

The sleep_on() function sets the state of the current process to TASK_UNINTERRUPTIBLE and inserts it into the wait queue. It then invokes the scheduler, which resumes the execution of another process. When the sleeping process is awakened, it continues executing sleep_on() from that point.

Once the process is awake, its state has been changed back to TASK_RUNNING, and sleep_on() removes its wait queue element from the list.

Sleeping on a wait queue until a condition is met means that the process does not take up CPU time while it waits.

To wake up a process, the kernel finds it in the wait queue and puts it in the TASK_RUNNING state.

Process Switch

An activity that suspends the execution of one process and resumes the execution of another is called a process switch. This is how the kernel controls the execution of processes.

This section is a bit more complicated, so I am going to use my own synthesis instead of taking notes directly from the book.

Hardware context

The hardware context is the set of data that must be loaded into the registers before the process resumes its execution on the CPU. A hardware context switch uses a far jmp instruction to select the Task State Segment (TSS) of the next process. The drawback of the far jmp approach is that the whole switch happens in a single instruction, so the kernel cannot check the validity of the values being loaded; switching step by step in software allows such checks.

Process switching occurs only in Kernel Mode. A process switch involves several main steps:

save the contents of the current process's registers on its Kernel Mode stack
change the stack pointer to the next process's stack
save the address at which the current process will later resume, and push the saved resume address of the next process onto the next process's stack
load the contents of the next process's registers
load the resume address of the next process into the instruction pointer

Task State Segment

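On the 80x86 architecture the TSS is the segment type intended for storing hardware contexts. Linux uses a single TSS for each processor instead of one for each process.

The address of the Kernel Mode stack is stored in the TSS; the CPU fetches it from there when it switches from User Mode to Kernel Mode.

The I/O permission bitmap is stored in the TSS; it is used to verify whether a User Mode process is allowed to access an I/O port.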

Performing the Process Switch

A process switch consists of two steps:

Switching the Page Global Directory to install a new address space
Switching the Kernel Mode stack and hardware context (performed by switch_to macro).

The switch_to Macro

The switch_to macro takes three parameters: prev, next, and last.

prev: the memory location containing the descriptor address of the process being replaced

next: the memory location containing the descriptor address of the new process

last: an output parameter that specifies a memory location into which the macro writes the descriptor address of the process that has just been replaced

Note: I had a very difficult time understanding what last does. It turns out, if I understand correctly, that last is just a memory location that receives the address of the process descriptor of the process that was running immediately before the current one resumed.

Actual steps to do the process switch:

Save the prev and next descriptor addresses in %eax and %edx
Save the contents of the eflags and %ebp registers on the prev Kernel Mode stack (eflags holds the processor status flags; %ebp is the address of the stack frame)
Save %esp in prev->thread.esp
Load next->thread.esp into %esp; from here on the kernel runs on the next Kernel Mode stack
Save the address of label 1 in prev->thread.eip
Push next->thread.eip, which in most cases is the address of label 1, onto the next Kernel Mode stack
Jump to __switch_to(); when __switch_to() returns, the CPU is executing the next process
When the scheduler later selects prev to run again, execution resumes at label 1, which pops %ebp and eflags off the prev Kernel Mode stack
Copy the contents of %eax to last, which then holds the descriptor address of the process that has just been replaced

The __switch_to() function

Optionally saves the FPU, MMX, and XMM registers (the registers of the mathematical coprocessor and its extensions)
Gets the index of the local CPU
Loads next_p->thread.esp0 into the esp0 field of the local CPU's TSS
Loads the Thread-Local Storage (TLS) segments used by the next process into the local CPU's Global Descriptor Table (the GDT is the table of segment descriptors)
Stores the contents of the fs and gs segmentation registers in prev_p->thread.fs and prev_p->thread.gs
Loads next_p->thread.fs and next_p->thread.gs into the fs and gs registers
Loads the debug registers
Updates the I/O port permission bitmap in the TSS if necessary
Terminates and returns the address of the previous process (prev_p)

Creating Processes

Mechanism for creating processes

Copy On Write lets the parent and the child read the same physical pages; a page is copied to a new physical page only when either process tries to write to it.

Lightweight processes allow the parent and the child to share many per-process kernel data structures, such as the paging tables.

The vfork() system call creates a process that shares the memory address space of its parent. To keep the parent from overwriting data the child needs, the parent is blocked until the child exits or executes a new program.

Difference between clone(), fork() and vfork()

clone() is both a C library wrapper function and a system call. The clone() system call only creates a new process and does not execute any function, while the wrapper accepts a function and its argument and executes that function in the new process after it is created. clone() is enough to create both processes and threads; only the flags passed to it differ.

fork() is implemented with the clone() system call, with SIGCHLD as the termination signal and all clone flags cleared. The child gets a copy of the parent's resources and has the calling process as its parent.

vfork() is basically the same as fork(), but with the CLONE_VM flag (share the memory descriptor and all page tables) and the CLONE_VFORK flag set, and with the child stack parameter equal to the parent's current stack pointer.
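From user space, the visible effect of fork() is that parent and child end up with private copies of their data, even though Copy On Write delays the actual copying. The Python sketch below shows only that visible effect (it cannot observe physical pages); the function name fork_and_modify and the use of a pipe to report the child's value are choices made for the example.

```python
import os


def fork_and_modify():
    """Fork, change a variable in the child, and compare both views."""
    data = {"value": 1}
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the write lands in the child's own copy of the page.
        os.close(read_end)
        data["value"] = 2
        os.write(write_end, str(data["value"]).encode())
        os._exit(0)  # leave immediately, without running parent cleanup
    # Parent: read what the child saw, then reap the child.
    os.close(write_end)
    child_value = int(os.read(read_end, 16).decode())
    os.close(read_end)
    os.waitpid(pid, 0)
    return data["value"], child_value


parent_value, child_value = fork_and_modify()
print("parent sees", parent_value, "child saw", child_value)
```

With CLONE_VM, as in vfork() or threads, the write would instead be visible to both sides, because the address space itself is shared.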

Kernel Threads

Kernel threads run only in Kernel Mode. Functions that only run in Kernel Mode will be delegated to kernel threads

Process 0

The ancestor of all processes, called process 0, is a kernel thread created during the Linux initialization phase. The scheduler selects process 0 only when no other process is in the TASK_RUNNING state.

There is a process 0 for each CPU. At power-on the BIOS starts a single CPU; process 0 on that CPU initializes the kernel data structures and then enables the other CPUs.

Process 1

Process 1 is a kernel thread created by process 0. It executes the init() function, which invokes execve() to load the executable program init. The init process then creates and monitors the activity of the other processes.

Destroying Processes

Process Termination

exit_group(): terminates the whole multithreaded application (the thread group). This is the system call invoked by the exit() library function.

_exit(): terminates a single process, regardless of the other processes in its thread group. This is the system call invoked by, for example, pthread_exit().

do_group_exit(): kills the other processes in the thread group and then invokes do_exit() to kill the current process. This is the function that implements exit_group().

do_exit(): removes most references to the terminating process from kernel data structures. It implements _exit().

Process Removal

A Unix kernel is not allowed to discard the data in a process descriptor right after the process terminates. Although the process is technically dead (the EXIT_ZOMBIE state), its descriptor must be kept until the parent process has been notified. Memory reclaiming is done by the scheduler.

4. Process attributes in Unix.

Interactive processes have exclusive ownership of the terminal: until such a process finishes, the user cannot work with other applications (unless the interactive process itself provides a way to launch other processes).

Attributes allow the OS to manage the execution of a process efficiently.

Viewing process attributes: ps -ef.

Process identifier

Process ID (PID): every process has a unique identifier that lets the kernel tell processes apart. When a new process is created, the kernel assigns it the next free identifier. PIDs are assigned in increasing order, so the PID of a new process is greater than the PID of the process created before it. Once the PID reaches the maximum value, the next process gets the lowest free PID and the cycle repeats. When a process exits, the kernel releases its PID.
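The allocation rule described above can be sketched as a small function. This is a toy model of the rule as stated in the text, not the kernel's allocator; the function name next_pid, the pid_max default of 32768, and the lower bound min_pid are assumptions made for the illustration.

```python
def next_pid(last_pid, in_use, pid_max=32768, min_pid=1):
    """Return the next free PID after last_pid, wrapping around at pid_max."""
    candidate = last_pid + 1
    for _ in range(pid_max - min_pid):
        if candidate >= pid_max:
            candidate = min_pid      # maximum reached: start over from the bottom
        if candidate not in in_use:
            return candidate
        candidate += 1               # this PID is still held by a live process
    raise RuntimeError("no free PID")


print(next_pid(5, in_use={1, 2, 6}))              # 6 is taken, so the next free one
print(next_pid(9, in_use={1, 2}, pid_max=10))     # wraps around and skips 1 and 2
```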

Parent process

Parent Process ID (PPID): the PID of the process that created this one.

Process priority

Process priority (nice number): the relative priority of the process, which the scheduler takes into account when deciding the order in which processes run. The lower the number, the higher the priority (the "nicer" a process is, the less it loads the CPU). The actual allocation of resources is governed by the execution priority, which the kernel changes dynamically while the process runs. The relative priority stays constant, but the administrator or the user can change it with nice.

Terminal line

Terminal line (TTY): the terminal or pseudo-terminal associated with the process. Note: daemons have no associated terminal.

User identifiers

Real (RID) and effective (EUID) user identifiers. RID is the identifier of the user who started the process. EUID determines the access rights of the process to system resources (first of all, to the file system). Usually RID = EUID, that is, the process has the same rights as the user who started it. RID != EUID when the SUID bit is set on the program. In that case EUID is the UID of the owner of the executable file, so the process gets the owner's rights (for example, those of the administrator).

Group identifiers

Real (RGID) and effective (EGID) group identifiers. RGID is the GID of the primary group of the user who started the process. EGID determines access rights by the group access class. By default RGID = EGID, except when the SGID bit is set on the command; then EGID is the GID of the group that owns the command.
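A process can query most of these attributes about itself. The sketch below uses functions from Python's standard os module; the helper name process_ids and the dictionary keys are chosen for the example.

```python
import os


def process_ids():
    """Collect the identifiers of the calling process."""
    return {
        "pid": os.getpid(),     # process ID
        "ppid": os.getppid(),   # parent process ID
        "ruid": os.getuid(),    # real user ID
        "euid": os.geteuid(),   # effective user ID
        "rgid": os.getgid(),    # real group ID
        "egid": os.getegid(),   # effective group ID
    }


for name, value in process_ids().items():
    print(f"{name}={value}")
```

For an ordinary program the real and effective IDs printed here are equal; they differ when the executable has the SUID or SGID bit set.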

35. Processes in Unix (Linux) systems. Process attributes. Process life cycle.

The process descriptor contains the information about a process that the kernel needs throughout the whole life cycle of the process, regardless of whether the process is active or passive and whether its image is in main memory or swapped out to disk.

The process context contains the less time-critical but bulkier part of the information about a process, which is needed to resume the process from the point where it was interrupted: the contents of the processor registers, the error codes of the system calls being executed, information about all files opened by the process and about its pending I/O operations, and other data describing the state of the computing environment at the moment of interruption.

Descriptor information

process status word;

length of the time slice allotted by the system scheduler;

degree of CPU usage;

identifier of the user who owns the process;

effective user identifier (on whose behalf the process runs);

real and effective group identifiers;

process identifier and parent process identifier;

size of the image placed in the swap area;

size of the code and data segments.

Process attributes

PID: a unique identification number. Like a social security card number, the actual value of a PID does not matter much.

PPID: the identifier of the parent process. In UNIX a new process is created by cloning an existing process, after which the text of the clone is replaced with the text of the program the process is meant to run.

UID: the identification number of the user who created the process.

EUID: the "effective" UID of the process. The EUID is used to determine which resources and files the process has the right to access. For most processes the UID and the EUID are the same. The exception is programs that have the set-user-ID bit set.

GID: the group identification number of the process.

EGID: the "effective" GID of the process. It relates to the GID the same way the EUID relates to the UID.

Priority: the priority of a process determines how much CPU time it gets. When choosing a process to run, the kernel picks the process with the highest "internal priority".

Nice value: a number that indicates how willing the process is to yield to others.

Process life cycle

To create a new process, an existing process copies itself with the fork system call. fork creates a copy of the original process that is identical to the parent except for the following:

the new process has its own PID;

the PPID of the new process is the PID of the parent;

the accounting information of the new process is reset to zero;

the new process has its own copy of the file descriptors.

Processes that run different programs come about through the "exec family" of functions in the standard Unix library: execl, execlp, execle, execv, execve, execvp. These functions differ in calling format but ultimately do the same thing: they replace the code executing inside the current process with the code contained in the specified file.

When a process finishes, it calls the _exit routine to notify the kernel that it is ready to "die". The routine takes an exit code as its parameter, an integer that indicates why the process is terminating. By convention, an exit code of zero means the process was "successful".

The parent process needs the exit code, so the kernel must keep it until the parent requests it with the wait system call. When a process terminates, its address space is released and it is given no more CPU time, but its entry in the kernel process table is kept. A process in this state is called a "zombie".
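The fork, _exit, and wait steps of the life cycle can be seen together in a short Python sketch built on the standard os module. The function name run_child is made up for the example, and the child exits immediately instead of calling one of the exec functions.

```python
import os


def run_child(exit_code):
    """Fork a child that exits with exit_code and return the code the parent collects."""
    pid = os.fork()
    if pid == 0:
        # Child: a real program would typically call an exec function here.
        os._exit(exit_code)
    # Parent: between the child's exit and this call the child is a zombie;
    # waitpid() collects its exit status and lets the kernel drop the entry.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None  # the child was terminated by a signal instead


print(run_child(0))  # exit code of a "successful" child
print(run_child(3))  # a nonzero code signals a failure of some kind
```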

Signals (why they are needed)

stopping a program because of an erroneous action, an invalid input parameter, or because the program is no longer needed;

temporarily suspending a process;

delivering information to a system program, for example about a change to an important file;

passing a message between two processes, where one tells the other that it has finished a particular task.

Reaction to a signal

ignore the signal (everything continues as if nothing had happened);

catch the signal (the process runs a signal handler, which determines all subsequent actions; the process itself keeps running);

handle the signal according to the default procedures (most often the process has to stop working; in most cases this means immediate termination, but the STOP, TSTP, TTIN, and TTOU signals only suspend it, and it can be resumed with the CONT signal).
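The first two reactions can be tried out with Python's standard signal module. In this sketch the process sends SIGUSR1 to itself, first with a handler installed and then with the signal ignored; the function name demo_reactions and the choice of SIGUSR1 are made for the example, and the code is assumed to run in the main thread.

```python
import os
import signal


def demo_reactions():
    """Send SIGUSR1 to ourselves twice: once caught, once ignored."""
    received = []

    def on_usr1(signum, frame):
        # Handler: record the signal; the process keeps running afterwards.
        received.append(signum)

    previous = signal.signal(signal.SIGUSR1, on_usr1)  # catch the signal
    os.kill(os.getpid(), signal.SIGUSR1)

    signal.signal(signal.SIGUSR1, signal.SIG_IGN)      # ignore the signal
    os.kill(os.getpid(), signal.SIGUSR1)

    signal.signal(signal.SIGUSR1, previous)            # restore the old reaction
    return received


print(demo_reactions())
```

Only the first signal is recorded. With the default reaction left in place, SIGUSR1 would terminate the process, which is why the example never sends it in that state.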

Example of a signal: HUP (hangup). The program that is started as soon as the user logs in receives this signal when the user logs out or the connection to the terminal is lost. That program sends HUP to all the processes under its control, and they then shut down.

Getting the list of processes: ps aux

PID (Process ID): the process ID, or process number, the internal identifier of the running program.

TTY: the controlling terminal of the process. Only the final part of the device file name is shown, without the leading /dev/tty.

STAT: the status (state) of the process:

R: the process is running;

S: the process is waiting (sleeping);

D: uninterruptible wait;

T: the process has been stopped by a signal;

W: the process is temporarily swapped out of memory entirely;

N: a process with lowered priority.

TIME: the CPU time used so far. It does not include periods spent waiting for input or the time the scheduler gave to other processes.

The nice command (setting process priority)

nice [-n priority] [-priority] command [argument]

The nice command increases the nice value by 10 (or by another specified amount) and runs the command, passing it the given argument.

Example. In BSD Unix the priority value is an integer from -20 to +20: -20 corresponds to the highest-priority processes and +20 to the lowest-priority ones. The default priority (when nice is not used) is 0.
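The same adjustment can be made from inside a program. The sketch below uses os.nice() from Python's standard library in a forked child, so that the calling process keeps its own priority; the function name nice_in_child is made up, and the upper limit of 19 mentioned in the comment is the Linux limit, which is narrower than the BSD range quoted above.

```python
import os


def nice_in_child(increment=10):
    """Raise the nice value in a child process and report (before, after)."""
    before = os.nice(0)  # an increment of 0 just returns the current value
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_end)
        # os.nice() returns the new nice value; on Linux it is capped at 19.
        os.write(write_end, str(os.nice(increment)).encode())
        os._exit(0)
    os.close(write_end)
    after = int(os.read(read_end, 16).decode())
    os.close(read_end)
    os.waitpid(pid, 0)
    return before, after


before, after = nice_in_child(10)
print("parent nice value:", before, "child nice value:", after)
```

An unprivileged process can only increase its nice value, which is why the change is confined to a child here.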