thread

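What is a thread, and how does it relate to a process?

A thread is the smallest unit of work that runs independently inside a program. A single process (a running program) can contain several threads, much as several employees in one office each handle a different task at the same time.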

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. In many cases, a thread is a component of a process.

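When threads of the same process are said to share resources, which data exactly is shared, and which data is private to each thread?

Threads of the same process share memory, executable code, global variables, and so on, while each thread keeps its own stack, registers, and thread-local storage. It is like employees in one office sharing the printer and the filing cabinet while each keeps a separate desk and notepad.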

The multiple threads of a given process may be executed concurrently (via multithreading capabilities), sharing resources such as memory, while different processes do not share these resources. In particular, the threads of a process share its executable code and the values of its dynamically allocated variables and non-thread-local global variables at any given time.
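The split between shared and per-thread data can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `shared_log`, `per_thread`, `worker`, and `run_demo` are hypothetical, and `threading.local()` stands in for the thread-local storage the excerpt mentions.

```python
import threading

shared_log = []                 # global list on the heap: visible to every thread
log_lock = threading.Lock()     # guards appends to shared_log
per_thread = threading.local()  # each thread sees its own attributes on this object


def worker(name):
    per_thread.name = name      # stored per thread; other threads cannot see it
    with log_lock:
        shared_log.append(per_thread.name)


def run_demo():
    shared_log.clear()
    threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The calling thread never set per_thread.name, so the attribute is absent here.
    return sorted(shared_log), hasattr(per_thread, "name")
```

Every worker writes into the same list, but the value each one stored on `per_thread` never becomes visible to the thread that called `run_demo()`.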

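What is a user thread, and how does it differ from a kernel thread?

A kernel thread is managed and scheduled directly by the operating system, whereas a user thread is managed by the program itself (its runtime). The difference resembles that between regular employees assigned by the company's HR department and roles that a team divides up among its own members.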

At the kernel level, a process contains one or more kernel threads, which share the process's resources, such as memory and file handles – a process is a unit of resources, while a thread is a unit of scheduling and execution. Kernel scheduling is typically uniformly done preemptively or, less commonly, cooperatively. At the user level a process such as a runtime system can itself schedule multiple threads of execution. If these do not share data, as in Erlang, they are usually analogously called processes, while if they share data they are usually called (user) threads, particularly if preemptively scheduled.

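What resources does a kernel thread own, and why is it cheaper to create and destroy than a process?

A kernel thread owns only its own stack, a copy of the registers, and thread-local storage, so it is far lighter than creating a whole new process. It is the difference between opening a new office and adding one more desk to an existing office.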

Kernel threads do not own resources except for a stack, a copy of the registers including the program counter, and thread-local storage (if any), and are thus relatively cheap to create and destroy. Thread switching is also relatively cheap: it requires a context switch (saving and restoring registers and stack pointer), but does not change virtual memory and is thus cache-friendly (leaving TLB valid).

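Why are blocking system calls a problem for user threads, and how can the problem be solved?

When one user thread blocks on a waiting operation such as a file read, every other user thread in the same process stops as well. The usual remedy is an I/O API that uses non-blocking (asynchronous) I/O internally while appearing to block only the calling thread, so the runtime can schedule another user thread in the meantime.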

However, the use of blocking system calls in user threads (as opposed to kernel threads) can be problematic. If a user thread or a fiber performs a system call that blocks, the other user threads and fibers in the process are unable to run until the system call returns. A common solution to this problem (used, in particular, by many green threads implementations) is providing an I/O API that implements an interface that blocks the calling thread, rather than the entire process, by using non-blocking I/O internally, and scheduling another user thread or fiber while the I/O operation is in progress. Alternatively, the program can be written to avoid the use of synchronous I/O or other blocking system calls (in particular, using non-blocking I/O, including lambda continuations and/or async/await primitives).
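The async/await alternative mentioned at the end of the excerpt can be sketched with Python's `asyncio`. This is a minimal illustration, not a description of any particular green-thread runtime: `fake_io` is a hypothetical stand-in for an I/O call, and `asyncio.sleep` plays the role of the non-blocking wait.

```python
import asyncio


async def fake_io(name, delay, events):
    events.append(f"{name} start")
    # Awaiting yields control to the event loop instead of blocking the OS thread,
    # so other tasks can run while this one waits.
    await asyncio.sleep(delay)
    events.append(f"{name} done")


async def main():
    events = []
    # Both tasks run on a single OS thread; the event loop interleaves them.
    await asyncio.gather(
        fake_io("slow", 0.2, events),
        fake_io("fast", 0.05, events),
    )
    return events


def run_demo():
    return asyncio.run(main())
```

In this sketch the "fast" task finishes while the "slow" one is still waiting, which is the behavior a blocking call in a plain user thread would prevent.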

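Why is a process switch more expensive than a thread switch?

Switching processes changes the virtual address space itself, which invalidates cached address translations and forces an untagged TLB to be flushed. Switching between threads of the same process keeps the same address space, so the TLB stays valid. It is the difference between moving to another building and moving to another room in the same building.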

A process is a heavyweight unit of kernel scheduling, as creating, destroying, and switching processes is relatively expensive. Processes are typically preemptively multitasked, and process switching is relatively expensive, beyond basic cost of context switching, due to issues such as cache flushing (in particular, process switching changes virtual memory addressing, causing invalidation and thus flushing of an untagged translation lookaside buffer (TLB), notably on x86).

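What is the danger of threads sharing one address space, and how does real software avoid it?

If one thread performs an illegal memory access, every thread in the same process dies with it. It is like one employee starting a fire that harms everyone in the same office, so critical services are split into separate processes to limit the extent of the damage.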

Thread crashes a process: due to threads sharing the same address space, an illegal operation performed by a thread can crash the entire process; therefore, one misbehaving thread can disrupt the processing of all the other threads in the application.

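What is the 1:1 threading model adopted by modern operating systems (Linux, Windows, macOS)?

The 1:1 model is the simplest approach: each thread the program creates corresponds to exactly one kernel thread. Most modern operating systems, including Windows, Linux, and macOS, use it.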

Threads created by the user in a 1:1 correspondence with schedulable entities in the kernel are the simplest possible threading implementation. OS/2 and Win32 used this approach from the start, while on Linux the GNU C Library implements this approach (via the NPTL or older LinuxThreads). This approach is also used by Solaris, NetBSD, FreeBSD, macOS, and iOS.

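Why can the M:1 threading model not take advantage of a multicore CPU?

In the M:1 model, all of the program's threads look like a single thread to the operating system, so only one thread runs at a time no matter how many CPU cores are available. It is like using only one lane of a multi-lane highway.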

An M:1 model implies that all application-level threads map to one kernel-level scheduled entity; the kernel has no knowledge of the application threads. One of the major drawbacks, however, is that it cannot benefit from the hardware acceleration on multithreaded processors or multi-processor computers: there is never more than one thread being scheduled at the same time. For example: If one of the threads needs to execute an I/O request, the whole process is blocked and the threading advantage cannot be used.

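What is the M:N threading model, and which runtimes use it?

The M:N model is a compromise that flexibly maps M program threads onto N kernel threads. Thread switching is very fast, but coordination between the program's internal scheduler and the operating system's scheduler becomes complicated.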

M:N maps some M number of application threads onto some N number of kernel entities, or "virtual processors." This is a compromise between kernel-level ("1:1") and user-level ("M:1") threading. In the M:N implementation, the threading library is responsible for scheduling user threads on the available schedulable entities; this makes context switching of threads very fast, as it avoids system calls. However, this increases complexity and the likelihood of priority inversion, as well as suboptimal scheduling without extensive (and expensive) coordination between the userland scheduler and the kernel scheduler.

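What is a race condition, which arises when threads access shared data concurrently, and how is it prevented?

When two threads modify the same data at the same time, the result is unpredictable; this is called a race condition. It is like two people editing the same document simultaneously and garbling its contents, so a mutex (a lock) is used to allow only one thread to access the data at a time.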

When shared between threads, however, even simple data structures become prone to race conditions if they require more than one CPU instruction to update: two threads may end up attempting to update the data structure at the same time and find it unexpectedly changing underfoot. Bugs caused by race conditions can be very difficult to reproduce and isolate. To prevent this, threading application programming interfaces (APIs) offer synchronization primitives such as mutexes to lock data structures against concurrent access.
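As a rough sketch of the mutex approach, the following Python example guards a read-modify-write update with `threading.Lock`. The `Counter` class and `run_demo` function are hypothetical names chosen for illustration.

```python
import threading


class Counter:
    """A shared counter whose update is protected by a mutex."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `self.value += 1` is a read-modify-write sequence; holding the lock
        # ensures no other thread interleaves between the read and the write.
        with self._lock:
            self.value += 1


def run_demo(n_threads=4, n_increments=10_000):
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place the final value is always `n_threads * n_increments`; without it, updates could be lost depending on how the threads interleave.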

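What is a thread pool, and what advantage does it have over creating a new thread every time?

A thread pool creates a fixed number of threads in advance and hands each incoming task to an idle thread. It is like keeping standing staff on call instead of hiring and firing an employee for every job, which saves the cost of creating and destroying threads.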

A popular programming pattern involving threads is that of thread pools, where a set number of threads are created at startup and then wait for a task to be assigned. When a new task arrives, a waiting thread wakes up, completes the task, and goes back to waiting. This avoids the relatively expensive thread creation and destruction functions for every task performed, and it takes thread management out of the application developer's hands and leaves it to a library or the operating system, which is better suited to optimize thread management.
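A thread pool can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. The `task` and `run_demo` functions and the `"pool"` name prefix are assumptions made for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def task(n):
    # Return the result along with the name of the worker thread that ran it.
    return n * n, threading.current_thread().name


def run_demo(n_tasks=8, n_workers=2):
    # The executor owns a bounded set of worker threads and reuses them across
    # tasks, so the caller never creates or destroys threads directly.
    with ThreadPoolExecutor(max_workers=n_workers, thread_name_prefix="pool") as pool:
        results = list(pool.map(task, range(n_tasks)))
    squares = [square for square, _ in results]
    worker_names = {name for _, name in results}
    return squares, worker_names
```

Eight tasks are served by at most two worker threads, and `pool.map` returns the results in submission order regardless of which worker ran each task.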

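In a single-threaded program, a long-running task can freeze the UI. Can this be solved without multithreading?

When a long-running task blocks the program's only flow of execution, the screen appears frozen. Moving the task to a separate thread solves this, but asynchronous I/O or event-driven programming can achieve a similar effect without multithreading.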

Responsiveness: multithreading can allow an application to remain responsive to input. In a one-thread program, if the main execution thread blocks on a long-running task, the entire application can appear to freeze. By moving such long-running tasks to a worker thread that runs concurrently with the main execution thread, it is possible for the application to remain responsive to user input while executing tasks in the background. On the other hand, in most cases multithreading is not the only way to keep a program responsive, with non-blocking I/O and/or Unix signals being available for obtaining similar results.
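The worker-thread approach might look like the sketch below. It assumes a hypothetical `long_task` simulated with `time.sleep`, and it uses a simple polling loop in place of a real UI event loop.

```python
import threading
import time


def run_demo(task_seconds=0.2):
    done = threading.Event()
    result = {}

    def long_task():
        time.sleep(task_seconds)  # stand-in for slow background work
        result["value"] = 42
        done.set()                # signal the main thread that work is finished

    worker = threading.Thread(target=long_task)
    worker.start()

    ticks = 0
    # The main thread keeps looping (where a UI would process input events)
    # while the worker runs; wait() returns False on each timeout.
    while not done.wait(timeout=0.01):
        ticks += 1

    worker.join()
    return ticks, result["value"]
```

The tick count shows that the main thread kept running while the background task was in progress.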

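Why are multithreaded programs inherently hard to test, and which design pattern mitigates this?

A multithreaded program is non-deterministic because the order in which its threads run changes from one execution to the next. A bug may never appear in the test environment and surface only in production. Restricting communication between threads to message passing reduces this risk.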

Being untestable: In general, multithreaded programs are non-deterministic, and as a result, are untestable. In other words, a multithreaded program can easily have bugs which never manifest on a test system, manifesting only in production. This can be alleviated by restricting inter-thread communications to certain well-defined patterns (such as message-passing).
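One way to restrict inter-thread communication to message passing is a thread-safe queue, sketched here with Python's `queue.Queue`. The `producer`, `consumer`, and `STOP` sentinel are hypothetical names for this illustration.

```python
import queue
import threading

STOP = object()  # sentinel message telling the consumer to finish


def producer(out_q, items):
    for item in items:
        out_q.put(item)   # the queue is the only channel between the threads
    out_q.put(STOP)


def consumer(in_q, results):
    while True:
        item = in_q.get()
        if item is STOP:
            break
        results.append(item * 2)


def run_demo():
    q = queue.Queue()
    results = []  # written only by the consumer; read by the caller after join()
    threads = [
        threading.Thread(target=producer, args=(q, range(5))),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the two threads interact only through the queue, the output order follows the message order rather than the scheduler's timing.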

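What is the GIL (Global Interpreter Lock), and what limitation does it create in a multicore environment?

The GIL is a global lock, found in implementations such as CPython (Python) and MRI (Ruby), that forces only one thread at a time to execute code. Even with several cores only one thread works at any moment, so parallel processing of CPU-intensive tasks is limited.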

A few interpreted programming languages have implementations (e.g., Ruby MRI for Ruby, CPython for Python) which support threading and concurrency but not parallel execution of threads, due to a global interpreter lock (GIL). The GIL is a mutual exclusion lock held by the interpreter that can prevent the interpreter from interpreting the application's code on two or more threads at once. This effectively limits the parallelism on multiple core systems. It also limits performance for processor-bound threads (which require the processor), but does not affect I/O-bound or network-bound ones as much.
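The point about I/O-bound threads can be illustrated with a small timing sketch. It assumes CPython, where a blocking call such as `time.sleep` releases the GIL while waiting; `io_bound` and `run_demo` are hypothetical names, and the measured time will vary by machine.

```python
import threading
import time


def io_bound(seconds):
    # A blocking wait; in CPython the GIL is released while the thread sleeps,
    # so other threads can proceed in the meantime.
    time.sleep(seconds)


def run_demo(n_threads=4, seconds=0.2):
    start = time.perf_counter()
    threads = [threading.Thread(target=io_bound, args=(seconds,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    serial_estimate = n_threads * seconds  # time if the waits ran one after another
    return elapsed, serial_estimate
```

The waits overlap, so the elapsed time should come out well below the serial estimate. A CPU-bound loop in place of `time.sleep` would not show the same overlap under the GIL.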