Sunday, December 11, 2011

Threads, Processes & related concepts.

Threads & Processes(In terms of Java - Source(Oracle)):

Processes and Threads

In concurrent programming, there are two basic units of execution: processes and threads. In the Java programming language, concurrent programming is mostly concerned with threads. However, processes are also important.

A computer system normally has many active processes and threads. This is true even in systems that only have a single execution core, and thus only have one thread actually executing at any given moment. Processing time for a single core is shared among processes and threads through an OS feature called time slicing.

It's becoming more and more common for computer systems to have multiple processors or processors with multiple execution cores. This greatly enhances a system's capacity for concurrent execution of processes and threads — but concurrency is possible even on simple systems, without multiple processors or execution cores.

Processes

A process has a self-contained execution environment. A process generally has a complete, private set of basic run-time resources; in particular, each process has its own memory space.

Processes are often seen as synonymous with programs or applications. However, what the user sees as a single application may in fact be a set of cooperating processes. To facilitate communication between processes, most operating systems support Inter Process Communication (IPC) resources, such as pipes and sockets. IPC is used not just for communication between processes on the same system, but processes on different systems.

Most implementations of the Java virtual machine run as a single process. A Java application can create additional processes using a ProcessBuilder object. Multiprocess applications are beyond the scope of this lesson.

Threads

Threads are sometimes called lightweight processes. Both processes and threads provide an execution environment, but creating a new thread requires fewer resources than creating a new process.

Threads exist within a process — every process has at least one. Threads share the process's resources, including memory and open files. This makes for efficient, but potentially problematic, communication.

Multithreaded execution is an essential feature of the Java platform. Every application has at least one thread — or several, if you count "system" threads that do things like memory management and signal handling. But from the application programmer's point of view, you start with just one thread, called the main thread. This thread has the ability to create additional threads, as we'll demonstrate in the next section.

Thread Objects

Each thread is associated with an instance of the class Thread. There are two basic strategies for using Thread objects to create a concurrent application.

To directly control thread creation and management, simply instantiate Thread each time the application needs to initiate an asynchronous task.
To abstract thread management from the rest of your application, pass the application's tasks to an executor.

Threads & Processes (from different sources) -

Source 1:
Process:

Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.

Thread:
A thread is the entity within a process that can be scheduled for execution. All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled. The thread context includes the thread's set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread's process. Threads can also have their own security context, which can be used for impersonating clients.

Source 2:
Both processes and threads are independent sequences of execution. The typical difference is that threads (of the same process) run in a shared memory space, while processes run in separate memory spaces. Threads are an operating environment feature, rather than a CPU feature (though the CPU typically has operations that make threads efficient).

Source 3:
Process:

An executing instance of a program is called a process.
Some operating systems use the term ‘task‘ to refer to a program that is being executed.
A process is always stored in the main memory also termed as the primary memory or random access memory.
Therefore, a process is termed as an active entity. It disappears if the machine is rebooted.
Several process may be associated with a same program.
On a multiprocessor system, multiple processes can be executed in parallel.
On a uni-processor system, though true parallelism is not achieved, a process scheduling algorithm is applied and the processor is scheduled to execute each process one at a time yielding an illusion of concurrency.
Example: Executing multiple instances of the ‘Calculator’ program. Each of the instances are termed as a process.

Thread:
A thread is a subset of the process.
It is termed as a ‘lightweight process’, since it is similar to a real process but executes within the context of a process and shares the same resources allotted to the process by the kernel (See kquest.co.cc/2010/03/operating-system for more info on the term ‘kernel’).
Usually, a process has only one thread of control – one set of machine instructions executing at a time.
A process may also be made up of multiple threads of execution that execute instructions concurrently.
Multiple threads of control can exploit the true parallelism possible on multiprocessor systems.
On a uni-processor system, a thread scheduling algorithm is applied and the processor is scheduled to run each thread one at a time.
All the threads running within a process share the same address space, file descriptor, stack and other process related attributes.
Since the threads of a process share the same memory, synchronizing the access to the shared data withing the process gains unprecedented importance.

Source 4:
In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system. The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process. Multiple threads can exist within the same process and share resources such as memory, while different processes do not share these resources. In particular, the threads of a process share the latter's instructions (its code) and its context (the values that its variables reference at any given moment). To give an analogy, multiple threads in a process are like multiple cooks reading off the same cook book and following its instructions, not necessarily from the same page.

On a single processor, multithreading generally occurs by time-division multiplexing (as in multitasking): the processor switches between different threads. This context switching generally happens frequently enough that the user perceives the threads or tasks as running at the same time. On a multiprocessor or multi-core system, the threads or tasks will actually run at the same time, with each processor or core running a particular thread or task.
Many modern operating systems directly support both time-sliced and multiprocessor threading with a process scheduler. The kernel of an operating system allows programmers to manipulate threads via the system call interface. Some implementations are called a kernel thread, whereas a lightweight process (LWP) is a specific type of kernel thread that shares the same state and information.

Programs can have user-space threads when threading with timers, signals, or other methods to interrupt their own execution, performing a sort of ad-hoc time-slicing.


Differences between Threads & Processes:

Threads differ from traditional multitasking operating system processes in that:
processes are typically independent, while threads exist as subsets of a process
processes carry considerably more state information than threads, whereas multiple threads within a process share process state as well as memory and other resources
processes have separate address spaces, whereas threads share their address space
processes interact only through system-provided inter-process communication mechanisms
Context switching between threads in the same process is typically faster than context switching between processes.
Systems like Windows NT and OS/2 are said to have "cheap" threads and "expensive" processes; in other operating systems there is not so great a difference except the cost of address space switch which implies a TLB flush.

Other major difference between threads and processes is:

Threads share the address space of the process that created it; processes have their own address space.
Threads have direct access to the data segment of its process; processes have their own copy of the data segment of the parent process.
Threads can directly communicate with other threads of its process; processes must use interprocess communication to communicate with sibling processes.
Threads have almost no overhead; processes have considerable overhead.
New threads are easily created; new processes require duplication of the parent process.
Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.
Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process does not affect child processes.

More differences Between Threads and Processes:

Threads share the address space of the process that created it; processes have their own address. Threads have direct access to the data segment of its process; processes have their own copy of the data segment of the parent process. Threads can directly communicate with other threads of its process; processes must use inter-process communication to communicate with sibling processes. Threads have almost no overhead; processes have considerable overhead. New threads are easily created; new processes require duplication of the parent process. Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.

Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process, changes to the parent process does not affect child processes.

Both have an id, set of registers, state, priority, and scheduling policy.
Both have attributes that describe the entity to the OS.
Both have an information block.
Both share resources with the parent process.
Both function as independent entities from the parent process.
The creator can exercise some control over the thread or process.
Both can change their attributes.
Both can create new resources.
Neither can access the resources of another process.

Other sources:

An application consists of one or more processes. A process, in the simplest terms, is an executing program. One or more threads run in the context of the process. A thread is the basic unit to which the operating system allocates processor time. A thread can execute any part of the process code, including parts currently being executed by another thread.

A process is a collection of code, memory, data and other resources. A thread is a sequence of code that is executed within the scope of the process. You can (usually) have multiple threads executing concurrently within the same process.

A single process can have multiple threads that share global data and address space with other threads running in the same process, and therefore can operate on the same data set easily. Processes do not share address space and a different mechanism must be used if they are to share data.
If we consider running a word processing program to be a process, then the auto-save and spell check features that occur in the background are different threads of that process which are all operating on the same data set (your document).

Process:
In computing, a process is an instance of a computer program that is being sequentially executed. It contains the program code and its current activity. Depending on the operating system (OS), a process may be made up of multiple threads of execution that execute instructions concurrently.

A computer program is a passive collection of instructions; a process is the actual execution of those instructions. Several processes may be associated with the same program; for example, opening up several instances of the same program often means more than one process is being executed. Multitasking is a method to allow multiple processes to share processors (CPUs) and other system resources. Each CPU executes a single task at a time. However, multitasking allows each processor to switch between tasks that are being executed without having to wait for each task to finish. Depending on the operating system implementation, switches could be performed when tasks perform input/output operations, when a task indicates that it can be switched, or on hardware interrupts.

A common form of multitasking is time-sharing. Time-sharing is a method to allow fast response for interactive user applications. In time-sharing systems, context switches are performed rapidly. This makes it seem like multiple processes are being executed simultaneously on the same processor. The execution of multiple processes seemingly simultaneously is called concurrency. For security and reliability reasons most modern operating systems prevent direct communication between independent processes, providing strictly mediated and controlled inter-process communication functionality.

Thread:
A single process may contain several executable programs (threads) that work together as a coherent whole. One thread might, for example, handle error signals, another might send a message about the error to the user, while a third thread is executing the actual task of the...

Link: Another good read.

No comments:

Post a Comment