  • POSIX Threads Programming -- reading notes (on the article from Lawrence Livermore National Laboratory)

    1. A common way to achieve parallelism on SMP machines is to use threads. Hardware vendors each had their own thread implementations, which caused serious portability problems for programs. So, for UNIX systems, the IEEE POSIX 1003.1c standard was issued; this is POSIX Threads -- pthreads.

    2. As far as I know, there is an open source project, a library, that implements pthreads on Windows; roughly speaking, it maps the pthread functions we write onto the Windows threading functions. Windows itself does not support pthreads; pthreads are used mainly on UNIX systems.

    3. The article has an excellent summary of what a thread is; excerpted below:

    This independent flow of control is accomplished because a thread maintains its own:
        Stack pointer
        Registers
        Scheduling properties (such as policy or priority)
        Set of pending and blocked signals
        Thread specific data.

    So, in summary, in the UNIX environment a thread:
        Exists within a process and uses the process resources
        Has its own independent flow of control as long as its parent process exists and the OS supports it
        Duplicates only the essential resources it needs to be independently schedulable
        May share the process resources with other threads that act equally independently (and dependently)
        Dies if the parent process dies - or something similar
        Is "lightweight" because most of the overhead has already been accomplished through the creation of its process.

    Because threads within the same process share resources:
        Changes made by one thread to shared system resources (such as closing a file) will be seen by all other threads.
        Two pointers having the same value point to the same data.
        Reading and writing to the same memory locations is possible, and therefore requires explicit synchronization by the programmer.
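    To make the last point concrete: all threads in a process see the same globals, so writes to a shared location need explicit synchronization. A minimal sketch in C (the counter and loop count are arbitrary; compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    /* Both threads see the same global; without the mutex the increments would race. */
    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *bump(void *arg)
    {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);    /* explicit synchronization by the programmer */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);   /* always 200000 with the mutex held */
        return 0;
    }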


    4. Why Pthreads? The main reason to use threads is performance. See attachment 1: the article compares the execution time of fork() against pthread_create(), and the difference is an order of magnitude. Second, communication between threads is simpler and more efficient than communication between processes. Interestingly, since the article was written at LLNL, the authors also know MPI well, so they compare, on a single SMP machine, the CPU-to-memory bandwidth achieved when parallelizing with MPI versus with threads (see attachment 2); again the difference is an order of magnitude. This makes sense: MPI involves copying memory, whereas with threads the CPU reads the data from cache in the best case and from main memory in the worst case, so performance is naturally much higher than with MPI.
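    To get a feel for the fork()-versus-pthread_create() gap on your own machine, here is a minimal timing sketch (the iteration count N is arbitrary; compile with -pthread, and on older glibc clock_gettime() may also need -lrt):

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    #define N 1000   /* arbitrary number of creations to time */

    static void *noop(void *arg) { return NULL; }

    static double elapsed_s(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void)
    {
        struct timespec t0, t1;
        pthread_t tid;

        /* N pthread_create()/pthread_join() pairs. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++) {
            pthread_create(&tid, NULL, noop, NULL);
            pthread_join(tid, NULL);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("pthread_create: %.3f s for %d threads\n", elapsed_s(t0, t1), N);

        /* N fork()/waitpid() pairs. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++) {
            pid_t pid = fork();
            if (pid == 0)
                _exit(0);             /* child exits immediately */
            waitpid(pid, NULL, 0);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("fork:           %.3f s for %d processes\n", elapsed_s(t0, t1), N);
        return 0;
    }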

    5. PThreads overview. An excerpt describing the three common models in which threads are used:

    Manager/worker: a single thread, the manager assigns work to other threads, the workers. Typically, the manager handles all input and parcels out work to the other tasks. At least two forms of the manager/worker model are common: static worker pool and dynamic worker pool.

    Pipeline: a task is broken into a series of suboperations, each of which is handled in series, but concurrently, by a different thread. An automobile assembly line best describes this model.

    Peer: similar to the manager/worker model, but after the main thread creates other threads, it participates in the work.


    The thread model of easy_s, the EasyCluster backend, should be the second one -- pipeline: each thread is designed to be as simple as possible, and many threads together complete one piece of work.
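    For the manager/worker model above (in its simplest static-worker-pool form, with the workers pulling task indices from a shared counter), a minimal sketch might look like this; the worker count, task count, and names are all made up:

    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 4    /* size of the static worker pool (arbitrary) */
    #define NTASKS   16   /* amount of work handed out (arbitrary)      */

    static int next_task = 0;                                /* work parcelled out to the pool */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        long id = (long)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            int task = (next_task < NTASKS) ? next_task++ : -1;  /* grab the next task, if any */
            pthread_mutex_unlock(&lock);
            if (task < 0)
                break;
            printf("worker %ld handles task %d\n", id, task);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t pool[NWORKERS];
        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&pool[i], NULL, worker, (void *)i);
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(pool[i], NULL);
        return 0;
    }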

    6. Thread-safeness. This is important. In a multithreaded program, besides making our own code thread-safe (via thread synchronization), we must also pay attention to whether the libraries we call are thread-safe. For example, suppose our program calls a library A whose functions modify some shared data B; if our program is multithreaded and several threads call library A at the same time, the data in B can be corrupted. So, in a multithreaded program, make sure the libraries you use are thread-safe. If a library is not thread-safe, or we cannot confirm that it is, then we must synchronize our own threads so that they never call that library concurrently. The figure in attachment 3 shows this very intuitively.
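    One common way to handle a library that is not (or might not be) thread-safe is to serialize all calls into it behind a single mutex. A minimal sketch, using ctime() (which writes into a shared static buffer, playing the role of library A and data B); safe_ctime is a name invented for this example:

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    static pthread_mutex_t ctime_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Serialize all calls to the non-thread-safe ctime() and copy its result
     * out of the shared static buffer while still holding the lock. */
    void safe_ctime(const time_t *t, char *buf, size_t len)
    {
        pthread_mutex_lock(&ctime_lock);
        strncpy(buf, ctime(t), len - 1);
        buf[len - 1] = '\0';
        pthread_mutex_unlock(&ctime_lock);
    }

    int main(void)
    {
        char buf[32];                 /* ctime() output needs at least 26 bytes */
        time_t now = time(NULL);
        safe_ctime(&now, buf, sizeof buf);
        printf("%s", buf);
        return 0;
    }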

    7. pthread APIs. pthreads defines roughly 60 functions, which can broadly be divided into three classes:


    Thread management: The first class of functions work directly on threads - creating, detaching, joining, etc. They include functions to set/query thread attributes (joinable, scheduling etc.)

    Mutexes: The second class of functions deal with synchronization, called a "mutex", which is an abbreviation for "mutual exclusion". Mutex functions provide for creating, destroying, locking and unlocking mutexes. They are also supplemented by mutex attribute functions that set or modify attributes associated with mutexes.

    Condition variables: The third class of functions address communications between threads that share a mutex. They are based upon programmer specified conditions. This class includes functions to create, destroy, wait and signal based upon specified variable values. Functions to set/query condition variable attributes are also included.
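
    For the thread-management class, a minimal sketch that creates a few explicitly joinable threads and waits for them (the thread count and message are arbitrary; note that the order in which the threads actually run and print is up to the scheduler):

    #include <pthread.h>
    #include <stdio.h>

    static void *hello(void *arg)
    {
        printf("hello from thread %ld\n", (long)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t tids[3];
        pthread_attr_t attr;

        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);  /* explicitly joinable */

        for (long i = 0; i < 3; i++)
            pthread_create(&tids[i], &attr, hello, (void *)i);
        for (int i = 0; i < 3; i++)
            pthread_join(tids[i], NULL);    /* wait for each thread to finish */

        pthread_attr_destroy(&attr);
        return 0;
    }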

    8. The rest of the article explains the APIs. All of that material deserves a careful read, and nearly every sentence is useful, so I won't excerpt it here; read the article itself. It is very well written and covers both mutexes and condition variables. The article also contains three self-posed Q&A sections, all excellent; here are two of them:

    (1) After a thread is created, how do we know when the OS actually starts executing it? -- This depends on the pthreads implementation; different OSes may do it differently. In other words, our code must not assume that one thread will run before another.

    (2) When a mutex is unlocked while several threads are waiting for it, which thread gets the mutex first? -- Unless we have set thread priorities with the pthread priority APIs, it depends on how the OS happens to schedule the threads; in other words, it is not deterministic, and whichever thread is scheduled first gets it.

    9. There are a few other scattered but important points. For example: the main function itself is also a thread. If main returns normally or calls exit(), all threads created in main are terminated; but if main finishes by calling pthread_exit(), the threads created in main are not terminated and keep running. The stack management section discusses the default thread stack size; I measured it on CentOS 4.4 i686/x86_64 machines and it is 10 MB (the default) in both cases. The two sections on using mutexes and condition variables are excellent and very easy to follow.
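
    A minimal sketch of the main()/pthread_exit() behaviour described above (the sleep is only there so the worker outlives the main thread):

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static void *work(void *arg)
    {
        sleep(1);                       /* still running after main() has finished */
        printf("worker done\n");
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, work, NULL);

        /* Returning from main() or calling exit() here would terminate the worker
         * too; pthread_exit() ends only the main thread and lets the worker run
         * to completion. */
        pthread_exit(NULL);
    }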


    10. The article also lists the topics it does not cover:

    Thread Scheduling
        Implementations will differ on how threads are scheduled to run. In most cases, the default mechanism is adequate.
        The Pthreads API provides routines to explicitly set thread scheduling policies and priorities which may override the default mechanisms.
        The API does not require implementations to support these features.

    Keys: Thread-Specific Data
        As threads call and return from different routines, the local data on a thread's stack comes and goes.
        To preserve stack data you can usually pass it as an argument from one routine to the next, or else store the data in a global variable associated with a thread.
        Pthreads provides another, possibly more convenient and versatile, way of accomplishing this through keys.

    Mutex Protocol Attributes and Mutex Priority Management for the handling of "priority inversion" problems.

    Condition Variable Sharing - across processes

    Thread Cancellation

    Threads and Signals
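
    The "Keys" item above refers to pthread thread-specific data. Although the article does not cover it, here is a minimal sketch of how the key API fits together (the stored values are arbitrary):

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    static pthread_key_t key;

    static void destructor(void *p) { free(p); }   /* run per thread when it exits */

    static void *run(void *arg)
    {
        long *mine = malloc(sizeof *mine);
        *mine = (long)arg;
        pthread_setspecific(key, mine);            /* each thread stores its own value */
        /* ...later, possibly deep in another routine, no argument passing needed: */
        printf("thread sees %ld\n", *(long *)pthread_getspecific(key));
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_key_create(&key, destructor);      /* one key shared by all threads */
        pthread_create(&a, NULL, run, (void *)1L);
        pthread_create(&b, NULL, run, (void *)2L);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        pthread_key_delete(key);
        return 0;
    }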


    11. The article's treatment of mutexes and condition variables is quite good: simple, clear, and practical. It reminds me of the logic of easy_s, the EasyCluster backend, which has two threads, A and B. Thread A receives requests from the front end and fills in a shared data structure; thread B polls that data structure and starts working whenever it finds data. Naturally, access to this critical data structure is protected by a mutex. But with this design thread B has to poll, repeatedly locking and unlocking the mutex, which costs performance.

    This is exactly the situation where a condition variable (cv) can be used. First, create a cv; a cv must be bound to a mutex, and of course that is the mutex we are already using. The logic then becomes: thread B locks the mutex and calls pthread_cond_wait(), which automatically unlocks the mutex and blocks. When thread A receives a request from the front end, it tries to lock the mutex; since thread B is sitting in the cond wait, the mutex has been released, so thread A acquires it, fills the request data into the data structure, calls pthread_cond_signal() to wake thread B, and finally unlocks the mutex. When thread B wakes up, pthread_cond_wait() automatically re-locks the mutex (note: before being signaled, the function had automatically unlocked the mutex; on wakeup it automatically locks it again). Thread B then does its work; when it is done it must unlock the mutex and go back to the top of its loop, that is, lock the mutex again and call pthread_cond_wait(), which on success unlocks the mutex again. At that point thread A can signal thread B once more, whenever the next request arrives.

    With this logic, thread B no longer has to poll; it is woken only when needed, which is much more efficient, especially when there are no requests, in which case thread B does nothing at all. But thinking about it more carefully, this design may actually reduce the system's responsiveness. With the cv-based design done exactly as above, the program effectively becomes serial rather than parallel: thread A cannot add a new request to the critical data structure until thread B has finished its work. Compare that with the polling approach, where thread A can keep adding new request data while thread B is working, and thread B picks it up on its next polling round; doesn't that actually give better responsiveness?

    So a condition variable is not a technique for higher performance or better responsiveness; it is a way to avoid polling. Going back to the example above, I think that if we want more parallelism and faster response to requests, we should split thread B's work, pipeline style, and let several threads handle one request. Then, once the first thread checks a request out of the data structure, it can convert the request data into the format the downstream threads need, release the mutex immediately, and go back to pthread_cond_wait(); thread A can then quickly refill the request data. That raises the degree of parallelism and speeds up request processing, but it also requires us to design those multiple threads carefully.

    One final reminder about cvs: take care that pthread_cond_signal() is never called when no thread is blocked in pthread_cond_wait(), otherwise the signal is lost; keep this in mind in the design. To wrap up, here is a summary of the main call sequence of the two threads in the example above:

    Thread A

        * Accept client requests
        * Lock associated mutex
        * Change global data structure.
        * Send pthread_cond_signal  to Thread-B
        * Unlock mutex.
        * Repeat

    Thread B

        * Lock associated mutex
        * Call pthread_cond_wait() to perform a blocking wait for signal from Thread-A. Note that a call to pthread_cond_wait() automatically and atomically unlocks the associated mutex variable so that it can be used by Thread-A.
        * When signaled, wake up. Mutex is automatically and atomically locked.
        * Access/modify global data structure
        * Explicitly unlock mutex
        * Repeat
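
    A minimal, compilable sketch of this Thread-A/Thread-B pattern, with main() playing the role of thread A (the request count and names are made up). It also adds a small predicate flag checked in a while loop around pthread_cond_wait(), which is the standard way to cope with spurious wakeups and with a signal that arrives while nobody is waiting yet:

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static int pending = 0;           /* predicate: is there a request to pick up? */
    static int done = 0;              /* tells thread B to shut down               */

    static void *thread_b(void *arg)  /* the worker from the example */
    {
        for (;;) {
            pthread_mutex_lock(&lock);
            while (!pending && !done)                /* guards against spurious wakeups */
                pthread_cond_wait(&cond, &lock);     /* unlocks the mutex while blocked */
            if (done && !pending) {
                pthread_mutex_unlock(&lock);
                break;
            }
            pending = 0;
            pthread_mutex_unlock(&lock);
            printf("B: handling request\n");         /* the actual work happens here */
        }
        return NULL;
    }

    int main(void)                    /* plays the role of thread A */
    {
        pthread_t b;
        pthread_create(&b, NULL, thread_b, NULL);

        for (int i = 0; i < 3; i++) {                /* "accept" three client requests */
            pthread_mutex_lock(&lock);
            pending = 1;                             /* change the shared data structure */
            pthread_cond_signal(&cond);              /* wake thread B                    */
            pthread_mutex_unlock(&lock);
            usleep(100 * 1000);
        }

        pthread_mutex_lock(&lock);
        done = 1;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
        pthread_join(b, NULL);
        return 0;
    }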


  • Original article: https://www.cnblogs.com/super119/p/1996108.html