多线程基础之一：进程间同步问题的来源和解决方案

同步问题诞生的最本质的原因：In fact, a process may be interrupted at any point in its instruction stream, and processing core may be assigned to execute instructions of another process.总之一句话，关于共享对象的更改操作并非原子操作，如假设两个进程对于共享对象counter进行不同的操作：

Process_1: counter++;
Process_2:counter --;

一般地，counter++将被翻译成低级别的原子操作序列

counter++：
    1:  register1 = counter
    2:  register1 = register1 +１
    3:  counter = register1

而counter–将被翻译成

counter--：
    1:  register1 = counter
    2:  register1 = register1 -１
    3:  counter = register1

考虑到现今操作系统为了提供更好的实时响应特性，抢占式内核是主流设计，故而上面两个原子操作序列的执行顺序是可以互相交叉的，故而最后counter的结果是不确定的。这种有多个进程操作同一数据对象，并且结果依赖于此次具体执行顺序的情况被称作race condition（竞争状况）。

0.同步问题的解决方案核心要求：互斥性（exclusion) + 等待有限性(bounded waiting)

操作系统领域将多进程中涉及对共享数据对象进行操作的代码段称为critical section，而对于临界区的操作设计必须要满足两特性:互斥性和等待有限性。

关于临界区的互斥性，Peterson Solution给出了经典的针对两个竞争进程的临界区软件实现策略。

do {
   flag[i] = true;
   turn = j;
   while (flag[j] && turn == j); //单步循环，除非对方flag[j]=false，或者turn=i
   //因为即便是两个进程交互进入，在任何瞬时turn也只能取一个值，这也是Peterson方案只能适用于两个race process的原因
      ****critical section****

   flag[i] = false;
      ****remain section****
 }while (ture);

在单处理器架构下，多进程共享数据一致性问题可以很好地解决，只要涉及到对共享数据对象进行修改时，不准“打断”存在，这时非抢占式内核(non-preemptive kernel)的常用手段；但SMP多处理器架构下，“组织打断或抢占”的机制将会极大地影响系统效率，故而现今主流操作系统都是支持抢占式内核的，并且额外提供硬件支持（可以保证在硬件层面上，实现对于一个字的内容修改或两个字内容互相交换等操作的原子性）Hardware features can make any programming task easier and improve system efficiency.

如操作系统提供如下的原子操作atomic_operation: compare_and_swap()和test_and_set()

boolean test_and_set(boolean *target) {
   boolean rv = *target;
   *target = true;
   return  rv;
}

int compare_and_swap(int * value, int expected, int new_value){
    int temp = *value;

    if (*value == expected)
        *value = new_value;

    return temp;
}

然后通过这两个原子操作可以实现进程间加锁互斥

do {
   while (test_and_set(&lock)); //给lock进行加锁，原子操作
      ****critical section****
   lock = false;
      ****remain section****
 }while (ture);

do {
   while (compare_and_swap(&lock, 0, 1) != 0); //给lock进行加锁，原子操作
      ****critical section****
   lock = false;
      ****remain section****
 }while (ture);

上述操作虽然满足了互斥性，但不能保证等待有限性，可以通过以下设计满足bounded-waiting设计

do {
   waiting[i] = true;
   key = true;
   while (waiting[i] && key)
      key = test_and_set(&lock);
   waiting[i] = false;

      ****critical section****

   j = (i+1) % n;
   while ( (j != i) && !waiting[j] ) //此处遍历全等待进程数组一遍，保证每个进程至多等n-1 turns
      j = (j+1) % n;

   if (j == i) //当此前所有等待进程都获得操作临界区的机会后，才释放本轮的锁
      lock = false;
   else
      waiting[j] = false;

      ****remain section****

 }while (true);

而各进程间共享的数据为：

 boolean  waiting[n];
 boolean  lock;

这种基于硬件特性的原子性操作test_and_set()\compare_and_swap()手工构建的互斥并行方案是一种需要coder承担很多工作的设计，显然封装底层实现，提供抽象接口才是程序世界的王道。

1.锁机制之一Mutex Lock

既然提供了硬件加锁的可能，操作系统必然也是为给出一系列方便程序员使用的锁机制。The hardware-based solutions to the critical-section problem using atomical function eg. test_and_set() are complicated as well as generally inaccessible to application programmers. Instead, operating-systems designers build software tools to solve the critical-section problem. Mutex便是操作系统提供的诸多锁机制中的最简单的一个（Mutex是mutual exclusion的合并）。

锁机制的经典使用机制

acquire() {
    while (!available); //要么获取，要么一直重复该步消耗掉本次获取的所有CPU时间片
    available = false; //获取成功，加锁
}

release() {
    available = true;
}

do {
    acquire lock:
           ****critical section****
    release lock:
           ****remain section****
}while (true);

Mutex是在上面介绍的原子操作test_and_set()等atom lock基础上封装而来的，可以实现锁的互斥性，但是Mutex最大的问题在于它存在“空等”情况，如果process_i进入了临界区，其余进程必须一直调用acquire()直到分配成功。正是因为“空等”的存在，故而Mutex这类锁也被称为自旋锁。In fact, this type of mutex lock is also called a spinlock because the process “spins” while waiting for the lock to become available.

但也正是因为自旋空等的存在，自旋锁生效期间，等待进程没有产生进程上下文内容切换（no context switch is required)，因为上下文内容切换也是要花费时间的（a context switch may take considerable time）。所以如果单一线程占用共享资源的时间较短时，采用自旋锁是合理的，其实就是比较盲等时间和让进程切换上下文执行其他内容带来的收益。

2.锁机制之二 semaphores

旗语semaphores分为计数counting和binary两种，前者取值范围是（-∞，+∞），后者只能在0和1间取值，binary semaphore相当于Mutex Locks。

Counting semaphores相当于为某一资源提供次数有限的访问权限，类似于线程池thread pool为外部服务请求进行处理的场景，需要用counting semaphores管理每时刻线程池空闲服务线程数目。

上面我们说道Mutex虽然简单，但是存在等待进程盲等的计算资源浪费情况。在Semaphores中，相比于Mutex存在盲等，Semaphores有着更经济的操作方式，一旦进程检测到Semaphores Value小于等于0，则当前进程自我阻断，进入一个和Semaphores绑定的等待队列中，该进程的状态被置为waiting状态，控制权交给CPU调度中心，由其选择下一进程执行（The block operation places a process into a waiting queue associated with the semaphore, and the state of the process is switched to the waiting state. The control is transferred to the CPU scheduler, which selects another process to execute.）而进入休眠状态的进程需wakeup()指令激活，然后进入ready queue等待获取资源。

typedef struct{
   atomic_t  count;
   int sleepers;
   wait_queue_head_t  wait;
} semaphore;

其中block()操作将会中止它的宿主进程，而wakeup(P)则会激活进程P，这两个功能函数均是由操作系统实现的（基于中断机制实现）。而Semaphores配备的等待队列采用的排队策略，最简单的当然是FIFO，但是配备其他策略也是可以的，并不影响semaphore的正常使用。

简单一句话，如果critical section的操作指令并不多，完全可以借助Mutex或者Binary Semaphores来实现锁机制，没必要使用Semaphores的这种配备等待队列的机制来实计算时间节约，但如果临界区内操作复杂（使用进程占用时间很长），这种情况下应该使用Semaphores配备等待队列机制，以集中计算资源给当前正在临界区中的进程使用。