  • The clone function in detail

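    As is well known, new processes on Linux are created with the fork system call. In fact, fork's functionality is a subset of clone's: the main difference between the two is which parameters are passed (glibc itself implements fork() on top of the clone system call). The clone() function shown below is the libc (glibc) wrapper around the clone system call, and its main purpose is to implement threads.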


    看一下clone函数:

    int clone(int (*fn)(void *arg), void *stack, int flags, void *arg, ... /* pid_t *parent_tid, void *tls, pid_t *child_tid */ );

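    fn is the function the new thread will run, and stack is the stack the thread uses. On most architectures, including x86-64, the stack grows downward, so the caller passes a pointer to the top of the allocated region. flags selects what the child shares with its parent (for example, CLONE_VM shares the address space), and arg is passed to fn.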


    Next, how clone relates to pthread_create: on Linux, pthread_create ultimately calls clone.


    Our aim here is not to introduce clone itself, but to look into context switching for threads created with clone.

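    (1) Process switching: the data in the CPU registers of the running process is saved to its kernel-mode stack, and the saved data of the process being switched in is loaded into the registers (the hardware context). All other state information is switched as well.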

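    (2) Time-slice round-robin scheduling makes it possible to run several tasks on the same CPU, but it also brings the direct cost of saving and restoring context. A context switch hurts performance in two ways. Direct costs: the CPU registers must be saved and loaded, the scheduler's code must run, TLB entries must be reloaded, and the CPU pipeline must be flushed. Indirect costs come from the caches, for example data shared between the caches of multiple cores; how much they matter depends on the size of the data the thread's working set operates on.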

    (3) The work clone has to do [1]:

    • Allocate data structures for thread representation
    • Initialize structures according to clone parameters
    • Set up kernel and user stack as well as argument for the thread function
    • Put the thread on the corresponding CPU core’s run queue
    • Notify target core via an interrupt so that the new thread will be scheduled

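    (4) If we give a thread a high priority when it is cloned, the context-switch overhead caused by preemption might be reduced. The following program tests this idea: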

    #define _GNU_SOURCE         /* cpu_set_t, pthread_setaffinity_np, sched_getcpu */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <assert.h>
    
    #define N 4
    #define M 30000             /* outer iterations per thread */
    
    #define THREAD_NUM      4
    #define POLICY          SCHED_RR
    
    int nwait = 0;              /* lock acquisitions, protected by mutex */
    volatile unsigned long long sum;    /* unsigned, so overflow wraps instead of being UB */
    long loops = 6e3;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    
    void set_affinity(int core_id) {
        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);
        CPU_SET(core_id, &cpuset);
        /* Call outside assert() so it still runs when NDEBUG is defined. */
        int ret = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
        assert(ret == 0);
        (void)ret;
    }
    
    void* thread_func(void *arg) {
        //set_affinity(*(int *)arg);   /* arg points to this thread's index */
        (void)arg;
        for (int j = 0; j < M; j++) {
            pthread_mutex_lock(&mutex);
            nwait++;
            for (long i = 0; i < loops; i++) // This is the key of speedup for parrot: the mutex needs to be a little bit congested.
                sum += i;
            pthread_mutex_unlock(&mutex);
            /* Busy work outside the lock. This update of sum is formally a data
               race; it is tolerated only because sum is never read. */
            for (long i = 0; i < loops; i++) {
                unsigned long long u = (unsigned long long)i;
                sum += u*u*u*u*u*u;
            }
            //fprintf(stderr, "compute thread %lu %d\n", (unsigned long)pthread_self(), sched_getcpu());
        }
        return NULL;
    }
    
    int main(void) {
        //set_affinity(23);
    
        pthread_t             threads[THREAD_NUM], id;
        pthread_attr_t        attrs[THREAD_NUM];
        struct sched_param    scheds[THREAD_NUM], sched;
        int                   idxs[THREAD_NUM];
        int                   policy, i, ret;
    
        /* Real-time policies such as SCHED_RR normally require root or
           CAP_SYS_NICE; without them these calls fail with EPERM. */
        id = pthread_self();
        ret = pthread_getschedparam(id, &policy, &sched);
        assert(!ret && "main pthread_getschedparam failed!");
        sched.sched_priority = sched_get_priority_max(POLICY);  /* _min for the low-priority run */
        ret = pthread_setschedparam(id, POLICY, &sched); //set policy and corresponding priority
        assert(!ret && "main pthread_setschedparam failed!");
    
        for (i = 0; i < THREAD_NUM; i++) {
            idxs[i] = i;
    
            ret = pthread_attr_init(&attrs[i]);
            assert(!ret && "pthread_attr_init failed!");
    
            ret = pthread_attr_getschedparam(&attrs[i], &scheds[i]);
            assert(!ret && "pthread_attr_getschedparam failed!");
    
            ret = pthread_attr_setschedpolicy(&attrs[i], POLICY);
            assert(!ret && "pthread_attr_setschedpolicy failed!");
    
            scheds[i].sched_priority = sched_get_priority_max(POLICY);
    
            ret = pthread_attr_setschedparam(&attrs[i], &scheds[i]);
            assert(!ret && "pthread_attr_setschedparam failed!");
    
            /* Without EXPLICIT_SCHED the new thread inherits the creator's
               scheduling and ignores the attributes set above. */
            ret = pthread_attr_setinheritsched(&attrs[i], PTHREAD_EXPLICIT_SCHED);
            assert(!ret && "pthread_attr_setinheritsched failed!");
        }
    
        for (i = 0; i < THREAD_NUM; i++) {
            ret = pthread_create(&threads[i], &attrs[i], thread_func, &idxs[i]);
            assert(!ret && "pthread_create() failed!");
        }
    
        for (i = 0; i < THREAD_NUM; i++) {
            ret = pthread_join(threads[i], NULL);
            assert(!ret && "pthread_join() failed!");
            pthread_attr_destroy(&attrs[i]);
        }
    
        return 0;
    }
    


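    We run the four worker threads and the main thread under SCHED_RR scheduling at the highest priority, and use VTune to check whether Preemption Context Switches decrease as a result.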


    VTune results at the highest priority (screenshot not preserved):





    Now set the lowest priority instead (VTune screenshot not preserved):



    It turns out that the lowest priority does reduce Preemption Context Switches, but it increases Synchronization Context Switches.

    Clearly, the highest-priority run takes less time: 4.470 s, versus 7.280 s at the lowest priority.
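Expressed as ratios, the two measured times give:

```latex
\frac{7.280\,\mathrm{s}}{4.470\,\mathrm{s}} \approx 1.63,
\qquad
\frac{7.280\,\mathrm{s} - 4.470\,\mathrm{s}}{7.280\,\mathrm{s}} \approx 38.6\%
```

In other words, the lowest-priority run took about 1.63 times as long, and the highest-priority run saved about 38.6% of the running time.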


    REFERENCES:

    [1] Balazs Gerofi et al., "Clone n(): Parallel Thread Creation for Upcoming Many-Core Architectures," 2012 IEEE International Conference on Cluster Computing.

  • Original article: https://www.cnblogs.com/yangykaifa/p/6904293.html