
My Thoughts on Mutual Exclusion in the Kernel

/* e4gle: While modifying the Linux source code I was plagued by a great many kernel mutual-exclusion problems, which had to be solved with kernel locks. Most of them were eventually resolved, and I felt I ought to write something down, but never found the time. Then I happened across this article, which covers exactly what I had wanted to sort out, so I am sharing it here. On the subject of bottom halves and interrupts: you absolutely must not perform file read/write operations in the TCP/IP bottom half, or the kernel will panic. As it happens, one of my enhancements to Linux did exactly that, which left me stuck for a long time. Discussion is welcome. */

My Thoughts on Mutual Exclusion in the Kernel
by wheelz

Having read the preceding discussion, I have some thoughts of my own to offer.

One point needs clearing up first: the choice of mutual-exclusion mechanism is determined not by the size of the critical section but by its nature, and by which pieces of code, that is, which kernel execution paths, compete for it.

Strictly speaking, semaphores and the spinlock_XXX primitives are mutual-exclusion mechanisms at different levels. The former is implemented on top of the latter, rather like the relationship between HTTP and TCP: both are protocols, but at different layers.

First, the semaphore. It works at the process level, arbitrating between processes competing for a resource. Even though the code runs in the kernel, the execution path acts in the identity of a process, contending for the resource on that process's behalf. If it loses, a context switch takes place and the process can go to sleep, but the CPU does not stop: it carries on running other execution paths. Conceptually, none of this depends on whether there is one CPU or several; only in the implementation of the semaphore itself, to keep accesses to the semaphore structure atomic, is a spinlock needed on multiprocessor machines.

More often, what the kernel has to maintain is mutual exclusion among its own execution paths when they access data. This is the most fundamental mutual-exclusion problem: keeping data modifications atomic. The semaphore implementation itself depends on it. On a single CPU the interference comes from interrupts and bottom halves, so disabling interrupts is enough. On multiple CPUs there is the added interference of the other CPUs, so a spinlock is needed as well. Combining the two gives the spinlock_XXX family. Its defining property is that once a CPU enters spinlock_XXX it does nothing else: it spins until the lock is acquired. This in turn dictates that a critical section protected by spinlock_XXX must not block, much less context switch; it should finish its data accesses and get out quickly, so that the other execution paths spinning on the lock can acquire it. That is the whole principle of the spinlock. If the current execution path really must context switch, it has to release the spinlock before calling schedule(), otherwise deadlock is likely: interrupt handlers and bottom halves have no process context and cannot context switch, so all they can do is spin, waiting for the lock, and if you have context-switched away, who knows when you will be back.

Since the whole intent and purpose of the spinlock is to keep data modifications atomic, there is likewise no reason to linger inside a spinlock-protected critical section.

spinlock_XXX comes in many forms:

  spin_lock()/spin_unlock()
  spin_lock_irq()/spin_unlock_irq()
  spin_lock_irqsave()/spin_unlock_irqrestore()
  spin_lock_bh()/spin_unlock_bh()
  local_irq_disable()/local_irq_enable()
  local_bh_disable()/local_bh_enable()

So which one applies in which situation? That depends on which kernel execution path you are in, and which kernel execution paths you need to exclude. The main execution paths in the kernel are:

1. A user process running in kernel mode. Here there is a process context; the path is executing a system call or the like on the process's behalf.
2. An interrupt, exception, or trap. Conceptually there is no process context here, and no context switch is possible.
3. A bottom half. Conceptually there is no process context here either.
4. Meanwhile, the same execution path may also be running on another CPU.

Weighing these four factors, and working out which of them can reach the data we want to protect, determines which form of spinlock to use. If you only need to exclude other CPUs, use spin_lock/spin_unlock. If you need to exclude irqs as well as other CPUs, use spin_lock_irq/spin_unlock_irq. If you need to exclude irqs and other CPUs and also preserve the EFLAGS state, use spin_lock_irqsave/spin_unlock_irqrestore. If you need to exclude bottom halves and other CPUs, use spin_lock_bh/spin_unlock_bh. If you do not need to exclude other CPUs, only irqs, use local_irq_disable/local_irq_enable. If you do not need to exclude other CPUs, only bottom halves, use local_bh_disable/local_bh_enable. And so on. It is worth pointing out that, for one and the same piece of data, different kernel execution paths may use different forms (see the example below).
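Before getting to that kernel example, a minimal sketch may help show how the two levels fit together: a semaphore serializing whole operations at the process level, and a spinlock protecting the small piece of state that an interrupt handler also touches. Everything here (the mydev driver, its variables and its functions) is invented purely for illustration, assuming 2.4-era interfaces:

    #include <linux/sched.h>
    #include <linux/errno.h>
    #include <linux/spinlock.h>
    #include <asm/semaphore.h>

    /* Hypothetical driver state, for illustration only. */
    static DECLARE_MUTEX(mydev_sem);           /* process-level mutual exclusion */
    static spinlock_t mydev_lock = SPIN_LOCK_UNLOCKED;
    static int mydev_pending;                  /* also touched by the irq handler */

    /* Process context: the path may sleep, so it takes the semaphore. */
    int mydev_write(void)
    {
        unsigned long flags;

        if (down_interruptible(&mydev_sem))    /* may context switch here */
            return -ERESTARTSYS;

        /*
         * This data is shared with the local irq handler as well as
         * with other CPUs, so the _irqsave form is the right one:
         * it keeps the handler from interrupting us on this CPU
         * while we hold the lock.
         */
        spin_lock_irqsave(&mydev_lock, flags);
        mydev_pending++;
        spin_unlock_irqrestore(&mydev_lock, flags);

        up(&mydev_sem);
        return 0;
    }

    /*
     * Interrupt context: no process context, so no semaphore and no
     * sleeping.  The process-context path only holds the lock with
     * local interrupts disabled, so this handler can never interrupt
     * a lock holder on its own CPU; the plain form, which excludes
     * only other CPUs, is enough.
     */
    void mydev_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    {
        spin_lock(&mydev_lock);
        mydev_pending--;
        spin_unlock(&mydev_lock);
    }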
Now for the example. In the interrupt code there is an array variable irq_desc[] of type irq_desc_t; each element of the array is the descriptor of one irq and holds, among other things, that irq's handler. The irq_desc_t structure contains a spinlock that keeps accesses to it (modifications) mutually exclusive. For a given element irq_desc[irq] there are two kernel execution paths that touch it: one installs the irq's handler (setup_irq), which normally happens when a module is initialized or when the system is initialized; the other is the interrupt response function (do_IRQ). The code is as follows:

    int setup_irq(unsigned int irq, struct irqaction * new)
    {
        int shared = 0;
        unsigned long flags;
        struct irqaction *old, **p;
        irq_desc_t *desc = irq_desc + irq;

        /*
         * Some drivers like serial.c use request_irq() heavily,
         * so we have to be careful not to interfere with a
         * running system.
         */
        if (new->flags & SA_SAMPLE_RANDOM) {
            /*
             * This function might sleep, we want to call it first,
             * outside of the atomic block.
             * Yes, this might clear the entropy pool if the wrong
             * driver is attempted to be loaded, without actually
             * installing a new handler, but is this really a problem,
             * only the sysadmin is able to do this.
             */
            rand_initialize_irq(irq);
        }

        /*
         * The following block of code has to be executed atomically
         */
[1]     spin_lock_irqsave(&desc->lock, flags);
        p = &desc->action;
        if ((old = *p) != NULL) {
            /* Can't share interrupts unless both agree to */
            if (!(old->flags & new->flags & SA_SHIRQ)) {
[2]             spin_unlock_irqrestore(&desc->lock, flags);
                return -EBUSY;
            }
            /* add new interrupt at end of irq queue */
            do {
                p = &old->next;
                old = *p;
            } while (old);
            shared = 1;
        }

        *p = new;

        if (!shared) {
            desc->depth = 0;
            desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING);
            desc->handler->startup(irq);
        }
[3]     spin_unlock_irqrestore(&desc->lock, flags);

        register_irq_proc(irq);
        return 0;
    }

    asmlinkage unsigned int do_IRQ(struct pt_regs regs)
    {
        /*
         * We ack quickly, we don't want the irq controller
         * thinking we're snobs just because some other CPU has
         * disabled global interrupts (we have already done the
         * INT_ACK cycles, it's too late to try to pretend to the
         * controller that we aren't taking the interrupt).
         *
         * 0 return value means that this irq is already being
         * handled by some other CPU. (or is disabled)
         */
        int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code */
        int cpu = smp_processor_id();
        irq_desc_t *desc = irq_desc + irq;
        struct irqaction * action;
        unsigned int status;

        kstat.irqs[cpu][irq]++;
[4]     spin_lock(&desc->lock);
        desc->handler->ack(irq);
        /*
           REPLAY is when Linux resends an IRQ that was dropped earlier
           WAITING is used by probe to mark irqs that are being tested
         */
        status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
        status |= IRQ_PENDING; /* we _want_ to handle it */

        /*
         * If the IRQ is disabled for whatever reason, we cannot
         * use the action we have.
         */
        action = NULL;
        if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
            action = desc->action;
            status &= ~IRQ_PENDING; /* we commit to handling */
            status |= IRQ_INPROGRESS; /* we are handling it */
        }
        desc->status = status;

        /*
         * If there is no IRQ handler or it was disabled, exit early.
           Since we set PENDING, if another processor is handling
           a different instance of this same irq, the other processor
           will take care of it.
         */
        if (!action)
            goto out;

        /*
         * Edge triggered interrupts need to remember
         * pending events.
         * This applies to any hw interrupts that allow a second
         * instance of the same irq to arrive while we are in do_IRQ
         * or in the handler. But the code here only handles the _second_
         * instance of the irq, not the third or fourth. So it is mostly
         * useful for irq hardware that does not mask cleanly in an
         * SMP environment.
         */
        for (;;) {
[5]         spin_unlock(&desc->lock);
            handle_IRQ_event(irq, &regs, action);
[6]         spin_lock(&desc->lock);
            if (!(desc->status & IRQ_PENDING))
                break;
            desc->status &= ~IRQ_PENDING;
        }
        desc->status &= ~IRQ_INPROGRESS;
    out:
        /*
         * The ->end() handler has to deal with interrupts which got
         * disabled while the handler was running.
         */
        desc->handler->end(irq);
        spin_unlock(&desc->lock);

        if (softirq_pending(cpu))
            do_softirq();
        return 1;
    }
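Note the contrast between [1] and [4]. setup_irq() runs in process context with interrupts enabled, and the same desc->lock is taken by do_IRQ(), so it must use the _irqsave form: otherwise an interrupt arriving on the same CPU while the lock is held would deadlock spinning on it. do_IRQ(), on the other hand, is entered through an interrupt gate with local interrupts already disabled, so the plain spin_lock at [4] only has to exclude other CPUs. And at [5] the lock is dropped before handle_IRQ_event() is called, then retaken at [6], precisely because the handlers may run for a while and the critical section must stay short, just as the principle above demands.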

(Source: http://www.sheup.com)

