( work in progress while I try to understand it ) ( also, much of the following is incomplete in ELKS ) ( add info on fork and exec )
Each task is described by a struct task_struct, declared in <linuxmt/sched.h>, and the tasks live in the array task[]. The currently-executing task is pointed to by current. The most important members of struct task_struct are discussed below.
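As a rough illustration only - the field names below are invented for the sketches in the rest of this page, and the authoritative declaration is in <linuxmt/sched.h> - a task structure along these lines:

    /* Illustrative sketch: invented field names, NOT ELKS's real struct.
     * See <linuxmt/sched.h> for the authoritative declaration. */
    #define NR_TASKS     16
    #define KSTACK_BYTES 1024

    struct task_struct {
        int state;                    /* TASK_RUNNING, TASK_INTERRUPTIBLE, ... */
        int pid;                      /* process id */
        int counter;                  /* remaining time slice (see need_resched) */
        unsigned short t_ss, t_sp;    /* saved user stack pointer (SS:SP) */
        char t_kstack[KSTACK_BYTES];  /* this task's private kernel stack */
    };

    struct task_struct task[NR_TASKS];        /* the task table */
    struct task_struct *current = &task[0];   /* the running task */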
Q. Why have separate kernel and user stacks?
A. Because when kernel code executes it assumes SS = DS = its
own data segment. Any modern processor has a separate system stack
pointer; the 8086 doesn't, so the switch has to be done in software.
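The switch itself is done in assembly, but conceptually the kernel entry path does something like the following. This is a hypothetical C rendering: get_ss, get_sp, set_stack and kernel_ds do not exist by these names, they stand in for a few mov instructions.

    /* Hypothetical rendering of the stack switch on kernel entry. */
    void kernel_entry_idea(void)
    {
        current->t_ss = get_ss();       /* remember the user SS:SP */
        current->t_sp = get_sp();
        /* point SS:SP at this task's kernel stack, inside the kernel
         * data segment, so that SS == DS holds for kernel C code */
        set_stack(kernel_ds, current->t_kstack + KSTACK_BYTES);
    }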
Q. Why do we have a separate kernel stack per task?
A. Because a context switch can occur in kernel code, if it
calls schedule() - although preemptive timeslicing does not occur
when kernel code is executing, nor during interrupts.
Q. When does timeslicing occur?
A. At the moment, on return from an interrupt, if we were in
userland when the interrupt occurred. In Linux, it can also occur
when returning from a system call.
The state member takes one of the following values:

TASK_RUNNING
    The task is eligible to run. It might not actually be running
    because of timeslicing, but it will continue when its turn comes
    again.

TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE
    The task is "sleeping", and won't wake up until another process
    puts it back to TASK_RUNNING, usually as a result of some external
    condition changing. The difference between the two is that a task
    in TASK_INTERRUPTIBLE will also be woken up if a signal arrives.

TASK_STOPPED
    A process is stopped when it receives a certain signal (SIGSTOP,
    SIGTSTP, SIGTTIN or SIGTTOU). It is restarted by sending it a
    SIGCONT.

TASK_UNUSED
    Indicates an empty slot in the task[] table.
If someone can give a more lucid and/or accurate description of the above please submit it!
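To make the table concrete, here is a guess at how the scheduler might scan for the next runnable task, using the sketch structure above. The constants' actual values and the real selection policy live in <linuxmt/sched.h> and the scheduler, not in this sketch.

    /* Assumed values; the real definitions are in <linuxmt/sched.h>. */
    #define TASK_RUNNING         0
    #define TASK_INTERRUPTIBLE   1
    #define TASK_UNINTERRUPTIBLE 2
    #define TASK_STOPPED         3
    #define TASK_UNUSED          4

    /* Round-robin: find the next TASK_RUNNING slot after 'cur',
     * wrapping around; fall back to 'cur' if nothing else can run. */
    static int next_runnable(int cur)
    {
        int i, n;
        for (i = 1; i <= NR_TASKS; i++) {
            n = (cur + i) % NR_TASKS;
            if (task[n].state == TASK_RUNNING)
                return n;
        }
        return cur;
    }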
The heart of the context switch looks like this:

    save_regs(current);
    current = &task[curnum];    /* choose a new task */
    load_regs(current);

save_regs saves the exact state which the kernel was in when it called this function. load_regs doesn't return to that point (because the program flow would cause load_regs to be run again, ad infinitum) - rather, it pops enough items off the stack to return to whoever called schedule().
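Putting the fragments together, a schedule() along these lines is plausible - again a sketch under the assumptions above, not the real routine:

    /* Hedged sketch of schedule(); the real ELKS code differs. */
    void schedule(void)
    {
        int next = next_runnable(current - task);

        save_regs(current);        /* freeze this task's kernel context */
        current = &task[next];
        load_regs(current);        /* unwinds the new task's kernel stack
                                    * back to whoever called schedule() */
    }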
schedule() can be called whenever the kernel wants to 'give up' its current timeslice - normally it would have set current->state=TASK_(UN)INTERRUPTIBLE first, otherwise it will be rescheduled at the next opportunity.
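So the usual "give up the CPU until something happens" idiom is two lines:

    current->state = TASK_INTERRUPTIBLE;   /* or TASK_UNINTERRUPTIBLE */
    schedule();                            /* resumes here after wake_up() */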
schedule() is also called at the end of the timer interrupt routine if the global flag need_resched is set (see arch/i86/kernel/irqtab.c). This is a bit hair-raising: the user's process is left hanging mid-timer-interrupt while another one continues! But when its turn comes again, the return from the timer interrupt completes.
need_resched is set if a process has used up its allotted time, which should also be calculated in the timer interrupt routine [not yet implemented].
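Presumably the accounting, once implemented, will look something like this in the timer handler. TIME_SLICE and the counter field are assumptions carried over from the struct sketch above:

    int need_resched = 0;         /* tested on the way out of the IRQ */
    #define TIME_SLICE 5          /* assumed quantum, in ticks */

    void do_timer_idea(void)
    {
        if (--current->counter <= 0) {    /* allotted time used up */
            current->counter = TIME_SLICE;
            need_resched = 1;             /* irqtab.c will call schedule() */
        }
    }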
Q. What if schedule() is called from a timer interrupt while the kernel is already executing schedule()?
A. This can't happen; the timer interrupt won't check need_resched unless user-level code is executing.
Note that hardware interrupt handlers are NOT allowed to call schedule! This means they can't sleep - they must run to completion and return.
The process which needs a resource calls sleep_on(q). sleep_on adds this process to the "wait queue" q, sets its state = TASK_UNINTERRUPTIBLE, and calls schedule(). This causes the process to sleep.
Later, another process which makes the resource available calls wake_up(q), which sets state=TASK_RUNNING for the sleeping process. This allows it to run; it continues at the point it got to in the sleep_on function (i.e. just after the call to schedule), where it removes itself from q and returns. Because q is actually a linked list, many processes can be asleep waiting for the same resource.
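For example, a serial driver might use the pair like this. rx_wait, rx_ready, rx_take and rx_store are invented for the illustration, and passing the queue head by pointer is an assumption about the sleep_on/wake_up signatures:

    struct wait_queue *rx_wait = NULL;     /* one queue per resource */

    int read_byte(void)
    {
        while (!rx_ready())                /* nothing buffered yet */
            sleep_on(&rx_wait);            /* sleep until the IRQ handler
                                            * wakes us */
        return rx_take();
    }

    void rx_interrupt(void)                /* may wake_up, must NOT sleep */
    {
        rx_store();
        wake_up(&rx_wait);
    }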
The functions that manipulate the wait queues protect themselves from being timesliced by disabling interrupts; this prevents corruption of the queue.
Cunningly, no malloc-style storage allocation is needed to create objects for a wait queue. sleep_on defines a local variable of type struct wait_queue - i.e. it is allocated on the kernel stack for that process. When sleep_on returns, which is when we don't need it any more, it vanishes. If multiple tasks are sleeping for the same resource, q will be a linked list of objects, one on each kernel stack.
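A sketch of that trick, with invented interrupt on/off helpers irq_off()/irq_on() standing in for the real primitives (the real ELKS sleep_on() differs in detail):

    struct wait_queue {
        struct task_struct *task;
        struct wait_queue *next;
    };

    void sleep_on(struct wait_queue **q)
    {
        struct wait_queue wait;       /* lives on THIS task's kernel stack */
        struct wait_queue **p;

        irq_off();                    /* keep the list consistent */
        wait.task = current;
        wait.next = *q;               /* push ourselves onto the queue */
        *q = &wait;
        current->state = TASK_UNINTERRUPTIBLE;
        irq_on();

        schedule();                   /* sleep; wake_up() brings us back here */

        irq_off();                    /* unlink our node again */
        for (p = q; *p != NULL; p = &(*p)->next)
            if (*p == &wait) {
                *p = wait.next;
                break;
            }
        irq_on();
    }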
Last updated: 1 September 1996