Thursday, October 25, 2012

Error Correcting Code, Error Detection and Correction, Checksum

Error Detection and Correction (EDC)

Engineering is a beautiful field: brilliant minds keep finding elegant solutions to difficult problems, and error detection and correction is one such example. Since you have reached this post, try to devise a strategy for EDC yourself before reading on. Error detection is quite easy, and you should be able to come up with a primitive solution for it. Error correction, on the other hand, is a little tricky. Either way, I will explain one simple strategy here.

Definition (WikiPedia)
  • Error detection (say ED) is the detection of errors caused by noise or other impairments during transmission from the transmitter to the receiver.
  • Error correction(say EC) is the detection of errors and reconstruction of the original, error-free data.
So the problems we are trying to solve here are: 1) find whether there are incorrect bits in a byte stream (ED), and 2) find which bit is incorrect and correct it (EC).

Error Detection using Parity Check
The strategy is to intentionally add a few extra bit(s) to the original message, which help us check the sanity of the message at the other end.
The extra bits are called parity bits.
Original message + parity is called the codeword.

Single parity check code
The simplest strategy is called the single parity check code (SPC). The SPC adds a single extra bit to the binary message; the value of this parity bit depends on the bits in the message. In an even parity code, the bit added to each message ensures an even number of 1s in every codeword.
e.g. let's say the message (7 bits) is 1000 110. It has three 1 bits, so the parity bit would be 1 to make the total number of 1s even, i.e. 4.
The codeword would be 1000 1101.
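To make this concrete, here is a minimal C sketch (the helper names spc_encode/spc_check are mine, not from any standard library) that appends an even parity bit to a 7-bit message and verifies a received 8-bit codeword:

#include <stdint.h>
#include <stdio.h>

/* Count the 1 bits in the low 'nbits' bits of v */
static int count_ones(uint8_t v, int nbits)
{
    int ones = 0;
    for (int i = 0; i < nbits; i++)
        ones += (v >> i) & 1;
    return ones;
}

/* Append an even parity bit to a 7-bit message (bit 0 becomes the parity) */
uint8_t spc_encode(uint8_t msg7)
{
    uint8_t parity = count_ones(msg7, 7) & 1;    /* 1 if the message has an odd number of 1s */
    return (uint8_t)((msg7 << 1) | parity);      /* codeword now has an even number of 1s */
}

/* Returns 1 if the 8-bit codeword passes the even parity check */
int spc_check(uint8_t codeword)
{
    return (count_ones(codeword, 8) & 1) == 0;
}

int main(void)
{
    uint8_t cw = spc_encode(0x46);                 /* message 1000110 -> codeword 10001101 */
    printf("codeword ok: %d\n", spc_check(cw));         /* prints 1 */
    printf("flipped ok:  %d\n", spc_check(cw ^ 0x10));  /* prints 0: single bit flip detected */
    return 0;
}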

Mathematically
c = [c1 c2 c3 c4 c5 c6 c7 c8],
where each ci is either 0 or 1, and every codeword satisfies the constraint
c1 ⊕ c2 ⊕ c3 ⊕ c4 ⊕ c5 ⊕ c6 ⊕ c7 ⊕ c8 = 0

symbol ⊕ is modulo 2 addition.

You must be getting it now: if the message was sent through a faulty channel and you received
1001 1101
the number of 1s is 5, which is odd, meaning there is an error in this message.

Yeah, I get it, you are already jumping the gun and asking me
1) What if more than one bit got flipped?
2) How do we find which bit got flipped?

The answer to both questions is: you DON'T. Remember, I told you this is a very primitive strategy. You have to add more redundant and sophisticated parity bits.

Below I will explain a strategy that can detect and correct a single-bit error.

Encoding and Constraint Matrix
Let's say we have message bits c1, c2 and c3,
and we are adding three parity bits
c4, c5, c6.

We define c4, c5, c6 in such a way that

c4 = c1 ⊕ c2
c5 = c2 ⊕ c3
c6 = c1 ⊕ c2 ⊕ c3
These conditions for defining/choosing the parity bits are called constraints.

Using the constraints above the message 110 produces the parity-check bits
c4 = 1 ⊕ 1 = 0,
c5 = 1 ⊕ 0 = 1,
c6 = 1 ⊕ 1 ⊕ 0 = 0,
and so the codeword for this message is c = [1 1 0 0 1 0].
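As a quick illustration, here is a hedged C sketch of this little encoder (the function name and array layout are mine) that builds the 6-bit codeword from the three message bits:

#include <stdint.h>

/* Encode 3 message bits m[0..2] (c1..c3, each 0 or 1) into a 6-bit codeword c[0..5],
   using the constraints c4 = c1^c2, c5 = c2^c3, c6 = c1^c2^c3 */
void encode3(const uint8_t m[3], uint8_t c[6])
{
    c[0] = m[0];                 /* c1 */
    c[1] = m[1];                 /* c2 */
    c[2] = m[2];                 /* c3 */
    c[3] = m[0] ^ m[1];          /* c4 = c1 XOR c2 */
    c[4] = m[1] ^ m[2];          /* c5 = c2 XOR c3 */
    c[5] = m[0] ^ m[1] ^ m[2];   /* c6 = c1 XOR c2 XOR c3 */
}

/* e.g. message {1,1,0} yields codeword {1,1,0,0,1,0}, as worked out above */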

The above can be put in matrix form as

[c1 c2 c3 c4 c5 c6] = [c1 c2 c3] G

               | 1 0 0 1 0 1 |
where    G =   | 0 1 0 1 1 1 |                              --------- Equation 1
               | 0 0 1 0 1 1 |


The matrix G here is called the Generator matrix: multiplying the message bits by G produces the codeword. The three constraints themselves can also be collected into a second matrix, the parity-check matrix H, with one row per constraint:

               | 1 1 0 1 0 0 |
         H =   | 0 1 1 0 1 0 |
               | 1 1 1 0 0 1 |

Suppose y is the received codeword. For an error-free transmission every constraint holds, i.e.

         H yT = [0 0 0]T                                     --------- Equation 2

(yT denotes the transpose of y, written as a column.) If the result is not all zero, the corresponding parity-check constraint is not satisfied.

The message bits are conventionally labeled by u = [u1, u2, · · · uk], where the vector u holds the
k message bits. Thus the codeword c corresponding to the binary message
u = [u1u2u3] can be found using the matrix equation c = uG.

Substituting each of the 8 distinct messages c1 c2 c3 = 000, 001, . . . , 111
into Equation 1 yields the full set of codewords for this code:
[0 0 0 0 0 0] [0 0 1 0 1 1] [0 1 0 1 1 1] [0 1 1 1 0 0]
[1 0 0 1 0 1] [1 0 1 1 1 0] [1 1 0 0 1 0] [1 1 1 0 0 1]   --------------------- Equation3


Error Detection and Correction

Suppose a codeword is sent through a binary channel and one bit gets corrupted. This section shows how to find and correct that bit.

The codeword c = [1 0 1 1 1 0] from the code above was sent through a binary symmetric channel and the string y = [1 0 1 0 1 0] was received.



Substituting into Equation 2:

         H yT = [1 0 0]T

This result is non-zero, which means a bit flip must have occurred. The resulting vector is called the syndrome of y.

The syndrome tells us which parity-check constraints are not satisfied.

Error Detection/Correction
The syndrome above tells us that the first parity-check constraint is not satisfied. Since this parity-check equation involves the 1st, 2nd and 4th codeword bits, we can conclude that at least one of these three bits has been inverted by the channel.
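Here is a hedged C sketch (names are mine) that computes the syndrome for this particular 6-bit code by evaluating the three parity-check constraints directly:

#include <stdint.h>

/* Compute the 3-bit syndrome of a received 6-bit word y[0..5] for the code above.
   Each syndrome bit is one parity-check constraint evaluated on y. */
void syndrome3(const uint8_t y[6], uint8_t s[3])
{
    s[0] = y[0] ^ y[1] ^ y[3];          /* c1 XOR c2 XOR c4 */
    s[1] = y[1] ^ y[2] ^ y[4];          /* c2 XOR c3 XOR c5 */
    s[2] = y[0] ^ y[1] ^ y[2] ^ y[5];   /* c1 XOR c2 XOR c3 XOR c6 */
}

/* For y = {1,0,1,0,1,0} (codeword {1,0,1,1,1,0} with the 4th bit flipped)
   this yields s = {1,0,0}: only the first constraint fails. */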

Hamming Distance
The Hamming distance between two codewords is defined as the number of bit positions in which they differ. For example, the codewords [1 0 1 0 0 1 1 0] and [1 0 0 0 0 1 1 1] differ in two positions, the third and the eighth, so the Hamming distance between them is two. The measure of the ability of a code to detect errors is the minimum Hamming distance, or just minimum distance, of the code. The minimum distance of a code, dmin, is defined as the smallest Hamming distance between any pair of codewords in the code. For the code in the example above, dmin = 3, so only the corruption of three or more bits in a codeword can result in another valid codeword.


A Simple Decoder to Find the Corrupted Bit / Maximum Likelihood Decoder
To go further and correct bit-flip errors, the decoder must determine which codeword was most likely to have been sent. Based only on the binary received string y, the best decoder chooses the codeword closest in Hamming distance to y. When there is more than one codeword at the minimum distance from y, the decoder randomly chooses one of them. This decoder is called the maximum likelihood (ML) decoder, as it always chooses the codeword most likely to have produced y.

In the example above we detected that the received string y = [1 0 1 0 1 0] was not a codeword of the code. By comparing y with each of the codewords, the ML decoder will choose c = [1 0 1 1 1 0] as the closest codeword, as it is the only codeword at Hamming distance 1 from y.
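The brute-force ML decoder for this tiny code can be sketched in a few lines of C (again, the function names are mine); it simply scans the 8 codewords from Equation 3 and keeps the one closest in Hamming distance to y:

#include <stdint.h>

/* Hamming distance between two 6-bit words stored one bit per byte */
static int hamming6(const uint8_t a[6], const uint8_t b[6])
{
    int d = 0;
    for (int i = 0; i < 6; i++)
        d += (a[i] != b[i]);
    return d;
}

/* The 8 valid codewords of the code (Equation 3) */
static const uint8_t codebook[8][6] = {
    {0,0,0,0,0,0}, {0,0,1,0,1,1}, {0,1,0,1,1,1}, {0,1,1,1,0,0},
    {1,0,0,1,0,1}, {1,0,1,1,1,0}, {1,1,0,0,1,0}, {1,1,1,0,0,1}
};

/* Maximum likelihood decoding: return the index of the codeword closest to y.
   (Ties are broken by taking the first match; a real ML decoder may pick randomly.) */
int ml_decode(const uint8_t y[6])
{
    int best = 0, best_d = 7;
    for (int i = 0; i < 8; i++) {
        int d = hamming6(y, codebook[i]);
        if (d < best_d) { best_d = d; best = i; }
    }
    return best;   /* for y = {1,0,1,0,1,0} this returns index 5: {1,0,1,1,1,0} */
}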

This is just one strategy, and you can still see that:
1) The example code above can only correct a single bit flip, i.e. errors at Hamming distance 1 from a valid codeword.
2) Far better algorithms exist; this post is only meant to give a feel for how things can work. You can check LDPC on Wikipedia (reference below).




References
1) http://www.cs.utexas.edu/~eberlein/cs337/errorDetection3.pdf
2) http://en.wikipedia.org/wiki/Error_detection_and_correction
3) http://en.wikipedia.org/wiki/Low-density_parity-check_code


Monday, October 15, 2012

6. What are Semaphores, RTOS Semaphores, MQX Semaphores

I will try to explain:

What are semaphores?
What do you need to take care of if you want to implement your own semaphores?
Though I have labeled this post as MQX, the description is mostly in general terms.


Semaphores

When They are used
Semaphores are used for two purposes
1) Process Synchronization
2) Critical Section problem / Mutual Exclusion

Is there any alternative to semaphores?
Yes, there are a few solutions known as n-process synchronization (software) solutions, and there are hardware solutions. Hardware solutions become complicated as the number of processes increases, and secondly, all of these involve busy waiting (meaning you are eating CPU cycles while waiting and no other productive work can be done during that time). So practically they are never used.

What are semaphores?
They are a synchronization tool. In very abstract terms, a semaphore is a global variable which, apart from initialization, can be modified by only two operations: wait() and signal().

Suppose S is a semaphore,
where wait is as simple as
wait(S)
{
     while (S <= 0)
         ; //busy wait: do nothing until S becomes positive
     S--;  //take the semaphore
}

signal(S)
{
     S++;  //release the semaphore
}

Example of Usage
1) n-process critical section problem
Structure every process like this:
do {
       wait(mutex);
       critical section;
       signal(mutex);
       remainder section;
} while(1);

2) Example of using semaphores for Synchronization
Let's say two concurrent processes P1 and P2 have statements S1 and S2, and we want S1 to always execute first. This can be achieved easily by
initializing Sem = 0;

In process P1
{
   // Execute what ever you want to do
   // before executing P2
   S1;
   signal(Sem);

}

In process P2
{
     wait(Sem);
     S2;
}
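On a desktop OS you can try the same pattern with POSIX semaphores. Here is a minimal sketch (this is pthreads/POSIX, not MQX, and is only meant to illustrate the idea):

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t sem;   /* initialized to 0, so P2 must wait for P1 */

static void *P1(void *arg)
{
    printf("S1: runs first\n");    /* statement S1 */
    sem_post(&sem);                /* signal(Sem) */
    return NULL;
}

static void *P2(void *arg)
{
    sem_wait(&sem);                /* wait(Sem): blocks until P1 signals */
    printf("S2: runs after S1\n"); /* statement S2 */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    sem_init(&sem, 0, 0);                 /* shared between threads, initial value 0 */
    pthread_create(&t2, NULL, P2, NULL);  /* start P2 first on purpose */
    pthread_create(&t1, NULL, P1, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&sem);
    return 0;
}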


Why not use global variables instead of semaphores to synchronize?

If the world were perfect, global variables could be used: you could write the functions above in your C code to create semaphores. But (as is the case with the world anyway) it is not that simple. In addition to the above, we need to ensure that:

1) Modification of the semaphore value S is done exclusively.
2) The check while(S <= 0) in wait and the possible modification S-- are also done without interruption.
3) The implementation above also has busy waiting.

How do OS/RTOS implement semaphores? What is a spin lock?

The very basic implementation I showed above has one problem: busy waiting. This is not much appreciated in real-time environments, where wasting CPU is a crime, roughly equivalent to murdering someone (i.e. murdering in a civilized society). Semaphores of this type are called spin-lock semaphores.

But spin-lock semaphores have an advantage too: no context switch is required when a process waits on the semaphore. Hence, if the waiting time is minimal, this approach is very helpful.

How to overcome busy waiting
You can modify the wait and signal functions: whenever you have to wait, instead of looping you block the current process. block is a very common system call, present in every operating system.
In signal you simply call wakeup() for that process.
You also have to maintain a queue of all blocked processes.

There is one more important point. On wakeup() you are simply putting the process on the ready queue; it is up to the CPU scheduler's scheduling policy when that process will actually run.
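A rough sketch of such a blocking semaphore, in the same pseudo-C style as above. Here block(), wakeup(), enqueue()/dequeue(), current_process() and the queue_t/process_t types stand in for whatever primitives your kernel actually provides:

typedef struct {
    int      value;      /* count; when negative, |value| processes are waiting */
    queue_t  waiters;    /* queue of blocked processes (placeholder type) */
} semaphore_t;

void wait(semaphore_t *S)
{
    S->value--;
    if (S->value < 0) {
        enqueue(&S->waiters, current_process());  /* remember who is waiting */
        block();                                  /* give up the CPU, no busy loop */
    }
}

void signal(semaphore_t *S)
{
    S->value++;
    if (S->value <= 0) {
        process_t *p = dequeue(&S->waiters);      /* pick one blocked process */
        wakeup(p);                                /* it goes onto the ready queue */
    }
}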


Critical section problem of the semaphore implementation
The critical aspect of a semaphore implementation is that wait and signal must execute atomically: we must guarantee that no two processes can execute wait and signal operations on the same semaphore at the same time. This is itself a critical section problem and can be solved in two ways.

1) In a uni-processor environment, we can simply inhibit (disable) interrupts while the wait and signal operations are executing. This works very well on a uni-processor because, once interrupts are inhibited, no one, I mean literally no one (even 'the one' from The Matrix), can ask the CPU to execute something else. A sketch of this approach follows the list below.

2) In a multiprocessor environment, inhibiting interrupts does not work: instructions from different processors may be interleaved in arbitrary ways. Nor can we employ the simple software solutions (not discussed here) for the critical section problem.
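A minimal sketch of approach (1), reusing the blocking semaphore sketch above; disable_interrupts()/restore_interrupts() are made-up names for whatever interrupt-masking primitive your CPU/RTOS provides:

/* Uni-processor only: make wait() atomic by masking interrupts around it */
void wait(semaphore_t *S)
{
    unsigned int saved = disable_interrupts();    /* nothing can preempt us now */

    S->value--;
    if (S->value < 0) {
        enqueue(&S->waiters, current_process());
        block();    /* the scheduler switches tasks; a real kernel re-enables
                       interrupts as part of that context switch */
    }
    restore_interrupts(saved);
}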

I again emphasize what I said earlier: MQX gives you two types of implementation, one of which is called the lightweight implementation. When you use the lightweight implementation, MQX forgets about the complexity of a multiprocessor environment.


Saturday, October 13, 2012

Align a pointer to a 4 byte boundary, Align a pointer to a word boundary

Many a time you need to move a pointer to the next word-aligned location, because the pointer you have been given is not word aligned.


#include <stdint.h>

typedef uint8_t uint8;     //or use the uint8 typedef from your RTOS/compiler headers

uint8 alignBoundary = 4;   //can be 2 or 8 too (must be a power of 2)

uint8 *SomeFun(uint8 *ptr)
{
        uint8 misAlignBytes;
        uint8 shiftRequired;

        //a pointer cannot be AND-ed directly, so cast it to an integer type first
        misAlignBytes = (uint8)((uintptr_t)ptr & (alignBoundary - 1));
        //e.g. in case of 4 byte alignment the above is equivalent to ptr & 0x03

        shiftRequired = (misAlignBytes ? (uint8)(alignBoundary - misAlignBytes) : 0);

        ptr = ptr + shiftRequired;

        //that's it, the pointer is now word aligned
        //return it, since changing the local parameter alone never reaches the caller
        return ptr;
}
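A quick usage sketch (the buffer here is just an example):

uint8 buffer[64];
uint8 *p = buffer + 3;      //deliberately misaligned
p = SomeFun(p);             //p now points at the next 4-byte boundary
//p can now be used for word-sized accesses without worrying about misalignment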



 

Friday, October 12, 2012

5. MQX RTOS Handling Interrupts -2


How to install an application ISR?

By calling _int_install_isr(). MQX updates its interrupt vector table appropriately and calls your ISR handler whenever your interrupt occurs.

You might have guessed from my previous post that when installing an ISR you have to supply (a hedged sketch follows the list below):
  • Interrupt Number
  • ISR function pointer
  • data pointer
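As an illustration, here is roughly how such an installation might look in application code. The vector number, the device struct and the exact parameter types below are my assumptions; check the _int_install_isr prototype in the MQX Reference Manual before copying any of this:

#include <mqx.h>    /* MQX kernel header */

#define MY_UART_VECTOR   45                               /* assumed, board-specific vector number */

typedef struct { volatile int rx_ready; } my_device_t;    /* placeholder device state */

/* The application ISR: keep it short, just service the device / set an event */
static void my_uart_isr(void *isr_data)
{
    my_device_t *dev = (my_device_t *)isr_data;   /* the data pointer we registered */
    dev->rx_ready = 1;                            /* e.g. flag the task, clear the HW flag, ... */
}

void install_my_isr(my_device_t *dev)
{
    /* arguments, as listed above: interrupt number, ISR function pointer, data pointer */
    _int_install_isr(MY_UART_VECTOR, my_uart_isr, dev);
}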
What are the most frequent things done in an ISR?
  • setting an event bit
  • posting a semaphore, a message queue, etc.
  • dequeuing a task from a task queue
What does MQX not allow you to place in an ISR?
Basically, MQX has some intelligence built in to stop you from doing something stupid inside an ISR; if you try to perform these illegal operations, MQX returns an error. You
  • cannot create/destroy a semaphore/message queue/event
  • cannot wait for an event inside an ISR
  • to cut to the chase, cannot create any kernel object inside an ISR
What should you not call inside an ISR?
  • no I/O functions, as they may take unpredictable time
  • never call _int_default_isr
  • any wait-event function
  • any mutex lock operation
  • never create/destroy a task

The next topic explains semaphores in detail. Click here

Thursday, October 11, 2012

4. MQX RTOS Handing Interrupts -1



MQX handles hardware interrupts and exceptions with interrupt service routines (ISRs). Please note the phrase 'exceptions are handled by ISRs'; in due course I will discuss my experience with a very interesting problem involving an exception ISR.

What is an ISR? What goes inside an ISR?
An ISR is not a task. It is a small piece of code which is executed whenever an interrupt occurs. The main functionality of an ISR may include
a)      servicing a device
b)      clearing an error condition
c)      signaling a task

MQX allows you to pass a parameter to the ISR handler when the application installs the ISR. This parameter is a pointer. Please note that this pointer should not live on a task's stack, as the ISR can be called even before the task is created.

People keep saying an ISR should be short. That makes a lot of sense, because while an ISR runs other interrupts may be disabled, and you definitely do not want to miss interrupts. Are you asking for an example?
Hmm, let's say that whenever an aeroplane's engine gets too hot, a cooling procedure should be initiated immediately. This is how it would work: a sensor triggers a hardware interrupt, and the ISR asks your application to run the Cooling task. But wait, your application is handling a previous ISR which has disabled other interrupts, and you are playing the 'print Hello world a zillion times' game there in the ISR (just kidding!). So you would miss the interrupt. (And let's hope you are not flying on that plane.)


Back to serious stuff.

What is the kernel ISR?
MQX provides a kernel ISR (_init_kernel_isr()) which is written in assembly language. The kernel ISR runs before any other ISR, and it
a)      saves the context of the active task,
b)      switches to the interrupt stack,
c)      calls the appropriate ISR,
d)      after the ISR has returned, restores the context of the highest-priority ready task.
When MQX starts, it installs the kernel ISR for all interrupts.

Initializing interrupt handling / Installing interrupts and the default ISR
When MQX starts, it initializes its ISR vector table, which has an entry for every interrupt. Each entry consists of a) a pointer to the ISR, b) the data to pass as its parameter, and c) the default exception handler.

Initially the ISR for each entry is '_int_default_isr()'.
All of this is performed by an MQX kernel function called _int_init(), which loops over all interrupts and does what I mentioned above.
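Conceptually, you can picture each vector table entry as something like the struct below; this is purely my illustration of the three fields just mentioned, not MQX's actual internal definition:

typedef void (*isr_fptr_t)(void *);

typedef struct {
    isr_fptr_t  isr;            /* a) pointer to the ISR, initially _int_default_isr */
    void       *isr_data;       /* b) data passed to the ISR as its parameter */
    isr_fptr_t  exception_isr;  /* c) default exception handler for this entry */
} vector_table_entry_t;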

continued in next post. Click here

Good article on ARM exception handling
http://www.csie.nctu.edu.tw/~wjtsai/EmbeddedSystemDesign/Ch3-1.pdf 

MQX RTOS Tutorial, MQX Tutorial, RTOS Tutorial

Wednesday, October 10, 2012

3. MQX Events. Event Handling, Building an Event based application

Events



Events can be used to synchronize tasks. To keep it simple, you can imagine an event as one bit in a 32-bit integer (typically the word size of the processors MQX runs on).
So this is how it would happen:
a) Somebody-1 is waiting for this bit to be set; until the bit is set, somebody-1 is stuck.
b) Somebody-2 sets that bit so that somebody-1 can proceed further.
So in a crude, layman's sense, events in an RTOS are nothing but the setting/clearing of one bit.

Now let's add a few more technicalities.

Somebody-1 above is normally a task that is waiting for something, and somebody-2 could be another task or an ISR.
In fact, many embedded applications work this way: an ISR gets triggered and, since ISR handlers should be very small, it normally just sets an event and exits. This serves both purposes: the ISR finishes quickly, and the waiting logic of your application stays out of the ISR. (This is in fact the most widely used approach for designing ISR/task based applications; a sketch of the pattern follows below.)
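A sketch of that pattern in pseudo-C. The event_group_t type and the event_set/event_wait names are deliberately generic placeholders, not the real MQX calls (which I am avoiding on purpose in this post):

#define RX_DATA_READY  (1u << 0)        /* one event = one bit in the event group */

event_group_t uart_events;              /* placeholder event-group object */

/* ISR side: as short as possible, just record that something happened */
void uart_rx_isr(void *data)
{
    /* ...acknowledge/clear the hardware interrupt... */
    event_set(&uart_events, RX_DATA_READY);      /* somebody-2 sets the bit */
}

/* Task side: all the real work lives here, outside the ISR */
void uart_task(void)
{
    for (;;) {
        event_wait(&uart_events, RX_DATA_READY); /* somebody-1 blocks, not spinning */
        process_received_bytes();                /* hypothetical application function */
    }
}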

Why not just set/clear bits yourself instead of using events? What extra does RTOS event handling add beyond setting and clearing bits?

a)      First and foremost, when the RTOS sets or clears an event bit, that part of the kernel code is atomic and safe: as with other parts of the kernel, interrupts are disabled while the bit is being updated.
      (Imagine you try to mimic events yourself: you are about to clear the bit and suddenly an ISR comes along and sets the same bit. Well, that would be it for your logic.)
If you have access to the MQX source, it is a fun exercise to trace the implementation of set/get event.

b)      Option to wait for a single event or for many events in a single call.
_event_wait_all -> with this one, your wait call is blocked until all the events (bits) you specified in the mask are set.
_event_wait_any -> unblocks if any of the events (bits) specified in the mask is triggered.

c)       An RTOS has data structures called the run (ready) queue and the blocked queue, so an event wait call need not be an infinite loop. If you were implementing events yourself, think about what your WaitEvent call would look like: without those data structures it would have to be an infinite loop.
      The RTOS has that information (run and blocked queues), so when it hits an event wait call it puts the task on the blocked list; when the event arrives, it moves the task back to the ready queue.
      Infinite loops are a big NO in embedded systems, because the CPU would just be spinning and consuming power.

d)      Option to unblock all waiting tasks. Suppose three tasks are waiting for an event and the event arrives; MQX has an option which automatically clears the event as soon as it arrives and unblocks all the tasks.

e)      You can also pass a timeout parameter to the wait-event calls, meaning you will be blocked for at most x milliseconds.


By the way, MQX comes with two types of event handling: lightweight event handling and (let's say) normal event handling. The lightweight APIs are targeted at single-processor systems, while the other kind can work across more than one processor. Remember, in the first MQX post I mentioned that MQX can handle more than one processor.


PS: I have intentionally avoided listing the API names; the thinking behind this is that the concepts matter more than the API names.


Continue reading about events here

One interesting MQX problem I faced can be found here