Saturday, January 19, 2013

Optimize memcpy(), Faster memcpy()

A primitive implementation of memcpy function would look like this:

 void basic_memcp(char *des, char *src, uint32 len)   
 {  
   uint32 i;  
   for(i=0;i<len;i++)  
   {  
    *des++ = *src++;  
   }  
 }  



Question is to improve on this.
Improvement can be  made in terms that, the above function is not doing overlap check of source and destination buffers.

But, we are looking on performance improvement in term of execution time.
Normally, CPU runs at much faster clock than the memory. Which means in system memory access is slowest operation. So, if we can reduce number of memory access during any operation, performance is bound to improve. 

How can we reduce number of memory accesses, its by transferring data in term of word size. Many system has word size as 4. So in above function, if instead of accessing data byte by byte, we access data in term of 4 bytes, execution time should reduce.

But, solution is not straight forward, both src and/or destination pointers can be unaligned. Many system do not support unaligned memory access. Even on the platform it is supported, its more costly than aligned memory access. So we have to make sure that all memory accesses are word aligned.

I have tried to solve one of such case, where destination address is assumed as word aligned and source address is  starting at 1 byte after the word boundry.
 /*Assumptions  
 a) destination is word aligned  
 b) source is at 3 bytes before next word aligned boundry  
 c) focusing more on main concept, avoiding case where len is less than 4  
 d) machine is little endian  
 */  
 void memcpy_1(char *des, char *src, uint32 len)  
 {  
   uint32 srcData;  
   uint32 desData;  
   uint32 tempLen=len/4;  
   uint8 i;  
   uint32 *desPtr32 = (uint32*)des;  
   uint32 *srcPtr32 = (uint32*)(src-1);  
   uint8 *desPtr8 ;  
   uint8 *srcPtr8 ;  
   srcData = *srcPtr32;  
   while(tempLen)  
   {  
    /* chuck out msb 8 bits, DO remember its Little Endian*/  
    desData = srcData >> 8;   
    srcData = *(++srcPtr32);  
    /*Fill the last byte, Again Little Endian*/  
    desData |= (srcData <<24);  
    *desPtr32++ = desData;  
    tempLen--;  
   }  
   desPtr8 = (uint8*)desPtr32;  
   srcPtr8 = (uint8*)srcPtr32;  
   srcPtr8++;  
   /*Cover the case for leftover from above loop*/  
   for(i=0; i<(len%4); i++)  
   {  
    *desPtr8++ = *srcPtr8++;  
   }  
 }  


It should be noted that there are other cases, where we have to align that particular case in particular way(e.g. source address is at 2 bytes from next aligned address and destination address is 1 byte from next aligned address etc,). But, the concept remains the same .

I have also written a test program(tested on visual studio) which will show the performance gain of this approach.  

File can be downloaded from here.


Notepad++ Reindent C++ code, Notepad++ TextFX option not available

Select the section of code and press below







If your notepad++ does not have this option, you need to download a plug in.

Download the plugin from here



 Save the dll file somewhere.

Open the notepad++ Click on menu as shown below







Browse the dll file. That's it. You are game.!
Happy Coding.



Friday, January 11, 2013

Difference between Monolithic and Microlithic kernels





Micro kernel and Monolithic kernels
As the name suggest mono means everything put together in one huge unit that is called the monolithic. Monolithic kernel contains all of the services like process management, file management, Scheduler, Virtual memory etc are in kernel space.

On the other hand, in Microlithic kernel only small portion of basic services(lets call them primitive services) are put into kernel space and remaining services goes to the user space. The user space facilities are called as servers.
When any request comes which cannot be served by the kernel( remember in cases of micro kernel, kernel space contains essential of essential services), the primitive kernel passes the message to the related server, which in turn would gets the requested work done.

Below are the main difference.

1) Memory management : In case of Mono kernel everything in the kernel space. Micro kernels only keeps the basic of the facility in kernel code and remaining is implemented in the user space.

2) Security and stability : Say if one of the process crashes in process management module; then, in case of monolithic kernel whole system goes down.
But in case of micro, we still have a chance that kernel will be active as most of the work is done in user address space. One more fact is that lesser the code in the kernel space easy its to maintain to write the tighter code.

3) I/O: Even in mono kernels, the concept of module loading/unloading was present. Prime example was device driver. But, its still hardwired with the kernel. So, it means the module working with the generation A mono kernel may not work with generation B mono kernel. You can load/unload a device driver but you cannot replace A with B.
Micro kernel makes it easy, As kernel simply passes the message to the server, and then its expects results. It doesn't matter to the primitive kernel how work is getting done in server code. Driver manger pass the information to the related driver module. You can simply replace the driver manger without affecting the system.

4) Extensibility and Portability :
Beside size the extensibility is the most important factor of micro kernel. Adding new features to a monolithic system means recompilation of the whole kernel, often including the whole driver infrastructure. e.g. If you have a new memory management routine and want to implement it into a monolithic architecture, modification of other parts of the system could be needed. In case of a micro kernel the services are isolated from each other through the message system. It is enough to re-implement the new memory manager. The processes which formerly used the other manager, do not notice the change. micro kernels also show their flexibility in removing features . That way a micro kernel can be the base of a desktop operating system and as well as of real time appliances in single chip systems.

Example
Mono Kernels : Linux
Micro : QNX the real time Operating system

Sunday, January 6, 2013

Heisenbug : Missing Event Problem, Interesting Embedded system problem


I was working on a embedded environment where different processors would interact with each other. Let call them Master and Slave1 and Slave2.

Normal flow was like this, Master processor based on the input would ask Slave1 and Slave2 to do the work. Master processor can queue up more than one job for either Slave1 or Slave2.

Master keeps tracks of how many jobs it asked for particular Slave and would expect same number of responses (RTOS event) from the slave.

When slave completes a job it sends a interrupt to the master. Interrupt would generate an RTOS event.

Here was the problem, during hour’s long stress testing; one of the test was failing. I took the failed test case and fired it as standalone test And it failed.

Now was the time to put breakpoint in debugger and check, but, hey, if I put break point everything seems to be working fine. This is classic example of Heisenberg.

Well, I don’t think there is any hard and fast rule to fix Heisenberg problem . You should have thorough understanding of system.

In my case, one ‘other’ module , which was called before the problematic modules, was leaving one of the unhanded RTOS event from Slave2. So during stress testing, even before Slave2 has completed the Job, Master was getting a response, and subsequently it thinks that Slave2 has finished the job. But, in actuality Slave2 was still processing the job.

When I was hooking up the debugger, by the time I was stepping thought the program, Slave2 had enough time to finish the job and hence problem was getting masked.






Saturday, January 5, 2013

6. Puzzels. Whats date of birth

Yesterday in a party, I asked Mr. Shah his birthday. With a mischievous glint in his eyes he replied. "The day before yesterday I was 83 years old and next year I will be 86." Can you figure out what is the Date of Birth of Mr. Shah? Assume that the current year is 2000.

Ans

Mr. Shah's date of birth is 31 December, 1915

Today is 1 January, 2000. The day before yesterday was 30 December, 1999 and Mr. Shah was 83 on that day. Today i.e. 1 January, 2000 - he is 84. On 31 December 2000, he will be 85 and next year i.e. 31 December, 2001 - he will be 86. Hence, the date of birth is 31 December, 1915

5. Puzzles, Tanya's Date

Tanya wants to go on a date and prefers her date to be tall, dark and handsome.

Of the preferred traits - tall, dark and handsome - no two of Adam, Bond, Cruz and Dumbo have the same number.
Only Adam or Dumbo is tall and fair.
Only Bond or Cruz is short and handsome.
Adam and Cruz are either both tall or both short.
Bond and Dumbo are either both dark or both fair.
Who is Tanya's date?


Ans.
Basic method would be assume Adam is tall, dark and handsome and then read the above statements and see if they all hold good.
Repeat same for Bond, Crux and Dumbo. You will find that assumption only for Cruz hold good.

But little more logical and deductible solution is as below.

Cruz is Tanya's date.
As no two of them have the same number of preferred traits - from (1), exactly one of them has none of the preferred traits and exactly one of them has all the preferred traits.

From (4) and (5), there are only two possibilities:
* Adam & Cruz both are tall and Bond & Dumbo both are fair.
* Adam & Cruz both are short and Bond & Dumbo both are dark.

But from (2), second possibility is impossible. So the first one is the correct possibility i.e. Adam & Cruz both are tall and Bond & Dumbo both are fair.

Then from (3), Bond is short and handsome.

Also, from (1) and (2), Adam is tall and fair. Also, Dumbo is the person without any preferred traits. Cruz is Dark. Adam and Cruz are handsome. Thus, following are the individual preferred traits:

Cruz - Tall, Dark and Handsome
Adam - Tall and Handsome
Bond - Handsome
Dumbo - None :-(

Hence, Cruz is Tanya's date.

Next Puzzle

4. Puzzles, 1000 apples in 10 boxes

How to Divide apples

An apple vendor has 1000 apples and 10 empty boxes. He asks his son to place all the 1000 apples in all the 10 boxes in such a manner that if he asks for any number of apples from 1 to 1000, his son should be able to pick them in terms of boxes. How did the son place all the apples among the 10 boxes, given that any number of apples can be put in one box.

Ans.

1, 2, 4, 8, 16, 32, 64, 128, 256, 489

Let's start from scratch.
• The apple vendor can ask for only 1 apple, so one box must contain 1 apple.
• He can ask for 2 apples, so one box must contain 2 apples.
He can ask for 3 apples, in that case box one and box two will add up to 3.
• He can ask for 4 apples, so one box i.e. third box must contain 4 apples.
• Now using box number one, two and three containing 1, 2 and 4 apples respectively, his son can give upto 7 apples. Hence, forth box must contain 8 apples.
• Similarly, using first four boxes containing 1, 2, 4 and 8 apples, his son can give upto 15 apples. Hence fifth box must contain 16 apples.
You must have noticed one thing till now that each box till now contains power of 2 apples. Hence the answer is 1, 2, 4, 8, 16, 32, 64, 128, 256, 489. This is true for any number of apples, here in our case only upto 1000.

Next Puzzle here

Thursday, January 3, 2013

Techi Slags, Techi Acronym, Heisenbug,

This is fun post about some acronym slang. .

WISCA - Why Isn't Sam Coding Anything

Applicable for mainely for
a) mangers who want some code output even before requirement/design is over.
b) Naive programmer who are too eager to code without doing proper prerequisite.

FUBAR- fucked up beyond all recognition/any repair/all reason

PEBCAK- Problem Exists Between Chair And Keyboard 


Bohr Bug: Its antonym of HeisenBug.  A repeatable bug; one that manifests reliably under a possibly unknown but well-defined set of conditions

Heisenbug : Is a software bug which disappears or changes its behavior when you try to reproduce or isolate it.

Heisenbug are often very interesting bugs to solve. I would post few of them which I have faced.

Read the problem here.