Monday, July 20, 2009

CQ001: Storage of constants

Question: How exactly does a statement like "int i = 10;" work? Does it happen like this:
  1. Store 10 at a memory location where constants are stored
  2. Allocate memory for i
  3. Assign the value of the constant to i

Information:
Let us assume your compiler does not do any kind of optimization on 'i' and 'i' is not declared with storage class "register".

Now there are only two places where "int i = 10;" can occur:
  1. Outside any block, or inside a block but declared with the static storage class specifier (i.e., identifier 'i' is not 'auto')
  2. Inside a block, without the static specifier (i.e., identifier 'i' is an 'auto' variable)
Case 1:
Example:
sample1.c
int i = 10;
void foo(void)
{
/* Some code that uses 'i' */
}
In this case the value of 'i' is set at load time, not after execution begins. ("Assigned" is not quite the right word here; "loaded" would be more accurate, but "assigned" is commonly used and well understood.) The value 10 is stored in the initialized data section of your executable file. The object identified by 'i' is initialized by the loader (mostly in a Hosted Environment) or by the startup function, i.e. the function that calls main (mostly in a Freestanding Environment). For more details about loading an executable, check the article "Understanding Loader" [yet to come].
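
A minimal sketch of what this means in practice, assuming a hosted environment. The helper names read_i and write_i are hypothetical and only exist to make the behaviour observable: the first read already sees 10, although no statement has stored into 'i'.

```c
/* File scope: 'i' has static storage duration. The 10 is part of the
   initialized data and is already in place when execution begins. */
int i = 10;

/* Hypothetical helper: returns the current value of 'i'. */
int read_i(void)
{
    return i;
}

/* Hypothetical helper: an ordinary run-time store into the same object. */
void write_i(int value)
{
    i = value;
}
```

Calling read_i() before anything else returns 10. After write_i(25), read_i() returns 25, because the initialization is not repeated at any later point.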

Case 2:
Example:
sample2.c
void foo(void)
{
int i = 10;
/* Some code that uses 'i' */
}
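
To make the effect observable, here is a small variant of foo() that returns a value. It is only an illustration, and the name bump_auto is hypothetical.

```c
/* Same shape as foo() in sample2.c, but returning a value so the
   behaviour can be checked from the caller. */
int bump_auto(void)
{
    int i = 10;   /* executed on every call: 10 is stored into 'i' again */
    i++;
    return i;     /* always 11, however often the function is called */
}
```

Every call returns 11, because the initialization is an ordinary store that is executed each time the block is entered.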

In the above case the value for 'i' is assigned at run time; that is, the above code will produce machine code as shown below.

1) CISC machines like x86 (Endian - Little)

C7 45 FC 0A 00 00 00 (You will find this hex sequence in your executable file's text section)

The above is a single 7-byte instruction fed to the decoder, and upon execution by the processor 'i' will end up with the value 0x0000000A (for those who want to know why 10 has become 0x0000000A, please check out the article Value Vs Representation [yet to come]).

Let us try to decode those cryptic numbers:
C7 --> Opcode for the instruction "MOV" (the form that stores a 32-bit immediate value)
45 --> ModR/M byte (Click here to find more details)
FC --> -4 in 2's complement (8-bit displacement from EBP)
0A 00 00 00 --> 4-byte integer in "little endian" format (32-bit)

We can represent the same in assembly language as

MOV DWORD PTR [EBP - 4], 0x0000000A
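
The byte layout described above can be sketched as a small C function. This is an illustration only: encode_mov_ebp_imm32 is a hypothetical helper, not part of any assembler, and it covers only this one form of MOV (EBP base register with an 8-bit displacement).

```c
#include <stddef.h>
#include <stdint.h>

/* Builds the 7 bytes of "MOV DWORD PTR [EBP + disp], imm".
   Hypothetical helper; 'out' must have room for 7 bytes. */
size_t encode_mov_ebp_imm32(int8_t disp, uint32_t imm, uint8_t out[7])
{
    out[0] = 0xC7;                   /* opcode: MOV with 32-bit immediate */
    out[1] = 0x45;                   /* ModR/M: mod=01 (disp8), reg=000, r/m=101 (EBP) */
    out[2] = (uint8_t)disp;          /* displacement in 2's complement: -4 -> 0xFC */
    out[3] = (uint8_t)(imm);         /* immediate, least significant byte first */
    out[4] = (uint8_t)(imm >> 8);
    out[5] = (uint8_t)(imm >> 16);
    out[6] = (uint8_t)(imm >> 24);   /* most significant byte last (little endian) */
    return 7;
}
```

Calling it with a displacement of -4 and an immediate of 10 fills the buffer with C7 45 FC 0A 00 00 00, the same sequence as above.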

So, if you change the value of 'i' to 305419896, then the same instruction will become
C7 45 FC 78 56 34 12 --> "EXERCISE: Think how?"

If you declare "int j;" before "int i = 10;" you might (just might) end up with an instruction like
C7 45 F8 0A 00 00 00

If you cannot guess why, see "Function prologue and epilogue" [yet to come].

2) RISC machines like ARM (Endian - Little)

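On ARM, with optimizations disabled, the compiler generated code like the following (remember that RISC machines have fixed-size instructions; for ARM it is 32 bits). Each line is one 32-bit instruction word, written most significant byte first: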
e3 a0 30 0a
e5 0b 30 08

EXERCISE: Try to decode those instructions yourself!
HINT: Use this document for the encoding (see page number 10 of that document).
