C/Pointers with words

From Attie's Wiki

Having been criticized by my colleagues, I decided to add some words to my Pointers page and hopefully make it feel a little less like a test (!)
Before we begin, this page assumes a 32-bit x86 system (little endian), which means that addresses and integers are 32 bits (4 bytes) wide.
These sizes vary between platforms; for example, a pointer on a 64-bit system is 8 bytes.
The memory maps on this page will not necessarily hold true for other architectures, especially systems that are not little endian, for example zSeries mainframes.

A Variable

Let's start by thinking about a normal variable, i, of type int.

An int is just a block of 4 bytes that the compiler promises it will find memory for.
For a local variable this memory comes from the stack, but that is a detail for another time.

Let's declare i and assign 0xDEADBEEF to it.

int i;
 
i = 0xDEADBEEF;

If we were to inspect the memory for this process and represent it visually, we might find the following.

Address     Value  Note
...
0xBF893E8F  0xDE   i MSB
0xBF893E8E  0xAD   i
0xBF893E8D  0xBE   i
0xBF893E8C  0xEF   i LSB
...

From this memory map, we can see that the variable i is stored at address 0xBF893E8C, and because the memory is byte addressable, i uses 4 'slots' - 0xDE, 0xAD, 0xBE, 0xEF.

A Pointer

Now let's consider a pointer.
A pointer is just like a normal variable, but its value has a special meaning to the compiler: it is treated as the address of some other memory.

int *p;
 
p = (int *)0xDEADBEEF;  /* the cast is needed: an integer does not convert to a pointer implicitly */

The code above stores exactly the same bytes, and assuming that p is located at the same place in memory that i was, the previous memory map is still valid.

But when we come to 'de-reference' p we are going to have a problem.
'De-referencing' is the act of following a pointer to access the variable to which it points. This can be done in two ways:

*p     /* this is usually used in a situation similar to this, when 'p' points at a single variable */
p[0]   /* this is usually used when 'p' points to an array of variables */

The map below shows the value of p as well as the memory that it points to.
As you can see, things are no longer simple, because the value of p is used to locate some other memory.
In this case, the memory at 0xDEADBEEF is not allocated to anything, and accessing it will almost certainly cause a segfault (your program will crash).

Address     Value  Valid?  Note
...
0xDEADBEF2  ??     No      ?
0xDEADBEF1  ??     No      ?
0xDEADBEF0  ??     No      ?
0xDEADBEEF  ??     No      ?
...
0xBF893E8F  0xDE   Yes     p MSB
0xBF893E8E  0xAD   Yes     p
0xBF893E8D  0xBE   Yes     p
0xBF893E8C  0xEF   Yes     p LSB
...

Let's make something useful, instead of something that will crash.
Consider the following code:

int i;            /* our variable */
int *p;           /* our pointer */
 
p = &i;           /* assign the address of 'i' to 'p' */
 
*p = 0x12345678;  /* effectively assigns 0x12345678 to 'i' */

In this snippet, we make p point at i, and then store 0x12345678 in the memory that p points to.
This effectively stores 0x12345678 in i, and could result in the following memory map.

Address     Value  Valid?  Note
...
0xBF893E8F  0x12   Yes     i MSB
0xBF893E8E  0x34   Yes     i
0xBF893E8D  0x56   Yes     i
0xBF893E8C  0x78   Yes     i LSB
0xBF893E8B  0xBF   Yes     p MSB
0xBF893E8A  0x89   Yes     p
0xBF893E89  0x3E   Yes     p
0xBF893E88  0x8C   Yes     p LSB
...

'Strings'

Let's now investigate 'strings'. I've used inverted commas because a string in C is really just a chunk of memory with some ASCII values stored in it, and terminated with a nul byte ('\0' or 0x00).
Other languages provide friendly features, such as storing the length of the string so that it can contain nul bytes, and providing automatic reallocation of memory.

Strings are useful for demonstrating my next point (pointer vs. array), but for now consider the following snippet of code:

char *str = "Hello!";

So we have a pointer to some memory, and that memory is of type char.
A char is 1 byte long, and we may find the following in memory.

Address     Value  Character  Valid?  Note
...
0xBF893E8B  0x08              Yes     str MSB
0xBF893E8A  0x04              Yes     str
0xBF893E89  0x84              Yes     str
0xBF893E88  0xF4              Yes     str LSB
...
0x080484F9  0x00   '\0'       Yes     ?
0x080484F8  0x6F   'o'        Yes     ?
0x080484F7  0x6C   'l'        Yes     ?
0x080484F6  0x6C   'l'        Yes     ?
0x080484F5  0x65   'e'        Yes     ?
0x080484F4  0x48   'H'        Yes     ?
...

The pointer str points to the beginning of a string that is stored in memory, and there is no symbol identifying the string itself, hence the ?'s.
As you can see here, the string itself is actually stored quite a long way from the stack.

In this case, &str = 0xBF893E88 and str = 0x080484F4.

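In this case, it is safe to return the pointer from within a function and let str go out of scope, because the actual data is stored in the application's Data Segment and exists for the whole life of the program.
That memory is typically read-only, so writing to the string through str is likely to cause a segfault.
This will be covered again in the Array section.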

De-referencing a pointer has the following characteristics. Here p holds the same value as str above (0x080484F4), and the arithmetic is shown in bytes:

Code   Actual Meaning
*p     Same as p[0] (see below)
p[0]   p + (sizeof(char) * 0)  =  0x080484F4 + (1 * 0)  =  0x080484F4 + 0  =  0x080484F4  ->  0x48 / 'H'
p[1]   p + (sizeof(char) * 1)  =  0x080484F4 + (1 * 1)  =  0x080484F4 + 1  =  0x080484F5  ->  0x65 / 'e'
p[2]   p + (sizeof(char) * 2)  =  0x080484F4 + (1 * 2)  =  0x080484F4 + 2  =  0x080484F6  ->  0x6C / 'l'
p++    p += sizeof(char)
etc...

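If the pointer pointed to an int instead, then the address of p[x] would be calculated as p + (sizeof(int) * x).
These formulas are in bytes; in C source the compiler applies the sizeof scaling for you, so you still just write p[x] or *(p + x).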

Array

Using the same example as above, but declaring it as an array, has an interesting effect. Read on...

char str[] = "Hello!";

This is equivalent to the following (don't forget the terminating nul / '\0' / 0x00), because the compiler automatically calculates the amount of storage required.

char str[7] = "Hello!";

By writing this, we don't ask for a pointer any more. We actually ask the compiler to put the string directly on the stack.
To achieve this, the compiler must use more stack space, and must also copy the initialization data into place each time the function runs.
As the memory is now on the stack instead, it is possible to modify it without getting a segfault, unlike the string that char *str pointed to.

Address     Value  Character  Valid?  Note
...
0xBF893E8D  0x00   '\0'       Yes     ?
0xBF893E8C  0x6F   'o'        Yes     ?
0xBF893E8B  0x6C   'l'        Yes     ?
0xBF893E8A  0x6C   'l'        Yes     ?
0xBF893E89  0x65   'e'        Yes     ?
0xBF893E88  0x48   'H'        Yes     str
...

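In this case, &str = 0xBF893E88 and str = 0xBF893E88 (the same address!). Because the variable was declared with square brackets (char str[7]), the compiler treats str as the array itself, and in most expressions converts it to a pointer to the first element. The two expressions have different types, char (*)[7] for &str and char * for str, but they refer to the same place.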

In this case it is not safe to return the pointer out of the function, as the array's memory will be tidied up after the function returns and the memory location will become invalid.
If you try, the compiler may print something like this: warning: function returns address of local variable
