C/Pointers with words

Revision as of 21:47, 13 March 2012

Having been criticized by my colleagues, I decided to add some words to my Pointers page and hopefully make it feel a little less like a test (!). Before we begin, this page assumes a 32-bit x86 system, which means that addresses and integers are 32 bits (4 bytes) wide and are stored little-endian (least significant byte at the lowest address).

A Variable

Let's start by thinking about a normal variable, i, of type int.

An int is just a block of 4 bytes that the compiler promises it will find memory for.
For a local variable this memory comes from the stack, but that is a detail for another time.

Let's declare i and assign 0xDEADBEEF to it.

int i;
 
i = 0xDEADBEEF;

If you were to inspect the memory for this process and represent it visually, you might find the following (the colouring will become apparent later).

Address	Value	Note
...
`0xBF893E8F`	`0xDE`	`i` MSB
`0xBF893E8E`	`0xAD`	`i`
`0xBF893E8D`	`0xBE`	`i`
`0xBF893E8C`	`0xEF`	`i` LSB
...

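From this memory map, we can see that the variable i is stored at address 0xBF893E8C, and because the memory is byte addressable, i uses 4 'slots' (0xBF893E8C to 0xBF893E8F). Reading from the lowest address upwards, they hold 0xEF, 0xBE, 0xAD and 0xDE, because x86 stores the least significant byte first.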

A Pointer

Now let's consider a pointer.
They are just like normal variables, but they have a special meaning to the compiler.

int *p;
 
p = (int *)0xDEADBEEF;  /* the cast is required to turn an integer into a pointer */

The code above stores exactly the same bytes, and assuming that p is located at the same place in memory that i was, the previous memory map is still valid.

But when we come to 'de-reference' p we are going to have a problem.
'De-referencing' is the act of following a pointer to access the variable to which it points. This can be written in two ways:

*p     /* this is usually used in a situation similar to this, when 'p' points at a single variable */
p[0]   /* this is usually used when 'p' points to an array of variables */

The map below shows the value of p as well as the memory that it points to.
As you can see, things are no longer simple, because the value of p is used to locate some other memory.
In this case, the memory at 0xDEADBEEF is not allocated to anything, and accessing it is undefined behaviour that will almost certainly cause a segfault (your program will crash).
The colour coding here highlights the value and the address that it points at.

Address	Value	Valid?	Note
...
`0xDEADBEF2`	`??`	No	`?`
`0xDEADBEF1`	`??`	No	`?`
`0xDEADBEF0`	`??`	No	`?`
`0xDEADBEEF`	`??`	No	`?`
...
`0xBF893E8F`	`0xDE`	Yes	`p` MSB
`0xBF893E8E`	`0xAD`	Yes	`p`
`0xBF893E8D`	`0xBE`	Yes	`p`
`0xBF893E8C`	`0xEF`	Yes	`p` LSB
...

Let's make something useful, instead of something that will crash.
Consider the following code:

int i;            /* our variable */
int *p;           /* our pointer */
 
p = &i;           /* assign the address of 'i' to 'p' */
 
*p = 0x12345678;  /* effectively assigns 0x12345678 to 'i' */

In this snippet, we make p point at i, and then store 0x12345678 in the memory that p points to.
This effectively stores 0x12345678 in i, and could result in the following memory map.

Address	Value	Valid?	Note
...
`0xBF893E8F`	`0x12`	Yes	`i` MSB
`0xBF893E8E`	`0x34`	Yes	`i`
`0xBF893E8D`	`0x56`	Yes	`i`
`0xBF893E8C`	`0x78`	Yes	`i` LSB
`0xBF893E8B`	`0xBF`	Yes	`p` MSB
`0xBF893E8A`	`0x89`	Yes	`p`
`0xBF893E89`	`0x3E`	Yes	`p`
`0xBF893E88`	`0x8C`	Yes	`p` LSB
...

'Strings'

Let's now investigate 'strings'. I've used inverted commas because a string in C is really just a chunk of memory with some ASCII values stored in it, and terminated with a nul byte ('\0' or 0x00).
Other languages provide friendly features, such as storing the length of the string so that it can contain nul bytes, and providing automatic reallocation of memory.

Strings are useful for demonstrating my next point (pointer vs. array), but for now consider the following snippet of code:

char *p = "Hello!";

So we have a pointer to some memory, and that memory is of type char.
A char is 1 byte long, and we may find the following in memory.

Address	Value	Character	Valid?	Note
...
`0xBF893E8B`	`0x08`		Yes	`p` MSB
`0xBF893E8A`	`0x04`		Yes	`p`
`0xBF893E89`	`0x84`		Yes	`p`
`0xBF893E88`	`0xF4`		Yes	`p` LSB
...
`0x080484FA`	`0x00`	'`\0`'	Yes	`?`
`0x080484F9`	`0x21`	'`!`'	Yes	`?`
`0x080484F8`	`0x6F`	'`o`'	Yes	`?`
`0x080484F7`	`0x6C`	'`l`'	Yes	`?`
`0x080484F6`	`0x6C`	'`l`'	Yes	`?`
`0x080484F5`	`0x65`	'`e`'	Yes	`?`
`0x080484F4`	`0x48`	'`H`'	Yes	`?`
...

Here p points to the beginning of a string that is stored in memory, and there is no symbol identifying the string, hence the ?s.
As you can see here, the string itself is actually stored quite a long way from the stack.

In this case, it is safe to return the pointer from within a function and let p go out of scope, because the actual data is stored in the application's Data Segment.
This will be covered again in the Array section.

De-referencing a pointer has the following characteristics (the arithmetic shown is on raw byte addresses):

Code	Actual Meaning
`*p`	Same as `p[0]` (see below)
`p[0]`	value at `p + (sizeof(char) * 0)` → `0x080484F4 + (1 * 0)` → `0x080484F4 + 0` → `0x080484F4` → `0x48` / '`H`'
`p[1]`	value at `p + (sizeof(char) * 1)` → `0x080484F4 + (1 * 1)` → `0x080484F4 + 1` → `0x080484F5` → `0x65` / '`e`'
`p[2]`	value at `p + (sizeof(char) * 2)` → `0x080484F4 + (1 * 2)` → `0x080484F4 + 2` → `0x080484F6` → `0x6C` / '`l`'
`p++`	the address in `p` grows by `sizeof(char)`
etc...

If the pointer pointed to an int instead, then the formula would become: address of p[x] = p + (sizeof(int) * x). Note that the compiler applies this sizeof scaling for you, so in C source you simply write p[x] or *(p + x).

Array

Using the same example as above, but declaring it as an array, has an interesting effect.

char str[] = "Hello!";

By typing this, we don't ask for a pointer any more. We actually ask the compiler to put the string directly on the stack.
To achieve this, the compiler must use more stack space, and must also copy the initialization data onto the stack each time the function is entered.
As the memory is now on the stack, it is possible to modify it without getting a segfault, as long as you stay within the bytes of the array.
In this case it is not safe to return a pointer to the array out of the function, as its memory will be reused after the function returns. You may see something like this: warning: function returns address of local variable
See the following memory map.

Address	Value	Character	Valid?	Note
...
`0xBF893E8E`	`0x00`	'`\0`'	Yes	`?`
`0xBF893E8D`	`0x21`	'`!`'	Yes	`?`
`0xBF893E8C`	`0x6F`	'`o`'	Yes	`?`
`0xBF893E8B`	`0x6C`	'`l`'	Yes	`?`
`0xBF893E8A`	`0x6C`	'`l`'	Yes	`?`
`0xBF893E89`	`0x65`	'`e`'	Yes	`?`
`0xBF893E88`	`0x48`	'`H`'	Yes	`str`
...

