C/Pointers with words
Revision as of 21:47, 13 March 2012
Having been criticized by my colleagues, I decided to add some words to my Pointers page and hopefully make it feel a little less like a test (!). Before we begin: this page assumes a 32-bit x86 system, which means that addresses and integers are 32 bits (4 bytes) wide, and that multi-byte values are stored little-endian (least significant byte at the lowest address).
A Variable
If we start by thinking about a normal variable, `i`, of type `int`.
An `int` is just a block of 4 bytes that the compiler promises it will find memory for.
For a local variable, this memory usually comes from the stack, but that is a detail for another time.
Let's declare `i` and assign `0xDEADBEEF` to it.
int i;
i = 0xDEADBEEF;
If you were to inspect the memory of this process and represent it visually, you might find the following.
| Address | Value | Note |
|---|---|---|
| ... | | |
| 0xBF893E8F | 0xDE | i MSB |
| 0xBF893E8E | 0xAD | i |
| 0xBF893E8D | 0xBE | i |
| 0xBF893E8C | 0xEF | i LSB |
| ... | | |
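From this memory map, we can see that the variable `i` is stored at address `0xBF893E8C` (the address of its lowest byte), and because memory is byte-addressable, `i` uses 4 'slots', holding `0xDE`, `0xAD`, `0xBE` and `0xEF`. Because x86 is little-endian, the least significant byte (`0xEF`) sits at the lowest address.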
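To see this byte order for yourself, you can view a variable through an `unsigned char` pointer, which C allows for any object. The sketch below is illustrative only: `int_bytes` is a hypothetical helper, and it uses `uint32_t` rather than `int` (an assumption on my part) so that `0xDEADBEEF` fits without any implementation-defined conversion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the bytes of value into out[], in the order
 * they sit in memory (lowest address first). On a little-endian machine
 * such as x86, out[0] is the least significant byte. */
void int_bytes(uint32_t value, unsigned char out[4])
{
    /* Reading any object through an unsigned char pointer is allowed in C. */
    const unsigned char *bytes = (const unsigned char *)&value;
    for (size_t k = 0; k < sizeof value; k++)
        out[k] = bytes[k];   /* bytes[0] is the byte at the lowest address */
}
```

On x86, `int_bytes(0xDEADBEEF, b)` should leave `b` holding `0xEF, 0xBE, 0xAD, 0xDE`, matching the `LSB`-first order in the map (bottom row first).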
A Pointer
Now let's consider a pointer.
A pointer is just like a normal variable, but it has a special meaning to the compiler: its value is treated as the address of some other variable.
int *p;
p = (int *)0xDEADBEEF; /* an integer needs an explicit cast to become a pointer */
The code above gives exactly the same result: assuming that `p` is located at the same place in memory that `i` was, the previous memory map is still valid.
But when we come to 'de-reference' `p`, we are going to have a problem.
'De-referencing' is the act of following a pointer to access the variable it points to. This can be done in two ways:
*p   /* usually used, as here, when 'p' points at a single variable */
p[0] /* usually used when 'p' points into an array of variables */
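The two forms are interchangeable because C defines `p[k]` as `*(p + k)`. The sketch below shows all three spellings reading the same variable; the function names are hypothetical and exist only for illustration.

```c
/* Three ways of de-referencing the same pointer; all read the int at p. */
int deref_star(const int *p)  { return *p; }        /* the usual single-variable form */
int deref_index(const int *p) { return p[0]; }      /* the usual array form */
int deref_arith(const int *p) { return *(p + 0); }  /* what p[0] means by definition */
```

For example, with `int i = 42;`, all three calls on `&i` return `42`.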
The map below shows the value of `p` as well as the memory that it points to.
As you can see, things are no longer simple, because the value of `p` is used to locate some other memory.
In this case, the memory at `0xDEADBEEF` is not allocated to anything, so de-referencing `p` is undefined behaviour and will almost certainly cause a segfault (your program will crash).
The upper rows show the memory at the address stored in `p`; the lower rows show `p` itself.
| Address | Value | Valid? | Note |
|---|---|---|---|
| ... | | | |
| 0xDEADBEF2 | ?? | No | ? |
| 0xDEADBEF1 | ?? | No | ? |
| 0xDEADBEF0 | ?? | No | ? |
| 0xDEADBEEF | ?? | No | ? |
| ... | | | |
| 0xBF893E8F | 0xDE | Yes | p MSB |
| 0xBF893E8E | 0xAD | Yes | p |
| 0xBF893E8D | 0xBE | Yes | p |
| 0xBF893E8C | 0xEF | Yes | p LSB |
| ... | | | |
Let's make something useful, instead of something that will crash.
Consider the following code:
int i;           /* our variable */
int *p;          /* our pointer */
p = &i;          /* assign the address of 'i' to 'p' */
*p = 0x12345678; /* effectively assigns 0x12345678 to 'i' */
In this snippet, we make `p` point at `i`, and then store `0x12345678` in the memory that `p` points to.
This effectively stores `0x12345678` in `i`, and could result in the following memory map.
| Address | Value | Valid? | Note |
|---|---|---|---|
| ... | | | |
| 0xBF893E8F | 0x12 | Yes | i MSB |
| 0xBF893E8E | 0x34 | Yes | i |
| 0xBF893E8D | 0x56 | Yes | i |
| 0xBF893E8C | 0x78 | Yes | i LSB |
| 0xBF893E8B | 0xBF | Yes | p MSB |
| 0xBF893E8A | 0x89 | Yes | p |
| 0xBF893E89 | 0x3E | Yes | p |
| 0xBF893E88 | 0x8C | Yes | p LSB |
| ... | | | |
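The same idea is what lets a function change its caller's variable: pass the variable's address, and write through it. This is a minimal sketch; `store_through` and `demo_store` are hypothetical names, and `demo_store` simply mirrors the snippet above inside a function.

```c
/* Hypothetical helper: write value into whatever int p points at. */
void store_through(int *p, int value)
{
    *p = value;   /* *p names the memory of the pointed-to variable */
}

/* Mirrors the snippet above: i is changed only via the pointer p. */
int demo_store(void)
{
    int i = 0;        /* our variable */
    int *p = &i;      /* our pointer, holding the address of i */
    store_through(p, 0x12345678);
    return i;         /* i now holds 0x12345678 */
}
```

`store_through` never sees `i` by name; it only has the address, which is exactly the relationship drawn in the memory map.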
'Strings'
Let's now investigate 'strings'.
I've used inverted commas because a string in C is really just a chunk of memory with some ASCII values stored in it, terminated with a nul byte (`'\0'` or `0x00`).
Other languages provide friendly features, such as storing the length of the string (so that it can contain nul bytes) and reallocating memory automatically.
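Because C stores no length, the only way to find where a string ends is to walk forward until the nul byte. The sketch below does this by hand; `string_length` is a hypothetical name, and it behaves like the standard `strlen()`. Note that it stops at the first embedded nul, which is why a C string cannot contain one.

```c
#include <stddef.h>

/* Count bytes up to (not including) the terminating '\0'. */
size_t string_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* no stored length: the nul byte marks the end */
        n++;
    return n;
}
```

For example, `string_length("ab\0cd")` returns `2`: everything after the first nul is invisible.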
Strings are useful for demonstrating my next point (pointer vs. array), but for now consider the following snippet of code:
char *p = "Hello";
So we have a pointer, `p`, to some memory, and that memory is of type `char`.
A `char` is 1 byte long, and we may find the following in memory.
| Address | Value | Character | Valid? | Note |
|---|---|---|---|---|
| ... | | | | |
| 0xBF893E8B | 0x08 | | Yes | p MSB |
| 0xBF893E8A | 0x04 | | Yes | p |
| 0xBF893E89 | 0x84 | | Yes | p |
| 0xBF893E88 | 0xF4 | | Yes | p LSB |
| ... | | | | |
| 0x080484F9 | 0x00 | '\0' | Yes | ? |
| 0x080484F8 | 0x6F | 'o' | Yes | ? |
| 0x080484F7 | 0x6C | 'l' | Yes | ? |
| 0x080484F6 | 0x6C | 'l' | Yes | ? |
| 0x080484F5 | 0x65 | 'e' | Yes | ? |
| 0x080484F4 | 0x48 | 'H' | Yes | ? |
| ... | | | | |
`p` points to the beginning of a string that is stored in memory, and there is no symbol identifying the string, hence the `?`s.
As you can see here, the string itself is actually stored quite a long way from the stack.
In this case, it is safe to return the pointer from within a function and let `p` go out of scope, because the string literal has static storage duration: it is stored in the application's data segment (on typical Linux/ELF systems, in the read-only `.rodata` section) rather than on the stack, and it lives for the whole run of the program.
This will be covered again in the Array section.
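A minimal sketch of that claim, using a hypothetical `greeting` function: the local pointer disappears when the function returns, but the characters it points at do not. I have used `const char *` here (a choice of mine, not the original's) because writing to a string literal is undefined behaviour.

```c
/* Returning a pointer to a string literal is safe: the literal has static
 * storage duration and is not part of this function's stack frame. */
const char *greeting(void)
{
    const char *p = "Hello";
    return p;   /* p goes out of scope, the characters it points at do not */
}
```

The caller can keep using the returned pointer for as long as the program runs.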
De-referencing a pointer has the following characteristics:
| Code | Actual Meaning |
|---|---|
| *p | Same as p[0] (see below) |
| p[0] | *(p + 0) → 0x080484F4 + (sizeof(char) * 0) → 0x080484F4 + (1 * 0) → 0x080484F4 → 0x48 / 'H' |
| p[1] | *(p + 1) → 0x080484F4 + (sizeof(char) * 1) → 0x080484F4 + (1 * 1) → 0x080484F5 → 0x65 / 'e' |
| p[2] | *(p + 2) → 0x080484F4 + (sizeof(char) * 2) → 0x080484F4 + (1 * 2) → 0x080484F6 → 0x6C / 'l' |
| p++ | p = p + 1: the address held in p advances by sizeof(char), i.e. 1 byte |
| etc... | |
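The rows of the table can be checked directly. In this sketch (function names hypothetical), `nth_char` uses the `*(p + k)` spelling of `p[k]`, and `after_increment` moves the pointer itself with `p++` before reading.

```c
#include <stddef.h>

/* p[k] is defined as *(p + k); for char each step is sizeof(char) == 1 byte. */
char nth_char(const char *p, size_t k)
{
    return *(p + k);   /* identical to p[k] */
}

/* p++ moves the pointer itself to the next char; *p then reads it. */
char after_increment(const char *p)
{
    p++;               /* the address held in p grows by sizeof(char) */
    return *p;
}
```

With `p` pointing at `"Hello"`, `nth_char(p, 1)` and `after_increment(p)` both return `'e'`, just like `p[1]` in the table.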
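If `p` pointed to an `int` instead, each step would be `sizeof(int)` (4) bytes, so the address of `p[x]` would be the address held in `p` plus `sizeof(int) * x` bytes. In C you still just write `p[x]` or `*(p + x)`: the compiler scales the offset by the size of the pointed-to type automatically.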
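You can measure that scaling by converting two `int` pointers to `char` pointers and subtracting, which gives the distance in bytes. `bytes_between_ints` is a hypothetical helper; `p + x` must stay within the same array (or one past its end) for this to be valid C.

```c
#include <stddef.h>

/* Distance in bytes between p and p + x for an int pointer.
 * p + x must stay inside the array p points into (or one past its end). */
ptrdiff_t bytes_between_ints(const int *p, ptrdiff_t x)
{
    return (const char *)(p + x) - (const char *)p;   /* x * sizeof(int) */
}
```

On the 32-bit x86 system assumed here, `bytes_between_ints(a, 1)` would be `4`.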
Array
Declaring the same example as an array instead of a pointer has an interesting effect:
char str[] = "Hello";
By writing this, we no longer ask for a pointer. Inside a function, we actually ask the compiler to put the characters themselves directly on the stack.
To achieve this, the compiler must use more stack space, and must also copy the initialization data into the array every time the function runs.
As the array is now our own copy on the stack, it can be modified (for example `str[0] = 'J';`) without getting a segfault, and doing so is perfectly legal. This is unlike writing through `p` in the previous example: modifying a string literal is undefined behaviour, and on typical systems the literal lives in read-only memory, so the attempt crashes.
In this case, however, it is not possible to return a pointer to the array out of the function, as its memory is reclaimed when the function returns. The compiler may warn about this with something like: warning: function returns address of local variable
See the following memory map.
| Address | Value | Character | Valid? | Note |
|---|---|---|---|---|
| ... | | | | |
| 0xBF893E8D | 0x00 | '\0' | Yes | ? |
| 0xBF893E8C | 0x6F | 'o' | Yes | ? |
| 0xBF893E8B | 0x6C | 'l' | Yes | ? |
| 0xBF893E8A | 0x6C | 'l' | Yes | ? |
| 0xBF893E89 | 0x65 | 'e' | Yes | ? |
| 0xBF893E88 | 0x48 | 'H' | Yes | str |
| ... | | | | |
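Putting the array rules together: the array can be modified freely, but its contents must be copied out rather than returned by address. This is one possible sketch, assuming the caller supplies the buffer; `copy_modified_greeting` is a hypothetical name, and a caller-provided buffer is only one of several ways to avoid returning the address of a local.

```c
#include <stddef.h>

/* str is a local array: a fresh, writable copy of "Hello" is made on the
 * stack each time this runs. We copy the result into the caller's buffer
 * instead of returning str, whose memory is reclaimed on return. */
void copy_modified_greeting(char *out, size_t out_size)
{
    char str[] = "Hello";   /* 6 bytes: 'H','e','l','l','o','\0' */
    str[0] = 'J';           /* legal: str is our own copy, not the literal */

    size_t k = 0;
    for (; k + 1 < out_size && str[k] != '\0'; k++)
        out[k] = str[k];    /* copy while leaving room for the nul */
    if (out_size > 0)
        out[k] = '\0';      /* always terminate the caller's buffer */
}
```

Called with an 8-byte buffer, this should leave `"Jello"` in it.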