PA W2 - C Programming Language Basic

Last

Warning

If someone is reading this blog, please be aware that the writer did not consider the experience of the other readers.
After all, the most important part is about writing things down for better memorization.

C programming language used in PA

  • How to generate an executable file from a C source file?
.c --precompile--> .i --compile--> .s --assemble--> .o --link--> .out

Precompile

  • What counts as precompile behavior? (The C standard calls this stage preprocessing.)

  • For example, the following directives and operators are all handled at this stage:

#include
#define
#
##
#ifdef

include

  • Normally, header files do not contain function definitions; instead, they contain function declarations (and the declarations of some variables).

  • For example, the declaration of the standard library function printf is:

    int printf(const char *restrict format, ...);
  • If we declare printf ourselves before invoking it, the program compiles and runs correctly even without including stdio.h:

    int printf(const char *restrict format, ...);

    int main(int argc, char *argv[]) {
        int a = 1; /* the value to print */
        printf("%d\n", a);
        return 0;
    }
  • Thus, including a header file essentially copies the content of that header file into the source code.


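  • What is the difference between these two lines of code?
    #include "stdio.h"
    #include <stdio.h>
  • The answer: they behave the same, except that they search different paths for the required header file. With gcc, the quoted form looks in the directory of the including file first and then falls back to the paths used by the angle-bracket form.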
  • We can see the search list by adding --verbose flag to gcc:

06-15-1.png

  • We can also add a search path manually by passing it to gcc with the -I flag:
    gcc --verbose test1.c -I /home/last/Coding/Cpp/test

06-15-2.png

  • We can see that /home/last/Coding/Cpp/test/ is added to the search list. (as the first one)

#if

  • What would this piece of code output?
#include <stdio.h>

int main(int argc, char *argv[]) {
#if aa == bb
    printf("Yes\n");
#else
    printf("No\n");
#endif
    return 0;
}
  • The answer is Yes.

  • But why? What exactly is aa == bb?

  • Here aa and bb are treated as macro names. Neither of them is defined, and inside an #if expression every identifier that is not a defined macro is replaced with 0. The condition therefore becomes 0 == 0, which is true.

  • After precompilation is done, the code in the branch that was not taken is dropped, so the compiler never sees it.

#define

  • This directive defines macros. Macros are expanded during precompilation, which is basically copying and pasting again, just like with header files.

  • How to destroy an OJ? LMAO

06-15-4.png

  • From the picture above we can see that macros can call (rely on) each other. It is pretty common to see multiple macros calling each other in the source code of PA.

  • Macros can also be used to hide some keywords from inspection.

#include <stdlib.h> /* declares system() */

#define A sys ## tem

int main() {
    A("echo Hello\n");
}

X-macro

#include <stdio.h>

#define NAMES(X) \
    X(Tom) X(Jerry) X(Tyke) X(Spike)

int main() {
#define PRINT(x) puts("Hello, " #x "!");
    NAMES(PRINT);
}
  • The output will be like:
Hello, Tom!
Hello, Jerry!
Hello, Tyke!
Hello, Spike!
  • We can see that NAMES(PRINT) expands the PRINT macro four times, with a different argument each time:
NAMES(PRINT) --> PRINT(Tom) PRINT(Jerry) PRINT(Tyke) PRINT(Spike)

  • From the above we can see that macros give the programmer a way to define customized ‘functions’ or ‘variables’ that generate code, which is a form of meta-programming.
  • It is very flexible and easy to use, but at the same time it reduces the readability of the code to some extent.

Compile && Assemble

  • This step is mainly about translating the C programming language to the assembly language.
  • For example, a simple function like:
int foo(int n) {
    int sum = 0;
    for (int i = 0; i <= n; i++) {
        sum += i;
    }

    return sum;
}

gcc --assemble 4test-1.c

will be translated to:

    .file "4test-1.c"
    .text
    .globl foo
    .type foo, @function
foo:
.LFB0:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movl %edi, -20(%rbp)
    movl $0, -8(%rbp)
    movl $0, -4(%rbp)
    jmp .L2
.L3:
    movl -4(%rbp), %eax
    addl %eax, -8(%rbp)
    addl $1, -4(%rbp)
.L2:
    movl -4(%rbp), %eax
    cmpl -20(%rbp), %eax
    jle .L3
    movl -8(%rbp), %eax
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size foo, .-foo
    .ident "GCC: (GNU) 14.1.1 20240522"
    .section .note.GNU-stack,"",@progbits
  • This raw file is hard to understand. But if we remove the parts that are unrelated to the core logic of the program, it becomes this:
foo:
    movl %edi, -20(%rbp)
    movl $0, -8(%rbp)
    movl $0, -4(%rbp)
    jmp .L2
.L3:
    movl -4(%rbp), %eax
    addl %eax, -8(%rbp)
    addl $1, -4(%rbp)
.L2:
    movl -4(%rbp), %eax
    cmpl -20(%rbp), %eax
    jle .L3
    movl -8(%rbp), %eax

    ret
  • Still hard for a human to read, so let's replace the stack slots and the register with the names of the variables they hold in the source code:
foo:
    movl %edi, n
    movl $0, sum
    movl $0, i
    jmp .L2
.L3:
    movl i, tmp
    addl tmp, sum
    addl $1, i
.L2:
    movl i, tmp
    cmpl n, tmp
    jle .L3
    movl sum, tmp

    ret
  • Then, turn it into pseudocode:
foo:
    n = ARG-1
    sum = 0
    i = 0
    goto .L2
.L3:
    tmp = i
    sum += tmp
    i += 1
.L2:
    tmp = i
    compare(tmp, n)
    if (tmp <= n) goto .L3
    RETURN-VAL = sum

    ret
  • The file containing the assembly language translated from the foo function is not able to run yet, since it’s missing a main function, the entry point.
  • Let’s quickly define a main function in a separate file.

4test-2.c:

#include <stdio.h>

int foo(int n); /* defined in 4test-1.c; resolved at link time */

int main(int argc, char *argv[]) {
    printf("%d\n", foo(100));
    return 0;
}
  • As we all know, this source file will also be translated to assembly language by gcc. After both files are assembled, gcc links the two .o files together, producing the final a.out executable file.

From a Memory Perspective

  • Why would this code report a segmentation fault?
int main(int argc, char* argv[]) {
    int *p = (void *) 1;
    *p = 1; // Segmentation fault
}
  • From the code we can see that this program is trying to write the value 1 to the memory address 1.
  • However, the memory that a program can read or write is limited, mostly to the regions that the OS has allocated to it.
  • This program is trying to write to a memory address which does not ‘belong’ to it, which causes a segmentation fault.

Understanding Pointer

  • Let’s explore the following code:
#include <assert.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int (*f)(int, char *[]) = main;
    if (argc != 0) {
        // clang-format off
        char ***a = &argv;
        char *first = argv[0];
        char ch = argv[0][0];
        // clang-format on

        printf("arg = \"%s\"; ch = '%c'\n", first, ch);
        assert(***a == ch);
        f(argc - 1, argv + 1);
    }

    return 0;
}
  • It might look complicated at first, so take a look at this picture, which analyzes the ‘pointing’ relationships between these pointers and variables:

06-15-5.png

  • From the picture we can clearly see that pointer a points to the variable argv, which points to the first element of argv, which in turn points to the first character of the first argument.
  • first points to the first character of the first element of argv, which is a string, AKA a character array.
  • ch is the first character of the first element of argv, which is …

End of the First Day…

  • Title: PA W2 - C Programming Language Basic
  • Author: Last
  • Created at : 2024-06-15 10:57:09
  • Link: https://blog.imlast.top/2024/06/15/nju-pa-c1-md/
  • License: This work is licensed under CC BY-NC-SA 4.0.