The volatile keyword is what is known as a type qualifier, and it is one that, most of the time, doesn’t matter much when one is programming purely in the software world. But for programmers who deal with hardware it is a “must know” tool in the software toolbox.
In the previous blog, we looked at using signals for inter-process communication. We also had a program that replaced the system’s default signal handler for the SIGUSR1 signal.
An important thing to understand: if we are going to distribute the program we are writing for others to use, it is a good idea to distribute an optimized version, so that it runs fast for the user (and the executable file is also smaller).
To get an optimized version of our program, we tell the compiler that we want the executable produced at the compilation to be optimized based on the option specified. Different levels of optimization can be specified while compiling.
With gcc/clang, we specify that we want optimization by using the -O option with an integer value from 1 to 3. With -O1 we get minimal optimization, while with -O3 a lot of optimization is done, though the compilation process is likely to take longer and the generated code size might be bigger (there is also an option called -Ofast, which tries to optimize the code even more than -O3 but, as per the manual page, may violate the language standard). As expected, -O2 gives a level of optimization between -O1 and -O3.
The code shown below is the code we looked at in the blog about the signal () function and the kill command/system call.
#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

int *i_ptr;
int sh_val = 0;

void sig_handler (int sig)
{
    sh_val++;
    printf ("sig_handler: sh_val = %d\n", sh_val);
}

int main (int ac, char *av[])
{
    i_ptr = &sh_val;
    signal (SIGUSR1, sig_handler);
    printf ("main: pid is %u\n", getpid ());
    while (1) {
        if (*i_ptr > 0) {
            printf ("main: SIGUSR1 received. Quitting\n");
            break;
        }
    }
    return 0;
}
We already saw in that blog the program’s behavior when no optimization was done during compilation. The result of executing the program in one terminal and the kill command in another terminal is shown below again.
$ ./signal
main: pid is 75554
sig_handler: sh_val = 1
main: SIGUSR1 received. Quitting
$
$ kill -SIGUSR1 75554
$
The behavior of the program, after we compile using the -O2 option to turn on optimization is as shown below:
$ gcc -O2 volatile.c -o voldemo
$ ./voldemo
main: pid is 2159
sig_handler: sh_val = 1
sig_handler: sh_val = 2
sig_handler: sh_val = 3
Terminated
$
$
$ kill -SIGUSR1 2159
$ kill -SIGUSR1 2159
$ kill -SIGUSR1 2159
$ kill -SIGTERM 2159
$
Note the program execution didn’t terminate even after sending the SIGUSR1 signal thrice to the program using the kill command. We can see that our signal handler function was called by the operating system as expected, but the infinite loop didn’t terminate even though the value of sh_val did get incremented thrice.
Finally, we used a different signal, the SIGTERM, to end our program execution. We could have terminated the program by sending some other signal also, or by pressing the Ctrl-C key combination from the keyboard on the terminal in which the program was executing.
Let us take a look at the X86_64 assembly code generated for the program, to try and understand what caused the difference in behavior when the code was optimized. The most relevant part of the code, in the context of optimization, is the infinite loop in the main () function. That snippet is shown below:
    while (1) {
        if (*i_ptr > 0) {
            printf ("main: SIGUSR1 received. Quitting\n");
            break;
        }
    }
Assuming the program was saved as volatile.c, the X86_64 assembly code for this program can be generated with the command gcc -S volatile.c (adding -O2 gives the optimized version).
With no optimization specified, the assembly code generated by the compiler for this loop is shown below:
X86_64 Assembly code without optimization
.
.
call printf
.L5:
movq i_ptr(%rip), %rax
movl (%rax), %eax
testl %eax, %eax
jle .L5
movl $.LC2, %edi
.
.
X86_64 Assembly code with optimization
.
.
call printf
movq i_ptr(%rip), %rax
movl (%rax), %eax
.L4:
testl %eax, %eax
jle .L4
movl $.LC2, %edi
.
.
The two pieces of assembly code look different, even though the statements are mostly similar. There is one change, involving two statements, that has a major impact on the program’s behavior. These two statements do the following:
1) The movq statement loads the rax register with the 8-byte address stored in the pointer i_ptr (the address of sh_val).
2) The movl statement loads the eax register with the integer value present at the address that was loaded into the rax register.
In the optimized version generated by the compiler, these two assembly statements were moved from inside the loop to just before it. This means the value of the sh_val variable is loaded into the eax register once and never updated again inside the loop. As a result, even if the value of sh_val is changed by the sig_handler () function when the OS calls it, the change does not affect the code inside the loop; the loop just keeps using whatever value was loaded into the eax register before the loop started.
By now it should be pretty obvious why the compiler did this. As far as the compiler is concerned, the value of sh_val (*i_ptr) is not changed inside the loop at all. And memory accesses to read values from memory are expensive compared to instructions that involve only CPU registers (and there are two memory accesses in this case). So if the variable is not changing inside the loop, why read it multiple times?
So now we know that optimization interfered with the program’s behavior. But we do need optimization, and when dealing with hardware we are sure to have code like the infinite loop shown above. That is where the volatile type qualifier comes into play.
If we modify the program to use the volatile type qualifier when declaring the pointer variable, then even with optimization the program behaves as it did without optimization:
#include <stdio.h>
#include <signal.h>
.
.
volatile int *i_ptr;
int sh_val = 0;

void sig_handler (int sig)
{
    sh_val++;
    printf ("sig_handler: sh_val = %d\n", sh_val);
}

int main (int ac, char *av[])
{
    .
    .
    return 0;
}
The behavior of the program with the volatile type qualifier, compiled with optimization, is shown below:
$ gcc -O2 volatile.c -o voldemo
$ ./voldemo
main: pid is 2535
sig_handler: sh_val = 1
main: SIGUSR1 received. Quitting
$
$ kill -SIGUSR1 2535
$
A look at the assembly code generated after we use the volatile type qualifier with *i_ptr and compile with optimization turned on shows the effect of the type qualifier on the generated assembly code:
X86_64 Assembly code with optimization and volatile specified:
.
.
call printf
movq i_ptr(%rip), %rdx
.L4:
movl (%rdx), %eax
testl %eax, %eax
jle .L4
.
.
This looks very similar to the code generated without any optimization. Still, one optimization is being done: the 8-byte address in the i_ptr pointer variable is moved into the rdx register, and this is done outside the loop.
But the 4-byte value at the location given by the address in the rdx register is loaded on every iteration of the loop. This means that when the SIGUSR1 signal is received by the program (and the operating system calls the sig_handler () function, incrementing the value of sh_val), the loop will end, because the testl instruction will find that neither the Zero Flag nor the Sign Flag is set (an explanation of the relevant assembly statements is given below).
X86_64 Assembly code without optimization
.
.
call printf
.L5:
movq i_ptr(%rip), %rax
movl (%rax), %eax
testl %eax, %eax
jle .L5
movl $.LC2, %edi
.
.
Let us take a look at each of the lines of the unoptimized code, from the start of the loop (.L5:) up to the end of the loop (jle .L5).
.L5:
This just creates a label that can be used as an address to branch to.
movq i_ptr(%rip), %rax
Move the 8 byte address that is stored in the pointer i_ptr (address of sh_val) into the rax register
movl (%rax), %eax
Move the 4 byte data from the address stored in the rax register into the eax register
testl %eax, %eax
Performs a bitwise AND of the 4-byte value in eax with itself, discarding the result but updating the flags: the Zero Flag (ZF) is set if the value is zero, and the Sign Flag (SF) is set if it is negative. This instruction can operate on two 4-byte registers, or on one 4-byte register and a 4-byte immediate value.
jle .L5
Jump to the given address (in our case .L5) if the value just tested was less than or equal to zero. Formally, jle branches when ZF is set or SF differs from the Overflow Flag (OF); since testl always clears OF, this reduces to “jump if ZF or SF is set”. This creates a loop that executes as long as the value in sh_val is not greater than 0.
So, any time the SIGUSR1 signal is sent to the program (either using the mkill program or the kill command), the sig_handler () function is called by the operating system and the value of sh_val is incremented from 0. The next testl then leaves both ZF and SF clear, the jle falls through, and the loop ends.
Venu Kolathur is Chief Architect and Co-Founder at Vayavya Labs and has over 38 years of industry & academic experience. He is responsible for product technology road-map, and design strategies.