Developing a Program
At the end of the previous article we described a way to replace the opcodes with mnemonics that represent them and are much easier for us humans to understand. This is how a low level programming language, one step above machine code, was created, and its name is Assembly. A program in Assembly is essentially a sequence of mnemonics corresponding to a sequence of opcodes, enriched with a few extra features meant to make developers’ lives easier. Let’s see how.
Assembly
The Assembly programming language is not universal. Each manufacturer may have its own mnemonics according to its CPU architecture. In practice, a large share of CPUs were standardized around a particular Assembly (x86 for 32-bit CPUs and x86-64 for 64-bit CPUs), but in general each architecture still has its own.
Our CPU has 8 bits and is tailored to our goal. So we’ll use an Assembly from the 8-bit era, which we’ll call Z80-CL. CL stands for Computer Logic and merely serves to justify some minor adjustments.
Let’s choose a set of 16 instructions based on what we may need to prepare a small program that we’ll develop further on. For this program we’ll need to read from the DM, write in the DM, subtract, compare, perform conditional jumps (zero, less than zero and less than) and perform unconditional jumps. Although we will not use them, we will also include in this instruction set addition, multiplication and the no-operation instruction.
This way we get to the set of 16 instructions that we represent in the table of Figure 1-14, associated with the opcode, the command signal and the final binary code for each of the instructions.
Whenever x or xx is displayed, the control signals, and therefore the respective bits they refer to, are not relevant to the instruction. Since "irrelevant" is abstract and cannot be represented, we gave those bits the value 0 in the final result. The mnemonics translate as follows:
- LD means LoaD.
- ADD means ADDition.
- SUB means SUBtraction.
- MUL means MULtiplication.
- CP means ComPare.
- JP means JumP, here an unconditional jump.
- JP followed by Z, N or M, means JumP, here a conditional jump.
In the Table of Figure 14 we represent the Assembly Mnemonics and their corresponding Opcode, Command Signals and Binary Result. The value and address entries referred to in the table are provided by the constant of the instruction. Where a DM address is referred to, what is meant is the value contained in that DM position.
Assembler
The Assembler is a translator for Assembly: it creates machine code by converting the Assembly mnemonics into opcodes. But it does more than that. The Assembler provides directives which, when used in an Assembly program, lead it to perform certain procedures while translating that program. In our small Assembly program we’ll use the EQU (EQUal) directive. This directive does not generate machine code instructions; it only tells the Assembler that the symbolic constant that comes before the directive has the value that comes after it.
Using this directive saves the developer from writing the same values in several places. In addition, a name that means something to the developer makes it clearer where the constant belongs in the program. And if the developer ever has to change the constant’s value, it is changed only once, in the directive. When processing the program, the Assembler replaces the name of the constant with its value.
To define jump addresses, the Assembler introduces the concept of labels. These are placed before the Assembly instructions to which jumps may be performed. When processing the program, the Assembler assigns to each label the address in the IM where the labelled instruction lies, and then replaces every reference to the label in the program with that value. This way, every time the developer changes the program there is no need to change all the jump addresses, because the Assembler will assign the correct values to the labels.
The Program
Now we are going to develop a small program to calculate the factorial of a number, starting by working out the algorithm, then converting it into an Assembly program and finally writing it in C.
To illustrate its execution we’ll follow the opposite path, decomposing the C instructions into Assembly, then these into opcodes and finally into frames illustrating the CPU operation and highlighting the circuits used for it.
The factorial of a number is the result obtained by successively multiplying that number by every positive integer below it, down to 1. For instance, the factorial of 4 is represented by
4! = 4 x 3 x 2 x 1 = 24
In order to calculate it we must multiply 4 by 3 (4-1) and then multiply the result by 2 (3-1). Multiplying by 1 is needless since it doesn’t produce any effect. Thus the operation is finished when we multiply by 2.
The Program Algorithm
In order to develop the algorithm we simulated this operation on a calculator which only allows one operation at a time and has 2 memories, M1 and M2. We used 2 variables, factorial and temporary, and an iterative method to execute the operation.
The variable factorial represents the result at the end of each iteration. The variable temporary represents the value by which factorial will be multiplied in the next iteration.
These assumptions lead to the following description of the algorithm:
1. The value of the number whose factorial is to be calculated is assigned to a constant N.
2. M1 is defined as the memory where factorial will be kept.
3. M2 is defined as the memory where temporary will be kept.
4. The value of N is written in the display.
5. The value in the display is assigned to M1, which holds the value of factorial from now on.
6. The value in the display is decremented (1 is subtracted from it).
7. The value in the display is assigned to M2, which holds the value of temporary from now on.
8. The value in the display is compared with 2.
9. If the value in the display is smaller than 2, we jump to step 14.
10. The value in the display is multiplied by the value in M1.
11. The result, which is in the display, is assigned to M1.
12. The value in M2 is recalled to the display.
13. The operation is repeated starting from step 6.
14. The operation is terminated. The value of factorial kept in M1 is the final result of the calculation of the factorial of N.
The Program in Assembly
Fctr and Temp are the addresses of the DM positions [Fctr] and [Temp], where the values of factorial and temporary are kept, taking the roles of the calculator’s memories M1 and M2. RegA, which in Assembly is designated simply by A, takes the role of the calculator display. We will use EQU directives to assign values to N and to the addresses Fctr and Temp. In Figure 16a we can see the program in Assembly.
We can see the EQU directive used in the first 3 lines of the Assembly program to assign values to constants that will be used throughout the program.
We can also see labels placed before some of the instructions to which jumps will be performed during the program, as is the case of Repeat and End.
The Program in Machine Code
When processing the Assembly program, the Assembler starts by establishing values for directives and labels.
In Figure 16b we can see those assignments.
The Assembler reads the whole Assembly program, finds the Jump instructions that use labels and assigns to those labels their addresses in the Instructions Memory.
The remaining assignments are a consequence of EQU directives.
After that the Assembler creates the machine instructions, with the correspondence we can see in Figure 16c.
Just look at the left column and imagine yourself writing a program in machine code without the help of the mnemonics.
The Program in CPU Operations
We can see all the operations that the CPU executes to fulfill this program in Figure 16d.
If this program were meant to calculate the factorial of 10, for instance, the machine would have to perform 48 more operations, since it would have to repeat the iterative cycle of 8 instructions 10 times, whereas in our case it was repeated only 4 times. Thus we can conclude that the jumps and iterations we implement in our programs have to be executed by the machine as individual operations in every iteration.
Let’s get back to the analogy with our electric model train. Suppose we wanted it to run each of 8 different paths in the layout 50 times, going through them in sequence. We would define its task like this:
1. Run path 1.
2. Run path 2.
3. Run path 3.
4. Run path 4.
5. Run path 5.
6. Run path 6.
7. Run path 7.
8. Run path 8. (During this path the train triggers a lever that increments a value on a display.)
9. If the value on the display has reached 50 you can stop.
10. Go back to step 1.
We could summarize its task in 10 instructions, but the poor train cannot summarize the 400 laps that it will have to run through the layout, along 8 different paths, to accomplish its task.
In the table of Figure 16 we represent all the CPU operations, relating each one to
- its step in the algorithm,
- its Assembly instruction,
- its machine code (opcode/constant),
- the value in [Fctr],
- the value in [Temp],
- the value in RegA and
- the constant value in the machine code instruction.
Since a large program will hardly ever be developed in Assembly, what would this small program look like when written in a high level programming language?
The Program in C
We will choose C as the high level programming language in which to develop this program. The level of a programming language is inversely related to how closely its instructions match the machine code instructions: the more they match, the lower the level of the language; the less they match, the higher the level.
A C program is composed of functions. The instructions belonging to each function are enclosed in curly brackets {}, which define the content of the function. Functions are pieces of program that execute specific tasks; they are called from other functions to execute their task and return its result. In the defining shape of a function in C, type name (arguments), the several components tell us:
- The type of the result returned by the function, i.e. whether it is an int (integer), a char (character), a bool (1 or 0), etc.
- The name of the function, which can be any valid name. The name main is reserved for the function that is fetched and executed by a C program when it starts.
- The arguments of the function, i.e. the values of the variables that the function uses in its execution.
When a C program starts running it looks for the main function and executes it. The other functions are called from within it and, in succession, from within each other, always returning to the caller, until the main function is reached again.
The diagram in Figure 15 illustrates how a function is called from within another, what is asked of it and how. The function
int mult (x,y)
which multiplies the argument x by y, is called from within another function with
mult (4,3)
i.e. it is asked: how much is 4 times 3? In return, the caller receives the result of the called function executed with the values it was sent as arguments,
int a = z=12 = 4*3
or the answer to the question: it’s 12.
After this very brief introduction we can develop our C program, which might look like Figure 15a.
The main function has type int (integer), which is the type of the value it returns. Since main has no calling function within the program to return a result to, it was agreed that when main returns 0 the program has executed properly. If the return value is not 0, there was an error. Hence the last statement of the program (in the main function) is
return(0);
The first line is the definition of a symbolic constant N which is assigned the value 4
#define N 4
The 2 following lines are the declaration and initialization of 2 variables with the type int
int factorial = N;
int temporary = N-1;
The CPU knows nothing about variables. For it there are only memory addresses where values are saved. When in C we declare int factorial = N, the compiler allocates a space in memory for an int, records its address and associates it with the name, here represented by the pointer [Fctr]. It then stores in that space the value of the symbolic constant N, i.e. it initializes the variable factorial by writing a value into its position in memory. Whenever the variable factorial has to be referred to the CPU, what is actually referred to is [Fctr], i.e. the memory position where factorial lies.
The same applies to the declaration int temporary = N-1 and the variable temporary.
If these two variables hadn’t been declared, any reference to them would produce an error, because the compiler, when looking up the memory address for those names, wouldn’t find any in its list.
Thus, when in C we refer to a variable, what we are actually referring to is a pointer to the position in memory where it is saved.
Just to complete the information, C also allows the developer to allocate those spaces in memory directly. When this is not done, the compiler takes care of it and allocates the necessary spaces in memory. That is what happens here.
We’ve referred to the compiler several times. The Compiler is a program which converts source code, i.e. a program written in a high level programming language, into machine code.
Line 4 inside the main function contains the instruction
while (temporary >= 2){factorial=factorial*temporary; temporary--;}
which is an instruction with the shape
while (this is true) {do that}
where this is the predicate to be evaluated and that stands for the statements to be executed. While represents an iterative cycle, or loop, which consists of repeating the instruction body. The instruction body consists of the statements in curly brackets {} after the instruction. Before each pass through the instruction body, while checks whether the predicate evaluates to true. If so, the body is executed once more. If not, execution jumps past the instruction body and proceeds with the next program instruction.
The loop begins with the initialized value of temporary
int temporary = N-1;
and the statements to execute within the instruction body are
factorial=factorial*temporary;
which assigns to factorial the result of multiplying itself by temporary and
temporary--;
which decrements the value of temporary, i.e. subtracts 1 from it. When the value of temporary is less than 2 the program jumps out of the body of this instruction to the next program instruction, the final one
return(0);
The reason for introducing these concepts about C is the association we are going to establish further on with a program developed in a high level programming language. This is not meant to be an introduction to C, which is not our purpose, at least for now. We only intend that everyone can understand what is going to happen next.