The Logic of the Cache access in Detail
This article is about the description in detail of what goes on inside the logical circuits of a Cache memory in its different mapping situations. It’s not a the designated commands, in order that only those who are interested have to read it.
The description that follows is supported into any of the images which you can found under the section each of the mapping situations. The notes (x) in these figures will be referred only by their numbers, omitting the figure number. You can open the Figures in another window in order you can follow the description with them.
Everything in the Cache is resolved in hardware. Thus, it is important to imagine the logic to implement in circuits in order to reach the goal of being able to select a specific Byte inside a Cache, using only a MM address. We are going to do this exercise taking into account all the mapping methods. The Block (row) of our imaginary cache is going to have 8 bytes, or 64 bits. We will give it only 4 rows, just to make the analysis simple. The MM will be the one we used in the beginning of the previous chapter, with 4KB size.
Let us begin by watching the signals MMU sends when it queries the cache:
- The address of the value it wants, designated by the bits A0 to A11 (12 bits for 4KB), is placed on the input register (7).
- The signal CS (Chip Select), through which the MMU tells the cache it wants to access: “Hey! I’m talking to you!”
- The signal RD (ReaD), active at 0, when it’s intended to make a reading.
- The signal WR (WRite), active at 0, when it’s intended to make a writing.
- The signal WRBL (WRite BLock), active at 0, when it’s intended to write a block it got from other level of the cache or from the MM.
- The signal WRVL (WRite VaLid bit), active at 0, when it intends to write the Valid bit value on a cache block.
The 3 bits with less weight, A0, A1 and A2, which designate the offset, are connected to the selection bits of a NAND gate MUX (2), that sets as active low(0) only one of the outputs, the one matching the column identified by those 3 bits. They are 3 bits, because we only have 8 (23) bytes in the block. If we would have 64 (26) bytes in the block then we would have to choose the lowest weight 6 bits.
The outputs of the MUX gates will connect to the AND gates where the other input comes from WRBL, which is sent from the Control Unit each time that a block is written in cache (1). When WRBL is active (0), all the AND gates return 0. If WRBL is inactive (1), only the gate selected by the MUX (2) returns 0. The strategy purpose is to make all the MUX outputs (2) low (0) when WRBL is active (0), making this way possible the activation of all tristate gates of the input lines and allow the block to be fully written.
The outputs of the AND gates will connect to 2 OR gates each, where its signal is analyzed together with the signals RD or WR/WRBL in the gates that activate the reading or the gates that activate the writing, whichever the case.
These signals come carefully treated from the circuit Control Unit (1). This way ends up with a column selected for RD or WR or all the columns selected for the writing in a block.
The Control Unit
To this Control Unit connect all the signals sent by the MMU, CS, RD, WR, WRBL e WRVL and another signal that comes from a Validation Control (5) which we will call VL.
In this Control Unit (1) the signals RD and WR are validated by the signals CS and VL, which will connect with those in two OR gates. Whichever signal is active low (0), thus when any of the gates returns 0 it is because there is a writing or a reading being validated. The WRBL signal is validated with CS in an OR gate, which output will directly connect to the AND gates after getting out of the MUX (2). The validated WRBL signal will also connect along with the validated WR to an AND gate that returns 0, unless any of the signals is active (0). The output of this gate is the one that connects to the writing validation OR gates.
The signals WRBL and WRVL will be handled together:
- By a NAND gate, that only returns 0 when both are inactive (1).
- By an AND gate that only returns 1 when both are inactive (1).
The outputs of these gates will be used for the selection of the row in the writing situations of a block or a valid bit, in alternative to data writing or reading. We’ll see how, in every mapping situation.
This Control Unit (1) still has one more output, but this time for the exterior, in particular to the MMU. Using an OR gate, that validates the value of VL with CS, it produces an output that is active low (0) and that we will call CH (Cache Hit). Therefore, if CS is selected, when VL becomes 0, CH will be 0 and we will have cache hit. If VL is 1, then CH will be 1 and we will have a cache miss. All this information is connected to the MMU.
The objective of the Validation Control (5) is generically to verify the result of the comparison of the label, the value of the valid bit and the value of the selected row. These 3 values will be connected to AND gates, one for each cache row. If all are valid (1), while connected to AND gates (one for row), this gate will have output 1. All AND gates are connected to a NOR gate. NOR gates only return 1 if all the inputs are 0. If only one AND gate returns 1 (only one can), then the set VL output (5) will return 1, the value in which VL is active to the graphic (1). In this case, we will have a cache hit and the fetch is valid, i.e. a RD or WR signal can be validated and a cache hit can be communicated, what is done by the Control Unit (1).
We referred a validity bit but we still haven’t said what it is for. When a computer starts, the cache will have undefined and incoherent values in its lines and its label bits. It can happen that when a fetch is done, by pure chance to find the intended address value, a cache hit, without the positions value having anything to do with all the work being done. To prevent these situations, every time the cache is initialized, the valid bits are all put to 0, only ever becoming 1 as values dependent from the current work being done are being written in the blocks.
Until now, we referred general questions that apply to all mapping methods. From now on, each of the methods will have specific information and they will be handled in proper sub chapters for the designated mapping method.
In Figure 1 it’s represented a graphic of our cache control circuit in direct mapping and in Figure 2 we have the detail of the access to the label and valid bit cells. In the analysis that we are doing the notes (x) in these figures will be referred only by their numbers, omitting the figure number. You can open the Figures in another window in order you can follow the description with them.
RD and WR Situation
A3 and A4 are the address bits that compose the index, which will be used for the line selection. In this regard, they are connected to a MUX (4) which activates (active high) the output correspondent to the line selected by the index.
Once activated the line, the connection of the label cell bits B5 to B11 to a comparator entrance (3) is established. To the other input are connected the equivalent address bits, A5 to A11.
The comparator (3) will provide as an answer the signal Ig (Equal) which is 1 if they are equal and 0 if they are different. In our case Ig=1.
This Ig signal, along with the MUX outputs (4) and the valid bits values of each row will connect to the Validation Control Unit (5). The behavior of this circuit (5) has already been described in the generic section for which we refer to that description.
Let’s now analyze in detail how the label and the validity bit are read in Figure 2. Let’s reduce the size of the label to 3 bits just for the purpose of this analysis.
The label and the validity bit cells are identical to the data cells. In reality it’s data that they keep, just with a different meaning. It’s reading these cells that is detected the block is validated and the label is compared with the address value.
During both RD and WR signals, the valid bit and the label cells are both read for validation. As to data cells, RD and WR go with the specific function of each one, already confirmed with the VL signal of the unit (5) which we’ll obtain by reading the valid bit and label cells (5). For that purpose, the reading output of the label cells is connected to one of the comparator circuit inputs (3). The equivalent address bits are connected to the other input.
The cells of each column that at each moment will connect to the comparator are defined by the line selection. It’s the active line that opens the transistors allowing the communication of values in the cells to the reading line.
The selected cells are the block (line) label cells in which a reading or writing is intended, thus the selected and active line.
To the Validation Control Unit (5) input AND gates will connect:
- The Ig signal returned by the comparator (3) to all the AND gates. It is the same for all the lines, being at each moment comparing the active line
- The reading lines of the validity bit cells, each to an AND gate concerning its line. The value of the line itself, to the AND gate that concerns it.
- The value of the line itself, to the AND gete that concerns it.
It is only of our interest the gate that refers to the active line, therefore, to the validity bit being read and to the Ig of the comparison with the label of that line.
If for the active line these 3 conditions are true (1), then its AND gate will return true (1), what will make VL=0, in other words, active. VL=0 shows a cache hit when low, therefore it will validate the RD or WR signal, whatever is being executed at that moment.
For this type of mapping, we already have in this stage a line selected, as we have described. The VL value found will or not allow the opening of the input or output tristate gates of the selected column, whichever is a WR or a RD.
WRBL and WRVL Situation
We are here dealing with a change to the content of the block (line) made by the MMU:
- Because, in the sequence of a cache miss, it got a missing block from the MM and intends to write it.
- Because it intends to change the value of the validity bit.
When a Cache block is written, two different situations will happen:
- The label will be written, registering the correspondent address value of that block.
- The cache line (block) is totally written, what means that all the columns are written.
In the case of direct mapping, which we are now analyzing, the line where the new block will be written is the line referenced by the address index which generated the cache miss.
As we have seen before, in direct mapping, each index matches only one possible line in cache.
The way the WRBL signal acts so that all the columns of the line stay active for writing, has already been referenced in the generics section, reason why we refer to it.
For the label to be written, WRBL has to be connected with the tristate buffers that activate the write operation in its cells.
In the same way, for the validity bit to be written, WRVL has to be connected to the tristate buffers that activate the write operation in its cells.
To write the validity bit in any situation independent of a block writing, the MMU only has to identify the line where it wants to execute it in the address index and turn active the WRVL signal.
Full Associative Mapping
In Figure 3 it’s represented a graphic of our caches control circuit in associative mapping and in Figure 4 it’s represented the detail of the access to the label and validity bit cells and the line selection in a write operation of a block or validity bit. In the analysis that we are doing the notes (x) in these figures will be referred only by their numbers, omitting the figure number. You can open the Figures in another window in order you can follow the description with them.
In Figure 3 it’s represented a graphic of our caches control circuit in associative mapping and in Figure 4 it’s represented the detail of the access to the label and validity bit cells and the line selection in a write operation of a block or validity bit.
In this case there is no line selector stated with (4) in the previous picture, since there is no index in associative mapping.
The search for a line is done by comparing the label with the address, using a comparator for each block (line) of the cache.
In the beginning all the lines are selected in the labels and validity bits cells section.
It’s based in the result of the Ig signal from the comparators (3) and the validity bit value, that the line which serves the data cells, i.e. the block that has the required byte, is selected and activated.
The result from the reading of each block label, will connect to one of the of the comparator (3) inputs dedicated to that block (line). To the other comparators input (3), are connected the address bits intended to be compared. The Ig output signal of each of these comparators will be connected together with the validity bit of the compared block (line) to an AND gate in the validation control circuit (5).
We analyzed in the generics section how this circuit works (5), reason why we forward this question to it.
This circuit the will output the VL signal that will allow the action of the RD or WR signals over the selected byte, depending on a cache hit. If there’s a cache miss, the CH signal (CH=1) will notify the MMU about it. While the MMU looks in another Cache level or in MM, the MMU does the necessary procedures for the cache block substitution algorithm. It will determine which block will be substituted and it will activate in exclusive the chosen block line in the label and validity bit cells section, as we can see in Figure 4.
The WRBL and WRVL signals are connected together to an AND gate, guarantee that its output signal is 1 as long as both are inactive (1). This value connected to the OR gates, where the other input belongs to the line selector (4), forces all the outputs to set the lines high (1).
But, when any of the WRBL or WRVL signals is inactive, AND returns 0 and that value, connected to the OR gates, allows them to return a value equal to the other input, since 1 OR 0 = 1 and 0 OR 0 = 0, prevailing the activation of the line corresponding to the active output of the MUX (4).
The MUX (4) selection bits will be the address bits L0 and L1 of the line chosen by the MMU to put the new block.
The data line activation happens normally by the previous process, since the block label is equal to the address bits to be compared and, as consequence, will allow the activation of that line. The data column activation will be done in the same way described for the direct mapping.
In the case of WRBL and since written the block and its label, the cache proceeds with RD or WR, the one that originated the cache miss.
Set Associative Mapping
In Figure 5 it is represented a graphic of our cache control circuit in set associative mapping and in Figure 6 we have the detail of the access to the label, the validity bit cells and the line selection for a block or validity bit writing.
This case is a mix of the two previous situations. We have 2 sets of direct mapping fully associative between them.
We start by defining the active lines, which will always be one in each set, although for this situation the lines will be defined with independent MUX for each set. With these MUX, we find one line active by Way.
Each Way (one Way is one set) has a dedicated comparator (3). In one of the comparator inputs (3) are connected its Way label cells reading lines. In the other input are connected the address bits to compare. From the two comparators only one can grant a coincidence value, if any even does.
The active line selection in the data area is treated in a similar manner as the previous associative cache. In short, each line connects to an AND gate along with the valid bit and the Ig signal from the comparator of the Way the line belongs to. When all are 1, the AND gate returns 1 and activates the line.
These same values will connect to a validation controller (5) previously described.
Case there is a situation where all three values are 1, a cache hit happens and VL assumes the value of 0, therefore allowing RD and WR to execute their functions.
In the negative case, a cache miss happens. Therefore VL and CH have a value of 1, telling the MMU about the cache miss.
While it fetches in another Cache level or in MM, the MMU executes the necessary procedures for the cache block substitution algorithm. It will define which block will be substituted and will activate in exclusive the chosen block line in the validity bit and label cells area, as we can see in Figure 6. In a 2 Ways associative cache with 2 blocks by Way, the selection is done by a 1 to 2 MUX, what is solved with NOT gates, not allowing an analysis with some stringency of the way how the intended line is chosen, in the RD or WR situations as well as in the WRBL and WRVL situations.
For this reason we have put in a graphic apart the way to make this selection. Figure 7 illustrates the selection of a line in a set associative cache with 4 Ways and 4 blocks by Way.
In the case A (right side) we have a RD or WR operation, being selected 4 lines, one in each Way, by the A3 and A4 bits that define the index, equal in all the selected lines.
In case B (left side) we have a cache miss and the MMU now intends to write a block in a line it chose between the 4 possible, more specifically the line defined by A3 and A4 in the Way 2 (1 0), which defines using the V0 and V1 bits.
The same circuit has to work in both situations. In situation A both WRBL and WRVL are inactive (1). In situation B at least one of them is active (0). This difference is what distinguishes the choices we have. An input of 1 in a 1 to 4 MUX (3) with AND gates lets its outputs maintain a value matching the selection. A 0 input forces all the outputs to be 0.
After all this is what we want. All with their outputs defined by the selection (A) or just one with the outputs defined by the selection and the remaining at 0 (B).
All the MUX defined by (3) need to have an input of 1. But we must be able to convert that input into the value of the Way selection MUX (1) output.
This Way selection MUX (1) is a MUX with 4 output NAND gates, in which all the outputs are high except the selected one.
If we put this MUX outputs as inputs of NOR gates together with the output of an AND gate where the inputs are WRBL and WRVL, we verify that:
- When they are both inactive (1), the AND gate output forces all NOR gates to 1.
- When at least one is active, the AND gate output no longer forces the NOR gates which will have in their outputs the inverse of the MUX (1) outputs, i.e. the selected Way active high (1) and the remainder low (0).
Therefore we only need to connect the NOR gates outputs to the respective Way MUX (3) inputs and we have what we intended.
Finished the label and block write operation in cache, the MMU reactivates the RD or WR process interrupted by the cache miss, with the address that is still found in the address register, now with a guaranteed cache hit.