File Systems
The developments described in the previous Chapter give us a huge amount of mass storage on a small platter, but organized in such a way that only the HDD controller can handle it. These controllers, with their enormous logical complexity, are the secret behind the success of many manufacturers and the result of continued research by thousands of engineers.
How do we deal with that data when we need to access an HDD? Do we have to refer to the CHS (Cylinder, Head, Sector) address of each piece of data, and to how it is stowed, in order to access it? Do we have to list all the sectors where a given set of data is stored? And how would we know in which sectors the HDD controller put it?
The answer to all these questions is obvious: we don’t know how to deal with an HDD at this level. We can’t imagine developers stating in their programs the physical places where the data they need must be read or written.
This is where the OS (Operating System) comes to help us, creating an abstraction that represents each set of data we deal with: the File. The File is the abstraction between humans and the HDD. A File consists of a set of data (bits) that gives shape to something meaningful to us, such as:
A program, a movie, a song, a photograph, a text, a database, this book, that little Assembly program we developed in the CPU chapter, the programs we work with, and so on.
Everything we record on an HDD takes the shape of a File. Thus, the computer's large persistent memory is composed of Files, which may hold the very basis of the computer's operation or simply stored data. Their size can range from a few bytes to hundreds of gigabytes.
Therefore, all the information we deal with is contained in Files, which the HDD keeps in Sectors spread across the disk as its controller dictates. We talk to the HDD in Files, which are what we understand and what has logical meaning for us. The HDD controller transforms them into data addressed by CHS, which only it knows.
The Cluster is one more logical abstraction that we humans created in order to deal with the HDD. A Cluster is a block of data composed of a specified number of sectors, from 1 to 64.
But if we already had sectors, a physical unit grouping many bytes, why create another logical unit grouping sectors?
To understand this, it is important to note that each Cluster can contain only one File or a part of one. In other words, a File can be spread over many Clusters, but a Cluster can hold data from only one File. A File, even if it is only 1 byte in size, takes up a whole Cluster. Grouping sectors into Clusters gives the File System fewer, larger units to keep track of, at the price of some unused space in the last Cluster of each File.
The Cluster size is fixed for a given HDD, i.e. it is the same across the whole HDD. However, we can choose different Cluster sizes for different HDDs that will store different types of files.
To make this possible we don’t even need multiple physical HDDs because, as we shall see, a single HDD can hold multiple Volumes, corresponding to partitions of that HDD, each of which is treated as an individual drive. This way the same HDD can have different Cluster sizes, one for each drive it holds.
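The Cluster rule described above can be made concrete with a little arithmetic. The following is only a minimal sketch of the simplified model used in the text: it assumes classic 512-byte sectors, the function name cluster_usage is ours, and it ignores any special handling a particular File System may apply to very small files.

```python
BYTES_PER_SECTOR = 512  # assumption: classic 512-byte sectors


def cluster_usage(file_size, sectors_per_cluster,
                  bytes_per_sector=BYTES_PER_SECTOR):
    """Return (clusters_used, allocated_bytes, slack_bytes) for one file.

    Simplified model from the text: a Cluster holds data from only one
    File, so the last Cluster of a File is usually only partly filled.
    """
    if file_size < 1:
        raise ValueError("file_size must be at least 1 byte")
    if sectors_per_cluster < 1:
        raise ValueError("sectors_per_cluster must be at least 1")

    cluster_size = sectors_per_cluster * bytes_per_sector
    # Ceiling division: a partly filled Cluster still counts as a whole one.
    clusters_used = -(-file_size // cluster_size)
    allocated = clusters_used * cluster_size
    slack = allocated - file_size  # bytes reserved but not used by the file
    return clusters_used, allocated, slack


if __name__ == "__main__":
    # The same 10,000-byte file on Volumes with different Cluster sizes.
    for spc in (1, 8, 64):
        used, allocated, slack = cluster_usage(10_000, spc)
        print(f"{spc:2d} sectors/cluster: {used} cluster(s), "
              f"{allocated} bytes allocated, {slack} bytes of slack")
```

Under these assumptions, small Clusters waste little space but leave more units to track, while large Clusters do the opposite. This is why it can make sense to choose the Cluster size of each Volume according to the kind of files it will store.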
For us and for the OS, an HDD contains data organized into files. For the HDD controller the files are transparent: they simply don’t exist. For it there are only Sectors, arranged in CHS geometries, where it writes or reads the data we want to access, the so-called files, stowed in its own way, certainly the most efficient for it in physical terms. The OS talks in Files and the HDD controller talks in CHS.
Nowadays it is not possible to teach the OS to address the HDD in CHS, because HDD geometry has evolved in such a way that each manufacturer has its own geometry and algorithms, which only its controllers know. So another abstraction was created to let the OS address the HDD, and to let the HDD tell the OS the addresses of the data it keeps: LBA (Logical Block Addressing). The OS addresses its Files on the HDD through LBA, which the HDD controller translates into CHS. Left in the middle is an outdated BIOS, which exists only for compatibility with older versions. We’ll see how.
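To give an idea of what translating between LBA and CHS means, here is a minimal sketch of the classic conversion for a fixed logical geometry. It is not what a modern controller does internally, since that mapping is known only to the manufacturer. The default geometry of 16 heads and 63 sectors per track is an assumption taken from legacy conventions, and the function names are ours.

```python
def chs_to_lba(cylinder, head, sector,
               heads_per_cylinder=16, sectors_per_track=63):
    """Classic conversion: LBA = (C * HPC + H) * SPT + (S - 1).

    Cylinders and heads are numbered from 0, sectors from 1.
    The default geometry is an assumed legacy one, not a real drive's.
    """
    if cylinder < 0:
        raise ValueError("cylinder must be >= 0")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range for this geometry")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range for this geometry")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track \
        + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder=16, sectors_per_track=63):
    """Inverse conversion for the same assumed logical geometry."""
    if lba < 0:
        raise ValueError("lba must be >= 0")
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1  # sectors are numbered from 1


if __name__ == "__main__":
    print(chs_to_lba(0, 0, 1))  # the very first sector of the disk
    print(lba_to_chs(2048))     # a block address expressed as (C, H, S)
```

With LBA the OS sees the HDD simply as a sequence of numbered blocks, starting at 0, and leaves to the controller the question of where each block physically lies.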
But we will not address files in LBA when we want to access them. We only tell the OS which file we want and its path. The rest is up to the OS. File Systems exist so that each OS can organize the files on the HDD. And not only that: nowadays a File System must not only organize the files, but also enforce the security and access restrictions imposed on them, as well as the consistency of their organization.
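From the program's point of view, this is all there is to it. The short sketch below writes a file and reads it back knowing nothing but its path; the file name example.bin and the function name roundtrip are ours, chosen only for illustration. At no point does the program mention Clusters, LBA or CHS.

```python
import os
import tempfile


def roundtrip(data: bytes) -> bytes:
    """Write data to a file and read it back, using only a path."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "example.bin")  # hypothetical file name
        # We name the file; the OS and its File System decide where
        # on the disk the bytes actually go.
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(roundtrip(b"The File is the abstraction."))
```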
Few words for a multitude of work. In this Chapter we’ll try to show how it is done, within reasonable limits and with a specific goal in mind.
Our analysis will focus in most detail on a widely deployed File System, NTFS (New Technology File System), the standard Windows File System on personal computers since XP. At the Windows Server level, Windows Server 2012 introduced a newer alternative, ReFS (Resilient File System).
We will follow NTFS through the successive lookups it makes in its B-tree index until it reaches a file of our choosing. Using hexadecimal editors, we’ll analyze at the byte level (and at the bit level where needed) the composition of the several Volume sectors, whose positions in the Volume will be given in LBA: how they tell us what they are, what their characteristics are and where we should go next. We’ll analyze the metadata files, the file attributes and their headers, the MFT (Master File Table) and its entries (one per file), the Index files, the Data Run Chains and a lot more.
We’ll see what formatting an HDD means, and its two levels. We’ll start with the MBR (Master Boot Record), which is independent of any OS and gives us the information needed to boot the system: the composition of the HDD's Partitions and the Partition where the Operating System is installed.
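As a first taste of that byte-level reading, here is a minimal sketch that extracts the partition table from the 512 bytes of a classic MBR sector. It relies on the well-known layout: four 16-byte entries starting at offset 446, and the signature bytes 55 AA at offsets 510 and 511. The function names and the sample values (a single active partition of type 0x07 starting at LBA 2048) are ours, built in memory for illustration and not read from any real disk.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446  # 0x1BE
ENTRY_SIZE = 16
# status (1 byte), 3 skipped CHS bytes, type (1 byte), 3 skipped CHS bytes,
# first LBA (4 bytes), number of sectors (4 bytes), all little-endian.
ENTRY_FORMAT = "<B3xB3xII"


def parse_mbr(sector: bytes):
    """Return the non-empty partition entries of a classic MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR sector must be exactly 512 bytes")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("boot signature 55 AA not found")

    partitions = []
    for index in range(4):
        offset = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        status, ptype, lba_start, num_sectors = struct.unpack_from(
            ENTRY_FORMAT, sector, offset)
        if ptype == 0x00:  # type 0 marks an unused entry
            continue
        partitions.append({
            "index": index,
            "active": status == 0x80,  # 0x80 flags the bootable partition
            "type": ptype,
            "lba_start": lba_start,
            "num_sectors": num_sectors,
        })
    return partitions


def make_demo_mbr() -> bytes:
    """Build a synthetic MBR in memory (illustrative values only)."""
    sector = bytearray(SECTOR_SIZE)
    struct.pack_into(ENTRY_FORMAT, sector, PARTITION_TABLE_OFFSET,
                     0x80, 0x07, 2048, 204800)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    for part in parse_mbr(make_demo_mbr()):
        print(part)
```

Reading the first sector of a real HDD instead of the synthetic one usually requires administrator rights and a device path that depends on the OS, which is why the sketch stays in memory.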
Then we go to the PBR (Partition Boot Record) of that Volume (partition), where the BIOS hands control of the computer over to the OS, and from there we start the path through the B-tree to the selected file.
At the end of this Chapter we’ll take a brief look at the FAT (File Allocation Table) File System.
This Chapter is a long journey, passing through several levels of knowledge across several parts, for which we’ll prepare you. We invite you to go as far as you wish and your interest takes you, knowing that at the end we’ll find the file we are looking for and you will know how to find any file on your HDD using its hexadecimal editor representation.
Enjoy your journey.
Here we present the table of contents of the paper book, which describes the themes covered in this Chapter.