Hard Drives and Linux

Linux does hard drives astonishingly well. So well, in fact, that most hard drive rescue services default to Linux because its toolset is so rich.

TL;DR
If you do not want to go into hard drive forensics for a living, you may be most interested in:

  • kvpm, a graphical hard drive manager
  • udf, a truly universal format that you should use for external drives under 2TB in size

Hard drives can broadly be classified as either internal or external. Internal drives tend to be easier to manage, because they are not portable; they are integrated in some way with your operating system, even if it is just extra storage space that you use on an as-needed basis.

Here is a profile of your internal drive:

Internal Drives

The drive inside your Linux computer is either an SSD or a traditional spinning disk drive, or you possibly are using both. Your drive(s) can be seen as one pool of available disk space by Linux, or conversely one drive can be seen as several different, artificially-separated disks even though it's actually one physical drive. Computers are able to perform this perceptual magick through “partitions”: imaginary boundaries that an OS agrees to respect as an (air-quotes) Different Disk than the rest of the drive (or, when disks actually are physically separate, the computer can see all of the disks as [air-quotes] One Big Drive).

There is nothing special about hard drives in terms of storage except that they are relatively fast and efficient in managing a lot of information. Abstractly, though, they are no different than, for instance, a tape archive (or a quarter-inch tape, or similar media). In fact, hard drives can be used directly as, basically, a tape drive.

Do not try this at home; you could erase important data.

Assume we have attached a drive and it appears as /dev/sdx on our computer. Raw data can be written to the drive without any formatting, and independent of any file system:

# echo 'Do not try this at home unless you really know what you are doing.'
# echo "hello" | dd of=/dev/sdx
# head -c 5 /dev/sdx
hello#

The data “hello” was written as raw bytes on the drive. It was not written as a file, so if you plug the drive into another computer, it will not look like there is anything on the drive (if the computer even understands how to mount the drive), but the string “hello” is still on the drive. It's just written as raw data.
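The same experiment can be repeated safely by letting an ordinary file stand in for the drive. The sketch below is only an illustration: the temporary file, its 4096-byte size, and the variable names are assumptions made for the example, and no real device is touched.

```shell
#!/bin/sh
# A regular file stands in for /dev/sdx, so nothing on a real drive is at risk.
disk=$(mktemp)

# Fill the stand-in "drive" with 8 blank 512-byte sectors (4096 bytes).
dd if=/dev/zero of="$disk" bs=512 count=8 2>/dev/null

# Write raw bytes at the very start; conv=notrunc keeps the rest of the file.
printf 'hello' | dd of="$disk" conv=notrunc 2>/dev/null

# Read the first five bytes back. No file name is involved, only a byte count.
raw=$(head -c 5 "$disk")
size=$(wc -c < "$disk")
echo "read back: $raw"

rm -f "$disk"
```

The point of the exercise is that the reader has to know where the data starts and how long it is; nothing on the "drive" records that.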

This is actually how data was stored for a very long time, but eventually the disadvantages became too much to bear and someone invented a “file system”, which does exactly what its name suggests: it creates a system for managing files. With a file system, you don't need to know the exact byte count to retrieve your data off of a hard drive; the file system keeps track of that for you. Rather than reading raw data, bytes 1 to 5, for the string “hello”, we could instead just make a request to our drive for our “hello” file, and no matter where that file is or how many times we have revised and added to it, the computer can quickly and easily find it on the drive and show it to us.

Your Linux drive is running an open source file system, probably either ext4 or jfs. Although these file systems are open source and free for anyone (person or corporation) to use, the major closed source operating systems decline to include support for them. This means that such drives are recognised as being blank or simply “un-readable” by another OS. It's advantageous to use these file systems nevertheless because, aside from being some of the most stable on the market, they have features that help Linux run efficiently.

Once a drive has a file system, it can be mounted and used by your operating system. On Linux, when you attach a drive to your computer (internally or externally), the drive (upon detection) is assigned a node in the /dev directory. Drive nodes are dynamically created on an as-needed basis. Internal drives are assigned nodes as they are detected by the system, which is usually predictable after you notice the pattern but really it depends on the motherboard and which slots your drives are plugged into.

The first detected drive is assigned /dev/sda. The “sd” prefix denotes the type of drive it is (actually it's historically inaccurate, but as a revisionist you can think of “sd” meaning “Sata Disk”), and the “a” is the first letter of many alphabets. That node represents the physical drive itself, which is different than what's on the drive. If a partition is found on that drive (most drives have at least one partition), then it gets a node /dev/sda1. If there is yet another partition, then it would be assigned /dev/sda2.

The second drive found gets /dev/sdb and its partition /dev/sdb1, and so on.
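The naming pattern is regular enough to take apart mechanically. As a rough sketch, the helper below (describe_node is a made-up name for this example, not a standard tool) splits a node name into its disk and partition number. It assumes the sd-style names described here; other schemes, such as NVMe device names, follow a different pattern.

```shell
#!/bin/sh
# describe_node is a hypothetical helper; it only understands sd-style names.
describe_node() {
    name=${1#/dev/}                                  # strip the /dev/ prefix
    disk=$(printf '%s' "$name" | sed 's/[0-9]*$//')  # drop any trailing digits
    part=${name#"$disk"}                             # whatever is left is the partition number
    if [ -n "$part" ]; then
        echo "$name: partition $part of disk $disk"
    else
        echo "$name: whole disk"
    fi
}

describe_node /dev/sda
describe_node /dev/sda2
describe_node /dev/sdb1
```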

These nodes only represent that drives are attached. They are not directories that you can open and view data in; they are metadata about your system.

The data on the drives is used by your Linux system, and internal drives are usually automatically mounted by Linux because those drives appear in the file /etc/fstab as drives that are to be mounted upon boot.
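To see what such a file contains without touching the real one, the sketch below reads a sample in the same column layout as /etc/fstab. The sample entries and the list_fstab helper are assumptions invented for the illustration; on a real system the function could be pointed at /etc/fstab instead.

```shell
#!/bin/sh
# list_fstab is a hypothetical helper: print device, mount point, and type
# for every entry, skipping blank lines and comments.
list_fstab() {
    awk 'NF >= 3 && $1 !~ /^#/ { print $1, $2, $3 }' "$1"
}

# A sample file in fstab format (invented entries, not read from the system).
sample=$(mktemp)
cat > "$sample" <<'EOF'
# device     mount point  type  options  dump  pass
/dev/sda2    /            jfs   rw       1     1
/dev/sda1    /boot        vfat  rw       0     2

/dev/sdb1    /home        jfs   rw       1     2
EOF

entries=$(list_fstab "$sample")
echo "$entries"
rm -f "$sample"
```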

The take away points from this overview of your internal Linux drives are:

  • Use ext4 or jfs or any Linux-native format for drives that are directly a part of your operating system to avoid unexpected results in how your OS works.
  • Drives get nodes in /dev based on when they were detected, and how many partitions exist on them.
  • Internal drives are automatically mounted by the system only if they are listed in /etc/fstab.

There is no quiz on this, but now you know.

Formatting Internal Drives

If you purchase an additional internal drive for your Linux computer, you probably should be using a Linux-native file system. There is probably no advantage in using a non-native file system, because it's an internal drive; other computers are not plugging into it and asking to use data off of it (and if they are, they are doing so over your network, and TCP/IP makes basically everything universal).

(The possible exception here is that you are getting a large drive that you want to use as shared storage space between a drive running Linux and a drive running some other OS. This is not recommended, but if you do this, then treat the drive as an external drive. Slackermedia does not support this, because the other OS adds a significant variable to how your data is being managed, so you are, respectfully, on your own!)

To format an internal drive for use with Linux:

Determine the device node of the drive you are going to format by first seeing what drives are already part of your system.

Use the lsblk command to view all block devices (hard drives) attached to your computer. If the lsblk command is not clear to you, or you want to double-check what it tells you, you can investigate further:

$ mount | grep '/dev/sd'
/dev/sda2 on / type jfs (rw)
/dev/sda1 on /boot type vfat (rw,fmask=177,dmask=077)
/dev/sdb1 on /home type jfs (rw)

In this example, there are two drives already in use by the system: one being used as the boot and system drive (sda), and another (sdb) used exclusively for the home directory.

Compare that list to what the computer actually has attached:

$ ls -1 /dev/sd*
/dev/sda
/dev/sda1
/dev/sda2
/dev/sdb
/dev/sdb1
/dev/sdc
/dev/sdc1

In this example, there is a third drive not in use by the system, labelled sdc. This is the new drive that needs formatting. Notice that it does have a partition on it already, but that's only because almost all drives purchased from a modern computer store are pre-formatted, presumably so that users do not have to learn about formatting themselves.
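The comparison can also be done mechanically. The sketch below uses hard-coded sample lists that mirror the example (they are assumptions, not live output); on a real system the two lists would come from the ls and mount commands shown here.

```shell
#!/bin/sh
# Sample data standing in for real command output.
attached="sda sda1 sda2 sdb sdb1 sdc sdc1"   # what ls /dev/sd* reports
mounted="sda1 sda2 sdb1"                     # what mount reports as in use

unused=""
for node in $attached; do
    case " $mounted " in
        *" $node "*)
            ;;                               # already mounted: skip it
        *)
            case "$node" in
                *[0-9]) unused="$unused $node" ;;   # an unmounted partition
            esac                             # whole-disk nodes are ignored
            ;;
    esac
done
unused=${unused# }                           # trim the leading space
echo "partitions not in use: $unused"
```

A partition showing up in this list is only a candidate; it still deserves a look at its contents and size before anything destructive is done to it.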

Keep in mind that your drive in real life could be anything from sdb to sdz, depending on how many actual drives you have plugged in. Usually, the first drive you plug in is going to come up as sdb because sda is the drive running your computer, but be aware of your actual setup and use your head. You do not want to format the wrong drive.

If you are unsure that you are targeting the correct drive, mount it and have a look at what's on it:

$ su -c 'mount /dev/sdc1 /mnt/hd'
$ cd /mnt/hd
$ ls -1
Acme Backup Pro Plus
Acme Drivers
$ df -h /mnt/hd | awk '{print $2}'
Size
2.8T

In this example, the drive is mounted at /mnt/hd (a pre-existing directory for quickly mounting drives on Slackware) and is shown to contain basically nothing, if we ignore the obligatory drivers and bloatware bundled by the vendor on the drive.

Confirming the size of the drive provides further reinforcement: yes, this really is the 3TB drive you have purchased.

With that settled, unmount (with the umount [sic] command) the drive so that you can perform surgery on it:

$ cd ~
$ su -c 'umount /dev/sdc*'
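Before going further, it can be reassuring to confirm that nothing on the target drive is still mounted. A minimal sketch, assuming a mount table in the same format as /proc/mounts; the is_mounted helper and the sample table are invented for this example.

```shell
#!/bin/sh
# is_mounted is a hypothetical helper.
# $1 = device node, $2 = file in /proc/mounts format.
# Succeeds (exit status 0) only if the device appears in the first column.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "$2"
}

# Sample mount table; on a real system, pass /proc/mounts instead.
mounts=$(mktemp)
cat > "$mounts" <<'EOF'
/dev/sda2 / jfs rw 0 0
/dev/sda1 /boot vfat rw 0 0
/dev/sdb1 /home jfs rw 0 0
EOF

if is_mounted /dev/sdc1 "$mounts"; then sdc1_state="still mounted"; else sdc1_state="not mounted"; fi
if is_mounted /dev/sdb1 "$mounts"; then sdb1_state="still mounted"; else sdb1_state="not mounted"; fi
echo "/dev/sdc1 is $sdc1_state"
echo "/dev/sdb1 is $sdb1_state"
rm -f "$mounts"
```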

Create a fresh partition table on the device. A partition table just tells a computer what kind of partition to look for when reading the drive. Operations like re-formatting entire drives justifiably require root permissions:

$ su
# parted /dev/sdc mklabel gpt

Historically, the de facto partition label was msdos because that was (and still is) the most ubiquitous; msdos-style partitioning is universally recognised. For drives larger than 2TB, a gpt partition label must be used, because msdos partition labels cannot address more than 2TB.

Beyond that size limit, very little actually rides on this choice; it's just a matter of whose identifier you want to use. It has nothing to do with how your data is secured or kept. It's just an identifier so that the computer knows what to look for when it mounts a drive.
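The 2TB ceiling of msdos labels comes from simple arithmetic: the table stores sector addresses as 32-bit numbers. Assuming the common 512-byte sector size (drives with larger logical sectors shift the limit), the calculation can be sketched as follows.

```shell
#!/bin/sh
# msdos partition tables hold 32-bit sector addresses.
sectors=$(( 1 << 32 ))        # number of addressable sectors
sector_size=512               # bytes per sector (an assumption; most drives report this)

limit=$(( sectors * sector_size ))
limit_tib=$(( limit / 1024 / 1024 / 1024 / 1024 ))

echo "largest addressable size: $limit bytes"
echo "which is $limit_tib TiB"
```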

Next, find out how big your disk is:

# parted /dev/sdc unit MB print | grep Disk

For the sake of this example, assume the drive is 2834020 MB (2.8TB) in size.

Create a partition that spans the whole drive:

# parted /dev/sdc mkpart primary 1 2834020

This creates a partition that starts at the first megabyte (1) and spans all the way until the 2,834,020th megabyte.

Do not start your partition at the 0th megabyte or you will get the warning “The resulting partition is not properly aligned for best performance”. Start your partition at 1. You are sacrificing about one megabyte of space, but it's worth it.

Now the drive has a partition; all it needs now is a file system. Remember, a partition is indicated by a number trailing the device node. In this example, the location of your new partition is /dev/sdc1.

For a Linux native drive, use ext4:

# mkfs.ext4 -L penguindrive /dev/sdc1

Or jfs:

# mkfs.jfs -L penguindrive /dev/sdc1

The drive is now formatted. It's best to create a permanent, standard place for it on your system. Assuming that it is going to be used as extra storage space:

# mkdir /storage

To make the drive automatically mount, add it to /etc/fstab. For example, to have it mount as extra storage at boot time, add a line like this (use ext4 in the third field if that is how you formatted the drive):

LABEL=penguindrive   /storage  jfs  rw  0  2

The final field sets the order of file system checks at boot; only the root file system should use 1, and other file systems should use 2.

If you do not know the label of your drive, use lsblk -f. If your drive has no label, then use the PARTUUID (use UUID if your partition is msdos) instead:

PARTUUID=7280201c-fc5d-40f2-a9b2-466611d3d49e /storage  jfs  rw  0  2
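An fstab entry is simply six whitespace-separated fields, so composing one is easy to sketch. The fstab_entry helper below is a made-up convenience for illustration; it only prints a line, and the values passed to it (label, mount point, type) are the example values used in this section.

```shell
#!/bin/sh
# fstab_entry is a hypothetical helper that prints one fstab line.
# $1 = identifier (LABEL=..., UUID=..., or PARTUUID=...)
# $2 = mount point
# $3 = file system type
fstab_entry() {
    # Fields: identifier, mount point, type, options, dump flag, fsck order.
    # 0 skips dump; 2 checks this file system after the root file system.
    printf '%s\t%s\t%s\trw\t0\t2\n' "$1" "$2" "$3"
}

fstab_entry LABEL=penguindrive /storage ext4
```

Reviewing the printed line before pasting it into /etc/fstab by hand is safer than appending to the file from a script.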

And finally, mount the drive by mounting all drives listed in /etc/fstab:

# mount -a

External Drives

External hard drives are more complex, but only because they tend to move around. You may not move yours often, or you might literally have it on your keychain and take it everywhere with you. The bottom line is that external hard drives tend to visit not just one computer, indeed not even just one operating system, and as a multimedia artist, you are likely to have to deal with that frequently.

Incoming Drives

In this context, an “incoming drive” is a drive you do not own but need to work with.

Good news: drives coming to you from clients or collaborators, more often than not, are plug-and-play for reading on Linux.

There are only a few caveats:

  • Macs mostly use HFS+ formatted drives. HFS+ is a relatively volatile file system; it is rather prone to failure and corruption, but luckily the tools to repair it tend to be pretty good. Even so, it is safer to minimise contact with HFS+ drives; get the data you need from the drive, and disconnect it.
  • Windows formats generally pose no issue; it is mostly all reverse-engineered by now, and so it's mostly a matter of attaching the drive and reading data from it as you would any other drive.

Writing to a foreign drive is less simple.

Mac Drives

HFS+ drives are crafted to be intentionally incompatible with other systems (the journal is not open source or even open spec); in order to write to HFS+ from Linux, you must disable journaling on the drive first.

Disabling the journal on an HFS+ drive must be done from within Mac OS. If you do not have access to Mac OS, then you cannot safely write to the HFS+ drive. Do not attempt to; you could do serious damage to the drive's file system.

If you are stuck with a “Mac compatible” drive and want to use it as an active “normal” drive in your studio, with full read and write capabilities, then your best bet is to copy all of the data off of the drive, re-format it, and then copy the data back onto it.

If you require the drive to remain compatible with a Mac as well as Linux (and, as a side benefit, Windows), then use the UDF format.

If these are not valid options for you, then use Mac OS to disable the journal on the drive. Mac OS may later re-activate the journal without notice, so you may have to do this often.

Windows Drives

Most Windows drive formats are well reverse-engineered at this point and can be both read from and written to without any special action taken on your part.

UDF: the Universal Universal Disk Format

Partly as an answer to the problem of having no universally-accepted file system and partly out of the need for a good file system, a few standards groups came up with UDF, the Universal Disk Format. It was mostly intended as the replacement for ISO-9660, and did become the official file system for CD-RW, DVD-RW, and Blu-ray.

There are two significant disadvantages of UDF:

  • Maximum volume size is 2TB (on hard drives).
  • It does not use journaling, making data recovery after a crash or accidental unplugging a little riskier.

On the plus side, it is an open format, it can use UTF-8 filenames that are as long as 255 bytes, and it handles file sizes and file system sizes of up to 2TB. UDF does not bother with permissions, making it ideal for external drives.

Since it was primarily intended for optical media, creating a UDF volume is different from formatting a drive for any other filesystem.

Slackermedia recommends the UDF format for any external drive that you intend to use with more than just your own computer. It avoids both file permission frustration and file size limitations, but maintains all the other UNIX features that one would expect from a drive. It works well on thumbdrives as well as traditional drives. By being a universal format, it ensures that the data that matters to you the most is always available to you, regardless of what OS you happen to have on hand.

Formatting a Drive for UDF

The drive being formatted must have no partitions on it. This is entirely unlike any other filesystem, but it is necessary for some operating systems to accurately detect the UDF filesystem.

To get rid of the existing partition on a drive, zero out the first 4096 sectors (2 MiB at 512 bytes per sector) of the drive. This effectively erases all of the data on the drive, so you should be doing this only on an empty drive or a drive with data on it that you do not wish to keep.

# dd if=/dev/zero of=/dev/sdx bs=512 count=4096

Note that bs is the block size in bytes. It is set to 512 here so that the count of 4096 corresponds to 4096 standard 512-byte sectors.
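To see what that dd invocation does without risking a real drive, it can be run against a scratch file. This is a sketch under stated assumptions: the 3 MiB image, its filler contents, and the variable names are invented for the example.

```shell
#!/bin/sh
# A scratch image stands in for /dev/sdx.
img=$(mktemp)

# Fill 3 MiB with a non-zero byte so the wiped region is easy to tell apart.
head -c 3145728 /dev/zero | tr '\0' 'y' > "$img"

# Same bs and count as the command in the text: 4096 blocks of 512 bytes.
# conv=notrunc is needed only because the target is a regular file.
dd if=/dev/zero of="$img" bs=512 count=4096 conv=notrunc 2>/dev/null

# Count the non-zero bytes left in the first 2 MiB and in the final 1 MiB.
head_nonzero=$(head -c 2097152 "$img" | tr -d '\0' | wc -c)
tail_nonzero=$(tail -c 1048576 "$img" | tr -d '\0' | wc -c)
echo "non-zero bytes in first 2 MiB: $head_nonzero"
echo "non-zero bytes in last 1 MiB:  $tail_nonzero"

rm -f "$img"
```

Only the start of the image is cleared; everything past the first 2 MiB is untouched, which is enough to remove the partition table at the front of the drive.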

Create the filesystem so that it spans the entire drive:

# mkudffs --utf8 \
--blocksize=512 \
--udfrev=0x0201 \
--lvid="penguindf" \
--vid="penguindf" \
--media-type=hd \
/dev/sdx || echo "failed"

Now you can mount and use the drive on any platform.

Formatting Drives on Linux

This section reiterates the usual steps in formatting a drive, with an emphasis on external drives (read the section on formatting internal drives if you are adding a drive to your computer internally). If you are formatting to UDF (you probably should be), read the section on UDF.

1. Identify the drive's device node.

With the drive unplugged from the computer:

# ls /dev/sd*
/dev/sda  /dev/sda1  /dev/sdb  /dev/sdb1

Then plug the drive in and do the same command again:

# ls /dev/sd*
/dev/sda  /dev/sda1  /dev/sdb  /dev/sdb1  /dev/sdc  /dev/sdc1

The new node is the drive you have just attached.
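Spotting the difference by eye works, but the two listings can also be compared mechanically. In this sketch the before and after listings are sample data written to temporary files (an assumption for the example); on a real system each file would be filled with the output of ls -1 /dev/sd* at the appropriate moment.

```shell
#!/bin/sh
before=$(mktemp)
after=$(mktemp)

# Sample listings: one taken with the drive unplugged, one with it attached.
printf '%s\n' /dev/sda /dev/sda1 /dev/sdb /dev/sdb1 > "$before"
printf '%s\n' /dev/sda /dev/sda1 /dev/sdb /dev/sdb1 /dev/sdc /dev/sdc1 > "$after"

# comm -13 prints only the lines unique to the second (sorted) file.
new_nodes=$(comm -13 "$before" "$after")
echo "$new_nodes"

rm -f "$before" "$after"
```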

2. Unmount the drive.

It probably isn't mounted, but just in case…

# umount /dev/sdc*

3. Create a fresh partition table:

From this point on, the device sdx (a rare drive node, since it would require you to have 24 drives attached to your computer) is used to protect people from copying and pasting commands and unintentionally erasing a real drive. Replace sdx with the actual device node of your target drive.

# parted /dev/sdx mklabel msdos

Or, for drives larger than 2TB:

# parted /dev/sdx mklabel gpt

4. Get the drive size:

# parted /dev/sdx unit MB print | grep Disk
Disk /dev/sdx: 63417MB
Disk Flags:

5. Create a partition spanning from 1 MB to the last available MB (which you will have gotten from the parted print command):

# parted /dev/sdx mkpart primary 1 63417
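Copying the size by hand invites typos, so the number can be pulled out of the parted output instead. The sketch below works on a captured sample of that output (the text is copied from the example in step 4, not read from a live drive) and only prints the resulting command rather than running it.

```shell
#!/bin/sh
# Sample output of: parted /dev/sdx unit MB print | grep Disk
parted_output='Disk /dev/sdx: 63417MB
Disk Flags: '

# Keep only the number from the "Disk /dev/...: <size>MB" line.
size_mb=$(printf '%s\n' "$parted_output" |
    sed -n 's|^Disk /dev/[a-z]*: \([0-9]*\)MB$|\1|p')

# Build the command as text so it can be reviewed before it is run.
cmd="parted /dev/sdx mkpart primary 1 $size_mb"
echo "$cmd"
```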

6. Put a file system on the partition you just created.

# mkfs.jfs -L rockhopper /dev/sdx1

Or for an ext4 file system (more common across Linuxes):

# mkfs.ext4 -L rockhopper /dev/sdx1

Also consider using UDF for drives under 2TB.

7. Unplug the drive and plug it back in.

KVPM Volume and Partition Manager

KVPM is a graphical interface to the parted command and some related hard drive tools.

Install KVPM from http://slackbuilds.org

Launch kvpm from the K Menu.

Once kvpm has launched, look at the list of drives for one that matches your drive in size and vendor. For safety purposes, you should not have any other external drives plugged into your computer at this point, so only the internal hard drive(s) and your target drive should be visible.

Right-click on your target drive and choose Add disk partition.

Accept the default options so that you are using all available space on the drive.

Create a partition on a drive.

Once the new partition appears, right-click on the partition inside the drive and choose Filesystem operations > Make filesystem.

Create a filesystem in the partition on your drive.

Use the filesystem type of your choice, give the drive a name (or “label”), and click OK.

Name your drive and set it loose.
