A yak shave with SGI's EFS
Over the past few weekends I’ve gone on a bit of an adventure with the SGI Indy. This yak-shave has everything - an OS install, lots of SCSI problems, a completely overkill golang utility, and a happy ending. Since it’s pretty long, I’m gonna dive right in.
A disk upgrade
It all started with a drive replacement - I got an order of SCSI2SDs in and decided that I should allocate the faster SCSI2SD v6 I’d been saving to the Indy. At 100 MHz it’s one of the fastest pizzaboxes in my collection, so I think it’ll be better able to use the increased IO bandwidth the v6 can provide - supposedly up to 10 MB/s vs 2.6 MB/s for the v5.
Installing the new drive took a few steps. First, the firmware my v6 had loaded on it seemed to have some odd bugs - the Indy wouldn’t POST with it connected! I tried a lot of variations of settings and what ended up working was…updating the firmware of the drive.
Out of the box, the SCSI2SD acts as a hard drive on SCSI ID 0 - a reasonable configuration for most systems. Unfortunately, SGI assigns ID 0 to the controller (most systems assign the controller ID 7). When the SCSI2SD is set to 0, the Indy believes that all SCSI IDs are pointing to the SCSI2SD, and basically nothing works. When I set it to ID 1, I was able to format it with fx and move on to the install.
SCSI gremlins ruin an evening
A disk replacement is a natural opportunity to install a fresh OS. The previous OS had been loaded via netboot and controlled via a serial console - I had neither a CD-ROM drive nor a suitable keyboard and mouse at the time, so this would be my first graphical install. I also took this opportunity to downgrade from IRIX 6.5 to 5.3 - while 6.5 ran OK, my Indy only has 64 MB of RAM and I’ve seen recommendations to use older versions of IRIX for best performance on lower-spec Indys.
With IRIX 5.3 burned to a CD, the hard drive formatted, and (so I thought) all the SCSI problems behind me, I booted in to the installer and watched expectantly.
if yr having scsi problems I feel bad for u son
— cron mom (@sophaskins) June 14, 2018
use active termination and target drives by LUN
After I’d start the install, I kept hitting a bus timeout, and it seemed to never recover. I tried swapping terminators, turning on and off termination on the internal drive, setting the “parity” jumper on the CD-ROM, reseating all the cabling, and nothing could keep it stable enough to complete a full install.
Whyyyyyyyyy pic.twitter.com/Un66ORSCHf
— cron mom (@sophaskins) June 15, 2018
Eventually I gave up and went with plan b: the SCSI2SD can emulate multiple drives. It’s a little bit annoying to set up - you can easily set the second, etc, drive to be a CD-ROM but loading the ISO on to it requires dd-ing the ISO to the appropriate offset after your primary disk.
> dd if=C:\Users\haski\Downloads\IRIX_5.3.iso seek=16567501 of="\\.\Volume{a03b15f0-0f8b-11e8-9c75-00155d4b013b}" --progress
I try to avoid using this feature because “dd-ing over your hard disk” sounds like an even worse way to spend an evening than “using a single-speed CD-ROM drive”, but if you’re careful, it works:
Ok that also didn’t work but setting it up as another drive on the *same* SCSI2SD worked great and now it’s working and I can go to bed happy pic.twitter.com/OnTaXLUGCl
— cron mom (@sophaskins) June 15, 2018
Motivating a yak shave
With SCSI problems behind me, the OS install finished smoothly and my Indy was running IRIX 5.3 like a champ. The base OS is pretty spartan - it includes a basic compiler but almost no tools for development. Thankfully, I had another “ISO” (just a raw dump from a CD, but not in ISO9660 format - it’s in “EFS” format) labeled “IRIS Development Option 5.3” - sounds like just what I need! I had a few options to load the software:
- burn it to a CD and hope that the Indy and my SCSI CD-ROM can be friends long enough to install some software (unlikely)
- load the image in to the SCSI2SD’s fake CD drive (tedious and requires opening the Indy’s case back up - ugh)
- put the ISO file on my NAS and have the Indy mount it as a loopback device (unfortunately, not possible on IRIX)
- get another computer running that can read the image and copy the files to my NAS where the Indy can just read them directly
  - while the Linux kernel does have efs support, it’s neither compiled in to the Ubuntu base nor can I find a package that includes the module. It’s 2018 and apparently I refuse to compile my own kernel modules anymore.
  - the BSDs have support but I didn’t want to go through the hassle of setting up a VM or whatever and shuffling all the files around.
This left one final option: write a program to convert the image in to something more portable (like a tarball) that I could unpack on my NAS and make available to the Indy. Is this reasonable? Absolutely not. But it sounded like a hell of a fun project!
Following along
The code I’m referencing in this section lives at https://github.com/sophaskins/efs2tar - please take a look at how it all fits together! I’m running it against an image of “IRIS Development Option 5.3”, but I suspect it’ll work similarly for other images too - the Internet Archive has many available.
I use my golang struct definitions to illustrate what the on-disk format looks like - other implementations may name the struct members, etc, differently. Also note that EFS is big-endian, so the implementation needs to specify that byte order everywhere it parses bytes.
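To make the byte-order point concrete, here is a minimal sketch using the standard encoding/binary package. The header struct and parseHeader function are made up for this example and are not taken from efs2tar; the point is only that the same four bytes decode to very different numbers depending on the byte order you ask for.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// header is a made-up two-field structure, only here for illustration.
type header struct {
	Magic uint32
	Count int16
}

// parseHeader decodes a header, reading every multi-byte field as big-endian.
func parseHeader(raw []byte) (header, error) {
	var h header
	err := binary.Read(bytes.NewReader(raw), binary.BigEndian, &h)
	return h, err
}

func main() {
	raw := []byte{0x00, 0x00, 0x02, 0x00, 0x00, 0x07}
	h, err := parseHeader(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(h.Magic, h.Count)                    // 512 7
	fmt.Println(binary.LittleEndian.Uint32(raw[:4])) // 131072 (wrong byte order)
}
```

binary.Read works here because every field in the struct has a fixed size, which is also true of the on-disk structures shown in the rest of this post.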
Filesystem headers
My goal was simple: efs disk image in, tarball out. I made a blank golang project, opened up the NetBSD source for a guide, and started reading raw bytes.
Since the sources helpfully point out that the superblock lives in the first (zero-indexed) 512-byte block, I wrote up some code to unpack those bytes:
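// cmd/where-is-the-superblock/main.go
package main

import (
	"io"
	"log"
	"os"

	"github.com/davecgh/go-spew/spew"
)

func main() {
	file, err := os.Open("./input.iso")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	// read the first two 512-byte blocks, then dump the second one
	b := make([]byte, 1024)
	if _, err := io.ReadFull(file, b); err != nil {
		log.Fatal(err)
	}
	spew.Dump(b[512:1024])
}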
and got…nothing but zeros
> go run .\cmd\where-is-the-superblock\main.go
([]uint8) (len=512 cap=512) {
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000001a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000001e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
}
What the heck? The NetBSD sources also claim that block 0 is unused, but if we look at it, there are definitely some non-zero bytes. The first 8 bytes are:
0b e5 a9 41 00 00 00 00
Filesystems (like many formats) often use “magic numbers” - an arbitrary bunch of bytes that occur at a known location so the format can be quickly identified. Maybe this string is a magic number? I grepped for be5a941 in the NetBSD codebase and got a hit in sys/sys/bootblock.h - it is indeed a magic number, SGI_BOOT_BLOCK_MAGIC. That section of the file defines the layout of the boot block of SGI partitions. A deeper dive in to that code and the IRIX manpage for vh (volume header) explains what’s going on.
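A check like this is short enough to sketch. The constant below is just the four bytes quoted above read as a big-endian uint32; the function name looksLikeSGIVolume is my own invention for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sgiBootBlockMagic is the first four bytes of the image (0b e5 a9 41),
// read as a big-endian uint32.
const sgiBootBlockMagic = 0x0be5a941

// looksLikeSGIVolume reports whether b starts with the SGI volume header magic.
func looksLikeSGIVolume(b []byte) bool {
	return len(b) >= 4 && binary.BigEndian.Uint32(b[:4]) == sgiBootBlockMagic
}

func main() {
	firstBytes := []byte{0x0b, 0xe5, 0xa9, 0x41, 0x00, 0x00, 0x00, 0x00}
	fmt.Println(looksLikeSGIVolume(firstBytes)) // true
}
```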
It turns out, these “ISO” files aren’t just raw EFS filesystems, they’re SGI-formatted volumes with an EFS partition on them. There’s a whole volume header that has the partition table and some additional info:
// from sgi/vh.go
type VolumeHeader struct {
	MagicNumber      uint32
	Root             int16
	Swap             int16
	Bootfile         [16]byte
	BootDeviceParams DeviceParameters
	VolumeDirectory  [15]FileHeader
	Partitions       [16]Partition
	Checksum         int32
	Padding          int32
}
The VolumeDirectory field is pretty neat - it’s an array of pointers to files that exist outside of any filesystem, directly in the volume header. It’s apparently usually used for the bootloader and the SGI partitioning program, fx (this is apparently why partitioning happens outside of the OS install process).
The Partitions field contains what block offset each partition starts at, how long it is, and its filesystem type. It was a surprise to me that the partitions can (and do!) overlap! Apparently it’s typical for one of the partitions to represent the whole disk, another this header section, and another for the actual partition (the partitioning SunOS uses also has overlaps like this). The EFS partition for my image was number 7.
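Here is a rough sketch of how a partition entry turns into a byte range, and how overlap can be checked. The partition struct and its field names are a simplified stand-in I made up (the real Partition type isn't shown in this post), and the numbers in main are invented rather than read from the actual image.

```go
package main

import "fmt"

const blockSize = 512 // bytes per block, as used throughout this post

// partition is a hypothetical, simplified partition table entry.
type partition struct {
	FirstBlock int64 // block offset where the partition starts
	Blocks     int64 // length in blocks
}

// byteRange returns the half-open byte range [start, end) that p covers.
func (p partition) byteRange() (start, end int64) {
	return p.FirstBlock * blockSize, (p.FirstBlock + p.Blocks) * blockSize
}

// overlaps reports whether two non-empty partitions share at least one block.
func overlaps(a, b partition) bool {
	return a.Blocks > 0 && b.Blocks > 0 &&
		a.FirstBlock < b.FirstBlock+b.Blocks &&
		b.FirstBlock < a.FirstBlock+a.Blocks
}

func main() {
	// Made-up numbers, not the real table from the image.
	wholeDisk := partition{FirstBlock: 0, Blocks: 10000}
	header := partition{FirstBlock: 0, Blocks: 2000}
	data := partition{FirstBlock: 2000, Blocks: 8000}

	start, end := data.byteRange()
	fmt.Println(start, end)                // 1024000 5120000
	fmt.Println(overlaps(wholeDisk, data)) // true
	fmt.Println(overlaps(header, data))    // false
}
```

The start of that byte range is the number to add to every block address inside the filesystem when seeking in the image file.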
Crawling inodes
Starting at the offset of the partition, I found the expected data in the Superblock (including the correct filesystem magic number, a plausible filesystem size, etc):
// from efs/filesystem.go
type SuperBlock struct {
	Size         int32    // filesystem size (in BasicBlocks)
	FirstCG      int32    // BasicBlock offset of the first CG
	CGSize       int32    // CylinderGroup size (in BasicBlocks)
	CGInodeSize  int16    // Number of BBs per CG that are Inodes
	Sectors      int16    // sectors per track
	Heads        int16    // heads per cylinder
	CGCount      int16    // CylinderGroups in the filesystem
	Dirty        int16    // whether an fsck is required
	_            int16    // padding
	CTime        int32    // last SuperBlock updated time
	Magic        int32    // filesystem magic number
	FSName       [6]byte  // name of the filesystem
	FSPack       [6]byte  // fs "pack" name
	BMSize       int32    // size in bytes of bitmap
	FreeBlocks   int32    // count of free blocks
	FreeInodes   int32    // count of free inodes
	BMBlock      int32    // offset of the bitmap
	ReplicatedSB int32    // offset of the replicated superblock
	LastInode    int32    // last unallocated inode
	_            [20]int8 // padding
	Checksum     int32
}
Most of this data is irrelevant to my purposes - since I’m not adding new data, I don’t really care about the bitmap, where the next free inode is, etc. The offset of the first CylinderGroup is important, though. The filesystem is divided in to “cylinder groups” - contiguous groups of blocks where the first CGInodeSize blocks of the cylinder group contain inodes, and the rest is data. The NetBSD sources note that the root inode is at inode index 2, so probably at index 2 in the inode portion of the first cylinder group.
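That layout suggests a bit of arithmetic for turning an inode index in to a byte offset. The sketch below assumes 128-byte inodes (which is what the field sizes of the Inode struct in this post add up to), so four inodes per 512-byte block, and assumes inodes are numbered consecutively across cylinder groups. The geometry type and the sample numbers are invented for the example.

```go
package main

import "fmt"

const (
	blockSize      = 512 // bytes per BasicBlock
	inodeSize      = 128 // assumption: bytes per on-disk inode
	inodesPerBlock = blockSize / inodeSize
)

// geometry holds the SuperBlock fields needed to locate an inode.
type geometry struct {
	FirstCG     int64 // BasicBlock offset of the first cylinder group
	CGSize      int64 // cylinder group size in BasicBlocks
	CGInodeSize int64 // BasicBlocks of inodes at the start of each group
}

// inodeOffset returns the byte offset of inode idx, relative to the
// start of the EFS partition.
func (g geometry) inodeOffset(idx int64) int64 {
	inodesPerCG := g.CGInodeSize * inodesPerBlock
	cg := idx / inodesPerCG     // which cylinder group
	within := idx % inodesPerCG // index inside that group
	block := g.FirstCG + cg*g.CGSize + within/inodesPerBlock
	return block*blockSize + (within%inodesPerBlock)*inodeSize
}

func main() {
	// Made-up geometry, not the values from the real image.
	g := geometry{FirstCG: 100, CGSize: 1000, CGInodeSize: 50}
	fmt.Println(g.inodeOffset(2))   // 51456
	fmt.Println(g.inodeOffset(200)) // 563200
}
```

For the root inode (index 2) this collapses to "third inode in the first inode block of the first cylinder group", which matches the guess above.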
Each inode includes a bunch of data about the object it represents:
// from efs/inode.go
type Inode struct {
	Mode       uint16
	NumLinks   int16
	UID        uint16
	GID        uint16
	Size       int32
	ATime      uint32
	MTime      uint32
	CTime      uint32
	Generation int32
	NumExtents int16
	Version    uint8
	Spare      uint8
	// Payload is a union struct - sometimes it contains extents, but
	// it also can contain other stuff (like link targets and device
	// descriptors, which are not implemented here)
	Payload [96]byte
}
If we write up a quick program to dump this data, it looks something like:
> go run .\cmd\rootinode\main.go
(efs.Inode) {
Mode: (uint16) 16895,
NumLinks: (int16) 5,
UID: (uint16) 0,
GID: (uint16) 0,
Size: (int32) 512,
ATime: (uint32) 784844972,
MTime: (uint32) 784844972,
CTime: (uint32) 784844972,
Generation: (int32) 784844868,
NumExtents: (int16) 1,
Version: (uint8) 0,
Spare: (uint8) 0,
Payload: ([96]uint8) (len=96 cap=96) {
00000000 00 00 00 e0 01 00 00 00 00 00 00 00 00 00 00 00
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
}
}
This is really promising - those timestamps (784844972) are in 1994 (1994-11-14T20:29:32+00:00) which seems appropriate given that the disk image I’m working on is “the developer tools that came with IRIX 5.3”. The Mode field indicates that this inode represents a directory - seems reasonable, given that I’d expect the root inode to be /.
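Both of those checks are easy to reproduce. This sketch assumes Mode uses the usual Unix file-type bits (which the value 16895, octal 40777, is consistent with); the helper names isDir and formatTimestamp are mine.

```go
package main

import (
	"fmt"
	"time"
)

const (
	modeTypeMask = 0o170000 // file-type bits of a Unix mode
	modeDir      = 0o040000 // directory
)

// isDir reports whether a Unix-style mode describes a directory.
func isDir(mode uint16) bool {
	return mode&modeTypeMask == modeDir
}

// formatTimestamp renders seconds since the Unix epoch as UTC RFC 3339.
func formatTimestamp(sec uint32) string {
	return time.Unix(int64(sec), 0).UTC().Format(time.RFC3339)
}

func main() {
	mode := uint16(16895)
	fmt.Printf("%o dir=%v\n", mode, isDir(mode)) // 40777 dir=true
	fmt.Println(formatTimestamp(784844972))      // 1994-11-14T20:29:32Z
}
```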
To find out what’s in the directory listing, we need to unpack the Payload field. In this context, it contains extents - an array of the block ranges that make up the body. This approach to allocating blocks to files gave the filesystem its name, EFS (the Extent File System). Reading extents is somewhat more complicated than just parsing that Payload field, but we’ll get to that later. The extents stored in the root inode look like:
> go run .\cmd\root-inode-extents\main.go
([]efs.Extent) (len=1 cap=1) {
(efs.Extent) {
Magic: (uint8) 0, // side note, yes - the magic number
StartBlock: (uint32) 224, // for extents is...zero. Not
Length: (uint8) 1, // exactly unique, heh.
NumIndirectExtents: (uint32) 0
}
}
which seems plausible for the root directory listing - it’s short (only one block long), and early in the disk (block 224). Fetching that block is pretty easy, and if you dump it, it does appear to have some filenames on it, but reading its format correctly takes a little care. The format of a data block belonging to a directory listing is:
// from efs/directory.go
type Directory struct {
	Magic     uint16
	FirstUsed uint8
	Slots     uint8
	Data      [508]byte
}
At the “bottom” of Data (the low indexes) there are some pointers (Slots many of them) to directory entry offsets (offsets from the base of the struct). The entries live at the “top” of Data (the high indexes) and are of variable length (because they include filename strings!), so are made up of:
- the index of the inode for the entry
- how many bytes long its name is
- the name
It’s a somewhat wacky scheme - why isn’t it just a list of entries starting at Data[0]? This indirect approach allows adding and removing entries without having to rewrite the entire block, which…I guess is nice if you’re dealing with late 1980s / early 1990s disk performance (SGI replaced EFS with XFS for hard disks around 1993, but kept using EFS for CD-ROMs). At any rate, if we follow it, we can read the entries at the root inode, which look pretty rad to me:
> go run .\cmd\root-inode-entries.go
([]efs.DirectoryEntry) (len=8 cap=8) {
(efs.DirectoryEntry) {
InodeIndex: (uint32) 2,
Name: (string) (len=1) "."
},
(efs.DirectoryEntry) {
InodeIndex: (uint32) 2,
Name: (string) (len=2) ".."
},
(efs.DirectoryEntry) {
InodeIndex: (uint32) 3,
Name: (string) (len=11) "CDgrelnotes"
},
(efs.DirectoryEntry) {
InodeIndex: (uint32) 4,
Name: (string) (len=10) "CDrelnotes"
},
(efs.DirectoryEntry) {
InodeIndex: (uint32) 5,
Name: (string) (len=4) "dist"
},
(efs.DirectoryEntry) {
InodeIndex: (uint32) 90,
Name: (string) (len=7) "insight"
},
(efs.DirectoryEntry) {
InodeIndex: (uint32) 1525,
Name: (string) (len=8) "relnotes"
},
(efs.DirectoryEntry) {
InodeIndex: (uint32) 1614,
Name: (string) (len=12) "RELEASE.info"
}
}
This seems extremely plausible to me as the root of the CD-ROM. The . and .. entries even point (correctly) to the same inode (number 2) as the one we’re displaying! The system works! From here, it wasn’t too implausible to walk the whole filesystem and output filenames. Since it’s a tree, we start at the root inode, visit each entry, and if it’s a directory, recursively walk starting there. That gives us a complete list of all files.
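For the curious, here is a minimal sketch of reading one directory block the way it’s described above. It leans on two assumptions that come from my reading of the Linux and NetBSD EFS drivers rather than from anything shown in this post: each one-byte slot stores the entry’s offset divided by two (a single byte can’t address all 512 bytes otherwise), and a zero slot is unused. The dirEntry type, parseDirBlock, and the synthetic test block are all made up for the example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const dirHeaderSize = 4 // Magic (2 bytes) + FirstUsed (1) + Slots (1)

// dirEntry is a hypothetical decoded entry: inode index plus name.
type dirEntry struct {
	InodeIndex uint32
	Name       string
}

// parseDirBlock reads the entries out of one directory block.
func parseDirBlock(block []byte) ([]dirEntry, error) {
	if len(block) < dirHeaderSize {
		return nil, errors.New("directory block too short")
	}
	slots := int(block[3])
	if dirHeaderSize+slots > len(block) {
		return nil, errors.New("slot table runs past the block")
	}
	var entries []dirEntry
	for i := 0; i < slots; i++ {
		off := int(block[dirHeaderSize+i]) << 1 // assumption: slot = offset / 2
		if off == 0 {
			continue // assumption: a zero slot is unused
		}
		if off+5 > len(block) {
			return nil, errors.New("entry header runs past the block")
		}
		nameLen := int(block[off+4])
		if off+5+nameLen > len(block) {
			return nil, errors.New("entry name runs past the block")
		}
		entries = append(entries, dirEntry{
			InodeIndex: binary.BigEndian.Uint32(block[off : off+4]),
			Name:       string(block[off+5 : off+5+nameLen]),
		})
	}
	return entries, nil
}

// sampleDirBlock builds a synthetic 512-byte block with two entries.
func sampleDirBlock() []byte {
	block := make([]byte, 512)
	put := func(slot, off int, inode uint32, name string) {
		block[dirHeaderSize+slot] = byte(off >> 1)
		binary.BigEndian.PutUint32(block[off:], inode)
		block[off+4] = byte(len(name))
		copy(block[off+5:], name)
	}
	block[3] = 2 // Slots
	put(0, 500, 2, ".")
	put(1, 490, 5, "dist")
	return block
}

func main() {
	entries, err := parseDirBlock(sampleDirBlock())
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%d %q\n", e.InodeIndex, e.Name)
	}
}
```

A recursive walk is then just: parse the blocks of a directory inode, skip . and .., and repeat for every entry whose inode is itself a directory.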
File contents
For the inode of a “regular” file, the extents just point to where the body of the file is. You can literally concatenate the bytes from the extent blocks and get the file, at least for unfragmented files.
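As a sketch of what that concatenation looks like: the bit layout below (8-bit magic, 24-bit start block, 8-bit length, 24-bit count) is inferred from the root inode’s Payload bytes and the Extent fields shown earlier, and I’m assuming block numbers are relative to the start of the EFS partition. The function names, the sampleDisk helper, and trimming the result to the inode’s Size are my own choices for the example, not necessarily how efs2tar does it.

```go
package main

import (
	"errors"
	"fmt"
)

const blockSize = 512

// extent mirrors the fields of the Extent dump shown earlier.
type extent struct {
	Magic              uint8
	StartBlock         uint32 // 24 bits on disk
	Length             uint8  // in blocks
	NumIndirectExtents uint32 // 24 bits on disk
}

// decodeExtent unpacks one 8-byte big-endian extent descriptor.
func decodeExtent(b [8]byte) extent {
	return extent{
		Magic:              b[0],
		StartBlock:         uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3]),
		Length:             b[4],
		NumIndirectExtents: uint32(b[5])<<16 | uint32(b[6])<<8 | uint32(b[7]),
	}
}

// readExtents concatenates the blocks of each extent and trims the
// result to size bytes (the last block is usually only partly used).
func readExtents(disk []byte, extents []extent, size int) ([]byte, error) {
	var out []byte
	for _, e := range extents {
		start := int(e.StartBlock) * blockSize
		end := start + int(e.Length)*blockSize
		if end > len(disk) {
			return nil, errors.New("extent runs past the end of the image")
		}
		out = append(out, disk[start:end]...)
	}
	if size < 0 || size > len(out) {
		return nil, errors.New("extents do not cover the file size")
	}
	return out[:size], nil
}

// sampleDisk builds a 4-block fake partition: block 1 is all 'a',
// block 3 is all 'b', everything else is zero.
func sampleDisk() []byte {
	disk := make([]byte, 4*blockSize)
	for i := 0; i < blockSize; i++ {
		disk[1*blockSize+i] = 'a'
		disk[3*blockSize+i] = 'b'
	}
	return disk
}

func main() {
	root := decodeExtent([8]byte{0x00, 0x00, 0x00, 0xe0, 0x01, 0x00, 0x00, 0x00})
	fmt.Println(root.StartBlock, root.Length) // 224 1

	frags := []extent{{StartBlock: 1, Length: 1}, {StartBlock: 3, Length: 1}}
	body, err := readExtents(sampleDisk(), frags, 600)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(body), string(body[510:514])) // 600 aabb
}
```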
An inode’s Payload field can fit up to 12 extents - each extent is 8 bytes, and the Payload field is 96 bytes. If a file is made up of more than 12 discontinuous ranges of blocks, its extent descriptors won’t fit inside the inode. In this case, EFS switches from “direct” extents (the inode data contains the extents) to “indirect” extents (the extents in the inode point to blocks that contain nothing but extents, which are the actual extents of the file). This took me a few tries to implement correctly, but ends up being a fairly simple algorithm:
// from efs/filesystem.go
func (fs *Filesystem) extents(in Inode) []Extent {
	payloadExtents := in.PayloadExtents()
	if in.usesDirectExtents() {
		// if all of the extents fit inside of Payload (aka "direct")
		// we have a much simpler time reading the extents
		return payloadExtents
	}
	// if we have more than will fit in Payload, then the extents
	// in Payload (aka "indirect extents") point to ranges that
	// themselves contain the actual extents.
	extents := make([]Extent, in.NumExtents)
	extentsFetched := 0
	for _, indirectExtent := range payloadExtents {
		for _, extentBB := range fs.ExtentToBlocks(indirectExtent) {
			// copy respecting the length of extents saves us from
			// accidentally including the garbage extents at the end
			// of the last block (beyond NumExtents)
			copy(extents[extentsFetched:], extentBB.ToExtents())
			extentsFetched += extentsPerBlock
		}
	}
	return extents
}
With this in place, I was able to get the contents of files, large and small! Mixed with the golang archive/tar library, I was able to get the tarball I had hoped for.
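The archive/tar side is pleasantly small. This is a minimal sketch of the idea rather than the code from efs2tar: the file type, writeTar, and roundTrip are made up for the example, and it only handles directories and regular files.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// file is a hypothetical in-memory description of one filesystem object.
type file struct {
	Name  string
	Mode  int64
	Body  []byte
	IsDir bool
}

// writeTar writes each file as one tar entry.
func writeTar(w io.Writer, files []file) error {
	tw := tar.NewWriter(w)
	for _, f := range files {
		hdr := &tar.Header{Name: f.Name, Mode: f.Mode}
		if f.IsDir {
			hdr.Typeflag = tar.TypeDir
		} else {
			hdr.Typeflag = tar.TypeReg
			hdr.Size = int64(len(f.Body))
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if !f.IsDir {
			if _, err := tw.Write(f.Body); err != nil {
				return err
			}
		}
	}
	return tw.Close() // flushes the trailing padding and end-of-archive marker
}

// roundTrip writes files to an in-memory tarball and lists the names back.
func roundTrip(files []file) ([]string, error) {
	var buf bytes.Buffer
	if err := writeTar(&buf, files); err != nil {
		return nil, err
	}
	tr := tar.NewReader(&buf)
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	names, err := roundTrip([]file{
		{Name: "dist/", Mode: 0o755, IsDir: true},
		{Name: "RELEASE.info", Mode: 0o644, Body: []byte("hello from 1994\n")},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

In the real tool the Body would come from concatenating an inode’s extents, and the Mode and timestamps from the inode itself.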
Harvesting the fruits of my labor
I copied the (hopefully correct) tarball to my NAS and unpacked it. Despite the 25 years between the birth of my Indy and my NAS, they both speak computing’s lingua franca, unauthenticated NFS. The IRIX Software Manager doesn’t care about where the software lives - it just wants a directory:
ok so I got my last-weekend project of "a tool that converts SGI EFS volumes in to tar files"...working!?!?! and here I am pointing the install tool to an extracted tarball!?!?!?! pic.twitter.com/IGXTTVbaet
— cron mom (@sophaskins) June 23, 2018
The installation succeeded with no issues - I now have developer tools on my Indy! Entertainingly, one of the features this installed was kernel headers - including ones that describe (in more depth than the manpages) how EFS works.
Parting thoughts
I had an enormous amount of fun completely over-engineering this problem. I was able to dig in to Unix filesystems for the first time in a practical context, had fun writing golang, and got some dev tools out of it to boot!
Have you ever written code to deal with raw filesystems? Or perhaps written software for IRIX? I’d love to hear your stories! Send them to me via email: sophie@pizzabox.computer!