After a few hours of a heroic fight, the system was up and
running, booting from a reiser4 root partition on a striped dmraid
set built on two 160GB SATA hdds via an Intel ICH7 controller. The
controller provides so-called fakeraid functionality, which is
something in between hardware and software RAID. It's software
RAID, but configured through the BIOS. Under Linux it is eventually
handled by dmraid. Dmraid manages the array with the help of the
device mapper (thus the name dmraid), providing proper device nodes
for whole devices and partitions. Note: RAID 0 (data
striped across multiple devices) is not strictly RAID, because it
offers no Redundancy.
Disclaimer: some issues mentioned are important only
because this was an x86_64/amd64 (Pentium D) box. Also note that
you need a 2.6 series kernel for dmraid.
A reasonable way to do it all would have been to use an additional
IDE drive, prepare the base system on it and, once everything was
working, move it to the array, but I had no spare hdd, and besides
I'm generally not a screwdriver-friendly person. The system is
PLD Linux.
This could not be achieved without the help of gen2dmraid,
a custom-built Gentoo-based live/rescue CD activating dmraid at
startup. This distro has a hidden bonus feature - an amd64 kernel
thanks to which you can run both 64- and 32-bit binaries. It's also
a very good rescue CD in general.
However, before I found gen2dmraid, the whole case gave me a little
review of bootable thingies suitable for such a setup:
- my good friend R. I. P. was not enough this time. It's the rescue
distro of my choice for most purposes and I highly recommend it, but
it lacks amd64 support. AFAIR it also lacked the NIC driver I needed,
which was for some Broadcom chipset - the tg3 module. The NIC issue
was actually a major one, as the system was quite useless without
network access for installation,
- the Debian installer for amd64 didn't even detect the hdds (I
mean by default, this is doable for sure, but I had no time and
wanted to use it as a rescue CD, not an installer),
- Fedora 5's so-called rescue CD is to rescue as bananas are to
firearms, which means that bananas might look like firearms, and
there have been many reports of successful robbery attempts with a
banana, but no shots have ever been fired by the bananas,
- I also tried Ubuntu amd64 live CD, but it wasn't suitable for
some reasons I cannot recall at the moment,
- the SystemRescueCD doesn't yet support amd64.
To my surprise, quite recent ISO images of PLD had been released on
22nd March, including amd64 (I found that out when the installation
was already done), but (assumption is bad) I assume they wouldn't
have helped me anyway. Instead, I used a handy
tarball with a working base amd64 PLD Linux system provided by
one of this distro's developers. As to the dmraid part, I will now
explain how to handle it, briefly and as distro-independently as
possible. For SATA RAID information, refer to: the Linux SATA
FAQ.
Things you should know:
- exactly what hardware you have and which kernel drivers are
responsible for it,
- the structure of the /dev directory - device names and locations,
things like major and minor numbers and how to create device nodes
using mknod,
- what the Linux boot process looks like - the kernel and its
parameters, bootloader, initial ramdisk (initrd), init script,
- how to partition a disk using tools like cfdisk,
- how to handle problems and seek help - which means everything you
do so as not to hear RTFM all the time: man, google.
The "need to know" part might be redundant, since you're probably
not a newbie if you're doing this kind of setup.
Preparing a system on the array:
- Be sure to create your RAID array first - most probably using a
tool in your controller's BIOS. Dmraid doesn't support creation of
arrays on all hardware, and the menu your controller provides
will be the most comfortable way anyway.
- Boot gen2dmraid from the CD using the amd64 option at the
SysLinux prompt (unless, of course, you're booting from a hard
drive).
- If you booted from a hard drive, make sure the system supports
dmraid (dmraid tools, dm-mod module, libata module for SATA
drives, other modules for your hardware). Activate your fakeraid
array using dmraid -ay (gen2dmraid does it
automatically).
- Locate the device that dmraid provided for your array. This
will not be /dev/sdx like with regular SATA (handled by the
well tested SCSI layer), but something like
/dev/mapper/[driver]_blabber_arrayname. For my
Intel controller it was like isw_*. Of course this sits on top of
the /dev/sdx devices, but we're not using them directly: it's an
array, not a multiple disk system. If you can't see your device in
/dev/mapper/, try calling dmsetup mknodes to create the
appropriate device nodes for you. Note: the node could
also be a /dev/dm-0 device already created for your array by your
distro.
- Partition your array the way you want, like a regular drive (I
used 100M /boot, 1G swap, 10G /, the remaining 290G for /home)
- If you somehow can't see the device nodes reflecting your new
partitions (/dev/dm-1, dm-2 etc or /dev/mapper/sth_sth_sth), call
dmsetup mknodes again and the devices will surely be there.
- Create your filesystems (ext2 or something that simple for
/boot, swap is swap). I wanted the system to be on reiser4, but
gen2dmraid does not support it, so initially I used the reiserfs 3.6
format.
- Mount your filesystems (/ and /boot that is, if you want a
separate /boot at all).
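The preparation steps above can be sketched as a small script. This is only a dry-run sketch under stated assumptions: the array name isw_example_array, the 1/2/3 partition suffixes and the /mnt/target mount point are made up for illustration, and the commands are only printed unless DRY_RUN=0 is set. If the partition nodes still don't appear after partitioning, deactivating and reactivating the set with dmraid may be needed as well; that part is an assumption, not something the text above confirms.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root, on the real machine) to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical names: use whatever shows up in /dev/mapper/ on your box.
ARRAY="${ARRAY:-/dev/mapper/isw_example_array}"
TARGET="${TARGET:-/mnt/target}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_array() {
    run dmraid -ay                    # activate the fakeraid set(s)
    run dmsetup mknodes               # (re)create nodes in /dev/mapper
    run cfdisk "$ARRAY"               # partition the array (interactive)
    run dmsetup mknodes               # nodes for the new partitions
    run mkfs.ext2 "${ARRAY}1"         # /boot
    run mkswap "${ARRAY}2"            # swap
    run mkfs.reiserfs "${ARRAY}3"     # / (reiserfs 3.6 for now)
    run mkdir -p "$TARGET"
    run mount "${ARRAY}3" "$TARGET"
    run mkdir -p "$TARGET/boot"
    run mount "${ARRAY}1" "$TARGET/boot"
}

prepare_array
```

Read the printed commands first, and only then set DRY_RUN=0, because the mkfs calls destroy whatever is on the partitions.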
What you need to achieve next is a filesystem containing what
you want to boot from your array later. If I had been using a spare
hdd, I would have booted from it and gone straight to the part
where the dmraid fun starts. From now on I will mostly use the term
system for the chroot environment that will eventually be your
working system.
What you have to do is:
- Untar the pack with a prepared base system (this was the case
with the distro I chose, but you can do anything you want here,
like do your Gentoo magic). If you work from a hdd, simply copy the
filesystem to your new partition(s). You'd better unmount /proc,
/selinux, /sys and other virtual filesystems before that, to avoid
strange problems (like an attempt to access /proc/kcore) and trash
on your new system. Be sure to mount them back after the copying.
- Chroot to the directory where the prepared system is located.
This is why my bootable CD had to be an amd64 one. Otherwise
I'd just get "Exec format error" - that usually appears when you
try to run a binary compiled for a different architecture than your
system.
- Get your system to a stage where it's able to boot normally
after you're done with dmraid, like configure basic things, set the
root password etc, so that you can log in at all when you finally
boot off the array. I simply upgraded the packages using PLD's
package manager and added some more.
- Do all the necessary steps to give your system the dmraid
functionality with your hardware. This might involve rebuilding the
kernel. The device mapper module that dmraid needs is called dm-mod.
Note that you're working in a fake environment (chroot), so you
depend on the kernel of the system you chrooted from. Be sure to
mount your /boot (if you have one) and /proc (mandatory). If you
booted from a CD, your new system might not contain the proper
device nodes for your array's partitions (if so, then use dmsetup
mknodes again).
Generally, study the man pages for dmraid and
dmsetup. To see your array setup, issue dmraid -r. If
it shows it all, you're ready to go further.
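Entering the chroot and checking the array from inside it might look roughly like this. Again a dry-run sketch: /mnt/target is a hypothetical mount point, and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
TARGET="${TARGET:-/mnt/target}"   # hypothetical location of the new system

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enter_system() {
    run mount -t proc proc "$TARGET/proc"   # /proc is mandatory
    run chroot "$TARGET" dmsetup mknodes    # recreate /dev/mapper nodes inside
    run chroot "$TARGET" dmraid -r          # list the discovered RAID devices
    run chroot "$TARGET" /bin/sh            # interactive shell for the rest
}

enter_system
```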
That's it for the trivial things. Now that we have
something to boot from, we have to be able to boot
into it.
First, edit /etc/fstab so that it's correct. In my case
this was all /dev/mapper/isw_* for swap, /, /home and /boot.
Since fakeraid has to be activated at boot time by dmraid,
the only place where this can be done is the initial
ramdisk, initrd. This is necessary only if you want
to boot both from and into the array.
Here comes a short explanation of the Linux boot process and
initrd; skip it if you know what it's about.
When the system starts booting, the bootloader (located in the boot
sector of a partition currently marked as bootable) shows its
presence somehow (that might be a menu, etc) and can allow some
kind of a choice of what to boot. When we choose our Linux setup
(kernel image, its parameters), the kernel is loaded, and the
initial ramdisk (initrd) is mounted and does its job. Initrd is not
necessary for a basic typical setup in which the kernel itself
supports everything that's needed (like a highly
monolithic kernel with lots of drivers built in, and a system that
just starts running from the root partition). The initial ramdisk
is a minimal working system that runs right after the kernel is
loaded and before the real system starts. Its purpose is to prepare
some things for the system. The initrd image must contain a few
modules, some device nodes in /dev (crucial ones like /dev/null,
/dev/zero and /dev/console), and a few utilities (like insmod to
insert modules). Initrd is meant to be as small as possible, so it
usually can't contain any fancy libraries (not even glibc), and
for obvious reasons contains only static binaries. It must have
something providing elementary shell tools (and a shell), which
will most probably be busybox, a multi-call binary that is a real
swiss army knife. Initrd also contains a script, linuxrc, that is
invoked when the booting starts. Just before it finishes, the
system's root device is changed to the one set up in your
bootloader or as a kernel parameter (that is why your system can
start even if there's something wrong with your /etc/fstab) - this
is the moment the real system starts running - and the initrd image
is unmounted a while after. So to boot from your array, initrd must
know your devices and have access to them, initialize the array and
be able to mount it.
Initrd/boot process explanation ends here, now time for
preparing the initrd.
Most distributions provide some tool that creates initial ramdisk
image to match your current setup. This might be the
geninitrd script. Those distributions also provide
additional static executables to do additional functions (in PLD
it's the dmraid-initrd package for dmraid, containing dmraid-initrd
file, which also was a multi-call binary, acting as dmsetup when
called from a symbolic link called dmsetup). Those binaries
must be added to initrd by geninitrd. The initrd image is usually a
romfs filesystem (usually also compressed by gzip), that can be
mounted and unpacked, or created from a directory. Check the inside
of geninitrd to see what it exactly does. Geninitrd can be
configured to add particular modules that have to be loaded. To be
able to boot from your array, initrd's linuxrc must load your
array's drivers. Modules that have to be loaded for a SATA dmraid
boot are: sd-mod (SCSI disk support), libata (SATA), dm-mod (device
mapper), and maybe some others your array needs (like dm-mirror).
Of course they can be built into the kernel. Theoretically, your
distro should make a working initrd for you. I fought with it for
a long time. In the initrd, once your drives are detected
as normal devices (/proc and /sys must be mounted), dmraid -ay
-i has to be called to turn the array on. The -i option ignores
problems with access to /var/lock, since the filesystem is
read-only then.
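To make the order of operations concrete, here is a sketch of the relevant linuxrc fragment. It is not what geninitrd actually generates. The use of modprobe (instead of insmod with full paths) and the module name sd_mod for SCSI disk support are assumptions, and the controller driver is left out because it is hardware specific. The function below only prints the fragment, so it is safe to run anywhere.

```shell
#!/bin/sh
# Prints a minimal linuxrc fragment to stdout; nothing is executed here.
emit_linuxrc() {
    cat <<'EOF'
#!/bin/sh
# dmraid needs /proc and /sys to find the disks.
mount -t proc none /proc
mount -t sysfs none /sys
# Disk and device mapper drivers (skip any that are built into the kernel).
for m in sd_mod libata dm-mod; do
    modprobe "$m"
done
# The driver for your SATA controller would be loaded here.
# Activate the array; -i avoids the locking problem on a read-only /var/lock.
dmraid -ay -i
EOF
}

emit_linuxrc
```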
My worst nightmare was the fact that in my distro, udev device
manager was started in initrd, and it didn't provide device nodes
for dmraid. I simply added a set of mknod commands that created
/dev/sda, /dev/sdb (physical SATA drives), /dev/mapper/control
(controlling device), /dev/mapper/isw_sth_sth basic device (whole
array), and the partition nodes. The /dev/mapper/control is a
character device, the others are block ones. To see their major and
minor numbers, do a simple ls -la /dev/mapper/*. Check the
man page for mknod to find out how to create device nodes.
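A set of mknod commands of the kind described above could look like the output of the sketch below. The numbers are examples only: 8,0 and 8,16 are the usual numbers for /dev/sda and /dev/sdb, but the minor of /dev/mapper/control (10,63 here), the major 254 and the isw_example_array names are assumptions that have to be checked against ls -la on a system where the array is active. The script prints the commands instead of running them, so they can be reviewed and pasted into linuxrc.

```shell
#!/bin/sh
# Prints mknod commands for the nodes the initrd needs; it does not run them.
# Table columns: path, type (b = block, c = character), major, minor.
print_nodes() {
    while read -r path type major minor; do
        [ -n "$path" ] || continue
        echo "mknod $path $type $major $minor"
    done <<'EOF'
/dev/sda b 8 0
/dev/sdb b 8 16
/dev/mapper/control c 10 63
/dev/mapper/isw_example_array b 254 0
/dev/mapper/isw_example_array1 b 254 1
/dev/mapper/isw_example_array2 b 254 2
/dev/mapper/isw_example_array3 b 254 3
EOF
}

print_nodes
```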
Note: the gen2dmraid CD had a 2.6.16 kernel, my prepared
system had a 2.6.14 one. Tens of painful boot attempts finally
brought me to a solution. In 2.6.14, my dmraid partition devices
had the major number 254 and in 2.6.16 it's 252. In a
perfect situation you shouldn't have to worry about the device
files in initrd, but with udev things can get tricky. When /sys is
mounted and the dmraid array is active, you can find the correct
major and minor numbers in /sys/block/dm-x/dev for each of
your dmraid devices. This can be used to make the
device files automatically, at least the main one. (After I
finished with this box, I read a mailing list post stating that
things would have been much simpler for me if I had just turned
udev off in the initrd, but that was a little too late.) At first I
included the reiserfs module in initrd, to be able to boot from a
reiserfs root partition.
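The sysfs lookup mentioned above can be automated along these lines. This is a minimal sketch assuming the device mapper devices appear as dm-N directories under /sys/block, each with a dev file containing major:minor. It names the nodes /dev/dm-N, since sysfs alone does not give the /dev/mapper/ names, and it only prints the mknod commands.

```shell
#!/bin/sh
# Print a mknod command for every device mapper device found in sysfs.
# $1 is the sysfs mount point (defaults to /sys).
nodes_from_sysfs() {
    sysroot="${1:-/sys}"
    for devfile in "$sysroot"/block/dm-*/dev; do
        [ -r "$devfile" ] || continue            # no dm devices: glob stays literal
        dir="${devfile%/dev}"                    # .../block/dm-N
        name="${dir##*/}"                        # dm-N
        IFS=: read -r major minor < "$devfile"   # file holds "major:minor"
        echo "mknod /dev/$name b $major $minor"
    done
}

nodes_from_sysfs /sys
```

Because the numbers are read from the running kernel, this avoids hardcoding a major number that changes between kernel versions.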
The root= option in your bootloader or the same kernel
parameter is usually a device file that must exist inside
initrd's /dev structure, but can also be a hexadecimal number
of 4 digits encoding the major and minor number of
a device (two digits each). Because of the kernel version
difference, when I installed my bootloader (lilo) to the array's
device, it was obvious that the root device would really be
different once the system tried to boot from the array.
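The hexadecimal form is just the major and minor numbers written as two hex digits each. A tiny helper (root_hex is a name made up here) shows the arithmetic. It only covers this 4-digit form, so both numbers must be below 256.

```shell
#!/bin/sh
# Build the 4-digit hexadecimal root= value from a major and a minor number.
# Only valid for major and minor numbers below 256.
root_hex() {
    printf '0x%02X%02X\n' "$1" "$2"
}

root_hex 254 3   # prints 0xFE03
root_hex 252 3   # prints 0xFC03
```

So a partition with major 254 and minor 3 is root=0xFE03, while the same partition seen with major 252 would be root=0xFC03.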
To experiment in your initrd and/or to find out what's wrong and
why your system doesn't want to boot after the initrd run, you can
add /bin/sh to its linuxrc. You will then get a shell prompt at a
very early booting stage and be able to check what devices exist,
what modules are loaded, and whether the array is up and
initialized. If you're using the hexadecimal form of the root
parameter, then you don't have to create the device node for your
root partition at the initrd stage.
When my chroot environment on the array was all set up, and the
boot manager was set up correctly, I could finally try to boot
from the array. Remember, when working on the chrooted system, the
boot manager has to have access to the device where it's going to be
installed. The lilo boot manager must be patched to allow booting
from dmraid (in most recent distros it is), and the GRUB boot
manager must know the correct device mappings. Refer to the
gen2dmraid homepage to check it out.
I logged out from the chroot, rebooted and used my array as the
boot device instead of the usual CD. The kernel was loaded, initrd
started, I got my initrd shell prompt, and I checked the dmesg
output to see if the disks were detected and dmraid -r to
check if the array was correctly started. It was, so I tapped ^D
(logout sequence) to let it go further. And then I got a kernel panic,
because the root device could not be found. That was the magic
2.6.14-16 difference, and it took me a lot of time to find out. At
the boot prompt I passed root=0xFE03 (this is like the
/dev/mapper/isw_sth_sth3 node - major 254, minor 3), and my system
finally started its normal run.
The new system had reiser4 support, which the bootable CD didn't.
What I did to make the transition was: I created a clean reiser4
filesystem on one partition and copied the whole system there. To
have less to do, I had first installed the system on the partition
that was to be /home in the end, so the first reiser4 filesystem I
made was on what was meant to be /.
I edited /etc/fstab to reflect my change, dropped the reiserfs module
from the geninitrd configuration in favour of reiser4, rebuilt the
initrd, and edited the bootloader's config (lilo that was) to finally
set a correct root device. Because udev was used as the /dev tree
manager, I had no /dev/mapper/* files when the system started (the
real system, after initrd), so I added dmsetup mknodes to the
rc.sysinit script. Sysinit is the script called after initrd
ends its run and the switch to the real root device is done. There I
added the dmsetup call right after starting udev. Note that you can
boot the system even with no records in /etc/fstab, and/or no
device node files for your partitions. The system will then start
with its root partition mounted read-only and the other partitions
simply unmounted (make the / read-write by issuing mount -o
remount,rw / ).
After my system booted with reiser4 instead of reiserfs module
loaded in initrd stage, and it successfully mounted the root
partition, I knew I had won the fight. I removed the unnecessary
debug things from initrd's linuxrc script (I had added the shell
prompt call and some other commands to show what was going on), and
booted cleanly for the first time. I had my / on reiser4, and the
big partition on reiserfs, containing a copy of my system. I
created a clean reiser4 filesystem on it and, as said before, set
it up as /home.
All that needed some work, but had to be done only once. Writing
this down might help some of you solve some of the problems I encountered.
As to other setups of booting from a RAID, it's all done in initrd.
It's there where hardware RAID controller drivers have to be loaded
or software arrays started with mdadm. For other things not
explained here, check URLs mentioned here, read the documentation
and search the web.
The article this note evolved into is a little messy, I know. It
sometimes describes things really simply and in detail, while
earlier it states that you should know those things, and has some
other imperfections, but I intended to put in as much information as
needed to clear up possible doubts and answer questions before they
are asked. Also, while [put your distribution's name
here] probably does some of the things I described in a slightly
different way or does them automatically, I hope you get all the
differences.
And for an ending joke, to ease all my frustration, I gave the box
a really bad and totally politically incorrect hostname.
All constructive feedback to this blabber is of course welcome,
including marriage offers
and bombs, as long as you consider a bomb constructive
feedback.