Posts Tagged ‘linux’

Linux 3.0 will support Xen

After a relatively long road traveled with a few bumps along the way, as of yesterday, Linus’s mainline tree (2.6.39+) contains literally every component needed for Linux to run both as a management domain kernel (Dom0) and as a guest (DomU).

Xen has always used Linux as the management OS (Dom0) on top of the hypervisor itself, to handle device management and control of the virtual machines running on Xen. For many years, alongside the hypervisor, there was a substantial Linux kernel patch that had to be applied to a Linux kernel to turn it into this “Dom0”. This code constantly had to be kept in sync with the progress Linux itself was making, which caused a substantial amount of extra work.

Another bit of code that has been in the kernel for many years is the set of paravirt drivers for Xen in a guest VM (DomU): the Xen network, block and xenbus drivers that are loaded when you run a hardware virtualized (hvm) guest on Xen with paravirt (pv) drivers. This setup is referred to as pv-hvm.

A pure hardware virtualized guest without any Xen drivers, using only emulated qemu devices, is simply called an “hvm guest”. This does not perform well, as any network or block I/O goes through many layers of emulation. As hardware virtualization support in the chips has improved over the years, pv-hvm has become performant and is frequently used. The pv drivers are highly optimized virtual devices that communicate through the hypervisor to do network or disk I/O, handled behind the scenes by the Dom0 kernel and what are called backend devices (netback, blkback).

A pure paravirtualized guest is/was an OS kernel that was thoroughly modified to cooperate with the hypervisor and let the hypervisor take care of, or own, a number of tasks, so as to be as optimized as possible. Performance and integration are best with a paravirtualized kernel, and this also allowed Xen to run optimally on x86 hardware without hardware virtualization instruction support. This is referred to as a pv-guest. The Dom0 kernel runs in pv mode (more on this later), and DomU guests can run in hvm, pv-hvm or pv mode.

Over the years, a number of efforts were made to get these pv / Dom0 patches submitted into the mainline kernel, but at times the code was not considered acceptable by a number of the Linux kernel maintainers and little progress was made. Over the last 2 years a renewed effort started to really convert the code into patches considered acceptable, and a set of people (Jeremy Fitzhardinge, Konrad Rzeszutek Wilk, Ian Campbell, Stefano Stabellini, and others not mentioned but obviously also important) focused on getting this stuff done once and for all. And so, bit by bit, code was rewritten, submitted for review, and rewritten again until it was considered OK. In terms of timeline, a good chunk of code went in over time to handle Linux as a well-behaved guest (DomU) first, followed by all the work to make Dom0 happy as well.

One change that happened in the Linux kernel, to better handle this kind of infrastructure in a virtual world for more than one hypervisor, was called pvops.

pvops is a mode where the kernel can switch into pv, hvm or pv-hvm at boot time. Instead of having multiple kernel binaries, there is just one, and it lays out its operations at boot time once it detects what platform it is running on. Linux as a DomU guest on Xen has had pvops support since 2.6.23/24, with good use starting around 2.6.27, so the frontend network and block drivers and running pvops on Xen have been around for quite some time. As this was finalized, the work focused more on preparing the Dom0 parts of the integration and on a migration from the old classical pure pv kernel mode to what’s now called pvops.
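The boot-time detection itself lives inside the kernel, but from user space you can usually see the result. This is a minimal sketch, assuming the kernel exposes /sys/hypervisor/type (that depends on how it was configured); on a Xen guest or Dom0 the file typically contains "xen". The function name hypervisor_type and its optional path argument are mine, purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the hypervisor type reported by sysfs,
# or "none" if the file is not there (bare metal, or sysfs support not built in).
# The optional first argument overrides the path, which is handy for testing.
hypervisor_type() {
    file="${1:-/sys/hypervisor/type}"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "none"
    fi
}

hypervisor_type
```

A result of "none" only means the file is missing; it does not prove the machine is not virtualized.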

Late last year, in 2.6.37, we had a mainline kernel that was actually able to run as the “Dom0” for the Xen hypervisor. That was a big step, followed shortly by the remaining bits needed to really handle every area: memory management, grant table stuff, the network pv driver backend and the block pv driver backend (and other misc components). The last remaining driver, the block backend driver blkback, was merged into 2.6.39+ mainline just 2 days ago.
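If you want to check whether a given kernel was built with these pieces, one rough approach is to grep its config file. This is a sketch, not an exhaustive check: it assumes the config is installed as /boot/config-<release> (common, but not universal), and it only looks at a handful of option names (CONFIG_XEN, CONFIG_XEN_DOM0 and the two backend drivers). The function name xen_options is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: list a few Xen-related options that are enabled
# (=y built in, =m module) in a kernel config file.
# Default path is an assumption; pass another file as the first argument.
xen_options() {
    config="${1:-/boot/config-$(uname -r)}"
    grep -E '^CONFIG_XEN(_DOM0|_NETDEV_BACKEND|_BLKDEV_BACKEND)?=(y|m)$' "$config" 2>/dev/null
}

xen_options "$@" || echo "no matching Xen options found (or config file missing)"
```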

The Most Common Things You Do To A Large Data File With Bash

I find that whenever I get a large data file from somewhere (e.g. extract some data from a database, or crawl some sites and dump the data in a file), I always need to do just that little bit of extra processing before I can actually use it. This processing is always just non-trivial enough, and I do it just uncommonly enough, for me to always forget exactly how to go about it. Of course, this is to be expected: if you learn something and want it to stick, you have to keep doing it. It’s all part and parcel of how our brain works when it comes to learning new skills, but that doesn’t make it any less annoying.

Back to our data file. I find that I almost always need to do 3 things (amongst others) before doing anything else with it.

  • delete the first line (especially when pulling data out of the database)
  • delete the last line
  • remove all blank lines

Don’t ask me why but for whatever reason, you always get an extraneous first line and unexpected blank lines (and less often an extraneous last line) no matter how you produce the file :) .

Anyways, my tool of choice in the matter is bash - it is just too trivial to use anything else (plus I love the simplicity and power of the shell). So, to make sure I never forget again, here is the easiest way of doing all three things above using sed:

sed -e 1d -e '$d' -e '/^$/d' input_file > output_file
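To see what each expression does, here is a small self-contained run on a throwaway file. The sample contents and the clean_file wrapper are made up for illustration; the sed expressions are the same three as in the one-liner.

```shell
#!/bin/sh
# Hypothetical helper wrapping the sed one-liner; reads the file named in $1.
# 1d drops the first line, $d drops the last line, /^$/d drops empty lines.
clean_file() {
    sed -e 1d -e '$d' -e '/^$/d' "$1"
}

# Made-up sample: a header, two records separated by blank lines, a footer.
sample=$(mktemp)
printf 'id,name\n1,alice\n\n2,bob\n\n(2 rows)\n' > "$sample"

clean_file "$sample"
# prints:
# 1,alice
# 2,bob

rm -f "$sample"
```

Two things to keep in mind: $d removes the last physical line, so if the file ends with a blank line, that blank line is the one that goes; and /^$/ only matches completely empty lines, so for lines containing just spaces or tabs you would want something like /^[[:space:]]*$/d instead.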

Of course since we’re using bash, there should be numerous ways of doing the above.

You can remove the first line using awk:

awk 'FNR>1' input_file

but I don’t know how to remove the last line using awk. Anyone?
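One possible answer, as a sketch: have awk print every line one step late. Each line is held in a variable and only printed once the next line arrives, so the final line is stored but never printed. The function names drop_last_line and clean_with_awk are mine, not anything standard.

```shell
#!/bin/sh
# Drop the last line: print the previous line whenever a new one arrives.
# The final line ends up in prev and is never printed.
drop_last_line() {
    awk 'NR > 1 { print prev } { prev = $0 }' "$@"
}

# Same trick extended to all three steps in one pass:
# NR > 2 means prev is line 2 or later (skips the first line),
# prev != "" skips empty lines, and the last line is never printed.
clean_with_awk() {
    awk 'NR > 2 && prev != "" { print prev } { prev = $0 }' "$@"
}
```

Both functions take file names as arguments, or read standard input when given none, e.g. clean_with_awk input_file > output_file.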

You can use head or tail to get rid of the first and last line:

head --lines=-1 input_file | tail --lines=+2

but not to remove blank lines.

You can use grep to remove blank lines:

grep -v "^$" input_file

but it would be silly to try and use it to remove the first and last line (possible though).
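Since all of these tools read from a pipe, a less silly option is to let each one do the part it is good at and chain them. This is a rough sketch that assumes GNU head, because a negative line count ("all but the last line") is a GNU extension; the function name clean_with_pipeline is made up.

```shell
#!/bin/sh
# Hypothetical helper: head drops the last line, tail drops the first line,
# grep -v drops the empty lines that remain. Reads the file named in $1.
# Assumes GNU head for the negative count.
clean_with_pipeline() {
    head -n -1 "$1" | tail -n +2 | grep -v '^$'
}
```

Usage would look like clean_with_pipeline input_file > output_file. It is three processes instead of one, which rarely matters, but sed still does the whole job in a single pass.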

If you know of an easier way to do the above three things in a one-liner using bash – do share it.

What are some of the most common (but non-trivial enough) things that you find yourself doing with bash when it comes to pre-processing that large data file?