Interview with Nigel Cunningham, TuxOnIce developer

Oleksandr Natalenko aka post-factum, 17.06.2011

Where are you from?

I grew up in Auckland, New Zealand, but have lived in Australia for most of the last 14 years. I currently live in Geelong (pronounced dji-long), just south of Melbourne.

Is programming your profession? When did you start programming? What is your favourite programming language?

I'm actually a Christian (Protestant) Minister by profession, but prior to training for the ministry, I completed a Bachelor of Commerce in Management of Information Systems and Management Science.

As a kid, I was a bit of a computer nerd, but over time, I increasingly got more interested in dealing with people and less interested in dealing with computers. After finishing my theological training about ten years ago, I've lived with one foot in the IT industry and one foot in the Christian ministry. The job I'm currently doing is the first one in which I'm doing both at once, though. I'm working for a theological college in Melbourne, coordinating their Distance Education program. As part of that, I work on the Moodle E-learning software and also take care of the main Drupal-based website.

I started programming in my early teens. Our family's first computer was a Dick Smith VZ-200 computer (they were sold in Australia and New Zealand), but it was when I got a Commodore 64 that I made my first real attempt at programming. I learned machine code, to the extent that I wrote a little pop-up menu system with most of the code residing in RAM that was normally hidden by the C-64's 16KB Basic ROM.

I'm not really sure that I do have a favourite programming language. The main two I use at the moment are C for the kernel work, and PHP for my website development work. I like them both, but it's a case of horses for courses: C is great for kernel programming and PHP is great for websites. To flip the question on its head, the language I like least is Scheme. I had to learn it while doing the Computer Science papers in my B.Com, and I think all those brackets do more for the sale of headache tablets than they do for readability! :)

What are your hobbies?

I like gardening a little — I'm currently watching tulips pushing their way up through the soil for the third time, and I really love watching them grow.

I'm also an active member of our local State Emergency Service unit — we are a group of volunteers who provide help in case of floods, storms and other emergencies. I've really enjoyed learning to safely climb a roof, manage traffic and more, and am looking forward to the other courses I still have to do — and of course to putting them into practice.

Do you have any pets at home (especially cats as most of our readers enjoy them very much)? What are their names? Do they help you in programming?

I'm sorry to say that I don't have any pets at home. But I do have two children who I love to bits. My wife Michelle and I have a 13 year old son Alisdair, and a 3 year old daughter named Irene. Alisdair wants to learn to do programming, but he hasn't done so yet.

What was the starting point of suspend2 development? When did it start? Why was it named «suspend2» rather than «suspend»?

My work on hibernation started about 10 years ago, when I began to use Linux in earnest. Being at theological college, I wanted to run Bible study software, but it was all Windows-based. So I had to start Linux, then Win4Lin, then Logos (the Bible Study software). It would be five minutes before I got any 'real' work done. I began to look at hibernation as a way of speeding up the process.

At that stage, hibernation wasn't yet in the kernel. It was about the time of 2.4.16 and the early days of the 2.5 series kernels. Gabor Kuti and Pavel Machek had developed a very basic version of software suspend, which they abbreviated to swsusp. I gave it a try and joined the swsusp list on Sourceforge. I then slowly began to learn how it worked and learn C, and started to contribute little patches to make it faster, more reliable and user friendly. If I recall correctly, Gabor didn't have a lot of time to work on the code, so it naturally progressed into me taking over the development of swsusp.

At about the same time, Pavel got a version of the code merged into the 2.5 kernels, but not (as far as I recall) in consultation or cooperation with the community. I made some efforts at working with Pavel, but they didn't work and the lines slowly diverged.

Regarding the name, that was one thing that always annoyed me — how are you supposed to pronounce 'swsusp'?! So I started just calling it 'software suspend'. We eventually got to the point that I was happy to call it 1.0 (July 2003).

Development continued, until we got to a 2.0 version at the end of January 2004. This was abbreviated to 'suspend2'. Development of Suspend2 continued, but after 2005 it slowed a lot as my priorities changed and the software matured. In more recent times, Rafael Wysocki came on the scene, and wanted to refer to hibernation software as hibernation rather than 'suspend'. To accommodate that desire, I sought suggestions for a new name, and TuxOnIce was suggested and eventually adopted. We kept the version numbering, so now we're at 3.2.

Is Linux kernel power management code good enough? What should be done to improve it?

The kernel power management code still has a long way to go. Huge progress has been made, but there is still a lot of work to do because, to do power management really well, we need a combination of run-time power management (i.e. saving power while you're using the system) and system state management (suspending and/or hibernating). We need to deal with a ton of different architectures and devices, and be flexible enough to deal with the varying needs of different users and different scenarios. On top of this, good power management requires not only a well written kernel, but also well written applications. The best power management system can be defeated by one badly behaved program sitting (for example) in a tight loop polling for events.

So power management is much more than just the small hibernation problem that I work on. It really is everyone's problem.

But let me be a bit more narrow in my focus and just concentrate on hibernation. In this area, the answer is the same: no, the code we have in the kernel at the moment isn't (in my opinion) good enough. I believe that software should be reliable, flexible, user friendly, efficient and bug free, as much as possible. The current code has come a long way from the version that was initially merged, but it still has a long way to go.

Part of this is my fault. I've not made a sustained attempt at getting TuxOnIce code merged. I have sought to get code reviews some years ago, with an aim to merging, but I'd made too many changes from the version that was already in the kernel. It was impossible for someone to do a good job of reviewing the code. On top of that, I didn't really understand what was wanted/required of me, so I wasted a lot of time and effort. TuxOnIce has a lot of valuable features that should, I believe, get merged. But that needs to be done one at a time, and it needs someone with more time than I have. Or it needs Linus to bite the bullet and decide to merge TuxOnIce as is. I don't believe that's going to happen.

So I guess the best thing people can do to improve power management in the Linux kernel is pitching in and helping. It could be in seeking to get bits of TuxOnIce (or improved versions) merged, or it could be in seeking to improve what's already in the kernel. Of course you don't need to be a programmer to help — just letting us know that there's a problem with a particular device driver is also incredibly useful. There's no way that any kernel developer can test their code on every configuration, and they certainly can't fix bugs if they don't know they exist. Just telling us you have a problem, helping us find the cause and testing fixes is a great service you can provide without having to cut a single line of code.

What can you say about high power consumption in Linux 2.6.38 (please, refer to the article)?

I'm sorry, but I'm not able to help there, as I haven't been keeping up with all the changes in the kernel between versions. I can however say that when we come across such regressions, all of us can help diagnose the cause by running something called a git bisect. The basic idea is that as well as trying 2.6.37 and 2.6.38, we try a kernel that has roughly half the changes between the two versions, and see if it has the issue. If it does, we've just narrowed down the problem by ruling out half of the changes between the two versions as being the cause. We do this repeatedly until we can say 'this patch causes the problem'. It usually takes about 12-16 iterations of building and testing the kernels to do this, but it's much simpler and more accurate than guessing. If you Google for 'git bisect', you'll find some good tutorials.
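The halving idea behind a bisect can be sketched in a few lines of Python. This is only a model of the search, not of git itself: the commit count of 10,000 and the culprit index are made-up numbers, and the `is_bad` callback stands in for "build this kernel, boot it and check for the regression".

```python
def bisect_first_bad(num_commits, is_bad):
    """Return (index of the first bad commit, number of kernels tested).

    Commits are numbered 0 .. num_commits - 1. Commit 0 is assumed to be
    known good (say, 2.6.37) and the last commit known bad (say, 2.6.38).
    """
    good, bad = 0, num_commits - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2      # a kernel with roughly half the changes
        tests += 1
        if is_bad(mid):
            bad = mid                # the culprit is at or before mid
        else:
            good = mid               # the culprit is after mid
    return bad, tests


if __name__ == "__main__":
    total = 10_000                   # assumed number of commits between releases
    culprit = 6_789                  # hypothetical commit that introduced the bug
    found, tests = bisect_first_bad(total, lambda i: i >= culprit)
    print(f"first bad commit: {found}, kernels built and tested: {tests}")
```

Because each test halves the remaining range, a range of around ten thousand commits needs 13 or 14 tests in this model, which is in line with the 12-16 iterations mentioned above.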

Some people claim that ACPI in Linux causes many problems. Why so? Any comments on it?

The difficulty open source developers have faced since open source began is that we often have to interact with software that isn't open source and often isn't well written. In the Linux kernel context, this means (among other things) dealing with your computer's BIOS. BIOS writers are like all other computer programmers: they make mistakes, misunderstand specifications or perhaps just read the specifications differently to how others have read and implemented them. Sometimes there are problems with the specifications themselves. All of this means that when the guys at Intel came to write the ACPI implementation for Linux, they couldn't just follow the specification and expect it to work. They needed to deal with brokenness ('quirks') all over the place. I don't know ACPI well enough to say that there are no problems with the ACPI spec itself, but I do know from what I've heard over the years that a big part of the problem is not so much ACPI itself as the various BIOSes and ACPI tables that BIOS developers have written.

Is the quality of the vanilla kernel hibernation code high enough? What are the advantages and disadvantages of using vanilla hibernation?

As I said earlier, I think a lot could be done to improve the vanilla kernel hibernation code. At its heart, it's stable and solid, and not requiring patching is a big advantage. But there are a ton of opportunities for improvement. To name just a few, speed could be significantly increased by — among other things — multithreading and readahead. It could be given support for ordinary files (non-swap), which would avoid races in low memory conditions and help reliability (no issues with insufficient storage if this is done). Reliability could be improved by doing some of the calculations regarding whether we'll have enough memory and storage prior to the atomic copy (the amount of memory needed for drivers is generally pretty predictable). And the code could be made into loadable modules so that the memory can be used for other things when you're not wanting to hibernate. This doesn't matter so much for desktops, but embedded systems want to hibernate too — especially if it's fast to do so.

That said, the vanilla kernel's hibernation code has had a lot of work done since it was first merged. It is much more reliable and user friendly than used to be the case. BUG_ON()s are no longer even close to being a standard debugging tool!

Can Linux borrow some ideas of power management from other (e.g. BSD) systems?

It's been a long time since I ran BSD, but I'm sure there will be some potential for us sharing ideas with one another. That's one of the great strengths of open source, particularly when it comes to small, lone developers like me. We tend not to worry so much about taking out patents or hiding our secrets from competitors. We're much more focused on the quality of the software itself. So yes, I'm sure there will be ideas we can share with one another to make the end result for all our users better than it might otherwise have been.

What is the purpose of TuxOnIce? Why are you developing it?

TuxOnIce exists to give users the best Linux hibernation support they can possibly enjoy. I'm developing it first and foremost because I want to use it, but also because I have a ton of loyal users who keep encouraging me to improve it and keep providing it. I'm also more than happy to give something back to the community. After all, I've been running free software on my computers for more than 10 years, so it's only fair that I give something back.

Do you prefer to turn off your computer or to hibernate it? How often do you turn your computer completely off?

I use TuxOnIce most of the time. There are occasions where I run swsusp for testing, or do simple power offs, but they're the exception rather than the rule.

About a year ago, I purchased an SSD drive for my laptop. I was amazed to see the speed difference it made. My read and write speeds with the old drive were about 100MB/s (a 50MB/s drive with LZF compression of the image effectively doubling the speed). With the SSD drive, my image is written at about 250MB/s and read back at about 380MB/s. With speeds like that, hibernating even 4GB of memory doesn't take a long time, and the advantage of getting back all the programs you were running and documents you had open just like you'd never turned off the computer is unbeatable. Why would I want to do a simple power off? :)
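As a back-of-the-envelope check on those figures, the time to move an image is just its size divided by the throughput. The sketch below assumes the quoted speeds are effective rates for image data and that the full 4GB is written, so the results are upper bounds; a real image is usually smaller.

```python
def transfer_seconds(image_mb, throughput_mb_per_s):
    """Seconds needed to move image_mb megabytes at the given throughput."""
    return image_mb / throughput_mb_per_s


if __name__ == "__main__":
    image_mb = 4 * 1024  # worst case: all 4GB of memory ends up in the image
    speeds = (
        ("old drive with LZF", 100),
        ("SSD, writing", 250),
        ("SSD, reading", 380),
    )
    for label, speed in speeds:
        print(f"{label}: {transfer_seconds(image_mb, speed):.1f} s")
```

Under these assumptions the old drive needs about 41 seconds for a full image, while the SSD writes it in roughly 16 seconds and reads it back in roughly 11.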

What are the main differences between the vanilla kernel hibernation code and TuxOnIce? Is TuxOnIce better than vanilla hibernation?

The two implementations share a lot of code. They make exactly the same calls to the driver model, and follow the same basic pattern of freezing processes, doing an atomic copy, writing the copy to disk and then powering down.

One of the key differences is that whereas the vanilla kernel performs single threaded I/O, submitting the pages in batches, TuxOnIce is multithreaded, and doesn't use batches. This gives higher throughput (depending upon your specific hardware configuration, of course).

Another big difference is that TuxOnIce saves the image in two parts. Memory is divided into pages that won't be needed in reading or writing the image (mainly process and LRU pages), and all other pages (you can confirm the first lot of pages aren't needed by enabling checksumming of the pages, which adds a small overhead to your hibernation time). In this mode (which is the default, but can be disabled), TuxOnIce writes the unused pages to disk first, then does an atomic copy of the remaining pages, copying them to unused memory and the memory used by the first group of pages. This atomic copy is then written to disk before powering off. Working in this way allows us to write a complete image of memory (the first group of pages generally comprise far in excess of 50% of RAM).

Swsusp, on the other hand, does an atomic copy of all pages, meaning that the largest image you can ever write using it is 50% of your RAM. If you have more than that in use at the start of a cycle, it will seek to free memory until that '50% free' constraint is satisfied. It actually seeks to have more than 50% free, because you need some memory to be available for writing the image.
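The difference between the two approaches can be illustrated with a deliberately simplified page-count model. The function names are hypothetical, the model ignores the extra memory needed for I/O while writing, and it is not taken from either implementation's code.

```python
def swsusp_pages_to_free(total_pages, pages_in_use):
    """Pages that must be freed before a single atomic copy of everything.

    Every saved page needs a spare page to hold its copy, so the image
    cannot exceed half of RAM in this model.
    """
    max_image = total_pages // 2
    return max(0, pages_in_use - max_image)


def two_part_copy_fits(total_pages, first_group, second_group):
    """Check whether the atomic copy of the second group has room.

    first_group:  pages not needed while saving; written to disk first.
    second_group: all other pages; copied atomically afterwards.
    The copy may land in free memory and in the memory still occupied by
    the first group, since those pages are already safely on disk.
    """
    free = total_pages - first_group - second_group
    return second_group <= free + first_group


if __name__ == "__main__":
    total = 1_000_000
    # 80% of RAM in use: a single atomic copy must first free 300,000 pages.
    print(swsusp_pages_to_free(total, 800_000))
    # The same 80%, split 600,000 / 200,000, can be saved in full.
    print(two_part_copy_fits(total, 600_000, 200_000))
```

In this model the two-part scheme only runs short of room when the second group is larger than the first group plus free memory, which fits the observation that the first group generally makes up well over half of RAM.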

The tradeoff between freeing RAM and writing a whole image is that writing (and reading) a bigger image takes longer, but gives you a more responsive system post-resume. Writing a smaller image takes less time, but the system then has to fault back in pages post-resume (which is slower, especially on rotating media, because of the seeking involved and the greater overhead in faulting), and freeing memory also takes some time.

TuxOnIce has historically been the first version to provide a good number of the new features. It had SMP support first, a nice user interface, swap file support first and now has features like checking last-mount times that the in kernel version doesn't have.

Is there any intention of merging TuxOnIce into the vanilla kernel in place of the existing hibernation subsystem? Have you tried to put TuxOnIce into mainline?

I'd like to see that happen, but my shift in priorities in the last few years has meant that I struggle to find the time.

As I mentioned earlier, I have sought code reviews a few times, but it hasn't really happened. What will probably need to happen will be an incremental improvement of the existing mainline code. That will be a lot of work and take a long time, but it's the only way ahead I can see working.

If vanilla hibernation works well for me, should I use TuxOnIce? Why or why not?

If vanilla hibernation works well for you, keep using it. If you're not satisfied with some aspect of it, feel free to give TuxOnIce a try. More importantly, though, give feedback to the developers as to what you'd like to see improved and why. We can't fix problems if we don't know they exist.

How many developers besides yourself work on TuxOnIce?

Over the years, I've had some terrific and invaluable help from Bernard Blackham and others. Bernard developed the userspace user interface that is still used today. Others have done huge amounts of testing and given great feedback. But the kernel patch itself has always been my baby. Others have contributed patches, but I've done the design, development, maintenance and documentation.

Do you contribute commits to the mainline kernel? Which subsystems are you interested in besides power management?

I've occasionally made contributions in hibernation related areas such as the memory manager, but that hasn't happened very often. I really only became a 'kernel hacker' because I wanted to see the improvement in the code I wanted to use each day.

If users run into bugs in TuxOnIce, how can they help you improve its quality? How should they gather debugging information if TuxOnIce fails only once a month?

The most important thing for finding bugs is getting the information on where a problem occurs. The ideal is if you can narrow things down to a particular line of code and perhaps a particular configuration. To achieve that, all you really need is to be running a kernel with debugging information available. Then, when an oops occurs (if that's the sort of bug that's happening), you write down the address that's given and use the addr2line utility after rebooting to get the line at which everything stopped.

Context is usually important, so you will probably also want to get the addresses of the last 4 or 5 routines that were in the calling chain, and convert them to files and line numbers as well.

This will give a picture of what code path led to the oops.
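The address lookup described above can be scripted. The sketch below wraps the binutils `addr2line` tool; it assumes the addresses belong to the main kernel image (not a module) and that a `vmlinux` file with debugging information from the same build is at hand. The helper names are hypothetical.

```python
import subprocess
import sys


def addr2line_command(vmlinux, addresses):
    """Build the addr2line command line for a list of oops addresses.

    -e names the image carrying debug info, -f also prints function names,
    -i expands inlined functions so the real source line is shown.
    """
    return ["addr2line", "-e", vmlinux, "-f", "-i", *addresses]


def resolve(vmlinux, addresses):
    """Run addr2line and return its output (function and file:line pairs)."""
    result = subprocess.run(
        addr2line_command(vmlinux, addresses),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Usage: python oops_lines.py /path/to/vmlinux ADDRESS [ADDRESS ...]
    if len(sys.argv) < 3:
        sys.exit("usage: oops_lines.py VMLINUX ADDRESS [ADDRESS ...]")
    print(resolve(sys.argv[1], sys.argv[2:]), end="")
```

Passing the faulting address together with the four or five return addresses from the call chain gives the whole code path in one run.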

The other part of the picture is describing how your computer is configured. That will involve attaching your kernel config to the bug report (there's a compile time option that puts it in /proc/config.gz, which I highly recommend!). You may also want to describe how your storage is configured (file? swap partition? swap file?). Finally, your dmesg can be invaluable.
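For checking what a running kernel was built with, /proc/config.gz can be read directly. The sketch below assumes the kernel was built with CONFIG_IKCONFIG_PROC (the option that provides that file); the two options it reports on are merely examples of things worth mentioning in a bug report.

```python
import gzip


def parse_kernel_config(text):
    """Turn kernel config text into a dict of option name -> value.

    Comment lines, including '# CONFIG_FOO is not set', are skipped, so
    disabled options are simply absent from the result.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        options[name] = value
    return options


def read_running_config(path="/proc/config.gz"):
    """Read and parse the configuration of the running kernel."""
    with gzip.open(path, "rt") as handle:
        return parse_kernel_config(handle.read())


if __name__ == "__main__":
    config = read_running_config()
    for option in ("CONFIG_HIBERNATION", "CONFIG_DEBUG_INFO"):
        print(option, "=", config.get(option, "not set"))
```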

If you can use netconsole and TuxOnIce's interactive debugging to get an even more detailed picture of the leadup to the problem, that would be better again. Netconsole can also be great for grabbing a backtrace of all processes.

Other useful tools include kdb (particularly in the context of kernel mode setting!) and your digital camera (take a picture rather than writing down all the details on your screen; just pick a resolution where the information is readable but the download isn't overwhelming).

What distribution do you use? What is your favourite DE?

I use Ubuntu, mainly because things generally just work. For a desktop environment, I use xfce4, but replace the panel with AWN.

Do you support merging things such as BFS, BFQ and reiser4 into the vanilla kernel? Do you think Linus's opinion about these things is well considered?

To answer the second question first, I have to admit that I don't know what Linus' opinion about these things is. Unlike many in the kernel community, I'm somewhat on the outside — I'm not a career kernel programmer (though I've occasionally thought I'd like to be), and am much more a user who just wants to hibernate quickly, reliably and often! Right now, I'm not even subscribed to the Linux Kernel Mailing List. In fact, the only non-TuxOnIce Linux related list I'm on is the Power Management list, and I don't read that enough!

Now on to the first question:

Some problems are hard to solve, and different approaches have different merits. In addition, different users have different priorities in what they seek. This is certainly true when it comes to schedulers and filesystems, so I agree wholeheartedly with the idea of putting multiple options and tuning knobs into the kernel, and giving the user options. This is one of the things I've sought to do in TuxOnIce. If you look in /sys/power/tuxonice, you'll find a number of options for tuning the software according to your preferences, precisely because one size doesn't fit everyone.
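To see which knobs a given system offers, the sysfs directory can simply be listed. The sketch below makes no assumptions about the names of the entries; it reads whatever files it finds under /sys/power/tuxonice and returns nothing on a kernel without the TuxOnIce patch.

```python
from pathlib import Path


def read_tunables(directory="/sys/power/tuxonice"):
    """Return {entry name: current value} for readable files in directory.

    Entries that cannot be read (for example write-only or root-only
    files) are reported with a value of None. Subdirectories are skipped.
    """
    root = Path(directory)
    values = {}
    if not root.is_dir():
        return values
    for entry in sorted(root.iterdir()):
        if not entry.is_file():
            continue
        try:
            values[entry.name] = entry.read_text().strip()
        except (OSError, UnicodeDecodeError):
            values[entry.name] = None
    return values


if __name__ == "__main__":
    tunables = read_tunables()
    if not tunables:
        print("no TuxOnIce settings found on this system")
    for name, value in tunables.items():
        print(f"{name}: {value}")
```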

What do you think about the kernel version numbering change?

3.0 has been talked about for years, and I'm happy to see it finally come about. At the same time though, it would have been nice to see some big change that warranted the rollover. One thing I do wish is that Linus had just let it be 3.0.0. Why complicate life unnecessarily?

Have you ever faced kernel bug #12309? Any comments on it?

Yeah, every time I suspend a virtual machine in VMware. I'm not going to make any comments though, because I know that scheduling is a hard problem to solve. That's one thing I love about working on TuxOnIce — once everything else is frozen, there aren't too many scheduler issues to deal with!

Which kernel developers do you know personally («offline»)?

I've met a few developers over the years at the kernel summit I went to and the Linux.Conf.Au conferences I've been to, but I don't know any of them really well. It's partly a result of not being an employee of Red Hat, Intel, Ubuntu or the like.

I did have the pleasure of being in Canberra about 7 years ago and having a little bit to do with Rusty Russell and some of the guys who were at IBM then, but nothing serious or enduring.

Which open source development team is the most well-organized and professional?

I've only been part of the Drupal community for about 4 years, but I'm quite impressed with the way they do things.

What other open source projects do you contribute to?

I contribute to a number of Drupal modules (Mailfix and Fasttoggle, and to a lesser extent OG Mailing list), and have recently started maintaining a fork of the pam-mysql module on Github.

Are you ready for IPv6?

No. I'm very ignorant of IPv6, but expect that will have to change now that v4 addresses have all but run out. Still, I'll only learn as much as I have to in order to get it working. You can't know everything and Google is your friend! :)

What is your best achievement in your life?

Hmm. That's a hard one. I tend not to think too much about what I achieve or about being proud of achievements. I guess I'd have to say I'm happy with where TuxOnIce has ended up: it feels quite stable and mature now (although problems with drivers and changes in the vanilla kernel mean there's always more work to do). I'm also happy to be a helpful influence in my family, church and community. At the end of the day, I think it's far more important to serve than to be served, and if I can look back on a life in which I've done that, I'll be happy.

Thanks for your answers!

You're welcome. Thanks for asking!