Getting started with a raspberry pi, or how I had to fix it

Just a few days ago I realized that the Raspberry Pi I use to control my irrigation system was dead. I could not get to the web interface, pings would time out, and I could not ssh into it.

The Symptoms

The first thing I tried was a simple reboot. The raspberry is in a black box in my backyard, maybe the hot summer days were... too hot? I have a cron job that shuts it down if the temperature goes above 70 degrees. Or maybe the shady wireless card and its driver stopped working? I have another cron job to restart it, so this seems less likely.

So... I reboot it by physically unplugging it, but still nothing happens. The red LED on the board, next to the ethernet plug, is on, which means it is getting power. The green LED next to it flashes only once. From what I read online, this LED can flash to report an error, or to indicate that the memory card is being read.

There is no error corresponding to one, single, flash, so I assume it means that it tried to read the flash, and somehow failed. It is supposed to be booting now, so I would expect much more activity from the memory card.

Maybe the card is corrupted, or something bad happened to the file system.

Checking the flash

Removed the memory card from the raspberry, inserted it in my laptop. First thing I do is run fsck.

Note that /dev/sdb is the memory card inserted in my laptop! On your computer, it will likely have a different name. Make sure you don't damage one of your real partitions.
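One way to double check which device is the memory card, before running anything against it, is to ask the kernel whether it considers the disk removable. This is only a sketch: is_removable is a helper name made up for this example, and it assumes a USB card reader, which usually reports itself as removable, while internal disks usually do not. Some readers may behave differently.

```shell
# Hypothetical helper, not a standard tool.
# $1: disk name without /dev/, e.g. "sdb".
# $2: optional sysfs directory to look in (defaults to /sys/block).
is_removable() {
  flag="${2:-/sys/block}/$1/removable"
  [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

# Example (commented out, adjust the name to your system):
#   is_removable sdb && fsck -f /dev/sdb1
```

Running lsblk -o NAME,SIZE,RM,MODEL shows the same flag in the RM column, together with the size, which is often the easiest way to recognize the card.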

Anyway, the command I use is:

# fsck -f /dev/sdb1

First partition is good, let's look at the second one:

# fsck -f /dev/sdb2

TADA! Lots of problems reported! This is annoying, the file system was corrupted.

Next step is to back it up, just in case. Although there's not much on it, and it took very little to get it running in the first place, a backup may come in handy.

Backing it up

To copy the memory card, all I had to do was:

# dd if=/dev/sdb of=/opt/backup/raspberry-20130730.img

and let it run until completion.
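By default dd copies in 512 byte blocks, which can be slow on a memory card, and it does not check the result. The sketch below is a variation on the command above, not what I actually ran: image_copy is a made up name, bs=4M is just a reasonable guess for the block size, and the final cmp reads the card a second time to confirm that the image matches it byte for byte.

```shell
# Hypothetical helper: copy $1 to $2 in 4 MiB blocks, then verify the copy.
image_copy() {
  src=$1
  dst=$2
  dd if="$src" of="$dst" bs=4M 2>/dev/null || return 1
  sync
  # cmp exits non-zero if the two differ or one is shorter.
  cmp "$src" "$dst"
}

# Example (commented out, needs root and the right device name):
#   image_copy /dev/sdb /opt/backup/raspberry-20130730.img
```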

Repairing it

Next step is to fix the file system. Can I really do it? Let's try:

# fsck -f -p /dev/sdb2

Unfortunately, this fails with something like: "fsck failed, please repair manually". Not a good sign. So, let's try once again:

# fsck -y -f /dev/sdb2

This shows several screens of errors. Bad bad sign. Let's try to mount it:

# mount /dev/sdb2 /mnt/tmp

Seems to mount cleanly now. Let's try to put it back, and reboot the system one more time. Still no luck...
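When fsck gives up, its exit status says a bit more than the message on screen. The status is a sum of flags documented in the fsck manual page, and the sketch below decodes it. The function name explain_fsck_status is made up for this example.

```shell
# Hypothetical helper: print the meaning of an fsck exit status.
# The values are the ones listed in fsck(8); they can be added together.
explain_fsck_status() {
  status=$1
  [ "$status" -eq 0 ] && { echo "no errors"; return 0; }
  [ $((status & 1)) -ne 0 ] && echo "errors corrected"
  [ $((status & 2)) -ne 0 ] && echo "system should be rebooted"
  [ $((status & 4)) -ne 0 ] && echo "errors left uncorrected"
  [ $((status & 8)) -ne 0 ] && echo "operational error"
  [ $((status & 16)) -ne 0 ] && echo "usage or syntax error"
  [ $((status & 32)) -ne 0 ] && echo "canceled by user request"
  [ $((status & 128)) -ne 0 ] && echo "shared library error"
  return 0
}

# Example (commented out, needs root and the right device name):
#   fsck -f -p /dev/sdb2; explain_fsck_status $?
```

A status with the 4 flag set after a run with -y would mean that fsck could not repair everything, which is a good hint that mounting cleanly does not mean the content is intact.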

Setting up a raspberry from scratch

Unfortunately, I don't have a backup of the original working memory card. So let's start from scratch, like I did when I first got it.

  1. Downloaded latest raspbian image from: http://downloads.raspberrypi.org/

  2. Installed it, with:

    # unzip 2013-07-26-wheezy-raspbian.zip
    # dd if=2013-07-26-wheezy-raspbian.img of=/dev/sdb
    
  3. Next step is to configure the network on the memory card, so I can put it back in the raspberry, and finish the setup via ssh. To do so, I need to mount the memory card, and modify a few config files:

    1. I need to tell my linux kernel to re-read the partition table of sdb, so it picks up the position of the partitions I just copied into /dev/sdb, with:

       # sfdisk -R /dev/sdb
      

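      Alternatively, I could just have removed the memory card and re-inserted it. But I'm lazy, and the command is more convenient. Note that newer versions of sfdisk no longer have the -R option; blockdev --rereadpt /dev/sdb or partprobe /dev/sdb do the same job there.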

    2. Mount the partition somewhere:

       # mkdir -p /mnt/raspberry
       # mount /dev/sdb2 /mnt/raspberry
      
    3. Set up the wireless config. This means editing /etc/network/interfaces on a Debian-based system:

       # vim /mnt/raspberry/etc/network/interfaces
      

      and add wpa-ssid and wpa-psk, so that the file looks like this:

       auto lo
      
       iface lo inet loopback
       iface eth0 inet dhcp
      
       allow-hotplug wlan0
       iface wlan0 inet dhcp
         wpa-ssid "SSID-of-your-wireless-network"
         wpa-psk "password!"
      
       iface default inet dhcp
      
    4. Save and umount.

       # sync # Just in case
       # umount /mnt/raspberry
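A typo in the interfaces file only shows up once the card is back in the raspberry and the network does not come up, so it can be worth checking the file before the final umount. The sketch below is a small sanity check, not a real parser: check_wlan_config is a made up name, and it only verifies that the wlan0 stanza contains both a wpa-ssid and a wpa-psk line.

```shell
# Hypothetical helper: succeed if the "iface wlan0" stanza of the given
# interfaces file has both wpa-ssid and wpa-psk set.
check_wlan_config() {
  awk '
    $1 == "iface"               { in_wlan = ($2 == "wlan0") }
    in_wlan && $1 == "wpa-ssid" { ssid = 1 }
    in_wlan && $1 == "wpa-psk"  { psk = 1 }
    END                         { exit !(ssid && psk) }
  ' "$1"
}

# Example (commented out, run it while the card is still mounted):
#   check_wlan_config /mnt/raspberry/etc/network/interfaces || echo "check wlan0!"
```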
      

Rebooting, and connecting via network

Now it is time to try it. Let's remove the memory card from the laptop, and put it back in the raspberry. After a reboot, the green LEDs are blinking happily.

On my home server, which is responsible for my network, I run dhcpd, configured to assign the raspberry a static address. I do so with a block like:

subnet 10.1.40.0 netmask 255.255.255.0 {
  option domain-name-servers 10.1.40.254, 8.8.8.8, 8.8.4.4;
  option routers 10.1.40.254;
  range 10.1.40.20 10.1.40.200;

  group {
    use-host-decl-names on;
    host raspberry {
      fixed-address 10.1.40.9;
      hardware ethernet 80:1f:02:9a:9d:e6;
    }
  }

}

This block goes in /etc/dhcp/dhcpd.conf. The MAC address 80:1f:02:9a:9d:e6 is the one of my raspberry. You can find it by running the command ifconfig or ip link show on the raspberry itself.

Thanks to that block, the raspberry gets assigned the address 10.1.40.9. If you don't have a similar configuration, or don't know the MAC address of your raspberry, don't despair! It is pretty easy to figure it out.

If you have a dhcp server in your network, you can just look at its logs. Around the time the raspberry is booted, you can probably see a line like:

...
Jul 28 07:13:13 yourserver dhcpd: DHCPDISCOVER from 80:1f:02:9a:9d:e6 via eth1
...

in /var/log/messages.
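If the log is busy, the lines are easy to miss. As a rough sketch, the function below pulls out every MAC address that sent a DHCPDISCOVER and prints each one once. The name list_dhcp_clients is made up, and the default log path is the one used above; on other systems dhcpd may log somewhere else.

```shell
# Hypothetical helper: list the unique MAC addresses seen in DHCPDISCOVER
# lines of a dhcpd log. $1 is the log file (default: /var/log/messages).
list_dhcp_clients() {
  log=${1:-/var/log/messages}
  grep 'DHCPDISCOVER from' "$log" |
    sed -e 's/.*DHCPDISCOVER from \([0-9a-fA-F:]*\).*/\1/' |
    sort -u
}
```

Comparing the output before and after plugging in the raspberry should leave one new address, which is the one to use in the host block.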

Alternatively, you can run tcpdump while the raspberry is rebooted, and most likely see its mac address and assigned ip. To do so, you can use something like:

# tcpdump -v -nei wlan0 port 67 or port 68

In any case, the raspberry boots. My:

$ ping 10.1.40.9

eventually succeeds, and I can login with ssh using username pi, password raspberry, and sudo -s to become root:

$ ssh pi@10.1.40.9
Password: raspberry
$ sudo -s
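Rather than re-running ping by hand until the raspberry answers, a small retry loop can wait for it. This is a generic sketch, with a made up function name, and the numbers in the example are arbitrary.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Succeeds as soon as the command succeeds.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    "$@" && return 0
    n=$((n + 1))
    sleep "$delay"
  done
  return 1
}

# Example (commented out): wait up to about five minutes, then log in.
#   retry 60 5 ping -c1 10.1.40.9 >/dev/null && ssh pi@10.1.40.9
```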

Configuring the raspberry

I use the raspberry as a headless server to control my irrigation system. Unfortunately, raspbian seems to be geared more to desktop users.

Here's what I did to configure it:

  1. Install my ssh keys both for root and pi. This is necessary only if you use ssh-agent and ssh keys.

    $ ssh-copy-id pi@10.1.40.9
    $ ssh pi@10.1.40.9
    $ sudo -s
    # cp -a ~pi/.ssh ~root/.ssh
    # chown root -R ~root/.ssh
    
  2. Disabled password based access. Again, do this only if you use ssh keys. If you don't though, you should make sure to change the password of user pi.

    # passwd -l pi
    # passwd -l root
    
  3. Pruned and installed a few utilities, while updating the system to the latest version:

    # apt-get update
    # apt-get install bootlogd vim mosh screen bsd-mailx postfix
    # apt-get --purge remove consolekit triggerhappy
    # apt-get --purge remove 'cups.*' 'xserver.*' 'x11.*'
    
    # apt-get dist-upgrade
    # apt-get autoclean
    # apt-get autoremove
    
  4. Configured the language, so it would use my language (and above all, stop apt-get and other tools from complaining about a locale not being set):

    # apt-get install locales
    # dpkg-reconfigure locales
    
  5. Changed a few settings, in particular, set RAMTMP=yes, to have /tmp in RAM rather than on the SD card, and mounted /boot as read only. Both protect the file systems, in case something else goes wrong with the SD card:

    # vim /etc/default/tmpfs
    ...
    RAMTMP=yes
    ...
    
    # vim /etc/fstab
    ...
    /dev/mmcblk0p1  /boot           vfat    defaults,ro       0       2
    
  6. Given the corruption problem I had, I was tempted to mark the root file system as using data=journal, or even sync. Given that I had not found the root cause of the corruption, in the end I decided to do a backup and leave the setup as is :).

  7. Installed cron jobs. I have two cron jobs on the raspberry pi:

    1. To check the internal temperature, send me an email and shut the device down if it is too hot.

    2. To verify that the wireless is up, and restart it if it is not. I have a tiny USB wireless dongle, an EW-7811Un, which generally works well. However, it does disconnect from time to time, especially if I reboot or poke at the access point :).

    The script for the second job, the connectivity check, is this one:

    $ cat ./check-connectivity.sh
    #!/bin/bash
    
    attempts=5
    # This is any machine on your network that is always on. The script tries
    # to ping this machine a few times, if it fails, it restarts the wireless.
    server=server
    
    for n in `seq $attempts`; do
      logger -t "connectivity-check" "Sending ping request $n to '$server'."
      ping -c1 "$server" &>/dev/null && {
        logger -t "connectivity-check" "Server is reachable, nothing to do."
        exit 0
      }
    done
    
    logger -t "connectivity-check" "Server is unreachable, restarting wireless."
    ( 
      set -x
      ifdown wlan0
      rmmod 8192cu
      modprobe 8192cu
      ifup wlan0
    ) 2>&1 | (while read line; do logger -t "connectivity-check" "Output: $line"; done;)
    

    While this is the script for the first job, the temperature check:

    $ cat ./check-temperature.sh
    #!/bin/bash
    
    temperature=`vcgencmd measure_temp | sed -e 's/.*=\([^.]*\).*/\1/'`
    precise=`vcgencmd measure_temp | sed -e "s/.*=\([^\']*\).*/\1/"`
    email=youremailaddress@whatever.com
    max=70
    
    logger -t "temperature-check" "Temperature: $precise, max: $max"
    if [ "$temperature" -ge "$max" ]; then
      (echo "The temperature is currently $temperature. Greater or equal to $max."
       echo ""
       echo "SHUTTING DOWN THE SYSTEM IN 30 SECONDS") | mail -s 'Temperature too high - shutting down!' "$email" &>/dev/null
      sync
      sleep 30
      halt
    fi
    

    To configure cron, I had to add the following lines to /etc/crontab:

    * *     * * *   root    /root/utils/check-temperature.sh
    * *     * * *   root    /root/utils/check-connectivity.sh
    

OK, after I installed the web interface of my irrigation system, everything seems to be up again.
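A side note on check-temperature.sh: vcgencmd measure_temp prints something like temp=45.5'C, and the two sed expressions in the script cut out the whole degrees and the full value. The same can be done with plain shell parameter expansion, which saves running vcgencmd and sed twice. This is only a sketch with made up function names, not what runs on my raspberry.

```shell
# Hypothetical helpers: parse a string such as "temp=45.5'C".

# Prints the full value, e.g. 45.5
parse_temp() {
  value=${1#*=}              # drop everything up to and including "="
  value=${value%%[!0-9.]*}   # drop the first non-numeric character and the rest
  printf '%s\n' "$value"
}

# Prints the whole degrees only, e.g. 45, suitable for [ ... -ge ... ]
whole_degrees() {
  value=$(parse_temp "$1")
  printf '%s\n' "${value%%.*}"
}

# Example (commented out, vcgencmd only exists on the raspberry):
#   reading=$(vcgencmd measure_temp)
#   [ "$(whole_degrees "$reading")" -ge 70 ] && echo "too hot: $(parse_temp "$reading")"
```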

Backing up the Raspberry

Now to the backup part. I could remove the memory card again, and do the same thing I did before. But I am lazy, and would like a backup I can do remotely. So, here's what I did:

  1. Mounted the image I installed on the raspberry locally (remember? one of the first dds in this blog post). Given that the image contains a few partitions, I had to use kpartx to make them available, like this:

    # kpartx -l 2013-07-26-wheezy-raspbian.img
    loop0p1 : 0 114688 /dev/loop0 8192
    loop0p2 : 0 3665920 /dev/loop0 122880
    # kpartx -a 2013-07-26-wheezy-raspbian.img
    # mount /dev/mapper/loop0p2 /mnt/raspberry
    

    Make sure to use the loop device created by kpartx, as shown in the output.

  2. Once mounted, used rsync to copy everything from the raspberry to /mnt/raspberry. I have a terrible memory for the options and flags to use, so I just used the ac-system-backup script, with something like:

     # ac-system-backup 10.1.40.9 /mnt/raspberry
    
  3. At the end of the sync, unmounted the partitions with:

     # umount /mnt/raspberry
     # kpartx -d 2013-07-26-wheezy-raspbian.img
    
  4. Finally, renamed the image as:

     # mv 2013-07-26-wheezy-raspbian.img backup-raspberry-2013-07-31.img
    

Next time I have to do a backup, I will first copy this image, and then run rsync.
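For those without ac-system-backup, the rsync step can be approximated by hand. The sketch below is my guess at a reasonable set of flags, not necessarily what that script uses. It assumes rsync is installed on both machines and that root can ssh into the raspberry with keys, as set up earlier. The function name backup_pi and the RSYNC variable are made up for this example.

```shell
# Hypothetical helper: mirror the root file system of host $1 into $2.
# Set RSYNC to another command (e.g. echo) to see the arguments without copying.
backup_pi() {
  host=$1
  dest=$2
  # -a archive, -H hard links, -A ACLs, -X extended attributes,
  # -x stay on one file system, --delete removes files gone from the source.
  ${RSYNC:-rsync} -aHAXx --numeric-ids --delete "root@$host:/" "$dest/"
}

# Example (commented out, /mnt/raspberry must be the mounted image):
#   backup_pi 10.1.40.9 /mnt/raspberry
```

With -x, rsync stays on the root file system, so the contents of /proc, /sys, the tmpfs /tmp and the separate /boot partition are not copied. Because of --delete, pointing it at the wrong destination directory is destructive, so double check the mount first.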

Conclusion

I still have not found the cause of the file system corruption. In theory, ext4 with journaling should be able to recover cleanly from most states, especially on a system that is hardly (if ever) modified, and where the only write activity is logging.

I spent some time looking at the backup, and the file system was terribly corrupted. E.g., directories turned into files, files with the wrong content, and so on. If it is a software problem, it is a nasty one :)

I checked the power supply, and it is very good, both in terms of voltage and amperage. Given that this is not the first time this has happened, I suspect the memory card, or the thermal cycles, are causing the issue.

If it happens again, I will probably replace the memory card and see.

