NSA325v2 RAID 1 recovery

All Replies

  • Mr_C
    Mr_C Posts: 14  Freshman Member
    edited October 2019
    So, after a bit more messing about I've found the following - can I presume that the last line "spare rebuilding" is a dead giveaway as to what the wretched thing is doing? D'oh.

    ~ # mdadm --detail /dev/md0
    /dev/md0:
            Version : 1.2
      Creation Time : Sun Sep 14 10:15:30 2014
         Raid Level : raid1
         Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
      Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent
        Update Time : Sun Oct 13 17:56:11 2019
              State : clean, degraded
     Active Devices : 1
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 1
               Name : Nelly:0  (local to host Nelly)
               UUID : 1ae146c1:423305c9:1ee2a382:b4dc443b
             Events : 34813639
        Number   Major   Minor   RaidDevice State
           0       8       18        0      active sync   /dev/sdb2
           2       8        2        1      spare rebuilding   /dev/sda2
    ~ # mdadm --detail /dev/md0
    /dev/md0:
            Version : 1.2
      Creation Time : Sun Sep 14 10:15:30 2014
         Raid Level : raid1
         Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
      Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent
        Update Time : Sun Oct 13 17:57:17 2019
              State : clean, degraded
     Active Devices : 1
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 1
               Name : Nelly:0  (local to host Nelly)
               UUID : 1ae146c1:423305c9:1ee2a382:b4dc443b
             Events : 34813675
        Number   Major   Minor   RaidDevice State
           0       8       18        0      active sync   /dev/sdb2
           2       8        2        1      spare rebuilding   /dev/sda2

  • Mijzelf
    Mijzelf Posts: 2,598  Guru Member
    'Spare rebuilding' is what it should do, and according to your /proc/mdstat it indeed started to do so, but stopped it quickly.
    That behavior is reproducible, it seems. I wonder if something is wrong with the source disk, since it always stops around 100MB.

    Can you execute
    dd if=/dev/sdb2 of=/dev/null bs=16M count=64
    and see if that throws an error? This will copy the first 1024 MB (16 MB × 64) from /dev/sdb2 (which is the active raid member) to /dev/null (nowhere). Basically the same as a resync does, except for the destination. If this fails at +/- 100MB, your sdb disk has a problem.
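If that dd does error out, you can narrow down roughly where the bad region sits by reading the device in 16 MB steps. A minimal sketch of that scan, demonstrated against a scratch file (`/tmp/scan_demo.img` is a stand-in I've picked; point TARGET at /dev/sdb2 on the NAS):

```shell
# Sketch: scan a device in 16 MB chunks and report the first chunk that fails
# to read. TARGET is a scratch file here so the loop can be shown standalone;
# on the NAS, set TARGET=/dev/sdb2 (and drop the dd that creates the file).
TARGET=/tmp/scan_demo.img
dd if=/dev/zero of="$TARGET" bs=1M count=64 2>/dev/null   # stand-in for the disk
for i in $(seq 0 3); do
  if ! dd if="$TARGET" of=/dev/null bs=16M count=1 skip=$i 2>/dev/null; then
    echo "read error in 16MB chunk $i"
    break
  fi
done
echo "scan finished"
```

On a real 2 TB partition you would raise the seq upper bound accordingly; the chunk number times 16 MB tells you roughly where the unreadable area starts.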
  • Mr_C
    Mr_C Posts: 14  Freshman Member
    This would be the error.  It is the originally faulty disk, mind you.

    dd if=/dev/sdb2 of=/dev/null bs=16M count=64
    dd: /dev/sdb2: Input/output error


    I'm just hoping the rebuilding spare is going to get there....
  • Mijzelf
    Mijzelf Posts: 2,598  Guru Member
    I'm just hoping the rebuilding spare is going to get there....
    Apparently not. The rebuild is interrupted each time you start it. It doesn't jump over an unreadable section.

    The original 'good' disk didn't assemble either. Assuming that disk is healthy, we can try to repair the array, which basically means creating a new header. The content remains untouched.

    Remove both disks, and put back the original 'good' disk. Double check it still doesn't assemble. According to your header, the command to create a new array is:

    mdadm --stop /dev/md0
    mdadm --create --assume-clean --level=1 --raid-devices=2 --metadata=1.2 /dev/md0 missing /dev/sda2
    The 'assume-clean' tells the raid manager not to touch the content of the array. I don't know if that actually matters with raid1, but it won't hurt. The array is built with device role 0 'missing', so it can insert your new disk there, once the array is up.

    Maybe the stop command will fail. I don't know if it's actually 'up'.

    After creating the array you might have to reboot, to get the volume.
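For clarity, here is the whole sequence again, written as a dry run that only prints the commands so nothing happens by accident. /dev/sda2 is OB's data partition as named in this thread; drop the `echo "would run:"` wrapper on the NAS once you're sure the right disk is in:

```shell
# Dry-run sketch of the repair sequence from this thread. Nothing is executed;
# each command is only printed. Double-check the device name against
# /proc/partitions before running for real.
DEV=/dev/sda2
for cmd in \
  "mdadm --stop /dev/md0" \
  "mdadm --create --assume-clean --level=1 --raid-devices=2 --metadata=1.2 /dev/md0 missing $DEV"
do
  echo "would run: $cmd"
done
```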

  • Mr_C
    Mr_C Posts: 14  Freshman Member
    Thanks for that.  Just to be clear though, the NAS currently has two drives - sdb2 is the faulting drive, sda2 is the drive originally in the volume currently "rebuilding".  The new drive is not in the NAS presently.  So, do I follow the process you've suggested above or switch sda2 out for the replacement drive?  I think the answer is to leave as is but I'm just conscious I'm a desperate noob and could well screw this up utterly....

    Again, you are being awesome Mijzelf - thank you for your support in this and sorry for being dense.
  • Mijzelf
    Mijzelf Posts: 2,598  Guru Member
    Originally you had 2 disks, and one failed, and was kicked off the array. The one remaining seems to have a hardware error, which makes it impossible to use normal ways to sync a new disk. The 'kicked off' disk doesn't assemble because a resync was tried and aborted, and now it has a mark 'partly synced'. (Recovery Offset : 215808 sectors). So the idea is to use that disk, and write a new, compatible header, to be able to assemble the array. And hope the reason it was kicked is no showstopper.
    Of course it's possible that it also has a hardware error, which will show up when trying to add the new disk.
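Before recreating, it's worth confirming the 'partly synced' mark yourself with `mdadm --examine`. A sketch of what to look for; the examine output is stubbed in here with the value quoted above, so the check can be shown standalone:

```shell
# On the NAS you would run:  mdadm --examine /dev/sda2 | grep 'Recovery Offset'
# Here that output line is stubbed with the value quoted in this thread.
EXAMINE_OUTPUT="Recovery Offset : 215808 sectors"
if echo "$EXAMINE_OUTPUT" | grep -q "Recovery Offset"; then
  MARK="partly synced"
  echo "disk carries a recovery offset - mdadm treats it as partly synced"
fi
```

A disk with that mark won't assemble into a usable array on its own, which is exactly why the new header is needed.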

  • Mr_C
    Mr_C Posts: 14  Freshman Member
    Hello again - sorry for the delay in responding.

    I'm having difficulty unmounting : 

     mdadm --stop /dev/md0
    mdadm: Cannot get exclusive access to /dev/md0: possibly it is still in use.

    Do I need to unmount before trying the next step? i.e.

    mdadm --create --assume-clean --level=1 --raid-devices=2 --metadata=1.2 /dev/md0 missing /dev/sda2
  • Mijzelf
    Mijzelf Posts: 2,598  Guru Member
    I'm losing track.
    You have three disks
    • OA: The Old disk which assembles to a degraded array, but which contains a hardware error around 100MB, which causes resyncs to stall.
    • OB: The Old disk which was kicked from the array for an unknown reason, and which doesn't assemble to a usable array, because it is flagged as 'partly synced', but which is supposed to still contain all your data.
    • N: The New disk, which now has a proper partition table, but which data partition contains only 100MB of the data filesystem.
    At this moment your NAS is supposed to contain only disk OB. Right? If it's OA you can't stop the array md0 because it's assembled and mounted. Running the second command on an array which already assembles is meaningless.
    The goal of the command is to get an array from OB, in the hope it doesn't have blocking hardware errors, to be able to sync to N.

    If you have OB in, post the output of
    cat /proc/mdstat
    cat /proc/mounts

  • Mr_C
    Mr_C Posts: 14  Freshman Member
    To confirm, I have:

    OA: Old disk, showing as faulty (hardware error)
    OB: Old disk, not faulty, shows as rebuilding constantly
    N: New disk, originally failed to build, only builds from OA but constantly fails to do so
    When the NAS contains OA and OB, the volume shows as degraded, rebuilding, with both of these disks inserted.  If only OB is inserted, the volume shows as Inactive.

    So, with just OB in:

    ~ # cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1]
    md0 : inactive sda2[2](S)
          1952996928 blocks super 1.2

    ~ # cat /proc/mounts
    rootfs / rootfs rw 0 0
    /proc /proc proc rw,relatime 0 0
    /sys /sys sysfs rw,relatime 0 0
    none /proc/bus/usb usbfs rw,relatime 0 0
    devpts /dev/pts devpts rw,relatime,mode=600 0 0
    /dev/mtdblock8 /zyxel/mnt/nand yaffs2 ro,relatime 0 0
    /dev/sda1 /zyxel/mnt/sysdisk ext2 ro,relatime,errors=continue 0 0
    /dev/loop0 /ram_bin ext2 ro,relatime,errors=continue 0 0
    /dev/loop0 /usr ext2 ro,relatime,errors=continue 0 0
    /dev/loop0 /lib/security ext2 ro,relatime,errors=continue 0 0
    /dev/loop0 /lib/modules ext2 ro,relatime,errors=continue 0 0
    /dev/ram0 /tmp/tmpfs tmpfs rw,relatime,size=5120k 0 0
    /dev/ram0 /usr/local/etc tmpfs rw,relatime,size=5120k 0 0
    /dev/ram0 /usr/local/var tmpfs rw,relatime,size=5120k 0 0
    /dev/mtdblock4 /etc/zyxel yaffs2 rw,relatime 0 0
    /dev/mtdblock4 /usr/local/apache/web_framework/data/config yaffs2 rw,relatime 0
  • Mijzelf
    Mijzelf Posts: 2,598  Guru Member
    edited October 2019
    So, with just OB in:

    ~ # cat /proc/mdstat
    ~ # cat /proc/mounts
    In this situation mdadm complained it couldn't get 'exclusive access'? That's strange, as md0 clearly isn't mounted. It can't be, since the array is down.

    You are sure you were running the command as root? If yes, something is keeping the array locked. That might be mdadm in daemon mode, which monitors the arrays. In that case try
    killall mdadm
    Repeat until you get 'no processes killed'.

    If that doesn't help, try to find out which process has md0 open:
    lsof | grep /dev/md0
    and kill that process.

    If that doesn't help either, you can try to 'brute force'.
    while ! mdadm --stop /dev/md0 ; do sleep 1 ; done
    This will repeatedly try to stop the array, until it succeeds, or until you stop it with Control-C.
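The loop is just a retry pattern: re-run a command until its exit status is 0. A standalone sketch of the same pattern, with a stub `try_stop` function (my invention, succeeding on the third call) standing in for `mdadm --stop /dev/md0`:

```shell
# Retry pattern behind the 'brute force' loop: keep re-running a command
# until it exits 0. try_stop is a stand-in that succeeds on the third call;
# on the NAS the real command is `mdadm --stop /dev/md0`.
n=0
try_stop() {
  n=$((n + 1))
  [ "$n" -ge 3 ]     # fail twice, succeed on the third attempt
}
while ! try_stop; do
  sleep 1
done
echo "stopped after $n attempts"   # prints: stopped after 3 attempts
```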

    If that doesn't work either, you can try to hotplug the disk. Boot the NAS without disks, and plug the disk in once booted. In that case the firmware will leave the array alone, I think.
