How do I recover a volume after the repair process silently failed?

bjorn
bjorn Posts: 8  Freshman Member
My NAS540 started beeping two weeks ago. The volume was degraded because drive 2 had failed. Following the web-interface instructions, I replaced it and initiated a repair. A day later, I tried to log in to the web interface and it just hung forever. A few posts said the repair might take a while, so I just waited...

...for five days.

At that point, I logged in via SSH. CPU usage was low, and there were no processes that looked like they might be performing a repair.

I restarted the device.

Now when I log in, I get a "before you start using your NAS" greeting.



My old Shared Folders are still visible in the control panel, but they all say "lost." The file browser shows no results.



The storage manager shows no volumes or disk groups.



All drives show as healthy.



Did the web GUI repair process fail and destroy the volume? And is there any way to restart the recovery?


All Replies

  • Mijzelf
    Mijzelf Posts: 2,598  Guru Member
    That doesn't look good. Can you log in over SSH as root and post the output of

    fdisk -l
    cat /proc/partitions
    mdadm --examine /dev/sd[abcd]3
  • bjorn
    bjorn Posts: 8  Freshman Member
    ~ # fdisk -l
    
    Disk /dev/loop0: 144 MiB, 150994944 bytes, 294912 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock0: 256 KiB, 262144 bytes, 512 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock1: 512 KiB, 524288 bytes, 1024 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock2: 256 KiB, 262144 bytes, 512 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock3: 10 MiB, 10485760 bytes, 20480 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock4: 10 MiB, 10485760 bytes, 20480 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock5: 110 MiB, 115343360 bytes, 225280 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock6: 10 MiB, 10485760 bytes, 20480 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock7: 110 MiB, 115343360 bytes, 225280 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock8: 6 MiB, 6291456 bytes, 12288 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/sda: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: 21EE10AA-752B-4744-9421-343874E5EE0B
    
    Device       Start        End    Sectors  Size Type
    /dev/sda1     2048    3999743    3997696  1.9G Linux RAID
    /dev/sda2  3999744    7999487    3999744  1.9G Linux RAID
    /dev/sda3  7999488 7814035455 7806035968  3.7T Linux RAID
    
    Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disklabel type: gpt
    Disk identifier: 7B0BC181-BA8F-410B-BB00-12B62214BE8A
    
    Device       Start        End    Sectors  Size Type
    /dev/sdb1     2048    3999743    3997696  1.9G Linux RAID
    /dev/sdb2  3999744    7999487    3999744  1.9G Linux RAID
    /dev/sdb3  7999488 7814035455 7806035968  3.7T Linux RAID
    
    Disk /dev/sdc: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: 812FBB8D-F40F-4685-852C-BFBF2DC2A8E0
    
    Device       Start        End    Sectors  Size Type
    /dev/sdc1     2048    3999743    3997696  1.9G Linux RAID
    /dev/sdc2  3999744    7999487    3999744  1.9G Linux RAID
    /dev/sdc3  7999488 7814035455 7806035968  3.7T Linux RAID
    
    Disk /dev/sdd: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: 55DC49C1-620A-4F4C-B96D-5C2959CC8F07
    
    Device       Start        End    Sectors  Size Type
    /dev/sdd1     2048    3999743    3997696  1.9G Linux RAID
    /dev/sdd2  3999744    7999487    3999744  1.9G Linux RAID
    /dev/sdd3  7999488 7814035455 7806035968  3.7T Linux RAID
    
    Disk /dev/md0: 1.9 GiB, 2045706240 bytes, 3995520 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disk /dev/md1: 1.9 GiB, 2046754816 bytes, 3997568 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    

    
    ~ # cat /proc/partitions
    major minor  #blocks  name
    
       7        0     147456 loop0
      31        0        256 mtdblock0
      31        1        512 mtdblock1
      31        2        256 mtdblock2
      31        3      10240 mtdblock3
      31        4      10240 mtdblock4
      31        5     112640 mtdblock5
      31        6      10240 mtdblock6
      31        7     112640 mtdblock7
      31        8       6144 mtdblock8
       8        0 3907018584 sda
       8        1    1998848 sda1
       8        2    1999872 sda2
       8        3 3903017984 sda3
       8       16 3907018584 sdb
       8       17    1998848 sdb1
       8       18    1999872 sdb2
       8       19 3903017984 sdb3
       8       32 3907018584 sdc
       8       33    1998848 sdc1
       8       34    1999872 sdc2
       8       35 3903017984 sdc3
       8       48 3907018584 sdd
       8       49    1998848 sdd1
       8       50    1999872 sdd2
       8       51 3903017984 sdd3
      31        9     102424 mtdblock9
       9        0    1997760 md0
       9        1    1998784 md1
      31       10       4464 mtdblock10
    

    
    ~ # mdadm --examine /dev/sda3
    /dev/sda3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : ad82b6f7:6aacc5f3:c7a86a8b:25240df4
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Thu Jul 27 13:12:32 2017
         Raid Level : raid5
       Raid Devices : 4
    
     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660160 (11166.25 GiB 11989.67 GB)
      Used Dev Size : 7805773440 (3722.08 GiB 3996.56 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 9d8a6c1f:bb790cfd:5c04a459:9213646c
    
        Update Time : Sat Nov 21 15:26:51 2020
           Checksum : b9ad081d - correct
             Events : 533
    
             Layout : left-symmetric
         Chunk Size : 64K
    
       Device Role : Active device 0
       Array State : AAAA ('A' == active, '.' == missing)
    

    
    ~ # mdadm --examine /dev/sdb3
    /dev/sdb3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : ad82b6f7:6aacc5f3:c7a86a8b:25240df4
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Thu Jul 27 13:12:32 2017
         Raid Level : raid5
       Raid Devices : 4
    
     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660736 (11166.25 GiB 11989.67 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 8594f107:efdda91d:618b64fb:ba5b1ec8
    
        Update Time : Wed Nov 25 13:39:37 2020
           Checksum : c051785a - correct
             Events : 1707
    
             Layout : left-symmetric
         Chunk Size : 64K
    
       Device Role : spare
       Array State : ..AA ('A' == active, '.' == missing)
    

    
    ~ # mdadm --examine /dev/sdc3
    /dev/sdc3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : ad82b6f7:6aacc5f3:c7a86a8b:25240df4
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Thu Jul 27 13:12:32 2017
         Raid Level : raid5
       Raid Devices : 4
    
     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660736 (11166.25 GiB 11989.67 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 9232419a:22d8698c:e5ca4ca7:00915b5a
    
        Update Time : Wed Nov 25 13:39:37 2020
           Checksum : ff8684e2 - correct
             Events : 1707
    
             Layout : left-symmetric
         Chunk Size : 64K
    
       Device Role : Active device 2
       Array State : ..AA ('A' == active, '.' == missing)
    

    
    ~ # mdadm --examine /dev/sdd3
    /dev/sdd3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : ad82b6f7:6aacc5f3:c7a86a8b:25240df4
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Thu Jul 27 13:12:32 2017
         Raid Level : raid5
       Raid Devices : 4
    
     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660736 (11166.25 GiB 11989.67 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : b754d116:8d0e00cd:9507af1a:410b6d87
    
        Update Time : Wed Nov 25 13:39:37 2020
           Checksum : 5d209464 - correct
             Events : 1707
    
             Layout : left-symmetric
         Chunk Size : 64K
    
       Device Role : Active device 3
       Array State : ..AA ('A' == active, '.' == missing)
    

  • Mijzelf
    Mijzelf Posts: 2,598  Guru Member
    I have some bad news, I'm afraid. The command 'mdadm --examine /dev/sd[abcd]3' shows the headers of the 4 raid members of the data array, and it seems you exchanged the wrong disk.
    /dev/sda3 (the first disk, I think the left one) shows Sat Nov 21 15:26:51 as its latest update and AAAA as its array state, which means that member thinks all members were still active at that time. It hasn't been updated since, which means this disk was dropped from the array. The others were updated at Wed Nov 25 13:39:37, and their array state is ..AA, which means they 'know' sda3 was dropped and sdb3 is missing. Of course sdb3 is not really missing, but its role is 'spare', which means it was never actually added to the array. That wasn't possible, as only 2 active members were left after you exchanged sdb.
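    To compare the members at a glance you can also filter the same examine output, e.g.

    mdadm --examine /dev/sd[abcd]3 | grep -E '^/dev/|Update Time|Events|Device Role|Array State'

    which leaves just the device names, timestamps, event counters, roles and array states.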
    Yet it might be possible to restore the array. Do you still have the old disk?


  • bjorn
    bjorn Posts: 8  Freshman Member
    Hrm. I'm vaguely following, although I'm unsure how I swapped the wrong disk, going by the repair screen's illustrations.

    I do still have the failing disk, though.
  • Mijzelf
    Mijzelf Posts: 2,598  Guru Member
    although I'm unsure how I swapped the wrong disk, going by the repair screen's illustrations.
    I'm not claiming the firmware did it right; I have never seen the 'exchange disk' illustrations myself. It's easy to see which disk failed in a status file, and I would think translating that into instructions shouldn't be difficult. But who knows?

    Anyway, can you post the output of
    cat /proc/mdstat
    then exchange the old sdb back (and power-cycle), and post again
    cat /proc/mdstat
    mdadm --examine /dev/sd[abcd]3


  • bjorn
    bjorn Posts: 8  Freshman Member
    Fair enough. I appreciate your time. Here are the latest runs. Before the swap,
    
    ~ # cat /proc/mdstat
    
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
    md1 : active raid1 sda2[0] sdd2[3] sdc2[2] sdb2[4]
    1998784 blocks super 1.2 [4/4] [UUUU]
    
    md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[4]
    1997760 blocks super 1.2 [4/4] [UUUU]
    
    unused devices: <none>
    
    After swapping the old drive back,
    
    ~ # cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
    md1 : active raid1 sdb2[4] sda2[0] sdd2[3] sdc2[2]
          1998784 blocks super 1.2 [4/4] [UUUU]
    
    md0 : active raid1 sdb1[4] sda1[0] sdd1[3] sdc1[2]
          1997760 blocks super 1.2 [4/4] [UUUU]
    
    unused devices: <none>
    

    
    ~ # mdadm --examine /dev/sda3
    /dev/sda3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : ad82b6f7:6aacc5f3:c7a86a8b:25240df4
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Thu Jul 27 13:12:32 2017
         Raid Level : raid5
       Raid Devices : 4
    
     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660160 (11166.25 GiB 11989.67 GB)
      Used Dev Size : 7805773440 (3722.08 GiB 3996.56 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 9d8a6c1f:bb790cfd:5c04a459:9213646c
    
        Update Time : Sat Nov 21 15:26:51 2020
           Checksum : b9ad081d - correct
             Events : 533
    
             Layout : left-symmetric
         Chunk Size : 64K
    
       Device Role : Active device 0
       Array State : AAAA ('A' == active, '.' == missing)
    

    
    ~ # mdadm --examine /dev/sdb3
    /dev/sdb3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : ad82b6f7:6aacc5f3:c7a86a8b:25240df4
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Thu Jul 27 13:12:32 2017
         Raid Level : raid5
       Raid Devices : 4
    
     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660160 (11166.25 GiB 11989.67 GB)
      Used Dev Size : 7805773440 (3722.08 GiB 3996.56 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : active
        Device UUID : 0a5c35b6:3bd8a182:5030b8be:51bbe238
    
        Update Time : Thu Oct 22 21:21:39 2020
           Checksum : 77ae1fd - correct
             Events : 47
    
             Layout : left-symmetric
         Chunk Size : 64K
    
       Device Role : Active device 1
       Array State : AAAA ('A' == active, '.' == missing)
    

    ~ # mdadm --examine /dev/sdc3
    /dev/sdc3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : ad82b6f7:6aacc5f3:c7a86a8b:25240df4
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Thu Jul 27 13:12:32 2017
         Raid Level : raid5
       Raid Devices : 4
    
     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660736 (11166.25 GiB 11989.67 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 9232419a:22d8698c:e5ca4ca7:00915b5a
    
        Update Time : Wed Nov 25 13:39:37 2020
           Checksum : ff8684e2 - correct
             Events : 1707
    
             Layout : left-symmetric
         Chunk Size : 64K
    
       Device Role : Active device 2
       Array State : ..AA ('A' == active, '.' == missing)

    ~ # mdadm --examine /dev/sdd3
    /dev/sdd3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : ad82b6f7:6aacc5f3:c7a86a8b:25240df4
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Thu Jul 27 13:12:32 2017
         Raid Level : raid5
       Raid Devices : 4
    
     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660736 (11166.25 GiB 11989.67 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : b754d116:8d0e00cd:9507af1a:410b6d87
    
        Update Time : Wed Nov 25 13:39:37 2020
           Checksum : 5d209464 - correct
             Events : 1707
    
             Layout : left-symmetric
         Chunk Size : 64K
    
       Device Role : Active device 3
       Array State : ..AA ('A' == active, '.' == missing)
  • Mijzelf
    Mijzelf Posts: 2,598  Guru Member
    edited December 2020
    You didn't swap the wrong disk. The old disk was dropped from the array at Thu Oct 22 21:21:39.
    So what happened, I think, is that disk sdb was dropped at Oct 22. I don't know why you weren't notified before ~Nov 12. So at Oct 22 the state of sd[acd]3 was changed to A.AA. You exchanged disk sdb, and the state changed for all disks to AAAA while sdb3 was rebuilding. I think that should have taken around 24 hours (4TB @ ~50MB/sec).
    Before the rebuild was finished, sda was dropped, which kept state AAAA (as it was not written to after being dropped), while the other disks were updated to ..AA. Partition sdb3 lost its active state because it was not fully rebuilt, and the array was down, so further rebuilding was no longer possible.
    According to your story you started rebuilding ~Nov 12, and it should have taken around 24 hours, so the array went down ~Nov 13. I don't know why sd[bcd]3 have an update stamp of Nov 25; AFAIK there should have been no updates after the array went down.

    sd[acd]3 should contain your data, except for a (maybe small) error on sda3, which caused it to be dropped. The old sdb3 is probably not usable anymore, as its content is around 3 weeks older than the rest of the array, so an array built with this disk will almost certainly have a corrupt filesystem; but if everything else fails, it can be tried.
    The new sdb3 has an unknown status. It is possible that it is mainly empty; it is also possible that it's almost completely built.

    In most cases a disk is dropped because it has an unreadable sector. An unwritable sector is less obvious, because the disk will transparently swap in a spare sector. When we rebuild the array from the current sd[acd]3, it is possible that you can access all your data, as the error can be on an unused part of the disk, not in use by any file. But if you add a 4th disk to be rebuilt, the whole surface is read, as the raid array sits below the filesystem and doesn't know about files. So in that case sda might be dropped again on the same unreadable sector.
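    If you want to check that first: the kernel logs read errors when it kicks out a disk (though dmesg only goes back to the last boot, so the October event won't be in there anymore), and if smartctl happens to be present on the box (it may not be in the stock firmware) the pending/reallocated sector counters tell the same story. For example,

    dmesg | grep -iE 'ata|i/o error|sector'
    smartctl -a /dev/sda | grep -iE 'pending|reallocat'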
    The options are:
    1) rebuild the current sd[acd]3 array, and copy away the data, in the hope that the unreadable sector is not in use. The odds are better if there is a lot of free space.
    2) rebuild the current sd[acd]3 array, add an sdb, and hope for the best.
    3) clone the current sda to a new disk to get rid of that sector, then rebuild sd[acd]3 and add an sdb (a cloning sketch follows below this list).
    4) combine 1 and 3.
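    For option 3 the cloning would be done on a separate Linux PC, not on the NAS. A minimal sketch, assuming GNU ddrescue is installed there and the old sda and the blank target disk show up as /dev/sdX and /dev/sdY (hypothetical names; double-check with fdisk -l before writing anything):

    ddrescue -f /dev/sdX /dev/sdY sda-clone.map

    The map file lets ddrescue resume the copy and retry only the unread spots if you run it a second time.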

    I would go for 1, assuming there is not too much data. The reason is that if sdb and sda were both dropped due to an unreadable sector, what are the odds that sdc and sdd, which I suppose are from the same batch, also have unreadable sectors? So save your data to an independent disk first. You need a backup anyway.

    Having said that, the command to rebuild the array from sd[acd]3 is

    mdadm --create --assume-clean --level=5 --raid-devices=4 --metadata=1.2 --chunk=64K --layout=left-symmetric --bitmap=none /dev/md2 /dev/sda3 missing /dev/sdc3 /dev/sdd3

    That is a single line. As you can see I skipped /dev/sdb3 in the command and used 'missing', so the array will be built degraded. After rebuilding you can re-enable your shares.
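    Once the array is created, a sanity check before writing anything could look like this. This is only a sketch: /mnt/recover and /mnt/usb are hypothetical mount points, and it assumes the data volume is a plain filesystem directly on /dev/md2, with no disk group/LVM layer in between (if there is one, the mount step doesn't apply as-is):

    cat /proc/mdstat                   # md2 should show up as active raid5, [4/3] [U_UU]
    mdadm --detail /dev/md2            # verify level, chunk size and member order
    mkdir -p /mnt/recover
    mount -o ro /dev/md2 /mnt/recover  # read-only, so nothing is written yet
    ls /mnt/recover                    # the old shares should be visible here
    cp -a /mnt/recover/. /mnt/usb/     # option 1: copy everything to an independent disk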
  • bjorn
    bjorn Posts: 8  Freshman Member
    This makes sense. The NAS is in a second bedroom that we've kept shut to conserve heat. I may have just not heard the beeping for a while. (And there are relatively few writes to it.)

    I tried to execute the mdadm command, but got,

    mdadm: '--bitmap none' only support for --grow



  • Mijzelf
    Mijzelf Posts: 2,598  Guru Member
    You can omit the --bitmap=none. Nice that each version of mdadm is slightly different.
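    So, the same command minus that option (still a single line),

    mdadm --create --assume-clean --level=5 --raid-devices=4 --metadata=1.2 --chunk=64K --layout=left-symmetric /dev/md2 /dev/sda3 missing /dev/sdc3 /dev/sdd3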
  • bjorn
    bjorn Posts: 8  Freshman Member
    I ran the command and it output,


    BusyBox v1.19.4 (2020-03-18 07:23:22 CST) built-in shell (ash)
    Enter 'help' for a list of built-in commands.
    
    ~ # mdadm --create --assume-clean --level=5 --raid-devices=4 --metadata=1.2 --chunk=64K --layout=left-symmetric /dev/md2 /dev/sda3 missing /dev/sdc3 /dev/sdd3
    mdadm: /dev/sda3 appears to be part of a raid array:
        level=raid5 devices=4 ctime=Thu Jul 27 13:12:32 2017
    mdadm: /dev/sdc3 appears to be part of a raid array:
        level=raid5 devices=4 ctime=Thu Jul 27 13:12:32 2017
    mdadm: /dev/sdd3 appears to be part of a raid array:
        level=raid5 devices=4 ctime=Thu Jul 27 13:12:32 2017
    Continue creating array? y
    mdadm: array /dev/md2 started.
    
    Logging in now I see,



    I assume this puts me in the bad-outcome scenario?
