Hey everyone,

I recently built my first NAS. It was bought used, with SAS hardware. I’ve finally gotten past all the roadblocks and problems that were in my way (I basically bricked a whole SAS drive, and a hero of a Lemmy user helped me fix it).

Now, after filling the 15 TB RAIDZ2 pool with around 100 GB of data, one of the drives has started waving its white flag and wants to die on me.

I am a complete beginner with no experience with these things.

Is my drive dying, and should it be replaced? Or can it be fixed?

This is the output for the 507 errors that TrueNAS received from it, after which it labelled the vdev as degraded and the drive as faulted:

Output of zpool status and sudo smartctl -a /dev/sdd

As a beginner, it looks to me like this drive is cooked. Please let me know if it needs replacing so I can order a new one and replace it right away.

Thank you sooo much!

Edit: SAS not SATA drives

  • non_burglar@lemmy.world
    3 months ago

    ZFS has very reasonable thresholds for how much disk failure is enough to kick a disk from the pool. I’ve seen pool members have a batch of bad blocks and ZFS still chugged along for a few years, just avoiding those blocks, before the disk finally failed.

    Heed TrueNAS here; replace the disk if you can.

  • sakphul@discuss.tchncs.de
    3 months ago

    I had similar problems with a single drive in a new TrueNAS setup. The drive would come up healthy after most reboots, but after some reboots it was unhealthy. For me, the S.M.A.R.T. data did not indicate any errors. I reboot the machine often because it is a backup system that only runs during backups.

    Swapping drives (with a known good drive) did not resolve it. The error was always at the same drive bay.

    For me, the problem was the Y-split SATA power cable I used. After replacing it, the system has been working without a problem ever since.

  • CubitOom@infosec.pub
    3 months ago

    42441.67 powered-on hours is really young for drive death; I normally don’t start seeing issues until around 50k hours.

    Is it making any audible sounds while running?

    I haven’t read the SMART data for a Seagate in a while, but the error numbers look off and I would like to see more details. Having errors in itself doesn’t mean much. Normally I look for Reallocated Sector Count, Seek Error Rate, and Uncorrectable Sector Count. But here it’s not showing details like the types of errors.

    Maybe try sudo smartctl --all /dev/sdd or sudo smartctl -x /dev/sdd? I’m not sure the -a you used is showing everything.

  • frongt@lemmy.zip
    3 months ago

    Not necessarily. I would shut the system down completely and check the drive connectors. If it’s on a backplane, try swapping slots, or if it’s on a breakout cable, swap its connector with another drive’s (and clear the zpool errors). If the errors start happening on the other drive, it’s a cable problem. If they continue on the same drive, it’s a drive problem. If they stop happening, it was a bad connection and it ought to be fine now.

    That’s kind of a short output from smartctl -a, though. Shouldn’t it include the attribute data? I’d run a SMART test (after doing the swap above) and see what it says.

    On a raidz2, I wouldn’t be too concerned about losing a drive, but you should always be prepared to order a replacement if you value your data.
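
The swap test described above boils down to a small decision table. Here is a minimal Python sketch of that reasoning; the function name `diagnose_after_swap` and its two boolean inputs are hypothetical, made up for illustration, and the "inconclusive" branch is an added assumption for a case the comment does not cover.

```python
def diagnose_after_swap(errors_on_same_drive: bool, errors_on_other_drive: bool) -> str:
    """Interpret the result of swapping the suspect drive's slot or cable.

    errors_on_same_drive:  new errors are still logged against the suspect
                           drive after it was moved.
    errors_on_other_drive: new errors are now logged against the drive that
                           took over the suspect drive's old slot or cable.
    """
    if errors_on_same_drive and errors_on_other_drive:
        # Not covered in the comment above; assumed here to need more testing.
        return "inconclusive, test further"
    if errors_on_other_drive:
        # The errors stayed with the slot/cable, not with the drive.
        return "cable or slot problem"
    if errors_on_same_drive:
        # The errors followed the drive to its new position.
        return "drive problem"
    # No new errors anywhere: reseating probably fixed a bad connection.
    return "bad connection, likely fine now"


if __name__ == "__main__":
    print(diagnose_after_swap(errors_on_same_drive=True, errors_on_other_drive=False))
```

The inputs would come from watching zpool status for new errors after the swap and after clearing the old error counts.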

    • s38b35M5@lemmy.world
      3 months ago

      I second this. SATA cables are cheaply made and can present issues that seem to indicate drive failure.

      • BigDaddySlim@lemmy.world
        3 months ago

        Had this issue once. Two drives kept failing to initialize during boot; rebooting a few times got them to register, but they showed drive errors. I thought either the drives or my SAS card was dying. Fully reseating the connectors fixed it, and I haven’t had an issue since.

    • ragebutt@lemmy.dbzer0.com
      3 months ago

      I would bet money that drive is done. A cable problem would show up as UDMA CRC errors, not media failure. The drive made it 11 years (even if power-on time is only about half of that).
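
As a quick sanity check on the "about half" estimate, the 42441.67 powered-on hours quoted earlier in the thread can be converted to years. This is a rough sketch only: the 365.25-day year and the helper name `powered_on_years` are assumptions for illustration, and the 11-year age is taken from the comment above.

```python
HOURS_PER_YEAR = 24 * 365.25  # assumes an average year of 365.25 days


def powered_on_years(hours: float) -> float:
    """Convert powered-on hours into years of continuous runtime."""
    return hours / HOURS_PER_YEAR


years = powered_on_years(42441.67)  # figure quoted earlier in the thread
share = years / 11                  # drive age in years, per the comment above

print(f"{years:.2f} years powered on, {share:.0%} of 11 years")
# prints "4.84 years powered on, 44% of 11 years"
```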