How to test used/renewed hard drives
My Testing methodology (u/echogecko795)
This is something I developed to stress both new and used drives so that if there are any issues they will appear.
Testing can take anywhere from 4 to 7 days depending on hardware. I have a dedicated testing server set up for this.
I use a server with ECC RAM installed, but if your RAM has been tested with MemTest86+ then you are probably fine.
1) SMART Test, check stats
smartctl -i /dev/sdxx
"-i, --info Prints the device model number, serial number, firmware version, and ATA Standard version/revision information. Says if the device supports SMART, and if so, whether SMART support is currently enabled or disabled. If the device supports Logical Block Address mode (LBA mode) print current user drive capacity in bytes. (If the drive has a user protected area reserved, or is "clipped", this may be smaller than the potential maximum drive capacity.) Indicates if the drive is in the smartmontools database (see '-v' options below). If so, the drive model family may also be printed. If '-n' (see below) is specified, the power mode of the drive is printed."
smartctl -A /dev/sdxx
Note: -A prints only the SMART attribute table. The man page excerpt that follows is the entry for -a (--all), which includes the -A output along with the rest of the SMART information.
"-a, --all Prints all SMART information about the disk, or TapeAlert information about the tape drive or changer. For ATA devices this is equivalent to '-H -i -c -A -l error -l selftest -l selective' and for SCSI, this is equivalent to '-H -i -A -l error -l selftest'. Note that for ATA disks this does not enable the non-SMART options and the SMART options which require support for 48-bit ATA commands."
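smartctl -t long /dev/sdxx
Note: -t long starts the drive's extended self-test, which runs inside the drive in the background. Once it finishes, the result can be read with smartctl -l selftest /dev/sdxx. The man page excerpt that follows documents -T (--tolerance), a separate option that controls how smartctl reacts to failed ATA and SMART commands.
"-T TYPE, --tolerance=TYPE [ATA only] Specifies how tolerant smartctl should be of ATA and SMART command failures. The behavior of smartctl depends upon whether the command is "optional" or "mandatory". Here "mandatory" means "required by the ATA/ATAPI-5 Specification if the device implements the SMART command set" and "optional" means "not required by the ATA/ATAPI-5 Specification even if the device implements the SMART command set." The "mandatory" ATA and SMART commands are: (1) ATA IDENTIFY DEVICE, (2) SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE, (3) SMART ENABLE/DISABLE, and (4) SMART RETURN STATUS.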
The valid arguments to this option are:
normal - exit on failure of any mandatory SMART command, and ignore all failures of optional SMART commands. This is the default. Note that on some devices, issuing unimplemented optional SMART commands doesn't cause an error. This can result in misleading smartctl messages such as "Feature X not implemented", followed shortly by "Feature X: enabled". In most such cases, contrary to the final message, Feature X is not enabled.
conservative - exit on failure of any optional SMART command.
permissive - ignore failure(s) of mandatory SMART commands. This option may be given more than once. Each additional use of this option will cause one more additional failure to be ignored. Note that the use of this option can lead to messages like "Feature X not implemented", followed shortly by "Error: unable to enable Feature X". In a few such cases, contrary to the final message, Feature X is enabled.
verypermissive - equivalent to giving a large number of '-T permissive' options: ignore failures of any number of mandatory SMART commands. Please see the note above."
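When checking the stats from step 1, a small filter can make the worrying counters stand out. The following is only a sketch, assuming an ATA drive whose smartctl -A table uses the usual smartmontools attribute names; the function name smart_warnings and the choice of attributes are my own assumptions, and SAS drives print a different layout that this will not parse.

```shell
#!/bin/sh
# Sketch: read `smartctl -A` output on stdin and print every watched
# attribute whose raw value (10th column) is greater than zero.
# The watched names are an assumed shortlist, not an official one.
smart_warnings() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Reported_Uncorrect)$/ && ($10 + 0) > 0 {
            print $2 "=" $10
        }
    '
}

# Usage (sdxx is a placeholder, as in the commands above):
#   sudo smartctl -A /dev/sdxx | smart_warnings
```

No output means none of the watched counters is above zero. That alone does not prove the drive is healthy, which is why the later steps still matter.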
2) BadBlocks - This is a complete write and read test. It will destroy all data on the drive.
badblocks -b 4096 -c 65535 -wsv /dev/sdxx > sdxx.log
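Because the write test is destructive, it can be worth checking that the target is not a mounted disk before starting. The following is a rough sketch, assuming sd-style device names with numbered partitions; the function name device_in_use is hypothetical. It only looks at the mount table, so it will not notice a disk that is a member of a ZFS pool, an LVM volume group, an md array, or swap.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the whole disk, or any numbered partition
# of it, appears as a mount source in a /proc/mounts style file.
device_in_use() {
    dev=$1                       # e.g. /dev/sdb
    mounts=${2:-/proc/mounts}    # override only for testing
    awk -v dev="$dev" '
        $1 == dev { found = 1 }
        index($1, dev) == 1 && substr($1, length(dev) + 1) ~ /^[0-9]+$/ { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$mounts"
}

# Usage (sdxx is a placeholder, as in the commands above):
#   if device_in_use /dev/sdxx; then
#       echo "refusing: /dev/sdxx is mounted" >&2
#   else
#       badblocks -b 4096 -c 65535 -wsv /dev/sdxx > sdxx.log
#   fi
```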
3) Real-world surface testing: format to ZFS
Yes, you want compression on. I have found checksum errors that a test with compression off would have missed. I noticed this completely by accident: I had a drive that produced checksum errors whenever it was in a pool, so I pulled it and ran my test without compression. It passed just fine. When I put it back into the pool, which had compression on, the errors appeared again. I pulled the drive once more, reran my test with compression on, and got checksum errors. I have asked around and no one knows why this happens, but it does. It may have been a bug in early versions of ZFS on Linux (ZoL) that is no longer present.
zpool create -f -o ashift=12 -O logbias=throughput -O compress=lz4 -O dedup=off -O atime=off -O xattr=sa TESTR001 /dev/sdxx
zpool export TESTR001
sudo zpool import -d /dev/disk/by-id TESTR001
sudo chmod -R ugo+rw /TESTR001
4) Fill test using F3, then 5) ZFS scrub to check for any read, write, or checksum errors.
sudo f3write /TESTR001 && sudo f3read /TESTR001 && sudo zpool scrub TESTR001
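The scrub runs in the background after the command returns, and its result shows up in zpool status once it has finished. As a sketch of how the error counters could be pulled out of that output, the hypothetical helper below (zpool_error_rows is my own name) prints any row of the NAME/STATE/READ/WRITE/CKSUM table with a non-zero counter. It assumes the usual zpool status layout and has not been checked against every OpenZFS version.

```shell
#!/bin/sh
# Sketch: read `zpool status` output on stdin and print each pool or
# device row whose READ, WRITE or CKSUM counter is not "0".
zpool_error_rows() {
    awk '
        $1 == "NAME" && $3 == "READ" { in_table = 1; next }   # table header
        in_table && NF == 0          { in_table = 0 }         # blank line ends table
        in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1 ": read=" $3 " write=" $4 " cksum=" $5
        }
    '
}

# Usage, once the scrub has completed:
#   sudo zpool status TESTR001 | zpool_error_rows
```

An empty result means all three counters are zero for every row. The "errors:" line at the bottom of zpool status is still worth reading by eye.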
If everything passes, the drive goes into my good pile. If something fails, I contact the seller to get either a partial refund for the drive or a return label to send it back. I record the WWN and serial number of each drive, along with a copy of any test notes, for example:
8TB wwn-0x5000cca03bac1768 - Failed, 26 read errors, non-recoverable, drive is unsafe to use.
8TB wwn-0x5000cca03bd38ca8 - Failed, checksum errors, possibly recoverable, drive use is not recommended.
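To look up the wwn-0x... name for a drive when writing notes like these, one option is to read the symlinks that udev normally creates under /dev/disk/by-id. This is a sketch under that assumption; the function name wwn_for_device is hypothetical, and drives that do not report a WWN will simply produce no output.

```shell
#!/bin/sh
# Sketch: print the wwn-* symlink name(s) in a by-id directory that
# point at the given kernel device name (whole disk, not partitions).
wwn_for_device() {
    dev_name=$1                          # e.g. sdb
    by_id_dir=${2:-/dev/disk/by-id}      # override only for testing
    for link in "$by_id_dir"/wwn-*; do
        [ -L "$link" ] || continue       # skip if the glob matched nothing
        target=$(readlink "$link")
        if [ "$(basename "$target")" = "$dev_name" ]; then
            basename "$link"
        fi
    done
}

# Usage (sdxx is a placeholder, as in the commands above):
#   wwn_for_device sdxx
```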
References:
https://www.reddit.com/r/homelab/comments/17oobde/what_do_you_use_to_test_your_drives_that_you_buy/
https://linux.die.net/man/8/smartctl
...