Check SSD/HDD health with S.M.A.R.T. and back up data before drive failure

S.M.A.R.T. is a built-in self-monitoring system in SSDs/HDDs that can warn you before a drive fails, but only if you read the right attributes and react early. Check key indicators (reallocated sectors, pending sectors, SSD wear, CRC errors), set alerts, and start backups immediately when trends worsen, then replace the drive before it becomes unreadable.

Critical S.M.A.R.T. indicators that predict drive failure

  • Reallocated Sector Count (HDD/SSD): any increase is a red flag; repeated growth usually means the media is degrading.
  • Current Pending Sector Count (HDD): unstable sectors waiting to be remapped; treat as urgent if non-zero or increasing.
  • Uncorrectable Sector Count (HDD/SSD): read/write errors the drive could not fix; prioritize immediate backups.
  • UDMA CRC Error Count (mostly SATA setups): often indicates cable/port issues; if it rises, fix the link before blaming the drive.
  • SSD Wear/Health metrics: look for Percentage Used / Media Wearout Indicator / Remaining Life decreasing faster than normal.
  • SMART overall health "PASSED" is not enough: many drives fail while still reporting PASSED; rely on attribute trends.

How S.M.A.R.T. works: core concepts and common attributes

S.M.A.R.T. stores counters and error logs inside the drive firmware. It's useful for intermediate users who can read attributes, compare changes over time, and act on decision triggers (backup now, replace soon). It's less useful when the drive disappears from BIOS/UEFI, clicks/beeps repeatedly, or the OS freezes on every access; those cases need "get data off" priority, not diagnostics.

  • Normalized value vs threshold: vendor-scaled "health score"; crossing threshold is late-stage.
  • Raw value: the underlying counter; often the best early warning, but interpretation differs by vendor.
  • Common HDD attributes: 05 Reallocated Sector Count, C5 Current Pending Sector, C6 Uncorrectable Sector Count, 07 Seek Error Rate (vendor-specific), 0A Spin Retry Count.
  • Common SSD attributes: Percentage Used, Media Wearout Indicator, Total Host Writes/Reads, Available Spare, Unsafe Shutdowns.
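To make the normalized-versus-raw distinction concrete, here is a minimal sketch that parses the ATA attribute table printed by smartctl -A. It assumes the usual ten-column layout (ID#, name, flag, value, worst, threshold, type, updated, when-failed, raw); the SAMPLE text and the names SmartAttribute and parse_attributes are illustrative, not part of smartmontools. Note that smartctl prints IDs in decimal, so hex 05, C5 and C6 appear as 5, 197 and 198.

```python
import re
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int      # normalized, vendor-scaled score
    worst: int      # lowest normalized value recorded so far
    threshold: int  # normalized value at which the drive reports failure
    raw: int        # underlying counter (leading integer only)


# Illustrative sample, not output from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""


def parse_attributes(text):
    """Return {attribute id: SmartAttribute} for every table row in text."""
    attributes = {}
    for line in text.splitlines():
        parts = line.split(None, 9)  # the raw column may contain spaces
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header, blank line, or unrelated output
        raw_match = re.match(r"\d+", parts[9])
        if raw_match is None:
            continue  # vendor-specific raw format this sketch cannot read
        attributes[int(parts[0])] = SmartAttribute(
            attr_id=int(parts[0]),
            name=parts[1],
            value=int(parts[3]),
            worst=int(parts[4]),
            threshold=int(parts[5]),
            raw=int(raw_match.group()),
        )
    return attributes
```

In this sample, attribute 197 still has a normalized value of 100, far above its threshold of 0, while its raw counter already shows 8 pending sectors. That is why the raw value is often the earlier warning.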

Do not "repair" a dying drive with heavy scans as your first step. Full-surface tests and repeated chkdsk/fsck can accelerate failure when the drive is already unstable.

Practical tools for reading S.M.A.R.T. data on Windows, macOS and Linux

You'll need: (1) admin/root privileges, (2) direct access to the physical drive (USB enclosures sometimes hide S.M.A.R.T.), and (3) a way to save reports over time.

Tool comparison (choose by OS and connection type)

Tool | OS | Best for | Notes / limitations | Quick start
smartmontools (smartctl, smartd) | Windows / macOS / Linux | Accurate attribute readout, automation, logs | Some USB-SATA bridges need specific device flags; NVMe uses a different attribute set | smartctl -a /dev/sdX or smartctl -a /dev/nvme0
CrystalDiskInfo | Windows | Fast GUI checks; easy trend spotting | Less detail than smartctl; interpretation depends on vendor mappings | Open app → select drive → view attributes
Hard Disk Sentinel | Windows | Long-term monitoring, alerts, health scoring | Paid features; health percent is tool-specific | Install → enable background monitoring
DriveDx | macOS | Mac-friendly SMART dashboards and warnings | Some Macs/bridges restrict SMART access; may require permissions | Install → grant required access → run scan
GNOME Disks | Linux | Quick GUI SMART view + self-tests | GUI is convenient, but you still need to interpret key attributes | Open Disks → Drive → SMART Data

Concrete commands you can copy

  • Linux (SATA HDD/SSD): sudo smartctl -a /dev/sdX
  • Linux (NVMe SSD): sudo smartctl -a /dev/nvme0 (or use nvme smart-log /dev/nvme0 if you prefer nvme-cli)
  • macOS: install smartmontools (Homebrew) → sudo smartctl -a /dev/diskX
  • Windows (smartctl): run as Administrator → smartctl -a /dev/sdX (device numbering differs; list first with smartctl --scan)
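If you would rather script these commands than type them, the sketch below builds the smartctl argument list and reads the result as JSON. It assumes smartmontools 7.0 or newer (the release that added --json) and that smartctl is on your PATH. The function names are hypothetical, and the "sat" device type is only an example of the flag some USB-SATA bridges need.

```python
import json
import subprocess


def smartctl_command(device, device_type=None, as_json=True):
    """Build the argument list for a full SMART readout of one device."""
    command = ["smartctl", "-a"]
    if as_json:
        command.append("--json")
    if device_type:
        # e.g. "sat" for many USB-SATA bridges; the right value is bridge-specific
        command += ["-d", device_type]
    command.append(device)
    return command


def read_smart(device, device_type=None):
    """Run smartctl (needs admin/root) and return the parsed JSON report."""
    # smartctl's exit status is a bit mask that can be non-zero even when
    # the report was produced, so it is not treated as a hard failure here.
    result = subprocess.run(
        smartctl_command(device, device_type),
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(result.stdout)
```

For example, smartctl_command("/dev/nvme0") gives the same invocation as the NVMe line above with --json added; call read_smart only from an elevated shell.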

If you're looking for an SSD health-check program or an HDD SMART health-check program, pick one GUI (CrystalDiskInfo/DriveDx) plus smartctl for a "second opinion" and raw logs.

Interpreting raw values vs threshold scores: what signals real risk

  • Risks and limitations (read before you start):
    • SMART "PASSED" can still coincide with imminent failure; don't use it as your only decision point.
    • Raw values are vendor-specific; compare your drive against its own history (trend), not against random screenshots online.
    • USB adapters may not pass through SMART reliably; if results look empty/odd, test the drive on SATA/NVMe directly.
    • Running long tests on a failing drive can worsen instability; back up first if you already see errors increasing.
  1. Capture a baseline report (now) and save it

    Export or copy the SMART report to a dated file so you can compare changes later. On smartctl, redirect output to text.

    • Example: smartctl -a /dev/sdX > smart_2026-05-25.txt
  2. Identify your drive type (HDD vs SATA SSD vs NVMe)

    Attribute names and failure modes differ. NVMe uses "Percentage Used" and error log counters; HDDs rely heavily on sectors and reallocation.

  3. Prioritize "data integrity" attributes over temperature or power-on hours

    Temperature spikes matter, but sector/pending/uncorrectable indicators and SSD wear are better predictors of data loss risk.

    • HDD focus: 05, C5, C6 (and sometimes 0B Recalibration Retries).
    • SSD focus: Percentage Used / Remaining Life, Available Spare, Media and Data Integrity Errors (NVMe).
  4. Use simple decision triggers you can act on

    Trigger actions based on presence and trend, not on a single "health percent." If any of the counters below is non-zero and rising, treat it as urgent.

    • Immediate backup: C5 (Pending) > 0, C6 (Uncorrectable) > 0, or any reallocated sectors that keep increasing.
    • Fix the connection first: CRC errors rising (often cable/port); re-check after replacing cable/port.
    • Plan replacement: SSD wear indicators showing low remaining life or rapidly increasing Percentage Used.
  5. Run the least invasive self-test only after backups start

    If the system is stable enough, run a short self-test first. Use long tests only when you're not risking your last readable copy.

    • smartctl short test: smartctl -t short /dev/sdX then smartctl -l selftest /dev/sdX
  6. Decide: keep monitoring, migrate, or replace

    If critical attributes are stable over multiple checks, keep monitoring. If they worsen, migrate to a new drive and retire the old one. When comparing replacements (including SSD and HDD purchase-price decisions), prioritize warranty, workload fit, and reliability history over small price gaps.
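The decision triggers in step 4 can be written down as a small rule function. This is a sketch under stated assumptions: the dictionary keys (reallocated, pending, uncorrectable, crc, percentage_used) are names chosen for illustration, and the wear_limit default of 80 is an arbitrary placeholder for "low remaining life", not a vendor figure.

```python
def decide(previous, current, wear_limit=80):
    """Map two raw-counter snapshots of the same drive to a list of actions.

    previous and current are dicts of raw counters; missing keys count as 0.
    wear_limit is an assumed cut-off for NVMe "Percentage Used".
    """
    actions = []

    reallocated_growing = current.get("reallocated", 0) > previous.get("reallocated", 0)
    if (
        current.get("pending", 0) > 0
        or current.get("uncorrectable", 0) > 0
        or reallocated_growing
    ):
        actions.append("backup_now")

    # Rising CRC errors usually point at the cable or port, not the media.
    if current.get("crc", 0) > previous.get("crc", 0):
        actions.append("check_cable")

    if current.get("percentage_used", 0) >= wear_limit:
        actions.append("plan_replacement")

    return actions or ["keep_monitoring"]
```

A drive with 8 reallocated sectors that stays at 8 across checks yields keep_monitoring, while the same drive going from 8 to 12 yields backup_now. This matches the rule that the trend, not the absolute number, drives the decision.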

Continuous monitoring strategies and automated alerting

  • Log SMART snapshots on a schedule (daily/weekly) and compare against your last baseline, not just "current status."
  • Enable background monitoring with notifications (smartd on Linux/macOS; a resident monitor on Windows).
  • Alert on trend changes: any increase in reallocated/pending/uncorrectable, not only threshold failures.
  • For SATA systems, alert if CRC errors increase after a reboot (often a cable/port problem to fix immediately).
  • For NVMe, alert on Percentage Used jumps and increasing media/data integrity error counters.
  • Keep at least one offline copy of SMART logs (a failing system drive may corrupt local logs).
  • Re-check SMART after power events or crashes (unsafe shutdowns can correlate with filesystem corruption and SSD internal stress).
  • Validate that SMART is actually visible (if using USB, confirm pass-through or test directly attached).
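A scheduled snapshot log can be as simple as one dated JSON file per check plus a comparison against the baseline. The sketch below assumes you have already reduced a SMART report to a small dict of raw counters; the key names in WATCHED and the smart_YYYY-MM-DD.json naming are illustrative choices, not a smartd feature.

```python
import json
from datetime import date
from pathlib import Path

# Counters worth alerting on; the key names are assumptions for this sketch.
WATCHED = ("reallocated", "pending", "uncorrectable", "crc", "percentage_used")


def save_snapshot(log_dir, counters, day=None):
    """Write counters to log_dir/smart_YYYY-MM-DD.json and return the path."""
    day = day or date.today()
    path = Path(log_dir) / f"smart_{day.isoformat()}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(counters, indent=2, sort_keys=True))
    return path


def load_snapshot(path):
    """Read a snapshot written by save_snapshot."""
    return json.loads(Path(path).read_text())


def rising_counters(baseline, latest):
    """Return {counter: increase} for watched counters that went up."""
    return {
        key: latest[key] - baseline[key]
        for key in WATCHED
        if key in baseline and key in latest and latest[key] > baseline[key]
    }
```

Point log_dir at a different physical device or a network share so the history survives if the monitored drive fails, and send a notification whenever rising_counters returns a non-empty dict.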

Backup policies before imminent failure: priorities and timelines

When SMART indicates elevated risk, your policy should shift from "nice-to-have backups" to "get a restorable copy first, then optimize." If you're choosing an automatic backup program, confirm it supports versioning and verification, not only syncing.

Pre-failure backup checklist table (practical order of operations)

Priority | What to back up | Why it matters | Safe method | Stop and reassess if...
1 | Irreplaceable user data (documents, photos, project folders) | Highest value, smallest size | Copy to external SSD/HDD or cloud; keep folder structure | Copy causes freezes/clicking or repeated read errors
2 | Browser profiles, password vault exports, key app data | Hard to reconstruct, easy to miss | Use app export + file copy; store separately | You can't locate profile paths reliably; pause and document first
3 | System image (optional, if stable) | Fast recovery if the drive is still readable | Imaging tool with verification; avoid repeated retries | Imager reports many bad blocks or the system becomes unstable
4 | Full disk clone (only if needed) | Last resort for migration/forensics | Prefer read-mostly, skip-bad-block strategy | Clone is thrashing the disk for hours with no progress
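The first rows of the checklist (copy the most valuable folders first, stop when read errors repeat) can be sketched as a file-level copy loop. Everything here is an assumption for illustration: the function name, the max_errors default of 3, and the choice to try each file exactly once rather than retry.

```python
import shutil
from pathlib import Path


def copy_in_priority(folders, destination, max_errors=3):
    """Copy folders in the order given, keeping each folder's structure.

    Each file is attempted once. After max_errors failed files the copy
    stops so you can reassess instead of hammering a weak drive.
    Returns (copied, failed) as lists of source paths.
    """
    copied, failed = [], []
    for folder in folders:
        folder = Path(folder)
        for source in sorted(p for p in folder.rglob("*") if p.is_file()):
            target = Path(destination) / folder.name / source.relative_to(folder)
            try:
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(source, target)  # copies data and timestamps
                copied.append(source)
            except OSError:
                failed.append(source)
                if len(failed) >= max_errors:
                    return copied, failed  # stop and reassess
    return copied, failed
```

Pass the folders in priority order, for example documents before media libraries, and make sure destination is on a different physical device.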

Common mistakes that make data loss more likely

  1. Running heavy "repair" scans before copying data: surface scans and repeated filesystem repairs can turn readable sectors into unreadable ones on a weak drive.
  2. Trusting a single health score: "Good 90%" doesn't override rising reallocated/pending/uncorrectable counters.
  3. Syncing instead of backing up: a sync tool can propagate deletions/encryption; use versioned backups.
  4. Backing up to the same physical device: different partitions on the same failing drive don't count as a backup.
  5. Writing large amounts of new data to the failing disk: avoid moving big libraries around; read and copy out first.
  6. Ignoring cabling when CRC errors rise: replacing a SATA cable is cheap and can stop corruption, but it doesn't fix existing bad sectors.
  7. No verification: at least spot-check restored files; ideally use backup verification or checksums for critical archives.
  8. Delaying replacement once attributes trend worse: if indicators keep rising, treat the drive as non-trustworthy even if it still boots.
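For the verification point above, one simple approach is to compare SHA-256 checksums of the source files and their copies. The sketch below is a minimal version with hypothetical function names. Keep in mind that it re-reads every source file, so on a drive that is already unstable you may prefer to verify only the most critical folders.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(source_dir, backup_dir):
    """List relative paths that are missing from the backup or differ in content."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(source_dir)
        copy = backup_dir / relative
        if not copy.is_file() or sha256_of(source) != sha256_of(copy):
            problems.append(relative.as_posix())
    return problems
```

An empty list means every source file has a byte-identical copy in the backup; anything listed should be copied again before you retire the old drive.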

Safe recovery and validation: restoring data and testing drive health

  • Migrate to a new drive and restore from backups: best when SMART trends worsen but the system is still readable; retire the old drive after successful restore and verification.
  • Use a file-level recovery approach: appropriate when the OS is unstable but the filesystem is partially readable; prioritize user folders before attempting full imaging.
  • Use a controlled disk image workflow: useful when you need to preserve what's readable without stressing the drive repeatedly; make one image, then recover from the image.
  • Professional data recovery service: appropriate if the drive is not detected, makes unusual mechanical sounds, or contains business-critical data you cannot risk with DIY attempts.
  1. Validate restored data: open representative files (documents, photos, archives) and confirm application data (mail, projects) loads correctly.
  2. Re-check SMART on the replacement drive: establish a clean baseline report immediately after setup.
  3. Set monitoring and backups on day one: ensure alerts and your automated backup schedule run successfully at least once.

Quick answers on monitoring and backing up before a drive dies

Is SMART "PASSED" a guarantee my SSD/HDD is fine?

No. "PASSED" often stays true until late-stage failure. Use attribute trends (reallocated/pending/uncorrectable, wear indicators) as your real decision signals.

Which SMART attributes should make me back up immediately?

Any non-zero or increasing Current Pending Sector Count or Uncorrectable Sector Count, and any reallocated sectors that keep growing. For SSDs, rapidly worsening wear/remaining life metrics also justify immediate backups.

Why do I see CRC errors but no bad sectors?

CRC errors commonly point to a bad SATA cable/port or unstable connection. Fix the link, then monitor whether the counter continues to increase.

Can I rely on a USB enclosure to read SMART?

Sometimes, but not always: many bridges hide or distort SMART data. If results look incomplete, connect the drive directly via SATA/NVMe for a trustworthy reading.

Should I run a long SMART test on a drive that's already throwing errors?

Only after you start copying critical data. A long test can add stress and may accelerate failure when the drive is unstable.

What's the safest "first backup" when failure seems close?

Copy the most irreplaceable folders to a different physical device or cloud, then verify a sample restore. Don't start with a full clone if the drive is already stalling.

When should I replace instead of keep monitoring?

Replace when critical attributes increase across multiple checks, when read errors appear during normal use, or when the system becomes unstable during file access. Treat "worsening trend" as the trigger, not a single screenshot.
