Sleep Tracking Technology: PSG, Actigraphy, Wearables, and Accuracy
Polysomnography (PSG) remains the gold standard for sleep staging with trained technologist scoring; consumer wearables show 85–95% sleep/wake accuracy but overestimate total sleep time by 30–60 min and struggle with N1/N2 discrimination.
| Measure | Value | Unit | Notes |
|---|---|---|---|
| PSG inter-rater agreement (epoch-by-epoch) | 80–85 | % agreement | Trained scorers using AASM rules; N1 is the hardest stage to agree on (60–70%) |
| Consumer wearable sleep/wake accuracy | 85–95 | % accuracy | Chinoy et al. 2021; Oura, Fitbit, Apple Watch vs PSG; wrist actigraphy benchmark |
| Wearable total sleep time overestimation | 30–60 | minutes (overestimate) | Due to misclassifying quiet wakefulness as light sleep; consistent across devices |
| Wearable sleep stage accuracy | 60–70 | % correct staging | vs PSG; weakest on N1 (<50%) and REM discrimination; deep sleep better classified |
| Actigraphy vs PSG total sleep time correlation | r = 0.82 | Pearson correlation | Smith et al. 2018; AASM recommended for circadian rhythm disorders, not staging |
Polysomnography (PSG): Gold Standard
PSG records:
- EEG (electroencephalography): Brain electrical activity; 4–6 electrodes; identifies sleep stages by wave frequency/amplitude
- EOG (electro-oculography): Eye movements; identifies REM by rapid eye movements, and drowsiness and N1 onset by slow rolling eye movements
- EMG (electromyography): Chin muscle tone; absent in REM (atonia), present in NREM
- SpO2: Oxygen saturation for apnea detection
- Airflow: Nasal/oral cannula; detects apneas and hypopneas
- Effort belts: Thoracic/abdominal; differentiates obstructive from central apnea
Sleep staging follows AASM 2017 rules: the recording is divided into 30-second epochs, and each epoch is scored as Wake, N1, N2, N3, or REM from the combined EEG, EOG, and EMG signals. N1 is the most ambiguous stage: even trained scorers agree on it only 60–70% of the time.
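Epoch-by-epoch agreement between two scorers can be sketched as below. This is a minimal illustration, not the AASM's or any lab's procedure: the two hypnograms are invented, and Cohen's kappa is added here as a common chance-corrected companion to raw percent agreement.

```python
from collections import Counter

def percent_agreement(scorer_a, scorer_b):
    """Fraction of epochs on which two scorers assign the same stage."""
    if len(scorer_a) != len(scorer_b):
        raise ValueError("hypnograms must cover the same epochs")
    matches = sum(a == b for a, b in zip(scorer_a, scorer_b))
    return matches / len(scorer_a)

def cohens_kappa(scorer_a, scorer_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(scorer_a)
    observed = percent_agreement(scorer_a, scorer_b)
    count_a = Counter(scorer_a)
    count_b = Counter(scorer_b)
    stages = set(count_a) | set(count_b)
    # Chance agreement: product of each scorer's marginal stage frequencies.
    expected = sum(count_a[s] * count_b[s] for s in stages) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical 10-epoch (5-minute) excerpt scored by two technologists.
scorer_a = ["W", "W", "N1", "N1", "N2", "N2", "N2", "N3", "N3", "REM"]
scorer_b = ["W", "N1", "N1", "N2", "N2", "N2", "N2", "N3", "N2", "REM"]

agreement = percent_agreement(scorer_a, scorer_b)
kappa = cohens_kappa(scorer_a, scorer_b)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In this made-up excerpt the disagreements cluster at stage transitions (Wake to N1, N1 to N2, N3 to N2), which is where real scorers also tend to diverge.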
Actigraphy
Wrist-worn actigraphs measure physical movement by accelerometry, and many also record light exposure. Algorithms infer wake from movement and sleep from stillness. Validated against PSG for:
- Total sleep time (correlation ~0.82)
- Sleep/wake detection (~86% accuracy)
- Circadian phase estimation (rest-activity rhythm)
Actigraphy cannot stage sleep. Because it equates stillness with sleep, it tends to overestimate sleep in hypersomnolence and in insomnia, where quiet, motionless wakefulness is scored as sleep, and it can underestimate sleep in people who move a lot while asleep.
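The movement-versus-stillness logic can be sketched with a simple smoothed threshold. This is an assumption-laden toy, not a validated scoring algorithm: the function names, the `threshold` and `window` values, and the activity counts are all hypothetical. It illustrates how motionless wakefulness ends up scored as sleep.

```python
def score_sleep_wake(activity_counts, threshold=20, window=2):
    """Label each epoch 'S' (sleep) or 'W' (wake) from smoothed activity.

    threshold and window are illustrative values, not taken from any
    validated actigraphy algorithm.
    """
    labels = []
    n = len(activity_counts)
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        # Average the counts in a centered window, truncated at the edges.
        smoothed = sum(activity_counts[lo:hi]) / (hi - lo)
        labels.append("W" if smoothed > threshold else "S")
    return labels

def total_sleep_minutes(labels, epoch_seconds=30):
    """Total sleep time implied by a list of 'S'/'W' epoch labels."""
    return labels.count("S") * epoch_seconds / 60

# Hypothetical night excerpt: 4 epochs of active wake, 6 epochs of lying
# awake but still, then 6 epochs of sleep with one brief movement.
counts = [80, 95, 70, 60, 0, 0, 0, 0, 0, 0, 0, 2, 0, 45, 0, 0]
truth = ["W"] * 10 + ["S"] * 6

labels = score_sleep_wake(counts)
overestimate = total_sleep_minutes(labels) - total_sleep_minutes(truth)
print(labels)
print(f"TST overestimate: {overestimate} min")
```

The still-but-awake epochs produce the same near-zero counts as real sleep, so no choice of threshold can separate them; that is a limit of the signal, not of the tuning.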
Consumer Wearables: Accuracy Landscape
Chinoy et al. (2021) compared 7 devices against in-lab PSG:
| Metric | Best Device | Worst Device | Mean |
|---|---|---|---|
| Sleep/wake F1 | 0.93 | 0.85 | ~0.90 |
| N3 sensitivity | 0.71 | 0.31 | ~0.52 |
| REM sensitivity | 0.73 | 0.48 | ~0.62 |
| TST bias | −8 min | +65 min | +30 min |
Devices use PPG (photoplethysmography) for heart rate, accelerometry for movement, and increasingly skin temperature, HRV, and SpO2. Machine learning models trained on lab PSG generalize imperfectly to home environments.
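The kinds of metrics in the table above can be computed from paired epoch labels roughly as follows. This is a sketch under stated assumptions: the hypnograms are invented, the helper names are hypothetical, and treating sleep as the positive class for F1 is a choice made here, not something confirmed from the paper.

```python
SLEEP_STAGES = {"N1", "N2", "N3", "REM"}

def is_sleep(stage):
    return stage in SLEEP_STAGES

def sleep_wake_f1(psg, device):
    """F1 with sleep as the positive class (an assumption of this sketch)."""
    tp = sum(is_sleep(p) and is_sleep(d) for p, d in zip(psg, device))
    fp = sum(not is_sleep(p) and is_sleep(d) for p, d in zip(psg, device))
    fn = sum(is_sleep(p) and not is_sleep(d) for p, d in zip(psg, device))
    return 2 * tp / (2 * tp + fp + fn)

def stage_sensitivity(psg, device, stage):
    """Share of PSG epochs of `stage` that the device also labels `stage`."""
    device_labels = [d for p, d in zip(psg, device) if p == stage]
    if not device_labels:
        raise ValueError(f"no PSG epochs scored as {stage}")
    return sum(d == stage for d in device_labels) / len(device_labels)

def tst_bias_minutes(psg, device, epoch_seconds=30):
    """Device TST minus PSG TST; positive means the device overestimates."""
    diff = sum(map(is_sleep, device)) - sum(map(is_sleep, psg))
    return diff * epoch_seconds / 60

# Hypothetical 12-epoch excerpt: PSG reference vs. a device's output.
psg    = ["W", "W", "W", "N1", "N2", "N2", "N3", "N3", "N3", "REM", "REM", "W"]
device = ["W", "N1", "N1", "N2", "N2", "N2", "N2", "N3", "N3", "REM", "N2", "W"]

f1 = sleep_wake_f1(psg, device)
n3_sens = stage_sensitivity(psg, device, "N3")
rem_sens = stage_sensitivity(psg, device, "REM")
bias = tst_bias_minutes(psg, device)
print(f"F1={f1:.2f} N3={n3_sens:.2f} REM={rem_sens:.2f} bias={bias:+.1f} min")
```

The toy data reproduce the usual pattern: sleep/wake F1 stays high even though two wake epochs are scored as sleep, while stage-level sensitivity is visibly lower and TST bias is positive.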
The “Orthosomnia” Problem
Clinical observation: patients who become preoccupied with optimizing wearable sleep scores develop anxiety and maladaptive behaviors that worsen actual sleep. Scored “bad nights” trigger bedtime anxiety; pursuit of deep sleep scores paradoxically increases cortical arousal. Wearable data presented without clinical context can trigger iatrogenic insomnia. Research criteria for orthosomnia as a variant of health anxiety are under development.
Sources
- Depner CM et al. — Wearable technologies for developing sleep and circadian biomarkers. Sleep (2020)
- Chinoy ED et al. — Performance of seven consumer sleep-tracking devices compared with polysomnography. Sleep (2021)
- de Zambotti M et al. — The sleep of the ring: comparison of the ŌURA sleep tracker against polysomnography. Behav Sleep Med (2019)
- Smith MT et al. — Use of actigraphy for the evaluation of sleep disorders and circadian rhythm sleep-wake disorders. J Clin Sleep Med (2018)
Frequently Asked Questions
Can I trust my fitness tracker's sleep staging data?
For total sleep time (±30–60 min) and for detecting major sleep disruption, consumer wearables are reasonably informative. For precise staging (how much N3 or REM), they are unreliable, misclassifying 30–40% of epochs versus PSG. The fundamental limitation is that heart rate and accelerometry cannot replicate the EEG signal that defines sleep stages. Stage-defining EEG features such as sleep spindles (N2), delta waves (N3), and sawtooth waves (REM) have no reliable peripheral correlates. Use wearables for longitudinal trends, not clinical decisions.
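One way to read wearable data as a trend rather than night by night is a trailing weekly average. The sketch below is illustrative only: the `rolling_mean` helper, the 7-night window, and the nightly values are assumptions, not a recommendation from the cited studies.

```python
def rolling_mean(values, window=7):
    """Trailing mean over the last `window` nights; None until enough data."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

# Hypothetical device-reported total sleep time, in minutes, for 10 nights.
nightly_tst = [420, 390, 450, 405, 380, 440, 415, 360, 350, 345]

trend = rolling_mean(nightly_tst)
for night, (tst, avg) in enumerate(zip(nightly_tst, trend), start=1):
    shown = "n/a" if avg is None else f"{avg:.0f}"
    print(f"night {night:2d}: TST {tst} min, 7-night mean {shown}")
```

A sustained drop in the 7-night mean is more informative than any single night, since a single night's value sits within the device's own 30–60 minute error band.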
When is polysomnography actually needed?
PSG is indicated for suspected sleep apnea (to determine the apnea-hypopnea index), narcolepsy (overnight PSG followed by a next-day MSLT), REM sleep behavior disorder (chin EMG is needed to detect loss of REM atonia), suspected seizures during sleep, and clinically significant insomnia that has not responded to treatment. For straightforward insomnia evaluation or circadian rhythm assessment, PSG is generally not needed. Actigraphy is AASM-recommended for measuring sleep patterns over days to weeks in circadian disorders, which a single night of PSG cannot capture.