Wearable_Insight_Forum

 


Do you know anything about this? If anyone knows how to calculate a sleep score, please let me know.

(@thomas-kim)
Posts: 5
Active Member
Topic starter
 

I assume wearables also calculate and apply sleep scores.

For example, does anyone know a method for calculating a score from sleep duration, percentage of deep sleep, frequency of nighttime awakenings, etc., and then applying it in an algorithm?


 
Posted : 05/12/2025 4:59 am
(@david-mun)
Posts: 30
Eminent Member
 

Yep — sleep scores are absolutely a thing in wearables, and spoiler:
there’s way less magic in them than marketing wants you to believe.

Most “sleep scores” are not some deep neurological insight. They’re carefully engineered scoring systems layered on top of sleep staging and basic sleep statistics.

Let’s talk about how people actually do this in the real world.


First: what a sleep score really is (and isn’t)

A sleep score is not a medical diagnosis.
It’s a user-facing abstraction that answers one question:

“How was your sleep last night, compared to what’s considered good for someone like you?”

That’s it.

So companies care less about physiological purity and more about:

  • stability (score shouldn’t jump wildly)

  • interpretability (users understand what improved/worsened)

  • behavior shaping (go to bed earlier, sleep longer)

This has huge implications for the algorithm design.


The core idea: normalize → weight → combine

Almost every sleep score starts the same way.

You take key sleep metrics:

  • total sleep duration

  • sleep efficiency (sleep / time in bed)

  • deep sleep percentage

  • REM percentage

  • wake after sleep onset (WASO)

  • number of awakenings

  • sometimes heart rate / HRV deviation

Each of these gets normalized against a reference:

  • population average

  • age-adjusted range

  • or (best case) the user’s own baseline

So instead of raw values, you’re scoring things like:

“How far from ideal is this?”
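
Here is a minimal sketch of that normalization step, assuming the reference is the user's own recent nights. The function name, the sample values and the choice of a z-score are my own illustrative assumptions, not any vendor's formula.

```python
from statistics import mean, stdev

def deviation_from_baseline(value, history):
    """Return how many standard deviations `value` sits from the mean of `history`."""
    if len(history) < 2:
        return 0.0  # not enough nights to define a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0  # every night identical: no spread to measure against
    return (value - mu) / sigma

# Hypothetical deep sleep percentages for the previous seven nights.
deep_sleep_history = [18.0, 20.0, 22.0, 19.0, 21.0, 20.0, 20.0]

# A night with 15% deep sleep lands well below this user's usual range.
z = deviation_from_baseline(15.0, deep_sleep_history)
```

A negative result means "below your usual", a positive one "above". Swapping `history` for a population or age-adjusted sample gives the other two reference types without changing the function.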


A very common (and very real) approach

Let’s say you want a score from 0 to 100.

Behind the scenes, it often looks like this:

Each metric gets converted into a sub-score, usually via:

  • piecewise linear scaling

  • sigmoid / soft thresholds

  • capped ranges (to avoid penalizing extreme sleepers)

For example:

  • 7–8 hours of sleep → near max

  • 5 hours → heavily penalized

  • 9.5 hours → only slightly rewarded

Same thing for deep sleep:

  • below X% = penalty

  • within normal range = neutral

  • above threshold = diminishing returns

This avoids users gaming the score.
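
To make the duration example concrete, here is a rough piecewise linear sub-score. The anchor points are hypothetical numbers I picked to match the shape described above (5 hours heavily penalized, 7–8 hours near max, 9.5 hours only slightly better); real products tune their own.

```python
from bisect import bisect_right

# Hypothetical anchors: (hours of sleep, sub-score). Illustrative values only.
DURATION_ANCHORS = [(4.0, 0.0), (5.0, 25.0), (7.0, 90.0), (8.0, 95.0), (9.5, 100.0)]

def piecewise_linear(x, anchors):
    """Interpolate linearly between anchors; clamp outside the anchor range."""
    xs = [a[0] for a in anchors]
    ys = [a[1] for a in anchors]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]  # capped: extra sleep past the last anchor earns nothing
    i = bisect_right(xs, x)  # index of the first anchor strictly greater than x
    x0, y0 = xs[i - 1], ys[i - 1]
    x1, y1 = xs[i], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def duration_subscore(hours):
    return piecewise_linear(hours, DURATION_ANCHORS)
```

With these anchors, `duration_subscore(5.0)` is 25.0, `duration_subscore(7.5)` is 92.5, and anything from 9.5 hours up is capped at 100.0. The same helper works for deep sleep percentage with a different anchor table.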


Then comes the controversial part: weighting

This is where the “science” quietly turns into product philosophy.

Companies assign weights like:

  • duration: 35–45%

  • efficiency: 20–30%

  • deep + REM: 20–30%

  • awakenings / WASO: small but sharp penalties

These weights are not universal truths. They’re chosen to:

  • reduce day-to-day volatility

  • match how users feel

  • align with sleep research just enough

Two companies can look at the same raw sleep and give different scores — and both will claim to be “accurate.”

They kind of are. They’re just optimizing for different definitions of “good sleep.”
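
A small sketch of how that plays out, assuming a plain weighted average. The sub-scores and both weight sets are hypothetical; I only kept the weights inside the ranges listed above, and treating awakenings as a small weight is a simplification of the "small but sharp penalties" idea.

```python
def combine(subscores, weights):
    """Weighted average of 0-100 sub-scores; weights are normalized to sum to 1."""
    total_weight = sum(weights.values())
    score = sum(subscores[name] * w for name, w in weights.items()) / total_weight
    return max(0.0, min(100.0, score))  # keep the result on the 0-100 scale

# Hypothetical sub-scores for one night (0-100 each).
night = {"duration": 60.0, "efficiency": 90.0, "stages": 80.0, "awakenings": 70.0}

# Two hypothetical product philosophies applied to the same night.
duration_heavy = {"duration": 0.45, "efficiency": 0.20, "stages": 0.25, "awakenings": 0.10}
quality_heavy = {"duration": 0.35, "efficiency": 0.30, "stages": 0.30, "awakenings": 0.05}

score_a = combine(night, duration_heavy)
score_b = combine(night, quality_heavy)
```

Same night, same sub-scores: the duration-heavy weighting gives about 72 and the quality-heavy one about 75.5. Neither is wrong; they just define "good sleep" differently.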


What about awakenings?

This part trips people up.

Most scoring systems don’t punish the raw count of awakenings as much as:

  • duration of awakenings

  • clustering toward morning

  • interruption of deep/REM phases

Short micro-awakenings are often ignored or lightly penalized, because otherwise everyone looks like a terrible sleeper.

This is where rule-based logic quietly outperforms ML.
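
As a sketch of what such rule-based logic could look like: the penalty below depends on how long each awakening lasted and what it interrupted, not on how many there were. All thresholds and costs are hypothetical, and I left out the "clustering toward morning" rule to keep it short.

```python
from dataclasses import dataclass

@dataclass
class Awakening:
    minutes: float          # how long the wake episode lasted
    interrupted_stage: str  # stage just before waking: "light", "deep" or "rem"

# Hypothetical thresholds and costs, for illustration only.
MICRO_AWAKENING_MINUTES = 2.0
PENALTY_PER_MINUTE = 0.5
STAGE_INTERRUPTION_PENALTY = 2.0
MAX_PENALTY = 30.0

def awakening_penalty(awakenings):
    """Penalize wake time rather than wake count; skip micro-awakenings."""
    penalty = 0.0
    for a in awakenings:
        if a.minutes < MICRO_AWAKENING_MINUTES:
            continue  # too short to count against the sleeper
        penalty += a.minutes * PENALTY_PER_MINUTE
        if a.interrupted_stage in ("deep", "rem"):
            penalty += STAGE_INTERRUPTION_PENALTY
    return min(penalty, MAX_PENALTY)  # bounded so one bad night can't zero the score
```

Twenty 30-second micro-awakenings cost nothing here, while a single 10-minute awakening out of deep sleep costs 7 points.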


Is machine learning used here?

Yes — but mostly upstream, not in the final score.

ML is usually used for:

  • sleep/wake detection

  • sleep stage classification

  • motion artifact handling

The score itself is often calculated using:

  • heuristic formulas

  • weighted sums

  • bounded nonlinear functions

Why?

Because:

  • scores need to be explainable

  • regulators hate opaque health numbers

  • users want to know why their score dropped

A pure NN-generated score is a nightmare to ship.
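
For contrast, a "bounded nonlinear function" of the kind listed above can be a plain logistic curve that anyone can plot and explain. This is a sketch with made-up parameters, using sleep efficiency as the input.

```python
import math

def soft_threshold(value, midpoint, steepness):
    """Logistic curve scaled to 0-100: exactly 50 at `midpoint`, flattening toward 0 and 100."""
    return 100.0 / (1.0 + math.exp(-steepness * (value - midpoint)))

def efficiency_subscore(efficiency_pct):
    # Hypothetical parameters: midpoint at 80% efficiency, moderate steepness.
    return soft_threshold(efficiency_pct, midpoint=80.0, steepness=0.3)
```

With these parameters 80% efficiency scores 50, 95% scores close to 99, and 60% scores below 1. Two numbers (midpoint and steepness) fully describe the behavior, which is the kind of explainability the list above is asking for.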


Personal baselines: where things get interesting

The more advanced systems slowly shift from:

population-based scoring

to:

“you vs your usual sleep”

So the algorithm adapts to:

  • your normal deep sleep %

  • your usual duration

  • your own HRV range

Now a “bad night” is:

worse than your typical sleep, not a textbook ideal

This makes scores feel smarter — without changing the core math much.
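
One way to sketch that shift, assuming an exponential moving average for the personal baseline and a linear hand-over from the population reference. The function names, `alpha` and the 30-night hand-over period are my assumptions, not a documented vendor method.

```python
def update_baseline(baseline, tonight, alpha=0.1):
    """Exponential moving average: move the baseline a fraction `alpha` toward tonight's value."""
    return baseline + alpha * (tonight - baseline)

def blended_reference(population_ref, personal_baseline, nights_observed, full_trust_nights=30):
    """Shift gradually from the population reference to the personal baseline."""
    trust = min(nights_observed / full_trust_nights, 1.0)
    return (1.0 - trust) * population_ref + trust * personal_baseline

# Hypothetical user: population reference is 8.0 h, the user habitually sleeps 6.0 h.
reference_day_0 = blended_reference(8.0, 6.0, nights_observed=0)
reference_day_15 = blended_reference(8.0, 6.0, nights_observed=15)
reference_day_60 = blended_reference(8.0, 6.0, nights_observed=60)
```

The reference starts at 8.0 hours, sits at 7.0 after 15 nights and settles at 6.0 once enough nights are in. The normalization and sub-score math downstream stays the same; only the reference it compares against moves.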


The most important hidden rule

Sleep scores are intentionally conservative.

They are designed to:

  • change slowly

  • avoid overreacting

  • preserve user trust

A wildly accurate but unstable score feels broken to users.
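
I can't say how any particular vendor enforces this, but one simple way to get that behavior is to blend tonight's raw score with yesterday's displayed score and cap the jump. The function and both parameters below are hypothetical.

```python
def displayed_score(raw_score, previous_displayed, smoothing=0.7, max_step=15.0):
    """Blend tonight's raw score with yesterday's displayed score, then limit the jump."""
    if previous_displayed is None:
        return raw_score  # first night: nothing to smooth against
    blended = smoothing * raw_score + (1.0 - smoothing) * previous_displayed
    step = max(-max_step, min(max_step, blended - previous_displayed))
    return previous_displayed + step
```

With these settings, a raw 40 after a displayed 80 shows up as 65 instead of 40, while a raw 85 after 80 shows up as roughly 83.5. The trade-off is exactly the one described above: less responsive, but it doesn't feel broken.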


 
Posted : 09/12/2025 12:51 am