When developing a "user movement tracking algorithm" using accelerometer and gyro data, what methods are typically used?
Hello, wearables friends!
I’ve recently been studying an algorithm that uses accelerometer and gyroscope data from wearables to more intelligently track and analyze user movement.
The problem is… when searching for information, I only see terms like Kalman filter, complementary filter, sensor fusion, and quaternion floating around,
but there’s surprisingly little information on how people actually implement and tune them.
So, I have some questions: ↓↓
❓ Question
1. When identifying human movement from IMU sensor data, what are the most commonly used sensor fusion algorithms these days? (Madgwick? Mahony? EKF?)
2. For step detection, wrist gestures, and micro-movements, how is the raw data typically smoothed/denoised? FFT-based frequency analysis, or is a simple low-pass filter sufficient?
3. In real-world applications, is the "basic filter + machine learning model (RNN/LSTM)" combination often used, or are classical algorithms also quite performant?
Hey, welcome to the IMU rabbit hole.
This is one of those areas where everyone name-drops Kalman, Madgwick, Mahony, quaternions… and then somehow skips the part where you actually make the thing work.
So yeah, totally valid confusion.
Let me answer this the way it usually comes up on Reddit: slightly opinionated, grounded in “people actually ship this” reality.
First: why it feels like all the good info is missing
Most practical IMU know-how lives in one of three places:
- firmware repos with zero documentation
- academic papers that assume you already know everything
- old forum posts written by someone who disappeared in 2014
So when you search, you get terms, not implementation intuition. You’re not doing anything wrong.
Sensor fusion for human movement: what people actually use
Short answer: yes, all the names you mentioned are real, but they’re used very differently depending on constraints.
In wearables, the most common setups look like this:
Madgwick and Mahony filters are everywhere, especially when you don’t have a magnetometer or want something cheap and stable. They’re popular because they’re fast, reasonably accurate, and don’t require you to babysit covariance matrices at 2am. Madgwick tends to converge a bit faster; Mahony feels a bit more “controlled” if tuned well. In practice, both are fine and the difference often matters less than people want to believe.
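If it helps to see the shape of it, here's a minimal Mahony-style update for a 6-axis IMU (gyro + accel, no magnetometer). The gains, axis conventions, and quaternion layout are assumptions to illustrate the idea, not a tuned implementation:

```python
import math

def mahony_update(q, gyro, accel, dt, kp=0.5, ki=0.0, integral=(0.0, 0.0, 0.0)):
    """One Mahony-style IMU update (gyro + accel only, no magnetometer).

    q        : orientation quaternion (w, x, y, z)
    gyro     : body-frame angular rates in rad/s
    accel    : accelerometer sample (any consistent unit)
    dt       : sample period in seconds
    kp, ki   : feedback gains -- the defaults here are placeholders, not tuned values
    integral : running integral of the error term (pass it back in on each call)
    """
    qw, qx, qy, qz = q
    gx, gy, gz = gyro
    ax, ay, az = accel
    ix, iy, iz = integral

    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm > 1e-9:                      # skip the correction in free fall
        ax, ay, az = ax / norm, ay / norm, az / norm

        # Direction gravity *should* point, given the current orientation estimate
        vx = 2.0 * (qx * qz - qw * qy)
        vy = 2.0 * (qw * qx + qy * qz)
        vz = qw * qw - qx * qx - qy * qy + qz * qz

        # Error = cross(measured gravity direction, estimated gravity direction)
        ex = ay * vz - az * vy
        ey = az * vx - ax * vz
        ez = ax * vy - ay * vx

        # Optional integral feedback (many wearable setups just leave ki = 0)
        ix += ki * ex * dt
        iy += ki * ey * dt
        iz += ki * ez * dt

        # Nudge the gyro rates toward agreement with the accelerometer
        gx += kp * ex + ix
        gy += kp * ey + iy
        gz += kp * ez + iz

    # Integrate the corrected rates into the quaternion: q_dot = 0.5 * q * (0, g)
    dqw = 0.5 * (-qx * gx - qy * gy - qz * gz)
    dqx = 0.5 * ( qw * gx + qy * gz - qz * gy)
    dqy = 0.5 * ( qw * gy - qx * gz + qz * gx)
    dqz = 0.5 * ( qw * gz + qx * gy - qy * gx)
    qw, qx, qy, qz = qw + dqw * dt, qx + dqx * dt, qy + dqy * dt, qz + dqz * dt

    n = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
    return (qw / n, qx / n, qy / n, qz / n), (ix, iy, iz)
```

The whole "filter" is basically: estimate which way gravity should point given your current orientation, compare that with the accelerometer, and nudge the gyro integration toward agreement. That's the part the papers rarely spell out.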
EKF (Extended Kalman Filter) shows up when:
- you have more sensors
- you need estimates beyond orientation
- or someone on the team really knows Kalman filters
It's powerful, but tuning it is not beginner-friendly. A badly tuned EKF is worse than a well-tuned Madgwick filter, full stop. That's why many consumer wearables quietly avoid EKF unless absolutely necessary.
Meta point: for human activity recognition, you usually don’t need “perfect orientation.” You need something stable and consistent. That’s why simpler fusion methods dominate.
About denoising and smoothing (the unsexy but critical part)
This is where a lot of systems quietly succeed or fail.
Despite what papers might suggest, most real systems start with very boring filters.
Low-pass filters are doing way more work than FFT in production systems. Human motion lives in a pretty narrow frequency band, especially for step detection and gestures. A simple low-pass (sometimes combined with a high-pass for gravity removal) gets you 80% of the benefit with almost zero complexity.
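To make that concrete, here's a sketch of the boring-but-effective version: a streaming one-pole low-pass plus the "low-pass the gravity out" trick. The 50 Hz sample rate and the cutoff frequencies are placeholder assumptions you'd tune for your sensor:

```python
import math

def one_pole_lowpass(samples, fs=50.0, fc=5.0):
    """Streaming one-pole IIR low-pass. fs = sample rate (Hz), fc = cutoff (Hz)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * fc / fs)   # smoothing coefficient
    y, out = samples[0], []
    for x in samples:
        y += alpha * (x - y)    # y tracks the slow content; high-frequency noise is attenuated
        out.append(y)
    return out

def remove_gravity(accel_axis, fs=50.0):
    """Gravity lives near DC, so subtract a heavily low-passed copy of the signal
    (a crude high-pass). The 0.5 Hz cutoff is an assumption, not a magic number."""
    gravity = one_pole_lowpass(accel_axis, fs=fs, fc=0.5)
    return [a - g for a, g in zip(accel_axis, gravity)]
```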
FFT and frequency-domain analysis definitely exist, but they’re usually:
- for offline analysis
- or feature extraction stages
- or cadence estimation
Real-time micro-movement detection? Most teams don’t want the latency and complexity of FFT unless they need it.
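When frequency analysis does earn its keep (e.g. offline cadence estimation), it can be as small as this. The 0.5–3 Hz band is an assumption about typical step frequencies, and it assumes you have at least a few seconds of data:

```python
import numpy as np

def cadence_hz(accel_mag, fs=50.0):
    """Offline cadence estimate: dominant frequency of the detrended acceleration
    magnitude inside a plausible walking band (~0.5-3 Hz, an assumption)."""
    x = np.asarray(accel_mag, dtype=float)
    x = x - x.mean()                          # remove the DC (gravity) component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= 0.5) & (freqs <= 3.0)    # needs a few seconds of data to resolve
    return freqs[band][np.argmax(spectrum[band])]
```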
A common pattern is:
raw IMU → low-pass / band-pass → windowing → features or ML
Nothing fancy. Just disciplined.
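As a sketch, the windowing/feature step might look like this, assuming 2-second windows with 50% overlap (both arbitrary choices you'd tune):

```python
import math

def window_features(signal, fs=50.0, win_s=2.0, overlap=0.5):
    """Slice a filtered 1-D signal into overlapping windows and compute
    simple handcrafted features (mean, std, peak-to-peak, energy)."""
    win = int(win_s * fs)
    step = max(1, int(win * (1.0 - overlap)))
    features = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        mean = sum(w) / win
        var = sum((x - mean) ** 2 for x in w) / win
        features.append({
            "mean": mean,
            "std": math.sqrt(var),
            "p2p": max(w) - min(w),
            "energy": sum(x * x for x in w) / win,
        })
    return features
```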
Step detection, wrist gestures, micro-movements
For step detection, many commercial systems still rely on some form of peak detection on filtered acceleration magnitude, often with adaptive thresholds. ML can help, but it’s surprisingly easy to overcomplicate something users have been doing reliably since pedometers existed.
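A toy version of that idea, with an adaptive threshold derived from the signal's own mean and standard deviation (the specific threshold rule and the 0.3 s refractory period are illustrative assumptions):

```python
def detect_steps(accel_mag, fs=50.0, min_step_interval_s=0.3):
    """Toy step detector: peak detection on smoothed acceleration magnitude
    with an adaptive threshold and a refractory period between steps."""
    # Short moving-average smoothing (a stand-in for the low-pass stage above)
    k = max(1, int(0.1 * fs))
    smoothed, acc = [], 0.0
    for i, x in enumerate(accel_mag):
        acc += x
        if i >= k:
            acc -= accel_mag[i - k]
        smoothed.append(acc / min(i + 1, k))

    mean = sum(smoothed) / len(smoothed)
    std = (sum((x - mean) ** 2 for x in smoothed) / len(smoothed)) ** 0.5
    threshold = mean + 0.5 * std                 # adaptive: follows the signal's own statistics
    refractory = int(min_step_interval_s * fs)   # ignore peaks closer together than ~0.3 s

    steps, last = [], -refractory
    for i in range(1, len(smoothed) - 1):
        if (smoothed[i - 1] < smoothed[i] >= smoothed[i + 1]
                and smoothed[i] > threshold
                and i - last >= refractory):
            steps.append(i)
            last = i
    return steps
```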
Wrist gestures are harder. Here you’ll often see:
- temporal windows
- smoothed gyro signals
- relative motion features rather than absolute orientation
Micro-movements are the hardest category, mainly because signal-to-noise is awful. That’s where careful filtering, normalization, and per-user adaptation matter way more than your choice of model.
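One small trick that carries a lot of weight here is per-window normalization, so downstream features describe the shape of the motion rather than its absolute magnitude. A minimal sketch (z-scoring each window is one common choice, not the only one):

```python
def normalize_window(window, eps=1e-6):
    """Z-score a window of samples so gesture features capture motion *shape*
    rather than absolute magnitude (helps across users and wearing styles)."""
    mean = sum(window) / len(window)
    std = (sum((x - mean) ** 2 for x in window) / len(window)) ** 0.5
    return [(x - mean) / (std + eps) for x in window]
```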
“Filter + ML” — is this actually common?
Yes. Very common. Almost the default, honestly.
The typical real-world stack looks like:
- basic filtering and gravity compensation
- sensor fusion for stability
- windowing and normalization
- ML for classification or prediction
RNNs / LSTMs show up when temporal context matters (gestures, sequences), but plenty of systems still get great results with simpler models like CNNs or even classical classifiers on handcrafted features.
And this is important:
classical algorithms are absolutely still competitive when:
- data is limited
- power and latency matter
- interpretability is important
A clean signal + simple logic often beats a noisy signal + fancy model.
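For a sense of how far the "handcrafted features + classical model" route gets you, here's a sketch using scikit-learn's RandomForestClassifier on placeholder window features. The data below is synthetic, purely to show the shape of the pipeline; in practice X would come from something like window_features above and y from your activity labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data: 500 windows x 12 handcrafted features, 3 dummy activity classes
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
y = rng.integers(0, 3, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out window accuracy:", clf.score(X_test, y_test))
```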
The uncomfortable truth
Most “IMU intelligence” isn’t magic algorithms. It’s:
- good filtering
- reasonable assumptions
- aggressive handling of edge cases
The fusion algorithm name matters less than:
- sensor placement
- sampling rate
- filter tuning
- how you handle transitions and noise
People don’t like hearing this because it’s not sexy, but it’s very real.