Emphasis detection

The app reads each song’s lyrics at analyze time and asks the local Qwen LLM to identify emphasized lines (hooks, drops, big lyrical beats). At render time, the per-style effects engine pulls back during emphasis and cranks above its style baseline outside emphasis. The visible result: emphasized lyrics land cleanly with subdued motion; connective bars feel kinetic and dramatic.

How it works

  1. Analyze time — the song’s lyrics (with Whisper-aligned per-line timestamps) are sent to a single Qwen call. The LLM returns {"emphasis_lines": [3, 4, 12, 13, 25]}. Consecutive lines are grouped into intervals; boundaries are snapped to nearby bass-onset peaks/valleys (within ±0.5s) so emphasis windows align to musical phrasing instead of mid-syllable.

  2. Cached — result lives at <song-dir>/emphasis.json. Subsequent renders reuse it. The cache is invalidated when you save new lyrics for the song.

  3. Render time — the effects composer reads emphasis.json and the active style’s swing config (per-style calm_mult / crazy_mult), then weaves a time-varying multiplier into the ffmpeg filter graph. Four operators react: brightness flash, pulse, chorus pulse, zoom bump. Ken Burns and overlay operators (grain / leaks / dust) stay at their style baseline due to ffmpeg filter limitations — they form the “ambient hum” that’s relatively constant.
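The grouping and boundary-snap step from the analyze stage can be sketched as follows. This is a minimal illustration, not the app's actual code: the function names, the `line_times` mapping (line index to start/end seconds), and the flat list of onset times are all assumptions made for the example. Only the ±0.5s tolerance and the example line numbers come from the description above.

```python
from bisect import bisect_left

def group_consecutive(line_numbers):
    """Group line indices into inclusive (first, last) runs of consecutive lines."""
    runs = []
    for n in sorted(set(line_numbers)):
        if runs and n == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], n)   # extend the current run
        else:
            runs.append((n, n))           # start a new run
    return runs

def snap(t, onsets, tolerance=0.5):
    """Move t to the nearest onset if one lies within tolerance, else keep t.

    onsets is assumed to be a sorted list of times in seconds. An empty
    list makes this a no-op, matching the "bass-onset data missing" case.
    """
    if not onsets:
        return t
    i = bisect_left(onsets, t)
    candidates = onsets[max(0, i - 1): i + 1]
    nearest = min(candidates, key=lambda o: abs(o - t))
    return nearest if abs(nearest - t) <= tolerance else t

def build_intervals(emphasis_lines, line_times, onsets):
    """Turn emphasized line indices into snapped (start_s, end_s) intervals."""
    intervals = []
    for first, last in group_consecutive(emphasis_lines):
        raw_start, raw_end = line_times[first][0], line_times[last][1]
        start, end = snap(raw_start, onsets), snap(raw_end, onsets)
        if end <= start:                  # assumed guard: never emit an empty window
            start, end = raw_start, raw_end
        intervals.append((start, end))
    return intervals
```

With the example response above, `group_consecutive([3, 4, 12, 13, 25])` yields three runs: lines 3 to 4, lines 12 to 13, and line 25 on its own.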

Per-style swing

Each style declares its own swing range. Cinematic gets gentle drama; Kinetic gets max push-pull.

| Style | calm × baseline | crazy × baseline |
|---|---|---|
| Cinematic | 0.6 | 1.2 |
| Kinetic | 0.2 | 2.5 |
| Illustrated | 0.7 | 1.0 (subtle) |
| Visualizer | 0.3 | 2.0 |

The style profile defines the average vibe; the envelope swings around it.
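To make the arithmetic concrete, here is a small sketch of how a swing range might scale an operator's baseline intensity. The multipliers are taken from the table above; the dictionary layout, the lowercase style keys, and the `effective_intensity` helper are hypothetical and only illustrate the calculation.

```python
# (calm_mult, crazy_mult) per style, copied from the table above.
# The dict shape and key names are assumptions for this example.
SWING = {
    "cinematic":   (0.6, 1.2),
    "kinetic":     (0.2, 2.5),
    "illustrated": (0.7, 1.0),
    "visualizer":  (0.3, 2.0),
}

def effective_intensity(style, baseline, in_emphasis):
    """Scale an operator's baseline by the style's calm or crazy multiplier."""
    calm_mult, crazy_mult = SWING[style]
    return baseline * (calm_mult if in_emphasis else crazy_mult)
```

For a Kinetic operator with a baseline of 0.4, this gives roughly 0.08 during emphasis and 1.0 outside it, which is the push-pull the table describes.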

Asymmetric ramp

Effects wind down for ~1 second BEFORE an emphasized line lands (anticipation), then hard-cut back to crazy after the emphasis ends. This is the music-video editor’s classic “calm before the storm” pattern.
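The shape of that envelope can be sketched as a function of time. This is an illustration under stated assumptions: the ramp is modeled as linear (the text only says effects "wind down"), intervals are assumed sorted and non-overlapping, and the `envelope` function and its parameter names are made up for the example.

```python
def envelope(t, intervals, calm, crazy, ramp=1.0):
    """Return the multiplier at time t (seconds).

    - inside an emphasis interval: calm
    - in the ramp window before an interval: linear slide from crazy to calm
      (the linear shape is an assumption)
    - everywhere else, including right after an interval ends: crazy (hard cut)
    """
    # Being inside any emphasis window takes priority over ramps.
    if any(start <= t < end for start, end in intervals):
        return calm
    for start, _end in intervals:
        if start - ramp <= t < start:
            frac = (t - (start - ramp)) / ramp   # 0.0 at ramp start, 1.0 at emphasis start
            return crazy + (calm - crazy) * frac
    return crazy
```

With Kinetic's multipliers and one emphasis window from 10s to 12s, the value stays at 2.5 until 9s, slides down to 0.2 by 10s, holds there, then jumps straight back to 2.5 at 12s.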

Failure modes

  • LLM error or malformed response → no emphasis intervals; render uses the static style profile (current pre-emphasis behavior). Logged as [emphasis] failed.
  • Bass-onset data missing → boundary snap is a no-op; intervals use raw lyric timestamps.
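The first failure mode amounts to treating any unparseable response as "no emphasis". A defensive parser along these lines would produce that behavior. It is a sketch, not the app's implementation: the function name and logger setup are assumptions, and the reason appended after the `[emphasis] failed` prefix is an addition for illustration.

```python
import json
import logging

log = logging.getLogger("emphasis")  # logger name is an assumption

def parse_emphasis_lines(raw):
    """Return the list of emphasized line indices, or [] on any problem."""
    try:
        data = json.loads(raw)
        lines = data["emphasis_lines"]
        is_int_list = isinstance(lines, list) and all(
            isinstance(n, int) and not isinstance(n, bool) for n in lines
        )
        if not is_int_list:
            raise ValueError("emphasis_lines must be a list of integers")
    except (ValueError, KeyError, TypeError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so bad JSON lands here too.
        log.warning("[emphasis] failed: %s", exc)
        return []
    return lines
```

An empty list means no intervals are built, so the render falls through to the static style profile.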

Known limitations

  • Ken Burns motion is currently static: ffmpeg’s zoompan filter parser splits on commas in its z= parameter before reaching the eval engine, so multi-comma envelope expressions such as if(between(t,a,b),X,Y) cause Eval errors. The 4 envelope-aware operators use eq= and scale= filters, which handle multi-comma expressions correctly. Ken Burns intensity-during-emphasis is on the future-enhancement list; the visible effect is mostly carried by beat-sync flashes, brightness pulse, chorus pulse, and zoom bumps.
  • Overlay opacity is static: grain_overlay, leaks_overlay, and dust_overlay use ffmpeg’s blend filter, which requires a literal all_opacity= value. Overlays form the ambient hum that doesn’t swing with emphasis.
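For reference, the multi-comma expressions mentioned above have roughly this shape when several emphasis intervals are chained. The builder below is a hypothetical sketch that emits only the step version (calm inside an interval, crazy outside) and leaves out the pre-emphasis ramp. `if(x,y,z)` and `between(x,min,max)` are standard ffmpeg expression functions; the helper name and the number formatting are assumptions.

```python
def envelope_expr(intervals, calm, crazy):
    """Build a nested ffmpeg expression over the time variable t.

    Each interval adds one if(between(t,start,end),calm,<rest>) layer,
    with the crazy multiplier as the innermost fallback.
    """
    expr = f"{crazy:g}"
    for start, end in reversed(intervals):
        expr = f"if(between(t,{start:.3f},{end:.3f}),{calm:g},{expr})"
    return expr

print(envelope_expr([(10.0, 14.5), (30.0, 32.0)], 0.2, 2.5))
```

Every interval contributes three commas, which is why a filter option that splits its value on commas before evaluation cannot accept the result. When an expression like this is placed in a filtergraph string, it generally needs to be wrapped in single quotes or have its commas escaped, since commas also separate filters in a chain.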

No UI in v1

There’s no review or override panel for v1 — the LLM picks emphasis lines once at analyze time, and the result drives the next render. Diagnostic logs ([emphasis] cached <N> intervals for <song>) let you confirm what was selected.