DON’T LOOK BACK: AN ONLINE BEAT TRACKING METHOD USING RNN AND ENHANCED PARTICLE FILTERING
Mojtaba Heydari and Zhiyao Duan
Overview
Online beat tracking (OBT) is a challenging task. Because future data is
inaccessible and inferences must be made in real time, it is more difficult
than offline beat tracking. We propose Don’t Look Back! (DLB), a novel
approach optimized for efficiency in online beat tracking.
Related Work
Most existing online beat tracking methods either apply an offline approach to
a moving window of past data to predict future beat positions, or must be
primed with past data at startup. Figure 1 shows the mechanism of the moving
window approach for online beat tracking. Methods of this kind have several
downsides, for example:
· They require some initialization time to construct the first window. In
other words, they don’t give any output until the first window is constructed.
· They suffer from discontinuity and potential computational overload.
Fig. 1: Moving window strategy for upcoming beat/downbeat estimation
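The moving window strategy can be illustrated with a minimal sketch. This is not code from any of the cited methods: the names `moving_window_tracker`, `offline_tracker`, `window_size`, and `hop` are hypothetical, and the offline tracker is passed in as an arbitrary function. The sketch only shows why such a scheme is silent until the first window fills and why it re-processes overlapping data on every hop.

```python
from collections import deque


def moving_window_tracker(frames, window_size, hop, offline_tracker):
    """Run an offline tracker on a sliding window of past frames.

    Returns one entry per input frame: None when no estimate is produced,
    otherwise the result of offline_tracker on the current window.
    """
    window = deque(maxlen=window_size)  # keeps only the latest frames
    outputs = []
    for t, frame in enumerate(frames):
        window.append(frame)
        if len(window) < window_size:
            # Initialization delay: the first window is not complete yet.
            outputs.append(None)
        elif (t - (window_size - 1)) % hop == 0:
            # The whole window is processed again, although most of it
            # overlaps with the previously processed window.
            outputs.append(offline_tracker(list(window)))
        else:
            outputs.append(None)
    return outputs


# Toy usage: the "offline tracker" here is just a sum over the window.
results = moving_window_tracker(range(10), window_size=4, hop=2,
                                offline_tracker=sum)
print(results)
```

In this toy run the first three frames yield no output at all, and each later call sees two frames it has already processed, which mirrors the two downsides listed above.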
Our approach
DLB feeds the activations of a unidirectional RNN into an enhanced Monte Carlo
localization model to infer beat positions. It does not wait at the beginning
to receive a chunk of audio, so it provides an immediate beat tracking
response, which is critical for many online beat tracking applications. It
consists of two main parts. The first is a pre-trained unidirectional RNN that
delivers the beat activation of each frame as that frame's observation
likelihood. The second is a causal sequential Monte Carlo particle filter (PF)
that infers beats based on the given state space, transition model, and
observed activations. Fig. 2 shows the overall scheme of our model.
Fig. 2: The block diagram of the proposed online beat tracking method
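To make the word "unidirectional" concrete, here is a minimal sketch of a causal recurrent cell that is fed one frame at a time. It is an assumption-laden toy, not the pre-trained network used in DLB: the class `TinyCausalRNN`, its random weights, the plain tanh cell, and the feature size are all hypothetical. The point is only that the activation at a given frame depends on that frame and the past, never on future frames.

```python
import numpy as np


class TinyCausalRNN:
    """Toy recurrent cell with random weights (hypothetical, untrained)."""

    def __init__(self, n_features, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(scale=0.5, size=(n_hidden, n_features))
        self.w_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
        self.w_out = rng.normal(scale=0.5, size=n_hidden)
        self.h = np.zeros(n_hidden)  # hidden state carried across frames

    def step(self, frame):
        """Consume one feature frame and return an activation in (0, 1)."""
        self.h = np.tanh(self.w_in @ frame + self.w_rec @ self.h)
        return 1.0 / (1.0 + np.exp(-(self.w_out @ self.h)))


# Stream ten random feature frames through the cell, one at a time.
features = np.random.default_rng(1).normal(size=(10, 4))
rnn = TinyCausalRNN(n_features=4)
activations = [rnn.step(frame) for frame in features]
print(activations)
```

Because the only memory is the hidden state `h`, running the same cell on just the first five frames gives the same first five activations, which is what allows an activation to be emitted as soon as each frame arrives.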
We approach the problem from the PF localization perspective, which comprises
two steps: motion and correction. According to the importance sampling
principle, a high-dimensional probability distribution can be represented by a
large number of independent, weighted samples drawn from a known, arbitrary
proposal distribution. We use a number of particles (hypotheses) as beat
location candidates and update their states based on the motion model and the
observation probabilities. Put simply, to obtain the beat positions, we first
distribute the particles uniformly over our state space (explained later in
this article), and then, at each iteration, we take the following steps:
1- Sample particles from the proposal distribution, which in our case is the
transition probability. (motion)
2- Compute the new importance weights based on the observation probability
derived from the LSTM beat activations.
3- Resample based on the new normalized weights, which discards unlikely
hypotheses and generates more plausible ones. (correction)
4- Take the median of the positions of all particles and classify the frame as
a beat if the median is within the beat boundary and is farther from the
previous beat than a dynamic time threshold equal to half of the median of all
particles’ tempos.
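The four steps above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the DLB implementation: the function names, the parameters (`min_interval`, `max_interval`, `beat_width`, `eps`), the tempo jitter of at most one frame, the choice of systematic resampling, and the observation model (the activation for beat states, one minus the activation otherwise) are all stand-ins chosen for brevity. In particular, DLB's enhanced state space and its new observation model are not reproduced here.

```python
import numpy as np


def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic (low-variance) resampling."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cumulative, positions)


def track_beats(activations, n_particles=500, min_interval=20,
                max_interval=80, beat_width=1, eps=1e-3, seed=0):
    """Causal beat tracking; returns the frame indices classified as beats."""
    rng = np.random.default_rng(seed)
    # Each particle holds a tempo (beat interval in frames) and a phase
    # (frames elapsed since its last beat), spread uniformly at the start.
    interval = rng.integers(min_interval, max_interval + 1, size=n_particles)
    phase = (rng.random(n_particles) * interval).astype(int)
    beats, last_beat = [], None

    for t, act in enumerate(activations):
        # 1) Motion: advance every particle by one frame. The tempo is only
        #    re-sampled for particles that wrap around to a new beat.
        phase = phase + 1
        wrapped = phase >= interval
        phase[wrapped] = 0
        jitter = rng.integers(-1, 2, size=int(wrapped.sum()))
        interval[wrapped] = np.clip(interval[wrapped] + jitter,
                                    min_interval, max_interval)

        # 2) Importance weights from the activation (stand-in model).
        act = float(np.clip(act, eps, 1.0 - eps))
        weights = np.where(phase < beat_width, act, 1.0 - act)
        weights = weights / weights.sum()

        # 3) Correction: resample in proportion to the weights.
        keep = systematic_resample(weights, rng)
        phase, interval = phase[keep], interval[keep]

        # 4) Decision: the median phase lies inside the beat boundary and
        #    the previous beat is more than half a median interval away.
        in_boundary = np.median(phase) < beat_width
        min_gap = 0.5 * np.median(interval)
        if in_boundary and (last_beat is None or t - last_beat > min_gap):
            beats.append(t)
            last_beat = t
    return beats


# Synthetic activations: a spike every 50 frames, starting at frame 25.
activations = np.zeros(300)
activations[25::50] = 1.0
print(track_beats(activations))
```

Each frame is consumed exactly once and nothing is buffered, so a decision is available as soon as the activation for that frame arrives. The sketch is not tuned for accuracy; with real, smoother activations a wider `beat_width` and a softer observation model would likely be needed.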
Because speed is important, we use an efficient state space that is an
enhanced version of the discrete beat pointer model and requires far fewer
particles. In addition, the transition model needs sampling only at certain
frames. On top of that, we introduce a new observation model, which improves
performance considerably. Figure 3 shows our state space.
Fig. 3: The state space, including beat states (blue dots) and non-beat states (grey dots)
In Fig. 3, the vertical axis represents the tempo states, which are defined by
different integer jumping intervals between adjacent states. The horizontal
axis represents the phase of the frame within the beat interval. The grey dots
are non-beat states, and the yellow dots are the beat positions in the
different setups we investigated [vertical line boundary, equal beat states
for each row, Gaussian soft transition between beat/non-beat states]. Finally,
Figure 4 shows how our model behaves over successive iterations.
Fig. 4: Proposed PF inference process. (a) Particles are initialized randomly
and start to move right one step per frame. (b) When the first strong beat
activation arrives, particles within the beat boundary gain weight while many
others are discarded. (c) Significant gatherings move right at different
paces. (d) Upon the arrival of the next beat activation, many gatherings are
discarded and the one with the correct tempo survives; a few double-tempo
investigators are also added. The blue line is the median of the particles’
positions.
Evaluation
Table 1 reports the F-measure of the proposed model versus other online beat
tracking methods. To compare our inference model against offline, non-causal
inference models, two offline methods are reported as well. As shown in Table
1, our proposed method significantly outperforms the state-of-the-art online
beat tracking methods. Furthermore, its response is immediate, and it doesn’t
require an initialization window.
Table 1: F-measure of the proposed model versus other online beat tracking methods and two offline methods
A short video demo of the performance of our method:
https://www.youtube.com/watch?v=u2Ee6WsNzoU
To learn more about this work and further details, please see our ICASSP 2021
paper, “DON’T LOOK BACK: AN ONLINE BEAT TRACKING METHOD USING RNN AND ENHANCED
PARTICLE FILTERING”, at the link below.
arXiv link: https://arxiv.org/abs/2011.02619