Music Rhythmic Analysis
This project spans several research papers on music rhythmic analysis. This post gives an overview of each paper, listed below:
1- Don't Look Back: An Online Beat Tracking Method Using RNN and Enhanced Particle Filtering
2- BeatNet: CRNN and Particle Filtering for Online Joint Beat Downbeat and Meter Tracking
3- A Novel 1D State Space for Efficient Music Rhythmic Analysis
4- Singing Beat Tracking With Self-supervised Front-end and Linear Transformers
5- SingNet: A Real-Time Singing Voice Beat and Downbeat Tracking System
6- BeatNet+: Real-Time Rhythmic Analysis for Diverse Music Audio (In Progress)
Fig. 1: Don't Look Back Model Overview Including Preprocessing, RNN and Particle Filtering Blocks
Fig. 2: Don't Look Back PF inference process: (a) Random particle initialization. (b) With a strong beat activation, particles near the beat boundary gain weight and the others are discarded. (c) Particles proceed forward. (d) The surviving swarm has the correct tempo and phase. Blue line: median of the particles' inferred positions.
Table 1: F-measure of online and offline beat tracking models, and initialization time of the online models (GTZAN dataset)
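For readers unfamiliar with the metric, the sketch below shows one common way to compute a beat-tracking F-measure: an estimated beat counts as a hit if it falls within a tolerance window of a reference beat that has not been matched yet. The 70 ms window (a common default in beat evaluation toolkits) and the greedy one-to-one matching are assumptions made for illustration; the post does not state the exact evaluation settings behind the table.

```python
def beat_f_measure(reference, estimated, tolerance=0.07):
    """F-measure between reference and estimated beat times, in seconds.

    Illustrative sketch: greedy one-to-one matching on sorted beat times,
    with an assumed tolerance of 70 ms.
    """
    reference = sorted(reference)
    estimated = sorted(estimated)
    hits = 0
    i = j = 0
    while i < len(reference) and j < len(estimated):
        diff = estimated[j] - reference[i]
        if abs(diff) <= tolerance:
            hits += 1      # matched pair: consume both beats
            i += 1
            j += 1
        elif diff < 0:
            j += 1         # estimate is too early for this reference beat
        else:
            i += 1         # reference beat was missed by the estimates
    if hits == 0:
        return 0.0
    precision = hits / len(estimated)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = [0.5, 1.0, 1.5, 2.0]
    est = [0.52, 1.0, 1.6, 2.01]   # the third estimate is 100 ms late
    print(beat_f_measure(ref, est))
```

In the usage example, three of the four estimates fall inside the window, so precision and recall are both 3/4.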
Fig. 3: BeatNet Pipeline
Fig. 4: BeatNet CRNN Neural Structure
Table 2: F-measure of online and offline beat and downbeat tracking models on three datasets
Paper Arxiv (PDF)
GitHub Source
Analyzing the temporal structure of music is computationally expensive, and state-of-the-art (SOTA) methods are impractical in real-world industrial settings.
We introduce a new state space and semi-Markov model that use a jump-back reward strategy to collapse the usual 2D state space into a 1D one, drastically reducing computational complexity.
Fig. 5: Comparison between traditional state spaces and the proposed 1D state space with jump-back reward for beat (a and b) and downbeat (c and d) tracking
Our proposed method matches the performance of SOTA joint causal models while running more than 30 times faster, making it highly applicable to large music collections in industrial scenarios.
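To give a rough feel for why collapsing the state space helps, the sketch below counts states under two simplifying assumptions that are not taken from the paper: the 2D space keeps one row of positions per candidate beat interval (measured in frames), and the 1D space only tracks the number of frames since the last beat, with a jump back to state 0 whenever a new beat starts. The function names and the interval range are hypothetical, and the resulting ratio is only a state-count comparison, not the measured speedup reported above.

```python
def count_2d_states(min_interval, max_interval):
    """States in an assumed position-by-tempo grid.

    Each candidate beat interval of n frames contributes n position states.
    """
    return sum(range(min_interval, max_interval + 1))


def count_1d_states(max_interval):
    """States in an assumed 1D space: frames elapsed since the last beat."""
    return max_interval


def successors_1d(state, min_interval, max_interval):
    """Allowed next states in the assumed 1D space.

    Either keep counting frames, or jump back to state 0 (a new beat),
    which is only allowed once the shortest beat interval has elapsed.
    """
    nxt = []
    if state + 1 < max_interval:
        nxt.append(state + 1)      # keep moving forward
    if state + 1 >= min_interval:
        nxt.append(0)              # jump back: a new beat starts
    return nxt


if __name__ == "__main__":
    # Hypothetical range: beat intervals between 14 and 54 frames.
    lo, hi = 14, 54
    print(count_2d_states(lo, hi))    # 1394
    print(count_1d_states(hi))        # 54
    print(successors_1d(13, lo, hi))  # [14, 0]
```

In a formulation like this, tempo is no longer a separate axis; it is implied by how many frames pass before each jump back.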
Table 3: Performance and speed comparison of the proposed model and previous SOTA models
Paper Arxiv (PDF)
GitHub Source
Beat tracking on singing voices is difficult because of the absence of strong rhythmic and harmonic patterns.
This paper pioneers singing beat tracking, leveraging pre-trained self-supervised WavLM and DistilHuBERT speech representations, with a self-attention encoder layer for beat prediction.
Fig. 6: Singing data and label generation pipeline.
Fig. 7: Neural network structures of the proposed models. (I), (II) and (III) use WavLM, DistilHuBERT and spectrogram front-end blocks, respectively, followed by the same linear transformer network.
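As background on the linear transformer mentioned above, here is a minimal pure-Python sketch of non-causal linear attention. It assumes the kernel feature map elu(x) + 1 that is common in the linear-attention literature; the post does not say which variant the models use, so treat this as an illustration of the general idea rather than the paper's network. The key point is that the key/value summaries are accumulated once, so the cost grows linearly with sequence length instead of quadratically.

```python
import math


def feature_map(x):
    """elu(x) + 1, which is always positive (assumed kernel feature map)."""
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(queries, keys, values):
    """Non-causal linear attention over lists of row vectors."""
    d_k = len(keys[0])
    d_v = len(values[0])
    # Accumulate sum_j phi(k_j) v_j^T and sum_j phi(k_j) in a single pass.
    kv = [[0.0] * d_v for _ in range(d_k)]
    k_sum = [0.0] * d_k
    for k, v in zip(keys, values):
        phi_k = [feature_map(x) for x in k]
        for a in range(d_k):
            k_sum[a] += phi_k[a]
            for b in range(d_v):
                kv[a][b] += phi_k[a] * v[b]
    # Each query reuses the same summaries; no N x N attention matrix is built.
    outputs = []
    for q in queries:
        phi_q = [feature_map(x) for x in q]
        norm = sum(phi_q[a] * k_sum[a] for a in range(d_k))
        outputs.append([
            sum(phi_q[a] * kv[a][b] for a in range(d_k)) / norm
            for b in range(d_v)
        ])
    return outputs


if __name__ == "__main__":
    q = [[0.1, -0.2], [1.0, 0.5]]
    k = [[0.3, 0.1], [-0.5, 0.2], [0.7, -1.0]]
    v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    for row in linear_attention(q, k, v):
        print(row)
```

Because the feature map is positive, each output row is a weighted average of the value rows, just as in softmax attention, only with a different weighting kernel.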
Experiments show that the proposed system outperforms state-of-the-art methods by a significant margin in beat tracking accuracy.
Table 4: Average performance and speed across segments of several methods on the GTZAN separated vocal tracks.
Paper Arxiv (PDF)
GitHub Source
Real-time singing voice beat and downbeat tracking faces challenges, including non-trivial rhythmic patterns and the impossibility of correcting inconsistent results.
We introduce the first real-time system for this task. Its dynamic particle filtering approach uses offline historical data to correct online inference, with a variable number of particles.
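The idea of correcting online inference with past information can be sketched as a toy example: beats already inferred from the past are extrapolated forward, a few new particles are injected around the implied phase and period, and the enlarged swarm is then resampled. Everything below is a hypothetical illustration (the particle layout as `(phase, period)` tuples, the median-interval extrapolation, the jitter value and all function names are assumptions), not the SingNet implementation.

```python
import random


def extrapolate_next_beat(past_beats):
    """Predict the next beat by repeating the median past inter-beat interval."""
    intervals = sorted(b - a for a, b in zip(past_beats, past_beats[1:]))
    period = intervals[len(intervals) // 2]
    return past_beats[-1] + period, period


def inject_particles(particles, past_beats, now, n_new, rng, jitter=0.01):
    """Return the swarm plus n_new particles placed around the extrapolated state.

    Each particle is a (phase, period) tuple, with phase in [0, 1) measured
    as the fraction of the beat period elapsed at time `now`.
    """
    _, period = extrapolate_next_beat(past_beats)
    phase = ((now - past_beats[-1]) / period) % 1.0
    injected = [
        ((phase + rng.gauss(0.0, jitter)) % 1.0, period)
        for _ in range(n_new)
    ]
    return particles + injected   # the number of particles grows here


def resample(particles, weights, n_out, rng):
    """Draw n_out particles with probability proportional to their weights."""
    return rng.choices(particles, weights=weights, k=n_out)


if __name__ == "__main__":
    rng = random.Random(0)
    history = [0.0, 0.5, 1.0, 1.5]            # beats inferred from past audio
    swarm = [(rng.random(), 0.5) for _ in range(10)]
    swarm = inject_particles(swarm, history, now=1.75, n_new=5, rng=rng)
    weights = [1.0] * len(swarm)              # placeholder for activation-based weights
    swarm = resample(swarm, weights, n_out=10, rng=rng)
    print(len(swarm))
```

In a real tracker the weights would come from the beat activation at the current frame, so injected particles survive resampling only when the audio agrees with the extrapolated beat grid.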
Fig. 8: Pipeline of the SingNet system
Fig. 9: Example of past-informed process for two time steps (I) and (II): (a) streaming audio arrival; (b) solid blue lines represent historical beats inferred by offline DBN, with dotted red lines as extrapolations; (c) injecting new particles (in green) into beat/tempo state space before resampling; (d) phase correction after resampling.
Experimental results show our approach outperforms the baseline by 3–5%, marking a significant advancement in real-time singing voice beat and downbeat tracking.
Table 5: Evaluation results (F1 scores in %) for different SingNet methods, compared with the baseline models.