The goal of this project was to address a few shortcomings of previous guitar tablature transcription models like TabCNN, which performs fret classification independently on each string. The main issue with this approach is that it does not explicitly take into account the correlation between the fretting of different strings, which arises from biomechanical feasibility and the musical relevance of various pitch intervals across the fretboard. Furthermore, it is challenging for a model to learn these relationships implicitly from data, due to the small size of contemporary guitar transcription datasets like GuitarSet.
We propose an alternative output layer formulation for guitar tablature transcription and introduce a novel accompanying inhibition objective to discourage the co-occurrence of unlikely string/fret (S/F) combination pairs when generating predictions. The inhibition weight for each S/F combination pair is informed by a corresponding pairwise likelihood, which is estimated from DadaGP, a large dataset of GuitarPro files (symbolic tablature). Each pairwise likelihood estimate is computed as the intersection-over-union of the occurrences of the two S/F combinations in the pair across the entire dataset. Illustrated as a symmetric matrix, the estimated pairwise likelihood for every S/F combination pair, ordered by string and fret (low-to-high), is proportional to the following:
The likelihoods range from zero to one (inclusive), where a likelihood of zero (dark purple) corresponds to a pair that never co-occurs, such as two frets on the same string, and a likelihood of one (bright yellow) corresponds to a pair that always co-occurs (the identity along the diagonal). The complements of the pairwise likelihoods serve as the inhibition weights. The inhibition loss for each S/F combination pair is the product of the two respective probability activations, scaled by the corresponding inhibition weight. The losses for all S/F combination pairs are summed to obtain the total inhibition loss, which is minimized jointly with the standard classification loss.
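The likelihood estimate and the inhibition loss described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation used in the project: the toy fretboard (2 strings with 2 fret classes each), the class indexing `string * 2 + fret`, the choice of a "frame" as the unit of co-occurrence, the function names `pairwise_iou` and `inhibition_loss`, and the choice to sum over unordered pairs are all hypothetical.

```python
def pairwise_iou(frames, num_classes):
    """Estimate pairwise likelihoods as intersection-over-union.

    frames: iterable of sets, each holding the S/F class indices that are
            active at one point in time (assumed unit of co-occurrence).
    Returns a symmetric num_classes x num_classes matrix (list of lists).
    """
    # occurrences[c] holds the indices of all frames where class c is active.
    occurrences = [set() for _ in range(num_classes)]
    for t, active in enumerate(frames):
        for c in active:
            occurrences[c].add(t)

    likelihood = [[0.0] * num_classes for _ in range(num_classes)]
    for i in range(num_classes):
        for j in range(i, num_classes):
            union = len(occurrences[i] | occurrences[j])
            inter = len(occurrences[i] & occurrences[j])
            # Classes that never occur get a likelihood of zero (assumption).
            value = inter / union if union else 0.0
            likelihood[i][j] = likelihood[j][i] = value
    return likelihood


def inhibition_loss(probs, likelihood):
    """Sum over unordered pairs of p_i * p_j * (1 - likelihood[i][j])."""
    n = len(probs)
    return sum(
        probs[i] * probs[j] * (1.0 - likelihood[i][j])
        for i in range(n)
        for j in range(i + 1, n)
    )


# Toy symbolic data: classes 0 and 1 are on string 0, classes 2 and 3 on string 1.
frames = [{0, 2}, {0, 3}, {1, 2}, {0, 2}]
likelihood = pairwise_iou(frames, num_classes=4)

# Activating two classes on the same string is penalized at full weight,
# while a pair that co-occurs often in the data is penalized less.
same_string = inhibition_loss([1.0, 1.0, 0.0, 0.0], likelihood)
common_pair = inhibition_loss([1.0, 0.0, 1.0, 0.0], likelihood)
print(same_string, common_pair)
```

In a trained model, the activations passed as `probs` would come from the output layer, and the resulting value would be added to the classification loss with some scaling factor; the scaling is not specified here.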
We evaluate several variations of the proposed output layer and inhibition objective and compare them against the output layer formulation used in TabCNN. We find that models trained using the inhibition objective produce tablature with lower inhibition loss, indicating that the tablature is more consistent with DadaGP. The produced tablature also contains fewer duplicate-pitch errors, which arise from the same pitch being falsely predicted on multiple strings.
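To make the notion of a duplicate-pitch error concrete, the following sketch counts them for a toy example. It is one possible way of counting, not necessarily the metric used in the evaluation. The assumptions are: standard tuning, a tablature frame represented as six fret numbers (or `None` for a silent string) ordered from the lowest to the highest string, and an error defined as each extra string on which a pitch is predicted beyond what the reference contains. The names `frame_pitches` and `count_duplicate_pitch_errors` are hypothetical.

```python
from collections import Counter

# MIDI pitches of the open strings in standard tuning, low E to high E.
STANDARD_TUNING = (40, 45, 50, 55, 59, 64)


def frame_pitches(frets, tuning=STANDARD_TUNING):
    """Count how many strings sound each pitch in one tablature frame."""
    return Counter(
        open_pitch + fret
        for open_pitch, fret in zip(tuning, frets)
        if fret is not None
    )


def count_duplicate_pitch_errors(predicted, reference, tuning=STANDARD_TUNING):
    """Count extra strings on which a pitch is predicted, frame by frame."""
    errors = 0
    for pred_frets, ref_frets in zip(predicted, reference):
        pred = frame_pitches(pred_frets, tuning)
        ref = frame_pitches(ref_frets, tuning)
        for pitch, count in pred.items():
            # A pitch may appear once, or as often as the reference plays it.
            allowed = max(1, ref[pitch])
            errors += max(0, count - allowed)
    return errors


# The reference plays E4 on the open high E string only.
reference = [[None, None, None, None, None, 0]]
# The prediction also places E4 on the B string at fret 5.
predicted = [[None, None, None, None, 5, 0]]

num_errors = count_duplicate_pitch_errors(predicted, reference)
print(num_errors)
```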
Frank Cwitkowitz, Jonathan Driedger, and Zhiyao Duan, A data-driven methodology for considering feasibility and pairwise likelihood in deep learning based guitar tablature transcription systems, in Proc. The Sound and Music Computing Conference (SMC), 2022, pp. 131-138.
This work has been partially funded by National Science Foundation grants IIS-1846184 and DGE-1922591.