Unlike images, audio signals are time-dependent and exhibit complex temporal dynamics, which makes it harder to generate realistic synthetic data that captures the nuances of real-world audio. This complexity, together with the scarcity of high-quality training data and the subjectivity of audio-quality evaluation, is why building near-flawless audio separation models remains an open challenge.
Yes, the Demucs source separation model has excellent SNR performance, but it is computationally intensive. It also uses a randomized mechanism at inference time, so each run can produce slightly different spectrograms.
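Much of that run-to-run variation comes from randomized time shifts applied at inference (Demucs' "shift trick"); averaging predictions over more shifts reduces it. A toy numpy sketch of the idea — `separate()` is a hypothetical stand-in for the real model (it just adds noise to mimic run-to-run variation), and `np.roll` stands in for proper padding:

```python
import numpy as np

def separate(audio):
    # Hypothetical stand-in for a stochastic separation model: adds a little
    # noise to mimic the run-to-run variation of the real network.
    rng = np.random.default_rng()
    return audio + 0.01 * rng.standard_normal(audio.shape)

def separate_stable(audio, shifts=8, max_shift=100):
    """Average predictions over random time shifts to stabilize the output."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(audio)
    for _ in range(shifts):
        s = int(rng.integers(0, max_shift))
        shifted = np.roll(audio, s)        # shift the input
        est = separate(shifted)            # run the (stochastic) model
        out += np.roll(est, -s)            # undo the shift before averaging
    return out / shifts

audio = np.sin(np.linspace(0, 20 * np.pi, 4096))
stable = separate_stable(audio)
print(stable.shape)
```

Averaging `shifts` independent runs shrinks the random component roughly by a factor of `sqrt(shifts)`, at the cost of proportionally more compute, which is exactly the trade-off behind the `--shifts` option in Demucs.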
Guitar chords are totally whacked. Get a guitarist to help you out with those.
m7 and M7 are a reasonable choice, though min7/Maj7 or mi7/Ma7 also work (whichever convention you pick, the pairs should match).
It shows G7 as 12ooo3, which is not how a guitarist would ever play that chord (32ooo1, 323oo3, or 353433, in increasing order of difficulty, would all be correct). Other chords have similar problems, e.g. C7 is shown as o3231o but should be x3231o.
Thank you for your feedback. Could you please email us your phone model and system version? We will investigate promptly. In the meantime, try quitting and relaunching the app to see if that helps, and please also check your network connection.
Source separation is commonly done by applying masks to the spectrogram; deep learning is used to estimate those masks for each instrument. As you mentioned, this is the approach we will follow in the subsequent steps.
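To make the masking idea concrete, here is a minimal numpy sketch. It skips the STFT and uses two toy magnitude spectrograms directly; the "ideal ratio mask" computed below is the kind of target a network is trained to predict from the mixture alone:

```python
import numpy as np

def apply_mask(mix_spec, mask):
    """Isolate one source by soft-masking the mixture spectrogram."""
    return mask * mix_spec

# Toy magnitude spectrograms (freq bins x time frames) for two sources.
rng = np.random.default_rng(0)
vocals = rng.random((5, 4))
drums = rng.random((5, 4))
mix = vocals + drums

# Ideal ratio mask: each bin's fraction of energy belonging to the vocals.
# A separation network learns to approximate this mask from `mix` alone.
mask = vocals / (vocals + drums + 1e-8)

est_vocals = apply_mask(mix, mask)
print(np.allclose(est_vocals, vocals, atol=1e-6))
```

With the ideal mask the reconstruction is essentially exact; in practice the network's predicted mask is imperfect, and the masked spectrogram is inverted back to audio with the mixture's phase.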
Thank you for raising the issue. We are continuously optimizing our model, and we are also collecting reports of UI and business-logic bugs; we will keep addressing them in future updates.
The guitar separation model is still under development and testing. Results so far are somewhat unsatisfactory because guitar tones vary widely, especially for electric guitars, which makes training more difficult.