Internship title: Blind estimation of audio effect chains using differentiable digital signal processing
Supervisors: Dominique Fourer, Geoffroy Peeters, and Côme Peladeau
Team / Laboratory: SIAM / IBISC (EA 4526) – Univ. Évry-Paris-Saclay
Collaborators: LTCI, Télécom Paris
Funding: ANR AQUA-RIUS project (https://fourer.fr/aquarius/)
1 Context
Since the rise of recording during the twentieth century, audio processors have been widely used in music production [1]. Processors usually fall into two categories: synthesizers, which generate a new signal, and audio effects, which transform an existing signal. The term “audio effects” covers a variety of processes: spectral processing (equalization), temporal effects (delay, reverberation), time-variant filtering (chorus, flanger, phaser), dynamics processing (compression, limiting), non-linear processing (distortion) [2], etc.
2 Goal
The work revolves around the task of blind estimation of audio effects. Audio effects take as input an audio signal x and a set of parameters v, and output an audio signal y, as shown in figure 1.
We aim to create a system which takes as input only the transformed signal y and computes an estimate of the parameters v. Traditional methods based on deep learning rely on supervised learning with data pairs {v, y}: a neural network 𝑓𝜃 is trained to minimize a parameter-based metric between the estimated parameters v̂ = 𝑓𝜃(y) and the ground truth parameters v, for instance the mean-squared error ∥v̂ − v∥₂². This approach is illustrated in figure 2.
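For illustration, the following Python/PyTorch sketch shows this supervised baseline. The clip length N, parameter count P, and encoder architecture are arbitrary placeholders, not part of the proposed work.

import torch
import torch.nn as nn

N, P = 16384, 4  # hypothetical clip length (samples) and number of effect parameters

# f_theta: a deliberately small encoder mapping an audio clip y to parameter estimates
f_theta = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=64, stride=16), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(16, P), nn.Sigmoid(),  # effect parameters assumed normalized to [0, 1]
)
optimizer = torch.optim.Adam(f_theta.parameters(), lr=1e-3)

def training_step(y, v):
    # One step on a batch of pairs {v, y}: minimize the parameter-based
    # metric, here the mean-squared error between v_hat and v.
    v_hat = f_theta(y.unsqueeze(1))      # (batch, N) -> (batch, P)
    loss = torch.mean((v_hat - v) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random stand-in data; real pairs would come from rendered effect outputs.
y, v = torch.randn(8, N), torch.rand(8, P)
print(training_step(y, v))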
Recent approaches propose to implement audio processors as differentiable units 𝑔. This is called differentiable digital signal processing (DDSP), a term coined by Engel et al. [3]. These differentiable processors allow computing the gradients of the outputs with respect to the input signals and parameters, so they can be inserted into deep learning pipelines. For example, in [4], we proposed to use differentiable audio effects (DDAFx) in our framework. We trained our neural network so that the output of the DDAFx matches the target sound, by minimizing an audio metric 𝑚 between the DDAFx output ŷ = 𝑔(x, v̂) and the ground truth signal y:
L(x, y, 𝜃) = 𝑚(𝑔(x, 𝑓𝜃(y)), y) (1)
= 𝑚(𝑔(x, v̂), y) (2)
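A minimal sketch of this objective is given below for illustration. The effect 𝑔 (a toy gain-plus-tanh waveshaper) and the metric 𝑚 (a single-resolution spectrogram distance) are simplistic stand-ins chosen for brevity, not the DDAFx or audio metric of [4]; 𝑓𝜃 can be any encoder, such as the one in the previous sketch.

import torch

def g(x, v_hat):
    # Toy differentiable effect: gain followed by a tanh waveshaper.
    # v_hat[:, 0] is the gain in [0, 1]; v_hat[:, 1] controls the drive.
    gain = v_hat[:, 0:1]
    drive = 1.0 + 9.0 * v_hat[:, 1:2]   # map [0, 1] to [1, 10]
    return torch.tanh(drive * gain * x) / torch.tanh(drive)

def m(y_hat, y, n_fft=1024):
    # Simplistic audio metric: L1 distance between magnitude spectrograms.
    window = torch.hann_window(n_fft)
    Y_hat = torch.stft(y_hat, n_fft, window=window, return_complex=True).abs()
    Y = torch.stft(y, n_fft, window=window, return_complex=True).abs()
    return torch.mean(torch.abs(Y_hat - Y))

def loss_fn(f_theta, x, y):
    # Equations (1)-(2): L(x, y, theta) = m(g(x, f_theta(y)), y).
    # Only the pair (x, y) is needed; no ground truth parameters appear.
    v_hat = f_theta(y.unsqueeze(1))
    return m(g(x, v_hat), y)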
These approaches, illustrated in figure 3, have two advantages:
- The ground truth parameters are no longer needed to train the network [5].
- The neural network achieves better performance in terms of audio matching.
In this approach, the audio effects chain 𝑔 has to be fixed. However, it could be beneficial to let the network predict an appropriate effects chain before estimating its parameters. Some works tackle this issue but rely on labeled data; we want our approach to rely on differentiable audio effects and to use only x and y. This could be done using a winner-takes-all training scheme [6], as sketched below, and would require designing and training an appropriate classifier. DDAFx typically suffer from high training costs [7], so attention should also be paid to keeping computation times low.
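For concreteness, one possible shape for such a winner-takes-all objective over K candidate chains is sketched below. All names (chains, heads, encoder) are hypothetical, and selecting a single winner per batch is a simplification of [6], which selects a winner per training example.

import torch

def wta_loss(x, y, chains, heads, encoder, m):
    # chains: list of K candidate differentiable effect chains g_k (hypothetical);
    # heads: list of K heads mapping a shared embedding of y to each chain's parameters;
    # m: an audio metric such as the spectrogram distance sketched above.
    z = encoder(y.unsqueeze(1))                 # shared embedding of the target signal
    losses = torch.stack([m(g_k(x, h_k(z)), y) for g_k, h_k in zip(chains, heads)])
    winner = torch.argmin(losses.detach())      # chain selection carries no gradient
    return losses[winner], winner               # only the winning chain is trained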
3 Required profile
- Strong knowledge in machine learning (deep learning) and signal processing (filtering, time-frequency analysis).
- Programming skills, particularly in Python and the PyTorch framework.
- High motivation, strong productivity, and a methodical approach to work.
- An interest in audio and music processing is a plus.
References
[1] Thomas Wilmering et al. “A History of Audio Effects”. In: Applied Sciences 10.3 (Jan. 2020), p. 791. issn: 2076-3417. doi: 10.3390/app10030791.
[2] Udo Zölzer, ed. DAFX: Digital Audio Effects. 2nd ed. Wiley, Mar. 2011. isbn: 978-0-470-66599-2. doi: 10.1002/9781119991298.
[3] Jesse Engel et al. “DDSP: Differentiable Digital Signal Processing”. In: Proc. of ICLR. Addis Ababa, Ethiopia: ICLR, Apr. 2020.
[4] Côme Peladeau and Geoffroy Peeters. “Blind Estimation of Audio Effects Using an Auto-Encoder Approach and Differentiable Digital Signal Processing”. In: Proc. of IEEE ICASSP. Seoul, South Korea: IEEE, Apr. 2024, pp. 856–860. doi: 10.1109/ICASSP48485.2024.10448301.
[5] Christian J. Steinmetz et al. “Automatic Multitrack Mixing with a Differentiable Mixing Console of Neural Audio Effects”. In: Proc. of IEEE ICASSP. Toronto, Canada: IEEE, June 2021, pp. 71–75. doi: 10.1109/ICASSP39728.2021.9414364.
[6] Stefan Lee et al. “Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles”. In: Proc. of NeurIPS. Vol. 29. Curran Associates, Inc., 2016.
[7] Chin-Yun Yu et al. “Differentiable All-Pole Filters for Time-Varying Audio Systems”. In: Proc. of DAFx. Surrey, United Kingdom: DAFx, 2024, pp. 345–352.
- Call date: 27/11/2024
- Call status: Not filled
- Contacts at IBISC: Dominique FOURER (MCF, IUT Évry, IBISC SIAM team) dominiqueDOTfourerATuniv-evryDOTfr, Geoffroy PEETERS, and Côme PELADEAU