The technology already exists, sort of -- see https://medium.com/the-research-nest/voice-cloning-using-deep-learning-166f1b8d8595 -- but it's still in its infancy and largely calibrated for *spoken* audio. I'd reckon it'll be able to do what you're describing within a year or two.
There's a huge emphasis on AI in audio restoration right now, with iZotope leading the pack. Spleeter also works extremely well, and in many cases outperforms iZotope's rebalancing tool, so I wouldn't be surprised if a lot of the innovation comes from outside the industry giants. There's definitely been a lot of chatter about developing machine learning tools to reconstruct degraded audio, particularly where non-degraded audio from the same (or a similar) track can be used as a reference.
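To give a flavor of that reference-guided idea, here's a deliberately crude, non-ML sketch in numpy: estimate a per-frequency gain from a clean reference's average magnitude spectrum and apply it to the degraded signal. The function name, frame size, and approach are all just illustrative -- real tools learn far richer, time-varying mappings than a single static EQ curve.

```python
import numpy as np

def reference_eq_match(degraded, reference, n_fft=1024):
    """Toy reference-guided restoration (illustrative only):
    match the degraded signal's average magnitude spectrum to a
    clean reference's, keeping the degraded signal's phase."""
    def avg_spectrum(x):
        # Chop into non-overlapping frames and average the magnitude spectra
        frames = x[: len(x) // n_fft * n_fft].reshape(-1, n_fft)
        return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

    eps = 1e-8  # avoid division by zero in empty bins
    gain = avg_spectrum(reference) / (avg_spectrum(degraded) + eps)

    # Apply the per-bin gain frame by frame
    frames = degraded[: len(degraded) // n_fft * n_fft].reshape(-1, n_fft)
    spec = np.fft.rfft(frames, axis=1) * gain
    return np.fft.irfft(spec, n=n_fft, axis=1).ravel()
```

An actual ML system would replace that single gain curve with a learned model conditioned on the reference, but the core premise -- clean material from the same or similar source telling you what the damaged material *should* look like spectrally -- is the same.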