Abstract:
To improve the quality of generated speech, this paper proposes a method that accounts for time-varying speaker information. With this technique, the system synthesizes more natural speech in a voice similar to the given target voice, in both voice cloning and voice conversion tasks.