Microsoft's new open-source TTS model can synthesize feature-length audio with multiple speakers, but comes with audible disclaimers and watermarking to prevent misuse.
Microsoft's VibeVoice model can generate 90-minute multi-speaker podcasts that blur the line between synthetic and human speech, raising ethical questions about audio deepfakes.
With 2.65x faster CPU inference, BitDistill signals a potential shift toward CPU-efficient AI deployment, reducing reliance on expensive GPU infrastructure.
Microsoft's UserLM-8b flips the script by training AI to think like messy, inconsistent humans instead of perfect assistants.