Edge TTS

(github.com)

91 points

by: smy20011

18 hours ago

60 comments

  • BrunoJo

    15 hours ago

    I wouldn't use Edge TTS for commercial projects since it's using an internal Microsoft API that was reverse engineered.

    If you are looking for a commercial API, I just launched a TTS API powered by the best performing open source model, Kokoro: https://www.lemonfox.ai/text-to-speech-api. The API is compatible with OpenAI and ElevenLabs and up to 25x cheaper.

    rany_

    13 hours ago

    <@BrunoJo> It's worth noting that there have been occasions where the library was blocked and it took a few weeks to work around said block. For example, when a valid Sec-MS-Token became required, it took a while to implement it in the library: https://github.com/rany2/edge-tts/blob/08b10b931db3f788a506c...

    Basically, it's a very bad idea to use this library for anything serious/mission critical. It also is really limited to only taking in text (i.e., no custom SSML, emotion elements, etc) as Microsoft restricts the API to only the features Microsoft Edge itself already supports. Generally commercial users would want these more advanced features and so they'd want to use Azure Cognitive Services.

    At any rate this library was never really marketed, I'm not sure how it blew up. It was really only intended so that I can have audio files I can play back for my Home Assistant instance. Later, I started using it to generate e-books. In general, these are the two main uses of the library AFAIK.

    ghxst

    1 hour ago

    <@rany_> > no custom SSML

    I believe this used to be available for edge tts, very sad to see they removed it.

    If anyone knows of comparable projects that implement something like SSML please do share.

    bilater

    4 hours ago

    <@BrunoJo> Nice I was thinking about launching an API because providers like Replicate have long queues. I think if you can nail down latency and concurrency you may get a lot of users who need reliable fast TTS.

    laurentlb

    11 hours ago

    <@BrunoJo> Interesting, I'm interested in something like this, but the page doesn't have much information.

    - What languages are supported?
    - How many voices are available?
    - Is it possible to use without a monthly subscription? I'd rather pay only based on my usage (I don't use it every month).

    For my use case, I'd need access to a wide variety of languages, and ideally 5+ voices per language. I'm currently using Amazon Polly, but I wonder if there's something better now.

    dqv

    11 hours ago

    <@BrunoJo> Ah, I'm always looking for new ones, but it doesn't look like it supports SSML. Most engines have trouble with things like postal codes, names, and other implicit linguistic rules. Take the example

    > Melania Trump's zip code is 20001.

    It says "Melaynia Trump's zip code is twenty-thousand one". With SSML, you can tell the engine the correct pronunciation and to say a string of numbers digit-by-digit. Spelling proper nouns differently to trick it into pronouncing it correctly works until it doesn't.

    Being able to tell it to pronounce "Melania" like [ˌməˈlɑːn.jə] or [%m@"lA:n.j@] and tweak other aspects of the synthesis with SSML is, in my opinion, an important part of a commercial speech synthesis offering.

    I wonder how much effort is needed to make these engines work with SSML. Kokoro+SSML would be awesome.
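
    A minimal sketch of the kind of markup dqv describes, built with Python's standard library. The `phoneme` and `say-as` elements come from the W3C SSML specification, but the `digits` value for `interpret-as` is an assumption: the set of accepted values varies by engine, so check your provider's documentation. `build_ssml` is a hypothetical helper, not part of any TTS library.

```python
import xml.etree.ElementTree as ET


def build_ssml(name, ipa, middle, digits, lang="en-US"):
    """Return SSML that pins the pronunciation of `name` and reads `digits` one by one."""
    speak = ET.Element(
        "speak",
        {
            "version": "1.0",
            "xmlns": "http://www.w3.org/2001/10/synthesis",
            "xml:lang": lang,
        },
    )
    # <phoneme> overrides the engine's own guess at how to say the name.
    phoneme = ET.SubElement(speak, "phoneme", {"alphabet": "ipa", "ph": ipa})
    phoneme.text = name
    phoneme.tail = middle
    # <say-as> asks for digit-by-digit reading; "digits" is an assumed value.
    say_as = ET.SubElement(speak, "say-as", {"interpret-as": "digits"})
    say_as.text = digits
    say_as.tail = "."
    return ET.tostring(speak, encoding="unicode")


# The example sentence from the comment above, with the IPA string dqv gives.
ssml = build_ssml("Melania", "ˌməˈlɑːn.jə", " Trump's zip code is ", "20001")
```

    The resulting string would be sent to an engine that accepts SSML input; whether the engine honors the IPA transcription exactly is up to that engine.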

    bsenftner

    9 hours ago

    <@BrunoJo> Hey BrunoJo, I'd like to learn more about lemonfox.ai, but there does not seem to be information such as "about us" links. Your service looks worth investigating.

    hobo_mark

    11 hours ago

    <@BrunoJo> I wish Kokoro supported SSML... Is there a way to explicitly emphasize parts of the text?

  • modeless

    17 hours ago

    Why would you pirate a TTS service when there are so many great options for local open source TTS now? Models like Fish and Kokoro and StyleTTSv2 are great and very fast.

    Click the leaderboard tab here: https://huggingface.co/spaces/TTS-AGI/TTS-Arena

    itake

    16 hours ago

    <@modeless> The models you shared only support the top ~10 languages, or English only.

    I believe the Edge API supports more models:

    https://gist.github.com/BettyJJ/17cbaa1de96235a7f5773b8690a2...

    Do you know of any commercially licensed TTS engines that support 50+ languages and are relatively small (e.g. many small models, not 1 big model)? Meta's open models support something like 300 languages, but the license doesn't permit commercial use :-/
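
    The language coverage itake mentions can be checked directly. A sketch, assuming the `edge-tts` package is installed and that `edge_tts.list_voices()` returns a list of dicts with a `Locale` key such as `th-TH`; `count_by_language` and `main` are illustrative names, not part of the library.

```python
import asyncio
from collections import Counter


def count_by_language(voices):
    """Count voices per language code, e.g. 'th' for the locale 'th-TH'."""
    return Counter(voice["Locale"].split("-")[0] for voice in voices)


async def main():
    # Imported lazily so count_by_language works without the package installed.
    import edge_tts  # third-party: pip install edge-tts

    voices = await edge_tts.list_voices()  # needs network access
    for language, total in sorted(count_by_language(voices).items()):
        print(language, total)


# To run against the live service:
# asyncio.run(main())
```

    The command-line tool offers the same listing through `edge-tts --list-voices`.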

    archerx

    15 hours ago

    <@itake> I have been experimenting with Piper TTS recently. It's free, open source, fast, and has a lot of voices in different languages. The quality is not the best, but it's still good enough for most cases.

    https://rhasspy.github.io/piper-samples/

    magicalhippo

    14 hours ago

    <@archerx> For my native language, Norwegian, Piper TTS is at best "usable", and sometimes a fair bit worse than that. At least in its default form[1].

    The rhythm and timing especially are often very jarring, making words difficult to understand, particularly when the pitch is not quite right.

    It also doesn't seem to know about pacing, ignoring semicolon and comma.

    Combined I often need to think hard about what it just said, or even listen to it again.

    I also notice these issues in the various English voice models to varying degrees, so seems to be an inherent problem. Or can it be improved significantly with training it yourself?

    [1]: https://rhasspy.github.io/piper-samples/

    archerx

    13 hours ago

    <@magicalhippo> I don’t know about Norwegian but I wonder if the issues are due to the training data.

    I’m sure it’s possible to train new voices.

    The English voices are hit or miss, but some voices have up to 900 speakers, so it should be possible to find a nice voice in the haystack.

    The thing I like about piper is it is so fast. I set it up to stream the output to VLC and it starts speaking in less than a second even on my laptop.

    I wish it could have ElevenLabs quality, but right now the speed is the most important factor for what I’m doing with it.
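
    A rough sketch of the streaming setup archerx describes. It swaps VLC for `aplay`, because that is the raw-audio pipeline shown in the Piper README; the model filename and the 22050 Hz sample rate are assumptions that depend on the voice you download, and the function names are illustrative.

```python
import subprocess


def piper_command(model_path):
    """Piper invocation that writes raw PCM audio to stdout as it is generated."""
    return ["piper", "--model", model_path, "--output-raw"]


def player_command(sample_rate=22050):
    """aplay invocation that plays raw 16-bit little-endian PCM from stdin."""
    return ["aplay", "-r", str(sample_rate), "-f", "S16_LE", "-t", "raw", "-"]


def speak(text, model_path, sample_rate=22050):
    """Pipe text into Piper and stream its audio straight into the player."""
    piper = subprocess.Popen(
        piper_command(model_path), stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    player = subprocess.Popen(player_command(sample_rate), stdin=piper.stdout)
    piper.stdout.close()  # the player now owns the read end of the pipe
    piper.stdin.write(text.encode("utf-8") + b"\n")
    piper.stdin.close()  # end of input: Piper finishes once the text is spoken
    player.wait()
    piper.wait()


# Example (requires piper, aplay, and a downloaded voice model):
# speak("Hello from Piper.", "en_US-lessac-medium.onnx")
```

    Because playback starts as soon as the first audio chunk arrives, the delay before speech begins does not depend on the full length of the text.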

    magicalhippo

    12 hours ago

    <@archerx> I saw that the piper-phonemize project linked to espeak-ng, and so I tried to pass the Piper sample text through espeak-ng and the way it phonemicized the text had the same rhythm issues that I noted in the TTS sample. I.e., it put the stresses in the same wrong places in certain words and such.

    This was also reflected in the voice output of espeak-ng, even though its overall quality was vastly subpar compared to Piper TTS (as expected).

    So it seems that improving this aspect might be one way to get better performance out of Piper for my language. Not sure how easy that'll be tho...
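
    The check magicalhippo describes can be repeated with a few lines of Python. A sketch, assuming `espeak-ng` is on the PATH and that `nb` is the voice code for Norwegian Bokmål in your build; `phonemize_command` and `phonemize` are illustrative names.

```python
import subprocess


def phonemize_command(text, voice="nb", ipa=True):
    """Build an espeak-ng command that prints phonemes instead of playing audio."""
    command = ["espeak-ng", "-q", "-v", voice]  # -q: no audio output
    # --ipa prints IPA symbols; -x prints espeak's own phoneme mnemonics.
    command.append("--ipa" if ipa else "-x")
    command.append(text)
    return command


def phonemize(text, voice="nb", ipa=True):
    """Run espeak-ng and return the phoneme string, stress marks included."""
    result = subprocess.run(
        phonemize_command(text, voice, ipa),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Example (requires espeak-ng to be installed):
# print(phonemize("Dette er en test."))
```

    Comparing the stress marks in the output against how a native speaker would stress the words shows whether the phonemizer, rather than the voice model, is the source of the rhythm problems.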

    rolfus

    12 hours ago

    <@magicalhippo> What TTS model has given the best results for you (for Norwegian)? I've tried MS Azure and it's pretty good, but not flawless.

    magicalhippo

    8 hours ago

    <@rolfus> I haven't found any open source models that come close to the commercial offerings, though I admit I haven't tried 'em all.

    Azure like you say is pretty decent, Google does an ok enough job but not as good.

    deadprogram

    14 hours ago

    <@archerx> I also have used Piper and agree it is worth trying out.

    lupusreal

    4 hours ago

    <@archerx> Piper is superb for my needs. Runs extremely fast on CPU (so fast it can run in real time on a raspi) so it's perfect for use on laptops without dedicated GPUs. Subjectively, I'd say the quality is about on par with where MacOS's TTS was about 10 years ago, which is extremely usable.

    willwade

    11 hours ago

    <@itake> https://ttsvoicesavailable.streamlit.app

    Acapela, Nuance - but it's around 75 languages.

    itake

    4 hours ago

    <@willwade> I really want Southeast Asian languages (Thai, Lao, etc.). It seems only MS supports those.

    depr

    2 hours ago

    <@willwade> Isn't that Nuance product EOL?

    modeless

    15 hours ago

    <@itake> I don't know, but the Edge API is not licensed for any use, commercial or otherwise (outside of Edge itself).

    userbinator

    17 hours ago

    <@modeless> "pirate"? This was always free.

    modeless

    16 hours ago

    <@userbinator> The API endpoint was clearly intended for use only by Edge. Yes, reverse engineering the authentication (even if trivial) and using it for other applications, knowing that was not its intended use, I consider a form of piracy.

    itake

    13 hours ago

    <@modeless> I'm not really sure how this is any different from a web crawler? I guess the issue would be republishing the content is bad.

    But I thought the LinkedIn lawsuit settled that crawlers are ok, as long as you're not republishing the content?

    userbinator

    16 hours ago

    <@modeless> That is a very hazardous slope to go down. We are already seeing user-agent discrimination and this is no different than using Bing from a browser that isn't Edge.

    TOMDM

    16 hours ago

    <@userbinator> If Bing weren't a public website and were only accessible through the Windows search bar/Edge without reverse engineering the API, I'd agree with you.

    Comparing an API that typically requires a key and a public website is absurd.

    userbinator

    15 hours ago

    <@TOMDM> It's still publicly accessible.

    natebc

    7 hours ago

    <@modeless> Is Kokoro open source? I couldn't find its source anywhere.

    noja

    9 hours ago

    <@modeless> Typing anything with “r” into that text to speech box gives a random sentence instead

  • hexage1814

    14 hours ago

    Have been using this for some time. It is pretty good, but not as good as ElevenLabs.

    Also, ironically enough, ElevenLabs launched a reader app for iOS and Android, which allows you to use text to speech for "free" with a limited selection of voices, but the app is not available for PC or as a browser extension. So it's like "we give you unlimited TTS, but only if you use your smartphone".

  • chopete3

    17 hours ago

    It's not running on the edge. It's a hack to use MS online TTS.

    >> edge-tts is a Python module that allows you to use Microsoft Edge's online text-to-speech service from within your Python code or using the provided edge-tts or edge-playback command.
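
    For readers who have not seen the module, a minimal sketch of calling it from Python. It assumes the `edge-tts` package is installed and that the `en-US-AriaNeural` voice is available; `synthesize` is an illustrative wrapper, not a function the library provides. Every call goes over the network to Microsoft's service.

```python
import asyncio


async def synthesize(text, out_path, voice="en-US-AriaNeural"):
    """Send text to the online service and save the returned audio to out_path."""
    # Imported lazily so this file can be loaded without the package installed.
    import edge_tts  # third-party: pip install edge-tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)


# Example call (needs network access):
# asyncio.run(synthesize("Hello, world!", "hello.mp3"))
```

    As rany_ notes elsewhere in the thread, the endpoint can be blocked or changed without notice, so code like this may stop working until the library is updated.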

    nejsjsjsbsb

    13 hours ago

    <@chopete3> I read it first as Edge TTL!

    wiradikusuma

    16 hours ago

    <@chopete3> Edge = Microsoft Edge, a browser

    croes

    16 hours ago

    <@wiradikusuma> I guess the parent just wanted to clarify that it’s using Edge, not running on the edge.

  • dcre

    15 hours ago

    Not sure if the CLI does this directly, but here's a command that takes text either as an arg or through stdin.

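        # Speak text given as arguments, or piped in on stdin.
        function tts() {
          if [ -p /dev/stdin ]; then
            # stdin is a pipe: have edge-playback read the text from it
            edge-playback --file -
          else
            # otherwise treat all arguments as the text to speak
            edge-playback --text "$*"
          fi
        }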

  • slyn

    16 hours ago

    I like to use Edge on occasion when I need to read something dry but necessary, because I find that following along with the TTS and its auto-highlighting of text helps me stay focused and retain more.

    Is there any equivalent program for ebooks? If not can someone build one? The dream would be to plop in an arbitrary document (pdf, docs, tex, epub, and so on) and have it read to me by a reasonable TTS at a speed of my choosing and have words / lines highlighted as the TTS goes along. Bonus points if you can regularly identify and skip things that are not necessarily relevant like page numbers, headers, footnote markers, and so on, which is something that Edge TTS within Edge struggles with when reading PDFs.

    FireInsight

    14 hours ago

    <@slyn> I've been using https://readest.com/ lately. It's FOSS and just recently got this feature. The TTS voices are pretty natural and text is highlighted one sentence at a time. Plus the design of the product is great.

    visarga

    14 hours ago

    <@slyn> https://www.naturalreaders.com/, it has a free tier I think

    laurentlb

    10 hours ago

    <@visarga> If anyone else wonders, naturalreaders provides no API.

    lf-non

    15 hours ago

    <@slyn> The ReadEra app for Android supports this, and I use it for reading/listening to ebooks during my commute. It works well.

    jahsome

    14 hours ago

    <@slyn> Calibre does this.

    tomr75

    10 hours ago

    <@jahsome> can you use TTS models?

    gostsamo

    15 hours ago

    <@slyn> You can use a screen reader. Most of them have a focus highlight feature and use local tts.

  • slig

    6 hours ago

    I'd like the equivalent of "say" from macOS on my W11/WSL2 machine, is there anything entirely offline that just works?

  • VMtest

    8 hours ago

    Thanks for sharing this. I learnt that Edge on mobile has TTS as well, but I have never used it on desktop or mobile.

    Now that I try it on desktop, it's really good! I might try to use the Python script in the future.

  • caseyy

    16 hours ago

    So is this entirely offline? If so, it could have quite many useful applications, if not for copyleft of course.

    userbinator

    16 hours ago

    <@caseyy> Entirely online.

    caseyy

    16 hours ago

    <@userbinator> Ah, thanks.

  • RobinHirst11

    12 hours ago

    Used this for ages. I have my Raspberry Pi set up with Cloudflare tunnels to route to my domain... extremely useful :)

  • gigel82

    15 hours ago

    This is dubious, I'm surprised MS hasn't locked down those APIs yet.

    I'm curious, would this be the legal equivalent of "cracked" software in terms of piracy?

    rany_

    13 hours ago

    <@gigel82> They have locked down these APIs slightly but it's not a very complex "DRM" mechanism: https://github.com/rany2/edge-tts/blob/08b10b931db3f788a506c...

    ale42

    12 hours ago

    <@rany_> Sure, but if everybody starts (ab)using this they'll change it again with something more complex. Or they will restrict it, like leaving it usable only by users logged in on their MS account.

    bangaladore

    6 hours ago

    <@ale42> In reality, you should be more worried about a DMCA claim or cease and desist.

    Microsoft cannot move fast enough to present any real concern to someone who is dedicated.

    The Microsoft login seems more reasonable, at that point they can filter out bad actors presumably.

  • yapyap

    4 hours ago

    Do what now to TTS
