@ninjawedding @dogjaw @kitredgrave based on the number of my own posts getting tagged "not English", I'm gonna bet the answer is "yes"
@animeirl @dogjaw @KitRedgrave I'm much more concerned about the Japanese instances. Look at @nullkal@mstdn.jp's Atom feed, for example...there are a lot of misclassifications. If an admin there selects "show only Japanese" it doesn't seem like it'll end well.

There was a change pushed to master which strips URLs from the text-under-consideration and that *might* help, but not all of these have URLs in them.
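For anyone wondering what "strip URLs before detection" looks like in practice, here's a rough sketch. It's Python, not Mastodon's actual Ruby code, and the pattern and the `strip_urls` name are made up for illustration. The idea is just that the detector never sees the URL text, which is mostly ASCII and would push short posts toward "English".

```python
import re

# Hypothetical pattern: matches http(s) URLs up to the next whitespace.
# The real change in Mastodon may use a different, stricter pattern.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(text: str) -> str:
    """Remove URLs and collapse leftover whitespace before language detection."""
    without_urls = URL_PATTERN.sub(" ", text)
    # split() with no arguments drops runs of whitespace, including the gaps
    # left behind where URLs were removed.
    return " ".join(without_urls.split())
```

A post that is only a URL comes back as an empty string, so the caller would still need some fallback for that case. And as noted above, this does nothing for misclassified posts that never had a URL in them.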

Perhaps Mastodon can switch to a better language detector at some point.  The Chrome library, for example, seems decent -- but it's a pain in the ass to extract.

@ninjawedding @animeirl i'm honestly against the entire idea, because machines are not great at this task

Kaito / Katie Sinclaire @KS

@KitRedgrave @ninjawedding @animeirl And it assumes posts won't contain multiple languages worth of text, which is also not necessarily correct...

I can immediately think of JP/EN mixed *tweets* as an example. The higher character limit just makes it more likely to happen.
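To make the mixed-language point concrete, here's a minimal sketch in Python that checks which scripts appear in a post. It isn't a language detector and isn't anything Mastodon does. The function names are hypothetical, and it only leans on the standard `unicodedata` module, taking the first word of each letter's Unicode name as a rough script label.

```python
import unicodedata

# Rough script labels as they appear at the start of Unicode character names.
JAPANESE_SCRIPTS = {"HIRAGANA", "KATAKANA", "CJK"}


def scripts_in(text: str) -> set:
    """Return the set of script labels for the letters found in text."""
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "HIRAGANA LETTER A" -> "HIRAGANA"
                found.add(name.split()[0])
    return found


def looks_mixed_jp_en(text: str) -> bool:
    """True if the text has Latin letters alongside kana or CJK ideographs."""
    scripts = scripts_in(text)
    return "LATIN" in scripts and bool(scripts & JAPANESE_SCRIPTS)
```

Something like `looks_mixed_jp_en("今日は good day")` flags the post as mixed, which is exactly the case where a single language tag per post can't be right. Script is only a proxy, of course: it can't tell English from French, or Japanese kanji from Chinese.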
