
MT quality – Automatic metrics or manual evaluation?

Automatic evaluation
One method is the BLEU score (bilingual evaluation understudy). Here the MT output is compared to a human reference translation: the higher the similarity between the MT output and the human version, the higher the translation quality is considered to be.
However, automatic evaluation ignores legitimate differences in style and terminological choice: a perfectly good translation can score poorly simply because it words things differently from the reference.
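To make the idea concrete, here is a minimal sketch of sentence-level BLEU in plain Python: a geometric mean of clipped n-gram precisions multiplied by a brevity penalty. This is an illustration of the principle, not a replacement for standard implementations (which typically work at corpus level and apply smoothing).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference.

    Geometric mean of clipped 1..max_n-gram precisions, times a
    brevity penalty for candidates shorter than the reference.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clipped counts: each candidate n-gram is credited at most
        # as many times as it appears in the reference.
        overlap = sum((cand_counts & ref_counts).values())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    # Brevity penalty punishes translations shorter than the reference.
    bp = (1.0 if len(candidate) >= len(reference)
          else math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = "the cat is on the mat".split()
print(bleu("the cat is on the mat".split(), reference))   # identical output scores 1.0
print(bleu("the cat is on a mat".split(), reference))     # close output scores lower
print(bleu("the the the the the the".split(), reference)) # degenerate output scores 0.0
```

The clipping step is what stops a candidate from gaming unigram precision by repeating a common word, as the last example shows.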

Manual evaluation
This is done by asking human judges, sometimes through crowd-sourcing platforms.
In this case, people were asked through Amazon Mechanical Turk to assess whether the translations were adequate and fluent in their native language.

Research is ongoing in both directions, and automatic and manual MT evaluation are still considered complementary methods.

You can read the full article here.


