MT quality – Automatic metrics or manual evaluation?
One automatic method is the BLEU score (Bilingual Evaluation Understudy). Here the MT output is compared to a human reference translation: the more word sequences the MT output shares with the human version, the higher the translation quality is considered to be.
However, automatic evaluation does not account for legitimate differences in style and terminology: a correct translation that is worded differently from the reference can still receive a low score.
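To make the idea of "similarity to a human translation" concrete, here is a minimal sketch of a BLEU-style score in Python. It is a simplification, not the reference implementation: it assumes a single reference, whitespace tokenization, no smoothing, and a sentence-level score, whereas BLEU is normally computed over a whole corpus and often against several references. The function names `ngrams` and `simple_bleu` are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference (no smoothing)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a matching word cannot inflate the score.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or overlap == 0:
            # Without smoothing, a single zero precision zeroes the score.
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(simple_bleu("the cat sat on the mat", reference))     # identical wording
print(simple_bleu("the cat sat on a mat", reference))       # one word changed
print(simple_bleu("the feline sat on the rug", reference))  # acceptable paraphrase
```

An output identical to the reference scores 1.0, while the paraphrase in the last call shares no four-word sequence with the reference and therefore scores 0 in this unsmoothed version, even though a human reader might accept it. That is the weakness with style and terminology described above.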
Manual evaluation is done by asking humans, sometimes through crowd-sourcing platforms.
In this case, people were asked through Amazon Mechanical Turk to assess whether the translations into their native language were adequate and fluent.
Research is ongoing in both directions, and automatic and manual MT evaluation are still considered complementary methods.
You can read the full article here.