Automatic evaluation
One method is the BLEU score (bilingual evaluation understudy). Here the MT output is compared with a human reference translation: the higher the similarity between the MT output and the human version, the higher the translation quality is considered to be.
However, automatic evaluation ignores differences in style and in terminological choices, which a human reader would accept as equally valid.
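As a rough illustration, here is a minimal Python sketch of a simplified sentence-level BLEU: modified n-gram precisions combined with a brevity penalty. The example sentences and function name are invented for illustration; real BLEU implementations score a whole corpus and support multiple references per segment.

```python
from collections import Counter
import math

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip candidate counts by the reference counts ("modified" precision).
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)

    if min(precisions) == 0:
        return 0.0

    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

mt_output = "the cat sat on mat".split()
human_ref = "the cat sat on the mat".split()
print(round(bleu(mt_output, human_ref), 3))  # higher means closer to the reference
```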
Manual evaluation
Manual evaluation is done by asking human judges, sometimes through crowd-sourcing platforms.
In this case, people were asked through Amazon Mechanical Turk to assess whether the translations were adequate and fluent in their native language.
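In practice, manual evaluation of this kind reduces to aggregating per-segment ratings. The short sketch below (with invented segment IDs and scores, assuming a 1-5 adequacy/fluency scale, which is not specified in the article) shows the kind of aggregation involved.

```python
from statistics import mean

# Hypothetical crowd-sourced judgments: each segment receives several 1-5
# ratings for adequacy (is the meaning preserved?) and fluency (does it read
# naturally in the target language?).
ratings = {
    "segment-1": {"adequacy": [4, 5, 4], "fluency": [3, 4, 4]},
    "segment-2": {"adequacy": [2, 3, 2], "fluency": [4, 4, 5]},
}

for segment, scores in ratings.items():
    print(segment,
          "adequacy:", round(mean(scores["adequacy"]), 2),
          "fluency:", round(mean(scores["fluency"]), 2))
```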
Conclusions
Research is continuing in both directions, and automatic and manual MT evaluation are still considered complementary methods.
You can read the full article here.
Kirti Vashee • There is more discussion on this at:
http://kv-emptypages.blogspot.com/2010/03/problems-with-bleu-and-new-translation.html
http://kv-emptypages.blogspot.com/2012/01/short-guide-to-measuring-and-comparing.html
And comments on MT quality in terms of its productivity implications:
http://kv-emptypages.blogspot.com/2012/03/exploring-issues-related-to-post.html
Tucker Maney, Linda Sibert, Dennis Perzanowski, Kalyan Gupta and Astrid Schmidt-Nielsen (2012) Toward Determining the Comprehensibility of Machine Translations.
See http://wing.comp.nus.edu.sg/~antho/W/W12/#2200