language technology

WebCorp – web as corpus – now it’s easy to query!

Corpora are traditionally clean, controlled data sets, but they do not always let you observe every kind of linguistic behavior, and “there will always be aspects of the language which are too rare or too new to be evidenced” (

WebCorp was created and is maintained by the Research and Development Unit for English Studies in the School of English at Birmingham City University.

Why use WebCorp instead of an ordinary search engine?
WebCorp offers better analysis of results because it includes filtering options designed specifically for linguistic research.
In response to a query, standard search engines return a list of page addresses, together with a description of the site or some text taken from each web page. WebCorp actually checks each one of these pages and extracts concordance lines from them. Some search engines, such as Google, give Key Word in Context style output for some URLs in the results list, but this does not happen for all URLs, nor for all instances of the search term.
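To make the idea of concordance lines concrete, here is a minimal Python sketch of Key Word in Context extraction from a piece of text that has already been downloaded. It is not WebCorp's own code: the function name `kwic`, the `width` parameter, and the bracketed output layout are all assumptions made for illustration.

```python
import re


def kwic(text, keyword, width=30):
    """Return one Key Word in Context line per match of keyword in text.

    Hypothetical helper for illustration only, not part of WebCorp.
    """
    # Collapse runs of whitespace so each line break in the page
    # does not break up the context window.
    flat = " ".join(text.split())
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


if __name__ == "__main__":
    sample = "A corpus is a collection of texts. The web can act as a corpus too."
    for line in kwic(sample, "corpus", width=20):
        print(line)
```

A tool like WebCorp would apply something along these lines to every page returned for a query, which is why it reports every instance of the search term instead of one snippet per URL.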

The engine is a bit slow, so please wait about ten seconds before leaving the page. The depth of the analysis makes the delay acceptable.

Follow the link to try WebCorp

