Here’s one for the linguistics enthusiasts out there!
A treebank, according to Wikipedia…
…is a text corpus in which each sentence has been annotated with syntactic structure. Syntactic structure is commonly represented as a tree structure, hence the name treebank. Treebanks can be used in corpus linguistics for studying syntactic phenomena or in computational linguistics for training or testing parsers.
Simple, right?
Seriously though, this is cool stuff. I have a peripheral interest in sociolinguistics and computational linguistics, and though I don’t have much direct use for these tools, I recognize that having a corpus and a treebank for a given language opens up a lot of doors. Say you want to record a conversation and then analyze it syntactically: a parser trained on a treebank would let you feed in a transcription of the recording and get its syntactic structure back.
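For the curious, here’s a rough idea of what “annotated with syntactic structure” can look like in practice. This is only a sketch: it assumes the bracketed notation used by many treebanks (the Portuguese treebank I mention here may well use a different format), and the example sentence, the labels (S, NP, VP, DET, N, V), and the helper names `parse_tree` and `leaves` are all made up for illustration.

```python
def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    # Pad the brackets with spaces so split() yields them as separate tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with '('"
        pos += 1
        label = tokens[pos]  # e.g. S, NP, VP (illustrative labels)
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested phrase
            else:
                children.append(tokens[pos])  # a word of the sentence
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Collect the words of the sentence, left to right."""
    _label, children = node
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


# A hypothetical annotated sentence: "O gato dorme" (The cat sleeps).
sentence = "(S (NP (DET O) (N gato)) (VP (V dorme)))"
tree = parse_tree(sentence)
print(tree)
print(" ".join(leaves(tree)))
```

A treebank is essentially thousands of sentences annotated like this one, which is what gives a parser something to learn from.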
OK, OK, this is admittedly pretty esoteric stuff. What does it have to do with Portuguese? Good question…
I found this Portuguese Treebank and wanted to share it with you, meus caros leitores.