This way you can determine the popularity of each word. After you've fed the program lots of files (you can find whole books in .txt format online), you should have a pretty good list.
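As a rough sketch of that idea, something like the following would tally words across the files you pass on the command line. The class name `WordPopularity` and its helper methods are made up for illustration, and treating every run of letters as a "word" is a simplifying assumption (so "you've" is counted as "you" and "ve").

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; the name and methods are illustrative only.
class WordPopularity {

    // Adds every word found in `text` to the running tally in `counts`.
    static void addWords(String text, Map<String, Integer> counts) {
        // Assumption: a word is any run of letters; everything else is a separator.
        for (String word : text.toLowerCase(Locale.ROOT).split("\\P{L}+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    // Returns the tallied words ordered from most to least frequent.
    static List<Map.Entry<String, Integer>> ranked(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        // Feed in every .txt file named on the command line (requires Java 11+).
        for (String fileName : args) {
            addWords(Files.readString(Path.of(fileName)), counts);
        }
        for (Map.Entry<String, Integer> entry : ranked(counts)) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

Because the counts live in one map, you can keep feeding it more files and the ranking just gets better.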
You can use the Java JDOM classes to read the XML data, including each element's attributes. You can then use those attribute values to tell apart the different types of XML nodes in the files in your folder.
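Here is a minimal sketch of the attribute-reading step. To keep it runnable without extra jars it uses the DOM parser that ships with the JDK rather than JDOM; with JDOM you would build the document with `SAXBuilder` and read the value with `Element.getAttributeValue` instead. The class name `NodeTypeCounter` and the attribute name `type` are assumptions for the example, so substitute whatever attribute your files actually use.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; uses the JDK's built-in DOM API, not JDOM.
class NodeTypeCounter {

    // Counts the root's child elements, grouped by their "type" attribute.
    // The attribute name "type" is an assumption made for this example.
    static Map<String, Integer> countByType(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xml);
        Map<String, Integer> counts = new TreeMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            // Skip text, comments, and other non-element nodes.
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element element = (Element) node;
            String type = element.hasAttribute("type")
                    ? element.getAttribute("type")
                    : "(none)";
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: NodeTypeCounter <folder>");
            return;
        }
        // Look at every .xml file directly inside the given folder.
        File[] files = new File(args[0]).listFiles((dir, name) -> name.endsWith(".xml"));
        if (files == null) {
            System.err.println("Not a folder: " + args[0]);
            return;
        }
        for (File file : files) {
            try (InputStream in = new FileInputStream(file)) {
                System.out.println(file.getName() + " " + countByType(in));
            }
        }
    }
}
```

This only looks at the direct children of the root element; if your typed nodes are nested deeper, you would need to walk the tree recursively.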