Online Text Mining Tools & WebServices

Currently, many projects at the GeoVISTA Center (at Pennsylvania State University) deal with unstructured textual data and require parsing space, time, and entity information from text documents. While trying to compare one of our text-mining tools with other projects, I stumbled across many freely available text-mining tools. However, comparing them directly wouldn’t have made sense, as they offer different levels of functionality. Hence, I have classified text-mining approaches into four categories, and the table below lists some of the text-mining tools that are freely available online.

Classification of Text-mining approaches:

1. Keyword Extractors: Traditionally, text-mining tools have focused on determining the important keywords in a document. This is done by creating a “term vector matrix” (one row per document, one column per term) and assigning a score to each word. This approach forms the core of most search engines (see the table below).

2. Entity Extractors: Current text-mining tools go beyond identifying terms; they also try to classify these terms into basic categories such as person, organization, city, region, money, etc. Such tools are often referred to as “entity-extraction tools” (see the table below).

3. Entity Relation Extractors: The objective here is not only to find the entities mentioned in a document, but also to determine how they are related to each other. I wasn’t able to find any freely available online tools that do this, but I am aware that some Penn State researchers are working on it.

4. Document Relation Extractors: The objective here is to go beyond the limits of a single document and identify common themes across different documents and how the documents are related to each other. I haven’t seen any tool that currently provides such a feature.
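
To make categories 1 and 4 a bit more concrete, here is a minimal Python sketch. It is not the method used by any of the tools listed in this post. It assumes TF-IDF as the word-scoring scheme and cosine similarity as a simple stand-in for relating documents; the sample documents, the stop-word list, and all function names are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus and stop-word list, for illustration only.
DOCS = [
    "Flooding closed roads in Pennsylvania after heavy rain.",
    "Heavy rain and flooding damaged roads across the region.",
    "The university opened a new visualization lab.",
]
STOP_WORDS = {"a", "the", "in", "and", "after", "across"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: TF-IDF score} dict per document (a sparse term vector)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def top_keywords(vector, k=3):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    vectors = tfidf_vectors(DOCS)
    for doc, vec in zip(DOCS, vectors):
        print(top_keywords(vec, k=2), "<-", doc)
    print("similarity(doc 0, doc 1):", round(cosine(vectors[0], vectors[1]), 3))
    print("similarity(doc 0, doc 2):", round(cosine(vectors[0], vectors[2]), 3))
```

Words that appear in only one document get the highest scores, while documents that share vocabulary (the two flooding stories) get a non-zero similarity. Real tools add much more on top of this, such as stemming, phrase detection, and larger reference corpora.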

List of text-mining tools

| Organization | Web Service | Online Tool | Type (based on above categories) | Freeware | Comments |
|---|---|---|---|---|---|
| Yahoo | Yes | No | Keyword | Yes | |
| NaCTem | Yes | Yes | Keyword | Yes | |
| ClearForest SWS | Yes | Yes | Entity | Yes | They also provide a Java desktop client and a Firefox add-on. |
| Translated Labs | No | Yes | Keyword | Yes | |
| TermeExtractor | No | Yes | Keyword | Yes | To use the full version you need to create a login. Also, the tool only works in Firefox. |
| Whatizit | Yes | Yes | Keyword/Entity | Yes | Whatizit has an interesting concept of a pipeline, which allows you to select a vocabulary |

Based on my personal evaluation, I felt ClearForest SWS does a pretty good job of entity extraction. It was able to find people, organizations, cities, regions, and countries. Further, it offers its technology in various formats: a Firefox add-on, a desktop Java application, a web service, and an online tool. Below is an image of the ClearForest tool as a Firefox add-on.

Clear Forest Firefox Addon

Enjoy

About Ritesh Agrawal

I am an applied researcher who enjoys anything related to statistics, large data analysis, data mining, machine learning and data visualization.

18 Responses to Online Text Mining Tools & WebServices

  1. NelSenso says:

    Hy!
    On http://www.nelsenso.it you can find some web tools around:
    – Text Mining
    – Text Summarization
    – Text Classification
    – Information Retrieval

    Languages supported: Italian and English.
    (free registration required)

    Have fun!

  2. Ritesh says:

    hi nelsenso,

    Thanks for the link. I tried the summarization tool and it’s really interesting. Is there a SOAP webservice that I can use? I am building a tool that compares different text-mining algorithms. It would be nice to have text summarization in there.

    Regards
    ritesh

  3. NelSenso says:

    Hy Ritesh,
    thanks for the reply. I hope to have time in the near future to develop a SOAP webservice for NelSenso Text Mining Web Tools.

    Stay Tuned!

  4. J says:

    Starlight is a program that can perform Document Relation Extraction once documents have been preprocessed into XML by an included component. See the links below for more information.

    http://starlight.pnl.gov/

    http://www.futurepointsystems.com/

  5. arifin says:

    What kinds of algorithms does text mining consist of? I am having trouble finding them. Thanks in advance.

  6. ragrawal says:

    Hi arifin,

    There are many different algorithms for text mining. I am not an expert in text mining, but here are some pointers for you: search for “natural language processing algorithms” and “latent semantic analysis”.

  7. Ihtisham says:

    Very useful information. Thanks Guys!

    CEO
    Innovative Consulting

  8. Hemnawajeni says:

    Lots of folks blog about this matter but you said really true words.

  9. NelSenso.it says:

    Hy ragrawal!
    The NelSenso SOAP webservice is finally available for you to use: http://www.nelsenso.it/nelsensowebservice.asmx

  10. aflam maroc says:

    This post is great. thank you for sharing these helpful infos. I appreciate your work man

  11. Xavier Teruel says:

    Ritesh, et al,
    Great work on putting this together.

    I am trying to do something that does not seem to match what you have described.

    I have a CSV text file with millions of rows. Each row represents part of a user action (it has the user name and date/time plus other data related to the user action). Each user action consists of several rows in the text file. For example, a user entering a medication order will result in multiple rows in the text file (the rows will be close together in the file, but will not always appear back to back). We need to find the most common groups of rows/records in the file so we know what the most common user actions are.

    I was wondering if there is a name for this type of search, and whether there are any available tools (web-based preferably) that could assist.

    Many thanks,

    Xavier

    • ragrawal says:

      Hi Xavier,

      This seems to be a very specific problem and not related to text mining. It looks more like an aggregation problem. It’s difficult to say anything without completely understanding the data.

  12. Joanna says:

    Thanks a lot for your information and all the comments. It is really helpful for my FYP.

  13. mfadel says:

    Hello Agrawal,
    I guess that the tool and the information could be updated, the tool is not working?

  14. trupthi says:

    hello agarwal,
    I am new to the area of text mining. Can you please help me with how to go about it?

  15. rajdeo says:

    Is there any way to extract phase level relation

  16. lavanya says:

    Hi, I’m doing my academic project on text mining. We are using the pattern taxonomy model. How do we divide the documents into positive and negative documents? Please give some ideas.
