Projects / Data mining from internet - Personal Wiki of Subhrajit Bhattacharya

Data mining from internet

This project is currently dormant.

The internet is the largest available source of data. However, most of that data is scattered across different places, poorly organized, difficult for a computer to interpret, and often unreliable. The aim of this project is to systematically extract data from this mess so that it can be readily used by computer programs. The project is still at its initial stage.

The main components of the project may be summarized as follows:

Extraction of useful data from web-pages
Caching the data on multiple servers in a distributed manner and regularly updating the data
Client-side searchability of the data
Client-server communication
Establishment of a distributed data relationship graph in order to (a) make the data more searchable and (b) enable AI programs to understand the data
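
To make the first and last components a little more concrete, here is a minimal sketch in Python. It is not the project's implementation, since the project never got past its initial stage. It only illustrates one possible approach under stated assumptions: the page is assumed to hold its data in an HTML table whose first row contains field names, and the names TableExtractor, extract_records, build_value_index and SAMPLE_PAGE are all hypothetical. The index built at the end is a very crude, single-machine stand-in for a data relationship graph: it links each (field, value) pair to the records that contain it.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard any cell left open by sloppy markup


def extract_records(html):
    """Use the first table row as field names and the rest as dict records."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def build_value_index(records):
    """Map each (field, value) pair to the indices of records containing it."""
    index = defaultdict(set)
    for i, record in enumerate(records):
        for field, value in record.items():
            index[(field, value)].add(i)
    return index


# Hypothetical page fragment, used only for illustration.
SAMPLE_PAGE = """
<html><body>
<h1>Cities</h1>
<table>
  <tr><th>city</th><th>country</th></tr>
  <tr><td>Kolkata</td><td>India</td></tr>
  <tr><td>Mumbai</td><td>India</td></tr>
  <tr><td>Philadelphia</td><td>USA</td></tr>
</table>
</body></html>
"""

records = extract_records(SAMPLE_PAGE)
index = build_value_index(records)
related = sorted(index[("country", "India")])  # records sharing this value
```

Here records is a list of plain dictionaries that other code can use directly, and related holds the positions of all records that share the value "India" in the field "country". Real web pages are far less regular than this sample, so a practical extractor would also need per-site rules, error handling, and some way of judging reliability.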

Research Projects

	Page last modified on April 24, 2011, at 08:25 PM EST. (cc) Subhrajit Bhattacharya