Data mining from the internet
Dec 31, 1969
This project is currently dormant.
The internet is arguably the largest source of data available. However, most of that data is scattered across many sites, poorly organized, hard for a computer to interpret, and often unreliable. The aim of this project is to extract data from this mess systematically, so that programs can use it directly. The project is still at an early stage.
The main parts of the project can be summarized as follows:
- Extraction of useful data from web pages
- Caching the data on multiple servers in a distributed manner and regularly updating the data
- Client-side search over the data
- Client-server communication
- Building a distributed graph of relationships between data items, in order to (a) make the data easier to search and (b) enable AI programs to understand it
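
To make the first and last items above more concrete, here is a minimal sketch of what "extraction" and a "relationship graph" could look like on a single machine. It is not the project's implementation, which is not described here. The names `TableExtractor`, `extract_records`, `build_relationship_graph` and `SAMPLE_HTML` are hypothetical, the input is assumed to be a simple HTML table whose first row holds the field names, and linking records that share a field value is just one possible definition of a relationship. Caching, distribution and client-server communication are left out entirely.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical input: a tiny page fragment standing in for a real web page.
SAMPLE_HTML = """
<table>
  <tr><th>city</th><th>country</th></tr>
  <tr><td>Paris</td><td>France</td></tr>
  <tr><td>Lyon</td><td>France</td></tr>
  <tr><td>Rome</td><td>Italy</td></tr>
</table>
"""


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being read, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row


def extract_records(html):
    """Use the first table row as field names and the rest as dict records."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def build_relationship_graph(records):
    """Link records (by index) that share the same value for the same field."""
    by_value = defaultdict(list)
    for index, record in enumerate(records):
        for field, value in record.items():
            by_value[(field, value)].append(index)

    graph = defaultdict(set)
    for members in by_value.values():
        for a in members:
            for b in members:
                if a != b:
                    graph[a].add(b)
    return {node: sorted(neighbours) for node, neighbours in graph.items()}


if __name__ == "__main__":
    records = extract_records(SAMPLE_HTML)
    print(records)
    print(build_relationship_graph(records))
```

In this sketch the two French cities end up linked through their shared `country` value, while the Italian record has no neighbours and so does not appear in the graph. A real system would need far more robust extraction (pages are rarely this clean) and a graph store that can be partitioned across servers.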