First, common open data websites.
UCI: the classic machine learning and data mining repository, with many datasets for classification, clustering and regression problems. Although long-established, it is still widely used by researchers.
National data: data from the National Bureau of Statistics of the People's Republic of China, covering China's economy and people's livelihood with monthly, quarterly and annual figures. It is both comprehensive and authoritative.
Amazon: a cross-disciplinary cloud data platform from Amazon, with datasets in chemistry, biology, economics and other fields.
Figshare: a platform for sharing research results, where you can find work shared by leading researchers from around the world and also share and obtain research data.
GitHub: a very comprehensive source of data, with dataset resources in many sub-fields spanning the natural and social sciences; suitable for researchers and data analysts.
Second, crawlers can collect valuable data.
Below are some categories of websites whose data can be captured with crawlers. Some websites also provide APIs for obtaining the data, but these usually require payment.
1. Financial data; 2. Online lending data; 3. Company annual reports; 4. Venture capital data; 5. Social platforms; 6. Employment and recruitment; 7. Catering and food; 8. Transport and tourism; 9. E-commerce platforms; 10. Video data; 11. Housing information; 12. Car rental; 13. New media data; 14. Classified listings.
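The crawling approach above can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions, not a production crawler: it assumes the target data sits in HTML `<table>` elements, and the `USER_AGENT` string and the helper names (`allowed_by_robots`, `fetch_html`, `extract_rows`) are hypothetical. Many real sites render data with JavaScript or restrict crawling, so check a site's robots.txt and terms of use before collecting anything.

```python
import urllib.request
import urllib.robotparser
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical identifier; real crawlers should send an honest, contactable UA.
USER_AGENT = "example-data-crawler/0.1"


class TableRowParser(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped into one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being built
        self._cell = None  # text fragments of the open cell

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:  # skip rows that contain no cells
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate an omitted </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def allowed_by_robots(url):
    """Check the site's robots.txt for USER_AGENT (performs a network request)."""
    parts = urlparse(url)
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    robots.read()
    return robots.can_fetch(USER_AGENT, url)


def fetch_html(url, timeout=10):
    """Download a page, decoding with the server's declared charset or UTF-8."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_rows(html):
    """Return every table row in an HTML string as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    # Offline demo on an inline page; with network access you would call
    # extract_rows(fetch_html(url)) once allowed_by_robots(url) returns True.
    sample = ("<table><tr><th>City</th><th>Rent</th></tr>"
              "<tr><td>A</td><td>3000</td></tr></table>")
    print(extract_rows(sample))  # [['City', 'Rent'], ['A', '3000']]
```

Parsing is kept separate from downloading so the extraction logic can be checked on saved pages without touching the network.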
Third, data trading platforms.
Because demand for data is now high, many data trading platforms have emerged. Besides paid data, these platforms also offer a good deal of free data.
Youyi Data: initiated by the National Information Center, it is a data platform built on national information resources and a leading data trading platform in China. It supports both B2B and B2C trading, with data resources in government affairs, society, social networking, education, consumption, transportation, energy, finance, health and other fields.
Data Hall: focuses on trading comprehensive Internet data and provides data trading, data processing and data API services, with data in fields such as speech recognition, medicine and health, transport and geography, e-commerce, social networking and image recognition.
Fourth, online indexes.
Baidu Index: an index query platform that shows how much attention a topic receives over different time periods, which makes it a useful guide for trend analysis and public-opinion forecasting. Beyond trends, it offers finer-grained tools such as demand analysis and audience profiles, which are valuable for market research. The other two search engines, Sogou and 360, have similar products that can also serve as references.
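Reading trends from an index like this comes down to simple arithmetic on a time series. The sketch below assumes you already have daily index values as a list of numbers (how you obtain them is not covered here, and no official Baidu Index API is assumed); `moving_average` and `period_change` are hypothetical helper names. The moving average smooths day-to-day noise, and the period change compares the most recent period with the one before it.

```python
from statistics import mean


def moving_average(values, window=7):
    """Trailing moving average; the first window-1 positions are None."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        mean(values[i - window + 1 : i + 1]) if i >= window - 1 else None
        for i in range(len(values))
    ]


def period_change(values, period=7):
    """Percent change of the last `period` values versus the `period` before them."""
    if len(values) < 2 * period:
        raise ValueError("need at least two full periods of data")
    recent = sum(values[-period:])
    previous = sum(values[-2 * period : -period])
    if previous == 0:
        return None  # undefined when the earlier period had zero attention
    return (recent - previous) / previous * 100
```

For example, with 14 days of values, `period_change(values, 7)` gives the percentage change of the second week over the first; comparing sums is equivalent to comparing weekly averages because both periods have the same length.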
Ali Index: an authoritative tool in China for analyzing commodity transactions, which shows product search and transaction data by region and industry. Because it is based on transaction data from the Taobao, Tmall and 1688 platforms, it gives a broad picture of domestic commodity trading, which is valuable for trend analysis and industry observation.
Youmeng Index: Youmeng offers comprehensive statistics and analysis of mobile app data, which is very helpful for studying mobile products, doing market research and analyzing user behavior. Besides the Youmeng Index, Youmeng's Internet reports are also excellent reading for understanding Internet trends.
Fifth, network collectors.
A network collector is software that quickly and simply gathers content scattered across the web. Collectors have strong content-collection features and require no technical investment, so many users rely on them as their main collection tool.
Fortune: a new generation of intelligent cloud crawler. It bills itself as the fastest crawler tool, nine times faster than similar products. With tens of millions of IP addresses, it can easily issue a very large number of requests, and collected data is stored in the cloud, which is safe, convenient, simple and fast.
Train Collector: professional software for crawling, processing, analyzing and mining Internet data, which can quickly and flexibly crawl information scattered across web pages.
Octopus: a simple, practical and full-featured collector that is easy to operate and requires no rule writing. Its distinctive cloud collection runs tasks on cloud servers, so collection continues even when your own computer is shut down.
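Collectors like these automate a crawl loop: start from one page, extract its links, queue the ones not yet visited, and repeat at a polite pace. A minimal Python sketch of that loop follows. It is not how Fortune, Train Collector or Octopus work internally; `crawl`, `LinkParser` and the `fetch`, `max_pages` and `delay` parameters are hypothetical, and the caller supplies `fetch` (for example, an HTTP download function), which keeps the loop testable offline.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50, delay=1.0):
    """Breadth-first crawl that stays on start_url's host.

    fetch(url) -> str is supplied by the caller; if it raises, that page
    is skipped. Returns {url: html} for every page fetched successfully.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
        if delay:
            time.sleep(delay)  # pause between requests to limit server load
    return pages
```

Restricting the crawl to the starting host and capping `max_pages` keeps it bounded, and the breadth-first queue visits pages closest to the start page first.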