Live Crawler
We track multiple social networking sites, including Flickr, Foursquare, Instagram, Panoramio, Tencent Weibo, Sina Weibo, Twitter, YouTube, Amazon, Dianping, and Fantong, as well as some forum and blog sites. The crawler provides real-time coverage of multi-modal user-generated content (UGC) such as text posts, user comments, images, videos, user profiles, and user relations. To ensure continual real-time crawling, we built a set of robust live crawlers that work well across different platforms and channels and are easy to maintain and extend. The crawlers are made intelligent and robust through IP proxy support, heuristic crawling, noise filtering, and exception handling, as well as multi-threaded and distributed crawling.
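As a rough illustration of how proxy rotation, exception handling, noise filtering, and multi-threading can fit together in one crawler, here is a minimal Python sketch. It is not the Live Observatory implementation: the names ProxyPool, is_noise, crawl_one, and crawl_all are hypothetical, the noise rule (dropping posts with fewer than five characters of text) is an assumption, and the fetch function is passed in by the caller so that the sketch does not depend on any particular HTTP client or site API.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class ProxyPool:
    """Thread-safe round-robin rotation over a fixed list of proxies (hypothetical helper)."""

    def __init__(self, proxies):
        self._cycle = itertools.cycle(proxies)
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            return next(self._cycle)


def is_noise(post):
    """Assumed noise rule: treat empty or very short text posts as noise."""
    return len(post.get("text", "").strip()) < 5


def crawl_one(url, fetch, proxies, max_retries=3):
    """Fetch one URL, moving to the next proxy after each failure."""
    for _ in range(max_retries):
        proxy = proxies.next()
        try:
            posts = fetch(url, proxy)  # fetch is supplied by the caller
        except Exception:
            continue  # exception handling: retry through the next proxy
        return [p for p in posts if not is_noise(p)]
    return []  # give up on this URL after max_retries failures


def crawl_all(urls, fetch, proxy_list, workers=4):
    """Crawl many URLs concurrently and return {url: filtered posts}."""
    urls = list(urls)
    proxies = ProxyPool(proxy_list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda u: crawl_one(u, fetch, proxies), urls)
        return dict(zip(urls, results))
```

Here fetch(url, proxy) is expected to return a list of dictionaries with a "text" key and to raise an exception on failure. Distributed crawling is outside the scope of this sketch; one simple option would be to partition the URL list across machines that each run crawl_all.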
People Sense
People Sense focuses on celebrities who have influenced our society and provides ranked profiles of them from the web. People can learn more about these celebrities by browsing word clouds, hot images, and hot tweets.
Topics Sense
Topic Sense helps people better understand the topics of interest in society and the sentiments around them, evolving live events and social communities, as well as social trends and user habits.
As part of the Live Observatory, NExT builds robust live crawlers to monitor and crawl data from several social media sites, including Sina Weibo, Tencent Weibo, Twitter, Instagram, and Foursquare. The user-generated content (UGC) we crawl includes text, images, and location-oriented UGC, as well as user, annotation, and other structured information. For legal reasons, the data can only be used internally for research purposes.