Showing posts with label business intelligence. Show all posts
Showing posts with label business intelligence. Show all posts

Tuesday, August 3, 2010

Extracting "gems" from web pages using BeautifulSoup and Python - BI part

Lately, I am trying to find a way to answer this question: what's the best local restaurants or spas? So we are trying to figure out which local businesses will worth the most to go after.

Ok, that could be a supervised or unsupervised learning problem. If I can find some data that indicates brand awareness, and use some metrics as predictors, then it's a supervised problem. Well, if I cannot find the indicator kind of response, then I have to figure out another way to sort of build that index using whatever predictive metrics I can find online and for free.

The first thing came to my mind was yelp.com, which is a relatively comprehensive rating website on local businesses. Naturally what I want to do is to crawl that website and get some useful information out, like location of business (in order to calculate distance of that business to a particular user), number of ratings (one indicator of brand awareness), rating itself (brand quality), etc.

Secondly, I want to see if I can get some twitter data. If I happened to know the ip of the twitter user, I can figure out his/her location and just see how many tweets a local business can have. Similarly, google analytics might be helpful too, but I am not sure.