Web Scraping Flipkart Prices with PHP

The India Price Tracker tool uses web scraping to extract the prices of products listed on the Flipkart website. It uses PHP's cURL library to fetch the HTML source of a Flipkart page and then uses regular expressions to extract the price and product image from the meta tags.

Make sure you specify a User-Agent string with your cURL request; otherwise the Flipkart server will reject the request.
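
In PHP's cURL extension the CURLOPT_USERAGENT option sets this header. The same idea can be sketched in JavaScript for Node.js 18 or later, which ships a global fetch. This is only an illustration under those assumptions: the helper names buildRequestOptions and fetchPage are hypothetical and are not part of the original tool.

```javascript
// Hypothetical helper: builds request options that carry a User-Agent header.
function buildRequestOptions(userAgent) {
  return { headers: { "User-Agent": userAgent } };
}

// Hypothetical helper: fetches the raw HTML of a page.
// Assumes Node.js 18+, where fetch is available as a global.
async function fetchPage(url, userAgent) {
  var response = await fetch(url, buildRequestOptions(userAgent));
  if (!response.ok) {
    throw new Error("Request failed with HTTP status " + response.status);
  }
  return response.text();
}
```

Any realistic browser string can be passed as userAgent; the sketch throws if the server answers with a non-2xx status, which makes a rejected request easy to spot.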

]*>([^<]*)<\/h1>/';
	preg_match($regex, $html, $title);

	$regex = '/data-src="([^"]*)"/i';
	preg_match($regex, $html, $image);

	if ($price && $title && $image) {
		$response = array("price" => "Rs. $price[1].00", "image" => $image[1], "title" => $title[1], "status" => "200");
	} else {
		$response = array("status" => "404", "error" => "We could not find the product details on Flipkart $url");
	}

	return $response;
}

Flipkart Price API with Google Apps Script

Flipkart, the popular shopping website in India that sells everything from erasers to televisions, offers no API, so if you want to extract the pricing information of any Flipkart product, screen scraping is the only option.

Flipkart stores the pricing data inside <meta> tags with “itemprop” set to “price” and it is thus relatively easy to pull this information for the price tracker tool.
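
As a minimal sketch of that lookup in plain JavaScript, the function below reads the content attribute of a meta tag whose itemprop is "price". The helper name extractPrice and the sample markup are made up for illustration and are not copied from a real Flipkart page; the sketch also assumes double-quoted attribute values.

```javascript
// Hypothetical helper: returns the content of <meta itemprop="price">, or null.
function extractPrice(html) {
  // Try both attribute orders, since the markup may list them either way.
  var patterns = [
    /<meta[^>]*itemprop\s*=\s*"price"[^>]*content\s*=\s*"([^"]*)"/i,
    /<meta[^>]*content\s*=\s*"([^"]*)"[^>]*itemprop\s*=\s*"price"/i
  ];
  for (var i = 0; i < patterns.length; i++) {
    var match = patterns[i].exec(html);
    if (match) {
      return match[1];
    }
  }
  return null;
}

// Made-up sample markup for the sketch.
var sample = '<html><head><meta itemprop="price" content="14999"></head></html>';
var samplePrice = extractPrice(sample); // "14999"
```

Using [^>]* between the attributes keeps each match inside a single tag, so a content attribute from a later meta tag cannot be picked up by mistake.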

Here’s the Google Apps Script code that uses regular expressions to extract the price, item title and thumbnail image from the URL of a product page. You can combine it with ContentService to create an API that returns pricing data for a Flipkart product as JSON or XML.

 
 
function priceFlipkart(url) {

  if (url !== "") {

    try {

      /* Extract the HTML source of the Flipkart page */
      var page = UrlFetchApp.fetch(url).getContentText();

      /* Regular expression to extract the price from the META tag */
      var regex = /<meta[^>]*itemprop\s*=\s*"price"\s*content\s*=\s*"([^"]*)"/i;
      var price = regex.exec(page);

      if (price !== null) {

        /* [^>]* (instead of .*) keeps each match inside a single META tag */
        regex = /<meta[^>]*name\s*=\s*"og_title"[^>]*content\s*=\s*"([^"]*)/i;
        var title = regex.exec(page);

        /* We are using the canonical URL as it contains no tracking parameters */
        regex = /<meta[^>]*name\s*=\s*"og_url"[^>]*content\s*=\s*"([^"]*)/i;
        var canonical = regex.exec(page);

        /* The thumbnail image of the Flipkart product */
        regex = /<meta[^>]*name\s*=\s*"og_image"[^>]*content\s*=\s*"([^"]*)/i;
        var image = regex.exec(page);

        if (title && canonical && image) {
          Logger.log(title[1] + "|" + image[1] + "|" + price[1]);
        } else {
          Logger.log("Could not fetch " + url);
        }
      }
    } catch (e) {
      Logger.log("Flipkart Error: " + e.toString());
    }
  }
}
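
To return the data as JSON instead of logging it, one option is a web app entry point built on Apps Script's ContentService. The sketch below is an illustration, not the original tool's code: it assumes priceFlipkart has been adapted to return an object with title, image and price fields (or nothing on failure), and the "url" query parameter name and the buildPriceJson helper are hypothetical.

```javascript
// Hypothetical helper: serializes the extracted details as a JSON string.
function buildPriceJson(title, image, price) {
  return JSON.stringify({ title: title, image: image, price: price });
}

// Apps Script calls doGet(e) when the published web app receives a GET request.
// The "url" query parameter name is an assumption made for this sketch.
function doGet(e) {
  var url = e.parameter.url;

  // Assumes priceFlipkart(url) returns { title, image, price }
  // instead of writing to the log.
  var details = priceFlipkart(url);

  var json = details
    ? buildPriceJson(details.title, details.image, details.price)
    : JSON.stringify({ error: "Could not fetch " + url });

  return ContentService.createTextOutput(json)
    .setMimeType(ContentService.MimeType.JSON);
}
```

Keeping the serialization in its own small function makes it easy to check outside Apps Script, while doGet stays a thin wrapper around ContentService.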