Script to check how many pages from a site are indexed in Bing.com

OK, this is part of what I’m doing to fix my old script (php pagerank checker and sh*t). I noticed that my Bing.com indexed page checker and my Bing.com bot last access checker did not work anymore. That is because I’m not using the Bing.com API at all to get the data. Instead, I do some simple scraping of the Bing.com search result page. (So I’m not calling any Azure Datamarket API here.)

Without further ado, this is the full PHP script code:

//helper function: returns the text between $start and $end, or "" if either marker is missing
function between($string, $start, $end)
    {
    $ini=strpos($string, $start);

    if ($ini === false)
        return "";

    $ini+=strlen($start);
    $stop=strpos($string, $end, $ini);

    if ($stop === false)
        return "";

    return substr($string, $ini, $stop - $ini);
    }

//another helper function
function file_get_contents_curl($url, $referer="", $ua="Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Ubuntu/10.04 Chromium/7.0.514.0 Chrome/7.0.514.0 Safari/534.7")
    {
    $ch=curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //Set curl to return the data instead of printing it to the browser.
    if ($referer!="") {
        curl_setopt($ch, CURLOPT_REFERER, $referer);
    } else {
        curl_setopt($ch, CURLOPT_REFERER, $url);
    }
    //curl_setopt($ch, CURLOPT_URL, $url);
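    if ($ua!="") {
        curl_setopt($ch, CURLOPT_USERAGENT, $ua);
    } elseif (isset($_SERVER['HTTP_USER_AGENT'])) {
        //fall back to the visitor's user agent (not set when run from the command line)
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    }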

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $data=curl_exec($ch);
    curl_close ($ch);

    return $data;
    }

//this is the main function
function msn_indexed($uri, $badge = 0)
    {
    //strip only a leading http:// or https://, so the rest of the address stays intact
    $uri =preg_replace('#^https?://#i', '', trim($uri));
    $url ='http://www.bing.com/search?q=site%3A' .urlencode( $uri).'&go=&qs=n&sk=&form=QBLH&mkt=en-WW';
    $data=file_get_contents_curl($url);
    
    if (strpos($data, 'sb_count')!==FALSE) {
        return (int)str_replace(",", "", trim(between($data, '', 'result')));
    } else {
        return 0;
    }
}

Nothing fancy or advanced there, just simple cut and grab. You might consider using a regex to parse the search results from Bing.com instead.
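If you want to try the regex route, here is a minimal sketch. The function name parse_result_count is made up for this example, and the markup it expects (a count such as "1,234 results" inside the element marked sb_count) is an assumption about how Bing.com formats the page, so check it against the real HTML before relying on it.

```php
<?php
// Hypothetical helper (not part of the original script): pulls the result
// count out of a search result page with a regular expression.
// Assumption: the count sits inside the element marked "sb_count" and looks
// like "1,234 results" or "1-10 of 1,234 results".
function parse_result_count($html)
{
    // Capture the number that comes right before the word "result(s)".
    if (preg_match('/sb_count[^>]*>[^<]*?([\d.,]+)\s+results?/i', $html, $m)) {
        // Drop thousands separators ("," or ".") before casting.
        return (int) preg_replace('/\D/', '', $m[1]);
    }

    return 0; // marker not found: treat as "nothing indexed"
}

// Sample markup made up for illustration only.
$sample = '<span class="sb_count" id="count">1,234 results</span>';
echo parse_result_count($sample), "\n"; // prints 1234
```

In msn_indexed() this could replace the strpos()/between() pair: pass $data to the helper and return whatever it gives back.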

How to use it

$dx=msn_indexed("www.ahowto.net");

The result is an integer (0 if Bing.com hasn’t indexed any of your site’s pages).

A fully working demo can be seen at http://www.vrank.org, in the “Bing.com Indexed” part.

[Image: Bing.com search result page showing how many of a site’s pages are indexed]

As you can see, the result may vary depending on your location (or where you host the script), and sometimes Bing.com returns an invalid result (such as 0 when the real value is higher than that).
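One way to soften the occasional bogus 0 is to ask a few times and keep the highest answer. This is only a sketch: highest_of_attempts is a made-up name, and it assumes that a wrong 0 is more likely than an inflated count. The demo uses a fake checker so it runs without touching the network.

```php
<?php
// Hypothetical wrapper (not part of the original script): runs a checker
// several times and keeps the highest count it returned.
function highest_of_attempts(callable $checker, $attempts = 3, $pause_seconds = 0)
{
    $best = 0;

    for ($i = 0; $i < $attempts; $i++) {
        $best = max($best, (int) $checker());

        // Optional pause between requests, skipped after the last attempt.
        if ($i < $attempts - 1 && $pause_seconds > 0) {
            sleep($pause_seconds);
        }
    }

    return $best;
}

// Stand-in checker: returns 0, then 1234, then 0 again.
$fake_results = array(0, 1234, 0);
$fake_checker = function () use (&$fake_results) {
    return array_shift($fake_results);
};

echo highest_of_attempts($fake_checker), "\n"; // prints 1234
```

With the real script you would pass a closure that calls msn_indexed("www.ahowto.net") and a pause of a couple of seconds, so you don't hammer Bing.com.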
