OK, this is part of what I'm doing to fix my old scripts (the PHP PageRank checker and so on). I noticed that my Bing.com indexed page checker and Bing.com bot last access checker did not work anymore. That's because I'm not using the Bing.com API at all to get the data. Instead, I do some simple scraping of the Bing.com search results page. (So, I'm not calling any Azure Datamarket API here.)
Without further ado, this is the full PHP script:
    <?php
    // Helper: return the text found between $start and $end, or "" if either is missing.
    function between($string, $start, $end)
    {
        $ini = strpos($string, $start);
        if ($ini === false) return "";
        $ini += strlen($start);
        $stop = strpos($string, $end, $ini);
        if ($stop === false) return "";
        return substr($string, $ini, $stop - $ini);
    }

    // Helper: fetch a URL with cURL and return the response body (false on failure).
    function file_get_contents_curl($url, $referer="", $ua="Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Ubuntu/10.04 Chromium/7.0.514.0 Chrome/7.0.514.0 Safari/534.7")
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the data instead of printing it to the browser.
        if ($referer != "") {
            curl_setopt($ch, CURLOPT_REFERER, $referer);
        } else {
            curl_setopt($ch, CURLOPT_REFERER, $url);
        }
        if ($ua != "") {
            curl_setopt($ch, CURLOPT_USERAGENT, $ua);
        } elseif (isset($_SERVER['HTTP_USER_AGENT'])) {
            // Fall back to the visitor's user agent (not set when run from the command line).
            curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
        }
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }

    // Main function: return the number of pages Bing.com reports for site:$uri.
    function msn_indexed($uri, $badge = 0)
    {
        // Strip only a leading http:// or https://, so "http" inside the host name is left alone.
        $uri = preg_replace('#^https?://#i', '', trim($uri));
        $url = 'http://www.bing.com/search?q=site%3A' . urlencode($uri) . '&go=&qs=n&sk=&form=QBLH&mkt=en-WW';
        $data = file_get_contents_curl($url);
        if ($data !== false && strpos($data, 'sb_count') !== false) {
            // Grab the text between the count span and the word "result", then drop the commas.
            return (int) str_replace(",", "", trim(between($data, '<span class="sb_count" id="count">', 'result')));
        } else {
            return 0;
        }
    }
Nothing fancy or advanced there, just simple cut and grab. You might consider using a regex instead when parsing the search results from Bing.com.
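The regex idea could look something like this. It's only a sketch: parse_bing_count() is a made-up name that is not part of my script, and it assumes the count still sits inside the sb_count span that the script above looks for.

```php
<?php
// Hypothetical helper (not in the original script): pull the result count
// out of the page with a regular expression instead of between().
// Assumption: the count is the first number inside <span class="sb_count" ...>.
function parse_bing_count($html)
{
    // Find the sb_count span, skip any leading words, capture digits and commas.
    if (preg_match('/<span[^>]*class="sb_count"[^>]*>[^<]*?(\d[\d,]*)/', $html, $m)) {
        return (int) str_replace(',', '', $m[1]);
    }
    return 0; // span not found, or no number inside it
}
```

To try it, you would swap the between() call in msn_indexed() for parse_bing_count($data).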
How to use it
    $dx = msn_indexed("www.ahowto.net");
The result is an integer (0 if Bing.com hasn't indexed any of your site's pages).
A fully working demo can be seen at http://www.vrank.org, in the “Bing.com Indexed” part.
As you can see, the result may vary depending on your location (or where you put the script), and sometimes Bing.com gives an invalid result (such as 0, where the real value might be higher than that).
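One way to soften the occasional bogus 0 is to ask a few times and keep the highest answer. This is just a sketch under that assumption: max_of_attempts() is a hypothetical wrapper, and "take the maximum" is my guess at a reasonable rule, not something Bing.com documents.

```php
<?php
// Hypothetical wrapper (not in the original script): call a checker several
// times and keep the largest count, on the assumption that a bad reading
// is lower than the real value.
function max_of_attempts(callable $checker, $uri, $attempts = 3)
{
    $best = 0;
    for ($i = 0; $i < $attempts; $i++) {
        $best = max($best, (int) $checker($uri));
    }
    return $best;
}
```

With the script above loaded, it would be called as max_of_attempts('msn_indexed', 'www.ahowto.net'). Keep in mind that every attempt is another request to Bing.com.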