Taking a closer look, the Bing search results weren’t www.bing.com/search URLs, which are correctly blocked by Bing’s robots.txt file. They were coming from www.bing.com/entities/search. This pattern is not blocked, which is how the related URLs ended up indexed by Google. So why are those URLs no longer indexed? Google may have noticed and pulled them.
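To see why a rule for /search doesn't cover /entities/search, here is a minimal sketch using Python's standard urllib.robotparser. The rules below are an assumed, simplified stand-in rather than a copy of Bing's actual robots.txt, the query strings are illustrative, and is_allowed is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed, simplified rules for illustration only.
# This is NOT a copy of Bing's actual robots.txt.
ASSUMED_RULES = """\
User-agent: *
Disallow: /search
"""


def is_allowed(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Illustrative URLs; the query strings are made up for the example.
web_search = "https://www.bing.com/search?q=cable+television+seattle"
entities = "https://www.bing.com/entities/search?q=cable+television+seattle"

print(is_allowed(ASSUMED_RULES, web_search))  # False: path starts with /search
print(is_allowed(ASSUMED_RULES, entities))    # True: path starts with /entities
```

Disallow rules match from the start of the URL path, so /entities/search falls outside a rule written for /search even though "search" appears in it.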
But what are these /entities URLs? They seem to be a hybrid of map results and search results. Take a look, for instance, at this Bing search for [cable television seattle].
The first few listings (after the ads) are web results, with a map on the right. The link to “cable television” (circled above) is to an /entities page.
Below the fold are local listings and a “see all business listings” link, which also points to the /entities page, followed by more web results.
That /entities page is slightly different from the regular web search results (a larger map, more local listings, and web search results above or below the business listings), and yet it's not exactly like the Bing Maps page either. A “Local” tab is highlighted, which isn’t an available tab in the regular web search or Maps search.
For reference, here are the web results at the bottom of the page (missing from the regular maps results page).
Arguably, these pages are basically Bing search results, which Google doesn’t want to index. As Google notes in their webmaster guidelines:
“Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines.”
Google wants to send searchers to an answer, not to more search results (for that matter, that’s what Bing wants to do too). Google (and not to make it weird or anything, but by Google I mean, in part, me) started talking about this back in 2007. In this case, Bing hasn’t yet added /entities to their robots.txt file, but Google appears to have removed the pages (and fairly quickly; yesterday, over 30,000 URLs were indexed). Google has noted before that they may remove these types of pages from their index if the pages don’t provide additional value beyond the aggregation of listings.
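As a rough sketch of what adding /entities to a robots.txt file would change, the snippet below checks a few URLs against a hypothetical rule set. The second Disallow line is an assumption about what such an addition might look like, not something Bing has published; blocked_urls is a hypothetical helper and the URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules. The "/entities/" line is the assumed addition;
# it is not taken from any real robots.txt file.
HYPOTHETICAL_RULES = """\
User-agent: *
Disallow: /search
Disallow: /entities/
"""


def blocked_urls(rules: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the subset of `urls` that `agent` may not crawl under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Illustrative URLs only.
candidates = [
    "https://www.bing.com/search?q=cable+television+seattle",
    "https://www.bing.com/entities/search?q=cable+television+seattle",
    "https://www.bing.com/maps",
]

for url in blocked_urls(HYPOTHETICAL_RULES, candidates):
    print("blocked:", url)
```

Under these assumed rules, both search-style URLs are blocked from crawling while the /maps URL is not. Note that robots.txt controls crawling; whether already-indexed URLs drop out of the index is up to the search engine.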
How do you add value to search results pages? Give the user a reason to visit that page first. Do the Bing pages do that? The yellowpages.com listing that appeared just above the www.bing.com/entities result is still there. Isn’t it a search results page too?
It’s hard to say. Both pages include data beyond the web listings, including address, phone number, and ratings.
The question of how to add value to these types of pages is an ongoing challenge and it’s clearly a work in progress for search engines too.