The ShopWiki Crawler
ShopWiki finds products using Web crawlers, the same approach used by search engines such as Google, Yahoo, Bing, and Ask; it has become a standard of the web. Before crawling a site, we first look for its robots.txt file, which tells our crawlers which files they may fetch. Every web site can use this file to declare parts of its domain off-limits to specific robot user agents, and ShopWiki respects and obeys all robots.txt files.
Please note that we only update our copy of these files periodically, so if you have recently blocked us from crawling or given us access to crawl your site, the change will not take effect immediately. For any other questions or concerns, please email us at crawler@shopwiki.com.
robots.txt
Web administrators should use the information below to update their sites' robots.txt files.
Our current User Agent String is
ShopWiki/1.0 ( +http://www.shopwiki.com/w/Help:Bot )
'''If you would like us to not crawl your site''', please add this to your robots.txt:
User-agent: ShopWiki
Disallow: /
After you add this, you will stop receiving traffic from ShopWiki once we remove your products from our index. Some webmasters believe that blocking us will improve their Google SERPs; in our testing this is usually not the case. If that is your objective, please email James at jk@shopwiki.com for an explanation of what we have observed.
'''If you feel that we are crawling too fast''', please add this to your robots.txt:
User-agent: ShopWiki
Crawl-Delay: 5
This will slow our crawl to at most 1 page every 5 seconds. We recommend this for sites with limited bandwidth. Our crawler already tries to limit bandwidth use through an intelligent understanding of your website, but Crawl-Delay lets you stay listed while further reducing the crawler's impact on your site.
'''If you would like to explicitly allow ShopWiki's crawlers on your site''', please add this to your robots.txt:
User-agent: ShopWiki
Allow: /
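If you want ShopWiki to crawl your site while keeping all other bots out, the two directives can be combined in one robots.txt; a sketch (the `*` group applies to every bot that does not match a more specific group):

```
User-agent: ShopWiki
Allow: /

User-agent: *
Disallow: /
```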
For more information on robots.txt files, see robotstxt.org.
META Robots
Another method for controlling what robots can access is via the META robots directive.
You can add the following tag to your pages so that neither we nor any other bot will index them or follow links on them:
<meta name="robots" content="noindex,nofollow">
You can also add this tag conditionally. If you want to allow us, and no one else, to crawl your site, you can do something like this:
'''jsp'''
<%
if ( request.getHeader("User-Agent").indexOf("ShopWiki") < 0 ) {
%>
<meta name="robots" content="noindex,nofollow">
<% } %>
'''asp/c#'''
<%
if ( Request.UserAgent.IndexOf("ShopWiki") < 0 ) {
%>
<meta name="robots" content="noindex,nofollow">
<% } %>
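The same User-Agent test can be factored into a small server-side helper; a minimal Java sketch (the class and method names are illustrative, not part of any ShopWiki API), made null-safe for clients that send no User-Agent header:

```java
public class CrawlerGate {
    // Return true when the page should carry
    // <meta name="robots" content="noindex,nofollow">,
    // i.e. for every client except ShopWiki's crawler.
    // A missing header is treated as "not ShopWiki".
    public static boolean shouldNoindex(String userAgent) {
        return userAgent == null || userAgent.indexOf("ShopWiki") < 0;
    }
}
```

In a JSP you would call `CrawlerGate.shouldNoindex(request.getHeader("User-Agent"))` and emit the META tag only when it returns true.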
IP Range
'''If you wish to filter our crawlers by IP''', they are in the range 8.11.2.*
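To enforce such a filter server-side, you can compare the first three octets of the client address against 8.11.2; a minimal Java sketch (the class and method names are illustrative assumptions):

```java
public class IpFilter {
    // True when the address falls in 8.11.2.* (8.11.2.0/24),
    // the range this FAQ lists for ShopWiki's crawlers.
    public static boolean isShopWikiIp(String ip) {
        if (ip == null) return false;
        String[] octets = ip.split("\\.");
        return octets.length == 4
                && octets[0].equals("8")
                && octets[1].equals("11")
                && octets[2].equals("2");
    }
}
```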