Thursday, July 16, 2015

Exclude SharePoint 2013 Site Collection(s), Including Host Named Site Collections (HNSC), from External Indexing (Google, Yahoo, etc.)

Many public SharePoint sites are configured to allow anonymous access but should not be indexed by any search engine. You can ask search engines not to index them by placing a robots.txt file at the root of each SharePoint site in question. If you have Host Named Site Collections, add the robots.txt file to the root of each HNSC, because each one is treated as a separate site (not just the web application root).
There are several ways to do this. One is the new Search Engine Optimization (SEO) site collection feature in SharePoint 2013: http://blog.mastykarz.nl/search-engine-optimization-sharepoint-2013/. This method requires activating the Search Engine Optimization feature on each site collection (publishing site collections already have it enabled), which can be tedious to do by hand, although it can also be scripted with PowerShell.
Alternatively, you can create the robots.txt file yourself and upload it with PowerShell. First, create a file named robots.txt containing the following to exclude the whole site and its contents:
User-Agent: *
Disallow: /
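If you prefer to create the file from PowerShell as well, the sketch below writes the whole-site rules shown above. The output path (your temp folder) is only a placeholder assumption. It writes ASCII explicitly: robots.txt is expected to be UTF-8 (ASCII is a subset), while in Windows PowerShell `Out-File` and `>` default to UTF-16, which crawlers may not parse.

```powershell
# Placeholder output path (assumption); any local folder works
$robotsPath = Join-Path ([System.IO.Path]::GetTempPath()) "robots.txt"

# Rules that ask every compliant crawler to skip the whole site
$lines = @("User-Agent: *", "Disallow: /")

# Write plain ASCII with no byte order mark, avoiding UTF-16 output
[System.IO.File]::WriteAllLines($robotsPath, [string[]]$lines, [System.Text.Encoding]::ASCII)
```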
Optionally, you can exclude only portions of the site from indexing instead. List each path on its own Disallow line; crawlers match paths case-sensitively, so use the casing that appears in the site's URLs:
User-Agent: *
Disallow: /_layouts/
Disallow: /SiteAssets/
Disallow: /Lists/
Disallow: /_catalogs/
Disallow: /WorkflowTasks/
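To sanity-check which URLs a rule set blocks, the sketch below models how a crawler applies Disallow lines: a URL path is blocked when it starts with any Disallow value, compared case-sensitively. It is a simplification that assumes a single `User-Agent: *` group with no Allow lines and no `*` or `$` wildcards, and `Test-RobotsDisallowed` is a hypothetical helper, not a SharePoint cmdlet.

```powershell
# Hypothetical helper: simplified robots.txt prefix matching for one "User-Agent: *" group.
# Assumes plain Disallow prefixes only (no Allow lines, no * or $ wildcards).
function Test-RobotsDisallowed {
    param(
        [string]   $Path,      # URL path, e.g. /_layouts/15/start.aspx
        [string[]] $Disallow   # Disallow values taken from robots.txt
    )
    foreach ($prefix in $Disallow) {
        # An empty Disallow value means "allow everything", so it blocks nothing
        if ([string]::IsNullOrEmpty($prefix)) { continue }
        # Paths are matched case-sensitively, so use an ordinal comparison
        if ($Path.StartsWith($prefix, [System.StringComparison]::Ordinal)) { return $true }
    }
    return $false
}

$rules = '/_layouts/', '/SiteAssets/', '/Lists/', '/_catalogs/', '/WorkflowTasks/'
Test-RobotsDisallowed -Path '/Lists/Announcements/AllItems.aspx' -Disallow $rules   # True
Test-RobotsDisallowed -Path '/Pages/default.aspx' -Disallow $rules                   # False
```

The case-sensitive comparison is why `/_layouts/` is written in lowercase: a rule of `/_Layouts/` would not match the lowercase `/_layouts/15/...` URLs SharePoint generates.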
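Save the robots.txt file, then run the following PowerShell as an administrator (for example, in the SharePoint 2013 Management Shell):
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue  # load the SharePoint cmdlets if needed
$file = [System.IO.File]::ReadAllBytes("<robots.txt full path>")            # raw bytes of robots.txt
$siteToAddFile = Get-SPSite "<Site to add the robots.txt>"
# Add robots.txt to the root web's root folder; $true overwrites an existing copy
$siteToAddFile.RootWeb.Files.Add("robots.txt", $file, $true) | Out-Null
$siteToAddFile.Dispose()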
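Because crawlers only request /robots.txt from the root of a host, every HNSC needs its own copy. Rather than running the script once per site, you could loop over a web application, as in the sketch below. It assumes every host-named site collection in that web application should receive the same robots.txt; `<web application URL>` and the file path are placeholders, and the `HostHeaderIsSiteName` filter is a design choice you can drop if you also want path-based site collections covered.

```powershell
# Load the SharePoint cmdlets if needed (already loaded in the SharePoint 2013 Management Shell)
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Placeholders: replace with your robots.txt path and web application URL
$file = [System.IO.File]::ReadAllBytes("<robots.txt full path>")
$webAppUrl = "<web application URL>"

# Enumerate every site collection in the web application
Get-SPSite -WebApplication $webAppUrl -Limit All | ForEach-Object {
    $site = $_
    try {
        # Only host-named site collections are the root of their own host name
        if ($site.HostHeaderIsSiteName) {
            # $true overwrites any robots.txt already in the root web
            $site.RootWeb.Files.Add("robots.txt", $file, $true) | Out-Null
            Write-Host "Added robots.txt to $($site.Url)"
        }
    }
    finally {
        # Release the SPSite object (and its RootWeb)
        $site.Dispose()
    }
}
```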




Add an explicit managed path for the robots.txt file
  1. Open Central Admin and navigate to Application Management > Define Managed Paths
  2. Select your specific web application from the drop down
  3. Add a new path for /robots.txt
  4. Switch the type to Explicit inclusion
  5. Click OK
  6. Run IISRESET
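The same steps can be scripted with `New-SPManagedPath`, as in the sketch below; `<web application URL>` is a placeholder. Managed paths defined in Central Admin apply to one web application's path-based site collections, whereas host-named site collections use farm-wide host-header managed paths that can only be created in PowerShell with `-HostHeader`. Whether your HNSC setup needs that variant is an assumption to verify against your farm design, so it is left commented out.

```powershell
# Load the SharePoint cmdlets if needed
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Explicit managed path on one web application (steps 1-5 above)
New-SPManagedPath -RelativeURL "robots.txt" -WebApplication "<web application URL>" -Explicit

# Host-named site collections use farm-wide host-header managed paths instead
# (assumption: uncomment only if your HNSC design requires it)
# New-SPManagedPath -RelativeURL "robots.txt" -HostHeader -Explicit

# Step 6; repeat on each web server in the farm
iisreset
```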
