Thursday, September 26, 2013

SharePoint 2010 Enterprise Search File Types Inclusion - not Exclusion

I recently had a client who was looking to move their internet-facing search infrastructure from the expensive Google Search Appliance (GSA) to SharePoint 2010 Enterprise Search. After creating and configuring the content sources, I launched a full crawl. Once the crawl completed, I noticed that my index contained only about 66% of the content the GSA had indexed.

As I dug into the GSA configuration, I came across a frustrating difference. The GSA keeps a list of file extensions to EXCLUDE from crawls, and it was a very short list (.jpg, .gif, .mov, .avi, .mp3, etc.). My problem was that SharePoint 2010 Enterprise Search's file type list is an inclusion list, not an exclusion list. Out of the box (OOTB), it is pre-populated with roughly 20 Office and other common document extensions. This helps with indexing collaboration documents and content without indexing lots of files that collaboration users would never need.

While that's a time saver for those implementing standard Enterprise Search in an intranet scenario, it is not a good model for indexing external, public-facing content (an internet site, line-of-business applications or databases) for internet users, because that content is often spread across a wide range of file types.
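To make the difference concrete, here is a tiny PowerShell model of the two behaviours. It is only an illustration: it does not call SharePoint or reproduce the crawler's real logic, and the function names Get-UrlExtension and Test-ShouldIndex, as well as the sample extension lists, are hypothetical.

```powershell
# Illustrative model only (hypothetical helpers, not a SharePoint API):
# decides whether a URL would be indexed under an inclusion list vs an exclusion list.

function Get-UrlExtension {
    param([string]$Url)   # expects an absolute URL, e.g. http://host/path/file.pdf
    # AbsolutePath drops the query string and fragment
    $path = ([System.Uri]$Url).AbsolutePath
    $leaf = $path.Substring($path.LastIndexOf('/') + 1)
    if ($leaf -notmatch '\.') { return '' }   # no extension at all
    return $leaf.Substring($leaf.LastIndexOf('.') + 1).ToLowerInvariant()
}

function Test-ShouldIndex {
    param(
        [string]$Url,
        [string[]]$Extensions,
        [switch]$IsIncludeList
    )
    $ext = Get-UrlExtension $Url
    $listed = $Extensions -contains $ext        # -contains is case-insensitive
    if ($IsIncludeList) { return $listed }      # inclusion: index ONLY listed types
    return -not $listed                         # exclusion: index everything NOT listed
}
```

For example, with a hypothetical inclusion list, `Test-ShouldIndex 'http://example.com/report.xls' 'doc','docx','pdf','aspx' -IsIncludeList` returns False, so the spreadsheet is silently skipped. With a GSA-style exclusion list, `Test-ShouldIndex 'http://example.com/report.xls' 'jpg','gif','mov','avi','mp3'` returns True, so it is indexed. Any file type nobody thought to add is lost under the first model and kept under the second.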

For those of you out there who are screaming "my SharePoint 2010 farm has an exclusion list for crawled file types"... I bet you are running FAST Search rather than Enterprise Search, since FAST Search Server 2010 for SharePoint (FS4SP) does indeed use an exclusion list instead of an inclusion list.

After a bit of digging, I found a post that details how to replace Enterprise Search's inclusion list with an exclusion list, flipping the scenario upside down and providing the same configuration as the GSA.

Thanks to Allen Wang's and Venkat's posts: SharePoint 2010 Search File Type Include or Exclude
Thanks to Venkat's post for the PowerShell Solution: SharePoint 2010 Enterprise search to maintain Exclusion List for Crawled file Types Instead of Inclusion List

<Excerpted from above blog>

To flip the current Search Service Application so that it maintains an exclusion list of file types instead of an inclusion list, run the following commands in the SharePoint PowerShell console:

  • Get the Search Service Application by its Application Class ID:
$sa = Get-SPServiceApplication | where { $_.ApplicationClassId -eq "52547a3d-66ed-468e-b00a-8c4a3ec7d404" }


  • Set the Search Service Application to maintain an exclusion list of file types:

$sa.SetIsExtensionIncludeList($sa.GetVersion(),0);

  • Stop and start the search service (OSearch14):

net stop OSearch14
net start OSearch14

  • Remove the existing file types. Because the list now acts as an exclusion list, the default document types it still contains would otherwise be excluded from the crawl.
*Replace "SSA" with the name of your Search Service Application*

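$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "SSA"
$content = New-Object Microsoft.Office.Server.Search.Administration.Content($ssa)
$extList = $content.ExtensionList
# Copy the entries first so ExtensionList is not modified while it is being enumerated
$list = New-Object System.Collections.ArrayList
foreach ($ext in $extList)
{
    [void]$list.Add($ext)   # [void] suppresses the index returned by ArrayList.Add
}
# Delete each entry, echoing its extension as it is removed
for ($i = 0; $i -lt $list.Count; $i++)
{
    $ext = $list[$i]
    $ext.FileExtension
    $ext.Delete()
}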

  • Run a full crawl on the content source. You should now see every page being crawled except the file types in the exclusion list.
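Once the list has been flipped and emptied, the exclusions themselves still have to be added. Below is a minimal sketch of that step, assuming the SharePoint 2010 Management Shell and an SSA named "SSA". The helper names ConvertTo-ExtensionList and Add-CrawlExclusions are hypothetical, and the sample extensions just mirror the short GSA list described above. The sketch assumes that, after the flip, New-SPEnterpriseSearchCrawlExtension adds entries to the exclusion list; check the File Types page of the Search Service Application in Central Administration before relying on it.

```powershell
# Sketch (hypothetical helpers): populate the flipped, now-empty list with exclusions
# and optionally start a full crawl. Assumes SharePoint 2010 and an SSA already flipped
# to exclusion mode as described above.

function ConvertTo-ExtensionList {
    param([string[]]$Extensions)
    # Normalise: trim spaces and leading dots, lowercase, drop blanks and duplicates
    $Extensions |
        ForEach-Object { $_.Trim().TrimStart('.').ToLowerInvariant() } |
        Where-Object { $_ -ne '' } |
        Sort-Object -Unique
}

function Add-CrawlExclusions {
    param(
        [string]$SearchApplicationName,
        [string[]]$Extensions,
        [switch]$StartFullCrawl
    )
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
    $ssa = Get-SPEnterpriseSearchServiceApplication -Identity $SearchApplicationName
    # Extensions already in the list, so they are not added twice
    $existing = @($ssa | Get-SPEnterpriseSearchCrawlExtension | ForEach-Object { $_.FileExtension })
    foreach ($ext in (ConvertTo-ExtensionList $Extensions)) {
        if ($existing -notcontains $ext) {
            # In exclusion mode, each entry added here is a file type NOT to crawl
            New-SPEnterpriseSearchCrawlExtension -SearchApplication $ssa -Name $ext | Out-Null
        }
    }
    if ($StartFullCrawl) {
        # Starts a full crawl of EVERY content source in this SSA
        Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa |
            ForEach-Object { $_.StartFullCrawl() }
    }
}
```

Usage might look like `Add-CrawlExclusions -SearchApplicationName 'SSA' -Extensions 'jpg','gif','mov','avi','mp3' -StartFullCrawl`. The -StartFullCrawl switch kicks off every content source at once, which can be heavy on a large farm; leave it off and start crawls from Central Administration if you prefer to stagger them.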

Tuesday, September 24, 2013

Modifying File Types in SharePoint 2010 Enterprise Search Index via PowerShell

PowerShell commands for file types in the SharePoint 2010 content index

PowerShell commands for adding, deleting and getting the file types in the content index.

Add a new file type to the content index
$searchApp = Get-SPEnterpriseSearchServiceApplication "<NAME OF YOUR SEARCH SERVICE APPLICATION>"
$searchApp | New-SPEnterpriseSearchCrawlExtension "ascx"

Get a particular file extension
$searchApp = Get-SPEnterpriseSearchServiceApplication "<NAME OF YOUR SEARCH SERVICE APPLICATION>"
$searchApp | Get-SPEnterpriseSearchCrawlExtension "ascx"

Display a list of all file extensions
$searchApp = Get-SPEnterpriseSearchServiceApplication "<NAME OF YOUR SEARCH SERVICE APPLICATION>"
$searchApp | Get-SPEnterpriseSearchCrawlExtension

Delete an extension
$searchApp = Get-SPEnterpriseSearchServiceApplication "<NAME OF YOUR SEARCH SERVICE APPLICATION>"
$searchApp | Get-SPEnterpriseSearchCrawlExtension "ascx" | Remove-SPEnterpriseSearchCrawlExtension
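Because the inclusion/exclusion flip described in the September 26 post deletes every entry in this list, it may be worth saving the list first. Here is a sketch built only on the cmdlets above plus the standard Export-Csv and Import-Csv; the function names, the CSV path and the SSA name are placeholders, not part of SharePoint.

```powershell
# Sketch (hypothetical helpers): back up and restore the crawl file-type list via CSV.

function Export-CrawlExtensions {
    param([string]$SearchApplicationName, [string]$Path)
    Get-SPEnterpriseSearchServiceApplication -Identity $SearchApplicationName |
        Get-SPEnterpriseSearchCrawlExtension |
        Select-Object FileExtension |
        Export-Csv -Path $Path -NoTypeInformation
}

function Read-CrawlExtensionBackup {
    param([string]$Path)
    # Returns just the extension names stored in a backup file, skipping blanks
    Import-Csv -Path $Path | ForEach-Object { $_.FileExtension } | Where-Object { $_ }
}

function Import-CrawlExtensions {
    param([string]$SearchApplicationName, [string]$Path)
    $ssa = Get-SPEnterpriseSearchServiceApplication -Identity $SearchApplicationName
    foreach ($ext in (Read-CrawlExtensionBackup $Path)) {
        # Adds to whichever list mode (inclusion or exclusion) the SSA is currently in
        New-SPEnterpriseSearchCrawlExtension -SearchApplication $ssa -Name $ext | Out-Null
    }
}
```

For example, run `Export-CrawlExtensions -SearchApplicationName 'SSA' -Path 'C:\Temp\crawl-extensions.csv'` before any bulk deletion, and `Import-CrawlExtensions` with the same arguments to put the entries back. Note the assumption in the comment: a restore re-adds the entries to the list in its current mode, so restoring an old inclusion list into a flipped SSA would turn those types into exclusions.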

Friday, September 6, 2013

Office 365 - SharePoint Online Improves Limits and Makes It Easier to Restore Documents

Based on feedback and reports on how customers use the service, Microsoft made the following improvements to SharePoint Online:

Improved file upload experience
  • Increased the file upload limit from 250 MB to 2 GB
  • Expanded the range of supported file types: SharePoint Online now accepts a broader range of file types, notably .exe and .dll.
Screenshot: uploading large files into SkyDrive Pro by dragging and dropping them from the desktop into the web interface (this also applies to team site document libraries).
See the SharePoint Online blocked file type list.
Increased site collection and list look-up limits
  • Increased the site collection limit from 3,000 to 10,000
  • Increased the list look-up threshold to 12 look-ups
*Note: the site collection increase applies only to Office 365 Enterprise plans (including Education and Government). Office 365 Small Business and Midsize Business remain at one site collection and twenty site collections, respectively.
Review the list of all SharePoint Online boundaries and limits.
Improved self-restoration
  • Increased recycle bin retention from 30 to 90 days
  • Turned versioning on by default for new SkyDrive Pro libraries, retaining 10 versions
Screenshot: a user's SkyDrive Pro Recycle Bin, accessed by clicking the gear icon > Site contents.


  • All of the above announcements apply to all Office 365 business plans, except the 10,000 site collection increase, which applies only to Office 365 Enterprise plans (including Education and Government).
  • These updates do not apply to the Office 365 Home Premium offering, which combines the latest Office applications with Skype and SkyDrive storage.
  • Office 365 dedicated plans are not receiving this update because they are managed on a unique, isolated infrastructure.
  • SharePoint will not execute arbitrary EXEs or DLLs that a user uploads to a team site or to their SkyDrive Pro.
    1. SharePoint only accepts uploads from authenticated users, reducing the risk that an outside attacker could post malicious files.
    2. SharePoint has a built-in antivirus scanning engine to detect malicious files.
    3. If users attempt to execute a malicious file in their synced folders, Outlook and Windows show warning dialogs requesting the user's consent before the file can execute.
    4. Many users also run antivirus applications on their client computers, which would detect and quarantine malicious files.
    5. Finally, admins who are concerned about these scenarios can enable auditing on any document library to identify which user originally uploaded a malicious file.

Wednesday, September 4, 2013