Content Search is a search tool that returns Documents whose contents match specific text. Content Search is also known as dtSearch.
This method of searching is useful for finding Documents that contain text not captured in the metadata. Content searching is most effective for rare words, since common words are likely to return many results.
Content Search returns Documents that match the query, but it does not highlight the section of a Document that contains the match. The text must be selectable as text (not part of an image, for instance) to be included in the results.
Only Documents that have been indexed by Content Search will be returned. To be indexed, a Document must be one of the indexable file types and must fall within the file size limits. These requirements are outlined below.
File types that will be indexed:
- Microsoft Office document types:
  - Microsoft Word documents (including macro-enabled and template files)
  - Microsoft Excel files (including macro-enabled and template files)
  - Microsoft PowerPoint files (including macro-enabled and template files)
- OpenOffice document files
- Email files
- Text files (plain text and HTML)
File size limits also apply:
- Spreadsheet files (MS Office and OpenOffice) cannot exceed 8 MB
- Text files cannot exceed 2 MB
- All other types cannot exceed 32 MB
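The size limits above can be expressed as a simple lookup. The following Python sketch is illustrative only and is not part of Content Search: the `FileCategory` names and the `within_size_limit` helper are hypothetical, and it assumes that 1 MB means 1,048,576 bytes and that a file exactly at the limit is still accepted.

```python
from enum import Enum

# Assumption: 1 MB is treated as 1024 * 1024 bytes.
MB = 1024 * 1024


class FileCategory(Enum):
    """Hypothetical categories mirroring the three size-limit rules."""
    SPREADSHEET = "spreadsheet"  # MS Office and OpenOffice spreadsheets
    TEXT = "text"                # plain text and HTML
    OTHER = "other"              # every other indexable type


# Maximum indexable size per category, taken from the list above.
SIZE_LIMITS = {
    FileCategory.SPREADSHEET: 8 * MB,
    FileCategory.TEXT: 2 * MB,
    FileCategory.OTHER: 32 * MB,
}


def within_size_limit(category: FileCategory, size_bytes: int) -> bool:
    """Return True if a file of this category and size is within its limit.

    Assumption: "cannot exceed" means a file exactly at the limit passes.
    """
    return size_bytes <= SIZE_LIMITS[category]
```

For example, `within_size_limit(FileCategory.TEXT, 3 * MB)` evaluates to `False`, because text files are capped at 2 MB. A check like this covers size only; a file within its limit can still be skipped for the reasons listed next.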
Files will not be indexed if:
- They are password protected
- They are corrupted
- They have restrictive access rights encoded within the file (e.g. some PDFs)
- They are a file type that is not supported by Content Search
- They contain features not supported by Content Search, e.g. some kinds of embedded objects (in which case the file may be only partially indexed)
- They are PDF files in an "image" format, where the text is not selectable