Word proximity searching

  • Word proximity searching

    First time poster, long time user.

    I have been using OSF* tools for quite some time now to gather disk images and analyze their contents through indexing and searching in support of legal matters. I recently received a request to search emails for terms that appear in proximity to one another. After indexing the mailboxes, I would like to use a Word Search List to look for several terms in the index. However, my terms also include words that must appear in proximity to one another.

    I see in the documentation that regular expressions can be used in a 'filter' control, but that control does not seem to be available in the latest version. Additionally, I have attempted to use regular expressions in the word search file, but it does not appear to be treating the line-by-line search terms as regular expressions. Note that I am using regular expressions because they can express proximity searches.

    Does a Word Search file accept regular expressions as search terms?
    Where is the 'filter' box referenced in the documentation located in the latest version?
    How would one go about doing word proximity searching in this tool?

    Thanks in advance!

  • #2
    OSF provides a number of advanced search options in the Search Index module.

    Match ANY search words (Boolean OR)

    Search for pages which contain AT LEAST ONE of the given search terms. The results will be sorted in order of the number of terms matched, and the determined relevancy score. Click the "any search words" radio button to enable this search option.

    Match ALL search words (Boolean AND)

    Search for pages which contain ALL of the given search terms. The results will be sorted in order of the number of terms matched, and the determined relevancy score. Click the "all search words" radio button to enable this search option.

    Wildcard searches

    You can use the wildcard characters '*' and '?' in your search terms to match multiple words and return a larger set of results. An asterisk ('*') in a search term represents any number of characters, while a question mark ('?') represents any single character.

    This allows you to perform advanced searches such as "zoom*", which would return all pages containing words beginning with "zoom". Similarly, "z??m" would return all pages containing four-letter words beginning with 'z' and ending with 'm', and "*car*" would return all pages containing words that include the string "car". More complex full regular expressions are not supported when searching an index.
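
    As a rough illustration of the wildcard semantics described above (not how OSF implements them internally), the two wildcard characters can be mapped onto a regular expression in Python. The function names `wildcard_to_regex` and `matching_words` are hypothetical, and treating matching as case-insensitive and limited to word characters is an assumption.

```python
import re

def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate a term using '*' and '?' into a regex (assumed semantics)."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\w*")        # any number of word characters
        elif ch == "?":
            parts.append(r"\w")         # exactly one word character
        else:
            parts.append(re.escape(ch)) # literal character
    return re.compile("".join(parts), re.IGNORECASE)

def matching_words(term: str, text: str) -> list:
    """Return the words in text that match the wildcard term in full."""
    pattern = wildcard_to_regex(term)
    return [w for w in re.findall(r"\w+", text) if pattern.fullmatch(w)]

text = "Zoom in on the zoomed car; a scary zoom lens."
print(matching_words("zoom*", text))   # words beginning with "zoom"
print(matching_words("z??m", text))    # four-letter words z..m
print(matching_words("*car*", text))   # words containing "car"
```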

    Exact phrase searches

    An exact phrase search returns results where the words of the phrase are found together, in the same order that they are specified. For example, an exact phrase search for "green tea" would only return results where the phrase 'green tea' appears. It would not return pages where the words 'green' and 'tea' are found separately, or in a different order such as 'tea green'.

    To specify an exact phrase search term, enclose the words that form the phrase in double quotation marks. You can also combine exact phrase searches with normal search terms and wildcard search terms within a single search query (e.g. "green tea" japan*). Note, however, that wildcards within exact phrases (e.g. "green te*") are not supported.
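
    To make the exact phrase behaviour concrete, here is a minimal Python sketch that checks whether the words of a phrase occur consecutively and in order. It only illustrates the rule described above. The helper names `tokenize` and `contains_phrase` are hypothetical, and lower-casing and splitting on non-word characters are assumptions rather than OSF's actual tokenization.

```python
import re

def tokenize(text):
    """Split text into lower-case word tokens (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())

def contains_phrase(phrase, text):
    """True if the words of phrase appear consecutively, in order, in text."""
    words = tokenize(phrase)
    tokens = tokenize(text)
    n = len(words)
    if n == 0:
        return False
    # Slide a window of n tokens across the text and compare to the phrase.
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))

print(contains_phrase("green tea", "I drink green tea daily"))  # phrase present
print(contains_phrase("green tea", "tea green"))                # wrong order
print(contains_phrase("green tea", "green leaves and tea"))     # words separated
```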

    Exclusion/negative searches

    You can precede a search term with a hyphen character to exclude that search term from being included in your search results. For example, a search for "cat -dog" would return all pages containing the word "cat" but not the word "dog".
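
    The exclusion rule can be sketched in Python as follows. This is an illustration only: the function name `page_matches` and the sample pages are hypothetical, and requiring ALL non-excluded terms to be present is an assumption made to keep the example short.

```python
import re

def page_matches(query, page_text):
    """True if the page has every plain term and none of the '-' terms."""
    words = set(re.findall(r"\w+", page_text.lower()))
    terms = query.lower().split()
    included = [t for t in terms if not t.startswith("-")]
    # Strip the leading hyphen from excluded terms; ignore a bare "-".
    excluded = [t[1:] for t in terms if t.startswith("-") and len(t) > 1]
    has_all = all(t in words for t in included)
    has_excluded = any(t in words for t in excluded)
    return has_all and not has_excluded

# Hypothetical sample pages, for illustration only.
pages = {
    "a.txt": "The cat sat.",
    "b.txt": "A cat chased a dog.",
    "c.txt": "Only a dog here.",
}
hits = [name for name, body in pages.items() if page_matches("cat -dog", body)]
print(hits)
```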

    Date range and Email field filters
    These additional filters appear in the advanced search options window.

    Some additional comments on proximity: this is one of the factors used to determine the score, which in turn determines the sort order. For example, in a multi-word search, if all the words are found in the same section of the document, the result is more likely to be near the top of the result set. However, you can't search for words that are, for example, exactly 10 words apart.
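
    If a strict proximity test is needed, one possible workaround is to run it outside the tool on text that has already been extracted (for example, exported email bodies). The Python sketch below is not an OSF feature. The function name `within_proximity`, the sample sentence, and the word-based distance measure are all assumptions made for illustration.

```python
import re

def within_proximity(text, word_a, word_b, max_distance):
    """True if word_a and word_b occur within max_distance words
    of each other, in either order (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower())
    a, b = word_a.lower(), word_b.lower()
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    # Compare every occurrence of word_a with every occurrence of word_b.
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

# Hypothetical sample text, for illustration only.
email = "Please wire the payment to the offshore account before Friday."
print(within_proximity(email, "payment", "account", 5))  # 4 words apart
print(within_proximity(email, "payment", "account", 3))  # too far apart
```

    Counting distance in whole words rather than characters keeps the check independent of word length, which is usually what a "within N words" request means.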

    There are other modules in OSF that do use regular expressions, but not the Search Index function. So maybe that's where the reference to a filter came from?