Search Index (German Umlaute)


  • David (PassMark)
    replied
    I am guessing you are referring to the search History tab. There is no way to export the contents of the tab from the user interface.

    But there is still a way to get all the data. There is a series of XML files that contains the search history for each case and each set of index files.

    The default location of the files is shown below, but the location will vary depending on the folder you selected for your case and the name you gave your index.

    C:\Users\<User Name>\Documents\PassMark\OSForensics\Cases\<Case Name>\Index\<Index name>\History\<history temp file name>.xml

    There is one XML file per search performed, and the XML files contain the search results as well as the search terms used, so it may not be exactly what you are after.

    We'll also look at adding a better export to CSV function to the UI.
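
Until such an export exists, the XML history files could be flattened into a CSV with a small script. The sketch below is only an illustration: the layout of the history XML is not described in this thread, so the code makes no assumption about element names and simply writes one row per element that carries text or attributes. The function names `flatten_xml` and `export_history` are hypothetical, and the History folder path has to be supplied by the user.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def flatten_xml(xml_data, source_name):
    """Return one row per element that has text or attributes.

    xml_data may be str or bytes. No particular schema is assumed.
    """
    root = ET.fromstring(xml_data)
    rows = []
    stack = [(root, root.tag)]
    while stack:
        elem, path = stack.pop()
        text = (elem.text or "").strip()
        if text or elem.attrib:
            attrs = ";".join(f"{k}={v}" for k, v in elem.attrib.items())
            rows.append([source_name, path, attrs, text])
        # Push children in reverse so they are visited in document order.
        for child in reversed(list(elem)):
            stack.append((child, f"{path}/{child.tag}"))
    return rows


def export_history(history_dir, csv_path):
    """Flatten every *.xml file in history_dir into a single CSV file."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "element_path", "attributes", "text"])
        for xml_file in sorted(Path(history_dir).glob("*.xml")):
            try:
                rows = flatten_xml(xml_file.read_bytes(), xml_file.name)
            except ET.ParseError:
                continue  # skip files that are not well-formed XML
            writer.writerows(rows)
            count += len(rows)
    return count
```

Calling `export_history(<History folder>, "history.csv")` would produce one CSV for the whole folder. Once the real element names are known, the generic `element_path` column can be replaced with dedicated columns for search terms and results.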



  • Forensik
    replied
    Is there any chance to export or print the search index result list?

    On the screen it is presented as a nice table that only offers the option to delete single items or the whole list.

    Best regards




  • Forensik
    replied
    Hi Mark,

    I ran several searches containing umlauts like ä, ö, ü and ß, and documents containing the characters were found in e-mails, attachments, and DOC, DOCX, PDF and PPT files.

    Thank you very much for the fast fix.



  • David (PassMark)
    replied
    This should be fixed in the V1.2 Alpha released today.
    http://www.passmark.com/forum/showth...a-Beta-release

    If you can check this out and confirm this is the case, that would be good.



  • David (PassMark)
    replied
    We did some testing with various documents and various indexing options.

    It seems there is a bug in the German stemmer. A stemmer is an algorithm that allows you to search for the word "run" and also find results for "running" and "runs".

    In some cases the German ß character is not handled correctly when a word is stemmed.
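
To illustrate what a stemmer does in a search index, here is a deliberately tiny sketch. It is not the stemmer used by the product and it is not a real German or English stemming algorithm; `toy_stem`, `build_index` and `search` are hypothetical names made up for this example. The point it shows is that words are reduced to a common stem both when indexing and when searching, so a character that is mishandled during stemming can make an indexed word unreachable.

```python
def toy_stem(word):
    """Very rough suffix stripping, for illustration only."""
    word = word.lower()
    for suffix in ("ing", "s"):
        # Only strip if at least three letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # "runn" -> "run": collapse a doubled final consonant.
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


def build_index(docs):
    """Map each stem to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(toy_stem(word), set()).add(doc_id)
    return index


def search(index, term):
    """Stem the query the same way as the indexed words, then look it up."""
    return sorted(index.get(toy_stem(term), set()))


docs = {"a": "he runs daily", "b": "running is fun", "c": "walk"}
index = build_index(docs)
print(search(index, "run"))  # ['a', 'b']
```

Because the index stores stems rather than the original words, index-time and query-time stemming must agree exactly for every character, including ß.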

    A quick workaround is to turn off stemming. See the screenshot below for how to do this.


    We are working on a full solution to enable the stemmer to be used. The solution should be in the next V1.2 patch release.



  • Forensik
    replied
    All sorts of documents, mainly PDF and DOC (old format), but also XLS and DOCX.



  • David (PassMark)
    replied
    What exact type of document was this from?
    Word, PDF, Excel, PowerPoint, OpenOffice, etc.?
    Also, if it was a Microsoft Office format, was it the new Office format (e.g. .DOCX) or the old format (e.g. .DOC)?



  • Forensik
    replied
    Hi there,

    I indexed and searched Office documents. There were several documents in the case matching the umlaut word, but none was found by the "hausübung" search, though it worked with "haus*bung". I am quite sure that I did not alter the "Enable accent/diacritic insensitivity" option; I might have checked the stemming option for German.

    best regards



  • Ray (PassMark)
    replied
    I just did some testing and confirmed that searching for words with umlauts works with correctly encoded HTML files; it worked correctly in our test cases.

    So yes, we would need more details, such as the type of file you are searching, to verify further. It could be a text file with a wrong or unexpected encoding. It could be a PDF file with an unusual text layer that did not match the OCR image, etc. If you can send us a copy of the file, even better.
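
As a general illustration of how a wrong or unexpected encoding can hide an umlaut from a search, the sketch below decodes UTF-8 bytes as Latin-1. This is only an example of the failure mode, not a diagnosis of this case; the helper name `misdecode` is hypothetical, and Python's `fnmatchcase` stands in for a wildcard search.

```python
from fnmatch import fnmatchcase


def misdecode(text, wrong_encoding="latin-1"):
    """Encode as UTF-8, then decode with the wrong single-byte encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


word = "haus\u00fcbung"          # "hausübung", 9 characters
garbled = misdecode(word)        # the ü becomes two characters

print(len(word), len(garbled))   # 9 10
print(garbled == word)           # False: an exact search would miss it

# "?" matches exactly one character, "*" matches any run of characters.
print(fnmatchcase(garbled, "haus?bung"))  # False
print(fnmatchcase(garbled, "haus*bung"))  # True
```

In UTF-8 the letter ü is stored as two bytes, so a reader that assumes a single-byte encoding sees two characters where one was intended.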

    Another thing to confirm is whether you changed the "Advanced" settings (under "Step 2" in the Create Index process) and checked the option "Enable accent/diacritic insensitivity".

    This would cause a word like "heiße" to be indexed as "heisse", so it would be found by searches for both "heiße" and "heisse". It also affects wildcard searches: "hei*e" would match but "hei?e" would not, because the ß is internally treated as two letters (ss).
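
The wildcard behaviour described above can be reproduced with a short sketch. The folding rule here (ß to "ss", diacritics stripped) is an assumption made for illustration and may differ from what the product does internally; `fold` and `matches` are hypothetical names, and Python's `fnmatchcase` stands in for the wildcard matcher.

```python
import unicodedata
from fnmatch import fnmatchcase


def fold(word):
    """Lowercase, expand ß to 'ss', and strip combining diacritics."""
    word = word.lower().replace("\u00df", "ss")
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(pattern, word):
    """Match a wildcard pattern against the folded form of a word."""
    return fnmatchcase(fold(word), fold(pattern))


word = "hei\u00dfe"              # "heiße"
print(fold(word))                # heisse
print(matches("hei*e", word))    # True:  "*" covers both letters of "ss"
print(matches("hei?e", word))    # False: "?" covers only one letter
print(matches("hei??e", word))   # True
```

With folding enabled, a single-character wildcard has to be doubled to cover a folded ß, which is why "hei*e" is the safer pattern.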



  • David (PassMark)
    replied
    It should work.
    What type of document was this from?
    For example, a text file, a PDF, an e-mail, a Word file, etc.



  • Forensik
    started a topic Search Index (German Umlaute)

    Search Index (German Umlaute)

    It seems the Search Index does not handle German umlauts like ü, ä, ö or specific German characters like "ß". Am I right? Or is there any way I can activate this?

    A search for "hausübung" retrieves no results. "haus?bung" also returns no results, while "haus*bung" shows N results, all containing the word.

    best regards
    Last edited by Forensik; 08-13-2012, 03:30 PM.