Search for SSN, Date, Phone, etc

  • David (PassMark)
    replied
    Hmmm. I thought the RegEx issue was solved with the build we sent you? Wasn't it?
    We aren't aware of any crash problems in the release we sent. Can we get more detail?
    We also aren't aware of any problem with the filter function not exporting the data you want.

    I'll E-mail you, as these sound like fairly minor issues that are easy to solve if we have the details.

    Update: This change to RegEx is also in the V1.2 public release.


  • bfbcping
    replied
    I appreciate all the work you guys have done to support this product. Unfortunately, OSForensics has proven to be a bit too unreliable and buggy for use as a forensics product. I really like your concept of simplifying the process with easy-to-follow steps and simple results. I spent 12 years at my last job using a very complicated and expensive forensics program for more nitty-gritty investigating, but the current application seemed to be right in your wheelhouse: mining already gathered data for specific patterns and intelligence.

    Unfortunately, the RegEx issues combined with a few crashes, some commands not responding at all, and the filter export only listing information and counts without locations or other contextual data have made us decide to look elsewhere for our limited use case.

    Again, I appreciate your time and effort in supporting our testing process.


  • David (PassMark)
    replied
    We are going to E-mail you a link to a pre-release of the next minor OSF release.
    It will have changes to the regex handling noted above, among other things.


  • David (PassMark)
    replied
    In fact you might find that a much lower limit is also OK.
    e.g. 50MB
    But it really depends on what is on the disk that is being indexed. The value should, in theory, auto-set itself to something reasonable.


  • bfbcping
    replied
    I'll try with 1GB, but the actual PST is around 2.5GB.

    tim
    Last edited by bfbcping; 06-25-2012, 03:40 PM.


  • Ray (PassMark)
    replied
    We've checked into this.

    A 2GB PST file does not need a "Max file size limit" set to 2GB. The "Max file size limit" only applies to the individual messages and attachments found within the PST file.

    As for ISO files, as noted above, we are not indexing the contents of ISO files. You should mount the ISO image as a separate drive (under "Manage case"->"Add device") and index it separately. You must have manually added ".iso" to the list of extensions to index. We would advise removing it.

    So there should be no need to specify a max file size over 2GB if the only reason you are doing so at the moment is for PST files and ISO files.

    Having said that, you might have run into some other limit during your original indexing attempt. You should check the log message. Perhaps you only needed to increase your max file size to 1GB?


  • David (PassMark)
    replied
    Certainly it should be working for PST files > 2GB. If it isn't, we need to fix it. (I had the feeling we had already tested out to at least 5GB per PST file.) Did you try indexing your large PST file with the file size limit set to a lower value?

    For an ISO there is no point trying to do a direct text extraction. I think at the moment OSF isn't automatically unpacking the ISO to index the files inside. You would need to mount it as a separate drive to index it.

    Let me check the behavior on these file types and get back to you.


  • bfbcping
    replied
    Originally posted by David (PassMark)
    If you need a quick patch release for the RegEx change, let me know. Otherwise we'll put it into the next public release in a week or so.

    For the file size issue I think the key question is why are you trying to set the file size limit above 2GB? In other words, what single file on your hard disk is larger than 2GB and is worth indexing for its text content?

    Note that the limit doesn't apply at all to compound files. For example for Zip files, the limit applies to each of the files inside of the Zip file and not the Zip file itself.

    If there is a good reason we can increase the per file limit beyond 2GB, at the expense of using more RAM, but we aren't aware of any reason why this might be required.
    If you have a patch that will fix the RegEx issue, it would be swell.

    We have tested on several large files including ISOs (which are compound files) and, more importantly, PST files.

    Thanks,

    tim


  • David (PassMark)
    replied
    If you need a quick patch release for the RegEx change, let me know. Otherwise we'll put it into the next public release in a week or so.

    For the file size issue I think the key question is why are you trying to set the file size limit above 2GB? In other words, what single file on your hard disk is larger than 2GB and is worth indexing for its text content?

    Note that the limit doesn't apply at all to compound files. For example for Zip files, the limit applies to each of the files inside of the Zip file and not the Zip file itself.

    If there is a good reason we can increase the per file limit beyond 2GB, at the expense of using more RAM, but we aren't aware of any reason why this might be required.
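
To make the compound-file point concrete, here is a minimal sketch of a per-member size check on a Zip archive. It is an illustration only, not how OSForensics is implemented: the function name `members_within_limit`, the sample file names, and the tiny byte-sized limit are all assumptions made up for the example.

```python
import io
import zipfile


def members_within_limit(zip_bytes, max_file_size):
    """Split a zip's members into (indexable, skipped) by uncompressed size.

    The limit is compared against each member, never against the archive.
    """
    indexable, skipped = [], []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.file_size <= max_file_size:
                indexable.append(info.filename)
            else:
                skipped.append(info.filename)
    return indexable, skipped


# Build a small uncompressed archive in memory (hypothetical contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", "a" * 600)
    zf.writestr("b.txt", "b" * 600)
    zf.writestr("huge.bin", "c" * 5000)
archive = buf.getvalue()

LIMIT = 1000  # bytes; stands in for the "Max file size limit" setting

# The archive itself is larger than LIMIT, yet two members are still indexable.
print(len(archive) > LIMIT, members_within_limit(archive, LIMIT))
```

Under this model the container's own size never enters the comparison, which is why a large Zip would not by itself require a larger limit.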


  • bfbcping
    replied
    Originally posted by Tim (PassMark)
    The first reason that no results were returned from the more complicated "valid only" expressions was that we were using a function to execute the expression that favored speed, so the more complicated ones were not being executed properly. We'll be making some changes to the next build so we call a slower, but more accurate, PCRE function to return the results.

    Another issue may be that these two "valid only" expressions are still not 100% accurate. In some of our tests (both in OSForensics and using http://www.regextester.com/),

    (?!000)(?!666)([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(?!00)(\d{2})\2(?!0000)(\d{4})
    returned more junk results than the simpler ones, for example it will match "000000000000000000001234567890AB"
    "001ISB601089506"
    and
    "0x7619693978f91d90539ae786".

    The other one,
    /^([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(\d{2})\2(\d{4})$/
    doesn't seem to match even valid SSNs.
    Yeah, the second one I had pulled down from a help board while I was just trying to find something that would work in OSForensics. The former was the one we've been using in Spider 2008 (which we are looking at replacing). I'll be looking forward to trying the next build.

    OSForensics might be out of the running, however, as we're still having issues with the file size limitation. We've tried it on a Win7 64-bit machine on NTFS (BitLocker enabled) and a Win8 Preview 32-bit on NTFS (no BitLocker) with similar results concerning any files over 2GB. The log simply states to Check Configuration Settings, and when we try to change the setting, a pop-up comes up that says "Max file size must be less than 2GB."

    thanks,

    tim


  • Tim (PassMark)
    replied
    The first reason that no results were returned from the more complicated "valid only" expressions was that we were using a function to execute the expression that favored speed, so the more complicated ones were not being executed properly. We'll be making some changes to the next build so we call a slower, but more accurate, PCRE function to return the results.

    Another issue may be that these two "valid only" expressions are still not 100% accurate. In some of our tests (both in OSForensics and using http://www.regextester.com/),

    (?!000)(?!666)([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(?!00)(\d{2})\2(?!0000)(\d{4})
    returned more junk results than the simpler ones, for example it will match "000000000000000000001234567890AB"
    "001ISB601089506"
    and
    "0x7619693978f91d90539ae786".

    The other one,
    /^([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(\d{2})\2(\d{4})$/
    doesn't seem to match even valid SSNs.
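
The junk matches reported above can be reproduced outside OSForensics. The sketch below uses Python's `re` module, which is not PCRE but treats the lookaheads and backreference in these two expressions the same way. It shows that the first expression is unanchored, so it finds a nine-digit run inside longer strings. It also shows one possible reason the second expression never matches: if the surrounding `/` delimiters are taken as literal characters, no input can satisfy it. Whether OSForensics actually treats the slashes that way is an assumption, not something confirmed in this thread, and the helper names are made up for the example.

```python
import re

# The first "valid only" expression quoted above, unchanged.
VALID_SSN = (r"(?!000)(?!666)([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)"
             r"(?!00)(\d{2})\2(?!0000)(\d{4})")

JUNK = [
    "000000000000000000001234567890AB",
    "001ISB601089506",
    "0x7619693978f91d90539ae786",
]


def unanchored_hit(word):
    """Return the substring the unanchored pattern finds, or None."""
    m = re.search(VALID_SSN, word)
    return m.group(0) if m else None


def anchored_hit(word):
    """Require the whole word to match, as if wrapped in ^...$."""
    return re.fullmatch(VALID_SSN, word) is not None


for word in JUNK:
    print(word, "->", unanchored_hit(word), "| anchored:", anchored_hit(word))

# The second expression, exactly as posted, including the / delimiters.
SLASHED = r"/^([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(\d{2})\2(\d{4})$/"
BARE = SLASHED[1:-1]  # same expression with the delimiters stripped

# Assumption: the engine treats "/" literally, as Python does.
print(re.search(SLASHED, "123-45-6789"))  # no match possible
print(re.search(BARE, "123-45-6789"))     # matches the whole string
```

Anchoring the first expression rejects all three junk strings while still accepting inputs such as `123-45-6789` and `123456789`.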


  • David (PassMark)
    replied
    Are you using 32bit or 64bit?
    There shouldn't be any file size limit in 64bit.
    Which file are you worried about?

    I hate debugging long regex expressions. Can take forever to find a missing slash. Was this a 2 line expression, or did the forum split it up when you posted it? Are you sure it worked at some point and it was a Perl expression?

    I'll also check whether we have some max line length limit.


  • bfbcping
    replied
    Originally posted by David (PassMark)
    From the help file,

    "FilterOptions
    Perl compatible regular expressions (PCRE) are used when filtering the results displayed when browsing the search index. Several regular expressions have been predefined for quick use, but you can also type your own regular expressions in the edit below the list. Currently the search is case insensitive, so "TEST" will return the same results as "test".

    For example to search for any entry containing the word "test" select the Custom option from the filter drop down list, type "test" and then click the search button. To find only entries that begin with the word "test" use "^test", the "^" character is used to indicate the pattern match must start at the beginning of the found word.

    To search for one of the special characters (eg $ ^ .) you will need to escape the character with "\", eg "\.com". For more information on the format and special characters used see the Perl regular expressions help page.

    There are several pre-configured regular expressions available from the drop down list. These are found in the "RegularExpressions.txt" file in the OSForensics program data directory (ProgramData\PassMark\OSForensics). These have been collected from various sources and are kept as simple as possible while still returning fairly accurate results. Please note these will not be 100% accurate in all situations."


    Note that the regular expressions in this filter work on the index of words and not directly on the full document text. Words in the index are already divided up based on white space and punctuation. So it doesn't make sense to use a regex containing spaces.

    Works well for items like E-mail addresses, however. For which the regex is,
    \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
    Definitely what I was looking for.
    I had tried putting the regular expression into the Custom filter with no success before, but I have since tried some other expressions successfully. It seems that the expressions that are simply built to look for SSN formats work as expected, but ones that are more complicated and try to filter out invalid SSNs do not return ANY results. For example:

    Just find the format:
    ^(\d{3}-?\d{2}-?\d{4}|XXX-XX-XXXX)$
    ^((\d{3}-?\d{2}-?\d{4})|(X{3}-?X{2}-?X{4}))$

    Find valid SSN:
    /^([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(\d{2})\2(\d{4})$/
    (?!000)(?!666)([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(?!00)(\d{2})\2(?!0000)(\d{4})

    Now, we have used the expressions that filter for only valid SSNs in other tools without issue. Does it have something to do with the indexing?

    Also, does the Professional version eliminate the 2GB file size limitation?

    Thanks!


  • David (PassMark)
    replied
    From the help file,

    "FilterOptions
    Perl compatible regular expressions (PCRE) are used when filtering the results displayed when browsing the search index. Several regular expressions have been predefined for quick use, but you can also type your own regular expressions in the edit below the list. Currently the search is case insensitive, so "TEST" will return the same results as "test".

    For example to search for any entry containing the word "test" select the Custom option from the filter drop down list, type "test" and then click the search button. To find only entries that begin with the word "test" use "^test", the "^" character is used to indicate the pattern match must start at the beginning of the found word.

    To search for one of the special characters (eg $ ^ .) you will need to escape the character with "\", eg "\.com". For more information on the format and special characters used see the Perl regular expressions help page.

    There are several pre-configured regular expressions available from the drop down list. These are found in the "RegularExpressions.txt" file in the OSForensics program data directory (ProgramData\PassMark\OSForensics). These have been collected from various sources and are kept as simple as possible while still returning fairly accurate results. Please note these will not be 100% accurate in all situations."


    Note that the regular expressions in this filter work on the index of words and not directly on the full document text. Words in the index are already divided up based on white space and punctuation. So it doesn't make sense to use a regex containing spaces.

    Works well for items like E-mail addresses, however. For which the regex is,
    \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
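
As a rough illustration of filtering an index of words rather than full text, the sketch below applies the patterns from the help text to a small word list, case-insensitively as the help file describes. The word list and the `filter_words` helper are hypothetical, and Python's `re` stands in for PCRE here. The actual contents of an OSForensics index will differ.

```python
import re

# E-mail expression quoted above; IGNORECASE mirrors the help file's
# statement that the search is currently case insensitive.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

# Hypothetical index entries, already split into single "words".
index_words = [
    "test", "Testing", "contest",
    "tim@example.com", "SALES@EXAMPLE.ORG", "555-1234",
]


def filter_words(pattern, words):
    """Keep the words in which the pattern matches anywhere."""
    return [w for w in words if pattern.search(w)]


print(filter_words(EMAIL, index_words))
print(filter_words(re.compile("test", re.IGNORECASE), index_words))
print(filter_words(re.compile("^test", re.IGNORECASE), index_words))
```

The plain `test` filter keeps any entry containing the word, while `^test` keeps only entries that begin with it, matching the two cases described in the help text.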


  • bfbcping
    replied
    Thanks, that will help with phone numbers, but I'm also looking for SSNs (both with and without dashes), or pretty much any other case where I might need placeholders for numbers only.
