Search for SSN, Date, Phone, etc

  • Search for SSN, Date, Phone, etc

    Is there a way to search for a specific number format like ###-##-#### to find ONLY numbers when using OSForensics?

  • #2
    I assume you have created an index with the "Create index" function and now wish to search it.

    The easiest way to do this is to go to the "Search index" function, select the "Browse index" tab, then use the built-in "Phone number - US" filter (bottom right of the window).



    • #3
      Thanks, that will help with phone numbers, but I'm also looking for SSNs (both with and without dashes) and pretty much any other case where I might need placeholders that match numbers only.



      • #4
        From the help file,

        "FilterOptions
        Perl compatible regular expressions (PCRE) are used when filtering the results displayed when browsing the search index. Several regular expressions have been predefined for quick use, but you can also type your own regular expressions in the edit box below the list. Currently the search is case insensitive, so "TEST" will return the same results as "test".

        For example to search for any entry containing the word "test" select the Custom option from the filter drop down list, type "test" and then click the search button. To find only entries that begin with the word "test" use "^test", the "^" character is used to indicate the pattern match must start at the beginning of the found word.

        To search for one of the special characters (eg $ ^ .) you will need to escape the character with "\", eg "\.com". For more information on the format and special characters used see the Perl regular expressions help page.

        There are several pre-configured regular expressions available from the drop down list; these are found in the "RegularExpressions.txt" file in the OSForensics program data directory (ProgramData\PassMark\OSForensics). These have been collected from various sources and are kept as simple as possible while still returning fairly accurate results. Please note these will not be 100% accurate in all situations."
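The three examples from the help text ("test", "^test" and "\.com") can be tried outside the tool. This is only a rough sketch: it uses Python's `re` module rather than PCRE, the word list is made up, and `filter_index` is a hypothetical helper, not an OSForensics function.

```python
import re

def filter_index(words, pattern):
    """Return the words that match `pattern`, ignoring case (as the filter does)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if rx.search(w)]

# Hypothetical index entries, for illustration only.
words = ["test", "Testing", "contest", "example.com", "telecom"]

contains_test = filter_index(words, "test")      # "test" anywhere in the word
starts_with_test = filter_index(words, "^test")  # "test" at the start of the word
literal_dot_com = filter_index(words, r"\.com")  # escaped dot: a real "." is required
any_char_com = filter_index(words, ".com")       # unescaped dot: any character will do
```

The last two lines show why the escape matters: without the backslash, "telecom" is returned as well, because "." matches the "e" before "com".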


        Note that the regular expressions in this filter work on the index of words and not directly on the full document text. Words in the index are already divided up based on white space and punctuation, so it doesn't make sense to use a regex containing spaces.

        The filter works well for items like e-mail addresses, however, for which the regex is:
        \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
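To illustrate why a pattern containing a space cannot match against a word index, here is a small sketch in Python. It assumes a much simpler tokenizer than the real indexer (it splits on white space only), and `index_words` is a hypothetical stand-in. The e-mail regex is the one quoted above, compiled case-insensitively to mirror the filter.

```python
import re

EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)
SPACED = re.compile(r"\d{3} \d{2} \d{4}")  # pattern that contains spaces

def index_words(text):
    # Assumption: split on white space only. The real indexer also splits
    # on punctuation, and its exact rules are not described in this thread.
    return text.split()

text = "Contact bob@example.com about 123 45 6789 today"
words = index_words(text)

email_hits = [w for w in words if EMAIL.search(w)]    # matched word by word
spaced_hits = [w for w in words if SPACED.search(w)]  # no single word has a space
full_text_hit = SPACED.search(text)                   # only the full text can match
```

Each word is tested on its own, so the spaced pattern finds nothing in the word list even though it matches the original sentence.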



        • #5
          Definitely what I was looking for.
          I had tried putting the regular expression into the Custom filter with no success before, but I have since tried some other expressions successfully. It seems that the expressions that are simply built to look for SSN formats work as expected, but ones that are more complicated and try to filter out invalid SSNs do not return ANY results. For example:

          Just find the format:
          ^(\d{3}-?\d{2}-?\d{4}|XXX-XX-XXXX)$
          ^((\d{3}-?\d{2}-?\d{4})|(X{3}-?X{2}-?X{4}))$
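As a quick sanity check of the first format-only expression, here is a sketch using Python's `re` module (an assumption: for a pattern this simple, it should behave like PCRE). The sample strings are invented, and the `IGNORECASE` flag mirrors the case-insensitive filter described in the help file.

```python
import re

FORMAT_ONLY = re.compile(r"^(\d{3}-?\d{2}-?\d{4}|XXX-XX-XXXX)$", re.IGNORECASE)

def looks_like_ssn(word):
    """True if the whole word has the SSN shape; no validity check is made."""
    return FORMAT_ONLY.search(word) is not None

# Invented sample words, for illustration only.
samples = ["123-45-6789", "123456789", "123-456789", "xxx-xx-xxxx",
           "12-345-6789", "1234567890"]
results = {s: looks_like_ssn(s) for s in samples}
```

Because each dash is optional on its own, a mixed form such as "123-456789" is accepted too. The "valid SSN" expressions avoid that by capturing the separator and reusing it with the `\2` backreference.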

          Find valid SSN:
          /^([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(\d{2})\2(\d{4})$/
          (?!000)(?!666)([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(?!00)(\d{2})\2(?!0000)(\d{4})

          Now, we have used the expressions that filter for only valid SSNs in other tools without issue. Does it have something to do with the indexing?

          Also, does the Professional version eliminate the 2GB file size limitation?

          Thanks!



          • #6
            Are you using 32bit or 64bit?
            There shouldn't be any file size limit in 64bit.
            Which file are you worried about?

            I hate debugging long regular expressions. It can take forever to find a missing slash. Was this a two-line expression, or did the forum split it up when you posted it? Are you sure it worked at some point and that it was a Perl expression?

            I'll also check whether we have some max line length limit.



            • #7
              The first reason that no results were returned from the more complicated "valid only" expressions was that we were using a function to execute the expression that favored speed, so the more complicated ones were not being executed properly. We'll be making some changes to the next build so we call a slower, but more accurate, PCRE function to return the results.

              Another issue may be that these two "valid only" expressions are still not 100% accurate. In some of our tests (both in OSForensics and using http://www.regextester.com/),

              (?!000)(?!666)([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(?!00)(\d{2})\2(?!0000)(\d{4})
              returned more junk results than the simpler ones. For example, it will match "000000000000000000001234567890AB", "001ISB601089506" and "0x7619693978f91d90539ae786".

              The other one,
              /^([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(\d{2})\2(\d{4})$/
              doesn't seem to match even valid SSNs.
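Both observations can be reproduced outside OSForensics. The sketch below uses Python's `re` module, which is not PCRE but treats these particular constructs the same way as far as I know. The explanation for the second expression (the leading and trailing "/" being taken as literal characters) is an assumption and is not confirmed anywhere in this thread.

```python
import re

LOOKAHEAD = (r"(?!000)(?!666)([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)"
             r"(?!00)(\d{2})\2(?!0000)(\d{4})")
WITH_SLASHES = r"/^([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(\d{2})\2(\d{4})$/"
WITHOUT_SLASHES = WITH_SLASHES.strip("/")  # same pattern minus the delimiters

junk = ["000000000000000000001234567890AB",
        "001ISB601089506",
        "0x7619693978f91d90539ae786"]

# Unanchored: any nine-digit run inside a longer word is enough for a hit.
unanchored_hits = [w for w in junk if re.search(LOOKAHEAD, w)]

# Anchored with ^...$: the whole word has to be an SSN, so the junk drops out.
ANCHORED = "^(?:" + LOOKAHEAD + ")$"
anchored_hits = [w for w in junk if re.search(ANCHORED, w)]
valid_still_matches = re.search(ANCHORED, "123-45-6789") is not None

# With literal slashes, "^" would have to match after a "/", which it cannot.
slashes_match = re.search(WITH_SLASHES, "123-45-6789") is not None
no_slashes_match = re.search(WITHOUT_SLASHES, "123-45-6789") is not None
```

If that assumption holds, removing the surrounding slashes and adding `^` and `$` to the lookahead version would be the first things to try in the Custom filter.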



              • #8
                Yeah, the second one I had pulled down from a help board while I was just trying to find something that would work in OSForensics. The former was the one we've been using in Spider 2008 (which we are looking at replacing). I'll be looking forward to trying the next build.

                OSForensics might be out of the running, however, as we're still having issues with the file size limitation. We've tried it on a Win7 64-bit machine on NTFS (BitLocker enabled) and a Win8 Preview 32-bit on NTFS (no BitLocker) with similar results concerning any files over 2GB. The log simply states to Check Configuration Settings, and when we try to change the setting, a pop-up comes up that says "Max file size must be less than 2GB."

                thanks,

                tim



                • #9
                  If you need a quick patch release for the RegEx change, let me know. Otherwise we'll put it into the next public release in a week or so.

                  For the file size issue, I think the key question is why you are trying to set the file size limit above 2GB. In other words, what single file on your hard disk is larger than 2GB and is worth indexing for its text content?

                  Note that the limit doesn't apply at all to compound files. For example for Zip files, the limit applies to each of the files inside of the Zip file and not the Zip file itself.
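As an illustration of the per-member idea (not of how OSForensics actually implements it), the sketch below uses Python's standard `zipfile` module to compare each member's uncompressed size with a limit. `members_over_limit` and the 100-byte limit are made up for the example.

```python
import io
import zipfile

def members_over_limit(zip_file, limit_bytes):
    """Return names of archive members whose uncompressed size exceeds the limit."""
    with zipfile.ZipFile(zip_file) as archive:
        return [info.filename for info in archive.infolist()
                if info.file_size > limit_bytes]

# Build a small in-memory archive so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("small.txt", "a" * 10)
    archive.writestr("large.txt", "b" * 1000)
buffer.seek(0)

# The size of the archive itself never enters the comparison.
oversized = members_over_limit(buffer, limit_bytes=100)
```

Only "large.txt" is reported here, however big or small the Zip file as a whole happens to be.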

                  If there is a good reason we can increase the per file limit beyond 2GB, at the expense of using more RAM, but we aren't aware of any reason why this might be required.



                  • #10
                    If you have a patch that will fix the RegEx issue, it would be swell.

                    We have tested on several large files including ISOs (which are compound files) and, more importantly, PST files.

                    Thanks,

                    tim



                    • #11
                      Certainly it should be working for PST files > 2GB. If it isn't, we need to fix it. (I had the feeling we had already tested out to at least 5GB per PST file.) Did you try indexing your large PST file with the file size limit set to a lower value?

                      For an ISO there is no point trying to do a direct text extraction. I think at the moment OSF isn't automatically unpacking the ISO to index the files inside. You would need to mount it as a separate drive to index it.

                      Let me check the behavior on these file types and get back to you.



                      • #12
                        We've checked into this.

                        A 2GB PST file does not need a "Max file size limit" set to 2GB. The "Max file size limit" only applies to the individual messages and attachments found within the PST file.

                        As for ISO files, as noted above, we are not indexing the contents of ISO files. You should mount the ISO image as a separate drive (under "Manage case"->"Add device") and index it separately. You must have manually added ".iso" to the list of extensions to index. We would advise removing it.

                        So there should be no need to specify a max file size over 2GB if the only reason you are doing so at the moment is for PST files and ISO files.

                        Having said that, you might have run into some other limit during your original indexing attempt. You should check the log message. Perhaps you only needed to increase your max file size to 1GB?
                        Ray
                        PassMark Software



                        • #13
                          I'll try with 1GB, but the actual PST is around 2.5GB.

                          tim
                          Last edited by bfbcping; Jun-25-2012, 03:40 PM.



                          • #14
                            In fact you might find that a much lower limit is also OK.
                            e.g. 50MB
                            But it really depends on what is on the disk that is being indexed. The value should, in theory, auto-set itself to something reasonable.



                            • #15
                              We are going to E-mail you a link to a pre-release of the next minor OSF release.
                              It will have changes to the regex handling noted above, among other things.

