email requests or suggestions


  • email requests or suggestions

    I am running into a wall and need a few things clarified: whether they are possible today, or whether there are plans to add them.

    1) When indexing a pst file, does it also scan for deleted items, as the email viewer does?

    2) Is there any way the email viewer can use the above-mentioned indexes? Use case: an attorney asks for a pst file of results that they can redact from if items belonging to other cases/clients match the search terms. In the email viewer I can export messages to accommodate this; however, large pst files take hours to load and scan, and in this case each search took nearly an hour. The program then crashed while exporting 700 messages (CPU never went over 24% and RAM utilization was about 60%; are multiple cores perhaps off by default?). After the crash I had to wait another 8+ hours to load the pst, plus a few more hours to scan for deleted items, before attempting the slow searches all over again.

    3) Can there please be a way to export case items that are emails back to emails (to work around the above)?

    4) Can there be a way to satisfy eDiscovery requests on a pst in a form that complies with Concordance requirements: TIFF files, load files and Bates stamping?

    If the case emails could be in a pst (generated by or added into OSForensics), it would be great if the eDiscovery requests could be fulfilled from it. I am hoping this feature is already present and I am just missing it. Loading a pst (or other file types) and exporting TIFFs with Bates stamping and a Concordance load file would be amazing.

  • #2

    Load times:
    It shouldn't take hours to load a PST. So unless it is truly gigantic there is something wrong (or something far from optimal).

    How big is the PST file?
    What storage is backing the PST file (is it on a local SSD?).
    What version of OSF are you using?

    For comparison, I can load a 2.1GB PST file in 59 seconds. This is on a relatively slow 7 year old CPU (i7-5820K) with the PST stored on a local SSD (Samsung 970 EVO). RAM usage in OSF was 800MB. With modern hardware the load time would probably be 30sec. Subsequent scanning for deleted / orphaned EMails added another 20sec.

    Search times:
    Once this 2GB PST was loaded, search times were under 1 second (too quick to measure with a stopwatch) if just the mail headers were searched.
    If both the headers and body text were searched, search times were 1min 15sec.

    What type of search are you doing? (Body text, date ranges, RegEx, etc..)
    If you have a few different searches to do, can you combine them into a single RegEx to save search time?
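As a rough illustration of the single-RegEx suggestion above, here is a minimal Python sketch that folds several search terms into one alternation pattern. The terms and the sample subject line are made up for the example, and the exact pattern syntax the OSF search box accepts may differ from Python's `re` module (for instance `\b` or `(?:...)` might not be supported), so treat the generated pattern as a starting point.

```python
import re

def combine_terms(terms):
    """Build one case-insensitive pattern that matches any of the given terms."""
    # Escape each term so characters such as '.' or '+' are matched literally.
    escaped = [re.escape(t) for t in terms]
    # Put longer terms first so they win over shorter terms sharing a prefix.
    escaped.sort(key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)

# Hypothetical search terms, not taken from the thread.
terms = ["Acme Corp", "invoice 1042", "settlement"]
pattern = combine_terms(terms)

# Hypothetical subject line to try the pattern against.
sample = "Re: Settlement draft for ACME Corp attached"
hits = [m.group(0) for m in pattern.finditer(sample)]

print(pattern.pattern)  # the combined pattern string
print(hits)             # the terms found in the sample
```

The point is that one pass over the message bodies with the combined pattern replaces three separate passes, which matters when each pass takes minutes or longer.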

    Crashing:
    The program should never crash, so if you could send us a crash dump and debug log, that might help.

    What format were you attempting to export data to (PDF, MSG, HTML, add to case)?

    Even better would be to get a copy of the source PST file. We could sign NDA if required.

    TIFF files:
    There is no option to export EMail to a TIFF file.
    TIFF as a file format was declared dead back in 2004 with the fax machine (and every year since).

    EMail Export options:
    At the moment the only options are MSG, HTML and PDF.
    There is no option to remove a few EMails from a PST file and keep the rest of the PST intact. (Maybe you can just use Outlook for this? Except that in this case you won't see deleted items.)

    Future Email export options:
    There is a new export coming in V9 of OSF. There will be an option just to export selected attachments (filtering by size and type).
    There should be a beta release of V9 next week.

    Indexes:
    Yes, you can index EMails. Building the initial index can take a while (in my example above it took 9min for a 2GB file), but it makes later searches much faster (under 1sec for a body text search). This makes sense if you have dozens of searches to do and you are looking for a needle in a haystack. It might make less sense if you are trying to remove the needle but keep the haystack. I would need to check about the indexing of deleted / orphaned EMails. It should index them.
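To show why paying the indexing cost once makes later searches so quick, here is a toy inverted index in Python. It is only a sketch of the general technique and says nothing about how OSF builds or stores its index; the message IDs and bodies are invented for the example.

```python
from collections import defaultdict
import re

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(messages):
    """Map each word to the set of message IDs whose body contains it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for word in tokenize(body):
            index[word].add(msg_id)
    return index

def search(index, query):
    """Return IDs of messages containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical message bodies keyed by a made-up message ID.
messages = {
    1: "Quarterly invoice attached",
    2: "Lunch on Friday?",
    3: "Invoice dispute, please review",
}

index = build_index(messages)       # slow step, done once
matches = search(index, "invoice")  # fast step, repeated per search
print(sorted(matches))
```

Building the index reads every message once; each search afterwards is a handful of set lookups instead of a scan over all message bodies.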







    • #3
      Thanks for the quick reply. The workstation is an i7 with 16GB of RAM, a 1TB SSD for the OS and a 4TB mechanical drive for temporary storage. The pst file is 28GB in size. I will try putting the pst back on the SSD and see if the results are any better.

      The reason I was asking: for another case, the DOJ asked for the emails to be submitted as TIFF files with Concordance load files. I understand it is far from optimal, but this is what they are specifically requesting for the discovery documents.



      • #4
        28GB is a pretty big file.

        Unfortunately, just knowing your CPU is an "i7" doesn't mean much anymore.
        For example the Intel Core i7-11700K @ 3.60GHz is around four times faster than the Intel Core i7-620UM @ 1.07GHz for single threaded tasks and 30 times faster for multi-threaded.

        I know you said that RAM usage was only 60%, but 16GB of RAM probably isn't enough if you are playing around with 28GB files. (You can buy an additional 16GB for just $80, so it is well worth it if it saves you a few hours.) If you get into a position where you are even a little short on RAM, the O/S will start swapping memory pages to disk. This incurs something like a 500x performance hit.

        Mechanical drives are useless for this type of work. Even SATA SSDs are pretty rubbish. For $50 you can get a small M.2 SSD that will be around 100x faster for random access. If this only saves you 1 hour, it is a great investment.

        "DOJ asked for the emails to be submitted as tiff files with concordance load files"
        Yes, I was being a bit facetious in my last post. TIFF is pretty much dead, but we are aware it is still in use in some areas (very much like the fax machine).
        I've added it to our list of things to have a look at in the future. In the meantime you might need to export to PDF and then convert to a Concordance production as a 2nd step.
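For the "2nd step" mentioned above, part of the work is producing Bates numbers and a Concordance-style DAT load file for the exported documents. The Python sketch below covers only that part and is built on assumptions: it uses the commonly cited Concordance default delimiters (ASCII 20 between fields, ASCII 254 "þ" around each value), and the field names, Bates prefix, number width and document list are all hypothetical. It does not convert anything to TIFF, does not handle embedded newlines in field values, and the real field list and delimiters should come from the requesting party's production specification.

```python
FIELD_SEP = "\x14"  # ASCII 20: assumed default Concordance field delimiter
QUOTE = "\xfe"      # ASCII 254 (thorn): assumed default text qualifier

def bates_number(prefix, n, width=7):
    """Format a Bates number such as ABC0000001 (prefix and width are assumptions)."""
    return f"{prefix}{n:0{width}d}"

def dat_line(values):
    """Wrap each value in the text qualifier and join with the field delimiter."""
    return FIELD_SEP.join(f"{QUOTE}{v}{QUOTE}" for v in values)

def build_load_file(docs, prefix="ABC", start=1):
    """Return DAT lines: a header row, then one row per document."""
    # Hypothetical field names; a real production spec defines its own.
    lines = [dat_line(["BEGDOC", "ENDDOC", "PAGECOUNT", "SUBJECT"])]
    next_page = start
    for doc in docs:
        # Each page gets one Bates number; a document spans first to last page.
        beg = bates_number(prefix, next_page)
        end = bates_number(prefix, next_page + doc["pages"] - 1)
        lines.append(dat_line([beg, end, str(doc["pages"]), doc["subject"]]))
        next_page += doc["pages"]
    return lines

# Made-up exported documents with their page counts.
docs = [
    {"subject": "Re: contract", "pages": 3},
    {"subject": "Invoice", "pages": 1},
]

lines = build_load_file(docs)
for line in lines:
    print(line)
```

When writing the lines to disk, the encoding also has to match what the receiving platform expects, which is another detail to confirm with whoever is requesting the production.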

        In my opinion, the whole plan doesn't make any sense. In a 28GB file there must be hundreds of thousands of EMails. What could any lawyer do with 300,000 random TIFF files? You would need to rebuild them back into some type of structured index (i.e. exactly what a PST is to start with). But by then you have lost all the attachments, metadata, etc. TIFF is also a rubbish format from a storage efficiency point of view, so your 28GB file might end up being 500GB of TIFFs. Someone might be stupid enough to then attempt an OCR job on the TIFFs, which might take weeks.


