
Advanced disk test data drops


  • Advanced disk test data drops

    Hi all,

    I am new to this forum, so hopefully I am posting this in the right spot.

    I am running the advanced disk test to validate disks for the new systems we want to introduce at work.

    I am testing on a system with 2x Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz

    with the latest generation of Intel SSDs (INTEL SSDSC2KG480G

    The thing is that when I run the test with my specific test case:

    2 threads sequential writes (block size 2 MB / file size 2 GB / 50% random data)
    1 thread sequential reading (block size 2 MB / file size 2 GB / 50% random data)
    [Image: 2020-02-27 10_01_58-Microsoft Edge.png]

    The test runs for 3600 seconds with 0.25ms sample size (so 14400 samples per hour).

    I filled up the disk with a dump file of 440 GB to mimic a worst-case scenario (created with fsutil commands).

    I get some unexplained drops in the graphs.

    [Image: 2020-02-27 10_01_51-Microsoft Edge.png]

    When I look at the data dump (the CSV export),
    I can see steps where no data was transferred, either written or read.

    [Image: 2020-02-27 10_05_24-.png]

    So the system thinks there is a time step where there was no data transfer, which skews the min/max/average calculations.
    (please see attached files / images)
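
To illustrate how zero-transfer samples pull down the summary figures, here is a minimal sketch. It assumes the per-sample transfer amounts have already been read from the CSV export into a plain list. The values and the `summarize` helper are hypothetical and are not taken from the actual export or from the test tool.

```python
from statistics import mean

def summarize(samples_mb):
    """Return (min, max, average) of per-sample transfer amounts in MB."""
    return min(samples_mb), max(samples_mb), mean(samples_mb)

# Hypothetical per-sample values (MB moved in each 250 ms sample); not real export data.
samples_mb = [42.0, 41.5, 0.0, 0.0, 42.5, 42.0]

# Summary over every sample, including the two stalled ones.
with_stalls = summarize(samples_mb)

# Summary over only the samples in which data actually moved.
without_stalls = summarize([s for s in samples_mb if s > 0])

print("all samples:     min/max/avg =", with_stalls)
print("stalls filtered: min/max/avg =", without_stalls)
```

Filtering the stalls out only shows their influence on the numbers. Whether they should be excluded from a validation result is a separate question, since the pause may be real from the application's point of view.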

    I am checking disks from other manufacturers and it happens there too, but it seems random, so it might be the PC or the SSD causing this, I don't know. But it seems very odd behaviour for these very expensive SSDs.

    Am I doing something wrong? Has anybody else seen something similar?

    The next test is with a different model of Intel disk, and I will recheck with an empty disk to see if it is different.

    I just want to know why this happens, and whether I can discard this as a tool issue or must classify it as a disk issue.

    many thanks
    Paul

  • #2
    "with 0.25ms sample size"
    You mean 0.25 seconds (or 250 ms).

    So it appears there were several 500 ms periods over the hour during which no disk I/O took place. Windows isn't a real-time operating system, so the result isn't so surprising (i.e. Microsoft makes no promises about the response time of the operating system, and will happily do things like install updates in the middle of an important task).
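
As a rough sketch of how such pauses could be located in exported data, the snippet below groups consecutive zero-transfer samples and converts each run into a duration. The sample values and the `find_stalls` helper are hypothetical. It assumes one value per 250 ms sample and is not part of the test tool.

```python
def find_stalls(samples_mb, sample_ms=250):
    """Return (start_index, duration_ms) for each run of consecutive zero-transfer samples."""
    stalls = []
    run_start = None
    for i, value in enumerate(samples_mb):
        if value == 0:
            # Start a new run only if we are not already inside one.
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # A non-zero sample ends the current run.
            stalls.append((run_start, (i - run_start) * sample_ms))
            run_start = None
    # Close a run that lasts until the end of the data.
    if run_start is not None:
        stalls.append((run_start, (len(samples_mb) - run_start) * sample_ms))
    return stalls

# Hypothetical per-sample values in MB; not real export data.
samples_mb = [42.0, 0.0, 0.0, 41.5, 42.0, 0.0]
print(find_stalls(samples_mb))
```

Two adjacent empty samples show up as one 500 ms stall, which matches the granularity you can expect from 250 ms sampling. The real pause may be shorter or longer than the reported duration.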

    I do think the delay (from the application's point of view) is real, however.

    But there might be many reasons for this. A few possibilities:
    - TRIM / garbage collection on the drives.
    - The O/S blocks everything while something else important happens (e.g. disk swapping or reading from a bad optical drive are classic ones).
    - Errors on the SATA bus that required retransmission of the data.
    - Bad blocks on the disk that the drive shuffles around (check the SMART data for this, reallocated sectors).
    - There was in fact no significant delay at a very low level and it was just a timing issue, i.e. the disk was writing data the whole time, but the data was supplied from the Windows cache or the DRAM cache on the drive itself, while the application was still blocked. Or a combination of caching and TRIM.

    Would need more testing to come to any conclusion.

    "but it seems very odd behaviour for these very expensive SSDs."
    I don't know how much you paid, but these Intel SSDSC2KG480G8 aren't expensive SSDs. $160 from Amazon at the moment. If by expensive you mean poor value, then you would be correct.
    By today's standards they are somewhat old and slow.



    • #3
      Hi David, thanks for your response.

      I am just getting into this, so my apologies for being ignorant of a lot of things.

      The SSD is in fact enterprise class, which is a minimum requirement for my disks, but so is cost. So if you know of trusted, high-quality SSDs for around that price, I am always interested.

      What I did find in the meantime is that the drops disappear when there is more room available on the disk.
      The above test was specifically done with next to no room left on the disk (e.g. 4 GB of free space).

      However, that makes me wonder how the PassMark software writes and reads so much data (55 GB per thread) without it being physically present on the disk itself.
      Do you have any insight into how that works? Does it erase before writing? Does that have to do with the block size and the specific file size I enter in the settings?

      [Image: 2020-02-28 11_35_33-Window.png]
      best regards,
      Paul.



      • #4
        If you select a 2 GB test file, then the amount of disk space used will never be larger than 2 GB (per thread).

        It is well documented that SSDs perform badly when full. Google will throw up 100s of articles on the subject.

        Enterprise drives are just normal SSDs with slightly higher write endurance (and sometimes with encryption, if you need that). So any drive with a good write endurance / warranty should work.
        Some of the Intel Optane drives have 10x the endurance and 5x the speed compared to the drive you are testing (but they are seriously expensive).



        • #5
          Hi David,

          When I ran the test again to verify my results (I just pushed go one more time),
          the drops were back. So it is not the free space. I can't identify what causes it. The PC is not connected to the internet, and no other activities are running.
          So any suggestions on what this could be? I will have to get more disks involved to rule out a faulty one.

          New test with the same settings and the same disk as the previous test:

          [Image: 2020-03-03 15_54_18-Microsoft Edge.png]
          [Image: 2020-03-03 15_54_09-Microsoft Edge.png]


          In any case, thanks for your response.
          It was very helpful.

          best regards,
          Paul.


          • #6
            Intel supplies software with their drives, and you can force a TRIM operation from the software (or at least you used to be able to).

            Empty the drive. Then force a TRIM. Then wait a while for the operation to complete (I don't know how long, but let's say 30 min).
            Then try the test again when the drive is empty and in an optimal state.



            • #7
              Hi David,

              Well, TRIM was enabled, so that wasn't it. Also, a clean disk did not give me a better result.
              I am looking into other disks as well.

              I was looking at the data dump you get from the test and was wondering how the min / max latency is calculated.

              I can reproduce the average in Excel, but not the minimum.
              Do you know how I can do this?


              I figured that I could calculate the throughput in MB/ms and use that number together with the block size.
              So if I transfer 41.94304 MB in 250 ms, that is 0.16777216 MB/ms, and with a block size of 2.097152 MB I can calculate the time it takes to write one block:
              2.097152 / 0.16777216 = 12.5 ms, which seems valid since the reported latency is around that number. It seems legit (with some deviation) if you average it over 14400 samples, but the min / max do not match and I don't know why. Any suggestions?
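
The arithmetic above can be written as a short sketch. The function name is hypothetical, and the calculation assumes the blocks in a sample are transferred one after another by a single thread, which is an assumption about the workload rather than something the export states. The result is an average time per block for that sample, not a minimum or maximum.

```python
def estimated_block_latency_ms(transferred_mb, sample_ms, block_mb):
    """Average time per block implied by the throughput of one sample.

    Returns None for a sample in which nothing was transferred.
    """
    if transferred_mb <= 0:
        return None
    mb_per_ms = transferred_mb / sample_ms   # throughput during the sample
    return block_mb / mb_per_ms              # time needed to move one block

# Figures quoted in the post: 41.94304 MB in 250 ms, block size 2.097152 MB.
latency = estimated_block_latency_ms(41.94304, 250, 2.097152)
print(round(latency, 3))
```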

              [Image: 2020-03-04 17_23_29-Window.png]
              [Image: 2020-03-04 17_23_52-test data calculations.xlsx - Excel.png]
              Best regards,
              Paul.


              • #8
                "Well, TRIM was enabled, so that wasn't it"
                Sorry, I don't understand the logic.
                I was suggesting that having TRIM active might be the cause of the short pauses in activity, and you are saying TRIM can't be the problem because it was enabled.

                Max and min latency are the longest and shortest times required to carry out a single disk read or write. As a read / write takes about 12 ms in your case, there would be around 83 of them per second. The report is just a summary of activity, not the result of every I/O request.
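
A small sketch shows why the minimum and maximum cannot be recovered from per-sample figures. The per-operation latencies below are hypothetical and chosen only to make the point. Both samples average out to the same value even though the individual operations differ widely.

```python
from statistics import mean

# Hypothetical per-operation latencies (ms) inside two consecutive samples; not measured data.
sample_a = [12.0, 12.5, 11.5, 14.0]
sample_b = [12.0, 30.0, 4.0, 4.0]

# What a per-sample summary can reveal: one average per sample.
per_sample_avg = [mean(sample_a), mean(sample_b)]

# What the min / max latency refers to: individual operations.
all_ops = sample_a + sample_b

print("per-sample averages:", per_sample_avg)
print("min/max of averages:", min(per_sample_avg), max(per_sample_avg))
print("true min/max of ops:", min(all_ops), max(all_ops))
```

Any minimum or maximum derived from averaged samples therefore sits somewhere between the true extremes, which is consistent with the average matching in Excel while the min / max do not.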

                If you need to look at each low level I/O operation then try ProcMon.
                https://docs.microsoft.com/en-us/sys...nloads/procmon




                • #9
                  Hi David,

                  Sorry, I misread your comment. So I will have to check the disk again without TRIM enabled, correct?



                  I need to get my supplier to test multiple disks soon for selection,
                  and I am working on a test script they can use for validation.

                  Our requirements have been adapted so that there is a maximum allowed latency.
                  I assume the maximum latency reported by the test is trustworthy?

                  On the other hand, we would like to account for some spread in the results, so my boss asked me to run the test multiple times and see how much the maximum latency deviates.
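
One way to quantify that deviation, sketched with Python's standard `statistics` module. The values are hypothetical placeholders for the maximum latency noted from each run, not measured results.

```python
from statistics import mean, stdev

# Hypothetical maximum-latency results (ms), one per test run; not measured data.
max_latency_ms = [48.0, 52.0, 50.0, 47.0, 53.0]

avg = mean(max_latency_ms)
spread = stdev(max_latency_ms)  # sample standard deviation (n - 1 in the denominator)

print(f"mean of max latency:  {avg:.1f} ms")
print(f"sample std deviation: {spread:.2f} ms")
print(f"range: {min(max_latency_ms)} to {max(max_latency_ms)} ms")
```

With only a handful of runs the standard deviation is a rough indication at best, so reporting the range alongside it keeps the worst observed value visible.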

                  I can run the test multiple times with a script (that works amazingly well),
                  but I would like the results to be saved separately. Is that possible?

                  So:

                  LOOP 10
                  {
                  ADT_REMOVEALLTHREADS

                  ADT_ADDTHREAD C 2097152 2097152 UNCACHED SYNCH 10 100 100 50 0 0

                  ADT_ADDTHREAD C 2097152 2097152 UNCACHED SYNCH 10 100 100 50 0 0

                  ADT_ADDTHREAD C 2097152 2097152 UNCACHED SYNCH 10 0 100 50 0 0

                  ADT_SETEXPORT HTML SVT C:\export.html

                  ADT_RUNTESTS 20 250
                  }

                  At this point it overwrites the same file 10 times.
                  Is there a way to get 10 separate files?

                  many thanks!
                  Paul



                  • #10
                    The advanced disk test export function doesn't append to existing files, so this won't work in a script loop. You would need to unroll the loop and instead run the test X times with a separate filename for each run, e.g.:

                    ADT_REMOVEALLTHREADS
                    ADT_ADDTHREAD C 2097152 2097152 UNCACHED SYNCH 10 100 100 50 0 0
                    ADT_ADDTHREAD C 2097152 2097152 UNCACHED SYNCH 10 100 100 50 0 0
                    ADT_ADDTHREAD C 2097152 2097152 UNCACHED SYNCH 10 0 100 50 0 0

                    ADT_SETEXPORT HTML SVT C:\export1.html
                    ADT_RUNTESTS 20 250

                    ADT_SETEXPORT HTML SVT C:\export2.html
                    ADT_RUNTESTS 20 250

                    ADT_SETEXPORT HTML SVT C:\export3.html
                    ADT_RUNTESTS 20 250

                    etc...
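
Writing the unrolled script by hand gets tedious for many runs, so it could be generated instead. The sketch below is a hypothetical helper, not a PassMark feature. The script lines are copied from the example above, and the export path pattern follows the same `C:\export1.html` naming.

```python
def build_script(runs, export_pattern="C:\\export{n}.html"):
    """Return an unrolled script text with one export file per run."""
    # Thread setup is done once, exactly as in the hand-written example.
    lines = [
        "ADT_REMOVEALLTHREADS",
        "ADT_ADDTHREAD C 2097152 2097152 UNCACHED SYNCH 10 100 100 50 0 0",
        "ADT_ADDTHREAD C 2097152 2097152 UNCACHED SYNCH 10 100 100 50 0 0",
        "ADT_ADDTHREAD C 2097152 2097152 UNCACHED SYNCH 10 0 100 50 0 0",
        "",
    ]
    # One export target and one test run per iteration, numbered from 1.
    for n in range(1, runs + 1):
        lines.append("ADT_SETEXPORT HTML SVT " + export_pattern.format(n=n))
        lines.append("ADT_RUNTESTS 20 250")
        lines.append("")
    return "\n".join(lines)

script = build_script(10)
print(script)
```

The printed text can be redirected into a script file and loaded in the same way as the hand-written version.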






                    • #11
                      Thanks Tim, that works great!

                      But can someone comment on the trustworthiness of the latency data?
                      I assume you validated this thoroughly, correct?

                      Is it safe to use these numbers for validation?

                      Also, is it possible to discard the first few measurements? They seem a bit unstable at the beginning.
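
If the tool itself offers no such option, the first samples could be dropped when post-processing the exported data. This is a minimal sketch with hypothetical values and a hypothetical helper name. How many samples count as warm-up is a judgment call that should be fixed in advance and applied to every disk in the same way.

```python
def drop_warmup(samples, skip):
    """Return the samples with the first `skip` entries removed."""
    if skip < 0:
        raise ValueError("skip must be non-negative")
    return samples[skip:]

# Hypothetical latency samples (ms); the first two are treated as warm-up.
latency_ms = [35.0, 20.0, 12.5, 12.0, 13.0, 12.5]
steady = drop_warmup(latency_ms, skip=2)

print("max over all samples:  ", max(latency_ms))
print("max after warm-up drop:", max(steady))
```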

                      many thanks!




                      • #12
                        As far as we know it is correct.

                        But you have to understand what you are measuring. There are many layers of software, device drivers and hardware between the software application and the flash memory chips in an SSD.

                        Windows has a cache, the disk controller sometimes has a cache, and some SSDs also have a DDR memory cache. Then there are possible encryption layers, file system layers, bus protocols, etc.
                        So you are never measuring the pure speed of the disk hardware.
