So hardware designers make hardware that is fast for localized, sequential access. If, however, your application doesn't access memory that way, the performance penalty can be huge. For example, reading a byte from RAM might take around 7 CPU cycles on a cache hit but around 250 cycles on a cache miss. How big the penalty is depends on the hardware: the CPU, the cache hierarchy, and the RAM. That is what the graph demonstrates.

Why it steps back up so much at the end of your results, I don't know. It might be something to do with the cache design. My PC doesn't do this.