A great reference to have on hand when working with big data.
When you’re designing a performance-sensitive computer system, it is important to have an intuition for the relative costs of different operations. How much does a network I/O cost, compared to a disk I/O, a load from DRAM, or an L2 cache hit? How much computation does it make sense to trade for a reduction in I/O? What is the relative cost of random vs. sequential I/O? For a given workload, what is the bottleneck resource?
When designing a system, you rarely have enough time to completely build two alternative designs to compare their performance. This makes two skills useful:
- Back-of-the-envelope analysis. This essentially means developing an intuition for the performance of alternative designs, so that you can reject some designs out of hand, or choose which alternatives to consider more carefully.
- Microbenchmarking. If you can identify the bottleneck operation for a given resource, then you can construct a micro-benchmark…
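To make the back-of-the-envelope idea concrete, here is a minimal sketch that compares scanning a data set sequentially with fetching records one at a time through random reads. The seek time and bandwidth constants are illustrative assumptions rather than measurements, and the function names are hypothetical; substitute figures for your own hardware.

```python
# Back-of-the-envelope comparison: sequential scan vs. random reads.
# Both constants are ASSUMED round numbers for illustration only;
# they are not measurements of any particular device.
SEEK_TIME_S = 0.010         # assumed cost of one random seek (10 ms)
SEQ_BANDWIDTH_BPS = 100e6   # assumed sequential throughput (100 MB/s)


def sequential_scan_seconds(total_bytes):
    """One seek to reach the data, then stream all of it."""
    return SEEK_TIME_S + total_bytes / SEQ_BANDWIDTH_BPS


def random_read_seconds(num_records, record_bytes):
    """One seek per record, plus the transfer time for that record."""
    per_record = SEEK_TIME_S + record_bytes / SEQ_BANDWIDTH_BPS
    return num_records * per_record


def break_even_fraction(num_records, record_bytes):
    """Fraction of records above which a full scan beats random reads."""
    scan = sequential_scan_seconds(num_records * record_bytes)
    per_record = SEEK_TIME_S + record_bytes / SEQ_BANDWIDTH_BPS
    return scan / (per_record * num_records)


if __name__ == "__main__":
    n, size = 10_000_000, 100   # hypothetical table: 10M records of 100 bytes
    print(f"full scan:        {sequential_scan_seconds(n * size):.2f} s")
    print(f"all random reads: {random_read_seconds(n, size):.0f} s")
    print(f"break-even at:    {break_even_fraction(n, size):.4%} of records")
```

Under these assumed numbers, random access only wins when a query touches a very small fraction of the records. That is the kind of conclusion that lets you discard a design before building it.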