Applications
Replacing or augmenting rotating magnetic disk storage with solid state storage promises vast improvements in computer performance and power consumption. This technology has been termed Storage Class Memory, or SCM [1]. While there are many emerging memory technologies with potential for SCM application, this paper focuses on proven NAND Flash technology. Current mainstream NAND Flash memory devices have a slow 40MB/s interface that does not allow many devices to be connected to a single channel. Today’s HLNAND devices feature a 266MB/s interface and support a virtually unlimited number of devices on a channel, while also offering lower interface power. With HLNAND, Storage Class Memory built on proven NAND Flash technology is viable today.
Storage Class Memory
Today’s mainstream computer memory is organized hierarchically as shown in Figure 1. A pyramid with CPU located at the top indicates that there are smaller amounts of memory located near the CPU and greater amounts located further away.
Closest to the CPU are registers within the CPU itself. These registers can be accessed by the CPU within a single CPU clock cycle, typically several hundred picoseconds at today’s multi-GHz clock frequencies. Several levels of cache memory, L1, L2, and possibly L3 cache, are also located within the CPU and can be accessed within approximately 10 clock cycles. The next level of the hierarchy is DRAM main memory, which has an access latency of 50ns, or roughly 100 CPU clock cycles. Mainstream DDR2-800 SDRAM modules provide up to 6.4GB/s bandwidth. To this point, the memory hierarchy is well balanced, with each level down providing several orders of magnitude more memory at the cost of a single order of magnitude more latency. However, below the DRAM layer there is a huge gap in both latency and bandwidth. Rotating magnetic Hard Disk Drives (HDD) require several milliseconds of seek time for the head to swing into position over the desired track. This translates to more than 10 million CPU cycles. Clearly, if the CPU requires data from the hard disk, it will be waiting for an eternity. Once the head is positioned over the appropriate track, the data transfer from HDD to DRAM will also be slow, limited by the rotational speed of the disk and the track density. As a result, the HDD provides only about 1% of the bandwidth available from DRAM.
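The cycle counts quoted above follow from multiplying each latency by the CPU clock frequency. The short Python sketch below reproduces that arithmetic. The 2 GHz clock and the 5 ms seek time are assumptions chosen for illustration (the text says only "multi-GHz" and "several milliseconds"), and the names CPU_HZ, HDD_SEEK_S and cycles are hypothetical.

```python
# Assumed values: the text gives only "multi-GHz" and "several milliseconds".
CPU_HZ = 2.0e9        # assumed CPU clock frequency, 2 GHz
HDD_SEEK_S = 5e-3     # assumed HDD seek time, 5 ms

# Figures taken from the text.
DRAM_LATENCY_S = 50e-9    # 50 ns DRAM access latency
DDR2_800_BW = 6.4e9       # 6.4 GB/s per DDR2-800 module


def cycles(latency_s, cpu_hz=CPU_HZ):
    """Return how many CPU clock cycles elapse during latency_s seconds."""
    return latency_s * cpu_hz


dram_cycles = cycles(DRAM_LATENCY_S)   # about 100 cycles at 2 GHz
hdd_cycles = cycles(HDD_SEEK_S)        # about 10 million cycles at 2 GHz

# "Only about 1% of the DRAM bandwidth" expressed in bytes per second.
hdd_bw = 0.01 * DDR2_800_BW            # about 64 MB/s

print(f"DRAM access: {dram_cycles:.0f} cycles")
print(f"HDD seek:    {hdd_cycles:.2e} cycles")
print(f"HDD bandwidth at 1% of DDR2-800: {hdd_bw / 1e6:.0f} MB/s")
```

A faster clock or a longer seek pushes the HDD figure above 10 million cycles, which is the direction the text describes.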
Figure 2 shows a memory hierarchy with HLNAND-based SCM filling the gap between DRAM main memory and HDD. Here the DRAM has been upgraded to DDR3-1600, where a single module can provide up to 12.8GB/s. Single-bit-per-cell (SLC) NAND Flash has a page read time of 25µs and a page program time of 200µs, while two-bit-per-cell (MLC) NAND Flash delivers a page of read data in 60µs and programs a page in 600µs. This results in a latency ranging between 10^5 and 10^6 CPU clock cycles, depending on the cell technology and whether a read or write operation is desired. HLNAND-based SCM provides a latency improvement of two orders of magnitude over HDD for accesses that cannot be serviced by the DRAM main memory layer. Furthermore, the bandwidth is better matched. SCM based on HLNAND can provide up to 2GB/s bandwidth, depending on the number of HyperLink channels. Also, depending on the system, the HDD may be eliminated entirely. For example, portable computers may not require an HDD.
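The NAND figures can be put through the same conversion to see where they fall between DRAM and HDD. The sketch below is illustrative only: the 2 GHz clock and the 5 ms HDD seek time are assumptions rather than values from the text, and the names used (NAND_LATENCY_S, to_cycles, speedup) are hypothetical.

```python
# Assumed values, not taken from the text.
CPU_HZ = 2.0e9       # assumed CPU clock frequency, 2 GHz
HDD_SEEK_S = 5e-3    # assumed HDD seek time, 5 ms

# Page read and program times quoted in the text, in seconds.
NAND_LATENCY_S = {
    ("SLC", "read"): 25e-6,
    ("SLC", "program"): 200e-6,
    ("MLC", "read"): 60e-6,
    ("MLC", "program"): 600e-6,
}


def to_cycles(latency_s, cpu_hz=CPU_HZ):
    """Return how many CPU clock cycles elapse during latency_s seconds."""
    return latency_s * cpu_hz


# Latency of each NAND operation expressed in CPU clock cycles.
nand_cycles = {key: to_cycles(t) for key, t in NAND_LATENCY_S.items()}

# Ratio of the assumed HDD seek time to each NAND latency.
speedup = {key: HDD_SEEK_S / t for key, t in NAND_LATENCY_S.items()}

for (cell, op), n in nand_cycles.items():
    ratio = speedup[(cell, op)]
    print(f"{cell} {op:<8} {n:>12,.0f} cycles  {ratio:6.1f}x vs HDD seek")
```

Under these assumptions the four operations span roughly 5 x 10^4 to 1.2 x 10^6 cycles. Page reads come out about two orders of magnitude faster than the assumed seek, while page programs gain less.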