16Gb/s and Beyond with Single-Ended I/O in High-Performance Graphics Memory

To validate this last statement regarding the degree of crosstalk observed across the channel, Figure 8 compares the un-equalized back-drilled case with and without aggressors (i.e., all remaining DQ lines in the byte, along with the EDC and DBI signals). Based on this simulation, about 238mV of crosstalk is expected in the cleaner (i.e., back-drilled) of the two channel environments, and more would be expected in the absence of back drilling. Thus, because a goal of this paper is to demonstrate a practical path to 16Gb/s, all remaining simulations assume back drilling of vias in the PCB as a foundation for additional enabling steps, including equalization.

Fig 8

Figure 8: Simulated data eye openings at 16Gb/s – ISI only (left) and with additive crosstalk from the remaining high-speed data lines, EDC and DBI signals (right).

Based on a further review of the raw pulse response from the right side of Figure 7, a practical, power-efficient equalizer solution might only need to address the 1st post cursor. The GDDR6 I/O incorporates both tunable single-tap de-emphasis in the output driver and tunable single-tap DFE within the input path, both designed to operate on the 1st post cursor.
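
As a rough illustration of how a tunable single-tap DFE acts on the 1st post cursor, the following Python sketch applies decision feedback to every short data pattern and reports the smallest decision margin for each tap setting. The pulse-response values, the -1/+1 symbol mapping with a threshold at zero, and the function names are all assumptions made for illustration; none of them are taken from the GDDR6 design or from the simulated channel.

```python
from itertools import product

# Hypothetical sampled pulse response, normalized to the main cursor.
# Key 0 is the main cursor; keys 1 and 2 are the 1st and 2nd post cursors.
PULSE = {0: 1.0, 1: 0.25, 2: 0.06}


def worst_margin(pulse, dfe_tap, n_bits=6):
    """Smallest decision margin over every n_bits-long data pattern."""
    worst = float("inf")
    # Symbols are modeled as -1/+1 with the decision threshold at 0 (assumption).
    for bits in product((-1, 1), repeat=n_bits):
        prev_decision = 0  # nothing to feed back before the first bit
        for n, sym in enumerate(bits):
            # Received sample: main cursor plus ISI from earlier bits.
            rx = sum(p * bits[n - k] for k, p in pulse.items() if n - k >= 0)
            # Single-tap DFE: subtract the tap weight times the previous decision.
            y = rx - dfe_tap * prev_decision
            prev_decision = 1 if y >= 0 else -1
            worst = min(worst, y * sym)  # positive means a correct decision
    return worst


def best_tap(pulse, taps):
    """Tap setting with the largest worst-case margin."""
    return max(taps, key=lambda t: worst_margin(pulse, t))


TAPS = [i / 20 for i in range(11)]  # 0.00 to 0.50 in steps of 0.05

if __name__ == "__main__":
    for tap in TAPS:
        print(f"tap={tap:.2f}  worst margin={worst_margin(PULSE, tap):.3f}")
    print("best tap:", best_tap(PULSE, TAPS))
```

In this toy model the margin peaks when the tap weight equals the 1st post cursor and falls off symmetrically when the tap is set too low or too high, because the DFE removes ISI without scaling the main cursor.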

Figure 9 compares the relative effectiveness of the available de-emphasis and DFE. As shown, the de-emphasis improves the eye height by 6mV, while degrading the eye width by 1ps. The DFE, on the other hand, improves the eye height by 65mV without degrading the eye width. It is important to note that the results shown are channel-specific and are insufficient to make a universal assessment of the relative value of either equalization method, though some qualitative observations can be made when comparing the corresponding pulse responses, as demonstrated by Figure 10.

Fig 9

Figure 9: Simulated data eye openings at 16Gb/s – No equalization (left), single-tap de-emphasis (center) and single-tap DFE (right).

As highlighted in Figure 10, de-emphasis-based equalization (green and blue curves) reduces the overall amplitude of the signal while reducing the 1st post cursor. As a result, the optimal amount of de-emphasis corresponds to a balance between signal amplitude and ISI cancellation. For the channel under consideration, 3dB of de-emphasis (blue curve) reduces the 1st post cursor nearly to zero, yet, as will be shown, a larger eye opening is possible with only 1dB of de-emphasis (green curve). This is because 3dB of de-emphasis does not leave enough of the main cursor to provide a net increase in eye opening, while 1dB of de-emphasis results in a net gain of 6mV.

Intuitively, because the DFE zeros out the 1st post cursor without reducing the signal amplitude, one would expect a better overall result, which is clearly demonstrated in Figure 9. One other nuance captured in Figure 10 is that de-emphasis, while primarily addressing the 1st post cursor, may impact additional post cursors for better or worse. In this particular case, the 2nd post cursor is degraded slightly by de-emphasis, a behavior that does not occur with DFE. However, the same fact that de-emphasis may affect more than just the targeted tap could produce much better results under different channel conditions.

Fig 10

Figure 10: Overlay of channel pulse responses comparing various equalization methods.

Figure 11 presents two additional equalization conditions. As shown on the left, when combining the “best” amount of de-emphasis, namely 1dB, with a corresponding optimized tap of DFE (to cancel the remaining 1st post cursor ISI), the resulting eye is smaller than that achieved by applying DFE alone (see right side of Figure 9). This is because the de-emphasis unnecessarily reduces the signal amplitude and the DFE offers no gain to compensate for that reduction.

The eye diagram on the right of Figure 11, corresponding to 3dB of de-emphasis, is also interesting. Recall from the pulse responses of Figure 10 that 3dB of de-emphasis almost perfectly zeros out the 1st post cursor. Even so, the resulting eye height remains identical to the un-equalized case, while the timing degrades by 3ps. Comparison of this eye with the original un-equalized eye in Figure 9 reveals that the ISI is, indeed, reduced by the de-emphasis, but the overall signal amplitude is reduced by a similar amount (at least when all of the crosstalk and reflections are accounted for).
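
The trade-off between amplitude and ISI cancellation can be sketched with a simple peak-distortion estimate, in which the eye height is the main cursor minus the sum of the magnitudes of all other cursors. The Python sketch below is a minimal model under stated assumptions: the cursor values are invented for illustration (they are not the simulated channel), de-emphasis is modeled as a peak-constrained two-tap FIR with weights [1 - a, -a], the DFE is assumed to cancel the 1st post cursor perfectly, and crosstalk, reflections, and jitter are ignored.

```python
# Hypothetical normalized pulse-response cursors (key 0 = main cursor).
PULSE = {0: 1.0, 1: 0.17, 2: 0.01}


def deemphasis_weight(db):
    """Post-cursor weight a of a peak-constrained 2-tap FIR [1 - a, -a] (assumed model)."""
    return (1.0 - 10.0 ** (-db / 20.0)) / 2.0


def apply_deemphasis(pulse, db):
    """Pulse response after the transmit FIR: q[k] = (1 - a) * p[k] - a * p[k - 1]."""
    a = deemphasis_weight(db)
    out = {}
    for k, p in pulse.items():
        out[k] = out.get(k, 0.0) + (1.0 - a) * p
        out[k + 1] = out.get(k + 1, 0.0) - a * p
    return out


def eye_height(pulse, dfe=False):
    """Peak-distortion eye height: main cursor minus the sum of |ISI| cursors.

    With dfe=True, the 1st post cursor is assumed to be cancelled exactly.
    """
    isi = sum(abs(p) for k, p in pulse.items()
              if k != 0 and not (dfe and k == 1))
    return pulse[0] - isi


if __name__ == "__main__":
    cases = {
        "no equalization": eye_height(PULSE),
        "1dB de-emphasis": eye_height(apply_deemphasis(PULSE, 1.0)),
        "3dB de-emphasis": eye_height(apply_deemphasis(PULSE, 3.0)),
        "DFE only": eye_height(PULSE, dfe=True),
        "1dB de-emphasis + DFE": eye_height(apply_deemphasis(PULSE, 1.0), dfe=True),
    }
    for name, height in cases.items():
        print(f"{name:>22}: {height:.4f}")
```

For these made-up cursors, 1dB of de-emphasis yields a small gain, 3dB nearly nulls the 1st post cursor yet gives back most of that gain through the reduced main cursor and a larger 2nd post cursor, and DFE alone outperforms the combination of de-emphasis and DFE. The qualitative ordering is a property of the chosen values and model, so it should not be read as a general ranking of the two methods.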

Fig 11

Figure 11: Simulated data eye openings at 16Gb/s – Combined de-emphasis and DFE (left) and stronger (3dB) de-emphasis (right).

Two final observations regarding equalization are in order. First, none of the equalization methods described herein improves eye width. Thus, every effort should be made to minimize crosstalk across these high-speed parallel interconnects. Second, while additional equalization methods could be employed in this application, they are not explicitly called for by the JEDEC GDDR6 specification and therefore are not evaluated here. Nevertheless, a single tap of DFE, coupled with the back-drilling of PCB vias, appears sufficient to support 16Gb/s signaling.

GDDR6 Performance Measurements

Because complementing simulation with measured results generally increases confidence, ATE-based characterization of Micron’s first GDDR6 offering is shared here. Figure 12 compares the measured link margin at 16Gb/s and 16.5Gb/s, based on shmooing the DRAM and tester reference voltages along with the phase of the data relative to the data clock and strobe. Green and red points indicate error-free operation and detected errors, respectively. As shown, GDDR6’s support for the stronger 48Ω termination is expected to improve signaling margins, especially at higher speeds.

Fig 12

Figure 12: Measured link margin shmoos at 16Gb/s/pin and 16.5Gb/s/pin for 60Ω and 48Ω line termination.
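
For readers unfamiliar with shmoo plots, the sweep behind a plot like Figure 12 can be sketched as a two-dimensional pass/fail scan over reference voltage and data phase. In the Python sketch below, the diamond-shaped eye model, its dimensions, the sweep ranges, and all function names are hypothetical stand-ins; on real test equipment the pass/fail function would run a pattern at each point and check for errors.

```python
def shmoo(link_passes, vrefs, phases):
    """Run the pass/fail test at every (vref, phase) point; rows follow vrefs."""
    return [[link_passes(v, p) for p in phases] for v in vrefs]


def diamond_eye(half_height_mv, half_width_ps):
    """Hypothetical eye model: a point passes if it lies inside a diamond
    centered on the nominal reference voltage and sampling phase."""
    def link_passes(vref_offset_mv, phase_offset_ps):
        return (abs(vref_offset_mv) / half_height_mv
                + abs(phase_offset_ps) / half_width_ps) < 1.0
    return link_passes


def render(grid):
    """Text plot with the highest reference voltage on the top row."""
    return "\n".join("".join("P" if ok else "." for ok in row)
                     for row in reversed(grid))


# Sweep ranges are illustrative offsets from the nominal operating point.
VREFS_MV = list(range(-100, 101, 20))
PHASES_PS = list(range(-30, 31, 5))
grid = shmoo(diamond_eye(70.0, 22.0), VREFS_MV, PHASES_PS)

if __name__ == "__main__":
    print(render(grid))
```

The "P" region plays the role of the green points in a measured shmoo, and its vertical and horizontal extent correspond to the voltage and timing margin of the link.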

Figure 13 presents the impact of DFE from two perspectives. On the left, the maximum achievable data rate (x-axis), as determined by an agreed-upon degree of eye opening (height and width), is plotted against an increasing amount of DFE compensation (y-axis). There are at least two key takeaways. First, in spite of the relatively clean tester environment, there is a clear benefit to be gained in optimizing the DFE coefficient selection: above and below the optimal setting, the maximum achievable data rate is degraded. Second, 16Gb/s is nearly achievable without DFE, and thus the equalization adds margin and reliability to the interface.

Fig 13

Figure 13: Measured achievable data rate shmoo (left) and corresponding link margin shmoos for three DFE settings: no equalization (bottom-right), optimal DFE (center-right) and maximum DFE (top-right).

For a deeper comparison, the right side of Figure 13 presents three measured link margin shmoos corresponding to no equalization, the optimal DFE setting, and the maximum (not optimal) DFE setting. Interestingly, the non-optimized, maximum DFE setting does not degrade the results substantially, but the optimal setting clearly represents the best solution in terms of symmetry and overall eye height. Figure 14 presents the impact of enabling de-emphasis. Based on these results, de-emphasis appears to provide substantial benefit over the ATE channel.

Fig 14

Figure 14: Measured link margin shmoos at 16Gb/s without and with single-tap de-emphasis enabled.

Fig 15

Figure 15: Measured 20Gb/s data eye based on a PRBS6 pattern

While the preceding results demonstrate full DRAM functionality up to as high as 16.5Gb/s, it is possible for the overall performance of an architecture to be capped by timing limitations in the memory array itself. To determine whether this GDDR6 interface could extend beyond the 16.5Gb/s range, the device was placed into a mode of operation that exercises only the I/O while bypassing the memory array. The oscilloscope measurement presented in Figure 15 confirms that when bypassing the memory array, and with a small but helpful boost in I/O supply voltage, it is possible to push Micron’s GDDR6 I/O as high as 20Gb/s.
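
The Figure 15 caption names a PRBS6 pattern as the stimulus. As a minimal sketch of how such a pattern can be generated, the Python code below implements a 6-bit linear-feedback shift register. The feedback polynomial x^6 + x^5 + 1 and the all-ones seed are assumptions, since the article does not state which polynomial or seed was used on the device.

```python
from itertools import islice


def prbs6(seed=0b111111):
    """Yield bits from a 6-bit LFSR (assumed polynomial x^6 + x^5 + 1)."""
    state = seed & 0x3F
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays stuck at 0")
    while True:
        # Feedback is the XOR of the two oldest bits in the register.
        new_bit = ((state >> 5) ^ (state >> 4)) & 1
        state = ((state << 1) | new_bit) & 0x3F
        yield new_bit


def take(n, seed=0b111111):
    """First n bits of the sequence as a list."""
    return list(islice(prbs6(seed), n))


if __name__ == "__main__":
    print("".join(str(b) for b in take(63)))
```

A maximal-length 6-bit sequence repeats every 63 bits and contains 32 ones and 31 zeros per period, which gives a short, spectrally rich pattern for exercising the I/O.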

Summary

As compute systems continue to advance, their efficacy often depends on the accessibility of memory. While some high-tier applications can absorb the high cost and complexity of HBM, the performance of GDDR DRAM continues to scale, providing a more flexible, low-risk, cost-effective alternative. Through reviewing the current state of GDDR5X and ATE-based measurements of Micron’s first GDDR6 offering, along with known circuit and channel enhancements (namely an improved DRAM package ball out definition with looser pitch and via back drilling within the PCB), we are confident in claiming that GDDR6 data rates will extend beyond the 14Gb/s/pin target defined by JEDEC all the way to 16Gb/s/pin. As a result, GDDR6 looks to be an attractive complement for generations to come.

This article is an edited version of a DesignCon 2018 Best Paper Award winner.

