Channel operating margin (COM) is a well-documented IEEE-standardized method that has been used successfully since 2014 in the design of channels and the specification of interconnect. The goal of this article is not to explain what COM is or how it works (see the reference section for that information). Rather, this article reflects on the COM origin story as recalled by Rich Mellitz, who was in the room when the need for something like COM was realized, and who was one of the chief architects of the specification. The following aims to capture the story of the creation of the COM standard in the words of Mellitz, as well as its evolution and where it might be headed next.

Some would argue that it all began in the 1990s with the needs of PCI Express®, InfiniBand™, and 10 Gbps Ethernet,1 when semiconductor companies had to specify electrical channels. These electrical channels use separate differential pairs for transmit and receive. Here, the term “line” means one transmit-to-receive differential pair.

Before getting too far into the story of COM, it is best to define what is meant by data rate. IEEE specifies the delivered data rate for a MAC. For example, 10 Gbps Ethernet1 was really 10 Gbps delivered over four pairs of twinaxial cabling, i.e., 2.5 Gbps per line. However, since the data was 8B10B NRZ encoded, the actual line rate was 3.125 Gbps, and thus the Nyquist frequency was 1.5625 GHz. This is different from PCIe, OIF, and InfiniBand, where the data rate is the actual symbol transfer rate per line. For convenience, one can simply refer to 25 Gbps, 50 Gbps, 100 Gbps, 200 Gbps, and 400 Gbps per line without detailing the encoding overhead.
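To make that arithmetic concrete, here is a minimal sketch in Python (the official COM tooling discussed later is a MATLAB script; the helper names here are purely illustrative):

```python
# Illustrative arithmetic only; the helper names are hypothetical.

def line_rate_gbps(mac_rate_gbps, num_lines, coding_overhead):
    """Per-line signaling rate for a delivered MAC data rate.
    coding_overhead: e.g., 10/8 for 8B10B encoding."""
    return mac_rate_gbps / num_lines * coding_overhead

def nyquist_ghz(rate_gbps, bits_per_symbol=1):
    """The Nyquist frequency is half the symbol (baud) rate."""
    return rate_gbps / bits_per_symbol / 2.0

rate = line_rate_gbps(10, 4, 10 / 8)  # 10 GbE on 4 pairs -> 3.125 Gbps per line
print(rate, nyquist_ghz(rate))        # 3.125 1.5625
```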

Identifying the Problem (1990-2010)

In the 1990s, copper electrical bus rates were mostly under a gigahertz. Losses and crosstalk below a gigahertz are generally considered “well behaved” because the electrical wavelengths are on the order of PCB design sizes. At this time, it was sufficient for many semiconductor manufacturers to have rudimentary channel requirements based on characteristics described as simple functions of frequency. Eye diagrams emerged for compliance testing, augmenting the typical test method of the time: setup and hold timing verification.

Around 2002, IEEE’s Ethernet broke the 10 Gbps barrier, which led to other 10 Gbps projects, such as IEEE Std 802.3ap-2007,2 where 10 Gbps per line interconnect channels were defined for a backplane and data center twinaxial cabling. The focus for the 10 Gbps copper backplane and cable project was frequency domain (FD) limit masks to support a 1-m backplane reach objective. Although this was sufficient for interconnect designers of the time, the interaction between these masks and transceiver specifications was unfortunately weak.

In 2010, the IEEE project IEEE Std 802.3ba-20103 extended inter-box cabling to 7 m of electrical cable using the same 10 Gbps FD masks for electrical channel compliance. In 2012, the push for 25 Gbps per line began as the IEEE 100 Gbps Backplane and Copper Cable Study Group4 kicked off. Electrical lengths of concern shrank to about an inch as a result of 25 Gbps per line signaling. This broke the FD mask paradigm: making channel compliance work would have required too much guard band. Essentially, there was no easy way to budget among insertion loss, crosstalk, return loss, and transceiver capability. This need paved the way for COM.

Very quickly, it was discovered that relying on maximum insertion loss alone was not sufficient. It was also realized that insertion loss curves near 13 GHz were not smooth. The aberrations around a fitted, smooth insertion loss curve were called insertion loss deviation (ILD); more ILD meant less margin. The cause was that via, connector, and package geometries, and the spacing between them, were approaching critical electrical lengths, which resulted in significant reflections starting around 5 GHz. It was known that more reflection caused more ripple in the insertion loss curve, and semiconductor manufacturers indicated this would result in lower performance. The frequencies of these interconnect impairments also spawned conversations contrasting NRZ and PAM4: although NRZ dominated 25 Gbps designs, the 50 Gbps line rate favored PAM4.

Crosstalk was another issue addressed during the 10 Gbps per line project.2 Crosstalk was converted to a single RMS voltage, called integrated crosstalk noise (ICN), computed as the normalized integration over frequency of the power sum of all the crosstalk responses. (Recall Parseval’s theorem: total power in the time domain equals total power in the FD.) In addition, the insertion loss to crosstalk ratio (ICR) was borrowed from J. Salz’s work,5 supporting the notion of a budget between crosstalk and insertion loss.
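The following Python fragment sketches that computation under stated assumptions: a uniform frequency grid, a simple sinc-squared transmit power weight, and an invented function name. The actual IEEE annex definitions use more detailed weights with separate NEXT and FEXT amplitudes and edge-rate terms.

```python
import numpy as np

def icn_rms(freqs_hz, xtalk_mags, amp_v, f_baud_hz):
    """Simplified ICN sketch: an RMS crosstalk voltage from the normalized
    integration of the power sum of crosstalk responses |S21(f)|.
    xtalk_mags: list of aggressor magnitude-response arrays on freqs_hz."""
    # Idealized sinc^2 transmit power weight (an assumption here; the
    # standard's weighting functions are more detailed).
    weight = (amp_v ** 2) * np.sinc(freqs_hz / f_baud_hz) ** 2
    # Power sum across all aggressors at each frequency; per Parseval,
    # integrated power in frequency equals total power in time.
    power_sum = sum(np.abs(m) ** 2 for m in xtalk_mags)
    df = freqs_hz[1] - freqs_hz[0]
    # Normalized integration -> mean-square voltage -> RMS.
    return np.sqrt(2.0 * df / f_baud_hz * np.sum(weight * power_sum))
```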

Around the same time, there were discussions about how to determine a maximum channel capability based upon the Salz limit.5 This tactic had been used for the higher power, lower radix “BaseT” standards; the assumption is that transceivers have unlimited DFE and FFE at their disposal. Data center switch and network cards require orders of magnitude less power per line and have an order of magnitude higher radix and density. The Salz limit was interesting but required too much power for the backplane application. So, the industry ended up focusing on ILD and ICR, as these were the quantities that mattered for physical design.

In 2010, there was still no standard method or simulation to evaluate performance; specifically, there was a lack of signal integrity simulation standardization. The result was that standards development was relegated to what could be called the “ouch test”: interconnect designers would create BGA-ball-to-BGA-ball models called channels, and transceiver vendors would say “ouch” when a channel was too tough or did not work in a lab experiment. For standards development, deciding on channel and transceiver parameters was rather like playing poker. Unfortunately, at the time, there was a significant disconnect between physical design and what the simulations could provide.

During this time, interconnect designers seemed happy using insertion loss, return loss, crosstalk, and ICR curves, gaining apparent performance by minimizing ICN and ILD in their designs. Unfortunately, the FD bounds, while good for interconnect designers, were of limited use to transceiver designers. Consider that the 10 Gbps backplane ILD mask was reasonable for the physical design of data center switches and servers. The original expectation was that five DFE taps would handle data center designs like IBM’s BladeCenter; the actual designs required up to 50 DFE taps. Moving to 25 Gbps per line (25G), it was realized that a linkage was needed between physical channel design and transceiver or SerDes design. The two spoke different languages. This growing need for a “Rosetta Stone” paved the way for something like COM.

COM Evolution (2011-Present)

Interconnect designers require a budget that includes insertion loss, crosstalk, and reflections; however, SerDes needs must also be part of this budget. Around 2010-2011, the group working on 25G projects started to experiment with post-processed FD metrics, graduating to a “dibit” time domain response suggested by Charles Moore.6 The method was mostly based on power losses but had no direct linkage to the time-sampled SerDes. This opened the door to the time domain.

Early in the 25G project, the group started examining the channel pulse response. A data stream is a pulse response convolved with a symbol stream, so the pulse response was recognized as perhaps the lynchpin that would connect the SerDes designer and the interconnect designer. Many published works suggested that a SerDes architect could translate pulse responses into design capability, and, anecdotally, interconnect designers could see in the pulse response the direct effects of loss, reflection, and crosstalk.

Prior to the COM proposal, there was a lot of angst about converting S-parameter measurements made in the FD into a pulse response in the time domain. Determining a pulse response is somewhat easier if a transmitter filter, a receiver filter, and a pulse filter are applied before converting the S-parameter to the time domain with an inverse FFT.
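A minimal sketch of that filtered-FFT idea, assuming a uniformly spaced frequency grid starting at DC and simple one-pole transmitter and receiver filters (the filters actually used by COM are defined in the IEEE 802.3 annexes):

```python
import numpy as np

def pulse_response(freqs_hz, s21, f_baud_hz, n_time=4096):
    """Shape the through channel in the frequency domain with Tx, Rx,
    and one-UI pulse filters, then inverse-FFT to the time domain.
    Measured S-parameters generally need interpolation to a uniform
    DC-based grid before this step."""
    f = np.asarray(freqs_hz, dtype=float)
    tx = 1.0 / (1.0 + 1j * f / (0.75 * f_baud_hz))  # illustrative 1-pole Tx filter
    rx = 1.0 / (1.0 + 1j * f / (0.75 * f_baud_hz))  # illustrative 1-pole Rx filter
    pulse = np.sinc(f / f_baud_hz)                  # spectrum of a one-UI rectangle
    h_f = np.asarray(s21) * tx * rx * pulse
    h_t = np.fft.irfft(h_f, n=n_time)               # real time-domain pulse response
    dt = 1.0 / (n_time * (f[1] - f[0]))             # time step from frequency spacing
    return np.arange(n_time) * dt, h_t
```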

At that time, SiSoft (now part of MathWorks) had a proprietary way to create a pulse response from FD S-parameters, and SiSoft employees were active in the IEEE meetings. Walter Katz (SiSoft) showed favorable correlation between SiSoft’s pulse responses and those from the filtered FFT method being considered for COM.7 This is when things started to get interesting. The turning point was moving discussions to pulse response analysis.

Pulse responses sampled at one symbol interval correspond to one unit interval (UI) spaced samples in a data stream waveform (because of linear time invariance and convolution). For these purposes, the UI is the time between symbol samples. The RMS of the data waveform sampled at one UI represents the average voltage power. The same average power can be determined by taking the root sum of squares (RSS) of the UI-spaced samples of the pulse response (as long as the data is reasonably random). An inter-symbol interference (ISI) noise vector was created by excluding the sample at the pulse peak. Since crosstalk is all noise, the entire sampled crosstalk pulse response was used as noise. There was now a way to combine crosstalk with reflections and then compare them to the pulse peak (which is proportional to insertion loss).

Next, the group began to discuss cursors, which are the samples of the victim pulse response spaced at one UI. The peak sample index is cursor 0; samples before it are negative cursors and samples after it are positive cursors.
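A short sketch of this sampling and cursor bookkeeping, with a hypothetical helper name and deliberately crude interpolation; for a crosstalk aggressor, no cursor is excluded, and every UI-spaced sample would go into the noise vector:

```python
import numpy as np

def ui_cursors(t_s, h, ui_s):
    """Sample a victim pulse response at one-UI spacing, take cursor 0
    at the peak sample, and form the ISI noise vector from the
    remaining pre- and post-cursors."""
    n = int(t_s[-1] // ui_s)
    samples = np.interp(ui_s * np.arange(n), t_s, h)  # UI-spaced samples
    c0 = int(np.argmax(np.abs(samples)))              # cursor 0 = peak sample
    isi = np.delete(samples, c0)                      # negative and positive cursors
    sigma_isi = np.sqrt(np.sum(isi ** 2))             # RSS ~ RMS random-data noise
    return samples[c0], isi, sigma_isi
```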

The industry needed to move to the statistical domain. The RSS of the UI-spaced samples of a pulse response is the ISI; it corresponds to the RMS of the sampled noise of the random data response. RMS noise can be treated as a normal, or Gaussian, distribution. Enter the statistics of noise. The group talked about the noise voltage at a certain probability; for example, a probability of 1e-12 corresponds to about ±7 sigma, where sigma is the RMS. Much discussion ensued about whether the assumption of Gaussian noise was overly pessimistic for copper channels.
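The ±7 sigma figure follows from inverting the Gaussian tail probability, as this small self-contained check (standard library only) illustrates:

```python
from math import erfc, sqrt

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def sigma_multiplier(target_prob, lo=0.0, hi=20.0):
    """Invert Q(x) by bisection: how many sigmas correspond to a
    target tail probability."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if q_func(mid) > target_prob else (lo, mid)
    return 0.5 * (lo + hi)

print(sigma_multiplier(1e-12))  # ~7.03: 1e-12 is about +/-7 sigma
```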

During the same era, other standards groups addressed the issue of expected noise. Work on PCI Express Generations 1 and 2 and SAS/SATA, for example, centered on data patterns that created the worst-case ISI or noise, a concept called peak distortion. The objective in the IEEE project was to address the ISI corresponding to a line error rate close to 1e-12; the worst-case ISI occurs at an error rate many orders of magnitude lower. Conversations started by aligning samples to the pulse peak. The group addressed actual clock and data recovery sampling much later. The sum of the magnitudes of the 40 worst UI-spaced samples in the pulse response would seem to correspond to a probability of 1e-12.
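That heuristic can be read off from a pattern count: one specific alignment of 40 binary symbols occurs with probability 2^-40 ≈ 9e-13, on the order of 1e-12. A sketch of the peak-distortion-style sum, with a hypothetical function name:

```python
import numpy as np

def worst_n_isi(isi_samples, n_worst=40):
    """Peak-distortion-style bound: sum of the magnitudes of the
    n_worst largest UI-spaced ISI samples; with n_worst = 40, the
    aligning data pattern has probability 2**-40, roughly 1e-12."""
    mags = np.sort(np.abs(np.asarray(isi_samples)))[::-1]
    return float(np.sum(mags[:n_worst]))
```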

What was significant here was the whole notion of doing statistical analysis with crosstalk. What statistics should be used? Should the industry just use RMS values for everything? One of the discoveries during this process was that, when applying Gaussian assumptions to the noise one actually gets in backplanes and cables, one ends up completely overdesigning.8 In other words, one overpredicts, by quite a bit, the noise that must be budgeted for a maximum bit error ratio. That did not sit well, so the group decided to use what were considered the “real” noise profiles. This was the point when COM could take advantage of the actual nature of electrical channels: actual electrical crosstalk and ISI noise distributions are not independent and identically distributed (IID) random processes.

Then, a curious thing happened: people started publishing their interconnect models. The IEEE working groups became a public repository for channel models representative of the interconnects being produced, including backplanes and cables. In the past, someone might show a picture and graphs of their interconnect. But once 25 Gbps was reached, everyone realized that sharing channel S-parameter models of what the industry might be doing or planning was a way to manage the standards process. This became even more prolific at 50 Gbps. These models are one management tool for standards development; the other half is managing transmit and receive parameters, which were embodied as COM parameter tables incorporated into the standard.

COM was proposed in 2012 as a channel compliance method9 that accounted for the non-IID nature of interconnect noise as well as minimum transceiver capability. Transceiver capability is embodied in the tables within the standard. COM is a documented algorithm in IEEE 802.3 and is both NRZ and PAM4 capable. An evolving MATLAB example script was used throughout all the projects that adopted COM (see Figure 1). Although not part of standards compliance, the script proved useful in moving the wave of standards development forward.

Figure 1. Implementation of COM.

Parameters are represented in a spreadsheet, which the MATLAB script uses to statistically evaluate electrical S-parameter channel models via the algorithmic procedure described in Annex 93A and, presently for 200 Gbps, Annex 178A of the IEEE 802.3 Ethernet standard. Moving forward, there are plans to incorporate the MATLAB COM script and associated spreadsheets into an IEEE SA Open Source project under the IEEE 802.3 umbrella, which will lead to a new COM evolution.

Since its inception as part of the 802.3bj project, COM has undergone many revisions based upon industry needs and changing market potential. It has been adopted for other IEEE projects such as IEEE Std 802.3bm-2015,10 IEEE Std 802.3by-2016,11 IEEE Std 802.3bs-2017,12 IEEE Std 802.3cd-2018,13 IEEE Std 802.3ck-2022,14 IEEE Std 802.3df-2024,15 and IEEE P802.3dj.16 In addition, COM has been borrowed for OIF and InfiniBand standards, which dovetail with IEEE standards.

REFERENCES 

  1. IEEE P802.3ak, 10GBASE-CX4, https://www.ieee802.org/3/ak/index.html.
  2. IEEE Std 802.3ap-2007, Backplane Ethernet, https://www.ieee802.org/3/ap/index.html.
  3. IEEE Std 802.3ba-2010, 40 Gb/s and 100 Gb/s Ethernet, https://www.ieee802.org/3/ba/index.html.
  4. IEEE Std 802.3bj-2014, 100 Gb/s Backplane and Copper Cable, https://www.ieee802.org/3/bj/index.html.
  5. J. Salz, “Digital Transmission Over Cross-Coupled Linear Channels,” AT&T Technical Journal, Vol. 64, No. 6, July–August 1985, pp. 1147–1159, doi:10.1002/j.1538-7305.1985.tb00269.x.
  6. C. Moore and A. Healey, “A Method for Evaluating Channels,” IEEE802.3 100 Gb/s Backplane Copper Study Group, Singapore, March 2011.
  7. R. Mellitz, A. Ran, W. Bliss, W. Katz, and P. Patel, “Consensus Building Group Report Channel Analysis Method for 802.3bj Qualification and Specification,” 100 Gb/s Backplane and Copper Cable Task Force, May 2012, Interim Meeting, Minneapolis, Minn., https://www.ieee802.org/3/bj/public/may12/diminico_02a_0512.pdf.
  8. A. Ran and R. Mellitz, “Analysis of Contributed Channels using the COM Method,” 100 Gb/s Backplane and Copper Cable Task Force, July 2012, San Diego, Calif., https://www.ieee802.org/3/bj/public/jul12/ran_01a_0712.pdf.
  9. R. Mellitz, C. Moore, M. Dudek, M. P. Li, and A. Ran, “Time-Domain Channel Specification: Proposal for Backplane Channel Characteristic Sections,” 100 Gb/s Backplane and Copper Cable Task Force Plenary, July 2012, San Diego, Calif., https://www.ieee802.org/3/bj/public/jul12/mellitz_01_0712.pdf.
  10. IEEE Std 802.3bm-2015, 40 Gb/s and 100 Gb/s Fibre Optic, https://www.ieee802.org/3/bm/index.html.
  11. IEEE Std 802.3by-2016, 25 Gb/s Ethernet, https://www.ieee802.org/3/by/index.html.
  12. IEEE Std 802.3bs-2017, 200 Gb/s and 400 Gb/s Ethernet, https://www.ieee802.org/3/bs/index.html.
  13. IEEE Std 802.3cd-2018, 50 Gb/s, 100 Gb/s, and 200 Gb/s Ethernet, https://www.ieee802.org/3/cd/index.html.
  14. IEEE Std 802.3ck-2022, 100 Gb/s, 200 Gb/s, and 400 Gb/s Electrical Interfaces, https://www.ieee802.org/3/ck/index.html.
  15. IEEE Std 802.3df-2024, 400 Gb/s and 800 Gb/s Ethernet, https://www.ieee802.org/3/df/index.html.
  16. IEEE P802.3dj, 200 Gb/s, 400 Gb/s, 800 Gb/s, and 1.6 Tb/s Ethernet Task Force, https://www.ieee802.org/3/dj/index.html.

Richard Mellitz is presently a Distinguished Engineer at Samtec, supporting interconnect signal integrity and industry standards. Richard has been a key contributor to IEEE802.3 electrical standards for many years. He led efforts to develop radically new IEEE and OIF time domain specification methods called COM and Effective Return Loss. Early in his career, he founded and chaired an IPC committee authoring the industry’s first TDR standard. Richard holds many patents in interconnect, signal integrity, design, and test. Richard received the IEEE Standards Association Medallion and the Intel Achievement Award for spearheading the industry’s first graduate signal integrity programs at the University of South Carolina. Richard was also honored with the DesignCon 2022 Engineer of the Year Award.