Artigo Acesso aberto Revisado por pares

Low-Complexity Multi-size Cyclic-Shifter for QC-LDPC Codes

2016; Electronics and Telecommunications Research Institute; Volume: 39; Issue: 3 Linguagem: Inglês

10.4218/etrij.17.0116.0341

ISSN

2233-7326

Autores

Hyeong‐Ju Kang, Byung‐Do Yang,

Tópico(s)

Cooperative Communication and Network Coding

Resumo

ETRI JournalVolume 39, Issue 3 p. 319-325 ArticleFree Access Low-Complexity Multi-size Cyclic-Shifter for QC-LDPC Codes Hyeong-Ju Kang, Hyeong-Ju Kang hjkang@koreatech.ac.kr Search for more papers by this authorByung-Do Yang, Corresponding Author Byung-Do Yang bdyang@cbnu.ac.kr Corresponding Authorbdyang@cbnu.ac.krSearch for more papers by this author Hyeong-Ju Kang, Hyeong-Ju Kang hjkang@koreatech.ac.kr Search for more papers by this authorByung-Do Yang, Corresponding Author Byung-Do Yang bdyang@cbnu.ac.kr Corresponding Authorbdyang@cbnu.ac.krSearch for more papers by this author First published: 01 June 2017 https://doi.org/10.4218/etrij.17.0116.0341Citations: 4 Hyeong-Ju Kang (hjkang@koreatech.ac.kr) is with the School of Computer Science and Engineering, Korea University of Technology and Education, Cheonan, Rep. of Korea. Byung-Do Yang (corresponding author, bdyang@cbnu.ac.kr) is with the Department of Electronics Engineering, Chungbuk National University, Cheongju, Rep. of Korea. This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2012R1A1A1011221). AboutSectionsPDF ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to share a full-text version of this article with your friends and colleagues. Learn more.Copy URL Share a linkShare onFacebookTwitterLinkedInRedditWechat Abstract The decoding process of a quasi-cyclic low-density parity check code requires a unique type of rotator. These rotators, called multi-size cyclic-shifters (MSCSs), rotate input data with various sizes, where the size is the amount of data to be rotated. This paper proposes a low-complexity MSCS structure for the case when the sizes have a nontrivial common divisor. By combining the strong points of two previous structures, the proposed structure achieves the smallest area. The experimental results show that the area reduction was more than 14.7% when the proposed structure was applied to IEEE 802.16e as an example. I. Introduction Low-density parity-check (LDPC) codes have been adopted in many communication standards since their rediscovery [1], [2]. Among the various kinds of LDPC codes, quasi-cyclic LDPC (QC-LDPC) codes are widely used because of their easy encoder/decoder implementation. The decoding process of a QC-LDPC code requires rotation operations because of its cyclic characteristic. Unlike an ordinary rotator, however, the rotator in the QC-LDPC decoder should support various sizes, where the size is the amount of data to be rotated. A rotator with varying sizes is called a multi-size cyclic-shifter (MSCS) [3]. An MSCS block occupies just tens of thousands of gates, but a highly parallel QC-LDPC decoder usually includes around a dozen MSCS blocks [4], [5]. The area of an MSCS block, therefore, affects the total area of a decoder significantly, making area reduction a main subject of research on MSCS structures. An MSCS block can be simply implemented by assigning a multiple-input multiplexer (MUX) to each output, but this structure requires wide-input MUXs. To overcome the overhead incurred by wide-input MUXs, a pre-rotator structure has been proposed, where small pre-rotators reduce the input width of the MUXs substantially [3]. Another kind of MSCS structure is based on normal barrel rotators. To support various sizes, two barrel rotators are placed in parallel [4], [6] or series [7]. The other kind is based on a Benes network, which can route any permutation [8], [9]. A Benes network is efficient for permutation but inappropriate for optimization with rotation properties. Some previous structures were optimized using the characteristic of the required rotation sizes. In some QC-LDPC codes used in communication standards, the rotation sizes are multiples of a common divisor. In this paper, this characteristic is called the size distribution property. For example, the QC-LDPC codes in IEEE 802.16e and IEEE 802.22 require rotation operations using sizes that are multiples of 4 [10], [11]. This property has been exploited by the pre-rotator structure and the rotator-in-series structure [3], [7]. In this paper, we propose a new low-complexity MSCS structure that is optimized using the size distribution property. The proposed structure combines the structures of [3] and [9]: the input data are processed by a pre-rotator as in [3] and then the remaining shifts are performed by a few small Benes networks rather than a MUX-network. With this combination, the proposed structure can take advantage of both structures, that is, the efficiency of a Benes network and pre-rotation optimization. This paper is organized as follows. Section II describes the MSCS operation and previous MSCS structures. The proposed structure is presented in Section III, and the experimental results are compared in Section VI. Section V draws the conclusion. II. Multi-size Cyclic Shifter An MSCS block rotates a specified amount of data. If N w-bit data are input with a size z and shift amount s, the first z input data are rotated by s to produce the output . The output data beyond the specified size z are redundant. The sizes and shift amounts are determined by the adopted QC-LDPC codes. One of the simplest ways to implement an MSCS block is to use a MUX-network. Because an output datum can be selected from N input data, an N-to-1 (w-bit) MUX is used to connect N inputs to an output. This structure is simple but requires many wide-input MUXs, which are usually implemented by a large number of 2-to-1 MUXs. To improve the MUX-network structure, a pre-rotator can be introduced that exploits the size distribution property, as shown in Fig. 1(a) [3]. In the pre-rotator structure, the input data are rotated by pre-rotators for which the amount of input data is the greatest common divisor (GCD) of the rotation sizes, and then the outputs of the pre-rotators are processed by a MUX-network. This structure reduces the input width of each MUX substantially, but the MUX-network still occupies a significant area. If the GCD is g and , the structure consists of Ng g-input pre-rotators and NNg-to-1 MUXs. The equivalent number of 2-to-1 MUXs is , where the first term is for the pre-rotators and the second term is for the MUX-network. The delay is , where TMUX is the delay of a 2-to-1 MUX. Another class of MSCS structures is implemented with two barrel rotators. The two barrel rotators can be placed in parallel or in series, as shown in Figs. 1(b) and (c). In the rotator-in-parallel structure, the two barrel rotators rotate the input data to the left (the original direction) by s and to the right (the opposite direction) by , respectively [6]. The results of the barrel rotators are then combined by MUXs to become the output data. In the rotator-in-series structure, the first barrel rotator rotates the input data by s, and the second one rotates the result of the first one in the same direction by [7]. The output data are also selected from the barrel rotator outputs. Figure 1.Open in figure viewerPowerPoint Previous structures: (a) pre-rotator and MUX-network structure [3], (b) rotator-in-parallel structure [6], (c) rotator-in-series structure [7], and (d) Benes-network structure [9]. The rotator-based structures have been optimized by the size distribution property or a redundant MUX elimination. In the rotator-in-series structure, if the GCD of the required sizes is a power of two, that is, 2m, the shift amount of the second barrel rotator is always a multiple of 2m. This property eliminates m MUX stages of the second barrel rotator. With this elimination, the number of MUXs is , where the last term in the parenthesis is for the last selection MUX. In the rotator-in-parallel structure, the data rotated around at each MUX stage of the rotators are not selected at the last selection stage [6]. The MUXs for the redundant data are not necessary and can be removed. With this elimination, the number of MUXs is , where the subtraction term is due to the elimination. The other class of MSCS structure is based on the Benes network, as shown in Fig. 1(d). The Benes network can perform any permutation on the input data with switches when N is a power of two [8]. A switch consists of two 2-input MUXs, operating in the BAR or CROSS state, as shown in Fig. 2(a). In [9], the authors proposed a control signal generation scheme for the MSCS operation and a Benes network with switches or switches for the case when or . The switch proposed in [9] consists of three switches, as shown in Fig. 2(b). Figure 2.Open in figure viewerPowerPoint Benes network switches [9]: (a) switch and (b) switch. The Benes network for consists of one stage of switches at the center and n stages of switches at the front and rear, respectively. An example for is shown in Fig. 3. In these cases, the number of MUXs is and the delay is . In the two equations, the first term in the parentheses is for the n stages of the switches in the front and rear, and the second term is for the center stage of the switches. Figure 3.Open in figure viewerPowerPoint Benes network when [9]. The critical path of the switch proposed in [9] includes three 2-to-1 MUXs. A faster switch was proposed in [12], which has two 2-to-1 MUXs in the critical path. With these switches, the total delay can be reduced to for . The previous structures are compared in Table 1 for and GCD . In the first column, MN, RIP, RIS, BN, and F3 are the MUX-network structure of [3], the rotator-in-parallel structure of [6], the rotator-in-series structure of [7], the Benes-network structure of [9], and the Benes-network structure with the fast switch of [12], respectively. The number of MUXs, shown in the second column, is normalized by Nw, and the delay in the third column is normalized by TMUX. When m is small, the BN structure shows the least number of MUXs, but other structures are better with large m. This is because the BN structure itself is efficient but cannot exploit the size distribution property. Table 1. Number of MUXs and delay for and . Structures Number of MUXs (× Nw) Delay (× TMUX) MN () () RIP () () RIS () () BN () () F3 () () FC () () III. Proposed Fine–Coarse Structure In this section, we propose an MSCS structure based on the Benes network, which adopts pre-rotators to exploit the size distribution property. The proposed structure replaces the MUX-network in Fig. 1(a) with the Benes networks in Fig. 1(d). This combination compensates for the weakness of the two previous structures. The overhead of the MUX-network is reduced by the adoption of the Benes network, and the size distribution property, which has not been exploited by a Benes-network-based structure, can be used by the pre-rotation scheme. The proposed structure is called the fine–coarse (FC) structure because the coarse rotation of the Benes networks follows the fine rotation of the pre-rotators. The FC structure consists of Ng g-input pre-rotators and g Ng-input Benes networks, as shown in Fig. 4. A rotation scheme with pre-rotators was introduced in [3], but the scheme was described for MUX-network control without a detailed mathematical analysis. In this paper, the scheme is modified for rotation control of the Benes network and a mathematical analysis is presented in the appendix. Figure 4.Open in figure viewerPowerPoint Proposed FC structure. In the FC structure, the input data are rotated in two steps. The rotation amount can be described by , where . The rotation by sp is performed by the pre-rotators, and the rotation by sB is done by the Benes networks. At the first step, each set of g input data are grouped and rotated by sp within the group. The kth output datum of each pre-rotator is then gathered by the kth Benes network. In the Benes network, the gathered data are rotated by sB with a rotation size of . Because a group has g data, the data are actually rotated by sBg. As a result, a rotation by s is performed. With this scheme, however, some data are rotated by . This insufficient rotation is caused by the inside-group rotation at the pre-rotation step. At this step, some data should be moved over the group boundary, but they are not. To address this problem, the kth Benes network rotates the gathered data by when . The rotation process of the FC structure is illustrated in Fig. 5, where , , , and with and . In the figure, a small circle represents a datum, and the number in a circle is the index of the corresponding datum. There are four pre-rotators, each of which groups four input data and rotates them by within the group. The kth Benes network with then rotates the kth output data of the pre-rotators by because , but the third Benes network rotates the input data by because . The figure shows that the rotation by is performed well. The ith output data with are redundant and indicated by question marks in the figure. Figure 5.Open in figure viewerPowerPoint Rotation example when , , , and . The FC structure reduces the number of MUXs by reducing the number of MUX stages. In a Benes network, two stages are removed as N is decreased by half. In the FC structure, the amount of input data of a Benes network is N/g, so the number of MUX stages is fewer by about 2log2 g than in a Benes network with N input data. Because the number of MUX stages in the pre-rotators is ⌈log2 g⌉, the resulting reduction is about log2 g stages for the MUXs. As an example for a detailed analysis, let us assume that and . The number of MUXs of the BN structure is , as shown in Table 1. In the FC structure, a pre-rotator occupies , so the number of MUXs for Ng pre-rotators is . A Benes networks includes , so the number of MUXs for g Benes networks is . The total number of MUXs is . Compared to the BN structure, mNw MUXs are reduced in the new structure, as shown at the last row of Table 1. Along with the reduction in the number of MUXs, the FC structure reduces the delay time. In the above example, the delay time of the pre-rotator is mTMUX, and that of a Benes network is with the fast switch in [12]. The total delay time is , which is less than that of the F3 structure by mTMUX. For comparison in a real application, Table 2 shows the number of MUXs and the delay of the FC and previous structures when they are applied to IEEE 802.16e. The terms in the first column follow those of Table 1. Tables 1 and 2 show that the FC structure is the smallest one with a comparable delay. Table 2. Comparison for IEEE 802.16e. Structures Number of MUXs Delay MN 7 TMUX RIP 8 TMUX RIS 13 TMUX BN 13 TMUX F3 12 TMUX FC 10 TMUX IV. Experimental Results To compare the synthesis results, the MSCS structures were implemented in the register transfer level for IEEE 802.16e and synthesized by the Cadence RTLCompiler with a 0.25 μm process library. The constraints were specified so that the synthesized circuits operate as fast as possible with 2-input MUXs preserved in the datapath. The data bit width w was assumed to be 8. The synthesis results are shown in Table 3, with the same terms in the first column as those of Table 1. The second column of the table shows the area results converted to the number of equivalent 2-input NAND gates. The delay of each entire circuit is presented in the third column with the delay of the datapath part shown in the fourth column. The delay time is measured in picoseconds. Table 3. Synthesis results. Structures Gate counts Delay (ps) Datapath delay (ps) MN 57,990 2,615 1,946 RIP 28,884 2,403 2,013 RIS 33,715 3,134 2,722 BN 33,039 4,236 2,788 F3 31,702 3,623 2,720 FC 24,627 3,009 2,521 Within the previous structures, the RIP structure is smaller and faster than the Benes-network-based structures (BN and F3). In Tables 1 and 2, the Benes-network-based structures require fewer MUXs, but the RIP structure shows less area because it uses simpler control signals. The control signals required for the rotator-based structures are just shift or size values such as s or . In contrast, the Benes-network-based structures transfer the shift and size values into the switch control signals, increasing the delay time and circuit complexity. The fast switch in [12] compensates for this inferiority to some degree, but the compensation is not sufficient. The proposed FC structure exceeds the limit of the Benes-network-based structures, achieving the smallest area and comparable delay time. Compared to the smallest previous structure, the area is reduced by around 14.7%. This reduction is somewhat less than the expectation from Table 2, but still significant. Within the Benes-network-based structures, the area reduction is 25.5% from BN and 22.3% from F3. In terms of delay, the FC structure is not the fastest one, but is located about in the middle of the structures, faster than three previous structures and slower than two. If we do not consider the MN structure because of its much larger area, the FC structure is the second fastest one. V. Conclusion This Paper proposed a low-complexity MSCS structure. By combining previous structures, the proposed FC structure achieves the smallest area. The area reduction is around 14.7% compared with the smallest conventional structure. The delay of the FC structure is comparable with other structures. Because a QC-LDPC decoder requires many MSCS blocks, the area reduction can contribute significantly to reducing the total decoder area. The experimental results are shown for IEEE 802.16e, but the FC structure can be applied to any QC-LDPC decoder when the required rotation sizes have a common divisor. Appendix This appendix describes the process mentioned in Section III mathematically and shows its correctness. Let us define input data di and output data oi, where . Rotation by s with size z can be defined as follows: (1) Output oi are not defined for . If dj,k is the kth datum of the jth group after input data grouping, then , where and . The output data of the pre-rotation step ej,k can be written as follows: (2) For a value of k, the data ej,k are gathered by the kth Benes network and rotated by sB or to be fj,k. The kth Benes network with (the first case of (2)) performs the rotation by sB as follows: (3) The kth Benes network with (the second case of (2)) performs the rotation by as follows: (4) For output data , where and . With the above equations, we can check whether (1) is satisfied. Four cases arise from the values of j and k in , and one case is analyzed in this paper because of its length. If and , which belongs to the second case of (1). With this j and k, (5) Equation (5) satisfies the second case of (1). Biographies Hyeong-Ju Kang received his BS, MS, and PhD degrees in electrical engineering from the Korea Advanced Institute of Science and Technology, Daejeon, Rep. of Korea, in 1998, 2000, and 2005, respectively. From 2005 to 2006, he was with Magnachip Semiconductor, where he participated in the development of a smart card controller. From 2006 to 2009, he worked for GCT Semiconductor to develop baseband modem chips for several digital mobile TV standards. Since 2009, he has been at the School of Computer Science and Engineering at the Korea University of Technology and Education, Cheonan, Rep. of Korea, where he is now an associate professor. His current research interest includes communication modems, embedded processors, and deep learning acceleration circuits. Byung-Do Yang received his BS, MS, and PhD degrees in electrical engineering and computer science from the Korea Advanced Institute of Science and Technology, Daejeon, Rep. of Korea, in 1999, 2001, and 2005, respectively. He was a senior engineer at the Memory Division, Samsung Electronics, Suwon, Rep. of Korea, in 2005, where he was involved in the design of DRAM. Since 2006, he has been at Chungbuk National University, Cheongju, Rep. of Korea, where he is now a professor. His research interests are analog circuits, digital circuits, memory circuits, and power IC designs. References 1D.J.C. MacKay and R.M. Neal, "Near Shannon Limit Performance of Low Density Parity Check Codes," Electron. Lett., vol. 32, no. 18, Aug. 1996, pp. 1645– 1646. 2D.J.C. Mackay, "Good Error Correcting Codes Based on Very Sparse Matrices," IEEE Trans. Inform. Theory, vol. 45, no. 2, Mar. 1999, pp. 399– 431. 3M. Rovini, G. Gentile, and L. Fanucci, "Multi-size Circular Shifting Networks for Decoders of Structured LDPC Codes," Electron. Lett., vol. 43, no. 17, Aug. 2007, pp. 938– 940. 4X. Peng et al., "A 115 mW 1 Gbps QC-LDPC Decoder ASIC for WiMAX in 65 nm CMOS," IEEE Asian Solid-State Circuits Conf., Jeju, Rep. of Korea, Nov. 14–16, 2011, pp. 317– 320. 5Y. Cui et al., "Ultra Low Power QC-LDPC Decoder with High Parallelism," Int. SOC Conf., Taipei, Taiwan, Sept. 26–28, 2011, pp. 142– 145. 6X. Chen, S. Lin, and V. Akella, "QSN–a Simple Circular-Shift Network for Reconfigurable Quasi-Cyclic LDPC Decoders," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 10, Oct. 2010, pp. 782– 786. 7B. Xiang et al., "An 847–955 Mb/s 342–397 mW Dual-Path Fully-Overlapped QC-LDPC Decoders for WiMAX System in 0.13 μm CMOS," IEEE J. Solid-State Circuits, vol. 46, no. 6, June 2011, pp. 1416– 1432. 8V.E. Benes, "Optimal Rearrangeable Multistage Connecting Networks," Bell Syst. Tech. J., vol. 43, no. 4, July 1964, pp. 1641– 1656. 9D. Oh and K.K. Parhi, "Low-Complexity Switch Network for Reconfigurable LDPC Decoders," IEEE Trans. Very Large Scale Integr. Syst., vol. 18, no. 1, Jan. 2010, pp. 85– 94. 10IEEE Std. 802.16, IEEE Standard for Air Interface for Broadband Wireless Access Systems, 2012. 11IEEE Std. 802.22, IEEE Standard for Information Technology–Local and Metropolitan Area Networks–Specific Requirements–Part 22: Cognitive Wireless RAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Policies and Procedures for Operation in the TV Bands, 2011. 12H.-J. Kang, "Multi-size Circular Shifter Based on Benes Network with High-Speed 3 × 3 Switch," J. Korea Inst. Inform. Commun. Eng., vol. 19, no. 11, Nov. 2015, pp. 2637– 2642. Citing Literature Volume39, Issue3June 2017Pages 319-325 FiguresReferencesRelatedInformation

Referência(s)
Altmetric
PlumX