Efficient and scalable cross‐by‐pass‐mesh topology for networks‐on‐chip

Article

National Production, Peer Reviewed


2017; Institution of Engineering and Technology; Volume: 11; Issue: 4; Language: English

10.1049/iet-cdt.2016.0184

ISSN

1751-861X

Authors

Usman Ali Gulzari, Sheraz Anjum, Shahrukh Aghaa, Sarzamin Khan, Frank Sill Torres

Topic(s)

Parallel Computing and Optimization Techniques

Abstract

IET Computers & Digital Techniques, Volume 11, Issue 4, pp. 140-148. Research Article, Free Access.

Usman Ali Gulzari (corresponding author, usmangulzari2000@yahoo.com), Department of Electrical Engineering, COMSATS Institute of Information Technology, Islamabad, Pakistan; Department of Electrical Engineering, The University of Lahore, Islamabad Campus, Pakistan
Sheraz Anjum, Department of Computer Science, COMSATS Institute of Information Technology, Wah Cantt, Pakistan
Shahrukh Aghaa, Department of Electrical Engineering, COMSATS Institute of Information Technology, Islamabad, Pakistan
Sarzamin Khan, Department of Electrical Engineering, COMSATS Institute of Information Technology, Wah Cantt, Pakistan
Frank Sill Torres, Department of Electronic Engineering, Federal University of Minas Gerais, Belo Horizonte, Brazil

First published: 22 May 2017. https://doi.org/10.1049/iet-cdt.2016.0184. Citations: 11.

Abstract

This study presents an efficient and scalable networks-on-chip (NoC) topology termed cross-by-pass-mesh (CBP-Mesh). The proposed architecture is derived from the traditional mesh topology by the addition of cross-by-pass links to the network. The design of the cross-by-pass links and their impact on the topology are analysed in detail with the help of synthetic, hotspot and embedded traffic traces. The advantages of the proposed CBP-Mesh over its competitor topologies include a reduction in the network diameter, an increase in bisection bandwidth, a reduction in the average number of hops, and improved symmetry and regularity of the network. The synthetic traffic traces and some real embedded system workloads are applied to the proposed CBP-Mesh and its competitor two-dimensional NoC topologies. The comparison of analytical results in terms of performance and costs for different network dimensions indicates that the proposed CBP-Mesh offers short latency, high throughput and good scalability at a small increase in power and energy.
1 Introduction

The rising complexity of system-on-chip (SoC) designs, which integrate more and more processing elements (PEs), requires communication schemes that are more flexible and robust than the classic shared buses [1]. A promising solution for on-chip communication among PEs is the networks-on-chip (NoC) paradigm, which is based on packet-oriented communication [2]. The distinguishing characteristics of NoCs are their modular structure and their concurrency of computation and communication [3]. The principal issues to be addressed in NoCs are the reduction of power consumption and energy utilisation at low penalties in performance, latency and throughput [4]. Further issues are network scalability and the design complexity of routing elements [5]. Therefore, the choice of an appropriate topology is essential. The most promising and widely applied NoC topology is the so-called mesh, which profits from a regular and simple structure [6]. However, mesh networks suffer from poor scalability for large numbers of PEs due to the great number of multi-hop links needed to provide complete reachability [7]. Alternative solutions are meshes with hierarchical topologies such as diagonal-mesh (D-Mesh) [8] and flattened butterfly [9], which reduce the average hop count in the NoC. However, most proposed structures lead to increased router complexity as well as higher costs in terms of power consumption and energy [9].

This work presents an efficient cross-by-pass-mesh (CBP-Mesh) architecture design with high scalability. The architecture of the proposed network topology is based on the basic mesh topology and adds cross-by-pass links (CBP-Links). The added CBP-Links shorten the paths that packets traverse from source to destination by bypassing the in-between nodes, and thereby increase the performance of the network with respect to its predecessor and competitor topologies.
These additional links are most effective for reaching distant nodes; they considerably reduce the average latency and result in higher network throughput. In order to demonstrate the performance of the proposed CBP-Mesh, synthetic hotspot traffic and five different embedded application workloads were applied to the proposed CBP-Mesh and the selected topologies. The proposed topology is compared with some of its predecessors, namely the mesh, Torus and central-connected-mesh (C2-Mesh) topologies, as well as with its competitors extended-mesh (XD-Mesh) and D-Mesh [10]-[15]. The simulation results indicate that CBP-Mesh is an efficient candidate among the selected topologies owing to its lower average latency and increased throughput, at the cost of a slight increase in network power and energy for on-chip communication.

The rest of the paper is organised as follows. Section 2 provides background and related works, while Section 3 proposes the new topology. Section 4 analyses the impact of the added links on the topology characteristics and Section 5 presents the comparisons with different mesh topologies. Finally, Section 6 draws the conclusions.

2 Background

Mesh (see Fig. 1a) is a commonly applied NoC topology for multicore systems [11]. The mesh topology offers a simple and regular network design and is thus a widely chosen option for NoC [13]. However, with an increasing number of PEs, mesh networks start to suffer performance degradation. This follows mainly from the drastic increase of the network diameter, leading to higher latencies. Further, the throughput is reduced because the number of nodes grows faster than the bisection width (see also Section 4). Hence, novel types of NoC mesh topologies have been proposed, aiming at the reduction of the network diameter and the increase of the bisection width. The principal difference between all these new topologies is the type and the application of the interconnections (see Fig. 1).

Fig. 1 3 × 3 meshes including routers (Ro) and interconnection links: (a) Mesh, (b) Torus, (c) D-Mesh, (d) XD-Mesh, (e) C2-Mesh, (f) CBP-Mesh

We propose in the following a classification into four basic types of links for NoC mesh topologies (see Fig. 1). The classic mesh-links (M-Links, see Fig. 1a) connect directly neighbouring PEs in the horizontal and vertical directions, while Torus-links (T-Links, see Fig. 1b) interlink horizontal and vertical PEs over longer distances [14], [15]. Diagonal-links (D-Links, see Fig. 1c) connect directly neighbouring PEs in the diagonal direction, extended-links (XD-Links) connect the opposite corner PEs of a mesh through the central node (see Fig. 1d) and cross-links (C-Links, see Fig. 1e) interlink diagonal PEs over longer distances [13], [16].

The Torus topology (Fig. 1b) applies T-Links and possesses a considerably lower network diameter compared with the standard mesh [13]. The D-Mesh topology (Fig. 1c) addresses the diameter limitation of the mesh by using additional D-Links [12]. However, the D-Mesh topology requires higher degree routers, resulting in considerably increased costs in terms of power consumption [14]. In detail, D-Mesh applies M- and D-Links [14]. As a result, its number of links is higher than in mesh and Torus. It can be concluded that D-Mesh is a complex and costly network topology [11]. XD-Mesh (Fig. 1d) and C2-Mesh (Fig. 1e) have been proposed to reduce complexity and costs while maintaining high scalability [13], [16]. In detail, XD-Mesh applies XD-Links between corner nodes through the centre node, while C2-Mesh adds a single link between two opposite nodes for each 3 × 3 mesh network [13]. It should be noted that XD-Mesh and C2-Mesh offer simplicity and low cost, but have a lower performance in comparison to D-Mesh [16].

3 CBP-Mesh

This section describes the motivation behind the proposed CBP-Mesh design and details the placement of the CBP-Links and their features.
3.1 Motivation

To increase the performance of the mesh network, the worst-case hop count for a packet travelling from its source to its destination node should be addressed. In a 3 × 3 mesh network, the worst case occurs between the opposite corner nodes Ro0,0 ↔ Ro2,2 and Ro0,2 ↔ Ro2,0 (see Fig. 1a), which are four hops apart. The T-Links of the Torus network cover this distance in two hops by hopping between the corner nodes. By using D-Links, the XD-Mesh and D-Mesh topologies take two hops through the central node to reach the opposite corner node. C2-Mesh adds one extra C-Link to a 3 × 3 mesh network, which reduces the hop count between the two opposite corner nodes Ro0,0 ↔ Ro2,2 in Fig. 1e. However, the other pair of opposite corner nodes, Ro0,2 ↔ Ro2,0 in Fig. 1e, is ignored. The proposed CBP-Mesh design adds two CBP-Links to a mesh network, placed between both pairs of opposite corner nodes, Ro0,0 ↔ Ro2,2 and Ro0,2 ↔ Ro2,0, to connect them directly (see Fig. 1f for details). The CBP-Links thus reduce the two-hop distance of the Torus, XD-Mesh and D-Mesh networks to a single hop. Addressing the worst-case scenario of opposite corners results in higher performance of the CBP-Mesh network.

The proposed CBP-Mesh is a scalable topology whose basic building block is the 3 × 3 CBP-Mesh shown in Fig. 1f. The CBP-Mesh architecture can be extended to an odd (5 × 5) number of nodes, to higher numbers of nodes or to an odd/even (3 × 4) number of nodes, as shown in Figs. 3, 4 and 6c, respectively. In addition to the M-Links, the CBP-Links provide multiple paths, which helps to accommodate more adaptive and dynamic routing algorithms in the proposed network.

Fig. 2 CBP-Mesh router (Ro) with router links for connecting to neighbouring router nodes

Fig. 3 5 × 5 CBP-Mesh network

Fig. 4 Different CBP-Mesh network architectures: (a) 5 × 5 CBP-Mesh, (b) 3 × 7 CBP-Mesh, (c) 3 × 9 CBP-Mesh

Fig. 5 Comparison of proposed and selected topologies: (a) Average network latency, (b) Average network throughput, (c) Total network power, (d) Energy per data transferred packets

Fig. 6 MPEG-4 application implemented on D-Mesh and CBP-Mesh networks: (a) MPEG-4 application task graph with the bandwidth requirements, (b) MPEG-4 cores on 3 × 4 D-Mesh network, (c) MPEG-4 cores on 3 × 4 CBP-Mesh network

3.2 Principle architecture

This section describes the proposed CBP-Mesh architecture and details the placement of the CBP-Links.

3.2.1 Assigning links to the CBP-Mesh network

A CBP-Mesh network is a combination of a classic mesh network with extra CBP-Links (see also Section 3). The extra links decrease the average node distance, leading to a smaller network diameter. The placement of these CBP-Links is an essential step in designing a CBP-Mesh network and is detailed in the following. A CBP-Mesh is defined as follows:

Definition 1. A CBP-Mesh is a two-dimensional network of size m × n, with m, n ≥ 3. It consists of a set of nodes N = {(x, y) | 0 ≤ x ≤ m − 1, 0 ≤ y ≤ n − 1}, where (0,0) is the most north-western node. Each node has its own router Rox,y.

A CBP-Mesh router (see Fig. 2) is defined as follows:

Definition 2. A CBP-Mesh router has a degree of 3, 4, 5 or 8, with possible ports LN, LS, LE and LW enabling connections to neighbour routers on the north (rN), south (rS), east (rE) and west (rW) using links lN, lS, lE and lW. Further, a CBP-Mesh router can have ports LNE, LNW, LSE and LSW for connecting the CBP-Links lNE, lNW, lSE and lSW (see green arrow lines in Fig. 2). The east neighbour rx,yE and west neighbour rx,yW of router Rox,y of node (x, y) are the routers of nodes (x, y + 1) and (x, y − 1). Similarly, the north neighbour rx,yN and south neighbour rx,yS of Rox,y are the routers of nodes (x − 1, y) and (x + 1, y).
The insertion of CBP-Links is defined as follows:

Definition 3. The router Rox,y of node (x, y), where x and y are even numbers, is connected to the routers of nodes (x − 2, y − 2), (x − 2, y + 2), (x + 2, y − 2) and (x + 2, y + 2) by CBP-Links using the ports LNW, LNE, LSW and LSE of Rox,y.

Fig. 3 depicts an exemplary 5 × 5 CBP-Mesh with origin point (0,0). Here, router Ro2,2 is connected to its direct neighbours Ro1,2, Ro3,2, Ro2,3 and Ro2,1 via links at its ports LN, LS, LE and LW. Further, Ro2,2 is connected via CBP-Links at the ports LNW, LNE, LSW and LSE with the routers Ro0,0, Ro0,4, Ro4,0 and Ro4,4. The related assignment algorithm for all links of a CBP-Mesh can be found in the Appendix. The number of M-Links in the CBP-Mesh network and the total numbers of links, Nlinks,CBP_odd and Nlinks,CBP_even, of odd and even CBP-Meshes are given by equations (1a) and (1b).

3.3 CBP-Mesh network features

The CBP-Links of the proposed CBP-Mesh add the following benefits over its competitor topologies. Fig. 4 illustrates how the CBP-Mesh network reduces the network diameter and the distance between nodes. In Fig. 4, routers in three different colours (blue, green and red) are shown as hexagonal boxes interlinked over the network. The blue routers have one, two or four CBP-Links (see green lines) in addition to M-Links (see black lines). A green router is one hop, and a red router two hops, away from the nearest router node with CBP-Links. The CBP-Links connect distant nodes by passing over the in-between nodes of the network, like a flyover on a road. The CBP-Links connect the corner node routers Ro0,0, Ro0,4, Ro4,0 and Ro4,4 to the central node router Ro2,2 in one hop. The middle terminal nodes Ro0,2, Ro4,2 ↔ Ro2,0, Ro2,4 also take one hop to connect (see blue arrow lines in Fig. 4). Similarly, the corner nodes Ro0,0, Ro0,4, Ro4,0 and Ro4,4 take two hops via the centre node Ro2,2 to reach the opposite corner in the diagonal direction.
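Definitions 1 to 3 can be sketched in a few lines of Python. The block below is only an illustrative sketch under stated assumptions, not the authors' link assignment algorithm from the Appendix: the function names (build_cbp_mesh, count_links, port_histogram, hops) are hypothetical, CBP-Links are placed by reading Definition 3 literally, and the port count assumes one additional local port per router towards its PE. That last assumption was chosen because it reproduces the router-type counts that Table 3 reports for the 5 × 5 CBP-Mesh.

```python
from collections import Counter, deque


def build_cbp_mesh(m, n):
    """Adjacency sets of an m x n CBP-Mesh, following Definitions 1 to 3."""
    if m < 3 or n < 3:
        raise ValueError("Definition 1 requires m, n >= 3")
    adj = {(x, y): set() for x in range(m) for y in range(n)}

    def connect(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for (x, y) in list(adj):
        # M-Links to the south and east neighbours; links are bidirectional,
        # so the north and west directions are covered by the other endpoint.
        if x + 1 < m:
            connect((x, y), (x + 1, y))
        if y + 1 < n:
            connect((x, y), (x, y + 1))
        # CBP-Links (Definition 3): only routers with even x and even y,
        # two rows further south and two columns to the west or east.
        if x % 2 == 0 and y % 2 == 0 and x + 2 < m:
            for ty in (y - 2, y + 2):
                if 0 <= ty < n:
                    connect((x, y), (x + 2, ty))
    return adj


def count_links(adj):
    """Each bidirectional link appears in two adjacency sets."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2


def port_histogram(adj):
    """Routers grouped by port count; the +1 local PE port is an assumption."""
    return Counter(len(neighbours) + 1 for neighbours in adj.values())


def hops(adj, src, dst):
    """Minimum hop count between two routers (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("destination not reachable")


if __name__ == "__main__":
    net = build_cbp_mesh(5, 5)
    print(count_links(net))                      # 48
    print(sorted(port_histogram(net).items()))   # [(4, 12), (5, 8), (6, 4), (9, 1)]
    print(hops(net, (0, 0), (4, 4)))             # 2
```

For a 5 × 5 network this sketch yields 48 links (40 M-Links plus 8 CBP-Links) and a two-hop path between opposite corners, in line with Table 3 and the hop counts described in this section.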
The middle terminal nodes Ro0,2 ↔ Ro4,2 (via Ro2,0 or Ro2,4) and Ro2,0 ↔ Ro2,4 (via Ro0,2 or Ro4,2) reach each other through a single intermediate node, i.e. in two hops. The adjacent green router nodes take one more hop, and the red router nodes two more hops, over M-Links from the router nodes above. A further advantage is the connection of CBP-Links to the central and terminal nodes of the network (see Fig. 4), which improves traffic flow and reduces the hop count. The CBP-Links in the proposed CBP-Mesh effectively reduce the worst-case hop count between nodes Ro0,0 ↔ Ro4,4 or Ro0,4 ↔ Ro4,0 from eight in a 5 × 5 mesh network to two hops.

The grey areas in Fig. 4a indicate three types of network diameter for the m × n CBP-Mesh, namely the diagonal diameter, the end-to-end diameter and the middle diameter. For a symmetric CBP-Mesh of dimension n × n, these parameters are estimated by equations (2), (3), (4a) and (4b).

The grey areas in Fig. 4b indicate the path between nodes Ro1,0 and Ro1,6 in a 3 × 7 network, which would have a hop count of six in a mesh network. In contrast, in the proposed CBP-Mesh the hop count reduces to five (see double arrow lines in Fig. 4b). For networks with a larger number of nodes, the gain due to CBP-Links increases considerably. For example, in the 3 × 9 network depicted in Fig. 4c, the hop count between the extreme nodes Ro1,0 and Ro1,8 reduces from 8 for a common mesh to 6 for the proposed CBP-Mesh. Because CBP-Links provide alternative paths between two nodes, the tolerance of the network against failing links and routers increases. Consequently, the proposed CBP-Mesh is more robust than the classic mesh and the C2-Mesh.

As the proposed CBP-Mesh scales up, the CBP-Links become more effective in reducing the distance between nodes in the network. The 3 × 9 CBP-Mesh is shown in Fig. 4c. For Ro0,0 ↔ Ro0,4 and Ro2,0 ↔ Ro2,4 (see blue dotted arrow in Fig. 4), the hop count reduces to two as compared with four in mesh, Torus, XD-Mesh and D-Mesh networks. Similarly, Ro0,0 ↔ Ro0,6 and Ro2,0 ↔ Ro2,6 take three hops by using the CBP-Links, and adjacent green router nodes take one more hop to route packets to their destinations.

4 Topology characteristics

This section presents the characteristics of NoC topologies and compares the proposed CBP-Mesh with other selected topologies.

4.1 Characteristics of a CBP-Mesh

An NoC topology can be characterised by its network diameter, bisection width, number of links, degree of routers and the existence of path diversity [17]-[20]. Table 1 compares the general characteristics of the analysed NoC topologies, where symmetric meshes are assumed (size n × n).

Table 1. General characteristics of different symmetric mesh network topologies (size n × n)

Characteristics | Mesh | Torus | D-Mesh | XD-Mesh | C2-Mesh | CBP-Mesh
number of nodes | n² | n² | n² | n² | n² | n²
diameter | 2n − 2 | n − 1 | n − 1 | n − 1 | n − 1 | n − 1
bisection width | n | 2n | 3n − 2 | n + 2 | n + 2 | 2(n + 1)
number of links | 2n² − 2n | 2n² | 4n² − 6n + 2 | 2n² − 2n + 8 | 2n² − 2n + 4 | 2n² − 2n + 2
router degree | 3 to 5 | 5 | 4 to 9 | 4 to 9 | 4 to 9 | 4 to 9
path diversity | yes | yes | yes | yes | yes | yes

The network diameter is defined as the maximum shortest path between all terminal node pairs of the NoC [18]. For example, the network diameter of an n × n mesh is 2n − 2. Reducing the network diameter minimises the hop count between nodes and thus decreases the overall latency. It follows from Table 1 that mesh has the longest diameter, while CBP-Mesh, like the other extended topologies, offers the shortest one.

The bisection width is the number of links that need to be removed in order to separate a network into two equal parts [13]. For example, the bisection width of an n × n mesh is n [14]. The bisection bandwidth, which is the bandwidth available between both parts, results from the product of bisection width and link bandwidth.
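As a worked illustration of these two definitions, the following Python sketch computes the diameter of an n × n mesh by breadth-first search, so that it can be compared with the closed form 2n − 2, and evaluates the bisection bandwidth as bisection width times link bandwidth. The function names are hypothetical, the bisection widths are simply the expressions listed in Table 1, and the link bandwidth is an assumption: one 32-bit flit per cycle per link at 1 GHz, taken from the simulation parameters in Table 4.

```python
from collections import deque


def mesh_adjacency(n):
    """Adjacency lists of a plain n x n mesh (M-Links only)."""
    adj = {(x, y): [] for x in range(n) for y in range(n)}
    for (x, y) in adj:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (x + dx, y + dy) in adj:
                adj[(x, y)].append((x + dx, y + dy))
    return adj


def diameter(adj):
    """Maximum shortest path in hops over all node pairs (BFS from every node)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


# Bisection widths exactly as listed in Table 1 (symmetric n x n networks).
BISECTION_WIDTH = {
    "Mesh": lambda n: n,
    "Torus": lambda n: 2 * n,
    "D-Mesh": lambda n: 3 * n - 2,
    "XD-Mesh": lambda n: n + 2,
    "C2-Mesh": lambda n: n + 2,
    "CBP-Mesh": lambda n: 2 * (n + 1),
}


def bisection_bandwidth_gbps(topology, n, link_bits=32, clock_ghz=1.0):
    """Bisection width x link bandwidth; assumes one flit per cycle per link."""
    return BISECTION_WIDTH[topology](n) * link_bits * clock_ghz


if __name__ == "__main__":
    print(diameter(mesh_adjacency(5)))                # 8, i.e. 2n - 2 for n = 5
    print(bisection_bandwidth_gbps("Mesh", 5))        # 160.0
    print(bisection_bandwidth_gbps("CBP-Mesh", 5))    # 384.0
```

Under these assumptions, a 5 × 5 mesh has a diameter of eight hops and a bisection bandwidth of 160 Gbit/s, while the Table 1 expression for CBP-Mesh gives 384 Gbit/s for the same size.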
Adding links to the network increases the bisection width due to the larger number of paths between the two sub-networks. Consequently, the throughput is higher and the traffic flow in the network improves. The topology with the smallest bisection bandwidth is mesh, followed by XD-Mesh and C2-Mesh.

A link is the connection between two routing elements, i.e. routers, in an NoC. For example, the number of links Nlinks of an n × n mesh is 2n² − 2n. Mesh has the lowest number of links, followed by C2-Mesh, XD-Mesh and the proposed CBP-Mesh. The highest number of links is applied in D-Mesh.

The degree of a router is the number of links that can be connected to the router. A higher degree increases the complexity of the router and its costs in terms of power consumption. The comparison of required router types reveals that the mesh and Torus topologies apply only medium degree routers, while the other topologies use very complex routers with a degree of up to 9.

Path diversity refers to the number of available paths between two nodes in an NoC. It should be noted that higher path diversity increases the fault tolerance of the network.

Table 2 compares the types of links each analysed topology applies (see also Fig. 2), where again symmetric meshes are assumed (size n × n). C- and CBP-Links are treated as the same class owing to their similar shape; likewise, XD- and D-Links are treated as the same class. As all topologies are mesh-like topologies, the number of M-Links is the same for all. A further analysis reveals that Torus and C2-Mesh require only low numbers of additional links, while the number of D-Links for D-Mesh increases quadratically with n. This tendency is the same for the number of CBP-Links in the proposed CBP-Mesh, however, with a four or nine times lower rising factor depending on the type of mesh.

Table 2. Types of links used by analysed symmetric mesh topologies

Topology | M-Links | T-Links | D-Links | CBP-Links
mesh | 2n² − 2n | - | - | -
Torus | 2n² − 2n | 2n | - | -
D-Mesh | 2n² − 2n | - | 2(n − 1)² | -
XD-Mesh | 2n² − 2n | - | 2(n − 1) | -
C2-Mesh | 2n² − 2n | - | - | (n − 1)
CBP-Mesh (odd) | 2n² − 2n | - | - |
CBP-Mesh (even) | 2n² − 2n | - | - |

Table 3 details the number of routers and links required for a 5 × 5 NoC realised in the analysed topologies. The data indicate that D-Mesh requires a considerable number of 6- and/or 9-degree routers and possesses a very high number of links. In contrast, mesh and Torus apply solely routers with up to five ports and a low number of links. Finally, XD-Mesh, C2-Mesh and CBP-Mesh have balanced requirements of routers and links.

Table 3. Number of links and routers with different degrees for the selected 5 × 5 topologies

Topology | 3-Port | 4-Port | 5-Port | 6-Port | 7-Port | 9-Port | Nlinks
mesh | 4 | 12 | 9 | 0 | 0 | 0 | 40
Torus | 0 | 0 | 25 | 0 | 0 | 0 | 50
D-Mesh | 0 | 4 | 0 | 12 | 0 | 9 | 62
XD-Mesh | 0 | 16 | 4 | 0 | 4 | 1 | 48
C2-Mesh | 0 | 16 | 8 | 0 | 0 | 1 | 44
CBP-Mesh | 0 | 12 | 8 | 4 | 0 | 1 | 48

In summary, the advantages gained in the proposed CBP-Mesh network include:
Reduction in network diameter.
Reduction in number of hops between nodes.
Increase in bisection bandwidth of the network.
Availability of multiple paths to the network's centre node.
Additional fault tolerance of the network.
Scalability.

5 Simulation results

To compare the proposed CBP-Mesh with existing approaches, different network topologies are implemented and analysed using the NoCTweak [21] simulator. The results are presented in this section.

5.1 Simulation environment

The NoC topologies mesh, Torus, XD-Mesh, C2-Mesh, D-Mesh and CBP-Mesh are implemented in the NoCTweak simulator, which is an open source, cycle-accurate tool written in SystemC [21]. NoCTweak is selected for the implementation and simulation of all networks due to the availability of large sets of workloads of different synthetic traffic and real embedded system application traces.
The existing source routing algorithm is used to compute the shortest path, and the NMAP algorithm is used to map embedded applications onto the processing cores of the network [21]. The hotspot, synthetic and real embedded traffic traces are applied to the proposed CBP-Mesh and its competitor topologies for comparison. The other simulator configurations used in the simulations are given in Table 4.

Table 4. Experiments performed in NoCTweak with the following parameters

Parameter | Value
technology | 65 nm
operating voltage | 1.0 V
clock frequency | 1 GHz
input buffer size | 16 flits
number of virtual channels | 8
packet length (flit units) | 150
flit length | 32 bits
router | 3-stage pipeline
each simulation runs | 100,000 cycles
warm-up cycle time | 20,000 cycles
links length | 1000 μm
flit injection rate | 0.02 flits/cycle/node

5.2 Scalability analysis

The synthetic traffic traces of the hotspot workload are applied to the proposed CBP-Mesh and its competitor networks. The analysis focuses on the scalability of the network topologies in terms of latency, throughput, power and energy. Therefore, each network topology has been implemented for different symmetric mesh sizes from 3 × 3 to 9 × 9, and the results are depicted in Figs. 5a–d.

As expected, the mesh topology scales badly. Its latency increases more and more for complex networks, while the throughput drops significantly (see Figs. 5a and b). However, the mesh network incurs low costs in terms of total network power and energy due to its simple network design, as shown in Figs. 5c and d. Similar observations can be made for Torus. The Torus topology shows an average increase of latency and a moderate reduction in throughput for rising network complexity. However, larger networks require long interconnections, resulting in designs that are not regular. Consequently, router design and routing strategies have to be more complex, resulting in lower scalability.
In contrast, XD-Mesh and C2-Mesh offer good performance parameters at a reasonable increase in power and energy. The XD-Mesh and C2-Mesh networks possess low complexity and low cost. Their performance is lower than that of D-Mesh, though. The results indicate good performance parameters in latency and throughput for D-Mesh. However, its extensive use of links and high degree routers leads to high costs in terms of power and energy. In contrast, the proposed CBP-Mesh scales with very good performance and at considerably lower penalties than D-Mesh (see Figs. 5a–d).

As can be seen, the proposed 7 × 7 CBP-Mesh has the lowest average latency and good throughput, which is only outperformed by the costly D-Mesh and the poorly scalable Torus. Compared with the classic mesh, latency and throughput improve by 36.9 and 38.9%, while power consumption and energy increase by 13.5 and 13.6%. In comparison to D-Mesh, latency and throughput of the CBP-Mesh are 13.0 and 33.3% lower at 23.7 and 18.2% lower costs in terms of power and energy. In the case of the 9 × 9 network topologies, CBP-Mesh has the lowest average latency, which is 21.8 and 45.7% lower in comparison to D-Mesh and the standard mesh, respectively. In contrast, the 11 × 11 CBP-Mesh has 34.4% lower power consumption and 35.5% lower energy in comparison to its D-Mesh counterpart.

5.3 Performance for embedded application

Besides the synthetic traffic, the NoCTweak simulator provides several real-time embedded application traces. The NMAP algorithm is selected to map all the tasks of an embedded application's task graph to the tiles of the NoC. The real-world embedded applications selected for analysis and comparison of the topologies are given in Table 5.

Table 5. Some embedded applications with required number of cores

Embedded application | Required cores
MPEG4 decoder | Mpeg4 with 12 cores
Wifirx baseband receiver | WiFi with 25 cores
video object plane and decoder | Vopd with 16 cores
video conference encoder | Vce with 25 cores
multimedia system | mms with 25 cores

The complete task graph of one of the chosen applications, the MPEG-4 decoder with 12 cores, showing the bandwidth requirements and the information flow among the different tasks, is depicted in Fig. 6a. The NMAP algorithm is applied to map the tasks of the embedded applications onto the tiles of the networks. The placement of cores for the D-Mesh and CBP-Mesh networks is shown in Figs. 6b and c. The SDRAM (C9) has a higher traffic load than the other cores in the MPEG-4 application and therefore requires more links to connect to other cores in the network (see Fig. 6a). C9 plays a vital role in the overall performance of the network. Its role is therefore illustrated for the two competitors D-Mesh and CBP-Mesh.

D-Mesh provides C9 with eight direct links to other PEs in the network. C9 is directly connected to C0, C1, C2, C4, C5, C8, C6 and C10 in D-Mesh, which reduces the hop count in the network. However, the traffic from C2 → C8 has to pass through C9 in order to take the shortest path, and therefore packets have to be buffered at C9. The traffic from C3 → C8 may also traverse the path through the C9 and C5 routers. Similarly, packets from C4 → C7 may pass through C9. As a result, C9 becomes a bottleneck in the network due to its heavy traffic load. D-Mesh has the highest number of links in the network, which greatly increases the complexity of the routers and the cost of the network.

The proposed CBP-Links in CBP-Mesh provide direct connectivity between the node pairs C0 → C9 and C2 → C8. The C9 node connects directly with C0, C4, C6 and C7.
The traffic from C3 → C8 takes only one hop through C2, using an M-Link and a CBP-Link, as compared with three hops in the D-Mesh network. C5 and C10 both require only one hop to their destination node C9. CBP-Mesh distributes the traffic evenly in the network and therefore does not create a bottleneck at C9 as in the D-Mesh network. The individual latency of C9 in both D-Mesh and CBP-Mesh is compared using NoCTweak. The CBP-Mesh V9 and V10 show 37 and 75% lower latencies than the D-Mesh C9 node.

The average network latency, throughput, total power and energy of the networks are analysed by applying the traffic traces of five different real-world embedded applications to six different NoC networks, including the proposed network. The results for these four parameters are shown in Figs. 7a–d.

Fig. 7 Comparison of different applications for proposed and selected topologies: (a) Average network latency, (b) Average network throughput, (c) Total network power, (d) Energy per data transferred packets

Simulation results indicate that, for the MPEG-4 application, CBP-Mesh improves latency and throughput by up to 8.9 and 15.7% over its predecessor C2-Mesh, while costs in terms of power and energy increase by up to 3.6 and 8.3%. Compared with the more complex D-Mesh NoC topology, latency is reduced by 7.6%, while the throughput is up to 15.7% lower. However, CBP-Mesh outperforms the D-Mesh topology in terms of power and energy penalty, which are 22 and 31% lower.

Similarly, the proposed CBP-Mesh has been implemented in an NoC with the workload traces of five real embedded system applications and compared with the other selected topologies. Simulation results indicate that CBP-Mesh improves latency relative to all other topologies. Across all applications, the throughput of CBP-Mesh is higher than that of the other topologies except D-Mesh, which has a higher number of links than CBP-Mesh.
However, CBP-Mesh outperforms D-Mesh in terms of power and energy penalty, which are lower in all the applications (see Figs. 7a–d).

6 Conclusion

NoC is a promising paradigm to enable fast and reliable on-chip communication in large-scale multiprocessor systems. The performance of an NoC is driven by several parameters, including the network topology. The impact of the topology increases further with increasing system complexity. The principal difference between NoC topologies is the type and the application of the interconnections, which have been classified into four basic links in this study. Additionally, this work proposes a new NoC topology termed CBP-Mesh that improves the standard mesh by adding new CBP-Links to the network. This modification reduces the network diameter, minimises the average number of hops and increases the bisection width of the NoC. The proposed CBP-Mesh and five other topologies proposed in previous works are implemented using NoCTweak. Synthetic workloads as well as five different real embedded system workloads are applied to analyse the average network latency, throughput, total power and energy of all the networks. Simulation results indicate that CBP-Mesh efficiently reduces the distance between nodes as compared with the other selected topologies. The proposed network also improves the average network latency and throughput at a lower cost in power and energy than its predecessors, i.e. mesh, Torus, C2-Mesh and XD-Mesh. Compared with its competitor, the D-Mesh topology, CBP-Mesh exhibits lower latency as well as lower throughput. However, CBP-Mesh outperforms the D-Mesh network in terms of power and energy penalty. Furthermore, the analytical results also indicate good scalability of the proposed network with increasing network complexity. In short, the proposed NoC topology can be used for communication among cores with lower latency and greater throughput at a reasonable penalty in terms of power and energy consumption.
7 Acknowledgment

The authors are thankful to COMSATS Institute of Information Technology for providing the platform to carry out this research work. This work was partially supported by the Brazilian agencies FAPEMIG and CNPq.

8 References

[1] Zarandi, H.R.: ‘A fault-tolerant core mapping technique in networks-on-chip’, 2013, 7, (August), pp. 238–245
[2] Sehgal, V.K., Chauhan, D.S.: ‘State observer controller design for packets flow control in networks-on-chip’, J. Supercomput., 2010, 54, (3), pp. 298–329
[3] Khawaja, S.G., Mushtaq, M.H., Khan, S.A., et al.: ‘Designing area optimized application-specific network-on-chip architectures while providing hard QoS guarantees’, PLoS One, 2015, 10, (4), pp. 1–17
[4] Pomante, L.: ‘HW/SW co-design of dedicated heterogeneous parallel systems: an extended design space exploration approach’, IET Comput. Digit. Tech., 2013, 7, (6), pp. 246–254
[5] Morgan, A.A., El-Kharashi, M.W., Elmiligi, H., et al.: ‘Unified multi-objective mapping and architecture customisation of networks-on-chip’, IET Comput. Digit. Tech., 2013, 7, (6), pp. 282–293
[6] Ju, X., Yang, L.: ‘Performance analysis and comparison of 2 × 4 network on chip topology’, Microprocess. Microsyst., 2012, 36, (6), pp. 505–509
[7] Catania, V., Mineo, A., Monteleone, S., et al.: ‘Energy efficient transceiver in wireless network on chip architectures’. Proc. DATE ’16, 2016, pp. 1321–1326
[8] Balfour, J., Dally, W.J.: ‘Design tradeoffs for tiled CMP on-chip networks’. Proc. 20th Annual Int. Conf. Supercomputing ICS 06, 2006, vol. 28, no. 1, p. 187
[9] Kim, J., Balfour, J., Dally, W.J.: ‘Flattened butterfly topology for on-chip networks’
[10] Anjum, S., Chen, J., Yue, P., et al.: ‘Delay optimized architecture for on-chip communication’, 2009, 7, (2), pp. 104–109
[11] Ya-gang, W., Hui-min, D.U., Xu-bang, S.: ‘Topological properties and routing algorithm for semi-diagonal torus networks’, 2011, 18, (October), pp. 64–70
[12] Hu, W., Lee, S.E., Bagherzadeh, N.: ‘DMesh: a diagonally-linked mesh network-on-chip architecture’
[13] Arora, L.K.: ‘C 2 Mesh’, 2012, pp. 282–286
[14] Gulzari, U.A., Anjum, S., Agha, S.: ‘Cross by pass-mesh architecture for on-chip communication’. Proc. IEEE 9th Int. Symp. Embedded Multicore/Manycore SoCs, MCSoC 2015, 2015, pp. 267–274
[15] Ouyang, Y., Zhu, B., Liang, H., et al.: ‘Networks on chip based on diagonal interlinked mesh topology structure’, Comput. Eng., 2009
[16] Swaminathan, K., Lakshminarayanan, G., Ko, S.: ‘A novel hybrid topology for network on chip’, 2014, pp. 1–6
[17] Via, O., Insertion, L.L.: ‘“It’s a small world after all”: NoC performance’, 2006, 14, (7), pp. 693–706
[18] Sanju, V., Chiplunkar, N., Khalid, M., et al.: ‘A performance study of 2D mesh & torus for network on chip based system’, 2013, pp. 0–4
[19] Elmiligi, H., Morgan, A., El-Kharashi, M.: ‘Power optimization for application-specific networks-on-chips: a topology-based approach’, Microprocessors, 2009
[20] Grecu, C., Ivanov, A., Pande, P., et al.: ‘Towards open network-on-chip benchmarks’. Proc. Int. Symp. Networks-on-Chips, NOCS, 2007
[21] Tran, A., Baas, B.: ‘NoCTweak: a highly parameterizable simulator for early exploration of performance and energy of networks on-chip’, 2012

9 Appendix

Link assignment algorithm for a CBP-Mesh network of size m × n. The current router node is ro(x,y), and the connecting links to neighbouring routers are M-Links and CBP-Links (see Fig. 8).

Fig. 8 Link assignment algorithm for a CBP-Mesh network
