Article | Open access | Peer reviewed

High‐performance user plane function (UPF) for the next generation core networks

2020; Volume: 9; Issue: 6; Language: English

10.1049/iet-net.2020.0033

ISSN

2047-4962

Authors

Whai-En Chen, Chia Hung Liu

Topic(s)

Advanced Optical Network Technologies

Abstract

IET Networks, Volume 9, Issue 6, pp. 284-289. Special Issue: Intelligent Computing: a Promising Network Computing Paradigm.

High-performance user plane function (UPF) for the next generation core networks

Whai-En Chen (corresponding author, wechen@niu.edu.tw, orcid.org/0000-0003-0140-160X) and Chia Hung Liu (orcid.org/0000-0002-0771-049X), Department of CSIE, National Ilan University, Yilan, Taiwan

First published: 03 November 2020, https://doi.org/10.1049/iet-net.2020.0033

Abstract

Compared with previous generations of mobile communications, the fifth generation (5G) provides three different types of services: enhanced mobile broadband, massive machine type communications, and ultra-reliable and low latency communications. To fulfil the various requirements of these services, 5G defines new technologies and architectures such as the next-generation core network (NGC) and the new radio of the access network (5G-AN). The user plane function (UPF), which is an essential component of the 5G NGC, connects the 5G-AN and packet data networks (e.g. the internet). Typically, the UPF provides tunnelling, internet protocol (IP)/port translation, and forwarding functions. To provide flexible and scalable deployment of the 5G NGC, this study virtualises the UPF by using the Docker container. However, the virtualisation architecture degrades the performance of the UPF, and the performance of the UPF dominates the performance (e.g. throughput) of 5G user-plane transmission. To provide high-performance packet processing, this study utilises the Intel Data Plane Development Kit to develop the UPF, reduces the number of memory copies in header processing, and investigates CPU core allocation methods for the UPF deployment. Based on the results of this study, the proposed UPF can provide the UPF functions and process packets at up to 40 Gbps on an x86-based platform.

1 Introduction

The fifth generation (5G) mobile communication [1, 2] provides three major types of services: enhanced mobile broadband, massive machine type communications, and ultra-reliable and low latency communications. The three service types require different quality of service (QoS). To fulfil the various requirements of these services, 5G defines new technologies and architectures such as the next-generation core network (NGC) and the new radio (NR) of the access network (5G-AN) [3]. 5G NR provides high-bandwidth and low-latency radio access.
The 5G NGC achieves a flexible and efficient core-network platform by adopting software defined networking [4], network functions virtualisation (NFV) [5], and network slicing [6] technologies. Fig. 1 illustrates the 5G NGC architecture. To achieve network slicing, the control plane functions (CPFs) are further split into the access and mobility management function (AMF), session management function (SMF), policy control function (PCF), application function (AF), network slice selection function (NSSF), authentication server function (AUSF), and unified data management (UDM). The AMF terminates and handles the non-access-stratum (NAS) signalling, such as NAS ciphering and integrity protection, registration management, connection management, mobility management, access authentication and authorisation, and security context management. The SMF manages the sessions, allocates the user equipment's (UE's) internet protocol (IP) address, manages the traffic for the user plane function (UPF), and triggers paging to notify the UE of downlink (DL) data. The PCF provides the policy rules to the CPFs and provides the subscription information for policy decisions. The AF interacts with the policy framework for policy control and handles the application's influence on traffic routing. The NSSF selects the network slice instances to serve the UE and determines the allowed network slice selection assistance information and the AMF set to be used for the UE. The AUSF acts as an authentication server, and the UDM generates the authentication and key agreement credentials, handles the user identification, provides the access authorisation, and manages the subscription. The UPF connects the 5G-AN and the packet data network (PDN), such as the internet. Besides tunnelling and forwarding the packets, the UPF temporarily buffers the DL packets and waits for the paging process. Thus, the UPF is the key component of user data transmission in the 5G NGC.

Fig. 1: 5G NGC architecture

The efficiency of handling signalling and user-data packets can be improved by separating the core-network functions into control-plane and user-plane functions [7]. The deployment of the control-plane and user-plane functions is presented and evaluated in [8]. To fulfil the requirements of these services, Jose et al. [6] introduced the concept of network slicing, which further separates the control-plane functions. In the user plane, all packets are processed by the UPF. Sebastian and Manzoor [9] proposed that the UPFs balance the processing load by predicting the network traffic transmitted to the UPFs. NFV is used to construct the 5G NGC to support various network functions [10, 11]. The authors in [12, 13] discussed resource allocation methods for virtual machines to fit the QoS requirements of 5G services. However, the NFV overhead increases the processing latency. Kernel-based virtual machine (KVM) [14] and the Docker container [15] are two useful virtualisation technologies on Linux for NFV deployment. KVM is an open-source virtualisation technology built into Linux. The Docker container is a lightweight virtualisation technology that uses operating-system-level virtualisation. Docker containers bundle specific applications and are isolated from each other. The KVM virtual machine contains a guest operating system, which produces more virtualisation overhead and consumes more computing and memory resources.
The authors in [16, 17] point out that the performance of the container is better than that of KVM. Thus, in this paper, we adopt a container to build the virtualisation platform. Specifically, the UPF is virtualised and deployed in a Docker container to minimise the overhead. Although the container provides an efficient virtualisation platform, its virtualisation overhead still degrades the processing speed, since the service is layered on top of the operating system and the container. To solve this issue, we utilise the Intel Data Plane Development Kit (DPDK) [18] to develop the UPF. The DPDK consists of libraries used to accelerate packet processing. Whai-En and Chia-Hung [19] compared the performance of socket-based and DPDK-based media gateways; the performance of applications using the DPDK library is much higher than that of socket-based implementations. Whai-En [20] used DPDK to forward packets and tested the packet forwarding throughput of 8-core and 4-core CPUs. In this paper, we use the DPDK library to design a high-performance UPF. To achieve high performance, the UPF should process packets as fast as possible, and the computing power allocation (e.g. CPU core assignment) for the virtualised UPF should be carefully considered.

The rest of the paper is organised as follows. Section 2 elaborates on the hardware architecture and introduces the testing tool. Section 3 presents the DPDK-based UPF design. Section 4 investigates the performance of the proposed UPF with seven core-allocation methods. The conclusions are given in the last section.

2 Hardware architecture and testing tool

In this paper, we use two identical computers as the testing machine and the system under test (SUT). Fig. 2 illustrates the hardware structure of the computers.

Fig. 2: Architecture of the test machines

Each computer contains two central processing units (CPUs; i.e. CPU0 (a) and CPU1 (b)), a random-access memory (RAM; see (d) and (g)) for each CPU, and a network interface card (NIC (c)). The model of the CPU is 'Intel Xeon (R) CPU E5-2620 v4', and the model of the NIC is 'Intel X710-T4' [21]. The NIC contains four full-duplex 10 Gigabit Ethernet ports. The CPUs connect to their RAM through the double data rate channels (see (e) and (f)). CPU0 is connected to CPU1 through two quick path interconnect (QPI) interfaces. Either CPU0 or CPU1 can access the NIC, which is connected to CPU0. In this case, CPU0 accesses the NIC through the peripheral component interconnect express (PCIe) interface, and CPU1 accesses the NIC through the QPI and PCIe interfaces. Note that the performance of the UPF deployed on CPU1 is lower than that of the UPF deployed on CPU0. The detailed performance is evaluated and discussed in Section 4.

This paper adopts the packet generator (Pktgen) [22] as the testing tool to evaluate the throughput of the proposed DPDK-based UPF. Pktgen is a software-based traffic generator developed on top of the DPDK framework. Before performing tests, we should obtain the limitation (i.e. the maximum throughput) of the Pktgen, so a self-test is executed to investigate its maximum throughput. The system architecture of the self-test is shown in Fig. 3.

Fig. 3: System architecture of the test machine

This system architecture includes (a) the Pktgen, (b) an igb_uio driver, and (c) a NIC. The Pktgen runs in the user space without virtualisation.
The igb_uio driver is a DPDK driver embedded in the kernel space of Linux (Ubuntu 16.04). The NIC contains four ports (i.e. Ports A, B, C, and D). The Pktgen can generate and receive the testing packets simultaneously. To obtain the maximum throughput, Port A connects to Port B, and Port C connects to Port D. The NIC ports are connected through augmented Category 6 (Cat.6A) Ethernet cables.

The test machine contains two Intel E5-2620 CPUs, and each CPU has eight physical cores (i.e. cores 0-7). In the experiments, this paper enables the hyper-threading (HT) technology. In the operating system, the command 'cat /proc/cpuinfo' is used to display the identifiers of the 16 logical cores (lcores) in each CPU. The identifiers assigned by the system are presented in Fig. 4.

Fig. 4: CPU core deployment

Fig. 4 illustrates the lcore allocation for CPU0 and CPU1. At CPU0, the lcore identifiers 2i and 2i + M are assigned to Core i, where M is the maximum logical core number (i.e. M = 16). At CPU1, the lcore identifiers 2i + 1 and 2i + M + 1 are assigned to Core i. Since the receiving procedure includes performance calculation and is more complex than the transmission procedure, it requires more computing power (i.e. more cores). In the self-test, the NIC is connected to CPU0, and we assign the lcores on CPU0 to the Pktgen to achieve the maximum performance. For example, in this test, lcore 0 is assigned to the operating system, lcore 2 is assigned to control the data statistics of the Pktgen, lcores 4, 6, 8, 10, 20, 22, 24, and 26 are allocated to receive the testing packets, and lcores 12, 14, 28, and 30 are allocated to generate the testing packets.

The packet sizes used in the Pktgen self-test range from 64 to 1518 bytes, and the tunnelling headers are also considered. Fig. 5 illustrates the user-plane protocol stacks in the 3GPP NGC [23] between the UE and the PDN. Based on the protocol stacks, we obtain the packet (or frame) format. The UPF connects to the 5G-AN through the N3 interface and to the PDN through the N6 interface. The general packet radio service tunnelling protocol (GTP) [24] tunnels are built between the 5G-AN and the UPF at N3, and there is no tunnel between the UPF and the PDN. Therefore, the UPF adds the GTP tunnel header when the packet is transmitted from the UPF to the UE, and removes the GTP tunnel header when the packet is transmitted from the UPF to the PDN. The GTP tunnel header includes an 8-byte GTP header, an 8-byte user datagram protocol (UDP) header, and a 20-byte IP header, so the total length of the GTP tunnel header is 36 bytes. The L1 header of Ethernet includes a 7-byte preamble, a 1-byte start of frame delimiter, and a 12-byte inter-frame gap, for a total L1 overhead of 20 bytes. For Ethernet, the L2 header is a 14-byte Ethernet header. Considering the tunnelling procedure, the minimum sizes of the testing packets received by the UPF from the N3 and N6 interfaces are 100 and 64 bytes, respectively, and the maximum sizes at N3 and N6 are 1518 and 1482 bytes. The sizes 64, 100, 128, 256, 512, 1024, 1482, and 1518 bytes are adopted for testing. The details of the packet sizes are elaborated in Section 4.

Fig. 5: User-plane protocol stacks between UE and UPF

The maximum packet throughput and maximum throughput percentage of the Pktgen self-test are illustrated in Fig. 6. Note that, in this paper, we test the maximum throughput with zero packet loss.
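To make the header arithmetic above concrete, the following C sketch collects the stated overheads (20-byte L1, 14-byte L2, 36-byte GTP tunnel) and a minimal 8-byte GTP-U header layout. The struct definition is an illustrative assumption based on the GTP specification [24], not code taken from the paper; the printed values simply reproduce the 100-byte minimum and 1518-byte maximum N3 sizes derived in the text.

```c
#include <stdint.h>
#include <stdio.h>

/* Header overheads stated in the text (bytes). */
#define L1_OVERHEAD   20u  /* preamble (7) + SFD (1) + inter-frame gap (12) */
#define L2_HDR_LEN    14u  /* Ethernet header */
#define OUTER_IP_LEN  20u  /* outer IPv4 header of the GTP tunnel */
#define OUTER_UDP_LEN  8u  /* outer UDP header of the GTP tunnel */
#define GTPU_HDR_LEN   8u  /* mandatory GTP-U header */
#define GTP_TUNNEL_LEN (OUTER_IP_LEN + OUTER_UDP_LEN + GTPU_HDR_LEN) /* 36 bytes */

/* Illustrative 8-byte GTP-U header layout (assumption, see TS 29.060 [24]). */
struct gtpu_hdr {
    uint8_t  flags;     /* version, protocol type, E/S/PN bits */
    uint8_t  msg_type;  /* 0xFF = G-PDU (user data) */
    uint16_t length;    /* payload length after this header (network order) */
    uint32_t teid;      /* tunnel endpoint identifier (network order) */
} __attribute__((packed));

int main(void)
{
    /* Ethernet frame sizes used by the tests at the N6 (untunnelled) side. */
    unsigned n6_min = 64, n6_max = 1482;

    /* The same packets seen at N3 carry the 36-byte GTP tunnel header. */
    printf("N3 min = %u bytes\n", n6_min + GTP_TUNNEL_LEN);   /* 100  */
    printf("N3 max = %u bytes\n", n6_max + GTP_TUNNEL_LEN);   /* 1518 */
    printf("GTP tunnel overhead = %u bytes\n", GTP_TUNNEL_LEN);
    return 0;
}
```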
The UPF performs the tunnelling function, which adds or removes the GTP tunnel header from the testing packets. Thus, the packets' length changes after they are processed by the UPF, and the maximum packet throughput increases or decreases due to the tunnelling procedure when the measurement unit is 'bits per second (bps)'. Thus, in this paper, we utilise 'packets per second (pps)' as the unit of the maximum packet throughput.

Fig. 6: Pktgen self-test with different packet sizes

Since the maximum packet throughput (in pps) decreases as the packet size increases, the raw figures are not easy to interpret. To present the output throughput clearly, this paper computes the maximum throughput percentage by using the following equation:

\[ \text{Throughput percentage} = \frac{T_{x} \times (S + H_{L1}) \times B}{R} \times 100\% \quad (1) \]

In this equation, H_{L1} is the L1 overhead (20 bytes), S is the packet size in bytes (e.g. 64, 100, and 128), B is 8 (bits per byte), T_{x} is the maximum packet throughput (in packets per second), and R is the line rate (40 Gbps). The throughput percentages are 63% for 64-byte, 90% for 100-byte, and 97% for 128-byte packet sizes. When the packet size is larger than 256 bytes, the throughput percentage of the Pktgen reaches 100%.

In this section, we introduced the hardware and system architectures of the testing machine and the SUT. Specifically, we elaborated on the testing tool (i.e. Pktgen) and the computing power (i.e. core) assignment for the Pktgen. The Pktgen can only test performance up to its own maximum output. The self-test result shows that the Pktgen can generate 100% throughput traffic when the packet size is more than 256 bytes.

3 DPDK-based UPF design

We utilise the DPDK library to develop the UPF functions, including the translation, tunnelling, and forwarding functions. To achieve high performance in the container, the UPF utilises the DPDK buffer structure and performs as few memory copies as possible. The system architecture of the DPDK-based UPF is illustrated in Fig. 7.

Fig. 7: Architecture of the UPF

The architecture includes (a) the IP lookup table; (b) the translation function; (c) the tunnelling function; (d) the forwarding function; and (e) the DPDK library. The IP lookup table records the mapped IP/port and the destination NIC ports for translation and forwarding; the binary search method is adopted in the table lookup procedure. The translation function searches for the mapped IP/port in the IP lookup table, modifies the header fields, and re-calculates the checksums by using the DPDK APIs rte_ipv4_cksum() and rte_ipv4_udptcp_cksum(). Upon receipt of packets from the PDN (e.g. the internet), the translation function modifies the destination IP/port; on the contrary, when receiving packets from the UE, it modifies the source IP/port. When receiving a tunnelled packet, the tunnelling function removes the outer IP and UDP/transmission control protocol (TCP) headers. When receiving a packet without tunnelling, the tunnelling function translates the IP/port and then adds the GTP tunnel headers. The forwarding function retrieves the incoming packets and sends them to the translation function. Upon receipt of the packets from the tunnelling function, the forwarding function forwards these packets to the output ports through the DPDK application programming interface (API).
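Before walking through the uplink and downlink flows in detail, the following sketch shows how the forwarding function could chain the receive, tunnelling, translation, and transmit steps using the DPDK calls named in the text (rte_eth_rx_burst() and rte_eth_tx_buffer()). The helper names upf_detunnel() and upf_translate(), the burst size, and the single queue per port are illustrative assumptions rather than the paper's actual implementation.

```c
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32  /* illustrative burst length */

/* Hypothetical helpers standing in for the tunnelling and translation
 * functions described in the text. */
void upf_detunnel(struct rte_mbuf *m);   /* remove GTP tunnel header (uplink) */
void upf_translate(struct rte_mbuf *m);  /* IP/port translation + checksum    */

/* Simplified uplink forwarding loop for one N3/N6 port pair. */
void upf_uplink_loop(uint16_t rx_port, uint16_t tx_port,
                     struct rte_eth_dev_tx_buffer *txb)
{
    struct rte_mbuf *pkts[BURST_SIZE];

    for (;;) {
        /* Poll a burst of packets from the RX queue. */
        uint16_t nb = rte_eth_rx_burst(rx_port, 0, pkts, BURST_SIZE);

        for (uint16_t i = 0; i < nb; i++) {
            upf_detunnel(pkts[i]);    /* tunnelling function  */
            upf_translate(pkts[i]);   /* translation function */
            /* Queue the packet for transmission on the N6 side. */
            rte_eth_tx_buffer(tx_port, 0, txb, pkts[i]);
        }
        /* Push out any packets still sitting in the TX buffer. */
        rte_eth_tx_buffer_flush(tx_port, 0, txb);
    }
}
```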
The detailed packet processing flows are elaborated as follows. In the UE's uplink flow, the forwarding function of the UPF receives the packets from the transmit (TX)/receive (RX) queues (Figs. 7f-i) by calling the DPDK API rte_eth_rx_burst(). Upon receipt of the packets, the forwarding function sends the packet pointers to the tunnelling function. The tunnelling function removes the GTP tunnel header, including the outer IP header, UDP header, and GTP header, and then returns the revised packet pointer to the forwarding function. The forwarding function sends the packet pointer to the translation function. The translation function retrieves the source IP/port to search the lookup table, modifies the source IP/port, calls the DPDK API rte_ipv4_cksum() to update the IPv4 checksum, and returns the packet pointer to the forwarding function. Finally, the forwarding function calls the DPDK API rte_eth_tx_buffer() to send the packet to the TX/RX queue.

In the UE's DL flow, the forwarding function of the UPF receives the packets from the TX/RX queue by calling the DPDK API rte_eth_rx_burst(). Upon receipt of the packets, the forwarding function sends the packet pointer to the translation function. The translation function retrieves the source IP/port from the header to search the lookup table, modifies the destination IP/port, calls the DPDK API rte_ipv4_cksum() to re-calculate the IPv4 checksum, and returns the packet pointer to the forwarding function. The forwarding function sends the packet pointer to the tunnelling function. The tunnelling function adds the GTP tunnel header and then returns the revised packet pointer to the forwarding function. Finally, the forwarding function calls the DPDK API rte_eth_tx_buffer() to send the packet to the TX/RX queue.

To improve the performance, the DPDK-based UPF should perform as few memory copies as possible. Among the major functions of the UPF, the translation function modifies the IP and UDP/TCP headers, and the tunnelling function adds or removes the GTP tunnel headers; the payload is not modified. Thus, we focus on the header modifications and leave the payload unchanged. The detailed tunnelling procedure is elaborated as follows. The packets stored in the DPDK buffer are shown in Figs. 8 and 9. There is a headroom in front of the packet and a tailroom at the back of the buffer [25]. In this paper, the headroom and tailroom are set to 128 bytes. We utilise packet pointer movement to replace memory copy operations.

Fig. 8: Packet header process for UE's uplink
Fig. 9: Packet header process for UE's DL

The GTP tunnel header has 36 bytes. In the UE's uplink flow, the DPDK-based UPF removes the GTP tunnel header by moving the packet pointer 36 bytes from position ① to position ② in Fig. 8. Specifically, the DPDK API rte_pktmbuf_mtod() locates the packet pointer at ①, and the DPDK API rte_pktmbuf_adj() moves the packet pointer to ②. Then, the tunnelling function adds the L2 header by using a memory copy (14 bytes). The packet pointer ② is sent to the forwarding function. Note that the 36-byte data are still in the headroom and are not deleted.

In the UE's DL flow, the DPDK-based UPF adds the GTP tunnel header by moving the packet pointer 36 bytes from position ① to position ② in Fig. 9. Specifically, the DPDK API rte_pktmbuf_mtod() locates the packet pointer at ①, and the DPDK API rte_pktmbuf_prepend() moves the packet pointer to ②. Then, the tunnelling function adds the L2 header, tunnel IP header, tunnel UDP header, and GTP-U header by using a memory copy (48 bytes). The packet pointer ② is sent to the forwarding function. Note that the tailroom is not changed. The size of the headroom (i.e. 128 bytes) is enough for adding a 36-byte GTP tunnel header.
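A minimal sketch of the pointer-movement technique just described, using the mbuf calls named in the text (rte_pktmbuf_adj() and rte_pktmbuf_prepend()) to strip or add the 36-byte GTP tunnel header while copying only the small L2/tunnel headers. The exact header handling (e.g. dropping and rewriting the Ethernet header around the adjusted data pointer) is an assumption for illustration and may differ from the paper's implementation.

```c
#include <stdint.h>
#include <string.h>
#include <rte_mbuf.h>

#define GTP_TUNNEL_LEN 36  /* outer IPv4 (20) + UDP (8) + GTP-U (8) */
#define L2_HDR_LEN     14  /* Ethernet header */

/* Uplink: strip the GTP tunnel header without copying the payload. */
static inline void strip_gtp_tunnel(struct rte_mbuf *m,
                                    const uint8_t new_eth[L2_HDR_LEN])
{
    /* Advance the data pointer past the old Ethernet header and the
     * 36-byte tunnel header; the bytes simply remain in the headroom. */
    rte_pktmbuf_adj(m, L2_HDR_LEN + GTP_TUNNEL_LEN);

    /* Re-insert a fresh 14-byte Ethernet header (the only memory copy).
     * The 128-byte headroom noted in the text is more than enough. */
    char *p = rte_pktmbuf_prepend(m, L2_HDR_LEN);
    memcpy(p, new_eth, L2_HDR_LEN);
}

/* Downlink: add the GTP tunnel header in the headroom. */
static inline void add_gtp_tunnel(struct rte_mbuf *m,
                                  const uint8_t hdrs[L2_HDR_LEN + GTP_TUNNEL_LEN])
{
    /* Drop the old Ethernet header, then reserve room for the new
     * Ethernet + outer IP + UDP + GTP-U headers in one step. */
    rte_pktmbuf_adj(m, L2_HDR_LEN);
    char *p = rte_pktmbuf_prepend(m, L2_HDR_LEN + GTP_TUNNEL_LEN);

    /* Copy the pre-built headers (TEID, addresses, etc. filled by the caller). */
    memcpy(p, hdrs, L2_HDR_LEN + GTP_TUNNEL_LEN);
}
```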
In this section, we illustrated the functions of the DPDK-based UPF and the packet processing in the UE's uplink and DL flows. We also implemented the IP lookup table and described how to remove and add the packet headers with fewer memory copies. In the next section, the performance of the DPDK-based UPF is investigated with different core allocation methods.

4 Performance evaluation

Based on the hardware architecture described in Section 2, we evaluate the performance of the UPF. In the performance evaluation, the system should be stable and provide high performance. To obtain high performance, we set the performance policy of the basic input/output system (BIOS) configuration to 'performance' mode [26]. To maintain stability and prevent unexpected burst output, we disable the 'turbo boost' mode in the BIOS configuration. To reduce interference from the operating system and other applications, we assign the operating system and the DPDK-based UPF to different lcores in most cases. We allocate logical core 0 on CPU0 and logical core 1 on CPU1 to the operating system; in most cases, these two lcores are not assigned to the DPDK-based UPF. The testing architecture is shown in Fig. 10. In this section, we utilise the Pktgen (Fig. 10d) running on PC1 as the test tool to evaluate the performance of the DPDK-based UPF running on PC2. Note that the hardware equipment and the operating systems of PC1 and PC2 are the same.

Fig. 10: Testing architecture of the DPDK-based UPF

The SUT running on PC2 includes (a) the DPDK-based UPF; (b) an igb_uio driver; and (c) a NIC. The DPDK-based UPF is developed based on DPDK and runs in a Docker container [27]. The igb_uio driver is a DPDK driver embedded in the kernel space. The model of the NICs (i.e. NIC1 and NIC2) in PC1 and PC2 is 'Intel X710-T4', and each NIC contains four ports. In the test, we connect the ports on NIC1 to the ports on NIC2 through Cat.6A [28] cables, which can provide 10 Gbps within 100 m. The Pktgen on PC1 generates the testing packets and transmits them to PC2. The ports on NIC2 (c) receive the testing packets and pass them to the DPDK-based UPF (a) through the igb_uio driver (b). Upon receipt of the packets, the DPDK-based UPF retrieves the IP/port from the IP and UDP/TCP headers and performs an IP lookup. After the DPDK-based UPF obtains the mapped IP/port, it translates the IP/port and forwards the translated packets to NIC2 through the igb_uio driver. Then NIC2 transmits the packets back to PC1. Note that the DPDK-based UPF receives tunnelled packets at the N3 interface and packets without tunnelling at the N6 interface. When receiving a tunnelled packet, the DPDK-based UPF removes the outer IP and UDP/TCP headers and translates the IP/port in the inner IP and UDP/TCP headers. When receiving a packet without tunnelling, the DPDK-based UPF translates the IP/port and then adds the GTP tunnel headers, as described in Section 3. Upon receipt of packets from the PDN (e.g. the internet), the DPDK-based UPF translates the destination IP/port; on the contrary, when receiving packets from the UE, it translates the source IP/port. To achieve the maximum throughput, this paper designs seven core-allocation methods to allocate CPU cores to the DPDK-based UPF.
To analyse the performance (i.e. the maximum throughput with zero packet loss), we divide the methods into single-core deployment and multi-core deployment. In the first method of the single-core deployment, we allocate lcore 0 to the DPDK-based UPF so that it shares the same logical core with the operating system. This first method is called the shared core (SC) method, which attempts to utilise the remaining computing power of lcore 0. Since the SUT contains two CPUs connected through the QPI interface, the second and third methods allocate an individual core (IC) in CPU0 and in CPU1, respectively. The second method is called the IC method. In the third method, the CPU core accesses the packets through the QPI interface, and this method is called the IC through QPI (ICQ) method. In the ICQ method, we expect extra latency for transmission through the QPI, and the throughput may be degraded. In this section, we investigate whether this overhead is acceptable; if not, the ICQ method should be avoided.

In the multi-core deployment, each CPU provides 16 lcores, and the SUT contains 32 lcores in total. The detailed identifiers of the lcores can be found in Fig. 4. We can allocate up to four lcores to each NIC port. Thus, we have four methods that allocate two or four cores and enable or disable the HT technology (w/ or w/o HT), respectively. The four methods are 2Core w/ HT, 2Core w/o HT, 4Core w/ HT, and 4Core w/o HT. 2Core w/ HT utilises two lcores in one physical core. 2Core w/o HT utilises two lcores in two physical cores. 4Core w/ HT utilises four lcores in two physical cores. 4Core w/o HT utilises four lcores in four physical cores.

Considering the tunnelling procedure, the extra headers of the GTP-v1 tunnel are removed when receiving packets at the N3 interface. The headers include a 20-byte IP header, an 8-byte UDP header, and an 8-byte GTP-v1 header, so the total header length of the GTP tunnel is 36 bytes. Thus, we evaluate the performance of the DPDK-based UPF for removing the tunnel by using 100 bytes (i.e. 64 + 36 = 100) as the minimum packet size. The packet sizes for testing are 100, 128, 256, 512, 1024, and 1518 bytes. The test results for the single-core allocation and the multi-core allocation are shown in Figs. 11 and 12, respectively.

Fig. 11: DPDK-based UPF performance of single-core allocation for removing the tunnel
Fig. 12: DPDK-based UPF performance of multi-core allocation for removing the tunnel

Fig. 11 illustrates that the throughput for small packet sizes is low and increases as the packet size increases. That is because, at the same input traffic speed, the larger the packet size, the fewer packets need to be processed. We also observe that the overhead of transmission through the QPI is heavy and the ICQ method has the worst performance; as a result, the ICQ method is not acceptable. The throughput values of the SC and IC methods are almost the same, with the IC method slightly higher than the SC method. Based on this result, we can assign the DPDK-based UPF to the same logical core as the DPDK control system. Note that the performance of the SC and IC methods can almost reach 100% when the packet size is >256 bytes, while the performance of the ICQ method reaches 100% only when the packet size is >1024 bytes. Based on this performance test, the CPU to which the NIC is attached should be assigned to the DPDK-based UPF.
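The conclusion that the NIC-attached CPU should host the UPF amounts to keeping the polling lcore on the same NUMA socket as the NIC. The following sketch, which is not from the paper, uses the standard DPDK calls rte_eth_dev_socket_id() and rte_lcore_to_socket_id() to warn about a cross-QPI (cross-socket) assignment at start-up.

```c
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_log.h>

/* Warn if the lcore chosen to poll a port sits on a different NUMA socket,
 * i.e. traffic would cross the QPI, which the measurements show to be the
 * slowest configuration (the ICQ method). */
static void check_port_lcore_affinity(uint16_t port_id, unsigned int lcore_id)
{
    int port_socket  = rte_eth_dev_socket_id(port_id);
    int lcore_socket = (int)rte_lcore_to_socket_id(lcore_id);

    if (port_socket >= 0 && port_socket != lcore_socket)
        RTE_LOG(WARNING, USER1,
                "port %u is on socket %d but lcore %u is on socket %d; "
                "expect ICQ-like degradation\n",
                port_id, port_socket, lcore_id, lcore_socket);
}
```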
Fig. 12 presents the performance of the multi-core allocations. We observe that more CPU cores yield higher performance and that the HT cores perform similarly to the physical cores. All multi-core allocation methods can reach 100% throughput when the packet size is >256 bytes. Note that the 2Core w/ HT method reaches only 65.54% when the packet size is 100 bytes, which means that it does not provide enough computing power to process the incoming packets. Based on this result, two individual physical cores are enough to process 10 Gbps of uplink traffic.

In the performance test for adding the GTP tunnel header, the DPDK-based UPF adds the GTP tunnel headers onto the packets. Thus, the maximum packet size for testing is 1482 bytes (i.e. 1518 − 36 = 1482), and the packet sizes for testing are 64, 128, 256, 512, 1024, and 1482 bytes. The test results for the single-core allocation and the multi-core allocation are shown in Figs. 13 and 14, respectively.

Fig. 13: DPDK-based UPF performance of single-core allocation for tunnelling
Fig. 14: DPDK-based UPF performance of multi-core allocation for tunnelling

Fig. 13 shows similar trends to Fig. 11: the performance is better when the packet size is larger, and the IC method slightly outperforms the SC method and dramatically outperforms the ICQ method. Comparing Figs. 11 and 13, the performance for removing the tunnel is better than that for adding the tunnel, since adding the GTP tunnel headers is more complex than removing them. We also confirm that the ICQ method cannot output high throughput, so cross-QPI allocation should be avoided. Fig. 14 presents the results of the multi-core allocation for processing DL packets and shows similar trends to Fig. 12. When the packet size is >256 bytes, the throughput values of the multi-core allocation methods are almost the same. Comparing Fig. 11 with Fig. 13 and Fig. 12 with Fig. 14, we notice that in the tunnel-removal procedure the maximum throughput can reach 100% for large packet sizes (e.g. >512 bytes), but it cannot reach 100% in the tunnel-adding procedure. The reason is that the tunnelling procedure adds the GTP header to the incoming packets, so the total outgoing traffic exceeds the limit of the transmission lines (i.e. 40 Gbps). Taking 512-byte packets as an example, if the incoming traffic is 40 Gbps, the outgoing traffic is 42.70 Gbps after adding a 36-byte GTP header, and the extra 2.7 Gbps of packets is dropped. We then check the maximum throughput limitation by using the following equation:

\[ \text{Throughput limit} = \frac{S + H_{L1}}{S + H_{GTP} + H_{L1}} \times 100\% \quad (2) \]

In this equation, H_{L1} is the L1 overhead (20 bytes), H_{GTP} is the GTP tunnel header (36 bytes), and S is the packet size. Based on (2), we obtain 88.46% for 256 bytes, 93.66% for 512 bytes, 96.66% for 1024 bytes, and 97.68% for 1482 bytes. Based on this result, we can deploy multiple UPFs on different physical machines and distribute the traffic to another UPF when a UPF reaches its maximum throughput. We also confirm that two physical cores can provide high performance for the DL traffic.

Based on the uplink and DL testing results, we give a CPU core allocation example to support 40 Gbps full-duplex traffic. The CPU core deployment and system architecture of the DPDK-based UPF can be found in Figs. 4 and 9, respectively. In this example, lcore 16 is assigned to the DPDK control system, lcore 0 and lcore 2 are assigned to Port 1, lcore 4 and lcore 6 are assigned to Port 2, lcore 8 and lcore 10 are assigned to Port 3, and lcore 12 and lcore 14 are assigned to Port 4.
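The allocation above could be applied with the standard EAL launch API; the sketch below is only an assumption of how such pinning might look. The port_worker() function, the DPDK port numbering, and the two-lcore pairing per port are hypothetical, and lcore 16 is left to the main (control) thread as in the example.

```c
#include <stdint.h>
#include <rte_eal.h>
#include <rte_launch.h>

/* Hypothetical per-port worker: polls, processes and transmits packets. */
int port_worker(void *arg);

/* lcore pairs per NIC port, following the example allocation in the text
 * (lcore 16 stays with the main/control thread). */
static const unsigned int port_lcores[4][2] = {
    { 0, 2 },   /* Port 1 */
    { 4, 6 },   /* Port 2 */
    { 8, 10 },  /* Port 3 */
    { 12, 14 }, /* Port 4 */
};

static uint16_t port_ids[4] = { 0, 1, 2, 3 };  /* assumed DPDK port numbering */

static void launch_upf_workers(void)
{
    for (unsigned int p = 0; p < 4; p++)
        for (unsigned int k = 0; k < 2; k++)
            /* Start the worker loop on the chosen lcore. */
            rte_eal_remote_launch(port_worker, &port_ids[p],
                                  port_lcores[p][k]);
}
```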
In the packet forwarding procedure, the latency of the IP lookup may affect the performance of the DPDK-based UPF. Thus, we evaluate the table lookup latency. To handle huge IoT traffic, the IP lookup table contains a large number of IP/port mapping entries. In this experiment, we evaluate table sizes of 10², 10³, 10⁴, and 10⁵ IP/port entries on a logarithmic scale. Theoretically, more entries increase the table lookup latency. The test result of the lookup table is shown in Fig. 15.

Fig. 15: Average table lookup latency

The DPDK-based UPF performs a binary search for the IP lookup. To find the target IP/port in the lookup table, the binary search compares the target value with the middle entry of the lookup table. If they do not match, half of the lookup table is eliminated, and the binary search continues on the remaining half. The searching process is repeated until the target IP/port is found. Therefore, if there are n IP/port entries in the lookup table, the worst-case time complexity is O(log n). When n (i.e. the table size) is 10², the lookup latency is 0.195 μs; when n is 10³, it is 0.262 μs; when n is 10⁴, it is 0.361 μs; and when n is 10⁵, it is 0.496 μs. These lookup latencies follow O(log n). The average packet processing latency is above 1000 μs, whereas the table lookup latency is no more than 0.5 μs. Thus, the cost of the table lookup latency can be ignored.
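The table lookup described above is an ordinary binary search over a sorted array of mappings. The following sketch illustrates it; the entry layout (a 64-bit key built from the IP address and port) and the sorted-array representation are assumptions for illustration, not the paper's exact data structure.

```c
#include <stdint.h>
#include <stddef.h>

/* One translation entry: the key is the (IP, port) pair seen in the packet,
 * the value is the mapped pair and the output NIC port. */
struct upf_map_entry {
    uint64_t key;          /* (ip << 16) | port, kept sorted ascending */
    uint32_t mapped_ip;
    uint16_t mapped_port;
    uint16_t out_nic_port;
};

static inline uint64_t make_key(uint32_t ip, uint16_t port)
{
    return ((uint64_t)ip << 16) | port;
}

/* O(log n) binary search over the sorted table; returns NULL if not found. */
static const struct upf_map_entry *
upf_lookup(const struct upf_map_entry *table, size_t n,
           uint32_t ip, uint16_t port)
{
    uint64_t key = make_key(ip, port);
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].key == key)
            return &table[mid];
        if (table[mid].key < key)
            lo = mid + 1;     /* search the upper half */
        else
            hi = mid;         /* search the lower half */
    }
    return NULL;
}
```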
5 Conclusion

The UPF is the key component of the 5G NGC. To provide flexible and scalable deployment of the 5G NGC, this paper virtualises the UPF by using the Docker container. To provide high-performance packet processing, this paper utilises the Intel DPDK to develop the UPF, reduces the number of memory copies in header processing, and investigates the CPU core allocation methods for the UPF deployment. According to our experimental results, only two physical cores are required to handle 40 Gbps of packets; the throughput reaches 60.69% with a 64-byte packet size and 100% when the packet size is >256 bytes. Based on the results of this paper, the 5G NGC can provide the UPF functions and process packets at up to 40 Gbps on an x86-based platform.

6 References

1 Nathalie O., Marc B., Gael F., et al.: 'A programmable and virtualized network & IT infrastructure for the internet of things: how can NFV & SDN help for facing the upcoming challenges'. 2015 18th Int. Conf. on Intelligence in Next Generation Networks, Paris, France, 2015
2 Philipp S., Maximilian M., Henrik K., et al.: 'Latency critical IoT applications in 5G: perspective on the design of radio interface and network architecture', IEEE Commun. Mag., 2017, 55, (2), pp. 70-78
3 '3GPP TS 38.912'. Available at https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3059, accessed 1 March 2020
4 Rashid A., Martin R., Nadir S.: 'Hybrid SDN networks: a survey of existing approaches', IEEE Commun. Surv. Tutorials, 2018, 20, (4), pp. 3259-3306
5 Juliver H., Juan B.: 'Resource allocation in NFV: a comprehensive survey', IEEE Trans. Netw. Serv. Manage., 2016, 13, (3), pp. 518-532
6 Jose O., Pablo A., Diego L., et al.: 'Network slicing for 5G with SDN/NFV: concepts, architectures, and challenges', IEEE Commun. Mag., 2017, 55, (5), pp. 80-87
7 Imtiaz P., Ali R., Ismail G., et al.: 'A survey on low latency towards 5G: RAN, core network and caching solutions', IEEE Commun. Surv. Tutorials, 2018, 20, (4), pp. 3098-3130
8 Paul A., Nico B., Jakob B., et al.: '5G radio access network architecture based on flexible functional control/user plane splits'. 2017 European Conf. on Networks and Communications (EuCNC), Oulu, Finland, 2017
9 Sebastian P., Manzoor K.: 'Anticipatory user plane management for 5G'. 2018 IEEE 8th Int. Symp. on Cloud and Service Computing (SC2), Paris, France, 2018
10 Ali M., K. R., Uma C., et al.: 'Improving performance and scalability of next generation cellular networks', IEEE Internet Comput., 2019, 23, (1), pp. 54-63
11 Van-Giang N., Anna B., Karl-Johan G., et al.: 'SDN/NFV-based mobile packet core network architectures: a survey', IEEE Commun. Surv. Tutorials, 2017, 19, (3), pp. 1567-1602
12 Satyam A., Francesco M., Carla C., et al.: 'VNF placement and resource allocation for the support of vertical services in 5G networks', IEEE/ACM Trans. Netw., 2019, 27, (1), pp. 433-446
13 Neetu R., Yiyong Z., Yunfei Z., et al.: 'Virtual core network resource allocation in 5G systems using three-sided matching'. ICC 2019 - 2019 IEEE Int. Conf. on Communications (ICC), Shanghai, China, 2019
14 Wanqing B., Wenli L.: 'A novel VSFTP-based KVM virtualization cloud deployment scheme'. 2018 5th IEEE Int. Conf. on Cyber Security and Cloud Computing (CSCloud)/2018 4th IEEE Int. Conf. on Edge Computing and Scalable Cloud (EdgeCom), Shanghai, China, 2018
15 Nikhil M., Ankita G., Jaimeel S.: 'Docker swarm and Kubernetes in cloud computing environment'. 2019 3rd Int. Conf. on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 2019
16 MinSu C., HwaMin L., Kiyeol L.: 'A performance comparison of linux containers and virtual machines using Docker and KVM', Cluster Comput., 2019, 22, pp. 1765-1775
17 Komal K., Kiranbir K.: 'Performance study of applications using Dockers container'. Proc. Int. Conf. on Advances in Electronics, Electrical & Computational Intelligence (ICAEEC), Prayagraj, India, 2020
18 'Developer Quick Start Guide: Learn How to Get Involved with DPDK'. Available at https://www.dpdk.org/, accessed 1 March 2020
19 Whai-En C., Chia-Hung L.: 'Performance enhancement of virtualized media gateway with DPDK for 5G multimedia communications'. 2019 Int. Conf. on Intelligent Computing and its Emerging Applications (ICEA), Tainan, Taiwan, 2019
20 Whai-En C.: 'Packet forwarding enhancement for virtualized next-generation core networks'. 2018 27th Wireless and Optical Communication Conf. (WOCC), Hualien, Taiwan, 2018
21 'Intel Ethernet Converged Network Adapter X710-T4'. Available at https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/ethernet-x710-t4-brief.pdf, accessed 1 March 2020
22 'The Pktgen Application'. Available at https://pktgen-dpdk.readthedocs.io/en/latest/, accessed 1 March 2020
23 '3GPP TS 23.501'. Available at https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3144, accessed 1 March 2020
24 '3GPP TS 29.060'. Available at https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1595, accessed 1 March 2020
25 'DPDK 9. Mbuf Library'. Available at https://doc.dpdk.org/guides/prog_guide/mbuf_lib.html, accessed 1 March 2020
26 'DPDK 10. How to get best performance with NICs on Intel platforms'. Available at https://doc.dpdk.org/guides-18.11/linux_gsg/nic_perf_intel_platform.html, accessed 1 March 2020
27 'Docker'. Available at https://www.docker.com/, accessed 1 March 2020
28 'Balanced Twisted-Pair Telecommunication Cabling and Components Standard'.
Available at https://innovave.com/wp-content/uploads/2016/01/TIA-568-C.2.pdf, accessed 1 March 2020
