Data‐driven car‐following model based on rough set theory
2017; Institution of Engineering and Technology; Volume: 12; Issue: 1; Language: English
10.1049/iet-its.2017.0006
ISSN 1751-9578
Authors: Shenxue Hao, Licai Yang, Yunfeng Shi
Topic(s): Transportation Planning and Optimization
IET Intelligent Transport Systems, Volume 12, Issue 1, pp. 49-57. Research Article, Free Access.
Shenxue Hao, School of Control Science and Engineering, Shandong University, Jinan, People's Republic of China; School of Information Engineering, Shandong Yingcai University, Jinan, People's Republic of China
Licai Yang (corresponding author, yanglc@sdu.edu.cn), School of Control Science and Engineering, Shandong University, Jinan, People's Republic of China
Yunfeng Shi, School of Control Science and Engineering, Shandong University, Jinan, People's Republic of China
First published: 02 November 2017. https://doi.org/10.1049/iet-its.2017.0006. Citations: 16.
Abstract
The car-following model is an important micro-traffic model for simulating car-following behaviour in traffic engineering and research. Conventional car-following models are usually formulated as mathematical equations that assume ideal traffic conditions. In the big data era, data-driven models have become a popular trend. In this study, a data-driven car-following model based on rough set theory is proposed to exploit the information hidden in field data sets. Using field data obtained from measurement devices, such as the next generation simulation (NGSIM) trajectory data set, and the methods of rough set theory, an optimal decision rule set is established. Redundant attributes and redundant attribute values are removed to simplify the car-following behaviour decision problem. Attribute significance and weights are computed for selecting matching rules. A car-following behaviour decision algorithm is designed to choose appropriate rules that determine the follower's velocity from current observations. Simulations show that the proposed data-driven car-following model simulates the micro-traffic behaviour of followers well.
1 Introduction
Micro-traffic phenomena have received increasing attention in recent decades. Researchers use car-following models to analyse the micro-traffic behaviour of vehicles on road segments. Car-following theories model the motion of vehicles following preceding ones in a single lane without overtaking. The car-following concept was proposed by Pipes in 1953 [1]. In the following decades, researchers proposed many car-following models to describe followers' behaviour and traffic phenomena, such as the Gazis-Herman-Rothery (GHR) model [2, 3], the optimal velocity model (OVM) [4], the intelligent driver model (IDM) [5], the generalised force model (GFM) [6], and the full velocity difference (FVD) model [7].
These car-following models are generally easy to comprehend and describe car-following behaviour well. Some of them are widely used in commercial micro-traffic simulation packages such as VISSIM, AIMSUN, and PARAMICS [8]. However, these models, called conventional car-following models in this paper, are based on mathematical equations with idealised assumptions. Traffic phenomena simulated using these models may differ from real situations to some extent, since some factors affecting car-following behaviour, for example the vehicle type, are difficult to quantify for computation. Furthermore, the parameters of car-following models need to be recalibrated when the models are applied in different situations, such as different road segments or different vehicle types of the leader and follower. To address these problems, data-driven car-following models can be considered a reasonable alternative. With the development of computational technologies, car-following models based on field data have attracted increasing attention from researchers. However, some essential issues need to be noted. First, field data from different measurement devices can be inaccurate, inconsistent, or incomplete, which constrains the application of data-driven car-following models. Second, the attributes of field data sets carry different weights in their effect on car-following behaviour, which should be considered in data-driven car-following models. Rough set theory, proposed by Pawlak [9], is a mathematical tool for dealing effectively with incomplete data. It can be used to discover hidden knowledge and information in field data sets [10], and the significance of attributes can be calculated without any prior knowledge. On the basis of rough set theory, this paper proposes a data-driven car-following model that can simulate the realistic micro-traffic behaviour of vehicles on road segments.
Some realistic factors affecting car-following behaviour are considered in this model, such as the number of vehicles in front of the follower in its lane and the vehicle types of the leader and follower. The proposed model extracts car-following decision rules from a field data set using rough set theory, and determines the behaviour of the follower from current observations and the matching rules selected from the decision rule set. First, we process the trajectory data of the NGSIM (next generation simulation) I80 data set to obtain a valid car-following data set, which is used as the basis for extracting car-following decision rules. Second, an optimal decision rule set is established using the attribute reduction and value reduction of rough set theory. Then, a car-following behaviour decision algorithm is designed that considers the significance and weights of attributes. Finally, simulations are carried out to analyse the validity and reliability of the model.
This paper is organised into six sections. Section 2 presents related studies. Section 3 presents the methodology of rough set theory. Section 4 describes the data processing procedure used to obtain the optimal decision rule set and designs the car-following decision algorithm. In Section 5, simulations are performed to analyse the validity of the proposed car-following model. Section 6 concludes the paper.
2 Related literature
The car-following model is an underlying component of micro-traffic simulation and modern traffic flow theory; it attempts to simulate the interactions between two successive vehicles in a single lane [11]. Since Pipes introduced the car-following concept in the 1950s, many researchers have contributed to developing car-following models.
Most car-following models describe the micro-traffic behaviour of the leader and follower using mathematical equations, which usually involve the velocities of the nth and (n − 1)th vehicles and the space headway between the two vehicles. Considering the space headway, Bando et al. [4] proposed the OVM in 1995, which is based on the idea that the following vehicle has an optimal velocity. Helbing and Tilch [6] then considered velocity differences to propose the GFM, and Jiang et al. [7] proposed the FVD model (FVDM) based on the GFM. More recently, many researchers have investigated car-following models to reveal the characteristics of micro-traffic behaviour, taking further factors into account, such as multiple headways [12], real-time road conditions [13], and heavy-duty vehicle types [14]. These conventional car-following models work quite well in micro-traffic simulations, and some have been adopted in commercial packages. However, most car-following models use only a few specific parameters in their equations, while other realistic factors are ignored. The main reason is that increasing the number of parameters in a mathematical equation increases computational complexity. In addition, parameters calibrated for a car-following model are usually based on traffic data from a particular scenario and are hardly transferable to other cases. From this point of view, researchers have proposed fuzzy logic-based models to predict car-following behaviour instead of mathematical equations. Khodayari et al. [15] developed a car-following model using a fuzzy inference system to simulate and predict the future behaviour of a driver-vehicle unit. Hao et al. [16] also proposed a fuzzy logic-based multi-agent car-following model considering the human factors that affect car-following behaviour.
However, defining appropriate fuzzy sets and associated membership functions is a challenge in such fuzzy logic-based models. In the era of big data, data-driven modelling methods are a reasonable alternative. Ossen et al. studied the car-following behaviour of individual drivers using trajectory data extracted from high-resolution digital images, and found considerable differences between individual drivers, expressed as different optimal parameter values in car-following models [17]. Vasileia et al. pointed out that data-driven approaches are more flexible and allow additional information to be incorporated into car-following models; they established a data-driven car-following model based on locally weighted regression [18]. He et al. [19] proposed a simple non-parametric car-following model driven by field data using the k-nearest neighbour method. Kendziorra et al. [20] described a data-driven car-following model in which the acceleration of the follower is modelled using a distribution sampled directly from data. These data-driven models do not require specific functions to be fitted, and they can more easily incorporate various factors for specific purposes. Such models can replicate traffic characteristics without parameter calibration. Although data-driven car-following models have become more popular, some issues still need to be addressed in further research. (i) Field data may be incomplete, inaccurate, or inconsistent due to measurement device errors and other causes. (ii) The field data set may contain redundant or useless data, which should be removed for efficiency. (iii) Attribute weights should be considered in car-following behaviour prediction, since different attributes have different degrees of influence on car-following behaviour.
(iv) Some attributes affecting car-following behaviour are difficult to quantify for computation. Considering these issues, this paper proposes a data-driven car-following model based on rough set theory, in which the car-following problem is modelled as a decision problem. First, redundant attributes and attribute values are removed from the field data set. Then, an optimal decision rule set is obtained and attribute significances are calculated using the rough set method. A decision algorithm is designed to determine the behaviour of the follower considering different attribute weights. The framework of the proposed model and the concepts of rough set theory are introduced in the next section.
3 Methodology
3.1 Model framework
Data-driven car-following models can easily take more realistic factors into account and do not need parameter calibration. Previous data-driven car-following models usually compute the follower's velocity from the k nearest neighbouring records in the field car-following data set [18, 19]. They consider neither attribute nor attribute-value redundancy, nor attribute significance in the raw data. Moreover, the knowledge hidden in the raw data set is not fully used. In this paper, a data-driven car-following model based on rough set theory is proposed. The model can extract more useful information from raw data to obtain an optimal decision rule set, even if the raw data set contains some incomplete or inconsistent data. The main idea of the proposed model is as follows. First, field data is processed to remove useless data, and the original decision system is extracted from the raw data. Second, redundant attributes in the original decision system are removed using the attribute reduction of rough set theory to simplify the decision system.
Third, redundant attribute values and redundant records are removed using value reduction to obtain the optimal decision rule set, which improves the efficiency of car-following behaviour decisions. Finally, a decision algorithm considering attribute weights is designed to handle different car-following situations. The framework of the proposed model is presented in Fig. 1. A raw data set is collected from historical traffic data, and an optimal car-following decision rule set is extracted after data processing, including attribute reduction and value reduction. Then, discernibility-based attribute weights are calculated to determine car-following behaviour. From the observed states of the leader and follower at time t, the car-following decision algorithm predicts the action of the follower at time t + 1. In the future, this model could be applied to car-following behaviour control when a vehicle is equipped with a detector unit that records historical trajectory data and detects the current vehicle state. It can also be applied in traffic simulation systems when the data come from traffic simulations. The next subsection discusses the major concepts of rough set theory used in this paper.
Fig. 1 Framework of data-driven car-following model based on rough set theory
3.2 Rough set theory
Rough set theory was first proposed by Pawlak to deal with imprecision, vagueness, and uncertainty in data, and is considered an alternative to fuzzy set theory [9]. Rough sets have important applications in intelligent decision-making systems [21] and are suitable for automatic rule acquisition [22], since they can handle inaccurate, inconsistent, and incomplete data.
In rough set theory, an information system is defined as a four-tuple (U, A, V, f), where the universe U is a non-empty finite set of objects, A is a non-empty finite set of attributes, $V = \bigcup_{a \in A} V_a$ is the value domain of attribute set A (with $V_a$ the domain of attribute a), and $f: U \times A \to V$ is an information function such that $f(x, a) \in V_a$ for every $x \in U$ and $a \in A$. If $A = C \cup D$, where $C \cap D = \emptyset$, the information system is called a decision system, where C is the condition attribute set and D is the decision attribute set.
Object indiscernibility is an important concept of rough set theory. For an attribute set $P \subseteq A$, let ind(P) denote the indiscernibility relation

$$\mathrm{ind}(P) = \{(x, y) \in U \times U : a(x) = a(y),\ \forall a \in P\} \tag{1}$$

where a(x) denotes the value of object x for attribute a. That is, objects x and y cannot be distinguished by the values of the attributes in P, given the data in the information system. U/ind(P) (abbreviated U/P) denotes the partition of U induced by ind(P). For $X \subseteq U$ and indiscernibility relation ind(P), the upper and lower approximations of X are defined as

$$\overline{P}(X) = \bigcup \{\, Y \in U/P : Y \cap X \neq \emptyset \,\} \tag{2}$$

$$\underline{P}(X) = \bigcup \{\, Y \in U/P : Y \subseteq X \,\} \tag{3}$$

Let P denote a condition attribute set and Q a decision attribute set. The P-positive region of Q is defined as

$$\mathrm{POS}_P(Q) = \bigcup_{X \in U/Q} \underline{P}(X) \tag{4}$$

If $c \in P$ and $\mathrm{POS}_{P - \{c\}}(Q) = \mathrm{POS}_P(Q)$, then c is a redundant attribute. Since the raw data of the decision system come from field experiments, redundant data are almost always present. The attribute reduction and attribute value reduction of rough set theory can remove such redundant data to simplify the decision system. Attribute reduction deletes irrelevant or unimportant attributes from the decision system while maintaining the partitioning capacity of the attribute set. Attribute value reduction then simplifies the decision system further. For a decision system with condition attribute set C and decision attribute set D, let $U/C = \{X_1, X_2, \ldots, X_m\}$ and $U/D = \{Y_1, Y_2, \ldots, Y_n\}$ be the corresponding partitions.
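To make equations (1) to (4) concrete, here is a minimal Python sketch. It assumes a decision table stored as a dict mapping each object to its attribute values, with continuous attributes such as headway already discretised into labels; the function names and the five-object table are hypothetical and are not taken from the NGSIM data. The last line checks the redundancy condition stated above: dropping the constant attribute lc leaves the positive region unchanged.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Sequence

# Decision table: object id -> {attribute name: (discretised) value}
Table = Dict[Hashable, Dict[str, Hashable]]


def partition(table: Table, attrs: Sequence[str]) -> List[FrozenSet[Hashable]]:
    """U/P: objects share a block iff they agree on every attribute in P (ind(P), eq. (1))."""
    blocks: Dict[tuple, set] = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return [frozenset(b) for b in blocks.values()]


def upper_approx(table: Table, attrs: Sequence[str], target: FrozenSet[Hashable]) -> FrozenSet[Hashable]:
    """Eq. (2): union of the ind(P) classes that intersect X."""
    return frozenset().union(*[b for b in partition(table, attrs) if b & target])


def lower_approx(table: Table, attrs: Sequence[str], target: FrozenSet[Hashable]) -> FrozenSet[Hashable]:
    """Eq. (3): union of the ind(P) classes fully contained in X."""
    return frozenset().union(*[b for b in partition(table, attrs) if b <= target])


def positive_region(table: Table, cond: Sequence[str], dec: Sequence[str]) -> FrozenSet[Hashable]:
    """Eq. (4): union of the P-lower approximations of every class in U/Q."""
    pos: FrozenSet[Hashable] = frozenset()
    for y in partition(table, dec):
        pos |= lower_approx(table, cond, y)
    return pos


# Hypothetical toy table (not NGSIM data): hw and vehs discretised, lc constant.
table: Table = {
    1: {"hw": "short", "vehs": "many", "lc": 2, "fv": "low"},
    2: {"hw": "short", "vehs": "many", "lc": 2, "fv": "low"},
    3: {"hw": "short", "vehs": "few", "lc": 2, "fv": "mid"},
    4: {"hw": "long", "vehs": "few", "lc": 2, "fv": "high"},
    5: {"hw": "long", "vehs": "few", "lc": 2, "fv": "mid"},
}
C, D = ["hw", "vehs", "lc"], ["fv"]
print(sorted(positive_region(table, C, D)))  # [1, 2, 3]
print(positive_region(table, ["hw", "vehs"], D) == positive_region(table, C, D))  # True: lc is redundant
```

Objects 4 and 5 share the same condition values but have different decisions, so they fall outside the positive region; this is how inconsistent field records appear in rough set terms.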
The rules of the decision system can be described as

$$r_x : \mathrm{des}(X_i) \rightarrow \mathrm{des}(Y_j), \quad x \in X_i \cap Y_j \tag{5}$$

where $r_x$ is the rule generated by object x, des(X_i) is the specific value of the condition attributes of object x, and des(Y_j) is the specific value of the decision attribute of object x. After attribute reduction and attribute value reduction, the optimal decision rule set is obtained. Using this rule set, derived from car-following field data, the behaviour of the follower can be determined with an effective decision algorithm. In the next section, we describe the car-following model based on rough set theory in detail, including data processing, factor selection, decision rule extraction, and decision algorithm design.
4 Data-driven car-following model based on rough set theory
Conventional car-following models usually take velocity differences and headways in a single lane as their major parameters. The velocity of the follower is calculated using a mathematical equation under ideal assumptions. The model parameters need to be calibrated for each scenario, and attributes that are hard to quantify are difficult to include. Hidden information in the raw data may also be discarded. To make up for these shortcomings of mathematical car-following models, data-driven car-following models have been proposed that simulate the velocity of the follower from similar records found in historical field data. However, existing data-driven models rarely consider factors other than velocities and headways, and they ignore the incompleteness of raw data. In this paper, based on rough set theory, we propose a data-driven car-following model that considers more realistic factors, such as vehicle types, the number of vehicles in front of the follower in its lane, velocities, and headways.
This rough set theory-based model obtains optimal decision rules from raw data sets and takes into account more of the useful information affecting car-following behaviour, so it can simulate car-following behaviour more realistically. The main idea is to extract optimal decision rules from the raw data set using rough set theory, and to determine the follower's behaviour at time t + 1 from the observation at time t, considering the attribute significance hidden in the raw data set. The processing of the car-following field data is presented in the next subsection.
4.1 Car-following field data analysis and processing
The trajectory data used in this paper is the NGSIM I80-1 data set [23], containing data for I80 in Emeryville, CA, collected from 4:00 p.m. to 4:15 p.m. on 13 April 2005 and downloaded from the Research Data Exchange, a web-based intelligent transport system (ITS) data resource. The original NGSIM I80-1 trajectory data contain 18 attributes. Montanino and Punzo [24] reconstructed the data set in 2015, and ten attributes remain in the reconstructed data set: vehicle_id, frame_id, lane_id, local_y, mean_speed, mean_accel, vehicle_length, vehicle_class_id, follower_id, and leader_id. The position of a vehicle is represented by local_y. The attributes leader_id and follower_id identify the leading and following vehicles, respectively. The headway attribute used in this paper is calculated from local_y of the leader and follower. Following the idea of rough set theory, it is necessary to analyse these data and extract useful factors affecting car-following behaviour.
4.1.1 Factors affecting car-following behaviour
Based on the NGSIM I80-1 data set, this paper considers the number of vehicles ahead of the follower and the vehicle class as two new realistic factors influencing car-following behaviour, to improve the ability of the car-following model to simulate traffic scenarios. The number of vehicles ahead of the follower is the number of vehicles in front of the follower within a certain distance in the same lane at time t. As shown in Fig. 2, the average speed of the follower decreases as the number of vehicles in front of it increases, which indicates that this number has a significant impact on car-following behaviour. This is consistent with real driving, in which drivers follow the preceding vehicle more cautiously under high traffic density.
Fig. 2 Number of vehicles ahead and average speed of follower in NGSIM I80-1 data set
Another important factor affecting following behaviour is the vehicle class, a property that is difficult to quantify in conventional car-following models. As a result, the car-following behaviour of different types of vehicles cannot be simulated with a single conventional car-following model; such models need to be calibrated with different field data sets to simulate different types of vehicles. A data-driven car-following model can extract car-following rules for all vehicle types and simulate their car-following behaviour on the basis of these rules. As shown in Fig. 3, different classes of follower and leader result in different velocity ranges, which corresponds to the reality that different vehicle classes exhibit different following behaviour. For example, the green points represent velocity pairs where the leader and follower are both of class 3; the velocity is clearly low in this car-following situation.
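Figs. 2 and 3 summarise follower speeds grouped by a factor: the number of vehicles ahead, or the leader and follower class pair. A minimal sketch of that grouping, assuming the processed records are available as (key, follower speed) pairs, is shown below. The speeds are made-up numbers for illustration only, and this is not the authors' plotting code.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, Iterable, List, Tuple


def mean_speed_by(records: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, float]:
    """Average follower speed for each grouping key (e.g. vehs, or the (lc, fc) pair)."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, speed in records:
        groups[key].append(speed)
    return {key: mean(speeds) for key, speeds in groups.items()}


# Hypothetical follower observations, speeds in m/s (made-up, not NGSIM values).
by_vehs = mean_speed_by([(2, 12.0), (2, 14.0), (5, 8.0), (5, 6.0), (9, 3.0)])
by_class = mean_speed_by([((2, 2), 10.0), ((2, 3), 7.0), ((3, 3), 4.0), ((3, 3), 5.0)])
print(by_vehs)   # {2: 13.0, 5: 7.0, 9: 3.0}
print(by_class)  # {(2, 2): 10.0, (2, 3): 7.0, (3, 3): 4.5}
```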
Of course, even when velocities are similar across different classes, the headways can still differ, because different classes of vehicles have different braking distances. That is, different classes of vehicles exhibit different car-following behaviour.
Fig. 3 Velocities of leader and follower for different vehicle classes in NGSIM I80-1 data set. Red ellipse represents range of follower velocity where both classes are 2. Blue ellipse represents follower velocity where leading and following vehicle classes are 2 and 3, respectively. Green ellipse represents follower velocity where both classes are 3
Based on the analysis above, it can be concluded that the number of vehicles in front of the follower and the vehicle class are two factors affecting car-following behaviour. In this paper, we consider velocities, headways, the number of vehicles in front of the follower, and vehicle classes as the major factors affecting car-following behaviour, to establish a data-driven car-following model that can overcome the deficiencies of conventional car-following models. In the following subsections, we use the raw data of the NGSIM I80-1 data set to extract a decision rule set and establish the data-driven car-following model.
4.1.2 Field data processing
The main objective of this paper is to model car-following behaviour based on field data. The proposed car-following model considers factors such as velocities, headways, the number of vehicles in front of the follower, and vehicle classes. To improve the efficiency of the model, the data sampling period is set to 1 s in this paper (one frame corresponds to 0.1 s). The records of the first 1000 frames and the last 1000 frames of the original data set are removed, because not all vehicles are marked in these frames, which results in inaccurate vehicle counts. Lane changing and overtaking behaviour is not considered in this paper, so these records are also removed.
Since lane 7 is a ramp entrance in the NGSIM I80-1 data set, car-following behaviour in lane 6 is disturbed by vehicles merging into lane 6 from lane 7. Therefore, vehicles in lanes 6 and 7 are also removed. The steps for processing the NGSIM I80-1 data set are as follows:
Step 1: Count the vehicles in front of the follower every 1 s, that is, the vehicles ahead of the follower in the same lane within a certain distance. The distance threshold is 100 m, which is the maximum car-following distance [7].
Step 2: Remove records in which the preceding and following vehicles do not exist in the same frame, since such records are not car-following pairs.
Step 3: Remove lane changing and overtaking records. If the lane_id of a vehicle changes, the vehicle has changed lanes or overtaken.
Step 4: Remove records with frame_id lower than 1000 or larger than 8000, since not all vehicles are marked in these frames. Remove records of vehicles within the last 100 m of any lane, because the number of vehicles in front of such vehicles is not accurate.
Step 5: Remove all other erroneous records, such as records with headways larger than 100 m, or with incorrect lane_id, follower_id, or leader_id.
Step 6: Extract leader–follower pairs, including positions, velocities, headways, numbers of vehicles in front of the follower, and vehicle classes.
After data processing, the original car-following decision system is obtained. Its attributes include frame_id, lane_id, the number of vehicles in front, leader_id, leader–class, leader–position, leader–velocity, leader–acceleration, follower_id, follower–class, follower–position, follower–velocity, follower–acceleration, and the headway.
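Steps 1 to 5 can be expressed as simple per-record checks. The Python sketch below is one possible reading of them, under several assumptions: the hypothetical Obs record holds only the fields needed here, local_y is taken to be in metres, 1 s sampling is implemented by keeping frames whose frame_id is a multiple of 10, and the rule about the last 100 m of a lane is left out because lane end positions are not given in this section.

```python
from dataclasses import dataclass
from typing import AbstractSet, Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class Obs:
    """Hypothetical per-frame record of one vehicle."""
    vehicle_id: int
    frame_id: int
    lane_id: int
    local_y: float  # longitudinal position, assumed to be in metres


def count_vehicles_ahead(frame: Iterable[Obs], me: Obs, max_dist: float = 100.0) -> int:
    """Step 1: vehicles in the same frame and lane that are ahead of `me` by at most max_dist."""
    return sum(
        1
        for o in frame
        if o.frame_id == me.frame_id
        and o.lane_id == me.lane_id
        and o.vehicle_id != me.vehicle_id
        and 0.0 < o.local_y - me.local_y <= max_dist
    )


def is_valid(
    me: Obs,
    leader: Optional[Obs],
    lanes_used: Dict[int, Set[int]],
    first_frame: int = 1000,
    last_frame: int = 8000,
    excluded_lanes: AbstractSet[int] = frozenset({6, 7}),
    max_headway: float = 100.0,
) -> bool:
    """Steps 2 to 5 as predicates; the last-100-m rule of Step 4 is omitted."""
    if me.frame_id % 10 != 0:  # assumed 1 s sampling: keep every 10th frame
        return False
    if not first_frame <= me.frame_id <= last_frame:  # Step 4: frame window
        return False
    if me.lane_id in excluded_lanes:  # lanes 6 and 7 are affected by the ramp
        return False
    if len(lanes_used.get(me.vehicle_id, set())) != 1:  # Step 3: vehicle changed lanes
        return False
    if leader is None or leader.frame_id != me.frame_id:  # Step 2: no leader in this frame
        return False
    if leader.lane_id != me.lane_id:  # Step 5: inconsistent lane_id
        return False
    headway = leader.local_y - me.local_y
    return 0.0 < headway <= max_headway  # Step 5: implausible headway


frame = [Obs(1, 1000, 2, 50.0), Obs(2, 1000, 2, 70.0), Obs(3, 1000, 2, 160.0), Obs(4, 1000, 3, 60.0)]
lanes_used = {1: {2}, 2: {2}, 3: {2}, 4: {3}}
print(count_vehicles_ahead(frame, frame[0]))  # 1 (vehicle 3 is 110 m ahead)
print(is_valid(frame[0], frame[1], lanes_used))  # True
```

Keeping each step as a separate predicate makes it easy to count how many records each rule discards, which is useful when checking a cleaning pipeline against reported data set sizes.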
For simplicity, the attributes leader–class, leader–position, leader–velocity, leader–acceleration, follower–class, follower–position, follower–velocity, follower–acceleration, headway, and the number of vehicles in front of the follower are denoted by lc, lp, lv, la, fc, fp, fv, fa, hw, and vehs, respectively. In the reconstructed NGSIM I80-1 trajectory data [24], the vehicle position is given by the original attribute local_y, that is, the longitudinal position of the vehicle in the lane; the lateral coordinate is ignored. Decision rules are extracted from the original car-following decision system in the next subsection.
4.2 Optimal decision rule set establishment
To establish a data-driven car-following model, car-following decision rules must be extracted from the original decision system obtained in Section 4.1. The decision problem of the model is to determine the velocity of the follower at time t + 1 based on the status of the leader and follower at time t, the status of the leader at time t + 1, and the traffic situation at time t (in this paper, the number of vehicles in front of the follower). From the original decision system, we obtain the decision rule set, which includes the status of the leader and follower at time t (represented by frame_id) and t + 1 (represented by frame_id + 10), the number of vehicles in front of the follower, and the vehicle classes. Attributes of the original decision system such as frame_id, leader_id, and follower_id are removed, because they are neither condition nor decision attributes of the decision problem. Since the time interval from t to t + 1 is only 1 s (10 frames), the attribute vehs is assumed to be constant within the interval.
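Pairing time t (frame_id) with time t + 1 (frame_id + 10) amounts to joining two frames per leader–follower pair. The function below is a hypothetical illustration: per-vehicle states are assumed to be dicts keyed by frame_id, the headway is assumed to be the plain difference of the leader's and follower's local_y values, and the row keeps the condition set C and decision set D defined in this subsection.

```python
from typing import Dict, Optional

State = Dict[str, float]  # {"p": position, "v": velocity, "a": acceleration}


def build_row(
    leader: Dict[int, State],
    follower: Dict[int, State],
    t: int,
    lc: int,
    fc: int,
    vehs: int,
    step: int = 10,
) -> Optional[Dict[str, float]]:
    """Join frame t and frame t + step (1 s = 10 frames) into one decision-table row."""
    if any(f not in leader or f not in follower for f in (t, t + step)):
        return None  # the pair is not observed at both time steps
    l0, f0 = leader[t], follower[t]
    l1, f1 = leader[t + step], follower[t + step]
    return {
        # condition attributes C
        "lc": lc, "fc": fc, "vehs": vehs,  # vehs assumed constant over the 1 s interval
        "lp(t)": l0["p"], "lv(t)": l0["v"], "la(t)": l0["a"],
        "fp(t)": f0["p"], "fv(t)": f0["v"], "fa(t)": f0["a"],
        "hw(t)": l0["p"] - f0["p"],  # assumed: spacing taken directly from local_y
        "lp(t+1)": l1["p"], "lv(t+1)": l1["v"],
        # decision attribute D
        "fv(t+1)": f1["v"],
    }


leader = {1000: {"p": 70.0, "v": 10.0, "a": 0.5}, 1010: {"p": 80.2, "v": 10.5, "a": 0.3}}
follower = {1000: {"p": 50.0, "v": 9.0, "a": 0.2}, 1010: {"p": 59.1, "v": 9.2, "a": 0.1}}
row = build_row(leader, follower, t=1000, lc=2, fc=2, vehs=4)
print(row["hw(t)"], row["fv(t+1)"])  # 20.0 9.2
```

Returning None for pairs that are not observed at both frames mirrors Step 2 of the data processing: only complete leader–follower pairs become rows of the decision table.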
The position of the follower and the headway at time t + 1 can be computed from the velocity of the follower at time t + 1, so these two attributes at time t + 1 are also removed. Accordingly, the attributes of the original decision rule set are A = {lc, fc, vehs, lp(t), lv(t), la(t), fp(t), fv(t), fa(t), hw(t), lp(t + 1), lv(t + 1), la(t + 1), fv(t + 1)}. Let C = {lc, fc, vehs, lp(t), lv(t), la(t), fp(t), fv(t), fa(t), hw(t), lp(t + 1), lv(t + 1)} denote the condition attribute set and D = {fv(t + 1)} the decision attribute set. The attributes in the condition attribute set affect car-following behaviour to different degrees, and some may even be redundant. We address this problem using rough set theory as follows.
Definition 1. For $P \subseteq C$, the dependency of decision attribute D on condition attribute set P is

$$\gamma_P(D) = \frac{|\mathrm{POS}_P(D)|}{|U|} \tag{6}$$

Definition 2. For $c \in C$, the significance of condition attribute c is

$$\mathrm{Sig}(c) = \gamma_C(D) - \gamma_{C - \{c\}}(D) \tag{7}$$

The two definitions show that attribute c is more important to decision attribute D when Sig(c) is larger. If Sig(c) is zero, attribute c is said to be redundant to decision attribute D. Following these definitions, attribute reduction removes all redundant attributes with zero significance. However, the attribute reduction problem is non-deterministic polynomial (NP)-hard [25]. For simplicity, we first simplify condition attribute set C according to the features of the trajectory data. In the car-following behaviour decision problem,
References