Article, open access, peer reviewed

Doubly constrained gravity models for interregional trade estimation

2020; Elsevier BV; Volume: 100; Issue: 2; Language: English

10.1111/pirs.12581

ISSN

1435-5957

Authors

Mattia Cai

Topic(s)

Global trade and economics

Abstract

This paper discusses a family of methods grounded in the doubly constrained gravity model (DCGM) for the estimation of product-specific origin–destination matrices of interregional trade. We argue that several estimation procedures documented in the literature, although outwardly unrelated, can be conceptualized as applications of the DCGM framework. We show that DCGM estimation requires less restrictive assumptions and fewer data than commonly thought. We demonstrate that, in ideal conditions, the unknown trade flows can be recovered exactly through a standard application of the RAS algorithm. Finally, we examine how a DCGM estimator might fare in applications using Monte Carlo techniques and give an example of its use with real-world data.
For many important economic issues, policy analysis at a sub-national scale (state, region, province, etc.) requires information on how the various parts of the country interact through trade in goods and services (e.g., Lecca, Christensen, Conte, Mandras, & Salotti, 2020). Unfortunately, hardly any governments in the world produce official survey-based statistics on interregional trade. This paper is concerned with how to bridge that information gap in applied work. Specifically, we discuss a family of methodological approaches to the estimation of product-specific origin–destination (OD) matrices of interregional trade.
The question of how to estimate bilateral trade flows at the sub-territorial level has attracted relatively little attention from economists since Schwarm, Okuyama, and Jackson (2006, p. 84) complained about the "dearth of examples in the literature." Understandably, regional analyses that concentrate primarily on a single territorial unit generally show little interest in constructing a full set of trade flows explicitly linking all parts of the country. Instead, they focus on estimating only those economic aggregates that are strictly necessary for their study. Thus, for example, input–output studies often rely on non-survey regionalization of model coefficients estimated from country-level data rather than attempt estimating interregional trade (Hermannsson & McIntyre, 2014; Loizou, Chatzitheodoridis, Polymeros, Michailidis, & Mattas, 2014; Morrissey, 2016; PwC, 2014; Yu, Hubacek, Feng, & Guan, 2010).
In recent years, non-survey methods have been at the centre of a large and rapidly growing body of research (Bonfiglio, 2009; Bonfiglio & Chelli, 2008; Flegg & Tohmo, 2013, 2019; Flegg & Webber, 2000; Flegg, Webber, & Elliott, 1995; Kowalewski, 2015; Kronenberg, 2009; Lamonica & Chelli, 2018; Többen & Kronenberg, 2015). Although comparatively limited in size, the literature on interregional trade estimation has already experimented with a number of approaches. Frequently, the starting point is provided by data on freight transport. These are typically subjected to numerous and often complex adjustments in order to translate the observed flows from physical to monetary units (Többen, 2017), account for multiple modes of transportation (Llano, Esteban, Pérez, & Pulido, 2010), or address problems of statistical reliability (Schwarm et al., 2006), missing data and mismatching commodity classifications (Park, Gordon, Moore, & Richardson, 2009). The estimation of interregional trade has also been approached using the gravity model as a framework. At its simplest, the gravity model posits that the intensity of trade between two regions is a function of their respective economic masses and the distance between them. The main difficulty with these methods is that it is not at all clear how to parametrize the model (Simini, González, Maritan, & Barabási, 2012), as that seems to require precisely the type of interregional trade data whose unavailability one is trying to overcome. In applications, model parameters are usually determined through a combination of calibration with freight transport data (Boero, Edwards, & Rivera, 2018; Lindall, Olson, & Alward, 2006), econometric estimation from international trade statistics (Fingleton, Garretsen, & Martin, 2015; Riddington, Gibson, & Anderson, 2006), and guesswork (Johansen, Egging, & Ivanova, 2018; Sargento, Ramos, & Hewings, 2012).
As these approaches have fairly heavy data and labour requirements, some authors have attempted to develop simpler alternatives by extending the logic of non-survey input–output regionalization to a multiregional context (Gallego & Lenzen, 2009; Haddad, Samaniego, Porsse, Ochoa, & de Souza, 2011; Jahn, 2017). In this case, the estimates generated by non-survey techniques are often used in conjunction with a balancing procedure (e.g., the RAS technique) to incorporate additional information available to the analyst. For all its diversity, the existing literature on interregional trade estimation is skewed towards heavily applied contributions. Overcoming the limitations of the data often requires ad-hoc assumptions and estimation procedures are seldom linked to a broader theoretical framework. As a result, it is not always clear how to generalize beyond the specific data availability conditions under which they were developed, or how to assess the relative strengths and weaknesses of alternative approaches. It adds to the difficulty of drawing lessons from past experiences that—again, due to a lack of suitable data—very few attempts have been made to evaluate the accuracy of the trade estimates (Distefano, Tuninetti, Laio, & Ridolfi, 2019; Fournier Gabela, 2020). In this paper, the problem of estimating interregional trade is examined through the lens of one specific flavour of the gravity model, the doubly constrained gravity model (DCGM). The DCGM is popular among transportation planners as a tool to estimate (or forecast) OD matrices of passenger and, less commonly, freight flows (Barbosa et al., 2018; Roy & Thill, 2004). Despite the similarity in objectives, that body of work has inspired surprisingly few applications—most notably, Lindall et al. (2006)—and no broader methodological discussion in the context of interregional trade estimation. 
Conversely, the freight models of transportation planners place great emphasis on issues (e.g., modal split, transshipment and warehousing) that in the context of interregional trade estimation represent nuisances rather than objects of primary interest (Davydenko & Tavasszy, 2013). Here, we use the term DCGM with reference to a framework with two defining features (Section 2). First, the unobserved bilateral flows to be estimated behave according to some version of the gravity equation. Second, the total flows into and out of each region (e.g., overall supply and use) are known or can at least be estimated. It is worth noting that the DCGMs of the transportation literature have one additional characteristic: they use a specific form of the gravity equation derived via maximum entropy arguments and accordingly calibrate the key parameter of the model to cost data (Wilson, 1967, 1970). In our case, by contrast, the gravity equation is simply assumed to hold. It is stated, however, in a form general enough to accommodate maximum entropy formulations, specifications derived from economic theory, as well as a variety of empirical implementations encountered in the applied trade literature. This flexibility expands the range of empirical techniques that can be used to determine the model's parameters. For example, a gravity specification based on microeconomic principles (e.g., Anderson & Van Wincoop, 2004; Chaney, 2018) can help establish a link between the unknown parameters of the interregional model and those of a behavioral equation that can be estimated econometrically from international trade data. Furthermore, in cases where the analyst has no other option but to make numerical assumptions about parameter values, it ensures that those assumptions can at least be informed by a large body of empirical literature. This paper contributes to the methodological discussion on interregional trade estimation in four ways. 
First, it calls attention to the fact that estimation procedures based on the DCGM actually impose weaker restrictions on the data-generating process than is commonly assumed. Second, it shows that DCGM estimation of interregional trade need not be as data- and computationally intensive as the few existing applications suggest. In fact, under ideal conditions, a very simple procedure, which we refer to as the "plain vanilla" DCGM estimator, is enough to recover an unknown OD matrix of trade with perfect accuracy. Third, we make a case that several experiences described in the literature, although seemingly unrelated, can be thought of as applications of the DCGM approach. Finally, we examine how the DCGM estimator might behave in real-world applications through a series of Monte Carlo experiments (Section 3). Focusing on the plain vanilla DCGM, we assess how sensitive the bilateral trade estimates are to various types of model misspecification and data quality problems (Section 4). This exercise provides useful indications regarding not only the likely magnitude of the estimation error in applications, but also any systematic biases in the trade estimates. The paper also includes an example of how the plain vanilla DCGM estimator can be operationalized in a real-world application (Section 5).
Suppose that an economy of interest consists of n > 1 regions and an unspecified number of traded commodities. Our analysis will only look at one of those commodities (a 'widget'), but the same line of reasoning applies to any other traded product. For i, j = 1,…, n, let xij denote the overall amount of widgets produced in region i and used in region j during a certain time period of interest. Taken together, this collection of bilateral flows defines an n × n OD 'trade matrix', X. The analyst's problem is that X, which contains information necessary for applied work, cannot be observed and must therefore be estimated.
With a view to estimating X, a useful starting point is represented by the following special case. With bilateral trade defined according to Equation 1, in which each flow xij is the product of an origin factor ai, a destination factor bj and a separation factor fij, suppose that the value of the separation factor fij can be observed for all i, j pairs. Suppose further that the marginal totals of the trade matrix are also available. Thus, region i's total widget production mi* = ∑j xij and region j's total widget demand m*j = ∑i xij are known for all i and j. Under these assumptions, standard arguments imply that the origin and destination factors (the ai's and the bj's of Equation 1) can be solved for up to a constant factor (Bacharach, 1970; Idel, 2016; Miller & Blair, 2009; Sen & Smith, 1995). Effectively, this means that the true trade matrix X can be recovered exactly. To do so, a simple iterative procedure known as the RAS algorithm can be used. In a nutshell, RAS rescales bi-proportionally a matrix of initial values (the "prior" matrix) so that the resulting matrix (the "scaled" matrix) matches a set of pre-specified row and column totals (the "target" totals). In our case, X emerges as the scaled matrix when RAS is applied to the prior matrix F = [fij] with the target totals provided by the m's. A more formal discussion of this point can be found in the Appendix. This idealized setup in which the true trade matrix can be recovered exactly will be referred to as the DCGM.
In passing, it should be noted that this specification of the separation factors requires that distances be strictly positive. This rules out several simple metrics like flat-earth distances between region centroids (or capital cities), as those imply dij = 0 whenever i = j. The simple distance decay process of Equation 7 could be enriched with additional explanatory variables. Suppose, for example, there is reason to believe that there is something systematically different about intraregional trade.
This can be accommodated in our framework by letting the separation factor fij depend, in addition to distance, on the indicator I(i = j), where I(·) is the indicator function, that is, I(i = j) takes the value one for intraregional flows and zero in all other cases. Naturally, flexibility comes at the price of increased data requirements, as more elaborate models inevitably have more parameters (e.g., not only the distance elasticity θ, but also the home bias parameter λ) for which estimates have to be obtained. Irrespective of what assumptions and data sources are employed to construct the f̂ij, once f̂ij, m̂i* and m̂*j are available for all i and j, estimating the trade matrix amounts to a standard application of the RAS algorithm: to find X̂, one only needs to scale the matrix F̂ = [f̂ij] to the estimated marginal totals. It is worth emphasizing that DCGM estimation neither requires any information nor makes any assumptions about the origin and destination factors. Identifying the functional form, explanatory variables and parameters of the ai's and the bj's in Equation 1 is not necessary.
The remainder of this paper will focus predominantly on the case in which the separation factors are specified according to the power law of Equation 7. To distinguish it within the broader class of DCGM estimators, this approach will be referred to as the "plain vanilla" DCGM estimator. The reason for our emphasis on the plain vanilla estimator is that Equation 7 represents the standard way in which distance decay is modeled in applied trade economics. Thus, its single unknown parameter sits in the middle of a considerable body of theoretical and empirical literature.
Several approaches to interregional trade estimation documented in the literature can be thought of as applications of the plain vanilla DCGM. Among these, there are studies explicitly inspired by the DCGMs of transportation planners, such as Lindall et al. (2006) and Johansen et al. (2018), but there are also studies that do not put themselves in that tradition. The latter include Fournier Gabela (2020), Riddington et al. (2006), Sargento et al. (2012) and Yamada (2015). In these analyses, initial estimates of bilateral trade are first predicted using an empirical gravity equation (e.g., one of an econometric nature) and then improved upon using the RAS algorithm. As a corollary, this argument implies that the values selected for the parameters of Equation 8 that scale flows only by origin, only by destination, or uniformly do not affect the value of X̂. Even if the corresponding true values were known, the accuracy of the trade estimates would not improve. In the same spirit, augmenting the model of Equation 5 with an additional explanatory variable does not affect the trade estimates as long as that variable enters the equation multiplicatively and its value varies only by origin or by destination (e.g., degree of specialization in Sargento et al., 2012). This is true whether or not the variable in question is a meaningful determinant of interregional trade. Similar considerations apply to variables that vary only across product categories (e.g., the tax and margin rates in Johansen et al., 2018). Just like X, the estimated trade matrix X̂ is compatible with an entire family of gravity model specifications of which Equation 5 is just one member.
In fact, the relevance of this analysis extends beyond the estimation methods that are explicitly based on the gravity model. Specifically, there are interesting implications for those estimation procedures that, although building on a theoretical framework other than the gravity model, still make use of RAS scaling. For example, one of the techniques Distefano et al. (2019) use to reconstruct an unknown bilateral trade network is to apply the RAS algorithm to a prior matrix whose entries are, up to origin- and destination-specific factors, inversely proportional to bilateral distance. This procedure can be conceptualized as the plain vanilla DCGM estimator with θ̂ set equal to 1. In turn, this provides an intuitive interpretation for the RAS scaling factors: they represent estimates of a gravity model's origin and destination factors.
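The two properties discussed above (exact recovery of the trade matrix under ideal conditions, and the irrelevance of multiplicative origin- or destination-specific terms in the prior) can be checked numerically. The following is a minimal sketch in plain Python rather than the R code used for the paper's analyses; the function names, the three-region distance matrix, the factors `a`, `b`, `u`, `v` and the value of `theta` are all illustrative assumptions, not taken from the paper.

```python
def ras(prior, row_targets, col_targets, tol=1e-10, max_iter=10_000):
    """Bi-proportionally scale `prior` so its row and column sums hit the targets."""
    n_rows, n_cols = len(prior), len(prior[0])
    r = [1.0] * n_rows  # row multipliers (estimates of the origin factors)
    s = [1.0] * n_cols  # column multipliers (estimates of the destination factors)
    for _ in range(max_iter):
        # Row step: rescale each row to its target total.
        for i in range(n_rows):
            r[i] = row_targets[i] / sum(prior[i][j] * s[j] for j in range(n_cols))
        # Column step: rescale each column to its target total.
        for j in range(n_cols):
            s[j] = col_targets[j] / sum(r[i] * prior[i][j] for i in range(n_rows))
        # Columns are now exact; stop once the rows are (relatively) exact too.
        row_sums = [sum(r[i] * prior[i][j] * s[j] for j in range(n_cols))
                    for i in range(n_rows)]
        if all(abs(row_sums[i] - row_targets[i]) <= tol * row_targets[i]
               for i in range(n_rows)):
            break
    scaled = [[r[i] * prior[i][j] * s[j] for j in range(n_cols)]
              for i in range(n_rows)]
    return scaled, r, s


def max_abs_diff(x, y):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(p - q) for xr, yr in zip(x, y) for p, q in zip(xr, yr))


# Hypothetical three-region economy (all numbers are illustrative).
theta = 0.9
dist = [[10.0, 120.0, 300.0],
        [120.0, 15.0, 200.0],
        [300.0, 200.0, 20.0]]   # strictly positive, including intraregional
a = [2.0, 1.0, 3.0]             # "true" origin factors, unknown to the analyst
b = [1.0, 4.0, 2.0]             # "true" destination factors, also unknown
n = len(dist)

F = [[dist[i][j] ** -theta for j in range(n)] for i in range(n)]  # power-law prior
X = [[a[i] * b[j] * F[i][j] for j in range(n)] for i in range(n)]  # true flows
rows = [sum(row) for row in X]
cols = [sum(X[i][j] for i in range(n)) for j in range(n)]

# 1) Scaling F to the true marginal totals recovers X.
X_hat, r, s = ras(F, rows, cols)

# 2) Distorting the prior by arbitrary origin (u) and destination (v) terms
#    leaves the scaled matrix unchanged.
u = [5.0, 0.2, 1.7]
v = [0.3, 9.0, 2.5]
G = [[u[i] * v[j] * F[i][j] for j in range(n)] for i in range(n)]
X_hat_g, _, _ = ras(G, rows, cols)

print("max error, prior F:", max_abs_diff(X_hat, X))
print("max error, prior G:", max_abs_diff(X_hat_g, X))
```

In this sketch the products `r[i] * s[j]` reproduce `a[i] * b[j]`, whereas `r` and `s` individually are pinned down only up to a constant factor, which is consistent with reading the scaling factors as estimates of origin and destination factors.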
More generally, it is common practice for analysts to 'balance' their interregional trade estimates, however obtained, to ensure consistency with the broader accounting framework in which they are embedded. Often, this is done using some version of the RAS method (Gallego & Lenzen, 2009; Haddad et al., 2011; Zhao & Squibb, 2019). When looked at through the lens of the DCGM, such estimation approaches comprise a full, though implicit, specification of the separation factors. In addition, our analysis of Equation 8 implies that seemingly important variables used in the construction of the pre-balancing estimates may end up having no effect whatsoever on the post-balancing estimates. In particular, this is the case for any variable whose inclusion only augments the model with a multiplicative term that scales proportionally all flows from a certain origin or to a certain destination.
Section 2 has shown that, under certain assumptions, an unknown OD matrix of interregional trade can be recovered with perfect accuracy using the plain vanilla DCGM estimator. Those assumptions are that: (i) the bilateral flows represented in the trade matrix follow the deterministic gravity model of Equation 1 with the separation factors given by the distance decay function in Equation 4; (ii) the elasticity θ of trade to distance is known; and (iii) the row and column totals of the matrix, the m's, are observed without uncertainty. In real-world applications, it is highly unlikely that any of these assumptions will hold exactly. On the one hand, any empirically feasible model specification can only approximate the actual trade flows. On the other, both the distance elasticity and the marginal totals of the trade matrix will be estimated with some degree of error. How can these departures from the assumptions be expected to affect the accuracy of the estimated trade matrix?
This type of sensitivity analysis is complicated by the fact that DCGM estimates are obtained using the RAS method. There is unfortunately no analytical expression either for the scaled matrix or for its derivatives with respect to the prior matrix and the target totals. Thus, to investigate how plain vanilla DCGM estimation might perform in real-world applications, we use a Monte Carlo simulation approach. In each simulation run, we construct a hypothetical trade matrix with known properties and imagine having to recover it from partial information. We examine a variety of scenarios that differ in the type and quality of the information that is presumed available. All analyses are carried out in the R environment for statistical computing (R Core Team, 2016).
In our experiments, it will be necessary to simulate imperfect knowledge of the trade matrix's row and column totals. Again, the true marginal totals of X are given by mi* = ∑j xij and m*j = ∑i xij for all i and j. We assume that the analyst cannot observe these quantities, but only the corresponding estimates m̂i* = ηi* mi* and m̂*j = η*j m*j, where the ηs represent stochastic multiplicative error terms. In any given region i, the pair (ηi*, η*i) is drawn independently from a bivariate normal distribution with mean one, standard deviation ση and correlation ρη. Here, ρη is presumed to be non-negative, reflecting the fact that, given how estimations are likely to be carried out in practice, row and column totals that refer to the same region will generally be biased in the same direction. The m̂s are balanced to ensure that the resulting DCGM estimation problem is well-behaved (i.e., that ∑i m̂i* = ∑j m̂*j).
To assess how the plain vanilla DCGM estimator performs when the assumptions that guarantee its accuracy are relaxed, we conduct four types of analyses. Initially, the empirical significance of each of the assumptions is investigated in isolation.
Thus, a first set of simulations examines the implications of misspecifying the distance elasticity, while retaining the assumptions that the trade matrix follows a deterministic gravity model and that its marginal totals are known (experiment 1). A second set of simulations considers the consequences of inaccurately estimated row and column totals in the context of a deterministic gravity equation with known distance elasticity (experiment 2). A third set of simulations allows bilateral trade flows to randomly depart from the deterministic gravity equation, but still posits that the distance elasticity and the marginal totals are accurate (experiment 3). Finally, we try to determine the likely magnitude and pattern of the estimation errors under more realistic scenarios in which several assumptions are violated at the same time (experiment 4). All simulation experiments consist of one thousand replications for each combination of parameter values.
In each simulation run, a trade matrix estimate X̂ has to be compared with its true counterpart X. The difference between corresponding elements, x̂ij − xij, will be referred to as a "residual." Following Sargento et al. (2012), we condense the residuals into a single scalar measure of estimation error using the standard total percentage error (STPE), which expresses the sum of the absolute residuals as a percentage of the sum of the true flows. Although a useful gauge of overall discrepancy, the STPE does not convey any information as to which trade flow estimates are biased more severely and in what direction. In the analyses below, it is therefore complemented by several other dissimilarity measures. For example, we will often be interested in assessing whether there is a general tendency for intraregional flows to be over- or underestimated. For that purpose, we will compute the diagonal total percent error (DTPE), which summarizes the estimation error associated with total intraregional trade, that is, with the sum of the diagonal entries of the trade matrix.
In applied work, an accurate estimate of the distance elasticity θ may not be easy to obtain. How is the validity of the trade estimates affected if the assumed elasticity θ̂ differs from the true θ of the gravity model?
To explore this issue, we examine what happens when we try to recover a true matrix generated with θ = 0.9 under various alternative choices of θ̂. Specifically, the analysis considers values of θ̂ at 0.1 intervals in the [0.3, 1.5] range. This range contains approximately 90% of the distance elasticity estimates reviewed by Disdier and Head (2008). Throughout, it is assumed that both σɛ and ση are zero. Table 1 describes the distribution over simulation runs of several measures of dissimilarity between the true and the estimated trade matrix. The symbols M and SD respectively refer to the mean and the standard deviation of the simulated distributions. As expected, when θ̂ is set equal to the actual distance elasticity of the model, the trade matrix is recovered exactly. Indeed, it can be easily verified that in this case the STPE (column (1)) is always zero. As long as the assumed distance elasticity remains reasonably close to the truth, the estimated matrix exhibits tolerable levels of error. As θ̂ moves further away from θ in either direction, both the mean STPE and its standard deviation increase.
Whether θ̂ is over- or understated, the effects on the STPE are analogous. The corresponding estimates, however, deviate from the true trade matrix in radically different ways. This is apparent in Figure 1, where the same simulated trade matrix is estimated four times for different values of θ̂. Each panel plots element-by-element percentage errors, PEij = 100(x̂ij − xij)/xij, against bilateral distances. In relative terms, a large value of θ̂ implies a steep distance decay process. Thus, when θ̂ > θ, the estimated matrix overstates trade between nearby locations and understates trade across larger distances. For θ̂ < θ, the reverse happens. A natural way of measuring the intensity of biases of the type displayed in Figure 1 is through the slope of the ordinary least squares regression of PEij on log dij. In Table 1, column (2) summarizes its simulated distribution under varying distance elasticity assumptions.
As θ̂ increases, the relationship between the relative estimation error associated with a certain trade flow and the distance between its origin and its destination goes from steeply positive to steeply negative. In column (3), the R-squared of the regression suggests that the relationship is relatively tight at all levels of θ̂. From these results, it is clear that the single-simulation exercise of Figure 1 is just an instance of a much more general pattern. The implications for intra-regional trade estimation are worth noting. By nature, intra-regional flows take place over comparatively short distances. In the simple setup of these simulations, this means that whenever θ̂ is specified incorrectly, the resulting intra-regional trade estimates are all biased in the same direction: upwards when the distance elasticity is set too high and downwards when it is set too low. This can indeed be seen in column (4), which reports what share of the trade matrix's diagonal entries is overestimated for varying choices of θ̂.
Finally, we examine how the results of Table 1 would be affected if a number of regions other than n = 25 were assumed. This sensitivity analysis considers two alternative values of n, namely 10 and 50. For reasons of space, the full results of this exercise are only displayed in the supplemental materials (Table S1). As expected, in qualitative terms they are entirely analogous to those of Table 1. From a quantitative perspective, everything else being the same, the bias associated with an inaccurate choice of θ̂ becomes more severe as the number of regions grows. This makes intuitive sense: an increase in n makes the problem of recovering the trade matrix more difficult, as the number of unknown matrix entries rises more rapidly than the number of constraints provided by the row and column totals. Loosely speaking, when the number of regions becomes larger, the estimator leans more heavily on the distance elasticity parameter.
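The logic of the distance elasticity experiment can be illustrated with a small deterministic example. The sketch below is not the paper's Monte Carlo design: the five-region layout, the intraregional distance of 25, the origin and destination factors, and all function names are hypothetical. The STPE is computed as the sum of absolute residuals over the sum of true flows, and the DTPE is assumed here to be the signed percentage error in the sum of the diagonal entries.

```python
def ras(prior, row_targets, col_targets, tol=1e-10, max_iter=10_000):
    """Scale `prior` bi-proportionally to the target row and column totals."""
    n, m = len(prior), len(prior[0])
    r, s = [1.0] * n, [1.0] * m
    for _ in range(max_iter):
        for i in range(n):  # row step
            r[i] = row_targets[i] / sum(prior[i][j] * s[j] for j in range(m))
        for j in range(m):  # column step
            s[j] = col_targets[j] / sum(r[i] * prior[i][j] for i in range(n))
        rows = [sum(r[i] * prior[i][j] * s[j] for j in range(m)) for i in range(n)]
        if all(abs(rows[i] - row_targets[i]) <= tol * row_targets[i]
               for i in range(n)):
            break
    return [[r[i] * prior[i][j] * s[j] for j in range(m)] for i in range(n)]


def power_law(dist, theta):
    """Separation factors under power-law distance decay: d ** (-theta)."""
    return [[d ** -theta for d in row] for row in dist]


def stpe(est, true):
    """Sum of absolute residuals as a percentage of the sum of true flows."""
    num = sum(abs(e - t) for er, tr in zip(est, true) for e, t in zip(er, tr))
    return 100.0 * num / sum(t for tr in true for t in tr)


def dtpe(est, true):
    """Signed percentage error in total intraregional trade (assumed form)."""
    est_diag = sum(est[i][i] for i in range(len(est)))
    true_diag = sum(true[i][i] for i in range(len(true)))
    return 100.0 * (est_diag - true_diag) / true_diag


def experiment(dist, a, b, theta_true, theta_hat):
    """Build a deterministic gravity matrix, then re-estimate it with theta_hat."""
    n = len(dist)
    f = power_law(dist, theta_true)
    true = [[a[i] * b[j] * f[i][j] for j in range(n)] for i in range(n)]
    row_totals = [sum(row) for row in true]
    col_totals = [sum(true[i][j] for i in range(n)) for j in range(n)]
    est = ras(power_law(dist, theta_hat), row_totals, col_totals)
    return stpe(est, true), dtpe(est, true)


if __name__ == "__main__":
    # Hypothetical layout: five regions on a line, 100 distance units apart,
    # with an assumed intraregional distance of 25 units.
    n = 5
    dist = [[25.0 if i == j else 100.0 * abs(i - j) for j in range(n)]
            for i in range(n)]
    a = [1.0, 2.0, 1.5, 0.5, 3.0]  # illustrative origin factors
    b = [2.0, 1.0, 0.5, 1.5, 1.0]  # illustrative destination factors
    for theta_hat in (0.6, 0.9, 1.2):
        s, d = experiment(dist, a, b, theta_true=0.9, theta_hat=theta_hat)
        print(f"theta_hat={theta_hat:.1f}  STPE={s:.2f}%  DTPE={d:+.2f}%")
```

Because the marginal totals are exact and the gravity equation is deterministic, any error reported by this sketch is attributable to the gap between `theta_hat` and `theta_true` alone, which mirrors the setup of experiment 1.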
As the number of regions in the system grows, how fast does the accuracy cost of a bad choice of θ̂ increase? Suppose, for example, that θ̂ is set equal to 1.2. If n = 10, this leads on average to an STPE of 15% (SD 2.1). By contrast, when n = 50 the average STPE becomes 19% (SD 1.5). It is tempting to interpret these figures as indicating that, even in the face of a considerable increase in the number of regions, the accuracy of the estimates deteriorates only modestly. Once again, however, focusing solely on the STPE misses the fact that a bad choice of θ̂ introduces a highly systematic bias. Consider, for instance, the estimation error associated with total intra-regional trade as summarized by the DTPE. With θ̂ = 1.2, when the system consists of 10 regions, intraregional trade is overestimated by 20% (SD 2.2) on average. When the number of regions increases to 50, the average DTPE becomes as large as 46% (SD 2.4). Abstracting from this specific example, Figure 2 represents the simulated distribution of the DTPE for various combinations of θ̂ and n. It is indeed apparent that intra-regional trade estimates are biased much more severely by distance elasticity misspecification if the number of regions is large than if it is small.
In reality, the row and column totals of the target matrix are generally unknown. Instead, they have to be estimated indirectly from the available data. Thus, one will typically have to work with target totals that contain at least some amount of measurement error. How will this affect the accuracy of the estimated trade matrix? In our simulation framework, the degree of reliability of the available marginal totals is controlled by the parameter ση. We assess the repercussions of measurement error in row and column totals for four alternative values of ση, namely 0.025, 0.05, 0.10 and 0.15.
Given that the ηs are normally distributed, setting ση equal to, say, 0.05 implies that approximately 95% of the time the estimated marginal totals will be within 10% of their true values. Thus, our choices for ση denote degrees of measurement error that range from modest to severe. In this analysis, it is assumed that the distance elasticity is known (i.e., θ̂ = θ) and that the underlying gravity model is deterministic (σɛ = 0). Initially, ρη is set equal to zero. The main results are presented in Table 2. Naturally, as increasing amounts of measurement error are introduced in the marginal totals, the estimated matrix gradually departs from its true counterpart. Thus, the simulated STPE distribution (column (1)) is centred at 3% (SD 0.3) for ση = 0.025 and at 17% (SD 2.2) for ση = 0.15. At the element-by-element level, inaccurate row and column totals give rise to a distinctive pattern of residuals. Depending on its sign, measurement error in a certain marginal total tends to push or pull all entries of the corresponding row or column in the same direction. Evidence of this can be found in column (2), which describes the simulated distribution of the 'row-wise rate of error concordance'. By row-wise rate of error concordance we refer to the percent share of the trade matrix's entries for which the estimated flow is

References