The evolution of Zipf's Law for U.S. cities
2019; Elsevier BV; Volume: 99; Issue: 3; Language: English
10.1111/pirs.12498
ISSN 1435-5957
Authors: Angelina Hackmann, Torben Klarl
Topic(s): Human Mobility and Location-Based Analysis
Abstract: Exploiting the hierarchical structure of cities and using a dataset for U.S. cities between 1840 and 2016, this paper analyzes the evolution of the U.S. city size distribution. For that purpose, we estimate a general three-parameter Zipf model, which can be traced back to Mandelbrot (1982), and validate our results by means of the hierarchical scaling law. Especially in the second half of the twentieth century, we find a pronounced departure from the exact Zipf's law. City sizes have become more equally distributed over time. In addition, the applied estimation method reveals evidence of leading cities dominating the remaining largest cities. Thus, the growing equality of city sizes can be explained by the growth of smaller cities rather than by a loss of importance of the largest ones.
Gabaix (1999) proposed that "[...] city size processes must have the time to converge to Zipf's law". Accordingly, the city size process can be described as an evolutionary process in which different stages of urbanization require different forms of the city size distribution. In other words, even if the city size process converges to the well-known exact Zipf's law [1], this law need not fit every stage of urbanization.

[1] Based on the family of Zipf models, the terms "exact Zipf's law" and "one-parameter Zipf model" are used interchangeably in this paper. If the scaling exponent differs from one, we obtain a two- or three-parameter Zipf model. See section 2 for more details.

Using a novel methodology and a dataset for U.S. cities between 1840 and 2016, this paper aims to answer four questions. First, which member of the family of Zipf models describes the U.S. city size distribution? Second, does the U.S. city size distribution show a smooth transition towards the exact Zipf's law from the beginning, or are there periods of pronounced departure from Zipf's law? Third, if there are periods of departure, are city sizes more or less equally distributed than the exact Zipf's law predicts? Fourth, using information from the hierarchical structure of cities, do we find evidence for primate cities in every period? To answer the first three questions, we estimate a more general three-parameter Zipf model, which can be traced back to Mandelbrot (1982).
To validate these results and to answer the fourth question, we make use of the finding by Chen (2012b), who shows that Zipf's law can be derived from the hierarchical scaling law based on a hierarchical urban structure. Intuitively, if the top level of a hierarchy is vacant, we can conclude that there is no evidence for primate cities.

The paper makes the following points. First, for the great majority of the examined years between 1840 and 2016, the U.S. city size distribution can be described by a two-parameter Zipf model with a decreasing scaling exponent. From this result we conjecture that U.S. city sizes have become more equally distributed over time, thereby diverging from the exact Zipf's law. Moreover, we find evidence for leading cities dominating the remaining largest cities, which indicates that the growing equality of city sizes is due to the growth of smaller cities rather than to a loss of importance of the largest cities. Relating our results to the findings of Black and Henderson (2003) and Dobkins and Ioannides (2000), we further conclude that, especially in the last decades of the twentieth century, the growth of the largest U.S. areas has mainly taken the form of suburbanization.

The next section outlines the one-, two- and three-parameter Zipf models. Section 3 shows the correspondence between Zipf's law and the hierarchical scaling law. Section 4 presents the estimation and validation procedure, and section 5 presents the data. Section 6 discusses the results. Section 7 contrasts our findings with the relevant literature and concludes.

In this section we briefly introduce the three-, two- and one-parameter Zipf models. Suppose that P(k) denotes the size of the city with rank k, where the largest city has rank 1. Further, let ς denote a scale-translational parameter and define α as the scaling exponent.
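For reference, the three members of the family can be written compactly. This is a sketch in one common parametrization of Mandelbrot's generalization: the symbols P(k), C, ς and α, and the sign of the shift k + ς, are our assumptions and may differ from the paper's own equations. Under this convention, a positive ς flattens the top of the rank-size curve, so the largest cities are smaller than a pure power law fitted to the rest would predict.

```latex
\begin{aligned}
\text{three-parameter:} &\quad P(k) = C\,(k+\varsigma)^{-\alpha},\\
\text{two-parameter } (\varsigma = 0): &\quad P(k) = C\,k^{-\alpha}
  \;\Longleftrightarrow\; \ln P(k) = \ln C - \alpha \ln k,\\
\text{one-parameter } (\varsigma = 0,\ \alpha = 1): &\quad P(k) = P(1)\,k^{-1}.
\end{aligned}
```

The log-linear form of the two-parameter model is what makes the scaling range appear as a straight line of slope -α on a log-rank/log-size plot; in the one-parameter case the constant C reduces to the size of the largest city, P(1).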
As shown by Chen (2012b), Zipf's law can be transformed into the hierarchical scaling law, which can be used to reveal the scaling relations in the hierarchical structure of city sizes. We use the mathematical relationship between the two models to validate the estimated one-, two- or three-parameter Zipf model. In what follows, we briefly show the correspondence between the hierarchical scaling law and Zipf's law.

Based on Chen (2016), this section presents the estimation and validation procedure used to study the evolution of the U.S. city size distribution. As a first step, the scaling range is determined, i.e., the range of ranks over which the plot of the logarithm of city size (y-axis) against the logarithm of city rank (x-axis) forms a straight line. Cities beyond this scaling range represent underdeveloped cities and are not considered in the analysis. An OLS estimation yields a residual for each city, from which standardized residuals can be calculated. As proposed by Chen (2015), if a standardized residual is smaller than -2 or larger than 2, the associated data point is treated as an outlier at roughly the 5% significance level and is left out of the estimation.

The estimated model can be transformed into the hierarchical scaling law in Equation (7), which is based on a hierarchy constructed by either the city number law or the city size law. To validate our Zipf models, we apply the city number law in Equation (5). Given a number ratio of, e.g., 2, the numbers of cities in the successive levels form a geometric sequence such as 1, 2, 4, 8, 16, .... The average city size at each level can then be easily calculated, yielding a number-based urban hierarchy. A least-squares calculation shows whether the hierarchical scaling law fits this hierarchical dataset well and, thereby, whether our estimated Zipf model can be validated.
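The estimation step just described (OLS on the log-rank/log-size plot, followed by Chen's (2015) outlier rule) can be sketched in Python as below. This is a minimal illustration under our own assumptions, not the authors' code: the function names are hypothetical, all cities passed in are assumed to lie inside the scaling range, residuals are standardized simply as e_i / s with s the residual standard error, and flagged points are dropped once before a single refit.

```python
import math


def ols(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, residuals, r2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    syy = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - sum(e * e for e in res) / syy
    return a, b, res, r2


def fit_log_rank_size(sizes, z_crit=2.0):
    """Hypothetical helper: fit ln P(k) = ln C - alpha * ln k (no shift),
    drop points whose standardized residual lies outside [-z_crit, z_crit],
    and refit once on the remaining points."""
    ranked = sorted(sizes, reverse=True)          # rank 1 = largest city
    ranks = list(range(1, len(ranked) + 1))
    xs = [math.log(k) for k in ranks]
    ys = [math.log(p) for p in ranked]
    _, _, res, _ = ols(xs, ys)
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in res) / (len(res) - 2))
    if s < 1e-12:
        keep = ranks                              # numerically perfect fit: nothing to flag
    else:
        keep = [k for k, e in zip(ranks, res) if abs(e / s) <= z_crit]
    a, b, _, r2 = ols([math.log(k) for k in keep],
                      [math.log(ranked[k - 1]) for k in keep])
    return {"alpha": -b, "C": math.exp(a), "r2": r2, "kept_ranks": keep}


if __name__ == "__main__":
    # Synthetic example: a pure power law with alpha = 0.8, with the
    # smallest city pushed far below the line.
    sizes = [1e6 * k ** -0.8 for k in range(1, 101)]
    sizes[-1] *= 0.3
    fit = fit_log_rank_size(sizes)
    print(round(fit["alpha"], 6), len(fit["kept_ranks"]))
```

Because this fit fixes the shift at zero, α is simply the negative OLS slope; the scale-translational parameter would be handled separately by comparing fits across its candidate values.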
If the difference between the scaling exponents obtained from the Zipf model and from the hierarchical scaling law is small, we can be confident that the city structure follows the hierarchical scaling law in Equation (7) and that the estimated Zipf model is validated.

To study the evolution of the U.S. city size distribution, we use a dataset from the U.S. Bureau of the Census. It contains population data for the 100 largest urban places in the U.S., which we refer to as "cities" in this paper [5]. The data can be accessed online from https://www.census.gov/population/www/documentation/twps0027/twps0027.html\urban, http://demographia.com/db-uscity98.htm and https://www.census.gov/data/tables/2016/demo/popest/total-cities-and-towns.html. As a sample selection criterion, we follow Rosen and Resnick's (1980) number threshold approach and examine a fixed number of cities every ten years from 1840 to 2010, as well as for 2016 [6]. Besides the size distribution of the 100 largest cities, for the last four dates we additionally consider larger samples to check whether the results change when more cities are included. The dataset is therefore supplemented by the sizes of the 601 largest cities for 1990 and 2000 and of the 300 largest cities for 2010 and 2016.

[5] Before 1950, urban places were defined as incorporated places with at least 2,500 inhabitants. Since 1950, the Census Bureau has distinguished between large cities, which are considered in our study, and urbanized areas, in order to account for suburban areas in the vicinity of large cities.

[6] An overview of the number of inhabitants for selected years can be found in Table 1.

We follow Chen's (2016) procedure: we estimate the scaling exponent for different values of the scale-translational parameter and stop when the goodness of fit (R²) reaches its highest value [7]. For most of the years, this maximum is attained at ς = 0 (see Table 2), thus rejecting the three-parameter Zipf model. Figure 1 shows the evolution of the scaling exponent over time.

[7] As an example, the estimation results for the years 1880 and 2016 are depicted in Figure 2.
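The search over the scale-translational parameter can be illustrated with a short, hypothetical Python sketch. It assumes the shifted form P(k) = C(k + ς)^(-α), where the sign convention of the shift is our assumption; for each ς on a user-supplied grid it regresses ln P(k) on ln(k + ς) by OLS and keeps the value with the highest R². For brevity it omits the outlier step and any significance test, so it shows only the "maximize the goodness of fit" part of the procedure.

```python
import math


def fit_shifted(sizes, shift):
    """OLS of ln P(k) on ln(k + shift); returns (alpha, r2).
    'shift' stands in for the scale-translational parameter under the
    assumed form P(k) = C * (k + shift) ** (-alpha)."""
    ys = [math.log(p) for p in sorted(sizes, reverse=True)]
    xs = [math.log(k + shift) for k in range(1, len(ys) + 1)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx, sxy * sxy / (sxx * syy)    # slope-based alpha, simple-regression R^2


def best_shift(sizes, grid):
    """Return (shift, alpha, r2) for the grid value with the highest R^2."""
    best = None
    for shift in grid:
        if 1 + shift <= 0:                        # ln(k + shift) must exist for rank 1
            continue
        alpha, r2 = fit_shifted(sizes, shift)
        if best is None or r2 > best[2]:
            best = (shift, alpha, r2)
    return best


grid = [i / 2 for i in range(-1, 9)]              # -0.5, 0.0, 0.5, ..., 4.0
pure = [5e5 * k ** -0.8 for k in range(1, 101)]
shifted = [5e5 * (k + 2) ** -1.1 for k in range(1, 101)]
print(best_shift(pure, grid)[0], best_shift(shifted, grid)[0])
```

A pure power law is best fitted with no shift, which corresponds to rejecting the three-parameter model, while data generated with a shift recover that shift on the grid.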
Further, the scaling exponent α decreases over the time horizon (see Figure 1). For the first 60 years (from 1840 to 1900), α fluctuates around one. From 1910 to 1950, α remains roughly constant. Starting in 1950, the estimated scaling exponent decreases distinctly until 1990 and declines further until 2016. A scaling exponent significantly lower than one indicates that city sizes are more equally distributed than expected under the exact Zipf's law.

However, there are exceptions to this pattern. In particular, we cannot reject the exact Zipf's law (ς = 0 and α = 1) for 1860 and 1870. In 1880 and 1890, the goodness of fit did not reach its maximum at ς = 0 but at a nonzero ς; hence, the U.S. city size distribution followed the three-parameter Zipf model in these years. For 1850, the optimal parameter values yield a special form of the two-parameter model. As an example, Figure 2 compares the two- and three-parameter fits for 1880 and 2016. For 1880, the city size distribution is most accurately described by a three-parameter model (ς ≠ 0), as the largest city is too small to dominate the remaining cities. Hence, there is a gap between the actual largest city in the data and the largest city predicted by the model. For 2016, by contrast, the two-parameter model (ς = 0) fits the data most accurately, and we find evidence for a leading city dominating the remaining largest cities.

To check the robustness of our findings, we consider larger datasets, which yield rather similar estimation results [7].

[7] For 1990, the 587 largest cities and for 2000 all 601 cities lie within the scaling range. In 2010, 299 cities and in 2016 all 300 cities are included in the estimation. Section 4.1 describes in detail how the scaling range is determined.
For these larger samples, we find that ς = 0 is optimal and that the scaling exponent decreases slightly between 1990 and 2016. To sum up, with the exception of the years 1850 to 1890, the U.S. city size distribution significantly (at the 5% level) follows a two-parameter Zipf model in the years 1840-2016, even when larger samples are considered. Hence, we can clearly reject the exact form of Zipf's law for U.S. city data. For most of the years, city sizes are more equally distributed than expected under the exact Zipf's law, and they have become more equally distributed over time.

We exploit the dual relationship between hierarchical scaling and Zipf's law described above to obtain a more precise understanding of the structure of urban hierarchies. In particular, we explore whether the existence of primate cities is a time-invariant feature of the U.S. city size distribution. To answer this question, we have to make sure that the city structure follows a hierarchical scaling law [8].

[8] Detailed information on the construction of urban hierarchies and the validation procedure is given in section 4.3.

According to the city number law in Equation (5), the cities are ranked into seven levels. If the one- or two-parameter Zipf model fits the data, the first level of the hierarchy consists of the largest city, the second level comprises the second and third largest cities, the third level consists of the fourth to seventh largest cities, and so on. The last level would comprise 64 cities, but because our dataset contains only 100 cities, it comprises just 37 (100 - 63); hence, it is not included in the estimation (see Table 3). Estimating a three-parameter model or the special form of the two-parameter model (ς ≠ 0) suggests an absence of leading cities. That is why the first two levels are left vacant when constructing the city hierarchy.
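The level construction described above can be checked with a small Python sketch. It is illustrative only: the helper names are hypothetical, the number ratio defaults to 2, and the hierarchical scaling law is assumed to take the form N_m = μ P̄_m^(-D), where N_m is the number of cities and P̄_m the average city size at level m, fitted by least squares in logs after dropping the incomplete last level.

```python
import math


def number_hierarchy(sizes, ratio=2, vacant_levels=0):
    """Hypothetical helper: split ranked city sizes into levels holding
    1, ratio, ratio**2, ... cities (city number law). With vacant_levels > 0
    the top levels stay empty, so the largest cities start at level
    vacant_levels + 1. The last level keeps whatever cities remain."""
    ranked = sorted(sizes, reverse=True)
    levels, start, m = [], 0, vacant_levels
    while start < len(ranked):
        size_m = ratio ** m                       # nominal number of cities at this level
        levels.append(ranked[start:start + size_m])
        start += size_m
        m += 1
    return levels


def hierarchical_exponent(levels):
    """Least-squares fit of ln N_m = ln mu - D * ln Pbar_m. The last level is
    dropped first, assuming it is incomplete (the 'lame-duck' class). Returns D."""
    used = levels[:-1]
    xs = [math.log(sum(lv) / len(lv)) for lv in used]
    ys = [math.log(len(lv)) for lv in used]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx


sizes = [1e6 * k ** -0.8 for k in range(1, 101)]
print([len(lv) for lv in number_hierarchy(sizes)])                   # [1, 2, 4, 8, 16, 32, 37]
print([len(lv) for lv in number_hierarchy(sizes, vacant_levels=2)])  # [4, 8, 16, 32, 40]
D = hierarchical_exponent(number_hierarchy(sizes))
print(round(1 / D, 3))
```

With only six complete levels, the exponent implied by the hierarchy (1/D) need not coincide exactly with the Zipf exponent even for a pure power law, because the averages in the top levels are taken over very few cities; the comparison is therefore approximate.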
Without the first two levels, the four largest cities are assigned to the third level. Again, the last level, comprising the 40 smallest cities, is not included in the estimation, as it is an incomplete (lame-duck) class [9].

[9] The classification for 1880 and 2016, when a three-parameter model and a two-parameter model hold, respectively, is presented in Table 3.

Table 4 shows that the city structure follows a hierarchical scaling law from 1840 to 1950, as well as for 1990-2016 when the larger datasets are used. For these years, our estimated Zipf models can be validated. We can confirm the absence of leading cities for 1850, 1880 and 1890, in which a three-parameter model or the special form of the two-parameter model was estimated (ς ≠ 0), because the hierarchical scaling law fits an urban structure without top levels. For the remaining years, in which we estimated a Zipf model with ς = 0, the hierarchical scaling law fits an urban structure with the largest cities at the top levels. Hence, we can confirm the existence of leading cities for most of the years. For the 100 largest cities, however, we observe a pronounced divergence from the hierarchical scaling law from 1960 until 2016. This can also be seen by comparing the log-rank/log-size plot with the hierarchical scaling relation between the average city sizes at each level of the U.S. urban hierarchy and the city numbers. To sum up, for the 100 largest cities and for most of the period 1840-2016, we find evidence for leading cities dominating the remaining largest U.S. cities and, from 1960 onwards, a divergence from the hierarchical scaling law.

This paper reveals the following aspects of the evolution of the U.S. city size distribution: (1) Between 1840 and 2016, the 100 largest U.S. cities can mostly be described by a two-parameter Zipf model. (2) For most of the years, the estimated scaling exponent is lower than one, and it has decreased, especially during the second half of the twentieth century. (3) The U.S.
city size distribution has become more even over time and has diverged from the exact Zipf's law. (4) For most of the years, we find evidence for leading cities dominating the remaining largest U.S. cities.

Relating our findings to the relevant literature, it is striking that the great majority of studies use cross-sectional data to check whether Zipf's law holds exactly [10]. For instance, Krugman (1996) and Gabaix (1999) use data for U.S. Metropolitan Statistical Areas (MSAs) and find that the one-parameter Zipf model holds exactly above a minimum threshold of 280,000 inhabitants. These findings were recently confirmed by Schmidheiny and Suedekum (2015) using novel data from an EC-OECD project. Zipf's law also emerges under other city definitions, such as economic areas (Berry and Okulicz-Kozaryn (2012)), natural cities (Jiang and Jia (2011)) or geographic clusters (Rozenfeld et al. (2011)). Other studies find opposing results for the U.S. city size distribution (Eeckhout (2004)) or find that Zipf's law holds only for the upper tail of the distribution, while the body and lower tail are lognormal (Levy (2009), Malevergne et al. (2011) and Ioannides and Skouras (2013)). Using U.S. census data, Soo (2005) finds that the largest cities are more evenly, and the largest urban agglomerations more unevenly, distributed than predicted by the exact Zipf's law (see also Gan et al. (2006) and Ioannides and Overman (2003)). From a long-term perspective on Zipf's law, the results again depend on the city definition employed. For Metropolitan Statistical Areas, Black and Henderson (2003) and Dobkins and Ioannides (2000, 2001) find an increasing urban concentration, which is higher than predicted by the exact Zipf's law. Other authors focus on states (Soo (2012)), counties (Beeson et al.
(2001) and Desmet and Rappaport (2017)) or minor civil divisions (Michaels et al. (2012)) in the U.S.

[10] A detailed review of theoretical and empirical findings on Zipf's law is given by Arshad et al. (2018).

Closest to our study is González-Val (2010). Studying U.S. incorporated places, the author finds that city sizes are lognormally distributed and more unequally distributed than predicted by the exact Zipf's law. Regarding the upper tail of the city size distribution, the author finds that cities become more equally distributed over time. This is in line with our results, which clearly show that from 1960 onwards the scaling exponent drops significantly with each observation year until 2016, indicating more evenly distributed city sizes and a departure from Zipf's law for the 100 largest cities. In contrast to González-Val (2010), who explains the convergence of city sizes by a loss of importance of the largest cities, we find evidence, for most of the time span, of leading cities dominating the remaining largest cities. Our results indicate that the growth of smaller cities plays the main role in the convergence process. At the same time, Black and Henderson (2003) and Dobkins and Ioannides (2000) found that U.S. MSAs became more unequally distributed during the twentieth century. Taken together with the convergence of city sizes that we find, these results confirm increasing suburbanization in the growth of the largest U.S. urban areas starting in the 1960s (Soo (2005)) [11].

[11] According to Boustan and Shertzer (2013), a large portion of suburbanization in the U.S. over the twentieth century can be explained by factors associated with the natural evolution of urbanization, such as rising incomes, which led to a larger demand for housing and land, and transportation improvements, especially the growing network of interstate highways.
Furthermore, the authors state that factors associated with the flight-from-blight theory of suburbanization, such as school quality, taxes, crime rates and the socioeconomic characteristics of the population, reinforced spatial dispersion. See also Mieszkowski and Mills (1993), Bayoh et al. (2006) and Kim (2000).

The main point of this paper is that the U.S. city size distribution has moved away from the exact Zipf's law, especially in the second half of the twentieth century. While leading cities are missing in 1850, 1880 and 1890, they exist in every census year from 1900 onwards. The scaling exponent has decreased, indicating more equally distributed city sizes. In turn, different regimes of Zipf models imply different conditions of city development. The main limitation of this paper is that we cannot identify the driving forces behind this evolution away from the exact Zipf's law. A more elaborate investigation is needed but is beyond the scope of this paper. Besides that, the somewhat subjective definition of a city might influence our results. In this respect, the results cannot be generalized to other countries, or to the same country under a different city definition. However, this problem is common to every study dealing with city-level data.
References