Article · Open access · Peer reviewed

Exploring the Uses of Matched Employer–Employee Datasets

2010; Wiley; Volume: 43; Issue: 2; Language: English

10.1111/j.1467-8462.2010.00594.x

ISSN

1467-8462

Author

Paul H. Jensen

Topic(s)

Social Policy and Reform Studies

Abstract

Australian Economic Review, Volume 43, Issue 2, pp. 209–216

Paul H. Jensen, Melbourne Institute of Applied Economic and Social Research and Intellectual Property Research Institute of Australia, The University of Melbourne

I am grateful to Hielke Buddelmeyer, Dean Hyslop, Eric Iversen, Beth Webster and Mark Wooden for valuable conversations about the benefits, practicalities and difficulties associated with the creation and use of linked employer–employee datasets. This article has benefited from their extensive experience.

First published: 01 June 2010. https://doi.org/10.1111/j.1467-8462.2010.00594.x

1. Introduction

The last decade or so has seen a remarkable increase in the availability of new unit-record datasets that link employers and employees.
Such datasets are now well established in Norway, Japan, New Zealand, the United States and numerous other industrialised countries (see Haltiwanger et al. 1999 and Bryson, Forth and Barber 2006 for collections of papers using these datasets). As their name suggests, the main attribute of this type of dataset is that it provides information on both the employer and the employee, so researchers are able to disentangle the effects of the two when analysing important issues such as wage determination, profitability and productivity. The inability to do so has been a common concern with earlier studies, which accounted only for employee-specific factors. Given the wealth of interesting issues at the intersection of labour and industrial economics for which such data are useful, this raises the question: why has Australia lagged so far behind the rest of the world in creating such datasets?1

As has often been noted, the availability of new data enables researchers to ask interesting (and tractable) policy-relevant questions. For instance, the growth in research output on income and labour dynamics following the creation of panel datasets such as the Panel Study of Income Dynamics (PSID), the Household, Income and Labour Dynamics in Australia (HILDA) Survey and the German Socio-Economic Panel (GSOEP) is a good illustration of this point. This suggests that governments gain important benefits from creating such datasets, because they attract researchers and thereby increase the quantity (and, one hopes, the quality) of the evidence used to design public policy. There is also some evidence that the availability of data generates interesting questions, in addition to the more commonly made argument that causation runs the other way, from questions to data (see Stafford 1986 and Hamermesh 2008 for a discussion of these issues).

In this data survey, we provide a comprehensive overview of the burgeoning literature on the use of matched employer–employee datasets.
Rather than summarising the results of this vast literature, we provide a guide to the main matched employer–employee datasets around the world, an overview of the publications produced using them and an examination of some of the obstacles to their creation in Australia. One often-raised concern about implementing linked employer–employee datasets in Australia is that it would represent an invasion of personal and company privacy. We re-examine the claim that privacy interests outweigh the public interest in light of how privacy concerns have been managed in other countries that have created matched employer–employee datasets.2

2. Rationale for Creating Employer–Employee Datasets

In this section, we discuss the rationale for creating matched employer–employee datasets and provide a snapshot of their use and potential. Although the focus here is on the intersection between labour and industrial economics, there are many other potential applications of matched employer–employee datasets in macroeconomics, sociology and demography (see Hamermesh 2008 for some examples).3

For the most part, matched datasets serve a common objective: to aid analysts in separating employer effects from employee effects. The overarching conclusion that can be drawn from the introduction of employer–employee datasets is that both firms and workers play important roles in explaining observed differences in the earnings and productivity of individual workers; to ignore the effects of one would be to overstate the effects of the other.

One common use of matched employer–employee datasets in labour economics is to test for wage discrimination in the workforce. Labour economics has a long and rich history of attempts to explain observed differences in wages across groups of individuals (for example, by gender, race and ethnicity).
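To make the mechanics of the conventional test concrete, here is a minimal Python sketch, assuming a hypothetical sample with one productivity proxy (years of schooling), a 0/1 group indicator and log wages; the field names `schooling`, `group` and `log_wage` are illustrative and not taken from any dataset discussed here. Log wages are regressed by ordinary least squares on a constant, the proxy and the indicator, and the indicator's coefficient is read as the residual wage gap. As the text goes on to argue, that residual cannot distinguish discrimination from unobserved productivity differences; the sketch only shows what is being computed.

```python
"""Sketch of the conventional residual wage-gap test (hypothetical data).

Log wages are regressed on a constant, one productivity proxy and a 0/1
group indicator; the indicator's OLS coefficient is the 'residual' gap.
"""


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def residual_gap(workers):
    """Coefficient on 'group' in log_wage ~ 1 + schooling + group."""
    rows = [[1.0, w["schooling"], w["group"]] for w in workers]
    y = [w["log_wage"] for w in workers]
    return ols(rows, y)[2]


# Noise-free hypothetical sample with a built-in gap of -0.10 log points.
SAMPLE = [
    {"schooling": s, "group": g, "log_wage": 1.0 + 0.08 * s - 0.10 * g}
    for s, g in [(10, 0), (12, 0), (16, 0), (10, 1), (12, 1), (14, 1)]
]

if __name__ == "__main__":
    print(round(residual_gap(SAMPLE), 4))  # -0.1
```

Solving the normal equations by hand keeps the sketch dependency-free; in practice a numerically stabler routine from a statistics library, with standard errors and a much richer set of controls, would be used.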
The most common test for wage discrimination uses individual-level regressions with a set of observable proxies for productivity and then infers the level of discrimination from residual wage differences. This approach has proved very popular but suffers from an acute problem: the proxies used for productivity do not adequately control for group-specific factors that shape productivity, so it is impossible to rule out the possibility that the observed residual wage differential reflects unobserved factors other than discrimination or prejudice. The obvious solution to this problem would be to estimate individual-level productivity using a comprehensive set of demographic control variables and then to compare these estimates with the residual wage estimates. Because any unobserved factors that differ across groups are likely to shape both productivity and wages in the same way, they should not bias this test for discrimination. However, the data required to estimate individual productivity are typically unavailable, so resourceful researchers have turned to matched employer–employee datasets, often at the establishment or plant level, as a second-best solution.4 Some of the best work in this area has been done by Hellerstein and Neumark (2006, 2008) and Bayard et al. (2003).

Andersson et al. (2009) examine the relationship between firm strategy in an innovative industry (the US software industry) and firms' human resource management practices. Their hypothesis is that firms operating in innovative industries pay a wage premium to hire talented workers in order to stay ahead of their rivals. Analysing this issue requires information on both firms' compensation payments and their sales revenues. The need for such comprehensive microdata has prevented these sorts of issues from being examined in the past (although some studies have examined this relationship for CEOs, whose compensation is often observable). Andersson et al.
(2009) show that firms operating in subsectors of the software market characterised by high potential upside gains from innovation do pay more to their 'star' workers than firms operating in more stable subsectors. The dataset used in this paper illustrates the richness of such data: it tracks the universe of software firms and their employees in 10 US states and contains detailed information on firm revenues by software product class and on workers' earnings (including exercised stock options and bonuses) over the period 1992–2001.

3. Examples of Employer–Employee Datasets

In this section, we provide an overview of the major matched employer–employee datasets currently in use around the globe. The datasets differ in the variables they contain, the confidentiality restrictions imposed, the length of the time series and whether they are cross-sectional or longitudinal, and these differences shape the types of questions each can address. We leave the reader to discover the idiosyncrasies of each dataset and instead focus on its broad characteristics and some of the papers that have been produced using it.

3.1 US Worker Establishment Characteristics Database

The US Worker Establishment Characteristics Database (WECD) is the major matched employer–employee dataset used in the United States. It was constructed by the U.S. Census Bureau and links employee data for a sample drawn from the 1990 Decennial Census of Population to employer data from the 1989 Longitudinal Research Database. See Troske (1998) for more details.
The final WECD sample (with data on 3102 establishments and 129 606 employees) is reasonably representative of the US manufacturing sector (note that the sample is limited to manufacturing and covers only a small fraction of the sector's workforce). It has been used in a range of papers, including Hellerstein and Neumark (2006).

3.2 New Zealand's Linked Employer–Employee Database

Following the success of other countries in Europe and North America in creating matched employer–employee datasets, Statistics New Zealand initiated a similar project in the early 2000s. New Zealand's Linked Employer–Employee Database (LEED) integrates information from two sources: data on pre-tax payments made to employees (from New Zealand's Inland Revenue department) and data on jobs, earnings and turnover (from the Longitudinal Business Frame at Statistics New Zealand). See Hyslop and Maré (2007) for an example of the application of LEED.

3.3 Norwegian Linked Employer–Employee Database

The Norwegian LEED is a large and comprehensive linked employer–employee dataset provided by Statistics Norway. It includes information on workers (gender, education), jobs (position, earnings, wages, fringe benefits, working hours) and firms (industry, sector and municipality). Access to the dataset is managed by public authorities and is relatively easy: it can be obtained through co-location or via encryption. Each person, establishment and enterprise has its own identifying number, which makes it possible to track workers over time even if they change employer or have spells of unemployment. See Møen, Salvanes and Sørensen (2004), Salvanes and Førre (2003), Dale-Olsen (2006) and Hunnes, Møen and Salvanes (2007) for more details.
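Persistent person and firm identifiers of this kind are what allow employer and employee effects to be separated: a worker observed at two firms links the pay premiums of both. The Python sketch below illustrates this with a two-way fixed-effects wage model, a standard technique in this literature (often called the AKM model) rather than a method this article sets out. It assumes log wage = worker effect + firm effect, with no covariates and no noise, and estimates the effects by alternating least squares. The data, the identifiers and the normalisation of the first firm's effect to zero are all assumptions for illustration.

```python
"""Sketch: separating worker and firm effects in a matched panel.

Assumed (hypothetical) model, no covariates:
    log_wage[i, t] = theta[worker i] + psi[firm j(i, t)] + error
Estimated by alternating least squares. Only differences in firm effects
within a connected set of firms are identified, so the first firm's
effect is normalised to zero.
"""
from collections import defaultdict


def two_way_fe(obs, max_iter=1000, tol=1e-12):
    """obs: list of (worker_id, firm_id, log_wage) tuples."""
    firms = sorted({f for _, f, _ in obs})
    theta = {}
    psi = {f: 0.0 for f in firms}
    for _ in range(max_iter):
        # Worker effect = mean wage net of the current firm effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for w, f, y in obs:
            sums[w] += y - psi[f]
            counts[w] += 1
        theta = {w: sums[w] / counts[w] for w in sums}

        # Firm effect = mean wage net of the current worker effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for w, f, y in obs:
            sums[f] += y - theta[w]
            counts[f] += 1
        new_psi = {f: sums[f] / counts[f] for f in firms}

        # Normalise psi[first firm] = 0, moving the constant into theta.
        ref = new_psi[firms[0]]
        new_psi = {f: v - ref for f, v in new_psi.items()}
        theta = {w: v + ref for w, v in theta.items()}

        change = max(abs(new_psi[f] - psi[f]) for f in firms)
        psi = new_psi
        if change < tol:
            break
    return theta, psi


# Hypothetical noise-free panel: worker A moves from firm X to firm Y,
# which is what connects the two firms' effects.
TRUE_THETA = {"A": 1.0, "B": 2.0, "C": 0.5}
TRUE_PSI = {"X": 0.0, "Y": 0.3}
SPELLS = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "X"), ("C", "Y"), ("C", "Y")]
OBS = [(w, f, TRUE_THETA[w] + TRUE_PSI[f]) for w, f in SPELLS]

if __name__ == "__main__":
    theta, psi = two_way_fe(OBS)
    print({f: round(v, 6) for f, v in psi.items()})  # {'X': 0.0, 'Y': 0.3}
```

If worker A never moved, firms X and Y would fall into separate connected sets and their relative pay premium could not be recovered; this is why the longitudinal identifiers described above matter.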
3.4 Japanese Matched Employer–Employee Database

Two sources are used to create the Japanese matched employer–employee dataset (which forms the basis of the paper by Fukao et al. 2006). The employer-side information comes from the annual Census of Manufacture, Larger Establishment Sample, which covers all manufacturing establishments with 30 or more full-time employees. The employee-side information comes from the Basic Survey of Wage Structure, an annual survey of establishments in all sectors with 10 or more full-time employees, in which the employer transcribes individual workers' information on work hours, wage, age, education, tenure and annual bonus for the year prior to the survey.

3.5 French Linked Employer–Employee Dataset

The characteristics of the French linked employer–employee dataset are described in Margolis (2006). The employer and employee data are collated from four sources: (i) a longitudinal dataset of firm accounts established by INSEE, France's National Institute for Statistics and Economic Studies; (ii) the Modification of Structure database, also compiled by INSEE, which covers all asset transfers between firms (for example, mergers and acquisitions) of more than 8 million French francs; (iii) the Annual Declarations of Social Data, a longitudinal dataset on every job held by every worker in France in the private, state-owned, local government and non-profit sectors; and (iv) the Permanent Demographic Sample, which provides other census-based data on individuals.

3.6 Canadian Workplace and Employee Survey

The Canadian Workplace and Employee Survey (WES) was designed to examine a range of issues relating to the demand and supply sides of the labour market using data on employers and employees. The workplace sample is drawn from the Business Register maintained by Statistics Canada, and the employee sample is drawn from lists of employees provided by the surveyed workplaces.
The initial sample was selected in 1999 and has subsequently been surveyed at 2-year intervals, with each new wave augmented by a sample of new company 'births'. Drolet (2002) uses WES 1999 data to examine the causes of gender pay differences.

3.7 Danish Linked Employer–Employee Dataset

The Danish linked employer–employee dataset comes from a stratified sample of 3200 Danish private sector firms with more than 20 employees. It was constructed by merging information from a survey of firms with a longitudinal employer–employee dataset on firms' characteristics, performance and employees. The survey was administered by Statistics Denmark in May–June 1999 and asked each firm about its work organisation, compensation systems, recruitment, internal training practices and employee performance mechanisms. Firms were also asked to differentiate between salaried employees and piece-rate workers. For more on this dataset and its application in various contexts, see Datta Gupta and Eriksson (2006) and Eriksson (2003).

3.8 German Matched Employer–Employee Dataset

The German matched employer–employee dataset (LIAB)5 combines Federal Employment Agency statistics on employment with plant-level data from the IAB Establishment Panel. The two datasets are matched using a unique establishment-level identifier. The employment data include the individual's three-digit occupation, daily gross wage, gender, year of birth, nationality, marital status, number of children and education. Among other things, the IAB Establishment Panel includes data on sales revenue, exports, investment and establishment age, plus information on employment conditions, the total wage bill, training costs and hours worked. This dataset has been used in Bauer and Bender (2004) and Addison et al. (2006).
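The matching step itself is conceptually simple once both files carry the same establishment identifier. Here is a minimal Python sketch, assuming hypothetical record layouts: the field names `person_id`, `establishment_id`, `daily_wage`, `sales` and `employees` are illustrative and are not the actual LIAB variable names. Each worker record is left-joined to the establishment record with the same identifier and year, and unmatched workers are kept so that coverage can be measured.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class WorkerRecord:  # hypothetical employee-side layout
    person_id: str
    establishment_id: str
    year: int
    daily_wage: float


@dataclass(frozen=True)
class EstablishmentRecord:  # hypothetical establishment-side layout
    establishment_id: str
    year: int
    sales: float
    employees: int


@dataclass(frozen=True)
class LinkedRecord:
    worker: WorkerRecord
    establishment: Optional[EstablishmentRecord]  # None when no match exists


def link(workers: List[WorkerRecord],
         establishments: List[EstablishmentRecord]) -> List[LinkedRecord]:
    """Left-join workers to establishments on (establishment_id, year)."""
    index: Dict[Tuple[str, int], EstablishmentRecord] = {}
    for e in establishments:
        key = (e.establishment_id, e.year)
        if key in index:  # refuse to guess which duplicate is correct
            raise ValueError(f"duplicate establishment-year {key}")
        index[key] = e
    return [LinkedRecord(w, index.get((w.establishment_id, w.year)))
            for w in workers]


def match_rate(linked: List[LinkedRecord]) -> float:
    """Share of worker records that found an establishment record."""
    if not linked:
        return 0.0
    return sum(r.establishment is not None for r in linked) / len(linked)


# Hypothetical example: establishment "e2" is not in the establishment panel.
WORKERS = [
    WorkerRecord("p1", "e1", 2000, 95.0),
    WorkerRecord("p2", "e1", 2000, 80.0),
    WorkerRecord("p3", "e2", 2000, 70.0),
]
PANEL = [EstablishmentRecord("e1", 2000, 1.2e6, 40)]

if __name__ == "__main__":
    linked = link(WORKERS, PANEL)
    print(f"match rate: {match_rate(linked):.2f}")  # match rate: 0.67
```

Raising an error on duplicate establishment-year keys is a deliberate choice: silently overwriting one record with another would attach the wrong employer data to workers without any warning. Keeping unmatched workers matters when the establishment side is a sample, because the match rate is then itself informative about coverage.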
3.9 Australian Workplace Industrial Relations Survey 1995

Australia's experience with matched employer–employee datasets is limited to the Australian Workplace Industrial Relations Survey (AWIRS) 1995. The 1995 survey (AWIRS95) differed from its predecessor, AWIRS90, in that it included a survey of employees in addition to the employer survey of industrial relations structures, processes and outcomes. The workplace component of AWIRS95 included several samples of Australian employers and a range of survey instruments, and its unique feature was the employee survey undertaken in conjunction with the main workplace survey (see Morehead et al. 1997 and Hawke and Wooden 1995 for more details on AWIRS95). This dataset has been used in Wooden (2001), Wooden and Bora (1999) and Almeida-Santos and Mumford (2004).

3.10 British Workplace Employee Relations Survey 1998

The British Workplace Employee Relations Survey (WERS) 1998 was the first example of a matched employer–employee dataset in Britain. It covered workplaces (public and private) with 10 or more employees across a range of industries (see Forth and Kirby 2000). The survey included interviews with the manager of each workplace and a union representative, together with questionnaires completed by up to 25 randomly selected employees. See also Frijters et al. (2004) for an application. The survey has since been repeated: in WERS 2004, interviews were conducted with 3200 managers and 1000 union representatives, and more than 20 000 employees returned the questionnaire.

3.11 European Structure of Earnings Survey 2002

The European Structure of Earnings Survey (ESES) 2002 is a collection of national surveys conducted in all European Union member states using a harmonised collection protocol. As such, it is one of the few examples of a multi-country matched employer–employee dataset.
ESES includes information on the level and structure of employee remuneration, on employee characteristics (including gender, age, education and occupation) and on their employers (including industry and size). Employers and employees are matched using a unique identifier. For more on this dataset, see Hipolito (2007).

4. Confidentiality Issues

The most common objection to the creation of matched datasets (which applies to the provision of data in general) is that it potentially breaches privacy. Citizens are typically required to provide a great deal of information, much of it highly sensitive, to the government throughout their lives. The thought that academics, bureaucrats or others may be able to gain access to, and perhaps misuse, such data files is of great concern.

However, there are ways to overcome these concerns. As part of the U.S. Census Bureau's Longitudinal Employer–Household Dynamics (LEHD) Program,6 a new data product known as the Quarterly Workforce Indicators was released; it is built from confidential microdata on unemployment insurance wage records and other sensitive demographic and economic information. Considerable effort has gone into protecting the confidentiality of these records using a 'permanent multiplicative noise distortion factor' (see Abowd, Stephens and Vilhuber 2005 for the details), which distorts all input sums, counts, differences and ratios while preserving the analytical validity of the released data. In addition, establishment data are re-weighted to provide state-level comparability with the Quarterly Census of Employment and Wages from the U.S. Bureau of Labor Statistics.

In some countries, such as Norway, confidentiality issues have been handled within existing legislative frameworks. The legislation in Norway sets out the process that each prospective user must follow in order to access and use the data.
This reaches down to the individual level: each researcher who is authorised to use the microdata signs a non-disclosure agreement covering confidential information. There are also major cultural differences across countries relating to confidentiality. Transparency, for example, is much more ingrained in Norway than in most other countries. One illustration is that anybody in Norway can check online to see someone else's earnings, wealth and annual taxes in any given year! This information is made public by the tax authorities based on tax returns, although the authorities have fairly sophisticated safeguards to ensure that such data are not misused.

5. Conclusions

This data survey has outlined the role that matched employer–employee datasets can play in both raising and answering interesting questions in a range of domains. This has obvious benefits for empirical researchers wanting to publish cutting-edge papers in economics journals, but there are also obvious benefits for public policy, as better data can lead to more research addressing interesting, policy-relevant economic and social issues.

Notwithstanding these benefits, there are legitimate concerns about confidentiality, because the data in question contain personal details of firms and employees. Given that confidentiality safeguards have been developed and implemented in other countries, there does not appear to be any technical impediment to implementing similar safeguards in Australia. Although such safeguards (including providing data only by remote access) may curtail the widespread use of the data, this should not be viewed as a major problem. The main obstacle appears to be that creating such datasets would require a change in Australian legislation.
Given the current Australian Government's stated belief in the merits of evidence-based policy formulation, it seems surprising that advocates for the introduction of matched employer–employee datasets in Australia are not more vocal. The main beneficiaries of the creation of these datasets would be the government agencies that develop public policy. Although the financial costs of creating matched employer–employee datasets are non-trivial, the potential benefits are substantial. Achieving these gains, however, requires the broader policy-making community to back the call for greater data availability.

Endnotes

1 Only one taxpayer-funded employer–employee dataset has been created in Australia: the Australian Workplace Industrial Relations Survey (AWIRS) 1995. However, the employee survey component of AWIRS95 was never repeated, and the relevance of employer–employee datasets appears to have fallen off the Australian policy agenda.

2 Although privacy issues are a major concern in Australia, to the best of my knowledge there were no breaches of privacy associated with the release of the AWIRS95 data.

3 Other interesting applications of matched employer–employee datasets exist in economic geography, industrial demography and regional science (for example, see Lane and Stephens 2006).

4 One stream of the literature that has attempted to investigate wage discrimination using individual-level productivity relates to academic employment. In this setting, researchers such as Kahn (1995) have taken advantage of the fact that the most important productivity metric, the (quality-adjusted) number of publications, can be observed fairly easily.

5 This is not the only German matched employer–employee dataset (see, for example, Jirjahn and Stephan 2004).

6 This was the first large-scale linked employer–employee dataset in the United States and was founded by John Haltiwanger, John Abowd and Julia Lane.

References

Abowd, J. M., Stephens, B. E. and Vilhuber, L. 2005, 'Confidentiality protection in the Census Bureau's Quarterly Workforce Indicators', U.S. Census Bureau Technical Paper no. TP-2006-02, Suitland, Maryland.

Addison, J. T., Bellmann, L., Schank, T. and Teixeira, P. 2006, 'The determinants of the employment structure: Wages, trade, technology, and organisational change', in Making Linked Employer–Employee Data Relevant to Policy, DTI Occasional Paper no. 4, eds A. Bryson, J. Forth and C. Barber, Department of Trade and Industry, London.

Almeida-Santos, F. and Mumford, K. 2004, 'Employee training in Australia: Evidence from AWIRS', Economic Record, vol. 80, pp. S53–64.

Andersson, F. D., Freedman, M., Haltiwanger, J. C., Lane, J. and Shaw, K. L. 2009, 'Reaching for the stars: Who pays for talent in innovative industries?', Economic Journal, vol. 119, pp. F308–32.

Bauer, T. K. and Bender, S. 2004, 'Technological change, organizational change, and job turnover', Labour Economics, vol. 11, pp. 265–91.

Bayard, K., Hellerstein, J., Neumark, D. and Troske, K. 2003, 'New evidence on sex segregation and sex differences in wages from matched employee–employer data', Journal of Labor Economics, vol. 21, pp. 887–922.

Bryson, A., Forth, J. and Barber, C. (eds) 2006, Making Linked Employer–Employee Data Relevant to Policy, DTI Occasional Paper no. 4, Department of Trade and Industry, London.

Dale-Olsen, H. 2006, 'Using linked employer–employee data to analyse fringe benefits policies: Norwegian experiences', in Making Linked Employer–Employee Data Relevant to Policy, DTI Occasional Paper no. 4, eds A. Bryson, J. Forth and C. Barber, Department of Trade and Industry, London.

Datta Gupta, N. and Eriksson, T. 2006, 'New workplace practices and the gender wage gap: Can the new economy be the great equaliser?', in Making Linked Employer–Employee Data Relevant to Policy, DTI Occasional Paper no. 4, eds A. Bryson, J. Forth and C. Barber, Department of Trade and Industry, London.

Drolet, M. 2002, 'Can the workplace explain Canadian gender pay differentials?', Canadian Public Policy, vol. 28, pp. S41–63.

Eriksson, T. 2003, 'The effects of new work practices: Evidence from employer–employee data', in Advances in the Economic Analysis of Participatory and Labor-Managed Firms: The Determinants of the Incidence and the Effects of Participatory Organizations, vol. 7, eds T. Kato and J. Pliskin, Elsevier, Amsterdam.

Forth, J. and Kirby, S. 2000, Guide to the Analysis of the Workplace Employee Relations Survey 1998, National Institute of Economic and Social Research, London.

Frijters, P., Shields, M., Theodoropoulos, N. and Wheatley Price, S. 2004, 'Testing for employee discrimination using matched employer–employee data: Theory and evidence', Department of Economics Research Paper no. 915, University of Melbourne.

Fukao, K., Kambayashi, R., Kawaguchi, D., Kwon, H. U., Kim, Y. G. and Yokoyama, I. 2006, 'Deferred compensation: Evidence from employer–employee matched data from Japan', Hitotsubashi University Research Unit for Statistical Analysis in Social Sciences Discussion Paper no. 187, Institute of Economic Research, Hitotsubashi University.

Haltiwanger, J. C., Lane, J. I., Spletzer, J. R., Theeuwes, J. J. M. and Troske, K. R. 1999, Contributions to Economic Analysis: The Creation and Analysis of Employer–Employee Matched Data, vol. 241, North Holland, Amsterdam.

Hamermesh, D. 2008, 'Fun with matched firm–employee data: Progress and road maps', Labour Economics, vol. 15, pp. 662–72.

Hellerstein, J. and Neumark, D. 2006, 'Using matched employer–employee data to study labor market discrimination', in Handbook on the Economics of Discrimination, ed. W. M. Rodgers, Edward Elgar Publishing, United Kingdom.

Hellerstein, J. and Neumark, D. 2008, 'Workplace segregation in the United States: Race, ethnicity and skill', Review of Economics and Statistics, vol. 90, pp. 459–77.

Hipolito, S. 2007, 'The gender pay gap in Europe: An international comparison with matched employer–employee data', unpublished paper, Faculty of Economics and Management, University of Alicante.

Hunnes, A., Møen, J. and Salvanes, K. G. 1997, 'Wage structure and labor mobility in Norway 1980', in Wage Structure, Raises and Mobility: International Comparisons of the Structure of Wages Within and Across Firms, eds E. Lazear and K. Shaw, University of Chicago Press, Chicago.

Hyslop, D. and Maré, D. 2007, 'Earnings heterogeneity and job matching: Evidence from Linked Employer–Employee Data', New Zealand Journal of Employment Relations, vol. 32, no. 1, pp. 1–16.

Jirjahn, U. and Stephan, G. 2004, 'Gender, piece rates and wages: Evidence from matched employer–employee data', Cambridge Journal of Economics, vol. 28, pp. 683–704.

Kahn, S. 1995, 'Women in the economics profession', Journal of Economic Perspectives, vol. 9, no. 4, pp. 193–206.

Lane, J. and Stephens, B. 2006, 'Integrated employer–employee data: New resources for regional data analysis', International Regional Science Review, vol. 29, pp. 264–77.

Margolis, D. 2006, 'Compensation policy, human resource management practices and takeovers', in Making Linked Employer–Employee Data Relevant to Policy, DTI Occasional Paper no. 4, eds A. Bryson, J. Forth and C. Barber, Department of Trade and Industry, London.

Møen, J., Salvanes, K. G. and Sørensen, E. Ø. 2004, 'Documentation of the Linked Employer–Employee Data Base at the Norwegian School of Economics and Business Administration', unpublished paper, Norwegian School of Economics and Business Administration, Bergen.

Morehead, A., Steele, M., Alexander, M., Stephen, K. and Duffin, L. 1997, Changes at Work: The 1995 Australian Workplace Industrial Relations Survey, Addison Wesley Longman, Melbourne.

Salvanes, K. G., Burgess, S. and Lane, J. 1999, 'Sources of earnings dispersion in a linked employer–employee data set: Evidence from Norway', in Contributions to Economic Analysis: The Creation and Analysis of Employer–Employee Matched Data, vol. 241, eds J. C. Haltiwanger, J. I. Lane, J. R. Spletzer, J. J. M. Theeuwes and K. R. Troske, North Holland, Amsterdam.

Stafford, F. 1986, 'Forestalling the demise of empirical economics: The role of microdata in labor economics research', in Handbook of Labor Economics, vol. 1, eds O. Ashenfelter and R. Layard, North Holland, Amsterdam.

Troske, K. R. 1998, 'The worker–establishment characteristics database', in Labor Statistics Measurement Issues, eds J. Haltiwanger, M. Manser and R. Topel, University of Chicago Press, Chicago.

Wooden, M. P. 2001, 'Union wage effects in the presence of enterprise bargaining', Economic Record, vol. 77, pp. 1–18.

Wooden, M. P. and Bora, B. 1999, 'Workplace characteristics and their effects on wages: Australian evidence', Australian Economic Papers, vol. 38, pp. 276–89.
