Statistical investigation of hourly OMNI solar wind data
2011; American Geophysical Union; Volume: 116; Issue: A12; Language: English
10.1029/2011ja017027
ISSN 2156-2202
Authors: L. J. Thatcher, Hans‐Reinhard Müller
Topic(s): Geomagnetism and Paleomagnetism Studies
Journal of Geophysical Research: Space Physics, Volume 116, Issue A12, Solar and Heliospheric Physics

L. J. Thatcher, Department of Physics and Astronomy, Dartmouth College, Hanover, New Hampshire, USA
H.-R. Müller, Department of Physics and Astronomy, Dartmouth College, Hanover, New Hampshire, USA; Center for Space Plasma and Aeronomic Research, University of Alabama in Huntsville, Huntsville, Alabama, USA

First published: 30 December 2011. https://doi.org/10.1029/2011JA017027

Abstract

[1] Hourly OMNI solar wind data are sorted into categories according to whether each data point belongs to a slow or fast solar wind stream, or to a coronal mass ejection or corotating interaction region environment. The categorization is inspired by Yermolaev et al. (2009) and modified from there.
Durations and coverage fractions of each category are investigated, together with their dependence on the solar activity cycle. The results are in line with physical expectations for the solar wind at 1 AU. A further analysis, treating hourly solar wind fluctuations as a constrained random walk process, is carried out independently for each solar wind category and discussed. The resulting step size distributions are found to be largely symmetric about zero, resembling a random walk deviation from a long-term average. This constrained random walk can in principle be used to fill gaps in the OMNI data and perform other OMNI data extrapolations.

Key Points
Effective categorization of hourly OMNI solar wind data
Solar wind decomposed as long term averages and random walk
Distributions of solar wind variables and their derivatives

1. Introduction

[2] In recent years, the termination shock crossings of the two Voyager spacecraft and the successful energetic neutral atom (ENA) mapping by the Interstellar Boundary Explorer (IBEX) satellite have further kindled already great interest in the physics of the global heliosphere. Computer models of the global heliosphere remain an important tool for understanding the often surprising observations. In response to the new observations, efforts to include all necessary physics and increase the realism of the models are underway by a number of research groups.

[3] The solar wind as measured by various spacecraft in the inner heliosphere is not a stationary flow, but a manifestly time-dependent flow. This time dependence is transported through the entire heliosphere, including beyond the termination shock and out to the plasma on the interstellar side of the heliopause. The plasma distribution contributes to neutral loss and production terms, so that a time dependence of neutral measurements is to be expected as well.
IBEX has indeed observed a time dependence of the prominent neutral "ribbon" feature and other short- and long-term evolution in the neutral fluxes [McComas et al., 2010; Reisenfeld et al., 2010]. To model a time-dependent global heliosphere, it would be ideal to load the inner boundary, where solar wind is being fed into the simulation, with observed solar wind in every possible direction (i.e., measured on a sphere around the Sun). In reality, few measured directions are available for each moment in time, primarily the earthward line of sight. To make progress, usually an approximate solar wind model is used, loosely tied to actual solar wind behavior. These models are sometimes augmented by direct observations, mostly in the ecliptic, that are Carrington averaged [e.g., Izmodenov et al., 2005] to spread the information to all ecliptic longitudes. To date, Ulysses is the only mission to yield systematic out-of-ecliptic solar wind data [e.g., Ebert et al., 2009]; Ulysses data are used in global heliospheric modeling as well [Pogorelov et al., 2010].

[4] The long-term goal for the work presented in this current paper is to extrapolate the long-duration OMNI data taken at the Sun-Earth line to all other longitudes in the ecliptic in such a way that the extrapolated data are statistically similar to the OMNI data on physically important timescales. A further extension will construct a viable whole sphere time-dependent input data set by combining this ecliptic data set with the Ulysses measurements out of the ecliptic. The focus lies on the OMNI data because they span four solar cycles. Some pathways that take solar wind close to the nose of the heliosphere and then onward to the heliotail can be estimated to take around two solar cycles or more to reach the tail.
Depending on their energy, ENAs produced by such solar wind parcels may take on the order of 10 years to travel back to IBEX, during which time they are liable to charge exchange with a time-variable plasma background. In short, a global heliosphere model driven by four solar cycles of realistic solar wind input is appropriate for the simulation of time-dependent neutral fluxes measured currently by IBEX.

[5] As an initial building block of the outlined larger project, this current paper pursues a more modest goal: It seeks to establish the statistical characteristics of the OMNI data and how the OMNI data change from hour to hour. The OMNI data set is available at the Space Physics Data Facility (SPDF) of NASA's Goddard Space Flight Center (http://omniweb.gsfc.nasa.gov/). The hourly OMNI data set [King and Papitashvili, 2005] covers the time range from November 1963 to present and consists of a careful synopsis of observations by 18 spacecraft. The data are time averaged by SPDF and available in multiple cadences, including the hourly cadence utilized in this paper. The data are also ballistically time shifted, such that they refer to a common reference point at 1 AU. The data coverage is less than complete, as there has not always been a spacecraft present in the solar wind (undisturbed by the magnetosphere) for the entire time period that OMNI spans. Moreover, instrument outages that cannot be bridged also leave data gaps in the hourly OMNI data. The future extrapolation technique will fill these data gaps, to create a continuous hourly data set that is identical to OMNI wherever possible and is statistically similar to it everywhere else.

[6] The OMNI data have been used in a variety of ways in the past, including studies of the statistics of the solar wind.
The large time span of OMNI provides researchers with the data needed to assess the solar wind and the interplanetary magnetic field (IMF), not just within a single solar cycle, but across successive solar cycles. Using OMNI, Richardson et al. [1996] studied general changes in the solar wind within a solar cycle, and Smith and Balogh [2003] analyzed the change in open flux between solar cycles, which they found to differ from one solar cycle to another, in contrast to earlier studies. The long scope of OMNI also provides for meaningful statistical surveys. Those surveys often focus on a particular solar wind characteristic, for example the low proton density statistics compiled by Richardson et al. [2000] and Watari et al. [2000]; both found strong correlations between proton density and the passage of transient events. Richardson and Cane [1995] show that a low temperature to speed ratio in the solar wind strongly correlates with the passage of coronal ejecta, and provide an expected temperature for solar wind protons, $T_{ex} = (0.0106\,v - 0.278)^3/R$ for v < 500 km/s and $T_{ex} = (0.77\,v - 265)/R$ for v ≥ 500 km/s, where Tex is in K, v is the measured solar wind speed in km/s, and R is the distance from the Sun in AU. Elliott et al. [2010] further refined the temperature-speed relationship by combining OMNI and Ulysses data, while Ebert et al. [2009] used Ulysses to characterize solar wind flows at various radii. Borovsky and Denton [2010] characterized corotating interaction regions (CIRs) and the wave characteristics ahead of, during, and following the passage of a CIR. General statistical studies of the solar wind, such as those by Mullan and Smith [2006], have found plasma β and Alfvén speed distributions to be lognormal and the respective variances of these variables to be too low to be consistent with density and magnetic field being independent. Many of the statistical surveys study correlations between solar wind signatures and geoeffectiveness [e.g., Papitashvili et al., 2000].
A broad statistical survey of OMNI by Yermolaev et al. [2005] led to a semiautomatic solar wind categorization method developed by Yermolaev et al. [2009, hereinafter Y09], which will be discussed in detail below. Much of the work based on the Y09 study has concentrated on geoeffectiveness and magnetospheric storms [e.g., Yermolaev et al., 2011] and will not be discussed in the present paper.

[7] The remainder of this paper is organized as follows. Section 2 discusses the autocorrelation times of direct and composite solar wind variables. Section 3 introduces the different categories of solar wind and the process of distinguishing between them, then presents a fully automated method and compares it with the partially manual method of Y09. In section 4, a brief analysis of the OMNI data between categories suggests a method for modeling the solar wind as a random walk signal superposed onto a long-term average.

2. OMNI Data Autocorrelation

[8] Filling the data gaps with interpolated data values has been treated previously in the literature by Qin et al. [2007]. Their method provides for a smooth transition at the beginning and the end of a gap interval from the last/first known data point to an average solar wind; the transition time is inspired by the autocorrelation of solar wind variables. The autocorrelation analysis is repeated in this paper with very similar results. Here, the analysis provides a motivation for averaging window lengths and other procedures requiring ad hoc time ranges. The autocorrelation time is defined as the maximum time separation of two hourly OMNI data points whose correlation coefficient,

$$r_k = \frac{\sum_{i=1}^{N-k} \left(x_i - \langle x \rangle\right)\left(x_{i+k} - \langle x \rangle'\right)}{\sqrt{\sum_{i=1}^{N-k} \left(x_i - \langle x \rangle\right)^2 \, \sum_{i=1}^{N-k} \left(x_{i+k} - \langle x \rangle'\right)^2}}, \tag{1}$$

stays above 0.8. Here N is the number of hours in the entire OMNI data set, k is the hourly separation, $x_i$ is OMNI data from hour i, and the means $\langle x \rangle$ and $\langle x \rangle'$ span [1, N − k] and [1 + k, N], respectively.
This calculation is done for as much of the data set as possible by applying equation (1) to the entire OMNI data set and excluding terms from the sums when either $x_i$ or $x_{i+k}$ is unavailable due to a gap. Care is taken such that all sums in equation (1) include and exclude the same terms.

[9] Others, such as Qin et al. [2007], have opted for a more streamlined approach by taking the Fourier transform (FT) of the series in equation (1) to get

$$\mathrm{Corr}(x, x)_k \Longleftrightarrow H_k H_k^{*}, \tag{2}$$

as described by Press et al. [1998], where k is the hourly separation, $H_k$ is the FT of the series $x_i$, the asterisk denotes complex conjugation, and the double arrow denotes a Fourier transform pair. This approach is repeated in this paper, though this method cannot tolerate data gaps. Although methods have been developed to apply FTs to irregularly sampled data sets, there is no canonical technique accepted as of yet, so only periods with more than 256 consecutive nongap hourly OMNI data are used in the autocorrelation analysis. The autocorrelation coefficients depend upon the span of the FT. Autocorrelation times become longer with FT span, but eventually the autocorrelation times level off at a span of 256 h. If longer durations of gapless data exist, the longer span FT is used. A weighted average of these autocorrelation times is taken to determine the autocorrelation time for all OMNI. The autocorrelation results based on both equations (1) and (2) are presented in Table 1.

Table 1. Characteristic Autocorrelation Times of Hourly OMNI Data Showing the Results of Two Methods for Determining the Autocorrelation Coefficient: The Standard Summation and a Fourier Transform^a

            Autocorrelation Time
Variable    Sum^b    FT^c
|B|         5        4.36
|v|         19       13.48
n           3        5.47
T           4        5.33
QI          3        2.47
nv          3        4.3
Pram        2        3.92
nkBT        2        3.9
β           1        2.37
Pt          4        5.70

a Characteristic autocorrelation times of hourly OMNI data, defined here as the time separation (in hours) between two points in the hourly OMNI data that have an autocorrelation coefficient of 0.8.
b Standard summation (equation (1)).
c Fourier transform (equation (2)).
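The two estimators compared in Table 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names are hypothetical, gaps are assumed to be marked as NaN, the FFT variant is zero-padded and normalized to unity at zero lag (both are choices made here), and the autocorrelation time is read off as the largest lag for which the coefficient stays above 0.8.

```python
import numpy as np

def autocorr_sum(x, k):
    """Correlation coefficient between x[i] and x[i+k] (direct summation).

    Pairs in which either member is a gap (NaN) are dropped, so that every
    sum, including the two means, uses the same set of terms.
    """
    x = np.asarray(x, dtype=float)
    a, b = (x, x) if k == 0 else (x[:-k], x[k:])
    ok = ~(np.isnan(a) | np.isnan(b))
    a, b = a[ok], b[ok]
    da, db = a - a.mean(), b - b.mean()
    return np.sum(da * db) / np.sqrt(np.sum(da**2) * np.sum(db**2))

def autocorr_fft(segment):
    """Autocorrelation of a gap-free segment from its Fourier transform.

    Assumption of this sketch: the series is mean-subtracted and zero-padded
    to twice its length to avoid wrap-around, then normalized by lag 0.
    """
    s = np.asarray(segment, dtype=float)
    s = s - s.mean()
    n = s.size
    H = np.fft.rfft(s, 2 * n)
    acf = np.fft.irfft(H * np.conj(H))[:n]
    return acf / acf[0]

def autocorr_time(coeff_of_lag, max_lag, threshold=0.8):
    """Largest lag (in hours) up to which the coefficient stays above threshold."""
    k = 0
    while k + 1 <= max_lag and coeff_of_lag(k + 1) > threshold:
        k += 1
    return k
```

For a series with gaps, `autocorr_time(lambda k: autocorr_sum(x, k), max_lag=48)` gives the summation-based value; for a gap-free stretch of at least 256 h, `acf = autocorr_fft(segment)` followed by `autocorr_time(lambda k: acf[k], max_lag=48)` gives the FT-based one. The two need not agree exactly, since the FT variant uses a single mean and variance for the whole segment.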
[10] Additional variables are added in Table 1, all of which are composites of the core OMNI variables (v, n, T, B). The solar wind quasi-invariant (QI) was first introduced by Osherovich et al. [1999] as a proxy for sunspot number compiled entirely from in situ solar wind data; QI is defined as the inverse square of the Alfvén Mach number, $QI = M_A^{-2} = B^2/(\mu_0 \rho v^2)$. The value of ρ is given by the product of the hourly density n provided by OMNI and the proton mass $m_p = 1.67 \times 10^{-27}$ kg throughout this paper. The product nv is the number flux, which, when combined with particle momentum, gives the fluid ram pressure $P_{ram} = \rho v^2$. Thermal pressure and the plasma beta β adhere to standard definitions. Perpendicular pressure Pt, as defined by Jian et al. [2006a, 2006b], describes the pressure perpendicular to magnetic fields in the solar wind frame of reference. This pressure was found by Jian et al. [2006a, 2006b] to be greatly increased when one solar wind flux rope overtakes another, specifically for the passage of a coronal mass ejection (CME) or a CIR. It is given as $P_t = B^2/(2\mu_0) + \sum_j n_j k_B T_{\perp,j}$, where j represents the proton, electron, and α particle values (assuming $n_\alpha = 0.04\,n_p$ and $T_\alpha = T_p$), the plasma is assumed to be quasi-neutral, T is assumed isotropic, and the electron temperature $T_e$ is set to $2 \times 10^5$ K, identical to what is used by Jian et al. [2006a, 2006b].

[11] The autocorrelation times are only a few hours (that is, a few data points) long, which is short relative to the timescales covered by this study. The two methods have somewhat divergent results, and both differ from those found by Qin et al. [2007]. In all three studies, the solar wind speed has the longest autocorrelation time. This result is an expression of the fact that the supersonic solar wind velocities stay coherent longer than the other variables, whose relative fluctuations are larger. The other variables included in both this paper and the work by Qin et al. [2007], n and Pram, have autocorrelation times longer in this paper than those found by Qin et al.
[2007]. The discrepancies between the two sets of results are likely due to the different procedures used in the two papers when encountering data gaps; Qin et al. [2007] use averages from data surrounding data gaps to fill the gaps before determining autocorrelation times, while this paper simply avoids gaps and only includes periods with no gaps. With the exception of Pt, all of the composite variables have shorter autocorrelation times than the core variables, possibly because the perpendicular pressure best models the method of information transfer between interacting solar wind streams. Later in this paper, a general time length of 3 h is used for taking averages. Table 1 demonstrates that 3 h serves well as an approximate autocorrelation time when all variables are being considered together.

[12] A final note is that the autocorrelation time does not lie solely within the time domain. For a given stream, the source region on the Sun is constantly moving relative to the Earth, so every hour that passes means the stream source has changed slightly as well. Observations from the STEREO A and B spacecraft give researchers the data necessary to decouple the effects of time and rotation on the autocorrelation. Opitz et al. [2009] show that the autocorrelation times can increase when rotation is taken into account, and autocorrelation times on the order of days are found for solar wind emanating from a single source region on the Sun. However, the longer autocorrelation times are still well under a single Carrington rotation and not long enough to strongly correlate OMNI data emanating from a single source region from one Carrington rotation to the next.

3. Solar Wind Categories

[13] The solar wind has been divided into distinct categories in much of the literature on the subject. Categorization also benefits statistical analyses of the solar wind, as analysis of only a single category will involve more constrained data.
Distinguishing between these categories often requires human analysis, which is time consuming and introduces subjectivity. Y09 developed an automated categorization method that allows them to perform the human analysis more efficiently, as a second step. In this paper, we will analyze and modify the Y09 method to proceed without manual intervention. The results from the modified method are used to do category-dependent statistical analysis of the entire OMNI data set.

[14] Y09 manually categorized a large number of short OMNI time series, which they then statistically analyzed in great depth. This statistical analysis led to a set of criteria for each category. The criteria are largely based upon whether the hourly values of the OMNI data are greater or less than a certain threshold value. This method checks each hour of data independently, except for two categories, interplanetary shock and heliospheric current sheet, and each criterion is weighted. For example, velocity is weighted very heavily for determining fast and slow wind, while density is weighted less heavily, though it is still important. The criteria and positive weights, both shown in Table 2, give a probability $p_j(i)$ for a given category j for each hour i,

$$p_j(i) = \frac{\sum_{k=1}^{N_j} W_{j,k}\, \alpha_{j,k}(i)}{\sum_{k=1}^{N_j} W_{j,k}},$$

where k runs through the relevant criteria for the given category j, $N_j$ is the number of criteria for j, $\alpha_{j,k}(i)$ is either 0 for criteria not met or 1 for criteria met, and $W_{j,k}$ is the weight for the criterion k of category j. Should $p_j(i)$ be above a given value (usually 0.6), that hour is flagged as category j. Problematically, this method allows for a given hour of data to be labeled as no category or as multiple categories. In these situations, which occur quite frequently, manual analysis must determine the category, as described by Y09. A fully automated categorization is realized by adjusting the methodology so that the category with the highest $p_j(i)$ is selected as the single category.
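The weighted scoring and the highest-probability selection just described can be sketched as follows. This is an illustration under stated assumptions rather than the Y09 or authors' implementation: the probability is assumed to be the weight-normalized sum of the satisfied criteria, the names `CRITERIA`, `category_probabilities`, and `classify` are hypothetical, and only two categories with two criteria each (speed and density thresholds for slow and fast wind) are included as placeholders for the full criteria table.

```python
import numpy as np

# Hypothetical, deliberately incomplete criteria: (variable, comparison, threshold, weight).
# Speed in km/s, density in cm^-3. These stand in for the full table of criteria.
CRITERIA = {
    "slow": [("v", "<", 450.0, 2.0), ("n", ">", 3.0, 0.5)],
    "fast": [("v", ">=", 450.0, 2.0), ("n", "<", 20.0, 0.5)],
}
OPS = {"<": np.less, ">": np.greater, ">=": np.greater_equal}

def category_probabilities(data):
    """Return {category: p_j(i)} for every hour i.

    Assumed form: p_j = sum_k W_jk * alpha_jk / sum_k W_jk, where alpha_jk is
    1 when criterion k is met and 0 otherwise (NaN values never meet a criterion).
    """
    probs = {}
    for cat, crits in CRITERIA.items():
        total_weight = sum(w for *_, w in crits)
        score = sum(w * OPS[op](np.asarray(data[var], dtype=float), thr)
                    for var, op, thr, w in crits)
        probs[cat] = score / total_weight
    return probs

def classify(data):
    """Pick, for each hour, the single category with the highest p_j(i)."""
    probs = category_probabilities(data)
    names = list(probs)
    stacked = np.vstack([probs[c] for c in names])
    # Ties are resolved in favor of the category listed first (a choice of this sketch).
    labels = np.array(names)[np.argmax(stacked, axis=0)]
    return labels, probs
```

For example, `classify({"v": [350.0, 600.0, 460.0], "n": [8.0, 2.0, 25.0]})` labels the three hours slow, fast, and fast; in the third hour the speed criterion outweighs the unmet density criterion, giving the fast category a probability of 0.8.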
This method will hereinafter be referred to as "mod Y09". Mod Y09 uses only four categories: fast, slow, CIR, and CME. Fast, slow, and CIR categories are identical to those of Y09, but the CME category in mod Y09 encompasses the magnetic cloud (MC) and ejecta categories as well as the sheath categories of Y09 (identical to CIRs, but located forward of MC/ejecta rather than fast solar wind streams). All other Y09 categories (e.g., Rare) are considered a single "other" category here.

Table 2. The Criteria Used for Determining the Categorization of Hourly OMNI Data^a

Solar Wind Type   n (W)   V (W)   B (W)   T/Tex (W)   nkBT (W)   β (W)
Slow     >3 <450 <1 0.5 2.0 0.5
Fast     <20 ≥450 <1 0.5 2.0 0.5
Ejecta   <10 <0.5 <0.01 <0.5 0.5 4.0 1.0 1.0
MC       10 <0.5 <0.01 3 >5 >1 >0.007 >1 0.5 0.5 3.0 0.5 0.5

a As developed by Y09. The coronal mass ejection category used in the text is the combination of ejecta and magnetic cloud (MC) criteria. Densities are in cm−3, velocities in km/s, magnetic field strengths are in nT, and pressures are in nPa.
b Corotating interaction region.

[15] Category durations are one of the tools used by Yermolaev et al. [2009] to assess their categorization method. A category duration is the length of time spanned by consecutive hours of data in which a single category is selected as the dominant type; in this paper, any such single-category period that borders a data gap is excluded, because it is not known whether the category changes or remains the same during the gap. The histograms for the entire OMNI data set are shown in Figure 1, with Figure 1a showing the histogram using the criteria as presented and Figure 1b showing the histograms using the modified criteria, discussed below. The high number of short-duration events in Figure 1a is doubly troubling: (1) the histograms presented by Yermolaev et al.
[2009, Figure 4] are different in both shape and magnitude, with peaks toward longer durations (generally between 4 and 12 h, but upwards of 20 h in the case of MC) and very few events in the shorter durations (<4 h); and (2) the general trends in Figure 1a are similar to what we would expect from random assignment of categories, with most durations being a single hour and geometric decay toward longer durations, leaving the whole concept of categorization and/or the criteria questionable; thus, a further modified version of the process is needed for full automation.

Figure 1. Histograms of continuous, same category durations. The histograms are generated using criteria developed by Y09 applied to hourly OMNI data spanning 1963–2009. Unlike Y09, the categorization is fully automated. (a) The histogram using modified criteria of Y09 without further processing and (b) the histogram using the modified version of the criteria with averaging and merging.

[16] The second method in this paper, which produces the duration histogram in Figure 1b, is a two step merging-averaging method (hereinafter MAM). Running, 7 h, unweighted averages of $p_j(i)$ for each i give

$$p_j'(i) = \frac{1}{7} \sum_{m=-3}^{3} p_j(i+m).$$

Hour i is assigned the category j with the greatest $p_j'(i)$. This method was found to underrepresent CMEs, so CMEs are considered flagged if the average value goes above the set threshold, similar to what was done originally by Y09. The original threshold, 0.6, was too high and the total number of CMEs too low, so the threshold was lowered to 0.52. After averaging over the category probabilities to determine the proper category, like categories are merged across small breaks in the category; if a single category, such as CIR, is within 3 h of the same category, again CIR, separated by any other nongap categories, such as fast or slow, the entire period is considered to be of the flanking type, in this case CIR.
Such merging is done first for CMEs, then CIRs, fast and last slow; categories merged first are favored for final selection. As is shown in Figures 1 and 2, MAM leaves the set with more longer-duration events and fewer shorter-duration events. For both the category averaging and merging, the process utilizes data within 3 h of the data point in question. This time window is chosen both for the quality of results compared with other time windows and because 3 h is a good fit to the autocorrelation times from section 2, which provides a physical rationale for averaging and merging. The current system is fully automated, produces results closer to those presented in Y09, and will suit the needs of our further analysis.

[17] Figure 2 shows histograms using both mod Y09 and MAM at different stages in the solar cycle, 1995.5–1997.5, a period of time associated with a solar minimum, and 2000.0–2002.0, a period of time associated with a solar maximum; the histograms are normalized by total hours for each category such that the sum over all durations yields 1 for each category. As designed, large drops in short durations (<10 h) for slow and fast and very short durations (<5 h) for CIRs are apparent when going from mod Y09 to MAM for both time periods, repeating the same trends seen when taken across the entire data set (Figure 1). The normalized duration histograms show little difference between the two time periods; the sawtooth shape of the CME histogram for MAM over solar minimum is due to the very low number of CMEs during the entire period (see Table 3, which presents the total hourly coverage of each category during the two time periods). CMEs drop considerably during the solar minimum, as expected, while CIR coverage drops during solar maximum, partially due to the signal being lost under all the CMEs and partially due to the general lack of order on the solar surface; CIR incidence appears to benefit from the ordered structure of a solar minimum.
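The averaging and merging steps of MAM can be sketched as below. This is one possible reading of the description, not the authors' code. Assumptions made here: the input probabilities are gap free, "within 3 h" means the two flanking hours are at most three hourly steps apart, edges of the running mean are effectively zero padded, and the priority of earlier-merged categories is enforced by never merging across a higher-priority category. The function names and the `"gap"` label are hypothetical.

```python
import numpy as np

def running_mean(p, half_width=3):
    """Unweighted running mean over 2*half_width + 1 hours (7 h by default)."""
    window = 2 * half_width + 1
    return np.convolve(p, np.ones(window) / window, mode="same")

def merge_category(labels, cat, blocked=("gap",), max_sep=3):
    """Relabel short breaks between two hours of `cat` as `cat`.

    A break is filled only if the flanking hours are at most `max_sep` steps
    apart and the break contains none of the `blocked` labels.
    """
    labels = np.array(labels, dtype=str)
    idx = np.flatnonzero(labels == cat)
    for a, b in zip(idx[:-1], idx[1:]):
        between = labels[a + 1:b]
        if 1 < b - a <= max_sep and not np.isin(between, blocked).any():
            labels[a + 1:b] = cat
    return labels

def mam(probs, order=("CME", "CIR", "fast", "slow"), cme_threshold=0.52):
    """Two-step categorization: average the probabilities, then merge."""
    names = list(probs)
    smooth = np.vstack([running_mean(np.asarray(probs[c], dtype=float)) for c in names])
    labels = np.array(names)[np.argmax(smooth, axis=0)]
    # CMEs are flagged whenever their averaged probability exceeds the threshold.
    labels[smooth[names.index("CME")] > cme_threshold] = "CME"
    done = ["gap"]
    for cat in order:
        labels = merge_category(labels, cat, blocked=tuple(done))
        done.append(cat)  # later merges may not overwrite this category
    return labels
```

As a usage example, `merge_category(["CIR", "slow", "fast", "CIR"], "CIR")` turns the two intervening hours into CIR, whereas a break that contains a `"gap"` entry is left untouched.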
Note that MAM generates fewer total hours of data. MAM excludes data points included in mod Y09 because MAM has, in general, longer gap adjacent events, which are excluded from our histograms (see above).

Figure 2. Normalized histograms of continuous, same category durations which (a and b) correspond to a solar minimum and (c and d) cover a solar maximum.

Table 3. Total Hourly Coverage by Category During a 2 Year Period^a

Time Period      Method      Slow         Fast         CME^b        CIR^c
1995.5–1997.5    mod Y09^d   8687 (54%)   3354 (21%)   2337 (15%)   1423 (9%)
1995.5–1997.5    MAM^e       7976 (57%)   3349 (24%)   1601 (11%)   1161 (8%)
2000.0–2002.0    mod Y09^d   9190 (48%)   4145 (25%)   4088 (24%)   344 (2%)
2000.0–2002.0    MAM^e       7297 (44%)   3744 (23%)   5003 (30%)   531 (3%)

a Either 1995.5–1997.5 (solar minimum) or 2000.0–2002.0 (solar maximum).
b Coronal mass ejection.
c Corotating interaction region.
d Method using only four categories: fast, slow, CIR, and CME. Fast, slow, and CIR categories are identical to those of Y09, but the CME category in mod Y09 encompasses the MC and ejecta categories as well as the sheath categories of Y09.
e Two step merging-averaging method.

[18] Yermolaev et al. [2010, Figure 3] present hourly coverage by category that differs only slightly from what we have found here, shown in Figure 3. The CME coverage by both the mod Y09 method and MAM tends to be lower, and the coverage by fast and slow in mod Y09 and MAM tends to be greater, than the coverage found by Yermolaev et al. [2010]. However, there are many similarities in the general shapes of the curves; for example, the CIR coverage, the years with the most CMEs, and the relative coverage of fast and slow wind are all fairly similar to those of Yermolaev et al. [2010].

Figure 3. Relative coverage of categories using the merging-averaging method.

4. 
Elements of a Statistical Analysis

[19] Distributions of the OMNI data generally do not fit well to a normal distribution; two examples are given in Figures 4 and 5 for velocity magnitude and thermal pressure. Because of the heavy dependence on velocity (Table 2) to distinguish between the two most dominant solar wind types, fast and slow (∼75% of the data is one of these two types), the velocity is not a typical representation of the distributions found among the variables. However, the differences between mod Y09 and MAM are more visible in the velocity distributions for this very same reason. The fast and slow streams have their boundary smoothed, as MAM allows for values slightly above 450 km/s to be slow and slightly less than 450 km/s to be fast. The threefold jump in the number of hours labeled as CIRs and having speeds near 450 km/s is a demonstration of the CIRs' place at the interface between slow and fast streams. The MAM with its built-in pecking order stretches categories to encompass nearby data points (replacing other categories, and thereby shrinking the latter), and the data which the CIRs are stretched into have values between typical fast and slow wind values (e.g., 450 km/s). The thermal pressure distributions (Figure 5) are somewhat more typical of other variables' distributions; most of the variable distributions, like thermal pressure, have distinct and separate distribution peaks for each category, meaning each category has a unique typical value and is well described by lognormal distributions (found by others such as Mullan and Smith [2006]), which is illustrated in Figure 6 by switching to a logarithmic abscissa. For all four categories, the thermal pressure distributions are strongly lognormal.

Figure 4. Histograms of solar wind speed spanning 1963–2009 using (a and c) the single hour process and (b and d) criteria with averaging and merging. The data bin in Figures 4a–4d is 5 km/s.
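The lognormal character noted in paragraph [19] can be checked for any single category by histogramming the logarithm of the variable and looking at its moments. The snippet below is a minimal sketch with hypothetical names (`lognormal_summary`, and the default of 50 bins); it is not the binning or fitting procedure used for the figures. A near-zero skewness of the log-transformed values is consistent with, though it does not prove, a lognormal distribution.

```python
import numpy as np

def lognormal_summary(values, bins=50):
    """Summarize a positive-valued sample in log10 space.

    Returns the mean, standard deviation, and skewness of log10(values),
    together with a histogram on a logarithmic abscissa.
    """
    x = np.asarray(values, dtype=float)
    x = x[np.isfinite(x) & (x > 0)]          # drop gaps and non-positive values
    logx = np.log10(x)
    mu = logx.mean()
    sigma = logx.std(ddof=1)
    skew = np.mean(((logx - mu) / sigma) ** 3)
    counts, edges = np.histogram(logx, bins=bins)
    return {"mu_log10": mu, "sigma_log10": sigma, "skew_log10": skew,
            "counts": counts, "edges": edges}
```

Applied separately to, say, the thermal pressure values of each category, the returned `mu_log10` gives the typical value of that category in log space, and the `counts` and `edges` arrays can be plotted directly to compare the category peaks.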
Figure 5. Histograms of solar wind thermal pressure spanning 1963–2009 using (a and c) the single hour process and (b and d) criteria with averaging and merging. The data bin in Figures 5a–5d is 0.001 nPa.

Figure 6. Histograms of solar wind speed by categories in log space. The data bin is set reflexively to maintain a total of 200 da
Reference(s)