Does Systematic Sampling Preserve Granger Causality with an Application to High Frequency Financial Data?

In the applied econometric literature, causal inferences are often based on temporally aggregated or systematically sampled data. A number of theoretical studies have pointed out that temporal aggregation distorts causal inference, whereas systematic sampling of stationary variables preserves the direction of causality. This paper examines the issue in detail by substituting theoretical cross-covariances into the limiting values of least squares estimates in a VAR framework. The asymptotic distributions of the estimates for the systematically sampled process are expressed in terms of the cross-covariances of the disaggregated process. An extensive Monte Carlo study is conducted to examine small-sample behaviour. Contrary to the stationary case, this paper shows that systematic sampling of integrated series may induce spurious causality. In particular, systematic sampling induces spurious bi-directional Granger causality among the variables if the uni-directional causality runs from a non-stationary series to either a stationary or a non-stationary series. On the other hand, systematic sampling preserves uni-directional causality among the variables if it runs from a stationary series to either a stationary or a non-stationary series. In general, the most severe distortions of causal inference are likely at low sampling intervals, where the sampling span just exceeds the actual causal lag. At high levels of systematic sampling, causal information concentrates in contemporaneous correlations. An empirical exercise further illustrates the usefulness of the results.


1. Introduction
The use of highly temporally aggregated and systematically sampled data for causal inference is common in the applied econometric literature. The issue of regular-interval sampling arises frequently when dealing with high-frequency data in finance. Sampling at regular frequencies when trades arrive non-synchronously and at high frequency can distort correlation measures (Scholes and Williams, 1977; Epps, 1979; Hayashi and Yoshida, 2005). However, many studies of cross-correlations between equity market instruments and their derivatives still rely on regularly spaced data sampled at 1-, 5- or 10-minute intervals, as discussed in Bollen, O'Neill and Whaley (2016). These issues are becoming more important as the range of equity market derivative products grows and trading frequency intensifies.
There is a substantial literature on the effects of temporal aggregation and systematic sampling on various aspects of time series analysis, such as the univariate ARIMA structure (see Wei, 1990, and the references therein), unit roots, cointegration, exogeneity, measures of persistence, impulse response functions and forecasting (see Marcellino, 1999, and the references therein). Few studies, however, have focused on causality. Sims (1971) warns that aggregation could produce a spurious causal relationship. Using Geweke's linear decomposition, Wei (1982) demonstrates that temporal aggregation can turn a true one-sided Granger causal relationship into a two-sided causal system. On the other hand, Wei shows that systematic sampling preserves the one-sided causal relationship between the variables, and that the unidirectional causal relationship becomes weaker as the series are sampled at longer intervals. In other words, while systematic sampling does not introduce a spurious causal relationship, the likelihood of detecting a true causal relationship may decline. Cunningham and Vilasuso (1995) conduct a Monte Carlo simulation to examine the consequences of temporal aggregation and systematic sampling for causal inference. They find that temporal aggregation is between two and ten times more likely than systematic sampling to fail to detect a true causal relationship. Cunningham and Vilasuso (1997) examine the influence of temporal aggregation and systematic sampling on the money-output relationship. Their results favour the use of systematic sampling rather than temporal aggregation in forming time aggregates.
Most of the literature has examined the effects of temporal aggregation and systematic sampling on causal inference in the stationary case and finds that systematic sampling preserves the direction of causality. When the series are non-stationary, in practice we difference them appropriately and conduct Granger causality tests to determine the direction of causation. Using Monte Carlo simulations, Mamingi (1996) shows that systematic sampling of integrated processes produces misleading causal inferences. Rajaguru (2004) derives the relationship between the cross-covariances of the systematically sampled and disaggregated processes and finds that systematic sampling of integrated processes converts a uni-directional system into a bi-directional one. Both studies conclude that systematic sampling of integrated processes induces spurious causality, but both assume that all variables in the system are non-stationary. They do not address the scenario in which one series is non-stationary while the others are stationary. In such cases, causality in the underlying data generating process could run from a stationary variable to a non-stationary one and vice versa. It is essential to analyse the nature of the causal distortion due to systematic sampling in such scenarios.
Moreover, conclusions about causal distortion under unit roots drawn from a cross-covariance analysis of systematically sampled data (Rajaguru, 2004) could be misleading once lagged dependent variables are included in a VAR framework. This paper derives the relationship between the cross-covariances of the aggregated and disaggregated processes for a general integrated process of order d. Incorporating lagged dependent variables in a VAR framework, it derives the condition on the order of integration under which a one-sided causal relationship appears two-sided because of systematic sampling. A Monte Carlo simulation is employed to support the theoretical findings. The paper also examines the effect of systematic sampling on causality when the true data generating process exhibits (1) uni-directional causality, (2) bi-directional causality and (3) no causality between the variables of interest.
In the next section, we derive the relationship between the theoretical cross-covariances of the aggregated and disaggregated processes. This result plays a fundamental role in our exercise and applies to both stationary and integrated processes. In Section 3, we derive the limiting values of the least squares estimates and the corresponding t-ratios of a VAR(1) process under different levels of systematic sampling.
In Section 4, we test each of our theoretical findings empirically using equity market indices and associated derivatives. In the concluding section, we highlight some important issues involved in Granger causality testing with systematically sampled data.

Sampled Series
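Let $z_t = (z_{1t}, \ldots, z_{nt})'$ be the basic vector process, where $z_{jt}$ is integrated of order $d_j$, and let $w_t = (w_{1t}, \ldots, w_{nt})'$, with $w_{jt} = (1-L)^{d_j} z_{jt}$, be a weakly stationary process with mean zero and variance-covariance matrix $\Gamma^w(0)$. Let $Z_{j\tau} = z_{j,m\tau}$ denote the j-th component systematically sampled every m periods, and let $W_{j\tau} = (1-L^m)^{d_j} z_{j,m\tau}$ be its $d_j$-th difference. Since $1 - L^m = (1 + L + \cdots + L^{m-1})(1-L)$, the $d_j$-th difference of the systematically sampled series (j-th component) is simply a weighted sum of the $d_j$-th differences of the basic series:
$$W_{j\tau} = (1 + L + \cdots + L^{m-1})^{d_j}\, w_{j,m\tau}.$$
The following proposition shows the relationship between the cross-covariances of the systematically sampled series and the basic series.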
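The statement that the d-th difference of a systematically sampled series is a weighted sum of basic differences, $(1-L^m)^{d} z_{m\tau} = (1 + L + \cdots + L^{m-1})^{d} w_{m\tau}$ with $w_t = (1-L)^d z_t$, can be checked numerically. The following Python sketch is illustrative only: the helper names `block_weights` and `check_identity` are hypothetical, the basic series is built by cumulating integer-valued shocks (so that every floating-point sum is exact), and pre-sample values are taken to be zero.

```python
import numpy as np


def block_weights(m, d):
    """Coefficients of the polynomial (1 + L + ... + L^{m-1})^d."""
    c = np.ones(1)
    for _ in range(d):
        c = np.convolve(c, np.ones(m))
    return c


def check_identity(m, d, T=600, seed=0):
    """Compare the d-th difference of z sampled every m periods with the weighted sum of w."""
    rng = np.random.default_rng(seed)
    # Integer-valued shocks keep every sum exact in floating point.
    w = rng.integers(-5, 6, size=T).astype(float)
    z = w.copy()
    for _ in range(d):
        z = np.cumsum(z)  # basic I(d) series with (1 - L)^d z_t = w_t (zero pre-sample values)
    lhs = np.diff(z[::m], n=d)  # Z_tau = z_{m tau}; lhs[i] is the d-th difference at tau = i + d
    smoothed = np.convolve(w, block_weights(m, d))  # smoothed[t] = sum_a c_a w_{t-a}
    taus = np.arange(d, d + len(lhs))
    return np.allclose(lhs, smoothed[m * taus])


for m, d in [(3, 1), (5, 2), (4, 0)]:
    print(m, d, check_identity(m, d))
```

For d = 1 the weights are all ones, so each sampled difference is the sum of the m most recent basic differences; for d = 2 the weights become triangular.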

Proposition 1
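The cross-covariance between the i-th and j-th components of the systematically sampled series, $W_{i\tau}$ and $W_{j,\tau-k}$, can be expressed in terms of the cross-covariances of the i-th and j-th components of the basic disaggregated series, $w_{it}$ and $w_{jt}$, that is,
$$\gamma^W_{ij}(k) = \operatorname{Cov}(W_{i\tau}, W_{j,\tau-k}) = (1 + L + \cdots + L^{m-1})^{d_i}\,(1 + L^{-1} + \cdots + L^{-(m-1)})^{d_j}\,\gamma^w_{ij}(mk),$$
where $\gamma^w_{ij}(k) = \operatorname{Cov}(w_{it}, w_{j,t-k})$ and L operates on the index of $\gamma^w_{ij}(k)$ such that $L\gamma^w_{ij}(k) = \gamma^w_{ij}(k-1)$. Further, if $d_1 = d_2 = \cdots = d_n = d$ (that is, all the components of the vector process are integrated of the same order d), the matrix representation of the above is
$$\Gamma^W(k) = \left[(1 + L + \cdots + L^{m-1})(1 + L^{-1} + \cdots + L^{-(m-1)})\right]^{d}\,\Gamma^w(mk),$$
where L operates on each element of the matrix $\Gamma^w(k)$.
The above proposition also reveals that the cross-covariances between the systematically sampled series are simply weighted sums of the cross-covariances of the basic series.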
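Proposition 1 can be evaluated numerically in its expanded form, $\gamma^W_{ij}(k) = \sum_{a,b} c^{(i)}_a c^{(j)}_b\,\gamma^w_{ij}(mk + b - a)$, where $c^{(i)}$ are the coefficients of $(1 + L + \cdots + L^{m-1})^{d_i}$. The sketch below assumes, purely for illustration, that $w_t$ follows a stable VAR(1), $w_t = \Phi w_{t-1} + e_t$ with error covariance `sigma`; the function names are hypothetical, only lags $k \ge 0$ are handled, and each component may have its own $d_i$.

```python
import numpy as np


def var1_autocov(phi, sigma, max_lag):
    """Gamma_w(h) = E[w_t w_{t-h}'], h = 0..max_lag, for a stable VAR(1) w_t = phi w_{t-1} + e_t."""
    n = phi.shape[0]
    # Gamma_w(0) solves the Lyapunov equation Gamma = phi Gamma phi' + sigma (vectorised form).
    g0 = np.linalg.solve(np.eye(n * n) - np.kron(phi, phi), sigma.reshape(-1))
    gam = [g0.reshape(n, n)]
    for _ in range(max_lag):
        gam.append(phi @ gam[-1])  # Yule-Walker recursion: Gamma_w(h) = phi Gamma_w(h-1)
    return np.array(gam)


def sampled_autocov(phi, sigma, m, d, k):
    """Gamma_W(k) = E[W_tau W_{tau-k}'] for k >= 0, following Proposition 1 (one d_i per series)."""
    n = phi.shape[0]
    weights = []
    for di in d:  # coefficients of (1 + L + ... + L^{m-1})^{d_i}
        c = np.ones(1)
        for _ in range(di):
            c = np.convolve(c, np.ones(m))
        weights.append(c)
    gam = var1_autocov(phi, sigma, m * k + max(len(c) for c in weights))
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            a = np.arange(len(weights[i]))[:, None]
            b = np.arange(len(weights[j]))[None, :]
            h = m * k + b - a  # Cov(w_{i, m tau - a}, w_{j, m(tau - k) - b}) = gamma_ij(h)
            # Negative lags use gamma_ij(-h) = gamma_ji(h).
            g = np.where(h >= 0, gam[np.abs(h), i, j], gam[np.abs(h), j, i])
            out[i, j] = weights[i] @ g @ weights[j]
    return out


# Illustrative (assumed) parameters: w_1 causes w_2, unit-variance uncorrelated errors.
phi = np.array([[0.5, 0.0], [1.0, 0.0]])
sigma = np.eye(2)
print(sampled_autocov(phi, sigma, m=2, d=(1, 1), k=0))
```

As a sanity check, with white-noise $w_t$ and $d_1 = d_2 = 1$ the function returns $\Gamma^W(0) = mI$ and $\Gamma^W(1) = 0$, since each sampled difference is then a sum of m non-overlapping shocks.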

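Estimates of a VAR(p) Process Based on Systematically Sampled Data
Consider the basic vector process $z_t$ and let $w_t = (w_{1t}, \ldots, w_{nt})'$, with $w_{jt} = (1-L)^{d_j} z_{jt}$, be a weakly stationary process with mean zero and variance-covariance matrix $\Gamma^w(0)$. Suppose the covariance stationary process $w_t$ has the following VAR(p) representation:
$$w_t = \Phi_1 w_{t-1} + \Phi_2 w_{t-2} + \cdots + \Phi_p w_{t-p} + e_t.$$
The variance-covariance matrix of $e_t$ is set to be diagonal to ensure that there are no contemporaneous relationships among the variables in the basic form. The system of normal equations (Yule-Walker equations) for the above process is given by
$$\Gamma^w(k) = \Phi_1 \Gamma^w(k-1) + \Phi_2 \Gamma^w(k-2) + \cdots + \Phi_p \Gamma^w(k-p), \quad k = 1, 2, \ldots,$$
where $\Gamma^w(k) = E(w_t w_{t-k}')$. Given the $\Phi_i$'s and the variance-covariance matrix of $e_t$, and using the fact that $\Gamma^w(-k) = \Gamma^w(k)'$, these equations determine the cross-covariances of the basic process. We now consider estimating the following n-variate VAR(p) model with systematically sampled series:
$$W_\tau = \Phi^*_1 W_{\tau-1} + \Phi^*_2 W_{\tau-2} + \cdots + \Phi^*_p W_{\tau-p} + U_\tau,$$
whose least squares estimates satisfy, in the limit,
$$\Gamma^W(k) = \sum_{i=1}^{p} \operatorname{plim}\hat\Phi^*_i\,\Gamma^W(k-i)$$
for k = 1, 2, …, p.
By construction, the least squares residuals $U_\tau$ are asymptotically orthogonal to the regressors $W_{\tau-k}$, k = 1, …, p. Stacking these normal equations over k, they can be re-written as
$$\left[\operatorname{plim}\hat\Phi^*_1, \ldots, \operatorname{plim}\hat\Phi^*_p\right] = \left[\Gamma^W(1), \ldots, \Gamma^W(p)\right]\mathbf{G}^{-1},$$
where $\mathbf{G}$ is the $np \times np$ block matrix whose (i, k)-th block is $\Gamma^W(k-i)$, i, k = 1, …, p. Notice from (10) that the probability limits of the estimates of the model based on systematically sampled data are functions of the cross-covariances of the aggregated process, which in turn can be expressed in terms of the cross-covariances of the basic disaggregated process. Furthermore, from (5), the cross-covariances of the basic process are functions of the coefficients of the VAR(p) of the disaggregated process. Thus, the probability limits of the estimated parameters of the aggregated VAR(p) are determined by weighted sums of the cross-covariances of the basic process, which in turn depend on the coefficients of the disaggregated VAR(p). Since our objective is to assess the effects of systematic sampling on Granger causality, and to simplify computations, we specialize the analysis to a bivariate VAR(1) process.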
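For p = 1 the stacked normal equations reduce to $\operatorname{plim}\hat\Phi^* = \Gamma^W(1)\,\Gamma^W(0)^{-1}$, and the residual covariance of the sampled VAR(1) converges to $\Gamma^W(0) - \Gamma^W(1)\Gamma^W(0)^{-1}\Gamma^W(1)'$. The sketch below evaluates these limits; it repeats hypothetical Proposition 1 helpers so that it runs on its own, assumes a VAR(1) basic process, and uses illustrative parameter values that are not taken from this paper.

```python
import numpy as np


def var1_autocov(phi, sigma, max_lag):
    """Gamma_w(h) = E[w_t w_{t-h}'], h = 0..max_lag, for a stable VAR(1)."""
    n = phi.shape[0]
    g0 = np.linalg.solve(np.eye(n * n) - np.kron(phi, phi), sigma.reshape(-1))  # Lyapunov
    gam = [g0.reshape(n, n)]
    for _ in range(max_lag):
        gam.append(phi @ gam[-1])  # Gamma_w(h) = phi Gamma_w(h-1)
    return np.array(gam)


def sampled_autocov(phi, sigma, m, d, k):
    """Gamma_W(k) for k >= 0 via Proposition 1, one d_i per component."""
    n = phi.shape[0]
    weights = []
    for di in d:  # coefficients of (1 + L + ... + L^{m-1})^{d_i}
        c = np.ones(1)
        for _ in range(di):
            c = np.convolve(c, np.ones(m))
        weights.append(c)
    gam = var1_autocov(phi, sigma, m * k + max(len(c) for c in weights))
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            h = m * k + np.arange(len(weights[j]))[None, :] - np.arange(len(weights[i]))[:, None]
            g = np.where(h >= 0, gam[np.abs(h), i, j], gam[np.abs(h), j, i])  # gamma_ij(-h) = gamma_ji(h)
            out[i, j] = weights[i] @ g @ weights[j]
    return out


def sampled_var1_plim(phi, sigma, m, d):
    """plim of the OLS coefficients and residual covariance of a VAR(1) fitted to W_tau."""
    g0 = sampled_autocov(phi, sigma, m, d, 0)
    g1 = sampled_autocov(phi, sigma, m, d, 1)
    phi_star = g1 @ np.linalg.inv(g0)  # Gamma_W(1) Gamma_W(0)^{-1}
    return phi_star, g0 - phi_star @ g1.T


# Illustrative (assumed) basic process with feedback in both directions.
phi = np.array([[0.4, 0.2], [0.3, 0.5]])
sigma = np.diag([1.0, 0.5])
for m in (1, 2, 3):
    print(m, np.round(sampled_var1_plim(phi, sigma, m, (1, 1))[0], 4))
```

Two special cases follow directly from the algebra and make useful checks: with m = 1 and d = 0 the basic coefficient matrix Φ is recovered, and for stationary series (d = 0) the sampled coefficient matrix is $\Phi^m$.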

Aggregated VAR(1) process
To derive more specific results, consider the following bivariate VAR(1) system:
$$\begin{pmatrix} w_{1t} \\ w_{2t} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \begin{pmatrix} w_{1,t-1} \\ w_{2,t-1} \end{pmatrix} + \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix},$$
i.e., $w_{it} = \phi_{i1} w_{1,t-1} + \phi_{i2} w_{2,t-1} + e_{it}$ for i = 1, 2, with $E(e_{it}^2) = \sigma_{ii}$ and $E(e_{1t}e_{2t}) = \sigma_{12}$. The system of equations described in (5) can then be solved for the cross-covariances $\gamma^w_{ij}(k)$ of the basic process. Since systematic sampling of a VAR(1) process produces a VARMA(1,h; h≤1) process at low levels of aggregation (Marcellino, 1999), we first carried out a Monte Carlo experiment by fitting VAR(p), p = 1, 2, 3, models to $W_{1\tau}$ and $W_{2\tau}$ derived from (5) for m = 3. Based on T = 10,000 replications, we observe that the coefficient estimates of the systematically sampled VAR(1) model remain largely unaffected by the VAR order. The AIC and BIC criteria also select a VAR(1) process for the systematically sampled series. We therefore proceed to obtain analytical results from the following bivariate VAR(1) process:
$$W_{i\tau} = \phi^*_{i1} W_{1,\tau-1} + \phi^*_{i2} W_{2,\tau-1} + u_{i\tau}, \quad i = 1, 2,$$
where $W_{i\tau}$ (i = 1, 2) are as defined earlier and $u_{i\tau}$ represents the error process of the aggregated model. The least squares estimates satisfy $\operatorname{plim}\hat\Phi^* = \Gamma^W(1)\,\Gamma^W(0)^{-1}$, and the associated t-ratios depend on T, the effective sample size after aggregation. An important and well-known problem of temporal aggregation or systematic sampling is the creation of contemporaneous correlation even when such correlation is absent in the basic process; this is examined below through a contemporaneous regression based on the VAR(1) system in (12) and the corresponding test statistics. It can be shown that the above parameters of the systematically sampled process, described in (25)-(30), can be expressed in terms of the moments of the disaggregated process, and these in turn can be expressed in terms of the parameters of the original basic disaggregated process using (13)-(22).
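The creation of contemporaneous correlation can be seen directly from the limiting residual covariance $\Gamma^W(0) - \Gamma^W(1)\Gamma^W(0)^{-1}\Gamma^W(1)'$ of the sampled VAR(1). The sketch below is a hypothetical illustration with assumed parameters ($\phi_{11} = 0.5$, $\phi_{21} = 1$, $\phi_{12} = \phi_{22} = 0$, unit-variance uncorrelated basic errors) and hypothetical helper names; it reports the implied residual correlation for the basic process (m = 1), for sampled stationary series (m = 2, d = 0) and for sampled I(1) series (m = 2, d = 1).

```python
import numpy as np


def var1_autocov(phi, sigma, max_lag):
    """Gamma_w(h) = E[w_t w_{t-h}'], h = 0..max_lag, for a stable VAR(1)."""
    n = phi.shape[0]
    g0 = np.linalg.solve(np.eye(n * n) - np.kron(phi, phi), sigma.reshape(-1))  # Lyapunov
    gam = [g0.reshape(n, n)]
    for _ in range(max_lag):
        gam.append(phi @ gam[-1])  # Gamma_w(h) = phi Gamma_w(h-1)
    return np.array(gam)


def sampled_autocov(phi, sigma, m, d, k):
    """Gamma_W(k) for k >= 0 via Proposition 1, one d_i per component."""
    n = phi.shape[0]
    weights = []
    for di in d:  # coefficients of (1 + L + ... + L^{m-1})^{d_i}
        c = np.ones(1)
        for _ in range(di):
            c = np.convolve(c, np.ones(m))
        weights.append(c)
    gam = var1_autocov(phi, sigma, m * k + max(len(c) for c in weights))
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            h = m * k + np.arange(len(weights[j]))[None, :] - np.arange(len(weights[i]))[:, None]
            g = np.where(h >= 0, gam[np.abs(h), i, j], gam[np.abs(h), j, i])  # gamma_ij(-h) = gamma_ji(h)
            out[i, j] = weights[i] @ g @ weights[j]
    return out


def sampled_var1_plim(phi, sigma, m, d):
    """plim of the OLS coefficients and residual covariance of a VAR(1) fitted to W_tau."""
    g0 = sampled_autocov(phi, sigma, m, d, 0)
    g1 = sampled_autocov(phi, sigma, m, d, 1)
    phi_star = g1 @ np.linalg.inv(g0)
    return phi_star, g0 - phi_star @ g1.T  # Gamma_W(0) - Gamma_W(1) Gamma_W(0)^{-1} Gamma_W(1)'


def residual_corr(s):
    """Correlation implied by a 2 x 2 covariance matrix."""
    return s[0, 1] / np.sqrt(s[0, 0] * s[1, 1])


phi = np.array([[0.5, 0.0], [1.0, 0.0]])
sigma = np.eye(2)  # no contemporaneous correlation in the basic errors
for m, d in [(1, (0, 0)), (2, (0, 0)), (2, (1, 1))]:
    _, s = sampled_var1_plim(phi, sigma, m, d)
    print(m, d, round(residual_corr(s), 4))
```

With these assumed parameters the basic errors are uncorrelated, while both sampled specifications yield a non-zero residual correlation; for m = 2 and d = 0 the limiting residual covariance is $\Sigma + \Phi\Sigma\Phi'$, whose off-diagonal element is $\phi_{11}\phi_{21}\sigma_{11}$.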

Proposition 2
If there is no Granger causality between the basic series, then there is no Granger causality between the systematically sampled series. Proof: In this case $\phi_{12} = \phi_{21} = 0$, and with $\sigma_{12} = 0$ the two series are uncorrelated. Therefore, from (18), (19) and (22), $\gamma^w_{12}(k) = \gamma^w_{21}(k) = 0$ for all k. Further, from Proposition 1, $\gamma^W_{ij}(k) = 0$ for all k and $i \neq j$. Thus, if the cross-covariances between the basic series are zero, then the cross-covariances between the systematically sampled series are also zero. From (25) and (26) we then see that $\operatorname{plim}\hat\phi^*_{12} = \operatorname{plim}\hat\phi^*_{21} = 0$. Thus, if there is no Granger causality between the basic series, then Granger causality between the systematically sampled series is also absent.
The general result described by Proposition 2 does not depend on the $d_i$'s. In particular, two independent random walks that are systematically sampled remain causally unrelated when they are estimated in differenced form. It can also be inferred that $\gamma^W_{12}(0) = 0$, suggesting that systematic sampling does not create any contemporaneous correlation among the variables when there is no Granger causality between them in the disaggregated form.
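Proposition 2 can be checked with the same machinery. In the sketch below (hypothetical helper names, assumed parameter values), the basic VAR(1) has diagonal Φ and diagonal error covariance, and the off-diagonal elements of the limiting coefficient and residual covariance matrices are computed for several combinations of m and $(d_1, d_2)$, including mixed orders of integration.

```python
import numpy as np


def var1_autocov(phi, sigma, max_lag):
    """Gamma_w(h) = E[w_t w_{t-h}'], h = 0..max_lag, for a stable VAR(1)."""
    n = phi.shape[0]
    g0 = np.linalg.solve(np.eye(n * n) - np.kron(phi, phi), sigma.reshape(-1))  # Lyapunov
    gam = [g0.reshape(n, n)]
    for _ in range(max_lag):
        gam.append(phi @ gam[-1])  # Gamma_w(h) = phi Gamma_w(h-1)
    return np.array(gam)


def sampled_autocov(phi, sigma, m, d, k):
    """Gamma_W(k) for k >= 0 via Proposition 1, one d_i per component."""
    n = phi.shape[0]
    weights = []
    for di in d:  # coefficients of (1 + L + ... + L^{m-1})^{d_i}
        c = np.ones(1)
        for _ in range(di):
            c = np.convolve(c, np.ones(m))
        weights.append(c)
    gam = var1_autocov(phi, sigma, m * k + max(len(c) for c in weights))
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            h = m * k + np.arange(len(weights[j]))[None, :] - np.arange(len(weights[i]))[:, None]
            g = np.where(h >= 0, gam[np.abs(h), i, j], gam[np.abs(h), j, i])  # gamma_ij(-h) = gamma_ji(h)
            out[i, j] = weights[i] @ g @ weights[j]
    return out


def sampled_var1_plim(phi, sigma, m, d):
    """plim of the OLS coefficients and residual covariance of a VAR(1) fitted to W_tau."""
    g0 = sampled_autocov(phi, sigma, m, d, 0)
    g1 = sampled_autocov(phi, sigma, m, d, 1)
    phi_star = g1 @ np.linalg.inv(g0)
    return phi_star, g0 - phi_star @ g1.T


# Assumed basic process with no Granger causality and sigma_12 = 0.
phi = np.diag([0.6, 0.3])
sigma = np.diag([1.0, 2.0])
for m, d in [(3, (1, 1)), (4, (1, 0)), (2, (2, 0))]:
    phi_star, sigma_star = sampled_var1_plim(phi, sigma, m, d)
    print(m, d, phi_star[0, 1], phi_star[1, 0], sigma_star[0, 1])
```

All three off-diagonal quantities are zero up to rounding, whatever the $d_i$'s, in line with the proposition.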

Case 2: Causality between the disaggregated series is one-sided
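Let $\phi_{12} = 0$, so that $w_{2t}$ does not Granger cause $w_{1t}$ and there is uni-directional causality from $w_{1t}$ to $w_{2t}$. With $\sigma_{12} = 0$, it can be shown that
$$\gamma^w_{11}(0) = \frac{\sigma_{11}}{1-\phi_{11}^2}, \qquad \gamma^w_{12}(0) = \frac{\phi_{11}\phi_{21}\,\gamma^w_{11}(0)}{1-\phi_{11}\phi_{22}}, \qquad \gamma^w_{22}(0) = \frac{\phi_{21}^2\,\gamma^w_{11}(0) + 2\phi_{21}\phi_{22}\,\gamma^w_{12}(0) + \sigma_{22}}{1-\phi_{22}^2},$$
with the lagged cross-covariances following from $\Gamma^w(k) = \Phi\,\Gamma^w(k-1)$, k ≥ 1, where $\Phi = [\phi_{ij}]$. It has been well established in the earlier literature (Mamingi (1998) and Rajaguru (2004)) that, for the stationary case, systematic sampling preserves the direction of Granger causality. In this section, we establish the condition under which the uni-directional causal system turns into a feedback system due to systematic sampling.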
Thus we have the following theorem.

Theorem 1
Systematic sampling induces spurious bi-directional Granger causality among the variables if the uni-directional causality runs from a non-stationary series to either a stationary or a non-stationary series.
Equivalently, systematic sampling induces spurious bi-directional Granger causality among the variables if $d_1 > 0$.
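Theorem 1 can be illustrated with the limiting coefficients for a simple uni-directional system. The sketch below assumes $\phi_{11} = 0.5$, $\phi_{21} = 1$, $\phi_{12} = \phi_{22} = 0$, unit-variance uncorrelated errors and m = 2; these values are our own illustrative choice, not the example analysed in the paper, and the helper functions are hypothetical. The quantity of interest is $\operatorname{plim}\hat\phi^*_{12}$, the spurious reverse coefficient.

```python
import numpy as np


def var1_autocov(phi, sigma, max_lag):
    """Gamma_w(h) = E[w_t w_{t-h}'], h = 0..max_lag, for a stable VAR(1)."""
    n = phi.shape[0]
    g0 = np.linalg.solve(np.eye(n * n) - np.kron(phi, phi), sigma.reshape(-1))  # Lyapunov
    gam = [g0.reshape(n, n)]
    for _ in range(max_lag):
        gam.append(phi @ gam[-1])  # Gamma_w(h) = phi Gamma_w(h-1)
    return np.array(gam)


def sampled_autocov(phi, sigma, m, d, k):
    """Gamma_W(k) for k >= 0 via Proposition 1, one d_i per component."""
    n = phi.shape[0]
    weights = []
    for di in d:  # coefficients of (1 + L + ... + L^{m-1})^{d_i}
        c = np.ones(1)
        for _ in range(di):
            c = np.convolve(c, np.ones(m))
        weights.append(c)
    gam = var1_autocov(phi, sigma, m * k + max(len(c) for c in weights))
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            h = m * k + np.arange(len(weights[j]))[None, :] - np.arange(len(weights[i]))[:, None]
            g = np.where(h >= 0, gam[np.abs(h), i, j], gam[np.abs(h), j, i])  # gamma_ij(-h) = gamma_ji(h)
            out[i, j] = weights[i] @ g @ weights[j]
    return out


def sampled_var1_plim(phi, sigma, m, d):
    """plim of the OLS coefficients and residual covariance of a VAR(1) fitted to W_tau."""
    g0 = sampled_autocov(phi, sigma, m, d, 0)
    g1 = sampled_autocov(phi, sigma, m, d, 1)
    phi_star = g1 @ np.linalg.inv(g0)
    return phi_star, g0 - phi_star @ g1.T


phi = np.array([[0.5, 0.0], [1.0, 0.0]])  # phi_12 = 0: w_1 causes w_2 only
sigma = np.eye(2)
for d in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    phi_star, _ = sampled_var1_plim(phi, sigma, 2, d)
    print(d, "plim phi*_12 =", round(phi_star[0, 1], 4))
```

Working through Proposition 1 by hand for these values gives $\operatorname{plim}\hat\phi^*_{12} = 0$ for $(d_1, d_2) = (0, 0)$ and $(0, 1)$, but $-0.1875$ for $(1, 0)$ and $-0.1$ for $(1, 1)$: in this example spurious feedback appears exactly when the causal series is integrated, as Theorem 1 states.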
To analyze the magnitude of the causal distortion due to systematic sampling in the presence of unit roots, we consider the following example.
Notice that the expression described above for systematic sampling of I(1) series is the same as that for temporal aggregation of the corresponding I(0) series, since the first difference of a systematically sampled I(1) series is the sum of the m most recent basic differences.
The key findings are summarized below: 1) If the one-sided causality runs from a white noise series (in differences) to a difference-stationary series in the basic disaggregated form, then systematic sampling will not produce a spurious feedback relationship even if $d_1 = 1$.
However, this may not hold when $d_1 > 1$.
2) As m increases, the VAR(1) tends to become a VAR(0). However, when $\phi_{11}$ reaches unity, we obtain a near-cointegrated specification in the I(2) space and, as a result, the VAR(1) remains a VAR(1) as m increases.
3) The conversion from VAR(1) to VAR(0) at higher orders of systematic sampling confirms that the converse of Proposition 2 need not be true. Hence, failing to find causality among the variables with systematically sampled data does not necessarily mean that the variables are unrelated in the disaggregated form.
4) The contemporaneous regressions also show that, under systematic sampling of integrated processes, all causal information concentrates in the contemporaneous relationship among the variables. Moreover, the spurious contemporaneous relationships do not disappear even at larger orders of aggregation.
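Findings 1) and 2) can be illustrated numerically. The sketch below (hypothetical helpers, assumed parameters) first sets $\phi_{11} = 0$, so that the causal series is white noise in differences, and shows that $\operatorname{plim}\hat\phi^*_{12}$ vanishes for $d_1 = 1$; it then tracks the largest absolute limiting coefficient of the sampled VAR(1) as m grows, for a system with $\phi_{11} = 0.5$ and $d_1 = d_2 = 1$.

```python
import numpy as np


def var1_autocov(phi, sigma, max_lag):
    """Gamma_w(h) = E[w_t w_{t-h}'], h = 0..max_lag, for a stable VAR(1)."""
    n = phi.shape[0]
    g0 = np.linalg.solve(np.eye(n * n) - np.kron(phi, phi), sigma.reshape(-1))  # Lyapunov
    gam = [g0.reshape(n, n)]
    for _ in range(max_lag):
        gam.append(phi @ gam[-1])  # Gamma_w(h) = phi Gamma_w(h-1)
    return np.array(gam)


def sampled_autocov(phi, sigma, m, d, k):
    """Gamma_W(k) for k >= 0 via Proposition 1, one d_i per component."""
    n = phi.shape[0]
    weights = []
    for di in d:  # coefficients of (1 + L + ... + L^{m-1})^{d_i}
        c = np.ones(1)
        for _ in range(di):
            c = np.convolve(c, np.ones(m))
        weights.append(c)
    gam = var1_autocov(phi, sigma, m * k + max(len(c) for c in weights))
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            h = m * k + np.arange(len(weights[j]))[None, :] - np.arange(len(weights[i]))[:, None]
            g = np.where(h >= 0, gam[np.abs(h), i, j], gam[np.abs(h), j, i])  # gamma_ij(-h) = gamma_ji(h)
            out[i, j] = weights[i] @ g @ weights[j]
    return out


def sampled_var1_plim(phi, sigma, m, d):
    """plim of the OLS coefficients and residual covariance of a VAR(1) fitted to W_tau."""
    g0 = sampled_autocov(phi, sigma, m, d, 0)
    g1 = sampled_autocov(phi, sigma, m, d, 1)
    phi_star = g1 @ np.linalg.inv(g0)
    return phi_star, g0 - phi_star @ g1.T


# 1) White-noise causal series in differences (phi_11 = 0): no spurious feedback for d_1 = 1.
phi_wn = np.array([[0.0, 0.0], [1.0, 0.3]])
for d in [(1, 0), (1, 1)]:
    print("white-noise cause", d, sampled_var1_plim(phi_wn, np.eye(2), 3, d)[0][0, 1])

# 2) VAR(1) -> VAR(0): the sampled coefficients shrink as m grows (d_1 = d_2 = 1).
phi = np.array([[0.5, 0.0], [1.0, 0.0]])
for m in (2, 5, 20, 100, 500):
    print(m, np.abs(sampled_var1_plim(phi, np.eye(2), m, (1, 1))[0]).max())
```

The shrinkage in the second loop reflects the fact that, for $d_i = 1$, $\Gamma^W(0)$ grows roughly in proportion to m while $\Gamma^W(1)$ stays bounded, so $\operatorname{plim}\hat\Phi^*$ decays roughly like 1/m.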

Case 3: Granger causality between the original series is bi-directional
In this case both $\phi_{12}$ and $\phi_{21}$ are non-zero. The required aggregated parameters are given in (25) and (26). To make computations easier, we impose simplifying restrictions on the remaining parameters. Scenario 1: Stationary processes ($d_1 = d_2 = 0$). The interesting feature of the above derivation is that, at lower levels of aggregation, systematic sampling preserves the feedback causal relation among the variables when the order of systematic sampling is odd. Since, by construction, $\phi_{12}\phi_{21} < 1$, the corresponding limits follow from (45) and (46). For integrated processes, notice that the expression for systematic sampling when $d_1 = d_2 = 1$ is the same as that for temporal aggregation when $d_1 = d_2 = 0$. Thus, all the inferences made for temporal aggregation of stationary processes apply to systematic sampling of I(1) processes.
The key findings are summarized below: 1) Just as in the one-way causal system the VAR(1) in the feedback system tends to become VAR(0) as m increases.
2) What is more disturbing, though, is that a positive $\phi_{12}$ may become a negative $\operatorname{plim}\hat\phi^*_{12}$. Furthermore, the magnitudes of $\operatorname{plim}\hat\phi^*_{12}$ are such that in practice one could easily conclude that causality is one-way although it is bi-directional.

Contemporaneous Correlation:
Again, consider estimating the contemporaneous regression equation given by (29); the cross-covariances in this expression follow from Proposition 1. Notice that the cross-covariances for systematic sampling of I(1) series are the same as those for temporal aggregation of the corresponding I(0) series.
The key findings are summarized below: 1) As observed by Ericsson et al. (2001) for m = 2, the contemporaneous regression coefficient (and hence the correlation) can be positive, negative or zero at any level of aggregation.
2) We observe from the Monte Carlo results reported in Appendix 1 that if both $\phi_{12}$ and $\phi_{21}$ are positive (negative), then the contemporaneous correlation is also positive (negative). However, when the two parameters are of opposite signs, the sign of the contemporaneous correlation is determined by the sign of the larger of the two in absolute value. If $\phi_{12}$ and $\phi_{21}$ are of opposite sign with the same magnitude, then the contemporaneous regression coefficient is simply zero.
3) Since most stock variables exhibit unit roots, contemporaneous regression relationships established from systematically sampled data could be very misleading.
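The sign pattern in finding 2) can be illustrated with the contemporaneous covariance $\gamma^W_{12}(0)$ of the sampled I(1) series, which we use here as a simple stand-in for the regression in (29); this is an assumption, since (29) may include further regressors. The parameter values and helper names below are hypothetical. The "opposite" case also assumes $\phi_{11} = \phi_{22}$ and equal unit error variances, which makes $\gamma^w_{12}(h)$ an odd function of h, so the symmetric weighted sum in Proposition 1 cancels exactly.

```python
import numpy as np


def var1_autocov(phi, sigma, max_lag):
    """Gamma_w(h) = E[w_t w_{t-h}'], h = 0..max_lag, for a stable VAR(1)."""
    n = phi.shape[0]
    g0 = np.linalg.solve(np.eye(n * n) - np.kron(phi, phi), sigma.reshape(-1))  # Lyapunov
    gam = [g0.reshape(n, n)]
    for _ in range(max_lag):
        gam.append(phi @ gam[-1])  # Gamma_w(h) = phi Gamma_w(h-1)
    return np.array(gam)


def sampled_autocov(phi, sigma, m, d, k):
    """Gamma_W(k) for k >= 0 via Proposition 1, one d_i per component."""
    n = phi.shape[0]
    weights = []
    for di in d:  # coefficients of (1 + L + ... + L^{m-1})^{d_i}
        c = np.ones(1)
        for _ in range(di):
            c = np.convolve(c, np.ones(m))
        weights.append(c)
    gam = var1_autocov(phi, sigma, m * k + max(len(c) for c in weights))
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            h = m * k + np.arange(len(weights[j]))[None, :] - np.arange(len(weights[i]))[:, None]
            g = np.where(h >= 0, gam[np.abs(h), i, j], gam[np.abs(h), j, i])  # gamma_ij(-h) = gamma_ji(h)
            out[i, j] = weights[i] @ g @ weights[j]
    return out


def contemporaneous_cov(phi, m, d):
    """gamma^W_12(0) of the sampled series, assuming unit-variance uncorrelated basic errors."""
    return sampled_autocov(phi, np.eye(2), m, d, 0)[0, 1]


same_pos = np.array([[0.0, 0.4], [0.4, 0.0]])   # phi_12 = phi_21 > 0
same_neg = -same_pos                             # phi_12 = phi_21 < 0
opposite = np.array([[0.3, 0.4], [-0.4, 0.3]])   # phi_12 = -phi_21, equal magnitude
for name, phi in [("same_pos", same_pos), ("same_neg", same_neg), ("opposite", opposite)]:
    print(name, contemporaneous_cov(phi, 1, (0, 0)), contemporaneous_cov(phi, 2, (1, 1)))
```

In all three cases the contemporaneous covariance of the basic series is zero, so any non-zero value after sampling is created by the sampling itself; in the first case it equals $2\phi_{12}/(1-\phi_{12}^2) \approx 0.952$ at m = 2.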

Example 1 - VIX vs SPVXSTR I(0)/I(1)
The CBOE Volatility Index (VIX) is calculated from price quotes on the nearest and second-nearest S&P 500 index options, as described on the CBOE's website at http:// The numbers of cases at 5-minute and 10-minute intervals are 538 and 600, respectively.
What is left at the higher sampling intervals is the contemporaneous correlation between VIX and VST. Importantly, as in the theoretical results, cases with no causal relationship remain so at all sampling intervals.

Example 2 - SPX vs VIX I(0)/I(0)
The S&P 500 index (SPX) is the most widely used gauge of US equities, and its calculation is described on Standard and Poor's website (www.us.spindices.com). The behaviour of the S&P 500 relative to the VIX is well documented in the finance literature (Whaley, 2009). Like the VIX, the S&P 500 index is stationary in levels. We use intraday data for the VIX and SPX, again available from Thomson Reuters through the SIRCA portal, from January 2010 to December 2014. Causality is analysed with sampling at 15-second and 1-, 5- and 10-minute intervals. At 15-second intervals, we observe that SPX leads VIX in 258 cases. The causality remains uni-directional in 120 cases at the 1-minute interval, and spurious causality from VIX to SPX is observed in only 11 cases. This is consistent with our theoretical finding that uni-directional causality does not induce spurious reverse causality for stationary variables. At higher sampling intervals, uni-directional causal relationships could be misinterpreted as the absence of causal links between the variables of interest. Again, cases with no causal relationship remain so at all sampling intervals.
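A minimal end-to-end sketch of the sampling exercise in this section, using simulated rather than market data: a hypothetical basic-frequency VAR(1) in which x causes y is cumulated into I(1) "price" levels, sampled every m ticks, differenced, and tested with a standard single-equation Granger F statistic computed by OLS. The simulation design, the function names and the lag choice p = 1 are illustrative assumptions; this is not the procedure used to produce the case counts reported above.

```python
import numpy as np


def simulate_var1(phi, T, seed=0):
    """Simulate w_t = phi w_{t-1} + e_t with independent N(0, 1) errors, starting at w_0 = 0."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((T, phi.shape[0]))
    w = np.zeros_like(e)
    for t in range(1, T):
        w[t] = phi @ w[t - 1] + e[t]
    return w


def lag_matrix(s, p):
    """Columns s_{t-1}, ..., s_{t-p} for t = p, ..., len(s) - 1."""
    n = len(s)
    return np.column_stack([s[p - i:n - i] for i in range(1, p + 1)])


def granger_f(cause, effect, p=1):
    """F statistic for H0: p lags of `cause` do not help predict `effect` beyond its own p lags."""
    y = effect[p:]
    x_r = np.column_stack([np.ones(len(y)), lag_matrix(effect, p)])  # restricted model
    x_u = np.column_stack([x_r, lag_matrix(cause, p)])               # adds lags of `cause`

    def rss(x):
        beta, *_ = np.linalg.lstsq(x, y, rcond=None)
        return float(np.sum((y - x @ beta) ** 2))

    rss_r, rss_u = rss(x_r), rss(x_u)
    return ((rss_r - rss_u) / p) / (rss_u / (len(y) - x_u.shape[1]))


# Hypothetical basic-frequency differences: x = w[:, 0] Granger-causes y = w[:, 1] only.
phi = np.array([[0.5, 0.0], [0.5, 0.3]])
w = simulate_var1(phi, 60_000)
z = np.cumsum(w, axis=0)  # I(1) levels whose first differences are w

for m in (1, 5, 10):
    W = np.diff(z[::m], axis=0)  # sample the levels every m ticks, then difference
    print(f"m={m:2d}  F(x->y)={granger_f(W[:, 0], W[:, 1]):10.1f}  "
          f"F(y->x)={granger_f(W[:, 1], W[:, 0]):8.2f}")
```

Since the simulated levels are I(1) and the causal series has $\phi_{11} \neq 0$, Theorem 1 suggests that the reverse statistic F(y->x) need not remain small once m > 1; the printed values should be read as an illustration, not as a replication of the empirical results.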

Example 3 - ES1 vs SPVXSTR I(1)/I(1)
The E-Mini futures contract is based on the S&P 500 index (ES), and the SC1 Index tracks the closest-to-maturity E-mini contract, rolling close to maturity. Due to data availability, index data were collected from Bloomberg at 1-minute intervals (SC1/ES1). Like SPVXSTR, ES1 is non-stationary in levels. Causality is analysed with sampling at 1-, 5- and 10-minute intervals. The results are consistent with the theoretical finding that bi-directional causality between non-stationary variables turns into uni-directional causality at lower sampling intervals. In the same case, the uni-directional causality from ES1 to VST (219 episodes at the 5-minute interval) turns into reverse causality from VST to ES1 at the 10-minute interval. This is consistent with our theoretical finding that causality between non-stationary variables leads to spurious Granger causality when they are estimated in differenced form. At higher sampling intervals, bi-directional causal relationships could be misinterpreted as the absence of causal links between the variables of interest, even if the non-stationary variables are estimated in differenced form.

Conclusion
Economists often have to use systematically sampled data in Granger causality testing. The theoretical literature has shown that temporal aggregation may distort the causal links between variables, while systematic sampling of stationary variables preserves the causal directions. Our exercise provides an analytical, quantitative assessment of the nature of the distortions created by systematic sampling. The following observations emerge from this exercise: (1) If the one-sided causality runs from a white noise series (in differences) to a difference-stationary series in the basic disaggregated form, then systematic sampling will not produce a spurious feedback relationship even if $d_1 = 1$. However, this may not hold when $d_1 > 1$; this case may be similar to temporal aggregation of non-stationary variables.
(2) As m increases, the VAR(1) tends to become a VAR(0). However, when $\phi_{11}$ reaches unity, we obtain a near-cointegrated specification in the I(2) space and, as a result, the VAR(1) remains a VAR(1) as m increases.
(3) The contemporaneous regressions also show that, under systematic sampling of integrated processes, all causal information concentrates in the contemporaneous relationship among the variables. Moreover, the spurious contemporaneous relationships do not disappear even at larger orders of aggregation.
The empirical results based on the stationary variables (SPX vs VIX) show that a uni-directional causal relationship remains uni-directional at lower sampling intervals. This is consistent with our theoretical finding that uni-directional causality does not induce spurious reverse causality for stationary variables. At higher sampling intervals, uni-directional causal relationships could be misinterpreted as the absence of causal links between the variables of interest. On the other hand, for the non-stationary variables (ES1 vs SPVXSTR), spurious causal relationships emerge when the variables are estimated in differenced form. This is consistent with our theoretical finding that systematic sampling induces spurious causality when non-stationary variables are estimated in differenced form.

Appendix 1: Proof of Theorem 1
To complete the proof of this theorem, we need to consider four scenarios, according to whether each of $d_1$ and $d_2$ is zero or positive. Scenario 1 ($d_1 = d_2 = 0$): here $\operatorname{plim}\hat\phi^*_{12} = 0$, suggesting that if the Granger causality between the stationary series is uni-directional, then systematic sampling preserves the direction of causality. This is another proof of the results in Wei (1982) and Cunningham and Vilasuso (1995) (the latter based on Monte Carlo simulations). Based on this result, we strongly recommend that practitioners who study, for example, the relationship between short- and long-term interest rates should not use time averages of the interest rates if the rates are I(0) series. They should instead use systematically sampled values such as end-of-period rates.
Scenario 2: Here the VAR is constructed for $z_{1t}$ and $(1-L)^{d_2} z_{2t}$. If the uni-directional Granger causality runs from a non-stationary series to a non-stationary series, then one could observe a spurious bi-directional feedback relationship between them in systematically sampled form. Thus, in summary, as long as the causal variable is non-stationary (i.e., $d_1 > 0$), regardless of whether the output series is stationary, one may observe spurious feedback relationships among the variables with systematically sampled data when the causality between them is uni-directional in the basic disaggregated form. Q.E.D.