BitCoin meets Google Trends and Wikipedia: Quantifying the relationship between phenomena of the Internet era | Scientific Reports

Dataset

We analyze the dynamic properties of the BitCoin currency (the most popular of the digital currencies) and of the search queries on Google Trends and Wikipedia as proxies of investors’ interest and attention. Time series for the BitCoin currency at the most liquid market (Mt. Gox) are available from 17 July 2010, with the highest reported frequency (a tick) of 1 minute. However, the market remained highly illiquid for approximately the first year of its existence. To separate the illiquid period from the liquid one, we examine the number of ticks with a non-zero return during each day. Fig. 1 depicts the evolution of BitCoin liquidity. As a benchmark, we also show the number of 1-minute ticks in an 8-hour trading day, i.e. 480 ticks. Even though the BitCoin market operates 24/7, we use the 8-hour trading day as a simple benchmark of a liquid market. The number of ticks approaches this threshold approximately in the middle of 2011, and closer inspection reveals that since the beginning of May 2011, it has fluctuated around the 8-hour benchmark. Therefore, we analyze the series from 1 May 2011 to 30 June 2013. For Google Trends, we work with weekly data and obtain 113 observations in total, while for Wikipedia, daily data are available, giving 788 observations.
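
The liquidity screen described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the raw data arrive as a time-ordered list of (datetime, price) pairs at 1-minute resolution; the function names and the input layout are hypothetical, not taken from the original study. A tick counts as active if its price differs from that of the preceding tick, and each active tick is attributed to the calendar day on which it occurs.

```python
from collections import Counter
from datetime import date, datetime

# Benchmark used in the text: 1-minute ticks in an 8-hour trading day.
BENCHMARK_TICKS = 8 * 60  # = 480


def nonzero_return_ticks_per_day(ticks):
    """Count ticks with a non-zero return per calendar day.

    `ticks` is a time-ordered iterable of (datetime, price) pairs (hypothetical
    input layout). A tick has a non-zero return if its price differs from the
    previous tick's price; the very first tick has no return and is skipped.
    Days without any price change are absent (a Counter returns 0 for them).
    """
    counts = Counter()
    prev_price = None
    for ts, price in ticks:
        if prev_price is not None and price != prev_price:
            counts[ts.date()] += 1
        prev_price = price
    return counts


def share_of_benchmark(counts, day, benchmark=BENCHMARK_TICKS):
    """Ratio of a day's active ticks to the 8-hour benchmark (1.0 = at benchmark)."""
    return counts[day] / benchmark


sample = [
    (datetime(2011, 5, 1, 0, 0), 10.0),
    (datetime(2011, 5, 1, 0, 1), 10.5),  # price change -> counted
    (datetime(2011, 5, 1, 0, 2), 10.5),  # no change
    (datetime(2011, 5, 1, 0, 3), 10.2),  # price change -> counted
    (datetime(2011, 5, 2, 0, 0), 10.2),  # no change across midnight
    (datetime(2011, 5, 2, 0, 1), 10.3),  # price change -> counted
]
counts = nonzero_return_ticks_per_day(sample)
print(counts[date(2011, 5, 1)], counts[date(2011, 5, 2)])  # 2 1
```

Plotting these daily counts against the constant 480 reproduces the kind of comparison shown in Fig. 1; the choice of cut-off date remains a judgment call, as in the text.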

Figure 1

Evolution of the number of ticks.

The number of ticks with a non-zero return per day is shown. The red line represents the number of 1-minute ticks in an 8-hour trading day and is shown only for illustration. In the first days of the BitCoin market’s existence, there was practically no liquidity. From approximately May 2011, liquidity reached satisfactory levels.

The evolution of both pairs (Google Trends weekly and Wikipedia daily, each with the corresponding BitCoin prices) is illustrated in Fig. 2. The daily series of Wikipedia views provides a more detailed picture of the behavior of Internet users’ interest and attention, together with a higher potential for more precise statistical analysis. We observe that the prices of the digital currency are strongly correlated with the search queries of both engines. Specifically, the correlations reach 0.8786 (with t(111) = 19.3850[<0.01], where the p-value is shown in the square brackets) and 0.8271 (with t(786) = 41.2587[<0.01]) for Google Trends and Wikipedia, respectively. The strength of these relationships is illustrated in Fig. 3, where a strong linear correlation between logarithmic prices and logarithmic search frequencies is evident. The fact that the correlation is most apparent for the log-log specification is the first hint that the logarithmic transforms, rather than the original series, should be analyzed. Moreover, the log-log specification allows for an easy interpretation of the relationship as an elasticity. This point is developed further in the next section, where the stationarity and cointegration of the series are discussed.
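
The reported t statistics follow from the standard test of a zero Pearson correlation, t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom. The sketch below (plain Python; helper names are illustrative) computes the correlation of log-transformed series, the corresponding t statistic, and the OLS slope in the log-log specification, which is the elasticity mentioned above. Plugging in the reported correlations reproduces the reported statistics up to the rounding of r.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0; compare with Student t on n - 2 d.f."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def log_log_fit(prices, searches):
    """Correlation and OLS slope of log(price) on log(search).

    The slope is an elasticity: a 1% change in search activity is associated
    with a (slope)% change in price.
    """
    lx = [math.log(s) for s in searches]
    ly = [math.log(p) for p in prices]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum(
        (a - mx) ** 2 for a in lx
    )
    return pearson_r(lx, ly), slope


# Reported correlations and sample sizes (113 weekly, 788 daily observations).
print(round(correlation_t_stat(0.8786, 113), 2))  # 19.38
print(round(correlation_t_stat(0.8271, 788), 2))  # 41.26
```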

Figure 2

Evolution of BitCoin prices and search queries.

Weekly series for BitCoin and Google Trends are shown on the left and daily series for BitCoin and Wikipedia on the right. The search queries are clearly positively correlated with the prices, with correlations of 0.8786 and 0.8271 for Google Trends and Wikipedia, respectively (on a log-log scale). The BitCoin bubble of 2013 is accompanied by soaring search queries in both databases.

Figure 3

Relationship between BitCoin price and search queries.

Double logarithmic illustration of the correlation between BitCoin prices and the search terms (Google Trends on the left and Wikipedia on the right). A positive dependence is evident and holds for practically the whole range, with correlations of 0.8786 and 0.8271 for Google Trends and Wikipedia, respectively.

Stationarity & cointegration

To cover various combinations of relationships, we initially study all standard transformations of the original series, i.e. the logarithmic transformation, the first differences and the first logarithmic differences. For each series, we test stationarity using the KPSS14 and ADF15 tests. As the two tests have opposite null and alternative hypotheses (stationarity for KPSS, a unit root for ADF), they form an ideal pair for stationarity vs. unit-root testing. Table 1 summarizes these results. For the BitCoin prices (both daily and weekly), we find both the original and the logarithmic series to be non-stationary and to contain a unit root. Correspondingly, their first differences are stationary. The same results hold for the Wikipedia daily views, but for the Google Trends queries, we find a unit root only in the logarithmic transformation of the search series. For this reason, and also for more convenient interpretation, we opt for the logarithmic series.
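
To make the unit-root logic concrete, here is a minimal Dickey-Fuller sketch in plain Python. It is a simplification of the ADF test used in the text: it regresses Δy_t on a constant and y_{t−1} without the augmentation lags of Δy, and compares the t statistic on y_{t−1} with the asymptotic 5% critical value of about −2.86 for the constant-only case. A statistic above the critical value means the unit-root null is not rejected. The simulated series are illustrative, not BitCoin data, and a KPSS counterpart (with stationarity as the null) is omitted for brevity.

```python
import itertools
import math
import random

# Asymptotic 5% critical value, Dickey-Fuller regression with a constant, no trend.
DF_CRITICAL_5PCT = -2.86


def dickey_fuller_t(y):
    """t statistic on rho in  dy_t = a + rho * y_{t-1} + e_t  (no augmentation lags)."""
    x = y[:-1]  # y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    rho = sum((v - mx) * (d - md) for v, d in zip(x, dy)) / sxx
    a = md - rho * mx
    ssr = sum((d - a - rho * v) ** 2 for v, d in zip(x, dy))
    se = math.sqrt(ssr / (n - 2) / sxx)  # OLS standard error of rho
    return rho / se


def has_unit_root(y, critical=DF_CRITICAL_5PCT):
    """True if the unit-root null is NOT rejected at the given critical value."""
    return dickey_fuller_t(y) > critical


rng = random.Random(42)
shocks = [rng.gauss(0.0, 1.0) for _ in range(500)]
walk = list(itertools.accumulate(shocks))         # I(1): contains a unit root
diffs = [b - a for a, b in zip(walk, walk[1:])]   # first differences: stationary
print(dickey_fuller_t(walk), dickey_fuller_t(diffs))
```

The pattern mirrors Table 1: the level (or log level) of a random walk does not reject the unit root, while its first differences reject it decisively.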

Table 1 Stationarity and unit-root tests

Turning now to the dynamic properties of, and interconnections between, the series, we are first interested in a potential cointegration relationship. Cointegration methodology has proved very useful in various economic and financial studies, ranging from economic development16,17 through monetary economics18,19 and international economics20,21,22 to energy economics23,24, as it enables the study of a long-term relationship between series as well as their short-term dependence via error-correction models (see the Methods section for more details). To test for cointegration, we utilize the two tests of Johansen25, the trace and the maximum eigenvalue (likelihood ratio) tests. Table 2 shows the results for both pairs: the BitCoin series is not cointegrated with the Google Trends series, but it is cointegrated with the Wikipedia series. Therefore, for the first pair, we turn to the vector autoregression (VAR) methodology applied to the first logarithmic differences (see the Methods section for more details), and for the second pair, we stick to the standard cointegration and vector error-correction model (VECM) framework.
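
The Johansen trace and maximum eigenvalue tests rely on a reduced-rank regression that is too long to reproduce here. As a simpler stand-in that conveys the same idea, the sketch below implements the two-step Engle-Granger procedure, which is a different test from the one used in the text: regress one log series on the other, then apply a Dickey-Fuller regression to the residuals. Cointegration means the residuals are stationary. The critical value of about −3.34 is the asymptotic 5% Engle-Granger value for two variables with a constant (as tabulated by MacKinnon); it is stricter than the plain Dickey-Fuller value because the residuals come from an estimated regression. The series are simulated and their names are hypothetical.

```python
import itertools
import math
import random

# Approximate asymptotic 5% critical value, Engle-Granger test, 2 variables, constant.
EG_CRITICAL_5PCT = -3.34


def ols_line(x, y):
    """Intercept and slope of the OLS regression y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b


def df_t(u):
    """Dickey-Fuller t statistic (constant, no lags) for the series u."""
    lagged = u[:-1]
    du = [u[t] - u[t - 1] for t in range(1, len(u))]
    a, rho = ols_line(lagged, du)
    n = len(du)
    ml = sum(lagged) / n
    sxx = sum((v - ml) ** 2 for v in lagged)
    ssr = sum((d - a - rho * v) ** 2 for v, d in zip(lagged, du))
    return rho / math.sqrt(ssr / (n - 2) / sxx)


def engle_granger(y, x, critical=EG_CRITICAL_5PCT):
    """Step 1: y = a + b x + u.  Step 2: DF test on u.  Returns (b, t, cointegrated?)."""
    a, b = ols_line(x, y)
    resid = [v - a - b * u for u, v in zip(x, y)]
    t = df_t(resid)
    return b, t, t < critical


rng = random.Random(1)
log_views = list(itertools.accumulate(rng.gauss(0.0, 1.0) for _ in range(500)))
# Hypothetical cointegrated partner: shares the stochastic trend of log_views.
log_price = [1.0 + 0.5 * v + rng.gauss(0.0, 0.5) for v in log_views]
# Independent random walk: no common trend.
other = list(itertools.accumulate(rng.gauss(0.0, 1.0) for _ in range(500)))

print(engle_granger(log_price, log_views))
print(engle_granger(other, log_views))
```

Unlike Johansen’s approach, Engle-Granger can detect at most one cointegrating vector and depends on which variable is placed on the left-hand side; for a bivariate pair such as BitCoin and Wikipedia this limitation matters less.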

Table 2 Cointegration tests between BitCoin prices and search queries

General results

Starting with the Google Trends results, we are first interested in the dynamic relationship between the search queries on Google for “BitCoin” (the search query frequency is not case sensitive, so various versions of the word, such as “BitCoin”, “Bitcoin” and “bitcoin”, are included) and the price of the currency. Based on the Akaike, Hannan-Quinn and Schwarz-Bayesian information criteria, we use a single lag, i.e. VAR(1) is applied to the first logarithmic differences. To account for potential autocorrelation and heteroskedasticity, we use heteroskedasticity and autocorrelation consistent (HAC) standard errors. The results are summarized in Fig. 4. The charts show the response of the corresponding variable to a shock in the impulse variable. As we are working with logarithmic differences, the responses can be interpreted as proportional reactions to a 1% shock. A 10% shock in the search queries yields a price reaction of approximately 0.8% in the first and 1.2% in the second period, i.e. a total reaction of 2%, and the effect vanishes in later periods. However, the influence also works in the opposite direction, and it again lasts (remains statistically significant) for two periods: a 10% shock in prices is followed by a total reaction of the search queries of 0.8% (0.55% and 0.25% in the two periods, respectively). Putting these two together, we find that increased interest in the BitCoin currency, measured by the search queries, increases its price. As interest in the currency increases, demand increases as well, causing prices to rise. At the same time, as the price of BitCoin increases, so does the interest of not only investors but also the general public. Note that it is quite easy to invest in BitCoin, as the currency does not need to be traded in large bundles. This mutual reinforcement evidently creates potential for the development of a bubble.
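
For a VAR(1) in first logarithmic differences, z_t = c + A z_{t−1} + e_t, the response h periods after a unit shock to variable j is the (i, j) element of A^h, and summing over horizons gives the kind of total reaction quoted above. The sketch below uses a purely hypothetical coefficient matrix A (not the estimates behind Fig. 4) and non-orthogonalized shocks; the text does not state which identification scheme was used, so this is an assumption. Because the model is linear, a 10% shock (0.1 in log differences) simply scales every response by 0.1.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def var1_impulse_responses(A, horizons):
    """Return [A^0, A^1, ..., A^horizons].

    Element [h][i][j] is the response of variable i, h periods after a unit
    (non-orthogonalized) shock to variable j at time 0.
    """
    k = len(A)
    psi = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    out = [psi]
    for _ in range(horizons):
        psi = mat_mul(A, psi)
        out.append(psi)
    return out


def cumulative_response(irfs, i, j):
    """Total reaction of variable i to a unit shock in j, summed over horizons."""
    return sum(psi[i][j] for psi in irfs)


# Hypothetical VAR(1) coefficients; variable 0 = dlog price, 1 = dlog searches.
A = [[0.1, 0.2],
     [0.3, 0.0]]
irfs = var1_impulse_responses(A, 50)
shock = 0.10  # a 10% shock to the search queries
print([round(shock * psi[0][1], 4) for psi in irfs[:4]])  # per-period price response
print(round(shock * cumulative_response(irfs, 0, 1), 4))  # total price response
```

For a stable VAR the cumulated responses converge to (I − A)^(−1), which is why the effect of a shock in differences dies out after a few periods, as in Fig. 4.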

Figure 4

Response dynamics for Google Trends.

Impulse-response functions for the first logarithmic differences of BitCoin prices and Google Trends search queries. A positive relationship is evident in both directions. The responses are also partly asymmetric.

Turning now to the results for the Wikipedia daily views, we are interested in the same relationship as in the previous case, but now based on the vector error-correction model with seven lags (VECM(7)), as selected by the information criteria. In Fig. 5, we present the response functions, which differ from the previous ones in that they represent permanent shifts in the response variable rather than the immediate shifts shown in Fig. 4. In the first 7 days (one week), an increase in prices causes an increasing positive reaction of the daily views. After the first week, the effect stabilizes, but the interest in BitCoin measured by the daily views does not return to its initial level. The complete transmission is around 0.05, i.e. a 10% change in prices is connected to a 0.5% permanent shift in the Wikipedia views. In the opposite direction, we do not observe any statistically significant effect of the daily views on prices. The difference between Wikipedia and Google Trends might be caused by the fact that the two sources are different, and individuals using them can have different motives and be interested in different specifics. Nonetheless, we believe that both sources provide interesting insights into the relationship between the digital currency and general interest in it. Apart from the standard effects, we are also interested in whether the reaction of prices to the search queries is symmetric, i.e. whether increasing interest going hand in hand with increasing prices (possibly a forming bubble) has the same effect as increasing interest connected to decreasing prices (possibly a bursting bubble).
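
Why do VECM responses settle at a permanent level instead of dying out? A stripped-down bivariate VECM with no lagged differences, Δy_t = α β′ y_{t−1} + e_t, can be rewritten in levels as y_t = (I + α β′) y_{t−1} + e_t. The matrix I + α β′ has a unit eigenvalue along the common stochastic trend, so its powers converge to a non-zero limit: a shock shifts both log levels permanently. The adjustment vector α and cointegrating vector β below are hypothetical, and the seven lagged differences of the VECM(7) in the text are omitted.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def vecm_level_responses(alpha, beta, horizons):
    """Level responses [M^0, ..., M^horizons] with M = I + alpha beta'.

    Model (no lagged differences, no constant):  dy_t = alpha * (beta' y_{t-1}) + e_t.
    Element [h][i][j] is the response of the level of variable i, h periods
    after a unit (non-orthogonalized) shock to variable j.
    """
    k = len(alpha)
    M = [[(1.0 if i == j else 0.0) + alpha[i] * beta[j] for j in range(k)]
         for i in range(k)]
    psi = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    out = [psi]
    for _ in range(horizons):
        psi = mat_mul(M, psi)
        out.append(psi)
    return out


# Hypothetical: variable 0 = log price, 1 = log views; equilibrium error = y0 - y1.
alpha = [-0.2, 0.1]  # when y0 > y1, price adjusts down and views adjust up
beta = [1.0, -1.0]
irfs = vecm_level_responses(alpha, beta, 200)
for h in (1, 5, 20, 200):
    # Response of log views to a unit shock in log price, then in log views.
    print(h, [round(v, 4) for v in irfs[h][1]])
```

With these values, M has eigenvalues 1 and 0.7, so the responses converge to 1/3 and 2/3 rather than to zero; this is the sense in which the Wikipedia responses in Fig. 5 are permanent shifts.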

Figure 5

Response dynamics for Wikipedia.

Impulse-response functions for the logarithmic transformations of BitCoin prices and Wikipedia daily views. There is a positive effect of price changes on the daily views of the Wikipedia page. The opposite effect is not statistically significant. However, when the effect is separated into positive and negative feedback, it becomes statistically significant.

Positive and negative feedback

A crucial disadvantage of measuring interest using search queries on Google Trends or daily views on Wikipedia is that it is hard to distinguish between interest driven by positive and by negative events. For BitCoin specifically, there is a big difference between searching for information during an increasing trend and after the bubble burst. To separate these effects, we introduce a dummy variable equal to one if the price of BitCoin is above its trend level (measured by a 4-week moving average for Google Trends and a 7-day moving average for Wikipedia, due to the different sampling frequencies) and zero otherwise. This way, we try to distinguish between positive feedback, defined as a reaction to increasing interest (measured by search queries) while the price is above its trend value, and negative feedback, defined analogously for the price below its trend value.
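
The dummy construction can be sketched as follows. The text does not say whether the moving average is trailing or centred, nor whether it includes the current observation; this sketch assumes a trailing average over the last `window` observations including the current one (4 for weekly Google Trends, 7 for daily Wikipedia) and leaves the first `window − 1` observations undefined. The positive- and negative-feedback regressors are the search changes multiplied by D and by 1 − D, respectively. Function names are illustrative.

```python
def trailing_moving_average(series, window):
    """Mean of the last `window` values (including the current one); None until available."""
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[t + 1 - window:t + 1]) / window)
    return out


def above_trend_dummy(prices, window):
    """D_t = 1 if the price is strictly above its trailing moving average, else 0."""
    trend = trailing_moving_average(prices, window)
    return [None if m is None else int(p > m) for p, m in zip(prices, trend)]


def split_feedback(search_changes, dummy):
    """Split search-query changes into positive (D=1) and negative (D=0) feedback regressors."""
    positive = [None if d is None else d * s for s, d in zip(search_changes, dummy)]
    negative = [None if d is None else (1 - d) * s for s, d in zip(search_changes, dummy)]
    return positive, negative


weekly_prices = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]
dummy = above_trend_dummy(weekly_prices, window=4)  # Google Trends: 4-week trend
pos, neg = split_feedback([0.1] * 7, dummy)
print(dummy)  # [None, None, None, 1, 0, 0, 0]
```

Both regressors then enter the VAR or VECM in place of the single search-query series, so each regime gets its own response coefficients.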

For the Google Trends pair, the results are again illustrated in Fig. 4. Here, practically the whole reaction comes from the positive feedback, as the negative feedback (increasing search queries while prices are below their trend) yields practically no statistically significant reaction. Much more interesting results are found for the Wikipedia daily views. In Fig. 5, we find that the positive and negative feedback are practically symmetric around zero. That is, the reaction of prices to changes in Wikipedia interest is similar whether prices are above or below the trend, except for the sign of the reaction. The complete transmission is around 0.05 and −0.05 for the positive and negative feedback, respectively. This is a crucial result because without separating positive and negative feedback, we find no reaction of BitCoin prices to the Wikipedia views. However, once the effect is separated, the reaction is statistically significant and of the expected sign. If prices are going up and public interest in the matter is growing, prices will likely continue to soar. But if prices decline, increased interest pushes them even lower.
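
A short simulation shows why a symmetric pair of effects can be invisible in a pooled regression. Under a purely hypothetical data-generating process in which prices react with +0.05 to search changes when above trend and −0.05 when below (mirroring the magnitudes reported above, with the regime simply alternating), a single slope estimated on all observations is close to zero, while separate slopes recover both effects. This uses simple OLS slopes rather than the VECM of the text.

```python
import random


def ols_slope(x, y):
    """OLS slope of y on x (regression with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(7)
n = 2000
d_search = [rng.gauss(0.0, 0.1) for _ in range(n)]
above = [t % 2 for t in range(n)]  # hypothetical regime: alternates above/below trend
d_price = [
    (0.05 if a else -0.05) * s + rng.gauss(0.0, 0.001)
    for s, a in zip(d_search, above)
]

pooled = ols_slope(d_search, d_price)
positive = ols_slope([s for s, a in zip(d_search, above) if a],
                     [p for p, a in zip(d_price, above) if a])
negative = ols_slope([s for s, a in zip(d_search, above) if not a],
                     [p for p, a in zip(d_price, above) if not a])
print(round(pooled, 3), round(positive, 3), round(negative, 3))
```

The pooled slope averages two opposite-signed effects of equal size, which is consistent with finding no reaction of prices to Wikipedia views until the feedback is separated.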