US $7.47
723647
Big Data Investments
Author: Jan Becker
Year: 2015
Format: PDF
File size: PDF 2.9 MB
Language: English
In recent years the internet has developed very quickly and has become a major source of information all over the planet. Many scientists have used search engine query data to forecast econometric time series such as consumer confidence indicators, unemployment rates, retail sales, house price indices, stock prices, stock volatility and even commodity prices. Following this prior research, the study analyzes the impact of internet search engine data on capital markets. Many authors have already contributed analyses of index-level data, most of them on the US market. This study adds to the existing literature by examining the German stock market. Two research questions are answered: first, whether an increase in search queries drives individual stock returns, and second, whether queries affect the implied volatility of stock options. After controlling for seasonality, autocorrelation and general market risk, the further analysis also examines the price-to-book valuation, one-year performance and historical volatility in interaction with internet search queries.

Excerpt from the text

Sample text

Chapter 1.7 Data Scope of Analysis

The data scope of this study extends over the German Stock Index DAX (Deutscher Aktien IndeX), the MDAX (Mid-Cap-DAX) and the SDAX (Small-Cap-DAX). The three are the German prime standard market indices for large, medium and small-sized exchange-listed companies. The blue chip index DAX covers 80% of Germany's free float market and consists of the 30 largest companies in terms of market capitalization and exchange turnover. MDAX and SDAX both have 50 constituents and rank directly below the DAX constituents. The full prime standard would be completed by adding the TECDAX to the sample. The TECDAX consists of the 30 largest technology shares. The composition of all indices is constantly reviewed and rebalanced on a quarterly basis, except for new listings, deletions or mergers, which are taken into account immediately (Deutsche Börse 2013, p. 19). The TECDAX was initially not included in the sample for two reasons. First,
in order to keep the sample size manageable, and second, with respect to the online business model of some companies (e.g. Xing or Freenet), the correlation between search queries and the success of the companies was assumed to be high ex ante. For the theory to generalize, the hypothesis should therefore hold for standard companies too.

1.7.1 Timeframe

The overall timeframe of nine years and four months, from 10 January 2004 to 4 May 2013, spans the first publicly available observation downloadable from Google to the time of this study. All regressions are based on this time frame. It should be noted that two major macroeconomic crises fall into this period: the global financial crisis of 2008/09 and the European sovereign debt crisis that has been going on since 2010. Both crises affected the global economy and led to a slowdown of production. The financial crisis of 2008 is sometimes also referred to as the Great Recession.

1.7.2 Necessary Adjustments in Sample Selection

Over this period not all stocks could be added to the analysis, which introduces a small selection bias. The final structure of constituents as of 6 May 2013 is modified with respect to the initial setup of January 2004 in the following way. First, all stocks which are included in an index in 2013 must also have been in one of the three indices at the starting point of the analysis in 2004, in order to ensure that all control variables are available and that the stocks were already exchange-listed and tradable. This implies a survivorship bias, since companies which defaulted or merged in the meantime are excluded. Second, some companies were taken private and are also excluded because no trading prices are quoted anymore. Third, newly listed companies after 2004 are not included in the sample. There have been several event studies concerning IPOs which cover this topic (cf. Da, Engelberg and Gao 2011). The main argument for adjusting the sample is to focus on a continuous and comparable data set.

1.8 Search Engines: Gateways to Information

The
internet search query data is downloaded from Google. According to webhits.de, the American company Google Inc. had a German market share of 80.4% (Webhits 2013), and a somewhat higher share of 83.18% was reported by netmarketshare.com in a global ranking (Netmarketshare 2013). This leads to the assumption that Google data is, to a certain extent, suitable for testing the hypotheses and has the necessary data scope to draw statistically significant conclusions about overall search activity.

1.8.1 The Google Tool

Historically there were Google Trends and Google Insights for Search, which were merged into Google Trends in September 2012 (Google 2012b). Since then the combined interface under Google Trends is the only remaining platform. The service is provided by Google Inc. (Google), located at 1600 Amphitheatre Parkway, Mountain View, CA 94043, United States, and can be accessed via http://www.google.com/trends. Access to the data is free of charge, and all time series can be downloaded after registration and login with a free Google Account at the website.

The tool allows the user to look up an index of a specific search query from 2004 until today and is available for worldwide data. The user interface offers four options to specify the query: web search, area-specific search, time frame and category. For this study Web Search is of relevance, which could also be changed to Product or News Search. YouTube and Image Search are not relevant. The area-specific search queries can be restricted to a specific country, state or, in some cases, city. For example, Germany can be broken down into the state of Hessen, but the city of Frankfurt is not yet available as a time series, although Google already displays the current search activity by city. Depending on the query's frequency, some time series are still on a monthly basis, whereas the most common downloadable format has a weekly frequency. The existing option to download daily time series of
the last 90 days can be extended further into the past by manually downloading one-month windows via the Select dates functionality and then chaining the parts of the time series together. Under the category filter there are 26 options with 241 subcategories available. Using the example of Choi and Varian (cf. 2009a, p. 4), the query "car tire" would be assigned to the category Vehicle Tires, which is a subcategory of Auto Parts, which is a subcategory of Automotive. Of major interest for stocks are the categories Business & Industrial and Finance. These categories do not always deliver time series for all queries. In the later analysis, therefore, the most general form over all categories is applied instead of the Finance filter, to maximize the likelihood of actually obtaining a time series to analyze. Other research has focused on these specific categories (e.g. Fink and Johann 2013). The query can be compared to its category. In this case the time series is scaled as a percentage of the initial starting value and is thus a growth rate (Google 2012c). The category can add important information with respect to seasonality.

1.8.2 Grouping

It is possible to group up to 25 search terms via a sign. The items are then displayed as separate graphs. In order to specify the query for combinations of terms, quotation marks ("x y") have to be set at the beginning and end of each request.

1.8.3 Multiple Counting is automatically avoided by Google

In order to avoid multiple counting, the requests are filtered by their IP address. The IP address (Internet Protocol address) is a numerical label assigned to each computer which uses the Internet Protocol for communication. The IP can be used for host or network interface identification and location addressing. By identifying each user via an IP, only the sum of their daily queries becomes part of the search volume index. If a user searches from more than one computer (IP), the queries are counted multiple times. It cannot be distinguished on a publicly available basis for how
many cross-sectional queries the same user is responsible. It is possible that one user is responsible for generating all the signals over time.

1.8.4 Synthetic Index rather than actual Numbers

Google does not publish the overall sum of search queries but calculates an index. This index is bounded between 0 and 100 and is recalculated under specific circumstances. Whenever there is a new maximum of search queries, this quantity is set to 100, and thereafter subsequent quantities are scaled by this maximum (division by the maximum and multiplication by 100) until a new high is reached. The old values are not recalculated and remain scaled by their old maxima. The index could be interpreted as a percentage index (Google 2012a). For this reason it is difficult to compare different stocks by their search intensity. The actual quantities are not available, so increases in one stock query cannot be attributed to a decrease in another. This could be of interest in the case of actual sales and shipped units of competitors. Moreover, the rescaling does not allow conclusions to be drawn about the original query quantity for a company, because the basis is constantly shifting.

1.8.5 Empty Values

Another drawback is Google's practice of publishing an index value of 0 instead of a very small number whenever the search queries were below a certain threshold level. So far Google has not transparently explained how the threshold level is measured (Google 2010).

1.8.6 Limited by German Language

As the later data will show, most of the relevant data emerges from the German-language area and from queries within Germany. This is also true for most queries which refer to international exporting companies like car manufacturers (e.g. BMW), and is in contrast to the previous studies on individual stocks from the US market. At the level of the DAX there are comparatively more international queries than in the smaller company indices MDAX and SDAX. This may be a hint of home bias and of the local degree of familiarity with smaller stocks.
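
The index construction described in the passages on the synthetic index and on empty values can be illustrated with a minimal Python sketch. It assumes the scaling works exactly as stated there (each value is divided by the highest raw count seen so far and multiplied by 100, with earlier values left untouched); Google's actual procedure is not public. The function name synthetic_index, the threshold parameter and the raw counts are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def synthetic_index(raw_counts, threshold=0.0):
    """Scale raw query counts to a 0-100 index using the running maximum.

    Assumption: each value is scaled by the highest count observed up to
    and including that date; earlier values are never recalculated.
    Counts below the (hypothetical) threshold are reported as 0.
    """
    running_max = accumulate(raw_counts, max)  # highest count seen so far
    index = []
    for count, peak in zip(raw_counts, running_max):
        if peak <= 0 or count < threshold:
            index.append(0.0)  # suppressed or empty value
        else:
            index.append(100.0 * count / peak)
    return index


# Made-up raw query counts for one search term
raw = [50, 100, 80, 200, 150]
print(synthetic_index(raw))                # [100.0, 100.0, 80.0, 100.0, 75.0]
print(synthetic_index(raw, threshold=60))  # [0.0, 100.0, 80.0, 100.0, 75.0]
```

In this made-up series a raw count of 80 yields an index of 80, while a later raw count of 150 yields only 75, which shows why index values cannot be translated back into query quantities once the basis has shifted.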
When analyzing queries that combine the stock's name with a second word, the language barriers become more obvious. When searching for terms like Aktie (engl. stock) or Dividende (engl. dividend), even small changes in the wording can tilt the origin of the data from German- to English-speaking countries. A study by Mondria and Wu (2011) showed that home bias delivers higher returns through the advantage of higher information density. Therefore the study uses the German terms in the regression models. A comparable study by Bank, Larch and Peter (2010) on the German stock market, covering all Xetra-listed stocks, used the names of the companies without the suffix AG. Their queries are restricted to German queries only. Fink and Johann (2013) apply the category filter Finance when downloading the data, in addition to the names of the German companies. This procedure takes advantage of a particular Google feature which classifies queries according to the final website accessed after the query is submitted. This deviation from other studies adds information to the query, and the authors show that it improves the query quality. As it is not transparent how Google classifies Finance queries, this study nevertheless uses the standard query method and sets the focus via the additional terms AG and Aktie.

1.8.7 Exact Wording of Search Terms and Search Term Combinations

When searching for data on a search engine, one question which arises is: what do people type into the search engine? Most users start by typing in just one search term (cf. Spink et al. 2001). This seems to be common practice and is also supported by the data set later. To set up a list of words, the most common reference name for a company is searched, e.g. BMW for Bayerische Motoren Werke Aktiengesellschaft. This approach had some minor flaws because the common German names of some stocks coincide with ordinary words in the English language. E.g., MAN is a German producer in the automobile industry and Metro a
big retailer for consumer products. In these cases a more stock-related perspective was introduced by searching for the combination of the stock's name together with the German abbreviation for PLC (public limited company), namely AG (Aktiengesellschaft). Altogether four main search combinations evolved: the common search name, declared as Name; the name plus AG; the name plus Aktie; and the name plus News. It should be noted that the available data frequency decreases dramatically when terms are combined. In the initial setup many more terms were included, but not enough data sets could be extracted. These terms were name plus Report, name plus Return, name plus Rendite, name plus HV (engl. shareholders' meeting), name plus IR, name plus Investor Relations, name plus Bilanz (engl. balance sheet), name plus P&L and name plus GuV (engl. P&L). The fact that only top-level search terms are available may support the initial assumption that search queries with one word are preferred over full sentences, or it may be due to Google's policy not to publish time series which fall below a certain threshold level.

Biographical information

Jan Becker was born in Hessen, in the mid-west of Germany, in 1986. After acquiring his Bachelor of Science in Economics and Business Administration from Goethe University in Frankfurt, he graduated as a Master of Science in Capital Markets from Frankfurt School of Finance & Management. The author is a capital market professional in the field of asset management and quantitative finance. He gained relevant experience in practical tactical asset allocation at a renowned German asset management firm and thereafter in capital market derivative strategies at a consultancy firm in Frankfurt. His core area of expertise is multi-asset and absolute return investments. Today he works for an international asset management company, supplying institutional clients with investment solutions.

Series: Alternative Investments, Volume 7