IA Math: Correlation between COVID-19 cases and Facebook stock during the pandemic lockdown

Michael Swiatek
13 min read · Oct 26, 2021

COVID-19 cases and Facebook stock analysis

1.1 Background information

The stock market refers to the collection of markets and exchanges where regular activities of buying, selling, and issuance of shares of publicly-held companies take place [1]. Stocks and shares bought by investors represent ownership in a particular company. Those who own such financial assets can trade them on stock exchanges, where the main purpose is to profit from the trading process, that is, from buying and selling assets at the desired prices. There are many stock exchanges on which companies’ shares are listed. The most popular ones are the two located in New York: the New York Stock Exchange (NYSE) and NASDAQ. Market capitalization is equal to the share price multiplied by the number of shares outstanding [2]. According to data from 2019, their market capitalizations were equal to $22.923 trillion and $10.857 trillion, respectively [3]. However, the most notable difference between those stock exchanges is that NASDAQ is a dealer market, whereas NYSE is an auction market. In the former, transactions go through a dealer. In the latter, assets are traded directly between buyers and sellers. The most notable companies listed on NASDAQ include Amazon, Apple, Google and Facebook [4].

I have always been interested in the stock market. At the beginning of high school, I did not understand what factors may influence fluctuations in stock prices. My curiosity was renewed when the coronavirus pandemic led to lockdowns in most countries. From an economic point of view, a lockdown is very destructive, as there is a drastic decline in the flow of money between private businesses. The negative influence of the coronavirus pandemic on the stock market was observed during February and March 2020. Even the stocks of the largest companies, for instance Facebook, were very volatile during that time. I asked myself: can COVID-19 disturb the stock of a company as valuable as Facebook? That was the reason why I decided to investigate the topic of the stock market more deeply.

1.2 Research question

How did increasing coronavirus cases influence Facebook stocks?

1.3 Aim of the investigation

The investigation aims to compare the manually calculated trendline of Facebook stock prices with the trendline automatically created by Google Sheets. In addition, a normal distribution is presented as an indicator of the most frequent price of the company’s stock during that period and compared with the theoretical projection.

2.1 Data collection

I collected the data shown in Table 1 and Table 2 in the Appendix. The Facebook stock data were gathered from the Yahoo Finance website [5] as a file covering the date range 22–01–2020 to 31–03–2020. The coronavirus data on total cases from 22–01–2020 to 31–03–2020 were taken from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [6]. Because the coronavirus data only start on the 22nd of January 2020, I had to restrict the Facebook stock data to the same date range. The NASDAQ stock market was open for 49 days within this range, as it is closed on weekends and public holidays. The Yahoo Finance file included the columns Date, Open, High, Low, Close, Adj Close and Volume. During my analysis, I only used the Date and Adj Close columns. In Google Sheets, I additionally calculated the daily percentage and value changes of the Facebook stock, shown in the columns Percentage change and Value change ($) in Table 1 and Table 2 in the Appendix.

All values were rounded to the hundredths because a more precise percentage change, for example to the thousandths, would be insignificant for the purposes of this exploration. Because Facebook stock prices on NASDAQ are quoted to the hundredths, my analysis also rounds stock values, as well as percentage changes, to the hundredths.
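The daily value and percentage changes described in this section can be sketched in a few lines of Python. This is only an illustration: the prices below are made-up placeholders rather than the values from Table 1, and the function name `daily_changes` is hypothetical.

```python
# Hypothetical adjusted closing prices in USD; NOT the values from Table 1.
adj_close = [221.32, 219.76, 217.94, 214.87, 217.79]


def daily_changes(prices):
    """Return (value_change, percentage_change) for each pair of consecutive days."""
    changes = []
    for previous, current in zip(prices, prices[1:]):
        value_change = round(current - previous, 2)
        # Percentage change is taken relative to the previous day's price.
        percentage_change = round((current - previous) / previous * 100, 2)
        changes.append((value_change, percentage_change))
    return changes


changes = daily_changes(adj_close)
for value_change, percentage_change in changes:
    print(f"{value_change:+.2f} USD ({percentage_change:+.2f}%)")
```

Both results are rounded to the hundredths, following the rounding convention used throughout this exploration.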

2.2 Data presentation

Graph 1. The graph represents the data from Table 1 and Table 2 in the Appendix throughout 22.01.2020–31.03.2020.

2.3 Data preprocessing

It is important to note that each stock price, starting with the first day of the observation (2020–01–22), has to be matched with the total number of coronavirus cases reported on the same day.
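A minimal sketch of this matching step is shown below. It assumes the two datasets are available as dictionaries keyed by date. All numbers are placeholders, and the helper name `align_on_trading_days` is hypothetical. Coronavirus totals exist for every calendar day, while stock prices exist only for trading days, so the pairing keeps only the dates present in both.

```python
from datetime import date

# Placeholder values for illustration only; the real figures are in the Appendix.
adj_close_by_day = {
    date(2020, 1, 22): 221.32,
    date(2020, 1, 23): 219.76,
    date(2020, 1, 24): 217.94,
    date(2020, 1, 27): 214.87,  # Monday: there is no stock data for the weekend
}
total_cases_by_day = {
    date(2020, 1, 22): 500,
    date(2020, 1, 23): 650,
    date(2020, 1, 24): 900,
    date(2020, 1, 25): 1400,
    date(2020, 1, 26): 2100,
    date(2020, 1, 27): 2900,
}


def align_on_trading_days(prices, cases):
    """Pair each trading day's price with the case total reported on that date."""
    return [(day, cases[day], prices[day]) for day in sorted(prices) if day in cases]


aligned = align_on_trading_days(adj_close_by_day, total_cases_by_day)
for day, cases, price in aligned:
    print(day.isoformat(), cases, price)
```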

2.4 First assumptions

Initially, I thought that the coronavirus had a drastic impact on Facebook, and Graph 1 supported that. However, I wondered whether there had been fluctuations on an even greater scale earlier in the company’s history, before the coronavirus pandemic. On the other hand, it seemed that the negative consequences during that time came from COVID-19.

3.1 Simple linear regression

At the start of the analysis, I wondered whether there is a linear correlation between the increasing number of coronavirus patients and the value of Facebook stocks. Therefore, I decided to use linear regression to describe the relationship between the independent and dependent variables. The obtained straight line allows us to estimate how the dependent variable would change when the independent variable changes. However, simple linear regression makes a couple of assumptions [7]:

  1. Homogeneity of variance (homoscedasticity): the size of the error in our prediction doesn’t change significantly across the values of the independent variable.
  2. Independence of observations: the observations in the dataset were collected using statistically valid sampling methods, and there are no hidden relationships among observations.
  3. Normality: The data follows a normal distribution.

Considering all of the data from Table 1 and Table 2 in the Appendix and Table 1 above, I calculated the linear regression equation that describes the linear trend of the gathered data.

Calculation summary:

Graph 2. The graph represents the automatically annotated trend line from the collected data.

At the beginning of the observation period, the global number of coronavirus cases grew steadily, and Facebook prices fluctuated only slightly around a stable level. Once the number of COVID-19 patients began growing exponentially, there was a sudden decrease in Facebook stock prices. I expected the gradient of this linear regression to be more negative. Had I considered less stable companies, the negative gradient would probably have been far steeper.
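The trend line discussed in this section is an ordinary least-squares fit, which can be sketched as follows. The data points are placeholders, not the values from the Appendix, and the function name `fit_line` is hypothetical. The independent variable is assumed to be the total number of coronavirus cases and the dependent variable the closing price.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder data: total cases (x) and closing price in USD (y).
total_cases = [1_000, 50_000, 200_000, 500_000, 850_000]
closing_price = [220.0, 212.0, 190.0, 160.0, 150.0]

slope, intercept = fit_line(total_cases, closing_price)
print(f"y = {slope:.2e} x + {intercept:.0f}")
```

A negative slope means that the fitted price falls as the number of cases rises, which is the pattern described above.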

3.2 Pearson’s correlation coefficient

Pearson’s correlation coefficient is a technique for investigating the relationship between two quantitative, continuous variables, in my example the total number of coronavirus cases and the value of the Facebook stock. Pearson’s correlation coefficient (r) is a measure of the strength of the association between the two variables [8]. The formula to calculate this coefficient is given below:
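One standard way of writing the coefficient is shown here. The notation is an assumption and may differ from the form used in the cited source: x_i and y_i denote the paired observations of the two variables, x̄ and ȳ their means, and n the number of pairs.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The value of r always lies between -1 and 1, and its square gives the R² value reported on a linear trend line.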

3.2.1 Manual calculation of Pearson’s coefficient

Using Table 1, the summary in Table 2 and the above equation, I calculated Pearson’s coefficient. The results were calculated using the Pearson Correlation Coefficient Calculator [9].
Hence:

The R² value is rounded to two decimal places because any further digits, such as thousandths, would be insignificant from the perspective of this statistical analysis.
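The manual calculation can also be reproduced step by step in plain Python. The sketch below uses placeholder data, not the values from the Appendix, and the function name `pearson_r` is hypothetical. It computes the sums of squared deviations and the sum of products, then combines them into r and R².

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Placeholder data: total cases (x) and closing price in USD (y).
total_cases = [1_000, 50_000, 200_000, 500_000, 850_000]
closing_price = [220.0, 212.0, 190.0, 160.0, 150.0]

r = pearson_r(total_cases, closing_price)
r_squared = r ** 2
print(f"r = {r:.4f}, R^2 = {r_squared:.2f}")
```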

3.2.2 Comparison

As with Graph 2, I compared the result of the manual calculation of Pearson’s correlation coefficient with the one generated automatically by Google Sheets.

Graph 3. The graph represents the chart generated from the gathered data with the automatically labelled R² value, which is equal to 0.44.

3.2.3 Analysis in the context

Pearson’s correlation coefficient (r) from the calculation was equal to -0.6631. Squaring it gives R² ≈ 0.44, which is exactly what Google Sheets generated automatically. An R² of 0.44 indicates that about 44% of the variation in the Value at the closing day column is explained by the linear relationship with the number of coronavirus cases. The outcome was rounded to the hundredths because further digits, such as thousandths, would make no significant difference. According to Figure 1 attached below, an r of -0.6631 is considered to be a moderate negative correlation. In stock analysis, it is crucial to monitor the R² value constantly, as the stock market tends to be unpredictable and the correlation coefficient changes over time. I expected the correlation to be stronger, with a magnitude of around 0.7. Perhaps this shows that stock values are influenced by many other variables, often unseen at first glance.

Figure 1. The figure represents different levels of negative correlation strengths with the circled range of my outcome [10].

3.2.4 Additional example of R² calculation

My interest in data analysis is strongly connected with the field of Data Science, which requires proficiency in the Python programming language. I decided to challenge myself and verify that the Pearson’s correlation coefficient from Google Sheets is valid.

From Google Sheets I exported exactly the same data as those presented in Table 1 and Table 2 in the Appendix and downloaded them as a CSV (comma-separated values) file called “COVID-19 cases and Facebook stocks.csv”.

Later on, I used Google Colab, where I created an ipynb (Python notebook) file for the analysis.

NumPy is an additional Python library that facilitates calculations over multi-dimensional arrays and matrices. I picked one of its built-in functions, np.corrcoef, which directly returns the Pearson product-moment correlation coefficients, as shown in the code in Figure 2.

Figure 2. The figure presents the Python code used to obtain Pearson’s coefficient.

I obtained -0.6630614527000043, which after rounding is equal to -0.6631, the same value as the one calculated manually in section 3.2.1.

This value was also squared. I obtained the value 0.43965049005664 which after rounding to two decimal places was equal to 0.44.

This additional Python calculation of R² provided evidence that the automatically generated R² value is valid and identical to the one calculated manually.
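The np.corrcoef step described in this section can be sketched as follows. This is an illustration rather than the notebook code: instead of loading the CSV file, it uses a handful of placeholder values, so the printed numbers will not match the -0.6631 and 0.44 reported above. The variable names are assumptions.

```python
import numpy as np

# Placeholder data for illustration; the real analysis used the exported CSV file.
total_cases = np.array([600, 10_000, 80_000, 200_000, 500_000, 850_000], dtype=float)
closing_price = np.array([221.0, 210.0, 190.0, 170.0, 156.0, 166.0])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r.
correlation_matrix = np.corrcoef(total_cases, closing_price)
r = correlation_matrix[0, 1]
r_squared = r ** 2

print(round(r, 4), round(r_squared, 2))
```

The diagonal entries of the matrix are always 1, because each variable is perfectly correlated with itself.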

3.3 Normal distribution

The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean [11]. It is considered to be one of the most widely used tools of statistical analysis in the stock market. The normal distribution is defined by two quantities: the standard deviation and the mean.

The former is a measure of how spread out the values are within a group of numbers, in this context the prices of Facebook stocks. It has the general formula:
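In standard notation, which is an assumption here and may differ from the symbols used in the original figures, the mean μ and the standard deviation σ of n values x_1, …, x_n can be written as:

```latex
\mu = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad
\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}
```

This is the population form of the standard deviation. The sample form divides by n - 1 instead of n, and the text does not state which of the two was used.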

Combining those two equations and adding the relevant terms gives the probability density function (PDF) of the normal distribution, which is:
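In its usual form, with μ denoting the mean and σ the standard deviation (symbols chosen here for illustration), the density is:

```latex
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}
```

The curve peaks at x = μ, and a larger σ makes it wider and flatter.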

The general purpose of a PDF is to calculate the probability of a value falling within a given range. The probability density function is often presented graphically. It is an especially useful tool in stock analysis and in quantifying a stock’s potential. It is typically depicted as a graph, with a normal bell curve indicating neutral market risk, and a bell at either end indicating greater or lesser risk/reward [12], as depicted in Figure 3 [13].

Figure 3. The image depicts the PDF bell curve functions that present four different situations with different values of mean and standard deviation [13].

3.3.1 Calculation of normal distribution of Facebook stocks data

Programs such as Google Sheets or Microsoft Excel have built-in functions that facilitate the calculation of a normal distribution and present it in an understandable graphical form.

Calculation of normal distribution before its graphical representation requires calculating the mean and standard deviation using previously presented equations.

To complete the normal distribution calculation in Google Sheets, it is necessary to create bins. These are ranges of stock values; for instance, bin 145 refers to the range of values 145–149. They are used to calculate the frequency of Facebook stock prices and are required to visualise the normal distribution bell curve.

In the context of my analysis, I created such bins for Facebook stock prices from 130 to 245.

Next, I calculated the frequency of stock prices falling within each of the bins. Google Sheets provides a function that does this very easily: =FREQUENCY(stock prices, bins). Thanks to that, I could see which prices of Facebook stocks are most common in the dataset.
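The binning step can be sketched in Python as follows. The sketch follows the description above, where a bin label such as 145 is the lower end of a 5-dollar range. This is an assumption about the intended bins and may differ from how the spreadsheet function treats its class boundaries. The prices are placeholders and the function names are hypothetical.

```python
from collections import Counter

BIN_WIDTH = 5  # each bin covers a 5-dollar range, e.g. 145 up to (but not including) 150


def bin_start(price, width=BIN_WIDTH):
    """Return the lower edge of the bin that a price falls into."""
    return int(price // width) * width


def frequency_by_bin(prices, first_bin=130, last_bin=245, width=BIN_WIDTH):
    """Count how many prices fall into each bin from first_bin to last_bin."""
    counts = Counter(bin_start(price, width) for price in prices)
    return {start: counts.get(start, 0) for start in range(first_bin, last_bin + 1, width)}


# Placeholder prices in USD; NOT the values from the Appendix.
sample_prices = [146.01, 149.99, 150.0, 166.8, 190.78, 217.79, 223.23]
frequencies = frequency_by_bin(sample_prices)
for start, count in frequencies.items():
    if count:
        print(f"{start}-{start + BIN_WIDTH - 1}: {count}")
```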

Then I listed all the normal distribution values using the Google Sheets =NORMDIST function, as in the example:

I calculated normal distribution values for every bin I stated previously (from 130–245).

However, to bring the normal distribution values onto a scale comparable with the frequencies I had calculated, I multiplied all of them by 800.

Having done that I listed all of the values from my calculation in Table 4.

Graph 4. Normal distribution of Facebook stocks and its comparison to the theoretical normal distribution curve (blue).
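The theoretical curve in Graph 4 can be approximated with a short Python sketch. It assumes that the non-cumulative form of the normal distribution was evaluated at each bin value and that the population standard deviation was used. Neither detail is stated in the text. The prices are placeholders, and the factor of 800 is the scaling mentioned above.

```python
import math
import statistics


def normal_pdf(x, mean, std_dev):
    """Probability density of the normal distribution at x."""
    exponent = -((x - mean) ** 2) / (2 * std_dev ** 2)
    return math.exp(exponent) / (std_dev * math.sqrt(2 * math.pi))


# Placeholder prices in USD; NOT the values from the Appendix.
sample_prices = [221.3, 217.9, 209.5, 196.4, 181.1, 170.2, 149.4, 146.0, 156.8, 166.8]

mean = statistics.mean(sample_prices)
std_dev = statistics.pstdev(sample_prices)  # population form (an assumption)

SCALE = 800  # scaling factor taken from the text
bins = range(130, 250, 5)  # 130, 135, ..., 245
curve = {b: SCALE * normal_pdf(b, mean, std_dev) for b in bins}

for b, value in curve.items():
    print(b, round(value, 2))
```

The scaled curve peaks at the bin closest to the mean, which is why the theoretical projection places the most frequent price near the average of the dataset.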

4. Conclusion and evaluation

The investigation showed that the automatically generated linear regression equation, y = -8.32 × 10⁻⁵x + 203, is identical to the one generated manually. Furthermore, the R² value generated by Google Sheets was the same as the one calculated manually and the one calculated using the Python package: R² = 0.44, corresponding to a Pearson’s correlation coefficient of r = -0.66. It shows how dependent these values are on each other. The normal distribution graph shows that the most frequent prices were those above $215 and also the ones ranging from $145 to $150. The latter range is considered to reflect the pandemic decline, which started in the middle of March 2020. Judging by the Gaussian distribution curve, the most frequent Facebook stock prices should theoretically be around $190. I did not expect such fluctuation over this short period. It appeared to me that the stock market is strictly dependent on the global situation, even though Facebook is listed on the American NASDAQ.

The negative gradient in the obtained equation indicates that from 2020–01–22 to 2020–03–31 the overall trend of Facebook stock prices was downward over a span of a couple of weeks. During that period, the highest price, equal to $223.23, was noted on 2020–01–28, and the lowest price, equal to $146.01, was noted on 2020–03–17.

The difference in prices is equal to:

$223.23 - $146.01 = $77.22

While the day difference is equal to:

(2020–03–17) -(2020–01–28) = 49 days

The difference between the highest and the lowest price as a percentage decline:

Percentage decline = ($77.22 / $223.23) × 100% ≈ 34.59%
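These three figures can be checked with a few lines of Python, using the highest and lowest prices and their dates as reported above. The variable names are chosen for illustration.

```python
from datetime import date

# Highest and lowest prices and their dates, as reported in this section.
highest_price, highest_day = 223.23, date(2020, 1, 28)
lowest_price, lowest_day = 146.01, date(2020, 3, 17)

price_drop = round(highest_price - lowest_price, 2)
days_elapsed = (lowest_day - highest_day).days  # calendar days, not trading days
percentage_decline = round(price_drop / highest_price * 100, 2)

print(f"Drop: ${price_drop} over {days_elapsed} days ({percentage_decline}%)")
```

Note that the 49 days here are calendar days between the two dates, which happens to equal the number of trading days in the whole observation period.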

Within just 49 days, the global pandemic caused the price of the stock to fall by 34.59%, which is a large drop for the stock market. The percentage was rounded to the hundredths because that is the most common practice in the stock market. With such a decline, private investors could have lost their life savings. Such stock variations resulted mainly from the pessimistic predictions of shareholders. It taught me that the stock market is full of risks and is very often unpredictable. However, one thing is sure: COVID-19 negatively impacted the global population’s pockets. Unfortunately, experts continue to warn that more drastic economic consequences are still ahead of us [14].

What could have been done better in this exploration? The main limitation of the investigation was the limited range of data gathered from Yahoo Finance. In the history of Facebook stock prices, there could have been weekly fluctuations of up to ±20% of the initial value. It would have been better to compare historical data with the data from the COVID-19 pandemic. Although the methods of analysis and comparison of the Facebook stock varied, more graphical representations ought to have been included. Stock market analysis typically relies on visualisation methods to display the condition of stock prices, and it is unusual to rely only on mathematical approaches.

Bibliography

  1. James Chen, 2020, Stock Market, Investopedia, accessed on 14.12.2020. Available from: https://www.investopedia.com/terms/s/stockmarket.asp
  2. Jason Fernando, 2020, Market Capitalization, Investopedia, accessed on 14.12.2020. Available from: https://www.investopedia.com/terms/m/marketcapitalization.asp
  3. Wikipedia, 2021, List of stock exchanges, Wikipedia, accessed on 8.02.2021. Available from: https://en.wikipedia.org/wiki/List_of_stock_exchanges
  4. Jim Probasco, 2020, What is the Nasdaq? Understanding the global stock exchange that’s home to the fastest-growing, most innovative companies, Businessinsider, accessed on 31.12.2020. Available from: https://www.businessinsider.com/what-is-nasdaq?IR=T
  5. Yahoo Finance, n.d., accessed on 15.12.2020. Available from: https://finance.yahoo.com/
  6. Johns Hopkins Coronavirus Resource Center, 2020, COVID-19 Map, accessed on 20.12.2020. Available from: https://coronavirus.jhu.edu/map.html
  7. Rebecca Bevans, 2020, An introduction to simple linear regression, Scribbr, accessed on 12.12.2020. Available from: https://www.scribbr.com/statistics/simple-linear-regression/
  8. The University of the West of England, n.d., Data Analysis — Pearson’s Correlation Coefficient, accessed on 21.12.2020. Available from: http://learntech.uwe.ac.uk/da/default.aspx?pageid=1442
  9. Socscistatistics, n.d., Pearson Correlation Coefficient Calculator, Socscistatistics, accessed on 22.12.2020. Available from: https://www.socscistatistics.com/tests/pearson/default2.aspx
  10. Haese, R., 2019, Mathematics Analysis and Approaches SL, accessed on 14.12.2020. 3rd ed. Adelaide Airport, S. Aust.: Haese Mathematics
  11. James Chen, 2020, Normal Distribution, Investopedia, accessed on 20.12.2020. Available from: https://www.investopedia.com/terms/n/normaldistribution.asp
  12. Brian Dolan, 2020, What is Probability Density Function (PDF)?, Investopedia, accessed on 19.12.2020. Available from: https://www.investopedia.com/terms/p/pdf.asp
  13. Wikipedia contributors, 2008, Bell-shaped function, Wikipedia, accessed on 03.01.2021. Available from: https://en.wikipedia.org/wiki/Bell_shaped_function
  14. Steve Schifferes, 2021, World economy in 2021: here’s who will win and who will lose, The Conversation, accessed on 10.02.2021. Available from: https://theconversation.com/world-economy-in-2021-heres-who-will-win-and-who-will-lose-152631

Appendix
