Predicting Bitcoin Prices


Practicing Deep Learning Time Series Forecasting on Bitcoin Data

From the ZeroToMastery TensorFlow course

```python
!nvidia-smi -L
```

```
GPU 0: Tesla K80 (UUID: GPU-3d431c8e-89fd-3326-713b-0dfee7b92355)
```


Get Data

Download all historical data from here: https://www.coindesk.com/price/bitcoin/
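A minimal sketch of loading that CSV with pandas (the filename below matches the CoinDesk export used in this notebook, but treat it as an assumption):

```python
import pandas as pd

# Read in the Bitcoin data and parse the dates
df = pd.read_csv("BTC_USD_2014-11-04_2021-10-25-CoinDesk.csv",
                 parse_dates=["Date"],  # turn the Date column into datetimes
                 index_col=["Date"])    # use the dates as the index
df.head()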

| Date       | Currency | Closing Price (USD) | 24h Open (USD) | 24h High (USD) | 24h Low (USD) |
|------------|----------|---------------------|----------------|----------------|---------------|
| 2014-11-04 | BTC      | 324.71833           | 331.60083      | 332.75133      | 323.06333     |
| 2014-11-05 | BTC      | 332.45666           | 324.71833      | 335.81166      | 320.93333     |
| 2014-11-06 | BTC      | 336.58500           | 332.45666      | 341.49000      | 328.56166     |
| 2014-11-07 | BTC      | 346.77500           | 336.58500      | 351.57500      | 336.02833     |
| 2014-11-08 | BTC      | 344.81166           | 346.77500      | 351.29500      | 339.86000     |

And the last five rows:

| Date       | Currency | Closing Price (USD) | 24h Open (USD) | 24h High (USD) | 24h Low (USD) |
|------------|----------|---------------------|----------------|----------------|---------------|
| 2021-10-21 | BTC      | 62603.575070        | 65986.244921   | 66635.466408   | 62290.325376  |
| 2021-10-22 | BTC      | 60689.238265        | 62192.232903   | 63717.168947   | 60033.928038  |
| 2021-10-23 | BTC      | 61124.347126        | 60692.609117   | 61733.585205   | 59694.050434  |
| 2021-10-24 | BTC      | 60936.150851        | 61304.790355   | 61490.095150   | 59553.027885  |
| 2021-10-25 | BTC      | 63004.381115        | 60860.032655   | 63661.925180   | 60656.561825  |

Note: we have 2548 rows, but deep learning models generally need much more data to be effective

A small sample size is a common problem in time series forecasting

Note: seasonality = the number of samples per seasonal cycle. With one sample per day and a yearly cycle, this data has a seasonality of 365

Important time series patterns

Trend: the series has a clear long-term increase or decrease

Seasonal: regular patterns tied to time of year / day of week / season / etc.

Cyclic: rises and falls with no fixed time frame

| Date       | Price     |
|------------|-----------|
| 2014-11-04 | 324.71833 |
| 2014-11-05 | 332.45666 |
| 2014-11-06 | 336.58500 |
| 2014-11-07 | 346.77500 |
| 2014-11-08 | 344.81166 |


Importing time series data with Python's CSV module
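A minimal sketch using the standard library's csv module, assuming the Date and closing price sit in columns 1 and 2 of the CoinDesk export:

```python
import csv
from datetime import datetime

timesteps = []
btc_price = []
with open("BTC_USD_2014-11-04_2021-10-25-CoinDesk.csv", "r") as f:
    csv_reader = csv.reader(f, delimiter=",")
    next(csv_reader)  # skip the header row
    for line in csv_reader:
        timesteps.append(datetime.strptime(line[1], "%Y-%m-%d"))  # Date column
        btc_price.append(float(line[2]))  # Closing Price (USD) column
```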


Creating train test splits
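Because order matters in time series, we split chronologically rather than randomly. A sketch, assuming timesteps and prices come from the import step above:

```python
import numpy as np

timesteps = np.array(timesteps)
prices = np.array(btc_price)

split_size = int(0.8 * len(prices))  # 80% train, 20% test

# Everything before the split point is train, everything after is test
X_train, y_train = timesteps[:split_size], prices[:split_size]
X_test, y_test = timesteps[split_size:], prices[split_size:]
```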


Side note: this article is a fun read about the differences between time series and regular machine learning

https://towardsdatascience.com/3-facts-about-time-series-forecasting-that-surprise-experienced-machine-learning-practitioners-69c18ee89387

Create a plotting function
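One way such a helper could look (matplotlib; the start/end parameters let us zoom in on a slice of the series):

```python
import matplotlib.pyplot as plt

def plot_time_series(timesteps, values, format=".", start=0, end=None, label=None):
    """Plots timesteps against values, optionally zoomed in via start/end."""
    plt.plot(timesteps[start:end], values[start:end], format, label=label)
    plt.xlabel("Time")
    plt.ylabel("BTC Price")
    if label:
        plt.legend(fontsize=14)
    plt.grid(True)
```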


Modelling Experiments

0: Naive

1: Dense, horizon = 1, window = 7

2: Dense, horizon = 1, window = 30

3: Dense, horizon = 7, window = 30

4: Conv1D, horizon = 1, window = 7

5: LSTM, horizon = 1, window = 7

6: Dense, multivariate

7: N-BEATS, horizon = 1, window = 7

8: Ensemble

9: Future prediction

10: Silly model

Model 0: Naive Forecast

The formula looks like this:

$\hat{y}_{t} = y_{t-1}$

The prediction at time t is simply the value at the previous timestep, t-1
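In code, the naive forecast is a one-liner over the test split:

```python
# Predict every test value as the value that came immediately before it
naive_forecast = y_test[:-1]  # the forecast for timestep t is y at t-1
```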


Review: Different Evaluation Metrics

It's a regression problem, so we can use regression metrics:

  • MAE - interpretable - minimizing it pulls forecasts toward the median

  • RMSE - interpretable - minimizing it pulls forecasts toward the mean

  • MASE - compares the model's error to a naive forecast (< 1 means better than naive)

  • MAPE - percentage error - can be dangerous when y = 0

Essentially we are asking: how do our model's forecasts compare against the actual values?

See this link: https://otexts.com/fpp3/accuracy.html

We can sanity-check an error value against the scale of the data.

For example, if most prices are around $60k, an absolute error of $1,000 is under 2%, which is not that bad
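A sketch of an evaluation helper covering all four metrics (assuming 1-D arrays of true and predicted values; the MASE here is the non-seasonal version, scaled by the naive forecast's error):

```python
import tensorflow as tf

def evaluate_preds(y_true, y_pred):
    y_true = tf.cast(y_true, dtype=tf.float32)
    y_pred = tf.cast(y_pred, dtype=tf.float32)

    mae = tf.keras.metrics.mean_absolute_error(y_true, y_pred)
    mse = tf.keras.metrics.mean_squared_error(y_true, y_pred)
    rmse = tf.sqrt(mse)
    mape = tf.keras.metrics.mean_absolute_percentage_error(y_true, y_pred)
    # MASE < 1 means we beat a naive (t-1) forecast on this data
    mase = mae / tf.reduce_mean(tf.abs(y_true[1:] - y_true[:-1]))

    return {"mae": mae.numpy(), "mse": mse.numpy(), "rmse": rmse.numpy(),
            "mape": mape.numpy(), "mase": mase.numpy()}
```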

Side note: Other models we could use

This notebook is focused on TensorFlow, but future work could try these and compare them to the best TensorFlow model

Moving average https://machinelearningmastery.com/moving-average-smoothing-for-time-series-forecasting-python/

ARIMA (Autoregression Integrated Moving Average) https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/

sktime (Scikit-Learn for time series) https://github.com/alan-turing-institute/sktime

TensorFlow Decision Forests (random forest, gradient boosting trees) https://www.tensorflow.org/decision_forests

Facebook Kats (purpose-built forecasting and time series analysis library by Facebook) https://github.com/facebookresearch/Kats

LinkedIn Greykite (flexible, intuitive and fast forecasts) https://github.com/linkedin/greykite

Windowing our Time Series

We window to turn our data into a supervised learning problem

Great, let's now use this basic function to iterate through all the data.

For loops would take a while, so we'll use NumPy array indexing instead: https://towardsdatascience.com/fast-and-robust-sliding-window-vectorization-with-numpy-3ad950ed62f5
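A vectorized windowing sketch along those lines (WINDOW_SIZE and HORIZON are the defaults from the experiments list above):

```python
import numpy as np

HORIZON = 1      # predict 1 day ahead
WINDOW_SIZE = 7  # from the past week of data

def make_windows(x, window_size=WINDOW_SIZE, horizon=HORIZON):
    """Turns a 1-D array into 2-D arrays of sequential windows and labels."""
    # Index template for a single window + its horizon, e.g. [0, 1, ..., 7]
    window_step = np.expand_dims(np.arange(window_size + horizon), axis=0)
    # Offset the template by every valid starting position in the series
    window_indexes = window_step + np.expand_dims(
        np.arange(len(x) - (window_size + horizon - 1)), axis=0).T
    windowed_array = x[window_indexes]
    # The last `horizon` columns are the labels, the rest are the window
    return windowed_array[:, :-horizon], windowed_array[:, -horizon:]

full_windows, full_labels = make_windows(prices)
```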

Turning windows into training and test sets
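As with the raw series, the windows get split chronologically; for example:

```python
def make_train_test_splits(windows, labels, test_split=0.2):
    """Splits matching windows and labels into chronological train/test sets."""
    split_size = int(len(windows) * (1 - test_split))
    train_windows, train_labels = windows[:split_size], labels[:split_size]
    test_windows, test_labels = windows[split_size:], labels[split_size:]
    return train_windows, test_windows, train_labels, test_labels

train_windows, test_windows, train_labels, test_labels = make_train_test_splits(
    full_windows, full_labels)
```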

Modeling Checkpoint

We want to compare each model at its best-performing epoch
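A small factory for a ModelCheckpoint callback does the trick; something like:

```python
import os
import tensorflow as tf

def create_model_checkpoint(model_name, save_path="model_experiments"):
    # Keep only the best version of the model (lowest validation loss)
    return tf.keras.callbacks.ModelCheckpoint(
        filepath=os.path.join(save_path, model_name),
        verbose=0,
        save_best_only=True)
```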

Model 1: Dense (window = 7, horizon = 1)

  • Single dense layer with 128 hidden units and ReLU activation

  • Output layer with linear activation (i.e. no activation)

  • Adam optimizer and MAE loss

  • Batch size of 128 (the dataset is pretty small, so we can afford bigger batches)

  • 100 epochs

We could tune these hyperparameters, but that's left as an extension for later
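A sketch of the model described above (train_windows and friends come from the windowing step, create_model_checkpoint from the checkpoint section):

```python
import tensorflow as tf
from tensorflow.keras import layers

tf.random.set_seed(42)

model_1 = tf.keras.Sequential([
    layers.Dense(128, activation="relu"),
    layers.Dense(HORIZON, activation="linear")  # linear == no activation
], name="model_1_dense_w7_h1")

model_1.compile(loss="mae", optimizer=tf.keras.optimizers.Adam())

model_1.fit(train_windows, train_labels,
            epochs=100,
            batch_size=128,
            validation_data=(test_windows, test_labels),
            callbacks=[create_model_checkpoint(model_1.name)])
```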

Evaluate

Forecast on the test dataset


Model 2: Dense (window = 30, horizon = 1)

Evaluate

Didn't perform as well as model 1; we could run a for loop over different window sizes to tune this


Model 3: Dense (window = 30, horizon = 7)

Evaluate

The model now outputs a full 7-day forecast for each window


Comparing our models so far


So why does the naive model do so well?

Because of autocorrelation in the data: the value at t+1 is typically very close to the value at t

https://towardsdatascience.com/how-not-to-use-machine-learning-for-time-series-forecasting-avoiding-the-pitfalls-19f9d7adf424

Model 4: Conv 1D model

We need an input shape of (batch_size, timesteps, input_dim)
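Since our windows arrive as (batch_size, WINDOW_SIZE), one option is a Lambda layer that adds the missing input_dim axis before the convolution; a sketch:

```python
import tensorflow as tf
from tensorflow.keras import layers

tf.random.set_seed(42)

model_4 = tf.keras.Sequential([
    # (batch_size, WINDOW_SIZE) -> (batch_size, WINDOW_SIZE, 1)
    layers.Lambda(lambda x: tf.expand_dims(x, axis=-1)),
    layers.Conv1D(filters=128, kernel_size=5, padding="causal", activation="relu"),
    layers.Dense(HORIZON)
], name="model_4_conv1D")

model_4.compile(loss="mae", optimizer=tf.keras.optimizers.Adam())
```

Causal padding keeps the convolution from peeking at future timesteps, which matters for time series.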

Evaluate

Model 5: LSTM model
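The LSTM expects the same (batch_size, timesteps, input_dim) shape, so the expand_dims trick carries over; a minimal sketch using the Functional API:

```python
import tensorflow as tf
from tensorflow.keras import layers

tf.random.set_seed(42)

inputs = layers.Input(shape=(WINDOW_SIZE,))
x = layers.Lambda(lambda z: tf.expand_dims(z, axis=-1))(inputs)  # add input_dim
x = layers.LSTM(128, activation="relu")(x)
outputs = layers.Dense(HORIZON)(x)
model_5 = tf.keras.Model(inputs, outputs, name="model_5_lstm")

model_5.compile(loss="mae", optimizer=tf.keras.optimizers.Adam())
```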

Evaluate

Model 6: Multivariate model

I would love to add Elon Musk's tweeting as an additional variable, but that seems better suited to Dogecoin.

Maybe we could add the number of tweets about Bitcoin.

However, the Bitcoin halving (block reward) seems like a strong variable to test.

Sidenote: here's how to do regression with multivariate time series data: https://www.analyticsvidhya.com/blog/2018/09/multivariate-time-series-guide-forecasting-modeling-python-codes/

| Date       | Price     |
|------------|-----------|
| 2014-11-04 | 324.71833 |
| 2014-11-05 | 332.45666 |
| 2014-11-06 | 336.58500 |
| 2014-11-07 | 346.77500 |
| 2014-11-08 | 344.81166 |

| Date       | Price     | block_reward |
|------------|-----------|--------------|
| 2014-11-04 | 324.71833 | 25           |
| 2014-11-05 | 332.45666 | 25           |
| 2014-11-06 | 336.58500 | 25           |
| 2014-11-07 | 346.77500 | 25           |
| 2014-11-08 | 344.81166 | 25           |


Preparing data for multivariate model

Our current windowing functions won't work because we now have two variables.

Let's use pandas.DataFrame.shift() instead.
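A sketch of the shifting loop (bitcoin_prices here stands for the two-column DataFrame shown above):

```python
# One shifted column per step of the window: Price+1 is yesterday's
# price, Price+2 the day before that, and so on up to Price+7
bitcoin_prices_windowed = bitcoin_prices.copy()
for i in range(WINDOW_SIZE):
    bitcoin_prices_windowed[f"Price+{i+1}"] = bitcoin_prices_windowed["Price"].shift(periods=i + 1)
```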

| Date       | Price     | block_reward | Price+1   | Price+2   | Price+3   | Price+4   | Price+5   | Price+6   | Price+7   |
|------------|-----------|--------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| 2014-11-04 | 324.71833 | 25           | NaN       | NaN       | NaN       | NaN       | NaN       | NaN       | NaN       |
| 2014-11-05 | 332.45666 | 25           | 324.71833 | NaN       | NaN       | NaN       | NaN       | NaN       | NaN       |
| 2014-11-06 | 336.58500 | 25           | 332.45666 | 324.71833 | NaN       | NaN       | NaN       | NaN       | NaN       |
| 2014-11-07 | 346.77500 | 25           | 336.58500 | 332.45666 | 324.71833 | NaN       | NaN       | NaN       | NaN       |
| 2014-11-08 | 344.81166 | 25           | 346.77500 | 336.58500 | 332.45666 | 324.71833 | NaN       | NaN       | NaN       |
| 2014-11-09 | 343.06500 | 25           | 344.81166 | 346.77500 | 336.58500 | 332.45666 | 324.71833 | NaN       | NaN       |
| 2014-11-10 | 358.50166 | 25           | 343.06500 | 344.81166 | 346.77500 | 336.58500 | 332.45666 | 324.71833 | NaN       |
| 2014-11-11 | 368.07666 | 25           | 358.50166 | 343.06500 | 344.81166 | 346.77500 | 336.58500 | 332.45666 | 324.71833 |
| 2014-11-12 | 376.99666 | 25           | 368.07666 | 358.50166 | 343.06500 | 344.81166 | 346.77500 | 336.58500 | 332.45666 |
| 2014-11-13 | 442.10666 | 25           | 376.99666 | 368.07666 | 358.50166 | 343.06500 | 344.81166 | 346.77500 | 336.58500 |

Yay now we can have data like this:

| Date       | block_reward | Price+1    | Price+2    | Price+3    | Price+4    | Price+5    | Price+6    | Price+7    |
|------------|--------------|------------|------------|------------|------------|------------|------------|------------|
| 2014-11-11 | 25.0         | 358.501648 | 343.065002 | 344.811646 | 346.774994 | 336.584991 | 332.456665 | 324.718323 |
| 2014-11-12 | 25.0         | 368.076660 | 358.501648 | 343.065002 | 344.811646 | 346.774994 | 336.584991 | 332.456665 |
| 2014-11-13 | 25.0         | 376.996674 | 368.076660 | 358.501648 | 343.065002 | 344.811646 | 346.774994 | 336.584991 |
| 2014-11-14 | 25.0         | 442.106659 | 376.996674 | 368.076660 | 358.501648 | 343.065002 | 344.811646 | 346.774994 |
| 2014-11-15 | 25.0         | 389.003326 | 442.106659 | 376.996674 | 368.076660 | 358.501648 | 343.065002 | 344.811646 |

This looks like a basic setup for a regression model!

Creating model

Evaluate

Model 7: N-BEATS algorithm

Let's build our biggest model yet, based on 2020's state-of-the-art N-BEATS architecture

Let's see how it performs!

Building and testing the N-BEATS block layer

This layer doesn't exist in TensorFlow, so we need to create it as a custom layer: https://www.tensorflow.org/guide/keras/custom_layers_and_models
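A sketch of such a subclassed layer, following the generic block described in the paper (the names and argument choices here are ours, not TensorFlow's):

```python
import tensorflow as tf

class NBeatsBlock(tf.keras.layers.Layer):
    def __init__(self, input_size, theta_size, horizon, n_neurons, n_layers, **kwargs):
        super().__init__(**kwargs)
        self.input_size = input_size
        self.theta_size = theta_size  # backcast + forecast coefficients
        self.horizon = horizon

        # Fully connected stack with ReLU, as in the paper's basic block
        self.hidden = [tf.keras.layers.Dense(n_neurons, activation="relu")
                       for _ in range(n_layers)]
        # Theta layer is linear (no activation)
        self.theta_layer = tf.keras.layers.Dense(theta_size, activation="linear")

    def call(self, inputs):
        x = inputs
        for layer in self.hidden:
            x = layer(x)
        theta = self.theta_layer(x)
        # First input_size values reconstruct the input (backcast),
        # last horizon values are the block's forecast
        return theta[:, :self.input_size], theta[:, -self.horizon:]
```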

Let's test this class

So far so good

Data Pipeline

Using tf.data: https://www.tensorflow.org/guide/data_performance
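A sketch of the batched, prefetched pipeline (array names follow the windowing step earlier):

```python
import tensorflow as tf

train_dataset = tf.data.Dataset.zip((
    tf.data.Dataset.from_tensor_slices(train_windows),
    tf.data.Dataset.from_tensor_slices(train_labels)))
test_dataset = tf.data.Dataset.zip((
    tf.data.Dataset.from_tensor_slices(test_windows),
    tf.data.Dataset.from_tensor_slices(test_labels)))

BATCH_SIZE = 1024
train_dataset = train_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
```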

| Date       | Price     |
|------------|-----------|
| 2014-11-04 | 324.71833 |
| 2014-11-05 | 332.45666 |
| 2014-11-06 | 336.58500 |
| 2014-11-07 | 346.77500 |
| 2014-11-08 | 344.81166 |

| Date       | Price     | Price+1   | Price+2   | Price+3   | Price+4   | Price+5 | Price+6 | Price+7 |
|------------|-----------|-----------|-----------|-----------|-----------|---------|---------|---------|
| 2014-11-04 | 324.71833 | NaN       | NaN       | NaN       | NaN       | NaN     | NaN     | NaN     |
| 2014-11-05 | 332.45666 | 324.71833 | NaN       | NaN       | NaN       | NaN     | NaN     | NaN     |
| 2014-11-06 | 336.58500 | 332.45666 | 324.71833 | NaN       | NaN       | NaN     | NaN     | NaN     |
| 2014-11-07 | 346.77500 | 336.58500 | 332.45666 | 324.71833 | NaN       | NaN     | NaN     | NaN     |
| 2014-11-08 | 344.81166 | 346.77500 | 336.58500 | 332.45666 | 324.71833 | NaN     | NaN     | NaN     |

Hyperparameters for the model

See table 18 in the paper (we are building N-BEATS-G, the generic configuration)
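The constants below are our reading of that generic configuration; treat the exact values as assumptions to be checked against the paper:

```python
N_EPOCHS = 5000   # training iterations
N_NEURONS = 512   # hidden units in each block's dense layers ("width")
N_LAYERS = 4      # dense layers per block
N_STACKS = 30

HORIZON = 1
WINDOW_SIZE = 7
INPUT_SIZE = WINDOW_SIZE * HORIZON  # length of the model input
THETA_SIZE = INPUT_SIZE + HORIZON   # backcast + forecast coefficients
```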

Preparing for Residual Connections (see the stack diagram in the paper)

The N-BEATS algorithm uses double residual stacking (section 3.2 of the paper) because the architecture is deep, and deep architectures are prone to vanishing gradients

See https://paperswithcode.com/method/resnet

Building, Compiling, and Fitting N-BEATS

  1. Set up an instance of the N-BEATS block layer (this will be the first block; a loop will create the rest of the stacks)

  2. Create an input layer for the N-BEATS stack (using Functional API)

  3. Make the initial backcast and forecast for the model using (1.)

  4. Use for loop to create stacks of block layers

  5. Use the NBeatsBlock class within the for loop in (4.) to create blocks which return backcasts and block-level forecasts

  6. Create the double residual stacking using subtract and add layers

  7. Put the model inputs and outputs together in tf.keras.Model()

  8. Compile the model with MAE loss (the paper uses multiple losses but we'll use MAE to be consistent) and Adam optimizer (paper used it)

  9. Fit the N-BEATS model for up to 5000 epochs with some callbacks:

  • Early Stopping

  • Reduce LR on plateau
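Putting steps 1-9 together, roughly (NBeatsBlock and the hyperparameters are defined above; the layer names are just for readability):

```python
import tensorflow as tf
from tensorflow.keras import layers

tf.random.set_seed(42)

# 1. Initial block instance
initial_block = NBeatsBlock(input_size=INPUT_SIZE, theta_size=THETA_SIZE,
                            horizon=HORIZON, n_neurons=N_NEURONS,
                            n_layers=N_LAYERS, name="InitialBlock")

# 2. Input to the stack
stack_input = layers.Input(shape=(INPUT_SIZE,), name="stack_input")

# 3. Initial backcast and forecast; residuals start as input minus backcast
backcast, forecast = initial_block(stack_input)
residuals = layers.subtract([stack_input, backcast], name="subtract_00")

# 4./5./6. Remaining blocks with double residual stacking
for i in range(N_STACKS - 1):
    backcast, block_forecast = NBeatsBlock(
        input_size=INPUT_SIZE, theta_size=THETA_SIZE, horizon=HORIZON,
        n_neurons=N_NEURONS, n_layers=N_LAYERS, name=f"NBeatsBlock_{i}")(residuals)
    residuals = layers.subtract([residuals, backcast], name=f"subtract_{i}")
    forecast = layers.add([forecast, block_forecast], name=f"add_{i}")

# 7. Full model
model_7 = tf.keras.Model(inputs=stack_input, outputs=forecast, name="model_7_NBEATS")

# 8. Compile with MAE loss and Adam
model_7.compile(loss="mae", optimizer=tf.keras.optimizers.Adam())

# 9. Fit with early stopping and learning rate reduction
model_7.fit(train_dataset, epochs=N_EPOCHS, validation_data=test_dataset, verbose=0,
            callbacks=[
                tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=200,
                                                 restore_best_weights=True),
                tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                     patience=100, verbose=1)])
```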


Model 8: Creating an Ensemble

An ensemble leverages the wisdom of the crowd effect

It combines many different models to predict a common target
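One way to build such an ensemble: train the same small dense architecture several times, varying the random initialization and the loss function (the epoch and patience numbers here are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

def get_ensemble_models(horizon=HORIZON, train_data=None, test_data=None,
                        num_iter=10, num_epochs=1000,
                        loss_fns=("mae", "mse", "mape")):
    """Returns num_iter * len(loss_fns) trained models."""
    ensemble_models = []
    for _ in range(num_iter):
        for loss_fn in loss_fns:
            model = tf.keras.Sequential([
                layers.Dense(128, kernel_initializer="he_normal", activation="relu"),
                layers.Dense(128, kernel_initializer="he_normal", activation="relu"),
                layers.Dense(horizon)
            ])
            model.compile(loss=loss_fn, optimizer=tf.keras.optimizers.Adam())
            model.fit(train_data, epochs=num_epochs, verbose=0,
                      validation_data=test_data,
                      callbacks=[tf.keras.callbacks.EarlyStopping(
                          monitor="val_loss", patience=200,
                          restore_best_weights=True)])
            ensemble_models.append(model)
    return ensemble_models
```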

Make predictions with ensemble model

Plotting the prediction intervals

One benefit of using an ensemble is the ability to get prediction intervals

Bootstrap method:

  1. Take the predictions from a number of randomly initialized models (ensemble models)

  2. Measure the standard deviation of the predictions

  3. Multiply the standard deviation by 1.96 (assuming the distribution is Gaussian/normal)

  4. To get the upper and lower bounds of the prediction interval, add and subtract the value from (3.) to the mean/median of the predictions made in (1.)

See: https://eng.uber.com/neural-networks-uncertainty-estimation/
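A sketch of steps 2-4, assuming ensemble_preds is a stacked (n_models, n_timesteps) array of each ensemble member's predictions:

```python
import tensorflow as tf

def get_upper_lower(preds):
    """95% prediction interval bounds from stacked ensemble predictions."""
    std = tf.math.reduce_std(preds, axis=0)   # spread across the models
    interval = 1.96 * std                     # ~95% of a Gaussian
    preds_mean = tf.reduce_mean(preds, axis=0)
    return preds_mean - interval, preds_mean + interval

lower, upper = get_upper_lower(ensemble_preds)
```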


So far, all of the models' predictions lag behind the test data; essentially they do the same thing as the naive model, predicting the previous timestep's value as the next timestep

Potential Problems:

  • Overfitting

  • Model might be copying the naive model

  • Data isn't predictable (I believe this is the main problem)

Note: the prediction intervals were made assuming the model's data is normally distributed

Model 9: Predicting into the future by training on the full data

| Date       | Price     | block_reward | Price+1   | Price+2   | Price+3   | Price+4   | Price+5 | Price+6 | Price+7 |
|------------|-----------|--------------|-----------|-----------|-----------|-----------|---------|---------|---------|
| 2014-11-04 | 324.71833 | 25           | NaN       | NaN       | NaN       | NaN       | NaN     | NaN     | NaN     |
| 2014-11-05 | 332.45666 | 25           | 324.71833 | NaN       | NaN       | NaN       | NaN     | NaN     | NaN     |
| 2014-11-06 | 336.58500 | 25           | 332.45666 | 324.71833 | NaN       | NaN       | NaN     | NaN     | NaN     |
| 2014-11-07 | 346.77500 | 25           | 336.58500 | 332.45666 | 324.71833 | NaN       | NaN     | NaN     | NaN     |
| 2014-11-08 | 344.81166 | 25           | 346.77500 | 336.58500 | 332.45666 | 324.71833 | NaN     | NaN     | NaN     |

Make predictions into the future

To make predictions into the future we want a function which:

  1. Takes as input:

  • a list of values (the historical Bitcoin prices)

  • a trained model

  • the number of future timesteps to predict (INTO_FUTURE)

  • the window size the model was trained on (WINDOW_SIZE)

  2. Creates an empty list for the future forecasts and extracts the last WINDOW_SIZE values from the input values

  3. Loops INTO_FUTURE times, making a prediction on the current WINDOW_SIZE sequence, then slides the window forward by dropping its first value and appending the latest prediction
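A sketch of that function (prices and WINDOW_SIZE as defined earlier):

```python
import numpy as np
import tensorflow as tf

def make_future_forecast(values, model, into_future, window_size=WINDOW_SIZE):
    """Feeds each prediction back in as input for the next one."""
    future_forecast = []
    last_window = values[-window_size:]  # start from the most recent window

    for _ in range(into_future):
        pred = model.predict(tf.expand_dims(last_window, axis=0), verbose=0)
        future_forecast.append(tf.squeeze(pred).numpy())
        # Drop the oldest value, append the newest prediction
        last_window = np.append(last_window, pred)[-window_size:]

    return future_forecast
```

Note that errors compound here: every prediction is built on top of earlier predictions.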

Plotting the predictions into the future


Model 10: Turkey data

Showing why forecasting is BS

A single unpredictable datapoint can ruin everything
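To simulate this, we can clone the price series and corrupt a single value; here we (arbitrarily) crush the final day's price to 1% of its real value:

```python
# Artificial "turkey problem": one catastrophic, unforeseeable datapoint
btc_price_turkey = prices.copy()
btc_price_turkey[-1] = btc_price_turkey[-1] / 100  # drop the last price to 1%
```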


And with just one value change, our error metrics go through the roof.

To be sure, let's remind ourselves of how model_1 performed on the unmodified Bitcoin data (no turkey problem).


Highly unlikely price movements (judged against historical movements), upward or downward, will almost never be part of a forecast.

However, as we've seen, despite their unlikeliness, these events can happen, and they have a huge impact on the performance of our models.

Comparing the models

| model                | mae          | mse          | rmse         | mape       | mase      |
|----------------------|--------------|--------------|--------------|------------|-----------|
| naive_model          | 924.304138   | 2.055079e+06 | 1433.554810  | 2.604794   | 0.998342  |
| model_1_dense_w7_h1  | 933.083740   | 2.106266e+06 | 1451.298096  | 2.630212   | 1.007824  |
| model_2_dense_w30_h1 | 990.399170   | 2.314996e+06 | 1521.510986  | 2.788899   | 1.059855  |
| model_3_dense_w30_h7 | 2047.885620  | 1.015826e+07 | 2338.837646  | 5.766706   | 2.206674  |
| model_4_CONV1D       | 923.650879   | 2.076426e+06 | 1440.981079  | 2.604802   | 0.997636  |
| model_5_LSTM         | 998.500732   | 2.332680e+06 | 1527.311401  | 2.825422   | 1.078481  |
| model_6_multivariate | 922.858459   | 2.067919e+06 | 1438.026123  | 2.603089   | 0.996780  |
| model_8_NBEATs       | 952.499390   | 2.230254e+06 | 1493.403564  | 2.683706   | 1.028795  |
| model_9_ensemble     | 922.459595   | 2.083583e+06 | 1443.462036  | 2.603472   | 0.996349  |
| model_10_turkey      | 20278.744141 | 6.335656e+08 | 24717.240234 | 108.450264 | 19.489779 |

