Predicting Bitcoin Prices
Practicing Deep Learning Time Series Forecasting on Bitcoin Data
From ZeroToMastery Tensorflow Course
!nvidia-smi -L
GPU 0: Tesla K80 (UUID: GPU-3d431c8e-89fd-3326-713b-0dfee7b92355)

Table of Contents
Get Data
Download all historical data from here: https://www.coindesk.com/price/bitcoin/
| Date | Currency | Closing Price (USD) | 24h Open (USD) | 24h High (USD) | 24h Low (USD) |
|------------|----------|---------------------|----------------|----------------|---------------|
| 2014-11-04 | BTC | 324.71833 | 331.60083 | 332.75133 | 323.06333 |
| 2014-11-05 | BTC | 332.45666 | 324.71833 | 335.81166 | 320.93333 |
| 2014-11-06 | BTC | 336.58500 | 332.45666 | 341.49000 | 328.56166 |
| 2014-11-07 | BTC | 346.77500 | 336.58500 | 351.57500 | 336.02833 |
| 2014-11-08 | BTC | 344.81166 | 346.77500 | 351.29500 | 339.86000 |
| Date | Currency | Closing Price (USD) | 24h Open (USD) | 24h High (USD) | 24h Low (USD) |
|------------|----------|---------------------|----------------|----------------|---------------|
| 2021-10-21 | BTC | 62603.575070 | 65986.244921 | 66635.466408 | 62290.325376 |
| 2021-10-22 | BTC | 60689.238265 | 62192.232903 | 63717.168947 | 60033.928038 |
| 2021-10-23 | BTC | 61124.347126 | 60692.609117 | 61733.585205 | 59694.050434 |
| 2021-10-24 | BTC | 60936.150851 | 61304.790355 | 61490.095150 | 59553.027885 |
| 2021-10-25 | BTC | 63004.381115 | 60860.032655 | 63661.925180 | 60656.561825 |
Note: we have 2548 rows, but deep learning models generally need much more data to be effective
A small sample size is a problem common to time series forecasting
Note: seasonality = the number of samples per seasonal cycle. This is daily data with a yearly cycle, so it has a seasonality of 365
Important time series patterns
Trend: the series has a clear long-term increase or decrease
Seasonal: regular patterns tied to the time of year / day of week / season / etc.
Cyclic: rises and falls with no fixed period
| Date | Price |
|------------|-----------|
| 2014-11-04 | 324.71833 |
| 2014-11-05 | 332.45666 |
| 2014-11-06 | 336.58500 |
| 2014-11-07 | 346.77500 |
| 2014-11-08 | 344.81166 |

Importing time series data with Python's csv module
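A minimal sketch of reading the CoinDesk export with the built-in csv module (the filename and column order are assumptions based on the CoinDesk CSV format, where column 1 is the date and column 2 the closing price):

```python
import csv
from datetime import datetime

timesteps = []
btc_price = []
with open("BTC_USD_2014-11-04_2021-10-25-CoinDesk.csv", "r") as f:
    csv_reader = csv.reader(f, delimiter=",")
    next(csv_reader)  # skip the header row
    for line in csv_reader:
        timesteps.append(datetime.strptime(line[1], "%Y-%m-%d"))  # "Date" column
        btc_price.append(float(line[2]))  # "Closing Price (USD)" column
```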

Creating train test splits
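Because order matters in time series, we can't shuffle before splitting; the test set must be the most recent slice. A minimal sketch, assuming `timesteps` and `btc_price` from the import step above:

```python
import numpy as np

timesteps = np.array(timesteps)
prices = np.array(btc_price)

# 80/20 split -- no shuffling, the test set comes *after* the train set in time
split_size = int(0.8 * len(prices))
X_train, y_train = timesteps[:split_size], prices[:split_size]
X_test, y_test = timesteps[split_size:], prices[split_size:]
```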

Side note: this article is a fun read about how time series forecasting differs from regular machine learning
https://towardsdatascience.com/3-facts-about-time-series-forecasting-that-surprise-experienced-machine-learning-practitioners-69c18ee89387
Create a plotting function
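A reusable helper along these lines (the signature is an assumption; adjust the styling to taste):

```python
import matplotlib.pyplot as plt

def plot_time_series(timesteps, values, format=".", start=0, end=None, label=None):
    """Plots timesteps against values, optionally zoomed in with start/end."""
    plt.plot(timesteps[start:end], values[start:end], format, label=label)
    plt.xlabel("Time")
    plt.ylabel("BTC Price")
    if label:
        plt.legend(fontsize=14)
    plt.grid(True)
```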

Modelling Experiments
0: Naive
1: Dense, horizon = 1, window = 7
2: Dense, horizon = 1, window = 30
3: Dense, horizon = 7, window = 30
4: Conv1D, horizon = 1, window = 7
5: LSTM, horizon = 1, window = 7
6: Dense but multivariate
7: N-BEATS, horizon = 1, window = 7
8: Ensemble
9: Future prediction
10: Silly model
Model 0: Naive Forecast
The formula looks like this:

$$\hat{y}_{t} = y_{t-1}$$

The prediction at time $t$ is equal to the value at the previous timestep, $t-1$.
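In code, this is just a one-line slice of the test set:

```python
# Naive forecast: the prediction for every timestep is the previous timestep's value
naive_forecast = y_test[:-1]
```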

Review: Different Evaluation Metrics
Regression problem, so we can use regression metrics:
MAE - interpretable - optimizing it pushes forecasts toward the median
RMSE - interpretable - optimizing it pushes forecasts toward the mean
MASE - compares the model's performance to a naive forecast (MASE < 1 means better than naive)
MAPE - percentage error; can be dangerous/undefined when y = 0
Essentially we're asking: how do our model's forecasts compare against the actual values?
See this link: https://otexts.com/fpp3/accuracy.html
We can gauge how a model is doing by looking at where the majority of values lie
For example, if most prices are around 60k, then an error of 1000 is not that bad
Side note: Other models we could use
This notebook is focused on TensorFlow, but future work could try these and compare them against the best TensorFlow model
Moving average https://machinelearningmastery.com/moving-average-smoothing-for-time-series-forecasting-python/
ARIMA (Autoregression Integrated Moving Average) https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/
sktime (Scikit-Learn for time series) https://github.com/alan-turing-institute/sktime
TensorFlow Decision Forests (random forest, gradient boosting trees) https://www.tensorflow.org/decision_forests
Facebook Kats (purpose-built forecasting and time series analysis library by Facebook) https://github.com/facebookresearch/Kats
LinkedIn Greykite (flexible, intuitive and fast forecasts) https://github.com/linkedin/greykite
Windowing our Time Series
We window our data to turn forecasting into a supervised learning problem
Great, let's now use this basic function to iterate through all of the data
Python for loops would take a while, so we'll use NumPy array indexing instead (see the sketch below): https://towardsdatascience.com/fast-and-robust-sliding-window-vectorization-with-numpy-3ad950ed62f5
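A vectorized windowing sketch (the function names and WINDOW_SIZE/HORIZON values are assumptions in the spirit of the experiments above):

```python
import numpy as np

HORIZON = 1      # predict 1 timestep ahead
WINDOW_SIZE = 7  # use the previous 7 timesteps as input

def get_labelled_windows(x, horizon=1):
    """Splits windows into (inputs, labels).
    E.g. horizon=1: [0, 1, 2, 3, 4, 5, 6, 7] -> ([0, 1, 2, 3, 4, 5, 6], [7])"""
    return x[:, :-horizon], x[:, -horizon:]

def make_windows(x, window_size=7, horizon=1):
    """Turns a 1D array into a 2D array of sequential labelled windows."""
    # 1. Indexes of one window (plus the horizon, for labelling)
    window_step = np.expand_dims(np.arange(window_size + horizon), axis=0)
    # 2. Broadcast to a 2D array of indexes for every possible window
    window_indexes = window_step + np.expand_dims(
        np.arange(len(x) - (window_size + horizon - 1)), axis=0).T
    # 3. Index into the data, then split into windows and labels
    windowed_array = x[window_indexes]
    return get_labelled_windows(windowed_array, horizon=horizon)

full_windows, full_labels = make_windows(prices, WINDOW_SIZE, HORIZON)
```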
Turning windows into training and test sets
Modeling Checkpoint
We want to save and compare each model's best epoch (lowest validation loss), not just its final epoch (see the helper sketched below)
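A small helper wrapping tf.keras.callbacks.ModelCheckpoint (the save path is an assumption):

```python
import os
import tensorflow as tf

def create_model_checkpoint(model_name, save_path="model_experiments"):
    """Returns a ModelCheckpoint callback that keeps only the best (lowest
    validation loss) version of the model."""
    return tf.keras.callbacks.ModelCheckpoint(
        filepath=os.path.join(save_path, model_name),
        verbose=0,
        save_best_only=True)
```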
Model 1: Dense (window = 7, horizon = 1)
A single Dense layer with 128 units and ReLU activation
An output layer with linear activation (i.e. no activation)
Adam optimizer and MAE loss
Batch size of 128 (the dataset is pretty small, so larger batches are fine)
100 epochs
We could tune the hyperparameters, but that's an extension for later; a sketch of the model follows
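Putting that spec together (train_windows/train_labels and the test equivalents are assumed to come from splitting the windowed data above):

```python
import tensorflow as tf
from tensorflow.keras import layers

tf.random.set_seed(42)

model_1 = tf.keras.Sequential([
    layers.Dense(128, activation="relu"),
    layers.Dense(HORIZON, activation="linear")  # linear == no activation
], name="model_1_dense")

model_1.compile(loss="mae", optimizer=tf.keras.optimizers.Adam())

model_1.fit(train_windows, train_labels,
            epochs=100,
            batch_size=128,
            validation_data=(test_windows, test_labels),
            callbacks=[create_model_checkpoint(model_name=model_1.name)])
```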
Evaluate
Forecast on the test dataset
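Two helpers we'll reuse for every model (sketches; the MASE here is the unseasonal version, scaling MAE by the naive forecast's MAE):

```python
def make_preds(model, input_data):
    """Uses model to make predictions on input_data, returned as a squeezed tensor."""
    forecast = model.predict(input_data)
    return tf.squeeze(forecast)

def evaluate_preds(y_true, y_pred):
    """Returns a dict of MAE, MSE, RMSE, MAPE and MASE (assumes 1D inputs;
    for multi-step horizons, aggregate per-window metrics with tf.reduce_mean)."""
    y_true = tf.cast(y_true, dtype=tf.float32)
    y_pred = tf.cast(y_pred, dtype=tf.float32)
    mae = tf.keras.metrics.mean_absolute_error(y_true, y_pred)
    mse = tf.keras.metrics.mean_squared_error(y_true, y_pred)
    rmse = tf.sqrt(mse)
    mape = tf.keras.metrics.mean_absolute_percentage_error(y_true, y_pred)
    # MASE: MAE scaled by the MAE of a naive (t-1) forecast on the same data
    mase = mae / tf.reduce_mean(tf.abs(y_true[1:] - y_true[:-1]))
    return {"mae": mae.numpy(), "mse": mse.numpy(), "rmse": rmse.numpy(),
            "mape": mape.numpy(), "mase": mase.numpy()}
```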

Model 2: Dense (window = 30, horizon = 1)
Evaluate
Didn't perform as well as model 1 - we could run a for loop over different window sizes to find a better one

Model 3: Dense (window = 30, horizon = 7)
Evaluate
With horizon = 7, each prediction is a full week of values

Comparing our models so far

So why does the naive model do so well?
Because of autocorrelation in the data: the value at t+1 is typically very close to the value at t
https://towardsdatascience.com/how-not-to-use-machine-learning-for-time-series-forecasting-avoiding-the-pitfalls-19f9d7adf424
Model 4: Conv1D model
Conv1D needs an input shape of (batch_size, timesteps, input_dim), but our windows are (batch_size, WINDOW_SIZE), so we have to add a dimension (see the sketch below)
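One way to handle the reshape inside the model (the filter count and kernel size are assumptions):

```python
model_4 = tf.keras.Sequential([
    # (batch_size, WINDOW_SIZE) -> (batch_size, WINDOW_SIZE, 1) so Conv1D gets 3D input
    layers.Lambda(lambda x: tf.expand_dims(x, axis=-1)),
    layers.Conv1D(filters=128, kernel_size=5, padding="causal", activation="relu"),
    layers.Flatten(),
    layers.Dense(HORIZON)
], name="model_4_conv1D")

model_4.compile(loss="mae", optimizer=tf.keras.optimizers.Adam())
```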
Evaluate
Model 5: LSTM model
Evaluate
Model 6: Multivariate model
I would love to add Elon Musk's tweets as an additional variable, but that seems better suited to Dogecoin
Maybe we could add the number of tweets about Bitcoin
However, the Bitcoin halving (block reward size) seems like a strong variable to test
Side note: here's how to do regression with multivariate time series data: https://www.analyticsvidhya.com/blog/2018/09/multivariate-time-series-guide-forecasting-modeling-python-codes/
| Date | Price |
|------------|-----------|
| 2014-11-04 | 324.71833 |
| 2014-11-05 | 332.45666 |
| 2014-11-06 | 336.58500 |
| 2014-11-07 | 346.77500 |
| 2014-11-08 | 344.81166 |
| Date | Price | block_reward |
|------------|-----------|--------------|
| 2014-11-04 | 324.71833 | 25 |
| 2014-11-05 | 332.45666 | 25 |
| 2014-11-06 | 336.58500 | 25 |
| 2014-11-07 | 346.77500 | 25 |
| 2014-11-08 | 344.81166 | 25 |

Preparing data for multivariate model
Our current windowing functions won't work because we now have two variables
Let's use pandas.DataFrame.shift() instead (see the sketch below)
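A sketch of building the shifted columns with pandas (assumes a DataFrame `bitcoin_prices_block` with "Price" and "block_reward" columns, as built above):

```python
import pandas as pd

bitcoin_prices_windowed = bitcoin_prices_block.copy()

# Add windowed columns: "Price+1" is the price 1 day before, "Price+2" two days before, etc.
for i in range(WINDOW_SIZE):  # WINDOW_SIZE = 7
    bitcoin_prices_windowed[f"Price+{i+1}"] = bitcoin_prices_windowed["Price"].shift(periods=i + 1)
```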
| Date | Price | block_reward | Price+1 | Price+2 | Price+3 | Price+4 | Price+5 | Price+6 | Price+7 |
|------------|-----------|----|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| 2014-11-04 | 324.71833 | 25 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-11-05 | 332.45666 | 25 | 324.71833 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-11-06 | 336.58500 | 25 | 332.45666 | 324.71833 | NaN | NaN | NaN | NaN | NaN |
| 2014-11-07 | 346.77500 | 25 | 336.58500 | 332.45666 | 324.71833 | NaN | NaN | NaN | NaN |
| 2014-11-08 | 344.81166 | 25 | 346.77500 | 336.58500 | 332.45666 | 324.71833 | NaN | NaN | NaN |
| 2014-11-09 | 343.06500 | 25 | 344.81166 | 346.77500 | 336.58500 | 332.45666 | 324.71833 | NaN | NaN |
| 2014-11-10 | 358.50166 | 25 | 343.06500 | 344.81166 | 346.77500 | 336.58500 | 332.45666 | 324.71833 | NaN |
| 2014-11-11 | 368.07666 | 25 | 358.50166 | 343.06500 | 344.81166 | 346.77500 | 336.58500 | 332.45666 | 324.71833 |
| 2014-11-12 | 376.99666 | 25 | 368.07666 | 358.50166 | 343.06500 | 344.81166 | 346.77500 | 336.58500 | 332.45666 |
| 2014-11-13 | 442.10666 | 25 | 376.99666 | 368.07666 | 358.50166 | 343.06500 | 344.81166 | 346.77500 | 336.58500 |
Yay, now we can have data like this:
| Date | block_reward | Price+1 | Price+2 | Price+3 | Price+4 | Price+5 | Price+6 | Price+7 |
|------------|------|------------|------------|------------|------------|------------|------------|------------|
| 2014-11-11 | 25.0 | 358.501648 | 343.065002 | 344.811646 | 346.774994 | 336.584991 | 332.456665 | 324.718323 |
| 2014-11-12 | 25.0 | 368.076660 | 358.501648 | 343.065002 | 344.811646 | 346.774994 | 336.584991 | 332.456665 |
| 2014-11-13 | 25.0 | 376.996674 | 368.076660 | 358.501648 | 343.065002 | 344.811646 | 346.774994 | 336.584991 |
| 2014-11-14 | 25.0 | 442.106659 | 376.996674 | 368.076660 | 358.501648 | 343.065002 | 344.811646 | 346.774994 |
| 2014-11-15 | 25.0 | 389.003326 | 442.106659 | 376.996674 | 368.076660 | 358.501648 | 343.065002 | 344.811646 |
This looks like a basic setup for a regression model!
Creating model
Evaluate
Model 7: N-BEATS algorithm
Let's build our biggest model yet, based on the N-BEATS algorithm (published at ICLR 2020, where it achieved state-of-the-art results on the M4 forecasting competition data)
Let's see how it performs!
Building and testing the 'Block Input' Layer
It doesn't exist in TensorFlow, so we need to create it by subclassing tf.keras.layers.Layer (see the sketch below): https://www.tensorflow.org/guide/keras/custom_layers_and_models
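A sketch of the basic block from section 3.1 of the paper (the constructor arguments mirror the paper's hyperparameters; the class and argument names are assumptions):

```python
import tensorflow as tf

class NBeatsBlock(tf.keras.layers.Layer):
    def __init__(self, input_size: int, theta_size: int, horizon: int,
                 n_neurons: int, n_layers: int, **kwargs):
        super().__init__(**kwargs)
        self.input_size = input_size
        self.theta_size = theta_size  # backcast + forecast coefficients
        self.horizon = horizon

        # The block is a stack of fully connected layers with ReLU...
        self.hidden = [tf.keras.layers.Dense(n_neurons, activation="relu")
                       for _ in range(n_layers)]
        # ...followed by a linear theta layer
        self.theta_layer = tf.keras.layers.Dense(theta_size, activation="linear")

    def call(self, inputs):
        x = inputs
        for layer in self.hidden:
            x = layer(x)
        theta = self.theta_layer(x)
        # Split theta into the backcast (reconstruction of the input) and the forecast
        backcast = theta[:, :self.input_size]
        forecast = theta[:, -self.horizon:]
        return backcast, forecast
```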
Let's test this class
So far so good
Data Pipeline
Using tf.data: https://www.tensorflow.org/guide/data_performance
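A sketch of the pipeline (the batch size is an assumption; zip pairs features with labels, prefetch overlaps data loading with training):

```python
BATCH_SIZE = 1024

train_features_dataset = tf.data.Dataset.from_tensor_slices(train_windows)
train_labels_dataset = tf.data.Dataset.from_tensor_slices(train_labels)
test_features_dataset = tf.data.Dataset.from_tensor_slices(test_windows)
test_labels_dataset = tf.data.Dataset.from_tensor_slices(test_labels)

# Pair features with labels, then batch and prefetch for throughput
train_dataset = (tf.data.Dataset.zip((train_features_dataset, train_labels_dataset))
                 .batch(BATCH_SIZE)
                 .prefetch(tf.data.AUTOTUNE))
test_dataset = (tf.data.Dataset.zip((test_features_dataset, test_labels_dataset))
                .batch(BATCH_SIZE)
                .prefetch(tf.data.AUTOTUNE))
```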
| Date | Price |
|------------|-----------|
| 2014-11-04 | 324.71833 |
| 2014-11-05 | 332.45666 |
| 2014-11-06 | 336.58500 |
| 2014-11-07 | 346.77500 |
| 2014-11-08 | 344.81166 |
| Date | Price | Price+1 | Price+2 | Price+3 | Price+4 | Price+5 | Price+6 | Price+7 |
|------------|-----------|-----------|-----------|-----------|-----------|-----|-----|-----|
| 2014-11-04 | 324.71833 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-11-05 | 332.45666 | 324.71833 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-11-06 | 336.58500 | 332.45666 | 324.71833 | NaN | NaN | NaN | NaN | NaN |
| 2014-11-07 | 346.77500 | 336.58500 | 332.45666 | 324.71833 | NaN | NaN | NaN | NaN |
| 2014-11-08 | 344.81166 | 346.77500 | 336.58500 | 332.45666 | 324.71833 | NaN | NaN | NaN |
Hyperparameters for the model
See Table 18 in the paper (we're implementing the generic variant, N-BEATS-G); a sketch of the values follows
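The values below follow Table 18 of the paper for the generic configuration, mapped onto our daily-Bitcoin setup (HORIZON and WINDOW_SIZE are our choices, not the paper's):

```python
N_EPOCHS = 5000   # called "Iterations" in Table 18
N_NEURONS = 512   # hidden units per layer ("Width")
N_LAYERS = 4      # fully connected layers per block
N_STACKS = 30     # number of stacks

HORIZON = 1
WINDOW_SIZE = 7
INPUT_SIZE = WINDOW_SIZE * HORIZON   # size of the block input
THETA_SIZE = INPUT_SIZE + HORIZON    # backcast + forecast coefficients
```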
Preparing for residual connections (see the stack diagram in the paper)
The N-BEATS algorithm uses double residual stacking (section 3.2) because the architecture is deep, and deep models suffer from vanishing gradients
See https://paperswithcode.com/method/resnet
Building, Compiling, and Fitting N-BEATS
1. Set up an instance of the N-BEATS block layer (this will be the first block; a for loop will create the rest of the stacks)
2. Create an input layer for the N-BEATS stack (using the Functional API)
3. Make the initial backcast and forecast for the model using the layer from (1.)
4. Use a for loop to create stacks of block layers
5. Use the NBeatsBlock class within the for loop in (4.) to create blocks which return backcasts and block-level forecasts
6. Create the double residual stacking using subtract and add layers
7. Put the model inputs and outputs together with tf.keras.Model()
8. Compile the model with MAE loss (the paper uses multiple losses, but we'll use MAE to stay consistent with our other models) and the Adam optimizer (as in the paper)
9. Fit the model for 5000 epochs with some callbacks:
   - EarlyStopping
   - ReduceLROnPlateau

A sketch of these steps follows after this list.
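Assembling the steps above (a sketch under the hyperparameters defined earlier; the patience values are assumptions):

```python
tf.random.set_seed(42)

# 1. The first N-BEATS block
initial_block = NBeatsBlock(input_size=INPUT_SIZE, theta_size=THETA_SIZE,
                            horizon=HORIZON, n_neurons=N_NEURONS,
                            n_layers=N_LAYERS, name="InitialBlock")

# 2. Input layer (Functional API)
stack_input = tf.keras.layers.Input(shape=(INPUT_SIZE,), name="stack_input")

# 3. Initial backcast and forecast
backcast, forecast = initial_block(stack_input)
residuals = tf.keras.layers.subtract([stack_input, backcast], name="initial_subtract")

# 4./5. Remaining blocks, each operating on the running residuals
for i in range(N_STACKS - 1):
    backcast, block_forecast = NBeatsBlock(
        input_size=INPUT_SIZE, theta_size=THETA_SIZE, horizon=HORIZON,
        n_neurons=N_NEURONS, n_layers=N_LAYERS, name=f"NBeatsBlock_{i}")(residuals)
    # 6. Double residual stacking
    residuals = tf.keras.layers.subtract([residuals, backcast], name=f"subtract_{i}")
    forecast = tf.keras.layers.add([forecast, block_forecast], name=f"add_{i}")

# 7. Build the model
model_7 = tf.keras.Model(inputs=stack_input, outputs=forecast, name="model_7_NBEATS")

# 8. Compile with MAE loss and Adam
model_7.compile(loss="mae", optimizer=tf.keras.optimizers.Adam())

# 9. Fit with EarlyStopping and ReduceLROnPlateau
model_7.fit(train_dataset,
            epochs=N_EPOCHS,
            validation_data=test_dataset,
            verbose=0,
            callbacks=[
                tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=200,
                                                 restore_best_weights=True),
                tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", patience=100,
                                                     verbose=1)])
```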

Model 8: Creating an Ensemble
An ensemble leverages the "wisdom of the crowd" effect
It combines the predictions of many different models of a common goal (see the sketch below)
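One way to build such an ensemble: train the same small Dense architecture several times with different random initializations and different loss functions (the architecture, counts, and names here are all assumptions):

```python
def get_ensemble_models(horizon=HORIZON, train_data=None, test_data=None,
                        num_iter=10, num_epochs=100, loss_fns=("mae", "mse", "mape")):
    """Trains num_iter models per loss function and returns them all in a list."""
    ensemble_models = []
    for _ in range(num_iter):
        for loss_fn in loss_fns:
            model = tf.keras.Sequential([
                layers.Dense(128, kernel_initializer="he_normal", activation="relu"),
                layers.Dense(128, kernel_initializer="he_normal", activation="relu"),
                layers.Dense(horizon)
            ])
            model.compile(loss=loss_fn, optimizer=tf.keras.optimizers.Adam())
            model.fit(train_data, epochs=num_epochs, verbose=0,
                      validation_data=test_data,
                      callbacks=[tf.keras.callbacks.EarlyStopping(
                          monitor="val_loss", patience=200,
                          restore_best_weights=True)])
            ensemble_models.append(model)
    return ensemble_models

ensemble_models = get_ensemble_models(train_data=train_dataset, test_data=test_dataset)
```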
Make predictions with ensemble model
Plotting the prediction intervals
One benefit of using an ensemble is the ability to get prediction intervals
Bootstrap method (see the sketch after this list):
Take the predictions from a number of randomly initialized models (the ensemble models)
Measure the standard deviation of the predictions
Multiply the standard deviation by 1.96 (assuming the distribution is Gaussian/normal, this covers 95% of it)
To get the upper and lower bounds of the prediction interval, add and subtract the value obtained in (3.) to/from the mean/median of the predictions made in (1.)
See: https://eng.uber.com/neural-networks-uncertainty-estimation/
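A sketch of those steps, assuming `ensemble_preds` is an array of shape (num_models, num_timesteps) stacked from the ensemble's predictions:

```python
def get_upper_lower(preds):
    """Returns the lower and upper bounds of a 95% prediction interval."""
    std = tf.math.reduce_std(preds, axis=0)   # spread across ensemble members
    interval = 1.96 * std                     # 95% assuming a Gaussian distribution
    preds_mean = tf.reduce_mean(preds, axis=0)
    return preds_mean - interval, preds_mean + interval

lower, upper = get_upper_lower(ensemble_preds)  # ensemble_preds: stacked model predictions
```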

So far, all of the models' predictions lag behind the test data; essentially they're doing the same thing as the naive model, predicting the previous timestep as the next timestep
Potential Problems:
Overfitting
Model might be copying the naive model
Data isn't predictable (I believe this is the main problem)
Note: the prediction intervals were made assuming the ensemble's predictions are normally distributed
Model 9: Predicting into the future by training on the full data
| Date | Price | block_reward | Price+1 | Price+2 | Price+3 | Price+4 | Price+5 | Price+6 | Price+7 |
|------------|-----------|----|-----------|-----------|-----------|-----------|-----|-----|-----|
| 2014-11-04 | 324.71833 | 25 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-11-05 | 332.45666 | 25 | 324.71833 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-11-06 | 336.58500 | 25 | 332.45666 | 324.71833 | NaN | NaN | NaN | NaN | NaN |
| 2014-11-07 | 346.77500 | 25 | 336.58500 | 332.45666 | 324.71833 | NaN | NaN | NaN | NaN |
| 2014-11-08 | 344.81166 | 25 | 346.77500 | 336.58500 | 332.45666 | 324.71833 | NaN | NaN | NaN |
Make predictions into the future
To make predictions into the future, we want a function which:
1. Takes as input:
   - a list of values (historical Bitcoin prices)
   - a trained model
   - the number of timesteps to predict into the future (INTO_FUTURE)
   - the window size the model was trained on (WINDOW_SIZE)
2. Creates an empty list for future forecasts and extracts the last WINDOW_SIZE values from the input values
3. Loops INTO_FUTURE times, making a prediction on the current WINDOW_SIZE sequence, then updates the sequence by removing its first value and appending the latest prediction

A sketch of such a function follows.
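Implementing the steps above (INTO_FUTURE and the variable names are assumptions; `model_9` would be the model trained on the full data):

```python
import numpy as np

INTO_FUTURE = 14  # how many timesteps to predict beyond the data

def make_future_forecast(values, model, into_future, window_size=WINDOW_SIZE):
    """Makes into_future predictions, feeding each prediction back in as input."""
    future_forecast = []
    last_window = values[-window_size:]  # start from the most recent window

    for _ in range(into_future):
        # Predict on the current window (the model expects a batch dimension)
        future_pred = model.predict(tf.expand_dims(last_window, axis=0))
        future_forecast.append(tf.squeeze(future_pred).numpy())
        # Drop the oldest value and append the newest prediction
        last_window = np.append(last_window, future_pred)[-window_size:]

    return future_forecast

future_forecast = make_future_forecast(prices, model_9, INTO_FUTURE)
```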
Plotting the predictions into the future

Model 10: Turkey data
Showing why forecasting can be BS
A single unpredictable data point can ruin everything

And with just one value changed, our error metrics go through the roof.
To make sure, let's remind ourselves of how model_1 did on the unmodified Bitcoin data (no turkey problem).

Highly unlikely price movements (judged by historical movements), upward or downward, will almost never be part of a forecast.
However, as we've seen, despite their unlikeliness, these events can happen and have huge impacts on the performance of our models.
Comparing the models
| Model | MAE | MSE | RMSE | MAPE | MASE |
|----------------------|--------------|--------------|--------------|------------|-----------|
| naive_model | 924.304138 | 2.055079e+06 | 1433.554810 | 2.604794 | 0.998342 |
| model_1_dense_w7_h1 | 933.083740 | 2.106266e+06 | 1451.298096 | 2.630212 | 1.007824 |
| model_2_dense_w30_h1 | 990.399170 | 2.314996e+06 | 1521.510986 | 2.788899 | 1.059855 |
| model_3_dense_w30_h7 | 2047.885620 | 1.015826e+07 | 2338.837646 | 5.766706 | 2.206674 |
| model_4_CONV1D | 923.650879 | 2.076426e+06 | 1440.981079 | 2.604802 | 0.997636 |
| model_5_LSTM | 998.500732 | 2.332680e+06 | 1527.311401 | 2.825422 | 1.078481 |
| model_6_multivariate | 922.858459 | 2.067919e+06 | 1438.026123 | 2.603089 | 0.996780 |
| model_8_NBEATs | 952.499390 | 2.230254e+06 | 1493.403564 | 2.683706 | 1.028795 |
| model_9_ensemble | 922.459595 | 2.083583e+06 | 1443.462036 | 2.603472 | 0.996349 |
| model_10_turkey | 20278.744141 | 6.335656e+08 | 24717.240234 | 108.450264 | 19.489779 |
