Get Bitcoin historical data in the interval of minutes - bitcoin

Can anybody tell me where I can download Bitcoin historical data? I'm able to find the daily price, but that is not enough for my project. It would be better if I could get the price data at minute intervals.

Here is a tool that imports Bitcoin blockchain data into a graph database:
https://bitcointalk.org/index.php?topic=252033.0

You can get thousands of historical datasets from https://spreadstreet.io.
All datasets are available as tab-delimited CSV files. The datasets are a mix of raw tick data, OHLCV, spreads, mining and economic statistics. They can be used for backtesting, analysis, and charting.
Here is an example dataset. This is from Poloniex, and is the 30 min timeframe. We have lower time frames, but the files are larger than I wanted to link in this question.
Disclaimer: I am the founder of Spreadstreet.io

Related

NBeats does not work for long-term daily time series data. How can I process my data to correct this?

I am following the code as per this notebook: https://www.kaggle.com/code/stpeteishii/google-stock-prediction-darts/notebook
It's a really simple implementation of the N-BEATS model, and it takes into account 400 days of the Google stock price. However, when I increase the dataset to take into account all 4000 days, the N-BEATS forecast is completely thrown off and gives a completely incorrect forecast with a MAPE of over 100.
I understand that one possibility is the seasonal nature of the stock price. I tried the Dickey-Fuller test (not using the Darts library) and it is indeed seasonal data. However, when I try to inspect seasonality with the Darts library as follows:
from darts.utils.statistics import plot_acf, check_seasonality
plot_acf(train, m=365, alpha=0.05)
I get a max lag error.
Can someone please help me with how to forecast stock market data from a 10-year daily time series using the darts library? Also, I'd be extremely grateful if someone could give me some additional tips when it comes to forecasting stock data with darts, since there isn't much info or many tutorials available online.
I've gone through the entire docs and I've gotten a good hang of working with darts using its open-source example datasets, but none of the examples seem to work with stock market data.
Any help would be greatly appreciated.
Thanks!
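A minimal sketch of where that max lag error usually comes from: in darts, plot_acf and check_seasonality default to a small max_lag (24 at the time of writing), so passing m=365 exceeds the allowed lag range. Raising max_lag above the candidate period typically avoids the error; the value 400 below is just an illustrative choice, train is the series from the snippet above, and it must contain more than max_lag points.

from darts.utils.statistics import plot_acf, check_seasonality

# compute the ACF far enough out that a yearly lag (365) is actually covered
plot_acf(train, m=365, max_lag=400, alpha=0.05)

# check_seasonality takes the same max_lag argument and returns (is_seasonal, period)
is_seasonal, period = check_seasonality(train, m=365, max_lag=400, alpha=0.05)
print(is_seasonal, period)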

Most flexible way to store personal financial historical transaction/trade data

I'm not talking about time-series Open High Low Close data, but rather user trade actions that result in a transaction for accounting purposes. I am not proficient at databases and typically avoid using them, not because of difficulty but mostly due to efficiency. From my benchmarks on time-series data I found CSV files to be performant/reliable compared to various time-series storage formats like HDF, LMDB, etc. I know I didn't try all possible databases out there, for example LevelDB, SQL, etc., but like I said I am not proficient in databases and avoid the unnecessary overhead if I can help it.
Is it possible to enumerate the different ways accounting/financial data can be stored such that it's possible to easily insert, delete, and update the data? Once the data has been aggregated it doesn't need to be touched, just traversed in order from past to future for accounting purposes. What methods are out there? Are databases my only option? CSV files would be a bit tougher here, since time-series data implies a regular period, whereas transaction data is not predictable and happens as user events. A single CSV file with all financial history is possible, except it's expensive to update/insert/delete. I could recreate the CSV file each time I want to update/insert/delete; that is possible as well, but maybe not what I am looking for. Another idea is to do monthly or daily statements stored as CSV. I want to be able to compare my stored data against remote resources for integrity and update my stored financial data if there are any inaccuracies.
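If a full database server feels like overkill, one middle ground between CSV files and a server-based database is SQLite, which ships with Python and keeps everything in a single local file while still giving cheap insert/update/delete. A minimal sketch, with a made-up file name, table layout and example row purely for illustration:

import sqlite3

conn = sqlite3.connect("ledger.db")  # hypothetical single-file ledger
conn.execute("""
    CREATE TABLE IF NOT EXISTS transactions (
        id       INTEGER PRIMARY KEY,
        ts       TEXT NOT NULL,   -- ISO-8601 timestamp of the user event
        symbol   TEXT NOT NULL,
        side     TEXT NOT NULL,   -- e.g. BUY / SELL / DEPOSIT / WITHDRAW
        quantity REAL NOT NULL,
        price    REAL NOT NULL
    )
""")
conn.execute(
    "INSERT INTO transactions (ts, symbol, side, quantity, price) VALUES (?, ?, ?, ?, ?)",
    ("2024-03-01T14:32:10", "AAPL", "BUY", 10, 171.25),
)
conn.commit()

# traverse in time order for accounting; rows can also be updated or deleted in
# place if a comparison against a remote statement reveals a discrepancy
for row in conn.execute("SELECT * FROM transactions ORDER BY ts"):
    print(row)

The whole ledger can still be dumped to CSV at any time for comparison against remote resources, without paying the rewrite cost on every single change.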

How does Big Query store millions of rows of a column with categorical (duplicate) string values?

We are streaming around a million records per day into BQ and a particular string column has categorical values of "High", "Medium" and "Low".
I am trying to understand whether BigQuery does storage optimisations other than compression at its own end, and what the scale of that is. I looked for documentation on this and was unable to find an explanation.
For example, if I have:
**Col1**
High
High
Medium
Low
High
Low
**... 100 Million Rows**
Would BQ store it internally as follows?
**Col1**
1
1
2
3
1
3
**... 100 Million Rows**
Summary of noteworthy (and correct!) answers:
As Elliott pointed out in the comments, you can read details on BigQuery's data compression here.
As Felipe notes, there is no need to consider these details as a user of BigQuery. All such optimizations are done behind the scenes, and are being improved continuously as BigQuery evolves without any action on your part.
As Mikhail notes in the comments, you are billed by the logical data size, regardless of any optimizations applied at the storage layer.
BigQuery constantly improves the underlying storage - and this all happens without any user interaction.
To see the original ideas behind BigQuery's columnar storage, read the Dremel paper:
https://ai.google/research/pubs/pub36632
To see the most recent published improvements in storage, see Capacitor:
https://cloud.google.com/blog/big-data/2016/04/inside-capacitor-bigquerys-next-generation-columnar-storage-format
BigQuery relies on Colossus, Google’s latest generation distributed file system. Each Google datacenter has its own Colossus cluster, and each Colossus cluster has enough disks to give every BigQuery user thousands of dedicated disks at a time.
You may gather more detail from the "BigQuery under the hood" page.
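Purely for intuition (this is not BigQuery's or Capacitor's actual internal format), a pandas Categorical column illustrates the general dictionary-encoding idea the question asks about: each distinct string is stored once in a dictionary, and each row only holds a small integer code.

import pandas as pd

col1 = pd.Series(["High", "High", "Medium", "Low", "High", "Low"], dtype="category")

print(col1.cat.categories.tolist())  # ['High', 'Low', 'Medium']  (the dictionary)
print(col1.cat.codes.tolist())       # [0, 0, 2, 1, 0, 1]         (per-row integer codes)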

What is the best way to store market data for Algorithmic Trading setup?

I am making an Algorithmic Trading setup for trades automation. Currently, I have a broker API which helps me get historical data for all stocks that I'm interested in.
I am wondering how to store all the data, whether in a file system or a database (SQL-based or NoSQL). Data comes through a REST API, if that's relevant.
My use case here would be to query historical data to make trading decisions in the live market. I would also have to develop a backtesting framework that will query historical data to check the performance of a strategy historically.
I am looking at a frequency of 5 mins - 1 hr candles and mostly Intraday trading strategies. Thanks
As you say, there are many options and as STLDeveloper says this is kind of off topic since it is opinion based... anyway...
A simple strategy which I used in my own Python back-testing engine is to use Python pandas DataFrame objects, and save/load them to disk in an HDF5 file using to_hdf() and read_hdf(). The primary advantage (for me) of HDF5 is that it loads/saves far more quickly than CSV.
Using the above approach I easily manage several years of 1 minute data for back testing purposes, and data access certainly is not my performance bottleneck.
You will need to determine for yourself if your chosen data management approach is fast enough for live trading, but in general I think if your strategy is based on 5-min candles then any reasonable database approach is going to be sufficiently performant for your purposes.
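A minimal sketch of that approach, assuming 5-minute OHLCV candles held in a pandas DataFrame; the file name, HDF key and toy values are made up, and pandas' HDF5 support requires the optional PyTables package.

import numpy as np
import pandas as pd

# toy 5-minute candles; in practice these would come from the broker's REST API
idx = pd.date_range("2024-01-02 09:15", periods=5, freq="5min")
candles = pd.DataFrame(
    {
        "open": np.random.rand(5),
        "high": np.random.rand(5),
        "low": np.random.rand(5),
        "close": np.random.rand(5),
        "volume": np.random.randint(100, 1000, 5),
    },
    index=idx,
)

# persist to HDF5 and load it back for backtesting or live decisions
candles.to_hdf("candles.h5", key="AAPL_5min", mode="w")
restored = pd.read_hdf("candles.h5", "AAPL_5min")
print(restored.tail())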

RRDtool what use are multiple RRAs?

I'm trying to implement rrdtool. I've read the various tutorials and got my first database up and running. However, there is something that I don't understand.
What eludes me is why so many of the examples I come across instruct me to create multiple RRAs?
Allow me to explain: Let's say I have a sensor that I wish to monitor. I will want to ultimately see graphs of the sensor data on an hourly, daily, weekly and monthly basis and one that spans (I'm still on the fence on this one) about 1.5 yrs (for visualising seasonal influences).
Now, why would I want to create an RRA for each of these views? Why not just create a database like this (stepsize=300 seconds):
rrdtool create sensor.rrd --step 300 \
    DS:sensor:GAUGE:600:U:U \
    RRA:AVERAGE:0.5:1:160000
If I understand correctly, I can then create any graph I desire, for any given period with whatever resolution I need.
What would be the use of all the other RRAs people tell me I need to define?
BTW: I can imagine that in the past this would have been helpful when computing power was more rare. Nowadays, with fast disks, high-speed interfaces and powerful CPUs I guess you don't need the kind of pre-processing that RRAs seem to be designed for.
EDIT:
I'm aware of this page. Although it explains consolidation very clearly, my understanding is that rrdtool graph can do this consolidation as well at the moment the data is graphed. There still appears (to me) to be no added value in "harvest-time consolidation".
Each RRA is a pre-consolidated set of data points at a specific resolution. This performs two important functions.
Firstly, it saves on disk space. So, if you are interested in high-detail graphs for the last 24h, but only low-detail graphs for the last year, then you do not need to keep the high-detail data for a whole year -- consolidated data will be sufficient. In this way, you can minimise the amount of storage required to hold the data for graph generation (although of course you lose the detail, so you can't access it if you should want to). Yes, disk is cheap, but if you have a lot of metrics and are keeping high-resolution data for a long time, this can be a surprisingly large amount of space (in our case, it would be in the hundreds of GB).
Secondly, it means that the consolidation work is moved from graphing time to update time. RRDtool generates graphs very quickly, because most of the calculation work is already done in the RRAs at update time, if there is an RRA of the required configuration. If there is no RRA available at the correct resolution, then RRDtool will perform the consolidation on the fly from a higher-granularity RRA, but this takes time and CPU. RRDtool graphs are usually generated on the fly by CGI scripts, so this is important, particularly if you expect to have a large number of queries coming in. In your example, using a single 5-min RRA to make a 1.5-yr graph (where 1 pixel would be about 1 day) you would need to read and process 288 times more data to generate the graph than if you had a 1-day granularity RRA available!
In short, yes, you could have a single RRA and let the graphing work harder. If your particular implementation needs faster updates and doesn't care about slower graph generation, and you need to keep the detailed data for the entire time, then maybe this is a solution for you, and RRDtool can be used in this way. However, usually, people will optimise for graph generation and disk space, meaning using tiered sets of RRAs with decreasing granularity.
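As a rough sketch of such a tiered setup, sized against the question's 300-second step (the file name is hypothetical and the row counts are just one possible choice): 5-minute detail for about a week, hourly averages for about two months, and daily averages for roughly 1.5 years.

rrdtool create sensor.rrd --step 300 \
    DS:sensor:GAUGE:600:U:U \
    RRA:AVERAGE:0.5:1:2016 \
    RRA:AVERAGE:0.5:12:1488 \
    RRA:AVERAGE:0.5:288:550

The 1.5-year graph can then be drawn from the 1-day RRA instead of crunching the full 5-minute history on every request.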