I have run some queries against BigQuery public data sets for weather. I have used both GSOD (bigquery-public-data.noaa_gsod.gsod2019)and GHCN (bigquery-public-data.ghcn_d). In both cases, the most recent data I get is from 4 or 5 days ago. Why is that? What can I do to get more recent historic data e.g. this morning or yesterday by lat and long.
Those historical datasets are provided by NOAA a few days delayed. For real-time weather, you have to use one of the commercial weather data providers in the Marketplace.
Related
I have an issue with setting up Appsflyer Cost ETL with Google BigQuery. We get parquet files each day.
The issue is the following - each day you get the file with 10 dates.
enter image description here
The problem is that each day you have 6 dates that shoud rewrite yesterday file. And the task is how to set a data transfers or scheduled queries to override the data for each date that you have in newer file to make the data for long period in one table.
I am trying to create something similar to what the weather apps have which show a graph of the temperatures from today.
I am currently using the onecall endpoint which returns the current weather along with forecasts for the next 48 hours but this doesn't cover my case of showing earlier data from the day.
Am I miss using this endpoint or is there a way to show the data for the current day (both historical from earlier and forecast for later)? Or do I need to use the historical data endpoints?
I'm trying to store daily snapshot of Employee data for FTE analysis project - analytics on how many FTE in various positions any given day.
I can call a REST API, which will give me data for all the active and terminated employees as of API call time. Is it prudent to call this API every single day and store daily snapshot or store only records which changed from previous version. What is the common design principle for this use case. Thanks!
I ended up using Delta tables on Databricks.
Sample code to create the table and load data
#read data into a dataframe
df = spark.read.format("json").option("multiline", "true").load("/mnt/datalake/path/to/file/employee.json")
#load data into a managed delta table
df.write.format("Delta").mode("overwrite").option("mergeSchema", "true").saveAsTable("raw.DimEmployee")
Now you can query data of any given day using below query
select count(*) from raw.DimEmployee timestamp as of '2022-09-14'
I'm having some trouble with a Coinbase.com API call for historical data.
Previously, I was getting a variable length of days that would match the amount of space available on a terminal screen with a request URL that looked like this:
https://api.coinbase.com/v2/prices/historic?currency=USD&days=76
This would pull the previous 76 days of price history. An example of the old output is here:
https://gist.github.com/KenDB3/f071a06ab3ef1a899d3cd8df8b40a049#file-coinbase-historic-days-example-2017-12-23-json
This stopped working a few days ago. The closest I can get to this is with this request URL (though I don't get the data I want):
https://api.coinbase.com/v2/prices/BTC-USD/historic?days=76
The output from this can be seen here:
https://gist.github.com/KenDB3/f071a06ab3ef1a899d3cd8df8b40a049#file-coinbase-historic-days-example-2018-07-19-json
In the second example, it is just displaying prices from the day of the query at different times of that day. What I really want is the first example output where it gives a single price per day going back as many days as the request is for.
The project this is connected to is here:
https://github.com/KenDB3/SyncBTC
Links that do not work:
https://api.coinbase.com/v2/prices/historic?currency=BTC-USD&days=76
(No Results)
https://api.coinbase.com/v2/prices/BTC-USD/historic?2018-07-15T00:00:00-04:00
(Does not pull data from 7/15/2018)
Any reason you aren't using coinbase pro?
The new api is very easy to use. Simply add the get command you want followed by the parameters separated with a question mark. Here is the new historic rates api documentation:
https://docs.cloud.coinbase.com/exchange/reference/exchangerestapi_getproductcandles
The get command with the new api most similar to prices is "candles". It requires three parameters to be identified, start and stop time in iso format and granularity which is in seconds. Here is an example:
https://api.pro.coinbase.com/products/BTC-USD/candles?start=2018-07-10T12:00:00&end=2018-07-15T12:00:00&granularity=900
EDIT: also, note the time zone is not for your time zone, I believe its GMT.
Here is a wrapper for the CoinBase API for the export of Historical Data: https://pypi.org/project/Historic-Crypto/
It should provide the required outcome through invoking:
pip install Historic-Crypto
from Historic_Crypto import HistoricalData
new = HistoricalData('ETH-USD',300,'2020-06-01-00-00').retrieve_data()
for a full list of cryptocurrencies available:
pip install Historic-Crypto
from Historic_Crypto import Cryptocurrencies
data = Cryptocurrencies(extended_output=False).find_crypto_pairs()
I work with Vue.js and chartjs , and I have several water consumption data at different times. I would like to represent this data as a stack group per day. But all the examples I found are made with at least two different categories of data. I would like a bar to be the concatenation of all daily consumption. Here is an example of representation that I would like to get with my data. I need help