For years I've been using webpage requests like the following to retrieve 20 days at a time of minutewise stock data from Google:
http://www.google.com/finance/getprices?q=.INX&i=60&p=20d&f=d,c,h,l,o,v
That is: for .INX (the S&P 500 index), retrieve 60-second interval data for the last 20 days, with the fields Datetime (in Unix format), Close, High, Low, Open, Volume.
The Datetime is in Unix format (seconds since 1/1/1970, prefixed with an "A") for the first entry of each day, and subsequent entries show the intervals that have passed (so 1 = 60 seconds after the opening of the market that day).
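That datetime scheme is easy to decode: a minimal Python sketch, where the sample rows are made up for illustration (real responses also carry header lines such as EXCHANGE and INTERVAL before the data):

```python
INTERVAL = 60  # seconds, matching i=60 in the request

def decode_rows(rows, interval=INTERVAL):
    """Turn the 'A<unix>'/offset datetime column into absolute Unix times."""
    base = None
    out = []
    for row in rows:
        stamp = row.split(",")[0]
        if stamp.startswith("A"):   # first entry of a trading day: absolute time
            base = int(stamp[1:])
            out.append(base)
        else:                       # subsequent entries: intervals past the base
            out.append(base + int(stamp) * interval)
    return out

# Hypothetical sample rows in the d,c,h,l,o,v layout described above
rows = [
    "A1505136600,2488.11,2488.60,2487.90,2488.20,100000",
    "1,2488.30,2488.70,2488.00,2488.15,90000",
    "2,2488.05,2488.40,2487.80,2488.30,85000",
]
print(decode_rows(rows))
```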
That worked up until 9/10/2017, but today (9/17) it only returns day-end data (it even reports the "interval" between samples as 86400). Pooey! I can get that anywhere, in bulk.
But if I ask for fewer days, or broader intervals, it seems to return data - but weird data. Asking for data every 120 seconds returns exactly that - but only for every other market day. Weird!
Has anyone got a clue what might have happened?
Whoa! I think I figured it out.
Google still returns minutewise data under roughly the same limitations (up to 20 days), but instead of d=10 returning all the market data for the last 10 calendar days, it returns the data for the last 10 market days. Previously, to get the last 10 market days you would ask for d=14 (M-F x 2, plus two weekends). Now Google interprets the d variable as market days, and asking for d=20 exceeds the limit on what they will deliver.
It now appears that d=15 is the limit (three weeks of market days). No clue on why I got the very weird every-other-day data for a while... but maybe if you exceed their d-limits the intervals get screwy. Dunno. Don't care. Easy fix.
I’m a data analyst in the insurance industry, and we currently have a program in SAS EG that tracks catastrophe development week by week from the start of the event, for all catastrophic events that are reported (i.e., week 1 is the catastrophe start date + 7 days, week 2 is the end of week 1 + 7 days, and so on). All transaction amounts (dollars) for a given catastrophe are then grouped into the respective weeks based on the date each transaction was made.
The problem we’re faced with is that we are moving away from SAS EG to GCP BigQuery, and the current process of calculating those weeks relies on a manually read-in list, which isn’t very efficient and doesn’t translate easily to BigQuery.
Curious if anybody has an idea that would allow me to calculate each week number in periods of 7 days since the start of an event in SQL, or has an idea specific to BigQuery? Each event would have a different start date.
It is complex, I know, and I’m willing to give more explanation as needed. Open to any ideas for this, as I haven’t been able to find anything.
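The week bucketing described above reduces to integer date arithmetic: days 0-6 after the event start fall in week 1, days 7-13 in week 2, and so on. Here is a minimal Python sketch of that logic (the event date and transactions are hypothetical); in BigQuery standard SQL the equivalent expression would be something like `DIV(DATE_DIFF(txn_date, event_start, DAY), 7) + 1`:

```python
from datetime import date

def cat_week(txn_date: date, event_start: date) -> int:
    """Week bucket since event start: days 0-6 -> 1, days 7-13 -> 2, ..."""
    return (txn_date - event_start).days // 7 + 1

# Hypothetical event start and transaction dates for illustration
start = date(2023, 6, 1)
print(cat_week(date(2023, 6, 1), start))   # day 0
print(cat_week(date(2023, 6, 8), start))   # day 7
print(cat_week(date(2023, 6, 30), start))  # day 29
```

Because the start date is a column on each event, the same expression works per event with no lookup list: join transactions to their event and compute the bucket inline.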
I am looking to pull data between two time points only 15 to 30 minutes apart, and I want to be able to rerun the code repeatedly to keep updating the data I have already pulled. I know there is a function for the current system time, but I am unable to use it effectively in SQL Developer.
I have tried using the function CURRENT_TIMESTAMP but could not get it to work effectively.
Currently I am using the following code and just pulling over a broad time frame, but I would like to shrink that down to 15-to-30-minute intervals that could be used to keep pulling updated data.
I expect to be able to pull current data within 15 to 30 minute segments of time.
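One way to make reruns repeatable is to floor the current time to the last whole 15- or 30-minute boundary and pull that fixed window, rather than filtering on the moving current timestamp directly. A Python sketch of the window arithmetic (in Oracle, the corresponding filter would use bind variables, e.g. `WHERE event_ts >= :start_ts AND event_ts < :end_ts`, or an interval expression such as `SYSTIMESTAMP - INTERVAL '30' MINUTE`; the column name here is hypothetical):

```python
from datetime import datetime, timedelta

def current_window(now: datetime, minutes: int = 30) -> tuple:
    """Return (start, end) of the last completed whole window before 'now'.

    Flooring to a fixed boundary means reruns within the same window pull
    identical ranges, so already-pulled data lines up cleanly.
    """
    span = timedelta(minutes=minutes)
    floored = datetime.min + ((now - datetime.min) // span) * span
    return floored - span, floored

lo, hi = current_window(datetime(2024, 5, 1, 10, 47), 30)
print(lo, hi)  # the 10:00-10:30 window
```

Each rerun then binds `lo` and `hi` into the query, and moving to the next window is just calling the function again later.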
We're having trouble understanding the numbers in the daily summary emails we get from Fabric.
A search on SO only shows one somewhat-related question/answer.
Here are the emails from 2 consecutive days:
Our questions are:
Does “Monthly Active” mean over the last 30 days? If so, how can there be a 36% drop in 1 day if the counts went from 101 to 93 (an 8% drop)?
Why does “Daily Active” show a 75% drop if the current day is 1 and the previous day was 0?
Why does “Total Sessions” show a 94% drop if the current day is 1 and the previous day was 0?
Does the “Time in App per User” mean the average for the month or for the prior day? If it's for the month, why would 1 extra session cause the value to change so much? If it's for the day, why does it show “11:33m” even though the Total Sessions was 0?
Sometimes the “Time in App per User” ends in an “m” and sometimes it ends in an “s”. For example, “11:33m” and “0:44s”. Does that mean that “11:33m” is “11 hours and 33 minutes” and “0:44s” is “0 minutes and 44 seconds”? Or does the “11:33m” still mean “11 minutes and 33 seconds” and I should ignore the suffix?
Thanks for reaching out. Todd from Fabric here. The % change is actually the % difference vs. what we expected based on the previous behavior of your app; this compensates for day-of-week effects, etc.
The long session time despite zero reported sessions suggests the session was still live (and therefore not yet reported to us) at UTC midnight. The session record is created at session start, but its duration is only set at the end.
Thanks!
I used to have a number of queries running on the past 40 days of data using a decorator with [dataset.table#-4123456789-].
However, since September 15 all the decorators return maximum 10 days of data.
By the way, [dataset.table#0] returns the whole table, not the past 7 days as stated in the documentation.
Does anyone know what is going on? Do I have to move to a partitioned table in order to query a limited period of more than a week?
Thanks
I am looking for an algorithm to extract data from one system into another, but on a sliding scale. Here are the details:
Every two weeks, 80 weeks of data needs to be extracted.
Extracts take a long time and are resource intensive so we would like to distribute the load of the extract over time.
The first 8-12 weeks are the most important and need to be updated more often over the two-week window. Data further out can be updated less frequently, to the point where the last 40+ weeks could even be extracted just once every two weeks.
Every two weeks, the start date shifts two weeks ahead and so two new weeks are extracted.
Extract procedure takes a start and end date (this is already made and should be treated like a black box). The procedure could be run for multiple date spans in a day if required but contiguous dates are faster than multiple blocks of dates.
Extract blocks should be no smaller than 2 weeks and probably no greater than 16 weeks. Longer blocks are possible, but at 16 weeks they are already a significant load on the system.
4 contiguous weeks of data takes approximately 1 hour. It takes a long time because the data needs to be generated/calculated.
Data that is newly extracted replaces the old data for the timespan. No need to merge or diff the data, it is just replaced.
This algorithm needs to be built into a SQL job which will handle the daily process (triggered once a day only).
My initial thought was to create a sliding schedule: rotate the first 4-week block every second day, the second 4-week block every 3 to 4 days, and extract the rest of the data in smaller chunks over the two-week period.
What I am going to do will work, but I wanted to spend some time seeing if there might be a better way to approach the problem. Mainly I'm looking for an algorithm to produce the start/end date schedule for the daily extract.
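That sliding schedule can be written as a small tiered-refresh function: each tier of weeks gets a refresh period, and the day within the two-week cycle decides which blocks to extract. A rough Python sketch (week 0 is the newest week; the tier boundaries and periods are assumptions to be tuned, and day 0 is deliberately front-loaded, which could be staggered further):

```python
# Tiers: (start_week, end_week_exclusive, refresh_every_n_days)
TIERS = [
    (0, 4, 2),     # newest 4 weeks: refresh every 2nd day
    (4, 8, 4),     # weeks 4-8: refresh every 4th day
    (8, 16, 7),    # weeks 8-16: refresh weekly
    (16, 80, 14),  # weeks 16-80: once per 14-day cycle, in 16-week chunks
]

def merge(blocks):
    """Merge adjacent blocks, since contiguous extracts run faster."""
    merged = []
    for s, e in sorted(blocks):
        if merged and merged[-1][1] == s:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged

def blocks_for_day(day):
    """Return (start_week, end_week) extract blocks for day 0..13 of the cycle."""
    out = []
    for start, end, period in TIERS:
        if period < 14:
            if day % period == 0:
                out.append((start, end))
        else:
            # Spread the long tail over the cycle, one 16-week chunk per day
            chunks = [(s, min(s + 16, end)) for s in range(start, end, 16)]
            if day < len(chunks):
                out.append(chunks[day])
    return merge(out)

for d in range(4):
    print(d, blocks_for_day(d))
```

The daily SQL job would compute its day-of-cycle, call the equivalent of `blocks_for_day`, and convert each week block into the start/end dates the black-box extract procedure expects. Because each tier's period divides the 14-day cycle, every week is guaranteed at least one refresh per cycle.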