Deduping to newest items in a time bucket for time series - KQL

Some data gets logged roughly every 10 hours for machines, each identified by an ID.
I want to draw a time plot with one point for every 24 hours.
How do I get the newest data point per ID within each 24-hour bucket?
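
A minimal KQL sketch of one way to do this, assuming a table named Logs with columns MachineId and Timestamp (all names hypothetical): bin() assigns each row to a 24-hour bucket, and arg_max() keeps the newest row per machine in each bucket.

    Logs
    // newest row per machine per 24-hour bucket
    | summarize arg_max(Timestamp, *) by MachineId, Day = bin(Timestamp, 24h)

Because arg_max(Timestamp, *) carries the whole winning row through, every original column is still available for the time plot.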

Related

Select Data between current time - 15 mins and current time in SQL

I am looking to pull data between two points in time that are only 15 to 30 minutes apart. I want to be able to rerun the code multiple times to keep updating the data I have already pulled. I know there is a function for the current system time, but I am unable to use it effectively in SQL Developer.
I have tried using the CURRENT_TIMESTAMP function but could not get it to work effectively.
Currently I am using the following code and just pulling over a broad time frame, but I would like to shrink that down to 15-to-30-minute intervals that could be used to keep pulling updated data.
I expect to be able to pull current data within 15-to-30-minute segments of time.
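
A minimal Oracle SQL sketch, with the table and column names (readings, reading_time) being hypothetical: SYSTIMESTAMP is evaluated at execution time, so rerunning the statement always returns the latest window.

    -- Hypothetical table/column names; SYSTIMESTAMP is Oracle's current system time.
    SELECT *
      FROM readings
     WHERE reading_time >= SYSTIMESTAMP - INTERVAL '15' MINUTE
       AND reading_time <  SYSTIMESTAMP;

Swapping '15' for '30' widens the window to the 30-minute case.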

Finding the number of users in each 15-minute interval continuously over 6 years in Hive

I have more than 100k users. Each user starts using their phone at a different timestamp and stops at a different timestamp.
I want to know how many users are using their phone during each 15-minute interval.
Attached are the sample input and the required output format.
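
One common Hive approach, sketched below against a hypothetical schema sessions(user_id, start_ts, end_ts) holding epoch seconds: expand each session into every 15-minute (900-second) bucket it overlaps via posexplode, then count distinct users per bucket.

    -- Expand each session into the 15-minute buckets it spans,
    -- then count distinct users per bucket.
    SELECT bucket_start,
           COUNT(DISTINCT user_id) AS active_users
    FROM (
      SELECT s.user_id,
             (FLOOR(s.start_ts / 900) + t.i) * 900 AS bucket_start
      FROM sessions s
      LATERAL VIEW posexplode(
        split(space(CAST(FLOOR(s.end_ts / 900) - FLOOR(s.start_ts / 900) AS INT)), ' ')
      ) t AS i, x
    ) expanded
    GROUP BY bucket_start;

Note that long sessions fan out into many rows; if that blows up over six years of data, the usual alternative is to emit a +1 event at each session start and a -1 at each end and take a running sum over bucket boundaries.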

Understanding Fabric Daily Summary Email

We're having trouble understanding the numbers in the daily summary emails we get from Fabric.
A search on SO only shows one somewhat-related question/answer.
Here are the emails from 2 consecutive days:
Our questions are:
Does “Monthly Active” mean over the last 30 days? If so, how can there be a 36% drop in 1 day if the counts went from 101 to 93 (an 8% drop)?
Why does “Daily Active” show a 75% drop if the current day is 1 and the previous day was 0?
Why does “Total Sessions” show a 94% drop if the current day is 1 and the previous day was 0?
Does the “Time in App per User” mean the average for the month or for the prior day? If it's for the month, why would 1 extra session cause the value to change so much? If it's for the day, why does it show “11:33m” even though the Total Sessions was 0?
Sometimes the “Time in App per User” ends in an “m” and sometimes it ends in an “s”. For example, “11:33m” and “0:44s”. Does that mean that “11:33m” is “11 hours and 33 minutes” and “0:44s” is “0 minutes and 44 seconds”? Or does the “11:33m” still mean “11 minutes and 33 seconds” and I should ignore the suffix?
Thanks for reaching out. Todd from Fabric here. The % change is actually the % difference versus what we expected based on your app's previous behavior; this compensates for day-of-week effects and the like.
The long session time alongside zero sessions suggests that the session was still live, and therefore not yet reported to us, at UTC midnight. The session gets created at session start, and the duration gets set at the end.
Thanks!

Has Google just changed their historical stock price interface (again)?

For years I've been using webpage requests like the following to retrieve 20 days at a time of minutewise stock data from Google:
http://www.google.com/finance/getprices?q=.INX&i=60&p=20d&f=d,c,h,l,o,v
That is: retrieve 60-second interval data for .INX (the S&P 500 index) for the last 20 days, with the fields Datetime (in Unix format), Close, High, Low, Open, and Volume.
The Datetime is in Unix format (seconds since 1/1/1970, prefixed with an "A") for the first entry of each day; subsequent entries give the number of intervals that have passed since it (so 1 = 60 seconds after that day's market open).
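For illustration (values hypothetical): if a day's first row begins with A1505741400, that is the Unix time of that day's market open, and a later row whose first field is 5 stands for 1505741400 + 5 × 60 = 1505741700, i.e. five minutes into the session.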
That worked up until 9/10/2017, but today (9/17) it only returns day-end data (it even reports the "interval" between samples as 86400). Pooey! I can get that anywhere, in bulk.
But if I ask for fewer days, or broader intervals, it seems to return data - but weird data. Asking for data every 120 seconds returns exactly that - but only for every other market day. Weird!
Has anyone got a clue what might have happened?
Whoa! I think I figured it out.
Google still returns minutewise data under roughly the same limits (up to about 20 days), but instead of d=10 returning all the market data for the last 10 calendar days, it now returns the data for the last 10 market days. Previously, to get the last 10 market days you would ask for d=14 (M-F twice, plus two weekends). Now Google interprets the d variable as market days, and asking for d=20 exceeds the limits on what they will deliver.
It now appears that d=15 is the limit (three weeks of market days). No clue on why I got the very weird every-other-day data for a while... but maybe if you exceed their d-limits the intervals get screwy. Dunno. Don't care. Easy fix.

Schedule algorithm for nightly SQL extract of data

I am looking for an algorithm to extract data from one system into another, but on a sliding scale. Here are the details:
Every two weeks, 80 weeks of data needs to be extracted.
Extracts take a long time and are resource-intensive, so we would like to distribute the load of the extract over time.
The first 8-12 weeks are the most important and need to be updated more often over the two-week window. Data further out can be updated less frequently, to the point where the last 40+ weeks could even be extracted just once every two weeks.
Every two weeks, the start date shifts two weeks ahead and so two new weeks are extracted.
The extract procedure takes a start and end date (it already exists and should be treated as a black box). The procedure can be run for multiple date spans in a day if required, but contiguous dates are faster than multiple blocks of dates.
Extract blocks should be no smaller than 2 weeks and probably no greater than 16 weeks. Longer blocks are possible, but at 16 weeks they are already a significant load on the system.
4 contiguous weeks of data take approximately 1 hour to extract. It takes that long because the data needs to be generated/calculated.
Data that is newly extracted replaces the old data for the timespan. No need to merge or diff the data, it is just replaced.
This algorithm needs to be built into a SQL job which will handle the daily process (triggered once a day only).
My initial thought was essentially a sliding schedule: rotate the first 4-week block every second day, the second 4-week block every 3 to 4 days, and extract the rest of the data in smaller blocks spread over the two-week period.
What I am going to do will work, but I wanted to spend some time seeing whether there is a better way to approach the problem. I am mainly looking for an algorithm to produce the start/end date schedule for the daily extract.
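
A hypothetical T-SQL sketch of one way to generate such a schedule, with every name and the exact cadence being assumptions rather than a recommendation: a static plan maps each day of the 14-day cycle to the week range(s) to refresh, and the daily job turns those into concrete start/end dates for the extract procedure.

    -- Derive today's extract window(s) from a fixed 14-day cycle.
    DECLARE @anchor   date = '2024-01-01';                        -- assumed start of the first cycle
    DECLARE @today    date = CAST(GETDATE() AS date);
    DECLARE @cycleDay int  = DATEDIFF(day, @anchor, @today) % 14; -- 0..13, position in the cycle
    DECLARE @winStart date = DATEADD(day, -@cycleDay, @today);    -- start of the current 80-week window

    -- Static plan: hot weeks every other day, warm weeks several times per
    -- cycle, 12-week tail blocks once per cycle (two light days carry a
    -- second span, which the question allows).
    DECLARE @plan TABLE (cycle_day int, from_week int, to_week int);
    INSERT INTO @plan (cycle_day, from_week, to_week) VALUES
      (0,1,4),(2,1,4),(4,1,4),(6,1,4),(8,1,4),(10,1,4),(12,1,4),  -- weeks 1-4
      (1,5,8),(5,5,8),(9,5,8),                                    -- weeks 5-8
      (3,9,20),(7,21,32),(11,33,44),(13,45,56),                   -- tail, once per cycle
      (1,57,68),(5,69,80);                                        -- second span on light days

    SELECT DATEADD(week, from_week - 1, @winStart)             AS extract_start,
           DATEADD(day, -1, DATEADD(week, to_week, @winStart)) AS extract_end  -- inclusive end
    FROM @plan
    WHERE cycle_day = @cycleDay
    ORDER BY from_week;
    -- Each returned row is one call to the existing extract procedure.

Keeping the plan in a real table rather than a table variable would let you tune the cadence without redeploying the job.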