BigQuery SQL - Group into every n numbers

I have a table that includes a column called minutes_since. It is an integer containing the number of minutes since a pre-defined event. Multiple rows may fall within the same minute.
I want to group and aggregate the rows into intervals of n minutes. For example, I want to get the average of another column for all rows occurring within each 5-minute interval.
How can this be achieved in BigQuery Standard SQL?

#standardSQL
SELECT
MIN(minutes_since) minute_start,
MAX(minutes_since) minute_end,
AVG(value) value_avg
FROM `project.dataset.table`
GROUP BY DIV(minutes_since - 1, 5)
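If minutes_since starts at 0 rather than 1, a variant with zero-based buckets may be a better fit. BigQuery's DIV truncates toward zero, so DIV(minutes_since - 1, 5) puts minute 0 in the same group as minutes 1 to 5. The following is a sketch under that assumption, not a tested drop-in; bucket_start is a name introduced here, and 5 stands in for n:

```sql
#standardSQL
-- Sketch: zero-based buckets of 5 minutes (0-4, 5-9, 10-14, ...).
-- Assumes minutes_since >= 0; bucket_start is a hypothetical alias.
SELECT
  DIV(minutes_since, 5) * 5 AS bucket_start,  -- first minute covered by the bucket
  AVG(value) AS value_avg
FROM `project.dataset.table`
GROUP BY bucket_start
ORDER BY bucket_start
```

Unlike MIN(minutes_since), bucket_start reports the nominal start of each interval even when no row falls exactly on that minute.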

Related

PL/SQL check time period, repeat, up until 600 records, from large database

What would be the best way to check if there has been data within a 3 month period up until a maximum of 600 records, then repeat for the 3 months before that if 600 hasn't been reached? Also it's a large table so querying the whole thing could take a few minutes or completely hang Oracle SQL Developer.
ROWNUM seems to give row numbers to the whole table before returning the result of the query, so that seems to take too long. The way we are currently doing it is entering a time period explicitly that we guess there will be enough records within and then limiting the rows to 600. This only takes 5 seconds, but needs to be changed constantly.
I was thinking to do a FOR loop through each row, but am having trouble storing the number of results outside of the query itself to check whether or not 600 has been reached.
I was also thinking about creating a data index? But I don't know much about that. Is there a way to sort the data by date before grabbing the whole table that would be faster?
Thank you
"check if there has been data within a 3 month period up until a maximum of 600 records, then repeat for the 3 months before that if 600 hasn't been reached?"
Find the latest date and filter to only allow the rows that are within 6 months of it and then fetch the first 600 rows:
SELECT *
FROM (
SELECT t.*,
MAX(date_column) OVER () AS max_date_column
FROM table_name t
)
WHERE date_column > ADD_MONTHS( max_date_column, -6 )
ORDER BY date_column DESC
FETCH FIRST 600 ROWS ONLY;
If there are 600 or more within the latest 3 months then they will be returned; otherwise it will extend the result set into the next 3 month period.
If you intend to repeat the extension over more than two 3-month periods then just use:
SELECT *
FROM table_name
ORDER BY date_column DESC
FETCH FIRST 600 ROWS ONLY;
"I was also thinking about creating a data index? But I don't know much about that. Is there a way to sort the data by date before grabbing the whole table that would be faster?"
Yes, creating an index on the date column would, typically, make filtering the table faster.
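A minimal sketch of such an index, assuming the placeholder names table_name and date_column from the queries above; the index name is made up for illustration:

```sql
-- Hypothetical index name; table_name and date_column are the placeholders used above.
CREATE INDEX table_name_date_idx ON table_name (date_column);
```

With the index in place, the optimizer may be able to read the newest rows through the index for the ORDER BY date_column DESC ... FETCH FIRST 600 ROWS ONLY query instead of sorting the whole table, although that is not guaranteed. If date_column allows NULLs, adding WHERE date_column IS NOT NULL may be needed before the index can be used for the sort.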

Redshift - How to SUM number over last 4 weeks as a window function per row?

Is it possible to SUM a number over a specific time period in Amazon Redshift with a window function?
As an example, I'm counting login numbers for different companies per day.
What I now want per row is the sum of the logins over the last 4 weeks (relative to the date of the row). The field I'm searching for is marked yellow in the screenshot.
Thanks in advance for your help.
If you have data for each day, then you can use rows:
select t.*,
sum(logs) over (partition by company
order by date
rows between 27 preceding and current row
) as logins_4_weeks
from t;
Redshift does not yet support range for the window frame, so this is your best bet.
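Note that the rows frame counts 27 rows back, not 27 days back, so it matches a 4-week window only when every company has exactly one row per day. If days can be missing, a self-join on a date range is one alternative. This is a sketch, assuming date is a DATE column and t has at most one row per company and date; the alias p is introduced here:

```sql
-- Sketch: sum logins over the 28 days ending at each row's date,
-- even when some days have no row. p is a hypothetical alias for the
-- "previous rows" side of the self-join.
select t.company,
       t.date,
       t.logs,
       sum(p.logs) as logins_4_weeks
from t
join t p
  on p.company = t.company
 and p.date between t.date - 27 and t.date  -- date minus integer yields a date
group by t.company, t.date, t.logs;
```

The join can be considerably slower than the window function on large tables, so the rows version remains preferable when the data is known to be dense.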

SQL query that calculates historical average and checks if current value is greater multiple than 3

I am trying to calculate the average since the last time stamp and pull all records where the average is greater than 3. My current query is:
SELECT AVG(BID)/BID AS Multiple
FROM cdsData
where Multiple > 3
and SqlUnixTime > 1492225582
group by ID_BB_RT;
I have a table cdsData, and the Unix time is April 15th converted. Finally, I want the calculation grouped by ID, as shown. I'm not sure why it's failing, but it says that the field Multiple is unknown in the where clause.
"I am trying to calculate the average since the last time stamp and pull all records where the average is greater than 3."
I think your intention is correctly stated as follows, "I am trying to calculate the average since the last time stamp and select all rows where the average is greater than 3 times the individual bid".
In fact, a still better restatement of your objective would be, "I want to select all rows since the last time stamp, where the bid is less than 1/3rd the average bid".
For this, the steps are as follows:
1) A sub-query finds the average bid divided by 3, of rows since the last time stamp.
2) The outer query selects rows since the last time stamp, where the individual bid is < the value returned by the sub-query.
The following SQL statement does that:
SELECT BID
FROM cdsData
WHERE SqlUnixTime > 1492225582
AND BID <
(
SELECT AVG(BID) / 3
FROM cdsData
WHERE SqlUnixTime > 1492225582
)
ORDER BY BID;
1)
SQL clauses are not evaluated in the order in which they are written. The WHERE clause is logically evaluated before the SELECT clause, so the alias Multiple for AVG(BID)/BID does not exist yet when the WHERE clause runs.
You can try this:
SELECT AVG(BID)/BID AS Multiple
FROM cdsData
WHERE SqlUnixTime > 1492225582
GROUP BY ID_BB_RT Having (AVG(BID)/BID)>3 ;
Or
SELECT Multiple
FROM (SELECT AVG(BID)/BID AS Multiple
FROM cdsData
WHERE SqlUnixTime > 1492225582
GROUP BY ID_BB_RT) X
WHERE Multiple > 3;
2)
Once you have corrected the above error, you will get one more error:
Column 'BID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
To correct this, you have to add the BID column to the GROUP BY clause.
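Note that grouping by BID as well means every group contains a single BID value, so AVG(BID)/BID is always 1 and no row passes the > 3 filter. One way to compare each row with the average of its ID_BB_RT group is a window function. This is a sketch, assuming a database that supports window functions (for example SQL Server, or MySQL 8.0 and later) and a non-integer BID column; avg_bid and x are names introduced here:

```sql
-- Sketch: compare each row's BID with the average BID of its ID_BB_RT group.
-- avg_bid and x are hypothetical names; BID is assumed to be a decimal or float.
SELECT ID_BB_RT,
       BID,
       avg_bid / NULLIF(BID, 0) AS Multiple   -- NULLIF avoids a divide-by-zero error
FROM (
    SELECT ID_BB_RT,
           BID,
           AVG(BID) OVER (PARTITION BY ID_BB_RT) AS avg_bid  -- per-ID average
    FROM cdsData
    WHERE SqlUnixTime > 1492225582
) x
WHERE BID > 0
  AND avg_bid > 3 * BID;   -- same test as avg_bid / BID > 3 for positive bids
```

The filter is written as avg_bid > 3 * BID rather than as a division so that it cannot fail on a zero bid.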

Get the number of records from 2 columns where the time is overlapping

I am new to MS Access and am having trouble trying to get the number of records from overlapping time ranges. This is an example of my data.
[Image: example of raw data]
What I am trying to do is get the column number_of_records. For example, if 4 records are added at 5.11, the number_of_records should become 8, as 4 records were added at 5.10.
[Image: example of raw data with no_of_records column]
There is a mistake in my image above. I forgot to mention that when the time hits, for example, 6:00, the number of records should not add on to the previous records and should start afresh.
Do any of you have any suggestions?
Consider the correlated count subquery:
SELECT t.time_column_1, t.time_column_2,
(SELECT Count(*) FROM myTable sub
WHERE sub.time_column_1 <= t.time_column_1
AND sub.time_column_2 = t.time_column_2) AS number_of_records
FROM mytable t
ORDER BY t.time_column_2, t.time_column_1

Running a complex loop query in PostgreSQL

I have a problem in PostgreSQL.
This is my table (the image does not show all of the data).
My requirement is:
Step 1: find the count of each value (value is a column in the table) for today's date, ordered by value. The result looks like this, and I have done it.
Step 2: find the count of each value for the last 30 days, starting from today. I am stuck here. One more thing is included in this step:
Example: today has a count of 10 for the value kash, so this will be 10x30;
yesterday had a count of 4 for the same value, so this will be 4x29; the total sum would then be
(10x30) + (4x29) = 416.
This calculation is done for each and every value.
The loop executes 30 times (as I said, for the last 30 days starting from today). Take today as the thirtieth day.
The query just needs to return two columns, value and sum, ordered by the sum.
Add a WHERE clause to your existing query:
WHERE Timestamp > current_date - interval '30' day
To order by the sum, add an ORDER BY clause:
ORDER BY COUNT(*) DESC;
I do not believe that you will need a loop (CURSOR) for this query.
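The day weighting from the question (30 for today, 29 for yesterday, and so on) can also be done without a loop by giving each row a weight based on its age in days and summing the weights. This is a sketch, not the asker's actual schema: my_table, created_at and weighted_sum are hypothetical names, and created_at is assumed to be a timestamp column:

```sql
-- Sketch: each row contributes 30 if it is from today, 29 if from yesterday,
-- down to 1 for a row from 29 days ago. Summing per value gives
-- (count today x 30) + (count yesterday x 29) + ...
-- my_table, created_at and weighted_sum are hypothetical names.
SELECT value,
       SUM(30 - (current_date - created_at::date)) AS weighted_sum
FROM my_table
WHERE created_at::date > current_date - 30   -- today plus the 29 days before it
GROUP BY value
ORDER BY weighted_sum DESC;
```

With 10 rows today and 4 rows yesterday for the value kash, the weights add up to (10x30) + (4x29) = 416, matching the example in the question.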