Google BigQuery select max for each day - sql

I have a problem with my BigQuery select, I need to get the max value for the column (students) for each day.
SELECT EXTRACT(DATE FROM timestamp) as date, ARRAY_LENGTH(student_ids) as students from analytics.daily_active_students_count order by timestamp desc
Row
Date
Students
1
2022-05-16
72
2
2022-05-16
33
3
2022-05-16
12
4
2022-05-15
10
5
2022-05-15
84
6
2022-05-15
8
7
2022-05-14
92
8
2022-05-14
105
9
2022-05-14
12
Query should remove duplicated rows for days and take only rows with max number of students.
I want my output looks like this:
Row
Date
Students
1
2022-05-16
72
2
2022-05-15
84
3
2022-05-14
105

Problem was that my backend used different timezone. I solved this issue with casting timestamp:
DATE(DATETIME(TIMESTAMP, "Europe/Zagreb")) as date

Related

SQL: how to average across groups, while taking a time constraint into account

I have a table named orders in a Postgres database that looks like this:
customer_id order_id order_date price product
1 2 2021-03-05 15 books
1 13 2022-03-07 3 music
1 14 2022-06-15 900 travel
1 11 2021-11-17 25 books
1 16 2022-08-03 32 books
2 4 2021-04-12 4 music
2 7 2021-06-29 9 music
2 20 2022-11-03 8 music
2 22 2022-11-07 575 travel
2 24 2022-11-20 95 food
3 3 2021-03-17 25 books
3 5 2021-06-01 650 travel
3 17 2022-08-17 1200 travel
3 19 2022-10-02 6 music
3 23 2022-11-08 70 food
4 9 2021-08-20 3200 travel
4 10 2021-10-29 2750 travel
4 15 2022-07-15 1820 travel
4 21 2022-11-05 8000 travel
4 25 2022-11-29 27 books
5 1 2021-01-04 3 music
5 6 2021-06-09 820 travel
5 8 2021-07-30 19 books
5 12 2021-12-10 22 music
5 18 2022-09-19 20 books
Here's a SQL Fiddle: http://sqlfiddle.com/#!17/262fc/1
I'd like to return the average money spent by customers per product, but only consider orders within the first 12 months of a given customer's first purchase within the given product group. (yes, this is challenging!)
For example, for customer 1, order ID 2 and order ID 11 would be factored into the average for books(because order ID 11 took place less than 12 months after customer 1's first order for books, which was order ID 2), but order ID 16 would not be factored into the average (because 8/3/22 is more than 12 months from customer 1's first purchase for books, which took place on 3/5/21).
Here is a matrix showing which orders would be included within a given product (denoted by "yes"):
The desired output would look as follows:
average_spent
books 22.20
music 7.83
travel 1530.71
food 82.50
How would I do this?
Thanks in advance for any assistance you can give!
You can use a subquery to check whether or not to include a product's price in the summation:
select o.product, sum(o.price)/count(*) val from orders o
where o.order_date < (select min(o1.order_date) from orders o1 where
o1.product = o.product and o.user_id = o1.user_id) + interval '12 months'
group by o.product
See fiddle

count number of records by month over the last five years where record date > select month

I need to show the number of valid inspectors we have by month over the last five years. Inspectors are considered valid when the expiration date on their certification has not yet passed, recorded as the month end date. The below SQL code is text of the query to count valid inspectors for January 2017:
SELECT Count(*) AS RecordCount
FROM dbo_Insp_Type
WHERE (dbo_Insp_Type.CERT_EXP_DTE)>=#2/1/2017#);
Rather than designing 60 queries, one for each month, and compiling the results in a final table (or, err, query) are there other methods I can use that call for less manual input?
From this sample:
Id
CERT_EXP_DTE
1
2022-01-15
2
2022-01-23
3
2022-02-01
4
2022-02-03
5
2022-05-01
6
2022-06-06
7
2022-06-07
8
2022-07-21
9
2022-02-20
10
2021-11-05
11
2021-12-01
12
2021-12-24
this single query:
SELECT
Format([CERT_EXP_DTE],"yyyy/mm") AS YearMonth,
Count(*) AS AllInspectors,
Sum(Abs([CERT_EXP_DTE] >= DateSerial(Year([CERT_EXP_DTE]), Month([CERT_EXP_DTE]), 2))) AS ValidInspectors
FROM
dbo_Insp_Type
GROUP BY
Format([CERT_EXP_DTE],"yyyy/mm");
will return:
YearMonth
AllInspectors
ValidInspectors
2021-11
1
1
2021-12
2
1
2022-01
2
2
2022-02
3
2
2022-05
1
0
2022-06
2
2
2022-07
1
1
ID
Cert_Iss_Dte
Cert_Exp_Dte
1
1/15/2020
1/15/2022
2
1/23/2020
1/23/2022
3
2/1/2020
2/1/2022
4
2/3/2020
2/3/2022
5
5/1/2020
5/1/2022
6
6/6/2020
6/6/2022
7
6/7/2020
6/7/2022
8
7/21/2020
7/21/2022
9
2/20/2020
2/20/2022
10
11/5/2021
11/5/2023
11
12/1/2021
12/1/2023
12
12/24/2021
12/24/2023
A UNION query could calculate a record for each of 50 months but since you want 60, UNION is out.
Or a query with 60 calculated fields using IIf() and Count() referencing a textbox on form for start date:
SELECT Count(IIf(CERT_EXP_DTE>=Forms!formname!tbxDate,1,Null)) AS Dt1,
Count(IIf(CERT_EXP_DTE>=DateAdd("m",1,Forms!formname!tbxDate),1,Null) AS Dt2,
...
FROM dbo_Insp_Type
Using the above data, following is output for Feb and Mar 2022. I did a test with Cert_Iss_Dte included in criteria and it did not make a difference for this sample data.
Dt1
Dt2
10
8
Or a report with 60 textboxes and each calls a DCount() expression with criteria same as used in query.
Or a VBA procedure that writes data to a 'temp' table.

How can I group and get MS Access query to show only rows with a maximum value in a specified field for a consecutive number of times?

I have a large access table that I need to pull specific data from with a query.
I need to get a list of all the IDs that meet a specific criteria, i.e. 3 months in a row with a cage number less than 50.
The SQL code I'm currently working with is below, but it only gives me which months of the past 3 had a cage number below 50.
SELECT [AbBehWeeklyMonitor Database].AnimalID, [AbBehWeeklyMonitor Database].Date, [AbBehWeeklyMonitor Database].Cage
FROM [AbBehWeeklyMonitor Database]
WHERE ((([AbBehWeeklyMonitor Database].Date)>=DateAdd("m",-3,Date())) AND (([AbBehWeeklyMonitor Database].Cage)<50))
ORDER BY [AbBehWeeklyMonitor Database].AnimalID DESC;
I would need it to look at the past 3 months for each ID, and only output if all 3 met the specific criteria, but I'm not sure where to go from here.
Any help would be appreciated.
Data Sample:
Date
AnimalID
Cage
6/28/2022
12345
50
5/19/2021
12345
32
3/20/2008
12345
75
5/20/2022
23569
4
8/20/2022
23569
4
5/20/2022
44444
71
8/1/2012
44444
4
4/1/2022
78986
30
1/20/2022
78986
1
9/14/2022
65659
59
8/10/2022
65659
48
7/14/2022
65659
30
6/14/2022
95659
12
8/14/2022
91111
51
7/14/2022
91111
5
6/14/2022
91111
90
8/14/2022
88888
4
7/14/2022
88888
5
6/14/2022
88888
15
Consider:
Query1:
SELECT AnimalID, Count(*) AS Cnt
FROM Table1
WHERE (((Cage)<50) AND (([Date]) Between #6/1/2022# And #8/31/2022#))
GROUP BY AnimalID
HAVING (((Count(*))=3));
Query2
SELECT Table1.*
FROM Query1 INNER JOIN Table1 ON Query1.AnimalID = Table1.AnimalID
WHERE ((([Date]) Between #6/1/2022# And #8/31/2022#));
Output:
Date AnimalID Cage
6/14/2022 65659 12
7/14/2022 65659 30
8/10/2022 65659 48
6/14/2022 88888 15
7/14/2022 88888 5
8/14/2022 88888 4
Date is a reserved word and really should not use reserved words as names.

In Azure Data bricks I want to get start dates of every week with week numbers from datetime column

This is a sample Data Frame
Date Items_Sold
12/29/2019 10
12/30/2019 20
12/31/2019 30
1/1/2020 40
1/2/2020 50
1/3/2020 60
1/4/2020 35
1/5/2020 56
1/6/2020 34
1/7/2020 564
1/8/2020 6
1/9/2020 45
1/10/2020 56
1/11/2020 45
1/12/2020 37
1/13/2020 36
1/14/2020 479
1/15/2020 47
1/16/2020 47
1/17/2020 578
1/18/2020 478
1/19/2020 3578
1/20/2020 67
1/21/2020 578
1/22/2020 478
1/23/2020 4567
1/24/2020 7889
1/25/2020 8999
1/26/2020 99
1/27/2020 66
1/28/2020 678
1/29/2020 889
1/30/2020 990
1/31/2020 58585
2/1/2020 585
2/2/2020 555
2/3/2020 56
2/4/2020 66
2/5/2020 66
2/6/2020 6634
2/7/2020 588
2/8/2020 2588
2/9/2020 255
I am running this query
%sql
use my_items_table;
select weekofyear(Date), count(items_sold) as Sum
from my_items_table
where year(Date)=2020
group by weekofyear(Date)
order by weekofyear(Date)
I am getting this output. (IMP: I have added random values in Sum)
Week Sum
1 | 300091
2 | 312756
3 | 309363
4 | 307312
5 | 310985
6 | 296889
7 | 315611
But I want in which with week number one column should hold a start date of each week. Like this
Start_Date Week Sum
12/29/2019 1 300091
1/5/2020 2 312756
1/12/2020 3 309363
1/19/2020 4 307312
1/26/2020 5 310985
2/2/2020 6 296889
2/9/2020 7 315611
I am running the query on Azure Data Bricks.
If you have data for all days, then just use min():
select min(date), weekofyear(Date), count(items_sold) as Sum
from my_items_table
where year(Date) = 2020
group by weekofyear(Date)
order by weekofyear(Date);
Note: The year() is the calendar year starting on Jan 1. You are not going to get dates from other years using this query. If that is an issue, I would suggest that you ask a new question asking how to get the first day for the first week of the year.

PostgreSQL select only rows whose dates match a specific number of the week in a table

I have a table that looks like this for a span of many years:
dump_time | group_id | client_count
---------------------+----------+--------------
2014-10-21 19:45:00 | 145 | 74
2014-10-21 19:45:00 | 131 | 279
2014-10-21 19:45:00 | 139 | 49
where dump_time is of type 'timestamp without time zone'.
I want to select only rows that match a specific week of the year and a specific day of the week. For instance, I want all rows that are 3rd day of the 15th week of the year. Any idea on how I could do this? I've explored the EXTRACT command, but haven't quite figured it out.
Thanks!
select *
from testme
where extract(week from dump_time) = 15
and extract(dow from dump_time) = 3