SQL Query to keep a running lowest value - sql

I have a table with 2 fields:
Period Time
1 4562
2 4555
3 4570
4 4558
5 4550
6 4570
7 4565
8 4545
9 4550
10 4560
For each period I would like to keep the lowest time in another field so the table would look like this:
Period Time Lowest
1 4562 4562
2 4555 4555
3 4570 4555
4 4558 4555
5 4550 4550
6 4570 4550
7 4565 4550
8 4545 4545
9 4550 4545
10 4560 4545
Thanks

You want a cumulative minimum. You can use ISO/ANSI standard window functions:
select t.*,
min(time) over (order by period) as lowest
from t;

Try the following query
select *,min(time) over (order by period) from YOURTABLENAME
SQL Server 2014
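
To check the cumulative minimum against the sample data from the question, here is a minimal sketch that runs the same window function in an in-memory SQLite database through Python's sqlite3 module. SQLite (3.25 or newer) is only a stand-in for the asker's database, and the table name t is taken from the first answer.

```python
import sqlite3

# In-memory SQLite stands in for the real database (window functions need SQLite 3.25+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (period INTEGER, time INTEGER)")

# Sample values from the question, numbered 1..10 as the period.
times = [4562, 4555, 4570, 4558, 4550, 4570, 4565, 4545, 4550, 4560]
conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(times, start=1)))

# With ORDER BY in the OVER clause, the default frame runs from the first row
# up to the current row, which gives a running minimum.
running_min = conn.execute(
    """
    SELECT period, time,
           MIN(time) OVER (ORDER BY period) AS lowest
    FROM t
    ORDER BY period
    """
).fetchall()

for row in running_min:
    print(row)
```

The third value of each printed row should match the Lowest column in the question.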

Related

How to get top values when there is a tie

I am having difficulty figuring out this dang problem. From the data and queries I have given below I am trying to see the email address that has rented the most movies during the month of September.
There are only 4 relevant tables in my database and they have been anonymized and shortened:
Table "cust":
cust_id f_name l_name email
1 Jack Daniels jack.daniels#google.com
2 Jose Quervo jose.quervo#yahoo.com
5 Jim Beam jim.beam#protonmail.com
Table "rent":
inv_id cust_id rent_date
10 1 9/1/2022 10:29
11 1 9/2/2022 18:16
12 1 9/2/2022 18:17
13 1 9/17/2022 17:34
14 1 9/19/2022 6:32
15 1 9/19/2022 6:33
16 3 9/1/2022 18:45
17 3 9/1/2022 18:46
18 3 9/2/2022 18:45
19 3 9/2/2022 18:46
20 3 9/17/2022 18:32
21 3 9/19/2022 22:12
10 2 9/19/2022 11:43
11 2 9/19/2022 11:42
Table "inv":
mov_id inv_id
22 10
23 11
24 12
25 13
26 14
27 15
28 16
29 17
30 18
31 19
31 20
32 21
Table "mov":
mov_id titl rate
22 Anaconda 3.99
23 Exorcist 1.99
24 Philadelphia 3.99
25 Quest 1.99
26 Sweden 1.99
27 Speed 1.99
28 Nemo 1.99
29 Zoolander 5.99
30 Truman 5.99
31 Patient 1.99
32 Racer 3.99
and here is my current query progress:
SELECT cust.email,
COUNT(DISTINCT inv.mov_id) AS "Rented_Count"
FROM cust
JOIN rent ON rent.cust_id = cust.cust_id
JOIN inv ON inv.inv_id = rent.inv_id
JOIN mov ON mov.mov_id = inv.mov_id
WHERE rent.rent_date BETWEEN '2022-09-01' AND '2022-09-31'
GROUP BY cust.email
ORDER BY "Rented_Count" DESC;
and here is what it outputs:
email Rented_Count
jack.daniels#google.com 6
jim.beam#protonmail.com 6
jose.quervo#yahoo.com 2
and what I want it to be outputting:
email
jack.daniels#google.com
jim.beam#protonmail.com
In the results I am actually getting, there is a tie for first place (Jim and Jack). That is fine, but I would like the query to list both tying email addresses, not just Jack's, so I don't think limiting the row count or using max will work.
I think it must have something to do with dense_rank, but I don't know how to use that in this scenario together with COUNT and GROUP BY.
Your creativity and help would be appreciated.
You're missing the FETCH FIRST ROWS WITH TIES clause. It will work together with the ORDER BY clause to get you the highest values (FIRST ROWS), including ties (WITH TIES).
SELECT cust.email
FROM cust
INNER JOIN rent
ON rent.cust_id = cust.cust_id
INNER JOIN inv
ON inv.inv_id = rent.inv_id
INNER JOIN mov
ON mov.mov_id = inv.mov_id
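-- September has 30 days; a half-open range also keeps rentals made late on the last day
WHERE rent.rent_date >= '2022-09-01' AND rent.rent_date < '2022-10-01'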
GROUP BY cust.email
ORDER BY COUNT(DISTINCT inv.mov_id) DESC
FETCH FIRST 1 ROWS WITH TIES
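
Since the question asks about dense_rank, here is a sketch of that alternative for engines that do not support FETCH FIRST ... WITH TIES. It swaps in DENSE_RANK() over the per-customer counts and keeps rank 1. The rentals table and its rows are hypothetical, flattened stand-ins for the joined cust/rent/inv tables, and SQLite via Python's sqlite3 is only used to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per (customer email, rented movie).
conn.execute("CREATE TABLE rentals (email TEXT, mov_id INTEGER)")
conn.executemany("INSERT INTO rentals VALUES (?, ?)", [
    ("a@example.com", 1), ("a@example.com", 2), ("a@example.com", 3),
    ("b@example.com", 4), ("b@example.com", 5), ("b@example.com", 6),
    ("c@example.com", 7),
])

top_emails = [row[0] for row in conn.execute(
    """
    WITH counts AS (
        -- one row per customer with the number of distinct movies rented
        SELECT email, COUNT(DISTINCT mov_id) AS rented_count
        FROM rentals
        GROUP BY email
    ),
    ranked AS (
        -- tied counts share the same rank, so every leader gets rank 1
        SELECT email, DENSE_RANK() OVER (ORDER BY rented_count DESC) AS rnk
        FROM counts
    )
    SELECT email FROM ranked WHERE rnk = 1 ORDER BY email
    """
)]
print(top_emails)
```

In the real schema, the counts step would be the grouped query from the question, with the September filter applied there.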

SQL: how to average across groups, while taking a time constraint into account

I have a table named orders in a Postgres database that looks like this:
customer_id order_id order_date price product
1 2 2021-03-05 15 books
1 13 2022-03-07 3 music
1 14 2022-06-15 900 travel
1 11 2021-11-17 25 books
1 16 2022-08-03 32 books
2 4 2021-04-12 4 music
2 7 2021-06-29 9 music
2 20 2022-11-03 8 music
2 22 2022-11-07 575 travel
2 24 2022-11-20 95 food
3 3 2021-03-17 25 books
3 5 2021-06-01 650 travel
3 17 2022-08-17 1200 travel
3 19 2022-10-02 6 music
3 23 2022-11-08 70 food
4 9 2021-08-20 3200 travel
4 10 2021-10-29 2750 travel
4 15 2022-07-15 1820 travel
4 21 2022-11-05 8000 travel
4 25 2022-11-29 27 books
5 1 2021-01-04 3 music
5 6 2021-06-09 820 travel
5 8 2021-07-30 19 books
5 12 2021-12-10 22 music
5 18 2022-09-19 20 books
Here's a SQL Fiddle: http://sqlfiddle.com/#!17/262fc/1
I'd like to return the average money spent by customers per product, but only consider orders within the first 12 months of a given customer's first purchase within the given product group. (yes, this is challenging!)
For example, for customer 1, order ID 2 and order ID 11 would be factored into the average for books (because order ID 11 took place less than 12 months after customer 1's first order for books, which was order ID 2), but order ID 16 would not be factored into the average (because 8/3/22 is more than 12 months after customer 1's first purchase of books, which took place on 3/5/21).
Here is a matrix showing which orders would be included within a given product (denoted by "yes"):
The desired output would look as follows:
product average_spent
books 22.20
music 7.83
travel 1530.71
food 82.50
How would I do this?
Thanks in advance for any assistance you can give!
You can use a subquery to check whether or not to include a product's price in the summation:
select o.product, sum(o.price)/count(*) val from orders o
where o.order_date < (select min(o1.order_date) from orders o1 where
o1.product = o.product and o.customer_id = o1.customer_id) + interval '12 months'
group by o.product
See fiddle
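
As a cross-check of the desired output, here is a sketch of the same idea written with a window function instead of a correlated subquery, run against the sample rows in SQLite through Python's sqlite3. Two things are assumptions: SQLite's date(x, '+12 months') stands in for the Postgres expression + interval '12 months', and price is stored as REAL so that the average is not truncated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, order_id INTEGER, "
    "order_date TEXT, price REAL, product TEXT)"
)
# Sample rows from the question; ISO date strings compare correctly as text.
orders = [
    (1, 2, "2021-03-05", 15, "books"), (1, 13, "2022-03-07", 3, "music"),
    (1, 14, "2022-06-15", 900, "travel"), (1, 11, "2021-11-17", 25, "books"),
    (1, 16, "2022-08-03", 32, "books"), (2, 4, "2021-04-12", 4, "music"),
    (2, 7, "2021-06-29", 9, "music"), (2, 20, "2022-11-03", 8, "music"),
    (2, 22, "2022-11-07", 575, "travel"), (2, 24, "2022-11-20", 95, "food"),
    (3, 3, "2021-03-17", 25, "books"), (3, 5, "2021-06-01", 650, "travel"),
    (3, 17, "2022-08-17", 1200, "travel"), (3, 19, "2022-10-02", 6, "music"),
    (3, 23, "2022-11-08", 70, "food"), (4, 9, "2021-08-20", 3200, "travel"),
    (4, 10, "2021-10-29", 2750, "travel"), (4, 15, "2022-07-15", 1820, "travel"),
    (4, 21, "2022-11-05", 8000, "travel"), (4, 25, "2022-11-29", 27, "books"),
    (5, 1, "2021-01-04", 3, "music"), (5, 6, "2021-06-09", 820, "travel"),
    (5, 8, "2021-07-30", 19, "books"), (5, 12, "2021-12-10", 22, "music"),
    (5, 18, "2022-09-19", 20, "books"),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", orders)

averages = dict(conn.execute(
    """
    WITH dated AS (
        -- attach each customer's first order date within the product group
        SELECT product, price, order_date,
               MIN(order_date) OVER (PARTITION BY customer_id, product) AS first_date
        FROM orders
    )
    SELECT product, ROUND(AVG(price), 2)
    FROM dated
    WHERE order_date < date(first_date, '+12 months')
    GROUP BY product
    """
).fetchall())
print(averages)
```

The printed averages should agree with the desired output in the question (books 22.20, music 7.83, travel 1530.71, food 82.50).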

Google BigQuery select max for each day

I have a problem with my BigQuery select, I need to get the max value for the column (students) for each day.
SELECT EXTRACT(DATE FROM timestamp) as date, ARRAY_LENGTH(student_ids) as students from analytics.daily_active_students_count order by timestamp desc
Row Date Students
1 2022-05-16 72
2 2022-05-16 33
3 2022-05-16 12
4 2022-05-15 10
5 2022-05-15 84
6 2022-05-15 8
7 2022-05-14 92
8 2022-05-14 105
9 2022-05-14 12
The query should remove the duplicate rows for each day and keep only the row with the maximum number of students.
I want my output to look like this:
Row Date Students
1 2022-05-16 72
2 2022-05-15 84
3 2022-05-14 105
The problem was that my backend used a different time zone. I solved it by converting the timestamp to local time before extracting the date:
DATE(DATETIME(TIMESTAMP, "Europe/Zagreb")) as date
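
The per-day maximum itself is a plain GROUP BY with MAX. Here is a minimal sketch on the sample rows from the question, using SQLite through Python's sqlite3 as a stand-in; the daily table with precomputed day and students columns is a hypothetical simplification. In BigQuery the grouping expression would presumably be the time-zone-adjusted date above and the aggregate would wrap ARRAY_LENGTH(student_ids), which is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified table: the day and the student count are already computed.
conn.execute("CREATE TABLE daily (day TEXT, students INTEGER)")
conn.executemany("INSERT INTO daily VALUES (?, ?)", [
    ("2022-05-16", 72), ("2022-05-16", 33), ("2022-05-16", 12),
    ("2022-05-15", 10), ("2022-05-15", 84), ("2022-05-15", 8),
    ("2022-05-14", 92), ("2022-05-14", 105), ("2022-05-14", 12),
])

# One output row per day, carrying the largest student count seen that day.
daily_max = conn.execute(
    """
    SELECT day, MAX(students) AS students
    FROM daily
    GROUP BY day
    ORDER BY day DESC
    """
).fetchall()
print(daily_max)
```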

In Azure Data bricks I want to get start dates of every week with week numbers from datetime column

This is a sample Data Frame
Date Items_Sold
12/29/2019 10
12/30/2019 20
12/31/2019 30
1/1/2020 40
1/2/2020 50
1/3/2020 60
1/4/2020 35
1/5/2020 56
1/6/2020 34
1/7/2020 564
1/8/2020 6
1/9/2020 45
1/10/2020 56
1/11/2020 45
1/12/2020 37
1/13/2020 36
1/14/2020 479
1/15/2020 47
1/16/2020 47
1/17/2020 578
1/18/2020 478
1/19/2020 3578
1/20/2020 67
1/21/2020 578
1/22/2020 478
1/23/2020 4567
1/24/2020 7889
1/25/2020 8999
1/26/2020 99
1/27/2020 66
1/28/2020 678
1/29/2020 889
1/30/2020 990
1/31/2020 58585
2/1/2020 585
2/2/2020 555
2/3/2020 56
2/4/2020 66
2/5/2020 66
2/6/2020 6634
2/7/2020 588
2/8/2020 2588
2/9/2020 255
I am running this query
%sql
use my_items_table;
select weekofyear(Date), count(items_sold) as Sum
from my_items_table
where year(Date)=2020
group by weekofyear(Date)
order by weekofyear(Date)
I am getting this output. (IMP: I have added random values in Sum)
Week Sum
1 | 300091
2 | 312756
3 | 309363
4 | 307312
5 | 310985
6 | 296889
7 | 315611
But I also want a column that holds the start date of each week alongside the week number, like this:
Start_Date Week Sum
12/29/2019 1 300091
1/5/2020 2 312756
1/12/2020 3 309363
1/19/2020 4 307312
1/26/2020 5 310985
2/2/2020 6 296889
2/9/2020 7 315611
I am running the query on Azure Data Bricks.
If you have data for all days, then just use min():
select min(date), weekofyear(Date), count(items_sold) as Sum
from my_items_table
where year(Date) = 2020
group by weekofyear(Date)
order by weekofyear(Date);
Note: The year() is the calendar year starting on Jan 1. You are not going to get dates from other years using this query. If that is an issue, I would suggest that you ask a new question asking how to get the first day for the first week of the year.
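
If the week should start on the Sunday shown in the desired output even when that Sunday falls in the previous year, one option is to group by a computed week-start date instead of by the week number. The sketch below shows the idea in SQLite through Python's sqlite3, not in Databricks SQL: date(x, '-6 days', 'weekday 0') maps each date to the Sunday on or before it. The column name sold_on is hypothetical, the dates are rewritten as ISO strings, and the query sums items_sold rather than counting rows, which is an assumption about what the Sum column is meant to hold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_items_table (sold_on TEXT, items_sold INTEGER)")
# First two weeks of the sample data, with dates rewritten as ISO strings.
conn.executemany("INSERT INTO my_items_table VALUES (?, ?)", [
    ("2019-12-29", 10), ("2019-12-30", 20), ("2019-12-31", 30),
    ("2020-01-01", 40), ("2020-01-02", 50), ("2020-01-03", 60),
    ("2020-01-04", 35), ("2020-01-05", 56), ("2020-01-06", 34),
    ("2020-01-07", 564), ("2020-01-08", 6), ("2020-01-09", 45),
    ("2020-01-10", 56), ("2020-01-11", 45),
])

# Step back 6 days, then move forward to the next Sunday (weekday 0):
# a Sunday maps to itself, any other day maps to the preceding Sunday.
weekly = conn.execute(
    """
    SELECT date(sold_on, '-6 days', 'weekday 0') AS start_date,
           SUM(items_sold) AS total
    FROM my_items_table
    GROUP BY start_date
    ORDER BY start_date
    """
).fetchall()
print(weekly)
```

Because the grouping key is a full date, the week that begins on 12/29/2019 keeps its December days together with the first days of January.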

Get record with maximum value and show other fields from that record

I am trying to show the important dates for a manufacturing process. There are 10 rooms performing the same process. Each time the process starts over a new cycle number is assigned. I want to show the important dates for the current (i.e. maximum) cycle in each room.
So far I have put together a query that will show the important dates for the maximum cycle number overall (my code is below), but I want to add an additional criterion so that I see the information for the maximum cycle number in each room
SELECT
[dbo_batch_overview5].[rm],
[dbo_batch_overview5].[cyc],
[dbo_batch_overview5].[bpr],
[dbo_batch_overview5].[plug_date],
[dbo_batch_overview5].[trig_date],
[dbo_batch_overview5].[flush_date],
[dbo_batch_overview5].[harv_date]
FROM dbo_batch_overview5
WHERE ((([dbo_batch_overview5].[cyc])=(SELECT Max([dbo_batch_overview5].[cyc])
FROM [dbo_batch_overview5]
)));
I think I need to add a GROUP BY statement to specify that I want to see the maximum cycle number for each unique entry in the room [rm] field, here is the code with my attempt at the statement I think I need included:
SELECT
[dbo_batch_overview5].[rm],
[dbo_batch_overview5].[cyc],
[dbo_batch_overview5].[bpr],
[dbo_batch_overview5].[plug_date],
[dbo_batch_overview5].[trig_date],
[dbo_batch_overview5].[flush_date],
[dbo_batch_overview5].[harv_date]
FROM dbo_batch_overview5
WHERE ((([dbo_batch_overview5].[cyc])=(SELECT Max([dbo_batch_overview5].[cyc])
FROM [dbo_batch_overview5]
GROUP BY [dbo_batch_overview5].[rm]
)));
When I try the above code I get an error saying that my subquery is returning more than one value. Can anyone tell me what I'm doing wrong?
As requested, here is some sample data
rm cyc bpr clone_date plug_date trig_date harv_date
1 13 20161031-OP 10/31/2016 11/16/2016 11/22/2016 1/12/2017
1 13 20161101-EV 11/1/2016 11/16/2016 11/22/2016 1/13/2017
1 13 20161031-CG 10/31/2016 11/16/2016 11/22/2016 1/13/2017
1 13 20161101-CB 11/1/2016 11/16/2016 11/22/2016 1/12/2017
1 13 20161031-VO 10/31/2016 11/16/2016 11/22/2016 1/13/2017
1 14 20170104-CG 1/4/2017 1/23/2017 1/28/2017
1 14 20170104-CB 1/4/2017 1/23/2017 1/28/2017
1 14 20170106-AV 1/6/2017 1/23/2017 1/28/2017
1 14 20170106-MN 1/6/2017 1/23/2017 1/28/2017
2 7 20150925-ST 9/25/2015 10/10/2015 10/19/2015 12/16/2015
2 7 20150924-AL 9/24/2015 10/10/2015 10/19/2015 12/16/2015
2 7 20150924-EA 9/24/2015 10/10/2015 10/19/2015 12/21/2015
2 7 20150928-LM 9/28/2015 10/10/2015 10/19/2015 12/22/2015
2 7 20150928-HM 9/28/2015 10/10/2015 10/19/2015 12/19/2015
2 8 20151214-CG 12/14/2015 12/30/2015 1/7/2016 3/14/2016
2 8 20151214-RM 12/14/2015 12/30/2015 1/7/2016 3/15/2016
2 8 20151215-CB 12/15/2015 12/30/2015 1/7/2016 3/8/2016
In the above example, I would only want to see the records associated with cycle 14 in room 1 and cycle 8 in room 2
Would something like this solve your problem?
SELECT
a.[rm],
a.[cyc],
a.[bpr],
a.[plug_date],
a.[trig_date],
a.[flush_date],
a.[harv_date]
FROM dbo_batch_overview5 a
INNER JOIN (SELECT Max([cyc]) AS maxcyc,
rm as rm2
FROM [dbo_batch_overview5]
GROUP BY [rm])c
ON a.rm = c.rm2 AND a.cyc = c.maxcyc
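
For context, the GROUP BY attempt in the question fails because the grouped subquery returns one maximum per room, while = expects a single value. Joining on both the room and its maximum avoids that. Here is a sketch of the join on a trimmed version of the sample data, run in SQLite through Python's sqlite3; the table name batch and the reduced column set are simplifications for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed stand-in for dbo_batch_overview5: only room, cycle and batch record.
conn.execute("CREATE TABLE batch (rm INTEGER, cyc INTEGER, bpr TEXT)")
conn.executemany("INSERT INTO batch VALUES (?, ?, ?)", [
    (1, 13, "20161031-OP"), (1, 13, "20161101-EV"),
    (1, 14, "20170104-CG"), (1, 14, "20170104-CB"),
    (2, 7, "20150925-ST"),
    (2, 8, "20151214-CG"), (2, 8, "20151214-RM"),
])

# The derived table has one row per room with its highest cycle number;
# the join keeps only the rows that belong to that cycle.
current = conn.execute(
    """
    SELECT a.rm, a.cyc, a.bpr
    FROM batch a
    INNER JOIN (SELECT rm, MAX(cyc) AS maxcyc
                FROM batch
                GROUP BY rm) c
            ON a.rm = c.rm AND a.cyc = c.maxcyc
    ORDER BY a.rm, a.bpr
    """
).fetchall()
print(current)
```

Only the cycle 14 rows for room 1 and the cycle 8 rows for room 2 should remain, as the question asks.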