Get just one row per ID from SQL query with specifc condition - sql

I'm using Postgres v>9.
I'd like to get values of a table like this:
id year value
1 2015 0.1
2 2015 0.2
6 2030 0.3
6 2015 0.4
6 2017 0.3
The idea is to get lines where years is < 2019 or year = 2030. If id is repeated, I´d like to get only 2030 line, not 2015 ones, that is, the result I´m looking for is:
id year value
1 2015 0.1
2 2015 0.2
6 2030 0.3
How can I do that?

This only considers the year 2030 or any year < 2019. At least that's what the question says. (I suspect there's something fuzzy there.)
It picks one row per id, with the latest year first.
SELECT DISTINCT ON (id) *
FROM tbl
ORDER BY id, year DESC
WHERE (year = 2030 OR year < 2019);
If there can be multiple rows with the same (id, year), you need a tiebreaker.
About this and more details for DISTINCT ON:
Select first row in each GROUP BY group?

Use distinct on if you want one row per id:
select distint on (id) t.*
from t
order by id, year desc;

SELECT ID,
FIRST_VALUE(YEAR) OVER (PARTITION BY ID ORDER BY YEAR DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS year,
FIRST_VALUE(Value) OVER (PARTITION BY ID ORDER BY YEAR DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS value
FROM t
WHERE YEAR = 2030 OR YEAR < 2019
I think this is the standard for first_value -- postgre might require a seperate clause?

Related

Dynamic average calculation

I want to add an average cost column which calculates the average across different time periods.
So in the example below, there are 6 months of cost, the first column finds the average across all 6 i.e. average(1,5,8,12,15,20)
The next "Half Period" column determines how many total periods there are and calculates the average across the most recent 3 periods i.e. average(12,15,20)
The first average is straightforward e.g.
AVG(COST)
What I've tried for the half period is:
AVG(COST) OVER (ORDER BY PERIOD ROWS BETWEEN x PRECEDING AND CURRENT ROW)
The x is of course an integer value, how would I write the statement to automatically enter the integer required? i.e. in this example 6 periods requires 3 rows averaged, therefore x=2.
x can be found by some sub-query e.g.
SELECT ( CEILING(COUNT(PERIOD) / 2) - 1) FROM TABLE
Example table:
Period
Cost
Jan
1
Feb
5
Mar
8
Apr
12
May
15
Jun
20
Desired Output:
Period
Cost
All Time Average Cost
Half Period Average Cost
Jan
1
10.1
1
Feb
5
10.1
3
Mar
8
10.1
4.7
Apr
12
10.1
8.3
May
15
10.1
11.7
Jun
20
10.1
15.7
The main problem here is that you cannot use a variable or an expression for the number of rows Preceeding in the window expression, we must use a literal value for x in the following:
BETWEEN x PRECEDING
If there is a finite number of periods, then we can use a CASE statement to switch between the possible expressions:
CASE
WHEN CEILING(COUNT(PERIOD) / 2) - 1 <= 1
THEN AVG(COST) OVER (ORDER BY PERIOD ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
WHEN CEILING(COUNT(PERIOD) / 2) - 1 <= 2
THEN AVG(COST) OVER (ORDER BY PERIOD ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
WHEN CEILING(COUNT(PERIOD) / 2) - 1 <= 3
THEN AVG(COST) OVER (ORDER BY PERIOD ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
WHEN CEILING(COUNT(PERIOD) / 2) - 1 <= 4
THEN AVG(COST) OVER (ORDER BY PERIOD ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
WHEN CEILING(COUNT(PERIOD) / 2) - 1 <= 5
THEN AVG(COST) OVER (ORDER BY PERIOD ROWS BETWEEN 5 PRECEDING AND CURRENT ROW)
WHEN CEILING(COUNT(PERIOD) / 2) - 1 <= 6
THEN AVG(COST) OVER (ORDER BY PERIOD ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
END as [Half Period Average Cost]
I added this step in SQL. But my window function denied taking the variable half_period_rounded. So we're not quite there yet. :-)
SQL query
This looks like a job for sneaky windowed function aggregates!
DECLARE #TABLE TABLE (SaleID INT IDENTITY, Cost DECIMAL(12,4), SaleDateTime DATETIME)
INSERT INTO #TABLE (SaleDateTime, Cost) VALUES
('2022-Jan-01', 1 ),
('2022-Feb-01', 5 ),
('2022-Mar-01', 8 ),
('2022-Apr-01', 12),
('2022-May-01', 15),
('2022-Jun-01', 20)
SELECT DISTINCT DATEPART(YEAR,SaleDateTime) AS Year, DATEPART(MONTH,SaleDateTime) AS MonthNumber, DATENAME(MONTH,SaleDateTime) AS Month,
AVG(Cost) OVER (ORDER BY (SELECT 1)) AS AllTimeAverage,
AVG(Cost) OVER (PARTITION BY DATEPART(YEAR,SaleDateTime), DATEPART(MONTH, SaleDateTime) ORDER BY SaleDateTime) AS MonthlyAverage,
AVG(Cost) OVER (PARTITION BY DATEPART(YEAR,SaleDateTime), DATEPART(QUARTER,SaleDateTime) ORDER BY SaleDateTime) AS QuarterlyAverage,
AVG(Cost) OVER (PARTITION BY CASE WHEN SaleDateTime BETWEEN CAST(DATEADD(MONTH,-1,DATEADD(DAY,1-DATEPART(DAY,SaleDateTime),SaleDateTime)) AS DATE)
AND CAST(DATEADD(MONTH,2,DATEADD(DAY,1-DATEPART(DAY,SaleDateTime),SaleDateTime)) AS DATE)
THEN 1 END ORDER BY SaleDateTime) AS RollingThreeMonthAverage
FROM #TABLE
ORDER BY DATEPART(YEAR,SaleDateTime), DATEPART(MONTH,SaleDateTime)
We're cheating here, and having the case expression find the rows we want in our rolling 3 month window. I've opted to keep it to a rolling window of last month, this month and next month (from the first day of last month, to the last day of next month - '2022-01-01 00:00:00' to '2022-04-01 00:00:00' for February).
Partitioning over the whole result set, month and quarter is straightforward, but the rolling three months isn't much more complicated when you turn it into a case expression describing it.
Year MonthNumber Month AllTimeAverage MonthlyAverage QuarterlyAverage RollingThreeMonthAverage
--------------------------------------------------------------------------------------------------------
2022 1 January 10.166666 1.000000 1.000000 1.000000
2022 2 February 10.166666 5.000000 3.000000 3.000000
2022 3 March 10.166666 8.000000 4.666666 4.666666
2022 4 April 10.166666 12.000000 12.000000 6.500000
2022 5 May 10.166666 15.000000 13.500000 8.200000
2022 6 June 10.166666 20.000000 15.666666 10.166666

Apply SUM( where date between date1 and date2)

My table is currently looking like this:
+---------+---------------+------------+------------------+
| Segment | Product | Pre_Date | ON_Prepaid |
+---------+---------------+------------+------------------+
| RB | 01. Auto Loan | 2020-01-01 | 10645976180.0000 |
| RB | 01. Auto Loan | 2020-01-02 | 4489547174.0000 |
| RB | 01. Auto Loan | 2020-01-03 | 1853117000.0000 |
| RB | 01. Auto Loan | 2020-01-04 | 9350258448.0000 |
+---------+---------------+------------+------------------+
I'm trying to sum values of 'ON_Prepaid' over the course of 7 days, let's say from '2020-01-01' to '2020-01-07'.
Here is what I've tried
drop table if exists ##Prepay_summary_cash
select *,
[1W_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 1 following and 7 following),
[2W_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 8 following and 14 following),
[3W_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 15 following and 21 following),
[1M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 22 following and 30 following),
[1.5M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 31 following and 45 following),
[2M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 46 following and 60 following),
[3M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 61 following and 90 following),
[6M_Prepaid] = sum(ON_Prepaid) over (partition by SEGMENT, PRODUCT order by PRE_DATE rows between 91 following and 181 following)
into ##Prepay_summary_cash
from ##Prepay1
Things should be fine if the dates are continuous; however, there are some missing days in 'Pre_Date' (you know banks don't work on Sundays, etc.).
So I'm trying to work on something like
[1W] = SUM(ON_Prepaid) over (where Pre_date between dateadd(d,1,Pre_date) and dateadd(d,7,Pre_date))
something like that. So if per se there's no record on 2020-01-05, the result should only sum the dates on the 1,2,3,4,6,7 of Jan 2020, instead of 1,2,3,4,6,7,8 (8 because of "rows 7 following"). Or for example I have missing records over the span of 30 days or something, then all those 30 should be summed as 0s. So 45 days should return only the value of 15 days.
I've tried looking up all over the forum and the answers did not suffice. Can you guys please help me out? Or link me to a thread which the problem had already been solved.
Thank you so much.
Things should be fine if the dates are continuous
Then make them continuous. Left join your real data (grouped up so it is one row per day) onto your calendar table (make one, or use a recursive cte to generate you a list of 360 dates from X hence) and your query will work out
WITH d as
(
SELECT *
FROM
(
SELECT *
FROM cal
CROSS JOIN
(SELECT DISTINCT segment s, product p FROM ##Prepay1) x
) c
LEFT JOIN ##Prepay1 p
ON
c.d = p.pre_date AND
c.segment = p.segment AND
c.product = p.product
WHERE
c.d BETWEEN '2020-01-01' AND '2021-01-01' -- date range on c.d not c.pre_date
)
--use d.d/s/p not d.pre_date/segment/product in your query (sometimes the latter are null)
select *,
[1W_Prepaid] = sum(ON_Prepaid) over (partition by s, s order by d.d rows between 1 following and 7 following),
...
CAL is just a table with a single column of dates, one per day, no time, extending for n thousand days into the past/future
Wish to note that months have variable number of days so 6M is a bit of a misnomer.. might be better to call the month ones 180D, 90D etc
Also want to point out that your query performs a per row division of your data into into groups. If you want to perform sums up to 180 days after the date of the row you need to pull a year's worth of data so that on row 180(June) you have the December data available to sum (dec being 6 months from June)
If you then want to restrict your query to only showing up to June (but including data summed from 6 months after June) you need to wrap it all again in a sub query. You cannot "where between jan and jun" in the query that does the sum over because where clauses are done before window clauses (doing so will remove the dec data before it is summed)
Some other databases make this easier, Oracle and Postgres spring to mind; they can perform sum in a range where the other rows values are within some distance of the current row's values. SQL server only usefully supports distancing based on a row's index rather than its values (the distancing-based-on-values support is limited to "rows that have the same value", rather than "rows that have values n higher or lower than the current row"). I suppose the requirement could be met with a cross apply, or a coordinated sub in the select, though I'd be careful to check the performance..
SELECT *,
(SELECT SUM(tt.a) FROM x tt WHERE t.x = tt.x AND tt.y = t.y AND tt.z BETWEEN DATEADD(d, 1, t.z) AND DATEADD(d, 7, t.z) AS 1W
FROM
x t

How to count no of days per user on a rolling basis in Oracle SQL?

Following on from a previous question, in which i have a table called orders with information regarding the time an order was placed and who made that order.
order_timestamp user_id
-------------------- ---------
1-JUN-20 02.56.12 123
3-JUN-20 12.01.01 533
23-JUN-20 08.42.18 123
12-JUN-20 02.53.59 238
19-JUN-20 02.33.72 34
I would like to calculate a daily rolling count of the number of days a user made an order in a past 10 days.
For example, in the last 10 days from the 20th June, user 34 made an order on 5 of those days. Then in the last 10 days from the 21st June, user 34 made an order on 6 of those days
In the end the table should be like this:
date user_id no_of_days
----------- --------- ------------
20-JUN-20 34 5
20-JUN-20 123 10
20-JUN-20 533 2
20-JUN-20 238 3
21-JUN-20 34 6
21-JUN-20 123 10
How would the query be written for this kind of analysis?
Please let me know if my question is unclear/more infor is required.
Thanks to you in advancement.
You can use window functions for this. Start by getting one row per user per day. And then use a rolling sum:
select day, user_id,
count(*) over (partition by user_id range between interval '10' day preceding and current row)
from (select distinct trunc(order_timestamp) as day, user_id
from t
) t
Assuming that a user places one order a day maximum, you can use window functions as follows:
select
t.*,
count(*) over(partition by user_id order by trunc(order_timestamp) range 10 preceding) no_of_days
from mytable t
Otherwise, you can get the distinct orders per day first:
select
order_day,
user_id,
count(*) over(partition by user_id order by order_day range 10 preceding) no_of_days
from (select distinct trunc(order_timestamp) order_day, user_id from mytable) t

Oracle sql: Order by with GROUP BY ROLLUP

I'm looking everywhere for an answer but nothing seems to compare with my problem. So, using rollup with query:
select year, month, count (sale_id) from sales
group by rollup (year, month);
Will give the result like:
YEAR MONTH TOTAL
2015 1 200
2015 2 415
2015 null 615
2016 1 444
2016 2 423
2016 null 867
null null 1482
And I would like to sort by total desc, but I would like year with biggest total to be on top (important: with all records that compares to that year), and then other records for other years. So I would like it to look like:
YEAR MONTH TOTAL
null null 1482
2016 null 867
2016 1 444
2016 2 423
2015 null 615
2015 2 415
2015 1 200
Or something like that. Main purpose is to not "split" records comparing to one year while sorting it with total. Can somebody help me with that?
Try using window function max to get max of total for each year in the order by clause:
select year, month, count(sale_id) total
from sales
group by rollup(year, month)
order by max(total) over (partition by year) desc, total desc;
Hmmm. I think this does what you want:
select year, month, count(sale_id) as cnt
from sales
group by rollup (year, month)
order by sum(count(sale_id)) over (partition by year) desc, year;
Actually, I've never use window functions in an order by with a rollup query. I wouldn't be surprised if a subquery were necessary.
I think you need to used GROUPING SETS and GROUP_ID's. These will help you determine a NULL caused by a subtotal. Take a look at the doc: https://docs.oracle.com/cd/B19306_01/server.102/b14223/aggreg.htm

How to get top records of subset?

Say I have a table
StoreID TotalSales Month Year
-- ---------- ----- ----
1 10 1 2012
2 2 1 2012
3 15 1 2012
1 4 2 2012
2 5 2 2012
I need: For each unique "Month/Year", grab the top two StoreID's with the highest Sales.
I'm at a loss on how to do this. I tried with a cross apply but that doesn't seem to work. This is all way over my head so hopefully someone can give me a nudge in the right direction.
This query uses Common Table Expression and Window Function to be able to get all the columns within the row. It works on SQL Server 2005 and up
WITH records
AS
(
SELECT StoreID, TotalSales , Month, Year,
DENSE_RANK() OVER (PARTITION BY Month, Year
ORDER BY TotalSales DESC) rn
FROM tableName
)
SELECT StoreID, TotalSales , Month, Year
FROM records
WHERE rn <= 2
SQLFiddle Demo