SQL: select rows containing all of the values in an interval

I know the question is poorly worded, and I'm sorry; I can't really put this problem into words. Here is a representation:
I have two tables: products and availability. A product can have multiple dates on which it is available. Example:
Table 1 (products):
id | name | ....
----------------------------------
1 | My product 1 | ....
2 | My product 2 | ....
Table 2 (availability):
id | productId | date
-----------------------------------------
1 | 1 | 2021-01-15
2 | 1 | 2021-01-16
3 | 1 | 2021-01-17
4 | 2 | 2021-01-15
5 | 2 | 2021-01-16
Is there an SQL statement that, given an interval, fetches the list of products that have a row in the availability table for every date in the interval?
For example, given the interval [2021-01-15 -> 2021-01-17], the query should return product 1 because it's available during the entire period (it has a row for each date: the 15th, 16th and 17th). Product 2 isn't returned because it's not available on 2021-01-17.
Is there a way to do this in SQL or do I have to use PL/SQL?
Any help is appreciated,
Thanks

You can use an analytic function as follows:
select distinct t.id, t.name -- one row per product, not one per availability row
from (select p.id, p.name,
count(distinct a.date) over (partition by a.productid) as cnt
from products p
join availability a on a.productid = p.id
where a.date >= date '2021-01-15'
and a.date < date '2021-01-17' + 1) t
where t.cnt = date '2021-01-17' - date '2021-01-15' + 1

Finally, I came up with this; thanks @Popeye for the inspiration.
select occurence.pid from
(
select a.product_id as pid, count(distinct a.date::date) as cnt
from availability a
where a.date >= '2021-01-15'
and a.date < '2021-01-17'::date + 1
group by a.product_id
) as occurence
where cnt = '2021-01-17'::date - '2021-01-15'::date + 1;
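To try the counting idea locally, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in dialect. The date arithmetic uses julianday() instead of date subtraction, and the check moves into a HAVING clause. The helper name products_available_every_day is hypothetical. The sketch assumes the date column holds plain ISO dates without a time part.

```python
import sqlite3


def products_available_every_day(conn, start, end):
    """Return ids of products that have an availability row for every day in [start, end]."""
    # COUNT(DISTINCT date) must equal the number of calendar days in the interval;
    # julianday(end) - julianday(start) + 1 is that number in SQLite.
    sql = """
        SELECT productId
        FROM availability
        WHERE date BETWEEN :start AND :end
        GROUP BY productId
        HAVING COUNT(DISTINCT date) = julianday(:end) - julianday(:start) + 1
        ORDER BY productId
    """
    return [row[0] for row in conn.execute(sql, {"start": start, "end": end})]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE availability (id INTEGER PRIMARY KEY, productId INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO availability (productId, date) VALUES (?, ?)",
    [(1, "2021-01-15"), (1, "2021-01-16"), (1, "2021-01-17"),
     (2, "2021-01-15"), (2, "2021-01-16")],
)
print(products_available_every_day(conn, "2021-01-15", "2021-01-17"))  # [1]
```

Because of COUNT(DISTINCT ...), duplicate rows for the same product and day do not inflate the count.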

Related

Aggregate functions based on current Row value

I am working with data similar to the following:
week | product | sale
1 | ABC | 2
1 | ABC | 1
2 | ABC | 1
3 | ABC | 5
4 | ABC | 1
2 | DEF | 5
Let's say this is my orders table, named tblOrders. In each row, I want to show the total sales from the previous week for that product. For instance, if I am on week 2 of product "ABC", I need to show the aggregated sales amount of week 1 for product ABC. So the output should look something like this:
week | product | sale | ProductPreviousWeekSales
1 | ABC | 2 | 0
1 | ABC | 1 | 0
2 | ABC | 1 | 3
3 | ABC | 5 | 1
4 | ABC | 1 | 5
2 | DEF | 5 | 0
I was originally thinking I could solve this using aggregates and window functions, but that doesn't seem to work. Another thought was to use a conditional aggregate, something like sum(case when x=currentRow.x then sale else 0 end), but that wouldn't work either.
Here is the SQLFiddle for above sample - http://sqlfiddle.com/#!18/890b7/2
Note: I need to calculate a similar value for the last 4 weeks, so I'm trying to avoid doing this with a sub-query or multiple joins (if possible). The data set I am working with is very large, and I don't want to add too much performance overhead with this change.
Here is one approach which first aggregates your table in a separate CTE and uses LAG to find the previous week's amount, for each week and product:
WITH cte AS (
SELECT week, product,
LAG(SUM(sale)) OVER (PARTITION BY product ORDER BY week) AS lag_total_sales
FROM yourTable
GROUP BY week, product
)
SELECT t1.week, t1.product, t1.sale,
COALESCE(t2.lag_total_sales, 0) AS ProductPreviousWeekSales
FROM yourTable t1
INNER JOIN cte t2
ON t2.week = t1.week AND
t2.product = t1.product
ORDER BY
t1.product,
t1.week;
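Below is a minimal sketch of the same idea run in SQLite through Python's sqlite3 module. It is a slightly restructured variant: the weekly totals and the LAG are computed in two separate CTEs instead of nesting SUM inside LAG, and SQLite stands in for SQL Server. LAG returns the previous *existing* week for a product. If a week is missing (say, week 3), week 4 would pick up week 2's total rather than 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblOrders (week INTEGER, product TEXT, sale INTEGER)")
conn.executemany(
    "INSERT INTO tblOrders VALUES (?, ?, ?)",
    [(1, "ABC", 2), (1, "ABC", 1), (2, "ABC", 1), (3, "ABC", 5), (4, "ABC", 1), (2, "DEF", 5)],
)

SQL = """
WITH weekly AS (      -- one row per (product, week) holding that week's total
    SELECT product, week, SUM(sale) AS total
    FROM tblOrders
    GROUP BY product, week
),
lagged AS (           -- total of the preceding week row within the same product
    SELECT product, week,
           LAG(total) OVER (PARTITION BY product ORDER BY week) AS prev_total
    FROM weekly
)
SELECT o.week, o.product, o.sale,
       COALESCE(l.prev_total, 0) AS ProductPreviousWeekSales
FROM tblOrders o
JOIN lagged l ON l.product = o.product AND l.week = o.week
ORDER BY o.product, o.week, o.sale DESC
"""

rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```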
Demo
DISCLAIMER
The query I am showing below doesn't work in SQL Server, unfortunately. Up to and including SQL Server 2019, the DBMS lacks full support for the RANGE clause, which is essential for the query to work. Running the query in SQL Server results in
Msg 4194 Level 16 State 1 Line 1 RANGE is only supported with UNBOUNDED and CURRENT ROW window frame delimiters.
I am not deleting this answer, because this is standard SQL and the approach may help future readers. It runs fine in many DBMSs, and maybe a future version of SQL Server will be able to handle it too. I've added demos to show that it runs in PostgreSQL, MySQL and Oracle, but fails in SQL Server 2019.
ORIGINAL ANSWER
Your query shown in the fiddle (select a.*, sum(sale) over(partition by product) ProductPreviousWeekSales from tblOrder a) is merely lacking the appropriate windowing clause. As you are dealing with ties here (more than one row per product and week) this needs to be a RANGE clause:
select a.*,
sum(sale) over(partition by product
order by week range between 1 preceding and 1 preceding
) as ProductPreviousWeekSales
from tblOrder a
order by product, week;
(Use COALESCE if you want to see a zero instead of NULL.)
Demos:
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=149eddbff82500d539b2c615f4167cff
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a8453970efac08ad69275914910bb13e
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=64ed21150142caa0acb7f8c7ca7d9022
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=149eddbff82500d539b2c615f4167cff
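SQLite also accepts numeric RANGE offsets (version 3.28.0 or later is assumed here), so the standard-SQL query can be tried locally with Python's sqlite3 module. This is a minimal sketch, not one of the demos above. The frame looks at week *values*, not row positions. A product with no rows in week − 1 therefore gets an empty frame: SUM returns NULL, and COALESCE turns it into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblOrder (week INTEGER, product TEXT, sale INTEGER)")
conn.executemany(
    "INSERT INTO tblOrder VALUES (?, ?, ?)",
    [(1, "ABC", 2), (1, "ABC", 1), (2, "ABC", 1), (3, "ABC", 5), (4, "ABC", 1), (2, "DEF", 5)],
)

# Frame = all rows of the same product whose week equals the current week minus 1.
SQL = """
SELECT week, product, sale,
       COALESCE(SUM(sale) OVER (PARTITION BY product
                                ORDER BY week
                                RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING), 0)
           AS ProductPreviousWeekSales
FROM tblOrder
ORDER BY product, week, sale DESC
"""

rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```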
You can do it as follows:
; WITH cteorder AS
(
-- one row per product and week
SELECT DISTINCT product, week FROM dbo.tblOrder
)
SELECT
cte.product,
cte.week,
-- sum of the previous week's sales; 0 when there were none
ISNULL(SUM(b.sale), 0) AS ProductPreviousWeekSales
FROM cteorder cte
LEFT JOIN dbo.tblOrder b ON b.product = cte.product AND b.week = cte.week - 1
GROUP BY cte.product,
cte.week
You can run it here: Fiddle
You need to select from tblOrders twice: once grouping by week and product and summing the sales, and a second time as a row-by-row scan of tblOrders. The scan is left-joined with the grouping query on the same product and the week offset by 1:
If no matching row is found, the sales value from the joined grouping query is NULL. You can replace the NULL with 0 using COALESCE(), but ISNULL() may well be faster: it takes a fixed number of parameters, while COALESCE() takes a variable argument list, which comes at a certain cost.
WITH
tblorders(wk,product,sales) AS (
SELECT 1,'ABC',2
UNION ALL SELECT 1,'ABC',1
UNION ALL SELECT 2,'ABC',1
UNION ALL SELECT 3,'ABC',5
UNION ALL SELECT 4,'ABC',1
UNION ALL SELECT 2,'DEF',5
)
,
grp AS (
SELECT
wk
, product
, SUM(sales) AS sales
FROM tblorders
GROUP BY
wk
, product
)
SELECT
o.wk
, o.product
, o.sales
, ISNULL(g.sales,0) AS productpreviousweeksales
FROM tblorders o
LEFT
JOIN grp g
ON o.wk - 1 = g.wk
AND o.product= g.product
ORDER BY 2,1
;
wk | product | sales | productpreviousweeksales
----+---------+-------+--------------------------
1 | ABC | 2 | 0
1 | ABC | 1 | 0
2 | ABC | 1 | 3
3 | ABC | 5 | 1
4 | ABC | 1 | 5
2 | DEF | 5 | 0

SQL Performance Inner Join

Let me ask you something I've been thinking about for a while. Imagine that you have two tables with data:
MAIN TABLE (A)
| ID | Date |
|:-----------|------------:|
| 1 | 01-01-1990|
| 2 | 01-01-1991|
| 3 | 01-01-1992|
| 4 | 01-01-2000|
| 5 | 01-01-2001|
| 6 | 01-01-2003|
SECONDARY TABLE (B)
| ID | Date | TOTAL |
|:-----------|------------:|--------:|
| 1 | 01-01-1990| 1 |
| 2 | 01-01-1991| 2 |
| 3 | 01-01-1992| 1 |
| 4 | 01-01-2000| 5 |
| 5 | 01-01-2001| 8 |
| 6 | 01-01-2003| 7 |
Suppose you want to select only the IDs with a date later than 31-12-1999 and get the columns ID, Date and Total. There are many ways to do this, but my question is which of the following would perform better:
OPTION 1
With main as(
select id,
date
from A
where date > '31-12-1999'
)
select main.id,
main.date,
B.total
from main inner join B on main.id = b.id
OPTION 2
With main as(
select id,
date
from A
where date > '31-12-1999'
),
secondary as (
select id,
total
from B
where date > '31-12-1999'
)
select main.id,
main.date,
secondary.total
from main inner join secondary on main.id = secondary.id
Which of both queries would be better in terms of performance? Thanks in advance!
The Date column means the same thing in both tables.
You don't need a CTE; you can join the two tables directly:
select A.id,
A.date,
B.total
from A inner join B on A.id = b.id
where A.date > '31-12-1999'
You would need to test on your data. But there is really no need for CTEs:
select a.id, a.date, b.total
from a inner join
b
on a.id = b.id
where a.date > '1999-12-31' and b.date > '1999-12-31';
As for your specific question, the two queries are not equivalent: the first filters only on the date in A, while the second filters on the dates in both A and B. You should run the query that implements the logic you intend.
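One way to "test on your data" is to run both forms and compare the plans and results. Here is a minimal sketch using SQLite via Python's sqlite3 as a stand-in DBMS. The dates are rewritten as ISO yyyy-mm-dd strings, because comparing 'dd-mm-yyyy' strings does not order them chronologically. The EXPLAIN QUERY PLAN output differs between SQLite versions, so only the query results are compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, date TEXT)")
conn.execute("CREATE TABLE B (id INTEGER PRIMARY KEY, date TEXT, total INTEGER)")
# Sample rows from the question, with dates rewritten as ISO strings
dates = ["1990-01-01", "1991-01-01", "1992-01-01", "2000-01-01", "2001-01-01", "2003-01-01"]
totals = [1, 2, 1, 5, 8, 7]
conn.executemany("INSERT INTO A VALUES (?, ?)", list(enumerate(dates, start=1)))
conn.executemany(
    "INSERT INTO B VALUES (?, ?, ?)",
    [(i, d, t) for i, (d, t) in enumerate(zip(dates, totals), start=1)],
)

queries = {
    "cte": """
        WITH main AS (SELECT id, date FROM A WHERE date > '1999-12-31')
        SELECT main.id, main.date, B.total
        FROM main INNER JOIN B ON main.id = B.id
        ORDER BY main.id""",
    "direct": """
        SELECT A.id, A.date, B.total
        FROM A INNER JOIN B ON A.id = B.id
        WHERE A.date > '1999-12-31'
        ORDER BY A.id""",
}

results = {}
for name, sql in queries.items():
    # EXPLAIN QUERY PLAN shows the strategy SQLite picked for this statement
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(name, step)
    results[name] = conn.execute(sql).fetchall()
```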

Join table A on table B and select only the first occurrence from B after specific date from table A

I'm trying to determine the best way to do the following. Table a has a specific start_date. Table b has a bunch of dollar amounts with various dates, based on which payments were received and when. I only want to show the row from table b with the first date >= the start_date from table a. I also do not want to retrieve duplicate ID numbers, which is what I am getting now.
I have something like this so far:
Select a.ID, a.Start_Date
From a
Left Join (Select ID, Min(Recd_Dt) as Mindate, Total_Recd
From b
Group by ID, Total_Recd) b on a.ID = b.ID and a.Start_Date <= b.Mindate
Table a looks like this:
ID | Start_Dt
1 | 11/2/2017
2 | 11/3/2017
Table b looks like this:
ID | Recd_Dt | Total_Recd
1 | 11/1/2017 | $600
1 | 11/10/2017 | $800
1 | 11/19/2017 | $100
2 | 11/2/2017 | $200
2 | 11/5/2017 | $600
2 | 11/6/2017 | $100
I'd like to see something like this:
ID | Recd_Dt | Total_Recd | Sum_of_Total_Recd_After_Start
1 | 11/10/2017 | $800 | $900
2 | 11/5/2017 | $600 | $700
Furthermore, I'd like a second join on the same table b that gives me the sum of all amounts received after the Start_Date.
Give this a try:
SELECT
a.ID,
b.Recd_Dt,
b.Total_Recd,
SUM(Total_Recd) OVER(PARTITION BY a.ID) AS Sum_of_Total_Recd_After_Start
FROM a
INNER JOIN b ON a.ID = b.ID AND b.Recd_Dt > a.Start_Dt
QUALIFY ROW_NUMBER() OVER(PARTITION BY a.ID ORDER BY b.Recd_Dt) = 1
1) Get all rows from table "a"
2) Get related rows from table "b" with Recd_Dt > Start_Dt
3) ROW_NUMBER orders rows by the earliest Recd_Dt for each ID
4) QUALIFY ... = 1 keeps only the first row per ID grouping
5) SUM(Total_Recd) adds up the Total_Recd column per each ID grouping
I haven't tested it, but let me know if it works.
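QUALIFY is not available in every DBMS (SQLite, for example, does not have it). A common substitute is to compute ROW_NUMBER in a derived table and filter on it outside. Below is a minimal sketch in SQLite via Python's sqlite3. It uses ISO date strings and the question's `>=` condition; both are assumptions about the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (ID INTEGER, Start_Dt TEXT)")
conn.execute("CREATE TABLE b (ID INTEGER, Recd_Dt TEXT, Total_Recd INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "2017-11-02"), (2, "2017-11-03")])
conn.executemany("INSERT INTO b VALUES (?, ?, ?)", [
    (1, "2017-11-01", 600), (1, "2017-11-10", 800), (1, "2017-11-19", 100),
    (2, "2017-11-02", 200), (2, "2017-11-05", 600), (2, "2017-11-06", 100),
])

SQL = """
SELECT ID, Recd_Dt, Total_Recd, Sum_of_Total_Recd_After_Start
FROM (
    SELECT a.ID, b.Recd_Dt, b.Total_Recd,
           -- total of all payments on or after the start date, per ID
           SUM(b.Total_Recd) OVER (PARTITION BY a.ID) AS Sum_of_Total_Recd_After_Start,
           -- 1 for the earliest payment on or after the start date, per ID
           ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY b.Recd_Dt) AS rn
    FROM a
    INNER JOIN b ON a.ID = b.ID AND b.Recd_Dt >= a.Start_Dt
) AS t
WHERE rn = 1          -- plays the role of QUALIFY ... = 1
ORDER BY ID
"""

rows = conn.execute(SQL).fetchall()
print(rows)  # [(1, '2017-11-10', 800, 900), (2, '2017-11-05', 600, 700)]
```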

How to fill missing dates by groups in a table in sql

I want to know how to use loops in SQL to fill in missing dates with the value zero, based on each group's start and end dates, so that I have a consecutive time series in each group. I have two questions:
How do I loop over each group?
How do I use each group's start/end dates to fill in the missing dates dynamically?
My input and expected output are listed as below.
Input: I have a table A like
date value grp_no
8/06/12 1 1
8/08/12 1 1
8/09/12 0 1
8/07/12 2 2
8/08/12 1 2
8/12/12 3 2
Also I have a table B which can be used to left join with A to fill in missing dates.
date
...
8/05/12
8/06/12
8/07/12
8/08/12
8/09/12
8/10/12
8/11/12
8/12/12
8/13/12
...
How can I use A and B to generate the following output in sql?
Output:
date value grp_no
8/06/12 1 1
8/07/12 0 1
8/08/12 1 1
8/09/12 0 1
8/07/12 2 2
8/08/12 1 2
8/09/12 0 2
8/10/12 0 2
8/11/12 0 2
8/12/12 3 2
Please share your code and suggestions. Thank you so much in advance!
You can do it without loops, like this:
SELECT p.date, COALESCE(a.value, 0) value, p.grp_no
FROM
(
SELECT grp_no, date
FROM
(
SELECT grp_no, MIN(date) min_date, MAX(date) max_date
FROM tableA
GROUP BY grp_no
) q CROSS JOIN tableb b
WHERE b.date BETWEEN q.min_date AND q.max_date
) p LEFT JOIN TableA a
ON p.grp_no = a.grp_no
AND p.date = a.date
The innermost subquery gets the min and max dates per group. The cross join with TableB then produces every date within the min-max range of each group. Finally, the outer select left-joins TableA and fills the value column with 0 for dates that are missing from TableA.
Output:
| DATE | VALUE | GRP_NO |
|------------|-------|--------|
| 2012-08-06 | 1 | 1 |
| 2012-08-07 | 0 | 1 |
| 2012-08-08 | 1 | 1 |
| 2012-08-09 | 0 | 1 |
| 2012-08-07 | 2 | 2 |
| 2012-08-08 | 1 | 2 |
| 2012-08-09 | 0 | 2 |
| 2012-08-10 | 0 | 2 |
| 2012-08-11 | 0 | 2 |
| 2012-08-12 | 3 | 2 |
Here is an SQLFiddle demo
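For a local check without SQLFiddle, here is a minimal sketch that runs essentially the same query in SQLite through Python's sqlite3. The only changes are table aliases and an ORDER BY. Dates are stored as ISO strings (an assumption), so BETWEEN and MIN/MAX compare them correctly. The calendar table tableB is filled from 2012-08-05 to 2012-08-13, matching the range shown in the question.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (date TEXT, value INTEGER, grp_no INTEGER)")
conn.execute("CREATE TABLE tableB (date TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO tableA VALUES (?, ?, ?)", [
    ("2012-08-06", 1, 1), ("2012-08-08", 1, 1), ("2012-08-09", 0, 1),
    ("2012-08-07", 2, 2), ("2012-08-08", 1, 2), ("2012-08-12", 3, 2),
])
# Calendar table: one row per day from 2012-08-05 through 2012-08-13
first_day = date(2012, 8, 5)
conn.executemany(
    "INSERT INTO tableB VALUES (?)",
    [((first_day + timedelta(days=i)).isoformat(),) for i in range(9)],
)

SQL = """
SELECT p.date, COALESCE(a.value, 0) AS value, p.grp_no
FROM (
    -- every calendar date between each group's first and last date
    SELECT q.grp_no, b.date
    FROM (SELECT grp_no, MIN(date) AS min_date, MAX(date) AS max_date
          FROM tableA
          GROUP BY grp_no) AS q
    CROSS JOIN tableB AS b
    WHERE b.date BETWEEN q.min_date AND q.max_date
) AS p
LEFT JOIN tableA AS a ON p.grp_no = a.grp_no AND p.date = a.date
ORDER BY p.grp_no, p.date
"""

rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```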
I just needed a query that returns all the dates in the period I wanted, without the joins. I thought I'd share it for anyone who wants to use it in their own query. Just change the 365 to whatever time frame you want.
DECLARE @s DATE = GETDATE()-365, @e DATE = GETDATE();
SELECT TOP (DATEDIFF(DAY, @s, @e)+1)
DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY number)-1, @s)
FROM [master].dbo.spt_values
WHERE [type] = N'P' ORDER BY number
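master.dbo.spt_values exists only in SQL Server. In a DBMS without it, a recursive CTE is a common way to generate the same list of dates; note that this is a different technique from the one above. Here is a minimal sketch in SQLite via Python's sqlite3. The helper name date_series is hypothetical, and the start date is assumed not to be later than the end date.

```python
import sqlite3


def date_series(conn, start, end):
    """Return every ISO date from start to end inclusive, built with a recursive CTE."""
    sql = """
    WITH RECURSIVE days(d) AS (
        SELECT date(:start)                                   -- anchor: first day
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < date(:end) -- step one day at a time
    )
    SELECT d FROM days ORDER BY d
    """
    return [row[0] for row in conn.execute(sql, {"start": start, "end": end})]


conn = sqlite3.connect(":memory:")
print(date_series(conn, "2012-08-06", "2012-08-09"))
# ['2012-08-06', '2012-08-07', '2012-08-08', '2012-08-09']
```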
The following query does a union with tableA and tableB. It then uses group by to merge the rows from tableA and tableB so that all of the dates from tableB are in the result. If a date is not in tableA, then the row has 0 for value and grp_no. Otherwise, the row has the actual values for value and grp_no.
select
dat,
sum(val),
sum(grp)
from
(
select
date as dat,
value as val,
grp_no as grp
from
tableA
union
select
date,
0,
0
from
tableB
where
date >= date '2012-08-06' and
date <= date '2012-08-13'
) as u
group by
dat
order by
dat
I find this query easier to understand. It also runs faster: it takes 16 seconds, whereas a similar right join query takes 32 seconds.
This solution only works with numerical data.
This solution assumes a fixed date range. With some extra work this query can be adapted to limit the date range to what is found in tableA.

SELECT set of most recent id, amount FROM table, where id occurs many times

I have a table recording the amount of data transferred by a given service on a given date. One record is entered daily for a given service.
I'd like to be able to retrieve the most recent amount for a set of services.
Example data set:
serviceId | amount | date
-------------------------------
1 | 8 | 2010-04-12
2 | 11 | 2010-04-12
2 | 14 | 2010-04-11
3 | 9 | 2010-04-11
1 | 6 | 2010-04-10
2 | 5 | 2010-04-10
3 | 22 | 2010-04-10
4 | 17 | 2010-04-19
Desired response (service ids 1,2,3):
serviceId | amount | date
-------------------------------
1 | 8 | 2010-04-12
2 | 11 | 2010-04-12
3 | 9 | 2010-04-11
Desired response (service ids 2, 4):
serviceId | amount | date
-------------------------------
2 | 11 | 2010-04-12
4 | 17 | 2010-04-19
This is equivalent to running the following once per serviceId:
SELECT serviceId, amount, date
FROM table
WHERE serviceId = <given serviceId>
ORDER BY date DESC
LIMIT 0,1
I understand how to retrieve the data I want with X queries. I'm interested in how to retrieve the same data with a single query, or at least with fewer than X queries.
I'm very interested to see what might be the most efficient approach. The table currently contains 28809 records.
I appreciate that there are other questions that cover selecting the most recent set of records. I have examined three such questions but have been unable to apply the solutions to my problem.
select m.*
from (
select serviceId, max(date) as MaxDate
from MyTable
group by serviceId
) mm
inner join MyTable m on mm.serviceId = m.serviceId and mm.MaxDate = m.date
If you wish to filter by serviceId, you can do:
select m.*
from (
select serviceId, max(date) as MaxDate
from MyTable
where serviceId in (1, 2, 3)
group by serviceId
) mm
inner join MyTable m on mm.serviceId = m.serviceId and mm.MaxDate = m.date
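Below is a minimal sketch of the filtered version running in SQLite via Python's sqlite3, with the service ids passed as bound parameters. The helper name latest is hypothetical, and the sketch assumes a non-empty id list. If a service had two rows with the same maximum date, both rows would be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (serviceId INTEGER, amount INTEGER, date TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    (1, 8, "2010-04-12"), (2, 11, "2010-04-12"), (2, 14, "2010-04-11"),
    (3, 9, "2010-04-11"), (1, 6, "2010-04-10"), (2, 5, "2010-04-10"),
    (3, 22, "2010-04-10"), (4, 17, "2010-04-19"),
])


def latest(ids):
    """Most recent (serviceId, amount, date) row for each requested serviceId."""
    placeholders = ", ".join("?" for _ in ids)  # one ? per requested id
    sql = f"""
        SELECT m.serviceId, m.amount, m.date
        FROM (SELECT serviceId, MAX(date) AS MaxDate   -- latest date per service
              FROM MyTable
              WHERE serviceId IN ({placeholders})
              GROUP BY serviceId) AS mm
        INNER JOIN MyTable AS m
                ON mm.serviceId = m.serviceId AND mm.MaxDate = m.date
        ORDER BY m.serviceId
    """
    return conn.execute(sql, list(ids)).fetchall()


print(latest([1, 2, 3]))  # [(1, 8, '2010-04-12'), (2, 11, '2010-04-12'), (3, 9, '2010-04-11')]
print(latest([2, 4]))     # [(2, 11, '2010-04-12'), (4, 17, '2010-04-19')]
```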
SELECT serviceId, amount, date
FROM table as t
WHERE NOT EXISTS (
SELECT * FROM table as x
WHERE t.serviceId = x.serviceID AND t.date < x.date
)
If you want to limit the result to certain serviceIds, then:
SELECT serviceId, amount, date
FROM table as t
WHERE NOT EXISTS (
SELECT * FROM table as x
WHERE t.serviceId = x.serviceID AND t.date < x.date
)
AND serviceId in (2, 4)
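The NOT EXISTS form can be checked the same way. Here is a minimal sketch in SQLite via Python's sqlite3. Because `table` is a reserved word, the table is given the hypothetical name services. Like the join version, this returns several rows for a service whose latest date appears more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (serviceId INTEGER, amount INTEGER, date TEXT)")
conn.executemany("INSERT INTO services VALUES (?, ?, ?)", [
    (1, 8, "2010-04-12"), (2, 11, "2010-04-12"), (2, 14, "2010-04-11"),
    (3, 9, "2010-04-11"), (1, 6, "2010-04-10"), (2, 5, "2010-04-10"),
    (3, 22, "2010-04-10"), (4, 17, "2010-04-19"),
])

SQL = """
SELECT t.serviceId, t.amount, t.date
FROM services AS t
WHERE NOT EXISTS (          -- keep t only if no later row exists for the same service
    SELECT 1 FROM services AS x
    WHERE x.serviceId = t.serviceId AND t.date < x.date
)
AND t.serviceId IN (2, 4)
ORDER BY t.serviceId
"""

rows = conn.execute(SQL).fetchall()
print(rows)  # [(2, 11, '2010-04-12'), (4, 17, '2010-04-19')]
```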