Eliminating a whole group which doesn't meet a minimum date - SQL Spark

I have a table with order ID, country, order date, product name and quantity. As you can see, one order ID is made up of several products/records spread across different dates (dates are dd/mm/yyyy). I need my query to return only the records of orders whose records all have an order date later than 6/11/2022. For example, the query should eliminate order 222 entirely, because at least one of its records is earlier than 6/11/2022, and the same goes for order 111. Only order 333 meets the criterion. I'm trying to group by order ID and country and then eliminate the whole order according to the criterion. The issue is that my query only eliminates the specific records that are earlier than 6/11/2022, not all of the order's records:
code:
select order ID, order date, product, quantity from Orders table
group by order ID, country
HAVING MIN(order date) > '6/11/2022'
Orders table:
order Id | country | order date | product | quantity
222 | UK | 05/11/2022 | keyboard | 2
222 | UK | 05/11/2022 | motherboard | 2
222 | UK | 07/11/2022 | wireless mouse | 1
111 | Germany | 08/11/2022 | game console | 5
111 | Germany | 05/10/2022 | mini keyboard | 3
111 | Germany | 08/10/2022 | 5 mini discs bundle | 1
111 | Germany | 10/10/2022 | backup disc | 5
333 | France | 09/12/2022 | backup disc | 2
333 | France | 10/12/2022 | backup disc | 1
Query desired result:
order Id | country | order date | product | quantity
333 | France | 09/12/2022 | backup disc | 2
333 | France | 10/12/2022 | backup disc | 1
The results I'm getting (not the desired results):
order Id | country | order date | product | quantity
222 | UK | 07/11/2022 | wireless mouse | 1
111 | Germany | 08/11/2022 | game console | 5
333 | France | 09/12/2022 | backup disc | 2
333 | France | 10/12/2022 | backup disc | 1

You can use a window function to compute the earliest date per order/country tuple on each row, then use that information to filter the dataset (assuming the date column is named order_date and is stored as a date):
select *
from (
    select o.*,
           min(order_date) over(partition by order_id, country) as min_date
    from orders o
) o
where min_date > date '2022-11-06'
This scans the table only once, so it should be more efficient than a group by/join solution.
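To try the window-function filter on the sample data, here is a minimal sketch that runs the same idea in SQLite through Python's sqlite3 module instead of Spark. The table layout, the snake_case column names and the ISO date strings are assumptions made for the sketch, and window functions need SQLite 3.25 or later.

```python
import sqlite3


def orders_entirely_after(cutoff):
    """Return rows of orders whose earliest order_date is later than cutoff (ISO string)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table orders "
        "(order_id int, country text, order_date text, product text, quantity int)"
    )
    # Sample rows from the question, with dates rewritten as ISO strings (assumption)
    conn.executemany(
        "insert into orders values (?, ?, ?, ?, ?)",
        [
            (222, "UK", "2022-11-05", "keyboard", 2),
            (222, "UK", "2022-11-05", "motherboard", 2),
            (222, "UK", "2022-11-07", "wireless mouse", 1),
            (111, "Germany", "2022-11-08", "game console", 5),
            (111, "Germany", "2022-10-05", "mini keyboard", 3),
            (111, "Germany", "2022-10-08", "5 mini discs bundle", 1),
            (111, "Germany", "2022-10-10", "backup disc", 5),
            (333, "France", "2022-12-09", "backup disc", 2),
            (333, "France", "2022-12-10", "backup disc", 1),
        ],
    )
    query = """
        select order_id, country, order_date, product, quantity
        from (
            select o.*,
                   min(order_date) over (partition by order_id, country) as min_date
            from orders o
        ) o
        where min_date > ?
        order by order_id, order_date
    """
    rows = conn.execute(query, (cutoff,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in orders_entirely_after("2022-11-06"):
        print(row)
```

Because every row carries its order's minimum date, a single early record is enough to drop the whole order, which is what the plain GROUP BY attempt could not do.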

Related

MS Access SQL, How to return only the newest row before a given date joined to a master table

I have two tables in a MS Access database as shown below. CustomerId is a primary key and fkCustomerId is a foreign key linked to the CustomerId in the other table.
Customer table
CustomerId | Name
1 | John
2 | Bob
3 | David
Purchase table
fkCustomerId | OrderDate | fkStockId
1 | 01/02/2010 | 100
3 | 08/07/2010 | 101
2 | 14/01/2011 | 102
2 | 21/10/2011 | 103
3 | 02/03/2012 | 104
1 | 30/09/2012 | 105
3 | 01/01/2013 | 106
1 | 18/04/2014 | 107
3 | 22/11/2015 | 108
I am trying to return a list of customers showing the last fkStockId for each customer ordered before a given date.
So for the date 01/10/2012, I'd be looking for a return of
fkCustomerId | Name | fkStockId
1 | John | 105
2 | Bob | 103
3 | David | 104
A solution seems to be escaping me, any help would be greatly appreciated.
You can use a nested select to get the last order date for each customer.
SELECT Purchase.fkCustomerId,
       Customer.Name,
       Purchase.fkStockId
FROM (Purchase
INNER JOIN
(
    SELECT fkCustomerId,
           MAX(OrderDate) AS last_OrderDate
    FROM Purchase
    WHERE OrderDate < #2012-10-01#
    GROUP BY fkCustomerId
) AS lastOrder
ON (lastOrder.fkCustomerId = Purchase.fkCustomerId)
AND (lastOrder.last_OrderDate = Purchase.OrderDate))
LEFT JOIN Customer
ON Customer.CustomerId = Purchase.fkCustomerId
This example uses 1 October 2012 as the cutoff; change the date literal if you want to filter by a different value. Since the question targets MS Access, the query is written with INNER JOIN, parentheses around the first join, and a #...# date literal.
Another assumption is that there's only one corresponding fkStockId for each OrderDate of a customer.
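If you want to check the logic outside Access, here is a minimal sketch of the same nested-select join run in SQLite through Python's sqlite3 module. It is not Access syntax: the ISO date strings, the table aliases and the function name are assumptions made for the sketch.

```python
import sqlite3


def last_stock_before(cutoff):
    """Return (fkCustomerId, Name, fkStockId) for each customer's last purchase before cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table Customer (CustomerId integer primary key, Name text);
        create table Purchase (fkCustomerId integer, OrderDate text, fkStockId integer);
        """
    )
    conn.executemany(
        "insert into Customer values (?, ?)",
        [(1, "John"), (2, "Bob"), (3, "David")],
    )
    # Sample rows from the question, with dd/mm/yyyy dates rewritten as ISO strings
    conn.executemany(
        "insert into Purchase values (?, ?, ?)",
        [
            (1, "2010-02-01", 100),
            (3, "2010-07-08", 101),
            (2, "2011-01-14", 102),
            (2, "2011-10-21", 103),
            (3, "2012-03-02", 104),
            (1, "2012-09-30", 105),
            (3, "2013-01-01", 106),
            (1, "2014-04-18", 107),
            (3, "2015-11-22", 108),
        ],
    )
    query = """
        select p.fkCustomerId, c.Name, p.fkStockId
        from Purchase p
        join (
            -- latest order date per customer, strictly before the cutoff
            select fkCustomerId, max(OrderDate) as last_OrderDate
            from Purchase
            where OrderDate < ?
            group by fkCustomerId
        ) as lastOrder
          on lastOrder.fkCustomerId = p.fkCustomerId
         and lastOrder.last_OrderDate = p.OrderDate
        left join Customer c on c.CustomerId = p.fkCustomerId
        order by p.fkCustomerId
    """
    rows = conn.execute(query, (cutoff,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(last_stock_before("2012-10-01"))
```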

Get the running unique count of items up to a given date, similar to a running total but instead a running unique count

I have a table with user shopping data, as shown below.
I want an output similar to a running total, but instead of a sum I want the running count of unique categories that the user has shopped for, by date.
I know I have to make use of ROWS PRECEDING AND FOLLOWING in the count function, but I am not able to use count(distinct category) in a window function.
Dt category userId
4/10/2022 Grocery 123
4/11/2022 Grocery 123
4/12/2022 MISC 123
4/13/2022 SERVICES 123
4/14/2022 RETAIl 123
4/15/2022 TRANSP 123
4/20/2022 GROCERY 123
Desired output
Dt userID number of unique categories
4/10/2022 123 1
4/11/2022 123 1
4/12/2022 123 2
4/13/2022 123 3
4/14/2022 123 4
4/15/2022 123 5
4/20/2022 123 5
Consider the approach below:
select Dt, userId,
( select count(distinct category)
from t.categories as category
) number_of_unique_categories
from (
select *, array_agg(lower(category)) over(partition by userId order by Dt) categories
from your_table
) t
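If applied to the sample data in your question, the output matches the desired output above; lower() makes Grocery and GROCERY count as the same category.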
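For engines without array_agg, a correlated subquery gives the same running distinct count. This is a swapped-in technique, not the query above: a minimal sketch in SQLite through Python's sqlite3 module, where the table name shopping, the snake_case column names and the ISO date strings are assumptions.

```python
import sqlite3


def running_unique_categories():
    """Return (dt, user_id, running count of distinct categories) per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table shopping (dt text, category text, user_id int)")
    # Sample rows from the question, dates rewritten as ISO strings (assumption)
    conn.executemany(
        "insert into shopping values (?, ?, ?)",
        [
            ("2022-04-10", "Grocery", 123),
            ("2022-04-11", "Grocery", 123),
            ("2022-04-12", "MISC", 123),
            ("2022-04-13", "SERVICES", 123),
            ("2022-04-14", "RETAIl", 123),
            ("2022-04-15", "TRANSP", 123),
            ("2022-04-20", "GROCERY", 123),
        ],
    )
    query = """
        select s.dt, s.user_id,
               (select count(distinct lower(p.category))
                from shopping p
                where p.user_id = s.user_id
                  and p.dt <= s.dt) as number_of_unique_categories
        from shopping s
        order by s.user_id, s.dt
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in running_unique_categories():
        print(row)
```

The subquery rescans the user's earlier rows for every output row, so it is likely slower than the window-based version on large tables.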

Find the most popular combinations SQL

I have 2 tables I want to join to explore the most popular combinations of location, by distinct id, ordered by count. I get location from l, date from d. The results from this join would be:
id loc_id location date
1 111 NYC 20200101
1 222 LA 20200102
2 111 NYC 20200103
2 333 LON 20200103
3 444 NYC 20200105
4 444 LA 20200106
4 555 PAR 20200107
5 111 NYC 20200110
5 222 LA 20200111
I would like to use STRING_AGG if possible, but I get an error with the WITHIN clause: 'expecting ')' but got WITHIN' (I'm on BigQuery for this). Here is what I've attempted so far:
SELECT t.combination, count(*) count
FROM (
SELECT
STRING_AGG(location, ',') WITHIN GROUP (ORDER BY d.date) combination
FROM location as l
JOIN date d
USING (loc_id)
GROUP BY id
) t
WHERE date BETWEEN 20190101 AND 20200228 GROUP BY t.combination
ORDER BY count DESC;
I want to end up with something like:
combination count
NYC, LA 3
NYC, LON 1
LA, PAR 1
NYC 1
If there's another method I'd be happy to change from string_agg.
The correct BQ syntax would be:
SELECT t.combination, count(*) count
FROM (SELECT STRING_AGG(location, ',' ORDER BY d.date) as combination
FROM location l JOIN
date d
USING (loc_id)
GROUP BY id
) t
WHERE date BETWEEN 20190101 AND 20200228
GROUP BY t.combination
ORDER BY count DESC;
Note that your JOIN condition still looks wrong. If you are working with dates, I would expect DATE constants. Your date filter also won't work in the outer query, because the inner query doesn't select the dates; you probably want the filtering in the inner query.
This answer does not address these issues.
BigQuery has quite good documentation. There is no WITHIN GROUP for STRING_AGG(); the ORDER BY goes inside the function call.
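To make the intended result concrete, here is a minimal sketch in plain Python (not BigQuery) that builds each id's date-ordered location string and counts the combinations, applying the date filter before aggregating as suggested above. The row tuples are the joined sample data from the question; the function name and the integer YYYYMMDD dates are assumptions.

```python
from collections import Counter, defaultdict

# Joined rows from the question: (id, loc_id, location, date as a YYYYMMDD integer)
ROWS = [
    (1, 111, "NYC", 20200101),
    (1, 222, "LA", 20200102),
    (2, 111, "NYC", 20200103),
    (2, 333, "LON", 20200103),
    (3, 444, "NYC", 20200105),
    (4, 444, "LA", 20200106),
    (4, 555, "PAR", 20200107),
    (5, 111, "NYC", 20200110),
    (5, 222, "LA", 20200111),
]


def combination_counts(rows, start, end):
    """Count date-ordered location combinations per id within [start, end]."""
    visits = defaultdict(list)
    for id_, _loc_id, location, date in rows:
        if start <= date <= end:  # filter rows before aggregating
            visits[id_].append((date, location))
    combos = Counter()
    for stops in visits.values():
        stops.sort(key=lambda stop: stop[0])  # stable sort by date only
        combos[",".join(location for _date, location in stops)] += 1
    return combos.most_common()


if __name__ == "__main__":
    for combination, count in combination_counts(ROWS, 20190101, 20200228):
        print(combination, count)
```

On these rows the NYC,LA combination appears for ids 1 and 5, so it is counted twice.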

Need Distinct address, ID etc with the different amount in one table using Plsql

Need help on the below scenario, please.
I want distinct address, ID, etc with the different amount in one table using plsql or
For example below is the current table
Address aRea zipcode ID Amount amount2 qua number
123 Howe's drive AL 1234 1234567 100 20 1 666666
123 Howe's drive AL 1234 1234567 5 05 2 abcccc
123 east drive AZ 456 8910112 200 11 1 777777
123 east drive AZ 456 8910112 5 5 2 SDN133
116 WOOD Ave NL 1234 2325890 3.23 1.25 1 10483210
116 WOOD Ave NL 1234 2325890 3.24 1.26 2 10483211
I need the output as below.
Address aRea zipcode ID Amount amount2 qua number
123 Howe's drive AL 1234 1234567 100 20 1 666666
5 05 2 abcccc
123 east drive AZ 456 8910112 200 11 1 777777
5 5 2 SDN133
116 WOOD Ave NL 1234 2325890 3.23 1.25 1 10483210
3.24 1.26 2 10483211
Below is for BigQuery Standard SQL.
I would recommend the approach below:
#standardSQL
SELECT address, area, zipcode, id,
ARRAY_AGG(STRUCT(amount, amount2, qua, number)) info
FROM `project.dataset.table`
GROUP BY address, area, zipcode, id
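If applied to the sample data from your question, the output has one row per address/area/zipcode/id group, with that group's amount, amount2, qua and number values nested in the info array.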
This type of task is usually better done on the application side.
You can do this with SQL, but you need a column to order the records consistently:
select
case when rn = 1 then address end as address,
case when rn = 1 then area end as area,
case when rn = 1 then zipcode end as zipcode,
case when rn = 1 then id end as id,
amount,
amount2,
qua,
number
from (
select
t.*,
row_number() over(
partition by address, area, zipcode, id
order by ??
) rn
from mytable t
) t
order by t.address, t.area, t.zipcode, t.id, ??
The partition by clause of the window function lists the columns that you want to "group" together; you can modify it to suit your needs.
The order by clause of row_number() indicates how rows should be sorted within the partition: you need to decide which column (or set of columns) you want to use. For the output to make sense, you also need an order by clause in the query that repeats the partition and ordering columns.
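Here is a minimal sketch of the row_number() approach, run in SQLite (3.25 or later) through Python's sqlite3 module rather than Oracle. Ordering by qua is an assumption, since the question does not say which column defines the order, and the helper column grp is introduced only for this sketch.

```python
import sqlite3


def blank_repeated_groups():
    """Show address/area/zipcode/id only on the first row of each group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (address text, area text, zipcode text, id text, "
        "amount real, amount2 real, qua int, number text)"
    )
    # Two of the groups from the question's sample data
    conn.executemany(
        "insert into mytable values (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            ("123 Howe's drive", "AL", "1234", "1234567", 100, 20, 1, "666666"),
            ("123 Howe's drive", "AL", "1234", "1234567", 5, 5, 2, "abcccc"),
            ("123 east drive", "AZ", "456", "8910112", 200, 11, 1, "777777"),
            ("123 east drive", "AZ", "456", "8910112", 5, 5, 2, "SDN133"),
        ],
    )
    query = """
        select case when rn = 1 then address end as address,
               case when rn = 1 then area end as area,
               case when rn = 1 then zipcode end as zipcode,
               case when rn = 1 then id end as id,
               amount, amount2, qua, number
        from (
            select t.*,
                   -- position of the row inside its group (qua is an assumed ordering)
                   row_number() over (
                       partition by address, area, zipcode, id order by qua
                   ) as rn,
                   -- one number per group, used only for the final sort
                   dense_rank() over (order by address, area, zipcode, id) as grp
            from mytable t
        ) t
        order by grp, rn
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in blank_repeated_groups():
        print(row)
```

Sorting on the helper columns grp and rn sidesteps any clash between the blanked output aliases (address, area, and so on) and the underlying columns of the same name.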

SQL oracle with joining tables and Max functions

Some help please? Just a noob here starting to learn how to write SQL and ran into this problem. I know how to use the MAX function but I can't figure out how to join all these requirements together. I have two tables, Accounts and Books (below is an example of the data)
Accounts
ID Series YesorNot Date Filed Plan Year
1 123 Yes 06/12/2015 2015
2 123 No 06/12/2015 2015
3 145 Yes 06/06/2015 2015
4 145 No 02/02/2015 2014
5 198 Yes 02/03/2015 2015
6 187 Yes 02/14/2013 2013
7 153 Yes 01/02/2011 2011
Books
Primary Key Date Created ID
1 06/13/2015 123
2 06/12/2015 123
3 06/07/2015 145
4 02/02/2015 145
5 02/03/2015 198
Two tables: Accounts and Books
Looking for:
1. Data that exists in both tables by the Project ID = Primary Key
2. I only want one unique Series (Series also = ID)
3. I want the MAX (most recent) value of Plan Year, and then if there are duplicates for Plan Year, I need the MAX (most recent) value of Date Created.
4. I just need the columns Project ID, Series, YesorNot, Date Filed, Plan Year so my output should be like this:
Project ID Series YesorNot Date Filed Plan Year
1 123 Yes 06/12/2015 2015
3 145 Yes 06/06/2015 2015
4 145 No 02/02/2015 2014
5 198 Yes 02/03/2015 2015
First join the tables:
SELECT B.Primary_Key as Project_ID, A.Series, A.YesorNot, A.Date_Filed, A.Plan_Year
FROM Books B
JOIN Accounts A ON B.ID = A.Series
You should have been able to get this far on your own (and you should have posted it as part of the question) -- if you can't, I'd say find a different career. Assuming you could, now for the slightly harder part.
Now we add a row number based on your criteria:
ROW_NUMBER() OVER (PARTITION BY B.Primary_Key, A.Series, A.YesorNot, A.Date_Filed ORDER BY A.Plan_Year DESC, B.Date_Created DESC) AS RN
Now just take the first row of each partition.
SELECT Project_ID, Series, YesorNot, Date_Filed, Plan_Year
FROM (
    SELECT B.Primary_Key AS Project_ID, A.Series, A.YesorNot, A.Date_Filed, A.Plan_Year,
           ROW_NUMBER() OVER (PARTITION BY B.Primary_Key, A.Series, A.YesorNot, A.Date_Filed
                              ORDER BY A.Plan_Year DESC, B.Date_Created DESC) AS RN
    FROM Books B
    JOIN Accounts A ON B.ID = A.Series
) X
WHERE RN = 1
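To see how ROW_NUMBER() with RN = 1 picks one row per group, here is a minimal sketch run in SQLite (3.25 or later) through Python's sqlite3 module rather than Oracle. It partitions by Series alone and adds A.ID as a tie-breaker; both are assumptions made to get exactly one row per series, so it does not reproduce the four-row output listed in the question. Dates are rewritten as ISO strings so they sort correctly as text.

```python
import sqlite3


def latest_row_per_series():
    """Return one joined row per Series: highest Plan_Year, then latest Date_Created."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table Accounts (ID int, Series int, YesorNot text, Date_Filed text, Plan_Year int);
        create table Books (Primary_Key int, Date_Created text, ID int);
        """
    )
    conn.executemany(
        "insert into Accounts values (?, ?, ?, ?, ?)",
        [
            (1, 123, "Yes", "2015-06-12", 2015),
            (2, 123, "No", "2015-06-12", 2015),
            (3, 145, "Yes", "2015-06-06", 2015),
            (4, 145, "No", "2015-02-02", 2014),
            (5, 198, "Yes", "2015-02-03", 2015),
            (6, 187, "Yes", "2013-02-14", 2013),
            (7, 153, "Yes", "2011-01-02", 2011),
        ],
    )
    conn.executemany(
        "insert into Books values (?, ?, ?)",
        [
            (1, "2015-06-13", 123),
            (2, "2015-06-12", 123),
            (3, "2015-06-07", 145),
            (4, "2015-02-02", 145),
            (5, "2015-02-03", 198),
        ],
    )
    query = """
        select Project_ID, Series, YesorNot, Date_Filed, Plan_Year
        from (
            select B.Primary_Key as Project_ID, A.Series, A.YesorNot,
                   A.Date_Filed, A.Plan_Year,
                   row_number() over (
                       partition by A.Series              -- assumed grouping: one row per series
                       order by A.Plan_Year desc,
                                B.Date_Created desc,
                                A.ID                      -- assumed tie-breaker
                   ) as RN
            from Books B
            join Accounts A on B.ID = A.Series
        ) X
        where RN = 1
        order by Series
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_row_per_series():
        print(row)
```

Series 187 and 153 drop out because they have no matching row in Books; changing the partition by list changes how many rows survive the RN = 1 filter.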