T-SQL - Pivot out Distinct N rows for each group

I have a table similar to the one below with customers, products, and purchase date. I am trying to get a list of customers and their 3 most recently purchased DISTINCT products. I want to use purchase date as a means of ordering the results, but I don't want to see duplicate product IDs.
Customer Product PurchaseDate
1 a 2020-12-5
2 b 2020-12-5
1 a 2020-12-4
2 a 2020-12-3
1 b 2020-12-2
2 b 2020-12-1
1 c 2020-11-30
1 d 2020-11-29
2 b 2020-11-28
Ideally I would see results like this:
Customer Product1 Product2 Product3
1 a b c
2 b a
I have tried PARTITION BY and ORDER BY clauses, but everything I try forces me to include the date in the final output. Is there a way to do this?

Getting the most recent distinct products is tricky. It requires one level of aggregation per customer and product (to get each product's latest purchase date) and then another level for pivoting:
select customer,
max(case when seqnum = 1 then product end) as product_1,
max(case when seqnum = 2 then product end) as product_2,
max(case when seqnum = 3 then product end) as product_3
from (select customer, product, max(purchasedate) as max_purchasedate,
row_number() over (partition by customer order by max(purchasedate) desc) as seqnum
from t
group by customer, product
) cp
group by customer;
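The query can be checked with a small sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later is needed for window functions). Assumptions: the table is named t, as in the answer, and the sample dates are zero-padded (2020-12-05 rather than 2020-12-5) so that SQLite's text comparison orders them correctly. The answer's single derived table is split into two CTEs here, one for the per-product aggregation and one for the numbering, which should behave the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (customer integer, product text, purchasedate text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "a", "2020-12-05"), (2, "b", "2020-12-05"), (1, "a", "2020-12-04"),
        (2, "a", "2020-12-03"), (1, "b", "2020-12-02"), (2, "b", "2020-12-01"),
        (1, "c", "2020-11-30"), (1, "d", "2020-11-29"), (2, "b", "2020-11-28"),
    ],
)

query = """
with cp as (          -- one row per customer and product, with its latest purchase
    select customer, product, max(purchasedate) as max_purchasedate
    from t
    group by customer, product
),
ranked as (           -- number each customer's distinct products, newest first
    select customer, product,
           row_number() over (partition by customer
                              order by max_purchasedate desc) as seqnum
    from cp
)
select customer,      -- pivot the first three numbers into columns
       max(case when seqnum = 1 then product end) as product_1,
       max(case when seqnum = 2 then product end) as product_2,
       max(case when seqnum = 3 then product end) as product_3
from ranked
group by customer
order by customer
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 'a', 'b', 'c')
# (2, 'b', 'a', None)
```

Customer 2 has only two distinct products, so product_3 comes back as NULL (None in Python) rather than the row being dropped.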

Related

SQL subquery with comparison

On a Rails (5.2) app with PostgreSQL I have 2 tables: Item and ItemPrice where an item has many item_prices.
Table Item
id name
1 poetry book
2 programming book
Table ItemPrice
id item_id price
1 1 4
2 2 20
3 1 8
4 1 6
5 2 22
I am trying to select all the items whose last price (the price of the most recent ItemPrice attached to it) is smaller than the one before it.
So in this example, my query should only return item 1 because 6 < 8, and not item 2 because 22 > 20.
I tried various combinations of Active Record and SQL subqueries that would let me compare the last price with the second-to-last price, but have failed so far.
For example: Item.all.joins(:item_prices).where('EXISTS(SELECT price FROM item_prices ORDER BY ID DESC LIMIT 1 as last_price WHERE (SELECT price FROM item_prices ... but I can't work it out.
You can do it as follows, using ROW_NUMBER to pick each item's latest price and LAG to get the price before it:
WITH ranked_items AS (
SELECT m.*,
ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY id DESC) AS rn,
LAG(price,1) OVER (PARTITION BY item_id ORDER BY id ) previous_price
FROM ItemPrice AS m
)
SELECT it.*
FROM ranked_items itp
inner join Item it on it.id = itp.item_id
WHERE rn = 1 and price < previous_price
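A quick way to try the ROW_NUMBER/LAG query is the sketch below, which runs it unchanged (apart from lowercase keywords and comments) against the sample data through Python's sqlite3 module. SQLite stands in for PostgreSQL here, and the table names Item and ItemPrice are taken from the question; in a Rails app the real tables would more likely be items and item_prices, which is an assumption to adjust.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Item (id integer primary key, name text);
create table ItemPrice (id integer primary key, item_id integer, price integer);
insert into Item values (1, 'poetry book'), (2, 'programming book');
insert into ItemPrice values (1, 1, 4), (2, 2, 20), (3, 1, 8), (4, 1, 6), (5, 2, 22);
""")

query = """
with ranked_items as (
    select m.*,
           -- rn = 1 marks each item's most recent price (highest id)
           row_number() over (partition by item_id order by id desc) as rn,
           -- price of the row just before this one for the same item
           lag(price, 1) over (partition by item_id order by id) as previous_price
    from ItemPrice as m
)
select it.*
from ranked_items itp
inner join Item it on it.id = itp.item_id
where rn = 1 and price < previous_price
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'poetry book')]
```

An item with only one price gets a NULL previous_price, and since price < NULL is never true, such items are simply left out.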

Identify the week the second purchase was recorded for each customer ID

Please help me with this in SQL.
I want to find out the week number in which the second purchase was made for each customer ID.
Here, a PurchaseYn value of 1 means a purchase was made and 0 means it was not.
Table customerinfo
Week_No customerID PurchaseYn
201643 1 0
201643 2 1
201644 1 1
201644 2 1
201645 1 1
I want output like this:
Weekno CustomerID
201645 1
201644 2
Many thanks
You didn't state your DBMS, so the following is standard SQL:
select week_no, customerid
from (
  select week_no, customerid,
         row_number() over (partition by customerid order by week_no) as rn
  from customerinfo
  where purchaseyn = 1
) t
where rn = 2;
The above uses a window function to number the purchases made by each customer and then restricts the result to the second one.
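The same technique can be run against the sample rows with Python's sqlite3 module as a stand-in DBMS. The column names follow the question's table (Week_No, customerID, PurchaseYn); unquoted identifiers are case-insensitive in SQLite, as in standard SQL, so lowercase names work. The final order by is only there to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customerinfo (week_no integer, customerid integer, purchaseyn integer)")
conn.executemany(
    "insert into customerinfo values (?, ?, ?)",
    [(201643, 1, 0), (201643, 2, 1), (201644, 1, 1), (201644, 2, 1), (201645, 1, 1)],
)

query = """
select week_no, customerid
from (
    select week_no, customerid,
           -- number only the purchase weeks, earliest first, per customer
           row_number() over (partition by customerid order by week_no) as rn
    from customerinfo
    where purchaseyn = 1
) t
where rn = 2
order by customerid
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(201645, 1), (201644, 2)]
```

Filtering on purchaseyn = 1 before numbering is what makes rn = 2 mean "second purchase" rather than "second recorded week".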

Remove Duplicate Records in Hive

I want to create a table that indicates medical providers that are linked by common members. For example, if I go to prov 1 and prov 2, then prov 1 and prov 2 will be linked because I visited both.
I have a table where each record indicates a member visiting a provider on a specific date. The table contains millions of members and thousands of provs. Below is a small example of the table:
member prov date
1 1 1/1/15
1 2 1/2/15
2 16 1/12/14
2 5 1/1/16
I am trying to create a table where each record indicates two distinct providers being linked by a common member. For example:
member prov1 prov2 date1 date2
1 1 2 1/1/15 1/2/15
2 16 5 1/12/14 1/1/16
I am trying to use an inner join on the same table, but it is returning duplicate records. I thought the distinct clause would fix this, but it does not seem to get the job done. My query is shown below:
select distinct a.member, a.prov, b.prov, a.date, b.date
from table1 as a
inner join table1 as b
on a.member=b.member
This query returns distinct records, but there are records that contain the same information. Below shows an example of this:
a.member a.prov b.prov a.date b.date
1 1 2 1/1/15 1/2/15
1 2 1 1/2/15 1/1/15
Above we see that the records are distinct, but they describe the same information. Below is what I want the query to return:
a.member a.prov b.prov a.date b.date
1 1 2 1/1/15 1/2/15
How can I alter the above query so that it only returns distinct information? I don't want 1 record per member. I want 1 record for each distinct pair of provs per member.
One option is to use conditional aggregation with a subquery using row_number:
select member,
max(case when rn = 1 then prov end) prov1,
max(case when rn = 2 then prov end) prov2,
max(case when rn = 1 then date end) date1,
max(case when rn = 2 then date end) date2
from (select member,
prov,
date,
row_number() over (partition by member order by prov, date) rn
from table1) t
group by member
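The conditional-aggregation answer returns one row per member holding that member's first two providers, which matches the sample because every member there has exactly two visits. The question also asks for one row per distinct provider pair when a member has more visits; a common alternative for that (not part of the answer above) is to keep the asker's self-join and add a.prov < b.prov, so each unordered pair appears once and a provider is never paired with itself. A sketch, using SQLite via Python's sqlite3 as a stand-in for Hive and keeping the sample dates as plain strings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (member integer, prov integer, date text)")
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [(1, 1, "1/1/15"), (1, 2, "1/2/15"), (2, 16, "1/12/14"), (2, 5, "1/1/16")],
)

query = """
select a.member, a.prov as prov1, b.prov as prov2, a.date as date1, b.date as date2
from table1 as a
inner join table1 as b
    on a.member = b.member
   and a.prov < b.prov   -- keep one orientation of each pair, drop self-pairs
order by a.member, prov1, prov2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 1, 2, '1/1/15', '1/2/15')
# (2, 5, 16, '1/1/16', '1/12/14')
```

The inequality removes the mirrored rows that DISTINCT could not, because (1, 2) and (2, 1) are different tuples but only one of them satisfies a.prov < b.prov.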

SQL: Average number of applications per customer for last x months

I have 3 tables: Customer, Applications, and ApplicationHistory. I need to retrieve the following data:
Get the average number of applications per customer for the last 3 months
Get the number of customers with at least one application in the last 3 months
I have been trying GROUP BY, but I am having the following issues:
The ApplicationHistory table has more than one entry for each application, and I'm not sure how to eliminate them &
Note: I have included the Customer table because I need to filter the data by CustomerType.
Can you please suggest how I can get this right?
Many thanks,
My solution (does not work):
SELECT a.ApplicationId, a.CustomerId, count(*) count
from [application] a
inner join [applicationhistory] ah on a.ApplicationId = ah.ApplicationId
inner join Customer c on c.CustomerId = a.CustomerId
where ah.EventDate between #StartDateFilter and #EndDateFilter
--c.CustomerType in ( A, B)
group by a.ApplicationId, a.CustomerId
Table Structure:
Customer
Name CustomerId CustomerType
test1 1 A
test2 2 B
Applications
ApplicationId CustomerId
3 1
4 1
5 2
6 2
7 2
ApplicationHistory
ApplicationId EventDate EventType
3 2014-12-01 New
3 2014-12-01 Updated
3 2014-12-02 Withdrawn
4 2014-12-02 New
4 2014-12-03 Updated
5 2014-12-05 New
5 2014-12-06 Updated
5 2014-12-06 Updated
5 2014-12-07 Updated
6 2014-12-08 New
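First, your query doesn't need the join to Customer unless you want to filter by CustomerType (or care about customers with no applications). ApplicationHistory has no CustomerId column, though, so it still has to be joined to the applications table. Here is a simpler version that counts the applications with activity in the date range, using count(distinct) so that several history rows for one application are counted once:
select a.CustomerId, count(distinct ah.ApplicationId) as cnt
from applicationhistory ah
inner join [application] a on a.ApplicationId = ah.ApplicationId
where ah.EventDate between #StartDateFilter and #EndDateFilter
group by a.CustomerId;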
Note that the group by only has CustomerId and not ApplicationId.
If you want only "New" applications, then use WHERE:
select a.CustomerId, count(*) as cnt
from applicationhistory ah
inner join [application] a on a.ApplicationId = ah.ApplicationId
where ah.EventDate between #StartDateFilter and #EndDateFilter and
      ah.EventType = 'New'
group by a.CustomerId;
If you want net applications ("New" minus "Withdrawn"), then use conditional aggregation:
select a.CustomerId,
       sum(case when ah.EventType = 'New' then 1 else -1 end) as cnt
from applicationhistory ah
inner join [application] a on a.ApplicationId = ah.ApplicationId
where ah.EventDate between #StartDateFilter and #EndDateFilter and
      ah.EventType in ('New', 'Withdrawn')
group by a.CustomerId;
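The queries above produce per-customer counts; the two figures the question asks for (the average per customer and the number of customers with at least one application) come from aggregating those counts once more. A minimal sketch, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server, the table names from the question's Table Structure listing, "applications in the window" meaning applications with any history event in it, and a hypothetical date range of 2014-12-01 to 2014-12-31 in place of #StartDateFilter/#EndDateFilter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Customer (Name text, CustomerId integer, CustomerType text);
create table Applications (ApplicationId integer, CustomerId integer);
create table ApplicationHistory (ApplicationId integer, EventDate text, EventType text);
insert into Customer values ('test1', 1, 'A'), ('test2', 2, 'B');
insert into Applications values (3, 1), (4, 1), (5, 2), (6, 2), (7, 2);
insert into ApplicationHistory values
  (3, '2014-12-01', 'New'), (3, '2014-12-01', 'Updated'), (3, '2014-12-02', 'Withdrawn'),
  (4, '2014-12-02', 'New'), (4, '2014-12-03', 'Updated'),
  (5, '2014-12-05', 'New'), (5, '2014-12-06', 'Updated'), (5, '2014-12-06', 'Updated'),
  (5, '2014-12-07', 'Updated'), (6, '2014-12-08', 'New');
""")

query = """
select avg(cnt) as avg_apps_per_customer,   -- average of the per-customer counts
       count(*) as customers_with_apps      -- one row per customer with >= 1 app
from (select a.CustomerId, count(distinct ah.ApplicationId) as cnt
      from ApplicationHistory ah
      join Applications a on a.ApplicationId = ah.ApplicationId
      join Customer c on c.CustomerId = a.CustomerId
      where ah.EventDate between ? and ?
        and c.CustomerType in ('A', 'B')    -- hypothetical CustomerType filter
      group by a.CustomerId) per_customer
"""
# Hypothetical window standing in for the "last 3 months" filter.
avg_apps, n_customers = conn.execute(query, ("2014-12-01", "2014-12-31")).fetchone()
print(avg_apps, n_customers)  # 2.0 2
```

Application 7 has no history rows, so it falls outside the window and customer 2 is counted with two applications, not three.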

Oracle SQL: Insert based on found value in a column given condition from another table

I want to merge my order data, which is currently in two separate tables, into one table:
Order ID and customer code in table Orders:
Order_ID Customer
1 C11
2 C76
4 C32
and the order details in table Details (with columns Order_ID, Hour, Quantity), which give the ordered quantity for each hour in which the order is valid:
Order_ID Hour Quantity
1 2 10
1 3 20
2 2 5
2 3 5
2 4 5
4 6 20
4 7 25
I want to merge the data of these two tables into one table with only one row per order, inserting the quantity for each hour in which the order is valid into the corresponding column, and zero otherwise.
Order_ID Customer Hour1 Hour2 Hour3 Hour4 Hour5 Hour6 Hour7 ...
1 C11 0 10 20 0 0 0 0
2 C76 0 5 5 5 0 0 0
4 C32 0 0 0 0 0 20 25
I tried (only for quantity of hour 1):
insert into Merged_Order_Table
(Order_ID,Customer,Hour1)
select
Orders.Order_id,Orders.Customer,
case
when 1 in (select Details.Hour from Details,Orders where
Details.Order_ID = Orders.Order_ID)
then Details.Quantity
else 0
end
from
Orders
inner join
Details
on
Details.Order_ID = Orders.Order_ID;
But I got a quantity in Hour1 even for orders with no quantity in that hour.
I question why you would want to take nicely normalized data and put it into a table with that structure. I can understand a query returning the data like that, but another table?
In any case, your problem is a common one with correlated subqueries: the outer table is repeated in the subquery's FROM clause, so the subquery is no longer tied to the current outer row. Oops. Here is a fix for that:
insert into Merged_Order_Table(Order_ID, Customer, Hour1)
select o.Order_id, o.Customer,
       coalesce((select sum(d.Quantity)
                 from Details d
                 where d.Order_ID = o.Order_ID and d.Hour = 1), 0)
from Orders o;
That said, what you really want is conditional aggregation:
insert into Merged_Order_Table(Order_ID, Customer, Hour1)
select o.Order_id, o.Customer,
sum(case when d.Hour = 1 then d.Quantity else 0 end)
from Orders o left join
Details d
on o.Order_ID = d.Order_ID
group by o.Order_id, o.Customer;
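The same conditional aggregation extends to every hour column by repeating the sum(case ...) expression once per hour. The sketch below runs only the select part against the sample data, using SQLite via Python's sqlite3 as a stand-in for Oracle, and assumes seven hour columns as in the sample output (the question's "..." suggests the real table has more). In Oracle the same select would be prefixed with insert into Merged_Order_Table(...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Orders (Order_ID integer, Customer text);
create table Details (Order_ID integer, Hour integer, Quantity integer);
insert into Orders values (1, 'C11'), (2, 'C76'), (4, 'C32');
insert into Details values
  (1, 2, 10), (1, 3, 20),
  (2, 2, 5), (2, 3, 5), (2, 4, 5),
  (4, 6, 20), (4, 7, 25);
""")

# Generate one "sum(case ...)" column per hour instead of typing seven of them.
# Safe to interpolate here because h only ever comes from range().
hour_columns = ",\n       ".join(
    f"sum(case when d.Hour = {h} then d.Quantity else 0 end) as Hour{h}"
    for h in range(1, 8)
)
query = f"""
select o.Order_ID, o.Customer,
       {hour_columns}
from Orders o
left join Details d on o.Order_ID = d.Order_ID
group by o.Order_ID, o.Customer
order by o.Order_ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 'C11', 0, 10, 20, 0, 0, 0, 0)
# (2, 'C76', 0, 5, 5, 5, 0, 0, 0)
# (4, 'C32', 0, 0, 0, 0, 0, 20, 25)
```

The left join keeps orders that have no detail rows at all; their hour columns come out as 0 because every case expression falls through to else 0.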
You are on the right track!
You just need one more layer:
select order_id, customer, max(quantity) hour1 from
(your query)
group by order_id, customer
Or you can look into Oracle's PIVOT clause.