How to Limit Duplicate Rows in PostgreSQL - sql

Hi, I have data like the following in the table fruit_table:
Product Code  Product  Date
MG            Mango    2020-01-25
MG            Mango    2020-01-26
MG            Mango    2020-01-27
MG            Mango    2020-01-28
BN            Banana   2019-01-15
BN            Banana   2020-01-19
BN            Banana   2020-01-20
BN            Banana   2016-01-20
AP            APPLE    2021-03-02
As you can see, the data has 4 Mango rows, 4 Banana rows, and 1 Apple row. I want to limit each product to its 2 rows with the latest dates.
The output should look like this:
Product Code  Product  Date
MG            Mango    2020-01-27
MG            Mango    2020-01-28
BN            Banana   2020-01-19
BN            Banana   2020-01-20
AP            APPLE    2021-03-02
How can this be achieved with a simple PostgreSQL query?
Thanks in advance.

Demo: db<>fiddle
You can use the row_number() window function to achieve that:
SELECT
    *
FROM (
    SELECT
        *,
        row_number() OVER (PARTITION BY "Product Code" ORDER BY "Date" DESC) -- 1
    FROM fruit_table
) s
WHERE row_number <= 2 -- 2
row_number() adds a row count to each record in an ordered group (= partition). In your case the group is the Product Code (or Product), and you need to order each group by Date DESC to put the most recent dates at the top. The most recent date in each group then gets row count 1, the second most recent gets 2, and so on.
Using this row count, you can keep only the top two records of each group.
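To make the filtering step concrete, here is a minimal sketch that runs the same kind of query against the sample data. It assumes Python's built-in sqlite3 module (SQLite 3.25 or newer for window functions) as a stand-in for PostgreSQL, and it names the row count rn, which is my own alias rather than something from the answer:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (an assumption made for portability).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE fruit_table ("Product Code" TEXT, "Product" TEXT, "Date" TEXT)')
conn.executemany(
    "INSERT INTO fruit_table VALUES (?, ?, ?)",
    [
        ("MG", "Mango", "2020-01-25"),
        ("MG", "Mango", "2020-01-26"),
        ("MG", "Mango", "2020-01-27"),
        ("MG", "Mango", "2020-01-28"),
        ("BN", "Banana", "2019-01-15"),
        ("BN", "Banana", "2020-01-19"),
        ("BN", "Banana", "2020-01-20"),
        ("BN", "Banana", "2016-01-20"),
        ("AP", "APPLE", "2021-03-02"),
    ],
)

# Number the rows of each product from newest to oldest, then keep numbers 1 and 2.
query = """
SELECT "Product Code", "Product", "Date"
FROM (
    SELECT *,
           row_number() OVER (PARTITION BY "Product Code" ORDER BY "Date" DESC) AS rn
    FROM fruit_table
) s
WHERE rn <= 2
ORDER BY "Product Code", "Date"
"""
top_two = conn.execute(query).fetchall()
for row in top_two:
    print(row)
```

APPLE has only one row, so it contributes a single row to the result; the other products contribute exactly two.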

As a note: If you have a separate table of products, then you might find that a lateral join is faster:
select ft.*
from products p cross join lateral
     (select ft.*
      from fruit_table ft
      where ft.product = p.product
      order by ft.date desc
      limit 2
     ) ft;
Like the solution using row_number(), this will take advantage of an index on fruit_table(product, date). However, I think the performance would usually be a little better (basically because it does not assign row numbers to all rows before filtering them out).
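Conceptually, the lateral join runs one small "newest two rows" lookup per product. The sketch below only imitates that idea with a Python loop, because SQLite (used here via the built-in sqlite3 module) has no LATERAL; it is not how PostgreSQL executes the query. The products table, its single column, and the index name are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product TEXT PRIMARY KEY);   -- hypothetical lookup table
CREATE TABLE fruit_table (product TEXT, date TEXT);
CREATE INDEX fruit_product_date ON fruit_table (product, date);
""")
conn.executemany("INSERT INTO products VALUES (?)", [("Mango",), ("Banana",), ("APPLE",)])
conn.executemany(
    "INSERT INTO fruit_table VALUES (?, ?)",
    [
        ("Mango", "2020-01-25"), ("Mango", "2020-01-26"),
        ("Mango", "2020-01-27"), ("Mango", "2020-01-28"),
        ("Banana", "2019-01-15"), ("Banana", "2020-01-19"),
        ("Banana", "2020-01-20"), ("Banana", "2016-01-20"),
        ("APPLE", "2021-03-02"),
    ],
)

# One index-friendly lookup per product: newest two rows only.
latest_two = []
for (product,) in conn.execute("SELECT product FROM products ORDER BY product").fetchall():
    latest_two += conn.execute(
        "SELECT product, date FROM fruit_table "
        "WHERE product = ? ORDER BY date DESC LIMIT 2",
        (product,),
    ).fetchall()

for row in latest_two:
    print(row)
```

Each lookup stops after two rows, which is the intuition behind the performance remark above: no row numbers are computed for rows that get discarded anyway.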

Related

Computing window functions for multiple dates

I have a table sales consisting of user IDs, the products those users have purchased, and the date of purchase:
date        user_id  product
2021-01-01  1        apple
2021-01-02  1        orange
2021-01-02  2        apple
2021-01-02  3        apple
2021-01-03  3        orange
2021-01-04  4        apple
If I wanted to see product counts based on every user's most recent purchase, I would do something like this:
WITH latest_sales AS (
    SELECT
        date
        , user_id
        , product
        , row_number() OVER(PARTITION BY user_id ORDER BY date DESC) AS rn
    FROM
        sales
)
SELECT
    product
    , count(1) AS count
FROM
    latest_sales
WHERE
    rn = 1
GROUP BY
    product
Producing:
product  count
apple    2
orange   2
However, this only produces results as of my most recent date. If I had looked at this on 2021-01-02, the results would have been:
product  count
apple    2
orange   1
How could I code this so I could see counts of the most recent products purchased by user, but for multiple dates?
So the output would be something like this:
date        product  count
2021-01-01  apple    1
2021-01-01  orange   0
2021-01-02  apple    2
2021-01-02  orange   1
2021-01-03  apple    1
2021-01-03  orange   2
2021-01-04  apple    2
2021-01-04  orange   2
Appreciate any help on this.
I'm afraid the window function row_number() with the PARTITION BY user_id clause does not help in your case, because it only looks at the user_id of the current row, whereas you want a consolidated view across all users for each date.
I don't have a better idea than joining the sales table to its own list of distinct dates:
WITH list AS (
    SELECT DISTINCT ON (s2.date, user_id)
        s2.date
        , product
    FROM sales AS s1
    INNER JOIN (SELECT DISTINCT date FROM sales) AS s2
        ON s1.date <= s2.date
    ORDER BY s2.date, user_id, s1.date DESC
)
SELECT date, product, count(*)
FROM list
GROUP BY date, product
ORDER BY date
see the test result in dbfiddle
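For readers who want to check the logic without PostgreSQL, here is a rough equivalent run through Python's built-in sqlite3 module. SQLite has no DISTINCT ON, so this sketch swaps in row_number() partitioned by (as-of date, user) and keeps rn = 1, which should pick the same row per group. The alias names as_of, rn, and n are my own:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, user_id INTEGER, product TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2021-01-01", 1, "apple"),
        ("2021-01-02", 1, "orange"),
        ("2021-01-02", 2, "apple"),
        ("2021-01-02", 3, "apple"),
        ("2021-01-03", 3, "orange"),
        ("2021-01-04", 4, "apple"),
    ],
)

# For every distinct date (as_of), pair it with all sales up to that date,
# then keep each user's newest sale as of that date and count by product.
query = """
WITH list AS (
    SELECT s2.date AS as_of,
           s1.product,
           row_number() OVER (
               PARTITION BY s2.date, s1.user_id
               ORDER BY s1.date DESC
           ) AS rn
    FROM sales AS s1
    INNER JOIN (SELECT DISTINCT date FROM sales) AS s2
            ON s1.date <= s2.date
)
SELECT as_of, product, count(*) AS n
FROM list
WHERE rn = 1
GROUP BY as_of, product
ORDER BY as_of, product
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

Note that this approach returns no row for a product with zero purchases on a date, so the "2021-01-01 orange 0" line from the desired output does not appear.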

SQL - Get MIN of a column for each key in other table

I have two tables as such:
ORDERS
Date     TransactID  COL3
2021-06  1234        4
2021-09  1238        8
Agg
Date     User  TransactID
2021-06  3333  1234
2021-03  3333  XXXX
2021-02  3333  XXXX
2021-09  4444  1238
2021-05  4444  XXXX
2021-01  4444  XXXX
In AGG, a User can have many transactions, the ORDERS table is just a subset of it.
For each TransactID in Orders, I need to go into the Agg table and get the MIN date for the User associated with the TransactID.
Then, I need to calculate the date difference between the ORDERS.Date and the minimum AGG.DATE. The result is stored in SDP.COL3. COL3 can basically be described as Days Since First Transaction.
I have never done a SQL problem that is this multistep, and need some guidance. Any help would be greatly appreciated!
If I've got it right
SELECT SDP.TXN_ID, sdp.dt, datediff(sdp.dt, min(a1.DT)) diff
FROM SDP
JOIN AGG a1 on a1.UserID =
    (SELECT a2.UserID
     FROM AGG a2
     WHERE SDP.TXN_ID = a2.TXN_ID
     ORDER BY a2.UserID
     limit 1)
GROUP BY SDP.TXN_ID, sdp.dt
You can omit
ORDER BY a2.UserID
limit 1
provided each transaction always belongs to a single user.
The fiddle
Based on your SQL Fiddle (http://sqlfiddle.com/#!9/101497/1), this should get you started:
SELECT TXN_ID, DT, USERID
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY sdp.TXN_ID ORDER BY sdp.DT ASC) AS [index],
             sdp.TXN_ID,
             sdp.DT,
             agg.USERID
      FROM sdp
      LEFT JOIN agg ON sdp.TXN_ID = agg.TXN_ID) A
WHERE [index] = 1
For more information you should look at
https://www.sqlshack.com/sql-partition-by-clause-overview/
https://www.sqltutorial.org/sql-window-functions/sql-partition-by/
https://learnsql.com/blog/partition-by-with-over-sql/

Use table with multiple rows with the same value

I'm trying to do an INNER JOIN and count values with information from two tables. The problem is that the product category table has multiple rows with the same or similar values, and my COUNT() is too high as a result.
My two tables
Sales table
Date prod_id
2016-01-01 81
2016-01-01 82
2016-01-01 81
2016-10-01 80
2016-01-01 80
2016-01-02 80
2016-01-02 80
2016-01-02 81
2016-01-02 81
.... ....
Product table
prodid Name
80 Banana
81 Apple
82 Orange
83 Ice Cream
80 BANANAS
81 APPLE
82
83 Ice Cream
.... ....
When I do an INNER JOIN and count the number of occurrences of e.g. prod_id, I get an unreasonably high number, and my guess is that it's because there is more than one row for prod_id 80, for example.
Do you have any idea for a solution? My first reaction was to redo the Product table, but many other systems depend on that table, so I can't change it in the foreseeable future.
My query so far:
SELECT
    pt.Date AS "Date",
    ft.Name AS "Product",
    COUNT(ft.Name) Number
FROM SALES as pt
INNER JOIN PROD_TABLE AS ft ON pt.prod_id=ft.prodid
WHERE pt.Date BETWEEN '2016-01-01' AND '2016-01-30'
GROUP BY pt.Date, ft.Name
ORDER BY pt.Date DESC
Expected result:
Date Product Number
2016-01-01 Banana 2
2016-01-01 Apple 2
2016-01-01 Orange 1
First, you should fix the data. Having a product table with duplicates seems nonsensical. You shouldn't try to get around such issues by writing more complex queries.
That said, this is pretty easy to do in SQL Server. I think outer apply is appropriate:
select p.name, count(*)
from sales s outer apply
     (select top 1 p.*
      from product p
      where p.name is not null and
            p.prodid = s.prod_id -- note: the columns should have the same name
     ) p
group by p.name;
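The idea of picking exactly one product row per sale can be tried out with the sketch below. It uses Python's built-in sqlite3 module rather than SQL Server, so OUTER APPLY with TOP 1 is replaced by a correlated subquery with LIMIT 1. Ordering by rowid (to take the first-inserted name) is my own assumption for determinism, and the sales rows are just the 2016-01-01 rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, prod_id INTEGER)")
conn.execute("CREATE TABLE product (prodid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2016-01-01", 81), ("2016-01-01", 82), ("2016-01-01", 81), ("2016-01-01", 80)],
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [
        (80, "Banana"), (81, "Apple"), (82, "Orange"), (83, "Ice Cream"),
        (80, "BANANAS"), (81, "APPLE"), (82, None), (83, "Ice Cream"),
    ],
)

# A plain join multiplies each sale by the number of matching product rows.
naive_apple = conn.execute(
    "SELECT count(*) FROM sales s JOIN product p ON s.prod_id = p.prodid "
    "WHERE s.prod_id = 81"
).fetchone()[0]

# Look up a single non-null name per sale, then count.
query = """
SELECT date, name, count(*) AS n
FROM (
    SELECT s.date,
           (SELECT p.name
            FROM product AS p
            WHERE p.prodid = s.prod_id AND p.name IS NOT NULL
            ORDER BY p.rowid
            LIMIT 1) AS name
    FROM sales AS s
) AS t
GROUP BY date, name
ORDER BY n DESC, name
"""
deduped = conn.execute(query).fetchall()
print(naive_apple)
for row in deduped:
    print(row)
```

The plain join reports 4 for product 81 (2 sales times 2 product rows), while the one-name-per-sale version reports 2.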
I guess this simple query solves your requirement:
select
    date,
    name,
    count(name)
from product p inner join sales s
    on s.prod_id=p.prodid
group by date, name

Select all rows based on alternative publisher

I want to list all the rows alternating between publishers, with price ascending. See the example table below.
id publisher price
1 ABC 100.00
2 ABC 150.00
3 ABC 105.00
4 XYZ 135.00
5 XYZ 110.00
6 PQR 105.00
7 PQR 125.00
The expected result would be:
id publisher price
1 ABC 100.00
6 PQR 105.00
5 XYZ 110.00
3 ABC 105.00
7 PQR 125.00
4 XYZ 135.00
2 ABC 150.00
What would be the required SQL?
This should do it:
select id, publisher, price
from (
    select id, publisher, price,
           row_number() over (partition by publisher order by price) as rn
    from publisher
) t
order by rn, publisher, price
The window function assigns a unique number to each row within a publisher, ordered by price. Based on that, the outer order by first displays all rows with rn = 1, which are the rows with the lowest price for each publisher. The second row for each publisher has the second lowest price, and so on.
SQLFiddle example: http://sqlfiddle.com/#!4/06ece/2
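As a quick check of the interleaving, this sketch loads the sample rows and runs the same ordering through Python's built-in sqlite3 module (SQLite 3.25 or newer, assumed here as a stand-in for the database in the fiddle). The table name tbl is a placeholder:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, publisher TEXT, price REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, "ABC", 100.00), (2, "ABC", 150.00), (3, "ABC", 105.00),
        (4, "XYZ", 135.00), (5, "XYZ", 110.00),
        (6, "PQR", 105.00), (7, "PQR", 125.00),
    ],
)

# rn = 1 is each publisher's cheapest row, rn = 2 the next cheapest, and so on.
query = """
SELECT id, publisher, price
FROM (
    SELECT id, publisher, price,
           row_number() OVER (PARTITION BY publisher ORDER BY price) AS rn
    FROM tbl
) t
ORDER BY rn, publisher, price
"""
ordered_ids = [row[0] for row in conn.execute(query)]
print(ordered_ids)
```

The ids come back in the order 1, 6, 5, 3, 7, 4, 2, which matches the expected result in the question.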
SELECT id, publisher, price
FROM tbl
ORDER BY row_number() OVER (PARTITION BY publisher ORDER BY price), publisher;
One cannot use the output of window functions in the WHERE or HAVING clauses, because window functions are applied after those. But one can use window functions in the ORDER BY clause.
SQL Fiddle.
Not sure what your table name is - I have called it publishertable. But the following will order the result by price in ascending order - which is the result you are looking for:
select id, publisher, price from publishertable order by price asc
If I've got it right, you should use the ROW_NUMBER() function to rank prices within each publisher and then order by this rank and publisher.
SELECT ID,
       Publisher,
       Price,
       Row_number() OVER (PARTITION BY Publisher ORDER BY Price) as rn
FROM T
ORDER BY RN, Publisher
SQLFiddle demo

In SQL, how find the total of a row over time?

I have a table and I want to know the total of my rows over time. For example, here's my table:
Date Fruit Sold
Mon apple 4
Mon pear 5
Mon orange 2
Tues apple 3
Tues pear 2
Tues orange 1
The table I want back is:
Fruit Sold
apple 7
pear 7
orange 3
What query can I use to do this? In my real situation, I have hundreds of types of fruit, so how do I query without specifying each type of fruit each time?
That would be along the lines of:
select fruit, sum(sold) as sold
from fruitsales
group by fruit
-- adding something like <<where date = 'Mon'>> if you want to limit it.
This will aggregate the individual sold columns (by summing) for each fruit type.
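Here is a small sketch that runs this aggregation on the sample rows, using Python's built-in sqlite3 module. The table name fruitsales follows the answer; the ORDER BY is added only to make the output order predictable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruitsales (date TEXT, fruit TEXT, sold INTEGER)")
conn.executemany(
    "INSERT INTO fruitsales VALUES (?, ?, ?)",
    [
        ("Mon", "apple", 4), ("Mon", "pear", 5), ("Mon", "orange", 2),
        ("Tues", "apple", 3), ("Tues", "pear", 2), ("Tues", "orange", 1),
    ],
)

# GROUP BY produces one row per distinct fruit, so no fruit names are hard-coded.
totals = conn.execute(
    "SELECT fruit, sum(sold) AS sold FROM fruitsales GROUP BY fruit ORDER BY fruit"
).fetchall()
for row in totals:
    print(row)
```

Because the grouping is driven by the data, the same query works unchanged with hundreds of fruit types.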
here is how to do it:
select fruit, sum(sold)
from table
group by fruit
cheers...
Group by Time
select fruit, sum(sold), substring(saletime,1,3)
from table
group by fruit, substring(saletime,1,3)