Find the 3rd top selling item postgres - sql

I have a transaction table, and I want to get the product ID that ranks 3rd highest for sales. Please note that there can be multiple transactions for an item, and one transaction can have a qty greater than 1, so I want to find which product ID has the 3rd highest total qty.
I think I need to use something like rank, but my query returns odd ranks and I'm not sure what's wrong.
select distinct t.product_id,
sum(t.qty) over (partition by t.product_id) qty,
rank() over(partition by t.product_id order by t.qty desc) rnk
from transaction t
order by rnk
CREATE TABLE IF NOT EXISTS "transaction"
(DATE_ID BIGINT NOT NULL,
STORE_ID INT NOT NULL,
TRANSACTION_TYPE_ID CHAR(1) NOT NULL,
PRODUCT_ID INT NOT NULL,
QTY INT NOT NULL);
INSERT INTO "transaction"
(DATE_ID, STORE_ID, TRANSACTION_TYPE_ID, PRODUCT_ID, QTY)
VALUES
(1, 1, 'A', 1, 2),
(1, 1, 'B', 1, 1),
(1, 2, 'A', 4, 1),
(1, 6, 'A', 3, 1),
(1, 1, 'A', 1, 1),
(2, 1, 'B', 1, 1),
(2, 1, 'A', 1, 1),
(2, 2, 'A', 2, 5),
(3, 2, 'A', 2, 7),
(3, 3, 'A', 2, 1),
(3, 3, 'B', 1, 15),
(3, 3, 'A', 1, 1),
(4, 4, 'A', 1, 1),
(4, 4, 'A', 1, 5),
(4, 4, 'A', 1, 11),
(4, 5, 'A', 3, 2),
(4, 6, 'A', 3, 1),
(4, 6, 'A', 3, 1),
(4, 6, 'B', 2, 1),
(5, 2, 'A', 2, 2),
(5, 2, 'B', 1, 1),
(5, 2, 'A', 2, 1),
(5, 2, 'A', 4, 1),
(5, 2, 'A', 5, 1),
(6, 2, 'B', 4, 1),
(6, 2, 'A', 6, 1),
(6, 3, 'A', 3, 5),
(7, 3, 'A', 2, 7),
(7, 4, 'A', 2, 1),
(7, 4, 'B', 2, 15),
(7, 4, 'A', 2, 1),
(7, 5, 'A', 2, 1),
(7, 5, 'A', 2, 5),
(7, 5, 'A', 2, 11),
(7, 6, 'A', 2, 2),
(7, 1, 'A', 2, 1),
(8, 1, 'A', 2, 1),
(8, 1, 'B', 2, 1),
(8, 3, 'A', 3, 2),
(9, 3, 'B', 3, 1),
(9, 3, 'A', 3, 1),
(9, 3, 'A', 3, 1),
(9, 3, 'A', 3, 1),
(10, 3, 'B', 3, 1),
(10, 3, 'A', 3, 1),
(10, 4, 'A', 4, 5),
(10, 4, 'A', 4, 7),
(10, 5, 'A', 5, 1),
(10, 5, 'B', 5, 15),
(10, 5, 'A', 5, 1),
(10, 6, 'A', 6, 1),
(10, 6, 'A', 6, 5),
(10, 6, 'A', 6, 11),
(10, 1, 'A', 1, 2),
(10, 2, 'A', 2, 1),
(11, 2, 'A', 2, 1),
(11, 2, 'B', 2, 1),
(11, 3, 'A', 5, 2),
(11, 3, 'B', 5, 1),
(11, 3, 'A', 5, 1),
(12, 3, 'A', 5, 1),
(12, 3, 'A', 5, 1)

Yet another option is:
aggregating the "qty" values per "product_id" (SUM(qty) GROUP BY product_id)
extracting a ranking value for each product_id summed quantities (DENSE_RANK() OVER(ORDER BY SUM(qty) DESC))
ordering your output rows with respect to when this ranking value equals 3 (DENSE_RANK() ... = 3)
keeping only the first row given your ordering (FETCH FIRST 1 ROWS WITH TIES )
SELECT product_id
FROM "transaction"
GROUP BY product_id
ORDER BY DENSE_RANK() OVER(ORDER BY SUM(qty) DESC) = 3 DESC
FETCH FIRST 1 ROWS WITH TIES

select product_id
from (
select product_id
,dense_rank() over(order by sum(qty) desc) as d_rnk
from transaction
group by product_id
) t
where d_rnk = 3
product_id
5
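The ranks in the question's query look odd because rank() is partitioned by product_id, so each product's transactions are ranked against each other instead of products being ranked against products. Here is a minimal sketch of the aggregate-then-rank idea from the answers above. It runs through Python's sqlite3 module as a stand-in for PostgreSQL (an assumption, made only so the example is self-contained; window functions need SQLite 3.25 or later). The sales table and its rows are illustrative, not the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER NOT NULL, qty INTEGER NOT NULL)")
# Illustrative rows only; totals are 1 -> 15, 2 -> 21, 3 -> 9, 4 -> 9, 5 -> 3.
conn.executemany(
    "INSERT INTO sales (product_id, qty) VALUES (?, ?)",
    [(1, 10), (1, 5), (2, 20), (2, 1), (3, 7), (3, 2), (4, 9), (5, 3)],
)

query = """
WITH totals AS (            -- one row per product with its summed quantity
    SELECT product_id, SUM(qty) AS total_qty
    FROM sales
    GROUP BY product_id
),
ranked AS (                 -- rank products against each other, no PARTITION BY
    SELECT product_id, total_qty,
           DENSE_RANK() OVER (ORDER BY total_qty DESC) AS d_rnk
    FROM totals
)
SELECT product_id, total_qty
FROM ranked
WHERE d_rnk = 3
ORDER BY product_id
"""
third_best = conn.execute(query).fetchall()
print(third_best)  # products 3 and 4 tie for third place with a total of 9
```

DENSE_RANK() keeps tied products on the same rank without leaving gaps, which is why two rows come back here. With RANK() a tie for second place would mean no product gets rank 3 at all.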

Related

PostgreSQL: Adjust columns value based on criteria

Imagine the following data:
student category exam_id adjusted_category
Carl A 44 A
Carl A 55 A
Carl A 88 A
Carl A 1 A
Carl A 2 A
Carl A 3 A
Carl B 1 B
Carl B 2 B
Carl B 3 B
John C 100 C
John C 200 C
John C 300 C
If for the same user, both categories A and B are encountered but specific exam_ids are found (44, 55, 88), I'd like to adjust the category to be A. Otherwise, keep the same category.
The output I'm aiming is:
student adjusted_category
Carl A
John C
Code I'm currently attempting to modify:
with my_table (student, category, exam_id)
as (values
('Carl', 'A', 44),
('Carl', 'A', 55),
('Carl', 'A', 88),
('Carl', 'A', 1),
('Carl', 'A', 2),
('Carl', 'A', 3),
('Carl', 'B', 1),
('Carl', 'B', 2),
('Carl', 'B', 3),
('John', 'C', 100),
('John', 'C', 200),
('John', 'C', 300)
)
select *,
case
when category in ('A','B') and exam_id in (44, 55, 88) then 'A'
else category
end as adjusted_category
from my_table
The code above is not what I'm after because the adjusted category becomes A only on the rows where the exam IDs are 44, 55, or 88. I'd like all of the entries for Carl to have A as the adjusted category.
How can I achieve the desired output?
I may have misinterpreted your requirement. If so, you can change the bool_or() to bool_and(). The bool_or() expression over a window partitioned by student will return true if the student has any one of (44, 55, 88):
with my_table (student, category, exam_id)
as (values
('Carl', 'A', 44),
('Carl', 'A', 55),
('Carl', 'A', 88),
('Carl', 'A', 1),
('Carl', 'A', 2),
('Carl', 'A', 3),
('Carl', 'B', 1),
('Carl', 'B', 2),
('Carl', 'B', 3),
('John', 'C', 100),
('John', 'C', 200),
('John', 'C', 300)
)
select *,
case
when category in ('A','B')
and bool_or(exam_id in (44, 55, 88)) over (partition by student) then 'A'
else category
end as adjusted_category
from my_table;
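The query above still returns one row per exam, while the question's target output has one row per student. A minimal sketch of collapsing it with DISTINCT follows. It runs through Python's sqlite3 module as a stand-in (an assumption for self-containment). SQLite has no bool_or(), so MAX() over the 0/1 result of the IN test is swapped in for it; in PostgreSQL you would keep bool_or() as written above. The flagged CTE and has_special column are names made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
query = """
WITH my_table (student, category, exam_id) AS (VALUES
    ('Carl', 'A', 44), ('Carl', 'A', 55), ('Carl', 'A', 88),
    ('Carl', 'A', 1),  ('Carl', 'A', 2),  ('Carl', 'A', 3),
    ('Carl', 'B', 1),  ('Carl', 'B', 2),  ('Carl', 'B', 3),
    ('John', 'C', 100), ('John', 'C', 200), ('John', 'C', 300)
),
flagged AS (
    -- has_special is 1 on every row of a student who has any of the exams
    SELECT student, category,
           MAX(exam_id IN (44, 55, 88)) OVER (PARTITION BY student) AS has_special
    FROM my_table
)
SELECT DISTINCT student,
       CASE WHEN category IN ('A', 'B') AND has_special = 1 THEN 'A'
            ELSE category
       END AS adjusted_category
FROM flagged
ORDER BY student
"""
adjusted = conn.execute(query).fetchall()
print(adjusted)  # one row per student and adjusted category
```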

How to complete this specific question/query using SQL and an inline view?

I'm currently working on a hotel project where I need to make queries with SQL but I'm stuck on one question.
The question is:
How many employees have made at least 2 bookings for at least 3 customers?
I have figured out that I need to use an inline view but I have not gone any further because I'm stuck on the next part.
This is the table in the database:
bookingid | int | primary key
bookingdate | date| -
numOfGuests | int | -
customerId | int | foreign key
employeeId | int | foreign key
bookingid | bookingdate | numOfGuests | customerId | employeeId
1         | 2016-01-25  | 4           | 2          | 2
2         | 2016-06-12  | 1           | 3          | 2
3         | 2016-12-05  | 1           | 2          | 2
4         | 2016-04-01  | 2           | 3          | 2
5         | 2016-11-01  | 3           | 2          | 3
6         | 2016-11-03  | 1           | 8          | 2
7         | 2017-06-02  | 6           | 2          | 2
8         | 2016-02-07  | 2           | 8          | 2
9         | 2016-12-25  | 2           | 4          | 5
10        | 2017-06-21  | 1           | 10         | 2
11        | 2016-08-12  | 2           | 10         | 2
...       | ...         | ...         | ...        | ...
So does anyone know how to answer this question with a SQL query using an inline view?
The result I want is the number of employees that satisfy the question. Result based on the sample data:
CountOfemployeeID |
1
Please check this script-
SELECT COUNT(DISTINCT C.employeeId)
FROM
(
SELECT A.employeeId,B.customerid,COUNT(B.bookingid) T
FROM (
--Select employees who booked for at least 3 distinct customers
SELECT employeeId,COUNT(DISTINCT customerid) customerid
FROM Table1
GROUP BY employeeId
HAVING COUNT(DISTINCT customerid) > 2
)A
--Keep only the (employee, customer) pairs with at least 2 bookings
INNER JOIN (
SELECT bookingid,bookingdate,numOfGuests,customerId,employeeId
FROM Table1
) B
ON A.employeeId = B.employeeId
GROUP BY A.employeeId,B.customerid
HAVING COUNT(B.bookingid) > 1
)C
create table #userData (
bookingid int,
bookingdate date,
numOfGuests int,
customerId int,
employeeId int
)
insert into #userData
values
(1, '2016-01-25', 4, 2, 2),
(2, '2016-06-12', 1, 3, 3),
(3, '2016-12-05', 1, 2, 4),
(4, '2016-04-01', 2, 2, 3),
(5, '2016-11-12', 3, 2, 3),
(6, '2017-01-15', 1, 5, 5),
(6, '2017-01-15', 1, 5, 5),
(6, '2017-01-15', 1, 5, 5),
(6, '2017-01-15', 1, 5, 5),
(6, '2017-01-15', 1, 5, 5),
(6, '2017-01-15', 1, 5, 5),
(1, '2016-01-25', 4, 2, 2),
(2, '2016-06-12', 1, 3, 3),
(3, '2016-12-05', 1, 2, 4),
(4, '2016-04-01', 2, 2, 3),
(5, '2016-11-12', 3, 2, 3),
(6, '2017-01-15', 1, 2, 5),
(6, '2017-01-15', 1, 2, 5),
(6, '2017-01-15', 1, 3, 5),
(6, '2017-01-15', 1, 3, 5),
(6, '2017-01-15', 1, 4, 5),
(6, '2017-01-15', 1, 4, 5),
(1, '2016-01-25', 4, 2, 2),
(2, '2016-06-12', 1, 3, 3),
(3, '2016-12-05', 1, 2, 4),
(4, '2016-04-01', 2, 2, 3),
(5, '2016-11-12', 3, 2, 3),
(6, '2017-01-15', 1, 1, 5),
(6, '2017-01-15', 1, 2, 5),
(6, '2017-01-15', 1, 3, 5),
(6, '2017-01-15', 1, 4, 5),
(6, '2017-01-15', 1, 7, 5),
(6, '2017-01-15', 1, 6, 5),
(1, '2016-01-25', 4, 3, 2),
(1, '2016-01-25', 4, 3, 2),
(1, '2016-01-25', 4, 1, 2),
(1, '2016-01-25', 4, 1, 2)
select * from #userData
; with CTE as
(
select count(customerId) count, customerId, employeeId from #userData
group by customerId, employeeid having count(customerid) >= 2
), cte2 as
(
Select employeeId from CTE group by Employeeid having count(employeeId) >= 3
)
select count, customerid, employeeid from CTE as a
inner join CTE2 as b on a.employeeId = b.employeeId
OUTPUT
count customerId employeeId
2 1 2
3 2 2
2 3 2
3 2 5
3 3 5
3 4 5
6 5 5
If you need only the employeeId, just run:
Select employeeId from CTE2
output
Employeeid
2
5
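For the inline-view form the question asks about, here is a minimal sketch run against the question's own eleven sample rows. It reads the requirement as "employees who have at least 3 customers with at least 2 bookings each", which is my interpretation. The table name booking is an assumption (the question does not give one), and Python's sqlite3 module is used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE booking (
    bookingid   INTEGER PRIMARY KEY,
    bookingdate TEXT,
    numOfGuests INTEGER,
    customerId  INTEGER,
    employeeId  INTEGER
)""")
conn.executemany(
    "INSERT INTO booking VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2016-01-25", 4, 2, 2), (2, "2016-06-12", 1, 3, 2),
        (3, "2016-12-05", 1, 2, 2), (4, "2016-04-01", 2, 3, 2),
        (5, "2016-11-01", 3, 2, 3), (6, "2016-11-03", 1, 8, 2),
        (7, "2017-06-02", 6, 2, 2), (8, "2016-02-07", 2, 8, 2),
        (9, "2016-12-25", 2, 4, 5), (10, "2017-06-21", 1, 10, 2),
        (11, "2016-08-12", 2, 10, 2),
    ],
)

query = """
SELECT COUNT(*) AS CountOfemployeeID
FROM (
    -- employees with at least 3 such repeat customers
    SELECT employeeId
    FROM (
        -- (employee, customer) pairs with at least 2 bookings
        SELECT employeeId, customerId
        FROM booking
        GROUP BY employeeId, customerId
        HAVING COUNT(*) >= 2
    ) AS repeat_customers
    GROUP BY employeeId
    HAVING COUNT(*) >= 3
) AS qualifying_employees
"""
employee_count = conn.execute(query).fetchone()[0]
print(employee_count)  # only employee 2 qualifies in the sample rows
```

The innermost view filters pairs first, so the outer HAVING counts only customers that already have two or more bookings with that employee.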

SQL count joins distinct by multiple columns

I have a problem where I need to JOIN on the same table multiple times and count the JOINs.
Here is the database setup (SQL Fiddle):
CREATE TABLE state
(
[t_id] int,
[true_id] int,
[false_id] int,
[msg] varchar(32)
);
INSERT INTO state
(t_id, true_id, false_id, msg)
VALUES
(5, 6, 7, 'CASE_1'),
(10, 11, 12, 'CASE_2'),
(20, 21, 22, 'CASE_N'),
(30, 31, 32, 'FOOOO');
CREATE TABLE step
(
[id] int,
[f_id] int,
[state_type] int,
[state_value] int
);
INSERT INTO step
(id,f_id,state_type, state_value)
VALUES
(1, 5, 5, 7),
(2, 5, 5, 7),
(3, 5, 5, 6),
(4, 5, 10, 12),
(5, 5, 10, 12),
(6, 5, 10, 11),
(7, 6, 10, 12),
(8, 6, 10, 12),
(9, 7, 20, 21),
(10, 7, 20, 21),
(11, 7, 30, 32),
(12,7, 30, 31);
Here is my current query:
SELECT state.msg,
COUNT(state_true.true_id) AS Trues,
COUNT(state_false.false_id) AS Falses
FROM state
INNER JOIN step ON state.t_id = step.state_type
LEFT OUTER JOIN state AS state_true ON step.state_value = state_true.true_id
LEFT OUTER JOIN state AS state_false ON step.state_value = state_false.false_id
GROUP BY state.msg, step.f_id
And here is what I get:
msg Trues Falses
CASE_1 1 2
CASE_2 1 2
CASE_2 0 2
CASE_N 2 0
FOOOO 1 1
And here is what I need:
msg Trues Falses
CASE_1 1 0
CASE_2 1 1
CASE_N 1 0
FOOOO 1 0
Explanation:
I need to count how many trues and falses there are per state_type and f_id combination.
There are 6 entries with f_id = 5 -> (1,2,3,4,5,6). If there are several entries with the same (f_id,state_type) combination, only the last one should be counted. So for f_id 5 the entries 1,2,4,5 should not be counted, as they are overwritten by 3 and 6.
So after processing the first 6 entries there should be CASE_1 true => 1 false => 0 and CASE_2 true => 1 false => 0
__ EDIT __
TABLE step:
(1, 5, 5, 7), -- do not count
(2, 5, 5, 7), -- do not count
(3, 5, 5, 6), -- this is the last entry with
-- (f_id,state_type) => (5,5) combination.
-- it overwrites the 2 previous ones => count CASE_1 true
(4, 5, 10, 12), -- do not count
(5, 5, 10, 12), -- do not count
(6, 5, 10, 11), -- count CASE_2 true
(7, 6, 10, 12), -- do not count
(8, 6, 10, 12), -- count CASE_2 false
(9, 7, 20, 21), -- do not count
(10, 7, 20, 21), -- count CASE_N true
(11, 7, 30, 32), -- do not count
(12,7, 30, 31); -- count FOOOO true
I'm not sure I fully understand your intent, but maybe the query below might be what you want? Given your sample data it seems to produce the right output.
SELECT state.msg,
SUM(CASE WHEN true_id = state_value THEN 1 ELSE 0 END) AS Trues,
SUM(CASE WHEN false_id = state_value THEN 1 ELSE 0 END) AS Falses
FROM state
JOIN step ON state.t_id = step.state_type
JOIN (SELECT MAX(id) mid FROM step GROUP BY f_id, state_type) a ON a.mid = step.id
GROUP BY state.msg;
Please give it a try. If I misunderstood I'll remove the answer.
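To check the "only the last entry per (f_id, state_type) counts" rule against the question's sample data, here is a small sketch that loads both tables and runs the same MAX(id) join. Python's sqlite3 module is an assumption used for self-containment (the question is on SQL Server); the ORDER BY and the column qualifiers are additions for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (t_id INT, true_id INT, false_id INT, msg TEXT)")
conn.execute("CREATE TABLE step (id INT, f_id INT, state_type INT, state_value INT)")
conn.executemany(
    "INSERT INTO state VALUES (?, ?, ?, ?)",
    [(5, 6, 7, "CASE_1"), (10, 11, 12, "CASE_2"),
     (20, 21, 22, "CASE_N"), (30, 31, 32, "FOOOO")],
)
conn.executemany(
    "INSERT INTO step VALUES (?, ?, ?, ?)",
    [(1, 5, 5, 7), (2, 5, 5, 7), (3, 5, 5, 6),
     (4, 5, 10, 12), (5, 5, 10, 12), (6, 5, 10, 11),
     (7, 6, 10, 12), (8, 6, 10, 12),
     (9, 7, 20, 21), (10, 7, 20, 21),
     (11, 7, 30, 32), (12, 7, 30, 31)],
)

query = """
SELECT state.msg,
       SUM(CASE WHEN state.true_id  = step.state_value THEN 1 ELSE 0 END) AS Trues,
       SUM(CASE WHEN state.false_id = step.state_value THEN 1 ELSE 0 END) AS Falses
FROM state
JOIN step ON state.t_id = step.state_type
-- keep only the newest step row of every (f_id, state_type) pair
JOIN (SELECT MAX(id) AS mid FROM step GROUP BY f_id, state_type) AS a
  ON a.mid = step.id
GROUP BY state.msg
ORDER BY state.msg
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

The surviving step ids are 3, 6, 8, 10 and 12, which gives the four rows the question lists as the desired result.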

Find missing sequences by category

I have to identify missing records from the example below.
Category BatchNo TransactionNo
+++++++++++++++++++++++++++++++++
CAT1 1 1
CAT1 1 2
CAT1 2 3
CAT1 2 4
CAT1 2 5
CAT1 3 6
CAT1 3 7
CAT1 3 8
CAT1 5 12
CAT1 5 13
CAT1 5 14
CAT1 5 15
CAT1 7 18
CAT2 1 1
CAT2 1 2
CAT2 3 6
CAT2 3 7
CAT2 3 8
CAT2 3 9
CAT2 4 10
CAT2 4 11
CAT2 4 12
CAT2 6 14
I need a script that will identify missing records as below
Category BatchNo
+++++++++++++++++++
CAT1 4
CAT1 6
CAT2 2
CAT2 5
I do not need to know that CAT1 8 and CAT2 7 are not there as they potentially have not been inserted yet.
You can create a temporary result set with every possible batch number up to the maximum batch number for each category, then select the batch numbers that are not present.
create table TEMP(
Category varchar(10),
BatchNo int,
TransactionNo int
)
insert into TEMP values
('CAT1', 1, 1),
('CAT1', 1, 2),
('CAT1', 2, 3),
('CAT1', 2, 4),
('CAT1', 2, 5),
('CAT1', 3, 6),
('CAT1', 3, 7),
('CAT1', 3, 8),
('CAT1', 5, 9),
('CAT1', 7, 10),
('CAT2', 1, 1),
('CAT2', 1, 2),
('CAT2', 3, 3),
('CAT2', 4, 4),
('CAT2', 4, 5),
('CAT2', 4, 6),
('CAT2', 6, 7);
WITH BatchNo (BatchID,Category,MaxBatch) AS (
SELECT 1, Category, MAX(BatchNo) AS MaxBatch FROM TEMP GROUP BY Category
UNION ALL
SELECT BatchID + 1, Category, MaxBatch FROM BatchNo
WHERE BatchID < MaxBatch
)
SELECT
BatchNo.Category,
BatchNo.BatchID
FROM
BatchNo
WHERE
BatchID NOT IN (SELECT BatchNo FROM TEMP WHERE Category = BatchNo.Category)
ORDER BY
BatchNo.Category,
BatchNo.BatchID
DROP TABLE TEMP
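The same "generate every candidate, then keep the ones that are absent" idea can be checked against the question's own Category/BatchNo pairs with the sketch below. It uses Python's sqlite3 module and SQLite's WITH RECURSIVE as a stand-in for the SQL Server CTE above (an assumption for self-containment). The batches table and the limits and candidates CTE names are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batches (Category TEXT, BatchNo INT)")
# Distinct (Category, BatchNo) pairs taken from the question's table.
conn.executemany(
    "INSERT INTO batches VALUES (?, ?)",
    [("CAT1", 1), ("CAT1", 2), ("CAT1", 3), ("CAT1", 5), ("CAT1", 7),
     ("CAT2", 1), ("CAT2", 3), ("CAT2", 4), ("CAT2", 6)],
)

query = """
WITH RECURSIVE
limits (category, max_batch) AS (
    SELECT Category, MAX(BatchNo) FROM batches GROUP BY Category
),
candidates (category, batch_no, max_batch) AS (
    -- start every category at 1 and count up to its own maximum
    SELECT category, 1, max_batch FROM limits
    UNION ALL
    SELECT category, batch_no + 1, max_batch
    FROM candidates
    WHERE batch_no < max_batch
)
SELECT c.category, c.batch_no
FROM candidates AS c
WHERE NOT EXISTS (
    SELECT 1 FROM batches AS b
    WHERE b.Category = c.category AND b.BatchNo = c.batch_no
)
ORDER BY c.category, c.batch_no
"""
missing = conn.execute(query).fetchall()
print(missing)
```

Because each category stops at its own maximum, batches beyond the last one inserted (CAT1 8, CAT2 7) are never reported, which matches the requirement in the question.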
This one uses a Tally Table. For reference: http://www.sqlservercentral.com/articles/T-SQL/62867/
SAMPLE DATA
create table MyTable(
Category varchar(10),
BatchNo int,
TransactionNo int
)
insert into MyTable values
('CAT1', 1, 1),
('CAT1', 1, 2),
('CAT1', 2, 3),
('CAT1', 2, 4),
('CAT1', 2, 5),
('CAT1', 3, 6),
('CAT1', 3, 7),
('CAT1', 3, 8),
('CAT1', 5, 12),
('CAT1', 5, 13),
('CAT1', 5, 14),
('CAT1', 5, 15),
('CAT1', 7, 18),
('CAT2', 1, 1),
('CAT2', 1, 2),
('CAT2', 3, 6),
('CAT2', 3, 7),
('CAT2', 3, 8),
('CAT2', 3, 9),
('CAT2', 4, 10),
('CAT2', 4, 11),
('CAT2', 4, 12),
('CAT2', 6, 14);
SOLUTION
with e1(n) as (
select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1
), -- 10 rows
e2(n) as (select 1 from e1 a, e1 b), -- 10 * 10 = 100 rows
e4(n) as (select 1 from e2 a, e2 b), -- 100 * 100 = 10,000 rows
tally(n) as(
select
top (select top 1 BatchNo from MyTable order by BatchNo desc)
row_number() over(order by (select null))
from e4
)
select
c.Category,
t.n
from tally t
cross join(
select
Category,
max(BatchNo) as MaxBatchNo
from MyTable
group by Category
)c
left join MyTable m
on m.BatchNo = t.n
and m.Category = c.Category
where
m.Category is null
and t.n < c.MaxBatchNo
order by
c.Category,
t.n
It is better to create a projection table and use standard left join to find gaps:
create table #Sequencer (
Id int primary key
);
insert into #Sequencer (Id)
select top (1000) row_number() over(order by (select null)) from master.dbo.spt_values;
select *
from #Sequencer s
inner join (
select Category, max(BatchNo) as [Size] from dbo.Table group by Category
) cat on cat.Size > s.Id
left join (
select distinct Category, BatchNo from dbo.Table
) t on t.Category = cat.Category and t.BatchNo = s.Id
where t.BatchNo is null;
Of course, in real life you might need more than 1000 rows, so adjust it accordingly.
WITH Numbers AS (
SELECT MAX(BatchNo) AS Number
FROM #MyTable
UNION ALL
SELECT Number - 1
FROM Numbers
WHERE Number > 1
)
,CategorySizes AS (
SELECT Category
,MIN(BatchNo) AS StartBatch
,MAX(BatchNo) AS EndBatch
FROM #MyTable
GROUP BY Category
)
,PossibleBatches AS (
SELECT Category
,Numbers.Number AS BatchNo
FROM CategorySizes
CROSS JOIN Numbers
WHERE Numbers.Number BETWEEN CategorySizes.StartBatch AND CategorySizes.EndBatch
)
,MissingBatches AS (
SELECT PossibleBatches.Category
,PossibleBatches.BatchNo
FROM PossibleBatches
LEFT JOIN #MyTable
ON #MyTable.Category = PossibleBatches.Category
AND #MyTable.BatchNo = PossibleBatches.BatchNo
WHERE #MyTable.BatchNo IS NULL
)
SELECT *
FROM MissingBatches
Without a loop or cursor, you can use this one (#Category is my equivalent of your table name). It performs well.
CREATE TABLE #t (RN INT IDENTITY, Category VARCHAR(255), BatchNo INT)
INSERT INTO #t (Category, BatchNo)
SELECT DISTINCT Category, BatchNo
FROM #Category
ORDER BY Category, BatchNo -- RN must follow BatchNo order within each Category
SELECT a.Category,a.BatchNo+1 AS BatchNo
FROM #t a
CROSS APPLY (SELECT * FROM #t b
WHERE a.RN+1 = b.RN AND
a.Category = b.Category AND
a.BatchNo+1 != b.BatchNo) x
create table #cat(
Category varchar(10),
BatchNo int,
TransactionNo int
)
insert into #cat values
('CAT1', 1, 1),
('CAT1', 1, 2),
('CAT1', 2, 3),
('CAT1', 2, 4),
('CAT1', 2, 5),
('CAT1', 3, 6),
('CAT1', 3, 7),
('CAT1', 3, 8),
('CAT1', 5, 9),
('CAT1', 7, 10),
('CAT2', 1, 1),
('CAT2', 1, 2),
('CAT2', 3, 3),
('CAT2', 4, 4),
('CAT2', 4, 5),
('CAT2', 4, 6),
('CAT2', 6, 7);
SELECT DISTINCT C.Category, C.BatchNo + 1
FROM #cat c
OUTER APPLY
(
SELECT *
FROM #cat c1
WHERE C1.BatchNo = C.BatchNo + 1 AND C1.Category = C.Category
) C2
WHERE C2.BatchNo IS NULL
AND
C.BatchNo <> (SELECT MAX(BatchNo) FROM #cat C3 WHERE c3.Category = c.Category)
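One caveat with the two neighbour-comparison queries above: they report only BatchNo + 1, so when two or more consecutive batches are missing, only the first one shows up. A minimal sketch that returns each gap as a start/end range using LEAD() follows. It runs through Python's sqlite3 module (an assumption for self-containment; LEAD() also exists in SQL Server 2012 and later), and the batches table and its rows are illustrative, chosen so that CAT1 has a two-batch gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batches (Category TEXT, BatchNo INT)")
# Illustrative rows: CAT1 is missing 3 and 4, CAT2 is missing 2.
conn.executemany(
    "INSERT INTO batches VALUES (?, ?)",
    [("CAT1", 1), ("CAT1", 1), ("CAT1", 2), ("CAT1", 5),
     ("CAT2", 1), ("CAT2", 3), ("CAT2", 3)],
)

query = """
SELECT Category,
       BatchNo + 1    AS gap_start,
       next_batch - 1 AS gap_end
FROM (
    -- pair every distinct batch with the next batch of the same category
    SELECT Category, BatchNo,
           LEAD(BatchNo) OVER (PARTITION BY Category ORDER BY BatchNo) AS next_batch
    FROM (SELECT DISTINCT Category, BatchNo FROM batches) AS d
) AS paired
WHERE next_batch > BatchNo + 1
ORDER BY Category, gap_start
"""
gaps = conn.execute(query).fetchall()
print(gaps)
```

The last batch of each category has a NULL next_batch, so it is filtered out and nothing past the current maximum is reported.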

SQL: Find filled area [duplicate]

This question already has answers here:
Finding filled rectangles given x, y coordinates using SQL
(2 answers)
Closed 8 years ago.
I have this table:
CREATE TABLE coordinates (
x INTEGER NOT NULL,
y INTEGER NOT NULL,
color VARCHAR(1) NOT NULL,
PRIMARY KEY(x,y)
)
Here are some sample data:
INSERT INTO coordinates
(x, y, color)
VALUES
(0, 4, 'g'),
(1, 0, 'g'),
(1, 1, 'g'),
(1, 2, 'g'),
(1, 3, 'g'),
(0, 4, 'g'),
(1, 0, 'g'),
(1, 1, 'g'),
(1, 2, 'g'),
(1, 3, 'g'),
(1, 4, 'g'),
(2, 0, 'b'),
(2, 1, 'g'),
(2, 2, 'g'),
(2, 3, 'g'),
(2, 4, 'g'),
(4, 0, 'b'),
(4, 1, 'r'),
(4, 2, 'r'),
(4, 3, 'g'),
(4, 4, 'g'),
(6, 0, 'r'),
(6, 1, 'g'),
(6, 2, 'g'),
(6, 3, 'r'),
(6, 4, 'r')
;
I am trying to write a query that finds all the largest-area rectangles. This is assuming a rectangle is defined by its bottom left and top right, and that 1/4 is r, 1/4 is b, 1/4 is g, 1/4 is y.
So the result should look something like this:
x1 | y1 | x2 | y2 | area
-------------------------
0 1 6 9 58
1 2 4 7 58
Make a function that calculates the area and then query the function to get the biggest one.