SQL Server group by set of results - sql

I have a table with data that look like this:
product_id | filter_id
__________________
4525 5066
4525 5068
4525 5091
4526 5066
4526 5068
4526 5094
4527 5066
4527 5068
4527 5094
4528 5066
4528 5071
4528 5078
which is actually groups of three filters per product, e.g. product 4525 has the filters 5066, 5068 and 5091.
The second and third groups are the exact same set of filters (5066, 5068 and 5094) bound to different products (4526 and 4527).
I want each unique filter set to appear only once (in other words, I want to remove the duplicate sets of filter_ids). I don't really care what happens to the product_id; I only want my unique sets of three filter_ids to be grouped under a key.
For example this will also do:
new_id | filter_id
__________________
1 5066
1 5068
1 5091
2 5066
2 5068
2 5094
3 5066
3 5071
3 5078
I hope I explained it well enough.
Thank you.

Please try the query below, which is a bit longer than I expected. I can't think of a simpler approach right now.
select
distinct filter_id,
DENSE_RANK() over(order by sc) new_id
from(
select *,
(SELECT ' ' + cast(filter_id as nvarchar(10))
FROM tbl b where b.product_id=a.product_id order by filter_id
FOR XML PATH('')) SC
From tbl a
)x
order by new_id
/-------------- Other Way ------------------/
SELECT
DENSE_RANK() OVER (ORDER BY PRODUCT_ID) new_id,
filter_id
FROM
Table1
WHERE product_id in (
SELECT MIN(product_id) FROM(
SELECT
product_id,
SUM(filter_id*RN) OVER (PARTITION BY PRODUCT_ID) SM
FROM(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY filter_id) RN
FROM Table1
)x
)xx GROUP BY SM)

Select dense_rank()
over(order by product_id asc),filter_id
from table

If I understand the question correctly, the expected result only has the filter_ids of products 4525, 4526 and 4528: 4526 and 4527 have the same filter_ids, so only one of them is needed. In that case this query will do:
SELECT product_id
, dense_rank() OVER (ORDER BY PRODUCT_ID) new_id
, filter_id
FROM table1 c
WHERE NOT EXISTS (SELECT 1
FROM table1 a
LEFT JOIN table1 b ON a.product_id < b.product_id
WHERE b.product_id = c.product_id
GROUP BY a.product_id, b.product_id
HAVING COUNT(DISTINCT a.filter_id)
= COUNT(CASE WHEN a.filter_id = b.filter_id THEN 1
ELSE NULL
END));
To get the result, the first step is to remove the products whose filter_id list is a full duplicate of another product's list. To find those products, the subquery checks every pair of products to see whether the number of distinct filter_ids in one equals the number of filter_ids shared by the pair.
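The pairwise check can be mirrored outside the database to see which products survive. This is only a minimal Python sketch of the same idea, using the sample rows from the question; the helper name products_to_keep is made up for illustration and is not part of the answer's SQL.

```python
from collections import defaultdict

# Sample rows (product_id, filter_id) copied from the question.
ROWS = [
    (4525, 5066), (4525, 5068), (4525, 5091),
    (4526, 5066), (4526, 5068), (4526, 5094),
    (4527, 5066), (4527, 5068), (4527, 5094),
    (4528, 5066), (4528, 5071), (4528, 5078),
]


def products_to_keep(rows):
    """Hypothetical helper: product_ids that pass the NOT EXISTS check."""
    filters = defaultdict(set)
    for product_id, filter_id in rows:
        filters[product_id].add(filter_id)

    kept = []
    for c in sorted(filters):
        # c is dropped when some earlier product a shares all of a's filters
        # with c, i.e. COUNT(DISTINCT a.filter_id) equals the shared count.
        duplicated = any(
            a < c and len(filters[a]) == len(filters[a] & filters[c])
            for a in filters
        )
        if not duplicated:
            kept.append(c)
    return kept


print(products_to_keep(ROWS))
```

With the sample data, 4527 is dropped because 4526 comes earlier and has the same three filters.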
If products can have different numbers of filters, and a product whose filter list is fully contained in the filter list of another product should also be removed from the result, for example if, with the base data
product_id | filter_id
-----------+----------
4525 | 5066
4525 | 5068
4525 | 5091
4526 | 5066
4526 | 5068
the expected result is
new_id | filter_id
-------+----------
1 | 5066
1 | 5068
1 | 5091
the query needs to be changed to
SELECT product_id
, dense_rank() OVER (ORDER BY PRODUCT_ID) new_id
, filter_id
FROM table1 c
WHERE NOT EXISTS (SELECT b.product_id
FROM table1 a
LEFT JOIN table1 b ON a.product_id < b.product_id
WHERE b.product_id IS NOT NULL
AND b.product_id = c.product_id
GROUP BY a.product_id, b.product_id
HAVING COUNT(DISTINCT a.filter_id)
= COUNT(CASE WHEN a.filter_id = b.filter_id THEN 1
ELSE NULL
END)
OR COUNT(DISTINCT b.filter_id)
= COUNT(CASE WHEN a.filter_id = b.filter_id THEN 1
ELSE NULL
END));
I came up with a query quite similar to TechDo's second one, nine hours after him. The result is similar but the idea is different: I concatenate the filter_id values using arithmetic.
;WITH B AS (
SELECT Product_ID
, filter_id = filter_id - MIN(filter_id) OVER (PARTITION BY NULL)
, _ID = Row_Number() OVER (PARTITION BY Product_ID ORDER BY filter_id) - 1
, N = CEILING(LOG10(MAX(filter_id) OVER (PARTITION BY NULL)
- MIN(filter_id) OVER (PARTITION BY NULL)))
FROM table1 a
), G1 AS (
SELECT Product_ID
, _ID = SUM(Filter_ID * POWER(10, N * _ID))
FROM B
GROUP BY Product_ID
), G2 AS (
SELECT Product_ID = MIN(Product_ID)
FROM G1
GROUP BY _ID
)
SELECT g2.product_id
, dense_rank() OVER (ORDER BY g2.PRODUCT_ID) new_id
, a.filter_id
FROM G2
INNER JOIN table1 a ON g2.product_id = a.product_id;
The first CTE does a lot of work:
filter_id is reduced by subtracting the smallest filter_id (this shrinks the number of digits needed, depending on the range of the data)
an order number for the filter within the product is generated (_ID)
the maximum number of digits of the reduced filter_id is calculated (N)
In the following CTE those values are used to build the filter concatenation with SUM: the formula SUM(Filter_ID * POWER(10, N * _ID)) places a reduced filter_id every N digits. For example, with the data provided by the OP the maximum difference between filter_ids is 28, so N is 2 and the results are (the dots are added for readability)
Product_ID _ID
----------- -----------
4525 25.02.00
4526 28.02.00
4527 28.02.00
4528 12.05.00
The formula makes collisions between different filter groups impossible as long as every reduced filter_id fits in N digits, but it needs more room to be calculated: if the range of filter_id is big, it can hit the limit of the integer type.
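The arithmetic can be replayed in a few lines of Python to check the table above. This is just a sketch of the same formula on the OP's sample rows; the function name encode_filter_sets is hypothetical, and it assumes the filter_id range is larger than 1 so that the logarithm is positive.

```python
import math
from collections import defaultdict

# Sample rows (product_id, filter_id) copied from the question.
ROWS = [
    (4525, 5066), (4525, 5068), (4525, 5091),
    (4526, 5066), (4526, 5068), (4526, 5094),
    (4527, 5066), (4527, 5068), (4527, 5094),
    (4528, 5066), (4528, 5071), (4528, 5078),
]


def encode_filter_sets(rows):
    """Hypothetical helper: one integer per product, built like the G1 CTE."""
    lowest = min(f for _, f in rows)
    highest = max(f for _, f in rows)
    # N = digits reserved per filter; assumes highest - lowest > 1.
    n = math.ceil(math.log10(highest - lowest))

    reduced = defaultdict(list)
    for product_id, filter_id in rows:
        reduced[product_id].append(filter_id - lowest)

    # Position i (0-based, filters sorted ascending) is shifted by N * i digits.
    return {
        product_id: sum(v * 10 ** (n * i) for i, v in enumerate(sorted(values)))
        for product_id, values in reduced.items()
    }


codes = encode_filter_sets(ROWS)
print(codes)
```

Products 4526 and 4527 get the same code, which is what lets the last CTE keep only MIN(Product_ID) per code.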

Related

Find the percentage of a group by count row in sql

I have a table as
Person| Count
A | 10
B | 20
C | 30
I use code as below to get above table:
select new_table.person, count(new_table.person)
from (person_table_1
inner join person_table_2
on person_table_1.user_name = person_table_2.user_all_name) new_table
group by new_table.person
However, I wish to have the percentage for each row based on overall sum in count.
Expected:
Person| Count | Percentage
A | 10 | 0.167
B | 20 | 0.333
C | 30 | 0.500
I want it rounded to 3 decimal places. Can anyone please help me? Thank you.
Just use a subquery in the SELECT clause (multiplying by 1.0 so that engines with integer division don't truncate the ratio to 0):
select p1.person, count(p1.person), count(p1.person) * 1.0 / (SELECT COUNT(p2.person) FROM person_table p2)
from person_table p1
group by p1.person
Edit: if you want only 3 decimal places:
select
p1.person,
count(p1.person),
ROUND(count(p1.person) * 1.0 / (SELECT COUNT(p2.person) FROM person_table p2), 3)
from person_table p1
group by p1.person
Edit 2: OP edited his/her table
select
new_table.person,
count(new_table.person),
ROUND(
count(new_table.person) * 1.0 /
(SELECT COUNT(new_table_COUNTER.person) FROM (
person_table_1
inner join person_table_2
on person_table_1.user_name = person_table_2.user_all_name
) new_table_COUNTER), 3)
from
(
person_table_1
inner join person_table_2
on person_table_1.user_name = person_table_2.user_all_name
) new_table
group by new_table.person
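A quick way to check the percentage calculation is to run it against an in-memory database. The sketch below uses Python's standard sqlite3 module, so the SQL is SQLite dialect and only an approximation of whatever engine the OP uses; the single table person_table with 10/20/30 rows is made up to match the expected counts. It also shows why the ratio is multiplied by 1.0: dividing two integers truncates in SQLite, as it does in several other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_table (person TEXT)")
# Made-up rows so that A, B and C have counts 10, 20 and 30.
conn.executemany(
    "INSERT INTO person_table VALUES (?)",
    [("A",)] * 10 + [("B",)] * 20 + [("C",)] * 30,
)

# Integer division truncates here, so 10 / 60 is 0.
truncated = conn.execute("SELECT 10 / 60").fetchone()[0]

rows = conn.execute(
    """
    SELECT person,
           COUNT(*) AS cnt,
           ROUND(COUNT(*) * 1.0 / (SELECT COUNT(*) FROM person_table), 3) AS pct
    FROM person_table
    GROUP BY person
    ORDER BY person
    """
).fetchall()

print(truncated)
for row in rows:
    print(row)
```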
Try the query below:
declare @tbl table ([person] varchar(5));
insert into @tbl values
('a'),('a'),('a'),('b'),('b'),('c');
-- we take max(rowsCnt) here, but any value would do, because every value in that column is the same
select person, count(*) * 1.0 / max(rowsCnt) [percentage] from (
select person,
count(*) over (partition by (select null)) rowsCnt
from @tbl
) a group by person

Why count ignores grouping by

I don't understand why my query doesn't group results of count by the column I specified. Instead it counts all occurrences of outcome_id in the 'un' subtable.
What am I missing there?
The full structure of my sample database and the query I tried are here:
https://www.db-fiddle.com/f/4HuLpTFWaE2yBSQSzf3dX4/4
CREATE TABLE combination (
combination_id integer,
ticket_id integer,
outcomes integer[]
);
CREATE TABLE outcome (
outcome_id integer,
ticket_id integer,
val double precision
);
insert into combination
values
(510,188,'{52,70,10}'),
(511,188,'{52,56,70,18,10}'),
(512,188,'{55,70,18,10}'),
(513,188,'{54,71,18,10}'),
(514,189,'{52,54,71,18,10}'),
(515,189,'{55,71,18,10,54,56}')
;
insert into outcome
values
(52,188,1.3),
(70,188,2.1),
(18,188,2.6),
(56,188,2),
(55,188,1.1),
(54,188,2.2),
(71,188,3),
(10,188,0.5),
(54,189,2.2),
(71,189,3),
(18,189,2.6),
(55,189,2);
with un AS (
SELECT combination_id, unnest(outcomes) outcome
FROM combination c JOIN
outcome o
on o.ticket_id = c.ticket_id
GROUP BY 1,2
)
SELECT combination_id, cnt
FROM (SELECT un.combination_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
FROM un JOIN
outcome o
on o.outcome_id = un.outcome
GROUP BY 1
) x
GROUP BY 1, 2
ORDER BY 1
Expected result should be:
510 2
511 4
512 2
513 3
514 4
515 4
Assuming you have these PK constraints:
CREATE TABLE combination (
combination_id integer PRIMARY KEY
, ticket_id integer
, outcomes integer[]
);
CREATE TABLE outcome (
outcome_id integer
, ticket_id integer
, val double precision
, PRIMARY KEY (ticket_id, outcome_id)
);
and assuming this objective:
For each row in table combination, count the number of array elements in outcomes for which there is at least one row with matching outcome_id and ticket_id in table outcome - and val >= 1.3.
Assuming the PK above, this boils down to a much simpler query:
SELECT c.combination_id, count(*) AS cnt
FROM combination c
JOIN outcome o USING (ticket_id)
WHERE o.outcome_id = ANY (c.outcomes)
AND o.val >= 1.3
GROUP BY 1
ORDER BY 1;
This alternative might be faster with index support:
SELECT c.combination_id, count(*) AS cnt
FROM combination c
CROSS JOIN LATERAL unnest(c.outcomes) AS u(outcome_id)
WHERE EXISTS (
SELECT
FROM outcome o
WHERE o.outcome_id = u.outcome_id
AND o.val >= 1.3
AND o.ticket_id = c.ticket_id -- ??
)
GROUP BY 1
ORDER BY 1;
Plus, it does not require the PK on outcome. Any number of matching rows still count as 1, due to EXISTS.
As always, the best answer depends on the exact definition of setup and requirements.
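To double-check the counts independently of any SQL engine, the objective stated above can be replayed in plain Python on the sample data from the question. This is only a sketch: the function name count_hits is made up, and the dictionary keyed by (ticket_id, outcome_id) mirrors the assumed primary key.

```python
# Sample data copied from the question.
COMBINATIONS = [
    (510, 188, [52, 70, 10]),
    (511, 188, [52, 56, 70, 18, 10]),
    (512, 188, [55, 70, 18, 10]),
    (513, 188, [54, 71, 18, 10]),
    (514, 189, [52, 54, 71, 18, 10]),
    (515, 189, [55, 71, 18, 10, 54, 56]),
]

# Keyed by (ticket_id, outcome_id), like the assumed PK on table outcome.
OUTCOMES = {
    (188, 52): 1.3, (188, 70): 2.1, (188, 18): 2.6, (188, 56): 2.0,
    (188, 55): 1.1, (188, 54): 2.2, (188, 71): 3.0, (188, 10): 0.5,
    (189, 54): 2.2, (189, 71): 3.0, (189, 18): 2.6, (189, 55): 2.0,
}


def count_hits(combinations, outcomes, threshold=1.3):
    """Hypothetical helper: per combination, count array elements that have
    a row for the same ticket_id with val >= threshold."""
    result = {}
    for combination_id, ticket_id, outcome_ids in combinations:
        result[combination_id] = sum(
            1
            for outcome_id in outcome_ids
            if outcomes.get((ticket_id, outcome_id), float("-inf")) >= threshold
        )
    return result


counts = count_hits(COMBINATIONS, OUTCOMES)
print(counts)
```

On this data, combination 514 comes out as 3 rather than 4, because outcomes 52 and 10 have no row for ticket 189.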
A simpler version of @forpas's answer:
-- You don't need to join to outcomes in the "with" statement.
with un AS (
SELECT combination_id, ticket_id, unnest(outcomes) outcome
FROM combination c
-- no need to join to outcomes here
GROUP BY 1,2,3
)
SELECT combination_id, cnt FROM
(
SELECT un.combination_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
FROM un
JOIN outcome o on o.outcome_id = un.outcome
and o.ticket_id = un.ticket_id
GROUP BY 1
)x
GROUP BY 1,2
ORDER BY 1
As others have pointed out, the expected result for 514 should be 3 based on your input data.
I'd also like to suggest that using full field names in the group by and order by clauses makes queries easier to debug and maintain going forward.
You need to join on ticket_id also:
with un AS (
SELECT c.combination_id, c.ticket_id, unnest(c.outcomes) outcome
FROM combination c JOIN outcome o
on o.ticket_id = c.ticket_id
GROUP BY 1,2,3
)
SELECT combination_id, cnt
FROM (SELECT un.combination_id, un.ticket_id,
COUNT(CASE WHEN o.val >= 1.3 THEN 1 END) as cnt
FROM un JOIN outcome o
on o.outcome_id = un.outcome and o.ticket_id = un.ticket_id
GROUP BY 1,2
) x
GROUP BY 1, 2
ORDER BY 1
See the demo.
Results:
> combination_id | cnt
> -------------: | --:
> 510 | 2
> 511 | 4
> 512 | 2
> 513 | 3
> 514 | 3
> 515 | 4

how to get value using latest date from one table and joining to another table

I have a table inventory_movement; here is its data:
product_id | staff_name | status | sum | reference_number
--------------------------------------------------
1 zes cp 1 000122
2 shan cp 4 000133
I have another table, inventory_orderproduct, which holds the cost and the order date:
orderdate product_id cost
--------------------------------
01/11/2018 1 3200
01/11/2018 2 100
02/11/2018 1 4000
02/11/2018 1 500
03/11/2018 2 2000
I want this result:
product_id | staff_name | status | sum | reference_number | cost
--------------------------------------------------------------
1 zes cp 1 000122 4000
2 shan cp 4 000133 2000
Here is my query:
select ipm.product_id,
case when ipm.order_by_id is not null then
(select au.first_name from users_staffuser us inner join auth_user au on us.user_id= au.id
where us.id = ipm.order_by_id) else '0' end as "Staff_name"
,ipm.status,
Sum(ipm.quantity), ip.reference_number
from inventory_productmovement ipm
inner join inventory_product ip on ipm.product_id = ip.id
inner join users_staffuser us on ip.branch_id = us.branch_id
inner join auth_user au on us.user_id = au.id
AND ipm.status = 'CP'
group by ipm.product_id, au.first_name, ipm.status,
ip.reference_number, ip.product_name
order by 1
Here is a solution to your question; it works fine.
SELECT i.product_id, i.staff_name, i.status, i.sum, i.reference_number, s.Cost
FROM (SELECT product_id,MAX(cost) AS Cost
FROM inventory_orderproduct
GROUP BY product_id ) s
JOIN inventory_movement i ON i.product_id =s.product_id
In the given situation, this should work fine (table1 stands for inventory_orderproduct and table2 for inventory_movement):
Select table2.product_id, table2.staff_name, table2.status, table2.reference_number,
MAX(table1.cost)
FROM table2
LEFT JOIN table1 ON table1.product_id = table2.product_id
GROUP BY table2.product_id, table2.staff_name, table2.status, table2.reference_number
You can use the query below to get the MAX cost per product:
SELECT i.product_id, i.staff_name, i.status, i.sum, i.reference_number, s.MAXCost
FROM (SELECT product_id,MAX(cost) AS MAXCost
FROM inventory_orderproduct
GROUP BY product_id ) s
JOIN inventory_movement i ON i.product_id =s.product_id
To retrieve the cost for the latest date, use the query below:
WITH cte as (
SELECT product_id,cost
,ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY orderdate DESC) AS Rno
FROM inventory_orderproduct )
SELECT i.product_id, i.staff_name, i.status, i.sum, i.reference_number, s.Cost
FROM cte s
JOIN inventory_movement i ON i.product_id =s.product_id
WHERE s.Rno=1
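One thing worth checking with the sample data is that product 1 has two rows on its latest date (costs 4000 and 500), so ordering by the date alone does not say which one gets row number 1. The Python sketch below picks the latest row per product and breaks ties by the higher cost; that tie-break, the dd/mm/yyyy date format and the function name latest_cost are all assumptions made for illustration.

```python
from datetime import datetime

# (orderdate, product_id, cost) copied from the question.
ORDERS = [
    ("01/11/2018", 1, 3200),
    ("01/11/2018", 2, 100),
    ("02/11/2018", 1, 4000),
    ("02/11/2018", 1, 500),
    ("03/11/2018", 2, 2000),
]


def latest_cost(orders):
    """Hypothetical helper: cost of the most recent order per product.

    Ties on the date are broken by the higher cost (an assumption; the
    question does not say which row should win).
    """
    best = {}
    for date_text, product_id, cost in orders:
        # Assumes dd/mm/yyyy, which the question does not state explicitly.
        key = (datetime.strptime(date_text, "%d/%m/%Y"), cost)
        if product_id not in best or key > best[product_id]:
            best[product_id] = key
    return {product_id: key[1] for product_id, key in best.items()}


print(latest_cost(ORDERS))
```

In SQL the same tie-break would mean adding a second sort key, for example cost DESC, to the ORDER BY inside ROW_NUMBER().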
You can use the query below; it picks the data according to the latest date:
WITH result as (
SELECT product_id,cost
,ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY orderdate DESC) AS rn
FROM inventory_orderproduct )
SELECT i.product_id, i.staff_name, i.status, i.sum, i.reference_number, s.cost
FROM result s
JOIN inventory_movement i ON i.product_id =s.product_id
WHERE s.rn = 1

How to select all records of n groups?

I want to select the records of the top n groups. My data looks like this:
Table 'runner':
id gid status rtime
---------------------------
100 5550 1 2016-08-19
200 5550 2 2016-08-22
300 5550 1 2016-08-30
100 6050 3 2016-09-01
200 6050 1 2016-09-02
100 6250 1 2016-09-11
200 6250 1 2016-09-15
300 6250 3 2016-09-19
Table 'static'
id description env
-------------------------------
100 something 1 somewhere 1
200 something 2 somewhere 2
300 something 3 somewhere 3
The unit id (id) is unique within a group but not unique in its column, because a new instance of the group is generated regularly. The group id (gid) is assigned to every unit of an instance and is never reused by another instance.
Now, combining the tables and selecting everything, or filtering by a specific value, is easy, but how do I select all records of, for example, the first two groups without directly referring to the group ids?
Expected result would be:
id gid description status rtime
--------------------------------------
300 6250 something 2 3 2016-09-19
200 6250 something 1 1 2016-09-15
100 6250 something 3 1 2016-09-11
200 6050 something 2 1 2016-09-02
100 6050 something 1 3 2016-09-01
Extra Question: When I filter for a timeframe like this:
[...]
WHERE runner.rtime BETWEEN '2016-08-25' AND '2016-09-16'
Is there a simple way of ensuring that groups are not cut off, but either appear with all their records or not at all?
You can use a ROW_NUMBER() to do this. First, create a query to rank groups:
SELECT gid, ROW_NUMBER() over (order by gid desc) as RN
FROM Runner
GROUP BY gid
Then use this as a derived table to get your other info, and use a where clause to filter to the number of groups you want to see. For instance, the query below returns the top 5 groups (RN <= 5):
SELECT R.id, R.gid, description, status, rtime
FROM (SELECT gid, ROW_NUMBER() over (order by gid desc) as RN
FROM Runner
GROUP BY gid) G
INNER JOIN Runner R on R.gid = G.gid
INNER JOIN Static S on S.id = R.id
WHERE RN <= 5 --Change this to see more or fewer groups
For your second question about dates, you can do this with a subquery like so:
SELECT *
FROM Runner
WHERE gid IN (SELECT gid
FROM Runner
WHERE rtime BETWEEN '2016-08-25' AND '2016-09-16')
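Both ideas (pick the top n group ids first, then fetch every row of those groups; and keep a group whole if any of its rows falls in the timeframe) can be tried on the sample data in plain Python. This is only a sketch of the logic, not a replacement for the SQL; the function names top_groups and groups_touching are made up.

```python
# (id, gid, status, rtime) copied from the question's 'runner' table.
RUNNER = [
    (100, 5550, 1, "2016-08-19"),
    (200, 5550, 2, "2016-08-22"),
    (300, 5550, 1, "2016-08-30"),
    (100, 6050, 3, "2016-09-01"),
    (200, 6050, 1, "2016-09-02"),
    (100, 6250, 1, "2016-09-11"),
    (200, 6250, 1, "2016-09-15"),
    (300, 6250, 3, "2016-09-19"),
]


def top_groups(rows, n):
    """Hypothetical helper: all rows of the n groups with the highest gid."""
    wanted = sorted({gid for _, gid, _, _ in rows}, reverse=True)[:n]
    return [row for row in rows if row[1] in wanted]


def groups_touching(rows, start, end):
    """Hypothetical helper: all rows of every group that has at least one
    row with start <= rtime <= end (ISO dates compare correctly as text)."""
    hit = {gid for _, gid, _, rtime in rows if start <= rtime <= end}
    return [row for row in rows if row[1] in hit]


print(top_groups(RUNNER, 2))
print(groups_touching(RUNNER, "2016-08-25", "2016-09-16"))
```

Note that with the timeframe from the question, group 5550 is returned in full because one of its rows (2016-08-30) falls inside the range.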
Hmmm. I suspect this might do what you want:
select top (1) with ties r.*
from runner r
order by min(rtime) over (partition by gid), gid;
At least, this will get the complete first group.
In any case, the idea is to include gid as a key in the order by and to use top with ties.
You can do the following:
with report as(
select n.id,n.gid,m.description,n.status,n.rtime, dense_rank() over(order by gid desc) as RowNum
from #table1 n
inner join #table2 m on n.id = m.id )
select id,gid,description,status,rtime
from report
where RowNum<=2 -- <-- here n=2
order by gid desc,rtime desc
DENSE_RANK looks like an ideal solution here
Select * From
(
select DENSE_RANK() over (order by gid desc) as D_RN, r.*
from runner r
) A
Where D_RN <= 2 -- number of groups to return
No need to use ranking functions (ROW_NUMBER, DENSE_RANK etc).
SELECT r.id, gid, [description], [status], rtime
FROM runner r
INNER JOIN static s ON r.id = s.id
WHERE gid IN (
SELECT TOP 2 gid FROM runner GROUP BY gid ORDER BY gid DESC
)
ORDER BY rtime DESC;
The same using CTE:
WITH grouped
AS
(
SELECT TOP 2 gid
FROM runner GROUP BY gid ORDER BY gid DESC
)
SELECT r.id, grouped.gid, [description], [status], rtime
FROM runner r
INNER JOIN static s ON r.id = s.id
INNER JOIN grouped ON r.gid = grouped.gid
ORDER BY rtime DESC;

How can I select unique values from several columns in Oracle SQL?

Basically, I've got the following table:
ID | Amount
AA | 10
AA | 20
BB | 30
BB | 40
CC | 10
CC | 50
DD | 20
DD | 60
EE | 30
EE | 70
I need to get unique entries in each column as in following example:
ID | Amount
AA | 10
BB | 30
CC | 50
DD | 60
EE | 70
So far the following snippet gives almost what I want, but first_value() may return a value that isn't unique within the column:
first_value(Amount) over (partition by ID)
DISTINCT isn't helpful either, as it returns unique rows, not unique values per column.
EDIT:
Selection order doesn't matter
This works for me, even with the problematic combinations mentioned by Dimitri. I don't know how fast that is for larger volumes though
with ids as (
select id, row_number() over (order by id) as rn
from data
group by id
), amounts as (
select amount, row_number() over (order by amount) as rn
from data
group by amount
)
select i.id, a.amount
from ids i
join amounts a on i.rn = a.rn;
SQLFiddle currently doesn't work for me, here is my test script:
create table data (id varchar(10), amount integer);
insert into data values ('AA',10);
insert into data values ('AA',20);
insert into data values ('BB',30);
insert into data values ('BB',40);
insert into data values ('CC',10);
insert into data values ('CC',50);
insert into data values ('DD',20);
insert into data values ('DD',60);
insert into data values ('EE',30);
insert into data values ('EE',70);
Output:
id | amount
---+-------
AA | 10
BB | 20
CC | 30
DD | 40
EE | 50
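What the two CTEs do is pair the i-th distinct id with the i-th distinct amount. A small Python sketch of that pairing on the same test data (the function name pair_by_rank is made up) reproduces the output above:

```python
# (id, amount) rows copied from the test script.
DATA = [
    ("AA", 10), ("AA", 20), ("BB", 30), ("BB", 40), ("CC", 10),
    ("CC", 50), ("DD", 20), ("DD", 60), ("EE", 30), ("EE", 70),
]


def pair_by_rank(rows):
    """Hypothetical helper: i-th distinct id paired with i-th distinct amount."""
    ids = sorted({row_id for row_id, _ in rows})
    amounts = sorted({amount for _, amount in rows})
    # zip stops at the shorter list, like the inner join on rn.
    return list(zip(ids, amounts))


pairs = pair_by_rank(DATA)
print(pairs)
# Pairs that never occurred together in the source rows.
print([pair for pair in pairs if pair not in DATA])
```

Every id and every amount in the output is unique, but most of the pairs (for example BB with 20) do not exist as rows in the source table, which may or may not be acceptable for the OP.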
I suggest using row_number() like this:
select ID ,Amount
from (
select ID ,Amount, row_number() over(partition by id order by 1) as rn
from yourtable
)
where rn = 1
However, your expected results don't follow a discernible order: some are the first/lowest while others are the last/highest, so I wasn't sure what to use for the ordering.
My solution uses a recursive WITH and works as follows: first, select the minimal values of ID and amount; then, for every next level, search for values of ID and amount that are greater than those already chosen (this provides uniqueness); at the end, the query selects one row for every recursion level. But this is not an ultimate solution, because it is possible to find a combination of source data for which the query will not work (I suppose that a fully general solution is impossible, at least in SQL).
with r (id, amount, lvl) as (select min(id), min(amount), 1
from t
union all
select t.id, t.amount, r.lvl + 1
from t, r
where t.id > r.id and t.amount > r.amount)
select lvl, min(id), min(amount)
from r
group by lvl
order by lvl
I knew there was an elegant solution! Thanks to a friend of mine for the tip:
select max(ID), mAmount from (
select ID, max(Amount) mAmount from table group by ID
)
group by mAmount;
Maybe something like this can solve it:
WITH tx AS
( SELECT ROWNUM ROW_NUMBER,
t.id,
t.amount
FROM test t
INNER JOIN test t2
ON t.id = t2.id
AND t.amount != t2.amount
ORDER BY t.id)
SELECT tx1.id, tx1.amount
FROM tx tx1
LEFT JOIN tx tx2
ON tx1.id = tx2.id
AND tx1.ROW_NUMBER > tx2.ROW_NUMBER
WHERE tx2.ROW_NUMBER IS NULL