I have a table with data like the following and want to return the group_ids whose data is unique. Both group_id 3 and 4 have the same two components, 123 and 456, so they are "duplicates" and we only need to return the smaller group_id, which is 3. group_id 5 has no duplicate, so it can be returned as well. So we want group_id 3 and 5 to be returned.
How can I write a SQL query against a Postgres database to achieve that? Thank you!
id group_id component_id
1 3 123
2 3 456
3 4 123
4 4 456
5 5 123
Use 2 levels of aggregation:
SELECT MIN(group_id) group_id
FROM (
SELECT group_id, STRING_AGG(component_id::text, ',' ORDER BY component_id) components
FROM tablename
GROUP BY group_id
) t
GROUP BY components;
See the demo.
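The two aggregation levels can also be traced outside the database. This is only a sketch in Python (not the Postgres query itself), using the sample rows from the question; the function name `unique_group_ids` is made up for illustration. Level 1 builds a sorted component "signature" per group, mirroring STRING_AGG with ORDER BY, and level 2 keeps the smallest group_id per signature, mirroring MIN(group_id) ... GROUP BY components.

```python
from collections import defaultdict

# Rows of (id, group_id, component_id), copied from the question's sample data.
rows = [
    (1, 3, 123),
    (2, 3, 456),
    (3, 4, 123),
    (4, 4, 456),
    (5, 5, 123),
]


def unique_group_ids(rows):
    """Return the smallest group_id for each distinct list of components."""
    # Level 1: collect the components of every group.
    components_by_group = defaultdict(list)
    for _id, group_id, component_id in rows:
        components_by_group[group_id].append(component_id)

    # Level 2: key each group by its sorted components, keep the minimum group_id.
    smallest_by_signature = {}
    for group_id, components in components_by_group.items():
        signature = tuple(sorted(components))
        current = smallest_by_signature.get(signature)
        if current is None or group_id < current:
            smallest_by_signature[signature] = group_id

    return sorted(smallest_by_signature.values())


print(unique_group_ids(rows))  # [3, 5]
```

Sorting the components before comparing matters for the same reason the ORDER BY inside STRING_AGG does: without it, two groups with the same components in a different row order would not be recognised as duplicates.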
SELECT group_id, MIN(component_id)
FROM MyTable
GROUP BY group_id
HAVING COUNT(*) > 1
Here's a method to assign the group_id's to the component_id's.
It uses a recursive CTE with arrays to find the possible combinations.
The recursion starts from the lonely group_id's.
Then the next CTE picks one of the longest combinations.
WITH RECURSIVE RCTE AS (
SELECT id, group_id, component_id
, 1 as Lvl
, array[group_id] as group_ids
, array[component_id] as component_ids
FROM YourTable
WHERE group_id IN (
SELECT group_id
FROM YourTable
GROUP BY group_id
HAVING COUNT(*) = 1
)
UNION ALL
SELECT t.id, t.group_id, t.component_id
, Lvl+1
, cte.group_ids || t.group_id
, cte.component_ids || t.component_id
FROM RCTE cte
JOIN YourTable t
ON t.group_id != ALL(group_ids)
AND t.component_id != ALL(component_ids)
)
, CTE_ARRAYS AS (
SELECT group_ids, component_ids
FROM RCTE
ORDER BY array_length(group_ids, 1) desc, Lvl desc
LIMIT 1
)
SELECT a.group_id, a.component_id
FROM CTE_ARRAYS c
CROSS JOIN LATERAL UNNEST(c.group_ids, c.component_ids) WITH ORDINALITY AS a(group_id, component_id)
ORDER BY a.group_id;
group_id component_id
3 456
5 123
db<>fiddle here
I have this select, "Select * from table", which returns:
Id Value
1 1
1 1
2 10
2 10
My goal is to show, on every row, the sum of Value grouped by Id, like this:
Id Value Sum
1 1 2
1 1 2
2 10 20
2 10 20
I have tried things like:
SELECT Id,Value, (SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY IDRNC ) FROM Table v;
But this is not grouping by Id:
Id Value Sum
1 1 1
1 1 1
2 10 10
2 10 10
Aggregation with GROUP BY collapses rows, reducing the number of records in the output. In this case you want to attach the result of a computation to each of your records, which is the job of the corresponding window function.
SELECT table.*, SUM(Value) OVER(PARTITION BY Id) AS sum_
FROM table
Check the demo here.
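To see the difference between the two behaviours side by side, here is a minimal sketch that runs both forms against an in-memory SQLite database from Python. SQLite is only a stand-in for the asker's database, and the table name `t` is an assumption; it needs a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER, Value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 1), (1, 1), (2, 10), (2, 10)])

# GROUP BY collapses each Id to a single output row.
grouped_rows = conn.execute(
    "SELECT Id, SUM(Value) FROM t GROUP BY Id ORDER BY Id"
).fetchall()

# The window function keeps every row and repeats the per-Id sum on each one.
window_rows = conn.execute(
    """
    SELECT Id, Value, SUM(Value) OVER (PARTITION BY Id) AS sum_
    FROM t
    ORDER BY Id
    """
).fetchall()

print(grouped_rows)  # [(1, 2), (2, 20)]
print(window_rows)   # [(1, 1, 2), (1, 1, 2), (2, 10, 20), (2, 10, 20)]
```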
Your attempt looks correct.
Can you try the query below? It works for me:
SELECT Id, Value,
(SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY ID) as sum
FROM Table v;
You can do it with an inner join against a subquery grouped by id:
select t.*, sum
from _table t
inner join (
select id, sum(Value) as sum
from _table
group by id
) as s on s.id = t.id
You can check it here
Your select is ok if you adjust it just a little:
SELECT Id,Value, (SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY IDRNC ) FROM Table v;
GROUP BY IDRNC is a mistake and should be GROUP BY ID.
You should give the sum column an alias.
The subquery selecting the sum does not need its own table alias in order to be compared with the outer query, which has one (this is not a mistake; it works either way).
Test:
WITH
a_table (ID, VALUE) AS
(
Select 1, 1 From Dual Union All
Select 1, 1 From Dual Union All
Select 2, 10 From Dual Union All
Select 2, 10 From Dual
)
SELECT ID, VALUE, (SELECT SUM(VALUE) FROM a_table WHERE ID = v.ID GROUP BY ID) "ID_SUM" FROM a_table v;
ID VALUE ID_SUM
---------- ---------- ----------
1 1 2
1 1 2
2 10 20
2 10 20
user_id product_id category_id date_added date_update
1 2 1 2.3.2021 null
1 3 1 2.3.2020 2.4.2023
1 4 2 2.3.2020 null
1 5 2 2.3.2020 2.4.2023
2 5 2 2.3.2020 2.4.2023
2 4 1 2.3.2020 null
List the most up-to-date product of each category
You can use row_number()
select * from
(
select *, row_number() over (partition by user_id, category_id order by date_update desc nulls last) as rn
from tablename
)A where rn=1
OR you can also use distinct on
select distinct on (user_id,category_id) *
FROM tablename
ORDER BY user_id, category_id, date_update DESC NULLS LAST
List the most up-to-date product of each category
You can use distinct on. Let me assume that if the update date is null, then you want the creation date:
select distinct on (category_id) t.*
from t
order by category_id, coalesce(date_update, date_added) desc;
If you wanted this per user/category combination, the logic would be:
select distinct on (user_id, category_id) t.*
from t
order by user_id, category_id, coalesce(date_update, date_added) desc;
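DISTINCT ON is specific to Postgres. As a rough cross-check of the per-user/category logic, the sketch below gets the same "latest row per group" result with ROW_NUMBER() in an in-memory SQLite database driven from Python. Two assumptions are made here: the sample dates are read as day.month.year and stored as ISO strings so they sort correctly, and the table is called `t` as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (user_id INTEGER, product_id INTEGER, category_id INTEGER,"
    " date_added TEXT, date_update TEXT)"
)
# Sample rows from the question; dates rewritten as ISO strings (assumed d.m.yyyy).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 1, "2021-03-02", None),
        (1, 3, 1, "2020-03-02", "2023-04-02"),
        (1, 4, 2, "2020-03-02", None),
        (1, 5, 2, "2020-03-02", "2023-04-02"),
        (2, 5, 2, "2020-03-02", "2023-04-02"),
        (2, 4, 1, "2020-03-02", None),
    ],
)

# Rank rows inside each (user_id, category_id) by the effective date, newest first,
# falling back to date_added when date_update is NULL, then keep rank 1.
latest = conn.execute(
    """
    SELECT user_id, category_id, product_id
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id, category_id
                   ORDER BY COALESCE(date_update, date_added) DESC
               ) AS rn
        FROM t
    )
    WHERE rn = 1
    ORDER BY user_id, category_id
    """
).fetchall()

print(latest)  # [(1, 1, 3), (1, 2, 5), (2, 1, 4), (2, 2, 5)]
```

Unlike RANK(), ROW_NUMBER() returns exactly one row per group even when two products share the same effective date, which matches what DISTINCT ON does.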
Using a window function:
select * from (
select u_id, c_id, p_id, coalesce(date_update, date_added) as date,
rank() over (partition by u_id, c_id order by coalesce(date_update, date_added) desc) as r
from inventory
) t where r = 1
I have the data below. When I apply dense_rank ordered by the id column, the rank follows the numeric order of the ids, but I need the rank to follow the order in which the records are displayed when I run the query:
Data from query:
Rid id
8100 161
8101 2
8102 2
8103 2
8104 156
When I apply dense_rank() over (order by id), I get:
Rid id rank
8100 161 3
8101 2 1
8102 2 1
8103 2 1
8104 156 2
But my requirement is to get the following:
Rid id rank
8100 161 1
8101 2 2
8102 2 2
8103 2 2
8104 156 3
I tried row_number as well, but the result is not as expected, and I am not sure which option would be better.
Any help is appreciated.
Thanks
Edit------------------------------
Query used
Select rid, id,dense_rank() over (order by id) row_num
from table
I have adjusted the solution from here to your needs: DENSE_RANK according to particular order.
I am not sure whether I should mark this as a duplicate, because the question linked above has no ORACLE tag. If more experienced members think I should, please comment and I will do so and delete this answer.
Here is the adjusted code and demo:
SELECT t2.rid
, t2.id
, DENSE_RANK() OVER (ORDER BY t2.max_rid)
FROM (
SELECT MAX(t1.rid) OVER (PARTITION BY t1.grupa) AS max_rid
, t1.rid
, t1.id
FROM (
SELECT rid
, id
,ROW_NUMBER() OVER (ORDER BY rid) - ROW_NUMBER() OVER (PARTITION BY id ORDER BY rid) AS grupa
FROM test_table) t1 ) t2
ORDER BY rid
DEMO
You can fetch the previous id with the lag() analytic function in a first query, and then use sum() over (order by rid) to count the rows where the id changes:
with tab( rid,id ) as
(
select 8100,161 from dual union all
select 8101,2 from dual union all
select 8102,2 from dual union all
select 8103,2 from dual union all
select 8104,156 from dual
), t2 as
(
select t.*, lag(id,1,0) over (order by rid) lg
from tab t
)
select rid, id, sum(case when lg!=id then 1 else 0 end) over (order by rid) as row_num
from t2
Demo
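The lag-plus-running-sum idea is not Oracle-specific. As a hedged sketch, the same query shape (minus `dual`) can be run against an in-memory SQLite database from Python to confirm the ranks the question asks for; SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (rid INTEGER, id INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [(8100, 161), (8101, 2), (8102, 2), (8103, 2), (8104, 156)],
)

# Step 1 (t2): fetch the previous row's id in rid order (0 for the first row).
# Step 2: flag rows whose id differs from the previous one and keep a running
# total of the flags; consecutive equal ids therefore share the same number.
ranked = conn.execute(
    """
    WITH t2 AS (
        SELECT rid, id, LAG(id, 1, 0) OVER (ORDER BY rid) AS lg
        FROM tab
    )
    SELECT rid, id,
           SUM(CASE WHEN lg != id THEN 1 ELSE 0 END) OVER (ORDER BY rid) AS row_num
    FROM t2
    ORDER BY rid
    """
).fetchall()

for row in ranked:
    print(row)
# (8100, 161, 1)
# (8101, 2, 2)
# (8102, 2, 2)
# (8103, 2, 2)
# (8104, 156, 3)
```

One caveat of the default value 0 in LAG: if the very first id could itself be 0, the first row would get rank 0 instead of 1, so a default that cannot occur in the data is the safer choice.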
My table
id name num
1 a 3
2 b 4
I need to return every row num times. I do it this way:
select DB.BAN_KEY as BAN_KEY, DB.CUST_FULLNAME as CUST_FULLNAME
from TST_DIM_BAN_SELECTED DB
inner join (select rownum rn from dual connect by level < 10) a
on a.rn <= DB.N
The resulting table looks like this.
id name
1 a
1 a
1 a
2 b
2 b
2 b
2 b
But I also need every row in the group to be numbered like this.
id name row_num
1 a 1
1 a 2
1 a 3
2 b 1
2 b 2
2 b 3
2 b 4
How can I do it?
You don't need an inner join to a dummy table or an analytic function to generate the row numbers; you can use connect by (and its corresponding LEVEL pseudocolumn) on the table itself, like so:
WITH tst_dim_ban_selected AS (SELECT 1 ban_key, 'a' cust_fullname, 3 n FROM dual UNION ALL
SELECT 2 ban_key, 'b' cust_fullname, 4 n FROM dual)
-- end of mimicking your table with data in it. See SQL below
SELECT db.ban_key,
db.cust_fullname,
LEVEL row_num
FROM tst_dim_ban_selected db
CONNECT BY LEVEL <= db.n
AND PRIOR db.ban_key = db.ban_key -- assuming this is the primary key
AND PRIOR sys_guid() IS NOT NULL;
BAN_KEY CUST_FULLNAME ROW_NUM
---------- ------------- ----------
1 a 1
1 a 2
1 a 3
2 b 1
2 b 2
2 b 3
2 b 4
If the table's primary key contains columns other than ban_key, you need to make sure they are all included in the connect by clause's list of prior <column> = <column> conditions. This is so that connect by can identify each row uniquely, meaning that it loops over just that row and no others. The PRIOR sys_guid() IS NOT NULL condition is required to stop Oracle from reporting a CONNECT BY loop, since it makes every generated row look different from its parent.
You can use an analytic function for this:
Select id, name,
row_number() over (partition by id, name order by id, name)
From(/* your query */) t;
This can be done without a subquery:
Select id, name,
row_number() over (partition by id, name order by id, name)
From /* joins */
You could use this:
SELECT db.ban_key AS ban_key, db.cust_fullname AS cust_fullname,
ROW_NUMBER() OVER (PARTITION BY db.ban_key ORDER BY a.rn) AS row_num
FROM tst_dim_ban_selected db
INNER JOIN (SELECT rownum rn FROM dual CONNECT BY level < 10) a
ON a.rn <= db.n;
Use a recursive sub-query factoring clause:
WITH split ( id, name, rn, n ) AS (
SELECT BAN_KEY, CUST_FULLNAME, 1, N
FROM TST_DIM_BAN_SELECTED
UNION ALL
SELECT id, name, rn + 1, n
FROM split
WHERE rn < n
)
SELECT id, name, rn
FROM split;
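Recursive CTEs are available well beyond Oracle, so the approach can be tried out quickly. The following sketch runs essentially the same query against an in-memory SQLite database from Python; the only changes are the RECURSIVE keyword and an ORDER BY added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tst_dim_ban_selected (ban_key INTEGER, cust_fullname TEXT, n INTEGER)"
)
conn.executemany(
    "INSERT INTO tst_dim_ban_selected VALUES (?, ?, ?)",
    [(1, "a", 3), (2, "b", 4)],
)

# Anchor member: every source row starts at rn = 1.
# Recursive member: re-emit the row with rn + 1 until rn reaches n.
expanded = conn.execute(
    """
    WITH RECURSIVE split (id, name, rn, n) AS (
        SELECT ban_key, cust_fullname, 1, n
        FROM tst_dim_ban_selected
        UNION ALL
        SELECT id, name, rn + 1, n
        FROM split
        WHERE rn < n
    )
    SELECT id, name, rn
    FROM split
    ORDER BY id, rn
    """
).fetchall()

print(expanded)
# [(1, 'a', 1), (1, 'a', 2), (1, 'a', 3),
#  (2, 'b', 1), (2, 'b', 2), (2, 'b', 3), (2, 'b', 4)]
```

Note that a row with n of 0 or less would still appear once, because the anchor member emits rn = 1 unconditionally; adding WHERE n >= 1 to the anchor would exclude such rows.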
I have a scenario where I need the field value of the "Max" and "Min" records.
Please find the sample data below.
-----------------------------------------------------------------------
ID Label ProcessedDate
-----------------------------------------------------------------------
1 Label1 11/01/2016
2 Label2 11/02/2016
3 Label3 11/03/2016
4 Label4 11/04/2016
5 Label5 11/05/2016
The "ID" field is used in another table as a foreign key. When querying the records of that table by this key, I need to get the "Label" of the row with the "Max" processed date and of the row with the "Min" processed date.
-----------------------------------------------------------------------
ID LabelID GroupingField
-----------------------------------------------------------------------
1 1 101
2 2 101
3 3 101
4 4 101
5 5 101
6 1 102
7 2 102
8 3 102
9 4 102
I expect the final result set to look something like this.
-----------------------------------------------------------------------
GroupingField FirstProcessed LastProcessed
-----------------------------------------------------------------------
101 Label1 Label5
102 Label1 Label4
I have 'almost' managed to get the above result using the rank function, but I am still not satisfied with it, so I am looking for a better option.
Thanks,
Prakazz
CREATE TABLE #Details (ID INT,LabelID INT,GroupingField INT)
CREATE TABLE #Details1 (ID INT,Label VARCHAR(100),ProcessedDate VARCHAR(100))
INSERT INTO #Details1 (ID ,Label ,ProcessedDate )
SELECT 1,'Label1','11/01/2016' UNION ALL
SELECT 2,'Label2','11/02/2016' UNION ALL
SELECT 3,'Label3','11/03/2016' UNION ALL
SELECT 4,'Label4','11/04/2016' UNION ALL
SELECT 5,'Label5','11/05/2016'
INSERT INTO #Details (ID ,LabelID ,GroupingField )
SELECT 1,1,101 UNION ALL
SELECT 2,2,101 UNION ALL
SELECT 3,3,101 UNION ALL
SELECT 4,4,101 UNION ALL
SELECT 5,5,101 UNION ALL
SELECT 6,1,102 UNION ALL
SELECT 7,2,102 UNION ALL
SELECT 8,3,102 UNION ALL
SELECT 9,4,102
;WITH CTE (GroupingField , MAXId ,MinId) AS
(
SELECT GroupingField,MAX(LabelID) MAXId,MIN(LabelID) MinId
FROM #Details
GROUP BY GroupingField
)
SELECT GroupingField ,B.Label FirstProcessed, A.Label LastProcessed
FROM CTE
JOIN #Details1 A ON MAXId = A.ID
JOIN #Details1 B ON MinId = B.ID
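The query above picks the first and last labels via MIN(LabelID) and MAX(LabelID), which gives the expected result only while label IDs increase together with ProcessedDate, as they do in the sample data. A sketch that orders by the date itself is shown below, run against an in-memory SQLite database from Python. The table names `labels` and `details` are stand-ins for the temp tables, and the dates are assumed to be month/day/year and stored as ISO strings so that they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labels (ID INTEGER, Label TEXT, ProcessedDate TEXT)")
conn.execute("CREATE TABLE details (ID INTEGER, LabelID INTEGER, GroupingField INTEGER)")
conn.executemany(
    "INSERT INTO labels VALUES (?, ?, ?)",
    [(i, f"Label{i}", f"2016-11-0{i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO details VALUES (?, ?, ?)",
    [(1, 1, 101), (2, 2, 101), (3, 3, 101), (4, 4, 101), (5, 5, 101),
     (6, 1, 102), (7, 2, 102), (8, 3, 102), (9, 4, 102)],
)

# Number the labels of each group by date in both directions, then pivot the
# rows numbered 1 into FirstProcessed / LastProcessed with conditional MAX().
first_last = conn.execute(
    """
    WITH ranked AS (
        SELECT d.GroupingField, l.Label,
               ROW_NUMBER() OVER (PARTITION BY d.GroupingField
                                  ORDER BY l.ProcessedDate ASC) AS first_rn,
               ROW_NUMBER() OVER (PARTITION BY d.GroupingField
                                  ORDER BY l.ProcessedDate DESC) AS last_rn
        FROM details d
        JOIN labels l ON l.ID = d.LabelID
    )
    SELECT GroupingField,
           MAX(CASE WHEN first_rn = 1 THEN Label END) AS FirstProcessed,
           MAX(CASE WHEN last_rn = 1 THEN Label END) AS LastProcessed
    FROM ranked
    GROUP BY GroupingField
    ORDER BY GroupingField
    """
).fetchall()

print(first_last)  # [(101, 'Label1', 'Label5'), (102, 'Label1', 'Label4')]
```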
You can use the SQL ROW_NUMBER() function with PARTITION BY, combined with GROUP BY, as follows:
;with cte as (
select
t.Label, t.ProcessedDate,
g.GroupingField,
ROW_NUMBER() over (partition by GroupingField Order By ProcessedDate ASC) minD,
ROW_NUMBER() over (partition by GroupingField Order By ProcessedDate DESC) maxD
from tbl t
inner join GroupingFieldTbl g
on t.ID = g.LabelID
)
select GroupingField, max(FirstProcessed) FirstProcessed, max(LastProcessed) LastProcessed
from (
select
GroupingField,
FirstProcessed = CASE when minD = 1 then Label else null end,
LastProcessed = CASE when maxD = 1 then Label else null end
from cte
where
minD = 1 or maxD = 1
) t
group by GroupingField
order by GroupingField
I also used a CTE to make the code easier to write and understand.
Output is as