Grouping items recursively in SQL

I have this table (test.mytable in the SQL script below):
CREATE OR REPLACE TABLE test.mytable (item STRING(1), I_groupe STRING(1));
INSERT INTO test.mytable (item, I_groupe)
VALUES
('A', '1'),
('B', '1'),
('B', '2'),
('C', '2'),
('D', '3');
item  Intermediate_group
A     1
B     1
B     2
C     2
D     3
My goal is to group the items together. My expected result is:
item   Final_group
A,B,C  1
D      2
I would like to group items A and B because they have at least one Intermediate_group in common (Intermediate_group 1). Then I would like to group A,B with C because B and C have an Intermediate_group in common (Intermediate_group 2). Item D has no Intermediate_group in common with any other item, so it is alone in its final group.
I have this code:
WITH TEMP1 AS (
SELECT *
FROM (
select item as item_1,
array_agg(distinct I_groupe) as I_groupe1
from test.mytable
group by item_1) AS AA
cross join
(select item as item_2,
array_agg(distinct I_groupe) as I_groupe2
from test.mytable
group by item_2
) AS BB
)
,
TEMP2 AS (
SELECT item_1, item_2,
ARRAY(SELECT * FROM TEMP1.I_groupe1
INTERSECT DISTINCT
(SELECT * FROM TEMP1.I_groupe2)
) AS result
FROM TEMP1
)
,
TEMP3 AS (
SELECT item_1, item_2, test
FROM TEMP2, unnest(result) as test
)
,
TEMP4 AS (
SELECT STRING_AGG(DISTINCT item_2) as item, STRING_AGG(CAST(test AS STRING)) as I_groupe
FROM TEMP3
GROUP BY item_1
)
,
TEMP5 AS (
SELECT item, I_groupe
FROM TEMP4, UNNEST(SPLIT(item)) as item, UNNEST(SPLIT(I_groupe)) as I_groupe
)
I repeat this process manually three times for this "toy" example and finish with a SELECT DISTINCT to get only one row per Final_group:
SELECT DISTINCT *
FROM TEMP14
But on real data this does not scale. I would like to use a recursive function or a loop to automate this code.
Thanks in advance for your help
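To make the expected result concrete outside of SQL: the grouping described in the question is a connected-components problem, where two items are linked whenever they share an Intermediate_group. The following is only a sketch in Python using a union-find structure, not a BigQuery solution; the function name `final_groups` and the numbering of final groups by smallest item are assumptions made for illustration.

```python
from collections import defaultdict


def final_groups(rows):
    """Group items linked, directly or transitively, by a shared intermediate group."""
    parent = {}

    def find(x):
        # Walk up to the representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_item_in_group = {}
    for item, group in rows:
        parent.setdefault(item, item)
        # Link each item to the first item seen in the same intermediate group.
        anchor = first_item_in_group.setdefault(group, item)
        union(item, anchor)

    members = defaultdict(set)
    for item in parent:
        members[find(item)].add(item)

    # Assumption: final groups are numbered in order of their smallest item.
    ordered = sorted(sorted(m) for m in members.values())
    return [(",".join(m), n) for n, m in enumerate(ordered, start=1)]


rows = [("A", "1"), ("B", "1"), ("B", "2"), ("C", "2"), ("D", "3")]
print(final_groups(rows))  # [('A,B,C', 1), ('D', 2)]
```

Because links are followed transitively, no fixed number of manual repetitions is needed, which is the property a recursive or looping SQL solution would also have to provide.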

Related

Select 75% of records to rename, based on column sum

I have a scenario where I need to rename a value in one column, based on another column's total. Example table below with basic math, to express concept. I'd like to change the value in 'Condition' column to "Used" for the rows that make up 70% of the 'Revenue' column (which in this example would be 7 rows). The other 30% would be renamed to "New" (the remaining 3 rows). No other specific logic required.
I found that the approach mentioned here works for selecting the percentage of rows required
Select Rows who's Sum Value = 80% of the Total
I suppose I could create two temporary tables, rename the column fields in each respective table, and then join together. Curious if there is an easier way?
Current Table:
Source  Condition  Revenue
A       Old        1
B       New        1
C       Old        1
D       New        1
E       Old        1
F       New        1
G       Old        1
H       New        1
I       Old        1
J       New        1
New Table:
Source  Condition  Revenue
A       Used       1
B       Used       1
C       Used       1
D       Used       1
E       Used       1
F       Used       1
G       Used       1
H       New        1
I       New        1
J       New        1

You could do this with two updates. The first would update the entire table. The second would update the first 70%.
First we need sample data in a table. I used a table variable here but you would use your actual table.
declare @Something table
(
Source char(1)
, Condition varchar(10)
, Revenue int
)
insert @Something values
('A', 'Old', 1)
, ('B', 'New', 1)
, ('C', 'Old', 1)
, ('D', 'New', 1)
, ('E', 'Old', 1)
, ('F', 'New', 1)
, ('G', 'Old', 1)
, ('H', 'New', 1)
, ('I', 'Old', 1)
, ('J', 'New', 1)
select *
from @Something;
Next simply update the entire table.
update @Something
set Condition = 'New';
The last step is to update the first 70%. An easy way to do this is to use a CTE to select the first 70% and then update the CTE.
with Top70 as
(
select top 70 percent *
from @Something
order by Source
)
update Top70
set Condition = 'Used';
Here is the final output.
select *
from @Something;
--EDIT--
Now that I understand a running total is needed, you could do something like this.
select *
, case when sum(Revenue) over(order by Source) > (sum(Revenue) over() * .7) then 'New' else 'Used' end as NewCondition
from @Something

You can select/mark the 70% and 30% records using this query :
with cte as (
SELECT *, SUM(revenue) OVER(ORDER BY source) AS cumulative_revenue, SUM(revenue) OVER() as total
FROM mytable t
)
select Source, iif((cumulative_revenue + 0.0) /total <= 0.7, 'Used', 'New') as Condition, revenue, cumulative_revenue, (cumulative_revenue + 0.0) /total as perc
from cte
Demo here
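As a rough illustration of the running-total idea used in the answers above, here is a small Python sketch. It assumes the rows are already ordered by Source and uses integer arithmetic for the 70% threshold; the function name `label_by_cumulative_share` and the `percent` parameter are hypothetical and do not come from the thread.

```python
def label_by_cumulative_share(rows, percent=70):
    """rows: list of (source, revenue) tuples, already in the desired order."""
    total = sum(revenue for _, revenue in rows)
    labelled = []
    running = 0
    for source, revenue in rows:
        running += revenue
        # Integer comparison avoids floating-point rounding at the boundary.
        within_share = running * 100 <= percent * total
        labelled.append((source, "Used" if within_share else "New", revenue))
    return labelled


rows = [(source, 1) for source in "ABCDEFGHIJ"]
for row in label_by_cumulative_share(rows):
    print(row)
```

With the sample data, rows A through G fall within the first 70% of revenue and are labelled 'Used', and H through J are labelled 'New'.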

You could chain a couple of CTEs to run the UPDATE
DROP TABLE IF EXISTS #t
CREATE TABLE #t([Source] VARCHAR(10), [Condition] VARCHAR(10), Revenue INT)
INSERT INTO #t([Source], [Condition], [Revenue])
values
('A', 'Old', 1)
,('B', 'New', 1)
,('C', 'Old', 1)
,('D', 'New', 1)
,('E', 'Old', 1)
,('F', 'New', 1)
,('G', 'Old', 1)
,('H', 'New', 1)
,('I', 'Old', 1)
,('J', 'New', 1)
;WITH cte AS (
SELECT *, SUM( Revenue) OVER (ORDER BY Source) ACC
FROM #t
), cte2 as(
SELECT MAX(acc)*1. TotalRevenue FROM cte
)
UPDATE cte
SET Condition = CASE WHEN Acc / TotalRevenue <= .7 THEN 'Used' ELSE 'New' END
FROM cte
CROSS APPLY (SELECT TotalRevenue FROM cte2) ca
SELECT * FROM #t

sql generate code based on three column values

I have three columns
suppose
row no column1 column2 column3
1 A B C
2 A B C
3 D E F
4 G H I
5 G H C
I want to generate code by combining these three column values
For Eg.
1)ABC001
2)ABC002
3)DEF001
4)GHI001
5)GHC001
The code is built by checking the combination of the three columns.
The logic is: if the values of the three columns are the same,
the first occurrence gets 'ABC001'
and the second occurrence gets 'ABC002'.

You can try this:
I don't know what logic you want for the zero padding, but you can add the zeros manually or let rn decide for you.
declare @mytable table (rowno int,col1 nvarchar(50),col2 nvarchar(50),col3 nvarchar(50)
)
insert into @mytable
values
(1,'A', 'B', 'C'),
(2,'A', 'B', 'C'),
(3,'D', 'E', 'F'),
(4,'G', 'H', 'I'),
(5,'G', 'H', 'C')
Select rowno,col1,col2,col3,
case when rn >= 10 and rn < 100 then concatcol+'0'+cast(rn as nvarchar(50))
when rn >= 100 then concatcol+cast(rn as nvarchar(50))
else concatcol+'00'+cast(rn as nvarchar(50)) end as ConcatCol from (
select rowno,col1,col2,col3
,Col1+col2+col3 as ConcatCol,ROW_NUMBER() over(partition by col1,col2,col3 order by rowno) as rn from @mytable
) x
order by rowno
My CASE expression makes sure that when rn reaches 10 it writes ABC010, when it reaches 100 or more it writes ABC100, and when it is under 10 it writes ABC001, and so on.

TSQL: CONCAT(column1,column2,column3,RIGHT(REPLICATE('0', 3) + LEFT(row_no, 3), 3))

You should combine your columns like below :
SELECT CONVERT(VARCHAR(MAX), ROW_NUMBER() OVER(ORDER BY
(
SELECT NULL
)))+') '+DATA AS Data
FROM
(
SELECT column1+column2+column3+'00'+CONVERT(VARCHAR(MAX), ROW_NUMBER() OVER(PARTITION BY column1,
column2,
column3 ORDER BY
(
SELECT NULL
))) DATA
FROM <table_name>
) T;
Result :
1)ABC001
2)ABC002
3)DEF001
4)GHI001
5)GHC001

MySQL:
CONCAT(column1,column2,column3,LPAD(row_no, 3, '0'))
[You will need to enclose the 'row no' field name in backticks if it contains a space instead of an underscore.]
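The per-combination numbering that the question asks for (what ROW_NUMBER() with PARTITION BY does in the first answer) can be sketched in Python as follows. This is only an illustration under the assumption that rows are processed in row-number order; `generate_codes` is a hypothetical name.

```python
from collections import Counter


def generate_codes(rows):
    """rows: list of (column1, column2, column3) tuples in row order."""
    seen = Counter()
    codes = []
    for col1, col2, col3 in rows:
        prefix = col1 + col2 + col3
        seen[prefix] += 1  # how many times this combination has occurred so far
        # :03d zero-pads the counter to three digits (001, 010, 100).
        codes.append(f"{prefix}{seen[prefix]:03d}")
    return codes


rows = [
    ("A", "B", "C"),
    ("A", "B", "C"),
    ("D", "E", "F"),
    ("G", "H", "I"),
    ("G", "H", "C"),
]
print(generate_codes(rows))
# ['ABC001', 'ABC002', 'DEF001', 'GHI001', 'GHC001']
```

Formatting the counter with a fixed width replaces the three-branch CASE expression, and it keeps working once a combination occurs ten or more times.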

SQL Server- Return Items Only When All Sub-Items Are Available

I have an Item table (denormalized for this example) containing a list of items, parts and whether the part is available. I want to return all the items for which all the parts are available. Each item can have a varying number of parts. For example:
Item Part Available
A 1 Y
A 2 N
A 3 N
B 1 Y
B 4 Y
C 2 N
C 5 Y
D 4 Y
D 6 Y
D 7 Y
The query should return the following:
Item Part
B 1
B 4
D 4
D 6
D 7
Thanks in advance for any assistance.

Here is one trick using the Min() Over() window aggregate function ('N' sorts before 'Y', so the minimum is 'Y' only when every part of the item is available)
SELECT Item,
Part
FROM (SELECT Min([Available])OVER(partition BY [Item]) m_av,*
FROM yourtable) a
WHERE m_av = 'Y'
or using GROUP BY and HAVING inside an IN clause
SELECT Item,
Part
FROM yourtable
WHERE Item IN (SELECT Item
FROM yourtable
GROUP BY Item
HAVING Count(*) = Sum(Iif(Available = 'Y', 1, 0)))
using Exists
SELECT Item,
Part
FROM yourtable A
WHERE EXISTS (SELECT 1
FROM yourtable B
WHERE A.Item = B.Item
HAVING Count(*) = Sum(Iif(Available = 'Y', 1, 0)))
using NOT EXISTS
SELECT Item,
Part
FROM yourtable A
WHERE NOT EXISTS (SELECT *
FROM yourtable B
WHERE A.Item = B.Item
AND B.Available = 'N')

I'd start with rephrasing the requirement - you want to return the items that don't have any parts that are not available. Once you put it like that, it's easy to translate the requirement to SQL using the not exists operator:
SELECT item, part
FROM parts a
WHERE NOT EXISTS (SELECT *
FROM parts b
WHERE a.item = b.item AND b.available = 'N')

Using window function does a single table read.
MIN and MAX window function
select *
from (
select
t.*,
max(available) over (partition by item) a,
min(available) over (partition by item) b
from your_table t
) t where a = b and a = 'Y';
COUNT window function:
select *
from (
select
t.*,
count(*) over (partition by item) n1,
count(case when available = 'Y' then 1 end) over (partition by item) n2
from your_table t
) t where n1 = n2;

You can use NOT EXISTS or NOT IN to achieve this.
NOT EXISTS
Select item, part
from tbl as T1
where not exists( select 1 from tbl where item = t1.item and available = 'N')
NOT IN
Select item, part
from tbl
where item not in( select item from tbl where available = 'N')

I want to point out that the question in the text is: "I want to return all the items for which all the parts are available". However, your example results include the parts.
If the question is indeed that you want the items only, then you can use simple aggregation:
select item
from parts
group by item
having min(available) = max(available) and min(available) = 'Y';
If you indeed want the detail on the parts as well, then the other answers provide that information.

I do like it when problems lend themselves well to being solved by infrequently used language features:
with cte as (
select * from (values
('A', 1, 'Y'),
('A', 2, 'N'),
('A', 3, 'N'),
('B', 1, 'Y'),
('B', 4, 'Y'),
('C', 2, 'N'),
('C', 5, 'Y'),
('D', 4, 'Y'),
('D', 6, 'Y'),
('D', 7, 'Y')
) as x(Item, Part, Available)
)
select *
into #t
from cte as c;
select *
from #t as c
where 'Y' = all (
select Available
from #t as a
where c.Item = a.Item
)
Here, we use a correlated subquery and the all keyword to see if all of the parts are available. My understanding is that, like exists, this will stop if it finds a counter-example.
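The "all parts available" condition that the answers above express with NOT EXISTS, MIN/MAX, or ALL can be sketched in Python for comparison. This is an illustrative sketch only; `fully_available` is a hypothetical name, and the data is the sample from the question.

```python
from collections import defaultdict


def fully_available(rows):
    """rows: list of (item, part, available) tuples; returns (item, part) pairs."""
    flags = defaultdict(list)
    for item, _part, available in rows:
        flags[item].append(available == "Y")
    # An item qualifies only if every one of its parts is available.
    complete = {item for item, item_flags in flags.items() if all(item_flags)}
    return [(item, part) for item, part, _ in rows if item in complete]


rows = [
    ("A", 1, "Y"), ("A", 2, "N"), ("A", 3, "N"),
    ("B", 1, "Y"), ("B", 4, "Y"),
    ("C", 2, "N"), ("C", 5, "Y"),
    ("D", 4, "Y"), ("D", 6, "Y"), ("D", 7, "Y"),
]
print(fully_available(rows))
# [('B', 1), ('B', 4), ('D', 4), ('D', 6), ('D', 7)]
```

Like the SQL `ALL` and `NOT EXISTS` forms, Python's `all()` stops at the first counter-example it finds.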

Find the users having more than two elements and one of those elements must be A

I want to extract the users that have more than two elements, one of which must be A.
This is my table:
CREATE TABLE #myTable(
ID_element nvarchar(30),
Element nvarchar(10),
ID_client nvarchar(20)
)
This is the data of my table:
INSERT INTO #myTable VALUES
(13 ,'A', 1),(14 ,'B', 1),(15 ,NULL, 1),(16 ,NULL, 1),
(17 ,NULL, 1),(18 ,NULL, 1),(19 ,NULL, 1),(7, 'A', 2),
(8, 'B', 2),(9, 'C', 2),(10 ,'D', 2),(11 ,'F', 2),
(12 ,'G', 2),(1, 'A', 3),(2, 'B', 3),(3, 'C', 3),
(4, 'D', 3),(5, 'F', 3),(6, 'G', 3),(20 ,'Z', 4),
(22 ,'R', 4),(23 ,'D', 4),(24 ,'F', 5),(25 ,'G', 5),
(21 ,'x', 5)
And this is my query:
Select Distinct ID_client
from #myTable
Group by ID_client
Having Count(Element) > 2

Add to your query CROSS APPLY with id_clients that have element A
SELECT m.ID_client
FROM #myTable m
CROSS APPLY (
SELECT ID_client
FROM #myTable
WHERE ID_client = m.ID_client
AND Element = 'A'
) s
GROUP BY m.ID_client
HAVING COUNT(DISTINCT m.Element) > 2
Output:
ID_client
2
3

I think this is what you are looking for:
SELECT * FROM
(SELECT *, RANK() OVER (PARTITION BY element ORDER by id_client) AS grouped FROM #myTable) t
WHERE grouped > 1
AND Element = 'A'
ORDER by t.element
which brings back
ID_element Element ID_client grouped
7 A 2 2
1 A 3 3

You can select the ID_client values which have an 'A' as an Element and join your table with the result of that:
SELECT m.ID_Client
FROM #myTable AS m
JOIN (
SELECT a.ID_Client FROM #myTable AS a
WHERE a.Element = 'A') AS filteredClients
ON m.ID_client = filteredClients.ID_client
GROUP BY m.ID_client
HAVING COUNT(m.Element) > 2
Outputs:
ID_Client
2
3
However, this is not necessarily the best way to do it: When should I use Cross Apply over Inner Join?
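For reference, the intended filter (more than two non-NULL elements, one of which is 'A') can be written out in Python. This is a sketch under the assumption that NULL elements are not counted, which matches how COUNT(Element) behaves in the queries above; `clients_with_a` is a hypothetical name.

```python
from collections import defaultdict


def clients_with_a(rows):
    """rows: list of (id_element, element, id_client) tuples."""
    elements = defaultdict(list)
    for _id_element, element, id_client in rows:
        if element is not None:  # COUNT(Element) ignores NULLs as well
            elements[id_client].append(element)
    return sorted(
        client
        for client, found in elements.items()
        if len(found) > 2 and "A" in found
    )


rows = [
    (13, "A", 1), (14, "B", 1), (15, None, 1), (16, None, 1),
    (17, None, 1), (18, None, 1), (19, None, 1), (7, "A", 2),
    (8, "B", 2), (9, "C", 2), (10, "D", 2), (11, "F", 2),
    (12, "G", 2), (1, "A", 3), (2, "B", 3), (3, "C", 3),
    (4, "D", 3), (5, "F", 3), (6, "G", 3), (20, "Z", 4),
    (22, "R", 4), (23, "D", 4), (24, "F", 5), (25, "G", 5),
    (21, "x", 5),
]
print(clients_with_a(rows))  # [2, 3]
```

Client 1 has an 'A' but only two non-NULL elements, and clients 4 and 5 have three elements but no 'A', so only clients 2 and 3 remain.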

Delete rows in table that are sum of other rows per group

Group the rows by T. In each group, find the row whose value is the sum of the other rows in that group (this is the largest value, or the smallest if the values are negative) and delete that row, one per group. If a group does not have enough rows to form a sum, or has enough rows but none of them equals the sum of the others, nothing happens.
CREATE TABLE Test (
T varchar(10),
V int
);
INSERT INTO Test
VALUES ('A', 4),
('B', -5),
('C', 5),
('A', 2),
('B', -1),
('C', 10),
('A', 2),
('B', -4),
('C', 5),
('D', 0);
expected result:
A 2
A 2
B -1
B -4
C 5
C 5
D 0

Like the comments, the requirements seem strange. The below code assumes that the summing is already pre-populated and merely removes the largest/smallest as long as the highest value is not 0.
if object_id('tempdb..#test') is not null
drop table #test
CREATE TABLE #Test (
T varchar(10),
V int
);
INSERT INTO #Test
VALUES ('A', 4), ('B', -5), ('C', 5), ('A', 2), ('B', -1), ('C', 10), ('A', 2), ('B', -4), ('C', 5), ('D', 0);
if object_id('tempdb..#test2') is not null
drop table #test2
SELECT
T,
V,
ABS(V) as absV
INTO #TEST2
FROM #TEST
SELECT * FROM #TEST2
if object_id('tempdb..#max') is not null
drop table #max
SELECT
T,
MAX(absV) AS MaxAbsV
INTO #Max
FROM #TEST2
GROUP BY T
HAVING MAX(AbsV) != 0
DELETE #TEST2
FROM #TEST2
INNER JOIN #MAX ON #TEST2.T = #MAX.T AND #TEST2.absV = #Max.MaxAbsV
SELECT * FROM #TEST2
ORDER BY T ASC

; with cte as
(
select T, V,
R = row_number() over (partition by T order by ABS(V) desc),
C = count(*) over (partition by T)
from Test
)
delete c
from cte c
inner join
(
select T, S = sum(V)
from cte
where R <> 1
group by T
) s on c.T = s.T
where c.C >= 3
and c.R = 1
and c.V = s.S

Using ABS and NOT Exists
DECLARE @Test TABLE (
T varchar(10),
V int
);
INSERT INTO @Test
VALUES ('A', 4), ('B', -5), ('C', 5), ('A', 2), ('B', -1), ('C', 10), ('A', 2), ('B', -4), ('C', 5), ('D', 0);
;WITH CTE as (
select T,max(ABS(v ))v from @Test
WHERE V <> 0
GROUP BY T )
SELECT T,V FROM @Test T where NOT exists (Select 1 FROM cte WHERE T = T.T AND v = ABS(T.V) )
ORDER BY T.T

Determine first if the rows are positive or negative by checking if SUM(V) is positive. And then determine if the smallest or largest value is equal to the SUM of the other rows, by subtracting from SUM(V) the MIN(V) if negative or MAX(V) if positive:
DELETE t
FROM Test t
INNER JOIN (
SELECT
T,
SUM(V) - CASE WHEN SUM(V) >= 0 THEN MAX(V) ELSE MIN(V) END AS ToDelete
FROM Test
GROUP BY T
HAVING COUNT(*) >= 3
) a
ON a.T = t.T
AND a.ToDelete = t.V
ONLINE DEMO

You can use the below query to get the required output :-
select * into #t1 from test
select * from
(
select TT.T as T,TT.V as V
from test TT
JOIN
(select T,max(abs(V)) as V from #t1
group by T) P
on TT.T=P.T
where abs(TT.V) <> P.V
UNION ALL
select A.T as T,A.V as V from test A
JOIN(
select T,count(T) as Tcount from test
group by T
having count(T)=1) B on A.T=B.T
) X order by T
drop table #t1

You are looking for a value per group that is the sum of all the group's other values. E.g. 4 of (2,2,4) or -5 of (-5,-4,-1).
This is usually only one record per group. But it can be multiple times the same number. Here are examples for ties: (0,0) or (-2,2,4,4), or (-2,-2,4,4,4) or (-10,3,3,3,3,4).
As you see, you are looking in any way for values that equal half of the group's total sum. (Of course. We are looking for n+n, where one n is in one record and the other n is the sum of all the other records.)
The only special case is when there is only one value in the group which is zero. That we don't want to delete of course.
Here is a delete statement that cannot deal with ties; it would delete all matching values in a group instead of just one:
delete from test
where 2 * v =
(
select case when count(*) = 1 then null else sum(v) end
from test fullgroup
where fullgroup.t = test.t
);
In order to deal with ties you would need artificial row numbers, so as to delete only one record of all candidates.
with candidates as
(
select t, v, row_number() over (partition by t order by t) as rn
from
(
select
t, v,
sum(v) over (partition by t) as sumv,
count(*) over (partition by t) as cnt
from test
) comparables
where sumv = 2 * v and cnt > 1
)
delete
from candidates
where rn = 1;
SQL fiddle: http://sqlfiddle.com/#!6/6d97e/1
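The "half of the group's total" observation above can be checked with a short Python sketch. It assumes that, when several rows in a group qualify, only the first one encountered is removed, mirroring the rn = 1 filter; `drop_sum_rows` is a hypothetical name.

```python
from collections import defaultdict


def drop_sum_rows(rows):
    """rows: list of (t, v) tuples; removes one row per group whose value
    equals the sum of the group's other rows."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for t, v in rows:
        totals[t] += v
        counts[t] += 1

    removed_groups = set()
    kept = []
    for t, v in rows:
        # v equals the sum of the others exactly when 2 * v equals the group total.
        is_candidate = counts[t] > 1 and 2 * v == totals[t]
        if is_candidate and t not in removed_groups:
            removed_groups.add(t)  # delete only one candidate per group
            continue
        kept.append((t, v))
    return kept


rows = [
    ("A", 4), ("B", -5), ("C", 5), ("A", 2), ("B", -1),
    ("C", 10), ("A", 2), ("B", -4), ("C", 5), ("D", 0),
]
for row in sorted(drop_sum_rows(rows)):
    print(row)
```

On the sample data this removes 4 from group A, -5 from group B and 10 from group C, and leaves the single zero in group D untouched because the group has only one row.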

See if below query helps:
DELETE [Audit].[dbo].[Test] FROM [Audit].[dbo].[Test] as AA
INNER JOIN (select T,
CASE
WHEN MAX(V) < 0 THEN MIN(V)
WHEN MIN(V) > 0 THEN MAX(V) ELSE MAX(V)
END as MAX_V,
CASE
WHEN SUM(V) > 0 THEN SUM(V) - MAX(V)
WHEN SUM(V) < 0 THEN SUM(V) - MIN(V) ELSE SUM(V)
END as SUM_V_REST
from [Audit].[dbo].[Test]
Group by T
Having Count(V) > 1) as BB ON AA.T = BB.T and AA.V = BB.MAX_V
