SQL Server condition-based concatenation of multiple rows

Consider a table holding data like below,
COL1| COL2| active_flag
----| ----| ------------
1 | A | Y
1 | B | Y
1 | C | Y
1 | D | N
1 | E | N
2 | M | Y
2 | N | Y
2 | O | N
2 | P | Y
2 | Q | Y
and I require the output like below.
COL1| COL2
----| -----
1 | ABC
1 | D
1 | E
2 | MN
2 | O
2 | PQ
How can I achieve this in SQL Server 2012?

First, you should add an id column to the table to get a stable sort. Then you can calculate a group number for each row and concatenate within each group using FOR XML:
WITH cte AS(
SELECT *,
CASE WHEN
LAG(active_flag) OVER(ORDER BY id) <> active_flag
OR (LAG(active_flag) OVER(ORDER BY id) = 'N' AND active_flag = 'N') THEN 1
ELSE 0
END AS l -- 1 marks the first row of a new group
FROM t
), cte2 AS (
SELECT *, SUM(l) OVER(ORDER BY id) AS grp
FROM cte
)
SELECT DISTINCT col1, (SELECT '' + col2
FROM cte2
WHERE grp = c.grp
ORDER BY id
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)') AS col2
FROM cte2 c;
Warning!
ORDER BY 1/0 or ORDER BY (SELECT 1) does not provide a stable sort.
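The grouping rule in the CTE can be easier to follow outside SQL. Below is a minimal Python sketch of the same idea, not the answer's code. The function name concat_runs and the in-memory row list are assumptions for illustration, and the rows are assumed to be already sorted by the stable id. The sketch also starts a new group when col1 changes, which the query above does not check explicitly.

```python
def concat_runs(rows):
    """Concatenate col2 over consecutive rows that share col1 and have flag 'Y'.

    rows: iterable of (col1, col2, active_flag), already in a stable order.
    Every 'N' row forms its own group, mirroring the CASE expression above.
    """
    result = []
    prev = None  # (col1, active_flag) of the previous row
    for col1, col2, flag in rows:
        starts_new_group = (
            prev is None
            or prev[0] != col1   # assumption: never merge across col1 values
            or prev[1] != flag   # flag changed compared with the previous row
            or flag == 'N'       # inactive rows are never merged
        )
        if starts_new_group:
            result.append([col1, col2])
        else:
            result[-1][1] += col2
        prev = (col1, flag)
    return [tuple(r) for r in result]


rows = [
    (1, 'A', 'Y'), (1, 'B', 'Y'), (1, 'C', 'Y'), (1, 'D', 'N'), (1, 'E', 'N'),
    (2, 'M', 'Y'), (2, 'N', 'Y'), (2, 'O', 'N'), (2, 'P', 'Y'), (2, 'Q', 'Y'),
]
groups = concat_runs(rows)
print(groups)
# [(1, 'ABC'), (1, 'D'), (1, 'E'), (2, 'MN'), (2, 'O'), (2, 'PQ')]
```

On the sample data this reproduces the requested output, which suggests the "flag changed, or both rows are N" rule is the right group boundary.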

Related

Suggest SQL query for given use case

Original Table
Id | Time | Status
------------------
1 | 5 | T
1 | 6 | F
2 | 3 | F
1 | 2 | F
2 | 4 | T
3 | 7 | F
2 | 3 | T
3 | 1 | F
4 | 7 | H
4 | 6 | S
4 | 5 | F
4 | 4 | T
5 | 5 | S
5 | 6 | F
Expected Table
Id | Time | Status
------------------
1 | 6 | F
3 | 7 | F
4 | 5 | F
I want all the distinct ids that have status F, taking the row with the maximum time; if the status at that maximum time is T for an id, that id should not be picked. Also, only ids that have at least one T should be picked. For example, 4 will not be picked as it doesn't have any 'T' as status.
Please help in writing the SQL query.
You can use EXISTS and NOT EXISTS in the WHERE clause:
select t.*
from tablename t
where t.status = 'F'
and exists (select 1 from tablename where id = t.id and status = 'T')
and not exists (
select 1
from tablename
where id = t.id and status in ('F', 'T') and time > t.time
)
See the demo.
Results:
| Id | Time | Status |
| --- | ---- | ------ |
| 1 | 6 | F |
| 4 | 5 | F |
Try the below way -
select * from tablename t
where time = (select max(time) from tablename t1 where t.id=t1.id and Status='F')
and Status='F'
The following should work:
select id,max(time) as time,status
from tablename
where status='F'
group by id,status
select id, max(time), status
from stuff s
where status = 'F'
and id not in (
select id
from stuff s2
where s2.id = s.id
and s2.time > s.time
and s2.status = 'T')
group by id, status;
You can see the Fiddle here.
As I understand it, you want to find the highest time for each ID (max(time)) where the status is F, but only if there isn't a later record where the status is 'T'. The sub query filters out records where there exists a later record where the status is T.
WITH MAX_TIME_ID AS (
SELECT
ID
,MAX(TIME) AS MAX_TIME
FROM
ORIGINAL_TABLE
GROUP BY
ID
)
SELECT
O.*
FROM
ORIGINAL_TABLE O
INNER JOIN
MAX_TIME_ID M
ON
O.ID = M.ID
AND O.TIME = M.MAX_TIME -- keep only the row at the latest time
WHERE
O.STATUS = 'F'
The CTE will find the max time for each ID and the inner join with the where clause on the status will select it only if the latest is 'F'.
I would just use window functions:
select t.*
from (select t.*,
row_number() over (partition by id order by time desc) as seqnum,
sum(case when status = 'T' then 1 else 0 end) over (partition by id) as num_t
from t
) t
where num_t > 0 and
seqnum = 1 and status = 'F';
There is another fun way to do this, using only aggregation:
select id, max(time) as time, 'F' as status
from t
group by id
having sum(case when status = 'T' then 1 else 0 end) > 0 and
max(time) = max(case when status = 'F' then time end);
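To see what the aggregation version returns on the question's sample data, here is a small sketch that runs it through Python's built-in sqlite3 module. SQLite stands in for SQL Server here, which is an assumption; the query uses only standard GROUP BY / HAVING, so the behaviour should carry over. The table name t comes from the answer; the variable names are illustrative.

```python
import sqlite3

# Sample rows from the question: (id, time, status).
rows = [
    (1, 5, 'T'), (1, 6, 'F'), (2, 3, 'F'), (1, 2, 'F'), (2, 4, 'T'),
    (3, 7, 'F'), (2, 3, 'T'), (3, 1, 'F'), (4, 7, 'H'), (4, 6, 'S'),
    (4, 5, 'F'), (4, 4, 'T'), (5, 5, 'S'), (5, 6, 'F'),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, time INTEGER, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

query = """
SELECT id, MAX(time) AS time, 'F' AS status
FROM t
GROUP BY id
HAVING SUM(CASE WHEN status = 'T' THEN 1 ELSE 0 END) > 0
   AND MAX(time) = MAX(CASE WHEN status = 'F' THEN time END)
ORDER BY id
"""
agg_result = conn.execute(query).fetchall()
print(agg_result)
# [(1, 6, 'F')]
```

Note that this returns only id 1. Id 4 is dropped because its overall latest row has status H at time 7, and MAX(time) here considers every status. The EXISTS / NOT EXISTS answer compares only against F and T rows, which is why it also returns id 4.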

how to select rows with same column_a but different column_b?

I want to select rows in SQL Server; my question is below:
Table1
--------------------------
| Name | Type |
--------------------------
| A | 1 |
| A | 2 |
| B | 1 |
| B | 3 |
| A | 3 |
| C | 1 |
| C | 3 |
| D | 1 |
| D | 2 |
| D | 3 |
| . | . |
| . | . |
Select rows like below:
Table2
--------------------------
| Name | Type |
--------------------------
| A | 1 |
| A | 2 |
| A | 3 |
| D | 1 |
| D | 2 |
| D | 3 |
| . | . |
| . | . |
The selection rule is:
Show Name and Type for every Name whose Type values include 1, 2 and 3.
Example: A has types 1, 2 and 3, so I would select it.
Example: B only has types 1 and 3, so I wouldn't select it.
You can use window functions for this:
select name, type
from (
select
t.*,
sum(case when type in (1, 2, 3) then 1 else 0 end)
over(partition by name) cnt
from mytable t
) t
where cnt = 3
This assumes that each (name, type) tuple occurs only once in the original table, which is consistent with your sample data.
Demo on DB Fiddle:
name | type
:--- | ---:
A | 1
A | 2
A | 3
D | 1
D | 2
D | 3
You could use INNER JOINs on the three Type values to achieve this:
SELECT Table1.[Name],
Table1.[Type]
FROM Table1
INNER JOIN (
SELECT [Name]
FROM Table1
WHERE ([Type] = 1)
) A ON A.[Name] = Table1.[Name]
INNER JOIN (
SELECT [Name]
FROM Table1
WHERE ([Type] = 2)
) B ON B.[Name] = A.[Name]
INNER JOIN (
SELECT [Name]
FROM Table1
WHERE ([Type] = 3)
) C ON C.[Name] = A.[Name]
This outputs:
Name Type
A 1
A 2
A 3
D 1
D 2
D 3
The matching sqlfiddle.
This works by returning rows that contain [Type] = 1, and then ONLY matching rows where [Type] = 2 and [Type] = 3. Then this is joined back to your main table and the results are returned.
Get the names with group by name and set the condition in the having clause:
select * from Table1
where name in (
select name
from Table1
group by name
having count(distinct type) = 3
)
If the Type column can contain values other than 1, 2 and 3, then:
select * from Table1
where type in (1, 2, 3) and name in (
select name
from Table1
where type in (1, 2, 3)
group by name
having count(distinct type) = 3
)
See the demo.
Results:
Name | Type
:--- | ---:
A | 1
A | 2
A | 3
D | 1
D | 2
D | 3
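As a quick check of the GROUP BY / HAVING approach, the sketch below runs the second query against the question's sample rows using Python's built-in sqlite3 module. SQLite is used in place of SQL Server as an assumption for the sake of a runnable example; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question: (Name, Type).
rows = [
    ('A', 1), ('A', 2), ('B', 1), ('B', 3), ('A', 3),
    ('C', 1), ('C', 3), ('D', 1), ('D', 2), ('D', 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Name TEXT, Type INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", rows)

query = """
SELECT Name, Type
FROM Table1
WHERE Type IN (1, 2, 3)
  AND Name IN (
        SELECT Name
        FROM Table1
        WHERE Type IN (1, 2, 3)
        GROUP BY Name
        HAVING COUNT(DISTINCT Type) = 3)
ORDER BY Name, Type
"""
complete_names = conn.execute(query).fetchall()
print(complete_names)
# [('A', 1), ('A', 2), ('A', 3), ('D', 1), ('D', 2), ('D', 3)]
```

Because the HAVING clause counts distinct types, duplicate (Name, Type) rows would not inflate the count.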
You can use STRING_AGG if you are on SQL Server 2017 or later, or on Azure SQL, as below:
Select * from #yourTable yt join (
select [name], string_agg([Type], ',') within group (order by [Type]) as st_types
from #YourTable
group by [name] ) a
on yt.name = a.[name] and a.st_types like '%1,2,3%'
I give you this, this will work if you have:
A 1
A 2
A 3
A 2
It will then only give you B.
SELECT *
FROM Table1
WHERE Name in (
SELECT Name from
(
SELECT Name, Type, count(Name) c from Table1 where Type = 1
GROUP BY Name, Type
HAVING count(Name) = 1
UNION
SELECT Name, Type, count(Name) c from Table1 where Type = 2
GROUP by Name, Type
HAVING count(Name) = 1
UNION
SELECT Name, Type, count(Name) c from Table1 where Type = 3
GROUP by Name, Type
HAVING count(Name) = 1) t
GROUP by name
HAVING count(c) = 3)
Here is the DEMO

Padding tables with 0s in redshift

I have a table of the form:
id | A | B | C
-----------------
1 | 1 | 0 | 1
1 | 2 | 1 | 0
2 | 1 | 4 | 0
I would like to pad this table with rows of 0s (excluding the id) such that each id has exactly 3 entries. So the result would be:
id | A | B | C
-----------------
1 | 0 | 0 | 0
1 | 1 | 0 | 1
1 | 2 | 1 | 0
2 | 0 | 0 | 0
2 | 0 | 0 | 0
2 | 1 | 4 | 0
This is because id 1 had two entries, so we added one row of 0s, and id 2 had one entry, so we added two rows of 0s.
Note: we can assume each id occurs no more than 3 times and that if an id occurs exactly 3 times, there is no need to add padding.
Is there an intelligent way of doing this with Amazon Redshift? I need this to scale to 30 days of padding and a few hundred columns.
If column A is always sequential you can do:
select i.id, n.num,
coalesce(t.b, 0) as b,
coalesce(t.c, 0) as c
from (select distinct id from t) i cross join
(select 1 as num union all select 2 union all select 3) n left join
t on i.id = t.id and n.num = t.A;
You do need to list each column in the select to get the zeros.
If the above is not true, you can make it true with a CTE:
with t as (
select t.*, row_number() over (partition by id order by id) as num
from t
)
select i.id, coalesce(t.a, 0) as a,
coalesce(t.b, 0) as b,
coalesce(t.c, 0) as c
from (select distinct id from t) i cross join
(select 1 as num union all select 2 union all select 3) n left join
t on i.id = t.id and n.num = t.num;
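The padding rule itself is simple once rows are grouped by id. The following Python sketch shows the intended result outside SQL; it is not a Redshift solution. The function name pad_rows and its parameters target and width are hypothetical, and each id is assumed to occur at most target times, as stated in the question.

```python
from collections import defaultdict


def pad_rows(rows, target=3, width=3):
    """Pad each id with all-zero rows until it has `target` rows.

    rows: iterable of (id, v1, ..., v_width) tuples.
    Zero rows are placed before the real rows, as in the expected output.
    """
    by_id = defaultdict(list)
    for id_, *values in rows:
        by_id[id_].append(tuple(values))

    padded = []
    for id_ in sorted(by_id):
        missing = max(target - len(by_id[id_]), 0)
        zero_rows = [(0,) * width] * missing
        for values in zero_rows + by_id[id_]:
            padded.append((id_, *values))
    return padded


rows = [(1, 1, 0, 1), (1, 2, 1, 0), (2, 1, 4, 0)]
padded = pad_rows(rows)
for row in padded:
    print(row)
```

The SQL answers achieve the same effect with a cross join of the distinct ids against a fixed list of row numbers, followed by a left join back to the data.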

matching groups of rows in two databases

I have the following (simplified) situation in two databases:
ID Prog T Qt
|---------|--------|---------|---------|
| a | 1 | N | 100 |
| b | 1 | Y | 10 |
| b | 2 | N | 90 |
| c | 1 | N | 25 |
| c | 2 | Y | 25 |
| c | 3 | Y | 25 |
| c | 4 | Y | 25 |
|---------|--------|---------|---------|
ID Prog T Qt
|---------|--------|---------|---------|
| 1 | 1 | Y | 10 |
| 1 | 2 | N | 90 |
| 2 | 1 | Y | 100 |
| 3 | 1 | Y | 100 |
| 4 | 1 | Y | 50 |
| 4 | 2 | Y | 25 |
| 4 | 3 | Y | 25 |
|---------|--------|---------|---------|
I need to compare groups of rows (primary keys are ID and Prog), to find out which groups of rows represent the same combination of factors (not considering ID).
In the example above, ID "b" in the first table and ID "1" in the second have the same combination of values for Prog, T and Qt, while no one else can be considered exactly the same between the 2 dbs (while ID "2" and "3" in the second table are equal, I'm not interested in comparing in the same db).
I hope I explained everything.
A join and aggregation should work for this purpose:
select t1.id, t2.id
from (select t1.*, count(*) over (partition by id) as cnt
from t1
) t1 join
(select t2.*, count(*) over (partition by id) as cnt
from t2
) t2
on t1.prog = t2.prog and t1.T = t2.T and t1.Qt = t2.Qt and t1.cnt = t2.cnt
group by t1.id, t2.id, t1.cnt
having count(*) = t1.cnt;
This is a little tricky. The subqueries count the number of rows for each id in each table. The on clause gets matches between the three columns -- and checks that the ids have the same count. The group by and having then get rows where number of matching rows is the total number of rows.
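The condition the query enforces (every row matches and both ids have the same number of rows) amounts to saying that the two groups are equal as sets of (Prog, T, Qt) rows. A minimal Python sketch of that reading, with hypothetical function names and the sample data typed in by hand:

```python
from collections import defaultdict


def group_signatures(rows):
    """Map each id to the frozenset of its (prog, t, qt) rows."""
    groups = defaultdict(set)
    for id_, prog, t, qt in rows:
        groups[id_].add((prog, t, qt))
    return {id_: frozenset(members) for id_, members in groups.items()}


def matching_ids(rows1, rows2):
    """Return (id1, id2) pairs whose groups contain exactly the same rows."""
    sig1 = group_signatures(rows1)
    sig2 = group_signatures(rows2)
    return sorted(
        (id1, id2)
        for id1, s1 in sig1.items()
        for id2, s2 in sig2.items()
        if s1 == s2
    )


table1 = [
    ('a', 1, 'N', 100), ('b', 1, 'Y', 10), ('b', 2, 'N', 90),
    ('c', 1, 'N', 25), ('c', 2, 'Y', 25), ('c', 3, 'Y', 25), ('c', 4, 'Y', 25),
]
table2 = [
    (1, 1, 'Y', 10), (1, 2, 'N', 90), (2, 1, 'Y', 100), (3, 1, 'Y', 100),
    (4, 1, 'Y', 50), (4, 2, 'Y', 25), (4, 3, 'Y', 25),
]
matches = matching_ids(table1, table2)
print(matches)
# [('b', 1)]
```

Using sets is safe here because (ID, Prog) is the primary key, so a group cannot contain two identical rows.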
Join the two tables on the conditions you want to match. The results will be the values that match between them.
CREATE TABLE a (ID CHAR(1), Prog INT, T CHAR(1), Qt INT);
CREATE TABLE b (ID int, Prog INT, T CHAR(1), Qt INT);
INSERT INTO dbo.a
( ID ,Prog ,T ,Qt)
VALUES ('a',1,'N',100), ('b',1,'Y',10), ('b',2,'N',90),('c',1,'N',25),('c',2,'Y',25),('c',3,'Y',25),('c',4,'Y',25);
INSERT INTO dbo.b
( ID ,Prog ,T ,Qt)
VALUES (1,1,'Y',10),(1,2,'N',90),(2,1,'Y',100),(3,1,'Y',100),(4,1,'Y',50),(4,2,'Y',25),(4,3,'Y',25);
WITH CTEa
AS (SELECT ID,
Prog,
T,
Qt,
Cnt = COUNT(ID) OVER (PARTITION BY ID)
FROM dbo.a
),
CTEb
AS (SELECT ID,
Prog,
T,
Qt,
Cnt = COUNT(ID) OVER (PARTITION BY ID)
FROM dbo.b
)
SELECT ID_A = a.ID,
ID_B = b.ID,
b.Prog,
b.T,
b.Qt,
b.Cnt
FROM CTEa AS a
INNER JOIN CTEb AS b
ON a.Prog = b.Prog
AND a.T = b.T
AND a.Qt = b.Qt
AND a.Cnt = b.Cnt;
Results:
ID_A ID_B Prog T Qt Cnt
b 1 1 Y 10 2
b 1 2 N 90 2

Merge multiple rows in SQL with tie breaking on primary key

I have a table with data like the following
key | A | B | C
---------------------------
1 | x | 0 | 1
2 | x | 2 | 0
3 | x | NULL | 4
4 | y | 7 | 1
5 | y | 3 | NULL
6 | z | NULL | 4
I want to merge the rows together based on column A, with the largest primary key being the 'tie breaker' between values that are not NULL.
Result
key | A | B | C
---------------------------
1 | x | 2 | 4
2 | y | 3 | 1
3 | z | NULL | 4
What would be the best way to achieve this assuming my data is actually 40 columns and 1 million rows with an unknown level of duplications?
Using ROW_NUMBER and conditional aggregation:
WITH cte AS(
SELECT *,
rnB = ROW_NUMBER() OVER(PARTITION BY A ORDER BY CASE WHEN B IS NULL THEN 0 ELSE 1 END DESC, [key] DESC),
rnC = ROW_NUMBER() OVER(PARTITION BY A ORDER BY CASE WHEN C IS NULL THEN 0 ELSE 1 END DESC, [key] DESC)
FROM tbl
)
SELECT
[key] = ROW_NUMBER() OVER(ORDER BY A),
A,
B = MAX(CASE WHEN rnB = 1 THEN B END),
C = MAX(CASE WHEN rnC = 1 THEN C END)
FROM cte
GROUP BY A
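The tie-breaking rule ("for each column, take the non-NULL value from the row with the largest key") can be sketched in a few lines of Python. This only illustrates the rule and is not a replacement for the query; the function name merge_rows is hypothetical and None stands in for NULL.

```python
def merge_rows(rows):
    """Merge rows sharing column A; the largest key wins among non-None values.

    rows: iterable of (key, a, v1, v2, ...) tuples.
    Returns (new_key, a, v1, v2, ...) tuples with new keys numbered from 1.
    """
    merged = {}
    # Process in ascending key order so later (larger-key) rows overwrite.
    for key, a, *values in sorted(rows, key=lambda r: r[0]):
        current = merged.setdefault(a, [None] * len(values))
        for i, value in enumerate(values):
            if value is not None:
                current[i] = value
    ordered = sorted(merged.items(), key=lambda item: item[0])
    return [(n, a, *vals) for n, (a, vals) in enumerate(ordered, start=1)]


rows = [
    (1, 'x', 0, 1), (2, 'x', 2, 0), (3, 'x', None, 4),
    (4, 'y', 7, 1), (5, 'y', 3, None), (6, 'z', None, 4),
]
merged_rows = merge_rows(rows)
print(merged_rows)
# [(1, 'x', 2, 4), (2, 'y', 3, 1), (3, 'z', None, 4)]
```

The query gets the same effect by ranking rows per column so that non-NULL values with the largest key receive row number 1.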