SQL Pivot 2nd step -- Combine rows / Remove nulls

SQL Pivot 2nd step -- Combine rows / Remove nulls - sql

I am trying to pivot data. We start with two columns: ListNum and Value
The row ?integrity? doesn't matter, I just want all the values to collapse upwards and remove the nulls.
In this case ListNum is like an enum, the values are limited to List1, List2, or List3. Notice that they are not in order (1,3,3,1,2 rather than 1,2,3,1,2,3 etc.).
It would be nice to have a solution that uses standard sql so it would work across many databases.
Starting Point:
+---------+------------+
| ListNum | Value |
+---------+------------+
| List1 | A |
| List3 | 123 |
| List3 | CDE |
| List1 | Somestring |
| List2 | randString |
+---------+------------+
I was able to separate the Lists into columns with:
select
case when ListNum = "List1" then Value end as List1,
case when ListNum = "List2" then Value end as List2,
case when ListNum = "List3" then Value end as List3
from Table;
Midpoint:
+------------+------------+-------+
| List1 | List2 | List3 |
+------------+------------+-------+
| A | NULL | NULL |
| NULL | NULL | 123 |
| NULL | NULL | CDE |
| Somestring | NULL | NULL |
| NULL | randString | NULL |
+------------+------------+-------+
but now I need to collapse upwards/remove the nulls to get -
Desired Output:
+------------+------------+-------+
| List1 | List2 | List3 |
+------------+------------+-------+
| A | randString | 123 |
| Somestring | NULL | CDE |
+------------+------------+-------+

Aren't you missing some kind of grouping criterion? How do you determine, that A belongs to 123 and not to CDE? And why is randString in the first line and not in the second?
This is easy, with such a grouping key:
DECLARE #tbl TABLE(GroupingKey INT, ListNum VARCHAR(100),[Value] VARCHAR(100));
INSERT INTO #tbl VALUES
(1,'List1','A')
,(1,'List3','123')
,(2,'List3','CDE')
,(2,'List1','Somestring')
,(1,'List2','randString');
SELECT p.*
FROM #tbl
PIVOT
(
MAX([Value]) FOR ListNum IN(List1,List2,List3)
) p;
But with your data this seems rather random...
UPDATE: A random approach...
The following approach will sort the values into their columns rather randomly:
DECLARE #tbl TABLE(ListNum VARCHAR(100),[Value] VARCHAR(100));
INSERT INTO #tbl VALUES
('List1','A')
,('List3','123')
,('List3','CDE')
,('List1','Somestring')
,('List2','randString');
--This will use three independant, but numbered sets and join them:
WITH All1 AS (SELECT [Value],ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RandomNumber FROM #tbl WHERE ListNum='List1')
,All2 AS (SELECT [Value],ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RandomNumber FROM #tbl WHERE ListNum='List2')
,All3 AS (SELECT [Value],ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RandomNumber FROM #tbl WHERE ListNum='List3')
SELECT All1.[Value] AS List1
,All2.[Value] AS List2
,All3.[Value] AS List3
FROM All1
FULL OUTER JOIN All2 ON All1.RandomNumber=All2.RandomNumber
FULL OUTER JOIN All3 ON All1.RandomNumber=All3.RandomNumber ;
Hint: There is no implicit sort order in your table!
From your comment:
It’s simply the index / instance number. randString is the first non-null row.
Without a specific ORDER BY the same SELECT may return your data in any random order. So there is no first non-null row, at least not in the meaning of first comes before second...

Something with recursive CTE may work:
DECLARE #tbl TABLE( ListNum VARCHAR(100),[Value] VARCHAR(100));
INSERT INTO #tbl VALUES
( 'List1','A')
,( 'List3','123')
,( 'List3','CDE')
,( 'List1','Somestring')
,( 'List2','randString');
DECLARE #mmax int;
SELECT #mmax = cnt from (SELECT TOP 1 count(*) cnt from #tbl group by ListNum ORDER BY count(*) DESC) t;
With rec AS (
SELECT 1 AS num
UNION ALL
SELECT num+1 FROM rec WHERE num+1<=#mmax
)
SELECT t1.List1, t2.List2, t3.List3 FROM rec
FULL JOIN (
select Value as List1, row_number() over(order by ListNum) rn from #tbl where ListNum = 'List1'
) t1
ON rec.num = t1.rn
FULL JOIN
(
select Value as List2, row_number() over(order by ListNum) rn from #tbl where ListNum = 'List2'
) t2
ON rec.num = t2.rn
FULL JOIN
(
select Value as List3, row_number() over(order by ListNum) rn from #tbl where ListNum = 'List3'
) t3
ON rec.num = t3.rn;
DEMO

Ordinarily you'd use an aggregating operation such as MAX because it hides nulls (a null can never be the max unless there is no other valid value in the group).. but your query is a bit odd because you don't seem to have a solid pivot anchor, and you're allowing any of your data to become associated with anything else. In the real world this probably wouldn't happen because it isn't particularly useful
Better example data:
Person, Attribute, Value
1, Name, John
1, Age, 10
2, Name, Sarah
3 Age, 39
Pivoting query:
SELECT
Person,
MAX(case when attribute = 'name' then value end) as name,
MAX(case when attribute = 'age' then value end) as age
FROM
data
GROUP BY person
Result:
Person, Name, Age
1, John, 10
2, Sarah, NULL
3, NULL, 39

Related

Sort an array of strings in SQL

I have a column of strings in SQL Server 2019 that I want to sort
Select * from ID
[7235, 6784]
[3235, 2334]
[9245, 2784]
[6235, 1284]
Trying to get the result below:
[6784, 7235]
[2334, 3235]
[2784, 9245]
[1284, 6235]

Given this sample data:
CREATE TABLE dbo.ID(ID int IDENTITY(1,1), SomeCol varchar(64));
INSERT dbo.ID(SomeCol) VALUES
('[7235, 6784]'),
('[3235, 2334]'),
('[9245, 2784]'),
('[6235, 1284]');
You can run this query:
;WITH cte AS
(
SELECT ID, SomeCol,
i = TRY_CONVERT(int, value),
s = LTRIM(value)
FROM dbo.ID CROSS APPLY
STRING_SPLIT(PARSENAME(SomeCol, 1), ',') AS s
)
SELECT ID, SomeCol,
Result = QUOTENAME(STRING_AGG(s, ', ')
WITHIN GROUP (ORDER BY i))
FROM cte
GROUP BY ID, SomeCol
ORDER BY ID;
Output:
ID
SomeCol
Result
1
[7235, 6784]
[6784, 7235]
2
[3235, 2334]
[2334, 3235]
3
[9245, 2784]
[2784, 9245]
4
[6235, 1284]
[1284, 6235]
Example db<>fiddle

The source table has a column with a JSON array.
That's why it is a perfect case to handle it via SQL Server JSON API.
SQL
-- DDL and sample data population, start
DECLARE #tbl TABLE (ID int IDENTITY PRIMARY KEY, jArray NVARCHAR(100));
INSERT #tbl (jArray) VALUES
('[7235, 6784]'),
('[3235, 2334]'),
('[9245, 2784]'),
('[6235, 1284]');
-- DDL and sample data population, end
SELECT t.*
, Result = QUOTENAME(STRING_AGG(j.value, ', ')
WITHIN GROUP (ORDER BY j.value ASC))
FROM #tbl AS t
CROSS APPLY OPENJSON(t.jArray) AS j
GROUP BY t.ID, t.jArray
ORDER BY t.ID;
Output
+----+--------------+--------------+
| ID | jArray | Result |
+----+--------------+--------------+
| 1 | [7235, 6784] | [6784, 7235] |
| 2 | [3235, 2334] | [2334, 3235] |
| 3 | [9245, 2784] | [2784, 9245] |
| 4 | [6235, 1284] | [1284, 6235] |
+----+--------------+--------------+

How to find duplicate sets of values in column SQL

I have a database table in SQL Server like this:
+----+--------+
| ID | Number |
+----+--------+
| 1 | 4 |
| 2 | 2 |
| 3 | 6 |
| 4 | 5 |
| 5 | 3 |
| 6 | 2 |
| 7 | 6 |
| 8 | 4 |
| 9 | 5 |
| 10 | 1 |
| 11 | 6 |
| 12 | 4 |
| 13 | 2 |
| 14 | 6 |
+----+--------+
I want to get all values of rows that are the same with last row or last 2 rows or last 3 rows or .... in column Number, and when finding those values, will go on to get the values that appear next and count the number its appearance.
Result output like this:
If the same with the last row:
We see that the number next to 6 in column Number is 4 and 5.
Times appear in column Number of pair 6,4 is 2 and pair 6,5 is 1.
+---------------------+-------------------------+--------------+
| "Condition to find" | "Next Number in column" | Times appear |
+---------------------+-------------------------+--------------+
| 6 | 5 | 1 |
| 6 | 4 | 2 |
+---------------------+-------------------------+--------------+
If the same with the last two rows:
+---------------------+-------------------------+--------------+
| "Condition to find" | "Next Number in column" | Times appear |
+---------------------+-------------------------+--------------+
| 2,6 | 5 | 1 |
| 2,6 | 4 | 1 |
+---------------------+-------------------------+--------------+
If the same with the last 3 rows:
+---------------------+-------------------------+--------------+
| "Condition to find" | "Next Number in column" | Times appear |
+---------------------+-------------------------+--------------+
| 4,2,6 | 5 | 1 |
+---------------------+-------------------------+--------------+
And if the last 4,5,6...rows, find until Times appear returns 0
+---------------------+-------------------------+--------------+
| "Condition to find" | "Next Number in column" | Times appear |
+---------------------+-------------------------+--------------+
| 6,4,2,6 | | |
+---------------------+-------------------------+--------------+
Any idea how to get this. Thank so much!

Here's an answer which uses the 'Lead' function - which (once ordered) takes a value from a certain number of rows ahead.
It converts your table with 1 number, to also include the next 3 numbers on each row.
Then you can join on those columns to get numbers etc.
CREATE TABLE #Src (Id int PRIMARY KEY, Num int)
INSERT INTO #Src (Id, Num) VALUES
( 1, 4),
( 2, 2),
( 3, 6),
( 4, 5),
( 5, 3),
( 6, 2),
( 7, 6),
( 8, 4),
( 9, 5),
(10, 1),
(11, 6),
(12, 4),
(13, 2),
(14, 6)
CREATE TABLE #SrcWithNext (Id int PRIMARY KEY, Num int, Next1 int, Next2 int, Next3 int)
-- First step - use LEAD to get the next 1, 2, 3 values
INSERT INTO #SrcWithNext (Id, Num, Next1, Next2, Next3)
SELECT ID, Num,
LEAD(Num, 1, NULL) OVER (ORDER BY Id) AS Next1,
LEAD(Num, 2, NULL) OVER (ORDER BY Id) AS Next2,
LEAD(Num, 3, NULL) OVER (ORDER BY Id) AS Next3
FROM #Src
SELECT * FROM #SrcWithNext
/* Find number with each combination */
-- 2 chars
SELECT A.Num, A.Next1, COUNT(*) AS Num_Instances
FROM (SELECT DISTINCT Num, Next1 FROM #SrcWithNext) AS A
INNER JOIN #SrcWithNext AS B ON A.Num = B.Num AND A.Next1 = B.Next1
WHERE A.Num <= B.Num
GROUP BY A.Num, A.Next1
ORDER BY A.Num, A.Next1
-- 3 chars
SELECT A.Num, A.Next1, A.Next2, COUNT(*) AS Num_Instances
FROM (SELECT DISTINCT Num, Next1, Next2 FROM #SrcWithNext) AS A
INNER JOIN #SrcWithNext AS B
ON A.Num = B.Num
AND A.Next1 = B.Next1
AND A.Next2 = B.Next2
WHERE A.Num <= B.Num
GROUP BY A.Num, A.Next1, A.Next2
ORDER BY A.Num, A.Next1, A.Next2
-- 4 chars
SELECT A.Num, A.Next1, A.Next2, A.Next3, COUNT(*) AS Num_Instances
FROM (SELECT DISTINCT Num, Next1, Next2, Next3 FROM #SrcWithNext) AS A
INNER JOIN #SrcWithNext AS B
ON A.Num = B.Num
AND A.Next1 = B.Next1
AND A.Next2 = B.Next2
AND A.Next3 = B.Next3
WHERE A.Num <= B.Num
GROUP BY A.Num, A.Next1, A.Next2, A.Next3
ORDER BY A.Num, A.Next1, A.Next2, A.Next3
Here's a db<>fiddle to check.
Notes
The A.Num <= B.Num means it finds all matches to itself, and then only counts others once
This answer finds all combinations. To filter, it currently would need to filter as separate columns e.g., instead of 2,6, you'd filter on Num = 2 AND Next1 = 6. Feel free to then do various text/string concatenation functions to create references for your preferred search/filter approach.

Hmmm . . . I am thinking that you want to create the "pattern to find" as a string. Unfortunately, string_agg() is not a windowing function, but you can use apply:
select t.*, p.*
from t cross apply
(select string_agg(number, ',') within group (order by id) as pattern
from (select top (3) t2.*
from t t2
where t2.id <= t.id
order by t2.id desc
) t2
) p;
You would change the "3" to whatever number of rows that you want.
Then you can use this to identify the rows where the patterns are matched and aggregate:
with tp as (
select t.*, p.*
from t cross apply
(select string_agg(number, ',') within group (order by id) as pattern
from (select top (3) t2.*
from t t2
where t2.id <= t.id
order by t2.id desc
) t2
) p
)
select pattern_to_find, next_number, count(*)
from (select tp.*,
first_value(pattern) over (order by id desc) as pattern_to_find,
lead(number) over (order by id) as next_number
from tp
) tp
where pattern = pattern_to_find
group by pattern_to_find, next_number;
Here is a db<>fiddle.
If you are using an older version of SQL Server -- one that doesn't support string_agg() -- you can calculate the pattern using lag():
with tp as (
select t.*,
concat(lag(number, 2) over (order by id), ',',
lag(number, 1) over (order by id), ',',
number
) as pattern
from t
)
Actually, if you have a large amount of data, it would be interesting to know which is faster -- the apply version or the lag() version. I suspect that lag() might be faster.
EDIT:
In unsupported versions of SQL Server, you can get the pattern using:
select t.*, p.*
from t cross apply
(select (select cast(number as varchar(255)) + ','
from (select top (3) t2.*
from t t2
where t2.id <= t.id
order by t2.id desc
) t2
order by t2.id desc
for xml path ('')
) as pattern
) p
You can use similar logic for lead().

I tried to solve this problem by converting "Number" column to a string.
Here is my code using a function with input of "number of last selected rows":
(Be careful that the name of the main table is "test" )
create function duplicate(#nlast int)
returns #temp table (RowNumbers varchar(20), Number varchar(1))
as
begin
declare #num varchar(20)
set #num=''
declare #count int=1
while #count <= (select count(id) from test)
begin
set #num = #num + cast((select Number from test where #count=ID) as varchar(20))
set #count=#count+1
end
declare #lastnum varchar(20)
set #lastnum= (select RIGHT(#num,#nlast))
declare #count2 int=1
while #count2 <= len(#num)-#nlast
begin
if (SUBSTRING(#num,#count2,#nlast) = #lastnum)
begin
insert into #temp
select #lastnum ,SUBSTRING(#num,#count2+#nlast,1)
end
set #count2=#count2+1
end
return
end
go
select RowNumbers AS "Condition to find", Number AS "Next Number in column" , COUNT(Number) AS "Times appear" from dbo.duplicate(2)
group by Number, RowNumbers

Two rows with the same id and two different values, getting the second value into another column

I have two rows with the same id but different values. I want a query to get the second value and display it in the first row.
There are only two rows for each productId and 2 different values.
I've tried looking for this for the solution everywhere.
What I have, example:
+-----+-------+
| ID | Value |
+-----+-------+
| 123 | 1 |
| 123 | 2 |
+-----+-------+
What I want
+------+-------+---------+
| ID | Value | Value 1 |
+------+-------+---------+
| 123 | 1 | 2 |
+------+-------+---------+

Not sure whether order matters to you. Here is one way:
SELECT MIN(Value), MAX(Value), ID
FROM Table
GROUP BY ID;

This is a self-join:
SELECT a.ID, a.Value, b.Value
FROM table a
JOIN table b on a.ID = b.ID
and a.Value <> b.Value
You can use a LEFT JOIN instead if there are IDs that only have one value and would be lost by the above JOIN

May be you may try this
DECLARE #T TABLE
(
Id INT,
Val INT
)
INSERT INTO #T
VALUES(123,1),(123,2),
(456,1),(789,1),(789,2)
;WITH CTE
AS
(
SELECT
RN = ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Val),
*
FROM #T
)
SELECT
*
FROM CTE
PIVOT
(
MAX(Val)
FOR
RN IN
(
[1],[2]--Add More Numbers here if there are more values
)
)Q

SELECT First Group

Problem Definition
I have an SQL query that looks like:
SELECT *
FROM table
WHERE criteria = 1
ORDER BY group;
Result
I get:
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
B | 2 | 1
B | 3 | 1
Expected Result
However, I would like to limit the results to only the first group (in this instance, A). ie,
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
What I've tried
Group By
SELECT *
FROM table
WHERE criteria = 1
GROUP BY group;
I can aggregate the groups using a GROUP BY clause, but that would give me:
group | value
-------------
A | 0
B | 2
or some aggregate function of EACH group. However, I don't want to aggregate the rows!
Subquery
I can also specify the group by subquery:
SELECT *
FROM table
WHERE criteria = 1 AND
group = (
SELECT group
FROM table
WHERE criteria = 1
ORDER BY group ASC
LIMIT 1
);
This works, but as always, subqueries are messy. Particularly, this one requires specifying my WHERE clause for criteria twice. Surely there must be a cleaner way to do this.

You can try following query:-
SELECT *
FROM table
WHERE criteria = 1
AND group = (SELECT MIN(group) FROM table)
ORDER BY value;

If your database supports the WITH clause, try this. It's similar to using a subquery, but you only need to specify the criteria input once. It's also easier to understand what's going on.
with main_query as (
select *
from table
where criteria = 1
order by group, value
),
with min_group as (
select min(group) from main_query
)
select *
from main_query
where group in (select group from min_group);
-- this where clause should be fast since there will only be 1 record in min_group

Use DENSE_RANK()
DECLARE #yourTbl AS TABLE (
[group] NVARCHAR(50),
value INT,
criteria INT
)
INSERT INTO #yourTbl VALUES ( 'A', 0, 1 )
INSERT INTO #yourTbl VALUES ( 'A', 1, 1 )
INSERT INTO #yourTbl VALUES ( 'B', 2, 1 )
INSERT INTO #yourTbl VALUES ( 'B', 3, 1 )
;WITH cte AS
(
SELECT i.* ,
DENSE_RANK() OVER (ORDER BY i.[group]) AS gn
FROM #yourTbl AS i
WHERE i.criteria = 1
)
SELECT *
FROM cte
WHERE gn = 1
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1

SQL Select First column and for each row select unique ID and the last date

I have a problems this mornig , I have tried many solutions and nothing gave me the expected result.
I have a table that looks like this :
+----+----------+-------+
| ID | COL2 | DATE |
+----+----------+-------+
| 1 | 1 | 2001 |
| 1 | 2 | 2002 |
| 1 | 3 | 2003 |
| 1 | 4 | 2004 |
| 2 | 1 | 2001 |
| 2 | 2 | 2002 |
| 2 | 3 | 2003 |
| 2 | 4 | 2004 |
+----+----------+-------+
And I have a query that returns a result like this :
I have the unique ID and for this ID I want to take the last date of the ID
+----+----------+-------+
| ID | COL2 | DATE |
+----+----------+-------+
| 1 | 4 | 2004 |
| 2 | 4 | 2004 |
+----+----------+-------+
But I don't have any idea how I can do that.
I tried Join , CROSS APPLY ..
If you have some idea ,
Thank you
Clement FAYARD

declare #t table (ID INT,Col2 INT,Date INT)
insert into #t(ID,Col2,Date)values (1,1,2001)
insert into #t(ID,Col2,Date)values (1,2,2001)
insert into #t(ID,Col2,Date)values (1,3,2001)
insert into #t(ID,Col2,Date)values (1,4,2001)
insert into #t(ID,Col2,Date)values (2,1,2002)
insert into #t(ID,Col2,Date)values (2,2,2002)
insert into #t(ID,Col2,Date)values (2,3,2002)
insert into #t(ID,Col2,Date)values (2,4,2002)
;with cte as(
select
*,
rn = row_number() over(partition by ID order by Col2 desc)
from #t
)
select
ID,
Col2,
Date
from cte
where
rn = 1

SELECT ID,MAX(Col2),MAX(Date) FROM tableName GROUP BY ID

If col2 and date allways the highest value in combination than you can try
SELECT ID, MAX(COL2), MAX(DATE)
FROM Table1
GROUP BY ID
But it is not realy good.
The alternative is a subquery with:
SELECT yourtable.ID, sub1.COL2, sub1.DATE
FROM yourtable
INNER JOIN -- try with CROSS APPLY for performance AND without ON 1=1
(SELECT TOP 1 COL2, DATE
FROM yourtable sub2
WHERE sub2.ID = topquery.ID
ORDER BY COL2, DATE) sub1 ON 1=1

You didn't tell what's the name of your table so I'll assume below it is tbl:
SELECT m.ID, m.COL2, m.DATE
FROM tbl m
LEFT JOIN tbl o ON m.ID = o.ID AND m.DATE < o.DATE
WHERE o.DATE is NULL
ORDER BY m.ID ASC
Explanation:
The query left joins the table tbl aliased as m (for "max") against itself (alias o, for "others") using the column ID; the condition m.DATE < o.DATE will combine all the rows from m with rows from o having a greater value in DATE. The row having the maximum value of DATE for a given value of ID from m has no pair in o (there is no value greater than the maximum value). Because of the LEFT JOIN this row will be combined with a row of NULLs. The WHERE clause selects only these rows that have NULL for o.DATE (i.e. they have the maximum value of m.DATE).
Check the SQL Antipatterns: Avoiding the Pitfalls of Database Programming book for other SQL tips.

In order to do this you MUST exclude COL2 Your query should look like this
SELECT ID, MAX(DATE)
FROM table_name
GROUP BY ID
The above query produces the Maximum Date for each ID.
Having COL2 with that query does not makes sense, unless you want the maximum date for each ID and COL2
In that case you can run:
SELECT ID, COL2, MAX(DATE)
GROUP BY ID, COL2;
When you use aggregation functions(like max()), you must always group by all the other columns you have in the select statement.
I think you are facing this problem because you have some fundemental flaws with the design of the table. Usually ID should be a Primary Key (Which is Unique). In this table you have repeated IDs. I do not understand the business logic behind the table but it seems to have some flaws to me.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL Pivot 2nd step -- Combine rows / Remove nulls - sql

Related

Sort an array of strings in SQL

How to find duplicate sets of values in column SQL

Two rows with the same id and two different values, getting the second value into another column

SELECT First Group

SQL Select First column and for each row select unique ID and the last date

Categories

Resources