Merge data from multiple columns into one


I have three columns X, Y & Z. I want to do a select statement that returns only one column.
For example for the following rows:
X Y Z
1 0 0
0 1 0
0 0 1
I want to return one column A:
A
X
Y
Z
So, wherever there is a 1, the result should be a string with the name of the column that holds the 1.
I don't have rights to create a new column in the database and then update it using a WHERE condition, so I was wondering whether this could be done inside a SELECT statement.

Without dealing with multiple 1 values in one row:
select case
when x = 1 then 'X'
when y = 1 then 'Y'
when z = 1 then 'Z'
end as A
from the_table;
If you are using Postgres and have a primary key column on that table, you can use JSON functions to make this dynamic and not hardcode the column names in the query.
Test data setup:
create table the_table (id integer, x int, y int, z int);
insert into the_table
(id,x,y,z)
values
(1, 1, 0, 0),
(2, 0, 1, 0),
(3, 0, 0, 1),
(4, 0, 1, 1),
(5, 0, 0, 0);
Then using this query:
select t.id, string_agg(k.col,'' order by k.col) as non_zero_columns
from the_table t,
jsonb_each(to_jsonb(t) - 'id') as k (col, val)
where k.val = '1'
group by id
order by id;
Will return this result:
id | non_zero_columns
---+-----------------
1 | x
2 | y
3 | z
4 | yz
Note that the row with ID=5 is not returned because all columns are zero.

If you have multiple columns with 1s in one row:
select ((case when x = 1 then 'X' else '' end) ||
(case when y = 1 then 'Y' else '' end) ||
(case when z = 1 then 'Z' else '' end)
) as A
from the_table;
Note that || is the ANSI standard operator for string concatenation. Some databases use other methods for concatenating strings.
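As a quick check of the concatenation approach, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption, since the question does not name a database); it supports both CASE and ||, and the sample rows are the ones from the Postgres test setup above.

```python
import sqlite3

# In-memory SQLite database used purely as a test bed (assumption: any
# engine with CASE and the || operator behaves the same for this query).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table the_table (id integer, x int, y int, z int);
    insert into the_table (id, x, y, z) values
        (1, 1, 0, 0), (2, 0, 1, 0), (3, 0, 0, 1), (4, 0, 1, 1), (5, 0, 0, 0);
""")

# One label per column that holds a 1; a row with no 1 yields ''.
merged = conn.execute("""
    select id,
           (case when x = 1 then 'X' else '' end) ||
           (case when y = 1 then 'Y' else '' end) ||
           (case when z = 1 then 'Z' else '' end) as A
    from the_table
    order by id
""").fetchall()

for row in merged:
    print(row)
```

Unlike the JSON-based query, this version still returns the all-zero row (id 5), with an empty string in A.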

If you are using MS SQL Server, you can use UNPIVOT:
DECLARE @MyTable TABLE (X INT, Y INT, Z INT)
INSERT INTO @MyTable VALUES
(1 ,0, 0 ),
(0 ,1, 0 ),
(0 ,0, 1 )
SELECT DISTINCT A FROM
(select * from @MyTable UNPIVOT( V FOR A IN ([X], [Y], [Z])) UNPVT) UP
WHERE V = 1
Result:
A
---------
X
Y
Z

You can use INFORMATION_SCHEMA together with the find_in_set(str, strlist) function (MySQL):
-- the schema
create table `docs` (
`X` int not null,
`Y` int not null,
`Z` int not null
);
insert into `docs` (`X`, `Y`, `Z`) values
(1, 0, 0),
(0, 1, 0),
(0, 0, 1);
-- the query
select column_name as A, concat(x, y, z)
from docs
join INFORMATION_SCHEMA.COLUMNS
on (ordinal_position = find_in_set('1', concat(x, ',', y, ',', z)))
where table_name like 'docs';

Related

Choose a record from the table where two of the default values are 0 or 1, a bit tricky

Looking for a more elegant and most logical solution for:
The table:
index  id  by_default  text
1      1   0           AAA
2      1   1           ABA
3      1   0           ABC
4      2   0           BCA
5      2   0           BCB
The task is to find the minimum index value with defaults set to 1 and/or defaults set to 0.
I have the following code (not very elegant, but it works, also very slow):
declare @byd_1 as int=
(select min(t.[index]) idx from [Table] t where t.[id]=1 and t.by_default=1)
declare @byd_2 as int=
(select min(t.[index]) idx from [Table] t where t.[id]=1 and t.by_default=0)
select (case when @byd_1 is null then @byd_2 else @byd_1 end)
The tricky part is: sometimes the by_default column is always 0 (for example: id:2 may have no by_default values set) and as mentioned earlier the task is: need to get the minimum value of the index column.
What is the most elegant (one-line) code possible?
Using MSSQL
The expected results, according to the sample table, should be the following:
index  id  by_default  text
2      1   1           ABA
4      2   0           BCA

Edited to add text also.
A bit ugly but:
drop table #t
select *
into #t
from (
VALUES (1, 1, 0, N'AAA')
, (2, 1, 1, N'ABA')
, (3, 1, 0, N'ABC')
, (4, 2, 0, N'BCA')
, (5, 2, 0, N'BCB')
) t (index_,id,by_default,text)
select index_, id, by_default, text
from (
select min(index_) OVER(PARTITION BY id) AS minIndex
, MIN(case when by_default = 1 then index_ end) over(partition by id) AS minIndexDefault
, *
from #t
) t
where isnull(minIndexDefault, minIndex) = index_
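The same "prefer the default row, otherwise fall back to the lowest index" rule can also be written without window functions. The sketch below is an alternative formulation, not the query above: it uses a GROUP BY subquery with COALESCE and runs it on SQLite through Python's sqlite3 module, which is only an assumed test bed. The table name t and the column index_ follow the answer's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t (index_ int, id int, by_default int, text text);
    insert into t values
        (1, 1, 0, 'AAA'), (2, 1, 1, 'ABA'), (3, 1, 0, 'ABC'),
        (4, 2, 0, 'BCA'), (5, 2, 0, 'BCB');
""")

# Per id: the lowest index among default rows if any exist,
# otherwise the lowest index overall. MIN ignores the NULLs that the
# CASE produces for non-default rows.
picked = conn.execute("""
    select t.index_, t.id, t.by_default, t.text
    from t
    join (
        select id,
               coalesce(min(case when by_default = 1 then index_ end),
                        min(index_)) as wanted
        from t
        group by id
    ) g on g.id = t.id and g.wanted = t.index_
    order by t.id
""").fetchall()

for row in picked:
    print(row)
```

In T-SQL the same subquery should work with ISNULL or COALESCE; only the temp-table setup differs.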

How to count the number of occurrence in 4 columns

I have a T-SQL query that I need to write as efficiently as possible.
Here is an example of my table:
A  B     C     D     E
1  x, y  z     NULL  NULL
2  x     NULL  NULL  y
3  y     z     NULL  NULL
4  a     NULL  b     x
Now I need a query that ranks the best-matching records. Let's say I need the top 3 records that match the most of the values 'x' and 'y' (there could be more than 2 values) in columns B, C, D and E:
A  NumberOfMatches  Comment
1  2                Because Column B contains x, y
2  2                Because Column B contains x & Column E contains y
3  1                Because Column B contains y
4  1                Because Column E contains x
Could you help me to find a good way to do this query?

I would highly recommend that you change how your data is stored, storing delimited strings to record multiple values is a recipe for disaster. If you have a one to many relationship use a child table with a foreign key to the main table. There is very rarely a reason to store delimited data like this, and when you have to query it and manipulate it you realise why it is not advised.
Assuming that each of your columns are the same and can hold many values, you'll need to split all of them, which you can do using this:
SELECT t.A, upvt.Col, Value = TRIM(ss.value)
FROM #T AS t
CROSS APPLY (VALUES ('B', t.B), ('C', t.C), ('D', t.D), ('E', t.E)) upvt (Col, Value)
CROSS APPLY STRING_SPLIT(upvt.Value, ',') AS ss;
Which gives:
A  Col  Value
1  B    x
1  B    y
1  C    z
2  B    x
2  E    y
3  B    y
3  C    z
4  B    a
4  D    b
4  E    x
With this normalised data, you can then just do a simple WHERE Value IN ('x', 'y') along with GROUP BY and COUNT(*):
IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
DROP TABLE #T;
CREATE TABLE #T (A INT, B VARCHAR(4), C VARCHAR(4), D VARCHAR(4), E VARCHAR(4));
INSERT #T(A, B, C, D, E)
VALUES
(1, 'x, y', 'z', NULL, NULL),
(2, 'x', NULL, NULL, 'y'),
(3, 'y', 'z', NULL, NULL),
(4, 'a', NULL, 'b', 'x');
SELECT t.A, NumberOfMatches = COUNT(*)
FROM #T AS t
CROSS APPLY (VALUES (t.B), (t.C), (t.D), (t.E)) upvt (Value)
CROSS APPLY STRING_SPLIT(upvt.Value, ',') AS ss
WHERE TRIM(ss.value) IN ('x', 'y')
GROUP BY t.A
ORDER BY COUNT(*) DESC, t.A;
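To make the split-then-count logic easier to follow, here is a minimal plain-Python sketch of the same steps: unpivot the four columns, split each cell on commas, trim, filter and count. It is only an illustration of the idea, not a replacement for the T-SQL; the names rows, wanted and count_matches are hypothetical.

```python
# Sample data keyed by column A; each list holds columns B, C, D, E.
rows = {
    1: ["x, y", "z", None, None],
    2: ["x", None, None, "y"],
    3: ["y", "z", None, None],
    4: ["a", None, "b", "x"],
}
wanted = {"x", "y"}


def count_matches(cells, wanted):
    """Count the wanted values across all comma-delimited cells."""
    total = 0
    for cell in cells:
        if cell is None:  # NULL columns contribute nothing
            continue
        total += sum(1 for part in cell.split(",") if part.strip() in wanted)
    return total


# Same ordering as the query: most matches first, then by A.
ranked = sorted(
    ((a, count_matches(cells, wanted)) for a, cells in rows.items()),
    key=lambda pair: (-pair[1], pair[0]),
)
print(ranked)
```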

If you can't normalize the model you can use this query to get your expected result:
select a, count(*) numberOfMatches,
concat('Because column ', string_agg(columnName, ', ')) AS comment
from (
select a, columnName, trim(value) targetValue from (
select a, columnName, value result
from tbl unpivot (value for columnName in ([B], [C], [D], [E])) up
) t
outer apply string_split(result,',')
) r
where targetValue in ('x','y')
group by a
-- Result
/*
a numberOfMatches comment
1 2 Because column B, B
2 2 Because column B, E
3 1 Because column B
4 1 Because column E
*/
Update
You can use this custom function to implement string_split in SQL Server versions that predate the built-in STRING_SPLIT (added in SQL Server 2016).
IF OBJECT_ID('[dbo].[STRING_SPLIT]','IF') IS NULL BEGIN
EXEC ('CREATE FUNCTION [dbo].[STRING_SPLIT] () RETURNS TABLE AS RETURN SELECT 1 X')
END
GO
ALTER FUNCTION [dbo].[STRING_SPLIT]
(
@string nvarchar(MAX),
@separator nvarchar(MAX)
)
RETURNS TABLE WITH SCHEMABINDING
AS RETURN
WITH X(N) AS (SELECT 'Table1' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C)),
Y(N) AS (SELECT 'Table2' FROM X A1, X A2, X A3, X A4, X A5, X A6, X A7, X A8) , -- Up to 16^8 = 4 billion
T(N) AS (SELECT TOP(ISNULL(LEN(@string),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 N FROM Y),
Delim(Pos) AS (SELECT t.N FROM T WHERE (SUBSTRING(@string, t.N, LEN(@separator+'x')-1) LIKE @separator OR t.N = 0)),
Separated(value) AS (SELECT SUBSTRING(@string, d.Pos + LEN(@separator+'x')-1, LEAD(d.Pos,1,2147483647) OVER (ORDER BY (SELECT NULL)) - d.Pos - LEN(@separator))
FROM Delim d
WHERE @string IS NOT NULL)
SELECT s.value
FROM Separated s
WHERE s.value <> @separator
GO
Update 2
To count a column only once even when it contains both x and y:
select a, count(distinct columnName) numberOfMatches,
concat('Because column ', string_agg(columnName, ', ')) AS comment
from (
select a, columnName, trim(value) targetValue from (
select a, columnName, value result
from tbl unpivot (value for columnName in ([B], [C], [D], [E])) up
) t
outer apply string_split(result,',')
) r
where targetValue in ('x','y')
group by a
/* another alternative */
select a, count(*) numberOfMatches,
concat('Because column ', string_agg(columnName, ', ')) AS comment
from (
select a, columnName, value targetValue
from tbl unpivot (value for columnName in ([B], [C], [D], [E])) up
) t
where targetValue like '%x%' or targetValue like '%y%'
group by a
-- Result
/*
a numberOfMatches comment
1 1 Because column B
2 2 Because column B, E
3 1 Because column B
4 1 Because column E
*/

Is there a 2d function that will satisfy this group by requirement on two columns or is there a more sql way to do?

I have a table and need to group by two columns, or by a function of them that yields the expected grouping criteria. This turned out to be a surprisingly interesting math challenge, but I would also be happy to see a standard SQL solution.
I have the following table:
create table temp (a integer, x integer, y integer);
insert into temp values (3, 0, 1);
insert into temp values (3, 1, -1);
insert into temp values (4, 0, 1);
insert into temp values (4, 1, -1);
insert into temp values (4, 0, -1);
insert into temp values (4, 1, 1);
I'd like to group together rows:
1 and 2
3 and 4
5 and 6
Therefore, the select would be:
select a
from temp
group by a, f(x, y)
For that I need a function f(x, y) built from the integer operators available in SQLite, i.e. *, +, -, /. So I need a function such that:
f(0, 1) = f(1, -1) and
f(0, -1) = f(1, 1)
I have tried multiple possibilities but can't figure out such a grouping function ... any clever ideas? :)
Would there be an alternative SQL solution for this problem?

Well, you can define such a "function" using a case expression:
(case when x = 0 and y = 1 then 1
when x = 1 and y = -1 then 1
when x = 0 and y = -1 then 2
when x = 1 and y = 1 then 2
end)
If you want an arithmetic expression:
(2 * x * y - y)
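A quick way to convince yourself that the arithmetic expression groups the rows as requested is to run it on the question's data. This sketch uses Python's sqlite3 module; the table is named pairs here instead of temp (a rename made for this example only).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table pairs (a integer, x integer, y integer);
    insert into pairs values (3, 0, 1);
    insert into pairs values (3, 1, -1);
    insert into pairs values (4, 0, 1);
    insert into pairs values (4, 1, -1);
    insert into pairs values (4, 0, -1);
    insert into pairs values (4, 1, 1);
""")

# f(x, y) = 2*x*y - y maps (0, 1) and (1, -1) to -1,
# and (0, -1) and (1, 1) to 1.
groups = conn.execute("""
    select a, (2 * x * y - y) as f, count(*) as n
    from pairs
    group by a, (2 * x * y - y)
    order by a, f
""").fetchall()

for row in groups:
    print(row)
```

Each of the three groups contains exactly two rows, matching the pairing 1 and 2, 3 and 4, 5 and 6 from the question.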

SQL Update question

I am wondering how this can be achieved.
Let's say I have a table with three columns: IU (uniqueidentifier), ID (int) and SEL (char(1)).
The ID column has the following values in each row (ordered by IU):
0, 1, 2, 2, 0, 0, 1, 2, 2, 2, 0, 0, 4, 2, 2, 0, 0, 1, 2, 0, 0
I need to update column SEL with 'Y' for rows which are part of the group:
1, 2, 2, 2 ...
(A group starts with 1 and is followed by 2's in the next rows. The group 4, 2, 2 does not qualify.)
So in this example column: SEL should be:
null, Y, Y, Y, null, null, Y, Y, Y, Y, null, null, 4, 2, 2, null, null, Y, Y, null, null
Thanks!

Here's a set-based approach.
DDL & sample data:
DECLARE @atable TABLE (
UI uniqueidentifier DEFAULT NEWSEQUENTIALID(),
ID int,
SEL char(1)
);
INSERT INTO @atable (ID)
SELECT 0 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 2 UNION ALL
SELECT 0 UNION ALL
SELECT 0 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 2 UNION ALL
SELECT 2 UNION ALL
SELECT 0 UNION ALL
SELECT 0 UNION ALL
SELECT 4 UNION ALL
SELECT 2 UNION ALL
SELECT 2 UNION ALL
SELECT 0 UNION ALL
SELECT 0 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 0 UNION ALL
SELECT 0;
The UPDATE statement:
WITH marked AS (
SELECT
*,
grp = CASE ID WHEN 0 THEN 0 ELSE 1 END
FROM @atable
),
grouped AS (
SELECT
*,
grpID = ROW_NUMBER() OVER (ORDER BY UI)
- ROW_NUMBER() OVER (PARTITION BY grp ORDER BY UI)
FROM marked
),
ranked AS (
SELECT
*,
rnk = ROW_NUMBER() OVER (PARTITION BY grp, grpID ORDER BY UI)
FROM grouped
)
UPDATE g
SET SEL = CASE r.ID
WHEN 0 THEN NULL
WHEN 1 THEN 'Y'
ELSE CAST(g.ID AS varchar)
END
FROM grouped g
INNER JOIN ranked r ON g.grp = r.grp AND g.grpID = r.grpID
WHERE r.rnk = 1;
The result of SELECT * FROM @atable after the update:
UI ID SEL
------------------------------------ ----------- ----
A4095E70-A0CC-E011-813B-20CF30905E89 0 NULL
A5095E70-A0CC-E011-813B-20CF30905E89 1 Y
A6095E70-A0CC-E011-813B-20CF30905E89 2 Y
A7095E70-A0CC-E011-813B-20CF30905E89 2 Y
A8095E70-A0CC-E011-813B-20CF30905E89 0 NULL
A9095E70-A0CC-E011-813B-20CF30905E89 0 NULL
AA095E70-A0CC-E011-813B-20CF30905E89 1 Y
AB095E70-A0CC-E011-813B-20CF30905E89 2 Y
AC095E70-A0CC-E011-813B-20CF30905E89 2 Y
AD095E70-A0CC-E011-813B-20CF30905E89 2 Y
AE095E70-A0CC-E011-813B-20CF30905E89 0 NULL
AF095E70-A0CC-E011-813B-20CF30905E89 0 NULL
B0095E70-A0CC-E011-813B-20CF30905E89 4 4
B1095E70-A0CC-E011-813B-20CF30905E89 2 2
B2095E70-A0CC-E011-813B-20CF30905E89 2 2
B3095E70-A0CC-E011-813B-20CF30905E89 0 NULL
B4095E70-A0CC-E011-813B-20CF30905E89 0 NULL
B5095E70-A0CC-E011-813B-20CF30905E89 1 Y
B6095E70-A0CC-E011-813B-20CF30905E89 2 Y
B7095E70-A0CC-E011-813B-20CF30905E89 0 NULL
B8095E70-A0CC-E011-813B-20CF30905E89 0 NULL

Rows in a table have no inherent order, so your grouping (1,2,2,2) is completely arbitrary. It is not guaranteed that your IDs will always come in this order:
0, 1, 2, 2, 0, 0, 1, 2, 2, 2, 0, 0, 4, 2, 2, 0, 0, 1, 2, 0, 0
They could come in a completely different order, so you need to specify an ORDER BY clause to get your order. If your table has no fields other than SEL and ID, I suppose this is not possible.

I really hope someone comes up with something better than this, because I hate this answer.
create table #test (
IU int identity primary key,
id int,
sel varchar(1)
)
insert into #test(id)
values (0), (1), (2), (2), (0), (0), (1), (2), (2), (2), (0), (0), (4), (2), (2), (0), (0), (1), (2), (0), (0)
DECLARE myCur CURSOR FORWARD_ONLY
FOR
select t.ID
from #test t
order by t.IU
FOR UPDATE OF t.sel
DECLARE @ID int, @lagSel varchar(1)
OPEN myCur
FETCH myCur INTO @ID
WHILE (@@FETCH_STATUS = 0) BEGIN
SET @lagSel = CASE
WHEN @lagSel = 'Y' AND @ID in (1,2) THEN 'Y'
WHEN @ID = 1 THEN 'Y'
ELSE NULL
END
UPDATE #test
SET sel = @lagSel
WHERE CURRENT OF myCur
FETCH myCur INTO @ID
END
CLOSE myCur
DEALLOCATE myCur
A couple of things to note:
We're manually managing the value of @lagSel within the cursor so we can carry a value from one row to the next.
In order to be able to use the cursor FOR UPDATE, the table has to have a primary key.
In the UPDATE statement, the WHERE CURRENT OF myCur gives (at least in theory) a big performance gain over any other where clause.
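The carry-forward rule inside the cursor loop can be traced outside the database. The following plain-Python sketch mirrors the CASE expression that sets @lagSel; the function name mark_runs is hypothetical. Note that, as written, this rule leaves the 4, 2, 2 group as NULL instead of copying the ID values into SEL.

```python
def mark_runs(ids):
    """Mirror the cursor loop: carry 'Y' forward while the run continues."""
    marks = []
    lag = None  # plays the role of @lagSel
    for value in ids:
        if lag == "Y" and value in (1, 2):
            lag = "Y"   # still inside a run that started with 1
        elif value == 1:
            lag = "Y"   # a new run starts
        else:
            lag = None  # anything else breaks the run
        marks.append(lag)
    return marks


ids = [0, 1, 2, 2, 0, 0, 1, 2, 2, 2, 0, 0, 4, 2, 2, 0, 0, 1, 2, 0, 0]
marks = mark_runs(ids)
print(list(zip(ids, marks)))
```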
I first tried doing this with lagged joins, but couldn't quite get it there. Here's my work in case someone else can do better:
select main.IU, main.id,
CASE
WHEN main.id = 1 THEN 'Y'
WHEN main.id = 2 AND lag.id in (1, 2) THEN 'Y'
ELSE NULL
END as new_sel
from #test main left outer join
#test lag on main.IU = lag.IU + 1

I think you have a design error here. MS SQL Server does not know anything about "next" and "previous" rows. If you select the records, their order can change from time to time unless you specify the ordering with an ORDER BY clause.
I think you need to change the structure of the tables first.
EDIT: As I see, you do have a field by which you can order your records. Now you can achieve your goal using a CURSOR.
Briefly, you can create the CURSOR FOR SELECT IU, ID ORDER BY IU ASC.
Looping through the cursor records, you can check the sequence of values of the ID field and, when the sequence matches, update the corresponding record.

T-Sql Query - Get Unique Rows Across 2 Columns

I have a set of data with columns x and y. For any two given values A and B, the set contains a row with A and B in columns x and y respectively, and a second row with B and A in columns x and y respectively.
E.g.
       Column X  Column Y
Row 1  A         B
Row 2  B         A
There are multiple pairs of data in this set that follow this rule.
For every row with A, B in columns X and Y, there will always be a row with B, A in X and Y.
Columns X and Y are of type int.
I need a T-SQL query that, given a set with the rules above, will return either Row 1 or Row 2, but not both.
Either the answer is very difficult, or it's so easy that I can't see the forest for the trees; either way it's driving me up the wall.

Add to your query the predicate,
where X < Y
and you can never get row two, but will always get row one.
(This assumes that when you wrote "two given values" you meant two distinct given values; if the two values can be the same, add the predicate where X <= Y (to get rid of all "reversed" rows where X > Y) and then add a distinct to your select list (to collapse any two rows where X == Y into one row).)
In reply to comments:
That is, if currently your query is select foo, x, y from sometable where foo < 3; change it to select foo, x, y from sometable where foo < 3 and x < y;, or for the second case (where X and Y are not distinct values) select distinct foo, x, y from sometable where foo < 3 and x <= y;.
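A minimal sketch of the x < y predicate on mirrored pairs, run on SQLite through Python's sqlite3 module as an assumed test bed; the table name pairs and the sample values are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table pairs (x int, y int);
    insert into pairs values (1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5);
""")

# Every pair appears twice, once in each direction; keeping only the
# rows where x < y leaves exactly one row per pair.
kept = conn.execute(
    "select x, y from pairs where x < y order by x"
).fetchall()
print(kept)
```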

This should work.
Declare @t Table (PK Int Primary Key Identity(1, 1), A int, B int);
Insert into @t values (1, 2);
Insert into @t values (2, 1);
Insert into @t values (3, 4);
Insert into @t values (4, 3);
Insert into @t values (5, 6);
Insert into @t values (6, 5);
Declare @Table Table (ID Int Primary Key Identity(1, 1), PK Int, A Int, B Int);
Declare @Current Int;
Declare @A Int;
Insert Into @Table
Select PK, A, B
From @t;
Set @Current = 1;
While (@Current <= (Select Max(ID) From @Table)) Begin
Select @A = A
From @Table
Where ID = @Current;
If (@A Is Not Null) Begin
Delete From @Table Where B = @A;
If ((Select COUNT(*) From @Table Where A = @A) > 1) Begin
Delete From @Table Where ID = @Current;
End
End
Set @A = Null;
Set @Current = @Current + 1;
End
Select a.*
From @t As a
Inner Join @Table As b On a.PK = b.PK

SELECT O.X, O.Y
FROM myTable O
WHERE EXISTS (SELECT X, Y FROM myTable I WHERE I.X = O.Y AND I.Y = O.X)
I have not tried this. But, this should work.

To get the highest and lowest of each pair, you could use:
(X+Y+ABS(X-Y)) / 2 as High, (X+Y-ABS(X-Y)) / 2 as Low
So now use DISTINCT to get the pairs of them.
SELECT DISTINCT
(X+Y+ABS(X-Y)) / 2 as High, (X+Y-ABS(X-Y)) / 2 as Low
FROM YourTable
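Here is a small check of the High/Low expressions, again using Python's sqlite3 module as an assumed stand-in for SQL Server; the table name pairs and its rows are hypothetical. For integers, x + y plus or minus abs(x - y) is always even, so the division by 2 is exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table pairs (x int, y int);
    insert into pairs values (1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5);
""")

# (x + y + |x - y|) / 2 is the larger value and (x + y - |x - y|) / 2
# the smaller one, so mirrored rows collapse under DISTINCT.
unique_pairs = conn.execute("""
    select distinct
           (x + y + abs(x - y)) / 2 as high,
           (x + y - abs(x - y)) / 2 as low
    from pairs
    order by low
""").fetchall()
print(unique_pairs)
```

Compared with the x < y predicate, this form also handles data where only one direction of a pair is present, at the cost of returning the values in High/Low order instead of the original column order.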
