combine distinct row values into a string - sql

I would like to take cells in every row and make them into a string of names... My method already deals with casing.
For example, the table:
'john' | | 'smith' | 'smith'
'john' | 'paul' | | 'smith'
'john' | 'john' | 'john' |
returns:
'john smith'
'john paul smith'
'john'
This needs to run on PostgreSQL 8.2.15, so I can't make use of potentially useful functions like CONCAT; the data is in a Greenplum database.
Alternatively, a method to directly delete duplicate tokens in a list of strings would let me achieve the larger objective. For example:
'john smith john smith'
'john john smith'
'smith john smith'
returns
'john smith'
'john smith'
'smith john'
The order of the tokens is not important, as long as all the unique values are returned, once only.
Thanks

Normalize your table structure, select distinct name values from that table, create a function to aggregate strings (see, e.g., How to concatenate strings of a string field in a PostgreSQL 'group by' query?), and apply that function. Except for the aggregate function creation, this could all be done in a single statement or view.
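For the alternative objective in the question (deleting duplicate tokens from a string), the same select-distinct-then-aggregate idea can be sketched outside the database. This is only an illustration in Python, not something that runs inside Greenplum; the function name unique_tokens is made up for the example.

```python
def unique_tokens(text):
    """Return the whitespace-separated tokens of text, each kept once.

    dict.fromkeys keeps the first occurrence of every key in insertion
    order (Python 3.7+), so it works as an order-preserving DISTINCT.
    """
    return " ".join(dict.fromkeys(text.split()))


samples = ["john smith john smith", "john john smith", "smith john smith"]
deduplicated = [unique_tokens(s) for s in samples]
for original, cleaned in zip(samples, deduplicated):
    print(f"{original!r} -> {cleaned!r}")
```

The question says token order is not important, so keeping first-seen order is simply a convenient choice here.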

I've come up with a solution for you! :)
The following query returns the four columns (which I named col_1, col_2, col_3 and col_4) and removes the duplicates by joining test_table with itself.
Here is the code:
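SELECT t1.col_1, t2.col_2, t3.col_3, t4.col_4
FROM (
    SELECT id, col_1
    FROM test_table
) AS t1
-- IS DISTINCT FROM treats NULL as an ordinary value, so an empty or
-- already-dropped column does not knock out the joins that follow it
-- (with <>, any comparison against NULL makes the whole condition fail).
LEFT JOIN (
    SELECT id, col_2
    FROM test_table
) AS t2
ON (t2.id = t1.id AND t2.col_2 IS DISTINCT FROM t1.col_1)
LEFT JOIN (
    SELECT id, col_3
    FROM test_table
) AS t3
ON (t3.id = t1.id AND t3.col_3 IS DISTINCT FROM t1.col_1 AND t3.col_3 IS DISTINCT FROM t2.col_2)
LEFT JOIN (
    SELECT id, col_4
    FROM test_table
) AS t4
ON (t4.id = t1.id AND t4.col_4 IS DISTINCT FROM t1.col_1 AND t4.col_4 IS DISTINCT FROM t2.col_2 AND t4.col_4 IS DISTINCT FROM t3.col_3);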
If you want to obtain the final string, just replace the SELECT line with this one (the separator is appended inside each COALESCE so that empty columns do not leave double spaces):
SELECT trim(both ' ' FROM (COALESCE(t1.col_1 || ' ', '') || COALESCE(t2.col_2 || ' ', '') || COALESCE(t3.col_3 || ' ', '') || COALESCE(t4.col_4, '')))
This should work with your version of Postgres, according to the docs:
[for the trim and concatenation functions]
https://www.postgresql.org/docs/8.2/static/functions-string.html
[for the coalesce function]
https://www.postgresql.org/docs/8.2/static/functions-conditional.html
Please let me know if I've been of help :)
P.S. Your question suggests a bad database design: I would move those columns to a separate table, where you could do this operation with a GROUP BY or something similar. Moreover, I would do the string concatenation in a separate script.
But that's my way of doing :)

I would do this by unpivoting the data and then reaggregating:
select id, string_agg(distinct col, ' ')
from (select id, col1 as col from t union all
select id, col2 from t union all
select id, col3 from t union all
select id, col4 from t
) t
where col is not null
group by id;
This assumes that each row has a unique id. Note that string_agg() requires PostgreSQL 9.0 or later.
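To try the unpivot-and-reaggregate idea without a Postgres server, here is a rough sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite's group_concat(DISTINCT ...) stands in for the aggregate function, and the table t with its id and col1..col4 columns is an assumed layout based on the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "john", None, "smith", "smith"),
        (2, "john", "paul", None, "smith"),
        (3, "john", "john", "john", None),
    ],
)

# Unpivot the four columns into one, drop NULLs, then aggregate the
# distinct values per id. SQLite joins them with commas by default.
query = """
SELECT id, group_concat(DISTINCT col) AS names
FROM (
    SELECT id, col1 AS col FROM t UNION ALL
    SELECT id, col2 FROM t UNION ALL
    SELECT id, col3 FROM t UNION ALL
    SELECT id, col4 FROM t
) AS u
WHERE col IS NOT NULL
GROUP BY id
ORDER BY id
"""

# The order inside each aggregated string is not guaranteed, so the
# values are compared as sets rather than as strings.
result = {row_id: set(names.split(",")) for row_id, names in conn.execute(query)}
print(result)
```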
You can also use a giant case:
select concat_ws(',',
col1,
(case when col2 <> col1 then col2 end),
(case when col3 <> col2 and col3 <> col1 then col3 end),
(case when col4 <> col3 and col4 <> col2 and col4 <> col1 then col4 end)
) as newcol
from t;
In ancient versions of Postgres, you can phrase this as:
select trim(leading ',' from
(coalesce(',' || col1, '') ||
(case when col2 <> col1 then ',' || col2 else '' end) ||
(case when col3 <> col2 and col3 <> col1 then ',' || col3 else '' end) ||
(case when col4 <> col3 and col4 <> col2 and col4 <> col1 then ',' || col4 else '' end)
)
) as newcol
from t;

Related

Dynamic Row Data into Column

I have a column which has 100 rows of data. I need to get the top 4, but instead of rows I need them as columns, like Col1, Col2, Col3 and Col4.
I have tried
SELECT
MAX (CASE
WHEN rss_name = 'BBC-Sports'
THEN rss_name
END) AS col1,
MAX (CASE
WHEN rss_name = 'Talk Sports'
THEN rss_name
END) AS col2,
MAX (CASE
WHEN rss_name = 'Sky Sports'
THEN rss_name
END) AS col3,
MAX (CASE
WHEN rss_name = 'Crick Info'
THEN rss_name
END) AS col4
FROM
RSS
but it only works with static values:
I need
Col1, Col2, Col3, Col4
Sports,Talk Sports,Sky Sports,Crick Info
but since this is not constant data it will change and the values in Col keep changing.

You could use a derived table to set your column order then use your conditional aggregation on that.
SELECT
MAX(CASE WHEN Col_Rn = 1 THEN Rss_Name END) AS Col1,
MAX(CASE WHEN Col_Rn = 2 THEN Rss_Name END) AS Col2,
MAX(CASE WHEN Col_Rn = 3 THEN Rss_Name END) AS Col3,
MAX(CASE WHEN Col_Rn = 4 THEN Rss_Name END) AS Col4
FROM (
SELECT Rss_Name,
Row_Number() OVER (ORDER BY Rss_Name) AS Col_Rn -- set your order here
FROM RSS
) t
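What the derived table plus conditional aggregation does can be sketched in plain Python: number the values in the chosen order, then place value number n in column n. This is only an illustration; the function name and sample values are invented, and Python's default string ordering may differ from the database collation.

```python
def top_n_as_columns(names, n=4):
    """Mimic ROW_NUMBER() plus MAX(CASE WHEN rn = i ...) for i in 1..n."""
    ordered = sorted(names)  # corresponds to ORDER BY Rss_Name
    return {
        # A position with no matching row stays None, just as the
        # MAX(CASE ...) expression would return NULL.
        f"Col{i}": ordered[i - 1] if i <= len(ordered) else None
        for i in range(1, n + 1)
    }


feeds = ["Talk Sports", "Sky Sports", "Crick Info", "BBC-Sports", "Extra"]
columns = top_n_as_columns(feeds)
print(columns)
```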

You need to use a dynamic pivot. In your case you also need an extra column that supplies the pivot column names, like COL_1, COL_2, and so on.
Schema (taken from your image; it's better if you provide sample data as text):
CREATE TABLE #TAB (Rss_Name VARCHAR(50))
INSERT INTO #TAB
SELECT 'Sports'
UNION ALL
SELECT 'Talk Sports'
UNION ALL
SELECT 'Sky Sports'
UNION ALL
SELECT 'Crick Info'
Now Prepare your dynamic query as below
DECLARE @SQL VARCHAR(MAX)='',@PVT_COL VARCHAR(MAX)='';
--Preparing Dynamic Column List
SELECT @PVT_COL =@PVT_COL
+ '[COL_'+CAST(ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS VARCHAR(4))+'],'
FROM #TAB
SELECT @PVT_COL = LEFT(@PVT_COL,LEN(@PVT_COL)-1)
SELECT @SQL =
'SELECT * FROM (
SELECT Rss_Name
,''COL_''+CAST(ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS VARCHAR(4)) AS COL_NME
FROM #TAB
)AS A
PIVOT
(
MAX(Rss_Name) FOR COL_NME IN ('+@PVT_COL+')
)PVT'
EXEC (@SQL)
Result:
+--------+-------------+------------+------------+
| COL_1 | COL_2 | COL_3 | COL_4 |
+--------+-------------+------------+------------+
| Sports | Talk Sports | Sky Sports | Crick Info |
+--------+-------------+------------+------------+

SQL script to remove zeros between alphabetic and numeric values within a field

I would like to update a table within my db2 database and remove the zeros that are between the alphabetic and numeric values.
For example I have the column element: CompanyName. I would like to get all the CompanyName's that have the zero(s) between alphabetic and numeric values, i.e. ABCD001234, and replace it with ABCD1234. There are over 30000 of these values, so a script is needed.
Some more examples of the trimming are shown below:
ABCD1234 -> ABCD1234 (no change)
JFKD011011 -> JFKD11011
A000000001 -> A1
Z000000000 -> Z0
Preferably, I would like a script that I can test without the UPDATE, and then add the UPDATE statement after the results look correct.

try this:
select
trim(translate(CompanyName, ' ', '0123456789')) ||
cast(translate(CompanyName, ' ', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') as integer)
from yourtable

Assuming your column is called COL1, in table TABLE1, you can obtain the new values as desired in NEWSTR. From there you can easily prepare the UPDATE:
SELECT COL1, SUBSTR(COL1, 1, ZX)
|| CAST( CAST(SUBSTR(COL1, ZX+1,99) AS INT) AS VARCHAR(18)) AS NEWSTR
FROM
(SELECT COL1, LENGTH(COL1) AS ZY, LENGTH(RTRIM(TRANSLATE(COL1, ' ', '0123456789 '))) AS ZX FROM TABLE1) X
;
To make a test:
SELECT COL1, SUBSTR(COL1, 1, ZX)
|| CAST( CAST(SUBSTR(COL1, ZX+1,99) AS INT) AS VARCHAR(18)) AS NEWSTR
FROM
(SELECT COL1, LENGTH(COL1) AS ZY, LENGTH(RTRIM(TRANSLATE(COL1, ' ', '0123456789 '))) AS ZX FROM
(SELECT 'ABCD1234' AS COL1 FROM sysibm.sysdummy1 UNION ALL SELECT 'JFKD011011' AS COL1 FROM sysibm.sysdummy1
UNION ALL SELECT 'A000000001' AS COL1 FROM sysibm.sysdummy1 UNION ALL SELECT 'Z000000000' AS COL1 FROM sysibm.sysdummy1
) K
) X
Output:
COL1 NEWSTR
---------- ---------
ABCD1234 ABCD1234
JFKD011011 JFKD11011
A000000001 A1
Z000000000 Z0
In MSSQL it should be easier, using PATINDEX() function.
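If it helps to check the expected results before running anything against DB2, the trimming rule can be sketched with a regular expression in Python. This is a stand-in for the SQL above, not a DB2 feature; the function name strip_inner_zeros is invented for the example, and it assumes values of the form letters followed by digits.

```python
import re

# Leading letters, then a run of zeros that is still followed by at
# least one digit. The lookahead keeps the last digit, so an all-zero
# numeric part collapses to a single 0.
_INNER_ZEROS = re.compile(r"^([A-Za-z]+)0+(?=[0-9])")


def strip_inner_zeros(name):
    """Remove zeros between the alphabetic prefix and the number."""
    return _INNER_ZEROS.sub(r"\1", name)


for value in ["ABCD1234", "JFKD011011", "A000000001", "Z000000000"]:
    print(value, "->", strip_inner_zeros(value))
```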

SQL Server : Reuse calculated variable in select clause

I have the following table structure:
col1 col2 col3 col4
-------------------------------
aK Mbcd ABc defgh
col2, col3 and col4 columns are of type varchar(100) and col1 has type varchar(500).
I need a select query to have the output as following
col1 col2 col3 col4
-------------------------------
aK,Mb cd,A Bc,d efgh
Logic is explained as mentioned below:
In the result, col2, col3 and col4 can have a maximum of 4 characters, but col1 can have more than 4 characters, up to 100.
If any column has more characters, its last 4 characters are retained in the same column and the extra characters are appended to the previous column's value, separated by a comma; the same rule is then applied to the concatenated value as well.
I've written the following T-SQL statement. It works fine for the last two columns, but I want to use the newly calculated value of col3 to strip out its extra characters after adding some from col4.
SELECT
CASE
WHEN X.Col4Length > 4
THEN concat(X.col3, ',', substring(x.col4, 0, X.Col4Length - 3))
ELSE X.col3
END AS col3,
CASE
WHEN X.Col4Length > 4
THEN substring(x.col4, X.Col4Length - 3, x.Col4Length)
ELSE X.col4
END AS col4
FROM
(SELECT
Col1, Col2, Col3, Col4,
Len(Col1) AS Col1Length,
Len(Col2) AS Col2Length,
Len(Col3) AS Col3Length,
Len(Col4) AS Col4Length
FROM
mytable) X

My try with a simple sub-query
with t1 as (
select 'aK' col1, 'Mbcd' col2, 'ABc' col3, 'defgh' col4
)
SELECT LEFT(col, LEN(col) - 12) col1,
RIGHT(LEFT(col, LEN(col) - 8), 4) col2,
RIGHT(LEFT(col, LEN(col) - 4), 4) col3,
RIGHT(col, 4) AS col4
FROM
(
SELECT col1+','+col2+','+col3+','+col4 AS col
FROM t1
) t;

You want to reuse calculated values.
There are two set-based / inline / ad hoc approaches (and many more ugly procedural ones):
CTEs, to do this for the whole set in advance
CROSS APPLY, to do the same at row level
Try it like this (CTE approach):
DECLARE @tbl TABLE(col1 VARCHAR(100),col2 VARCHAR(100),col3 VARCHAR(100),col4 VARCHAR(100));
INSERT INTO @tbl VALUES
('aK','Mbcd','ABc','defgh')
,('123456','abc','3456','123456789');
WITH ResolveCol4 AS
(
SELECT *
,RIGHT(col4,4) AS Col4_resolved
,col3 + ',' + CASE WHEN LEN(col4)>4 THEN SUBSTRING(col4,1,LEN(col4)-4) ELSE '' END AS col3_New
FROM @tbl
)
)
,ResolveCol3 AS
(
SELECT *
,RIGHT(col3_New,4) AS Col3_resolved
,col2 + ',' + CASE WHEN LEN(col3_New)>4 THEN SUBSTRING(col3_New,1,LEN(col3_New)-4) ELSE '' END AS col2_New
FROM ResolveCol4
)
,ResolveCol2 AS
(
SELECT *
,RIGHT(col2_New,4) AS Col2_resolved
,col1 + ',' + CASE WHEN LEN(col2_New)>4 THEN SUBSTRING(col2_New,1,LEN(col2_New)-4) ELSE '' END AS col1_New
FROM ResolveCol3
)
SELECT col1_new,Col2_resolved,Col3_resolved,Col4_resolved
FROM ResolveCol2
The result
aK,Mb cd,A Bc,d efgh
123456,abc,34 56,1 2345 6789
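The chain of CTEs is easier to follow as a right-to-left loop in which each column keeps its last four characters and hands the rest to its left neighbour. The following Python sketch mirrors that logic for illustration only; the function name cascade is invented, and like the T-SQL above it always inserts the comma, even when nothing overflows.

```python
def cascade(cols, width=4):
    """Keep the last `width` characters of each column, moving right to
    left, and push the overflow onto the previous column after a comma."""
    cols = list(cols)
    for i in range(len(cols) - 1, 0, -1):
        value = cols[i]
        overflow = value[:-width] if len(value) > width else ""
        cols[i] = value[-width:]                 # RIGHT(value, 4)
        cols[i - 1] = cols[i - 1] + "," + overflow
    return cols


rows = [
    ("aK", "Mbcd", "ABc", "defgh"),
    ("123456", "abc", "3456", "123456789"),
]
resolved = [cascade(row) for row in rows]
for row in resolved:
    print(row)
```

Each step reads the value produced by the previous step, which is the same "reuse the calculated value" dependency that the stacked CTEs express.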

Is there any way to print the Query result horizontally in Oracle DB

Select * from TABLENAME WHERE "CLAUSE"
It will print the result in a single row.
Col 1 Col 2 ...... Col N
Val 1 Val 2 ...... Val N
I need
Col 1 Val 1
Col 2 Val 2
.
.
.
Col N Val N

A little time consuming to do on a regular basis, but:
select COL_NAME, COL_DATA
from (SELECT * FROM table_name
WHERE clause)
unpivot ( COL_DATA FOR COL_NAME IN ( COL1 as 'COL1'
,COL2 as 'COL2'
,COL3 as 'COL3'
,COL4 AS 'COL4')
)
Bear in mind that all of the unpivoted values must share one data type, because Oracle won't mix data types in a single column. So if COL1-3 are numbers but COL4 is a varchar, convert them in the inner query first:
select COL_NAME, COL_DATA
from (SELECT TO_CHAR(COL1) AS COL1
,TO_CHAR(COL2) AS COL2
,TO_CHAR(COL3) AS COL3
,COL4
FROM table_name
WHERE clause)
unpivot ( COL_DATA FOR COL_NAME IN ( COL1 as 'COL1'
,COL2 as 'COL2'
,COL3 as 'COL3'
,COL4 AS 'COL4')
)

select 'col 1', col1 from TABLENAME WHERE "CLAUSE"
UNION ALL
select 'col 2', col2 from TABLENAME WHERE "CLAUSE"
UNION ALL
...
select 'col n', coln from TABLENAME WHERE "CLAUSE"
order by 1
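If the result is consumed by a client program rather than read in a SQL console, the transposition can also be done on the client side. The sketch below uses Python's sqlite3 module purely as a stand-in for an Oracle connection; the table and column names are invented, and the point is only that a DB-API cursor exposes the column names through cursor.description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (col1 INTEGER, col2 INTEGER, col3 TEXT)")
conn.execute("INSERT INTO tablename VALUES (1, 2, 'three')")

cur = conn.execute("SELECT * FROM tablename WHERE col1 = 1")
names = [column[0] for column in cur.description]  # column names, in order
row = cur.fetchone()

# Pair every column name with its value and print one pair per line.
pairs = list(zip(names, row))
for name, value in pairs:
    print(f"{name}\t{value}")
```

This avoids listing every column by hand, at the cost of doing the work outside the query.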

Grouping between consecutive rows in a table without cursor

I am trying to group consecutive pairs of rows without using a cursor. Can someone help me with this?
Col1(int) Col2(int)
--------- ---------
1 20
2 30
3 40
I want output like this
Col1 Col2
---- ----
1-2 50
2-3 70

Are you sure you aren't missing any rows...
Select cast(a.col1 as varchar(10)) + '-' + cast(b.col1 as varchar(10)) as col1,
a.col2 + b.Col2 as Col2
From mytable a
Inner Join mytable b on b.col1 = (a.col1 + 1)
If some rows might be missing, the query needs to be more complicated.
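A quick way to try the self-join on the sample data is an in-memory SQLite database driven from Python. This is a sketch under assumptions: SQLite stands in for the real server, and its || operator replaces the + string concatenation used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 20), (2, 30), (3, 40)])

# Join every row to the row whose col1 is exactly one higher.
rows = conn.execute(
    """
    SELECT a.col1 || '-' || b.col1 AS col1,
           a.col2 + b.col2 AS col2
    FROM mytable AS a
    JOIN mytable AS b ON b.col1 = a.col1 + 1
    ORDER BY a.col1
    """
).fetchall()
print(rows)  # [('1-2', 50), ('2-3', 70)]
```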

That's a tricky one if you don't want to repeat the rows (1-2, 2-3) and you can expect there to be some missing ids (as would be normal if you have an identity field).
Try this:
CREATE TABLE #temp (id INT, value INT)
INSERT INTO #temp
SELECT 1,2
UNION ALL
SELECT 2,8
UNION ALL
SELECT 3,8
UNION ALL
SELECT 5,19
SELECT id, value, ROW_NUMBER() OVER (ORDER BY id) AS rownumber
INTO #temp2
FROM #temp
SELECT * FROM #temp2
SELECT CAST(b.id AS VARCHAR(10)) + '-' + CAST(a.id AS VARCHAR(10)) AS col1,
a.value + b.value as Col2
FROM #temp2 a
JOIN #temp2 b
ON a.rownumber = b.rownumber+1
WHERE ABS(a.rownumber)%2 = 0

Assuming that col1 is an integer:
SELECT CAST(a.col1 AS VARCHAR(10)) + '-' + CAST(b.col1 AS VARCHAR(10)), COALESCE(a.col2,0) + COALESCE(b.col2,0)
FROM mytable a
JOIN mytable b ON b.col1 = a.col1 + 1

You can also test the following query.
I only have Oracle on my machine, so I can only run and vouch for Oracle queries.
Please check whether this also works on SQL Server and let me know.
select * from
(Select lag (col1) over (order by col1) || '-' || col1 as col1,
col2 + lag (col2) over (order by col1) as Col2
From mytable
)
where col2 is not null;
In Oracle, the lag() function fetches the value from the previous row, and for the first row it returns NULL. Adding a value to NULL gives NULL, so the first row is filtered out by the outer WHERE clause, which leaves the desired output.
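The effect of lag() can be pictured as pairing each row with the one before it in sorted order, which also copes with gaps in col1. Below is a small Python illustration of that pairing; the function name pair_consecutive is made up, and this is an analogy rather than Oracle code.

```python
def pair_consecutive(rows):
    """Pair every row with its predecessor in col1 order, like lag()."""
    ordered = sorted(rows)
    return [
        (f"{prev_id}-{cur_id}", prev_val + cur_val)
        for (prev_id, prev_val), (cur_id, cur_val) in zip(ordered, ordered[1:])
    ]


print(pair_consecutive([(1, 20), (2, 30), (3, 40)]))
print(pair_consecutive([(1, 2), (2, 8), (3, 8), (5, 19)]))  # id 4 is missing
```

The first row has no predecessor and so produces no pair, which matches the NULL row that the outer WHERE clause removes.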
