SQL - Turn relationship IDs into a delimited list

Say I have a table with the following data:
You can see columns a, b, & c have a lot of redundancies. I would like those redundancies removed while preserving the site_id info. If I exclude the site_id column from the query, I can get part of the way there by doing SELECT DISTINCT a, b, c from my_table.
What would be ideal is a SQL query that could turn the site IDs relevant to a permutation of a/b/c into a delimited list, and output something like the following:
Is it possible to do that with a SQL query? Or will I have to export everything and use a different tool to remove the redundancies?
The data is in a SQL Server DB, though I'd also be curious how to do the same thing with Postgres, if the process is different.

For SQL Server, you can use the FOR XML trick as found in the accepted answer in this post.
For your scenario it would look something like this:
SELECT a, b, c,
       SiteIds = STUFF((SELECT ', ' + CAST(t2.site_id AS varchar(20))
                        FROM my_table t2
                        WHERE t2.a = t1.a AND t2.b = t1.b AND t2.c = t1.c
                        FOR XML PATH('')), 1, 2, '')
FROM my_table t1
GROUP BY a, b, c;

For Postgres:
select a, b, c, string_agg(site_id::text, ',')
from my_table
group by a, b, c;
I assume site_id is a number. Since string_agg() only accepts character values, site_id needs to be cast to a character string for the aggregation, which is what site_id::text does. Alternatively, you can use the standard cast() syntax: string_agg(cast(site_id as varchar), ',')

This is generally known as string aggregation. Many RDBMSs have the ability built in, and many others don't.
In Postgres you just use the STRING_AGG(<field>, <delimiter>) function and make sure to add a GROUP BY for your non-aggregated fields. Simple stuff.
In SQL Server, it's not so pretty (at least before SQL Server 2017, which added STRING_AGG), but folks have written functions and whatnot that will allow you to do this (like in this Q/A).
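If you want to try string aggregation without a server, here is a minimal sketch that uses Python's built-in sqlite3 module. SQLite's counterpart to STRING_AGG is GROUP_CONCAT. The rows are made-up sample data. SQLite does not guarantee the order of values inside each aggregated list, so treat that ordering as arbitrary.

```python
import sqlite3

# Hypothetical sample data: several site_ids share the same (a, b, c) combination.
rows = [
    ("x", "y", "z", 1),
    ("x", "y", "z", 2),
    ("x", "y", "w", 3),
    ("p", "q", "r", 4),
    ("p", "q", "r", 5),
]

def site_lists(rows):
    """Collapse rows to one per (a, b, c), with the site_ids joined by commas."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (a TEXT, b TEXT, c TEXT, site_id INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    # GROUP_CONCAT is SQLite's string aggregate; the CAST mirrors the
    # site_id::text cast that Postgres' string_agg() needs.
    result = conn.execute(
        """
        SELECT a, b, c, GROUP_CONCAT(CAST(site_id AS TEXT), ',') AS site_ids
        FROM my_table
        GROUP BY a, b, c
        ORDER BY a, b, c
        """
    ).fetchall()
    conn.close()
    return result

for row in site_lists(rows):
    print(row)
```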


How to select a column whose name is a value in another column in POSTGRESQL?

I know this isn't valid SQL, but I'd like to do something like:
SELECT items.{SELECT items.preferred_column}
To elaborate: to achieve what I'm after, I could write a long CASE expression:
CASE WHEN items.preferred_column = 'column_a' THEN items.column_a
     WHEN items.preferred_column = 'column_b' THEN items.column_b
     WHEN items.preferred_column = 'column_c' THEN items.column_c
     -- ... and so on ...
END
But that seems wrong. I would prefer to write a query that looks at the value of items.preferred_column and loads that column.
Is this possible?
My use case involves an Active Record (the ORM for Rails) query, which limits me. I'm not able to use "INTO" for example.
Doing this without creating a SQL function would be preferred, though if it's not possible without creating a SQL function, that would be good to know.
Thanks in advance for lending your expertise!
You can try transforming the table rows with row_to_json() and then expanding them with jsonb_each(). You can then join the resulting "key" field on preferred_column:
WITH CTE AS (
  SELECT Z.*,
         row_to_json(Z.*)::jsonb AS rcr,
         row_number() OVER (PARTITION BY NULL ORDER BY <whatever comparator clause>) AS rn
  FROM items Z)
SELECT b.value, a.*
FROM CTE a, jsonb_each(a.rcr) b, CTE c
WHERE c.rn = a.rn AND b.key = c.preferred_column
Note that this essentially operates as a quasi-pivot. You'll need to maintain an index (the row_number() call) so you can self-join the table when you extract the matching key-value pairs from the set returned by jsonb_each(). Casting to jsonb also helps, because the binary form stores each object's keys in a normalized, sorted order rather than in column order.
If you need the resulting value as a text string instead of a JSON primitive, you can use
b.value #>> '{}'
rather than switching to jsonb_each_text(). This keeps the jsonb values intact, which matters if some of your columns are themselves json.
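The same match-on-key idea can be tried in SQLite through Python's sqlite3 module. SQLite has no row_to_json(), so in this sketch a UNION ALL "unpivot" stands in for jsonb_each(). Unlike the JSON version, the candidate columns have to be listed explicitly. The items table and its rows are hypothetical.

```python
import sqlite3

def preferred_values(conn):
    """Return (id, value), taking value from the column named in preferred_column."""
    return conn.execute(
        """
        WITH kv AS (
            -- Unpivot: one (id, col_name, col_value) row per candidate column.
            SELECT id, 'column_a' AS col_name, column_a AS col_value FROM items
            UNION ALL
            SELECT id, 'column_b', column_b FROM items
            UNION ALL
            SELECT id, 'column_c', column_c FROM items
        )
        SELECT i.id, kv.col_value
        FROM items AS i
        JOIN kv ON kv.id = i.id AND kv.col_name = i.preferred_column
        ORDER BY i.id
        """
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, preferred_column TEXT,"
    " column_a TEXT, column_b TEXT, column_c TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
    [(1, "column_a", "a1", "b1", "c1"),
     (2, "column_c", "a2", "b2", "c2"),
     (3, "column_b", "a3", "b3", "c3")],
)
print(preferred_values(conn))
```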

'In' clause in SQL server with multiple columns

I have a component that retrieves data from database based on the keys provided.
However, I want my Java application to get all the data for all keys in a single database hit to speed things up.
I can use 'in' clause when I have only one key.
When working with more than one key, I can use the query below in Oracle:
SELECT * FROM <table_name>
where (value_type,CODE1) IN (('I','COMM'),('I','CORE'));
which is equivalent to combining the results of
SELECT * FROM <table_name>
where value_type = 'I' and CODE1 = 'COMM'
and
SELECT * FROM <table_name>
where value_type = 'I' and CODE1 = 'CORE'
However, using the IN clause like this gives the following error in SQL Server:
ERROR: An expression of non-boolean type specified in a context where a condition is expected, near ','.
Please let me know if there is any way to achieve the same in SQL Server.
This syntax doesn't exist in SQL Server. Use a combination of AND and OR:
SELECT *
FROM <table_name>
WHERE (value_type = 'I' AND CODE1 = 'COMM')
   OR (value_type = 'I' AND CODE1 = 'CORE')
(In this case, you could make it shorter, because value_type is compared to the same value in both combinations. I just wanted to show the pattern that works like IN in oracle with multiple fields.)
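Here is a quick check that the AND/OR form selects the same rows as Oracle's multi-column IN. The sketch uses Python's sqlite3 module with hypothetical sample rows. SQLite accepts row values like Oracle does, but it requires a subquery on the right-hand side, so the pairs are written as a UNION ALL of SELECTs. As the question shows, SQL Server itself rejects the tuple form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value_type TEXT, CODE1 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "I", "COMM"), (2, "I", "CORE"), (3, "I", "OTHR"), (4, "X", "COMM")],
)

# Oracle-style multi-column IN. SQLite accepts row values too, but only with a
# subquery on the right-hand side, hence the UNION ALL of SELECTs.
tuple_in = conn.execute(
    "SELECT id FROM t WHERE (value_type, CODE1) IN "
    "(SELECT 'I', 'COMM' UNION ALL SELECT 'I', 'CORE') ORDER BY id"
).fetchall()

# The AND/OR rewrite, which also works in SQL Server.
and_or = conn.execute(
    "SELECT id FROM t WHERE (value_type = 'I' AND CODE1 = 'COMM') "
    "OR (value_type = 'I' AND CODE1 = 'CORE') ORDER BY id"
).fetchall()

print(tuple_in, and_or)
```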
When using IN with a subquery, you need to rephrase it like this:
Oracle:
SELECT *
FROM foo
WHERE (value_type, CODE1) IN (
  SELECT type, code
  FROM bar
  WHERE <some conditions>)
SQL Server:
SELECT *
FROM foo
WHERE EXISTS (
  SELECT *
  FROM bar
  WHERE <some conditions>
    AND foo.value_type = bar.type
    AND foo.CODE1 = bar.code)
There are other ways to do it, depending on the case, such as inner joins.
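This sketch checks that the tuple-IN subquery and the correlated EXISTS rewrite return the same rows. It runs in SQLite through Python's sqlite3 module. The foo and bar contents are hypothetical, and "active = 1" is only a stand-in for <some conditions>.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, value_type TEXT, CODE1 TEXT)")
conn.execute("CREATE TABLE bar (type TEXT, code TEXT, active INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)",
                 [(1, "I", "COMM"), (2, "I", "CORE"), (3, "X", "COMM")])
# "active = 1" plays the role of the hypothetical <some conditions>.
conn.executemany("INSERT INTO bar VALUES (?, ?, ?)",
                 [("I", "COMM", 1), ("I", "CORE", 0), ("X", "COMM", 1)])

# Tuple IN with a subquery (Oracle style; SQLite accepts it as well).
tuple_in = conn.execute("""
    SELECT id FROM foo
    WHERE (value_type, CODE1) IN (SELECT type, code FROM bar WHERE active = 1)
    ORDER BY id""").fetchall()

# Correlated EXISTS, the form that also works in SQL Server.
exists = conn.execute("""
    SELECT id FROM foo
    WHERE EXISTS (SELECT 1 FROM bar
                  WHERE active = 1
                    AND foo.value_type = bar.type
                    AND foo.CODE1 = bar.code)
    ORDER BY id""").fetchall()

print(tuple_in, exists)
```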
If you have fewer than 1000 tuples to check against and you're using SQL Server 2008+, you can use a table value constructor and join against it. You can only specify up to 1000 rows in a table value constructor, hence the 1000-tuple limitation. Here's how it would look in your situation:
SELECT <table_name>.* FROM <table_name>
JOIN (VALUES
  ('I', 'COMM'),
  ('I', 'CORE')
) AS MyTable(a, b) ON a = value_type AND b = CODE1;
This is only a good idea if your list of values is unique; otherwise you'll get duplicate rows. I'm not sure how the performance of this compares to using many ANDs and ORs, but in my opinion the SQL query is at least much cleaner to look at.
You can also write this to use EXISTS instead of JOIN. That may have different performance characteristics, and it avoids producing duplicate results if your values aren't unique. It may be worth trying both EXISTS and JOIN on your use case to see which is a better fit. Here's how EXISTS would look:
SELECT * FROM <table_name>
WHERE EXISTS (
  SELECT *
  FROM (VALUES
    ('I', 'COMM'),
    ('I', 'CORE')
  ) AS MyTable(a, b)
  WHERE a = value_type AND b = CODE1
);
In conclusion, I think the best choice is to create a temporary table and query against that. Sometimes that's not possible (e.g. your user lacks permission to create temporary tables), and then a table value constructor may be your best choice. Use EXISTS or JOIN, depending on which gives you better performance on your database.
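This sketch shows the duplicate-row difference between JOIN and EXISTS. It runs in SQLite through Python's sqlite3 module. Here a CTE with a column list plays the role of the `AS MyTable(a, b)` derived table. The pair list deliberately repeats ('I', 'COMM'), and the table rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value_type TEXT, CODE1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, "I", "COMM"), (2, "I", "CORE"), (3, "I", "OTHR")])

# The pair list repeats ('I', 'COMM') on purpose to expose the duplicate issue.
wanted = "WITH wanted(a, b) AS (VALUES ('I', 'COMM'), ('I', 'CORE'), ('I', 'COMM'))"

joined = conn.execute(wanted + """
    SELECT t.id FROM t
    JOIN wanted ON a = value_type AND b = CODE1
    ORDER BY t.id""").fetchall()

exists = conn.execute(wanted + """
    SELECT t.id FROM t
    WHERE EXISTS (SELECT 1 FROM wanted WHERE a = value_type AND b = CODE1)
    ORDER BY t.id""").fetchall()

print(joined)  # id 1 appears twice because its pair is listed twice
print(exists)  # each matching id appears once
```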
Normally you cannot do it, but you can use the following technique:
SELECT * FROM <table_name>
where (value_type+'/'+CODE1) IN (('I'+'/'+'COMM'),('I'+'/'+'CORE'));
A better solution is to avoid hardcoding your values and put them in a temporary or persistent table:
CREATE TABLE #t (ValueType VARCHAR(16), Code VARCHAR(16))
INSERT INTO #t (ValueType, Code) VALUES ('I', 'COMM'), ('I', 'CORE')
SELECT DT.*
FROM <table_name> DT
JOIN #t T ON T.ValueType = DT.value_type AND T.Code = DT.CODE1
This way, you avoid storing data in your code (persistent table version), and the filters are easy to modify without changing the code.
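The temp-table approach fits the "all keys in one round trip" goal: the application bulk-inserts its keys, then runs a single join. This is a sketch in Python's sqlite3, standing in for the Java application and SQL Server. The function name fetch_for_keys and the sample rows are hypothetical.

```python
import sqlite3

def fetch_for_keys(conn, keys):
    """Load (value_type, CODE1) keys into a temp table, then join against it."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (ValueType TEXT, Code TEXT)")
    conn.execute("DELETE FROM wanted")  # reuse the temp table between calls
    conn.executemany("INSERT INTO wanted VALUES (?, ?)", keys)
    return conn.execute("""
        SELECT DT.id FROM t AS DT
        JOIN wanted AS T ON T.ValueType = DT.value_type AND T.Code = DT.CODE1
        ORDER BY DT.id""").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value_type TEXT, CODE1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, "I", "COMM"), (2, "I", "CORE"), (3, "X", "COMM")])
print(fetch_for_keys(conn, [("I", "COMM"), ("X", "COMM")]))
```

Binding the keys as parameters with executemany keeps them out of the SQL text. The query itself stays the same no matter how many keys are passed.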
I think you can try this, combining AND and OR at the same time:
SELECT * FROM <table_name>
WHERE value_type = 'I'
  AND (CODE1 = 'COMM' OR CODE1 = 'CORE')
What you can do is concatenate the columns into a string and pass your values, also combined as strings:
where (cast(column1 as text) ||','|| cast(column2 as text)) in (?1)
The other way is to do multiple ands and ors.
I had a similar problem in MS SQL, though a little different. Maybe it will help somebody in the future. In my case, I found this solution (not full code, just an example):
SELECT Table1.Campaign
FROM [CRM].[dbo].[Coupons] AS Table1
INNER JOIN [CRM].[dbo].[Coupons] AS Table2 ON Table1.Campaign = Table2.Campaign AND Table1.Coupon = Table2.Coupon
WHERE Table1.Coupon IN ('0000000001', '0000000002') AND Table2.Campaign IN ('XXX000000001', 'XYX000000001')
Of course, I have indexes on Coupon and Campaign in the table for fast searching.
Compute it in MS SQL:
SELECT * FROM <table_name>
where value_type + '|' + CODE1 IN ('I|COMM', 'I|CORE');
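The concatenation answers put a separator ('/' or '|') between the columns for a reason. Without one, different column pairs can produce the same string. This sketch demonstrates the collision in SQLite through Python's sqlite3 module. SQLite uses `||` where SQL Server uses `+`, and the third row is a hypothetical value chosen to collide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value_type TEXT, CODE1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "I", "COMM"),
    (2, "I", "CORE"),
    (3, "IC", "OMM"),  # hypothetical row that collides only without a separator
])

def ids(where):
    # Fixed demo strings only; real input should go through bound parameters.
    return [r[0] for r in conn.execute(f"SELECT id FROM t WHERE {where} ORDER BY id")]

# No separator: 'IC' + 'OMM' is indistinguishable from 'I' + 'COMM'.
no_delim = ids("value_type || CODE1 IN ('ICOMM', 'ICORE')")
# A separator that never occurs in the data keeps the keys unambiguous.
with_delim = ids("value_type || '|' || CODE1 IN ('I|COMM', 'I|CORE')")

print(no_delim)    # [1, 2, 3]
print(with_delim)  # [1, 2]
```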

Parse values from one column into multiple columns in another table

Microsoft SQL Server 2008
I have two tables. One has a column whose data is delimited by underscores (_), such that:
I need to parse the data in this column out and insert it into another table that has individual columns for each value. Let's say Table A contains one column like the example above but Table B contains three columns, YesOrNo, Age, usState.
What's the easiest way to do this? I've tried doing something like:
SET YesOrNo = SUBSTRING (TableB.Column1, 1, 1)
but SUBSTRING only takes an expression. I really just need some guidance here; I've been banging my head against the wall trying to figure this out, since I'm not much of a SQL guru. I can figure out the syntax no problem, but maybe I'm not aware of some methods that exist. Thanks
A generic solution, using CHARINDEX to locate each '_' instead of hardcoding positions:
DECLARE @s varchar(10) = 'Y_21_CA'
SELECT LEFT(@s, CHARINDEX('_', @s) - 1) AS YN,
       SUBSTRING(@s, CHARINDEX('_', @s) + 1,
                 CHARINDEX('_', @s, CHARINDEX('_', @s) + 1) - CHARINDEX('_', @s) - 1) AS Age,
       RIGHT(@s, CHARINDEX('_', REVERSE(@s)) - 1) AS State
Result:
Y 21 CA
In case you expect to use this logic often in other queries, you could make the SELECT statement an inline TVF. Then you would be able to use it with your update like this:
UPDATE b
SET YesOrNo = x.YN,
    Age = x.Age,
    State = x.State
FROM TableB b
INNER JOIN TableA a ON b.ID = a.ID
CROSS APPLY ThreeColumnSplit(a.S) x;
Here's a "live" demo at SQL Fiddle. (Please never mind its using the SQL Server 2012 engine. That's only because the Fiddle's 2008 instance appears to be down at the moment and can't be used. There's nothing SQL Server 2012-specific in the script.)
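The same position arithmetic can be checked in SQLite through Python's sqlite3 module. SQLite's instr() has no start-position argument, so this sketch finds the second '_' by searching the remainder after the first one. It then takes the state from that position instead of using the REVERSE trick. The function name three_column_split is hypothetical and only mirrors ThreeColumnSplit.

```python
import sqlite3

SPLIT_SQL = """
WITH src(s) AS (SELECT ?),
     p1 AS (SELECT s, instr(s, '_') AS i1 FROM src),
     -- position of the second '_' = i1 + position of '_' in the remainder
     p2 AS (SELECT s, i1, i1 + instr(substr(s, i1 + 1), '_') AS i2 FROM p1)
SELECT substr(s, 1, i1 - 1)           AS YN,
       substr(s, i1 + 1, i2 - i1 - 1) AS Age,
       substr(s, i2 + 1)              AS State
FROM p2
"""

def three_column_split(conn, s):
    """Split an 'X_Y_Z' string into its three parts without fixed positions."""
    return conn.execute(SPLIT_SQL, (s,)).fetchone()

conn = sqlite3.connect(":memory:")
print(three_column_split(conn, "Y_21_CA"))
print(three_column_split(conn, "N_100_NY"))
```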
You should be able to use the following. I think you want an INSERT instead of an UPDATE.
The SELECT statement to get the data is:
select substring(yourCol, 1, 1) YesOrNo,
substring(yourcol, 3, len(yourcol)-5) Age,
right(yourCol, 2) usState
from tableA;
Then the INSERT statement is:
insert into tableB (YesOrNo, Age, usState)
select substring(yourCol, 1, 1) YesOrNo,
substring(yourcol, 3, len(yourcol)-5) Age,
right(yourCol, 2) usState
from tableA;
Note: This assumes that the YesOrNo column will always only have one character and that the usState will always have 2 characters.
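Here is a sketch of the fixed-position INSERT ... SELECT in SQLite, run through Python's sqlite3 module. SQLite spells SUBSTRING/LEN/RIGHT as substr/length/substr(x, -2). The table and column names follow the answer, and the rows are hypothetical. As the note says, this only works while YesOrNo is one character and usState is two, though the middle field may vary in length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (yourCol TEXT)")
conn.execute("CREATE TABLE tableB (YesOrNo TEXT, Age TEXT, usState TEXT)")
conn.executemany("INSERT INTO tableA VALUES (?)", [("Y_21_CA",), ("N_5_NY",)])

# Fixed positions: 1 char, then everything between the first 2 and last 3 chars,
# then the last 2 chars (a negative start in substr counts from the right).
conn.execute("""
    INSERT INTO tableB (YesOrNo, Age, usState)
    SELECT substr(yourCol, 1, 1),
           substr(yourCol, 3, length(yourCol) - 5),
           substr(yourCol, -2)
    FROM tableA""")

rows = conn.execute("SELECT * FROM tableB ORDER BY YesOrNo DESC").fetchall()
print(rows)
```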

ORA-01489: result of string concatenation is too long

The SQL query below is meant to extract data from the database as pipe-delimited text and spool it to a text file on Unix:
select a||'|'||b||'|'||c||'|'||d from table
It sometimes gives the error ORA-01489: result of string concatenation is too long.
This seems to occur when the result exceeds the 4000-character limit.
I tried using to_clob, but this works only with "union all".
Is there a way I can get around this problem?
Do the union before the concatenation:
select to_clob(a) ||'|'|| to_clob(b) ||'|'|| to_clob(c) ||'|'|| to_clob(d) from
(
  select a, b, c, d from table1
  union all
  select a, b, c, d from table2
)
As you have found out, using to_clob has its limitations. And to be honest, I think you are using a fine tool (an RDBMS) as a blunt paleolithic weapon.
The easiest way to get around the problem is to do the concatenation in your application code rather than in SQL. Oracle has a maximum length for concatenation results in SQL (4K), and there are limits on where you can use to_clob.
Given those two hard limits, the most sensible thing is to do what I suggested (do the concatenation in code) instead of trying to subvert them or find an almost-magical workaround:
select a, b, c, d from tableA
union all
select a, b, c, d from tableB
Then take the resulting result set (or whatever language-specific construct you use) and concatenate the fields in your application code.
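Here is a minimal sketch of doing the concatenation on the application side. It uses Python's sqlite3 module in place of an Oracle client, with hypothetical tables and an in-memory buffer in place of the spool file. Oracle's `||` treats NULL as an empty string, so the sketch does the same. Because no SQL string is ever built, the 4000-character limit never comes into play.

```python
import io
import sqlite3

def export_pipe_delimited(conn, out):
    """Fetch the columns unconcatenated and join them in application code."""
    cur = conn.execute(
        "SELECT a, b, c, d FROM table1 UNION ALL SELECT a, b, c, d FROM table2 ORDER BY 1"
    )
    for row in cur:
        # Mirror Oracle's || semantics, where NULL concatenates as ''.
        out.write("|".join("" if v is None else str(v) for v in row) + "\n")

conn = sqlite3.connect(":memory:")
for name in ("table1", "table2"):
    conn.execute(f"CREATE TABLE {name} (a TEXT, b TEXT, c TEXT, d TEXT)")
conn.execute("INSERT INTO table1 VALUES ('x', 'y', NULL, 'z')")
conn.execute("INSERT INTO table2 VALUES (?, 'b', 'c', 'd')", ("L" * 5000,))

buf = io.StringIO()
export_pipe_delimited(conn, buf)
lines = buf.getvalue().splitlines()
print(len(lines[0]), lines[1])
```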
This is how I find procedures:
select name, LISTAGG(text, ' ' ON OVERFLOW TRUNCATE) WITHIN GROUP (ORDER BY line)
from all_source
where name in (select distinct name from all_source
               where type = 'PROCEDURE' and name like '%P\_%' escape '\')
  and text like '% OUT %'
group by name;
The interesting point is that the all_source view stores the source of every procedure split across rows (one row per line), which makes processing tricky. For that reason, I first concatenate the rows for each procedure. Second, I set the ON OVERFLOW TRUNCATE option (available since Oracle 12.2) to avoid the string overflow. Of course, this works for me because I only need the procedure declaration.
In this case, I'm looking for procedures whose names contain "P_" and that have OUT parameters.
The same approach works for searching functions: just change type = 'PROCEDURE' to type = 'FUNCTION'.
Hope this helps.

Matching two columns in MySQL

I'm quite new to SQL and have a question about matching names from two columns located within a table:
Let's say I want to use the soundex() function to match two columns. If I use this query:
SELECT * FROM tablename WHERE SOUNDEX(column1)=SOUNDEX(column2);
a row is returned if the two names within that row match. Now I'd also like to get those name matches between column1 and column2 that aren't in the same row. Is there a way to automate a procedure whereby every name from column1 is compared to every name from column2?
Thanks :)
P.S.: If anyone could point me in the direction of an n-gram/bigram matching algorithm that is easy for a noob to implement in MySQL, that would be good as well.
If your table has a key, say id, you can try:
select A.column1, B.column2
from tablename as A, tablename as B
where (A.id != B.id) and (SOUNDEX(A.column1) = SOUNDEX(B.column2))
You can join the table to itself on that relationship as such:
SELECT * FROM tablename t1 JOIN tablename t2
ON SOUNDEX(t1.column1) = SOUNDEX(t2.column2);
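Standard SQLite builds do not include SOUNDEX. This sketch, run through Python's sqlite3 module, registers a hypothetical Python soundex() with create_function() and then runs the same self-join. The function implements the standard four-character American Soundex. MySQL's SOUNDEX() can return longer strings, so this only approximates MySQL's behaviour. The table rows are made up so that the matches fall on different rows.

```python
import sqlite3

# Letter-to-digit table for standard 4-character American Soundex.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}

def soundex(name):
    letters = [ch for ch in (name or "").upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (result + "000")[:4]

conn = sqlite3.connect(":memory:")
conn.create_function("soundex", 1, soundex)
conn.execute("CREATE TABLE tablename (id INTEGER, column1 TEXT, column2 TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", [
    (1, "Robert", "Smith"),
    (2, "Smyth", "Rupert"),
    (3, "Ashcraft", "Jones"),
])

# Self-join: every column1 value is compared with every column2 value, across rows.
pairs = conn.execute("""
    SELECT t1.id, t1.column1, t2.id, t2.column2
    FROM tablename t1 JOIN tablename t2
      ON soundex(t1.column1) = soundex(t2.column2)
    ORDER BY t1.id, t2.id""").fetchall()
print(pairs)
```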