Postgres Query based on Delimited String

We have a column in one of our tables that consists of a delimited string of IDs, like this: 72,73,313,502.
What I need to do in the query is parse that string and, for each ID, join it to another table. So in the case of this string, I'd get 4 rows, each with the ID and the name from the other table.
Can this be done in a query?

One option is regexp_split_to_table() and a lateral join. Assuming that the CSV string is stored in table1(col) and that you want to join each element against table2(id), you would do:
select ... -- whatever columns you want
from table1 t1
cross join lateral regexp_split_to_table(t1.col, ',') x(id)
inner join table2 t2 on t2.id = x.id::int
It should be noted that storing CSV strings in a relational database is bad design, and should almost always be avoided. Each value in the list should be stored as a separate row, using the proper datatype. Recommended reading: Is storing a delimited list in a database column really that bad?
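For completeness, here is a self-contained sketch that can be run as-is; table1(item_id, col) and table2(id, name) are assumed layouts standing in for the real schema.
-- Runnable sketch with made-up data; swap the VALUES lists for the real tables
with table1 (item_id, col) as (
    values (1, '72,73,313,502')
),
table2 (id, name) as (
    values (72, 'foo'), (73, 'bar'), (313, 'baz'), (502, 'qux')
)
select t1.item_id, t2.id, t2.name
from table1 t1
cross join lateral regexp_split_to_table(t1.col, ',') as x(id)
inner join table2 t2 on t2.id = x.id::int;
-- On PostgreSQL 14+, string_to_table(t1.col, ',') does the same without a regex.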

Related

SQL Server query with CTE containing CLR returns empty string after joining to more than 5 tables

Kind of a weird one.
I have a CTE that selects 3 columns: an id, then two varchar fields. The varchar fields use a CLR function to concatenate names with a '/' dividing them.
I run this query on 5 environments, and on 4 of them it returns the names with the '/'.
On the other environment, it's an empty string (all environments/servers have the same copy of the db).
Running the CTE body returns data fine on all environments.
When commenting out sections of the SQL, data is returned fine until I join to more than 5 tables (even if the table has nothing to do with the query). It just seems that no matter what table I join to, after 5 joins my varchar data becomes an empty string. If I filter for a specific id, data is returned fine.
I'm guessing there's some sort of environment config that's different on the one environment compared to the others. Anyone have any idea?
with my_cte as
(
    select my_id,
           dbo.List(names1, '/') names1,
           dbo.List(names2, '/') names2
    from table
    where blah = blah
)
select A.some_field,
       OU.names1,
       OU.names2
from myTable A
inner join my_cte OU on (OU.this_id = A.this_id)
inner join table3
inner join table4
inner join table5
inner join table6 -- once I get here, no matter what table6 is, names1 and names2 become empty strings
where A.blah = xxxx;
Quick fix was to rewrite the CTE into a temp table until the DBAs check the config diffs between the boxes. Thanks for the comments.
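For anyone hitting the same issue, a rough sketch of that temp-table workaround, using the same placeholder names as above (so a pattern rather than working code):
-- Materialize the CTE body into a temp table first, then join against it
-- ('table', 'blah', etc. are the question's placeholders)
select my_id,
       dbo.List(names1, '/') names1,
       dbo.List(names2, '/') names2
into #my_cte
from table
where blah = blah;

select A.some_field,
       OU.names1,
       OU.names2
from myTable A
inner join #my_cte OU on (OU.this_id = A.this_id)
-- ...remaining joins as before...
where A.blah = xxxx;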

Unnesting 3rd level dependency in Google BigQuery

I'm trying to replace the schema in an existing table using BQ. There are certain fields in BQ which have 3-5 levels of schema dependency.
For example, the field comsalesorders.comSalesOrdersInfo.storetransactionid is nested under two fields.
Since I'm using this to replace an existing table, I cannot change the field names in the query.
The query looks similar to this:
SELECT * REPLACE(comsalesorders.comSalesOrdersInfo.storetransactionid AS STRING)
FROM CentralizedOrders_streaming.orderStatusUpdated,
  UNNEST(comsalesorders) AS comsalesorders,
  UNNEST(comsalesorders.comSalesOrdersInfo) AS comsalesorders.comSalesOrdersInfo
BQ allows unnesting the first schema field but presents a problem for the 2nd nesting.
What changes do I need to make to this query to use UNNEST() for such dependent schemas?
Since you haven't provided the schema, I will try to give a generalized answer. Note the difference between the two queries below.
-- Provide an alias for each unnest (as if each is a separate table)
select c.stuff
from table
left join unnest(table.first_level_nested) a
left join unnest(a.second_level_nested) b
left join unnest(b.third_level_nested) c
-- b and c won't work here because you are 'double unnesting'
select c.stuff
from table
left join unnest(table.first_level_nested) a
left join unnest(first_level_nested.second_level_nested) b
left join unnest(first_level_nested.second_level_nested.third_level_nested) c
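Guessing at the schema from the field names in the question (so treat this as a sketch rather than a drop-in query), the aliasing pattern applied to your fields would look something like:
-- Each repeated level gets its own alias; the leaf field is cast once it is reachable
select cast(info.storetransactionid as string) as storetransactionid
from CentralizedOrders_streaming.orderStatusUpdated t
left join unnest(t.comsalesorders) as so
left join unnest(so.comSalesOrdersInfo) as info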
I'm not sure I understand your question, but as far as I can guess, you want to change one column's type to another type, such as STRING.
The UNNEST function is only used with columns that are array types, for example:
"comsalesorders": [{"comSalesOrdersInfo": {}}, {"comSalesOrdersInfo": {}}, {"comSalesOrdersInfo": {}}]
But not with this kind of column:
"comSalesOrdersInfo": {"storeTransactionID": "X1056-943462", "ItemsWarrenty": 0, "currencyCountry": "USD"}
Therefore, if I didn't misunderstand your question, I would write a query like this:
SELECT *, CAST(A.comSalesOrdersInfo.storeTransactionID as STRING)
FROM `TABLE`, UNNEST(comsalesorders) as A

BigQuery how to automatically handle "duplicate column names" on left join

I am working with a dataset of tables that (a) often requires joining tables together, but also (b) frequently has duplicate column names. Any time I write a query along the lines of:
SELECT
t1.*, t2.*
FROM t1
LEFT JOIN t2 ON t1.this_id = t2.matching_id
...I get the error Duplicate column names in the result are not supported. Found duplicate(s): this_col, that_col, another_col, more_cols, dupe_col, get_the_idea_col
I understand that with BigQuery it is better to avoid using * when selecting tables; however, my data tables aren't too big, my BigQuery budget is high, and doing these joins with all columns helps significantly with data exploration.
Is there any way BigQuery can automatically handle / rename columns in these situations (e.g. prefix the column with the table name), as opposed to not allowing the query altogether?
Thanks!
The simplest way is to select records rather than columns:
SELECT t1, t2
FROM t1
LEFT JOIN t2
  ON t1.this_id = t2.matching_id;
This is pretty much what I do for ad hoc queries.
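If I remember correctly, the t1 and t2 result columns come back as STRUCT values, so the previously colliding names are namespaced and you can still pull individual fields out later; a quick sketch using the column names from your error message:
-- Sketch: duplicate names stop colliding once they live inside separate structs
select r.t1.this_id,
       r.t1.this_col,
       r.t2.this_col
from (
  select t1, t2
  from t1
  left join t2 on t1.this_id = t2.matching_id
) r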
If you want the results as columns and not records (they don't look much different in the results), you can use EXCEPT:
SELECT t1.* EXCEPT (duplicate_column_name),
       t2.* EXCEPT (duplicate_column_name),
       t1.duplicate_column_name AS t1_duplicate_column_name,
       t2.duplicate_column_name AS t2_duplicate_column_name
FROM t1
LEFT JOIN t2
  ON t1.this_id = t2.matching_id;
Is there any way BigQuery can automatically handle / rename columns in these situations (e.g. prefix the column with the table name), as opposed to not allowing the query altogether?
This is possible with BigQuery Legacy SQL, which can be handy for data exploration unless you are dealing with data types or functions/features specific to standard SQL.
So the query below
#legacySQL
SELECT t1.*, t2.*
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t1.this_id = t2.matching_id
will produce output where all column names are prefixed with the respective alias, like t1_this_id and t2_matching_id.

How to join two columns using 'on' statement if values in each column are not exactly the same?

I want to join two columns from two different tables where the values in column A are not exactly the same as in column B. I mean, the values in column A (which is of type text) are part of the values in column B (also text).
I can't find any SQL operation that fits what I need.
For example: this is a value from column A:
'bad-things-gone'
And this is the corresponding value from column B:
'/article/bad-things-gone'
I am using the inner join technique.
select
    articles.title, counted_views.top_counts
from
    articles
inner join
    counted_views on articles.column_A (operation) counted_views.column_B;
If the prefix is always /article/ you could just concat() that.
SELECT articles.title,
       counted_views.top_counts
FROM articles
INNER JOIN counted_views
    ON counted_views.column_b = concat('/article/', articles.column_a);
If the prefix is variable you could use LIKE. It compares strings by simple patterns.
SELECT articles.title,
       counted_views.top_counts
FROM articles
INNER JOIN counted_views
    ON counted_views.column_b LIKE concat('%', articles.column_a);
% is a wildcard for any sequence of characters (including none).
If there's also a suffix you can append another % at the end.
There are many ways to make such a weak join; they differ mainly in performance and by database vendor.
Some common approaches, and the resulting join condition:
Normalize the strings, e.g. by removing all non-alpha characters, and compare only that.
ON regexp_replace(upper(column_b), '[^A-Z]', '') = regexp_replace(upper(column_a), '[^A-Z]', '')
Use database functions which return the distance between strings (see https://en.wikipedia.org/wiki/Levenshtein_distance).
ON EDIT_DISTANCE(column_b, column_a) < 6
Use database functions which simply check whether string a is contained in string b.
ON contains(column_b, column_a)
The functions above are written in Oracle's dialect, but similar functions exist in all major databases.
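For what it's worth, if these are Postgres tables (the articles/counted_views example suggests it), the containment check can also be written with the standard position() function; a minimal sketch, assuming column_a is always a substring of column_b:
-- Sketch assuming Postgres: match when column_a appears anywhere inside column_b
select articles.title,
       counted_views.top_counts
from articles
inner join counted_views
    on position(articles.column_a in counted_views.column_b) > 0;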

Hive SQL: How to join number to a string of delimited numbers

I need to join two tables by an ID, where one ID is stored as a number (i.e. 12345) and the other is stored as a pipe-delimited string (i.e. 12345|12346|12347). Is there a quick way to join the two? Thanks!
** I guess I should say: join if the number ID (12345) is in the string of numbers (12345|12346|12347). In this example they would join, since 12345 is in the pipe-delimited string.
This will work in Hive
select obj1.*, obj2.some_fields
from table1 obj1
join table2 obj2
  on (obj1.id = split(obj2.id, '\\|')[0]) -- split() takes a regex, so the pipe must be escaped; this only matches the first ID in the list
It's not clear to me if you mean SQL or HiveQL.
Is there a quick way to join on the two?
No, not really.
Your DB schema violates First Normal Form. Joining these tables will be slow and error prone.
For a DB-agnostic approach, try:
SELECT *
FROM Table1 t1
INNER JOIN Table2 t2
  ON t2.id LIKE concat('%', CAST(t1.id AS varchar(20)), '%')
-- note: string concatenation via '+' works in SQL Server but not in most other databases,
-- and a bare %...% pattern will also match partial IDs such as 112345
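If the target really is Hive, a membership test over the split array is a cleaner fit than a bare LIKE; the sketch below assumes table1.id is numeric and table2.id holds the pipe-delimited string. split() takes a regular expression, so the pipe must be escaped, and since older Hive versions only accept equality conditions in ON, the check sits in WHERE after a cross join.
-- Hive sketch: turn '12345|12346|12347' into an array and check membership
select t1.*, t2.*
from table1 t1
cross join table2 t2
where array_contains(split(t2.id, '\\|'), cast(t1.id as string));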