How do I select columns based on a string pattern in BigQuery

I have a table in BigQuery with hundreds of columns, and it just happens that I want to select all of them except for those that begin with an underscore. I know how to query the columns beginning with an underscore using the INFORMATION_SCHEMA.COLUMNS view, but I can't figure out how to use that query to select the columns I want. I know BigQuery has EXCEPT, but I want to avoid writing out each column that begins with an underscore, and I can't seem to pass it a subquery or even something like a._*.

Consider the approach below:
execute immediate (select '''
select * except(''' || string_agg(col) || ''') from your_table
'''
from (
  select col
  from (select * from your_table limit 1) t,
       unnest([struct(translate(to_json_string(t), '{}"', '') as kvs)]),
       unnest(split(kvs)) kv,
       unnest([struct(split(kv, ':')[offset(0)] as col)])
  where starts_with(col, '_')
));
Applied to a table whose columns include _c and _e (along with columns that do not start with an underscore), it generates the statement
select * except(_c,_e) from your_table
which returns all columns except those beginning with an underscore.
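Since the question already mentions INFORMATION_SCHEMA.COLUMNS, the same statement can also be assembled from it. A minimal sketch, not the answer's method, assuming the table lives in a dataset named your_dataset:
-- A sketch: build the except() list from INFORMATION_SCHEMA.COLUMNS
-- (your_dataset is an assumed name; adjust to your project/dataset).
execute immediate (
  select 'select * except(' || string_agg(column_name) || ') from your_dataset.your_table'
  from your_dataset.INFORMATION_SCHEMA.COLUMNS
  where table_name = 'your_table'
    and starts_with(column_name, '_')
);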

Related

Postgresql subtract comma separated string in one column from another column

The format is like:

col1                col2
V1,V2,V3,V4,V5,V6   V4,V1,V6
V1,V2,V3            V2,V3
I want to create another column called col3 which contains the subtraction of two columns.
What I have tried:
UPDATE myTable
SET col3 = replace(col1, col2, '')
This works for rows like row 2, where col2 appears in col1 as a contiguous, in-order substring, but the order of the values matters, so it fails for rows like row 1. I was wondering if there's a clean way to achieve the same goal for rows like row 1.
So the desired output would be:

col1                col2       col3
V1,V2,V3,V4,V5,V6   V4,V1,V6   V2,V3,V5
V1,V2,V3            V2,V3      V1
Any suggestions would be appreciated!
Split the values into rows, subtract the sets, and then assemble the result back. All of this can be done in a single expression defining a new query column.
with t (col1, col2) as (values
  ('V1,V2,V3,V4,V5,V6', 'V4,V1,V6'),
  ('V1,V2,V3', 'V2,V3')
)
select col1, col2,
  (
    select string_agg(v, ',')
    from (
      select v from unnest(string_to_array(t.col1, ',')) as a1(v)
      except
      select v from unnest(string_to_array(t.col2, ',')) as a2(v)
    ) x
  ) as col3
from t;
You will have to unnest the elements, then apply an EXCEPT clause on the "unnested" rows, and aggregate back:
select col1,
       col2,
       (select string_agg(item, ',' order by item)
        from (
          select *
          from string_to_table(col1, ',') as c1(item)
          except
          select *
          from string_to_table(col2, ',') as c2(item)
        ) t) as col3
from the_table;
I wouldn't store that result in a separate column, but if you really want to introduce even more problems by storing yet another comma-separated list:
update the_table
set col3 = (select string_agg(item, ',' order by item)
            from (
              select *
              from string_to_table(col1, ',') as c1(item)
              except
              select *
              from string_to_table(col2, ',') as c2(item)
            ) t);
string_to_table() requires Postgres 14 or newer. If you are using an older version, you need to use unnest(string_to_array(col1, ',')) instead.
If you need that a lot, consider creating a function:
create function remove_items(p_one text, p_other text)
  returns text
as
$$
  select string_agg(item, ',' order by item)
  from (
    select *
    from string_to_table(p_one, ',') as c1(item)
    except
    select *
    from string_to_table(p_other, ',') as c2(item)
  ) t;
$$
language sql
immutable;
Then the above can be simplified to:
select col1, col2, remove_items(col1, col2)
from the_table;
Note, PostgreSQL is not my forte, but I thought I'd have a go at it. Try:
SELECT col1,
       col2,
       RTRIM(
         REGEXP_REPLACE(
           Col1,
           CONCAT('\m(?:', REPLACE(Col2, ',', '|'), ')\M,?'),
           '',
           'g'),
         ',') AS col3
FROM myTable
See an online fiddle.
The idea is to use a regular expression to replace all values, based on the following pattern:
\m - Word-boundary at start of word;
(?:V4|V1|V6) - A non-capture group that holds the alternatives from col2;
\M - Word-boundary at end of word;
,? - Optional comma.
When these matches are replaced with nothing, we need to clean up a possible trailing comma with RTRIM(). See an online demo where I had to replace the word-boundaries with the \b word-boundary to showcase the outcome.
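A quick worked example for row 1, as a sketch: the pattern CONCAT() builds from col2 = 'V4,V1,V6' is \m(?:V4|V1|V6)\M,?.
-- Removes V1, V4 and V6 (each with its trailing comma when present),
-- then trims the leftover trailing comma: the result is 'V2,V3,V5'.
SELECT RTRIM(
         REGEXP_REPLACE('V1,V2,V3,V4,V5,V6', '\m(?:V4|V1|V6)\M,?', '', 'g'),
         ',') AS col3;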

BigQuery - concatenate ignoring NULL

I'm very new to SQL. I understand in MySQL there's the CONCAT_WS function, but BigQuery doesn't recognise this.
I have a bunch of twenty fields I need to CONCAT into one comma-separated string, but some are NULL, and if one is NULL then the whole result will be NULL. Here's what I have so far:
CONCAT(m.track1, ", ", m.track2))) As Tracks,
I tried this but it returns NULL too:
CONCAT(m.track1, IFNULL(m.track2,CONCAT(", ", m.track2))) As Tracks,
Super grateful for any advice, thank you in advance.
Unfortunately, BigQuery doesn't support concat_ws(). So, one method is string_agg():
select t.*,
       (select string_agg(track, ',')
        from (select t.track1 as track union all
              select t.track2) x
       ) x
from t;
Actually a simpler method uses arrays:
select t.*,
       array_to_string([track1, track2], ',') as tracks
from t;
Arrays with NULL values are not supported in result sets, but they can be used for intermediate results, and array_to_string() skips NULL elements when no null_text argument is given.
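A quick self-contained check of that behavior, with dummy values:
-- NULL elements (and their delimiters) are omitted by array_to_string()
-- when no null_text argument is given.
select array_to_string([track1, track2, track3], ', ') as tracks
from (select 'a' as track1, cast(null as string) as track2, 'c' as track3);
-- tracks = 'a, c'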
I have a bunch of twenty fields I need to CONCAT into one comma-separated string
Assuming that these are the only fields in the table, you can use the approach below. It is generic enough to handle any number of columns and their names without explicit enumeration:
select
  (select string_agg(col, ', ' order by offset)
   from unnest(split(trim(format('%t', (select as struct t.*)), '()'), ', ')) col with offset
   where not upper(col) = 'NULL'
  ) as Tracks
from `project.dataset.table` t
Below is an oversimplified dummy example to try out and test the approach:
#standardSQL
with `project.dataset.table` as (
  select 1 track1, 2 track2, 3 track3, 4 track4 union all
  select 5, null, 7, 8
)
select
  (select string_agg(col, ', ' order by offset)
   from unnest(split(trim(format('%t', (select as struct t.*)), '()'), ', ')) col with offset
   where not upper(col) = 'NULL'
  ) as Tracks
from `project.dataset.table` t
with output

Row  Tracks
1    1, 2, 3, 4
2    5, 7, 8

In BigQuery, identify when columns do not match on UNION ALL

with
table1 as (
  select 'joe' as name, 17 as age, 25 as speed
),
table2 as (
  select 'nick' as name, 21 as speed, 23 as strength
)
select * from table1
union all
select * from table2
In Google BigQuery, this union all does not throw an error because both tables have the same number of columns (3 each). However, I receive bad data output because the columns do not match. Rather than outputting a new table with four columns name, age, speed, strength with correct values plus nulls for missing values (which would probably be preferred), the union all keeps the three column names from the first SELECT.
Is there a good way to catch that the columns do not match, rather than the query silently returning bad data? Is there any way for this to return an error perhaps, as opposed to a successful table? I'm not sure how to check in SQL that the columns in the 2 tables match.
Edit: in this example it is clear to see that the columns do not match, however in our data we have 100+ columns and we want to avoid a situation where we make an error in a UNION ALL
Below is for BigQuery Standard SQL, using BigQuery's scripting feature.
DECLARE statement STRING;
SET statement = (
  WITH table1_columns AS (
    SELECT column FROM (SELECT * FROM `project.dataset.table1` LIMIT 1) t,
    UNNEST(REGEXP_EXTRACT_ALL(TRIM(TO_JSON_STRING(t), '{}'), r'"([^"]*)":')) column
  ), table2_columns AS (
    SELECT column FROM (SELECT * FROM `project.dataset.table2` LIMIT 1) t,
    UNNEST(REGEXP_EXTRACT_ALL(TRIM(TO_JSON_STRING(t), '{}'), r'"([^"]*)":')) column
  ), all_columns AS (
    SELECT column FROM table1_columns UNION DISTINCT SELECT column FROM table2_columns
  )
  SELECT (
    SELECT 'SELECT ' || STRING_AGG(IF(t.column IS NULL, 'NULL as ', '') || a.column, ', ') || ' FROM `project.dataset.table1` UNION ALL '
    FROM all_columns a LEFT JOIN table1_columns t USING(column)
  ) || (
    SELECT 'SELECT ' || STRING_AGG(IF(t.column IS NULL, 'NULL as ', '') || a.column, ', ') || ' FROM `project.dataset.table2`'
    FROM all_columns a LEFT JOIN table2_columns t USING(column)
  )
);
EXECUTE IMMEDIATE statement;
when applied to the sample data from your question, the output is

Row  name  age   speed  strength
1    joe   17    25     null
2    nick  null  21     23
After saving table1 and table2 as two tables in a dataset in BigQuery, I then queried the metadata in INFORMATION_SCHEMA to check that the columns matched.
SELECT *
FROM models.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'table1';

SELECT *
FROM models.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'table2';
INFORMATION_SCHEMA.COLUMNS returns information including the column names and their positions. I can then join these two result sets to check that the names match...
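A sketch of what that join could look like, assuming the models dataset from above; any row returned is a column present in only one of the tables:
-- Hypothetical sketch: flag column names that appear in only one table.
SELECT
  COALESCE(t1.column_name, t2.column_name) AS column_name,
  t1.column_name IS NOT NULL AS in_table1,
  t2.column_name IS NOT NULL AS in_table2
FROM (SELECT column_name FROM models.INFORMATION_SCHEMA.COLUMNS
      WHERE table_name = 'table1') t1
FULL OUTER JOIN
     (SELECT column_name FROM models.INFORMATION_SCHEMA.COLUMNS
      WHERE table_name = 'table2') t2
ON t1.column_name = t2.column_name
WHERE t1.column_name IS NULL OR t2.column_name IS NULL;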

ORACLE SQL CSV Column comparison

I have a table with CSV values in a column. I want to use that column in a WHERE clause to check whether a subset of the CSV is present. For example, the table has values like
1| 'A,B,C,D,E'
Query:
select id from tab where csv_column contains 'A,C';
This query should return 1.
How to achieve this in SQL?
You can handle this using LIKE, making sure to search for the three types of pattern for each letter/substring which you intend to match:
SELECT id
FROM yourTable
WHERE (csv_column LIKE 'A,%' OR csv_column LIKE '%,A,%' OR csv_column LIKE '%,A')
  AND (csv_column LIKE 'C,%' OR csv_column LIKE '%,C,%' OR csv_column LIKE '%,C')
Note that a match for the value A means that the column starts with A,, contains ,A,, or ends with ,A.
We could also write a structurally similar query using INSTR() in place of LIKE, which might even give a performance boost over using wildcards.
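One common variant of the INSTR() idea, sketched here rather than taken from the answer above, pads the column with delimiters so a single pattern per value suffices:
-- Padding trick (a sketch): wrap the CSV in commas so every value,
-- including the first and last, is delimited on both sides.
SELECT id
FROM yourTable
WHERE INSTR(',' || csv_column || ',', ',A,') > 0
  AND INSTR(',' || csv_column || ',', ',C,') > 0;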
There's probably something funky you can do with regular expressions, but in simple terms... if A and C will always be in that order:
csv_column LIKE '%A%C%'
otherwise
(csv_column LIKE '%A%' AND csv_column LIKE '%C%' )
If you don't want to edit your search string, this could be a way:
select *
from yourTable
where csv like '%' || replace('A,C', ',', '%') || '%'
For example:
with yourTable(id, csv) as (
  select 1, 'A,B,C,D,E' from dual union all
  select 2, 'A,C,D,E' from dual union all
  select 3, 'B,C,D,E' from dual
)
select *
from yourTable
where csv like '%' || replace('A,C', ',', '%') || '%'
gives:
ID CSV
---------- ---------
1 A,B,C,D,E
2 A,C,D,E
Consider that this will only work if the values in the search string appear in the same order as in the CSV column; for example:
with yourTable(id, csv) as (
  select 1, 'C,A,B' from dual
)
select *
from yourTable
where csv like '%' || replace('A,C', ',', '%') || '%'
will give no results.
Why not store the values as separate columns, and then use simple predicate filtering?

Is there a way to shorten this query?

I have a query like this:
SELECT Name,
       REPLACE(RTRIM((
         SELECT CAST(Score AS VARCHAR(MAX)) + ' '
         FROM (SELECT Name, Score
               FROM table
               WHERE ---CONDITIONS---
              ) AS InnerTable
         WHERE InnerTable.Name = OuterTable.Name
         FOR XML PATH (''))), ' ', ', ') AS Scores
FROM table AS OuterTable
WHERE ---CONDITIONS---
GROUP BY Name;
As can be seen, I am using the same set of conditions to derive InnerTable and OuterTable. Is there a way to shorten this query? I ask because, some time back, I saw the USING keyword in MySQL that simplified my life: you can specify a query once and then use its alias for the rest of the query.
You could look at creating a Common Table Expression (CTE). That is your best bet for aliasing a select. Unfortunately I'm not sure how much shorter it will make your query, though it does prevent you from defining the where conditions twice. See below:
with temp as
(
  SELECT Name, Score
  FROM table
  WHERE whatever = 'whatever'
)
SELECT Name,
       REPLACE(RTRIM((
         SELECT CAST(Score AS VARCHAR(MAX)) + ' '
         FROM (SELECT Name, Score
               FROM temp) AS InnerTable
         WHERE InnerTable.Name = OuterTable.Name
         FOR XML PATH (''))), ' ', ', ') AS Scores
FROM temp AS OuterTable
GROUP BY Name;
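As a side note, not part of the answer above: on SQL Server 2017 and later, STRING_AGG() replaces the whole FOR XML PATH construct. A sketch, keeping the question's ---CONDITIONS--- placeholder:
-- A sketch using STRING_AGG() (SQL Server 2017+).
SELECT Name,
       STRING_AGG(CAST(Score AS VARCHAR(MAX)), ', ') AS Scores
FROM table
WHERE ---CONDITIONS---
GROUP BY Name;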