BigQuery: Call a UDF on each column of each row and aggregate the output in new column dynamically - google-bigquery

I have come up with a JS UDF in BigQuery which needs to be called on each cell of each row, and the output for that row needs to be aggregated in another column dynamically; it should work for all tables. I referred to the answer provided by Mikhail in this question: BigQuery - Concatenate multiple columns into a single column for large numbers of columns
This answer partially works for me. However, since some of my tables have columns whose text contains commas, the query ends up splitting those columns again. E.g., in the screenshot below, the last column should have 5 values, one for each column. I have tried a few approaches, such as using %T for the format, but since I need to make it generic, each one is still limited by the comma.
Following is the query I am using :
SELECT *, (SELECT string_agg(myFunc(col), ', ' ORDER BY offset) FROM UNNEST(split(trim(format('%t', (SELECT AS struct t.* )), '()'), ', ')) col WITH offset WHERE NOT upper(col) = 'NULL') AS funcOutPut FROM `my-project`.db.customer t;
Is there any way this can be achieved generically for all the tables I have? Any help would be appreciated. :)
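To make the failure mode concrete, here is a minimal Python sketch that mimics what SPLIT(TRIM(FORMAT('%t', row), '()'), ', ') does to a formatted row. The row value and the helper name split_formatted_row are hypothetical and are not taken from the real table; the sketch only assumes that '%t' prints string values without quotes.

```python
def split_formatted_row(formatted):
    """Mimic SPLIT(TRIM(FORMAT('%t', row), '()'), ', ') on a formatted row."""
    # strip("()") removes leading/trailing parentheses, like TRIM(x, '()').
    return formatted.strip("()").split(", ")


# Hypothetical row with three columns; the second value is "Smith, John".
formatted_row = "(1, Smith, John, NY)"
pieces = split_formatted_row(formatted_row)

# Three columns went in, but four pieces come out because of the embedded comma.
print(len(pieces), pieces)
```

Any approach that splits the formatted text on ', ' will hit this, which is why a generic solution needs a delimiter or encoding that cannot appear inside the values.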

Related

How can I automatically extract content from a field in a SQL query?

The environment I am currently working in is Snowflake.
As a matter of data sensitivity, I will be using pseudonyms in the following question.
I have a specific field in one of my tables called FIELD_1. The data in this field is structured as such:
I am trying to figure out how to automatically extract from my FIELD_1 the output I have in FIELD_2.
Does anyone have any idea what kind of query I would need to achieve this? Any help would be GREATLY appreciated! I am really quite stuck on this problem.
Thank you!
You seem to want everything up to the first four numbers. Then to replace the underscores with spaces. If so:
select replace(regexp_substr(field_1, '^[^0-9]*[0-9]{4}'), '_', ' ')
Or alternatively, if you want the first three components separated by underscores:
select replace(regexp_substr(field_1, '^[^_]+_[^_]+_[0-9]{4}'), '_', ' ')
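The two patterns above behave differently when the prefix has more or fewer than two underscore-separated parts. The following is a rough Python sketch of the same logic using the standard re module; the sample values and the function names to_field_2 and to_field_2_alt are made up for illustration, since the original data was not shown.

```python
import re


def to_field_2(field_1):
    """Everything up to and including the first run of four digits."""
    match = re.match(r"^[^0-9]*[0-9]{4}", field_1)
    return match.group(0).replace("_", " ") if match else None


def to_field_2_alt(field_1):
    """Exactly two underscore-separated parts followed by four digits."""
    match = re.match(r"^[^_]+_[^_]+_[0-9]{4}", field_1)
    return match.group(0).replace("_", " ") if match else None


# Hypothetical values; the real FIELD_1 contents are not known.
two_parts = "Blue_Widget_2020_extra_1"
three_parts = "Big_Blue_Widget_2020_x"

print(to_field_2(two_parts), "|", to_field_2_alt(two_parts))
print(to_field_2(three_parts), "|", to_field_2_alt(three_parts))
```

With the three-part value the second pattern finds no match (None here, NULL from regexp_substr), so the first pattern is the more forgiving choice if the number of leading components varies.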
If the data is as simplistic in reality as you've described here, you can use a variable-length LEFT() function in conjunction with REPLACE() to get the desired output:
SELECT FIELD_1, REPLACE(LEFT(FIELD_1, LEN(FIELD_1)-10),'_',' ') AS FIELD_2
FROM table_name
See also:
SELECT - Snowflake Documentation
LEFT - Snowflake Documentation
REPLACE - Snowflake Documentation
LENGTH, LEN - Snowflake Documentation

SQLite - order by numbers inside string

It's my first post here, so bear with me.
I'm trying to order a query by the numbers in a specific column whose values also contain letters, using SQLite.
Example: "Winter 1993".
I want to be able to sort by the numbers only, without altering the table structure.
My query:
select Col from table order by Col*1, Col Asc
The query sorts by letters first and then by numbers, I just want it sorted by numbers.
Does anyone have any idea how to do this?
So it would be {Season} {Year}
If the numbers are consistently located after the first space in the string, we can use string functions to extract them as follows:
select col
from mytable
order by substr(col, instr(col, ' ') + 1) + 0, col
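As a quick check, the query can be run against an in-memory SQLite database from Python's standard sqlite3 module. The sample rows below are invented for illustration and follow the {Season} {Year} shape described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col TEXT)")

# Hypothetical sample data in the "{Season} {Year}" format.
samples = ["Winter 1993", "Summer 1990", "Autumn 2001", "Spring 1993"]
conn.executemany("INSERT INTO mytable (col) VALUES (?)", [(s,) for s in samples])

# Take the text after the first space, add 0 to coerce it to a number,
# and fall back to the full string to break ties between equal years.
rows = conn.execute(
    "SELECT col FROM mytable "
    "ORDER BY substr(col, instr(col, ' ') + 1) + 0, col"
).fetchall()
ordered = [r[0] for r in rows]
print(ordered)

# The original "Col*1" key is 0 for every row, because the text does not
# start with a number; that is why only the second sort key took effect.
naive_key = conn.execute("SELECT 'Winter 1993' * 1").fetchone()[0]
print(naive_key)
```

The "+ 0" matters: without it the years would be compared as text, which happens to work for four-digit years but not for numbers of differing lengths.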

Big Query Record split into multiple columns

I have a table that looks like:
text | STRING
concepts | RECORD
concepts.name | STRING
[...]
So one row could look like this:
"This is a text about BigQuery and how to split records into columns. "
SQL
BigQuery
Questions
I would like to transform that into:
text, concepts_1, concepts_2, concepts_3 // The names are not important
"This is a text about BigQuery and how to split records into columns. ",SQL,BigQuery,Questions
The number of concepts in each row vary.
EDIT:
This would also work:
text, concepts
"This is a text about BigQuery and how to split records into columns. ","SQL,BigQuery,Questions"
Below is for BigQuery Standard SQL
If comma separated list is fine with you - consider below shortcut versions
#standardSQL
SELECT identifier,
(SELECT STRING_AGG(name, ', ') FROM UNNEST(concepts)) AS conceptName
FROM `project.dataset.articles`
and
#standardSQL
SELECT identifier,
(SELECT STRING_AGG(name, ', ') FROM articles.concepts) AS conceptName
FROM `project.dataset.articles` articles
Both above versions return output like below
Row identifier conceptName
1 1 SQL, BigQuery, Questions
2 2 xxx, yyy, zzz
As you can see, the above versions are brief and compact, and they don't need an extra step of grouping into an array and then transforming it into a string, as all of this can be done in one simple shot
This was the solution for me. But it only creates a comma-separated string. However, in my case, this was fine.
SELECT articles.identifier, ARRAY_TO_STRING(ARRAY_AGG(concepts.name), ",") as conceptName
FROM `` articles, UNNEST(concepts) concepts
GROUP BY articles.identifier
Try using this:
SELECT
text,
c.*
FROM
`your_project.your_dataset.your_table`,
UNNEST(
concepts
) c
This will get the text column along with the unnested values from your RECORD column.
Hope it helps.

Transform data in Google bigquery - extract text, split it into multiple columns and pivoting the data

I have some weblog data in big query which I need to transform to make it easier to use and query. The data looks like:
I want to extract and transform the data within the curly braces after Results{…..} (colored blue). The data is of the form '(\d+((PQ)|(KL))+\d+)' and there can be 1-20+ entries in the result array. I am only interested in the first 16 entries.
I have been able to extract the data within the curly braces into a new column, using SUBSTR and REGEXP_EXTRACT. But I'm unable to SPLIT it into columns (sometimes there is only 1 result, so the delimiter "," is missing). I'm new to regex; maybe I can use something like '(\d+((PQ)|(KL))+\d+){1}' etc. to split the data into multiple columns and then pivot it.
Ideal output in my case would be to transform it into something like:
In the above solution, each row in original table is repeated from 1-16 times depending on the number of items in the Results array.
I’m not completely sure if it’s possible to do this in big query. I’ll be grateful if anyone can help me out a little here.
If this is not possible, then I can have 16 rows for every event with NULL values in Event_details for cases where there are less than 16 entries in result array.
In case both of these are not possible, the last solution would be to have it transformed into something like:
The reason I want to transform the data is that in most of the cases I would need to find which result array items are appearing and in what order.
Check this out: Split string into multiple columns with bigquery.
In their case it's delimited by spaces; replace the \s with ','.
something like:
SELECT
Regexp_extract(StringToParse,r'^*{(?:[^,]*,){0}(\d+(?:(?:PQ)|(?:KL))+\d+)\s?') as Word0,
Regexp_extract(StringToParse,r'^*{(?:[^,]*,){1}(\d+(?:(?:PQ)|(?:KL))+\d+)\s?') as Word1,
Regexp_extract(StringToParse,r'^*{(?:[^,]*,){2}(\d+(?:(?:PQ)|(?:KL))+\d+)\s?') as Word2,
Regexp_extract(StringToParse,r'^*{(?:[^,]*,){3}(\d+(?:(?:PQ)|(?:KL))+\d+)\s?') as Word3,
FROM
(SELECT 'bla{1234PQ5,6789KL0,1234PQ5,6789KL0,123' as StringToParse)
Use SPLIT()
SELECT Event_ID, Event_UserID, Event_SessionID, Keyword,
SPLIT(REGEXP_EXTRACT(Event_details,"Results\{(.*)\}"),",") as Event_details_item
FROM mydata.mytable
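The extract-then-split idea also covers the single-result case from the question, because splitting a string that has no delimiter simply yields one element. Below is a small Python sketch of that logic with the standard re module; the log strings and the function name extract_results are hypothetical, and the 16-item cap comes from the question rather than from the SQL above.

```python
import re


def extract_results(event_details, limit=16):
    """Return up to `limit` items found inside Results{...}, or [] if absent."""
    match = re.search(r"Results\{(.*)\}", event_details)
    if match is None:
        return []
    # A string without "," splits into a one-element list, so the
    # single-result case needs no special handling.
    return match.group(1).split(",")[:limit]


# Hypothetical log lines; the real weblog format was not shown.
many = "q=abc Results{1234PQ5,6789KL0,42PQ7}"
single = "q=xyz Results{1234PQ5}"

print(extract_results(many))
print(extract_results(single))
print(extract_results("no results here"))
```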

split up sql column into queryable results set

Here is my issue:
I have a column with the following data in an sql column:
Answers
=======
1:2:5: <--- notice my delimiter
I need to be able to break up the digits into a result set that I can join against a lookup table such as
Answers_Expanded
=======
1 apple
2 pear
3 cherry
4 mango
5 grape
and return
Answers
=======
apple pear grape
Any such way?
Thanks!
This is a bit of a hack (the LIKE, the XML PATH, and the STUFF), and it assumes that you want the answers ordered by their ID as opposed to matching up with the original order in the multivalued column...
But this gives the results you're looking for:
SELECT STUFF((
SELECT ' ' + ae.Answer
FROM
Answers_Expanded ae
JOIN Answers a ON ':' + a.Answers LIKE '%:' + CAST(ae.ID AS VARCHAR) + ':%'
ORDER BY ae.ID
FOR XML PATH(''))
, 1, 1, '') AS Answers
Sql Fiddle
This works because:
Joining with LIKE finds any Answers_Expanded rows that match the multivalued column.
XML PATH simulates a group concatenation... and allows for ' ' to be specified as the delimiter
STUFF removes the leading delimiter.
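The LIKE-based join can be tried out without SQL Server. The sketch below is an approximation in SQLite through Python's standard sqlite3 module, not the T-SQL above: it uses || instead of + for concatenation and does the final space-joining in Python instead of FOR XML PATH and STUFF. The table contents are the ones given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Answers (Answers TEXT);
CREATE TABLE Answers_Expanded (ID INTEGER, Answer TEXT);
INSERT INTO Answers VALUES ('1:2:5:');
INSERT INTO Answers_Expanded VALUES
  (1, 'apple'), (2, 'pear'), (3, 'cherry'), (4, 'mango'), (5, 'grape');
""")

# Prefixing ':' makes every ID in the list appear as ':<id>:', so ID 1
# matches at the start and ID 2 would not match inside a value like '12'.
rows = conn.execute("""
    SELECT ae.Answer
    FROM Answers_Expanded ae
    JOIN Answers a
      ON (':' || a.Answers) LIKE ('%:' || ae.ID || ':%')
    ORDER BY ae.ID
""").fetchall()

# Stand-in for the XML PATH concatenation plus STUFF trimming.
answers = " ".join(r[0] for r in rows)
print(answers)
```

As noted above, the result is ordered by ID, not by the position of each digit in the delimited string.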
This blog post has a good example of a user defined function that will return a table with the values from your delimited string in a column. You can then join that table to your Answers_Expanded table to get your value.
This works fine if you are parsing reasonably short strings, and if you are doing it as a one time thing, but if you have a table with your answers stored in a column like that, you don't want to be running this on the whole table as it will be a large performance hit. Ideally you'd want to avoid getting delimited strings like this in SQL.
I would suggest that you save your answers so that each cell holds only one number, not multiple pieces of information in one cell (which violates first normal form).
Otherwise, you had better use a higher-level SQL dialect such as T-SQL.