BigQuery: split a RECORD into multiple columns - sql

I have a table that looks like:
text | STRING
concepts | RECORD
concepts.name | STRING
[...]
So one row could look like this:
"This is a text about BigQuery and how to split records into columns. "
SQL
BigQuery
Questions
I would like to transform that into:
text, concepts_1, concepts_2, concepts_3 // The names are not important
"This is a text about BigQuery and how to split records into columns. ",SQL,BigQuery,Questions
The number of concepts in each row varies.
EDIT:
This would also work:
text, concepts
"This is a text about BigQuery and how to split records into columns. ","SQL,BigQuery,Questions"

The following is for BigQuery Standard SQL.
If a comma-separated list is fine with you, consider the shortcut versions below:
#standardSQL
SELECT identifier,
(SELECT STRING_AGG(name, ', ') FROM UNNEST(concepts)) AS conceptName
FROM `project.dataset.articles`
and
#standardSQL
SELECT identifier,
(SELECT STRING_AGG(name, ', ') FROM articles.concepts) AS conceptName
FROM `project.dataset.articles` articles
Both versions above return output like this:
Row identifier conceptName
1 1 SQL, BigQuery, Questions
2 2 xxx, yyy, zzz
As you can see, both versions are brief and compact: they don't group the names into an array and then transform that array into a string, because STRING_AGG does it all in one step.

This was the solution for me. It only creates a comma-separated string, but in my case that was fine.
SELECT articles.identifier, ARRAY_TO_STRING(ARRAY_AGG(concepts.name), ",") AS conceptName
FROM `` articles, UNNEST(concepts) concepts
GROUP BY articles.identifier

Try using this:
SELECT
text,
c.*
FROM
`your_project.your_dataset.your_table`,
UNNEST(
concepts
) c
This will get the text column along with the unnested values from your RECORD column, producing one output row per concept.
Hope it helps.
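If you do want separate columns, as in the original question, one possible sketch is to index into the repeated field directly. It assumes you are happy to cap the output at a fixed number of columns (three here, an arbitrary choice) and it reuses the placeholder table name from the answer above. SAFE_OFFSET returns NULL when a row has fewer concepts than the requested position.

```sql
#standardSQL
-- Sketch only: the table name is a placeholder and the column count (3) is an assumption.
SELECT
  text,
  concepts[SAFE_OFFSET(0)].name AS concepts_1,  -- first concept, or NULL
  concepts[SAFE_OFFSET(1)].name AS concepts_2,  -- second concept, or NULL
  concepts[SAFE_OFFSET(2)].name AS concepts_3   -- third concept, or NULL
FROM `your_project.your_dataset.your_table`
```

Rows with more than three concepts would silently lose the extra ones, so the comma-separated string from the other answers is the safer choice when the count is unbounded.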

Related

BigQuery: Call a UDF on each column of each row and aggregate the output in new column dynamically

I have come up with a JS UDF in BigQuery that needs to be called on each cell of each row, and the output for that row needs to be aggregated dynamically into another column; this should work for all tables. I referred to the answer provided by Mikhail in this question: BigQuery - Concatenate multiple columns into a single column for large numbers of columns
This answer partially works for me. But since some of my tables have columns whose text contains commas, the query ends up splitting those columns again. E.g., in the screenshot below, the last column should have 5 values, one for each column. I have tried a few approaches, such as using %T in FORMAT, since I need to make it generic, but splitting by comma remains a limitation.
The following is the query I am using:
SELECT *,
  (SELECT STRING_AGG(myFunc(col), ', ' ORDER BY offset)
   FROM UNNEST(SPLIT(TRIM(FORMAT('%t', (SELECT AS STRUCT t.*)), '()'), ', ')) col WITH OFFSET
   WHERE NOT UPPER(col) = 'NULL') AS funcOutPut
FROM `my-project`.db.customer t;
Is there any way this can be achieved generically for all the tables I have? Any help would be appreciated. :)

Single hive query to remove certain text in data

I have column data like this, in 2 formats:
1)"/abc/testapp/v1?FirstName=username&Lastname=test123"
2)"/abc/testapp/v1?FirstName=username"
I want to retrieve the output as "/abc/testapp/v1?FirstName=username", stripping out the data that starts with "&Lastname" and runs to the end of the string. The idea is to remove the Lastname parameter together with its value.
If the data doesn't contain "&Lastname", as in the second format, the query should still work.
The value for Lastname shown in the example is "test123", but in general it will be dynamic.
I started with regexp_replace, but I am only able to replace "&Lastname", not its value.
select regexp_replace("/abc/testapp/v1?FirstName=username&Lastname=test123&type=en_US","&Lastname","");
Can someone please help me achieve both with a single Hive query?
Use the split function:
with your_data as (--Use your table instead of this example
select stack (2,
"/abc/testapp/v1?FirstName=username&Lastname=test123",
"/abc/testapp/v1?FirstName=username"
) as str
)
select split(str,'&')[0] from your_data;
Result:
_c0
/abc/testapp/v1?FirstName=username
/abc/testapp/v1?FirstName=username
Or use '&Lastname' pattern for split:
select split(str,'&Lastname')[0] from your_data;
This keeps any other &-separated parameters that come before &Lastname, but it still drops everything from &Lastname onward.
Both cases, with or without the last name, work this way using split. In Hive you don't need a table to select from; you can execute the function directly, like select functionname:
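If parameters can also follow Lastname (as in the "&type=en_US" string from the question), a regexp_replace that consumes the value as well may be a better fit. This is a minimal sketch, assuming the value never contains a literal "&"; the sample rows are taken from the question.

```sql
-- Hive sketch: remove only the Lastname parameter together with its value.
-- [^&]* matches the value up to the next '&' or the end of the string.
with your_data as (
  select stack(3,
    "/abc/testapp/v1?FirstName=username&Lastname=test123",
    "/abc/testapp/v1?FirstName=username",
    "/abc/testapp/v1?FirstName=username&Lastname=test123&type=en_US"
  ) as str
)
select regexp_replace(str, '&Lastname=[^&]*', '') from your_data;
```

Strings without "&Lastname" pass through unchanged, and in the third row the trailing "&type=en_US" is kept.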
select
split("/abc/testapp/v1FirstName=username&Lastname=test123",'&')[0]
select
split("/abc/testapp/v1FirstName=username",'&')[0]
Result :
_c0
/abc/testapp/v1FirstName=username
You can also combine them into a single query:
select
split("/abc/testapp/v1FirstName=username&Lastname=test123",'&')[0],
split("/abc/testapp/v1FirstName=username",'&')[0]
_c0 _c1
/abc/testapp/v1FirstName=username /abc/testapp/v1FirstName=username

How to run a SELECT query with LIKE on thousands of rows

Newbie here. I've been searching for hours now, but I can't seem to find the correct answer or properly phrase my search.
I have thousands of values (order IDs) that I want to put in an IN list, but I have to apply LIKE to these values at the same time, since the column contains JSON and there's no dedicated table that only has the order_id value. I am running the query in BigQuery.
Sample Input:
ORD12345
ORD54376
Table I'm trying to Query: transactions_table
Query:
SELECT order_id, transaction_uuid,client_name
FROM transactions_table
WHERE JSON_VALUE(transactions_table,'$.ordernum') LIKE IN ('%ORD12345%','%ORD54376%')
This just doesn't work, especially if I have thousands of rows.
Also, how do I add the order id that I am querying so that it appears under an order_id column in the query result?
Desired Output:
Option one:
WITH transf AS (
  SELECT order_id, transaction_uuid, client_name,
         JSON_VALUE(transactions_table, '$.ordernum') AS o_num
  FROM transactions_table
)
SELECT * FROM transf WHERE o_num LIKE '%ORD12345%' OR o_num LIKE '%ORD54376%'
Option two:
Split o_num using "-" as the separator, create a table of orders, like (SELECT 'ORD12345' AS num UNION ALL SELECT 'ORD54376' AS num), and inner join it with transf.o_num.
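The join idea in option two could look roughly like the sketch below. It is only an illustration: `payload` is a hypothetical name for the JSON string column, `wanted` is a made-up name for the list of IDs, and with thousands of IDs you would probably load them into a real table instead of an array literal. Selecting w.order_id also puts the searched ID into an order_id column of the result.

```sql
#standardSQL
-- Hypothetical names: `payload` (JSON string column) and `wanted` (IDs to search for).
WITH wanted AS (
  SELECT order_id
  FROM UNNEST(['ORD12345', 'ORD54376']) AS order_id
)
SELECT
  w.order_id,            -- the ID that matched
  t.transaction_uuid,
  t.client_name
FROM transactions_table AS t
CROSS JOIN wanted AS w
-- STRPOS returns 0 when the ID does not occur in the extracted value
WHERE STRPOS(JSON_VALUE(t.payload, '$.ordernum'), w.order_id) > 0
```

A substring join like this compares every transaction against every wanted ID, so on large tables it may be worth extracting the exact ID first and joining on equality.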
One method uses OR:
WHERE JSON_VALUE(transactions_table, '$.ordernum') LIKE '%ORD12345%' OR
JSON_VALUE(transactions_table, '$.ordernum') LIKE '%ORD54376%'
An alternative method uses regular expressions:
WHERE REGEXP_CONTAINS(JSON_VALUE(transactions_table, '$.ordernum'), 'ORD12345|ORD54376')
According to the documentation, the LIKE operator works as follows:
Checks if the STRING in the first operand X matches a pattern specified by the second operand Y. Expressions can contain these characters:
A percent sign "%" matches any number of characters or bytes.
An underscore "_" matches a single character or byte.
You can escape "\", "_", or "%" using two backslashes. For example, "\\%". If you are using raw strings, only a single backslash is required. For example, r"\%".
Thus, the syntax would be like the following:
SELECT
  order_id,
  transaction_uuid,
  client_name
FROM
  transactions_table
WHERE
  JSON_VALUE(transactions_table, '$.ordernum') LIKE '%ORD12345%'
  OR JSON_VALUE(transactions_table, '$.ordernum') LIKE '%ORD54376%'
Notice that we specify two conditions connected with the OR logical operator.
As a bonus, when querying large datasets it is good practice to select only the columns you need in your output (either in a temp table or the final view) instead of using *. BigQuery is a columnar store, so it reads only the columns you select, which is one of the reasons it is fast.
As an alternative for using LIKE, you can use REGEXP_CONTAINS, according to the documentation:
Returns TRUE if value is a partial match for the regular expression, regex.
Using the following syntax:
REGEXP_CONTAINS(value, regex)
However, it also works if, instead of a regular expression, you pass a plain STRING between single or double quotes. In addition, when you have more than one expression to search for, you can use the pipe operator (|) to match any one of them, as follows:
where regexp_contains(email,"gary|test")
I hope it helps.

Finding first and second word in a string in SQL Developer

How can I find the first word and second word in a string separated by unknown number of spaces in SQL Developer? I need to run a query to get the expected result.
String:
Hello Monkey this is me
Different sentences have different number of spaces between the first and second word and I need a generic query to get the result.
Expected Result:
Hello
Monkey
I have managed to find the first word using substr and instr. However, I do not know how to find the second word due to the unknown number of spaces between the first and second word.
select substr((select ltrim(sentence) from table1),1,
(select (instr((select ltrim(sentence) from table1),' ',1,1)-1)
from table1))
from table1
Since you seem to want them as separate result rows, you could use a simple common table expression to duplicate the rows, once with the full row, then with the first word removed. Then all you have to do is get the first word from each:
WITH cte AS (
SELECT value FROM table1
UNION ALL
SELECT SUBSTR(TRIM(value), INSTR(TRIM(value), ' ')) FROM table1
)
SELECT SUBSTR(TRIM(value), 1, INSTR(TRIM(value), ' ') -1) word
FROM cte
Note that this very simple example assumes that there is a second word. If there isn't, NULL will be returned for both words.
An SQLfiddle to test with.
While Joachim Isaksson's answer is a robust and fast approach, you can also consider splitting the string and selecting from the resulting set of pieces. This is just meant as a hint at another approach, in case your requirements change (e.g. more than two string pieces).
You could then split by the regex /[ ]+/ to get the words between the blanks.
Find more about splitting here: How do I split a string so I can access item x?
This will strongly depend on the SQL dialect you are using.
Try this with REGEXP_SUBSTR:
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),
REGEXP_SUBSTR(REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),'\w+$'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+$')
FROM table1;
result:
1 2 3 4 5
Hello Monkey Monkey this this is_me
To learn more about REGEXP_SUBSTR, refer to Using Regular Expressions With Oracle Database.
Test it with SqlFiddle: http://sqlfiddle.com/#!4/8e9ef/9
If you only want to get the first and the second word, use REGEXP_INSTR to get the start position of the second word:
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+') AS FIRST,
REGEXP_SUBSTR(sentence,'\w+\s',REGEXP_INSTR(sentence,'\w+\s+')+length(REGEXP_SUBSTR(sentence,'\w+\s+'))) AS SECOND
FROM table1;
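A shorter variant is possible with the occurrence argument of REGEXP_SUBSTR. The sketch below assumes words are separated by plain spaces only (no tabs) and uses an inline sample sentence instead of table1 so it can be run on its own.

```sql
-- Oracle sketch: the 4th argument of REGEXP_SUBSTR selects the n-th match.
-- '[^ ]+' matches a run of non-space characters, so repeated spaces are skipped.
SELECT
  REGEXP_SUBSTR(sentence, '[^ ]+', 1, 1) AS first_word,
  REGEXP_SUBSTR(sentence, '[^ ]+', 1, 2) AS second_word
FROM (
  SELECT 'Hello    Monkey this is me' AS sentence FROM dual
);
```

For the sample sentence this yields Hello and Monkey without any trailing whitespace, and second_word is NULL when the sentence has only one word.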

split up sql column into queryable results set

Here is my issue:
I have a column with the following data in an sql column:
Answers
=======
1:2:5: <--- notice my delimiter
I need to be able to break up the digits into a result set that I can join against a lookup table such as
Answers_Expanded
=======
1 apple
2 pear
3 cherry
4 mango
5 grape
and return
Answers
=======
apple pear grape
Any such way?
Thanks!
This is a bit of a hack (the LIKE, the XML PATH, and the STUFF), and it assumes that you want the answers ordered by their ID as opposed to matching up with the original order in the multivalued column...
But this gives the results you're looking for:
SELECT STUFF((
SELECT ' ' + ae.Answer
FROM
Answers_Expanded ae
JOIN Answers a ON ':' + a.Answers LIKE '%:' + CAST(ae.ID AS VARCHAR) + ':%'
ORDER BY ae.ID
FOR XML PATH(''))
, 1, 1, '') AS Answers
Sql Fiddle
This works because:
Joining with LIKE finds any Answers_Expanded rows that match the multivalued column.
XML PATH simulates a group concatenation and allows ' ' to be specified as the delimiter.
STUFF removes the leading delimiter.
This blog post has a good example of a user defined function that will return a table with the values from your delimited string in a column. You can then join that table to your Answers_Expanded table to get your value.
This works fine if you are parsing reasonably short strings, and if you are doing it as a one time thing, but if you have a table with your answers stored in a column like that, you don't want to be running this on the whole table as it will be a large performance hit. Ideally you'd want to avoid getting delimited strings like this in SQL.
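As an alternative to a user-defined split function, newer SQL Server versions ship a built-in one. The sketch below assumes SQL Server 2016 or later (where STRING_SPLIT is available) and reuses the table and column names from the answer above; it is an illustration, not the function from the blog post.

```sql
-- SQL Server 2016+ sketch (assumption): STRING_SPLIT returns one row per piece
-- in a column named value.
SELECT ae.ID, ae.Answer
FROM Answers AS a
CROSS APPLY STRING_SPLIT(a.Answers, ':') AS s
JOIN Answers_Expanded AS ae
  ON ae.ID = TRY_CAST(s.value AS INT)
WHERE s.value <> '';  -- the trailing ':' produces one empty piece
```

The two-argument form of STRING_SPLIT does not guarantee output order, so add an ORDER BY if the order of the answers matters.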
I would suggest that you store your answers so that one cell holds only one number, not multiple pieces of information in one cell (which violates first normal form).
Otherwise, you'd better use a more capable SQL dialect such as T-SQL.