I'm having trouble creating a query that groups together responses from multiple rows that share a similar name and counts a specific response value in them.
The data table I currently have looks like this:
test_control   values
test           selected
control        selected
test us        not selected
control us     selected
test mom       not selected
control mom    selected
What I'd like is an output like the one below, which only counts the number of "selected" responses and groups together the rows that have either "control" or "test" in the name:
test_control   values
test           3
control        1
The query I have below is wrong, as it doesn't give me any output at all. The GROUP BY section is where I'm lost, as I'm not sure how to do this; I tried to Google it but couldn't seem to find anything. Appreciate any help in advance!
SELECT distinct(test_control), values FROM `total_union`
where test_control="%test%" and values="selected"
group by test_control, values
Use the query below:
SELECT
  REGEXP_EXTRACT(test_control, r'^(test|control)') AS test_control,
  COUNTIF(values = 'selected') AS values
FROM `total_union`
GROUP BY 1
As mentioned by @Mikhail Berlyant, you can use REGEXP_EXTRACT to match the expression and COUNTIF to count the rows that satisfy the given condition. Try the code below to get the expected output:
Code
SELECT
  REGEXP_EXTRACT(test_control, r'^(test|control)') AS test_control,
  COUNTIF(values = "selected") AS values
FROM `project.dataset.testvalues`
GROUP BY 1
Output
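The grouping idea in the answers above can be checked locally. Below is a SQLite-based sketch in Python: REGEXP_EXTRACT and COUNTIF are BigQuery-specific, so a LIKE prefix test and SUM over a CASE expression stand in for them here, applied to the six sample rows from the question.

```python
import sqlite3

# Sample rows from the question: (test_control, values).
rows = [
    ("test", "selected"),
    ("control", "selected"),
    ("test us", "not selected"),
    ("control us", "selected"),
    ("test mom", "not selected"),
    ("control mom", "selected"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE total_union (test_control TEXT, "values" TEXT)')
conn.executemany("INSERT INTO total_union VALUES (?, ?)", rows)

# Bucket each row by its leading "test"/"control" prefix, then count only
# the 'selected' rows: SUM(CASE ...) plays the role of BigQuery's COUNTIF.
result = conn.execute("""
    SELECT CASE WHEN test_control LIKE 'test%' THEN 'test' ELSE 'control' END AS grp,
           SUM(CASE WHEN "values" = 'selected' THEN 1 ELSE 0 END) AS n_selected
    FROM total_union
    GROUP BY grp
""").fetchall()
print(dict(result))  # per-group count of 'selected' rows
```

With the sample rows as shown, this yields one 'selected' row in the test group and three in the control group; the exact counts depend on the real data, of course.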
Hi everyone, first of all thank you for reading and for your potential help.
I'm a beginner in Standard SQL and I'm trying to do something, but I'm stuck.
As you can see in the picture, I have some products with the same item_group_id.
For these products, I want to take the FIRST declinaison value and give it to the other products that have the same item_group_id, in a new column.
To be clearer, I'll give the example for the products I circled.
This is what I'm trying to get:
sku     Declinaison                          item_group_id   NEW_COLUMN
195810  ...multi dimensional sophistiqué_10  P195800         ...multi dimensional sophistiqué_10
195820  ...multi dimensional sophistiqué_20  P195800         ...multi dimensional sophistiqué_10
Thank you so much for your help
One way to achieve this would be a JOIN clause referencing the same table twice (a self-join). However, that approach is not recommended, as it computes many more rows than needed.
Using an analytic function such as FIRST_VALUE is the recommended approach:
SELECT
sku, declinaison, item_group_id,
FIRST_VALUE(declinaison) OVER (PARTITION BY item_group_id ORDER BY sku) AS NEW_COLUMN
FROM
TABLE_NAME
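FIRST_VALUE behaves the same way in any engine with window functions. Here is a small runnable sketch using Python's bundled SQLite (window functions require SQLite 3.25+); the two rows are stand-ins shaped like the example above, not the real catalogue data.

```python
import sqlite3

# Two illustrative rows: (sku, declinaison, item_group_id).
rows = [
    ("195810", "multi dimensional sophistiqué_10", "P195800"),
    ("195820", "multi dimensional sophistiqué_20", "P195800"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, declinaison TEXT, item_group_id TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)

# FIRST_VALUE picks the declinaison of the lowest sku inside each
# item_group_id partition and repeats it on every row of that group.
result = conn.execute("""
    SELECT sku, declinaison, item_group_id,
           FIRST_VALUE(declinaison) OVER (PARTITION BY item_group_id ORDER BY sku) AS new_column
    FROM products
""").fetchall()
for row in result:
    print(row)
```

Both rows of the P195800 group come back with the first row's declinaison in the new column, which is exactly the output the question asks for.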
Can we do this in SQL?
Col1,Col2
Asset,1234
Date,1/1/2020
Date2,1/31/2020
Value,10
Asset,12344
Date,1/10/2020
Date2,1/21/2020
Value,45
Asset,12345
Date,1/15/2020
Date2,1/36/2020
Value,99
I have the asset numbers I want to query, but I would like to get the corresponding rows as well.
So when I write WHERE Asset = 12344, I need the Asset, Date, Date2, and Value information for asset 12344.
I've never come across this before. Is there any way we could achieve this?
Thanks.
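The core difficulty here is that nothing in the table ties the four rows of one asset together, and SQL tables have no inherent row order. As a sketch of the grouping logic only (in Python, assuming the rows are read back in their original file order, which plain SQL does not guarantee; a robust fix would add an explicit row-number or group-id column):

```python
# (Col1, Col2) pairs exactly as listed in the question.
rows = [
    ("Asset", "1234"), ("Date", "1/1/2020"), ("Date2", "1/31/2020"), ("Value", "10"),
    ("Asset", "12344"), ("Date", "1/10/2020"), ("Date2", "1/21/2020"), ("Value", "45"),
    ("Asset", "12345"), ("Date", "1/15/2020"), ("Date2", "1/36/2020"), ("Value", "99"),
]

def rows_for_asset(rows, asset):
    """Collect the key/value rows into per-asset groups, starting a new
    group at every 'Asset' row, then return the requested asset's group."""
    groups = []
    for col1, col2 in rows:
        if col1 == "Asset":
            groups.append({})
        groups[-1][col1] = col2
    return next(g for g in groups if g["Asset"] == asset)

print(rows_for_asset(rows, "12344"))
```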
Please see the attached image (this is the table I am looking at, replicated in Excel, called dbo.StaffDetails).
This row has been duplicated because the surgeon code 127 has been incorrectly input in Surgeon1Code instead of Surgeon2Code. Is it possible to write a query so that, when the SourceID is the same and the Surgeon1, Surgeon2, and Surgeon3 codes are not on the same row, the rows are all merged? (I hope this makes sense.)
So ideally, the attached image would have 127 as Surgeon2Code instead of it being on its own row.
Many thanks for any help with this; I really appreciate it.
You can use grouping. Something like:
SELECT SourceID,
MIN(Surgeon1Code) AS Surgeon1Code,
MAX(Surgeon1Code) AS Surgeon2Code,
MAX(Surgeon3Code) AS Surgeon3Code,
MAX(Anaesthetist1Code) AS Anaesthetist1Code,
....
FROM dbo.StaffDetails
GROUP BY SourceID
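The MIN/MAX trick can be seen on a tiny SQLite mock-up in Python (the SourceID and codes below are illustrative, not from the real dbo.StaffDetails table): both surgeon codes sit in the Surgeon1Code column across two rows, so MIN keeps one of them as Surgeon1Code and MAX promotes the other to Surgeon2Code.

```python
import sqlite3

# Illustrative duplicate rows: both surgeon codes landed in Surgeon1Code.
rows = [
    (101, 55),   # (SourceID, Surgeon1Code)
    (101, 127),  # duplicate row created by the mis-keyed code
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StaffDetails (SourceID INT, Surgeon1Code INT)")
conn.executemany("INSERT INTO StaffDetails VALUES (?, ?)", rows)

# Grouping on SourceID merges the pair: MIN keeps the lower code as
# Surgeon1Code, MAX promotes the other one to Surgeon2Code.
merged = conn.execute("""
    SELECT SourceID,
           MIN(Surgeon1Code) AS Surgeon1Code,
           MAX(Surgeon1Code) AS Surgeon2Code
    FROM StaffDetails
    GROUP BY SourceID
""").fetchall()
print(merged)  # one merged row per SourceID
```

Note that MIN/MAX can only separate two codes per SourceID this way; if codes were spread over three or more rows, a more elaborate pivot would be needed.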
What I'd like to do is recreate the bigram data from the available public trigram data on BigQuery. Along the way, I'd like to trim down the data. It's hard because there seems to be a list of data within a single row: for example, cell.value is a column that contains all the years, it could have 100 elements in it, and all of that is in one row.
The columns I'd like are something like this:
ngram, first, second, third, cell.match_count*modified
where the modified last column is the sum of all match counts from 2000-2008 (ignoring all the older data). I suspect this would greatly reduce the size of the file (along with a few other tweaks).
The code I have so far is below (and I have to run two separate queries for this):
SELECT ngram, cell.value, cell.match_count
FROM [publicdata:samples.trigrams]
WHERE ngram = "I said this"
AND cell.value in ("2000","2001","2002","2003","2004","2005","2006","2007","2008")
SELECT ngram, SUM(cell.match_count) as total
FROM [one_syllable.test]
GROUP BY ngram
The result is 2 columns with 1 row of data: I said this, 1181.
But I'd like to get this for every ngram before I do some more trimming.
How can I combine the queries so it's all done at once, and also return the columns first, second, and third?
Thanks!
P.S. I've tried:
SELECT ngram, cell.value, cell.match_count
FROM [publicdata:samples.trigrams]
WHERE cell.value in ("2000","2001","2002","2003","2004","2005","2006","2007","2008")
But I get an error "response too large to return"...
The error "response too large to return" means that you will have to write the results to a destination table, with "Allow Large Results" checked. BigQuery won't return more than 128MB directly without using a destination table.
You should be able to generate the table you want using some aggregation functions. Try "GROUP EACH BY ngram" to aggregate in parallel and use the FIRST function to pick a single value from the first, second and third columns. It would look something like this:
SELECT ngram, FIRST(first), FIRST(second), FIRST(third), SUM(cell.match_count)
FROM [publicdata:samples.trigrams]
WHERE cell.value in ("2000","2001","2002","2003","2004","2005","2006","2007","2008")
GROUP EACH BY ngram;
Google BigQuery now has arrays on the public trigrams dataset, and the original answer needs to be modified to flatten the array (cell, in this case) using the UNNEST function. Modified sample SQL code is below.
SELECT t1.ngram, t1.first, t1.second, t1.third, SUM(c.match_count)
FROM `bigquery-public-data.samples.trigrams` t1, UNNEST(cell) AS c
WHERE EXISTS (
  SELECT 1 FROM UNNEST(c.value) AS v
  WHERE v IN ('2000','2001','2002','2003','2004','2005','2006','2007','2008')
)
GROUP BY 1, 2, 3, 4;
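As a rough Python analogue of what UNNEST plus the year filter and SUM are doing (the sample row and its counts below are made up for illustration, not real dataset content): each trigram row carries a nested list of per-year cells, looping over that list is the UNNEST, and only 2000-2008 counts are added into the per-ngram total.

```python
# Each trigram row carries a nested list of per-year cells, mirroring the
# repeated `cell` record in the dataset; the numbers here are invented.
YEARS = {str(y) for y in range(2000, 2009)}  # 2000-2008 inclusive

rows = [
    {"ngram": "I said this", "first": "I", "second": "said", "third": "this",
     "cell": [{"value": "1999", "match_count": 7},    # outside 2000-2008, dropped
              {"value": "2001", "match_count": 500},
              {"value": "2005", "match_count": 681}]},
]

totals = {}
for row in rows:
    key = (row["ngram"], row["first"], row["second"], row["third"])
    for c in row["cell"]:          # this inner loop is what UNNEST(cell) does
        if c["value"] in YEARS:    # the WHERE year filter
            totals[key] = totals.get(key, 0) + c["match_count"]
print(totals)
```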