I was trying with cross join and unnest, but I only managed to split one column, not all three at the same time.
I have this table in Amazon Athena:

COL1    COL2   COL3    COL4    COL5    COL6
765045  5782   jd938   [1, 2]  [a, b]  [pickup, delivery]
41118   78995  kd982   [5, 8]  [g, q]  [pickup, delivery]
411620  65852  km0899  [9, 6]  [k, b]  [pickup, delivery]

and I want to separate the columns that contain lists into rows, leaving a table like this:
COL1    COL2   COL3    COL4  COL5  COL6
765045  5782   jd938   1     a     pickup
765045  5782   jd938   2     b     delivery
41118   78995  kd982   5     g     pickup
41118   78995  kd982   8     q     delivery
411620  65852  km0899  9     k     pickup
411620  65852  km0899  6     b     delivery
select
t.COL1, t.COL2, t.COL3, u.COL4
from t
cross join
unnest(t.COL4) u(COL4)
I was thinking of making subtables and repeating this code three times, but I wanted to know if there is a more efficient way.
unnest supports handling multiple columns in one statement. You can also use a more succinct syntax that omits the CROSS JOIN:
select
t.COL1, t.COL2, t.COL3, u.COL4, u.COL5, u.COL6
from t,
unnest(t.COL4, t.COL5, t.COL6) AS u(COL4, COL5, COL6)
Note that for arrays of different cardinality it will substitute nulls for the missing values. And if all the arrays are empty, the row will not appear in the final result at all (but you can work around this by adding a dummy one-element array, as was done here).
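A minimal sketch of that padding behavior, runnable as-is in Athena/Presto (the literal arrays are just for illustration):

-- arrays of different lengths: the shorter one is padded with NULLs
SELECT a, b
FROM UNNEST(ARRAY[1, 2, 3], ARRAY['x']) AS t(a, b)
-- returns (1, 'x'), (2, NULL), (3, NULL)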
I have data that looks like:
row  col1  col2  col3  ...  coln
1    A     null  B     ...  null
2    null  B     C     ...  D
3    null  null  null  ...  A
I want to condense the columns together to get:
row  final
1    A, B
2    B, C, D
3    A
The order of the letters doesn't matter, and if the solution includes the nulls (e.g. A, null, B, null, etc.) I can work out how to remove them later. I've written coln because I have about 200 columns to condense.
I've tried a few things, and if I were trying to condense rows I could use STRING_AGG() (example).
Additionally I could do this:
SELECT
  CONCAT(col1, ", ", col2, ", ", col3, ", ", coln)  # etc.
FROM mytable
However, this would involve writing out each column name by hand, which isn't really feasible. Is there a better way to achieve this, ideally for the whole table? Additionally, CONCAT returns NULL if any value is NULL.
#standardSQL
select row,
  (
    -- format('%t', struct) prints the row's remaining columns as '(v1, v2, ...)';
    -- trim the parentheses, split on ', ', drop the 'NULL' placeholders, re-aggregate
    select string_agg(col, ', ' order by offset)
    from unnest(split(trim(format('%t', (select as struct t.* except(row))), '()'), ', ')) col with offset
    where not upper(col) = 'NULL'
  ) as final
from `project.dataset.table` t
Applied to the sample data in your question, this produces the expected output shown above.
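For reference, this is the behavior the query leans on: FORMAT('%t', ...) applied to a STRUCT prints the fields in parentheses, with NULL fields rendered as the text NULL (the values below are just the sample data):

SELECT FORMAT('%t', STRUCT('A' AS col1, NULL AS col2, 'B' AS col3))
-- returns '(A, NULL, B)'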
Not in the exact format that you asked for, but you can try whether this simplifies things for you:
SELECT TO_JSON_STRING(mytable) FROM mytable
If you want the exact format, you can write a regex to extract values from the output JSON string.
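A sketch of what this returns for the sample rows (keys follow the table's column order):

SELECT TO_JSON_STRING(t) FROM mytable t
-- e.g. {"row":1,"col1":"A","col2":null,"col3":"B",...,"coln":null}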
I have a table with a field whose contents are a concatenated list of selections from a multi-select form. I would like to convert the data in this field into another table where each row has the text of a selection and a count of the number of times that selection was made.
eg.
Original table:
id  selections
1   A;B
2   B;D
3   A;B;D
4   C
I would like to get the following out:
selection  count
A          2
B          3
C          1
D          2
I could easily do this with split and maps in JavaScript etc., but I'm not sure how to approach it in SQL (I use PostgreSQL). The goal is to use the second table to plot a graph in Google Data Studio.
A much simpler solution:
select regexp_split_to_table(selections, ';'), count(*)
from test_table
group by 1
order by 1;
You can use a lateral join and the handy set-returning function regexp_split_to_table() to unnest the strings to rows, then aggregate and count:
select x.selection, count(*) cnt
from mytable t
cross join lateral regexp_split_to_table(t.selections, ';') x(selection)
group by x.selection
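A self-contained way to try either query, using the sample data from the question (the table name mytable matches the answer above):

create table mytable (id int, selections text);
insert into mytable values (1, 'A;B'), (2, 'B;D'), (3, 'A;B;D'), (4, 'C');

select x.selection, count(*) as cnt
from mytable t
cross join lateral regexp_split_to_table(t.selections, ';') x(selection)
group by x.selection
order by x.selection;
-- selection | cnt
-- A         | 2
-- B         | 3
-- C         | 1
-- D         | 2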
I'm using SQL Server 14 and I need to count the number of null values in a row to create a new column where a "% of completeness" for each row will be stored. For example, if 9 out of 10 columns contain values for a given row, the % for that row would be 90%.
I know this can be done via a number of Case expressions, but the thing is, this data will be used for a live dashboard and won't be under my supervision after completion.
I would like this % to be calculated every time a function (or procedure? not sure what is used in this case) is run, and I need to know the number of columns that exist in my table so I can count the null values in a row and divide by the number of columns to find the "% of completeness".
Any help is greatly appreciated!
Thank you
One method uses cross apply to unpivot the columns to rows and compute the ratio of non-null values.
Assuming that your table has columns col1 to col4, you would write this as:
select t.*, x.*
from mytable t
cross apply (
    -- the average of 1.0/0 flags is the fraction of non-null columns in this row
    select avg(case when col is not null then 1.0 else 0 end) as completeness_ratio
    from (values (col1), (col2), (col3), (col4)) v(col)
) x
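If the table has too many columns to list by hand, one option (a sketch; dbo.mytable is a placeholder) is to generate that VALUES list from the catalog once and paste it into the query. FOR XML PATH is used here because it works on older SQL Server versions:

-- builds a string like: (col1), (col2), (col3), ...
SELECT STUFF((
    SELECT ', (' + QUOTENAME(c.name) + ')'
    FROM sys.columns c
    WHERE c.object_id = OBJECT_ID('dbo.mytable')
    ORDER BY c.column_id
    FOR XML PATH('')
), 1, 2, '');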
I'm asking for a solution without functions or procedures (permissions problem).
I have a table like this:
where k = the number of columns (in reality, k = 500):
col1  col2  col3  col4  col5  ...  col(k)
10    20    30    -50   60    ...  100
and I need to create a cumulative row like this:
col1  col2  col3  col4  col5  ...  col(k)
10    30    60    10    70    ...  X
In Excel it's simple to write a formula and drag it across, but in SQL, with a lot of columns, it seems very clumsy to add them manually (col1 as col1, col1+col2 as col2, col1+col2+col3 as col3, and so on up to colk). Is there a good way to solve this problem?
You say that you've changed your data model to rows. So let's say that the new table has three columns:
grp (some group key to identify which rows belong together, i.e. what was one row in your old table)
pos (a position number from 1 to 500 to indicate the order of the values)
value
You get the cumulative sums with SUM OVER:
select grp, pos, value, sum(value) over (partition by grp order by pos) as running_total
from mytable
order by grp, pos;
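A quick check of that query with the question's sample values stored as rows (PostgreSQL-style VALUES list; column names as in the answer):

with mytable (grp, pos, value) as (
    values (1, 1, 10), (1, 2, 20), (1, 3, 30), (1, 4, -50), (1, 5, 60)
)
select grp, pos, value,
       sum(value) over (partition by grp order by pos) as running_total
from mytable
order by grp, pos;
-- running_total: 10, 30, 60, 10, 70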
If this "colk" is going to be needed/used in a lot of reports, I suggest you create a computed column or a view to sum all the columns using k = cola+colb+...
There's no function in sql to sum up columns (ex. between colA and colJ)
Given a Google BigQuery dataset with columns col_1...col_m, how can you use BigQuery SQL to return the dataset with no duplicates in, say, [col1, col3, col7], such that when several rows share the same values in [col1, col3, col7], the first of those rows is returned and the rest are removed?
Example: removeDuplicates([col1, col3])
col1 col2 col3
---- ---- ----
r1: 20 25 30
r2: 20 70 30
r3: 40 70 30
returns
col1 col2 col3
---- ---- ----
r1: 20 25 30
r3: 40 70 30
To do this using Python pandas is easy: for a DataFrame, you call drop_duplicates(subset=[field1, field2, ...]). However, no such removeDuplicates operation exists within Google BigQuery SQL.
My best guess for how to do it in Google BigQuery is to use the rank() function:
https://cloud.google.com/bigquery/query-reference#rank
I am looking for a concise solution if one exists.
You can group by all of the columns that you want to remove duplicates from, and use FIRST() for the others. That is, removeDuplicates([col1, col3]) would translate to:
SELECT col1, FIRST(col2) as col2, col3
FROM table
GROUP EACH BY col1, col3
Note that in BigQuery SQL, if you have more than a million distinct values for col1 and col3, you'll need the EACH keyword.
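Since the question also mentions rank(), here is a sketch of the same operation in BigQuery standard SQL using ROW_NUMBER() (which row counts as "first" is arbitrary unless you add an ORDER BY inside the window):

SELECT col1, col2, col3
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col3) AS rn
  FROM `project.dataset.table`
)
WHERE rn = 1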