Split function across multiple fields in BigQuery SQL - google-bigquery

I have data like this:
Each column will have the same number of elements across a row, where the first element in the first column corresponds to the first element in the second column etc.
How can I flatten this to get the below?
With a single column I can do this by combining a CROSS JOIN with an UNNEST, but I cannot get this to work with multiple columns: the join ends up creating every combination of elements, and UNNEST loses the order of the array, so I can't match them up.
If I were building the arrays from scratch, I would use some kind of STRUCT element in there, but I can't find a way of doing this when the arrays are created from a SPLIT().

WITH OFFSET is your friend here:
WITH strings AS (
  SELECT "a,b,c" a, "aa,bb,cc" b
  UNION ALL
  SELECT "a1,b1,c1" a, "aa1,bb1,cc1" b
)
SELECT x_a, x_b
FROM strings
  , UNNEST(SPLIT(a)) x_a WITH OFFSET o_a
  JOIN UNNEST(SPLIT(b)) x_b WITH OFFSET o_b
  ON o_a = o_b

Another approach for BigQuery Standard SQL is shown below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 'a|b|c' col1, 'n|o|p' col2 UNION ALL
SELECT 2, 'd|e', 'q|r' UNION ALL
SELECT 3, 'f|g|h|i', 's|t|u|v' UNION ALL
SELECT 4, 'j', 'w' UNION ALL
SELECT 5, 'k|l|m', 'x|y|z'
)
SELECT
id,
SPLIT(col1, '|')[SAFE_ORDINAL(pos)] value1,
SPLIT(col2, '|')[SAFE_ORDINAL(pos)] value2
FROM `project.dataset.table`,
UNNEST(GENERATE_ARRAY(1, ARRAY_LENGTH(SPLIT(col1, '|')))) pos
with expected result
Row id value1 value2
1 1 a n
2 1 b o
3 1 c p
4 2 d q
5 2 e r
6 3 f s
7 3 g t
8 3 h u
9 3 i v
10 4 j w
11 5 k x
12 5 l y
13 5 m z
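The positional pairing that both answers rely on can also be checked outside BigQuery. The following is only a minimal Python sketch of the same idea, not BigQuery code: the `flatten` helper and its `sep` parameter are hypothetical names, and the rows are copied from the sample data above.

```python
# Sample rows copied from the query above: (id, col1, col2).
rows = [
    (1, "a|b|c", "n|o|p"),
    (2, "d|e", "q|r"),
    (3, "f|g|h|i", "s|t|u|v"),
    (4, "j", "w"),
    (5, "k|l|m", "x|y|z"),
]


def flatten(rows, sep="|"):
    """Split both columns and pair their elements by position (hypothetical helper)."""
    out = []
    for row_id, col1, col2 in rows:
        # zip() pairs element i of col1 with element i of col2,
        # which mirrors joining the two UNNESTs on equal offsets.
        for value1, value2 in zip(col1.split(sep), col2.split(sep)):
            out.append((row_id, value1, value2))
    return out


flattened = flatten(rows)
for record in flattened:
    print(*record)
```

Note that `zip()` silently stops at the shorter list, so this sketch relies on the question's guarantee that both columns hold the same number of elements in every row.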

Related

Create multiple rows based on 1 column

I currently have a table with a quantity in it.
ID Code Quantity
1 A 1
2 B 3
3 C 2
4 D 1
Is there any way to write a SQL statement that would get me
ID Code Quantity
1 A 1
2 B 1
2 B 1
2 B 1
3 C 1
3 C 1
4 D 1
I need to break out the quantity and have that many rows.
Thanks
Here's one option using a numbers table to join to:
with numberstable as (
select 1 AS Number
union all
select Number + 1 from numberstable where Number<100
)
select t.id, t.code, 1
from yourtable t
join numberstable n on t.quantity >= n.number
order by t.id
Please note, depending on which database you are using, this may not be the correct approach to creating the numbers table. This works in most databases supporting common table expressions. But the key to the answer is the join and the on criteria.
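As one way to try the join, here is a small sketch that runs the same idea in SQLite through Python's standard `sqlite3` module. It assumes an in-memory database and the table name `yourtable` from the answer, and it spells the CTE as `WITH RECURSIVE`, which some engines (PostgreSQL, for example) require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, code TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "B", 3), (3, "C", 2), (4, "D", 1)],
)

query = """
WITH RECURSIVE numberstable(number) AS (
    SELECT 1
    UNION ALL
    SELECT number + 1 FROM numberstable WHERE number < 100
)
SELECT t.id, t.code, 1 AS quantity
FROM yourtable t
-- one output row per number that does not exceed the quantity
JOIN numberstable n ON t.quantity >= n.number
ORDER BY t.id
"""

expanded = conn.execute(query).fetchall()
for row in expanded:
    print(row)
```

The upper bound of 100 is taken from the answer; quantities above it would be capped at 100 rows, so it would need raising for larger values.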
One way would be to generate an array with X elements (where X is the quantity). So for rows
ID Code Quantity
1 A 1
2 B 3
3 C 2
you would get
ID Code Quantity ArrayVar
1 A 1 [1]
2 B 3 [1,2,3]
3 C 2 [1,2]
using a sequence function (e.g., in PrestoDB, sequence(start, stop) -> array(bigint)).
Then unnest the array so that each ID yields X rows, and set the quantity to 1. I'm not sure which SQL dialect you're using, but this should work!
In Oracle, you can use a CONNECT BY query to generate the extra rows and cross join them to the table to get your desired output.
The solution below should be fairly robust.
select
"ID",
"Code",
1 QUANTITY
from Table1, table(cast(multiset
(select level from dual
connect by level <= Table1."Quantity") as sys.OdciNumberList));

How to update table with concatenation

I have table like this
create table aaa (id int not null, data varchar(50), numb int);
with data like this
begin
for i in 1..30 loop
insert into aaa
values (i, dbms_random.string('L',1),dbms_random.value(0,10));
end loop;
end;
Now I'm running this:
select a.id, a.data, a.numb,
count(*) over (partition by a.numb order by a.data) count,
b.id, b.data,b.numb
from aaa a, aaa b
where a.numb=b.numb
and a.data!=b.data
order by a.data;
I want to update every row where the numbers are the same but the letters differ, so that the data column ends up holding more than one letter (for example "a c d e"); in other words, I want to concatenate the letters. How can I do that? The idea is something like a GROUP BY on the number, with the grouped letters combined into a single value.
This is how it looks at the beginning:
id | data |numb
1 q 1
2 z 8
3 i 7
4 a 2
5 q 4
6 h 1
7 b 9
8 u 9
9 s 4
This is what I would like to get at the end:
id | data |numb
1 q h 1
2 z 8
3 i 7
4 a 2
5 q s 4
7 b u 9
Try this
SELECT MIN(id),
LISTAGG(data,' ') WITHIN GROUP(
ORDER BY data
) data,
numb
FROM aaa GROUP BY numb
ORDER BY 1
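To try the grouping without an Oracle instance, the sketch below runs a comparable query in SQLite via Python's `sqlite3`, using the sample rows from the question. It swaps LISTAGG for SQLite's `group_concat`, which is an assumption about a suitable substitute, not the answer's method; without an explicit ordering, SQLite does not promise the order of the concatenated letters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aaa (id INTEGER NOT NULL, data TEXT, numb INTEGER)")
conn.executemany(
    "INSERT INTO aaa VALUES (?, ?, ?)",
    [(1, "q", 1), (2, "z", 8), (3, "i", 7), (4, "a", 2), (5, "q", 4),
     (6, "h", 1), (7, "b", 9), (8, "u", 9), (9, "s", 4)],
)

# group_concat(data, ' ') stands in for LISTAGG(data, ' ') here;
# the order of letters inside each group is not guaranteed.
merged = conn.execute(
    """
    SELECT MIN(id) AS id, group_concat(data, ' ') AS data, numb
    FROM aaa
    GROUP BY numb
    ORDER BY 1
    """
).fetchall()

for row in merged:
    print(row)
```

This only selects the merged rows; turning it into an actual update of the table (and removing the now-redundant rows) is a separate step that the query above does not perform.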
This selects 10 random strings 1 to 4 letters long, letters in words may repeat:
select level, dbms_random.string('l', dbms_random.value(1, 4))
from dual connect by level <= 10
This selects 1 to 10 random strings 1 to 26 letters long, letters do not repeat and are sorted:
with aaa(id, data, numb) as (
select level, dbms_random.string('L', 1),
round(dbms_random.value(0, 10))
from dual connect by level <= 30)
select numb, listagg(data) within group (order by data) list
from (select distinct data, numb from aaa)
group by numb

SQL - Return All Possible Combinations of Values in a Column by Means of Two New Columns

I want to return all possible combinations of values in a column by means of two new columns. E.g. my column consists of the values (A,B,C,D). The possible combinations of those values are (A,B), (A,C), (A,D), (B,C), (B,D), (C,D), (A,B,C), (B,D,C), (D,C,A), (D,A,B). [Remark: I don't want to consider (1) the combinations with just one value, (2) the combination with all values and (3) the combination with no values. Thus I have 2^(n)-n-1-1 combinations for n different values.] I want to list all those combinations in two columns as demonstrated below.
Consider that I start with this column:
Col0
----
A
B
C
D
Out of Col0 I want to produce the 10 combinations using two columns:
Col1 Col2
---- ----
1 A
1 B
2 A
2 C
3 A
3 D
4 B
4 C
5 B
5 D
6 C
6 D
7 A
7 B
7 C
8 B
8 C
8 D
9 C
9 D
9 A
10 D
10 A
10 B
How do I do this in SQL? I use SQLite.
Thank you a lot!
I have a solution, but it requires two changes...
Each item must be given an id (starting from 1)
The output id's may not be sequential
id | datum
----+-------
1 | A
2 | B
3 | C
4 | D
(The output id's I calculate are effectively identifiers for each Permutation, but I don't output permutations you're not interested in...)
group_id | datum
----------+-------
6 | A
6 | B
7 | A
7 | C
8 | A
8 | D
12 | B
12 | C
13 | B
13 | D
18 | C
18 | D
32 | A
32 | B
32 | C
33 | A
33 | B
33 | D
38 | A
38 | C
38 | D
63 | B
63 | C
63 | D
http://dbfiddle.uk/?rdbms=sqlite_3.8&fiddle=87d670ecaba8b735cb3f95fa66cea96b
http://dbfiddle.uk/?rdbms=sqlite_3.8&fiddle=26e4f59874009ef95367d85565563c3c
WITH
cascade AS
(
SELECT
1 AS depth,
NULL AS parent_id,
id,
datum,
id AS datum_id
FROM
sample
UNION ALL
SELECT
parent.depth + 1,
parent.id,
parent.id * (SELECT MAX(id)+1 FROM sample) + child.id - 1,
child.datum,
child.id
FROM
cascade AS parent
INNER JOIN
sample AS child
ON child.id > parent.datum_id
),
travelled AS
(
SELECT
depth AS depth,
parent_id AS parent_id,
id AS group_id,
datum AS datum,
datum_id AS datum_id
FROM
cascade
WHERE
depth NOT IN (1, (SELECT COUNT(*) FROM sample))
UNION ALL
SELECT
parent.depth,
parent.parent_id,
child.group_id,
parent.datum,
parent.datum_id
FROM
travelled AS child
INNER JOIN
cascade AS parent
ON parent.id = child.parent_id
)
SELECT
group_id,
datum
FROM
travelled
ORDER BY
group_id,
datum_id
The first CTE walks all the available combinations (recursively) creating a directed graph. At this stage I don't exclude combinations of one item, or all items, but I do exclude equivalent permutations.
Each node also has a unique identifier calculated for it. There are gaps in these ids, because the calculation would also work for all permutations, even though they're not all included.
Taking any node in that graph and walking up to the final parent node (recursively again) will always give a different combination than if you started from a different node in the graph.
So the second CTE does all of those walks, excluding the combinations of "just one item" and "all items".
The final select just outputs the results in order.
The gaps in the id's are probably avoidable but the maths is too hard for my head at the end of a working day.
The idea is to enumerate the power set by assigning each value a power of 2, then iterating from 1 to 2^n - 1 and keeping the elements whose corresponding bit is set.
-- map each value with a power of 2 : 1, 2, 4, 8, 16
with recursive ELEMENTS(IDX, POW, VAL) as (
-- init with dummy values
values(-1, 0.5, null)
union all
select IDX + 1,
POW * 2,
-- index the ordered values from 0 to N - 1
( select COL0
from DATA d1
where (select count(*) from DATA d2 where d2.COL0 < d1.COL0) = IDX + 1)
from ELEMENTS
where IDX + 1 < (select count(*) from data)
), POWER_SETS(ITER, VAL, POW) as (
select 1, VAL, POW from ELEMENTS where VAL is not null
union all
select ITER + 1, VAL, POW
from POWER_SETS
where ITER < (select SUM(POW) from elements) )
select ITER, VAL from POWER_SETS
-- only if the value's bit is set
where ITER & POW != 0
EDIT: 2nd version, with help from MatBailie. Only one of the CTEs is recursive, and singleton subsets are excluded.
WITH RECURSIVE
-- number the values
elements(val, idx) AS (
SELECT d1.col0, (select count(*) from DATA d2 where d2.COL0 < d1.COL0)
FROM DATA d1
),
-- iterate from 3 (1 and 2 are singletons)
-- to 2^n - 1 (subset containing all the elements)
subsets(iter) AS (
VALUES(3)
UNION ALL
SELECT iter + 1
from subsets
WHERE iter < (1 << (SELECT COUNT(*) FROM elements)) - 1
)
SELECT iter AS Col1, val AS Col2
FROM elements
CROSS JOIN subsets
-- the element is present in this subset (the bit is set)
WHERE iter & (1 << idx) != 0
-- exclude singletons (another idea from MatBailie)
AND iter != (iter & -iter)
ORDER BY iter, val
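The bitmask idea is easy to check on its own. Below is a minimal Python sketch of the same enumeration; `proper_subsets` is a hypothetical helper name. It differs from the query above in one respect: the query appears to iterate up to 2^n - 1 inclusive, which keeps the subset containing all values, whereas this sketch stops one short so that the full set is dropped as the question asks.

```python
def proper_subsets(values):
    """Return (mask, members) for subsets of size 2 .. n-1 (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    full = (1 << n) - 1  # mask with every bit set: the subset of all values
    result = []
    for mask in range(1, full):  # range() stops before the full set
        if mask & (mask - 1) == 0:
            # exactly one bit set: a singleton, skip it
            continue
        # keep the elements whose bit is set in this mask
        members = [v for i, v in enumerate(ordered) if mask & (1 << i)]
        result.append((mask, members))
    return result


subsets = proper_subsets(["A", "B", "C", "D"])
for mask, members in subsets:
    print(mask, members)
```

For four values this leaves 2^4 - 4 - 1 - 1 = 10 subsets, matching the count given in the question.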
If window functions and CTEs are available, you can use the following approach:
with data_rn as
(
select d1.col0 col1,
d2.col0 col2,
row_number() over (order by d1.col0) rn
from data d1
inner join data d2 on d1.col0 > d2.col0
)
select rn, col1 from data_rn
union all
select rn, col2 from data_rn
order by rn

SQL query: same rows

I'm having trouble finding the right sql query. I want to select all the rows with a unique x value and if there are rows with the same x value, then I want to select the row with the greatest y value. As an example I've put a part of my database below.
ID x y
1 2 3
2 1 5
3 4 6
4 4 7
5 2 6
The selected rows should then be those with ID 2, 4 and 5.
This is what I've got so far
SELECT *
FROM base
WHERE x IN
(
SELECT x
FROM base
HAVING COUNT(*) > 1
)
But this only results in the rows that occur more than once. I've added the tags R, postgresql and sqldf because I'm working in R with those packages.
Here is a typical way to formulate the query in ANSI SQL:
select b.*
from base b
where not exists (select 1
from base b2
where b2.x = b.x and
b2.y > b.y
);
In Postgres, you would use distinct on for performance:
select distinct on (x) b.*
from base b
order by x, y desc;
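The NOT EXISTS form can be checked against the sample rows from the question. This is a small sketch that runs it in SQLite through Python's `sqlite3` rather than in Postgres; the in-memory database and the added ORDER BY are assumptions made only to keep the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (id INTEGER, x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO base VALUES (?, ?, ?)",
    [(1, 2, 3), (2, 1, 5), (3, 4, 6), (4, 4, 7), (5, 2, 6)],
)

# Keep a row only if no other row with the same x has a larger y.
top_rows = conn.execute(
    """
    SELECT b.*
    FROM base b
    WHERE NOT EXISTS (SELECT 1
                      FROM base b2
                      WHERE b2.x = b.x AND b2.y > b.y)
    ORDER BY b.id
    """
).fetchall()

for row in top_rows:
    print(row)
```

If two rows share both the same x and the same maximal y, this form returns both of them, whereas DISTINCT ON returns a single row per x.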
You could try this query:
select x, max(y) from base group by x;
And, if you'd also like the id column in the result:
select base.*
from base join (select x, max(y) from base group by x) as maxima
on (base.x = maxima.x and base.y = maxima.max);
Example:
CREATE TABLE tmp(id int, x int ,y int);
INSERT INTO .....
test=# SELECT x, max(y) AS y FROM tmp GROUP BY x;
x | y
---+---
4 | 7
1 | 5
2 | 6

PLSQL or SSRS, How to select having all values in a group?

I have a table like this.
ID NAME VALUE
______________
1 A X
2 A Y
3 A Z
4 B X
5 B Y
6 C X
7 C Z
8 D Z
9 E X
And the query:
SELECT * FROM TABLE1 T WHERE T.VALUE IN (X,Z)
This query gives me
ID NAME VALUE
______________
1 A X
3 A Z
4 B X
6 C X
7 C Z
8 D Z
9 E X
But I want to see all rows for the names that have all of the params. Only A and C have both X and Z values, so my desired result is:
ID NAME VALUE
______________
1 A X
2 A Y
3 A Z
6 C X
7 C Z
How can I get the desired result? It doesn't matter whether it's with SQL or with Reporting Services. Maybe a "GROUP BY ... HAVING" clause will help, but I'm not sure.
By the way, I don't know how many params will be in the list.
I really appreciate any help.
The standard approach would be something like
SELECT id, name, value
FROM table1 a
WHERE name IN (SELECT name
FROM table1 b
WHERE b.value in (x,y)
GROUP BY name
HAVING COUNT(distinct value) = 2)
That would require that you determine how many values are in the list so that you can use a 2 in the HAVING clause if there are 2 elements, a 5 if there are 5 elements, etc. You could also use analytic functions
SELECT id, name, value
FROM (SELECT id,
name,
value,
count(distinct value) over (partition by name) cnt
FROM table1 t1
WHERE t1.value in (x,y))
WHERE cnt = 2
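Since the question says the number of params is not known in advance, the count in the HAVING clause can be derived from the list itself. The following is a sketch under assumptions, run in SQLite via Python's `sqlite3`: `names_with_all` is a hypothetical helper, and the placeholders are generated from the list so the values stay bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "A", "X"), (2, "A", "Y"), (3, "A", "Z"), (4, "B", "X"), (5, "B", "Y"),
     (6, "C", "X"), (7, "C", "Z"), (8, "D", "Z"), (9, "E", "X")],
)


def names_with_all(conn, wanted):
    """Return all rows of the names that have every value in `wanted` (hypothetical helper)."""
    wanted = sorted(set(wanted))  # duplicates would break the count comparison
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
        SELECT id, name, value
        FROM table1
        WHERE name IN (SELECT name
                       FROM table1
                       WHERE value IN ({placeholders})
                       GROUP BY name
                       HAVING COUNT(DISTINCT value) = ?)
        ORDER BY id
    """
    # The last bound parameter is the number of distinct wanted values.
    return conn.execute(query, [*wanted, len(wanted)]).fetchall()


matches = names_with_all(conn, ["X", "Z"])
for row in matches:
    print(row)
```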
I prefer to structure these "sets within sets" queries as an aggregation. I find this is the most flexible approach:
select t.*
from t
where t.name in (select name
from t
group by name
having sum(case when value = 'X' then 1 else 0 end) > 0 and
sum(case when value = 'Y' then 1 else 0 end) > 0
)
The subquery for the IN finds all names that have at least one X value and one Y value. Using the same logic, it is easy to adjust for other conditions (X and Y and Z; X and Y but not Z; and so on). The outer query just returns all the rows instead of the names.
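The conditional-aggregation form can be tried against the question's sample data. This sketch runs it in SQLite via Python's `sqlite3`, with two assumptions: the table is named `table1` as in the question, and the conditions test for X and Z (the pair the question asks about) rather than X and Y.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "A", "X"), (2, "A", "Y"), (3, "A", "Z"), (4, "B", "X"), (5, "B", "Y"),
     (6, "C", "X"), (7, "C", "Z"), (8, "D", "Z"), (9, "E", "X")],
)

# Each SUM counts the rows of a name that carry one required value;
# requiring both sums to be positive keeps names that have X and Z.
result = conn.execute(
    """
    SELECT t.*
    FROM table1 t
    WHERE t.name IN (SELECT name
                     FROM table1
                     GROUP BY name
                     HAVING SUM(CASE WHEN value = 'X' THEN 1 ELSE 0 END) > 0
                        AND SUM(CASE WHEN value = 'Z' THEN 1 ELSE 0 END) > 0)
    ORDER BY t.id
    """
).fetchall()

for row in result:
    print(row)
```

Changing a condition to `= 0` would express "but not" cases, such as names that have X and Z but no Y.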