BigQuery get all values in a range - sql

I have a column that can contain a range, and I want to get all values in that range. For example, I have the following records:
id | number
-------------
1 | 342-345
-------------
2 | 346
And I want it in the following format:
id | number
-------------
1 | 342
| 343
| 344
| 345
-------------
2 | 346
I am using Standard SQL.

You can use split() to split the string and then generate_array() to get the values you want:
select t.*,
       generate_array(cast(split(numbers, '-')[ordinal(1)] as int64),
                      cast(coalesce(split(numbers, '-')[safe_ordinal(2)],
                                    split(numbers, '-')[ordinal(1)]) as int64),
                      1)
from (select 1 as id, '342-345' as numbers union all
      select 2, '346'
     ) t;
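Note that this returns the values as an array in a single row per id. If you want one row per value, as in the desired output, the array can be flattened with UNNEST. A minimal sketch building on the query above (same sample data):
select t.id, n as number
from (select 1 as id, '342-345' as numbers union all
      select 2, '346'
     ) t
cross join unnest(
  -- GENERATE_ARRAY's default step is 1, so the third argument can be omitted
  generate_array(cast(split(t.numbers, '-')[ordinal(1)] as int64),
                 cast(coalesce(split(t.numbers, '-')[safe_ordinal(2)],
                               split(t.numbers, '-')[ordinal(1)]) as int64))
) as n;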

Related

SQL Count Consecutive Rows

I have a SQL table in which I need to count the rows with 0 turnover, but the challenge is that the count resets: I only need the number of consecutive zero-turnover rows since the ID last generated any turnover.
Source data (there are many different IDs; just using 442 and 4500 in this case):
ID |Date | T/O |
442 |2019-12-31 | 0 |
442 |2020-01-01 |200.00|
442 |2020-01-02 | 0 |
442 |2020-02-06 | 0 |
442 |2020-02-07 | 0 |
442 |2020-02-08 | 0 |
442 |2020-02-09 |150.00|
442 |2020-02-10 | 0 |
442 |2020-02-11 | 0 |
442 |2020-02-15 | 0 |
4500 |2020-01-01 | 0 |
Intended results:
442 | 3 |
4500 | 1 |
I thought of using LAG(), but the number of rows between turnover being generated can vary significantly; sometimes it can even be 30+ rows.
SELECT id, COUNT(*) AS [result]
FROM SourceData sd1
WHERE t_o = 0
  AND NOT EXISTS (SELECT 1
                  FROM SourceData sd2
                  WHERE sd1.id = sd2.id
                    AND sd2.t_o != 0
                    AND sd2.[Date] > sd1.[Date])
GROUP BY id
DEMO
First we can get the last non-zero date for each id.
select id, max(date) as date
from example
where t_o > 0
group by id
This will not show a value for 4500 because it lacks a non-zero value.
Then we can use this to select and group only the values after those dates, or all rows if there was no non-zero date for an id.
with last_to as (
  select id, max(date) as date
  from example
  where t_o > 0
  group by id
)
select example.id, count(example.t_o)
from example
-- Use a left join to get all ids in example,
-- even those missing from last_to.
left join last_to on last_to.id = example.id
-- Account for the lack of a last_to row
-- if the ID has no non-zero values.
where last_to.date is null
   or example.date > last_to.date
group by example.id
Demonstration.
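A single-pass alternative is also possible with a window function: compute each id's last non-zero date with MAX(CASE ...) OVER (PARTITION BY id), then count the zero-turnover rows that come after it. This is only a sketch, assuming the same example table and columns as above:
select id, count(*) as result
from (
  select id, date, t_o,
         -- last date on which this id generated any turnover (null if it never did)
         max(case when t_o > 0 then date end) over (partition by id) as last_to_date
  from example
) s
where t_o = 0
  and (last_to_date is null or date > last_to_date)
group by id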

Count string occurrences within a list column - Snowflake/SQL

I have a table with a column that contains a list of strings like below:
EXAMPLE:
STRING User_ID [...]
"[""null"",""personal"",""Other""]" 2122213 ....
"[""Other"",""to_dos_and_thing""]" 2132214 ....
"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]" 4342323 ....
QUESTION:
I want to get a count of the number of times each unique string appears (the strings are separated within the string column by commas), but I only know how to do the following:
SELECT u.STRING, count(u.USERID) as cnt
FROM table u
group by u.STRING
order by cnt desc;
However, the above method doesn't work, as it only counts the number of user IDs that use a specific grouping of strings.
The ideal output using the example above would look like this:
DESIRED OUTPUT:
STRING COUNT_Instances
"null" 1223
"personal" 543
"Other" 324
"to_dos_and_thing" 221
"getting_things_done" 146
"Work!!!!!" 22
Based on your description, here is my sample table:
create table u (user_id number, string varchar);
insert into u values
(2122213, '"[""null"",""personal"",""Other""]"'),
(2132214, '"[""Other"",""to_dos_and_thing""]"'),
(2132215, '"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"' );
I used SPLIT_TO_TABLE to split each string into rows, and then REGEXP_SUBSTR to clean the data. Here's the query and output:
select REGEXP_SUBSTR( s.VALUE, '""(.*)""', 1, 1, 'i', 1 ) extracted, count(*) from u,
lateral SPLIT_TO_TABLE( string , ',' ) s
GROUP BY extracted
order by count(*) DESC;
+---------------------+----------+
| EXTRACTED | COUNT(*) |
+---------------------+----------+
| Other | 2 |
| null | 1 |
| personal | 1 |
| to_dos_and_thing | 1 |
| getting_things_done | 1 |
| TO_dos_and_thing | 1 |
| Work!!!!! | 1 |
+---------------------+----------+
SPLIT_TO_TABLE https://docs.snowflake.com/en/sql-reference/functions/split_to_table.html
REGEXP_SUBSTR https://docs.snowflake.com/en/sql-reference/functions/regexp_substr.html
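Since the sample values look like CSV-escaped JSON arrays, another option is to repair the escaping, parse each value with PARSE_JSON, and use LATERAL FLATTEN instead of splitting on commas. This is only a sketch, assuming the escaping is consistent across the whole column:
select f.value::string as extracted, count(*) as count_instances
from u,
-- TRIM strips the outer quotes and REPLACE turns "" back into ",
-- so PARSE_JSON receives e.g. ["null","personal","Other"]
lateral flatten(input => parse_json(replace(trim(string, '"'), '""', '"'))) f
group by extracted
order by count_instances desc;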

Select one row per JSON element in SQL field / convert JSON field into rows

I have a table with lots of rows like:
ID | Attributes
1 | {"Rank":1, "LoadLocation": London, "Driver":Tom}
2 | {"Rank":2, "LoadLocation": Southampton, "Driver":Dick}
3 | {"Rank":3, "DischargeLocation": Stratford}
There isn't a template for the JSON - it's a dumping ground for any number of attributes of the ID rows.
For use in a join I'd like to get these into a table like this:
ID | Attribute Name | Attribute Value
1 | 'Rank' | 1
1 | 'LoadLocation' | 'London'
1 | 'Driver' | 'Tom'
2 | 'Rank' | 2
2 | 'LoadLocation' | 'Southampton'
2 | 'Driver' | 'Dick'
3 | 'Rank' | 3
3 | 'DischargeLocation'| 'Stratford'
I can see that I probably need to be using OPENJSON, but it seems that for that I need to know the explicit structure. I don't know the structure, even to the point of each row having a different number of attributes.
Any help gratefully received!
If you have SQL Server 2016 or above, you can use OPENJSON with CROSS APPLY:
DECLARE @TestData TABLE (ID INT, Attributes VARCHAR(500))

INSERT INTO @TestData VALUES
(1, '{"Rank":1, "LoadLocation": "London", "Driver":"Tom"}'),
(2, '{"Rank":2, "LoadLocation": "Southampton", "Driver":"Dick"}'),
(3, '{"Rank":3, "DischargeLocation": "Stratford"}')

SELECT T.ID, X.[key] AS [Attribute Name], X.value AS [Attribute Value]
FROM @TestData T
CROSS APPLY (SELECT * FROM OPENJSON(T.Attributes)) AS X
Result:
ID Attribute Name Attribute Value
----------- -------------------- -------------------
1 Rank 1
1 LoadLocation London
1 Driver Tom
2 Rank 2
2 LoadLocation Southampton
2 Driver Dick
3 Rank 3
3 DischargeLocation Stratford
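As a hypothetical follow-up for the join mentioned in the question, the flattened key/value pairs can be filtered or joined on directly; for example, to find every ID whose Driver attribute is 'Tom' (a sketch using the same test data as above):
SELECT T.ID
FROM @TestData T
CROSS APPLY OPENJSON(T.Attributes) X
WHERE X.[key] = 'Driver' AND X.value = 'Tom'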

Separate phone numbers from string in cell - random order

I have a bunch of data that contains a phone number and a birthday as well as other data.
{1997-06-28,07742367858}
{07791100873,1996-07-14}
{30/01/1997,07974335488}
{1997-04-04,07701003703}
{1996-03-11,07480227283}
{1998-06-20,07713817233}
{1996-09-13,07435148920}
{"21 03 2000",07548542539,1st}
{1996-03-09,07539248008}
{07484642432,1996-03-01}
I am trying to extract the phone number from this, but I'm unsure how to get it out when the data is not always in the same order.
I would expect one column that returns the phone number, the next returning the birthday, and then another returning any arbitrary value in the third column slot.
You can try to sort parts of each string by the number of digits they contain. This can be done with the expression:
select length(regexp_replace('1997-06-28', '\D', '', 'g'))
length
--------
8
(1 row)
The query removes curly brackets from strings, splits them by comma, sorts elements by the number of digits and aggregates back to arrays:
with my_data(str) as (
  values
    ('{1997-06-28,07742367858}'),
    ('{07791100873,1996-07-14}'),
    ('{30/01/1997,07974335488}'),
    ('{1997-04-04,07701003703}'),
    ('{1996-03-11,07480227283}'),
    ('{1998-06-20,07713817233}'),
    ('{1996-09-13,07435148920}'),
    ('{"21 03 2000",07548542539,1st}'),
    ('{1996-03-09,07539248008}'),
    ('{07484642432,1996-03-01}')
)
select id, array_agg(elem order by length(regexp_replace(elem, '\D', '', 'g')) desc)
from (
  select id, trim(unnest(string_to_array(str, ',')), '"') as elem
  from (
    select trim(str, '{}') as str, row_number() over () as id
    from my_data
  ) s
) s
group by id
Result:
id | array_agg
----+--------------------------------
1 | {07742367858,1997-06-28}
2 | {07791100873,1996-07-14}
3 | {07974335488,30/01/1997}
4 | {07701003703,1997-04-04}
5 | {07480227283,1996-03-11}
6 | {07713817233,1998-06-20}
7 | {07435148920,1996-09-13}
8 | {07548542539,"21 03 2000",1st}
9 | {07539248008,1996-03-09}
10 | {07484642432,1996-03-01}
(10 rows)
See also this answer: Looking for solution to swap position of date format DMY to YMD, if you want to normalize the dates. You should modify the function from that answer:
create or replace function iso_date(text)
returns date language sql immutable as $$
  select case
    when $1 like '__/__/____' then to_date($1, 'DD/MM/YYYY')
    when $1 like '____/__/__' then to_date($1, 'YYYY/MM/DD')
    when $1 like '____-__-__' then to_date($1, 'YYYY-MM-DD')
    when trim($1, '"') like '__ __ ____' then to_date(trim($1, '"'), 'DD MM YYYY')
  end
$$;
and use it:
select id, a[1] as phone, iso_date(a[2]) as birthday, a[3] as comment
from (
  select id, array_agg(elem order by length(regexp_replace(elem, '\D', '', 'g')) desc) as a
  from (
    select id, trim(unnest(string_to_array(str, ',')), '"') as elem
    from (
      select trim(str, '{}') as str, row_number() over () as id
      from my_data
    ) s
  ) s
  group by id
) s
id | phone | birthday | comment
----+-------------+------------+---------
1 | 07742367858 | 1997-06-28 |
2 | 07791100873 | 1996-07-14 |
3 | 07974335488 | 1997-01-30 |
4 | 07701003703 | 1997-04-04 |
5 | 07480227283 | 1996-03-11 |
6 | 07713817233 | 1998-06-20 |
7 | 07435148920 | 1996-09-13 |
8 | 07548542539 | 2000-03-21 | 1st
9 | 07539248008 | 1996-03-09 |
10 | 07484642432 | 1996-03-01 |
(10 rows)
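If the phone numbers always look like the samples (a run of 11 digits starting with 0), a simpler alternative sketch is to pull the number straight out of the raw string with a regular expression, without splitting and sorting at all (again using the my_data CTE above):
select str,
       -- first run of 11 digits starting with 0 anywhere in the string
       substring(str from '0\d{10}') as phone
from my_data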

How to pivot rows into columns in AWS Athena?

I'm new to AWS Athena and trying to pivot some rows into columns, similar to the top answer in this StackOverflow post.
However, when I tried:
SELECT column1, column2, column3
FROM data
PIVOT
(
MIN(column3)
FOR column2 IN ('VALUE1','VALUE2','VALUE3','VALUE4')
)
I get the error: mismatched input '(' expecting {',', ')'} (Service: AmazonAthena; Status Code: 400; Error Code: InvalidRequestException)
Does anyone know how to accomplish what I am trying to achieve in AWS Athena?
Extending @kadrach's answer.
Assuming a table like this
uid | key | value1 | value2
----+-----+--------+--------
1 | A | 10 | 1000
1 | B | 20 | 2000
2 | A | 11 | 1001
2 | B | 21 | 2001
Single column PIVOT works like this
SELECT
  uid,
  kv1['A'] AS A_v1,
  kv1['B'] AS B_v1
FROM (
  SELECT uid, map_agg(key, value1) kv1
  FROM vtable
  GROUP BY uid
)
Result:
uid | A_v1 | B_v1
----+------+-------
1 | 10 | 20
2 | 11 | 21
Multi column PIVOT works like this
SELECT
  uid,
  kv1['A'] AS A_v1,
  kv1['B'] AS B_v1,
  kv2['A'] AS A_v2,
  kv2['B'] AS B_v2
FROM (
  SELECT uid,
         map_agg(key, value1) kv1,
         map_agg(key, value2) kv2
  FROM vtable
  GROUP BY uid
)
Result:
uid | A_v1 | B_v1 | A_v2 | B_v2
----+------+------+------+-----
1 | 10 | 20 | 1000 | 2000
2 | 11 | 21 | 1001 | 2001
You can do a single-column PIVOT in Athena using map_agg.
SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
  SELECT uid, map_agg(key, value) kv
  FROM vtable
  GROUP BY uid
) t
Credit goes to this website. Unfortunately I've not found a clever way to do a multi-column pivot this way (I nest the query, which is not pretty).
I had the same issue using the PIVOT function. However, I used a workaround to obtain a similarly formatted data set:
select
  columnToGroupOn,
  min(if(colToPivot = 'VALUE1', column3, null)) as VALUE1,
  min(if(colToPivot = 'VALUE2', column3, null)) as VALUE2,
  min(if(colToPivot = 'VALUE3', column3, null)) as VALUE3
from data
group by columnToGroupOn
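For example, applied to the vtable from the earlier answers (a sketch; table and column names as assumed above), the same conditional-aggregation approach reproduces the single-column pivot:
select
  uid,
  min(if(key = 'A', value1, null)) as A_v1,
  min(if(key = 'B', value1, null)) as B_v1
from vtable
group by uid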