I'm starting out in BigQuery, with some experience in PostgreSQL.
The #legacySQL query I'm running successfully is:
SELECT
FIRST(SPLIT(ewTerms, '/')) AS place,
NTH(2, SPLIT(ewTerms, '/')) AS divisor
FROM (SELECT ewTerms FROM account.free)
The string values in the 'ewTerms' column of the table 'free' are single-digit fractions, such as "2/4" and "3/5". This #legacySQL query successfully creates two columns from 'ewTerms', reading:
Row place divisor
1 3 5
2 2 4
I now need to use this column creation inside a WITH clause, so I have to switch to #standardSQL.
Can anyone tell me how to replicate the FIRST() and NTH() string functions using #standardSQL? I've tried:
WITH prep AS (
SELECT
SPLIT(ewTerms, '/') AS split
FROM (SELECT ewTerms FROM accounts.free)
)
SELECT
split[SAFE_ORDINAL(1)] AS place,
split[SAFE_ORDINAL(2)] AS divisor
FROM prep
but this is wrong. Help anyone?
Your question is not clear about what is wrong. This query works for me:
#standardSQL
WITH Input AS (
SELECT '3/5' AS ewTerms UNION ALL
SELECT '2/4' AS ewTerms
), prep AS (
SELECT
SPLIT(ewTerms, '/') AS split
FROM Input
)
SELECT
split[SAFE_ORDINAL(1)] AS place,
split[SAFE_ORDINAL(2)] AS divisor
FROM prep;
The output is:
+-------+---------+
| place | divisor |
+-------+---------+
| 2 | 4 |
| 3 | 5 |
+-------+---------+
Using your original table, your query would be:
#standardSQL
WITH prep AS (
SELECT
SPLIT(ewTerms, '/') AS split
FROM accounts.free
)
SELECT
split[SAFE_ORDINAL(1)] AS place,
split[SAFE_ORDINAL(2)] AS divisor
FROM prep;
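A side note on the array indexing used above, since it trips people up: ORDINAL() is 1-based, OFFSET() is 0-based, and the SAFE_ variants return NULL instead of raising an error when the index is out of range. A minimal sketch:
#standardSQL
SELECT
arr[SAFE_OFFSET(0)] AS first_via_offset, -- 0-based indexing
arr[SAFE_ORDINAL(1)] AS first_via_ordinal, -- 1-based indexing
arr[SAFE_ORDINAL(9)] AS out_of_range -- NULL rather than an error
FROM (SELECT SPLIT('3/5', '/') AS arr);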
Related
Let's say I have the below SQL query:
with unused_cte as
(select 1 as one),
used_cte as
(select 2 as two)
select * from used_cte
I'd like to know how I can automate catching unused CTEs (a CTE that is not used in the SQL execution, for example 'unused_cte' in this case), whether as a SQL assertion, a dbt check, or any other mechanism.
If you're using dbt-core, you can add SQLFluff as your linter locally. Once it is set up, it will catch cases where a CTE inside a data model is not being used (a setup sketch follows the example below).
See example below -> I have a model called stg_testing.sql that looks like the following:
with used_cte as (
select 1 as fake_id
),
unused_cte as (
select 2 as fake_id
),
final as (
select fake_id
from used_cte
)
select * from final
Since SQLFluff detects that the second CTE is not used, it will raise the following problem:
Query defines CTE (unused_cte) but does not use it. sqlfluff(L045) [Ln 5, Col 2]
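For reference, a minimal local setup might look like the following; the file location, dialect, and model path are illustrative, so adjust them to your project:
# .sqlfluff at the dbt project root
[sqlfluff]
templater = dbt
dialect = bigquery
Then lint the model from the command line:
sqlfluff lint models/stg_testing.sql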
It will not be run, as this simple test shows: the first CTE contains a division by zero, which should produce an error, but since the unused CTE isn't executed, the second, valid one runs fine.
Postgres
with cte_unused as (select 1/0 as val), cte_used as (select 'it works' as val)
select * from cte_used
| val |
| :------- |
| it works |
db<>fiddle here
MySQL
with cte_unused as (select 1/0 as val), cte_used as (select 'it works' as val)
select * from cte_used
| val |
| :------- |
| it works |
db<>fiddle here
I would appreciate a push in the right direction on how this might be achieved using GCP BigQuery, please.
I have a column in my table of type string; inside this string there is a repeating sequence of characters that I need to extract and process. To illustrate, let's say the column name is 'instruments'. A possible value for instruments could be:
'band=false;inst=basoon,inst=cello;inst=guitar;cases=false,permits=false'
In which case I need to extract 'basoon', 'cello' and 'guitar'.
I'm more or less a SQL newbie, sorry. So far I have:
SELECT
bandId,
REGEXP_EXTRACT(instruments, r'inst=.*?\;') AS INSTS
FROM `inventory.band.mytable`;
This extracts the instruments substring ('inst=basoon,inst=cello;inst=guitar;') and gives me an output column 'INSTS', but now I think I need to split the values in that column on the comma and do some further processing. This is where I'm stuck, as I cannot see how to structure additional queries or processing blocks.
How can I reference INSTS in order to do subsequent processing? The documentation suggests I should be building subqueries using WITH, but I can't seem to get anything going. Could some kind soul give me a push in the right direction, please?
BigQuery has a function SPLIT() that plays the role of SPLIT_PART() in other databases. (A caveat: the three-argument SPLIT(string, delimiter, index) form used below is SPLIT_PART-style syntax; in BigQuery Standard SQL, SPLIT() returns an array that you index instead, as the options in the next answer show.)
Assuming that you don't alternate between the comma and the semicolon for separating your «key»=«value» pairs, and only use the semicolon:
First you split your instruments string and keep the parts that contain inst=. To do that, you CROSS JOIN with an in-line table of consecutive integers, so that you can call SPLIT(instruments, ';', i) with an increasing integer value for i. You will get strings in the format inst=%, of which you want the part after the equal sign. You get that part by applying another SPLIT(), this time with the equal sign as the delimiter, asking for the second split part:
WITH indata(bandid,instruments) AS (
-- some input, don't use in real query ...
-- I assume that you don't alternate between comma and semicolon for the delimiter, and stick to semicolon
SELECT
1,'band=false;inst=basoon;inst=cello;inst=guitar;cases=false;permits=false'
UNION ALL
SELECT
2,'band=true;inst=drum;inst=cello;inst=bass;inst=flute;cases=false;permits=true'
UNION ALL
SELECT
3,'band=false;inst=12string;inst=banjo;inst=triangle;inst=tuba;cases=false;permits=true'
)
-- real query starts here, replace following comma with "WITH" ...
,
-- need a series of consecutive integers ...
i(i) AS (
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
)
SELECT
bandid
, i
, SPLIT(SPLIT(instruments,';',i),'=',2) AS instrument
FROM indata CROSS JOIN i
WHERE SPLIT(instruments,';',i) like 'inst=%'
ORDER BY 1
-- out bandid | i | instrument
-- out --------+---+------------
-- out 1 | 2 | basoon
-- out 1 | 3 | cello
-- out 1 | 4 | guitar
-- out 2 | 2 | drum
-- out 2 | 3 | cello
-- out 2 | 4 | bass
-- out 2 | 5 | flute
-- out 3 | 2 | 12string
-- out 3 | 3 | banjo
-- out 3 | 4 | triangle
-- out 3 | 5 | tuba
Consider the few options below (just to demonstrate different techniques here)
Option 1
select bandId,
( select string_agg(split(kv, '=')[offset(1)]) -- aggregate the value part of each matching pair
from unnest(split(instruments, ';')) kv -- one row per key=value pair
where split(kv, '=')[offset(0)] = 'inst' -- keep only the pairs whose key is 'inst'
) as insts
from `inventory.band.mytable`
Option 2 (for obvious reasons, this one would be my choice)
select bandId,
array_to_string(regexp_extract_all(instruments, r'inst=([^;$]+)'), ',') instrs
from `inventory.band.mytable`
If applied to the sample data in your question, the output in both cases is the same comma-separated list of instruments.
I have the following table in a BigQuery database, where the f0_ column contains:
Row | f0_
1 | {"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}
2 | {"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}
3 | {"configuration":[{"param1":"value1"},{"param2":[3.0,36]}]}
4 | {"configuration":[{"param1":"value1"},{"param2":[3.0,46]}]}
5 | {"configuration":[{"param1":"value1"},{"param2":[3.0,30]}]}
6 | {"configuration":[{"param1":"value1"}]}
The f0_ column is a plain string.
Is there a way to write a select query where the "param2" value is equal to the [3.0, 45] array, meaning it would only return rows 1 and 2? Preferably it would be great to accomplish this without directly indexing the first element in the "configuration" array, as the order might not be guaranteed.
Below is for BigQuery Standard SQL
#standardSQL
SELECT line
FROM `project.dataset.table`
WHERE REGEXP_EXTRACT(JSON_EXTRACT(line, '$.configuration'), r'{"param2":(.*?)}') = '[3.0,45]'
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}' line UNION ALL
SELECT '{"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}' UNION ALL
SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,36]}]}' UNION ALL
SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,46]}]}' UNION ALL
SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,30]}]}' UNION ALL
SELECT '{"configuration":[{"param1":"value1"}]}'
)
SELECT line
FROM `project.dataset.table`
WHERE REGEXP_EXTRACT(JSON_EXTRACT(line, '$.configuration'), r'{"param2":(.*?)}') = '[3.0,45]'
with the result:
Row line
1 {"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}
2 {"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}
Preferably it would be great to accomplish this without directly indexing the first element in the "configuration" array, as the order might not be guaranteed.
Note: this solution does not depend on the position of "param2" in the configuration array.
You can use some of BQ's neat JSON functions, as described here.
Based on that, you can locate param2 and check whether its value matches what you're looking for. If you aren't sure of the configuration order, you can iterate through the array to find param2, but it's not particularly efficient (see the sketch after the result below). I recommend you try to find a way to ensure param2 is always the second field in the array. I was able to get the correct results like so:
SELECT json_text AS correct_configurations
FROM UNNEST([
'{"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}',
'{"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}',
'{"configuration":[{"param1":"value1"},{"param2":[3.0,36]}]}',
'{"configuration":[{"param1":"value1"},{"param2":[3.0,46]}]}',
'{"configuration":[{"param1":"value1"},{"param2":[3.0,30]}]}',
'{"configuration":[{"param1":"value1"}]}'
])
AS json_text
WHERE JSON_EXTRACT(json_text, '$.configuration[1].param2') LIKE "[3.0,45]";
Gives a result of:
Row | correct_configurations
1 | {"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}
2 | {"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}
I have a table called pins like this:
id (int) | pin_codes (jsonb)
--------------------------------
1 | [4000, 5000, 6000]
2 | [8500, 8400, 8600]
3 | [2700, 2300, 2980]
Now, I want the row containing pin_code 8600, together with that value's array index. The output must be like this:
pin_codes | index
------------------------------
[8500, 8400, 8600] | 2
If I want the row with pin_code 2700, the output:
pin_codes | index
------------------------------
[2700, 2300, 2980] | 0
What I've tried so far:
SELECT pin_codes FROM pins WHERE pin_codes @> '[8600]'
It only returns the row with the wanted value. I don't know how to get the index of the value in the pin_codes array!
Any help would be greatly appreciated.
P.S:
I'm using PostgreSQL 10
If you were storing the array as a real array, not as JSON, you could use array_position() to find the (first) index of a given element:
select array_position(array['one', 'two', 'three'], 'two')
returns 2 (note that array_position() is 1-based, while the expected output in your question is 0-based, so subtract 1 if you need the latter)
With some text mangling you can cast the JSON array into a text array:
select array_position(translate(pin_codes::text,'[]','{}')::text[], '8600')
from the_table;
This also allows you to use the ANY operator:
select *
from pins
where '8600' = any(translate(pin_codes::text,'[]','{}')::text[])
The contains operator @> expects arrays on both sides. You could use it to search for two pin codes at a time:
select *
from pins
where translate(pin_codes::text,'[]','{}')::text[] @> array['8600','8400']
Or use the overlaps operator && to find rows with any of multiple elements:
select *
from pins
where translate(pin_codes::text,'[]','{}')::text[] && array['8600','2700']
would return
id | pin_codes
---+-------------------
2 | [8500, 8400, 8600]
3 | [2700, 2300, 2980]
If you do that a lot, it would be more efficient to store the pin_codes as text[] rather than JSON - then you can also index that column to do searches more efficiently.
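A sketch of that last suggestion, assuming you are free to change the schema (the column and index names are illustrative):
ALTER TABLE pins ADD COLUMN pin_codes_arr text[];
UPDATE pins SET pin_codes_arr = translate(pin_codes::text, '[]', '{}')::text[];
CREATE INDEX pins_pin_codes_arr_idx ON pins USING gin (pin_codes_arr);
-- @> and && can now use the index:
SELECT * FROM pins WHERE pin_codes_arr @> array['8600'];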
Use the function jsonb_array_elements_text() with WITH ORDINALITY.
with my_table(id, pin_codes) as (
values
(1, '[4000, 5000, 6000]'::jsonb),
(2, '[8500, 8400, 8600]'),
(3, '[2700, 2300, 2980]')
)
select id, pin_codes, ordinality - 1 as index
from my_table, jsonb_array_elements_text(pin_codes) with ordinality
where value::int = 8600;
id | pin_codes | index
----+--------------------+-------
2 | [8500, 8400, 8600] | 2
(1 row)
As has been pointed out previously, the array_position function is only available in Postgres 9.5 and greater.
Here is a custom function that achieves the same, derived from nathansgreen on GitHub.
-- The array_position function was added in Postgres 9.5.
-- For older versions, you can get the same behavior with this function.
create function array_position(arr ANYARRAY, elem ANYELEMENT, pos INTEGER default 1) returns INTEGER
language sql
as $BODY$
select row_number::INTEGER
from (
select unnest, row_number() over ()
from ( select unnest(arr) ) t0
) t1
where row_number >= greatest(1, pos)
and (case when elem is null then unnest is null else unnest = elem end)
limit 1;
$BODY$;
So in this specific case, after creating the function the following worked for me.
SELECT
pin_codes,
array_position(pin_codes, 8600) AS index
FROM pins
WHERE array_position(pin_codes, 8600) IS NOT NULL;
It's worth bearing in mind that this will only return the index of the first occurrence of 8600; you can use the pos argument to find whichever occurrence you like.
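For example (a quick sketch; the 9.5+ built-in behaves the same way):
select array_position(array['a','b','a'], 'a', 2); -- returns 3, because the search starts at position 2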
In short, normalize your data structure, or don't do this in SQL. If you want the index of the sub-data element given your current data structure, then do it in your application code (take the result, cast it to a list/array, get the index).
Try to unnest the string and assign numbers as follows:
with dat as
(
select 1 id, '8700, 5600, 2300' pins
union all
select 2 id, '2300, 1700, 1000' pins
)
select dat.*, t.rn as index
from
(
select id, t.pins, row_number() over (partition by id) rn
from
(
select id, trim(unnest(string_to_array(pins, ','))) pins from dat
) t
) t
join dat on dat.id = t.id and t.pins = '2300'
If you insist on storing arrays, I'd defer to klin's answer.
As the alternative answer and an extension to my comment: don't store SQL data in arrays. 'Normalize' your data in advance and SQL will handle it significantly better. Klin's answer is good, but may suffer in performance, as it's outside of what SQL does best.
I'd break the array apart prior to storing it. If the number of pin codes is known, then simply having the table pin_id, pin1, pin2, pin3, and so on is functional.
If the number of pins is unknown, a first table pin that stores the pin_id and any info columns related to that pin ID, and then a second table with pin_id, pin_seq, pin_value, is also functional (though you may need to pivot this later on to make sense of the data). In this case, select pin_seq where pin_value = 8600 would work, as sketched below.
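A sketch of that second layout, with illustrative names:
create table pin (
pin_id integer primary key
-- ... any info columns related to that pin id
);
create table pin_code (
pin_id integer references pin,
pin_seq integer,
pin_value integer,
primary key (pin_id, pin_seq)
);
-- finding the position of a code then becomes a plain query:
select pin_seq from pin_code where pin_value = 8600;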
I'm querying my DB, which has only one table:
id | value
----------
1 | 1|2|4
2 | 11|23
3 | 1|4|3|11
4 | 2|4|11
5 | 5|6|11
6 | 12|15|16
7 | 3|1|4
8 | 5|2|1
The query was: SELECT * FROM table_name WHERE value LIKE '%1%'
I want to select only the rows containing the value 1, but I also get the rows containing 11. How can I tell them apart in SQL?
If you have to stick with this broken design, it's probably better to use Postgres' ability to parse a string into an array.
This is more robust than using a like condition:
select *
from the_table
where string_to_array(value,'|') @> array['1']
or maybe a bit easier to read
select *
from the_table
where '1' = any (string_to_array(value,'|'))
using the contains operator @> you can also search for more than one value at a time:
select *
from the_table
where string_to_array(value,'|') @> array['1','2']
will return all rows where value contains both 1 and 2
SQLFiddle example: http://sqlfiddle.com/#!15/8793d/2
I strongly recommend that you normalize your schema so that every column stores only atomic values.
Without that, you are forced into nasty tricks, e.g. with arrays:
select * from t
where '1' = any (string_to_array(value, '|'))
or, with pattern matching (this works because the pipe-delimited value happens to be a valid alternation pattern for SIMILAR TO):
select * from t
where '1' similar to value
SQLFiddle