How to find the count of a substring in a string using BigQuery? - google-bigquery

I want to find how many times "fizz" appears in the string "fizzbuzzfizz" in BigQuery or SQL.
Here the output should be 2.

You can use REGEXP_EXTRACT_ALL and ARRAY_LENGTH. See this SQL:
WITH data AS (
  SELECT 'fizzbuzzfizz' AS string
)
SELECT
  ARRAY_LENGTH(REGEXP_EXTRACT_ALL(string, "fizz")) AS size
FROM data;
Which produces this:
size
2
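Because REGEXP_EXTRACT_ALL treats its second argument as a regular expression, a search string containing characters such as "." or "+" would need escaping. As a rough alternative for literal substrings, here is a sketch in BigQuery standard SQL that compares string lengths before and after removing the substring. The `needle` column is a hypothetical name introduced only for this illustration, and the query assumes the needle is not empty (an empty needle would divide by zero).

```sql
WITH data AS (
  -- `needle` is a hypothetical column holding the literal substring to count.
  SELECT 'fizzbuzzfizz' AS string, 'fizz' AS needle
)
SELECT
  -- Removing every occurrence shortens the string by LENGTH(needle) per match,
  -- so the length difference divided by LENGTH(needle) is the match count.
  DIV(LENGTH(string) - LENGTH(REPLACE(string, needle, '')), LENGTH(needle)) AS occurrences
FROM data;
```

Like the regular-expression version, this counts non-overlapping occurrences.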

Related

Hive sql extract one to multiple values from key value pairs

I have a column that looks like:
[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false},...]
The column can contain one or more items, each described by the parameters inside {}.
I would like to extract only the values of the key_1 parameter. Is there a function for that? So far I have tried JSON-related functions (json_tuple, get_json_object), but each time I received NULL.
Consider the JSON path below.
WITH sample_data AS (
SELECT '[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false}]' json
)
SELECT get_json_object(json, '$[*].key_1') AS key1_values FROM sample_data;

Presto SQL query

Let's assume that I have an array of strings with the following values:
string = {'123','12ab','38','abc','01a8','1123b'}
How should I write a query in Presto SQL to extract only the values that consist entirely of numerical digits, so that my output would be {'123','38'}?
A query like the one below does not return any output:
SELECT string
FROM table1
WHERE string LIKE '[0-9]*'
GROUP BY string
There are at least two options.
Leverage try_cast, provided by Presto:
-- sample data
WITH dataset(string) AS (
values ('123'),
('12ab'),
('38'),
('abc'),
('01a8'),
('1123b')
)
-- query
select *
from dataset
where try_cast(string as integer) is not null;
Or use regular expressions via regexp_like:
-- query
select *
from dataset
where regexp_like(string, '^\d+$');
Output:
string
123
38
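The query in the question returns nothing because LIKE only understands the % and _ wildcards, so '[0-9]*' is compared as literal text. If the values really are held in a single ARRAY column rather than one per row, a minimal Presto sketch could apply the same regular expression through the filter function with a lambda. The inline ARRAY literal and the digits_only alias are assumptions made for this illustration.

```sql
-- Keep only the array elements made up entirely of digits.
SELECT filter(
         ARRAY['123', '12ab', '38', 'abc', '01a8', '1123b'],
         x -> regexp_like(x, '^\d+$')
       ) AS digits_only;
```

This keeps the result as an array, whereas the row-based queries above return one row per matching value.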

How to use SPLIT() and get Nth row in the resultset in BigQuery

Suppose I split a string:
SELECT SPLIT('a,b,c', ',')
How can I get the nth element in the resulting query?
SELECT SPLIT('a,b,c', ',')[2] #This doesn't work to get 'b'
The solution is to use ordinal(N), where N is a 1-based integer index (offset(N) is the 0-based equivalent).
SELECT SPLIT('a,b,c', ',')[ordinal(3)]
The query above returns 'c'; use ordinal(2) to get 'b'.
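To make the indexing options concrete, here is a short BigQuery sketch that puts the 0-based and 1-based accessors side by side, plus the SAFE_ variant that yields NULL instead of raising an error when the index is out of range. The column aliases and the `parts` name are chosen only for this illustration.

```sql
SELECT
  parts[OFFSET(1)]      AS second_zero_based,  -- OFFSET counts from 0
  parts[ORDINAL(2)]     AS second_one_based,   -- ORDINAL counts from 1
  parts[SAFE_OFFSET(5)] AS out_of_range        -- NULL rather than an error
FROM (SELECT SPLIT('a,b,c', ',') AS parts);
```

Both of the first two columns pick the same element, 'b'.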

SQL group by middle part of string

I have a string column whose values usually look approximately like this:
https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554
https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197
https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157
I would like to group the data by source, which is part of the string: the four letters after "source=" (in the first case above: firm), and then simply count them. Is there a way to achieve this directly in SQL code? I am using Hadoop.
The data is a set of strings that look like the ones above. My expected result is a summary table with two columns: 1) Each type of source (there are about 20 possible values and their lengths differ, so I cannot use a simple substring). Ideally I am looking for a solution that says: for the grouping, use the four letters that come after "source=". 2) The count of their occurrences in all the strings.
There is just one source type in each string.
You can use regexp_extract():
select substr(regexp_extract(url, 'source[^&]+'), 8)
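The expression above only extracts the value; the grouping and counting still need to be wrapped around it. A possible Hive-style sketch follows, assuming a hypothetical table named `visits` with the string column `url` (neither name comes from the question). It uses a capture group so that no substr offset is needed, and it handles source values of any length.

```sql
-- `visits` is a hypothetical table name; `url` holds the strings shown above.
SELECT
  regexp_extract(url, 'source=([^&]+)', 1) AS source,  -- group 1: text after "source=" up to the next "&"
  COUNT(*) AS occurrences
FROM visits
GROUP BY regexp_extract(url, 'source=([^&]+)', 1);
```

Repeating the expression in GROUP BY avoids relying on alias support, which differs between engines.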
You can use CHARINDEX in MSSQL to get the position of the substring and extract the value:
;with cte as (
SELECT SUBSTRING('https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554',
charindex('&source=','https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554')
+8,4) AS ExtractString )
select ExtractString,count(ExtractString) as count from cte group by ExtractString;
HiveQL has an equivalent of CHARINDEX: the LOCATE function.

BigQuery Error In where Clause with a String type column

I have a column that contains strings like this: '0 0PAA01 CF101 -S07'. I have some matching records in the database, but when I try to retrieve them using BigQuery, the query returns no records.
I am running:
select *
from table
where column='0 0PAA01 CF101 -S07'
Is this a BigQuery problem?
I'm 99% sure this is not a BigQuery problem, but rather that the strings are in fact different.
Look at this:
SELECT MD5('my name is Felipe Hoffa') from_keyboard
, MD5(str) from_db
, str='my name is Felipe Hoffa' equal_are_they
FROM (
SELECT 'my name is\tFelipe Hoffa' str  -- contains a tab (\t) where the typed literal has a space
)
Why are they different? One has a tab instead of a space.
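One way to confirm this kind of hidden difference is to look at the code points of the stored value, and one possible workaround is to normalize whitespace before comparing. The sketch below uses BigQuery's TO_CODE_POINTS and REGEXP_REPLACE; the sample value 'a\tb' and the column aliases are made up for this illustration.

```sql
WITH sample AS (
  SELECT 'a\tb' AS str  -- made-up value containing a tab
)
SELECT
  TO_CODE_POINTS(str) AS code_points,  -- a tab shows up as 9, a regular space as 32
  REGEXP_REPLACE(str, r'\s', ' ') = 'a b' AS equal_after_normalizing
FROM sample;
```

Whether normalizing in the WHERE clause is acceptable depends on the data; cleaning the stored values once may be preferable.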