How to find the count of a substring in a string using BigQuery?

I want to find how many times "fizz" appears in the string "fizzbuzzfizz" in BigQuery (or plain SQL).
Here the output should be 2.

You can combine REGEXP_EXTRACT_ALL and ARRAY_LENGTH; see this SQL:
WITH data AS (
  SELECT 'fizzbuzzfizz' AS string
)
SELECT
  ARRAY_LENGTH(REGEXP_EXTRACT_ALL(string, "fizz")) AS size
FROM data;
This returns size = 2.
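To sanity-check the logic outside BigQuery, here is a small Python sketch (Python is used only for illustration; the function names are hypothetical). REGEXP_EXTRACT_ALL returns the non-overlapping matches of a regular expression, so a substring containing characters such as . or * has to be escaped first; re.escape plays that role here. The second function mirrors a common pure-SQL alternative that is not part of the original answer: DIV(LENGTH(s) - LENGTH(REPLACE(s, sub, '')), LENGTH(sub)).

```python
import re


def count_with_regex(text: str, sub: str) -> int:
    """Mirror ARRAY_LENGTH(REGEXP_EXTRACT_ALL(text, pattern)).

    re.findall, like REGEXP_EXTRACT_ALL, returns non-overlapping matches;
    re.escape makes the substring match literally instead of as a regex.
    """
    return len(re.findall(re.escape(sub), text))


def count_with_replace(text: str, sub: str) -> int:
    """Mirror DIV(LENGTH(text) - LENGTH(REPLACE(text, sub, '')), LENGTH(sub))."""
    return (len(text) - len(text.replace(sub, ""))) // len(sub)


print(count_with_regex("fizzbuzzfizz", "fizz"))    # 2
print(count_with_replace("fizzbuzzfizz", "fizz"))  # 2
print(count_with_regex("a.b.c", "."))              # 2: escaped, so only literal dots count
print(count_with_regex("aaaa", "aa"))              # 2: matches do not overlap
```

Without escaping, the pattern "." would match every character of "a.b.c", which is the kind of surprise to watch for when the substring comes from user input.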

Related

Hive SQL: extract one or more values from key-value pairs

I have a column that looks like:
[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false},...]
The column can contain one or more items, each a {...} object of parameters.
I would like to extract only the values of key_1. Is there a function for that? So far I have tried the JSON functions (json_tuple, get_json_object), but each time I got NULL.

Consider the JSON path below, which uses the [*] wildcard to visit every element of the array:
WITH sample_data AS (
SELECT '[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false}]' json
)
SELECT get_json_object(json, '$[*].key_1') AS key1_values FROM sample_data;
For the sample row, this returns the key_1 values as a JSON array: [true,false].
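The path $[*].key_1 reads as "for every element of the top-level array, take key_1". A minimal Python sketch of the same selection on the sample value, for illustration only; the function name is hypothetical, and skipping elements that lack key_1 is a choice made for this sketch, not documented Hive behavior.

```python
import json

raw = ('[{"key_1":true,"key_2":true,"key_3":false},'
       '{"key_1":false,"key_2":false,"key_3":false}]')


def key1_values(doc: str) -> list:
    """Collect key_1 from every object in a JSON array, like $[*].key_1."""
    # Elements without key_1 are skipped (an assumption of this sketch).
    return [item["key_1"] for item in json.loads(doc) if "key_1" in item]


values = key1_values(raw)
print(values)              # [True, False]
print(json.dumps(values))  # [true, false]
```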

Presto SQL query

Let's assume that I have an array of strings with the following values:
string = {'123','12ab','38','abc','01a8','1123b'}
How should I write a Presto SQL query that extracts only the values consisting entirely of digits, so that my output is {'123','38'}?
A query like the one below does not return any output:
SELECT string
FROM table1
WHERE string LIKE '[0-9]*'
GROUP BY string

There are at least two options:
Leverage the try_cast function provided by Presto:
-- sample data
WITH dataset(string) AS (
values ('123'),
('12ab'),
('38'),
('abc'),
('01a8'),
('1123b')
)
-- query
select *
from dataset
where try_cast(string as integer) is not null;
Or use a regular expression via regexp_like (this query reuses the dataset CTE defined above):
-- query
select *
from dataset
where regexp_like(string, '^\d+$');
Output:
string
123
38
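The two filters are not identical: regexp_like(string, '^\d+$') keeps any all-digit string, while try_cast(string as integer) also keeps signed values such as '-12' and should drop digit strings too large for the 32-bit INTEGER type. The Python sketch below contrasts the two, for illustration only; try_cast_int is a hypothetical, rough stand-in for Presto's cast (Python's int() is more lenient, for example about surrounding whitespace).

```python
import re

values = ["123", "12ab", "38", "abc", "01a8", "1123b"]

# Like regexp_like(s, '^\d+$'): the whole string must be ASCII digits.
# fullmatch avoids Python's $ also matching just before a trailing newline.
DIGITS = re.compile(r"\d+", re.ASCII)


def all_digits(s: str) -> bool:
    return DIGITS.fullmatch(s) is not None


def try_cast_int(s: str):
    """Rough stand-in for try_cast(s AS integer): None on failure or 32-bit overflow."""
    try:
        n = int(s)
    except ValueError:
        return None
    return n if -(2**31) <= n < 2**31 else None


print([s for s in values if all_digits(s)])                # ['123', '38']
print([s for s in values if try_cast_int(s) is not None])  # ['123', '38']

# Edge cases where the two filters disagree:
for s in ["-12", "99999999999"]:
    print(s, all_digits(s), try_cast_int(s) is not None)
# -12 False True
# 99999999999 True False
```

For the sample data both approaches agree; pick regexp_like when "digits only" is the real requirement, and try_cast when you actually need values that fit in an integer column.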

How to use SPLIT() and get the Nth element of the result in BigQuery

Suppose I split a string:
SELECT SPLIT('a,b,c', ',')
How can I get the nth element of the resulting array?
SELECT SPLIT('a,b,c', ',')[2] #This doesn't work to get 'b'

Use ORDINAL(N) for one-based indexing (or OFFSET(N) for zero-based indexing), where N is an integer:
SELECT SPLIT('a,b,c', ',')[ordinal(3)]
This returns 'c'. To get 'b', use [ORDINAL(2)] or [OFFSET(1)].
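BigQuery array subscripts come in two flavors: OFFSET(n) counts from 0 and ORDINAL(n) counts from 1, while SAFE_OFFSET and SAFE_ORDINAL return NULL instead of raising an error when the index is out of range. A small Python sketch of these rules, for illustration only; the function names are hypothetical, and Python's own negative indexing is deliberately treated as out of range.

```python
def offset(arr, n):
    """arr[OFFSET(n)]: zero-based; an out-of-range index is an error."""
    if not 0 <= n < len(arr):
        raise IndexError(f"array index {n} is out of range")
    return arr[n]


def ordinal(arr, n):
    """arr[ORDINAL(n)]: one-based, so ORDINAL(n) is OFFSET(n - 1)."""
    return offset(arr, n - 1)


def safe_ordinal(arr, n):
    """arr[SAFE_ORDINAL(n)]: like ORDINAL, but None (NULL) when out of range."""
    try:
        return ordinal(arr, n)
    except IndexError:
        return None


parts = "a,b,c".split(",")                  # like SPLIT('a,b,c', ',')
print(ordinal(parts, 3))                    # c
print(ordinal(parts, 2), offset(parts, 1))  # b b
print(safe_ordinal(parts, 4))               # None
```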

SQL: group by the middle part of a string

I have a string column whose values usually look like this:
https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554
https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197
https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157
I would like to group the data by source, which is part of the string: the four letters after "source=" (in the examples above, firm and base), and then simply count them. Is there a way to do this directly in SQL? I am using Hadoop.
The data is a set of strings like the ones above. My expected result is a summary table with two columns:
1) Each source type. There are about 20 possible types and their lengths differ, so I cannot use a simple substring. Ideally I am looking for a solution that says: for the grouping, use the four letters that come after "source=".
2) The count of their occurrences across all the strings.
There is just one source type in each string.

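You can use regexp_extract() with a capture group for the text after "source=" (mytable is a placeholder for your table):
select regexp_extract(url, 'source=([^&]+)', 1) as source_type, count(*) as cnt
from mytable group by regexp_extract(url, 'source=([^&]+)', 1);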

In MSSQL you can use CHARINDEX to find the position of "&source=" and extract the four characters that follow it (mytable and url are placeholders for your table and column):
;with cte as (
SELECT SUBSTRING(url, CHARINDEX('&source=', url) + 8, 4) AS ExtractString -- +8 skips '&source='
FROM mytable
)
select ExtractString, count(ExtractString) as count from cte group by ExtractString;
In HiveQL, the equivalent of CHARINDEX is LOCATE, which takes its arguments in the same order: LOCATE('&source=', url).
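Both answers boil down to "take the text after source= and count each distinct value". A Python sketch of that logic on the three sample URLs, for illustration only; the regex captures everything after source= up to the next &, rather than a fixed four-character slice, so longer source names would be kept whole.

```python
import re
from collections import Counter

urls = [
    "https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554",
    "https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197",
    "https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157",
]

# Capture everything after "source=" up to the next "&" (or the end of the string).
SOURCE = re.compile(r"source=([^&]+)")


def source_of(url: str):
    match = SOURCE.search(url)
    return match.group(1) if match else None


counts = Counter(source_of(u) for u in urls)  # like GROUP BY source, COUNT(*)
print(counts)  # Counter({'base': 2, 'firm': 1})
```

In Python itself, urllib.parse.parse_qs would be the more robust way to read query parameters; the regex is used here only to match what the SQL answers do.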

BigQuery error in WHERE clause with a STRING column

I have a column that contains strings like '0 0PAA01 CF101 -S07'. The records exist in the table, but when I try to retrieve them with BigQuery, the query returns no rows.
I am running:
select *
from table
where column='0 0PAA01 CF101 -S07'
Is this a BigQuery problem?

I'm 99% sure this is not a BigQuery problem; the strings are in fact different.
Look at this:
SELECT MD5('my name is Felipe Hoffa') from_keyboard
, MD5(str) from_db
, str='my name is Felipe Hoffa' equal_are_they
FROM (
SELECT 'my name is\tFelipe Hoffa' str -- \t is a tab character
)
Why are they different? The second string has a tab instead of a space, so the hashes differ and equal_are_they is false.
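When two strings look the same but compare unequal, it helps to inspect their code points. In BigQuery, TO_CODE_POINTS(str) returns them as an array, and REGEXP_REPLACE(str, r'\s+', ' ') can normalize runs of whitespace before comparing. The same debugging steps in Python, for illustration only; the stored value is assumed to contain a tab, as in the answer above, and the helper names are hypothetical.

```python
import hashlib
import re

typed = "my name is Felipe Hoffa"
stored = "my name is\tFelipe Hoffa"  # assumption: the stored value hides a tab


def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def first_difference(a: str, b: str):
    """Index of the first differing character, or None if the strings are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


def normalize_ws(s: str) -> str:
    # Collapse any run of whitespace (spaces, tabs, newlines) to a single space.
    return re.sub(r"\s+", " ", s)


print(md5_hex(typed) == md5_hex(stored))  # False
i = first_difference(typed, stored)
print(i, ord(typed[i]), ord(stored[i]))   # 10 32 9
print(normalize_ws(stored) == typed)      # True
```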
