Splitting value in database in WHERE clause - sql

I am querying a BigQuery table to extract building data. We store the building in a single cell, together with the location and sensor name. The building column has values such as:
GB-FRE-BB2003_MSU-01
GB-FRE-BB2001_MSU-12
GB-FRE-BB2003_MSU-12
GB-FRE-BB2012_MSU-12
GB-FRE-BB2003_MSU-10
etc
I would like to query the data using a substring, so I can find all the data from the BB2003 building, regardless of location and sensor.
SELECT `presentvalue`
FROM `database`
WHERE ???
Is someone able to help with this? I have looked at SPLIT and SUBSTRING but can't seem to get the query right.

A few more options:
Using LIKE
select *
from your_table
where presentvalue like '%-BB2003_%'
Using REGEXP_CONTAINS
select *
from your_table
where regexp_contains(presentvalue, '-BB2003_')
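Applied to the sample data in your question, both return the three BB2003 rows. One caveat: in LIKE the underscore is itself a wildcard for any single character, so '%-BB2003_%' would also match an ID such as BB20031, whereas in REGEXP_CONTAINS the underscore is a literal character.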
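A quick way to compare the two patterns before running them in BigQuery is a small Python sketch. It translates the LIKE pattern into an equivalent regular expression (like_to_regex is a hand-written helper, not a BigQuery API) and runs both filters over the sample values plus one hypothetical ID, GB-FRE-BB20031_MSU-01, that is not in the question. If that extra match matters, BigQuery lets you escape the underscore in a LIKE pattern with a backslash, e.g. r'%-BB2003\_%'.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern (without ESCAPE handling) into a Python regex.

    In LIKE, '%' matches any run of characters and '_' matches exactly one character.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # LIKE has to match the whole value, so callers use fullmatch().
    return re.compile("".join(parts), re.DOTALL)

rows = [
    "GB-FRE-BB2003_MSU-01",
    "GB-FRE-BB2001_MSU-12",
    "GB-FRE-BB2003_MSU-12",
    "GB-FRE-BB2012_MSU-12",
    "GB-FRE-BB2003_MSU-10",
    "GB-FRE-BB20031_MSU-01",  # hypothetical extra building ID, not from the question
]

like_pattern = like_to_regex("%-BB2003_%")
like_hits = [r for r in rows if like_pattern.fullmatch(r)]

# REGEXP_CONTAINS-style search: here the underscore is a literal character.
regex_hits = [r for r in rows if re.search(r"-BB2003_", r)]

print(like_hits)
print(regex_hits)
```

The LIKE version picks up the hypothetical BB20031 row, while the regular expression only returns the three BB2003 rows.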

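Use where substring(split(val, '-')[offset(2)], 1, 6) = 'BB2003'. SPLIT breaks 'GB-FRE-BB2003_MSU-01' into ['GB', 'FRE', 'BB2003_MSU', '01'], [offset(2)] picks the third element (offsets are 0-based), and substring(..., 1, 6) keeps its first six characters.
I tested it with the code below and it works: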
create temp table sample(
val string
);
insert into sample
(val)
values
('GB-FRE-BB2003_MSU-01'),
('GB-FRE-BB2001_MSU-12'),
('GB-FRE-BB2003_MSU-12'),
('GB-FRE-BB2012_MSU-12'),
('GB-FRE-BB2003_MSU-10');
select * from sample where substring(split(val, '-')[offset(2)], 1, 6) = 'BB2003';
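The split-and-substring logic can be mirrored in a few lines of Python to check which values it selects. This is only a local sketch (building_of is a hypothetical helper name); note the index shift, since BigQuery's OFFSET is 0-based like Python, but SUBSTRING positions start at 1.

```python
def building_of(val: str) -> str:
    # Mirrors SUBSTRING(SPLIT(val, '-')[OFFSET(2)], 1, 6)
    third_part = val.split("-")[2]  # 'BB2003_MSU' for 'GB-FRE-BB2003_MSU-01'
    return third_part[:6]           # first six characters, e.g. 'BB2003'

sample = [
    "GB-FRE-BB2003_MSU-01",
    "GB-FRE-BB2001_MSU-12",
    "GB-FRE-BB2003_MSU-12",
    "GB-FRE-BB2012_MSU-12",
    "GB-FRE-BB2003_MSU-10",
]

matches = [v for v in sample if building_of(v) == "BB2003"]
print(matches)
```

The fixed length of 6 assumes every building code has six characters. Splitting the third part on '_' instead, e.g. split(split(val, '-')[offset(2)], '_')[offset(0)] in BigQuery, would not depend on that; and [safe_offset(2)] returns NULL instead of raising an error for values with fewer than three parts.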

Related

Hive sql extract one to multiple values from key value pairs

I have a column that looks like:
[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false},...]
The column can contain one or more {...} objects.
I would like to extract only the values of key_1. Is there a function for that? So far I have tried JSON-related functions (json_tuple, get_json_object), but each time I received NULL.
Consider the JSON path below; [*] addresses every element of the top-level array:
WITH sample_data AS (
SELECT '[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false}]' json
)
SELECT get_json_object(json, '$[*].key_1') AS key1_values FROM sample_data;
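A likely reason the earlier attempts returned NULL is that a path such as '$.key_1' addresses a top-level object, while this column holds a top-level array. The Python sketch below (standard json module, not Hive) shows the same shape: parse the array, then read key_1 from each element, which is what '$[*].key_1' asks for.

```python
import json

raw = '[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false}]'

items = json.loads(raw)  # a list of dicts, one per {...} object
# Equivalent idea to the JSON path '$[*].key_1': key_1 from every array element
key1_values = [item.get("key_1") for item in items]
print(key1_values)  # [True, False]
```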

SQL group by middle part of string

I have a string column whose values usually look roughly like this:
https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554
https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197
https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157
I would like to group the data by source, which is part of the string (the four letters after "source=", in the examples above: firm), and then simply count them. Is there a way to achieve this directly in SQL? I am using Hadoop.
The data is a set of strings like the ones above. My expected result is a summary table with two columns: 1) each source type (there are about 20 possible values and their lengths differ, so I cannot use a simple substring; ideally the grouping would use the four letters that come after "source="), and 2) the count of their occurrences across all the strings.
There is just one source type in each string.
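You can use regexp_extract(). The capturing group ([^&]+) takes everything after source= up to the next &, so sources of any length work, and the third argument 1 returns that group (with no capturing group in the pattern, Hive's default group index of 1 would fail):
select regexp_extract(url, 'source=([^&]+)', 1) as source_type, count(*) as cnt
from your_table -- placeholder table name
group by regexp_extract(url, 'source=([^&]+)', 1);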
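To sanity-check the pattern on the sample URLs, here is a small Python sketch using the same regular expression with a capturing group and collections.Counter for the grouping. It only mirrors the Hive query locally; source_of is a hypothetical helper name.

```python
import re
from collections import Counter

urls = [
    "https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554",
    "https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197",
    "https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157",
]

# Capture everything after 'source=' up to the next '&' (or the end of the string)
source_re = re.compile(r"source=([^&]+)")

def source_of(url):
    m = source_re.search(url)
    return m.group(1) if m else None

# Counter plays the role of GROUP BY source_type + COUNT(*)
counts = Counter(source_of(u) for u in urls)
print(counts)
```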
In SQL Server you can use CHARINDEX to find the position of the substring and extract the value:
;with cte as (
SELECT SUBSTRING('https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554',
charindex('&source=','https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554')
+8,4) AS ExtractString )
select ExtractString,count(ExtractString) as count from cte group by ExtractString;
The HiveQL equivalent of CHARINDEX is LOCATE(substr, str[, pos]). Note that the fixed length of 4 only works for four-letter sources such as firm and base.
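The position arithmetic in the CHARINDEX answer can be generalised to variable-length sources by searching for the next '&' instead of taking a fixed 4 characters. The Python sketch below shows the idea (extract_source is a hypothetical name); in Hive the same step would be a second LOCATE call with a start position.

```python
def extract_source(url: str):
    marker = "&source="
    start = url.find(marker)   # like CHARINDEX/LOCATE, but 0-based and -1 when absent
    if start == -1:
        return None
    start += len(marker)       # the "+8" in the T-SQL answer is len('&source=')
    end = url.find("&", start) # start of the next parameter, if there is one
    return url[start:] if end == -1 else url[start:end]

urls = [
    "https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554",
    "https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197",
    "https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157",
]
print([extract_source(u) for u in urls])
```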

BigQuery - JSON_EXTRACT only extracts first entry

I have a column containing a json-string as follows:
[{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"},{"answer":"europe-italy","text":"Italien"},{"answer":"europe-france","text":"Frankreich"}]
I want to extract ALL answers into ONE column and row, separated by a comma:
europe-austria-swiss, europe-italy, europe-france
I think I tried all the possibilities offered by JSON_EXTRACT and JSON_EXTRACT_ARRAY, as well as replacing brackets and other characters, but I either get only the first entry (in this case europe-austria-swiss) or the result is split into array rows from which I can no longer extract the "answer" strings.
Does anyone have an idea how to solve this? Any help is much appreciated!
This column is of course part of a much larger table (if that is relevant anyhow).
I think I know what's going on (please, correct me if I'm wrong).
My best guess is that you are trying something like:
SELECT JSON_EXTRACT(json_text, "$.answer") AS answers
FROM UNNEST([
'{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"},{"answer":"europe-italy","text":"Italien"},{"answer":"europe-france","text":"Frankreich"}'
]) as json_text
This returns:
"europe-austria-swiss"
However, if you change the underlying data to something like this (each line as a separate JSON object string), it should resolve the issue:
SELECT JSON_EXTRACT(json_text, "$.answer") AS answers
FROM UNNEST([
'{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"}',
'{"answer":"europe-italy","text":"Italien"}',
'{"answer":"europe-france","text":"Frankreich"}'
]) as json_text
Result:
"europe-austria-swiss"
"europe-italy"
"europe-france"
Hope this helps!
Below is for BigQuery Standard SQL
#standardSQL
SELECT (
SELECT STRING_AGG(JSON_EXTRACT_SCALAR(answer, '$.answer'), ' ,')
FROM UNNEST(JSON_EXTRACT_ARRAY(json_string)) answer
) AS answers
FROM `project.dataset.table`
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '[{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"},{"answer":"europe-italy","text":"Italien"},{"answer":"europe-france","text":"Frankreich"}]' json_string
)
SELECT (
SELECT STRING_AGG(JSON_EXTRACT_SCALAR(answer, '$.answer'), ' ,')
FROM UNNEST(JSON_EXTRACT_ARRAY(json_string)) answer
) AS answers
FROM `project.dataset.table`
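with result:
Row | answers
1 | europe-austria-swiss ,europe-italy ,europe-france
To get exactly the format asked for in the question, use ', ' instead of ' ,' as the STRING_AGG delimiter.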
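The query works because JSON_EXTRACT_ARRAY turns the string into one element per object, JSON_EXTRACT_SCALAR reads answer from each element, and STRING_AGG joins them back into one value. A Python sketch of the same three steps (standard json module, not BigQuery) makes the flow visible:

```python
import json

json_string = (
    '[{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"},'
    '{"answer":"europe-italy","text":"Italien"},'
    '{"answer":"europe-france","text":"Frankreich"}]'
)

elements = json.loads(json_string)                   # ~ UNNEST(JSON_EXTRACT_ARRAY(json_string))
values = [e["answer"] for e in elements]             # ~ JSON_EXTRACT_SCALAR(answer, '$.answer')
answers = ", ".join(values)                          # ~ STRING_AGG(..., ', ')
print(answers)  # europe-austria-swiss, europe-italy, europe-france
```

Python keeps the array order here; in BigQuery, STRING_AGG does not guarantee an order without ORDER BY, so if order matters it may be worth adding WITH OFFSET to the UNNEST and ordering by that offset inside STRING_AGG.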

Convert strings into table columns in BigQuery

I would like to convert this table
to something like this
The long string can vary, so it's important to me that the solution isn't hard-coded for these specific values.
Please help, I'm using BigQuery.
You could start by using SPLIT(value[, delimiter]) to convert your long string into an array of separate key-value pairs.
This will break if any of your values themselves contain commas.
SPLIT(session_experiments, ',')
Then you could either UNNEST that array (FLATTEN is the legacy SQL equivalent) or access each element directly, and then use regular expressions to separate the key from the value.
If you share more context on your restrictions and intended result I could try and put together a query for you that does exactly what you want.
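The split-then-regex approach can be sketched in Python to check the pattern before porting it to BigQuery's REGEXP_EXTRACT. The input value below is borrowed from the other answer's sample data, since the question's own table was not preserved, and the key-value pattern ^([^-]+)-(.*)$ is an assumption that keys never contain '-'.

```python
import re

session_experiments = "test1-val1,test2-val2,test3-val3"

# Key = everything before the first '-', value = the rest (assumed format)
pair_re = re.compile(r"^([^-]+)-(.*)$")

pairs = {}
for item in session_experiments.split(","):  # ~ SPLIT(session_experiments, ',')
    m = pair_re.match(item)
    if m:
        key, value = m.groups()
        pairs[key] = value

print(pairs)  # {'test1': 'val1', 'test2': 'val2', 'test3': 'val3'}
```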
What you want is not possible; however, there is a better practice in BigQuery.
You can use arrays of structs to store that information in a table.
Let's say you have a table like the one in your question.
You can use this sample query to understand how to use it:
with rawdata AS
(
SELECT 1 as id, 'test1-val1,test2-val2,test3-val3' as experiments union all
SELECT 1 as id, 'test1-val1,test3-val3,test5-val5' as experiments
)
select
id,
(
  select array_agg(struct(
    split(param, '-')[offset(0)] as experiment,
    split(param, '-')[offset(1)] as value))
  from unnest(split(experiments)) as param
) as experiments
from rawdata
The output is one row per input row: the id plus an experiments array of (experiment, value) structs.
With the data in that shape, it is much easier to manipulate.
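To illustrate why the array-of-structs shape is convenient, here is a Python sketch that builds the same structure from the sample rows and then looks up one experiment per row, which in BigQuery would be a correlated subquery such as (SELECT value FROM UNNEST(experiments) WHERE experiment = 'test2'). The helper name to_struct_array is hypothetical.

```python
rawdata = [
    (1, "test1-val1,test2-val2,test3-val3"),
    (1, "test1-val1,test3-val3,test5-val5"),
]

def to_struct_array(experiments: str):
    # Mirrors ARRAY_AGG(STRUCT(SPLIT(param, '-')[OFFSET(0)] AS experiment,
    #                          SPLIT(param, '-')[OFFSET(1)] AS value))
    result = []
    for param in experiments.split(","):
        parts = param.split("-")
        result.append({"experiment": parts[0], "value": parts[1]})
    return result

rows = [(row_id, to_struct_array(exp)) for row_id, exp in rawdata]

# Value of experiment 'test2' in each row, or None when the row does not have it
test2 = [
    next((e["value"] for e in arr if e["experiment"] == "test2"), None)
    for _, arr in rows
]
print(test2)  # ['val2', None]
```

Because each row carries its own list of experiments, rows with different sets of keys fit in the same column without needing a fixed set of output columns.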

BigQuery Error In where Clause with a String type column

I have a column that contains strings like '0 0PAA01 CF101 -S07'. Some records have this value, but when I try to retrieve them with BigQuery the query returns no records.
I am doing:
select *
from table
where column='0 0PAA01 CF101 -S07'
Is this a BigQuery problem?
I'm 99% sure this is not a BigQuery problem - but that the strings are in fact different.
Look at this:
SELECT MD5('my name is Felipe Hoffa') from_keyboard
, MD5(str) from_db
, str='my name is Felipe Hoffa' equal_are_they
FROM (
SELECT 'my name is\tFelipe Hoffa' str -- \t: the stored value has a tab, not a space
)
Why are they different? One has a tab instead of a space.
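A small Python sketch reproduces the effect and shows two ways to deal with it: repr() makes the hidden tab visible, and collapsing runs of whitespace makes the comparison succeed. The stored value here is hypothetical. In BigQuery, a comparable check might be REGEXP_REPLACE(column, r'\s+', ' ') = '0 0PAA01 CF101 -S07', and TO_CODE_POINTS(column) can reveal which characters are actually stored; whether normalising is appropriate depends on your data.

```python
import hashlib
import re

from_keyboard = "my name is Felipe Hoffa"
from_db = "my name is\tFelipe Hoffa"  # hypothetical stored value containing a tab

print(from_keyboard == from_db)  # False
print(hashlib.md5(from_keyboard.encode()).hexdigest() == hashlib.md5(from_db.encode()).hexdigest())  # False
print(repr(from_db))  # the tab shows up as \t

def normalize(s: str) -> str:
    # Collapse any run of whitespace (tabs, newlines, repeated spaces) into one space
    return re.sub(r"\s+", " ", s).strip()

print(normalize(from_db) == from_keyboard)  # True
```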