Splitting a string and converting to integer in BigQuery - sql

I have a simple problem, but I have only just started using Google BigQuery and found its help pages too complex for me.
I have a column named ANSWER with values like these in some rows:
9
10 - Certainly Satisfied.
7 -
My aim is to take the part of the value before the "-" sign and convert it to an integer. I found functions such as split() and regexp_extract(), but I am not sure how to apply them to my data.
Thanks in advance for your help :)

If the number is always first, you can use:
select sum(safe_cast(split(answer, '-')[ordinal(1)] as int64))
from t;
Note: It looks like you have spaces, so you might really want to split on the space:
select sum(safe_cast(split(answer, ' ')[ordinal(1)] as int64))
from t;

Consider the option below:
select answer,
safe_cast(regexp_extract(trim(answer), r'^\d+') as int64) as score
from `project.dataset.table`
Applied to the sample data in your question, this returns the scores 9, 10 and 7.
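The regular-expression approach can be sanity-checked outside BigQuery. Below is a minimal Python sketch that mirrors the logic of trimming, extracting the leading digits and casting safely. The helper name `leading_int` and the extra "no score" sample are hypothetical, and Python's `re` module only stands in for BigQuery's regex engine here; for this simple pattern the two should agree.

```python
import re
from typing import Optional


def leading_int(answer: Optional[str]) -> Optional[int]:
    """Return the integer at the start of `answer`, or None if there is none."""
    if answer is None:
        return None
    # Same idea as regexp_extract(trim(answer), r'^\d+'), restricted to ASCII digits.
    match = re.match(r"[0-9]+", answer.strip())
    # Like safe_cast, give back None instead of raising when nothing matches.
    return int(match.group()) if match else None


samples = ["9", "10 - Certainly Satisfied.", "7 -", "no score"]
scores = [leading_int(s) for s in samples]
print(scores)  # [9, 10, 7, None]
```

Rows without a leading number come back as None, which corresponds to the NULL that safe_cast would produce.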

BigQuery - JSON_EXTRACT only extracts first entry

I have a column containing a json-string as follows:
[{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"},{"answer":"europe-italy","text":"Italien"},{"answer":"europe-france","text":"Frankreich"}]
I want to extract ALL answers into ONE column and row, separated by commas:
europe-austria-swiss, europe-italy, europe-france
I think I have tried every possibility offered by JSON_EXTRACT and JSON_EXTRACT_ARRAY, as well as replacing brackets and other characters, but either I get only the first entry (in this case europe-austria-swiss), or the result is split into rows as an array from which I can no longer extract the "answer" strings.
Has anyone any idea on how to solve that problem? It's very much appreciated!
This column is of course part of a much larger table (if that is relevant anyhow).
I think I know what's going on (please, correct me if I'm wrong).
My best guess is that you are trying something like:
SELECT JSON_EXTRACT(json_text, "$.answer") AS answers
FROM UNNEST([
'{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"},{"answer":"europe-italy","text":"Italien"},{"answer":"europe-france","text":"Frankreich"}'
]) as json_text
This returns:
"europe-austria-swiss"
However, if you change the underlying data to something like this (each element a separate JSON object string), it should resolve the issue:
SELECT JSON_EXTRACT(json_text, "$.answer") AS answers
FROM UNNEST([
'{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"}',
'{"answer":"europe-italy","text":"Italien"}',
'{"answer":"europe-france","text":"Frankreich"}'
]) as json_text
Result:
"europe-austria-swiss"
"europe-italy"
"europe-france"
Hope this helps!
The following is for BigQuery Standard SQL:
#standardSQL
SELECT (
  SELECT STRING_AGG(JSON_EXTRACT_SCALAR(answer, '$.answer'), ', ')
  FROM UNNEST(JSON_EXTRACT_ARRAY(json_string)) answer
) AS answers
FROM `project.dataset.table`
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT '[{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"},{"answer":"europe-italy","text":"Italien"},{"answer":"europe-france","text":"Frankreich"}]' json_string
)
SELECT (
  SELECT STRING_AGG(JSON_EXTRACT_SCALAR(answer, '$.answer'), ', ')
  FROM UNNEST(JSON_EXTRACT_ARRAY(json_string)) answer
) AS answers
FROM `project.dataset.table`
with the result
Row answers
1 europe-austria-swiss, europe-italy, europe-france
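The same three steps (split the JSON array into elements, pull the "answer" field out of each element, join the values) can be sketched in Python to check the expected output. This is only an illustration of the logic, not BigQuery code; the function name `join_answers` and its `sep` parameter are hypothetical.

```python
import json

json_string = (
    '[{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"},'
    '{"answer":"europe-italy","text":"Italien"},'
    '{"answer":"europe-france","text":"Frankreich"}]'
)


def join_answers(raw: str, sep: str = ", ") -> str:
    """Parse a JSON array of objects and join their "answer" values."""
    items = json.loads(raw)  # comparable to JSON_EXTRACT_ARRAY
    # Comparable to JSON_EXTRACT_SCALAR(..., '$.answer'); objects without the key are skipped.
    answers = [item["answer"] for item in items if "answer" in item]
    return sep.join(answers)  # comparable to STRING_AGG


answers = join_answers(json_string)
print(answers)  # europe-austria-swiss, europe-italy, europe-france
```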

Can I combine like and equal to get data?

I have data like this
1234500010
1234500020
1234500021
12345600010
12345600011
123456700010
123456700020
123456710010
The pattern is
1-data (a variable-length number of 3 to 7 digits) + 2-data (any 3-digit number) + 3-data (any 2-digit number)
I want to write SQL that filters on 1-data only.
For example, for 1-data = 12345
I want only these rows in the result:
1234500010
1234500020
1234500021
If I use LIKE,
select *
FROM data
where ID like '12345%'
I get all the rows starting with 12345, including those whose 1-data is 123456 or 1234567.
If I use equality, I only get one specific row.
Can I combine LIKE and equality to get the result I want?
select * FROM data where data = '12345 + any 2-data(3 digit) + any 3-data(2 digit)'
Can anyone help?
Addition: Sorry that I didn't mention the data type, which caused some miscommunication. The column is of type CHAR. @Gordon's answer and the others are not wrong: they work for NUMBER and VARCHAR, but not for CHAR. The Oracle CHAR type is fixed-length, so if I store a value shorter than the declared length, the remainder is padded with spaces.
Thank you very much. I hope someone can help with this.
Since your data type is CHAR, Gordon's answer does not work for you. CHAR pads strings shorter than the declared length with trailing spaces. You can use TRIM to fix this, as shown below. Preferably, though, store numbers in the NUMBER type rather than CHAR or VARCHAR2, which will create other problems sooner or later.
select *
from data
where trim(ID) like '12345_____';
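The effect of CHAR padding on the LIKE pattern can be reproduced with a small Python sketch. It uses an in-memory SQLite database purely as a stand-in for Oracle, and simulates a CHAR(15) column by right-padding each value with spaces; the length 15 is an assumption, since the question does not state the declared length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (ID TEXT)")

ids = ["1234500010", "1234500020", "1234500021",
       "12345600010", "12345600011",
       "123456700010", "123456700020", "123456710010"]
# Simulate a fixed-length CHAR(15) column: pad every value with trailing spaces.
conn.executemany("INSERT INTO data VALUES (?)", [(i.ljust(15),) for i in ids])

# Without trimming, the 10-character pattern cannot match a 15-character value.
untrimmed = conn.execute(
    "SELECT ID FROM data WHERE ID LIKE '12345_____'").fetchall()

# After trimming, only the 10-character IDs starting with 12345 match.
trimmed = [row[0] for row in conn.execute(
    "SELECT trim(ID) FROM data WHERE trim(ID) LIKE '12345_____' ORDER BY 1")]

print(untrimmed)  # []
print(trimmed)    # ['1234500010', '1234500020', '1234500021']
```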
I think you want:
select *
from data
where ID like '12345_____' -- exactly 5 _
Here is a rextester demonstrating the answer.
You really can't combine equality and LIKE. But you can use a regular expression to do this kind of searching, with the REGEXP_LIKE function:
SELECT *
FROM DATA
WHERE REGEXP_LIKE(ID, '^12345[0-9]{3}[0-9]{2}$');
But if I understand correctly, for your 1-data you really want a 3 to 7 digit number:
SELECT *
FROM DATA
WHERE REGEXP_LIKE(ID, '^[0-9]{3,7}[0-9]{3}[0-9]{2}$');
Oracle regular expression docs here
SQLFiddle here
Best of luck.
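One detail worth checking with the regular-expression approach is the end anchor: REGEXP_LIKE succeeds if the pattern matches any part of the string, so a pattern without a trailing `$` also accepts longer IDs. A small Python sketch, using the `re` module as a stand-in for Oracle's regex engine (an assumption; the two should agree on this simple pattern), shows the difference on the sample data:

```python
import re

ids = ["1234500010", "1234500020", "1234500021",
       "12345600010", "12345600011",
       "123456700010", "123456700020", "123456710010"]

# No end anchor: anything that starts with 12345 plus five more digits matches.
unanchored = re.compile(r"^12345[0-9]{3}[0-9]{2}")
# With the end anchor the whole value must be exactly 12345 + 3 digits + 2 digits.
anchored = re.compile(r"^12345[0-9]{3}[0-9]{2}$")

loose = [i for i in ids if unanchored.search(i)]
strict = [i for i in ids if anchored.search(i)]

print(len(loose))  # 8
print(strict)      # ['1234500010', '1234500020', '1234500021']
```

With a CHAR column the trailing spaces would still have to be trimmed before an anchored pattern can match.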
I think this gives you the solution you want:
create table data(ID number(15));
insert into data values(1234500010);
insert into data values(1234500020);
insert into data values(1234500021);
insert into data values(12345600010);
insert into data values(12345600011);
insert into data values(123456700010);
insert into data values(123456700020);
insert into data values(123456710010);
select * from data where ID like '12345_____';
-- The 5 underscores after 12345 match exactly 5 characters: the 3 digits of 2-data and the 2 digits of 3-data.
The output is:
ID
1234500010
1234500020
1234500021
3 rows returned in 0.00 seconds

Translate function not returning relevant string in amazon redshift

I am trying to use a simple TRANSLATE function to remove "-" from a 23-character string. One example of such a string is "1049477-1623095-2412303". The expected output of my query is 104947716230952412303.
All such strings are stored in a single column named "data" of the table "table1".
My query is
Select TRANSLATE(t.data, '-', '')
from table1 as t
However, it is returning 104947716230952000000 as the output.
At first I thought it was an overflow error, since the resulting integer has 21 digits, so I also tried the following:
SELECT CAST(TRANSLATE(t.data,'-','') AS VARCHAR)
from table1 as t
but this does not work either.
Please suggest a way to get the desired output.
This is too long for a comment.
This code:
select translate('1049477-1623095-2412303', '-', '')
is going to return:
'104947716230952412303'
The return value is a string, not a number.
There is no way that it can return '104947716230952000000'. I could only imagine that happening if somehow the value is being converted to a numeric or bigint type.
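One way the reported value could arise (an assumption, not something the question confirms) is that a client tool or an implicit cast treats the result as a double-precision number. The Python sketch below illustrates that hypothesis: removing the hyphens yields the full 21-digit string, which is too large for a 64-bit integer, and a double keeps only about 15 to 17 significant digits.

```python
s = "1049477-1623095-2412303"

# Same idea as TRANSLATE(data, '-', ''): the result is still a string.
digits = s.replace("-", "")
print(digits)       # 104947716230952412303
print(len(digits))  # 21

# A 21-digit value does not fit into a signed 64-bit integer (BIGINT).
fits_bigint = int(digits) <= 2**63 - 1

# Converting to a double loses the low-order digits.
as_float = float(digits)
shown = "%.15g" % as_float
print(shown)        # 1.04947716230952e+20
```

Shown with 15 significant digits and zero-filled, that is exactly the 104947716230952000000 from the question, so it is worth checking how the client or any downstream cast handles the column.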
Try regexp_replace()
Taking your own example, execute:
select regexp_replace('[string / column_name]','-');
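It can be achieved with RPAD; try the code below. Note that the result has 21 digits, so the target length must be at least 21, otherwise RPAD truncates the string.
SELECT RPAD(TRANSLATE(CAST(t.data as VARCHAR),'-',''), 21, '0')
from table1 as t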

How to get substring from a sql table?

So I have a column (called account_uri) in a postgres table that looks like this:
/randomCharacters/123456/randomNumbers
I need to query for the substring in the middle, which is a string of characters between two / symbols.
My current attempt looked like this:
SELECT
REVERSE(SUBSTRING(REVERSE([account_uri]),0,CHARINDEX('/',REVERSE(account_uri))))
FROM exp_logs
LIMIT 15
Which selects only the randomNumbers and not the desired numbers.
I tried to build on that idea though and used
(SUBSTRING(REVERSE(SUBSTRING(REVERSE([account_uri]),CHARINDEX('/',REVERSE(account_uri)))),1,CHARINDEX('/',REVERSE(SUBSTRING(REVERSE([account_uri]),CHARINDEX('/',REVERSE(account_uri)))))))
but that only returns a bunch of / symbols and no numbers at all.
If anyone can help me query for this substring, I would be immensely grateful
select
split_part(account_uri, '/', 3)
from exp_logs;
works.
http://www.postgresql.org/docs/9.3/static/functions-string.html
Compatibility: 8.3+
Fiddle: http://sqlfiddle.com/#!15/5e931/1
A couple of solutions, sorted by fastest first:
SELECT split_part(account_uri, '/', 3) AS solution_1 -- Neil's solution
,substring(account_uri,'^/.*?/(.*?)/') AS solution_2
,substring(account_uri,'^/[^/]*/(\d*)') AS solution_3
,(string_to_array(account_uri,'/'))[3] AS solution_4
FROM (
VALUES
('/randomCharacters/123456/randomNumbers')
,('/v o,q9063´6qu/24734782/2369872986')
,('/DFJTDZTJ/1/234567')
,('/ ijgtoqu29836UGB /999/29672')
) exp_logs(account_uri);
@Neil's solution proved fastest in a quick test on a table with 30k rows.
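Why the field index is 3 and not 2 is easy to see in a small Python sketch: because the value starts with "/", the first field is an empty string. The helper `split_part` below is a hypothetical, rough imitation of the PostgreSQL function (1-based index, empty string when the field does not exist), and the regex variant mirrors solution_3.

```python
import re

uris = [
    "/randomCharacters/123456/randomNumbers",
    "/DFJTDZTJ/1/234567",
    "/ ijgtoqu29836UGB /999/29672",
]


def split_part(text: str, delimiter: str, n: int) -> str:
    """Rough imitation of PostgreSQL split_part: 1-based field, '' if absent."""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


# Field 1 is the empty string before the leading "/", so the middle part is field 3.
first_fields = [split_part(u, "/", 1) for u in uris]
by_split = [split_part(u, "/", 3) for u in uris]

# Same result with a regex: skip the first segment, capture the digits of the second.
by_regex = [re.match(r"^/[^/]*/(\d*)", u).group(1) for u in uris]

print(by_split)  # ['123456', '1', '999']
```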

How can I SELECT DISTINCT on the last, non-numerical part of a mixed alphanumeric field?

I have a data set that looks something like this:
A6177PE
A85506
A51SAIO
A7918F
A810004
A11483ON
A5579B
A89903
A104F
A9982
A8574
A8700F
And I need to find all the ENDings where they are non-numeric. In this example, that means PE, AIO, F, ON, B and F.
In pseudocode, I'm imagining I need something like
SELECT DISTINCT X FROM
(SELECT SUBSTR(COL,[SOME_CLEVER_LOGIC]) AS X FROM TABLE);
Any ideas? Can I solve this without learning regexp?
EDIT: To clarify, my data set is a lot larger than this example. Also, I'm only interested in the part of the string AFTER the numeric part. If the string is "A6177PE" I want "PE".
Disclaimer: I don't know Oracle SQL. But, I think something like this should work:
SELECT DISTINCT X FROM
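(SELECT SUBSTR(COL,REGEXP_INSTR(COL, '[[:alpha:]]+$')) AS X FROM TABLE);
REGEXP_INSTR(COL, '[[:alpha:]]+$') should return the position of the first character of the run of letters at the end of the field. Note that it returns 0 when the value does not end in a letter, in which case SUBSTR returns the whole value, so such rows may need to be filtered out.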
For readability, I'd recommend the REGEXP_SUBSTR function (provided there are no performance issues, of course, as this is definitely slower than the accepted solution).
It is similar to REGEXP_INSTR, but instead of returning the position of the substring, it returns the substring itself:
SELECT DISTINCT REGEXP_SUBSTR(MY_COLUMN, '[a-zA-Z]+$') FROM MY_TABLE;
([[:alpha:]] is supported as well, as @Audun wrote.)
Also useful: Oracle Regexp Support (beginning page)
For example, without regular expressions:
SELECT SUBSTR(col,INSTR(TRANSLATE(col,'A0123456789','A..........'),'.',-1)+1)
FROM table;
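Both ideas (take the trailing run of letters with a regex, or take everything after the last digit as in the TRANSLATE/INSTR query) can be compared on the sample data with a short Python sketch. The function names are hypothetical and Python only stands in for Oracle here; an empty result is mapped to None to mimic Oracle treating an empty string as NULL.

```python
import re
from typing import Optional

codes = ["A6177PE", "A85506", "A51SAIO", "A7918F", "A810004", "A11483ON",
         "A5579B", "A89903", "A104F", "A9982", "A8574", "A8700F"]


def suffix_regex(code: str) -> Optional[str]:
    """Trailing run of letters, like REGEXP_SUBSTR(col, '[a-zA-Z]+$')."""
    match = re.search(r"[A-Za-z]+$", code)
    return match.group() if match else None


def suffix_after_last_digit(code: str) -> Optional[str]:
    """Everything after the last digit, like the TRANSLATE/INSTR approach."""
    last = max((i for i, ch in enumerate(code) if ch in "0123456789"), default=-1)
    return code[last + 1:] or None


# DISTINCT over the non-NULL results.
endings = sorted({s for s in map(suffix_regex, codes) if s})
print(endings)  # ['B', 'F', 'ON', 'PE', 'SAIO']
```

On this sample both functions return the same value for every row.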