How to add up a string of numbers using SQL (BigQuery)?

I have a string of numbers like this:
670000000000100000000000000000000000000000000000000000000000000
I want to add up these digits, which in the above example would give 14: 6+7+0+...+1+0+...+0+0+0 = 14
How would I do this in BigQuery?

Consider below approach
with example as (
select '670000000000100000000000000000000000000000000000000000000000000' as s
)
select s, (select sum(cast(num as int64)) from unnest(split(s,'')) num) result
from example
with output of 14 for the sample string (6 + 7 + 1)

Yet another [fun] option
create temp function sum_digits(expression string)
returns int64
language js as """
return eval(expression);
""";
with example as (
select '670000000000100000000000000000000000000000000000000000000000000' as s
)
select s, sum_digits(regexp_replace(replace(s, '0', ''), r'(\d)', r'+\1')) result
from example
with the same output of 14
What it does is:
first, it transforms the initial long string into a shorter one: 671
then, it transforms that into an expression: +6+7+1
and finally, it passes the expression to the JavaScript eval function (unfortunately BigQuery does not have [hopefully yet] an eval function of its own)
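For reference, a quick way to see those intermediate steps is a small query over the same sample string (a minimal sketch for illustration only):
with example as (
select '670000000000100000000000000000000000000000000000000000000000000' as s
)
select
replace(s, '0', '') as step1, -- 671
regexp_replace(replace(s, '0', ''), r'(\d)', r'+\1') as step2 -- +6+7+1
from example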


Google Bigquery Complex UDF

I am creating a UDF in BigQuery to call in a more powerful query. The input to the UDF is a string made up of numbers and different length units. There are three main cases; see below for an explanation of each case along with an example. The "#" represents a number, and "Unit" represents one of 3 distance units (Miles, Yards, and Furlongs). Case 3 consists of two different units that are added together. The end goal of the UDF is to normalize the input to one unit (yards), remove any alphabetic characters, and convert any complex fractions to floats. The UDF would then return a string.
Cases:
'# Unit'; Example: Input '350 Y', Output '350'
'# #/#Unit'; Example: Input '5 1/2F', Output '1210'
'#Unit1 #Unit2'; Example: Input '4F 70Y', Output '950.002'
I have tried to do this using IF statements. In my first attempt at this, I could only get rid of the complex fraction and the two units that get added to each other. Is there a way to chain many IF statements to hit all possible combinations? I haven't found a way to use else-if conditional statements. Any advice, guidance, or code would be very much appreciated. I am relatively new to SQL/BigQuery, so please let me know if I am doing this in a bad way. Below is my first-attempt code:
CREATE OR REPLACE FUNCTION `location`(str STRING) AS ((
  IF(
    REGEXP_CONTAINS(str, r' ') AND REGEXP_CONTAINS(str, r'/') = FALSE  # does not contain /
    , (1760 * SAFE_CAST(SPLIT(REGEXP_REPLACE(str, 'M', ''), ' ')[OFFSET(0)] AS FLOAT64)) + SAFE_CAST(SPLIT(str, ' ')[OFFSET(1)] AS FLOAT64)
    , IF(
        REGEXP_CONTAINS(str, r'/'),
        SAFE_CAST(SPLIT(str, ' ')[OFFSET(0)] AS FLOAT64) + SAFE_CAST(SPLIT(SPLIT(str, ' ')[OFFSET(1)], '/')[OFFSET(0)] AS INT64) / SAFE_CAST(SPLIT(SPLIT(str, ' ')[OFFSET(1)], '/')[OFFSET(1)] AS INT64),
        IFNULL(SAFE_CAST(REGEXP_REPLACE(str, r'FYM', '') AS FLOAT64), -1)
      )
  )
));
Consider below
create temp function eval (str string) returns float64
language js as r"""
return eval(str);
""";
select str,
(
select sum(
case right(x,1)
when 'M' then 1760
when 'F' then 220
when 'Y' then 1
end * eval(replace(trim(translate(x, 'MFY', '')), ' ', '+')))
from unnest(regexp_extract_all(str, r'[^MFY]+(?:M|F|Y)')) val,
unnest([struct(trim(val) as x)])
) yards
from your_table
If applied to the sample data in your question, the output is 350, 1210, and 950 yards for the three example inputs.
Update (per recent comments): you can package the whole thing into a JS UDF plus a SQL UDF, as in the example below
create temp function eval (str string) returns float64
language js as r"""
return eval(str);
""";
create temp function to_yards(str string) as ((
select sum(
case right(x,1)
when 'M' then 1760
when 'F' then 220
when 'Y' then 1
end * eval(replace(trim(translate(x, 'MFY', '')), ' ', '+')))
from unnest(regexp_extract_all(str, r'[^MFY]+(?:M|F|Y)')) val,
unnest([struct(trim(val) as x)])
));
select str, to_yards(str) as yards
from your_table
with the same output as above
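For a quick end-to-end test, the sample strings from the question can be supplied inline in place of `your_table` (a sketch meant to be run together with the two temp function definitions above; the table name is just a placeholder):
with your_table as (
select '350 Y' as str union all
select '5 1/2F' union all
select '4F 70Y'
)
select str, to_yards(str) as yards
from your_table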

cast string to array<float64> in standard-sql

I've imported a csv into a table in BigQuery, and some columns hold a string which really should be a list of floats.
So now I'm trying to convert these strings to arrays. Thanks to SO I've managed to convert a string to a list of floats, but now I get one row per element of the list instead of one row per initial row, i.e. the array is "unnested". That's a problem, as it would generate a huge amount of duplicated data. Would someone please know how to do the conversion from STRING to ARRAY<FLOAT64>?
partial code:
with tbl as (
select "id1" as id,
"10000\t10001\t10002\t10003\t10004" col1_str,
"10000\t10001.1\t10002\t10003\t10004" col2_str
)
select id, cast(elem1 as float64) col1_floatarray, cast(elem2 as float64) col2_floatarray
from tbl
, unnest(split(col1_str, "\t")) elem1
, unnest(split(col2_str, "\t")) elem2
expected:
1 row, with 3 columns of types STRING id, ARRAY<FLOAT64> col1_floatarray, ARRAY<FLOAT64> col2_floatarray
Thank you!
Use below
select id,
array(select cast(elem as float64) from unnest(split(col1_str, "\t")) elem) col1_floatarray,
array(select cast(elem as float64) from unnest(split(col2_str, "\t")) elem) col2_floatarray
from tbl
If applied to the sample data in your question, the output is a single row with the STRING id and two five-element ARRAY<FLOAT64> columns.

how can I convert a number to a string, extract each digit, cast it again to integers, and add them all together (e.g. 1+2+3+4+5+6) in SQL?

I'm thinking I have to create a view that stores the cast as text and utilize the substring function in SQL, but I'm a little lost. Any help would be greatly appreciated.
You can convert a string to rows using unnest() and string_to_array() and then add the digits using sum()
select sum(digit::int)
from unnest(string_to_array(12345::text, null)) as x(digit)
You can format the number
select to_char(12345, '0+0+0+0+0+0+0');
to_char
----------------
0+0+1+2+3+4+5
Inside a function, you can run this statement as a dynamic query to get the result
EXECUTE 'SELECT ' || to_char(12345, '0+0+0+0+0+0+0');
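A minimal sketch of doing that in a PL/pgSQL function (the name sum_digits_7 and the fixed 7-digit format mask are assumptions; the mask has to cover the number of digits):
CREATE FUNCTION sum_digits_7(n integer) RETURNS integer
LANGUAGE plpgsql AS $$
DECLARE
  result integer;
BEGIN
  -- build an expression like ' 0+0+1+2+3+4+5' and evaluate it dynamically
  EXECUTE 'SELECT ' || to_char(n, '0+0+0+0+0+0+0') INTO result;
  RETURN result;
END;
$$;

SELECT sum_digits_7(12345);  -- 15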
You can use a recursive CTE:
WITH RECURSIVE qsum AS (
SELECT 123456 AS num,
0 AS sum
UNION ALL
SELECT num / 10,
sum + num % 10
FROM qsum
WHERE num > 0
)
SELECT max(sum)
FROM qsum;
max
═════
21
(1 row)
You can get there with regexp_split_to_table, using the negative lookahead '(?!^)' to split between every character.
regexp_split_to_table(cast(<your column> as text),'(?!^)') as s
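A self-contained version of the same idea, using a literal value in place of the column (a sketch for illustration):
SELECT SUM(s::int)
FROM regexp_split_to_table(12345::text, '(?!^)') AS s;  -- returns 15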
You can shred the string to rows, then sum:
SELECT SUM(CAST(n as INTEGER))
FROM regexp_split_to_table('01234567', '') as n
But this sort of string math is probably a better fit for calling code. If you are trying to validate new records, consider doing it outside of the DB.

How to get the first field from an anonymous row type in PostgreSQL 9.4?

=# select row(0, 1) ;
row
-------
(0,1)
(1 row)
How can I get the 0 within the same query? I figured out that the below sort of works, but is there any simpler way?
=# select json_agg(row(0, 1))->0->'f1' ;
?column?
----------
0
(1 row)
No luck with array-like syntax [0].
Thanks!
Your row type is anonymous and therefore you cannot access its elements easily. What you can do is create a TYPE and then cast your anonymous row to that type and access the elements defined in the type:
CREATE TYPE my_row AS (
x integer,
y integer
);
SELECT (row(0,1)::my_row).x;
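As a follow-up usage note, casting to the named type also lets you expand all of its fields at once:
SELECT (row(0,1)::my_row).*;  -- returns x = 0, y = 1 as two columns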
As Craig Ringer commented on your question, you should avoid producing anonymous rows to begin with if you can help it, and give explicit types to whatever data you use in your data model and queries.
If you just want the first element from any row, convert the row to JSON and select f1...
SELECT row_to_json(row(0,1))->'f1'
Or, if you are always going to have two integers or a strict structure, you can create a temporary table (or type) and a function that selects the first column.
CREATE TABLE tmptable(f1 int, f2 int);
CREATE FUNCTION gettmpf1(tmptable) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;
SELECT gettmpf1(ROW(0,1));
Resources:
https://www.postgresql.org/docs/9.2/static/functions-json.html
https://www.postgresql.org/docs/9.2/static/sql-expressions.html
The json solution is very elegant. Just for fun, this is a solution using regexp (much uglier):
WITH r AS (SELECT row('quotes, "commas",
and a line break".',null,null,'"fourth,field"')::text AS r)
--WITH r AS (SELECT row('',null,null,'')::text AS r)
--WITH r AS (SELECT row(0,1)::text AS r)
SELECT CASE WHEN r.r ~ '^\("",' THEN ''
WHEN r.r ~ '^\("' THEN regexp_replace(regexp_replace(regexp_replace(right(r.r, -2), '""', '\"', 'g'), '([^\\])",.*', '\1'), '\\"', '"', 'g')
ELSE (regexp_matches(right(r.r, -1), '^[^,]*'))[1] END
FROM r
When converting a row to text, PostgreSQL uses quoted CSV formatting. I couldn't find any tools for importing quoted CSV into an array, so the above is a crude text manipulation via mostly regular expressions. Maybe someone will find this useful!
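For illustration, the text form of a row value looks like this (fields containing commas or quotes are quoted, inner quotes are doubled, and NULLs become empty fields):
SELECT row('a,b', 'c"d', NULL)::text;  -- ("a,b","c""d",)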
With PostgreSQL 13+, you can just reference individual elements in the row with .fN notation. For your example:
select (row(0, 1)).f1; --> returns 0.
See https://www.postgresql.org/docs/13/sql-expressions.html#SQL-SYNTAX-ROW-CONSTRUCTORS

Selecting data into a Postgres array

I have the following data:
name  id  url
John  1   someurl.com
Matt  2   cool.com
Sam   3   stackoverflow.com
How can I write an SQL statement in Postgres to select this data into a multi-dimensional array, i.e.:
{{John, 1, someurl.com}, {Matt, 2, cool.com}, {Sam, 3, stackoverflow.com}}
I've seen this kind of array usage before in Postgres but have no idea how to select data from a table into this array format.
Assuming here that all the columns are of type text.
You cannot use array_agg() to produce multi-dimensional arrays, at least not up to PostgreSQL 9.4.
(But the upcoming Postgres 9.5 ships a new variant of array_agg() that can!)
What you get out of @Matt Ball's query is an array of records (the_table[]).
An array can only hold elements of the same base type. You obviously have number and string types. Convert all columns (that aren't already) to text to make it work.
You can create an aggregate function for this like I demonstrated to you here before.
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
,STYPE = anyarray
,INITCOND = '{}'
);
Call:
SELECT array_agg_mult(ARRAY[ARRAY[name, id::text, url]]) AS tbl_mult_arr
FROM tbl;
Note the additional ARRAY[] layer to make it a multidimensional array (2-dimensional, to be precise).
Instant demo:
WITH tbl(id, txt) AS (
VALUES
(1::int, 'foo'::text)
,(2, 'bar')
,(3, '}b",') -- txt has meta-characters
)
, x AS (
SELECT array_agg_mult(ARRAY[ARRAY[id::text,txt]]) AS t
FROM tbl
)
SELECT *, t[1][1] AS arr_element_1_1, t[3][2] AS arr_element_3_2
FROM x;
You need to use an aggregate function; array_agg should do what you need.
SELECT array_agg(s) FROM (SELECT name, id, url FROM the_table ORDER BY id) AS s;