I have a #StandardSQL query
SELECT
  CAST(created_utc AS STRING),
  author
FROM
  `table`
WHERE
  something = "Something"
which gives me the following error,
Error: Cannot read field 'created_utc' of type STRING as INT64
An example of created_utc is 1517360483
If I understand that error (which I clearly don't), created_utc is stored as a string, but the query is trying, unsuccessfully, to convert it to an INT64. I would have hoped the CAST function would force it to be kept as a string.
What have I done wrong?
The problem is that you don't actually have a single table. In your question, you wrote table, but I suspect that you are querying table*, which matches multiple tables where one of them happens to have a different type for that column. Instead of using table*, your options are to:
Use UNION ALL with the individual tables, performing casts as appropriate in the SELECT lists.
If you know which table(s) have that column as an INT64 instead of a STRING, and you are okay with excluding them, you can use a filter on _TABLE_SUFFIX to skip reading from those tables. A sketch of both options follows.
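For illustration, a minimal sketch of both options, assuming hypothetical table names `dataset.events_2017` (where created_utc is INT64) and `dataset.events_2018` (where it is already a STRING):
#standardSQL
-- Option 1: UNION ALL the individual tables, casting where the type differs
SELECT CAST(created_utc AS STRING) AS created_utc, author FROM `dataset.events_2017`
UNION ALL
SELECT created_utc, author FROM `dataset.events_2018`

-- Option 2: skip the mismatched table(s) via a _TABLE_SUFFIX filter
SELECT CAST(created_utc AS STRING), author
FROM `dataset.events_*`
WHERE something = "Something"
  AND _TABLE_SUFFIX != '2017'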
As Elliott has already pointed out, some of your values actually cannot be cast to INT64 because they do not represent integers and contain characters other than digits.
Using the SELECT below you can identify such values, which will help you locate the problematic entries and then decide on next actions.
#standardSQL
SELECT created_utc, author
FROM `table`
WHERE something = "Something"
AND NOT REGEXP_CONTAINS(created_utc, r'^[0-9]+$')
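If the goal is simply to query past the bad rows (an assumption about your intent), SAFE_CAST, which returns NULL instead of raising an error, may also help; a minimal sketch:
#standardSQL
SELECT SAFE_CAST(created_utc AS INT64) AS created_utc_int, author
FROM `table`
WHERE something = "Something"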
I want to format amounts to salary format, e.g. 10000 becomes 10,000, so I use to_char(amount, '99,999,99')
SELECT SUM(DECODE(e.element_name,'Basic Salary',to_char(v.screen_entry_value,'99,999,99'),0)) Salary,
SUM(DECODE(e.element_name,'Transportation Allowance',to_char(v.screen_entry_value,'99,999,99'),0)) Transportation,
SUM(DECODE(e.element_name,'GOSI Processing',to_char(v.screen_entry_value,'99,999,99'),0)) GOSI,
SUM(DECODE(e.element_name,'Housing Allowance',to_char(v.screen_entry_value,'99,999,99'),0)) Housing
FROM values v,
values_types vt,
elements e
WHERE vt.value_type = 'Amount'
This gives an "invalid number" error because not all values are numbers unless value_type equals 'Amount', but I guess DECODE checks all values anyway. As far as I know, execution begins with FROM, then WHERE, then SELECT. What's going wrong here?
You said you added decode(...), but it looks like you might have actually added sum(decode(...)).
You are converting your values to strings with to_char(v.screen_entry_value,'99,999,99'), so your decode() generates a string - the default 0 will be converted to '0' - giving you a value like '1,234,56'. Then you are aggregating those, so sum() has to implicitly convert those strings to numbers - and it is throwing the error when it tries to do that:
select to_number('1,234,56') from dual
will also get "ORA-01722: invalid number", unless you supply a similar format mask so it knows how to interpret it. You could do that, e.g.:
SUM(to_number(DECODE(e.element_name,'Basic Salary',to_char(v.screen_entry_value,'99,999,99'),0),'99,999,99'))
... but that perhaps makes it more obvious that something strange is happening; and even if you did that, you would end up with a number, not a formatted string.
So instead of doing:
SUM(DECODE(e.element_name,'Basic Salary',to_char(v.screen_entry_value,'99,999,99'),0))
you should format the result after aggregating:
to_char(SUM(DECODE(e.element_name,'Basic Salary',v.screen_entry_value,0)),'99,999,99')
fiddle with dummy tables, data and joins.
I would like to convert this table
to something like this
The long string can be dynamic, so it's important to me that the solution is not hard-coded for these specific values.
Please help, I'm using BigQuery.
You could start by using SPLIT(value[, delimiter]) to convert your long string into separate key-value pairs in an array.
This will be sensitive to you having commas as part of your values.
SPLIT(session_experiments, ',')
Then you could either FLATTEN that array or access each element, and then use some regexes to separate the key and the value.
If you share more context on your restrictions and intended result I could try and put together a query for you that does exactly what you want.
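For illustration, a minimal standard SQL sketch of that approach; the table name is an assumption, and it uses a second SPLIT on '-' rather than a regex:
#standardSQL
SELECT
  id,
  SPLIT(pair, '-')[OFFSET(0)] AS experiment,
  SPLIT(pair, '-')[OFFSET(1)] AS value
FROM `project.dataset.sessions`,  -- hypothetical table
  UNNEST(SPLIT(session_experiments, ',')) AS pair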
What you want is not possible as such; however, there is a better practice for BigQuery.
You can use arrays of structs to store that information in a table.
Let's say you have a table like this.
You can use the sample query below to understand how to use it.
with rawdata AS (
  SELECT 1 as id, 'test1-val1,test2-val2,test3-val3' as experiments union all
  SELECT 1 as id, 'test1-val1,test3-val3,test5-val5' as experiments
)
select
  id,
  (select array_agg(struct(
       split(param, '-')[offset(0)] as experiment,
       split(param, '-')[offset(1)] as value))
   from unnest(split(experiments)) as param) as experiments
from rawdata
The output will show, for each id, an experiments column holding an ARRAY of STRUCT<experiment, value> rows.
Once you have that output, the data is more convenient to manipulate.
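For instance, a hypothetical follow-up that looks up one experiment's value per row (assuming the result above is saved as a table named experiments_table):
SELECT
  id,
  (SELECT e.value FROM UNNEST(experiments) AS e WHERE e.experiment = 'test1') AS test1_value
FROM experiments_table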
I think I am asking the impossible here, but I'm throwing it out there.
Trying to query some json in Athena.
The data I'm working with looks like this (excerpt)
condition={
  "foranyvalue:stringlike":{"s3:prefix":["lala","hehe"]},
  "forallvalues:stringlike":{"s3:prefix":["apples","bananas"]}
}
.. and I need to get to here :
... PLUS:the key names are not fixed, so one day I might get:
condition={"something not seen before":{"surprise":["haha","hoho"]}}
With that last point in mind, I was hoping to treat this as an array, and start by splitting the 'foranyvalue' and 'forallvalues' parts into separate rows.
but with everything wrapped in {}, it refuses to unnest.
But despite the above failed plan - ANY tips on solving this by ANY means gratefully received !
Thank You
When you have JSON data whose schema is hard to describe, you can use STRING as the type of the column and then use Athena/Presto's JSON functions to query it, in combination with casting to MAP and UNNEST to flatten the structures.
One way of achieving what I think you're trying to do would be something like this:
WITH the_table AS (
SELECT CAST(condition AS MAP(VARCHAR, JSON)) AS condition
FROM (
VALUES
(JSON '{"foranyvalue:stringlike":{"s3:prefix":["lala","hehe"]},"forallvalues:stringlike":{"s3:prefix":["apples","bananas"]}}'),
(JSON '{"something not seen before":{"surprise":["haha","hoho"]}}')
) AS t (condition)
),
first_flattening AS (
SELECT
SPLIT(first_level_key, ':', 2) AS first_level_key,
CAST(first_level_value AS MAP(VARCHAR, JSON)) AS first_level_value
FROM the_table
CROSS JOIN UNNEST (condition) AS t (first_level_key, first_level_value)
),
second_flattening AS (
SELECT
first_level_key,
second_level_key,
second_level_value
FROM first_flattening
CROSS JOIN UNNEST (first_level_value) AS t (second_level_key, second_level_value)
)
SELECT
first_level_key[1] AS "for",
TRY(first_level_key[2]) AS condition,
second_level_key AS "left",
second_level_value AS "right"
FROM second_flattening
I've included the two examples you gave as an inline VALUES list in the first CTE. Exactly what to do in the table declaration (i.e. which type to use for the column) and what processing to do in the query (i.e. the cast) depends on your data and how you want/can set up the table. YMMV.
The query flattens the JSON structure in a couple of separate steps, first flattening the first level of keys and values, then the keys and values of the inner documents. It might be possible to do this in one step, but doing it in two at least makes it easier to read.
Since the first-level keys don't always have the colon, I've used TRY to make sure that accessing the second value doesn't break anything. You could perhaps filter out values without a colon earlier and avoid this, since you're not interested in them; a sketch of that variant follows.
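For example, a hypothetical variant of the second flattening step that drops keys without a colon (CARDINALITY counts the elements SPLIT produced), after which the final SELECT no longer needs TRY:
second_flattening AS (
  SELECT
    first_level_key,
    second_level_key,
    second_level_value
  FROM first_flattening
  CROSS JOIN UNNEST (first_level_value) AS t (second_level_key, second_level_value)
  WHERE CARDINALITY(first_level_key) = 2
)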
I am trying to connect a FileMaker DB to a Firebird SQL DB in both directions: import into FM and export back to the Firebird DB.
So far it works using the MBS Plug-in, but FileMaker 13 Pro cannot handle NULL.
That means that, for example, timestamp fields that are empty (NULL) produce a "0" value.
As a timestamp that is something like 01.01.1889 00:00:00.
So my idea was to simply ignore fields containing NULL.
But here my poor knowledge stops.
First I thought I could do this with WHERE, but that excludes whole records:
SELECT * FROM TABLE WHERE FIELD IS NOT NULL
Also I tried to filter it later on like this:
If (IsEmpty (MBS("SQL.GetFieldAsDateTime"; $command; "FIELD") ) = 0 ; MBS("SQL.GetFieldAsDateTime"; $command; "FIELD"))
With no result either.
This is a direct answer to halfbit's suggestion, which is correct but not for this SQL dialect. To provide a replacement value when a field is NULL in a query, you need to use COALESCE(x, y): if x is NULL, y is used, and if all arguments are NULL, the result is NULL. That's why I commonly write it as COALESCE(table.field, '') so that a constant is always output if table.field happens to be NULL.
select COALESCE(null,'Hello') as stackoverflow from rdb$database
You can use COALESCE() with more than two arguments; I just used two for conciseness.
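For instance, a three-argument call in the same Firebird idiom (the first non-NULL argument wins):
select COALESCE(null, null, 'Hello') as stackoverflow from rdb$database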
I don't know the particular SQL dialect, but
SELECT field1, field2, VALUE(field, 0), ... FROM TABLE
should help you:
VALUE returns the first argument, i.e. your field, if it is NOT NULL, or the second argument if it is.
We have a legacy table where one of the columns, part of a composite key, was manually filled with values:
code
------
'001'
'002'
'099'
etc.
Now we have a feature request in which we must know MAX(code) in order to give the user the next possible value; in the example case from above, the next value is '100'.
We tried to experiment with this, but we still can't find any reasonable explanation of how the DB2 engine calculates that
MAX('001', '099', '576') is '576'
MAX('099', '99', 'www') is '99' and so on.
Any help or suggestion would be much appreciated!
You already have the answer for getting the maximum numeric value, but to answer the other part, with regard to 'www', '099', '99':
The AS/400 uses EBCDIC to store values. This differs from ASCII in several ways; the most important for your purposes is that alpha characters come before numbers, which is the opposite of ASCII.
So in your MAX() the three strings are sorted and the highest EBCDIC value is used:
'www'
'099'
'99 '
As you can see, your '99' string is really '99 ' (blank-padded), so it is higher than the one with the leading zero.
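As a hedged illustration, HEX shows the underlying EBCDIC bytes on DB2 for i (lowercase 'w' is X'A6', '0' is X'F0', '9' is X'F9', blank is X'40'), which makes the ordering visible:
-- 'www' -> A6A6A6, '099' -> F0F9F9, '99 ' -> F9F940
SELECT HEX('www'), HEX('099'), HEX('99 ') FROM sysibm.sysdummy1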
Cast it to int before applying max()
For the numeric maximum -- filter out the non-numeric values and cast to a numeric for aggregation:
SELECT MAX(INT(FLD1))
FROM your_table  -- substitute your table name
WHERE FLD1 <> ' '
  AND TRANSLATE(FLD1, ' ', '0123456789') = ' '  -- blank out the digits; an all-blank result means FLD1 held digits only
SQL Reference: TRANSLATE
And the reasonable explanation:
SQL Reference: MAX
MAX works well with your column's type definition. When you want the max of integer values, convert the values to integers before calling MAX. But I see you are mixing MAX with the string 'www'; how do you imagine that works?
Filter to integer-only values, cast them to INT, and call MAX. This is not a well-designed solution, but looking at your problem I think it is enough.
Sharing the solution for PostgreSQL which worked for me.
Suppose temporary_id is of type character in the database. The query below converts the char type to int in the result.
SELECT MAX(CAST (temporary_id AS Integer)) FROM temporary
WHERE temporary_id IS NOT NULL
As per my requirement I've applied the MAX() aggregate function; one can remove it and the cast will work the same way.
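For instance, without the aggregate (a minimal sketch against the same table):
SELECT CAST(temporary_id AS Integer) FROM temporary
WHERE temporary_id IS NOT NULL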