I'm analyzing the data of New York City taxi trips of yellow cars in 2018. (You need a Google BigQuery account to access this data set.)
The schema says that most of the columns are numeric. However, when I tried to calculate the sum of the key dollar figures (tip_amount, tolls_amount, total_amount), I got an error message saying that they are string variables.
SELECT sum(total_amount)
FROM [bigquery-public-data:new_york_taxi_trips.tlc_yellow_trips_2018]
WHERE month(dropoff_datetime) = 12
Error: Field total_amount is of type STRING which is not supported for SUM
I then tried to use the cast() function to convert it to a numeric variable, but that did not work.
SELECT sum(total_amount_numeric) FROM
(
SELECT cast(total_amount as numeric) as total_amount_numeric
FROM [bigquery-public-data:new_york_taxi_trips.tlc_yellow_trips_2018]
WHERE month(dropoff_datetime) = 12
)
Error: Field total_amount_numeric is of type STRING which is not supported for SUM
How can I analyze these numeric variables as I intended, instead of the string variables as they are erroneously set in the database?
Below is for BigQuery Standard SQL
#standardSQL
SELECT SUM(total_amount)
FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018`
WHERE EXTRACT(MONTH FROM dropoff_datetime) = 12
The problem occurs because the NUMERIC data type is not supported by BigQuery Legacy SQL; it is treated as STRING instead and cannot be CAST to either FLOAT or INTEGER.
So the workaround is to use BigQuery Standard SQL, as in the example above. As you can see, you don't need any casting there because this field is already NUMERIC.
Your query will run as follows in Standard SQL:
SELECT sum(total_amount_numeric)
FROM (SELECT cast(total_amount as numeric) as total_amount_numeric
FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018`
WHERE EXTRACT(month FROM dropoff_datetime) = 12
) x;
You can include this hint before the query to ensure that it is run using standard SQL:
#standardSQL
Related
I am writing a customSQL in Tableau to import data from BigQuery.
Columns in BigQuery View
Column1 | VALUE(Type = Numeric)
I want to CAST the "VALUE" column to Float, so that the column appears in Tableau
My current SQL query:
SELECT Column1, CAST(VALUE as Float) from Table 1
The above statement is giving me an error. Any idea?
According to this information on the BigQuery site the data type should be FLOAT64.
CAST(values AS FLOAT64)
(Original Answer pre-comment)
You are able to convert it to a float in Tableau. FLOAT([YourField])
Finally got it to work using the below SQL
SELECT Column1, CAST(VALUE as Float64) AS VALUE from Table 1.
I was missing "AS VALUE" in my previous SQL statement.
Note: Tableau v2020.3 will support Numeric data type
I'm trying to update a table using the code below, but I get the error "SQL Error: ORA-01840: input value not long enough for date format". DUE_ON_DT_WID is a number column with sample records like '20191231', and the expected result in X_NEED_BY_DT is '31-DEC-19'. X_NEED_BY_DT is a date column. Thank you in advance for the help.
update ADW12_DW.W_PURCH_COST_F T
set (
T.X_NEED_BY_DT
) =
(
select
TO_DATE(DUE_ON_DT_WID,'YYYYMMDD')
from ADW12_DW.I$_1200778522_6 S
where T.DATASOURCE_NUM_ID =S.DATASOURCE_NUM_ID
and T.INTEGRATION_ID =S.INTEGRATION_ID
)
where (DATASOURCE_NUM_ID, INTEGRATION_ID)
in (
select DATASOURCE_NUM_ID,
INTEGRATION_ID
from ADW12_DW.I$_1200778522_6
where IND_UPDATE = 'U'
)
There must be an issue with your data: with the format YYYYMMDD, every value must be exactly 8 characters long (the four-digit year comes first in your data).
Please find the issue data using the following query and correct it.
select * from ADW12_DW.I$_1200778522_6 where length(DUE_ON_DT_WID) <> 8
I think you have data like 20191231 and also 191231. Do you want to treat them as the same date, i.e. '31-DEC-19'? If so, use CASE..WHEN with RR/YYYY to format the year, as follows:
CASE WHEN LENGTH(DUE_ON_DT_WID) = 6 THEN
TO_DATE(DUE_ON_DT_WID,'RRMMDD')
WHEN LENGTH(DUE_ON_DT_WID) = 8 THEN
TO_DATE(DUE_ON_DT_WID,'YYYYMMDD')
END
If you are using Oracle 12.2 or higher, you can use the ON CONVERSION ERROR clause in TO_DATE to return a default value when the conversion from your column to a date fails, as follows:
TO_DATE(DUE_ON_DT_WID DEFAULT '20010101' ON CONVERSION ERROR, 'YYYYMMDD' )
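The length-based branching above can be sketched outside the database as well. The following is a minimal Python illustration, not Oracle code: `parse_due_on` is a hypothetical helper name, and Python's `%y` uses a fixed pivot (00-68 map to 20xx), which is not identical to Oracle's RR rule, so treat the two-digit-year branch as an approximation. Unparseable values return `None`, loosely mirroring a default on conversion error.

```python
from datetime import date, datetime


def parse_due_on(value):
    """Parse a numeric date key such as 20191231 or 191231 (hypothetical helper)."""
    text = str(value)
    if len(text) == 8:
        fmt = "%Y%m%d"   # four-digit year, like 'YYYYMMDD'
    elif len(text) == 6:
        fmt = "%y%m%d"   # two-digit year; only approximates Oracle's 'RRMMDD'
    else:
        return None      # wrong length: this is the kind of row that raises ORA-01840
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None      # right length but not a real calendar date


print(parse_due_on(20191231))
print(parse_due_on(191231))
print(parse_due_on(2019))
```

Running the helper over an export of the staging table is one way to list the offending values before attempting the UPDATE.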
Using Google BIGQUERY, I need to check that the values in a column called birth_day_col are the correct and desired date format: YYYY-MM-DD. The values in this column are defined as STRING. Also the values in this column are currently of the following format: YYYY-MM-DD.
I researched a lot on the internet and found an interesting workaround. The following query:
SELECT
DISTINCT birth_day_col
FROM `project.dataset.datatable`
WHERE birth_day_col LIKE '[1-2][0-9][0-9][0-9]/[0-1][0-9]/[0-3][0-9]'
AND country_code = 'country1'
But the result is: "This query returned no results."
I then checked with NOT, using the following code:
SELECT
DISTINCT birth_day_col
FROM `project.dataset.datatable`
WHERE NOT(birth_day_col LIKE '[1-2][0-9][0-9][0-9]/[0-1][0-9]/[0-3][0-9]')
AND country_code = 'country1'
Surprisingly, it returned all the values in birth_day_col, which I have verified are in the correct date format, but this result could very well be a coincidence.
It is also very strange (wrong) that a query which should return only the wrongly formatted dates actually gives me the correct ones. The two queries seem to have swapped roles.
The expected result of any query for this business case is a count of all incorrectly formatted dates (even if currently this is 0).
Thank you for your help!
Robert
A couple of things here:
Read the documentation for the LIKE operator if you want to understand how to use it. It looks like you're trying to use regular expression syntax, but the LIKE operator does not take a regular expression as input.
The standard format for BigQuery's dates is YYYY-MM-DD, so you can just try casting and see if the result is a valid date, e.g.:
SELECT SAFE_CAST(birth_day_col AS DATE) AS birth_day_col
FROM `project`.dataset.table
This will return null for any values that don't have the correct format. If you want to find all of the ones that don't have the correct format, you can use SAFE_CAST inside a filter:
SELECT DISTINCT birth_day_col AS invalid_date
FROM `project`.dataset.table
WHERE SAFE_CAST(birth_day_col AS DATE) IS NULL
The result of this query will be all of the date strings that don't use YYYY-MM-DD format. If you want to check for slashes instead, you can use REGEXP_CONTAINS, e.g. try this:
SELECT
date,
REGEXP_CONTAINS(date, r'^[0-9]{4}/[0-9]{2}/[0-9]{2}$')
FROM (
SELECT '2019/05/10' AS date UNION ALL
SELECT '2019-05-10' UNION ALL
SELECT '05/10/2019'
)
If you want to find all dates with either YYYY-MM-DD format or YYYY/MM/DD format, you can use a query like this:
SELECT
DISTINCT date
FROM `project`.dataset.table
WHERE REGEXP_CONTAINS(date, r'^[0-9]{4}[/\-][0-9]{2}[/\-][0-9]{2}$')
For example:
SELECT
DISTINCT date
FROM (
SELECT '2019/05/10' AS date UNION ALL
SELECT '2019-05-10' UNION ALL
SELECT '05/10/2019'
)
WHERE REGEXP_CONTAINS(date, r'^[0-9]{4}[/\-][0-9]{2}[/\-][0-9]{2}$')
Yet another example for BigQuery Standard SQL, this time using SAFE.PARSE_DATE:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '1980/08/10' AS birth_day_col UNION ALL
SELECT '1980-08-10' UNION ALL
SELECT '08/10/1980'
)
SELECT birth_day_col
FROM `project.dataset.table`
WHERE SAFE.PARSE_DATE('%Y-%m-%d', birth_day_col) IS NULL
The result is the list of all dates that are not formatted as yyyy-mm-dd:
Row birth_day_col
1 1980/08/10
2 08/10/1980
Google BigQuery's LIKE operator does not support matching digits, nor does it use the [ character in its syntax (I don't think ISO standard SQL does either; LIKE is nowhere near as powerful as regex). From the documentation:
X [NOT] LIKE Y
Checks if the STRING in the first operand X matches a pattern specified by the second operand Y. Expressions can contain these characters:
A percent sign "%" matches any number of characters or bytes
An underscore "_" matches a single character or byte
You can escape "\", "_", or "%" using two backslashes. For example, "\\%". If you are using raw strings, only a single backslash is required. For example, r"\%".
You should use REGEXP_CONTAINS instead.
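To see why the bracket pattern matched nothing, here is a small runnable sketch. It uses SQLite through Python's `sqlite3` module rather than BigQuery, on the assumption that both treat `[` and `]` in a LIKE pattern as literal characters and only `%` and `_` as wildcards; the table name `t` and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (birth_day_col TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("1980-08-10",), ("1980/08/10",), ("08/10/1980",)],
)

# Brackets are literal in LIKE, so this only matches a string that
# actually contains "[1-2][0-9]..." and therefore matches no row.
bracket_pattern = "[1-2][0-9][0-9][0-9]/[0-1][0-9]/[0-3][0-9]"
bracket_matches = conn.execute(
    "SELECT birth_day_col FROM t WHERE birth_day_col LIKE ?",
    (bracket_pattern,),
).fetchall()

# Underscore matches exactly one character, so this checks the layout
# "4 chars, dash, 2 chars, dash, 2 chars" but not that they are digits.
wildcard_matches = conn.execute(
    "SELECT birth_day_col FROM t WHERE birth_day_col LIKE '____-__-__'"
).fetchall()

print(bracket_matches)   # []
print(wildcard_matches)  # [('1980-08-10',)]
conn.close()
```

This also explains the asker's NOT(... LIKE ...) result: since no row matches the literal bracket string, negating the condition returns every row.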
I note that string format tests won't tell you if a date is valid or not, however. Consider that 2019-02-31 has a valid date format, but an invalid date value. I suggest using a datatype conversion function (to convert the STRING to a DATE value) instead.
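The difference between a valid format and a valid value can be shown with a short Python sketch. The function names are hypothetical and this is plain Python, not BigQuery; the conversion step only plays the role that a function such as SAFE_CAST plays in the queries above.

```python
import re
from datetime import datetime

DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def has_date_format(text):
    """True if the string looks like YYYY-MM-DD (shape check only)."""
    return DATE_SHAPE.fullmatch(text) is not None


def is_valid_date(text):
    """True if the string has the right shape AND names a real calendar day."""
    if not has_date_format(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


for sample in ["2019-02-28", "2019-02-31", "2019/02/28"]:
    print(sample, has_date_format(sample), is_valid_date(sample))
```

Here 2019-02-31 passes the shape check but fails the conversion, which is exactly the case a pure pattern test would miss.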
I'm a B-grade SQL user, so bear with me. I have a field in NVARCHAR format ("Year"), and only about 1 in 1000 records contains something other than a number. Yes, this is a ridiculous way to do this, but we receive this database from a customer, and we can't change it.
I want to pull records from the database where the year field is greater than something (say, 2006 or later). I can ignore any record whose year doesn't evaluate to an actual year. We are using SQL server 2014.
I have created an embedded query to convert the data to a "float" field, but for whatever reason, I can't add a where clause with this new floating-point field. I originally tried using a "case-if" but I got the same result.
I'm pulling my hair out, as I'm either missing something really silly, or there's a bug in SQL server. When I look at the field in the little hint, it's showing as a float. When I run this, I get "Error converting data type nvarchar to float."
SELECT VL.Field_A,
VL.FLYear,
VL.Field_B
FROM
(select
Field_A,
cast ([Year] as float) as FLYear,
/* didn't work either*/
/*Convert(float, [Year]) as FLYear, */
Field_B
from CustomerProvidedDatabaseTable
where (Field_A like 'E-%' OR
Field_A like 'F-%')
and
(isnumeric(year)=1)
and
year is not null
) VL
/* this statement is the one it chokes on */
where
VL.FLYear >= 2006.0
If I remove the last "where" clause, it works fine, and the field looks like a number. If I change the last where clause to:
where VL.FLYear like '%2006%'
SQL Server accepts it, though of course it doesn't return me all the records I want.
Try to simplify it and just use TRY_CONVERT(DATETIME, aYearvalue) or TRY_PARSE, which return NULL for values they can't convert and continue to process valid rows. I think you can do away with the nested query and just work directly against the column, like this (substitute your column for the literal string after datetime):
SET DATEFORMAT mdy;
SELECT v.value1
FROM (SELECT YEAR(TRY_CONVERT(datetime, '08/01/2017')) AS value1) AS v
WHERE v.value1 >= 2016;
Try cast/convert to a numeric data type. I have modified the last line of your query to do just that. Take a peek.
SELECT
VL.Field_A,
VL.FLYear,
VL.Field_B
FROM
(select
Field_A,
cast ([Year] as float) as FLYear,
/* didn't work either*/
/*Convert(float, [Year]) as FLYear, */
Field_B
from CustomerProvidedDatabaseTable
where (Field_A like 'E-%' OR
Field_A like 'F-%')
and
(isnumeric(year)=1)
and
year is not null
) VL
/* this statement is the one it chokes on */
where
ISNUMERIC(VL.FLYear) = 1
and
CAST(VL.FLYear AS INT) >= 2006
Check out the following link for cast and convert documentation:
https://learn.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql
NOTE: ISNUMERIC will return true (a false positive) for a value written in scientific notation, e.g. 1E10, though I don't see this happening in your data.
Another option is TRY_CONVERT.
Documentation on TRY_CONVERT: https://learn.microsoft.com/en-us/sql/t-sql/functions/try-convert-transact-sql
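The "convert or get NULL, then filter" idea behind TRY_CONVERT can be sketched in Python. This is only an illustration of the pattern, not SQL Server behavior: `try_convert_float` is a hypothetical helper, the sample rows are invented, and Python's `float()` accepts some inputs (for example "inf") that SQL Server would treat differently.

```python
def try_convert_float(value):
    """Return the value as a float, or None if it cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# (Field_A, Year) pairs standing in for the customer-provided table.
rows = [("E-1", "2005"), ("E-2", "2006"), ("F-3", "n/a"), ("F-4", "2011"), ("E-5", None)]

recent = []
for field_a, raw_year in rows:
    year = try_convert_float(raw_year)
    # Unconvertible values become None and are skipped instead of raising.
    if year is not None and year >= 2006:
        recent.append((field_a, year))

print(recent)  # [('E-2', 2006.0), ('F-4', 2011.0)]
```

The point of the pattern is that the conversion itself can never fail, so it no longer matters in which order the filter and the conversion are evaluated.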
Try using CAST. See the link below for more detail about casting.
https://learn.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql
I'm sure I am not the first one to ask this, but I can't find the answer:
I have a select query on a table in a SQLite database.
select *, ((int_EndTime)-(int_StartTime))/60 as dou_usage_min FROM tbl_unautho_usage;
When I run this, I get all the fields from the table, including a new column calculated from two integer columns holding Unix timestamp values. However, I want my calculated column to be of type double. With the query above I get type integer.
select *, ((int_EndTime as float)-(int_StartTime as float))/60 as dou_usage_min FROM tbl_unautho_usage;
I then tried to cast my integer columns to float (the query above), but this gives me the following error:
near "as": syntax error:
I got the idea for that from the following post:
How to cast computed column with correct decimal/$ result
Try multiplying a value used within the arithmetic operation by 1.0.
select
*,
((int_EndTime*1.0)-(int_StartTime*1.0))/60 as dou_usage_min
FROM tbl_unautho_usage;
Multiplying just one of the values is probably sufficient.
The correct syntax of a CAST expression is "CAST(something AS type)".
But in this case, for the division to be done with floating-point numbers, it is sufficient for at least one of the operands to be a floating-point number:
SELECT *, (int_EndTime - int_StartTime) / 60.0 ...
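A quick way to check this behavior is to run the variants side by side with Python's built-in `sqlite3` module. The table and column names come from the question; the single sample row (two timestamps 90 seconds apart) is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_unautho_usage (int_StartTime INTEGER, int_EndTime INTEGER)"
)
conn.execute("INSERT INTO tbl_unautho_usage VALUES (1000, 1090)")  # 90 seconds apart

int_div, float_div, cast_div = conn.execute(
    """
    SELECT (int_EndTime - int_StartTime) / 60                     AS int_div,
           (int_EndTime - int_StartTime) / 60.0                   AS float_div,
           CAST(int_EndTime - int_StartTime AS REAL) / 60         AS cast_div
    FROM tbl_unautho_usage
    """
).fetchone()

print(int_div, float_div, cast_div)  # 1 1.5 1.5
conn.close()
```

Both the `60.0` literal and the explicit `CAST(... AS REAL)` give a floating-point result; the literal is simply the shorter of the two.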