Is there a numeric version of SUBSTR in BigQuery?

This may be a complete noob question. I have spent hours looking for a solution but haven't found one...
I am trying to capture the first digit of each number in a column of numbers. For example:
173563 = 1
247309 = 2
653638 = 6
etc
I know that I could do this using SUBSTR if the value were a string, but that doesn't work for numbers. I have a workaround, but there must be a better way than how I'm currently doing it.
Any help would be much appreciated.
with
-- Convert to string and get the first character
stage1 as (
SELECT
number
, SUBSTR(CAST(number AS STRING), 1, 1) AS cut_off
FROM my_table
),
-- Convert back to a numeric value
stage2 as (
SELECT
number
, CAST(cut_off AS NUMERIC) AS cut_off,
FROM stage1
)
select
number,
cut_off
from stage2

You can use the "trick" below:
with `project.dataset.table` as (
select 173563 number union all
select 247309 union all
select 653638
)
select number, left('' || number, 1) cut_off
from `project.dataset.table`
Obviously, such a trick allows you to use any string function to meet your needs

You can do this all in one expression:
CAST(LEFT(CAST(number AS STRING), 1) as int64)
If the length of the number is always 6 digits, you can use arithmetic:
FLOOR(number / 100000)
Or:
DIV(number, 100000)
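As a minimal sketch, the string-based and arithmetic expressions above can be compared side by side in BigQuery. The CTE name sample and the output column names are made up for illustration, and the DIV column assumes every value has exactly 6 digits, as stated above:

```sql
-- Hypothetical sample data taken from the question's examples.
WITH sample AS (
  SELECT 173563 AS number UNION ALL
  SELECT 247309 UNION ALL
  SELECT 653638
)
SELECT
  number,
  -- Works for any number of digits: cast to STRING, take one character, cast back.
  CAST(LEFT(CAST(number AS STRING), 1) AS INT64) AS first_digit_via_string,
  -- Integer division; only correct when every value has exactly 6 digits.
  DIV(number, 100000) AS first_digit_via_div
FROM sample
```

For these three rows both columns return 1, 2 and 6. Note that FLOOR(number / 100000) returns a FLOAT64, while DIV returns an INT64.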

Related

How to extract a part of a decimal in Hive

I have two columns called quantity and price. Quantity is divided by price. If the result contains decimal, I want the number before the decimal. Or else, the number as it is.
I think you are looking for:
select floor(quantity / price) as output
Casting issues?
select cast(quantity / price as bigint)
By the way, I think you may want this:
Hive Data Types Manual
Also DIV operator can be used (for integer arguments):
with your_data as (
select stack(3, 13,2,8,2,233,2) as(quantity,price)
)
select quantity div price
from your_data d
;
Result:
6
4
116

How to extract first number after decimal point in value

I have an age column which calculates the age of each member in my report. The output is a whole number followed by a decimal point and more digits. I would like only the first digit right after the decimal point.
I tried trunc, but it gives me everything before the decimal point and then the digit I want after it. Then I tried trunc with a comma as the second argument, and it doesn't work.
trunc(age,',')
Example -
age 15.7
expected output 7
Here is the mathematical answer:
Take the decimal part by subtracting the whole part (trunc).
Multiply by 10 and take the whole part.
with age as (select 15.7231 age from dual)
select trunc(10*(age-trunc(age))) dp1 from age
DP1
----------
7
Try the following:
select substr(to_char(15.7,'9999.0'),-1,1) as col from dual
It will return 7.
Multiply by 10, trunc it and take the remainder of the division by 10.
with age as (select 15.7231 age from dual)
select mod(trunc(10*age), 10) dp from age
Output:
DP
--
7
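The three Oracle approaches above agree for 15.7, but they can differ when the second decimal is 5 or higher, because TO_CHAR with the '9999.0' format rounds to one decimal place. A small sketch with a made-up value of 15.78 (the column aliases are illustrative only):

```sql
-- Oracle: compare the approaches on a value whose second decimal is >= 5.
WITH age AS (SELECT 15.78 AS age FROM dual)
SELECT
  TRUNC(10 * (age - TRUNC(age)))        AS dp_subtract, -- 7
  MOD(TRUNC(10 * age), 10)              AS dp_mod,      -- 7
  SUBSTR(TO_CHAR(age, '9999.0'), -1, 1) AS dp_to_char   -- '8', since TO_CHAR rounds 15.78 to 15.8
FROM age;
```

If the digit must be taken without rounding, the TRUNC-based expressions are the safer choice; the TO_CHAR version also returns a string rather than a number.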

Trim a decimal to 2 places Bigquery

I am currently running a query that runs a sum function and also divides this number. Currently I get values like 0.0904246741698848, and 1.6419814808335567. I want these decimals to be trimmed to 2 spaces past the decimal point. Their schema is a float. Here is my code. Thanks for the help.
#standardSQL
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.Firebase_ConnectionInfo`
WHERE PeripheralType = 1 or PeripheralType = 2 or PeripheralType = 12
GROUP BY Serial
ORDER BY Latest_Use DESC
#standardSQL
WITH `data` AS (
SELECT 0.0904246741698848 AS val UNION ALL
SELECT 1.6419814808335567
)
SELECT val, ROUND(val, 2) AS rounded_val
FROM `data`
For example, assuming you want to apply this to your Total_Hours column:
#standardSQL
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
ROUND(SUM(ConnectionTime/3600),2) AS Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.Firebase_ConnectionInfo`
WHERE PeripheralType = 1 OR PeripheralType = 2 OR PeripheralType = 12
GROUP BY Serial
ORDER BY Latest_Use DESC
I found that rounding was problematic if my data had a whole number such as 2.00 and I needed all of my data to show 2 decimal places, as these were prices that end up getting displayed. BigQuery was returning 2.0 no matter what I told ROUND to round to.
Assuming you're working with data that never exceeds 2 decimal places, and it is stored as a STRING, this code will work (if it has more decimal places, add another 0 to the addition for each place).
FORMAT("%.*f",2,CAST(GROSS_SALES_AMT AS FLOAT64) + .0001)
This will take a float in BigQuery and format it with two decimal places.
CAST(SUM(ConnectionTime/3600) AS STRING FORMAT '999,999.99')
Note: Add a currency symbol (e.g., $) for currency ($999,999.99).
You can always use the round() function.
If you need the exact digits after the decimal point (round will round off the value), you can use substr(str(value),precision), which gives the exact output after the decimal point.
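The difference between rounding, truncating and formatting for display can be sketched in BigQuery standard SQL as follows. The CTE name data and the extra 2.0 row are illustrative assumptions, and TRUNC and FORMAT('%.2f', ...) are offered here as alternatives rather than as what the answers above used:

```sql
#standardSQL
WITH data AS (
  SELECT 0.0904246741698848 AS val UNION ALL
  SELECT 1.6419814808335567 UNION ALL
  SELECT 2.0
)
SELECT
  val,
  ROUND(val, 2) AS rounded_val,       -- FLOAT64, rounded to 2 decimal places
  TRUNC(val, 2) AS truncated_val,     -- FLOAT64, extra digits dropped without rounding
  FORMAT('%.2f', val) AS display_val  -- STRING, always shows exactly 2 decimal places
FROM data
```

ROUND and TRUNC keep the value numeric, so a whole number still displays as 2.0. The FORMAT column is a STRING, which is usually only appropriate in the final, display-facing query.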

"bad double value" in Google BigQuery

I'm working in Google BigQuery (not using LegacySQL), and I'm currently trying to cast() a string as a float64. Each time I get the error "Bad double value". I've also tried safe_cast() but it completely eliminates some of my id's (Ex: if one customer repeats 3 times for 3 different dates, and only has 'null' for a single "Height" entry, that customer is completely eliminated after I do safe_cast(), not just the row that had the 'null' value). I don't have any weird string value in my data, just whole or rational numbers or null entries.
Here's my current code:
select id, date,
cast(height as float64) as height,
cast(weight as float64) as weight
from (select id, date, max(height) as height, max(weight) as weight
from table
group by 1,2
)
group by 1, 2
Of course safe_cast() returns NULL values. That is because you have inappropriate values in the data.
You can find these by doing:
select height, weight
from table
where safe_cast(height as float64) is null or safe_cast(weight as float64) is null;
Once you understand what the values are, fix the values or adjust the logic of the query.
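As a minimal sketch of that diagnostic, the query below isolates strings that are present but cannot be converted, while ignoring genuine NULLs. The CTE raw and its sample values (including the literal text 'null') are assumptions made for illustration, not data from the question:

```sql
-- Hypothetical sample: one valid number, one literal 'null' string, one real NULL.
WITH raw AS (
  SELECT '180.5' AS height UNION ALL
  SELECT 'null' UNION ALL
  SELECT CAST(NULL AS STRING)
)
SELECT
  height,
  SAFE_CAST(height AS FLOAT64) AS height_num
FROM raw
-- Keep only rows where a value exists but the conversion failed.
WHERE height IS NOT NULL
  AND SAFE_CAST(height AS FLOAT64) IS NULL
```

With this sample, only the row containing the text 'null' is returned, which is the kind of value that makes a plain CAST fail with "Bad double value".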
If you just want the max of the values that are properly numeric, then cast before the aggregation:
select id, date,
max(safe_cast(height as float64)) as height,
max(safe_cast(weight as float64)) as weight
from table
group by 1, 2;
A subquery doesn't seem necessary or desirable for your query.

Oracle sql count

I have the information below in a table and want to retrieve the count of rows where the difference between the two dates is >= 1.
Id testdate exdate
1 20120502 20120501 --> This should be included, because diff is 1
2 20120601 20120601 --> This should not be included, because diff is 0
3 20120704 20120703 --> This should be included, because diff is 1
4 20120803 20120802 --> This should be included, because diff is 1
Based on the above data, my select count should return 3.
I am trying the following, but it's not giving any results:
select count(to_char(testdate,'YYYYMMDD')-to_char(exdate,'YYYYMMDD')) from test ;
select count(*)
from my_table
where testdate <> exdate
You really should convert those to a date data-type though... it saves a lot of problems in the long run.
Your query will give you results. It will return 4. It gives you results because as long as the result of testdate - exdate is not null it will return a value for that row.
However, as you're not using dates Oracle will most probably convert those to numbers, which won't help for date comparisons should you do that in the future.
20120901 - 20120831 = 70 -- not 1
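A short Oracle sketch of the same pitfall, using the two values from the line above as literals (the column aliases are illustrative):

```sql
-- Subtracting YYYYMMDD values as numbers versus as dates.
SELECT
  20120901 - 20120831                                AS numeric_diff, -- 70
  TO_DATE('20120901', 'YYYYMMDD')
    - TO_DATE('20120831', 'YYYYMMDD')                AS date_diff     -- 1 (days)
FROM dual;
```

Only the date subtraction gives the number of days across a month boundary.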
Okay, from your comment:
Working with, if i use select count(*) from test where
to_char((testdate,'YYYYMMDD') - to_char(exdate,'YYYYMMDD')) >= 1; .But
count is one of the column. how to retrieve above select statement as
one of the column
you're trying something completely different.
Your dates are actually dates; it's helpful to post this. You're looking for an analytic function, specifically count().
select a.*, count(*) over ( partition by 1 ) as ct
from my_table a
where trunc(exdate) <> trunc(testdate)
Note the trunc function, which, without additional parameters, removes the time portion of the date, thus enabling a direct comparison without resorting to converting the date to a character string.
select count(*)
from test
where to_date(testdate,'YYYYMMDD') - to_date(exdate,'YYYYMMDD') >= 1;
or
select count(*)
from test
where to_date(testdate,'YYYYMMDD') <> to_date(exdate,'YYYYMMDD');
Looking at testdate and exdate, it looks more like the columns are of VARCHAR type, so you would need an appropriate date conversion.
In Oracle, if the type is date you can do arithmetic with the values: 1 equals 1 day, and 1/24 equals 1 hour.
Your case is rather easy because you could even compare the strings.
SELECT count(*)
FROM test
WHERE testdate <> exdate
But it sounds like you want the threshold to be variable, so it is better to convert them to dates, and then you can do
SELECT count(*)
FROM test
WHERE to_date(testdate,'YYYYMMDD')-to_date(exdate,'YYYYMMDD') >= 1
I am not sure what you want when testdate minus exdate is -1 or lower, that is, when exdate is after testdate. If those rows should count too, you can work with ABS:
SELECT count(*)
FROM test
WHERE ABS(to_date(testdate,'YYYYMMDD')-to_date(exdate,'YYYYMMDD')) >= 1