I'm working in Google BigQuery (not LegacySQL), and I'm currently trying to cast() a string as a float64. Each time I get the error "Bad double value". I've also tried safe_cast(), but it completely eliminates some of my IDs (e.g. if one customer repeats 3 times for 3 different dates, and only has 'null' for a single "Height" entry, that customer is completely eliminated after I do safe_cast(), not just the row that had the 'null' value). I don't have any weird string values in my data, just whole or rational numbers or null entries.
Here's my current code:
select id, date,
cast(height as float64) as height,
cast(weight as float64) as weight
from (select id, date, max(height) as height, max(weight) as weight
from table
group by 1,2
)
group by 1, 2
Of course safe_cast() returns NULL values. That is because you have inappropriate values in the data.
You can find these by doing:
select height, weight
from table
where safe_cast(height as float64) is null or safe_cast(weight as float64) is null;
Once you understand what the values are, fix the values or adjust the logic of the query.
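For instance, the literal string 'null' or an empty string will trip cast() but not safe_cast(). A minimal sketch (the sample values are made up) of the difference:
SELECT val,
       SAFE_CAST(val AS FLOAT64) AS as_float
FROM UNNEST(['1.5', '2', 'null', '']) AS val;
-- CAST(val AS FLOAT64) would fail with "Bad double value" on 'null' and '';
-- SAFE_CAST returns NULL for those rows instead of raising an error.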
If you just want the max of values that are properly numeric, then cast before the aggregation:
select id, date,
max(safe_cast(height as float64)) as height,
max(safe_cast(weight as float64)) as weight
from table
group by 1, 2;
A subquery doesn't seem necessary or desirable for your query.
I have a table in PostgreSQL that has a Float column. In my select I use AVG() on that column, so it often gives a number with many decimals. Is there any way to restrict the number of decimals to a maximum of 3, meaning there can be fewer but not more than 3?
This is the Query:
SELECT team, AVG(score) FROM team_score_table GROUP BY team
You can use round():
select round(val::numeric, 3)
You can also convert to a numeric, but you need a precision appropriate for your values:
select val::numeric(20, 3)
I actually prefer the explicit cast() because it sets the data type of the column to a numeric with an explicit scale -- so downstream apps are aware of the number of decimal places intended in the result.
round() returns a numeric value but it is a "generic" numeric, with no specified scale and precision.
You can see the difference in this example.
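A hedged sketch of that difference (the table names t_round and t_cast are made up): create a table from each expression and inspect the resulting column types.
create table t_round as select round(42.43666::numeric, 3) as val;
create table t_cast  as select 42.43666::numeric(20, 3) as val;
-- \d t_round  shows:  val | numeric          (no declared precision/scale)
-- \d t_cast   shows:  val | numeric(20,3)    (the scale is carried into the column definition)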
You can use several functions to do that:
SELECT round(42.43666, 2) -- 42.44
SELECT trunc(42.43666, 2) -- 42.43
or cast:
SELECT cast(42.43666 as numeric(20, 2)) -- 42.44
Applied to your example, that would be:
SELECT team, round(AVG(score)::numeric, 2) FROM team_score_table GROUP BY team
SELECT team, trunc(AVG(score)::numeric, 2) FROM team_score_table GROUP BY team
SELECT team, cast(AVG(score) as numeric(20,2)) FROM team_score_table GROUP BY team
I'm currently working on a task where I'm dealing with a table in which every row has a boolean value, "is_important". I am trying to create a ratio of important entries to total entries grouped by date but I can't seem to get SQL to recognize that I want to divide using a WHERE clause.
One method is:
select date, avg( case when is_important then 1.0 else 0 end) as important_ratio
from t
group by date;
There may also be shortcuts, depending on the database you are using, such as:
avg( is_important )
avg( is_important::int )
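For example, a sketch assuming PostgreSQL (where the boolean column casts cleanly to an integer), using the same table t as above:
select date,
       avg(is_important::int) as important_ratio   -- true -> 1, false -> 0, so the mean is the ratio
from t
group by date;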
I'm working on an assignment and I have a table with several columns. The two I'm interested in are the Type and the Easting column.
I'm trying to use a query to return the max value from the Easting column and also show me what that value holds in the Type column.
I'm using Microsoft Access for the assignment.
Here is what I have so far, but it returns all the values, not the max:
SELECT Type, Location, MAX(Easting)
FROM CrimeData
GROUP BY Easting, Location, Type
Any help would be great.
You'll need to obtain the maximum Easting value (using a subquery), and then select all records which hold such value, e.g.:
select c.type, c.location, c.easting
from crimedata c
where c.easting = (select max(t.easting) from crimedata t)
Solutions which use a group by clause will only provide the maximum Easting value for records within each combination of values held by the fields in the group by clause.
You must group by all output columns except the one with the aggregate (MAX).
SELECT Type, Location, MAX(Easting)
FROM CrimeData
GROUP BY Type, Location
When you are grouping, you can filter the raw data before grouping with a WHERE clause, and the grouped data after grouping with a HAVING clause, e.g.
SELECT Type, Location, MAX(Easting)
FROM CrimeData
WHERE Type > 1
GROUP BY Type, Location
HAVING MAX(Easting) < 10
Try this:
SELECT Type, Location, MAX(Easting)
FROM CrimeData
GROUP BY Location, Type
HAVING MAX(Easting) = (SELECT MAX(Easting) FROM CrimeData)
or simpler:
SELECT Type, Location, Easting
FROM CrimeData
where Easting = (select max(Easting) from CrimeData)
The column you use in the function (in your case MAX) should not be part of the group by statement.
Hopefully this clears it up; otherwise I can explain more ;)
I am currently running a query that runs a sum function and also divides this number. Currently I get values like 0.0904246741698848 and 1.6419814808335567. I want these decimals trimmed to 2 places past the decimal point. The column's schema type is a float. Here is my code. Thanks for the help.
#standardSQL
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.Firebase_ConnectionInfo`
WHERE PeripheralType = 1 or PeripheralType = 2 or PeripheralType = 12
GROUP BY Serial
ORDER BY Latest_Use DESC
#standardSQL
WITH `data` AS (
SELECT 0.0904246741698848 AS val UNION ALL
SELECT 1.6419814808335567
)
SELECT val, ROUND(val, 2) AS rounded_val
FROM `data`
For example, assuming you want to apply this to your Total_Hours column:
#standardSQL
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
ROUND(SUM(ConnectionTime/3600),2) AS Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.Firebase_ConnectionInfo`
WHERE PeripheralType = 1 OR PeripheralType = 2 OR PeripheralType = 12
GROUP BY Serial
ORDER BY Latest_Use DESC
I found that rounding was problematic when my data contained a whole number such as 2.00 and I needed every value to show 2 decimal places, since these were prices that end up being displayed. BigQuery was returning 2.0 no matter what precision I specified with ROUND.
Assuming you're working with data that never exceeds 2 decimal places and it is stored as a STRING, this code will work (if you need more decimal places, add another 0 to the added constant for each extra place).
FORMAT("%.*f",2,CAST(GROSS_SALES_AMT AS FLOAT64) + .0001)
This will take a float in BigQuery and format it with two decimal places.
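As a sketch of how that might look inside a query (the table name below is hypothetical; GROSS_SALES_AMT is the column from the snippet above):
SELECT FORMAT("%.*f", 2, CAST(GROSS_SALES_AMT AS FLOAT64) + .0001) AS gross_sales_display
FROM `my_project.my_dataset.sales`;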
CAST(SUM(ConnectionTime/3600) AS STRING FORMAT '999,999.99')
Note: add a currency symbol (e.g., $) to the format model for currency output ($999,999.99).
Example:
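A minimal sketch (the input value is made up):
-- Rounds to two decimals and inserts a thousands separator, giving something like ' 1,234.57';
-- unused '9' digit positions come back as leading spaces.
SELECT CAST(1234.567 AS STRING FORMAT '999,999.99') AS formatted;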
You can always use the round() function.
If you are looking for the exact digits after the decimal point (ROUND will round the values off), you can take a substring of the string form of the value, e.g. substr(cast(value as string), 1, n), which truncates instead of rounding.
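A sketch of that idea in BigQuery (the literal is illustrative; note that the length argument depends on how many digits precede the decimal point):
-- Keeps the first 4 characters of the string form, i.e. '1.64', truncating rather than rounding.
SELECT SUBSTR(CAST(1.6419814808335567 AS STRING), 1, 4) AS truncated;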
I would appreciate an explanation of the internal behaviour of the SUM function in Oracle when it encounters null values:
The result of
select sum(null) from dual;
is null
But when a null value appears within a sequence of values (as in the sum of a nullable column), the null values seem to be treated as 0:
select sum(value) from
(
select case when mod(level , 2) = 0 then null else level end as value from dual
connect by level <= 10
)
is 25
It gets more interesting when you look at the result of
select (1 + null) from dual
is null
since any operation involving null results in null (except the IS NULL operator).
==========================
An update prompted by the comments:
create table odd_table as select sum(null) as some_name from dual;
will result in:
create table ODD_TABLE
(
some_name NUMBER
)
Why is the some_name column of type NUMBER?
If you are looking for a rationale for this behaviour, then it is to be found in the ANSI SQL standards which dictate that aggregate operators ignore NULL values.
If you wanted to override that behaviour then you're free to:
Sum(Coalesce(<expression>,0))
... although it would make more sense with Sum() to ...
Coalesce(Sum(<expression>),0)
You might more meaningfully:
Avg(Coalesce(<expression>,0))
... or ...
Min(Coalesce(<expression>,0))
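A sketch of why the second form makes more sense for Sum(): over an empty set the inner Coalesce never sees a row, so only the outer one rescues the NULL (the inline view and column name are made up):
select sum(coalesce(x, 0)) as inner_coalesce,   -- NULL: there are no rows to sum
       coalesce(sum(x), 0) as outer_coalesce    -- 0: the outer coalesce catches the NULL
from (select 1 as x from dual where 1 = 0);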
Other ANSI aggregation quirks:
Count() never returns null (or negative, of course)
Selecting only aggregation functions without a Group By will always return a single row, even if there is no data from which to select.
So ...
Coalesce(Count(<expression>),0)
... is a waste of a good coalesce.
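A sketch of both quirks at once, using an intentionally empty result set:
-- Aggregates without a GROUP BY still return exactly one row;
-- COUNT comes back as 0 (never NULL), while SUM comes back as NULL.
select count(*) as cnt, sum(1) as total
from dual
where 1 = 0;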
SQL does not treat NULL values as zeros when calculating SUM, it ignores them:
Returns the sum of all the values, or only the DISTINCT values, in the expression. Null values are ignored.
This makes a difference only in one case - when the sequence being totalled up does not contain numeric items, only NULLs: if at least one number is present, the result is going to be numeric.
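For instance (a hedged sketch; the inline data is made up): a column containing only NULLs sums to NULL, while a column with at least one number sums to that number.
select sum(all_null) as sum_all_null,   -- NULL: no numeric items at all
       sum(mixed)    as sum_mixed       -- 7: the NULL row is simply ignored
from (select cast(null as number) as all_null, 3 as mixed from dual union all
      select null, null from dual union all
      select null, 4 from dual);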
You're looking at this the wrong way around. SUM() operates on a column, and ignores nulls.
To quote from the documentation:
This function takes as an argument any numeric data type or any nonnumeric data type that can be implicitly converted to a numeric data type. The function returns the same data type as the numeric data type of the argument.
A NULL has no data type, and so your first example must return null, as a NULL is not numeric.
Your second example sums the numeric values in the column. The sum of 0 + null + 1 + 2 is 3; the NULL simply means that a number does not exist here.
Your third example is not an operation on a column; remove the SUM() and the answer will be the same as nothingness + 1 is still nothingness. You can't cast a NULL to an empty number as you can with a string as there's no such thing as an empty number. It either exists or it doesn't.
Arithmetic aggregate functions ignore nulls.
SUM() ignores them
AVG() calculates the average as if the null rows didn't exist (nulls count neither in the total nor in the divisor), as in the sketch below
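A sketch in Oracle (the inline values are made up) contrasting the two behaviours:
select avg(val)              as avg_ignoring_nulls,   -- 2: the NULL row is excluded from the divisor
       avg(coalesce(val, 0)) as avg_counting_nulls    -- 1.333...: the NULL is counted as a zero
from (select 1 as val from dual union all
      select null from dual union all
      select 3 from dual);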
As Bohemian has pointed out, both SUM and AVG exclude entries with NULL in them. Those entries do not go into the aggregate. If AVG treated NULL entries as zero, it would bias the result towards zero.
It may appear to the casual observer as though SUM is treating NULL entries as zero. It's really excluding them. If all the entries are excluded, the result is no value at all, which is NULL. Your example illustrates this.
This is incorrect: The sum of 0 + null + 1 + 2 is 3;
select 0 + null + 1 + 2 total from dual;
Result is null!
Similar expressions give a null result if any operand is null.
Here's a solution if you want to sum and NOT ignore nulls.
This solution splits the records into two groups: nulls and non-nulls. NVL2(a, 1, NULL) does this by changing all the non-nulls to 1 so they sort together identically. It then sorts those two groups to put the null group first (if there is one), then sums just the first of the two groups. If there are no nulls, there will be no null group, so that first group will contain all the rows. If, instead, there is at least one null, then that first group will only contain those nulls, and the sum of those nulls will be null.
SELECT SUM(a) AS standards_compliant_sum,
SUM(a) KEEP(DENSE_RANK FIRST ORDER BY NVL2(a, 1, NULL) DESC) AS sum_with_nulls
FROM (SELECT 41 AS a FROM DUAL UNION ALL
SELECT NULL AS a FROM DUAL UNION ALL
SELECT 42 AS a FROM DUAL UNION ALL
SELECT 43 AS a FROM DUAL);
You can optionally include NULLS FIRST to make it a little more clear about what's going on. If you're intentionally ordering for the sake of moving nulls around, I always recommend this for code clarity.
SELECT SUM(a) AS standards_compliant_sum,
SUM(a) KEEP(DENSE_RANK FIRST ORDER BY NVL2(a, 1, NULL) DESC NULLS FIRST) AS sum_with_nulls
FROM (SELECT 41 AS a FROM DUAL UNION ALL
SELECT NULL AS a FROM DUAL UNION ALL
SELECT 42 AS a FROM DUAL UNION ALL
SELECT 43 AS a FROM DUAL);