Get column sum and use to calculate percent of total, why doesn't work with CTEs - sql

I wrote the following query, but it returns 0 for each orderStatusName. Does anyone know what the problem is?
with tbl as (
select s.orderstatusName, c.orderStatusId,count(c.orderId) counts
from [dbo].[ci_orders] c left join
[dbo].[ci_orderStatus] s
on s.orderStatusId = c.orderStatusId
where orderedDate between '2018-10-01' and '2018-10-29'
group by orderStatusName, c.orderStatusId
)
select orderstatusName, counts/(select sum(counts) from tbl as PofTotal) from tbl
The result is: 0

You're using what is known as integer math. When both operands are integers in SQL (Server), the result is an integer as well. For example, 2 + 2 = 4 and 5 * 5 = 25. The same applies to division: 8 / 10 = 0. That's because 0.8 isn't an integer, but the return value must be one, so the fractional part is discarded.
The common way to change this behaviour is to multiply one of the expressions by 1.0. For example:
counts/(select sum(counts) * 1.0 from tbl) as PofTotal
If you need more precision, you can add decimal places to the literal 1.0 (e.g. 1.000, 1.0000000, etc).
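The effect is easy to reproduce outside SQL Server. The following is a minimal sketch in Python using the built-in sqlite3 module as a stand-in (SQLite also truncates integer division); the table contents and status names are made up for illustration and are not from the question.

```python
import sqlite3

# In-memory stand-in for the CTE result; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (orderStatusName TEXT, counts INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                 [("Shipped", 8), ("Cancelled", 2)])

# Integer / integer: the fractional part is discarded.
integer_result = conn.execute(
    "SELECT orderStatusName, counts / (SELECT SUM(counts) FROM tbl) "
    "FROM tbl ORDER BY orderStatusName"
).fetchall()

# Multiplying by 1.0 first forces a non-integer division.
decimal_result = conn.execute(
    "SELECT orderStatusName, counts * 1.0 / (SELECT SUM(counts) FROM tbl) "
    "FROM tbl ORDER BY orderStatusName"
).fetchall()

print(integer_result)
print(decimal_result)
```

The only difference between the two queries is the `* 1.0`, which is what moves the division out of integer arithmetic.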

Use window functions and proper division:
select orderstatusName, counts * 1.0 / total_counts
from (select t.*, sum(counts) over () as total_counts
from tbl
) t;
You are getting 0 because SQL Server does integer division when both operands are integers. So, 1/2 = 0, not 0.5.

Related

How to calc Win Rate in SQL?

I am trying to calculate win rate for the below table in SQL but not getting a correct answer
What I am looking for is WON = 1111/1496*100
So far I've got
SELECT Sum( Status = 'Won') /(Select Count(Status))*100 as Win_rate
FROM table
If I run the above, it gives me 0.
Your problem is integer division. Both operands of the division are integers, so SQLite forces an integer result. A typical workaround is to force a decimal context, like:
SELECT 100.0 * sum(Status = 'Won') / Count(*) as Win_rate FROM mytable
But it is simpler to use avg() here:
SELECT avg(Status = 'Won') * 100 as Win_rate
FROM mytable
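As a quick check of the three variants, here is a small sketch using Python's built-in sqlite3 module. The four sample rows are invented for illustration (three wins out of four), not the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Status TEXT)")
# Hypothetical sample: 3 'Won' rows out of 4.
conn.executemany("INSERT INTO mytable VALUES (?)",
                 [("Won",), ("Won",), ("Won",), ("Lost",)])

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

# Integer division happens before the multiplication, so 3 / 4 becomes 0.
naive = scalar("SELECT SUM(Status = 'Won') / COUNT(*) * 100 FROM mytable")
# Starting with 100.0 keeps the whole expression in floating point.
forced = scalar("SELECT 100.0 * SUM(Status = 'Won') / COUNT(*) FROM mytable")
# AVG() over the 0/1 comparison result returns a float directly.
averaged = scalar("SELECT AVG(Status = 'Won') * 100 FROM mytable")

print(naive, forced, averaged)
```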

Out of range integer: infinity

I'm trying to work through a problem that's a bit hard to explain, and I can't expose any of the data I'm working with. What I'm trying to get my head around is the error below, raised by the query that follows. I've renamed some of the tables and columns for sensitivity reasons, but the structure should be the same.
"Error from Query Engine - Out of range for integer: Infinity"
WITH accounts AS (
SELECT t.user_id
FROM table_a t
WHERE t.type like '%Something%'
),
CTE AS (
SELECT
st.x_user_id,
ad.name as client_name,
sum(case when st.score_type = 'Agility' then st.score_value else 0 end) as score,
st.obs_date,
ROW_NUMBER() OVER (PARTITION BY st.x_user_id,ad.name ORDER BY st.obs_date) AS rn
FROM client_scores st
LEFT JOIN account_details ad on ad.client_id = st.x_user_id
INNER JOIN accounts on st.x_user_id = accounts.user_id
--WHERE st.x_user_id IN (101011115,101012219)
WHERE st.obs_date >= '2020-05-18'
group by 1,2,4
)
SELECT
c1.x_user_id,
c1.client_name,
c1.score,
c1.obs_date,
CAST(COALESCE (((c1.score - c2.score) * 1.0 / c2.score) * 100, 0) AS INT) AS score_diff
FROM CTE c1
LEFT JOIN CTE c2 on c1.x_user_id = c2.x_user_id and c1.client_name = c2.client_name and c1.rn = c2.rn +2
I know the query itself works, because when I remove the first CTE and hard-code two IDs in the WHERE clause I commented out, it returns the data I want. But I also need it to run against the first CTE, which has ~5k unique IDs.
Here is a sample output if I try with two IDs:
Based on the number of rows returned per ID above, I would expect it to return 5000 * 3 rows = 15000.
What could be causing the out of range for integer error?
This line is likely your problem:
CAST(COALESCE (((c1.score - c2.score) * 1.0 / c2.score) * 100, 0) AS INT) AS score_diff
When the value of c2.score is 0, 1.0/c2.score will be infinity and will not fit into an integer type that you’re trying to cast it into.
The reason it’s working for the two users in your example is that they don’t have a 0 value for c2.score.
You might be able to fix this by changing to:
CAST(COALESCE (((c1.score - c2.score) * 1.0 / NULLIF(c2.score, 0)) * 100, 0) AS INT) AS score_diff
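To see why the guard matters, here is a rough Python analogue of the expression. It is only a sketch: the function name `score_diff` is hypothetical, `None` stands in for SQL NULL, and `int()` truncates, which may differ from how the query engine's CAST rounds.

```python
import math

def score_diff(c1_score, c2_score):
    """Percent change from c2 to c1, mirroring NULLIF(c2.score, 0) + COALESCE(..., 0)."""
    # NULLIF(c2.score, 0): treat a zero divisor as NULL.
    divisor = None if c2_score == 0 else c2_score
    # Any NULL operand makes the expression NULL; COALESCE(..., 0) turns that into 0.
    if c1_score is None or divisor is None:
        return 0
    return int((c1_score - c2_score) * 1.0 / divisor * 100)

# Without the guard, the engine ends up casting infinity to an integer.
# Python refuses the same conversion:
try:
    int(math.inf)
    overflowed = False
except OverflowError:
    overflowed = True

print(score_diff(15, 10), score_diff(15, 0), score_diff(15, None), overflowed)
```

Rows with no matching c2 (the LEFT JOIN miss) and rows where c2.score is 0 both end up as 0 here, which may or may not be what you want to report for "no baseline".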

Lots of WHEN conditions in CASE statement (binning)

How can I do binning in SQL Server 2008 if I need about 100 bins? I need to group records depending on which of 100 equal intervals a binning variable falls into.
For example, if there is a continuous variable age, I could write:
CASE
WHEN AGE >= 0 AND AGE < 1 THEN '1'
WHEN AGE >= 1 AND AGE < 2 THEN '2'
...
WHEN AGE >= 99 AND AGE < 100 THEN '100'
END [age_group]
But this process would be time-consuming. Is there another way to do it?
Try this code:
SELECT CASE
WHEN AGE = 0 THEN 1
ELSE Ceiling([age])
END [age_group]
FROM #T
Here the CEILING function returns the smallest integer greater than or equal to the specified numeric expression, e.g. SELECT CEILING(0.1) returns 1.
But for your required output, Floor(age) + 1 is enough:
SELECT Floor([age]) + 1 [age_group]
FROM #T
Here the FLOOR function returns the largest integer less than or equal to the specified numeric expression.
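The two expressions are not interchangeable at whole-number ages. A small Python sketch (function names and sample ages are made up for illustration) shows where they diverge from the CASE expression in the question, which puts AGE >= 1 AND AGE < 2 into bin 2:

```python
import math

def bin_floor(age):
    """Floor(age) + 1: matches the half-open intervals [n, n + 1)."""
    return math.floor(age) + 1

def bin_ceiling(age):
    """CASE WHEN age = 0 THEN 1 ELSE Ceiling(age) END."""
    return 1 if age == 0 else math.ceil(age)

ages = [0, 0.5, 1, 1.5, 99.9]
floor_bins = [bin_floor(a) for a in ages]
ceiling_bins = [bin_ceiling(a) for a in ages]

print(floor_bins)    # age 1 lands in bin 2, as in the CASE expression
print(ceiling_bins)  # age 1 lands in bin 1
```

The ceiling version agrees for fractional ages but assigns exact integers to the bin below, so Floor(age) + 1 is the closer match to the original intervals.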
Try this based upon your comment about the segments being 1200:
;With Number
AS
(
SELECT *
FROM (Values(1),(2), (3), (4), (5), (6), (7), (8), (9), (10))N(x)
),
Segments
As
(
SELECT (ROW_NUMBER() OVER(ORDER BY Num1.x) -1) * 1200 As StartNum,
ROW_NUMBER() OVER(ORDER BY Num1.x) * 1200 As EndNum
FROM Number Num1
CROSS APPLY Number Num2
)
SELECT *
FROM Segments
SELECT *
FROM Segments
INNER JOIN MyTable
ON MyTable.Price >= StartNum AND MyTable.Price < EndNum
Mathematics, I guess. In this case,
Ceiling(Age) AS [age_group]
cast as necessary into character type of your choice. Ceiling is the 'round up to an integer' function in SQL Server.
You can use arithmetic for this purpose. Something like this:
select floor(bins * (age - minage) / (range + 1)), count(*)
from t cross join
(select min(age) as minage, max(age) as maxage,
1.0*(max(age) - min(age)) as range, 100 as bins
from t
) m
group by floor(bins * (age - minage) / (range + 1))
However, this is overkill for your example, which doesn't need a case at all.
If the interval for the groups is fixed (for example 1200), you can just do an integer division to get the index of the group.
For example:
SELECT 1000 / 1200 equals 0
SELECT 2200 / 1200 equals 1
Remember - you need to cast to int to get the result if you're using a decimal datatype. Integer division requires int on both sides of the operator.
And then add 1 to get the group
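A minimal sketch of this fixed-width grouping, using Python's built-in sqlite3 module as a stand-in (SQLite also performs integer division on integer operands). The table name MyTable and column Price follow the earlier answer; the prices are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Price INTEGER)")
# Hypothetical prices falling into the 1st, 2nd, 2nd and 3rd interval of width 1200.
conn.executemany("INSERT INTO MyTable VALUES (?)",
                 [(1000,), (2200,), (2399,), (2400,)])

# Integer division gives a zero-based interval index; + 1 makes it one-based.
bins = conn.execute(
    "SELECT Price / 1200 + 1 AS price_group, COUNT(*) "
    "FROM MyTable GROUP BY Price / 1200 ORDER BY Price / 1200"
).fetchall()

print(bins)
```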

Split float between list of numbers

I have a problem with distributing 0.00xxx remainders of float values across a list of numbers.
Here is an example of the input data.
Row 0 is the sum of the float numbers in rows 1-3.
As a result I want to see rounded numbers without losing the sum of rows 1-3:
IN:
0 313.726
1 216.412
2 48.659
3 48.655
OUT:
0 313.73
1 216.41
2 48.66
3 48.66
How it should work:
The idea is to redistribute the smallest remainder (in our example, the 0.002 from 216.412) among the values with the largest remainders: adding 0.001 to 48.659 gives 48.66, and adding 0.001 to 48.655 gives 48.656. After this, the numbers can be rounded without losing any of the sum.
After sitting on this problem yesterday I found the solution. The query as I think should look like this.
select test.*,
sum(value - trunc(value, 2)) over (partition by case when id = 0 then 0 else 1 end) part,
row_number() over(partition by case when id = 0 then 0 else 1 end order by value - trunc(value, 2) desc) rn,
case when row_number() over(partition by case when id = 0 then 0 else 1 end order by value - trunc(value, 2) desc) / 100 <=
round(sum(value - trunc(value, 2)) over (partition by case when id = 0 then 0 else 1 end), 2) then trunc(value, 2) + 0.01 else trunc(value, 2) end result
from test;
But it still seems strange to me to add the constant value "0.01" to get the result.
Any ideas to improve this query?
You could use the round() SQL function when presenting results. Round()'s second argument is the number of decimal places you want to round the number to. Issuing this select on the test table:
select id, round(value, 2) from test;
gives you the following result
0 313.73
1 216.41
2 48.66
3 48.65
Generally, you can use the stored numbers for summations and apply the round() function only when presenting the results. Here is a way to do the sum at full precision and then round the final result:
select sum(value) from test where id != 0
gives the result: 313.726
select round(sum(value), 2) from test where id != 0
gives the result: 313.73
By the way allow me two observations:
1) the rounding you give for id = 3 is confusing to me: 48.654 rounds to 48.65 rather than 48.66 at two decimal places. Am I missing something?
2) Strictly speaking this issue is not a pl/sql issue as labeled. It is totally in the realm of sql. However there is a round() function in pl/sql as well and the same principles apply.
select id, value,
case when id <> max(id) over () then round(value, 2)
else round(value, 2) - sum(round(value, 2)) over () +
round(first_value(value) over (order by id), 2) * 2
end val_rnd
from test
Output:
ID VALUE VAL_RND
------ ---------- ----------
0 313.726 313.73
1 216.413 216.41
2 48.659 48.66
3 48.654 48.66
The above query works, but it moves the entire difference to the last row. This is not "honest" and may not be what you are after in other scenarios.
The most "dishonest" behavior is observable with a large number of values that all equal 0.005.
To distribute the difference fully, you need to:
sum the rounded values of the sub-rows and subtract that from the rounded total in the row with id 0 to get the difference,
use row_number() to sort the sub-rows by the difference between rounded value and original value (possibly descending, depending on the sign of the difference; use sign() and abs()),
increase each row's value by .01 (or decrease it if the difference < 0) until the row number reaches difference/.01 (use case when),
union the row with id = 0 containing the rounded sum,
optionally sort the results.
It's hard (but achievable) in one query. An alternative is a PL/SQL procedure or function, which might be more readable.
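The steps above can be sketched procedurally. The following Python version is an illustration only, not a translation of any query in this thread: the function name `round_preserving_total` is hypothetical, values are passed as strings so that Decimal sees them exactly, and ties are rounded half up on the assumption that this matches the database's round().

```python
from decimal import Decimal, ROUND_HALF_UP

def round_preserving_total(parts, places=2):
    """Round each part so that the rounded parts add up to the rounded total."""
    step = Decimal(10) ** -places                      # 0.01 for two places
    exact = [Decimal(p) for p in parts]
    rounded = [v.quantize(step, rounding=ROUND_HALF_UP) for v in exact]
    target = sum(exact, Decimal(0)).quantize(step, rounding=ROUND_HALF_UP)

    # How many 0.01 steps the rounded parts are short of (or over) the target.
    units = int((target - sum(rounded, Decimal(0))) / step)
    direction = 1 if units > 0 else -1

    # Adjust the rows whose rounding error is largest in the needed direction.
    order = sorted(range(len(exact)),
                   key=lambda i: (exact[i] - rounded[i]) * direction,
                   reverse=True)
    for i in order[:abs(units)]:
        rounded[i] += step * direction
    return rounded

result = round_preserving_total(["216.413", "48.659", "48.654"])
many_halves = round_preserving_total(["0.005"] * 5)
print(result, sum(result))
print(many_halves, sum(many_halves))
```

Choosing the rows by rounding error spreads the correction over the values that were distorted most, instead of pushing all of it onto the last row.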
If I understand you correctly, you don't want to use round because the sum of the rounded parts doesn't match the rounded total.
In this case a simple trick can be applied. You use round for all but the last number. The last part is calculated as the difference between the rounded sum and the rounded parts so far (all but the last one).
You can express this with analytic functions as follows:
WITH total AS
(SELECT id, value, ROUND(value,2) value_rounded FROM test WHERE id = 0
),
rounded AS
( SELECT id, value, ROUND(value,2) value_rounded FROM test WHERE id != 0
)
SELECT id, value_rounded FROM total
UNION ALL
SELECT id,
CASE
WHEN row_number() over (order by id) != COUNT(*) over ()
THEN
/* not the last row - regular result */
value_rounded
ELSE
/* last row - corrected result */
(select value_rounded from total) - SUM(value_rounded) over (order by id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
END AS value
FROM rounded
ORDER BY id;
Note that this is the test for the last row:
row_number() over (order by id) != COUNT(*) over ()
and this is the sum of all parts from the beginning (UNBOUNDED PRECEDING) up to the last but one (1 PRECEDING):
SUM(value_rounded) over (order by id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
I split your data into two sources: total, the one row with the total, and rounded, the parts.
UPDATE
In some cases the last corrected number shows an "ugly" large difference from the original value,
because the differences in one rounding direction are larger than in the opposite one.
The following select takes this into account and distributes the difference among the parts.
The example below illustrates this with a series of 0.005s.
WITH nums AS
(SELECT rownum id, 0.005 value FROM dual connect by level <= 5
),
rounded AS
( SELECT id, value, ROUND(value,2) value_rounded FROM nums
),
with_diff as
(SELECT id, value, value_rounded,
-- difference so far - between the exact SUM and SUM of rounded parts
-- cut to two decimal points
floor(100* (
sum(value) over (order by id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -
sum(value_rounded) over (order by id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)))
/ 100 diff_so_far
FROM rounded),
delta_diff as
(select id, value, value_rounded,DIFF_SO_FAR,
DIFF_SO_FAR - LAG(DIFF_SO_FAR,1,0) over (order by ID) as diff_delta
from with_diff)
SELECT id, value,
CASE
WHEN row_number() over (order by id) != COUNT(*) over ()
THEN
/* not the last row - take the rounded value and ... */
value_rounded +
/* ... add or subtract the delta difference */
diff_delta
ELSE
/* last row - corrected result */
round(sum(value) over(),2) - SUM(value_rounded + diff_delta) over (order by id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
END AS value_rounded, diff_delta
FROM delta_diff
ORDER BY id;
ID VALUE VALUE_ROUNDED DIFF_DELTA
---------- ---------- ------------- ----------
1 ,005 0 -0,01
2 ,005 ,01 0
3 ,005 0 -0,01
4 ,005 ,01 0
5 ,005 ,01 -0,01
A pragmatic solution based on the following rules:
1) Check the difference between the rounded sum and the sum of the rounded parts.
select round(sum(value),2) - sum(round(value,2)) from test where id != 0;
2) Apply this difference.
E.g. if you get 0.01, one rounded part must be increased by 0.01;
if you get -.02, two rounded parts must be decreased by 0.01.
The query below simply corrects the last N parts:
with diff as (
select round(sum(value),2) - sum(round(value,2)) diff from test where id != 0
), diff_values as
(select sign(diff)*.01 diff_value, abs(100*diff) corr_cnt
from diff)
select id, round(value,2)
+ case when row_number() over (order by id desc) <= corr_cnt then diff_value else 0 end result
from test, diff_values where id != 0
order by id;
ID RESULT
---------- ----------
1 216,41
2 48,66
3 48,66
If the number of corrected records is much higher than two, check the data and the rounding precision.

T-SQL average rounded to the closest integer

I'm not sure if this has been asked before, but how do I get the average rounded to the closest integer in T-SQL?
This should do it. You might need a GROUP BY at the end, depending on what you want the average of.
SELECT CONVERT(int,ROUND(AVG(ColumnName),0))
FROM
TableName
EDIT: This question is more interesting than I first thought.
If we set up a dummy table like so...
WITH CTE
AS
(
SELECT 3 AS Rating
UNION SELECT 4
UNION SELECT 7
)
SELECT AVG(Rating)
FROM
CTE
We get an integer average of 4
However if we do this
WITH CTE
AS
(
SELECT 3.0 AS Rating
UNION SELECT 4.0
UNION SELECT 7.0
)
SELECT AVG(Rating)
FROM
CTE
We get a decimal average of 4.666..etc
So it looks like the way to go is
WITH CTE
AS
(
SELECT 3 AS Rating
UNION SELECT 4
UNION SELECT 7
)
SELECT CONVERT(int,ROUND(AVG(CONVERT(decimal,Rating)),0))
FROM CTE
Which will return an integer value of 5 which is what you are looking for.
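The same three cases can be mirrored in Python as a rough sketch. This is an emulation, not T-SQL: integer floor division stands in for AVG over an int column (the two agree for non-negative values), and Decimal with ROUND_HALF_UP stands in for ROUND(..., 0).

```python
from decimal import Decimal, ROUND_HALF_UP

ratings = [3, 4, 7]

# Integer arithmetic throughout, like AVG over an int column.
integer_avg = sum(ratings) // len(ratings)

# Converting to a decimal type first keeps the fractional part.
decimal_avg = Decimal(sum(ratings)) / len(ratings)

# Round to the nearest whole number, then convert to int.
rounded_avg = int(decimal_avg.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

print(integer_avg, decimal_avg, rounded_avg)
```

The order matters: the conversion to decimal has to happen before averaging, because once the integer average has been computed the fraction is already gone.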
If you are in SQL Server, just use round(avg(column * 1.0), 0).
The reason for * 1.0 is that SQL Server in some cases returns the result of a calculation in the same data type as the values used in it. So, if you calculate the average of 3, 4 and 4, the result is 3.66..., but because the data type of the result is integer, SQL Server truncates 3.66... to 3. Using * 1.0 implicitly converts the input to a decimal.
Alternatively, you can convert or cast the values before the average calculation, like cast(column as decimal), instead of using the * 1.0 trick.
If your column is not an integer column, you can remove the * 1.0.
PS: the result of round(avg(column * 1.0), 0) is still a decimal. You can explicitly convert it using convert(int, round(avg(column * 1.0), 0), 0) or just let whatever language you are using do the job (it's an implicit conversion).
Select cast(AVG(columnname) as integer)
This worked for me:
CONVERT(int,ROUND(AVG(CAST(COLUMN-NAME AS DECIMAL)) ,0))
Isn't there a shorter way of doing it though?
T-SQL2018.
CAST(ROUND(COLUMN, 0) AS INT) This code does the job for me and gives the output I require so a 4.8 becomes 5.
whereas
CAST(AVG(COLUMN) AS INT) This code almost does the job but rounds down, so 4.8 becomes a 4 and not 5.
select cast(avg(a+.5) as int) from
(select 1 a union all select 2) b
If you don't like shortcuts, you could use the long way:
select round(avg(cast(a as real)), 0)
from (select 1 a union all select 2) b
The first two statements below give the same result for an integer column; the third is shorter but truncates instead of rounding:
-- the original code
CONVERT(int, ROUND(AVG(CAST(mycolumn AS DECIMAL)) ,0))
-- using '1e0 * column' implicitly converts mycolumn value to float
CONVERT(int, ROUND(AVG(1e0 * mycolumn) ,0))
-- caution: the conversion to INT truncates, so 4.8 becomes 4 rather than 5
CONVERT(INT, AVG(1e0 * mycolumn))
On SQL 2014,
select round(94,-1)
select round(95,-1)