Lots of WHEN conditions in CASE statement (binning)

Lots of WHEN conditions in CASE statement (binning) - sql

How can I do binning in SQL Server 2008 if I need about 100 bins? I need to group records depending if a binning variable belongs to one of 100 equal intervals.
For example if there is continious variable age I could write:
CASE
WHEN AGE >= 0 AND AGE < 1 THEN '1'
WHEN AGE >= 1 AND AGE < 2 THEN '2'
...
WHEN AGE >= 99 AND AGE < 100 THEN '100'
END [age_group]
But this process would be timeconsuming? Are there some other ways how to do that?

Try This Code Once:
SELECT CASE
WHEN AGE = 0 THEN 1
ELSE Ceiling([age])
END [age_group]
FROM #T
Here CEILING function returns the smallest integer greater than or equal to the specified numeric expression.i.e select CEILING(0.1) SQL Returns 1 As Output
But According to Your Output Requirement Floor(age)+1 is enough to get Required Output.
SELECT Floor([age]) + 1 [age_group]
FROM #T
Here Floor Function Returns the largest integer less than or equal to the specified numeric expression.

Try this based upon your comment about the segments being 1200:
;With Number
AS
(
SELECT *
FROM (Values(1),(2), (3), (4), (5), (6), (7), (8), (9), (10))N(x)
),
Segments
As
(
SELECT (ROW_NUMBER() OVER(ORDER BY Num1.x) -1) * 1200 As StartNum,
ROW_NUMBER() OVER(ORDER BY Num1.x) * 1200 As EndNum
FROM Number Num1
CROSS APPLY Number Num2
)
SELECT *
FROM Segments
SELECT *
FROM Segments
INNER JOIN MyTable
ON MyTable.Price >= StartNum AND MyTable.Price < EndNum

Mathematics, I guess. In this case,
Ceiling(Age) AS [age_group]
cast as necessary into character type of your choice. Ceiling is the 'round up to an integer' function in SQL Server.

You can use arithmetic for this purpose. Something like this:
select floor(bins * (age - minage) / (range + 1)), count(*)
from t cross join
(select min(age) as minage, max(age) as maxage,
1.0*(max(age) - min(age)) as range, 100 as bins
from t
) m
group by floor(bins * (age - minage) / (range + 1))
However, this is overkill for your example, which doesn't need a case at all.

If your interval for the groups are fixed - for example 1200, you can just do an integer division to get the index with that grouping.
For example:
SELECT 1000 / 1200 equals 0
SELECT 2200 / 1200 equals 1
Remember - you need to cast to int to get the result if you're using a decimal datatype. Integer division requires int on both sides of the operator.
And then add 1 to get the group

Related

Get column sum and use to calculate percent of total, why doesn't work with CTEs

I did this following query, however it gave the the result of 0 for each orderStatusName, does anyone know where is the problem?
with tbl as (
select s.orderstatusName, c.orderStatusId,count(c.orderId) counts
from [dbo].[ci_orders] c left join
[dbo].[ci_orderStatus] s
on s.orderStatusId = c.orderStatusId
where orderedDate between '2018-10-01' and '2018-10-29'
group by orderStatusName, c.orderStatusId
)
select orderstatusName, counts/(select sum(counts) from tbl as PofTotal) from tbl
the result is :0

You're using what is known as integer math. When using 2 integers in SQL (Server) the return value is an integer as well. For example, 2 + 2 = 4, 5 * 5 = 25. The same applies to division 8 / 10 = 0. That's because 0.8 isn't an integer, but the return value will be one (so the decimal points are lost).
The common way to change this behaviour is to multiply one of the expressions by 1.0. For example:
counts/(select sum(counts) * 1.0 from tbl) as PofTotal
If you need more precision, you can increase the precision of the decimal value of 1.0 (i.e. to 1.000, 1.0000000, etc).

Use window functions and proper division:
select orderstatusName, counts * 1.0 / total_counts
from (select t.*, sum(counts) over () as total_counts
from tbl
) t;
The reason you are getting 0 is because SQL Server does integer division when the operands are integers. So, 1/2 = 0, not 0.5.

How to Read Data Number by Number

I have a field that contains numbers such as the examples below in #Numbers. Each number within each row in #Numbers relates
to many different values that are contained within the #Area table.
I need to make a relationship from #Numbers to #Area using each number within each row.
CREATE TABLE #Numbers
(
Number int
)
INSERT INTO #Numbers
(
Number
)
SELECT 102 UNION
SELECT 1 UNION
SELECT 2 UNION
select * from #Numbers
CREATE TABLE #Area
(
Number int,
Area varchar(50)
)
INSERT INTO #Area
(
Number,
Area
)
SELECT 0,'Area1' UNION
SELECT 1,'Area2' UNION
SELECT 1,'Area3' UNION
SELECT 1,'Area5' UNION
SELECT 1,'Area8' UNION
SELECT 1,'Area9' UNION
SELECT 2,'Area12' UNION
SELECT 2,'Area43' UNION
SELECT 2,'Area25' UNION
select * from #Area
It would return the following for 102:
102,Area2
102,Area3
102,Area5
102,Area8
102,Area9
102,Area1
102,Area12
102,Area43
102,Area25
For 1 it would return:
1,Area2
1,Area3
1,Area5
1,Area8
1,Area9
For 2 it would return:
2,Area12
2,Area43
2,Area25
Note how the numbers match up to the individual Areas and return the values accordingly.

Well, the OP marked an answer already, which even got votes. Maybe he will not read this, but here is another option using direct simple select, which (according to the EP) seems like using a lot less resources:
SELECT *
FROM #Numbers t1
LEFT JOIN #Area t2 ON CONVERT(VARCHAR(10), t1.Number) like '%' + CONVERT(CHAR(1), t2.Number) + '%'
GO
Note! According to Execution Plan this solution uses only 27% while the selected answer (written by Squirrel) uses 73%, but Execution Plan can be misleading sometimes and you should check IO and TIME statistics as well using the real table structure and real data.

looks like you need to extract individual digit from #Number and then used it to join to #Area
; with tally as
(
select n = 1
union all
select n = n + 1
from tally
where n < 10
)
select n.Number, a.Area
from #Numbers n
cross apply
(
-- here it convert n.Number to string
-- then extract 1 digit
-- and finally convert back to integer
select num = convert(int,
substring(convert(varchar(10), n.Number),
t.n,
1)
)
from tally t
where t.n <= len(convert(varchar(10), n.Number))
) d
inner join #Area a on d.num = a.Number
order by n.Number
or if you prefer to do it in arithmetic and not string
; with Num as
(
select Number, n = 0, Num = Number / power(10, 0) % 10
from #Numbers
union all
select Number, n = n + 1, Num = Number / power(10, n + 1) % 10
from Num
where Number > power(10, n + 1)
)
select n.Number, a.Area
from Num n
inner join #Area a on n.Num = a.Number
order by n.Number

Here is my idea. In theory, it should work.
Have a table (temp or permanent) with the values and it's translation
I.E.
ID value
1 Area1, Area2, Area7, Area8, Area15
2 Area28, Area35
etc
Take each row and put a some special character between each number. Use a function like string_split with that character to turn it into a column of values.
e.g 0123 will then be something like 0|1|2|3 and when you run that through string_split you would get
0
1
2
3
Now join each value to your lookup table and return the Value.
Now you have a row with all the values that you want. Use another function like STUFF FOR XML and put those values back into a single column.
This doesn't sound very efficient.. but this is one way of achieving what you desire..
Another is to do a replace().. but that would be very messy!

Create a third table called n which contains a single column also called n that contains integers from 1 to the maximum number of digits in your number. Make it 1000 if you like, doesn't matter. Then:
select #numbers.number, substring(convert(varchar,#numbers.number),n,1) as chr, Area
from #numbers
join n on n>0 and n <=len(convert(varchar,number))
join #area on #area.number=substring(convert(varchar,#numbers.number),n,1)
The middle column chr is just there to show you what it's doing, and would be removed from the final result.

SQL Server sorting odd numbers ASC and even numbers DESC

I am wondering if there is a way in SQL Server to sort a table which holds numbers in a varchar column. I went and cast the number (in my case House Numbers) to Int and use
order by cast([sano]%2 as int), cast([sano] as Int)
which actually creates the output of 2,4,6,8...,1,3,5,7,9 and so on. but I need to get the output like 2,4,6,8..9,7,5,3,1 so even is asc and then the odd is desc.

One trick is to use a case expression to multiply odd numbers by -1, and thus get the ones with the largest absolute value first in an ascending order:
ORDER BY CAST([sano] % 2 AS INT),
CAST([sano] AS INT) * CASE CAST([sano] % 2 AS INT) WHEN 0 THEN 1 ELSE -1 END

Assuming that there is a reasonable upper bound on a house number, you can use the following:
declare #Samples as Table ( HouseNumber Int );
insert into #Samples ( HouseNumber ) values
( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ), ( 9 );
select HouseNumber,
case HouseNumber % 2 when 0 then HouseNumber else 1000000 - HouseNumber end as SortValue
from #Samples
order by case HouseNumber % 2 when 0 then HouseNumber else 1000000 - HouseNumber end;
For even values it uses the house number. By flipping the sign of odd numbers they are sorted in the opposite order, but an offset is needed to have them sort after the even values.

This seems to work, but I have no idea why
select *
from test
order by CASE WHEN sano%2=0 THEN sano%2 END ASC
I am testing and investigating this behaviour

SQL - Generate "missing rows" with a select

This question relates to SQL 2012 -
Lets say I have 3 rows generated as follows:
Start Position = 10
End Position = 13
Value = 100
Start position = 14
End Position = 14
Value = 250
Start Position = 15
End Position = 25
Value = 300
on 3 rows ..
Is there a way I can force SQL to write the output:
10 - 100
11 - 100
12 - 100
13 - 100
14 - 250
15 - 300
16 - 300
etc and so on and so forth
Been wracking the brains but cant work out an easy way to do it
Thanks a lot
J

You can do this with a recursive CTE or a numbers table. Assuming the gaps are no more than a few hundred or thousand:
with n as (
select row_number() over (order by (select null)) - 1 as n
from master.spt_values
)
select (t.startpos + n.n) as position, value
from t join
n
on t.startpos + n.n <= t.endpos;

No database is complete without its table of numbers! Instructions on how to create one are all over the net, here is an example:
Create a numbers table
I have a numbers table in my database, its called t_numbers, it has a single column "n" with a row for each number starting from 1. It goes up to 999,999 but it takes up very little space on disk. Once you have that, you can write something like this:
set up a bit of data to use first
declare #Rows table
(
StartPos int,
EndPos int,
Value int
)
insert into #Rows values (10, 13, 100), (14, 14, 250), (15, 25, 300)
If you want don't want gaps for nulls use an inner join
select n.n, Value
from t_numbers n
inner join #Rows r on n.n >= StartPos and n.n <= EndPos
If you want the gaps then left join, but limit the return with a where clause
select n.n, Value
from t_numbers n
left join #Rows r on n.n >= StartPos and n.n <= EndPos
where n <= (select MAX(EndPos) from #Rows)

Thankyou people! that did it
In the end I created a table of numbers (thankyou guys)
+ said
CROSS JOIN MyNumbers
WHERE T.Real_Position between T.triangle_start_position and T.triangle_end_position
This gave me the exact resultset I was looking for

AS400 DB2 query math expression in Select

I have not done DB2 queries for a while so I am having issues with a math expression in my Select statement. It does not throw an error but I get the wrong result. Can someone tell me how DB2 evaluates the expression?
Part of my Select is below.
The values are:
t1.Points = 100
t2.Involvepoints = 1
(current date - t1.fromdt) in days is 1268 (so it would be current
date 7/19/2013 - 01/28/2010 in days)
It should read like (100 * 1) * (1 - (.000274 * 1268)) = 65.2568
SELECT Value1,
value2,
(CASE
WHEN (T1.POINTS * T2.INVOLVEPOINTS) * (1 - .000274 * DAYS(CURRENT DATE) - DAYS(T1.FROMDT)) >= 0 THEN (T1.POINTS * T2.INVOLVEPOINTS) * (1 - .000274 * DAYS(CURRENT DATE) - DAYS(T1.FROMDT))
ELSE 0
END) AS POINTSTOTAL
FROM TABLE1;

The parenthesis are not enforcing the correct precedence of operations and the join declaration is missing. In addition you can use the MAX scalar function instead of the repetitive CASE statement.
Here is a proof using common table expressions to simulate the source data:
with
t1 (value1, points, fromdt)
as (select 1, 100, '2010-01-28' from sysibm.sysdummy1),
t2 (value2, involvepoints)
as (select 2, 1 from sysibm.sysdummy1)
select value1, value2,
max(0, t1.points * t2.involvepoints *
(1 - .000274 * (DAYS('2013-07-19') - DAYS(t1.fromdt)))) as pointstotal
from t1, t2;
The result is:
VALUE1 VALUE2 POINTSTOTAL
------ ------ -----------
1 2 65.256800

Did you mean this?
...
(T1.POINTS * T2.INVOLVEPOINTS) * (1 - .000274 * ( DAYS(CURRENT DATE) - DAYS(T1.FROMDT) ) )
...
Note the extra pair of parentheses around the subtraction of dates. Normally multiplication takes precedence over addition, so in your original query you multiply today's date by 0.000274, subtract that from 1, then subtract the value of FROMDT from the result.
Curiously, you have those parentheses in your explanation, but not in the actual formula.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Lots of WHEN conditions in CASE statement (binning) - sql

Mathematics, I guess. In this case, Ceiling(Age) AS [age_group] cast as necessary into character type of your choice. Ceiling is the 'round up to an integer' function in SQL Server.

Related

Get column sum and use to calculate percent of total, why doesn't work with CTEs

How to Read Data Number by Number

SQL Server sorting odd numbers ASC and even numbers DESC

SQL - Generate "missing rows" with a select

AS400 DB2 query math expression in Select

Categories

Resources