Division with Aggregate Functions in SQL Not Behaving as Expected - sql

I'm trying to do some crosstabs in SQL Server 2008 R2. That part works, but when I try to get percentages for each cell, I run into a problem.
Here is a distilled use case: A survey where people give their favorite color and their favorite fruit. I'd like to know how many like a given fruit AND a given color:
with survey as (
select 'banana' fav_fruit, 'yellow' fav_color
union select 'banana', 'red'
union select 'apple', 'yellow'
union select 'grape', 'red'
union select 'apple', 'blue'
union select 'orange', 'purple'
union select 'pomegranate', 'green'
)
select
s.fav_color,
sum(case
when s.fav_fruit = 'banana' then 1
else 0
end) as banana,
sum(case
when s.fav_fruit = 'banana' then 1
else 0
end) / sum(1) -- why does division always yield 0? "+", "-", and "*" all behave as expected.
* 100 as banana_pct,
sum(1) as total
from
survey s
group by
s.fav_color;
Results:
fav_color banana banana_pct total
------------------------------------
blue 0 0 1
green 0 0 1
purple 0 0 1
red 1 0 2
yellow 1 0 2
What I was expecting:
fav_color banana banana_pct total
------------------------------------
blue 0 0 1
green 0 0 1
purple 0 0 1
red 1 50 2
yellow 1 50 2
Please help me to get what I was expecting?

You are using SQL Server. Here is a much simpler example that replicates the issue:
select 1/2
SQL Server does integer division when both operands are integers, so the result is truncated to 0.
Replace the denominator with something like sum(1.0), sum(cast(1 as float)), or sum(1e0) instead of sum(1).
Contrary to my expectation at least, SQL Server treats literals with a decimal point as numeric/decimal rather than float. The fixed number of decimal places might affect subsequent operations.
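The integer-versus-decimal behaviour is easy to check in isolation. The sketch below uses SQLite through Python's sqlite3 module purely as a stand-in so that it runs anywhere (an assumption, not the asker's setup); SQLite also truncates integer-by-integer division. Result types differ in SQL Server, where 1/2.0 yields a numeric rather than a float.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both operands are integers, so the quotient is truncated to an integer.
int_div = conn.execute("SELECT 1 / 2").fetchone()[0]

# One non-integer operand is enough to get a non-integer quotient.
real_div = conn.execute("SELECT 1 / 2.0").fetchone()[0]

# Multiplying by 100.0 before dividing also avoids the truncation.
pct = conn.execute("SELECT 100.0 * 1 / 2").fetchone()[0]

print(int_div, real_div, pct)
conn.close()
```

The order of operations matters: in the original query the division happens before the multiplication by 100, so the truncation to 0 has already occurred by the time the result is scaled.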

Query (SQL Fiddle example):
SELECT s.fav_color,
sum( CASE WHEN s.fav_fruit = 'banana' THEN 1 ELSE 0 END ) AS banana,
sum( CASE WHEN s.fav_fruit = 'banana' THEN 1 ELSE 0 END) / sum(1.00) -- the decimal denominator avoids integer division
* 100 AS banana_pct,
sum(1) AS total
FROM survey s
GROUP BY s.fav_color
Result:
| FAV_COLOR | BANANA | BANANA_PCT | TOTAL |
-------------------------------------------
| blue | 0 | 0 | 1 |
| green | 0 | 0 | 1 |
| purple | 0 | 0 | 1 |
| red | 1 | 50 | 2 |
| yellow | 1 | 50 | 2 |
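For anyone who wants to reproduce the table above without a SQL Server instance, here is a minimal sketch of the same crosstab, again assuming SQLite via Python's sqlite3 as a stand-in. It multiplies by 100.0 first rather than changing the denominator, which is an alternative to the sum(1.00) fix and not what the query above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (fav_fruit TEXT, fav_color TEXT)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [
        ("banana", "yellow"), ("banana", "red"), ("apple", "yellow"),
        ("grape", "red"), ("apple", "blue"), ("orange", "purple"),
        ("pomegranate", "green"),
    ],
)

query = """
SELECT fav_color,
       SUM(CASE WHEN fav_fruit = 'banana' THEN 1 ELSE 0 END) AS banana,
       -- 100.0 makes the numerator non-integer before the division happens
       100.0 * SUM(CASE WHEN fav_fruit = 'banana' THEN 1 ELSE 0 END)
             / COUNT(*) AS banana_pct,
       COUNT(*) AS total
FROM survey
GROUP BY fav_color
ORDER BY fav_color
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The red and yellow rows come out with a banana_pct of 50.0, matching the expected result in the question.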

I've recently discovered the IIF function (available from SQL Server 2012 onward, so not in 2008 R2). It makes things much cleaner. Taking Justin's example from above:
SELECT s.fav_color,
sum( IIF(s.fav_fruit = 'banana', 1, 0) ) AS banana,
sum( IIF(s.fav_fruit = 'banana', 1, 0) ) / sum(1.00)
* 100 AS banana_pct,
sum(1) AS total
FROM survey s
GROUP BY s.fav_color

Related

Postgres SQL aggregates query in BigQuery?

For the Postgres query below, I currently use PIVOT in BigQuery. Besides PIVOT, is there any other way to write such a query in BigQuery?
-- Postgres SQL --
SELECT
Apple,
Orange,
Lemon,
CASE WHEN Apple >= 50 THEN 1 ELSE 0 END AS Apple50,
CASE WHEN Orange >= 50 THEN 1 ELSE 0 END AS Orange50,
CASE WHEN Lemon >= 50 THEN 1 ELSE 0 END AS Lemon50
FROM (
SELECT td.timestamp,
COALESCE(MAX(td.value) FILTER (WHERE attribute_id = 16), 0) as Apple,
COALESCE(MAX(td.value) FILTER (WHERE attribute_id = 17), 0) as Orange,
COALESCE(MAX(td.value) FILTER (WHERE attribute_id = 18), 0) as Lemon
FROM TableData td
WHERE td.attribute_id IN (16, 17, 18)
GROUP BY td.timestamp
ORDER BY timestamp
) AS td2
-- My attempt BigQuery Query --
SELECT
value_16 as Apple,
value_17 as Orange,
value_18 as Lemon,
CASE WHEN value_16 >= 50 THEN 1 ELSE 0 END as Apple50,
CASE WHEN value_17 >= 50 THEN 1 ELSE 0 END as Orange50,
CASE WHEN value_18 >= 50 THEN 1 ELSE 0 END AS Lemon50
FROM (
SELECT * FROM(
SELECT
timestamp,
attribute_id,
value
FROM `PROJECT_ID.DB_NAME.FRUITS` as td
WHERE td.attribute_id IN (16,17,18)
)PIVOT
(
MAX(value) as value
FOR attribute_id IN (16,17,18)
)
)as td2
Below is sample data for the tables.
-- TableData --
attribute_id | value | timestamp |
--------------+-----------+------------+
17 | 100 | 1618822794 |
17 | 100 | 1618822861 |
16 | 50 | 1618822794 |
16 | 50 | 1618822861 |
-- TableAttribute --
id | name |
--------------+----------+
16 | Apple |
17 | Orange |
18 | Lemon |
-- Expected Result --
timestamp | Apple | Orange | Lemon | Apple50 | Orange50 | Lemon50 |
--------------+---------+--------+-------+---------+----------+---------+
1618822794 | 50 | 100 | 0 | 1 | 1 | 0
1618822861 | 50 | 100 | 0 | 1 | 1 | 0
Pivot is likely the best way to achieve what you're wanting. Consider the following approach though as it might be simpler to manage:
with aggregate_data as (
select td.timestamp
, ta.name
, td.value as value
from TableData td
full outer join TableAttribute ta
on td.attribute_id = ta.id
)
select timestamp
, value_Apple as Apple
, value_Orange as Orange
, value_Lemon as Lemon
, _50_Apple as Apple50
, _50_Orange as Orange50
, _50_Lemon as Lemon50
from aggregate_data
pivot(max(value) value, max(case when value >=50 then 1 else 0 end) _50 for name in ('Apple', 'Orange', 'Lemon'))
where timestamp is not null
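Since the question asks for a method other than PIVOT, plain conditional aggregation is one candidate: MAX(CASE WHEN ... END) is the portable counterpart of the Postgres FILTER clause. The sketch below runs against SQLite through Python's sqlite3 (an assumption made only so the example is runnable; it has not been run on BigQuery), using the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableData (attribute_id INTEGER, value INTEGER, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO TableData VALUES (?, ?, ?)",
    [
        (17, 100, 1618822794), (17, 100, 1618822861),
        (16, 50, 1618822794), (16, 50, 1618822861),
    ],
)

query = """
SELECT timestamp, Apple, Orange, Lemon,
       CASE WHEN Apple  >= 50 THEN 1 ELSE 0 END AS Apple50,
       CASE WHEN Orange >= 50 THEN 1 ELSE 0 END AS Orange50,
       CASE WHEN Lemon  >= 50 THEN 1 ELSE 0 END AS Lemon50
FROM (
    SELECT timestamp,
           -- CASE yields NULL for other attributes; MAX ignores NULLs
           COALESCE(MAX(CASE WHEN attribute_id = 16 THEN value END), 0) AS Apple,
           COALESCE(MAX(CASE WHEN attribute_id = 17 THEN value END), 0) AS Orange,
           COALESCE(MAX(CASE WHEN attribute_id = 18 THEN value END), 0) AS Lemon
    FROM TableData
    WHERE attribute_id IN (16, 17, 18)
    GROUP BY timestamp
) AS td2
ORDER BY timestamp
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Both timestamps come back as 50, 100, 0 with flags 1, 1, 0, which is the expected result from the question.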

T-SQL Crosstab count query

I have the following dataset:
... and I want to do a crosstab of sorts, counting the data against specific criteria e.g.:
Colour criteria: String contains "Blue", "Red", "Yellow" or "Green" (not case sensitive)
Type criteria: String contains "Car", "Lorry", or "Bus" (not case sensitive)
... and I would like the result to look like the following:
Is there an SQL query that I can run on the original data to produce the result I'm looking for?
You can use CROSS APPLY with conditional aggregation; CROSS APPLY simplifies the generation of the list of colours:
select c.colour,
sum(case when v.VehicleData like '%Car%' then 1 else 0 end) Car,
sum(case when v.VehicleData like '%Lorry%' then 1 else 0 end) Lorry,
sum(case when v.VehicleData like '%Bus%' then 1 else 0 end) Bus
from vehicles v
cross apply (values ('Blue'), ('Red'), ('Yellow'), ('Green')
) AS c(colour)
where v.VehicleData like '%' + c.colour + '%'
group by c.colour
Output:
colour Car Lorry Bus
Blue 3 1 0
Red 1 2 0
Yellow 0 1 1
Green 0 0 2
Demo on dbfiddle
With conditional aggregation:
select c.colour,
count(case when t.VehicleData like '%Car%' then 1 end) Car,
count(case when t.VehicleData like '%Lorry%' then 1 end) Lorry,
count(case when t.VehicleData like '%Bus%' then 1 end) Bus
from (
select 'Blue' colour union all
select 'Red' union all
select 'Yellow' union all
select 'Green'
) c left join tbl1 t
on t.VehicleData like '%' + c.colour + '%'
group by c.colour
See the demo.
Results:
> colour | Car | Lorry | Bus
> :----- | --: | ----: | --:
> Blue | 3 | 1 | 0
> Red | 1 | 2 | 0
> Yellow | 0 | 1 | 1
> Green | 0 | 0 | 2
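The original dataset is not reproduced here, so the sketch below uses hypothetical rows invented to match the counts shown above. It runs the LEFT JOIN variant on SQLite through Python's sqlite3 (an assumption for runnability), with || in place of T-SQL's + for string concatenation. SQLite's LIKE is case-insensitive for ASCII by default; in SQL Server that depends on the column's collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicles (VehicleData TEXT)")
# Hypothetical rows, chosen only to reproduce the counts in the answers.
conn.executemany(
    "INSERT INTO vehicles VALUES (?)",
    [
        ("Blue Car",), ("blue car",), ("BLUE CAR",), ("Blue Lorry",),
        ("Red Car",), ("red lorry",), ("Lorry, Red",),
        ("Yellow Lorry",), ("yellow bus",),
        ("Green Bus",), ("bus (green)",),
    ],
)

query = """
SELECT c.colour,
       COUNT(CASE WHEN t.VehicleData LIKE '%Car%'   THEN 1 END) AS Car,
       COUNT(CASE WHEN t.VehicleData LIKE '%Lorry%' THEN 1 END) AS Lorry,
       COUNT(CASE WHEN t.VehicleData LIKE '%Bus%'   THEN 1 END) AS Bus
FROM (
    SELECT 'Blue' AS colour UNION ALL
    SELECT 'Red' UNION ALL
    SELECT 'Yellow' UNION ALL
    SELECT 'Green'
) AS c
LEFT JOIN vehicles AS t
       ON t.VehicleData LIKE '%' || c.colour || '%'
GROUP BY c.colour
"""
# Key the result by colour so the check does not depend on row order.
counts = {colour: (car, lorry, bus) for colour, car, lorry, bus in conn.execute(query)}
print(counts)
conn.close()
```

Because of the LEFT JOIN, a colour with no matching rows would still appear, with all three counts at 0.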

Sql query for getting group by on 2 columns

I want a result that counts each distinct value of the fruits column, per name.
Name | fruits
----------------
Vishal | orange
Manish | orange
Vishal | apple
Manish | orange
Manish | apple
Vishal | orange
Vishal | mango
Vishal | banana
Result should be
Name | Orange count | Apple count| mango | banana
--------------------------
Vishal | 2 | 1 | 1 | 1
Manish | 2 | 1 | 0 | 0
Another result should be
name | fruits
---------------
Vishal | orange, Apple , mango, banana
Manish | orange , Apple
You can use conditional aggregation for this:
select name,
count(case when fruits = 'orange' then 1 end) as orange_count,
count(case when fruits = 'apple' then 1 end) as apple_count,
count(case when fruits = 'mango' then 1 end) as mango_count,
count(case when fruits = 'banana' then 1 end) as banana_count
from the_table
group by name;
Some DBMS also support the ANSI SQL filter clause which makes this a bit more readable:
select name,
count(*) filter (where fruits = 'orange') as orange_count,
count(*) filter (where fruits = 'apple') as apple_count,
count(*) filter (where fruits = 'mango') as mango_count,
count(*) filter (where fruits = 'banana') as banana_count
from the_table
group by name;
Here is a generic pivot query which should work across most RDBMS:
SELECT Name,
SUM(CASE WHEN fruits = 'orange' THEN 1 ELSE 0 END) AS orange_count,
SUM(CASE WHEN fruits = 'apple' THEN 1 ELSE 0 END) AS apple_count,
SUM(CASE WHEN fruits = 'mango' THEN 1 ELSE 0 END) AS mango_count,
SUM(CASE WHEN fruits = 'banana' THEN 1 ELSE 0 END) AS banana_count
FROM yourTable
GROUP BY Name
If you are using SQL Server or Oracle, there is a built-in PIVOT operator which can simplify this and possibly improve performance as well. Postgres has no PIVOT keyword, but its tablefunc extension provides crosstab() for the same purpose.
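The second result in the question (one comma-separated list of fruits per name) is not covered by the answers above. It is usually done with a string aggregation function, whose name varies by DBMS: GROUP_CONCAT in MySQL and SQLite, STRING_AGG in Postgres and SQL Server 2017+, LISTAGG in Oracle. A minimal sketch, assuming SQLite via Python's sqlite3 and a hypothetical table name the_table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (name TEXT, fruits TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?)",
    [
        ("Vishal", "orange"), ("Manish", "orange"), ("Vishal", "apple"),
        ("Manish", "orange"), ("Manish", "apple"), ("Vishal", "orange"),
        ("Vishal", "mango"), ("Vishal", "banana"),
    ],
)

# DISTINCT drops the repeated 'orange' rows; the separator defaults to ','.
query = """
SELECT name, GROUP_CONCAT(DISTINCT fruits) AS fruits
FROM the_table
GROUP BY name
"""
# The order of items inside each list is not guaranteed, so compare as sets.
fruit_sets = {name: set(fruits.split(",")) for name, fruits in conn.execute(query)}
print(fruit_sets)
conn.close()
```

If a particular order inside the list matters, the DBMS-specific ordering syntax for the aggregate would be needed, which this sketch does not cover.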

SQL: count occurrences of values

Given the table
user | fruit
------------
1 | apple
1 | apple
1 | apple
2 | apple
2 | apple
1 | pear
Trying to combine count and group by to get
user | apples | pears
---------------------
1 | 3 | 1
2 | 2 | 0
Any hints on how to proceed are appreciated.
Use case expressions to do conditional counting:
select user,
count(case when fruit = 'apple' then 1 end) as apples,
count(case when fruit = 'pear' then 1 end) as pears
from tablename
group by user
If you're working with Oracle, you can use the PIVOT operator:
SELECT *
FROM fruit t
PIVOT (COUNT(fruit) AS cnt
FOR(fruit) IN ('apple' AS apple
, 'pear' AS pear) );
More details and full samples on PIVOT / UNPIVOT can be found on the web (e.g. here: https://oracle-base.com/articles/11g/pivot-and-unpivot-operators-11gr1 )

How can I turn a bunch of rows into aggregated columns WITHOUT using pivot in SQL Server 2005?

Here is the scenario:
I have a table that records the user_id, the module_id, and the date/time the module was viewed.
eg.
Table: Log
------------------------------
User_ID Module_ID Date
------------------------------
1 red 2001-01-01
1 green 2001-01-02
1 blue 2001-01-03
2 green 2001-01-04
2 blue 2001-01-05
1 red 2001-01-06
1 blue 2001-01-07
3 blue 2001-01-08
3 green 2001-01-09
3 red 2001-01-10
3 green 2001-01-11
4 white 2001-01-12
I need to get a result set that has the user_id as the 1st column, and then a column for each module. The row data is then the user_id and the count of the number of times that user viewed each module.
eg.
---------------------------------
User_ID red green blue white
---------------------------------
1 2 1 2 0
2 0 1 1 0
3 1 2 1 0
4 0 0 0 1
I was initially thinking that I could do this with PIVOT, but no dice; the database is a converted SQL Server 2000 DB that is running in SQL Server 2005. I'm not able to change the compatibility level, so pivot is out.
The other catch is that the modules will vary, and it isn't feasible to re-write the query every time a module is added or removed. This means that I can't hard-code in the modules because I don't know in advance which will and will not be installed.
How can I accomplish this?
PIVOT can be simulated with CASE and GROUP BY
select
[user_id],
sum(case when [Module_ID] = 'red' then 1 else 0 end) as red,
sum(case when [Module_ID] = 'green' then 1 else 0 end) as green,
sum(case when [Module_ID] = 'blue' then 1 else 0 end) as blue,
sum(case when [Module_ID] = 'white' then 1 else 0 end) as white
from [log]
group by
[user_id]
Of course this doesn't work if the modules vary (as stated in the question) but then, PIVOT has the same problem.
Dynamically generating some sql overcomes this problem but this solution smells a bit!
declare @sql nvarchar(max)
set @sql = '
select
[user_id],'
select @sql = @sql + '
sum(case when [Module_ID] = ''' + replace([Module_ID], '''','''''') + ''' then 1 else 0 end) as [' + replace([Module_ID], '''','') + '],'
from (select distinct [Module_ID] from [log]) as moduleids
set @sql = substring(@sql,1,len(@sql)-1) + '
from [log]
group by
[user_id]
'
print @sql
exec sp_executesql @sql
Note that this may be vulnerable to sql-injection if the module id data can't be trusted.
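If the query can be assembled in client code instead of T-SQL, the injection risk can be reduced by binding the module ids as parameters and splicing in only quoted column aliases. The sketch below illustrates that idea under stated assumptions: it uses Python with SQLite rather than SQL Server, and quote_ident is a hypothetical helper, not a library function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (user_id INTEGER, module_id TEXT, viewed TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [
        (1, "red", "2001-01-01"), (1, "green", "2001-01-02"), (1, "blue", "2001-01-03"),
        (2, "green", "2001-01-04"), (2, "blue", "2001-01-05"), (1, "red", "2001-01-06"),
        (1, "blue", "2001-01-07"), (3, "blue", "2001-01-08"), (3, "green", "2001-01-09"),
        (3, "red", "2001-01-10"), (3, "green", "2001-01-11"), (4, "white", "2001-01-12"),
    ],
)

# Discover the modules at run time instead of hard-coding them.
modules = [m for (m,) in conn.execute(
    "SELECT DISTINCT module_id FROM log ORDER BY module_id")]


def quote_ident(name):
    """Hypothetical helper: quote a value for use as a column alias."""
    return '"' + name.replace('"', '""') + '"'


# One "?" placeholder per module; only the quoted alias goes into the SQL text.
columns = ",\n       ".join(
    f"SUM(CASE WHEN module_id = ? THEN 1 ELSE 0 END) AS {quote_ident(m)}"
    for m in modules
)
sql = f"SELECT user_id,\n       {columns}\nFROM log\nGROUP BY user_id\nORDER BY user_id"

cursor = conn.execute(sql, modules)
header = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(header)
for row in rows:
    print(row)
conn.close()
```

The columns come out in alphabetical module order (blue, green, red, white) because of the ORDER BY in the discovery query, and a newly installed module appears as a new column without any change to the code.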
SELECT User_ID, MAX(red) AS red, MAX(green) AS green, MAX(blue) AS blue,
MAX(white) AS white FROM
((SELECT User_ID, COUNT(Module_ID) AS red, 0 AS green, 0 AS blue,
0 AS white
FROM log
WHERE Module_ID = 'red'
GROUP BY User_ID)
UNION
(SELECT User_ID, 0 AS red, COUNT(Module_ID) AS green, 0 AS blue,
0 AS white
FROM log
WHERE Module_ID = 'green'
GROUP BY User_ID)
UNION
(SELECT User_ID, 0 AS red, 0 AS green, COUNT(Module_ID) AS blue,
0 AS white
FROM log
WHERE Module_ID = 'blue'
GROUP BY User_ID)
UNION
(SELECT User_ID, 0 AS red, 0 AS green, 0 AS blue,
COUNT(Module_ID) AS white
FROM log
WHERE Module_ID = 'white'
GROUP BY User_ID)) AS module_counts
GROUP BY User_ID
ORDER BY User_ID
Using MySQL I did this:
Copied your data into Log_Table.sql
create table Log (User_ID mediumint, Module_ID CHAR(5), dte CHAR(10));
load data infile 'Log_Table.sql' INTO TABLE Log FIELDS TERMINATED BY ',';
Pivot:
select User_ID AS 'USER', sum(case
Module_ID WHEN 'red' then 1 else 0
END) AS 'red',
sum(case Module_ID WHEN 'green' then 1
else 0 END) AS 'green',
sum(case Module_ID WHEN 'blue' then 1
else 0 END) AS 'blue',
sum(case Module_ID WHEN 'white' then 1
else 0 END) AS 'white'
from Log
Group By User_ID;
> +------+------+-------+------+-------+
> | USER | red | green | blue | white |
> +------+------+-------+------+-------+
> | 1 | 2 | 1 | 2 | 0 |
> | 2 | 0 | 1 | 1 | 0 |
> | 3 | 1 | 2 | 1 | 0 |
> | 4 | 0 | 0 | 0 | 1 |
> +------+------+-------+------+-------+
> 4 rows in set (0.00 sec)
Hope this helps.
I believe characteristic functions are what you want.