How to group the outcomes of a query by the values in another column? - sql

I have a table called vegetation with 2 columns:
type, count
I want to sum count separately for the rows where count is below the average count and for the rows where it is above it.
I don't know how to express this in the group by clause (or somewhere else?).
I guess another way of doing it would be to assign one value to all below-average rows and another value to all above-average rows, and then group by that value. But I have just started and cannot figure out how to do that either.

SELECT sum(CASE WHEN ct <= x.avg_ct THEN ct ELSE 0 END) AS sum_ct_low
,sum(CASE WHEN ct > x.avg_ct THEN ct ELSE 0 END) AS sum_ct_hi
FROM vegetation v
,(SELECT avg(ct) AS avg_ct FROM vegetation) x
Because the average is a single value, you can simply CROSS JOIN the subquery to the base table (a comma-separated list of tables in the FROM clause is a cross join).
Then filter with a simple CASE expression inside each sum().
I use ct instead of count, since that's a reserved word in SQL.
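A minimal sketch that runs the answer's query, plus the question's "label each row, then group by the label" idea, against an in-memory SQLite database via Python's standard sqlite3 module. SQLite is only a stand-in for whatever engine the asker uses, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vegetation (type TEXT, ct INTEGER)")
conn.executemany(
    "INSERT INTO vegetation VALUES (?, ?)",
    [("oak", 10), ("pine", 20), ("fern", 30), ("moss", 40)],  # hypothetical data, avg = 25
)

# Answer's approach: cross join the one-row average, then two conditional sums.
low, hi = conn.execute("""
    SELECT sum(CASE WHEN ct <= x.avg_ct THEN ct ELSE 0 END) AS sum_ct_low,
           sum(CASE WHEN ct >  x.avg_ct THEN ct ELSE 0 END) AS sum_ct_hi
    FROM vegetation v
    CROSS JOIN (SELECT avg(ct) AS avg_ct FROM vegetation) x
""").fetchone()

# Question's idea: give every row a label, then GROUP BY that label.
by_label = dict(conn.execute("""
    SELECT CASE WHEN ct <= x.avg_ct THEN 'low' ELSE 'high' END AS bucket,
           sum(ct)
    FROM vegetation v
    CROSS JOIN (SELECT avg(ct) AS avg_ct FROM vegetation) x
    GROUP BY bucket
""").fetchall())

print(low, hi, by_label)
```

The first query returns both sums as columns of one row; the second returns them as rows, one per label.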

Related

Group by after a partition by in MS SQL Server

I am working on some car accident data and am stuck on how to get the data in the form I want.
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
This is my code, which counts the accidents for each sex and each severity. I know I can do this with group by, but I wanted to use partition by in order to work out percentages too.
However, I get a very large table (I assume one row for every underlying row, rather than one per sex/severity). When I do the following:
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
group by
sex_of_driver,
accident_severity
I get this:
sex_of_driver | accident_severity | (No column name)
1 | 1 | 1
1 | 2 | 1
-1 | 2 | 1
-1 | 1 | 1
1 | 3 | 1
I won't give you the whole table, but basically, the group by has caused the count to just be 1.
I can't figure out why group by isn't working. Is this an MS SQL-Server thing?
I want to get the same result as below (obviously without the CASE expressions etc.):
select
accident.accident_severity,
count(accident.accident_severity) as num_accidents,
vehicle.sex_of_driver,
CASE vehicle.sex_of_driver WHEN '1' THEN 'Male' WHEN '2' THEN 'Female' end as sex_col,
CASE accident.accident_severity WHEN '1' THEN 'Fatal' WHEN '2' THEN 'Serious' WHEN '3' THEN 'Slight' end as serious_col
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
where
sex_of_driver != 3
and
sex_of_driver != -1
group by
accident.accident_severity,
vehicle.sex_of_driver
order by
accident.accident_severity
You seem to have a misunderstanding here.
GROUP BY will reduce your rows to a single row per grouping (i.e. one row per pair of sex_of_driver, accident_severity values). Any normal aggregates you use with it, such as COUNT(*), return the aggregate value within that group.
OVER, by contrast, gives you a windowed aggregate, which is calculated after your rows have been reduced. Therefore, when you write count(accident_severity) over (partition by sex_of_driver, accident_severity), the aggregate only receives a single row in each partition, because the rows have already been reduced.
You say "I know I can do this with group by but I wanted to use a partition by in order to work out % too." but you are misunderstanding how to do that. You don't need PARTITION BY to work out percentage. All you need to calculate a percentage over the whole resultset is COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), in other words a windowed aggregate over a normal aggregate.
Note also that count(accident_severity) does not give you the number of distinct accident_severity values; it gives you the number of non-null values, which is probably not what you intend. You also have a very strange join predicate; you probably want something like a.vehicle_id = v.vehicle_id.
So you want something like this:
select
sex_of_driver,
accident_severity,
count(*) as Count,
count(*) * 1.0 /
sum(count(*)) over (partition by sex_of_driver) as PercentOfSex,
count(*) * 1.0 /
sum(count(*)) over () as PercentOfTotal
from
dbo.accident as a
inner join dbo.vehicle as v on
a.vehicle_id = v.vehicle_id
group by
sex_of_driver,
accident_severity;
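A sketch contrasting the question's grouped window query with the answer's "windowed aggregate over a normal aggregate", run in SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions), and it uses a hypothetical single accidents table with made-up rows in place of the real accident/vehicle join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table standing in for the accident/vehicle join.
conn.execute("CREATE TABLE accidents (sex_of_driver INTEGER, accident_severity INTEGER)")
conn.executemany(
    "INSERT INTO accidents VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 2), (1, 2), (2, 2), (2, 2), (2, 3), (2, 3)],
)

# Question's query: the window COUNT runs after GROUP BY, so every partition
# holds exactly one grouped row and the count is always 1.
wrong = conn.execute("""
    SELECT sex_of_driver, accident_severity,
           count(accident_severity) OVER (PARTITION BY sex_of_driver, accident_severity)
    FROM accidents
    GROUP BY sex_of_driver, accident_severity
""").fetchall()

# Answer's approach: window SUM over the ordinary COUNT(*) aggregate.
right = conn.execute("""
    SELECT sex_of_driver, accident_severity,
           count(*) AS cnt,
           count(*) * 1.0 / sum(count(*)) OVER (PARTITION BY sex_of_driver) AS pct_of_sex,
           count(*) * 1.0 / sum(count(*)) OVER () AS pct_of_total
    FROM accidents
    GROUP BY sex_of_driver, accident_severity
    ORDER BY sex_of_driver, accident_severity
""").fetchall()

for row in right:
    print(row)
```

The `* 1.0` forces floating-point division; without it both operands are integers and the division truncates.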

TSQL "where ... group by ..." issue that needs solution like "having ..."

I have 3 sub-tables of different formats combined with unions (in case this affects anything) into full-table. It has the columns "location", "amount" and "time". Then, to keep things general for my later needs, I union full-table with location-table, which has all possible "location" values with the other fields null, into master-table.
I query master-table:
select location, sum(amount)
from master-table
where (time...)
group by location
However, some "location" values are dropped because sum(amount) is 0 for those "location"s, but I really want the full list of "location"s for my further steps.
An alternative would be to use a HAVING clause, but from what I understand HAVING is impossible here, because I filter on "time" while grouping on "location", and I would need to add "time" to the grouping, which defeats the purpose. Keep in mind that the goal here is to get sum(amount) for each "location".
select location, sum(amount)
from master-table
group by location, time
having (time...)
To illustrate the output:
With the first query I get:
loc1, 5
loc3, 10
loc6, 1
but I want to get
loc1, 5
loc2, 0
loc3, 10
loc4, 0
loc5, 0
loc6, 1
Any suggestions on what can be done with this structure of master-table? An alternative solution, which I have no idea how to code, would be to join the numbers from the first query's result onto location-table (as a query, not an actual table) to produce the final result I posted above.
What you want requires a complete list of locations, then a left outer join between that list and your calculated values, and IsNull (in T-SQL) to ensure you see the 0s you expect. You can do this with some CTEs, which I find valuable for clarity during development, or you can work on "putting it all together" in a more traditional SELECT...FROM... statement. The CTE approach might look like this:
WITH loc AS (
SELECT DISTINCT LocationID
FROM location_table
), summary_data as (
SELECT LocationID, SUM(amount) AS location_sum
FROM master-table
WHERE (time ... )
GROUP BY LocationID
)
SELECT loc.LocationID, IsNull(location_sum,0) AS location_sum
FROM loc
LEFT OUTER JOIN summary_data ON loc.LocationID = summary_data.LocationID
See if that gets you a step or two closer to the results you're looking for.
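A runnable sketch of the CTE plus LEFT OUTER JOIN approach in SQLite via Python's sqlite3 module. SQLite has no ISNULL function, so COALESCE is swapped in; the time filter `time < '16:00'` is a hypothetical stand-in for the question's elided `(time...)` condition, and all table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location_table (LocationID TEXT);
    INSERT INTO location_table VALUES ('loc1'), ('loc2'), ('loc3'), ('loc4');
    CREATE TABLE master_table (LocationID TEXT, amount INTEGER, time TEXT);
    INSERT INTO master_table VALUES
        ('loc1', 5, '10:00'), ('loc1', 7, '17:00'),
        ('loc2', 3, '18:00'), ('loc3', 10, '09:00');
""")

rows = conn.execute("""
    WITH loc AS (
        SELECT DISTINCT LocationID FROM location_table   -- complete list of locations
    ), summary_data AS (
        SELECT LocationID, SUM(amount) AS location_sum
        FROM master_table
        WHERE time < '16:00'                             -- hypothetical time filter
        GROUP BY LocationID
    )
    SELECT loc.LocationID, COALESCE(location_sum, 0) AS location_sum
    FROM loc
    LEFT OUTER JOIN summary_data ON loc.LocationID = summary_data.LocationID
    ORDER BY loc.LocationID
""").fetchall()

print(rows)
```

loc2 and loc4 have no rows that survive the filter, so the LEFT JOIN yields NULL for them and COALESCE turns that into 0.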
I can think of 2 options:
You could move the WHERE to a CASE WHEN construction:
-- Option 1
select
location,
sum(CASE WHEN time <'16:00' THEN amount ELSE 0 END)
from master_table
group by location
Or you could JOIN with the possible values of location (which is my first ever RIGHT JOIN in a very long time 😉):
-- Option 2
select
x.location,
sum(CASE WHEN m.time <'16:00' THEN m.amount ELSE 0 END)
from master_table m
right join (select distinct location from master_table) x ON x.location = m.location
group by x.location
The version using T-SQL without CTEs would be:
SELECT l.location ,
ISNULL(m.location_sum, 0) as location_sum
FROM (SELECT DISTINCT location FROM master-table) l
LEFT JOIN (
SELECT location,
SUM(amount) as location_sum
FROM master-table
WHERE (time ... )
GROUP BY location
) m ON l.location = m.location
This assumes that you still have your initial UNION in place that ensures that master-table has all possible locations included.
It is the where clause that excludes some locations. To ensure you retain every location you could introduce "conditional aggregation" instead of using the where clause: e.g.
select location, sum(case when (time...) then amount else 0 end) as location_sum
from master-table
group by location
That is, instead of excluding rows from the result, place the conditions you would have used in the where clause inside the sum function. When those conditions are true, the amount is aggregated; when they evaluate to false, 0 is summed instead, but the location is still retained in the result.
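A small sketch contrasting WHERE filtering with conditional aggregation, using SQLite through Python's sqlite3 module. The rows and the `time < '16:00'` condition are hypothetical stand-ins; the `('loc4', None, None)` row mimics a location-only row coming from the union with location-table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master_table (location TEXT, amount INTEGER, time TEXT)")
conn.executemany("INSERT INTO master_table VALUES (?, ?, ?)", [
    ("loc1", 5, "10:00"), ("loc1", 7, "17:00"),
    ("loc2", 3, "18:00"), ("loc3", 10, "09:00"),
    ("loc4", None, None),  # location-only row from the union with location-table
])

# WHERE removes rows before grouping, so loc2 and loc4 disappear entirely.
filtered = conn.execute("""
    SELECT location, sum(amount)
    FROM master_table
    WHERE time < '16:00'
    GROUP BY location
    ORDER BY location
""").fetchall()

# Conditional aggregation keeps every row; non-matching rows contribute 0.
conditional = conn.execute("""
    SELECT location,
           sum(CASE WHEN time < '16:00' THEN amount ELSE 0 END) AS location_sum
    FROM master_table
    GROUP BY location
    ORDER BY location
""").fetchall()

print(filtered)
print(conditional)
```

For loc4 the comparison `NULL < '16:00'` is not true, so the ELSE branch adds 0 and the location still appears.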

How do I count the rows with a where clause in SQL Server?

I am pretty much stuck with a problem I am facing in SQL Server. I want to show in a query the number of times a specific value occurs. This is pretty easy to do, but I want to take it a step further, and I think the best way to explain what I am trying to achieve is with images.
I have two tables:
Plant and
Chest
As you can see, in the chest table the column 'hoeveelheid' (quantity) tells how full the chest is: 'vol' (full) == 1 and 3/4 == 0.75. In the plant table there is a column 'Hoeveelheidperkist' (quantity per chest) which tells how many plants fit in one chest.
select DISTINCT kist.Plantnaam, kist.Plantmaat, count(*) AS 'Amount'
from kist
group by kist.plantnaam, kist.Plantmaat
This query counts all the chests, but it does not separate the count of 'Vol' chests and '3/4' chests. I want a separate count for each, but I have no idea how. Any help would be much appreciated.
If you use group by you don't need distinct.
If you want a separate count for each hoeveelheid, just add it to the group by clause:
select kist.Plantnaam, kist.Plantmaat, kist.hoeveelheid, count(*) AS 'Amount'
from kist
group by kist.plantnaam, kist.Plantmaat, hoeveelheid
Or, if you want all three counts on the same row, you could use conditional aggregation, e.g.:
select kist.Plantnaam, kist.Plantmaat
, sum(case when kist.hoeveelheid ='Vol' then 1 else 0 end) vol
, sum(case when kist.hoeveelheid ='3/4' then 1 else 0 end) [3_4]
, count(*) AS 'Amount'
from kist
group by kist.plantnaam, kist.Plantmaat
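A sketch of both suggestions run in SQLite via Python's sqlite3 module: adding hoeveelheid to the GROUP BY, and conditional aggregation with all counts on one row. The plant names and sizes are hypothetical sample data; SQLite quotes the `3_4` alias with double quotes where SQL Server would use `[3_4]`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kist (Plantnaam TEXT, Plantmaat TEXT, hoeveelheid TEXT)")
conn.executemany("INSERT INTO kist VALUES (?, ?, ?)", [
    ("Roos", "P9", "Vol"), ("Roos", "P9", "Vol"), ("Roos", "P9", "3/4"),
    ("Tulp", "P11", "3/4"),
])  # hypothetical chests

# Option 1: one row per plant, size and fill level.
per_level = conn.execute("""
    SELECT Plantnaam, Plantmaat, hoeveelheid, count(*) AS Amount
    FROM kist
    GROUP BY Plantnaam, Plantmaat, hoeveelheid
    ORDER BY Plantnaam, Plantmaat, hoeveelheid
""").fetchall()

# Option 2: conditional aggregation, every count on one row per plant and size.
pivoted = conn.execute("""
    SELECT Plantnaam, Plantmaat,
           sum(CASE WHEN hoeveelheid = 'Vol' THEN 1 ELSE 0 END) AS vol,
           sum(CASE WHEN hoeveelheid = '3/4' THEN 1 ELSE 0 END) AS "3_4",
           count(*) AS Amount
    FROM kist
    GROUP BY Plantnaam, Plantmaat
    ORDER BY Plantnaam, Plantmaat
""").fetchall()

print(per_level)
print(pivoted)
```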
When you want to filter the data on the counts, you have to use a having clause. Whenever you use aggregate functions (sum, count, min, max) and want to filter on the aggregated values, use having rather than where:
select kist.Plantnaam, kist.Plantmaat, count(*) AS 'Amount'
from kist
group by kist.plantnaam, kist.Plantmaat having count(*) = 1 -- or provide necessary conditions
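A short sketch, again in SQLite via Python's sqlite3 module with hypothetical chest data, showing HAVING filtering on an aggregate and SQLite rejecting the same condition in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kist (Plantnaam TEXT, Plantmaat TEXT, hoeveelheid TEXT)")
conn.executemany("INSERT INTO kist VALUES (?, ?, ?)", [
    ("Roos", "P9", "Vol"), ("Roos", "P9", "Vol"), ("Roos", "P9", "3/4"),
    ("Tulp", "P11", "3/4"),
])  # hypothetical chests

# HAVING is evaluated after grouping, so it can test count(*).
rows = conn.execute("""
    SELECT Plantnaam, Plantmaat, count(*) AS Amount
    FROM kist
    GROUP BY Plantnaam, Plantmaat
    HAVING count(*) = 1
""").fetchall()

# WHERE is evaluated per row, before grouping, so an aggregate is not allowed there.
try:
    conn.execute("SELECT Plantnaam FROM kist WHERE count(*) = 1")
    where_failed = False
except sqlite3.OperationalError:
    where_failed = True

print(rows, where_failed)
```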

SQL null spaces in calculated columns

I have created a calculated column but it is giving me a row with null value. If I add another calculated field, it adds 2 null rows, and so on.
My objective is to get a single row with a single value. No nulls.
The code:
SELECT
CLIENT_CODE,
( CASE WHEN CLITBP.TBPCODIGO=101 THEN COALESCE( CLITBP.TBPDESC2,0) ELSE NULL END) TAB101
FROM
CLIENT
GROUP BY 1,2
the wrong output
the intended output
If you want one row per client code, then you should have only one key in the GROUP BY. Perhaps this is what you want:
SELECT CLIENT_CODE,
MAX(CASE WHEN CLITBP.TBPCODIGO = 101 THEN COALESCE(CLITBP.TBPDESC2, 0) END) as TAB101
FROM CLIENT
GROUP BY CLIENT_CODE;
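A sketch of why grouping by the CASE result produces the extra NULL rows, and how MAX(CASE ...) with a single grouping key collapses them. It runs in SQLite via Python's sqlite3 module; the clitbp table and its rows are hypothetical, since the question's actual CLIENT/CLITBP join is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table; the real CLIENT/CLITBP join is not in the question.
conn.execute("CREATE TABLE clitbp (CLIENT_CODE TEXT, TBPCODIGO INTEGER, TBPDESC2 TEXT)")
conn.executemany("INSERT INTO clitbp VALUES (?, ?, ?)", [
    ("A", 101, "gold"), ("A", 102, "red"), ("A", 103, "big"),
    ("B", 101, "silver"), ("B", 104, "small"),
])

# Grouping by the CASE result too: every non-101 row falls into an extra NULL group.
wrong = conn.execute("""
    SELECT CLIENT_CODE,
           CASE WHEN TBPCODIGO = 101 THEN COALESCE(TBPDESC2, 0) ELSE NULL END AS TAB101
    FROM clitbp
    GROUP BY CLIENT_CODE, TAB101
    ORDER BY CLIENT_CODE, TAB101
""").fetchall()

# One grouping key; MAX ignores the NULLs produced for non-101 rows.
right = conn.execute("""
    SELECT CLIENT_CODE,
           MAX(CASE WHEN TBPCODIGO = 101 THEN COALESCE(TBPDESC2, 0) END) AS TAB101
    FROM clitbp
    GROUP BY CLIENT_CODE
    ORDER BY CLIENT_CODE
""").fetchall()

print(wrong)
print(right)
```

Each additional calculated column grouped this way adds its own NULL group, which matches the "one more null row per calculated field" symptom in the question.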

SQLite3 database conditional summing

I am looking to order a list of keys based on the number of orders placed, from a database containing order requests. Basically, given a table, call it orders(o_partkey, o_returnflag), I am trying to get the total number of returns for each part key. I have tried many variations of the following snippet, with the goal schema returnlist(partkey, numreturns):
select O.o_partkey as partkey,
count(case when O.o_returnflag = 'R' then 1 else 0 end) as numreturns
from orders O
orderby quantity_returned desc;
I am very new to SQLite and am just jumping into the basics. This is an adjustment of a homework question (the actual question is more complex) but I have simplified down the issue I am having.
Consider using a derived table subquery with SUM() as the aggregate function:
SELECT dT.partkey, dT.numreturns
FROM
(SELECT O.o_partkey as partkey,
SUM(CASE WHEN O.o_returnflag = 'R' THEN 1 ELSE 0 END) as numreturns
FROM [ORDER] O
GROUP BY O.o_partkey) AS dT
ORDER BY dT.numreturns DESC;
Be sure to bracket the table name as [ORDER], since ORDER is an SQLite keyword (a table named orders, as in the question, needs no brackets).
Your problem is that COUNT counts every non-NULL value, so it counts both the 0s and the 1s.
You are not interested in any other rows, so you can just keep only the returns with WHERE:
SELECT o_partkey AS partkey,
COUNT(*) AS numreturns
FROM orders
WHERE o_returnflag = 'R'
GROUP BY o_partkey
ORDER BY 2 DESC;
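A sketch comparing the three variants from this thread in SQLite via Python's sqlite3 module: the question's COUNT(CASE ... ELSE 0 END), the SUM(CASE ...) answer, and the WHERE-filtered COUNT(*) answer. The part keys and flags are made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (o_partkey TEXT, o_returnflag TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("p1", "R"), ("p1", "N"), ("p1", "R"),
    ("p2", "N"), ("p2", "R"),
    ("p3", "N"),
])  # hypothetical orders

# COUNT counts every non-NULL value, so the 0s are counted as well as the 1s.
counted = dict(conn.execute("""
    SELECT o_partkey, count(CASE WHEN o_returnflag = 'R' THEN 1 ELSE 0 END)
    FROM orders
    GROUP BY o_partkey
""").fetchall())

# SUM adds up the 1s, giving the real number of returns (0 for p3).
summed = conn.execute("""
    SELECT o_partkey AS partkey,
           SUM(CASE WHEN o_returnflag = 'R' THEN 1 ELSE 0 END) AS numreturns
    FROM orders
    GROUP BY o_partkey
    ORDER BY numreturns DESC
""").fetchall()

# Filtering first, then counting rows per part key.
filtered = conn.execute("""
    SELECT o_partkey AS partkey, COUNT(*) AS numreturns
    FROM orders
    WHERE o_returnflag = 'R'
    GROUP BY o_partkey
    ORDER BY 2 DESC
""").fetchall()

print(counted, summed, filtered)
```

The WHERE version drops parts with no returns at all (p3 here), while the SUM version still lists them with 0.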