TSQL "where ... group by ..." issue that needs solution like "having ..." - sql

I have 3 sub-tables of different formats combined with UNIONs (in case this affects anything) into full-table. There I have the columns "location", "amount" and "time". Then, to keep generality for my later needs, I UNION full-table with location-table, which has all possible "location" values and nulls in the other fields, into master-table.
I query master-table,
select location, sum(amount)
from master-table
where (time...)
group by location
However some "location" values are dropped because sum(amount) is 0 for those "location"s but I really want to have full list of those "location"s for my further steps.
Alternative would be to use HAVING clause but from what I understand HAVING is impossible here because i filter on "time" while grouping on "location" and I would need to add "time" in grouping which destroys the purpose. Keep in mind that the goal here is to get sum(amount) in each "location"
select location, sum(amount)
from master-table
group by location, time
having (time...)
To illustrate the output, with the first query I get
loc1, 5
loc3, 10
loc6, 1
but I want to get
loc1, 5
loc2, 0
loc3, 10
loc4, 0
loc5, 0
loc6, 1
Any suggestions on what can be done with this structure of master-table? An alternative solution, which I have no idea how to code, would be to add the numbers from the first query's result to location-table (as a query, not an actual table), so that the final result is the one I've posted above.

What you want will require a complete list of locations, then a left outer join of that list to your calculated values, and ISNULL (for T-SQL) to ensure you see the 0s you expect. You can do this with some CTEs, which I find valuable for clarity during development, or you can work on "putting it all together" in a more traditional SELECT...FROM... statement. The CTE approach might look like this:
WITH loc AS (
    SELECT DISTINCT LocationID
    FROM location_table
), summary_data AS (
    SELECT LocationID, SUM(amount) AS location_sum
    FROM master-table
    WHERE (time...)          -- your time filter from the question
    GROUP BY LocationID
)
SELECT loc.LocationID, ISNULL(location_sum, 0) AS location_sum
FROM loc
LEFT OUTER JOIN summary_data ON loc.LocationID = summary_data.LocationID
See if that gets you a step or two closer to the results you're looking for.

I can think of 2 options:
You could move the WHERE to a CASE WHEN construction:
-- Option 1
select
location,
sum(CASE WHEN time <'16:00' THEN amount ELSE 0 END)
from master_table
group by location
Or you could JOIN with the possible values of location (my first RIGHT JOIN in a very long time 😉):
-- Option 2
select
x.location,
sum(CASE WHEN m.time <'16:00' THEN m.amount ELSE 0 END)
from master_table m
right join (select distinct location from master_table) x ON x.location = m.location
group by x.location
see: DBFIDDLE

The version using T-SQL without CTEs would be:
SELECT DISTINCT l.location,       -- DISTINCT collapses master-table to one row per location
       ISNULL(m.location_sum, 0) AS location_sum
FROM master-table l
LEFT JOIN (
    SELECT location,
           SUM(amount) AS location_sum
    FROM master-table
    WHERE (time ... )
    GROUP BY location
) m ON l.location = m.location
This assumes that you still have your initial UNION in place that ensures that master-table has all possible locations included.
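For reference, a rough sketch of what that UNION might look like, based purely on the tables described in the question (names are the question's placeholders; real hyphenated names would need [bracket] quoting in T-SQL):
SELECT location, amount, time
FROM full-table
UNION ALL
SELECT location, NULL AS amount, NULL AS time   -- placeholder rows keep every location present
FROM location-table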

It is the where clause that excludes some locations. To ensure you retain every location you could introduce "conditional aggregation" instead of using the where clause: e.g.
select location, sum(case when (time...) then amount else 0 end) as location_sum
from master-table
group by location
i.e. instead of excluding some rows from the result, place the conditions that you would have used in the where clause inside the sum function. If those conditions are true, the amount is aggregated; if they evaluate to false, 0 is summed instead, but the location is retained in the result.

Related

Group by after a partition by in MS SQL Server

I am working on some car accident data and am stuck on how to get the data in the form I want.
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
This is my code, which counts the accidents per sex for each severity. I know I can do this with group by, but I wanted to use a partition by in order to work out percentages too.
However, I get a very large table (I assume one row for every underlying row, each just repeating its sex/severity count). When I do the following:
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
group by
sex_of_driver,
accident_severity
I get this:
sex_of_driver  accident_severity  (No column name)
-------------  -----------------  ----------------
1              1                  1
1              2                  1
-1             2                  1
-1             1                  1
1              3                  1
I won't give you the whole table, but basically, the group by has caused the count to just be 1.
I can't figure out why group by isn't working. Is this an MS SQL Server thing?
I want to get the same result as below (obv without the CASE etc)
select
accident.accident_severity,
count(accident.accident_severity) as num_accidents,
vehicle.sex_of_driver,
CASE vehicle.sex_of_driver WHEN '1' THEN 'Male' WHEN '2' THEN 'Female' end as sex_col,
CASE accident.accident_severity WHEN '1' THEN 'Fatal' WHEN '2' THEN 'Serious' WHEN '3' THEN 'Slight' end as serious_col
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
where
sex_of_driver != 3
and
sex_of_driver != -1
group by
accident.accident_severity,
vehicle.sex_of_driver
order by
accident.accident_severity
You seem to have a misunderstanding here.
GROUP BY will reduce your rows to a single row per grouping (i.e. per pair of sex_of_driver, accident_severity values). Any normal aggregates you use with this, such as COUNT(*), will return the aggregate value within that group.
OVER, on the other hand, gives you a windowed aggregate, and because you also have a GROUP BY it is calculated after your rows have been reduced. Therefore, when you write count(accident_severity) over (partition by sex_of_driver, accident_severity), the aggregate only receives a single row in each partition, because the rows have already been reduced.
You say "I know I can do this with group by but I wanted to use a partition by in order to work out % too." but you are misunderstanding how to do that. You don't need PARTITION BY to work out percentage. All you need to calculate a percentage over the whole resultset is COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), in other words a windowed aggregate over a normal aggregate.
Note also that count(accident_severity) does not give you the number of distinct accident_severity values; it gives you the number of non-null values, which is probably not what you intend. You also have a very strange join predicate: you probably want something like a.vehicle_id = v.vehicle_id.
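As a quick illustration of that COUNT point (a hypothetical query, not taken from the question's data):
-- COUNT(column) counts non-null values, COUNT(DISTINCT column) counts distinct non-null values,
-- and COUNT(*) counts every row regardless of nulls.
SELECT COUNT(accident_severity)          AS non_null_values,
       COUNT(DISTINCT accident_severity) AS distinct_values,
       COUNT(*)                          AS all_rows
FROM dbo.accident;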
So you want something like this:
select
sex_of_driver,
accident_severity,
count(*) as [Count],
count(*) * 1.0 /
sum(count(*)) over (partition by sex_of_driver) as PercentOfSex,
count(*) * 1.0 /
sum(count(*)) over () as PercentOfTotal
from
dbo.accident as a
inner join dbo.vehicle as v on
a.vehicle_id = v.vehicle_id
group by
sex_of_driver,
accident_severity;

SQL needed to sum a column several times and break the result into one row horizontally

SQL is needed to sum a column several times and break the result out horizontally into one row per store, but I'm getting null rows mixed in with the totals and need to take them out.
Query:
SELECT distinct IdPurch,
Datediff(week, (dateadd(day,-1, '2021-01-03')), '2021-01-09') AS wks,
(Select sum(WgtSold) where store = '000001' ) as lam,
(Select sum(WgtSold) where store = '000002') as Riv
FROM [LAHAMARK].[dbo].[itemSale]
LEFT OUTER Join [BRdata].[dbo].[Item] on [LAHAMARK].[dbo].[itemSale].UPC15 = [BRdata].[dbo].[Item].[UPC]
LEFT OUTER join [BRdata].[dbo].[ItemMovement] on [LAHAMARK].[dbo].[itemSale].UPC15 = [BRdata].[dbo].[ItemMovement].[UPC]
where Date between '2021-01-03' and '2021-01-03' and idPurch = '1018'
group by IdPurch,Store
You seem to want conditional aggregation:
select IdPurch,
       sum(case when store = '000001' then WgtSold end) as lam,
       sum(case when store = '000002' then WgtSold end) as Riv
from [LAHAMARK].[dbo].[itemSale] its
join [BRdata].[dbo].[Item] i
    on its.UPC15 = i.[UPC]
join [BRdata].[dbo].[ItemMovement] im
    on its.UPC15 = im.[UPC]
where Date between '2021-01-03' and '2021-01-03'
    and idPurch = 1018
group by IdPurch;
Notes:
Table aliases make the query much easier to write and to read.
The week calculation is superfluous for only one week and has nothing really to do with the gist of the question, so I removed it.
There is no need to aggregate by store. You don't want a separate row for each store.
I removed the single quotes around 1018. It looks like a number and ids typically are numbers. If it is a string, definitely include the single quotes, though.

Clean up 'duplicate' data while preserving most recent entry

I want to display each crew member, their basic info, and the most recent start date from their contracts. My basic query returns a row for each contract, duplicating the basic info with distinct start and end dates.
I only need one row per person, with the latest start date (or null if they have never yet had a start date).
I have a limited understanding of group by and partition functions. Queries I have reverse engineered for similar data use partition and create temp tables that they then select from. Ultimately I could reuse that, but it seems more convoluted than what we need.
select
Case when P01.EMPLOYMENTENDDATE < getdate() then 'Y'
else ''
end as "Deactivate",
concat(p01.FIRSTNAME,' ',p01.MIDDLENAME) as "First and Middle",
p01.LASTNAME,
p01.PIN,
(select top 1 TELENO FROM PW001P0T WHERE PIN = P01.PIN and TELETYPE = 6 ORDER BY TELEPRIORITY) as "EmailAddress",
org.NAME AS Vessel,
case
WHEN c02.CODECATEGORY= '20' then 'MARINE'
WHEN c02.CODECATEGORY= '10' then 'MARINE'
ELSE 'HOTEL' end as "Department",
c02.name as RankName,
c02.Alternative RankCode,
convert(varchar, ACT.DATEFROM,101) EmbarkDate,
convert(varchar,(case when ACT.DATEFROM is null then p03.TODATEESTIMATED else ACT.DATEFROM end),101) DebarkDate
FROM PW001P01 p01
JOIN PW001P03 p03
ON p03.PIN = p01.PIN
LEFT JOIN PW001C02 c02
ON c02.CODE = p03.RANK
/*LEFT JOIN PW001C02 CCIRankTbl
ON CCIRankTbl.CODE = p01.RANK*/
LEFT JOIN PWORG org
ON org.NUMORGID = dbo.ad_scanorgtree(p03.NUMORGID, 3)
LEFT JOIN PWORGVESACT ACT
ON ACT.numorgid=dbo.ad_scanorgtree(p03.numorgid,3)
where P01.EMPLOYMENTENDDATE > getdate()-10 or P01.EMPLOYMENTENDDATE is null
I only need to show one row per person. The first 5 columns will always be the same. The last columns depend on the contract, and we just need data from the most recent one.
Deactivate | First and Middle | Lastname | PIN   | Email           | Vessel | Department | Rank        | RankCode | Embark    | Debark
           | Martin           | Smith    | 123   | msmith#fake.com | Ship1  | Marine     | ViceCaptain | VICE     | 9/1/2008  | 9/20/2008
           | Matin            | Smith    | 123   | msmith#fake.com | Ship2  | Marine     | Captain     | CAP      | 12/1/2008 | 12/20/2008
           | Steve Mark       | Dude     | 98765 | sdude#fake.com  | Ship1  | Hotel      | Chef        | CHEF     | 5/1/2009  | 8/1/2009
           | Steve Mark       | Dude     | 98765 | sdude#fake.com  | Ship3  | Hotel      | Chef        | CHEF     | 10/1/2010 | 12/20/2010
Change your query to a SELECT DISTINCT on the main query and use a sub-select for DebarkDate column:
(SELECT TOP 1 A.DATEFROM FROM PWORGVESACT A WHERE A.numorgid = ACT.numorgid ORDER BY A.DATEFROM DESC) AS DebarkDate
You can do whatever conversions on the date you need to from the result of that sub-query.
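Applied to the posted query, the shape would be roughly this (a trimmed sketch keeping only a few of the original columns, not a tested query):
SELECT DISTINCT
    p01.LASTNAME,
    p01.PIN,
    org.NAME AS Vessel,
    (SELECT TOP 1 A.DATEFROM
     FROM PWORGVESACT A
     WHERE A.numorgid = ACT.numorgid
     ORDER BY A.DATEFROM DESC) AS DebarkDate   -- most recent date per organisation
FROM PW001P01 p01
JOIN PW001P03 p03 ON p03.PIN = p01.PIN
LEFT JOIN PWORG org ON org.NUMORGID = dbo.ad_scanorgtree(p03.NUMORGID, 3)
LEFT JOIN PWORGVESACT ACT ON ACT.numorgid = dbo.ad_scanorgtree(p03.numorgid, 3)
WHERE p01.EMPLOYMENTENDDATE > getdate() - 10 OR p01.EMPLOYMENTENDDATE IS NULL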

Count query giving wrong column name error

select COUNT(analysed) from Results where analysed="True"
I want to display the count of rows in which the analysed value is true.
However, my query gives the error: "The multi-part identifier "Results.runId" could not be bound.".
This is the actual query:
select ((SELECT COUNT(*) AS 'Count'
FROM Results
WHERE Analysed = 'True')/failCount) as PercentAnalysed
from Runs
where Runs.runId=Analysed.runId
My table schema is:
The value I want for a particular runId is: (the number of entries where analysed=true)/failCount
EDIT : How to merge these two queries?
i) select runId,Runs.prodId,prodDate,prodName,buildNumber,totalCount as TotalTestCases,(passCount*100)/(passCount+failCount) as PassPercent,
passCount,failCount,runOwner from Runs,Product where Runs.prodId=Product.prodId
ii) select (cast(counts.Count as decimal(10,4)) / cast(failCount as decimal(10,4))) as PercentAnalysed
from Runs
inner join
(
SELECT COUNT(*) AS 'Count', runId
FROM Results
WHERE Analysed = 'True'
GROUP BY runId
) counts
on counts.runId = Runs.runId
I tried this :
select runId,Runs.prodId,prodDate,prodName,buildNumber,totalCount as TotalTestCases,(passCount*100)/(passCount+failCount) as PassPercent,
passCount,failCount,runOwner,counts.runId,(cast(counts.Count as decimal(10,4)) / cast(failCount as decimal(10,4))) as PercentAnalysed
from Runs,Product
inner join
(
SELECT COUNT(*) AS 'Count', runId
FROM Results
WHERE Analysed = 'True'
GROUP BY runId
) counts
on counts.runId = Runs.runId
where Runs.prodId=Product.prodId
but it gives an error.
Your problems are arising from improper joining of tables. You need information from both Runs and Results, but they aren't combined properly in your query. You have the right idea with a nested subquery, but it's in the wrong spot. You're also referencing the Analysed table in the outer where clause, but it hasn't been included in the from clause.
Try this instead:
select (cast(counts.Count as decimal(10,4)) / cast(failCount as decimal(10,4))) as PercentAnalysed
from Runs
inner join
(
SELECT COUNT(*) AS 'Count', runId
FROM Results
WHERE Analysed = 'True'
GROUP BY runId
) counts
on counts.runId = Runs.runId
I've set this up as an inner join to eliminate any runs which don't have analysed results; you can change it to a left join if you want those rows, but will need to add code to handle the null case. I've also added casts to the two numbers, because otherwise the query will perform integer division and truncate any fractional amounts.
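If you do want those runs included, the left-join variant could look like this (a sketch on the same tables, with ISNULL covering the missing-count case):
select (cast(isnull(counts.Count, 0) as decimal(10,4)) / cast(failCount as decimal(10,4))) as PercentAnalysed
from Runs
left join
(
    SELECT COUNT(*) AS 'Count', runId
    FROM Results
    WHERE Analysed = 'True'
    GROUP BY runId
) counts
on counts.runId = Runs.runId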
I'd try the following query:
SELECT COUNT(*) AS 'Count'
FROM Results
WHERE Analysed = 'True'
This will count all of your rows where Analysed is 'True'. It should work whether the datatype of your Analysed column is BIT (Boolean) or a string type (VARCHAR, NVARCHAR).
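If Analysed is in fact a BIT column, the same count can also be written against 1 (just a variation on the query above; for BIT, 1 is true and 0 is false):
SELECT COUNT(*) AS 'Count'
FROM Results
WHERE Analysed = 1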
Use CASE in Count
SELECT COUNT(CASE WHEN analysed='True' THEN analysed END) [COUNT]
FROM Results
select COUNT(*) from Results where analysed = 'True'

SQL percentage of the total

Hi, how can I get the percentage of each record over the total?
Let's imagine I have a table with the following:
ID code Points
1 101 2
2 201 3
3 233 4
4 123 1
The percentage for ID 1 is 20%, for ID 2 it is 30%, and so on.
How do I get it?
There are a couple of approaches to getting that result.
You essentially need the "total" points from the whole table (or whatever subset), repeated on each row. Getting the percentage is then a simple matter of arithmetic; the expression you use depends on the datatypes and on how you want the result formatted.
Here's one way (out of a couple of possible ways) to get the specified result:
SELECT t.id
, t.code
, t.points
-- , s.tot_points
, ROUND(t.points * 100.0 / s.tot_points,1) AS percentage
FROM onetable t
CROSS
JOIN ( SELECT SUM(r.points) AS tot_points
FROM onetable r
) s
ORDER BY t.id
The view query s is run first; that gives a single row. The join operation matches that row with every row from t, and that gives us the values we need to calculate a percentage.
Another way to get this result, without using a join operation, is to use a subquery in the SELECT list to return the total.
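That subquery-in-the-SELECT-list version might look like this (same assumed table as above):
SELECT t.id
, t.code
, t.points
-- scalar subquery returns the same grand total for every row
, ROUND(t.points * 100.0 / (SELECT SUM(r.points) FROM onetable r),1) AS percentage
FROM onetable t
ORDER BY t.id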
Note that the join approach can be extended to get percentage for each "group" of records.
id type points %type
-- ---- ------ -----
1 sold 11 22%
2 sold 4 8%
3 sold 25 50%
4 bought 1 50%
5 bought 1 50%
6 sold 10 20%
To get that result, we can use the same query, but with a view query for s that returns a total per type (GROUP BY r.type); then the join operation isn't a CROSS join, but a match based on type:
SELECT t.id
, t.type
, t.points
-- , s.tot_points_by_type
, ROUND(t.points * 100.0 / s.tot_points_by_type,1) AS `%type`
FROM onetable t
JOIN ( SELECT r.type
, SUM(r.points) AS tot_points_by_type
FROM onetable r
GROUP BY r.type
) s
ON s.type = t.type
ORDER BY t.id
To do that same result with the subquery, that's going to be a correlated subquery, and that subquery is likely to get executed for every row in t.
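For completeness, that correlated form might look like the following (same assumed table; the subquery references t.type from the outer row, so it is evaluated per row):
SELECT t.id
, t.type
, t.points
-- correlated subquery: total points for this row's type
, ROUND(t.points * 100.0 / (SELECT SUM(r.points) FROM onetable r WHERE r.type = t.type),1) AS pct_of_type
FROM onetable t
ORDER BY t.id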
This is why it's more natural for me to use a join operation, rather than a subquery in the SELECT list... even when a subquery works the same. (The patterns we use for more complex queries, like assigning aliases to tables, qualifying all column references, and formatting the SQL... those patterns just work their way back into simple queries. The rationale for these patterns is kind of lost in simple queries.)
Try it like this:
select id, code, points, (points * 100) / (select sum(points) from table1) from table1
To add to a good list of responses, this should be fast performance-wise, and rather easy to understand:
DECLARE @T TABLE (ID INT, code VARCHAR(256), Points INT)
INSERT INTO @T VALUES (1,'101',2), (2,'201',3), (3,'233',4), (4,'123',1)
;WITH CTE AS
(SELECT * FROM @T)
SELECT C.*, CAST(ROUND((C.Points/B.TOTAL)*100, 2) AS DEC(32,2)) [%_of_TOTAL]
FROM CTE C
JOIN (SELECT CAST(SUM(Points) AS DEC(32,2)) TOTAL FROM CTE) B ON 1=1
Just replace the table variable with your actual table inside the CTE.
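For instance, if the real table were the onetable used in the earlier answers (purely an illustration), only the CTE line changes:
;WITH CTE AS
(SELECT * FROM onetable)
SELECT C.*, CAST(ROUND((C.Points/B.TOTAL)*100, 2) AS DEC(32,2)) [%_of_TOTAL]
FROM CTE C
JOIN (SELECT CAST(SUM(Points) AS DEC(32,2)) TOTAL FROM CTE) B ON 1=1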