Group by T-SQL vs. MySQL (single column) - sql

I am new to SQL Server/used to MySQL databases and I am running into an issue that I never ran into with MySQL. I am looking to pull all current policy numbers, the name of the company/person it belongs to, their total premium, and whether or not they have what we call 'equipment breakdown' coverage. This is all pretty simple, the issue I am having is with grouping. I want to group by one column only, aka one distinct policy number, the company name, a sum of the premium (it is possible to have several premium amounts both negative and positive so I want to sum these to see what the true total is), and a simple Yes or No column for equipment breakdown.
Here is the query I am running:
SELECT pol_num as policy_number,
insd_name as insureds_name,
SUM(amt) as 'total_premium',
(SELECT
CASE
WHEN cvg_desc = 'Equipment Breakdown'
THEN 'Y'
ELSE 'N'
END) as 'equipment_breakdown'
FROM bapu.dbo.fact_prem
WHERE '2014-05-06' between d_pol_eff and d_pol_exp
AND amt_type = 'Premium'
AND amt_desc = 'Written Premium'
GROUP BY pol_num
ORDER BY policy_number
I get an error saying that I need to group by insd_name and cvg_desc as well, but I DON'T want that as it gives me duplicate policy numbers.
Here is an example of what I get when I group everything it tells me to:
policy_number insureds_name total_premium equipment_breakdown
001 company a 0.00 n
001 company a 25,000.00 n
001 company a -10,000.00 n
002 company b 100.00 y
002 company b 10,000.00 y
Here is an example of the results I want:
policy_number insureds_name total_premium equipment_breakdown
001 company a 15,000.00 n
002 company b 10,100.00 y
Basically, I just want to group by the policy number and sum the premium amounts. Above is how I would achieve this in MySQL, how can I achieve the results I am looking for in SQL Server?
Thanks

MySQL doesn't require all non-aggregate fields to be included in the GROUP BY clause, even though omitting them can yield unexpected results. SQL Server does require it, so you are forced to decide how to handle multiple insd_name values for a given pol_num: you can use MAX() or MIN(), or, if the values are always the same, just add them to your GROUP BY:
SELECT pol_num AS policy_number
, MAX(insd_name) AS insureds_name
, SUM(amt) AS 'total_premium'
, MAX(CASE WHEN cvg_desc = 'Equipment Breakdown' THEN 'Y'
ELSE 'N'
END) AS 'equipment_breakdown'
FROM bapu.dbo.fact_prem
WHERE '2014-05-06' BETWEEN d_pol_eff AND d_pol_exp
AND amt_type = 'Premium'
AND amt_desc = 'Written Premium'
GROUP BY pol_num
ORDER BY policy_number
Or:
SELECT pol_num AS policy_number
, insd_name AS insureds_name
, SUM(amt) AS 'total_premium'
, CASE WHEN cvg_desc = 'Equipment Breakdown' THEN 'Y'
ELSE 'N'
END AS 'equipment_breakdown'
FROM bapu.dbo.fact_prem
WHERE '2014-05-06' BETWEEN d_pol_eff AND d_pol_exp
AND amt_type = 'Premium'
AND amt_desc = 'Written Premium'
GROUP BY pol_num
, insd_name
, CASE WHEN cvg_desc = 'Equipment Breakdown' THEN 'Y'
ELSE 'N'
END
ORDER BY policy_number

It looks like the cvg_desc column is probably what's messing you up. You want to group by the resulting Y or N from your CASE statement, but SQL server is grouping by the original cvg_desc column. You could approach this in a way that resolves the CASE statement before it groups. For example, wrap the main query in a common table expression (CTE), which is sort of like an inline-view. Then with the equipment breakdown column reduced to just a Y or an N, a subsequent query from the CTE with your SUM aggregation on premium should give you the results you desire:
WITH Policies(policy_number, insureds_name, premium, equipment_breakdown) AS
(
SELECT
pol_num
,insd_name
,amt
,(CASE WHEN cvg_desc = 'Equipment Breakdown' THEN 'Y' ELSE 'N' END)
AS 'equipment_breakdown'
FROM
bapu.dbo.fact_prem
WHERE
'2014-05-06' BETWEEN d_pol_eff AND d_pol_exp
AND
amt_type = 'Premium'
AND
amt_desc = 'Written Premium'
)
SELECT
policy_number
,insureds_name
,SUM(premium) AS total_premium
,equipment_breakdown
FROM
Policies
GROUP BY
policy_number
,insureds_name
,equipment_breakdown

You'll need an aggregate function on the fields you don't want to group by. A simple one to use is MAX, which works with most types:
SELECT pol_num as policy_number,
MAX(insd_name) as insureds_name,
SUM(amt) as 'total_premium',
MAX(CASE
WHEN cvg_desc = 'Equipment Breakdown'
THEN 'Y'
ELSE 'N'
END) as 'equipment_breakdown'
FROM bapu.dbo.fact_prem
WHERE '2014-05-06' between d_pol_eff and d_pol_exp
AND amt_type = 'Premium'
AND amt_desc = 'Written Premium'
GROUP BY pol_num
ORDER BY policy_number
The reason SQL Server wants this is that it likes to give deterministic answers. For example:
column_a | column_b
1 | 1
1 | 2
...grouped by only column_a would in MySQL give either 1 or 2 as an answer for column_b, while SQL Server wants you to tell it explicitly which one to use.
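To see the MAX(CASE ...) idea collapse each policy to a single row, here is a minimal sketch. It runs the SQL through Python's built-in sqlite3 module rather than SQL Server, and the sample rows (including the 'Property' and 'Liability' coverage values) are made up for illustration. The point it shows is that the aggregate wraps the Y/N result of the CASE, not the raw cvg_desc column, so a policy is flagged 'Y' if any of its rows has the coverage.

```python
import sqlite3

# Hypothetical sample rows modelled on the question's example output.
rows = [
    ("001", "company a", 0.00, "Property"),
    ("001", "company a", 25000.00, "Property"),
    ("001", "company a", -10000.00, "Liability"),
    ("002", "company b", 100.00, "Equipment Breakdown"),
    ("002", "company b", 10000.00, "Property"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_prem (pol_num TEXT, insd_name TEXT, amt REAL, cvg_desc TEXT)"
)
conn.executemany("INSERT INTO fact_prem VALUES (?, ?, ?, ?)", rows)

# One row per policy: every non-grouped column sits inside an aggregate.
query = """
SELECT pol_num AS policy_number,
       MAX(insd_name) AS insureds_name,
       SUM(amt) AS total_premium,
       MAX(CASE WHEN cvg_desc = 'Equipment Breakdown' THEN 'Y' ELSE 'N' END)
           AS equipment_breakdown
FROM fact_prem
GROUP BY pol_num
ORDER BY policy_number
"""
policy_totals = conn.execute(query).fetchall()
for row in policy_totals:
    print(row)
```

This works because 'Y' sorts after 'N', so MAX returns 'Y' as soon as one row in the group matches.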

I would probably write this as below -- did not test
SELECT pol_num as policy_number,
insd_name as insureds_name,
SUM(amt) as total_premium,
CASE
WHEN cvg_desc = 'Equipment Breakdown'
THEN 'Y'
ELSE 'N'
END as equipment_breakdown
FROM bapu.dbo.fact_prem
WHERE '2014-05-06' between d_pol_eff and d_pol_exp
AND amt_type = 'Premium'
AND amt_desc = 'Written Premium'
GROUP BY
pol_num, insd_name,
CASE
WHEN cvg_desc = 'Equipment Breakdown'
THEN 'Y'
ELSE 'N'
END
ORDER BY policy_number

Related

Is it possible to only bring in values into a column if there is no data in another column?

I work for a transportation company. We have two types of drivers, our own company drivers and external Carriers. Carriers are a business to business transaction and what we pay them for each move is easily entered into our system as a "Payable." The payable is easily extracted.
For our company drivers, we have to go through a process called "Settlements" which is an extra module for our system. Unfortunately, this extra module does not have a relationship with all other tables, it is segregated and hard to capture data. When we run "Settlements" each payable to the company driver is separated by a rate code.
What I am trying to do is the following: We have a ratecode named "DDR" , short for Driver Deadrun. When we have a driver deadrun, we pay a fixed price of $40 to our company drivers. However, when there is a driver deadrun on the "Carrier" side, then that amount is not fixed. The amount will be whatever the carrier bills us, we enter this invoiced amount as a payable to the carrier.
I have embedded the cost of a driver deadrun (DDR) in the query using a case statement. However, if there is a payable it is bringing in the embedded $40 plus the payable. This is providing me with inaccurate results.
Here is a sample of my query and a screenshot of query results.
SELECT ds_id AS TMP,
ds_ship_date AS ShipDate,
ds_ref1_text AS ContainerNumber,
(CASE ds_status WHEN 'A' THEN 'TEMPLATE'
WHEN 'C' THEN 'CANCELLED'
WHEN 'D' THEN 'DECLINED'
WHEN 'E' THEN 'QUOTED'
WHEN 'F' THEN 'OFFERED'
WHEN 'H' THEN 'PENDING'
WHEN 'K' THEN 'OPEN'
WHEN 'N' THEN 'AUTHORIZED'
WHEN 'Q' THEN 'AUDIT REQUIRED'
WHEN 'T' THEN 'AUDITED'
WHEN 'W' THEN 'BILLED'
END) AS 'TMPStatus',
b.co_name as "BillTo",
o.co_name AS Origin,
o.co_city AS OriginCity,
o.co_state AS OriginState,
de_arrdate AS DeliveryDate,
de_arrtime AS ArrivalTime,
de_deptime AS DepartureTime,
dba.disp_items.di_qty AS QTY,
dba.disp_items.ratecodename AS RateCode,
dba.disp_items.di_our_rate AS OURRATE,
dba.disp_items.di_our_itemamt AS ITEMAMT,
dba.disp_items.amounttype AS AMTTYPE,
dba.disp_items.di_pay_itemamt AS PAYITEMAMT,
CASE dba.disp_items.ratecodename
WHEN 'SHUNTING' then '10.00'
WHEN 'LIFT OFF' THEN '10.00'
WHEN 'LIFT ON' THEN '10.00'
WHEN 'LIFT' THEN '10.00'
WHEN 'DDR' THEN '40.00'
WHEN 'DEADRUN' THEN '40.00'
ELSE (
select SUM (amount)
from dba.amountowed
where string ( ds_id ) = amountowed.shipment
and dba.amountowed.startdate between '20220101'
and today()
) end case AS Total_Payable,
(CASE ds_ship_type
WHEN '2201' THEN 'TRAS&D.V.'
WHEN '2202' THEN 'TRAS&D.V.'
WHEN '2203' THEN 'SOLUTIONS'
WHEN '2204' THEN 'OLD BROKERAGE'
WHEN '2205' THEN 'LIFTING'
WHEN '2206' THEN 'WAREHOUSE'
END) AS Division
FROM dba.disp_ship
JOIN dba.disp_events ON de_shipment_id = ds_id
JOIN dba.disp_items ON dba.disp_items.di_shipment_id = dba.disp_ship.ds_id
JOIN dba.companies o ON o.co_id = ds_origin_id
JOIN dba.companies b on b.co_id = ds_billto_id
WHERE de_site = ds_findest_id
AND de_event_type IN ('D','R','N')
and DeliveryDate between '20220101' and today()
GROUP BY TMP, SHIPDATE, CONTAINERNUMBER, TMPSTATUS,
BILLTO, ORIGIN, ORIGINCITY, ORIGINSTATE, DELIVERYDATE,
ARRIVALTIME, DEPARTURETIME, QTY, RATECODE, OURRATE, ITEMAMT,
AMTTYPE, PAYITEMAMT, TOTAL_PAYABLE, DIVISION
ORDER BY TOTAL_PAYABLE DESC
What I am basically looking for is this: if the column "PAYITEMAMT" has a value of "0.00" or null, then the column "Total_Payable" should be 40.00.
I do not want values in both columns as this would provide inaccurate data.
Yes, you can just use a CASE statement:
CASE
WHEN dba.disp_items.di_pay_itemamt IS NULL THEN 40
WHEN dba.disp_items.di_pay_itemamt = 0 THEN 40
ELSE dba.disp_items.di_pay_itemamt
END AS PAYITEMAMT
If it can never be 0 (i.e. is populated or will be NULL) then you could just do
COALESCE(dba.disp_items.di_pay_itemamt,40) AS PAYITEMAMT
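If both NULL and 0 should fall back to 40, the two checks can also be folded into one expression with NULLIF inside COALESCE. A small sketch follows, run against SQLite through Python's sqlite3 module with a cut-down, hypothetical version of the disp_items table. I have not tried it on the asker's database, so treat it as an illustration of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column stand-in for dba.disp_items.
conn.execute("CREATE TABLE disp_items (di_shipment_id INTEGER, di_pay_itemamt REAL)")
conn.executemany(
    "INSERT INTO disp_items VALUES (?, ?)",
    [(1, None), (2, 0.0), (3, 125.5)],
)

# NULLIF turns 0 into NULL, then COALESCE replaces any NULL with 40.
query = """
SELECT di_shipment_id,
       COALESCE(NULLIF(di_pay_itemamt, 0), 40) AS total_payable
FROM disp_items
ORDER BY di_shipment_id
"""
payables = conn.execute(query).fetchall()
for row in payables:
    print(row)
```

Shipments 1 and 2 come back with 40 and shipment 3 keeps its own payable amount.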
I don't know if I have understood your question correctly, but why not just use a CASE?
case when isnull(PAYITEMAMT, 0) = 0 then 40
else
case dba.disp_items.ratecodename WHEN 'SHUNTING' then '10.00'
WHEN 'LIFT OFF' THEN '10.00'
WHEN 'LIFT ON' THEN '10.00'
WHEN 'LIFT' THEN '10.00'
WHEN 'DDR' THEN '40.00'
WHEN 'DEADRUN' THEN '40.00'
else
(select SUM (amount) from dba.amountowed where string ( ds_id ) = amountowed.shipment
and dba.amountowed.startdate between '20220101' and today())
end
end AS Total_Payable

SQL query to find Date, TransactionAmount_US, TransactionAmount_UK

We have 2 tables:
Transaction (AccountId,Date,TransactionAmount)
Master(Aid,Country)
We need to find Date, TotalTAmt_US, Total_TAmt_UK
My solution:
Select
Date,
CASE WHEN Country in ('US') THEN SUM(TransAmt) ELSE '0'END AS TotalTAmt_US,
CASE WHEN Country in ('UK') THEN SUM(TransAmt) ELSE '0'END AS TotalTAmt_UK
FROM
(
SELECT
T.Date As Date,
M.Country As Country,
SUM(T.TransAmt) As TransAmt,
FROM
Transaction T JOIN Master M On T.Aid = M.Aid
WHERE Country in ('US','UK')
group by Date,Country
) As T1
group by Date;
Is this right?
Can we use Country in CASE WHEN without pulling it as I do not want to pull it and then group by it.
Advice please.
Thanks.
You have to list in the GROUP BY clause every non-aggregated column that you use in the SELECT list.
So just add your CASE expressions to the grouping.
Select
Date,
CASE WHEN Country in ('US') THEN TransAmt ELSE 0 END AS TotalTAmt_US,
CASE WHEN Country in ('UK') THEN TransAmt ELSE 0 END AS TotalTAmt_UK
FROM
(
SELECT
T.Date As Date,
M.Country As Country,
SUM(T.TransAmt) As TransAmt
FROM
Transaction T JOIN Master M On T.Aid = M.Aid
WHERE Country in ('US','UK')
GROUP BY Date,Country
) As T1
GROUP BY Date, CASE WHEN Country in ('US') THEN TransAmt ELSE 0 END,
CASE WHEN Country in ('UK') THEN TransAmt ELSE 0 END
Additionally, remove the SUM() function from your CASE expressions. If you put them in the GROUP BY you can get this error:
ORA-00934: Group function is not allowed here.
And finally, remove the quotes around the zeros in the ELSE branches. Otherwise you can get another error about inconsistent data types.
I believe you wanted to do a conditional aggregation, getting the sum of all US-based transactions for a day and the sum of all UK-based transactions for the same day. For that you have to move the CASE into the sum(), adding the amount only if the country is the one you are looking for and zero otherwise.
Also your subquery isn't necessary.
SELECT t.date,
sum(CASE m.country
WHEN 'US' THEN
t.transamt
ELSE
0
END) totaltamt_us,
sum(CASE m.country
WHEN 'UK' THEN
t.transamt
ELSE
0
END) totaltamt_uk
FROM transaction t
INNER JOIN master m
ON m.aid = t.accountid
WHERE m.country IN ('US',
'UK')
GROUP BY t.date;
If you don't insist on having the different sums in different columns, but would also accept rows, it can be as simple as:
SELECT m.country,
t.date,
sum(t.transamt) totaltamt
FROM transaction t
INNER JOIN master m
ON m.aid = t.accountid
WHERE m.country IN ('US',
'UK')
GROUP BY m.country,
t.date;
I think you want conditional aggregation:
SELECT Date,
SUM(CASE WHEN Country in ('US') THEN TransAmt ELSE 0 END) AS TotalTAmt_US,
SUM(CASE WHEN Country in ('UK') THEN TransAmt ELSE 0 END) AS TotalTAmt_UK
FROM Transaction T JOIN
Master M
On T.Aid = M.Aid
WHERE Country in ('US', 'UK')
GROUP BY Date;
Notes:
You do not need two levels of aggregation.
Numbers should not be enclosed in single quotes, so 0, not '0'.
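For anyone who wants to try the conditional aggregation on some data, here is a rough sketch using Python's sqlite3 module. The table is called transactions here (a name I made up, because TRANSACTION is a keyword), the column names follow the question's table definitions, and the rows are invented. It is meant as an illustration of the single-level SUM(CASE ...) pattern, not as the exact query for any particular database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master (aid INTEGER, country TEXT)")
# "transactions" is a hypothetical name for the question's Transaction table.
conn.execute("CREATE TABLE transactions (accountid INTEGER, date TEXT, transamt REAL)")
conn.executemany("INSERT INTO master VALUES (?, ?)", [(1, "US"), (2, "UK"), (3, "FR")])
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 100.0),
        (2, "2024-01-01", 50.0),
        (1, "2024-01-02", 25.0),
        (3, "2024-01-02", 999.0),  # filtered out by the WHERE clause
    ],
)

# One pass over the join: each SUM only adds rows for its own country.
query = """
SELECT t.date,
       SUM(CASE WHEN m.country = 'US' THEN t.transamt ELSE 0 END) AS totaltamt_us,
       SUM(CASE WHEN m.country = 'UK' THEN t.transamt ELSE 0 END) AS totaltamt_uk
FROM transactions t
JOIN master m ON m.aid = t.accountid
WHERE m.country IN ('US', 'UK')
GROUP BY t.date
ORDER BY t.date
"""
daily_totals = conn.execute(query).fetchall()
for row in daily_totals:
    print(row)
```

Each date appears once, with the US and UK sums side by side and 0 where a country had no transactions that day.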

SQL - how to SUM two Count Statements to create a Total Column

I have the following query and for the life of me have forgotten how to SUM both columns to create a 'Total' column
SELECT username as 'username',
count (case when casestype <> 'car'
OR casestype <> 'van'
OR casestype <> 'bike'
OR casestype <> 'NONE'
THEN 1 ELSE NULL END) as 'non-auto',
count (case when casestype = 'car'
OR casestype= 'van'
OR casestype = 'bike'
THEN 1 ELSE NULL END) as 'auto'
FROM Case WITH (NOLOCK)
WHERE CaseDate BETWEEN '01 may 2016' AND '31 may 2016'
GROUP BY username
I want to have a total column of non-auto + auto
SELECT username,
sum(case when casestype not in ('car', 'van', 'bike', 'NONE')
then 1 else 0
end) as non_auto,
sum(case when casestype in ('car', 'van', 'bike') then 1 else 0
end) as auto,
sum(case when casestype <> 'NONE' then 1 else 0 end) as total
FROM [Case]
WHERE CaseDate BETWEEN '2016-05-01' and '2016-05-31'
GROUP BY username;
Additional advice:
in and not in are much more readable than a series of or statements.
Use ISO standard date formats. YYYY-MM-DD is my preference, although SQL Server has a slight preference for YYYYMMDD.
The total needs to be calculated separately. If you want to use column aliases you need subqueries or CTEs.
Don't use single quotes for column aliases. Only use them for string and date constants.
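A quick way to check the three counts, and to see why the original chain of `<>` conditions joined with OR does not work, is to run them side by side. The sketch below uses Python's sqlite3 module with a made-up cases table (a hypothetical name, since CASE is a reserved word) and invented rows. The or_chain column reproduces the question's condition, which is true for every non-NULL value because no value can equal all four strings at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "cases" is a hypothetical table name standing in for [Case].
conn.execute("CREATE TABLE cases (username TEXT, casestype TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?)",
    [
        ("alice", "car"), ("alice", "van"), ("alice", "boat"), ("alice", "NONE"),
        ("bob", "bike"), ("bob", "house"), ("bob", "house"),
    ],
)

query = """
SELECT username,
       SUM(CASE WHEN casestype <> 'car' OR casestype <> 'van'
                  OR casestype <> 'bike' OR casestype <> 'NONE'
                THEN 1 ELSE 0 END) AS or_chain,
       SUM(CASE WHEN casestype NOT IN ('car', 'van', 'bike', 'NONE')
                THEN 1 ELSE 0 END) AS non_auto,
       SUM(CASE WHEN casestype IN ('car', 'van', 'bike')
                THEN 1 ELSE 0 END) AS auto,
       SUM(CASE WHEN casestype <> 'NONE' THEN 1 ELSE 0 END) AS total
FROM cases
GROUP BY username
ORDER BY username
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

With this data, or_chain simply counts every row, while total equals non_auto + auto for each user.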

Merging data SQL Query

I have a query request where I have to show one customer activity for each web-site but it has to be only one row each, instead of one customer showing multiple times for each activity.
Following is the query I tried but brings lot more rows. please help me as how I can avoid duplicates and show only one customer by each row for each activity.
SELECT i.customer_id, i.SEGMENT AS Pistachio_segment,
(CASE when S.SUBSCRIPTION_TYPE = '5' then 'Y' else 'N' end ) PB_SUBS,
(CASE WHEN S.SUBSCRIPTION_TYPE ='12' THEN 'Y' ELSE 'N' END) Daily_test,
(CASE when S.SUBSCRIPTION_TYPE ='8' then 'Y' else 'N' end) COOK_4_2
FROM IDEN_WITH_MAIL_ID i JOIN CUSTOMER_SUBSCRIPTION_FCT S
ON I.IDENTITY_ID = S.IDENTITY_ID and I.CUSTOMER_ID = S.CUSTOMER_ID
WHERE s.site_code ='PB' and s.subscription_end_date is null
Sounds like you need to group by customer_id and perform aggregations for the other columns you are selecting. For example:
sum(case when s.subscription_type = '5' then 1 else 0 end) as pb_subs_count
You could try one of two things:
Use a GROUP BY statement to combine all records with the same id, e.g.,
...
WHERE s.site_code ='PB' and s.subscription_end_date is null
GROUP BY i.customer_id
Use the DISTINCT command in your SELECT, e.g.,
SELECT DISTINCT i.customer_id, i.SEGMENT, ...
You could use an aggregation (SUM) on customer_id, but what do you expect to happen to the other fields? For example, if you have SUBSCRIPTION_TYPE 5 and 13 for the same customer (2 rows), which value do you want?
Perhaps you are looking for something like this:
SELECT i.customer_id, i.SEGMENT AS Pistachio_segment,
MAX(CASE when S.SUBSCRIPTION_TYPE = '5' then 'Y' else 'N' end ) PB_SUBS,
MAX(CASE WHEN S.SUBSCRIPTION_TYPE ='12' THEN 'Y' ELSE 'N' END) Daily_test,
MAX(CASE when S.SUBSCRIPTION_TYPE ='8' then 'Y' else 'N' end) COOK_4_2
FROM IDEN_WITH_MAIL_ID i JOIN CUSTOMER_SUBSCRIPTION_FCT S
ON I.IDENTITY_ID = S.IDENTITY_ID and I.CUSTOMER_ID = S.CUSTOMER_ID
WHERE s.site_code ='PB' and s.subscription_end_date is null
GROUP BY i.customer_id, i.SEGMENT
I can't be sure, though, without knowing more about the tables involved.
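As a rough illustration of how the per-subscription flags end up on one row per customer, here is a sketch run through Python's sqlite3 module. It uses a single simplified table, customer_subscription, which is a hypothetical stand-in for the joined tables in the question, with invented rows. It also includes a count column in the style of the sum(case ...) suggestion, to show the two forms next to each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table; the real query joins two tables.
conn.execute(
    "CREATE TABLE customer_subscription (customer_id INTEGER, subscription_type TEXT)"
)
conn.executemany(
    "INSERT INTO customer_subscription VALUES (?, ?)",
    [(1, "5"), (1, "12"), (2, "8"), (2, "8")],
)

# MAX picks 'Y' if any row in the group matched; SUM counts the matches.
query = """
SELECT customer_id,
       MAX(CASE WHEN subscription_type = '5'  THEN 'Y' ELSE 'N' END) AS pb_subs,
       MAX(CASE WHEN subscription_type = '12' THEN 'Y' ELSE 'N' END) AS daily_test,
       MAX(CASE WHEN subscription_type = '8'  THEN 'Y' ELSE 'N' END) AS cook_4_2,
       SUM(CASE WHEN subscription_type = '8'  THEN 1 ELSE 0 END) AS cook_4_2_count
FROM customer_subscription
GROUP BY customer_id
ORDER BY customer_id
"""
flags = conn.execute(query).fetchall()
for row in flags:
    print(row)
```

Customer 2 has two rows of type '8' but still appears only once, with the count column showing 2.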

Using case to create multiple columns of data

I am trying to create a query in MS SQL 2005 that will return data for 4 date ranges as separate columns in my results set.
Right now my query looks like the query below. It works fine, but it currently supports only one date range, and I want to add a column for each additional date range.
The query would then return total1, total2, total3 and total4 columns instead of the single total column it returns now, one total for each of the 4 date ranges.
I am fairly sure this can be accomplished using case statements, but am not 100% sure.
Any help would be certainly appreciated.
SELECT
vendor,location,
sum(ExtPrice) as total
FROM [database].[dbo].[saledata]
where processdate between '2010-11-03' and '2010-12-14'
and location <>''
and vendor <> ''
group by vendor,location with rollup
I usually do it like this:
SELECT
vendor,location,
sum(CASE WHEN processdate BETWEEN @date1start AND @date1end THEN ExtPrice ELSE 0 END) as total,
sum(CASE WHEN processdate BETWEEN @date2start AND @date2end THEN ExtPrice ELSE 0 END) as total2,
sum(CASE WHEN processdate BETWEEN @date3start AND @date3end THEN ExtPrice ELSE 0 END) as total3,
sum(CASE WHEN processdate BETWEEN @date4start AND @date4end THEN ExtPrice ELSE 0 END) as total4
FROM [database].[dbo].[saledata]
where location <>''
and vendor <> ''
group by vendor,location with rollup
And you can change the WHEN portion to make your desired date ranges.
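Here is a small sketch of the same pattern with the date ranges passed in as parameters, run against SQLite through Python's sqlite3 module. It uses only two ranges to keep it short, the rows and the range boundaries are made up, and WITH ROLLUP is left out because SQLite does not support it. Dates are stored as ISO-format text, which is why BETWEEN compares them correctly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE saledata (vendor TEXT, location TEXT, processdate TEXT, ExtPrice REAL)"
)
conn.executemany(
    "INSERT INTO saledata VALUES (?, ?, ?, ?)",
    [
        ("acme", "north", "2010-11-05", 10.0),
        ("acme", "north", "2010-12-20", 5.0),
        ("acme", "south", "2010-11-10", 7.0),
        ("bolt", "north", "2010-12-25", 3.0),
    ],
)

# Hypothetical range boundaries, bound as named parameters.
ranges = {
    "r1_start": "2010-11-03", "r1_end": "2010-12-14",
    "r2_start": "2010-12-15", "r2_end": "2010-12-31",
}

query = """
SELECT vendor, location,
       SUM(CASE WHEN processdate BETWEEN :r1_start AND :r1_end
                THEN ExtPrice ELSE 0 END) AS total1,
       SUM(CASE WHEN processdate BETWEEN :r2_start AND :r2_end
                THEN ExtPrice ELSE 0 END) AS total2
FROM saledata
WHERE location <> '' AND vendor <> ''
GROUP BY vendor, location
ORDER BY vendor, location
"""
range_totals = conn.execute(query, ranges).fetchall()
for row in range_totals:
    print(row)
```

Adding a third or fourth range is just another SUM(CASE ...) column with its own pair of parameters.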
Use subqueries, i.e.:
select sd.vendor, sd.location, sd1.total, sd2.total, sd3.total, sd4.total
from (select distinct vendor, location from saledata) AS sd
LEFT JOIN (
SELECT vendor,location, sum(ExtPrice) as total
FROM [database].[dbo].[saledata]
where processdate between 'startdate1' and 'enddate1'
and location <>''
and vendor <> ''
group by vendor,location with rollup) sd1 on sd1.vendor=sd.vendor and sd1.location=sd.location
LEFT JOIN (
SELECT vendor,location, sum(ExtPrice) as total
FROM [database].[dbo].[saledata]
where processdate between 'startdate2' and 'enddate2'
and location <>''
and vendor <> ''
group by vendor,location with rollup) sd2 on sd2.vendor=sd.vendor and sd2.location=sd.location
LEFT JOIN (
SELECT vendor,location, sum(ExtPrice) as total
FROM [database].[dbo].[saledata]
where processdate between 'startdate3' and 'enddate3'
and location <>''
and vendor <> ''
group by vendor,location with rollup) sd3 on sd3.vendor=sd.vendor and sd3.location=sd.location
LEFT JOIN (
SELECT vendor,location, sum(ExtPrice) as total
FROM [database].[dbo].[saledata]
where processdate between 'startdate4' and 'enddate4'
and location <>''
and vendor <> ''
group by vendor,location with rollup) sd4 on sd4.vendor=sd.vendor and sd4.location=sd.location