SQL CASE WHEN to narrow results, change field - sql

I've read dozens of answers regarding CASE and am not sure that is what i need to be using here, it seems like it should work but its not:
Data:
OrderNum OrderLine PartNum
200011 1 ABC-1
200011 2 DEF-1
200012 1 XYZ-1
What I would like to return:
OrderNum Item#
200011 MIXED
200012 XYZ-1
What I am returning instead:
OrderNum Item#
200011 ABC-1
200011 MIXED
200012 XYZ-1
My query:
SELECT OrderHed.OrderNum,
(CASE WHEN ShipDtl.OrderLine > '1' then 'MIXED' else ShipDtl.PartNum end) as [Item#]
FROM dbo.OrderHed, dbo.ShipDtl
WHERE ShipDtl.Company = OrderHed.Company
AND ShipDtl.OrderNum = OrderHed.OrderNum
GROUP BY OrderHed.OrderNum, ShipDtl.OrderLine, ShipDtl.Part

Try with grouping like
SELECT OrderHed.OrderNum,
(CASE WHEN SUM(ShipDtl.OrderLine) > 1 then 'MIXED' else MAX(ShipDtl.PartNum) end) as [Item#]
FROM dbo.OrderHed, dbo.ShipDtl
WHERE ShipDtl.Company = OrderHed.Company
AND ShipDtl.OrderNum = OrderHed.OrderNum
GROUP BY OrderHed.OrderNum
SQLFiddle Demo : http://sqlfiddle.com/#!3/209d8/1

You didn't write which database engine you use, but if it is SQL 2005 and above, I'd think using the window function for COUNT will make things easier as you then do not need to group.
SELECT DISTINCT
OrderHed.OrderNum ,
CASE
WHEN COUNT(ShipDtl.OrderLine) OVER (PARTITION BY ShipDtl.OrderNum) > 1 THEN 'MIXED'
ELSE PartNum
END AS [Item#]
FROM dbo.OrderHed ,
dbo.ShipDtl
WHERE ShipDtl.Company = OrderHed.Company
AND ShipDtl.OrderNum = OrderHed.OrderNum
You'll need to DISTINCT though, because it'll select a row per line, but each order with multiple lines will be MIXED, so you can easily distinct.
This will simply select the OrderNum and if multiple orderlines exists per ordernum Count(xxx) OVER (partition by yyy) it'll select 'MIXED', otherwise the partnum.
And then distinct the result.

There is no compulsion to use CASE (is there?).
In your example, CASE is being used to perform logical OR. There are other ways of performing logical OR in SQL e.g. UNION:
WITH T
AS
(
SELECT OrderHed.OrderNum, ShipDtl.PartNum, ShipDtl.OrderLine
FROM dbo.OrderHed, dbo.ShipDtl
WHERE ShipDtl.Company = OrderHed.Company
AND ShipDtl.OrderNum = OrderHed.OrderNum
)
SELECT OrderNum, PartNum
FROM T
WHERE OrderLine = 1
AND OrderNum NOT IN (
SELECT OrderNum
FROM T T2
WHERE OrderLine > 1
)
UNION
SELECT OrderNum, 'MIXED' AS PartNum
FROM T
WHERE OrderLine > 1;

Related

How to Count Distinct on Case When?

I have been building up a query today and I have got stuck. I have two unique Ids that identify if and order is Internal or Web. I have been able to split this out so it does the count of how many times they appear but unfortunately it is not providing me with the intended result. From research I have tried creating a Count Distinct Case When statement to provide me with the results.
Please see below where I have broken down what it is doing and how I expect it to be.
Original data looks like:
Company Name Order Date Order Items Orders Value REF
-------------------------------------------------------------------------------
CompanyA 03/01/2019 Item1 Order1 170 INT1
CompanyA 03/01/2019 Item2 Order1 0 INT1
CompanyA 03/01/2019 Item3 Order2 160 WEB2
CompanyA 03/01/2019 Item4 Order2 0 WEB2
How I expect it to be:
Company Name Order Date Order Items Orders Value WEB INT
-----------------------------------------------------------------------------------------
CompanyA 03/01/2019 4 2 330 1 1
What currently comes out
Company Name Order Date Order Items Orders Value WEB INT
-----------------------------------------------------------------------------------------
CompanyA 03/01/2019 4 2 330 2 2
As you can see from my current result it is counting every line even though it is the same reference. Now it is not a hard and fast rule that it is always doubled up. This is why I think I need a Count Distinct Case When. Below is my query I am currently using. This pull from a Progress V10 ODBC that I connect through Excel. Unfortunately I do not have SSMS and Microsoft Query is just useless.
My Current SQL:
SELECT
Company_0.CoaCompanyName
, SopOrder_0.SooOrderDate
, Count(DISTINCT SopOrder_0.SooOrderNumber) AS 'Orders'
, SUM(CASE WHEN SopOrder_0.SooOrderNumber IS NOT NULL THEN 1 ELSE 0 END) AS 'Order Items'
, SUM(SopOrderItem_0.SoiValue) AS 'Order Value'
, SUM(CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN 1 ELSE 0 END) AS 'INT'
, SUM(CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'WEB%' THEN 1 ELSE 0 END) AS 'WEB'
FROM
SBS.PUB.Company Company_0
, SBS.PUB.SopOrder SopOrder_0
, SBS.PUB.SopOrderItem SopOrderItem_0
WHERE
SopOrder_0.SopOrderID = SopOrderItem_0.SopOrderID
AND Company_0.CompanyID = SopOrder_0.CompanyID
AND SopOrder_0.SooOrderDate > '2019-01-01'
GROUP BY
Company_0.CoaCompanyName
, SopOrder_0.SooOrderDate
I have tried using the following line but it errors on me when importing:
, Count(DISTINCT CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN SopOrder_0.SooParentOrderReference ELSE 0 END) AS 'INT'
Just so know the error I get when importing at the moment is syntax error at or about "CASE WHEN sopOrder_0.SooParentOrderRefer" (10713)
Try removing the ELSE:
COUNT(DISTINCT CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN SopOrder_0.SooParentOrderReference END) AS num_int
You don't specify the error, but the problem is probably that the THEN is returning a string and the ELSE a number -- so there is an attempt to convert the string values to a number.
Also, learn to use proper, explicit, standard JOIN syntax. Simple rule: Never use commas in the FROM clause.
count distinct on the SooOrderNumber or the SooParentOrderReference, whichever makes more sense for you.
If you are COUNTing, you need to make NULL the thing that your are not counting. I prefer to include an else in the case because it is more consistent and complete.
, Count(DISTINCT CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN SopOrder_0.SooParentOrderReference ELSE null END) AS 'INT'
Gordon Linoff is correct regarding the source of your error, i.e. datatype mismatch between the case then value else value end. null removes (should remove) this ambiguity - I'd need to double check.
Editing my earlier answer...
Even though it looks, as you say, like count distinct is not supported in Pervasive PSQL, CTEs are supported. So you can do something like...
This is what you are trying to do but it is not supported...
with
dups as
(
select 1 as id, 'A' as col1 union all select 1, 'A' union all select 1, 'B' union all select 2, 'B'
)
select id
,count(distinct col1) as col_count
from dups
group by id;
Stick another CTE in the query to de-duplicate the data first. Then count as normal. That should work...
with
dups as
(
select 1 as id, 'A' as col1 union all select 1, 'A' union all select 1, 'B' union all select 2, 'B'
)
,de_dup as
(
select id
,col1
from dups
group by id
,col1
)
select id
,count(col1) as col_count
from de_dup
group by id;
These 2 versions should give the same result set.
There is always a way!!
I cannot explain the error you are getting. You are mistakenly using single quotes for alias names, but I don't actually think this is causing the error.
Anyway, I suggest you aggregate your order items per order first and only join then:
SELECT
c.coacompanyname
, so.sooorderdate
, COUNT(*) AS orders
, SUM(soi.itemcount) AS order_items
, SUM(soi.ordervalue) AS order_value
, COUNT(CASE WHEN so.sooparentorderreference LIKE 'INT%' THEN 1 END) AS int
, COUNT(CASE WHEN so.sooparentorderreference LIKE 'WEB%' THEN 1 END) AS web
FROM sbs.pub.company c
JOIN sbs.pub.soporder so ON so.companyid = c.companyid
JOIN
(
SELECT soporderid, COUNT(*) AS itemcount, SUM(soivalue) AS ordervalue
FROM sbs.pub.soporderitem
GROUP BY soporderid
) soi ON soi.soporderid = so.soporderid
GROUP BY c.coacompanyname, so.sooorderdate
ORDER BY c.coacompanyname, so.sooorderdate;

Not a GROUP BY Expression & aggregate functions

I was wondering why, for this query that I have right here, why I have to use the MAX() aggregate function for the case statements, and not just jump directly into the case statement:
select
bank_id,
tran_branch_code,
acct_sol_id,
acct_sol_name,
transaction_date,
gl_date,
transaction_id,
account_number,
max(case
when cast(substr(GLSH_Code,0,1) as int) >= 1
and cast(substr(GLSH_Code,0,1) as int) <= 5
and trans_type = 'D'
then (trans_amount)
--else 0
end ) Ind_Part_Tran_Dr_RBU,
max(case
when cast(substr(GLSH_Code,0,1) as int) >= 1
and cast(substr(GLSH_Code,0,1) as int) <= 5
and trans_type = 'C'
then (trans_amount)
--else 0
end) Ind_Part_Tran_Cr_RBU,
max(case
when cast(substr(GLSH_Code,0,1) as int) = 0
or (cast(substr(GLSH_Code,0,1) as int) >= 6
and cast(substr(GLSH_Code,0,1) as int) <= 9)
and trans_type = 'D'
then (trans_amount)
--else 0
end)Ind_Part_Tran_Dr_FCDU,
max(case
when cast(substr(GLSH_Code,0,1) as int) = 0
or (cast(substr(GLSH_Code,0,1) as int) >= 6
and cast(substr(GLSH_Code,0,1) as int) <= 9)
and trans_type = 'C'
then (trans_amount)
--else 0
end) Ind_Part_Tran_Cr_FCDU,
ccy_alias,
ccy_name,
acct_currency,
tran_currency
from
(
SELECT
DTD.BANK_ID,
DTD.SOL_ID Acct_Sol_ID, --Account Sol ID
dtd.br_code Tran_branch_code, -- branch code of the transacting branch
sol.sol_desc Acct_sol_name, -- name/description of SOL
DTD.TRAN_DATE Transaction_Date, --TransactionDate
DTD.GL_DATE GL_Date, --GL Date
TRIM(DTD.TRAN_ID) Transaction_ID, --Transaction ID
DTD.GL_SUB_HEAD_CODE GLSH_Code, --GLSH Code
dtd.tran_amt trans_amount,
GAM.ACCT_CRNCY_CODE Acct_Currency, --Account Currency
DTD.TRAN_CRNCY_CODE Tran_Currency, --Transaction Currency
cnc.crncy_alias_num ccy_alias,
cnc.crncy_name ccy_name,
GAM.FORACID Account_Number, --Account Number
DTD.TRAN_PARTICULAR Transaction_Particulars, --Transaction Particulars
DTD.CRNCY_CODE DTD_CCY,
--GSH.CRNCY_CODE GSH_CCY,
DTD.PART_TRAN_TYPE Transaction_Code,
--'Closing_Balance',
DTD.PSTD_USER_ID PostedBy,
CASE WHEN DTD.REVERSAL_DATE IS NOT NULL
THEN 'Y' ELSE 'N' END Reversal,
TRIM(DTD.TRAN_ID) REV_ORIG_TRAN_ID,
--OTT.REF_NUM OAP_REF_NUM,
'OAP_SETTLEMENT',
'RATE_CODE',
EAB.EOD_DATE
FROM TBAADM.DTD
LEFT OUTER JOIN TBAADM.GAM ON DTD.ACID = GAM.ACID AND DTD.BANK_ID = GAM.BANK_ID
LEFT OUTER JOIN TBAADM.EAB ON DTD.ACID = EAB.ACID AND DTD.BANK_ID = EAB.BANK_ID AND EAB.EOD_DATE = '24-MAR-2014'
left outer join tbaadm.sol on dtd.sol_id = sol.sol_id and dtd.bank_id = sol.bank_id
left outer join tbaadm.cnc on dtd.tran_crncy_code = cnc.crncy_code
WHERE DTD.BANK_ID = 'CBC01'
AND GAM.ACCT_OWNERSHIP = 'O'
AND GAM.DEL_FLG != 'Y'
--AND DTD.TRAN_DATE = '14-APR-2014'
AND DTD.TRAN_DATE between '01-APR-2014' and '21-APR-2014'
--and foracid in ('50010112441109','50010161635051')
--and DTD.SOL_ID = '5001'
and GAM.ACCT_CRNCY_CODE = 'USD'
)
group by
bank_id,
tran_branch_code,
acct_sol_id,
acct_sol_name,
transaction_date,
gl_date,
transaction_id,
account_number,
ccy_alias,
ccy_name,
Acct_Currency,
Tran_Currency
Because If I would remove the MAX(), I'd get the "Not a GROUP BY Expression", and Toad points me to the first occurrence of the GLSH_Code. Based from other websites, the cure for this is really adding the MAX() function. I would just like to understand why should I use that particular function, what it exactly does in the query, stuff like that.
EDIT: inserted the rest of the code.
I know for sure what MAX() does, it returns the largest value in an expression. But in this case, I can't seem to figure out exactly what that largest value is that the function is attempting to return.
The GROUP BY statement declares that all columns returned in the SELECT should be aggregated, but that you want to separate the results by those listed in the GROUP BY.
This means we have to use aggregate functions like MIN, MAX, AVG, SUM, etc. on any column that is NOT listed in the GROUP BY.
It's about telling the SQL engine what the expected results should be when there is more than one option.
In a simple example, we have a table with three columns:
PrimaryId SubId RowValue
1 1 1
2 1 2
3 2 4
4 2 8
And an SQL like the following (which is invalid):
SELECT SubId, RowValue
FROM SampleTable
GROUP BY SubId
We know we want the distinct SubId's (because of the GROUP BY), but we don't know what RowValue should be when we aggregate the results.
SubId RowValue
1 ?
2 ?
We have to be explicit in our query, and indicate what RowValue should be as the results can vary.
If we choose MIN(RowValue) we see:
SubId RowValue
1 1
2 4
If we choose MAX(RowValue) we see:
SubId RowValue
1 2
2 8
If we choose SUM(RowValue) we see:
SubId RowValue
1 3
2 12
Without being explicit there's a high likelihood that the results will be wrong, so our SQL engine of choice protects us from ourselves by enforcing the need for aggregate functions.
You have group by clause at the end on all the columns except for Ind_Part_Tran_Dr_RBU, Ind_Part_Tran_Cr_RBU, Ind_Part_Tran_Dr_FCDU, Ind_Part_Tran_Cr_FCDU. In this case oracle wants you to tell what to do with these columns, i.e. based on which function it has to aggregate them for every group it finds.

SQL Aggreate Functions

I have table which list a number of cases and assigned primary and secondary technicians. What I am trying to accomplish is to aggregate the number of cases a technician has worked as a primary and secondary tech. Should look something like this...
Technician Primary Secondary
John 4 3
Stacy 3 1
Michael 5 3
The table that I am pulling that data from looks like this:
CaseID, PrimaryTech, SecondaryTech, DOS
In the past I have used something like this, but now my superiors are asking for the number of secondary cases as well...
SELECT PrimaryTech, COUNT(CaseID) as Total
GROUP BY PrimaryTech
I've done a bit of searching, but cant seem to find the answer to my problem.
Select Tech,
sum(case when IsPrimary = 1 then 1 else 0 end) as PrimaryCount,
sum(case when IsPrimary = 0 then 1 else 0 end) as SecondaryCount
from
(
SELECT SecondaryTech as Tech, 0 as IsPrimary
FROM your_table
union all
SELECT PrimaryTech as Tech, 1 as IsPrimary
FROM your_table
) x
GROUP BY Tech
You can group two subqueries together with a FULL JOIN as demonstrated in this SQLFiddle.
SELECT Technician = COALESCE(pri.Technician, sec.Technician)
, PrimaryTech
, SecondaryTech
FROM
(SELECT Technician = PrimaryTech
, PrimaryTech = COUNT(*)
FROM Cases
WHERE PrimaryTech IS NOT NULL
GROUP BY PrimaryTech) pri
FULL JOIN
(SELECT Technician = SecondaryTech
, SecondaryTech = COUNT(*)
FROM Cases
WHERE SecondaryTech IS NOT NULL
GROUP BY SecondaryTech) sec
ON pri.Technician = sec.Technician
ORDER By Technician;
SELECT COALESCE(A.NAME, B.NAME) AS NAME, CASE WHEN A.CASES IS NOT NULL THEN A.CASES ELSE 0 END AS PRIMARY_CASES,
CASE WHEN B.CASES IS NOT NULL THEN B.CASES ELSE 0 END AS SECONDARY_CASES
FROM
(
SELECT COUNT(*) AS CASES, PRIMARYTECH AS NAME FROM YOUR_TABLE
GROUP BY PRIMARYTECH
) AS A
FULL OUTER JOIN
(
SELECT COUNT(*) AS CASES, SECONDARYTECH AS NAME FROM YOUR_TABLE
GROUP BY SECONDARYTECH
) AS B
ON A.NAME = B.NAME

SQL select and Group by

I have a table in SQL like this.
OrderID ItemID ItemPrice ItemType
1 A 100 1
1 B 50 1
1 C 10 0
2 A 100 1
2 F 60 0
3 G 10 0
So I want to get out put like this?
OrderID ItemPrice -Type= 1 ItemPrice -Type= 0
1 150 10
2 10 60
3 10
Do you have any idea about the SQL command to use?
I think it is group by order ID and Item type.
What you are doing is a pivot transformation. There are a few ways to do it, but my favorite way is using CASE inside SUM:
SELECT
OrderId,
SUM(CASE WHEN ItemType = 0 THEN ItemPrice ELSE 0 END) AS Type0_Price,
SUM(CASE WHEN ItemType = 1 THEN ItemPrice ELSE 0 END) AS Type1_Price
FROM MyTable
GROUP BY OrderId
This scales nicely if you have more than two types. All you have to do is add another SUM(...) line in your select list without having to change the rest of the query.
I think this will perform well, since the calculations in the SELECT list can be done without incurring additional row scans or lookups. That's the downside of self-joins and sub-selects.
Try this::
Select
DISTINCT(orderId),
(Select SUM(ITEMPRICE) from table where Itemtype=1 group by ORDERID) as ItemType1,
(Select SUM(ITEMPRICE) from table where Itemtype=0 group by ORDERID) as ItemType0
from table
Untested, but this should work for you.
SELECT t1.OrderID,
ItemPrice-Type1 = SUM(t1.ItemPrice),
ItemPrice-Type2 = SUM(t2.ItemPrice)
FROM TableName t1
INNER JOIN TableName t2 on t1.OrderID = t2.OrderID and t1.ItemID = t2.ItemID
WHERE t1.ItemType = 1 AND t2.ItemType = 0
GROUP BY t1.OrderID
Did this work?:
SELECT
OrderID,
SUM(ItemPrice*ItemType) Type1,
SUM(ItemPrice*(1-ItemType)) Type0
FROM
TableName
GROUP BY OrderID

Subselect Query Improvement

How can I improve the SQL query below (SQL Server 2008)? I want to try to avoid sub-selects, and I'm using a couple of them to produce results like this
StateId TotalCount SFRCount OtherCount
---------------------------------------------------------
AZ 102 50 52
CA 2931 2750 181
etc...
SELECT
StateId,
COUNT(*) AS TotalCount,
(SELECT COUNT(*) AS Expr1 FROM Property AS P2
WHERE (PropertyTypeId = 1) AND (StateId = P.StateId)) AS SFRCount,
(SELECT COUNT(*) AS Expr1 FROM Property AS P3
WHERE (PropertyTypeId <> 1) AND (StateId = P.StateId)) AS OtherCount
FROM Property AS P
GROUP BY StateId
HAVING (COUNT(*) > 99)
ORDER BY StateId
This may work the same, hard to test without data
SELECT
StateId,
COUNT(*) AS TotalCount,
SUM(CASE WHEN PropertyTypeId = 1 THEN 1 ELSE 0 END) as SFRCount,
SUM(CASE WHEN PropertyTypeId <> 1 THEN 1 ELSE 0 END) as OtherCount
FROM Property AS P
GROUP BY StateId
HAVING (COUNT(*) > 99)
ORDER BY StateId
Your alternative is a single self-join of Property using your WHERE conditions as a join parameter. The OtherCount can be derived by subtracting the TotalCount - SFRCount in a derived query.
Another alternative would be to use the PIVOT function like this:
SELECT StateID, [1] + [2] AS TotalCount, [1] AS SFRCount, [2] AS OtherCount
FROM Property
PIVOT ( COUNT(PropertyTypeID)
FOR PropertyTypeID IN ([1],[2])
) AS pvt
WHERE [1] + [2] > 99
You would need to add an entry for each property type which could be daunting but it is another alternative. Scott has a great answer.
If PropertyTypeId is not null then you could do this with a single join. Count is faster than Sum. But is Count plus Join faster than Sum. The test case below mimics your data. docSVsys has 800,000 rows and there are about 300 unique values for caseID. The Count plus Join in this test case is slightly faster than the Sum. But if I remove the with (nolock) then Sum is about 1/4 faster. You would need to test with your data.
select GETDATE()
go;
select caseID, COUNT(*) as Ttl,
SUM(CASE WHEN mimeType = 'message/rfc822' THEN 1 ELSE 0 END) as SFRCount,
SUM(CASE WHEN mimeType <> 'message/rfc822' THEN 1 ELSE 0 END) as OtherCount,
COUNT(*) - SUM(CASE WHEN mimeType = 'message/rfc822' THEN 1 ELSE 0 END) as OtherCount2
from docSVsys with (nolock)
group by caseID
having COUNT(*) > 1000
select GETDATE()
go;
select docSVsys.caseID, COUNT(*) as Ttl
, COUNT(primaryCount.sID) as priCount
, COUNT(*) - COUNT(primaryCount.sID) as otherCount
from docSVsys with (nolock)
left outer join docSVsys as primaryCount with (nolock)
on primaryCount.sID = docSVsys.sID
and primaryCount.mimeType = 'message/rfc822'
group by docSVsys.caseID
having COUNT(*) > 1000
select GETDATE()
go;