Count Distinct Not Working in Case Select Oracle SQL

I have a SQL question in which the code fails to count distinct IDs. It does count them, but does not do so distinctly. I have provided a small snippet of code below and have bolded the issue.
SELECT
"RESERVATION_STAT_DAILY"."RESORT" AS "RESORT",
"RESERVATION_STAT_DAILY"."BUSINESS_DATE" AS "BUSINESS_DATE",
to_char("RESERVATION_STAT_DAILY"."BUSINESS_DATE",'MON-yyyy') AS "MONTHYEAR",
Extract(day from "RESERVATION_STAT_DAILY"."BUSINESS_DATE") AS "DAY",
Extract(month from "RESERVATION_STAT_DAILY"."BUSINESS_DATE") AS "MONTH",
Extract(year from "RESERVATION_STAT_DAILY"."BUSINESS_DATE") AS "YEAR",
"RESERVATION_STAT_DAILY"."SOURCE_CODE" AS "SOURCE_CODE",
"RESERVATION_STAT_DAILY"."MARKET_CODE" AS "MARKET_CODE",
"RESERVATION_STAT_DAILY"."RATE_CODE" AS "RATE_CODE",
"RESERVATION_STAT_DAILY"."RESV_NAME_ID" AS "RESV_NAME_ID",
(CASE WHEN "RESERVATION_STAT_DAILY"."SOURCE_CODE" = 'GDS'
AND "RESERVATION_STAT_DAILY"."RATE_CODE" NOT IN ('BKIT', 'EXPEDIA')
AND "RESERVATION_STAT_DAILY"."MARKET_CODE" NOT IN ('GOVG', 'ENT')
THEN 'GDS'
ELSE 'Other'
END) AS "BizUnit",
COUNT(DISTINCT CASE WHEN "RESERVATION_STAT_DAILY"."SOURCE_CODE" = 'GDS'
AND "RESERVATION_STAT_DAILY"."RATE_CODE" NOT IN ('BKIT', 'EXPEDIA')
AND "RESERVATION_STAT_DAILY"."MARKET_CODE" NOT IN ('GOVG', 'ENT')
THEN "RESERVATION_STAT_DAILY"."RESV_NAME_ID"
ELSE NULL
END) AS "COST",
(SUM("RESERVATION_STAT_DAILY"."BUSINESS_DATE" - "RESERVATION_STAT_DAILY"."BUSINESS_DATE_CREATED")/(COUNT ("RESERVATION_STAT_DAILY"."BUSINESS_DATE_CREATED"))) AS "DIFF",
SUM(NVL("RESERVATION_STAT_DAILY"."NIGHTS",0)) AS "NIGHTS",
SUM(NVL("RESERVATION_STAT_DAILY"."ROOM_REVENUE",0)) AS "ROOM_REVENUE"
FROM "OPERA"."RESERVATION_STAT_DAILY" "RESERVATION_STAT_DAILY"
Where RESORT in ('558339','558341','4856','558340','602836','HCA','HZSD', 'TAC') and
BUSINESS_DATE < SYSDATE AND EXTRACT(year FROM "RESERVATION_STAT_DAILY"."BUSINESS_DATE_CREATED") >=2016
GROUP BY
"RESERVATION_STAT_DAILY"."RESORT",
"RESERVATION_STAT_DAILY"."BUSINESS_DATE",
to_char("RESERVATION_STAT_DAILY"."BUSINESS_DATE",'MON-yyyy'),
Extract(day from "RESERVATION_STAT_DAILY"."BUSINESS_DATE"),
Extract(month from "RESERVATION_STAT_DAILY"."BUSINESS_DATE"),
Extract(year from "RESERVATION_STAT_DAILY"."BUSINESS_DATE"),
"RESERVATION_STAT_DAILY"."SOURCE_CODE",
"RESERVATION_STAT_DAILY"."MARKET_CODE",
"RESERVATION_STAT_DAILY"."RATE_CODE",
"RESERVATION_STAT_DAILY"."RESV_NAME_ID",
( CASE
WHEN (("RESERVATION_STAT_DAILY"."SOURCE_CODE" = 'GDS') AND ("RESERVATION_STAT_DAILY"."RATE_CODE" != 'BKIT' OR "RESERVATION_STAT_DAILY"."RATE_CODE" != 'EXPEDIA'
)) THEN 'GDS'
ELSE 'Other'
END )

Some general tips to clean up your code, plus a solution:
As others have said, NOT IN clauses would be perfect here. Substitute them for those huge blocks of != comparisons. You also want your COUNT and SUM functions to be outside the CASE statements, as shown below.
SELECT
...
CASE WHEN "RESERVATION_STAT_DAILY"."SOURCE_CODE" = 'GDS'
AND "RESERVATION_STAT_DAILY"."RATE_CODE" NOT IN ('BKIT', 'EXPEDIA', ...)
AND "RESERVATION_STAT_DAILY"."MARKET_CODE" NOT IN ('GOVG', 'ENT', ...)
THEN 'GDS'
ELSE 'Other'
END AS "BizUnit",
COUNT(DISTINCT CASE WHEN "RESERVATION_STAT_DAILY"."SOURCE_CODE" = 'GDS'
AND "RESERVATION_STAT_DAILY"."RATE_CODE" NOT IN ('BKIT', 'EXPEDIA', ...)
AND "RESERVATION_STAT_DAILY"."MARKET_CODE" NOT IN ('GOVG', 'ENT', ...)
THEN "RESERVATION_STAT_DAILY"."RESV_NAME_ID"
ELSE NULL
END) AS "COST",
...
FROM
"OPERA"."RESERVATION_STAT_DAILY" "RESERVATION_STAT_DAILY"
WHERE
...
GROUP BY
...
Your code was over 570 lines long. Some people consider 1/10 of that to be too much code. Notice how I snipped out the parts that aren't directly applicable to your issue? This is how you go about creating a minimal, complete, and verifiable working example.
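To see the conditional distinct count in isolation, here is a minimal sketch that runs the same COUNT(DISTINCT CASE ...) pattern from Python against an in-memory SQLite database rather than Oracle. The table name stat_daily, its sample rows and the reduced column set are invented for illustration. Note that resv_name_id is deliberately left out of the GROUP BY here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stat_daily (
        resort TEXT, source_code TEXT, rate_code TEXT, resv_name_id INTEGER
    );
    -- Hypothetical sample data: reservation 1 appears twice.
    INSERT INTO stat_daily VALUES
        ('HCA', 'GDS', 'RACK',    1),
        ('HCA', 'GDS', 'RACK',    1),
        ('HCA', 'GDS', 'RACK',    2),
        ('HCA', 'GDS', 'EXPEDIA', 3),
        ('HCA', 'WEB', 'RACK',    4);
""")

# The CASE sits inside the aggregate: rows that fail the test yield NULL,
# and COUNT ignores NULLs.
rows = conn.execute("""
    SELECT resort,
           COUNT(DISTINCT CASE WHEN source_code = 'GDS'
                                AND rate_code NOT IN ('BKIT', 'EXPEDIA')
                               THEN resv_name_id END) AS cost_distinct,
           COUNT(CASE WHEN source_code = 'GDS'
                       AND rate_code NOT IN ('BKIT', 'EXPEDIA')
                      THEN resv_name_id END) AS cost_all
    FROM stat_daily
    GROUP BY resort
""").fetchall()

print(rows)  # [('HCA', 2, 3)]
```

Three rows qualify (IDs 1, 1 and 2), so the plain count is 3 while the distinct count is 2.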

Just a few remarks (too long for a comment):
AND ("RESERVATION_STAT_DAILY"."RATE_CODE" != 'BKIT' OR "RESERVATION_STAT_DAILY"."RATE_CODE" != 'EXPEDIA' )? Think about that for a moment. When is that condition not met? The rate code will always be different from either one value or the other (usually both), except for NULL, where the result is "unknown".
THEN COUNT(DISTINCT "RESERVATION_STAT_DAILY"."RESV_NAME_ID") ELSE COUNT(DISTINCT "RESERVATION_STAT_DAILY"."RESV_NAME_ID") END) AS "COST". So in any case you count distinct RESV_NAME_ID. Why the CASE then?
As you group by RESV_NAME_ID, COUNT(DISTINCT "RESERVATION_STAT_DAILY"."RESV_NAME_ID") can always only be 1 in a group.
sum(NVL("RESERVATION_STAT_DAILY"."NIGHTS",0)) and sum(NVL("RESERVATION_STAT_DAILY"."ROOM_REVENUE",0)): SUM ignores nulls, so you don't have to turn them into zeros before adding them up. A sum, however, can itself be null, so you may want NVL(SUM(NIGHTS), 0) instead.
As to readability: queries in all upper case are hard to read. Either use lower case or mix the two (e.g. upper case for SQL keywords). As no column name contains blanks or the like, you don't need quotes. As only one table is involved, you don't need a table qualifier. And if you did, you should give the table a short alias and use that instead of the whole name. And you should format the query with indentation, so we can see the clauses (FROM, GROUP BY etc.) at first glance.
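The remarks about the OR condition, the grouped distinct count and SUM with nulls can be checked with a small experiment. The sketch below uses Python's sqlite3 module and an invented table t (not the Oracle schema from the question). SQLite has no NVL, so COALESCE stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (rate_code TEXT, resv_name_id INTEGER, nights INTEGER);
    -- Hypothetical rows, including a NULL rate code and a NULL nights value.
    INSERT INTO t VALUES
        ('BKIT',    1, 2),
        ('EXPEDIA', 1, NULL),
        ('RACK',    2, 3),
        (NULL,      3, 1);
""")

# "!= A OR != B" is true for every non-NULL value, so nothing is filtered out
# except the NULL row.
or_rows = [r[0] for r in conn.execute(
    "SELECT rate_code FROM t "
    "WHERE rate_code != 'BKIT' OR rate_code != 'EXPEDIA' "
    "ORDER BY rate_code")]

# NOT IN (equivalently "!= A AND != B") is what was presumably intended.
not_in_rows = [r[0] for r in conn.execute(
    "SELECT rate_code FROM t WHERE rate_code NOT IN ('BKIT', 'EXPEDIA')")]

# Grouping by the counted column: each group holds exactly one distinct value.
per_group = conn.execute(
    "SELECT resv_name_id, COUNT(DISTINCT resv_name_id) "
    "FROM t GROUP BY resv_name_id ORDER BY resv_name_id").fetchall()

# SUM skips NULLs, but a SUM over no rows is itself NULL.
total = conn.execute("SELECT SUM(nights) FROM t").fetchone()[0]
empty = conn.execute("SELECT SUM(nights) FROM t WHERE 1 = 0").fetchone()[0]
empty_safe = conn.execute(
    "SELECT COALESCE(SUM(nights), 0) FROM t WHERE 1 = 0").fetchone()[0]

print(or_rows)      # ['BKIT', 'EXPEDIA', 'RACK']
print(not_in_rows)  # ['RACK']
print(per_group)    # [(1, 1), (2, 1), (3, 1)]
print(total, empty, empty_safe)  # 6 None 0
```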

Related

Joined tables returning correct value when selecting single row but incorrect when entire dataset

I have 3 tables which I have joined: campaign level, ad level, and keyword level, and I need certain things from each of these. All 3 contain the following identical columns: campaign_name, campaign_id, day. Two of them also contain 'ad_group_name'.
The query is functioning and returning all the right values for data coming from keyword and campaign, but the values I need from ad level (conversion_name and its values) are not. But the confusing part for me is that when I use the 'WHERE' clause and select only one row, the values are correct and match up to the source tables. Additionally, the values of a.conversion_name add up to 'conversion' total (+/- 1/2).
Results from single row (WHERE clause)
When I remove the WHERE clause and select the entire table, my numbers are significantly larger than they should be. The a.conversion_name values no longer add up to the 'conversion' total - in fact sometimes conversions = 0, and the a.conversion_name values return values.
Results when selecting entire table
Results when selecting entire table 2
I think I understand why this is happening, it's a grouping issue (?), and I have searched through lots of the existing threads, tried out sub queries and DISTINCTS, but my skill level at the moment means I am really struggling to figure this out.
Should I change how things are grouped? I have also tried adding the a.conversion_name as a dimension and then selecting it, but this doesn't help either.
WITH raw AS (SELECT
k.day,
k.campaign_name,
k.ad_group_name,
k.ad_group_type,
k.ad_group_id,
k.campaign_id,
k.keyword,
k.keyword_match_type,
AVG(CASE WHEN a.conversion_name = 'Verification Submitted' THEN a.conversions END) AS conv_verification_submitted,
AVG(CASE WHEN a.conversion_name = 'Email Confirmed' THEN a.conversions END) AS conv_email_confirmed,
AVG(CASE WHEN a.conversion_name = 'Account created' THEN a.conversions END) AS conv_account_created,
AVG(CASE WHEN a.conversion_name = 'Verification Started' THEN a.conversions END) AS conv_verification_started,
AVG(CASE WHEN a.conversion_name = 'Deposit Succeeded' THEN a.conversions END) AS conv_deposit_succeeded,
AVG(CASE WHEN a.conversion_name = 'Trade Completed' THEN a.conversions END) AS conv_trade_completed,
AVG(K.clicks) as clicks,
AVG(K.conversions) as conversions,
AVG(K.costs) as spend,
AVG(K.impressions) as impressions,
AVG(k.quality_score) as quality_score,
AVG(c.search_impression_share) as search_impression_share,
AVG(k.search_exact_match_impression_share) as search_exact_match_impression_share,
AVG(c.search_lost_impression_share_rank) as search_lost_impression_share_rank,
AVG(c.search_top_impression_share) as search_top_impression_share,
AVG(c.search_lost_impression_share_budget) as search_lost_impression_share_budget,
FROM `bigqpr.keyword-level-data` as k
LEFT JOIN `bigqpr.campaign-level-data` as c
ON c.campaign_name = k.campaign_name and c.day = k.day
LEFT JOIN `bigqpr.ad-level-data` as a
ON a.campaign_name = k.campaign_name and a.day = k.day and a.ad_group_name = k.ad_group_name
group by 1,2,3,4,5,6,7,8,a.conversion_name)
SELECT
day,
campaign_name,
ad_group_name,
ad_group_type,
ad_group_id,
campaign_id,
keyword,
keyword_match_type,
AVG(conv_verification_submitted) as conv_verification_submitted,
AVG(conv_email_confirmed) as conv_email_confirmed,
AVG(conv_account_created) as conv_account_created,
AVG(conv_verification_started) as conv_verification_started,
AVG(conv_deposit_succeeded) as conv_deposit_succeeded,
AVG(conv_trade_completed) as conv_trade_completed,
AVG(clicks) as clicks,
AVG(conversions) as conversions,
AVG(spend) as spend,
AVG(impressions) as impressions,
AVG(quality_score) as quality_score,
AVG(search_impression_share) as search_impression_share,
AVG(search_exact_match_impression_share) as search_exact_match_impression_share,
AVG(search_lost_impression_share_rank) as search_lost_impression_share_rank,
AVG(search_top_impression_share) as search_top_impression_share,
AVG(search_lost_impression_share_budget) as search_lost_impression_share_budget,
FROM raw
WHERE keyword = "specifickeyword" and day = "2022-05-22" and ad_group_name = "specificadgroup"
GROUP BY 1,2,3,4,5,6,7,8
I am also a beginner, but my first thought is to try an Inner Join instead of a Left Join. Sometimes this helps my result sets fit the number I'm looking for instead of being so large.

SQL Group by CASE result

I have a simple SQL query on IBM DB2. I'm trying to run something as below:
select case when a.custID = 42285 then 'Credit' when a.unitID <> '' then 'Sales' when a.unitID = '' then 'Refund'
else a.unitID end TYPE, sum(a.value) as Total from transactions a
group by a.custID, a.unitID
This query runs; however, I have a problem with group by a.custID. I'd prefer not to have this, but the query won't run unless it's present. I want to group by the result of the CASE expression, not by the columns it tests. So I'm looking for something like:
group by TYPE
However adding group by TYPE reports an error message "Column or global variable TYPE not found". Also removing a.custID from group section reports "Column custID or expression in SELECT list not valid"
Is this going to be possible at all or do I need to review my CASE function and avoid using the custID column since at the moment I'm getting a grouping also based on custID column, even though it's not present in SELECT.
I understand why the grouping works as it does, I'm just wondering if it's possible to get rid of the custID grouping, but still maintain it within CASE function.
If you want terseness of code, you could use a subquery here:
SELECT TYPE, SUM(value) AS Total
FROM
(
SELECT CASE WHEN a.custID = 42285 THEN 'Credit'
WHEN a.unitID <> '' THEN 'Sales'
WHEN a.unitID = '' THEN 'Refund'
ELSE a.unitID END TYPE,
value
FROM transactions a
) t
GROUP BY TYPE;
The alternative to this would be to just repeat the CASE expression in the GROUP BY clause, which is ugly, but should still work. Note that some databases (e.g. MySQL) have overloaded GROUP BY and do allow aliases to be used in at least some cases.
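As a rough illustration of both options, the sketch below runs the subquery form and the repeated-CASE form side by side. It uses Python's sqlite3 module instead of DB2, and the sample rows are invented. The alias is named txn_type here (an assumption, chosen to avoid any clash with the word TYPE), and the CASE text is kept in one Python string so the two copies cannot drift apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (custID INTEGER, unitID TEXT, value INTEGER);
    -- Hypothetical sample data.
    INSERT INTO transactions VALUES
        (42285, 'U1', 10),
        (42285, '',    5),
        (1,     'U1', 20),
        (2,     'U2', 30),
        (3,     '',   -7);
""")

# The classification from the question, written once.
case_expr = """CASE WHEN a.custID = 42285 THEN 'Credit'
                    WHEN a.unitID <> ''   THEN 'Sales'
                    WHEN a.unitID = ''    THEN 'Refund'
                    ELSE a.unitID END"""

# Option 1: compute the label in a subquery, then group by it.
subquery_rows = conn.execute(f"""
    SELECT txn_type, SUM(value) AS total
    FROM (SELECT {case_expr} AS txn_type, a.value FROM transactions a) t
    GROUP BY txn_type
    ORDER BY txn_type
""").fetchall()

# Option 2: repeat the CASE expression in the GROUP BY clause.
repeated_rows = conn.execute(f"""
    SELECT {case_expr} AS txn_type, SUM(a.value) AS total
    FROM transactions a
    GROUP BY {case_expr}
    ORDER BY txn_type
""").fetchall()

print(subquery_rows)  # [('Credit', 15), ('Refund', -7), ('Sales', 50)]
print(repeated_rows)  # [('Credit', 15), ('Refund', -7), ('Sales', 50)]
```

Neither form groups by custID, so both Credit rows collapse into a single line.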

AND OR SQL operator with multiple records

I have the following query. If brand1/camp1 are taken individually, the query returns the correct value, but if I specify more than one brand or campaign, it returns some other number, and I am not sure what the math behind that is. It is not the total of the two either.
I think the IN operator is treating the comma-separated values as OR, whereas I need it to treat them as AND.
select campaign,
sum(case when campaign in ('camp1', 'camp2') and description in ('brand1', 'brand2') then orders else 0 end) as brand_convs
from data.camp_results
where campaign in ('camp1', 'camp2') and channel='prog' and type='sbc'
group by campaign
having brand_convs > 0
order by brand_convs desc;
Any thoughts?
The problem is in the IN part, as you suspected: the two IN operators do not affect each other in any way, so campaign can be camp1 while description is brand2.
If your DBMS supports multiple columns in an IN condition, you can use a single IN:
SELECT campaign, SUM(
CASE WHEN (campaign, description) IN (
('camp1', 'brand1'),
('camp2', 'brand2')
) THEN orders ELSE 0 END
) [rest of query...]
If not, you're probably going to have to use ANDs and ORs:
SELECT campaign, SUM(
CASE WHEN
(campaign='camp1' AND description='brand1')
OR (campaign='camp2' AND description='brand2')
THEN orders ELSE 0 END
) [rest of query...]
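The difference between the two independent IN lists and the paired conditions can be seen on a tiny data set. This is a minimal sketch using Python's sqlite3 module with an invented camp_results table and made-up order counts. It uses the AND/OR form because that runs on any engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE camp_results (campaign TEXT, description TEXT, orders INTEGER);
    -- Hypothetical data: every campaign has rows for both brands.
    INSERT INTO camp_results VALUES
        ('camp1', 'brand1', 10),
        ('camp1', 'brand2',  4),
        ('camp2', 'brand1',  7),
        ('camp2', 'brand2', 20);
""")

# Two separate IN lists match all four campaign/brand combinations.
loose = conn.execute("""
    SELECT campaign,
           SUM(CASE WHEN campaign IN ('camp1', 'camp2')
                     AND description IN ('brand1', 'brand2')
                    THEN orders ELSE 0 END) AS brand_convs
    FROM camp_results
    GROUP BY campaign
    ORDER BY campaign
""").fetchall()

# Paired conditions match only camp1/brand1 and camp2/brand2.
paired = conn.execute("""
    SELECT campaign,
           SUM(CASE WHEN (campaign = 'camp1' AND description = 'brand1')
                      OR (campaign = 'camp2' AND description = 'brand2')
                    THEN orders ELSE 0 END) AS brand_convs
    FROM camp_results
    GROUP BY campaign
    ORDER BY campaign
""").fetchall()

print(loose)   # [('camp1', 14), ('camp2', 27)]
print(paired)  # [('camp1', 10), ('camp2', 20)]
```

The loose version adds the cross combinations (camp1 with brand2, camp2 with brand1) to each total, which is why the result is neither of the individual values nor their sum.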

Nested Case Statements in Oracle

So, I'm trying to run a SQL statement to select an entire DB for upload in an ETL process, but I want to create a calculated column for the number of days between a ticket being opened and being closed.
The IF-THEN logic is like this:
IF the department is Grounds Maintenance, and there's a foreign key match with a second table, and there's specific task type, then use formula A
ELSE
IF INCIDENT_RESOLVED_DATE IS NULL, then use formula B
ELSE use formula C
I think my CASE logic is solid, but it keeps bringing me back the same row over and over again. This tells me I'm missing something. I'm almost positive it's something to do with the WHEN statement on the first part of the CASE statement, but if I knew, I wouldn't be asking.
SELECT
a.*
, a.REPORTED_DATE
, a.CLOSE_DATE
, a.INCIDENT_RESOLVED_DATE
, CASE
WHEN DEPARTMENT = 'Grounds Maintenance'
AND a.INCIDENT_ID = b.SOURCE_OBJECT_ID
AND b.TASK_TYPE_ID = '11501'
THEN (to_date(b.ACTUAL_END_DATE, 'DD-MM-YYYY') - to_date(a.REPORTED_DATE, 'DD-MM-YYYY'))
ELSE CASE
WHEN a.INCIDENT_RESOLVED_DATE IS NULL THEN (to_date(a.CLOSE_DATE, 'DD-MM-YYYY') - to_date(a.REPORTED_DATE, 'DD-MM-YYYY'))
ELSE (to_date(a.INCIDENT_RESOLVED_DATE, 'DD-MM-YYYY') - to_date(a.REPORTED_DATE, 'DD-MM-YYYY'))END
END
AS
DAYS_TO_RESOLVE
FROM
CMEM_CS_SERVICE_REQUESTS a, jtf_tasks_b b
WHERE
EXTRACT(YEAR FROM a.REPORTED_DATE) > 2009;
Thoughts?

Character string buffer too small

I have select:
select v.accs, v.currency,v.amount,v.drcr_ind, count(*) qua,wm_concat(ids) npx_IDS,
wm_concat(px_dtct) npx_DTCT
from table v
group by accs, currency, amount, drcr_ind
But I get the error ORA-06502: PL/SQL: numeric or value error: character string buffer too small (unless I remove one of the concatenated strings), because sometimes (when v.accs = 3570) count(*) = 215.
But when I try to skip wm_concat for v.accs = 3570, for example this way:
select v.accs, v.currency,v.amount,v.drcr_ind, count(*) qua,wm_concat(ids) npx_IDS,
(case when v.accs = 3570 then wm_concat(px_dtct) else 'too many' end) npx_DTCT
from table v
group by accs, currency, amount, drcr_ind
I still get the same error message. But why?
You concatenate results from a query. This query can result in a lot of rows so eventually you will run out of string length. Maybe concatenation is not the way to go here. Depends on what you want to achieve of course.
Why? Because you still use wm_concat for accs = 3570. Swap the THEN and ELSE parts of your CASE expression:
select v.accs, v.currency,v.amount,v.drcr_ind, count(*) qua,wm_concat(ids) npx_IDS,
(case when v.accs = 3570 then 'too many' else wm_concat(px_dtct) end) npx_DTCT
from table v group by accs, currency, amount, drcr_ind
First, as has already been said, you have to switch the THEN and ELSE clauses in your query.
Then, I guess you should also treat your second wm_concat, the one that works with ids, in the same way.
select v.accs, v.currency,v.amount,v.drcr_ind, count(*) qua,
(case when v.accs = 3570 then 'too many' else wm_concat(ids) end) npx_IDS,
(case when v.accs = 3570 then 'too many' else wm_concat(px_dtct) end) npx_DTCT
from table v
group by accs, currency, amount, drcr_ind
And finally, why do you think that only v.accs = 3570 can raise the ORA-06502 error? I suppose you should handle all of them.