Distinct in SQL Server

I am executing the following query,
Select distinct
a.cr_id,
Case
When ca.ca_vote = 'Approve' and ca.ca_title='MANAGER' Then ca.ca_email
When ca.ca_vote = 'Reject' Then ''
When ca.ca_vote = 'Pending' Then ''
When ca.ca_vote = 'IN PROCESS' Then ''
End as ca_email
from
credit a
inner join credit_approvals ca on ca.c_id=a.cr_id
where
a.cr_cs_date between Convert(varchar(20),'11/16/2011',101) and dateadd(day,1,convert (varchar(20),'11/16/2011',101))
order by
a.cr_id
Despite using distinct on cr_id, the query still returns duplicate cr_id values. Please let me know how to handle this so that I can display only distinct records.

DISTINCT is applied to the whole select list (every column together), not just to the column that immediately follows the keyword.
If a cr_id has several different ca_email values, you will see one row for each of them.
If you don't want that, you have to come up with a rule that decides which record among the duplicates must stay.
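To illustrate the point above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows are made up, the CASE expression is simplified to a single branch with an ELSE, and taking MAX per id is only one possible rule for picking the surviving record, not necessarily the one you need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE credit_approvals (c_id INTEGER, ca_vote TEXT, ca_title TEXT, ca_email TEXT);
    -- Hypothetical sample data: id 1 has two approval rows, id 2 has one.
    INSERT INTO credit_approvals VALUES
        (1, 'Approve', 'MANAGER', 'boss@example.com'),
        (1, 'Pending', 'CLERK',   'clerk@example.com'),
        (2, 'Reject',  'MANAGER', 'other@example.com');
""")

# Simplified version of the CASE from the question (assumption: ELSE '' covers the other votes).
case_expr = """
    CASE WHEN ca_vote = 'Approve' AND ca_title = 'MANAGER' THEN ca_email
         ELSE '' END
"""

# DISTINCT compares the (c_id, ca_email) pair, so id 1 still appears twice.
distinct_rows = conn.execute(
    f"SELECT DISTINCT c_id, {case_expr} FROM credit_approvals ORDER BY 1, 2"
).fetchall()

# One possible rule: keep the largest value per id, which prefers a non-empty email.
one_per_id = conn.execute(
    f"SELECT c_id, MAX({case_expr}) FROM credit_approvals GROUP BY c_id ORDER BY 1"
).fetchall()

print(distinct_rows)
print(one_per_id)
```

The GROUP BY version returns exactly one row per id because the rule (here MAX) collapses the competing ca_email values.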

Related

Multiple columns in a subquery

I am trying to find the products selected in previous week vs products selected for this week to find the churn in selection decision. Currently I am doing it for a single site and the result works fine with the correct number of records. Now I want to change the code where I get the output for 10 sites in a single query.
create temporary view removes as
select distinct
asin,
lagweek,
fc,
'removed' as state,
demand_pp,
instock_pp,
source,
filter_reason,
is_missing_in_pp,
is_missing_in_dc,
is_missing_in_nmi,
asin_nmi,
asin_pre,
asin_dc,
filter_reason_old,
asin_omi,
asin_preo,
asin_dco
from sel_old so where asin not in (select asin from sel_new sn where sn.lagweek = so.lagweek);
Since this is for a single site, just doing asin not in (select asin ...) works fine, but now I want to look at ASINs across multiple sites using the same logic. I tried the approach below, but it returns an incorrect number of records.
create temporary view removes as
select distinct
so.asin,
so.lagweek,
so.fc,
'removed' as state,
so.demand_pp,
so.instock_pp,
so.source,
so.filter_reason,
so.is_missing_in_pp,
so.is_missing_in_dc,
so.is_missing_in_nmi,
so.asin_nmi,
so.asin_pre,
so.asin_dc,
so.filter_reason_old,
so.asin_omi,
so.asin_preo,
so.asin_dco
from sel_old so left join (select asin, fc, lagweek from sel_new) as sn
on (so.asin <> sn.asin)
and (so.fc = sn.fc)
and (so.lagweek = sn.lagweek);
How should I approach this? I haven't been able to find an easier solution, if there is one.

You can use the EXISTS predicate. It doesn't produce additional rows; it only tests whether a matching row exists and filters accordingly.
select distinct
so.asin,
so.lagweek,
so.fc,
'removed' as state,
so.demand_pp,
so.instock_pp,
so.source,
so.filter_reason,
so.is_missing_in_pp,
so.is_missing_in_dc,
so.is_missing_in_nmi,
so.asin_nmi,
so.asin_pre,
so.asin_dc,
so.filter_reason_old,
so.asin_omi,
so.asin_preo,
so.asin_dco
from sel_old so
where not exists (
select 1
from sel_new sn
where so.fc = sn.fc
and so.lagweek = sn.lagweek
and so.asin = sn.asin
)
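A small sketch of why the NOT EXISTS form behaves differently from the LEFT JOIN with <>, run through Python's sqlite3 module rather than the asker's engine. The three-column tables and their rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sel_old (asin TEXT, fc TEXT, lagweek INTEGER);
    CREATE TABLE sel_new (asin TEXT, fc TEXT, lagweek INTEGER);
    -- Hypothetical data: A1 stays at FC1, A2 is removed at FC1, A1 is removed at FC2.
    INSERT INTO sel_old VALUES ('A1', 'FC1', 1), ('A2', 'FC1', 1), ('A1', 'FC2', 1);
    INSERT INTO sel_new VALUES ('A1', 'FC1', 1), ('A3', 'FC1', 1), ('A2', 'FC2', 1);
""")

# NOT EXISTS keeps an old row only when no new row has the same asin, fc and lagweek.
removed = conn.execute("""
    SELECT DISTINCT so.asin, so.fc, so.lagweek
    FROM sel_old so
    WHERE NOT EXISTS (
        SELECT 1 FROM sel_new sn
        WHERE so.fc = sn.fc AND so.lagweek = sn.lagweek AND so.asin = sn.asin
    )
    ORDER BY so.asin, so.fc
""").fetchall()

# The LEFT JOIN on so.asin <> sn.asin pairs each old row with every *other* asin,
# so rows that still exist in sel_new are returned as well.
joined = conn.execute("""
    SELECT DISTINCT so.asin, so.fc, so.lagweek
    FROM sel_old so
    LEFT JOIN sel_new sn
      ON so.asin <> sn.asin AND so.fc = sn.fc AND so.lagweek = sn.lagweek
    ORDER BY so.asin, so.fc
""").fetchall()

print(removed)
print(joined)
```

With this data the join version returns all three old rows, including ('A1', 'FC1', 1), which was never removed.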

Query that detects difference between accounts/loads from TODAY and YESTERDAY

GOAL: DETECT any difference between yesterday's table loads and today's loads. Each load loads values of data that are associated with bank accounts. So I need a query that returns each individual account that has a difference, with the value in the column name.
I need data from several columns that are located from two different tables. AEI_GFXAccounts and AEI_GFXAccountSTP. Each time the table is loaded, it has a "run_ID" that is incremented by one. So it needs to be compared to MAX(run_id) and MAX(run_id) -1.
I have tried the following query. All it does is return the columns I need. I now need to add logic that runs it WHERE run_id = MAX(run_id), runs it again WHERE run_id = MAX(run_id) - 1, compares the two results, and shows the differences under custom column names: for each column, one column such as AccountBranch at MAX(run_id) - 1 AS 'WAS' and another named 'IS NOW'.
SELECT AEI_GFXAccounts.AccountNumber,
AccountBranch,
AccountName,
AccountType,
CostCenter,
TransactionLimit,
ClientName,
DailyCumulativeLimit
FROM AEI_GFXAccounts
JOIN AEI_GFXAccountSTP
ON (AEI_GFXAccounts.feed_id = AEI_GFXAccountSTP.feed_id
and AEI_GFXAccounts.run_id = AEI_GFXAccountSTP.run_id)

I use something similar to this to detect changes for a logging system:
WITH data AS (
SELECT
a.run_id,
a.AccountNumber,
?.AccountBranch,
?.AccountName,
?.AccountType,
?.CostCenter,
?.TransactionLimit,
?.ClientName,
?.DailyCumulativeLimit
FROM
AEI_GFXAccounts a
INNER JOIN AEI_GFXAccountSTP b
ON
a.feed_id = b.feed_id and
a.run_id = b.run_id
),
yest AS (
SELECT * FROM data WHERE run_id = (SELECT MAX(run_id)-1 FROM AEI_GFXAccounts)
),
toda AS (
SELECT * FROM data WHERE run_id = (SELECT MAX(run_id) FROM AEI_GFXAccounts)
)
SELECT
CASE WHEN COALESCE(yest.AccountBranch, 'x') <> COALESCE(toda.AccountBranch, 'x') THEN yest.AccountBranch END as yest_AccountBranch,
CASE WHEN COALESCE(yest.AccountBranch, 'x') <> COALESCE(toda.AccountBranch, 'x') THEN toda.AccountBranch END as toda_AccountBranch,
CASE WHEN COALESCE(yest.AccountName, 'x') <> COALESCE(toda.AccountName, 'x') THEN yest.AccountName END as yest_AccountName,
CASE WHEN COALESCE(yest.AccountName, 'x') <> COALESCE(toda.AccountName, 'x') THEN toda.AccountName END as toda_AccountName,
...
FROM
toda INNER JOIN yest ON toda.AccountNumber = yest.AccountNumber
Notes:
You didn't say which table some of your columns are from. I've prefixed them with ?. - replace these with a. or b. as appropriate (it is always good practice to fully qualify all your column names)
When you're repeating out the pattern in the bottom select (above ...) choose data for the COALESCE that will not appear in the column. I'm using COALESCE as a quick way to avoid having to write CASE WHEN a is null and b is not null or b is null and a is not null or a != b, but the comparison fails if accountname (for example) was 'x' yesterday and today it is null, because the null becomes 'x'. If you pick data that will never appear in the column then the check will work out because nulls will be coalesced to something that can never appear in the real data, and hence the <> comparison will work out
If you don't care when a column goes to null today from a value yesterday, or was null yesterday but is a value today, you can ditch the coalesce and literally just do toda.X <> yest.X
New accounts today won't show up until tomorrow. If you want them to show up do toda LEFT JOIN yest .... Of course all their properties will show as new ;)
This query returns all the accounts regardless of whether any changes have been made. If you only want a list of accounts with changes you'll need a where clause that is similar to your case whens:
WHERE
COALESCE(toda.AccountBranch, 'x') <> COALESCE(yest.AccountBranch, 'x') OR
COALESCE(toda.AccountName, 'x') <> COALESCE(yest.AccountName, 'x') OR
...
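Here is a cut-down, runnable sketch of the pattern above using Python's sqlite3 module. It assumes a single hypothetical accounts table in place of the two joined tables, only two compared columns, and '~none~' as a sentinel that never occurs in the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical single table standing in for the joined AEI_GFX* tables.
    CREATE TABLE accounts (run_id INTEGER, AccountNumber TEXT, AccountBranch TEXT, AccountName TEXT);
    INSERT INTO accounts VALUES
        (1, '100', 'North', 'Alice'),
        (1, '200', 'South', NULL),
        (1, '300', 'East',  'Carol'),
        (2, '100', 'North', 'Alice'),   -- unchanged
        (2, '200', 'South', 'Bob'),     -- name went from NULL to a value
        (2, '300', 'West',  'Carol');   -- branch changed
""")

changes = conn.execute("""
    WITH yest AS (
        SELECT * FROM accounts WHERE run_id = (SELECT MAX(run_id) - 1 FROM accounts)
    ),
    toda AS (
        SELECT * FROM accounts WHERE run_id = (SELECT MAX(run_id) FROM accounts)
    )
    SELECT
        toda.AccountNumber,
        CASE WHEN COALESCE(yest.AccountBranch, '~none~') <> COALESCE(toda.AccountBranch, '~none~')
             THEN yest.AccountBranch END AS yest_AccountBranch,
        CASE WHEN COALESCE(yest.AccountBranch, '~none~') <> COALESCE(toda.AccountBranch, '~none~')
             THEN toda.AccountBranch END AS toda_AccountBranch,
        CASE WHEN COALESCE(yest.AccountName, '~none~') <> COALESCE(toda.AccountName, '~none~')
             THEN yest.AccountName END AS yest_AccountName,
        CASE WHEN COALESCE(yest.AccountName, '~none~') <> COALESCE(toda.AccountName, '~none~')
             THEN toda.AccountName END AS toda_AccountName
    FROM toda
    INNER JOIN yest ON toda.AccountNumber = yest.AccountNumber
    -- Keep only accounts where at least one compared column differs.
    WHERE COALESCE(toda.AccountBranch, '~none~') <> COALESCE(yest.AccountBranch, '~none~')
       OR COALESCE(toda.AccountName,   '~none~') <> COALESCE(yest.AccountName,   '~none~')
    ORDER BY toda.AccountNumber
""").fetchall()

print(changes)
```

Account 100 is filtered out because nothing changed; unchanged columns on the remaining rows come back as NULL (None in Python).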

Do you have a date field? If so, you can use ROW_NUMBER partitioned by account. Exclude all accounts whose maximum row number is 1 (new accounts), and then subtract each account's load at MAX(rownumber) - 1 from its load at MAX(rownumber). Only return accounts where this difference is > 0. You can also use the LAG function to grab the account's previous load instead of MAX(rownumber) - 1.

SQL Group by CASE result

I have a simple SQL query on IBM DB2. I'm trying to run something as below:
select case when a.custID = 42285 then 'Credit' when a.unitID <> '' then 'Sales' when a.unitID = '' then 'Refund'
else a.unitID end TYPE, sum(a.value) as Total from transactions a
group by a.custID, a.unitID
This query runs, however I have a problem with group by a.custID - I'd prefer not to have this, but the query won't run unless it's present. I'd want to run the group by function based on the result of the CASE function, not the condition pool behind it. So, I'm looking something like:
group by TYPE
However adding group by TYPE reports an error message "Column or global variable TYPE not found". Also removing a.custID from group section reports "Column custID or expression in SELECT list not valid"
Is this going to be possible at all or do I need to review my CASE function and avoid using the custID column since at the moment I'm getting a grouping also based on custID column, even though it's not present in SELECT.
I understand why the grouping works as it does, I'm just wondering if it's possible to get rid of the custID grouping, but still maintain it within CASE function.

If you want terseness of code, you could use a subquery here:
SELECT TYPE, SUM(value) AS Total
FROM
(
SELECT CASE WHEN a.custID = 42285 THEN 'Credit'
WHEN a.unitID <> '' THEN 'Sales'
WHEN a.unitID = '' THEN 'Refund'
ELSE a.unitID END TYPE,
value
FROM transactions a
) t
GROUP BY TYPE;
The alternative to this would be to just repeat the CASE expression in the GROUP BY clause, which is ugly, but should still work. Note that some databases (e.g. MySQL) have overloaded GROUP BY and do allow aliases to be used in at least some cases.
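As a quick check that both options give the same totals, here is a sketch using Python's sqlite3 module in place of DB2. The transactions rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (custID INTEGER, unitID TEXT, value INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO transactions VALUES
        (42285, 'U1', 10),
        (42285, '',    5),
        (1,     'U1', 20),
        (2,     'U2', 30),
        (3,     '',    7);
""")

case_expr = """
    CASE WHEN a.custID = 42285 THEN 'Credit'
         WHEN a.unitID <> ''   THEN 'Sales'
         WHEN a.unitID = ''    THEN 'Refund'
         ELSE a.unitID END
"""

# Option 1: compute the label in a subquery, then group by the label.
via_subquery = conn.execute(f"""
    SELECT TYPE, SUM(value) AS Total
    FROM (SELECT {case_expr} AS TYPE, value FROM transactions a) t
    GROUP BY TYPE
    ORDER BY TYPE
""").fetchall()

# Option 2: repeat the CASE expression in GROUP BY.
via_repeated_case = conn.execute(f"""
    SELECT {case_expr} AS TYPE, SUM(a.value) AS Total
    FROM transactions a
    GROUP BY {case_expr}
    ORDER BY 1
""").fetchall()

print(via_subquery)
print(via_repeated_case)
```

Neither form groups by custID, so both 42285 rows collapse into a single 'Credit' total.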

The select statement below takes a long time to run

This select statement takes a long time to run. After investigating, I found that the problem is in the subquery / stored procedure. I would appreciate your help.
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
AND COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
AND COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)

Well there are a few issues with your SELECT statement that you should address:
First let's look at this condition:
COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
First you select DISTINCT cheque numbers with a not delivered status then you say you don't want this. Rather than saying I don't want non delivered it is much more readable to say I want delivered ones. However this is not really an issue but rather it would make your SELECT easier to read and understand.
Second let's look at your second cheque condition:
COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Here you want to exclude all cheques that have an entry in Q_COKE_AP_CHECKS_DELIVERY_ST_V. This makes your first DISTINCT condition redundant, as whatever cheque numbers it brings back would be rejected by this second condition anyway. I don't know whether the Oracle SQL engine is clever enough to work out this redundancy, but it could be the cause of your slowness, as SELECT DISTINCT can take longer to run.
In addition to this if you don't have them already I would recommend adding the following indexes:
CREATE INDEX index_1 ON q_coke_ap_checks_sign_status_v(coke_chq_number, coke_pay_supplier);
CREATE INDEX index_2 ON q_coke_ap_checks_sign_status_v(plan_id, coke_signature__a, coke_signature__b, coke_audit);
CREATE INDEX index_3 ON q_coke_ap_checks_delivery_st_v(coke_chq_number_deliver);
I named the indexes index_1, index_2 and index_3 to keep this easy to read; that is obviously not a good naming convention.
With these in place, your select should retrieve your data with acceptable performance. Of course, it all depends on the size and distribution of your data, which is hard to judge without performing specific data analysis.

Looking at your code, the WHERE clause seems to contain a redundant condition: the second NOT IN implies the first, so you can drop the first.
You could also turn the NOT IN clause into a MINUS: from the main query, subtract the same query INNER JOINed to your NOT IN subquery.
Finally, make sure you have a proper composite index on the table
Q_COKE_AP_CHECKS_SIGN_STATUS_V
cols (plan_id, COKE_SIGNATURE__A, COKE_SIGNATURE__B, COKE_AUDIT, COKE_CHQ_NUMBER, COKE_PAY_SUPPLIER)
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
MINUS
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
INNER JOIN (
SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
) T ON T.COKE_CHQ_NUMBER_DELIVER = apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V.COKE_CHQ_NUMBER
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
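The redundancy claim and the MINUS rewrite can be checked on a toy example. This sketch uses Python's sqlite3 module (where the operator is spelled EXCEPT rather than Oracle's MINUS), shortened hypothetical table and column names, and it assumes the delivery column contains no NULLs, since a NULL inside a NOT IN list makes the predicate match nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, shortened stand-ins for the two views in the question.
    CREATE TABLE sign_status (chq TEXT, supplier TEXT);
    CREATE TABLE delivery (chq TEXT, status TEXT);
    INSERT INTO sign_status VALUES ('C1', 'S1'), ('C2', 'S2'), ('C3', 'S3'), ('C4', 'S4');
    INSERT INTO delivery VALUES ('C1', 'DELIVERED'), ('C2', 'pending');
""")

# Original shape: both NOT IN conditions.
both_conditions = conn.execute("""
    SELECT DISTINCT chq, supplier FROM sign_status
    WHERE chq NOT IN (SELECT DISTINCT chq FROM delivery WHERE UPPER(status) <> 'DELIVERED')
      AND chq NOT IN (SELECT chq FROM delivery)
    ORDER BY chq
""").fetchall()

# Only the second condition: it already excludes everything the first one excludes.
second_only = conn.execute("""
    SELECT DISTINCT chq, supplier FROM sign_status
    WHERE chq NOT IN (SELECT chq FROM delivery)
    ORDER BY chq
""").fetchall()

# Set-difference form: all cheques minus those that join to a delivery row.
except_form = conn.execute("""
    SELECT chq, supplier FROM sign_status
    EXCEPT
    SELECT s.chq, s.supplier FROM sign_status s
    INNER JOIN delivery d ON d.chq = s.chq
    ORDER BY 1
""").fetchall()

print(both_conditions, second_only, except_form)
```

All three queries return the same rows here; whether the set-difference form is actually faster on your data is something only the execution plan can tell you.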

multiple count(distinct)

I get an error unless I remove one of the count(distinct ...). Can someone tell me why and how to fix it?
I'm in vfp. iif([condition],[if true],[else]) is equivalent to case when
SELECT * FROM dpgift where !nocalc AND rectype = "G" AND sol = "EM112" INTO CURSOR cGift
SELECT
list_code,
count(distinct iif(language != 'F' AND renew = '0' AND type = 'IN',donor,0)) as d_Count_E_New_Indiv,
count(distinct iif(language = 'F' AND renew = '0' AND type = 'IN',donor,0)) as d_Count_F_New_Indiv /*it works if i remove this*/
FROM cGift gift
LEFT JOIN
(select didnumb, language, type from dp) d
on cast(gift.donor as i) = cast(d.didnumb as i)
GROUP BY list_code
ORDER by list_code
edit:
apparently, you can't use multiple distinct commands on the same level. Any way around this?

VFP does NOT support two "DISTINCT" clauses in the same query... PERIOD... I've even tested on a simple table of my own, DIRECTLY from within VFP such as
select count( distinct Col1 ) as Cnt1, count( distinct col2 ) as Cnt2 from MyTable
causes a crash. I don't know why you are trying to use DISTINCT when you are just testing a condition... It appears you really want a COUNT of entries for each category of criteria rather than an actual DISTINCT
Because you are not "alias.field" referencing your columns in your query, I don't know which column is the basis of what. However, to help handle your DISTINCT, and it appears you are running from WITHIN a VFP app as you are using the "INTO CURSOR" clause (which would not be associated with any OleDB .net development), I would pre-query and group those criteria, something like...
select list_code,
donor,
max( iif( language != 'F' and renew = '0' and type = 'IN', 1, 0 )) as EQualified,
max( iif( language = 'F' and renew = '0' and type = 'IN', 1, 0 )) as FQualified
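from
cGift gift
LEFT JOIN
(select didnumb, language, type from dp) d
on cast(gift.donor as i) = cast(d.didnumb as i)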
group by
list_code,
donor
into
cursor cGroupedByDonor
so the above will ONLY get a value of 1 per donor per list code, no matter how many records qualify. In addition, if one record has an "F" and another does NOT, then you'll have a value of 1 in EACH of the columns... Then you can do something like...
select
list_code,
sum( EQualified ) as DistEQualified,
sum( FQualified ) as DistFQualified
from
cGroupedByDonor
group by
list_code
into
cursor cDistinctByListCode
then run from that...
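The two-step idea can be tried outside VFP as well. Below is a sketch using Python's sqlite3 module, with CASE in place of iif and a single hypothetical gifts table that already holds the joined donor attributes; the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened table standing in for cGift joined to dp.
    CREATE TABLE gifts (list_code TEXT, donor INTEGER, language TEXT, renew TEXT, type TEXT);
    INSERT INTO gifts VALUES
        ('L1', 1, 'E', '0', 'IN'),
        ('L1', 1, 'E', '0', 'IN'),   -- same donor twice, must count once
        ('L1', 2, 'F', '0', 'IN'),
        ('L1', 3, 'E', '1', 'IN'),   -- renewal, must not count
        ('L2', 4, 'F', '0', 'IN'),
        ('L2', 5, 'F', '0', 'IN');
""")

# Step 1 (inner query): one row per list_code and donor, flagged 1 if any record qualifies.
# Step 2 (outer query): summing the flags gives the distinct donor count per list_code.
counts = conn.execute("""
    SELECT list_code,
           SUM(EQualified) AS DistEQualified,
           SUM(FQualified) AS DistFQualified
    FROM (
        SELECT list_code,
               donor,
               MAX(CASE WHEN language <> 'F' AND renew = '0' AND type = 'IN' THEN 1 ELSE 0 END) AS EQualified,
               MAX(CASE WHEN language =  'F' AND renew = '0' AND type = 'IN' THEN 1 ELSE 0 END) AS FQualified
        FROM gifts
        GROUP BY list_code, donor
    ) AS grouped_by_donor
    GROUP BY list_code
    ORDER BY list_code
""").fetchall()

print(counts)
```

Donor 1 appears twice in L1 but contributes only 1 to DistEQualified, which is the effect the two count(distinct ...) calls were after.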

You can try using either another derived table or two to do the calculations you need, or using projections (queries in the field list). Without seeing the schema, it's hard to know which one will work for you.
