I get an error unless I remove one of the count(distinct ...) expressions. Can someone tell me why, and how to fix it?
I'm in VFP. iif([condition],[if true],[else]) is equivalent to CASE WHEN.
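For example, this VFP expression:
iif(language = 'F', donor, 0)
is the same as this standard SQL:
CASE WHEN language = 'F' THEN donor ELSE 0 END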
SELECT * FROM dpgift where !nocalc AND rectype = "G" AND sol = "EM112" INTO CURSOR cGift
SELECT
list_code,
count(distinct iif(language != 'F' AND renew = '0' AND type = 'IN',donor,0)) as d_Count_E_New_Indiv,
count(distinct iif(language = 'F' AND renew = '0' AND type = 'IN',donor,0)) as d_Count_F_New_Indiv /*it works if i remove this*/
FROM cGift gift
LEFT JOIN
(select didnumb, language, type from dp) d
on cast(gift.donor as i) = cast(d.didnumb as i)
GROUP BY list_code
ORDER by list_code
Edit: apparently you can't use multiple DISTINCT aggregates at the same level. Is there any way around this?
VFP does NOT support two "DISTINCT" clauses in the same query... PERIOD... I've even tested on a simple table of my own, DIRECTLY from within VFP, such as
select count( distinct Col1 ) as Cnt1, count( distinct col2 ) as Cnt2 from MyTable
which causes a crash. I don't know why you are trying to do DISTINCT as you are just testing a condition... It more accurately appears you just want a COUNT of entries per each category of criteria instead of an actual DISTINCT.
Because you are not "alias.field" referencing your columns in your query, I don't know which column comes from which table. However, to help handle your DISTINCT (and it appears you are running from WITHIN a VFP app, as you are using the "INTO CURSOR" clause, which would not be available in OleDB .NET development), I would pre-query and group those criteria, something like...
select
   list_code,
   donor,
   max( iif( language != 'F' and renew = '0' and type = 'IN', 1, 0 )) as EQualified,
   max( iif( language = 'F' and renew = '0' and type = 'IN', 1, 0 )) as FQualified
from
   cGift gift
      left join ( select didnumb, language, type from dp ) d
         on cast( gift.donor as i ) = cast( d.didnumb as i )
group by
   list_code,
   donor
into
   cursor cGroupedByDonor
So the above will ONLY get a count of 1 per donor per list code, no matter how many records qualify. In addition, if one record has an "F" and another does NOT, then you'll have a value of 1 in EACH of the columns... Then you can do something like...
select
list_code,
sum( EQualified ) as DistEQualified,
sum( FQualified ) as DistFQualified
from
cGroupedByDonor
group by
list_code
into
cursor cDistinctByListCode
then run from that...
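For example, a final pull might look like this (just a sketch, using the cursor and column names from the steps above):
select list_code, DistEQualified, DistFQualified
from cDistinctByListCode
order by list_code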
You can try using either another derived table or two to do the calculations you need, or using projections (queries in the field list). Without seeing the schema, it's hard to know which one will work for you.
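For instance, the projection route might look something like the sketch below (table and column names are taken from the question; you would need to confirm your engine accepts correlated subqueries in the field list before relying on this):
select
   g.list_code,
   ( select count( distinct d.didnumb )
     from dp d
     join cGift g2 on cast( g2.donor as i ) = cast( d.didnumb as i )
     where g2.list_code = g.list_code
       and d.language != 'F' and g2.renew = '0' and d.type = 'IN' ) as d_Count_E_New_Indiv,
   ( select count( distinct d.didnumb )
     from dp d
     join cGift g2 on cast( g2.donor as i ) = cast( d.didnumb as i )
     where g2.list_code = g.list_code
       and d.language = 'F' and g2.renew = '0' and d.type = 'IN' ) as d_Count_F_New_Indiv
from cGift g
group by g.list_code
order by g.list_code
Each count(distinct ...) now lives at its own subquery level, so the one-DISTINCT-per-level restriction no longer applies.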
Given the following db structure:
Regions:

id | name
---+-----
 1 | EU
 2 | US
 3 | SEA

Customers:

id | name  | region
---+-------+--------
 1 | peter |      1
 2 | henry |      1
 3 | john  |      2
There is also a PL/pgSQL function in place, sendShipment(), which takes (among other things) a sender and a receiver customer ID.
There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region - and we need to do this as part of sendShipment(). So from within this function, we need to query the customers table for both the sender and receiver IDs and verify that their region IDs are identical. We will also need the region ID itself for further processing down the line.
So maybe something like this:
SELECT DISTINCT region FROM customers WHERE id IN (?, ?)
The problem with this is that the result will be either two rows (if the customers are not within the same region) or a single row.
Is there a more elegant way of solving this constraint? I was thinking of SELECT INTO with a temporary table, or I could SELECT COUNT(DISTINCT region) and then do another SELECT for the actual value if the count is less than 2, but I'd like to avoid the performance hit if possible.
This query should work:
WITH q AS (
SELECT
COUNT( * ) AS CountCustomers,
COUNT( DISTINCT c.Region ) AS CountDistinctRegions,
-- MIN( c.Region ) AS MinRegion -- MIN doesn't accept uuid, and a window function
-- can't reference the ungrouped c.Region in an aggregate query, so take the
-- lowest Region from an ordered ARRAY_AGG instead:
( ARRAY_AGG( c.Region ORDER BY c.Region ) )[ 1 ] AS MinRegion
FROM
Customers AS c
WHERE
c.CustomerId = $senderCustomerId
OR
c.CustomerId = $receiverCustomerId
)
SELECT
CASE WHEN q.CountCustomers = 2 AND q.CountDistinctRegions = 1 THEN 'OK' ELSE 'BAD' END AS "Status",
CASE WHEN q.CountDistinctRegions = 1 THEN q.MinRegion END AS SingleRegion
FROM
q
The above query will always return a single row with 2 columns: Status and SingleRegion.
SQL doesn't have a "SINGLE( col )" aggregate function (i.e. a function that is NULL unless the aggregation group has a single row), but we can abuse MIN (or MAX) with a CASE WHEN COUNT() in a CTE or derived-table as an equivalent operation.
Alternatively, windowing-functions could be used, but annoyingly they don't work in GROUP BY queries despite being so similar, argh.
Once again, this is the ISO SQL committee's fault, not PostgreSQL's.
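A minimal sketch of that emulation, assuming a column type that MIN accepts (so not uuid; see below):
SELECT
    CASE WHEN COUNT( DISTINCT c.Region ) = 1 THEN MIN( c.Region ) END AS SingleRegion
FROM
    Customers AS c
WHERE
    c.CustomerId IN ( $senderCustomerId, $receiverCustomerId )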
As your Region column is UUID you cannot use it with MIN, which is why the query above takes the first element of an ordered ARRAY_AGG( c.Region ) instead.
As for the columns:
The Status column is either 'OK' or 'BAD' based on those business-constraints you mentioned. You might want to change it to a bit column instead of a textual one, though.
The SingleRegion column will be NOT NULL (with a valid region) if CountDistinctRegions = 1 regardless of CountCustomers, but feel free to change that, just in case you still want that info.
For anybody else who's interested in a simple solution, I finally came up with the (kind of obvious) way to do it:
SELECT
r.region
FROM
customers s
INNER JOIN customers r ON
s.region = r.region
WHERE s.id = 'sender_id' and r.id = 'receiver_id';
Huge credit to the answer above, whose author helped me out a lot on this and also posted a viable solution.
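For completeness, here is roughly how that check could sit inside sendShipment(); the variable names and the error handling are assumptions, not part of the original function:
-- hypothetical fragment of the PL/pgSQL function body
DECLARE
    v_region customers.region%TYPE;
BEGIN
    SELECT r.region
      INTO v_region
      FROM customers s
      JOIN customers r ON s.region = r.region
     WHERE s.id = sender_id
       AND r.id = receiver_id;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'sender and receiver must be in the same region';
    END IF;

    -- v_region is now available for further processing
END;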
I have the following query:
SELECT DISTINCT
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI,
SUM(VLRNOTA) AS AMOUNT
FROM TGFCAB CAB, TGFPAR PAR, TSIBAI BAI
WHERE CAB.CODPARC = PAR.CODPARC
AND PAR.CODBAI = BAI.CODBAI
AND CAB.TIPMOV = 'V'
AND STATUSNOTA = 'L'
AND PAR.CODCID = 5358
GROUP BY
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI
The query at the moment (company names and neighborhoods hidden in the result for obvious reasons) gives me, for those who don't understand Latin languages, the client code, company name, company neighborhood, and the total value of movements.
The WHERE clause only filters sales movements of companies from a given city.
But notice that in the SELECT statement, the column returning the aggregated total value of sales is a SUM().
My goal is to return only the company that has the maximum value in this column; if there's a tie, display both of them.
This is where I'm struggling, because I can't seem to find a simple solution. I tried to use
WHERE AMOUNT = MAX(AMOUNT)
but as expected it didn't work.
You tagged the question with a whole bunch of different databases; do you really use all of them?
Because "PL/SQL" reads as "Oracle". If that's so, here's one option:
with temp as
-- this is your current query
(select columns,
sum(vlrnota) as amount
from ...
where ...
)
-- query that returns what you asked for
select *
from temp t
where t.amount = (select max(a.amount)
from temp a
);
You should be able to achieve the same without the need for a subquery, using a window OVER() function:
WITH T AS (
SELECT
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI,
SUM(VLRNOTA) AS AMOUNT,
MAX(SUM(VLRNOTA)) over() AS MAMOUNT -- max of the per-group sums
FROM TGFCAB CAB
JOIN TGFPAR PAR ON PAR.CODPARC = CAB.CODPARC
JOIN TSIBAI BAI ON BAI.CODBAI = PAR.CODBAI
WHERE CAB.TIPMOV = 'V'
AND STATUSNOTA = 'L'
AND PAR.CODCID = 5358
GROUP BY CAB.CODPARC, PAR.RAZAOSOCIAL, BAI.NOMEBAI
)
SELECT CODPARC, RAZAOSOCIAL, NOMEBAI, AMOUNT
FROM T
WHERE AMOUNT=MAMOUNT
Note it's usually (always) beneficial to join tables using clear explicit join syntax. This should be fine cross-platform between Oracle & SQL Server.
GOAL: DETECT any difference between yesterday's table loads and today's loads. Each load inserts values associated with bank accounts. So I need a query that returns each individual account that has a difference, along with the changed value and the column it sits in.
I need data from several columns located in two different tables, AEI_GFXAccounts and AEI_GFXAccountSTP. Each time the table is loaded, it gets a "run_id" that is incremented by one. So the load at MAX(run_id) needs to be compared to the one at MAX(run_id) - 1.
I have tried the following query, but all it does is return all the columns I need. I now need logic that runs it once WHERE run_id = MAX(run_id), runs it again WHERE run_id = MAX(run_id) - 1, compares the two result sets, and shows the differences in custom-named columns, e.g. the old AccountBranch value AS 'WAS' and the new one AS 'IS NOW', for each column.
SELECT AEI_GFXAccounts.AccountNumber,
AccountBranch,
AccountName,
AccountType,
CostCenter,
TransactionLimit,
ClientName,
DailyCumulativeLimit
FROM AEI_GFXAccounts
JOIN AEI_GFXAccountSTP
ON (AEI_GFXAccounts.feed_id = AEI_GFXAccountSTP.feed_id
and AEI_GFXAccounts.run_id = AEI_GFXAccountSTP.run_id)
I use something similar to this to detect changes for a logging system:
WITH data AS (
SELECT
a.run_id,
a.AccountNumber,
?.AccountBranch,
?.AccountName,
?.AccountType,
?.CostCenter,
?.TransactionLimit,
?.ClientName,
?.DailyCumulativeLimit
FROM
AEI_GFXAccounts a
INNER JOIN AEI_GFXAccountSTP b
ON
a.feed_id = b.feed_id and
a.run_id = b.run_id
),
yest AS (
SELECT * FROM data WHERE run_id = (SELECT MAX(run_id)-1 FROM AEI_GFXAccounts)
),
toda AS (
SELECT * FROM data WHERE run_id = (SELECT MAX(run_id) FROM AEI_GFXAccounts)
)
SELECT
CASE WHEN COALESCE(yest.AccountBranch, 'x') <> COALESCE(toda.AccountBranch, 'x') THEN yest.AccountBranch END as yest_AccountBranch,
CASE WHEN COALESCE(yest.AccountBranch, 'x') <> COALESCE(toda.AccountBranch, 'x') THEN toda.AccountBranch END as toda_AccountBranch,
CASE WHEN COALESCE(yest.AccountName, 'x') <> COALESCE(toda.AccountName, 'x') THEN yest.AccountName END as yest_AccountName,
CASE WHEN COALESCE(yest.AccountName, 'x') <> COALESCE(toda.AccountName, 'x') THEN toda.AccountName END as toda_AccountName,
...
FROM
toda INNER JOIN yest ON toda.accountNumber = yest.accountNumber
Notes:
You didn't say which table some of your columns are from. I've prefixed them with ?. - replace these with a. or b. as appropriate (it's always good practice to fully qualify all your columns).
When you're repeating out the pattern in the bottom SELECT (above the ...), choose a value for the COALESCE that will never appear in the column. I'm using COALESCE as a quick way to avoid having to write CASE WHEN a IS NULL AND b IS NOT NULL OR b IS NULL AND a IS NOT NULL OR a != b, but the comparison fails if AccountName (for example) was 'x' yesterday and today it is NULL, because the NULL becomes 'x'. If you pick a sentinel that will never appear in the column, the check works out: NULLs are coalesced to something that can never occur in the real data, and hence the <> comparison behaves correctly.
If you don't care when a column goes to null today from a value yesterday, or was null yesterday but is a value today, you can ditch the coalesce and literally just do toda.X <> yest.X
New accounts today won't show up until tomorrow. If you want them to show up do toda LEFT JOIN yest .... Of course all their properties will show as new ;)
This query returns all the accounts regardless of whether any changes have been made. If you only want a list of accounts with changes you'll need a where clause that is similar to your case whens:
WHERE
COALESCE(toda.AccountBranch, 'x') <> COALESCE(yest.AccountBranch, 'x') OR
COALESCE(toda.AccountName, 'x') <> COALESCE(yest.AccountName, 'x') OR
...
Do you have a date field? If so, you can use ROW_NUMBER partitioned by account. Exclude all accounts whose max row number is 1 ("new accounts"), and then compare each account's MAX(rownumber) load against its MAX(rownumber) - 1 load, returning only accounts where a difference shows up. You can also use the LAG function to grab the previous load instead of MAX(rownumber) - 1, as sketched below.
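A minimal sketch of the LAG variant; it assumes a load_date column (not shown in the question) and compares just one column:
WITH ordered AS (
    SELECT AccountNumber,
           AccountBranch,
           load_date, -- hypothetical column recording when each load ran
           LAG(AccountBranch) OVER (PARTITION BY AccountNumber ORDER BY load_date) AS prev_AccountBranch
    FROM AEI_GFXAccounts
)
SELECT AccountNumber,
       prev_AccountBranch AS was,
       AccountBranch AS is_now
FROM ordered
WHERE prev_AccountBranch <> AccountBranch
  AND load_date = (SELECT MAX(load_date) FROM AEI_GFXAccounts)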
I have a simple SQL query on IBM DB2. I'm trying to run something as below:
select case when a.custID = 42285 then 'Credit'
            when a.unitID <> '' then 'Sales'
            when a.unitID = '' then 'Refund'
            else a.unitID end TYPE,
       sum(a.value) as Total
from transactions a
group by a.custID, a.unitID
This query runs, however I have a problem with group by a.custID - I'd prefer not to have it, but the query won't run unless it's present. I want the grouping to be based on the result of the CASE expression, not on the columns behind it. So I'm looking for something like:
group by TYPE
However, adding group by TYPE reports the error "Column or global variable TYPE not found", and removing a.custID from the GROUP BY reports "Column custID or expression in SELECT list not valid".
Is this going to be possible at all, or do I need to rework my CASE expression and avoid using the custID column? At the moment I'm getting a grouping based on custID as well, even though it's not present in the SELECT.
I understand why the grouping works as it does; I'm just wondering if it's possible to get rid of the custID grouping while still using custID within the CASE expression.
If you want terseness of code, you could use a subquery here:
SELECT TYPE, SUM(value) AS Total
FROM
(
SELECT CASE WHEN a.custID = 42285 THEN 'Credit'
WHEN a.unitID <> '' THEN 'Sales'
WHEN a.unitID = '' THEN 'Refund'
ELSE a.unitID END TYPE,
value
FROM transactions a
) t
GROUP BY TYPE;
The alternative to this would be to just repeat the CASE expression in the GROUP BY clause, which is ugly, but should still work. Note that some databases (e.g. MySQL) have overloaded GROUP BY and do allow aliases to be used in at least some cases.
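For example, the repeated-CASE version would look like this (same logic as the subquery above, just restated):
SELECT CASE WHEN a.custID = 42285 THEN 'Credit'
            WHEN a.unitID <> '' THEN 'Sales'
            WHEN a.unitID = '' THEN 'Refund'
            ELSE a.unitID END TYPE,
       SUM(a.value) AS Total
FROM transactions a
GROUP BY CASE WHEN a.custID = 42285 THEN 'Credit'
              WHEN a.unitID <> '' THEN 'Sales'
              WHEN a.unitID = '' THEN 'Refund'
              ELSE a.unitID END;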
This SELECT statement takes a long time to run; after my investigation I found that the problem is in the subqueries (the statement is part of a stored procedure). I'd appreciate your help.
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
AND COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
AND COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Well there are a few issues with your SELECT statement that you should address:
First let's look at this condition:
COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
First you select DISTINCT cheque numbers with a not-delivered status, then you say you don't want these. Rather than saying "I don't want non-delivered ones" it is much more readable to say "I want delivered ones". This is not really an issue as such, but it would make your SELECT easier to read and understand.
Second let's look at your second cheque condition:
COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Here you want to exclude all cheques that have an entry in Q_COKE_AP_CHECKS_DELIVERY_ST_V. This makes your first DISTINCT condition redundant, as whatever cheque numbers it brings back would be rejected by this second condition anyway. I don't know if the Oracle SQL engine is clever enough to work out this redundancy, so this could be part of your slowness, as SELECT DISTINCT can take longer to run.
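In other words, the statement could likely be reduced to the sketch below (verify it returns the same rows on your data before adopting it):
SELECT DISTINCT
    COKE_CHQ_NUMBER,
    COKE_PAY_SUPPLIER
FROM
    apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
    plan_id = 40192
    AND COKE_SIGNATURE__A = 'YES'
    AND COKE_SIGNATURE__B = 'YES'
    AND COKE_AUDIT = 'YES'
    AND COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
                                FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)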
In addition to this if you don't have them already I would recommend adding the following indexes:
CREATE INDEX index_1 ON q_coke_ap_checks_sign_status_v(coke_chq_number, coke_pay_supplier);
CREATE INDEX index_2 ON q_coke_ap_checks_sign_status_v(plan_id, coke_signature__a, coke_signature__b, coke_audit);
CREATE INDEX index_3 ON q_coke_ap_checks_delivery_st_v(coke_chq_number_deliver);
I named them index_1, 2, 3 for readability; obviously that is not a good naming convention.
With these in place, your SELECT should be optimized enough to retrieve your data with acceptable performance. But of course it all depends on the size and the distribution of your data, which is hard to judge without specific data analysis.
Looking at your code, it seems you have a redundant WHERE condition: the second NOT IN implies the first, so you could drop the first one.
You could also transform your NOT IN clause into a MINUS clause: take the same query and INNER JOIN it to your NOT IN subquery.
And lastly, be careful to have a proper composite index on table Q_COKE_AP_CHECKS_SIGN_STATUS_V covering the columns (plan_id, COKE_SIGNATURE__A, COKE_SIGNATURE__B, COKE_AUDIT, COKE_CHQ_NUMBER, COKE_PAY_SUPPLIER). The transformed query would be:
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
MINUS
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V S
INNER JOIN (
    SELECT COKE_CHQ_NUMBER_DELIVER
    FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
) T ON T.COKE_CHQ_NUMBER_DELIVER = S.COKE_CHQ_NUMBER
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'