Multiple columns in a subquery - sql

I am trying to find the products selected in previous week vs products selected for this week to find the churn in selection decision. Currently I am doing it for a single site and the result works fine with the correct number of records. Now I want to change the code where I get the output for 10 sites in a single query.
create temporary view removes as
select distinct
asin,
lagweek,
fc,
'removed' as state,
demand_pp,
instock_pp,
source,
filter_reason,
is_missing_in_pp,
is_missing_in_dc,
is_missing_in_nmi,
asin_nmi,
asin_pre,
asin_dc,
filter_reason_old,
asin_omi,
asin_preo,
asin_dco
from sel_old so where asin not in (select asin from sel_new sn where sn.lagweek = so.lagweek);
Since this is for a single site just doing asin not in (select asin ...) works fine but now I want to look at ASINs across multiple sites from the same logic. I tried the approach below but it returns incorrect number of records.
create temporary view removes as
select distinct
so.asin,
so.lagweek,
so.fc,
'removed' as state,
so.demand_pp,
so.instock_pp,
so.source,
so.filter_reason,
so.is_missing_in_pp,
so.is_missing_in_dc,
so.is_missing_in_nmi,
so.asin_nmi,
so.asin_pre,
so.asin_dc,
so.filter_reason_old,
so.asin_omi,
so.asin_preo,
so.asin_dco
from sel_old so left join (select asin, fc, lagweek from sel_new) as sn
on (so.asin <> sn.asin)
and (so.fc = sn.fc)
and (so.lagweek = sn.lagweek);
How should I approach this. I haven't been able find an easier solution if there are any.

You can use EXISTS predicate. It doesn't produce additional records, just tests the existence of some case and filters accordingly.
select distinct
so.asin,
so.lagweek,
so.fc,
'removed' as state,
so.demand_pp,
so.instock_pp,
so.source,
so.filter_reason,
so.is_missing_in_pp,
so.is_missing_in_dc,
so.is_missing_in_nmi,
so.asin_nmi,
so.asin_pre,
so.asin_dc,
so.filter_reason_old,
so.asin_omi,
so.asin_preo,
so.asin_dco
from sel_old so
where not exists (
select 1
from sel_new sn
where so.fc = sn.fc
and so.lagweek = sn.lagweek
and so.asin = sn.asin
)

Related

Select only the row with the max value, but the column with this info is a SUM()

I have the following query:
SELECT DISTINCT
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI,
SUM(VLRNOTA) AS AMOUNT
FROM TGFCAB CAB, TGFPAR PAR, TSIBAI BAI
WHERE CAB.CODPARC = PAR.CODPARC
AND PAR.CODBAI = BAI.CODBAI
AND CAB.TIPMOV = 'V'
AND STATUSNOTA = 'L'
AND PAR.CODCID = 5358
GROUP BY
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI
Which the result is this. Company names and neighborhood hid for obvious reasons
The query at the moment, for those who don't understand Latin languages, is giving me clients, company name, company neighborhood, and the total value of movements.
in the WHERE clause it is only filtering sales movements of companies from an established city.
But if you notice in the Select statement, the column that is retuning the value that aggregates the total amount of value of sales is a SUM().
My goal is to return only the company that have the maximum value of this column, if its a tie, display both of em.
This is where i'm struggling, cause i can't seem to find a simple solution. I tried to use
WHERE AMOUNT = MAX(AMOUNT)
But as expected it didn't work
You tagged the question with the whole bunch of different databases; do you really use all of them?
Because, "PL/SQL" reads as "Oracle". If that's so, here's one option.
with temp as
-- this is your current query
(select columns,
sum(vrlnota) as amount
from ...
where ...
)
-- query that returns what you asked for
select *
from temp t
where t.amount = (select max(a.amount)
from temp a
);
You should be able to achieve the same without the need for a subquery using window over() function,
WITH T AS (
SELECT
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI,
SUM(VLRNOTA) AS AMOUNT,
MAX(VLRNOTA) over() AS MAMOUNT
FROM TGFCAB CAB
JOIN TGFPAR PAR ON PAR.CODPARC = CAB.CODPARC
JOIN TSIBAI BAI ON BAI.CODBAI = PAR.CODBAI
WHERE CAB.TIPMOV = 'V'
AND STATUSNOTA = 'L'
AND PAR.CODCID = 5358
GROUP BY CAB.CODPARC, PAR.RAZAOSOCIAL, BAI.NOMEBAI
)
SELECT CODPARC, RAZAOSOCIAL, NOMEBAI, AMOUNT
FROM T
WHERE AMOUNT=MAMOUNT
Note it's usually (always) beneficial to join tables using clear explicit join syntax. This should be fine cross-platform between Oracle & SQL Server.

the below select statement takes a long in running

This select statement takes a long time running, after my investigation I found that the problem un subquery, stored procedure, please I appreciate your help.
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
AND COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
AND COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Well there are a few issues with your SELECT statement that you should address:
First let's look at this condition:
COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
First you select DISTINCT cheque numbers with a not delivered status then you say you don't want this. Rather than saying I don't want non delivered it is much more readable to say I want delivered ones. However this is not really an issue but rather it would make your SELECT easier to read and understand.
Second let's look at your second cheque condition:
COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Here you want to exclude all cheques that have an entry in Q_COKE_AP_CHECKS_DELIVERY_ST_V. This makes your first DISTINCT condition redundant as whatever cheques numbers will bring back would be rejected by this second condition of yours. I do't know if Oracle SQL engine is clever enough to work out this redundancy but this could cause your slowness as SELECT distinct can take longer to run
In addition to this if you don't have them already I would recommend adding the following indexes:
CREATE INDEX index_1 ON q_coke_ap_checks_sign_status_v(coke_chq_number, coke_pay_supplier);
CREATE INDEX index_2 ON q_coke_ap_checks_sign_status_v(plan_id, coke_signature__a, coke_signature__b, coke_audit);
CREATE INDEX index_3 ON q_coke_ap_checks_delivery_st_v(coke_chq_number_deliver);
I called the index_1,2,3 for easy to read obviously not a good naming convention.
With this in place your select should be optimized to retrieve you your data in an acceptable performance. But of course it all depends on the size and the distribution of your data which is hard to control without performing specific data analysis.
looking to you code .. seems you have redundant where condition the second NOT IN implies the firts so you could avoid
you could also transform you NOT IN clause in a MINUS clause .. join the same query with INNER join of you not in subquery
and last be careful you have proper composite index on table
Q_COKE_AP_CHECKS_SIGN_STATUS_V
cols (plan_id,COKE_SIGNATURE__A , COKE_SIGNATURE__B, COKE_AUDIT, COKE_CHQ_NUMBER, COKE_PAY_SUPPLIER)
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
MINUS
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
INNER JOIN (
SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
) T ON T.COKE_CHQ_NUMBER_DELIVER = apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'

Comparing outputs from 2 separate sql queries

I am trying to write a query for an inventory system. In order to do this I have to count the number of duplicates in one table and then compare it to the default quantity values taken from another table.
Here are two queries which I am working with currently:
SELECT Template_ID,
COUNT(Template_ID) AS Howmuch, t.name
FROM consumables e, templates t
Where t.consumable_type_id = '2410980'
GROUP BY template_id, t.name
HAVING ( COUNT(Template_ID) > 1 )
The query above takes account of each unique Template Id and gives me a count of how many duplicates are present which tells me the amount of a single substance.
Select
property_descriptors.default_value,
templates.name
From
templates,
Property_descriptors
Where
templates.consumable_type_id = '858190' And
templates.audit_id = property_descriptors.audit_id And
property_descriptors.name = 'Reorder Point'
This query finds the amount of each individual substance we would like to have in our system.
My Issue is that I dont know of a way to compare the results from the 2 queries.
Ideally, I want the query to only give the substance which have a duplicate count lower than their default value (found using query 2).
any ideas would be appreciated!
here is the table schema for reference:
Consumables
ID|Template_ID|
Templates
ID|Property_Descriptor_ID|Name|audit_id
Property_Descriptors
ID| Name|Default_Value|audit_id
Thanks!
SELECT q1.name, q2.default_value - q1.Howmuch FROM
(SELECT Template_ID, COUNT(Template_ID) AS Howmuch, t.name
FROM consumables e, templates t
Where t.consumable_type_id = '2410980'
GROUP BY template_id, t.name
HAVING ( COUNT(Template_ID) > 1 )) q1,
(SELECT property_descriptors.default_value default_value,
templates.name name
FROM
templates,
Property_descriptors
WHERE
templates.consumable_type_id = '858190' And
templates.audit_id = property_descriptors.audit_id And
property_descriptors.name = 'Reorder Point') q2
where q1.name = q2.name
should do the trick you'll need to clean up the result a bit to work away negative results
or add q2.default_value - q1.Howmuch > 0 in the outer WHERE clause

multiple count(distinct)

I get an error unless I remove one of the count(distinct ...). Can someone tell me why and how to fix it?
I'm in vfp. iif([condition],[if true],[else]) is equivalent to case when
SELECT * FROM dpgift where !nocalc AND rectype = "G" AND sol = "EM112" INTO CURSOR cGift
SELECT
list_code,
count(distinct iif(language != 'F' AND renew = '0' AND type = 'IN',donor,0)) as d_Count_E_New_Indiv,
count(distinct iif(language = 'F' AND renew = '0' AND type = 'IN',donor,0)) as d_Count_F_New_Indiv /*it works if i remove this*/
FROM cGift gift
LEFT JOIN
(select didnumb, language, type from dp) d
on cast(gift.donor as i) = cast(d.didnumb as i)
GROUP BY list_code
ORDER by list_code
edit:
apparently, you can't use multiple distinct commands on the same level. Any way around this?
VFP does NOT support two "DISTINCT" clauses in the same query... PERIOD... I've even tested on a simple table of my own, DIRECTLY from within VFP such as
select count( distinct Col1 ) as Cnt1, count( distinct col2 ) as Cnt2 from MyTable
causes a crash. I don't know why you are trying to do DISTINCT as you are just testing a condition... I more accurately appears you just want a COUNT of entries per each category of criteria instead of actually DISTINCT
Because you are not "alias.field" referencing your columns in your query, I don't know which column is the basis of what. However, to help handle your DISTINCT, and it appears you are running from WITHIN a VFP app as you are using the "INTO CURSOR" clause (which would not be associated with any OleDB .net development), I would pre-query and group those criteria, something like...
select list_code,
donor,
max( iif( language != 'F' and renew = '0' and type = 'IN', 1, 0 )) as EQualified,
max( iif( language = 'F' and renew = '0' and type = 'IN', 1, 0 )) as FQualified
from
list_code
group by
list_code,
donor
into
cursor cGroupedByDonor
so the above will ONLY get a count of 1 per donor per list code, no matter how many records that qualify. In addition, if one record as an "F" and another does NOT, then you'll have a value of 1 in EACH of the columns... Then you can do something like...
select
list_code,
sum( EQualified ) as DistEQualified,
sum( FQualified ) as DistFQualified
from
cGroupedByDonor
group by
list_code
into
cursor cDistinctByListCode
then run from that...
You can try using either another derived table or two to do the calculations you need, or using projections (queries in the field list). Without seeing the schema, it's hard to know which one will work for you.

How to optimize this low-performance MySQL query?

I’m currently using the following query for jsPerf. In the likely case you don’t know jsPerf — there are two tables: pages containing the test cases / revisions, and tests containing the code snippets for the tests inside the test cases.
There are currently 937 records in pages and 3817 records in tests.
As you can see, it takes quite a while to load the “Browse jsPerf” page where this query is used.
The query takes about 7 seconds to execute:
SELECT
id AS pID,
slug AS url,
revision,
title,
published,
updated,
(
SELECT COUNT(*)
FROM pages
WHERE slug = url
AND visible = "y"
) AS revisionCount,
(
SELECT COUNT(*)
FROM tests
WHERE pageID = pID
) AS testCount
FROM pages
WHERE updated IN (
SELECT MAX(updated)
FROM pages
WHERE visible = "y"
GROUP BY slug
)
AND visible = "y"
ORDER BY updated DESC
I’ve added indexes on all fields that appear in WHERE clauses. Should I add more?
How can this query be optimized?
P.S. I know I could implement a caching system in PHP — I probably will, so please don’t tell me :) I’d just really like to find out how this query could be improved, too.
Use:
SELECT x.id AS pID,
x.slug AS url,
x.revision,
x.title,
x.published,
x.updated,
y.revisionCount,
COALESCE(z.testCount, 0) AS testCount
FROM pages x
JOIN (SELECT p.slug,
MAX(p.updated) AS max_updated,
COUNT(*) AS revisionCount
FROM pages p
WHERE p.visible = 'y'
GROUP BY p.slug) y ON y.slug = x.slug
AND y.max_updated = x.updated
LEFT JOIN (SELECT t.pageid,
COUNT(*) AS testCount
FROM tests t
GROUP BY t.pageid) z ON z.pageid = x.id
ORDER BY updated DESC
You want to learn how to use EXPLAIN. This will execute the sql statement, and show you which indexes are being used, and what row scans are being performed. The goal is to reduce the number of row scans (ie, the database searching row by row for values).
You may want to try the subqueries one at a time to see which one is slowest.
This query:
SELECT MAX(updated)
FROM pages
WHERE visible = "y"
GROUP BY slug
Makes it sort the result by slug. This is probably slow.