Why is my WHERE clause not working as intended? - sql

I am running a query that sets a number based on a WHERE clause which checks information from several tables:
UPDATE EstadoDoc
SET PuntajePrevio = 10
FROM Investigador i, Autores a, Documentos d, EstadoDoc e, Periodo p
WHERE i.IdInv = a.IdInv
AND a.IdDoc = d.IdDoc
AND d.Tipo = 'AR'
AND e.Estado IN ('A', 'RyR')
AND p.FechaDesde <= e.FechaEst
AND p.FechaHasta >= e.FechaEst
AND p.IdPeriodo = 2019
This is the query I'm running. It's supposed to put the value 10 in PuntajePrevio for the rows that satisfy the WHERE clause, but I'm noticing it's not working correctly because it is applying the value to 2 rows where it shouldn't.
To explain further, here is a screenshot of the data it affected. It should only take into consideration the rows that have d.Tipo = 'AR', but it does not.

Related

TPC-DS Query 6: Why do we need 'where j.i_category = i.i_category' condition?

I'm going through TPC-DS for Amazon Athena.
It was fine until query 5.
I ran into a problem on query 6 (shown below):
select a.ca_state state, count(*) cnt
from customer_address a
,customer c
,store_sales s
,date_dim d
,item i
where a.ca_address_sk = c.c_current_addr_sk
and c.c_customer_sk = s.ss_customer_sk
and s.ss_sold_date_sk = d.d_date_sk
and s.ss_item_sk = i.i_item_sk
and d.d_month_seq =
(select distinct (d_month_seq)
from date_dim
where d_year = 2002
and d_moy = 3 )
and i.i_current_price > 1.2 *
(select avg(j.i_current_price)
from item j
where j.i_category = i.i_category)
group by a.ca_state
having count(*) >= 10
order by cnt, a.ca_state
limit 100;
It took more than 30 minutes, so it failed with a timeout.
I tried to find which part causes the problem, so I checked the WHERE conditions and found where j.i_category = i.i_category in the last part of the WHERE clause.
I don't know why this condition is needed, so I deleted it and the query ran OK.
Can you tell me why this part is needed?
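The j.i_category = i.i_category predicate is the subquery's correlation condition.
If you remove it from the subquery
select avg(j.i_current_price)
from item j
where j.i_category = i.i_category
the subquery becomes uncorrelated: it turns into a global aggregation over the item table, which is easy to calculate and which the query engine needs to compute only once.
It also changes the meaning of the query: each item's price is then compared with the average over all items rather than the average of its own category.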
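To make the difference concrete, here is a minimal sketch using Python's sqlite3 module instead of Athena. The four-row item table and its prices are made up for illustration; only the column names come from the query above.

```python
import sqlite3

# Hypothetical miniature of the TPC-DS item table (values are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (i_item_sk INTEGER, i_category TEXT, i_current_price REAL);
    INSERT INTO item VALUES
        (1, 'Books', 10.0),
        (2, 'Books', 30.0),
        (3, 'Music', 100.0),
        (4, 'Music', 300.0);
""")

# Correlated: the average is recomputed per category of the outer row.
correlated = conn.execute("""
    SELECT i.i_item_sk FROM item i
    WHERE i.i_current_price > 1.2 * (
        SELECT avg(j.i_current_price) FROM item j
        WHERE j.i_category = i.i_category)
    ORDER BY i.i_item_sk
""").fetchall()

# Uncorrelated: one global average, computed once for the whole table.
uncorrelated = conn.execute("""
    SELECT i.i_item_sk FROM item i
    WHERE i.i_current_price > 1.2 * (
        SELECT avg(j.i_current_price) FROM item j)
    ORDER BY i.i_item_sk
""").fetchall()

print(correlated)
print(uncorrelated)
```

With this data the two queries return different item sets, so dropping the condition makes the query cheaper but also makes it answer a different question.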
If you want a fast, performant query engine on AWS, I can recommend Starburst Presto (disclaimer: I am from Starburst). See https://www.concurrencylabs.com/blog/starburst-presto-vs-aws-redshift/ for a related comparison (note: this is not a comparison with Athena).
If it doesn't have to be that fast, you can use PrestoSQL on EMR (note that "PrestoSQL" and "Presto" components on EMR are not the same thing).

Difference between combination of AND and OR when placed inside the parentheses vs when placed stand alone in a conditional statement

I am getting a different number of results when I have the script like the following:
select count(distinct(t1.ticketid)),t1.TicketStatus from ticket as t1
inner join Timepoint as t2 on t1.TicketID=t2.ticketid
where
t2.BuilderAnalystID=10 and t1.SubmissionDT >='04-01-2018' AND
(t1.TicketBuildStatusID<>12 OR
t1.TicketBuildStatusID<>11 OR
t1.TicketBuildStatusID<>10
)
And when I use it like this:
select count(distinct(t1.ticketid)),t1.TicketStatus from ticket as t1
inner join Timepoint as t2 on t1.TicketID=t2.ticketid
where
t2.BuilderAnalystID=10 and t1.SubmissionDT >='04-01-2018' AND
t1.TicketBuildStatusID<>12 AND
t1.TicketBuildStatusID<>11 AND
t1.TicketBuildStatusID<>10
Can someone tell me why there is a difference? To me the logic is the same!
Thanks,
In your example, it won't matter because you have all AND clauses. That said, you need to be aware of precedence (i.e. order of operations), where NOT comes before AND, AND comes before OR, and so on.
So just like 3 + 3 x 0 means 3 + (3 x 0), A or B and C means A or (B and C), even if that's not what you meant.
So in cases where you have mixed AND and OR clauses, it matters a lot.
Consider this example:
select *
from A, B
where A.id = B.id and A.family_code = 'ABC' or A.family_code = 'DEF'
It's horrible code, I admit, but for illustrative purposes, bear with me.
You may have meant this:
select *
from A, B
where A.id = B.id and (A.family_code = 'ABC' or A.family_code = 'DEF')
but you said this:
select *
from A, B
where (A.id = B.id and A.family_code = 'ABC') or A.family_code = 'DEF'
Which in the construct above completely blows away your join, resulting in a cartesian product for all cases where the family code is DEF.
So bottom line: when you mix clauses (AND, OR, NOT), it's best to use parentheses to be explicit about what you mean, even when it's not necessary.
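The effect is easy to check on a toy dataset. The sketch below uses Python's sqlite3 module; tables A and B match the example above, but their three rows each are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, family_code TEXT);
    CREATE TABLE B (id INTEGER);
    INSERT INTO A VALUES (1, 'ABC'), (2, 'DEF'), (3, 'XYZ');
    INSERT INTO B VALUES (1), (2), (3);
""")

def count_rows(where_clause):
    """Count the rows of the A x B cross join that satisfy the given WHERE text."""
    sql = f"SELECT count(*) FROM A, B WHERE {where_clause}"
    return conn.execute(sql).fetchone()[0]

# No parentheses: AND binds tighter than OR.
unparenthesized = count_rows(
    "A.id = B.id AND A.family_code = 'ABC' OR A.family_code = 'DEF'")
# What was probably meant.
intended = count_rows(
    "A.id = B.id AND (A.family_code = 'ABC' OR A.family_code = 'DEF')")
# What the unparenthesized form actually means.
as_parsed = count_rows(
    "(A.id = B.id AND A.family_code = 'ABC') OR A.family_code = 'DEF'")

print(unparenthesized, intended, as_parsed)
```

The unparenthesized count equals the as_parsed count and is larger than the intended one, because the 'DEF' row of A is paired with every row of B.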
Food for thought.
-- EDIT --
The question was changed after I wrote this so that the queries were NOT the same (ands were changed to ors).
Hopefully my explanation still helps.
After the edit to your question there will now be a difference.
t2.BuilderAnalystID=10 and t1.SubmissionDT >='04-01-2018' AND
(t1.TicketBuildStatusID<>12 OR
t1.TicketBuildStatusID<>11 OR
t1.TicketBuildStatusID<>10
)
This condition also matches rows where t1.TicketBuildStatusID is 10, 11, or 12. It says the value must not be 10 (which admits 11 and 12), or not be 11 (which admits 10 and 12), or not be 12 (which admits 10 and 11).
Yes, those queries will produce different results. In fact, the first query will return every value of TicketBuildStatusID unless it has a value of NULL.
When TicketBuildStatusID has a value of 12, it doesn't have a value of 11 or 10, so the expression (t1.TicketBuildStatusID<>12 OR t1.TicketBuildStatusID<>11 OR t1.TicketBuildStatusID<>10) is true. If it has a value of 11, the same reasoning applies, and likewise for every other possible value apart from NULL (because {expression}<>NULL evaluates to NULL, which is not true).
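As a quick check of that claim, here is a small sketch using Python's sqlite3 module. The five-row ticket table is invented for illustration and contains one NULL status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (TicketID INTEGER, TicketBuildStatusID INTEGER);
    INSERT INTO ticket VALUES (1, 10), (2, 11), (3, 12), (4, 13), (5, NULL);
""")

# The OR-of-inequalities filter from the question.
or_ids = [row[0] for row in conn.execute("""
    SELECT TicketID FROM ticket
    WHERE (TicketBuildStatusID <> 12 OR
           TicketBuildStatusID <> 11 OR
           TicketBuildStatusID <> 10)
    ORDER BY TicketID
""")]

# For comparison: simply excluding NULL statuses.
not_null_ids = [row[0] for row in conn.execute("""
    SELECT TicketID FROM ticket
    WHERE TicketBuildStatusID IS NOT NULL
    ORDER BY TicketID
""")]

print(or_ids)
print(not_null_ids)
```

Both lists come out identical: the OR condition keeps statuses 10, 11 and 12 and only drops the row whose status is NULL.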
When you do this:
AND
(t1.TicketBuildStatusID<>12 OR
t1.TicketBuildStatusID<>11 OR
t1.TicketBuildStatusID<>10)
you are effectively applying no filter, because any one condition evaluating to true makes the whole parenthesized expression true, i.e.
true AND (true or false or false) = true
When you do this, all conditions must hold, i.e. the status must not be 12, 11, or 10:
AND
t1.TicketBuildStatusID<>12 AND
t1.TicketBuildStatusID<>11 AND
t1.TicketBuildStatusID<>10
OR isn't the logic that you want. If x = 12, then it is not 11, so every non-NULL value satisfies x <> 12 OR x <> 11.
So, just simplify the logic and use not in:
select count(distinct t1.ticketid), t1.TicketStatus
from ticket t1 inner join
Timepoint t2
on t1.TicketID = t2.ticketid
where t2.BuilderAnalystID = 10 and
t1.SubmissionDT >= '2018-04-01' and
t1.TicketBuildStatusID not in (12, 11, 10)
Notes:
distinct is not a function, so there is no need to place the following expression in parentheses.
Use standard date formats: either 'YYYYMMDD' or 'YYYY-MM-DD'.
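A short sketch, again with Python's sqlite3 module and a made-up ticket table, suggests that the chain of AND-ed inequalities and NOT IN select the same rows, including how they treat a NULL status:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (TicketID INTEGER, TicketBuildStatusID INTEGER);
    INSERT INTO ticket VALUES (1, 10), (2, 11), (3, 12), (4, 13), (5, NULL);
""")

def ticket_ids(condition):
    """Return the TicketIDs that satisfy the given WHERE condition text."""
    sql = f"SELECT TicketID FROM ticket WHERE {condition} ORDER BY TicketID"
    return [row[0] for row in conn.execute(sql)]

and_ids = ticket_ids(
    "TicketBuildStatusID <> 12 AND "
    "TicketBuildStatusID <> 11 AND "
    "TicketBuildStatusID <> 10")
not_in_ids = ticket_ids("TicketBuildStatusID NOT IN (12, 11, 10)")

print(and_ids)
print(not_in_ids)
```

In both forms the NULL row is filtered out, since a comparison with NULL never evaluates to true.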

SQL GROUP BY function returning incorrect SUM amount

I've been working on this problem, researching what I could be doing wrong but I can't seem to find an answer or fault in the code that I've written. I'm currently extracting data from a MS SQL Server database, with a WHERE clause successfully filtering the results to what I want. I get roughly 4 rows per employee, and want to add together a value column. The moment I add the GROUP BY clause against the employee ID, and put a SUM against the value, I'm getting a number that is completely wrong. I suspect the SQL code is ignoring my WHERE clause.
Below is a small selection of data:
hr_empl_code hr_doll_paid
1 20.5
1 51.25
1 102.49
1 560
I expect that a GROUP BY and SUM clause would give me the value of 734.24. The value I'm given is 211461.12. Through troubleshooting, I added a COUNT(*) column to my query to work out how many lines it's running against, and it's giving a result of 1152, which further reinforces my belief that it's ignoring my WHERE clause.
My SQL code is as below. Most of it has been generated by the front-end application that I'm running it from, so there is some additional code in there that I believe does assist the query.
SELECT DISTINCT
T000.hr_empl_code,
SUM(T175.hr_doll_paid)
FROM
hrtempnm T000,
qmvempms T001,
hrtmspay T166,
hrtpaytp T175,
hrtptype T177
WHERE 1 = 1
AND T000.hr_empl_code = T001.hr_empl_code
AND T001.hr_empl_code = T166.hr_empl_code
AND T001.hr_empl_code = T175.hr_empl_code
AND T001.hr_ploy_ment = T166.hr_ploy_ment
AND T001.hr_ploy_ment = T175.hr_ploy_ment
AND T175.hr_paym_code = T177.hr_paym_code
AND T166.hr_pyrl_code = 'f' AND T166.hr_paid_dati = 20180404
AND (T175.hr_paym_type = 'd' OR T175.hr_paym_type = 't')
GROUP BY T000.hr_empl_code
ORDER BY hr_empl_code
I'm really lost as to where it could be going wrong. I have stripped out the additional WHERE ... AND conditions and brought it down to just T166.hr_empl_code = T175.hr_empl_code, but it doesn't make a difference.
By no means am I any expert in SQL Server and queries, but I have decent grasp on the technology. Any help would be very appreciated!
Group by is not wrong, how you are using it is wrong.
SELECT
T000.hr_empl_code,
T.totpaid
FROM
hrtempnm T000
inner join (SELECT
hr_empl_code,
SUM(hr_doll_paid) as totPaid
FROM
hrtpaytp T175
where hr_paym_type = 'd' OR hr_paym_type = 't'
GROUP BY hr_empl_code
) T on t.hr_empl_code = T000.hr_empl_code
where exists
(select * from qmvempms T001,
hrtmspay T166,
hrtpaytp T175,
hrtptype T177
WHERE T000.hr_empl_code = T001.hr_empl_code
AND T001.hr_empl_code = T166.hr_empl_code
AND T001.hr_empl_code = T175.hr_empl_code
AND T001.hr_ploy_ment = T166.hr_ploy_ment
AND T001.hr_ploy_ment = T175.hr_ploy_ment
AND T175.hr_paym_code = T177.hr_paym_code
AND T166.hr_pyrl_code = 'f' AND T166.hr_paid_dati = 20180404
)
ORDER BY hr_empl_code
Note: It would be clearer if you had used explicit joins instead of old-style joining in the WHERE clause.
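The inflated SUM is what you would expect when a join repeats each payment row once per matching row in another table. The sketch below reproduces that with Python's sqlite3 module. The tables pay and payroll_run are hypothetical stand-ins for hrtpaytp and hrtmspay, and the three payroll_run rows are invented; only the four payment amounts come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pay (hr_empl_code INTEGER, hr_doll_paid REAL);
    CREATE TABLE payroll_run (hr_empl_code INTEGER, run_id INTEGER);
    INSERT INTO pay VALUES (1, 20.5), (1, 51.25), (1, 102.49), (1, 560);
    INSERT INTO payroll_run VALUES (1, 100), (1, 101), (1, 102);
""")

# Joining on employee code alone repeats every payment once per payroll_run row.
joined_count, joined_sum = conn.execute("""
    SELECT count(*), sum(p.hr_doll_paid)
    FROM pay p, payroll_run r
    WHERE p.hr_empl_code = r.hr_empl_code
""").fetchone()

# Filtering with EXISTS keeps one copy of each payment row.
exists_sum = conn.execute("""
    SELECT sum(p.hr_doll_paid)
    FROM pay p
    WHERE EXISTS (SELECT 1 FROM payroll_run r
                  WHERE r.hr_empl_code = p.hr_empl_code)
""").fetchone()[0]

print(joined_count, round(joined_sum, 2), round(exists_sum, 2))
```

Here the joined version sees 12 rows instead of 4 and sums to three times the expected total, while the EXISTS version gives the expected 734.24. The WHERE clause is not being ignored; the rows are being multiplied before the SUM runs.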

When 2 output values are returned it should display the hardcoded one, and if 1 output value is returned it should display that output itself

When I execute a query for input parameter ABC it returns two values (Partner, Smith); whenever two values are returned, Smith will always be one of them.
But whenever the same query is executed with input parameter 'xyz' it returns only one value.
Now my requirement is: whenever I execute the query, if it returns two values, only SMITH must be returned in the output, and if the same query returns one output value, it should display that one output value itself.
The query below satisfies the first part of my requirement but not the second. Instead of displaying the single output value, it returns NULL whenever only one value is returned.
SELECT R.REGION_GID
FROM GTM_TRANSACTION T,
GTM_TRANSACTION_INVOLVED_PARTY P,
CONTACT C,
LOCATION L,
REGION_DETAIL R
WHERE T.GTM_TRANSACTION_GID=P.GTM_TRANSACTION_GID
AND R.COUNTRY_CODE3_GID = L.COUNTRY_CODE3_GID
AND R.REGION_GID LIKE 'SSN/BP.GTM_COMPL%'
AND L.LOCATION_GID = C.LOCATION_GID
AND P.INVOLVED_PARTY_CONTACT_GID=C.CONTACT_GID
AND P.INVOLVED_PARTY_QUAL_GID='SHIP_FROM'
AND T.GTM_TRANSACTION_GID=$SHIP_FROM
INTERSECT
SELECT R.REGION_GID
FROM GTM_TRANSACTION T,
GTM_TRANSACTION_INVOLVED_PARTY P,
CONTACT C,
LOCATION L,
REGION_DETAIL R
WHERE T.GTM_TRANSACTION_GID=P.GTM_TRANSACTION_GID
AND R.COUNTRY_CODE3_GID = L.COUNTRY_CODE3_GID
AND R.REGION_GID ='SSN/BP.GTM_COMPL_NO_CODING'
AND L.LOCATION_GID = C.LOCATION_GID
AND P.INVOLVED_PARTY_CONTACT_GID=C.CONTACT_GID
AND P.INVOLVED_PARTY_QUAL_GID='SHIP_FROM'
AND T.GTM_TRANSACTION_GID=$SHIP_FROM
As far as I can tell, the only difference between the two halves of your INTERSECT is in the filters on R.REGION_GID. The first half has:
R.REGION_GID LIKE 'SSN/BP.GTM_COMPL%'
while the second has
R.REGION_GID = 'SSN/BP.GTM_COMPL_NO_CODING'
Given how INTERSECT works, I think this means the first half is redundant. The only question then is whether the second half is returning one row or two rows. You want it to always return one row, with 'SMITH' taking precedence. The following logic may be what you want (as a bonus, I've tidied up your JOINs too):
SELECT TOP 1
R.REGION_GID
FROM
GTM_TRANSACTION T
JOIN GTM_TRANSACTION_INVOLVED_PARTY P ON
T.GTM_TRANSACTION_GID=P.GTM_TRANSACTION_GID
JOIN CONTACT C ON
P.INVOLVED_PARTY_CONTACT_GID=C.CONTACT_GID
JOIN LOCATION L ON
L.LOCATION_GID = C.LOCATION_GID
JOIN REGION_DETAIL R ON
R.COUNTRY_CODE3_GID = L.COUNTRY_CODE3_GID
WHERE
R.REGION_GID ='SSN/BP.GTM_COMPL_NO_CODING'
AND P.INVOLVED_PARTY_QUAL_GID='SHIP_FROM'
AND T.GTM_TRANSACTION_GID=$SHIP_FROM
ORDER BY
CASE WHEN R.REGION_GID = 'SMITH' then 1 else 2 end
That last line will want to be something like: CASE WHEN R.REGION_GID = 'SMITH' then 1 else 2 end, but you haven't told us much about your data, so I really don't know.
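The "preferred value first, then take one row" idea can be sketched as follows with Python's sqlite3 module. SQLite has no TOP 1, so LIMIT 1 is used in its place. The result table, its columns, and the value 'Jones' are all hypothetical, since the real data was not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (param TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?)",
    [("ABC", "Partner"), ("ABC", "Smith"), ("xyz", "Jones")],
)

def pick(param):
    """Return 'Smith' if it is among the matches, otherwise the single match."""
    row = conn.execute(
        """
        SELECT value FROM result
        WHERE param = ?
        ORDER BY CASE WHEN value = 'Smith' THEN 1 ELSE 2 END
        LIMIT 1
        """,
        (param,),
    ).fetchone()
    return row[0] if row else None

print(pick("ABC"))
print(pick("xyz"))
```

Because the CASE expression only affects the sort order, a lone non-Smith row is still returned rather than being replaced by NULL.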

Oracle Statement repeating itself 9 times!! I need suggestions

I have a script for an Oracle database that must have something wrong with it. The code runs okay, but each line of the result is followed by another 8 lines with the same values: instead of showing just one line, it shows 9.
What could be wrong in the script below?
SELECT P.IDPESSOA AS CodigoCompanhia,
E.NOMEEMPRESA AS NomeCompanhia,
L.LACDEBCRE AS TipoOperacao,
L.TRGDTINCLUSAO AS DataLancamento,
P.PLNDATDIA AS DataContabilizacao,
L.PLACONTA AS ContaContabil,
C.PLANOME AS DescricaoContaContabil,
C.PLANATUREZA AS NaturezaContaContabil,
L.LACVALOR AS ValorContabil,
M.MOESIGLA AS Moeda,
L.LACHIST1||' '||L.LACHIST2||' '||L.LACHIST3||' '||L.LACHIST4||' '||L.LACHIST5 AS HistoricoLancamento,
C.PLAGRUPO AS ClasseConta,
L.IDUSUARIOINCLUSAO AS PreparerID,
NVL(PE.NOME, U.NOMEUSUARIO) AS NomeCompletoFuncionario,
CG.DESCRICAO AS CargoFuncionario,
TO_CHAR(PC.PERNOME)||'/'||TO_CHAR(P.PEREXERCICIO) AS PeriodoContabil,
TO_CHAR(P.PLNPLANIL)||'-'||TO_CHAR(L.LACNUMLAN)||'-'||LACDEBCRE AS NumeroDocumento,
C.PLASUBGR3 AS ContasCompensacaoTransitorias,
P.PLNPLANIL, L.LACNUMLAN, P.PLNCODIGO
FROM PLANILHA P, LANCAMENTO L, EMPRESAPROP E, PLANOCONTA C, PERIODO PC,
PARAMGLOBAL PG, MOEDA M, USUARIOSISTEMA U, PESSOA PE, FUNCIONARIO F, CARGO CG
WHERE P.PLNCODIGO = L.PLNCODIGO
AND P.PLNDATDIA >= TO_DATE('01/01/2013','DD/MM/YYYY')
AND P.PLNDATDIA <= TO_DATE('30/04/2013','DD/MM/YYYY')
AND P.PEREXERCICIO = PC.PEREXERCICIO
AND P.PERNUMERO = PC.PERNUMERO
AND P.IDPESSOA = E.IDPESSOA
AND P.IDPESSOA = PG.IDPESSOA
AND PG.MOEDACORRENTE = M.MOECODIGO
AND L.PLANO = C.PLANO
AND L.PLACONTA = C.PLACONTA
AND L.IDUSUARIOINCLUSAO = U.IDUSUARIO
AND U.IDUSUARIO = PE.IDPESSOA(+)
AND PE.IDPESSOA = F.IDPESSOA(+)
AND F.IDCARGO = CG.IDCARGO(+)
AND P.IDPESSOA = 1
ORDER BY P.IDPESSOA, P.PLNDATDIA, P.PLNPLANIL, L.LACNUMLAN;
thanks
Add the primary key of every table that's referenced in the select to your list of output columns. This seems very much like a 1:n relation on one of the tables; you can identify which it is by checking which PKs are different in each set of "equal" 9 rows.
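Here is a minimal sketch of that diagnostic using Python's sqlite3 module. The two tables are cut-down, hypothetical versions of PLANILHA and PARAMGLOBAL, and the id column and the idea that PARAMGLOBAL is the 1:n side are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE planilha (plncodigo INTEGER PRIMARY KEY, idpessoa INTEGER);
    CREATE TABLE paramglobal (id INTEGER PRIMARY KEY, idpessoa INTEGER);
    INSERT INTO planilha VALUES (1, 1);
    INSERT INTO paramglobal VALUES (1, 1), (2, 1), (3, 1);
""")

# Select the primary key of each joined table alongside the other columns.
rows = conn.execute("""
    SELECT p.plncodigo, pg.id
    FROM planilha p, paramglobal pg
    WHERE p.idpessoa = pg.idpessoa
    ORDER BY pg.id
""").fetchall()

# The key that varies within a group of "duplicates" points at the 1:n table.
distinct_planilha_keys = len({row[0] for row in rows})
distinct_paramglobal_keys = len({row[1] for row in rows})

print(rows)
print(distinct_planilha_keys, distinct_paramglobal_keys)
```

In this toy case the planilha key stays constant while the paramglobal key takes three values, so paramglobal is the table multiplying the rows and needs an extra join condition or filter.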
You can also add a GROUP BY clause before the ORDER BY clause.