Does using the EXISTS instruction improve this query? - sql

I'm learning how to improve some of my queries. For example, I read that EXISTS often does better than IN, so I made the following modification, but I'm not 100% sure I'm getting the same results by using EXISTS. So far it returns the same results when I execute it with different periods, but I'm still doubtful. Could someone clarify whether EXISTS actually performs better here, and whether it is doing the same thing?
VARIABLE periodo STRING;
SELECT MO.STRMOVANOMES,
C.STRCLINOMBRE,
MO.STRCLINIT,
MO.STROBLOBLIGSARC,
MO.NUMPROCODIGO,
MO.NUMMOVTIPOCREDITO,
MO.NUMMOVTIPOGARANTIA,
MO.STROBLMODALIDAD,
MO.NUMMOVCALIFICACION,
MO.NUMMOVVLRCAPCREDITO,
MO.NUMMOVVLRINTCREDI,
MO.NUMMOVVLRCAPOTRO
FROM TBLMOVOBLIGACIONES MO,
TBLCLIENTES C
WHERE MO.STRMOVANOMES = :periodo
AND C.STRCLITIPOID ='N'
AND MO.NUMMOVTIPOCREDITO = 2
AND MO.NUMPROCODIGO IN (1,3,4,5,6,7,10,15,16,18,24,29,32,38,40,43,44,45,49,51,54,55,56,70,71,72,73,74,75,76,77,78,81,82,83,84,85)--ALL APPLICATIONS
AND C.STRCLINIT=MO.STRCLINIT
AND SUBSTR(MO.STRCLINIT,1,9) >= 600000000
AND SUBSTR(MO.STRCLINIT,1,9) <= 999999999;
-- USING EXISTS
SELECT MO.STRMOVANOMES,
C.STRCLINOMBRE,
MO.STRCLINIT,
MO.STROBLOBLIGSARC,
MO.NUMPROCODIGO,
MO.NUMMOVTIPOCREDITO,
MO.NUMMOVTIPOGARANTIA,
MO.STROBLMODALIDAD,
MO.NUMMOVCALIFICACION,
MO.NUMMOVVLRCAPCREDITO,
MO.NUMMOVVLRINTCREDI,
MO.NUMMOVVLRCAPOTRO
FROM TBLMOVOBLIGACIONES MO,
TBLCLIENTES C
WHERE MO.STRMOVANOMES = :periodo
AND C.STRCLITIPOID ='N'
AND MO.NUMMOVTIPOCREDITO = 2
AND EXISTS (SELECT NUMPROCODIGO FROM TBLMOVOBLIGACIONES)--ALL APPLICATIONS
AND C.STRCLINIT=MO.STRCLINIT
AND SUBSTR(MO.STRCLINIT,1,9) >= 600000000
AND SUBSTR(MO.STRCLINIT,1,9) <= 999999999;

First, some general info
In theory, the IN clause is used when you need to check a value against a list of values, so the DB has to loop through the list looking for the given value. EXISTS, on the other hand, lets the DB perform a quick query (which is what a DB is designed for) to check whether there is at least one row.
If the list of values you're checking against is stored in a table, it is generally better to use EXISTS. That was the theory.
In practice, though, starting from version 10 (which is a very old one) Oracle builds the same execution plans for IN and EXISTS queries; here is a good explanation if you need one from a more experienced guy.
To check whether it affects query execution, compare the explain plans (google how to get them in your dev tool) and their overall "costs" (cost comparison is not always a good thing to rely on, but it is okay for a case like this).
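For example, a minimal sketch of how to capture and read a plan in Oracle (the shortened IN list is illustrative only; run the same commands against each of your two full queries and compare the reported costs):
EXPLAIN PLAN SET STATEMENT_ID = 'with_in' FOR
SELECT MO.STRCLINIT
FROM TBLMOVOBLIGACIONES MO
WHERE MO.NUMPROCODIGO IN (1, 3, 4, 5); -- shortened list, for illustration
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, 'with_in'));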
Now, back to your query.
The EXISTS in the way you are using it will always return true whenever table TBLMOVOBLIGACIONES has at least one row. I'm not sure this is what you were looking for.
I believe you need to use it in something like this manner:
AND EXISTS (SELECT 1 FROM TBLMOVOBLIGACIONES WHERE NUMPROCODIGO = MO.NUMPROCODIGO)
That way it will check whether there is at least one record connected to the data you selected in the previous step.
The next step I see here is that the EXISTS clause can easily be converted to a regular table join, so instead of choosing between IN and EXISTS you could write the following:
FROM TBLMOVOBLIGACIONES MO,
TBLCLIENTES C,
TBLMOVOBLIGACIONES M
WHERE MO.STRMOVANOMES = :periodo
AND C.STRCLITIPOID ='N'
AND MO.NUMMOVTIPOCREDITO = 2
AND M.NUMPROCODIGO = MO.NUMPROCODIGO--ALL APPLICATIONS
AND C.STRCLINIT=MO.STRCLINIT
AND SUBSTR(MO.STRCLINIT,1,9) >= 600000000
AND SUBSTR(MO.STRCLINIT,1,9) <= 999999999;
It is worth considering because a join gives the database more freedom to vary the join order of the tables to achieve better performance.

Related

SQL Query for Search Page

I am working on a small project for an online databases course and I was wondering if you could help me out with a problem I am having.
I have a web page that searches a movie database and retrieves specific columns using a movie initial input field, a number input field, and a code field. These will all be converted to strings and used as user input for the query.
Below is what I tried before:
select A.CD, A.INIT, A.NBR, A.STN, A.ST, A.CRET_ID, A.CMNT, A.DT
from MOVIE_ONE A
where A.INIT = :init
AND A.CD = :cd
AND A.NBR = :num
The way the page must search is in three different cases:
(initial and number)
(code)
(initial and number and code)
The cases have to be independent, so if certain fields are empty but a certain case is fulfilled, the search still goes through. It also must be done in one query. I am stuck on how to implement the cases.
The parameters in the query are taken from the Java parameters in the method found in an SQLJ file.
If you could possibly provide some aid on how I can go about this problem, I'd greatly appreciate it!
Consider wrapping the equality expressions in NVL (synonymous with COALESCE) so that, if a parameter input is blank, the corresponding column is checked against itself. Also, be sure to kick the a-b-c table aliasing habit.
SELECT m.CD, m.INIT, m.NBR, m.STN, m.ST, m.CRET_ID, m.CMNT, m.DT
FROM MOVIE_ONE m
WHERE m.INIT = NVL(:init, m.INIT)
AND m.CD = NVL(:cd, m.CD)
AND m.NBR = COALESCE(:num, m.NBR)
To demonstrate, consider the DB2 fiddles below, where each case can be checked by adjusting the value-CTE parameters, all running on exactly the same data.
Case 1
WITH
i(init) AS (VALUES('db2')),
c(cd) AS (VALUES(NULL)),
n(num) AS (VALUES(53)),
cte AS
...
Case 2
WITH
i(init) AS (VALUES(NULL)),
c(cd) AS (VALUES(2018)),
n(num) AS (VALUES(NULL)),
cte AS
...
Case 3
WITH
i(init) AS (VALUES('db2')),
c(cd) AS (VALUES(2018)),
n(num) AS (VALUES(53)),
cte AS
...
However, do be aware that the fiddle runs different SQL due to the nature of the data (i.e., doubles and dates). The query does reflect the same concept, though, with NVL matching expressions on both sides.
SELECT *
FROM cte, i, c, n
WHERE cte.mytype = NVL(i.init, cte.mytype)
AND YEAR(CAST(cte.mydate AS date)) = NVL(c.cd, YEAR(CAST(cte.mydate AS date)))
AND ROUND(cte.mynum, 0) = NVL(n.num, ROUND(cte.mynum, 0));
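A common alternative (not from the original answer, just a sketch reusing the same column and bind names) is to make each parameter optional with an explicit IS NULL check:
SELECT m.CD, m.INIT, m.NBR, m.STN, m.ST, m.CRET_ID, m.CMNT, m.DT
FROM MOVIE_ONE m
WHERE (:init IS NULL OR m.INIT = :init) -- filter is skipped when the parameter is blank
AND (:cd IS NULL OR m.CD = :cd)
AND (:num IS NULL OR m.NBR = :num);
One behavioral difference to keep in mind: the NVL form drops rows where the column itself is NULL even when the parameter is blank, while the IS NULL form keeps them.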

Order by in subquery behaving differently than native sql query?

So I am honestly a little puzzled by this!
I have a query that returns a set of transactions containing both repair costs and the odometer reading at the time of repair at the master level. To get an accurate cost-per-mile reading, I need a subquery to get both the first meter reading between a start date and an end date, and an ending meter reading.
(select top 1 wf2.ro_num
from wotrans wotr2
left join wofile wf2
on wotr2.rop_ro_num = wf2.ro_num
and wotr2.rop_fac = wf2.ro_fac
where wotr.rop_veh_num = wotr2.rop_veh_num
and wotr.rop_veh_facility = wotr2.rop_veh_facility
AND ((@sdate = '01/01/1900 00:00:00' and wotr2.rop_tran_date = 0)
OR ([dbo].[udf_RTA_ConvertDateInt](@sdate) <= wotr2.rop_tran_date
AND [dbo].[udf_RTA_ConvertDateInt](@edate) >= wotr2.rop_tran_date))
order by wotr2.rop_tran_date asc) as highMeter
The reason I have the tables aliased as xx2 is that those tables are also used in the main query, and I don't want the two to interact with each other except to pull the correct vehicle number and facility.
Basically, when I run the main query it returns a value that is not correct; it returns the one that is second (keep in mind that the first and second have the same date). But when I take the subquery, copy and paste it into its own query and run it, it returns the correct value.
I do have a workaround for this, but I am just curious as to why this is happening. I have searched quite a bit and found not much (other than the fact that people don't like ORDER BYs in subqueries). Talking to one of my friends who also does quite a bit of SQL scripting, it looks to us as if the subquery orders differently than the subquery run by itself when multiple values are the same for the ORDER BY (i.e. 10 dates of 08/05/2016).
Any ideas would be helpful!
Like I said, I have a workaround that works in this one case, but I don't know yet if it will work on a larger dataset.
Let me know if you want more code.
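For what it's worth, this is expected behaviour: when several rows tie on rop_tran_date, the engine is free to return any of them first, and the choice can differ between plans. A minimal sketch of a fix (assuming wf2.ro_num is a suitable tiebreaker) is to make the ORDER BY deterministic:
order by wotr2.rop_tran_date asc, wf2.ro_num asc -- the tiebreaker makes TOP 1 deterministic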

Why is my SQL query so slow in one database?

I am using SQL Server 2008 R2 and I have two databases: one has 11,000 records and the other just 3,000 records. When I run this query
SELECT Right(rtrim(tbltransac.No_Faktur),6) as NoUrut,
tbltransac.No_Faktur,
tbltransac.No_FakturP,
tbltransac.Kd_Plg,
Tblcust.Nm_Plg,
GRANDTOTAL AS Total_Faktur,
tbltransac.Nm_Pajak,
tbltransac.Tgl_Faktur,
tbltransac.Tgl_FakturP,
tbltransac.Total_Distribusi
FROM Tblcust
INNER JOIN ViewGrandtotal AS tbltransac ON Tblcust.Kd_Plg = tbltransac.Kd_Plg
WHERE tbltransac.Kd_Trn = 'J'
and year(tbltransac.tgl_faktur)=2015
And ISNULL(tbltransac.No_OPJ,'') <> 'SHOP'
Order by Right(rtrim(tbltransac.No_Faktur),6) Desc
It takes 1 minute 30 seconds on my server with 3,000 records (I query it using SQL Server Management Studio), but it only takes 3 seconds to run the query on my other server, which has 11,000 records. What's wrong with my database?
I've already tried backing up my 3,000-record database and restoring it on the 11,000-record server; it's faster there, taking 30 seconds for the query, but that's still annoying compared to the 11,000-record server. They have the same specs.
How did this happen? What should I check? I looked at the event viewer, resource monitor and SQL management logs, and I couldn't find any errors or blocked connections. There is no wrong routing either.
Please help... It only started happening a week ago; before that it was fine, and I haven't touched the server in more than a month...
As already mentioned before, you have three issues in your query.
Just as an example, change the query to this one:
SELECT Right(rtrim(tbltransac.No_Faktur),6) as NoUrut,
tbltransac.No_Faktur,
tbltransac.No_FakturP,
tbltransac.Kd_Plg,
Tblcust.Nm_Plg,
GRANDTOTAL AS Total_Faktur,
tbltransac.Nm_Pajak,
tbltransac.Tgl_Faktur,
tbltransac.Tgl_FakturP,
tbltransac.Total_Distribusi
FROM Tblcust
INNER JOIN ViewGrandtotal AS tbltransac ON Tblcust.Kd_Plg = tbltransac.Kd_Plg
WHERE tbltransac.Kd_Trn = 'J'
and tbltransac.tgl_faktur BETWEEN '20150101' AND '20151231'
And tbltransac.No_OPJ <> 'SHOP'
Order by NoUrut Desc --Only if you need a sorted output in the datalayer
Another idea, if your ViewGrandtotal is quite large, could be to pre-filter this table before you join it. Sometimes SQL Server doesn't come up with a good plan on its own and needs a loving touch to push it in the right direction.
Maybe something like this one:
SELECT Right(rtrim(vgt.No_Faktur),6) as NoUrut,
vgt.No_Faktur,
vgt.No_FakturP,
vgt.Kd_Plg,
tc.Nm_Plg,
vgt.Total_Faktur,
vgt.Nm_Pajak,
vgt.Tgl_Faktur,
vgt.Tgl_FakturP,
vgt.Total_Distribusi
FROM (SELECT Kd_Plg, Nm_Plg FROM Tblcust GROUP BY Kd_Plg, Nm_Plg) as tc -- Pre-filter on just the needed columns, made distinct.
INNER JOIN (
-- Pre-filter ViewGrandtotal
SELECT DISTINCT vgt.No_Faktur, vgt.No_FakturP, vgt.Kd_Plg, vgt.GRANDTOTAL AS Total_Faktur, vgt.Nm_Pajak,
vgt.Tgl_Faktur, vgt.Tgl_FakturP, vgt.Total_Distribusi
FROM ViewGrandtotal AS vgt
WHERE vgt.Kd_Trn = 'J'
and vgt.tgl_faktur BETWEEN '20150101' AND '20151231'
And vgt.No_OPJ <> 'SHOP'
) as vgt
ON tc.Kd_Plg = vgt.Kd_Plg
Order by NoUrut Desc --Only if you need a sorted output in the datalayer
The pre-filtering can help the optimizer generate a better plan.
Another issue could simply be multi-threading. Maybe your query gets a parallel plan because it reaches the cost threshold for parallelism due to the 11,000 rows, while the other query just gets a serial plan due to its lower row count. You can take a look at the generated plans by including the actual execution plan in your SSMS query.
Maybe you can compare those plans to get a clue. If that doesn't help, you can post them here to get some feedback from me.
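A quick way to compare the two servers beyond the graphical plan (just a sketch, reusing the names from your query) is to capture IO/time statistics and to force a serial plan, which tells you whether parallelism is the difference:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run once as-is, then once with parallelism disabled for comparison
SELECT tbltransac.No_Faktur, Tblcust.Nm_Plg
FROM Tblcust
INNER JOIN ViewGrandtotal AS tbltransac ON Tblcust.Kd_Plg = tbltransac.Kd_Plg
WHERE tbltransac.Kd_Trn = 'J'
AND tbltransac.tgl_faktur BETWEEN '20150101' AND '20151231'
AND tbltransac.No_OPJ <> 'SHOP'
OPTION (MAXDOP 1); -- serial plan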
I hope this helps. It's not easy to give good hints without knowing table structures, table sizes, performance counters, etc. :-)
Best regards,
Ionic
Note: first of all, you should avoid any function on a column in the WHERE clause, like this one:
year(tbltransac.tgl_faktur)=2015
Here is Aaron Bertrand on how to work with dates in the WHERE clause:
"In order to make best possible use of indexes, and to avoid capturing too few or too many rows, the best possible way to achieve the above query is":
SELECT COUNT(*)
FROM dbo.SomeLogTable
WHERE DateColumn >= '20091011'
AND DateColumn < '20091012';
And I can't understand your logic in this piece of code, but it is a bad part of your query too:
ISNULL(tbltransac.No_OPJ,'') <> 'SHOP'
Actually NULL <> 'SHOP' in this case, so why are you replacing it with ''?
Thanks and good luck
Here are some recommendations:
year(tbltransac.tgl_faktur)=2015 - replace this with tbltransac.tgl_faktur >= '20150101' and tbltransac.tgl_faktur < '20160101'.
ISNULL(tbltransac.No_OPJ,'') <> 'SHOP' - replace this with tbltransac.No_OPJ <> 'SHOP', because NULL <> 'SHOP' anyway.
Order by Right(rtrim(tbltransac.No_Faktur),6) Desc - remove this, because ordering should be done in the presentation layer rather than in the data layer (a combined rewrite is sketched below).
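Putting those recommendations together, a sketch of the rewritten query (reusing the table and column names from the question) could look like this:
SELECT Right(rtrim(tbltransac.No_Faktur),6) as NoUrut,
tbltransac.No_Faktur,
tbltransac.No_FakturP,
tbltransac.Kd_Plg,
Tblcust.Nm_Plg,
GRANDTOTAL AS Total_Faktur,
tbltransac.Nm_Pajak,
tbltransac.Tgl_Faktur,
tbltransac.Tgl_FakturP,
tbltransac.Total_Distribusi
FROM Tblcust
INNER JOIN ViewGrandtotal AS tbltransac ON Tblcust.Kd_Plg = tbltransac.Kd_Plg
WHERE tbltransac.Kd_Trn = 'J'
and tbltransac.tgl_faktur >= '20150101'
and tbltransac.tgl_faktur < '20160101'
And tbltransac.No_OPJ <> 'SHOP';
-- ORDER BY removed; sort in the presentation layer if needed.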
Read about SARG arguments and predicates:
What makes a SQL statement sargable?
To write an appropriate SARG, you must ensure that a column that has an index on it appears in the predicate alone, not as a function parameter. SARGs must take the form of: column inclusive_operator <constant or variable>, or <constant or variable> inclusive_operator column. The column name appears alone on one side of the expression, and the constant or calculated value appears on the other side. Inclusive operators include the operators =, >, <, >=, <=, BETWEEN, and LIKE. However, the LIKE operator is inclusive only if you do not use a wildcard % or _ at the beginning of the string you are comparing the column to.

Are SQL subqueries within CASE WHEN run once for a query, or for each row?

Basically, is the code below efficient (given that I cannot use # variables in MonetDB), or will it call the subqueries more than once each?
CREATE VIEW sys.share26cuts_2007 (peorglopnr,share26cuts_2007) AS (
SELECT peorglopnr, CASE WHEN share26_2007 < (SELECT QUANTILE(share26_2007,0.25) FROM sys.share26_2007) THEN 1
WHEN share26_2007 < (SELECT QUANTILE(share26_2007,0.5) FROM sys.share26_2007) THEN 2
WHEN share26_2007 < (SELECT QUANTILE(share26_2007,0.75) FROM sys.share26_2007) THEN 3
ELSE 4 END AS share26cuts_2007
FROM sys.share26_2007
);
I would rather not use a user-defined function either, though this came up in other questions.
As e.g. GoatCO commented on the question, this is probably better avoided. The SET command that MonetDB supports can be used with SELECT, as in the code below. The remaining question is why all the quantiles are zero when my data surely is not (I also got division-by-zero errors before using NULLIF). I show more of the code now.
CREATE VIEW sys.over26_2007 (personlopnr,peorglopnr,loneink,below26_loneink) AS (
SELECT personlopnr,peorglopnr,loneink, CASE WHEN fodelsear < 1981 THEN 0 ELSE loneink END AS below26_loneink
FROM sys.ds_chocker_lev_lisaindivid_2007
);
SELECT COUNT(*) FROM over26_2007;
CREATE VIEW sys.share26_2007 (peorglopnr,share26_2007) AS (
SELECT peorglopnr, SUM(below26_loneink)/NULLIF(SUM(loneink),0)
FROM sys.over26_2007
GROUP BY peorglopnr
);
SELECT COUNT(*) FROM share26_2007;
DECLARE firstq double;
SET firstq = (SELECT QUANTILE(share26_2007,0.25) FROM sys.share26_2007);
SELECT firstq;
DECLARE secondq double;
SET secondq = (SELECT QUANTILE(share26_2007,0.5) FROM sys.share26_2007);
SELECT secondq;
DECLARE thirdq double;
SET thirdq = (SELECT QUANTILE(share26_2007,0.75) FROM sys.share26_2007);
SELECT thirdq;
CREATE VIEW sys.share26cuts_2007 (peorglopnr,share26cuts_2007) AS (
SELECT peorglopnr, CASE WHEN share26_2007 < firstq THEN 1
WHEN share26_2007 < secondq THEN 2
WHEN share26_2007 < thirdq THEN 3
ELSE 4 END AS share26cuts_2007
FROM sys.share26_2007
);
SELECT COUNT(*) FROM share26cuts_2007;
About inspecting plans, MonetDB supports:
PLAN to see the logical plan
EXPLAIN to see the physical plan in terms of MAL instructions
TRACE: same as EXPLAIN, but it actually executes the MAL plan and returns statistics for all instructions.
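For example, a minimal sketch prefixing one of the statements from the question (the exact output format depends on the MonetDB version):
PLAN SELECT QUANTILE(share26_2007, 0.25) FROM sys.share26_2007;
EXPLAIN SELECT QUANTILE(share26_2007, 0.25) FROM sys.share26_2007;
TRACE SELECT QUANTILE(share26_2007, 0.25) FROM sys.share26_2007;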
About your question on repeating the subqueries: in principle nothing will be repeated, and you will not need to take care of it explicitly.
That's because the default optimization pipeline includes the commonTerms optimizer. Your SQL will be translated to a sequence of MAL instructions, with duplicate calls. MAL is designed to be simple: many short instruction calls, a bit like assembly, which operate on columns, not rows (hence don't apply the same reasoning you would use for SQL Server when you think of execution efficiency). This makes it easier to run some optimizations on it. The commonTerms optimizer will detect the duplicate calls and reuse all results that it can. This is done per-column. So you should really be able to run your query and be happy.
However, I said in principle. Not all cases will be detected (though most will), plus some limitations have been introduced on purpose. For example, the search space for detecting duplicates is a window (too small for my taste; I have removed it altogether in my installations) over the whole MAL plan: if the duplicate instruction is too far down the plan, it won't be detected. This was done for efficiency. In your case, that single query isn't that big, but if it is part of a longer chain of views, then all these views will compile into a large MAL plan, which might make commonTerms less effective. It really depends on the actual queries.
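If you want to check which optimizer pipeline (and therefore whether commonTerms) is active in your session, something like the following should work; note that sys.optimizers() and the optimizer session variable are assumptions based on recent MonetDB releases, so adjust to your version:
SELECT * FROM sys.optimizers(); -- list the available optimizer pipelines and their contents
SET optimizer = 'minimal_pipe'; -- switch the session to a different pipeline for comparison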

Selecting top n Oracle records with ROWNUM still valid in subquery?

I have the following FireBird query:
update hrs h
set h.plan_week_id=
(select first 1 c.plan_week_id from calendar c
where c.calendar_id=h.calendar_id)
where coalesce(h.calendar_id,0) <> 0
(Intention: for records in hrs with a (non-zero) calendar_id,
take calendar.plan_week_id and put it in hrs.plan_week_id)
The trick to selecting the first record in Oracle is to use WHERE ROWNUM=1, and if I understand correctly I do not have to use ROWNUM in a separate outer query because I 'only' match ROWNUM=1 - thanks SO for suggesting questions that may already have your answer ;-)
This would make it
update hrs h
set h.plan_week_id=
(select c.plan_week_id from calendar c
where (c.calendar_id=h.calendar_id) and (rownum=1))
where coalesce(h.calendar_id,0) <> 0
I'm actually using the 'first record' together with the selection of only one field to guarantee that I get exactly one value back, which can be put into h.plan_week_id.
Question: Will the above query work under Oracle as intended?
Right now, I do not have a populated Oracle DB at hand to run the query on.
Like Nicholas Krasnov said, you can test it in SQL Fiddle.
But if you ever find yourself about to use where rownum = 1 in a subquery, alarm bells should go off, because in 90% of the cases you are doing something wrong. Very rarely will you actually want a random value; a rownum = 1 is only valid when all the selected values are the same.
In this case I expect calendar_id to be a primary key in calendar. Therefore each record in hrs can only have one plan_week_id selected. So the where rownum = 1 is not required.
And to answer your question: yes, it will run just fine. Though the brackets around each where clause are also not required and, in fact, only confuse (me).
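A minimal sketch of the same update without the rownum filter, assuming calendar_id really is the primary key of calendar (so the scalar subquery can return at most one row):
update hrs h
set h.plan_week_id=
(select c.plan_week_id from calendar c
where c.calendar_id=h.calendar_id)
where coalesce(h.calendar_id,0) <> 0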