How to optimize the following SQL query?

The query is:
SELECT DISTINCT A.X1, A.X2, A.X3, TO_DATE(A.EVNT_SCHED_DATE,'DD-Mon-YYYY') AS EVNT_SCHED_DATE,
A.X4, A.MOVEMENT_TYPE, TRIM(A.EFFECTIVE_STATUS) AS STATUS, A.STATUS_TIME, A.TYPE,
A.LEG_NUMBER,
CASE WHEN A.EFFECTIVE_STATUS='BT' THEN 'NLT'
WHEN A.EFFECTIVE_STATUS='NLT' THEN 'NLT'
WHEN A.EFFECTIVE_STATUS='MKUP' THEN 'MKUP'
END AS STATUS
FROM PHASE1.DY_STATUS_ZONE A
WHERE A.LAST_LEG_FLAG='Y'
AND SCHLD_DATE>='01-Apr-2019'--TO_DATE(''||MNTH_DATE||'','DD-Mon-YYYY')
AND SCHLD_DATE<='20-Feb-2020'--TO_DATE(''||TILL_DATE||'','DD-Mon-YYYY')
AND A.MOVEMENT_TYPE IN ('P')
AND (EXCEPTIONAL_FLAG='N' OR EXCEPTION_TYPE='5') ---------SS
PHASE1.DY_STATUS_ZONE has 710,246 records in it. Please guide whether this query can be optimized.

You could try adding an index which covers the WHERE clause:
CREATE INDEX idx ON PHASE1.DY_STATUS_ZONE (LAST_LEG_FLAG, SCHLD_DATE, MOVEMENT_TYPE,
EXCEPTIONAL_FLAG, EXCEPTION_TYPE);
Depending on the cardinality of your data, the above index may or may not be used.

The problem might be the SELECT DISTINCT. This can be hard to optimize because it removes duplicates; even if no rows are duplicated, Oracle still does the work. If it is not needed, remove it.
For your particular query, I would write it as:
WHERE A.LAST_LEG_FLAG = 'Y' AND
SCHLD_DATE >= DATE '2019-04-01' AND
SCHLD_DATE <= DATE '2020-02-20' AND
A.MOVEMENT_TYPE = 'P' AND
(EXCEPTIONAL_FLAG = 'N' OR EXCEPTION_TYPE = '5')
The date formats don't affect performance, just readability and maintainability.
For this query, the optimal index is probably: (LAST_LEG_FLAG, MOVEMENT_TYPE, SCHLD_DATE, EXCEPTIONAL_FLAG). The last two columns might be switched, if EXCEPTIONAL_FLAG is more selective than SCHLD_DATE.
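As a sketch, that suggested index would look something like this (the index name is made up; swap the last two columns if EXCEPTIONAL_FLAG turns out to be more selective than SCHLD_DATE):
CREATE INDEX idx_dy_status_zone_filter ON PHASE1.DY_STATUS_ZONE
  (LAST_LEG_FLAG, MOVEMENT_TYPE, SCHLD_DATE, EXCEPTIONAL_FLAG);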
However, if this returns many rows, then the SELECT DISTINCT will be the gating factor for the query. And that is much more difficult to optimize.


SQL optimization problem, which of the two solutions below is the most efficient?
I have the table from the image. I need to group the data by CPF and date and know whether the CPFs had at least one login_ok = true on a specific date. Both solutions below satisfy my need, but the goal is to find the best query.
We can have multiple login_ok = true and login_ok = false rows for a CPF on a specific date; I just need to know whether there was at least one login_ok = true.
I already have two solutions; I want to discuss how to make a more efficient one.
Maybe this would work for your problem:
SELECT
    t2.CPF,
    t2.data
FROM (
    SELECT
        CPF,
        date(data) AS data
    FROM db_risco.site_rn_login
    WHERE login_ok
) t2
GROUP BY 1, 2
ORDER BY t2.data
DISTINCT would also work, and I doubt it would pose any performance threat in your case. Usually it evaluates expressions (like date(data)) before checking for uniqueness.
By using a subquery, in this case, you can select upfront which rows to include and then extract the date. Finally, you'd group by on a much smaller number of rows, since those were filtered beforehand.
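For reference, the DISTINCT variant mentioned above would be a sketch like this:
SELECT DISTINCT CPF, date(data) AS data
FROM db_risco.site_rn_login
WHERE login_ok
ORDER BY data;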
PostgreSQL has the function BOOL_OR to check whether the expression is true for at least one row. It is likely to be optimised for this kind of task.
select cpf, date(data) as data, bool_or(login_ok) as status_login
from db_risco.site_rn_login
group by cpf, date(data);
An index on (cpf, date(data)) or even on (cpf, date(data), login_ok) could help speed up the query.
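A sketch of such an expression index (the index name is made up, and this assumes data is a plain timestamp column; for timestamptz, date(data) is not immutable and cannot be indexed directly):
CREATE INDEX idx_login_cpf_day ON db_risco.site_rn_login (cpf, (date(data)));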
On a side note: you may also want to order your results with ORDER BY. Don't rely on GROUP BY doing this; the order of the rows resulting from a query is arbitrary without an ORDER BY clause.

Optimizing an Oracle GROUP BY query on a very large table

I have a query I would like to optimize. This is the query:
SELECT "c"."NETSW_ACQEREF" AS "BANK",
count("c"."NETSW_ACQEREF") AS "QTY",
sum("c"."TRAN_AMNT") / 100 AS "AMOUNT",
count(distinct "c"."TERM_ID") as "terminals"
FROM "CSCLWH"."CLWH_COMMON_DATA" "c"
WHERE ("c"."TRAN_DATE" between 20201101 AND 20201111)
AND ("TRAN_TYPE" IN
('00', '01', '10', '12', '19', '20', '26', '29', '50', '51', '52'))
AND ("RESP_CODE" IN ('0', '00', '000', '400'))
AND ("MTI" IN ('1100', '1200', '1240', '1400', '1420'))
GROUP BY "c"."NETSW_ACQEREF"
ORDER BY "BANK"
These are the explain plan results with huge cost:
Cost 5102095 Time 00:03:20
The table has about 3 million rows. I've created an index for the GROUP BY, but it is of little use. Can you please show me a way to get the cost down?
The aggregation operations COUNT and SUM can't be optimized much, and also there is no HAVING clause, so your best bet here would probably be to add a multi-column index covering the entire WHERE clause:
CREATE INDEX idx ON "CSCLWH"."CLWH_COMMON_DATA" (TRAN_DATE, TRAN_TYPE, RESP_CODE, MTI);
This index, if used, would at least allow Oracle to discard many records not matching the where filter. The exact order of the columns used in the index would depend on the cardinality of the data in each column. Typically, you want to put columns first which are more restrictive, placing less restrictive columns last.
I can see two potential sources of slowness in your query. You can run a couple of tests to see which is worse. There is an easy way to fix one of them; I don't think you can do much about the other.
You don't only have the group by aggregation at the overall query level; you also have a count(distinct {something}). That count distinct is a nested aggregation which is expensive. What happens if you remove the word "distinct" there? Meaning, how does the execution time change? (Of course, that will not give you the result you need; but it will tell you HOW EXPENSIVE the "distinct" is.)
Unfortunately, if THAT is the biggest bottleneck, there is nothing you can do about it.
The other source of slowness is the ORDER BY clause at the end of the query. A bit of background: there are essentially two ways to GROUP BY. One is to sort the expressions you group by; the other is to hash them. In the old days, Oracle used "sort" group by - which is expensive. As a side effect, results were ordered by the GROUP BY expressions even without an explicit ORDER BY clause; that is how developers acquired very poor habits.
At some point Oracle "learned" that "hash" group by is faster. However, they fell into a trap: when you have GROUP BY followed by ORDER BY the same expressions, Oracle thought (incorrectly for most cases) that they can save time by doing both in one shot by simply using the old "sort" group by. This is very wasteful when 3 million rows in the input result perhaps in 300 groups. Better to hash group by for the 3 million rows, and then have the (additional, but trivial) step of ordering 300 output rows. Why Oracle is so dumb as not to see this, I don't know - it's just how it is.
This problem, though, has a very simple solution. You can force hash group by with the use_hash_aggregation hint. (First, you can simply remove the ORDER BY clause from your query to see if that's the problem; if you see no improvement, then adding the hint about hash aggregation will not help.)
I have no idea which of the two problems I described is worse. And if it's the "sort group by" (the only one you can do something about), don't expect miracles. You may see the execution time drop from 3 minutes and 20 seconds to 2 minutes or 2 minutes and 30 seconds or whatnot; not an order of magnitude of improvement.
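For reference, the hint goes right after the main SELECT keyword, for example (a minimal sketch with the WHERE clause omitted; the rest of the query stays as posted above):
SELECT /*+ use_hash_aggregation */
       "c"."NETSW_ACQEREF" AS "BANK",
       count("c"."NETSW_ACQEREF") AS "QTY"
FROM "CSCLWH"."CLWH_COMMON_DATA" "c"
GROUP BY "c"."NETSW_ACQEREF"
ORDER BY "BANK";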
I wonder if two levels of aggregation with appropriate indexes would help:
SELECT bank, SUM(qty) AS qty, SUM(amount) AS amount,
       COUNT(*) AS terminals
FROM (SELECT "c"."NETSW_ACQEREF" AS bank, "c"."TERM_ID",
             count(*) AS qty,
             sum("c"."TRAN_AMNT") / 100 AS amount
      FROM "CSCLWH"."CLWH_COMMON_DATA" "c"
      WHERE "c"."TRAN_DATE" BETWEEN 20201101 AND 20201111 AND
            "TRAN_TYPE" IN ('00', '01', '10', '12', '19', '20', '26', '29', '50', '51', '52') AND
            "RESP_CODE" IN ('0', '00', '000', '400') AND
            "MTI" IN ('1100', '1200', '1240', '1400', '1420')
      GROUP BY "c"."NETSW_ACQEREF", "c"."TERM_ID"
     ) c
GROUP BY bank
ORDER BY bank;
This assumes that tran_type, resp_code, and MTI are all strings. If they are numbers, then change the comparisons to use numbers.
Then you want an index for the WHERE clause. It is quite unclear what the best possibilities are, but something like (tran_date, mti, tran_type, resp_code) -- these should be most selective first.
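A sketch of that index (the name is made up; adjust the column order to the actual selectivity in your data):
CREATE INDEX idx_clwh_where ON "CSCLWH"."CLWH_COMMON_DATA" (TRAN_DATE, MTI, TRAN_TYPE, RESP_CODE);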

COUNT(1) from a table having millions of records is slow even with parallel(8) hint

I am trying to take a count of records from a table which has 194 million records. I used parallel hints and an index fast full scan, but it is still slow. Please suggest any alternative or improvement ideas for the query attached.
SELECT
/*+ parallel(cs_salestransaction 8)
index_ffs(cs_salestransaction CS_SALESTRANSACTION_COMPDATE)
index_ffs(cs_salestransaction CS_SALESTRANSACTION_AK1) */
COUNT(1)
FROM cs_salestransaction
WHERE processingunitseq=38280596832649217
AND (compensationdate BETWEEN DATE '28-06-17' AND DATE '26-01-18'
OR eventtypeseq IN (16607023626823731, 16607023626823732, 16607023626823733, 16607023626823734));
Here is the execution plan (attached as an image):
The query gave a result but took 2 hours to count the 194 million rows.
Edits:
Code edited to add DATE per suggestion by Littlefoot.
Code edited with actual column names.
I am new to stack overflow, hence have attached plan as image.
Also, if compensationdate is DATE datatype, don't compare it to strings (because '28-JUL-17' is a string) and force Oracle to perform implicit conversion & spend time over nothing. Switch to
compensationdate BETWEEN date '2017-07-28' and date '2018-01-26'
Having an OR condition in the WHERE clause can prevent the use of an index in the query. You should get rid of the OR condition. There can be multiple ways to do that. One method is:
SELECT /*+ parallel(sales 8)
index_ffs(sales ,sales_COMPDATE)
index_ffs(sales , sales_eventtypeseq )*/
COUNT(1)
FROM sales
WHERE processingunitseq=38
AND compensationdate BETWEEN TO_DATE('28-JUL-17') AND TO_DATE('26-JAN-18')
UNION ALL
SELECT /*+ parallel(sales 8)
index_ffs(sales ,sales_COMPDATE)
index_ffs(sales , sales_eventtypeseq )*/
COUNT(1)
FROM sales
WHERE processingunitseq=38
AND compensationdate NOT BETWEEN TO_DATE('28-JUL-17') AND TO_DATE('26-JAN-18') -- To avoid duplicates
AND eventtypeseq IN (1, 2, 3, 4);
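Note that the two branches above return two separate rows, one count each; if a single total is needed, they still have to be added up, for example with a wrapper like this (a sketch using the same simplified names as above; the TO_DATE calls still rely on the session's NLS date format):
SELECT SUM(cnt) AS total_cnt
FROM (SELECT COUNT(1) AS cnt
      FROM sales
      WHERE processingunitseq = 38
        AND compensationdate BETWEEN TO_DATE('28-JUL-17') AND TO_DATE('26-JAN-18')
      UNION ALL
      SELECT COUNT(1) AS cnt
      FROM sales
      WHERE processingunitseq = 38
        AND compensationdate NOT BETWEEN TO_DATE('28-JUL-17') AND TO_DATE('26-JAN-18')
        AND eventtypeseq IN (1, 2, 3, 4)) t;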
For other suggestions, please post the execution plan of the query.

Why is my SQL query so slow in one database?

I am using SQL Server 2008 R2 and I have two databases: one has 11,000 records and the other has just 3,000 records. When I run this query
SELECT Right(rtrim(tbltransac.No_Faktur),6) as NoUrut,
tbltransac.No_Faktur,
tbltransac.No_FakturP,
tbltransac.Kd_Plg,
Tblcust.Nm_Plg,
GRANDTOTAL AS Total_Faktur,
tbltransac.Nm_Pajak,
tbltransac.Tgl_Faktur,
tbltransac.Tgl_FakturP,
tbltransac.Total_Distribusi
FROM Tblcust
INNER JOIN ViewGrandtotal AS tbltransac ON Tblcust.Kd_Plg = tbltransac.Kd_Plg
WHERE tbltransac.Kd_Trn = 'J'
and year(tbltransac.tgl_faktur)=2015
And ISNULL(tbltransac.No_OPJ,'') <> 'SHOP'
Order by Right(rtrim(tbltransac.No_Faktur),6) Desc
It takes 1 minute 30 seconds on my server with 3,000 records (querying from SQL Server Management Studio), but it only takes 3 seconds on my other server, which has 11,000 records. What's wrong with my database?
I've already tried backing up my 3,000-record database and restoring it on the 11,000-record server; it's faster there, taking about 30 seconds, but that's still annoying compared to the 11,000-record server itself. They have the same spec.
How did this happen? What should I check? I checked Event Viewer, Resource Monitor and the SQL Server logs, and I couldn't find any error or blocked connection. There is no wrong routing either.
Please help. It just started happening a week ago; before this it was fine, and I haven't touched the server in more than a month.
As already mentioned before, you have three issues in your query.
Just as an example, change the query to this one:
SELECT Right(rtrim(tbltransac.No_Faktur),6) as NoUrut,
tbltransac.No_Faktur,
tbltransac.No_FakturP,
tbltransac.Kd_Plg,
Tblcust.Nm_Plg,
GRANDTOTAL AS Total_Faktur,
tbltransac.Nm_Pajak,
tbltransac.Tgl_Faktur,
tbltransac.Tgl_FakturP,
tbltransac.Total_Distribusi
FROM Tblcust
INNER JOIN ViewGrandtotal AS tbltransac ON Tblcust.Kd_Plg = tbltransac.Kd_Plg
WHERE tbltransac.Kd_Trn = 'J'
and tbltransac.tgl_faktur BETWEEN '20150101' AND '20151231'
And tbltransac.No_OPJ <> 'SHOP'
Order by NoUrut Desc --Only if you need a sorted output in the datalayer
Another idea, if your ViewGrandtotal is quite large, could be pre-filtering that view before you join it. Sometimes SQL Server doesn't get a good plan and needs a gentle touch to nudge it in the right direction.
Maybe this one:
SELECT Right(rtrim(vgt.No_Faktur), 6) AS NoUrut,
       vgt.No_Faktur,
       vgt.No_FakturP,
       vgt.Kd_Plg,
       tc.Nm_Plg,
       vgt.Total_Faktur,
       vgt.Nm_Pajak,
       vgt.Tgl_Faktur,
       vgt.Tgl_FakturP,
       vgt.Total_Distribusi
FROM (SELECT Kd_Plg, Nm_Plg FROM Tblcust GROUP BY Kd_Plg, Nm_Plg) AS tc -- Pre-filter on just the needed columns, made distinct.
INNER JOIN (
    -- Pre-filter ViewGrandtotal
    SELECT DISTINCT vgt.No_Faktur, vgt.No_FakturP, vgt.Kd_Plg, vgt.GRANDTOTAL AS Total_Faktur, vgt.Nm_Pajak,
                    vgt.Tgl_Faktur, vgt.Tgl_FakturP, vgt.Total_Distribusi
    FROM ViewGrandtotal AS vgt
    WHERE vgt.Kd_Trn = 'J'
      AND vgt.tgl_faktur BETWEEN '20150101' AND '20151231'
      AND vgt.No_OPJ <> 'SHOP'
) AS vgt
    ON tc.Kd_Plg = vgt.Kd_Plg
ORDER BY NoUrut DESC --Only if you need a sorted output in the datalayer
The pre-filtering could help SQL Server generate a better plan.
Another issue could be just the multi-threading. Maybe your query gets a parallel plan as it reaches the cost threshold because of the 11,000 rows, while the other query just gets a serial plan due to its lower row count. You can take a look at the generated plans by including the actual execution plan in your SSMS query.
Maybe you can compare those plans to get a clue. If this doesn't help, you can post them here to get some feedback from me.
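One simple way to get comparable numbers from both servers, besides the graphical plan, is to turn on runtime statistics before running the query (standard SQL Server session settings):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run the query here, then compare logical reads and CPU/elapsed time in the Messages tab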
I hope this helps. Not quite easy to give you good hints without knowing table structures, table sizes, performance counters, etc. :-)
Best regards,
Ionic
Note: first of all, you should avoid any function in the WHERE clause, like this one:
year(tbltransac.tgl_faktur)=2015
Here is Aaron Bertrand's advice on how to work with dates in a WHERE clause:
"In order to make best possible use of indexes, and to avoid capturing too few or too many rows, the best possible way to achieve the above query is:"
SELECT COUNT(*)
FROM dbo.SomeLogTable
WHERE DateColumn >= '20091011'
AND DateColumn < '20091012';
And I can't understand your logic in this piece of code, but it is a bad part of your query too:
ISNULL(tbltransac.No_OPJ,'') <> 'SHOP'
Actually NULL <> 'SHOP' in this case, so why are you replacing it with ''?
Thanks and good luck.
Here are some recommendations:
Replace year(tbltransac.tgl_faktur) = 2015 with tbltransac.tgl_faktur >= '20150101' AND tbltransac.tgl_faktur < '20160101'.
Replace ISNULL(tbltransac.No_OPJ,'') <> 'SHOP' with tbltransac.No_OPJ <> 'SHOP', because NULL <> 'SHOP'.
Remove Order by Right(rtrim(tbltransac.No_Faktur),6) Desc, because ordering should be done in the presentation layer rather than in the data layer.
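Putting those recommendations together, the filter might look like this (a sketch; the OR ... IS NULL part is only needed if rows with a NULL No_OPJ should still be returned, which is what the original ISNULL version did):
WHERE tbltransac.Kd_Trn = 'J'
  AND tbltransac.tgl_faktur >= '20150101'
  AND tbltransac.tgl_faktur < '20160101'
  AND (tbltransac.No_OPJ <> 'SHOP' OR tbltransac.No_OPJ IS NULL)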
Read about SARGs (search arguments) and predicates:
What makes a SQL statement sargable?
To write an appropriate SARG, you must ensure that a column that has an index on it appears in the predicate alone, not as a function parameter. SARGs must take the form of column inclusive_operator <constant or variable> or <constant or variable> inclusive_operator column. The column name is alone on one side of the expression, and the constant or calculated value appears on the other side. Inclusive operators include the operators =, >, <, >=, <=, BETWEEN, and LIKE. However, the LIKE operator is inclusive only if you do not use a wildcard % or _ at the beginning of the string you are comparing the column to.

Creating a quicker MySQL Query

I'm trying to create a faster query; right now I have large databases. My table sizes are 5 columns by 530k rows, and 300 columns by 4k rows (sadly I have zero control over the architecture, otherwise I wouldn't be having this silly problem with a poor DB).
SELECT cast( table2.foo_1 AS datetime ) as date,
table1.*, table2.foo_2, foo_3, foo_4, foo_5, foo_6, foo_7, foo_8, foo_9, foo_10, foo_11, foo_12, foo_13, foo_14, foo_15, foo_16, foo_17, foo_18, foo_19, foo_20, foo_21
FROM table1, table2
WHERE table2.foo_0 = table1.foo_0
AND table1.bar1 >= NOW()
AND foo_20="tada"
ORDER BY
date desc
LIMIT 0,10
I've indexed table2.foo_0 and table1.foo_0 along with foo_20 in the hope that it would allow for faster querying, but I'm still at nearly 7 seconds of load time. Is there something else I can do?
Cheers
I think an index on bar1 is the key. I always run into performance issues with dates because it has to compare each of the 530K rows.
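For example, a plain index on that column (the index name is made up):
CREATE INDEX ix_table1_bar1 ON table1 (bar1);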
Create the following indexes:
CREATE INDEX ix_table1_0_1 ON table1 (foo_1, foo_0)
CREATE INDEX ix_table2_20_0 ON table2 (foo_20, foo_0)
and rewrite your query like this:
SELECT cast( table2.foo_1 AS datetime ) as date,
table1.*, table2.foo_2, foo_3, foo_4, foo_5, foo_6, foo_7, foo_8, foo_9, foo_10, foo_11, foo_12, foo_13, foo_14, foo_15, foo_16, foo_17, foo_18, foo_19, foo_20, foo_21
FROM table1
JOIN table2
ON table2.foo_0 = table1.foo_0
AND table2.foo_20 = "tada"
WHERE table1.bar1 >= NOW()
ORDER BY
table1.foo_1 DESC
LIMIT 0, 10
The first index will be used for ORDER BY, the second one will be used for JOIN.
You, though, may benefit more from creating the first index like this:
CREATE INDEX ix_table1_0_1 ON table1 (bar1, foo_0)
which may apply more restrictive filtering on bar1.
I have a blog post on this: Choosing index, which advises how to choose which index to create for cases like that.
Indexing table1.bar1 may improve the >=NOW comparison.
A compound index on table2.foo_0 and table2.foo_20 will help.
An index on table2.foo_1 may help the sort.
Overall, pasting the output of your query with EXPLAIN prepended may also give some hints.
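For example (MySQL syntax; this is a shortened version of the original query, just to show where EXPLAIN goes):
EXPLAIN
SELECT cast(table2.foo_1 AS datetime) AS date, table1.*
FROM table1, table2
WHERE table2.foo_0 = table1.foo_0
  AND table1.bar1 >= NOW()
  AND foo_20 = "tada"
ORDER BY date DESC
LIMIT 0, 10;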
table2 needs a compound index on foo_0 and foo_20 (assuming foo_20 belongs to table2).
An index on table1.foo_0, table1.bar1 could help too, since the query filters on table1.bar1.
See How to use MySQL indexes and Optimizing queries with explain.
Use compound indexes that correspond to your WHERE equalities (in general the leftmost columns in the index), WHERE comparisons to an absolute value (middle), and the ORDER BY clause (rightmost, in the same order).
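Applied to the query above, that rule would give roughly these indexes (a sketch; the names are made up, and it assumes foo_20 is in table2 and bar1 in table1, as the query suggests):
CREATE INDEX ix_table2_eq_join ON table2 (foo_20, foo_0);
CREATE INDEX ix_table1_join_range ON table1 (foo_0, bar1);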