I am using MariaDB and have written several queries to find some data in a field.
Query:
SELECT
COUNT(
CASE WHEN alamat.alamat_prins LIKE '%Aceh%' THEN 1 END
) AS 'id-ac'
FROM
kont_uji_rutin AS UjiRutin
LEFT JOIN alamat ON UjiRutin.id_prins = alamat.id_prins
WHERE
UjiRutin.tgl_tri BETWEEN '2019-01-01' AND '2019-12-31'
Using a LIKE query is the slowest way to do text matching on the data I have. Are there alternative queries to look up rows where this field contains the word "Aceh"?
Thank you in advance
You should move the comparison to the WHERE clause and dispense with the LEFT JOIN:
SELECT COUNT(*) as 'id_ac'
FROM kont_uji_rutin ur JOIN
alamat a
ON ur.id_prins = a.id_prins
WHERE ur.tgl_tri BETWEEN '2019-01-01' AND '2019-12-31' AND
a.alamat_prins LIKE '%Aceh%';
This will not have a big impact on performance. Indexes on:
kont_uji_rutin(tgl_tri, id_prins)
alamat(id_prins, alamat_prins)
are worth trying (they cover the query, which might yield a performance benefit).
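In MariaDB syntax, those would look something like this (the index names are placeholders; if alamat_prins is a long TEXT column you may need a prefix length such as alamat_prins(100)):
CREATE INDEX ix_ujirutin_tgl_prins ON kont_uji_rutin (tgl_tri, id_prins);
CREATE INDEX ix_alamat_prins_addr ON alamat (id_prins, alamat_prins);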
Ultimately, though, the problem is the wildcard at the beginning of the pattern, which prevents the use of an index. You might consider a full-text index on alamat(alamat_prins). That should speed up the query, if the full-text functionality meets your needs.
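If you try the full-text route, a minimal sketch might look like this (the index name is a placeholder; note that MATCH works on whole words, so unlike '%Aceh%' it will not find 'Aceh' embedded inside a longer word):
ALTER TABLE alamat ADD FULLTEXT INDEX ft_alamat_prins (alamat_prins);

SELECT COUNT(*) AS id_ac
FROM kont_uji_rutin ur
JOIN alamat a ON ur.id_prins = a.id_prins
WHERE ur.tgl_tri BETWEEN '2019-01-01' AND '2019-12-31'
  AND MATCH(a.alamat_prins) AGAINST('Aceh' IN NATURAL LANGUAGE MODE);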
Related
I have three tables that I'm trying to join with over a billion rows per table. Each table has an index on the billing date column. If I just filter the left table and do the joins, will the query run efficiently or do I need to put the same date filter in each subquery?
I.e., will the first query run much slower than the second query?
select item, billing_dollars, IC_billing_dollars
from billing
left join IC_billing on billing.invoice_number = IC_billing.invoice_number
where billing.date = '2019-09-24'
select item, billing_dollars, IC_billing_dollars
from billing
left join (select * from IC_billing where IC_billing.date = '2019-09-24') IC_billing
    on billing.invoice_number = IC_billing.invoice_number
where billing.date = '2019-09-24'
I don't want to run this without knowing whether the query will perform well, as there aren't many safeguards against poorly performing queries. Also, if I need to write the query the second way, is there a way to have the date filter in only one location rather than repeating it throughout the query?
That depends.
Consider your query:
select b.item, b.billing_dollars, icb.IC_billing_dollars
from billing b left join
IC_billing icb
on b.invoice_number = icb.invoice_number
where b.date = '2019-09-24';
(Assuming I have the columns coming from the correct tables.)
The optimal strategy is an index on billing(date, invoice_number) -- perhaps also adding item and billing_dollars to the index; and ic_billing(invoice_number) -- perhaps with IC_billing_dollars.
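If this is SQL Server, the covering versions could be written something like this (the index names are hypothetical; PostgreSQL 11+ supports the same INCLUDE syntax):
CREATE INDEX ix_billing_date_invoice
    ON billing (date, invoice_number) INCLUDE (item, billing_dollars);
CREATE INDEX ix_ic_billing_invoice
    ON IC_billing (invoice_number) INCLUDE (IC_billing_dollars);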
I can think of two situations where filtering on date in ic_billing would be useful.
First, if there is an index on (invoice_date, invoice_number), particularly a primary key definition. Then using this index is usually preferred, even if another index is available.
Second, if ic_billing is partitioned by invoice_date. In that case, you will want to specify the partition for performance.
Generally, though, the additional restriction on the invoice date does not help. In some databases, it might even hurt performance (particularly if the subquery is materialized and the outer query does not use an appropriate index).
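That said, if you do end up needing the filter on both tables and want the date written only once, one option is a single-row CTE; a minimal untested sketch:
WITH d AS (
    SELECT CAST('2019-09-24' AS date) AS dt
)
SELECT b.item, b.billing_dollars, icb.IC_billing_dollars
FROM d
JOIN billing b
    ON b.date = d.dt
LEFT JOIN IC_billing icb
    ON icb.invoice_number = b.invoice_number AND icb.date = d.dt;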
I have two tables that I need to join. The first table contains CustomerNumber, IdentificationNumber, and IdentificationType. The second table contains IdentificationType, EffectiveDate, and EndDate.
My Query basically looks like this:
Select CustomerNumber, IdentificationNumber
From Identification i
Inner Join IdentificationType it On it.IdentificationType = i.IdentificationType
And it.EffectiveDate < #TodaysDate
And (it.EndDate IS NULL Or it.EndDate > #TodaysDate)
My execution plan is showing a clustered index scan on the identification type table, I'm assuming it's because of the OR in the join clause.
Is there a more efficient way to join, KNOWING that the EndDate field MUST allow either NULL or a real datetime value?
I know you said the EndDate column MUST allow NULL, so just for the record: the most efficient way is to stop using NULLs in place of "no end date" in the IdentificationType table, and instead use 9999-12-31. Then your queries can skip the whole OR clause. (I understand this might require some application changes, but it would be worth it in my opinion for this exact reason--and I have seen this "NULL = open ended" pattern make queries difficult or perform badly over and over again in my own work and in SQL questions online.)
Also, you might consider swapping the order of the two OR conditions--this may sound like voodoo, but I believe there are some special cases where the optimizer does better when the variable comparison comes first in this specific scenario (though I could be wrong).
In the meantime, would you try this and share how well it performs compared to your and other solutions?
SELECT
CustomerNumber, IdentificationNumber
FROM
dbo.Identification i
INNER JOIN dbo.IdentificationType it
ON it.IdentificationType = i.IdentificationType
WHERE
it.EffectiveDate < #TodaysDate
AND it.EndDate IS NULL
UNION ALL
SELECT
CustomerNumber, IdentificationNumber
FROM
dbo.Identification i
INNER JOIN dbo.IdentificationType it
ON it.IdentificationType = i.IdentificationType
WHERE
it.EffectiveDate < #TodaysDate
AND it.EndDate > #TodaysDate
;
I have recovered from poor performance with OR clauses by using this exact strategy. It is painful to explode the query size/complexity, but the possibility of getting just a few seeks is totally worth it compared to the scan you're dealing with now.
There is something fishy about your inequality comparisons: the first one should have an equals sign in it (<=). You didn't tell us the data type of the date columns and #TodaysDate, but best practice is to design a system so it does not fail for any input. So even if the variable is datetime and EffectiveDate has no time portion, that comparison should still be <= so a query run at exactly midnight doesn't fail to include the data for that day.
P.S. Sorry about not preserving your formatting--I just understand queries better when formatted in my preferred style. Also, I moved the date conditions to the WHERE clause because in my opinion they are not part of the JOIN.
Try using ISNULL instead of the OR condition. I also think you could use DATEDIFF instead of the plain comparison operator.
select CustomerNumber, IdentificationNumber
From Identification i
Inner Join IdentificationType it On it.IdentificationType = i.IdentificationType
And it.EffectiveDate < #TodaysDate
And (isnull(it.EndDate,#TodaysDate) >= #TodaysDate)
I was just tidying up some SQL when I came across this query:
SELECT
jm.IMEI ,
jm.MaxSpeedKM ,
jm.MaxAccel ,
jm.MaxDeccel ,
jm.JourneyMaxLeft ,
jm.JourneyMaxRight ,
jm.DistanceKM ,
jm.IdleTimeSeconds ,
jm.WebUserJourneyId ,
jm.lifetime_odo_metres ,
jm.[Descriptor]
FROM dbo.Reporting_WebUsers AS wu WITH (NOLOCK)
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm WITH (NOLOCK) ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j WITH (NOLOCK) ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE ( wu.isActive = 1 )
AND ( j.JourneyDuration > 2 )
AND ( j.JourneyDuration < 1000 )
AND ( j.JourneyDistance > 0 )
My question is: does the order of the joins make any performance difference? For the above query I would have started with
FROM dbo.Reporting_JourneyMaster90 AS jm
and then joined the other two tables to that one.
Join order in SQL Server 2008 R2 unquestionably affects query performance, particularly in queries with a large number of table joins and WHERE clauses applied against multiple tables.
Although the join order is changed during optimisation, the optimiser doesn't try all possible join orders. It stops when it finds what it considers a workable solution, as the very act of optimisation uses precious resources.
We have seen queries that were performing like dogs (1 min+ execution time) come down to sub-second performance just by changing the order of the join expressions. Please note, however, that these are queries with 12 to 20 joins and WHERE clauses on several of the tables.
The trick is to set your order to help the query optimiser figure out what makes sense. You can use FORCE ORDER, but that can be too rigid. Try to make sure that your join order starts with the tables that will reduce the data most through their WHERE clauses.
No, the join order is changed during optimization.
The only caveat is the OPTION (FORCE ORDER) hint, which will force joins to happen in the exact order you have them specified.
I have a clear example of join order affecting performance. It is a simple join between two tables. One has 50+ million records, the other has 2,000. If I select from the smaller table and join the larger one, it takes 5+ minutes.
If I select from the larger table and join the smaller it takes 2 min 30 seconds.
This is with SQL Server 2012.
To me this is counterintuitive, since I am using the largest dataset for the initial query.
Usually not. I'm not 100% sure this applies verbatim to SQL Server, but in Postgres the query planner reserves the right to reorder the inner joins as it sees fit. The exception is when you reach a threshold beyond which it's too expensive to investigate changing their order.
JOIN order doesn't matter; the query engine will reorganize the order based on index statistics and other factors.
To test, do the following:
enable "Include Actual Execution Plan" and run the first query
change the JOIN order and run the query again
compare the execution plans
They should be identical, as the query engine will reorganize them according to those factors.
As commented on another answer, you could use OPTION (FORCE ORDER) to use exactly the order you want, but it may not be the most efficient one.
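For reference, the hint goes at the end of the statement; a minimal sketch using the question's tables (illustrative only, and usually not recommended):
SELECT jm.IMEI, jm.MaxSpeedKM
FROM dbo.Reporting_JourneyMaster90 AS jm
INNER JOIN dbo.Reporting_WebUsers AS wu ON wu.WebUsersId = jm.WebUsersId
WHERE wu.isActive = 1
OPTION (FORCE ORDER);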
As a general rule of thumb, the JOIN order should put the table with the fewest records first and the one with the most records last, as in some DBMS engines the order can make a difference, as can using the FORCE ORDER hint to help limit the results.
Wrong. In SQL Server 2005 it definitely matters, since you are limiting the dataset from the beginning of the FROM clause. If you start with 2,000 records instead of 2 million, your query is faster.
I am trying to create a view, but a SELECT statement from this view takes more than 15 seconds. How can I make it faster? My query for the view is below.
create view Summary as
select distinct A.Process_date,A.SN,A.New,A.Processing,
COUNT(case when B.type='Sold' and A.status='Processing' then 1 end) as Sold,
COUNT(case when B.type='Repaired' and A.status='Processing' then 1 end) as Repaired,
COUNT(case when B.type='Returned' and A.status='Processing' then 1 end) as Returned
from
(select distinct M.Process_date,M.SN,max(P.enter_date) as enter_date,M.status,
COUNT(case when M.status='New' then 1 end) as New,
COUNT(case when M.status='Processing' and P.cn is null then 1 end) as Processing
from DB1.dbo.Item_details M
left outer join DB2.dbo.track_data P on M.SN=P.SN
group by M.Process_date,M.SN,M.status) A
left outer join DB2.dbo.track_data B on A.SN=B.SN
where A.enter_date=B.enter_date or A.enter_date is null
group by A.Process_date,A.New,A.Processing,A.SN
After creating this view, my select query is
select process_date, sum(New), sum(Processing), sum(sold), sum(repaired), sum(returned)
from Summary
where month(process_date) = 03 and year(process_date) = 2011
group by process_date
Please suggest what changes should be made for the query to perform faster.
Thank you
ARB
It is hard to give advice without seeing the actual data and the structure of the tables. I would rewrite the query keeping these principles in mind:
Use an inner join instead of an outer join if possible.
Get rid of the CASE operator inside the COUNT function; build the query so the conditions are in the WHERE clause, not in COUNT (see the sketch after this list).
Try not to use aggregated values in GROUP BY. Currently you group by the aggregated values New and Processing; use GROUP BY on existing table columns if possible.
If the query gets too complicated, break it into smaller queries and combine the results in the final query. Writing a stored procedure may help in this case.
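For the second point, the idea is roughly this (an untested sketch against the question's tables) -- count per status with a WHERE filter instead of CASE expressions inside COUNT:
SELECT M.Process_date, M.status, COUNT(*) AS cnt
FROM DB1.dbo.Item_details M
WHERE M.status IN ('New', 'Processing')
GROUP BY M.Process_date, M.status;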
I hope this helps.
For tuning a database query, I shall add a few items in addition to what @Davyd has already listed:
Look at the tables and the indexing on those tables. Putting the right indexes in place and avoiding the wrong ones always speeds up the query.
Is there anything in the WHERE condition that is not part of any index? At times we put an index on a column, but the query applies a cast or convert to the column, so the underlying index is not effective (see the sketch below). You may consider putting the index on the cast/convert of the column.
Look at normal form conformity or over-normalisation.
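For instance, the question's month()/year() filter wraps the column in functions, so an index on process_date cannot be used; a plain date range avoids that. A minimal sketch:
SELECT process_date, SUM(New), SUM(Processing), SUM(sold), SUM(repaired), SUM(returned)
FROM Summary
WHERE process_date >= '2011-03-01'
  AND process_date < '2011-04-01'
GROUP BY process_date;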
Good luck.
If you are using PostgreSQL, I suggest you use a tool like http://explain.depesz.com/ in order to see more clearly which part of your query is slow. Depending on what you find, you could either optimize your indexes or rewrite part of your query. If you are using another database, I'm sure a similar tool exists.
If none of these ideas help, the final solution would be to create a materialized view. There is plenty of info on the web regarding this.
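In PostgreSQL, for example, a minimal sketch (the view name is a placeholder):
CREATE MATERIALIZED VIEW summary_mat AS
SELECT * FROM Summary;

-- re-run whenever the underlying data changes
REFRESH MATERIALIZED VIEW summary_mat;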
Good luck.
I have this SQL query which, due to my own lack of knowledge and MySQL's trouble handling nested queries, is really slow to process. The query is...
SELECT DISTINCT PrintJobs.UserName
FROM PrintJobs
LEFT JOIN Printers
ON PrintJobs.PrinterName = Printers.PrinterName
WHERE Printers.PrinterGroup
IN (
SELECT DISTINCT Printers.PrinterGroup
FROM PrintJobs
LEFT JOIN Printers
ON PrintJobs.PrinterName = Printers.PrinterName
WHERE PrintJobs.UserName='<username/>'
);
I would like to avoid splitting this into two queries and inserting the values of the subquery into the main query programmatically.
This is probably not exactly what you are looking for; however, I will contribute my 2 cents. First off, you should show us your schema and exactly what you are trying to accomplish with that query. From the looks of it, you are not using numeric IDs in the table and are instead using varchar fields to join tables, which is not a good idea performance-wise. Also, I am not sure why you are doing:
(select PrinterName, UserName
from PrintJobs) AS Table1
instead of just joining on PrintJobs? Similar stuff for this one:
(select
PrinterName,
PrinterGroup
from Printers) as Table1
Maybe I am just not seeing it right. I would recommend that you simplify the query as much as possible and try it. Also tell us what exactly you are hoping to accomplish with the query, and give us some schema to work with.
Removed the bad query from the answer.
This query you have is pretty messed up. I'm not sure this will handle everything you need, but simplifying like this kills all the nested queries and is way faster. You can also use the EXPLAIN command to see how MySQL will execute your query.
SELECT DISTINCT PrintJobs.UserName
FROM PrintJobs
LEFT JOIN Printers ON PrintJobs.PrinterName = Printers.PrinterName
AND Printers.Username = '<username/>'
;
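For what it's worth, the original IN subquery can also be flattened into plain joins; here is an untested sketch against the question's tables that should preserve its meaning (all users who printed to any printer group used by the given user):
SELECT DISTINCT pj2.UserName
FROM PrintJobs pj1
JOIN Printers p1 ON pj1.PrinterName = p1.PrinterName
JOIN Printers p2 ON p2.PrinterGroup = p1.PrinterGroup
JOIN PrintJobs pj2 ON pj2.PrinterName = p2.PrinterName
WHERE pj1.UserName = '<username/>';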