Is there a better way to optimize this SQL query?

Is there a better way to optimize this SQL query? - sql

Assume my table has millions of records.
The following columns have indexes:
type
date
unique_id
Here is my query:
SELECT TOP (1000) T.TIME, T.TYPE, F.NAME,
B.NAME, T.MESSAGE
FROM MY_TABLE T
LEFT OUTER JOIN FOO F ON F.ID = T.FID
LEFT OUTER JOIN BAR B ON B.ID = T.BID
WHERE T.TYPE IN ('success', 'failure')
AND T.DATE BETWEEN 1592585183437 AND 1594232320525
AND T.UNIQUE_ID = "my unique ID"
ORDER BY T.DATE DESC
My question is am I causing myself any trouble with this query if I have tons of records in my table? Can this be optimized further?

My question is am I causing myself any trouble with this query if I have tons of records
in my table?
Wrong question. As long as you are only asking for data you need, you NEED it. Any trouble means you STIL need it.
The query looks as good as it gets. I somehow doubt the TOP 10000 (that is a LOT of data unless you compress it somehow).
The question is more whether you have the proper indices. Without the proper indices. I would also not use strings like this for TYPE, but then the question is about the query, not possible failures in table design.
Check indices.

Related

Tuning Oracle Query for slow select

I'm working on an oracle query that is doing a select on a huge table, however the joins with other tables seem to be costing a lot in terms of time of processing.
I'm looking for tips on how to improve the working of this query.
I'm attaching a version of the query and the explain plan of it.
Query
SELECT
l.gl_date,
l.REST_OF_TABLES
(
SELECT
MAX(tt.task_id)
FROM
bbb.jeg_pa_tasks tt
WHERE
l.project_id = tt.project_id
AND l.task_number = tt.task_number
) task_id
FROM
aaa.jeg_labor_history l,
bbb.jeg_pa_projects_all p
WHERE
p.org_id = 2165
AND l.project_id = p.project_id
AND p.project_status_code = '1000'
Something to mention:
This query takes data from oracle to send it to a sql server database, so I need it to be this big, I can't narrow the scope of the query.
the purpose is to set it to a sql server job with SSIS so it runs periodically

One obvious suggestion is not to use sub query in select clause.
Instead, you can try to join the tables.
SELECT
l.gl_date,
l.REST_OF_TABLES
t.task_id
FROM
aaa.jeg_labor_history l
Join bbb.jeg_pa_projects_all p
On (l.project_id = p.project_id)
Left join (SELECT
tt.project_id,
tt.task_number,
MAX(tt.task_id) task_id
FROM
bbb.jeg_pa_tasks tt
Group by tt.project_id, tt.task_number) t
On (l.project_id = t.project_id
AND l.task_number = t.task_number)
WHERE
p.org_id = 2165
AND p.project_status_code = '1000';
Cheers!!

As I don't know exactly how many rows this query is returning or how many rows this table/view has.
I can provide you few simple tips which might be helpful for you for better query performance:
Check Indexes. There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement.
Limit the size of your working data set.
Only select columns you need.
Remove unnecessary tables.
Remove calculated columns in JOIN and WHERE clauses.
Use inner join, instead of outer join if possible.
You view contains lot of data so you can also break down and limit only the information you need from this view

Determine datatypes of columns - SQL selection

Is it possible to determine the type of data of each column after a SQL selection, based on received results? I know it is possible though information_schema.columns, but the data I receive comes from multiple tables and is joint together and the data is renamed. Besides that, I'm not able to see or use this query or execute other queries myself.
My job is to store this received data in another table, but without knowing beforehand what I will receive. I'm obviously able to check for example if a certain column contains numbers or text, but not if it is originally stored as a TINYINT(1) or a BIGINT(128). How to approach this? To clarify, it is alright if the data-types of the columns of the source and destination aren't entirely the same, but I don't want to reserve too much space beforehand (or too less for that matter).
As I'm typing, I realize I'm formulation the question wrong. What would be the best approach to handle described situation? I thought about altering tables on the run (e.g. increasing size if needed), but that seems a bit, well, wrong and not the proper way.
Thanks

Can you issue the following query about your new table after you create it?
SELECT *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'JoinedQueryResults'
Is the query too big to run before knowing how big the results will be? Get a idea of how many rows it may return, but the trick with queries with joins is to group on the columns you are joining on, to help your estimate return more quickly. Here's of an example of just returning a row count from the query above which would have created the JoinedQueryResults table above.
SELECT SUM(A.NumRows * B.NumRows)
FROM (SELECT ID, COUNT(*) AS NumRows
FROM TableA
GROUP BY ID) AS A
INNER JOIN (SELECT ID, COUNT(*) AS NumRows
FROM TableB
GROUP BY ID) AS B ON A.ID = B.ID
The query above will run faster if all you need is a record count to help you estimate a size.
Also try instantiating a table for your results with a query like this.
SELECT TOP 0 *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID

Check the query efficiency

I have this below SQL query that I want to get an opinion on whether I can improve it using Temp Tables or something else or is this good enough? So basically I am just feeding the result set from inner query to the outer one.
SELECT S.SolutionID
,S.SolutionName
,S.Enabled
FROM dbo.Solution S
WHERE s.SolutionID IN (
SELECT DISTINCT sf.SolutionID
FROM dbo.SolutionToFeature sf
WHERE sf.SolutionToFeatureID IN (
SELECT sfg.SolutionToFeatureID
FROM dbo.SolutionFeatureToUsergroup SFG
WHERE sfg.UsergroupID IN (
SELECT UG.UsergroupID
FROM dbo.Usergroup UG
WHERE ug.SiteID = #SiteID
)
)
)

It's going to depend largely on the indexes you have on those tables. Since you are only selecting data out of the Solution table, you can put everything else in an exists clause, do some proper joins, and it should perform better.
The exists clause will allow you to remove the distinct you have on the SolutionToFeature table. Distinct will cause a performance hit because it is basically creating a temp table behind the scenes to do the comparison on whether or not the record is unique against the rest of the result set. You take a pretty big hit as your tables grow.
It will look something similar to what I have below, but without sample data or anything I can't tell if it's exactly right.
Select S.SolutionID, S.SolutionName, S.Enabled
From dbo.Solutin S
Where Exists (
select 1
from dbo.SolutionToFeature sf
Inner Join dbo.SolutionToFeatureTousergroup SFG on sf.SolutionToFeatureID = SFG.SolutionToFeatureID
Inner Join dbo.UserGroup UG on sfg.UserGroupID = UG.UserGroupID
Where S.SolutionID = sf.SolutionID
and UG.SiteID = #SiteID
)

Improving performance on SQL query

I'm currently having performance problems with an expensive SQL query, and I'd like to improve it.
This is what the query looks like:
SELECT TOP 50 MovieID
FROM (SELECT [MovieID], COUNT(*) AS c
FROM [tblMovieTags]
WHERE [TagID] IN (SELECT TOP 7 [TagID]
FROM [tblMovieTags]
WHERE [MovieID]=12345
ORDER BY Relevance ASC)
GROUP BY [MovieID]
HAVING COUNT(*) > 1) a
INNER JOIN [tblMovies] m ON m.MovieID=a.MovieID
WHERE (Hidden=0) AND m.Active=1 AND m.Processed=1
ORDER BY c DESC, m.IMDB DESC
What I'm trying to find movies that have at least 2 matching tags for MovieID 12345.
Database basic scheme looks like:
Each movie has 4 to 5 tags. I want a list of movies similar to any movie based on the tags. A minimum of 2 tags must match.
This query is causing my server problems as I have hundreds of concurrent users at any given time.
I have already created indexes based on execution plan suggestions, and that has made it quicker, but it's still not enough.
Is there anything I could do to make this faster?

I Like to use temp tables, because they can speed up your queries (if used correctly) and make it easier to read. Try using the query below and see if it speeds it up any. There were a few fields (hidden,imdb) that weren't in your schema, so I left them out.
This query may, or may not, be exactly what you are looking for. The point of it is to show you how to use temp tables to increase the performance and improve readability. Some minor tweaks may be necessary.
SELECT TOP 7 [TagID],[MovieTagID],[MovieID]
INTO #MovieTags
FROM [tblMovieTags]
WHERE [MovieID]=12345
SELECT mt.MovieID, COUNT(mt.MovieTagID)
INTO #Movies
FROM #MovieTags mt
INNER JOIN tblMovies m ON m.MovieID=mt.MovieID AND m.Active=1 AND m.Process=1
GROUP BY [MovieID]
HAVING COUNT(mt.MovieTagID) > 1
SELECT TOP 50 * FROM #Movies
DROP TABLE #MovieTags
DROP TABLE #Movies
Edit
Parameterized Queries
You will also want to use parameterized queries, rather than concatenating your values in your SQL string. Check out this short, to the point, blog that explains why you should use parameterized queries. This, combined with the temp table method, should improve your performance significantly.

I want to see if there is some unnecessary processing happening from that query you wrote. Try the following query and let us know if it's faster slower etc And if it's even getting the same data.
I just threw this together so no guarantees on perfect syntax
SELECT TOP 7 [TagID]
INTO #MovieTags
FROM [tblMovieTags]
WHERE [MovieID]=12345
ORDER BY TagID
;cte_movies AS
(
SELECT
mt.MovieID
,mt.TagID
FROM
tblMovieTags mt
INNER JOIN #MovieTags t ON mt.TagId = t.TagId
INNER JOIN tblMovies m ON mt.MovieID = m.MovieID
WHERE
(Hidden=0) AND m.Active=1 AND m.Processed=1
),
cte_movietags AS
(
SELECT
MovieId
,COUNT(MovieId) AS TagCount
FROM
cte_movies
GROUP BY MovieId
)
SELECT
MovieId
FROM
cte_movietags
WHERE
TagCount > 1
ORDER BY
MovieId
GO
DROP TABLE #MovieTags

Formatting Clear and readable SQL queries

I'm writing some SQL queries with several subqueries and lots of joins everywhere, both inside the subquery and the resulting table from the subquery.
We're not using views so that's out of the question.
After writing it I'm looking at it and scratching my head wondering what it's even doing cause I can't follow it.
What kind of formatting do you use to make an attempt to clean up such a mess? Indents perhaps?

With large queries I tend to rely a lot on named result sets using WITH. This allows to define the result set beforehand and it makes the main query simpler. Named results sets may help to make the query plan more efficient as well e.g. postgres stores the result set in a temporary table.
Example:
WITH
cubed_data AS (
SELECT
dimension1_id,
dimension2_id,
dimension3_id,
measure_id,
SUM(value) value
FROM
source_data
GROUP BY
CUBE(dimension1, dimension2, dimension3),
measure
),
dimension1_label AS(
SELECT
dimension1_id,
dimension1_label
FROM
labels
WHERE
object = 'dimension1'
), ...
SELECT
*
FROM
cubed_data
JOIN dimension1_label USING (dimension1_id)
JOIN dimension2_label USING (dimension2_id)
JOIN dimension3_label USING (dimension3_id)
JOIN measure_label USING (measure_id)
The example is a bit contrived but I hope it shows the increase in clarity compared to inline subqueries. Named result sets have been a great help for me when I've been preparing data for OLAP use. Named results sets are also must if you have/want to create recursive queries.
WITH works at least on current versions of Postgres, Oracle and SQL Server

Boy is this a loaded question. :) There are as many ways to do it right as there are smart people on this site. That said, here is how I keep myself sane when building complex sql statements:
select
c.customer_id
,c.customer_name
,o.order_id
,o.order_date
,o.amount_taxable
,od.order_detail_id
,p.product_name
,pt.product_type_name
from
customer c
inner join
order o
on c.customer_id = o.customer_id
inner join
order_detail od
on o.order_id = od.order_id
inner join
product p
on od.product_id = p.product_id
inner join
product_type pt
on p.product_type_id = pt.product_type_id
where
o.order_date between '1/1/2011' and '1/5/2011'
and
(
pt.product_type_name = 'toys'
or
pt.product_type_name like '%kids%'
)
order by
o.order_date
,pt.product_type_name
,p.product_name
If you're interested, I can post/send layouts for inserts, updates and deletes as well as correlated subqueries and complex join predicates.
Does this answer your question?

Generally, people break lines on reserved words, and indent any sub-queries:
SELECT *
FROM tablename
WHERE value in
(SELECT *
FROM tablename2
WHERE condition)
ORDER BY column

In general, I follow a simple hierarchical set of formatting rules. Basically, keywords such as SELECT, FROM, ORDER BY all go on their own line. Each field goes on its own line (in a recursive fashion)
SELECT
F.FIELD1,
F.FIELD2,
F.FIELD3
FROM
FOO F
WHERE
F.FIELD4 IN
(
SELECT
B.BAR
FROM
BAR B
WHERE
B.TYPE = 4
AND B.OTHER = 7
)

Table aliases and simple consistency will get you a long, long way
What looks decent is breaking lines on main keywords SELECT, FROM, WHERE (etc..).
Joins can be trickier, indenting the ON part of joins brings out the important part of it to the front.
Breaking complicated logical expressions (joins and where conditions both) on the same level also helps.
Indenting logically the same level of statement (subqueries, opening brackets, etc)
Capitalize all keywords and standard functions.
Really complex SQL will not shy away from comments - although typically you find these in SQL scripts not dynamic SQL.
EDIT example:
SELECT a.name, SUM(b.tax)
FROM db_prefix_registered_users a
INNER JOIN db_prefix_transactions b
ON a.id = b.user_id
LEFT JOIN db_countries
ON b.paid_from_country_id = c.id
WHERE a.type IN (1, 2, 7) AND
b.date < (SELECT MAX(date)
FROM audit) AND
c.country = 'CH'
So, at the end to sum it up - consistency matters the most.

I like to use something like:
SELECT col1,
col2,
...
FROM
MyTable as T1
INNER JOIN
MyOtherTable as T2
ON t1.col1 = t2.col1
AND t1.col2 = t2.col2
LEFT JOIN
(
SELECT 1,2,3
FROM Someothertable
WHERE somestuff = someotherstuff
) as T3
ON t1.field = t3.field

The only true and right way to format SQL is:
SELECT t.mycolumn AS column1
,t.othercolumn AS column2
,SUM(t.tweedledum) AS column3
FROM table1 t
,(SELECT u.anothercol
,u.memaw /*this is a comment*/
FROM table2 u
,anothertable x
WHERE u.bla = :b1 /*the bla value*/
AND x.uniquecol = :b2 /*the widget id*/
) v
WHERE t.tweedledee = v.anothercol
AND t.hohum = v.memaw
GROUP BY t.mycolumn
,t.othercolumn
HAVING COUNT(*) > 1
;
;)
Seriously though, I like to use WITH clauses (as already suggested) to tame very complicated SQL queries.

Put it in a view so it's easier to visualize, maybe keep a screenshot as part of the documentation. You don't have to save the view or use it for any other purpose.

Indenting certainly but you can also split the subqueries up with comments, make your alias names something really meaningful and specify which subquery they refer to e.g. innerCustomer, outerCustomer.
Common Table Expressions can really help in some cases to break up a query into meaningful sections.

An age-old question with a thousand opinions and no one right answer, and one of my favorites. Here's my two cents.
With regards to subqueries, lately I've found it easier to follow what's going on with "extreme" indenting and adding comments like so:
SELECT mt.Col1, mt.Col2, subQ.Dollars
from MyTable1 mt
inner join (-- Get the dollar total for each SubCol
select SubCol, sum(Dollars) Dollars
from MyTable2
group by SubCol) subQ
on subQ.SubCol = mt.Col1
order by mt.Col2
As for the other cent, I only use upper case on the first word. With pages of run-on queries, it makes it a bit easier to pick out when a new one starts.
Your mileage will, of course, vary.

Wow, alot of responses here, but one thing I haven't seen in many is COMMENTS! I tend to add a lot of comments throughout, especially with large SQL statements. Formatting is important, but well placed and meaningful comments are extremely important, not just for you but the poor soul who needs to maintain the code ;)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Is there a better way to optimize this SQL query? - sql

Related

Tuning Oracle Query for slow select

Determine datatypes of columns - SQL selection

Check the query efficiency

Improving performance on SQL query

Formatting Clear and readable SQL queries

Categories

Resources