SQL help: rewriting a query

How can we reduce the execution time of the query below?
I need help rewriting this SQL query in a more efficient way.
SELECT A.*, C.*, F.*, D.*
FROM TABLE1 A INNER JOIN
TABLE2 C
ON A.CODE = C.CODE INNER JOIN
TABLE3 D
ON A.CODE = D.CODE INNER JOIN
TABLE4 F
ON A.CODE = F.CODE
WHERE D.IND1 = 'N' AND
D.IND2 = 'N' AND
D.EFF_DATE = (SELECT MAX(X.EFF_DATE)
FROM TABLE3 X
WHERE X.CODE = D.CODE AND X.EFF_DATE <= A.EFFECTIVE_DATE
) AND
F.EFF_DATE = (SELECT MAX(Z.EFF_DATE)
FROM TABLE4 Z
WHERE Z.DETAIL_CODE = F.DETAIL_CODE AND Z.EFF_DATE <= A.EFFECTIVE_DATE
)

For performance, I would start with indexes on:
TABLE3(IND1, IND2, CODE, EFF_DATE)
TABLE3(CODE, EFF_DATE)
TABLE1(CODE, EFFECTIVE_DATE)
TABLE2(CODE)
TABLE4(CODE)
TABLE4(DETAIL_CODE, EFF_DATE)
If you have performance issues, though, I suspect your code may be generating unexpected Cartesian products. Debugging that requires much more information. I might suggest asking another question.

If you can find out where the bottlenecks in your query are (sub-queries, joins), that will give you a better idea of what to look at. In the absence of that, take a look at the following:
modify your column projections (i.e. A.*, C.*, F.*, D.*) to only return the columns you need
look at table partitioning for the queries accessing rows based on DATE values (TABLE3.EFF_DATE, TABLE4.EFF_DATE) (http://www.oracle.com/technetwork/issue-archive/2006/06-sep/o56partition-090450.html)
look at adding materialized view(s) either on the entire query OR the sub-queries (https://oracle-base.com/articles/misc/materialized-views)
look at statistics generation if the query plan is not optimal (https://docs.oracle.com/cd/A97630_01/server.920/a96533/stats.htm#26713)
If you can provide an EXPLAIN plan (or Oracle's equivalent), that would be helpful.

Note that because of the conditions in the two sub-queries, every record in your result will have D.EFF_DATE <= A.EFFECTIVE_DATE and F.EFF_DATE <= A.EFFECTIVE_DATE, so I would suggest putting those conditions in the JOIN clauses.
Secondly, analytic functions may give better performance than subqueries:
SELECT *
FROM (
SELECT A.*,C.*,F.*,D.*,
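-- partition by A.EFFECTIVE_DATE too, so each A row is ranked against its own date (as in the original MAX() subqueries)
RANK() OVER (PARTITION BY D.CODE, A.EFFECTIVE_DATE
ORDER BY D.EFF_DATE DESC) AS D_RANK,
RANK() OVER (PARTITION BY F.DETAIL_CODE, A.EFFECTIVE_DATE
ORDER BY F.EFF_DATE DESC) AS F_RANK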
FROM TABLE1 A
INNER JOIN TABLE2 C
ON A.CODE = C.CODE
INNER JOIN TABLE3 D
ON A.CODE = D.CODE
AND D.EFF_DATE <= A.EFFECTIVE_DATE
INNER JOIN TABLE4 F
ON A.CODE = F.CODE
AND F.EFF_DATE <= A.EFFECTIVE_DATE
WHERE D.IND1 = 'N'
AND D.IND2 = 'N'
) ranked
WHERE D_RANK = 1 AND F_RANK = 1
Of course, you still need the right indexes for the optimizer to produce a good execution plan.
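Before trusting an analytic rewrite, it helps to check that it returns the same rows as the correlated MAX() form. Below is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database. SQLite 3.25 or later is required for window functions. The tables t1 and t3 and their rows are hypothetical stand-ins for TABLE1 and TABLE3, and only the D side of the query is modelled. Ranking is done per (code, effective date) so that each A row is compared against its own date.

```python
import sqlite3

# Hypothetical miniature versions of TABLE1 (A) and TABLE3 (D).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (code TEXT, effective_date TEXT);
CREATE TABLE t3 (code TEXT, eff_date TEXT, ind1 TEXT, ind2 TEXT);
INSERT INTO t1 VALUES ('X', '2020-06-01'), ('Y', '2020-01-15');
INSERT INTO t3 VALUES
  ('X', '2020-01-01', 'N', 'N'),
  ('X', '2020-05-01', 'N', 'N'),
  ('X', '2020-07-01', 'N', 'N'),  -- after X's effective date, must be ignored
  ('Y', '2019-12-01', 'N', 'N'),
  ('Y', '2020-02-01', 'N', 'N');  -- after Y's effective date, must be ignored
""")

# Original style: correlated MAX() subquery evaluated per A row.
correlated = conn.execute("""
SELECT a.code, a.effective_date, d.eff_date
FROM t1 a JOIN t3 d ON a.code = d.code
WHERE d.ind1 = 'N' AND d.ind2 = 'N'
  AND d.eff_date = (SELECT MAX(x.eff_date) FROM t3 x
                    WHERE x.code = d.code AND x.eff_date <= a.effective_date)
ORDER BY a.code
""").fetchall()

# Analytic style: rank the candidate rows per A row and keep rank 1.
analytic = conn.execute("""
SELECT code, effective_date, eff_date
FROM (
  SELECT a.code, a.effective_date, d.eff_date,
         RANK() OVER (PARTITION BY d.code, a.effective_date
                      ORDER BY d.eff_date DESC) AS d_rank
  FROM t1 a JOIN t3 d
    ON a.code = d.code AND d.eff_date <= a.effective_date
  WHERE d.ind1 = 'N' AND d.ind2 = 'N'
) ranked
WHERE d_rank = 1
ORDER BY code
""").fetchall()

print(correlated)
print(analytic)
```

Both forms agree here because every row has IND1 = IND2 = 'N'. The analytic version filters on IND1/IND2 before ranking. The original MAX() subquery looks at all TABLE3 rows. So the two can differ when the latest row for a code carries a flag other than 'N'.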

Another thing to consider is the total number of columns your query returns: you seem to be selecting all the columns from four tables.
We found that our complex queries ran in under a second when selecting only a few columns but took orders of magnitude longer when selecting many columns.
Question why you need so many columns in your result set.

Related

Sum fields of an Inner join

How can I sum two fields that come from an inner join?
I have this code:
select
SUM(ACT.NumberOfPlants ) AS NumberOfPlants,
SUM(ACT.NumOfJornales) AS NumberOfJornals
FROM dbo.AGRMastPlanPerformance MPR (NOLOCK)
INNER JOIN GENRegion GR ON (GR.intGENRegionKey = MPR.intGENRegionLink )
INNER JOIN AGRDetPlanPerformance DPR (NOLOCK) ON
(DPR.intAGRMastPlanPerformanceLink =
MPR.intAGRMastPlanPerformanceKey)
INNER JOIN vwGENPredios P (NOLOCK) ON ( DPR.intGENPredioLink =
P.intGENPredioKey )
INNER JOIN AGRSubActivity SA (NOLOCK) ON (SA.intAGRSubActivityKey =
DPR.intAGRSubActivityLink)
LEFT JOIN (SELECT RA.intGENPredioLink, AR.intAGRActividadLink,
AR.intAGRSubActividadLink, SUM(AR.decNoPlantas) AS
intPlantasTrabajads, SUM(AR.decNoPersonas) AS NumOfJornales,
SUM(AR.decNoPlants) AS NumberOfPlants
FROM AGRRecordActivity RA WITH (NOLOCK)
INNER JOIN AGRActividadRealizada AR WITH (NOLOCK) ON
(AR.intAGRRegistroActividadLink = RA.intAGRRegistroActividadKey AND
AR.bitActivo = 1)
INNER JOIN AGRSubActividad SA (NOLOCK) ON (SA.intAGRSubActividadKey
= AR.intAGRSubActividadLink AND SA.bitEnabled = 1)
WHERE RA.bitActive = 1 AND
AR.bitActive = 1 AND
RA.intAGRTractorsCrewsLink IN(2)
GROUP BY RA.intGENPredioLink,
AR.decNoPersons,
AR.decNoPlants,
AR.intAGRAActivityLink,
AR.intAGRSubActividadLink) ACT ON (ACT.intGENPredioLink IN(
DPR.intGENPredioLink) AND
ACT.intAGRAActivityLink IN( DPR.intAGRAActivityLink) AND
ACT.intAGRSubActivityLink IN( DPR.intAGRSubActivityLink))
WHERE
MPR.intAGRMastPlanPerformanceKey IN(4) AND
DPR.intAGRSubActivityLink IN( 1153)
GROUP BY
P.vchRegion,
ACT.NumberOfFloors,
ACT.NumOfJournals
ORDER BY ACT.NumberOfFloors DESC
However, it does not produce the complete sum. It returns the column values row by row (one partial sum per group) instead of a single total for the whole column.
For example, the query returns these results:
What I expect is the final totals: the sum for NumberOfPlants should be 163,237 and the sum for NumberOfJornals should be 61.
How can I do this?
First of all, the (NOLOCK) hints are probably not accomplishing the benefit you hope for. NOLOCK is not an automatic "go faster" option, and if such an option existed you can be sure it would already be enabled. It can help in some situations, but it works by allowing dirty reads (uncommitted, possibly inconsistent data), and the situations where it is most likely to make any improvement are the same situations where the risk of reading bad data is highest.
With that out of the way: given how much code is in the question, a general explanation and solution for you to adapt will serve you better.
The issue here is GROUP BY. When you use a GROUP BY in SQL, you're telling the database you want to see separate results per group for any aggregate functions like SUM() (and COUNT(), AVG(), MAX(), etc).
So if you have this:
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
You will get a separate row per ColumnA group, even though it's not in the SELECT list.
If you don't really care about that, you can do one of two things:
1. Remove the GROUP BY. If there are no grouped columns in the SELECT list, the GROUP BY clause is probably not accomplishing anything important.
2. Nest the query. If option 1 is somehow not possible (say, the original is actually a view), you could do this:
SELECT SUM(SumB)
FROM (
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
) t
Note in both cases any JOIN is irrelevant to the issue.
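Here is a minimal sketch of the difference, using Python's sqlite3 module with a hypothetical table t standing in for [Table]. It shows that GROUP BY ColumnA yields one partial sum per group even though ColumnA is not selected. Both remedies, dropping the GROUP BY or summing the grouped output, return a single total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (ColumnA TEXT, ColumnB INTEGER);
INSERT INTO t VALUES ('x', 10), ('x', 5), ('y', 7), ('z', 3);
""")

# GROUP BY ColumnA: one partial sum per group, even though ColumnA is not selected.
per_group = [row[0] for row in conn.execute(
    "SELECT SUM(ColumnB) AS SumB FROM t GROUP BY ColumnA ORDER BY ColumnA")]

# Option 1: drop the GROUP BY to get a single grand total.
total = conn.execute("SELECT SUM(ColumnB) FROM t").fetchone()[0]

# Option 2: keep the grouped query unchanged and sum its output.
nested = conn.execute("""
SELECT SUM(SumB) FROM (
  SELECT SUM(ColumnB) AS SumB FROM t GROUP BY ColumnA
) sub
""").fetchone()[0]

print(per_group, total, nested)
```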

How to join three tables in a parent, child, grandchild relationship and get all records related to the parent

I have three tables:
articles(id,title,message)
comments(id,article_id,commentedUser_id,comment)
comment_likes(id, likedUser_id, comment_id, action_like, action_dislike)
I want to show comments.id, comments.commentedUser_id, comments.comment, ( Select count(action_like) where action_like="like") as likes and comment_id=comments.id where comments.article_id=article.id
I want to count all likes (action_like) related to each comment, and show all comments of each article.
action_like has only two possible values: NULL or 'like'.
SELECT c.id , c.CommentedUser_id , c.comment , (cl.COUNT(action_like) WHERE action_like='like' AND comment_id='c.id') as likes
FROM comment_likes as cl
LEFT JOIN comments as c ON c.id=cl.comment_id
WHERE c.article_id=article.id
It returns nothing. I know this is the wrong way; it's just to show what I want.
I guess you are looking for something like the query below. It returns the LIKE count per article and comment.
SELECT
a.id article_id,
c.id comment_id,
c.CommentedUser_id ,
c.comment ,
COUNT (CASE WHEN action_like='like' THEN 1 ELSE NULL END) as likes
FROM articles a
INNER JOIN comments c ON a.id = c.article_id
LEFT JOIN comment_likes as cl ON c.id=cl.comment_id
GROUP BY a.id,c.id , c.CommentedUser_id , c.comment
If you need results for a specific article, add a WHERE clause before the GROUP BY, e.g. WHERE a.id = N.
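This is a minimal sketch of the LEFT JOIN + COUNT(CASE ...) pattern in Python's sqlite3, using the question's table layout with hypothetical rows. It shows that comments with no likes at all, or only non-'like' rows, still appear with a count of 0. COUNT ignores the NULL produced when the CASE does not match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, message TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, article_id INTEGER,
                       commentedUser_id INTEGER, comment TEXT);
CREATE TABLE comment_likes (id INTEGER PRIMARY KEY, likedUser_id INTEGER,
                            comment_id INTEGER, action_like TEXT,
                            action_dislike TEXT);
INSERT INTO articles VALUES (1, 'First', 'hello');
INSERT INTO comments VALUES (10, 1, 100, 'nice'), (11, 1, 101, 'meh'), (12, 1, 102, 'first!');
INSERT INTO comment_likes VALUES
  (1, 200, 10, 'like', NULL),
  (2, 201, 10, 'like', NULL),
  (3, 202, 10, NULL, 'dislike'),
  (4, 203, 11, NULL, 'dislike');   -- comment 12 has no like rows at all
""")

rows = conn.execute("""
SELECT a.id AS article_id, c.id AS comment_id, c.commentedUser_id, c.comment,
       COUNT(CASE WHEN cl.action_like = 'like' THEN 1 END) AS likes
FROM articles a
INNER JOIN comments c ON a.id = c.article_id
LEFT JOIN comment_likes cl ON c.id = cl.comment_id   -- keeps comments without likes
GROUP BY a.id, c.id, c.commentedUser_id, c.comment
ORDER BY c.id
""").fetchall()
print(rows)
```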
I would recommend a correlated subquery for this:
SELECT a.id as article_id, c.id as comment_id,
c.CommentedUser_id, c.comment,
(SELECT COUNT(*)
FROM comment_likes cl
WHERE cl.comment_id = c.id AND
cl.action_like = 'like'
) as num_likes
FROM articles a INNER JOIN
comments c
ON a.id = c.article_id;
This is a case where a correlated subquery often has noticeably better performance than an outer aggregation, particularly with the right index. The index you want is on comment_likes(comment_id, action_like).
Why is the performance better? Most databases will implement the group by by sorting the data. Sorting is an expensive operation that grows super-linearly -- that is, twice as much data takes more than twice as long to sort.
The correlated subquery breaks the problem down into smaller pieces. In fact, no sorting should be necessary -- just scanning the index and counting the matching rows.
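The following is a minimal sketch in Python's sqlite3 with hypothetical rows. It runs the correlated-subquery version with the suggested comment_likes(comment_id, action_like) index (named idx_cl_comment_like here). It also uses EXPLAIN QUERY PLAN to confirm that SQLite answers each subquery from that index. Other engines report plans differently, and whether this beats the GROUP BY form on real data is something to measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY);
CREATE TABLE comments (id INTEGER PRIMARY KEY, article_id INTEGER,
                       commentedUser_id INTEGER, comment TEXT);
CREATE TABLE comment_likes (id INTEGER PRIMARY KEY, comment_id INTEGER,
                            action_like TEXT);
-- The index suggested above: each subquery becomes a short index range count.
CREATE INDEX idx_cl_comment_like ON comment_likes (comment_id, action_like);
INSERT INTO articles VALUES (1);
INSERT INTO comments VALUES (10, 1, 100, 'nice'), (11, 1, 101, 'meh'), (12, 1, 102, 'first!');
INSERT INTO comment_likes (comment_id, action_like) VALUES
  (10, 'like'), (10, 'like'), (10, NULL), (11, NULL);
""")

QUERY = """
SELECT a.id AS article_id, c.id AS comment_id, c.commentedUser_id, c.comment,
       (SELECT COUNT(*) FROM comment_likes cl
        WHERE cl.comment_id = c.id AND cl.action_like = 'like') AS num_likes
FROM articles a INNER JOIN comments c ON a.id = c.article_id
ORDER BY c.id
"""
correlated = conn.execute(QUERY).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan_details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
print(correlated)
print(plan_details)
```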

How to exclude rows with specific IDs in an SQL query (PHP)

I have a simple shop in PHP and I need to hide some products on the management page. How can I exclude them in the SQL query?
Here is my query:
$query = "SELECT a.*,
a.user as puser,
a.id as pid,
b.date as date,
b.price as price,
b.job_id as job_id,
b.masterkey as masterkey
FROM table_shop a
INNER JOIN table_shop_s b ON a.id = b.buyid
WHERE b.payok = 1
ORDER BY buyid";
I need to exclude rows with product_id 3 and 4 from table table_shop_s in this query.
WHERE b.payok = 1 AND b.product_id != 3 AND b.product_id != 4
Simply use NOT IN (to exclude specific IDs), combined with AND. Use the following:
$query = "SELECT a.*,
a.user as puser,
a.id as pid,
b.date as date,
b.price as price,
b.job_id as job_id,
b.masterkey as masterkey
FROM table_shop a
INNER JOIN table_shop_s b ON a.id = b.buyid
WHERE b.payok = 1
AND b.product_id NOT IN (3,4)
ORDER BY buyid";
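The product_id column lives in table_shop_s (alias b), so the NOT IN condition belongs on b.product_id. Below is a minimal sketch in Python's sqlite3 with a trimmed, hypothetical version of the two tables. It also shows a standard SQL pitfall: a row whose product_id is NULL is dropped as well, because `NULL NOT IN (3, 4)` evaluates to NULL rather than true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_shop (id INTEGER PRIMARY KEY, user TEXT);
CREATE TABLE table_shop_s (buyid INTEGER, product_id INTEGER, payok INTEGER, price REAL);
INSERT INTO table_shop VALUES (1, 'ann'), (2, 'bob'), (3, 'cy'), (4, 'dee');
INSERT INTO table_shop_s VALUES
  (1, 3, 1, 9.5),     -- product 3: hidden
  (2, 4, 1, 4.0),     -- product 4: hidden
  (3, 7, 1, 12.0),    -- kept
  (4, NULL, 1, 1.0);  -- NULL product_id: NOT IN drops this row too
""")

rows = conn.execute("""
SELECT a.id AS pid, a.user AS puser, b.product_id, b.price
FROM table_shop a
INNER JOIN table_shop_s b ON a.id = b.buyid
WHERE b.payok = 1
  AND b.product_id NOT IN (3, 4)
ORDER BY b.buyid
""").fetchall()
print(rows)
```

If rows with a NULL product_id should stay visible, add `OR b.product_id IS NULL` in parentheses next to the NOT IN condition.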
Other answers have noted that you would probably use "product_id NOT IN (3,4)", which works, but it is a short-term fix. Extend the thinking a bit: it's two products now, but what if in the future you have more you want to hide? Do you then change all your queries and risk missing one?
My suggestion would be to update your product table. Add a column such as ExcludeFlag and set it to 1 or 0 (1 = yes, exclude; 0 = OK, leave it alone). Then join your shop detail table to the products table and exclude rows where this flag is set.
Also, you only need AS on a column when you are changing its result name (b.date AS date changes nothing). And since a.* already returns ALL columns of table "a", do you really need the extra "a.user as puser, a.id as pid"?
Something like:
SELECT
a.*,
b.date,
b.price,
b.job_id,
b.masterkey
FROM
table_shop a
INNER JOIN table_shop_s b
ON a.id = b.buyid
AND b.payok = 1
INNER JOIN YourProductTable ypt
on b.ProductID = ypt.ProductID
AND ypt.ExcludeFlag = 0
ORDER BY
a.id
Notice the extra join, which keeps only the rows whose flag is NOT set.
It is also good practice to give tables aliases that reflect their purpose rather than just "a" and "b", as with the long table name YourProductTable aliased as ypt.
I also changed the ORDER BY to a.id, since "a" is the primary table in your query. Because a.id = b.buyid, it gives the same key order anyway, and a.id is probably indexed on your "a" table. I assume table_shop_s already has an index on (buyid), but once it holds a lot of records, an index on (buyid, payok) may help because it matches both parts of your JOIN criteria.
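Here is a minimal sketch of the flag-table idea in Python's sqlite3. YourProductTable and ExcludeFlag are the hypothetical names from the answer, and the rows are made up. Hiding another product then becomes a data change, and the query text never changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_shop (id INTEGER PRIMARY KEY, user TEXT);
CREATE TABLE table_shop_s (buyid INTEGER, ProductID INTEGER, payok INTEGER);
CREATE TABLE YourProductTable (ProductID INTEGER PRIMARY KEY,
                               ExcludeFlag INTEGER NOT NULL DEFAULT 0);
INSERT INTO table_shop VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO table_shop_s VALUES (1, 3, 1), (2, 4, 1), (3, 7, 1);
INSERT INTO YourProductTable VALUES (3, 1), (4, 1), (7, 0);  -- 3 and 4 hidden
""")

QUERY = """
SELECT a.id, b.ProductID
FROM table_shop a
INNER JOIN table_shop_s b ON a.id = b.buyid AND b.payok = 1
INNER JOIN YourProductTable ypt
        ON b.ProductID = ypt.ProductID AND ypt.ExcludeFlag = 0
ORDER BY a.id
"""
before = conn.execute(QUERY).fetchall()

# Hide product 7 by flipping its flag; QUERY itself is untouched.
conn.execute("UPDATE YourProductTable SET ExcludeFlag = 1 WHERE ProductID = 7")
after = conn.execute(QUERY).fetchall()
print(before, after)
```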

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the GROUP BY on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
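The inflated sums come from the Cartesian product. Each B row for an A row is paired with every matching C row, so SUM(B.cost) is multiplied by the number of C rows and vice versa. Below is a minimal sketch in Python's sqlite3, with hypothetical tables A, B and C shaped like the question's. It contrasts the naive join with correlated subqueries that each aggregate only their own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY, keyword_id INTEGER);
CREATE TABLE B (a_id INTEGER, cost INTEGER);
CREATE TABLE C (keyword_id INTEGER, clicks INTEGER);
INSERT INTO A VALUES (1, 100);
INSERT INTO B VALUES (1, 5), (1, 7);                 -- true total cost: 12
INSERT INTO C VALUES (100, 2), (100, 3), (100, 4);   -- true total clicks: 9
""")

# Naive join: 2 B rows x 3 C rows = 6 joined rows, so both sums are inflated.
naive = conn.execute("""
SELECT A.id, SUM(B.cost), SUM(C.clicks)
FROM A
JOIN B ON B.a_id = A.id
LEFT JOIN C ON C.keyword_id = A.keyword_id
GROUP BY A.id
""").fetchall()

# Correlated subqueries: each sum only sees its own table's rows.
correlated = conn.execute("""
SELECT A.id,
       (SELECT SUM(cost) FROM B WHERE B.a_id = A.id) AS cost,
       (SELECT SUM(clicks) FROM C WHERE C.keyword_id = A.keyword_id) AS clicks
FROM A
""").fetchall()
print(naive, correlated)
```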
I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
-- B and C are already aggregated to one row per key, so no outer GROUP BY is needed
where B.cost > 10
I'm not sure whether it's more or less efficient than Gordon's answer, but when I ran both queries this one seemed faster: 27 s vs. 2 min 35 s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
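Below is a minimal sketch of the pre-aggregation approach in Python's sqlite3, with hypothetical tables A, B and C. Each derived table returns one row per key before the join, so nothing is multiplied. Because there is no outer GROUP BY, the filter on the summed cost can be a plain WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY, keyword_id INTEGER);
CREATE TABLE B (a_id INTEGER, cost INTEGER);
CREATE TABLE C (keyword_id INTEGER, clicks INTEGER);
INSERT INTO A VALUES (1, 100), (2, 200);
INSERT INTO B VALUES (1, 5), (1, 7), (2, 3);
INSERT INTO C VALUES (100, 2), (100, 3), (100, 4), (200, 1);
""")

rows = conn.execute("""
SELECT A.id, b_sum.cost, c_sum.clicks
FROM A
JOIN (SELECT a_id, SUM(cost) AS cost FROM B GROUP BY a_id) b_sum
  ON A.id = b_sum.a_id
LEFT JOIN (SELECT keyword_id, SUM(clicks) AS clicks FROM C GROUP BY keyword_id) c_sum
  ON A.keyword_id = c_sum.keyword_id
WHERE b_sum.cost > 10   -- plays the role of HAVING SUM(cost) > 10
ORDER BY A.id
""").fetchall()
print(rows)
```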
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

SQL: Speed Improvement - Left Join on cond1 or cond2

SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name)
)
Two tables that are basically the same
I don't have access to the table structure or data input (thus no cleaning up primary keys)
Sometimes the user_id is populated in one and not the other
Sometimes names are equal, sometimes they are not
I've found that I can get the most of the data by matching on user_id or the first/last names. I'm using the ' ' between the names to avoid cases where one user has the same first name as another's last name and both are missing the other field (unlikely, but plausible).
This query runs in 33,000 ms, whereas each join condition on its own runs in about 200 ms.
I've been up late and can't think straight right now.
I'm thinking that I could do a UNION and only query by name where a user_id does not exist (the default join is the user_id, if a user_id doesn't exist then I want to join by the name)
Here are some free points for anyone who wants to help.
Please don't ask for the execution plan.
Looks like you can easily avoid the string concatenation:
OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name)
Change it to:
OR ( a.f_name = b.f_name AND a.l_name = b.l_name)
Rather than concatenating first and last name and comparing the results, try comparing them individually. Assuming you have indexes on the first-name and last-name columns (and you should create them if you don't), this improves the chances of those indexes being used.
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR (a.f_name = b.f_name and a.l_name = b.l_name)
)
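Comparing the columns separately also avoids a false match that the ' ' separator cannot prevent when a name itself contains a space. Here is a minimal sketch in Python's sqlite3 with hypothetical names. "Mary Ann" + "Smith" and "Mary" + "Ann Smith" concatenate to the same string, but they do not match column by column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE current_tbl (user_id INTEGER, f_name TEXT, l_name TEXT);
CREATE TABLE import_tbl (user_id INTEGER, f_name TEXT, l_name TEXT);
INSERT INTO current_tbl VALUES (NULL, 'Mary Ann', 'Smith');
INSERT INTO import_tbl VALUES (NULL, 'Mary', 'Ann Smith');
""")

# Concatenated comparison: both sides become 'Mary Ann Smith'.
concat_matches = conn.execute("""
SELECT COUNT(*) FROM current_tbl a JOIN import_tbl b
  ON a.f_name || ' ' || a.l_name = b.f_name || ' ' || b.l_name
""").fetchone()[0]

# Column-by-column comparison: first names differ, so no match.
column_matches = conn.execute("""
SELECT COUNT(*) FROM current_tbl a JOIN import_tbl b
  ON a.f_name = b.f_name AND a.l_name = b.l_name
""").fetchone()[0]
print(concat_matches, column_matches)
```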
If the other suggestions don't give a major speedup, your real problem may be that the best query plan is different for each of your two join conditions. In that situation you would want to run two queries and merge the results in some way, which is likely to make your query much, much uglier.
One obscure trick that I have used for that kind of situation is to do a GROUP BY off of a UNION ALL query. The idea looks like this:
SELECT a_field1, a_field2, ...,
MAX(b_field1) as b_field1, MAX(b_field2) as b_field2, ...
FROM (
SELECT a.field_1 as a_field1, ..., b.field1 as b_field1, ...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.user_id = b.user_id
UNION ALL
SELECT a.field_1 as a_field1, ..., b.field1 as b_field1, ...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.f_name = b.f_name AND a.l_name = b.l_name
) merged
GROUP BY a_field1, a_field2, ...
And now the database can do each of the two joins using the most efficient plan.
(A warning about one drawback of this approach: if a row in current_tbl joins to multiple rows in import_tbl, you'll wind up merging data in a very odd way, because MAX() is taken column by column.)
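Below is a minimal runnable version of the UNION ALL + GROUP BY trick in Python's sqlite3. The email column and all rows are hypothetical, and one import column stands in for the "b_field" list. Each branch uses one join condition. MAX() ignores the NULLs produced by the branch that did not match, so each current_tbl row ends up with whichever match exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE current_tbl (id INTEGER PRIMARY KEY, user_id INTEGER, f_name TEXT, l_name TEXT);
CREATE TABLE import_tbl (user_id INTEGER, f_name TEXT, l_name TEXT, email TEXT);
INSERT INTO current_tbl VALUES
  (1, 42, 'Ann', 'Lee'),     -- matches by user_id only
  (2, NULL, 'Bo', 'Chen'),   -- matches by name only
  (3, NULL, 'Cy', 'Diaz');   -- no match at all
INSERT INTO import_tbl VALUES
  (42, 'Annie', 'Lee', 'ann@example.com'),
  (NULL, 'Bo', 'Chen', 'bo@example.com');
""")

rows = conn.execute("""
SELECT a_id, MAX(b_email) AS b_email
FROM (
  SELECT a.id AS a_id, b.email AS b_email
  FROM current_tbl a LEFT JOIN import_tbl b ON a.user_id = b.user_id
  UNION ALL
  SELECT a.id AS a_id, b.email AS b_email
  FROM current_tbl a LEFT JOIN import_tbl b
    ON a.f_name = b.f_name AND a.l_name = b.l_name
) merged
GROUP BY a_id
ORDER BY a_id
""").fetchall()
print(rows)
```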
Incidental random performance tip. Unless you have reason to believe that there are potential duplicate rows, avoid DISTINCT. It forces an implicit GROUP BY, which can be expensive.
I don't really understand why you're concatenating those strings. Seems like that's where your slowdown would be. Does this work instead?
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name = b.f_name AND a.l_name = b.l_name)
)
Here is Yet Another Ugly Way To Do It.
SELECT a.*
, CASE WHEN b.user_id IS NULL THEN c.field1 ELSE b.field1 END as b_field1
, CASE WHEN b.user_id IS NULL THEN c.field2 ELSE b.field2 END as b_field2
...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.user_id = b.user_id
LEFT JOIN import_tbl c
ON a.f_name = c.f_name AND a.l_name = c.l_name;
This avoids any GROUP BY, and also handles conflicting matches in a somewhat reasonable way.
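Below is a minimal sketch of the double LEFT JOIN version in Python's sqlite3, using the same kind of hypothetical data (an email column standing in for field1, field2, ...). The CASE takes the user_id match when there is one and falls back to the name match otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE current_tbl (id INTEGER PRIMARY KEY, user_id INTEGER, f_name TEXT, l_name TEXT);
CREATE TABLE import_tbl (user_id INTEGER, f_name TEXT, l_name TEXT, email TEXT);
INSERT INTO current_tbl VALUES (1, 42, 'Ann', 'Lee'), (2, NULL, 'Bo', 'Chen'), (3, NULL, 'Cy', 'Diaz');
INSERT INTO import_tbl VALUES
  (42, 'Annie', 'Lee', 'ann@example.com'),
  (NULL, 'Bo', 'Chen', 'bo@example.com');
""")

rows = conn.execute("""
SELECT a.id,
       -- b.user_id IS NULL means the user_id join found nothing; use the name match
       CASE WHEN b.user_id IS NULL THEN c.email ELSE b.email END AS b_email
FROM current_tbl a
LEFT JOIN import_tbl b ON a.user_id = b.user_id
LEFT JOIN import_tbl c ON a.f_name = c.f_name AND a.l_name = c.l_name
ORDER BY a.id
""").fetchall()
print(rows)
```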
Try using JOIN hints:
http://msdn.microsoft.com/en-us/library/ms173815.aspx
We were encountering the same kind of behavior with one of our queries. As a last resort we added the LOOP hint, and the query ran much faster.
It's important to note that Microsoft says this about JOIN hints:
Because the SQL Server query optimizer typically selects the best execution plan for a query, we recommend that hints, including <join_hint>, be used only as a last resort by experienced developers and database administrators.
My boss at my last job, I swear, thought that using UNION was ALWAYS FASTER THAN OR.
For example, instead of writing
Select * from employees Where Employee_id = 12 or employee_id = 47
he would write (and have me write)
Select * from employees Where employee_id = 12
UNION
Select * from employees Where employee_id = 47
The SQL Server optimizer did agree that this was the right thing to do in SOME situations. I have a friend who works on the SQL Server team at Microsoft; I emailed him about this and he told me that my statistics were out of date, or something along those lines.
I never really got a good answer on WHY the unions are faster; it seems REALLY counter-intuitive.
I'm not recommending you DO this, but in some situations it can help.
Two more things. First, GET RID OF THE DISTINCT CLAUSE unless you absolutely need it.
Second, and more importantly, you can easily get rid of the concatenation in your join, for example like this (pardon my lack of MySQL knowledge):
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name = b.f_name and a.l_name = b.l_name)
)
I've run some tests at work in a similar situation that showed a 10x performance improvement from getting rid of the simple concatenation in the join.