How to optimize the query? t-sql - sql

This query runs for about 3 minutes and returns 7279 rows:
SELECT identity(int,1,1) as id, c.client_code, a.account_num,
c.client_short_name, u.uso, us.fio, null as new, null as txt
INTO #ttable
FROM accounts a INNER JOIN Clients c ON
c.id = a.client_id INNER JOIN Uso u ON c.uso_id = u.uso_id INNER JOIN
Magazin m ON a.account_id = m.account_id LEFT JOIN Users us ON
m.user_id = us.user_id
WHERE m.status_id IN ('1','5','9') AND m.account_new_num is null
AND u.branch_id = @branch_id
ORDER BY c.client_code;
The type of 'client_code' field is VARCHAR(6).
Is it possible to somehow optimize this query?

Insert the records into the temporary table without an ORDER BY clause, and then sort them by client_code when you select from it. Hope this helps.
Create table #temp
(
your columns...
)
and insert the records into this table without the ORDER BY clause. Then run the SELECT with the ORDER BY clause.

Do you have indexes set up for your tables? An index on foreign key columns as well as Magazin.status might help.

Make sure there is an index on every field used in the JOINs and in the WHERE clause
If one of the tables you select from is actually a view, the problem may be in the performance of that view.
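A minimal sketch of how to check whether an index is picked up for a filter like the one on Magazin.status_id. It uses Python's built-in sqlite3 module as a stand-in, which is an assumption: the question is about T-SQL, where you would inspect the execution plan instead. The index name idx_magazin_status and the cut-down table definition are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down, hypothetical version of the Magazin table from the question.
    CREATE TABLE magazin (
        account_id INTEGER,
        user_id INTEGER,
        status_id TEXT,
        account_new_num TEXT
    );
    -- Hypothetical index on the column filtered in the WHERE clause.
    CREATE INDEX idx_magazin_status ON magazin (status_id);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT account_id FROM magazin WHERE status_id = '1'"
).fetchall()

# The last column of each plan row is a human-readable description.
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)

# True when the planner chose the index instead of a full table scan.
uses_index = any("idx_magazin_status" in detail for detail in plan_details)
```

If the plan shows a full scan rather than the index, that column is a candidate for indexing.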

Always try to list tables earlier if they are referenced in the where clause - it cuts off row combinations as early as possible. In this case, the Magazin table has some predicates in the where clause, but is listed way down in the tables list. This means that all the other joins have to be made before the Magazin rows can be filtered - possibly millions of extra rows.
Try this (and let us know how it went):
SELECT ...
INTO #ttable
FROM accounts a
INNER JOIN Magazin m ON a.account_id = m.account_id
INNER JOIN Clients c ON c.id = a.client_id
INNER JOIN Uso u ON c.uso_id = u.uso_id
LEFT JOIN Users us ON m.user_id = us.user_id
WHERE m.status_id IN ('1','5','9')
AND m.account_new_num is null
AND u.branch_id = @branch_id
ORDER BY c.client_code;
This kind of optimization can greatly improve query performance.

Related

Why joining two tables onto parent table returns empty query

I am new to postgres and this schema is a work in progress, so there are a lot of redundant columns for data checks.
Why do I receive an empty query when I join "penn_survey" and "brazil_legacy_survey" onto the "visit" table? If I join only "penn_survey" onto the "visit" table the query works correctly and vice versa for "brazil_legacy_survey". Yet when I try join both survey tables back onto the "visit" table in a single query it returns an empty table.
Do I need different column headers or separate foreign key IDs? Is my understanding of the structure for join
-- Data Check
select v.date, v.site, ps.site, ps.date, b.site, b.date
from visit v
join penn_survey ps on ps.visit_id = v.visit_id
join brazil_legacy_survey b on b.visit_id = v.visit_id
-- Joining only penn_survey table
select v.date, v.site, ps.site, ps.date
from visit v
join penn_survey ps on ps.visit_id = v.visit_id
Expanding on the comments: using a left join to both surveys should work, but it would also return visits that match neither survey unless you add a where clause such as ps.visit_id is not null or b.visit_id is not null. It is probably easier to just UNION two queries and control your conditions in each one.
select 'penn' as survey, v.date as visit_date, v.site as visit_site,
ps.date as survey_date, ps.site as survey_site
from visit v
join penn_survey ps on ps.visit_id = v.visit_id
union
select 'brazil', v.date, v.site, b.date, b.site
from visit v
join brazil_legacy_survey b on b.visit_id = v.visit_id ;
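A small sketch of why the double inner join can come back empty, assuming each visit has a row in only one of the two survey tables (the sample data is made up, and SQLite via Python's sqlite3 stands in for Postgres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, date TEXT, site TEXT);
    CREATE TABLE penn_survey (visit_id INTEGER, date TEXT, site TEXT);
    CREATE TABLE brazil_legacy_survey (visit_id INTEGER, date TEXT, site TEXT);
    -- Assumed data: visit 1 only has a penn survey, visit 2 only a brazil one.
    INSERT INTO visit VALUES (1, '2020-01-01', 'penn_a'), (2, '2020-02-01', 'brazil_a');
    INSERT INTO penn_survey VALUES (1, '2020-01-01', 'penn_a');
    INSERT INTO brazil_legacy_survey VALUES (2, '2020-02-01', 'brazil_a');
""")

# Inner join to both: a visit must appear in BOTH surveys to survive.
both_inner = conn.execute("""
    SELECT v.visit_id FROM visit v
    JOIN penn_survey ps ON ps.visit_id = v.visit_id
    JOIN brazil_legacy_survey b ON b.visit_id = v.visit_id
""").fetchall()

# Left join to both: every visit is kept, missing sides become NULL.
both_left = conn.execute("""
    SELECT v.visit_id, ps.site, b.site FROM visit v
    LEFT JOIN penn_survey ps ON ps.visit_id = v.visit_id
    LEFT JOIN brazil_legacy_survey b ON b.visit_id = v.visit_id
    ORDER BY v.visit_id
""").fetchall()

# UNION of two single-survey queries, sorted in Python for a stable order.
unioned = sorted(conn.execute("""
    SELECT 'penn' AS survey, v.visit_id FROM visit v
    JOIN penn_survey ps ON ps.visit_id = v.visit_id
    UNION
    SELECT 'brazil', v.visit_id FROM visit v
    JOIN brazil_legacy_survey b ON b.visit_id = v.visit_id
""").fetchall())

print(both_inner, both_left, unioned)
```

With this data the double inner join returns nothing, while the left joins and the UNION each return both visits.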

Multiple joins on the same table, Results Not Returned if Join Field is NULL

SELECT organizations_organization.code as organization,
core_user.email as Created_By,
assinees.email as Assigned_To,
from tickets_ticket
JOIN organizations_organization on tickets_ticket.organization_id = organizations_organization.id
JOIN core_user on tickets_ticket.created_by_id = core_user.id
Left JOIN core_user as assinees on assinees.id = tickets_ticket.currently_assigned_to_id
In the above query, if tickets_ticket.currently_assigned_to_id is null, then that row from tickets_ticket is not returned.
> Records In tickets_ticket = 109
> Returned Records = 4 (out of 109, 4 rows have a value for currently_assigned_to_id; the remaining 105 are null)
> Expected Records = 109 (with null set for Assigned_To)
Note I am trying to achieve multiple joins on the same table
LEFT JOIN can not kill output records,
your problem is here:
JOIN core_user on tickets_ticket.created_by_id = core_user.id
this join kills non-matching records
try
LEFT JOIN core_user on tickets_ticket.created_by_id = core_user.id
First, this is not the actual code you are running. There is a comma before the from clause that would cause a syntax error. If you have left out a where clause, then that would explain why you are seeing no rows.
When using left joins, conditions on the first table go in the where clause. Conditions on subsequent tables go in the on clause.
That said, a where clause may not be the problem. I would suggest using left joins from the first table onward -- along with table aliases:
select oo.code as organization, cu.email as Created_By, a.email as Assigned_To
from tickets_ticket tt left join
organizations_organization oo
on tt.organization_id = oo.id left join
core_user cu
on tt.created_by_id = cu.id left join
core_user a
on a.id = tt.currently_assigned_to_id ;
I suspect that you have data in your data model that is unexpected -- perhaps bad organizations, perhaps bad created_by_id. Keep all the tickets to see what is missing.
That said, you should probably be including something like tt.id in the result set to identify the specific ticket.
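To illustrate the point that a LEFT JOIN never removes tickets but an inner JOIN on another column can, here is a rough sketch with made-up rows (SQLite through Python's sqlite3 is an assumption; the original database engine is not stated). Ticket 3 has no creator, which is a guess at the kind of unexpected data mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations_organization (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE core_user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE tickets_ticket (
        id INTEGER PRIMARY KEY,
        organization_id INTEGER,
        created_by_id INTEGER,
        currently_assigned_to_id INTEGER
    );
    INSERT INTO organizations_organization VALUES (1, 'ORG');
    INSERT INTO core_user VALUES (1, 'alice@example.com'), (2, 'bob@example.com');
    -- Ticket 2 is unassigned; ticket 3 has no creator either (assumed bad data).
    INSERT INTO tickets_ticket VALUES (1, 1, 1, 2), (2, 1, 1, NULL), (3, 1, NULL, NULL);
""")

# Inner join on created_by_id: ticket 3 is dropped, ticket 2 is kept.
inner_rows = conn.execute("""
    SELECT tt.id, cu.email, a.email
    FROM tickets_ticket tt
    JOIN organizations_organization oo ON tt.organization_id = oo.id
    JOIN core_user cu ON tt.created_by_id = cu.id
    LEFT JOIN core_user a ON a.id = tt.currently_assigned_to_id
    ORDER BY tt.id
""").fetchall()

# Left joins throughout: every ticket is returned.
left_rows = conn.execute("""
    SELECT tt.id, cu.email, a.email
    FROM tickets_ticket tt
    LEFT JOIN organizations_organization oo ON tt.organization_id = oo.id
    LEFT JOIN core_user cu ON tt.created_by_id = cu.id
    LEFT JOIN core_user a ON a.id = tt.currently_assigned_to_id
    ORDER BY tt.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

Including tt.id in the output, as suggested, makes it easy to see which tickets only appear in the all-left-join version.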

Duplicate rows in SQL Server

I am having duplicate rows with the same storeactivityid show up in my results...
This is the primary key, so this should not happen. Is there something wrong with my joins that could cause this? I could use distinct, but that will not solve the issue here.
Any tips or advice? There are 3 duplicates showing for each result!
select pd.storeactivityid,e.EMPLOYEENAME,c.ChainName,c.UserCode as ChainNumber,
s.storenumber,s.StoreNameAndNumber,
pd.startdatetime,
pd.enddatetime,
cast((datediff(s, pd.startdatetime, pd.enddatetime) / 3600.0) as decimal(9,2)) as duration,
exceptioncodes,pe.Description,isnull(pd.approved, 0) as approved,
isnull(pd.comment, '') as comment,
pd.modifieddate
from payrolldetail pd with (nolock)
inner join payperiods pp with (nolock) on pd.enddatetime between pp.begindate and pp.enddate and pp.CompanyID = @companyid
left join stores s with (nolock) on pd.storeid = s.storeid
left join chains c with (nolock) on c.chainid = s.chaincode
left join employees e with (nolock) on pd.employeeid = e.employeeid
inner join payrollexceptions pe with (nolock) on pd.ExceptionCodes = pe.Code
where pd.companyid = @companyid
and cast(getdate() as date) between pp.begindate and pp.enddate
and exceptioncodes = @exceptioncodes
and pd.companyid = @companyid
If it is a primary key, you can be certain that in the actual table you do not have duplicate rows with the same storeactivityid.
Your query returns rows with the same storeactivityid because at least one of the joined tables has more than one row that matches the condition specified in the join.
My best guess is that it is due to the following join:
inner join payperiods pp with (nolock) on pd.enddatetime between pp.begindate and pp.enddate and pp.CompanyID = @companyid
Is it possible that a company has multiple pay periods in the payperiods table whose date ranges cover the same enddatetime?
I do not know what is in each of the tables,
but the easiest way I have found to debug something like this is
select pd.storeactivityid, count(*) as [count]
from payrolldetail pd
-- start adding the joins back in one at a time
group by pd.storeactivityid
having count(*) > 1
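The one-join-at-a-time check can be sketched as follows. This is an illustration only: it runs on SQLite through Python's sqlite3 rather than SQL Server, the tables are cut down to a few columns, and the overlapping pay periods are invented to show what a duplicating join looks like. The helper duplicated_ids is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payrolldetail (
        storeactivityid INTEGER PRIMARY KEY,
        storeid INTEGER,
        enddatetime TEXT
    );
    CREATE TABLE stores (storeid INTEGER PRIMARY KEY);
    CREATE TABLE payperiods (begindate TEXT, enddate TEXT);
    INSERT INTO payrolldetail VALUES (1, 100, '2020-01-05'), (2, 100, '2020-01-20');
    INSERT INTO stores VALUES (100);
    -- Invented overlapping pay periods: both cover 2020-01-05.
    INSERT INTO payperiods VALUES ('2020-01-01', '2020-01-15'), ('2020-01-01', '2020-01-31');
""")

def duplicated_ids(join_sql):
    """Return (storeactivityid, row count) pairs that a given join multiplies."""
    query = f"""
        SELECT pd.storeactivityid, COUNT(*)
        FROM payrolldetail pd
        {join_sql}
        GROUP BY pd.storeactivityid
        HAVING COUNT(*) > 1
        ORDER BY pd.storeactivityid
    """
    return conn.execute(query).fetchall()

# Test each join on its own to see which one multiplies rows.
stores_dupes = duplicated_ids("LEFT JOIN stores s ON pd.storeid = s.storeid")
periods_dupes = duplicated_ids(
    "JOIN payperiods pp ON pd.enddatetime BETWEEN pp.begindate AND pp.enddate"
)

print(stores_dupes)
print(periods_dupes)
```

A join that returns an empty list here is not the source of the duplicates; one that returns rows is.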
If you use the NOLOCK hint to allow 'dirty reads' (read uncommitted) and these tables are being modified at the same time, rows can be missed or counted twice, even rows that are unique!
If there is no activity on the server (no updates, inserts, or deletes), then something in the data of your tables is causing the duplication, as the others have already said.

SQL statement to split a table based on a join

I have a primary table for Articles that is linked by a join table Info to a table Tags that has only a small number of entries. I want to split the Articles table, by either deleting rows or creating a new table with only the entries I want, based on the absence of a link to a certain tag. There are a few million articles. How can I do this?
Not all of the articles have any tag at all, and some have many tags.
Example:
table Articles
primary_key id
table Info
foreign_key article_id
foreign_key tag_id
table Tags
primary_key id
It was easy for me to segregate the articles that do have the match right off the bat, so I thought maybe I could do that and then use a NOT IN statement but that is so slow running it's unclear if it's ever going to finish. I did that with these commands:
INSERT INTO matched_articles SELECT * FROM articles a LEFT JOIN info i ON a.id = i.article_id WHERE i.tag_id = 5;
INSERT INTO unmatched_articles SELECT * FROM articles a WHERE a.id NOT IN (SELECT m.id FROM matched_articles m);
If it makes a difference, I'm on Postgres.
INSERT INTO matched_articles
SELECT * FROM articles a LEFT JOIN info i ON a.id = i.article_id WHERE i.tag_id = 5;
INSERT INTO unmatched_articles
SELECT * FROM articles a WHERE a.id NOT IN (SELECT m.id FROM matched_articles m);
There's so much wrong here, I'm not sure where to start. OK, in your first insert you do not need a left join; in fact, you don't actually have one. It should be
INSERT INTO matched_articles
SELECT * FROM articles a INNER JOIN info i ON a.id = i.article_id WHERE i.tag_id = 5;
Had you needed a left join you would have had
INSERT INTO matched_articles
SELECT * FROM articles a LEFT JOIN info i ON a.id = i.article_id AND i.tag_id = 5;
When you put something from the right side of a left join into the where clause (other than searching for the null values), you convert it to an inner join because every row must meet that condition; the records that don't have a match in the right table are therefore eliminated.
Now the second statement can be done with a special case of the left join, although what you have will work.
INSERT INTO unmatched_articles
SELECT * FROM articles a
LEFT JOIN info i ON a.id = i.article_id AND i.tag_id = 5
WHERE i.tag_id is null
This will give you all the records in the articles table that have no matching row in the info table for that tag.
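The difference between filtering in the WHERE clause and filtering in the ON clause can be seen in a small sketch. It assumes three made-up articles and uses SQLite through Python's sqlite3 instead of Postgres; the join semantics shown are standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY);
    CREATE TABLE info (article_id INTEGER, tag_id INTEGER);
    INSERT INTO articles VALUES (1), (2), (3);
    -- Article 1 has tags 5 and 7, article 2 only tag 7, article 3 has none.
    INSERT INTO info VALUES (1, 5), (1, 7), (2, 7);
""")

# Filter in WHERE: behaves like an inner join, only article 1 survives.
where_filtered = conn.execute("""
    SELECT a.id FROM articles a
    LEFT JOIN info i ON a.id = i.article_id
    WHERE i.tag_id = 5
    ORDER BY a.id
""").fetchall()

# Filter in ON: every article is kept, tag_id is NULL where tag 5 is absent.
on_filtered = conn.execute("""
    SELECT a.id, i.tag_id FROM articles a
    LEFT JOIN info i ON a.id = i.article_id AND i.tag_id = 5
    ORDER BY a.id
""").fetchall()

# Filter in ON plus IS NULL in WHERE: only the articles without tag 5.
unmatched = conn.execute("""
    SELECT a.id FROM articles a
    LEFT JOIN info i ON a.id = i.article_id AND i.tag_id = 5
    WHERE i.article_id IS NULL
    ORDER BY a.id
""").fetchall()

print(where_filtered, on_filtered, unmatched)
```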
Now the next thing: you should not write insert statements without specifying the fields you want to insert. Nor should you ever write a select statement using select *, especially if you have a join. This is generally sloppy, lazy coding and should be fixed. What if someone changed the structure of one of the tables but not the other? This kind of thing is bad for maintenance, and in the case of a select statement with a join, it returns a column twice (the join column), which is a waste of server and network resources. It is just poor coding to be too lazy to specify what you need and only what you need. So get out of the habit and don't do it again for any production code.
If your current statement is too slow, you may also be able to fix it with the right indexes. Are the id fields indexed on both tables? On the other hand, if there are millions of articles, it is going to take time to insert them. It is often better to do this in batches, maybe 50000 at a time (fewer still if this takes too long). Just do the insert in a loop that selects the top XXX records and repeats until the affected row count is zero.
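The batching loop described above could look roughly like this. It is a sketch under several assumptions: SQLite through Python's sqlite3 stands in for Postgres, the title column and the sample rows are invented, and the next batch is found by remembering the highest id copied so far, which is one of several ways to avoid re-selecting rows.

```python
import sqlite3

BATCH_SIZE = 2  # tiny for illustration; the answer suggests around 50000

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE info (article_id INTEGER, tag_id INTEGER);
    CREATE TABLE unmatched_articles (id INTEGER PRIMARY KEY, title TEXT);
""")
conn.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(n, f"article {n}") for n in range(1, 7)],
)
# Invented data: only articles 2 and 4 carry tag 5.
conn.executemany("INSERT INTO info VALUES (?, ?)", [(2, 5), (4, 5)])
conn.commit()

batches = 0
last_id = 0  # highest article id copied so far
while True:
    cur = conn.execute(
        """
        INSERT INTO unmatched_articles (id, title)
        SELECT a.id, a.title
        FROM articles a
        WHERE a.id > ?
          AND NOT EXISTS (
              SELECT 1 FROM info i
              WHERE i.article_id = a.id AND i.tag_id = 5
          )
        ORDER BY a.id
        LIMIT ?
        """,
        (last_id, BATCH_SIZE),
    )
    if cur.rowcount == 0:
        break  # nothing left to copy
    conn.commit()  # commit each batch so transactions stay small
    batches += 1
    last_id = conn.execute("SELECT MAX(id) FROM unmatched_articles").fetchone()[0]

copied = [row[0] for row in conn.execute("SELECT id FROM unmatched_articles ORDER BY id")]
print(batches, copied)
```

Committing after each batch keeps individual transactions short, which is the main reason for batching in the first place.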
Your queries look ok except the first one should be an inner join, not a left join. If you want to try something else, consider this:
INSERT INTO matched_articles
SELECT *
FROM articles a
INNER JOIN info i ON a.id = i.article_id
WHERE i.tag_id = 5;
INSERT INTO unmatched_articles
SELECT a.*
FROM articles a
LEFT JOIN info i ON a.id = i.article_id AND i.tag_id = 5
WHERE i.article_id IS NULL
That might be faster but really, what you have is probably ok if you only have to do it once.
Postgres does support temporary tables (CREATE TEMP TABLE), so this can be done with one as well.
CREATE TEMP TABLE temp_counts
AS SELECT a.id, COUNT(i.article_id) AS total
FROM articles a
LEFT JOIN info i
ON a.id = i.article_id AND i.tag_id = 5
GROUP BY a.id;
-- Articles with no link to tag 5
INSERT INTO unmatched_articles
SELECT a.*
FROM articles a INNER JOIN temp_counts t
ON a.id = t.id AND t.total = 0;
DELETE FROM temp_counts
WHERE total = 0;
-- What remains in temp_counts are the articles that do have tag 5
INSERT INTO matched_articles
SELECT a.*
FROM articles a INNER JOIN temp_counts t
ON a.id = t.id;
Note that I have not run this in an editor.
I hope this gives you a hint of how I would approach it.

SQL help: COUNT aggregate, list of entries and its comment count

So, what I intended to do is to fetch a list of entries/posts with their category and user details, AND each of its total published comments. (entries, categories, users, and comments are separate tables)
This query below fetches the records fine, but it seems to skip those entries with no comments. As far as I can see, the JOINs are good (LEFT JOIN on the comments table), and the query is correct. What did I miss ?
SELECT entries.entry_id, entries.title, entries.content,
entries.preview_image, entries.preview_thumbnail, entries.slug,
entries.view_count, entries.posted_on, entry_categories.title AS category_title,
entry_categories.slug AS category_slug, entry_categories.parent AS category_parent,
entry_categories.can_comment AS can_comment, entry_categories.can_rate AS can_rate,
users.user_id, users.group_id, users.username, users.first_name, users.last_name,
users.avatar_small, users.avatar_big, users.score AS user_score,
COUNT(entry_comments.comment_id) AS comment_count
FROM (entries)
JOIN entry_categories ON entries.category = entry_categories.category_id
JOIN users ON entries.user_id = users.user_id
LEFT JOIN entry_comments ON entries.entry_id = entry_comments.entry_id
WHERE `entries`.`publish` = 'Y'
AND `entry_comments`.`publish` = 'Y'
AND `entry_comments`.`deleted_at` IS NULL
AND `category` = 5
GROUP BY entries.entry_id, entries.title, entries.content,
entries.preview_image, entries.preview_thumbnail, entries.slug,
entries.view_count, entries.posted_on, category_title, category_slug,
category_parent, can_comment, can_rate, users.user_id, users.group_id,
users.username, users.first_name, users.last_name, users.avatar_big,
users.avatar_small, user_score
ORDER BY posted_on desc
edit: I am using MySQL 5.0
Well, you're doing a left join on entry_comments, with conditions:
`entry_comments`.`publish` = 'Y'
`entry_comments`.`deleted_at` IS NULL
For the entries with no comments, these conditions are false.
I guess this should solve the problem:
WHERE `entries`.`publish` = 'Y'
AND (
(`entry_comments`.`publish` = 'Y'
AND `entry_comments`.`deleted_at` IS NULL)
OR
`entry_comments`.`id` IS NULL
)
AND `category` = 5
In the OR condition, I put entry_comments.id, assuming this is the primary key of the entry_comments table, so you should replace it with the real primary key of entry_comments.
It's because you are setting a filter on columns in the entry_comments table. Replace the first with:
AND IFNULL(`entry_comments`.`publish`, 'Y') = 'Y'
Because your other filter on this table is an IS NULL one, this is all you need to do to allow the unmatched rows from the LEFT JOIN through.
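A rough sketch comparing the original filter, the OR ... IS NULL fix suggested above, and a variant that moves the comment conditions into the ON clause. The ON-clause variant is not from the answers here; it is shown as an alternative. The tables are cut down to a few columns, the rows are invented, and SQLite through Python's sqlite3 stands in for MySQL 5.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (entry_id INTEGER PRIMARY KEY, title TEXT, publish TEXT);
    CREATE TABLE entry_comments (
        comment_id INTEGER PRIMARY KEY,
        entry_id INTEGER,
        publish TEXT,
        deleted_at TEXT
    );
    INSERT INTO entries VALUES
        (1, 'has comments', 'Y'), (2, 'no comments', 'Y'), (3, 'only hidden', 'Y');
    -- Entry 1: two published comments and one unpublished.
    -- Entry 3: a single unpublished comment. Entry 2: none.
    INSERT INTO entry_comments VALUES
        (10, 1, 'Y', NULL), (11, 1, 'Y', NULL), (12, 1, 'N', NULL), (13, 3, 'N', NULL);
""")

def counts(join_and_where):
    """Run the comment-count query with the given JOIN/WHERE fragment."""
    return conn.execute(f"""
        SELECT e.entry_id, COUNT(c.comment_id)
        FROM entries e
        {join_and_where}
        GROUP BY e.entry_id
        ORDER BY e.entry_id
    """).fetchall()

# Original: comment filters in WHERE drop entries without matching comments.
original = counts("""
    LEFT JOIN entry_comments c ON e.entry_id = c.entry_id
    WHERE e.publish = 'Y' AND c.publish = 'Y' AND c.deleted_at IS NULL
""")

# Suggested fix: also accept the all-NULL row produced by the LEFT JOIN.
or_is_null = counts("""
    LEFT JOIN entry_comments c ON e.entry_id = c.entry_id
    WHERE e.publish = 'Y'
      AND ((c.publish = 'Y' AND c.deleted_at IS NULL) OR c.comment_id IS NULL)
""")

# Alternative: put the comment filters in the ON clause instead.
on_clause = counts("""
    LEFT JOIN entry_comments c
      ON e.entry_id = c.entry_id AND c.publish = 'Y' AND c.deleted_at IS NULL
    WHERE e.publish = 'Y'
""")

print(original)
print(or_is_null)
print(on_clause)
```

With this data, the OR ... IS NULL version brings back the entry with no comments at all, while an entry whose only comments are unpublished still disappears; the ON-clause version keeps it with a count of 0.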
Try changing the LEFT JOIN to a LEFT OUTER JOIN
OR
I'm no expert with this style of SQL joins (more of an Oracle man myself), but the wording of the left join is leading me to believe that it is joining entry_comments on to entries with entry_comments on the left, you really want it to be the other way around (I think).
So try something like:
LEFT OUTER JOIN entries ON entries.entry_id = entry_comments.entry_id