SQL query, view joined to table - number of results inconsistent - sql-server-2000

Apologies in advance for the vagueness of this question, but it involves a query which is too big to describe in full, and field/table names that I can't reveal. So I'm not really expecting a solution, but if someone could give some advice on how I could proceed in solving it myself, I'd be grateful.
SQL Server 2000.
I have a query which joins a view and a table with an INNER JOIN and has a WHERE clause:
SELECT
view.join_field
FROM
view INNER JOIN table ON view.join_field=table.join_field
WHERE
table.other_field='EE'
# (23 rows)
This produces 23 results (it should be 1000s). If I add another condition to the WHERE clause of the above query, I get more results instead of fewer:
SELECT
view.join_field
FROM
view INNER JOIN table ON view.join_field=table.join_field
WHERE
table.other_field='EE' AND
view.field2=1
This gives me a few thousand results, as was originally expected. Changing the value to 2 or 3 (the only other values present) also gives me thousands of results each, but if I change it to view.field2 IN (1,2,3) I end up with only 38 results.
Going back to the original query, which gave me 23 results, if I add the table field I have in the WHERE clause to the SELECT block, I get the right number of results:
SELECT
view.join_field,
table.other_field
FROM
view INNER JOIN table ON view.join_field=table.join_field
WHERE
table.other_field='EE'
# (8764 rows)
If I instead use a WHERE clause of table.other_field='GG' (the only other value present in the table), none of these strange things happen, and I get the expected number of results.
If I SELECT the contents of view into a temporary table, and use that in my query, I also get the thousands of rows I was expecting.
view itself is a LEFT OUTER JOIN of another view and two other tables. table, in my query, is not involved in any of the views.
Can anyone give me even the vaguest of ideas of what's going on? Are my tables or views corrupt, somehow?
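One way to proceed is to turn the temporary-table experiment into a repeatable check: run the same join once against the view and once against a materialised copy of it, then compare the counts. The sketch below uses Python's sqlite3 module purely as a stand-in for SQL Server 2000, and the names base, t, v_src and v_copy are hypothetical. On a healthy engine the two counts agree; if they disagree on the real server, the stored data is probably fine and the plan chosen for the view is the more likely suspect.

```python
import sqlite3

def count_rows(conn, source):
    # `source` is a trusted identifier (a view or table name), not user input.
    sql = (
        f"SELECT COUNT(*) FROM {source} AS v "
        "INNER JOIN t ON v.join_field = t.join_field "
        "WHERE t.other_field = 'EE'"
    )
    return conn.execute(sql).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base (join_field INTEGER, field2 INTEGER);
    CREATE TABLE t (join_field INTEGER, other_field TEXT);
    INSERT INTO base VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO t VALUES (1, 'EE'), (2, 'EE'), (3, 'GG');
    CREATE VIEW v_src AS SELECT join_field, field2 FROM base;
    -- Materialised copy of the view, as in the temporary-table experiment.
    CREATE TEMP TABLE v_copy AS SELECT * FROM v_src;
""")

via_view = count_rows(conn, "v_src")
via_copy = count_rows(conn, "v_copy")
print(via_view, via_copy)  # the two numbers should match
```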

Related

Poor performance with stacked joins

I'm not sure I can provide enough details for an answer, but my company is having a performance issue with an older mssql view. I've narrowed it down to the right outer joins, but I'm not familiar with the structure of joins following joins without an "ON" for each one, as in the code snippet below.
How do I rewrite the joins below, either to improve performance or to use the simpler JOIN Tablename ON Field1 = Field2 format?
FROM dbo.tblObject AS tblObject_2
JOIN dbo.tblProspectB2B PB ON PB.Object_ID = tblObject_2.Object_ID
RIGHT OUTER JOIN dbo.tblProspectB2B_CoordinatorStatus
RIGHT OUTER JOIN dbo.tblObject
INNER JOIN dbo.vwDomain_Hierarchy
INNER JOIN dbo.tblContactUser
INNER JOIN dbo.tblProcessingFile WITH ( NOLOCK )
LEFT OUTER JOIN dbo.enumRetentionRealization AS RR ON RR.RetentionRealizationID = dbo.tblProcessingFile.RetentionLeadTypeID
INNER JOIN dbo.tblLoan
INNER JOIN dbo.tblObject AS tblObject_1 WITH ( NOLOCK )
ON dbo.tblLoan.Object_ID = tblObject_1.Object_ID
ON dbo.tblProcessingFile.Loan_ID = dbo.tblLoan.Object_ID
ON dbo.tblContactUser.Object_ID = dbo.tblLoan.ContactOwnerID
ON dbo.vwDomain_Hierarchy.Object_ID = tblObject_1.Domain_ID
ON dbo.tblObject.Object_ID = dbo.tblLoan.ContactOwnerID
ON dbo.tblProspectB2B_CoordinatorStatus.Object_ID = dbo.tblLoan.ReferralSourceContactID
ON tblObject_2.Object_ID = dbo.tblLoan.ReferralSourceContactID
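Your last INNER JOIN is followed by a series of ON clauses. Per this question and answer, such syntax is equivalent to a nested subquery: each ON clause pairs with the nearest preceding JOIN that does not yet have one, so the joins are evaluated from the inside out.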
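To see why the nesting matters, here is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server. SQLite does not accept the stacked ON form, so the nesting is written with parentheses, which is assumed here to be the equivalent grouping. The tables a, b and c are hypothetical. With an outer join involved, the nested grouping and a flat left-to-right rewrite return different row counts, so the ON clauses cannot simply be moved next to their JOINs without checking the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, a_id INTEGER);
    CREATE TABLE c (b_id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (10, 1), (20, 2);
    INSERT INTO c VALUES (10);
""")

# Nested grouping: b is joined to c first, then a is left joined to that result.
nested = """
    SELECT COUNT(*)
    FROM a
    LEFT JOIN (b INNER JOIN c ON b.id = c.b_id) ON a.id = b.a_id
"""
# Flat grouping: a is left joined to b first, then the inner join to c filters rows.
flat = """
    SELECT COUNT(*)
    FROM a
    LEFT JOIN b ON a.id = b.a_id
    INNER JOIN c ON b.id = c.b_id
"""
n_nested = conn.execute(nested).fetchone()[0]
n_flat = conn.execute(flat).fetchone()[0]
print(n_nested, n_flat)  # the nested form keeps the unmatched row from a
```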
That is one of the worst queries I have ever seen. Since I cannot figure out how it is supposed to work without the underlying data, this is what I suggest to you.
First, find a good sample loan and write a query against this view that returns only the rows where loan_id = ... Now you have a data set you can check your changes against far more easily than the possibly millions of records the view returns. Make sure these results make sense (that right join to tblObject is bothering me, as it makes no sense to return all the object records).
Now start writing your query with what you think should be the first table (I would suggest that loan is the first table; if it is not, then the first table is Object left joined to loan) and the where clause for the loan id.
Check your results: did you get the same loan information as the view query with the where clause added?
Then add each join one at a time and see how it affects the query and whether the results appear to be going off track. Once you have figured out a query that gives the same results with all the tables added in, try several other loan ids to check. Once those have checked out, run the whole query with no where clause and check it against the view results (if it is a large number you may need to just see if the record counts match and visually check through; use ORDER BY on both so that the results come back in the same order). In the process, try to use only left joins and not that combination of right and left joins (it's OK to leave the inner ones alone).
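The compare-as-you-go step can be automated with a small helper. This is only a sketch, using Python's sqlite3 module as a stand-in for the real server; the tables loan and contact, the view vw_old and the helper same_rows are all hypothetical names. It compares the two result sets as multisets, so row order does not matter but duplicated rows do.

```python
import sqlite3
from collections import Counter

def same_rows(conn, sql_old, sql_new, params=()):
    """Return True when both queries yield the same multiset of rows."""
    old = Counter(conn.execute(sql_old, params).fetchall())
    new = Counter(conn.execute(sql_new, params).fetchall())
    return old == new

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (loan_id INTEGER, owner_id INTEGER);
    CREATE TABLE contact (contact_id INTEGER, name TEXT);
    INSERT INTO loan VALUES (1, 10), (2, 20), (3, NULL);
    INSERT INTO contact VALUES (10, 'Ann'), (20, 'Bob');
    CREATE VIEW vw_old AS
        SELECT l.loan_id, c.name
        FROM loan AS l LEFT JOIN contact AS c ON c.contact_id = l.owner_id;
""")

# Existing view, filtered to one sample loan.
old_sql = "SELECT loan_id, name FROM vw_old WHERE loan_id = ?"
# Candidate rewrite, filtered the same way.
new_sql = """
    SELECT l.loan_id, c.name
    FROM loan AS l
    LEFT JOIN contact AS c ON c.contact_id = l.owner_id
    WHERE l.loan_id = ?
"""
for sample_id in (1, 3):
    print(sample_id, same_rows(conn, old_sql, new_sql, (sample_id,)))
```

Picking sample ids that exercise the outer joins (here loan 3, which has no owner) makes it more likely that a wrong join type shows up early.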
I make it a habit in complex queries to do all the inner joins first and then the left joins. I never use right joins in production code.
Now you are ready to performance tune.
I would guess the right join to objects is causing a problem in that it returns the whole table, and the nature of that table name and the other joins to the same table lead me to believe that the author probably wanted a left join. Without knowing the meaning of the data, it is hard to be sure. So first, if you are returning too many records for one loan id, consider whether the real problem is that, as the tables have grown, returning too many records has become problematic.
Also consider that you can often take the view and replace it with code to get the same results. Views calling views are a poor technique that often leads to performance issues. Often the views on top of the other views call the same tables, and thus you end up joining to them multiple times when you don't need to.
Check your Explain plan or Execution plan depending on what database backend you have. Analysis of this should show where you might have missing indexes.
Also make sure that every table in the query is needed. This is especially true when you join to a view. The view may join to 12 other tables, but you only need the data from one of them and it can join to one of your tables. Make sure that you are not using select * but only returning the fields the view actually needs. You have inner joins so, by definition, select * is returning fields you don't need.
If the select part of the view has a distinct in it, consider whether you can weed out the multiple records that made distinct necessary by changing to a derived table or adding a where clause. To see what is causing the multiples, you may need to temporarily use select * to see all the columns and find out which one is not unique and is causing the issue.
This whole process is not going to be easy or fun. Just take it slowly, work carefully and methodically and you will get there and have a query that is understandable and maintainable in the end.

Inconsistent results from BigQuery: same query, different number of rows

I noticed today that one of my queries was returning inconsistent results: every time I run it, a different number of rows is returned (cache deactivated).
Basically the query looks like this:
SELECT *
FROM mydataset.table1 AS t1
LEFT JOIN EACH mydataset.table2 AS t2
ON t1.deviceId=t2.deviceId
LEFT JOIN EACH mydataset.table3 AS t3
ON t2.email=t3.email
WHERE t3.email IS NOT NULL
AND (t3.date IS NULL OR DATE_ADD(t3.date, 5000, 'MINUTE')<TIMESTAMP('2016-07-27 15:20:11') )
The tables are not updated between each query. So I'm wondering if you also have noticed that kind of behaviour.
I usually run queries that return a lot of rows (>1000), so a few missing rows here and there are hardly noticeable. But this query returns only a few rows, and the count varies every time between 10 and 20 rows :-/
If a Google engineer is reading this, here are two job IDs of the same query with different results:
picta-int:bquijob_400dd739_1562d7e2410
picta-int:bquijob_304f4208_1562d7df8a2
Unless I'm missing something, the query that you provide is completely deterministic and so should give the same result every time you execute it. But you say it's "basically" the same as your real query, so this may be due to something you changed.
There are a couple of things you can do to try to find the cause:
replace select * by an explicit selection of fields from your tables (a combination of fields that uniquely determine each row)
order the table by these fields, so that the order becomes the same each time you execute the query
simplify your query. In the above query, you can remove the first condition and turn the two left outer joins into inner joins and get the same result. After that, you could start removing tables and conditions one by one.
After each step, check if you still get different result sets. Then when you have found the critical step, try to understand why it causes your problem. (Or ask here.)
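The simplification step can be checked on a toy data set. The sketch below is only an illustration: it uses Python's sqlite3 module with standard LEFT JOIN rather than BigQuery's LEFT JOIN EACH, the rows are made up, and the date condition is left out to keep it short. It shows that filtering on t3.email IS NOT NULL after two left joins returns the same rows as two inner joins, with an explicit column list and ORDER BY so the comparison is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (deviceId TEXT);
    CREATE TABLE table2 (deviceId TEXT, email TEXT);
    CREATE TABLE table3 (email TEXT, date TEXT);
    INSERT INTO table1 VALUES ('d1'), ('d2'), ('d3');
    INSERT INTO table2 VALUES ('d1', 'a@x'), ('d2', 'b@x');
    INSERT INTO table3 VALUES ('a@x', NULL);
""")

# Original shape: left joins, then a filter that discards unmatched rows.
left_sql = """
    SELECT t1.deviceId, t2.email, t3.date
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t1.deviceId = t2.deviceId
    LEFT JOIN table3 AS t3 ON t2.email = t3.email
    WHERE t3.email IS NOT NULL
    ORDER BY t1.deviceId, t2.email
"""
# Simplified shape: inner joins, no NULL filter needed.
inner_sql = """
    SELECT t1.deviceId, t2.email, t3.date
    FROM table1 AS t1
    JOIN table2 AS t2 ON t1.deviceId = t2.deviceId
    JOIN table3 AS t3 ON t2.email = t3.email
    ORDER BY t1.deviceId, t2.email
"""
left_rows = conn.execute(left_sql).fetchall()
inner_rows = conn.execute(inner_sql).fetchall()
print(left_rows == inner_rows, len(left_rows))
```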

Teradata: use of aliases impacts EXPLAIN estimation of time

I have a relatively simple query
SELECT
db1.something
, COALESCE(db2.something_else, 'NA') AS something2
FROM dwh.db_1 AS db1
LEFT JOIN dwh.db_2 AS db2 ON db1.some_id = db2.some_id
EXPLAIN gives an estimated time of something more than 15 seconds.
On the other hand, explain on the following, where we basically replaced the alias with the table name:
SELECT
db1.something
, COALESCE(db_2.something_else, 'NA') AS something2
FROM dwh.db_1 AS db1
LEFT JOIN dwh.db_2 AS db2 ON db1.some_id = db2.some_id
gives an estimated time of over 4 hours, where it seems like the system is trying to execute a product join on some spool (I can't really follow the sequence of planning steps).
I always thought that aliases are just aliases and have no impact on perf.
The estimated time is probably correct :-)
A table alias is not really an alias; it replaces the table name within that query. In Teradata, using the original table name doesn't result in an error message (as it does in most other DBMSes), but it causes a CROSS join.
Why? Well, Teradata was implemented before there was Standard SQL. The initial query language was called TEQUEL (TEradata QUEry Language), whose syntax didn't require listing tables in a FROM clause. A simple RETRIEVE TableName.ColumnName carried enough information for the Parser/Optimizer to resolve the table name and column name. There's no flag to switch it off, and some client tools refuse to submit it, but you can still submit RETRIEVE in BTEQ.
In the above example you're mixing old TEQUEL and SQL: there are 3 tables for the optimizer but only one join condition, and this results in a CROSS join to the third table.
At least it's easy to spot in Explain. The optimizer will do this stupid join as last step, so scroll to the end and you will see joined using a product join, with a join condition of ("(1=1)").
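The effect on row counts can be imitated outside Teradata. The sketch below uses Python's sqlite3 module, which (like most engines) would reject a reference to the un-aliased table name, so the implicit third table is written out as an explicit CROSS JOIN under the hypothetical alias stray. The data is made up. It is an emulation of the behaviour described above, not Teradata itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db_1 (some_id INTEGER, something TEXT);
    CREATE TABLE db_2 (some_id INTEGER, something_else TEXT);
    INSERT INTO db_1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO db_2 VALUES (1, 'x'), (2, 'y'), (4, 'z'), (5, 'w');
""")

# Two tables, one join condition: at most one output row per db_1 row here.
aliased = """
    SELECT COUNT(*)
    FROM db_1 AS db1
    LEFT JOIN db_2 AS db2 ON db1.some_id = db2.some_id
"""
# Third table instance with no join condition: every row is multiplied.
with_stray = """
    SELECT COUNT(*)
    FROM db_1 AS db1
    LEFT JOIN db_2 AS db2 ON db1.some_id = db2.some_id
    CROSS JOIN db_2 AS stray
"""
n_aliased = conn.execute(aliased).fetchone()[0]
n_stray = conn.execute(with_stray).fetchone()[0]
print(n_aliased, n_stray)
```

With 3 rows in db_1 and 4 in db_2 the count goes from 3 to 12; on tables of realistic size the same multiplication would be consistent with the jump from seconds to hours in the estimate.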

Getting "Unexpected. Please try again." when doing large join in Google BigQuery

Ok, I have two tables. Table A has ~60k records and 145 fields. Table B has ~1k records and 9 fields. I'm doing a join between 23 fields from A to a single field on B. Each one of these joins is a LEFT OUTER join. As such, I'm selecting all 145 fields from table A (but replacing the 23 fields from the join).
If I run the joins one by one, results are returned in under a second. However, if I try to run the query with all the joins in place, it runs for hours and then errors out with the message: "Unexpected. Please try again."
This happens if I select one field or all the fields.
Any ideas?
Of the first 6 left joins, only 2 are used elsewhere in the query. Get rid of the unused joins for starters.
Do you have indexes in [medclient.users_table] for all of the fields you are doing the left join on? If not, then consider at least adding them while this query runs.
Add a few left joins at a time, check your actual execution plan after each addition, and see which one(s) kill it.

Number of records discrepancy - only change is sorting

I have an Access 2003 database with a query that is a left outer join of a table to another query. If I didn't sort that final query, I got 42 records. If I sorted the final query by the 2 joined fields, I got 43 records. No other changes were made to the query.
To verify this, I took the query, copied it, applied the sort with no other changes, and the record count went up by one. Perplexed, I copied the results into Excel, sorted, and compared row by row. I discovered one record was duplicated (all fields were exactly the same), where there were actually no duplicate records in the source table and query.
I would think this is a bug, and I know there are a few in Access, but has anyone heard of this behavior before?
It is possible that you have a corrupt index. It may be worth taking a back up and then compacting and repairing the database, which should rebuild the indexes.
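Before and after the repair, it can help to confirm exactly which row is duplicated rather than comparing by eye in Excel. A minimal sketch in Python, assuming the query results have been exported as a list of row tuples; the sample rows and the helper duplicated_rows are hypothetical.

```python
from collections import Counter

def duplicated_rows(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

# Made-up exports of the unsorted and sorted versions of the same query.
unsorted_result = [(1, "A"), (2, "B"), (3, "C")]
sorted_result = [(1, "A"), (2, "B"), (2, "B"), (3, "C")]

print(duplicated_rows(unsorted_result))
print(duplicated_rows(sorted_result))
```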
I had a similar problem on Access 2003 where I had duplicate records in which one field was an autonumber, which showed up twice.
My query was :
SELECT qry_Tasks_with_Names.*, Location.Location, WorkRequests.Element, WorkRequests.Site_ID
FROM ((WorkRequests LEFT JOIN Location ON WorkRequests.Site_ID=Location.LocationID) INNER JOIN qry_Tasks_with_Names_by_Individual ON WorkRequests.Work_Request_ID=qry_Tasks_with_Names_by_Individual.Work_Request_ID) INNER JOIN qry_Tasks_with_Names ON WorkRequests.Work_Request_ID=qry_Tasks_with_Names.Work_Request_ID
WHERE (WorkRequests.Site_ID=Forms!TaskList!comboFilter_by_Site Or Forms!TaskList!comboFilter_by_Site Is Null) And (qry_Tasks_with_Names.Assigned_to_User_ID=forms!taskList!comboFilter_by_Person Or forms!taskList!comboFilter_by_Person Is Null) And (qry_Tasks_with_Names.Assigned_to_Team_ID=forms!taskList!comboFilter_by_Team Or forms!taskList!comboFilter_by_Team Is Null) And qry_Tasks_with_Names.Assigned_to_User_ID>0
ORDER BY qry_Tasks_with_Names.SLA_Due;
The above was initially created by the query designer, but it gets itself confused at this level of complexity, and it seems that I do too.
Once I removed the inner join on qry_Tasks_with_Names_by_Individual all was OK.
No idea why, but hopefully this may save someone else some tears if they have the same problem.