VB6 SQL 2005 Database Index Question

I have a VB app that accesses a SQL database. I think it's running slow, and I thought maybe I didn't have the tables properly indexed. I was wondering how you would create the indexes? Here's the situation.
My main loop is
Select * from Docrec
Order by YearFiled,DocNumb
Inside this loop I have two other database hits.
Select * from Names
Where YearFiled = DocRec.YearFiled
and Volume = DocRec.Volume and Page = DocRec.Page
Order by SeqNumb
Select * from MapRec
Where FiledYear = DocRec.YearFiled
and Volume = DocRec.Volume and Page = DocRec.Page
Order by SeqNumb
Hopefully I made sense.

Try in one query using INNER JOIN:
SELECT * FROM DocRec d
INNER JOIN Names n ON d.YearFiled = n.YearFiled AND d.Volume = n.Volume AND d.Page = n.Page
INNER JOIN MapRec m ON m.FiledYear = d.YearFiled AND m.Volume = d.Volume AND m.Page = d.Page
ORDER BY d.YearFiled, d.DocNumb
That way you make only one query to the database. The likely problem is that you are hitting the database many times and getting only one (or a few) rows each time.

Off the top, one thing that would help would be determining if you really need all columns.
If you don't, instead of SELECT *, select just the columns you need - that way you're not pulling as much data.
If you do, then from SQL Server Management Studio (or whatever you use to manage the SQL Server) you'll need to look at what is indexed and what isn't. The columns you tend to search on the most would be your first candidates for an index.
Addendum
Now that I've seen your edit, it may help to look at why you're doing the queries the way you are, and see if there isn't a way to consolidate it down to one query. Without more context I'd just be guessing at more optimal queries.

In general, looping through records is a poor idea. Can you not do a set-based query that gives you everything you need in one pass?
As far as indexing goes, consider any fields that you use in the ordering or where clauses and any fields that are in joins. Primary keys are indexed as part of setting up the primary key, but foreign keys are not; people often forget that they need to be indexed as well.
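For example, a minimal sketch of indexes matching the where and order by clauses in the question (SQL Server 2005 syntax; the index names are made up, and the best column order depends on your data):
-- supports the outer ORDER BY YearFiled, DocNumb
CREATE NONCLUSTERED INDEX IX_DocRec_YearFiled_DocNumb ON DocRec (YearFiled, DocNumb);
-- supports the lookup by YearFiled/Volume/Page plus the ORDER BY SeqNumb
CREATE NONCLUSTERED INDEX IX_Names_YearFiled_Volume_Page ON Names (YearFiled, Volume, Page, SeqNumb);
-- same idea for MapRec, which uses FiledYear as its year column
CREATE NONCLUSTERED INDEX IX_MapRec_FiledYear_Volume_Page ON MapRec (FiledYear, Volume, Page, SeqNumb);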
Never use select * in a production environment. It is a poor practice. Do not ever return more data than you need.

I don't know if you need the loop. If all you are doing is grabbing the records in maprec that match docrec, and then the same for the second table, then you can do this without a loop using inner join syntax.
select columnlist from maprec m inner join docrec d on (m.filedyear = d.yearfiled and m.volume = d.volume and m.page = d.page)
and then again for the second table...
You could also trim up your queries to return only the columns needed instead of returning all if possible. This should help performance.
To create an index by yourself in SQL Server 2005, go to the design of the table and select the Manage Indexes & Keys toolbar item.
You can use the Database Engine Tuning Advisor. Create a trace of your queries (using SQL Server Profiler), and the Advisor will tell you which indexes are needed to optimize your query executions, and can create them for you.
UPDATE SINCE YOUR FIRST COMMENT TO ME:
You can still do this by running the first query, then the second and third, without a loop, as I have shown above. Here's the trick: I am thinking you need to tie the first query to the second and third, which is why you used a loop.
It's been a while since I have done VB6 recordsets, BUT I do recall the ability to filter a recordset once it has been returned from the DB. So in this case you could keep your loop, but instead of calling SQL on every iteration you would simply filter the already-loaded recordset data based on the current record. You would run the second and third queries once, before the loop, to load the data; using the join syntax I gave above, each of those recordsets will contain only the rows that match the parent table (docrec).
With this you will still only hit the DB three times, yet retain the loop you need to traverse the parent docrec table, so you can do work on it AND the child tables when you do have a match.
Here are a few links on ADO recordset filtering:
http://www.devguru.com/technologies/ado/QuickRef/recordset_filter.html
http://msdn.microsoft.com/en-us/library/ee275540(BTS.10).aspx
http://www.w3schools.com/ado/prop_rs_filter.asp
With all this said.... I have this strange feeling that perhaps it could be solved with just a left join on your tables?
select * from docrec d
left join maprec m on (d.YearFiled= m.FiledYear and d.Volume = m.Volume and d.Page = m.Page)
left join names n on (d.YearFiled = n.YearFiled and d.Volume = n.Volume and d.Page = n.Page)
This will return all DocRec records AND add the maprec and names values where they match, or NULL where they don't.
If this fits your need it will only hit the DB once.


SQL - Make Join Query Faster

I've made a query that selects 2 values from 2 tables. I need to run this query about 32 times when a visitor visits my website. This makes the page quite slow (it takes over 5 seconds to fully load).
The query looks like this:
SELECT tmdb.name, patch.sfo_title
FROM tmdb
RIGHT JOIN patch
ON tmdb.titleid = CONCAT(patch.cusa, '_00')
WHERE cusa = :titleid
LIMIT 1
Is there any way to make this query faster? The query isn't a big operation as far as I can see, so I'm not really sure why it's so slow.
I would write the query as a left join (out of preference, not performance):
SELECT t.name, p.sfo_title
FROM patch p
LEFT JOIN tmdb t ON t.titleid = CONCAT(p.cusa, '_00')
WHERE p.cusa = :titleid
LIMIT 1;
Then for performance, I would recommend indexes on patch(cusa, sfo_title) and tmdb(titleid).
Note that the use of LIMIT without ORDER BY is suspicious, although you might have reasons for it.
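A minimal sketch of those indexes in MySQL syntax (the index names are made up, and this assumes the columns are of an indexable type and length):
-- composite index: the WHERE filters on cusa, and sfo_title is included so the query is covered
CREATE INDEX idx_patch_cusa_sfo_title ON patch (cusa, sfo_title);
-- lets the join look up tmdb rows by titleid instead of scanning
CREATE INDEX idx_tmdb_titleid ON tmdb (titleid);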
You are joining two tables based on a CALCULATED field? No wonder it is slow. I have no idea how your tables get maintained, but you need to get that concatenated value of cusa into the database as a separate field and get it indexed, as Gordon Linoff suggested. You could even maintain it through ON INSERT and ON UPDATE triggers. Personally, I would examine why you have similar but different keys in your two tables and try to rationalize them down to one. That CONCAT(cusa, '_00') looks suspiciously like an opportunity to simplify the application.
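If the database is MySQL 5.7 or later, one way to keep such a field maintained automatically (instead of triggers) is a stored generated column. A hedged sketch, with made-up column and index names and a guessed column length:
-- the column is computed and persisted by the engine on every insert/update
ALTER TABLE patch
ADD COLUMN cusa_titleid VARCHAR(16)
GENERATED ALWAYS AS (CONCAT(cusa, '_00')) STORED;
CREATE INDEX idx_patch_cusa_titleid ON patch (cusa_titleid);
-- the join condition then becomes plain and indexable:
-- ... LEFT JOIN tmdb t ON t.titleid = p.cusa_titleid ...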

Poor performance with stacked joins

I'm not sure I can provide enough details for an answer, but my company is having a performance issue with an older MSSQL view. I've narrowed it down to the right outer joins, but I'm not familiar with the structure of joins following joins without an ON clause attached to each one, as in the code snippet below.
How do I rewrite the joins below, either to improve performance or at least to get them into the simpler JOIN TableName ON Field1 = Field2 format?
FROM dbo.tblObject AS tblObject_2
JOIN dbo.tblProspectB2B PB ON PB.Object_ID = tblObject_2.Object_ID
RIGHT OUTER JOIN dbo.tblProspectB2B_CoordinatorStatus
RIGHT OUTER JOIN dbo.tblObject
INNER JOIN dbo.vwDomain_Hierarchy
INNER JOIN dbo.tblContactUser
INNER JOIN dbo.tblProcessingFile WITH ( NOLOCK )
LEFT OUTER JOIN dbo.enumRetentionRealization AS RR ON RR.RetentionRealizationID = dbo.tblProcessingFile.RetentionLeadTypeID
INNER JOIN dbo.tblLoan
INNER JOIN dbo.tblObject AS tblObject_1 WITH ( NOLOCK ) ON dbo.tblLoan.Object_ID = tblObject_1.Object_ID ON dbo.tblProcessingFile.Loan_ID = dbo.tblLoan.Object_ID ON dbo.tblContactUser.Object_ID = dbo.tblLoan.ContactOwnerID ON dbo.vwDomain_Hierarchy.Object_ID = tblObject_1.Domain_ID ON dbo.tblObject.Object_ID = dbo.tblLoan.ContactOwnerID ON dbo.tblProspectB2B_CoordinatorStatus.Object_ID = dbo.tblLoan.ReferralSourceContactID ON tblObject_2.Object_ID = dbo.tblLoan.ReferralSourceContactID
Your last INNER JOIN has a number of ON statements. Per this question and answer, such syntax is equivalent to a nested subquery.
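A small illustration of how the deferred ON syntax pairs up (the tables a, b, c and their columns are made up for the sketch):
-- deferred ON form: each ON binds to the nearest JOIN that does not yet have one, inside-out
SELECT *
FROM a
INNER JOIN b
INNER JOIN c ON c.b_id = b.id ON b.a_id = a.id;
-- equivalent conventional form, with the nesting made explicit:
SELECT *
FROM a
INNER JOIN (b INNER JOIN c ON c.b_id = b.id) ON b.a_id = a.id;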
That is one of the worst queries I have ever seen. Since I cannot figure out how it is supposed to work without the underlying data, this is what I suggest to you.
First find a good sample loan and write a query against this view to return where loan_id = ... Now you have a data set you can check your changes against more easily than the possibly millions of records the full view returns. Make sure these results make sense (that right join to tblObject is bothering me, as it makes no sense to return all the object records).
Now start writing your query with what you think should be the first table (I would suggest that loan is the first table; if it is not, then the first table is Object left joined to loan) and the where clause for the loan id.
Check your results: did you get the same loan information as the view query with the where clause added?
Then add each join one at a time and see how it affects the query and whether the results appear to be going off track. Once you have figured out a query that gives the same results with all the tables added in, try several other loan ids to check. Once those have checked out, run the whole query with no where clause and check against the view results (if it is a large number of records, you may need to just see if the record counts match and spot-check visually; use order by on both queries to make sure your results are in the same order). In the process, try to use only left joins rather than that combination of right and left joins (it's OK to leave the inner ones alone).
I make it a habit in complex queries to do all the inner joins first and then the left joins. I never use right joins in production code.
Now you are ready to performance tune.
I would guess the right join to tblObject is causing a problem, in that it returns the whole table; the nature of that table name, and the other joins to the same table, lead me to believe the author probably wanted a left join. Without knowing the meaning of the data, it is hard to be sure. So first, if you are returning too many records for one loan id, consider whether the real problem is that returning too many records has become problematic as the tables have grown.
Also consider that you can often take the view and replace it with code to get the same results. Views calling views are a poor technique that often leads to performance issues. Often the views on top of other views call the same tables, and thus you end up joining to them multiple times when you don't need to.
Check your Explain plan or Execution plan depending on what database backend you have. Analysis of this should show where you might have missing indexes.
Also make sure that every table in the query is needed. This is especially true when you join to a view. The view may join to 12 other tables when you only need the data from one of them, and that one could join to one of your tables directly. Make sure that you are not using select * but only returning the fields that are actually needed. You have inner joins, so, by definition, select * is returning fields you don't need: the join columns come back once for each table.
If the select part of the view has a distinct in it, consider whether you can weed out the multiple records that made distinct necessary, by changing to a derived table or adding a where clause. To see what is causing the multiples, you may need to temporarily use select * to see all the columns and find out which one is not unique and is causing the issue.
This whole process is not going to be easy or fun. Just take it slowly, work carefully and methodically and you will get there and have a query that is understandable and maintainable in the end.

Is a Join Faster than two queries

I know this question has been asked before (for example), and I do agree that one query with a join is faster than performing another query for each record returned by the first query.
However, since a join tends to generate redundant fields, will this slow things down over the network?
Let's say I have a table HOTEL, and each hotel has a number of images in a table HOTEL_IMAGE. HOTEL has 20 fields; performing a join on HOTEL_IMAGE will produce those 20 fields for each hotel image. Will this query still be faster over the network?
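To make the duplication concrete, a sketch of such a join (using the h.ID / i.HotelID key columns from the answer below; the image column is made up):
select h.*, i.ImageFile -- h.* repeats all 20 HOTEL columns...
from HOTEL h
inner join HOTEL_IMAGE i on i.HotelID = h.ID -- ...once per matching image row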
That depends a lot on your actual data, but from what I have seen, if you have a well-parameterized DB with fresh statistics, it is much better to put the join in SQL and let the DB figure out what to do.
Anyway, DB queries are in my opinion the first things you want to profile. It is not a coincidence that any good DBMS has a lot of performance measuring tools. And you need to profile with data as close to actual data as possible (recent copies of your production environment are best).
Don't use select *, but only the columns that you need. If you do this, a join will be faster (and I'm not sure why you would ever want to do this with 2 queries: you have to make 2 round trips to your database, etc.).
As a solution to avoid duplication in joined data, you can return multiple recordsets from a single query, if your database supports it. One recordset returns the master records, and the second the detail records plus the key fields from the master query.
select ID, Name, ... from HOTEL where <.... your criteria>;
select h.ID as HotelID, i.ID, i.Description, i.ImageFile, .... from HOTEL_IMAGE i
join HOTEL h on h.ID = i.HotelID and ( <.... same criteria for HOTEL> )
I'm not sure if the query on the master table would be cached so that the second select can reuse it, but it will save traffic for sure.
We are using this approach for queries, which tend to return multi-level joined results.

Is there a way to rename a similarly named column from two tables when performing a join?

I have two tables that I am joining with the following query...
select *
from Partners p
inner join OrganizationMembers om on p.ParID = om.OrganizationId
where om.EmailAddress = 'my_email@address.com'
and om.deleted = 0
This works great, but I want some of the columns from Partners to be replaced with similarly named columns from OrganizationMembers. The number of columns I want to replace in the joined table is very small, no more than 3.
It is possible to get the result I want by selectively choosing the columns I want in the resulting join like so...
select om.MemberID,
p.ParID,
p.Levelz,
p.encryptedSecureToken,
p.PartnerGroupName,
om.EmailAddress,
om.FirstName,
om.LastName
from Partners p
inner join OrganizationMembers om on p.ParID = om.OrganizationId
where om.EmailAddress = 'my_email@address.com'
and om.deleted = 0
But this creates a very long sequence of select p.a, p.b, p.c, p.d, ... etc ... which I am trying to avoid.
In summary I am trying to get several columns from the Partners table and up to 3 columns from the OrganizationMembers table without having a long column specification sequence at the beginning of the query. Is it possible or am I just dreaming?
select om.MemberID as mem
Use the AS keyword. This is called aliasing.
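A minimal sketch of the idea applied to the colliding columns (the new output names are made up):
select p.*,
om.EmailAddress as MemberEmailAddress, -- aliased so they no longer clash with the Partners columns
om.FirstName as MemberFirstName,
om.LastName as MemberLastName
from Partners p
inner join OrganizationMembers om on p.ParID = om.OrganizationId
where om.EmailAddress = 'my_email@address.com'
and om.deleted = 0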
You are dreaming in your implementation.
Also, as a best practice, select * is something that is typically frowned upon by DBAs.
If you want to limit the results or change anything, you must explicitly name the columns. As a potential "stop gap" you could do something like this:
SELECT p.*, om.MemberId, etc..
But this ONLY works if you want ALL columns from the first table, and then selected items.
Try this:
select p.*,
om.EmailAddress,
om.FirstName,
om.LastName
from Partners p
inner join OrganizationMembers om on p.ParID = om.OrganizationId
where om.EmailAddress = 'my_email@address.com'
and om.deleted = 0
You should never use * though. Always specifying the columns you actually need makes it easier to see what the query is doing.
But this creates a very long sequence of select p.a, p.b, p.c, p.d, ... etc ... which I am trying to avoid.
Don't avoid it. Embrace it!
There are lots of reasons why it's best practice to explicitly list the desired columns.
It's easier to do searches for where a particular column is being used.
The behavior of the query is more obvious to someone who is trying to maintain it.
Adding a column to the table won't automatically change the behavior of your query.
Removing a column from the table will break your query earlier, making bugs appear closer to the source, and easier to find and fix.
And anything that uses the query is going to have to list all the columns anyway, so there's no point being lazy about it!

simple sql query

Which one is faster:
select * from parents p
inner join children c on p.id = c.pid
where p.x = 2
OR
select * from
(select * from parents where x = 2) p
inner join children c on p.id = c.pid
where p.x = 2
In MySQL, the first one is faster:
SELECT *
FROM parents p
INNER JOIN children c ON c.pid = p.id
WHERE p.x = 2
, since using an inline view implies generating and passing the records twice.
In other engines, they are usually optimized to use one execution plan.
MySQL is not very good in parallelizing and pipelining the result streams.
Like this query:
SELECT *
FROM mytable
LIMIT 1
is instant, while this one (which is semantically identical):
SELECT *
FROM (
SELECT *
FROM mytable
) t -- MySQL requires an alias for the derived table
LIMIT 1
will first select all values from mytable, buffer them somewhere and then fetch the first record.
For Oracle, SQL Server and PostgreSQL, the queries above (and both of your queries) will most probably yield the same execution plans.
I know this is a simple case, but your first option is much more readable than the second one. As long as the two query plans are comparable, I'd always opt for the more maintainable SQL code, which for me is your first example.
It depends on how good the database is at optimising the query.
If the database manages to optimise the second one into the first one, they are equally fast, otherwise the first one is faster.
The first one gives more freedom for the database to optimise the query. The second one suggests a specific order of doing things. Either the database is able to see past this and optimise it into a single query, or it will run the query as two separate queries with the subquery as an intermediate result.
A database like SQL Server keeps statistics on what the database tables contain, which it uses to determine how to execute the query in the most efficient way. For example, depending on what will eliminate the most records, it can either start by joining the tables or by filtering the parents table on the condition. If you write a query that forces a specific order, that might not be the most efficient order.
I'd think the first. I'm not sure if the optimizer would use any indexes on the derived table in the second query, or if it would copy out all the matching rows into memory before joining back to children.
This is why you have DBAs. It depends entirely on the DBMS, and how your tables and indexes are configured, as to which one runs the fastest.
Database tuning is not a set-and-forget operation; it should be done regularly, as the data changes, to ensure your database runs at peak performance. The question is not really meaningful without specifying:
which DBMS you are asking about.
what indexes you have on the tables.
a host of other possible configuration items (which may also depend on the DBMS, such as clustering).
You should run both queries through the query optimizer to see which one is fastest, then start using that one. That's assuming the difference is noticeable in the first place. If the difference is minimal, go for the one that is easiest to read and maintain.
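For instance, assuming MySQL (mentioned above), you can compare the plans the engine chooses for each form with EXPLAIN:
EXPLAIN SELECT * FROM parents p INNER JOIN children c ON c.pid = p.id WHERE p.x = 2;
EXPLAIN SELECT * FROM (SELECT * FROM parents WHERE x = 2) p INNER JOIN children c ON c.pid = p.id;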
For me, the second query says: I don't trust the optimizer to optimize this query, so I'll provide some 'hints'.
I'd say: trust the optimizer until it lets you down, and only then consider trying to do the optimizer's job for it.