Simple MySQL Query taking 45 seconds (Gets a record and its "latest" child record) - sql

I have a query which gets a customer and the latest transaction for that customer. Currently this query takes over 45 seconds for 1000 records. This is especially problematic because the script itself may need to be executed as frequently as once per minute!
I believe using subqueries may be the answer, but I've had trouble constructing it to actually give me the results I need.
SELECT
customer.CustID,
customer.leadid,
customer.Email,
customer.FirstName,
customer.LastName,
transaction.*,
MAX(transaction.TransDate) AS LastTransDate
FROM customer
INNER JOIN transaction ON transaction.CustID = customer.CustID
WHERE customer.Email = '".$email."'
GROUP BY customer.CustID
ORDER BY LastTransDate
LIMIT 1000
I really need to get this figured out ASAP. Any help would be greatly appreciated!

Make sure you have an index for transaction.CustID, and another one for customer.Email.
Assuming customer.CustID is a primary key, this should already be indexed.
You can create an index as follows:
CREATE INDEX ix_transaction_CustID ON transaction(CustID);
CREATE INDEX ix_customer_Email ON customer(Email);
As suggested in the comments, you can use the EXPLAIN command to understand if the query is using indexes correctly.
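For example, a sketch against the query above (with a literal address substituted for the PHP variable); in the output, the key and rows columns show whether the new indexes are actually being used:
EXPLAIN SELECT customer.CustID, MAX(transaction.TransDate) AS LastTransDate
FROM customer
INNER JOIN transaction ON transaction.CustID = customer.CustID
WHERE customer.Email = 'someone@example.com' -- placeholder address, not from the original query
GROUP BY customer.CustID;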

Related

SQL Query - select multiple rows from 2 tables

I'm new to SQL and recently saw this question, which is described in the [attached picture]. Any suggestions on how to solve it? Thanks in advance.
Requirements like this are usually easiest to resolve by starting from the end. Let's try that: "have more than 3 items ... and total price".
select InvoiceId, sum(price)
from InvoiceItem
group by InvoiceId
having count(1) > 3;
This query gives you the overall price for every invoice that has more than 3 items. You might be wondering why there is a HAVING clause here rather than a WHERE clause.
The database executes the FROM and WHERE parts of the query first to get the data; only after that can aggregations (sum, count, etc.) be applied, so conditions on aggregates have to be specified afterwards, in HAVING.
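For contrast, a sketch of what would not work - putting the aggregate condition in WHERE is simply rejected:
-- invalid: an aggregate cannot appear in WHERE, because WHERE runs before grouping
select InvoiceId, sum(price)
from InvoiceItem
where count(1) > 3
group by InvoiceId;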
That first query actually returns all the data the requirement asks for, but I assume that whoever gave you this task (that was a teacher, right?) wanted you to retrieve the data from the Invoice table that corresponds to the InvoiceItem records with more than 3 items, and so on.
So all that is left is to join the Invoice table to the query above:
select i.id, i.customername, i.issuedate, sum(it.price)
from InvoiceItem it
join Invoice i on it.invoiceid = i.id
group by i.id, i.customername, i.issuedate
having count(1) > 3;
The teacher might ask why you used count(1) and not count(*) or count(id) or something. Just tell them that, since id is a primary key and can never be NULL, all of these are equal here, so you simply picked one.
Presumably this should now work fine. I have not tested it at all, as you did not provide any data to test it on, and it is late and I am pretty tired. Because of this you might need to fix some typos or syntax errors I made - but hey, let's keep in mind you still have to do something on your own with this task.

How do you JOIN tables to a view using a Vertica DB?

Good morning/afternoon! I was hoping someone could help me out with something that probably should be very simple.
Admittedly, I’m not the strongest SQL query designer. That said, I’ve spent a couple of hours beating my head against my keyboard trying to get a seemingly simple three-way join working.
NOTE: I'm querying a Vertica DB.
Here is my query:
SELECT A.CaseOriginalProductNumber, A.CaseCreatedDate, A.CaseNumber, B.BU2_Key as BusinessUnit, C.product_number_desc as ModelNumber
FROM pps_sfdc.v_Case A
INNER JOIN reference_data.DIM_PRODUCT_LINE_HIERARCHY B
ON B.PL_Key = A.CaseOriginalProductLine
INNER JOIN reference_data.DIM_PRODUCT C
ON C.product_line_code = A.CaseOriginalProductLine
WHERE B.BU2_Key = 'XWT'
LIMIT 20
I have a view (v_Case) that I’m trying to join to two other tables so I can lookup a value from each of them. The above query returns identical data on everything EXCEPT the last column (see below). It's like it's iterating through the last column to pull out the unique entries, sort of like a "GROUP BY" clause. What SHOULD be happening is that I get unique rows with specific "BusinessUnit" and "ModelNumber" for that record.
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 1
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 2
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 3
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 4
I modeled my solution after this post:
How to deal with multiple lookup tables for beginners of SQL?
What am I doing wrong?
Thank you for any help you can provide.
Data issue. A general rule when troubleshooting these: the column that is distinct for each record (in this case C.product_number_desc as ModelNumber) is generally where the issue is going to be... which is why I pointed you towards dim_product.
If you are getting duplicates, the query below will help identify whether this table is the one causing them. Remember that the key in this statement can be multiple fields... whatever you are joining the table on:
SELECT key, COUNT(1) FROM table GROUP BY key HAVING COUNT(1) > 1
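Applied to the 'c' table here, for example (a sketch; product_line_code is the column it is joined on in the query above):
-- more than one row per product_line_code means the join to DIM_PRODUCT will fan out
SELECT product_line_code, COUNT(1)
FROM reference_data.DIM_PRODUCT
GROUP BY product_line_code
HAVING COUNT(1) > 1;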
Another tip for the future... don't assume it's your code; duplicates like this almost always point towards dirty data (the other possibility is that you are causing cross joins because the keys are not correct). If you had commented out the 'c' table and the column referring to it in the SELECT clause, you would have received one row... hence your dupes were coming from the 'c' table here.
Good luck with it

What is the best way to reduce sql queries in my situation

Here is the situation: each page will show 30 topics, so I have to execute at least 1 SQL statement. Besides that, I also want to show how many replies each topic has and who the author is, so I have to use 30 statements to count the replies and another 30 statements to find the authors. In total that's 61 statements, and I'm really worried about the efficiency.
My tables looks like this:
Topic        Reply        User
----------   ----------   ----------
id           id           id
title        topic_id     username
...          ...
author_id
You should look into joining tables during a query.
Joins in SQLServer http://msdn.microsoft.com/en-us/library/ms191517.aspx
Joins in MySQL http://dev.mysql.com/doc/refman/5.0/en/join.html
As an example, I could do the following:
SELECT reply.id, reply.authorid, reply.text, reply.topicid,
topic.title,
user.username
FROM reply
LEFT JOIN topic ON (topic.id = reply.topicid)
LEFT JOIN user ON (user.id = reply.authorid)
WHERE (reply.isactive = 1)
ORDER BY reply.postdate DESC
LIMIT 10
If I read your requirements correctly, you want the result of the following query:
SELECT Topic.title, User.username, COUNT(Reply.topic_id) Replies
FROM Topic, User, Reply
WHERE Topic.id = Reply.topic_id
AND Topic.author_id = User.id
GROUP BY Topic.title, User.username
When I was first starting out with database-driven web applications I had similar problems. I then spent several years working in a database-rich environment where I actually learned SQL. If you intend to continue developing web applications (which I find are very fun to create), it would be worth your time to pick up a book or check out some how-tos on basic and advanced SQL.
One thing to add, on top of JOINs:
It may be that your groups of data do not match or relate, so JOINs won't work. Put another way: you may have two main chunks of data that are awkward to join.
Stored procedures can return multiple result sets.
For example, for a summary page you could return one aggregate result set and another "last 20" result set in one SQL call. JOINing the two is awkward because they don't "fit" together.
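As a rough sketch of that idea in MySQL (the procedure name and the "last 20" cutoff are made up for illustration; SQL Server procedures can do the same with their own syntax):
DELIMITER //
CREATE PROCEDURE topic_summary() -- hypothetical name
BEGIN
  -- first result set: an aggregate summary
  SELECT COUNT(*) AS topic_count FROM Topic;
  -- second result set: the latest 20 topics
  SELECT id, title FROM Topic ORDER BY id DESC LIMIT 20;
END //
DELIMITER ;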
You certainly can use some LEFT JOINs on this one; however, since the output only changes when someone updates or adds to your tables, you could try caching it in an XML/text file. Another way would be to build in some redundancy by adding columns to the Topic table that keep the reply count, username, etc., and updating them only when changes occur...
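A minimal sketch of that redundancy in MySQL, using the Topic and Reply tables from the question (reply_count is an illustrative column name, not part of the original schema):
ALTER TABLE Topic ADD COLUMN reply_count INT NOT NULL DEFAULT 0;
-- refresh the counter; rerun (or trigger) this whenever replies change
UPDATE Topic
SET reply_count = (SELECT COUNT(*) FROM Reply WHERE Reply.topic_id = Topic.id);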

Slow but simple Query, how to make it quicker?

I have a database which is 6GB in size, with a multitude of tables. However, the smaller queries seem to have the most problems, and I want to know what can be done to optimise them. For example, there are Stock, Items and Orders tables.
The Stock table holds the items in stock; it has around 100,000 records with 25 fields storing ProductCode, Price and other stock-specific data.
The Items table stores information about the items; there are over 2,000,000 of these, with over 50 fields storing item names and other details about the item or product in question.
The Orders table stores the orders of stock items (when the order was placed plus the price sold for) and has around 50,000 records.
Here is a query from this Database:
SELECT Stock.SKU, Items.Name, Stock.ProductCode FROM Stock
INNER JOIN Order ON Order.OrderID = Stock.OrderID
INNER JOIN Items ON Stock.ProductCode = Items.ProductCode
WHERE (Stock.Status = 1 OR Stock.Status = 2) AND Order.Customer = 12345
ORDER BY Order.OrderDate DESC;
Given the information here, what could be done to improve this query? There are others like it, so what alternatives are there? The nature of the data and the database cannot be detailed further, however, so general optimisation tricks and methods, or anything which applies to databases generally, will be fine.
The Database is MS SQL 2000 on Windows Server 2003 with the latest service packs for each.
DB Upgrade / OS Upgrade are not options for now.
Edit
The indexes on the tables mentioned are Stock.SKU, Items.ProductCode and Orders.OrderID.
The execution plan shows 13-16 seconds for a query like this, with 75% of the time spent in Stock.
Thanks for all the responses so far - indexing seems to be the problem, and all the different examples given have been helpful despite a few mistakes in the queries. Some of these queries have run quicker, and combined with the index suggestions I think I might be on the right path now. Thanks for the quick responses - they have really helped me and made me consider things I did not think of or know about before!
Indexes ARE my problem. I added one to the foreign key on Orders (Customer) and this has improved performance by halving the execution time!
Looks like I got tunnel vision and focused on the query - I have been working with DBs for a couple of years now, but this has been very helpful. Thanks also for all the query examples; they use combinations and features I had not considered and may be useful too!
Is your code correct??? I'm sure you're missing something:
INNER JOIN Batch ON Order.OrderID = Orders.OrderID
and you have a stray ) in the code...
You can always test some variants against the execution plan tool, like:
SELECT
s.SKU, i.Name, s.ProductCode
FROM
Stock s, Orders o, Batch b, Items i
WHERE
b.OrderID = o.OrderID AND
s.ProductCode = i.ProductCode AND
s.Status IN (1, 2) AND
o.Customer = 12345
ORDER BY
o.OrderDate DESC;
Also, you should return just a fraction of the rows, like TOP 10... it takes only milliseconds to pick the TOP 10, but you will save plenty of time when binding the result to your application.
Most important (if not already done): define primary keys for the tables and add indexes on the foreign keys and on the columns you are using in the joins.
Did you specify indexes? On
Items.ProductCode
Stock.ProductCode
Orders.OrderID
Orders.Customer
Sometimes, IN could be faster than OR, but this is not as important as having indexes.
See balexandre's answer; your query looks wrong.
Some general pointers
Are all of the fields that you are joining on indexed?
Is the ORDER BY necessary?
What does the execution plan look like?
BTW, you don't seem to be referencing the Order table in the question query example.
Table indexes will certainly help, as Cătălin Pitiș suggested.
Another trick is to reduce the size of the joined rows by using either sub-selects or, more extreme, temp tables. For example, rather than joining on the whole Orders table, join on
(SELECT * FROM Orders WHERE Customer = 12345)
Also, don't join directly on the Stock table; join on
(SELECT * FROM Stock WHERE Status = 1 OR Status = 2)
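Put together, the rewrite might look something like this (a sketch using the column names from the question; SQL Server needs an alias on each derived table):
-- s, o and i are just aliases for the derived tables and Items
SELECT s.SKU, i.Name, s.ProductCode
FROM (SELECT * FROM Stock WHERE Status = 1 OR Status = 2) AS s
INNER JOIN (SELECT * FROM Orders WHERE Customer = 12345) AS o ON o.OrderID = s.OrderID
INNER JOIN Items AS i ON s.ProductCode = i.ProductCode
ORDER BY o.OrderDate DESC;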
Setting the correct indexes on the tables is usually what makes the biggest difference for performance.
In Management Studio (or Query Analyzer for earlier versions) you can choose to view the execution plan of the query when you run it. In the execution plan you can see what the database is really doing to get the result, and what parts takes the most work. There are some things to look for there, like table scans, that usually is the most costly part of a query.
The primary key of a table normally has an index, but you should verify that it's actually so. Then you probably need indexes on the fields that you use to look up records, and fields that you use for sorting.
Once you have added an index, you can rerun the query and see in the execution plan if it's actually using the index. (You may need to wait a while after creating the index for the database to build the index before it can use it.)
Could you give it a go?
SELECT Stock.SKU, Items.Name, Stock.ProductCode FROM Stock
INNER JOIN Order ON Order.OrderID = Stock.OrderID AND (Order.Customer = 12345) AND (Stock.Status = 1 OR Stock.Status = 2)
INNER JOIN Items ON Stock.ProductCode = Items.ProductCode
ORDER BY Order.OrderDate DESC;
Elaborating on what Cătălin Pitiș said already: in your query
SELECT Stock.SKU, Items.Name, Stock.ProductCode
FROM Stock
INNER JOIN Order ON Order.OrderID = Stock.OrderID
INNER JOIN Items ON Stock.ProductCode = Items.ProductCode
WHERE (Stock.Status = 1 OR Stock.Status = 2) AND Order.Customer = 12345
ORDER BY Order.OrderDate DESC;
the criterion Order.Customer = 12345 looks very specific, whereas (Stock.Status = 1 OR Stock.Status = 2) sounds unspecific. If this is correct, an efficient query consists of
1) first finding the orders belonging to a specific customer,
2) then finding the corresponding rows of Stock (with the same OrderID), keeping only those with Status in (1, 2),
3) and finally finding the items with the same ProductCode as the rows of Stock in 2)
For 1) you need an index on Customer for the table Order, for 2) an index on OrderID for the table Stock and for 3) an index on ProductCode for the table Items.
As long as your query does not become much more complicated (like being a subquery in a bigger query, or Stock, Order and Items being only views rather than tables), the query optimizer should be able to find this plan from your query as it is. Otherwise, you'll have to do what kuoson is suggesting (though the second suggestion does not help if Status in (1, 2) is not very selective and/or Status is not indexed on the Stock table). But also remember that keeping indexes up to date costs performance if you do many inserts/updates on the table.
To shorten the answer I gave 2 hours ago (when my cookies were switched off):
You need three indexes: Customer for table Order, OrderID for Stock and ProductCode for Items.
If you miss any of these, you'll have to wait for a complete table scan on the corresponding table.
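In SQL Server 2000 that would look something like this (index names are placeholders; the question uses both Order and Orders, so adjust to the actual table name):
-- placeholder index names; use whatever naming convention you prefer
CREATE INDEX IX_Orders_Customer ON Orders (Customer);
CREATE INDEX IX_Stock_OrderID ON Stock (OrderID);
CREATE INDEX IX_Items_ProductCode ON Items (ProductCode);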

How can I write this summing query?

I didn't design this table, and I would redesign it if I could, but that's not an option for me.
I have this table:
Transactions
Index --PK, auto increment
Tenant --this is a fk to another table
AmountCharged
AmountPaid
Balance
Other Data
The software that is used calculates the balance each time from the previous balance like this:
previousBalance - (AmountPaid - AmountCharged)
Balance is how much the tenant really owes.
However, the program uses Access and concurrent users, and messes up. Big time.
For example: I have a tenant that looks like this:
Amount Charged | Amount Paid | Balance
350 0 350
440 0 790
0 350 -350 !
0 440 -790
I want to go through and reset all the balances to what they should be, so I'd have some sort of running total. I don't know whether Access can use variables the way SPs can.
I don't even know how to start on this. I'd assume it'd be a query with a subquery to sum all the charges/payments before its Index, but I don't know how to write it.
How can I do this?
Edit:
I am using Access 97
Assuming Index is incremental, and higher values --> later transaction dates, you can use a self-join with a >= condition in the join clause, something like this:
select
a.[Index],
max(a.[Tenant]) as [Tenant],
max(a.[AmountCharged]) as [AmountCharged],
max(a.[AmountPaid]) as [AmountPaid],
sum(
iif(isnull(b.[AmountCharged]),0,b.[AmountCharged])-
iif(isnull(b.[AmountPaid]),0,b.[AmountPaid])
) as [Balance]
from
[Transactions] as a
left outer join
[Transactions] as b on
a.[Tenant] = b.[Tenant] and
a.[Index] >= b.[Index]
group by
a.[Index];
Access SQL is fiddly; there may be some syntax errors above, but that's the general idea. To create this query in the query designer, add the Transactions table twice, join them on Tenant and Index, and then edit the join (if possible).
You could do the same with a subquery, something like:
select
[Index],
[Tenant],
[AmountCharged],
[AmountPaid],
(
select
sum(
iif(isnull(b.[AmountCharged]),0,b.[AmountCharged])-
iif(isnull(b.[AmountPaid]),0,b.[AmountPaid])
)
from
[Transactions] as b
where
[Transactions].[Tenant] = b.[Tenant] and
[Transactions].[Index] >= b.[Index]
) as [Balance]
from
[Transactions];
Once you have calculated the proper balances, use an update query to update the table, by joining the Transactions table to the select query defined above on Index. You could probably combine it into one update query, but that would make it more difficult to test.
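As a sketch of that update step: Access usually refuses to update directly against an aggregate query, so one approach is to first save the corrected balances into a table (CorrectBalances below is a hypothetical name, e.g. filled via a make-table query from the SELECT above) and then run:
UPDATE Transactions INNER JOIN CorrectBalances
ON Transactions.[Index] = CorrectBalances.[Index]
SET Transactions.Balance = CorrectBalances.Balance;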
If all the records have a sequencing number (with no gaps in between) you can try the following: create a query where you link the table to itself. In the join, you specify that you want to link the tables with Id = Id - 1. That way, you link each record to its previous record.
If you do not have a column that can be used for this, try adding an autonumber column.
Another option is to write a few simple lines of VBA to loop over the records and update the values. If it is a one-off operation, I think that will be the easiest if you are not very experienced with SQL.