SQL - Query Performance Slow When Joining Two Views with Where Clause

I am running the following query in SQL Server, which returns 80,000+ rows and 75+ columns in just 5 seconds:
SELECT * FROM VIEW_ITEM_STOCK_LEDGER AS ItemLedger
LEFT JOIN VIEW_PRODUCT_WITH_CHARACTERISTIC_COLUMN_DATA AS Characteristics
ON Characteristics.Code = ItemLedger.ItemCode
But when I add a WHERE clause, the query takes far too long to execute: more than 5 minutes for 13,450 records.
SELECT * FROM VIEW_ITEM_STOCK_LEDGER AS ItemLedger
LEFT JOIN VIEW_PRODUCT_WITH_CHARACTERISTIC_COLUMN_DATA AS Characteristics
ON Characteristics.Code = ItemLedger.ItemCode
WHERE (ItemLedger.VoucherTypeCode=204 OR ItemLedger.VoucherTypeCode=205)
What could be the reason? How do I solve this?

It sounds to me like there is no index on the column VoucherTypeCode.
If VoucherTypeCode is a column of a table in your database, you can try indexing that column (see this article about creating indexes on MS Docs).
If VoucherTypeCode is derived from multiple columns, you can try indexing the view itself (see this article about indexed views on sqlshack.com).
Alternatively, if you can't or don't want to create an index, check out the accepted answer in this Stack Overflow thread.
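Here is a minimal sketch of the first suggestion. It uses Python's built-in sqlite3 module as a stand-in for SQL Server. SQLite's planner is not SQL Server's, but the effect of an index on the filter column is similar in kind. The base table StockLedger, its columns, and the data are hypothetical, and the view simply selects from that table. The script prints the query plan before and after an index is added on VoucherTypeCode.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    # Each plan row is (id, parent, notused, detail); detail describes one step.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base table behind the view.
    CREATE TABLE StockLedger (
        Id INTEGER PRIMARY KEY,
        ItemCode TEXT NOT NULL,
        VoucherTypeCode INTEGER NOT NULL,
        Qty REAL
    );
    CREATE VIEW VIEW_ITEM_STOCK_LEDGER AS
        SELECT Id, ItemCode, VoucherTypeCode, Qty FROM StockLedger;
""")
# Made-up data: 10,000 rows spread over 100 voucher types (200..299).
conn.executemany(
    "INSERT INTO StockLedger (ItemCode, VoucherTypeCode, Qty) VALUES (?, ?, ?)",
    [(f"ITEM{i % 500}", 200 + i % 100, 1.0) for i in range(10_000)],
)

query = ("SELECT * FROM VIEW_ITEM_STOCK_LEDGER "
         "WHERE VoucherTypeCode IN (204, 205)")

plan_before = query_plan(conn, query)   # no index yet: full scan of StockLedger
conn.execute("CREATE INDEX IX_StockLedger_VoucherTypeCode "
             "ON StockLedger (VoucherTypeCode)")
plan_after = query_plan(conn, query)    # the view is expanded, so the index is usable
rows = conn.execute(query).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
print("rows:  ", len(rows))
```

SQLite expands this simple view into the outer query, so the WHERE condition reaches the base table and can use its index. In SQL Server you would make the same comparison with the actual execution plans.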

Related

Query process steps of select query

I am confused about the processing steps of a SELECT query. According to some docs I read, a SELECT query runs like this:
1. Getting Data (From, Join)
2. Row Filter (Where)
3. Grouping (Group by)
4. Group Filter (Having)
5. Return Expressions (Select)
6. Order & Paging (Order by & Limit / Offset)
I ran a test query joining table A (70M records) and table B (75M records):
select *
from A join B on A.code = B.box_code
where B.box_code = '123'
and compared it with:
select *
from A join (select * from B where box_code = '123') B on A.code = B.box_code
I assumed the first query would run slower than the second one, because the first query would spend time joining a large amount of data, while the second query filters on box_code before joining. But the two queries take the same time. Why did that happen?
I searched Google, and it may be related to the clustered index, but I am not sure.
One more question: why can the clustered index apply the WHERE condition to filter data before the join? I thought the query would run the join before the WHERE.
Where did I get it wrong?
[Illustrating images: first query, second query]
Thanks
This part is wrong...
select query will run like this
Getting Data (From, Join)
Row Filter (Where)
Grouping (Group by)
Group Filter (Having)
Return Expressions (Select)
Order & Paging (Order by & Limit / Offset)
Oracle has a number of operations that it can perform to satisfy a query. Some operations may require child operations to be completed first. Operations include things like TABLE ACCESS BY INDEX ROWID, INDEX RANGE SCAN, and NESTED LOOPS.
Oracle's optimizer decides which operations are necessary and in what order. It very often will, for example, apply WHERE conditions to a row source before joining that row source to another one. It does that for exactly the reason you imply in your post: because it is probably faster to filter a million rows down to 10 before doing a join.
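The following small sketch shows an optimizer treating both forms of the question's query the same way. It uses Python's sqlite3 module, so this is SQLite and not Oracle, and the exact plan text differs. The tables a and b and their data are made up. Both plans use the index on b.box_code to apply the filter, and both queries return the same rows.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, tiny versions of tables A and B.
    CREATE TABLE a (code TEXT, name TEXT);
    CREATE TABLE b (box_code TEXT, qty INTEGER);
    CREATE INDEX ix_a_code ON a (code);
    CREATE INDEX ix_b_box_code ON b (box_code);
""")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [(str(i), f"item{i}") for i in range(100, 200)])
conn.executemany("INSERT INTO b VALUES (?, ?)",
                 [(str(100 + i % 100), i) for i in range(1000)])

# Filter written after the join ...
q1 = "SELECT * FROM a JOIN b ON a.code = b.box_code WHERE b.box_code = '123'"
# ... and filter written inside a derived table, before the join.
q2 = ("SELECT * FROM a JOIN (SELECT * FROM b WHERE box_code = '123') AS b_filtered "
      "ON a.code = b_filtered.box_code")

plan1, plan2 = query_plan(conn, q1), query_plan(conn, q2)
rows1 = sorted(conn.execute(q1).fetchall())
rows2 = sorted(conn.execute(q2).fetchall())
print("q1:", plan1)
print("q2:", plan2)
print(len(rows1), rows1 == rows2)
```

The written order of FROM and WHERE does not force the execution order. The optimizer is free to apply the box_code condition through the index first in both cases.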
Oracle maintains an elaborate set of statistics on each table and column so that, when you submit a query, it can estimate which plan is likely to work well.
Theoretically, your job when writing SQL is to describe what you want and leave the how part to Oracle. In practice, the how part is still important, so your question is a very good one. Read Oracle's documentation on the subject, titled "Oracle Database SQL Tuning Guide". There is a version for each release of the database and they're available for free online (see: https://docs.oracle.com).

Fastest execution time for querying on Big size table

I need advice on how to get the fastest results when querying big tables.
I am using SQL Server 2012, and my situation is as follows:
I have 5 tables containing transaction records, and each table has 35 million records.
All tables have 14 columns, and the columns I need to search are GroupName, CustomerName, and NoRegistration. I also have a view that combines all 5 of these tables.
The GroupName, CustomerName, and NoRegistration values are not unique within each table.
My application has a function to search these columns.
The queries look like this:
Search by Group Name:
SELECT DISTINCT(GroupName) FROM TransactionRecords_view WHERE GroupName LIKE @GroupName + '%'
Search by Name:
SELECT DISTINCT(CustomerName) AS 'CustomerName' FROM TransactionRecords_view WHERE CustomerName LIKE @Name + '%'
Search by NoRegistration:
SELECT DISTINCT(NoRegistration) FROM TransactionRecords_view WHERE LOWER(NoRegistration) LIKE LOWER(@NoRegistration) + '%'
My question is: how can I achieve the fastest execution time for searching?
Right now, every search takes 3 to 5 minutes.
My idea is to create new tables containing the distinct values of GroupName, CustomerName, and NoRegistration from all 5 tables.
Would this idea make execution faster, or is there a better approach?
Thank you
EDIT:
This is the definition of the view "TransactionRecords_view":
CREATE VIEW TransactionRecords_view
AS
SELECT * FROM TransactionRecords_1507
UNION ALL
SELECT * FROM TransactionRecords_1506
UNION ALL
SELECT * FROM TransactionRecords_1505
UNION ALL
SELECT * FROM TransactionRecords_1504
UNION ALL
SELECT * FROM TransactionRecords_1503
Please show the SQL of TransactionRecords_view. Do you have indexes? What is the collation of the NoRegistration column? Paste the actual execution plan for each query.
OK, so you don't need to make those new tables. If you create nonclustered indexes on these fields, they will (in effect) do what you're after, because an index only stores data for the columns you specify, not the whole table. Be aware, however, that indexes are excellent for speeding up SELECT statements but negatively affect write statements (INSERT, UPDATE, etc.).
Next, run the queries with the actual execution plan switched on. This shows how the optimizer has decided to run each query (in the back end). Are there any particular issues here? Are any of the steps taking up a lot of the overall operator cost? There are plenty of great instructional videos about execution plans on YouTube, so check them out if you haven't looked at execution plans before.
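Here is a rough sketch of the nonclustered-index idea for the prefix search. It again uses SQLite through Python's sqlite3 module as a stand-in for SQL Server. It makes two assumptions:

- One of the monthly tables is modeled with only a few columns.
- GroupName is declared COLLATE NOCASE, because SQLite only uses an index for its case-insensitive LIKE when the column has that collation. SQL Server can typically seek an index for a LIKE 'prefix%' pattern without such a step.

The plan shows the search being answered from the index on GroupName.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, trimmed-down version of one monthly table.
    CREATE TABLE TransactionRecords_1507 (
        Id INTEGER PRIMARY KEY,
        GroupName TEXT COLLATE NOCASE,   -- NOCASE lets SQLite's LIKE use the index
        CustomerName TEXT
    );
    CREATE INDEX IX_1507_GroupName ON TransactionRecords_1507 (GroupName);
""")
groups = ["ABC", "ABD", "abx", "XYZ"]   # made-up group names
conn.executemany(
    "INSERT INTO TransactionRecords_1507 (GroupName, CustomerName) VALUES (?, ?)",
    [(groups[i % 4], f"customer {i}") for i in range(1000)],
)

query = ("SELECT DISTINCT GroupName FROM TransactionRecords_1507 "
         "WHERE GroupName LIKE 'AB%'")
plan = query_plan(conn, query)   # expect the index to be used for the prefix range
names = conn.execute(query).fetchall()
print(plan)
print(sorted(n for (n,) in names))
```

With the real view, each of the five tables would need such an index. You would then check in the actual execution plan whether SQL Server uses them for each branch of the UNION ALL.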
Did you check whether the actual execution plan reports any missing indexes?
Moreover, since you use LIKE clauses on varchar columns, Full-Text Search may be useful for you:
https://msdn.microsoft.com/en-us/library/ms142571(v=sql.120).aspx

Why can an SQL query take so long to return results?

I have a very simple SQL query:
select * from recent_cases where user_id=1000000 and case_id=10095;
It takes up to 0.4 seconds to execute in Oracle, and when I run 20 requests in a row, it takes more than 10 seconds.
The table 'recent_cases' has 4 columns: ID, USER_ID, CASE_ID and VISITED_DATE. Currently there are only 38 records in this table.
Also, there are 3 indexes on this table: on the ID column, on the USER_ID column, and on the (USER_ID, CASE_ID) column pair.
Any ideas?
One theory: the table has a very large data segment with its high water mark near the end, but the statistics are not prompting the optimiser to use an index. Therefore you're getting a slow full table scan, which reads every block below the high water mark even though only 38 rows remain. You could fix such a problem with ALTER TABLE ... MOVE followed by rebuilding the indexes, or COALESCE it.
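Oracle Database has an ANALYZE TABLE command that gathers statistics for the optimizer. Up-to-date statistics can speed up SELECT statements a lot, even if there are just a few rows in the table. (Oracle's documentation recommends the DBMS_STATS package over ANALYZE for gathering optimizer statistics.)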
Here are some links which might help you:
http://www.dba-oracle.com/t_oracle_analyze_table.htm
http://docs.oracle.com/cd/B28359_01/server.111/b28310/general002.htm
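As a loose analogy, SQLite also has an ANALYZE command, which stores planner statistics in the sqlite_stat1 table. The following sketch uses Python's sqlite3 module and models the recent_cases table from the question with made-up data. It shows the recorded statistics and the resulting plan for the question's query. This illustrates statistics gathering in general and is not Oracle's ANALYZE or DBMS_STATS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Same shape as the question's table; index names are hypothetical.
    CREATE TABLE recent_cases (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        case_id INTEGER,
        visited_date TEXT
    );
    CREATE INDEX ix_recent_user ON recent_cases (user_id);
    CREATE INDEX ix_recent_user_case ON recent_cases (user_id, case_id);
""")
# 38 made-up rows; row i = 12 is (user_id 1000000, case_id 10095).
conn.executemany(
    "INSERT INTO recent_cases (user_id, case_id, visited_date) VALUES (?, ?, ?)",
    [(1000000 + i % 4, 10083 + i, "2024-01-01") for i in range(38)],
)

conn.execute("ANALYZE")  # record row counts and per-index selectivity in sqlite_stat1
stats = conn.execute(
    "SELECT idx, stat FROM sqlite_stat1 WHERE tbl = 'recent_cases' ORDER BY idx"
).fetchall()

query = "SELECT * FROM recent_cases WHERE user_id = 1000000 AND case_id = 10095"
plan = " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
rows = conn.execute(query).fetchall()

for idx, stat in stats:
    print(idx, stat)
print(plan)
print(rows)
```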

Avoid full table scan

I have an SQL SELECT query that needs tuning. Its FROM clause references a view built on 4 tables. When this query is executed, full table scans take place on all 4 tables, which causes CPU spikes. The four tables have valid indexes built on them.
The query looks similar to this:
SELECT DISTINCT ID, TITLE,......
FROM FINDSCHEDULEDTESTCASE
WHERE STEP_PASS_INDEX = 1 AND LOWER(COMPAREANAME) ='abc' ORDER BY ID;
The dots indicate that there are many more columns. Here FINDSCHEDULEDTESTCASE is a view on four tables.
Can someone guide me on how to avoid full table scans on those four tables?
In any case, with your condition
AND LOWER(COMPAREANAME) ='abc'
you'll get a full scan of the COMPAREANAME values, because the LOWER function must be calculated for each value (unless there is a function-based index on LOWER(COMPAREANAME)).
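Here is a sketch of the function-based index idea. It uses SQLite's indexes on expressions through Python's sqlite3 module, and Oracle's CREATE INDEX ... ON t (LOWER(col)) has the same shape. The table testcase is hypothetical and stands in for one of the view's base tables. A plain index on comparename cannot serve lower(comparename) = 'abc', but an index on the expression can.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for one of the view's base tables.
    CREATE TABLE testcase (
        id INTEGER PRIMARY KEY,
        title TEXT,
        step_pass_index INTEGER,
        comparename TEXT
    );
    CREATE INDEX ix_testcase_comparename ON testcase (comparename);
""")
conn.executemany(
    "INSERT INTO testcase (title, step_pass_index, comparename) VALUES (?, ?, ?)",
    [(f"case {i}", i % 2, ("ABC", "abc", "Xyz")[i % 3]) for i in range(300)],
)

query = "SELECT id, title FROM testcase WHERE lower(comparename) = 'abc'"
plan_plain = query_plan(conn, query)   # index on the raw column does not match lower(...)
conn.execute("CREATE INDEX ix_testcase_lower_comparename "
             "ON testcase (lower(comparename))")
plan_expr = query_plan(conn, query)    # the expression index matches the predicate exactly
matches = conn.execute(query).fetchall()
print(plan_plain)
print(plan_expr)
print(len(matches))
```

The indexed expression has to match the expression in the WHERE clause. An index on upper(comparename) would not help this query.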
It depends on so many things!
SELECT DISTINCT ID, TITLE, ......
Depending on how many columns you SELECT, it is possible that SQL Server decides to do a table scan instead of using your indexes.
Also, depending on your "WHERE" conditions, SQL Server can also decide to do a table scan instead of using your indexes.
Which version of SQL Server are you using?
There may be ways to improve the indexes on the tables if, for example, the conditions in the "WHERE" select less than 50% of the rows and you are using SQL Server 2008 or later (with filtered indexes: http://msdn.microsoft.com/en-us/library/ms188783.aspx ).
Or you can create indexes on views (http://msdn.microsoft.com/en-us/library/ms191432.aspx )
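SQLite's partial indexes are the closest equivalent to SQL Server's filtered indexes, so the following sketch uses them through Python's sqlite3 module. The table, index names, and data are hypothetical. The index only covers rows with step_pass_index = 1. A query whose WHERE clause contains that same condition can therefore use it, while a query for step_pass_index = 0 falls back to a scan.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testcase (
        id INTEGER PRIMARY KEY,
        title TEXT,
        step_pass_index INTEGER,
        comparename TEXT
    );
    -- Partial ("filtered") index: only rows with step_pass_index = 1 are indexed.
    CREATE INDEX ix_testcase_pass1_comparename
        ON testcase (comparename) WHERE step_pass_index = 1;
""")
conn.executemany(
    "INSERT INTO testcase (title, step_pass_index, comparename) VALUES (?, ?, ?)",
    [(f"case {i}", i % 2, ("abc", "def", "ghi")[i % 3]) for i in range(300)],
)

q_pass1 = ("SELECT id, title FROM testcase "
           "WHERE step_pass_index = 1 AND comparename = 'abc'")
q_pass0 = ("SELECT id, title FROM testcase "
           "WHERE step_pass_index = 0 AND comparename = 'abc'")

plan_pass1 = query_plan(conn, q_pass1)  # WHERE implies the index filter: index usable
plan_pass0 = query_plan(conn, q_pass0)  # these rows are not in the index: full scan
rows_pass1 = conn.execute(q_pass1).fetchall()
rows_pass0 = conn.execute(q_pass0).fetchall()
print(plan_pass1)
print(plan_pass0)
print(len(rows_pass1), len(rows_pass0))
```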
There really are not enough details in your question to help you properly.

Why does this SQL query take 8 hours to finish?

Here is a simple SQL JOIN statement:
SELECT
REC.[BarCode]
,REC.[PASSEDPROCESS]
,REC.[PASSEDNODE]
,REC.[ENABLE]
,REC.[ScanTime]
,REC.[ID]
,REC.[Se_Scanner]
,REC.[UserCode]
,REC.[aufnr]
,REC.[dispatcher]
,REC.[matnr]
,REC.[unitcount]
,REC.[maktx]
,REC.[color]
,REC.[machinecode]
,P.PR_NAME
,N.NO_NAME
,I.[inventoryID]
,I.[status]
FROM tbBCScanRec as REC
left join TB_R_INVENTORY_BARCODE as R
ON REC.[BarCode] = R.[barcode]
AND REC.[PASSEDPROCESS] = R.[process]
AND REC.[PASSEDNODE] = R.[node]
left join TB_INVENTORY as I
ON R.[inventid] = I.[id]
INNER JOIN TB_NODE as N
ON N.NO_ID = REC.PASSEDNODE
INNER JOIN TB_PROCESS as P
ON P.PR_CODE = REC.PASSEDPROCESS
The table tbBCScanRec has 556553 records, the table TB_R_INVENTORY_BARCODE has 260513 records, and the table TB_INVENTORY has 7688. The last two tables (TB_NODE and TB_PROCESS) both have fewer than 30 records.
Incredibly, when it runs in SQL Server 2005, it takes 8 hours to return the result set.
Why does it take so much time to execute?
If the two inner joins are removed, it takes just ten seconds to finish running.
What is the matter?
There are at least two UNIQUE NONCLUSTERED indexes.
One is IX_INVENTORY_BARCODE_PROCESS_NODE on the table TB_R_INVENTORY_BARCODE, which covers four columns (inventid, barcode, process, and node).
The other is IX_BARCODE_PROCESS_NODE on the table tbBCScanRec, which covers three columns (BarCode, PASSEDPROCESS, and PASSEDNODE).
Well, the standard answer to questions like this:
Make sure you have all the necessary indexes in place, i.e. indexes on N.NO_ID, REC.PASSEDNODE, P.PR_CODE, REC.PASSEDPROCESS
Make sure that the types of the columns you join on are the same, so that no implicit conversion is necessary.
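Here is a minimal sketch of the indexing advice for the two inner joins. It uses SQLite via Python's sqlite3 module rather than SQL Server 2005, and the columns are trimmed down and the data is made up. PRAGMA automatic_index = OFF stops SQLite from building temporary indexes on its own. With that setting, the first plan shows plain scans of TB_NODE and TB_PROCESS nested inside the loop over tbBCScanRec, and the second plan shows index searches instead.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA automatic_index = OFF")  # no temporary indexes built by SQLite itself
conn.executescript("""
    -- Hypothetical, trimmed-down versions of the question's tables.
    CREATE TABLE tbBCScanRec (ID INTEGER, BarCode TEXT, PASSEDPROCESS TEXT, PASSEDNODE TEXT);
    CREATE TABLE TB_NODE (NO_ID TEXT, NO_NAME TEXT);
    CREATE TABLE TB_PROCESS (PR_CODE TEXT, PR_NAME TEXT);
""")
conn.executemany("INSERT INTO TB_NODE VALUES (?, ?)",
                 [(f"N{j}", f"node {j}") for j in range(30)])
conn.executemany("INSERT INTO TB_PROCESS VALUES (?, ?)",
                 [(f"P{j}", f"process {j}") for j in range(30)])
conn.executemany("INSERT INTO tbBCScanRec VALUES (?, ?, ?, ?)",
                 [(i, f"BC{i}", f"P{i % 30}", f"N{i % 30}") for i in range(3000)])

query = """
    SELECT REC.ID, N.NO_NAME, P.PR_NAME
    FROM tbBCScanRec AS REC
    INNER JOIN TB_NODE AS N ON N.NO_ID = REC.PASSEDNODE
    INNER JOIN TB_PROCESS AS P ON P.PR_CODE = REC.PASSEDPROCESS
"""
plan_before = query_plan(conn, query)   # nested loops of full scans
conn.executescript("""
    CREATE INDEX IX_TB_NODE_NO_ID ON TB_NODE (NO_ID);
    CREATE INDEX IX_TB_PROCESS_PR_CODE ON TB_PROCESS (PR_CODE);
""")
plan_after = query_plan(conn, query)    # index searches on the two small tables
rows = conn.execute(query).fetchall()
print(plan_before)
print(plan_after)
print(len(rows))
```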
In the worst case, you are working with around 556553 * 30 * 30 ≈ 500 million row combinations.
You probably have to add indexes on your tables.
If you are using SQL Server, you can look at the query plan to see where you are losing time.
See the documentation here: http://msdn.microsoft.com/en-us/library/ms190623(v=sql.90).aspx
The query plan will help you to create indexes.
When you check the indexing, there should be clustered indexes as well: the nonclustered indexes use the clustered index key as their row locator (on a table without one, a heap, they use a row identifier instead). Outdated statistics could also be a problem.
However, why do you need to fetch ALL of the data? What is the purpose of that? You should have WHERE clauses restricting the result set to only what you need.