SQL query takes a long time to execute - sql

Here I combine two tables to get the result.
SELECT *
FROM dbo.LabSampleCollection
WHERE CONVERT(nvarchar(20), BillNo) + CONVERT(Nvarchar(20), ServiceCode)
NOT IN (SELECT CONVERT(nvarchar(20), BillNo) + CONVERT(Nvarchar(20), ServiceCode)
FROM dbo.LabResult)
The problem is that it takes a long time to execute. Is there an alternative way to handle this?

SELECT *
FROM dbo.LabSampleCollection sc
WHERE NOT EXISTS ( SELECT BillNo
FROM dbo.LabResult r
WHERE r.BillNo = sc.BillNo
AND r.ServiceCode = sc.Servicecode)
There is no need to combine the two fields; just check whether both match in the same record. It would also be better to replace the * with the actual columns that you wish to retrieve. (The column selected in the subquery, BillNo, is arbitrary: NOT EXISTS only tests whether a matching row exists.)
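Besides speed, the concatenated comparison can also give wrong answers, because different key pairs can concatenate to the same string. Here is a minimal sketch of that, using Python's sqlite3 as a stand-in for SQL Server; the tiny tables and their integer column types are assumptions made up for the demonstration.

```python
import sqlite3

# Hypothetical miniature versions of the two tables; only the key columns matter here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LabSampleCollection (BillNo INTEGER, ServiceCode INTEGER);
    CREATE TABLE LabResult (BillNo INTEGER, ServiceCode INTEGER);
    -- (1, 23) has no result row, but (12, 3) does; both concatenate to '123'.
    INSERT INTO LabSampleCollection VALUES (1, 23), (12, 3), (7, 7);
    INSERT INTO LabResult VALUES (12, 3);
""")

# Original approach: compare concatenated keys.
concat_rows = conn.execute("""
    SELECT BillNo, ServiceCode FROM LabSampleCollection
    WHERE CAST(BillNo AS TEXT) || CAST(ServiceCode AS TEXT)
          NOT IN (SELECT CAST(BillNo AS TEXT) || CAST(ServiceCode AS TEXT)
                  FROM LabResult)
    ORDER BY BillNo
""").fetchall()

# Suggested approach: compare each column separately.
exists_rows = conn.execute("""
    SELECT sc.BillNo, sc.ServiceCode FROM LabSampleCollection sc
    WHERE NOT EXISTS (SELECT 1 FROM LabResult r
                      WHERE r.BillNo = sc.BillNo
                        AND r.ServiceCode = sc.ServiceCode)
    ORDER BY sc.BillNo
""").fetchall()

print("concatenated NOT IN:", concat_rows)  # the (1, 23) row is wrongly dropped
print("NOT EXISTS:         ", exists_rows)
```

Comparing the columns directly also leaves the optimizer free to use an index on (BillNo, ServiceCode), which the CONVERT-and-concatenate expression generally prevents.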

Are you familiar with query execution plans? If not, I strongly recommend that you read up on them. If you are going to be writing queries and troubleshooting or trying to improve performance, they are one of the most useful tools (along with a basic understanding of what they are and how the SQL Server optimization engine works).
You can access them from SSMS via the Activity Monitor or by running the query itself (use the Include Actual Execution Plan button or Ctrl+M), and they will tell you which part of the query is the most expensive and why. There are many very good articles on the web on how to improve performance using this valuable tool, e.g. https://www.simple-talk.com/sql/performance/execution-plan-basics/
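If you do not have SSMS at hand, the idea of reading a plan can be tried out anywhere. The sketch below uses SQLite's EXPLAIN QUERY PLAN from Python, which is only an analogue of a SQL Server execution plan; the table layout and the index name ix_labresult_key are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LabResult (BillNo INTEGER, ServiceCode INTEGER)")

def plan(sql):
    """Return the plan steps SQLite reports for a statement, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM LabResult WHERE BillNo = 42 AND ServiceCode = 7"

plan_before = plan(query)  # no index yet, so the whole table is scanned
conn.execute("CREATE INDEX ix_labresult_key ON LabResult (BillNo, ServiceCode)")
plan_after = plan(query)   # the new index can now be used for the lookup

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text differs between SQLite versions, so read it for the scan-versus-index distinction rather than for the literal strings.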

Related

Need help regarding running multiple queries in BigQuery

I have some queries that I want to run in a sequential manner. Is it possible to schedule multiple queries under one scheduled query in BigQuery? Thanks
If you don't need all of the intermediate tables and are just interested in the final output... consider using CTEs.
with first as (
select *, current_date() as todays_date from <table1>
),
second as (
select current_date(), concat(field1,field2) as new_field, count(*) as ct
from first
group by 1,2
)
select * from second
You can chain together as many of these as needed.
If you do need all of these intermediate tables materialized, you are venturing into ETL and orchestration tools (dbt, airflow, etc) or will need to write a custom script to execute several commands sequentially.
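A custom script for the sequential case can be very small. The following is a sketch only: it uses Python's sqlite3 as a stand-in for BigQuery, and the helper run_in_order plus the table names table1, step_first and step_second are hypothetical. With BigQuery you would swap in its client library and wait for each job to finish before starting the next.

```python
import sqlite3

def run_in_order(conn, statements):
    """Run each statement only after the previous one finished; an exception stops the chain."""
    for step, sql in enumerate(statements, start=1):
        conn.execute(sql)
        print(f"step {step} done")
    conn.commit()

conn = sqlite3.connect(":memory:")
steps = [
    "CREATE TABLE table1 (field1 TEXT, field2 TEXT)",
    "INSERT INTO table1 VALUES ('a', 'x'), ('a', 'x'), ('b', 'y')",
    # Each intermediate result is materialized as its own table.
    "CREATE TABLE step_first AS SELECT field1 || field2 AS new_field FROM table1",
    "CREATE TABLE step_second AS "
    "SELECT new_field, COUNT(*) AS ct FROM step_first GROUP BY new_field",
]
run_in_order(conn, steps)

result = conn.execute(
    "SELECT new_field, ct FROM step_second ORDER BY new_field"
).fetchall()
print(result)
```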
Not currently, but an alpha program for scripting support in BigQuery was announced at Google Cloud Next in April. You can follow the relevant feature request for updates. In the meantime, you could consider using Cloud Composer to execute multiple sequential queries or an App Engine cron with some code to achieve sequential execution on a regular basis.
Edit (October 2019): support for scripting and stored procedures is now in beta. You can submit multiple queries separated with semi-colons and BigQuery is able to run them now.
I'm not 100% sure if this is what you're looking for, but I'm confident that you won't need to orchestrate many queries to do this. It may be as simple as using the INSERT...SELECT syntax, like this:
INSERT INTO
YourDataset.AdServer_Refine
SELECT
Placement_ExtID,
COALESCE(m.New_Ids,a.Placement_ExtID) AS New_Ids,
m.Labels,
CONCAT(Date," - ",New_Ids) AS Concatenated,
a.Placement_strategy,
a.Campaign_Id,
a.Campaign,
a.Cost,
a.Impressions,
a.Clicks,
a.C_Date AS Current_date,
a.Date
FROM
YourDataset.AdServer AS a
LEFT JOIN
YourDataset.Matching AS m
USING(Placement_ExtID)
WHERE
a.Date = CURRENT_DATE()
This will insert all the rows that are output from the SELECT portion of the query (and you can easily test the output by just running the SELECT).
Another option is to create a scheduled query that outputs to your desired table from the SELECT portion of the query above.
If that isn't doing what you're expecting, please clarify the question and leave a comment and I'm happy to try to refine the answer.

Performance of SQL comparison using substring vs like with wildcard

I am working on a join condition between 2 tables where one of the columns to match on is a concatenation of values. I need to join columnA from tableA to the first 2 characters of columnB from tableB.
I have developed 2 different statements to handle this and I have tried to analyze the performance of each method.
Method 1:
ON tB.columnB like tA.columnA || '%'
Method 2:
ON substr(tB.columnB,1,2) = tA.columnA
The query execution plan has far fewer steps using Method 1 compared to Method 2; however, it looks like Method 2 executes much faster. Also, the execution plan shows a recommended index for Method 2 that could improve its performance.
I am running this on an IBM iSeries, though would be interested in answers in a general sense to learn more about sql query optimization.
Does it make sense that Method 2 would execute faster?
This SO question is similar, but it looks like no one provided any concrete answers to the performance difference of these approaches: T-SQL speed comparison between LEFT() vs. LIKE operator.
PS: The table design that requires this type of join is not something that I can get changed at this time. I realize that having the fields which hold different types of data separated would be preferable.
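Before comparing speed, it is worth checking that the two join conditions return the same rows. They only do so when columnA is always exactly 2 characters long and contains no LIKE wildcards (% or _). Here is a minimal sketch with made-up data, using Python's sqlite3 rather than DB2 for i; note that SQLite's LIKE ignores ASCII case by default, so the sample values are all uppercase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (columnA TEXT);
    CREATE TABLE tableB (columnB TEXT);
    -- 'X' is deliberately only one character long.
    INSERT INTO tableA VALUES ('AB'), ('CD'), ('X');
    INSERT INTO tableB VALUES ('AB001'), ('CD777'), ('XY123'), ('ZZ999');
""")

# Method 1: prefix match with LIKE.
like_join = conn.execute("""
    SELECT tA.columnA, tB.columnB FROM tableA tA
    JOIN tableB tB ON tB.columnB LIKE tA.columnA || '%'
    ORDER BY 1, 2
""").fetchall()

# Method 2: compare the first 2 characters.
substr_join = conn.execute("""
    SELECT tA.columnA, tB.columnB FROM tableA tA
    JOIN tableB tB ON substr(tB.columnB, 1, 2) = tA.columnA
    ORDER BY 1, 2
""").fetchall()

print(like_join)    # includes ('X', 'XY123')
print(substr_join)  # does not
```

If the one-character case cannot occur in your data, the two methods are interchangeable and the choice comes down to the measured plan and run time.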
I ran the following in the SQL Advisor in IBM Data Studio on one of the tables in my DB2 LUW 10.1 database:
SELECT *
FROM PDM.DB30
WHERE DB30_SYSTEM_ID = 'XXX'
AND DB30_VERSION_ID = 'YYY'
AND SUBSTR(DB30_REL_TABLE_NM, 1, 4) = 'ZZZZ'
and
SELECT *
FROM PDM.DB30
WHERE DB30_SYSTEM_ID = 'XXX'
AND DB30_VERSION_ID = 'YYY'
AND DB30_REL_TABLE_NM LIKE 'ZZZZ%'
They both had the exact same access path utilizing the same index, the same estimated IO cost and the same estimated cardinality, the only difference being the estimated total CPU cost for the LIKE was 178,343.75 while the SUBSTR was 197,518.48 (~10% difference).
The cumulative total cost for both were the same though, so this difference is negligible as per the advisor.
Yes, Method 2 would be faster. LIKE is not as efficient a function.
To compare performance of various techniques, try using Visual Explain. You will find it buried in System i Navigator. Under your system connection, expand Databases, then click on your RDB name. In the lower right pane you can then click on the option to Run an SQL Script. Enter your SELECT statement, and choose the menu option for Visual Explain or Run and Explain. Visual Explain will break down the execution plan for your statement and show you the cost for each part as estimated on your tables with the indexes available.
You can test this with real data in your own database.
In my runs, LIKE is always faster:
select count(*) from u_log where log_text like 'AUT%';
1 row(s) returned : 90ms taken
select count(*) from u_log where substr(log_text,1,3)='AUT';
1 row(s) returned : 493ms taken
I found this reference in an IBM redbook related to SQL performance. It sounds like the SUBSTR scalar function can be handled in an optimized manner by an iSeries.
If you search for the first character and want to use the SQE instead
of the CQE, you can use the scalar function substring on the left sign
of the equal sign. If you have to search for additional characters in
the string, you can additionally use the scalar function POSSTR. By
splitting the LIKE predicate into several scalar function, you can
affect the query optimizer to use the SQE.
http://publib-b.boulder.ibm.com/abstracts/sg246654.html?Open

Can Common Table expressions be used here for performance?

Can COMMON Table expressions be used to avoid having SQL Server perform the following string parsing twice per record? My guess is "no."
SELECT DISTINCT
Client_ID
,RIGHT('0000000' + RIGHT(Client_ID
,PATINDEX('%[^0-9]%'
,REVERSE('?' + Client_ID)) - 1)
,7) AS CorrectedClient
FROM
membob_vw
WHERE
Client_ID <> RIGHT('0000000' + RIGHT(Client_ID
,PATINDEX('%[^0-9]%'
,REVERSE('?' + Client_ID)) - 1)
,7)
ORDER BY
1
,2
Every time I try to format the SQL as a "Code Block" it looks good (displaying on multiple lines) until the page is refreshed, after which the SQL is displayed, for me at least, all on ONE line, and I can't seem to correct that.
Does it display that way for people who are using a browser newer than IE6? My company imposes this POS browser on me and prevents me from using any other.
No, a CTE will not do anything performance-wise for this query. It may seem strange or inefficient to type in the same large string expression twice. However, SQL Server will only evaluate the string expression once per row; it has been optimized for things like that.
EDIT
the CTE will reduce the duplicate code:
;WITH AllRows AS (
SELECT DISTINCT
Client_ID
,RIGHT('0000000' + RIGHT(Client_ID
,PATINDEX('%[^0-9]%'
,REVERSE('?' + Client_ID)) - 1)
,7) AS CorrectedClient
FROM
membob_vw
)
SELECT * FROM AllRows WHERE Client_ID<>CorrectedClient
ORDER BY
1
,2
but it won't perform any better. Use SET SHOWPLAN_ALL ON and I'll bet you see the same query plan for each version.
BE CAREFUL when trying to make queries look pretty and reduce redundant code fragments! Simple-looking SQL changes can have major adverse performance implications. Always check the performance (run time and/or query plan) of any change you make. I have seen trivial changes made to queries that ran instantly result in them taking minutes to run. The key with SQL is performance, not pretty code. If the application is slow, who cares if the code looks good?
If you're going to be running this query a lot, and especially if Client_ID is seldom updated, you should consider a computed column or pre-calculating CorrectedClient and storing it separately.
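If you go the pre-calculation route outside the database, for example in a load script, the same rule is easy to express once. The following Python sketch mirrors what the T-SQL expression computes (the trailing run of digits, left-padded with zeros to 7 characters); the function name corrected_client is hypothetical and only ASCII digits are considered.

```python
import re

def corrected_client(client_id: str) -> str:
    """Take the trailing digits of client_id and zero-pad them to 7 characters.

    Mirrors RIGHT('0000000' + <trailing digits>, 7): longer digit runs keep
    only their last 7 digits, and an ID with no trailing digits gives '0000000'.
    """
    trailing_digits = re.search(r"[0-9]*\Z", client_id).group(0)
    return ("0000000" + trailing_digits)[-7:]

# Keep only the IDs that would change, like the WHERE clause in the query.
sample_ids = ["AB123", "0000123", "123456789", "ABC"]
needs_fix = [(cid, corrected_client(cid))
             for cid in sample_ids
             if cid != corrected_client(cid)]
print(needs_fix)
```

Storing the result next to Client_ID means the string parsing happens once per update instead of once per query.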

Which of the two SQL queries below is better: a single query with a join, or two simple queries?

Assuming result of first query in A) (envelopecontrolnumber,partnerid,docfileid) = (000000400, 31,35)
A)
select envelopecontrolnumber, partnerid, docfileid
from envelopeheader
where envelopeid ='LT01ENV1107010000050';
select count(*)
from envelopeheader
where envelopecontrolnumber = '000000400'
and partnerid= 31 and docfileid<>35 ;
or
B)
select count(*)
from envelopeheader a
join envelopeheader b on a.envelopecontrolnumber = b.envelopecontrolnumber
and a.partnerid= b.partnerid
and a.envelopeid = 'LT01ENV1107010000050'
and b.docfileid <> a.docfileid;
I am using the above query in an SQL function. I tried the queries in pgAdmin (Postgres), and it shows 16 ms for both A) and B). When I ran the two queries from A) separately in pgAdmin, it still showed 16 ms for each one, making 32 ms for A), which seems wrong because when you run both queries from A) in one go, it shows 16 ms. Please suggest which one is better. I am using a Postgres database.
The time displayed includes the time to:
send query to server
parse query
plan query
execute query
send results back to client
process all results
Try a simple query like "SELECT 1". You'll probably get 16 ms too.
It's quite likely you are simply measuring the ping time to your server.
If you want to know how much time on the server a query uses, you need EXPLAIN ANALYZE.
Option 1:
Run query A.
Get results.
Use these results to create query B.
Send query B.
Get results.
Option 2:
Run combined query AB.
Get results.
So, if you are using this from a client, connecting to Postgres, use the second option. There is an overhead for sending a query to the db and getting results back.
If you are using it inside an SQL function or procedure, the difference is probably negligible. I would still use the second option though. And in either case, I would check that queries B or AB are optimized (checked query plan, if indexes are used, etc).
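Whichever option you pick, it is worth confirming first that both return the same count. Here is a small sketch with invented rows, using Python's sqlite3 instead of Postgres; the extra envelopeid values are made up, and the check assumes envelopeid is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE envelopeheader (
        envelopeid TEXT, envelopecontrolnumber TEXT,
        partnerid INTEGER, docfileid INTEGER);
    INSERT INTO envelopeheader VALUES
        ('LT01ENV1107010000050', '000000400', 31, 35),
        ('LT01ENV1107010000051', '000000400', 31, 36),
        ('LT01ENV1107010000052', '000000400', 31, 37),
        ('LT01ENV1107010000053', '000000400', 32, 38);
""")
target = "LT01ENV1107010000050"

# Option A: two round trips, the second built from the first result.
ecn, partner, docfile = conn.execute(
    "SELECT envelopecontrolnumber, partnerid, docfileid "
    "FROM envelopeheader WHERE envelopeid = ?", (target,)
).fetchone()
count_a = conn.execute(
    "SELECT COUNT(*) FROM envelopeheader "
    "WHERE envelopecontrolnumber = ? AND partnerid = ? AND docfileid <> ?",
    (ecn, partner, docfile),
).fetchone()[0]

# Option B: a single self-join.
count_b = conn.execute(
    "SELECT COUNT(*) FROM envelopeheader a "
    "JOIN envelopeheader b ON a.envelopecontrolnumber = b.envelopecontrolnumber "
    "AND a.partnerid = b.partnerid AND b.docfileid <> a.docfileid "
    "WHERE a.envelopeid = ?", (target,)
).fetchone()[0]

print(count_a, count_b)
```

If envelopeid were not unique, option A as written would only use the first matching row, while option B would count across all of them.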
Go with option 1: the two queries are unrelated, so it is more efficient to run them separately.
Option A will be faster since you are interested in the count.
The join will create a temporary structure to join the data based on the conditions and then perform the counting operation.
Hence option A is better and faster.

Refactoring SQL

Are there any formal techniques for refactoring SQL similar to this list here that is for code?
I am currently working on a massive query for a particular report and I'm sure there's plenty of scope for refactoring here which I'm just stumbling through myself bit by bit.
I have never seen an exhaustive list like the sample you provided.
The most effective way to refactor SQL that I have seen is to use the WITH statement.
It allows you to break the sql up into manageable parts, which frequently can be tested independently. In addition it can enable the reuse of query results, sometimes by the use of a system temporary table. It is well worth the effort to examine.
Here is a silly example
WITH
mnssnInfo AS
(
SELECT SSN,
UPPER(LAST_NAME) AS LAST_NAME, -- alias so later parts can refer to the column by name
UPPER(FIRST_NAME) AS FIRST_NAME,
TAXABLE_INCOME,
CHARITABLE_DONATIONS
FROM IRS_MASTER_FILE
WHERE STATE = 'MN' AND -- limit to Minne-so-tah
TAXABLE_INCOME > 250000 AND -- is rich
CHARITABLE_DONATIONS > 5000 -- might donate too
),
doltishApplicants AS
(
SELECT SSN, SAT_SCORE, SUBMISSION_DATE
FROM COLLEGE_ADMISSIONS
WHERE SAT_SCORE < 100 -- Not as smart as the average moose.
),
todaysAdmissions AS
(
SELECT doltishApplicants.SSN,
TRUNC(SUBMISSION_DATE) SUBMIT_DATE,
LAST_NAME, FIRST_NAME,
TAXABLE_INCOME
FROM mnssnInfo,
doltishApplicants
WHERE mnssnInfo.SSN = doltishApplicants.SSN
)
SELECT 'Dear ' || FIRST_NAME ||
' your admission to WhatsaMattaU has been accepted.'
FROM todaysAdmissions
WHERE SUBMIT_DATE = TRUNC(SYSDATE) -- For stuff received today only
One of the other things I like about it is that this form allows you to separate the filtering from the joining. As a result, you can frequently copy out the subqueries and execute them standalone to view the result set associated with them.
There is a book on the subject: "Refactoring Databases". I haven't read it, but it got 4.5/5 stars on Amazon and is co-authored by Scott Ambler, which are both good signs.
Not that I've ever found. I've mostly done SQL Server work and the standard techniques are:
Parameterise hard-coded values that might change (so the query can be cached)
Review the execution plan, check where the big monsters are and try changing them
Index tuning wizard (but beware you don't cause chaos elsewhere from any changes you make for this)
If you're still stuck, many reports don't depend on 100% live data - try precalculating portions of the data (or the whole lot) on a schedule such as overnight.
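As an illustration of the first point in that list, here is a minimal sketch of a parameterised query. It uses Python's sqlite3 with an invented sales table as a stand-in for SQL Server; the names REPORT_SQL and total_for are hypothetical. The statement text stays identical from call to call and only the bound value changes, which is what lets a plan cache recognise it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('north', 10), ('north', 15), ('south', 7);
""")

# One statement text for every region, instead of a new literal per call.
REPORT_SQL = "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?"

def total_for(region):
    """Run the same statement for any region; only the bound value changes."""
    return conn.execute(REPORT_SQL, (region,)).fetchone()[0]

for region in ("north", "south", "east"):
    print(region, total_for(region))
```

Binding values this way also avoids building SQL by string concatenation, which removes a common source of injection bugs.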
Not about techniques as much, but this question might help you find SQL refactoring tools:
Is there a tool for refactoring SQL, a bit like a ReSharper for SQL