ProvideX Query Performance - SQL

I am running a query against a ProvideX database that we use in MAS 90. The query joins three tables and has been slow, but not unbearably so, taking about 8 minutes per run. The query has a fair number of conditions in the WHERE clause.
I'm going to omit the SELECT part of the query, as it's long and simple: just a list of fields from the three tables to be used in the results.
The tables and the WHERE clauses in the 8-minute version are:
(The first parameter is the lower bound of the user-selected date range, the second is the upper bound.)
FROM "AR_InvoiceHistoryDetail" "AR_InvoiceHistoryDetail",
"AR_InvoiceHistoryHeader" "AR_InvoiceHistoryHeader", "IM1_InventoryMasterfile"
"IM1_InventoryMasterfile"
WHERE "AR_InvoiceHistoryDetail"."InvoiceNo" = "AR_InvoiceHistoryHeader"."InvoiceNo"
AND "AR_InvoiceHistoryDetail"."ItemCode" = "IM1_InventoryMasterfile"."ItemNumber"
AND "AR_InvoiceHistoryHeader"."SalespersonNo" = 'SMC'
AND "AR_InvoiceHistoryHeader"."OrderDate" >= #p_dr
AND "AR_InvoiceHistoryHeader"."OrderDate" <= #p_d2
However, it turns out that another date field in the same table needs to be the one compared against the date range. So I changed the OrderDate references at the end of the WHERE clause to InvoiceDate. I haven't had the query run successfully at all yet, and I've waited over 40 minutes. I have no control over indexing, because this is a MAS 90 database and I don't believe I can directly change its characteristics.
What could cause such a large (at least five-fold) difference in performance? Is it that OrderDate might have been indexed while InvoiceDate was not? I have tried BETWEEN clauses, but they don't seem to work in the ProvideX dialect. I am using the ODBC interface through .NET in my custom report engine. I have been debugging the report, and when I asked VS to Break All it was sitting at the database execution point, the same spot where the 8-minute report was waiting, so it is almost certainly either something in my query or something in the database that is screwed up.
If it's just the case that InvoiceDate isn't indexed, what else can I do in the ProvideX dialect of SQL to optimize the performance of these queries? Should I change the order of my criteria? This report gets results for a specific salesperson, which is why the 'SMC' clause exists. The prior clauses are for the inner joins, and the last clauses are for the date range.
I used an identical date range in both the OrderDate and InvoiceDate versions, ran them all multiple times, and got the same results.

I still don't know exactly why it was so slow, but we had another problem with the results coming from the query (we switched back to using OrderDate): we weren't getting some of the results because of the nature of the IM1 table.
So I added a LEFT OUTER JOIN once I figured out ProvideX's syntax for that. And for some reason, even though we still have three tables, it runs a lot faster now.
The new query criteria are:
FROM "AR_InvoiceHistoryHeader" "AR_InvoiceHistoryHeader",
{OJ "AR_InvoiceHistoryDetail" "AR_InvoiceHistoryDetail"
LEFT OUTER JOIN "IM1_InventoryMasterfile" "IM1_InventoryMasterfile"
ON "AR_InvoiceHistoryDetail"."ItemCode" =
"IM1_InventoryMasterfile"."ItemNumber" }
WHERE "AR_InvoiceHistoryDetail"."InvoiceNo" =
"AR_InvoiceHistoryHeader"."InvoiceNo" AND
"AR_InvoiceHistoryHeader"."SalespersonNo" = 'SMC'
AND "AR_InvoiceHistoryHeader"."InvoiceDate" >= ?
AND "AR_InvoiceHistoryHeader"."InvoiceDate" <= ?
Strange, but at least I learned more of the world of ProvideX SQL in the process.

I've never used ProvideX before.
A search turned up this reference article on the syntax for creating an index.
Looking over your query, there are three tables and five criteria. Two of the criteria are join criteria, and three are filtering criteria:
AND "AR_InvoiceHistoryHeader"."SalespersonNo" = 'SMC'
AND "AR_InvoiceHistoryHeader"."OrderDate" >= #p_dr
AND "AR_InvoiceHistoryHeader"."OrderDate" <= #p_d2
I don't know how selective SalespersonNo is for limiting results, but it might be good to add an index on it.
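Assuming the syntax from that article applies, and assuming MAS 90 lets you modify the dictionary at all, a generic sketch of such an index might look like this (the index name is mine; check the referenced article for the exact ProvideX form):
CREATE INDEX "IDX_SalespersonNo"
ON "AR_InvoiceHistoryHeader" ("SalespersonNo")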

I haven't used .NET, so my suggestion may show ignorance, but in Access you must use a SQL pass-through query to wring any results from ProvideX if more than one table is involved.


MS Access query (MS Access freezes)

I have this report and need to add totals for each person (the red circle). [Screenshots: existing report, new report]
I cannot change the existing report, so I export the data from MS SQL to MS Access and create a new report there. I got it working for one employee, but I'm having trouble with a query that would do it for multiple employees.
This query extracts the data used as input:
SELECT [TIME].[RCD_NUM], [TIME].[EMP_ID], [TIME].[PPERIOD], [TIME].[PRUN], [TIME].[TDATE], [TIME].[PC], [TIME].[RATE], [TIME].[HOURS], [TIME].[AMOUNT], [TIME].[JOB_ID], [TIME].[UPDATED], [TIME].[UPDATED_BY], [TIME].[LOG_DATE], [TIME].[ORIGINAL_REC_NUM]
FROM [TIME]
WHERE ((([TIME].[EMP_ID])=376) And (([TIME].[TDATE])<=#12/31/2006# And ([TIME].[TDATE])>=#1/1/2006#) And (([TIME].[PC])<599));
This query populates the report:
SELECT *
FROM TIME1
WHERE RCD_NUM = (SELECT Max(RCD_NUM) FROM [TIME1] UQ WHERE UQ.PPERIOD = [TIME1].PPERIOD AND UQ.PC = [TIME1].PC);
The problem is that if I remove the EMP_ID condition from the first query, like this:
SELECT [TIME].[RCD_NUM], [TIME].[EMP_ID], [TIME].[PPERIOD], [TIME].[PRUN], [TIME].[TDATE], [TIME].[PC], [TIME].[RATE], [TIME].[HOURS], [TIME].[AMOUNT], [TIME].[JOB_ID], [TIME].[UPDATED], [TIME].[UPDATED_BY], [TIME].[LOG_DATE], [TIME].[ORIGINAL_REC_NUM]
FROM [TIME]
WHERE ((([TIME].[TDATE])<=#12/31/2006# And ([TIME].[TDATE])>=#1/1/2006#) And (([TIME].[PC])<599));
then the second query doesn't work, and MS Access freezes when running it.
Any help or ideas, please?
Caveat: I won't pretend to know the precise cause of the problem, but I have repeatedly had to refactor queries in Access to get them working, even though the original SQL statements were completely valid with regard to syntax and logic. Sometimes I've had to convolute a sequence of queries just to avoid bugs in Access. Access is often rather dumb and will simply (re)execute queries and subqueries exactly as given, without optimization. At other times Access will attempt to combine queries by performing internal optimizations, but sometimes those introduce frustrating bugs. Something as simple as a name change or column reordering can be the difference between a functioning query and one that crashes or freezes Access.
First consider:
Can you leave the data on SQL Server and link to the results in Access (rather than exporting/importing it into Access)? Even if you need or prefer Access for creating the actual report, you could use all the power of SQL Server for querying the data; it is likely less buggy and more efficient.
Common best practice is to create SQL Server stored procedures that return just what data you need in Access. A pass-through query is created in Access to retrieve the data, but all data operations are performed on the server.
Perhaps this is just a performance issue where limiting the set by [EMP_ID] selects a small subset, but the full table is large enough to "freeze" Access.
How long have you let Access remain frozen before killing the process? Be patient... like many, many minutes (or hours). Start it in the morning and check after lunch. :) It might eventually return a result set. This does not imply it is tolerable or that there is no other solution, but it can be useful to know if it eventually returns data or not.
How many possible records are there?
Are the imported data properly indexed? Add indexes to all key fields and to those used in WHERE clauses (see the sketch after this list).
Is the database located on a network share or is it local? Try copying the database to a local drive.
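As mentioned above, Access DDL can add indexes to the imported table. A minimal sketch, assuming the import is a local table named [TIME]; the index names are illustrative, and Access executes only one DDL statement per query, so run them one at a time:
CREATE INDEX IDX_TIME_EMP_ID ON [TIME] ([EMP_ID]);
CREATE INDEX IDX_TIME_TDATE ON [TIME] ([TDATE]);
CREATE INDEX IDX_TIME_PC ON [TIME] ([PC]);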
Other hints:
Try the BETWEEN operator for dates in the WHERE clause.
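A minimal sketch of that hint, using the date literals from the question:
[TIME].[TDATE] BETWEEN #1/1/2006# AND #12/31/2006#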
Try refactoring the "second" query by performing a join in the FROM clause rather than in the WHERE clause. In doing this, you may also want to save the subquery as a named query (just as [TIME1] is saved). Whether a query is saved or embedded in another statement CAN change the behavior of Access (see the caveat above), even though the results should be identical.
Here's a version with the embedded aggregate query. Notice how all column references are qualified with their source; some of the original query's columns did not have a source alias prefixing the column name. Remember the caveat: such picky details can affect Access behavior.
SELECT TIME1.*
FROM TIME1 INNER JOIN
(SELECT UQ.PPERIOD, UQ.PC, Max(UQ.RCD_NUM) As Max_RCD_NUM
FROM [TIME1] UQ
GROUP BY UQ.PPERIOD, UQ.PC) As TIMEAGG
ON (TIME1.PPERIOD = TIMEAGG.PPERIOD) And (TIME1.PC = TIMEAGG.PC)
AND (TIME1.RCD_NUM = TIMEAGG.Max_RCD_NUM)

Querying time higher with 'Where' than without it

I have something that I think is a strange issue. Normally, I would expect a query to take less time if I add a restriction (so that fewer rows are processed). But for some reason that is not the case here. Maybe I'm doing something wrong, but I don't get an error; the query just seems to run 'till infinity'.
This is the query:
SELECT
    A.ENTITYID AS ORG_ID,
    A.ID_VALUE AS LEI,
    A.MODIFIED_BY,
    A.AUDITDATETIME AS LAST_DATE_MOD
FROM (
    SELECT
        CASE WHEN IFE.NEWVALUE IS NOT NULL
             THEN EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_TYPE')
             ELSE NULL
        END AS ID_TYPE,
        CASE WHEN IFE.NEWVALUE IS NOT NULL
             THEN EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_VALUE')
             ELSE NULL
        END AS ID_VALUE,
        (SELECT u.username FROM admin.users u WHERE u.userid = ife.analystuserid) AS Modified_by,
        ife.*
    FROM ife.audittrail ife
    WHERE
        --IFE.AUDITDATETIME >= '01-JUN-2016' AND
        attributeid = 499
        AND ROWNUM <= 10000
        AND (CASE WHEN IFE.NEWVALUE IS NOT NULL THEN EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_TYPE') ELSE NULL END) = '38'
) A
--WHERE A.AUDITDATETIME >= '01-JUN-2016';
So I tried the two commented clauses (one at a time, of course).
With both of them the same thing happens: the query runs for so long that I have to abort it.
Do you know why this could be happening? How could I apply the restriction, maybe in a different way?
The values of the AUDITDATETIME field look like '06-MAY-2017', in that format.
Thank you very much in advance.
I think you may misunderstand how databases work.
Firstly, read up on EXPLAIN: you can find out exactly what is taking time, and why, by learning to read the plan it produces.
Secondly, the performance characteristics of any given query are determined by a whole range of things, but usually the biggest effort goes not into processing rows, but into finding them.
Without an index, the database has to look at every row in the table and compare it to your WHERE clause. It's the equivalent of searching the phone book for a phone number rather than a name (the phone book is indexed on "last name").
You can improve this by creating indexes, for instance on the columns AUDITDATETIME and attributeid.
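In Oracle, creating such an index might look like this sketch (the index name is mine). Note also that comparing AUDITDATETIME to the string '01-JUN-2016' relies on implicit conversion; an explicit TO_DATE('01-JUN-2016', 'DD-MON-YYYY') is safer and lets the index be used:
CREATE INDEX audittrail_attr_date_ix
    ON ife.audittrail (attributeid, auditdatetime);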
Unlike the phone book, a database server can support multiple indexes - and if those indexes match your where clause, your query will be (much) faster.
Finally, using an XML string extraction for a comparison in the where clause is likely to be extremely slow unless you've got an index on that XML data.
This is the equivalent of searching the phone book and translating the street address from one language to another - not only do you have to inspect every address, you have to execute an expensive translation step for each item.
You probably need index(es). We can all guess at which indexes you already have and which you need to add, but most DBMSs have built-in query optimizers.
If you are using MS SQL Server, you can execute the query with the query plan displayed; that will tell you which index you need to add to optimize this particular query. It will even let you copy and paste the command to create it.

Shorter Date Interval results in longer execution time

I have this query:
SELECT
    COUNT(*) AS 'RedactedCount',
    s.Redacted1,
    s.[Redacted2],
    s.[Redacted3] AS 'Redacted3'
FROM RedactedTable1 s
LEFT OUTER JOIN RedactedTable2 g ON s.Redacted5 = g.Redacted5
WHERE g.Redacted6 = 31013 AND s.DateTime >= '2013-03-02 00:00:00'
GROUP BY s.Redacted1, s.Redacted2, s.Redacted3
This query has very odd behavior: it takes a whole 1 min 30 secs to complete. If I change the date to 2013-04-02 00:00:00 (today, as I'm writing this post), it is near-instant, which is the expected behavior.
But if I change the date to 2013-02-02 (a two-month span instead of one), the query takes only 20 secs.
Has anyone encountered this problem? I am completely stunned by the result. This will also be an important SQL request in a web application that I am working on.
Microsoft SQL Server Management Studio 11.0.3128.0
Microsoft Data Access Components (MDAC) 6.1.7601.17514
Microsoft MSXML 3.0 6.0
Microsoft Internet Explorer 9.0.8112.16421
Microsoft .NET Framework 4.0.30319.296
Operating system 6.1.7601
Note: The database is poorly designed and contains absolutely no indexes. Yes, this is bad. Unfortunately, this is commercial software and I have no rights to make changes to the database model. However, I do not think the problem is caused by this.
P.S.: Sorry that my query is heavily redacted; I am under a strict NDA. I tried to make it as readable as possible.
Thanks!
First of all, it is pointless to put a WHERE predicate on a table on the outer (nullable) side of an outer join. As soon as you do this, all rows in the final result set that do not have a match in that table are eliminated, effectively making the overall query behave as though it were an inner join.
The condition on RedactedTable2.Redacted6 should be part of the join conditions if you want the join to include rows where there is no matching row in table RedactedTable2.
SELECT COUNT(*) AS 'RedactedCount',
       s.Redacted1, s.[Redacted2],
       s.[Redacted3] AS 'Redacted3'
FROM RedactedTable1 s
LEFT JOIN RedactedTable2 g
    ON g.Redacted5 = s.Redacted5
    AND g.Redacted6 = 31013
WHERE s.DateTime >= '2013-03-02 00:00:00'
GROUP BY s.Redacted1, s.Redacted2, s.Redacted3
As to why the difference in performance: my suspicion is that something in the data in the tables is causing the query processor to use a different execution plan in one case than in the other. This can easily happen. If the optimizer "guesses" that it would need to examine more than a certain percentage of the data rows using one query execution plan (based on database statistics about the distribution of data values in the tables), then it will switch to a different plan.
Run both queries with the ShowPlan option turned on, and see what the differences are.
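From T-SQL, that can be done roughly like this (a sketch; SSMS's "Include Actual Execution Plan" button achieves much the same thing):
SET SHOWPLAN_ALL ON;
GO
-- run the query under test here; its estimated plan is returned instead of results
GO
SET SHOWPLAN_ALL OFF;
GO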
In addition to what Charles suggested, you might ask the database administrator (assuming you have one) to run UPDATE STATISTICS on at least RedactedTable1 and RedactedTable2. UPDATE STATISTICS requires ALTER permissions on the table/view, so I doubt you can run it yourself, but you can probably ask for it to be done. Problems like the one you are describing are frequently caused by out-of-date statistics.
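If they agree, the command itself is a one-liner per table; a sketch:
UPDATE STATISTICS RedactedTable1;
UPDATE STATISTICS RedactedTable2;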
Aaron Bertrand got the answer in a comment.
This problem was caused by parameter sniffing done by MSSQL.
Declaring and using dummy local variables prevents MSSQL from reusing a plan optimized for past parameter values.
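The usual shape of that workaround, inside a stored procedure, copies the parameter into a local variable so the optimizer cannot sniff the caller's value. A sketch using the question's redacted names; @StartDate stands in for whatever the real date parameter is called:
DECLARE @LocalStartDate DATETIME;
SET @LocalStartDate = @StartDate; -- copy the sniffed parameter into a dummy variable

SELECT COUNT(*) AS 'RedactedCount',
       s.Redacted1, s.[Redacted2],
       s.[Redacted3] AS 'Redacted3'
FROM RedactedTable1 s
LEFT JOIN RedactedTable2 g
    ON g.Redacted5 = s.Redacted5
    AND g.Redacted6 = 31013
WHERE s.DateTime >= @LocalStartDate
GROUP BY s.Redacted1, s.Redacted2, s.Redacted3;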
The following link helped me learn about parameter sniffing: http://blogs.technet.com/b/mdegre/archive/2012/03/19/what-is-parameter-sniffing.aspx

Querying for software using SQL query in SCCM

I am looking for specific pieces of software across our network by querying the SCCM Database. My problem is that, for various reasons, sometimes I can search by the program's name and other times I need to search for a specific EXE.
When I run the query below, it takes 13 seconds if the WHERE clause contains an AND, but it will run for days with no results if the AND is replaced with an OR. I'm assuming it is doing this because I am not properly joining the tables. How can I fix this?
SELECT vrs.Name0
FROM v_r_system AS vrs
JOIN v_GS_INSTALLED_SOFTWARE AS vis ON vis.resourceid = vrs.resourceid
JOIN v_GS_SoftwareFile AS sf ON sf.resourceid = vrs.resourceid
WHERE vis.productname0 LIKE '%office%'
  AND sf.Filename LIKE 'Office2007%'
GROUP BY vrs.Name0
Thanks!
Your LIKE clause contains a wildcard match at the start of a string:
LIKE '%office%'
This prevents SQL Server from using an index on this column, hence the slow-running query. Ideally you should change your query so your LIKE clause doesn't use a leading wildcard.
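For instance, if the product names you are after start predictably, an anchored pattern can use the index; the exact value here is an assumption about your data:
WHERE vis.productname0 LIKE 'Microsoft Office%'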
In the case where the WHERE clause contains an AND, it's querying based on the Filename clause first (it is able to use an index there, so this is relatively quick) and then filtering that reduced rowset based on your productname0 clause. When you use an OR, however, it isn't restricted to returning only rows that match your Filename clause, so it must search through the entire table, checking whether each productname0 field matches.
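One common workaround for the OR case is to split it into a UNION, so each branch can be satisfied independently; a sketch against the same views (untested against a real SCCM database):
SELECT vrs.Name0
FROM v_r_system AS vrs
JOIN v_GS_INSTALLED_SOFTWARE AS vis ON vis.resourceid = vrs.resourceid
WHERE vis.productname0 LIKE '%office%'
UNION
SELECT vrs.Name0
FROM v_r_system AS vrs
JOIN v_GS_SoftwareFile AS sf ON sf.resourceid = vrs.resourceid
WHERE sf.Filename LIKE 'Office2007%'
UNION removes duplicates, so the GROUP BY from the original is no longer needed.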
Here's a good Microsoft article on improving indexes: http://msdn.microsoft.com/en-us/library/ms172984.aspx. See the section on indexes with filter clauses (it reiterates the previous answer).
Have you tried something along these lines instead of a LIKE query?
... WHERE vis.productname0 IN ('Microsoft Office 2000', 'Microsoft Office xyz', 'Whateverelse')

Out of the two SQL queries below, suggest which one is better: a single query with a join, or two simple queries?

Assuming the result of the first query in A) is (envelopecontrolnumber, partnerid, docfileid) = ('000000400', 31, 35):
A)
select envelopecontrolnumber, partnerid, docfileid
from envelopeheader
where envelopeid ='LT01ENV1107010000050';
select count(*)
from envelopeheader
where envelopecontrolnumber = '000000400'
and partnerid= 31 and docfileid<>35 ;
or
B)
select count(*)
from envelopeheader a
join envelopeheader b on a.envelopecontrolnumber = b.envelopecontrolnumber
and a.partnerid= b.partnerid
and a.envelopeid = 'LT01ENV1107010000050'
and b.docfileid <> a.docfileid;
I am using the above query in an SQL function. I tried the queries in pgAdmin (Postgres): it shows 16 ms for A) and for B). When I tried the two queries from A) separately in pgAdmin, each still showed 16 ms, making 32 ms in total, which seems wrong, because when you run both queries in one go, it shows 16 ms. Please suggest which one is better. I am using a Postgres database.
The time displayed includes the time to:
send query to server
parse query
plan query
execute query
send results back to client
process all results
Try a simple query like "SELECT 1". You'll probably get 16 ms too.
It's quite likely you are simply measuring the ping time to your server.
If you want to know how much time a query spends on the server, you need EXPLAIN ANALYZE.
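For example, wrapping the first query from A); the timing EXPLAIN ANALYZE reports is measured on the server, so it excludes the network round trip:
EXPLAIN ANALYZE
SELECT envelopecontrolnumber, partnerid, docfileid
FROM envelopeheader
WHERE envelopeid = 'LT01ENV1107010000050';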
Option 1:
Run query A.
Get results.
Use these results to create query B.
Send query B.
Get results.
Option 2:
Run combined query AB.
Get results.
So, if you are using this from a client, connecting to Postgres, use the second option. There is an overhead for sending a query to the db and getting results back.
If you are using it inside an SQL function or procedure, the difference is probably negligible. I would still use the second option, though. And in either case, I would check that queries B or AB are optimized (check the query plan, whether indexes are used, etc.).
Go with option 1: the two queries are unrelated, so it is more efficient to do them separately.
Option A will be faster, since you are only interested in the count.
The join will create a temporary structure for joining the data based on the conditions, and then perform the counting operation.
Hence option A is better and faster.