How do I optimize DB2 query - joining million rows with one row

I have a DB2 query which joins a fact table (300M rows) with a date table (1 row). The dates from the date table are used in the where condition to fetch only the data for that date range. But the query runs for 3 hours.
Select * from fact, date
where fact.procdate between date.lastdate and date.currdate
Is there a way to optimize this query without using plsql?

If you feed that query to db2expln you will see that all 300M rows get evaluated, probably several times. You are asking DB2 to build a Cartesian product and only then evaluate the where clause.
In any case, that query might not even give you the results you are expecting. You should study the output more carefully to determine that. You more likely want to do something like
Select * from fact
where fact.procdate between DATE('firstdate') and DATE('seconddate')
You should supply firstdate and seconddate from your application logic (probably via separate queries against the date table). Alternatively, you could use subqueries to retrieve the beginning and end dates.
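The subquery alternative could look like the sketch below. It reuses the table and column names from the question and assumes the date table really holds exactly one row, since a scalar subquery fails if it returns more than one. Whether DB2 can then use an index on fact.procdate depends on such an index existing, which the question does not say.

```sql
-- Sketch only: fetch the two boundary dates with scalar subqueries
-- instead of joining fact to the one-row date table.
-- Assumes the date table contains exactly one row.
SELECT f.*
FROM fact AS f
WHERE f.procdate BETWEEN (SELECT d.lastdate FROM date AS d)
                     AND (SELECT d.currdate FROM date AS d)
```

Running this through db2expln again should show whether the access plan on fact actually changed.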

Related

Oracle SQL Performance issue - How to make a query stop running and return when finding the first instance

I have a table in Oracle with lots of data about employees and dates (and much more), and I need to query it many times, since it's part of a big program.
The only thing that I'm looking for in this table is whether an employee appears in the table on a specific date. I don't care how many times, or about any other data.
At the moment my query is:
select distinct(EMP_ID) from EMPLOYEES
where TRUNC(DATE = TO_DATE('2020-11-21', 'yyyy-mm-dd') )AND
EMP_ID = '123456789'
The thing is that the query performs poorly, taking about 1.5 minutes per run, and this isn't tolerable because it consumes server resources.
Is there a way to make the query stop the moment it finds that the employee does appear on a specific date and return something (without continuing to run)?
Thank you very much!!

You can filter on the pseudo column rownum so that it doesn’t search for every row that matches your filters:
where rownum=1.
But for this query, it looks like you probably want an index on EMP_ID, and you want to make sure you’re using the correct data types in your query (is EMP_ID really a string?). Is your date filter correct?
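Putting those suggestions together, a sketch might look like the following. The column name WORK_DATE and the index name are hypothetical, because the question does not show the real name of the date column, and EMP_ID is kept as a string only because the question wrote it that way.

```sql
-- Hypothetical composite index so the lookup can avoid a full table scan.
CREATE INDEX emp_id_work_date_ix ON EMPLOYEES (EMP_ID, WORK_DATE);

-- Returns one row if the employee appears on that day, no rows otherwise.
-- The half-open range leaves WORK_DATE unwrapped (no TRUNC on the column),
-- so the index above stays usable; ROWNUM = 1 stops at the first match.
SELECT 1 AS emp_found
FROM EMPLOYEES
WHERE EMP_ID = '123456789'
  AND WORK_DATE >= DATE '2020-11-21'
  AND WORK_DATE <  DATE '2020-11-21' + 1
  AND ROWNUM = 1;
```

The range comparison is used instead of TRUNC(WORK_DATE) because applying a function to the column generally prevents a plain index on it from being used.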

MS Access SQL - Removing Duplicates From Query

MS Access SQL - This is a generic performance-related duplicates question. So, I don't have a specific example query, but I believe I have explained the situation below clearly and simply in 3 statements.
I have a standard/complex SQL query that selects many columns; some computed, some with asterisk, and some by name - e.g. (tab1.*, (tab2.col1 & tab2.col2) as computedFld1, tab3.col4, etc).
This query Joins about 10 tables. And the Where clause is based on user specified filters that could be based on any of the fields present in all 10 tables.
Based on these filters, I can sometimes get records with the same tab4.ID value.
Question: What is the best way to eliminate duplicate result rows with the same tab4.ID value. I don't care which rows get eliminated. They will differ in non-important ways.
Or, if important, they will differ in that they will have different tab5.ID values; and I want to keep the result rows with the LARGEST tab5.ID values.
But if the first query performs better than the second, then I really don't care which rows get eliminated. The performance is more important.
I have worked on this most of the morning and I am afraid that the answer to this is above my pay scale. I have tried Group By tab4.ID, but can't use "*" in Select clause; and many other things that I just keep bumping my head against a wall.

Access does not support CTEs but you can do something similar with saved queries.
So first alias the columns that have the same names in your query, something like:
SELECT tab4.ID AS tab4_id, tab5.ID AS tab5_id, ........
and then save your query for example as myquery.
Then you can use this saved query like this:
SELECT q1.*
FROM myquery AS q1
WHERE q1.tab5_id = (SELECT MAX(q2.tab5_id) FROM myquery AS q2 WHERE q2.tab4_id = q1.tab4_id)
This will return 1 row for each tab4_id if there are no duplicate tab5_ids for each tab4_id.
If there are duplicates then you must provide additional conditions.

How to limit SQL Server query to single table partition

I want to query a massive table, and need to get my query runtime down.
I'm trying to break up my target query into many steps by running my summarizing query against each individual table partition (I will then aggregate the outputs). All the columns in my where clauses are indexed (nonclustered), and all the columns I'm pulling in my query are indexed. The "Month" column is our partition index.
How do I write my query, so that I'm explicitly telling SQL Server to only use one "Month" partition?
edit to include Execution plan:
Per the comment, used this site: https://www.brentozar.com/pastetheplan/?id=SJRAIUD3V

Assuming that you have read the suggestions and still want to use partitions, the query would be as follows once you have partitioned the table with Month as the key_column.
SELECT <Your_select_list>
FROM <dbo.partitioned_table>
WHERE key_column = <target_month>
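To address a partition by its number rather than by key value, SQL Server also provides the $PARTITION function. This is a minimal sketch, assuming a partition function named pf_month (a hypothetical name; substitute the function the table was actually partitioned with) and the Month column from the question.

```sql
-- List each partition number with its row count.
SELECT $PARTITION.pf_month([Month]) AS partition_number,
       COUNT(*) AS row_count
FROM dbo.partitioned_table
GROUP BY $PARTITION.pf_month([Month]);

-- Restrict a query to partition number 3 only.
SELECT COUNT(*) AS row_count
FROM dbo.partitioned_table
WHERE $PARTITION.pf_month([Month]) = 3;
```

Checking the actual execution plan is the way to confirm that only one partition was accessed.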

Inconsistent results from BigQuery: same query, different number of rows

I noticed today that one of my queries was giving inconsistent results: every time I run it, a different number of rows is returned (cache deactivated).
Basically the query looks like this:
SELECT *
FROM mydataset.table1 AS t1
LEFT JOIN EACH mydataset.table2 AS t2
ON t1.deviceId=t2.deviceId
LEFT JOIN EACH mydataset.table3 AS t3
ON t2.email=t3.email
WHERE t3.email IS NOT NULL
AND (t3.date IS NULL OR DATE_ADD(t3.date, 5000, 'MINUTE')<TIMESTAMP('2016-07-27 15:20:11') )
The tables are not updated between each query. So I'm wondering if you also have noticed that kind of behaviour.
I usually run queries that return a lot of rows (>1000), so a few missing rows here and there are hardly noticeable. But this query returns only a few rows, and the count varies every time between 10 and 20 rows :-/
If a Google engineer is reading this, here are two job IDs of the same query with different results:
picta-int:bquijob_400dd739_1562d7e2410
picta-int:bquijob_304f4208_1562d7df8a2

Unless I'm missing something, the query that you provide is completely deterministic and so should give the same result every time you execute it. But you say it's "basically" the same as your real query, so this may be due to something you changed.
There are a couple of things you can do to try to find the cause:
Replace select * with an explicit selection of fields from your tables (a combination of fields that uniquely determines each row).
Order the result by these fields, so that the order is the same each time you execute the query.
Simplify your query. In the above query, you can remove the first condition and turn the two left outer joins into inner joins and get the same result. After that, you could start removing tables and conditions one by one.
After each step, check if you still get different result sets. Then when you have found the critical step, try to understand why it causes your problem. (Or ask here.)
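The three suggestions combined could look like the sketch below, written in the same legacy SQL dialect as the question. The selected fields are an assumption: pick whichever combination uniquely determines a row in your data. The output aliases are hypothetical.

```sql
-- Explicit field list, inner joins instead of LEFT JOIN plus
-- "t3.email IS NOT NULL", and a fixed output order.
SELECT t1.deviceId AS device_id,
       t2.email AS email,
       t3.date AS event_date
FROM mydataset.table1 AS t1
JOIN EACH mydataset.table2 AS t2
ON t1.deviceId = t2.deviceId
JOIN EACH mydataset.table3 AS t3
ON t2.email = t3.email
WHERE t3.date IS NULL
   OR DATE_ADD(t3.date, 5000, 'MINUTE') < TIMESTAMP('2016-07-27 15:20:11')
ORDER BY device_id, email, event_date
```

With a stable order, two runs can be compared row by row to see exactly which rows appear or disappear.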

Question on Query execution

In the query below, if the Patients table has 1000 records, how many times does TableValueFunction execute? Only once, or 1000 times?
This is a query in a stored procedure. Do you have a better idea to improve it?
SELECT * FROM Patients
WHERE Patients.Id IN (SELECT PatientId FROM TableValueFunction(parameters..))

It depends on what you are using as parameters. If the parameters are constants, the function will execute once, but if the parameters are fields from Patients, the function will execute as many times as there are rows in the Patients table.

To some extent it depends on whether you are talking about an inline TVF or a multi statement one.
A multi statement TVF is totally opaque to the query optimiser. It always assumes that it will return 1 row and it will not get expanded out into the main query.
Because of the 1 row assumption, if your Patients table is indexed on PatientId you will probably get a nested loops join with the TVF as the driving table, meaning that it is only executed once.
If it is not indexed and you get a hash or merge join both of these methods only process both inputs once.
An inline TVF gets merged into the query itself. So the function itself is never executed as such. However SQL Server can then refer to cardinality information and might order the plan such that the query contained in the TVF appears on the inner side of a nested loops join and has a number of executions greater than one.
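For readers unsure which kind of function they have, the two forms differ in how they are declared. The sketch below shows both in T-SQL; every name in it (the functions, the Admissions table, the WardId column) is hypothetical and only illustrates the syntax.

```sql
-- Inline TVF: a single RETURN (SELECT ...) that the optimizer can
-- merge into the calling query.
CREATE FUNCTION dbo.PatientsByWardInline (@WardId int)
RETURNS TABLE
AS
RETURN (SELECT PatientId FROM dbo.Admissions WHERE WardId = @WardId);
GO

-- Multi-statement TVF: fills a declared table variable inside
-- BEGIN ... END, so the optimizer treats it as opaque.
CREATE FUNCTION dbo.PatientsByWardMulti (@WardId int)
RETURNS @result TABLE (PatientId int)
AS
BEGIN
    INSERT INTO @result (PatientId)
    SELECT PatientId FROM dbo.Admissions WHERE WardId = @WardId;
    RETURN;
END;
GO
```

Looking at the actual execution plan of the stored procedure's query shows which join type was chosen and the number of executions for the function's operators.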
