DB2 COALESCE - notable impact on query time execution

I've noticed that using COALESCE (in my case, to guard against a possible NULL value in a prepared statement) significantly slows down query execution in DB2. Can someone explain the root cause and how I can work around it? Sample queries below:
QUERY 1 (execution time 3 s):
SELECT TABLE_A.Y, TABLE_B.X
FROM ...
WHERE Z = ? AND TABLE_A.ABC = ? AND
TABLE_A.QWERTY = ? AND TABLE_A.Q = TABLE_B.Q;
QUERY 2 (execution time 210 s):
SELECT TABLE_A.Y, TABLE_B.X
FROM ...
WHERE Z = ? AND (
(COALESCE(?,'')='') OR
(TABLE_A.ABC = ? AND TABLE_A.QWERTY = ? AND TABLE_A.Q = TABLE_B.Q)
);
The only difference is using (COALESCE(?,'')='').

The bigger problem I see is that QUERY 1 has 3 placeholders whereas QUERY 2 has 4 placeholders.
I think you are trying to make your placeholders optional.
A simple way to do this is to rewrite QUERY 1 as follows:
SELECT TABLE_A.Y, TABLE_B.X
FROM TABLE_A
INNER JOIN TABLE_B ON TABLE_A.Q = TABLE_B.Q
WHERE Z = ?
AND TABLE_A.ABC = COALESCE(?,TABLE_A.ABC)
AND TABLE_A.QWERTY = COALESCE(?,TABLE_A.QWERTY);
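To see how the optional-placeholder pattern behaves, here is a minimal sketch using Python's built-in sqlite3 module. This is an assumption made for convenience: SQLite is not DB2, the single table_a and its sample rows are hypothetical, and the sketch says nothing about DB2's performance. It only shows that a NULL parameter switches the corresponding filter off.

```python
import sqlite3

# Hypothetical stand-in for the question's TABLE_A. SQLite is used only
# because it ships with Python, not because it behaves like DB2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (z INTEGER, abc TEXT, qwerty TEXT, y TEXT)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?, ?)",
    [
        (1, "a1", "q1", "row1"),
        (1, "a1", "q2", "row2"),
        (1, "a2", "q1", "row3"),
        (2, "a1", "q1", "row4"),
    ],
)

OPTIONAL_FILTER_SQL = """
    SELECT y
    FROM table_a
    WHERE z = ?
      AND abc = COALESCE(?, abc)        -- NULL parameter: abc is compared to itself
      AND qwerty = COALESCE(?, qwerty)  -- NULL parameter: qwerty is compared to itself
    ORDER BY y
"""


def find_rows(z, abc=None, qwerty=None):
    """Return y values; a None argument disables the corresponding filter."""
    rows = conn.execute(OPTIONAL_FILTER_SQL, (z, abc, qwerty)).fetchall()
    return [row[0] for row in rows]


print(find_rows(1))              # both optional filters off
print(find_rows(1, abc="a1"))    # only abc is filtered
print(find_rows(1, "a1", "q2"))  # both filters on
```

One caveat of this pattern: a row whose ABC or QWERTY column is itself NULL never matches, because comparing NULL with NULL does not evaluate to true.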

Related

Sub-query works but would a join or other alternative be better?

I am trying to select rows from one table where the id referenced in those rows matches the unique id from another table that relates to it like so:
SELECT *
FROM booklet_tickets
WHERE bookletId = (SELECT id
FROM booklets
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3)
With the bookletNum/seasonId/bookletTypeId being filled in by a user form and inserted into the query.
This works and returns what I want but seems messy. Is a join better to use in this type of scenario?

If there is any possibility that your subquery returns more than one value, you should use IN instead of =:
SELECT *
FROM booklet_tickets
WHERE bookletId in (SELECT id
FROM booklets
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3)
But I would prefer EXISTS over IN:
SELECT *
FROM booklet_tickets bt
WHERE EXISTS (SELECT 1
FROM booklets b
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3
AND b.id = bt.bookletId)
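As a quick check that the IN and EXISTS forms select the same rows, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are hypothetical, and two booklets deliberately share the same bookletNum/seasonId/bookletTypeId so that the subquery yields more than one id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booklets (id INTEGER PRIMARY KEY, bookletNum INTEGER,
                           seasonId INTEGER, bookletTypeId INTEGER);
    CREATE TABLE booklet_tickets (ticketId INTEGER PRIMARY KEY, bookletId INTEGER);
    -- Hypothetical data: booklets 1 and 2 match the same filter values.
    INSERT INTO booklets VALUES (1, 2000, 9, 3), (2, 2000, 9, 3), (3, 2001, 9, 3);
    INSERT INTO booklet_tickets VALUES (10, 1), (11, 2), (12, 3), (13, 1);
""")

IN_SQL = """
    SELECT ticketId
    FROM booklet_tickets
    WHERE bookletId IN (SELECT id
                        FROM booklets
                        WHERE bookletNum = ? AND seasonId = ? AND bookletTypeId = ?)
    ORDER BY ticketId
"""

EXISTS_SQL = """
    SELECT bt.ticketId
    FROM booklet_tickets bt
    WHERE EXISTS (SELECT 1
                  FROM booklets b
                  WHERE b.bookletNum = ? AND b.seasonId = ? AND b.bookletTypeId = ?
                    AND b.id = bt.bookletId)
    ORDER BY bt.ticketId
"""

params = (2000, 9, 3)  # bound as parameters instead of being pasted into the SQL
in_ids = [row[0] for row in conn.execute(IN_SQL, params)]
exists_ids = [row[0] for row in conn.execute(EXISTS_SQL, params)]
print(in_ids, exists_ids)
```

Since the values come from a user form, binding them as parameters (as above) rather than concatenating them into the query text also avoids SQL injection.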

It is not possible to give a "Yes it's better" or "no it's not" answer for this type of scenario.
My personal rule of thumb: if the number of rows in a table is less than 1 million, I do not bother optimising "SELECT WHERE IN" types of queries, as the SQL Server Query Optimizer is smart enough to pick an appropriate plan for the query.
In reality, however, you often need more values from the joined table in the final result set, so a JOIN with a filtering WHERE clause might make more sense, such as:
SELECT BT.*, B.SeasonId
FROM booklet_tickets BT
INNER JOIN booklets B ON BT.bookletId = B.id
WHERE B.bookletNum = 2000
AND B.seasonId = 9
AND B.bookletTypeId = 3
To me it comes down to a question of style rather than anything else: write your code so that it will be easy for you to understand months later. Pick a style and then stick to it :)
The question, however, is as old as time itself :)
SQL JOIN vs IN performance?

TPC-DS Query 6: Why do we need 'where j.i_category = i.i_category' condition?

I'm going through TPC-DS on Amazon Athena.
Everything was fine up to query 5.
I ran into a problem with query 6, shown below:
select a.ca_state state, count(*) cnt
from customer_address a
,customer c
,store_sales s
,date_dim d
,item i
where a.ca_address_sk = c.c_current_addr_sk
and c.c_customer_sk = s.ss_customer_sk
and s.ss_sold_date_sk = d.d_date_sk
and s.ss_item_sk = i.i_item_sk
and d.d_month_seq =
(select distinct (d_month_seq)
from date_dim
where d_year = 2002
and d_moy = 3 )
and i.i_current_price > 1.2 *
(select avg(j.i_current_price)
from item j
where j.i_category = i.i_category)
group by a.ca_state
having count(*) >= 10
order by cnt, a.ca_state
limit 100;
It ran for more than 30 minutes and then failed with a timeout.
I tried to find which part causes the problem, so I checked the WHERE conditions and found where j.i_category = i.i_category in the last part of the WHERE clause.
I don't know why this condition is needed, so I deleted it and the query ran OK.
Can you tell me why this part is needed?

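The j.i_category = i.i_category predicate is the subquery's correlation condition.
If you remove it from the subquery
(select avg(j.i_current_price)
from item j
where j.i_category = i.i_category)
the subquery becomes uncorrelated and turns into a global aggregation over the whole item table, which is cheap because the query engine needs to compute it only once. Note that this changes the query's meaning: each item is then compared with the overall average price rather than with the average price of its own category.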
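The difference between the correlated and the uncorrelated subquery can be seen on a tiny example. This is a minimal sketch using Python's built-in sqlite3 module rather than Athena, and the five item rows are hypothetical. It illustrates only the difference in results, not the difference in running time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (i_item_sk INTEGER PRIMARY KEY, i_category TEXT,
                       i_current_price REAL);
    -- Hypothetical data: average price is 20 for books and 120 for toys.
    INSERT INTO item VALUES (1, 'books', 10.0), (2, 'books', 30.0),
                            (3, 'toys', 100.0), (4, 'toys', 110.0),
                            (5, 'toys', 150.0);
""")

# Correlated: the average is computed per category of the outer row.
CORRELATED_SQL = """
    SELECT i.i_item_sk
    FROM item i
    WHERE i.i_current_price > 1.2 * (SELECT avg(j.i_current_price)
                                     FROM item j
                                     WHERE j.i_category = i.i_category)
    ORDER BY i.i_item_sk
"""

# Uncorrelated: one global average (80 here) is used for every outer row.
UNCORRELATED_SQL = """
    SELECT i.i_item_sk
    FROM item i
    WHERE i.i_current_price > 1.2 * (SELECT avg(j.i_current_price)
                                     FROM item j)
    ORDER BY i.i_item_sk
"""

correlated = [row[0] for row in conn.execute(CORRELATED_SQL)]
uncorrelated = [row[0] for row in conn.execute(UNCORRELATED_SQL)]
print(correlated)    # items priced above 1.2 x their own category's average
print(uncorrelated)  # items priced above 1.2 x the overall average
```

With this data the two queries return different item sets, so dropping the condition makes the query faster but also makes it answer a different question.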
If you want a fast, performant query engine on AWS, I can recommend Starburst Presto (disclaimer: I am from Starburst). See https://www.concurrencylabs.com/blog/starburst-presto-vs-aws-redshift/ for a related comparison (note: this is not a comparison with Athena).
If it doesn't have to be that fast, you can use PrestoSQL on EMR (note that the "PrestoSQL" and "Presto" components on EMR are not the same thing).

Determining what index to create given a query?

Given a SQL query:
SELECT *
FROM Database..Pizza pizza
JOIN Database..Toppings toppings ON pizza.ToppingId = toppings.Id
WHERE toppings.Name LIKE '%Mushroom%' AND
toppings.GlutenFree = 0 AND
toppings.ExtraFee = 1.25 AND
pizza.Location = 'Minneapolis, MN'
How do you determine which index to create to improve the performance of the query? (Assume every value to the right of an equals sign is calculated at runtime.)
Is there a built-in SQL command that suggests the proper index?
To me, it gets confusing when there are multiple JOINs that use fields from both tables.

For this query:
SELECT *
FROM Database..Pizza p JOIN
Database..Toppings t
ON p.ToppingId = t.Id
WHERE t.Name LIKE '%Mushroom%' AND
t.GlutenFree = 0 AND
t.ExtraFee = 1.25 AND
p.Location = 'Minneapolis, MN';
You basically have two options for indexes:
Pizza(location, ToppingId) and Toppings(id)
or:
Toppings(GlutenFree, ExtraFee, Name, id) and Pizza(ToppingId, location)
Which works better depends on how selective the different conditions are in the WHERE clause.
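One way to compare candidate indexes is to create them on a copy of the data and look at the plan the engine reports. Here is a minimal sketch of that workflow using Python's built-in sqlite3 module and its EXPLAIN QUERY PLAN statement. This is an assumption for illustration: the question targets SQL Server, which has its own execution-plan tooling, and the table layout, sample rows, and index name below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- INTEGER PRIMARY KEY plays the role of the Toppings(id) index.
    CREATE TABLE toppings (id INTEGER PRIMARY KEY, name TEXT,
                           gluten_free INTEGER, extra_fee REAL);
    CREATE TABLE pizza (id INTEGER PRIMARY KEY, topping_id INTEGER, location TEXT);
    INSERT INTO toppings VALUES (1, 'Mushroom', 0, 1.25), (2, 'Ham', 0, 1.25);
    INSERT INTO pizza VALUES (1, 1, 'Minneapolis, MN'), (2, 2, 'Minneapolis, MN'),
                             (3, 1, 'St. Paul, MN');
""")

QUERY = """
    SELECT p.id, t.name
    FROM pizza p JOIN toppings t ON p.topping_id = t.id
    WHERE t.name LIKE '%Mushroom%' AND t.gluten_free = 0
      AND t.extra_fee = 1.25 AND p.location = 'Minneapolis, MN'
"""


def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(QUERY)

# First option from the answer: Pizza(location, ToppingId).
conn.execute("CREATE INDEX idx_pizza_location_topping ON pizza (location, topping_id)")
plan_after = plan(QUERY)

rows = conn.execute(QUERY).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
print("rows:  ", rows)
```

The plan text differs between engines and versions, so the useful signal is whether the new index appears in the plan at all and whether the measured run time on realistic data improves.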

Increasing speed of SQL update in PostgreSQL

I'm running the two queries below on my database, and I'm trying to figure out how to make them faster.
The first query takes 208796.8 ms. The second one takes 611654.9 ms. I'm not sure there is a way to make them faster. I need these updates to be in the same transaction, so I'm also not sure whether updating in batches of n records would be faster. Any ideas are welcome!
UPDATE ticket_memberships AS my_table
SET ticket_id = foreign_table.id
FROM tickets AS foreign_table
WHERE my_table.agency_id = 2
AND foreign_table.agency_id = 2
AND my_table.ticket_id IS NOT NULL
AND my_table.ticket_id = foreign_table.old_id;
UPDATE ticket_memberships AS my_table
SET person_contact_id = foreign_table.id
FROM person_contacts AS foreign_table
WHERE my_table.agency_id = 2
AND foreign_table.agency_id = 2
AND my_table.person_contact_id IS NOT NULL
AND my_table.person_contact_id = foreign_table.old_id;

Getting a value from a table and using it in the same query

SELECT (h.horario), h.codigo
FROM horarios as h
JOIN horario_turma as h_t
ON(h.codigo != h_t.cd_horario)
WHERE h_t.cd_turma = 'HTJ009'
AND h_t.cd_dia = 2
AND h.cd_turno = 1
I'm trying to figure out whether it is possible to get the h.cd_turno value from another table and use it in the same query, because this value is going to vary. As it stands, I'd have to get this value from one query, pass it to PHP, and then run another query with it. Is there a way to do that in a single query?
There's a table called turmas(codigo, cd_turno). I'll have the codigo value, in this case HTJ009, and I'd like to select the corresponding cd_turno value.
Query used to get the value:
SELECT cd_turno FROM turmas WHERE codigo='HTJ009'

You can use a subquery, like so:
SELECT (h.horario), h.codigo
FROM horarios as h
JOIN horario_turma as h_t
ON(h.codigo != h_t.cd_horario)
WHERE h_t.cd_turma = 'HTJ009'
AND h_t.cd_dia = 2
AND h.cd_turno = (SELECT cd_turno FROM turmas WHERE codigo='HTJ009')
In this case, remember that the subquery must return only one result; otherwise you'll encounter an error. If you do see such an error, you may have to tweak the subquery to ensure only one result is returned.
See the Postgres subquery documentation for details.
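The scalar-subquery idea can be tried in isolation. This is a minimal sketch using Python's built-in sqlite3 module instead of PHP and Postgres, with hypothetical sample rows and a simplified query that drops the join on horario_turma. Unlike Postgres, SQLite does not raise an error when a scalar subquery returns several rows, so the sketch does not reproduce that failure mode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE turmas (codigo TEXT PRIMARY KEY, cd_turno INTEGER);
    CREATE TABLE horarios (codigo INTEGER PRIMARY KEY, horario TEXT,
                           cd_turno INTEGER);
    -- Hypothetical data: class HTJ009 is in shift 1, HTJ010 in shift 2.
    INSERT INTO turmas VALUES ('HTJ009', 1), ('HTJ010', 2);
    INSERT INTO horarios VALUES (1, '08:00', 1), (2, '09:00', 1), (3, '19:00', 2);
""")

# The shift (cd_turno) is looked up inside the query, so the application
# passes only the class code and needs no separate round trip.
HORARIOS_SQL = """
    SELECT h.codigo
    FROM horarios AS h
    WHERE h.cd_turno = (SELECT cd_turno FROM turmas WHERE codigo = :turma)
    ORDER BY h.codigo
"""


def horarios_for(turma):
    """Return the codigo of every horario in the given class's shift."""
    rows = conn.execute(HORARIOS_SQL, {"turma": turma}).fetchall()
    return [row[0] for row in rows]


print(horarios_for("HTJ009"))
print(horarios_for("HTJ010"))
print(horarios_for("UNKNOWN"))  # no such class: the subquery yields NULL
```

Because codigo is the primary key of turmas in this sketch, the subquery can return at most one row, which is the property the answer asks for.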

SELECT (h.horario), h.codigo
FROM horarios as h
JOIN horario_turma as h_t
ON(h.codigo = h_t.cd_horario)
WHERE h_t.cd_turma = 'HTJ009'
AND h_t.cd_dia = 2
AND h.cd_turno = 1 and h_t.cd_horario is null
