Postgres SQL - performance issue with boolean field

Postgres SQL - performance issue with boolean field - sql

In my table plin_korisnik, I have field active which is defined as boolean type.
I'm trying to execute this query to fetch data from that table and two other tables:
SELECT
pk.omm AS omm,
pk.br_plin AS br_plin,
pk.naziv AS naziv,
pk.ulica||' '||pk.kbr AS adresa,
pk.pu||' - '||pk.naziv_pu AS mjesto,
po.datum AS datum,
CASE WHEN po.stanje >= 999999 THEN NULL ELSE po.stanje END AS stanje,
po.napomena AS napomena,
po.plin_postar AS laus,
pp.ime||' '||pp.prezime AS postar
FROM plin_korisnik pk
INNER JOIN
plin_ocitanje po ON pk.omm = po.omm
INNER JOIN plin_postar pp ON pp.laus = po.plin_postar
WHERE po.datum>='2017-01-26'
AND po.datum<='2017-01-26'
AND pk.tip='p'
AND pk.active = TRUE
ORDER BY po.datum, pk.naziv
but query takes to much time (like forever; I interrupted it after half an hour), but when I remove pk.active = TRUE test from WHERE clause, then query executes with expected speed. I had try to cast boolean type to integer, but problem remains.
I'd appreciate it if someone could explain how to use a boolean field in this and similar queries. active field is not indexed, maybe it should be, please help.
EDIT:
After few hours of thinking I came out with this solution which use WITH clause:
WITH pk AS (
SELECT * FROM plin_korisnik WHERE active AND tip='p'
)
SELECT
pk.omm AS omm,
pk.br_plin AS br_plin,
pk.naziv AS naziv,
pk.ulica||' '||pk.kbr AS adresa,
pk.pu||' - '||pk.naziv_pu AS mjesto,
po.datum AS datum,
CASE WHEN po.stanje >= 999999 THEN NULL ELSE po.stanje END AS stanje,
po.napomena AS napomena,
po.plin_postar AS laus,
pp.ime||' '||pp.prezime AS postar
FROM pk
INNER JOIN
plin_ocitanje po ON pk.omm = po.omm
INNER JOIN plin_postar pp ON pp.laus = po.plin_postar
WHERE po.datum>='2017-01-26'
AND po.datum<='2017-01-26'
ORDER BY po.datum, pk.naziv;

I think you may make it faster if you change the order in the where clause, start with the initial table.
An index on pk.active and on po.datum will absolutely help.
You may want to consider putting the po.datum in the inner join, instead of the where.
po.datum>='2017-01-26'
AND po.datum<='2017-01-26'
AND pk.tip='p'
AND pk.active = TRUE
Will be:
pk.active = TRUE
AND pk.tip='p'
AND po.datum>='2017-01-26'
AND po.datum<='2017-01-26'

Related

If transaction within date range, then return customer name (and not all the transactions!)

This code is taking a significant amount of time to run. It's returning every single transaction within the date range but I just need to know if the customer has had at least one transaction, then include the CustomerID, CustomerName, Type, Sign, ReportingName.
I think I need to GROUP BY 'CustomerID' but again only if there was a transaction within the date range. And of course, I'm sure there is an optimal way to execute the below TSQL because it's quite slow at present.
Thanks in advance for any help!
SELECT [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))

Check your indexes on fragmentation, to speed up your query. And make sure you have indexes.
If you just need one result, just TOP 1
SELECT TOP 1 [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))

If you only need to check for the existence of a row, and not actually get any data from it then use EXISTS() rather than INNER JOIN, e.g.
SELECT vpr.[RelatedNameId] AS CustomerID
,vpr.[RelatedName] AS CustomerName
,tt.[ParticluarType] AS Type
,prd.[Sign]
,prd.ReportingName
,tr.[EffectiveDate] AS [Date]
FROM [AFGPurchase].[IvL].[Account] AS acc
INNER JOIN [AFGPurchase].[IvL].[Position] AS pos ON acc.[AccountId] = pos.[AccountId]
INNER JOIN [AFGPurchase].[IvL].[Product] AS prd ON pos.[ProductID] = prd.[ProductId]
INNER JOIN [ABC].[dbo].[vwPrimary] AS vpr ON acc.[ReportingEntityId] = vpr.[RelatedNameId]
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] AS tt ON acc.[TaxTreatmentId] = tt.[TaxTreatmentId]
WHERE tt.[RegistrationType] LIKE 'NON%'
AND prd.[Sign]='XYZ2'
AND pos.[Quantity]<>0
AND EXISTS
( SELECT 1
FROM [AFGPurchase].[IvL].[Transaction] AS tr
WHERE tr.[PositionId] = pos.[PositionId]
AND tr.[EffectiveDate] BETWEEN '2021-12-31' AND '2022-12-31'
);
N.B. I have added in table aliases and removed all the unnecessary parentheses for readability - you may disagree that it is more readable, but I would expect that most people would agree
This may not offer any performance benefits over simply grouping by the columns you are selecting and keeping your joins as they are - SQL is after all a declarative language where you tell the engine what you want, not how to get it. So you may find that the two plans are the same because you are requesting the same result. Using EXISTS does have the advance of being more semantically tied to what you are trying to do though, so gives the optimiser the best chance of getting to the right plan. If you are still having performance issues, then you may need to inspect the execution plan, and see if it suggests any indexes.
Finally, if you are really still using SQL Server 2008 then you really need to start thinking about your upgrade path. It has been completely unsupported for over 3 years now.

Not Exists clause -Query

I am using a NOT EXSITS clause in my query and wanted to make sure it was working correctly since I was getting lesser rows than expected.
SELECT DISTINCT offer.courier_uuid,
offer.region_uuid,
offer.offer_time_local,
Cast(scores.acceptance_rate AS DECIMAL(5, 3)) AS acceptance_rate
FROM integrated_delivery.trip_offer_fact offer
JOIN integrated_product.driver_score_v2 scores ON offer.courier_uuid = scores.courier_id
AND offer.region_uuid = scores.region_id
AND offer.business_day BETWEEN date '2019-04-04' AND date '2019-04-07'
AND scores.extract_dt = 20190331
AND NOT EXISTS
(SELECT NULL
FROM source_cassandra_courier_scheduling.assigned_block_by_id_v2 sched
JOIN source_cassandra_delivery.region r ON sched.region_id = r.id
WHERE offer.courier_uuid = sched.courier_id
AND offer.offer_time_local >= date_parse(date_format(AT_TIMEZONE("start",r.time_zone),'%Y-%m-%d %H:%i:%s'),'%Y-%m-%d %H:%i:%s')
AND offer.offer_time_local <= date_parse(date_format(AT_TIMEZONE("end",r.time_zone),'%Y-%m-%d %H:%i:%s'),'%Y-%m-%d %H:%i:%s')
AND element_at(sched.state,-1) = 'ASSIGNED')
ORDER BY 3
Is there anything wrong with my not exists clause? I am only asking since I am getting back lesser rows than expected. The not exists caluse contains a time conversion but i dont think that would affect anything.
I am trying to get all possible ids and their offer times that do NOT EXIST in the scheduled shifts table. I wanted confirm if the way I have the NOT EXISTS clause is correct or if there is something else I would need that would correctly pull all records that exist or not exist in that shed table?

The "where" condition worked not as expected ("or" issue)

I have a problem to join thoses 4 tables
Model of my database
I want to count the number of reservations with different sorts (user [mrbs_users.id], room [mrbs_room.room_id], area [mrbs_area.area_id]).
Howewer when I execute this query (for the user (id=1) )
SELECT count(*)
FROM mrbs_users JOIN mrbs_entry ON mrbs_users.name=mrbs_entry.create_by
JOIN mrbs_room ON mrbs_entry.room_id = mrbs_room.id
JOIN mrbs_area ON mrbs_room.area_id = mrbs_area.id
WHERE mrbs_entry.start_time BETWEEN "145811700" and "1463985000"
or
mrbs_entry.end_time BETWEEN "1458120600" and "1463992200" and mrbs_users.id = 1
The result is the total number of reservations of every user, not just the user who has the id = 1.
So if anyone could help me.. Thanks in advance.

Use parentheses in the where clause whenever you have more than one condition. Your where is parsed as:
WHERE (mrbs_entry.start_time BETWEEN "145811700" and "1463985000" ) or
(mrbs_entry.end_time BETWEEN "1458120600" and "1463992200" and
mrbs_users.id = 1
)
Presumably, you intend:
WHERE (mrbs_entry.start_time BETWEEN 145811700 and 1463985000 or
mrbs_entry.end_time BETWEEN 1458120600 and 1463992200
) and
mrbs_users.id = 1
Also, I removed the quotes around the string constants. It is bad practice to mix data types, and in some databases, the conversion between types can make the query less efficient.

The problem you've faced caused by the incorrect condition WHERE.
So, should be:
WHERE (mrbs_entry.start_time BETWEEN 145811700 AND 1463985000 )
OR
(mrbs_entry.end_time BETWEEN 1458120600 AND 1463992200 AND mrbs_users.id = 1)
Moreover, when you use only INNER JOIN (JOIN) then it be better to avoid WHERE clause, because the ON clause is executed before the WHERE clause, so criteria there would perform faster.
Your query in this case should be like this:
SELECT COUNT(*)
FROM mrbs_users
JOIN mrbs_entry ON mrbs_users.name=mrbs_entry.create_by
JOIN mrbs_room ON mrbs_entry.room_id = mrbs_room.id
AND
(mrbs_entry.start_time BETWEEN 145811700 AND 1463985000
OR ( mrbs_entry.end_time BETWEEN 1458120600 AND 1463992200 AND mrbs_users.id = 1)
)
JOIN mrbs_area ON mrbs_room.area_id = mrbs_area.id

Subselect a Summed col in Oracle

this is an attempted fix to a crystal reports use of 2 sub reports!
I have a query that joins 3 tables, and I wanted to use a pair of sub selects that bring in the same new table.
Here is the first of the two columns in script:
SELECT ea."LOC_ID", lo."DESCR", ea."PEGSTRIP", ea."ENTITY_OWNER"
, ea."PCT_OWNERSHIP", ea."BEG_BAL", ea."ADDITIONS", ea."DISPOSITIONS"
, ea."EXPLANATION", ea."END_BAL", ea."NUM_SHARES", ea."PAR_VALUE"
, ag."DESCR", ea."EOY", ea."FAKEPEGSTRIP",
(select sum(htb.END_FNC_CUR_US_GAAP)
from EQUITY_ACCOUNTS ea , HYPERION_TRIAL_BALANCE htb
where
htb.PEGSTRIP = ea.PEGSTRIP and
htb.PRD_NBR = 0 and
htb.LOC_ID = ea.LOC_ID and
htb.PRD_YY = ea.EOY
) firstHyp
FROM ("TAXPALL"."ACCOUNT_GROUPING" ag
INNER JOIN "TAXP"."EQUITY_ACCOUNTS" ea
ON (ag."ACCT_ID"=ea."PEGSTRIP") AND (ag."EOY"=ea."EOY"))
INNER JOIN "TAXP"."LOCATION" lo ON ea."LOC_ID"=lo."LOC_ID"
WHERE ea."EOY"=2009
ORDER BY ea."LOC_ID", ea."PEGSTRIP"
When this delivers the dataset the value of "firstHyp" fails to change by pegstrip value. It returns a single total for the join and fails to put the proper by value by pegstrip.
I thought that the where clause would have picked up the joins line by line.
I don't do Oracle syntax often so what am I missing here?
TIA

Your SQL is equivilent to the following:
SELECT ea."LOC_ID", lo."DESCR", ea."PEGSTRIP",
ea."ENTITY_OWNER" , ea."PCT_OWNERSHIP",
ea."BEG_BAL", ea."ADDITIONS", ea."DISPOSITIONS" ,
ea."EXPLANATION", ea."END_BAL", ea."NUM_SHARES",
ea."PAR_VALUE" , ag."DESCR", ea."EOY", ea."FAKEPEGSTRIP",
(select sum(htb.END_FNC_CUR_US_GAAP)
from EQUITY_ACCOUNTS iea
Join HYPERION_TRIAL_BALANCE htb
On htb.PEGSTRIP = iea.PEGSTRIP
and htb.LOC_ID = iea.LOC_ID
and htb.PRD_YY = iea.EOY
where htb.PRD_NBR = 0 ) firstHyp
FROM "TAXPALL"."ACCOUNT_GROUPING" ag
JOIN "TAXP"."EQUITY_ACCOUNTS" ea
ON ag."ACCT_ID"=ea."PEGSTRIP"
AND ag."EOY"=ea."EOY"
JOIN "TAXP"."LOCATION" lo
ON ea."LOC_ID"=lo."LOC_ID"
WHERE ea."EOY"=2009
ORDER BY ea."LOC_ID", ea."PEGSTRIP"
Notice that the subquery that generates firstHyp is not in any way dependant on the tables in the outer query... It is therefore not a Correllated SubQuery... meaning that the value it generates will NOT be different for each row in the outer query's resultset, it will be the same for every row. You need to somehow put something in the subquery that makes it dependant on the value of some row in the outer query so that it will become a correllated subquery and run over and over once for each outer row....
Also, you mention a pair of subselects, but I only see one. Where is the other one ?

Use:
SELECT ea.LOC_ID,
lo.DESCR,
ea.PEGSTRIP,
ea.ENTITY_OWNER,
ea.PCT_OWNERSHIP,
ea.BEG_BAL,
ea.ADDITIONS,
ea.DISPOSITIONS,
ea.EXPLANATION,
ea.END_BAL,
ea.NUM_SHARES,
ea.PAR_VALUE,
ag.DESCR,
ea.EOY,
ea.FAKEPEGSTRIP,
NVL(SUM(htb.END_FNC_CUR_US_GAAP), 0) AS firstHyp
FROM TAXPALL.ACCOUNT_GROUPING ag
JOIN TAXP.EQUITY_ACCOUNTS ea ON ea.PEGSTRIP = ag.ACCT_ID
AND ea.EOY = ag.EOY
AND ea.EOY = 2009
JOIN TAXP.LOCATION lo ON lo.LOC_ID = ea.LOC_ID
LEFT JOIN HYPERION_TRIAL_BALANCE htb ON htb.PEGSTRIP = ea.PEGSTRIP
AND htb.LOC_ID = ea.LOC_ID
AND htb.PRD_YY = ea.EOY
AND htb.PRD_NBR = 0
GROUP BY ea.LOC_ID,
lo.DESCR,
ea.PEGSTRIP,
ea.ENTITY_OWNER,
ea.PCT_OWNERSHIP,
ea.BEG_BAL,
ea.ADDITIONS,
ea.DISPOSITIONS,
ea.EXPLANATION,
ea.END_BAL,
ea.NUM_SHARES,
ea.PAR_VALUE,
ag.DESCR,
ea.EOY,
ea.FAKEPEGSTRIP,
ORDER BY ea.LOC_ID, ea.PEGSTRIP
I agree with Charles Bretana's assessment that the original SELECT in the SELECT clause was not correlated, which is why the value never changed per row. But the sub SELECT used the EQUITY_ACCOUNTS table, which is the basis for the main query. So I removed the join, and incorporated the HYPERION_TRIAL_BALANCE table into the main query, using a LEFT JOIN. I wrapped the SUM in an NVL rather than COALESCE because I didn't catch what version of Oracle this is for.

MySQL to PostgreSQL: GROUP BY issues

So I decided to try out PostgreSQL instead of MySQL but I am having some slight conversion problems. This was a query of mine that samples data from four tables and spit them out all in on result.
I am at a loss of how to convey this in PostgreSQL and specifically in Django but I am leaving that for another quesiton so bonus points if you can Django-fy it but no worries if you just pure SQL it.
SELECT links.id, links.created, links.url, links.title, user.username, category.title, SUM(votes.karma_delta) AS karma, SUM(IF(votes.user_id = 1, votes.karma_delta, 0)) AS user_vote
FROM links
LEFT OUTER JOIN `users` `user` ON (`links`.`user_id`=`user`.`id`)
LEFT OUTER JOIN `categories` `category` ON (`links`.`category_id`=`category`.`id`)
LEFT OUTER JOIN `votes` `votes` ON (`votes`.`link_id`=`links`.`id`)
WHERE (links.id = votes.link_id)
GROUP BY votes.link_id
ORDER BY (SUM(votes.karma_delta) - 1) / POW((TIMESTAMPDIFF(HOUR, links.created, NOW()) + 2), 1.5) DESC
LIMIT 20
The IF in the select was where my first troubles began. Seems it's an IF true/false THEN stuff ELSE other stuff END IF yet I can't get the syntax right. I tried to use Navicat's SQL builder but it constantly wanted me to place everything I had selected into the GROUP BY and that I think it all kinds of wrong.
What I am looking for in summary is to make this MySQL query work in PostreSQL. Thank you.
Current Progress
Just want to thank everybody for their help. This is what I have so far:
SELECT links_link.id, links_link.created, links_link.url, links_link.title, links_category.title, SUM(links_vote.karma_delta) AS karma, SUM(CASE WHEN links_vote.user_id = 1 THEN links_vote.karma_delta ELSE 0 END) AS user_vote
FROM links_link
LEFT OUTER JOIN auth_user ON (links_link.user_id = auth_user.id)
LEFT OUTER JOIN links_category ON (links_link.category_id = links_category.id)
LEFT OUTER JOIN links_vote ON (links_vote.link_id = links_link.id)
WHERE (links_link.id = links_vote.link_id)
GROUP BY links_link.id, links_link.created, links_link.url, links_link.title, links_category.title
ORDER BY links_link.created DESC
LIMIT 20
I had to make some table name changes and I am still working on my ORDER BY so till then we're just gonna cop out. Thanks again!

Have a look at this link GROUP BY
When GROUP BY is present, it is not
valid for the SELECT list expressions
to refer to ungrouped columns except
within aggregate functions, since
there would be more than one possible
value to return for an ungrouped
column.
You need to include all the select columns in the group by that are not part of the aggregate functions.

A few things:
Drop the backticks
Use a CASE statement instead of IF() CASE WHEN votes.use_id = 1 THEN votes.karma_delta ELSE 0 END
Change your timestampdiff to DATE_TRUNC('hour', now()) - DATE_TRUNC('hour', links.created) (you will need to then count the number of hours in the resulting interval. It would be much easier to compare timestamps)
Fix your GROUP BY and ORDER BY

Try to replace the IF with a case;
SUM(CASE WHEN votes.user_id = 1 THEN votes.karma_delta ELSE 0 END)
You also have to explicitly name every column or calculated column you use in the GROUP BY clause.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Postgres SQL - performance issue with boolean field - sql

Related

If transaction within date range, then return customer name (and not all the transactions!)

Not Exists clause -Query

The "where" condition worked not as expected ("or" issue)

Subselect a Summed col in Oracle

MySQL to PostgreSQL: GROUP BY issues

Categories

Resources