How to JOIN on different CONCAT expressions? - sql

I am attempting to join tables based on a CONCAT expression containing different text criteria.
I have tried an inefficient method that takes far too long to run.
SELECT
sf.displayId,
-- tw.displayText,
CASE
WHEN tw.DisplayText IN ('N/A', 'NotApplicable', 'Not Applicable')
THEN 'Not Applicable'
ELSE REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(tw.DisplayText,'client','Client'),'approved','Approved'),'rejected','Rejected'),'review','Review'),'Requirement','Requirement'),'open','Open'),'submitted','Submitted'),'complete','Complete'),'incomplete','Incomplete'),'pending','Pending'),'resubmit','Resubmit'),'Awaiting review','Awaiting Review')
END AS AuditStatus
FROM Connect.Data.supplier_form sf
JOIN Connect.Data.translation tw
ON ((CONCAT('workflowStepName' , sf.workflowStatusId) = tw.translationField) OR (CONCAT('workflowStepMessage' , sf.workflowStatusId) = tw.translationField))
AND tw.language = 'en'
WHERE sf.deleted = 0
AND tw.displayText IN ('Awaiting Review','Awaiting review')
ORDER BY sf.displayId
OFFSET 0 ROWS
FETCH NEXT 100 ROWS ONLY;
I want to JOIN Connect.Data.supplier_form sf based on the CONCAT expression in a far more efficient way.

I'm not sure how good the SQL optimizer is in SQL Server, so you may try to flip the order of the join condition to make it easier to use an index. I would do two things:
I would change the JOIN condition to:
JOIN Connect.Data.translation tw
ON tw.language = 'en'
AND tw.translationField in (
CONCAT('workflowStepName' , sf.workflowStatusId),
CONCAT('workflowStepMessage' , sf.workflowStatusId)
)
And I would add the index:
create index ix1 on Connect.Data.translation(language, translationField);
Give it a try and let us know if it improves the performance.
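To convince yourself that the IN form matches the same rows as the original OR form, here is a minimal sketch. It is not SQL Server: it uses Python's built-in sqlite3 module, so `||` stands in for CONCAT, and the sample rows are made up purely for illustration. It says nothing about how SQL Server will plan the query; it only checks that the two join conditions are equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier_form (displayId INTEGER, workflowStatusId INTEGER, deleted INTEGER);
    CREATE TABLE translation (translationField TEXT, language TEXT, displayText TEXT);
    -- Index from the answer: the constant filter column first, then the join key.
    CREATE INDEX ix1 ON translation (language, translationField);

    -- Hypothetical sample data.
    INSERT INTO supplier_form VALUES (1, 10, 0), (2, 20, 0), (3, 10, 1);
    INSERT INTO translation VALUES
        ('workflowStepName10',    'en', 'Awaiting review'),
        ('workflowStepMessage20', 'en', 'Awaiting Review'),
        ('workflowStepName20',    'de', 'Wartet auf Pruefung'),
        ('unrelated10',           'en', 'Awaiting review');
""")

# Original join condition: two concatenations combined with OR.
or_form = """
    SELECT sf.displayId, tw.displayText
    FROM supplier_form sf
    JOIN translation tw
      ON (('workflowStepName' || sf.workflowStatusId = tw.translationField)
       OR ('workflowStepMessage' || sf.workflowStatusId = tw.translationField))
     AND tw.language = 'en'
    WHERE sf.deleted = 0
    ORDER BY sf.displayId
"""

# Suggested join condition: the indexed column on the left, candidates in an IN list.
in_form = """
    SELECT sf.displayId, tw.displayText
    FROM supplier_form sf
    JOIN translation tw
      ON tw.language = 'en'
     AND tw.translationField IN (
           'workflowStepName' || sf.workflowStatusId,
           'workflowStepMessage' || sf.workflowStatusId
         )
    WHERE sf.deleted = 0
    ORDER BY sf.displayId
"""

rows_or = conn.execute(or_form).fetchall()
rows_in = conn.execute(in_form).fetchall()
print(rows_in)
```

Both queries return the same two rows here, so any speed-up on the real data would come from the index and the plan, not from a change in results.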

Related

IF / Case statement in SQL

I have a column that contains 0 or 1. I would like to set up the following:
If it is 0, use the Region_table (which holds regions like EMEA, AP, LA, with finished goods only), and if it is 1, use the Plant_table (which holds plants with non-finished goods).
I tried to write it as two separate CASE expressions, but that does not work:
,Case
when [FG_NFG_Selektion] = '0' Then 'AC_region'
End as 'AC_region'
,Case
when [FG_NFG_Selektion] = '1' Then 'AC_plant'
End as 'AC_plant'
I'm not 100% clear on what you're looking for, but if you want to get data from different tables based on the value in the [FG_NFG_Selektion] field, you can do something like this:
SELECT
CASE
WHEN [FG_NFG_Selektion] = '0' THEN r.some_col -- If 0, use value from "region" table
WHEN [FG_NFG_Selektion] = '1' THEN p.some_col -- If 1, use value from "plant" table
END AS new_field
FROM MyTable t
LEFT JOIN AC_region r ON t.pk_col = r.pk_col -- get data from "AC_region" table
LEFT JOIN AC_plant p ON t.pk_col = p.pk_col -- get data from "AC_plant" table
;
If [FG_NFG_Selektion] is a numeric field, then you should remove the single quotes: [FG_NFG_Selektion] = 0.
I would strongly recommend putting the conditions in the ON clauses:
SELECT COALESCE(r.some_col, p.some_col) as some_col
FROM t LEFT JOIN
AC_region r
ON t.pk_col = r.pk_col AND
t.FG_NFG_Selektion = '0' LEFT JOIN
AC_plant p
ON t.pk_col = p.pk_col AND
t.FG_NFG_Selektion = '1';
Why do I recommend this? First, this works correctly if there are multiple matches in either table. That is probably not an issue in this case, but it could be in others. You don't want to figure out where extra rows come from.
Second, putting the conditions in the ON clause allows the optimizer/execution engine to take advantage of them. For instance, it is more likely to use FG_NFG_Selektion in an index.
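The point about multiple matches is easier to see with data. The sketch below uses Python's built-in sqlite3 module rather than the asker's database, and the table contents (including the duplicate plant row) are invented for the demonstration. It compares the CASE version with unconditional joins against the version with the flag in the ON clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (pk_col INTEGER, FG_NFG_Selektion TEXT);
    CREATE TABLE AC_region (pk_col INTEGER, some_col TEXT);
    CREATE TABLE AC_plant (pk_col INTEGER, some_col TEXT);

    -- Hypothetical data: row 1 is finished goods ('0'), row 2 is not ('1').
    INSERT INTO t VALUES (1, '0'), (2, '1');
    INSERT INTO AC_region VALUES (1, 'EMEA'), (2, 'AP');
    -- pk_col 1 has two plant rows, which row 1 should never care about.
    INSERT INTO AC_plant VALUES (1, 'Plant A'), (1, 'Plant A2'), (2, 'Plant B');
""")

# CASE in the SELECT list: both joins always run, so the two plant rows
# for pk_col 1 duplicate the region result.
case_rows = conn.execute("""
    SELECT t.pk_col,
           CASE WHEN t.FG_NFG_Selektion = '0' THEN r.some_col
                WHEN t.FG_NFG_Selektion = '1' THEN p.some_col
           END AS new_field
    FROM t
    LEFT JOIN AC_region r ON t.pk_col = r.pk_col
    LEFT JOIN AC_plant  p ON t.pk_col = p.pk_col
    ORDER BY t.pk_col
""").fetchall()

# Flag in the ON clauses: each row only joins to the table it actually uses.
on_rows = conn.execute("""
    SELECT t.pk_col, COALESCE(r.some_col, p.some_col) AS some_col
    FROM t
    LEFT JOIN AC_region r ON t.pk_col = r.pk_col AND t.FG_NFG_Selektion = '0'
    LEFT JOIN AC_plant  p ON t.pk_col = p.pk_col AND t.FG_NFG_Selektion = '1'
    ORDER BY t.pk_col
""").fetchall()

print(case_rows)
print(on_rows)
```

With this data the CASE version returns the EMEA row twice, while the ON-clause version returns one row per source row.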

The SELECT statement below takes a long time to run

This SELECT statement takes a long time to run. After investigating, I found that the problem is in the subquery or stored procedure. I would appreciate your help.
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
AND COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
AND COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Well there are a few issues with your SELECT statement that you should address:
First let's look at this condition:
COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
First you select the DISTINCT cheque numbers with a non-delivered status, and then you say you don't want them. Rather than saying "I don't want the non-delivered ones", it is much more readable to say "I want the delivered ones". This is not really a performance issue, but it would make your SELECT easier to read and understand.
Second let's look at your second cheque condition:
COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Here you want to exclude all cheques that have an entry in Q_COKE_AP_CHECKS_DELIVERY_ST_V. This makes your first condition (the one with DISTINCT) redundant, as any cheque number it brings back would also be rejected by this second condition. I don't know if the Oracle SQL engine is clever enough to work out this redundancy, but it could contribute to the slowness, as SELECT DISTINCT can take longer to run.
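In addition, if you don't have them already, I would recommend adding the following indexes (if these objects are views, as the _V suffix suggests, create the indexes on the underlying tables instead):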
CREATE INDEX index_1 ON q_coke_ap_checks_sign_status_v(coke_chq_number, coke_pay_supplier);
CREATE INDEX index_2 ON q_coke_ap_checks_sign_status_v(plan_id, coke_signature__a, coke_signature__b, coke_audit);
CREATE INDEX index_3 ON q_coke_ap_checks_delivery_st_v(coke_chq_number_deliver);
I named the indexes index_1, index_2 and index_3 for readability; that is obviously not a good naming convention.
With these in place, your SELECT should return your data with acceptable performance. Of course, it all depends on the size and distribution of your data, which is hard to judge without a specific data analysis.
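The redundancy argument can be checked on a small data set. This is only a sketch: it uses Python's built-in sqlite3 module instead of Oracle, the shortened table names sign_status and delivery are stand-ins for the two apps objects, and the rows are made up. It runs the query once with both NOT IN conditions and once with only the second one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-ins for apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V and apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V.
    CREATE TABLE sign_status (plan_id INTEGER, COKE_SIGNATURE__A TEXT, COKE_SIGNATURE__B TEXT,
                              COKE_AUDIT TEXT, COKE_CHQ_NUMBER INTEGER, COKE_PAY_SUPPLIER TEXT);
    CREATE TABLE delivery (COKE_CHQ_NUMBER_DELIVER INTEGER, COKE_CHQ_NUMBER_DELIVER_STATUS TEXT);

    -- Hypothetical sample data.
    INSERT INTO sign_status VALUES
        (40192, 'YES', 'YES', 'YES', 101, 'S1'),
        (40192, 'YES', 'YES', 'YES', 102, 'S2'),
        (40192, 'YES', 'YES', 'YES', 103, 'S3'),
        (40192, 'YES', 'NO',  'YES', 104, 'S4');
    INSERT INTO delivery VALUES (102, 'Delivered'), (103, 'Pending');
""")

base = """
    SELECT DISTINCT COKE_CHQ_NUMBER, COKE_PAY_SUPPLIER
    FROM sign_status
    WHERE plan_id = 40192
      AND COKE_SIGNATURE__A = 'YES'
      AND COKE_SIGNATURE__B = 'YES'
      AND COKE_AUDIT = 'YES'
"""
# First condition from the question: exclude cheques with a non-delivered entry.
first_not_in = """
      AND COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
                                  FROM delivery
                                  WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
"""
# Second condition: exclude cheques with any delivery entry at all.
second_not_in = """
      AND COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER FROM delivery)
"""

both = sorted(conn.execute(base + first_not_in + second_not_in).fetchall())
only_second = sorted(conn.execute(base + second_not_in).fetchall())
print(both, only_second)
```

Both variants return the same rows on this data, which is what you would expect, since every cheque rejected by the first subquery also appears in the second.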
Looking at your code, it seems you have a redundant WHERE condition: the second NOT IN implies the first, so you can drop the first one.
You could also turn your NOT IN clause into a MINUS: take the same query and subtract the rows that INNER JOIN to your NOT IN subquery.
Finally, make sure you have a proper composite index on table
Q_COKE_AP_CHECKS_SIGN_STATUS_V
cols (plan_id, COKE_SIGNATURE__A, COKE_SIGNATURE__B, COKE_AUDIT, COKE_CHQ_NUMBER, COKE_PAY_SUPPLIER)
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
MINUS
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
INNER JOIN (
SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
) T ON T.COKE_CHQ_NUMBER_DELIVER = apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V.COKE_CHQ_NUMBER
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
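As a rough check that the MINUS form returns the same rows as the NOT IN form, here is a sketch using Python's built-in sqlite3 module. SQLite spells Oracle's MINUS as EXCEPT, the table names sign_status and delivery are shortened stand-ins, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-ins for the two apps objects in the question.
    CREATE TABLE sign_status (plan_id INTEGER, COKE_SIGNATURE__A TEXT, COKE_SIGNATURE__B TEXT,
                              COKE_AUDIT TEXT, COKE_CHQ_NUMBER INTEGER, COKE_PAY_SUPPLIER TEXT);
    CREATE TABLE delivery (COKE_CHQ_NUMBER_DELIVER INTEGER);

    -- Hypothetical sample data: only cheque 102 has a delivery entry.
    INSERT INTO sign_status VALUES
        (40192, 'YES', 'YES', 'YES', 101, 'S1'),
        (40192, 'YES', 'YES', 'YES', 102, 'S2'),
        (40192, 'YES', 'YES', 'YES', 103, 'S3');
    INSERT INTO delivery VALUES (102);
""")

# Shared query shape; {join} is filled in for the subtracted branch only.
filtered = """
    SELECT DISTINCT s.COKE_CHQ_NUMBER, s.COKE_PAY_SUPPLIER
    FROM sign_status s
    {join}
    WHERE s.plan_id = 40192
      AND s.COKE_SIGNATURE__A = 'YES'
      AND s.COKE_SIGNATURE__B = 'YES'
      AND s.COKE_AUDIT = 'YES'
"""

# All qualifying cheques, minus the ones that join to a delivery entry.
minus_query = (
    filtered.format(join="")
    + " EXCEPT "
    + filtered.format(
        join="INNER JOIN delivery T ON T.COKE_CHQ_NUMBER_DELIVER = s.COKE_CHQ_NUMBER"
    )
)
not_in_query = (
    filtered.format(join="")
    + " AND s.COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER FROM delivery)"
)

minus_rows = sorted(conn.execute(minus_query).fetchall())
not_in_rows = sorted(conn.execute(not_in_query).fetchall())
print(minus_rows)
```

Whether MINUS is actually faster than NOT IN on the real Oracle data is something only the execution plan can tell you.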

Blob in query with joins to create a view

I am struggling to include a BLOB column in a view that has multiple joins.
The BLOB column is MAINAPL.GRAPHICS.GRAF, highlighted below in my query.
I get the following error from my Oracle database:
ORA-00932, and it points to exactly that column.
Is there a way to include the BLOB column anyway? I need it as a normal column in this complex view.
My main query is this:
CREATE OR REPLACE FORCE VIEW SECAPL.VIEW_DATAFEED2 AS
SELECT
MIN(CASE WHEN MAINAPL.ARTCOPY.SPRID = 'EN' AND MAINAPL.ARTCOPY.ARTCOPYNUM = 1 THEN MAINAPL.ARTCOPY.ART-COPY-1 ELSE NULL END ) COPY1-EN,
MAX(CASE WHEN MAINAPL.CONT.TONGID = 'EN' AND MAINAPL.CLASS1.CLASSTFRLE1 = '1' THEN MAINAPL.CONT.COPY-ONLY END) AS TYPE-EN,
MAINAPL.ARTICLEGRAPHICS.GRAPHID,
**MAINAPL.GRAPHICS.GRAF**,
MAINAPL.GRAPHICS.GRAFFORMAT
FROM
MAINAPL.ARTICLE
LEFT JOIN MAINAPL.ARTCOPY ON MAINAPL.ARTICLE.ARTID = MAINAPL.ARTCOPY.ARTID
INNER JOIN MAINAPL.ARTICLEGRAPHICS ON MAINAPL.ARTICLE.ARTID = MAINAPL.ARTICLEGRAPHICS.ARTID
INNER JOIN MAINAPL.GRAPHICS ON MAINAPL.GRAPHICS.GRAPHICSID = MAINAPL.ARTICLEGRAPHICS.GRAPHICSID
GROUP BY MAINAPL.ARTICLE.ARTID,
MAINAPL.ARTICLEGRAPHICS.GRAPHID,
**MAINAPL.GRAPHICS.GRAF**,
MAINAPL.GRAPHICS.GRAFFORMAT
I think the problem is that you are trying to group by the blob column. You can work around that easily by using two selects. The inner select is basically what you have, but without the blob (and possibly without other columns that can be fetched later); it still returns the graphics id. The outer select joins the inner select with the MAINAPL.GRAPHICS table on that id and returns everything, including the blob. Hope that makes sense.
EDIT:
CREATE OR REPLACE FORCE VIEW SECAPL.VIEW_DATAFEED2 AS
select sub.copy1-en, sub.type-en, sub.graphicsid,
graph.graf, graph.grafformat
from (SELECT
MIN(CASE WHEN MAINAPL.ARTCOPY.SPRID = 'EN' AND MAINAPL.ARTCOPY.ARTCOPYNUM = 1 THEN MAINAPL.ARTCOPY.ART-COPY-1 ELSE NULL END ) COPY1-EN,
MAX(CASE WHEN MAINAPL.CONT.TONGID = 'EN' AND MAINAPL.CLASS1.CLASSTFRLE1 = '1' THEN MAINAPL.CONT.COPY-ONLY END) AS TYPE-EN,
MAINAPL.ARTICLEGRAPHICS.GRAPHICSID
FROM MAINAPL.ARTICLE
LEFT JOIN MAINAPL.ARTCOPY ON MAINAPL.ARTICLE.ARTID = MAINAPL.ARTCOPY.ARTID
INNER JOIN MAINAPL.ARTICLEGRAPHICS ON MAINAPL.ARTICLE.ARTID = MAINAPL.ARTICLEGRAPHICS.ARTID
INNER JOIN MAINAPL.GRAPHICS ON MAINAPL.GRAPHICS.GRAPHICSID = MAINAPL.ARTICLEGRAPHICS.GRAPHICSID
GROUP BY MAINAPL.ARTICLEGRAPHICS.GRAPHICSID
) sub
join MAINAPL.GRAPHICS graph ON graph.GRAPHICSID = sub.GRAPHICSID
This may not work, but should illustrate the point. I rewrote the inner SQL a bit as you seemed to group by too much, I'm not sure if that was good or bad. I also corrected a possible mistake with the graphicsid column name.
Anyway, the point should be clear - find the records you need with an inner SQL that includes the id you need in order to fetch the blob in the outer SQL.
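The shape of that workaround, reduced to a toy schema, looks roughly like this. The sketch uses Python's built-in sqlite3 module with simplified, made-up table and column names (article, artcopy, articlegraphics, graphics). SQLite itself does not raise ORA-00932 when grouping by a BLOB, so this only illustrates the query structure, not the Oracle error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the tables in the question.
    CREATE TABLE article (artid INTEGER);
    CREATE TABLE artcopy (artid INTEGER, sprid TEXT, copy_text TEXT);
    CREATE TABLE articlegraphics (artid INTEGER, graphicsid INTEGER);
    CREATE TABLE graphics (graphicsid INTEGER, graf BLOB, grafformat TEXT);

    INSERT INTO article VALUES (1), (2);
    INSERT INTO artcopy VALUES (1, 'EN', 'Hello'), (1, 'DE', 'Hallo'), (2, 'EN', 'World');
    INSERT INTO articlegraphics VALUES (1, 100), (2, 200);
""")
conn.executemany(
    "INSERT INTO graphics VALUES (?, ?, ?)",
    [(100, b"\x89PNG", "png"), (200, b"GIF8", "gif")],
)

rows = conn.execute("""
    SELECT sub.copy1_en, sub.graphicsid, g.graf, g.grafformat
    FROM (
        -- Inner query: aggregate and group by the id only, no BLOB involved.
        SELECT MIN(CASE WHEN c.sprid = 'EN' THEN c.copy_text END) AS copy1_en,
               ag.graphicsid
        FROM article a
        LEFT JOIN artcopy c ON a.artid = c.artid
        INNER JOIN articlegraphics ag ON a.artid = ag.artid
        GROUP BY ag.graphicsid
    ) sub
    -- Outer query: fetch the BLOB by id after the grouping is done.
    JOIN graphics g ON g.graphicsid = sub.graphicsid
    ORDER BY sub.graphicsid
""").fetchall()
print(rows)
```

The BLOB never appears in a GROUP BY list; it is only looked up once per already-grouped id.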

Joining tables based on values from other tables

I have the following tables. I want to run a query, but I think my beginner T-SQL level won't help here. It is probably also a situation where I have a bad database design.
Basically I need to select all fields from tblPhotoGalleries. I also need to create a separate field named GalleryCategoryName.
The GalleryCategoryName field will be the pCatName in tblPhotoGalleryCats.
If pCatName in tblPhotoGalleryCats = '0', that means ConnectedNewsCatID is something other than 0. In that case,
GalleryCategoryName will be the CategoryName field from tblNewsCategories where CategoryID = ConnectedNewsCatID.
Use a left join on the news category table, and use a case expression to choose between the names:
select
g.pgID, g.gName,
GalleryCategoryName = case c.pCatName when '0' then n.CategoryName else c.pCatName end
from tblPhotoGalleries g
inner join tblPhotoGalleryCats c on c.pCatID = g.FK_pCatID
left join tblNewsCategories n on n.CategoryID = c.ConnectedNewsCatID
Try starting here:
select *,
case when PGC.pCatName = '0' then NC.CategoryName else PGC.pCatName end as [CatName]
from tblPhotoGalleries as PG inner join
tblPhotoGalleryCats as PGC on PGC.pCatID = FK_pCatID left outer join
tblNewsCategories as NC on NC.CategoryId = ConnectedNewsCatID
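Here is a small runnable sketch of the same idea, using Python's built-in sqlite3 module instead of SQL Server. The column lists are guessed from the question and the rows are made up; the T-SQL `alias = expression` form is written as `expression AS alias`, which SQLite understands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column lists are assumptions based on the names used in the question.
    CREATE TABLE tblPhotoGalleries (pgID INTEGER, gName TEXT, FK_pCatID INTEGER);
    CREATE TABLE tblPhotoGalleryCats (pCatID INTEGER, pCatName TEXT, ConnectedNewsCatID INTEGER);
    CREATE TABLE tblNewsCategories (CategoryID INTEGER, CategoryName TEXT);

    -- Hypothetical data: category 2 has pCatName '0' and points at news category 7.
    INSERT INTO tblPhotoGalleryCats VALUES (1, 'Nature', 0), (2, '0', 7);
    INSERT INTO tblNewsCategories VALUES (7, 'Sports');
    INSERT INTO tblPhotoGalleries VALUES (1, 'Trees', 1), (2, 'Match day', 2);
""")

rows = conn.execute("""
    SELECT g.pgID, g.gName,
           -- Fall back to the news category name when pCatName is the '0' marker.
           CASE c.pCatName WHEN '0' THEN n.CategoryName ELSE c.pCatName END
               AS GalleryCategoryName
    FROM tblPhotoGalleries g
    INNER JOIN tblPhotoGalleryCats c ON c.pCatID = g.FK_pCatID
    LEFT JOIN tblNewsCategories n ON n.CategoryID = c.ConnectedNewsCatID
    ORDER BY g.pgID
""").fetchall()
print(rows)
```

The LEFT JOIN matters: galleries whose category has no connected news category still come back, with the name taken from pCatName.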

MySQL to PostgreSQL: GROUP BY issues

So I decided to try out PostgreSQL instead of MySQL, but I am having some slight conversion problems. This was a query of mine that samples data from four tables and spits it all out in one result.
I am at a loss as to how to express this in PostgreSQL, and specifically in Django, but I am leaving that for another question, so bonus points if you can Django-fy it, but no worries if you just write pure SQL.
SELECT links.id, links.created, links.url, links.title, user.username, category.title, SUM(votes.karma_delta) AS karma, SUM(IF(votes.user_id = 1, votes.karma_delta, 0)) AS user_vote
FROM links
LEFT OUTER JOIN `users` `user` ON (`links`.`user_id`=`user`.`id`)
LEFT OUTER JOIN `categories` `category` ON (`links`.`category_id`=`category`.`id`)
LEFT OUTER JOIN `votes` `votes` ON (`votes`.`link_id`=`links`.`id`)
WHERE (links.id = votes.link_id)
GROUP BY votes.link_id
ORDER BY (SUM(votes.karma_delta) - 1) / POW((TIMESTAMPDIFF(HOUR, links.created, NOW()) + 2), 1.5) DESC
LIMIT 20
The IF in the select was where my first troubles began. It seems it's an IF true/false THEN stuff ELSE other stuff END IF, yet I can't get the syntax right. I tried to use Navicat's SQL builder, but it constantly wanted me to place everything I had selected into the GROUP BY, and that, I think, is all kinds of wrong.
What I am looking for, in summary, is to make this MySQL query work in PostgreSQL. Thank you.
Current Progress
Just want to thank everybody for their help. This is what I have so far:
SELECT links_link.id, links_link.created, links_link.url, links_link.title, links_category.title, SUM(links_vote.karma_delta) AS karma, SUM(CASE WHEN links_vote.user_id = 1 THEN links_vote.karma_delta ELSE 0 END) AS user_vote
FROM links_link
LEFT OUTER JOIN auth_user ON (links_link.user_id = auth_user.id)
LEFT OUTER JOIN links_category ON (links_link.category_id = links_category.id)
LEFT OUTER JOIN links_vote ON (links_vote.link_id = links_link.id)
WHERE (links_link.id = links_vote.link_id)
GROUP BY links_link.id, links_link.created, links_link.url, links_link.title, links_category.title
ORDER BY links_link.created DESC
LIMIT 20
I had to make some table name changes and I am still working on my ORDER BY so till then we're just gonna cop out. Thanks again!
Have a look at this link on GROUP BY:
"When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column."
You need to include all the select columns in the group by that are not part of the aggregate functions.
A few things:
Drop the backticks.
Use a CASE expression instead of IF(): CASE WHEN votes.user_id = 1 THEN votes.karma_delta ELSE 0 END
Change your TIMESTAMPDIFF to DATE_TRUNC('hour', now()) - DATE_TRUNC('hour', links.created). You will then need to count the number of hours in the resulting interval, for example with EXTRACT(EPOCH FROM ...) / 3600. It would be much easier to compare timestamps.
Fix your GROUP BY and ORDER BY.
Try to replace the IF with a CASE:
SUM(CASE WHEN votes.user_id = 1 THEN votes.karma_delta ELSE 0 END)
You also have to explicitly name every column or calculated column you use in the GROUP BY clause.
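Putting the two points together (CASE instead of IF, and every non-aggregated column listed in the GROUP BY), a stripped-down version of the query might look like the sketch below. It runs on Python's built-in sqlite3 module rather than PostgreSQL, the tables are cut down to the columns needed for the example, and the rows are invented; the CASE and GROUP BY parts use standard SQL that PostgreSQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical versions of the links and votes tables.
    CREATE TABLE links (id INTEGER, url TEXT, title TEXT);
    CREATE TABLE votes (link_id INTEGER, user_id INTEGER, karma_delta INTEGER);

    INSERT INTO links VALUES (1, 'a.example', 'A'), (2, 'b.example', 'B');
    INSERT INTO votes VALUES (1, 1, 1), (1, 2, 1), (2, 1, -1), (2, 3, 1), (2, 4, 1);
""")

rows = conn.execute("""
    SELECT links.id, links.title,
           SUM(votes.karma_delta) AS karma,
           -- Conditional sum: only the votes cast by user 1 count here.
           SUM(CASE WHEN votes.user_id = 1 THEN votes.karma_delta ELSE 0 END) AS user_vote
    FROM links
    LEFT OUTER JOIN votes ON votes.link_id = links.id
    -- Every selected column that is not inside an aggregate is listed.
    GROUP BY links.id, links.title
    ORDER BY links.id
""").fetchall()
print(rows)
```

The time-decay ORDER BY from the original MySQL query is left out on purpose, since its date functions differ between engines.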