Find all SELECT statements without a WHERE clause using a regex - sql

I'm trying to scan my codebase for all SELECT queries that have no WHERE clause, using a regex. The results will be fed into an IDE or a grep output file, but must contain only the full matching queries.
My biggest challenge is matching the entire statement only when it has no WHERE. The caveats are:
* some SELECTs have no WHERE but also have no FROM
* some SELECTs target a database view (its name always starts with vw), which doesn't need a WHERE clause
Here's a sample list of all queries fetched from one file:
'
DECLARE @RowsAffected INT = 0;
INSERT INTO tblInvoice (InvID, OcID, InvTimeStamp)
SELECT DISTINCT OcInvID, OcID, GETDATE() AS InvTimeStamp
FROM tblOrderCost OC WITH(NOLOCK)
INNER JOIN tblVendor WITH(NOLOCK) ON InvVendorID = VendorID AND VendorType = 1 -- 1 for supplier.
INNER JOIN #tmpOpID tmp WITH(NOLOCK) ON tmp.OpID = OcOpID
WHERE id = ' . quote($order_id, NUMERIC);
$sql = ' SELECT rphrpid,rphwho,rphdate,rphnotes,opid
FROM tblReplacementPartHistory (nolock)
INNER JOIN tblReplacementPart (nolock)
ON rphrpid = rpid
INNER JOIN tblOrderProduct (nolock)
ON rpopid = opid
WHERE oporid =' . quote($order_id, NUMERIC)
. 'ORDER BY rphrpid';
select * from table where id = 1;
'
select count()
';
DECLARE @RowsAffected INT = 0;
INSERT INTO tblInvoice (InvID, OcID, InvTimeStamp)
SELECT DISTINCT OcInvID, OcID, GETDATE() AS InvTimeStamp
FROM tblOrderCost OC WITH(NOLOCK)
INNER JOIN tblVendor WITH(NOLOCK) ON InvVendorID = VendorID AND VendorType = 1
INNER JOIN #tmpOpID tmp WITH(NOLOCK) ON tmp.OpID = OcOpID';
$sql = ' SELECT rphrpid,rphwho,rphdate,rphnotes,opid
FROM tblReplacementPartHistory (nolock)
INNER JOIN tblReplacementPart (nolock)
ON rphrpid = rpid
INNER JOIN tblOrderProduct (nolock)
ON rpopid = opid
ORDER BY rphrpid';
SELECT rphrpid,rphwho,rphdate,rphnotes,opid
FROM vwOrder';
select * from tbl;
I tried several regex variations, and the closest I got was matching the queries with the WHERE line stripped out. I want a match only when the query has no WHERE clause at all, and the match should be the entire query. I tried the following:
SELECT(.*)(\s)*FROM(\s|.)+?((?!.*where))(?=(';|";|;))
SELECT\s*(?!.*\s*where|vw(\w)*).*\s*(';|";|;)
SELECT[^;\n]*(?:\n(?![^\n;]*where)[^;\n]*)*\n[\n]*
The work can also be tested in the regex101 sandbox: https://regex101.com/r/jvbLOE/1
What I expect to see, given the sample data, is only three matches:
1. SELECT DISTINCT OcInvID, OcID, GETDATE() AS InvTimeStamp
FROM tblOrderCost OC WITH(NOLOCK)
INNER JOIN tblVendor WITH(NOLOCK) ON InvVendorID = VendorID AND VendorType = 1
INNER JOIN #tmpOpID tmp WITH(NOLOCK) ON tmp.OpID = OcOpID';
2. SELECT rphrpid,rphwho,rphdate,rphnotes,opid
FROM tblReplacementPartHistory (nolock)
INNER JOIN tblReplacementPart (nolock)
ON rphrpid = rpid
INNER JOIN tblOrderProduct (nolock)
ON rpopid = opid
ORDER BY rphrpid';
3. select * from tbl;

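You can use the following regex to match SELECT queries that have no WHERE clause, based on your example:
/(?!.*where)select.*?;/gis
Note that with the s flag the lookahead (?!.*where) scans to the end of the input, not just to the end of the statement, so it only accepts a select that has no where anywhere after it in the text. It also does not exclude selects without a FROM or selects from vw views.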
Regex 101 example:
https://regex101.com/r/XaGXp6/1
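A per-statement variant can be sketched with Python's re module (an assumption; the question does not say which regex engine the IDE or grep uses). Instead of one lookahead over the whole input, it forbids where at every character up to the terminating semicolon, requires a from, and rejects a from that is followed by a vw name. The function name find_unfiltered_selects and the shortened SAMPLE text are illustrative only, and subqueries or semicolons inside string literals are not handled.

```python
import re

# Sketch only: one SELECT ... ; statement with a FROM, no WHERE, no vw target.
PATTERN = re.compile(
    r"""
    \bselect\b                   # start of a SELECT
    (?:(?!\bwhere\b)[^;])*?      # up to FROM: no WHERE, never past a ';'
    \bfrom\b(?!\s+vw)            # a FROM whose target does not start with vw
    (?:(?!\bwhere\b)[^;])*       # rest of the statement, still no WHERE
    ;                            # statement terminator
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Shortened, hypothetical version of the sample data from the question.
SAMPLE = """
SELECT DISTINCT OcInvID, OcID
FROM tblOrderCost OC WITH(NOLOCK)
WHERE id = ' . quote($order_id, NUMERIC);
select * from table where id = 1;
select count()
';
$sql = ' SELECT rphrpid, opid
FROM tblReplacementPartHistory (nolock)
ORDER BY rphrpid';
SELECT rphrpid FROM vwOrder';
select * from tbl;
"""


def find_unfiltered_selects(text):
    """Return every full statement matched by PATTERN, in order."""
    return [m.group(0) for m in PATTERN.finditer(text)]


for statement in find_unfiltered_selects(SAMPLE):
    print(statement)
    print("---")
```

On this shortened sample the sketch reports the ORDER BY query and select * from tbl; and skips the rest. Engines without lookahead support, such as POSIX grep -E, cannot run this pattern as written.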

Related

Passing different column values to where clause

SELECT pims.icicimedicalexaminerreport.id,
pims.icicimerfemaleapplicant.adversemenstrualid,
pims.icicimerfemaleapplicant.pregnantid,
pims.icicimerfemaleapplicant.miscarriageabortionid,
pims.icicimerfemaleapplicant.breastdiseaseid,
pims.pimscase.tiannumber
FROM pims.pimscase
INNER JOIN pims.digitization
ON pims.pimscase.digitizationid = pims.digitization.id
INNER JOIN pims.medicalexaminerreport
ON pims.digitization.medicalexaminerreportid =
pims.medicalexaminerreport.id
INNER JOIN pims.icicimedicalexaminerreport
ON pims.medicalexaminerreport.id =
pims.icicimedicalexaminerreport.id
INNER JOIN pims.icicimerfemaleapplicant
ON pims.icicimedicalexaminerreport.id =
pims.icicimerfemaleapplicant.id
WHERE pims.pimscase.tiannumber = 'ICICI1234567890'
which gives me the following output
Now I want to use the above output values to select rows from the table "YesNoAnswerWithObservation".
I imagine it should look something like this: Select * from YesNoAnswerWithObservation Where Id in (22,27,26,...23)
Only, instead of typing the values inside the IN clause, I want to use the values in each column returned by the above-mentioned query.
I tried the code below, but it returns all the rows in the table rather than only the rows listed in the IN clause:
SELECT pims.yesnoanswerwithobservation.observation,
graphitegtccore.yesnoquestion.description,
pims.yesnoanswerwithobservation.id ObservationId
FROM pims.yesnoanswerwithobservation
INNER JOIN graphitegtccore.yesnoquestion
ON pims.yesnoanswerwithobservation.yesnoanswerid =
graphitegtccore.yesnoquestion.id
WHERE EXISTS (SELECT pims.icicimedicalexaminerreport.id,
pims.icicimerfemaleapplicant.adversemenstrualid,
pims.icicimerfemaleapplicant.pregnantid,
pims.icicimerfemaleapplicant.pelvicorgandiseaseid,
pims.icicimerfemaleapplicant.miscarriageabortionid,
pims.icicimerfemaleapplicant.gynocologicalscanid,
pims.icicimerfemaleapplicant.breastdiseaseid,
pims.pimscase.tiannumber
FROM pims.pimscase
INNER JOIN pims.digitization
ON pims.pimscase.digitizationid =
pims.digitization.id
INNER JOIN pims.medicalexaminerreport
ON pims.digitization.medicalexaminerreportid =
pims.medicalexaminerreport.id
INNER JOIN pims.icicimedicalexaminerreport
ON pims.medicalexaminerreport.id =
pims.icicimedicalexaminerreport.id
INNER JOIN pims.icicimerfemaleapplicant
ON pims.icicimedicalexaminerreport.id =
pims.icicimerfemaleapplicant.id
WHERE pims.pimscase.tiannumber = 'ICICI1234567890')
Any help or a nudge in the right direction would be greatly appreciated
Presumably you want the ids from the first query:
SELECT awo.observation, ynq.description, ynq.id as ObservationId
FROM pims.yesnoanswerwithobservation awo JOIN
graphitegtccore.yesnoquestion ynq
ON awo.yesnoanswerid = ynq.id
WHERE ynq.id = (SELECT mer.id
FROM pims.pimscase c JOIN
pims.digitization d
ON c.digitizationid = d.id JOIN
pims.medicalexaminerreport mer
ON d.medicalexaminerreportid = mer.id JOIN
pims.icicimedicalexaminerreport imer
ON mer.id = imer.id JOIN
pims.icicimerfemaleapplicant ifa
ON imer.id = ifa.id
WHERE c.tiannumber = 'ICICI1234567890'
) ;
Notice that table aliases make the query much easier to write and to read.
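As for why the EXISTS attempt in the question returns every row: its subquery does not reference the outer query, so EXISTS is true for each outer row as soon as the subquery returns anything. A minimal sketch with Python's sqlite3 module, using hypothetical tables answers and picked rather than the real pims schema, compares that with IN over the id columns stacked by UNION (assuming the goal is to match any of the returned column values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the tables in the question.
    CREATE TABLE answers (id INTEGER PRIMARY KEY, observation TEXT);
    CREATE TABLE picked (first_id INTEGER, second_id INTEGER);
    INSERT INTO answers VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO picked VALUES (2, 3);
""")

# Uncorrelated EXISTS: true for every outer row once picked has any row.
all_rows = conn.execute(
    "SELECT id FROM answers "
    "WHERE EXISTS (SELECT first_id, second_id FROM picked) "
    "ORDER BY id"
).fetchall()

# IN compares each outer id against the values the subquery returns.
in_rows = conn.execute(
    "SELECT id FROM answers "
    "WHERE id IN (SELECT first_id FROM picked "
    "             UNION SELECT second_id FROM picked) "
    "ORDER BY id"
).fetchall()

print(all_rows)
print(in_rows)
```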

Multiple Join not working on two string attributes

I have the following issue:
I have several tables in my database. To check for a specific criterion I have to join several of them, using the following statement:
SELECT *
FROM (SELECT * FROM (((((((((SELECT * FROM table1 WHERE tab1_id = 1) AS A
INNER JOIN (SELECT * FROM table2 WHERE tab2_variable IS NOT NULL) AS B
ON A.variable = B.variable)
INNER JOIN (SELECT table3.*, IIF(year='XXXX', 0, 1) AS flag FROM table3) AS C
ON A.h_name = C.h_name)
INNER JOIN (SELECT * FROM table4 WHERE s_flag = 0) AS D
ON C.v_type = D.v_type)
INNER JOIN table5
ON C.ng_type = table5.ng_type)
INNER JOIN table6
ON C.part = table6.part)
INNER JOIN table7
ON C.ifg = table7.ifg)
INNER JOIN table8
ON C.v_type = table8.v_type AND C.ng_typ = table8.ng_typ AND C.ntr_flag = table8.ntr_flag)
INNER JOIN table9
ON table8.ifg_sii_id = table9.ifg_sii_id)) AS F
LEFT JOIN table10
ON F.risk = table10.risk AND C.v_type = table10.v_type
AND F.series = table10.series
This statement fails. But when I remove one of the following two join conditions in the last LEFT JOIN, it works as intended:
F.risk = table10.risk and/or C.v_type = table10.v_type
They are both of type CHAR, whereas series is of type TINYINT. I guess it has something to do with joining on multiple string conditions, but I'm not able to find a workaround. Any ideas?
According to your current SQL, table C is not visible in the last join because it is a subquery inside F. Instead of C.v_type you need to use F.c_v_type, and the c_v_type field must come from table C, like: select v_type as c_v_type from table3... as C
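The scoping rule can be reproduced in a small sketch. It uses Python's sqlite3 module and hypothetical tables t_a, t_c and t_10 instead of the original schema and database engine, so the error text will differ from what the asker sees. An alias defined inside a derived table cannot be referenced outside it, but a column exposed under a new name can:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature versions of table1, table3 and table10.
    CREATE TABLE t_a (risk TEXT, h_name TEXT);
    CREATE TABLE t_c (h_name TEXT, v_type TEXT);
    CREATE TABLE t_10 (risk TEXT, v_type TEXT);
    INSERT INTO t_a VALUES ('r1', 'h1');
    INSERT INTO t_c VALUES ('h1', 'v1');
    INSERT INTO t_10 VALUES ('r1', 'v1');
""")

# C only exists inside the derived table F, so the outer ON cannot see it.
broken = """
    SELECT F.risk
    FROM (SELECT A.risk FROM t_a AS A
          INNER JOIN t_c AS C ON A.h_name = C.h_name) AS F
    LEFT JOIN t_10 ON F.risk = t_10.risk AND C.v_type = t_10.v_type
"""
try:
    conn.execute(broken)
    broken_error = None
except sqlite3.OperationalError as exc:
    broken_error = str(exc)

# Expose the column from C under its own name and join through F.
fixed = """
    SELECT F.risk, F.c_v_type
    FROM (SELECT A.risk, C.v_type AS c_v_type FROM t_a AS A
          INNER JOIN t_c AS C ON A.h_name = C.h_name) AS F
    LEFT JOIN t_10 ON F.risk = t_10.risk AND F.c_v_type = t_10.v_type
"""
fixed_rows = conn.execute(fixed).fetchall()

print(broken_error)
print(fixed_rows)
```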
In the last part of your query, you could try selecting from the table in a subquery and including the WHERE clause, like so:
LEFT JOIN (Select * from table10 Where C.v_type = v_type AND F.series = series) as xx
ON F.risk = xx.risk
Hope this helps

SQL query structure - variables

I have a query where I want to include a new variable. I need the query to check whether this variable is NULL and, if it is, exclude the record from the result set.
The variable I need to incorporate is @PaymentStatusID_DV, but I don't know where to put it so that it makes sense. It is assigned like this:
SELECT @PaymentStatusID_DV = dbo.fnGetSimpleDvByThirdPartyField(c.ClientID,c.CustomerID,l.LeadID,ca.CaseID,m.MatterID,1490,4370)
So where can I put that within this query, and then check for NULL (IS NOT NULL)?
DECLARE @SettlementDate DATE,
@PaymentStatusID_DV VARCHAR(2000),
@ClientID INT = 384
SELECT @SettlementDate=dbo.fnAddWorkingDays ( dbo.fnGetNextWorkingDate (CONVERT(DATE,GETDATE()),0) ,cdv.ValueInt + 1 )
FROM ClientDetailValues cdv WITH (NOLOCK) WHERE cdv.DetailFieldID=170226 AND cdv.ClientID=@ClientID
SELECT
c.CustomerID,ca.CaseID,ca.LatestInProcessLeadEventID [LeadEventID],m.MatterID, COUNT(m.MatterID)
FROM Customers c WITH (NOLOCK)
INNER JOIN Lead l WITH (NOLOCK) ON c.CustomerID=l.CustomerID
INNER JOIN Cases ca WITH (NOLOCK) ON l.LeadID=ca.LeadID
INNER JOIN Matter m WITH (NOLOCK) ON ca.CaseID=m.CaseID
LEFT JOIN MatterDetailValues suspended WITH (NOLOCK) ON m.MatterID=suspended.MatterID AND suspended.DetailFieldID=175275
INNER JOIN LeadTypeRelationship ltr WITH (NOLOCK) ON ltr.ToMatterID=m.MatterID AND ltr.FromLeadTypeID=1492 AND ltr.ToLeadTypeID=1493
INNER JOIN Matter pam WITH (NOLOCK) ON ltr.FromMatterID=pam.MatterID
INNER JOIN CustomerPaymentSchedule cps WITH (NOLOCK) ON cps.CustomerID = c.CustomerID AND cps.WhenCreated > '2017-09-01'
INNER JOIN Account a WITH (NOLOCK) ON a.AccountID = cps.AccountID
WHERE c.Test=0 AND c.ClientID=@ClientID
AND NOT EXISTS ( SELECT * FROM LeadEvent le WITH (NOLOCK) WHERE le.CaseID=ca.CaseID AND le.EventDeleted=0 AND le.EventTypeID=155198 ) -- and collections are not on hold
AND (suspended.ValueInt <> 5144 OR suspended.ValueInt IS NULL) -- and policy status is live
AND cps.CustomerLedgerID IS NULL -- and payment schedule entry has not already been queued in the GL
AND cps.ActualCollectionDate <= @SettlementDate -- and payment date is before or the same as settlement day
AND cps.PaymentGross < 0 -- exclude zero value payments
AND a.AccountTypeID=1 -- and this is a DD payment
GROUP BY c.CustomerID,ca.CaseID,ca.LatestInProcessLeadEventID,m.MatterID
Do you really need to use the variable? The simplest way would be to call the function directly in your WHERE clause:
WHERE ... AND dbo.fnGetSimpleDvByThirdPartyField(c.ClientID,c.CustomerID,l.LeadID,ca.CaseID,m.MatterID,1490,4370) IS NOT NULL
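The same idea, filtering on the function result without an intermediate variable, can be sketched with Python's sqlite3 module. The lookup_status function and the customers table are hypothetical stand-ins for dbo.fnGetSimpleDvByThirdPartyField and the real tables, so this only illustrates the shape of the WHERE clause:

```python
import sqlite3


def lookup_status(customer_id):
    """Hypothetical stand-in for the scalar function in the question."""
    statuses = {1: "paid", 3: "pending"}
    return statuses.get(customer_id)  # None is passed to SQL as NULL


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it with one argument.
conn.create_function("lookup_status", 1, lookup_status)
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    INSERT INTO customers VALUES (1), (2), (3);
""")

# Rows for which the function returns NULL are dropped by the WHERE clause.
rows = conn.execute(
    "SELECT customer_id, lookup_status(customer_id) "
    "FROM customers "
    "WHERE lookup_status(customer_id) IS NOT NULL "
    "ORDER BY customer_id"
).fetchall()

print(rows)
```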

sql inner join over multiple tables

I've been given this relational schema and the following task:
Inner Join: Return a list of professors who give a
'lehrveranstaltung' of the 'fachbereich' with the name 'informatik'.
* print 'vorname', 'ho_name', 'lv_name'
* output should sort surnames in ascending order and if they're the same in descending order
* identical lines should only be shown once
Now I came up with the following query:
select distinct
v.vorname,
h.ho_name,
l.lv_name
--print wanted, only once
from
vorname v,
hochschulangehoeriger h,
lehrveranstaltung l
-- from these tables
inner join fachbereich f on f.fb_name = 'Informatik'
-- only the 'informatik' events
inner join prof_haelt_lv on l.lv_nr = pl.lv_nr
-- make sure 'lehrveranstaltung' is from a professor
inner join mitarbeiter mit on pl.pers_Nr = mit.pers_Nr
-- make sure dude is a prof
where
mit.ho_nr = h.ho_nr
and
mit.ho_nr = v.ho_nr -- give only names from prof
order by
2 asc,
3 desc; -- order rules
I think this works for me (I can't test it properly). But when I look at it, I wish I had come up with a better solution, since this looks kind of ugly and wrong to me.
Is there a better way of doing this? (I have to use inner join.)
Based on the tables you have, you may use the following SQL statement:
SELECT DISTINCT v.vorname,
h.ho_name,
l.lv_name
FROM vorname v
INNER JOIN hochschulangehoeriger h
ON v.ho_nr = h.ho_nr
INNER JOIN mitarbeiter m
ON m.ho_nr = h.ho_nr
INNER JOIN fachbereich f
ON f.fb_nr = m.fb_nr
AND f.fb_name = 'Informatik'
INNER JOIN lehrveranstaltung l
ON l.fb_nr = f.fb_nr
INNER JOIN professor p
ON p.pers_nr = m.pers_nr
INNER JOIN prof_haelt_lv pl
ON pl.pers_nr = p.pers_nr
AND pl.lv_nr = l.lv_nr
ORDER BY 2,
3 DESC;
Also, this section of your SQL has no connection to any other table in the query:
inner join fachbereich f on f.fb_name = 'Informatik'
-- only the 'informatik' events
And you forgot the alias for prof_haelt_lv:
inner join prof_haelt_lv on l.lv_nr = pl.lv_nr
-- make sure 'lehrveranstaltung' is from a professor
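The effect of a join condition that does not link the two tables can be seen in a small sketch. It uses Python's sqlite3 module with hypothetical tables course and dept (not the real schema from the task): when the ON clause only filters dept, every course is paired with the 'Informatik' row, whatever its own department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature versions of lehrveranstaltung and fachbereich.
    CREATE TABLE dept (fb_nr INTEGER PRIMARY KEY, fb_name TEXT);
    CREATE TABLE course (lv_name TEXT, fb_nr INTEGER);
    INSERT INTO dept VALUES (1, 'Informatik'), (2, 'Mathematik');
    INSERT INTO course VALUES ('Datenbanken', 1), ('Analysis', 2);
""")

# ON only filters dept: each course is paired with the Informatik row.
unlinked = conn.execute(
    "SELECT c.lv_name FROM course c "
    "INNER JOIN dept d ON d.fb_name = 'Informatik' "
    "ORDER BY c.lv_name"
).fetchall()

# ON links the tables on fb_nr first, then filters by name.
linked = conn.execute(
    "SELECT c.lv_name FROM course c "
    "INNER JOIN dept d ON d.fb_nr = c.fb_nr AND d.fb_name = 'Informatik' "
    "ORDER BY c.lv_name"
).fetchall()

print(unlinked)
print(linked)
```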

Query Performance too Slow

I'm having performance issues with this query. If I remove the status column it runs very fast, but with the subquery in the column list the query takes far too long (1.02 min). How can I modify this query so that it runs fast and still returns the desired data?
The reason I put that subquery there is that I need the status of the latest activity; some activities have a null status, so I have to ignore them.
Establishments: 6.5k rows
EstablishmentActivities: 70k rows
Status: 2 rows (Active, Inactive)
SELECT DISTINCT
est.id,
est.trackingNumber,
est.NAME AS 'establishment',
actTypes.NAME AS 'activity',
(
SELECT stat3.NAME
FROM SACPAN_EstablishmentActivities eact3
INNER JOIN SACPAN_ActivityTypes at3
ON eact3.activityType_FK = at3.code
INNER JOIN SACPAN_Status stat3
ON stat3.id = at3.status_FK
WHERE eact3.establishment_FK = est.id
AND eact3.rowCreatedDT = (
SELECT MAX(est4.rowCreatedDT)
FROM SACPAN_EstablishmentActivities est4
INNER JOIN SACPAN_ActivityTypes at4
ON est4.establishment_fk = est.id
AND est4.activityType_FK = at4.code
WHERE est4.establishment_fk = est.id
AND at4.status_FK IS NOT NULL
)
AND at3.status_FK IS NOT NULL
) AS 'status',
est.authorizationNumber,
reg.NAME AS 'region',
mun.NAME AS 'municipality',
ISNULL(usr.NAME, '') + ISNULL(+ ' ' + usr.lastName, '')
AS 'created',
ISNULL(usr2.NAME, '') + ISNULL(+ ' ' + usr2.lastName, '')
AS 'updated',
est.rowCreatedDT,
est.rowUpdatedDT,
CASE WHEN est.rowUpdatedDT >= est.rowCreatedDT
THEN est.rowUpdatedDT
ELSE est.rowCreatedDT
END AS 'LatestCreatedOrModified'
FROM SACPAN_Establishments est
INNER JOIN SACPAN_EstablishmentActivities eact
ON est.id = eact.establishment_FK
INNER JOIN SACPAN_ActivityTypes actTypes
ON eact.activityType_FK = actTypes.code
INNER JOIN SACPAN_Regions reg
ON est.region_FK = reg.code --
INNER JOIN SACPAN_Municipalities mun
ON est.municipality_FK = mun.code
INNER JOIN SACPAN_ContactEstablishments ce
ON ce.establishment_FK = est.id
INNER JOIN SACPAN_Contacts con
ON ce.contact_FK = con.id
--JOIN SACPAN_Status stat ON stat.id = actTypes.status_FK
INNER JOIN SACPAN_Users usr
ON usr.id = est.rowCreatedBy_FK
LEFT JOIN SACPAN_Users usr2
ON usr2.id = est.rowUpdatedBy_FK
WHERE (con.ssn = @ssn OR @ssn = '*')
AND eact.rowCreatedDT = (
SELECT MAX(eact2.rowCreatedDT)
FROM SACPAN_EstablishmentActivities eact2
WHERE eact2.establishment_FK = est.id
)
--AND est.id = 6266
ORDER BY 'LatestCreatedOrModified' DESC
Try moving that 'activity' subquery into a LEFT JOIN and see if it speeds things up.
I solved the problem by creating a temporary table and adding an index to it, which removed the need for the slow subquery in the SELECT list. I then join the temp table as I do with normal tables.
Thanks to all.
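The accepted approach is described only in words, so the following is a guess at its shape rather than the asker's actual code. It uses Python's sqlite3 module with hypothetical tables establishments and activities (statuses stored inline for brevity). The latest non-null status per establishment is computed once into an indexed temporary table, which is then joined like any other table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of the SACPAN tables.
    CREATE TABLE establishments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE activities (
        establishment_fk INTEGER, status TEXT, created_dt TEXT);
    INSERT INTO establishments VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO activities VALUES
        (1, 'Active',   '2020-01-01'),
        (1, 'Inactive', '2020-01-15'),
        (1, NULL,       '2020-02-01'),
        (2, 'Active',   '2020-03-01');

    -- Compute the latest non-null status once per establishment.
    CREATE TEMP TABLE latest_status AS
    SELECT a.establishment_fk, a.status
    FROM activities a
    WHERE a.status IS NOT NULL
      AND a.created_dt = (
          SELECT MAX(a2.created_dt)
          FROM activities a2
          WHERE a2.establishment_fk = a.establishment_fk
            AND a2.status IS NOT NULL);

    -- Index the temp table on the join column.
    CREATE INDEX ix_latest_status ON latest_status (establishment_fk);
""")

# The main query now joins the temp table instead of running a subquery
# in the SELECT list for every output row.
rows = conn.execute(
    "SELECT e.name, ls.status "
    "FROM establishments e "
    "LEFT JOIN latest_status ls ON ls.establishment_fk = e.id "
    "ORDER BY e.id"
).fetchall()

print(rows)
```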