Work Around for SQL Query 'NOT IN' that takes forever?

Work Around for SQL Query 'NOT IN' that takes forever? - sql

I am trying to run a query on an Oracle 10g DB to try and view 2 groups of transactions. I want to view basically anyone who has a transaction this year (2014) that also had a transaction in the previous 5 years. I then want to run a query for anyone who has a transaction this year (2014) that hasn't ordered from us in the last 5 years. I assumed I could do this with the 'IN' and 'NOT IN' features. The 'IN' query runs fine but the 'NOT IN' never completes. DB is fairly large which is probably why. Would love any suggestions from the experts!
*Notes, [TEXT] is a description of our Customer's Company name, sometimes the accounting department didn't tie this to our customer ID which left NULL values, so using TEXT as my primary grouping seemed to work although the name is obscure. CODE_D is a product line just to bring context to the name.
Below is my code:
SELECT CODE_D, sum(coalesce(credit_amount, 0) - coalesce(debet_amount,0)) as TOTAL
FROM
gen_led_voucher_row_tab
WHERE ACCOUNTING_YEAR like '2014'
and TEXT NOT IN
(select TEXT
from gen_led_voucher_row_tab
and voucher_date >= '01-JUN-09'
and voucher_date < '01-JUN-14'
and (credit_amount > '1' or debet_amount > '1')
)
GROUP BY CODE_D
ORDER BY TOTAL DESC

Try using a LEFT JOIN instead of NOT IN:
SELECT t1.CODE_D, sum(coalesce(t1.credit_amount, 0) - coalesce(t1.debet_amount,0)) as TOTAL
FROM gen_led_voucher_row_tab AS t1
LEFT JOIN gen_led_voucher_row_tab AS t2
ON t1.TEXT = t2.TEXT
AND t2.voucher_date >= '01-JUN-09'
AND t2.voucher_date < '01-JUN-14'
AND (credit_amount > '1' or debet_amount > '1')
WHERE t2.TEXT IS NULL
AND t1.ACCOUNTING_YEAR = '2014'
GROUP BY CODE_D
ORDER BY TOTAL DESC
ALso, make sure you have an index on the TEXT column.

You can increase your performance by changing the Not In clause to a Where Not Exists like as follows:
Where Not Exists
(
Select 1
From gen_led_voucher_row_tab b
Where voucher_date >= '01-JUN-09'
and voucher_date < '01-JUN-14'
and (credit_amount > '1' or debet_amount > '1')
And a.Text = b.Text
)
You'll need to alias the first table as well to a for this to work. Essentially, you're pulling back a ton of data to just discard it. Exists invokes a Semi Join which does not pull back any data at all, so you should see significant improvement.
Edit
Your query, as of the current update to the question should be this:
SELECT CODE_D,
sum(coalesce(credit_amount, 0) - coalesce(debet_amount,0)) as TOTAL
FROM gen_led_voucher_row_tab a
Where ACCOUNTING_YEAR like '2014'
And Not Exists
(
Select 1
From gen_led_voucher_row_tab b
Where voucher_date >= '01-JUN-09'
and voucher_date < '01-JUN-14'
and (credit_amount > '1' or debet_amount > '1')
And a.Text = b.Text
)
GROUP BY CODE_D
ORDER BY TOTAL DESC

Related

How to solve a nested aggregate function in SQL?

I'm trying to use a nested aggregate function. I know that SQL does not support it, but I really need to do something like the below query. Basically, I want to count the number of users for each day. But I want to only count the users that haven't completed an order within a 15 days window (relative to a specific day) and that have completed any order within a 30 days window (relative to a specific day). I already know that it is not possible to solve this problem using a regular subquery (it does not allow to change subquery values for each date). The "id" and the "state" attributes are related to the orders. Also, I'm using Fivetran with Snowflake.
SELECT
db.created_at::date as Date,
count(case when
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-15,Date) and dateadd(day,-1,Date)) then db.id end)
= 0) and
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-30,Date) and dateadd(day,-16,Date)) then db.id end)
> 0) then db.user end)
FROM
data_base as db
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
In other words, I want to transform the below query in a way that the "current_date" changes for each date.
WITH completed_15_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-15,current_date) and dateadd(day,-1,current_date)
group by User
),
completed_16_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-30,current_date) and dateadd(day,-16,current_date)
group by User
)
SELECT
date(db.created_at) as Date,
count(distinct case when comp_15.completadas = 0 and comp_16.completadas > 0 then comp_15.user end) as "Total Users Churn",
count(distinct case when comp_15.completadas > 0 then comp_15.user end) as "Total Users Active",
week(Date) as Week
FROM
data_base as db
left join completadas_15_days_before as comp_15 on comp_15.user = db.user
left join completadas_16_days_before as comp_16 on comp_16.user = db.user
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
Does anyone have a clue on how to solve this puzzle? Thank you very much!

The following should give you roughly what you want - difficult to test without sample data but should be a good enough starting point for you to then amend it to give you exactly what you want.
I've commented to the code to hopefully explain what each section is doing.
-- set parameter for the first date you want to generate the resultset for
set start_date = TO_DATE('2020-01-01','YYYY-MM-DD');
-- calculate the number of days between the start_date and the current date
set num_days = (Select datediff(day, $start_date , current_date()+1));
--generate a list of all the dates from the start date to the current date
-- i.e. every date that needs to appear in the resultset
WITH date_list as (
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date_item
from table (generator(rowcount => ($num_days)))
)
--Create a list of all the orders that are in scope
-- i.e. 30 days before the start_date up to the current date
-- amend WHERE clause to in/exclude records as appropriate
,order_list as (
SELECT created_at, rt_id
from data_base
where created_at between dateadd(day,-30,$start_date) and current_date()
and state = 'finished'
)
SELECT dl.date_item
,COUNT (DISTINCT ol30.RT_ID) AS USER_COUNT
,COUNT (ol30.RT_ID) as ORDER_COUNT
FROM date_list dl
-- get all orders between -30 and -16 days of each date in date_list
left outer join order_list ol30 on ol30.created_at between dateadd(day,-30,dl.date_item) and dateadd(day,-16,dl.date_item)
-- exclude records that have the same RT_ID as in the ol30 dataset but have a date between 0 amd -15 of the date in date_list
WHERE NOT EXISTS (SELECT ol15.RT_ID
FROM order_list ol15
WHERE ol30.RT_ID = ol15.RT_ID
AND ol15.created_at between dateadd(day,-15,dl.date_item) and dl.date_item)
GROUP BY dl.date_item
ORDER BY dl.date_item;

SQL Query to show order of work orders

First off sorry for the poor subject line.
EDIT: The Query here duplicates OrderNumbers I am needing the query to NOT duplicate OrderNumbers
EDIT: Shortened the question and provided a much cleaner question
I have a table that has a record of all of the work orders that have been performed. there are two types of orders. Installs and Trouble Calls. My query is to find all of the trouble calls that have taken place within 30 days of an install and match that trouble call (TC) to the proper Install (IN). So the Trouble Call date has to happen after the install but no more than 30 days after. Additionally if there are two installs and two trouble calls for the same account all within 30 days and they happen in order the results have to reflect that. The problem I am having is I am getting an Install order matching to two different Trouble Calls (TC) and a Trouble Call(TC) that is matching to two different Installs(IN)
In the example on SQL Fiddle pay close attention to the install order number 1234567810 and the Trouble Call order number 1234567890 and you will see the issue I am having.
http://sqlfiddle.com/#!3/811df/8
select b.accountnumber,
MAX(b.scheduleddate) as OriginalDate,
b.workordernumber as OriginalOrder,
b.jobtype as OriginalType,
MIN(a.scheduleddate) as NewDate,
a.workordernumber as NewOrder,
a.jobtype as NewType
from (
select workordernumber,accountnumber,jobtype,scheduleddate
from workorders
where jobtype = 'TC'
) a join
(
select workordernumber,accountnumber,jobtype,scheduleddate
from workorders
where jobtype = 'IN'
) b
on a.accountnumber = b.accountnumber
group by b.accountnumber,
b.scheduleddate,
b.workordernumber,
b.jobtype,
a.accountnumber,
a.scheduleddate,
a.workordernumber,
a.jobtype
having MIN(a.scheduleddate) > MAX(b.scheduleddate) and
DATEDIFF(day,MAX(b.scheduleddate),MIN(a.scheduleddate)) < 31
Example of what I am looking for the results to look like.
Thank you for any assistance you can provide in setting me on the correct path.

You were actually very close. I realized that what you really want is the MIN() TC date that is greater than each install date for that account number so long as they are 30 days or less apart.
So really you need to group by the install dates from your result set excluding WorkOrderNumbers still. Something like:
SELECT a.AccountNumber, MIN(a.scheduleddate) TCDate, b.scheduleddate INDate
FROM
(
SELECT WorkOrderNumber, ScheduledDate, JobType, AccountNumber
FROM workorders
WHERE JobType = 'TC'
) a
INNER JOIN
(
SELECT WorkOrderNumber, ScheduledDate, JobType, AccountNumber
FROM workorders
WHERE JobType = 'IN'
) b
ON a.AccountNumber = b.AccountNumber
WHERE b.ScheduledDate < a.ScheduledDate
AND DATEDIFF(DAY, b.ScheduledDate, a.ScheduledDate) <= 30
GROUP BY a.AccountNumber, b.AccountNumber, b.ScheduledDate
This takes care of the dates and AccountNumbers, but you still need the WorkOrderNumbers, so I joined the workorders table back twice, once for each type.
NOTE: I assume that each workorder has a unique date for each account number. So, if you have workorder 1 ('TC') for account 1 done on '1/1/2015' and you also have workorder 2 ('TC') for account 1 done on '1/1/2015' then I can't guarantee that you will have the correct WorkOrderNumber in your result set.
My final query looked like this:
SELECT
aggdata.AccountNumber, inst.workordernumber OriginalWorkOrderNumber, inst.JobType OriginalJobType, inst.ScheduledDate OriginalScheduledDate,
tc.WorkOrderNumber NewWorkOrderNumber, tc.JobType NewJobType, tc.ScheduledDate NewScheduledDate
FROM (
SELECT a.AccountNumber, MIN(a.scheduleddate) TCDate, b.scheduleddate INDate
FROM
(
SELECT WorkOrderNumber, ScheduledDate, JobType, AccountNumber
FROM workorders
WHERE JobType = 'TC'
) a
INNER JOIN
(
SELECT WorkOrderNumber, ScheduledDate, JobType, AccountNumber
FROM workorders
WHERE JobType = 'IN'
) b
ON a.AccountNumber = b.AccountNumber
WHERE b.ScheduledDate < a.ScheduledDate
AND DATEDIFF(DAY, b.ScheduledDate, a.ScheduledDate) <= 30
GROUP BY a.AccountNumber, b.AccountNumber, b.ScheduledDate
) aggdata
LEFT OUTER JOIN workorders tc
ON aggdata.TCDate = tc.ScheduledDate
AND aggdata.AccountNumber = tc.AccountNumber
AND tc.JobType = 'TC'
LEFT OUTER JOIN workorders inst
ON aggdata.INDate = inst.ScheduledDate
AND aggdata.AccountNumber = inst.AccountNumber
AND inst.JobType = 'IN'

select in1.accountnumber,
in1.scheduleddate as OriginalDate,
in1.workordernumber as OriginalOrder,
'IN' as OriginalType,
tc.scheduleddate as NewDate,
tc.workordernumber as NewOrder,
'TC' as NewType
from
workorders in1
out apply (Select min(in2.scheduleddate) as scheduleddate from workorders in2 Where in2.jobtype = 'IN' and in1.accountnumber=in2.accountnumber and in2.scheduleddate>in1.scheduleddate) ins
join workorders tc on tc.jobtype = 'TC' and tc.accountnumber=in1.accountnumber and tc.scheduleddate>in1.scheduleddate and (ins.scheduleddate is null or tc.scheduleddate<ins.scheduleddate) and DATEDIFF(day,in1.scheduleddate,tc.scheduleddate) < 31
Where in1.jobtype = 'IN'

Calculated fields from queries in CTE are quite slow, how to optimize

I have a query with calculated fields which involves looking up a dataset within a CTE for each of them, but it's quite slow when I get to a couple of these fields.
Here's an idea:
;WITH TRNCTE AS
(
SELECT TRN.PORT_N, TRN.TRADE_DATE, TRN.TRANS_TYPE, TRN.TRANS_SUB_CODE, TRN.SEC_TYPE, TRN.SETTLE_DATE
FROM TRNS_RPT TRN
WHERE TRN.TRADEDT >= '2014-01-01' AND TRN.TRADEDT <= '2014-12-31'
)
SELECT
C.CLIENT_NAME,
C.PORT_N,
C.PHONE_NUMBER,
CASE
WHEN EXISTS(SELECT TOP 1 1 FROM TRNCTE WHERE PORT_N = C.PORT_N AND MONTH(SETTLE_DATE) = 12) THEN 'DECEMBER TRANSACTION'
ELSE 'NOT DECEMBER TRANSACTION'
END AS ALIAS1
FROM CLIENTS C
WHERE EXISTS(SELECT TOP 1 1 FROM TRNCTE WHERE PORT_N = C.PORT_N)
If I had many of these calculated fields, the query can take up to 10 minutes to execute. Gathering the data in the CTE takes about 15 seconds for around 1,000,000 records.
I don't really need JOINS since I'm not really using the data that a JOIN would do, I only want to check for the existence of records in TRNS_RPT with certains criterias and set alias fields to certain values whether I find such records or not.
Can you help me optimize this ? Thanks

From looking at your code I would probably do a join instead to avoid having to include TRNCTE twice in the query. It is not exactly the same if TRNCTE.PORT_N is not unique and you would get duplicate rows.
;WITH TRNCTE AS
(
SELECT TRN.PORT_N, TRN.TRADE_DATE, TRN.TRANS_TYPE, TRN.TRANS_SUB_CODE, TRN.SEC_TYPE, TRN.SETTLE_DATE
FROM TRNS_RPT TRN
WHERE TRN.TRADEDT >= '2014-01-01' AND TRN.TRADEDT <= '2014-12-31'
)
SELECT
C.CLIENT_NAME,
C.PORT_N,
C.PHONE_NUMBER,
CASE
WHEN MONTH(TRNCTE.SETTLE_DATE) = 12)
THEN 'DECEMBER TRANSACTION'
ELSE 'NOT DECEMBER TRANSACTION'
END AS ALIAS1
FROM CLIENTS C
JOIN TRNCTE ON C.PORT_N = TRNCTE.PORT_N

There is a trick what you can do in cases like this. You need to insert the result of the cte into a temp table. This way you will save the cost of the recalculation.
Plese give it a try.
;WITH TRNCTE AS
(
SELECT TRN.PORT_N, TRN.TRADE_DATE, TRN.TRANS_TYPE, TRN.TRANS_SUB_CODE, TRN.SEC_TYPE, TRN.SETTLE_DATE
FROM TRNS_RPT TRN
WHERE TRN.TRADEDT >= '2014-01-01' AND TRN.TRADEDT <= '2014-12-31'
)
SELECT * INTO #temp
FROM TRNCTE
SELECT
C.CLIENT_NAME,
C.PORT_N,
C.PHONE_NUMBER,
CASE
WHEN EXISTS(SELECT TOP 1 1 FROM #temp WHERE PORT_N = C.PORT_N AND MONTH(SETTLE_DATE) = 12) THEN 'DECEMBER TRANSACTION'
ELSE 'NOT DECEMBER TRANSACTION'
END AS ALIAS1
FROM CLIENTS C
WHERE EXISTS(SELECT TOP 1 1 FROM #temp WHERE PORT_N = C.PORT_N)
DROP TABLE #temp

SQL Select case to create new value

Hello guys I am sorry but I didn’t know what I should call this question.
I have a table that contains information one these are how long it took from when the row was created and until it was last updated these are shown within the following columns:
CREATED
LAST_UPD
The time difference between these are shown in a separate column called:
SOLVED_SEC
(The time is shown in seconds)
Now I want to collect some of the data from this table but should the CREATED (which is a date) be outside of our company’s opening hours, the SOLVED_SEC should recalculated in my
Our opening hours exists in a table called KS_DRIFT.SYS_DATE_KS.
This table has a column named: THIS_DATE_OPENING.
I was thinking that I could calculate the new solved time as such:
THIS_DATE_OPENING-LAST_UPD
However I’m not quite sure how to do this
The following is the SQL that i have right now:
SELECT
TIDSPUNKT, LAST_UPD, AA.CREATED,
TRUNC(AA.SOLVED_SEC/60/60,2) as LØST_TIME,
//this is my attempt
CASE
WHEN AA.CREATED >= CC.THIS_DATE_CLOSING
THEN LØST_TIME = (LAST_UPD-CC.THIS_DATE_OPENING) AS LØST_TIME
END,
COUNT(CASE WHEN AA.LAST_UPD >= CC.THIS_DATE_CLOSING THEN 1 END) as AFTER_CLOSING,
COUNT(CASE WHEN STATUS ='Færdig' THEN 1 END)as Completed_Callbacks
FROM
KS_DRIFT.NYK_SIEBEL_CALLBACK_AGENT_H_V AA
INNER JOIN
KS_DRIFT.V_TEAM_DATO BB ON AA.TIDSPUNKT = BB.DATO
RIGHT JOIN
KS_DRIFT.SYS_DATE_KS CC ON AA.TIDSPUNKT = CC.THIS_DATE
WHERE
AA.TIDSPUNKT BETWEEN '2012-04-01' AND '2013-04-04'
AND AA.AFSLUTTET_AF = BB.INITIALER
GROUP BY
AA.TIDSPUNKT, LØST_SEKUNDER, LAST_UPD, AA.CREATED
Sadly this doesn’t work.
My question is how can I change the value of SOLVED_SEC if the CREATED > THIS_DATE_CLOSED ?
Should you require additional information please do not hesitate to comment.
UPDATE
I have tried the following:
CASE WHEN (AA.CREATED >= CC.THIS_DATE_CLOSING) THEN (AA.LAST_UPD-CC.THIS_DATE_OPENING) END AS SOVLED_AFTER_OPENING
However i get
"not a GROUP BY expression"
UPDATE 2
My SQL statement now looks like this:
SELECT TIDSPUNKT,
AA.CREATED,
LAST_UPD,
AA.AGENTGRUPPE,
TRUNC(AA.LØST_SEKUNDER/60/60,2) as LØST_TIME,
'LØST_TIME' = CASE WHEN AA.CREATED >= CC.THIS_DATE_CLOSING THEN DATEDIFF(ss, CC.THIS_DATE_OPENING, LAST_UPD) END,
COUNT(CASE WHEN AA.AGENTGRUPPE not in('Hovednumre','Privatcentre','Forsikring','Hotline','Stabe','Kunder','Erhverv','NykreditKunder','Servicecentret') THEN 1 END) as CALLBACKS_OUTSIDE_OF_KS,
COUNT(CASE WHEN AA.CREATED >= CC.THIS_DATE_CLOSING THEN 1 END) as AFTER_CLOSING,
COUNT(CASE WHEN STATUS ='Færdig' THEN 1 END)as Completed_Callbacks
FROM KS_DRIFT.NYK_SIEBEL_CALLBACK_AGENT_H_V AA
INNER JOIN KS_DRIFT.V_TEAM_DATO BB ON AA.TIDSPUNKT = BB.DATO
RIGHT JOIN KS_DRIFT.SYS_DATE_KS CC ON AA.TIDSPUNKT = CC.THIS_DATE
WHERE AA.TIDSPUNKT BETWEEN '2013-04-01' AND '2013-04-04'
AND AA.AFSLUTTET_AF = BB.INITIALER
GROUP BY AA.TIDSPUNKT, LAST_UPD, AA.CREATED, AA.LØST_SEKUNDER,
AA.AFSLUTTET_AF, AA.AGENTGRUPPE
However i get From keyword not found where expected

You need to use the DATEDIFF function when trying to calculate differences in time.
For example:
SELECT DATEDIFF(ss, THIS_DATE_OPENING, LAST_UPD);
Where "ss" denotes seconds. This first parameter is the datepart that you want to calculate. Like seconds or minutes or days or whatever.
You can find the documentation here http://msdn.microsoft.com/en-us/library/aa258269(v=sql.80).aspx
Let me know if I didn't understand your question correctly.
I also just noticed that you're trying to do this:
CASE WHEN AA.CREATED >= CC.THIS_DATE_CLOSING THEN LØST_TIME =(LAST_UPD-CC.THIS_DATE_OPENING) AS LØST_TIME END
Try this instead:
'L0ST_TIME' = CASE WHEN AA.CREATED >= CC.THIS_DATE_CLOSING THEN DATEDIFF(ss, CC.THIS_DATE_OPENING, LAST_UPD) END
HTH

count() on where clause from a different table

I have a search function to search for mysql results depending on their inputs. Now I wanted to include on the where clause the count() of likes for the result from a different table.
Like this scenario. I want to search "Dog" where count(like_id) >= 1.
The Likes Table that I want to include is: TABLE_GLOBAL_LIKES.
The Structure of TABLE_GLOBAL_LIKES is:
Fields: like_id, user_id, products_id, blog_id.
BOTH TABLE_GLOBAL_PRODUCTS and TABLE_GLOBAL_LIKES has a common field of products_id and blog_id to associate with.
This is my working query that I want the Likes table to included.
SELECT SQL_CALC_FOUND_ROWS p.*,
CASE
WHEN p.specials_new_products_price > 0.0000
AND (p.expires_date > Now()
OR p.expires_date IS NULL
OR p.expires_date ='0000-00-00 00:00:00')
AND p.status != 0 THEN p.specials_new_products_price
ELSE p.products_price
END price
FROM ".TABLE_GLOBAL_PRODUCTS." p
INNER JOIN ".TABLE_STORES." s ON s.blog_id = p.blog_id
WHERE MATCH (p.products_name,
p.products_description) AGAINST ('*".$search_key."*')
AND p.display_product = '1'
AND p.products_status = '1' HAVING price <= ".$priceto_key."
AND price >= ".$pricefrom_key."
ORDER BY p.products_date_added DESC, p.products_name
I'm a newbie in mysql queries.. Please help.

Try this:
SELECT SQL_CALC_FOUND_ROWS p.*, COUNT(l.like_id)
CASE
WHEN p.specials_new_products_price > 0.0000
AND (p.expires_date > Now()
OR p.expires_date IS NULL
OR p.expires_date ='0000-00-00 00:00:00')
AND p.status != 0 THEN p.specials_new_products_price
ELSE p.products_price
END price
FROM ".TABLE_GLOBAL_PRODUCTS." p
INNER JOIN ".TABLE_STORES." s ON s.blog_id = p.blog_id
INNER JOIN ".TABLE_GLOBAL_LIKES." l ON l.blog_id = p.blog_id AND l.products_id = p.products_id
WHERE MATCH (p.products_name,
p.products_description) AGAINST ('*".$search_key."*')
AND p.display_product = '1'
AND p.products_status = '1' HAVING price <= ".$priceto_key."
AND price >= ".$pricefrom_key."
GROUP BY p.products_id
HAVING COUNT(l.like_id)>0
ORDER BY p.products_date_added DESC, p.products_name

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Work Around for SQL Query 'NOT IN' that takes forever? - sql

Related

How to solve a nested aggregate function in SQL?

SQL Query to show order of work orders

Calculated fields from queries in CTE are quite slow, how to optimize

SQL Select case to create new value

count() on where clause from a different table

Categories

Resources