Calculated fields from queries in CTE are quite slow, how to optimize - sql

I have a query with calculated fields which involves looking up a dataset within a CTE for each of them, but it's quite slow when I get to a couple of these fields.
Here's an idea:
;WITH TRNCTE AS
(
SELECT TRN.PORT_N, TRN.TRADE_DATE, TRN.TRANS_TYPE, TRN.TRANS_SUB_CODE, TRN.SEC_TYPE, TRN.SETTLE_DATE
FROM TRNS_RPT TRN
WHERE TRN.TRADEDT >= '2014-01-01' AND TRN.TRADEDT <= '2014-12-31'
)
SELECT
C.CLIENT_NAME,
C.PORT_N,
C.PHONE_NUMBER,
CASE
WHEN EXISTS(SELECT TOP 1 1 FROM TRNCTE WHERE PORT_N = C.PORT_N AND MONTH(SETTLE_DATE) = 12) THEN 'DECEMBER TRANSACTION'
ELSE 'NOT DECEMBER TRANSACTION'
END AS ALIAS1
FROM CLIENTS C
WHERE EXISTS(SELECT TOP 1 1 FROM TRNCTE WHERE PORT_N = C.PORT_N)
If I had many of these calculated fields, the query can take up to 10 minutes to execute. Gathering the data in the CTE takes about 15 seconds for around 1,000,000 records.
I don't really need JOINS since I'm not really using the data that a JOIN would do, I only want to check for the existence of records in TRNS_RPT with certains criterias and set alias fields to certain values whether I find such records or not.
Can you help me optimize this ? Thanks

From looking at your code I would probably do a join instead to avoid having to include TRNCTE twice in the query. It is not exactly the same if TRNCTE.PORT_N is not unique and you would get duplicate rows.
;WITH TRNCTE AS
(
SELECT TRN.PORT_N, TRN.TRADE_DATE, TRN.TRANS_TYPE, TRN.TRANS_SUB_CODE, TRN.SEC_TYPE, TRN.SETTLE_DATE
FROM TRNS_RPT TRN
WHERE TRN.TRADEDT >= '2014-01-01' AND TRN.TRADEDT <= '2014-12-31'
)
SELECT
C.CLIENT_NAME,
C.PORT_N,
C.PHONE_NUMBER,
CASE
WHEN MONTH(TRNCTE.SETTLE_DATE) = 12)
THEN 'DECEMBER TRANSACTION'
ELSE 'NOT DECEMBER TRANSACTION'
END AS ALIAS1
FROM CLIENTS C
JOIN TRNCTE ON C.PORT_N = TRNCTE.PORT_N

There is a trick what you can do in cases like this. You need to insert the result of the cte into a temp table. This way you will save the cost of the recalculation.
Plese give it a try.
;WITH TRNCTE AS
(
SELECT TRN.PORT_N, TRN.TRADE_DATE, TRN.TRANS_TYPE, TRN.TRANS_SUB_CODE, TRN.SEC_TYPE, TRN.SETTLE_DATE
FROM TRNS_RPT TRN
WHERE TRN.TRADEDT >= '2014-01-01' AND TRN.TRADEDT <= '2014-12-31'
)
SELECT * INTO #temp
FROM TRNCTE
SELECT
C.CLIENT_NAME,
C.PORT_N,
C.PHONE_NUMBER,
CASE
WHEN EXISTS(SELECT TOP 1 1 FROM #temp WHERE PORT_N = C.PORT_N AND MONTH(SETTLE_DATE) = 12) THEN 'DECEMBER TRANSACTION'
ELSE 'NOT DECEMBER TRANSACTION'
END AS ALIAS1
FROM CLIENTS C
WHERE EXISTS(SELECT TOP 1 1 FROM #temp WHERE PORT_N = C.PORT_N)
DROP TABLE #temp

Related

Delete the records repeated by date, and keep the oldest

I have this query, and it returns the following result, I need to delete the records repeated by date, and keep the oldest, how could I do this?
select
a.EMP_ID, a.EMP_DATE,
from
EMPLOYES a
inner join
TABLE2 b on a.table2ID = b.table2ID and b.ID_TYPE = 'E'
where
a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and month(a.DATE) = 1
and a.ID <> 31
order by
a.DATE;
Additionally, I would like to fill in the missing days of the month ... and put them empty if I don't have that data, can this be done?
I would appreciate if you could guide me to solve this problem
Thank you!
The other answers miss some of the requirement..
Initial step - do this once only. Make a calendar table. This will come in handy for all sorts of things over the time:
DECLARE #Year INT = '2000';
DECLARE #YearCnt INT = 50 ;
DECLARE #StartDate DATE = DATEFROMPARTS(#Year, '01','01')
DECLARE #EndDate DATE = DATEADD(DAY, -1, DATEADD(YEAR, #YearCnt, #StartDate));
;WITH Cal(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM Cal
WHERE n < DATEDIFF(DAY, #StartDate, #EndDate)
),
FnlDt(d, n) AS
(
SELECT DATEADD(DAY, n, #StartDate), n FROM Cal
),
FinalCte AS
(
SELECT
[D] = CONVERT(DATE,d),
[Dy] = DATEPART(DAY, d),
[Mo] = DATENAME(MONTH, d),
[Yr] = DATEPART(YEAR, d),
[DN] = DATENAME(WEEKDAY, d),
[N] = n
FROM FnlDt
)
SELECT * INTO Cal FROM finalCte
ORDER BY [Date]
OPTION (MAXRECURSION 0);
credit: mostly this site
Now we can write some simple query to stick your data (with one small addition) onto it:
--your query, minus the date bits in the WHERE, and with a ROW_NUMBER
WITH yourQuery AS(
SELECT a.emp_id, a.emp_date,
ROW_NUMBER() OVER(PARTITION BY CAST(a.emp_date AS DATE) ORDER BY a.emp_date) rn
FROM EMPLOYES a
INNER JOIN TABLE2 b on a.table2ID = b.table2ID
WHERE a.emp_id = 'VJAHAJHSJHDAJHSJDH' AND a.id <> 31 AND b.id_type = 'E'
)
--your query, left joined onto the cal table so that you get a row for every day even if there is no emp data for that day
SELECT c.d, yq.*
FROM
Cal c
LEFT JOIN yourQuery yq
ON
c.d = CAST(yq.emp_date AS DATE) AND --cut the time off
yq.rn = 1 --keep only the earliest time per day
WHERE
c.d BETWEEN '2021-01-01' AND EOMONTH('2021-01-01')
We add a rownumbering to your table, it restarts every time the date changes and counts up in order of time. We make this into a CTE (or a subquery, CTE is cleaner) then we simply left join it to the calendar table. This means that for any date you don't have data, you still have the calendar date. For any days you do have data, the rownumber rn being a condition of the join means that only the first datetime from each day is present in the results
Note: something is wonky about your question . You said you SELECT a.emp_id and your results show 'VJAHAJHSJHDAJHSJDH' is the emp id, but your where clause says a.id twice, once as a string and once as a number - this can't be right, so I've guessed at fixing it but I suspect you have translated your query into something for SO, perhaps to hide real column names.. Also your SELECT has a dangling comma that is a syntax error.
If you have translated/obscured your real query, make absolutely sure you understand any answer here when translating it back. It's very frustrating when someone is coming back and saying "hi your query doesn't work" then it turns out that they damaged it trying to translate it back to their own db, because they hid the real column names in the question..
FInally, do not use functions on table data in a where clause; it generally kills indexing. Always try and find a way of leaving table data alone. Want all of january? Do like I did, and say table.datecolumn BETWEEN firstofjan AND endofjan etc - SQLserver at least stands a chance of using an index for this, rather than calling a function on every date in the table, every time the query is run
You can use ROW_NUMBER
WITH CTE AS
(
SELECT a.EMP_ID, a.EMP_DATE,
RN = ROW_NUMBER() OVER (PARTITION BY a.EMP_ID, CAST(a.DATE as Date) ORDER BY a.DATE ASC)
from EMPLOYES a INNER JOIN TABLE2 b
on a.table2ID = b.table2ID
and b.ID_TYPE = 'E'
where a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and MONTH(a.DATE) = 1
and a.ID <> 31
)
SELECT * FROM CTE
WHERE RN = 1
Try with an aggregate function MAX or MIN
create table #tmp(dt datetime, val numeric(4,2))
insert into #tmp values ('2021-01-01 10:30:35', 1)
insert into #tmp values ('2021-01-02 10:30:35', 2)
insert into #tmp values ('2021-01-02 11:30:35', 3)
insert into #tmp values ('2021-01-03 10:35:35', 4)
select * from #tmp
select tmp.*
from #tmp tmp
inner join
(select max(dt) as dt, cast(dt as date) as dt_aux from #tmp group by cast(dt as date)) compressed_rows on
tmp.dt = compressed_rows.dt
drop table #tmp
results:

CTE doesn't work in SSAS Cube. I want to find solution or convert it ti subquery

I write this CTE query and the explanation is:
WITH TP AS
(select
c.ID, c.PeriodCId, c.PeriodName, c.Status, c.StatusChangeDate, CAST(c.StartDate AS DATE) AS StartDate, c.EndDate,c.PeriodCode,
c.PeriodType, c.ParentCId, c.MarketId, c.ParentId, c.WD, LEFT(CONVERT(varchar, c.StartDate, 112), 6) AS YEARMONTH,
(select count(*) from dTimePeriod c2 where c2.ParentId = c.ID and c2.Status='actv') as #children
from dTimePeriod c
where (MarketId = 7) ),
TP2 AS
( SELECT *
FROM TP
WHERE #children='12' ),
TP3 AS
(SELECT TP.*, CASE WHEN (TP.WD IS NOT NULL) AND (TP.StartDate <= getdate()) AND TP2.ID=TP.ParentId THEN 18 ELSE NULL END AS WorkingDays
FROM TP LEFT JOIN TP2 ON TP2.ID=TP.ParentId)
select * from TP3
order by ID
and this is the result
CTE Image
I have recursive table called [dTimePeriod] this table contains different cycles and each cycle contains different number of periods,EX: one cycle has 8 periods another cycle has 12 periods and so on, I want if cycles contains 12 periods put to each period value = 18 and for others cycle periods null
and there are some another conditions but it's not the issue.
And when I put it in the SSAS cube doesn't work because the cube doesn't understand the CTE so I tried to find a solution but it doesn't work,
one of them to put this CTE in a view and call this view in the cube but the view doesn't work as well.
so I start to write it as subquery to make the cube able to understand it.
but I am stuck, I can't write this CTE in subquery statement
and this is the subquery where I stuck
SELECT c.ID, c.PeriodCId, c.PeriodName, c.Status, c.StatusChangeDate, CAST(c.StartDate AS DATE) AS StartDate, c.EndDate,
c.PeriodCode, c.PeriodType, c.ParentCId, c.MarketId, c.ParentId, c.WD,
LEFT(CONVERT(varchar, c.StartDate, 112), 6) AS YEARMONTH,
CASE WHEN (c.WD IS NOT NULL) AND (c.StartDate <= getdate()) THEN 18
WHEN (c.WD IS NOT NULL) AND (c.StartDate > getdate()) THEN NULL ELSE c.WD END AS WorkingDays,
case when (select sub.* from
(select count(*) as children from dTimePeriod c2 where c2.ParentId = c.ID and c2.Status='actv' ) sub ) = 12
then 18 else null end as WWW
FROM dTimePeriod c
WHERE (c.MarketId = 7)
Instead of using the SQL command directly in the data source for your cube, you can turn this query into a stored procedure and use that result set in the cube. For the SQL statement in the cube, just do an EXEC command as below. The database name isn't necessary if this database is already the initial catalog in the connection string of the data source.
EXEC YourDatabase.YourSchema.YourSP

Create weighted average in SQL using dates

I have a SQL query that lists details about a certain item. Everything works as should except for the last column. I want the weight of transaction column to report back a difference in days.
So for example the 4th row in the txdate column is 05/21/2014 and the 3rd row is 05/12/20014. The weight of transaction column in the 4th row should say 9.
I read about the Lag and Lead functions, but I'm not sure how to implement those with dates (if it's even possible). If it isn't possible is there a way to accomplish this?
Select t.txNumber,
t.item,
t.txCode,
t.txdate,
(t.onhandlocold + t.stockQty) as 'Ending Quantity',
tmax.maxtnumber 'Latest Transaction Code',
tmax.maxdate 'Latest Transaction Date',
tmin.mindate 'First Transaction Date',
(t.txdate - tmin.mindate) 'weight of transaction'
From tbliminvtxhistory t
Left outer join
(Select t.item, max(t.txnumber) as maxtnumber, max(t.txdate) as maxdate
From tbliminvtxHistory t
Where t.txCode != 'PAWAY'
Group By Item) tmax
on t.item = tmax.item
Left Outer Join
(Select t.item, min(t.txdate) as mindate
From tbliminvtxHistory t
WHere t.txCode != 'PAWAY'
and t.txdate > DateAdd(Year, -1, GetDate())
Group By Item) tmin
on t.item = tmin.item
where t.item = 'LR50M'
and t.txCode != 'PAWAY'
and t.txdate > DateAdd(Year, -1, GetDate())
Check out the DATEDIFF function, which will return the difference between two dates.
I think this is what you are looking for:
DATEDIFF(dd,tmin.mindate,t.txdate)
UPDATE:
Now that I understand your question a little better, here is an update. As mentioned in a comment on the above post, the LAG function is only supported in SQL 2012 and up. An alternative is to use ROW_NUMBER and store the results into a temp table. Then you can left join back to the same table on the next ROW_NUMBER in the results. Then you would use your DATEDIFF to compare the dates. This will do the exact same thing as the the LAG function.
Example:
SELECT ROW_NUMBER() OVER (ORDER BY txdate) AS RowNumber,*
INTO #Rows
FROM tbliminvtxhistory
SELECT DATEDIFF(dd,r2.txdate,r.txdate),*
FROM #Rows r
LEFT JOIN #Rows r2 ON r.RowNumber=r2.RowNumber+1
DROP TABLE #Rows
I think you are looking for this expression:
Select . . . ,
datediff(day, lag(txdate) over (order by xnumber), txdate)
This assumes that the rows are ordered by the first column, which seems reasonable given your explanation and the sample data.
EDIT:
Without lag() you can use outer apply. For simplicity, let me assume that your query is defined as a CTE:
with cte as (<your query here>)
select . . . ,
datediff(day, prev.txdate , cte.txdate)
from cte cross apply
(select top 1 cte2.*
from cte cte2
where cte2.xnumber < cte.xnumber
order by cte2.xnumber desc
) prev

Work Around for SQL Query 'NOT IN' that takes forever?

I am trying to run a query on an Oracle 10g DB to try and view 2 groups of transactions. I want to view basically anyone who has a transaction this year (2014) that also had a transaction in the previous 5 years. I then want to run a query for anyone who has a transaction this year (2014) that hasn't ordered from us in the last 5 years. I assumed I could do this with the 'IN' and 'NOT IN' features. The 'IN' query runs fine but the 'NOT IN' never completes. DB is fairly large which is probably why. Would love any suggestions from the experts!
*Notes, [TEXT] is a description of our Customer's Company name, sometimes the accounting department didn't tie this to our customer ID which left NULL values, so using TEXT as my primary grouping seemed to work although the name is obscure. CODE_D is a product line just to bring context to the name.
Below is my code:
SELECT CODE_D, sum(coalesce(credit_amount, 0) - coalesce(debet_amount,0)) as TOTAL
FROM
gen_led_voucher_row_tab
WHERE ACCOUNTING_YEAR like '2014'
and TEXT NOT IN
(select TEXT
from gen_led_voucher_row_tab
and voucher_date >= '01-JUN-09'
and voucher_date < '01-JUN-14'
and (credit_amount > '1' or debet_amount > '1')
)
GROUP BY CODE_D
ORDER BY TOTAL DESC
Try using a LEFT JOIN instead of NOT IN:
SELECT t1.CODE_D, sum(coalesce(t1.credit_amount, 0) - coalesce(t1.debet_amount,0)) as TOTAL
FROM gen_led_voucher_row_tab AS t1
LEFT JOIN gen_led_voucher_row_tab AS t2
ON t1.TEXT = t2.TEXT
AND t2.voucher_date >= '01-JUN-09'
AND t2.voucher_date < '01-JUN-14'
AND (credit_amount > '1' or debet_amount > '1')
WHERE t2.TEXT IS NULL
AND t1.ACCOUNTING_YEAR = '2014'
GROUP BY CODE_D
ORDER BY TOTAL DESC
ALso, make sure you have an index on the TEXT column.
You can increase your performance by changing the Not In clause to a Where Not Exists like as follows:
Where Not Exists
(
Select 1
From gen_led_voucher_row_tab b
Where voucher_date >= '01-JUN-09'
and voucher_date < '01-JUN-14'
and (credit_amount > '1' or debet_amount > '1')
And a.Text = b.Text
)
You'll need to alias the first table as well to a for this to work. Essentially, you're pulling back a ton of data to just discard it. Exists invokes a Semi Join which does not pull back any data at all, so you should see significant improvement.
Edit
Your query, as of the current update to the question should be this:
SELECT CODE_D,
sum(coalesce(credit_amount, 0) - coalesce(debet_amount,0)) as TOTAL
FROM gen_led_voucher_row_tab a
Where ACCOUNTING_YEAR like '2014'
And Not Exists
(
Select 1
From gen_led_voucher_row_tab b
Where voucher_date >= '01-JUN-09'
and voucher_date < '01-JUN-14'
and (credit_amount > '1' or debet_amount > '1')
And a.Text = b.Text
)
GROUP BY CODE_D
ORDER BY TOTAL DESC

Group By column throwing off query

I have a query that checks a database to see if a customer has visited multiple times a day. If they have it counts the number of visits, and then tells me what times they visited. The problem is it throws "Tickets.lcustomerid" into the group by clause, causing me to miss 5 records (Customers without barcodes). How can I change the below query to remove "tickets.lcustomerid" from the group by clause... If I remove it I get an error telling me "Tickets.lCustomerID" is not a valid select because it's not part of an aggregate or groupby clause.
The Query that works:
SELECT Customers.sBarcode, CAST(FLOOR(CAST(Tickets.dtCreated AS FLOAT)) AS DATETIME) AS dtCreatedDate, COUNT(Customers.sBarcode) AS [Number of Scans],
MAX(Customers.sLastName) AS LastName
FROM Tickets INNER JOIN
Customers ON Tickets.lCustomerID = Customers.lCustomerID
WHERE (Tickets.dtCreated BETWEEN #startdate AND #enddate) AND (Tickets.dblTotal <= 0)
GROUP BY Customers.sBarcode, CAST(FLOOR(CAST(Tickets.dtCreated AS FLOAT)) AS DATETIME)
HAVING (COUNT(*) > 1)
ORDER BY dtCreatedDate
The Output is:
sBarcode dtcreated Date Number of Scans slastname
1234 1/4/2013 12:00:00 AM 2 Jimbo
1/5/2013 12:00:00 AM 3 Jimbo2
1578 1/6/2013 12:00:00 AM 3 Jimbo3
My current Query with the subquery
SELECT customers.sbarcode,
Max(customers.slastname) AS LastName,
Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME) AS
dtCreatedDate,
Count(customers.sbarcode) AS
[Number of Scans],
Stuff ((SELECT ', '
+ RIGHT(CONVERT(VARCHAR, dtcreated, 100), 7) AS [text()]
FROM tickets AS sub
WHERE ( lcustomerid = tickets.lcustomerid )
AND ( dtcreated BETWEEN Cast(Floor(Cast(tickets.dtcreated
AS
FLOAT)) AS
DATETIME
)
AND
Cast(Floor(Cast(tickets.dtcreated
AS FLOAT
)) AS
DATETIME
)
+ '23:59:59' )
AND ( dbltotal <= '0' )
FOR xml path('')), 1, 1, '') AS [Times Scanned]
FROM tickets
INNER JOIN customers
ON tickets.lcustomerid = customers.lcustomerid
WHERE ( tickets.dtcreated BETWEEN #startdate AND #enddate )
AND ( tickets.dbltotal <= 0 )
GROUP BY customers.sbarcode,
Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME),
tickets.lcustomerid
HAVING ( Count(*) > 1 )
ORDER BY dtcreateddate
The Current output (notice the record without a barcode is missing) is:
sBarcode dtcreated Date Number of Scans slastname Times Scanned
1234 1/4/2013 12:00:00 AM 2 Jimbo 12:00PM, 1:00PM
1578 1/6/2013 12:00:00 AM 3 Jimbo3 03:05PM, 1:34PM
UPDATE: Based on our "chat" it seems that customerid is not the unique field but barcode is, even though customer id is the primary key.
Therefore, in order to not GROUP BY customer id in the subquery you need to join to a second customers table in there in order to actually join on barcode.
Try this:
SELECT customers.sbarcode,
Max(customers.slastname) AS LastName,
Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME) AS
dtCreatedDate,
Count(customers.sbarcode) AS
[Number of Scans],
Stuff ((SELECT ', '
+ RIGHT(CONVERT(VARCHAR, dtcreated, 100), 7) AS [text()]
FROM tickets AS subticket
inner join
customers as subcustomers
on
subcustomers.lcustomerid = subticket.lcustomerid
WHERE ( subcustomers.sbarcode = customers.sbarcode )
AND ( subticket.dtcreated BETWEEN Cast(Floor(Cast(tickets.dtcreated
AS
FLOAT)) AS
DATETIME
)
AND
Cast(Floor(Cast(tickets.dtcreated
AS FLOAT
)) AS
DATETIME
)
+ '23:59:59' )
AND ( dbltotal <= '0' )
FOR xml path('')), 1, 1, '') AS [Times Scanned]
FROM tickets
INNER JOIN customers
ON tickets.lcustomerid = customers.lcustomerid
WHERE ( tickets.dtcreated BETWEEN #startdate AND #enddate )
AND ( tickets.dbltotal <= 0 )
GROUP BY customers.sbarcode,
Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME)
HAVING ( Count(*) > 1 )
ORDER BY dtcreateddate
I can't directly solve your problem because I don't understand your data model or what you are trying to accomplish with this query. However, I can give you some advice on how to solve the problem yourself.
First do you understand exactly what you are trying to accomplish and how the tables fit together? If so move on to the next step, if not, get this knowledge first, you cannot do complex queries without this understanding.
Next break up what you are trying to accomplish in little steps and make sure you have each covered before moving to the rest. So in your case you seem to be missing some customers. Start with a new query (I'm pretty sure this one has more than one problem). So start with the join and the where clauses.
I suspect you may need to start with customers and left join to tickets (which would move the where conditions to the left joins as they are on tickets). This will get you all the customers whether they have tickets or not. If that isn't what you want, then work with the jon and the where clasues (and use select * while you are trying to figure things out) until you are returning the exact set of customer records you need. The reason why you use select * at this stage is to see what in the data may be causeing the problem you are having. That may tell you how to fix.
Usually I start with a the join and then add in the where clasues one at a time until I know I am getting the right inital set of records. If you have multiple joins, do them one at time to know when you suddenly start have more or less records than you would expect.
Then go into the more complex parts. Add each in one at a time and check the results. If you suddenly go from 10 records to 5 or 15, then you have probably hit a problem. When you work one step at a time and run into a problem, you know exactly what caused the problem making it much easier to find and fix.
Group BY is important to understand thoroughly. You must have every non-aggregated field in the group by or it will not work. Think of this as law like the law of gravity. It is not something you can change. However it can be worked around through the use of derived tables or CTEs. Please read up on those a bit if you don't know what they are, they are very useful techniques when you get into complex stuff and you shoud understand them thoroughly. I suspect you will need to use the derived table approach here to group on only the things you need and then join that derived table to the rest of teh query to get the ontehr fields. I'll show a simple example:
select
t1.table1id
, t1.field1
, t1.field2
, a.field3
, a.MostRecentDate
From table1 t1
JOIN
(select t1.table1id, t2.field3, max (datefield) as MostRecentDate
from table1 t1
JOin Table2 t2 on t1.table1id = t2.table1id
Where t2.field4 = 'test'
group by t1.table1id,t2.field3) a
ON a.table1id = t1.table1id
Hope this approach helps you solve this problem.