Query to select appropriate feedback record based on dates - sql

I understand there are many questions and answers outlining how to query data with relation to dates stored in SQL. My question, while in the same vein, is not something I've been able to find a solution for.
There is a SolutionRevisions table with three relevant columns: SR_ID (primary), SR_SolutionID, and SR_CreatedDate.
There is a Feedback table with F_ID, F_ItemID (maps to SR_SolutionID) and F_CreatedDate, F_SubItemID (currently empty).
My goal is to identify which SolutionRevisions record was the active when a feedback item was created and insert the SR_ID value into F_SubItemID column. This could be done by choosing the revision record created most recently before the feedback item.
To begin, I've began by trying to select back appropriate values for testing and verification. What I have looks like this:
SELECT sr.SR_ID, f.F_ItemID, sr.SR_CreatedDateUtc, f.F_CreatedDateUtc
FROM Feedback AS f
INNER JOIN SolutionRevisions AS sr on f.F_ItemID = sr.SR_SolutionID
WHERE f.F_CreatedDateUtc > sr.SR_CreatedDateUtc
This is obviously missing a necessary portion of the where clause (thus returning inaccurate results). The where clause needs to include (in pseudo-code)
AND f.F_CreatedDateUtc < nextSRCreatedDate
^ where "nextSRCreatedDate" represents the sr.SR_CreatedDate of the next record on the SolutionRevisions table for the same SR_SolutionID. My issue lies in getting back the correct value for nextSRCreatedDate
Example data:
Any help is appreciated, if more info is requested I'll promptly reply.

Try this approach instead. It uses a subselect to determine the most recent record in the SolutionRevisions table at the time any individual Feedback record was created.
It's probably not the fastest solution (pun intended), but it should get you the desired results. If you're able to alter the subselect to a hit a better index, that would of course help with the execution time.
Update
f
Set
f.F_SubItemID = (
Select Top 1
sr.SR_ID
From
SolutionRevisions sr
Where
sr.SR_SolutionID = f.F_ItemID
And sr.SR_CreatedDateUtc < f.F_CreatedDateUtc
Order By
sr.SR_CreatedDateUtc Desc
)
From
Feedback f

My goal is to identify which SolutionRevisions record was the active when a feedback item was created and insert the SR_SolutionID value into F_SubItemID column.
I think you want outer apply:
SELECT sr.SR_ID, f.F_ItemID, sr.SR_CreatedDateUtc, f.F_CreatedDateUtc
FROM Feedback f OUTER APPLY
(SELECT sr.*
FROM SolutionRevisions sr
WHERE f.F_ItemID = sr.SR_SolutionID AND
f.F_CreatedDateUtc >= sr.SR_CreatedDateUtc
ORDER BY SR_CreatedDateUtc DESC
) sr;

Related

SQL - LEFT JOIN and WHERE statement to show just first row

I read many threads but didn't get the right solution to my problem. It's comparable to this Thread
I have a query, which gathers data and writes it per shell script into a csv file:
SELECT
'"Dose History ID"' = d.dhs_id,
'"TxFieldPoint ID"' = tp.tfp_id,
'"TxFieldPointHistory ID"' = tph.tph_id,
...
FROM txfield t
LEFT JOIN txfielpoint tp ON t.fld_id = tp.fld_id
LEFT JOIN txfieldpoint_hst tph ON fh.fhs_id = tph.fhs_id
...
WHERE d.dhs_id NOT IN ('1000', '10000')
AND ...
ORDER BY d.datetime,...;
This is based on an very big database with lots of tables and machine values. I picked my columns of interest and linked them by their built-in table IDs. Now I have to reduce my result where I get many rows with same values and just the IDs are changed. I just need one(first) row of "tph.tph_id" with the mechanics like
WHERE "Rownumber" is 1
or something like this. So far i couldn't implement a proper subquery or use the ROW_NUMBER() SQL function. Your help would be very appreciated. The Result looks like this and, based on the last ID, I just need one row for every og this numbers (all IDs are not strictly consecutive).
A01";261511;2843119;714255;3634457;
A01";261511;2843113;714256;3634457;
A01";261511;2843113;714257;3634457;
A02";261512;2843120;714258;3634464;
A02";261512;2843114;714259;3634464;
....
I think "GROUP BY" may suit your needs.
You can group rows with the same values for a set of columns into a single row

SQL query seems to work for 'AND T1.email_address_ IN (subquery)', but returns 0 rows for 'AND T1.email_address_ NOT IN (subquery)'

Good morning. I'm working in Responsys Interact, which is an Oracle-based email campaign management type SAAS product. I'm creating a query to basically filter a target list for an email campaign designed to target a specific sub-set of our master email contact list. Here's the query I created a few weeks ago that appears to work:
/*
Table Symbolic Name
CONTACTS_LIST $A$
Engaged $B$
TRANSACTIONS_RAW $C$
TRANSACTION_LINES_RAW $D$
-- A Responsys Filter (Engaged) will return only an RIID_, nothing else, according to John # Responsys....so,....let's join on that to contact list...
*/
SELECT
DISTINCT $A$.EMAIL_ADDRESS_,
$A$.RIID_,
$A$.FIRST_NAME,
$A$.LAST_NAME,
$A$.EMAIL_PERMISSION_STATUS_
FROM
$A$
JOIN $B$ ON $B$.RIID_ = $A$.RIID_
LEFT JOIN $C$ ON $C$.EMAIL_ADDRESS_ = $A$.EMAIL_ADDRESS_
LEFT JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$A$.EMAIL_DOMAIN_ NOT IN ('none.com', 'noemail.com', 'mailinator.com', 'nomail.com') AND
/* don't include hp customers */
$A$.HP_PLAN_START_DATE IS NULL AND
$A$.EMAIL_ADDRESS_ NOT IN
(
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
/* Get only purchase transactions for certain item_id's/SKU's */
($D$.ITEM_FAMILY_ID IN (3,4,5,8,14,15) OR $D$.ITEM_ID IN (704,769,1893,2808,3013) ) AND
/* .... within last 60 days (i.e. 2 months) */
$A$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -2)
)
;
This seems to work, in that if I run the query without the sub-query, we get 720K rows; and if I add back the 'AND NOT IN...' subquery, we get about 700K rows, which appears correct based on what my user knows about her data. What I'm (supposedly) doing with the NOT IN subquery is filtering out any email addresses where the customer has purchased certain items from us in the last 60 days.
So, now I need to add in another constraint. We still don't want customers who made certain purchases in the last 60 days as above, but now also we want to exclude customers who have purchased another particular item, but now within the last 12 months. So, I thought I would add another subquery, as shown below. Now, this has introduced several problems:
Performance - the query, which took a couple minutes to run before, now takes quite a few more minutes to run - in fact it seems to time out....
So, I wondered if there's an issue having two subqueries, but before I went to think about alternatives to this, I decided to test my new subquery by temporarily deleting the first subquery, so that I had just one subquery similar to above, but with the new item = 11 and within the last 12 months logic. And so with this, the query finally returned after a few minutes now, but with zero rows.
Trying to figure out why, I tried simply changing the AND NOT IN (subquery) to AND IN (subquery), and that worked, in that it returned a few thousand rows, as expected.
So why would the same SQL when using AND IN (subquery) "work", but the exact same SQL simply changed to AND NOT IN (subquery) return zero rows, instead of what I would expect which would be my 700 something thousdand plus rows, less the couple thousand encapsulated by the subquery result?
Also, what is the best i.e. most performant way to accomplish what I'm trying to do, which is filter by some purchases made within one date range, AND by some other purchases made within a different date range?
Here's the modified version:
SELECT
DISTINCT $A$.EMAIL_ADDRESS_,
$A$.RIID_,
$A$.FIRST_NAME,
$A$.LAST_NAME,
$A$.EMAIL_PERMISSION_STATUS_
FROM
$A$
JOIN $B$ ON $B$.RIID_ = $A$.RIID_
LEFT JOIN $C$ ON $C$.EMAIL_ADDRESS_ = $A$.EMAIL_ADDRESS_
LEFT JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$A$.EMAIL_DOMAIN_ NOT IN ('none.com', 'noemail.com', 'mailinator.com', 'nomail.com') AND
/* don't include hp customers */
$A$.HP_PLAN_START_DATE IS NULL AND
$A$.EMAIL_ADDRESS_ NOT IN
(
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
/* Get only purchase transactions for certain item_id's/SKU's */
($D$.ITEM_FAMILY_ID IN (3,4,5,8,14,15) OR $D$.ITEM_ID IN (704,769,1893,2808,3013) ) AND
/* .... within last 60 days (i.e. 2 months) */
$C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -2)
)
AND
$A$.EMAIL_ADDRESS_ NOT IN
(
/* get purchase transactions for another type of item within last year */
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$D$.ITEM_FAMILY_ID = 11 AND $C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -12)
)
;
Thanks for any ideas/insights. I may be missing or mis-remembering some basic SQL concept here - if so please help me out! Also, Responsys Interact runs on top of Oracle - it's an Oracle product - but I don't know off hand what version/flavor. Thanks!
Looks like my problem with the new subquery was due to poor performance due to lack of indexes. Thanks to Alex Poole's comments, I looked in Responsys and there is a facility to get an 'explain' type analysis, and it was throwing warnings, and suggesting I build some indexes. Found the way to do that on the data sources, went back to the explain, and it said, "The query should run without placing an unnecessary burden on the system". And while it still ran for quite a few minutes, it did finally come back with close to the expected number of rows.
Now, I'm on to tackle the other half of the issue, which is to now incorporate this second sub-query in addition to the first, original subquery....
Ok, upon further testing/analysis and refining my stackoverflow search critieria, the answer to the main part of my question dealing with the IN vs. NOT IN can be found here: SQL "select where not in subquery" returns no results
My performance was helped by using Responsys's explain-like feature and adding some indexes, but when I did that, I also happened to add in a little extra SQL in my sub-query's WHERE clause.... when I removed that, even after indexes built, I was back to zero rows returned. That's because as it turned out at least one of the transactions rows for the item family id I was interested in for this additional sub-query had a null value for email address. And as further explained in the link above, when using NOT IN, as soon as you have a null value involved, SQL can't definitively say it's NOT IN, since you can't really compare to null, so as soon as you have a null, the sub-query's going to evaluate 'false', thus zero rows. When using IN, even though there are nulls present, if you get one positive match, well, that's a match, so the sub-query returns 'true', so that's why you'll get rows with IN, but not with NOT IN. I hadn't realized that some of our transaction data may have null email addresses - now I know, so I just added a where not null to the where clause for the email address, and now all's good.

Selecting top n Oracle records with ROWNUM still valid in subquery?

I have the following FireBird query:
update hrs h
set h.plan_week_id=
(select first 1 c.plan_week_id from calendar c
where c.calendar_id=h.calendar_id)
where coalesce(h.calendar_id,0) <> 0
(Intention: For records in hrs with a (non-zero) calendar_id
take calendar.plan_week_id and put it in hrs.plan_week_id)
The trick to select the first record in Oracle is to use WHERE ROWNUM=1, and if understand correctly I do not have to use ROWNUM in a separate outer query because I 'only' match ROWNUM=1 - thanks SO for suggesting Questions that may already have your answer ;-)
This would make it
update hrs h
set h.plan_week_id=
(select c.plan_week_id from calendar c
where (c.calendar_id=h.calendar_id) and (rownum=1))
where coalesce(h.calendar_id,0) <> 0
I'm actually using the 'first record' together with the selection of only one field to guarantee that I get one value back which can be put into h.plan_week_id.
Question: Will the above query work under Oracle as intended?
Right now, I do not have a filled Oracle DB at hand to run the query on.
Like Nicholas Krasnov said, you can test it in SQL Fiddle.
But if you ever find yourself about to use where rownum = 1 in a subquery, alarm bells should go off, because in 90% of the cases you are doing something wrong. Very rarely will you need a random value. Only when all selected values are the same, a rownum = 1 is valid.
In this case I expect calendar_id to be a primary key in calendar. Therefor each record in hrs can only have 1 plan_week_id selected per record. So the where rownum = 1 is not required.
And to answer your question: Yes, it will run just fine. Though the brackets around each where clause are also not required and in fact only confusing (me).

SQL Output Question

Edited
I am running into an error and I know what is happening but I can't see what is causing it. Below is the sql code I am using. Basically I am getting the general results I want, however I am not accurately giving the query the correct 'where' clause.
If this is of any assistance. The count is coming out as this:
Total Tier
1 High
2 Low
There are 4 records in the Enrollment table. 3 are active, and 1 is not. Only 2 of the records should be displayed. 1 for High, and 1 for low. The second Low record that is in the total was flagged as 'inactive' on 12/30/2010 and reflagged again on 1/12/2011 so it should not be in the results. I changed the initial '<=' to '=' and the results stayed the same.
I need to exclude any record from Enrollments_Status_Change that where the "active_status" was changed to 0 before the date.
SELECT COUNT(dbo.Enrollments.Customer_ID) AS Total,
dbo.Phone_Tier.Tier
FROM dbo.Phone_Tier as p
JOIN dbo.Enrollments as eON p.Phone_Model = e.Phone_Model
WHERE (e.Customer_ID NOT IN
(Select Customer_ID
From dbo.Enrollment_Status_Change as Status
Where (Change_Date >'12/31/2010')))
GROUP BY dbo.Phone_Tier.Tier
Thanks for any assistance and I apologize for any confusion. This is my first time here and i'm trying to correct my etiquette on the fly.
If you don't want any of the fields from that table dbo.Enrollment_Status_Change, and you don't seem to use it in any way — why even include it in the JOINs? Just leave it out.
Plus: start using table aliases. This is very hard to read if you use the full table name in each JOIN condition and WHERE clause.
Your code should be:
SELECT
COUNT(e.Customer_ID) AS Total, p.Tier
FROM
dbo.Phone_Tier p
INNER JOIN
dbo.Enrollments e ON p.Phone_Model = e.Phone_Model
WHERE
e.Active_Status = 1
AND EXISTS (SELECT DISTINCT Customer_ID
FROM dbo.Enrollment_Status_Change AS Status
WHERE (Change_Date <= '12/31/2010'))
GROUP BY
p.Tier
Also: most likely, your EXISTS check is wrong — since you didn't post your table structures, I can only guess — but my guess would be:
AND EXISTS (SELECT * FROM dbo.Enrollment_Status_Change
WHERE Change_Date <= '12/31/2010' AND CustomerID = e.CustomerID)
Check for existence of any entries in dbo.Enrollment_Status_Change for the customer defined by e.CustomerID, with a Change_Date before that cut-off date. Right?
Assuming you want to:
exclude all customers whose latest enrollment_status_change record was since the start of 2011
but
include all customers whose latest enrollment_status_change record was earlier than the end of 2010 (why else would you have put that EXISTS clause in?)
Then this should do it:
SELECT COUNT(e.Customer_ID) AS Total,
p.Tier
FROM dbo.Phone_Tier p
JOIN dbo.Enrollments e ON p.Phone_Model = e.Phone_Model
WHERE dbo.Enrollments.Active_Status = 1
AND e.Customer_ID NOT IN (
SELECT Customer_ID
FROM dbo.Enrollment_Status_Change status
WHERE (Change_Date >= '2011-01-01')
)
GROUP BY p.Tier
Basically, the problem with your code is that joining a one-to-many table will always increase the row count. If you wanted to exclude all the records that had a matching row in the other table this would be fine -- you could just use a LEFT JOIN and then set a WHERE clause like Customer_ID IS NULL.
But because you want to exclude a subset of the enrollment_status_change table, you must use a subquery.
Your intention is not clear from the example given, but if you wanted to exclude anyone who's enrollment_status_change as before 2011, but include those who's status change was since 2011, you'd just swap the date comparator for <.
Is this any help?

SQL conundrum, how to select latest date for part, but only 1 row per part (unique)

I am trying to wrap my head around this one this morning.
I am trying to show inventory status for parts (for our products) and this query only becomes complex if I try to return all parts.
Let me lay it out:
single table inventoryReport
I have a distinct list of X parts I wish to display, the result of which must be X # of rows (1 row per part showing latest inventory entry).
table is made up of dated entries of inventory changes (so I only need the LATEST date entry per part).
all data contained in this single table, so no joins necessary.
Currently for 1 single part, it is fairly simple and I can accomplish this by doing the following sql (to give you some idea):
SELECT TOP (1) ldDate, ptProdLine, inPart, inSite, inAbc, ptUm, inQtyOh + inQtyNonet AS in_qty_oh, inQtyAvail, inQtyNonet, ldCustConsignQty, inSuppConsignQty
FROM inventoryReport
WHERE (ldPart = 'ABC123')
ORDER BY ldDate DESC
that gets me my TOP 1 row, so simple per part, however I need to show all X (lets say 30 parts). So I need 30 rows, with that result. Of course the simple solution would be to loop X# of sql calls in my code (but it would be costly) and that would suffice, but for this purpose I would love to work this SQL some more to reduce the x# calls back to the db (if not needed) down to just 1 query.
From what I can see here I need to keep track of the latest date per item somehow while looking for my result set.
I would ultimately do a
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
to limit the parts I need. Hopefully I made my question clear enough. Let me know if you have an idea. I cannot do a DISTINCT as the rows are not the same, the date needs to be the latest, and I need a maximum of X rows.
Thoughts? I'm stuck...
SELECT *
FROM (SELECT i.*,
ROW_NUMBER() OVER(PARTITION BY ldPart ORDER BY ldDate DESC) r
FROM inventoryReport i
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
)
WHERE r = 1
EDIT: Be sure to test the performance of each solution. As pointed out in this question, the CTE method may outperform using ROW_NUMBER.
;with cteMaxDate as (
select ldPart, max(ldDate) as MaxDate
from inventoryReport
group by ldPart
)
SELECT md.MaxDate, ir.ptProdLine, ir.inPart, ir.inSite, ir.inAbc, ir.ptUm, ir.inQtyOh + ir.inQtyNonet AS in_qty_oh, ir.inQtyAvail, ir.inQtyNonet, ir.ldCustConsignQty, ir.inSuppConsignQty
FROM cteMaxDate md
INNER JOIN inventoryReport ir
on md.ldPart = ir.ldPart
and md.MaxDate = ir.ldDate
You need to join into a Sub-query:
SELECT i.ldPart, x.LastDate, i.inAbc
FROM inventoryReport i
INNER JOIN (Select ldPart, Max(ldDate) As LastDate FROM inventoryReport GROUP BY ldPart) x
on i.ldPart = x.ldPart and i.ldDate = x.LastDate