Access Query Help - Get records within a timeframe of another record - sql

I'm looking for a way of querying a table to get events of a certain type, and all events that happen within the time-frame of the criteria event for the same person. That probably sounded like nonsense. Consider the following:
Imagine I want to get all "SHIFT"s for each person (a person could have multiple shifts per day) and their associated breaks (but there could be other things as well). A way to query within a date range would be good as well. Eventually I'm going to be working with years' worth of data, not all of which is necessary to everybody.
This example would return the first three rows, plus the last two. Row 5 is a BREAK, but it doesn't occur within a SHIFT for person 1.
I would love to provide some code but I honestly can't even think where to start with this one. I guess I'd need a sub query? Any help would be greatly appreciated!
I'm mostly using Access 2003, so responses geared towards that would be ideal.

The way you've described the problem, it appears you want the shifts and related breaks as separate rows. To do this, you can use Union All to combine the two types of row. A correlated subquery lets you find the breaks that occur during shifts: inside the Exists clause, e2 is the shift for the same person that contains the break e1.
Select
    *
From
    Events
Where
    Event_Name = 'SHIFT'
Union All
Select
    *
From
    Events e1
Where
    Event_Name = 'BREAK' And
    Exists (
        Select
            'x'
        From
            Events e2
        Where
            e1.Event_Owner = e2.Event_Owner And
            e2.Event_Name = 'SHIFT' And
            e1.Event_Start >= e2.Event_Start And
            e1.Event_End <= e2.Event_End
    )
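To try the query without Access, here is a minimal sketch that runs the same Union All / Exists statement against an in-memory SQLite database from Python. The sample rows are invented for illustration (the question's original table is not shown here), and the Event_ID column is an assumption; only Event_Owner, Event_Name, Event_Start and Event_End come from the query above.

```python
import sqlite3

# Invented sample data. Event_ID and every row value are assumptions;
# the remaining column names follow the query above.
EVENTS = [
    (1, 1, "SHIFT",   "2011-01-03 08:00", "2011-01-03 16:00"),
    (2, 1, "BREAK",   "2011-01-03 10:00", "2011-01-03 10:15"),
    (3, 1, "BREAK",   "2011-01-03 12:00", "2011-01-03 12:30"),
    (4, 1, "MEETING", "2011-01-03 14:00", "2011-01-03 15:00"),
    (5, 1, "BREAK",   "2011-01-03 18:00", "2011-01-03 18:15"),  # after person 1's shift
    (6, 2, "SHIFT",   "2011-01-03 12:00", "2011-01-03 20:00"),
    (7, 2, "BREAK",   "2011-01-03 15:00", "2011-01-03 15:15"),
]

QUERY = """
    Select * From Events Where Event_Name = 'SHIFT'
    Union All
    Select * From Events e1
    Where Event_Name = 'BREAK' And Exists (
        Select 'x' From Events e2
        Where e1.Event_Owner = e2.Event_Owner
          And e2.Event_Name = 'SHIFT'
          And e1.Event_Start >= e2.Event_Start
          And e1.Event_End <= e2.Event_End
    )
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Events (Event_ID INTEGER, Event_Owner INTEGER, "
    "Event_Name TEXT, Event_Start TEXT, Event_End TEXT)"
)
conn.executemany("INSERT INTO Events VALUES (?, ?, ?, ?, ?)", EVENTS)

# ISO-formatted timestamps compare correctly as text, so no date type is needed.
matched_ids = sorted(row[0] for row in conn.execute(QUERY))
print(matched_ids)  # [1, 2, 3, 6, 7]
```

Row 5 is left out even though it falls inside person 2's shift, because the subquery only accepts a shift with the same Event_Owner. Row 4 is left out because it is neither a SHIFT nor a BREAK.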

Related

Order by in subquery behaving differently than native sql query?

So I am honestly a little puzzled by this!
I have a query that returns a set of transactions that contain both repair costs and an odometer reading at the time of repair on the master level. To get an accurate Cost per mile reading I need to do a subquery to get both the first meter reading between a starting date and an end date, and an ending meter.
(select top 1 wf2.ro_num
from wotrans wotr2
left join wofile wf2
on wotr2.rop_ro_num = wf2.ro_num
and wotr2.rop_fac = wf2.ro_fac
where wotr.rop_veh_num = wotr2.rop_veh_num
and wotr.rop_veh_facility = wotr2.rop_veh_facility
AND ((@sdate = '01/01/1900 00:00:00' and wotr2.rop_tran_date = 0)
OR ([dbo].[udf_RTA_ConvertDateInt](@sdate) <= wotr2.rop_tran_date
AND [dbo].[udf_RTA_ConvertDateInt](@edate) >= wotr2.rop_tran_date))
order by wotr2.rop_tran_date asc) as highMeter
The reason I have the tables aliased as xx2 is because those tables are also used in the main query, and I don't want these to interact with each other except to pull the correct vehicle number and facility.
Basically, when I run the main query it returns a value that is not correct; it returns the second one (keep in mind that the first and second have the same date). But when I copy and paste the subquery into its own query and run it, it returns the correct value.
I do have a workaround for this, but I am just curious as to why this is happening. I have searched quite a bit and not found much (other than the fact that people don't like ORDER BY in subqueries). Talking to one of my friends who also does quite a bit of SQL scripting, it looks to us as if the subquery orders rows differently inside the main query than it does by itself when multiple rows have the same value for the ORDER BY column (i.e. 10 dates of 08/05/2016).
Any ideas would be helpful!
Like I said I have a work around that works in this one case, but don't know yet if it will work on a larger dataset.
Let me know if you want more code.
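No answer is recorded here, but the hypothesis in the question matches how SQL treats ties: ORDER BY makes no promise about the relative order of rows with equal sort values, so a "top 1" can legitimately pick a different row when the same subquery is planned inside a larger query. One common way to remove the ambiguity is to add a tiebreaker column to the ORDER BY. The sketch below illustrates that idea with SQLite from Python; the repairs table and its columns are hypothetical stand-ins for wotrans, and LIMIT 1 stands in for TOP 1.

```python
import sqlite3

def first_repair(rows):
    """Return the ro_num of the earliest repair for vehicle V1.

    The ORDER BY includes ro_num as a tiebreaker, so rows that share a
    tran_date are always ranked the same way.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for wotrans.
    conn.execute(
        "CREATE TABLE repairs (ro_num INTEGER, veh_num TEXT, tran_date INTEGER)"
    )
    conn.executemany("INSERT INTO repairs VALUES (?, ?, ?)", rows)
    row = conn.execute(
        "SELECT ro_num FROM repairs WHERE veh_num = ? "
        "ORDER BY tran_date ASC, ro_num ASC LIMIT 1",
        ("V1",),
    ).fetchone()
    conn.close()
    return row[0]

# Two repairs share the earliest date, as in the question.
rows = [(502, "V1", 20160805), (501, "V1", 20160805), (503, "V1", 20160806)]

pick_a = first_repair(rows)
pick_b = first_repair(list(reversed(rows)))  # same data, different insert order
print(pick_a, pick_b)  # 501 501
```

Which column makes a sensible tiebreaker depends on the data; ro_num is used here only because it is unique in the invented rows.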

SQL counting number of rows

I am looking for a way to search for a certain number of rows as a quality check. For example, we have tables that have a certain set of results that are needed.
Here is a quick table for an example:
ID          Name  Result  Reportable
ONE         A     10      X
TWO         B     12      X
THREE       C     1
FOUR        D     18      X
FOUR(redo)  D     11      X
So we are looking to double check results as there are people who accidentally report results multiple times (as in the case with ID FOUR). We have used having counts but we need the numbers to be specific and need a query to verify that number is satisfied.
In the table above we only want IDs ONE, TWO, and FOUR, however we have 4 results (one extra). Currently we have our check showing the count needed (i.e. 3) and the current result count (4) to show the mismatch, but we want a query that easily shows only the results needed. We would need the redo result most of the time, so we have set it to take the latest date, but that doesn't help filter how many rows or results there are. I apologize if anything is confusing; I am not able to share the SQL query that we have currently. It's my first time posting, so if I need to clarify anything please let me know, as this seems to be very complicated. Thank you for your time.
EDIT: The details
We have one table (Table A) letting us know which results are reportable. The ones that are reportable go into another table (Table B). We have had issues in which people have made too many results reportable, which overpopulates Table B. Our old query had a count on Table B, but because people mistakenly marked multiple results as reportable, samples with many redos appeared to be finished: all the redos were placed in Table B and met the count.
So now by using the Table A that helps tell us how many are Reportable, we want this to double check that the samples are indeed ready.
As I understand the question, you want ids that have multiple reportables. Assuming you really mean name, then:
select name
from t
where reportable = 'X'
group by name
having count(*) >= 2;
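As a quick check, here is a small sketch that loads the example table from the question into an in-memory SQLite database via Python and runs the query above. It also computes the two counts the question mentions. Treating "count needed" as the number of distinct names with a reportable result is an assumption on my part, as is the table name t and the choice to store the blank Reportable as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, name TEXT, result INTEGER, reportable TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("ONE", "A", 10, "X"),
        ("TWO", "B", 12, "X"),
        ("THREE", "C", 1, None),  # blank Reportable stored as NULL (assumption)
        ("FOUR", "D", 18, "X"),
        ("FOUR(redo)", "D", 11, "X"),
    ],
)

# Names that were reported more than once (the query from the answer).
duplicated = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM t WHERE reportable = 'X' "
        "GROUP BY name HAVING COUNT(*) >= 2"
    )
]

# Reportable rows versus distinct reportable names.
reported, needed = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT name) FROM t WHERE reportable = 'X'"
).fetchone()

print(duplicated)        # ['D']
print(reported, needed)  # 4 3
```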

SQL query seems to work for 'AND T1.email_address_ IN (subquery)', but returns 0 rows for 'AND T1.email_address_ NOT IN (subquery)'

Good morning. I'm working in Responsys Interact, which is an Oracle-based email campaign management type SAAS product. I'm creating a query to basically filter a target list for an email campaign designed to target a specific sub-set of our master email contact list. Here's the query I created a few weeks ago that appears to work:
/*
Table Symbolic Name
CONTACTS_LIST $A$
Engaged $B$
TRANSACTIONS_RAW $C$
TRANSACTION_LINES_RAW $D$
-- A Responsys Filter (Engaged) will return only an RIID_, nothing else, according to John @ Responsys....so,....let's join on that to contact list...
*/
SELECT
DISTINCT $A$.EMAIL_ADDRESS_,
$A$.RIID_,
$A$.FIRST_NAME,
$A$.LAST_NAME,
$A$.EMAIL_PERMISSION_STATUS_
FROM
$A$
JOIN $B$ ON $B$.RIID_ = $A$.RIID_
LEFT JOIN $C$ ON $C$.EMAIL_ADDRESS_ = $A$.EMAIL_ADDRESS_
LEFT JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$A$.EMAIL_DOMAIN_ NOT IN ('none.com', 'noemail.com', 'mailinator.com', 'nomail.com') AND
/* don't include hp customers */
$A$.HP_PLAN_START_DATE IS NULL AND
$A$.EMAIL_ADDRESS_ NOT IN
(
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
/* Get only purchase transactions for certain item_id's/SKU's */
($D$.ITEM_FAMILY_ID IN (3,4,5,8,14,15) OR $D$.ITEM_ID IN (704,769,1893,2808,3013) ) AND
/* .... within last 60 days (i.e. 2 months) */
$A$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -2)
)
;
This seems to work, in that if I run the query without the sub-query, we get 720K rows; and if I add back the 'AND NOT IN...' subquery, we get about 700K rows, which appears correct based on what my user knows about her data. What I'm (supposedly) doing with the NOT IN subquery is filtering out any email addresses where the customer has purchased certain items from us in the last 60 days.
So, now I need to add in another constraint. We still don't want customers who made certain purchases in the last 60 days as above, but now also we want to exclude customers who have purchased another particular item, but now within the last 12 months. So, I thought I would add another subquery, as shown below. Now, this has introduced several problems:
Performance - the query, which took a couple minutes to run before, now takes quite a few more minutes to run - in fact it seems to time out....
So, I wondered if there's an issue having two subqueries, but before I went to think about alternatives to this, I decided to test my new subquery by temporarily deleting the first subquery, so that I had just one subquery similar to above, but with the new item = 11 and within the last 12 months logic. And so with this, the query finally returned after a few minutes now, but with zero rows.
Trying to figure out why, I tried simply changing the AND NOT IN (subquery) to AND IN (subquery), and that worked, in that it returned a few thousand rows, as expected.
So why would the same SQL "work" when using AND IN (subquery), but return zero rows when simply changed to AND NOT IN (subquery), instead of what I would expect, which would be my 700-something thousand rows, less the couple thousand encapsulated by the subquery result?
Also, what is the best i.e. most performant way to accomplish what I'm trying to do, which is filter by some purchases made within one date range, AND by some other purchases made within a different date range?
Here's the modified version:
SELECT
DISTINCT $A$.EMAIL_ADDRESS_,
$A$.RIID_,
$A$.FIRST_NAME,
$A$.LAST_NAME,
$A$.EMAIL_PERMISSION_STATUS_
FROM
$A$
JOIN $B$ ON $B$.RIID_ = $A$.RIID_
LEFT JOIN $C$ ON $C$.EMAIL_ADDRESS_ = $A$.EMAIL_ADDRESS_
LEFT JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$A$.EMAIL_DOMAIN_ NOT IN ('none.com', 'noemail.com', 'mailinator.com', 'nomail.com') AND
/* don't include hp customers */
$A$.HP_PLAN_START_DATE IS NULL AND
$A$.EMAIL_ADDRESS_ NOT IN
(
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
/* Get only purchase transactions for certain item_id's/SKU's */
($D$.ITEM_FAMILY_ID IN (3,4,5,8,14,15) OR $D$.ITEM_ID IN (704,769,1893,2808,3013) ) AND
/* .... within last 60 days (i.e. 2 months) */
$C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -2)
)
AND
$A$.EMAIL_ADDRESS_ NOT IN
(
/* get purchase transactions for another type of item within last year */
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$D$.ITEM_FAMILY_ID = 11 AND $C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -12)
)
;
Thanks for any ideas/insights. I may be missing or mis-remembering some basic SQL concept here - if so please help me out! Also, Responsys Interact runs on top of Oracle - it's an Oracle product - but I don't know off hand what version/flavor. Thanks!
Looks like my problem with the new subquery was due to poor performance due to lack of indexes. Thanks to Alex Poole's comments, I looked in Responsys and there is a facility to get an 'explain' type analysis, and it was throwing warnings, and suggesting I build some indexes. Found the way to do that on the data sources, went back to the explain, and it said, "The query should run without placing an unnecessary burden on the system". And while it still ran for quite a few minutes, it did finally come back with close to the expected number of rows.
Now, I'm on to tackle the other half of the issue, which is to now incorporate this second sub-query in addition to the first, original subquery....
OK, upon further testing/analysis and refining my Stack Overflow search criteria, the answer to the main part of my question dealing with IN vs. NOT IN can be found here: SQL "select where not in subquery" returns no results
My performance was helped by using Responsys's explain-like feature and adding some indexes, but when I did that, I also happened to add a little extra SQL to my sub-query's WHERE clause.... When I removed that, even after the indexes were built, I was back to zero rows returned. As it turned out, at least one of the transaction rows for the item family id I was interested in for this additional sub-query had a null value for email address. And as further explained in the link above, when using NOT IN, as soon as a null value is involved, SQL can't definitively say a value is NOT IN the list, since a comparison with null is neither true nor false. So as soon as the sub-query returns a null, the NOT IN test can never evaluate to true for any row (it is either false or unknown), thus zero rows. When using IN, even though there are nulls present, one positive match is still a match, so the test evaluates to true; that's why you'll get rows with IN, but not with NOT IN. I hadn't realized that some of our transaction data may have null email addresses. Now I know, so I just added an IS NOT NULL condition on the email address to the sub-query's WHERE clause, and now all's good.
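The null behaviour is easy to reproduce on a tiny data set. The sketch below uses SQLite from Python rather than Oracle/Responsys, and the contacts and transactions tables and their rows are invented for illustration. It shows IN returning a match, NOT IN returning nothing once the sub-query yields a null, and two ways around it: filtering nulls out of the sub-query, or using NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables; one transaction has a NULL email.
conn.executescript(
    """
    CREATE TABLE contacts (email TEXT);
    CREATE TABLE transactions (email TEXT);
    INSERT INTO contacts VALUES ('a@example.com'), ('b@example.com'), ('c@example.com');
    INSERT INTO transactions VALUES ('b@example.com'), (NULL);
    """
)

def emails(where_clause):
    """Return contact emails matching the given WHERE clause, sorted."""
    sql = "SELECT c.email FROM contacts c WHERE " + where_clause + " ORDER BY c.email"
    return [row[0] for row in conn.execute(sql)]

with_in = emails("c.email IN (SELECT t.email FROM transactions t)")
with_not_in = emails("c.email NOT IN (SELECT t.email FROM transactions t)")
not_in_no_nulls = emails(
    "c.email NOT IN (SELECT t.email FROM transactions t WHERE t.email IS NOT NULL)"
)
with_not_exists = emails(
    "NOT EXISTS (SELECT 1 FROM transactions t WHERE t.email = c.email)"
)

print(with_in)          # ['b@example.com']
print(with_not_in)      # []
print(not_in_no_nulls)  # ['a@example.com', 'c@example.com']
print(with_not_exists)  # ['a@example.com', 'c@example.com']
```

NOT EXISTS is not affected by nulls in the sub-query because the equality test simply fails for the null row instead of turning the whole predicate unknown. Whether it also performs better in Responsys is something I have not tested.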

SQL Selecting records where one date range doesn't intersect another

I'm trying to write a simple reservation program for a campground.
I have a table for campsites (one record for every site available at the campground).
I have a table for visitors which uses the campsite table's id as a foreign key, along with a check in date and check out date.
What I need to do is gather a potential check in and check out date from the user and then gather all the campsites that are NOT being used at any point in that range of dates.
I think I'm close to the solution but there's one piece I seem to be missing.
I'm using 2 queries.
1) Gather all the campsites that are occupied during that date range.
2) Gather all campsites that are not in query 1.
This is my first query:
SELECT Visitors.CampsiteID, Visitors.CheckInDate, Visitors.CheckOutDate
FROM Visitors
WHERE (((Visitors.CheckInDate)>=#CHECKINDATE#
And (Visitors.CheckInDate)<=#CHECKOUTDATE#)
Or ((Visitors.CheckOutDate)>=#CHECKINDATE#
And (Visitors.CheckOutDate)<=#CHECKOUTDATE#));
I think I'm missing something. If the #CHECKINDATE# and #CHECKOUTDATE# both occur between someone else's Check-in and Check-out dates, then this doesn't catch it.
I know I could split this between two queries, where one is dealing with just the #CHECKINDATE# and the second is dealing with the #CHECKOUTDATE#, but I figure there's a cleaner way to do this and I'm just not coming up with it.
This is my second one, which I think is fine the way it is:
SELECT DISTINCT Campsites.ID, qryCampS_NotAvailable.CampsiteID
FROM Campsites LEFT JOIN qryCampS_NotAvailable
ON Campsites.ID = qryCampS_NotAvailable.CampsiteID
WHERE (((qryCampS_NotAvailable.CampsiteID) Is Null));
Thanks,
Charles
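To get records that overlap with the requested time period, use this simple logic: two time periods overlap when the first starts before the second ends and the first ends after the second starts. This also catches the case where the requested dates fall entirely inside an existing stay: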
SELECT v.CampsiteID, v.CheckInDate, v.CheckOutDate
FROM Visitors v
WHERE v.CheckInDate <= #CHECKOUTDATE# and
v.CheckOutDate >= #CHECKINDATE# ;
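To see the overlap test at work, here is a minimal sketch using SQLite from Python with invented bookings. It folds the two queries into one by putting the overlap condition inside a NOT EXISTS; that is a choice made for this sketch, not something the Access solution requires. The table and column names follow the question, and the dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Campsites (ID INTEGER PRIMARY KEY);
    CREATE TABLE Visitors (CampsiteID INTEGER, CheckInDate TEXT, CheckOutDate TEXT);
    INSERT INTO Campsites VALUES (1), (2), (3), (4);
    -- Invented bookings: site 2's stay encloses the requested range.
    INSERT INTO Visitors VALUES
        (1, '2024-07-01', '2024-07-05'),
        (2, '2024-07-10', '2024-07-20'),
        (3, '2024-07-12', '2024-07-13');
    """
)

def available_sites(check_in, check_out):
    """Return IDs of campsites with no stay overlapping the requested dates."""
    sql = """
        SELECT c.ID
        FROM Campsites c
        WHERE NOT EXISTS (
            SELECT 1
            FROM Visitors v
            WHERE v.CampsiteID = c.ID
              AND v.CheckInDate <= :check_out
              AND v.CheckOutDate >= :check_in
        )
        ORDER BY c.ID
    """
    params = {"check_in": check_in, "check_out": check_out}
    return [row[0] for row in conn.execute(sql, params)]

available = available_sites("2024-07-12", "2024-07-15")
print(available)  # [1, 4]
```

With <= and >=, a guest checking out on the day another checks in counts as an overlap. If same-day turnover should be allowed, strict < and > would be the change to make.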

What is an unbounded query?

Is an unbounded query a query without a WHERE param = value statement?
Apologies for the simplicity of this one.
An unbounded query is one where the search criteria are not particularly specific, and which is thus likely to return a very large result set. A query without a WHERE clause would certainly fall into this category, but let's consider for a moment some other possibilities. Let's say we have tables as follows:
CREATE TABLE SALES_DATA
  (ID_SALES_DATA NUMBER PRIMARY KEY,
   TRANSACTION_DATE DATE NOT NULL,
   LOCATION NUMBER NOT NULL,
   TOTAL_SALE_AMOUNT NUMBER NOT NULL,
   ...etc...);
CREATE TABLE LOCATION
(LOCATION NUMBER PRIMARY KEY,
DISTRICT NUMBER NOT NULL,
...etc...);
Suppose that we want to pull in a specific transaction, and we know the ID of the sale:
SELECT * FROM SALES_DATA WHERE ID_SALES_DATA = <whatever>
In this case the query is bounded, and we can guarantee it's going to pull in either one or zero rows.
Another example of a bounded query, but with a large result set would be the one produced when the director of district 23 says "I want to see the total sales for each store in my district for every day last year", which would be something like
SELECT S.LOCATION, TRUNC(S.TRANSACTION_DATE), SUM(S.TOTAL_SALE_AMOUNT)
FROM SALES_DATA S,
     LOCATION L
WHERE S.TRANSACTION_DATE >= DATE '2009-01-01' AND
      S.TRANSACTION_DATE < DATE '2010-01-01' AND
      L.LOCATION = S.LOCATION AND
      L.DISTRICT = 23
GROUP BY S.LOCATION,
         TRUNC(S.TRANSACTION_DATE)
ORDER BY S.LOCATION,
         TRUNC(S.TRANSACTION_DATE)
In this case the query should return 365 (or fewer, if stores are not open every day) rows for each store in district 23. If there are 25 stores in the district it'll return 9125 rows or fewer.
On the other hand, let's say our VP of Sales wants some data. He/she/it isn't quite certain what's wanted, but he/she/it is pretty sure that whatever it is happened in the first six months of the year...not quite sure about which year...and not sure about the location, either - probably in district 23 (he/she/it has had a running feud with the individual who runs district 23 for the past 6 years, ever since that golf tournament where...well, never mind...but if a problem can be hung on the door of district 23's director so be it!)...and of course he/she/it wants all the details, and have it on his/her/its desk toot sweet! And thus we get a query that looks something like
SELECT L.DISTRICT, S.LOCATION, S.TRANSACTION_DATE,
S.something, S.something_else, S.some_more_stuff
FROM SALES_DATA S,
LOCATION L
WHERE EXTRACT(MONTH FROM S.TRANSACTION_DATE) <= 6 AND
L.LOCATION = S.LOCATION
ORDER BY L.DISTRICT,
S.LOCATION
This is an example of an unbounded query. How many rows will it return? Good question - that depends on how business conditions were, how many locations were open, how many days there were in February, etc.
Put more simply, if you can look at a query and have a pretty good idea of how many rows it's going to return (even though that number might be relatively large) the query is bounded. If you can't, it's unbounded.
Share and enjoy.
http://hibernatingrhinos.com/Products/EFProf/learn#UnboundedResultSet
An unbounded result set is where a query is performed and does not explicitly limit the number of returned results from a query. Usually, this means that the application assumes that a query will always return only a few records. That works well in development and in testing, but it is a time bomb waiting to explode in production.
The query may suddenly start returning thousands upon thousands of rows, and in some cases, it may return millions of rows. This leads to more load on the database server, the application server, and the network. In many cases, it can grind the entire system to a halt, usually ending with the application servers crashing with out of memory errors.
Here is one example of a query that will trigger the unbounded result set warning:
var query = from post in blogDataContext.Posts
where post.Category == "Performance"
select post;
If the performance category has many posts, we are going to load all of them, which is probably not what was intended. This can be fixed fairly easily by using pagination by utilizing the Take() method:
var query = (from post in blogDataContext.Posts
where post.Category == "Performance"
select post)
.Take(15);
Now we are assured that we only need to handle a predictable, small result set, and if we need to work with all of them, we can page through the records as needed. Paging is implemented using the Skip() method, which instructs Entity Framework to skip (at the database level) N number of records before taking the next page.
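For readers who are not on Entity Framework, the same idea can be sketched at the SQL level. The example below is an analogue of Take() and Skip() written with SQLite from Python, not Entity Framework code: LIMIT plays the role of Take() and OFFSET the role of Skip(). The posts table, its contents, and the fetch_page helper are all hypothetical.

```python
import sqlite3

PAGE_SIZE = 15

conn = sqlite3.connect(":memory:")
# Hypothetical table with 40 posts in one category; ids are assigned 1..40.
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO posts (category) VALUES (?)", [("Performance",)] * 40
)

def fetch_page(page_number):
    """Return the ids on one zero-based page of 'Performance' posts."""
    sql = (
        "SELECT id FROM posts WHERE category = ? "
        "ORDER BY id LIMIT ? OFFSET ?"
    )
    args = ("Performance", PAGE_SIZE, page_number * PAGE_SIZE)
    return [row[0] for row in conn.execute(sql, args)]

first_page = fetch_page(0)
last_page = fetch_page(2)
print(len(first_page), first_page[0], first_page[-1])  # 15 1 15
print(len(last_page), last_page[0], last_page[-1])     # 10 31 40
```

The ORDER BY matters: without a stable sort order, consecutive pages are not guaranteed to be disjoint or complete.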
But there is another common occurrence of the unbounded result set problem from directly traversing the object graph, as in the following example:
var post = postRepository.Get(id);
foreach (var comment in post.Comments)
{
// do something interesting with the comment
}
Here, again, we are loading the entire set without regard for how big the result set may be. Entity Framework does not provide a good way of paging through a collection when traversing the object graph. It is recommended that you would issue a separate and explicit query for the contents of the collection, which will allow you to page through that collection without loading too much data into memory.