SQL not yielding expected results

SQL not yielding expected results - sql

I have three tables related to this particular query:
Lawson_Employees: LawsonID (pk), LastName, FirstName, AccCode (numeric)
Lawson_DeptInfo: AccCode (pk), AccCode2 (don't ask, HR set up), DisplayName
tblExpirationDates: EmpID (pk), ACLS (date), EP (date), CPR (date), CPR_Imported (date), PALS (date), Note
The goal is to get the data I need to report on all those who have already expired in one or more certification, or are going to expire in the next 90 days.
Some important notes:
This is being run as part of a vbScript, so the 90-day date is being calculated when the script is run. I'm using 2010-08-31 as a placeholder since its the result at the time this question is being posted.
All cards expire at the end of the month. (which is why the above date is for the end of August and not 90 days on the dot)
A valid EP card supersedes ACLS certification, but only the latter is required of some employees. (wasn't going to worry about it until I got this question answered, but if I can get the help I'll take it)
The CPR column contains the expiration date for the last class they took with us. (NULL if they didn't take any classes with us)
The CPR_Imported column contains the expiration date for the last class they took somewhere else. (NULL if they didn't take it elsewhere, and bravo for following policy)
The distinction between CPR classes is important for other reports. For purposes of this report, all we really care about is which one is the most current - or at least is currently current.
If I have to, I'll ignore ACLS and PALS for the time being as it is non-compliance with CPR training that is the big issue at the moment. (not that the others won't be, but they weren't mentioned in the last meeting...)
Here's the query I have so far, which is giving me good data:
SELECT
iEmp.LawsonID, iEmp.LastName, iEmp.FirstName,
dept.AccCode2, dept.DisplayName,
Exp.ACLS, Exp.EP, Exp.CPR, Exp.CPR_Imported, Exp.PALS, Exp.Note
FROM (Lawson_Employees AS iEmp
LEFT JOIN Lawson_DeptInfo AS dept ON dept.AccCode = iEmp.AccCode)
LEFT JOIN tblExpirationDates AS Exp ON iEmp.LawsonID = Exp.EmpID
WHERE iEmp.CurrentEmp = 1
AND ((Exp.ACLS <= #2010-08-31#
AND Exp.ACLS IS NOT NULL)
OR (Exp.CPR <= #2010-08-31#
AND Exp.CPR_Imported <= #2010-08-31#)
OR (Exp.PALS <= #2010-08-31#
AND Exp.PALS IS NOT NULL))
ORDER BY dept.AccCode2, iEmp.LastName, iEmp.FirstName;
After perusing the result set, I think I'm missing some expiration dates that should be in the result set. Am I missing something? This is the sucky part of being the only developer in the department... no one to ask for a little help.

I think the problem is here:
OR (Exp.CPR <= #2010-08-31#
AND Exp.CPR_Imported <= #2010-08-31#)
Null is not less than or greater than anything.
If you need the people who do not have a valid CPR, you will need to include nulls.
It may be easiest to use Nz:
OR (Nz(Exp.CPR,#2010-08-31#) <= #2010-08-31#
AND Nz(Exp.CPR_Imported,#2010-08-31#) <= #2010-08-31#)

Related

SQL question with attempt on customer information

Schema
Question: List all paying customers with users who had 4 or 5 activities during the week of February 15, 2021; also include how many of the activities sent were paid, organic and/or app store. (i.e. include a column for each of the three source types).
My attempt so far:
SELECT source_type, COUNT(*)
FROM activities
WHERE activity_time BETWEEN '02-15-21' AND '02-19-21'
GROUP BY source_type
I would like to get a second opinion on it. I didn't include the accounts table because I don't believe that I need it for this query, but I could be wrong.

Have you tried to run this? It doesn't satisfy the brief on FOUR counts:
List all the ... customers (that match criteria)
There is no customer information included in the results at all, so this is an outright fail.
paying customers
This is the top level criteria, only customers that are not free should be included in the results.
Criteria: users who had 4 or 5 activities
There has been no attempt to evaluate this user criteria in the query, and the results do not provide enough information to deduce it.
there is further ambiguity in this requirement, does it mean that it should only include results if the account has individual users that have 4 or 5 acitvities, or is it simply that the account should have 4 or 5 activities overall.
If this is a test question (clearly this is contrived, if it is not please ask for help on how to design a better schema) then the use of the term User is usually very specific and would suggest that you need to group by or otherwise make specific use of this facet in your query.
Bonus: (i.e. include a column for each of the three source types).
This is the only element that was attempted, as the data is grouped by source_type but the information cannot be correlated back to any specific user or customer.
Next time please include example data and the expected outcome with your post. In preparing the data for this post you would have come across these issues yourself and may have been inspired to ask a different question, or through the process of writing the post up you may have resolved the issue yourself.
without further clarification, we can still start to evolve this query, a good place to start is to exclude the criteria and focus on the format of the output. the requirement mentions the following output requirements:
List Customers
Include a column for each of the source types.
Firstly, even though you don't think you need to, the request clearly states that Customer is an important facet in the output, and in your schema account holds the customer information, so although we do not need to, it makes the data readable by humans if we do include information from the account table.
This is a standard PIVOT style response then, we want a row for each customer, presenting a count that aggregates each of the values for source_type. Most RDBMS will support some variant of a PIVOT operator or function, however we can achieve the same thing with simple CASE expressions to conditionally put a value into projected columns in the result set that match the values we want to aggregate, then we can use GROUP BY to evaluate the aggregation, in this case a COUNT
The following syntax is for MS SQL, however you can achieve something similar easily enough in other RBDMS
OP please tag this question with your preferred database engine...
NOTE: there is NO filtering in this query... yet
SELECT accounts.company_id
, accounts.company_name
, paid = COUNT(st_paid)
, organic = COUNT(st_organic)
, app_store = COUNT(st_app_store)
FROM activities
INNER JOIN accounts ON activities.company_id = accounts.company_id
-- PIVOT the source_type
CROSS APPLY (SELECT st_paid = CASE source_type WHEN 'paid' THEN 1 END
,st_organic = CASE source_type WHEN 'organic' THEN 1 END
,st_app_store = CASE source_type WHEN 'app store' THEN 1 END
) as PVT
GROUP BY accounts.company_id, accounts.company_name
This results in the following shape of result:
company_id
company_name
paid
organic
app_store
apl01
apples
4
8
0
ora01
oranges
6
12
0
Criteria
When you are happy with the shpe of the results and that all the relevant information is available, it is time to apply the criteria to filter this data.
From the requirement, the following criteria can be identified:
paying customers
The spec doesn't mention paying specifically, but it does include a note that (free customers have current_mrr = 0)
Now aren't we glad we did join on the account table :)
users who had 4 or 5 activities
This is very specific about explicitly 4 or 5 activities, no more, no less.
For the sake of simplicity, lets assume that the user facet of this requirement is not important and that is is simply a reference to all users on an account, not just users who have individually logged 4 or 5 activities on their own - this would require more demo data than I care to manufacture right now to prove.
during the week of February 15, 2021.
This one was correctly identified in the original post, but we need to call it out just the same.
OP has used Monday to Friday of that week, there is no mention that weeks start on a Monday or that they end on Friday but we'll go along, it's only the syntax we need to explore today.
In the real world the actual values specified in the criteria should be parameterised, mainly because you don't want to manually re-construct the entire query every time, but also to sanitise input and prevent SQL injection attacks.
Even though it seems overkill for this post, using parameters even in simple queries helps to identify the variable elements, so I will use parameters for the 2nd criteria to demonstrate the concept.
DECLARE #from DateTime = '2021-02-15' -- Date in ISO format
DECLARE #to DateTime = (SELECT DateAdd(d, 5, #from)) -- will match Friday: 2021-02-19
/* NOTE: requirement only mentioned the start date, not the end
so your code should also only rely on the single fixed start date */
SELECT accounts.company_id, accounts.company_name
, paid = COUNT(st_paid), organic = COUNT(st_organic), app_store = COUNT(st_app_store)
FROM activities
INNER JOIN accounts ON activities.company_id = accounts.company_id
-- PIVOT the source_type
CROSS APPLY (SELECT st_paid = CASE source_type WHEN 'paid' THEN 1 END
,st_organic = CASE source_type WHEN 'organic' THEN 1 END
,st_app_store = CASE source_type WHEN 'app store' THEN 1 END
) as PVT
WHERE -- paid accounts = exclude 'free' accounts
accounts.current_mrr > 0
-- Date range filter
AND activity_time BETWEEN #from AND #to
GROUP BY accounts.company_id, accounts.company_name
-- The fun bit, we use HAVING to apply a filter AFTER the grouping is evaluated
-- Wording was explicitly 4 OR 5, not BETWEEN so we use IN for that
HAVING COUNT(source_type) IN (4,5)

I believe you are missing some information there.
without more information on the tables, I can only guess that you also have a customer table. i am going to assume there is a customer_id key that serves as key between both tables
i would take your query and do something like:
SELECT customer_id,
COUNT() AS Total,
MAX(CASE WHEN source_type = "app" THEN "numoperations" END) "app_totals"),
MAX(CASE WHEN source_type = "paid" THEN "numoperations" END) "paid_totals"),
MAX(CASE WHEN source_type = "organic" THEN "numoperations" END) "organic_totals"),
FROM (
SELECT source_type, COUNT() AS num_operations
FROM activities
WHERE activity_time BETWEEN '02-15-21' AND '02-19-21'
GROUP BY source_type
) tb1 GROUP BY customer_id
This is the most generic case i can think of, but does not scale very well. If you get new source types, you need to modify the query, and the structure of the output table also changes. Depending on the sql engine you are using (i.e. mysql vs microsoft sql) you could also use a pivot function.
The previous query is a little bit rough, but it will give you a general idea. You can add "ELSE" statements to the clause, to zero the fields when they have no values, and join with the customer table if you want only active customers, etc.

Query complex in Oracle SQL

I have the following tables and their fields
They ask me for a query that seems to me quite complex, I have been going around for two days and trying things, it says:
It is desired to obtain the average age of female athletes, medal winners (gold, silver or bronze), for the different modalities of 'Artistic Gymnastics'. Analyze the possible contents of the result field in order to return only the expected values, even when there is no data of any specific value for the set of records displayed by the query. Specifically, we want to show the gender indicator of the athletes, the medal obtained, and the average age of these athletes. The age will be calculated by subtracting from the system date (SYSDATE), the date of birth of the athlete, dividing said value by 365. In order to avoid showing decimals, truncate (TRUNC) the result of the calculation of age. Order the results by the average age of the athletes.
Well right now I have this:
select person.gender,score.score
from person,athlete,score,competition,sport
where person.idperson = athlete.idathlete and
athlete.idathlete= score.idathlete and
competition.idsport = sport.idsport and
person.gender='F' and competition.idsport=18 and score.score in
('Gold','Silver','Bronze')
group by
person.gender,
score.score;
And I got this out
By adding the person.birthdate field instead of leaving 18 records of the 18 people who have a medal, I'm going to many more records.
Apart from that, I still have to draw the average age with SYSDATE and TRUNC that I try in many ways but I do not get it.
I see it very complicated or I'm a bit saturated from so much spinning, I need some help.

Reading the task you got, it seems that you're quite close to the solution. Have a look at the following query and its explanation, note the differences from your query, see if it helps.
select p.gender,
((sysdate - p.birthday) / 365) age,
s.score
from person p join athlete a on a.idathlete = p.idperson
left join score s on s.idathlete = a.idathlete
left join competition c on c.idcompetition = s.idcompetition
where p.gender = 'F'
and s.score in ('Gold', 'Silver', 'Bronze')
and c.idsport = 18
order by age;
when two dates are subtracted, the result is number of days. Dividing it by 365, you - roughly - get number of years (as each year has 365 days - that's for simplicity, of course, as not all years have that many days (hint: leap years)). The result is usually a decimal number, e.g. 23.912874918724. In order to avoid that, you were told to remove decimals, so - use TRUNC and get 23 as the result
although data model contains 5 tables, you don't have to use all of them in a query. Maybe the best approach is to go step-by-step. The first one would be to simply select all female athletes and calculate their age:
select p.gender,
((sysdate - p.birthday) / 365 age
from person p
where p.gender = 'F'
Note that I've used a table alias - I'd suggest you to use them too, as they make queries easier to read (table names can have really long names which don't help in readability). Also, always use table aliases to avoid confusion (which column belongs to which table)
Once you're satisfied with that result, move on to another table - athlete It is here just as a joining mechanism with the score table that contains ... well, scores. Note that I've used outer join for the score table because not all athletes have won the medal. I presume that this is what the task you've been given says:
... even when there is no data of any specific value for the set of records displayed by the query.
It is suggested that we - as developers - use explicit table joins which let you to see all joins separated from filters (which should be part of the WHERE clause). So:
NO : from person p, athlete a
where a.idathlete = p.idperson
and p.gender = 'F'
YES: from person p join athlete a on a.idathlete = p.idperson
where p.gender = 'F'
Then move to yet another table, and so forth.
Test frequently, all the time - don't skip steps. Move on to another one only when you're sure that the previous step's result is correct, as - in most cases - it won't automagically fix itself.

How Do I Get Total 1 Time for Multiple Rows

I've been asked to modify a report (which unfortunately was written horribly!! not by me!) to include a count of days. Please note the "Days" is not calculated using "StartDate" & "EndDate" below. The problem is, there are multiple rows per record (users want to see the detail for start & enddate), so my total for "Days" are counting for each row. How can I get the total 1 time without the total in column repeating?
This is what the data looks like right now:
ID Description startdate enddate Days
REA145681 Emergency 11/17/2011 11/19/2011 49
REA145681 Emergency 12/6/2011 12/9/2011 49
REA145681 Emergency 12/10/2011 12/14/2011 49
REA146425 Emergency 11/23/2011 12/8/2011 54
REA146425 Emergency 12/9/2011 12/12/2011 54
I need this:
ID Description startdate enddate Days
REA145681 Emergency 11/17/2011 11/19/2011 49
REA145681 Emergency 12/6/2011 12/9/2011
REA145681 Emergency 12/10/2011 12/14/2011
REA146425 Emergency 11/23/2011 12/8/2011 54
REA146425 Emergency 12/9/2011 12/12/2011
Help please. This is how the users want to see the data.
Thanks in advance!
Liz
--- Here is the query simplified:
select id
,description
,startdate -- users want to see all start dates and enddates
,enddate
,days = datediff(d,Isnull(actualstardate,anticipatedstartdate) ,actualenddate)
from table

As you didn't provide the data of your tables I'll operate over your result as if it was a table. This will result in what you're looking for:
select *,
case row_number() over (partition by id order by id)
when 1 then days
end
from t
Edit:
Looks like you DID added some SQL code. This should be what you're looking for:
select *,
case row_number() over (partition by id order by id)
when 1 then
datediff(d,Isnull(actualstardate,anticipatedstartdate) ,actualenddate)
end
from t

That is a task for the reporting tool. You will have to write something like he next code in teh Display Properties of the Days field:
if RowNumber > 1 AND id = previous_row(id)
then -- hide the value of Days
Colour = BackgroundColour
Days = NULL
Days = ' '
Display = false
... (anything that works)

So they want the output to be exactly the same except that they don't want to see the days listed multiple times for each ID value? And they're quite happy to see the ID and Description repeatedly but the Days value annoys them?
That's not really an SQL question. SQL is about which rows, columns and derived values are supposed to be presented in what order and that part seems to be working fine.
Suppressing the redundant occurrences of the Days value is more a matter of using the right tool. I'm not up on the current tools but the last time I was, QMF was very good for this kind of thing. If a column was the basis for a control break, you could, in effect, select an option for that column that told it not to repeat the value of the control break repeatedly. That way, you could keep it from repeating ID, Description AND Days if that's what you wanted. But I don't know if people are still using QMF and I have no idea if you are. And unless the price has come way down, you don't want to go out and buy QMF just to suppress those redundant values.
Other tools might do the same kind of thing but I can't tell you which ones. Perhaps the tool you are using to do your reporting - Crystal Reports or whatever - has that feature. Or not. I think it was called Outlining in QMF but it may have a different name in your tool.
Now, if this report is being generated by an application program, that is a different kettle of Fish. An application could handle that quite nicely. But most people use end-user reporting tools to do this kind of thing to avoid the greater cost involved in writing programs.
We might be able to help further if you specify what tool you are using to generate this report.

SQL Scheduling - Overbooked Report

I need a way to view a given resource (in this case rooms/beds) that are overbooked. Here's my table structure. Sorry about the Hungarian notation:
tblRoom
--RoomID
tblBooking
--BookingID
--BeginDate
--EndDate
--AssignedRoomID (foreign key)
I don't have any non-working SQL to post here because I really don't know where to start. I'm using MS Access but I'm looking for a database agnostic solution if possible. It's OK to have to have to change some of the keywords to match the dialect of a given SQL engine but I'd like avoid using other features that are proprietary or only available in one RDBMS.
I realize that it's best to avoid overbooking from the beginning but that's not the point of this question.
In case it's helpful, I posted a related question a couple days ago about how to find resources that are not yet booked for a given data range. You can see that question here.
Edit1:
In reply to the answer below, I've modified your SQL slightly to make it work in Access as well as to be more accurate when it comes to detecting conflicts. If I err not your solution posted below allows some conflicts to go unnoticed but also shows conflicts when a given Booking's EndDate and a different Booking's BeginDate fall on the same day, which is actually allowable and should not show as a conflict. Am I understanding this correctly or am I missing something here?
SELECT
*
FROM
tblBooking AS booking
INNER JOIN
tblBooking AS conflict
ON [conflict].AssignedRoomID = [booking].AssignedRoomID
AND (([conflict].BeginDate >= DateAdd("d", -1, [booking].BeginDate) AND [conflict].BeginDate < [booking].EndDate)
OR ([conflict].EndDate > [booking].BeginDate AND [conflict].EndDate < [booking].EndDate))
AND [conflict].BookingID <> [booking].BookingID

So, what you're looking for is any record in tblBooking for which there is another record with the same AssignRoomID for an overlapping period?
A naive solution would be...
SELECT
*
FROM
tblBooking [booking]
INNER JOIN
tblBooking [conflict]
ON [conflict].AssignedRoomID = [booking].AssignedRoomID
AND [conflict].BeginDate <= [booking].EndDate
AND [conflict].EndDate >= [booking].BeginDate
AND [conflict].BookingID != [booking].BookingID
The last condition stops a booking from being it's own conflict. It can also be changed to AND [conflict].BookingID > [booking].BookingID so that you don't get the conflict repeating. (If A conflicts with B, you only get A,B and not B,A.)
EDIT
The issue with the above solution is that it does not scale very well. When searching for a Conflict, all bookings for that room Before the booking's EndDate are found, then filtered based on the EndDate. After a few years, that first search (hopefully using an Index) will return many, many records.
One optimisation is to have a maximum booking length, and only look that many days back in time for a conflict...
INNER JOIN
tblBooking [conflict]
ON [conflict].AssignedRoomID = [booking].AssignedRoomID
AND [conflict].BeginDate <= [booking].EndDate
AND [conflict].BeginDate >= [booking].BeginDate - 7 -- Or however long the max booking length is
AND [conflict].EndDate >= [booking].BeginDate
AND [conflict].BookingID != [booking].BookingID
By having wrapped a >= AND a <= around the [conflict].BeginDate, an index search can now quickly return a reasonably limitted number of records.
For bookings longer than the maximum booking length, they can be entered into the database as multiple bookings. That's where the art of optimisation comes in, it's often all about trade-offs and compromises :)
EDIT
Another option, giving different details, would be to join the bookings against a calendar table. (Having, for example, one record per day.)
SELECT
[room].RoomID,
[calendar].Date,
COUNT(*) AS [total_bookings],
MIN([booking].BookingID) AS [min_booking_id],
MAX([booking].BookingID) AS [max_booking_id]
FROM
[calendar]
CROSS JOIN
tblRoom [room]
INNER JOIN
tblBooking [booking]
ON [booking].AssignedRoomID = [room].RoomID
AND [booking].BeginDate <= [calendar].Date
AND [booking].EndDate >= [calendar].Date
GROUP BY
[room].RoomID,
[calendar].Date
HAVING
COUNT(*) > 1

Complicated SQL Query Question

I am developing freelancer site and in this site users(project owners and experts) can leave feedback for each one. I am try to find count of feedbacks waitting to leave.
This query returns project's watting feedback count which are have no feedback in last 30 days, user id = 3 and have suitable status code:
SELECT COUNT(*)
FROM projects
WHERE projects.status IN (5, 10) AND projects.status_created >= DATE_SUB('2010-12-17 21:24:51', INTERVAL 30 DAY)
AND NOT EXISTS(
SELECT * FROM
feedbacks WHERE feedbacks WHERE projects.id = feedbacks.project_id AND feedbacks.from_id = '3'
)
This query is works when we have only 2 users in database otherwise for example if we change user id 3 to 99(user which have no relationship with project), query still return 1 for count but it should be return 0.
My database scheme:
PROJECTS(id, project_owner_id, project_title, ...)
FEEDBACKS(id, project_id, to_id, from_id, ....)
PROJECT_BIDS(id, project_id, bid_owner_id, accepted, ...) We can use this table for find out which user's bid is accepted then accepted bid owner have right for leave feedback.
We can use project_bids.accepted field for find out which users have relationship with project. If accepted true then project's freelancer expert is this user. Also projects.project_owner_id is another column to determine relationship.
How can i fix my query ? Thank you.

Your query (as written) is looking for the number of projects that have been created in the last 30 days have have attached comments/feedback, and the person in question has commented on this project.
The first thing that stands out is that you're checking the date the projected was created, not the date of the comments/feedback. If you do this, when the project becomes more than 30 days old, no more feedbacks will count when running the query. You most likely will want to add a timestamp to the feedbacks table and check that field instead.
Also, you're doing a count of the number of projects, rather than the number of feedbacks that meet the criteria.
For you're query, I would try something like:
SELECT COUNT(feedbacks.id)
FROM feedbacks, projects
WHERE
projects.id = feedbacks.project_id AND
projects.status IN (5, 10) AND
feedbacks.timestamp >= DATE_SUB('2010-12-17 21:24:51', INTERVAL 30 DAY)
ORDER BY projects.id
This will find the number of feedbacks per each project (of the given status). If you want to count only the feedbacks that were given by the person who won the bid, you can add to the WHERE clause:
AND feedbacks.from in (
SELECT project_bids.bid_owner_id
FROM project_bids
WHERE
project_bids.accepted = 1 AND
project_bids.project_id = projects.id
)
Your English is a bit difficult to understand, so please clarify I misunderstood something.
Note to everyone else: I'm still trying to get used to the Mark Down system. Feel free to correct my formatting above.

NOT EXISTS(SELECT FROM ...feedbacks.from_id = '99')
Is always true: 99(user which have no relationship with project),
Thats why you «still return 1 »

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas