SQL duration between dates for different persons - sql

Hopefully someone can help me with the following task:
I have two tables, Treatment and Person. Treatment contains the dates when treatments for the different persons were started; Person contains personal information, e.g. the last name.
Now I have to find all persons where the duration between the first and last treatment is over 20 years.
The Tables look something like this:
Person
| PK_Person | First name | Name |
_________________________________
| 1 | A_Test | Karl |
| 2 | B_Test | Marie |
| 3 | C_Test | Steve |
| 4 | D_Test | Jack |
Treatment
| PK_Treatment | Description | Starting time | PK_Person |
_________________________________________________________
| 1 | A | 01.01.1989 | 1
| 2 | B | 02.11.2001 | 1
| 3 | A | 05.01.2004 | 1
| 4 | C | 01.09.2013 | 1
| 5 | B | 01.01.1999 | 2
So in this example, the output should be person Karl, A_Test.
Hopefully it's understandable what the problem is and someone can help me.
Edit: There seems to be a problem with the formatting; the tables are not displayed correctly. I hope it's readable.

SELECT *
FROM person p
INNER JOIN Treatment t on t.PK_Person = p.PK_Person
WHERE DATEDIFF(year,[TREATMENT_DATE_1], [TREATMENT_DATE_2]) > 20
This should do it. It is untested, however, so it will need tweaking to your schema; in particular, the two bracketed date placeholders have to be replaced with real columns.

Your data looks a bit suspicious, because the first name doesn't look like a first name.
But, what you want to do is aggregate the Treatment table for each person and get the minimum and maximum starting times. When the difference is greater than 20 years, then keep the person, and join back to the person table to get the names.
select p.FirstName, p.LastName
from Person p join
(select pk_person, MIN(StartingTime) as minst, MAX(StartingTime) as maxst
from Treatment t
group by pk_person
having MAX(StartingTime) - MIN(StartingTime) > 20*365.25
) t
on p.pk_person = t.pk_person;
Note that date arithmetic does vary between databases. In most databases, taking the difference of two dates counts the number of days between them, so this is a pretty general approach (although not guaranteed to work on all databases).
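As a minimal sketch of the aggregate-and-join approach above, the following runs it with Python's built-in sqlite3 module on the sample rows from the question. The snake_case column names, the ISO date strings and the use of SQLite's julianday() for the day difference are assumptions made for this illustration; other databases need their own date arithmetic.

```python
import sqlite3

def long_term_patients(conn, years=20):
    """Return (first_name, name) for persons whose first and last
    treatment are more than `years` years apart (365.25 days per year)."""
    sql = """
        SELECT p.first_name, p.name
        FROM person p
        JOIN (SELECT pk_person
              FROM treatment
              GROUP BY pk_person
              HAVING julianday(MAX(starting_time))
                   - julianday(MIN(starting_time)) > ? * 365.25
             ) t ON t.pk_person = p.pk_person
        ORDER BY p.pk_person
    """
    return conn.execute(sql, (years,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (pk_person INTEGER PRIMARY KEY, first_name TEXT, name TEXT);
    CREATE TABLE treatment (pk_treatment INTEGER PRIMARY KEY, description TEXT,
                            starting_time TEXT, pk_person INTEGER);
    INSERT INTO person VALUES (1,'A_Test','Karl'),(2,'B_Test','Marie'),
                              (3,'C_Test','Steve'),(4,'D_Test','Jack');
    -- dates from the question, rewritten as ISO strings (assumption)
    INSERT INTO treatment VALUES
        (1,'A','1989-01-01',1),(2,'B','2001-11-02',1),
        (3,'A','2004-01-05',1),(4,'C','2013-09-01',1),
        (5,'B','1999-01-01',2);
""")
result = long_term_patients(conn)
print(result)
```

Persons without any treatment, or with a single one, drop out automatically because their difference is missing or zero.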

I've taken a slightly different approach and worked with SQL Fiddle to verify that the below statements work.
As mentioned previously, the data does seem a bit suspicious; nonetheless per your requirements, you would be able to do the following:
select P.PK_Person, p.FirstName, p.Name
from person P
inner join treatment T on T.pk_person = P.pk_person
where DATEDIFF(
        (select x.startingtime
         from treatment x
         where x.pk_person = p.pk_person
         order by x.startingtime desc
         limit 1),
        T.StartingTime) > 7305
First, we need to inner join treatments, which ignores any persons who are not in the treatment table. The where portion then just needs to select based on your criteria (in this case a difference of dates). The subquery returns the last date a person has been treated; that date is compared to each of the person's records, and the rows are filtered by number of days (7305 = 20 years * 365.25).
Here is the working SQL Fiddle sample.

Related

In a query (no editing of tables) how do I join data without any similarities?

I have a query that returns a table; here's an example.
Name |Age |Hair |Happy | Sad |
Jon | 15 | Black |NULL | NULL|
Kyle | 18 |Blonde |YES |NULL |
Brad | 17 | Blue |NULL |YES |
Name and age come from one table in a database, hair color comes from a second which is joined, and happy and sad come from a third table. My goal would be to make the first line of the chart like this:
Name |Age |Hair |Happy |Sad |
Jon | 15 |Black |Yes |Yes |
Basically I want to get rid of the rows under the first and get the non NULL data joined to the right. The problem is that there is no column where the Yes values are on the Jon row, so I have no idea how to get them there. Any suggestions?
PS. With the data I am using I can't just put a 'YES' in the 'Jon' row and call it a day, I would need to find the specific value from the lower rows and somehow get that value in the boxes that are NULL.
Do you just want COALESCE()?
COALESCE(Happy, 'Yes') as happy
COALESCE() replaces a NULL value with another value.
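A small sketch of that behaviour, run through Python's sqlite3 module. The table name mood and its two rows are made up for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mood (name TEXT, happy TEXT);   -- hypothetical table
    INSERT INTO mood VALUES ('Jon', NULL), ('Kyle', 'YES');
""")
# COALESCE returns its first non-NULL argument, so only Jon's NULL is replaced.
rows = conn.execute(
    "SELECT name, COALESCE(happy, 'Yes') AS happy FROM mood ORDER BY name"
).fetchall()
print(rows)
```

Note that this only substitutes a fixed fallback value; it does not pull a value from another row.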
If you want to join on a NULL value, work with nested selects. The inner select produces an id for NULLs; the outer select joins on it:
select COALESCE(x.Happy, yn_table.description) as happy, ...
from
(select
t1.Happy,
CASE WHEN t1.Happy is null THEN 1 END as happy_id
from t1 ...) x
left join yn_table
on x.happy_id = yn_table.id
If you apply an ORDER BY in a subquery, you can then select the first row relative to this order with WHERE rownum = 1 in the outer query (Oracle assigns rownum before the ORDER BY of the same query level is applied). If you don't apply an ORDER BY, the order is not defined.
After reading your new comment...
the sense is that in my real data the yes under the other names will be a number of a piece of equipment. I want the numbers of the equipment in one row instead of having like 8 rows with only 4 ' yes' values and the rest null.
... I come to the conclusion that this is an XY problem.
You are asking about a detail you think will solve your problem, instead of explaining the problem and asking how to solve it.
If you want to store several pieces of equipment per person, you need three tables.
You need a Person table, an Article table and a junction table relating articles to persons to equip them. Let's call this table Equipment.
Person
------
PersonId (Primary Key)
Name
optional attributes like age, hair color
Article
-------
ArticleId (Primary Key)
Description
optional attributes like weight, color etc.
Equipment
---------
PersonId (Primary Key, Foreign Key to table Person)
ArticleId (Primary Key, Foreign Key to table Article)
Quantity (optional, if each person can have only one of each article, we don't need this)
Let's say we have
Person: PersonId | Name
1 | Jon
2 | Kyle
3 | Brad
Article: ArticleId | Description
1 | Hat
2 | Bottle
3 | Bag
4 | Camera
5 | Shoes
Equipment: PersonId | ArticleId | Quantity
1 | 1 | 1
1 | 4 | 1
1 | 5 | 1
2 | 3 | 2
2 | 4 | 1
Now Jon has a hat, a camera and shoes. Kyle has 2 bags and one camera. Brad has nothing.
You can query the persons and their equipment like this
SELECT
p.PersonId, p.Name, a.ArticleId, a.Description AS Equipment, e.Quantity
FROM
Person p
LEFT JOIN Equipment e
ON p.PersonId = e.PersonId
LEFT JOIN Article a
ON e.ArticleId = a.ArticleId
ORDER BY p.Name, a.Description
The result will be
PersonId | Name | ArticleId | Equipment | Quantity
---------+------+-----------+-----------+---------
3 | Brad | NULL | NULL | NULL
1 | Jon | 4 | Camera | 1
1 | Jon | 1 | Hat | 1
1 | Jon | 5 | Shoes | 1
2 | Kyle | 3 | Bag | 2
2 | Kyle | 4 | Camera | 1
See example: http://sqlfiddle.com/#!4/7e05d/2/0
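To get back to the "one row per person" shape the question asked for, the joined rows can be grouped after the query. The sketch below is one way to do that with Python's sqlite3 module; the snake_case names are assumptions, and the grouping is done in Python rather than with a database-specific string aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (article_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE equipment (
        person_id  INTEGER REFERENCES person(person_id),
        article_id INTEGER REFERENCES article(article_id),
        quantity   INTEGER DEFAULT 1,
        PRIMARY KEY (person_id, article_id)      -- junction table key
    );
    INSERT INTO person  VALUES (1,'Jon'),(2,'Kyle'),(3,'Brad');
    INSERT INTO article VALUES (1,'Hat'),(2,'Bottle'),(3,'Bag'),(4,'Camera'),(5,'Shoes');
    INSERT INTO equipment VALUES (1,1,1),(1,4,1),(1,5,1),(2,3,2),(2,4,1);
""")
rows = conn.execute("""
    SELECT p.name, a.description, e.quantity
    FROM person p
    LEFT JOIN equipment e ON e.person_id = p.person_id
    LEFT JOIN article a   ON a.article_id = e.article_id
    ORDER BY p.name, a.description
""").fetchall()

# Collapse the joined rows into one entry per person.
equipment_by_person = {}
for name, description, quantity in rows:
    items = equipment_by_person.setdefault(name, [])
    if description is not None:          # LEFT JOIN yields NULLs for Brad
        items.append((description, quantity))
print(equipment_by_person)
```

The LEFT JOINs keep Brad in the result even though he owns nothing, so he ends up with an empty list instead of disappearing.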
Since you tagged the question with the oracle tag, you could just use NVL(), which allows you to specify a value that would replace a NULL value in the column you select from.
Assuming that you want the 1st row because it contains the smallest age:
- wrap your query inside a CTE
- in another CTE get the 1st row of the query
- in another CTE get the max values of Happy and Sad of your query (for your sample data they both are 'YES')
- cross join the last 2 CTEs.
with
cte as (
<your query here>
),
firstrow as (
select name, age, hair from cte
order by age
fetch first row only
),
maxs as (
select max(happy) happy, max(sad) sad
from cte
)
select f.*, m.*
from firstrow f cross join maxs m
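Here is a runnable sketch of the CTE approach above on the sample data, using Python's sqlite3 module. The table name survey stands in for "your query" and is an assumption, and SQLite's LIMIT 1 is swapped in for Oracle's fetch first row only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey (name TEXT, age INTEGER, hair TEXT, happy TEXT, sad TEXT);
    INSERT INTO survey VALUES
        ('Jon',  15, 'Black',  NULL,  NULL),
        ('Kyle', 18, 'Blonde', 'YES', NULL),
        ('Brad', 17, 'Blue',   NULL,  'YES');
""")
row = conn.execute("""
    WITH cte AS (
        SELECT name, age, hair, happy, sad FROM survey   -- stands in for the real query
    ),
    firstrow AS (
        SELECT name, age, hair FROM cte ORDER BY age LIMIT 1
    ),
    maxs AS (
        SELECT MAX(happy) AS happy, MAX(sad) AS sad FROM cte  -- MAX skips NULLs
    )
    SELECT f.name, f.age, f.hair, m.happy, m.sad
    FROM firstrow f CROSS JOIN maxs m
""").fetchone()
print(row)
```

Because both firstrow and maxs produce exactly one row, the cross join also produces exactly one row.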
You can try this:
SELECT A.Name,
A.Age,
B.Hair,
C.Happy,
C.Sad
FROM A
INNER JOIN B
ON A.Name = B.Name
INNER JOIN C
ON A.Name = C.Name
(Assuming that Name is the key column in all three tables.)

JOIN two tables, but only include data from first table in first instance of each unique record

Title might be confusing.
I have a table of Cases, and each Case can contain many Tasks. To achieve a different workflow for each Task, I have different tables such as Case_Emails, Case_Calls, Case_Chats, etc...
I want to build a Query that will eventually be exported to Excel. In this query, I want to list out each Task, and the Tasks are already joined together via a UNION in another table using a common format. For each task in the Query, I want only the first Task associated with a case to include the details from Cases table. Example below:
+----+---------+------------+-------------+-------------+-------------+
| id | Case ID | Agent Name | Task Info 1 | Task Info 2 | Task Info 3 |
+----+---------+------------+-------------+-------------+-------------+
| 1 | 4000000 | Some Name | Detailstuff | Stuffdetail | Thingsyo |
| 2 | | | Detailstuff | Stuffdetail | Thingsyo |
| 3 | | | Detailstuff | Stuffdetail | Thingsyo |
| 4 | 4000003 | Some Name | Detailstuff | Stuffdetail | Thingsyo |
| 5 | | | Detailstuff | Stuffdetail | Thingsyo |
| 6 | 4000006 | Some Name | Detailstuff | Stuffdetail | Thingsyo |
+----+---------+------------+-------------+-------------+-------------+
My original approach was attempting a LEFT JOIN on Case ID, but I couldn't figure out how to filter the data out from the extra rows.
This would be much simpler if Access supported the ROW_NUMBER function. It doesn't, but you can sort of simulate it with a correlated subquery using the Tasks table (this assumes that each task has a unique numeric ID). This basically assigns a row number to each task, partitioned by the CaseID. Then you can just conditionally display the CaseID and AgentName where RowNum = 1.
SELECT Switch(RowNum = 1, CaseID) as Case,
Switch(RowNum = 1, AgentName) as Agent,
TaskName
FROM (
SELECT c.CaseID,
c.AgentName,
t.TaskName,
(select count(*)
from Tasks t2
where t2.CaseID = c.CaseID and t2.ID <= t.ID) as RowNum
FROM Cases c
INNER JOIN Tasks t ON c.CaseID = t.CaseID
order by c.CaseID, t.TaskName
)
You didn't post your table structure, so I'm not sure this will work for you as-is, but maybe you can adapt it.
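The same row-number simulation can be tried out with Python's sqlite3 module. This is only a sketch: the cases and tasks tables and their columns are assumed, and CASE WHEN replaces the Access-specific Switch() function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (case_id INTEGER PRIMARY KEY, agent_name TEXT);
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, case_id INTEGER, task_name TEXT);
    INSERT INTO cases VALUES (4000000, 'Some Name'), (4000003, 'Other Name');
    INSERT INTO tasks VALUES (1, 4000000, 'Email'), (2, 4000000, 'Call'),
                             (3, 4000000, 'Chat'),  (4, 4000003, 'Email'),
                             (5, 4000003, 'Call');
""")
rows = conn.execute("""
    SELECT CASE WHEN row_num = 1 THEN case_id END    AS shown_case_id,
           CASE WHEN row_num = 1 THEN agent_name END AS shown_agent,
           task_name
    FROM (
        SELECT c.case_id, c.agent_name, t.id AS task_id, t.task_name,
               -- number of tasks in the same case with an id up to this one
               (SELECT COUNT(*) FROM tasks t2
                WHERE t2.case_id = t.case_id AND t2.id <= t.id) AS row_num
        FROM cases c
        JOIN tasks t ON t.case_id = c.case_id
    ) AS numbered
    ORDER BY case_id, task_id
""").fetchall()
for r in rows:
    print(r)
```

Only the first task of each case carries the case details; the remaining rows show NULL (None in Python) in those columns.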
No matter what, when you join you will get duplicate values. To remove the duplicates, either put a DISTINCT in your select or a GROUP BY after your filters. This should resolve the duplicates in your query for task info 1, 2, 3.
Found out that I can name my tables in the query like so:
FROM Case_Calls Calls
With this other name, I was able to filter based on a sub query:
IIF( Calls.[ID] <> (select top 1 [ID] from Case_Calls where [Case ID] = Calls.[Case ID]), '', Cases.[Creator]) As [Case Creator]
This solution gives me the results that I want :) It's rather ugly SQL, and difficult to parse when I'm dealing with dozens of columns, but it gets the job done!
I'm still curious if there is a better solution...

Need advice about JOIN by LIKE operator

I have two data tables. One contains the customer data, such as customer ID and used bonus codes. The second table is for internal notes; every note that I write about the customer is there, for example: gave to customer 12 bonus code GFT100.
Now I need to join these two tables based on the bonus code: for every bonus code the player used, I want to find the relevant note in the notes table.
Table 1: Used bonus codes
CustomerID | Coupon_Code | DateOfUsege
--------------------------------------
12 | AAA25 | 2016-09-10
-------------------------------------
12 | BBB13 | 2016-09-10
--------------------------------------
17 | CCC14 | 2016-09-10
Table2:Customer Notes
CustomerID| Date | Text
---------------------------------------------
12 |2016-09-07| Gave bonus AAA25
----------------------------------------------
12 |2016-09-07| Very good customer
----------------------------------------------
17 |2016-09-06| Gave bonus code CCC14
Desired output: for each used code in table 1 add only the relevant note from table 2
CustomerID | Coupon_Code | DateOfUsege | Text |
----------------------------------------------------------------
12 | AAA25 | 2016-09-10 | Gave bonus AAA25 |
----------------------------------------------------------------
17 | CCC14 | 2016-09-10 | Gave bonus code CCC14 |
-----------------------------------------------------------------
How can i do that?
I'd advise adding a nullable coupon_code column to your notes table (assuming a note may only pertain to zero or one coupon code) and recording the optional code with each note when applicable. Searching for the code in the free text values could return false positives.
... but, this join should get you started
SELECT a.CustomerID, a.Coupon_Code, a.DateOfUsege, b.Text
FROM `used_bonus_codes` a
INNER JOIN `customer_notes` b
ON a.CustomerID = b.CustomerID
AND b.Text LIKE CONCAT('%', a.Coupon_Code, '%')
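For illustration, the same LIKE join can be run on the question's sample rows with Python's sqlite3 module. This is a sketch under assumptions: the snake_case table and column names are invented here, and SQLite's || operator replaces MySQL's CONCAT().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE used_bonus_codes (customer_id INTEGER, coupon_code TEXT, date_of_usage TEXT);
    CREATE TABLE customer_notes   (customer_id INTEGER, note_date TEXT, text TEXT);
    INSERT INTO used_bonus_codes VALUES
        (12, 'AAA25', '2016-09-10'), (12, 'BBB13', '2016-09-10'), (17, 'CCC14', '2016-09-10');
    INSERT INTO customer_notes VALUES
        (12, '2016-09-07', 'Gave bonus AAA25'),
        (12, '2016-09-07', 'Very good customer'),
        (17, '2016-09-06', 'Gave bonus code CCC14');
""")
rows = conn.execute("""
    SELECT a.customer_id, a.coupon_code, a.date_of_usage, b.text
    FROM used_bonus_codes a
    INNER JOIN customer_notes b
            ON a.customer_id = b.customer_id
           AND b.text LIKE '%' || a.coupon_code || '%'   -- code appears anywhere in the note
    ORDER BY a.customer_id, a.coupon_code
""").fetchall()
for r in rows:
    print(r)
```

Code BBB13 has no matching note, so the inner join drops it; a LEFT JOIN would keep it with a NULL note instead.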

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored in a PostgreSQL server running PostgreSQL 9.3.0. However, I find myself blocked on the last query. The database models a reservation system for an opera house. The query is about associating each spectator with the other spectators who attend the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez@gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille@gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of the spectator and the date of the show he's attending to, the query looks like this.
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand better the problem and hint me towards finding a solution. Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the STUFF function. The only drawback here is that, since your ids are ints, placing a comma between values requires a workaround (each id has to be cast to a string). Below is the method I can think of using that workaround.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
This assumes that (id_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in query 2 and 3 below also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
@> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table has to be processed, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
In query 1, @> is the "contains" operator for arrays, so we get all spectators that have seen at least the same shows.
Faster than 1. because only relevant shows are considered.
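The counting idea behind query 2 can be sketched without arrays, which makes it easy to try with Python's sqlite3 module. The reservation rows below are invented for the illustration, and the sketch relies on the same assumption as above, namely that (id_spectator, id_show) is unique.

```python
import sqlite3

def companions(conn, prime):
    """Spectators who have a reservation for every show the prime spectator booked."""
    sql = """
        WITH shows AS (                      -- all shows of the prime spectator
            SELECT id_show FROM reservations WHERE id_spectator = :prime
        )
        SELECT r.id_spectator
        FROM reservations r
        JOIN shows s USING (id_show)
        WHERE r.id_spectator <> :prime
        GROUP BY r.id_spectator
        HAVING COUNT(*) = (SELECT COUNT(*) FROM shows)   -- matched every show
        ORDER BY r.id_spectator
    """
    return [row[0] for row in conn.execute(sql, {"prime": prime})]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reservations (
        id_spectator INTEGER, id_show INTEGER,
        UNIQUE (id_spectator, id_show)
    );
    -- made-up data: 1 saw {1, 11}, 2 saw {1, 11, 14}, 3 saw {1, 11}, 4 saw {1}
    INSERT INTO reservations VALUES
        (1, 1), (1, 11), (2, 1), (2, 11), (2, 14), (3, 1), (3, 11), (4, 1);
""")
print(companions(conn, 1))
```

Spectator 4 is excluded for prime spectator 1 because only one of the two shows matches, while spectator 2 qualifies even though they saw an additional show.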
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Rows that don't qualify are excluded early. The two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.
So the final answer I got, looks like this :
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

SQL Query X Days back excluding date ranges (Confusing!)

Ok, I have a tough SQL query, and I'm not sure how to go about writing it.
I am summing the number of "bananas collected" by an employee within the last X days, but what I could really use help on is determining X.
The "last X days" value is defined to be the last 100 days that the employee was NOT out due to Purple Fever, starting from some ChosenDate (we'll say today, 6/24/14). That is to say, if the person was sick with Purple Fever for 3 days, then I want to look back over the last 103 days from ChosenDate rather than the last 100 days. Any other reason the employee may have been out does not affect our calculation.
Table PersonOutIncident
+----------------------+----------+-------------+
| PersonOutIncidentID | PersonID | ReasonOut |
+----------------------+----------+-------------+
| 1 | Sarah | PurpleFever |
| 2 | Sarah | PaperCut |
| 3 | Jon | PurpleFever |
| 4 | Sarah | PurpleFever |
+----------------------+----------+-------------+
Table PersonOutDetail
+-------------------+----------------------+-----------+-----------+
| PersonOutDetailID | PersonOutIncidentID | BeginDate | EndDate |
+-------------------+----------------------+-----------+-----------+
| 1 | 1 | 1/1/2014 | 1/3/2014 |
| 2 | 1 | 1/7/2014 | 1/13/2014 |
| 3 | 2 | 2/1/2014 | 2/3/2014 |
| 4 | 3 | 1/15/2014 | 1/20/2014 |
| 5 | 4 | 5/1/2014 | 5/15/2014 |
+-------------------+----------------------+-----------+-----------+
The tables are established. Many PersonOutDetail records can be associated with one PersonOutIncident record and there may be multiple PersonOutIncident records for a single employee (That is to say, there could be two or three PersonOutIncident records with an identical ReasonOut column, because they represent a particular incident or event and the not-necessarily-continuous days lost due to that particular incident)
The nature of this requirement complicates things, even conceptually to me.
The best I can think of is to check for a BeginDate/EndDate pair within the 100 day base period, then determine the number of days from BeginDate to EndDate and add that to the base 100 days. But then I would have to check again that this new range doesn't overlap or contain additional BeginDate/EndDate pairs and add, if so, add those days as well. I can tell already that this isn't the method I want to use, but I can't wrap my mind quite around how exactly what I need to start/structure this query. Does anyone have an idea that might steer me in the correct direction? I realize this might not be clear and I apologize if I'm just confusing things.
One way to do this is to work with a table or WITH clause that contains a list of days. Let's say days is a table with one column that contains the last 200 days. (This means the query will break if the employee had more than 100 sick days in the last 200 days.)
Now you can get a list of all working days of an employee like this (replace ? with the employee id):
WITH t1 AS
(
SELECT day,
ROW_NUMBER() OVER (ORDER BY day DESC) AS 'RowNumber'
FROM days d
WHERE NOT EXISTS (SELECT * FROM PersonOutDetail pd
INNER JOIN PersonOutIncident po ON po.PersonOutIncidentID = pd.PersonOutIncidentID
WHERE d.day BETWEEN pd.BeginDate AND pd.EndDate
AND po.ReasonOut = 'PurpleFever'
AND po.PersonID = ?)
)
SELECT * FROM t1
WHERE RowNumber <= 100;
Alternatively, you can obtain the '100th day' by replacing RowNumber <= 100 with RowNumber = 100.
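A runnable sketch of this idea, using Python's sqlite3 module and the sample data from the question. Several things here are assumptions for the illustration: the snake_case names, ISO date strings, a recursive CTE that generates the days table on the fly, and a 200-day horizon. It needs an SQLite version with window functions (3.25 or later).

```python
import sqlite3

def nth_day_back(conn, person, chosen_date, n=100, horizon=200):
    """Return the n-th day, counting back from chosen_date, on which
    `person` was not out with PurpleFever (None if beyond the horizon)."""
    sql = """
        WITH RECURSIVE days(day, i) AS (       -- the last `horizon` days
            SELECT date(:chosen), 1
            UNION ALL
            SELECT date(day, '-1 day'), i + 1 FROM days WHERE i < :horizon
        ),
        t1 AS (
            SELECT d.day, ROW_NUMBER() OVER (ORDER BY d.day DESC) AS rn
            FROM days d
            WHERE NOT EXISTS (
                SELECT 1
                FROM person_out_detail pd
                JOIN person_out_incident po
                  ON po.person_out_incident_id = pd.person_out_incident_id
                WHERE d.day BETWEEN pd.begin_date AND pd.end_date
                  AND po.reason_out = 'PurpleFever'
                  AND po.person_id = :person)
        )
        SELECT day FROM t1 WHERE rn = :n
    """
    row = conn.execute(sql, {"chosen": chosen_date, "horizon": horizon,
                             "person": person, "n": n}).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_out_incident (person_out_incident_id INTEGER PRIMARY KEY,
                                      person_id TEXT, reason_out TEXT);
    CREATE TABLE person_out_detail (person_out_detail_id INTEGER PRIMARY KEY,
                                    person_out_incident_id INTEGER,
                                    begin_date TEXT, end_date TEXT);
    INSERT INTO person_out_incident VALUES
        (1,'Sarah','PurpleFever'),(2,'Sarah','PaperCut'),
        (3,'Jon','PurpleFever'),(4,'Sarah','PurpleFever');
    INSERT INTO person_out_detail VALUES
        (1,1,'2014-01-01','2014-01-03'),(2,1,'2014-01-07','2014-01-13'),
        (3,2,'2014-02-01','2014-02-03'),(4,3,'2014-01-15','2014-01-20'),
        (5,4,'2014-05-01','2014-05-15');
""")
print(nth_day_back(conn, "Sarah", "2014-06-24"))
print(nth_day_back(conn, "Jon", "2014-06-24"))
```

Sarah's 15 Purple Fever days in May push her window back by 15 days compared to Jon's, while her paper cut is ignored. The returned date can then serve as the lower bound when summing the bananas collected.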