I have a massive table full of hospital visit information. Each row corresponds to one visit. The visit/row itself has a unique ID but also contains a person ID (patient) to match back to that persons specific information.
I'm building a "new patient" sequence model. In doing so, I need to remove any patient from the table who has one (or more) visits before a set date. I can't just remove records before that date, as those "loyal patients" will still have visit information.
I tried to build a look-up table with all the patient ID's that have one or more visits before a certain time. I then tried to use this table to remove all visit information for patients who have had one or more visit before that set time.
I've tried multiple variations of the below (with statements, delete statements, having statements ect.) Each time, the final table has no values. I have verified that there are "new patients" with visit dates only after the set date.
My logic feels solid but clearly something is off. Here is the last command I tried. Any help would be greatly appreciated!
create table client_myvisit_notnew_id as
select patient, admissiondate from client_myvisit_primary_temp1
where admissiondate < '2015-05-30 00:00:00';
create table client_myvisit_primary_temp2 as
select * from client_myvisit_primary_temp1
where patient not in
(select patient from client_myvisit_notnew_id);
not in is a very dangerous construct with subqueries. If any of the values returned by the subquery is NULL, then nothing ever passes the filter. Although you can fix this by adding a where clause, I suggest that you get used to not exists instead:
select mpt.*
from client_myvisit_primary_temp1 mpt
where not exists (select 1
from client_myvisit_notnew_id mni
where mpt.patient = mni.patient
);
This has the semantics that most people expect.
EDIT:
If you just want patients who's original visit is after a certain date, then use window functions:
select mpt.*
from (select mpt.*,
min(mpt.admission_date) over (partition by mpt.patient) as min_ad
from client_myvisit_primary_temp1
) mpt
where min_ad >= '2015-05-30';
Defining multiple views is not necessary.
Related
I have a database of a service that helps people sell things. If they fail a delivery of a sale, they get penalised. I am trying to extract the number of active listings each user had when a particular penalty was applied.
I have the equivalent to the following tables(and relevant fields):
user (id)
listing (id, user_id, status)
transaction (listing_id, seller_id)
listing_history (id, listing_status, date_created)
penalty (id, transaction_id, user_id, date_created)
The listing_history table saves an entry every time a listing is modified, saving a record of what the new state of the listing is.
My goal is to end with a result table with the field: penalty_id, and number of active listings the penalised user had when the penalty was applied.
So far I have the following:
SELECT s1.penalty_id,
COUNT(s1.record_id) 'active_listings'
FROM (
SELECT penalty.id AS 'penalty_id',
listing_history.id AS 'record_id',
FROM user
JOIN penalty ON penalty.user_id = user.id
JOIN transaction ON transaction.id = penalty.transaction_id
JOIN listing_history ON listing_history.listing_id = listing.id
WHERE listing_history.date_created < penalty.date_created
AND listing_history.status = 0
) s1
GROUP BY s1.penalty_id
Status = 0 means that the listing is active (or that the listing was active at the time the record was created). I got results similar to what I expected, but I fear I may be missing something or may be doing the JOINs wrong. Would this have your approval? (apart from the obvious non-use of aliases, for clarity problems).
UPDATE - As the comments on this answer indicate that changing the table structure isn't an option, here are more details on some queries you could use with the existing structure.
Note that I made a couple changes to the query before even modifying the logic.
As viki888 pointed out, there was a problem reference to listing.id; I've replaced it.
There was no real need for a subquery in the original query; I've simplified it out.
So the original query is rewritten as
SELECT penalty.id AS 'penalty_id'
, COUNT(listing_history.id) 'active_listings'
FROM user
JOIN penalty
ON penalty.user_id = user.id
JOIN transaction
ON transaction.id = penalty.transaction_id
JOIN listing_history
ON listing_history.listing_id = transaction.listing_id
WHERE listing_history.date_created < penalty.date_created
AND listing_history.status = 0
GROUP BY penalty.id
Now the most natural way, in my opinion, to write the corrected timeline constraint is with a NOT EXISTS condition that filters out all but the most recent listing_history record for a given id. This does require thinking about some edge cases:
Could two listing history records have the same create date? If so, how do you decide which happened first?
If a listing history record is created on the same day as the penalty, which is treated as happening first?
If the created_date is really a timestamp, then this may not matter much (if at all); if it's really a date, it might be a bigger issue. Since your original query required that the listing history be created before the penalty, I'll continue in that style; but it's still ambiguous how to handle the case where two history records with matching status have the same date. You may need to adjust the date comparisons to get the desired behavior.
SELECT penalty.id AS 'penalty_id'
, COUNT(DISTINCT listing_history.id) 'active_listings'
FROM user
JOIN penalty
ON penalty.user_id = user.id
JOIN transaction
ON transaction.id = penalty.transaction_id
JOIN listing_history
ON listing_history.listing_id = transaction.listing_id
WHERE listing_history.date_created < penalty.date_created
AND listing_history.status = 0
AND NOT EXISTS (SELECT 1
FROM listing_history h2
WHERE listing_history.date_created < h2.date_created
AND h2.date_created < penalty.date_created
AND h2.id = listing_history.id)
GROUP BY penalty.id
Note that I switched from COUNT(...) to COUNT(DISTINCT ...); this helps with some edge cases where two active records for the same listing might be counted.
If you change the date comparisons to use <= instead of < - or, equivalently, if you use BETWEEN to combine the date comparisons - then you'd want to add AND h2.status != 0 (or AND h2.status <> 0, depending on your database) to the subquery so that two concurrent ACTIVE records don't cancel each other out.
There are several equivalent ways to write this, and unfortunately its the kind of query that doesn't always cooperate with a database query optimizer so some trial and error may be necessary to make it run well with large data volumes. Hopefully that gives enough insight into the intended logic that you could work out some equivalents if need be. You could consider using NOT IN instead of NOT EXISTS; or you could use an outer join to a second instance of LISTING_HISTORY... There are probably others I'm not thinking of off hand.
I don't know that we're in a position to sign off on a general statement that the query is, or is not, "correct". If there's a specific question about whether a query will include/exclude a record in a specific situation (or why it does/doesn't, or how to modify it so it won't/will), those might get more complete answers.
I can say that there are a couple likely issues:
The only glaring logic issue has to do with timeline management, which is something that causes a lot of trouble with SQL. The issue is, while your query demonstrates that the listing was active at some point before the penalty creation date, it doesn't demonstrate that the listing was still active on the penalty creation date. Consider
PENALTY
id transaction date
1 10 2016-02-01
TRANSACTION
id listing_id
10 100
LISTING_HISTORY
listing_id status date
100 0 2016-01-01
100 1 2016-01-15
The joins would create a single record, and the count for penalty 1 would include listing 100 even though its status had changed to something other than 0 before the penalty was created.
This is hard - but not impossible - to fix with your existing table structure. You could add a NOT EXISTS condition looking for another LISTING_HISTORY record matching the ID with a date between the first LISTING_HISTORY date and the PENALTY date, for one.
It would be more efficient to add an end date to the LISTING_HISTORY date, but that may not be so easy depending on how the data is maintained.
The second potential issue is the COUNT(RECORD_ID). This may not do what you mean - what COUNT(x) may intuitively seem like it should do, is what COUNT(DISTINCT RECORD_ID) actually does. As written, if the join produces two matches with the same LISTING_HISTORY.ID value - i.e. the listing became active at two different times before the penalty - the listing would be counted twice.
I am in the process of creating an organizational charts for my company, and to create the chart, the data must have a unique role identifier, and a unique 'reports to role' identifier for each line. Unfortunately my data is not playing ball and it out of my scope to change the source.
I have two source tables, simplified in the image below. It is important to note a couple of things in the data.
An employees manager in the query needs to come from the [EmpData] table. The 'ReportsTo' field is only in the [Role] table to be used when a role is vacant
Any number of employees can hold the same role, but for simplicity lets assume that there will only ever be one person in the 'Reports to' role
Using this sample data, my query is as follows:
/**Join Role table with employee data table.
/**Right join so roles with more than one employee will generate a row each
SELECT [Role].RoleId As PositionId
,[EmpData].ReportsToRole As ReportsToPosition
,[Role].RoleTitle
,[Empdata].EmployeeName
FROM [Role]
RIGHT JOIN [EmpData] ON [Role].RoleId=[EmpData].[Role]
UNION
/** Output all roles that do not have a holder, 'VACANT' in employee name.
SELECT [Role].RoleId
,[Role].ReportsToRole
,[Role].RoleTitle
,'VACANT'
FROM [Role]
WHERE [Role].RoleID NOT IN (SELECT RoleID from [empdata])
This almost creates the intended output, but each operator roles has 'OPER', in the PositionId column.
For the charting software to work, each position must have a unique identifier.
Any thoughts on how to achieve this outcome? I'm specifically chasing the appended -01, -02, -03 etc. highlighted yellow in the Desired Query Output.
If you are using T-SQL, you should look into using the ROW_NUMBER operator with the PARTITON BY command and combining the column with your existing column.
Specifically, you would add a column to your select of ROW_NUMBER () OVER (PARTITION BY PositionID ORDER BY ReportsToPosition,EmployeeName) AS SeqNum
I would add that to your first query, and then, in your second, I would do something like SELECT PositionID + CASE SeqNum WHEN 1 THEN "" ELSE "-"+CAST(SeqNum AS VarChar(100)),...
There are multiple ways to do this, but this will leave out the individual ones that don't need a "-1" and only add it to the rest. The major difference between this and your scheme is it doesn't contain the "0" pad on the left, which is easy to do, nor would the first "OPER" be "OPER-1", they would simply be "OPER", but this can also be worked around.
Hopefully this gets you what you need!
I'm a little bit stumped as to how to do this. I want to select records from a table "agency" joined to a table "notes" on an id column that the two tables share.
Table structure:
create table notes (
notes_id varchar2(5),
agency_gp_id varchar2(5),
call_date date,
call_note varchar2(4000)
);
create table agency(
agency_id varchar2(5),
agency_name varchar2(5),
street varchar2(75),
city varchar2(50)
);
alter table notes add constraint "fk_group_notes_agency_id" foreign key(agency_gp_id)
references agency(agency_id) enable;
-Each table has auto-numbering, "before-insert" triggers so the id numbers stay in synch (along with other stuff in the case of adding a note to a newly created agency) - everything I need it to do (the databse), it does.
-Each record from the agency table has a distinct name/address combo (with different branches in different cities) and each record from the notes table has a date entry corresponding to each agency.
-Each agency can have multiple notes (multiple note details from subsequent visits)
What I am attempting to do is select each (distinct agency,street,city) that has not had a note added to it within the past four months.
This is the query I came up with:
SELECT count(a.agency_name) as number_of_visits,
a.agency_name,
(a.street||', '||a.city) as "Location",
n.call_date,
ROUND(TRUNC(sysdate - call_date)) AS days_since_visit
FROM notes n, agency a
WHERE (sysdate - n.call_date) > 120
AND n.agency_gp_id = a.agency_id
--AND a.city = 'München' --not necessary, used for limiting number of results
GROUP BY n.call_date,a.agency_name,a.street, a.city
ORDER BY a.agency_name ASC, n.call_date desc;
It kind of works...I can see what I want but I also see what I DO NOT want (e.g. the multiple notes on each agency). The only thing I want to see is the last entry (most recent, according to the WHERE clause) of each agency. The picture I want to create is: For whichever agency that has not been annotated within 120 days of the last note, display the address and name and the last note date.
(Instead of showing the number of days since EACH visit, I want to show the number of days that have past since the LAST visit - per distinct agency,street,city).
This is for an app that will help a sales executive schedule her sales calls and is run twice a week. I have been unable to figure this out. Also, bear in mind that the actual tables used are much more descriptive - what I have used here are only the parts I need to describe the question.
I would appreciate any suggestions on how to solve this problem.
Thanks!
If I understand your problem correctly, changing call_date to MAX(call_date) (and removing it from the GROUP BY statement) should get you what you want int terms of data, but would also pull in false positives, namely any agency that had notes older than 120 days, regardless of the most recent note. If we filter those agencies out in a NOT EXISTS subquery, that should get you where you need to go.
SELECT count(a.agency_name) as number_of_visits,
a.agency_name,
(a.street||', '||a.city) as "Location",
MAX(n.call_date),
ROUND(TRUNC(sysdate - MAX(call_date))) AS days_since_visit
FROM notes n, agency a
WHERE (sysdate - n.call_date) > 120
AND n.agency_gp_id = a.agency_id
AND NOT EXISTS (SELECT 1 FROM notes n2
WHERE n2.agency_gp_id = a.agency_id
AND (sysdate - n2.call_date) <= 120)
--AND a.city = 'München' --not necessary, used for limiting number of results
GROUP BY a.agency_name,a.street, a.city
ORDER BY a.agency_name ASC, MAX(n.call_date) desc;
Where are Cartesian Joins used in real life?
Can some one please give examples of such a Join in any SQL database.
just random example. you have a table of cities: Id, Lat, Lon, Name. You want to show user table of distances from one city to another. You will write something like
SELECT c1.Name, c2.Name, SQRT( (c1.Lat - c2.Lat) * (c1.Lat - c2.Lat) + (c1.Lon - c2.Lon)*(c1.Lon - c2.Lon))
FROM City c1, c2
Here are two examples:
To create multiple copies of an invoice or other document you can populate a temporary table with names of the copies, then cartesian join that table to the actual invoice records. The result set will contain one record for each copy of the invoice, including the "name" of the copy to print in a bar at the top or bottom of the page or as a watermark. Using this technique the program can provide the user with checkboxes letting them choose what copies to print, or even allow them to print "special copies" in which the user inputs the copy name.
CREATE TEMP TABLE tDocCopies (CopyName TEXT(20))
INSERT INTO tDocCopies (CopyName) VALUES ('Customer Copy')
INSERT INTO tDocCopies (CopyName) VALUES ('Office Copy')
...
INSERT INTO tDocCopies (CopyName) VALUES ('File Copy')
SELECT * FROM InvoiceInfo, tDocCopies WHERE InvoiceDate = TODAY()
To create a calendar matrix, with one record per person per day, cartesian join the people table to another table containing all days in a week, month, or year.
SELECT People.PeopleID, People.Name, CalDates.CalDate
FROM People, CalDates
I've noticed this being done to try to deliberately slow down the system either to perform a stress test or an excuse for missing development deliverables.
Usually, to generate a superset for the reports.
In PosgreSQL:
SELECT COALESCE(SUM(sales), 0)
FROM generate_series(1, 12) month
CROSS JOIN
department d
LEFT JOIN
sales s
ON s.department = d.id
AND s.month = month
GROUP BY
d.id, month
This is the only time in my life that I've found a legitimate use for a Cartesian product.
At the last company I worked at, there was a report that was requested on a quarterly basis to determine what FAQs were used at each geographic region for a national website we worked on.
Our database described geographic regions (markets) by a tuple (4, x), where 4 represented a level number in a hierarchy, and x represented a unique marketId.
Each FAQ is identified by an FaqId, and each association to an FAQ is defined by the composite key marketId tuple and FaqId. The associations are set through an admin application, but given that there are 1000 FAQs in the system and 120 markets, it was a hassle to set initial associations whenever a new FAQ was created. So, we created a default market selection, and overrode a marketId tuple of (-1,-1) to represent this.
Back to the report - the report needed to show every FAQ question/answer and the markets that displayed this FAQ in a 2D matrix (we used an Excel spreadsheet). I found that the easiest way to associate each FAQ to each market in the default market selection case was with this query, unioning the exploded result with all other direct FAQ-market associations.
The Faq2LevelDefault table holds all of the markets that are defined as being in the default selection (I believe it was just a list of marketIds).
SELECT FaqId, fld.LevelId, 1 [Exists]
FROM Faq2Levels fl
CROSS JOIN Faq2LevelDefault fld
WHERE fl.LevelId=-1 and fl.LevelNumber=-1 and fld.LevelNumber=4
UNION
SELECT Faqid, LevelId, 1 [Exists] from Faq2Levels WHERE LevelNumber=4
You might want to create a report using all of the possible combinations from two lookup tables, in order to create a report with a value for every possible result.
Consider bug tracking: you've got one table for severity and another for priority and you want to show the counts for each combination. You might end up with something like this:
select severity_name, priority_name, count(*)
from (select severity_id, severity_name,
priority_id, priority_name
from severity, priority) sp
left outer join
errors e
on e.severity_id = sp.severity_id
and e.priority_id = sp.priority_id
group by severity_name, priority_name
In this case, the cartesian join between severity and priority provides a master list that you can create the later outer join against.
When running a query for each date in a given range. For example, for a website, you might want to know for each day, how many users were active in the last N days. You could run a query for each day in a loop, but it's simplest to keep all the logic in the same query, and in some cases the DB can optimize the Cartesian join away.
To create a list of related words in text mining, using similarity functions, e.g. Edit Distance
Scenario: A sampling survey needs to be performed on membership of 20,000 individuals. Survey sample size is 3500 of the total 20000 members. All membership individuals are in table tblMember. Same survey was performed the previous year and members whom were surveyed are in tblSurvey08. Membership data can change over the year (e.g. new email address, etc.) but the MemberID data stays the same.
How do I remove the MemberID/records contained tblSurvey08 from tblMember to create a new table of potential members to be surveyed (lets call it tblPotentialSurvey09). Again the record for a individual member may not match from the different tables but the MemberID field will remain constant.
I am fairly new at this stuff but I seem to be having a problem Googling a solution - I could use the EXCEPT function but the records for the individuals members are not necessarily the same from one table to next - just the MemberID may be the same.
Thanks
SELECT
* (replace with column list)
FROM
member m
LEFT JOIN
tblSurvey08 s08
ON m.member_id = s08.member_id
WHERE
s08.member_id IS NULL
will give you only members not in the 08 survey. This join is more efficient than a NOT IN construct.
A new table is not such a great idea, since you are duplicating data. A view with the above query would be a better choice.
I apologize in advance if I didn't understand your question but I think this is what you're asking for. You can use the insert into statement.
insert into tblPotentialSurvey09
select your_criteria from tblMember where tblMember.MemberId not in (
select MemberId from tblSurvey08
)
First of all, I wouldn't create a new table just for selecting potential members. Instead, I would create a new true/false (1/0) field telling if they are eligible.
However, if you'd still want to copy data to the new table, here's how you can do it:
INSERT INTO tblSurvey00 (MemberID)
SELECT MemberID
FROM tblMember m
WHERE NOT EXISTS (SELECT 1 FROM tblSurvey09 s WHERE s.MemberID = m.MemberID)
If you just want to create a new field as I suggested, a similar query would do the job.
An outer join should do:
select m_09.MemberID
from tblMembers m_09 left outer join
tblSurvey08 m_08 on m_09.MemberID = m_08.MemberID
where
m_08.MemberID is null