Searching for parent records whose children meet predicate - sql

Let's say I have a parent and child database, and the child keeps a sort of running transcript of things that happen to the parent:
create table patient (
fullname text not null,
admission_number integer primary key
);
create table history (
note text not null,
doctor text not null,
admission_number integer references patient (admission_number)
);
(Just an example, I'm not doing a medical application).
history is going to have many records for the same admission_number:
admission_number doctor note
------------------------------------
3456 Johnson Took blood pressure
7828 Johnson EKG 120, temp 99.2
3456 Nichols Drew blood
9001 Damien Discharged patient
7828 Damien Discharged patient with Rx
So, my question is, how would I build a query that lets me do and/or/not searches of the note field for patient records? For example, I might want to find every patient whose history contains "blood pressure" and "discharged".
Right now I've been doing a select on history that groups by admission_number, combining all the notes with a group_concat(note) and doing my search in the having, thus:
select * from history
group by admission_number
having group_concat(note) like '%blood pressure%'
and group_concat(note) like '%discharged%';
This works, but it makes certain elaborations very complicated. For example, I'd like to be able to ask things like "every patient whose history contains 'blood pressure' and whose history with Dr. Damien says 'discharged'", and building qualifications like this on top of my basic query is very messy.
Is there any better way of phrasing my basic query?

This is similar to your EXISTS method, but computes the subqueries differently.
This might or might not be faster, depending on how your tables and indexes are organized, and on the queries' selectivity.
SELECT *
FROM patient
WHERE admission_number IN (SELECT admission_number
FROM history
WHERE note LIKE '%blood pressure%')
AND admission_number IN (SELECT admission_number
FROM history
WHERE note LIKE '%discharged%'
AND doctor = 'Damien')
Alternatively, you could use a compound subquery (computing the intersection once is likely to be faster than executing IN twice for every record):
SELECT *
FROM patient
WHERE admission_number IN (SELECT admission_number
FROM history
WHERE note LIKE '%blood pressure%'
INTERSECT
SELECT admission_number
FROM history
WHERE note LIKE '%discharged%'
AND doctor = 'Damien')
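To check that the two forms agree, here is a minimal sketch that runs both against an in-memory SQLite database through Python's sqlite3 module. The patient names and the last history row are assumptions added for illustration; without that extra row no patient in the question's sample data satisfies both conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (fullname TEXT NOT NULL, admission_number INTEGER PRIMARY KEY);
    CREATE TABLE history (
        note TEXT NOT NULL,
        doctor TEXT NOT NULL,
        admission_number INTEGER REFERENCES patient (admission_number)
    );
    INSERT INTO patient VALUES ('Bob', 3456), ('Mary', 7828), ('Lucy', 9001);
    INSERT INTO history (admission_number, doctor, note) VALUES
        (3456, 'Johnson', 'Took blood pressure'),
        (7828, 'Johnson', 'EKG 120, temp 99.2'),
        (3456, 'Nichols', 'Drew blood'),
        (9001, 'Damien',  'Discharged patient'),
        (7828, 'Damien',  'Discharged patient with Rx'),
        (3456, 'Damien',  'Discharged patient');  -- hypothetical row, not in the question
""")

# Form 1: one IN subquery per condition.
TWO_INS = """
    SELECT fullname, admission_number FROM patient
    WHERE admission_number IN (SELECT admission_number FROM history
                               WHERE note LIKE '%blood pressure%')
      AND admission_number IN (SELECT admission_number FROM history
                               WHERE note LIKE '%discharged%' AND doctor = 'Damien')
"""

# Form 2: a single IN over the intersection of both subqueries.
WITH_INTERSECT = """
    SELECT fullname, admission_number FROM patient
    WHERE admission_number IN (SELECT admission_number FROM history
                               WHERE note LIKE '%blood pressure%'
                               INTERSECT
                               SELECT admission_number FROM history
                               WHERE note LIKE '%discharged%' AND doctor = 'Damien')
"""

rows_in = conn.execute(TWO_INS).fetchall()
rows_intersect = conn.execute(WITH_INTERSECT).fetchall()
print(rows_in, rows_intersect)
```

Note that SQLite's LIKE is case-insensitive for ASCII by default, which is why '%discharged%' matches 'Discharged patient'; other engines may need LOWER() or a case-insensitive collation.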

Why don't you use a JOIN operation?
e.g.
Assuming the patient table contains the following data:
INSERT INTO patient VALUES('Bob', 3456);
INSERT INTO patient VALUES('Mary', 7828);
INSERT INTO patient VALUES('Lucy', 9001);
Running the query:
SELECT DISTINCT p.fullname, p.admission_number FROM patient p
INNER JOIN history h ON p.admission_number = h.admission_number
WHERE note LIKE '%blood pressure%' OR note LIKE '%Discharged%';
gets you:
fullname = Bob
admission_number = 3456
fullname = Lucy
admission_number = 9001
fullname = Mary
admission_number = 7828
And running the following query:
SELECT DISTINCT p.fullname, p.admission_number FROM patient p
INNER JOIN history h ON p.admission_number = h.admission_number
WHERE note LIKE '%blood pressure%';
gets you:
fullname = Bob
admission_number = 3456
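One caveat with the plain JOIN: an AND between two LIKE tests in the WHERE clause applies to a single history row, so it cannot express "one note says X and another note says Y". A possible workaround, sketched below with Python's sqlite3 module, is to group the joined rows per patient and count matches in HAVING. This is a swapped-in technique (conditional aggregation), not something the answer above proposes, and the search terms are chosen only so that the question's sample data produces a match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (fullname TEXT NOT NULL, admission_number INTEGER PRIMARY KEY);
    CREATE TABLE history (
        note TEXT NOT NULL,
        doctor TEXT NOT NULL,
        admission_number INTEGER REFERENCES patient (admission_number)
    );
    INSERT INTO patient VALUES ('Bob', 3456), ('Mary', 7828), ('Lucy', 9001);
    INSERT INTO history (admission_number, doctor, note) VALUES
        (3456, 'Johnson', 'Took blood pressure'),
        (7828, 'Johnson', 'EKG 120, temp 99.2'),
        (3456, 'Nichols', 'Drew blood'),
        (9001, 'Damien',  'Discharged patient'),
        (7828, 'Damien',  'Discharged patient with Rx');
""")

# Naive AND: both conditions must hold on the SAME history row.
naive_rows = conn.execute("""
    SELECT DISTINCT p.fullname, p.admission_number
    FROM patient p
    INNER JOIN history h ON p.admission_number = h.admission_number
    WHERE h.note LIKE '%EKG%' AND h.note LIKE '%discharged%'
""").fetchall()

# Conditional aggregation: each condition may be satisfied by a different row.
both_rows = conn.execute("""
    SELECT p.fullname, p.admission_number
    FROM patient p
    INNER JOIN history h ON p.admission_number = h.admission_number
    GROUP BY p.admission_number, p.fullname
    HAVING SUM(CASE WHEN h.note LIKE '%EKG%' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN h.note LIKE '%discharged%' THEN 1 ELSE 0 END) > 0
""").fetchall()

print(naive_rows)  # no single note mentions both terms
print(both_rows)   # Mary has one note for each term
```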

I have something -- using EXISTS to construct these is a bit cleaner:
select * from patient where
exists (
select 1 from history where
history.admission_number = patient.admission_number
AND
history.note LIKE '%blood pressure%'
)
AND
exists (
select 1 from history where
history.admission_number = patient.admission_number
AND
history.note LIKE '%discharged%'
AND
history.doctor = 'Damien'
);
That's much better, now I can construct really fine-grained predicates.
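Since each EXISTS block has the same shape, the predicates can be generated rather than typed out. Below is a rough sketch using Python's sqlite3 module; `exists_clause` and `find_patients` are hypothetical helper names, the patient names are assumed, and the search terms are bound as parameters rather than pasted into the SQL text.

```python
import sqlite3

def exists_clause(term, doctor=None, negate=False):
    """Build one (sql, params) pair testing whether a matching history row exists."""
    sql = ("EXISTS (SELECT 1 FROM history h"
           " WHERE h.admission_number = patient.admission_number"
           " AND h.note LIKE ?")
    params = ["%" + term + "%"]
    if doctor is not None:
        sql += " AND h.doctor = ?"
        params.append(doctor)
    sql += ")"
    if negate:
        sql = "NOT " + sql
    return sql, params

def find_patients(conn, clauses, joiner="AND"):
    """Combine the clauses with AND or OR and return the matching patients."""
    if joiner not in ("AND", "OR"):
        raise ValueError("joiner must be 'AND' or 'OR'")
    where = (" " + joiner + " ").join(sql for sql, _ in clauses)
    params = [p for _, clause_params in clauses for p in clause_params]
    query = ("SELECT fullname, admission_number FROM patient WHERE "
             + where + " ORDER BY admission_number")
    return conn.execute(query, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (fullname TEXT NOT NULL, admission_number INTEGER PRIMARY KEY);
    CREATE TABLE history (note TEXT NOT NULL, doctor TEXT NOT NULL,
                          admission_number INTEGER REFERENCES patient (admission_number));
    INSERT INTO patient VALUES ('Bob', 3456), ('Mary', 7828), ('Lucy', 9001);
    INSERT INTO history (admission_number, doctor, note) VALUES
        (3456, 'Johnson', 'Took blood pressure'),
        (7828, 'Johnson', 'EKG 120, temp 99.2'),
        (3456, 'Nichols', 'Drew blood'),
        (9001, 'Damien',  'Discharged patient'),
        (7828, 'Damien',  'Discharged patient with Rx');
""")

# "blood pressure" AND NOT "discharged"
and_not = find_patients(conn, [exists_clause("blood pressure"),
                               exists_clause("discharged", negate=True)])

# "blood pressure" OR "discharged" by Dr. Damien
either = find_patients(conn, [exists_clause("blood pressure"),
                              exists_clause("discharged", doctor="Damien")],
                       joiner="OR")
print(and_not)
print(either)
```

Mixing AND and OR in one query would need nesting with parentheses, which this flat helper does not attempt.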

Related

Can I get duplicate results (from one table) in an INTERSECT operation between two tables?

I know the wording of the question is awkward, but I couldn't phrase it any better. Let me explain the situation.
There's table A which has a bunch of columns (a, b, c ... ) and I run a SELECT query on it like so:
SELECT a FROM A WHERE b IN ('....') (the ellipsis indicates a number of values to be matched to)
There's another table B which has a bunch of columns (d, e, f ... ) and I run a SELECT query on it like so:
SELECT d FROM B WHERE f = '...' (the ellipsis indicates a single value to be matched to)
Now I should say here that the two tables store different types of information about the same entity, but the columns a and d contain the exact same data (in this case, an ID). I want to find out the intersection of the two tables so I run this:
SELECT a FROM A WHERE b IN ('....') INTERSECT SELECT d FROM B WHERE f = '...'
Now here's the problem:
The first SELECT contains a set of values in the WHERE clause, right? So let's say the set is (1234, 2345,3456). Now, the result of this query when b is matched ONLY to 1234 is, let's say, abc. When it's matched to 2345, it's def, suppose. And matching to 3456, it gives abc.
Let's suppose these two results (abc and def) are also in the set of results from the second SELECT.
So now, putting the entire set of values to be matched back into the WHERE clause, the INTERSECT operation will give me abc and def. But I want abc twice, since two values in the WHERE clause set match the second SELECT.
Is there any way I can get that?
I hope it's not too complicated to understand my problem. This is a real-life problem I'm facing in my job.
Data structure and my code
Table A contains general information about a company:
company_id | branch_id | no_of_employees | city
Table B contains the financials of the company:
company_id | branch_id | revenue | profits
First SELECT:
SELECT branch_id FROM A WHERE CITY IN ('Dallas', 'Miami', 'New Orleans')
Now, running each city separately in the first SELECT, I get the branch_ids:
branch_id | city
23 | Dallas
45 | Miami
45 | New Orleans
Once again, this seems impractical as to how two cities can have the same branch ids, but please bear with me on this.
Second SELECT:
SELECT branch_id FROM B
WHERE REVENUE = 5000000
I know this is a little impractical, but for the purpose of this example, it suffices.
Running this query I get the following set:
11
23
45
22
10
So the INTERSECT will give me just 23 and 45. But I want 45 twice, since both Miami and New Orleans have that branch_id and that branch_id has generated a revenue of 5 million.
Directly from Microsoft's documentation (https://msdn.microsoft.com/en-us/library/ms188055.aspx):
"INTERSECT returns distinct rows that are output by both the left and right input queries operator."
So NO, it is not possible to get the same value twice when using INTERSECT, because the results will be DISTINCT. However, if you build an INNER JOIN correctly, you can do essentially the same thing as INTERSECT while keeping the repeated results, as long as you do NOT use DISTINCT or GROUP BY.
SELECT
A.a
FROM
A
INNER JOIN B
ON A.a = B.d
AND B.F = '....'
WHERE b IN ('....')
And for the specific example you added in your edit:
SELECT
A.branch_id
FROM
A
INNER JOIN B
ON A.branch_id = B.branch_id
AND B.REVENUE = 5000000
WHERE A.CITY IN ('Dallas', 'Miami', 'New Orleans')
You overcomplicated your task a lot:
SELECT *
FROM A
WHERE CITY IN (...)
AND EXISTS
(
SELECT 1 FROM B
WHERE B.REVENUE = 5000000
AND B.branch_id = A.branch_id
)
INTERSECT and EXCEPT both return row sets with DISTINCT applied.
They are set operators, not a way of joining or filtering rows, so they cannot preserve duplicates.
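The difference is easy to see on the question's sample data. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server (the set semantics of INTERSECT are the same); the company_id, no_of_employees, and profits values are made up because the question does not give them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (company_id INTEGER, branch_id INTEGER, no_of_employees INTEGER, city TEXT);
    CREATE TABLE B (company_id INTEGER, branch_id INTEGER, revenue INTEGER, profits INTEGER);
    -- company_id, no_of_employees and profits are placeholder values
    INSERT INTO A VALUES (1, 23, 10, 'Dallas'), (1, 45, 20, 'Miami'), (1, 45, 30, 'New Orleans');
    INSERT INTO B VALUES (1, 11, 5000000, 0), (1, 23, 5000000, 0), (1, 45, 5000000, 0),
                         (1, 22, 5000000, 0), (1, 10, 5000000, 0);
""")

# INTERSECT collapses duplicates: branch 45 appears once.
intersect_ids = [row[0] for row in conn.execute("""
    SELECT branch_id FROM A WHERE city IN ('Dallas', 'Miami', 'New Orleans')
    INTERSECT
    SELECT branch_id FROM B WHERE revenue = 5000000
    ORDER BY branch_id
""")]

# INNER JOIN keeps one output row per matching row of A: branch 45 appears twice.
join_ids = [row[0] for row in conn.execute("""
    SELECT A.branch_id
    FROM A
    INNER JOIN B ON A.branch_id = B.branch_id AND B.revenue = 5000000
    WHERE A.city IN ('Dallas', 'Miami', 'New Orleans')
    ORDER BY A.branch_id
""")]

print(intersect_ids)
print(join_ids)
```

One thing to watch: if B could hold several rows for the same branch_id, the join would multiply the output accordingly, whereas the EXISTS form returns each row of A at most once.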

Need a little SQL help - Getting number of items in common

Imagine I have a table like such
UserID Name Hobbies
00001 Jim Baseball, Hockey, Astronomy
00002 Jack Baseball, Football, Video Games
00003 Jill Astronomy, Shopping, Soccer
00004 Jane Hockey, Astronomy, Video Games
00005 Jacob Football, Basketball, Video Games
Now, what I want to do is get a count of hobbies in common. So, let's say I plug in 00001 into a textbox or query string or whatever. I want to see something like:
Name Hobbies
Jack You have (1) hobby in common
Jill You have (1) hobby in common
Jane You have (2) hobbies in common
Jacob You have (0) hobbies in common
How would I write the code for that? I'm stumped. I'm thinking it's got to do with string matching, but I have no idea how to do that.
The first choice is to fix your data structure. Comma-delimited lists are bad, bad, bad. A separate table storing one row per person and per hobby is good, good, good.
If you are stuck with someone else's bad decisions, there is a little recourse. First Google "sql server split" and get your favorite string splitting function.
Then, you can do:
with t as (
select t.*, s.val as hobby
from table t cross apply
dbo.split(t.Hobbies, ', ') as s(val) -- Note, some `split()` implementations also have a `pos` value
)
select t.Name, count(tuser.UserID) as NumInCommon
from t left join
t tuser
on t.hobby = tuser.hobby and tuser.UserID = '00001'
where t.UserID <> '00001' -- leave the selected user out of their own results
group by t.UserID, t.Name;
It is not worth constructing the full sentence in SQL, unless you really want to. Use SQL primarily to get the data you want. (Formatting in SQL can be useful sometimes, but it is really more for the application code.)
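If the splitting and the sentence formatting both end up in application code, the whole thing is a few lines. This is only a sketch in Python under the assumption that the rows have already been fetched into a dictionary; `hobby_set`, `common_counts`, and `describe` are hypothetical names.

```python
# Rows as they would come back from the table: UserID -> (Name, Hobbies).
users = {
    "00001": ("Jim", "Baseball, Hockey, Astronomy"),
    "00002": ("Jack", "Baseball, Football, Video Games"),
    "00003": ("Jill", "Astronomy, Shopping, Soccer"),
    "00004": ("Jane", "Hockey, Astronomy, Video Games"),
    "00005": ("Jacob", "Football, Basketball, Video Games"),
}

def hobby_set(csv_text):
    """Split a comma-delimited hobby list into a normalized set."""
    return {h.strip().lower() for h in csv_text.split(",") if h.strip()}

def common_counts(users, user_id):
    """Map every other user's name to the number of hobbies shared with user_id."""
    mine = hobby_set(users[user_id][1])
    return {
        name: len(mine & hobby_set(hobbies))
        for uid, (name, hobbies) in users.items()
        if uid != user_id
    }

def describe(count):
    """Format the count as the sentence shown in the question."""
    noun = "hobby" if count == 1 else "hobbies"
    return "You have ({}) {} in common".format(count, noun)

counts = common_counts(users, "00001")
for name, count in counts.items():
    print(name, describe(count))
```

Lowercasing and stripping each entry guards against inconsistent spacing or capitalization, but it cannot fix misspelled hobby names, which is one more argument for a separate hobbies table.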
create table #temp_hobbies
(hobby_id int
,hobby varchar(50))
insert into #temp_hobbies values
(1, 'football')
,(2,'baseball')
create table #temp_people
(user_ids int,
name varchar(50),
hobby_ids int)
insert into #temp_people values
(01,'Adam',1)
,(01,'Adam',2)
,(02,'Dave',1)
,(03,'Matt',2)
select count(distinct hobby) , count(distinct name)
from #temp_hobbies a
inner join #temp_people b on a.hobby_id = b.hobby_ids
This is only part of your solution: you still need to add a query that computes, for each user, how many hobbies they share with the other users.
But as other users have suggested, try separating the hobbies into a separate table and use int keys for the joins. SQL Server processes ints faster than varchars, especially if you need to do this for thousands of records.
First of all, please NORMALIZE your data. As it stands, the same hobbies are repeated across many rows, which makes the table tedious to search and hard to maintain.
You can keep all your USERS data in one table, as below:
CREATE TABLE USERS ( UserID , NAME ); --> USERID being PRIMARY KEY
You can keep all your HOBBIES in another table, as below:
CREATE TABLE HOBBIES ( HOBBYID, HOBBYNAME); --> HOBBYID being PRIMARY KEY
You can have another table that maps USERS to HOBBIES, as below:
CREATE TABLE USERS_HOBBIES( USERID , HOBBYID );
Once the tables are normalized as above, you can get the desired result by querying as below:
SELECT u.NAME , count(*) AS Hobbies FROM USERS u INNER JOIN
USERS_HOBBIES uh ON u.UserID = uh.USERID INNER JOIN HOBBIES h ON
uh.HOBBYID = h.HOBBYID WHERE h.HOBBYID IN (
(SELECT a.HOBBYID as HOBBYID FROM
(SELECT DISTINCT(HOBBYID) as HOBBYID FROM USERS_HOBBIES WHERE
USERID = '00001' ) a INNER JOIN
(SELECT DISTINCT(HOBBYID) as HOBBYID FROM USERS_HOBBIES WHERE
USERID <> '00001' ) b ON a.HOBBYID = b.HOBBYID) )
AND u.USERID = '00001' GROUP BY u.NAME
P.S.: The query above uses Oracle syntax.
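For comparison, here is one possible way to get a count for every other user from a normalized layout like this. It is a sketch that uses SQLite through Python's sqlite3 module rather than Oracle, the column types are assumptions, and the query is a different formulation (two LEFT JOINs on the mapping table) from the one above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE hobbies (hobby_id INTEGER PRIMARY KEY, hobby_name TEXT NOT NULL UNIQUE);
    CREATE TABLE users_hobbies (
        user_id  TEXT    REFERENCES users (user_id),
        hobby_id INTEGER REFERENCES hobbies (hobby_id),
        PRIMARY KEY (user_id, hobby_id)
    );
""")

people = [
    ("00001", "Jim",   ["Baseball", "Hockey", "Astronomy"]),
    ("00002", "Jack",  ["Baseball", "Football", "Video Games"]),
    ("00003", "Jill",  ["Astronomy", "Shopping", "Soccer"]),
    ("00004", "Jane",  ["Hockey", "Astronomy", "Video Games"]),
    ("00005", "Jacob", ["Football", "Basketball", "Video Games"]),
]
for user_id, name, hobby_names in people:
    conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, name))
    for hobby in hobby_names:
        # Create the hobby once, then link the user to it.
        conn.execute("INSERT OR IGNORE INTO hobbies (hobby_name) VALUES (?)", (hobby,))
        conn.execute(
            "INSERT INTO users_hobbies SELECT ?, hobby_id FROM hobbies WHERE hobby_name = ?",
            (user_id, hobby),
        )

# 'theirs' lists each user's hobbies; 'mine' matches only those the chosen user also has.
# COUNT(mine.hobby_id) ignores NULLs, so users with nothing in common get 0.
COMMON = """
    SELECT u.name, COUNT(mine.hobby_id) AS in_common
    FROM users u
    LEFT JOIN users_hobbies theirs ON theirs.user_id = u.user_id
    LEFT JOIN users_hobbies mine   ON mine.hobby_id = theirs.hobby_id
                                  AND mine.user_id = ?
    WHERE u.user_id <> ?
    GROUP BY u.user_id, u.name
    ORDER BY u.user_id
"""
in_common = conn.execute(COMMON, ("00001", "00001")).fetchall()
print(in_common)
```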

SQL Server 2005 - Join based on criteria in table column

Using SQL Server 2005, what is the most efficient way to join the two tables in the following scenario?
The number of records in each table could be fairly large, say about 200,000.
The only way I can currently think of doing this is with the use of cursors and some dynamic SQL for each item which will clearly be very inefficient.
I have two tables - a PERSON table and a SEARCHITEMS table. The SEARCHITEMS table contains a column with some simple criteria which is to be used when matching records with the PERSON table. The criteria can reference any column in the PERSON table.
For example given the following tables :
PERSON table
PERSONID FIRSTNAME LASTNAME GENDER AGE ... VARIOUS OTHER COLUMNS
1 Fred Bloggs M 16
....
200000 Steve Smith M 18
SEARCHITEMS table
ITEMID DESCRIPTION SEARCHCRITERIA
1 Males GENDER = 'M'
2 Aged 16 AGE=16
3 Some Statistic {OTHERCOLUMN >= SOMEVALUE AND OTHERCOLUMN < SOMEVALUE}
....
200000 Males Aged 16 GENDER = 'M' AND AGE = 16
RESULTS table should contain something like this :
ITEMID DESCRIPTION PERSONID LASTNAME
1 Males 1 Bloggs
1 Males 200000 Smith
2 Aged 16 1 Bloggs
....
200000 Males Aged 16 1 Bloggs
It would be nice to be able to just do something like
INSERT INTO RESULTSTABLE
SELECT *
FROM PERSON P
LEFT JOIN SEARCHITEMS SI ON (APPLY SI.SEARCHCRITERIA TO P)
But I can't see a way of making this work. Any help or ideas appreciated.
Seeing that the SEARCHITEMS table is non-relational by nature, it seems that the cursor and dynamic SQL solution is the only workable one. Of course this will be quite slow and I would "pre-calculate" the results to make it somewhat bearable.
To do this create the following table:
CREATE TABLE MATCHEDITEMS(
ITEMID int NOT NULL
CONSTRAINT fkMatchedSearchItem
FOREIGN KEY
REFERENCES SEARCHITEMS(ITEMID),
PERSONID int
CONSTRAINT fkMatchedPerson
FOREIGN KEY
REFERENCES PERSON(PERSONID),
CONSTRAINT pkMatchedItems
PRIMARY KEY (ITEMID, PERSONID)
)
The table will contain a lot of data, but considering it only stores 2 int columns the footprint on disk will be small.
To update this table you create the following triggers:
a trigger on the SEARCHITEMS table which will populate the MATCHEDITEMS table whenever a rule is changed or added.
a trigger on the PERSON table which will run the rules on the updated or added PERSON records.
Results can then simply be presented by joining the 3 tables.
SELECT m.ITEMID, s.DESCRIPTION, m.PERSONID, p.LASTNAME
FROM MATCHEDITEMS m
JOIN PERSON p
ON m.PERSONID = p.PERSONID
JOIN SEARCHITEMS s
ON m.ITEMID = s.ITEMID
You could build your TSQL dynamically, and then execute it with sp_executesql.
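The general shape of that approach, one dynamically built INSERT ... SELECT per search item instead of a row-by-row cursor over PERSON, can be sketched as follows. This uses SQLite through Python's sqlite3 module as a stand-in for T-SQL and sp_executesql, `rebuild_matches` is a hypothetical helper, and the table is trimmed to the columns shown in the question. Because the criteria text is concatenated straight into the statement, it assumes SEARCHCRITERIA only ever contains trusted SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (personid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT,
                         gender TEXT, age INTEGER);
    CREATE TABLE searchitems (itemid INTEGER PRIMARY KEY, description TEXT,
                              searchcriteria TEXT);
    CREATE TABLE matcheditems (
        itemid   INTEGER NOT NULL REFERENCES searchitems (itemid),
        personid INTEGER NOT NULL REFERENCES person (personid),
        PRIMARY KEY (itemid, personid)
    );
    INSERT INTO person VALUES (1, 'Fred', 'Bloggs', 'M', 16),
                              (200000, 'Steve', 'Smith', 'M', 18);
    INSERT INTO searchitems VALUES
        (1, 'Males', 'GENDER = ''M'''),
        (2, 'Aged 16', 'AGE=16'),
        (200000, 'Males Aged 16', 'GENDER = ''M'' AND AGE = 16');
""")

def rebuild_matches(conn):
    """Recompute matcheditems with one set-based statement per search item."""
    conn.execute("DELETE FROM matcheditems")
    items = conn.execute("SELECT itemid, searchcriteria FROM searchitems").fetchall()
    for itemid, criteria in items:
        # The criteria string becomes the WHERE clause: trusted input only.
        conn.execute(
            "INSERT INTO matcheditems (itemid, personid) "
            "SELECT ?, personid FROM person WHERE (" + criteria + ")",
            (itemid,),
        )

rebuild_matches(conn)
results = conn.execute("""
    SELECT s.itemid, s.description, p.personid, p.lastname
    FROM matcheditems m
    JOIN person p      ON m.personid = p.personid
    JOIN searchitems s ON m.itemid = s.itemid
    ORDER BY s.itemid, p.personid
""").fetchall()
for row in results:
    print(row)
```

Each generated statement still scans PERSON once per search item, so with 200,000 items this remains expensive; precomputing the matches only moves that cost out of the read path.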

Combine query results from one table with the defaults from another

This is a dumbed-down version of the real table data, so it may look a bit silly.
Table 1 (users):
id INT
username TEXT
favourite_food TEXT
food_pref_id INT
Table 2 (food_preferences):
id INT
food_type TEXT
The logic is as follows:
Let's say I have this in my food preference table:
1, 'VEGETARIAN'
and this in the users table:
1, 'John', NULL, 1
2, 'Pete', 'Curry', 1
In which case John defaults to be a vegetarian, but Pete should show up as a person who enjoys curry.
Question: is there any way to combine this into one select statement, so that it gets the default from the preferences table if the favourite_food column is NULL?
I can obviously do this in application logic, but would be nice just to offload this to SQL, if possible.
DB is SQLite3...
You could use COALESCE(X,Y,...) to select the first item that isn't NULL.
If you combine this with an inner join, you should be able to do what you want.
It should go something like this:
SELECT u.id AS id,
u.username AS username,
COALESCE(u.favourite_food, p.food_type) AS favourite_food,
u.food_pref_id AS food_pref_id
FROM users AS u INNER JOIN food_preferences AS p
ON u.food_pref_id = p.id
I don't have a SQLite database handy to test on, however, so the syntax might not be 100% correct, but it's the gist of it.
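For what it's worth, a quick check against an in-memory SQLite database through Python's sqlite3 module suggests the idea holds up. This sketch uses the question's column spelling (favourite_food) and swaps in a LEFT JOIN, which is my own assumption, so that a user whose food_pref_id matches no preference row would still be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE food_preferences (id INTEGER PRIMARY KEY, food_type TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT,
                        favourite_food TEXT, food_pref_id INTEGER);
    INSERT INTO food_preferences VALUES (1, 'VEGETARIAN');
    INSERT INTO users VALUES (1, 'John', NULL, 1), (2, 'Pete', 'Curry', 1);
""")

# COALESCE returns its first non-NULL argument, so the preference is only a fallback.
rows = conn.execute("""
    SELECT u.id,
           u.username,
           COALESCE(u.favourite_food, p.food_type) AS favourite_food,
           u.food_pref_id
    FROM users AS u
    LEFT JOIN food_preferences AS p ON u.food_pref_id = p.id
    ORDER BY u.id
""").fetchall()
print(rows)
```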

SQL Select based on Link Field

I feel like an idiot asking this...
Table 1: users
id serial
person integer
username char(32)
Table 2:persons
id serial
name char(16)
Can I run a query that returns the name field in persons by providing the username in users?
users
1 | 1 | larry123
persons
1 | larry
2 | curly
SQL?
select name from persons where users.person=persons.id and users.username='larry123';
with the desired return of
larry
I have been doing it with two passes until now and think maybe a nested select using a join is what I need
1 | larry
It sounds like you're asking how to do a join in SQL:
SELECT
name
FROM
users JOIN persons ON (users.person = persons.id)
WHERE
users.username = 'larry123';
That is almost the query you wrote. All you were missing was the join clause. You could also write the join like this:
SELECT name
FROM users, persons
WHERE
users.person = persons.id
AND users.username = 'larry123';
I suggest finding a well-written introduction to SQL.
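To see the single-pass version working, here is a small sketch against an in-memory SQLite database through Python's sqlite3 module. The original tables look like PostgreSQL (serial columns), so the types are simplified, and `name_for` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY,
                        person INTEGER REFERENCES persons (id),
                        username TEXT);
    INSERT INTO persons VALUES (1, 'larry'), (2, 'curly');
    INSERT INTO users VALUES (1, 1, 'larry123');
""")

def name_for(conn, username):
    """Return the person's name for a username, or None if there is no such user."""
    row = conn.execute(
        "SELECT persons.name "
        "FROM users JOIN persons ON users.person = persons.id "
        "WHERE users.username = ?",
        (username,),
    ).fetchone()
    return row[0] if row else None

print(name_for(conn, "larry123"))
print(name_for(conn, "nobody"))
```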