Oracle SQL: SQL join with group by (count) and having clauses

I have 3 tables:
Table 1) Sale
Table 2) ItemsSale
Table 3) Items
Tables 1 and 2 have ID in common, and tables 2 and 3 have ITEMS in common.
I'm having trouble with a query I've written so far; I can't seem to get it right.
I'm trying to select all the records that have only one row and match certain criteria. Here is my query:
select i.id
from sales i
inner join itemssales j on i.id = j.id
inner join item u on j.item = u.item
where u.code = ANY ('TEST','HI') and
i.created_date between TO_DATE('1/4/2016 12:00:00 AM','MM/DD/YYYY HH:MI:SS AM') and
TO_DATE('1/4/2016 11:59:59 PM','MM/DD/YYYY HH:MI:SS PM')
group by i.id
having count(i.id) = 1
In the ItemSale table there are two entries, but in the Sale table there is only one. That is fine, but I need to construct a query that returns only the one record.
I believe the issue is with the "ANY" portion: the query returns only one row, and that row is the record that doesn't meet the "ANY ('TEST', 'HI')" criteria.
But in reality, the record with that particular ID has two records in ItemSales.
I need to return only the records that genuinely have just one record.
Any help is appreciated.
--EDIT:
COL1 | ID
-----|-----
2 | 26
3 | 85
1 | 23
1 | 88
1 | 6
1 | 85
I also group them and make sure the count equals 1, but as you can see, ID 85 shows up here as a single record, which is a false positive because there are actually two records for it in the ItemSales table.
I even tried selecting j.id instead, since j is the table with the two records, but no luck.
--- EDIT
Sale table contains:
ID
---
85
Itemsales table contains:
ID | Position | item_id
---|----------|---------
85 | 1 | 6
85 | 2 | 7
Items table contains:
item_id | code
--------|------
7 | HI
6 | BOOP
The record it is returning is the one with the Code of 'BOOP'
Thanks,

"I need to only return the records that legitimately only have one record."
I interpret this to mean that you only want to return SALES with exactly one ITEM, and that this ITEM must also meet your additional criteria.
Here's one approach, which will work fine with small(-ish) amounts of data but may not scale well. Without proper table descriptions and data profiles it's not possible to offer a performant solution.
with itmsal as
( -- sales on 4 Jan 2016 that have exactly one item row
select sales.id
from itemsales
join sales on sales.id = itemsales.id
where sales.created_date >= date '2016-01-04'
and sales.created_date < date '2016-01-05'
group by sales.id having count(*) = 1)
select sales.*
, items.*
from itmsal
join sales on sales.id = itmsal.id
join itemsales on itemsales.id = itmsal.id
join items on items.item = itemsales.item
-- filter on code only after the single-item sales have been identified
where items.code in ('TEST','HI')
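A minimal sketch of why filtering before counting gives the false positive, using Python's built-in sqlite3 module rather than Oracle (so the Oracle date '...' literals become ISO-formatted text comparisons). It loads sale 85 from the question's sample data plus a hypothetical sale 90 with a single 'TEST' item, then runs a filter-then-count query next to the count-then-filter CTE from this answer.

```python
import sqlite3

# Sale 85 and its items come from the question; sale 90 and item 8 are
# hypothetical, added so the correct query has something to return.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER, created_date TEXT);
CREATE TABLE itemsales (id INTEGER, position INTEGER, item INTEGER);
CREATE TABLE items (item INTEGER, code TEXT);
INSERT INTO sales VALUES (85, '2016-01-04 10:00:00'), (90, '2016-01-04 11:00:00');
INSERT INTO itemsales VALUES (85, 1, 6), (85, 2, 7), (90, 1, 8);
INSERT INTO items VALUES (6, 'BOOP'), (7, 'HI'), (8, 'TEST');
""")

# Filter on code BEFORE grouping: COUNT(*) only sees the matching rows,
# so sale 85 (one HI item plus one BOOP item) looks like a one-item sale.
filter_then_count = conn.execute("""
SELECT s.id FROM sales s
JOIN itemsales j ON s.id = j.id
JOIN items u ON j.item = u.item
WHERE u.code IN ('TEST', 'HI')
GROUP BY s.id HAVING COUNT(*) = 1
ORDER BY s.id
""").fetchall()

# Count all item rows first (in the CTE), then filter on code.
count_then_filter = conn.execute("""
WITH itmsal AS (
  SELECT sales.id FROM itemsales
  JOIN sales ON sales.id = itemsales.id
  WHERE sales.created_date >= '2016-01-04' AND sales.created_date < '2016-01-05'
  GROUP BY sales.id HAVING COUNT(*) = 1)
SELECT itmsal.id FROM itmsal
JOIN itemsales ON itemsales.id = itmsal.id
JOIN items ON items.item = itemsales.item
WHERE items.code IN ('TEST', 'HI')
ORDER BY itmsal.id
""").fetchall()

print(filter_then_count)  # [(85,), (90,)]
print(count_then_filter)  # [(90,)]
```

Sale 85 survives the first query only because the WHERE clause removes its 'BOOP' row before COUNT(*) runs; counting first sees both of its item rows.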

I think you are trying to restrict the results so that a sale's items MUST ONLY have the code 'TEST' or 'HI'.
select
sales.*
from (
select
s.id
from Sales s
inner join Itemsales itss on s.id = itss.id
inner join Items i on itss.item_id = i.item_id
where s.created_date >= date '2016-01-04'
and s.created_date < date '2016-01-05'
group by
s.id
having
-- count the items whose code is NOT 'TEST' or 'HI'; keep sales with none
sum(case when i.code IN('TEST','HI') then 0 else 1 end) = 0
) x
inner join sales on x.id = sales.id
... /* more here as required */
This construct returns only the sales.id values whose items ALL have one of those 2 codes.
Note it could be done with a common table expression (CTE) but I prefer to only use those when there is an advantage in doing so - which I do not see here.
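To check the HAVING SUM(CASE ...) = 0 construct, here is a sketch that runs the inner query against the question's sample data in SQLite through Python's sqlite3 module. Sales 90 and 91 (and item 8) are hypothetical additions, and dates are stored as ISO text instead of Oracle DATE values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER, created_date TEXT);
CREATE TABLE itemsales (id INTEGER, position INTEGER, item_id INTEGER);
CREATE TABLE items (item_id INTEGER, code TEXT);
INSERT INTO sales VALUES (85, '2016-01-04'), (90, '2016-01-04'), (91, '2016-01-04');
INSERT INTO itemsales VALUES (85, 1, 6), (85, 2, 7), (90, 1, 8), (91, 1, 7), (91, 2, 8);
INSERT INTO items VALUES (6, 'BOOP'), (7, 'HI'), (8, 'TEST');
""")

only_allowed_codes = conn.execute("""
SELECT s.id
FROM sales s
JOIN itemsales itss ON s.id = itss.id
JOIN items i ON itss.item_id = i.item_id
WHERE s.created_date >= '2016-01-04' AND s.created_date < '2016-01-05'
GROUP BY s.id
-- count the items whose code is NOT allowed; keep sales where that count is 0
HAVING SUM(CASE WHEN i.code IN ('TEST', 'HI') THEN 0 ELSE 1 END) = 0
ORDER BY s.id
""").fetchall()

print(only_allowed_codes)  # [(90,), (91,)]
```

Sale 85 is excluded because of its 'BOOP' item, while sale 91 is returned even though it has two items: this construct checks only the codes, not the number of items.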

If I understand correctly, this may work (not tested):
select *
from sales s
inner join (
select i.id,
count(*) as cnt, -- all item rows on the sale
sum(case when u.code in ('TEST','HI') then 1 else 0 end) as matching -- item rows with a wanted code
from sales i
inner join itemssales j on i.id = j.id
inner join item u on j.item = u.item
where i.created_date between TO_DATE('1/4/2016 12:00:00 AM','MM/DD/YYYY HH:MI:SS AM') and
TO_DATE('1/4/2016 11:59:59 PM','MM/DD/YYYY HH:MI:SS PM')
group by i.id
) sj on s.id = sj.id and sj.cnt = 1 and sj.matching = 1

Related

Optimize a complex PostgreSQL Query

I am attempting to make a complex SQL join on several tables, as shown below. I have also included an image of the DB schema.
Consider table_1 -
e_id name
1 a
2 b
3 c
4 d
and table_2 -
e_id date
1 1/1/2019
1 1/1/2020
2 2/1/2019
4 2/1/2019
The issue here is performance. From tables 2 to 4 we only want the most recent entry for a given e_id, but because these tables contain historical data (more than ~3.5M rows), the query is quite slow. Below is an example of how we're currently trying to achieve this, but it only includes one join of 'table_1' with 'table_x'. We group by e_id and get the max date for it. The other approach we've considered is creating a materialized view, pulling data from it, and refreshing it periodically. Any improvements are welcome.
from fds.region as rg
inner join (
select e_id, name, p_id
from fds.table_1
where sec_type = 'S' AND active_flag = 1
) as table_1 on table_1.e_id = rg.e_id
inner join fds.table_2 table_2 on table_2.e_id = rg.e_id
inner join fds.sec sec on sec.p_id = table_1.p_id
inner join fds.entity ent on ent.int_entity_id = sec.int_entity_id
inner join (
SELECT int_1.e_id, int_1.date, int_1.int_price
FROM fds.table_4 int_1
INNER JOIN (
SELECT e_id, MAX(date) date
FROM fds.table_2
GROUP BY e_id
) int_2 ON int_1.e_id = int_2.e_id AND int_1.date = int_2.date
) as table_4 on table_4.e_id = rg.e_id
where rg.region_str like '%US' and ent.sec_type = 'P'
order by table_2.int_price
limit 500;
You can simplify this logic:
(
SELECT int_1.e_id, int_1.date, int_1.int_price
FROM fds.table_4 int_1
INNER JOIN (
SELECT e_id, MAX(date) date
FROM fds.table_2
GROUP BY e_id
) int_2 ON int_1.e_id = int_2.e_id AND int_1.date = int_2.date
) as table_4
To:
(SELECT DISTINCT ON (int_1.e_id) int_1.*
FROM fds.table_4 int_1
ORDER BY int_1.e_id, int_1.date DESC
) table_4
This can take advantage of an index on fds.table_4(e_id, date desc), and it might be wicked fast with such an index.
You also want appropriate indexes for the joins and filtering. However, it is hard to be more specific without an execution plan.
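SQLite has no DISTINCT ON, so this sketch swaps in the portable ROW_NUMBER() window-function form of the same "latest row per e_id" query and runs it through Python's sqlite3 module (window functions need SQLite 3.25 or later). The e_id/date pairs come from the question's table_2 sample, read as month/day/year; putting them in a table named table_4 and the int_price values are assumptions for illustration. In PostgreSQL the DISTINCT ON form is usually the more concise choice; both pick one row per e_id by ordering on date descending.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_4 (e_id INTEGER, date TEXT, int_price REAL);
-- e_id/date pairs from the question's table_2 sample; prices are made up
INSERT INTO table_4 VALUES
  (1, '2019-01-01', 10.0),
  (1, '2020-01-01', 12.5),
  (2, '2019-02-01', 7.0),
  (4, '2019-02-01', 3.0);
-- the same shape of index suggested above for the PostgreSQL query
CREATE INDEX table_4_e_id_date ON table_4 (e_id, date DESC);
""")

latest = conn.execute("""
SELECT e_id, date, int_price
FROM (
  SELECT t.*,
         -- number each e_id's rows from newest (1) to oldest
         ROW_NUMBER() OVER (PARTITION BY e_id ORDER BY date DESC) AS rn
  FROM table_4 t
) ranked
WHERE rn = 1
ORDER BY e_id
""").fetchall()

print(latest)
# [(1, '2020-01-01', 12.5), (2, '2019-02-01', 7.0), (4, '2019-02-01', 3.0)]
```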

Grouping the data and showing 1 row per group in postgres

I have two tables that look like this:
Component Table
Revision Table
I want to get name, model_id and rev_id from these tables so that the result set looks like this:
name model_id rev_id created_at
ABC 1234 2 23456
ABC 5678 2 10001
XYZ 4567
Here the data is grouped by name and model_id, and for each group only the row with the highest created_at value is shown.
I am using the query below, but it gives me an incorrect result:
SELECT cm.name,cm.model_id,r.created_at from dummy.component cm
left join dummy.revision r on cm.model_id=r.model_id
group by cm.name,cm.model_id,r.created_at
ORDER BY cm.name asc,
r.created_at DESC;
Result:
Anyone's help will be highly appreciated.
Use MAX and a subquery:
select T1.name,T1.model_id,r.rev_id,T1.created_at from
(
select cm.name,
cm.model_id,
MAX(r.created_at) As created_at from dummy.component cm
left join dummy.revision r on cm.model_id=r.model_id
group by cm.name,cm.model_id
) T1
left join dummy.revision r
on T1.model_id = r.model_id and T1.created_at = r.created_at
http://www.sqlfiddle.com/#!17/68cb5/4
name model_id rev_id created_at
ABC 1234 2 23456
ABC 5678 2 10001
xyz 4567
In your SELECT you're missing rev_id
Try this:
SELECT
cm.name,
cm.model_id,
MAX(r.rev_id) AS rev_id,
MAX(r.created_at) As created_at
from dummy.component cm
left join dummy.revision r on cm.model_id=r.model_id
group by 1,2
ORDER BY cm.name asc,
MAX(r.created_at) DESC;
What you were missing is a way to say that you only want the max record from the joined table. The join brings in all records from table r, so if you group by the 2 columns from component and then select the max of r's rev_id and created_at, you get only the top value out of the rows available to join. Note that the two MAX values are computed independently, so they come from the same revision row only when rev_id increases along with created_at.
I would use DISTINCT ON:
select distinct on (cm.name, cm.model_id) cm.name, cm.model_id, r.rev_id, r.created_at
from dummy.component cm left join
dummy.revision r
on cm.model_id = r.model_id
order by cm.name, cm.model_id, r.created_at desc nulls last;
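Here is a sketch of the MAX-and-join-back approach in SQLite via Python's sqlite3 module. The component rows come from the question's desired output; the revision rows (including the older revisions) are hypothetical, chosen so the result matches the question. The join back uses both model_id and created_at, so two models that happen to share a created_at value cannot pick up each other's revisions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE component (name TEXT, model_id INTEGER);
CREATE TABLE revision (model_id INTEGER, rev_id INTEGER, created_at INTEGER);
INSERT INTO component VALUES ('ABC', 1234), ('ABC', 5678), ('XYZ', 4567);
-- hypothetical revisions chosen to reproduce the desired output
INSERT INTO revision VALUES
  (1234, 1, 20000), (1234, 2, 23456),
  (5678, 1, 9000),  (5678, 2, 10001);
""")

rows = conn.execute("""
SELECT t1.name, t1.model_id, r.rev_id, t1.created_at
FROM (
  SELECT cm.name, cm.model_id, MAX(r.created_at) AS created_at
  FROM component cm
  LEFT JOIN revision r ON cm.model_id = r.model_id
  GROUP BY cm.name, cm.model_id
) t1
-- join back on BOTH model_id and created_at to get that model's latest revision
LEFT JOIN revision r
  ON r.model_id = t1.model_id AND r.created_at = t1.created_at
ORDER BY t1.name, t1.model_id
""").fetchall()

print(rows)
# [('ABC', 1234, 2, 23456), ('ABC', 5678, 2, 10001), ('XYZ', 4567, None, None)]
```

XYZ has no revisions, so its MAX(created_at) is NULL, the join back finds nothing, and the LEFT JOIN keeps it with NULL rev_id and created_at.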

How to query values that have changed by +/- 10% from each patient's first encounter in SQL Server?

I have this query that finds users with multiple hospital visits.
The table has about 593 columns, so I don't think I can show you the whole structure. But let's assume it is a basic patients table with the following columns:
id, sex, studyDate, referringPhysician, bmi, bsa, height, weight, bloodPressure, heartRate. These are also in the real table.
The patient visits the hospital and has some work done. What we would like to find is how much a patient's BMI has changed since the first encounter. For example:
Row | ID       | SEX | StudyDate  | Physician | BMI | BSA  | ht | Wt  | BP     | HR
----|----------|-----|------------|-----------|-----|------|----|-----|--------|---
1   | PatientA | M   | 2017-09-11 | Dr. Hale  | 60  | 2.03 | 6  | 282 | 116/82 | 77
2   | PatientA | M   | 2017-12-11 | Dr. Hale  | 58  | 2.03 | 6  | 296 | 126/82 | 72
3   | PatientA | M   | 2018-03-17 | Dr. Hale  | 50  | 2.03 | 6  | 282 | 126/82 | 72
In the example above, row 1 is the first encounter, with a BMI of 60. In row 2, the BMI decreased to 58, but that is not more than 10%, so it shouldn't be displayed. However, row 3 has a BMI of 50, a decrease of more than 10% from the BMI in row 1, so it should be displayed.
Sorry, I don't have data that I can share.
with G as(
select * from Patients P
inner join (
select count(*) as counts, ID as oeID
from Patients
group by ID
Having count(*) > 2
) oe on P.ID = oe.oeID where P.BMI > 30
)
select * from G
order by StudyDate asc;
From this, what I'd like to do is find out patients whose BMI has changed by 10% from the first encounter.
How can I do this?
Can you also help me understand how SQL handles this kind of "for each user" logic in a query?
I'm guessing at your data model here... I suspect you've got a heavily denormalized structure, with everything crammed into one table. You would be far better off with a patient table separate from the table that stores their visits. The WITH G syntax is also unnecessary, especially if you are just doing a SELECT * from it afterwards. Heh, I'm trying to get into medical analytics, so I'll give this a try.
I'll build this as I understand your data model; you may have to change a step here and there to fit your column names. Let's start by getting the first and most recent (last) visit dates by id:
select id, min(StudyDate) as first_visit, max(studydate) as last_visit
from patients
group by id
having min(StudyDate) <> max(StudyDate)
This is a simple query at this point, and the HAVING clause ensures that these are two separate visits. But we are lacking the BMI values for these visits, so we have to join back to the patient table to grab them. We will include a WHERE clause to ensure that only changes of +/- 10% or more are found:
select a.id, a.first_visit, a.last_visit, b.bmi as first_bmi, c.bmi as last_bmi, b.bmi - c.bmi as bmi_change
from
(select id, min(StudyDate) as first_visit, max(StudyDate) as last_visit
from patients
group by id
having min(StudyDate) <> max(StudyDate)) a
inner join patients b on b.id = a.id and b.StudyDate = a.first_visit
inner join patients c on c.id = a.id and c.StudyDate = a.last_visit
-- keep changes of at least 10% of the first-visit BMI, in either direction
where abs(b.bmi - c.bmi) >= 0.1 * b.bmi
Hopefully that makes sense. You'll want to change the top SELECT line to grab all the fields you actually want to return; I'm just returning the ones relevant to your question.
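A sketch of this first-vs-last comparison in SQLite through Python's sqlite3 module, using only the three example rows from the question (id, StudyDate and BMI; the other columns are omitted). Dates are stored as ISO text so MIN/MAX order them chronologically. The threshold is written as 10% of the first-visit BMI, which is what the question asks for; a test such as b.bmi - c.bmi >= 10 would instead mean 10 BMI points.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id TEXT, studydate TEXT, bmi REAL);
-- the three visits from the question's example (other columns omitted)
INSERT INTO patients VALUES
  ('PatientA', '2017-09-11', 60),
  ('PatientA', '2017-12-11', 58),
  ('PatientA', '2018-03-17', 50);
""")

first_vs_last = conn.execute("""
SELECT a.id, a.first_visit, a.last_visit,
       b.bmi AS first_bmi, c.bmi AS last_bmi, b.bmi - c.bmi AS bmi_change
FROM (SELECT id, MIN(studydate) AS first_visit, MAX(studydate) AS last_visit
      FROM patients
      GROUP BY id
      HAVING MIN(studydate) <> MAX(studydate)) a
JOIN patients b ON b.id = a.id AND b.studydate = a.first_visit
JOIN patients c ON c.id = a.id AND c.studydate = a.last_visit
-- relative threshold: the change is at least 10% of the first-visit BMI
WHERE ABS(b.bmi - c.bmi) >= 0.10 * b.bmi
""").fetchall()

print(first_vs_last)
# [('PatientA', '2017-09-11', '2018-03-17', 60.0, 50.0, 10.0)]
```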
Part 2:
Let's approach this from a similar angle:
select id, min(StudyDate) as first_visit
from patients
group by id
Now we've got the first visit date. Let's join back to patients and get the BMI for that visit.
select a.id, first_visit, b.bmi
from
(select id, min(StudyDate) as first_visit
from patients
group by id) a
inner join patients b on a.first_visit = b.studydate and a.id = b.id
This gives a list of each patient by ID with their first_visit date and their BMI on that first visit. Now we want to compare this BMI to all subsequent visits, so let's join all rows back to this query. Subquery a below is simply the query above in brackets:
select a.id, a.first_visit, b.StudyDate, a.bmi as first_bmi, b.bmi as visit_bmi, a.bmi - b.bmi as bmi_change
from
(select a.id, first_visit, b.bmi
from
(select id, min(StudyDate) as first_visit
from patients
group by id) a
inner join patients b on a.first_visit = b.StudyDate and a.id = b.id) a
inner join patients b on a.id = b.id
where abs(a.bmi - b.bmi) >= 0.1 * a.bmi
Same idea: instead of joining on the max date to get the most recent visit, we join to all records for that patient and run the math from there. In the commented example, this will give rows 3, 5 and 6.
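The same comparison against every visit, sketched in SQLite via Python's sqlite3 module. Rows 1 to 3 are the question's example; the later PatientA visits and PatientB are hypothetical, added so the result shows a non-matching visit between two matching ones and a patient who never crosses the 10% threshold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id TEXT, studydate TEXT, bmi REAL);
-- rows 1-3 are the question's example; the rest are hypothetical
INSERT INTO patients VALUES
  ('PatientA', '2017-09-11', 60),
  ('PatientA', '2017-12-11', 58),
  ('PatientA', '2018-03-17', 50),
  ('PatientA', '2018-06-20', 56),
  ('PatientA', '2018-09-15', 53),
  ('PatientB', '2017-10-01', 30),
  ('PatientB', '2018-01-05', 31);
""")

vs_first_visit = conn.execute("""
SELECT f.id, f.first_visit, v.studydate,
       f.bmi AS first_bmi, v.bmi AS visit_bmi, f.bmi - v.bmi AS bmi_change
FROM (SELECT a.id, a.first_visit, p.bmi            -- each patient's first-visit BMI
      FROM (SELECT id, MIN(studydate) AS first_visit
            FROM patients GROUP BY id) a
      JOIN patients p ON p.id = a.id AND p.studydate = a.first_visit) f
JOIN patients v ON v.id = f.id                      -- every visit of that patient
WHERE ABS(f.bmi - v.bmi) >= 0.10 * f.bmi
ORDER BY f.id, v.studydate
""").fetchall()

print(vs_first_visit)
# [('PatientA', '2017-09-11', '2018-03-17', 60.0, 50.0, 10.0),
#  ('PatientA', '2017-09-11', '2018-09-15', 60.0, 53.0, 7.0)]
```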
Part 3
A little more complex: getting rows 3, 4, 5 and 6, when row 4 shows less than a 10% change in BMI, means you are now trying to pick out the first date on which the 10% change is seen and display all records from that date on. Let's call the query in part 2 subquery a and go to pseudocode for a moment:
Select id, min(studydate)
from (subquerya) a
group by id
(subquerya) simply stands for the entire query used at the end of part 2. This grabs the study date of the first time a BMI change of over 10% is detected for each patient id (in our comment example, it would be visit 3). Now we can join back to patients, this time getting all records that are equal to or more recent than the min(studydate) of the first time the BMI changed by more than 10% since the first visit:
select a.id, b.studydate, b.bmi
from
(Select id, min(studydate) as min_studydate
from (subquerya) a
group by id) a
inner join patients b on a.id = b.id and a.min_studydate <= b.studydate
This brings back the list of all study dates from the first time a BMI change of more than 10% was detected onwards (3, 4, 5 and 6 from our comment example). Of course, we've now lost the first study date's BMI value, so let's add that back in and bring the query all together:
select a.id, b.StudyDate, b.bmi, c.bmi as start_bmi, c.bmi - b.bmi as bmi_change
from
-- first date on which each patient's BMI moved 10% or more from the first visit
(select id, min(StudyDate) as min_studydate
from (select a.id, a.first_visit, b.StudyDate, a.bmi as first_bmi, b.bmi as visit_bmi, a.bmi - b.bmi as bmi_change
from
(select a.id, first_visit, b.bmi
from
(select id, min(StudyDate) as first_visit
from patients
group by id) a
inner join patients b on a.first_visit = b.StudyDate and a.id = b.id) a
inner join patients b on a.id = b.id
where abs(a.bmi - b.bmi) >= 0.1 * a.bmi) a
group by id) a
-- every visit on or after that date
inner join patients b on a.id = b.id and a.min_studydate <= b.StudyDate
-- first-visit BMI, joined back in for comparison
inner join (select a.id, first_visit, b.bmi
from
(select id, min(StudyDate) as first_visit
from patients
group by id) a
inner join patients b on a.first_visit = b.StudyDate and a.id = b.id) c on c.id = a.id
If I have everything right, this should bring back rows 3,4,5,6 and the change in BMI across each visit. I've left a few more columns in there than need be and it could be cleaned a little, but all logic should be there. I don't have
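On the "for each user" question: SQL has no explicit loop over patients; instead, GROUP BY id or a window's PARTITION BY id makes an aggregate or window function restart for each patient. The sketch below swaps in window functions (this is not the answer's query) to do part 3 without the nested joins: FIRST_VALUE picks each patient's first-visit BMI, a grouped CTE finds the first visit at or beyond the 10% threshold, and the final join returns that visit and everything after it. It runs in SQLite 3.25+ via Python's sqlite3 module. Rows 1 to 3 are the question's example; the later PatientA visits and PatientB are hypothetical. FIRST_VALUE is also available in SQL Server 2012 and later, but this query has not been tested there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id TEXT, studydate TEXT, bmi REAL);
-- rows 1-3 are the question's example; the rest are hypothetical
INSERT INTO patients VALUES
  ('PatientA', '2017-09-11', 60),
  ('PatientA', '2017-12-11', 58),
  ('PatientA', '2018-03-17', 50),
  ('PatientA', '2018-06-20', 56),
  ('PatientA', '2018-09-15', 53),
  ('PatientB', '2017-10-01', 30),
  ('PatientB', '2018-01-05', 31);
""")

from_first_crossing = conn.execute("""
WITH visits AS (
  -- PARTITION BY id: the window restarts for each patient
  SELECT id, studydate, bmi,
         FIRST_VALUE(bmi) OVER (PARTITION BY id ORDER BY studydate) AS first_bmi
  FROM patients
),
crossing AS (
  -- first visit whose BMI is at least 10% away from the first visit's BMI
  SELECT id, MIN(studydate) AS first_crossing
  FROM visits
  WHERE ABS(first_bmi - bmi) >= 0.10 * first_bmi
  GROUP BY id
)
SELECT v.id, v.studydate, v.bmi, v.first_bmi, v.first_bmi - v.bmi AS bmi_change
FROM visits v
JOIN crossing c ON c.id = v.id AND v.studydate >= c.first_crossing
ORDER BY v.id, v.studydate
""").fetchall()

for row in from_first_crossing:
    print(row)
# ('PatientA', '2018-03-17', 50.0, 60.0, 10.0)
# ('PatientA', '2018-06-20', 56.0, 60.0, 4.0)
# ('PatientA', '2018-09-15', 53.0, 60.0, 7.0)
```

The 2018-06-20 visit is included even though its own change is below 10%, because it comes after the first crossing; PatientB never crosses the threshold, so none of their visits appear.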

pad database out with NULL criteria

If I have the following sample table (ordered by ID):
ID Date Type
-- ---- ----
1 01/01/2000 A
2 22/04/1995 A
2 14/02/2001 B
Here you can immediately see that ID=1 does not have a Type=B, but ID=2 does. What I want to do is fill in a line to show this:
ID Date Type
-- ---- ----
1 01/01/2000 A
1 NULL B
2 22/04/1995 A
2 14/02/2001 B
There could potentially be hundreds of different types, so I may end up inserting hundreds of rows per person if they lack hundreds of types!
Is there a general solution to do this?
Could I possibly outer join the table on itself and do it that way?
You can do this with a cross join to generate all the rows and a left join to get the actual data values:
select i.id, s.date, t.type
from (select distinct id from sample) i cross join
(select distinct type from sample) t left join
sample s
on s.id = i.id and
s.type = t.type;
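A sketch of the cross join plus left join in SQLite via Python's sqlite3 module, using the question's sample rows (dates kept as the original text strings). ORDER BY is added so the output follows the desired layout. Because the list of types is taken from the table itself, a type that no ID has at all would not appear; that case would need a separate list of types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (id INTEGER, date TEXT, type TEXT);
INSERT INTO sample VALUES
  (1, '01/01/2000', 'A'),
  (2, '22/04/1995', 'A'),
  (2, '14/02/2001', 'B');
""")

padded = conn.execute("""
SELECT i.id, s.date, t.type
FROM (SELECT DISTINCT id FROM sample) i          -- every id
CROSS JOIN (SELECT DISTINCT type FROM sample) t  -- paired with every type
LEFT JOIN sample s                               -- real row if one exists, else NULLs
  ON s.id = i.id AND s.type = t.type
ORDER BY i.id, t.type
""").fetchall()

print(padded)
# [(1, '01/01/2000', 'A'), (1, None, 'B'), (2, '22/04/1995', 'A'), (2, '14/02/2001', 'B')]
```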

SQL Syntax Issue with getting sum

Ok I have two tables.
Table IDAssoc has the columns bill_id, year, area_id.
Table Bill has the columns bill_id, year, main_id, and amount_due.
I'm trying to get the sum of the amount_due column from the bill table for each of the associated area_ids in the IDAssoc table.
I'm using a SELECT statement to get the sum, joining on bill_id. How can I set this up so it returns a single row for each area_id, with all of that area's associated bills summed? There may be three or four bill_ids associated with each area_id, and I need those summed and returned so I can use this SELECT in another statement. I have a GROUP BY on area_id, but it still returns each row instead of summing them for each area_id. I already specify year and main_id in the WHERE clause to return the data I want, but I can't get the sum to work properly. Sorry, I'm still learning and I'm not sure how to do this. Thanks!
Edit: the query I'm trying so far is basically like the one below:
select a.area_id, sum(b.amount_due)
from IDAssoc a
inner join Bill b
on a.bill_id = b.bill_id
where Bill.year = 2006 and bill.bill_id = 11111
These are just arbitrary numbers.
The data this is returning is like this:
amount_due - area_id
.05 1003
.15 1003
.11 1003
65 1004
55 1004
I need one row returned for each area_id with the amount_due summed. The area_id is only in the assoc table and not in the bill table.
select a.area_id, sum(b.amount_due)
from IDAssoc a
inner join Bill b
on a.bill_id = b.bill_id
where b.year = 2006 and b.bill_id = 11111
group by a.area_id
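You might want to change the inner join to a left join if an IDAssoc row can have no matching Bill. In that case, move the Bill filters into the ON clause; a WHERE condition on b's columns discards the NULL rows and turns the left join back into an inner join:
select a.area_id, coalesce(sum(b.amount_due),0)
from IDAssoc a
left join Bill b
on a.bill_id = b.bill_id
and b.year = 2006 and b.bill_id = 11111
group by a.area_id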
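A sketch in SQLite via Python's sqlite3 module showing the GROUP BY sum and why the year filter belongs in the ON clause when using a LEFT JOIN. The amounts for areas 1003 and 1004 follow the question's output; the bill ids, main_id and area 1005 (which has only a 2005 bill) are hypothetical, and the bill_id = 11111 filter is left out because the question says those numbers are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IDAssoc (bill_id INTEGER, year INTEGER, area_id INTEGER);
CREATE TABLE Bill (bill_id INTEGER, year INTEGER, main_id INTEGER, amount_due REAL);
INSERT INTO IDAssoc VALUES
  (1, 2006, 1003), (2, 2006, 1003), (3, 2006, 1003),
  (4, 2006, 1004), (5, 2006, 1004), (6, 2005, 1005);
INSERT INTO Bill VALUES
  (1, 2006, 7, 0.05), (2, 2006, 7, 0.15), (3, 2006, 7, 0.11),
  (4, 2006, 7, 65), (5, 2006, 7, 55), (6, 2005, 7, 99);
""")

def totals(sql):
    # round in Python so floating-point noise in the sums does not show up
    return [(area, round(total, 2)) for area, total in conn.execute(sql)]

# Filter in WHERE: areas without a 2006 bill disappear (acts like an inner join).
filter_in_where = totals("""
SELECT a.area_id, COALESCE(SUM(b.amount_due), 0)
FROM IDAssoc a
LEFT JOIN Bill b ON a.bill_id = b.bill_id
WHERE b.year = 2006
GROUP BY a.area_id
ORDER BY a.area_id
""")

# Filter in ON: every area is kept, with 0 when it has no 2006 bill.
filter_in_on = totals("""
SELECT a.area_id, COALESCE(SUM(b.amount_due), 0)
FROM IDAssoc a
LEFT JOIN Bill b ON a.bill_id = b.bill_id AND b.year = 2006
GROUP BY a.area_id
ORDER BY a.area_id
""")

print(filter_in_where)  # [(1003, 0.31), (1004, 120.0)]
print(filter_in_on)     # [(1003, 0.31), (1004, 120.0), (1005, 0)]
```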
You are missing the GROUP BY clause:
SELECT a.area_id, SUM(b.amount_due) TotalAmount
FROM IDAssoc a
LEFT JOIN Bill b
ON a.bill_id = b.bill_id
GROUP BY a.area_id