How to check if two cells have the same value? - sql

I am using PSQL to query a database. I'm using two tables (d_items and chartevents) which are linked using itemid.
The following code
select
subject_id, hadm_id, icustay_id
, di.itemid, di.label
, charttime, storetime
, value, valuenum, valueuom
, error, resultstatus
from chartevents ce
inner join d_items di
on ce.itemid = di.itemid
where subject_id BETWEEN 1 AND 10
and di.itemid in
(
8368, 51
)
order by subject_id, charttime, itemid
Outputs:
(Link: https://i.imgur.com/trGnwe5.png)
I only want to keep the measurements that include both systolic and diastolic BP. So actually, each (unique) charttime has to have both. How do I achieve this?

You can use window functions or exists. So, here is one way:
with t as (
select subject_id, hadm_id, icustay_id,
di.itemid, di.label,
charttime, storetime,
value, valuenum, valueuom,
error, resultstatus
from chartevents ce inner join
d_items di
on ce.itemid = di.itemid
where subject_id between 1 and 10 and
di.itemid in (8368, 51)
)
select t.*
from (select t.*,
sum( (itemid = 51):: int) over (partition by subject_id, charttime) as cnt_51,
sum( (itemid = 8368):: int) over (partition by subject_id, charttime) as cnt_8368
from t
) t
where cnt_51 > 0 and cnt_8368 > 0
order by subject_id, charttime, itemid;
I am using the itemid to identify the two measurements. You might need to use like on the label.
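The answer mentions exists as an alternative but only shows the window-function version. Below is a minimal sketch of the exists form, run through Python's sqlite3 module so it is self-contained. The four sample rows and the trimmed-down chartevents table are hypothetical stand-ins, not the real data.

```python
import sqlite3

# Hypothetical miniature of chartevents: only the columns the filter needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table chartevents "
    "(subject_id int, charttime text, itemid int, valuenum real)"
)
conn.executemany(
    "insert into chartevents values (?, ?, ?, ?)",
    [
        (1, "2101-01-01 08:00", 51, 120),    # both itemids at this charttime
        (1, "2101-01-01 08:00", 8368, 80),
        (1, "2101-01-01 09:00", 51, 118),    # only itemid 51 -> dropped
        (2, "2101-01-01 08:00", 8368, 75),   # only itemid 8368 -> dropped
    ],
)

# Keep a row only if the *other* itemid exists for the same
# subject_id and charttime.
query = """
select ce.subject_id, ce.charttime, ce.itemid, ce.valuenum
from chartevents ce
where ce.itemid in (51, 8368)
  and exists (
        select 1
        from chartevents other
        where other.subject_id = ce.subject_id
          and other.charttime = ce.charttime
          and other.itemid in (51, 8368)
          and other.itemid <> ce.itemid
      )
order by ce.subject_id, ce.charttime, ce.itemid
"""
paired_rows = conn.execute(query).fetchall()
print(paired_rows)
```

Only the two 08:00 rows for subject 1 survive, because that is the only charttime where both itemids are present.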

I am a developer who uses Oracle, but, I think I can provide some concepts to assist you.
Looking at your table, I think table d_items is merely a label to identify the
data with systolic measurements vs. diastolic measurements. So, we can ignore table
d_items.
I think your goal is to display the systolic BP and the diastolic BP on the same record.
What you want to do is to join table chartevents against itself. I assume that subject_id
and charttime will define a unique set of records. Looking at the output columns, it looks
like value and valuenum represent the same data.
Your table join will look something like this:
Select ... systol.value, diastol.value......
from chartevents systol
join chartevents diastol
on (systol.subject_id = diastol.subject_id
and systol.charttime = diastol.charttime)
where ...
I will leave the rest of the work to you to complete the query.
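One way the self-join could be completed is sketched below, again through Python's sqlite3 module. The sample rows and the output column names value_51 and value_8368 are hypothetical, and the sketch relies on the same assumption as the answer: subject_id plus charttime identifies one measurement per itemid.

```python
import sqlite3

# Hypothetical miniature of chartevents with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table chartevents "
    "(subject_id int, charttime text, itemid int, valuenum real)"
)
conn.executemany(
    "insert into chartevents values (?, ?, ?, ?)",
    [
        (1, "2101-01-01 08:00", 51, 120),
        (1, "2101-01-01 08:00", 8368, 80),
        (1, "2101-01-01 09:00", 51, 118),    # no partner row -> no output
        (2, "2101-01-01 08:00", 8368, 75),   # no partner row -> no output
    ],
)

# Join the table to itself: alias a holds itemid 51, alias b holds 8368.
# The inner join drops any charttime that lacks one of the two.
query = """
select a.subject_id, a.charttime,
       a.valuenum as value_51,
       b.valuenum as value_8368
from chartevents a
join chartevents b
  on a.subject_id = b.subject_id
 and a.charttime = b.charttime
where a.itemid = 51
  and b.itemid = 8368
order by a.subject_id, a.charttime
"""
side_by_side = conn.execute(query).fetchall()
print(side_by_side)
```

Unlike the window-function answer, this produces one output row per charttime with both values side by side, rather than two filtered rows.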

Related

Query keeps giving me duplicate records. How can I fix this?

I wrote a query which uses 2 temp tables. And then joins them into 1. However, I am seeing duplicate records in the student visit temp table. (Query is below). How could this be modified to remove the duplicate records of the visit temp table?
with clientbridge as (Select *
from (Select visitorid, --Visid
roomnumber,
room_id,
profid,
student_id,
cohd.datekey,
RANK() over(PARTITION BY visitorid,student_id,profid ORDER BY cohd.datekey desc) as rn
from university.course_office_hour_bridge cohd
--where student_id = '9999999-aaaa-6634-bbbb-96fa18a9046e'
)
where rn = 1 --visitorid = '999999999999999999999999999999'---'1111111111111111111111111111111' --and pai.datekey is not null --- 00000000000000000000000000
),
-----------------Data Header Table
studentvisit as
(SELECT
--Visit key will allow us to track everything they did within that visit.
distinct visid_visitorid,
--calcualted_visitorid,
uniquevisitkey,
--channel, -- says the room they're in. Channel might not be reliable would need to see how that operates
--office_list, -- add 7 to exact
--user_college,
--first_office_hour_name,
--first_question_time_attended,
studentaccountid_5,
profid_officenumber_8,
studentvisitstarttime,
room_id_115,
--date_time,
qqq144, --Course Name
qqq145, -- Course Office Hour Benefit
qqq146, --Course Office Hour ID
datekey
FROM university.office_hour_details ohd
--left_join niversity.course_office_hour_bridge cohd on ohd.visid_visitorid
where DateKey >='2022-10-01' --between '2022-10-01' and '2022-10-27'
and (qqq146 <> '')
)
select
*
from clientbridge cb inner join studentvisit sv on sv.visid_visitorid = cb.visitorid
I think you may have a better shot if you join the two datasets in the same query where the data is ranked; otherwise the rank from the first query is ignored in the results of the second query. Perhaps something like this:
;with studentvisit as
(SELECT
--Visit key will allow us to track everything they did within that visit.
distinct visid_visitorid,
--calcualted_visitorid,
uniquevisitkey,
--channel, -- says the room they're in. Channel might not be reliable would need to see how that operates
--office_list, -- add 7 to exact
--user_college,
--first_office_hour_name,
--first_question_time_attended,
studentaccountid_5,
profid_officenumber_8,
studentvisitstarttime,
room_id_115,
--date_time,
qqq144, --Course Name
qqq145, -- Course Office Hour Benefit
qqq146, --Course Office Hour ID
datekey
FROM university.office_hour_details ohd
--left_join niversity.course_office_hour_bridge cohd on ohd.visid_visitorid
where DateKey >='2022-10-01' --between '2022-10-01' and '2022-10-27'
and (qqq146 <> '')
)
,clientbridge as (
Select
sv.*,
cohd.visitorid, --Visid
cohd.roomnumber,
cohd.room_id,
cohd.profid,
cohd.student_id,
cohd.datekey as bridge_datekey, -- aliased so it does not clash with sv.datekey
RANK() over(PARTITION BY cohd.visitorid,cohd.student_id,cohd.profid ORDER BY cohd.datekey desc) as rn
from university.course_office_hour_bridge cohd
inner join studentvisit sv on sv.visid_visitorid = cohd.visitorid
)
select
*
from clientbridge WHERE rn=1
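To see what ranking inside the joined query does, here is a small sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The bridge and visits tables and their rows are hypothetical, cut down to the columns that matter for the ranking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table bridge (visitorid text, student_id text, profid text, datekey text);
create table visits (visid_visitorid text, uniquevisitkey text);
insert into bridge values
  ('v1', 's1', 'p1', '2022-10-01'),
  ('v1', 's1', 'p1', '2022-10-05'),
  ('v2', 's2', 'p1', '2022-10-03');
insert into visits values ('v1', 'k1'), ('v1', 'k1b'), ('v2', 'k2');
""")

# Rank the joined rows so that only the latest bridge row per
# (visitorid, student_id, profid) is kept for each visit.
query = """
select visitorid, uniquevisitkey, bridge_datekey
from (
    select b.visitorid, v.uniquevisitkey, b.datekey as bridge_datekey,
           rank() over (partition by b.visitorid, b.student_id, b.profid
                        order by b.datekey desc) as rn
    from bridge b
    join visits v on v.visid_visitorid = b.visitorid
) ranked
where rn = 1
order by visitorid, uniquevisitkey
"""
latest_per_visit = conn.execute(query).fetchall()
print(latest_per_visit)
```

Each visit is paired only with the latest bridge row. Visitor v1 still yields two rows because it has two distinct visits; if one row per visitor is needed, the visits themselves would have to be ranked or aggregated as well.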

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
The 2 rows that are currently giving me the problem are both remits for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per Claim.
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that this does not depend on your DATE_OF_LATEST_REMIT table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
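The tie-breaking behavior of row_number() can be checked on a toy table. The sketch below uses Python's sqlite3 module rather than SQL Server (SQLite 3.25 or newer is needed for window functions), with hypothetical UUID strings. Adding remittance_uuid as a second sort key is an assumption made here so that the choice among tied timestamps is deterministic; the query above leaves that choice to the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table claims_group2 "
    "(remittance_uuid text, claim_id int, billed_amount real, create_datetime text)"
)
conn.executemany(
    "insert into claims_group2 values (?, ?, ?, ?)",
    [
        ("u-a", 1, 10, "2018-08-15 13:07:50.933"),  # tied timestamp
        ("u-b", 1, 10, "2018-08-15 13:07:50.933"),  # tied timestamp
        ("u-c", 1, 10, "2018-01-01 00:00:00.000"),  # older remit
        ("u-d", 2, 5, "2017-12-31 10:00:00.000"),
    ],
)

# row_number() assigns 1, 2, 3... even when timestamps tie, so exactly
# one row per claim_id gets rn = 1.
query = """
select claim_id, remittance_uuid, create_datetime
from (
    select cg2.*,
           row_number() over (partition by claim_id
                              order by create_datetime desc,
                                       remittance_uuid) as rn
    from claims_group2 cg2
    where billed_amount > 0
) t
where rn = 1
order by claim_id
"""
one_per_claim = conn.execute(query).fetchall()
print(one_per_claim)
```

Claim 1 has two remits with the same latest timestamp, yet only one of them is returned.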
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

Group by not working to get count of a column with other max record in sql

I have a table named PublishedData, see image below
I'm trying to get the output like, below image
I think you can use a query like this:
SELECT dt.DistrictName, ISNULL(dt.Content, 'N/A') Content, dt.UpdatedDate, mt.LastPublished, mt.Unpublished
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY DistrictName ORDER BY UpdatedDate DESC, ISNULL(Content, 'zzzzz')) seq
FROM PublishedData) dt
INNER JOIN (
SELECT DistrictName, MAX(LastPublished) LastPublished, COUNT(CASE WHEN IsPublished = 0 THEN 1 END) Unpublished
FROM PublishedData
GROUP BY DistrictName) mt
ON dt.DistrictName = mt.DistrictName
WHERE
dt.seq = 1;
This assumes the first two columns come from the row with the latest UpdatedDate per district, with Content as the tie-breaker.
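Since the original images are not available, here is a sketch of the same pattern on made-up rows, run through Python's sqlite3 module. The sample data is hypothetical, and coalesce stands in for SQL Server's ISNULL because SQLite has no ISNULL function. The Content tie-breaker in the ordering is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table PublishedData "
    "(DistrictName text, Content text, UpdatedDate text, "
    "LastPublished text, IsPublished int)"
)
conn.executemany(
    "insert into PublishedData values (?, ?, ?, ?, ?)",
    [
        ("North", "draft A", "2020-01-01", "2019-12-01", 0),
        ("North", "final B", "2020-02-01", "2020-01-15", 1),
        ("North", None, "2020-01-20", None, 0),
        ("South", "only C", "2020-03-01", "2020-03-01", 1),
    ],
)

# dt picks the most recently updated row per district;
# mt computes the per-district aggregates; the join combines them.
query = """
select dt.DistrictName, coalesce(dt.Content, 'N/A') as Content,
       dt.UpdatedDate, mt.LastPublished, mt.Unpublished
from (
    select *, row_number() over (partition by DistrictName
                                 order by UpdatedDate desc) as seq
    from PublishedData
) dt
join (
    select DistrictName,
           max(LastPublished) as LastPublished,
           count(case when IsPublished = 0 then 1 end) as Unpublished
    from PublishedData
    group by DistrictName
) mt on dt.DistrictName = mt.DistrictName
where dt.seq = 1
order by dt.DistrictName
"""
summary = conn.execute(query).fetchall()
print(summary)
```

A plain GROUP BY cannot return the Content of the latest row, which is why the row is selected with row_number() and the counts are computed in a separate grouped subquery.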
Check out something like this (I don't have your tables, but you will get the idea where to follow with your query):
SELECT DistrictName,
MAX(UpdatedDate),
MAX(LastPublished),
(
SELECT COUNT(*)
FROM PublishedData inr
WHERE inr.DistrictName = outr.DistrictName
AND inr.IsPublished = 0
) AS Unpublished
FROM PublishedData outr
GROUP BY DistrictName
A unique identity column would be required in the PublishedData table to produce that output, because the latest Content cannot be determined from the given schema.
If you want the data apart from Content (DistrictName, UpdatedDate, LastPublished, and the count of unpublished records), use the query below:
select T1.DistrictName,T1.UpdatedDate,T1.LastPublished,T2.Unpublished from
(select DistrictName,Max(UpdatedDate) as UpdatedDate,Max(LastPublished) as LastPublished from PublishedData group by DistrictName) T1
inner join
(select DistrictName,count(IsPublished) as Unpublished from PublishedData where isPublished=0 group by DistrictName) T2 ON T1.DistrictName=T2.DistrictName ORDER BY T2.Unpublished DESC

Select latest from joined table excluding duplicates

I have two joined tables, parent one shows unit's name, child shows recording temperatures, that can be inserted either by automatic process (AUTO) or by user. So for given unit reading records from simple join would look like
UNIT TEMP TIMESTAMP DATA_SOURCE
ABC -20 10:26 AUTO
ABC -19 11:27 USER
ABC -19 11:27 AUTO
The goal is to select the latest temp reading. I can use subquery to do so:
SELECT A.UNIT, B.TEMP, B.TIMESTAMP,B.DATA_SOURCE
FROM units_table A left outer join readings_table B on A.Gkey=B.unit_gkey
WHERE B.TIMESTAMP=
(SELECT MAX(TIMESTAMP) FROM readings_table B1
WHERE A.Gkey=B1.unit_gkey)
It would be simple but in the example above there are two exact timestamps, so I will get TWO readings. In such case I'd like to ignore the AUTO source. Is there an elegant way to do it?
Edit: to be clear I want only ONE ROW result:
ABC -19 11:27 USER
You can do this with row_number() instead:
SELECT ut.UNIT, rt.TEMP, rt.TIMESTAMP, rt.DATA_SOURCE
FROM units_table ut left outer join
(SELECT rt.*,
row_number() over (partition by rt.unit_Gkey
order by timestamp desc,
(case when rt.data_source = 'AUTO' then 1 else 0 end)
) as seqnum
FROM readings_table rt
) rt
on rt.unit_Gkey = ut.gkey
WHERE rt.seqnum = 1;
Note: if you wanted the duplicates, you would use rank() or dense_rank() instead of row_number() (and remove the second clause in the order by).
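The difference between the two functions can be checked against the three sample readings from the question. This sketch uses Python's sqlite3 module (SQLite 3.25 or newer for window functions) and a single hypothetical readings table with a ts column, instead of the two joined tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table readings (unit text, temp int, ts text, data_source text)"
)
conn.executemany(
    "insert into readings values (?, ?, ?, ?)",
    [
        ("ABC", -20, "10:26", "AUTO"),
        ("ABC", -19, "11:27", "USER"),
        ("ABC", -19, "11:27", "AUTO"),
    ],
)

# row_number() with the CASE tie-breaker: USER sorts before AUTO,
# so exactly one row per unit gets seqnum = 1.
row_number_query = """
select unit, temp, ts, data_source
from (
    select r.*,
           row_number() over (partition by unit
                              order by ts desc,
                                       case when data_source = 'AUTO'
                                            then 1 else 0 end) as seqnum
    from readings r
) ranked
where seqnum = 1
"""
single_latest = conn.execute(row_number_query).fetchall()

# rank() without the tie-breaker: both 11:27 rows share rank 1.
rank_query = """
select unit, temp, ts, data_source
from (
    select r.*,
           rank() over (partition by unit order by ts desc) as seqnum
    from readings r
) ranked
where seqnum = 1
order by data_source
"""
tied_latest = conn.execute(rank_query).fetchall()

print(single_latest)
print(tied_latest)
```

The first query returns only the USER reading at 11:27, while the second returns both 11:27 readings.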
http://www.w3schools.com/sql/sql_distinct.asp
Just use the DISTINCT keyword; look at the example! :)

SQL Grouping even and odd

I have to group certain data so that it comes in 2 sets.
The attached image has details of the actual data, the expected result, and the data from the query I used.
I am sure I am missing something in the group by or max option. Please help.
select agrmnt_id ,location_name, slab_no,target_start,target_end, tier_perc ,mod(RANK, 2) col from
(select agrmnt_id ,location_name, slab_no, target as target_start ,LAG(target) OVER (PARTITION BY location_name ORDER BY slab_no DESC)-1 as target_end ,PAY_PREC|| '%' as tier_perc,
DENSE_RANK() over(partition by agrmnt_id order by location_name) RANK
from plb_addnl_slab_details
where agrmnt_id='PLBCAI140262' order by location_name,slab_no
)) group by agrmnt_id,location_name ,slab_no
order by location_name1 ,slab_no1, location_name2 ,slab_no2
If I understand what you want, which is more than a little doubtful, it seems like you are able to generate a list of all the values you want, but you can't get them aligned in two sets? If so I think you need to treat your initial list as a base view and left outer join it to itself, using your col value to decide which is in first set and which in the second.
The criteria for joining seem a bit vague. If I add another ranking to stop the same values appearing twice in the second columns, I can get your expected result with this:
with t as (
select agrmnt_id, location_name, slab_no, target_start, target_end,
tier_perc , mod(col_rnk, 2) col, rnk
from (
select agrmnt_id, location_name, slab_no, target as target_start,
LAG(target) OVER (PARTITION BY location_name
ORDER BY slab_no DESC)-1 as target_end,
SLAB_PERC|| '%' as tier_perc,
DENSE_RANK() over(partition by agrmnt_id order by location_name) col_rnk,
RANK() over(partition by agrmnt_id, slab_no order by location_name) rnk
from plb_addnl_slab_details
where agrmnt_id='PLBCAI140262'
)
)
select t1.agrmnt_id as agrmnt_id_1, t1.location_name as location_name_1,
t1.slab_no as slab_no_1, t1.target_start as target_start_1,
t1.target_end as target_end_1,
t2.agrmnt_id as agrmnt_id_2, t2.location_name as location_name_2,
t2.slab_no as slab_no_2, t2.target_start as target_start_2,
t2.target_end as target_end_2
from t t1
left join t t2 on t2.agrmnt_id = t1.agrmnt_id
and t2.slab_no = t1.slab_no
and t2.rnk = t1.rnk + 1
and t2.col = 0
where t1.col = 1
order by t1.agrmnt_id, t1.location_name, t1.slab_no;
SQL Fiddle. I'm not convinced those join conditions (or the new rank) are quite right but can't really tell without more data, or more information about the logic you want to use. Hopefully this gives you something you can adapt though.