How to Select * Where Everything is Distinct Except One Field

I'm trying to pull 6 records using the code below but there are some cases where the information is updated and therefore it is pulling duplicate records.
My code:
SELECT column2, count(*) as 'Count'
FROM ServiceTable p
join HIERARCHY h
on p.LOCATION_CODE = h.LOCATION
where Report_date between '2017-04-01' and '2017-04-30'
and Column1 = 'Issue '
and LOCATION = '8789'
and
( record_code = 'INCIDENT' or
(
SUBMIT_METHOD = 'Web' and
not exists
(
select *
from ServiceTable p2
where p2.record_code = 'INCIDENT'
and p2.incident_id = p.incident_id
)
)
)
The problem is that instead of the six records it is pulling eight. I would just use distinct * but the file_date is different on the duplicate entries:
FILE_DATE  Incident_ID  Column1  Column2
4/4/17     123          Issue    Service - Red
4/4/17     123          Issue    Service - Blue
4/5/17     123          Issue    Service - Red
4/5/17     123          Issue    Service - Blue
The desired output is:
COLUMN2         COUNT
Service - Red   1
Service - Blue  1
Any help would be greatly appreciated! If you need any other info just let me know.

If you turn your original SELECT statement (without the aggregate function) into a subquery, you can apply DISTINCT to the columns other than the changing date, and then take a COUNT over that result. Don't forget the GROUP BY clause at the end.
SELECT Column2, COUNT(Incident_ID) AS Service_Count
FROM (SELECT DISTINCT Incident_ID, Column1, Column2
FROM ServiceTable p
JOIN HIERARCHY h ON p.LOCATION_CODE = h.LOCATION
WHERE Report_date BETWEEN '2017-04-01' AND '2017-04-30'
AND Column1 = 'Issue '
AND LOCATION = '8789'
AND
( record_code = 'INCIDENT' or
(
SUBMIT_METHOD = 'Web' and
NOT EXISTS
(
SELECT *
FROM ServiceTable p2
WHERE p2.record_code = 'INCIDENT'
AND p2.incident_id = p.incident_id)
)
)
) d
GROUP BY Column2
Also, when you join tables it is good practice to fully qualify the columns you select, for example p.Column2, p.Incident_ID, h.LOCATION. That makes it easier to see which table each column comes from, including the DISTINCT ones, and how they relate.
Finally, COUNT is a reserved word in standard SQL and the name of a built-in function, so it makes a poor column alias. I changed your alias accordingly.
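To see the effect of the DISTINCT subquery in isolation, here is a minimal sketch using Python's built-in sqlite3 module. The table name `service` and the ISO-formatted dates are assumptions made for the demo; the four rows mirror the sample data in the question, and the join and WHERE filters are left out for brevity.

```python
import sqlite3

# In-memory database with a hypothetical table holding the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service (file_date TEXT, incident_id INTEGER, column1 TEXT, column2 TEXT);
INSERT INTO service VALUES
  ('2017-04-04', 123, 'Issue', 'Service - Red'),
  ('2017-04-04', 123, 'Issue', 'Service - Blue'),
  ('2017-04-05', 123, 'Issue', 'Service - Red'),
  ('2017-04-05', 123, 'Issue', 'Service - Blue');
""")

# Deduplicate on every column except file_date, then count per column2.
distinct_counts = conn.execute("""
SELECT column2, COUNT(incident_id) AS service_count
FROM (SELECT DISTINCT incident_id, column1, column2 FROM service) d
GROUP BY column2
ORDER BY column2
""").fetchall()

# Counting the raw rows instead counts every file_date copy.
raw_counts = conn.execute("""
SELECT column2, COUNT(*) AS service_count
FROM service
GROUP BY column2
ORDER BY column2
""").fetchall()

print(distinct_counts)  # one row per column2 value, each counted once
print(raw_counts)       # each column2 value counted twice
```

The inner SELECT DISTINCT leaves out file_date, so the two copies of each incident collapse into one row before the outer COUNT runs.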

If you are using an aggregate function (count), you need a GROUP BY on the columns that are not inside the aggregate function:
SELECT column2, count(*) as 'Count'
FROM ServiceTable p
join HIERARCHY h
on p.LOCATION_CODE = h.LOCATION
where Report_date between '2017-04-01' and '2017-04-30'
and Column1 = 'Issue '
and LOCATION = '8789'
and
( record_code = 'INCIDENT' or
(
SUBMIT_METHOD = 'Web' and
not exists
(
select *
from ServiceTable p2
where p2.record_code = 'INCIDENT'
and p2.incident_id = p.incident_id
)
)
)
group by column2

Related

Query error: Column name ICUSTAY_ID is ambiguous. Using multiple subqueries in BigQuery

I receive the error "Query error: Column name ICUSTAY_ID is ambiguous", which points to the third-to-last line of the code below. Can you please help me? Thank you so much!
I am an SQL beginner.
WITH t AS
(
SELECT
*
FROM
(
SELECT *,
DATETIME_DIFF(CHARTTIME, INTIME, MINUTE) AS pi_recorded
FROM
(
SELECT
*
FROM
(
SELECT * FROM
(SELECT i.SUBJECT_ID, p.dob, i.hadm_id, p.GENDER, a.ETHNICITY, a.ADMITTIME, a.INSURANCE, i.ICUSTAY_ID,
i.DBSOURCE, i.INTIME, DATETIME_DIFF(a.ADMITTIME, p.DOB, DAY) AS age,
CASE
WHEN DATETIME_DIFF(a.ADMITTIME, p.DOB, DAY) <= 32485
THEN 'adult'
WHEN DATETIME_DIFF(a.ADMITTIME, p.DOB, DAY) > 32485
then '>89'
END AS age_group
FROM `project.mimic3.ICUSTAYS` AS i
INNER JOIN `project.mimic3.PATIENTS` AS p ON i.SUBJECT_ID = p.SUBJECT_ID
INNER JOIN `project.mimic3.ADMISSIONS` AS a ON i.HADM_ID = a.HADM_ID)
WHERE age >= 6570
) AS t1
LEFT JOIN
(
SELECT ITEMID, ICUSTAY_ID, CHARTTIME, VALUE, FROM `project.mimic3.CHARTEVENTS`
WHERE ITEMID = 551 OR ITEMID = 552 OR ITEMID = 553 OR ITEMID = 224631
OR ITEMID = 224965 OR ITEMID = 224966
) AS t2
ON t1.ICUSTAY_ID = t2.ICUSTAY_ID
)
)
WHERE ITEMID IN (552, 553, 224965, 224966) AND pi_recorded <= 1440
)
SELECT ICUSTAY_ID #### Query error: Column name ICUSTAY_ID is ambiguous
FROM t
GROUP BY ICUSTAY_ID;

Both t1 and t2 have a column called ICUSTAY_ID. When you join them with SELECT * into a single dataset, you end up with two columns of the same name, so a later reference to ICUSTAY_ID cannot be resolved to one of them.
You need to alias these columns in your code, or leave one of them out if you don't need both.
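A minimal sketch of that fix, run against SQLite through Python's sqlite3 module rather than BigQuery. The tables `stays` and `events` and their rows are hypothetical stand-ins for the two subqueries t1 and t2. The point is only that the join lists its columns explicitly, so the CTE exposes a single ICUSTAY_ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stays  (icustay_id INTEGER, intime TEXT);     -- stands in for t1
CREATE TABLE events (icustay_id INTEGER, itemid INTEGER);  -- stands in for t2
INSERT INTO stays  VALUES (1, '2020-01-01 08:00'), (2, '2020-01-02 09:00');
INSERT INTO events VALUES (1, 552), (1, 553);
""")

# Instead of SELECT * over the join, pick the key from one side only,
# so the CTE has exactly one column named icustay_id.
stay_ids = conn.execute("""
WITH t AS (
  SELECT t1.icustay_id, t1.intime, t2.itemid
  FROM stays AS t1
  LEFT JOIN events AS t2 ON t1.icustay_id = t2.icustay_id
)
SELECT icustay_id
FROM t
GROUP BY icustay_id
ORDER BY icustay_id
""").fetchall()

print(stay_ids)
```

If both key columns are needed downstream, give the second one its own alias (for example `t2.icustay_id AS event_icustay_id`, a name chosen here purely for illustration).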

Select only the records with same values

I am working on a SQL statement that will become a part of a view. What I need is to extract only the records that have the same unique key twice. The query looks like below right now.
select distinct
rscmaster_no_in, rsc_no_in, calendar_year, calendar_month,
Wstat_Abrv_Ch,
h.Wstat_no_in, Staffing_Calendar_Date, payhours,
l.OTStatus
from
vw_all_ts_hire h
left join
MCFRS_OTStatus_Lookup l on l.wstat_no_in = h.Wstat_no_in
where
rscmaster_no_in in (select rscmaster_no_in from vw_rsc_ECC_splty)
and Wstat_Abrv_Ch <> ''
and h.Wstat_no_in in (103, 107)
and l.OTStatus in ('ECCOTRemove', 'ECCOTSignup')
and Staffing_Calendar_Date = '2020-11-01' -- only for the testing purposes. Will be removed later.
order by
RscMaster_no_in
The result I get from the query above is:
I need to modify the SQL statement so that the end result is like below:
How can I modify the above statement to spit out the end result like that?

Use the analytic count(*) over () function.
with cte as (
select
count(*) over (partition by YourUniqueKey) as MyRowCount,
{rest of your query}
)
select *
from cte
where MyRowCount = 2;
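As a rough illustration of the template above, the following sketch runs the same pattern in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table `hire` and its rows are invented for the demo; `rscmaster_no_in` plays the role of YourUniqueKey.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hire (rscmaster_no_in INTEGER, otstatus TEXT);  -- hypothetical demo table
INSERT INTO hire VALUES
  (10, 'ECCOTRemove'), (10, 'ECCOTSignup'),
  (20, 'ECCOTSignup'),
  (30, 'ECCOTRemove'), (30, 'ECCOTSignup');
""")

# Tag every row with the number of rows sharing its key,
# then keep only the keys that occur exactly twice.
paired_rows = conn.execute("""
WITH cte AS (
  SELECT COUNT(*) OVER (PARTITION BY rscmaster_no_in) AS my_row_count,
         rscmaster_no_in,
         otstatus
  FROM hire
)
SELECT rscmaster_no_in, otstatus
FROM cte
WHERE my_row_count = 2
ORDER BY rscmaster_no_in, otstatus
""").fetchall()

print(paired_rows)  # key 20 appears once, so it is filtered out
```

The window count has to be computed in a CTE or subquery because a window function cannot be referenced in the WHERE clause of the same SELECT.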

This should give you the results you want (performance depends on your indexes and table design).
It moves your core logic into a subselect that returns only the IDs that occur more than once (count > 1).
The outer query then selects all the data you need, but only for the IDs returned by that subselect.
select distinct rscmaster_no_in,rsc_no_in, calendar_year, calendar_month,
Wstat_Abrv_Ch, h.Wstat_no_in, Staffing_Calendar_Date, payhours ,l.OTStatus
from vw_all_ts_hire h
left join MCFRS_OTStatus_Lookup l on l.wstat_no_in = h.Wstat_no_in
WHERE rscmaster_no_in IN (
SELECT rscmaster_no_in
from vw_all_ts_hire h
left join MCFRS_OTStatus_Lookup l on l.wstat_no_in = h.Wstat_no_in
where rscmaster_no_in in (select rscmaster_no_in from vw_rsc_ECC_splty)
and Wstat_Abrv_Ch <> ''
and h.Wstat_no_in in (103, 107)
and l.OTStatus in ('ECCOTRemove', 'ECCOTSignup')
and Staffing_Calendar_Date = '2020-11-01' -- only for the testing purposes. Will be removed later.
GROUP BY rscmaster_no_in
HAVING COUNT(*) > 1
)
order by RscMaster_no_in

You can use COUNT(*) OVER () window function such as
SELECT *
FROM
(
SELECT COUNT(*) OVER (PARTITION BY rscmaster_no_in) AS cnt,
t.*
FROM tab t
) t
WHERE cnt>1
AND OTStatus = 'ECCOTRemove'

This may help you :
select * from (
select distinct
rscmaster_no_in, rsc_no_in, calendar_year, calendar_month,
Wstat_Abrv_Ch,
h.Wstat_no_in, Staffing_Calendar_Date, payhours,
l.OTStatus,
COUNT(*) OVER (PARTITION BY rscmaster_no_in) AS uniqueCount
from
vw_all_ts_hire h
left join
MCFRS_OTStatus_Lookup l on l.wstat_no_in = h.Wstat_no_in
where
rscmaster_no_in in (select rscmaster_no_in from vw_rsc_ECC_splty)
and Wstat_Abrv_Ch <> ''
and h.Wstat_no_in in (103, 107)
and l.OTStatus in ('ECCOTRemove', 'ECCOTSignup')
and Staffing_Calendar_Date = '2020-11-01' -- only for the testing purposes. Will be removed later.
) innerResult
where uniqueCount=2 --Or uniqueCount>1, depending on your business rules
order by
RscMaster_no_in

How to Apply Conditional Logic to Where Statement

I can't seem to figure out how to set the logic up for my particular problem. I'm trying to count the number of times the word "Service" appears but only when the RECORD_CODE is INCIDENT. When the RECORD is INCIDENT-UPDATE, it is normally already somewhere else as an INCIDENT so I exclude them to keep from duplicating my data.
However, there are a small number of cases where the SUBMIT_METHOD is "WEB" and the only record is an INCIDENT-UPDATE. I cannot figure out how to select only the rows where RECORD = 'INCIDENT', except when a row has a SUBMIT_METHOD of "WEB" and there is no row for that report # with a RECORD of INCIDENT; those rows should be counted too. It could be a simple problem and I'm just overthinking it. Any help would be GREATLY appreciated!
My query:
SELECT column2, count(*) as 'COUNT'
from Service.Table
where date between '1/1/17' and '1/31/17'
and column1 = 'Issue'
and RECORD = 'INCIDENT'
group by column2
Sample of the data:
REPORT #  RECORD           SUBMIT_METHOD  SUBMIT_DATE  COLUMN2
1234      Incident         Web            1/1/2017     Service
1234      Incident-Update  Web            1/1/2017     Service
1235      Incident         Phone          1/15/2017    Other
1235      Incident-Update  Phone          1/15/2017    Other
1236      Incident-Update  Web            1/18/2017    Service
The expected output in this case would be:
COLUMN2 COUNT
Service 3
If I can provide any other info just let me know!

You are looking for a group by like
select column2, count(*)
from tbl1
where SUBMIT_METHOD = 'Web'
group by column2;

;With cte(REPORT#,RECORD ,SUBMIT_METHOD,SUBMIT_DATE,COLUMN2)
AS
(
SELECT 1234,'Incident' ,'Web' , '1/1/2017' ,'Service' Union all
SELECT 1234,'Incident-Update' ,'Web' , '1/1/2017' ,'Service' Union all
SELECT 1235,'Incident' ,'Phone', '1/15/2017', 'Other' Union all
SELECT 1235,'Incident-Update' ,'Phone', '1/15/2017', 'Other' Union all
SELECT 1236,'Incident-Update' ,'Web' , '1/18/2017', 'Service'
)
SELECT COLUMN2
,CountCOLUMN2
FROM (
SELECT *
,COUNT(COLUMN2) OVER (
PARTITION BY COLUMN2 ORDER BY COLUMN2
) CountCOLUMN2
,ROW_NUMBER() OVER (
PARTITION BY COLUMN2 ORDER BY COLUMN2
) Seq
FROM cte
) Dt
WHERE SUBMIT_DATE BETWEEN '1/1/17'
AND '1/31/17'
AND RECORD = 'INCIDENT'
ORDER BY 1 DESC
Output
COLUMN2 CountCOLUMN2
--------------------
Service 3
Other 2

You could use a NOT EXISTS subquery to ensure there is no other row with the same Report # and a record type of INCIDENT:
select *
from Service.Table t1
where date between '1/1/17' and '1/31/17' and
column1 = 'Issue' and
(
record = 'Incident' or
(
record = 'Incident-Update' and
submit_method = 'Web' and
not exists
(
select *
from Service.Table t2
where t2.record = 'INCIDENT'
and t2.[Report #] = t1.[Report #]
)
)
)
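Here is a small runnable sketch of that NOT EXISTS pattern, using Python's sqlite3 module and the five sample rows from the question. The table name `reports`, the column name `report_no`, and the ISO date format are assumptions for the demo. The column1 filter is omitted because the sample data has no such column, and a GROUP BY is added to produce the per-column2 count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reports (report_no INTEGER, record TEXT, submit_method TEXT,
                      submit_date TEXT, column2 TEXT);
INSERT INTO reports VALUES
  (1234, 'Incident',        'Web',   '2017-01-01', 'Service'),
  (1234, 'Incident-Update', 'Web',   '2017-01-01', 'Service'),
  (1235, 'Incident',        'Phone', '2017-01-15', 'Other'),
  (1235, 'Incident-Update', 'Phone', '2017-01-15', 'Other'),
  (1236, 'Incident-Update', 'Web',   '2017-01-18', 'Service');
""")

# Keep every Incident row, plus Web Incident-Update rows whose report
# has no Incident row at all (report 1236 in the sample).
counts = conn.execute("""
SELECT column2, COUNT(*) AS cnt
FROM reports t1
WHERE submit_date BETWEEN '2017-01-01' AND '2017-01-31'
  AND (
        record = 'Incident'
        OR (
             record = 'Incident-Update'
             AND submit_method = 'Web'
             AND NOT EXISTS (
                   SELECT 1
                   FROM reports t2
                   WHERE t2.record = 'Incident'
                     AND t2.report_no = t1.report_no
                 )
           )
      )
GROUP BY column2
ORDER BY column2
""").fetchall()

print(counts)
```

On these rows the query counts reports 1234 and 1236 under Service and report 1235 under Other. Note that SQLite compares strings case-sensitively by default, so the literal in the subquery has to match the stored 'Incident' exactly.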

SQL need to add outerjoin to the query below

In the SQL below, I need to add two columns to the result:
1) Local_code
2) Local_CPTY_SYS_ID
Both are in the HSBC_LOCAL_INVOL_PARTY table.
So far I have tried to add
select local_code from HSBC_LOCAL_INVOL_PARTY
h join t_cdr T2
on T2.counterparty_new = h.entity_code
but that doesn't work. It needs an explicit outer join at the end. Please help.
SELECT
T2.counterparty_new,
T2.bis_entity_type_original,
T2.counterparty_new_desc,
T2.counterparty_new_attribute_6,
T2.method_original,
T2.netting_agreement_reference,
T2.internal_rating_new,
T2.counterparty_type_original,
T2.obligor_grade_new,
T2.pd_pre_floor_new,
T2.pd_new,
T2.lgd,
T2.rwa
from t_cdr T2,
(
SELECT * FROM (
SELECT
FINAL.FILTER_MARKER,
FINAL.entity_code
FROM (
SELECT
FILTER_POP.entity_code,
FILTER_POP.FILTER_MARKER
FROM (
SELECT
CASE
WHEN CONCAT(Dlgd,unfloored_lgd) IS NOT NULL
THEN 'EXCLUDE'
WHEN CONCAT(Dlgd,unfloored_lgd) IS NULL
THEN 'INCLUDE'
END AS FILTER_MARKER,
entity_code,
Dlgd,
unfloored_lgd
FROM
HSBC_LOCAL_INVOL_PARTY
WHERE
((HSBC_LOCAL_INVOL_PARTY.entity_code) NOT LIKE '%DUM%')
AND
((HSBC_LOCAL_INVOL_PARTY.entity_code) NOT LIKE '%HSBC%')
) FILTER_POP
GROUP BY
FILTER_POP.entity_code,
FILTER_POP.FILTER_MARKER) FINAL
GROUP BY
FINAL.FILTER_MARKER,
FINAL.entity_code
ORDER BY
FINAL.entity_code)
PIVOT
(
COUNT(FILTER_MARKER)
FOR FILTER_MARKER IN ('INCLUDE' AS INCLUDE,'EXCLUDE' AS EXCLUDE)
)
WHERE INCLUDE = 1 AND EXCLUDE = 0
) ENTITY_FILTER
WHERE ENTITY_FILTER.entity_code = T2.counterparty_new
AND T2.method_original = 'ADV'
ORDER BY T2.rwa DESC

Solved it: Look at the last few lines. Took a while but optimized it as well for performance.
SELECT
T2.counterparty_new,
T2.bis_entity_type_original,
T2.counterparty_new_desc,
T2.counterparty_new_attribute_6,
T2.method_original,
T2.netting_agreement_reference,
T2.internal_rating_new,
T2.counterparty_type_original,
T2.obligor_grade_new,
T2.pd_pre_floor_new,
T2.pd_new,
T2.lgd,
HSBC_LOCAL_INVOL_PARTY.local_code,
T2.rwa
from t_cdr T2,
(
SELECT * FROM (
SELECT
FINAL.FILTER_MARKER,
FINAL.entity_code
FROM (
SELECT
FILTER_POP.entity_code,
FILTER_POP.FILTER_MARKER
FROM (
SELECT
CASE
WHEN CONCAT(Dlgd,unfloored_lgd) IS NOT NULL
THEN 'EXCLUDE'
WHEN CONCAT(Dlgd,unfloored_lgd) IS NULL
THEN 'INCLUDE'
END AS FILTER_MARKER,
entity_code,
Dlgd,
unfloored_lgd
FROM
HSBC_LOCAL_INVOL_PARTY
WHERE
((HSBC_LOCAL_INVOL_PARTY.entity_code) NOT LIKE '%DUM%')
AND
((HSBC_LOCAL_INVOL_PARTY.entity_code) NOT LIKE '%HSBC%')
) FILTER_POP
GROUP BY
FILTER_POP.entity_code,
FILTER_POP.FILTER_MARKER) FINAL
GROUP BY
FINAL.FILTER_MARKER,
FINAL.entity_code
ORDER BY
FINAL.entity_code)
PIVOT
(
COUNT(FILTER_MARKER)
FOR FILTER_MARKER IN ('INCLUDE' AS INCLUDE,'EXCLUDE' AS EXCLUDE)
)
WHERE INCLUDE = 1 AND EXCLUDE = 0
) ENTITY_FILTER,HSBC_LOCAL_INVOL_PARTY
WHERE ENTITY_FILTER.entity_code = T2.counterparty_new
AND ENTITY_FILTER.entity_code = HSBC_LOCAL_INVOL_PARTY.entity_code(+)
AND T2.method_original = 'ADV'
ORDER BY T2.rwa DESC
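The `(+)` in the query above is Oracle's legacy outer-join operator: placing it on the HSBC_LOCAL_INVOL_PARTY side keeps every ENTITY_FILTER row even when no matching party row exists. The ANSI equivalent is a LEFT JOIN. The sketch below shows that equivalent form in SQLite via Python's sqlite3 module; the two tiny tables and their values are hypothetical stand-ins for the real ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_filter (entity_code TEXT);               -- stands in for ENTITY_FILTER
CREATE TABLE party (entity_code TEXT, local_code TEXT);      -- stands in for HSBC_LOCAL_INVOL_PARTY
INSERT INTO entity_filter VALUES ('A'), ('B');
INSERT INTO party VALUES ('A', 'LOC-A');
""")

# ANSI form of: WHERE f.entity_code = p.entity_code(+)
# Rows from entity_filter survive even without a match in party.
joined = conn.execute("""
SELECT f.entity_code, p.local_code
FROM entity_filter f
LEFT JOIN party p ON f.entity_code = p.entity_code
ORDER BY f.entity_code
""").fetchall()

print(joined)  # entity 'B' has no party row, so its local_code is NULL (None)
```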

Distinct keyword not fetching results in Oracle

I have the following query, where I need unique records per patient_id, meaning patient_id should not be duplicated. Each time I try to execute the query, the DB seems to hang or take hours; I'm not sure which. I need my records to load quickly. Any quick resolution will be highly appreciated.
SELECT DISTINCT a.patient_id,
a.study_id,
a.procstep_id,
a.formdata_seq,
0,
(SELECT MAX(audit_id)
FROM audit_info
WHERE patient_id =a.patient_id
AND study_id = a.study_id
AND procstep_id = a.procstep_id
AND formdata_seq = a.formdata_seq
) AS data_session_id
FROM frm_rg_ps_rg a,
PATIENT_STUDY_STEP pss
WHERE ((SELECT COUNT(*)
FROM frm_rg_ps_rg b
WHERE a.patient_id = b.patient_id
AND a.formdata_seq = b.formdata_seq
AND a.psdate IS NOT NULL
AND b.psdate IS NOT NULL
AND a.psresult IS NOT NULL
AND b.psresult IS NOT NULL) = 1)
OR NOT EXISTS
(SELECT *
FROM frm_rg_ps_rg c
WHERE a.psdate IS NOT NULL
AND c.psdate IS NOT NULL
AND a.psresult IS NOT NULL
AND c.psresult IS NOT NULL
AND a.patient_id = c.patient_id
AND a.formdata_seq = c.formdata_seq
AND a.elemdata_seq != c.elemdata_seq
AND a.psresult != c.psresult
AND ((SELECT (a.psdate - c.psdate) FROM dual)>=7
OR (SELECT (a.psdate - c.psdate) FROM dual) <=-7)
)
AND a.psresult IS NOT NULL
AND a.psdate IS NOT NULL;

For a start, you have a Cartesian product with PATIENT_STUDY_STEP (pss).
It is not joined to anything, so every row of frm_rg_ps_rg is paired with every row of pss.
select *
from (select t.*
,count (*) over (partition by patient_id) as cnt
from frm_rg_ps_rg t
) t
where cnt = 1
;
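To illustrate why an unjoined table in the FROM list is so costly, here is a small sketch with Python's sqlite3 module. The tables `frm` and `pss` and their contents are made up for the demo, and the join column `patient_id` on `pss` is an assumption, since the question does not show how the two tables relate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE frm (patient_id INTEGER);              -- stands in for frm_rg_ps_rg
CREATE TABLE pss (patient_id INTEGER, step TEXT);   -- stands in for PATIENT_STUDY_STEP
INSERT INTO frm VALUES (1), (2), (3);
INSERT INTO pss VALUES (1, 's1'), (1, 's2'), (2, 's1'), (3, 's1');
""")

# Comma-separated tables with no join condition: every row pairs with every row.
cross_count = conn.execute("SELECT COUNT(*) FROM frm a, pss").fetchone()[0]

# With a join condition, only matching rows are paired.
joined_count = conn.execute("""
SELECT COUNT(*)
FROM frm a
JOIN pss ON pss.patient_id = a.patient_id
""").fetchone()[0]

print(cross_count, joined_count)  # 3 * 4 rows versus only the matching pairs
```

With real table sizes the product grows multiplicatively, and the correlated subqueries in the original query would then be evaluated for each of those paired rows.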
