Using Order By with Distinct on a Join (PLSQL)

Using Order By with Distinct on a Join (PLSQL) - sql

I have written a join on some tables and I have ordered the data using two levels of ordering - one of which is the primary key of one table.
Now, with this data sorted I want to then exclude any duplicates from my data using an in-line view and the DISTINCT clause - and this is where I am coming unstuck.
I seem to be able to either sort the data OR distinct it, but never both at the same time. Is there a way around this or have I stumbled upon the SQL equivalent of the uncertainty principle?
This code returns the data sorted, but with duplicates
SELECT
ada.source_tab source_tab
, ada.source_col source_col
, ada.source_value source_value
, ada.ada_id ada_id
FROM
are_aud_data ada
, are_aud_exec_checks aec
, are_audit_elements ael
WHERE
aec.aec_id = ada.aec_id
AND ael.ano_id = aec.ano_id
AND aec.acn_id = 123456
AND ael.ael_type = 1
ORDER BY
CASE
WHEN source_tab = 'Tab type 1' THEN 1
WHEN source_tab = 'Tab type 2' THEN 2
ELSE 3
END
,ada.ada_id ASC;
This code removes the duplicates, but I lose the order...
SELECT DISTINCT source_tab, source_col, source_value FROM (
SELECT
ada.source_tab
, ada.source_col source_col
, ada.source_value source_value
, ada.ada_id ada_id
FROM
are_aud_data ada
, are_aud_exec_checks aec
, are_audit_elements ael
WHERE
aec.aec_id = ada.aec_id
AND ael.ano_id = aec.ano_id
AND aec.acn_id = 123456
AND ael.ael_type = 1
ORDER BY
CASE
WHEN source_tab = 'Tab type 1' THEN 1
WHEN source_tab = 'Tab type 2' THEN 2
ELSE 3
END
,ada.ada_id ASC
)
;
If I try and include 'ORDER BY ada_id' at the end of the outer select, I get the error message 'ORA-01791: not a SELECTed expression' which is infuriating me!!

Why don't you include ada_id at the selected fields of the outer query?

;WITH CTE AS
(
SELECT
ada.source_tab source_tab
, ada.source_col source_col
, ada.source_value source_value
, ada.ada_id ada_id
, ROW_NUMBER() OVER (PARTITION BY [COLUMNS_YOU_WANT TO BE DISTINCT]
ORDER BY [your_columns]) rn
FROM
are_aud_data ada
, are_aud_exec_checks aec
, are_audit_elements ael
WHERE
aec.aec_id = ada.aec_id
AND ael.ano_id = aec.ano_id
AND aec.acn_id = 356441
AND ael.ael_type = 1
ORDER BY
CASE
WHEN source_tab = 'Licensed Inventory' THEN 1
WHEN source_tab = 'CMDB' THEN 2
ELSE 3
END
,ada.ada_id ASC
)
select * from CTE WHERE rn<2

it seems that the ada_id is meaningless in the outer query.
you have removed all those values to boil it down to the distinct source_tab and source_col...
what would you expect the order to be?
you want maybe the minimum ada_id for each table and column set to be the driver for the order - (although the table name seems appropriate to me)
include the minimum ada_id in the inner query (you'll need a group by clause)
then reference that in the outer query and sort on it.

Related

Update failing while using dense_rank and row_number

Here is the sample data of the two tables , just put together for easy reference
I want the upper part of the table [Outbound].[dbo].[Encounter_Out_P] with column "277CA_FILENAME","277CA_FILENAME2","277CA_FILENAME3","277CA_FILENAME4" as NULLS which are sorted by File_Submitted_DT ascending order to be updated with "277FileId" values of the lower table [Outbound].[dbo].[Encounter_Out_277_P] which are sorted by EDIFECSProcessDate in ascending order. Thanks in advance
Here is my code
WITH
cte_2771 AS (
SELECT
"277CA_FILENAME",
File_Submitted_DT,
TRN02_PatientControlNumber
FROM (
SELECT
I."277CA_FILENAME",
I.File_Submitted_DT,
#cte_277.TRN02_PatientControlNumber
,dense_rank() OVER(PARTITION BY #cte_277.TRN02_PatientControlNumber ORDER BY ABS(DATEDIFF(MINUTE, i.File_Submitted_DT, #cte_277.EDIFECSProcessDate)) ASC) rw1
FROM
[Outbound].[dbo].[Encounter_Out_P] I
INNER JOIN #cte_277 ON I.EncounterID = #cte_277.TRN02_PatientControlNumber
--WHERE EncounterID = 'AP230120920712808806'
)t
WHERE
t.rw1 = 1
)
,
cte_2772 AS (
SELECT
"277FileId"
,1 + ((ROW_NUMBER() OVER(PARTITION BY TRN02_PatientControlNumber ORDER BY EDIFECSProcessDate,File_Submitted_DT ASC ) - 1) % 4)rw2
,TRN02_PatientControlNumber
FROM (
SELECT DISTINCT
p."277FileId",
p.EDIFECSProcessDate,
p.TRN02_PatientControlNumber
,p.ID
,cte_2771.File_Submitted_DT
FROM [Outbound].[dbo].[Encounter_Out_277_P] p
INNER JOIN cte_2771 ON cte_2771.File_Submitted_DT < p.EDIFECSProcessDate
WHERE
p.TRN02_PatientControlNumber = cte_2771.TRN02_PatientControlNumber
) t
)
UPDATE cte_2771
SET "277CA_FILENAME" =
COALESCE(cte_2771."277CA_FILENAME", cte_2772."277FileId" )
FROM cte_2771 INNER JOIN cte_2772
ON cte_2772.TRN02_PatientControlNumber = cte_2771.TRN02_PatientControlNumber
WHERE cte_2772.rw2 = 1
I want the output to be like below, (the upperpart) just put together for easy reference
Notes
I have posted the code for "277CA_FILENAME" only, since it is the same for the rest by changing the WHERE condition changes as "WHERE cte_2772.rw2 = 2,3,4"
if I Uncomment the --WHERE EncounterID = 'AP230120920712808806' in the cte_2771 , it is working perfectly, but if I comment it and run for the entire load, one row gets correct and the other one gets NULL

Combine multiple boolean columns into a single column

I am generation reports from an ERP system where users are provided with a check box which return a boolean value for each item selected. The database is hosted on SQL Server.
However, users can select Contracts with other values as well, as shown below.
I would like to capture the Categories as a single column and I don't mind having duplicate rows in the view. I would like the first row to return Contract and the second the other value selected, for the same Reference ID.

You can use apply :
select distinct t.*, tt.category
from t cross apply
( values ('Contracts', t.Contracts),
('Tender', t.Tender),
('Waiver', t.Waiver),
('Quotation', t.Quotation)
) tt(category, flag)
where flag = 1;

I guess a straightforward way is:
select *, 'Contract' as [Category] from [TableOne] where [Contract] = 1
union all select *, 'Tender' as [Category] from [TableOne] where [Tender] = 1
union all select *, 'Waiver' as [Category] from [TableOne] where [Waiver] = 1
union all select *, 'Quotation' as [Category] from [TableOne] where [Quotation] = 1
union all select *, '(none)' as [Category] from [TableOne] where [Contract]+[Tender]+[Waiver]+[Quotation] = 0
order by [Reference ID]
Note that the last line is put there just in case you need to handle the all-zero case.

How to pivot two rows into two columns

I have the following SQL Query:
select
distinct
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
from
Equipment,
Studies,
Equipment_Reserved
where
Studies.Study = 'MAINT19-01'
and
Equipment.idEquipment = Equipment_Reserved.Equipment_idEquipment
and
Studies.idStudies = Equipment_Reserved.Studies_idStudies
and
Equipment.Type = 'Probe'
This query produces the following results:
Equipment_Attached_To Name
2297 R1-P1
2297 R1-P2
2299 R1-P3
I would like to change it to the following:
Equipment_Attached_To Name1 Name2
2297 R1-P1 R1-P2
2299 R1-P3 NULL
Thanks for your help!

I'd first change your query from the old, legacy JOIN syntax to an explicit join as it makes the query easier to understand:
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
I don't think you actually need a PIVOT - I think you can do this with a nested query with the ROW_NUMBER function. I've seen that PIVOT queries often have worse query execution plans than nested-queries.
Let's add ROW_NUMBER (which require an ORDER BY as it's a windowing-function) and a matching ORDER BY in the whole query to make it consistent). Let's also use PARTITION BY so it resets the row-number for each Equipment_Attached_To value:
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name,
ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY [Name]) AS RowNumber
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
ORDER BY
Equipment_Attached_To,
[Name]
This will give output like this:
Equipment_Attached_To Name RowNumber
2297 R1-P1 1
2297 R1-P2 2
2299 R1-P3 1
This can then be split out into explicit columns like so below. The use of MAX() is arbitrary (we could use MIN() instead) and only because we're dealing with a GROUP BY and because the CASE WHEN... restricts the input set to just 1 row anyway.
SELECT
Equipment_Attached_To,
MAX( CASE WHEN RowNumber = 1 THEN [Name] END ) AS Name1,
MAX( CASE WHEN RowNumber = 2 THEN [Name] END ) AS Name2
FROM
(
-- the query from above
)
GROUP BY
Equipment_Attached_To
ORDER BY
Equipment_Attached_To,
Name1,
Name2
So the final query is:
SELECT
Equipment_Attached_To,
MAX( CASE WHEN RowNumber = 1 THEN [Name] END ) AS Name1,
MAX( CASE WHEN RowNumber = 2 THEN [Name] END ) AS Name2
FROM
(
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name,
ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY [Name]) AS RowNumber
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
)
GROUP BY
Equipment_Attached_To
ORDER BY
Equipment_Attached_To,
Name1,
Name2

Let's start with some basics.
To facilitate reading the code, I added alias to the tables using their initials.
Then, I converted the old join syntax which is partly deprecated to use the standard syntax since 1992 (27 years and people still use the old syntax).
Finally, since there are only 2 possible values, we can use MIN and MAX to separate them in 2 columns.
And because we're using aggregate functions, we remove the DISTINCT and use GROUP BY
The code now looks like this:
SELECT er.Equipment_Attached_To,
--Gets the first row for the id
MIN( e.Name) AS Name1,
--If the MAX is equal to the MIN, returns a NULL. If not, it returns the second value.
NULLIF( MAX(e.Name), MIN( e.Name)) AS Name2
FROM Equipment e
JOIN Studies s ON s.idStudies = er.Studies_idStudies
JOIN Equipment_Reserved er ON e.idEquipment = er.Equipment_idEquipment
WHERE s.Study = 'MAINT19-01'
AND e.Type = 'Probe'
GROUP BY er.Equipment_Attached_To;

SQL Server UDF array inputs and outputs

I have a set of columns CODE_1-10, which contain diagnostic codes. I want to create a set of variables CODE_GROUP_1-17, which indicate whether or not one of some particular set of diagnostic codes matches any of the CODE_1-10 variables. For example, CODE_GROUP_1 = 1 if any of CODE_1-10 match either '123' or '456', and CODE_GROUP_2 = 1 if any of CODE_1-10 match '789','111','333','444' or 'foo'.
Here's an example of how you could do this using values constructors.
CASE WHEN (SELECT count(value.val)
FROM (VALUES (CODE_1)
, (CODE_2)
, (CODE_3)
, (CODE_4)
, (CODE_5)
, (CODE_6)
, (CODE_7)
, (CODE_8)
, (CODE_9)
, (CODE_10)
) AS value(val)
WHERE value.val in ('123', '456')
) > 0 THEN 1 ELSE 0 END AS CODE_GROUP_1,
CASE WHEN (SELECT count(value.val)
FROM (VALUES (CODE_1)
, (CODE_2)
, (CODE_3)
, (CODE_4)
, (CODE_5)
, (CODE_6)
, (CODE_7)
, (CODE_8)
, (CODE_9)
, (CODE_10)
) AS value(val)
WHERE value.val in ('789','111','333','444','foo')
) > 0 THEN 1 ELSE 0 END AS CODE_GROUP_2
I am wondering if there is another way to do this that is more efficient. Is there a way to make a CLR UDF that takes an array of CODE_1-10, and outputs a set of columns CODE_GROUP_1-17?

You could at least avoid the repetition of FROM (VALUES ...) like this:
SELECT
CODE_GROUP_1 = COUNT(DISTINCT CASE WHEN val IN ('123', '456') THEN 1 END),
CODE_GROUP_2 = COUNT(DISTINCT CASE WHEN val IN ('789','111','333','444','foo') THEN 1 END),
...
FROM
(
VALUES
(CODE_1),
(CODE_2),
(CODE_3),
(CODE_4),
(CODE_5),
(CODE_6),
(CODE_7),
(CODE_8),
(CODE_9),
(CODE_10)
) AS value(val)
If CODE_1, CODE_2 etc. are column names, you can use the above query as a derived table in CROSS APPLY:
SELECT
...
FROM
dbo.atable -- table containing CODE_1, CODE_2 etc.
CROSS APPLY
(
SELECT ... -- the above query
) AS x
;

Can you create 2 new tables with the columns appended as rows? So one table would be dxCode with a source column if you need to retain the 1-10 value and the dx code and whatever key field(s) you need, the other table would be dxGroup with your 17 groups, the source groupID if you need it, and your target dx values.
Then to determine which codes are in which groups, you can join on your dx fields.

speed up SQL Query

I have a query which is taking some serious time to execute on anything older than the past, say, hours worth of data. This is going to create a view which will be used for datamining, so the expectations are that it would be able to search back weeks or months of data and return in a reasonable amount of time (even a couple minutes is fine... I ran for a date range of 10/3/2011 12:00pm to 10/3/2011 1:00pm and it took 44 minutes!)
The problem is with the two LEFT OUTER JOINs in the bottom. When I take those out, it can run in about 10 seconds. However, those are the bread and butter of this query.
This is all coming from one table. The ONLY thing this query returns differently than the original table is the column xweb_range. xweb_range is a calculated field column (range) which will only use the values from [LO,LC,RO,RC]_Avg where their corresponding [LO,LC,RO,RC]_Sensor_Alarm = 0 (do not include in range calculation if sensor alarm = 1)
WITH Alarm (sub_id,
LO_Avg, LO_Sensor_Alarm, LC_Avg, LC_Sensor_Alarm, RO_Avg, RO_Sensor_Alarm, RC_Avg, RC_Sensor_Alarm) AS (
SELECT sub_id, LO_Avg, LO_Sensor_Alarm, LC_Avg, LC_Sensor_Alarm, RO_Avg, RO_Sensor_Alarm, RC_Avg, RC_Sensor_Alarm
FROM dbo.some_table
where sub_id <> '0'
)
, AddRowNumbers AS (
SELECT rowNumber = ROW_NUMBER() OVER (ORDER BY LO_Avg)
, sub_id
, LO_Avg, LO_Sensor_Alarm
, LC_Avg, LC_Sensor_Alarm
, RO_Avg, RO_Sensor_Alarm
, RC_Avg, RC_Sensor_Alarm
FROM Alarm
)
, UnPivotColumns AS (
SELECT rowNumber, value = LO_Avg FROM AddRowNumbers WHERE LO_Sensor_Alarm = 0
UNION ALL SELECT rowNumber, LC_Avg FROM AddRowNumbers WHERE LC_Sensor_Alarm = 0
UNION ALL SELECT rowNumber, RO_Avg FROM AddRowNumbers WHERE RO_Sensor_Alarm = 0
UNION ALL SELECT rowNumber, RC_Avg FROM AddRowNumbers WHERE RC_Sensor_Alarm = 0
)
SELECT rowNumber.sub_id
, cds.equipment_id
, cds.read_time
, cds.LC_Avg
, cds.LC_Dev
, cds.LC_Ref_Gap
, cds.LC_Sensor_Alarm
, cds.LO_Avg
, cds.LO_Dev
, cds.LO_Ref_Gap
, cds.LO_Sensor_Alarm
, cds.RC_Avg
, cds.RC_Dev
, cds.RC_Ref_Gap
, cds.RC_Sensor_Alarm
, cds.RO_Avg
, cds.RO_Dev
, cds.RO_Ref_Gap
, cds.RO_Sensor_Alarm
, COALESCE(range1.range, range2.range) AS xweb_range
FROM AddRowNumbers rowNumber
LEFT OUTER JOIN (SELECT rowNumber, range = MAX(value) - MIN(value) FROM UnPivotColumns GROUP BY rowNumber HAVING COUNT(*) > 1) range1 ON range1.rowNumber = rowNumber.rowNumber
LEFT OUTER JOIN (SELECT rowNumber, range = AVG(value) FROM UnPivotColumns GROUP BY rowNumber HAVING COUNT(*) = 1) range2 ON range2.rowNumber = rowNumber.rowNumber
INNER JOIN dbo.some_table cds
ON rowNumber.sub_id = cds.sub_id

It's difficult to understand exactly what your query is trying to do without knowing the domain. However, it seems to me like your query is simply trying to find, for each row in dbo.some_table where sub_id is not 0, the range of the following columns in the record (or, if only one matches, that single value):
LO_AVG when LO_SENSOR_ALARM=0
LC_AVG when LC_SENSOR_ALARM=0
RO_AVG when RO_SENSOR_ALARM=0
RC_AVG when RC_SENSOR_ALARM=0
You constructed this query assigning each row a sequential row number, unpivoted the _AVG columns along with their row number, computed the range aggregate grouping by row number and then joining back to the original records by row number. CTEs don't materialize results (nor are they indexed, as discussed in the comments). So each reference to AddRowNumbers is expensive, because ROW_NUMBER() OVER (ORDER BY LO_Avg) is a sort.
Instead of cutting this table up just to join it back together by row number, why not do something like:
SELECT cds.sub_id
, cds.equipment_id
, cds.read_time
, cds.LC_Avg
, cds.LC_Dev
, cds.LC_Ref_Gap
, cds.LC_Sensor_Alarm
, cds.LO_Avg
, cds.LO_Dev
, cds.LO_Ref_Gap
, cds.LO_Sensor_Alarm
, cds.RC_Avg
, cds.RC_Dev
, cds.RC_Ref_Gap
, cds.RC_Sensor_Alarm
, cds.RO_Avg
, cds.RO_Dev
, cds.RO_Ref_Gap
, cds.RO_Sensor_Alarm
--if the COUNT is 0, xweb_range will be null (since MAX will be null), if it's 1, then use MAX, else use MAX - MIN (as per your example)
, (CASE WHEN stats.[Count] < 2 THEN stats.[MAX] ELSE stats.[MAX] - stats.[MIN] END) xweb_range
FROM dbo.some_table cds
--cross join on the following table derived from values in cds - it will always contain 1 record per row of cds
CROSS APPLY
(
SELECT COUNT(*), MIN(Value), MAX(Value)
FROM
(
--construct a table using the column values from cds we wish to aggregate
VALUES (LO_AVG, LO_SENSOR_ALARM),
(LC_AVG, LC_SENSOR_ALARM),
(RO_AVG, RO_SENSORALARM),
(RC_AVG, RC_SENSOR_ALARM)
) x (Value, Sensor_Alarm) --give a name to the columns for _AVG and _ALARM
WHERE Sensor_Alarm = 0 --filter our constructed table where _ALARM=0
) stats([Count], [Min], [Max]) --give our derived table and its columns some names
WHERE cds.sub_id <> '0' --this is a filter carried over from the first CTE in your example

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Using Order By with Distinct on a Join (PLSQL) - sql

Why don't you include ada_id at the selected fields of the outer query?

Related

Update failing while using dense_rank and row_number

Combine multiple boolean columns into a single column

How to pivot two rows into two columns

SQL Server UDF array inputs and outputs

speed up SQL Query

Categories

Resources