SQL query on denormalized tables - sql

I have the two denormalized tables shown below, without any data constraints. Records_audit will not have duplicate rows for the same audit_id, even though the table has no constraints.
I need a SQL query that extracts all fields of Records_audit plus an additional matching column, refgroup_Name, from the second table, where AuditID matches in both tables, printedCount is greater than 1, and R_status is 'Y'. I tried a left join, but it selects all records.
Can you help me correct my query? I tried the query below, but it selects unwanted rows from the second table:
SELECT a.*, d.refgroup_Name
from Records_audit a
left join Patients_audit d ON ( (a.AUDITID=d.AUDITID )
and (a.printedCount> 1)
AND (a.R_status='Y')
)
ORDER BY 3 DESC
Records_audit:
AuditID  record_id  created_d_t              patient_ID  branch_ID  R_status  printedCount
1        Img77862   2020-02-01 08:40:12.614  xq123       aesop96    Y         2
2        Img87962   2021-02-01 08:40:12.614  xy123       aesop96    Y         1
Patients_audit:
AuditID  dept_name    visited_d_t              patient_ID  branch_ID  emp_No  refgroup_Name
1        Imaging      2020-02-01 11:41:12.614  xq123       aesop96    976581  finnyTown
1        EMR          2020-02-01 12:42:12.614  xq123       aesop96    976581  finnyTown
2        Imaging      2021-02-01 12:40:12.614  xy123       himpo77    976581  georgeTown
2        FrontOffice  2021-02-01 13:41:12.614  xy123       himpo77    976581  georgeTown
2        EMR          2021-02-01 14:42:12.614  xy123       himpo77    976581  georgeTown

A left join will give you all records in the "left" table, that is the from table. Since you have no where clause to constrain the query you're going to get all records in Records_audit.
See Visual Representation of SQL Joins for more about joins.
If your intent is to get all records in Records_audit which have an R_status of Y and a printedCount > 1, put those into a where clause.
select ra.*, pa.refgroup_name
from records_audit ra
left join patients_audit pa on ra.auditId = pa.auditId
where ra.printedCount > 1
and ra.r_status = 'Y'
order by ra.created_d_t desc
This returns every row in Records_audit that satisfies the where clause. The left join keeps those rows even when they have no matching Patients_audit record, in which case refgroup_name is null.
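The difference between filtering in the ON clause and filtering in the WHERE clause can be checked against the sample rows from the question. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database (an assumption, since the question does not name a database engine), and it loads only the Patients_audit columns needed for the join.

```python
import sqlite3

# In-memory SQLite database: an assumption made only so the example is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Records_audit (AuditID INTEGER, record_id TEXT, created_d_t TEXT,
                            patient_ID TEXT, branch_ID TEXT,
                            R_status TEXT, printedCount INTEGER);
-- Only the columns needed for the join are loaded here.
CREATE TABLE Patients_audit (AuditID INTEGER, dept_name TEXT, refgroup_Name TEXT);
INSERT INTO Records_audit VALUES
  (1, 'Img77862', '2020-02-01 08:40:12.614', 'xq123', 'aesop96', 'Y', 2),
  (2, 'Img87962', '2021-02-01 08:40:12.614', 'xy123', 'aesop96', 'Y', 1);
INSERT INTO Patients_audit VALUES
  (1, 'Imaging', 'finnyTown'), (1, 'EMR', 'finnyTown'),
  (2, 'Imaging', 'georgeTown'), (2, 'FrontOffice', 'georgeTown'),
  (2, 'EMR', 'georgeTown');
""")

# Filters in ON: every Records_audit row survives; failing rows get NULLs.
on_rows = conn.execute("""
SELECT a.AuditID, d.refgroup_Name
FROM Records_audit a
LEFT JOIN Patients_audit d
  ON a.AuditID = d.AuditID AND a.printedCount > 1 AND a.R_status = 'Y'
ORDER BY a.AuditID
""").fetchall()

# Filters in WHERE: Records_audit rows that fail the filter are removed.
where_rows = conn.execute("""
SELECT a.AuditID, d.refgroup_Name
FROM Records_audit a
LEFT JOIN Patients_audit d ON a.AuditID = d.AuditID
WHERE a.printedCount > 1 AND a.R_status = 'Y'
ORDER BY a.AuditID
""").fetchall()

# Same query with DISTINCT, collapsing the repeated Patients_audit matches.
distinct_rows = conn.execute("""
SELECT DISTINCT a.AuditID, d.refgroup_Name
FROM Records_audit a
LEFT JOIN Patients_audit d ON a.AuditID = d.AuditID
WHERE a.printedCount > 1 AND a.R_status = 'Y'
""").fetchall()

print(on_rows)
print(where_rows)
print(distinct_rows)
```

With the filter in ON, audit 2 still comes back, with a null refgroup_Name; with the filter in WHERE it is gone. Audit 1 appears twice because Patients_audit holds two rows for it. If one row per audit is what you want (an assumption about the intent), select distinct is one option.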
Other notes:
Your order by 3 relies on the order in which columns were declared in Records_audit. If you mean to order by records_audit.created_d_t write order by a.created_d_t.
If your query is making an assumption about the data, add a constraint to make sure it is true and remains true.

Related

SQL - JOIN 2 tables with either NULL OR MAX

I have two tables in Teradata that I need to LEFT JOIN.
The first one includes clients, the second their details with the validity end date. NULL represents currently valid.
Table1
client_id
1
2
Table2
client_id valid_end
1 31.12.2021
1 31.12.2022
2 31.12.2020
2 null
I need to left join the two tables using the most recent record for each client from Table2.
In case there is a currently valid record with NULL, it is used. If there is not any NULL record, the highest date is used.
Result
client_id valid_end
1 31.12.2022
2 null
Tried a lot using QUALIFY and MAX, but never reached the requested result. Thanks for advice.
Use ROW_NUMBER instead of MAX; NULLS FIRST sorts NULL before the highest date:
qualify
row_number()
over (partition by client_id
order by valid_end desc NULLS FIRST) = 1
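To see the ranking work end to end, here is a small sketch on the sample data. It is not Teradata: it runs on SQLite through Python's sqlite3 module (an assumption, chosen only so it is runnable). SQLite has no QUALIFY, so the row_number filter moves into a derived table; the sort key `(valid_end IS NULL) DESC` stands in for NULLS FIRST; and the dates are rewritten as ISO strings so that text ordering matches date ordering. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (client_id INTEGER);
CREATE TABLE Table2 (client_id INTEGER, valid_end TEXT);
INSERT INTO Table1 VALUES (1), (2);
-- Dates stored as ISO text (assumption) so DESC sorts chronologically.
INSERT INTO Table2 VALUES
  (1, '2021-12-31'), (1, '2022-12-31'),
  (2, '2020-12-31'), (2, NULL);
""")

latest = conn.execute("""
SELECT t1.client_id, r.valid_end
FROM Table1 t1
LEFT JOIN (
    SELECT client_id, valid_end,
           ROW_NUMBER() OVER (
               PARTITION BY client_id
               -- NULL rows first, then the highest date
               ORDER BY (valid_end IS NULL) DESC, valid_end DESC
           ) AS rn
    FROM Table2
) r ON r.client_id = t1.client_id AND r.rn = 1
ORDER BY t1.client_id
""").fetchall()

print(latest)
```

Client 1 gets its highest date and client 2 gets the NULL (currently valid) row, which matches the requested result.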

unable to use LIMIT when using correlated query

I have two tables in Postgres. I want to get the data for the latest 3 records from a table.
Below is the query:
select two.sid as sid,
two.sidname as sidname,
two.myPercent as mypercent,
two.saccur as saccur,
one.totalSid as totalSid
from table1 one,table2 two
where one.sid = two.sid;
The above query displays all records that satisfy the condition one.sid = two.sid. I want to get only the data for the 3 most recent records (4, 5, 6) from table2.
I know in Postgres we can use limit to limit the rows to retrieve, but here in table2 for each ID I have multiple rows. So I guess I cannot use limit on table2 but should use on table1. Any suggestions?
table1:
sid totalSid
1 10
2 20
3 30
4 40
5 50
6 60
table2:
sid sidname myPercent saccur
1 aaaa 11 11t
1 bbb 13 13g
1 ccc 11 11g
1 qw 88 88k
//more data for 2,3,4,5....
6 xyz 89 895W
6 xyz1 90 90k
6 xyz2 91 91p
6 xyz3 92 92q
Given a changed understanding of the question, a simple subquery and join should suffice.
We select everything from table1, limited to 3 records in descending sid order. This gives us the 3 most recent SIDs, which we then join to table2 to get the other SID-relevant data. The assumption here is that SID is unique in table1 and that "most recent" means the records with the highest SID.
SELECT two.sid as sid
, two.sidname as sidname
, two.myPercent as mypercent
, two.saccur as saccur
, one.totalSid as totalSid
FROM (SELECT * FROM table1 ORDER BY SID DESC LIMIT 3) one
INNER JOIN table2 two
ON one.sid = two.sid;
*Note: I removed a comma after the one alias above.
Below, the ANSI-89 join syntax using the comma notation is reinstated:
SELECT two.sid as sid
, two.sidname as sidname
, two.myPercent as mypercent
, two.saccur as saccur
, one.totalSid as totalSid
FROM (SELECT * FROM table1 ORDER BY SID DESC LIMIT 3) one
, table2 two
WHERE one.sid = two.sid;
This syntax basically says: get the 3 most recent SIDs from table1, cross join them to all records in table2 (each record in one is matched to every record in two), and then return only the records that have the same SID on both sides. Modern query optimizers may be able to use cost-based optimization to improve performance here, avoiding the full cross join; however, the logical order of operations says this is what the database would normally have to do. If one and two are both tables of substantial size, the cross join could result in a very large temporary dataset.
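As a quick check of the subquery-plus-join form, the sketch below runs it on the rows that are visible in the question. It uses SQLite via Python's sqlite3 module rather than Postgres (an assumption, only to keep it runnable), and it loads just the table2 rows for sid 1 and 6, because the rows for 2 to 5 are elided in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (sid INTEGER, totalSid INTEGER);
CREATE TABLE table2 (sid INTEGER, sidname TEXT, myPercent INTEGER, saccur TEXT);
INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60);
-- Only the rows shown in the question; rows for sid 2..5 are not invented.
INSERT INTO table2 VALUES
  (1, 'aaaa', 11, '11t'), (1, 'bbb', 13, '13g'),
  (1, 'ccc', 11, '11g'),  (1, 'qw', 88, '88k'),
  (6, 'xyz', 89, '895W'), (6, 'xyz1', 90, '90k'),
  (6, 'xyz2', 91, '91p'), (6, 'xyz3', 92, '92q');
""")

rows = conn.execute("""
SELECT two.sid, two.sidname, two.myPercent, two.saccur, one.totalSid
FROM (SELECT * FROM table1 ORDER BY sid DESC LIMIT 3) one
INNER JOIN table2 two ON one.sid = two.sid
ORDER BY two.sidname
""").fetchall()

for row in rows:
    print(row)
```

The derived table keeps sid 4, 5 and 6; since only sid 6 has rows loaded in table2 here, the four sid 6 rows are what comes back, and none of the sid 1 rows do.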

Eliminate NULL records in distinct select statement

In SQL SERVER 2008
Relation : Employee
empid clock-in clock-out date Cmpid
1 10 11 17-06-2015 001
1 11 12 17-06-2015 NULL
1 12 1 NULL 001
2 10 11 NULL 002
2 11 12 NULL 002
I need to populate table temp :
insert into temp
select distinct empid,date from employee
This gives all 3 records since they are distinct, but what I need is
empid date CMPID
1 17-06-2015 001
2 NULL 002
Depending on the size and scope of your table, it might just be more prudent to add
WHERE columnName is not null AND columnName2 is not null to the end of your query.
NULL is distinct from any other date value. If you want to exclude NULL records, you have to add an AND condition like table.field is not null.
It sounds like what you want is a result table containing a row or tuple (relational databases don't have records) for every employee, with a date column showing the date on which they worked, or null if they didn't work. Right?
Something like this should do you:
select master.empid,
detail.date
from ( select distinct
empid
from employee
) master
left join employee detail on detail.empid = master.empid
and detail.date is not null
The master virtual table gives you the set of distinct employees; the detail gives you employees with non-null dates on which they worked. The left join gives you everything from master with any matches from detail blended in.
Rows in master with no matching rows in detail are returned once, with the contributing columns from detail set to null. Rows in master with matching rows in detail are repeated once for each such match, with the detail columns reflecting the matching row's values.
This will give you the lowest date or null for each empid
SELECT empid,
MIN(date) date,
MIN(cmpid) cmpid
FROM employee
GROUP BY empid
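A small check of the GROUP BY approach on the sample rows, as a sketch only: it runs on SQLite through Python's sqlite3 module instead of SQL Server 2008 (an assumption, to keep it runnable), and the clock-in and clock-out columns are renamed clock_in and clock_out. Both engines skip NULLs in MIN and return NULL when a group has no non-null value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (empid INTEGER, clock_in INTEGER, clock_out INTEGER,
                       "date" TEXT, cmpid TEXT);
INSERT INTO employee VALUES
  (1, 10, 11, '17-06-2015', '001'),
  (1, 11, 12, '17-06-2015', NULL),
  (1, 12, 1,  NULL,         '001'),
  (2, 10, 11, NULL,         '002'),
  (2, 11, 12, NULL,         '002');
""")

# MIN ignores NULLs, so each empid collapses to one row.
# Note: dates are dd-mm-yyyy text here, so MIN compares them as strings;
# that is harmless in this sample because each group has one distinct date.
rows = conn.execute("""
SELECT empid,
       MIN("date") AS "date",
       MIN(cmpid)  AS cmpid
FROM employee
GROUP BY empid
ORDER BY empid
""").fetchall()

print(rows)
```

This yields one row per empid, matching the result the question asks for.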
try this
select distinct empid,date from employee where date is not null

Checking for (and Deleting) Complex Object Duplicates in SQL Server

So I need to duplicate check a complex object, and then cascade delete dupes from all associated tables and I'm wondering if I can do it efficiently in SQL Server, or if I should go about it in my code. Structurally I have the following tables.
Claim
ClaimCaseSubTypes (mapping table for many to many relationship)
ClaimDiagnosticCodes (ditto)
ClaimTreatmentCodes (ditto)
Basically a Claim is only a duplicate if it is matching on 8 fields in itself AND has the same relationships in all the mapping tables.
For Example, the following records would be indicated as duplicates
Claim
Id CreateDate Other Fields
1 1/1/2015 matched
2 6/1/2015 matched
ClaimCaseSubTypes
ClaimId SubTypeId
1 34
1 64
2 34
2 64
ClaimDiagnosticCodes
ClaimId DiagnosticCodeId
1 1
2 1
ClaimTreatmentCodes
ClaimId TreatmentCodeId
1 5
1 6
2 6
2 5
And in this case I would want to keep 1 and delete 2 from the Claim table, as well as any rows in the mapping tables with a ClaimId of 2.
This is the kind of problem that window functions are for:
;WITH cte AS (
SELECT c.ID,
ROW_NUMBER() OVER (PARTITION BY field1, field2, field3, ... ORDER BY c.CreateDate) As ClaimOrder
FROM Claim c
INNER JOIN other tables...
)
UPDATE Claim
SET IsDuplicate = IIF(cte.ClaimOrder = 1, 0, 1)
FROM Claim c
INNER JOIN cte ON c.ID = cte.ID
The fields that you include in the PARTITION BY indicate which fields need to be identical for two claims to be considered matched. The ORDER BY tells SQL Server to assign the earliest claim the order of 1. Everything that doesn't have the order of 1 is a duplicate of something else.
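Here is a reduced sketch of the ROW_NUMBER flagging, covering only the Claim table's own fields; the mapping-table comparison is left out. Everything beyond Id and CreateDate is hypothetical: MatchKey stands in for the 8 matched fields, and claim 3 is an invented non-duplicate. It runs on SQLite via Python's sqlite3 module rather than SQL Server, and it uses CASE in place of IIF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- MatchKey is a hypothetical stand-in for the fields that must match.
CREATE TABLE Claim (Id INTEGER PRIMARY KEY, CreateDate TEXT, MatchKey TEXT);
INSERT INTO Claim VALUES
  (1, '2015-01-01', 'matched'),
  (2, '2015-06-01', 'matched'),
  (3, '2015-03-01', 'other');   -- invented row with no duplicate
""")

flags = conn.execute("""
WITH cte AS (
    SELECT c.Id,
           ROW_NUMBER() OVER (PARTITION BY c.MatchKey
                              ORDER BY c.CreateDate) AS ClaimOrder
    FROM Claim c
)
-- The earliest claim in each partition gets 0, later ones get 1.
SELECT Id, CASE WHEN ClaimOrder = 1 THEN 0 ELSE 1 END AS IsDuplicate
FROM cte
ORDER BY Id
""").fetchall()

print(flags)
```

Claim 2 is flagged because it shares its partition with the earlier claim 1; claims 1 and 3 are each the first in their partition.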

Trouble performing Postgres group by non-ID column to get ID containing max value

I'm attempting to perform a GROUP BY on a join table. The join table essentially looks like:
CREATE TABLE user_foos (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
foo_id INT NOT NULL,
effective_at TIMESTAMP NOT NULL
);
ALTER TABLE user_foos
ADD CONSTRAINT user_foos_uniqueness
UNIQUE (user_id, foo_id, effective_at);
I'd like to query this table to find all records where the effective_at is the max value for any pair of user_id, foo_id given. I've tried the following:
SELECT "user_foos"."id",
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
Unfortunately, this results in the error:
column "user_foos.id" must appear in the GROUP BY clause or be used in an aggregate function
I understand that the problem relates to "id" not being used in an aggregate function and that the DB doesn't know what to do if it finds multiple records with differing ID's, but I know this could never happen due to my trinary primary key across those columns (user_id, foo_id, and effective_at).
To work around this, I also tried a number of other variants such as using the first_value window function on the id:
SELECT first_value("user_foos"."id"),
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
and:
SELECT first_value("user_foos"."id")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id"
HAVING "user_foos"."effective_at" = max("user_foos"."effective_at")
Unfortunately, these both result in a different error:
window function call requires an OVER clause
Ideally, my goal is to fetch ALL matching id's so that I can use it in a subquery to fetch the legitimate full row data from this table for matching records. Can anyone provide insight on how I can get this working?
Postgres has a very nice feature called distinct on, which can be used in this case:
SELECT DISTINCT ON (uf."user_id", uf."foo_id") uf.*
FROM "user_foos" uf
ORDER BY uf."user_id", uf."foo_id", uf."effective_at" DESC;
It returns the first row in a group, based on the values in parentheses. The order by clause needs to include these values as well as a third column for determining which is the first row in the group.
Try:
SELECT *
FROM (
SELECT t.*,
row_number() OVER( partition by user_id, foo_id ORDER BY effective_at DESC ) x
FROM user_foos t
) ranked
WHERE x = 1
If you don't want to join back with a subquery on the composite of all three key columns, add a rank-order column with a window function (row_number or dense_rank) that partitions by user_id and foo_id and orders by effective date descending. Then wrap that in a subquery and keep the records where rank_order = 1. Because the ranking is by effective date, you get all fields of the record with the highest effective date for each foo and user.
DATASET
id user_id foo_id effective_at
1 1 1 01/01/2001
2 1 1 01/01/2002
3 1 1 01/01/2003
4 1 2 01/01/2001
5 2 1 01/01/2001
DATASET WITH RANK ORDER PARTITIONED BY FOO_ID, USER_ID ORDERED BY DATE DESC
id rank_order user_id foo_id effective_at
1 3 1 1 01/01/2001
2 2 1 1 01/01/2002
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001
SELECT * FROM QUERY ABOVE WHERE RANK_ORDER=1
id rank_order user_id foo_id effective_at
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001
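The rank-order approach can be run against the five-row dataset above. This is a sketch on SQLite through Python's sqlite3 module, not Postgres (an assumption, to keep it runnable); the dates are rewritten as ISO strings so that text ordering matches date ordering, and the names rank_order and ranked are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_foos (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    foo_id INTEGER NOT NULL,
    effective_at TEXT NOT NULL,          -- ISO date text (assumption)
    UNIQUE (user_id, foo_id, effective_at)
);
INSERT INTO user_foos VALUES
  (1, 1, 1, '2001-01-01'),
  (2, 1, 1, '2002-01-01'),
  (3, 1, 1, '2003-01-01'),
  (4, 1, 2, '2001-01-01'),
  (5, 2, 1, '2001-01-01');
""")

latest = conn.execute("""
SELECT id, user_id, foo_id, effective_at
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY user_id, foo_id
                              ORDER BY effective_at DESC) AS rank_order
    FROM user_foos t
) ranked
WHERE rank_order = 1
ORDER BY id
""").fetchall()

print(latest)
```

The rows kept are ids 3, 4 and 5, the same ones listed in the RANK_ORDER=1 result above.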