Eliminate NULL records in distinct select statement - sql

In SQL SERVER 2008
Relation : Employee
empid clock-in clock-out date Cmpid
1 10 11 17-06-2015 001
1 11 12 17-06-2015 NULL
1 12 1 NULL 001
2 10 11 NULL 002
2 11 12 NULL 002
I need to populate table temp :
insert into temp
select distinct empid,date from employee
This gives all
3 records since they are distinct but what
I need is
empid date CMPID
1 17-06-2015 001
2 NULL 002

Depending on the size and scope of your table, it might just be more prudent to add
WHERE columnName is not null AND columnName2 is not null to the end of your query.

Null is different from other date value. If you wont exclude null record you have to add a and condition like table.filed is not null.

It sounds like what you want is a result table containing a row or tuple (relational databases don't have records) for every employee with a date column showing the date on which the worked or null if they didn't work. Right?
Something like this should do you:
select e.employee_id
from ( select distinct
empid
from employee
) master
left join employee detail on detail.empid = master.empid
and detail.date is not null
The master virtual table gives you the set of destinct employees; the detail gives you employees with non-null dates on which they worked. The left join gives you everything from master with any matches from detail blended in.
Rows in master with no matching rows in details, are returned once with the contributing columns from detail set to null. Rows in master with matching rows in detailare repeated once for each such match, with the detail columns reflecting the matching row's values.

This will give you the lowest date or null for each empid
SELECT empid,
MIN(date) date,
MIN(cmpid) cmpid
FROM employee
GROUP BY empid

try this
select distinct empid,date from employee where date is not null

Related

SQL query on denormalized tables

I have these two below mentioned denormalized tables with out any data constraints. Records_audit will not have duplicate audit_id based rows though table doesn't have any constraints.
I will need SQL query to extract all fields of records_audit with an addtional matching column refgroup_Name from second table using matching condition of AuditID from both tables, printedCount greater than 1 and R_status as 'Y'. I tried to do with left join but it is selecting all records.
Can you help to correct my query? I tried with this below query but its selecting all unwanted from second table:
SELECT a.*, d.refgroup_Name
from Records_audit a
left join Patients_audit d ON ( (a.AUDITID=d.AUDITID )
and (a.printedCount> 1)
AND (a.R_status='Y')
)
ORDER BY 3 DESC
Records_audit:
AuditID
record_id
created_d_t
patient_ID
branch_ID
R_status
printedCount
1
Img77862
2020-02-01 08:40:12.614
xq123
aesop96
Y
2
2
Img87962
2021-02-01 08:40:12.614
xy123
aesop96
Y
1
Patients_audit:
AuditID
dept_name
visited_d_t
patient_ID
branch_ID
emp_No
refgroup_Name
1
Imaging
2020-02-01 11:41:12.614
xq123
aesop96
976581
finnyTown
1
EMR
2020-02-01 12:42:12.614
xq123
aesop96
976581
finnyTown
2
Imaging
2021-02-01 12:40:12.614
xy123
himpo77
976581
georgeTown
2
FrontOffice
2021-02-01 13:41:12.614
xy123
himpo77
976581
georgeTown
2
EMR
2021-02-01 14:42:12.614
xy123
himpo77
976581
georgeTown
A left join will give you all records in the "left" table, that is the from table. Since you have no where clause to constrain the query you're going to get all records in Records_audit.
See Visual Representation of SQL Joins for more about joins.
If your intent is to get all records in Records_audit which have an R_status of Y and a printedCount > 1, put those into a where clause.
select ra.*, pa.refgroup_name
from records_audit ra
left join patients_audit pa on ra.auditId = pa.auditId
where ra.printedCount > 1
and ra.r_status = 'Y'
order by ra.created_d_t desc
This will match all records in Records_audit which match the where clause. The left join ensures they match even if they do not have a matching Patients_audit record.
Other notes:
Your order by 3 relies on the order in which columns were declared in Records_audit. If you mean to order by records_audit.created_d_t write order by a.created_d_t.
If your query is making an assumption about the data, add a constraint to make sure it is true and remains true.

The nearest row in the other table

One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have customer email + FirstName&LastName + Date of transaction
and the second table that comes from second system contains all users, they sensitive data and when they got registered in our system.
Simplified Structure:
Email | InstertDate (varchar)
My task is to count minutes difference between the rows insterted from sale(first table)and the rows with users and their sensitive data.
The issue is that second table contain many rows and I want to find the nearest in time row that was inserted in 2nd table, because sometimes it may be a few minutes difeerence(delay or opposite of delay)and sometimes it can be a few days.
So for x email I have row in 1st table:
E_MAIL NAME TRAN_DATETIME
p****#****.eu xxx xxx 2021-10-04 00:03:09.0000000
But then I have 3 rows and the lastest is the one I want to count difference
Email InstertDate
p****#****.eu 2021-05-20 19:12:07
p****#****.eu 2021-05-20 19:18:48
p****#****.eu 2021-10-03 18:32:30 <--
I wrote that some query, but I have no idea how to match nearest row in the 2nd table
SELECT DISTINCT TOP (100)
,a.[E_MAIL]
,a.[NAME]
,a.[TRAN_DATETIME]
,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
,(ABS(CONVERT(INT, CONVERT(Datetime,LEFT(a.[TRAN_DATETIME],10),120))) - CONVERT(INT, CONVERT(Datetime,LEFT(b.[INSERTDATE],10),120))) as 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
left join [crm].[SensitiveTable] b on a.[E_MAIL]) = b.[EMAIL]
Totally untested: I'd need sample data and database the area of suspect is the casting of dates and the datemath.... since I dont' know what RDBMS and version this is.. consider the following "pseudo code".
We assign a row number to the absolute difference in seconds between the dates those with rowID of 1 win.
WTIH CTE AS (
SELECT A.*, B.* row_number() over (PARTITION BY A.e_mail
ORDER BY abs(datediff(second, cast(Tran_dateTime as Datetime), cast(InsterDate as DateTime)) desc) RN
FROM [crm].[SalesSampleTable] a
LEFT JOIN [crm].[SensitiveTable] b
on a.[E_MAIL] = b.[EMAIL])
SELECT * FROM CTE WHERE RN = 1

Select all related records

I have a table (in SQL Server) that stores records as shown below. The purpose for Old_Id is for change tracking.
Meaning that when I want to update a record, the original record has to be unchanged, but a new record has to be inserted with a new Id and with updated values, and with the modified record's Id in Old_Id column
Id Name Old_Id
---------------------
1 Paul null
2 Paul 1
3 Jim null
4 Paul 2
5 Tim null
My question is:
When I search for id = 1 or 2 or 4, I want to select all related records.
In this case I want see records the following ids: 1, 2, 4
How can it be written in a stored procedure?
Even if it's bad practice to go with this, I can't change this logic because its legacy database and it's quite a large database.
Can anyone help with this?
you can do that with Recursive Common Table Expressions (CTE)
WITH cte_history AS (
SELECT
h.id,
h.name,
h.old_id
FROM
history h
WHERE old_id IS NULL
and id in (1,2,4)
UNION ALL
SELECT
e.id,
e.name,
e.old_id
FROM
history e
INNER JOIN cte_history o
ON o.id = e.old_id
)
SELECT * FROM cte_history;

Remove Duplicate Values from Table and leave the ID with the most current date

Not a SQL guru, but I need a SQL statement that will return a table of unique values with the most current date, so remove all the duplicate values based on ID and keep the ID with the most current date.
My current SQL statement is this:
Select Bill_To_Merchant_ID, Agreement_Termination_Notification_Date
From STG.Fact_Agreement
Current result:
Bill_To_Merchant_ID Agreement_Termination_Notification_Date
----------------------------------------------------------------
1 01/09/2020
1 03/09/2020
2 05/09/2020
2 07/09/2020
3 06/09/2020
3 16/09/2020
Expected result:
Bill_To_Merchant_ID Agreement_Termination_Notification_Date
----------------------------------------------------------------
1 03/09/2020
2 07/09/2020
3 16/09/2020
If there are no duplicates, that record remains in the result set.
If you want to actually delete the not most recent records, then use:
DELETE
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.Bill_To_Merchant_ID = t2.Bill_To_Merchant_ID AND
t2.Agreement_Termination_Notification_Date > t1.Agreement_Termination_Notification_Date);
If you instead can tolerate just viewing your data as in the expected output, then aggregation should work:
SELECT
Bill_To_Merchant_ID,
MAX(Agreement_Termination_Notification_Date) AS Agreement_Termination_Notification_Date
FROM yourTable
GROUP BY
Bill_To_Merchant_ID;
I think you just want group by:
select Bill_To_Merchant_ID, max(Agreement_Termination_Notification_Date)
from STG.Fact_Agreement
group by Bill_To_Merchant_ID;

Trouble performing Postgres group by non-ID column to get ID containing max value

I'm attempting to perform a GROUP BY on a join table table. The join table essentially looks like:
CREATE TABLE user_foos (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
foo_id INT NOT NULL,
effective_at DATETIME NOT NULL
);
ALTER TABLE user_foos
ADD CONSTRAINT user_foos_uniqueness
UNIQUE (user_id, foo_id, effective_at);
I'd like to query this table to find all records where the effective_at is the max value for any pair of user_id, foo_id given. I've tried the following:
SELECT "user_foos"."id",
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
Unfortunately, this results in the error:
column "user_foos.id" must appear in the GROUP BY clause or be used in an aggregate function
I understand that the problem relates to "id" not being used in an aggregate function and that the DB doesn't know what to do if it finds multiple records with differing ID's, but I know this could never happen due to my trinary primary key across those columns (user_id, foo_id, and effective_at).
To work around this, I also tried a number of other variants such as using the first_value window function on the id:
SELECT first_value("user_foos"."id"),
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
and:
SELECT first_value("user_foos"."id")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id"
HAVING "user_foos"."effective_at" = max("user_foos"."effective_at")
Unfortunately, these both result in a different error:
window function call requires an OVER clause
Ideally, my goal is to fetch ALL matching id's so that I can use it in a subquery to fetch the legitimate full row data from this table for matching records. Can anyone provide insight on how I can get this working?
Postgres has a very nice feature called distinct on, which can be used in this case:
SELECT DISTINCT ON (uf."user_id", uf."foo_id") uf.*
FROM "user_foos" uf
ORDER BY uf."user_id", uf."foo_id", uf."effective_at" DESC;
It returns the first row in a group, based on the values in parentheses. The order by clause needs to include these values as well as a third column for determining which is the first row in the group.
Try:
SELECT *
FROM (
SELECT t.*,
row_number() OVER( partition by user_id, foo_id ORDER BY effective_at DESC ) x
FROM user_foos t
)
WHERE x = 1
If you don't want to use a sub query based on a composite of all three keys then you need to create a "dense rank" window function field that orders subsets of id, user_id and foo_id by effective date with the rank order field. Then subquery that and take the records where rank_order=1. Since the rank ordering was by effective date you are getting all fields of the record with the highest effective date for each foo and user.
DATSET
1 1 1 01/01/2001
2 1 1 01/01/2002
3 1 1 01/01/2003
4 1 2 01/01/2001
5 2 1 01/01/2001
DATSET WITH RANK ORDER PARTITIONED BY FOO_ID, USER_ID ORDERED BY DATE DESC
1 3 1 1 01/01/2001
2 2 1 1 01/01/2002
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001
SELECT * FROM QUERY ABOVE WHERE RANK_ORDER=1
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001