SQL - JOIN 2 tables with either NULL OR MAX - sql

I have two tables in Teradata that i need to LEFT JOIN.
The first one includes clients, the second their details with the validity end date. NULL represents currently valid.
Table1
client_id
1
2
Table2
client_id
valid_end
1
31.12.2021
1
31.12.2022
2
31.12.2020
2
null
I need to left join the two tables using the most recent record for each client from Table2.
In case there is a currently valid record with NULL, it is used. If there is not any NULL record, the highest date is used.
Result
client_id
valid_end
1
31.12.2022
2
null
Tried a lot using QUALIFY and MAX, but never reached the requested result. Thanks for advice.

Use ROW_NUMBER instead of MAX, NULLS FIRST sorts NULL before the highest date:
qualify
row_number()
over (partition by client_id
order by valid_end desc NULLS FIRST) = 1

Related

The nearest row in the other table

One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have customer email + FirstName&LastName + Date of transaction
and the second table that comes from second system contains all users, they sensitive data and when they got registered in our system.
Simplified Structure:
Email | InstertDate (varchar)
My task is to count minutes difference between the rows insterted from sale(first table)and the rows with users and their sensitive data.
The issue is that second table contain many rows and I want to find the nearest in time row that was inserted in 2nd table, because sometimes it may be a few minutes difeerence(delay or opposite of delay)and sometimes it can be a few days.
So for x email I have row in 1st table:
E_MAIL NAME TRAN_DATETIME
p****#****.eu xxx xxx 2021-10-04 00:03:09.0000000
But then I have 3 rows and the lastest is the one I want to count difference
Email InstertDate
p****#****.eu 2021-05-20 19:12:07
p****#****.eu 2021-05-20 19:18:48
p****#****.eu 2021-10-03 18:32:30 <--
I wrote that some query, but I have no idea how to match nearest row in the 2nd table
SELECT DISTINCT TOP (100)
,a.[E_MAIL]
,a.[NAME]
,a.[TRAN_DATETIME]
,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
,(ABS(CONVERT(INT, CONVERT(Datetime,LEFT(a.[TRAN_DATETIME],10),120))) - CONVERT(INT, CONVERT(Datetime,LEFT(b.[INSERTDATE],10),120))) as 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
left join [crm].[SensitiveTable] b on a.[E_MAIL]) = b.[EMAIL]
Totally untested: I'd need sample data and database the area of suspect is the casting of dates and the datemath.... since I dont' know what RDBMS and version this is.. consider the following "pseudo code".
We assign a row number to the absolute difference in seconds between the dates those with rowID of 1 win.
WTIH CTE AS (
SELECT A.*, B.* row_number() over (PARTITION BY A.e_mail
ORDER BY abs(datediff(second, cast(Tran_dateTime as Datetime), cast(InsterDate as DateTime)) desc) RN
FROM [crm].[SalesSampleTable] a
LEFT JOIN [crm].[SensitiveTable] b
on a.[E_MAIL] = b.[EMAIL])
SELECT * FROM CTE WHERE RN = 1

SQL Server find results within partition

I have the following table:
ID Date
-------------------
1 Null
1 1/2/2020
2 Null
2 12/2/2020
3 Null
For every ID which has at least one non-null date, I need to classify as 'accounted'.
Result set should look like below:
id Date AccountFlag
----------------------------
1 Null Accounted
1 1/2/2020 Accounted
2 Null Accounted
2 12/2/2020 Accounted
3 Null Unaccounted
You can use window functions to check if the same id has at least one non-null date, and a case expression to set the flag accordingly. Window aggregate functions come handy for this:
select id, date,
case when max(date) over(partition by id) is not null
then 'Accounted'
ese 'Unaccounted'
end as accountflag
from mytable
max() ignores null values, so it returns null if and only if all values in the partition are null. This would work just the same with min().

SQL query with grouping and MAX

I have a table that looks like the following but also has more columns that are not needed for this instance.
ID DATE Random
-- -------- ---------
1 4/12/2015 2
2 4/15/2015 2
3 3/12/2015 2
4 9/16/2015 3
5 1/12/2015 3
6 2/12/2015 3
ID is the primary key
Random is a foreign key but i am not actually using table it points to.
I am trying to design a query that groups the results by Random and Date and select the MAX Date within the grouping then gives me the associated ID.
IF i do the following query
select top 100 ID, Random, MAX(Date) from DateBase group by Random, Date, ID
I get duplicate Randoms since ID is the primary key and will always be unique.
The results i need would look something like this
ID DATE Random
-- -------- ---------
2 4/15/2015 2
4 9/16/2015 3
Also another question is there could be times where there are many of the same date. What will MAX do in that case?
You can use NOT EXISTS() :
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE s.random = t.random
AND s.date > t.date)
This will select only those who doesn't have a bigger date for corresponding random value.
Can also be done using IN() :
SELECT * FROM YourTable t
WHERE (t.random,t.date) in (SELECT s.random,max(s.date)
FROM YourTable s
GROUP BY s.random)
Or with a join:
SELECT t.* FROM YourTable t
INNER JOIN (SELECT s.random,max(s.date) as max_date
FROM YourTable s
GROUP BY s.random) tt
ON(t.date = tt.max_date and s.random = t.random)
In SQL Server you could do something like the following,
select a.* from DateBase a inner join
(select Random,
MAX(dt) as dt from DateBase group by Random) as x
on a.dt =x.dt and a.random = x.random
This method will work in all versions of SQL as there are no vendor specifics (you'll need to format the dates using your vendor specific syntax)
You can do this in two stages:
The first step is to work out the max date for each random:
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
Now you can join back onto your table to get the max ID for each combination:
SELECT MAX(e.ID) AS ID
,e.DateField AS DateField
,e.Random
FROM Example AS e
INNER JOIN (
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
) data
ON data.MaxDateField = e.DateField
AND data.Random = e.Random
GROUP BY DateField, Random
SQL Fiddle example here: SQL Fiddle
To answer your second question:
If there are multiples of the same date, the MAX(e.ID) will simply choose the highest number. If you want the lowest, you can use MIN(e.ID) instead.

Eliminate NULL records in distinct select statement

In SQL SERVER 2008
Relation : Employee
empid clock-in clock-out date Cmpid
1 10 11 17-06-2015 001
1 11 12 17-06-2015 NULL
1 12 1 NULL 001
2 10 11 NULL 002
2 11 12 NULL 002
I need to populate table temp :
insert into temp
select distinct empid,date from employee
This gives all
3 records since they are distinct but what
I need is
empid date CMPID
1 17-06-2015 001
2 NULL 002
Depending on the size and scope of your table, it might just be more prudent to add
WHERE columnName is not null AND columnName2 is not null to the end of your query.
Null is different from other date value. If you wont exclude null record you have to add a and condition like table.filed is not null.
It sounds like what you want is a result table containing a row or tuple (relational databases don't have records) for every employee with a date column showing the date on which the worked or null if they didn't work. Right?
Something like this should do you:
select e.employee_id
from ( select distinct
empid
from employee
) master
left join employee detail on detail.empid = master.empid
and detail.date is not null
The master virtual table gives you the set of destinct employees; the detail gives you employees with non-null dates on which they worked. The left join gives you everything from master with any matches from detail blended in.
Rows in master with no matching rows in details, are returned once with the contributing columns from detail set to null. Rows in master with matching rows in detailare repeated once for each such match, with the detail columns reflecting the matching row's values.
This will give you the lowest date or null for each empid
SELECT empid,
MIN(date) date,
MIN(cmpid) cmpid
FROM employee
GROUP BY empid
try this
select distinct empid,date from employee where date is not null

Trouble performing Postgres group by non-ID column to get ID containing max value

I'm attempting to perform a GROUP BY on a join table table. The join table essentially looks like:
CREATE TABLE user_foos (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
foo_id INT NOT NULL,
effective_at DATETIME NOT NULL
);
ALTER TABLE user_foos
ADD CONSTRAINT user_foos_uniqueness
UNIQUE (user_id, foo_id, effective_at);
I'd like to query this table to find all records where the effective_at is the max value for any pair of user_id, foo_id given. I've tried the following:
SELECT "user_foos"."id",
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
Unfortunately, this results in the error:
column "user_foos.id" must appear in the GROUP BY clause or be used in an aggregate function
I understand that the problem relates to "id" not being used in an aggregate function and that the DB doesn't know what to do if it finds multiple records with differing ID's, but I know this could never happen due to my trinary primary key across those columns (user_id, foo_id, and effective_at).
To work around this, I also tried a number of other variants such as using the first_value window function on the id:
SELECT first_value("user_foos"."id"),
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
and:
SELECT first_value("user_foos"."id")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id"
HAVING "user_foos"."effective_at" = max("user_foos"."effective_at")
Unfortunately, these both result in a different error:
window function call requires an OVER clause
Ideally, my goal is to fetch ALL matching id's so that I can use it in a subquery to fetch the legitimate full row data from this table for matching records. Can anyone provide insight on how I can get this working?
Postgres has a very nice feature called distinct on, which can be used in this case:
SELECT DISTINCT ON (uf."user_id", uf."foo_id") uf.*
FROM "user_foos" uf
ORDER BY uf."user_id", uf."foo_id", uf."effective_at" DESC;
It returns the first row in a group, based on the values in parentheses. The order by clause needs to include these values as well as a third column for determining which is the first row in the group.
Try:
SELECT *
FROM (
SELECT t.*,
row_number() OVER( partition by user_id, foo_id ORDER BY effective_at DESC ) x
FROM user_foos t
)
WHERE x = 1
If you don't want to use a sub query based on a composite of all three keys then you need to create a "dense rank" window function field that orders subsets of id, user_id and foo_id by effective date with the rank order field. Then subquery that and take the records where rank_order=1. Since the rank ordering was by effective date you are getting all fields of the record with the highest effective date for each foo and user.
DATSET
1 1 1 01/01/2001
2 1 1 01/01/2002
3 1 1 01/01/2003
4 1 2 01/01/2001
5 2 1 01/01/2001
DATSET WITH RANK ORDER PARTITIONED BY FOO_ID, USER_ID ORDERED BY DATE DESC
1 3 1 1 01/01/2001
2 2 1 1 01/01/2002
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001
SELECT * FROM QUERY ABOVE WHERE RANK_ORDER=1
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001