SQL Server find results within partition - sql

I have the following table:
ID   Date
---------------
1    Null
1    1/2/2020
2    Null
2    12/2/2020
3    Null
For every ID that has at least one non-null date, I need to set the flag to 'Accounted'.
Result set should look like below:
id   Date       AccountFlag
----------------------------
1    Null       Accounted
1    1/2/2020   Accounted
2    Null       Accounted
2    12/2/2020  Accounted
3    Null       Unaccounted

You can use window functions to check whether the same id has at least one non-null date, and a case expression to set the flag accordingly. Window aggregate functions come in handy for this:
select id, date,
    case when max(date) over(partition by id) is not null
        then 'Accounted'
        else 'Unaccounted'
    end as accountflag
from mytable
max() ignores null values, so it returns null if and only if all values in the partition are null. This would work just the same with min().
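Since count() also ignores nulls, an equivalent check is to count the non-null dates per partition. A minimal, self-contained sketch to verify the behaviour (the table and data are recreated from the question; date values are assumed to be US-format literals):

create table mytable (id int, date date);
insert into mytable (id, date) values
    (1, null), (1, '2020-01-02'),
    (2, null), (2, '2020-12-02'),
    (3, null);

-- count(date) counts only the non-null dates within each partition
select id, date,
    case when count(date) over(partition by id) > 0
        then 'Accounted'
        else 'Unaccounted'
    end as accountflag
from mytable;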

Related

Moving average within groups that returns NULL if any row is NULL (Snowflake - SQL)

I need to calculate the moving average of a column per group (partitioned by id). The only twist is that the result must be NULL if any value in the corresponding window is NULL.
Example of expected behaviour (for a given id and window size=3):
A     mov_ave_A
---------------
NULL  NULL
1     NULL
1     NULL
1     1
4     2
The first 3 rows of the moving average are NULL, because the first value (which is included in the first 3 windows) is NULL. Row 4 of mov_ave_A is equal to 1 because it's the average of rows 2 to 4 of A, and so on.
I tried:
CASE WHEN SUM(CASE WHEN a IS NULL THEN 1 ELSE 0 END) = 0 THEN AVG(a) ELSE NULL END
OVER (
PARTITION BY id
ORDER BY date_month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS mov_ave_A
but I get
"Sliding window frame unsupported for function CASE".
Also, I'd really like the solution to be short and simple, as I need to create 6 such columns and would otherwise have to repeat the logic 6 times.
The issue with your query is that the OVER clause comes after the END. You need an OVER clause for each window function, so once for COUNT and once for AVG. COUNT is an easier way to check for NULLs than using SUM. I believe this should work:
SELECT
    *,
    CASE
        /* Check for 3 non-null values of a in the window; if so, return the rolling AVG. Implicit ELSE NULL */
        WHEN COUNT(a) OVER (PARTITION BY id ORDER BY date_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) = 3
            THEN AVG(a) OVER (PARTITION BY id ORDER BY date_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
    END AS mov_ave_A
FROM YourTable
Use the following case expression:
CASE WHEN COUNT(a) OVER (
PARTITION BY id
ORDER BY date_month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) = 3 THEN AVG(a) OVER (
PARTITION BY id
ORDER BY date_month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) END AS mov_avg
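To avoid repeating the frame for all 6 columns, the window specification can be factored out with the SQL-standard named WINDOW clause, assuming your Snowflake version supports it (otherwise the OVER specification has to be repeated as above). A sketch, with b and c as assumed column names:

SELECT
    *,
    CASE WHEN COUNT(a) OVER w = 3 THEN AVG(a) OVER w END AS mov_ave_a,
    CASE WHEN COUNT(b) OVER w = 3 THEN AVG(b) OVER w END AS mov_ave_b,
    CASE WHEN COUNT(c) OVER w = 3 THEN AVG(c) OVER w END AS mov_ave_c
FROM your_table
WINDOW w AS (PARTITION BY id ORDER BY date_month
             ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)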

SQL - JOIN 2 tables with either NULL OR MAX

I have two tables in Teradata that I need to LEFT JOIN.
The first one includes clients, the second their details with the validity end date. NULL represents currently valid.
Table1
client_id
---------
1
2

Table2
client_id  valid_end
---------------------
1          31.12.2021
1          31.12.2022
2          31.12.2020
2          null
I need to left join the two tables using the most recent record for each client from Table2.
If there is a currently valid record (with NULL valid_end), it is used. If there is no NULL record, the highest date is used.
Result
client_id  valid_end
---------------------
1          31.12.2022
2          null
I tried a lot using QUALIFY and MAX, but never reached the requested result. Thanks for any advice.
Use ROW_NUMBER instead of MAX; NULLS FIRST sorts NULL before the highest date:
qualify
row_number()
over (partition by client_id
order by valid_end desc NULLS FIRST) = 1
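A sketch of the complete query using the table names from the question; partitioning by the Table1 key (a small tweak on the answer above) keeps clients that have no detail rows at all from being collapsed into one NULL partition:

select t1.client_id, t2.valid_end
from Table1 t1
left join Table2 t2
  on t2.client_id = t1.client_id
qualify
    row_number()
    over (partition by t1.client_id
          order by t2.valid_end desc nulls first) = 1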

SQL query to allow for latest datasets per items

I have this table in an SQL Server database, and I would like a query that gives me the values of cw1, cw2, cw3 under a restricted date condition.
I would like a query giving me the "latest" values of cw1, cw2, cw3, falling back to previous values of cw1, cw2, cw3 if they are null for the last plan_date. This would be with a date condition.
So if the condition is plan_date between "02.01.2020" and "04.01.2020" then the result should be
item_nr  plan_date   cw1   cw2  cw3
------------------------------------
1        04.01.2020  null  9    4
2        03.01.2020  30    15   2
where, for example, the "30" is from the last previous date for item_nr 2.
You can get the last value using first_value(). Unfortunately, that is a window function, but select distinct solves that:
select distinct item_nr,
first_value(cw1) over (partition by item_nr
order by (case when cw1 is not null then 1 else 2 end), plan_date desc
) as imputed_cw1,
first_value(cw2) over (partition by item_nr
order by (case when cw2 is not null then 1 else 2 end), plan_date desc
) as imputed_cw2,
first_value(cw3) over (partition by item_nr
order by (case when cw3 is not null then 1 else 2 end), plan_date desc
) as imputed_cw3
from t;
You can add a where clause after the from.
The first_value() window function returns the first value from each partition. The partition is ordered to put the non-NULL values first, and then by plan_date descending, so the most recent non-NULL value comes first.
The only downside is that it is a window function, so the select distinct is needed to get the most recent value for each item_nr.
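For instance, with the date restriction from the question (shown here for one column, and with the literals rewritten as ISO dates, which is an assumption about the column type):

select distinct item_nr,
    first_value(cw1) over (partition by item_nr
        order by (case when cw1 is not null then 1 else 2 end), plan_date desc
    ) as imputed_cw1
from t
where plan_date between '2020-01-02' and '2020-01-04';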

Eliminate NULL records in distinct select statement

In SQL SERVER 2008
Relation : Employee
empid  clock-in  clock-out  date        cmpid
----------------------------------------------
1      10        11         17-06-2015  001
1      11        12         17-06-2015  NULL
1      12        1          NULL        001
2      10        11         NULL        002
2      11        12         NULL        002
I need to populate table temp:
insert into temp
select distinct empid,date from employee
This gives all 3 records since they are distinct, but what I need is:
empid  date        cmpid
-------------------------
1      17-06-2015  001
2      NULL        002
Depending on the size and scope of your table, it might just be more prudent to add
WHERE columnName is not null AND columnName2 is not null to the end of your query.
NULL is different from other date values. If you want to exclude NULL records, you have to add an AND condition like table.field is not null.
It sounds like what you want is a result table containing a row or tuple (relational databases don't have records) for every employee, with a date column showing the date on which they worked, or null if they didn't work. Right?
Something like this should do you:
select master.empid,
       detail.date
from ( select distinct empid
       from employee
     ) master
left join employee detail
       on detail.empid = master.empid
      and detail.date is not null
The master virtual table gives you the set of distinct employees; the detail gives you employees with non-null dates on which they worked. The left join gives you everything from master with any matches from detail blended in.
Rows in master with no matching rows in detail are returned once, with the contributing columns from detail set to null. Rows in master with matching rows in detail are repeated once for each such match, with the detail columns reflecting the matching row's values.
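To collapse those repeats to one row per employee and carry cmpid along, as in the desired output, a hedged variation might look like this (MIN ignores NULLs; this assumes at most one distinct non-null date per employee, as in the sample data):

select master.empid,
       min(detail.date)  as date,
       min(detail.cmpid) as cmpid
from ( select distinct empid
       from employee
     ) master
left join employee detail
       on detail.empid = master.empid
      and detail.date is not null
group by master.empid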
This will give you the lowest date or null for each empid:
SELECT empid,
MIN(date) date,
MIN(cmpid) cmpid
FROM employee
GROUP BY empid
Try this:
select distinct empid,date from employee where date is not null

Trouble performing Postgres group by non-ID column to get ID containing max value

I'm attempting to perform a GROUP BY on a join table. The join table essentially looks like:
CREATE TABLE user_foos (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL,
    foo_id INT NOT NULL,
    effective_at TIMESTAMP NOT NULL
);
ALTER TABLE user_foos
ADD CONSTRAINT user_foos_uniqueness
UNIQUE (user_id, foo_id, effective_at);
I'd like to query this table to find all records where the effective_at is the max value for any pair of user_id, foo_id given. I've tried the following:
SELECT "user_foos"."id",
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
Unfortunately, this results in the error:
column "user_foos.id" must appear in the GROUP BY clause or be used in an aggregate function
I understand that the problem relates to "id" not being used in an aggregate function and that the DB doesn't know what to do if it finds multiple records with differing IDs, but I know this could never happen due to my composite unique constraint across those columns (user_id, foo_id, and effective_at).
To work around this, I also tried a number of other variants such as using the first_value window function on the id:
SELECT first_value("user_foos"."id"),
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
and:
SELECT first_value("user_foos"."id")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id"
HAVING "user_foos"."effective_at" = max("user_foos"."effective_at")
Unfortunately, these both result in a different error:
window function call requires an OVER clause
Ideally, my goal is to fetch ALL matching id's so that I can use it in a subquery to fetch the legitimate full row data from this table for matching records. Can anyone provide insight on how I can get this working?
Postgres has a very nice feature called distinct on, which can be used in this case:
SELECT DISTINCT ON (uf."user_id", uf."foo_id") uf.*
FROM "user_foos" uf
ORDER BY uf."user_id", uf."foo_id", uf."effective_at" DESC;
It returns the first row in a group, based on the values in parentheses. The order by clause needs to include these values as well as a third column for determining which is the first row in the group.
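Since the stated goal was to fetch the matching ids for use in a subquery, note that selecting just the id column works the same way:

SELECT DISTINCT ON (user_id, foo_id) id
FROM user_foos
ORDER BY user_id, foo_id, effective_at DESC;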
Try:
SELECT *
FROM (
    SELECT t.*,
           row_number() OVER ( PARTITION BY user_id, foo_id ORDER BY effective_at DESC ) x
    FROM user_foos t
) ranked
WHERE x = 1
If you don't want to use a subquery based on a composite of all three keys, then you need to create a "rank order" window-function field that partitions by user_id and foo_id and orders by effective date descending. Then subquery that and take the records where rank_order = 1. Since the rank ordering was by effective date, you get all fields of the record with the highest effective date for each foo and user; a query sketch follows the example datasets below.
DATASET

id  user_id  foo_id  effective_at
----------------------------------
1   1        1       01/01/2001
2   1        1       01/01/2002
3   1        1       01/01/2003
4   1        2       01/01/2001
5   2        1       01/01/2001

DATASET WITH RANK ORDER PARTITIONED BY FOO_ID, USER_ID ORDERED BY DATE DESC

id  rank_order  user_id  foo_id  effective_at
----------------------------------------------
1   3           1        1       01/01/2001
2   2           1        1       01/01/2002
3   1           1        1       01/01/2003
4   1           1        2       01/01/2001
5   1           2        1       01/01/2001

SELECT * FROM QUERY ABOVE WHERE RANK_ORDER = 1

id  rank_order  user_id  foo_id  effective_at
----------------------------------------------
3   1           1        1       01/01/2003
4   1           1        2       01/01/2001
5   1           2        1       01/01/2001
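A sketch of that ranking query in Postgres (rank_order is an assumed alias, and ROW_NUMBER stands in for the ranking function described above):

SELECT id, user_id, foo_id, effective_at
FROM (
    SELECT uf.*,
           ROW_NUMBER() OVER ( PARTITION BY user_id, foo_id
                               ORDER BY effective_at DESC ) AS rank_order
    FROM user_foos uf
) ranked
WHERE rank_order = 1;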