PostgreSQL query with conditional empty values depending on preceding rows


I am working on a PostgreSQL query and I am not sure how to produce the output I need.
Let's say I have a SQL query whose desired output is:
name date visit_number visit
x 2011-01-01 123 ?? (value i want=1)
y 2011-01-01 123 ?? (value i want=empty)
a 2011-02-02 345 ?? (value i want=1)
b 2011-02-02 345 ?? (empty)
c 2011-02-02 345 ?? (empty)
Currently my SQL query returns all the values except the last column, visit. I want the visit column to work this way: if visit_number contains the same value for multiple rows, visit should show 1 for the first row and NULL (or empty) for the remaining rows with the same visit_number. How do I do that?
The sample query could be written in any way. It could simply be:
select name,date,visit_number from sometable order by date;
I am using PostgreSQL version 8.1.
Thanks

The first thing you should do is upgrade to a modern version of PostgreSQL. Version 8.1 reached end of life in November 2010.
In a more recent version you can conveniently solve this with window functions:
SELECT name, date, visit_number
, CASE WHEN row_number() OVER (PARTITION BY visit_number
ORDER BY date, name) = 1
THEN 1
ELSE NULL
END AS visit
FROM tbl
ORDER BY date, name;
I ordered by name additionally to break ties.
For versions before PostgreSQL 8.4, this query should work (untested):
SELECT name, date, visit_number
, CASE WHEN EXISTS (
SELECT *
FROM tbl t1
WHERE t1.visit_number = tbl.visit_number -- more to make it unique?
AND (t1.date < tbl.date -- an earlier row within the same visit_number
OR (t1.date = tbl.date AND t1.name < tbl.name)) -- same date: break ties by name
)
THEN NULL ELSE 1 END AS visit
FROM tbl
ORDER BY date, name;
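As a rough check of the window-function query above, here is a small sketch that runs it against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, which is an assumption made only to keep the example self-contained (window functions need SQLite 3.25 or newer). The table name tbl follows the answer; the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; the question does not state them.
conn.execute("CREATE TABLE tbl (name TEXT, date TEXT, visit_number INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("x", "2011-01-01", 123),
        ("y", "2011-01-01", 123),
        ("a", "2011-02-02", 345),
        ("b", "2011-02-02", 345),
        ("c", "2011-02-02", 345),
    ],
)

# Number the rows inside each visit_number group and keep 1 only for the first.
visit_rows = conn.execute(
    """
    SELECT name, date, visit_number,
           CASE WHEN row_number() OVER (PARTITION BY visit_number
                                        ORDER BY date, name) = 1
                THEN 1
           END AS visit
    FROM tbl
    ORDER BY date, name
    """
).fetchall()

for row in visit_rows:
    print(row)
```

Only x and a should come back with visit = 1; the other rows carry NULL (None in Python).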

This is the query:
select *,
case when row_number() over (partition by visit_number order by date, name) = 1
then 1
else null
end as visit
from t
Edit:
Without window function:
select t4.*, case when t3.name is not null then 1 end as visit from t t4
left join (
select t1.* from t t1
left join t t2 on t1.name > t2.name and t1.date = t2.date and
t1.visit_number = t2.visit_number
where t2.name is null
) as t3
on t3.name = t4.name and t3.date = t4.date and t3.visit_number = t4.visit_number
NOTE: If name is a key then the last comparison t3.date = t4.date and t3.visit_number = t4.visit_number can be removed
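For comparison, the same "first row per visit_number" idea can also be written without window functions as a NOT EXISTS test. This is only a sketch, run through Python's sqlite3 module rather than PostgreSQL 8.1 (an assumption for the sake of a runnable example). The row d is hypothetical: it is not in the question and is added to show why the ordering has to compare date first and name second.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, date TEXT, visit_number INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("x", "2011-01-01", 123),
        ("y", "2011-01-01", 123),
        ("a", "2011-02-02", 345),
        ("b", "2011-02-02", 345),
        ("c", "2011-02-02", 345),
        ("d", "2011-02-01", 345),  # hypothetical row: earlier date, later name
    ],
)

# A row gets visit = 1 only if no other row with the same visit_number
# sorts before it by (date, name).
no_window_rows = conn.execute(
    """
    SELECT cur.name, cur.date, cur.visit_number,
           CASE WHEN NOT EXISTS (
                    SELECT 1
                    FROM t AS earlier
                    WHERE earlier.visit_number = cur.visit_number
                      AND (earlier.date < cur.date
                           OR (earlier.date = cur.date
                               AND earlier.name < cur.name))
                )
                THEN 1
           END AS visit
    FROM t AS cur
    ORDER BY cur.date, cur.name
    """
).fetchall()

for row in no_window_rows:
    print(row)
```

With the hypothetical row included, d becomes the first row of visit_number 345, so a, b and c all get NULL.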

Related

max latest null value sql

I am experiencing the following problem. I have a table in which I save history. Due to an error, I'm interested in finding the following information.
The latest record for a user where the column1 value is null and the modifiedon date is the newest for this user. The problem is that the table contains more records for this user where modifiedon is not null and which were changed after the date I'm looking for.
Can someone please point me in the right direction?
Sample data:
personid FreeField01 ModifiedOn
1 0004998 15-10-2019 11:48:19
1 NULL 20-10-2019 01:53:39
1 0004998 22-10-2019 14:58:44
1 0004998 22-10-2019 14:58:44
1 NULL 23-10-2019 07:52:46
1 0004998 23-10-2019 17:16:45
So for this user, no record should be returned and the user should be excluded from the result: the ModifiedOn datetime should be before 29-10, and the latest record before that date should have a NULL FreeField01 value.

Three conditions:
There is no newer entry for the person.
The entry value is NULL.
The date is before 2019-10-29.
The query:
select *
from mytable
where not exists
(
select *
from mytable newer
where newer.personid = mytable.personid
and newer.modifiedon > mytable.modifiedon
)
and freefield01 is null
and modifiedon < date '2019-10-29'
order by personid;
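To make the three conditions concrete, here is a sketch that runs the same NOT EXISTS query on the sample data. It uses Python's sqlite3 module and ISO-formatted text dates, both of which are assumptions made so the example is runnable (the question does not name the DBMS). Person 2 is hypothetical and was added so that the query has something to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (personid INTEGER, freefield01 TEXT, modifiedon TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "0004998", "2019-10-15 11:48:19"),
        (1, None, "2019-10-20 01:53:39"),
        (1, "0004998", "2019-10-22 14:58:44"),
        (1, "0004998", "2019-10-22 14:58:44"),
        (1, None, "2019-10-23 07:52:46"),
        (1, "0004998", "2019-10-23 17:16:45"),
        # Hypothetical second person whose newest entry is NULL.
        (2, "0001234", "2019-10-20 08:00:00"),
        (2, None, "2019-10-25 09:00:00"),
    ],
)

latest_null = conn.execute(
    """
    SELECT cur.personid, cur.freefield01, cur.modifiedon
    FROM mytable AS cur
    WHERE NOT EXISTS (               -- 1. no newer entry for the person
              SELECT 1
              FROM mytable AS newer
              WHERE newer.personid = cur.personid
                AND newer.modifiedon > cur.modifiedon
          )
      AND cur.freefield01 IS NULL    -- 2. the entry value is NULL
      AND cur.modifiedon < '2019-10-29'  -- 3. before the cut-off date
    ORDER BY cur.personid
    """
).fetchall()

print(latest_null)
```

Person 1 is excluded because the newest entry (23-10-2019 17:16:45) has a value; only the hypothetical person 2 is returned.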

You can use the script below:
WITH CTE
AS(
SELECT personid,MAX(ModifiedOn) MD
FROM your_table
GROUP BY personid
HAVING MAX(ModifiedOn) < '30-10-2019'
)
SELECT * FROM your_table A
INNER JOIN CTE B ON A.personid = B.personid
AND A.ModifiedOn = B.MD
AND A.FreeField01 IS NULL

If I understand correctly, you are looking for persons where the FreeField01 has a value of NULL as of a certain date.
Here is one method:
select t.*
from t
where t.ModifiedOn = (select max(t2.ModifiedOn)
from t t2
where t2.personid = t.personid and
t2.ModifiedOn <= '2019-10-29'
) and
t.FreeField01 is null;
EDIT:
Based on your comment, you might just want an aggregation and having:
select personid
from t
where t.ModifiedOn <= '2019-10-29'
group by personid
having sum(case when t.FreeField01 is null then 1 else 0 end) = 0
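To see how the first method in this answer (the correlated MAX subquery) behaves, here is a sketch using Python's sqlite3 module with ISO text dates, which is an assumption for the sake of a runnable example. Person 3 is hypothetical: the value was NULL as of the cut-off date but was changed afterwards. An "as of a date" query still returns that person, whereas a "no newer entry at all" test would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (personid INTEGER, FreeField01 TEXT, ModifiedOn TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "0004998", "2019-10-15 11:48:19"),
        (1, None, "2019-10-20 01:53:39"),
        (1, "0004998", "2019-10-22 14:58:44"),
        (1, "0004998", "2019-10-22 14:58:44"),
        (1, None, "2019-10-23 07:52:46"),
        (1, "0004998", "2019-10-23 17:16:45"),
        # Hypothetical person: NULL as of the cut-off, changed afterwards.
        (3, None, "2019-10-28 10:00:00"),
        (3, "0007777", "2019-11-02 10:00:00"),
    ],
)

# For each person, look only at the newest row on or before the cut-off
# and keep it if its value is NULL.
as_of_rows = conn.execute(
    """
    SELECT cur.personid, cur.FreeField01, cur.ModifiedOn
    FROM t AS cur
    WHERE cur.ModifiedOn = (SELECT MAX(t2.ModifiedOn)
                            FROM t AS t2
                            WHERE t2.personid = cur.personid
                              AND t2.ModifiedOn <= '2019-10-29')
      AND cur.FreeField01 IS NULL
    ORDER BY cur.personid
    """
).fetchall()

print(as_of_rows)
```

Which of the two behaviours is wanted depends on how the question is read, so treat this as an illustration of the difference rather than a recommendation.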

If I understand your request correctly, the simplest query I found is the following:
SELECT t.personid, t.FreeField01, MAX(ModifiedOn) FROM test t
GROUP BY personid
HAVING MAX(ModifiedOn) < '29-10-2019' AND FreeField01 IS NULL
EDIT: Following the suggestions below, you can use this query instead:
SELECT t1.personid, t1.FreeField01, t1.ModifiedOn
FROM test t1
JOIN (
SELECT t.personid, MAX(ModifiedOn) AS MaxModifiedOn FROM test t
GROUP BY personid
HAVING MAX(ModifiedOn) < STR_TO_DATE('29-10-2019','%d-%m-%Y')
) t2 ON (t1.personid = t2.personid AND t1.ModifiedOn = t2.MaxModifiedOn)
WHERE FreeField01 IS NULL

Flag on condition

Here's my table:
key date
a 2002
a 2014
a 2011
b 2004
b 2016
b 2001
I'd like a SELECT statement that adds a flag for the most recent date, like this:
key date flag
a 2002 0
a 2014 1
a 2011 0
b 2004 0
b 2016 1
b 2001 0
Thanks

You can use an analytical function if you don't want to do a group by or self-join. You can probably consolidate this a little if you want to, but I find splitting it out using with makes it more obvious what is going on.
with max_date_query as (
select key, date, max(date) over (partition by key) max_date
from mytable
)
select key, date, case when date = max_date then 1 else 0 end flag
from max_date_query
There are other variations on the same theme where you can order the window by date desc and use row_number() instead of max() to determine the flag. I would imagine the one I showed is better, but not sure how much it will really make a difference. You might need to use that method if you have cases where you have duplicate max dates and need to really only choose one.
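The difference between the max() and row_number() variants only shows up with duplicate max dates, so here is a sketch of both. It runs through Python's sqlite3 module (an assumption; window functions need SQLite 3.25 or newer), and the last inserted row is a hypothetical duplicate that is not in the question. The row_number() variant breaks the tie with SQLite's implicit rowid, which is also an assumption; another DBMS would need a real tie-breaker column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable ("key" TEXT, date INTEGER)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("a", 2002), ("a", 2014), ("a", 2011),
        ("b", 2004), ("b", 2016), ("b", 2001),
        ("a", 2014),  # hypothetical duplicate of the max date for key a
    ],
)

# Variant 1: compare each date with the maximum of its partition.
max_flags = conn.execute(
    """
    SELECT "key", date,
           CASE WHEN date = max(date) OVER (PARTITION BY "key")
                THEN 1 ELSE 0 END AS flag
    FROM mytable
    ORDER BY rowid
    """
).fetchall()

# Variant 2: number the rows newest-first and flag only the first one.
rn_flags = conn.execute(
    """
    SELECT "key", date,
           CASE WHEN row_number() OVER (PARTITION BY "key"
                                        ORDER BY date DESC, rowid) = 1
                THEN 1 ELSE 0 END AS flag
    FROM mytable
    ORDER BY rowid
    """
).fetchall()

print(max_flags)
print(rn_flags)
```

The max() variant flags both 2014 rows for key a, while the row_number() variant flags exactly one row per key.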

select t1.*, case when t2.key is null
then 0
else 1
end as flag
from your_table t1
left join
(
select key, max(date) as mdate
from your_table
group by key
) t2 on t1.key = t2.key and t1.date = t2.mdate

Not really sure what the "most recent" condition is (last "X" years?) and assuming the "2015" are in fact DATE values (not char), try:
select
t1.key,
t1.date,
CASE WHEN DATEDIFF('year', t1.date, CURRENT_DATE) < 2 THEN 1 ELSE 0 END as flag
from table t1;
if the "date" in fact is an integer:
select
t1.key,
t1.date,
CASE WHEN EXTRACT(YEAR FROM CURRENT_DATE) - t1.date < 2 THEN 1 ELSE 0 END as flag
from table t1;
Hope it helps
Sérgio

Select rows having the same features than others

I've the following table with 3 columns: Id, FeatureName and Value:
Id FeatureName Value
-- ----------- -----
1 AAA 10
1 ABB 12
1 BBB 12
2 AAA 15
2 ABB 12
2 ACD 7
3 AAA 10
3 ABB 12
3 CCC 12
.............
Each Id has different features and each Feature has a value for that Id.
I need to write a query that gives me the Ids that have exactly the same features and values as a given one, but only taking into account the features whose name starts with 'A'. For example, in the table above, searching with Id=1 would return Id=3, which has the same features starting with 'A' and the same values for those features.
I found a couple of different ways to do this, but all of them are very slow when the table has lots of rows (hundreds of thousands or more).
The best performance I have obtained is with the following query:
select a2.Id
from (select a.FeatureName, a.Value
from Table1 a
where a.Id = 1) a1,
(select a.Id, a.FeatureName, a.Value
from Table1 a
where a.FeatureName like 'A%') a2
where a1.FeatureName = a2.FeatureName
and a1.value = a2.value
group by a2.Id
having count(*) = 2
intersect
select a.Id
from Table1 a
where a.FeatureName like 'A%'
group by a.Id
having count(*)= 2
Here 2 stands for #nFeatures, the number of features starting with 'A' in Id=1, which I count before calling this query. I use the intersection to exclude results that have the same features as Id=1 but also some others whose name starts with 'A'.
I think that the slowest part is the second subquery:
select a.Id, a.FeatureName, a.Value
from Table1 a
where a.FeatureName like 'A%'
but I don't know how to make it faster. Maybe I will have to play with the indexes.
Any idea of how could I write a fast query for this purpose?

So you want all rows where the combination of FeatureName and Value is not unique? You can use EXISTS:
SELECT t.*
FROM dbo.Table1 t
WHERE t.FeatureName LIKE 'A%'
AND EXISTS(SELECT 1 FROM dbo.Table1 t2
WHERE t.Id <> t2.ID
AND t.FeatureName = t2.FeatureName
AND t.Value = t2.Value)
"How could I write a fast query for this purpose?"
If it's not fast enough, create an index on FeatureName + Value.

I tried to eliminate the join with MyTable again to select the data for the ID's that have matching FeatureName and Value values. Here's the query:
with joined_set as
(
SELECT
mt1.*, mt2.id as mt2_id, mt2.featurename as mt2_FeatureName, mt2.value as mt2_value
from
(
select *
from mytable
where featurename like 'A%'
) mt1
left join
(
select *
from mytable
where featurename like 'A%'
) mt2
on mt2.id <> mt1.id and mt2.FeatureName = mt1.featurename and mt2.value = mt1.value
)
select distinct id
from joined_set
where id not in
(select id
from joined_set
group by id
having SUM(
CASE
WHEN mt2_id is null THEN 1
ELSE 0
END
) <> 0
);
Here is the SQL Fiddle demo. It has an extra condition in the inline view mt2, to perform this search only for id = 1.

I'm a little dense this morning, I'm not sure if you wanted just the ID's or...
Here's my take on it...
You could probably move the where FeatureName like 'A%' into the inner query to filter the data on the initial table scan.
with dupFeatures (FeatureName, Value, dupCount)
as
(
select FeatureName, Value, count(*) as dupCount from MyTable
group by FeatureName, Value
having count(*) > 1
)
select MyTable.Id, dupFeatures.FeatureName,dupFeatures.Value
from dupFeatures
join MyTable on (MyTable.FeatureName = dupFeatures.FeatureName and
MyTable.Value = dupFeatures.Value )
where dupFeatures.FeatureName like 'A%'
order by FeatureName, Value, Id

A general solution is
With Rows As (
select id
, FeatureName
, Value
, rows = Count(id) OVER (PARTITION BY id)
FROM test
WHERE FeatureName LIKE 'A%')
SELECT a.id aID, b.id bID
FROM Rows a
INNER JOIN Rows b ON a.id < b.id and a.FeatureName = b.FeatureName
and a.Value = b.Value
and a.rows = b.rows
GROUP BY a.id, b.id
HAVING COUNT(*) = MAX(a.rows)
ORDER BY a.id, b.id
To limit the solution to a single Id, just add a WHERE condition on a.id in the main query. The CTE is needed to get the correct number of rows for each id.
SQLFiddle demo: in the demo I slightly changed the test data to add another pair of IDs that share only one of the FeatureNames of 1 and 3.
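For the single-Id case described in the question (find the Ids whose 'A' features and values exactly match a given Id), one possible shape is a LEFT JOIN against the reference Id plus two count checks. This is only a sketch, run through Python's sqlite3 module as a stand-in for the original DBMS. The function name matching_ids is made up for illustration, and the query assumes each (Id, FeatureName) pair occurs at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER, FeatureName TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "AAA", 10), (1, "ABB", 12), (1, "BBB", 12),
        (2, "AAA", 15), (2, "ABB", 12), (2, "ACD", 7),
        (3, "AAA", 10), (3, "ABB", 12), (3, "CCC", 12),
    ],
)


def matching_ids(conn, target_id):
    """Return Ids whose 'A%' features and values equal those of target_id."""
    rows = conn.execute(
        """
        SELECT other.Id
        FROM Table1 AS other
        LEFT JOIN Table1 AS ref
               ON ref.Id = :target
              AND ref.FeatureName = other.FeatureName
              AND ref.Value = other.Value
        WHERE other.FeatureName LIKE 'A%'
          AND other.Id <> :target
        GROUP BY other.Id
        -- every 'A' feature of the candidate matches the reference ...
        HAVING COUNT(*) = COUNT(ref.Id)
        -- ... and the candidate has as many 'A' features as the reference
           AND COUNT(*) = (SELECT COUNT(*)
                           FROM Table1
                           WHERE Id = :target
                             AND FeatureName LIKE 'A%')
        ORDER BY other.Id
        """,
        {"target": target_id},
    ).fetchall()
    return [r[0] for r in rows]


print(matching_ids(conn, 1))
```

Whether this is faster than the INTERSECT version on a large table is untested here; an index on (FeatureName, Value), as suggested in another answer, would likely matter more than the query shape.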

Replace NULL with values

Here is my challenge:
I have a log table that gets a new record every time a record is changed, but with a NULL value in every column that did not change. In other words, only the changed value is set; the unchanged fields in each row simply hold NULL.
Now I would like to replace each NULL value with the nearest non-NULL value above it, as shown below:
Source table: Task_log
ID Owner Status Flag
1 Bob Registrar T
2 Sue NULL NULL
3 NULL NULL F
4 Frank Admission T
5 NULL NULL F
6 NULL NULL T
Desired output table: Task_log
ID Owner Status Flag
1 Bob Registrar T
2 Sue Registrar T
3 Sue Registrar F
4 Frank Admission T
5 Frank Admission F
6 Frank Admission T
How do I write a query which will generate the desired output table?

One of the new window functions in SQL Server 2012 is FIRST_VALUE, whose name is self-explanatory; it can be partitioned through the OVER clause. Before using it, every column has to be divided into data blocks, where a block for a column begins each time a non-NULL value is found.
With Block As (
Select ID
, Owner
, OBlockID = SUM(Case When Owner Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
, Status
, SBlockID = SUM(Case When Status Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
, Flag
, FBlockID = SUM(Case When Flag Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
From Task_log
)
Select ID
, Owner = FIRST_VALUE(Owner) OVER (PARTITION BY OBlockID ORDER BY ID)
, Status = FIRST_VALUE(Status) OVER (PARTITION BY SBlockID ORDER BY ID)
, Flag = FIRST_VALUE(Flag) OVER (PARTITION BY FBlockID ORDER BY ID)
FROM Block
The UPDATE query is easily derived
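The block idea can be tried out on the sample data with the sketch below. It uses Python's sqlite3 module instead of SQL Server (an assumption made so the example runs anywhere; it needs SQLite 3.25 or newer), so the T-SQL "alias = expression" syntax is written with AS, and the block column names are shortened for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_log (id INTEGER, owner TEXT, status TEXT, flag TEXT)"
)
conn.executemany(
    "INSERT INTO task_log VALUES (?, ?, ?, ?)",
    [
        (1, "Bob", "Registrar", "T"),
        (2, "Sue", None, None),
        (3, None, None, "F"),
        (4, "Frank", "Admission", "T"),
        (5, None, None, "F"),
        (6, None, None, "T"),
    ],
)

# The running count of non-NULL values gives every row a block number per
# column; the first row of each block is the one holding the value.
filled_rows = conn.execute(
    """
    WITH block AS (
        SELECT id, owner, status, flag,
               SUM(CASE WHEN owner  IS NULL THEN 0 ELSE 1 END)
                   OVER (ORDER BY id) AS o_block,
               SUM(CASE WHEN status IS NULL THEN 0 ELSE 1 END)
                   OVER (ORDER BY id) AS s_block,
               SUM(CASE WHEN flag   IS NULL THEN 0 ELSE 1 END)
                   OVER (ORDER BY id) AS f_block
        FROM task_log
    )
    SELECT id,
           first_value(owner)  OVER (PARTITION BY o_block ORDER BY id) AS owner,
           first_value(status) OVER (PARTITION BY s_block ORDER BY id) AS status,
           first_value(flag)   OVER (PARTITION BY f_block ORDER BY id) AS flag
    FROM block
    ORDER BY id
    """
).fetchall()

for row in filled_rows:
    print(row)
```

The result should match the desired output table from the question row for row.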

As I mentioned in my comment, I would try to fix the process that is creating the records rather than fixing the junk data. If that is not an option, the code below should get you pointed in the right direction.
UPDATE t1
set t1.owner = COALESCE(t1.owner, t2.owner),
t1.Status = COALESCE(t1.status, t2.status),
t1.Flag = COALESCE(t1.flag, t2.flag)
FROM Task_log as t1
INNER JOIN Task_log as t2
ON t1.id = (t2.id + 1)
where t1.owner is null
OR t1.status is null
OR t1.flag is null

I can think of several approaches.
You could use a combination of COALESCE with an array aggregate function. Unfortunately it doesn't look like SQL Server supports array_agg natively (although some nice people have developed some workarounds).
You could also use a subselect for each column.
SELECT id,
(SELECT TOP 1 owner FROM ... WHERE id <= outer_id AND owner IS NOT NULL ORDER BY id DESC) AS owner,
-- other columns
You could probably do something with window functions, too.

A vanilla solution would be:
select id
, owner
, coalesce(owner, ( select owner from t t2
where id = (select max(id) from t t3
where id < t1.id and owner is not null))
) as new_owner
, flag
, coalesce(flag, ( select flag from t t2
where id = (select max(id) from t t3
where id < t1.id and flag is not null))
) as new_flag
from t t1
Rather inefficient, but should work on most DBMS
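Here is the correlated-subquery approach applied to all three columns of the sample table, as a sketch. It runs through Python's sqlite3 module (an assumption; no window functions are needed, so any SQLite version should do). The helper fill_down is hypothetical and only builds the repeated COALESCE expression from a fixed list of column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_log (id INTEGER, owner TEXT, status TEXT, flag TEXT)"
)
conn.executemany(
    "INSERT INTO task_log VALUES (?, ?, ?, ?)",
    [
        (1, "Bob", "Registrar", "T"),
        (2, "Sue", None, None),
        (3, None, None, "F"),
        (4, "Frank", "Admission", "T"),
        (5, None, None, "F"),
        (6, None, None, "T"),
    ],
)


def fill_down(col):
    # Keep the row's own value, else take it from the closest earlier row
    # (highest smaller id) where the column is not NULL.
    return (
        f"COALESCE(t1.{col}, (SELECT t2.{col} FROM task_log t2 "
        f"WHERE t2.id = (SELECT MAX(t3.id) FROM task_log t3 "
        f"WHERE t3.id < t1.id AND t3.{col} IS NOT NULL))) AS {col}"
    )


columns = ", ".join(fill_down(c) for c in ("owner", "status", "flag"))
query = f"SELECT t1.id, {columns} FROM task_log t1 ORDER BY t1.id"
filled = conn.execute(query).fetchall()

for row in filled:
    print(row)
```

Every NULL triggers its own pair of subqueries, which is why this form is portable but slow on large tables.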

results of a sub table in the top level query

Not sure how to title this so please feel free to retitle.
I have two tables with a one to many relationship.
Table1
|ID|NAME|...|
Table2
|ID|Table1_ID|StartDate|EndDate|
I am trying to write a query that given a date will return the following
|TABLE1.ID|TABLE1.NAME|are any rows of table 2 in date|
I have a one-to-many relationship between table 1 and table 2. I want to pass a date in to the query. If any of the related rows in table 2 has a start date < the passed-in date and an end date > the passed-in date (or a null end date), then I want column 3 of the result to be true. Otherwise I want it to be false.
Consider the example
|ID|NAME|...|
| 1|APPLE| ...|
| 2|PEAR| ...|
Table2
|ID|Table1_ID|StartDate|EndDate|
|1|1|01-01-2014|null|
|2|1|01-01-2014|01-02-2014|
|3|2|01-01-2014|01-02-2014|
if I pass in 01-01-2014 then I expect two rows with IDs 1 and 2 and both to be true (all rows match)
if I pass in 01-03-2014 then I expect two rows with ID 1 true (match on first row) and ID 2 to be false (because third row is outside of this date)
I am trying to do this in SQL to eventually convert to JPA. If there are any JPA functions that can do this then that would be good to know. Else I'll do a native query
Any pointers would be great!
Thanks

This should give you what you want:
select x.*, 'PASS' as checker
from table1 x
where exists
(select 'x'
from table2 y
where y.table1_id = x.id
and y.startdate <= '01-01-2014'
and (y.enddate >= '01-01-2014' or y.enddate is null))
union all
select x.*, 'FAIL' as checker
from table1 x
where not exists
(select 'x'
from table2 y
where y.table1_id = x.id
and y.startdate <= '01-01-2014'
and (y.enddate >= '01-01-2014' or y.enddate is null))

I don't know if I understand your question.
So, please, be patient... ;)
Try something like this:
select t1.id, t1.name,
case when t2.Table1_ID is null
then 'false'
else 'true' end as boolean_value
from Table1 t1,
(select distinct Table1_ID
from Table2
where yourdate >= StartDate
and (yourdate <= EndDate or EndDate is null)) t2
where t1.id = t2.Table1_ID (+);
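Another way to get one row per Table1 entry with a true/false column is a CASE WHEN EXISTS expression, sketched below with Python's sqlite3 module (an assumption; the question does not name the DBMS). The sample dates are read as day-month-year and stored as ISO text, which is also an assumption, and the bounds are treated as inclusive so that the question's expected results come out. The function name in_date_flags is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.execute(
    "CREATE TABLE table2 (id INTEGER, table1_id INTEGER, startdate TEXT, enddate TEXT)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "APPLE"), (2, "PEAR")])
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2014-01-01", None),
        (2, 1, "2014-01-01", "2014-02-01"),
        (3, 2, "2014-01-01", "2014-02-01"),
    ],
)


def in_date_flags(conn, day):
    """Return (id, name, flag) with flag 1 if any table2 row covers day."""
    return conn.execute(
        """
        SELECT t1.id, t1.name,
               CASE WHEN EXISTS (
                        SELECT 1
                        FROM table2 AS t2
                        WHERE t2.table1_id = t1.id
                          AND t2.startdate <= :day
                          AND (t2.enddate IS NULL OR t2.enddate >= :day)
                    )
                    THEN 1 ELSE 0
               END AS in_date
        FROM table1 AS t1
        ORDER BY t1.id
        """,
        {"day": day},
    ).fetchall()


print(in_date_flags(conn, "2014-01-01"))
print(in_date_flags(conn, "2014-03-01"))
```

Compared with the UNION ALL of EXISTS and NOT EXISTS, this evaluates the date condition in a single place.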
