SQL Case depending on previous status of record - sql

I have a table containing status of a records. Something like this:
ID STATUS TIMESTAMP
1 I 01-01-2016
1 A 01-03-2016
1 P 01-04-2016
2 I 01-01-2016
2 P 01-02-2016
3 P 01-01-2016
I want to make a case where I take the newest version of each row, and for all P that has at some point been an I, they should be cased as a 'G' instead of P.
When I try to do something like
Select case when ID in (select ID from TABLE where ID = 'I') else ID END as status)
From TABLE
where ID in (select max(ID) from TABLE)
I get an error that this isn't possible using IN when casing.
So my question is, how do I do it then?
Want to end up with:
ID STATUS TIMESTAMP
1 G 01-04-2016
2 G 01-02-2016
3 P 01-01-2016
DBMS is IBM DB2

Have a derived table which returns each id with its newest timestamp. Join with that result:
select t1.ID, t1.STATUS, t1.TIMESTAMP
from tablename t1
join (select id, max(timestamp) as max_timestamp
from tablename
group by id) t2
ON t1.id = t2.id and t1.TIMESTAMP = t2.max_timestamp
Will return both rows in case of a tie (two rows with same newest timestamp.)
Note that ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP".

You can do this by using a common table expression find all IDs that have had a status of 'I', and then using an outer join with your table to determine which IDs have had a status of 'I' at some point.
To get the final result (with only the newest record) you can use the row_number() OLAP function and select only the "newest" record (this is shown in the ranked common table expression below:
with irecs (ID) as (
select distinct
ID
from
TABLE
where
status = 'I'
),
ranked as (
select
rownumber() over (partition by t.ID order by t.timestamp desc) as rn,
t.id,
case when i.id is null then t.status else 'G' end as status,
t.timestamp
from
TABLE t
left outer join irecs i
on t.id = i.id
)
select
id,
status,
timestamp
from
ranked
where
rn = 1;

other solution
with youtableranked as (
select f1.id,
case (select count(*) from yourtable f2 where f2.ID=f1.ID and f2."TIMESTAMP"<f1."TIMESTAMP" and f2.STATUS='I')>0 then 'G' else f1.STATUS end as STATUS,
rownumber() over(partition by f1.id order by f1.TIMESTAMP desc, rrn(f1) desc) rang,
f1."TIMESTAMP"
from yourtable f1
)
select * from youtableranked f0
where f0.rang=1
ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP"

try this
select distinct f1.id, f4.*
from yourtable f1
inner join lateral
(
select
case (select count(*) from yourtable f3 where f3.ID=f2.ID and f3."TIMESTAMP"<f2."TIMESTAMP" and f3.STATUS='I')>0 then 'G' else f2.STATUS end as STATUS,
f2."TIMESTAMP"
from yourtable f2 where f2.ID=f3.ID
order by f2."TIMESTAMP" desc, rrn(f2) desc
fetch first rows only
) f4 on 1=1
rrn(f2) order is for same last date
ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP"

Related

SQL Get rows based on conditions

I'm currently having trouble writing the business logic to get rows from a table with id's and a flag which I have appended to it.
For example,
id: id seq num: flag: Date:
A 1 N ..
A 2 N ..
A 3 N
A 4 Y
B 1 N
B 2 Y
B 3 N
C 1 N
C 2 N
The end result I'm trying to achieve is that:
For each unique ID I just want to retrieve one row with the condition for that row being that
If the flag was a "Y" then return that row.
Else return the last "N" row.
Another thing to note is that the 'Y' flag is not always necessarily the last
I've been trying to get a case condition using a partition like
OVER (PARTITION BY A."ID" ORDER BY A."Seq num") but so far no luck.
-- EDIT:
From the table, the sample result would be:
id: id seq num: flag: date:
A 4 Y ..
B 2 Y ..
C 2 N ..
Using a window clause is the right idea. You should partition the results by the ID (as you've done), and order them so the Y flag rows come first, then all the N flag rows in descending date order, and pick the first for each id:
SELECT id, id_seq_num, flag, date
FROM (SELECT id, id_seq_num, flag, date,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY CASE flag WHEN 'Y' THEN 0
ELSE 1
END ASC,
date ASC) AS rk
FROM mytable) t
WHERE rk = 1
My approach is to take a UNION of two queries. The first query simply selects all Yes records, assuming that Yes only appears once per ID group. The second query targets only those ID having no Yes anywhere. For those records, we use the row number to select the most recent No record.
WITH cte1 AS (
SELECT id
FROM yourTable
GROUP BY id
HAVING SUM(CASE WHEN flag = 'Y' THEN 1 ELSE 0 END) = 0
),
cte2 AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t1."id seq" DESC) rn
FROM yourTable t1
INNER JOIN cte1 t2
ON t1.id = t2.id
)
SELECT *
FROM yourTable
WHERE flag = 'Y'
UNION ALL
SELECT *
FROM cte2 t2
WHERE t2.rn = 1
Here's one way (with quite generic SQL):
select t1.*
from Table1 as t1
where t1.id_seq_num = COALESCE(
(select max(id_seq_num) from Table1 as T2 where t1.id = t2.id and t2.flag = 'Y') ,
(select max(id_seq_num) from Table1 as T3 where t1.id = t3.id and t3.flag = 'N') )
Available in a fiddle here: http://sqlfiddle.com/#!9/5f7f9/6
SELECT DISTINCT id, flag
FROM yourTable

Picking one row over another

I have a table that looks like this:
ID Type Value
A Z01 10
A Z09 20
B Z01 30
C Z01 40
D Z09 50
E Z10 60
For each ID I would like to retrieve a value. Ideally the value should come from the row with type Z01. However, if Z01 is not available I'll pick Z09 instead. If nothing is available I would like to select nothing.
The result would look like this:
Id Type Value
A Z01 10
B Z01 30
C Z01 40
D Z09 50
How can I accomplish this with T-SQL?
This should give you what you want:
select *
from table t
where 1 = case
when t.type = 'Z01'
then 1
when t.type = 'Z09'
and not exists (select 1 from table where id = t.id and type = 'Z01')
then 1
else 0
end
An alternative, with using a more common approach is (re-writing the CASE expression):
select *
from table
where type = 'Z01'
OR (type = 'Z09' and not exists (select 1 from table where id = t.id and type = 'Z01'))
An obvious sargable approach (which will make your query use the appropriate index on your table, if it exists) would be:
select *
from table
where type = `Z01`
union all
select *
from table
where type = `Z09`
and not exists (select 1 from table where id = t.id and type = 'Z01')
And when I'm saying index I'm talking about a non-clustered index on the type column.
You can use query like this
;WITH cte
AS (SELECT
*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY type) AS rn
FROM yourtable
WHERE type IN ('Z01', 'Z09'))
SELECT
id, type, value
FROM cte
WHERE rn = 1
I would use window functions:
select t.*
from (select t.*,
row_number() over (partition by id
order by (case when type = 'Z01' then 1
when type = 'Z09' then 2
end)
) as seqnum
from t
) t
where seqnum = 1;
I don't fully understand teh statement "if nothing is available, then choose nothing". If, by this, you mean you want 'Z01', 'Z09' or nothing at all, then just do:
select t.*
from (select t.*,
row_number() over (partition by id order by type) as seqnum
from t
where type in ('Z01', 'Z09')
) t
where seqnum = 1;
The where clause does the filtering, and you can just order by the type alphabetically.
Something like this:
select distinct
base.Id,
Coalesce(z01.Type, any.Type) Type,
Coalesce(z01.Value, any.Value)
from (select distinct id from [table]) base
left outer join [table] z01
on z01.id = base.id
and z01.Type = 'Z01'
left outer join [table] any
on any.id = base.id
and any.type != 'Z01'
Start with a WHERE clause. Select only entries with Z01 and Z09. Then use ROW_NUMBER to rank your rows and keep the best.
select id, type, value
from
(
select
id, type, value,
row_number()
over (partition by id order by case when type = 'Z01' then 1 else 2 end) as rn
from mytable
where type in ('Z01','Z09')
) ranked
where rn = 1;
Then try this
Select distinct a.id,
coalesce(t1.type, t9.type) type,
case when t1.type is not null
then t1.value else t9.value end value
From table a
left join table t1
on t1.id = a.id
and t1.type = 'Z01'
left join table t9
on t9.id = a.id
and t9.type = 'Z09'
Where a.type in ('Z01', 'Z09') -- < -- this last to eliminate row E

Combine the most recent entries from a number of tables

I have a master table with a number of IDs in it:
ID ...
0 ...
1 ...
And multiple tables (say vtbl1, vtbl2, vtbl3) with a foreign key to master, a timestamp and a value:
ID Timestamp Value
0 01/01/01.. 2
1 01/01/02.. 7
0 01/01/03.. 5
I would like to get one or more entries for each ID in master with an entry (or null if no entries exist) containing the most recent entry in each v... table (grouped by timestamps):
ID Timestamp vtbl1.Value vtbl2.Value vtbl3.value
0 01/01/03.. 5 2
0 01/01/01.. 4
1 01/01/02.. 7 4 9
I'm sure this is fairly simple but my SQL is rusty and I've been going in circles. Any help would be appreciated.
Clarification
These values come from one or more sensors able to read one or more of the values. So the latest value in each value table for the ID is to be considered the current system state for that ID. If the timestamps match they are considered one update.
I need the minimal set of updates required for each ID to give a full data set for the current state.
Also the values can be of different types.
If I understand your question correctly, one option is to use conditional aggregation and union all:
select id, timestamp,
max(case when tbl = 'tbl1' then value end) t1value,
max(case when tbl = 'tbl2' then value end) t2value,
max(case when tbl = 'tbl3' then value end) t3value
from (
select id, timestamp, value, 'tbl1' tbl
from tbl1
union all
select id, timestamp, value, 'tbl2' tbl
from tbl2
union all
select id, timestamp, value, 'tbl3' tbl
from tbl3
) t
group by id, timestamp
Or if you have multiple records per id and you want the highest value per by timestamp, you can include row_number() in your subquery:
select id, timestamp,
max(case when tbl = 'tbl1' then value end) t1value,
max(case when tbl = 'tbl2' then value end) t2value,
max(case when tbl = 'tbl3' then value end) t3value
from (
select id, timestamp, value, 'tbl1' tbl,
row_number() over (partition by id order by timestamp desc) rn
from tbl1
union all
select id, timestamp, value, 'tbl2' tbl,
row_number() over (partition by id order by timestamp desc) rn
from tbl2
union all
select id, timestamp, value, 'tbl3' tbl,
row_number() over (partition by id order by timestamp desc) rn
from tbl3
) t
where rn = 1
group by id, timestamp
This can get difficult though if max(timestamp) values aren't the same in each of the child tables. Which do you join on at that point?
select m.*, v1.value as t1_val, v2.value as t2_val, v3.value as t3_val
from master m
left join (select x.*
from vtbl1 x
join (select id, max(timestamp) as last_ts
from vtbl1
group by id) y
on x.id = y.id
and x.timestamp = y.last_ts) v1
on m.id = v1.id
left join (select x.*
from vtbl2 x
join (select id, max(timestamp) as last_ts
from vtbl2
group by id) y
on x.id = y.id
and x.timestamp = y.last_ts) v2
on m.id = v2.id
left join (select x.*
from vtbl3 x
join (select id, max(timestamp) as last_ts
from vtbl3
group by id) y
on x.id = y.id
and x.timestamp = y.last_ts) v3
on m.id = v3.id
The fastest query technique depends on the distribution of values. DISTINCT ON would be a simple solution in Postgres, ideal for just a few values per id in each child table. But guessing from your description I expect many rows per id, so I suggest a solution with LATERAL joins. Requires Postgres 9.3+:
Optimize GROUP BY query to retrieve latest record per user
One more complication for your already-not-so-simple case:
Also the values can be of different types
Alternative 1
Cast all values to text. Every data type can be cast to text.
Base query
SELECT m.id, v.timestamp, 1 AS tbl, v.value -- simple int as table id
FROM master m
, LATERAL (
SELECT timestamp, value::text -- cast to text
FROM vtbl1
WHERE id = m.id -- lateral reference
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp, 2 AS tbl, v.value -- ascending without gaps
FROM master m
, LATERAL (
SELECT timestamp, value::text
FROM vtbl2
WHERE id = m.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp, 3 AS tbl, value
FROM ...
;
All you need for this to be fast is an index on (id, timestamp) for each child table. Best in this form (adding value is only useful if you get index-only scans out of it):
CREATE INDEX vtbl1_combo_idx ON vtbl1 (id, timestamp DESC NULLS LAST, value)
1a. Aggregate (pseudo-crosstab)
To format as desired use aggregate functions on CASE expressions in Postgres 9.3 or older (like demonstrated by #sgeddes) or (better) the new aggregate FILTER clause in Postgres 9.4+:
How can I simplify this game statistics query?
SELECT id, timestamp
, max(value) FILTER (WHERE tbl = 1) AS val1
, max(value) FILTER (WHERE tbl = 2) AS val2
, ...
FROM ( <query frm above> ) t
GROUP BY 1, 2;
1b. Crosstab
Actual cross tabulation (also called "pivot" in other RDBMS) should be considerably faster. You need the additional module tablefunc installed, instructions below.
The special difficulty here: we have a composite "row name" (id, timestamp), but the function expects a single column as row name. So we substitute with row_number(), but do not display that surrogate key in the result:
SELECT id, timestamp, val1, val2, val3, ...
-- normally SELECT * is enough; explicit list to filter rn
FROM crosstab(
$$
SELECT row_number() OVER (ORDER BY id, timestamp DESC NULLS LAST) AS rn
, id, timestamp, tbl, value
FROM ( <query from above> ) t
ORDER BY 1
$$
, 'SELECT generate_series(1,3)' -- replace 3 with highest table nr.
) AS ct (
rn int, id int, timestamp date
, val1 text, val2 text, val3 text, ...);
Closely related:
Postgres - Transpose Rows to Columns
Relevant basics:
PostgreSQL Crosstab Query
Pivot on Multiple Columns using Tablefunc
Alternative 2
Simple, but may be just as fast and preserves original data types:
SELECT id, timestamp
, max(val1) AS val1, max(val2) AS val2, max(val3) AS val3, ...
FROM (
SELECT m.id, v.timestamp
, v.value AS val1, NULL::int AS val2, NULL::numeric AS val3, ...
-- list all values with actual data type
FROM master m
, LATERAL (
SELECT timestamp, value
FROM vtbl1
WHERE id = m.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp
, NULL, v.value, NULL, ... -- column names & data types defined in first SELECT
FROM master m
, LATERAL (
SELECT timestamp, value
FROM vtbl2
WHERE id = m.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp
, NULL, NULL, v.value, ...
FROM ...
) t
GROUP BY 1, 2
ORDER BY 1, 2;
Aside: Never use basic type names or reserved words (in standard SQL) like timestamp as identifier.

SQL query: how to distinct count of a column group by another column

In my table I need to know if each ID has one and only one ID_name. How can I write such query?
I tried:
select ID, count(distinct ID_name) as count_name
from table
group by ID
having count_name > 1
But it takes forever to run.
Any thoughts?
select ID
from YourTable
group by
ID
having count(distinct ID_name) > 1
or
select *
from YourTable yt1
where exists
(
select *
from YourTable yt2
where yt1.ID = yt2.ID
and yt1.ID_Name <> yt2.ID_Name
)
Now, most ID columns are defined as primary key and are unique. So in a regular database you'd expect both queries to return an empty set.
select tt.ID,max(tt.myRank)
from
(
select
ip.ID,ip.ID_name,
ROW_Number() over (partition by ip.ID,ip.ID_nameorder by ip.ID) as myRank
from YourTable ip
) tt
group by tt.ID
This gives you every ID with it's total number of ID_Name
If you want only those ID's which have more than one name associated just add a where clause
e.g.
select tt.ID,max(tt.myRank)
from
(
select
ip.ID,ip.ID_name,
ROW_NUMBER() over (partition by ip.ID,ip.ID_nameorder by ip.ID) as myRank
from YourTable ip
) tt
**where tt.myRank > 1**
group by tt.ID

How to filter out records grouped by date with a large date difference

I have some records, grouped by name and date.
I would like to find any records in a table that have a date difference between them larger than a week, from the most recent record.
Would this be possible to do with a cte?
I am thinking something along these lines (it is difficult to explain)
; with mycte as (
select *
from #GroupedRecords)
select *
from mycte a
join (select *
from #GroupedRecords) b on a.Name = b.Name
where datediff(day, a.DateCreated, b.DateCreated) > 7
For example:
Id Name Date
1 Foo 02/03/2010
2 Bar 23/02/2010
3 Ram 21/01/2010
4 Foo 29/02/2010
5 Foo 22/02/2010
6 Foo 05/12/2009
The results should be:
Id Name Date
1 Foo 02/03/2010
5 Foo 22/02/2010
6 Foo 05/12/2009
You can try:
SELECT id,
name,
DATE
FROM groupedrecords AS gr1
WHERE ( (SELECT MAX(DATE) AS md
FROM groupedrecords gr2
WHERE gr1.name = gr2.name) - gr1.DATE ) > 7;
Or probably better yet:
SELECT id,
name,
DATE
FROM groupedrecords AS gr1
INNER JOIN (SELECT name,
MAX(DATE) AS md
FROM groupedrecords AS gr2
GROUP BY name) AS q1
ON gr1.name = q1.name
WHERE ( q1.md - gr1.DATE ) > 7;
UPDATE: As suggested in the comments, here is a version that uses union to get the id with the max date per group AND the ids of those that are 7 days or older than the max date. I used a CTE for fun, it was not necessary. Note that if there is more than 1 ID that shares the max date in a group, this query will need to be modified-
WITH CTE
AS (SELECT name,
Max(date) AS MD
FROM Records
GROUP BY name)
SELECT R.ID,
R.name,
R.date
FROM CTE
INNER JOIN Records AS R
ON CTE.Name = R.Name
AND CTE.MD = R.date
UNION ALL
SELECT r1.id,
r1.name,
r1.DATE
FROM Records AS R1
INNER JOIN CTE
ON CTE.name = R1.name
WHERE ( CTE.md - R1.DATE ) > 7
ORDER BY name ASC,
date DESC
I wonder if this gets close to a solution:
; with tableWithRow as (
select *, row_number() over (order by name, date) as rowNum
from t
)
select t1.*, t2.id t2id, t2.name t2name, t2.date t2date, t2.rowNum t2rowNum
from tableWithRow t1
join tableWithRow t2
on t1.rowNum = t2.rowNum + 1 and t1.name = t2.name