SQL select rows based on CASE statement

I want to get each ID's 'good' value if it exists, and its 'bad' value if it doesn't.
If an ID has a row with index='good', I want to return that row for the ID.
If an ID ONLY has index='bad', I want to get that one.
How would you go about it?

You can try the query below, which numbers each ID's rows with 'good' first and keeps the first one:
with cte as
(
select id, index, value, row_number() over(partition by id order by case when index='good' then 1 else 2 end) as rn
from tablename
)
select id,index, value
from cte where rn=1
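A quick way to check the ROW_NUMBER() approach is to run it against a throwaway in-memory database. This sketch assumes Python's built-in sqlite3 module with SQLite 3.25 or newer (needed for window functions); the sample rows are made up, and "index" is double-quoted because INDEX is a keyword in SQLite.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tablename (id INTEGER, "index" TEXT, value INTEGER)')
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(1, "good", 10), (1, "bad", 11), (2, "bad", 20), (3, "good", 30)],
)

# Number each id's rows so that 'good' sorts before anything else,
# then keep only the first row per id.
query = """
WITH cte AS (
    SELECT id, "index", value,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY CASE WHEN "index" = 'good' THEN 1 ELSE 2 END
           ) AS rn
    FROM tablename
)
SELECT id, "index", value FROM cte WHERE rn = 1 ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'good', 10), (2, 'bad', 20), (3, 'good', 30)]
```

Because ROW_NUMBER() assigns distinct numbers, this always returns exactly one row per id, even if an id has several 'good' or several 'bad' rows.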

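Alternatively, use a FULL OUTER JOIN, filtering each side in a derived table first (a condition such as good.index='good' in the ON clause of an outer join does not remove unmatched rows):
SELECT
COALESCE(good.id, bad.id) AS id,
COALESCE(good.index, bad.index) AS index,
COALESCE(good.value, bad.value) AS value
FROM (SELECT * FROM data WHERE index = 'good') AS good
FULL OUTER JOIN (SELECT * FROM data WHERE index = 'bad') AS bad ON good.id = bad.id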

You can use NOT EXISTS to check whether each id has a row with index = 'good':
SELECT t1.*
FROM tablename t1
WHERE t1.index = 'good'
OR NOT EXISTS (
SELECT 1
FROM tablename t2
WHERE t2.id = t1.id AND t2.index = 'good'
);
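The EXISTS version behaves slightly differently from the ROW_NUMBER() one when an id has several 'bad' rows and no 'good' row. The sketch below (Python's sqlite3, hypothetical data, "index" quoted because it is a SQLite keyword) makes that visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tablename (id INTEGER, "index" TEXT, value INTEGER)')
# Hypothetical data: id 2 has two 'bad' rows and no 'good' row.
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(1, "good", 10), (1, "bad", 11), (2, "bad", 20), (2, "bad", 21), (3, "good", 30)],
)

# Keep 'good' rows, plus every row whose id has no 'good' row at all.
query = """
SELECT t1.*
FROM tablename t1
WHERE t1."index" = 'good'
   OR NOT EXISTS (
        SELECT 1
        FROM tablename t2
        WHERE t2.id = t1.id AND t2."index" = 'good'
   )
ORDER BY t1.id, t1.value
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'good', 10), (2, 'bad', 20), (2, 'bad', 21), (3, 'good', 30)]
```

Both 'bad' rows for id 2 come back, whereas a ROW_NUMBER() filter on rn = 1 would keep only one of them. Which behaviour is right depends on whether an id can legitimately have more than one row per index value.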

Related

GoogleSQL - SELECT IF

I'm working with a dataset structured like this:
I want to exclude all records with ReviewRound "a" if the same ID has also gone through review round "b": if a unique ID has an associated round "b" review, its round "a" review should not be included.
Some records have not gone to round "b". The issues I'm running into stem from there being multiple records for each unique ID.
Ideally this could be done in Google BigQuery; if not, filtering through Google Scripts may also be an option.
Any suggestions would be appreciated!
If I followed you correctly, you could express this as a NOT condition with a correlated subquery: a record is excluded if it has ReviewRound = 'a' and another record with the same id has ReviewRound = 'b'.
select t.*
from mytable t
where not (
t.ReviewRound = 'a'
and exists (
select 1
from mytable t1
where t1.id = t.id and t1.ReviewRound = 'b'
)
)
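The same NOT ... EXISTS logic can be checked outside BigQuery. This is a sketch in SQLite via Python's sqlite3; the table contents, including the Score column, are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; Score is an assumed payload column.
conn.execute("CREATE TABLE mytable (id INTEGER, ReviewRound TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "a", 5), (1, "b", 7), (2, "a", 6), (3, "b", 9)],
)

# Drop a round-'a' row only when the same id also has a round-'b' row.
query = """
SELECT t.*
FROM mytable t
WHERE NOT (
    t.ReviewRound = 'a'
    AND EXISTS (
        SELECT 1
        FROM mytable t1
        WHERE t1.id = t.id AND t1.ReviewRound = 'b'
    )
)
ORDER BY t.id, t.ReviewRound
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'b', 7), (2, 'a', 6), (3, 'b', 9)]
```

ID 1 loses its round 'a' row, ID 2 keeps its only row, and ID 3 keeps its round 'b' row.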
You can do this with window functions as well:
select t.* except (num_bs)
from (select t.*,
countif(reviewround = 'b') over (partition by id) as num_bs
from mytable t
) t
where num_bs = 0 or reviewround = 'b';
By using window functions, you can solve it with this query
SELECT ID, Score
FROM (
SELECT *,
MAX(CASE WHEN ReviewRound = 'b' THEN 1 ELSE 0 END) OVER (partition by ID) as has_b
FROM mytable
) t
WHERE has_b = 0 OR ReviewRound = 'b'
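The has_b flag only identifies IDs that have a round 'b' review; filtering on has_b = 0 by itself would also drop the round 'b' rows you want to keep. A small SQLite run through Python's sqlite3 (hypothetical data, SQLite 3.25+ assumed for window functions) shows the difference between the two filters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; ID 2 never reached round 'b'.
conn.execute("CREATE TABLE mytable (ID INTEGER, ReviewRound TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "a", 5), (1, "b", 7), (2, "a", 6), (3, "b", 9)],
)

# Flag every row of an ID that has at least one round-'b' review.
flagged = """
SELECT *,
       MAX(CASE WHEN ReviewRound = 'b' THEN 1 ELSE 0 END)
           OVER (PARTITION BY ID) AS has_b
FROM mytable
"""


def run(where_clause):
    sql = f"SELECT ID, ReviewRound, Score FROM ({flagged}) t WHERE {where_clause} ORDER BY ID"
    return conn.execute(sql).fetchall()


only_no_b = run("has_b = 0")                    # only IDs that never reached round 'b'
wanted = run("has_b = 0 OR ReviewRound = 'b'")  # also keeps the round-'b' rows
print(only_no_b)  # [(2, 'a', 6)]
print(wanted)     # [(1, 'b', 7), (2, 'a', 6), (3, 'b', 9)]
```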
Re-conceptualizing this as keeping only the latest review round for each ID, I would try:
select * from mytable join
(select ID, max(ReviewRound) as ReviewRound from mytable group by ID) as latest
using (ID, ReviewRound)
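Here is a sketch of the "latest round" join, run in SQLite through Python's sqlite3 on the same kind of made-up rows; JOIN ... USING needs a named derived table, called latest here (a hypothetical alias).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID INTEGER, ReviewRound TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "a", 5), (1, "b", 7), (2, "a", 6), (3, "b", 9)],
)

# Keep, for each ID, only the row(s) whose ReviewRound equals that ID's maximum.
# With text labels this relies on 'b' sorting after 'a'.
query = """
SELECT m.ID, m.ReviewRound, m.Score
FROM mytable m
JOIN (
    SELECT ID, MAX(ReviewRound) AS ReviewRound
    FROM mytable
    GROUP BY ID
) AS latest USING (ID, ReviewRound)
ORDER BY m.ID
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'b', 7), (2, 'a', 6), (3, 'b', 9)]
```

This matches the round 'b' filter only while 'a' and 'b' are the only rounds; if a later round such as 'c' existed, the join would keep that round instead.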

How to define unique value in union if two rows are not same data

I'm writing a simple SQL query with UNION. The result is returned correctly, but how can I set a value in a dummy column when the union returns two rows for one employee?
If the result returns two values for one employee, the dummy column should be 'N' for the first value and 'Y' for the second.
If the result returns only one value for the employee, the dummy column should be 'Y'.
How can I achieve that?
This is the query that I'm using
select
dbo.employee,
dbo.starting_date
from
table_1
union
select
dbo.employee,
dbo.hiring_date
from
table_2
With a CTE:
with cte as (
select dbo.employee, dbo.starting_date date from table_1
union all
select dbo.employee, dbo.hiring_date date from table_2
)
select
t.*,
case when exists (
select 1 from cte
where employee = t.employee and date > t.date
) then 'N' else 'Y' end dummycolumn
from cte t
You can use window functions for this:
select t.employee, t.date,
(case when 1 = row_number() over (partition by t.employee order by t.date desc)
then 'Y' else 'N'
end) as dummy
from ((select t1.employee, t1.starting_date as date
from table_1 t1
) union all
(select t2.employee, t2.hiring_date as date
from table_2 t2
)
) t
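A sketch of the window-function version in SQLite via Python's sqlite3, assuming "first" and "second" mean the older and newer date. The tables and rows are hypothetical, the combined column is named event_date instead of date, and dates are stored as ISO strings so they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows.
conn.execute("CREATE TABLE table_1 (employee TEXT, starting_date TEXT)")
conn.execute("CREATE TABLE table_2 (employee TEXT, hiring_date TEXT)")
conn.execute("INSERT INTO table_1 VALUES ('alice', '2020-01-01'), ('bob', '2019-03-15')")
conn.execute("INSERT INTO table_2 VALUES ('alice', '2021-06-01')")

# Number each employee's dates newest-first: the newest gets 'Y', older ones 'N'.
query = """
SELECT employee, event_date,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY employee ORDER BY event_date DESC) = 1
            THEN 'Y' ELSE 'N' END AS dummy
FROM (
    SELECT employee, starting_date AS event_date FROM table_1
    UNION ALL
    SELECT employee, hiring_date AS event_date FROM table_2
) t
ORDER BY employee, event_date
"""
rows = conn.execute(query).fetchall()
print(rows)
# [('alice', '2020-01-01', 'N'), ('alice', '2021-06-01', 'Y'), ('bob', '2019-03-15', 'Y')]
```

An employee with a single row always gets 'Y', because that row is number 1 in its partition.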

SQL Case depending on previous status of record

I have a table containing the status of records, something like this:
ID STATUS TIMESTAMP
1 I 01-01-2016
1 A 01-03-2016
1 P 01-04-2016
2 I 01-01-2016
2 P 01-02-2016
3 P 01-01-2016
I want to take the newest row for each ID, and every P whose ID has at some point had status I should be shown as 'G' instead of 'P'.
When I try to do something like
Select case when ID in (select ID from TABLE where ID = 'I') else ID END as status)
From TABLE
where ID in (select max(ID) from TABLE)
I get an error that this isn't possible using IN when casing.
So my question is, how do I do it then?
Want to end up with:
ID STATUS TIMESTAMP
1 G 01-04-2016
2 G 01-02-2016
3 P 01-01-2016
DBMS is IBM DB2
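Have a derived table which returns each id with its newest timestamp, join with that result, and use a CASE with EXISTS to report a P as G when the id has ever had status I:
select t1.ID,
case when t1.STATUS = 'P'
and exists (select 1 from tablename t3 where t3.id = t1.id and t3.STATUS = 'I')
then 'G' else t1.STATUS end as STATUS,
t1.TIMESTAMP
from tablename t1
join (select id, max(timestamp) as max_timestamp
from tablename
group by id) t2
ON t1.id = t2.id and t1.TIMESTAMP = t2.max_timestamp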
This returns both rows in case of a tie (two rows with the same newest timestamp).
Note that ANSI SQL has TIMESTAMP as a reserved word, so you may need to delimit it as "TIMESTAMP".
You can do this by using a common table expression to find all IDs that have had a status of 'I', and then using an outer join with your table to determine which IDs have had a status of 'I' at some point.
To get the final result (with only the newest record), you can use the ROW_NUMBER() OLAP function and select only the newest record (this is shown in the ranked common table expression below):
with irecs (ID) as (
select distinct
ID
from
TABLE
where
status = 'I'
),
ranked as (
select
rownumber() over (partition by t.ID order by t.timestamp desc) as rn,
t.id,
case when i.id is not null and t.status = 'P' then 'G' else t.status end as status,
t.timestamp
from
TABLE t
left outer join irecs i
on t.id = i.id
)
select
id,
status,
timestamp
from
ranked
where
rn = 1;
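The CTE approach can be tried outside DB2 with Python's sqlite3 (SQLite 3.25+ assumed). In this sketch the table name status_log and column ts are hypothetical stand-ins for TABLE and TIMESTAMP, the timestamps are rewritten as ISO dates so they sort as text, and the DB2-style ROWNUMBER() is written as the standard ROW_NUMBER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_log (id INTEGER, status TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO status_log VALUES (?, ?, ?)",
    [
        (1, "I", "2016-01-01"), (1, "A", "2016-01-03"), (1, "P", "2016-01-04"),
        (2, "I", "2016-01-01"), (2, "P", "2016-01-02"),
        (3, "P", "2016-01-01"),
    ],
)

query = """
WITH irecs AS (            -- ids that have had status 'I' at some point
    SELECT DISTINCT id FROM status_log WHERE status = 'I'
),
ranked AS (                -- newest row per id gets rn = 1
    SELECT ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY t.ts DESC) AS rn,
           t.id,
           CASE WHEN i.id IS NOT NULL AND t.status = 'P' THEN 'G' ELSE t.status END AS status,
           t.ts
    FROM status_log t
    LEFT OUTER JOIN irecs i ON t.id = i.id
)
SELECT id, status, ts FROM ranked WHERE rn = 1 ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'G', '2016-01-04'), (2, 'G', '2016-01-02'), (3, 'P', '2016-01-01')]
```

The output matches the desired result from the question: IDs 1 and 2 end in P after having been I, so they show G; ID 3 was never I and stays P.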
Another solution:
with youtableranked as (
select f1.id,
case when (select count(*) from yourtable f2 where f2.ID=f1.ID and f2."TIMESTAMP"<f1."TIMESTAMP" and f2.STATUS='I')>0 then 'G' else f1.STATUS end as STATUS,
rownumber() over(partition by f1.id order by f1.TIMESTAMP desc, rrn(f1) desc) rang,
f1."TIMESTAMP"
from yourtable f1
)
select * from youtableranked f0
where f0.rang=1
Or try this, with a lateral join:
select distinct f1.id, f4.*
from yourtable f1
inner join lateral
(
select
case when (select count(*) from yourtable f3 where f3.ID=f2.ID and f3."TIMESTAMP"<f2."TIMESTAMP" and f3.STATUS='I')>0 then 'G' else f2.STATUS end as STATUS,
f2."TIMESTAMP"
from yourtable f2 where f2.ID=f1.ID
order by f2."TIMESTAMP" desc, rrn(f2) desc
fetch first 1 row only
) f4 on 1=1
Ordering by rrn(f2) breaks ties between rows with the same latest timestamp.

SQL: Get running row delta for records

Let's say we have this table with columns RowID and Call:
RowID Call DesiredOut
1 A 0
2 A 0
3 B
4 A 1
5 A 0
6 A 0
7 B
8 B
9 A 2
10 A 0
I want to SQL query the last column DesiredOut as follows:
Each time Call is 'A', go back to the previous 'A' and count the number of records between the two 'A' entries.
Example: RowID 4 has 'A' and the nearest predecessor is in RowID 2. Between RowID 2 and RowID 4 we have one Call 'B', so we count 1.
Is there an elegant and performant way to do this with ANSI SQL?
I would approach this by first finding the rowid of the previous "A" value. Then count the number of values in-between.
The following query implements this logic using correlated subqueries:
select t.*,
(case when t.call = 'A'
then (select count(*)
from table t3
where t3.rowid < t.rowid and t3.rowid > prevA
)
end) as InBetweenCount
from (select t.*,
(select max(rowid)
from table t2
where t2.call = 'A' and t2.rowid < t.rowid
) as prevA
from table t
) t;
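The correlated-subquery logic can be checked on the sample data from the question. This sketch assumes Python's sqlite3 and a hypothetical table named calls; RowID is declared INTEGER PRIMARY KEY so it keeps the inserted values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (RowID INTEGER PRIMARY KEY, Call TEXT)")
# Sample sequence from the question: RowID 1..10.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    list(enumerate("AABAAABBAA", start=1)),
)

query = """
SELECT t.RowID, t.Call,
       CASE WHEN t.Call = 'A' THEN (
           -- rows strictly between the previous 'A' and this one
           SELECT COUNT(*)
           FROM calls t3
           WHERE t3.RowID < t.RowID AND t3.RowID > t.prevA
       ) END AS InBetweenCount
FROM (
    SELECT c.RowID, c.Call,
           (SELECT MAX(t2.RowID)          -- previous 'A', NULL for the first one
            FROM calls t2
            WHERE t2.Call = 'A' AND t2.RowID < c.RowID) AS prevA
    FROM calls c
) t
ORDER BY t.RowID
"""
rows = conn.execute(query).fetchall()
print([r[2] for r in rows])  # [0, 0, None, 1, 0, 0, None, None, 2, 0]
```

For the first 'A', prevA is NULL, the comparison t3.RowID > NULL is never true, and COUNT(*) yields 0, which matches DesiredOut; 'B' rows come back as NULL, matching the blanks in the question.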
If you know that rowid is sequential with no gaps, you can just use subtraction instead of a subquery for the calculation in the outer query.
You could use a query to find the previous Call = A row. Then, you could count the number of rows between that row and the current row:
select RowID
, `Call`
, (
select count(*)
from YourTable t2
where RowID < t1.RowID
and RowID > coalesce(
(
select RowID
from YourTable t3
where `Call` = 'A'
and RowID < t1.RowID
order by
RowID DESC
limit 1
),0)
)
from YourTable t1
Here is another solution using window functions:
with flagged as (
select *,
case
when call = 'A' and lead(call) over (order by rowid) <> 'A' then 'end'
when call = 'A' and lag(call) over (order by rowid) <> 'A' then 'start'
end as change_flag
from calls
)
select t1.rowid,
t1.call,
case
when change_flag = 'start' then rowid - (select max(t2.rowid) from flagged t2 where t2.change_flag = 'end' and t2.rowid < t1.rowid) - 1
when call = 'A' then 0
end as desiredout
from flagged t1
order by rowid;
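A different window-function formulation from the start/end flags (swapped in here, not the answer's own query) carries the position of the previous 'A' forward with a running MAX over the preceding rows. This is a sketch assuming Python's sqlite3 with SQLite 3.25+ and the same hypothetical calls table; ROW_NUMBER() supplies a gapless sequence in case RowID has holes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (RowID INTEGER PRIMARY KEY, Call TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    list(enumerate("AABAAABBAA", start=1)),
)

query = """
WITH numbered AS (          -- gapless sequence, in case RowID has holes
    SELECT RowID, Call, ROW_NUMBER() OVER (ORDER BY RowID) AS rn
    FROM calls
),
prev AS (                   -- position of the nearest earlier 'A', if any
    SELECT RowID, Call, rn,
           MAX(CASE WHEN Call = 'A' THEN rn END) OVER (
               ORDER BY rn ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS prev_a_rn
    FROM numbered
)
SELECT RowID, Call,
       CASE WHEN Call = 'A' THEN COALESCE(rn - prev_a_rn - 1, 0) END AS DesiredOut
FROM prev
ORDER BY RowID
"""
rows = conn.execute(query).fetchall()
print([r[2] for r in rows])  # [0, 0, None, 1, 0, 0, None, None, 2, 0]
```

Each row is visited once by the window, so this avoids per-row correlated subqueries; whether it is actually faster depends on the engine and indexes.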
The CTE first marks the start and end of each "A"-Block and the final select then uses these markers to get the difference between the start of one block and the end of the previous one.
If the rowid is not gapless, you can easily add a gapless rownumber inside the CTE to calculate the difference.
I'm not sure about the performance though. I wouldn't be surprised if Gordon's answer is faster.
SQLFiddle example: http://sqlfiddle.com/#!15/e1840/1
Believe it or not, this will be pretty fast if the two columns are indexed.
select r1.RowID, r1.CallID, isnull( R1.RowID - R2.RowID - 1, 0 ) as DesiredOut
from RollCall R1
left join RollCall R2
on R2.RowID =(
select max( RowID )
from RollCall
where RowID < R1.RowID
and CallID = 'A')
and R1.CallID = 'A';
You could do something like this:
SELECT a.rowid - b.rowid
FROM table as a,
(SELECT rowid FROM table where rowid < a.rowid order by rowid) as b
WHERE <something>
ORDER BY a.rowid
Since I can't tell which DBMS you are using, this is more of a pseudo-code sketch that you would need to adapt to your system.

PostgreSQL Selecting Most Recent Entry for a Given ID

The table essentially looks like:
Serial-ID, ID, Date, Data, Data, Data, etc.
There can be multiple rows for the same ID. I'd like to create a view of this table, to be used in reports, that only shows the most recent entry for each ID. It should show all of the columns.
Can someone help me with the SQL SELECT? Thanks.
There are about five different ways to do this, but here's one:
SELECT *
FROM yourTable AS T1
WHERE NOT EXISTS(
SELECT *
FROM yourTable AS T2
WHERE T2.ID = T1.ID AND T2.Date > T1.Date
)
And here's another:
SELECT T1.*
FROM yourTable AS T1
LEFT JOIN yourTable AS T2 ON
(
T2.ID = T1.ID
AND T2.Date > T1.Date
)
WHERE T2.ID IS NULL
One more:
WITH T AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Date DESC) AS rn
FROM yourTable
)
SELECT * FROM T WHERE rn = 1
OK, I'm getting carried away; here's the last one I'll post (for now):
WITH T AS (
SELECT ID, MAX(Date) AS latest_date
FROM yourTable
GROUP BY ID
)
SELECT yourTable.*
FROM yourTable
JOIN T ON T.ID = yourTable.ID AND T.latest_date = yourTable.Date
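The four queries can be compared on a small made-up table using Python's sqlite3 (SQLite 3.25+ assumed; serial_id stands in for Serial-ID). They agree when each ID's latest Date is unique, but on a tie the NOT EXISTS, LEFT JOIN and MAX-join versions return every tied row, while ROW_NUMBER() picks just one (arbitrarily, unless a tie-breaker is added to its ORDER BY).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; ID 3 has two entries on the same date (a tie).
conn.execute("CREATE TABLE yourTable (serial_id INTEGER, ID INTEGER, Date TEXT, Data TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2024-01-01", "x"), (2, 1, "2024-02-01", "y"),
        (3, 2, "2024-01-15", "z"),
        (4, 3, "2024-03-01", "p"), (5, 3, "2024-03-01", "q"),
    ],
)

queries = {
    "not_exists": """
        SELECT T1.* FROM yourTable T1
        WHERE NOT EXISTS (SELECT 1 FROM yourTable T2
                          WHERE T2.ID = T1.ID AND T2.Date > T1.Date)
        ORDER BY T1.ID, T1.serial_id""",
    "left_join": """
        SELECT T1.* FROM yourTable T1
        LEFT JOIN yourTable T2 ON T2.ID = T1.ID AND T2.Date > T1.Date
        WHERE T2.ID IS NULL
        ORDER BY T1.ID, T1.serial_id""",
    "row_number": """
        WITH T AS (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date DESC) AS rn
            FROM yourTable)
        SELECT serial_id, ID, Date, Data FROM T WHERE rn = 1
        ORDER BY ID, serial_id""",
    "max_join": """
        WITH T AS (SELECT ID, MAX(Date) AS latest_date FROM yourTable GROUP BY ID)
        SELECT yourTable.* FROM yourTable
        JOIN T ON T.ID = yourTable.ID AND T.latest_date = yourTable.Date
        ORDER BY yourTable.ID, yourTable.serial_id""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, [r[0] for r in rows])  # serial_id of each row returned
```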
I would use DISTINCT ON:
CREATE VIEW your_view AS
SELECT DISTINCT ON (id) *
FROM your_table a
ORDER BY id, date DESC;
This works because DISTINCT ON keeps only the first row of each group of rows that share the same value of the expression in parentheses. DESC in the ORDER BY puts the newest date first in each group, so that is the row that shows up in the result.
https://www.postgresql.org/docs/10/static/sql-select.html#SQL-DISTINCT
This seems like a good use for correlated subqueries:
CREATE VIEW your_view AS
SELECT *
FROM your_table a
WHERE date = (
SELECT MAX(date)
FROM your_table b
WHERE b.id = a.id
)
Your date column needs to be unique within each id (for example a TIMESTAMP rather than a DATE); otherwise ties return more than one row per id.