Comparing different rows in PostgreSQL for each Id

A few columns in my table look like this:
Id Code date latest
1 T 2014-10-04 0
2 B 2014-10-19 0
2 B 2014-10-26 0
1 S 2014-10-05 0
1 T 2014-10-06 0
1 T 2014-10-08 1
2 P 2014-10-27 1
I am tracking all changes made by each ID. If there is any change, I insert a new row and update the latest column.
What I want is, for each Id, to find the last code where latest is 0. Also, that code should not be equal to the existing code (latest = 1). So for id = 1, the answer cannot be
Id Code
1 T
because for id = 1, T is the existing code (latest = 1).
So ideally my output should look like:
Id Code
1 S
2 B
I think I can get the latest value of code for each id where latest = 0.
But how do I make sure that it is not equal to the existing code value (latest = 1)?

Works in Postgres:
SELECT DISTINCT ON (t0.id)
t0.id, t0.code
FROM tbl t0
LEFT JOIN tbl t1 ON t1.code = t0.code
AND t1.id = t0.id
AND t1.latest = 1
WHERE t0.latest = 0
AND t1.code IS NULL
ORDER BY t0.id, t0.date DESC;
I use the combination of a LEFT JOIN / IS NULL to remove siblings of rows with latest = 1. There are various ways to do this:
Select rows which are not present in other table
Details for DISTINCT ON:
Select first row in each GROUP BY group?
Version with CTE and 2x LEFT JOIN
Since Redshift does not seem to support DISTINCT ON:
WITH cte AS (
SELECT t0.*
FROM tbl t0
LEFT JOIN tbl t1 ON t1.code = t0.code
AND t1.id = t0.id
AND t1.latest = 1
WHERE t0.latest = 0
AND t1.id IS NULL
)
SELECT c0.id, c0.code
FROM cte c0
LEFT JOIN cte c1 ON c1.id = c0.id
AND c1.date > c0.date
WHERE c1.id IS NULL
ORDER BY c0.id;
SQL Fiddle showing both.
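To check the CTE version against the sample data without a Postgres or Redshift instance, here is a minimal sketch using Python's built-in sqlite3 module. The table name tbl and the ISO date strings follow the answer and the question; running it on SQLite rather than Redshift is my assumption for illustration only.

```python
import sqlite3

# In-memory table mirroring the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, code TEXT, date TEXT, latest INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1, "T", "2014-10-04", 0),
        (2, "B", "2014-10-19", 0),
        (2, "B", "2014-10-26", 0),
        (1, "S", "2014-10-05", 0),
        (1, "T", "2014-10-06", 0),
        (1, "T", "2014-10-08", 1),
        (2, "P", "2014-10-27", 1),
    ],
)

query = """
WITH cte AS (
    SELECT t0.*
    FROM tbl t0
    LEFT JOIN tbl t1 ON t1.code = t0.code
                    AND t1.id = t0.id
                    AND t1.latest = 1
    WHERE t0.latest = 0
      AND t1.id IS NULL        -- anti-join: no sibling row with latest = 1
)
SELECT c0.id, c0.code
FROM cte c0
LEFT JOIN cte c1 ON c1.id = c0.id
                AND c1.date > c0.date
WHERE c1.id IS NULL            -- keep only the newest remaining row per id
ORDER BY c0.id
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'S'), (2, 'B')]
```

The first LEFT JOIN / IS NULL drops every historical row whose code matches the current one, and the second keeps only the row with no later date for the same id.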

I think the following does what you want:
select distinct on (t.id) t.id, t.code
from tbl t
where t.latest = 0
and not exists (select 1 from tbl t2 where t2.id = t.id and t2.code = t.code and t2.latest = 1)
order by t.id, t.date desc;

I believe you should keep only the current version in this table and create another table to store previous revisions, with a foreign key to the Id. Your Id does not fulfill the general expectations for a column with such a name, since it is not unique. So, ideally, you would:
create a table Revisions(Id, myTableId, code, date, revision), where Id would be an auto_increment primary key and myTableId would point to the Id of the records (1 and 2 in the example)
migrate the elements into Revisions: insert into Revisions(myTableId, code, date, revision) select Id, code, date, latest from MyTable where latest = 0
number the migrated records: update Revisions r1 set revision = (select count(*) from Revisions r2 where r2.myTableId = r1.myTableId and r2.date < r1.date)
remove the old data from your original table: delete from MyTable where latest = 0
drop your latest column from MyTable
From here, you will always be able to select the penultimate version, or the one before that, and so on, without problems. Note that my code suggestions might have the wrong syntax for PostgreSQL, as I have never used it, but the idea should work there as well.
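As a rough sketch of that migration, here it is run against SQLite through Python's sqlite3 module. The names my_table, revisions and my_table_id are hypothetical spellings of the tables described above, and the final "drop the latest column" step is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER, code TEXT, date TEXT, latest INTEGER);
INSERT INTO my_table VALUES
  (1, 'T', '2014-10-04', 0),
  (2, 'B', '2014-10-19', 0),
  (2, 'B', '2014-10-26', 0),
  (1, 'S', '2014-10-05', 0),
  (1, 'T', '2014-10-06', 0),
  (1, 'T', '2014-10-08', 1),
  (2, 'P', '2014-10-27', 1);

-- Hypothetical history table; my_table_id points back at my_table.id.
CREATE TABLE revisions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  my_table_id INTEGER,
  code TEXT,
  date TEXT,
  revision INTEGER
);

-- Move the historical rows over.
INSERT INTO revisions (my_table_id, code, date, revision)
SELECT id, code, date, latest FROM my_table WHERE latest = 0;

-- Number each revision by how many earlier revisions the same record has.
UPDATE revisions
SET revision = (SELECT COUNT(*)
                FROM revisions r2
                WHERE r2.my_table_id = revisions.my_table_id
                  AND r2.date < revisions.date);

-- Only the current version stays in the main table.
DELETE FROM my_table WHERE latest = 0;
""")

history = conn.execute(
    "SELECT my_table_id, code, revision FROM revisions "
    "ORDER BY my_table_id, revision"
).fetchall()
current = conn.execute("SELECT id, code FROM my_table ORDER BY id").fetchall()
print(history)
print(current)
```

With the history numbered per record, the newest previous version of a record is simply the row with the highest revision for that my_table_id.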

Related

Deleting equal number of records with positive and negative values in a table

I have a table with multiple negative and positive values. I want to delete the records with negative values together with an equal number of records with the matching positive values. I'm not sure how to explain this scenario...
I will give a brief example:
I have a table with 6 records, 2 with a negative value and 4 with a positive value:
Name | number
A | 1
A |-1
A | 1
A |-1
A | 1
A | 1
So here I want to delete an equal number of records with negative and positive values,
so my output should be
Name | Number
A | 1
A | 1
By using Row_number
;WITH CTE AS (
select *,ROW_NUMBER()OVER(PARTITION BY number ORDER BY (SELECT NULL)) -1 RN from Table1 )
Select Name, number from CTE WHERE RN NOT IN (1,0)
The following query assumes that your table has a column called id which is either a primary key or some other means to order your records. Without any order, your question cannot be answered, and in fact the data sample you showed us would have no meaning, since internally records have no order in a SQL database.
WITH cte1 AS (
SELECT t1.id, t1.number, SUM(t2.number) as sum
FROM yourTable t1
INNER JOIN yourTable t2 on t1.id >= t2.id
GROUP BY t1.id, t1.number
),
cte2 AS (
SELECT MAX(id) AS cutoff
FROM cte1
WHERE sum = 0
)
SELECT t.*
FROM yourTable t
WHERE t.id > (SELECT cutoff FROM cte2)
Note that I used the old school way of computing a running sum because you never told us the version of SQL Server which you are using. Hence, I didn't want to make assumptions about what you have available.
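For anyone who wants to try the running-sum idea on the sample data, here is a small sketch using Python's sqlite3 module. The id values 1 to 6 are my assumption (the question has no ordering column), and SQLite stands in for SQL Server here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (id INTEGER PRIMARY KEY, name TEXT, number INTEGER)"
)
# Sample rows from the question, with an assumed id giving them an order.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "A", -1), (3, "A", 1), (4, "A", -1), (5, "A", 1), (6, "A", 1)],
)

query = """
WITH cte1 AS (
    -- Running sum via a self-join: add up every row at or before this one.
    SELECT t1.id, t1.number, SUM(t2.number) AS running_sum
    FROM yourTable t1
    INNER JOIN yourTable t2 ON t1.id >= t2.id
    GROUP BY t1.id, t1.number
),
cte2 AS (
    -- Last position where positives and negatives have cancelled out.
    SELECT MAX(id) AS cutoff
    FROM cte1
    WHERE running_sum = 0
)
SELECT t.id, t.name, t.number
FROM yourTable t
WHERE t.id > (SELECT cutoff FROM cte2)
ORDER BY t.id
"""

remaining = conn.execute(query).fetchall()
print(remaining)  # [(5, 'A', 1), (6, 'A', 1)]
```

The running sums here are 1, 0, 1, 0, 1, 2, so the cutoff is id 4 and only the last two rows survive.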
declare @negvalrecs int = (select COUNT(*) from tab where Number < 0)
delete
from tab
where Number < 0
delete top (@negvalrecs)
from tab
where Number > 0
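The same count-then-delete idea can be tried outside SQL Server. The sketch below uses Python's sqlite3 module; since SQLite has no DELETE TOP, it swaps in a rowid subquery with LIMIT, which is my substitution and not part of the answer. Like the answer, it assumes a single name and values of the same magnitude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (name TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [("A", 1), ("A", -1), ("A", 1), ("A", -1), ("A", 1), ("A", 1)],
)

# Step 1: remember how many negative rows there are.
neg_count = conn.execute("SELECT COUNT(*) FROM tab WHERE number < 0").fetchone()[0]

# Step 2: delete all negative rows.
conn.execute("DELETE FROM tab WHERE number < 0")

# Step 3: delete the same number of positive rows (LIMIT replaces TOP here).
conn.execute(
    "DELETE FROM tab WHERE rowid IN "
    "(SELECT rowid FROM tab WHERE number > 0 LIMIT ?)",
    (neg_count,),
)

remaining = conn.execute("SELECT name, number FROM tab").fetchall()
print(remaining)  # [('A', 1), ('A', 1)]
```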
Thanks for all your inputs!
I have a solution for it. We will be needing row number function for it.
--Providing row number to rows
select *,row_number () over (partition by name,number order by name) R into #1 from Table
--Taking negative values
select * into #2 from #1 where number<0
--Now Deleting those records from the main table by joining this table
delete #1 from #1 a inner join #2 b on a.name=b.name and a.number=b.number and a.r<=b.r
delete #1 from #1 a inner join #2 b on a.name=b.name and a.number=-(b.number) and a.r<=b.r
Hope it helps!
I recently encountered a similar problem and this is how I resolved it.
I also had records in the table where there were no negatives for a given name; the UNION ALL is there to bring back such records.
SELECT t1.name, t1.number
FROM table t1
LEFT OUTER JOIN
(SELECT name, number FROM table where number < 0) t2
ON
t1.name = t2.name and t1.number = t2.number
WHERE t1.number > 0 and t2.number IS NOT NULL
UNION ALL
SELECT t1.name, t1.number
FROM table t1
LEFT OUTER JOIN
(SELECT name, number FROM table where number < 0) t2
ON
t1.name = t2.name
WHERE t1.number > 0 and t2.number IS NULL;
Try this,
delete from table_name
where substring(ltrim(rtrim(number)),1,1)='-'

SQL Server Return Rows Where Field Changed

I have a table with 3 values.
ID AuditDateTime UpdateType
12 12-15-2015 18:09 1
45 12-04-2015 17:41 0
75 12-21-2015 04:26 0
12 12-17-2015 07:43 0
35 12-01-2015 05:36 1
45 12-15-2015 04:35 0
I'm trying to return only records where the UpdateType has changed from AuditDateTime based on the IDs. So in this example, ID 12 changes from the 12-15 entry to the 12-17 entry. I would want that record returned. There will be multiple instances of ID 12, and I need all records returned where an ID's UpdateType has changed from its previous entry. I tried adding a row_number but it didn't insert sequentially because the records are not in the table in order. I've done a ton of searching with no luck. Any help would be greatly appreciated.
By using a CTE it is possible to find the previous record based upon the order of the AuditDateTime
WITH CTEData AS
(SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AuditDateTime) [ROWNUM], *
FROM #tmpTable)
SELECT A.ID, A.AuditDateTime, A.UpdateType
FROM CTEData A INNER JOIN CTEData B
ON (A.ROWNUM - 1) = B.ROWNUM AND
A.ID = B.ID
WHERE A.UpdateType <> B.UpdateType
The Inner Join back onto the CTE will give in one query both the current record (Table Alias A) and previous row (Table Alias B).
This should do what you're trying to do I believe
SELECT
T1.ID,
T1.AuditDateTime,
T1.UpdateType
FROM
dbo.My_Table T1
INNER JOIN dbo.My_Table T2 ON
T2.ID = T1.ID AND
T2.UpdateType <> T1.UpdateType AND
T2.AuditDateTime < T1.AuditDateTime
LEFT OUTER JOIN dbo.My_Table T3 ON
T3.ID = T1.ID AND
T3.AuditDateTime < T1.AuditDateTime AND
T3.AuditDateTime > T2.AuditDateTime
WHERE
T3.ID IS NULL
Alternatively:
SELECT
T1.ID,
T1.AuditDateTime,
T1.UpdateType
FROM
dbo.My_Table T1
INNER JOIN dbo.My_Table T2 ON
T2.ID = T1.ID AND
T2.UpdateType <> T1.UpdateType AND
T2.AuditDateTime < T1.AuditDateTime
WHERE
NOT EXISTS
(
SELECT *
FROM
dbo.My_Table T3
WHERE
T3.ID = T1.ID AND
T3.AuditDateTime < T1.AuditDateTime AND
T3.AuditDateTime > T2.AuditDateTime
)
The basic gist of both queries is that you're looking for rows where an earlier row had a different type and no other rows exist between the two rows (hence, they're sequential). Both queries are logically identical, but might have differing performance.
Also, these queries assume that no two rows will have identical audit times. If that's not the case then you'll need to define what you expect to get when that happens.
You can use the lag() window function to find the previous value for the same ID. Now you can pick only those rows that introduce a change:
select *
from (
select lag(UpdateType) over (
partition by ID
order by AuditDateTime) as prev_updatetype
, *
from YourTable
) sub
where prev_updatetype <> updatetype
Example at SQL Fiddle.
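Here is a quick way to check the lag() approach on the sample rows, as a sketch with Python's sqlite3 module. It assumes an SQLite build with window function support (3.25 or newer), and I have rewritten the timestamps in ISO order so that sorting them as text is chronological.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (ID INTEGER, AuditDateTime TEXT, UpdateType INTEGER)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (12, "2015-12-15 18:09", 1),
        (45, "2015-12-04 17:41", 0),
        (75, "2015-12-21 04:26", 0),
        (12, "2015-12-17 07:43", 0),
        (35, "2015-12-01 05:36", 1),
        (45, "2015-12-15 04:35", 0),
    ],
)

query = """
SELECT ID, AuditDateTime, UpdateType
FROM (
    SELECT ID, AuditDateTime, UpdateType,
           LAG(UpdateType) OVER (
               PARTITION BY ID
               ORDER BY AuditDateTime) AS prev_updatetype
    FROM YourTable
) sub
-- The first row of each ID has a NULL previous value and is filtered out.
WHERE prev_updatetype <> UpdateType
"""

changed = conn.execute(query).fetchall()
print(changed)  # [(12, '2015-12-17 07:43', 0)]
```

Only ID 12 changes type between consecutive entries, so only its second row comes back.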

How can I avoid a sub-query?

This is my table:
ID KEY VALUE
1 alpha 100
2 alpha 500
3 alpha 22
4 beta 60
5 beta 10
I'm trying to retrieve a list of all KEYs with their latest values (where ID is at its maximum):
ID KEY VALUE
3 alpha 22
5 beta 10
In MySQL I'm using this query, which is not efficient:
SELECT temp.* FROM
(SELECT * FROM t ORDER BY id DESC) AS temp
GROUP BY key
Is it possible to avoid a sub-query in this case?
Use an INNER JOIN to join with your max IDs.
SELECT t.*
FROM t
INNER JOIN (
SELECT MAX(ID) AS ID
FROM t
GROUP BY
`key`
) tm ON tm.ID = t.ID
Assuming the ID column is indexed, this is likely as fast as it's going to get.
Here is the MySQL documentation page that discusses this topic.
It presents three distinct options.
The only one that doesn't involve a subquery is:
SELECT t1.id, t1.k, t1.value
FROM t t1
LEFT JOIN t t2 ON t1.k = t2.k AND t1.id < t2.id
WHERE t2.k IS NULL;
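To see the self-join at work on the question's data, here is a minimal sketch using Python's sqlite3 module. The column is called k as in the query above (the question calls it KEY), and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, k TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "alpha", 100), (2, "alpha", 500), (3, "alpha", 22), (4, "beta", 60), (5, "beta", 10)],
)

query = """
SELECT t1.id, t1.k, t1.value
FROM t t1
LEFT JOIN t t2 ON t1.k = t2.k AND t1.id < t2.id
WHERE t2.k IS NULL      -- no row with the same key and a higher id exists
ORDER BY t1.id
"""

latest = conn.execute(query).fetchall()
print(latest)  # [(3, 'alpha', 22), (5, 'beta', 10)]
```

A row survives only when the join finds no partner with the same key and a larger id, which is exactly the row with the maximum id per key.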
There's a page in the manual explaining how to do this.

MYSQL join, return first matching row only from where join condition using OR

I'm having a problem with a particular MySQL query.
I have table1, and table2, table2 is joined onto table1.
Now the problem is that I am joining table2 to table1 with a condition that looks like:
SELECT
table1.*, table2.*
FROM table1
JOIN table2 ON ( table2.table1_id = table1.id
AND ( table2.lang = 'fr'
OR table2.lang = 'eu'
OR table2.lang = 'default') )
I need it to return only 1 row from table2, even though there might exist many table2 rows for the correct table1.id, with many different locales.
I am looking for a way to join only ONE row with a priority of the locales, first check for one where lang = something, then if that doesn't manage to join/return anything, then where lang = somethingelse, and lastly lang = default.
FR, EU can be different for many users, and rows in the database might exist for many different locales.. I need to select the most suitable ones with the correct fallback priority.
I tried doing the query above with a GROUP BY table2.table1_id, and it seemed to work, but I realised that if the best matching (first OR) was entered later in the table (higher primary ID) it would return 2nd or default priority as the grouped by row..
Any tips?
Thank you!
It still doesn't seem to "know" what t1.id is :(
Here follows my entire query, to show that table1 = _product_sku, table2 = _product_sku_data, t1 = ps, t2 = psd
SELECT ps.id, psd.description, psd.lang
FROM _product_sku ps
CROSS JOIN
( SELECT lang, title, description
FROM _product_sku_data
WHERE product_sku_id = ps.id
ORDER BY CASE WHEN lang='$this->profile_language_preference' THEN 0
WHEN lang='$this->browser_language' THEN 1
WHEN lang='default' THEN 2
ELSE 3
END
LIMIT 1
) AS psd
Edit:
This version uses variables to provide some sort of ranking within the available languages. Tables and test-data are not from the original question, but from the query OP provided as an answer.
It produced the expected results when I tried it:
SELECT id, description, lang
FROM
(
SELECT ps.id, psd.description, psd.lang,
CASE
WHEN @id != ps.id THEN @rownum := 1
ELSE @rownum := @rownum + 1
END AS rank,
@id := ps.id
FROM _product_sku ps
JOIN _product_sku_data psd ON ( psd.product_sku_id = ps.id )
JOIN ( SELECT @id:=NULL, @rownum:=0 ) x
ORDER BY id,
CASE WHEN lang='$this->profile_language_preference' THEN 0
WHEN lang='$this->browser_language' THEN 1
WHEN lang='default' THEN 2
ELSE 3
END
) x
WHERE rank = 1;
Old version which did not work, since ps.id is not known in the WHERE clause:
This one should return you the rows of table1 with the "best matching" row of table2 by using LIMIT 1 and ordering languages as defined:
SELECT t1.id, t2.lang, t2.some_column
FROM table1 t1
CROSS JOIN
( SELECT lang, some_column
FROM table2
WHERE table1_id = t1.id
ORDER BY CASE WHEN lang='fr' THEN 0
WHEN lang='eu' THEN 1
WHEN lang='default' THEN 2
ELSE 3
END
LIMIT 1
) t2
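Since the derived table cannot see t1.id, one possible workaround (not taken from the answers above) is to move the ORDER BY ... LIMIT 1 into a correlated subquery in the join condition. The sketch below tries that with Python's sqlite3 module. It assumes table2 has its own primary key id, as the question hints, and the some_column values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY);
CREATE TABLE table2 (
  id INTEGER PRIMARY KEY,
  table1_id INTEGER,
  lang TEXT,
  some_column TEXT
);
INSERT INTO table1 VALUES (1), (2), (3);
-- The preferred 'fr' row is inserted last on purpose (higher primary id).
INSERT INTO table2 (table1_id, lang, some_column) VALUES
  (1, 'default', 'd1'), (1, 'eu', 'e1'), (1, 'fr', 'f1'),
  (2, 'default', 'd2'), (2, 'eu', 'e2'),
  (3, 'default', 'd3'), (3, 'de', 'x3');
""")

query = """
SELECT t1.id, t2.lang, t2.some_column
FROM table1 t1
JOIN table2 t2
  ON t2.id = (SELECT c.id
              FROM table2 c
              WHERE c.table1_id = t1.id
              ORDER BY CASE c.lang
                         WHEN ? THEN 0          -- first choice
                         WHEN ? THEN 1          -- second choice
                         WHEN 'default' THEN 2  -- fallback
                         ELSE 3
                       END
              LIMIT 1)
ORDER BY t1.id
"""

best = conn.execute(query, ("fr", "eu")).fetchall()
print(best)  # [(1, 'fr', 'f1'), (2, 'eu', 'e2'), (3, 'default', 'd3')]
```

The subquery picks one table2 row per table1 row by priority, so insertion order no longer matters.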

Oracle: Check if rows exist in other table

I've got a query joining several tables and returning quite a few columns.
An indexed column of another table references the PK of one of these joined tables. Now I would like to add another column to the query that states if at least one row with that ID exists in the new table.
So if I have one of the old tables
ID
1
2
3
and the new table
REF_ID
1
1
1
3
then I'd like to get
ID REF_EXISTS
1 1
2 0
3 1
I can think of several ways to do that, but what is the most elegant/efficient one?
EDIT
I tested the performance of the queries provided with 50.000 records in the old table, every other record matched by two rows in the new table, so half of the records have REF_EXISTS=1.
I'm adding average results as comments to the answers in case anyone is interested. Thanks everyone!
Another option:
select O.ID
, case when N.ref_id is not null then 1 else 0 end as ref_exists
from old_table o
left outer join (select distinct ref_id from new_table) N
on O.id = N.ref_id
I would:
select distinct ID,
case when exists (select 1 from REF_TABLE where ID_TABLE.ID = REF_TABLE.REF_ID)
then 1 else 0 end
from ID_TABLE
Provided you have indexes on the PK and FK you will get away with a table scan and index lookups.
Use:
SELECT DISTINCT t1.id,
CASE WHEN t2.ref_id IS NULL THEN 0 ELSE 1 END AS REF_EXISTS
FROM TABLE_1 t1
LEFT JOIN TABLE_2 t2 ON t2.ref_id = t1.id
Added DISTINCT to ensure only unique rows are displayed.
A join could return multiple rows for one id, as it does for id=1 in the example data. You can limit it to one row per id with a group by:
SELECT
t1.id
, COUNT(DISTINCT t2.ref_id) as REF_EXISTS
FROM TABLE_1 t1
LEFT JOIN TABLE_2 t2 ON t2.ref_id = t1.id
GROUP BY t1.id
The group by ensures there's only one row per id. And count(distinct t2.ref_id) will be 1 if a row is found and 0 otherwise.
EDIT: You can rewrite it without a group by, but I doubt that will make things easier:
SELECT
t1.id
, CASE WHEN EXISTS (
SELECT * FROM TABLE_2 t2 WHERE t2.ref_id = t1.id)
THEN 1 ELSE 0 END as REF_EXISTS
, ....
FROM TABLE_1 t1
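To compare the EXISTS and GROUP BY variants on the question's sample data, here is a small sketch using Python's sqlite3 module. SQLite stands in for Oracle, and the lowercase table names table_1 and table_2 are just the placeholders from the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER PRIMARY KEY);
CREATE TABLE table_2 (ref_id INTEGER);
INSERT INTO table_1 VALUES (1), (2), (3);
INSERT INTO table_2 VALUES (1), (1), (1), (3);
""")

# Variant 1: correlated EXISTS, one row per id without any join fan-out.
exists_rows = conn.execute("""
SELECT t1.id,
       CASE WHEN EXISTS (SELECT 1 FROM table_2 t2 WHERE t2.ref_id = t1.id)
            THEN 1 ELSE 0 END AS ref_exists
FROM table_1 t1
ORDER BY t1.id
""").fetchall()

# Variant 2: LEFT JOIN collapsed back to one row per id with GROUP BY.
group_rows = conn.execute("""
SELECT t1.id, COUNT(DISTINCT t2.ref_id) AS ref_exists
FROM table_1 t1
LEFT JOIN table_2 t2 ON t2.ref_id = t1.id
GROUP BY t1.id
ORDER BY t1.id
""").fetchall()

print(exists_rows)  # [(1, 1), (2, 0), (3, 1)]
print(group_rows)   # [(1, 1), (2, 0), (3, 1)]
```

Both variants return the same flags here; which one is faster depends on the database and its indexes, so it is worth measuring on real data.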