Compare different orders of the same table - sql

I have the following scenario: a table with these columns:
table_id|user_id|os_number|inclusion_date
In the system, os_number is sequential per user, but due to a system bug some users inserted OSs in the wrong order. Something like this:
table_id | user_id | os_number | inclusion_date
-----------------------------------------------
1 | 1 | 1 | 2015-11-01
2 | 1 | 2 | 2015-11-02
3 | 1 | 3 | 2015-11-01
Note that os_number 3 was inserted before os_number 2.
What I need:
Recover the table_id of rows 2 and 3, which are out of order.
I have these two SELECTs that show me the table_id in two different orders:
select table_id from table order by user_id, os_number
select table_id from table order by user_id, inclusion_date
I can't figure out how to compare these two SELECTs and see which users are affected by this system bug.

Your question is a bit difficult because there is no correct ordering (as presented) -- because dates can have ties. So, use the rank() or dense_rank() function to compare the two values and return the ones that are not in the correct order:
select t.*
from (select t.*,
rank() over (partition by user_id order by inclusion_date) as seqnum_d,
rank() over (partition by user_id order by os_number) as seqnum_o
from t
) t
where seqnum_d <> seqnum_o;
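To check the rank() comparison on the question's sample rows, here is a minimal sketch that runs the query through Python's built-in sqlite3 module against an in-memory SQLite database. SQLite is only a stand-in for the asker's DBMS, and SQLite 3.25 or newer is assumed for window functions. The table name t and the helper out_of_order_ids are illustrative, not from the question.

```python
import sqlite3

def out_of_order_ids(rows):
    """Return table_ids whose rank by inclusion_date differs from their rank by os_number."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (table_id INTEGER, user_id INTEGER,"
                " os_number INTEGER, inclusion_date TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT table_id
        FROM (SELECT t.*,
                     rank() OVER (PARTITION BY user_id ORDER BY inclusion_date) AS seqnum_d,
                     rank() OVER (PARTITION BY user_id ORDER BY os_number) AS seqnum_o
              FROM t) x
        WHERE seqnum_d <> seqnum_o
        ORDER BY table_id
    """
    return [r[0] for r in con.execute(query)]

# The three rows from the question (ISO date strings sort correctly as text)
sample = [(1, 1, 1, "2015-11-01"),
          (2, 1, 2, "2015-11-02"),
          (3, 1, 3, "2015-11-01")]
print(out_of_order_ids(sample))  # [2, 3]

# Two correctly ordered OSs on the same day still get different ranks
print(out_of_order_ids([(1, 1, 1, "2015-11-01"),
                        (2, 1, 2, "2015-11-01")]))  # [2]
```

Because of the tie on 2015-11-01, rank() gives rows 1 and 3 the same date rank. That is why row 3 (date rank 1, os rank 3) is flagged. The second call shows a side effect of the same tie handling: a correctly ordered pair of OSs entered on one day is also reported.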

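Use row_number() over both orders, numbering per user_id and using os_number as a tiebreaker for rows with the same date:
select *
from (
select *,
row_number() over (partition by user_id order by os_number) rnn,
row_number() over (partition by user_id order by inclusion_date, os_number) rnd
from a_table
) s
where rnn <> rnd;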
table_id | user_id | os_number | inclusion_date | rnn | rnd
----------+---------+-----------+----------------+-----+-----
3 | 1 | 3 | 2015-11-01 | 3 | 2
2 | 1 | 2 | 2015-11-02 | 2 | 3
(2 rows)
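This minimal sketch runs the row_number() comparison with PARTITION BY user_id in an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25+ assumed). It adds a hypothetical second user whose OSs are in the right order. Only user 1's rows should come back.

```python
import sqlite3

QUERY = """
    SELECT table_id
    FROM (SELECT a_table.*,
                 row_number() OVER (PARTITION BY user_id ORDER BY os_number) AS rnn,
                 row_number() OVER (PARTITION BY user_id
                                    ORDER BY inclusion_date, os_number) AS rnd
          FROM a_table) s
    WHERE rnn <> rnd
    ORDER BY table_id
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a_table (table_id INTEGER, user_id INTEGER,"
            " os_number INTEGER, inclusion_date TEXT)")
con.executemany("INSERT INTO a_table VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "2015-11-01"),
    (2, 1, 2, "2015-11-02"),
    (3, 1, 3, "2015-11-01"),
    # hypothetical second user whose OSs are in the right order
    (4, 2, 1, "2015-10-01"),
    (5, 2, 2, "2015-10-05"),
])
flagged = [r[0] for r in con.execute(QUERY)]
print(flagged)  # [2, 3]
```

Without the partition, both users would share one numbering sequence. User 2's earlier dates would then shift user 1's date numbers, so even row 1 (which is in order) would be flagged.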

Not entirely sure about the performance of this, but you could use CROSS APPLY (SQL Server syntax; the PostgreSQL equivalent is CROSS JOIN LATERAL) on the same table to get the results in one query. This returns the pairs of table_ids that are out of order, compared within each user:
select
a.table_id as InsertedAfterTableId,
c.table_id as InsertedBeforeTableId
from table a
cross apply
(
select b.table_id
from table b
where b.user_id = a.user_id
and b.inclusion_date < a.inclusion_date
and b.os_number > a.os_number
) c

Both query examples below simply check for a mismatch between inclusion_date and os_number.
This first query should return the offending row (the one whose os_number does not match its inclusion date). In the example, that is row 3.
select table.table_id, table.user_id, table.os_number from table
where EXISTS(select * from table t
where t.user_id = table.user_id and
t.inclusion_date > table.inclusion_date and
t.os_number < table.os_number);
This second query returns both table_ids and the user_id for each pair of mismatched rows:
select first_table.table_id, second_table.table_id, first_table.user_id from
table first_table
JOIN table second_table
ON (first_table.user_id = second_table.user_id and
first_table.inclusion_date > second_table.inclusion_date and
first_table.os_number < second_table.os_number);
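To see what each of the two queries returns for the sample data, here is a minimal sketch that runs both in an in-memory SQLite database through Python's sqlite3 module. Because table is a reserved word, the sketch names the table os_table (a hypothetical name) and uses short aliases.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# "table" is a reserved word, so this sketch uses the name os_table instead
con.execute("CREATE TABLE os_table (table_id INTEGER, user_id INTEGER,"
            " os_number INTEGER, inclusion_date TEXT)")
con.executemany("INSERT INTO os_table VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "2015-11-01"),
    (2, 1, 2, "2015-11-02"),
    (3, 1, 3, "2015-11-01"),
])

# First query: rows for which a later-dated row of the same user has a smaller os_number
offending = con.execute("""
    SELECT o.table_id, o.user_id, o.os_number
    FROM os_table o
    WHERE EXISTS (SELECT 1 FROM os_table t
                  WHERE t.user_id = o.user_id
                    AND t.inclusion_date > o.inclusion_date
                    AND t.os_number < o.os_number)
""").fetchall()

# Second query: each mismatched pair as (later-dated row, earlier-dated row, user)
pairs = con.execute("""
    SELECT f.table_id, s.table_id, f.user_id
    FROM os_table f
    JOIN os_table s
      ON f.user_id = s.user_id
     AND f.inclusion_date > s.inclusion_date
     AND f.os_number < s.os_number
""").fetchall()

print(offending)  # [(3, 1, 3)]
print(pairs)      # [(2, 3, 1)]
```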

I would use window functions to number the rows in each of the two orders and then compare the numbers:
SELECT
sub.table_id,
sub.user_id,
sub.os_number,
sub.inclusion_date,
number_order_1, number_order_2
FROM (
SELECT
table_id,
user_id,
os_number,
inclusion_date,
row_number() OVER (PARTITION BY user_id
ORDER BY os_number
) AS number_order_1,
row_number() OVER (PARTITION BY user_id
ORDER BY inclusion_date
) AS number_order_2
FROM
table
) sub
WHERE
number_order_1 <> number_order_2
;
EDIT:
Because a_horse_with_no_name made a good point about my final answer, I've gone back to my first answer (see the edit history), which also works if os_number isn't gapless.

select *
from (
select a_table.*,
lag(inclusion_date) over (partition by user_id order by os_number) as last_date
from a_table
) result
where last_date is not null AND last_date>inclusion_date;
This should cover gaps as well as ties. Basically, I simply check the inclusion_date of the previous os_number and make sure it's not strictly greater than the current date (so two versions on the same date are fine).
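To back up the claim about gaps and ties, here is a minimal sketch that runs the lag() query in an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25+ assumed). The hypothetical user 2 has gaps in os_number and two OSs on the same day, and should produce no rows. The subquery alias sub is my own.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a_table (table_id INTEGER, user_id INTEGER,"
            " os_number INTEGER, inclusion_date TEXT)")
con.executemany("INSERT INTO a_table VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "2015-11-01"),
    (2, 1, 2, "2015-11-02"),
    (3, 1, 3, "2015-11-01"),  # dated before os_number 2, so it should be flagged
    # hypothetical user 2: gaps in os_number and two OSs on the same day
    (4, 2, 1, "2015-10-01"),
    (5, 2, 4, "2015-10-01"),
    (6, 2, 7, "2015-10-03"),
])

# Compare each row's date with the date of the previous os_number of the same user
flagged = [r[0] for r in con.execute("""
    SELECT table_id
    FROM (SELECT a_table.*,
                 lag(inclusion_date) OVER (PARTITION BY user_id
                                           ORDER BY os_number) AS last_date
          FROM a_table) sub
    WHERE last_date IS NOT NULL AND last_date > inclusion_date
""")]
print(flagged)  # [3]
```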

Related

Finding SQL duplicates - two methods different results

I have a table in which duplicates may appear. A row is considered a duplicate when:
sector_id, department_id and number_id are the same (these are foreign keys to other tables, in case that matters)
and valid_to is null
I did this with two queries:
1.
select count(*) from(
select sector_id, departament_id,numer_id, count(*) from tables.workspace
where valid_to is null
group by 1,2,3
having count(*) >1 ) as r
--results : 650
2.
with duplicate_rows as
(
select *, count(id) over (partition by sector_id, departament_id, numer_id) duplicate_count from tables.workspace where valid_to is null
)
select count(*) from
(
select * from duplicate_rows where duplicate_count >1
) as t
--results : 3655
Please explain what I'm doing wrong: why do these two queries return different values, and which of them is correct?
Your second query is the wrong one.
You're using a window function and selecting every row in your CTE. Each duplicated record is therefore kept as its own row and carries the total COUNT for its combination of PARTITION BY fields.
For example, if there are 3 records with sector_id = 'A', departament_id = 'RED', numer_id = 1, your CTE will look like this:
sector_id | departament_id | numer_id | duplicate_count
------------+----------------+----------+-----------------
A | RED | 1 | 3
A | RED | 1 | 3
A | RED | 1 | 3
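This means that your second query counts 3 rows for this group instead of 1.
Try adding DISTINCT to the query that selects from the CTE, and select only the partition columns. A plain DISTINCT * would still keep every row, because the rows differ in id and other columns. It should then give you the same result as your first query:
select distinct sector_id, departament_id, numer_id from duplicate_rows where duplicate_count >1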
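To see why the two counts differ, here is a minimal sketch that recreates the situation with made-up rows in an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25+ assumed). The tables. schema prefix is dropped, and the id column is assumed to be a unique primary key. It also shows that SELECT DISTINCT * does not collapse the rows, because each row keeps its own id.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE workspace (id INTEGER PRIMARY KEY, sector_id INTEGER,"
            " departament_id INTEGER, numer_id INTEGER, valid_to TEXT)")
con.executemany(
    "INSERT INTO workspace (sector_id, departament_id, numer_id, valid_to)"
    " VALUES (?, ?, ?, ?)",
    [(1, 1, 1, None), (1, 1, 1, None), (1, 1, 1, None),  # a group of 3 open rows
     (2, 2, 2, None), (2, 2, 2, None),                   # a group of 2 open rows
     (3, 3, 3, None),                                     # unique, not a duplicate
     (2, 2, 2, "2020-01-01")])                            # closed row, filtered out

# Query 1: one row per duplicated group, then count the groups
grouped = con.execute("""
    SELECT count(*) FROM (
        SELECT sector_id, departament_id, numer_id, count(*)
        FROM workspace WHERE valid_to IS NULL
        GROUP BY 1, 2, 3 HAVING count(*) > 1) r
""").fetchone()[0]

# Query 2 template: the CTE keeps every row and attaches its group's size
WINDOWED = """
    WITH duplicate_rows AS (
        SELECT *, count(id) OVER (PARTITION BY sector_id, departament_id, numer_id)
                  AS duplicate_count
        FROM workspace WHERE valid_to IS NULL)
    SELECT count(*) FROM ({inner}) t
"""

def count_windowed(inner):
    return con.execute(WINDOWED.format(inner=inner)).fetchone()[0]

rows_counted = count_windowed(
    "SELECT * FROM duplicate_rows WHERE duplicate_count > 1")
distinct_star = count_windowed(
    "SELECT DISTINCT * FROM duplicate_rows WHERE duplicate_count > 1")
groups_counted = count_windowed(
    "SELECT DISTINCT sector_id, departament_id, numer_id"
    " FROM duplicate_rows WHERE duplicate_count > 1")

print(grouped, rows_counted, distinct_star, groups_counted)  # 2 5 5 2
```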

Is there a way to calculate average based on distinct rows without using a subquery?

If I have data like so:
+----+-------+
| id | value |
+----+-------+
| 1 | 10 |
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
| 2 | 20 |
+----+-------+
How do I calculate the average based on the distinct id WITHOUT using a subquery (i.e. querying the table directly)?
For the above example it would be (10+20+30)/3 = 20
I tried to do the following:
SELECT AVG(IF(id = LAG(id) OVER (ORDER BY id), NULL, value)) AS avg
FROM table
Basically, my idea was to order by id and check whether the previous row has the same id. If it does, the value becomes NULL and is not counted in the average. Unfortunately, I can't put analytic functions inside aggregate functions.
As far as I know, you can't do this without a subquery. I would use:
SELECT AVG(avg_value)
FROM
(
SELECT AVG(value) AS avg_value
FROM yourTable
GROUP BY id
) t;
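Here is a minimal sketch that checks this query on the question's data, run in an in-memory SQLite database through Python's sqlite3 module. The table name yourTable is the placeholder from the answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourTable (id INTEGER, value INTEGER)")
con.executemany("INSERT INTO yourTable VALUES (?, ?)",
                [(1, 10), (1, 10), (2, 20), (3, 30), (2, 20)])

# Inner query collapses each id to one value; outer query averages those values
result = con.execute("""
    SELECT AVG(avg_value)
    FROM (SELECT AVG(value) AS avg_value FROM yourTable GROUP BY id) t
""").fetchone()[0]
print(result)  # 20.0
```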
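In a dialect that supports QUALIFY (for example Snowflake or Teradata), you can filter on the row number directly:
WITH first_rows AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rn
FROM yourTable
QUALIFY rn = 1
)
SELECT
AVG(value)
FROM first_rows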
Regarding the follow-up comment "The outer query will have other parameters that need to access all the data in the table":
I interpret this comment as wanting an average on every row, rather than doing an aggregation. If so, you can use window functions:
select t.*,
avg(case when seqnum = 1 then value end) over () as overall_avg
from (select t.*,
row_number() over (partition by id order by id) as seqnum
from t
) t;
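Here is a minimal sketch of the windowed version on the question's data, run in an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25+ assumed). The outer alias is renamed from t to s so that it does not shadow the table name. All five rows come back, each carrying the same overall average.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, value INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(1, 10), (1, 10), (2, 20), (3, 30), (2, 20)])

# Every row is returned; only the first row per id feeds the windowed average
rows = con.execute("""
    SELECT s.*,
           avg(CASE WHEN seqnum = 1 THEN value END) OVER () AS overall_avg
    FROM (SELECT t.*,
                 row_number() OVER (PARTITION BY id ORDER BY id) AS seqnum
          FROM t) s
""").fetchall()

for row in rows:
    print(row)  # (id, value, seqnum, overall_avg); overall_avg is 20.0 on each row
```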
If each id always has the same value, and no two ids share a value, there is a simpler way.
Simply use DISTINCT inside the AVG function, as below:
select avg(distinct value) from tab;
http://sqlfiddle.com/#!4/9d156/2/0
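AVG(DISTINCT value) deduplicates values, not ids. This minimal sketch (in-memory SQLite via Python's sqlite3 module) compares it with an average taken over one value per id. It uses the question's data and then a hypothetical data set in which two ids share the value 30. The helper both_averages is illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (id INTEGER, value INTEGER)")

def both_averages(rows):
    """Return (AVG(DISTINCT value), average over one value per id)."""
    con.execute("DELETE FROM tab")
    con.executemany("INSERT INTO tab VALUES (?, ?)", rows)
    by_value = con.execute("SELECT avg(DISTINCT value) FROM tab").fetchone()[0]
    # assumes value is constant per id, so min(value) picks that value
    by_id = con.execute("SELECT avg(v) FROM (SELECT id, min(value) AS v"
                        " FROM tab GROUP BY id) x").fetchone()[0]
    return by_value, by_id

typical = both_averages([(1, 10), (1, 10), (2, 20), (3, 30), (2, 20)])
shared = both_averages([(1, 10), (2, 20), (3, 30), (4, 30)])  # ids 3 and 4 share 30
print(typical)  # (20.0, 20.0)
print(shared)   # (20.0, 22.5)
```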

How to select the last record of each ID

I need to extract the last record of each user from the table. The table schema looks like this:
mytable
product | user_id |
-------------------
A | 15 |
B | 15 |
A | 16 |
C | 16 |
-------------------
The output I want to get is
product | user_id |
-------------------
B | 15 |
C | 16 |
Basically, the last record of each user.
Thanks in advance!
You can use the window function ROW_NUMBER. Here is a solution; I have also made a demo query in DB-Fiddle for you (see the link Demo Code in DB-Fiddle).
WITH CTE AS
(SELECT product, user_id,
ROW_NUMBER() OVER(PARTITION BY user_id order by product desc)
as RN
FROM Mytable)
SELECT product, user_id FROM CTE WHERE RN=1 ;
You can try using row_number():
select product, user_id
from
(
select product, user_id, row_number() over (partition by user_id order by product desc) as rn
from tablename
) A where rn = 1
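Here is a minimal sketch of the ROW_NUMBER approach on the question's data, run in an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25+ assumed). An ORDER BY user_id is added so that the output order is deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (product TEXT, user_id INTEGER)")
con.executemany("INSERT INTO mytable VALUES (?, ?)",
                [("A", 15), ("B", 15), ("A", 16), ("C", 16)])

# Number each user's rows by product descending and keep the first one
latest = con.execute("""
    WITH cte AS (
        SELECT product, user_id,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY product DESC) AS rn
        FROM mytable)
    SELECT product, user_id FROM cte WHERE rn = 1 ORDER BY user_id
""").fetchall()
print(latest)  # [('B', 15), ('C', 16)]
```

Ordering by product DESC picks the alphabetically greatest product. This matches the expected output only because each user's sample rows happen to be in alphabetical order.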
There is no such thing as a "last" record unless you have a column that specifies the ordering. SQL tables represent unordered sets (well technically, multisets).
If you have such a column, then use distinct on:
select distinct on (user_id) t.*
from t
order by user_id, <ordering col> desc;
Distinct on is a very handy Postgres extension that returns one row per "group". It is the first row based on the ordering specified in the order by clause.
You should have a column that stores the insertion order, either an auto-increment value or a value with date and time.
Ex:
autoIncrement | product | user_id
---------------------------------
1 | A | 15
2 | B | 15
3 | A | 16
4 | C | 16
SELECT product, user_id FROM table inner join
( SELECT MAX(autoIncrement) as id FROM table group by user_id ) as table_Aux
ON table.autoIncrement = table_Aux.id

How can I select only the id of the min created date in each group [duplicate]

This question already has answers here:
Select first row in each GROUP BY group?
Imagine the following tables:
Ticket Table
========================
| id | question |
========================
| 1 | Can u help me :)? |
========================
UserEntry Table
======================================================
| id | answer | dateCreated | ticket_id |
======================================================
| 2 | It's my plessure :)? | 2016-08-05 | 1 |
=======================================================
| 3 | How can i help u ? | 2016-08-06 | 1 |
======================================================
So how can I get only the id of the row in each group that has the minimum date value?
My expected result is:
====
| id |
====
| 2 |
====
UPDATE:
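I found a solution with the following query:
SELECT id FROM UserEntry WHERE datecreated IN (SELECT MIN(datecreated) FROM UserEntry GROUP BY ticket_id)
Improved answer (also matching on ticket_id, so that one ticket's minimum date cannot match rows of another ticket):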
SELECT id FROM UserEntry WHERE (ticket_id, datecreated) IN
(SELECT ticket_id, MIN(datecreated) FROM UserEntry GROUP BY ticket_id);
This is also a good, correct answer (NOTE: DISTINCT ON is not part of the SQL standard.)
SELECT DISTINCT ON (ue.ticket_id) ue.id
FROM UserEntry ue
ORDER BY ue.ticket_id, ue.datecreated
It seems you want to select the ID with the minimum datecreated. That is simple: select the minimum date and then select the id(s) matching this date.
SELECT id FROM UserEntry WHERE datecreated = (SELECT MIN(datecreated) FROM UserEntry);
If you are sure you won't have ties, or if you are fine with just one row anyway, you can also use FETCH FIRST ROW ONLY. Unfortunately, PostgreSQL versions before 13 have no WITH TIES option for it.
SELECT id FROM UserEntry ORDER BY datecreated FETCH FIRST ROW ONLY;
UPDATE: You want the entry ID for the minimum date per ticket. Per ticket translates to GROUP BY ticket_id in SQL.
SELECT ticket_id, id FROM UserEntry WHERE (ticket_id, datecreated) IN
(SELECT ticket_id, MIN(datecreated) FROM UserEntry GROUP BY ticket_id);
The same can be achieved with window functions where you read the table only once:
SELECT ticket_id, id
FROM
(
SELECT ticket_id, id, RANK() OVER (PARTITION BY ticket_id ORDER BY datecreated) AS rnk
FROM UserEntry
) ranked
WHERE rnk = 1;
(Change SELECT ticket_id, id to SELECT id if you want the queries not to show the ticket ID, which would make the results harder to understand of course :-)
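To compare the per-ticket approaches, here is a minimal sketch that runs them in an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25+ assumed for RANK(); row values in IN need 3.15+). It adds a hypothetical second ticket whose earliest answer is dated 2016-08-06. That date shows why the single-column IN query from the question's UPDATE also picks up id 3.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE UserEntry (id INTEGER, answer TEXT,"
            " datecreated TEXT, ticket_id INTEGER)")
con.executemany("INSERT INTO UserEntry VALUES (?, ?, ?, ?)", [
    (2, "It's my pleasure", "2016-08-05", 1),
    (3, "How can I help you?", "2016-08-06", 1),
    # hypothetical second ticket whose first answer is dated 2016-08-06
    (4, "first answer", "2016-08-06", 2),
    (5, "second answer", "2016-08-07", 2),
])

# Single-column IN: any ticket's minimum date matches rows of every ticket
single_column = [r[0] for r in con.execute("""
    SELECT id FROM UserEntry
    WHERE datecreated IN (SELECT MIN(datecreated) FROM UserEntry GROUP BY ticket_id)
    ORDER BY id
""")]

# Row-value IN: the date must be the minimum of the row's own ticket
row_values = con.execute("""
    SELECT ticket_id, id FROM UserEntry
    WHERE (ticket_id, datecreated) IN
          (SELECT ticket_id, MIN(datecreated) FROM UserEntry GROUP BY ticket_id)
    ORDER BY ticket_id
""").fetchall()

# Window function: rank each ticket's entries by date and keep rank 1
ranked = con.execute("""
    SELECT ticket_id, id
    FROM (SELECT ticket_id, id,
                 RANK() OVER (PARTITION BY ticket_id ORDER BY datecreated) AS rnk
          FROM UserEntry) ranked
    WHERE rnk = 1
    ORDER BY ticket_id
""").fetchall()

print(single_column)  # [2, 3, 4]
print(row_values)     # [(1, 2), (2, 4)]
print(ranked)         # [(1, 2), (2, 4)]
```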
You may want FETCH FIRST ROW ONLY, or DISTINCT ON if you care about more than one ticket:
SELECT DISTINCT ON (ue.ticket_id) ue.id
FROM UserEntry ue
ORDER BY ue.ticket_id, ue.datecreated
This gets the id of the row with the minimum datecreated value for each ticket.
A solution with ANSI SQL that works on a wide range of DBMS that support modern SQL is to use window functions:
select id
from (
select id, row_number() over (partition by ticket_id order by datecreated) as rn
from userentry
) t
where rn = 1;
Note that in Postgres, Gordon's solution using distinct on () is usually faster than using window functions.

Group by groups of 3 by column

Say I have a table that looks like this:
| id | category_id | created_at |
| 1 | 3 | date... |
| 2 | 4 | date... |
| 3 | 1 | date... |
| 4 | 2 | date... |
| 5 | 5 | date... |
| 6 | 6 | date... |
And imagine there are a lot more entries. I'd like to grab these in a way that they are fresh, so ordering them by created_at DESC - but I'd also like to group them by category, in groups of 3!
So in pseudocode it looks something like this:
Go to category 1
-> Pick last 3
Go to category 2
-> Pick last 3
Go to category 3
-> Pick last 3
And so forth, starting over from category_id 1 when there's no other category to grab from. This will then be paginated as well so I need to make it work with offset & limit as well somehow.
I'm not at all sure where to start or what the keywords to google for are. I'd be happy with some nudges in the right direction so I can find the answer myself, or with a full answer.
Another case for the window function row_number().
Just the latest 3 rows per category
SELECT id, category_id, created_at
FROM (
SELECT id, category_id, created_at
, row_number() OVER (PARTITION BY category_id
ORDER BY created_at DESC) AS rn
FROM tbl
) sub
WHERE rn < 4
ORDER BY category_id, rn;
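Here is a minimal sketch of this query on hypothetical data (four rows in category 1, two in category 2), run in an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25+ assumed). The oldest row of category 1 is dropped because it gets rn = 4.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (id INTEGER, category_id INTEGER, created_at TEXT)")
# hypothetical data: 4 rows in category 1, 2 rows in category 2
con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, 1, "2020-01-01"), (2, 1, "2020-01-02"), (3, 1, "2020-01-03"),
    (4, 1, "2020-01-04"), (5, 2, "2020-01-01"), (6, 2, "2020-01-05"),
])

# Number rows newest-first within each category and keep the first three
latest3 = [r[0] for r in con.execute("""
    SELECT id, category_id, created_at
    FROM (SELECT id, category_id, created_at,
                 row_number() OVER (PARTITION BY category_id
                                    ORDER BY created_at DESC) AS rn
          FROM tbl) sub
    WHERE rn < 4
    ORDER BY category_id, rn
""")]
print(latest3)  # [4, 3, 2, 6, 5]
```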
The latest 3 rows per category, plus later rows
If you want to append the rest of the rows (your question is vague about whether and how):
SELECT *
FROM (
SELECT id, category_id, created_at
, row_number() OVER (PARTITION BY category_id
ORDER BY created_at DESC) AS rn
FROM tbl
) sub
ORDER BY (rn > 3), category_id, rn;
One can sort by the outcome of a boolean expression (rn > 3). In ascending order, the values sort as:
FALSE (0)
TRUE (1)
NULL (because default is NULLS LAST - not applicable here)
This way, the latest 3 rows per category come first and all the rest later.
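Here is a minimal sketch of the boolean sort, using the same hypothetical data as a top-3 example and an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25+ assumed). SQLite has no separate boolean type: rn > 3 evaluates to 0 or 1, which sorts the same way as FALSE and TRUE do in PostgreSQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (id INTEGER, category_id INTEGER, created_at TEXT)")
# hypothetical data: 4 rows in category 1, 2 rows in category 2
con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, 1, "2020-01-01"), (2, 1, "2020-01-02"), (3, 1, "2020-01-03"),
    (4, 1, "2020-01-04"), (5, 2, "2020-01-01"), (6, 2, "2020-01-05"),
])

# (rn > 3) is 0 for each category's latest three rows and 1 for the rest
ordered = [r[0] for r in con.execute("""
    SELECT id
    FROM (SELECT id, category_id, created_at,
                 row_number() OVER (PARTITION BY category_id
                                    ORDER BY created_at DESC) AS rn
          FROM tbl) sub
    ORDER BY (rn > 3), category_id, rn
""")]
print(ordered)  # [4, 3, 2, 6, 5, 1]
```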
Or use a CTE and UNION ALL:
WITH cte AS (
SELECT id, category_id, created_at
, row_number() OVER (PARTITION BY category_id
ORDER BY created_at DESC) AS rn
FROM tbl
)
(
SELECT id, category_id, created_at
FROM cte
WHERE rn < 4
ORDER BY category_id, rn
)
UNION ALL
(
SELECT id, category_id, created_at
FROM cte
WHERE rn >= 4
ORDER BY category_id, rn
);
Same result.
All the parentheses are required to attach ORDER BY to the individual legs of a UNION query.