How to compare two rows in PostgreSQL?

I have a table tab that contains:
item  identifier  quantity  methodid
10    1           20        2
10    1           30        3
11    1           10        3
11    1           12.5      3
11    2           20        5
12    1           20        1
12    1           30        1
I need to write a function that checks whether any combination of item and identifier has a duplicate methodid.
In the above example, item 11 with identifier 1 has two rows with methodid 3, so it is duplicated; item 12 with identifier 1 also has duplicated rows (two rows with methodid 1).
I don't need to change the data, only to detect this situation.
I don't need to find where or what was duplicated... just whether there is any duplication.
The only information I have is the identifier.
CREATE OR REPLACE FUNCTION func(identifier integer)
RETURNS integer AS
$BODY$
declare
errorcode int;
begin
if _____________ then
errorcode := 1;
raise exception 'there is duplication in this identifier';
END IF;
-- continue work
return 0;
exception
when raise_exception then
return errorcode;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;
In the blank spot I want to put a query that checks for duplicates.
How do I write a query that performs this check?
The structure of the function can be changed, but I need some way to know when to raise the exception.

To check whether any rows are duplicated based on selected columns, you can group by these columns and count the occurrences.
So in your case you could do:
SELECT 1 FROM tab GROUP BY item, identifier, methodid HAVING COUNT(*) > 1;
To incorporate this into your function, you can just check whether any such row exists:
if EXISTS (SELECT 1 ...) then
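A minimal sketch of that check, run against an in-memory SQLite copy of the question's sample data from Python, so it can be tried without a PostgreSQL server. Because the function only receives an identifier, the grouped query is restricted with WHERE identifier = ?; has_duplicates is a hypothetical helper standing in for the PL/pgSQL IF EXISTS (...) test. One caveat for the real function: a PL/pgSQL parameter named identifier clashes with the column of the same name, and by default PostgreSQL reports an ambiguous column reference, so renaming the parameter is advisable.

```python
import sqlite3

# In-memory copy of the sample table "tab" from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (item INTEGER, identifier INTEGER, quantity REAL, methodid INTEGER)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [(10, 1, 20, 2), (10, 1, 30, 3), (11, 1, 10, 3), (11, 1, 12.5, 3),
     (11, 2, 20, 5), (12, 1, 20, 1), (12, 1, 30, 1)],
)

# EXISTS over the GROUP BY / HAVING query, limited to one identifier.
CHECK = """
SELECT EXISTS (
    SELECT 1
    FROM tab
    WHERE identifier = ?
    GROUP BY item, identifier, methodid
    HAVING COUNT(*) > 1
)
"""


def has_duplicates(identifier: int) -> bool:
    """Hypothetical helper: True if some (item, methodid) repeats for this identifier."""
    (flag,) = conn.execute(CHECK, (identifier,)).fetchone()
    return bool(flag)


print(has_duplicates(1))  # True: items 11 and 12 each repeat a methodid
print(has_duplicates(2))  # False: identifier 2 has a single row
```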

Use group by:
select item, identifier, methodid, count(*)
from tab
group by item, identifier, methodid
having count(*) > 1
Here, having count(*) > 1 keeps only the combinations that occur more than once, i.e. the duplicated rows.
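To see what this query returns on the question's sample data, here is a sketch using Python's built-in sqlite3 module with an in-memory table; the SQL is the same apart from an added ORDER BY for a stable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (item INTEGER, identifier INTEGER, quantity REAL, methodid INTEGER)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [(10, 1, 20, 2), (10, 1, 30, 3), (11, 1, 10, 3), (11, 1, 12.5, 3),
     (11, 2, 20, 5), (12, 1, 20, 1), (12, 1, 30, 1)],
)

# One output row per (item, identifier, methodid) combination that occurs more than once.
dupes = conn.execute("""
    SELECT item, identifier, methodid, COUNT(*)
    FROM tab
    GROUP BY item, identifier, methodid
    HAVING COUNT(*) > 1
    ORDER BY item
""").fetchall()

print(dupes)  # [(11, 1, 3, 2), (12, 1, 1, 2)]
```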

Try the following; it may give you the result set you need.
First, generate a row number within each (item, identifier, methodid) group.
The query for that is:
select *,ROW_NUMBER() over (partition by item,identifier,methodid order by item) as RowID
from tab;
This gives a result like the following:
item  identifier  quantity  methodid  RowID
10    1           20        2         1
10    1           30        3         1
11    1           10        3         1
11    1           12.5      3         2
11    2           20        5         1
12    1           20        1         1
12    1           30        1         2
12    1           40        2         1
From this result set, the following query keeps only the first row of each group (use p.rowid > 1 instead to list the duplicates):
select * from (
select *,ROW_NUMBER() over (partition by item,identifier,methodid order by item) as rowid
from tab) as p
where p.rowid = 1
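A sketch of both filters on the question's seven sample rows (without the extra 12 1 40 2 row shown in the output above), using Python's sqlite3; window functions need SQLite 3.25 or newer. Two deliberate deviations, both assumptions: the row number is aliased rn because rowid is a special column name in SQLite, and each partition is ordered by quantity, since ordering by item inside a partition on item leaves the numbering of tied rows arbitrary.

```python
import sqlite3  # ROW_NUMBER() needs SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (item INTEGER, identifier INTEGER, quantity REAL, methodid INTEGER)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [(10, 1, 20, 2), (10, 1, 30, 3), (11, 1, 10, 3), (11, 1, 12.5, 3),
     (11, 2, 20, 5), (12, 1, 20, 1), (12, 1, 30, 1)],
)

# Number the rows inside each (item, identifier, methodid) group.
NUMBERED = """
    SELECT item, identifier, quantity, methodid,
           ROW_NUMBER() OVER (PARTITION BY item, identifier, methodid
                              ORDER BY quantity) AS rn
    FROM tab
"""

# rn = 1: exactly one row survives per combination (de-duplicated view).
kept = conn.execute(f"SELECT * FROM ({NUMBERED}) WHERE rn = 1").fetchall()

# rn > 1: the surplus copies, i.e. the duplicates.
extra = conn.execute(
    f"SELECT item, identifier, methodid FROM ({NUMBERED}) WHERE rn > 1 ORDER BY item"
).fetchall()

print(len(kept))  # 5
print(extra)      # [(11, 1, 3), (12, 1, 1)]
```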
Thanks.

select *
from ( select item,identifier,quantity,methodid,
row_number() over(partition by item,identifier,methodid) as rank
from tab) as t;
Each row with a rank higher than 1 is a duplicated row.

Related

Postgres: PL/pgSQL aggregate function that filters the rows of each group by length

I need help with a PL/pgSQL aggregate function; I'm not sure whether this can be done. Thanks in advance for your help.
Table:
_id  group_id  content  num  len
0    2         tab      1    3
1    2         name     2    4
2    1         tag      1    3
3    1         bag      2    3
4    1         a        3    1
5    2         b        3    1
6    1         bo       4    2
7    2         an       4    2
I want to implement an aggregate function that aggregates by group_id, processes the rows of each group in num order, skips a row if its len is less than or equal to 2, and returns the specified number of rows for each group.
Example:
with sorted_table as (select * from the_table order by num)
select my_func(content, len, 2 /* required_num */) from sorted_table group by group_id;
Expected result:
_id  group_id  content  num  len
0    2         tab      1    3
1    2         name     2    4
2    1         tag      1    3
3    1         bag      2    3
For example, I need to take the top 10 (required_num) rows in each group, sorted by num, and compare their contents in turn. If the similarity is too high (I can judge it with select similarity(...)), the row is filtered out, and so on until each group reaches 10 rows. The result could also look like this:
group_id result
2 [{"num":1,"content":"tab","len":3,"_id":0},{"num":2,"content":"name","len":4,"_id":1}]
1 [{"num":1,"content":"tag","len":3,"_id":2},{"num":2,"content":"bag","len":3,"_id":3}]
As far as I understand the question, you don't really need the custom aggregate:
select group_id,
jsonb_agg(t order by num) filter (where len > 2) as result
from the_table t
group by group_id;
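The expected output also limits each group to required_num rows, which a filter on len alone does not enforce. A sketch of that top-N-per-group step in SQLite via Python (ROW_NUMBER() needs SQLite 3.25+), with REQUIRED_NUM = 2 as a hypothetical stand-in for required_num; the similarity-based filtering described in the question is not covered here.

```python
import sqlite3  # ROW_NUMBER() needs SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (_id INTEGER, group_id INTEGER, content TEXT, num INTEGER, len INTEGER)"
)
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?, ?)",
    [(0, 2, "tab", 1, 3), (1, 2, "name", 2, 4), (2, 1, "tag", 1, 3),
     (3, 1, "bag", 2, 3), (4, 1, "a", 3, 1), (5, 2, "b", 3, 1),
     (6, 1, "bo", 4, 2), (7, 2, "an", 4, 2)],
)

REQUIRED_NUM = 2  # hypothetical stand-in for the question's required_num

top_n = conn.execute("""
    SELECT _id, group_id, content, num, len
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY num) AS rn
        FROM the_table t
        WHERE len > 2          -- skip rows whose len is <= 2
    )
    WHERE rn <= ?              -- keep at most REQUIRED_NUM rows per group
    ORDER BY _id
""", (REQUIRED_NUM,)).fetchall()

for row in top_n:
    print(row)
```

Rows are numbered only after the len filter, so a skipped row does not use up one of a group's slots.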

select query - eliminate rows with duplicate column value on condition

I have a select query that ends up with results like:
ID COMPLIANT
------------------
10 0
12 0
29 0
29 1
43 1
44 1
44 0
How can I get results without these duplicate ID rows, on the condition that if an ID has already been marked as COMPLIANT once (a 1 instead of a 0), the duplicate rows with COMPLIANT=0 do not appear? I'd want:
ID COMPLIANT
------------------
10 0
12 0
29 1
43 1
44 1
How about aggregation?
select id, max(compliant) as compliant
from t
group by id;
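A quick check of the MAX() approach against the question's data, using Python's sqlite3 with an in-memory table named t as in the answer (column spelled compliant to match the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, compliant INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(10, 0), (12, 0), (29, 0), (29, 1), (43, 1), (44, 1), (44, 0)],
)

# MAX() yields 1 whenever an id has at least one compliant row, else 0.
rows = conn.execute(
    "SELECT id, MAX(compliant) AS compliant FROM t GROUP BY id ORDER BY id"
).fetchall()

print(rows)  # [(10, 0), (12, 0), (29, 1), (43, 1), (44, 1)]
```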
This returns one row per id. If an id can have multiple compliant rows, and you want all of them, then an alternative is:
select id, compliant
from t
where compliant = 1
union all
select id, compliant
from t
where not exists (select 1 from t t2 where t2.id = t.id and t2.compliant = 1);
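A sketch of this UNION ALL / NOT EXISTS variant in SQLite via Python. The second (43, 1) row is a hypothetical addition to show the difference: this query keeps both compliant rows for id 43, whereas the MAX() query returns a single row per id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, compliant INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(10, 0), (12, 0), (29, 0), (29, 1), (43, 1), (44, 1), (44, 0),
     (43, 1)],  # hypothetical extra compliant row for id 43
)

rows = conn.execute("""
    SELECT id, compliant FROM t WHERE compliant = 1      -- every compliant row
    UNION ALL
    SELECT id, compliant FROM t                           -- ids never marked compliant
    WHERE NOT EXISTS (
        SELECT 1 FROM t AS t2 WHERE t2.id = t.id AND t2.compliant = 1
    )
    ORDER BY id
""").fetchall()

print(rows)  # [(10, 0), (12, 0), (29, 1), (43, 1), (43, 1), (44, 1)]
```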
This will work:
select id, max(compliant)
from tablename
group by id;

SQL Server GROUP BY COUNT Consecutive Rows Only

I have a table called DATA on Microsoft SQL Server 2008 R2 with three non-nullable integer fields: ID, Sequence, and Value. Sequence values with the same ID will be consecutive, but can start with any value. I need a query that will return a count of consecutive rows with the same ID and Value.
For example, let's say I have the following data:
ID  Sequence  Value
--  --------  -----
1   1         1
5   1         100
5   2         200
5   3         200
5   4         100
10  10        10
I want the following result:
ID  Start  Value  Count
--  -----  -----  -----
1   1      1      1
5   1      100    1
5   2      200    2
5   4      100    1
10  10     10     1
I tried
SELECT ID, MIN([Sequence]) AS Start, Value, COUNT(*) AS [Count]
FROM DATA
GROUP BY ID, Value
ORDER BY ID, Start
but that gives
ID  Start  Value  Count
--  -----  -----  -----
1   1      1      1
5   1      100    2
5   2      200    2
10  10     10     1
which groups all rows with the same values, not just consecutive rows.
Any ideas? From what I've seen, I believe I have to left join the table with itself on consecutive rows using ROW_NUMBER(), but I am not sure exactly how to get counts from that.
Thanks in advance.
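You can use Sequence - ROW_NUMBER() OVER (ORDER BY ID, Value, Sequence) AS g to create a group key. Within a run of consecutive Sequence values for the same ID and Value, Sequence and the row number both increase by 1 per row, so g stays constant; a gap in Sequence changes g. Grouping by ID, Value and g therefore counts each run separately:
SELECT
ID,
MIN(Sequence) AS Start,
Value,
COUNT(*) AS [Count]
FROM
(
SELECT
ID,
Sequence,
Sequence - ROW_NUMBER() OVER (ORDER BY ID, Value, Sequence) AS g,
Value
FROM
DATA
) AS s
GROUP BY
ID, Value, g
ORDER BY
ID, Start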
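A sketch that runs the same gaps-and-islands query on the question's data in SQLite from Python (ROW_NUMBER() needs SQLite 3.25+). The column names are double-quoted instead of bracketed; otherwise the logic is unchanged.

```python
import sqlite3  # ROW_NUMBER() needs SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE DATA (ID INTEGER, "Sequence" INTEGER, "Value" INTEGER)')
conn.executemany(
    "INSERT INTO DATA VALUES (?, ?, ?)",
    [(1, 1, 1), (5, 1, 100), (5, 2, 200), (5, 3, 200), (5, 4, 100), (10, 10, 10)],
)

runs = conn.execute("""
    SELECT ID, MIN("Sequence") AS Start, "Value", COUNT(*) AS Cnt
    FROM (
        SELECT ID, "Sequence", "Value",
               -- constant within a run of consecutive Sequence values
               "Sequence" - ROW_NUMBER() OVER (ORDER BY ID, "Value", "Sequence") AS g
        FROM DATA
    ) AS s
    GROUP BY ID, "Value", g
    ORDER BY ID, Start
""").fetchall()

for run in runs:
    print(run)
```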

Get offset of a row after performing sort operation in sql

I am using SQLite database.
Suppose I have rows with IDs 1 to 50. Then I perform a SELECT with ORDER BY.
Say the resulting IDs are: 6, 3, 5, 2, 9, 12, 1, 34, 45, 15.
Now I want to know the offset of a particular row with a given ID in the above result, e.g. the offset of ID 1 is 6.
Can I do this in a single query?
Put the ordered query into a subquery and, for each row, count how many rows have a val less than or equal to its own, minus one for the row itself:
Example:
SCHEMA:
CREATE TABLE tbl ("id" INTEGER,"val" INTEGER);
INSERT INTO tbl ("id","val")
VALUES
(12,6),(1,7),(34,8),(6,1),(9,5),
(45,9),(15,10),(3,2),(5,3),(2,4);
QUERY:
select id,(
select count(*)
from (
select id,val
from tbl order by val
) b
where a.val >= b.val)-1 as offset
from tbl a
order by offset
RESULT:
id offset
6 0
3 1
5 2
2 3
9 4
12 5
1 6
34 7
45 8
15 9
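A sketch of the count-based offset in Python's sqlite3, looking up a single ID, next to the window-function form ROW_NUMBER() OVER (ORDER BY val) - 1 (SQLite 3.25+). The ORDER BY inside the answer's subquery does not affect the count, so it is left out here. Both forms assume val values are unique; with ties, the count version gives tied rows the same offset. offset_by_count and offset_by_window are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(12, 6), (1, 7), (34, 8), (6, 1), (9, 5),
     (45, 9), (15, 10), (3, 2), (5, 3), (2, 4)],
)


def offset_by_count(row_id: int) -> int:
    # Rows whose val is <= this row's val, minus the row itself.
    (off,) = conn.execute("""
        SELECT (SELECT COUNT(*) FROM tbl b WHERE b.val <= a.val) - 1
        FROM tbl a
        WHERE a.id = ?
    """, (row_id,)).fetchone()
    return off


def offset_by_window(row_id: int) -> int:
    # 0-based position of the row in ORDER BY val (needs SQLite 3.25+).
    (off,) = conn.execute("""
        SELECT pos FROM (
            SELECT id, ROW_NUMBER() OVER (ORDER BY val) - 1 AS pos FROM tbl
        )
        WHERE id = ?
    """, (row_id,)).fetchone()
    return off


print(offset_by_count(1), offset_by_window(1))  # 6 6
```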

How to COUNT DISTINCT on more than one column

I have the following table.
group_id  p_id  version  value
1         1     1        10
1         1     2        11
1         1     2        12
1         2     3        13
2         1     2        14
2         1     3        15
2         1     2        16
I would like to count, for each group_id, how many records there are and how many distinct p_id + version combinations there are. I have the following query:
SELECT "group_id",count(*) , count(distinct "p_id","version")
FROM tbl
group by "group_id"
Apparently it's not going to work, as Oracle gives me an error on COUNT:
ORA-00909: invalid number of arguments
I know this can be done with a subquery. However, is there a simpler way to get the same result? Performance is important to me, as we have more than 500 million records in the table.
I don't know if it's the best way, but I normally concatenate the two values, using a delimiter to enforce "distinctness", so they become one expression, which Oracle can handle with COUNT DISTINCT:
SELECT "group_id",count(*) , count(distinct "p_id" || '-' || "version")
FROM tbl
group by "group_id"
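A sketch comparing the concatenation trick with the subquery alternative mentioned in the question, in SQLite from Python. The delimiter matters: without it, pairs such as (1, 12) and (11, 2) would both become '112'. Which version is faster on 500 million rows depends on the database and its execution plan, so this only checks that both return the same counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (group_id INTEGER, p_id INTEGER, version INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 10), (1, 1, 2, 11), (1, 1, 2, 12), (1, 2, 3, 13),
     (2, 1, 2, 14), (2, 1, 3, 15), (2, 1, 2, 16)],
)

# Concatenation approach: one delimited string per (p_id, version) pair.
concat = conn.execute("""
    SELECT group_id, COUNT(*), COUNT(DISTINCT p_id || '-' || version)
    FROM tbl
    GROUP BY group_id
    ORDER BY group_id
""").fetchall()

# Subquery approach: group by the pair first, then sum pair sizes and count pairs.
nested = conn.execute("""
    SELECT group_id, SUM(n), COUNT(*)
    FROM (SELECT group_id, p_id, version, COUNT(*) AS n
          FROM tbl
          GROUP BY group_id, p_id, version)
    GROUP BY group_id
    ORDER BY group_id
""").fetchall()

print(concat)  # [(1, 4, 3), (2, 3, 2)]
print(nested)  # [(1, 4, 3), (2, 3, 2)]
```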