How to get the last value of any attribute in a PostgreSQL pivot table

I have a table like this:
id  u_id  attr_key  attr_value  process_id  insert_time
---------------------------------------------------------
1   1     name      john        1           1
2   1     family    smith       1           2
3   2     job       clerk       2           3
4   1     name      sarah       3           4
...
I have to do two things:
1. Create a view with tablefunc (crosstab) to fetch a group of data for any u_id. That part is simple.
2. Find, in real time, the last value of any key for any u_id (as in an HBase database). I don't have a good solution for this one.
This is what I need:
id  u_id  attr_key  attr_value
--------------------------------
4   1     name      sarah
2   1     family    smith
Any idea or function?
(It is possible to add a column to my data model.)

The most efficient solution to greatest-n-per-group problems in Postgres is typically DISTINCT ON.
The following retrieves the latest value of each attr_key for a single u_id:
select distinct on (attr_key) id, u_id, attr_key, attr_value
from the_table
where u_id = 1
order by attr_key, insert_time desc;
An index on (u_id, attr_key, insert_time) should help performance.
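The same idea can be extended to every u_id at once. The following is a minimal sketch, assuming the table and column names from the question; the index name is hypothetical and was chosen only for illustration.

```sql
-- Latest value of every attr_key for every u_id (not just u_id = 1).
-- DISTINCT ON keeps the first row of each (u_id, attr_key) group, and the
-- ORDER BY makes that first row the one with the highest insert_time.
select distinct on (u_id, attr_key) id, u_id, attr_key, attr_value
from the_table
order by u_id, attr_key, insert_time desc;

-- The index suggested above; the name the_table_latest_idx is hypothetical.
create index the_table_latest_idx
    on the_table (u_id, attr_key, insert_time);
```

With the four sample rows from the question, the query should return the rows with id 2 (family), id 4 (name) and id 3 (job), in that order.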

Related

PostgreSQL: how to delete duplicated rows grouped by the value of a column?

Given the following table, I need to delete every row for a given "id" whenever all of those rows are duplicated under a successive "id". Note that all rows for a specific "id" should be deleted only if every row matches between the two ids (apart from the "id" column itself).
id  name  subject  score
--------------------------
1   Ann   Maths    9
1   Ann   History  8
2   Ann   Maths    9
2   Ann   History  8
3   Ann   Maths    9
3   Ann   History  7
4   Bob   Maths    8
4   Bob   History  8
For this specific input, the updated output table should be:
id  name  subject  score
--------------------------
1   Ann   Maths    9
1   Ann   History  8
3   Ann   Maths    9
3   Ann   History  7
4   Bob   Maths    8
4   Bob   History  8
This is because all records for ids 1 and 2 are exactly the same. This doesn't apply to ids 1 and 3, because there is at least one row not in common between the two (id 1 has 8 in History, while id 3 has 7 in the same subject).
So it is not as simple as deleting duplicated rows. Here's my attempt:
DELETE FROM table a
USING table b
WHERE a.name = b.name
AND a.subject = b.subject
AND a.score = b.score
AND a.ID < b.ID;
Can you help me?
You can first get all ids that should not be deleted and then exclude them in the WHERE clause of the DELETE statement.
Step 1. To find the ids to keep, you can use PostgreSQL's DISTINCT ON construct. For each distinct combination of "name", "subject" and "score" it returns only the first row in ORDER BY order, which here is the row with the lowest id. A plain DISTINCT then returns each of those ids only once.
SELECT DISTINCT id
FROM (SELECT DISTINCT ON (name, subject, score) id
      FROM tab
      ORDER BY name, subject, score, id) ids_to_keep
Step 2. You can then build the DELETE statement using the NOT IN operator inside the WHERE clause:
DELETE FROM tab
WHERE id NOT IN (
    SELECT DISTINCT id
    FROM (SELECT DISTINCT ON (name, subject, score) id
          FROM tab
          ORDER BY name, subject, score, id) ids_to_keep
);
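Before running the DELETE, it may be worth previewing what it would remove. The following is a sketch only: it loads the sample rows from the question into a table named tab (the name used in the answer, with column types assumed) and turns the DELETE into a SELECT with the same WHERE clause.

```sql
-- Sample data from the question; the column types are an assumption.
create table tab (id int, name text, subject text, score int);
insert into tab (id, name, subject, score) values
    (1, 'Ann', 'Maths', 9), (1, 'Ann', 'History', 8),
    (2, 'Ann', 'Maths', 9), (2, 'Ann', 'History', 8),
    (3, 'Ann', 'Maths', 9), (3, 'Ann', 'History', 7),
    (4, 'Bob', 'Maths', 8), (4, 'Bob', 'History', 8);

-- Dry run: list the rows that the DELETE would remove.
select *
from tab
where id not in (
    select distinct id
    from (select distinct on (name, subject, score) id
          from tab
          order by name, subject, score, id) ids_to_keep
)
order by id, subject;
-- For this sample, only the two rows with id = 2 are listed.
```

One thing to keep in mind: this approach keeps an id as soon as one of its rows is a first occurrence, so an id whose rows all appeared earlier under several different ids would also be removed, even though it does not fully duplicate any single earlier id.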

Query to get index of member in multiple classes?

Given the following table Attendance, where AttID is the primary key and the table is sorted by it, I'm trying to find the index (position) of a MemberID within each ClassID, or return the total number of members in the class if the MemberID does not exist in that class (this condition is less important to me).
AttID  ClassID  MemberID
--------------------------
1      1        1
2      1        2
3      1        3
4      2        30
5      2        40
5      2        1
6      2        50
For example:
Given the target MemberID is 1, I will get the following:
ClassID  Index
----------------
1        1
2        3
Given the target MemberID is 2, I will get the following:
ClassID  Index
----------------
1        2
2        4
I'm using these results to determine whether a member who attended a class is within the class's capacity.
"5" is repeated for Attid so it is not a primary key. I will assume this is a typo.
You have basically described the row_number() function:
select a.*
from (select a.*,
             row_number() over (partition by classid order by attid) as seqnum
      from attendance a
     ) a
where memberid = ?
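The query above returns nothing for a class the member did not attend. To cover the secondary requirement (fall back to the total number of members in the class), one possible sketch is to aggregate per class and use COALESCE. The alias idx is hypothetical, and ? is the same placeholder for the target MemberID.

```sql
-- One row per class: the member's position if present,
-- otherwise the number of rows (members) in that class.
select classid,
       coalesce(min(case when memberid = ? then seqnum end),
                count(*)) as idx
from (select classid,
             memberid,
             row_number() over (partition by classid order by attid) as seqnum
      from attendance) a
group by classid
order by classid;
```

Assuming the repeated AttID is a typo and the last three rows are really numbered 5, 6 and 7, this should give (1, 1) and (2, 3) for MemberID 1, and (1, 2) and (2, 4) for MemberID 2, matching the tables in the question.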

How to select all duplicate rows except original one?

Let's say I have a table
CREATE TABLE names (
id SERIAL PRIMARY KEY,
name CHARACTER VARYING
);
with data
id name
-------------
1 John
2 John
3 John
4 Jane
5 Jane
6 Jane
I need to select all duplicate rows by name except the original one. So in this case I need the result to be this:
id name
-------------
2 John
3 John
5 Jane
6 Jane
How do I do that in PostgreSQL?
You can use ROW_NUMBER() to identify the 'original' records and filter them out. Here is a method using a CTE:
with Nums AS (SELECT id,
                     name,
                     ROW_NUMBER() over (PARTITION BY name ORDER BY ID ASC) RN
              FROM names)
SELECT *
FROM Nums
WHERE RN <> 1; --Filter out rows numbered 1, 'originals'
Alternatively, exclude the lowest id of each name:
select *
from names
where id not in (select min(id)
                 from names
                 group by name);

Splitting the data through SSIS

I have a table "Employee" as shown below
Id Name
1 John
2 Jaffer
3 Syam
4 Aish
5 Gidson
1 Aboo
2 Sindhu
3 Saravanan
I want to get two outputs like
Id
1
2
3
Id
4
5
Which transformation should I use?
Could you please help with that?
You will have to write two queries.
SELECT Id
FROM Employee
GROUP BY Id
HAVING COUNT(Id) > 1
The above query will give you the first output (the Ids that occur more than once).
SELECT Id
FROM Employee
GROUP BY Id
HAVING COUNT(Id) = 1
This will give you the second output (the Ids that occur exactly once).
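If a single source query is preferred over two, a flag column can mark which output each Id belongs to. This is only a sketch: IsDuplicate is a hypothetical column name, and routing on it inside the package (for example with a Conditional Split transformation) is an assumption rather than something the answer above prescribes.

```sql
-- One row per Id with a flag telling which output it belongs to.
-- IsDuplicate is a hypothetical column name: 1 = Id occurs more than once.
SELECT Id,
       CASE WHEN COUNT(*) > 1 THEN 1 ELSE 0 END AS IsDuplicate
FROM Employee
GROUP BY Id
ORDER BY Id;
```

With the sample Employee rows, Ids 1, 2 and 3 get IsDuplicate = 1 and Ids 4 and 5 get IsDuplicate = 0.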

How do I use a select query to get the least of one value for each unique second value?

There are groups like this:
USER_ID  SEQ_ID  NAME
------------------------
1        2       Armut
1        3       Elma
1        4       Kiraz
2        1       Nar
2        2       Uzum
4        3       Sheftali
4        4       Karpuz
4        5       Kavun
After the select query I want to see only:
USER_ID  SEQ_ID  NAME
------------------------
1        2       Armut
2        1       Nar
4        3       Karpuz
That is, I want the row with the least SEQ_ID for each USER_ID. What SQL query will give me this result?
Best regards
SELECT USER_ID, SEQ_ID, NAME
FROM table
WHERE NAME IN ('Armut', 'Nar', 'Karpuz')
ORDER BY USER_ID
If you have something else in mind, please clarify your question.
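Looks to me like it should be the row with the minimum SEQ_ID per USER_ID, joined back to pick up its NAME. Grouping by NAME as well would return every row, since each name appears only once. Here your_table stands for the actual table name, and going by the sample data the row for USER_ID 4 is Sheftali:
SELECT t.USER_ID, t.SEQ_ID, t.NAME
FROM your_table t
JOIN (SELECT USER_ID, MIN(SEQ_ID) AS SEQ_ID
      FROM your_table
      GROUP BY USER_ID) m
  ON m.USER_ID = t.USER_ID AND m.SEQ_ID = t.SEQ_ID
ORDER BY t.USER_ID;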
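Another way to pick the row with the least SEQ_ID per USER_ID is a window function. This is a sketch that assumes the database supports ROW_NUMBER(); your_table is a placeholder for the real table name, and ranked and rn are hypothetical aliases.

```sql
-- Number the rows of each USER_ID by ascending SEQ_ID and keep the first.
-- your_table is a placeholder for the real table name.
SELECT USER_ID, SEQ_ID, NAME
FROM (SELECT USER_ID, SEQ_ID, NAME,
             ROW_NUMBER() OVER (PARTITION BY USER_ID ORDER BY SEQ_ID) AS rn
      FROM your_table) ranked
WHERE rn = 1
ORDER BY USER_ID;
```

With the sample data from the question this should return (1, 2, Armut), (2, 1, Nar) and (4, 3, Sheftali).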