Get first and last record for each group - sql

I have 2 tables (DDL and DML below).
My output should show the first and the latest status for each user_unique_access_id, along with the corresponding timestamps.
I tried the query below, but my output has duplicated records.
select
    distinct u.user_unique_access_id,
    first_value(ul.status) over (partition by u.user_unique_access_id
                                 order by ul.created_timestamp asc) as first_status,
    min(ul.created_timestamp) as first_created_at,
    first_value(ul.status) over (partition by u.user_unique_access_id
                                 order by ul.created_timestamp desc) as current_status,
    max(ul.created_timestamp) as last_created_at
from
    public.users u
join public.user_location ul on
    u.user_key = ul.user_key
group by
    u.user_unique_access_id,
    u.user_name,
    ul.status,
    ul.created_timestamp ;
DDL and DML statements:
create table public.users ( user_key serial primary key, user_name varchar(20), user_unique_access_id varchar(20) );
create table public.user_location( user_key bigint not null, STATUS VARCHAR (512) not null, CREATED_TIMESTAMP timestamp not null, foreign key (user_key) references public.users (user_key) );
insert
into
public.users ( user_unique_access_id, user_name)
values('ABC_1', 'ABC');
insert
into
public.users ( user_unique_access_id, user_name)
values('ABC_2', 'ABC');
insert
into
public.user_location (user_key, status, created_timestamp)
values(1, 'Entrance', current_timestamp);
insert
into
public.user_location (user_key, status, created_timestamp)
values(1, 'Building A', current_timestamp);
insert
into
public.user_location (user_key, status, created_timestamp)
values(1, 'Building B', current_timestamp);
insert
into
public.user_location (user_key, status, created_timestamp)
values(1, 'Exit', current_timestamp);
insert
into
public.user_location (user_key, status, created_timestamp)
values(2, 'Entrance', current_timestamp);
insert
into
public.user_location (user_key, status, created_timestamp)
values(2, 'Building A', current_timestamp);

You can get the "first" and the "last" record using DISTINCT ON. Use that in two derived tables (one for the "first", one for the "last" record) and left join these to users.
SELECT u.user_unique_access_id,
       f.status AS first_status,
       f.created_timestamp AS first_created_at,
       l.status AS current_status,
       l.created_timestamp AS last_created_at
FROM users u
-- f: first record per user (earliest timestamp)
LEFT JOIN (SELECT DISTINCT ON (ul.user_key)
                  ul.user_key,
                  ul.status,
                  ul.created_timestamp
           FROM user_location ul
           ORDER BY ul.user_key ASC,
                    ul.created_timestamp ASC) f
       ON f.user_key = u.user_key
-- l: last record per user (latest timestamp)
LEFT JOIN (SELECT DISTINCT ON (ul.user_key)
                  ul.user_key,
                  ul.status,
                  ul.created_timestamp
           FROM user_location ul
           ORDER BY ul.user_key ASC,
                    ul.created_timestamp DESC) l
       ON l.user_key = u.user_key;
(I would have linked a db<>fiddle but your DML is pretty useless as the timestamps are all the same.)
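DISTINCT ON is specific to PostgreSQL. As a rough sketch of the same idea with standard window functions, the following runs against an in-memory SQLite database from Python. The timestamps are hypothetical values chosen to be distinct (they are not from the question), and the column types are simplified for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_key INTEGER PRIMARY KEY, user_name TEXT, user_unique_access_id TEXT);
CREATE TABLE user_location (user_key INTEGER NOT NULL, status TEXT NOT NULL, created_timestamp TEXT NOT NULL);
INSERT INTO users VALUES (1, 'ABC', 'ABC_1'), (2, 'ABC', 'ABC_2');
-- Hypothetical, distinct timestamps so that "first" and "last" are well defined.
INSERT INTO user_location VALUES
  (1, 'Entrance',   '2023-01-01 08:00:00'),
  (1, 'Building A', '2023-01-01 09:00:00'),
  (1, 'Building B', '2023-01-01 10:00:00'),
  (1, 'Exit',       '2023-01-01 11:00:00'),
  (2, 'Entrance',   '2023-01-01 08:30:00'),
  (2, 'Building A', '2023-01-01 09:30:00');
""")

query = """
WITH ranked AS (
  SELECT ul.*,
         -- rn_first = 1 marks the earliest row per user, rn_last = 1 the latest
         ROW_NUMBER() OVER (PARTITION BY user_key ORDER BY created_timestamp ASC)  AS rn_first,
         ROW_NUMBER() OVER (PARTITION BY user_key ORDER BY created_timestamp DESC) AS rn_last
  FROM user_location ul
)
SELECT u.user_unique_access_id,
       f.status AS first_status,   f.created_timestamp AS first_created_at,
       l.status AS current_status, l.created_timestamp AS last_created_at
FROM users u
LEFT JOIN ranked f ON f.user_key = u.user_key AND f.rn_first = 1
LEFT JOIN ranked l ON l.user_key = u.user_key AND l.rn_last = 1
ORDER BY u.user_unique_access_id;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two rows of one user share the same timestamp, ROW_NUMBER picks one of them arbitrarily, so a tie-breaker column in the ORDER BY would be needed for a deterministic result.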

Related

Spark SQL Query to assign join date by next closest once

`CREATE TABLE TABLE_1(
CALL_ID INT,
CALL_DATE DATE);
INSERT INTO TABLE_1(CALL_ID, CALL_DATE)
VALUES (1, '2022-10-22'),
(2, '2022-10-31'),
(3, '2022-11-04');
CREATE TABLE TABLE_2(
PROD_ID INT,
PROD_DATE DATE);
INSERT INTO TABLE_2(PROD_ID, PROD_DATE)
VALUES (1, '2022-10-25'),
(2, '2022-11-17');
CREATE TABLE TABLE_RESULT(
CALL_ID INT,
CALL_DATE DATE,
PROD_ID INT,
PROD_DATE DATE);
INSERT INTO TABLE_RESULT(CALL_ID, CALL_DATE, PROD_ID, PROD_DATE)
VALUES (1, '2022-10-22', 1, '2022-10-25'),
(2, '2022-10-31', NULL, NULL),
(3, '2022-11-04', 2, '2022-11-17');`
Can you help me create TABLE_RESULT with a join in an elegant way? This is a very small example.
Thanks
I solved it. Thanks anyway.
SELECT *
FROM (SELECT *,
             COALESCE(LEAD(CALL_DATE) OVER (PARTITION BY 1 ORDER BY CALL_DATE), CURRENT_DATE) AS CALL_DATE_NEXT
      FROM TABLE_1) AS A
LEFT JOIN TABLE_2 AS B
       ON (A.CALL_DATE <= B.PROD_DATE AND A.CALL_DATE_NEXT > B.PROD_DATE)
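To check the LEAD-based interval join against the sample data, here is a small sketch that runs it in SQLite from Python rather than in Spark. Two things differ from the query above and are assumptions of this sketch: dates are stored as ISO text, and a fixed sentinel '9999-12-31' replaces CURRENT_DATE so the result does not depend on the day it is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (call_id INTEGER, call_date TEXT);
CREATE TABLE table_2 (prod_id INTEGER, prod_date TEXT);
INSERT INTO table_1 VALUES (1, '2022-10-22'), (2, '2022-10-31'), (3, '2022-11-04');
INSERT INTO table_2 VALUES (1, '2022-10-25'), (2, '2022-11-17');
""")

query = """
SELECT a.call_id, a.call_date, b.prod_id, b.prod_date
FROM (
  SELECT call_id, call_date,
         -- Each call "owns" the half-open interval [call_date, next call_date).
         COALESCE(LEAD(call_date) OVER (ORDER BY call_date), '9999-12-31') AS call_date_next
  FROM table_1
) AS a
LEFT JOIN table_2 AS b
       ON a.call_date <= b.prod_date AND b.prod_date < a.call_date_next
ORDER BY a.call_id;
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Call 2 gets NULLs because no PROD_DATE falls between 2022-10-31 and 2022-11-04, which matches the second row of TABLE_RESULT.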

delete rows so that only one row exist for one first name

I created a table.
CREATE TABLE test_tab(
ID INT,
FIRSTNAME VARCHAR(40),
TS TIMESTAMP)
Then I inserted values into it.
INSERT INTO test_tab (ID, FIRSTNAME, TS) VALUES (1, 'Jhon', '2018-06-05 00:11:56');
INSERT INTO test_tab (ID, FIRSTNAME, TS) VALUES (2, 'Jhon', '2018-06-15 00:14:56');
INSERT INTO test_tab (ID, FIRSTNAME, TS) VALUES (3, 'Jhon', '2018-06-19 00:10:56');
INSERT INTO test_tab (ID, FIRSTNAME, TS) VALUES (4, 'Mike', '2018-06-05 00:10:56');
INSERT INTO test_tab (ID, FIRSTNAME, TS) VALUES (5, 'Mike', '2018-06-15 00:10:56');
INSERT INTO test_tab (ID, FIRSTNAME, TS) VALUES (6, 'Mike', '2018-06-20 00:10:56');
INSERT INTO test_tab (ID, FIRSTNAME, TS) VALUES (7, 'Lis', '2018-06-05 00:13:56');
INSERT INTO test_tab (ID, FIRSTNAME, TS) VALUES (8, 'Lis', '2018-06-15 00:17:56');
INSERT INTO test_tab (ID, FIRSTNAME, TS) VALUES (9, 'Lis', '2018-06-21 00:10:56');
I need to delete rows so that only one row remains for each first name: the row with the maximum TS.
It is the example of my request.
How can I delete it?
SELECT DISTINCT firstname
FROM test_tab
GROUP BY firstname
HAVING COUNT(firstname) > 1
union
select firstname from test_tab where ts = (select max(ts) from test_tab)
You can delete from a derived table as long as there is a bijection to the underlying table:
delete from (
    -- rn = 1 is the row with the maximum TS per FIRSTNAME; it is the one kept
    select t.*, row_number() over (partition by FIRSTNAME order by ts desc) as rn
    from test_tab t
)
where rn > 1;
Try this.
DELETE FROM TEST_TAB T
WHERE NOT EXISTS
(
SELECT 1
FROM TEST_TAB G
WHERE G.FIRSTNAME = T.FIRSTNAME
HAVING MAX (G.TS) = T.TS
);
SELECT * FROM TEST_TAB;
ID  FIRSTNAME  TS
------------------------------------------
3   Jhon       2018-06-19 00:10:56.000000
6   Mike       2018-06-20 00:10:56.000000
9   Lis        2018-06-21 00:10:56.000000
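Deleting from a derived table and the HAVING-without-GROUP-BY form are not accepted by every database. As a hedged alternative, the sketch below uses a plain correlated subquery and runs it in SQLite from Python: a row is deleted when another row with the same first name has a later TS. Column types are simplified for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_tab (id INTEGER, firstname TEXT, ts TEXT);
INSERT INTO test_tab VALUES
  (1, 'Jhon', '2018-06-05 00:11:56'),
  (2, 'Jhon', '2018-06-15 00:14:56'),
  (3, 'Jhon', '2018-06-19 00:10:56'),
  (4, 'Mike', '2018-06-05 00:10:56'),
  (5, 'Mike', '2018-06-15 00:10:56'),
  (6, 'Mike', '2018-06-20 00:10:56'),
  (7, 'Lis',  '2018-06-05 00:13:56'),
  (8, 'Lis',  '2018-06-15 00:17:56'),
  (9, 'Lis',  '2018-06-21 00:10:56');
""")

# Remove every row whose TS is older than the newest TS for the same first name.
conn.execute("""
DELETE FROM test_tab
WHERE ts < (SELECT MAX(g.ts)
            FROM test_tab g
            WHERE g.firstname = test_tab.firstname)
""")

remaining = conn.execute("SELECT id, firstname, ts FROM test_tab ORDER BY id").fetchall()
for row in remaining:
    print(row)
```

Note that if two rows of the same first name share the maximum TS, this form keeps both of them; the row_number() approach would keep exactly one.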

How to extract data rows from a table not present in the second table

I have two tables in my PostgreSQL database, Completion and Path_Completion.
The Completion table has 12 columns and Path_Completion has 11 columns; they are all the same except for one extra column in the Completion table.
Both tables have rows in common. I want to get the rows that are present in Path_Completion but not in the Completion table.
Completion table -
Path_Completion table-
I would like to have my result as follows-
The logic is that id 4, event le155 is present in both tables, and its status in the Completion table is "\N".
I tried the following, but it didn't work:
select *
from path_completion
where (unique_id,event,status) not in
( select unique_id,event,status from completion where status IN ('Completed'))
Sample Query for full testing:
CREATE TABLE test.completion (
id int4 NULL,
ds date NULL,
"event" varchar(100) NULL,
time_duration interval NULL,
event_type varchar(100) NULL,
status varchar(100) NULL,
ranking int4 NULL
);
INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking)
VALUES(1, '2022-03-02', 'le100', '8 days'::interval, 'xyz', 'Completed', 1);
INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking)
VALUES(2, '2022-03-18', 'le108', '5 days'::interval, 'pqr', 'Completed', 1);
INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking)
VALUES(3, '2022-03-19', 'le140', '13 days'::interval, 'abc', 'Completed', 1);
INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking)
VALUES(4, '2022-03-25', 'le155', '12 days'::interval, 'mno', '\N', 2);
INSERT INTO completion (id, ds, "event", time_duration, event_type, status, ranking)
VALUES(5, '2022-03-25', 'le160', '4 days'::interval, 'abc', '\N', 2);
CREATE TABLE test.path_completion (
id int4 NULL,
ds date NULL,
"event" varchar(100) NULL,
time_duration interval NULL,
event_type varchar(100) NULL,
status varchar(100) NULL
);
INSERT INTO path_completion (id, ds, "event", time_duration, event_type, status)
VALUES(1, '2022-03-02', 'le100', '8 days'::interval, 'xyz', 'Path_complete');
INSERT INTO path_completion (id, ds, "event", time_duration, event_type, status)
VALUES(2, '2022-03-18', 'le108', '5 days'::interval, 'pqr', 'Path_complete');
INSERT INTO path_completion (id, ds, "event", time_duration, event_type, status)
VALUES(3, '2022-03-19', 'le140', '13 days'::interval, 'abc', 'Path_complete');
INSERT INTO path_completion (id, ds, "event", time_duration, event_type, status)
VALUES(4, '2022-03-25', 'le155', '12 days'::interval, 'mno', 'Path_complete');
-- Sample Query:
select pc.* from path_completion pc
inner join completion cc on pc.id = cc.id and pc."event" = cc."event"
where cc.status <> 'Completed';
-- Result
id ds event time_duration event_type status
-------------------------------------------------------------------
4 2022-03-25 le155 12 days mno Path_complete
If your tables are large (roughly more than 100,000 rows), NOT IN can perform very badly. I recommend joining the tables instead of using NOT IN. For example:
select * from path_completion pc
inner join completion cc on pc.unique_id = cc.unique_id
where cc.status <> 'Completed'
This returns only rows that exist in both tables and whose status in completion is not 'Completed'. If you also need rows that may have no match in the completion table, use this query:
select * from path_completion pc
left join completion cc on pc.unique_id = cc.unique_id and cc.status <> 'Completed'
I recommend filtering on a positive condition, for example status is null (or status = '\N'), instead of status <> 'Completed'. An equality or IS NULL predicate is generally easier for the planner to satisfy with an index scan.
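Another common way to express "rows in path_completion with no 'Completed' counterpart in completion" is an anti-join with NOT EXISTS. The sketch below is an illustration rather than the answerer's query: it runs in SQLite from Python, keeps only the id, event and status columns of the sample data, and matches rows on id and event as the sample query does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Raw string so that the literal status text '\N' from the sample data is kept as-is.
conn.executescript(r"""
CREATE TABLE completion (id INTEGER, event TEXT, status TEXT);
CREATE TABLE path_completion (id INTEGER, event TEXT, status TEXT);
INSERT INTO completion VALUES
  (1, 'le100', 'Completed'),
  (2, 'le108', 'Completed'),
  (3, 'le140', 'Completed'),
  (4, 'le155', '\N'),
  (5, 'le160', '\N');
INSERT INTO path_completion VALUES
  (1, 'le100', 'Path_complete'),
  (2, 'le108', 'Path_complete'),
  (3, 'le140', 'Path_complete'),
  (4, 'le155', 'Path_complete');
""")

query = """
SELECT pc.id, pc.event, pc.status
FROM path_completion pc
WHERE NOT EXISTS (
  -- A path_completion row is excluded only if a 'Completed' twin exists.
  SELECT 1
  FROM completion c
  WHERE c.id = pc.id
    AND c.event = pc.event
    AND c.status = 'Completed'
)
ORDER BY pc.id;
"""

missing = conn.execute(query).fetchall()
print(missing)
```

Unlike the inner join, this form also returns path_completion rows that have no row at all in completion, and unlike NOT IN it is not affected by NULLs in the compared columns.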

Identifying the next unwatched episode in a series

I need to write a query in SQL that will identify the next unwatched episode in a series by a user. I have tables containing user history (userID, episodeID and show titleID) and Episode (episodeID(PK), episodeName and playCount), where the history table stores the episodeIDs of all watched episodes and the episode table contains all possible episodes.
I had hoped to do some sort of join between the two, selecting by userID and titleID, but I'm going in circles.
I've been learning SQL for about 5 hours in total so have no idea where to begin on this, any help gratefully received. I'm aware the code below is horrible and very clunky (please don't assume any prior knowledge!)
I've tried a left join, but this will return all data irrespective of user.
Create table Actor(
actorID integer primary key,
first_name text not null,
last_name text not null
);
create table AppearsIn(
actorID integer,
seriesID integer primary key
);
create table Series(
seriesID text,
titleID text,
episodeID text,
releaseDate date
);
create table Episode(
episodeID text primary key,
episodeName text,
playCount integer,
actorID integer
);
create table title(
titleID text primary key,
actorID integer,
titleName text,
genre text,
ageRating integer,
releaseDate date
);
create table history(
userID integer,
titleID text,
episodeID text
);
create table user (
userID integer not null,
fullName text not null,
Email text not null,
dateOfBirth date not null,
subscriptionEndDate date not null
);
insert into Actor (actorID, first_name, last_name)
values (76547, 'tom', 'cruise');
insert into Actor (actorID, first_name, last_name)
values (345, 'val', 'kilner');
insert into AppearsIn (actorID, seriesID)
values (345, 1);
insert into Series (seriesID, titleID, episodeID, releaseDate)
values ('WDS2', 'WalkingDead', 'WDS2E1', '1991-02-02');
insert into Series (seriesID, titleID, episodeID, releaseDate)
values ('WDS2', 'WalkingDead', 'WDS2E2', '1991-02-02');
insert into Series (seriesID, titleID, episodeID, releaseDate)
values ('WDS2', 'WalkingDead', 'WDS2E3', '1991-02-02');
insert into Series (seriesID, titleID, episodeID, releaseDate)
values ('WDS2', 'WalkingDead', 'WDS2E4', '1991-02-02');
insert into Series (seriesID, titleID, episodeID, releaseDate)
values ('WDS2', 'WalkingDead', 'WDS2E5', '1991-02-02');
insert into Series (seriesID, titleID, episodeID, releaseDate)
values ('WDS2', 'WalkingDead', 'WDS2E6', '1991-02-02');
insert into Series (seriesID, titleID, episodeID, releaseDate)
values ('WDS2', 'WalkingDead', 'WDS2E7', '1991-02-02');
insert into Series (seriesID, titleID, episodeID, releaseDate)
values ('WDS2', 'WalkingDead', 'WDS2E8', '1991-02-02');
insert into Episode (episodeID, episodeName, actorID, playCount)
values ('WDS2E1', 'A', 76547, 1);
insert into Episode (episodeID, episodeName, actorID, playCount)
values ('WDS2E2', 'B', 76547, 1);
insert into Episode (episodeID, episodeName, actorID, playCount)
values ('WDS2E3', 'C', 76547, 1);
insert into Episode (episodeID, episodeName, actorID, playCount)
values ('WDS2E4', 'D', 76547, 0);
insert into Episode (episodeID, episodeName, actorID, playCount)
values ('WDS2E5', 'E', 76547, 0);
insert into history (userID, titleID, episodeID)
values (8924, 'Walking Dead','WDS2E1');
insert into history (userID, titleID, episodeID)
values (8924, 'Walking Dead', 'WDS2E2');
insert into history (userID, titleID, episodeID)
values (8924, 'Walking Dead', 'WDS2E3');
insert into user (userID, fullName, Email, dateOfBirth, subscriptionEndDate)
values (8924, 'bill123', 'bill123#warmpost.com', '1970-02-12', '2019-06-05');
insert into title (titleID, actorID, titleName, genre, ageRating, releaseDate)
values ('WalkingDead', 123455, 'The Walking Dead', 'Drama', 15, '2015-01-01');
This is the query I tried:
SELECT * FROM Episode
LEFT JOIN history ON history.episodeID = Episode.EpisodeID
Where playcount <1
limit 1;
It sounds like you're just having issues with the join. Here's a page on how joins work. From that page:
(INNER) JOIN: Returns records that have matching values in both tables
LEFT (OUTER) JOIN: Return all records from the left table, and the matched records from the right table
RIGHT (OUTER) JOIN: Return all records from the right table, and the matched records from the left table
FULL (OUTER) JOIN: Return all records when there is a match in either left or right table
So, if you want only episodes that the user has seen, you could try replacing LEFT JOIN with RIGHT JOIN, or change the order in which you specify the tables.
I'm currently not able to test this myself, but hopefully this is what you're looking for.
SELECT * FROM Episode
RIGHT JOIN history ON history.episodeID = Episode.EpisodeID
WHERE playcount < 1
LIMIT 1;
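Since the question asks for the next episode a given user has not watched, one possible sketch is to list the episodes of the title and exclude those that appear in that user's history. The example runs in SQLite from Python on a reduced copy of the sample data. It assumes that sorting by episodeID reflects viewing order, which holds for the single-digit IDs here but is not guaranteed in general, and it filters on Series.titleID because the sample history rows spell the title differently ('Walking Dead').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Series (seriesID TEXT, titleID TEXT, episodeID TEXT);
CREATE TABLE Episode (episodeID TEXT PRIMARY KEY, episodeName TEXT);
CREATE TABLE history (userID INTEGER, titleID TEXT, episodeID TEXT);
INSERT INTO Series VALUES
  ('WDS2', 'WalkingDead', 'WDS2E1'), ('WDS2', 'WalkingDead', 'WDS2E2'),
  ('WDS2', 'WalkingDead', 'WDS2E3'), ('WDS2', 'WalkingDead', 'WDS2E4'),
  ('WDS2', 'WalkingDead', 'WDS2E5');
INSERT INTO Episode VALUES
  ('WDS2E1', 'A'), ('WDS2E2', 'B'), ('WDS2E3', 'C'), ('WDS2E4', 'D'), ('WDS2E5', 'E');
INSERT INTO history VALUES
  (8924, 'Walking Dead', 'WDS2E1'),
  (8924, 'Walking Dead', 'WDS2E2'),
  (8924, 'Walking Dead', 'WDS2E3');
""")

query = """
SELECT e.episodeID, e.episodeName
FROM Series s
JOIN Episode e ON e.episodeID = s.episodeID
WHERE s.titleID = ?
  AND NOT EXISTS (
    -- Skip episodes this particular user has already watched.
    SELECT 1 FROM history h
    WHERE h.userID = ? AND h.episodeID = e.episodeID
  )
ORDER BY e.episodeID   -- assumption: ID order is viewing order
LIMIT 1;
"""

next_episode = conn.execute(query, ("WalkingDead", 8924)).fetchone()
print(next_episode)
```

The user filter lives inside the subquery, so the result depends on that user's history only, which is what the plain LEFT JOIN attempt was missing.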

SQL Query JOIN on the same table for a given data

I have the following table and data:
CREATE TABLE TEST_TABLE (
ID NUMBER(6) NOT NULL,
COMMON_SEQ NUMBER(22),
NAME VARCHAR(20),
CONSTRAINT PK_CONST PRIMARY KEY (ID)
);
INSERT INTO TEST_TABLE (ID, COMMON_SEQ, NAME) VALUES (1001, NULL, 'Michelle');
INSERT INTO TEST_TABLE (ID, COMMON_SEQ, NAME) VALUES (1002, NULL, 'Tiberius');
INSERT INTO TEST_TABLE (ID, COMMON_SEQ, NAME) VALUES (1003, NULL, 'Marigold');
INSERT INTO TEST_TABLE (ID, COMMON_SEQ, NAME) VALUES (1004, 999, 'Richmond');
INSERT INTO TEST_TABLE (ID, COMMON_SEQ, NAME) VALUES (1005, 999, 'Marianne');
INSERT INTO TEST_TABLE (ID, COMMON_SEQ, NAME) VALUES (1006, NULL, 'Valentin');
INSERT INTO TEST_TABLE (ID, COMMON_SEQ, NAME) VALUES (1007, 888, 'Juliette');
INSERT INTO TEST_TABLE (ID, COMMON_SEQ, NAME) VALUES (1008, NULL, 'Lawrence');
Some records in this table are related to each other by the common value of COMMON_SEQ (for example COMMON_SEQ of 999 relates Richmond and Marianne).
How can I select all names based on given ID as an input?
I tried joining table to itself (works ok when COMMON_SEQ is null). This example returns Michelle record:
SELECT T.ID, T.COMMON_SEQ,T.NAME
FROM TEST_TABLE T
LEFT JOIN TEST_TABLE T2 ON NOT T.COMMON_SEQ is NULL
AND T.COMMON_SEQ=T2.COMMON_SEQ AND T.ID<>T2.ID
WHERE T.ID=1001
But it doesn't bring back 2 records for ID 1004. This example returns only Richmond record (but I need to return also Marianne record):
SELECT T.ID, T.COMMON_SEQ,T.NAME
FROM TEST_TABLE T
LEFT JOIN TEST_TABLE T2 ON NOT T.COMMON_SEQ is NULL
AND T.COMMON_SEQ=T2.COMMON_SEQ AND T.ID<>T2.ID
WHERE T.ID=1004
How can I improve/rewrite the query to return Richmond and Marianne records when I supply only one ID value (either 1004 or 1005)?
You could use:
SELECT *
FROM TEST_TABLE t
WHERE COMMON_SEQ IN (SELECT COMMON_SEQ
FROM TEST_TABLE t1
WHERE t1.ID = 1004)
OR t.ID = 1004;
Passing the same parameter twice to handle NULL in COMMON_SEQ.
Try this
SELECT COALESCE (ty.id, tx.id) AS id,
COALESCE (ty.common_seq, tx.common_seq) AS common_seq,
COALESCE (ty.name, tx.name) AS name
FROM test_table tx LEFT OUTER JOIN test_table ty
ON (tx.common_seq = ty.common_seq)
WHERE tx.ID = 1004;
With this you can avoid using IN or EXISTS, which may perform better; compare the execution plans on your own data to confirm.
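To see how the NULL case behaves, here is a small sketch that runs the IN ... OR variant in SQLite from Python with a named parameter. The helper name related_rows is hypothetical, and the Oracle column types are simplified for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_table (id INTEGER PRIMARY KEY, common_seq INTEGER, name TEXT);
INSERT INTO test_table VALUES
  (1001, NULL, 'Michelle'), (1002, NULL, 'Tiberius'),
  (1003, NULL, 'Marigold'), (1004, 999,  'Richmond'),
  (1005, 999,  'Marianne'), (1006, NULL, 'Valentin'),
  (1007, 888,  'Juliette'), (1008, NULL, 'Lawrence');
""")

def related_rows(conn, person_id):
    """Return (id, name) for the given row plus every row sharing its COMMON_SEQ."""
    sql = """
    SELECT t.id, t.name
    FROM test_table t
    WHERE t.common_seq IN (SELECT t1.common_seq FROM test_table t1 WHERE t1.id = :id)
       OR t.id = :id          -- covers rows whose COMMON_SEQ is NULL
    ORDER BY t.id;
    """
    return conn.execute(sql, {"id": person_id}).fetchall()

print(related_rows(conn, 1004))
print(related_rows(conn, 1001))
```

For ID 1001 the IN predicate evaluates to NULL because COMMON_SEQ is NULL, so only the OR branch matches and a single row comes back.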