Postgresql find by two columns - sql

I have a join table tags_videos:
tag_id | video_id
--------+----------
1195 | 15033
1198 | 15033
1199 | 15033
1196 | 15034
1198 | 15034
1199 | 15034
1197 | 15035
1198 | 15035
1199 | 15035
1195 | 15036
1197 | 15036
1198 | 15036
How can I select the distinct video_id values that have two specific tag_id values?
For example, if my tag_ids are 1195 and 1198, I should get video_ids 15033 and 15036 (the ones that have both tag_id 1195 and tag_id 1198).

Extract the unique (tag_id, video_id) pairs for the two tags in the CTE t, then select the video_ids that have both tag_ids (i.e. 2 occurrences).
with t as
(
select distinct tag_id, video_id
from tags_videos
where tag_id in ('1195', '1198')
)
select video_id from t
group by video_id having count(*) = 2;
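The same idea can be tried end to end with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for PostgreSQL, and the helper name videos_with_all_tags is hypothetical. The HAVING count is derived from the number of requested tags, so the query is not tied to exactly two.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags_videos (tag_id INTEGER, video_id INTEGER)")
conn.executemany("INSERT INTO tags_videos VALUES (?, ?)", [
    (1195, 15033), (1198, 15033), (1199, 15033),
    (1196, 15034), (1198, 15034), (1199, 15034),
    (1197, 15035), (1198, 15035), (1199, 15035),
    (1195, 15036), (1197, 15036), (1198, 15036),
])


def videos_with_all_tags(conn, tag_ids):
    """Return the video_ids that carry every tag in tag_ids (hypothetical helper)."""
    wanted = sorted(set(tag_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        WITH t AS (
            SELECT DISTINCT tag_id, video_id
            FROM tags_videos
            WHERE tag_id IN ({placeholders})
        )
        SELECT video_id
        FROM t
        GROUP BY video_id
        HAVING COUNT(*) = ?      -- one distinct row per requested tag
        ORDER BY video_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


matching = videos_with_all_tags(conn, [1195, 1198])
print(matching)
```

The DISTINCT in the CTE matters only if the join table can contain the same (tag_id, video_id) pair twice; with a unique constraint on that pair it could be dropped.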

Related

Compare table to itself and update one value based on another - bulk

The following select returns a list of 8524 values. Half are duplicates of the other half, with different dates. I need to terminate the older rows based on the DateEffective of the newer rows.
SELECT PRID, COUNT(SiteID) AS SiteID_Count FROM PRL
WHERE GETDATE() BETWEEN DateEffective AND DateTerminated
and SiteGID in (190,191,192,193,30,31,32,33)
GROUP BY PRID
HAVING COUNT(SiteID)=2
ORDER BY PRID
The query below returns the rows for one PRID; the current and expected results follow:
select * from PRL where SiteGID in (30,31,32,33) and PRID = 1339
UNION
select * from PRL where SiteGID in (190,191,192,193) and PRID = 1339
Current:
| PRLID | PRID | SiteGID | SiteID | DateEffective | DateTerminated |
| 895 | 1339 | 30 | 4353 | 2010-04-10 | 9999-12-31 |
| 966598 | 1339 | 191 | 4353 | 2021-02-19 | 9999-12-31 |
Expected:
| PRLID | PRID | SiteGID | SiteID | DateEffective | DateTerminated |
| 895 | 1339 | 30 | 4353 | 2010-04-10 | **2021-02-18** |
| 966598 | 1339 | 191 | 4353 | 2021-02-19 | 9999-12-31 |
I want to link two temp tables together, possibly using row_number and partitions, but I'm really not sure. Any advice is greatly appreciated.
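Based on your description, I assume that:
PRLID is the primary key of table PRL.
Grouping is based on (PRID, SiteID).
DateTerminated needs to be updated to the following row's DateEffective minus 1 day, where a following row exists.
The statement below uses MySQL 8 syntax (date_sub, UPDATE with a join) and snake_case column names, so adapt both to your schema and dialect.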
with cte as (
select prlid,
date_sub(lead(date_effective,1) over (partition by prid, site_id order by date_effective), interval 1 day) as new_date_terminated
from prl)
update prl as p
inner join cte c
using (prlid)
set p.date_terminated = c.new_date_terminated
where c.new_date_terminated is not null
and p.date_terminated <> c.new_date_terminated;
Outcome:
prlid |prid|site_gid|site_id|date_effective|date_terminated|
------+----+--------+-------+--------------+---------------+
895|1339| 30| 4353| 2010-04-10| 2021-02-18|
966598|1339| 191| 4353| 2021-02-19| 9999-12-31|
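The LEAD-based update can be reproduced with a small sketch. It assumes SQLite (3.25 or later, for window functions) in place of the original database, so date(..., '-1 day') replaces date_sub and a correlated subquery replaces the UPDATE join. The table layout mirrors the outcome shown above.

```python
import sqlite3

# SQLite is used purely as a runnable sandbox (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE prl (
        prlid INTEGER PRIMARY KEY, prid INTEGER, site_gid INTEGER,
        site_id INTEGER, date_effective TEXT, date_terminated TEXT)
""")
conn.executemany("INSERT INTO prl VALUES (?, ?, ?, ?, ?, ?)", [
    (895, 1339, 30, 4353, "2010-04-10", "9999-12-31"),
    (966598, 1339, 191, 4353, "2021-02-19", "9999-12-31"),
])

# For each row, look at the next date_effective within the same
# (prid, site_id) group and terminate the row one day before it.
conn.execute("""
    WITH cte AS (
        SELECT prlid,
               date(LEAD(date_effective) OVER (
                        PARTITION BY prid, site_id
                        ORDER BY date_effective),
                    '-1 day') AS new_date_terminated
        FROM prl
    )
    UPDATE prl
    SET date_terminated = (SELECT new_date_terminated
                           FROM cte
                           WHERE cte.prlid = prl.prlid)
    WHERE prlid IN (SELECT prlid
                    FROM cte
                    WHERE new_date_terminated IS NOT NULL)
""")

updated = conn.execute(
    "SELECT prlid, date_terminated FROM prl ORDER BY prlid").fetchall()
print(updated)
```

The newest row in each group has no following row, so LEAD returns NULL there and the WHERE clause leaves its 9999-12-31 sentinel untouched.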

Postgres array_agg each value at new string within one row

I have a query like this:
SELECT group_id, array_agg(element_id) FROM table
GROUP BY group_id;
As a result I get something like this:
group_id | array_agg
106 | {2147,2138,2144}
107 | {2132,2510,2139}
What query should I write so that the result is displayed like this:
group_id | array_agg
106 | {2147
| 2138
| 2144}
107 | {2132
| 2510
| 2139}
Ideally you would format the output in the client application. However, you can use string_agg() with a newline character as the separator:
select group_id, string_agg(element_id::text, e'\n')
from my_table
group by group_id;
group_id | string_agg
----------+------------
106 | 2147 +
| 2138 +
| 2144
107 | 2132 +
| 2510 +
| 2139
(2 rows)
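For experimenting without a PostgreSQL server, here is a rough equivalent that assumes SQLite, where group_concat() with char(10) as the separator plays the role of string_agg(..., e'\n'). The table and column names follow the answer above.

```python
import sqlite3

# SQLite stand-in: group_concat(x, sep) is its counterpart of string_agg.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (group_id INTEGER, element_id INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [
    (106, 2147), (106, 2138), (106, 2144),
    (107, 2132), (107, 2510), (107, 2139),
])

# char(10) is the newline character, so each element lands on its own line.
rows = conn.execute("""
    SELECT group_id, group_concat(element_id, char(10))
    FROM my_table
    GROUP BY group_id
    ORDER BY group_id
""").fetchall()

for group_id, joined in rows:
    print(group_id)
    print(joined)
```

The order of elements inside each aggregated string is not guaranteed here. In PostgreSQL an ORDER BY inside the aggregate call, such as string_agg(element_id::text, e'\n' order by element_id), pins it down.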

Removing clusters of duplicates in a query resultset

I have the following query returning the following results:
db=# SELECT t1.id as id1, t2.id as id2
db-# FROM table_1 As t1, table_2 As t2
db-# WHERE ST_DWithin(t2.lonlat, t1.lonlat, t2.range)
db-# ORDER BY t1.id, t2.id, ST_Distance(t2.lonlat, t1.lonlat);
id1 | id2
-------+------
4499 | 1118
4500 | 1118
4501 | 1119
4502 | 1119
4503 | 1118
4504 | 1118
4505 | 1119
4506 | 1119
4507 | 1118
4508 | 1118
4510 | 1118
4511 | 1118
4514 | 1117
4515 | 1117
4518 | 1117
4519 | 1117
4522 | 1117
4523 | 1117
4603 | 1116
4604 | 1116
4607 | 1116
And I want the resultset to look like this:
id1 | id2
-------+------
4499 | 1118
4501 | 1119
4503 | 1118
4505 | 1119
4507 | 1118
4514 | 1117
4603 | 1116
Essentially, the query returns duplicates of id2. It is fine for id2 to occur many times in the results, but not for it to be repeated within a cluster of consecutive rows.
The use case here is that id1 represents the ID of a table of GPS positions, while id2 represents a table of waypoints, and I want to have a query that returns the closest passing point to any waypoint (so if waypoint #1118 is passed, then it cannot be passed again until another waypoint is passed).
Is there a way to make this happen using Postgres?
This is a gaps-and-islands problem, but rather subtle. In this case, you only want the rows where the previous row has a different id2. That suggests using LAG():
SELECT id1, id2
FROM (SELECT tt.*, LAG(id2) OVER (ORDER BY id1, id2, dist) as prev_id2
FROM (SELECT t1.id as id1, t2.id as id2,
ST_Distance(t2.lonlat, t1.lonlat) as dist
FROM table_1 t1 JOIN
table_2 t2
ON ST_DWithin(t2.lonlat, t1.lonlat, t2.range)
) tt
) tt
WHERE prev_id2 is distinct from id2
ORDER BY id1, id2, dist;
Note: I think the logic as presented could be simplified because id1 seems unique. Hence the distance calculation seems entirely superfluous. I left that logic in because it might be relevant in your actual query.
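The LAG() filter can be checked against the question's rows with a short sketch. It assumes SQLite (3.25 or later) instead of PostGIS, takes the joined (id1, id2) pairs as already computed in a hypothetical table named matches, and drops the distance because id1 is unique in the sample. SQLite's IS NOT is used as the null-safe counterpart of IS DISTINCT FROM.

```python
import sqlite3

# The (id1, id2) pairs from the question, as if the spatial join had
# already been materialised (the table name "matches" is hypothetical).
pairs = [
    (4499, 1118), (4500, 1118), (4501, 1119), (4502, 1119),
    (4503, 1118), (4504, 1118), (4505, 1119), (4506, 1119),
    (4507, 1118), (4508, 1118), (4510, 1118), (4511, 1118),
    (4514, 1117), (4515, 1117), (4518, 1117), (4519, 1117),
    (4522, 1117), (4523, 1117), (4603, 1116), (4604, 1116),
    (4607, 1116),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (id1 INTEGER, id2 INTEGER)")
conn.executemany("INSERT INTO matches VALUES (?, ?)", pairs)

# Keep a row only when the previous row (in id1 order) has a different id2.
# The very first row has prev_id2 = NULL, which IS NOT treats as different.
first_of_each_run = conn.execute("""
    SELECT id1, id2
    FROM (SELECT id1, id2,
                 LAG(id2) OVER (ORDER BY id1, id2) AS prev_id2
          FROM matches) AS tt
    WHERE prev_id2 IS NOT id2
    ORDER BY id1, id2
""").fetchall()

for row in first_of_each_run:
    print(row)
```

A plain prev_id2 <> id2 comparison would silently drop the first row, because comparing with NULL yields NULL rather than true.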

How do I calculate minimum and maximum for groups in a sequence in SQL Server?

I have the following data in my database table in SQL Server:
Id Date Val_A Val_B Val_C Avg Vector MINMAXPOINTS
329 2016-01-15 78.09 68.40 70.29 76.50 BELOW 68.40
328 2016-01-14 79.79 75.40 76.65 76.67 BELOW 75.40
327 2016-01-13 81.15 74.59 79.00 76.44 ABOVE 81.15
326 2016-01-12 81.95 77.04 78.95 76.04 ABOVE 81.95
325 2016-01-11 82.40 73.65 81.34 75.47 ABOVE 82.40
324 2016-01-08 78.75 73.40 77.20 74.47 ABOVE 78.75
323 2016-01-07 76.40 72.29 72.95 73.74 BELOW 72.29
322 2016-01-06 81.25 77.70 78.34 73.12 ABOVE 81.25
321 2016-01-05 81.75 76.34 80.54 72.08 ABOVE 81.75
320 2016-01-04 80.95 75.15 76.29 70.86 ABOVE 80.95
The column MINMAXPOINTS should actually contain the lowest Val_B within each consecutive run of rows where Vector is 'BELOW', and the highest Val_A within each consecutive run where Vector is 'ABOVE'. So, we would have the following values in MINMAXPOINTS:
MINMAXPOINTS
68.40
68.40
82.40
82.40
82.40
82.40
72.29
81.75
81.75
81.75
Is it possible without a cursor?
Any help will be greatly appreciated!
First apply the classic gaps-and-islands technique to identify the groups (consecutive runs of ABOVE or BELOW), then calculate MIN and MAX for each group.
I assume that the Id column defines the order of rows.
Tested on SQL Server 2008.
Sample data
CREATE TABLE #T
([Id] int, [dt] date, [Val_A] float, [Val_B] float, [Val_C] float, [Avg] float,
[Vector] varchar(5));
INSERT INTO #T ([Id], [dt], [Val_A], [Val_B], [Val_C], [Avg], [Vector]) VALUES
(329, '2016-01-15', 78.09, 68.40, 70.29, 76.50, 'BELOW'),
(328, '2016-01-14', 79.79, 75.40, 76.65, 76.67, 'BELOW'),
(327, '2016-01-13', 81.15, 74.59, 79.00, 76.44, 'ABOVE'),
(326, '2016-01-12', 81.95, 77.04, 78.95, 76.04, 'ABOVE'),
(325, '2016-01-11', 82.40, 73.65, 81.34, 75.47, 'ABOVE'),
(324, '2016-01-08', 78.75, 73.40, 77.20, 74.47, 'ABOVE'),
(323, '2016-01-07', 76.40, 72.29, 72.95, 73.74, 'BELOW'),
(322, '2016-01-06', 81.25, 77.70, 78.34, 73.12, 'ABOVE'),
(321, '2016-01-05', 81.75, 76.34, 80.54, 72.08, 'ABOVE'),
(320, '2016-01-04', 80.95, 75.15, 76.29, 70.86, 'ABOVE');
Query
To better understand how it works, examine the results of each CTE.
CTE_RowNumbers calculates two sequences of row numbers.
CTE_Groups assigns a number for each group (above/below).
CTE_MinMax calculates MIN/MAX for each group.
Final SELECT picks MIN or MAX to return.
WITH
CTE_RowNumbers
AS
(
SELECT [Id], [dt], [Val_A], [Val_B], [Val_C], [Avg], [Vector]
,ROW_NUMBER() OVER (ORDER BY ID DESC) AS rn1
,ROW_NUMBER() OVER (PARTITION BY Vector ORDER BY ID DESC) AS rn2
FROM #T
)
,CTE_Groups
AS
(
SELECT [Id], [dt], [Val_A], [Val_B], [Val_C], [Avg], [Vector]
,rn1-rn2 AS Groups
FROM CTE_RowNumbers
)
,CTE_MinMax
AS
(
SELECT [Id], [dt], [Val_A], [Val_B], [Val_C], [Avg], [Vector]
,MAX(Val_A) OVER(PARTITION BY Vector, Groups) AS MaxA
,MIN(Val_B) OVER(PARTITION BY Vector, Groups) AS MinB
FROM CTE_Groups
)
SELECT [Id], [dt], [Val_A], [Val_B], [Val_C], [Avg], [Vector]
,CASE
WHEN [Vector] = 'BELOW' THEN MinB
WHEN [Vector] = 'ABOVE' THEN MaxA
END AS MINMAXPOINTS
FROM CTE_MinMax
ORDER BY ID DESC;
Result
+-----+------------+-------+-------+-------+-------+--------+--------------+
| Id | dt | Val_A | Val_B | Val_C | Avg | Vector | MINMAXPOINTS |
+-----+------------+-------+-------+-------+-------+--------+--------------+
| 329 | 2016-01-15 | 78.09 | 68.4 | 70.29 | 76.5 | BELOW | 68.4 |
| 328 | 2016-01-14 | 79.79 | 75.4 | 76.65 | 76.67 | BELOW | 68.4 |
| 327 | 2016-01-13 | 81.15 | 74.59 | 79 | 76.44 | ABOVE | 82.4 |
| 326 | 2016-01-12 | 81.95 | 77.04 | 78.95 | 76.04 | ABOVE | 82.4 |
| 325 | 2016-01-11 | 82.4 | 73.65 | 81.34 | 75.47 | ABOVE | 82.4 |
| 324 | 2016-01-08 | 78.75 | 73.4 | 77.2 | 74.47 | ABOVE | 82.4 |
| 323 | 2016-01-07 | 76.4 | 72.29 | 72.95 | 73.74 | BELOW | 72.29 |
| 322 | 2016-01-06 | 81.25 | 77.7 | 78.34 | 73.12 | ABOVE | 81.75 |
| 321 | 2016-01-05 | 81.75 | 76.34 | 80.54 | 72.08 | ABOVE | 81.75 |
| 320 | 2016-01-04 | 80.95 | 75.15 | 76.29 | 70.86 | ABOVE | 81.75 |
+-----+------------+-------+-------+-------+-------+--------+--------------+
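The whole pipeline can be replayed on the sample data with a compact sketch. It assumes SQLite (3.25 or later) rather than SQL Server, keeps only the columns the calculation needs, and uses hypothetical names (t, grp, numbered, minmax). The windows are partitioned by both vector and grp, since the rn1 - rn2 difference alone can coincide between an ABOVE run and a BELOW run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, val_a REAL, val_b REAL, vector TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (329, 78.09, 68.40, "BELOW"), (328, 79.79, 75.40, "BELOW"),
    (327, 81.15, 74.59, "ABOVE"), (326, 81.95, 77.04, "ABOVE"),
    (325, 82.40, 73.65, "ABOVE"), (324, 78.75, 73.40, "ABOVE"),
    (323, 76.40, 72.29, "BELOW"), (322, 81.25, 77.70, "ABOVE"),
    (321, 81.75, 76.34, "ABOVE"), (320, 80.95, 75.15, "ABOVE"),
])

sql = """
WITH numbered AS (
    SELECT id, val_a, val_b, vector,
           ROW_NUMBER() OVER (ORDER BY id DESC) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY vector ORDER BY id DESC) AS rn2
    FROM t
),
minmax AS (
    -- rn1 - rn2 stays constant within one consecutive run of a vector value
    SELECT id, vector,
           MAX(val_a) OVER (PARTITION BY vector, rn1 - rn2) AS max_a,
           MIN(val_b) OVER (PARTITION BY vector, rn1 - rn2) AS min_b
    FROM numbered
)
SELECT id,
       CASE vector WHEN 'BELOW' THEN min_b
                   WHEN 'ABOVE' THEN max_a
       END AS minmaxpoints
FROM minmax
ORDER BY id DESC
"""

result = conn.execute(sql).fetchall()
for row in result:
    print(row)
```

Each run collapses to a single value: the run's lowest val_b for BELOW rows and its highest val_a for ABOVE rows, matching the MINMAXPOINTS column requested in the question.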
Modified query, in which each subquery only considers the rows whose ROWID is greater than or equal to the current row's:
You can use the query below. Its CASE expression selects a value for each row depending on that row's Vector value (TABLE is a placeholder for your table name).
The query is:
SELECT ID, DATE, VAL_A, VAL_B, VAL_C, AVG, VECTOR,
CASE
WHEN VECTOR = 'BELOW' THEN (SELECT MIN(VAL_B) FROM TABLE A WHERE ROWID >= B.ROWID)
WHEN VECTOR = 'ABOVE' THEN (SELECT MAX(VAL_A) FROM TABLE A WHERE ROWID >= B.ROWID)
END AS MINMAXVALUE
FROM TABLE B
GO
This should yield the result you expect from the data.
You can use the query below. Its CASE expression selects a value for each row depending on that row's Vector value.
The query is:
SELECT ID, DATE, VAL_A, VAL_B, VAL_C, AVG, VECTOR,
CASE
WHEN VECTOR = 'BELOW' THEN (SELECT MIN(VAL_B) FROM TABLE A)
WHEN VECTOR = 'ABOVE' THEN (SELECT MAX(VAL_A) FROM TABLE A)
END AS MINMAXVALUE
FROM TABLE B
GO
Check whether this helps you.

Conditional Partition

I have a table (Employee_Training) that has the following columns:
Employee_Number
Course_ID
Date_Completed
I have this query that I use to show training, and it filters out the duplicate Date_Completed, only showing the most recent date:
SELECT x.*
FROM (SELECT t.*, ROW_NUMBER() OVER
(PARTITION BY t.Course_ID, t.Employee_Number
ORDER BY t.Date_Completed DESC) AS rank
FROM Employee_Training t) x
WHERE x.rank = 1
Is there any way to write this query so that the partition filter is not applied to a specific Course_ID, say 1000004? I would want to see all the rows where Course_ID = 1000004.
Here is some sample data:
Just using a select all on that table:
557 | 1000002 | 2014-11-18
557 | 1000002 | 2009-7-6
557 | 1000004 | 2011-1-15
557 | 1000004 | 2005-9-22
557 | 1000004 | 2004-4-17
557 | 1000010 | 2014-6-10
557 | 1000010 | 2013-6-09
557 | 1000010 | 2012-6-10
Using my original query I get these results:
557 | 1000002 | 2014-11-18
557 | 1000004 | 2011-1-15
557 | 1000010 | 2014-6-10
What I would like to see (Only the 1000004 not being filtered out):
557 | 1000002 | 2014-11-18
557 | 1000004 | 2011-1-15
557 | 1000004 | 2005-9-22
557 | 1000004 | 2004-4-17
557 | 1000010 | 2014-6-10
Thank you.
You could exclude those rows from your row_number partition and union them back on at the end.
SELECT x.*
FROM (SELECT t.*, ROW_NUMBER() OVER
(PARTITION BY t.Course_ID, t.Employee_Number
ORDER BY t.Date_Completed DESC) AS rank
FROM Employee_Training t
WHERE course_id!=1000004) x
WHERE x.rank = 1
UNION ALL
SELECT t.*,1 as rank
FROM Employee_Training t
WHERE course_id=1000004
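As an alternative to the UNION ALL, a single pass can number every row and then let the exempt course through the filter with an OR. The sketch below is a swapped-in variant, not the answer's query. It assumes SQLite (3.25 or later), a hypothetical helper named latest_per_course, and the sample dates rewritten in ISO format so that text ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee_Training (
        Employee_Number INTEGER, Course_ID INTEGER, Date_Completed TEXT)
""")
conn.executemany("INSERT INTO Employee_Training VALUES (?, ?, ?)", [
    (557, 1000002, "2014-11-18"), (557, 1000002, "2009-07-06"),
    (557, 1000004, "2011-01-15"), (557, 1000004, "2005-09-22"),
    (557, 1000004, "2004-04-17"),
    (557, 1000010, "2014-06-10"), (557, 1000010, "2013-06-09"),
    (557, 1000010, "2012-06-10"),
])


def latest_per_course(conn, keep_all_course_id):
    """Latest completion per (course, employee), except that every row of
    keep_all_course_id is returned (hypothetical helper)."""
    sql = """
        SELECT Employee_Number, Course_ID, Date_Completed
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (
                         PARTITION BY t.Course_ID, t.Employee_Number
                         ORDER BY t.Date_Completed DESC) AS rn
              FROM Employee_Training t) x
        WHERE x.rn = 1 OR x.Course_ID = ?
        ORDER BY Course_ID, Date_Completed DESC
    """
    return conn.execute(sql, (keep_all_course_id,)).fetchall()


training = latest_per_course(conn, 1000004)
for row in training:
    print(row)
```

Compared with the UNION ALL form, this reads the table once and keeps the real row number for the exempt course instead of a constant 1, which may or may not matter to the caller.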