Finding difference within same table - sql

I am trying to find the differences within a table, depending on a specific column.
I have a table which looks something like this:
+----+------+------+
| rn | P_id | D_id |
+----+------+------+
| 1  | 8    | 20   |
| 2  | 13   | 20   |
| 3  | 8    | 21   |
| 4  | 13   | 21   |
| 5  | 15   | 21   |
| 6  | 17   | 21   |
+----+------+------+
So, I want to get the P_id of rows where D_id is equal to 21 and the P_id is unique (by unique I mean there shouldn't be a row with the same P_id where D_id is equal to 20).
For example, in the table shown, the expected result is P_id 15 and 17.
I would like to get this result using a JOIN if possible.
EDIT: I am using MSSQL. To clear up some confusion, let me explain the situation. Imagine D_id represents a download ID (21 is the latest download, 20 is the old one), so I simply want to compare the data from the old download with the new one and see whether any new data has been added.
In this case the new ones are the records where P_id is 15 and 17.

WITH CTE(RN,P_ID,D_ID) AS
(
SELECT 1 , 8 , 20 UNION ALL
SELECT 2 , 13 , 20 UNION ALL
SELECT 3 , 8 , 21 UNION ALL
SELECT 4 , 13 , 21 UNION ALL
SELECT 5 ,15 , 21 UNION ALL
SELECT 6 , 17, 21
)
SELECT C.P_ID
FROM CTE AS C
GROUP BY C.P_ID
HAVING MAX(C.D_ID)=MIN(C.D_ID) AND MAX(C.D_ID)=21

If you have to use a join, you should use a subquery in which you fetch the number of occurrences of a P_id.
This query fetches the number of occurrences:
SELECT `P_id`, COUNT(`rn`) AS `cnt`
FROM `table`
GROUP BY `P_id`;
That would mean that the entire query becomes something like this:
SELECT t.`P_id`
FROM `table` t
INNER JOIN ( SELECT `P_id`, COUNT(`rn`) AS `cnt`
FROM `table`
GROUP BY `P_id` ) c
ON c.`P_id` = t.`P_id`
WHERE t.`D_id` = 21 AND c.`cnt` = 1;
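Since the question asks for a JOIN, another common way to express "in download 21 but not in download 20" is an anti-join: LEFT JOIN the new rows to the old ones and keep those with no match. The sketch below is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server, and the table name downloads is made up.

```python
import sqlite3

# SQLite stands in for SQL Server here; the table name "downloads" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE downloads (rn INTEGER, P_id INTEGER, D_id INTEGER)")
conn.executemany(
    "INSERT INTO downloads VALUES (?, ?, ?)",
    [(1, 8, 20), (2, 13, 20), (3, 8, 21), (4, 13, 21), (5, 15, 21), (6, 17, 21)],
)

# Anti-join: keep rows of download 21 that have no partner row in download 20.
new_p_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT n.P_id
        FROM downloads AS n
        LEFT JOIN downloads AS o
               ON o.P_id = n.P_id AND o.D_id = 20
        WHERE n.D_id = 21
          AND o.P_id IS NULL
        ORDER BY n.P_id
        """
    )
]
print(new_p_ids)  # [15, 17]
```

The join condition carries the D_id = 20 filter so that unmatched rows survive the LEFT JOIN; moving that filter into the WHERE clause would turn it back into an inner join.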

Related

SQL - Need to efficiently pair two entities by minimum distance apart within a group

For this example, I have one table which includes a list of people, a group category, and the location for each individual person (Long/Lat coordinates). A single individual can be in multiple groups. Here is an example table:
Person Group Long Lat
1 1 11 23
2 1 12 24
. . . .
. . . .
. . . .
2 2 12 24
I have another table which lists Businesses, their locations, and a shared group that matches the groupings in the first table. Again, a Business can be in multiple groups. Example table:
Busns Group Long Lat
5 1 5 6
6 1 6 7
. . . .
. . . .
. . . .
5 2 5 6
I want to match, by Person and by Group, the Business at the minimum distance. This is proving to be a very memory-intensive task the way I have it. Currently I create an enormous table through a RIGHT JOIN, which measures the distance between a person and a business for every group. Then I create another table that finds the minimum distance for each person in a group, and finally I do an INNER JOIN to pare the original table down. Example code:
DROP TABLE IF EXISTS DistancePairs;
CREATE LOCAL TEMPORARY TABLE DistancePairs ON COMMIT PRESERVE ROWS AS (
SELECT a.Person
,a.Group
,b.Business
,a.Latitude AS PersonLat
,a.Longitude AS PersonLong
,b.Latitude AS BusinessLat
,b.Longitude AS BusinessLong
,0.621371*DISTANCEV(a.Latitude,a.Longitude,b.Latitude,b.Longitude) AS AproxDistance
FROM people a
RIGHT JOIN business b
ON a.Group = b.Group
);
DROP TABLE IF EXISTS MinDist;
CREATE LOCAL TEMPORARY TABLE MinDist ON COMMIT PRESERVE ROWS AS (
SELECT Person
,Group
,MIN(AproxDistance) AS AproxDistance
FROM DistancePairs
GROUP BY Person
,Group
);
SELECT a.Person
,a.Group
,a.Business
,a.AproxDistance
FROM DistancePairs a
JOIN MinDist b
ON a.Person = b.Person
AND a.Group = b.Group
AND a.AproxDistance = b.AproxDistance
;
Is there a better way to do this? This performs terribly and runs for hours given the size of the data set I'm using. The original Person and Business tables have already been created using WHERE statements to limit their size.
Could you try formulating the query with a join, followed by an analytic LIMIT clause?
I only have your little bit of example data, so I can't really test it for sense or nonsense. But here goes:
WITH
-- this is your input data ...
persons ( Person, grp, Long, Lat ) AS (
SELECT 1 , 1 , 11 , 23
UNION ALL SELECT 2 , 1 , 12 , 24
UNION ALL SELECT 2 , 2 , 12 , 24
)
,
-- and this, is also your input data ....
businesses (Busns, grp, Long, Lat) AS (
SELECT 5 , 1 , 5 , 6
UNION ALL SELECT 6 , 1 , 6 , 7
UNION ALL SELECT 5 , 2 , 5 , 6
)
,
-- real WITH clause would start here ....
join_and_calc AS (
SELECT
person
, p.grp
, busns
, p.lat
, p.long
, b.lat
, b.long
, 0.621371 * DISTANCEV(p.lat,p.long,b.lat,b.long) AS app_dist
FROM persons p
JOIN businesses b USING(grp)
)
SELECT
*
FROM join_and_calc
LIMIT 1 OVER(PARTITION BY person,grp ORDER BY app_dist)
;
Partitioned by person and grp, this keeps only the nearest business for each person in each group:
person | grp | busns | lat | long | lat | long | app_dist
--------+-----+-------+-----+------+-----+------+------------------
1 | 1 | 6 | 23 | 11 | 7 | 6 | 1149.36524763703
2 | 1 | 6 | 24 | 12 | 7 | 6 | 1234.90557929051
2 | 2 | 5 | 24 | 12 | 6 | 5 | 1322.28298287477
Good luck -
Marco
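The same "nearest business per person and group" idea can be tried outside Vertica with a plain ROW_NUMBER() window. This is a rough sketch under assumptions, not a drop-in replacement: it uses SQLite (3.25 or later, for window functions) through Python, the column names lon and lat are made up, and a squared coordinate difference stands in for DISTANCEV, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (person INTEGER, grp INTEGER, lon REAL, lat REAL)")
conn.execute("CREATE TABLE businesses (busns INTEGER, grp INTEGER, lon REAL, lat REAL)")
conn.executemany("INSERT INTO persons VALUES (?, ?, ?, ?)",
                 [(1, 1, 11, 23), (2, 1, 12, 24), (2, 2, 12, 24)])
conn.executemany("INSERT INTO businesses VALUES (?, ?, ?, ?)",
                 [(5, 1, 5, 6), (6, 1, 6, 7), (5, 2, 5, 6)])

# Rank the businesses of each (person, grp) pair by distance and keep rank 1.
# The squared coordinate difference is only a stand-in for a real distance function.
nearest = conn.execute(
    """
    SELECT person, grp, busns
    FROM (
        SELECT p.person, p.grp, b.busns,
               ROW_NUMBER() OVER (
                   PARTITION BY p.person, p.grp
                   ORDER BY (p.lon - b.lon) * (p.lon - b.lon)
                          + (p.lat - b.lat) * (p.lat - b.lat)
               ) AS rn
        FROM persons AS p
        JOIN businesses AS b ON b.grp = p.grp
    )
    WHERE rn = 1
    ORDER BY person, grp
    """
).fetchall()
print(nearest)  # [(1, 1, 6), (2, 1, 6), (2, 2, 5)]
```

Doing the ranking in one pass avoids materialising the full pair table and the separate minimum table before joining them back together.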

How to count all the connected nodes (rows) in a graph on Postgres?

My table has account_id and device_id. One account_id could have multiple device_ids and vice versa. I am trying to count the depth of each connected many-to-many relationship.
Ex:
account_id | device_id
1 | 10
1 | 11
1 | 12
2 | 10
3 | 11
3 | 13
3 | 14
4 | 15
5 | 15
6 | 16
How do I construct a query that knows to combine accounts 1-3 together, 4-5 together, and leave 6 by itself? All 7 entries of accounts 1-3 should be grouped together because they all touched the same account_id or device_id at some point. I am trying to group them together and output the count.
Account 1 was used on devices 10, 11 and 12. Those devices were used by other accounts too, so we want to include them in the group: they bring in accounts 2 and 3. But account 3 was also used on 2 more devices, so we include those as well. The expansion of the group brings in any other account or device that also "touched" an account or device already in the group.
A diagram is shown below:
You can use a recursive cte:
with recursive t(account_id, device_id) as (
select 1, 10 union all
select 1, 11 union all
select 1, 12 union all
select 2, 10 union all
select 3, 11 union all
select 3, 13 union all
select 3, 14 union all
select 4, 15 union all
select 5, 15 union all
select 6, 16
),
a as (
select distinct t.account_id as a, t2.account_id as a2
from t join
t t2
on t2.device_id = t.device_id and t.account_id >= t2.account_id
),
cte as (
select a.a, a.a2 as mina
from a
union all
select a.a, cte.a
from cte join
a
on a.a2 = cte.a and a.a > cte.a
)
select grp, array_agg(a)
from (select a, min(mina) as grp
from cte
group by a
) a
group by grp;
Here is a SQL Fiddle.
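For experimenting with the grouping logic without a Postgres instance, here is a minimal sketch of the same flood-fill idea. It is an illustration under assumptions: SQLite through Python stands in for Postgres, the table name graph and the CTE names reach and label are made up, and UNION (rather than UNION ALL) is what stops the recursion once no new pairs appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graph (account_id INTEGER, device_id INTEGER)")
conn.executemany(
    "INSERT INTO graph VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12), (2, 10), (3, 11),
     (3, 13), (3, 14), (4, 15), (5, 15), (6, 16)],
)

# reach: every account reachable from each starting account via shared devices.
# label: each account is labelled with the smallest account that can reach it.
group_sizes = conn.execute(
    """
    WITH RECURSIVE reach(root, account_id) AS (
        SELECT account_id, account_id FROM graph
        UNION
        SELECT r.root, g2.account_id
        FROM reach AS r
        JOIN graph AS g1 ON g1.account_id = r.account_id
        JOIN graph AS g2 ON g2.device_id = g1.device_id
    ),
    label AS (
        SELECT account_id, MIN(root) AS grp
        FROM reach
        GROUP BY account_id
    )
    SELECT l.grp, COUNT(*) AS n_rows
    FROM graph AS g
    JOIN label AS l ON l.account_id = g.account_id
    GROUP BY l.grp
    ORDER BY l.grp
    """
).fetchall()
print(group_sizes)  # [(1, 7), (4, 2), (6, 1)]
```

The reach table can grow quadratically with the size of a group, so this is meant for checking results on small data rather than as a tuned solution.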
You can GROUP BY the device_id and then aggregate together the account_id into a Postgres array. Here is an example query, although I'm not sure what your actual table name is.
SELECT
device_id,
array_agg(account_id) as account_ids
FROM account_device --or whatever your table name is
GROUP BY device_id;
Results - hope it's what you're looking for:
16 | {6}
15 | {4,5}
13 | {3}
11 | {1,3}
14 | {3}
12 | {1}
10 | {1,2}
-- \i tmp.sql
CREATE TABLE graph(
account_id integer NOT NULL --references accounts(id)
, device_id integer NOT NULL -- references devices(id)
,PRIMARY KEY(account_id, device_id)
);
INSERT INTO graph (account_id, device_id)VALUES
(1,10) ,(1,11) ,(1,12)
,(2,10)
,(3,11) ,(3,13) ,(3,14)
,(4,15)
,(5,15)
,(6,16)
;
-- SELECT* FROM graph ;
-- Find the (3) group leaders
WITH seed AS (
SELECT row_number() OVER () AS cluster_id -- give them a number
, g.account_id
, g.device_id
FROM graph g
WHERE NOT EXISTS (SELECT*
FROM graph nx
WHERE (nx.account_id = g.account_id OR nx.device_id = g.device_id)
AND nx.ctid < g.ctid
)
)
SELECT * FROM seed
;
WITH recursive omg AS (
--use the above CTE in a sub-CTE
WITH seed AS (
SELECT row_number()OVER () AS cluster_id
, g.account_id
, g.device_id
, g.ctid AS wtf --we need an (ordered!) canonical id for the tuples
-- ,just to identify and exclude them
FROM graph g
WHERE NOT EXISTS (SELECT*
FROM graph nx
WHERE (nx.account_id = g.account_id OR nx.device_id = g.device_id) AND nx.ctid < g.ctid
)
)
SELECT s.cluster_id
, s.account_id
, s.device_id
, s.wtf
FROM seed s
UNION ALL
SELECT o.cluster_id
, g.account_id
, g.device_id
, g.ctid AS wtf
FROM omg o
JOIN graph g ON (g.account_id = o.account_id OR g.device_id = o.device_id)
-- AND (g.account_id > o.account_id OR g.device_id > o.device_id)
AND g.ctid > o.wtf
-- we still need to exclude duplicates here
-- (which could occur if there are cycles in the graph)
-- , this could be done using an array
)
SELECT *
FROM omg
ORDER BY cluster_id, account_id,device_id
;
Results:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 10
cluster_id | account_id | device_id
------------+------------+-----------
1 | 1 | 10
2 | 4 | 15
3 | 6 | 16
(3 rows)
cluster_id | account_id | device_id | wtf
------------+------------+-----------+--------
1 | 1 | 10 | (0,1)
1 | 1 | 11 | (0,2)
1 | 1 | 12 | (0,3)
1 | 1 | 12 | (0,3)
1 | 2 | 10 | (0,4)
1 | 3 | 11 | (0,5)
1 | 3 | 13 | (0,6)
1 | 3 | 14 | (0,7)
1 | 3 | 14 | (0,7)
2 | 4 | 15 | (0,8)
2 | 5 | 15 | (0,9)
3 | 6 | 16 | (0,10)
(12 rows)
Newer version (I added an Id column to the table)
-- for convenience :set of all adjacent nodes.
CREATE TEMP VIEW pair AS
SELECT one.id AS one
, two.id AS two
FROM graph one
JOIN graph two ON (one.account_id = two.account_id OR one.device_id = two.device_id)
AND one.id <> two.id
;
WITH RECURSIVE flood AS (
SELECT g.id, g.id AS parent_id
, 0 AS lev
, ARRAY[g.id]AS arr
FROM graph g
UNION ALL
SELECT c.id, p.parent_id AS parent_id
, 1+p.lev AS lev
, p.arr || ARRAY[c.id] AS arr
FROM graph c
JOIN flood p ON EXISTS (
SELECT * FROM pair WHERE p.id = pair.one AND c.id = pair.two)
AND p.parent_id < c.id
AND NOT p.arr @> ARRAY[c.id] -- avoid cycles/loops
)
SELECT g.*, a.parent_id
, dense_rank() over (ORDER by a.parent_id)AS group_id
FROM graph g
JOIN (SELECT id, MIN(parent_id) AS parent_id
FROM flood
GROUP BY id
) a
ON g.id = a.id
ORDER BY a.parent_id, g.id
;
New results:
CREATE VIEW
id | account_id | device_id | parent_id | group_id
----+------------+-----------+-----------+----------
1 | 1 | 10 | 1 | 1
2 | 1 | 11 | 1 | 1
3 | 1 | 12 | 1 | 1
4 | 2 | 10 | 1 | 1
5 | 3 | 11 | 1 | 1
6 | 3 | 13 | 1 | 1
7 | 3 | 14 | 1 | 1
8 | 4 | 15 | 8 | 2
9 | 5 | 15 | 8 | 2
10 | 6 | 16 | 10 | 3
(10 rows)

PostgreSQL LATERAL JOIN to LIMIT GROUP BY

Sorry, I just can't get the lateral join to work!
I have a table like this:
ID | NUMBER | VALUE
-------------------
20 | 12 | 0.7
21 | 12 | 0.8
22 | 13 | 0.8
23 | 13 | 0.7
24 | 13 | 0.9
25 | Null | 0.9
Now I would like to get the first 2 rows for each NUMBER sorted by decreasing order of VALUE.
ID | NUMBER | VALUE
-------------------
21 | 12 | 0.8
20 | 12 | 0.7
24 | 13 | 0.9
22 | 13 | 0.8
The code I tried so far looks like this:
(Found: Grouped LIMIT in PostgreSQL: show the first N rows for each group?)
SELECT DISTINCT t_outer.id, t_top.number, t_top.value
FROM table t_outer
JOIN LATERAL (
SELECT * FROM table t_inner
WHERE t_inner.number NOTNULL
AND t_inner.id = t_outer.id
AND t_inner.number = t_outer.number
ORDER BY t_inner.value DESC
LIMIT 2
) t_top ON TRUE
order by t_outer.value DESC;
Everything is fine so far, it just seems like the LIMIT 2 is not working. I get all the rows for all NUMBER elements back.
Make use of the row_number() window (analytic) function:
Rextester Demo
select "ID", "NUMBER", "VALUE" from
(select t.*
,row_number() over (partition by "NUMBER"
order by "VALUE" desc
) as rno
from table1 t
) t1
where t1.rno <=2;
Output
ID NUMBER VALUE
21 12 0,8000
20 12 0,7000
24 13 0,9000
22 13 0,8000
25 NULL 0,9000
Explanation:
The inner query (aliased t1) assigns rno within each NUMBER group, ordered by VALUE descending. The outer query then keeps the rows with rno <= 2 to get your output.
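As for why the LIMIT 2 in the question seems to do nothing: if ID is unique, as it appears to be, correlating the inner query on id as well as number means each subquery can only ever return its own row. A sketch of the correlated form that matches on number alone is below. It is an illustration under assumptions: SQLite through Python has no LATERAL, so a correlated IN subquery with LIMIT plays that role, and the table name readings is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, number INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(20, 12, 0.7), (21, 12, 0.8), (22, 13, 0.8),
     (23, 13, 0.7), (24, 13, 0.9), (25, None, 0.9)],
)

# For each outer row, the subquery lists the two best ids of the same number.
# The correlation is on number only; adding id would yield one row per subquery.
top_two = conn.execute(
    """
    SELECT o.id, o.number, o.value
    FROM readings AS o
    WHERE o.number IS NOT NULL
      AND o.id IN (
          SELECT i.id
          FROM readings AS i
          WHERE i.number = o.number
          ORDER BY i.value DESC
          LIMIT 2
      )
    ORDER BY o.number, o.value DESC
    """
).fetchall()
print(top_two)  # [(21, 12, 0.8), (20, 12, 0.7), (24, 13, 0.9), (22, 13, 0.8)]
```

Unlike the row_number() output above, the explicit IS NOT NULL filter leaves out the row whose NUMBER is NULL, which matches the result asked for in the question.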

SQL:How to dynamically return error code for records which doesn't exist in table

I am trying to replicate a workplace scenario. The sqlfiddle for Oracle db is not working so I couldn't recreate the table.
Say I have a table like below
Table1
+----+------+
| ID | Col1 |
+----+------+
| 1 | A |
| 2 | B |
| 3 | C |
+----+------+
Now we run a query with a WHERE condition. The values for the IN clause are supplied by the user at run time and can change.
Suppose user inputs 1,2,4,5
So the SQL will be like
select t.* from Table1 t where t.id in (1,2,4,5);
The result of this query will be
+----+------+
| ID | Col1 |
+----+------+
| 1 | A |
| 2 | B |
+----+------+
Now output I am expecting should be something like below
+----+---------+------+
| ID | ErrCode | Col1 |
+----+---------+------+
| 1 | 0 | A |
| 2 | 0 | B |
| 4 | 404 | |
| 5 | 404 | |
+----+---------+------+
As 3 was not entered by user, we will not return it. But for 4 and 5, there is no record in our table, so I want to create another dummy column which will contain error code. The data columns should be null.
It is not mandatory that the user input should go to in clause. We can use it anywhere in the query.
I am thinking of splitting the input ids and using them as rows, then left joining those rows with Table1 to find which records exist and which don't, with a CASE on that to choose between 0 and 404 as the error code.
I would appreciate any other way to do this in a query.
Here it goes
WITH table_filter AS
(SELECT regexp_substr(txt, '[^,]+', 1, LEVEL) id
FROM (SELECT '1,2,4,5' AS txt FROM dual) -- User input here
CONNECT BY regexp_substr(txt, '[^,]+', 1, LEVEL) IS NOT NULL),
table1 AS -- Sample data
(SELECT 1 id,
'A' col1
FROM dual
UNION ALL
SELECT 2,
'B'
FROM dual
UNION ALL
SELECT 3,
'C'
FROM dual)
SELECT f.id,
CASE
WHEN t.id IS NULL THEN
404
ELSE
0
END AS err_code,
t.col1
FROM table_filter f
LEFT OUTER JOIN table1 t
ON t.id = f.id;
ID ERR_CODE COL1
---------------------------- ---------- ----
1 0 A
2 0 B
5 404
4 404
Oracle Setup:
CREATE TABLE Table1 ( id, col1 ) AS
SELECT 1, 'A' FROM DUAL UNION ALL
SELECT 2, 'B' FROM DUAL;
Query:
SELECT i.COLUMN_VALUE AS id,
NVL2( t.col1, 0, 404 ) AS ErrCode,
t.col1
FROM TABLE( SYS.ODCINUMBERLIST( 1, 2, 4, 5 ) ) i
LEFT OUTER JOIN
Table1 t
ON ( i.COLUMN_VALUE = t.id );
Output:
ID ERRCODE COL1
-- ------- ----
1 0 A
2 0 B
4 404
5 404
The collection of ids can be built dynamically using PL/SQL or an external language and then passed as a bind variable. See my answer here for an example.
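To illustrate passing the ids as bound values from an external language, here is a small sketch of the same left-join pattern. It rests on assumptions: SQLite through Python stands in for Oracle, the function name lookup and the CTE name wanted are made up, and the id list is assumed to be non-empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, col1 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])


def lookup(conn, ids):
    """Return (id, err_code, col1) for every requested id, found or not."""
    # One "(?)" row per requested id; the values themselves are bound, not inlined.
    placeholders = ", ".join("(?)" for _ in ids)
    sql = f"""
        WITH wanted(id) AS (VALUES {placeholders})
        SELECT w.id,
               CASE WHEN t.id IS NULL THEN 404 ELSE 0 END AS err_code,
               t.col1
        FROM wanted AS w
        LEFT JOIN table1 AS t ON t.id = w.id
        ORDER BY w.id
    """
    return conn.execute(sql, list(ids)).fetchall()


result = lookup(conn, [1, 2, 4, 5])
print(result)  # [(1, 0, 'A'), (2, 0, 'B'), (4, 404, None), (5, 404, None)]
```

The requested ids drive the query and the table is joined onto them, which is what lets ids without a matching row still appear in the output.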

SQL query update by grouping

I'm dealing with some legacy data in an Oracle table and have the following
--------------------------------------------
| RefNo | ID |
--------------------------------------------
| FOO/BAR/BAZ/AAAAAAAAAA | 1 |
| FOO/BAR/BAZ/BBBBBBBBBB | 1 |
| FOO/BAR/BAZ/CCCCCCCCCC | 1 |
| FOO/BAR/BAZ/DDDDDDDDDD | 1 |
--------------------------------------------
For each of the /FOO/BAR/BAZ/% records I want to make the ID a Unique incrementing number.
Is there a method to do this in SQL?
Thanks in advance
EDIT
Sorry for not being specific. I have several groups of records, e.g. /FOO/BAR/BAZ/ and /FOO/ZZZ/YYY/. The same transformation needs to happen for each of these (example) groups. ROWNUM can't be used, because I want the ID to start from 1 and increment within each group of records I have to change.
Sorry for making a mess of my first post. The output should be:
--------------------------------------------
| RefNo | ID |
--------------------------------------------
| FOO/BAR/BAZ/AAAAAAAAAA | 1 |
| FOO/BAR/BAZ/BBBBBBBBBB | 2 |
| FOO/BAR/BAZ/CCCCCCCCCC | 3 |
| FOO/BAR/BAZ/DDDDDDDDDD | 4 |
| FOO/ZZZ/YYY/AAAAAAAAAA | 1 |
| FOO/ZZZ/YYY/BBBBBBBBBB | 2 |
--------------------------------------------
Let's try something like this(Oracle version 10g and higher):
with t1 as(
select 'FOO/BAR/BAZ/AAAAAAAAAA' as RefNo, 1 as ID from dual union all
select 'FOO/BAR/BAZ/BBBBBBBBBB', 1 from dual union all
select 'FOO/BAR/BAZ/CCCCCCCCCC', 1 from dual union all
select 'FOO/BAR/BAZ/DDDDDDDDDD', 1 from dual union all
select 'FOO/ZZZ/YYY/AAAAAAAAAA', 1 from dual union all
select 'FOO/ZZZ/YYY/BBBBBBBBBB', 1 from dual union all
select 'FOO/ZZZ/YYY/CCCCCCCCCC', 1 from dual union all
select 'FOO/ZZZ/YYY/DDDDDDDDDD', 1 from dual
)
select row_number() over(partition by ComPart order by DifPart) as id
, RefNo
From (select regexp_substr(RefNo, '[[:alpha:]]+$') as DifPart
, regexp_substr(RefNo, '([[:alpha:]]+/)+') as ComPart
, RefNo
, Id
from t1
) q
;
ID REFNO
---------- -----------------------
1 FOO/BAR/BAZ/AAAAAAAAAA
2 FOO/BAR/BAZ/BBBBBBBBBB
3 FOO/BAR/BAZ/CCCCCCCCCC
4 FOO/BAR/BAZ/DDDDDDDDDD
1 FOO/ZZZ/YYY/AAAAAAAAAA
2 FOO/ZZZ/YYY/BBBBBBBBBB
3 FOO/ZZZ/YYY/CCCCCCCCCC
4 FOO/ZZZ/YYY/DDDDDDDDDD
I think that actually updating the ID column wouldn't be a good idea: every time you add new groups of data you would have to run the update statement again. A better way would be to create a view, so that you see the desired output every time you query it.
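If a one-off renumbering is needed anyway, the per-group counter can also be written back with a correlated count. The following is only a sketch under assumptions: SQLite through Python stands in for Oracle, the table name legacy is made up, and prefix is a hypothetical helper registered from Python that returns everything before the last '/'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical helper: the group is everything before the last '/' in RefNo.
conn.create_function("prefix", 1, lambda ref: ref.rsplit("/", 1)[0])

conn.execute("CREATE TABLE legacy (RefNo TEXT, ID INTEGER)")
conn.executemany(
    "INSERT INTO legacy VALUES (?, 1)",
    [("FOO/BAR/BAZ/AAAAAAAAAA",), ("FOO/BAR/BAZ/BBBBBBBBBB",),
     ("FOO/BAR/BAZ/CCCCCCCCCC",), ("FOO/BAR/BAZ/DDDDDDDDDD",),
     ("FOO/ZZZ/YYY/AAAAAAAAAA",), ("FOO/ZZZ/YYY/BBBBBBBBBB",)],
)

# Each row's new ID is its 1-based position by RefNo within its own prefix group.
conn.execute(
    """
    UPDATE legacy
    SET ID = (
        SELECT COUNT(*)
        FROM legacy AS other
        WHERE prefix(other.RefNo) = prefix(legacy.RefNo)
          AND other.RefNo <= legacy.RefNo
    )
    """
)
rows = conn.execute("SELECT RefNo, ID FROM legacy ORDER BY RefNo").fetchall()
for ref_no, new_id in rows:
    print(ref_no, new_id)
```

This relies on RefNo being unique within a group; duplicate RefNo values would receive the same ID.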
rownum can be used as an incrementing ID?
UPDATE legacy_table
SET id = ROWNUM;
This will assign unique values to all records in the table. This link contains documentation about Oracle Pseudocolumn.
You can run the following:
update <table_name> set id = rownum where descr like 'FOO/BAR/BAZ/%'
This is pretty rough and I'm not sure if your RefNo is a single value column or you just made it like that for simplicity.
select
sub.RefNo,
row_number() over (order by sub.RefNo) + (select max(id) from TABLE)
from (
select FOO+'/'+BAR+'/'+BAZ+'/'+OTHER as RefNo
from TABLE
group by FOO+'/'+BAR+'/'+BAZ+'/'+OTHER
) sub