Clustering in SQL with any of a set of properties being equal

I have a table T with 4 columns: {id, v, p1, p2}, with id being unique.
I have to partition these rows into disjoint groups g1,...,gn.
Let G := {g1,...,gn} and let gi be an element of G.
For any t that belongs to T and to gi, the following properties hold:
there is no t' in gi where t.v != t'.v
there exists at least one t'' in gi such that t.p1=t''.p1 or t.p2=t''.p2
Currently I start with the following to create the first candidates to become groups (here g):
SELECT
    T.v,
    T.p1,
    ROW_NUMBER() OVER (ORDER BY T.v DESC, T.p1 DESC) AS g
INTO TEMP t_TG
FROM T
GROUP BY T.v, T.p1
ORDER BY T.v DESC, T.p1 DESC;
In order to fetch the property p2 I use the following query:
SELECT DISTINCT
    T.v,
    T.p1,
    T.p2,
    t_TG.g
INTO TG
FROM T
JOIN t_TG ON
    T.v=t_TG.v AND T.p1=t_TG.p1
Now comes the part I'm not sure about. I know that:
For each row in TG I need to find the other groups that share a p2 value with its group gi, along these lines:
SELECT DISTINCT g FROM TG WHERE p2 IN (SELECT p2 FROM TG WHERE g=gi);
The result should then be updated in TG like this:
UPDATE TG SET g=gmin (minimum of previous select) WHERE g IN (result of previous select)
This needs to be repeated until no more changes happen in the table.
Is SQL even the appropriate tool for this task?
If so, how do I implement this last step, and should I use RECURSIVE?
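The merge-until-stable loop described above amounts to finding connected components, so one way to sanity-check any SQL solution is a small union-find sketch outside the database. The following is only an illustration in Python, not the asker's method: the function name `cluster` and the tuple layout `(id, v, p1, p2)` are assumptions, and each group is labelled by its smallest id.

```python
def cluster(rows):
    """Group rows (id, v, p1, p2) that share v and are linked by equal p1 or p2.

    Returns a dict mapping each id to the smallest id of its group.
    Hypothetical helper, assumed for illustration only.
    """
    parent = {row[0]: row[0] for row in rows}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as root, mirroring "SET g = gmin".
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}
    for id_, v, p1, p2 in rows:
        # Rows are linked only within the same v, via a shared p1 or a shared p2.
        for key in (("p1", v, p1), ("p2", v, p2)):
            if key in first_seen:
                union(id_, first_seen[key])
            else:
                first_seen[key] = id_

    return {id_: find(id_) for id_ in parent}
```

With the rows `(1,'a',10,100)`, `(2,'a',10,200)`, `(3,'a',20,200)`, `(4,'a',30,300)` and `(5,'b',10,100)`, ids 1, 2 and 3 end up in one group (1 and 2 share p1, 2 and 3 share p2), while 4 and 5 each stay alone.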

Related

SQL query to join same table multiple times

I have a scenario where I need to join the same table multiple times to get the desired output. For example, I have two tables, TABLE A and TABLE B.
Step 1: I want to take all the parties from TABLE A that have the lowest idate. The lowest idate is determined from the partyid and idate columns.
Step 2: Then, based on the CID fetched from TABLE A in step 1, we need to fetch the corresponding MID from TABLE B where MIDTYPE=130300.
Step 3: Then, based on the MID fetched in step 2, we need to traverse the same table, find the latest record for that MID based on idate in TABLE B, and fetch the corresponding CID.
Step 4: Now, for that CID, we need to fetch the MID value for MIDTYPE 130307 in the same table (TABLE B). My final output should be the combination of the MID fetched in step 3 and the MID fetched for 130307 in step 4.
I wrote the query below, but it takes a long time to run because we go through the same table (TABLE B) multiple times and TABLE B has millions of rows. Is there any way to rewrite this query differently? Could someone help me with this?
SELECT
    ident.mid mid1,
    b.mid mid2
FROM
    (
        SELECT *
        FROM tableb
        WHERE midtype = 130307
    ) ident
    INNER JOIN (
        SELECT
            s.cid,
            s.mid,
            s.midtype
        FROM
            (
                SELECT
                    cid,
                    partyid,
                    admin_sys_tp_cd,
                    mid,
                    ilast
                FROM
                    (
                        SELECT
                            cq.cid,
                            RANK() OVER (
                                PARTITION BY cq.partyid
                                ORDER BY cq.idate ASC
                            ) rnk,
                            cq.idate,
                            cq.partyid,
                            i.mid,
                            i.idate AS ilast
                        FROM
                            tablea cq
                            INNER JOIN tableb i ON cq.cid = i.cid
                            INNER JOIN tablec c ON i.cid = c.cid
                        WHERE
                            i.midtype = 130300
                    )
                WHERE
                    rnk = 1
            ) a
            INNER JOIN (
                SELECT *
                FROM
                    (
                        SELECT
                            cid,
                            mid,
                            midtype,
                            RANK() OVER (
                                PARTITION BY mid
                                ORDER BY idate DESC
                            ) rnk_mpid
                        FROM
                            tableb
                    )
                WHERE
                    rnk_mpid = 1
            ) s ON a.mid = s.mid
            AND s.midtype = 130300
    ) b ON ident.cid = b.cid
    AND ident.midtype = 130307

This is not what you asked, but before others and I spend time working out different approaches for you, let's make sure the basics are covered.
No matter how you write the SQL query, it will never perform well against a table with millions of rows if you don't have the proper indexes for it, especially in your case, where you have to access the table at least 3 times.
Just by looking at your detailed steps, I would say that you should have at least 3 different indexes to support this query:
TableA_Index1 (PARTYID, IDATE, INCLUDES CID)
TableB_Index1 (CID, MIDTYPE, INCLUDES MID)
TableB_Index2 (MID, IDATE, INCLUDES CID)
Do you have them?
Have you tried running this query through the Db2 design advisor (db2advis) to get recommended indexes for it?
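To see the effect of such indexes without touching the real database, the idea can be tried on an empty in-memory SQLite copy of the schema. This is only a rough sketch under stated assumptions: the column types are guessed, the helper names `build_demo_db` and `plan_for` are made up, and since SQLite has no INCLUDE clause the included columns are appended as trailing key columns instead. Db2's optimizer will behave differently, so treat the plan output as an illustration, not a prediction.

```python
import sqlite3


def build_demo_db():
    """Create empty tables and the three suggested indexes (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tablea (cid INTEGER, partyid INTEGER, idate TEXT);
        CREATE TABLE tableb (cid INTEGER, mid INTEGER, midtype INTEGER, idate TEXT);
        -- SQLite has no INCLUDE clause; the extra column is a trailing key column.
        CREATE INDEX tablea_index1 ON tablea (partyid, idate, cid);
        CREATE INDEX tableb_index1 ON tableb (cid, midtype, mid);
        CREATE INDEX tableb_index2 ON tableb (mid, idate, cid);
        """
    )
    return conn


def plan_for(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for the given query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    conn = build_demo_db()
    # Step 2 of the question: look up MID by CID and MIDTYPE.
    for line in plan_for(
        conn, "SELECT mid FROM tableb WHERE cid = 1 AND midtype = 130300"
    ):
        print(line)
```

The printed plan should mention `tableb_index1`, which shows that the lookup in step 2 can be answered from the index alone.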

Select Query for Repeated Records in SQLite

This problem is a generalization of this question. Rather than finding all the games with specific players playing against others, I want to be able to find all the games where the same players played against each other.
Here is sample data:
1,ChrisEveret,1
1,BillieJeanKing,1
1,RogerFederer,0
1,TomasMuster,0
2,RogerFederer,1
2,SallieMae,1
2,NovakDjokovic,0
2,JimCourier,0
3,ChrisEveret,0
3,BillieJeanKing,0
3,RogerFederer,1
3,TomasMuster,1
The desired output is
1,ChrisEveret,1
1,BillieJeanKing,1
1,RogerFederer,0
1,TomasMuster,0
3,ChrisEveret,0
3,BillieJeanKing,0
3,RogerFederer,1
3,TomasMuster,1
The actual data has only about two thousand rows, so performance is not a concern. I have come up with the following remarkably convoluted and inexact partial solution:
CREATE TABLE sets (gameid int, player text ,winloss int);
.import data.csv sets
select * from sets where gameid in
(select gameid from (select gameid,mo from
(select gameid,mo,count(*) from
(select gameid,group_concat(player) as mo from
(select gameid,player from sets order by gameid,player)
group by gameid)
group by gameid)
where mo in
(select mo from (select gameid,mo,count(*) from
(select gameid,group_concat(player) as mo from
(select gameid,player from sets order by gameid,player)
group by gameid)
group by mo
having count(*)>1))));
This returns all matches where the same four people played together, but not necessarily those in which the teams were the same. I do not know if there is a solution to this problem that does not involve using group_concat(). That is the only way I was able to make even this limited progress on it, however. I also am not sure that the method used to order the group_concat results for aggregation will always work.

SQLite does not guarantee the ordering used by group_concat(), and there is no way to control it, so you have to use more cumbersome methods.
You can get the pairs of games with the same players using:
with s as (
      select s.*, count(*) over (partition by gameid) as num_players
      from sets s
     )
select s1.gameid, s2.gameid
from s s1 join
     s s2
     on s1.player = s2.player and s1.num_players = s2.num_players
group by s1.gameid, s2.gameid
having count(*) = max(s1.num_players);
You can then use this logic if you want to get the players in each game (or just use group_concat() for that).
EDIT:
Window functions were introduced in SQLite version 3.25. In earlier versions, try this:
with s as (
      select s.*, ss.num_players
      from sets s join
           (select gameid, count(*) as num_players
            from sets s
            group by gameid
           ) ss
           on ss.gameid = s.gameid
     )
select s1.gameid, s2.gameid
from s s1 join
     s s2
     on s1.player = s2.player and s1.num_players = s2.num_players
group by s1.gameid, s2.gameid
having count(*) = max(s1.num_players);
Here is a db<>fiddle that shows all pairs of games that have the same players (note that this includes each game paired with itself).
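As a rough check of the approach, the query can be run from Python's `sqlite3` module against the sample data from the question. The sketch below is not the answerer's code: `load_sets`, `SAME_PLAYERS` and `SAME_TEAMS` are hypothetical names, the `s1.gameid < s2.gameid` filter is added to drop self-pairs and mirrored pairs, and `SAME_TEAMS` is an assumed extension that compares teammate pairs, which only works when every team has at least two players.

```python
import sqlite3

SAMPLE = [
    (1, "ChrisEveret", 1), (1, "BillieJeanKing", 1),
    (1, "RogerFederer", 0), (1, "TomasMuster", 0),
    (2, "RogerFederer", 1), (2, "SallieMae", 1),
    (2, "NovakDjokovic", 0), (2, "JimCourier", 0),
    (3, "ChrisEveret", 0), (3, "BillieJeanKing", 0),
    (3, "RogerFederer", 1), (3, "TomasMuster", 1),
]

# Pairs of games played by exactly the same set of players (no window functions).
SAME_PLAYERS = """
WITH s AS (
    SELECT sets.*, ss.num_players
    FROM sets
    JOIN (SELECT gameid, COUNT(*) AS num_players FROM sets GROUP BY gameid) ss
      ON ss.gameid = sets.gameid
)
SELECT s1.gameid, s2.gameid
FROM s s1
JOIN s s2 ON s1.player = s2.player AND s1.num_players = s2.num_players
WHERE s1.gameid < s2.gameid
GROUP BY s1.gameid, s2.gameid
HAVING COUNT(*) = MAX(s1.num_players)
ORDER BY s1.gameid, s2.gameid
"""

# Pairs of games whose teammate pairs are identical (assumes teams of 2 or more).
SAME_TEAMS = """
WITH pairs AS (
    SELECT a.gameid, a.player AS p1, b.player AS p2
    FROM sets a
    JOIN sets b
      ON a.gameid = b.gameid AND a.winloss = b.winloss AND a.player < b.player
),
n AS (SELECT gameid, COUNT(*) AS num_pairs FROM pairs GROUP BY gameid)
SELECT x.gameid, y.gameid
FROM pairs x
JOIN pairs y ON x.p1 = y.p1 AND x.p2 = y.p2 AND x.gameid < y.gameid
JOIN n nx ON nx.gameid = x.gameid
JOIN n ny ON ny.gameid = y.gameid
WHERE nx.num_pairs = ny.num_pairs
GROUP BY x.gameid, y.gameid, nx.num_pairs
HAVING COUNT(*) = nx.num_pairs
ORDER BY x.gameid, y.gameid
"""


def load_sets(rows):
    """Create an in-memory 'sets' table shaped like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sets (gameid int, player text, winloss int)")
    conn.executemany("INSERT INTO sets VALUES (?, ?, ?)", rows)
    return conn
```

On the sample data both queries return the single pair (1, 3). If a fourth game is added in which the same four players are split into different teams, `SAME_PLAYERS` also pairs it with games 1 and 3, while `SAME_TEAMS` still returns only (1, 3).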

Calculate MAX for every row in SQL

I have these tables:
Docenza(id, id_facolta, ..., orelez)
Facolta(id, ...)
and I want to obtain, for every facolta, only the id of the Docenza with the maximum number of orelez, together with that number of orelez:
id_docenzaP facolta1 max(orelez)
id_docenzaQ facolta2 max(orelez)
...
id_docenzaZ facoltaN max(orelez)
How can I do this? This is what I have so far:
SELECT DISTINCT ... F.nome, SUM(orelez) AS oreTotali
FROM Docenza D
JOIN Facolta F ON F.id = D.id_facolta
GROUP BY F.nome
I obtain something like:
docenzaP facolta1 maxValueForidP
docenzaQ facolta1 maxValueForidQ
...
docenzaR facolta2 maxValueForidR
docenzaS facolta2 maxValueForidS
...
docenzaZ facoltaN maxValueForFacoltaN
How can I take only the max value for every facolta?

Presumably, you just want:
SELECT F.nome, sum(orelez) AS oreTotali
FROM Docenza D JOIN
Facolta F
ON F.id = D.id_facolta
GROUP BY F.nome;
I'm not sure what the SELECT DISTINCT is supposed to be doing. It is almost never used with GROUP BY. The . . . suggests that you are selecting additional columns, which are not needed for the results you want.

This is untested, and since you didn't provide sample data with expected results I can't be sure it's really what you need.
It's a bit ugly and I'm sure there is some clever correlated sub query approach, but I've never been good with those.
SELECT st.focolta,
       TMP3.s_orelez,
       TMP3.id_docenza
  FROM some_table AS st
 INNER
  JOIN (SELECT *
          FROM (SELECT focolta,
                       s_orelez,
                       id_docenza,
                       ROW_NUMBER() OVER -- Rank the orelez sums within each focolta.
                       ( PARTITION BY focolta
                             ORDER BY s_orelez DESC
                       ) rn_orelez
                  FROM (SELECT focolta,
                               id_docenza,
                               SUM(orelez) OVER -- Sum the orelez per id_docenza within each focolta.
                               ( PARTITION BY focolta, id_docenza
                               ) AS s_orelez
                          FROM some_table
                       ) TMP
               ) TMP2
         WHERE TMP2.rn_orelez = 1 -- Limit to the highest rank value
       ) TMP3
    ON st.focolta = TMP3.focolta; -- Join focolta to the id associated with the highest value.
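For comparison, here is a small sketch of the correlated-subquery approach mentioned above, run through Python's `sqlite3` module. It makes several assumptions that the question does not confirm: each Docenza row carries that teacher's total orelez (if there are several rows per teacher, they would need to be summed first), Facolta has a `nome` column as in the asker's query, and the function name `top_docenza_per_facolta` is invented. Ties are all returned.

```python
import sqlite3


def top_docenza_per_facolta(conn):
    """Return (nome, docenza id, orelez) for the row(s) with the max orelez per facolta."""
    return conn.execute(
        """
        SELECT F.nome, D.id, D.orelez
        FROM Docenza D
        JOIN Facolta F ON F.id = D.id_facolta
        -- Keep only rows whose orelez equals the maximum within the same facolta.
        WHERE D.orelez = (SELECT MAX(D2.orelez)
                          FROM Docenza D2
                          WHERE D2.id_facolta = D.id_facolta)
        ORDER BY F.nome, D.id
        """
    ).fetchall()


def build_demo_db(facolta_rows, docenza_rows):
    """Create a minimal, assumed version of the two tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Facolta (id INTEGER PRIMARY KEY, nome TEXT)")
    conn.execute(
        "CREATE TABLE Docenza (id INTEGER PRIMARY KEY, id_facolta INTEGER, orelez INTEGER)"
    )
    conn.executemany("INSERT INTO Facolta VALUES (?, ?)", facolta_rows)
    conn.executemany("INSERT INTO Docenza VALUES (?, ?, ?)", docenza_rows)
    return conn
```

The subquery is evaluated per facolta, so no window functions are needed and the query should also run on older SQLite versions.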

Count number of points within different ranges SQL

We have a real estate point X.
We want to calculate the number of stations within each of these distance ranges:
0-200 m
200-400 m
400-600 m
Once I have these counts, I will create a new table where they are summarized according to mathematical expressions.
SELECT loc_dist.id, loc_dist.namn1, grps.grp, count(*)
FROM (
SELECT b.id, b.namn1, ST_Distance_Sphere(b.geom, s.geom) AS dist
FROM stations s, bostader b) AS loc_dist
JOIN (
VALUES (1,200.), (2,400.), (3,600.)
) AS grps(grp, dist) ON loc_dist.dist < grps.dist
GROUP BY 1,2,3
ORDER BY 1,2,3;
This is what I have now, but it takes forever to run and I can't get any results, since both b and s have more than 2000 entries. I want the number of s around one specific b, but this calculates it for all of them. How do I add a condition such as:
WHERE b.id= 114477
I only get a syntax error on the join when I try this. I only want the grouped distances for one, or maybe 5, different b, selected by their b.id.

After a lot of help from TA, here is the answer, and it works nicely. I added range bounds and a BETWEEN clause to get the count within each ring:
SELECT loc_dist.id, loc_dist.namn1, grps.grp, count(*)
FROM (
SELECT b.id, b.namn1, ST_Distance_Sphere(b.geom, s.geom) AS dist
FROM stations s, bostader b WHERE b.id=114477) AS loc_dist
JOIN (
VALUES (1,0,200), (2,200,400), (3,400,600)
) AS grps(grp, dist_l, dist_u) ON loc_dist.dist BETWEEN dist_l AND dist_u
GROUP BY 1,2,3
ORDER BY 1,2,3;
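One detail worth checking in the query above is that BETWEEN is inclusive at both ends, so a station at exactly 200 m or 400 m is counted in two rings. The following Python sketch illustrates the ring assignment with half-open intervals instead, so each distance lands in at most one ring. It is only an illustration: the function name `count_rings` and the default edges are assumptions, and the distances are taken as already computed in metres.

```python
from bisect import bisect_right
from collections import Counter


def count_rings(distances, edges=(0, 200, 400, 600)):
    """Count distances per ring [edges[i], edges[i+1]); rings are numbered from 1.

    Distances below the first edge or at/after the last edge are ignored.
    """
    counts = Counter()
    for d in distances:
        # Index of the last edge that is <= d.
        i = bisect_right(edges, d) - 1
        if 0 <= i < len(edges) - 1:
            counts[i + 1] += 1
    return dict(counts)
```

For the distances 50, 200, 399.9, 400, 600 and 650, this yields one station in ring 1, two in ring 2 and one in ring 3; 600 and 650 fall outside the last ring.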

How to Update a group of rows

My sqlfiddle: http://sqlfiddle.com/#!15/4f9da/1
I'm bad at explaining this, and I'm new to complex queries (I only know the basics), so bear with me.
Situation: the revision column numbers successive versions of the same object. For example, ids 1, 2 and 3 are the same object, and each row's ground_id refers to the id of the previous version.
Problem: I need the ord column to hold the same value for every row of the same object. For example, ids 1, 2 and 3 need ord set to 1, because revision 0 of that object is id 1. The same goes for id 4, which must have ord 4, and for id 5.
Basically must be like this:

You need a recursive query to do this. First you select the rows where ground_id IS NULL, set ord to the value of id. In the following iterations you add more rows based on the value of ground_id, setting the ord value to that of the row it is being matched to. You can then use that set of rows (id, ord) as a row source for the UPDATE:
WITH RECURSIVE set_ord (id, ord) AS (
SELECT id, id
FROM ground
WHERE ground_id IS NULL
UNION
SELECT g.id, o.ord
FROM ground g
JOIN set_ord o ON o.id = g.ground_id
)
UPDATE ground g
SET ord = s.ord
FROM set_ord s
WHERE g.id = s.id;
(SQLFiddle is currently unresponsive, so I can't post my code there.)
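Since the fiddle is unavailable, here is a small stand-in that runs the same recursive idea on an in-memory SQLite database from Python. It is a sketch under assumptions, not the PostgreSQL statement above: the table is reduced to the columns id, ground_id and ord, the helper names `make_ground` and `assign_ord` are made up, the sample rows are hypothetical, and the `UPDATE ... FROM` form is replaced by a correlated subquery so that it also runs on SQLite versions without `UPDATE ... FROM`.

```python
import sqlite3


def make_ground(rows):
    """Create a reduced 'ground' table from (id, ground_id) pairs (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ground (id INTEGER PRIMARY KEY, ground_id INTEGER, ord INTEGER)"
    )
    conn.executemany("INSERT INTO ground (id, ground_id) VALUES (?, ?)", rows)
    return conn


def assign_ord(conn):
    """Set ord to the id of the first revision in each chain of ground_id links."""
    conn.execute(
        """
        WITH RECURSIVE set_ord (id, ord) AS (
            -- Anchor: first revisions have no predecessor.
            SELECT id, id FROM ground WHERE ground_id IS NULL
            UNION
            -- Step: each later revision inherits ord from its predecessor.
            SELECT g.id, o.ord
            FROM ground g
            JOIN set_ord o ON o.id = g.ground_id
        )
        UPDATE ground
        SET ord = (SELECT s.ord FROM set_ord s WHERE s.id = ground.id)
        """
    )
    conn.commit()
```

With the hypothetical rows `(1, NULL)`, `(2, 1)`, `(3, 2)`, `(4, NULL)` and `(5, 4)`, ids 1 to 3 get ord 1 and ids 4 and 5 get ord 4.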
