Generate matrix of access in source-target table - sql

I have two tables in PostgreSQL, class and inheritance.
Each row in inheritance has 2 class IDs source_id and target_id:
CREATE TABLE public.class (
id bigint NOT NULL DEFAULT nextval('class_id_seq'::regclass),
name character varying(500) COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT class_pkey PRIMARY KEY (id)
)
CREATE TABLE public.inheritance (
id bigint NOT NULL DEFAULT nextval('inherited_id_seq'::regclass),
source_id bigint NOT NULL,
target_id bigint NOT NULL,
CONSTRAINT inherited_pkey PRIMARY KEY (id),
CONSTRAINT inherited_source_id_fkey FOREIGN KEY (source_id)
REFERENCES public.class (id),
CONSTRAINT inherited_target_id_fkey FOREIGN KEY (target_id)
REFERENCES public.class (id)
)
I want to create Access Matrix between all classes based in inheritance relationship in inheritance table.
I try this code:
select * , case when id in (select target_id from inheritance where source_id=1) then 1 else 0 end as "1"
, case when id in (select target_id from inheritance where source_id=2) then 1 else 0 end as "2"
, case when id in (select target_id from inheritance where source_id=3) then 1 else 0 end as "3"
, case when id in (select target_id from inheritance where source_id=4) then 1 else 0 end as "4"
, case when id in (select target_id from inheritance where source_id=5) then 1 else 0 end as "5"
, case when id in (select target_id from inheritance where source_id=6) then 1 else 0 end as "6"
, case when id in (select target_id from inheritance where source_id=7) then 1 else 0 end as "7"
, case when id in (select target_id from inheritance where source_id=8) then 1 else 0 end as "8"
, case when id in (select target_id from inheritance where source_id=9) then 1 else 0 end as "9"
from class
and get the right answer, but it's just for 9 static rows in class.
How can I get all number of rows in class using a dynamic SQL command?
If we can't do it with SQL, how can we do it with PL/pgSQL?

Static solution
SQL demands to know name and type of each result column (and consequently their number) at call time. You cannot derive result columns from data dynamically with plain SQL. You can use an array or a document type instead of separate columns:
SELECT *
FROM class c
LEFT JOIN (
SELECT target_id AS id, array_agg(source_id) AS sources
FROM (SELECT source_id, target_id FROM inheritance i ORDER BY 1,2) sub
GROUP BY 1
) i USING (id);
id
name
sources
1
c1
{2,3,4}
2
c2
{5}
3
c3
{5,6,7}
4
c4
{7}
5
c5
{8}
6
c6
{9}
7
c7
{9}
8
c8
null
9
c9
null
Dynamic solution
If that's not good enough you need dynamic SQL with two round-trips to the DB server: 1. Generate SQL. 2. Execute SQL. Using the crosstab() function from the additional module tablefunc. If you are unfamiliar, read this first:
PostgreSQL Crosstab Query
Generate SQL:
SELECT format(
$q$SELECT *
FROM class c
LEFT JOIN crosstab(
'SELECT target_id, source_id, 1 FROM inheritance ORDER BY 1,2'
, 'VALUES (%s)'
) AS ct (id int, %s int)
USING (id)
ORDER BY id;
$q$
, string_agg(c.id::text, '), (')
, string_agg('"' || c.id || '"', ' int, ')
)
FROM (SELECT id FROM class ORDER BY 1) c;
Returns a query of this form, which we ...
2. Execute:
SELECT *
FROM class c
LEFT JOIN crosstab(
'SELECT target_id, source_id, 1 FROM inheritance ORDER BY 1,2'
, 'VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9)'
) AS ct (id int, "1" int, "2" int, "3" int, "4" int, "5" int, "6" int, "7" int, "8" int, "9" int)
USING (id)
ORDER BY id;
... to get:
id | name | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
----+------+---+---+---+---+---+---+---+---+---
1 | c1 | | 1 | 1 | 1 | | | | |
2 | c2 | | | | | 1 | | | |
3 | c3 | | | | | 1 | 1 | 1 | |
4 | c4 | | | | | | | 1 | |
5 | c5 | | | | | | | | 1 |
6 | c6 | | | | | | | | | 1
7 | c7 | | | | | | | | | 1
8 | c8 | | | | | | | | |
9 | c9 | | | | | | | | |
db<>fiddle here
See:
Dynamic alternative to pivot with CASE and GROUP BY
Dynamically convert hstore keys into columns for an unknown set of keys
Dynamic execution with psql
Still two round-trips to the server, but only a single command.
Both of the following solutions use psql meta-commands and only work from within psql!
With \gexec
Using the standard interactive terminal, you can feed the generated SQL back to the Postgres server for execution directly with \gexec:
test=> SELECT format(
$q$SELECT *
FROM class c
LEFT JOIN crosstab(
'SELECT target_id, source_id, 1 FROM inheritance ORDER BY 1,2'
, 'VALUES (%s)'
) AS ct (id int, %s int)
USING (id)
ORDER BY id;
$q$
, string_agg(c.id::text, '), (')
, string_agg('c' || c.id, ' int, ')
)
FROM (SELECT id FROM class ORDER BY 1) c\gexec
Same result.
With \crosstabview
test=> SELECT *
test-> FROM class c
test-> LEFT JOIN (
test(> SELECT target_id AS id, source_id, 1 AS val
test(> FROM inheritance
test(> ) i USING (id)
test-> \crosstabview id source_id val
id | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
----+---+---+---+---+---+---+---+---+--
1 | 1 | 1 | 1 | | | | | |
2 | | | | 1 | | | | |
3 | | | | 1 | 1 | 1 | | |
4 | | | | | | 1 | | |
5 | | | | | | | 1 | |
6 | | | | | | | | 1 |
7 | | | | | | | | 1 |
8 | | | | | | | | |
9 | | | | | | | | |
(9 rows)
See (with related answers for both):
How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?
There are lots of subtleties in these solutions ...
Aside
Assuming there are some mechanisms in place to disallow duplicates and directly contradicting relationships. Like:
CREATE UNIQUE INDEX inheritance_uni_idx
ON inheritance (GREATEST(source_id, target_id), LEAST(source_id, target_id));
See:
How to create a Postgres table with unique combined primary key?

Related

how to perform sql actions/query for duplicate rows

I have 2 tables:
1-brokers(this is a company that could have multiple broker individuals)
and
2-brokerIndividuals (A person/individuals table that has a foreign key of broker company it belongs to and the individuals details)
I'm trying to create a unique index column for brokers table where the fields companyName are unique and isDeleted is NULL. Currently, the table is already populated so I want to write an SQL QUERY to find duplicate rows and whenever there are rows with the same companyName and isDeleted=NULL, I would like to perform 2 actions/queries:
1-keep the first row as it is and changes other duplicates(rows following the first duplicate) rows' isDeleted columns value to true.
2- associate or change the foreign key in brokerIndividuals for the duplicate rows for the first row.
The verbal description of what I am trying to do is: soft delete the duplicate rows and associate their corresponding brokerIndividuals to the first occurrence of duplicates. Table needs to have 1 occurrence of companyName where isDeleted is NULL.
I am using knex.js ORM so if that help's you can also suggest a solution using knex functions but knex doesn't support partial index yet( Knex.js - How to create unique index with 'where' clause? ) so I have to use the raw SQL method. Plus the DB I'm using is mssql(version: 6.0.1).
Here's a full test case (commented), with link to the fiddle:
Working test case, tested with MySQL 5.5, 5.6, 5.7, 8.0 and MariaDB up to 10.6
Create the tables and insert initial data with duplicate company_name entries:
CREATE TABLE brokers (
id int primary key auto_increment
, company_name VARCHAR(30)
, isDeleted boolean default null
);
CREATE TABLE brokerIndividuals (
id int primary key auto_increment
, broker_id int references brokers (id)
);
INSERT INTO brokers (company_name) VALUES
('name1')
, ('name1')
, ('name1')
, ('name1')
, ('name123')
, ('name123')
, ('name123')
, ('name123')
;
INSERT INTO brokerIndividuals (broker_id) VALUES
(2)
, (7)
;
SELECT * FROM brokers;
+----+--------------+-----------+
| id | company_name | isDeleted |
+----+--------------+-----------+
| 1 | name1 | null |
| 2 | name1 | null |
| 3 | name1 | null |
| 4 | name1 | null |
| 5 | name123 | null |
| 6 | name123 | null |
| 7 | name123 | null |
| 8 | name123 | null |
+----+--------------+-----------+
SELECT * FROM brokerIndividuals;
+----+-----------+
| id | broker_id |
+----+-----------+
| 1 | 2 |
| 2 | 7 |
+----+-----------+
Adjust brokers to determine isDeleted based on the MIN(id) per company_name:
UPDATE brokers
JOIN (
SELECT company_name, MIN(id) AS id
FROM brokers
GROUP BY company_name
) AS x
ON x.company_name = brokers.company_name
AND isDeleted IS NULL
SET isDeleted = CASE WHEN (x.id <> brokers.id) THEN 1 END
;
The updated brokers contents:
SELECT * FROM brokers;
+----+--------------+-----------+
| id | company_name | isDeleted |
+----+--------------+-----------+
| 1 | name1 | null |
| 2 | name1 | 1 |
| 3 | name1 | 1 |
| 4 | name1 | 1 |
| 5 | name123 | null |
| 6 | name123 | 1 |
| 7 | name123 | 1 |
| 8 | name123 | 1 |
+----+--------------+-----------+
For brokerIndividuals, find / set the correct broker_id:
UPDATE brokerIndividuals
JOIN brokers AS b1
ON b1.id = brokerIndividuals.broker_id
JOIN brokers AS b2
ON b1.company_name = b2.company_name
AND b2.isDeleted IS NULL
SET brokerIndividuals.broker_id = b2.id
;
New contents:
SELECT * FROM brokerIndividuals;
+----+-----------+
| id | broker_id |
+----+-----------+
| 1 | 1 |
| 2 | 5 |
+----+-----------+

how to "deepcopy" rows

My question is similar to this one but more involved. Suppose I have a table A with id idA, and another table B with idB and foreign key idA. I would like to duplicate all entries of A, including corresponding entries in B. For example, if I have the following tables at the start:
A
|---|
|idA|
|---|
| 1 |
| 2 |
| 3 |
|---|
B
|---|---|
|idB|idA|
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
|---|---|
Then the result should be:
A
|---|
|idA|
|---|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
|---|
B
|---|---|
|idB|idA|
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 4 |
| 5 | 4 |
| 6 | 5 |
|---|---|
This is quite tricky. You need to insert the ids into the a -- but then be able to match them back to the existing ids to insert the right values into b.
A generic solution looks like this:
with i as (
insert into a
select . . . -- the other columns you want
from a
order by idA
returning *
),
a_mapping (
select a.idA, i.idA as new_idA
from (select a.*, row_number() over (order by idA) as seqnum
from a
) a join
(select i.*, row_number() over (order by idA) as seqnum
from i
) i
on a.seqnum = i.seqnum
)
insert into b (idA) (
select am.new_idA
from b join
a_mapping am
on b.idA = am.idA;
Note: If you have another unique column or columns in the row, then the mapping is a little easier to generate. Of course, if you are copying all the columns, then nothing else is unique, so you do need the row_number().
Of course, for your very simple example, you don't need a mapping table. You can just use:
with i as (
insert into a
select . . . -- the other columns you want
from a
order by idA
returning *
)
insert into b (idA) (
select i.idA
from i
I have an approach which may be equivalent to what Gordon Linoff suggests, I would be grateful if you could point out any flaws!
Let's set up the tables:
CREATE TABLE A(
idA SERIAL PRIMARY KEY,
txt varchar);
INSERT INTO A(txt)
VALUES ('A1'), ('A2'),('A3');
CREATE TABLE B(
idB SERIAL PRIMARY KEY,
idA int REFERENCES A(idA),
txt varchar);
INSERT INTO B(idA, txt)
VALUES (1, 'A1.B1'), (1, 'A1.B2'), (2, 'A2.B1');
so the initial data looks as follows:
SELECT * FROM (A LEFT JOIN B ON A.idA=B.idA) ORDER BY A.idA, B.idB;
ida | txt | idb | ida | txt
-----+-----+-----+-----+-------
1 | A1 | 1 | 1 | A1.B1
1 | A1 | 2 | 1 | A1.B2
2 | A2 | 3 | 2 | A2.B1
3 | A3 | | |
(4 rows)
Now, we can use the NEXTVAL function to generate the mappings directly:
CREATE TEMP TABLE tmp_A_new AS (
SELECT *, NEXTVAL('A_idA_seq') as newidA
FROM A ORDER BY idA -- order probably not needed
);
INSERT INTO A(idA, txt) (SELECT newidA, txt FROM tmp_A_new);
CREATE TEMP TABLE tmp_B_new AS (
SELECT B.idB, newidA, B.txt, NEXTVAL('B_idB_seq') as newidB
FROM B, tmp_A_new WHERE B.idA=tmp_A_new.idA ORDER BY idB
);
INSERT INTO B(idB, idA, txt) (SELECT newidB, newidA, txt FROM tmp_B_new);
The results look correct:
SELECT * FROM (A LEFT JOIN B ON A.idA=B.idA) ORDER BY A.idA, B.idB;
ida | txt | idb | ida | txt
-----+-----+-----+-----+-------
1 | A1 | 1 | 1 | A1.B1
1 | A1 | 2 | 1 | A1.B2
2 | A2 | 3 | 2 | A2.B1
3 | A3 | | |
4 | A1 | 4 | 4 | A1.B1
4 | A1 | 5 | 4 | A1.B2
5 | A2 | 6 | 5 | A2.B1
6 | A3 | | |
(8 rows)
Note that this could be continued further down to C, D, etc.
I would be glad for any comments :)

Convert tuple value to column names

Got something like:
+-------+------+-------+
| count | id | grade |
+-------+------+-------+
| 1 | 0 | A |
| 2 | 0 | B |
| 1 | 1 | F |
| 3 | 1 | D |
| 5 | 2 | B |
| 1 | 2 | C |
I need:
+-----+---+----+---+---+---+
| id | A | B | C | D | F |
+-----+---+----+---+---+---+
| 0 | 1 | 2 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 1 | 1 |
| 2 | 0 | 5 | 1 | 0 | 0 |
I don't know if I can even do this. I can group by id but how would you read the count value for each grade column?
CREATE TABLE #MyTable(_count INT,id INT , grade VARCHAR(10))
INSERT INTO #MyTable( _count ,id , grade )
SELECT 1,0,'A' UNION ALL
SELECT 2,0,'B' UNION ALL
SELECT 1,1,'F' UNION ALL
SELECT 3,1,'D' UNION ALL
SELECT 5,2,'B' UNION ALL
SELECT 1,2,'C'
SELECT *
FROM
(
SELECT _count ,id ,grade
FROM #MyTable
)A
PIVOT
(
MAX(_count) FOR grade IN ([A],[B],[C],[D],[F])
)P
You need a "pivot" table or "cross-tabulation". You can use a combination of aggregation and CASE statements, or, more elegantly the crosstab() function provided by the additional module tablefunc. All basics here:
PostgreSQL Crosstab Query
Since not all keys in grade have values, you need the 2-parameter form. Like this:
SELECT * FROM crosstab(
'SELECT id, grade, count FROM table ORDER BY 1,2'
, $$SELECT unnest('{A,B,C,D,F}'::text[])$$
) ct(id text, "A" int, "B" int, "C" int, "D" int, "F" int);

Pl/PgSQL, insert into two tables, one with header rows

What would be the best way of going about doing the following insert. I've looked around, and I'm kind of stuck.
The table I Currently have (insert)
| id | order_id | item_id | type | group |
| 1 | 1 | 1 | 2 | 1 | <- type 2 represents a "header" item, or a "kit"
| 2 | 1 | 2 | 1 | 1 | <- type 1 represents a member of the "kit"
| 3 | 1 | 3 | 1 | 1 |
| 4 | 1 | 4 | 2 | 2 | <- New group means new kit
| 5 | 1 | 2 | 1 | 2 |
| 6 | 1 | 5 | 1 | 2 |
I need to insert these items into the following two tables:
1) item_entry
| id | mode | tmplt_id | item_id | parent_item_entry_id |
| 1 | 1 | 1 | NULL | NULL | <- This is a header line, mode 1
| 2 | 2 | NULL | 2 | 1 | <- This is a sub line, mode 2
| 3 | 2 | NULL | 3 | 1 | <- parent_item_entry_id references the header it belongs to
| 4 | 1 | 4 | NULL | NULL |
| 5 | 2 | NULL | 2 | 4 |
| 6 | 2 | NULL | 5 | 4 |
2) item_entry_details
| id | item_entry_id | order_id | group |
| 1 | 1 | 1 | 1 |
| 2 | 4 | 1 | 2 | <- only header information is necessary
Of course, item_entry.id is driven off of a sequence (item_entry_id_seq). Is there an elegent way of getting this to work? I currently loop through each group, first assigning a nextval() to a variable, and then I loop through each item in the group, writing to the table.
FOR recGroup IN SELECT DISTINCT group FROM insert LOOP
intParentItemEntryID := nextval('item_entry_id_seq');
FOR recLine IN SELECT * FROM insert LOOP
INSERT INTO item_entry VALUES (CASE intParentItemEntryID/DEFAULT, CASE 1/2, CASE recLine.item_id/NULL, CASE NULL/recLine.item_id, CASE NULL/intParentItemEntryID)
INSERT INTO item_entry_details VALUES (DEFAULT, intParentItemEntryID, recLine.order_id, recLine.group);
END LOOP;
END LOOP;
Is there a better way, or is the above the only way this type of insert can be done?
No need for procedural code "row-at-a-time" here, just plain old sql will suffice.
-- create the tables that the OP did not provide
DROP TABLE the_input;
CREATE TABLE the_input
( id INTEGER NOT NULL PRIMARY KEY
, order_id INTEGER NOT NULL
, item_id INTEGER NOT NULL
, ztype INTEGER NOT NULL
, zgroup INTEGER NOT NULL
);
DROP TABLE target1 ;
CREATE TABLE target1
( id INTEGER NOT NULL
, zmode INTEGER NOT NULL
, tmplt_id INTEGER
, item_id INTEGER
, parent_item_entry_id INTEGER
);
DROP TABLE target2 ;
CREATE TABLE target2
( id SERIAL NOT NULL
, item_entry_id INTEGER NOT NULL
, order_id INTEGER NOT NULL
, zgroup INTEGER NOT NULL
);
-- fil it up ...
INSERT INTO the_input
( id, order_id, item_id, ztype, zgroup ) VALUES
( 1 , 1 , 1 , 2 , 1 ) -- <- type 2 represents a "header" item, or a "kit"
,( 2 , 1 , 2 , 1 , 1 ) -- <- type 1 represents a member of the "kit"
,( 3 , 1 , 3 , 1 , 1 ) --
,( 4 , 1 , 4 , 2 , 2 ) -- <- New group means new kit
,( 5 , 1 , 2 , 1 , 2 ) --
,( 6 , 1 , 5 , 1 , 2 ) --
;
-- Do the inserts.
INSERT INTO target1(id,zmode,tmplt_id,item_id,parent_item_entry_id)
SELECT i1.id, 1, i1.id, NULL, NULL
FROM the_input i1
WHERE i1.ztype=2
UNION ALL
SELECT i2.id, 2, NULL, i2.id, ip.item_id
FROM the_input i2
JOIN the_input ip ON ip.zgroup = i2.zgroup AND ip.ztype=2
WHERE i2.ztype=1
;
INSERT INTO target2(item_entry_id,order_id,zgroup)
SELECT DISTINCT MIN(item_id),order_id, zgroup
FROM the_input i1
WHERE i1.ztype=2
GROUP BY order_id,zgroup
;
SELECT * FROM target1
ORDER BY id;
SELECT * FROM target2
ORDER BY id;
Result:
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "the_input_pkey" for table "the_input"
CREATE TABLE
INSERT 0 6
DROP TABLE
CREATE TABLE
INSERT 0 6
id | zmode | tmplt_id | item_id | parent_item_entry_id
----+-------+----------+---------+----------------------
1 | 1 | 1 | |
2 | 2 | | 2 | 1
3 | 2 | | 3 | 1
4 | 1 | 4 | |
5 | 2 | | 5 | 4
6 | 2 | | 6 | 4
(6 rows)
DROP TABLE
NOTICE: CREATE TABLE will create implicit sequence "target2_id_seq" for serial column "target2.id"
CREATE TABLE
INSERT 0 2
id | item_entry_id | order_id | zgroup
----+---------------+----------+--------
1 | 4 | 1 | 2
2 | 1 | 1 | 1
(2 rows)

help optimizing query (shows strength of two-way relationships between contacts)

i have a contact_relationship table that stores the reported strength of a relationship between one contact and another at a given point in time.
mysql> desc contact_relationship;
+------------------+-----------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+-----------+------+-----+-------------------+-----------------------------+
| relationship_id | int(11) | YES | | NULL | |
| contact_id | int(11) | YES | MUL | NULL | |
| other_contact_id | int(11) | YES | | NULL | |
| strength | int(11) | YES | | NULL | |
| recorded | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------------+-----------+------+-----+-------------------+-----------------------------+
now i want to get a list of two-way relationships between contacts (meaning there are two rows, one with contact a specifying a relationship strength with contact b and another with contact b specifying a strength for contact a -- the strength of the two-way relationship is the smaller of those two strength values).
this is the query i've come up with but it is pretty slow:
select
mrcr1.contact_id,
mrcr1.other_contact_id,
case when (mrcr1.strength < mrcr2.strength) then
mrcr1.strength
else
mrcr2.strength
end strength
from (
select
cr1.*
from (
select
contact_id,
other_contact_id,
max(recorded) as max_recorded
from
contact_relationship
group by
contact_id,
other_contact_id
) as cr2
inner join contact_relationship cr1 on
cr1.contact_id = cr2.contact_id
and cr1.other_contact_id = cr2.other_contact_id
and cr1.recorded = cr2.max_recorded
) as mrcr1,
(
select
cr3.*
from (
select
contact_id,
other_contact_id,
max(recorded) as max_recorded
from
contact_relationship
group by
contact_id,
other_contact_id
) as cr4
inner join contact_relationship cr3 on
cr3.contact_id = cr4.contact_id
and cr3.other_contact_id = cr4.other_contact_id
and cr3.recorded = cr4.max_recorded
) as mrcr2
where
mrcr1.contact_id = mrcr2.other_contact_id
and mrcr1.other_contact_id = mrcr2.contact_id
and mrcr1.contact_id != mrcr1.other_contact_id
and mrcr2.contact_id != mrcr2.other_contact_id
and mrcr1.contact_id <= mrcr1.other_contact_id;
anyone have any recommendations of how to speed it up?
note that because a user may specify the strength of his relationship with a particular user more than once, you must only grab the most recent record for each pair of contacts.
update: here is the result of explaining the query...
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 36029 | Using where |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 36029 | Using where; Using join buffer |
| 4 | DERIVED | <derived5> | ALL | NULL | NULL | NULL | NULL | 36021 | |
| 4 | DERIVED | cr3 | ref | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10 | cr4.contact_id,cr4.other_contact_id | 1 | Using where |
| 5 | DERIVED | contact_relationship | index | NULL | contact_relationship_index_3 | 14 | NULL | 37973 | Using index |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 36021 | |
| 2 | DERIVED | cr1 | ref | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10 | cr2.contact_id,cr2.other_contact_id | 1 | Using where |
| 3 | DERIVED | contact_relationship | index | NULL | contact_relationship_index_3 | 14 | NULL | 37973 | Using index |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
You are losing a lot lot lot of time selecting the most recent record. 2 options :
1- Change the way you are stocking data, and have a table with only recent record, and an other table more like historical record.
2- Use analytic request to select the most recent record, if your DBMS allows you to do this. Something like
Select first_value(strength) over(partition by contact_id, other_contact_id order by recorded desc)
from contact_relationship
Once you have the good record line, I think your query will go a lot faster.
Scorpi0's answer got me to thinking maybe I could use a temp table...
create temporary table mrcr1 (
contact_id int,
other_contact_id int,
strength int,
index mrcr1_index_1 (
contact_id,
other_contact_id
)
) replace as
select
cr1.contact_id,
cr1.other_contact_id,
cr1.strength from (
select
contact_id,
other_contact_id,
max(recorded) as max_recorded
from
contact_relationship
group by
contact_id, other_contact_id
) as cr2
inner join
contact_relationship cr1 on
cr1.contact_id = cr2.contact_id
and cr1.other_contact_id = cr2.other_contact_id
and cr1.recorded = cr2.max_recorded;
which i had to do twice (second time into a temp table named mrcr2) because mysql has a limitation where you can't alias the same temp table twice in one query.
with my two temp tables created my query then becomes:
select
mrcr1.contact_id,
mrcr1.other_contact_id,
case when (mrcr1.strength < mrcr2.strength) then
mrcr1.strength
else
mrcr2.strength
end strength
from
mrcr1,
mrcr2
where
mrcr1.contact_id = mrcr2.other_contact_id
and mrcr1.other_contact_id = mrcr2.contact_id
and mrcr1.contact_id != mrcr1.other_contact_id
and mrcr2.contact_id != mrcr2.other_contact_id
and mrcr1.contact_id <= mrcr1.other_contact_id;