Help Optimizing MySQL Table (~500,000 records) and PHP Code

I have a MySQL table that collects player data from various game servers (Urban Terror). The bot that collects the data runs 24/7, and the table is currently up to about 475,000+ records. Because of this, querying this table from PHP has become quite slow. I wonder what I can do on the database side of things to make it as optimized as possible; then I can focus on the application that queries the database. The table is as follows:
CREATE TABLE IF NOT EXISTS `people` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(40) NOT NULL,
`ip` int(4) unsigned NOT NULL,
`guid` varchar(32) NOT NULL,
`server` int(4) unsigned NOT NULL,
`date` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `Person` (`name`,`ip`,`guid`),
KEY `server` (`server`),
KEY `date` (`date`),
KEY `PlayerName` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COMMENT='People that Play on Servers' AUTO_INCREMENT=475843 ;
I'm storing the IPv4 addresses (ip and server) as 4-byte unsigned integers, using the MySQL functions INET_NTOA() and INET_ATON() to encode and decode them. I heard this way is faster than storing them as varchar(15).
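For reference, the round trip looks like this (output values shown in comments):
SELECT INET_ATON('192.168.0.1');  -- 3232235521, the unsigned int that gets stored
SELECT INET_NTOA(3232235521);     -- '192.168.0.1', decoded on the way back out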
The guid is an md5sum (32 hex chars), and date is stored as a Unix timestamp.
I have a unique key on name, ip, and guid, to avoid duplicate rows for the same player.
Do I have my keys set up right? Is the way I'm storing the data efficient?
Here is the code that queries this table. You search for a name, IP, or GUID; it takes the results of that query and cross-references other records that match the name, IP, or GUID from the first query's results, once for each field. This is hard to explain concisely, but basically: if I search for one player by name, I'll see every other name he has used, every IP he has used, and every GUID he has used.
<form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post">
Search: <input type="text" name="query" id="query" /><input type="submit" name="btnSubmit" value="Submit" />
</form>
<?php if (!empty($_POST['query'])) { ?>
<table cellspacing="1" id="1up_people" class="tablesorter" width="300">
<thead>
<tr>
<th>ID</th>
<th>Player Name</th>
<th>Player IP</th>
<th>Player GUID</th>
<th>Server</th>
<th>Date</th>
</tr>
</thead>
<tbody>
<?php
function super_unique($array)
{
$result = array_map("unserialize", array_unique(array_map("serialize", $array)));
foreach ($result as $key => $value)
{
if ( is_array($value) )
{
$result[$key] = super_unique($value);
}
}
return $result;
}
if (!empty($_POST['query'])) {
$query = trim($_POST['query']);
$count = 0;
$people = array();
$link = mysql_connect('localhost', 'mysqluser', 'yea right!');
if (!$link) {
die('Could not connect: ' . mysql_error());
}
mysql_select_db("1up");
$sql = "SELECT id, name, INET_NTOA(ip) AS ip, guid, INET_NTOA(server) AS server, date FROM 1up_people WHERE (name LIKE \"%$query%\" OR INET_NTOA(ip) LIKE \"%$query%\" OR guid LIKE \"%$query%\")";
$result = mysql_query($sql, $link);
if (!$result) {
die(mysql_error());
}
// Now take the initial results and parse each column into its own array
while ($row = mysql_fetch_array($result, MYSQL_NUM)) {
$name = htmlspecialchars($row[1]);
$people[] = array(
'id' => $row[0],
'name' => $name,
'ip' => $row[2],
'guid' => $row[3],
'server' => $row[4],
'date' => $row[5]
);
}
// now for each name, ip, guid in the results, find additional records
$people2 = array();
foreach ($people AS $person) {
$ip = $person['ip'];
$sql = "SELECT id, name, INET_NTOA(ip) AS ip, guid, INET_NTOA(server) AS server, date FROM 1up_people WHERE (ip = \"$ip\")";
$result = mysql_query($sql, $link);
while ($row = mysql_fetch_array($result, MYSQL_NUM)) {
$name = htmlspecialchars($row[1]);
$people2[] = array(
'id' => $row[0],
'name' => $name,
'ip' => $row[2],
'guid' => $row[3],
'server' => $row[4],
'date' => $row[5]
);
}
}
$people3 = array();
foreach ($people AS $person) {
$guid = $person['guid'];
$sql = "SELECT id, name, INET_NTOA(ip) AS ip, guid, INET_NTOA(server) AS server, date FROM 1up_people WHERE (guid = \"$guid\")";
$result = mysql_query($sql, $link);
while ($row = mysql_fetch_array($result, MYSQL_NUM)) {
$name = htmlspecialchars($row[1]);
$people3[] = array(
'id' => $row[0],
'name' => $name,
'ip' => $row[2],
'guid' => $row[3],
'server' => $row[4],
'date' => $row[5]
);
}
}
$people4 = array();
foreach ($people AS $person) {
$name = $person['name'];
$sql = "SELECT id, name, INET_NTOA(ip) AS ip, guid, INET_NTOA(server) AS server, date FROM 1up_people WHERE (name = \"$name\")";
$result = mysql_query($sql, $link);
while ($row = mysql_fetch_array($result, MYSQL_NUM)) {
$name = htmlspecialchars($row[1]);
$people4[] = array(
'id' => $row[0],
'name' => $name,
'ip' => $row[2],
'guid' => $row[3],
'server' => $row[4],
'date' => $row[5]
);
}
}
// Combine people and people2 into just people
$people = array_merge($people, $people2);
$people = array_merge($people, $people3);
$people = array_merge($people, $people4);
$people = super_unique($people);
foreach ($people AS $person) {
$date = ($person['date']) ? date("M d, Y", $person['date']) : 'Before 8/1/10';
echo "<tr>\n";
echo "<td>".$person['id']."</td>";
echo "<td>".$person['name']."</td>";
echo "<td>".$person['ip']."</td>";
echo "<td>".$person['guid']."</td>";
echo "<td>".$person['server']."</td>";
echo "<td>".$date."</td>";
echo "</tr>\n";
$count++;
}
// Find Total Records
//$result = mysql_query("SELECT id FROM 1up_people", $link);
//$total = mysql_num_rows($result);
mysql_close($link);
}
?>
</tbody>
</table>
<p>
<?php
echo $count." Records Found for \"".$_POST['query']."\" out of $total";
?>
</p>
<?php
}
$time_stop = microtime(true);
print("Done (ran for ".round($time_stop-$time_start)." seconds).");
?>
Any help at all is appreciated!
Thank you.

SELECT id,
name,
Inet_ntoa(ip) AS ip,
guid,
Inet_ntoa(server) AS server,
date
FROM 1up_people
WHERE ( name LIKE "%$query%"
OR Inet_ntoa(ip) LIKE "%$query%"
OR guid LIKE "%$query%" )
Some issues with the above query:
1. The query uses 3 fields in the WHERE clause and ORs the conditions on each field. MySQL will generally use only one index per table in a query, so it has to pick an index on either name or ip or guid. Even if there were a compound index (name, ip, guid), it could not be used here because the conditions are OR-ed. A better way to write such queries is to use UNION. E.g.:
SELECT <fields> FROM table1 WHERE field1='val1' /*will use index on field1*/
UNION
SELECT <fields> FROM table1 WHERE field2='val2' /*will use index on field2*/
...
SELECT <fields> FROM table1 WHERE fieldn='valn' /*will use index on fieldn*/
In this form you do a SELECT on each field separately and then UNION them, which allows the index on each of those fields to be used, making the query efficient. Note that plain UNION already removes duplicate rows (it is shorthand for UNION DISTINCT); if duplicates are acceptable, UNION ALL is cheaper, since MySQL skips the de-duplication step. For this suggestion to work, the issues discussed below also need to be addressed. (There is no index on guid; it needs to be built.)
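Applied to this table, the rewrite would look roughly like the sketch below. It assumes the standalone indexes on ip and guid suggested later in this thread have been added, and that the name condition uses a prefix match so the index stays usable (see issue 2):
SELECT id, name, INET_NTOA(ip) AS ip, guid, INET_NTOA(server) AS server, date
FROM 1up_people WHERE name LIKE '$query%'          /* prefix match; can use the name index */
UNION
SELECT id, name, INET_NTOA(ip) AS ip, guid, INET_NTOA(server) AS server, date
FROM 1up_people WHERE ip = INET_ATON('$query')     /* assumes a new index on ip */
UNION
SELECT id, name, INET_NTOA(ip) AS ip, guid, INET_NTOA(server) AS server, date
FROM 1up_people WHERE guid = '$query';             /* assumes a new index on guid */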
2. The conditions use LIKE '%query%' for name and guid, i.e. a wildcard (%) at the beginning. This means the index cannot be used even if it exists. An index can be used when you compare with = or put the wildcard only at the end of the string, as in 'query%'; with a leading %, the index will not be used. (Ref: http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html). A possible way out is to use the wildcard only at the end, or to use full-text indexing on these fields.
3. The condition on ip is INET_NTOA(ip) LIKE "%query%". When a function is applied to a field, no index on that field can be used, and MySQL does not support functional indexes as of now. If such a query needs to be supported, you may have to store this field as a varchar as well and treat it like name and guid.
4. Because of the above issues, the query will always do a full table scan and will not use any index. Using UNION (as suggested in 1) will not provide any improvement if 2 and 3 are not fixed; in fact, it may hurt performance, since it may do 3 table scans instead of 1. You can try creating a full-text index on (name, guid, ip_string) and writing your query as MATCH(name, guid, ip_string) AGAINST ("$query").
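A sketch of that suggestion, assuming a new ip_string varchar column is added and kept populated alongside ip (MyISAM supports FULLTEXT indexes):
ALTER TABLE 1up_people
  ADD COLUMN ip_string VARCHAR(15) NOT NULL DEFAULT '',
  ADD FULLTEXT INDEX ft_people (name, guid, ip_string);

SELECT id, name, ip_string AS ip, guid, INET_NTOA(server) AS server, date
FROM 1up_people
WHERE MATCH(name, guid, ip_string) AGAINST ('$query');
Keep in mind that MyISAM full-text search ignores words shorter than ft_min_word_len (4 by default), which matters for short player names.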
From looking at the code, I see that after getting the results from the above query, subsequent queries are fired based on those results. I am not sure that is required, as I think it will not find any new records: when you search for f LIKE '%q%' and then use the results to do searches like f='r1', the LIKE condition should already have captured all occurrences of 'r1', so the subsequent queries will only return duplicate results. In my opinion the additional queries can be skipped, but maybe I am missing something.
On a side note, do not interpolate the query string into the SQL statement as in name LIKE "%$query%". This is not secure and is open to SQL injection attacks. Use prepared statements with bound variables.
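For illustration, here is the same binding idea expressed with MySQL's server-side prepared statements (in PHP you would normally do this through mysqli or PDO; the statement and variable names here are made up):
PREPARE find_player FROM
  'SELECT id, name, INET_NTOA(ip) AS ip, guid FROM 1up_people WHERE name = ?';
SET @q = 'SomePlayer';
EXECUTE find_player USING @q;
DEALLOCATE PREPARE find_player;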

Since your table is MyISAM, create FULLTEXT indexes, which will perform better than LIKE '%%'.
To avoid all the queries in the loop, insert the results of the main query into a temporary table, which you can then use to query the related records:
Example
Instead of the primary SELECT, insert the rows first:
CREATE TEMPORARY TABLE IF NOT EXISTS `tmp_people` (
`id` bigint(20) unsigned NOT NULL,
`name` varchar(40) NOT NULL,
`ip` int(4) unsigned NOT NULL,
`guid` varchar(32) NOT NULL,
`server` int(4) unsigned NOT NULL,
`date` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `server` (`server`),
KEY `date` (`date`),
KEY `PlayerName` (`name`)
);
TRUNCATE TABLE tmp_people;
INSERT tmp_people
SELECT id, name, ip AS ip, guid, server AS server, date
FROM up_people
WHERE (name LIKE '%$query%' OR INET_NTOA(ip) LIKE '%$query%' OR guid LIKE '%$query%')
Then, query the results:
SELECT id, name, INET_NTOA(ip) AS ip, guid, INET_NTOA(server) AS server, date FROM tmp_people;
Finally, instead of looping over individual records, query all related records in the same select:
To get the related by ip:
SELECT up.id, up.name, INET_NTOA(up.ip) AS ip, up.guid, INET_NTOA(up.server) AS server, up.date FROM up_people up JOIN tmp_people tmp ON up.ip = tmp.ip
To get the related by guid:
SELECT up.id, up.name, INET_NTOA(up.ip) AS ip, up.guid, INET_NTOA(up.server) AS server, up.date FROM up_people up JOIN tmp_people tmp ON up.guid = tmp.guid;
To get the related by name:
SELECT up.id, up.name, INET_NTOA(up.ip) AS ip, up.guid, INET_NTOA(up.server) AS server, up.date FROM up_people up JOIN tmp_people tmp ON up.name = tmp.name
Side notes:
You do not need the PlayerName index, since name is the left-most field in the Person index.
There is no index on the guid field, so the query that finds related records by guid will be slow.
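The fix is a one-liner per table (table names as used above):
ALTER TABLE up_people ADD INDEX guid (guid);
ALTER TABLE tmp_people ADD INDEX guid (guid);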

Going back to the original structure, I would get rid of the composite index on (name, ip, guid) and create a non-unique index on name, and another non-unique index on ip.
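In DDL terms, that change is roughly (a sketch against the original CREATE TABLE; note that PlayerName already provides the non-unique index on name):
ALTER TABLE people
  DROP INDEX Person,      -- drop the composite unique key
  ADD INDEX ip (ip);      -- new non-unique index on ip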
I am not sure what to do about the guid. If you want to prevent duplicate player records, and neither the name alone nor the name-with-ip is sufficient to guarantee uniqueness, perhaps appending an auto-incrementing integer converted to a string, rather than a guid, would be better.
As others have noted, "contains substring" searches (i.e. %foo%) cannot take full advantage of an index; since the substring could occur in any or every indexed value, the entire index would have to be scanned. On the other hand, "starts-with" searches (i.e. foo%) are able to take advantage of an index.
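Concretely, with the non-unique index on name in place:
SELECT id, name FROM people WHERE name LIKE 'foo%';   -- starts-with: index range scan
SELECT id, name FROM people WHERE name LIKE '%foo%';  -- contains: full scan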

Related

Select row with all related child rows as array in one query

I'm using Postgres (latest) with Node (latest) and pg (latest). An endpoint is receiving JSON which looks like:
{
"id": 12345,
"total": 123.45,
"items": [
{
"name": "blue shirt",
"url": "someurl"
},
{
"name": "red shirt",
"url": "someurl"
}
]
}
So I'm storing this in two tables:
CREATE TABLE orders (
id INT NOT NULL,
total NUMERIC(10, 2) DEFAULT 0 NOT NULL,
PRIMARY KEY (id)
);
CREATE INDEX index_orders_id ON orders(id);
CREATE TABLE items (
id BIGSERIAL NOT NULL,
order_id INT NOT NULL,
name VARCHAR(128) NOT NULL,
url VARCHAR(128) DEFAULT '' NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE
);
CREATE INDEX index_items_id ON items(id);
The items table has a FK of order_id to relate the id of the order to its respective items.
Now, the issue is I almost always need to fetch the order along with the items.
How do I get an output similar to my input json in one query?
I know it can be done in two queries, but this pattern will be all over the place and needs to be efficient. My last resort would be to store the items as JSONB column directly in the orders table, but then if I need to query on the items or do joins with them it won't be as easy.
One of many ways:
SELECT jsonb_pretty(
to_jsonb(o.*) -- taking whole row
|| (SELECT jsonb_build_object('items', jsonb_agg(i))
FROM (
SELECT name, url -- picking columns
FROM items i
WHERE i.order_id = o.id
) i
)
)
FROM orders o
WHERE o.id = 12345;
This returns formatted text similar to the displayed input. (But keys are sorted, so 'total' comes after 'items'.)
If an order has no items, you get "items": null.
For a jsonb value, strip the jsonb_pretty() wrapper.
I chose jsonb for its additional functionality - like the jsonb || jsonb → jsonb operator and the jsonb_pretty() function.
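If an empty array is preferred over null for item-less orders, one untested variant against the same tables wraps the aggregate in COALESCE:
SELECT to_jsonb(o.*)
    || jsonb_build_object(
         'items',
         COALESCE(
           (SELECT jsonb_agg(jsonb_build_object('name', i.name, 'url', i.url))
            FROM items i
            WHERE i.order_id = o.id),
           '[]'::jsonb))
FROM orders o
WHERE o.id = 12345;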
Related:
Return multiple columns of the same row as JSON array of objects
If you want a json value instead, you can cast the jsonb directly (without formatting) or cast the formatted text (with formatting). Or build a json value with rudimentary formatting directly (faster):
SELECT row_to_json(sub, true)
FROM (
SELECT o.*
, (SELECT json_agg(i)
FROM (
SELECT name, url -- pick columns to report
FROM items i
WHERE i.order_id = o.id
) i
) AS items
FROM orders o
WHERE o.id = 12345
) sub;
db<>fiddle here
It all depends on what you need exactly.
Aside:
Consider type text (or varchar) instead of the seemingly arbitrary varchar(128). See:
Should I add an arbitrary length limit to VARCHAR columns?
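Under that advice, the items table might simply become (a hypothetical variant of the DDL above):
CREATE TABLE items (
  id       bigserial PRIMARY KEY,
  order_id int NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
  name     text NOT NULL,
  url      text NOT NULL DEFAULT ''
);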

GORM preload: How to use a custom table name

I have a GORM query with a preload that works just fine because I'm binding it to a struct called "companies" which is also the name of the corresponding database table:
var companies []Company
db.Preload("Subsidiaries").Joins("LEFT JOIN company_products ON company_products.company_id = companies.id").Where("company_products.product_id = ?", ID).Find(&companies)
Now I want to do something similar, but bind the result to a struct that does not have a name that refers to the "companies" table:
var companiesFull []CompanyFull
db.Preload("Subsidiaries").Joins("LEFT JOIN company_products ON company_products.company_id = companies.id").Where("company_products.product_id = ?", ID).Find(&companiesFull)
I've simplified the second call for better understanding, the real call has more JOINs and returns more data, so it can't be bound to the "companies" struct.
I'm getting an error though:
column company_subsidiaries.company_full_id does not exist
The corresponding SQL query:
SELECT * FROM "company_subsidiaries" WHERE "company_subsidiaries"."company_full_id" IN (2,1)
There is no "company_subsidiaries.company_full_id", the correct query should be:
SELECT * FROM "company_subsidiaries" WHERE "company_subsidiaries"."company_id" IN (2,1)
The condition obviously gets generated from the name of the struct the result is being bound to. Is there any way to specify a custom name for this case?
I'm aware of the Tabler interface technique; however, I believe it doesn't work for Preload (I tried it, and it changes the table name of the main query, but not of the preload).
Updated: More info about the DB schema and structs
DB schema
TABLE companies
ID Primary key
OTHER FIELDS
TABLE products
ID Primary key
OTHER FIELDS
TABLE subsidiaries
ID Primary key
OTHER FIELDS
TABLE company_products
ID Primary key
Company_id Foreign key (companies.id)
Product_id Foreign key (products.id)
TABLE company_subsidiaries
ID Primary key
Company_id Foreign key (companies.id)
Subsidiary_id Foreign key (subsidiaries.id)
Structs
type Company struct {
Products []*Product `json:"products" gorm:"many2many:company_products;"`
ID int `json:"ID,omitempty"`
}
type CompanyFull struct {
Products []*Product `json:"products" gorm:"many2many:company_products;"`
Subsidiaries []*Subsidiary `json:"subsidiaries" gorm:"many2many:company_products;"`
ID int `json:"ID,omitempty"`
}
type Product struct {
Name string `json:"name"`
ID int `json:"ID,omitempty"`
}
type Subsidiary struct {
Name string `json:"name"`
ID int `json:"ID,omitempty"`
}
Generated SQL (by GORM)
SELECT * FROM "company_subsidiaries" WHERE "company_subsidiaries"."company_full_id" IN (2,1)
SELECT * FROM "subsidiaries" WHERE "subsidiaries"."id" IN (NULL)
SELECT companies.*, company_products.*, FROM "companies" LEFT JOIN company_products ON company_products.company_id = companies.id WHERE company_products.product_id = 1
It seems the way to go in this case may be to customize the relationship in your CompanyFull model. Using joinForeignKey, the following code works.
type CompanyFull struct {
Products []*Product `json:"products" gorm:"many2many:company_products;joinForeignKey:ID"`
Subsidiaries []*Subsidiary `json:"subsidiaries" gorm:"many2many:company_subsidiaries;joinForeignKey:ID"`
ID int `json:"ID,omitempty"`
}
func (CompanyFull) TableName() string {
return "companies"
}
func main(){
...
result := db.Preload("Subsidiaries").Joins("LEFT JOIN company_products ON company_products.company_id = companies.id").Where("company_products.product_id = ?", ID).Find(&companies)
if result.Error != nil {
log.Println(result.Error)
} else {
log.Printf("%#v", companies)
}
}
For more info regarding customizing the foreign keys used in relationships, take a look at the docs https://gorm.io/docs/many_to_many.html#Override-Foreign-Key

Efficiently mapping one-to-many many-to-many database to struct in Golang

Question
When dealing with a one-to-many or many-to-many SQL relationship in Golang, what is the best (efficient, recommended, "Go-like") way of mapping the rows to a struct?
Taking the example setup below I have tried to detail some approaches with Pros and Cons of each but was wondering what the community recommends.
Requirements
Works with PostgreSQL (can be generic but not include MySQL/Oracle specific features)
Efficiency - No brute forcing every combination
No ORM - Ideally using only database/sql and jmoiron/sqlx
Example
For the sake of clarity I have removed error handling.
Models
type Tag struct {
ID int
Name string
}
type Item struct {
ID int
Tags []Tag
}
Database
CREATE TABLE item (
id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
);
CREATE TABLE tag (
id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
name VARCHAR(160),
item_id INT REFERENCES item(id)
);
Approach 1 - Select all Items, then select tags per item
var items []Item
sqlxdb.Select(&items, "SELECT * FROM item")
for i, item := range items {
var tags []Tag
sqlxdb.Select(&tags, "SELECT * FROM tag WHERE item_id = $1", item.ID)
items[i].Tags = tags
}
Pros
Simple
Easy to understand
Cons
Inefficient with the number of database queries increasing proportional with number of items
Approach 2 - Construct SQL join and loop through rows manually
var itemTags = make(map[int][]Tag)
var items = []Item{}
rows, _ := sqlxdb.Queryx("SELECT i.id, t.id, t.name FROM item AS i JOIN tag AS t ON t.item_id = i.id")
for rows.Next() {
var (
itemID int
tagID int
tagName string
)
rows.Scan(&itemID, &tagID, &tagName)
if tags, ok := itemTags[itemID]; ok {
itemTags[itemID] = append(tags, Tag{ID: tagID, Name: tagName,})
} else {
itemTags[itemID] = []Tag{Tag{ID: tagID, Name: tagName,}}
}
}
for itemID, tags := range itemTags {
items = append(items, Item{
ID: itemID,
Tags: tags,
})
}
Pros
A single database call and cursor that can be looped through without eating too much memory
Cons
Complicated and harder to develop with multiple joins and many attributes on the struct
More memory usage and processing time, traded for fewer network calls
Failed approach 3 - sqlx struct scanning
Despite failing, I want to include this approach, as it is my current aim: efficiency paired with development simplicity. My hope was that by explicitly setting the db tag on each struct field, sqlx could do some advanced struct scanning.
var items []Item
sqlxdb.Select(&items, "SELECT i.id AS item_id, t.id AS tag_id, t.name AS tag_name FROM item AS i JOIN tag AS t ON t.item_id = i.id")
Unfortunately this errors out with "missing destination name tag_id in *[]Item", leading me to believe that StructScan is not advanced enough to recursively loop through rows (no criticism; it is a complicated scenario).
Possible approach 4 - PostgreSQL array aggregators and GROUP BY
While I am sure this will not work as written, I have included this untested option to see whether it could be improved to the point where it works.
var items = []Item{}
sqlxdb.Select(&items, "SELECT i.id as item_id, array_agg(t.*) as tags FROM item AS i JOIN tag AS t ON t.item_id = i.id GROUP BY i.id")
When I have some time I will try and run some experiments here.
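For the record, here is a corrected but still untested sketch of approach 4 in plain SQL, aggregating the tags as JSON per item (the LEFT JOIN plus FILTER clause gives tag-less items an empty array). The Go side would still have to unmarshal the tags JSON string, much as the answers below do:
SELECT i.id AS item_id,
       COALESCE(
         json_agg(json_build_object('id', t.id, 'name', t.name))
           FILTER (WHERE t.id IS NOT NULL),
         '[]'::json) AS tags
FROM item i
LEFT JOIN tag t ON t.item_id = i.id
GROUP BY i.id;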
The SQL in Postgres:
create schema temp;
set search_path = temp;
create table item
(
id INT generated by default as identity primary key
);
create table tag
(
id INT generated by default as identity primary key,
name VARCHAR(160),
item_id INT references item (id)
);
create view item_tags as
select id,
(
select
array_to_json(array_agg(row_to_json(taglist.*))) as array_to_json
from (
select tag.name, tag.id
from tag
where item_id = item.id
) taglist ) as tags
from item ;
Then Golang queries this SQL:
select row_to_json(row)
from (
select * from item_tags
) row;
and unmarshal the result into a Go struct.
Pros:
Postgres manages the data relations, and you add/update data with SQL functions.
Golang manages the business model and logic.
It's an easy way to split the responsibilities.
I can suggest another approach which I have used before:
you build a JSON array of the tags directly in the query and return it.
Pros: You have 1 call to the db, which aggregates the data, and all you have to do is parse the JSON into an array.
Cons: It's a bit ugly. Feel free to bash me for it.
type jointItem struct {
Item
ParsedTags string
Tags []Tag `gorm:"-"`
}
var jointItems []*jointItem
db.Raw(`SELECT
items.*,
(SELECT CONCAT(
'[',
GROUP_CONCAT(
JSON_OBJECT('id', tags.id,
'name', tags.name
)
),
']'
)
FROM tags
WHERE tags.item_id = items.id) as parsed_tags
FROM items`).Scan(&jointItems)
for _, o := range jointItems {
var tempTags []Tag
if err := json.Unmarshal([]byte(o.ParsedTags), &tempTags); err != nil {
// do something
}
o.Tags = tempTags
}
Edit: the code might behave weirdly, so I find it better to unmarshal into a temporary tags array and then assign it, rather than unmarshalling into the same struct.
You can use carta.Map() from https://github.com/jackskj/carta.
It tracks has-many relationships automatically.

MetaColumns() for a MS SQL view?

I am trying to use ADODB to work with a MS SQL database containing views.
MetaColumns() works well with tables, but returns an empty array when I pass a view name as the parameter. Further research shows that $metaColumnsSQL uses the sys.tables object to resolve column names, so it doesn't appear to be intended for views. Is there a way to obtain the column names for a view object?
ADOdb cannot provide metaColumns() for a view, because its basis is the interrogation of the schema for the objects associated with a single table.
You can emulate metaColumns() for a view by using the fetchField() method as follows, using Northwind:
$SQL = " CREATE VIEW vEmpTerritories AS
SELECT employees.employeeId, EmployeeTerritories.TerritoryId
FROM employees, EmployeeTerritories
WHERE EmployeeTerritories.employeeId = employees.employeeId";
$db->execute($SQL);
$SQL = "SELECT * FROM vEmpTerritories";
$recordSet = $db->execute($SQL);
$cols = $recordSet->fieldCount();
for($i=0;$i<$cols;$i++){
$fld = $recordSet->FetchField($i);
print_r($fld);
}
This returns an array of ADOFieldObject objects with basic information about each column:
ADOFieldObject Object
(
[name] => employeeId
[max_length] =>
[type] => int
[column_source] => employeeId
)
ADOFieldObject Object
(
[name] => TerritoryId
[max_length] => 20
[type] => nvarchar
[column_source] => TerritoryId
)
Unfortunately, the data returned by fetchField() is not as detailed as that from metaColumns(), but it may be sufficient for your needs.
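If fetchField() is too limited, another option is to query the catalog directly; in SQL Server, INFORMATION_SCHEMA.COLUMNS covers views as well as tables:
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'vEmpTerritories'
ORDER BY ORDINAL_POSITION;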

Prefetch related row or nothing with DBIx::Class, maybe with OUTER LEFT JOIN?

I want to retrieve rows from a table with DBIx::Class and prefetch the corresponding rows from the same table where a column has a particular other value. I need to fetch all assignments from schedule A (to copy them) and retrieve all corresponding assignments from schedule B.
I have made up tables for testing which look like this:
CREATE TABLE tasks (
id INTEGER
);
CREATE TABLE schedules (
id INTEGER
);
CREATE TABLE assignments (
id INTEGER,
scheduleId INTEGER,
taskId INTEGER,
worker TEXT,
FOREIGN KEY (scheduleId) REFERENCES schedules(id),
FOREIGN KEY (taskId) REFERENCES tasks(id)
);
There are some assignments for schedule 1, and a few for schedule 2:
INSERT INTO tasks (id) VALUES (1);
INSERT INTO tasks (id) VALUES (2);
INSERT INTO schedules (id) VALUES (1);
INSERT INTO schedules (id) VALUES (2);
INSERT INTO assignments (id,scheduleId,taskId,worker) VALUES (1,1,1,"Alice");
INSERT INTO assignments (id,scheduleId,taskId,worker) VALUES (2,1,2,"Bob");
INSERT INTO assignments (id,scheduleId,taskId,worker) VALUES (3,2,1,"Charly");
This is some SQL that returns the desired result:
SELECT * FROM assignments AS a1
LEFT OUTER JOIN assignments AS a2 ON
a2.scheduleId = 2 AND
a2.taskId = a1.taskId
WHERE a1.scheduleId = 1;
In SQLite this works as expected: the result shows a line for each assignment from schedule 1 and the respective assignment from schedule 2.
id|scheduleId|taskId|worker|id|scheduleId|taskId|worker
1|1|1|Alice|3|2|1|Charly
2|1|2|Bob|NULL|NULL|NULL|NULL
What I've tried with DBIx::Class so far doesn't work. This is what the class for assignments looks like:
package MyApp::Schema::Result::Assignment;
...
__PACKAGE__->has_many(
inAllSchedules => 'MyApp::Schema::Result::Assignment',
{
'foreign.taskId' => 'self.taskId',
}
);
The following code correctly joins the rows, but returns only those rows from schedule 1 which actually have a corresponding row in schedule 2:
my $assignments = $schema->resultset('Assignment')->search({
'inAllSchedules.scheduleId' => 2,
}, {
prefetch => 'inAllSchedules',
});
This code also joins the rows correctly, and returns rows with no joined row too, but I don't know how to filter the joined rows. I do not want to retrieve rows for schedule 3 etc., or just any other row.
my $assignments = $schema->resultset('Assignment')->search(undef, {
join_type => 'left outer',
prefetch => 'inAllSchedules',
});
I cannot write a specific relationship, because the ID of schedule A or B is only given at runtime, of course.
How can I either generate the given SQL or otherwise retrieve the data in a clean way?
This looks like a case for custom join conditions. The solution below works for your limited example, but may need tweaking for your actual application.
__PACKAGE__->has_many(
'inAllSchedules' => "MyApp::Schema::Result::Assignment",
sub {
my $args = shift;
return {
"$args->{foreign_alias}.taskId" => { '-ident' => "$args->{self_alias}.taskId" },
"$args->{foreign_alias}.id" => { '<>' => { '-ident' => "$args->{self_alias}.id" } },
};
}
);
You would use it like so:
my $assignments = $schema->resultset("Assignment")->search({
'me.scheduleId' => 1,
'inAllSchedules.scheduleId' => [ 2, undef ],
},
{
'prefetch' => "inAllSchedules",
});