SQL Query for Joining Tables Using a Column's Split Strings - sql

Here's what I'm trying to do.
I have these:
Table1:
Name | Surname | Age | Location | ContactPeopleIds
John | Cobaing | 25 | Turkey | 1234,1512,1661, 2366,
Jack | Maltean | 29 | Italy | 6155,2333,1633,
Table2:
ID | Name | LastName | Location
1234 | Meg | Ryan | US
1512 | Jesy | Jade | US
1661 | John | Kradel | US
2366 | Jack | Abdona | Nigeria
TableIWant
Name | Surname | Age | Location | ContactPeopleNames
John | Cobaing | 25 | Turkey | Meg Ryan, Jesy Jade, John Kradel, Jack Abdona
I have found a splitter function called fn_ParseText2Table(data, splitter) that creates a table from the data split by the splitter character. (Reference here)
For example:
select *
from dbo.fn_ParseText2Table('1234,1512,1661,2366', ',')
function produces:
int_value | num_value | txt_value
null | null | 1234
null | null | 1512
null | null | 1661
null | null | 2366
But I couldn't create a query using this.
I'm not sure whether to use T-SQL or not.
I've tried a common table expression but couldn't manage that either.
If you can provide multiple solutions, it would be very kind of you to detail their performance differences.

OK...
When you suggested that you'd tried a CTE you were heading in the right direction.
What you need to do, however, is chain three CTEs together. Once you have the processing chain you pass things through it progressively, like a filter: first splitting the IDs into a column of ints, then joining the ints to table 2 to get the names, then recombining those names.
As has been previously mentioned, whoever designed this designed it badly, but assuming you're using MS SQL Server and T-SQL, the following code will do what you need:
DECLARE @tempString AS varchar(max)
SET @tempString = ''
;WITH firstCte AS
(
SELECT
CAST('<M>' + REPLACE(contactpeopleids, ',','</M><M>') + '</M>' AS XML) AS Names
FROM
soTable1
-- THIS WHERE CLAUSE MUST MATCH THE FINAL WHERE CLAUSE
WHERE
name = 'John'
AND surname = 'Cobaing'
)
,secondCte AS
(
SELECT
Split.a.value('.','VARCHAR(100)') AS NameIds
FROM
firstCte
CROSS APPLY Names.nodes('/M') Split(a)
)
,thirdCte AS
(
SELECT
t2.name + ' ' + t2.lastname AS theContactName
FROM
secondCte t1
-- NOTE: IF THE IDS YOU EXTRACT FROM TABLE 1 DO NOT HAVE A MATCH IN TABLE 2 YOU WILL GET NO RESULT FOR THAT ID HERE!
-- IF YOU WANT NULL RESULTS CHANGE THIS TO A 'LEFT JOIN'
INNER JOIN
soTable2 t2 ON t1.NameIds = t2.id
)
SELECT
@tempString = @tempString + ',' + theContactName
FROM
thirdCte
;
-- The select substring is used to remove the leading ','
SELECT
name,
surname,
age,
location,
SUBSTRING(@tempString,2,LEN(@tempString)) AS contactpeoplenames
FROM
soTable1
WHERE
name = 'John'
AND surname = 'Cobaing'
It's probably not as elegant as it could be, and for ease of use you might want to wrap it up in a user-defined function and pass in the first name and surname of the person to look up. If you do it that way, you can then use the function in a regular SQL SELECT query to return rows directly out of table 1 into a view or another table.
The fun part of it all is actually the way we trick SQL Server into splitting the string. You'll notice that we replace each ',' with XML tags, then use the XML processing functions to make SQL Server think it is processing an XML string.
SQL Server has had great routines for this kind of task since the 2005 version, and allows whole blocks of XML to be serialised and de-serialised to/from a varchar field directly in your DB table; by making SQL Server think it's dealing with XML, we let it do most of the hard work for us.
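For comparison: on newer SQL Server versions the XML trick is no longer necessary. A minimal sketch of the same lookup using STRING_SPLIT (2016+) and STRING_AGG (2017+) against the same soTable1/soTable2 names; the trailing commas in the source data just produce empty tokens that match nothing in the join:
-- Sketch for SQL Server 2017+: STRING_SPLIT does the splitting,
-- STRING_AGG the recombining; no XML or variable accumulation needed.
SELECT
    t1.name,
    t1.surname,
    t1.age,
    t1.location,
    agg.contactpeoplenames
FROM
    soTable1 t1
CROSS APPLY
(
    SELECT STRING_AGG(t2.name + ' ' + t2.lastname, ', ') AS contactpeoplenames
    FROM STRING_SPLIT(t1.contactpeopleids, ',') AS ids
    INNER JOIN soTable2 t2 ON t2.id = TRY_CAST(ids.value AS int)
) AS agg
WHERE
    t1.name = 'John'
    AND t1.surname = 'Cobaing';
Because everything stays set-based, this also avoids the per-person variable accumulation above and works for all rows of soTable1 at once if you drop the WHERE clause.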

**NORMALIZED EXAMPLE OF A SELF-REFERENCING ONE-TO-MANY RELATIONSHIP**
Study this example and adapt it to your case. I wrote it quickly (it is not final code; for example, nothing handles MySQL failures).
Put in your MySQL host, username, and password.
<?PHP
echo '<pre>';
//mysql connect
mysql_connect('localhost', 'root','');
mysql_select_db("test");
//add some testing data
addTestingData();
//suppose this comes from a user
$_POST['user_id']=1;
//get all contacts of user with id = 1
$sql =
"SELECT `tbl_users`.`user_id`, `user_name`,
`user_surname`,`user_location` from `tbl_users`
LEFT JOIN `tbl_user_contacts`
ON `tbl_users`.`user_id`=`tbl_user_contacts`.`contact`
where `tbl_user_contacts`.`user_id`=".
mysql_real_escape_string($_POST['user_id'])." ";
//get data from mysql
$result = mysql_query($sql ) ;
while($row= mysql_fetch_row($result) )
print_r( $row );
///////////////end////////////////////////////////////////////
function addTestingData()
{
mysql_query("DROP TABLE IF EXISTS `tbl_users`");
mysql_query("
CREATE TABLE `tbl_users` (
`user_id` MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT ,
`user_name` VARCHAR(50) NOT NULL,
`user_surname` VARCHAR(50) NOT NULL,
`user_location` VARCHAR(50) NOT NULL,
`user_age` smallint not null,
PRIMARY KEY (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
");
for($i=1;$i<21;$i++) {
mysql_query("
insert into `tbl_users` (`user_name`,`user_surname`,`user_location`,
`user_age` ) values
('name{$i}','surname{$i}', 'location{$i}', '{$i}' ) ") ;
}
mysql_query("DROP TABLE IF EXISTS `tbl_user_contacts`");
mysql_query("
CREATE TABLE `tbl_user_contacts` (
`user_id` MEDIUMINT UNSIGNED NOT NULL ,
`contact` MEDIUMINT UNSIGNED NOT NULL ,
`other_field_testing` VARCHAR(30) NOT NULL,
PRIMARY KEY (`user_id`,`contact`),
CONSTRAINT `tbl_contact_fk1` FOREIGN KEY (`user_id`)
REFERENCES `tbl_users` (`user_id`)
ON DELETE CASCADE ,
CONSTRAINT `tbl_contact_fk2` FOREIGN KEY (`contact`)
REFERENCES `tbl_users` (`user_id`)
ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
");
$tmp = array(); // helps avoid duplicate entries while testing
for($i=1;$i<99;$i++) {
$contact=rand(1,20);
$user_id=rand(1,20);
if(!in_array($contact.$user_id,$tmp))
{
$tmp[]=$contact.$user_id;
mysql_query("
insert into `tbl_user_contacts` (`user_id`,`contact`,`other_field_testing` )
values ('{$user_id}','{$contact}','optional-testing') ") ;
}//end of if
}//end of for
}//end of function
?>
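For completeness: with this normalized schema, the ContactPeopleNames column from the original question falls out of a single MySQL query; GROUP_CONCAT does the recombining, so nothing ever needs splitting. A sketch against the tables the script above creates:
-- Rebuild the 'ContactPeopleNames' column from the normalized tables.
SELECT u.user_id,
       u.user_name,
       u.user_surname,
       u.user_age,
       u.user_location,
       GROUP_CONCAT(CONCAT(c.user_name, ' ', c.user_surname)
                    ORDER BY c.user_surname SEPARATOR ', ') AS contact_people_names
FROM tbl_users u
JOIN tbl_user_contacts uc ON uc.user_id = u.user_id
JOIN tbl_users c          ON c.user_id  = uc.contact
GROUP BY u.user_id, u.user_name, u.user_surname, u.user_age, u.user_location;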

Related

Postgres - Optimized dynamic headers in separate table

I have 2 tables in PostgreSQL:
CREATE TABLE contacts (
id bigint NOT NULL,
header_1 text NOT NULL,
header_2 text,
header_3 text );
CREATE TABLE headers (
id bigint NOT NULL,
name character varying,
header_type text NOT NULL,
organization_id bigint );
INSERT INTO contacts
(id, header_1, header_2, header_3)
VALUES
(1,'bob1@hotmail.com','Bob1','lol'),
(2,'bob2@hotmail.com','Bob2','lol'),
(3,'bob3@hotmail.com','Bob3','lol'),
(4,'bob4@hotmail.com','Bob4','lol'),
(5,'bob5@hotmail.com','Bob5','lol');
INSERT INTO headers
(id, name, header_type, organization_id)
VALUES
(1,'Email','email', 1),
(2,'First Name','first_name', 1),
(3,'Last Name','last_name', 1);
I want to end up with the structure below. The tricky part is that the headers are dynamic: there can be any number of headers, the "contacts" columns will always start with 'header_', and the "headers" rows will always match the contact column ids.
Email | First Name | Last Name
------------------|------------|-----------
bob1@hotmail.com | Bob1 | lol
bob2@hotmail.com | Bob2 | lol
bob3@hotmail.com | Bob3 | lol
bob4@hotmail.com | Bob4 | lol
bob5@hotmail.com | Bob5 | lol
Optimized queries are preferred.
EDIT: Just to clarify:
1. There can be any number of contact tables (contact1, contact2, etc.).
2. There can be any number of rows in both the header and contact tables.
3. You can assume the data will always be integral: if table "contacts24" has a column named "header_57", you can assume there's going to be a row in the headers table with id 57.
In SQL, a table cannot have a different number of columns for each row, so you cannot have a dynamic count of headers for each row of your contacts table.
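One workaround is to generate the column list from the headers table and run it as dynamic SQL. A rough PL/pgSQL sketch, assuming (per the clarification above) that header_N in contacts corresponds to headers.id = N; because the output columns aren't known in advance, it materializes the result into a temp table:
-- Build 'header_1 AS "Email", header_2 AS "First Name", ...' from the
-- headers table and execute it; the temp table carries the dynamic columns.
DO $$
DECLARE
    cols text;
BEGIN
    SELECT string_agg(format('header_%s AS %I', h.id, h.name), ', ' ORDER BY h.id)
    INTO cols
    FROM headers h;

    EXECUTE format('CREATE TEMP TABLE contacts_pivoted AS SELECT %s FROM contacts', cols);
END $$;

SELECT * FROM contacts_pivoted;  -- Email | First Name | Last Name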

Create a table without knowing its columns in SQL

How can I create a table without knowing in advance how many and what columns it exactly holds?
The idea is that I have a table DATA that has 3 columns : ID, NAME, and VALUE
What I need is a way to get multiple values depending on the value of NAME - I can't do it with simple WHERE or JOIN (because I'll need other values - with other NAME values - later on in my query).
Because of the way this table is constructed I want to PIVOT it in order to transform every distinct value of NAME into a column so it will be easier to get to it in my later search.
What I want now is to somehow save this to a temp table / variable so I can use it later on to join with the result of another query...
So example:
Columns:
CREATE TABLE MainTab
(
id int,
nameMain varchar(max),
notes varchar(max)
);
CREATE TABLE SecondTab
(
id int,
id_mainTab int,
nameSecond varchar(max),
notes varchar(max)
);
CREATE TABLE DATA
(
id int,
id_second int,
name varchar(max),
value varchar(max)
);
Now some example data from the table DATA:
| id | id_second | name | value |
|-------------------------------------------------------|
| 1 | 5550 | number | 111115550 |
| 2 | 6154 | address | 1, First Avenue |
| 3 | 1784 | supervisor | John Smith |
| 4 | 3467 | function | Marketing |
| 5 | 9999 | start_date | 01/01/2000 |
::::
Now imagine that 'name' has A LOT of different values, and in one query I'll need to get a lot of different values depending on the value of 'name'...
That's why I pivot it so that number, address, supervisor, function, start_date, ... become columns.
This I do dynamically because of the amount of possible columns - it would take me a while to write all of them in an 'IN' statement - and I don't want to have to remember to add it manually every time a new 'name' value gets added...
Therefore I followed http://sqlhints.com/2014/03/18/dynamic-pivot-in-sql-server/
The thing is, now I want the result of my execute(@query) to be stored in a temp table / variable, so I can use it later on to join it with mainTab...
It would be nice if I could use @cols (which holds the values of DATA.name) but I can't seem to figure out a way to do this.
ADDITIONALLY:
If I use the non-dynamic way (writing all the values manually after 'IN') I still need to create a column called status. In this column (so far it's NULL everywhere because that value doesn't exist in my unpivoted table) I want to have 'open' or 'closed', depending on the date (let's say I have start_date and end_date):
CASE WHEN end_date < GETDATE() THEN 'closed' ELSE 'open' END AS status
Where can I put this statement? Let's say my main query looks like this:
SELECT * FROM
(SELECT id_second, name, value, id FROM TABLE_DATA) src
PIVOT (MAX(value) FOR name IN ([number], [address], [supervisor], [function], [start_date], [end_date], [status])) AS pivotTab
JOIN SecondTab ON SecondTab.id = pivotTab.id_second
JOIN MainTab ON MainTab.id = SecondTab.id_mainTab
WHERE pivotTab.status = 'closed';
Well, as far as I can understand, you have some SELECT statement and just need to "dump" its result into a temporary table. In this case you can use the SELECT ... INTO syntax:
select .....
into #temp_table
from ....
This will create a temporary table with columns matching the SELECT statement and populate it with the data returned by the SELECT.
See MSDN for reference.
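One caveat specific to the dynamic pivot: since the outer batch doesn't know the pivoted column list, the INTO has to sit inside the dynamic string itself. A sketch along these lines; the ##pivoted name and the hard-coded @cols value are illustrative, and in practice @cols is built dynamically as in the linked article:
-- A global temp table (##) survives the EXEC scope, so the outer batch
-- can join against the dynamically pivoted result afterwards.
DECLARE @cols nvarchar(max) = N'[number],[address],[supervisor],[function],[start_date],[end_date]';
DECLARE @query nvarchar(max);

SET @query = N'SELECT *
INTO ##pivoted
FROM (SELECT id_second, name, value FROM DATA) AS src
PIVOT (MAX(value) FOR name IN (' + @cols + N')) AS pivotTab;';

EXEC (@query);

SELECT *
FROM ##pivoted p
JOIN SecondTab s ON s.id = p.id_second
JOIN MainTab m ON m.id = s.id_mainTab;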

Speeding up partitioning query on ancient SQL Server version

The Setup
I've got performance and conceptual problems getting a query right on SQL Server 7, running on a dual-core 2 GHz machine with 2 GB RAM - no chance of getting that out of the way, as you might expect :-/.
The Situation
I'm working with a legacy database and I need to mine for data to get various insights. I've got the all_stats table that contains all the stat data for a thingy in a specific context. These contexts are grouped with the help of the group_contexts table. A simplified schema:
+--------------------------------------------------------------------+
| thingies |
+--------------------------------------------------------------------+
| id | INT PRIMARY KEY IDENTITY(1,1) |
+--------------------------------------------------------------------+
+--------------------------------------------------------------------+
| all_stats |
+--------------------------------------------------------------------+
| id | INT PRIMARY KEY IDENTITY(1,1) |
| context_id | INT FOREIGN KEY REFERENCES contexts(id) |
| value | FLOAT NULL |
| some_date | DATETIME NOT NULL |
| thingy_id | INT NOT NULL FOREIGN KEY REFERENCES thingies(id) |
+--------------------------------------------------------------------+
+--------------------------------------------------------------------+
| group_contexts |
+--------------------------------------------------------------------+
| id | INT PRIMARY KEY IDENTITY(1,1) |
| group_id | INT NOT NULL FOREIGN KEY REFERENCES groups(group_id) |
| context_id | INT NOT NULL FOREIGN KEY REFERENCES contexts(id) |
+--------------------------------------------------------------------+
+--------------------------------------------------------------------+
| contexts |
+--------------------------------------------------------------------+
| id | INT PRIMARY KEY IDENTITY(1,1) |
+--------------------------------------------------------------------+
+--------------------------------------------------------------------+
| groups |
+--------------------------------------------------------------------+
| group_id | INT PRIMARY KEY IDENTITY(1,1) |
+--------------------------------------------------------------------+
The Problem
The task is, for a given set of thingies, to find and aggregate the 3 most recent (all_stats.some_date) stats of each thingy for all groups the thingy has stats for. I know it sounds easy, but I can't figure out how to do this properly in SQL - I'm not exactly a prodigy.
My Bad Solution (no it's really bad...)
My solution right now is to fill a temporary table with all the required data and then UNION ALL the data I need:
-- Before I'm building this SQL I retrieve the relevant groups
-- for being able to build the `UNION ALL`s at the bottom.
-- I also retrieve the thingies that are relevant in this context
-- beforehand and include their ids as a comma separated list -
-- I said it would be awful ...
-- Creating the temp table holding all stats data rows
-- for a thingy in a specific group
CREATE TABLE #stats
(id INT PRIMARY KEY IDENTITY(1,1),
group_id INT NOT NULL,
thingy_id INT NOT NULL,
value FLOAT NOT NULL,
some_date DATETIME NOT NULL)
-- Filling the temp table
INSERT INTO #stats(group_id,thingy_id,value,some_date)
SELECT filtered.group_id, filtered.thingy_id, filtered.value, filtered.some_date
FROM
(SELECT joined.group_id,joined.thingy_id,joined.value,joined.some_date
FROM
(SELECT groups.group_id,data.value,data.thingy_id,data.some_date
FROM
-- Getting the groups associated with the contexts
-- of all the stats available
(SELECT DISTINCT groupcontext.group_id
FROM all_stats AS stat
INNER JOIN group_contexts AS groupcontext
ON groupcontext.context_id = stat.context_id
) AS groups
INNER JOIN
-- Joining the available groups with the actual
-- stat data of the group for a thingy
(SELECT groupcontext.group_id,stat.value,stat.some_date,stat.thingy_id
FROM all_stats AS stat
INNER JOIN group_contexts AS groupcontext
ON groupcontext.context_id = stat.context_id
WHERE stat.value IS NOT NULL
AND stat.value >= 0) AS data
ON data.group_id = groups.group_id) AS joined
) AS filtered
-- I already have the thingies beforehand but if it would be possible
-- to include/query for them in another way that'd be OK by me
WHERE filtered.thingy_id in (/* somewhere around 10000 thingies are available */)
-- Now I'm building the `UNION ALL`s for each thingy as well as
-- the group the stat of the thingy belongs to
-- thingy 42 {
-- Getting the average of the most recent 3 stat items
-- for a thingy with id 42 in group 982
SELECT x.group_id,x.thingy_id,AVG(x.value)
FROM
(SELECT TOP 3 s.group_id,s.thingy_id,s.value,s.some_date
FROM #stats AS s
WHERE s.group_id = 982
AND s.thingy_id = 42
ORDER BY s.some_date DESC) AS x
GROUP BY x.group_id,x.thingy_id
HAVING COUNT(*) >= 3
UNION ALL
-- Getting the average of the most recent 3 stat items
-- for a thingy with id 42 in group 314159
SELECT x.group_id,x.thingy_id,AVG(x.value)
FROM
(SELECT TOP 3 s.group_id,s.thingy_id,s.value,s.some_date
FROM #stats AS s
WHERE s.group_id = 314159
AND s.thingy_id = 42
ORDER BY s.some_date DESC) AS x
GROUP BY x.group_id,x.thingy_id
HAVING COUNT(*) >= 3
-- }
UNION ALL
-- thingy 21 {
-- Getting the average of the most recent 3 stat items
-- for a thingy with id 21 in group 982
/* you get the idea */
This works - slowly, but it works - for small sets of data (say 100 thingies with 10 stats attached each), but the problem domain it eventually has to handle is 10,000+ thingies with potentially hundreds of stats per thingy. As a side note: the generated SQL query is ridiculously large - a pretty small query involving say 350 thingies with data in 3 context groups amounts to more than 250,000 formatted lines of SQL, executing in a stunning 5 minutes.
So if anyone has an idea how to solve this I really, really would appreciate your help :-).
On your ancient SQL Server release you need an old-style scalar subquery to get the last three rows for all thingies in a single query :-)
SELECT x.group_id,x.thingy_id,AVG(x.value)
FROM
(
SELECT s.group_id,s.thingy_id,s.value
FROM #stats AS s
where (select count(*) from #stats as s2
where s.group_id = s2.group_id
and s.thingy_id = s2.thingy_id
and s.some_date <= s2.some_date
) <= 3
) AS x
GROUP BY x.group_id,x.thingy_id
HAVING COUNT(*) >= 3
To get better performance you should add a clustered index to the #stats table, probably on (group_id, thingy_id, some_date DESC, value).
If (group_id, thingy_id, some_date) is unique you should remove the useless id column; otherwise, ORDER BY group_id, thingy_id, some_date DESC during the INSERT/SELECT into #stats and use id instead of some_date for finding the last three rows.
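For readers not stuck on SQL Server 7: from 2005 onward the usual way to express "top 3 per group" is ROW_NUMBER(), which takes a single pass over #stats instead of running a correlated count per row. Roughly:
-- SQL Server 2005+ equivalent: number the stats per (group, thingy) by
-- recency, keep the first three, then aggregate.
SELECT x.group_id, x.thingy_id, AVG(x.value) AS avg_value
FROM
(
    SELECT s.group_id, s.thingy_id, s.value,
           ROW_NUMBER() OVER (PARTITION BY s.group_id, s.thingy_id
                              ORDER BY s.some_date DESC) AS rn
    FROM #stats AS s
) AS x
WHERE x.rn <= 3
GROUP BY x.group_id, x.thingy_id
HAVING COUNT(*) >= 3;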

Adding string to the primary key?

I want to prepend a string to the primary key value when creating a table in SQL.
Example:
My primary key column should automatically generate values like the ones below:
'EMP101'
'EMP102'
'EMP103'
How to achieve it?
Try this: (For SQL Server 2012)
UPDATE MyTable
SET EMPID = CONCAT('EMP' , EMPID)
Or this: (For SQL Server < 2012)
UPDATE MyTable
SET EMPID = 'EMP' + EMPID
SQLFiddle for SQL Server 2008
SQLFiddle for SQL Server 2012
Since you want auto-increment behaviour on a VARCHAR column, you can try this table schema:
CREATE TABLE MyTable
(EMP INT NOT NULL IDENTITY(1000, 1)
,[EMPID] AS 'EMP' + CAST(EMP AS VARCHAR(10)) PERSISTED PRIMARY KEY
,EMPName VARCHAR(20))
;
INSERT INTO MyTable(EMPName) VALUES
('AA')
,('BB')
,('CC')
,('DD')
,('EE')
,('FF')
Output:
| EMP | EMPID | EMPNAME |
----------------------------
| 1000 | EMP1000 | AA |
| 1001 | EMP1001 | BB |
| 1002 | EMP1002 | CC |
| 1003 | EMP1003 | DD |
| 1004 | EMP1004 | EE |
| 1005 | EMP1005 | FF |
See this SQLFiddle
Here you can see that EMPID is an auto-incremented column and the primary key.
Source: HOW TO SET IDENTITY KEY/AUTO INCREMENT ON VARCHAR COLUMN IN SQL SERVER (Thanks to @bvr)
The rule of thumb is: never use meaningful information in primary keys (like an employee number or Social Security number). Let the key just be a plain auto-incremented integer. However constant the data seems, it may change at some point (new legislation arrives and all SSNs are recalculated).
It seems the only reason you want to use a non-integer key is that the key is generated as a string concatenation with another column to make it unique.
From a best-practice perspective it is strongly recommended to use integer primary keys, but often this guidance is ignored.
Going through the following posts might be of help:
Should I design a table with a primary key of varchar or int?
SQL primary key: integer vs varchar
You can achieve it in at least two ways:
Generate new id on the fly when you insert a new record
Create INSTEAD OF INSERT trigger that will do that for you
If you have a table schema like this
CREATE TABLE Table1
([emp_id] varchar(12) primary key, [name] varchar(64))
For the first scenario you can use a query
INSERT INTO Table1 (emp_id, name)
SELECT newid, 'John'
FROM
(
SELECT 'EMP' + CONVERT(VARCHAR(9), COALESCE(REPLACE(MAX(emp_id), 'EMP', ''), 0) + 1) newid
FROM Table1 WITH (TABLOCKX, HOLDLOCK)
) q
Here is SQLFiddle demo
For the second scenario you can use a trigger like this
CREATE TRIGGER tg_table1_insert ON Table1
INSTEAD OF INSERT AS
BEGIN
DECLARE @max INT
SET @max =
(SELECT COALESCE(REPLACE(MAX(emp_id), 'EMP', ''), 0)
FROM Table1 WITH (TABLOCKX, HOLDLOCK)
)
INSERT INTO Table1 (emp_id, name)
SELECT 'EMP' + CONVERT(VARCHAR(9), @max + ROW_NUMBER() OVER (ORDER BY (SELECT 1))), name
FROM INSERTED
END
Here is SQLFiddle demo
I am looking to do something similar but don't see an answer to my problem here.
I want a primary key like "JonesB_01", as this is how we want our job numbers represented in our production system.
--ID | First_Name | Second_Name | Phone | Etc..
-- Bob Jones 9999-999-999
--ID = "Second_Name" + First Initial + "_(01-99)"
The number 01-99 has been included to allow for multiple instances of a customer with the same surname and first initial. In our industry it's not unusual for the same customer to have work done on multiple occasions without being repeat business on an ongoing basis. I expect this convention to last a very long time; if we ever exceed it, I can simply add a third digit.
I want this to auto populate to keep data entry as simple as possible.
I managed to get a solution to work using Excel formulas and a few helper cells, but am new to SQL.
--CellA2 = JonesB_01 (=concatenate(D2+E2))
--CellB2 = "Bob"
--CellC2 = "Jones"
--CellD2 = "JonesB" (=if(B2="","",Concatenate(C2,Left(B2)))
--CellE2 = "_01" (=concatenate("_",Text(F2,"00"))
--CellF2 = "1" (=If(D2="","",Countif($D$2:$D2,D2))
Thanks.
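For what it's worth, the MAX/COUNT approach from the answers above adapts to this scheme. A rough T-SQL sketch - the table and column names are invented for the example, and concurrent inserts need the same locking caveats as the trigger answer:
-- Hypothetical table for the 'JonesB_01' scheme described above.
CREATE TABLE Customers
(
    customer_key varchar(60) NOT NULL PRIMARY KEY,  -- e.g. 'JonesB_01'
    first_name   varchar(50) NOT NULL,
    second_name  varchar(50) NOT NULL,
    phone        varchar(20) NULL
);

-- Insert a new customer, numbering repeats of the same surname + initial.
DECLARE @first varchar(50) = 'Bob', @second varchar(50) = 'Jones';
DECLARE @base  varchar(60) = @second + LEFT(@first, 1) + '_';

INSERT INTO Customers (customer_key, first_name, second_name, phone)
SELECT @base + RIGHT('0' + CAST(COUNT(*) + 1 AS varchar(2)), 2),
       @first, @second, '9999-999-999'
FROM Customers WITH (TABLOCKX, HOLDLOCK)
-- LEFT(...) comparison instead of LIKE, because '_' is a LIKE wildcard.
WHERE LEFT(customer_key, LEN(@base)) = @base;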
SELECT 'EMP' || TO_CHAR(NVL(MAX(TO_NUMBER(SUBSTR(A.EMP_NO, 4, 3))), 0) + 1) AS NEW_EMP_NO
FROM
(SELECT 'EMP101' EMP_NO
FROM DUAL
UNION ALL
SELECT 'EMP102' EMP_NO
FROM DUAL
UNION ALL
SELECT 'EMP103' EMP_NO
FROM DUAL
) A

Speeding up this big JOIN

EDIT: there was a mistake in the following question that explains the observations. I could delete the question but this might still be useful to someone. The mistake was that the actual query running on the server was SELECT * FROM t (which was silly) when I thought it was running SELECT t.* FROM t (which makes all the difference). See tobyobrian's answer and the comments to it.
I have a too-slow query in a situation with the following schema. Table t has data rows indexed by t_id. t adjoins tables x and y via junction tables t_x and t_y, each of which contains only the foreign keys required for the JOINs:
CREATE TABLE t (
t_id INT NOT NULL PRIMARY KEY,
data columns...
);
CREATE TABLE t_x (
t_id INT NOT NULL,
x_id INT NOT NULL,
PRIMARY KEY (t_id, x_id),
KEY (x_id)
);
CREATE TABLE t_y (
t_id INT NOT NULL,
y_id INT NOT NULL,
PRIMARY KEY (t_id, y_id),
KEY (y_id)
);
I need to export the stray rows in t, i.e. those not referenced in either junction table.
SELECT t.* FROM t
LEFT JOIN t_x ON t_x.t_id=t.t_id
LEFT JOIN t_y ON t_y.t_id=t.t_id
WHERE t_x.t_id IS NULL OR t_y.t_id IS NULL
INTO OUTFILE ...;
t has 21 M rows while t_x and t_y both have about 25 M rows. So this is naturally going to be a slow query.
I'm using MyISAM so I thought I'd try to speed it up by preloading the t_x and t_y indexes. The combined size of t_x.MYI and t_y.MYI was about 1.2 M bytes so I created a dedicated key buffer for them, assigned their PRIMARY keys to the dedicated buffer and LOAD INDEX INTO CACHE'ed them.
But as I watch the query in operation, mysqld is using about 1% CPU, the average system IO pending queue length is around 5, and mysqld's average seek size is in the 250 k range. Moreover, nearly all the IO is mysqld reading from t_x.MYI and t_x.MYD.
I don't understand:
Why is mysqld reading the .MYD files at all?
Why isn't mysqld using the preloaded t_x and t_y indexes?
Could it have something to do with the t_x and t_y PRIMARY KEYs spanning two columns?
EDIT: The query explained:
+----+-------------+-------+------+---------------+---------+---------+-----------+----------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref       | rows     | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-----------+----------+-------------+
|  1 | SIMPLE      | t     | ALL  | NULL          | NULL    | NULL    | NULL      | 20980052 |             |
|  1 | SIMPLE      | t_x   | ref  | PRIMARY       | PRIMARY | 4       | db.t.t_id |   235849 | Using index |
|  1 | SIMPLE      | t_y   | ref  | PRIMARY       | PRIMARY | 4       | db.t.t_id |   207947 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-----------+----------+-------------+
Use NOT EXISTS - this will be the fastest here, much better than JOINs or NOT IN in this situation.
SELECT a.* FROM t a
Where not exists (select 1 from t_x b
where b.t_id = a.t_id)
or not exists (select 1 from t_y c
where c.t_id = a.t_id);
I can answer part 1 of your question, and I may or may not be able to answer part 2 if you post the output of EXPLAIN:
In order to select t.* it needs to look in the MYD file - only the primary key is in the index; to fetch the data columns you requested it needs the rest of the columns.
That is, your query is quite probably filtering the results very quickly; it's just struggling to copy out all the data you wanted.
Also note that you will probably have duplicates in your output: if one row has no refs in t_x but 3 in t_y, the same t.* row will be repeated 3 times. Given that the WHERE clause seems sufficiently efficient and much of the time is spent reading the actual data, this is quite possibly the source of your problems. Try changing to SELECT DISTINCT, as sketched below, and see if that helps your efficiency.
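A minimal version of that suggestion - the same filter as the original query, with the duplicate rows collapsed:
-- The original export with duplicates collapsed via DISTINCT.
SELECT DISTINCT t.*
FROM t
LEFT JOIN t_x ON t_x.t_id = t.t_id
LEFT JOIN t_y ON t_y.t_id = t.t_id
WHERE t_x.t_id IS NULL OR t_y.t_id IS NULL
INTO OUTFILE ...;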
This may be a bit more efficient:
SELECT *
FROM t
WHERE t.t_id NOT IN (
SELECT DISTINCT t_id
FROM t_x
UNION
SELECT DISTINCT t_id
FROM t_y
);