Missing records when adding "NOT IN" - SQL

I have a table of doctor names and states.
f_name  | l_name | state
MICHAEL | CRANE  |
HAL     | CRANE  | MD
THOMAS  | ROMINA | DE
And so on.
What I want is to get all doctors that are NOT in MD. However, if I write this expression I'm missing those with NULL values for state.
SELECT *
FROM doctors
WHERE state NOT IN ('MD')
I don't understand the issue. I was able to fix it by adding
OR state IS NULL
Obviously it has something to do with NOT IN (or IN) not handling NULL. Can anyone explain this for me? Is there an alternative to what I was trying to do?
Thanks

Yes, there is an alternative - you would use the NVL() function (or COALESCE() if you want to stick to the ANSI standard):
SELECT * FROM doctors
WHERE NVL(state, '##') NOT IN ('MD')
However you don't really need to use NOT IN here - it's only necessary when you have multiple values, e.g.:
SELECT * FROM doctors
WHERE NVL(state, '##') NOT IN ('MD','PA')
With one value you can just use = (or in this case, != or <>):
SELECT * FROM doctors
WHERE NVL(state, '##') != 'MD'
In Oracle SQL, NULL can't be compared to other values (not even other NULLs). So WHERE NULL = NULL, for example, will return zero rows. You do NULL comparisons with IS NULL and IS NOT NULL.
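If you'd rather not invent a dummy value for NVL at all, you can also spell the NULL case out explicitly, which is exactly what your own fix does:
SELECT *
FROM doctors
WHERE state <> 'MD'
   OR state IS NULL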

As noted already, you don't know that Michael Crane's state isn't Maryland. It's NULL, which can be read as representing "don't know". It might be Maryland, or it might not be. NOT IN ('MD') only finds those values known not to be 'MD'.
If you have a filter WHERE x, you can use MINUS to find exactly those records where x is not true (where x is either false or unknown).
select *
from doctors
minus
select *
from doctors
where state in ('MD');
This has one big advantage over anything involving IS NULL or NVL: it's immediately obvious exactly which records you don't want to see. You don't have to worry about accidentally missing one case where NULL isn't covered in your condition, and you don't have to worry about records that happen to match whatever dummy value you use with NVL.
It's generally not good for performance on Oracle, accessing the table twice, but for one-off queries, depending on the table size, the time saved writing the query can be more than the added execution time.

Inside the database, NULL is not a physical string value ("null"); it simply means no value. So if you compare NULL to anything, the result is neither equal nor not equal. Even two NULLs are not equal to each other. You can only check whether a value IS NULL or not; you can't compare it to other values.

Related

Selecting only such groups that contain certain value

First of all, even though the thread SQL: How do you select only groups that do not contain a certain value? is almost identical to my problem, it doesn't fully dispel my confusion.
Let's have a table "Contacts" like this one:
+------------+-----------+
| Department | FirstName |
+------------+-----------+
| 100        | Thomas    |
| 200        | Peter     |
| 100        | Jerry     |
+------------+-----------+
First, I want to group the rows by the department number and show number of rows in each displayed group. This, I believe, can be easily done by the following query.
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
GROUP BY Department
This outputs 2 groups. First with dep.no. 100 containing 2 rows, second with 200 containing only one row.
But then, I want to extend the query to exclude any group that doesn't contain certain value in certain column (e.g. Thomas in FirstName). Here are my questions:
1) Reading the above-mentioned thread I was able to come up with this, which seems to work correctly:
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
WHERE Department IN (SELECT Department FROM Contacts WHERE FirstName = "Thomas")
GROUP BY Department
Q: How does this work? I understand the "WHERE Department IN" part, but then I'd expect a value, but instead another nested query is included, which to me doesn't make much sense as I'm only beginner with SQL.
2) By accident I was able to come up with another query that also seems to work, but feels weird and I also don't understand its workings.
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
GROUP BY Department
HAVING NOT SUM(FirstName = "Thomas") = 0
Q: How does this work? And why doesn't the alteration HAVING SUM(FirstName = "Thomas") > 0 work?
3) Q: Is there any simple and correct way to do this using the HAVING clause?
I expected that a simple "HAVING FirstName='Thomas'" after the GROUP BY would do the trick, as it seems to follow natural language, but it does not.
Note that I want whole groups to be chosen by the query, so "WHERE FirstName='Thomas'" isn't a solution for my problem, as it excludes all the rows that don't satisfy the condition before the grouping takes place (at least the way I understand it).
Q: How does this work? I understand the "WHERE Department IN" part, but then I'd expect a value, but instead another nested query is included, which to me doesn't make much sense as I'm only beginner with SQL.
The nested query returns the set of Department values that the outer query's Department is matched against.
2) By accident I was able to come up with another query that also seems to work, but feels weird and I also don't understand its workings.
HAVING NOT SUM(FirstName = "Thomas") = 0
"Feels weird" because, well, it is. This is not a place for the SUM function.
EDIT: Why does this work?
The expression FirstName = "Thomas" gets evaluated as true or false for each row (a Boolean expression). True converts numerically to 1 and False to 0 (zero). Wrapping it in SUM totals those values per group, so a sum of zero (still) means false ("no row in this group matched") and "not zero" means true. Then, to make it weirder, you included NOT, which negates the whole comparison: NOT (SUM(...) = 0) is true exactly when the sum is non-zero, i.e. when the group contains at least one Thomas.
EDIT: I think what could be more helpful to you is consideration of when to use WHERE and when to use HAVING (instead of the Boolean magic taking place).
From this answer:
WHERE clause introduces a condition on individual rows; HAVING clause introduces a condition on aggregations, i.e. results of selection where a single result, such as count, average, min, max, or sum, has been produced from multiple rows.
WHERE was appropriate for your example because first you want to "only return rows WHERE Department IN (100)" and then you want to "group those rows by Department" and get a COUNT of how many rows had been selected.
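If you do still want a HAVING-based version (your question 3), a more portable sketch is to count the matching rows explicitly rather than summing a boolean, which avoids relying on how your particular engine converts TRUE to a number (in some engines TRUE converts to -1, which is why the > 0 variant can misbehave):
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
GROUP BY Department
HAVING SUM(CASE WHEN FirstName = 'Thomas' THEN 1 ELSE 0 END) > 0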

Google BigQuery use of substr never returns results

I have a table which has two sets of data, one set of data has information like
Type         | Name                                   | Id
PackagedDrug | Pseudoephedrine HCl Oral Tablet 120 MG | 110
PackagedDrug | Pseudoephedrine HCl Oral Tablet 60 MG  | 111
DrugName     | Pseudoephedrine HCl                    | 112
What I want to do is join PackagedDrug with DrugName concepts, i.e. get all Ids for Type PackagedDrug whose Name matches the Name of a Type DrugName row. If I hardcode the Name for DrugName in the following query, it runs instantaneously, but if I take out the hardcoding it just keeps on running. Could you please suggest suitable ways to speed up this BigQuery query?
SELECT a.MSC_ID MSC_id, a.MSC_CONcept_type, a.concept_id, a.concept_name , b.concept_name
from
(select MSC_id, MSC_CONcept_type, concept_id, concept_name
FROM [ClientAlerts.MSC_Concepts]
where MSC_CONcept_type in ('MediSpan.Concepts.PackagedDrug') ) a
CROSS JOIN
(select MSC_CONcept_type, concept_id, concept_name , length(concept_name) len
FROM [ClientAlerts.MSC_Concepts]
where MSC_CONcept_type in ('MediSpan.Concepts.NamebasedClassification.DrugName')
-- and concept_name in ('Pseudoephedrine HCl')
) b
where substr(a.concept_name,1,b.len)+' ' = b.concept_name
Thanks,
Savita
This has nothing to do with BigQuery itself. When you hardcode the value, the rows are "filtered" much faster, because the engine only has to check each row against that one hardcoded value.
If you don't use the hardcoded value, it has to look at WAY more rows, comparing ALL the rows from your first subquery against all the rows from your second. Honestly, unless you can describe your use case in more detail, I can't think of a way to do this faster.
But one question does come to mind: why do you have a "type" column at all? It seems like it should be two different tables instead.
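For illustration only, a rough sketch of what that split could buy you: if the packaged-drug rows carried the drug-name portion in their own column, the prefix comparison would become a plain equi-join. The table and column names below (PackagedDrugs, DrugNames, drug_name) are invented for the sketch:
SELECT p.MSC_id, p.concept_id, p.concept_name, d.concept_name
FROM [ClientAlerts.PackagedDrugs] p
JOIN [ClientAlerts.DrugNames] d
  ON p.drug_name = d.concept_name  -- drug_name: hypothetical precomputed join key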

Using NVL for multiple columns - Oracle SQL

Good morning my beloved sql wizards and sorcerers,
I want to substitute across 3 columns of data from 3 tables. Currently I am using the NVL function, however that is restricted to two arguments.
See below for an example:
SELECT ccc.case_id,
       NVL(ccvl.descr, ccc.char) char_val
FROM case_char ccc, char_value ccvl, lookup_value lval1
WHERE ccvl.descr(+) = ccc.value
  AND ccc.value = lval1.descr(+)
  AND ccc.case_id IN ('123')
case_char table
case_id | char  | value
123     | email | work_email
124     | issue | tim_

char_value table
char       | descr
work_email | complaint mail
tim_       | timeliness

lookup_value table
descr      | descrlong
work_email | xxx#blah.com
Essentially, what I am trying to do is: if there exists a match for case_char.value with lookup_value.descr, then display it; if not, then if there exists a match between case_char.value and char_value.char, display that instead.
I am just trying to return the description for 'issue' from the char_value table, but for 'email' I want to return the descrlong from the lookup_value table (all under the same alias 'char_val').
So my question is, how do I achieve this keeping in mind that I want them to appear under the same alias.
Let me know if you require any further information.
Thanks guys
You could nest NVL:
NVL(a, NVL(b, NVL(c, d)))
But even better, use the SQL-standard COALESCE, which does take multiple arguments and also works on non-Oracle systems:
COALESCE(a, b, c, d)
How about using COALESCE:
COALESCE(ccvl.descr, ccc.char)
It is better to use COALESCE(a, b, c, d), for the following reasons:
Nested NVL logic can be achieved with a single COALESCE(a, b, c, d).
COALESCE is the SQL standard.
COALESCE can also perform better: NVL always evaluates both of its arguments and then returns the second one if the first is null, whereas COALESCE checks its arguments one by one and returns as soon as it finds a non-null value, instead of evaluating all of them.
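Putting that together with the query from the question, a sketch of the three-way fallback could look like the following (the join conditions are copied unchanged from the question; the assumption is that lookup_value.descrlong should win when it exists, then char_value.descr, then the raw ccc.char):
SELECT ccc.case_id,
       COALESCE(lval1.descrlong, ccvl.descr, ccc.char) AS char_val
FROM case_char ccc, char_value ccvl, lookup_value lval1
WHERE ccvl.descr(+) = ccc.value
  AND ccc.value = lval1.descr(+)
  AND ccc.case_id IN ('123')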

Why doesn't SQL support "= null" instead of "is null"?

I'm not asking if it does. I know that it doesn't.
I'm curious as to the reason. I've read support docs such as this one on Working With Nulls in MySQL but they don't really give any reason. They only repeat the mantra that you have to use "is null" instead.
This has always bothered me. When doing dynamic SQL (those rare times when it has to be done) it would be so much easier to pass "null" into the where clause like this:
#where = "where GroupId = null"
Which would be a simple replacement for a regular variable. Instead we have to use if/else blocks to do stuff like:
if #groupId is null then
#where = "where GroupId is null"
else
#where = "where GroupId = #groupId"
end
In larger more-complicated queries, this is a huge pain in the neck. Is there a specific reason that SQL and all the major RDBMS vendors don't allow this? Some kind of keyword conflict or value conflict that it would create?
Edit:
The problem with a lot of the answers (in my opinion) is that everyone is setting up an equivalency between null and "I don't know what the value is". There's a huge difference between those two things. If null meant "there's a value but it's unknown" I would 100% agree that nulls couldn't be equal. But SQL null doesn't mean that. It means that there is no value. Any two SQL results that are null both have no value. No value does not equal unknown value. Two different things. That's an important distinction.
Edit 2:
The other problem I have is that other HLLs allow null=null perfectly fine and resolve it appropriately. In C# for instance, null=null returns true.
The reason why it's off by default is that null is really not equal to null in a business sense. For example, if you were joining orders and customers:
select * from orders o join customers c on c.name = o.customer_name
It wouldn't make a lot of sense to match orders with an unknown customer with customers with an unknown name.
Most databases allow you to customize this behaviour. For example, in SQL Server:
set ansi_nulls on
if null = null
print 'this will not print'
set ansi_nulls off
if null = null
print 'this should print'
Equality is something that can be absolutely determined. The trouble with null is that it's inherently unknown. If you follow the truth table for three-value logic, null combined with any other value is null - unknown. Asking SQL "Is my value equal to null?" would be unknown every single time, even if the input is null. I think the implementation of IS NULL makes it clear.
It's a language semantic.
Null is the lack of a value.
is null makes sense to me. It says, "is lacking a value" or "is unknown". Personally I've never asked somebody if something is, "equal to lacking a value".
I can't help but feel that you're still not satisfied with the answers that have been given so far, so I thought I'd try another tack. Let's have an example (no, I've no idea why this specific example has come into my head).
We have a table for employees, EMP:
EMP
---
EMPNO GIVENNAME
E0001 Boris
E0002 Chris
E0003 Dave
E0004 Steve
E0005 Tony
And, for whatever bizarre reason, we're tracking what colour trousers each employee chooses to wear on a particular day (TROUS):
TROUS
-----
EMPNO DATE COLOUR
E0001 20110806 Brown
E0002 20110806 Blue
E0003 20110806 Black
E0004 20110806 Brown
E0005 20110806 Black
E0001 20110807 Black
E0003 20110807 Black
E0004 20110807 Grey
I could go on. We write a query, where we want to know the name of every employee, and what colour trousers they had on on the 7th August:
SELECT e.GIVENNAME,t.COLOUR
FROM
EMP e
LEFT JOIN
TROUS t
ON
e.EMPNO = t.EMPNO and
t.DATE = '20110807'
And we get the result set:
GIVENNAME COLOUR
Chris NULL
Steve Grey
Dave Black
Boris Black
Tony NULL
Now, this result set could be in a view, or CTE, or whatever, and we might want to continue asking questions about these results, using SQL. What might some of these questions be?
Were Dave and Boris wearing the same colour trousers on that day? (Yes, Black==Black)
Were Dave and Steve wearing the same colour trousers on that day? (No, Black!=Grey)
Were Boris and Tony wearing the same colour trousers on that day? (Unknown - we're trying to compare with NULL, and we're following the SQL rules)
Were Boris and Tony not wearing the same colour trousers on that day? (Unknown - we're again comparing to NULL, and we're following SQL rules)
Were Chris and Tony wearing the same colour trousers on that day? (Unknown)
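To make that three-valued outcome concrete, here is a minimal sketch, assuming the result set above has been saved as a view called TROUS_0807 (the name is made up for illustration):
SELECT CASE WHEN b.COLOUR = t.COLOUR THEN 'yes'
            WHEN b.COLOUR <> t.COLOUR THEN 'no'
            ELSE 'unknown'
       END AS same_trousers
FROM TROUS_0807 b, TROUS_0807 t
WHERE b.GIVENNAME = 'Boris'
  AND t.GIVENNAME = 'Tony'
-- Boris is Black and Tony is NULL, so both comparisons evaluate to UNKNOWN
-- and the CASE falls through to 'unknown'.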
Note that you're already aware of specific mechanisms (e.g. IS NULL) to force the outcomes you want, if you've designed your database to never use NULL as a marker for missing information.
But in SQL, NULL has been given two roles (at least) - to mark inapplicable information (maybe we have complete information in the database, and Chris and Tony didn't turn up for work that day, or did but weren't wearing trousers), and to mark missing information (Chris did turn up that day, we just don't have the information recorded in the database at this time)
If you're using NULL purely as a marker of inapplicable information, I assume you're avoiding such constructs as outer joins.
I find it interesting that you've brought up NaN in comments to other answers, without seeing that NaN and (SQL) NULL have a lot in common. The biggest difference between them is that NULL is intended for use across the system, no matter what data type is involved.
Your biggest issue seems to be that you've decided that NULL has a single meaning across all programming languages, and you seem to feel that SQL has broken that meaning. In fact, null in different languages frequently has subtly different meanings. In some languages, it's a synonym for 0. In others, not, so the comparison 0==null will succeed in some and fail in others. You mentioned VB, but VB (assuming you're talking about .NET versions) does not have null. It has Nothing, which again is subtly different (it's the equivalent in most respects of the C# construct default(T)).
The concept is that NULL is not an equitable value. It denotes the absence of a value.
Therefore, a variable or a column can only be checked if it IS NULL, but not if it IS EQUAL TO NULL.
Once you open up arithmetic comparisons, you may have to contend with IS GREATER THAN NULL, or IS LESS THAN OR EQUAL TO NULL.
NULL is unknown. It is neither true nor false, so when you compare anything to unknown, the only answer is "unknown". There is a much better article on Wikipedia: http://en.wikipedia.org/wiki/Null_(SQL)
Because in ANSI SQL, null means "unknown", which is not a value. As such, it doesn't equal anything; you can just evaluate the value's state (known or unknown).
a. Null is not the "lack of a value"
b. Null is not "empty"
c. Null is not an "unset value"
It's all of the above and none of the above.
By technical rights, NULL is an "unknown value". However, like uninitialized pointers in C/C++, you don't really know what you're pointing at. With databases, they allocate the space but do not initialize the value in that space.
So, it is an "empty" space in the sense that it's not initialized. If you set a value to NULL, the original value stays in that storage location. If it was originally an empty string (for example), it will remain that.
It's a "lack of a value" in the fact that it hasn't been set to what the database deems a valid value.
It's an "unset value" in that if the space was just allocated, the value that is there has never been set.
"Unknown" is the closest that we can truly come to knowing what to expect when we examine a NULL.
Because of that, if we try to compare this "unknown" value, we will get a comparison that
a) may or may not be valid
b) may or may not have the result we expect
c) may or may not crash the database.
So, the DBMS systems (long ago) decided that it doesn't even make sense to use equality when it comes to NULL.
Therefore, "= null" makes no sense.
In addition to all that has already been said, I wish to stress that what you write in your first line is wrong. SQL does support the "= NULL" syntax, but it has different semantics than "IS NULL" – as can be seen in the very piece of documentation you linked to.
I agree with the OP that
where column_name = null
should be syntactic sugar for
where column_name is null
However, I do understand why the creators of SQL wanted to make the distinction. In three-valued logic (IMO this is a misnomer), a predicate can return one of two values (true or false), or unknown, which is technically not a value but just a way of saying "we don't know which of the two values this is". Think about the following predicate in terms of three-valued logic:
A == B
This predicate tests whether A is equal to B. Here's what the truth table looks like:
        B
        T  U  F
      ----------
A  T |  T  U  F
   U |  U  U  U
   F |  F  U  T
If either A or B is unknown, the predicate itself always returns unknown, regardless of whether the other one is true or false or unknown.
In SQL, null is a synonym for unknown. So, the SQL predicate
column_name = null
tests whether the value of column_name is equal to something whose value is unknown, and returns unknown regardless of whether column_name is true or false or unknown or anything else, just like in three-valued logic above. SQL DML operations are restricted to operating on rows for which the predicate in the where clause returns true, ignoring rows for which the predicate returns false or unknown. That's why "where column_name = null" doesn't operate on any rows.
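A minimal sketch of the practical consequence, assuming a table t with a nullable column c:
SELECT COUNT(*) FROM t WHERE c = NULL;   -- always 0: the predicate is unknown for every row
SELECT COUNT(*) FROM t WHERE c IS NULL;  -- counts the rows where c genuinely has no value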
NULL doesn't equal NULL. It can't equal NULL. It doesn't make sense for them to be equal.
A few ways to think about it:
Imagine a contacts database, containing fields like FirstName, LastName, DateOfBirth and HairColor. If I looked for records WHERE DateOfBirth = HairColor, should it ever match anything? What if someone's DateOfBirth was NULL, and their HairColor was too? An unknown hair color isn't equal to an unknown anything else.
Let's join the contacts table with purchases and product tables. Let's say I want to find all the instances where a customer bought a wig that was the same color as their own hair. So I query WHERE contacts.HairColor = product.WigColor. Should I get matches between every customer I don't know the hair color of and products that don't have a WigColor? No, they're a different thing.
Let's consider that NULL is another word for unknown. What's the result of ('Smith' = NULL)? The answer is not false, it's unknown. Unknown is not true, therefore it behaves like false. What's the result of (NULL = NULL)? The answer is also unknown, therefore also effectively false. (This is also why concatenating a string with a NULL value makes the whole string become NULL -- the result really is unknown.)
Why don't you use the isnull function?
#where = "where GroupId = "+ isnull(#groupId,"null")

Optimizing a strange MySQL Query

Hoping someone can help with this. I have a query that pulls data from a PHP application and turns it into a view for use in a Ruby on Rails application. The PHP app's table is an E-A-V style table, with the following business rules:
Given fields: First Name, Last Name, Email Address, Phone Number and Mobile Phone Carrier:
Each property has two custom fields defined: one being required, one being not required. Clients can use either one, and different clients use different ones based on their own rules (e.g. Client A may not care about First and Last Name, but client B might)
The RoR app must treat each "pair" of properties as only a single property.
Now, here is the query. The problem is it runs beautifully with around 11,000 records. However, the real database has over 40,000 and the query is extremely slow, taking roughly 125 seconds to run which is totally unacceptable from a business perspective. It's absolutely required that we pull this data, and we need to interface with the existing system.
The UserID part is to fake out a Rails-esque foreign key which relates to a Rails table. I'm a SQL Server guy, not a MySQL guy, so maybe someone can point out how to improve this query? They (the business) demand that it be sped up but I'm not sure how to since the various group_concat and ifnull calls are required due to the fact that I need every field for every client and then have to combine the data.
select `ls`.`subscriberid` AS `id`,left(`l`.`name`,(locate(_utf8'_',`l`.`name`) - 1)) AS `user_id`,
ifnull(min((case when (`s`.`fieldid` in (2,35)) then `s`.`data` else NULL end)),_utf8'') AS `first_name`,
ifnull(min((case when (`s`.`fieldid` in (3,36)) then `s`.`data` else NULL end)),_utf8'') AS `last_name`,
ifnull(`ls`.`emailaddress`,_utf8'') AS `email_address`,
ifnull(group_concat((case when (`s`.`fieldid` = 81) then `s`.`data` when (`s`.`fieldid` = 154) then `s`.`data` else NULL end) separator ''),_utf8'') AS `mobile_phone`,
ifnull(group_concat((case when (`s`.`fieldid` = 100) then `s`.`data` else NULL end) separator ','),_utf8'') AS `sms_only`,
ifnull(group_concat((case when (`s`.`fieldid` = 34) then `s`.`data` else NULL end) separator ','),_utf8'') AS `mobile_carrier`
from ((`list_subscribers` `ls`
join `lists` `l` on((`ls`.`listid` = `l`.`listid`)))
left join `subscribers_data` `s` on((`ls`.`subscriberid` = `s`.`subscriberid`)))
where (left(`l`.`name`,(locate(_utf8'_',`l`.`name`) - 1)) regexp _utf8'[[:digit:]]+')
group by `ls`.`subscriberid`,`l`.`name`,`ls`.`emailaddress`
EDIT
I removed the regexp and that sped the query up to about 20 seconds, instead of nearly 120 seconds. If I could remove the group by then it would be faster, but I cannot as removing this causes it to duplicate rows with blank data for each field, instead of aggregating them. For instance:
With group by
id user_id first_name last_name email_address    mobile_phone sms_only mobile_carrier
1  1       John       Doe       jdoe#example.com 5551234567   0        Sprint
Without group by
id user_id first_name last_name email_address    mobile_phone sms_only mobile_carrier
1  1       John                 jdoe#example.com
1  1                  Doe       jdoe#example.com
1  1                            jdoe#example.com
1  1                            jdoe#example.com 5551234567
And so on. What we need is the first result.
EDIT #2
The query still seems to take a long time, but earlier today it was running in only about 20 seconds on the production database. Without changing a thing, the same query is now once again taking over 60 seconds. This is still unacceptable.. any other ideas on how to improve this?
That is, without a doubt, the second most hideous SQL query I have ever laid my eyes on :-)
My advice is to trade storage requirements for speed. This is a common trick used when you find your queries have a lot of per-row functions (ifnull, case and so forth). These per-row functions never scale very well as the table becomes larger.
Create new fields in the table which will hold the values you want to extract and then calculate those values on insert/update (with a trigger) rather than select. This doesn't technically break 3NF since the triggers guarantee data consistency between columns.
The vast majority of database tables are read far more often than they're written so this will amortise the cost of the calculation across many selects. In addition, just about every reported problem with databases is one of speed, not storage.
An example of what I mean. You can replace:
case when (`s`.`fieldid` in (2,35)) then `s`.`data` else NULL end
with:
`s`.`data_2_35`
in your query if your insert/update trigger simply sets the data_2_35 column to data or NULL depending on the value of fieldid. Then you index data_2_35 and, voila, instant speed improvement at the cost of a little storage.
This trick can be done to the five case clauses, the left/regexp bit and the "naked" ifnull function as well (the ifnull functions containing min and group_concat may be harder to do).
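As a rough sketch of the insert-side trigger for that example (data_2_35 is the invented column from above, the trigger name is arbitrary, and you would need a matching BEFORE UPDATE trigger too):
CREATE TRIGGER subscribers_data_bi
BEFORE INSERT ON subscribers_data
FOR EACH ROW
SET NEW.data_2_35 = IF(NEW.fieldid IN (2,35), NEW.data, NULL);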
The problem is most likely the WHERE condition:
where (left(`l`.`name`,(locate(_utf8'_',`l`.`name`) - 1)) regexp _utf8'[[:digit:]]+')
This looks like complex string comparison, so no index can be used, which results in a full table scan, possibly for every row in the result set. I am not a MySQL expert, but if you can simplify this into more simple column comparisons it will probably run much faster.
The first thing that jumps out at me as the source of all the trouble:
The PHP app's table is an E-A-V style table...
Trying to convert data in EAV format into conventional relational format on the fly using SQL is bound to be awkward and inefficient. So don't try to smash it into a conventional column-per-attribute format. The following query returns multiple rows per subscriber, one row per EAV attribute:
SELECT ls.subscriberid AS id,
SUBSTRING_INDEX(l.name, _utf8'_', 1) AS user_id,
COALESCE(ls.emailaddress, _utf8'') AS email_address,
s.fieldid, s.data
FROM list_subscribers ls JOIN lists l ON (ls.listid = l.listid)
LEFT JOIN subscribers_data s ON (ls.subscriberid = s.subscriberid
AND s.fieldid IN (2,3,34,35,36,81,100,154))
WHERE SUBSTRING_INDEX(l.name, _utf8'_', 1) REGEXP _utf8'[[:digit:]]+'
This eliminates the GROUP BY which is not optimized well in MySQL -- it usually incurs a temporary table which kills performance.
id user_id email_address fieldid data
1 1 jdoe#example.com 2 John
1 1 jdoe#example.com 3 Doe
1 1 jdoe#example.com 81 5551234567
But you'll have to sort out the EAV attributes in application code. That is, you can't seamlessly use ActiveRecord in this case. Sorry about that, but that's one of the disadvantages of using a non-relational design like EAV.
The next thing that I notice is the killer string manipulation (even after I've simplified it with SUBSTRING_INDEX()). When you're picking substrings out of a column, that says to me that you've overloaded one column with two distinct pieces of information. One is the name and the other is some kind of list-type attribute that you would use to filter the query. Store one piece of information in each column.
You should add a column for this attribute, and index it. Then the WHERE clause can utilize the index:
SELECT ls.subscriberid AS id,
SUBSTRING_INDEX(l.name, _utf8'_', 1) AS user_id,
COALESCE(ls.emailaddress, _utf8'') AS email_address,
s.fieldid, s.data
FROM list_subscribers ls JOIN lists l ON (ls.listid = l.listid)
LEFT JOIN subscribers_data s ON (ls.subscriberid = s.subscriberid
AND s.fieldid IN (2,3,34,35,36,81,100,154))
WHERE l.list_name_contains_digits = 1;
Also, you should always analyze an SQL query with EXPLAIN if it's important for it to have good performance. There's an analogous feature in MS SQL Server, so you should be accustomed to the concept, but the MySQL terminology may be different.
You'll have to read the documentation to learn how to interpret the EXPLAIN report in MySQL, there's too much info to describe here.
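For example, you just prefix the query (or a trimmed-down version of it) with the EXPLAIN keyword:
EXPLAIN
SELECT ls.subscriberid, l.name
FROM list_subscribers ls
JOIN lists l ON ls.listid = l.listid;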
Re your additional info: Yes, I understand you can't do away with the EAV table structure. Can you create an additional table? Then you can load the EAV data into it:
CREATE TABLE subscriber_mirror (
subscriberid INT PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
first_name2 VARCHAR(100),
last_name2 VARCHAR(100),
mobile_phone VARCHAR(100),
sms_only VARCHAR(100),
mobile_carrier VARCHAR(100)
);
INSERT INTO subscriber_mirror (subscriberid)
SELECT DISTINCT subscriberid FROM list_subscribers;
UPDATE subscribers_data s JOIN subscriber_mirror m USING (subscriberid)
SET m.first_name = IF(s.fieldid = 2, s.data, m.first_name),
m.last_name = IF(s.fieldid = 3, s.data, m.last_name),
m.first_name2 = IF(s.fieldid = 35, s.data, m.first_name2),
m.last_name2 = IF(s.fieldid = 36, s.data, m.last_name2),
m.mobile_phone = IF(s.fieldid = 81, s.data, m.mobile_phone),
m.sms_only = IF(s.fieldid = 100, s.data, m.sms_only),
m.mobile_carrier = IF(s.fieldid = 34, s.data, m.mobile_carrier);
This will take a while, but you only need to do it when you get a new data update from the vendor. Subsequently you can query subscriber_mirror in a much more conventional SQL query:
SELECT ls.subscriberid AS id, l.name+0 AS user_id,
COALESCE(s.first_name, s.first_name2) AS first_name,
COALESCE(s.last_name, s.last_name2) AS last_name,
COALESCE(ls.emailaddress, '') AS email_address,
COALESCE(s.mobile_phone, '') AS mobile_phone,
COALESCE(s.sms_only, '') AS sms_only,
COALESCE(s.mobile_carrier, '') AS mobile_carrier
FROM lists l JOIN list_subscribers ls USING (listid)
JOIN subscriber_mirror s USING (subscriberid)
WHERE l.name+0 > 0
As for the userid that's embedded in the l.name column, if the digits are the leading characters in the column value, MySQL allows you to convert to an integer value much more easily:
An expression like '123_bill'+0 yields an integer value of 123. An expression like 'bill_123'+0 has no digits at the beginning, so it yields an integer value of 0.
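You can verify that conversion directly:
SELECT '123_bill'+0 AS leading_digits,    -- yields 123
       'bill_123'+0 AS no_leading_digits; -- yields 0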