SQL: Sorting By Email Domain Name - sql

What is the shortest and/or most efficient SQL statement to sort a table with a column of email addresses by its DOMAIN name fragment?
That is, ignore everything before the "#" in each email address and compare case-insensitively. Let's ignore internationalized domain names for this one.
Target: MySQL, MSSQL, Oracle
Sample data from TABLE1
id name email
------------------------------------------
1 John Doe johndoe#domain.com
2 Jane Doe janedoe#helloworld.com
3 Ali Baba ali#babaland.com
4 Foo Bar foo#worldof.bar.net
5 Tarrack Ocama me#am-no-president.org
Order By Email
SELECT * FROM TABLE1 ORDER BY EMAIL ASC
id name email
------------------------------------------
3 Ali Baba ali#babaland.com
4 Foo Bar foo#worldof.bar.net
2 Jane Doe janedoe#helloworld.com
1 John Doe johndoe#domain.com
5 Tarrack Ocama me#am-no-president.org
Order By Domain
SELECT * FROM TABLE1 ORDER BY ?????? ASC
id name email
------------------------------------------
5 Tarrack Ocama me#am-no-president.org
3 Ali Baba ali#babaland.com
1 John Doe johndoe#domain.com
2 Jane Doe janedoe#helloworld.com
4 Foo Bar foo#worldof.bar.net
EDIT:
I am not asking for a single SQL statement that will work on all 3 or more SQL engines. Any contributions are welcome. :)

Try this
Query (for SQL Server):
select * from mytbl
order by SUBSTRING(email, CHARINDEX('#', email) + 1, LEN(email))
Query (for Oracle):
select * from mytbl
order by substr(email, INSTR(email, '#', 1) + 1)
Query (for MySQL):
pygorex1 already answered
Output:
id name email
5 Tarrack Ocama me#am-no-president.org
3 Ali Baba ali#babaland.com
1 John Doe johndoe#domain.com
2 Jane Doe janedoe#helloworld.com
4 Foo Bar foo#worldof.bar.net

For MySQL:
select email, SUBSTRING_INDEX(email,'#',-1) AS domain from user order by domain desc;
For case-insensitive:
select user_id, username, email, LOWER(SUBSTRING_INDEX(email,'#',-1)) AS domain from user order by domain desc;

If you want this solution to scale at all, you should not be trying to extract sub-columns. Per-row functions are notoriously slow as the table gets bigger and bigger.
The right thing to do in this case is to move the cost of extraction from select (where it happens a lot) to insert/update where it happens less (in most normal databases). By incurring the cost only on insert and update, you greatly increase the overall efficiency of the database, since that's the only point in time where you need to do it (i.e., it's the only time when the data changes).
In order to achieve this, split the email address into two distinct columns in the table (email_name and email_domain). Then you can either split it in your application before insertion/update or use a trigger (or pre-computed columns if your DBMS supports them) in the database to do it automatically.
Then you sort on email_domain and, when you want the full email address, you use email_name || '#' || email_domain (or your DBMS's own concatenation function).
Alternatively, you can keep the full email column and use a trigger to duplicate just the domain part in email_domain, then you never need to worry about concatenating the columns to get the full email address.
It's perfectly acceptable to revert from 3NF for performance reasons provided you know what you're doing. In this case, the data in the two columns can't get out of sync simply because the triggers won't allow it. It's a good way to trade disk space (relatively cheap) for performance (we always want more of that).
And, if you're the sort that doesn't like reverting from 3NF at all, the email_name/email_domain solution will fix that.
This is also assuming you just want to handle email addresses of the form a#b - there are other valid email addresses but I can't recall seeing any of them in the wild for years.
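As a rough sketch of the "keep the full email column and duplicate the domain" variant described above, something like the following could work in MySQL. The column name email_domain comes from the answer; the trigger and index names, the VARCHAR(255) size, and the lowercasing are assumptions made for illustration, not part of the original answer.

```sql
-- MySQL sketch: keep the domain in its own indexed column.
-- Trigger/index names and the VARCHAR(255) size are hypothetical.
ALTER TABLE TABLE1 ADD COLUMN email_domain VARCHAR(255);

-- Backfill rows that already exist.
UPDATE TABLE1
SET email_domain = LOWER(SUBSTRING_INDEX(email, '#', -1));

-- Pay the extraction cost only when the data changes.
CREATE TRIGGER table1_domain_bi BEFORE INSERT ON TABLE1
FOR EACH ROW SET NEW.email_domain = LOWER(SUBSTRING_INDEX(NEW.email, '#', -1));

CREATE TRIGGER table1_domain_bu BEFORE UPDATE ON TABLE1
FOR EACH ROW SET NEW.email_domain = LOWER(SUBSTRING_INDEX(NEW.email, '#', -1));

CREATE INDEX idx_table1_email_domain ON TABLE1 (email_domain);

-- Sorting no longer needs a per-row function.
SELECT * FROM TABLE1 ORDER BY email_domain;
```

Storing the domain already lowercased means the sort is case-insensitive regardless of the column's collation.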

For SQL Server, you could add a computed column to your table which extracts the domain into a separate field. If you persist that column into the table, you can use it like any other field and even put an index on it to speed things up if you query by domain name a lot:
ALTER TABLE Table1
ADD DomainName AS
SUBSTRING(email, CHARINDEX('#', email)+1, 500) PERSISTED
So now your table would have an additional column "DomainName" which contains anything after the "#" sign in your e-mail address.
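With the persisted column in place, indexing and sorting on it might look like the sketch below. The index name is made up for illustration, and this assumes the computed column's size stays within SQL Server's index key size limit for your version.

```sql
-- SQL Server sketch: index the persisted computed column, then sort on it.
-- IX_Table1_DomainName is a hypothetical name.
CREATE INDEX IX_Table1_DomainName ON Table1 (DomainName);

SELECT *
FROM Table1
ORDER BY DomainName ASC;
```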

Assuming you really must cater for MySQL, Oracle and MSSQL, the most efficient way might be to store the account name and domain name in two separate fields. Then you can do your ordering:
select id,name,email from table order by name
select id,name,email,account,domain from table order by email
select id,name,email,account,domain from table order by domain,account
As donnie points out, string manipulation functions are non-standard. That is why you will have to keep the data redundant!
I've added account and domain to the third query, since I seem to recall that not all DBMSs will sort a query on a field that isn't in the selected fields.

This will work with Oracle:
select id,name,email,substr(email,instr(email,'#',1)+1) as domain
from table1
order by domain asc

For postgres the query is:
SELECT * FROM table
ORDER BY SUBSTRING(email,(position('#' in email) + 1),252)
The value 252 is the longest allowed domain, since the maximum length of an email address is 254 including the local part, the #, and the domain.
See this for more details: What is the maximum length of a valid email address?
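If the hard-coded length feels awkward, PostgreSQL's split_part function may be a simpler alternative. This is only a sketch: it assumes each address contains exactly one '#', and it uses table1 as a stand-in table name since "table" itself is a reserved word.

```sql
-- PostgreSQL sketch: take the second '#'-separated field as the domain.
-- table1 is a placeholder table name.
SELECT *
FROM table1
ORDER BY lower(split_part(email, '#', 2));
```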

You are going to have to use the text manipulation functions to parse out the domain. Then order by the new column.

MySQL, an intelligent combination of right() and instr()
SQL Server, right() and patindex()
Oracle, instr() and substr()
And, as said by someone else, if you have a decent to high record count, wrapping your email field in functions in your WHERE clause will make it so the RDBMS can't use any index you might have on that column. So, you may want to consider creating a computed column which holds the domain.
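The function combinations listed above could be put together roughly as follows. These are sketches, one per engine, run against the question's TABLE1; the LOWER() wrapper is added here for the case-insensitive requirement and is not part of the original list.

```sql
-- MySQL: right() + instr()
SELECT * FROM TABLE1
ORDER BY LOWER(RIGHT(email, CHAR_LENGTH(email) - INSTR(email, '#')));

-- SQL Server: right() + patindex()
SELECT * FROM TABLE1
ORDER BY LOWER(RIGHT(email, LEN(email) - PATINDEX('%#%', email)));

-- Oracle: instr() + substr()
SELECT * FROM TABLE1
ORDER BY LOWER(SUBSTR(email, INSTR(email, '#') + 1));
```

Each statement belongs to a different dialect, so only the one matching your engine will run there.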

If you have millions of records, I suggest creating a new column that holds only the domain name.

My suggestion would be (for mysql):
SELECT
LOWER(email) AS email,
SUBSTRING_INDEX(email, '#', + 1) AS account,
REPLACE(SUBSTRING_INDEX(email, '#', -1), CONCAT('.',SUBSTRING_INDEX(email, '.', -1)),'') -- 2nd part of mail - tld.
AS domain,
CONCAT('.',SUBSTRING_INDEX(email, '.', -1)) AS tld
FROM
********
ORDER BY domain, email ASC;
And then just add a WHERE...

The original answer for SQL Server didn't work for me....
Here is a version for SQL Server...
select SUBSTRING(email,(CHARINDEX('#',email)+1),len(email)), count(*)
from table_name
group by SUBSTRING(email,(CHARINDEX('#',email)+1),len(email))
order by count(*) desc

work smarter not harder:
SELECT REVERSE(SUBSTRING_INDEX(REVERSE(SUBSTRING(emails.email, POSITION('#' IN emails.email)+1)),'.',2)) FROM emails

Related

SQL: Taking one column from two tables and putting them into one predefined table

Just a little bug off my shoulder, but for what I'm using this code for, it is not the end of the world if this one doesn't get answered. To preface, a few things: I know this is entirely improper, I know this should never be used -- let alone, done -- in a production environment, and I know that the root of this operation is totally unconventional, but I'm asking anyway:
If I have two tables with a set of values that I am looking to grab and put into one other, combined and predefined table, side by side, how might I do that?
Right now, I have two statements doing
INSERT INTO table ('leftCol') SELECT NAME FROM smolT1 ORDER BY num DESC LIMIT 3
INSERT INTO table ('rightCol') SELECT NAME FROM smolT2 ORDER BY num DESC LIMIT 3
but, as one would imagine, that query ends up with something like...
leftCol | rightCol
Jack |
James |
John |
| Jill
| Justina
| Jesebelle
and of course, it would be much more preferred if the left and right column lined up, though, for the sake of gathering just those six records, I suppose it is not too big of a concern.
To add on, yes, these two tables do have a NAME in common, but with how I am querying them, they are totally irrelevant to one another and should not be associated with one another, just displayed side by side.
I am simply curious as to whether or not one query would get these two unrelated queries to work together and print neatly into a form or if I just have to live with this data looking like this.
Cheers!
The most recent versions of SQLite support window functions. This allows you to do:
select min(name1) as name1, min(name2) as name2
from (select name as name1, null as name2, row_number() over (order by name) as seqnum
from smolt1
where name is not null
union all
select null, name, row_number() over (order by name) as seqnum
from smolt2
where name is not null
) lr
group by seqnum;

SQL Select statement separate values into two columns

I have string value in the column Username.
Sample usernames:
USERNAME
--------
foobar123
john
smith23
steve
peter
king213
User names with numbers at the end mean that these users are no longer active. I want to separate these usernames into two columns, Active and Not_Active, in one SELECT statement, since I'll be using these for reporting purposes.
Result should be:
Active Not_Active
john foobar123
steve smith23
peter king213
Query:
SELECT
Username,
(
CASE Username
WHEN '%[0-9]%' THEN 'Not'
ELSE 'Active'
END
)
FROM
Users;
I tried Case but I don't know how to get the username value.
Expanding on my comment reply to your original posting, you don't want the data in the same output result-set; instead you will want two result sets (i.e. two tables), each with the different criteria.
Note that you cannot use LIKE pattern matching (%) with the simple CASE <expression> WHEN <value> form, because that form only compares for equality. You have to use the searched form, CASE WHEN <condition> (the one without a "switch" expression).
-- Result set one: Active users
SELECT
UserName
FROM
Users
WHERE
UserName NOT LIKE '%[0-9]';
-- Result set two: Inactive users
SELECT
UserName
FROM
Users
WHERE
UserName LIKE '%[0-9]';
If you really want, you can combine these two queries into a single result-set with the data in different columns. This would be done by adding a ROW_NUMBER() column to each intermediate table, then doing a FULL OUTER JOIN on ROW_NUMBER(), however the output result would be meaningless and painful to iterate over in any consuming client code.
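For completeness, the ROW_NUMBER() plus FULL OUTER JOIN combination described above might be sketched like this for SQL Server. The CTE names and the rn alias are made up for illustration, and the pairing of rows is arbitrary since the two lists are unrelated.

```sql
-- SQL Server sketch: number each list separately, then pair rows by number.
-- CTE names (active, inactive) and the rn alias are hypothetical.
WITH active AS (
    SELECT UserName, ROW_NUMBER() OVER (ORDER BY UserName) AS rn
    FROM Users
    WHERE UserName NOT LIKE '%[0-9]'
),
inactive AS (
    SELECT UserName, ROW_NUMBER() OVER (ORDER BY UserName) AS rn
    FROM Users
    WHERE UserName LIKE '%[0-9]'
)
SELECT a.UserName AS Active, i.UserName AS Not_Active
FROM active a
FULL OUTER JOIN inactive i ON a.rn = i.rn
ORDER BY COALESCE(a.rn, i.rn);
```

The FULL OUTER JOIN keeps leftover rows when one list is longer than the other, with NULL in the shorter column.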
Another option might be a single result-set, with a computed IsActive column:
SELECT
UserName,
( CASE WHEN UserName NOT LIKE '%[0-9]' THEN 1 ELSE 0 END ) AS IsActive
FROM
Users
...which would be considerably easier to process in any consuming code.
This format is quite useful: one column holds the username and the other holds the status of the user.
select username,
case
when username like '%[0-9]' then 'inactive'
else 'active'
end as user_status
from table_1
username user_status
john active
john123 inactive

SQL for Middle Value Rather than MIN/MAX or FIRST/LAST

Is there a SQL function to return the middle value of three?
For example, assume I have a table with people who have three cars, sorted alphabetically by AutoMaker.
John: Ford
John: Honda
John: VW
then
MIN(AutoMaker) returns Ford.
MAX(AutoMaker) returns VW.
Is there a similar SQL function that will return Honda?
I am working with MS Access and Oracle.
Thank you.
Short answer: No. It's too specific.
Longer answer: It's too specific. Hence, the "middle" in what you said is actually the second record. But if you had 5 records, it would be the third, and so on. If you need that in practice, just assign a row number to each row (Oracle, Access) and then select the ((n+1)/2)nd row (WHERE row_number = (n+1)/2).
PS - which is the middle row if you have 4 rows? :)
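In Oracle, the row-number idea above could be sketched with analytic functions. The table name cars and the columns person and automaker are assumptions for illustration; with an even number of rows this picks the lower of the two middle rows.

```sql
-- Oracle sketch: number each person's rows and keep the middle one.
-- cars, person and automaker are hypothetical names.
SELECT person, automaker
FROM (
    SELECT person,
           automaker,
           ROW_NUMBER() OVER (PARTITION BY person ORDER BY automaker) AS rn,
           COUNT(*)     OVER (PARTITION BY person)                    AS n
    FROM cars
)
WHERE rn = TRUNC((n + 1) / 2);
```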
The query could be something like this
select row_id, Field1
FROM tbl
where row_id = (select cInt(count(Field1)/2) from tbl)
The problem in Access is that you do not have a row_number. You would need to add a row_id column to the table and then populate it with 1, 2, 3, 4, ... (ordered on Field1).

Counting Distinct Values in large dataset (40M rows): SELECT count(*) as count, name FROM names GROUP BY name ORDER BY name;

CREATE TABLE `names` ( `name` varchar(20) );
Assume the names table contains all 40 million first names of everyone living in California (for example).
SELECT count(*) as count, name FROM names GROUP BY name ORDER BY name;
How can I optimize this query?
Expected Result:
count | name
9999 | joe
9995 | mike
9990 | kate
.... | ....
2 | kal-el
You have to create an index on the name column of your table. The query is as good as it can be.
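A minimal sketch of that index for the MySQL table in the question (the index name is made up), followed by an EXPLAIN to check whether the optimizer actually uses it:

```sql
-- MySQL sketch: idx_names_name is a hypothetical index name.
CREATE INDEX idx_names_name ON names (name);

-- Inspect the plan for the original query.
EXPLAIN SELECT count(*) AS count, name
FROM names
GROUP BY name
ORDER BY name;
```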
Well, what makes you think it's not already optimised? This looks like the sort of query a good database engine should be able to handle relatively easily - particularly if you've got an appropriate index on your table.
Do you actually have a bottleneck here, or are you worrying about something that might happen in the future? If it's the latter, I suggest you try it with your RDBMS (by generating dummy data), and see what happens.

How to fetch values with a MySQL query?

I want to fetch all the records of First_Name, LastName, First Name Last Name in a mysql Query.
For example,
mytable looks like this:
rec Id First Name Last Name
1 Gnaniyar Zubair
2 Frankyn Albert
3 John Mathew
4 Suhail Ahmed
Output should be like this:
Gnaniyar Zubair, Frankyn Albert, John Mathew, Suhail Ahmed
Give me the SQL.
If this must be done in the query, you can use GROUP_CONCAT, but unless you're grouping by something it's a pretty silly query and the concatenation should really be done on the client.
SELECT GROUP_CONCAT(CONCAT(FirstName, ' ', LastName)
ORDER BY FirstName, LastName
SEPARATOR ', ') AS Names
FROM People;
It is not a matter of getting one row with all the records, but a matter of how the data is presented. Therefore, I suggest running a simple SELECT query to fetch the records you need, then arranging them in the view layer as you like.
On the other hand, why do you need to solve this record concatenation at SQL level and not on view level?
If you wanted to get them in just one row, you're probably not using your database properly.
If you just want to join together the first and last names, that's easy:
SELECT CONCAT(`First Name`, ' ', `Last Name`) FROM mytable