getting unique results without using group by - sql

For some reason, I am not able to use GROUP BY but I can use DISTINCT like this:
SELECT DISTINCT(name), id, email from myTable
However, the above query still lists people with the same name, because I am selecting more than one column, whereas I want only unique names. Is there some way to get unique names without using GROUP BY?

Although using GROUP BY is the most direct way, you can do other things if it is prohibited. For example, you can use NOT EXISTS with a subquery, like this:
SELECT name, id, email
FROM myTable t
WHERE NOT EXISTS (SELECT * FROM myTable tt WHERE tt.name=t.name AND tt.id < t.id)
This query uses NOT EXISTS to eliminate every row for which another row with the same name and a lower ID exists, so only the row with the lowest ID survives for each name. Note that since this query must pick a single user per name, it may eliminate some users based on their ID, which is a rather arbitrary criterion.
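To see the NOT EXISTS approach in action, here is a minimal sketch that runs the query against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the choice of SQLite are assumptions made for illustration only; the original question does not say which database or data is involved.

```python
import sqlite3

# In-memory database with made-up sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO myTable (id, name, email) VALUES
        (1, 'Ann', 'ann@example.com'),
        (2, 'Bob', 'bob@example.com'),
        (3, 'Ann', 'ann2@example.com'),
        (4, 'Cid', 'cid@example.com'),
        (5, 'Bob', 'bob2@example.com');
""")

# Keep a row only if no other row has the same name and a lower id.
unique_rows = conn.execute("""
    SELECT name, id, email
    FROM myTable t
    WHERE NOT EXISTS (
        SELECT 1 FROM myTable tt
        WHERE tt.name = t.name AND tt.id < t.id
    )
    ORDER BY id
""").fetchall()

for row in unique_rows:
    print(row)
```

With this sample data, each name appears once and the surviving row is the one with the lowest id for that name.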

Related

SQL Query for multiple columns with one column distinct

I've spent an inordinate amount of time this morning trying to Google what I thought would be a simple thing. I need to set up an SQL query that selects multiple columns, but only returns one instance if one of the columns (let's call it case_number) returns duplicate rows.
select case_number, name, date_entered from ticket order by date_entered
There are rows in the ticket table that have duplicate case_number, so I want to eliminate those duplicate rows from the results and only show one instance of them. If I use "select distinct case_number, name, date_entered" it applies the distinct operator to all three fields, instead of just the case_number field. I need that logic to apply to only the case_number field and not all three. If I use "group by case_number having count (*)>1" then it returns only the duplicates, which I don't want.
Any ideas on what to do here are appreciated, thank you so much!
You can use ROW_NUMBER(). For example
select *
from (
select *,
row_number() over(partition by case_number) as rn
from ticket
) x
where rn = 1
The query above picks an arbitrary row for each case_number. If you want a deterministic choice, add an ORDER BY to the OVER clause, for example order by date_entered to keep the earliest row of each case.
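As a hedged illustration of the ROW_NUMBER() approach with an explicit ordering, the sketch below keeps the earliest row per case_number. It uses Python's sqlite3 module with invented sample rows; SQLite is an assumption here (window functions need SQLite 3.25 or later), and the ticket table's real contents are unknown.

```python
import sqlite3

# In-memory database with made-up tickets (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (case_number INTEGER, name TEXT, date_entered TEXT);
    INSERT INTO ticket VALUES
        (101, 'first report', '2020-01-05'),
        (101, 'follow-up',    '2020-01-09'),
        (102, 'only entry',   '2020-01-07'),
        (103, 'late entry',   '2020-02-01'),
        (103, 'early entry',  '2020-01-02');
""")

# Number the rows inside each case_number by date, then keep row 1 only.
one_per_case = conn.execute("""
    SELECT case_number, name, date_entered
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY case_number
                   ORDER BY date_entered
               ) AS rn
        FROM ticket t
    ) x
    WHERE rn = 1
    ORDER BY date_entered
""").fetchall()

for row in one_per_case:
    print(row)
```

Ordering inside the OVER clause decides which duplicate survives; the outer ORDER BY only controls how the final result is sorted.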

SQL: How to select full rows in each group which matched conditions on some fields

I have one table in postgresql database, for example:
Is there any way to get the result below with good performance? That is, for each group I want the full rows that match some condition, such as userid=100, plus additional fields computed by aggregate functions.
Output (with userid=100 as the condition I want, or some other condition):
Note: The data is dynamic; fields such as content and seen are random.
I have used this SQL query, but it can only return two fields:
SELECT groupid,
string_agg(text(userid), ', ') AS lst_userids
FROM t1
GROUP BY groupid
Thanks for any help!
You seem to want something like this:
SELECT min(id) as id, groupid,
string_agg(text(userid), ', ') AS lst_userids,
max(case when seen then content end) as content,
bool_or(seen) as seen
FROM t1
GROUP BY groupid;
I am guessing what the actual logic is, but you can definitely have multiple columns in an aggregation query.

SQL Having on columns not in SELECT

I have a table with 3 columns:
userid mac_address count
The entries for one user could look like this:
57193 001122334455 42
57193 000C6ED211E6 15
57193 FFFFFFFFFFFF 2
I want to create a view that displays only those MACs that are considered "commonly used" for this user. For example, I want to filter out the MACs whose count is less than 10% of the count of the most used MAC address for that user. Furthermore, I want one row per user. This could easily be achieved with GROUP BY, HAVING and GROUP_CONCAT:
SELECT userid, GROUP_CONCAT(mac_address SEPARATOR ',') AS macs, count
FROM mactable
GROUP BY userid
HAVING count*10 >= MAX(count)
And indeed, the result is as follows:
57193 001122334455,000C6ED211E6 42
However I really don't want the count-column in my view. But if I take it out of the SELECT statement, I get the following error:
#1054 - Unknown column 'count' in 'having clause'
Is there any way I can perform this operation without being forced to have a nasty count-column in my view? I know I can probably do it using inner queries, but I would like to avoid doing that for performance reasons.
Your help is very much appreciated!
Because the HAVING clause here refers to count by the name it has in the select list, you cannot simply drop that column from the query.
However, you can wrap your select in an outer select that returns only the columns you want.
SELECT a.userid, a.macs
FROM
(
SELECT userid, GROUP_CONCAT(mac_address SEPARATOR ',') AS macs, count
FROM mactable
GROUP BY userid
HAVING count*10 >= MAX(count)
) as a
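UPDATE:
Because of a MySQL limitation (as far as I know, a view definition there could not contain a subquery in the FROM clause), this is not possible in a view, although it works in other DBMSs like Oracle.
One solution would be to create a separate view for the subquery. Another solution seems cleaner: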
CREATE VIEW YOUR_VIEW (userid, macs) AS
SELECT userid, GROUP_CONCAT(mac_address SEPARATOR ',') AS macs, count
FROM mactable
GROUP BY userid
HAVING count*10 >= MAX(count)
This is meant to declare the view as returning only the columns userid and macs, although the underlying SELECT statement returns more columns than those two.
However, I am not sure whether MySQL supports this: its documentation says the number of names in the view's column list must match the number of columns retrieved by the SELECT.
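One alternative that avoids both the extra count column and a subquery in the FROM clause is to compare each row with its user's maximum in a correlated subquery inside WHERE, and only then group. This is a sketch rather than a tested MySQL view: it runs on SQLite through Python's sqlite3 module, uses SQLite's GROUP_CONCAT(expr, ',') form instead of MySQL's SEPARATOR syntax, and the second user (60001) is invented sample data.

```python
import sqlite3

# In-memory database; the first three rows come from the question,
# the row for user 60001 is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mactable (userid INTEGER, mac_address TEXT, count INTEGER);
    INSERT INTO mactable VALUES
        (57193, '001122334455', 42),
        (57193, '000C6ED211E6', 15),
        (57193, 'FFFFFFFFFFFF', 2),
        (60001, 'AABBCCDDEEFF', 5);
""")

# Filter rows against the per-user maximum first, then concatenate per user.
common_macs = conn.execute("""
    SELECT m.userid, GROUP_CONCAT(m.mac_address, ',') AS macs
    FROM mactable m
    WHERE m.count * 10 >= (
        SELECT MAX(m2.count) FROM mactable m2 WHERE m2.userid = m.userid
    )
    GROUP BY m.userid
    ORDER BY m.userid
""").fetchall()

# The order of addresses inside macs is not guaranteed, so compare as sets.
macs_by_user = {userid: set(macs.split(",")) for userid, macs in common_macs}
print(macs_by_user)
```

Because the filter is applied per row before grouping, the result does not depend on which count value the database happens to pick for a group. Whether this performs acceptably on a large table would need to be measured.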

DISTINCT pulling duplicate column values

The following query is pulling duplicate site_ids even though I am using DISTINCT, and I can't figure out why...
SELECT
DISTINCT site_id,
deal_woot.*,
site.woot_off,
site.name AS site_name
FROM deal_woot
INNER JOIN site ON site.id = site_id
WHERE site_id IN (2, 3, 4, 5, 6)
ORDER BY deal_woot.id DESC LIMIT 5
DISTINCT looks at the entire record, not just the column directly after it. To accomplish what you want, you'll need to use GROUP BY:
Non-working code:
SELECT
site_id,
deal_woot.*,
site.woot_off,
site.name AS site_name
FROM deal_woot
INNER JOIN site ON site.id = site_id
WHERE site_id IN (2, 3, 4, 5, 6)
GROUP BY site_id
Why doesn't it work? If you GROUP BY a column, you should use an aggregate function (such as MIN or MAX) on the rest of the columns; otherwise, if there are multiple woot_off values for a given site_id, it's not clear to the database which of those values you want to SELECT.
You will probably have to expand deal_woot.* to list each of its fields.
Side note: If you're using MySQL, I believe it's not technically necessary to specify an aggregate function for the remaining columns, as long as the ONLY_FULL_GROUP_BY SQL mode is off (it is on by default since MySQL 5.7.5). If you don't specify an aggregate function for a column, MySQL picks a value from one of the rows in the group for you, and which row is not guaranteed.
Your query is returning DISTINCT rows, it is not just looking at site_id. In other words, if any of the columns are different, a new row is returned from this query.
This makes sense, because if you actually do have differences, what should the server return as values for deal_woot.*? If you want one row per site, you need to say which one: for example, get the distinct site_ids, then fetch the other values in a subquery with LIMIT 1 and an appropriate ORDER BY clause.
You are selecting distinct values from one table only. When you join with the other table, it pulls all rows from that table that match each of your distinct values, causing duplicate IDs.
If you want to select site info and a single row from deal_woot table with the same site_id, you need to use a different query. For example,
SELECT site.id, deal_woot.*, site.woot_off, site.name
FROM site
INNER JOIN
(SELECT site_id, MAX(id) as id FROM deal_woot
WHERE site_id IN (2,3,4,5,6) GROUP BY site_id) X
ON (X.site_id = site.id)
INNER JOIN deal_woot ON (deal_woot.id = X.id)
WHERE site.id IN (2,3,4,5,6);
This query should work regardless of SQL dialect or database vendor. For MySQL, you can just add GROUP BY site_id to your original query, since it lets you use GROUP BY without aggregate functions (provided the ONLY_FULL_GROUP_BY SQL mode is not enabled).
** I assume that deal_woot.id and site.id are primary keys for deal_woot and site tables respectively.
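The join against a MAX(id) subquery can be tried out with the sketch below, which runs on SQLite through Python's sqlite3 module. The column lists (in particular deal_woot.title) and all rows are hypothetical, since the question does not show the real schema; only the shape of the query follows the answer above.

```python
import sqlite3

# In-memory database with a guessed schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site (id INTEGER PRIMARY KEY, name TEXT, woot_off INTEGER);
    CREATE TABLE deal_woot (id INTEGER PRIMARY KEY, site_id INTEGER, title TEXT);
    INSERT INTO site VALUES (2, 'woot', 0), (3, 'shirt', 1), (4, 'wine', 0);
    INSERT INTO deal_woot VALUES
        (10, 2, 'old gadget'),
        (11, 2, 'new gadget'),
        (12, 3, 'tee'),
        (13, 4, 'red'),
        (14, 4, 'white');
""")

# Subquery x finds the newest deal id per site; joining back on that id
# fetches the full deal row, so each site appears exactly once.
latest_deals = conn.execute("""
    SELECT site.id, site.name, deal_woot.id, deal_woot.title
    FROM site
    INNER JOIN (
        SELECT site_id, MAX(id) AS id
        FROM deal_woot
        WHERE site_id IN (2, 3, 4, 5, 6)
        GROUP BY site_id
    ) x ON x.site_id = site.id
    INNER JOIN deal_woot ON deal_woot.id = x.id
    ORDER BY deal_woot.id DESC
""").fetchall()

for row in latest_deals:
    print(row)
```

Sites 5 and 6 are listed in the IN clause but have no rows in the sample data, so they simply do not appear in the result.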

Why shouldn’t you use DISTINCT when you could use GROUP BY?

According to tips from MySQL performance wiki:
Don't use DISTINCT when you have or could use GROUP BY.
Can somebody post an example of queries where GROUP BY can be used instead of DISTINCT?
If you know that two columns from your result are always directly related then it's slower to do this:
SELECT DISTINCT CustomerId, CustomerName FROM (...)
than this:
SELECT CustomerId, CustomerName FROM (...) GROUP BY CustomerId
because in the second case it only has to compare the id, but in the first case it has to compare both fields. This is a MySQL specific trick. It won't work with other databases.
SELECT Code
FROM YourTable
GROUP BY Code
vs
SELECT DISTINCT Code
FROM YourTable
The basic rule: put all the columns from the SELECT clause into the GROUP BY clause
so
SELECT DISTINCT a,b,c FROM D
becomes
SELECT a,b,c FROM D GROUP BY a,b,c
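A quick way to convince yourself that the two forms return the same rows is to run both against a small table. The sketch below does that on SQLite through Python's sqlite3 module; the contents of D are made up, and it says nothing about which form is faster on any particular database.

```python
import sqlite3

# In-memory table D with made-up rows that contain duplicates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE D (a INTEGER, b INTEGER, c TEXT);
    INSERT INTO D VALUES
        (1, 1, 'x'), (1, 1, 'x'), (1, 2, 'x'), (2, 2, 'y'), (2, 2, 'y');
""")

# Same select list, once de-duplicated with DISTINCT and once with GROUP BY.
with_distinct = conn.execute(
    "SELECT DISTINCT a, b, c FROM D ORDER BY a, b, c"
).fetchall()
with_group_by = conn.execute(
    "SELECT a, b, c FROM D GROUP BY a, b, c ORDER BY a, b, c"
).fetchall()

print(with_distinct)
print(with_group_by)
```

The ORDER BY is there only to make the two result lists directly comparable.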
Example.
Relation customer(ssnum,name, zipcode, address) PK(ssnum). ssnum is social security number.
SQL:
Select DISTINCT ssnum from customer where zipcode=1234 group by name
This SQL statement returns unique records for those customers that have zipcode 1234. At the end, the results are grouped by name.
Here DISTINCT is not necessary, because you are selecting ssnum, which is already unique: ssnum is the primary key, and two people cannot have the same ssnum.
In this case, Select ssnum from customer where zipcode=1234 group by name will give better performance than "... DISTINCT.......".
DISTINCT is an expensive operation in a DBMS.