I DISTINCTly hate MySQL (help building a query) - sql

This is staight forward I believe:
I have a table with 30,000 rows. When I SELECT DISTINCT 'location' FROM myTable it returns 21,000 rows, about what I'd expect, but it only returns that one column.
What I want is to move those to a new table, but the whole row for each match.
My best guess is something like SELECT * from (SELECT DISTINCT 'location' FROM myTable) or something like that, but it says I have a vague syntax error.
Is there a good way to grab the rest of each DISTINCT row and move it to a new table all in one go?

SELECT * FROM myTable GROUP BY `location`
or if you want to move to another table
CREATE TABLE foo AS SELECT * FROM myTable GROUP BY `location`

Distinct means for the entire row returned. So you can simply use
SELECT DISTINCT * FROM myTable GROUP BY 'location'
Using Distinct on a single column doesn't make a lot of sense. Let's say I have the following simple set
-id- -location-
1 store
2 store
3 home
if there were some sort of query that returned all columns, but just distinct on location, which row would be returned? 1 or 2? Should it just pick one at random? Because of this, DISTINCT works for all columns in the result set returned.

Well, first you need to decide what you really want returned.
The problem is that, presumably, for some of the location values in your table there are different values in the other columns even when the location value is the same:
Location OtherCol StillOtherCol
Place1 1 Fred
Place1 89 Fred
Place1 1 Joe
In that case, which of the three rows do you want to select? When you talk about a DISTINCT Location, you're condensing those three rows of different data into a single row, there's no meaning to moving the original rows from the original table into a new table since those original rows no longer exist in your DISTINCT result set. (If all the other columns are always the same for a given Location, your problem is easier: Just SELECT DISTINCT * FROM YourTable).
If you don't care which values come from the other columns you can use a (bad, IMHO) MySQL extension to SQL and do:
SELECT * FROM YourTable GROUP BY Location
which will give a result set with one row per location and values for the other columns derived from the original data in an undefined fashion.

Multiple rows with identical values in all columns don't have any sense. OK - the question might be a way to correct exactly that situation.
Considering this table, with id being the PK:
kram=# select * from foba;
id | no | name
----+----+---------------
2 | 1 | a
3 | 1 | b
4 | 2 | c
5 | 2 | a,b,c,d,e,f,g
you may extract a sample for every single no (:=location) by grouping over that column, and selecting the row with minimum PK (for example):
SELECT * FROM foba WHERE id IN (SELECT min (id) FROM foba GROUP BY no);
id | no | name
----+----+------
2 | 1 | a
4 | 2 | c

Related

How can I remove duplicate rows from a table but keeping the summation of values of a column

Suppose there is a table which has several identical rows. I can copy the distinct values by
SELECT DISTINCT * INTO DESTINATIONTABLE FROM SOURCETABLE
but if the table has a column named value and for the sake of simplicity its value is 1 for one particular item in that table. Now that row has another 9 duplicates. So the summation of the value column for that particular item is 10. Now I want to remove the 9 duplicates(or copy the distinct value as I mentioned) and for that item now the value should show 10 and not 1. How can this be achieved?
item| value
----+----------------
A | 1
A | 1
A | 1
A | 1
B | 1
B | 1
I want to show this as below
item| value
----+----------------
A | 4
B | 2
Thanks in advance
You can try to use SUM and group by
SELECT item,SUM(value) value
FROM T
GROUP BY item
SQLfiddle:http://sqlfiddle.com/#!18/fac26/1
[Results]:
| item | value |
|------|-------|
| A | 4 |
| B | 2 |
Broadly speaking, you can just us a sum and a GROUP BY clause.
Something like:
SELECT column1, SUM(column2) AS Count
FROM SOURCETABLE
GROUP BY column1
Here it is in action: Sum + Group By
Since your table probably isn't just two columns of data, here is a slightly more complex example showing how to do this to a larger table: SQL Fiddle
Note that I've selected my rows individually so that I can access the necessary data, rather than using
SELECT *
And I have achieved this result without the need for selecting data into another table.
EDIT 2:
Further to your comments, it sounds like you want to alter the actual data in your table rather than just querying it. There may be a more elegant way to do this, but a simple way use the above query to populate a temporary table, delete the contents of the existing table, then move all the data back. To do this in my existing example:
WITH MyQuery AS (
SELECT name, type, colour, price, SUM(number) AS number
FROM MyTable
GROUP BY name, type, colour, price
)
SELECT * INTO MyTable2 FROM MyQuery;
DELETE FROM MyTable;
INSERT INTO MyTable(name, type, colour, price, number)
SELECT * FROM MyTable2;
DROP TABLE MyTable2;
WARNING: If youre going to try this, please use a development environment first (i.e one you don't mind breaking!) to ensure it does exactly what you want it to do. It's imperative that your initial query captures ALL the data you want.
Here is the SQL Fiddle of this example in action: SQL Fiddle

Querying SQL table with different values in same column with same ID

I have an SQL Server 2012 table with ID, First Name and Last name. The ID is unique per person but due to an error in the historical feed, different people were assigned the same id.
------------------------------
ID FirstName LastName
------------------------------
1 ABC M
1 ABC M
1 ABC M
1 ABC N
2 BCD S
3 CDE T
4 DEF T
4 DEG T
In this case, the people with ID’s 1 are different (their last name is clearly different) but they have the same ID. How do I query and get the result? The table in this case has millions of rows. If it was a smaller table, I would probably have queried all ID’s with a count > 1 and filtered them in an excel.
What I am trying to do is, get a list of all such ID's which have been assigned to two different users.
Any ideas or help would be very appreciated.
Edit: I dont think I framed the question very well.
There are two ID's which are present multiple time. 1 and 4. The rows with id 4 are identical. I dont want this in my result. The rows with ID 1, although the first name is same, the last name is different for 1 row. I want only those ID's whose ID is same but one of the first or last names is different.
I tried loading ID's which have multiple occurrences into a temp table and tried to compare it against the parent table albeit unsuccessfully. Any other ideas that I can try and implement?
SELECT
ID
FROM
<<Table>>
GROUP BY
ID
HAVING
COUNT(*) > 1;
SELECT *
FROM myTable
WHERE ID IN (
SELECT ID
FROM myTable
GROUP BY ID
HAVING MAX(LastName) <> MIN(LastName) OR MAX(FirstName) <> MIN(FirstName)
)
ORDER BY ID, LASTNAME

How can I order the rows inside a table by a column (but not the `SELECT`'s response)?

For example, I want to order a table like this
Foo | Bar
---------
1 | a
5 | d
2 | c
1 | b
2 | a
to this:
Foo | Bar
---------
1 | a
1 | b
2 | a
2 | c
5 | d
(ordered by Foo column)
That's because I only want to select the Bars that have a given Foo, and if it's already ordered I guess they will be faster to select because I won't have to use ORDER BY.
And if it's possible, once sorting by columns Foo, I want to sort the rows which have the same Foo by Bar column.
Of course, if I INSERT or UPDATE to table, it should remain ordered.
In SQL, tables are inherently unordered. This is a very important characteristic of databases. For instance, you can delete a row in the middle of a table, and when a new row is inserted, it uses up the space occupied by the deleted row. This is more efficient that just appending rows to the end of the data.
In other words, the order by clause is used basically for output purposes only. Okay, I can think of two other situations . . . with limit (or a related clause) and with window functions (which SQLite does not support).
In any case, ordering the data also would not matter for a query such as this:
select bar
from t
where foo = $FOO
The SQL engine does not "know" that the table is ordered. So, it will start at the beginning of the table and do the comparison for each row.
The way to make this more efficient is by building an index on foo. Then you will be able to get the efficiencies that you want.

how to query with child relations to same table and order this correctly

Take this table:
id name sub_id
---------------------------
1 A (null)
2 B (null)
3 A2 1
4 A3 1
The sub_id column is a relation to his own table, to column ID.
subid --- 0:1 --- id
Now I have the problem to make a correctly SELECT query to show that the child rows (which sub_id is not null) directly selected under his parent row. So this must be a correctly order:
1 A (null)
3 A2 1
4 A3 1
2 B (null)
A normal SELECT order the id. But how or which keyword help me to order this correctly?
JOIN isn't possible I think because I want to get all the rows separated. Because the rows will be displayed on a Gridview (ASP.Net) with EntityDataSource but the child rows must be displayed directly under his parent.
Thank you.
Look at Managing Hierarchical Data in MySQL.
Since recursion is an expensive operation because basicly you're firing multiple queries to your database you could consider using the Nested Set Model. In short you're assigning numbers to ranges in your table. It's a long article but it worth reading it. I've used it during my internship as a solution not to have 1000+ queries, But bring it down to 1 query.
Your handling 'overhead' now lies at the point of updating the table by adding, updating or deleting records. Since you then have to update all the records with a bigger 'right-value'. But when you're retrieving the data, it all goes with 1 query :)
select * from table1 order by name, sub_id will in this case return your desired result but only because the parents names and the child name are similar. If you're using SQL 2005 a recursive CTE will work:
WITH recurse (id, Name, childID, Depth)
AS
(
SELECT id, Name, ISNULL(childID, id) as id, 0 AS Depth
FROM table1 where childid is null
UNION ALL
SELECT table1.id, table1.Name, table1.childID, recurse.Depth + 1 AS Depth FROM table1
JOIN recurse ON table1.childid = recurse.id
)
SELECT * FROM recurse order by childid, depth
SELECT
*
FROM
table
ORDER BY
COALESCE(id,sub_id), id
btw, this will work only for one level.. any thing more than that requires recursive/cte function

How to select 10 rows below the result returned by the SQL query?

Here is the SQL table:
KEY | NAME | VALUE
---------------------
13b | Jeffrey | 23.5
F48 | Jonas | 18.2
2G8 | Debby | 21.1
Now, if I type:
SELECT *
FROM table
WHERE VALUE = 23.5
I will get the first row.
What I need to accomplish is to get the first and the next two rows below. Is there a way to do it?
Columns are not sorted and WHERE condition doesn't participate in the selection of the rows, except for the first one. I just need the two additional rows below the returned one - the ones that were entered after the one which has been returned by the SELECT query.
Without a date column or an auto-increment column, you can't reliably determine the order the records were entered.
The physical order with which rows are stored in the table is non-deterministic.
You need to define an order to the results to do this. There is no guaranteed order to the data otherwise.
If by "the next 2 rows after" you mean "the next 2 records that were inserted into the table AFTER that particular row", you will need to use an auto incrementing field or a "date create" timestamp field to do this.
If each row has an ID column that is unique and auto incrementing, you could do something like:
SELECT * FROM table WHERE id > (SELECT id FROM table WHERE value = 23.5)
If I understand correctly, you're looking for something like:
SELECT * FROM table WHERE value <> 23.5
You can obviously write a program to do that but i am assuming you want a query. What about using a Union. You would also have to create a new column called value_id or something in those lines which is incremented sequentially (probably use a sequence). The idea is that value_id will be incremented for every insert and using that you can write a where clause to return the remaining two values you want.
For example:
Select * from table where value = 23.5
Union
Select * from table where value_id > 2 limit 2;
Limit 2 because you already got the first value in the first query
You need an order if you want to be able to think in terms of "before" and "after".
Assuming you have one you can use ROW_NUMBER() (see more here http://msdn.microsoft.com/en-us/library/ms186734.aspx) and do something like:
With MyTable
(select row_number() over (order by key) as n, key, name, value
from table)
select key, name, value
from MyTable
where n >= (select n from MyTable where value = 23.5)