selecting duplicate columns in psql then sorting by row id - sql

I have tried searching the web and, whilst there are plenty of answers for finding duplicates, I have yet to stumble on one that lets me find all the duplicates within a column (i.e. where the same 'name' occurs more than once) and then select only the lowest row id for each (which would be the first duplicate name entered).
So the table's description (inserted from a file):
create table customer(id int, name varchar);
id| name
1 | Darren
2 | Mark
3 | Julie
4 | Mark
5 | Julie
The query:
CREATE VIEW AS
SELECT COUNT(name), name
FROM customer
GROUP BY name
HAVING COUNT(name) > 1
Result (the order is never guaranteed, I want Mark to always come first as he has the lowest id):
Julie
Mark
Now the issue is, if I select id I have to include it in the GROUP BY. Doing that means no duplicates are found, since every id is unique. And without selecting id I can't ORDER BY it.
I hope I am clear, if not I can re-word or supply more information.

Please try this nested query. The inner SELECT groups by name and also computes MIN(id) for each group. The outer query then selects the columns you want and sorts by that minimum id.
CREATE VIEW AS
SELECT CNT_NAME, NAME
FROM
(
SELECT COUNT(name) CNT_NAME, name, min(id) min_id
FROM customer
GROUP BY name
HAVING COUNT(name) > 1
) AS alias
ORDER BY MIN_ID
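The nested query above can be checked against the sample data from the question. What follows is a minimal sketch, assuming an in-memory SQLite database through Python's sqlite3 module rather than PostgreSQL, and leaving out the CREATE VIEW wrapper. The helper name first_duplicates and the alias dup are hypothetical.

```python
import sqlite3


def first_duplicates(rows):
    """Return (name, lowest id) for every duplicated name, lowest id first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INT, name VARCHAR)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    query = """
        SELECT name, min_id
        FROM (
            -- one row per duplicated name, carrying its lowest id
            SELECT name, COUNT(name) AS cnt_name, MIN(id) AS min_id
            FROM customer
            GROUP BY name
            HAVING COUNT(name) > 1
        ) AS dup
        ORDER BY min_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Sample data from the question.
sample = [(1, "Darren"), (2, "Mark"), (3, "Julie"), (4, "Mark"), (5, "Julie")]
print(first_duplicates(sample))  # [('Mark', 2), ('Julie', 3)]
```

Because MIN(id) is an aggregate, id does not have to appear in the GROUP BY, which is what makes the ordering possible.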

Related

Identify duplicate fields in a table

I'm trying to identify specific fields that are duplicated in a table in a mariadb-10.4.20 Joomla database. I would like to identify all rows that have a specific field duplicated, then ultimately be able to remove those duplicates, leaving just the one with the highest ID.
This table contains the IDs, titles and aliases for the articles in a joomla website. The script I'm building (in perl) will use this information to print the primary title alias and create redirects for any others.
I was previously using "group by" but it appears there's been a change recently in how it's used, and now it doesn't work properly. I don't understand the new format, and I'm not even sure it was previously working fully.
Here's a basic query that shows there are two of the same articles with different IDs:
MariaDB [mydb]> select id,alias,title from db1_content where title = "article title";
+--------+--------------+--------------+
| id     | alias        | title        |
+--------+--------------+--------------+
| 299959 | unique-title | Unique Title |
| 300026 | unique-title | Unique Title |
+--------+--------------+--------------+
Here's an attempt at trying to use "group by" but it returns no results.
MariaDB [mydb]> select id,title,count(title) from db1_content group by id,title having count(title) > 1;
Empty set (0.230 sec)
If I run the same query without the id field, then it does return a list of all titles that are duplicated, along with the number of occurrences of each title.
That's not exactly what I want, though. I need it to print the id, alias and title fields so I can reference them in my perl script to subsequently perform another query to ultimately delete the duplicates and create links to be used in RewriteRules.
What am I doing wrong?
Since MariaDB cannot currently delete from a CTE, you could use a derived table to generate row numbers for each title ordered by id descending, JOIN that to your main table and then delete any row which has a row number greater than 1. For example:
DELETE db1 FROM db1_content db1
JOIN (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY title ORDER BY id DESC) AS rn
FROM db1_content
) dbr ON db1.id = dbr.id
WHERE dbr.rn > 1
If you don't want to actually delete the records using SQL, you can just select the ones that need to be deleted by using a CTE:
WITH rns AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY title ORDER BY id DESC) AS rn
FROM db1_content
)
SELECT id, alias, title
FROM rns
WHERE rn > 1
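Where window functions are not available, the same set of rows can be found with a plain GROUP BY. This is a different technique from the ROW_NUMBER() approach above, shown only as a sketch: it assumes an in-memory SQLite database through Python's sqlite3 module instead of MariaDB, and the third sample row is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db1_content (id INT, alias TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO db1_content VALUES (?, ?, ?)",
    [
        (299959, "unique-title", "Unique Title"),
        (300026, "unique-title", "Unique Title"),
        (300100, "other-title", "Other Title"),  # hypothetical non-duplicate row
    ],
)

# Keep the highest id per title; every other row is a duplicate to remove.
to_delete = conn.execute(
    """
    SELECT id, alias, title
    FROM db1_content
    WHERE id NOT IN (SELECT MAX(id) FROM db1_content GROUP BY title)
    ORDER BY id
    """
).fetchall()
conn.close()

print(to_delete)  # [(299959, 'unique-title', 'Unique Title')]
```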

Order By clause in sql server

Suppose there is a table and I need to sort one of its columns (name) alphabetically, and at the same time sort by the ID column in ascending order for rows that have the same name. I don't understand how this works. Once the records are sorted by name, will it then sort all rows by the id column?
Can someone explain how the ORDER BY clause actually works in this case?
select name,
id
from hack h
order by name,
id
use order by name, id
select name,
id
from hack
order by name,
id
If I understand correctly, you want to know what happens when the ORDER BY clause has two or more columns. Let's go through an example.
Take these rows, where the first column is id and the second is name:
2 A
5 B
6 A
3 A
1 B
The query "select name,id from hack order by name,id" returns the following:
A 2
A 3
A 6
B 1
B 5
As you can see, it sorts by the name column first, and then sorts by id within each group of rows that share the same name.
That's it. Did I make it clear?
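The example above can be reproduced with a short sketch. It assumes an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, and compares the query result with Python's own sort on a (name, id) tuple, which follows the same rule: the second key only matters when the first is equal.

```python
import sqlite3

rows = [(2, "A"), (5, "B"), (6, "A"), (3, "A"), (1, "B")]  # (id, name)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hack (id INT, name TEXT)")
conn.executemany("INSERT INTO hack VALUES (?, ?)", rows)
ordered = conn.execute("SELECT name, id FROM hack ORDER BY name, id").fetchall()
conn.close()

# The same ordering in plain Python: sort by name, break ties by id.
by_tuple = sorted((name, id_) for id_, name in rows)

print(ordered)  # [('A', 2), ('A', 3), ('A', 6), ('B', 1), ('B', 5)]
```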
This answers the original question.
In the code you posted:
substring(name, len(name) - 2, len(name))
returns the last 3 characters of the name.
So you are sorting by these last 3 characters and not by name.
When there are 2 names with the same last 3 characters these will be sorted by id.
If there is more than one column name after the "order by" keyword, the system orders the records by the first column listed, and uses each following column only to break ties among rows that are equal on the preceding columns.

Querying SQL table with different values in same column with same ID

I have an SQL Server 2012 table with ID, First Name and Last name. The ID is unique per person but due to an error in the historical feed, different people were assigned the same id.
------------------------------
ID FirstName LastName
------------------------------
1 ABC M
1 ABC M
1 ABC M
1 ABC N
2 BCD S
3 CDE T
4 DEF T
4 DEG T
In this case, the people with ID 1 are different (their last names are clearly different) but they have the same ID. How do I query for this? The table has millions of rows. If it were a smaller table, I would probably have queried all IDs with a count > 1 and filtered them in Excel.
What I am trying to do is get a list of all such IDs that have been assigned to two different users.
Any ideas or help would be very appreciated.
Edit: I don't think I framed the question very well.
There are two IDs which are present multiple times: 1 and 4. The rows with ID 4 are identical, and I don't want those in my result. For ID 1, although the first name is the same, the last name is different for one row. I want only those IDs that are shared by rows whose first or last names differ.
I tried loading the IDs which have multiple occurrences into a temp table and comparing it against the parent table, albeit unsuccessfully. Any other ideas that I can try and implement?
SELECT
ID
FROM
<<Table>>
GROUP BY
ID
HAVING
COUNT(*) > 1;
SELECT *
FROM myTable
WHERE ID IN (
SELECT ID
FROM myTable
GROUP BY ID
HAVING MAX(LastName) <> MIN(LastName) OR MAX(FirstName) <> MIN(FirstName)
)
ORDER BY ID, LASTNAME

SQL get differences in one column by ID

It's hard for me to word what I want which is why I've had trouble researching this issue. What I want is to look at a table by id and see if another column changes:
id name
---- ------
1 Al
2 Mia
1 Al
2 Jean
In the example, I don't care about id 1 because the name always stayed as Al but I care about id 2 because there is a record with the name Mia but then, that id 2 also has a record with the name Jean. I was thinking of using group by somehow but that doesn't work. Any ideas?
Try this:
SELECT id
FROM mytable
GROUP BY id
HAVING MIN(name) <> MAX(name)
This will select all ids having at least two different values.
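To see the HAVING MIN(name) <> MAX(name) test at work on the question's data, here is a minimal sketch assuming an in-memory SQLite database through Python's sqlite3 module. It also runs a COUNT(DISTINCT name) > 1 variant, which is not part of the answer above but expresses the same condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "Al"), (2, "Mia"), (1, "Al"), (2, "Jean")],
)

# If the smallest and largest name in a group differ, the group holds
# at least two different names.
changed = conn.execute(
    "SELECT id FROM mytable GROUP BY id HAVING MIN(name) <> MAX(name) ORDER BY id"
).fetchall()

# Alternative wording of the same condition.
changed_distinct = conn.execute(
    "SELECT id FROM mytable GROUP BY id HAVING COUNT(DISTINCT name) > 1 ORDER BY id"
).fetchall()
conn.close()

print(changed)  # [(2,)]
```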

I DISTINCTly hate MySQL (help building a query)

This is straightforward, I believe:
I have a table with 30,000 rows. When I SELECT DISTINCT 'location' FROM myTable it returns 21,000 rows, about what I'd expect, but it only returns that one column.
What I want is to move those to a new table, but the whole row for each match.
My best guess is something like SELECT * from (SELECT DISTINCT 'location' FROM myTable) or something like that, but it says I have a vague syntax error.
Is there a good way to grab the rest of each DISTINCT row and move it to a new table all in one go?
SELECT * FROM myTable GROUP BY `location`
or if you want to move to another table
CREATE TABLE foo AS SELECT * FROM myTable GROUP BY `location`
DISTINCT applies to the entire row returned, so you can simply use
SELECT DISTINCT * FROM myTable GROUP BY `location`
Using Distinct on a single column doesn't make a lot of sense. Let's say I have the following simple set
-id- -location-
1 store
2 store
3 home
if there were some sort of query that returned all columns, but just distinct on location, which row would be returned? 1 or 2? Should it just pick one at random? Because of this, DISTINCT works for all columns in the result set returned.
Well, first you need to decide what you really want returned.
The problem is that, presumably, for some of the location values in your table there are different values in the other columns even when the location value is the same:
Location OtherCol StillOtherCol
Place1 1 Fred
Place1 89 Fred
Place1 1 Joe
In that case, which of the three rows do you want to select? When you talk about a DISTINCT Location, you're condensing those three rows of different data into a single row, there's no meaning to moving the original rows from the original table into a new table since those original rows no longer exist in your DISTINCT result set. (If all the other columns are always the same for a given Location, your problem is easier: Just SELECT DISTINCT * FROM YourTable).
If you don't care which values come from the other columns you can use a (bad, IMHO) MySQL extension to SQL and do:
SELECT * FROM YourTable GROUP BY Location
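which will give a result set with one row per location and values for the other columns derived from the original data in an undefined fashion. Note that when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5), MySQL rejects this query unless the other columns are functionally dependent on Location.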
Multiple rows with identical values in all columns don't make any sense. OK, the question might be about a way to correct exactly that situation.
Considering this table, with id being the PK:
kram=# select * from foba;
id | no | name
----+----+---------------
2 | 1 | a
3 | 1 | b
4 | 2 | c
5 | 2 | a,b,c,d,e,f,g
you may extract a sample for every single no (:=location) by grouping over that column, and selecting the row with minimum PK (for example):
SELECT * FROM foba WHERE id IN (SELECT min (id) FROM foba GROUP BY no);
id | no | name
----+----+------
2 | 1 | a
4 | 2 | c
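The same query can be tried outside psql. The sketch below assumes an in-memory SQLite database through Python's sqlite3 module. The column name no is double-quoted only as a precaution because NO is an SQL keyword, and the ORDER BY is added to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE foba (id INT PRIMARY KEY, "no" INT, name TEXT)')
conn.executemany(
    "INSERT INTO foba VALUES (?, ?, ?)",
    [(2, 1, "a"), (3, 1, "b"), (4, 2, "c"), (5, 2, "a,b,c,d,e,f,g")],
)

# One full row per value of "no": the row with the smallest primary key.
picked = conn.execute(
    """
    SELECT * FROM foba
    WHERE id IN (SELECT MIN(id) FROM foba GROUP BY "no")
    ORDER BY id
    """
).fetchall()
conn.close()

print(picked)  # [(2, 1, 'a'), (4, 2, 'c')]
```

Unlike SELECT * ... GROUP BY, this states explicitly which row represents each group, so the result does not depend on engine-specific behaviour.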