How to get distinct values in one column with multiple possible values in the other? - sql

I'm trying to work out if this is possible, let me give an example. Would be awesome if you could guide me in the right direction please.
Table = names
--------------------
Marks & Spencer
Marks & Spencer
marks & spencer
What I am trying to do is to return distinct values where I have converted all & signs and changed to upper case.
So my query is:
SELECT regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi') AS name FROM names GROUP BY names;
However, what I would like to do is also return one of the original values, it does not matter which one, but I only want 1 row to be returned, like
Result
----------------
name original
------------------------
MARKS&SPENCER Marks & Spencer
Is this possible? Because at the moment, what I get returned is this:
Result
----------------
name original
------------------------
MARKS&SPENCER Marks & Spencer
MARKS&SPENCER Marks & Spencer
MARKS&SPENCER marks & spencer
Thank you for reading, would really appreciate the help.
==========
EDIT
The query I am using to get the above result is:
SELECT names.name, T.result FROM names
INNER JOIN
(
SELECT DISTINCT regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi') AS result FROM names
) AS T
ON regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi')=T.result
GROUP BY T.result, names.name
ORDER BY T.result ASC
I am using PostgreSQL btw, which can do more than MySQL, in case that changes things.

You need to group by the new name to get only one row and, as you don't care which original name appears, aggregate it with something like min:
SELECT min(name),regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi') AS name
FROM names
GROUP BY regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi')

There is still room for improvement:
SELECT regexp_replace(upper(name), E'&(?:AMP;)+|\\+', '&', 'g') AS name
, min(name) AS min_org_name
-- , string_agg(name, ', ') AS org_names -- if you want a list of originals (pg 9.0+)
-- , array_to_string(array_agg(name), ', ') AS org_names -- for pg < 9.0
, count(*) AS ct
FROM (
SELECT *
FROM (VALUES
('Marks & Spencer')
, ('Marks & Spencer')
, ('marks & spencer')
, ('marks & speNceR + sons')
, ('marks &amp;AMP; speNceR & sons')
) AS names(name)
) name
GROUP BY 1;
Major points
Improve the regexp:
Replace &(amp;)* with &(amp;)+. The result is identical, because a bare & would only be replaced by itself.
After upper() has been applied to the original, the 'i' flag only slows execution. Upper-case the pattern instead: &(AMP;)+
Use non-capturing parentheses: (?:)
As you use an escape sequence \\+, use the proper syntax E''
Simplify GROUP BY with a positional reference; there is no need to spell the expression out twice.
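To sanity-check the improved pattern outside the database, the same normalize-then-group logic can be sketched in Python. This is only an illustration under assumptions: `normalize` and `group_names` are hypothetical helper names, Python's `re` module stands in for PostgreSQL's regexp engine, and Python's `min` compares by code point, whereas PostgreSQL's min() follows the column's collation, so the chosen original may differ.

```python
import re
from collections import defaultdict

# Same pattern as in the query: "&" followed by one or more "AMP;", or a literal "+".
PATTERN = re.compile(r"&(?:AMP;)+|\+")


def normalize(name):
    """Upper-case first, then collapse '&AMP;...' runs and '+' into '&'."""
    return PATTERN.sub("&", name.upper())


def group_names(names):
    """Map each normalized name to (smallest original, row count),
    roughly what GROUP BY 1 with min(name) and count(*) returns."""
    groups = defaultdict(list)
    for original in names:
        groups[normalize(original)].append(original)
    return {key: (min(values), len(values)) for key, values in groups.items()}


# Sample rows copied from the VALUES list in the query above.
names = [
    "Marks & Spencer",
    "Marks & Spencer",
    "marks & spencer",
    "marks & speNceR + sons",
    "marks &amp;AMP; speNceR & sons",
]

result = group_names(names)
for normalized, (original, count) in sorted(result.items()):
    print(normalized, "|", original, "|", count)
```

Each normalized name appears once, paired with a single original and the number of rows it covers.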

At present you're grouping by the original field: in GROUP BY, a name that matches both an input column and an alias in your SELECT list is taken to be the input column.
Do you want one of these?
SELECT DISTINCT
name AS original,
regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi') AS name
FROM
names
Or...
SELECT
name AS original,
regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi') AS name
FROM
names
GROUP BY
name,
regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi')
Or...
SELECT
original,
name
FROM
(
SELECT
name AS original,
regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi') AS name
FROM
names
)
AS clean_data
GROUP BY
original,
name

Related

SQL code to get the first letter of two strings

I am trying to get the first letter of a single name and full name.
For example
Name
Alex Patterson
Alex
Output should be
A P
A
Can someone help me achieve this?
For recent SQL Server:
SELECT Name, STRING_AGG (initial, ' ') as full_initial
FROM (
SELECT Name, SUBSTRING(value,1,1) as initial
FROM people
CROSS APPLY STRING_SPLIT(name, ' ')
) t
GROUP BY Name

Split string into words using Postgres

I am looking for some help in separating scientific names in my data. I want to take only the genus names and group them, but genus and species are stored together in the same column. I saw that SQL Server has a CHARINDEX function, but PostgreSQL does not. Does a function need to be created for this? If so, how would it look?
I want to change 'Mallotus philippensis' to just 'Mallotus' or to just 'philippensis'
I am currently using Postgres 11, 12.
Use SPLIT_PART:
WITH yourTable AS (
SELECT 'Mallotus philippensis'::text AS genus
)
SELECT
SPLIT_PART(genus, ' ', 1) AS genus,
SPLIT_PART(genus, ' ', 2) AS species
FROM yourTable;
Probably string_to_array will be slightly more efficient than split_part here because string splitting will be done only once for each row.
SELECT
val_arr[1] AS genus,
val_arr[2] AS species
FROM (
SELECT string_to_array(val, ' ') as val_arr
FROM (
VALUES
('aaa bbb'),
('cc dddd'),
('e fffff')
) t (val)
) tt;
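The difference between the two approaches can be sketched in Python. This is a rough illustration, not PostgreSQL's implementation: `split_part` and `genus_and_species` are hypothetical helpers that mimic the documented behaviour for non-empty input (an empty string for a missing field in split_part, NULL for an out-of-range array subscript).

```python
def split_part(text, delimiter, n):
    """Return the n-th field (1-based), or '' when there are fewer than n fields.
    The string is split again on every call, as with repeated SPLIT_PART calls."""
    fields = text.split(delimiter)
    return fields[n - 1] if 1 <= n <= len(fields) else ""


def genus_and_species(name):
    """Split once and index the result, as the string_to_array variant does.
    A missing second element becomes None, mirroring NULL for an
    out-of-range array subscript."""
    fields = name.split(" ")
    genus = fields[0]
    species = fields[1] if len(fields) > 1 else None
    return genus, species


print(split_part("Mallotus philippensis", " ", 1))
print(genus_and_species("Mallotus philippensis"))
```

Note the different results for a name with no second word: an empty string in the first style, a null in the second.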

Using string methods in a SELECT query to select up to the second space?

In an MS-Access database I'm working with, one of the tables has a field called "Name". The format of this field will generally be along the lines of "firstname surname integer", but sometimes may just be "firstname surname".
I need to select just the first name and the surname from the name field.
I've looked at using the Left function
SELECT DISTINCT LEFT([Name], x)
However since names are different lengths, this isn't going to work since there is no constant integer to use as the second parameter. Nor can it be used with
SELECT DISTINCT LEFT(InStr([Name], " "), x)
for the above reason, but also because that would split the field at the first space.
Is there a way using LEFT, TRIM, SPLIT or any other string manipulation that I can create a query to select just the first two parts of the name? I need the space included.
You can try this.
SELECT DISTINCT IIf( ( InStr( InStr([Name],' ') + 1 , [Name], ' ') > 0 ), Left( [Name], InStr(InStr([Name],' ') + 1 , [Name], ' ') - 1 ), [Name])
FROM MyTable;
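The nested InStr logic is easier to follow when written out step by step. Below is a minimal Python sketch of the same idea: find the first space, search for a second space after it, and keep everything before that second space. `first_two_words` is a hypothetical name, and this version returns the text without a trailing space.

```python
def first_two_words(name):
    """Return 'firstname surname' from 'firstname surname integer'.
    Names with fewer than two spaces are returned unchanged."""
    first_space = name.find(" ")
    if first_space == -1:
        return name  # single word, nothing to cut
    second_space = name.find(" ", first_space + 1)
    if second_space == -1:
        return name  # only two parts, already what we want
    return name[:second_space]  # stop just before the second space


for sample in ["John Smith 42", "John Smith", "John"]:
    print(repr(first_two_words(sample)))
```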

Use of substring in SQL

My query is the following:
SELECT id, category FROM table1
This returns the following rows:
ID|category
1 |{IN, SP}
2 |
3 |{VO}
Does anyone know how i can remove the first char and last char of the string in PostgreSQL, so it removes: {}?
Not sure what you mean by "foreign column", but as the column is an array, the best way to deal with that is to use array_to_string():
SELECT id, array_to_string(category, ',') as category
FROM table1;
The curly braces are not part of the stored value. This is just the string representation of an array that is used to display it.
Either using multiple REPLACE functions.
SELECT id, REPLACE(REPLACE(category, '{', ''), '}', '')
FROM table1
Or using a combination of the SUBSTRING, LEFT & LENGTH functions
SELECT id, LEFT(SUBSTRING(category, 2, 999),LENGTH(SUBSTRING(category, 2, 999)) - 1)
FROM table1
Or just SUBSTRING and LENGTH
SELECT id, SUBSTRING(category, 2, LENGTH(category)-2)
FROM table1
You could replace the {} with an empty string
SELECT id, replace(replace(category, '{', ''), '}', '') FROM table1
select id,
substring(category,charindex('{',category,len(category))+2,len(category)-2)
from table1;
select id
,left(right(category,length(category)-1),length(category)-2) category
from boo
select id
,trim(both '{}' from category)
from boo
trim(): Remove the longest string containing only the given characters (a space by default) from the start, the end, or both ends of the string.
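Python's `str.strip` treats its argument as a set of characters in much the same way, so it makes a convenient stand-in for seeing what "longest string containing only the characters" means. The sketch below is only an analogy for a text value; `strip_braces` is a hypothetical helper.

```python
def strip_braces(text):
    """Remove every leading and trailing '{' or '}' character.
    The argument is a set of characters, not a literal prefix or suffix."""
    return text.strip("{}")


print(strip_braces("{IN, SP}"))
print(strip_braces("{{a}b}"))
```

In the second call both leading braces are removed, while the brace in the middle is kept. That differs from the SUBSTRING variants, which drop exactly one character at each end.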
The syntax for the replace function in PostgreSQL is:
replace( string, from_substring, to_substring )
Parameters or Arguments
string
The source string.
from_substring
The substring to find. All occurrences of from_substring found within string are replaced with to_substring.
to_substring
The replacement substring. All occurrences of from_substring found within string are replaced with to_substring.
UPDATE dbo.table1
SET category = REPLACE(category, '{', '')
WHERE ID <=3
UPDATE dbo.table1
SET category = REPLACE(category, '}', '')
WHERE ID <=3

Changing LastName,FirstName to LastName,FirstInitial

I'm sure this is super easy, but how would I go about converting LastName,FirstName to LastName,FirstInitial?
For example changing Smith,John to Smith,J or Johnson,John to Johnson,J etc.
Thank You!
In case of LastName and FirstName columns:
select LastName,substr(FirstName,1,1)
from mytable
;
In case of a fullname saved in a single column:
select substr(fullname,1,instr(fullname || ',',',')-1) || substr(fullname,instr(fullname || ',',','),2)
from mytable
;
or
select regexp_replace (fullname,'([^,]*,?)(.).*','\1\2')
from mytable
;
Here is one way, using just "standard" instr and substr. Assuming your input is a single string in the format 'Smith,John':
select substr(fullname, 1, instr(fullname, ',')+1) from yourtable;
yourtable is the name of the table, and fullname is the name of the column.
instr(fullname, ',') finds the position of the comma within the input string (it would be 6 in 'Smith,John'); then substr takes the substring that begins at the first position (the 1 in the function call) and ends at the position calculated by instr, plus 1 (to get the first initial as well).
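The position arithmetic can be checked with a short Python sketch. Python indexes from 0 where instr counts from 1, so the slice ends at the comma's index plus 2. `last_name_with_initial` is a hypothetical name, and returning the input unchanged when there is no comma is a choice made here, not something the SQL above does.

```python
def last_name_with_initial(fullname):
    """Turn 'Smith,John' into 'Smith,J': keep everything up to the comma
    plus one more character (the first initial)."""
    comma = fullname.find(",")  # 0-based; instr would report comma + 1
    if comma == -1:
        return fullname  # assumption: leave names without a comma untouched
    return fullname[: comma + 2]


for sample in ["Smith,John", "Johnson,John"]:
    print(last_name_with_initial(sample))
```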