Group by portion of field - sql

I have a field in a PostgreSQL table, name, with this format:
JOHN^DOE
BILLY^SMITH
FIRL^GREGOIRE
NOEL^JOHN
and so on. The format is LASTNAME^FIRSTNAME. The table has ID, name, birthdate and sex fields.
How can I write a SQL statement that groups by FIRSTNAME only? I have tried several things, and I guess regexp_match could be the way, but I don't know how to write a correct regular expression for this task. Can you help me?

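I would recommend split_part(), which returns the n-th field of a delimited string. Pass 1 for the part before the '^' or 2 for the part after it: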
group by split_part(mycol, '^', 1)
Demo on DB Fiddle:
mycol | split_part
:------------ | :---------
JOHN^DOE | JOHN
BILLY^SMITH | BILLY
FIRL^GREGOIRE | FIRL
NOEL^JOHN | NOEL
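For anyone who wants to check the grouping logic outside the database, here is a minimal Python sketch of the same idea. The split_part helper below is a hypothetical stand-in written for this illustration (it only mimics positive field numbers), and 'JOHN^SMITH' is an extra row added so that at least one group has more than one member.

```python
from collections import Counter


def split_part(text: str, delimiter: str, n: int) -> str:
    """Hypothetical stand-in for PostgreSQL's split_part: 1-based field, '' if missing."""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


# Sample values from the question, plus one extra row (an assumption for illustration).
names = ["JOHN^DOE", "BILLY^SMITH", "FIRL^GREGOIRE", "NOEL^JOHN", "JOHN^SMITH"]

# Equivalent of: select split_part(name, '^', n), count(*) ... group by 1
by_first_part = Counter(split_part(name, "^", 1) for name in names)
by_second_part = Counter(split_part(name, "^", 2) for name in names)

print(by_first_part["JOHN"])    # 2
print(by_second_part["SMITH"])  # 2
```

Which field number is the first name depends on how the column is really laid out, so it is worth checking a few rows before picking 1 or 2.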

Use regexp_replace. Note that '^' needs to be escaped, since in most regexp dialects it anchors the match to the beginning of the line or the string. Extending your example with one more name, and grouping by the first field:
select
count(*)
, regexp_replace(tmp_col, '\^.*', '')
from
(values
('JOHN^DOE')
, ('BILLY^SMITH')
, ('FIRL^GREGOIRE')
, ('NOEL^JOHN')
, ('JOHN^SMITH')
)
as tmp_table(tmp_col)
group by regexp_replace(tmp_col, '\^.*', '')
;
Prints:
count | regexp_replace
-------+----------------
1 | BILLY
2 | JOHN
1 | NOEL
1 | FIRL
(4 rows)
To group by on the second field, use a similar regex:
select
count(*)
, regexp_replace(tmp_col, '.*\^', '')
from
(values
('JOHN^DOE')
, ('BILLY^SMITH')
, ('FIRL^GREGOIRE')
, ('NOEL^JOHN')
, ('JOHN^SMITH')
)
as tmp_table(tmp_col)
group by regexp_replace(tmp_col, '.*\^', '')
;
Prints:
count | regexp_replace
-------+----------------
1 | JOHN
1 | GREGOIRE
1 | DOE
2 | SMITH
(4 rows)
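To see why the escape matters, the two patterns can be tried in Python's re module, which treats '^' the same way for this purpose. This is only a sketch of the regex behaviour, not of PostgreSQL's regexp_replace itself (which replaces just the first match unless the 'g' flag is given).

```python
import re

name = "JOHN^DOE"

# Escaped caret: the literal '^' and everything after it is removed.
first_field = re.sub(r"\^.*", "", name)

# Greedy '.*' followed by a literal '^': everything up to the caret is removed.
second_field = re.sub(r".*\^", "", name)

# Unescaped caret: '^' is a start-of-string anchor, so '.*' swallows the whole value.
unescaped = re.sub(r"^.*", "", name)

print(first_field)      # JOHN
print(second_field)     # DOE
print(repr(unescaped))  # ''
```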

Related

SQL group by splitted data

I have a table that looks like this:
Name | Temperament
----------------------------------------
"Husky" | "Smart, Loyal, Cute"
"Poodle"| "Smart, Cute"
"Golden"| "Cute, Loyal, Caring, Loving"
And I want to project this data as a group by of the temperaments.
For example:
Temperament | Name | Count(Optional)
-----------------------------------------------------------
"Smart" | "Poodle", "Husky" | 2
"Loyal" | "Husky", "Golden" | 2
"Cute" | "Poodle", "Golden", "Husky" | 3
"Caring" | "Golden" | 1
"Loving" | "Golden" | 1
My problem is that I couldn't find a way to split the string in my table and manipulate this data.
It would be great if anyone can help me with this problem.
If it can't be done in pure SQL, it may help to know that I'm using Entity Framework; a solution written with it would be even better.
Thank you all.
This can be done in pure SQL. In Oracle, you can use regexp functions and a regular expression to split the delimited strings, then use string aggregation to generate the list of names per temperament:
with cte (name, temperament, temp, cnt, lvl) as (
select
name,
temperament,
regexp_substr (temperament, '[^, ]+', 1, 1) temp,
regexp_count(temperament, ',') cnt,
1 lvl
from mytable
union all
select
name,
temperament,
regexp_substr (temperament, '[^, ]+', 1, lvl + 1),
cnt,
lvl + 1
from cte
where lvl <= cnt
)
select
temp temperament,
listagg(name, ', ') within group(order by name) name,
count(*) cnt
from cte
group by temp
order by 1
Demo on DB Fiddle:
TEMPERAMENT | NAME | CNT
:---------- | :-------------------- | --:
Caring | Golden | 1
Cute | Golden, Husky, Poodle | 3
Loving | Golden | 1
Loyal | Golden, Husky | 2
Smart | Husky, Poodle | 2
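If the split-then-aggregate steps are hard to follow in the recursive query, the same two steps can be sketched in plain Python. The rows are the ones from the question; the variable names are made up for this illustration.

```python
from collections import defaultdict

# Rows from the question: (name, comma-separated temperaments).
rows = [
    ("Husky", "Smart, Loyal, Cute"),
    ("Poodle", "Smart, Cute"),
    ("Golden", "Cute, Loyal, Caring, Loving"),
]

# Step 1: split each delimited string into one (temperament, name) pair per item.
names_by_temperament = defaultdict(list)
for name, temperaments in rows:
    for temperament in temperaments.split(","):
        names_by_temperament[temperament.strip()].append(name)

# Step 2: aggregate per temperament, like listagg(...) and count(*) with group by.
report = [
    (temperament, ", ".join(sorted(names)), len(names))
    for temperament, names in sorted(names_by_temperament.items())
]

for line in report:
    print(line)
```

The first printed tuple is ('Caring', 'Golden', 1), and the rest follow the same order as the demo output above.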
If anyone needs the answer:
var result = (
    from t in (
        (from t1 in db.mytables select new { tmp = t1.TEMP1 })
        .Concat(from t2 in db.mytables select new { tmp = t2.TEMP2 })
        .Concat(from t3 in db.mytables select new { tmp = t3.TEMP3 })
    )
    group t.tmp by t.tmp into g
    select new { tmp = g.Key, cnx = g.Count() }
).ToList();
Hope it'll help someone!

Get Count Between Comma and First Character

In my table, the values appear like this in a column:
Names
----------------
Doe,John P
Woods, Adam
Hart, Keeve
Hensen,Sarah J
Is it possible to get a count of the spaces between the comma and the first character after it? Expected result:
Names |Count_of_spaces_before_next_character
-----------------|--------------------------------------
Doe,John P | 0
Woods, Adam | 1
Hart, Keeve | 5
Hensen,Sarah J | 0
Thanks for any direction, much appreciated!
You may try with the following statement:
Table:
CREATE TABLE Data (Names varchar(1000))
INSERT INTO Data
(Names)
VALUES
('Doe,John P'),
('Woods, Adam'),
('Hart, Keeve'),
('Hensen,Sarah J')
Statement:
SELECT Names, LEN(After) - LEN(LTRIM(After)) AS [Count]
FROM (
SELECT
Names,
RIGHT(Names, LEN(Names) - CHARINDEX(',', Names)) AS After
FROM Data
) t
Result:
Names Count
Doe,John P 0
Woods, Adam 1
Hart, Keeve 5
Hensen,Sarah J 0
You can remove everything up to the comma and then measure the length after trimming off the spaces:
select len(rest) - len(ltrim(rest))
from t cross apply
(values (stuff(name, 1, charindex(',', name + ','), ''))
) v(rest);
The + ',' handles the case where there is no comma in name.
Here is a db<>fiddle.
Updated
Even shorter & Cleaner
select names
,patindex('%[^ ]%',stuff(names,1,charindex(',',names),''))-1 as Count_of_spaces_before_next_character
from mytab
+----------------+---------------------------------------+
| names | Count_of_spaces_before_next_character |
+----------------+---------------------------------------+
| Doe,John P | 0 |
| Woods, Adam | 1 |
| Hart, Keeve | 5 |
| Hensen,Sarah J | 0 |
| Hello World | 0 |
+----------------+---------------------------------------+
SQL Fiddle
Building off of Gordon's answer: if you are not familiar with the table value constructor approach in cross apply, you might find this version more readable. The select inside cross apply shows how it can behave like a subquery. Also, I don't think you need to pad the name column with an additional ',', because charindex returns 0 if it doesn't find the character.
select name, len(rest) - len(ltrim(rest))
from t1
cross apply (select stuff(name, 1, charindex(',', name), '') as rest) t2
Logic: LEN(string after first comma) - LEN(LTRIM(string after first comma))
SELECT Names,
       LEN(SUBSTRING(Names, CHARINDEX(',', Names, 1) + 1, LEN(Names) - CHARINDEX(',', Names, 1)))
     - LEN(LTRIM(SUBSTRING(Names, CHARINDEX(',', Names, 1) + 1, LEN(Names) - CHARINDEX(',', Names, 1)))) AS [Count]
FROM #Table
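All of the answers in this thread use the same idea: take what follows the first comma and compare its length before and after trimming leading spaces. A small Python sketch of that logic is below. The function name is made up, and the five spaces in the 'Hart' row are rebuilt from the expected count of 5, since the spacing in the sample data above appears collapsed.

```python
def spaces_after_comma(name: str) -> int:
    """Count the spaces between the first comma and the next non-space character."""
    _, comma, rest = name.partition(",")
    if not comma:
        # No comma at all: nothing to count (a choice made for this sketch).
        return 0
    return len(rest) - len(rest.lstrip(" "))


samples = [
    "Doe,John P",
    "Woods, Adam",
    "Hart," + " " * 5 + "Keeve",  # spacing reconstructed from the expected result
    "Hensen,Sarah J",
    "Hello World",
]

counts = [spaces_after_comma(name) for name in samples]
print(counts)  # [0, 1, 5, 0, 0]
```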

Concat multiple rows PSQL

id | name | Subject | Lectured_Times | Faculty
3258132 | Chris Smith | SATS1364 | 10 | Science
3258132 | Chris Smith | ECTS4605 | 9 | Engineering
How would I go about creating the following
3258132 Chris Smith SATS1364, 10, Science + ECTS4605, 9,Engineering
where the + is just a new line. Notice how after the '+' (new line) it doesn't concat the id and name.
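Try:
-- concat_ws adds the separators; every non-aggregated column must appear in GROUP BY
SELECT concat_ws(' ', id, name, string_agg(concat_ws(', ', subject, lectured_times, faculty), chr(10)))
FROM tn
WHERE id = 3258132
GROUP BY id, name;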
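As mentioned above, string_agg is a good fit for this. Note that a newline separator has to be written as E'\n' (or chr(10)); a plain '\n' is a literal backslash followed by n when standard_conforming_strings is on, which is the default.
select
id, name, string_agg(concat_ws(', ', subject, lectured_times, faculty), E'\n')
from tn
group by id, name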
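To make the intended shape of the result concrete, here is a rough Python sketch of what grouping by (id, name) and joining the remaining columns with a newline produces. The rows are the two from the question; the variable names are made up for this illustration.

```python
# Rows from the question: (id, name, subject, lectured_times, faculty).
rows = [
    (3258132, "Chris Smith", "SATS1364", 10, "Science"),
    (3258132, "Chris Smith", "ECTS4605", 9, "Engineering"),
]

# Group the per-subject details under each (id, name) pair, like GROUP BY id, name.
details_by_student = {}
for student_id, name, subject, lectured_times, faculty in rows:
    detail = f"{subject}, {lectured_times}, {faculty}"
    details_by_student.setdefault((student_id, name), []).append(detail)

# Join the details with a newline, like string_agg(..., chr(10)),
# and put id and name only once in front.
lines = [
    f"{student_id} {name} " + "\n".join(details)
    for (student_id, name), details in details_by_student.items()
]

print(lines[0])
```

This prints the id and name once, followed by one line per subject.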

Search an SQL table that already contains wildcards?

I have a table that contains patterns for phone numbers, where x can match any digit.
+----+--------------+----------------------+
| ID | phone_number | phone_number_type_id |
+----+--------------+----------------------+
| 1 | 1234x000x | 1 |
| 2 | 87654311100x | 4 |
| 3 | x111x222x | 6 |
+----+--------------+----------------------+
Now, I might have 511132228, which matches row 3, and the query should return its type. So it's kind of like SQL wildcards, but the other way around, and I'm confused about how to achieve this.
Give this a go:
select * from my_table
where '511132228' like replace(phone_number, 'x', '_')
select *
from yourtable
where '511132228' like (replace(phone_number, 'x','_'))
Try below query:
SELECT ID,phone_number,phone_number_type_id
FROM TableName
WHERE '511132228' LIKE REPLACE(phone_number,'x','_');
Query with test data:
With TableName as
(
SELECT 3 ID, 'x111x222x' phone_number, 6 phone_number_type_id from dual
)
SELECT 'true' value_available
FROM TableName
WHERE '511132228' LIKE REPLACE(phone_number,'x','_');
The above query returns a row if a matching pattern exists and returns no rows otherwise.
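One thing to keep in mind is that '_' in LIKE matches any single character, not only a digit. If the match should be limited to digits, the same "pattern stored in the row" idea can be sketched with a regular expression. The Python below is only an illustration with made-up function names, using the three rows from the question.

```python
import re

# Rows from the question: (id, phone_number pattern, phone_number_type_id).
patterns = [
    (1, "1234x000x", 1),
    (2, "87654311100x", 4),
    (3, "x111x222x", 6),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a stored pattern into a regex where each 'x' matches exactly one digit."""
    return re.compile("".join(r"\d" if ch == "x" else re.escape(ch) for ch in pattern))


def find_type_ids(number: str) -> list:
    """Return the type ids of every stored pattern that matches the whole number."""
    return [
        type_id
        for _row_id, pattern, type_id in patterns
        if pattern_to_regex(pattern).fullmatch(number)
    ]


print(find_type_ids("511132228"))  # [6]
```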

MySQL GROUP_CONCAT headache

For performance, I need to set a limit for the GROUP_CONCAT,
and I need to know if there are rows not included.
How to do it?
EDIT
Let me provide a contrived example:
create table t(qid integer unsigned,name varchar(30));
insert into t value(1,'test1');
insert into t value(1,'test2');
insert into t value(1,'test3');
select group_concat(name separator ',')
from t
where qid=1;
+----------------------------------+
| group_concat(name separator ',') |
+----------------------------------+
| test1,test2,test3 |
+----------------------------------+
But now, I want to concatenate 2 entries at most, and I need to know whether some entries were left out of the result:
+----------------------------------+
| group_concat(name separator ',') |
+----------------------------------+
| test1,test2 |
+----------------------------------+
And I need to know that there is another entry left (in this case it's "test3").
This should do the trick:
SELECT
SUBSTRING_INDEX(group_CONCAT(name) , ',', 2) as list ,
( if(count(*) > 2 , 1 , 0)) as more
FROM
t
WHERE
qid=1
How are you going to set the limit? And what performance issues will it solve?
You can get the number of rows in a group using count(*) and compare it to the limit.
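The "truncate the list and compare the count to the limit" idea from the answers above can be sketched in a few lines of Python. The function name and signature are made up for this illustration; the values are the ones from the contrived example.

```python
def limited_concat(values, limit, separator=","):
    """Join at most `limit` values and report whether any were left out."""
    values = list(values)
    joined = separator.join(values[:limit])
    has_more = len(values) > limit  # same check as count(*) > limit in the query
    return joined, has_more


print(limited_concat(["test1", "test2", "test3"], 2))  # ('test1,test2', True)
print(limited_concat(["test1", "test2"], 2))           # ('test1,test2', False)
```

Note that in the SQL version the full list is still built before SUBSTRING_INDEX trims it, so the trimming by itself is not necessarily a performance gain.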