SQL group by split data - sql

I have a table that looks like this:
Name | Temperament
----------------------------------------
"Husky" | "Smart, Loyal, Cute"
"Poodle"| "Smart, Cute"
"Golden"| "Cute, Loyal, Caring, Loving"
I want to project this data grouped by temperament.
For example:
Temperament | Name | Count(Optional)
-----------------------------------------------------------
"Smart" | "Poodle", "Husky" | 2
"Loyal" | "Husky", "Golden" | 2
"Cute" | "Poodle", "Golden", "Husky" | 3
"Caring" | "Golden" | 1
"Loving" | "Golden" | 1
My problem is that I couldn't find a way to split the strings in my table and work with the resulting data.
It would be great if anyone could help me with this.
If this can't be done in pure SQL, it may help to know that I'm using Entity Framework; a solution written with it would be even better.
Thank you all.

This can be done in pure SQL. In Oracle, you can use a recursive CTE with regular-expression functions to split the delimited strings, then use string aggregation (listagg) to build the list of names per temperament:
with cte (name, temperament, temp, cnt, lvl) as (
select
name,
temperament,
regexp_substr (temperament, '[^, ]+', 1, 1) temp,
regexp_count(temperament, ',') cnt,
1 lvl
from mytable
union all
select
name,
temperament,
regexp_substr (temperament, '[^, ]+', 1, lvl + 1),
cnt,
lvl + 1
from cte
where lvl <= cnt
)
select
temp temperament,
listagg(name, ', ') within group(order by name) name,
count(*) cnt
from cte
group by temp
order by 1
Demo on DB Fiddle:
TEMPERAMENT | NAME | CNT
:---------- | :-------------------- | --:
Caring | Golden | 1
Cute | Golden, Husky, Poodle | 3
Loving | Golden | 1
Loyal | Golden, Husky | 2
Smart | Husky, Poodle | 2
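For anyone without an Oracle instance, here is a minimal sketch of the same split-then-aggregate idea in SQLite, run through Python's built-in sqlite3 module. It keeps the table and column names from the answer (mytable, name, temperament), but swaps in instr()/substr() for regexp_substr() (SQLite has no built-in regexp_substr) and group_concat() for listagg(). SQLite does not guarantee the order of names inside group_concat(), so treat the name lists as unordered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, temperament TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("Husky", "Smart, Loyal, Cute"),
     ("Poodle", "Smart, Cute"),
     ("Golden", "Cute, Loyal, Caring, Loving")],
)

# Recursive CTE: each step peels one item off the front of 'rest'.
# A trailing ',' is appended so the last item is also comma-terminated.
query = """
WITH RECURSIVE split(name, item, rest) AS (
    SELECT name, '', temperament || ',' FROM mytable
    UNION ALL
    SELECT name,
           trim(substr(rest, 1, instr(rest, ',') - 1)),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT item, group_concat(name, ', '), count(*)
FROM split
WHERE item <> ''          -- drop the empty anchor rows
GROUP BY item
ORDER BY item
"""
rows = conn.execute(query).fetchall()
for item, names, cnt in rows:
    print(item, "|", names, "|", cnt)
```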

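If anyone needs it, this is what I ended up with in Entity Framework. Note that it assumes the temperaments are stored in three separate columns, TEMP1, TEMP2 and TEMP3, rather than in one comma-separated string, and that it returns only the count per temperament:
var result = (
    from t in (
        (from t1 in db.mytables select new { tmp = t1.TEMP1 })
        .Concat(from t2 in db.mytables select new { tmp = t2.TEMP2 })
        .Concat(from t3 in db.mytables select new { tmp = t3.TEMP3 })
    )
    group t.tmp by t.tmp into g
    select new { tmp = g.Key, cnx = g.Count() }
).ToList();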
Hope it'll help someone!
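The LINQ query unions the three temperament columns into one sequence and then groups that sequence by value. A minimal Python sketch of the same unpivot-then-count idea follows; the rows are hypothetical (three temperament slots per dog, None for an empty slot) and only stand in for db.mytables.

```python
from collections import Counter
from itertools import chain

# Hypothetical rows in the TEMP1/TEMP2/TEMP3 layout; None marks an empty slot.
rows = [
    {"NAME": "Husky", "TEMP1": "Smart", "TEMP2": "Loyal", "TEMP3": "Cute"},
    {"NAME": "Poodle", "TEMP1": "Smart", "TEMP2": "Cute", "TEMP3": None},
    {"NAME": "Golden", "TEMP1": "Cute", "TEMP2": "Loyal", "TEMP3": "Caring"},
]

# Counterpart of the two Concat() calls: one flat sequence of column values.
all_temps = chain(
    (r["TEMP1"] for r in rows),
    (r["TEMP2"] for r in rows),
    (r["TEMP3"] for r in rows),
)

# Counterpart of "group t.tmp by t.tmp into g ... g.Count()".
# This sketch skips empty slots; the LINQ query as written has no such
# filter, so NULL values would form a group of their own there.
counts = Counter(t for t in all_temps if t is not None)
print(counts.most_common())
```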

Related

Count string occurrences within a list column - Snowflake/SQL

I have a table with a column that contains a list of strings like below:
EXAMPLE:
STRING User_ID [...]
"[""null"",""personal"",""Other""]" 2122213 ....
"[""Other"",""to_dos_and_thing""]" 2132214 ....
"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]" 4342323 ....
QUESTION:
I want to get a count of the number of times each unique string appears (the strings within the STRING column are separated by commas), but I only know how to do the following:
SELECT u.STRING, count(u.USERID) as cnt
FROM table u
group by u.STRING
order by cnt desc;
However, this doesn't work, as it only counts the number of user IDs that share a specific combination of strings.
Using the example above, the ideal output would look like this:
DESIRED OUTPUT:
STRING COUNT_Instances
"null" 1223
"personal" 543
"Other" 324
"to_dos_and_thing" 221
"getting_things_done" 146
"Work!!!!!" 22
Based on your description, here is my sample table:
create table u (user_id number, string varchar);
insert into u values
(2122213, '"[""null"",""personal"",""Other""]"'),
(2132214, '"[""Other"",""to_dos_and_thing""]"'),
(2132215, '"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"' );
I used SPLIT_TO_TABLE to split each string into rows, and then REGEXP_SUBSTR to extract the value between the doubled quotes. Here's the query and output:
select REGEXP_SUBSTR( s.VALUE, '""(.*)""', 1, 1, 'i', 1 ) extracted, count(*) from u,
lateral SPLIT_TO_TABLE( string , ',' ) s
GROUP BY extracted
order by count(*) DESC;
+---------------------+----------+
| EXTRACTED | COUNT(*) |
+---------------------+----------+
| Other | 2 |
| null | 1 |
| personal | 1 |
| to_dos_and_thing | 1 |
| getting_things_done | 1 |
| TO_dos_and_thing | 1 |
| Work!!!!! | 1 |
+---------------------+----------+
SPLIT_TO_TABLE https://docs.snowflake.com/en/sql-reference/functions/split_to_table.html
REGEXP_SUBSTR https://docs.snowflake.com/en/sql-reference/functions/regexp_substr.html
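To see what those two steps do to each stored value, here is a small Python sketch that mirrors them on the three sample strings: str.split(",") stands in for SPLIT_TO_TABLE, and re.search() with a capture group stands in for REGEXP_SUBSTR with group 1. It illustrates the logic only; it is not Snowflake code.

```python
import re
from collections import Counter

# The three sample values from the table above, exactly as stored.
raw_values = [
    '"[""null"",""personal"",""Other""]"',
    '"[""Other"",""to_dos_and_thing""]"',
    '"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"',
]

counts = Counter()
for raw in raw_values:
    # Stand-in for LATERAL SPLIT_TO_TABLE(string, ','): one piece per item.
    for piece in raw.split(","):
        # Stand-in for REGEXP_SUBSTR(value, '""(.*)""', 1, 1, 'i', 1):
        # keep capture group 1, the text between the doubled quotes.
        match = re.search(r'""(.*)""', piece)
        if match:
            counts[match.group(1)] += 1

for value, n in counts.most_common():
    print(value, n)
```

As in the Snowflake output, to_dos_and_thing and TO_dos_and_thing are counted separately. Counting the lower-cased match (LOWER() around the extracted value in Snowflake) would merge them, which seems closer to the desired output in the question.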

How do I merge and delete duplicated rows in SQL using UPDATE?

For example, I have this table:
id | code | name | type | deviceType
---+------+------+------+-----------
1 | 23 | xyz | 0 | web
2 | 23 | xyz | 0 | mobile
3 | 24 | xyzc | 0 | web
4 | 25 | xyzc | 0 | web
I want the result to be:
id | code | name | type | deviceType
---+------+------+------+-----------
1 | 23 | xyz | 0 | web&mobile
2 | 24 | xyzc | 0 | web
3 | 25 | xyzc | 0 | web
How do I do this in SQL Server using UPDATE and DELETE statements?
Any help is greatly appreciated!
I might actually suggest just leaving the original data intact, and instead creating a view here:
CREATE VIEW yourView AS
SELECT ROW_NUMBER() OVER (ORDER BY MIN(id)) AS id,
code, name, type,
STRING_AGG(deviceType, '&') WITHIN GROUP (ORDER BY id) AS deviceType
FROM yourTable
GROUP BY code, name, type;
One main reason not to actually do the update is that every time new data comes in, you might have to run that update again, over and over. Keeping the original data and querying the view when you need it may be the better option here.
Note that I assume you are using SQL Server 2017 or later. If not, STRING_AGG would have to be replaced with an uglier approach (such as FOR XML PATH), but in that case you should consider upgrading.
To do what you want, you would need two separate statements.
This updates the "first" row of each group with all the device types in the group:
update t
set t.devicetype = t1.devicetype
from mytable t
inner join (
select min(id) as id, string_agg(devicetype, '&') within group(order by id) as devicetype
from mytable
group by code, name, type
having count(*) > 1
) t1 on t1.id = t.id
This deletes everything but the first row per group:
with t as (
select row_number() over(partition by code, name, type order by id) rn
from mytable
)
delete from t where rn > 1
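If you want to try the same two-step update-then-delete locally, here is a minimal sketch in SQLite via Python's sqlite3. It is not the SQL Server syntax: it swaps in a correlated subquery for the UPDATE ... FROM join, group_concat() for STRING_AGG, and a NOT IN subquery for the deletable CTE. group_concat() does not guarantee order, so the merged value may come out as mobile&web.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, code INTEGER, "
    "name TEXT, type INTEGER, deviceType TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [(1, 23, "xyz", 0, "web"), (2, 23, "xyz", 0, "mobile"),
     (3, 24, "xyzc", 0, "web"), (4, 25, "xyzc", 0, "web")],
)

# Step 1: write all device types of a duplicated group into its lowest id.
conn.execute("""
    UPDATE mytable
    SET deviceType = (
        SELECT group_concat(m2.deviceType, '&')
        FROM mytable AS m2
        WHERE m2.code = mytable.code
          AND m2.name = mytable.name
          AND m2.type = mytable.type
    )
    WHERE id IN (
        SELECT min(id) FROM mytable
        GROUP BY code, name, type
        HAVING count(*) > 1
    )
""")

# Step 2: delete every row that is not the lowest id of its group.
conn.execute("""
    DELETE FROM mytable
    WHERE id NOT IN (SELECT min(id) FROM mytable GROUP BY code, name, type)
""")

rows = conn.execute("SELECT * FROM mytable ORDER BY id").fetchall()
print(rows)
```

As with the SQL Server statements, the surviving rows keep their original ids (1, 3, 4); renumbering them would be a separate step.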

Group by portion of field

I have a field in a PostgreSQL table, name, with this format:
JOHN^DOE
BILLY^SMITH
FIRL^GREGOIRE
NOEL^JOHN
and so on. The format is LASTNAME^FIRSTNAME. The table has ID, name, birthdate and sex fields.
How can I write a SQL statement that groups by FIRSTNAME only? I have tried several things, and I guess regexp_match could be the way, but I don't know how to write a correct regular expression for this task. Can you help me?
I would recommend split_part(), which splits the string on a delimiter and returns the n-th field (1 is the part before ^, 2 is the part after it):
group by split_part(mycol, '^', 1)
Demo on DB Fiddle:
mycol | split_part
:------------ | :---------
JOHN^DOE | JOHN
BILLY^SMITH | BILLY
FIRL^GREGOIRE | FIRL
NOEL^JOHN | NOEL
Use regexp_replace. Note that '^' needs to be escaped, since in many regexp dialects it means the beginning of the line or the string. Extending your example with one more name, and grouping by the first field:
select
count(*)
, regexp_replace(tmp_col, '\^.*', '')
from
(values
('JOHN^DOE')
, ('BILLY^SMITH')
, ('FIRL^GREGOIRE')
, ('NOEL^JOHN')
, ('JOHN^SMITH')
)
as tmp_table(tmp_col)
group by regexp_replace(tmp_col, '\^.*', '')
;
Prints:
count | regexp_replace
-------+----------------
1 | BILLY
2 | JOHN
1 | NOEL
1 | FIRL
(4 rows)
To group by on the second field, use a similar regex:
select
count(*)
, regexp_replace(tmp_col, '.*\^', '')
from
(values
('JOHN^DOE')
, ('BILLY^SMITH')
, ('FIRL^GREGOIRE')
, ('NOEL^JOHN')
, ('JOHN^SMITH')
)
as tmp_table(tmp_col)
group by regexp_replace(tmp_col, '.*\^', '')
;
Prints:
count | regexp_replace
-------+----------------
1 | JOHN
1 | GREGOIRE
1 | DOE
2 | SMITH
(4 rows)
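SQLite has neither split_part() nor regexp_replace() out of the box, but the same grouping can be sketched with instr() and substr(). The table name people below is hypothetical; the five values are the ones used in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")  # hypothetical table name
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("JOHN^DOE",), ("BILLY^SMITH",), ("FIRL^GREGOIRE",),
     ("NOEL^JOHN",), ("JOHN^SMITH",)],
)

# Part before '^' (what split_part(name, '^', 1) returns in PostgreSQL).
first = conn.execute("""
    SELECT substr(name, 1, instr(name, '^') - 1) AS part, count(*)
    FROM people
    GROUP BY part
    ORDER BY part
""").fetchall()

# Part after '^' (what split_part(name, '^', 2) returns in PostgreSQL).
second = conn.execute("""
    SELECT substr(name, instr(name, '^') + 1) AS part, count(*)
    FROM people
    GROUP BY part
    ORDER BY part
""").fetchall()

print(first)
print(second)
```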

SQL Oracle - how to find the final value for the same table?

So I have a table that looks like this:
+---------+-------------+-----------+
| Name | Name_Change | Status |
+---------+-------------+-----------+
| Rick | Brandon | Cancelled |
| Brenda | Alexa | Active |
| Brandon | TJ | Cancelled |
| TJ | Jonathan | Active |
| Randy | | Active |
+---------+-------------+-----------+
So Rick --> Brandon --> TJ --> Jonathan
So my output should be:
+------+------------+--------+
| Name | Final Name | Status |
+------+------------+--------+
| Rick | Jonathan | Active |
+------+------------+--------+
How do I code this in SQL?
TIA
You can use a recursive CTE, as in:
with
n (name, name_change, status, version) as (
select t.*, 1 from t where name = 'Rick'
union all
select n.name, t.name_change, t.status, n.version + 1
from n
join t on t.name = n.name_change
)
select *
from n
where version = (select max(version) from n);
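The same recursive CTE runs almost unchanged in SQLite, which makes it easy to try locally through Python's sqlite3. This sketch loads the question's sample rows into the table t used in the answer; the main differences are the RECURSIVE keyword and spelling out the anchor columns instead of t.*.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, name_change TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("Rick", "Brandon", "Cancelled"),
     ("Brenda", "Alexa", "Active"),
     ("Brandon", "TJ", "Cancelled"),
     ("TJ", "Jonathan", "Active"),
     ("Randy", None, "Active")],
)

# Follow Rick -> Brandon -> TJ -> Jonathan, keeping the original name and
# counting the steps; the row with the highest version is the final one.
query = """
WITH RECURSIVE n(name, name_change, status, version) AS (
    SELECT name, name_change, status, 1 FROM t WHERE name = 'Rick'
    UNION ALL
    SELECT n.name, t.name_change, t.status, n.version + 1
    FROM n
    JOIN t ON t.name = n.name_change
)
SELECT name, name_change, status
FROM n
WHERE version = (SELECT max(version) FROM n)
"""
result = conn.execute(query).fetchall()
print(result)  # [('Rick', 'Jonathan', 'Active')]
```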
This is a typical example of a hierarchical query that starts with the member Name = 'Rick'. Filter the records with the CONNECT_BY_ISLEAF pseudocolumn (WHERE CONNECT_BY_ISLEAF = 1) to keep only the deepest (leaf) element. You can also derive the whole path from the starting member (Rick) with the SYS_CONNECT_BY_PATH() function, and take the Name from the row with the shortest hierarchy path returned by that function, which is the starting row. Therefore, use:
WITH t AS
(
SELECT MAX(Name) KEEP ( DENSE_RANK FIRST
ORDER BY LENGTH(SYS_CONNECT_BY_PATH(Name_Change, ' ->')) )
OVER () AS "Name",
SYS_CONNECT_BY_PATH(Name_Change, ' ->') AS "Hierarchy Path",
Name_Change AS "Final Name",
Status AS "Status",
CONNECT_BY_ISLEAF AS cbi
FROM tab t
START WITH Name = 'Rick'
CONNECT BY PRIOR Name_Change = Name
)
SELECT "Name", "Hierarchy Path", "Final Name", "Status"
FROM t
WHERE cbi = 1;
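SQLite has no CONNECT BY, but the same leaf-plus-path result can be sketched with a recursive CTE: the path column is concatenated step by step (like SYS_CONNECT_BY_PATH with ' ->'), and a NOT EXISTS check plays the role of CONNECT_BY_ISLEAF = 1. This is an emulation under those assumptions, not Oracle's implementation; the table tab and its columns are taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (Name TEXT, Name_Change TEXT, Status TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [("Rick", "Brandon", "Cancelled"),
     ("Brenda", "Alexa", "Active"),
     ("Brandon", "TJ", "Cancelled"),
     ("TJ", "Jonathan", "Active"),
     ("Randy", None, "Active")],
)

query = """
WITH RECURSIVE chain(start_name, name, name_change, status, path) AS (
    SELECT Name, Name, Name_Change, Status, ' ->' || Name_Change
    FROM tab
    WHERE Name = 'Rick'                      -- START WITH
    UNION ALL
    SELECT c.start_name, tab.Name, tab.Name_Change, tab.Status,
           c.path || ' ->' || tab.Name_Change
    FROM chain AS c
    JOIN tab ON tab.Name = c.name_change     -- CONNECT BY PRIOR
)
SELECT start_name, path, name_change, status
FROM chain AS c
-- leaf: no row continues the chain from this name_change
WHERE NOT EXISTS (SELECT 1 FROM tab WHERE tab.Name = c.name_change)
"""
leaf_rows = conn.execute(query).fetchall()
print(leaf_rows)
```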

SQL Server stored procedure inserting duplicate rows

I have a table with a column GetDup, and I'd like to duplicate records based on the value of this column. For example, if the value in GetDup is 1, duplicate the record once; if it is 2, duplicate the record twice, and so on. The statement has to be a looping statement.
What would be a good way to write a stored procedure for this? Please help.
Input:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
What I want:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
Answer #2 After Clarification
Number Table to the Rescue!
The number table in my example (or tally table, if you prefer) is both temporary and very small. To make it bigger, just add more values to z and add more CROSS JOINs. In my opinion, a number table and a calendar table should be in every database you have; they are extremely useful.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE mytable ( Getdup int, CustomerName varchar(10), CustomerAdd varchar(20) ) ;
INSERT INTO mytable (Getdup, CustomerName, CustomerAdd)
VALUES (1,'John','123 SomeWhere'), (2,'Bob','987 SomeWhere')
;
Query 1:
;WITH z AS (
SELECT *
FROM ( VALUES(0),(0),(0),(0) ) v(x)
)
, numTable AS (
SELECT num
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY z1.x)-1 num
FROM z z1
CROSS JOIN z z2
) s1
)
SELECT t1.Getdup, t1.CustomerName, t1.CustomerAdd
FROM mytable t1
INNER JOIN numTable ON t1.getdup >= numTable.num
ORDER BY CustomerName, CustomerAdd
Results:
| Getdup | CustomerName | CustomerAdd |
|--------|--------------|---------------|
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
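The number-table join can be tried outside SQL Server too. Here is a minimal SQLite sketch through Python's sqlite3 using the same schema and query shape; it assumes SQLite 3.25 or later for ROW_NUMBER(), which current Python builds normally ship with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (Getdup INTEGER, CustomerName TEXT, CustomerAdd TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "John", "123 SomeWhere"), (2, "Bob", "987 SomeWhere")],
)

# z has 4 rows, z CROSS JOIN z has 16, numbered 0..15.
# Joining on Getdup >= num keeps Getdup + 1 numbers per row.
query = """
WITH z(x) AS (VALUES (0), (0), (0), (0)),
     numTable(num) AS (
         SELECT ROW_NUMBER() OVER (ORDER BY z1.x) - 1
         FROM z AS z1 CROSS JOIN z AS z2
     )
SELECT t1.Getdup, t1.CustomerName, t1.CustomerAdd
FROM mytable AS t1
JOIN numTable ON t1.Getdup >= numTable.num
ORDER BY t1.CustomerName, t1.CustomerAdd
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```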
--------------------------------------------------------------------------
ORIGINAL ANSWER
EDIT: After further clarification of the problem: this won't duplicate rows; it only duplicates the data in a column.
Something like one of these might work.
T-SQL
SELECT replicate(mycolumn,getdup) AS x
FROM mytable
MySQL
SELECT repeat(mycolumn,getdup) AS x
FROM mytable
Oracle SQL
SELECT rpad(mycolumn,getdup*length(mycolumn),mycolumn) AS x
FROM mytable
PostgreSQL
SELECT repeat(mycolumn,getdup) AS x
FROM mytable
If you can provide more details for exactly what you want and what you're working with, we might be able to help you better.
NOTE 2: Depending on what you need, you may need to do some math magic. You say above that if GetDup is 1, you want one duplicate. If that means your output should contain the value twice (for example abcabc for abc), then you'll want to add one in the repeat(), replicate() or rpad() calls, i.e. replicate(mycolumn,getdup+1). Oracle SQL is a little different because it uses rpad(): there it would be rpad(mycolumn,(getdup+1)*length(mycolumn),mycolumn).
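Why the Oracle rpad() form repeats a value: padding a string on the right with itself, up to getdup times its own length, leaves exactly getdup copies. The hypothetical Python helper below is a simplified emulation of RPAD's padding rule (it ignores Oracle's NULL handling and character-set details) that shows both the plain and the +1 variant.

```python
def rpad(expr: str, n: int, pad: str) -> str:
    """Simplified RPAD emulation: right-pad expr with repeated pad to length n,
    or truncate expr if it is already n characters or longer."""
    if len(expr) >= n:
        return expr[:n]
    needed = n - len(expr)
    # Repeat pad enough times, then cut to exactly the missing length.
    return expr + (pad * (needed // len(pad) + 1))[:needed]


value, getdup = "ab", 2
print(rpad(value, getdup * len(value), value))        # abab
print(rpad(value, (getdup + 1) * len(value), value))  # ababab
```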
In standard SQL you can use a recursive CTE:
with recursive cte as (
select t.dup, . . .
from t
union all
select cte.dup - 1, . . .
from cte
where cte.dup > 1
)
select *
from cte;
Of course, not all databases support recursive CTEs, and some that do (SQL Server and Oracle, for example) don't use the recursive keyword.
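A minimal runnable version of this idea in SQLite (which does use the RECURSIVE keyword), filled in with the question's columns; the table name mytable is an assumption. Instead of decrementing the duplicate column itself, this sketch counts down a separate remaining column so Getdup is carried through unchanged, and it emits Getdup + 1 copies of each row to match the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (Getdup INTEGER, CustomerName TEXT, CustomerAdd TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "John", "123 SomeWhere"), (2, "Bob", "987 SomeWhere")],
)

# 'remaining' counts down from Getdup to 0, so each row appears Getdup + 1 times.
query = """
WITH RECURSIVE cte(Getdup, CustomerName, CustomerAdd, remaining) AS (
    SELECT Getdup, CustomerName, CustomerAdd, Getdup FROM mytable
    UNION ALL
    SELECT Getdup, CustomerName, CustomerAdd, remaining - 1
    FROM cte
    WHERE remaining > 0
)
SELECT Getdup, CustomerName, CustomerAdd
FROM cte
ORDER BY CustomerName
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```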
So, you want a recursive solution:
with t as (
select Getdup, CustomerName, CustomerAdd, 0 as id
from mytable
union all
select Getdup, CustomerName, CustomerAdd, id + 1
from t
where id < getdup
)
-- id = 0 is the original row, which is already in the table,
-- so insert only the copies (id 1 .. Getdup)
insert into mytable (Getdup, CustomerName, CustomerAdd)
select Getdup, CustomerName, CustomerAdd
from t
where id > 0
order by getdup
option (maxrecursion 0);