SQL Group By with Text Transformation

I'm trying to do some transformations on a large data set that I'm working on and was hoping for a bit of assistance on a particular grouping. I have a series of records that follow a pattern similar to below:
Language     Full Name     Customer ID
--------------------------------------
English      John Smith    12222
French       John Smith    12222
Spanish      John Smith    12222
English      Karen Wong    55999
Cantonese    Karen Wong    55999
I need the Full Name and Customer ID not to be repeated, which on its own would just mean using DISTINCT. However, one oddity in the requirement is that all the different languages need to be preserved and squashed into the output, so the resulting data needs to look like this:
Languages Spoken            Full Name     Customer ID
----------------------------------------------------
English, French, Spanish    John Smith    12222
English, Cantonese          Karen Wong    55999
Sounded like a simple thing but I guess I'm not a big SQL guru and keep getting funny results. Any help would be much appreciated :)

If you're using SQL Server 2017 (or later) or Azure SQL, then you can just use STRING_AGG:
https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-2017
For everything else (covers solutions from SQL Server 2005 and on):
Simulating group_concat MySQL function in Microsoft SQL Server 2005?
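
As a rough illustration of the grouping itself, here is a minimal sketch using Python's built-in sqlite3 module. SQLite's group_concat stands in for STRING_AGG here (on SQL Server 2017+ the corresponding expression would be STRING_AGG(Language, ', ')). The table name customer_language and the column names are hypothetical, chosen to mirror the sample data in the question.

```python
import sqlite3

# Hypothetical table and column names, modelled on the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_language "
    "(language TEXT, full_name TEXT, customer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_language VALUES (?, ?, ?)",
    [
        ("English", "John Smith", 12222),
        ("French", "John Smith", 12222),
        ("Spanish", "John Smith", 12222),
        ("English", "Karen Wong", 55999),
        ("Cantonese", "Karen Wong", 55999),
    ],
)

# One output row per customer; group_concat joins the languages of each group.
# SQLite does not guarantee the order of the concatenated values.
rows = conn.execute(
    """
    SELECT group_concat(language, ', ') AS languages_spoken,
           full_name,
           customer_id
    FROM customer_language
    GROUP BY full_name, customer_id
    ORDER BY customer_id
    """
).fetchall()

for languages_spoken, full_name, customer_id in rows:
    print(languages_spoken, full_name, customer_id)
```

The important part is that the name and ID go in GROUP BY while the language column goes through a string aggregate, so DISTINCT is not needed at all.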

Related

How to merge crosstab info down in Access?

Not sure if this is possible but I'm hoping it is. I am using MS Access for Estate Planning for work. I've gotten to the point where I've got the data to look like this:
File_Name | Executor_1 | Executor_2 | Beneficiary_1 | Beneficiary_2
Hill, Hank | Peggy Hill | | Peggy Hill |
Hill, Hank | | Bobby Hill | | Bobby Hill
Gribble, Dale | Nancy Gribble | | |
Gribble, Dale | | Joseph Gribble | Joseph Gribble |
Gribble, Dale | | | | John Redcorn
But I need it to look like this:
File_Name | Executor_1 | Executor_2 | Beneficiary_1 | Beneficiary_2
Hill, Hank | Peggy Hill | Bobby Hill | Peggy Hill | Bobby Hill
Gribble, Dale | Nancy Gribble | Joseph Gribble | Joseph Gribble | John Redcorn
I need it in the latter format so I can use Mail Merge in Word and create the Will. Can anyone provide any guidance? We don't currently use any software for estate planning, so anything beats having to go into Word manually and retype everything. Please let me know if more information is needed.
Edit:
This is what the SQL looks like:
TRANSFORM Last(File_Roles.File_Name) AS LastOfFile_Name
SELECT File_Roles.Executor_1,
File_Roles.Executor_2,
File_Roles.Beneficiary_1,
File_Roles.Beneficiary_2,
File_Roles.Trustee_1,
File_Roles.Trustee_2,
File_Roles.Guardian_1,
File_Roles.Guardian_2,
File_Roles.ATTY_IF_1,
File_Roles.ATTY_IF_2,
File_Roles.HCATTY_IF_1,
File_Roles.HCATTY_IF_2
FROM File_Roles
GROUP BY File_Roles.Executor_1,
File_Roles.Executor_2,
File_Roles.Beneficiary_1,
File_Roles.Beneficiary_2,
File_Roles.Trustee_1,
File_Roles.Trustee_2,
File_Roles.Guardian_1,
File_Roles.Guardian_2,
File_Roles.ATTY_IF_1,
File_Roles.ATTY_IF_2,
File_Roles.HCATTY_IF_1,
File_Roles.HCATTY_IF_2
PIVOT File_Roles.File_Name;
You can use GROUP BY and MAX():
SELECT
    t.File_Name,
    MAX(t.Executor_1) AS Executor_1,
    MAX(t.Executor_2) AS Executor_2,
    MAX(t.Beneficiary_1) AS Beneficiary_1,
    MAX(t.Beneficiary_2) AS Beneficiary_2
FROM table_or_query t
GROUP BY t.File_Name;
But maybe you can fix your original crosstab query to do this right away. Probably you are doing the grouping wrong. You must group by File_Name in the crosstab query and set the Total row of the value field to Max (it is difficult to be more specific without seeing the query).
GROUP BY File_Name means that one row is created for each distinct value of File_Name.
Since this will merge several rows into one, you must specify an aggregate function for every column in the SELECT list not listed in the GROUP BY clause. This can be e.g. SUM(), AVG(), MIN() or MAX(). See SQL Aggregate Functions for a complete list. Because aggregate functions ignore Null values, MAX() will return the one non-Null value from the merged rows.
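
To see the merge in action, here is a small sketch using Python's built-in sqlite3 module rather than Access. It assumes the empty cells in the question's data are Null, and the table name file_roles is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_roles (File_Name TEXT, Executor_1 TEXT, "
    "Executor_2 TEXT, Beneficiary_1 TEXT, Beneficiary_2 TEXT)"
)
# Assumption: each source row fills only some role columns; the rest are NULL.
conn.executemany(
    "INSERT INTO file_roles VALUES (?, ?, ?, ?, ?)",
    [
        ("Hill, Hank", "Peggy Hill", None, "Peggy Hill", None),
        ("Hill, Hank", None, "Bobby Hill", None, "Bobby Hill"),
        ("Gribble, Dale", "Nancy Gribble", None, None, None),
        ("Gribble, Dale", None, "Joseph Gribble", "Joseph Gribble", None),
        ("Gribble, Dale", None, None, None, "John Redcorn"),
    ],
)

# MAX() skips NULLs, so each group collapses to its single non-NULL value.
merged = conn.execute(
    """
    SELECT File_Name,
           MAX(Executor_1), MAX(Executor_2),
           MAX(Beneficiary_1), MAX(Beneficiary_2)
    FROM file_roles
    GROUP BY File_Name
    ORDER BY File_Name
    """
).fetchall()

for row in merged:
    print(row)
```

This only works cleanly when each column has at most one non-Null value per File_Name; if there were two different executors in Executor_1 for the same file, MAX() would silently keep just one of them.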

Oracle SQL to extract human names, like NLTK

Is there a SQL/regex or some advanced function that can extract human names from a column that has around 2 million rows? Something like NLTK.
Below is my sample. I want to extract only the human names, i.e. filter out the companies (marked with **). I have 2 million rows like these, mixing real companies and human names.
KAREN STRAUSS
KASEY NEMELKA
KATHLEEN MCMAHON
KATHRYN HOCKADAY
KATHRYN HOLAHAN
KATIE NELSON
**KATHERINE KACENA CONSULTING**
KATHY ATKINS
KATRINA GRANT
KATY DYER
KATY G TACKES
**KAUFFMAN S TRANSPORT LLC**
KATHERINE MAGPANTAY
KATHERINE VENTURA
KATHRYN RUANO
JORGE DANIEL MUSCIA
JOSE MANUEL ROSALES SANTEROS
JOSE MANUEL VILAS CARR
JOSEPH H WILNER
This is too long for a comment. Human names are too variable. After all, is "John Deere" the name of a company? Or is it the name of a person? Or both?
You can construct special-purpose logic for your data. It will take time to develop, but something like this would flag the company rows:
regexp_like(lower(name), '\s(consult|llc)')
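
For prototyping the keyword list before running it against 2 million rows, the same idea can be sketched in Python with the standard re module. The pattern is the one from the answer; the helper name looks_like_company and the short sample list are only illustrative.

```python
import re

# Pattern from the answer: whitespace followed by a company keyword.
# The keyword list is an assumption and would need extending for real data.
COMPANY_PATTERN = re.compile(r"\s(consult|llc)")


def looks_like_company(name: str) -> bool:
    """Return True when the lowercased name contains a company keyword."""
    return COMPANY_PATTERN.search(name.lower()) is not None


names = [
    "KAREN STRAUSS",
    "KATHERINE KACENA CONSULTING",
    "KAUFFMAN S TRANSPORT LLC",
    "KATY G TACKES",
]

# Keep only the rows that do not match any company keyword.
people = [name for name in names if not looks_like_company(name)]
print(people)
```

A keyword filter like this only removes obvious companies; ambiguous names such as "John Deere" still need a human decision.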

The difference between those two SQL queries

I have converted a SQL query written by another senior developer, who is also the group lead; I am new to programming. He wrote a query that reads a collection of rows from the DB by sending an array of parameters. For example:
SELECT [LastName],[FirstMidName],[EnrollmentDate]
FROM [ContosoUniversity1].[dbo].[Student]
WHERE ([LastName] ='Alexander' AND [FirstMidName] = 'Carson')
OR ([LastName] ='Justice' AND [FirstMidName] = 'Peggy')
However, I was given an assignment to improve the security of the query. I made some changes to apply sqlParameter() to the query. The query was rewritten as:
SELECT [LastName],[FirstMidName],[EnrollmentDate]
FROM [ContosoUniversity1].[dbo].[Student]
WHERE [LastName] IN ('Alexander','Justice')
AND [FirstMidName] IN ('Carson','Peggy')
So basically it follows the WHERE ... IN pattern, which lets me carry on with my other tasks. These two queries give the same result, but he insisted that mine was logically bad. I have a very hard time understanding his explanation, and I doubt whether I did something wrong in converting this query. Could anyone share an opinion?
The first query will only bring in the exact pairings of names. Imagine if someone else called Carson Justice went to the school. Your query would bring him in; the senior's query would not.
I.e.
LastName | FirstMidName
Alexander | Carson
Justice | Peggy
Alexander | Peggy
Justice | Carson
The senior's query would return Carson Alexander and Peggy Justice.
Your query would return all 4 names (Carson Alexander, Peggy Justice, Peggy Alexander, Carson Justice).
Yours is logically wrong because it will bring in Peggy Alexander. The first query won't bring her in. And that doesn't seem like the intent of the exercise.
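
The difference is easy to reproduce. The sketch below uses Python's built-in sqlite3 module with a hypothetical in-memory Student table, and also shows one way the paired OR form could still be parameterized (one placeholder pair per name), which was the original security goal. It is an illustration, not the senior developer's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (LastName TEXT, FirstMidName TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?)",
    [
        ("Alexander", "Carson"),
        ("Justice", "Peggy"),
        ("Alexander", "Peggy"),
        ("Justice", "Carson"),
    ],
)

# Paired form: one "(LastName = ? AND FirstMidName = ?)" group per person.
pairs = [("Alexander", "Carson"), ("Justice", "Peggy")]
where = " OR ".join("(LastName = ? AND FirstMidName = ?)" for _ in pairs)
params = [value for pair in pairs for value in pair]
paired = conn.execute(
    f"SELECT LastName, FirstMidName FROM Student WHERE {where} "
    "ORDER BY LastName, FirstMidName",
    params,
).fetchall()

# IN form: matches every combination of the listed last and first names.
crossed = conn.execute(
    "SELECT LastName, FirstMidName FROM Student "
    "WHERE LastName IN (?, ?) AND FirstMidName IN (?, ?) "
    "ORDER BY LastName, FirstMidName",
    ["Alexander", "Justice", "Carson", "Peggy"],
).fetchall()

print(paired)
print(crossed)
```

Only the placeholders are generated from the list length; the values themselves are always passed as parameters, so the paired query stays as safe as the IN version while keeping the original logic.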

How to create an SQL query that takes values on different rows and joins them together on the same row (variable number of joins required)

Not sure how to phrase the question really, but here's what I have and here's what I need.
I've got a table that looks like this:
Name             K%        Year
Albert Pujols    7.90%     2006
Albert Pujols    8.50%     2007
Albert Pujols    8.40%     2008
Albert Pujols    9.10%     2009
Albert Pujols    10.90%    2010
Albert Pujols    8.90%     2011
Albert Pujols    11.30%    2012
I'd like to create a query that will produce output that looks like:
Albert Pujols 7.90% 8.50% 8.40% 9.10% 10.90% 8.90% 11.30%
While this particular player has 7 rows, I can't guarantee that every player will have that many.
Is this even possible?
I'd appreciate any help. I wouldn't have any trouble if I knew that there were only 2 rows (inner join on name)... but the variable number of rows is throwing me for a loop.
Edit:
Peter Wooster's answer of pivoting was the solution I needed.
If you are doing this so you can print a report, best thing to do is use a report writer that supports cross tabs. Jasper Reports does.
SQL is not really good at this kind of stuff. There are tricky ways you could get it to give you the results, but they'd be pretty silly.
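
If the reshaping is done in application code instead of a report writer, a minimal sketch in Python could look like the following. It assumes the rows have already been fetched as (name, K%, year) tuples; the variable names are illustrative only.

```python
from collections import defaultdict

# Rows as they might come back from the query: (name, k_pct, year).
rows = [
    ("Albert Pujols", "7.90%", 2006),
    ("Albert Pujols", "8.50%", 2007),
    ("Albert Pujols", "8.40%", 2008),
    ("Albert Pujols", "9.10%", 2009),
    ("Albert Pujols", "10.90%", 2010),
    ("Albert Pujols", "8.90%", 2011),
    ("Albert Pujols", "11.30%", 2012),
]

# Collect each player's values in year order; the number of values per
# player is not fixed, so a list per name handles any row count.
by_player = defaultdict(list)
for name, k_pct, year in sorted(rows, key=lambda r: (r[0], r[2])):
    by_player[name].append(k_pct)

for name, values in by_player.items():
    print(name, *values)
```

Because each player simply gets a list of whatever length the data provides, the variable number of rows stops being a problem.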

VBA Access SQL - field within LIKE operator

Can I use a table column within a Like operator? I've created an example,
TableA
Names                    Location
Albert Smith Senior      Aberdeen
John Lee                 London
Michael Rogers Junior    Newcastle
Mary Roberts             Edinburgh
TableB
Names
Albert Smith
John Lee
Michael Rogers
I want to do a query such as:
SELECT TableA.Location
into NewTable
FROM TableA
WHERE TableA.Names Like '*[TableB.Names]*';
In this case, there would be no match for Mary Roberts, Edinburgh but the first three locations would be returned.
Is it possible to put a column into a like statement?
If not does anyone have any ideas how I could do this?
Hope you can help
PS I can't use an actual asterisk since this is removed and the text italicised, also I have read about using % instead but this has not worked for me.
You can join the two tables and use LIKE within the JOIN clause:
SELECT TableA.Location
into NewTable
FROM TableA
INNER JOIN TableB ON TableA.Names LIKE TableB.Names & '*';
Honestly, I had no idea that you can do this in Access before I tried it just now :-)
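
For anyone who wants to try the same join outside Access, here is a small sketch with Python's built-in sqlite3 module. Note the dialect differences, which are assumptions about your target engine: SQLite concatenates with || instead of &, and uses % rather than * as the LIKE wildcard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (Names TEXT, Location TEXT)")
conn.execute("CREATE TABLE TableB (Names TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [
        ("Albert Smith Senior", "Aberdeen"),
        ("John Lee", "London"),
        ("Michael Rogers Junior", "Newcastle"),
        ("Mary Roberts", "Edinburgh"),
    ],
)
conn.executemany(
    "INSERT INTO TableB VALUES (?)",
    [("Albert Smith",), ("John Lee",), ("Michael Rogers",)],
)

# Join on a LIKE pattern built from the TableB column: a TableA name matches
# when it starts with a TableB name.
locations = [
    row[0]
    for row in conn.execute(
        """
        SELECT TableA.Location
        FROM TableA
        INNER JOIN TableB ON TableA.Names LIKE TableB.Names || '%'
        ORDER BY TableA.Location
        """
    )
]
print(locations)
```

Mary Roberts has no prefix match in TableB, so Edinburgh is left out, as the question expects.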