Get following letter from given - sql

I have a table with company names. Some companies have different locations and different legal names but they should be reported under the same Group Code. The Code is made up using the first five letters.
Company          GroupCode
DEEZER FRANCE    DEEZE
DEEZER SPAIN     DEEZE
DEEZER ALGERIA   DEEZE
So far so good. Now I’m adding a different company which starts with the same letters but should get a new Group Code.
A new Code should be assigned if the company name does not contain a word that is part of a company name already having a GroupCode. In this case DEEZER is the key word that determines association with GroupCode DEEZE.
Rule is that the code should then use the first four letters + the fifth letter next in the alphabet. If this code also exists then use the first four letters + the fifth letter next but one in the alphabet. The required result would look like:
Company          GroupCode   Status
DEEZER FRANCE    DEEZE       EXISTING
DEEZER SPAIN     DEEZE       EXISTING
DEEZER ALGERIA   DEEZE       EXISTING
DEEZEMBER        DEEZF       CREATED
DEEZEMAL         DEEZG       CREATED
So what I need to figure out is the next „unused“ letter. How can I achieve this with SQL Server 2008 R2?

Try this:
;with cte as
(select max(groupcode) maxcode
from yourtable
where left(groupcode,4) = left(@companyname,4))
insert into yourtable (company, groupcode, [status])
select @companyname,
-- no code with this prefix yet: take the first five letters; otherwise bump the last letter of the highest code
case when maxcode is null then left(@companyname,5) else left(maxcode,4) + char(ascii(right(maxcode,1))+1) end,
'created'
from cte
Assumption: Your input is taking the company name as a parameter from somewhere, presumably the front end.
The idea is to use the ASCII function to get the ASCII code of the last letter, increment it by 1, and convert it back to the corresponding character with the CHAR function.
Be warned, however, that this is definitely not the best solution. For instance, I have not implemented bounds checking to ensure range between A and Z. In fact, I would suggest that you handle this in application code rather than at DB level.
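Following the suggestion to handle this in application code, here is a minimal Python sketch that includes the missing bounds check. The function name `next_group_code` and the `existing_codes` set are hypothetical. It assumes the taken codes have already been loaded from the table, and that the decision "this company does not belong to an existing group" was made beforehand.

```python
def next_group_code(company, existing_codes):
    """Return the first free code: first four letters plus a fifth letter.

    Starts at the name's own fifth letter and walks forward through the
    alphabet until an unused code is found, stopping at Z.
    """
    name = company.upper()
    prefix, fifth = name[:4], name[4:5]
    if len(prefix) < 4 or not ("A" <= fifth <= "Z"):
        raise ValueError("company name needs five characters with a fifth letter in A-Z")
    for code_point in range(ord(fifth), ord("Z") + 1):
        candidate = prefix + chr(code_point)
        if candidate not in existing_codes:
            return candidate
    # Bounds check: every letter from the fifth one up to Z is taken.
    raise ValueError("no unused letter left for prefix " + prefix)


taken = {"DEEZE"}
code_december = next_group_code("DEEZEMBER", taken)   # DEEZE is taken
taken.add(code_december)
code_deezemal = next_group_code("DEEZEMAL", taken)    # DEEZE and DEEZF are taken
```

Unlike MAX(groupcode) + 1, this picks the first gap, so it also behaves sensibly if a code in the middle of the range was never assigned.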

Related

SQL Server - grab part of string after a value sequence

I have a table called Note with a column named Notes.
Notes
------
{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Arial;}}
\viewkind4\uc1\pard\lang1033\fs20 called insurance company they are waiting to hear from the claimant's attorney
It has font info at the beginning which I don't need. I've created a new column named final_notes and would like to grab everything after the "fs" plus two characters. The final result would be
final_notes
-----------
called insurance company they are waiting to hear from the claimant's attorney
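We use PATINDEX to find the first occurrence of fs followed by two digits.
NULLIF turns a result of 0 (pattern not found) into NULL, so a row without the marker yields NULL instead of a wrong substring. Adding 4 skips past the four characters of the match, and LTRIM drops the space that follows it.
LTRIM(SUBSTRING(Notes, NULLIF(PATINDEX('%fs[0-9][0-9]%', Notes), 0) + 4, LEN(Notes)))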
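The same idea can be sketched outside the database. This Python version is only an illustration of the logic (find `fs` plus two digits, return what follows, give back nothing when the marker is absent). The function name `strip_rtf_header` is made up for this example.

```python
import re


def strip_rtf_header(note):
    """Return the text after the first fsNN marker, or None if there is none."""
    match = re.search(r"fs[0-9][0-9]", note)
    if match is None:
        return None  # mirrors the NULLIF(..., 0) branch
    return note[match.end():].lstrip()


sample = (
    r"{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Arial;}}" + "\n"
    r"\viewkind4\uc1\pard\lang1033\fs20 called insurance company they are "
    r"waiting to hear from the claimant's attorney"
)
final_notes = strip_rtf_header(sample)
```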

Using LIKE clause when formats are different

I was given a patient list with names and I am trying to match with a list already in our database and am having troubles given the format of the name field in the patient list. This list is taken from a web form so people can input names however they want so it does not match up well.
WEBFORM_NAME     PATIENT_NAME
JOHN SMITH       SMITH,JOHN L
SHANNON BROWN    BROWN,SHANNON MARIE
Is there a way to use a LIKE clause in an instance like this? All I really need is the LIKE clause to find the first name because I have joined on phone number and email address already. My issue is when households have the same phone number and email address (spouses for example) I just want to return the right person in the household.
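Not sure if the first name is all you need, but here are expressions that extract it from each column. Appending a space before searching keeps CHARINDEX from returning 0 when a name has no space after the first name.
SELECT LEFT(WEBFORM_NAME, CHARINDEX(' ', WEBFORM_NAME + ' ') - 1) AS FirstName1,
SUBSTRING(PATIENT_NAME, CHARINDEX(',', PATIENT_NAME) + 1,
CHARINDEX(' ', PATIENT_NAME + ' ', CHARINDEX(',', PATIENT_NAME)) - CHARINDEX(',', PATIENT_NAME) - 1) AS FirstName2
FROM yourTable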
The assumption here seems to be that the name typed into the web form has the format <First Name> [<optional middle Name(s)>] <Last Name>, whereas the names stored in the table have the form <Last Name>,<First Name> [<optional middle Name(s)>]. It's not an exact science, but since other criteria (email, phone, etc.) have already been matched, the best case is to also match on first and last name:
select *
from webform w, patient p
where
-- extract just the last name and match that
regexp_like(p.name,
'^' ||
regexp_extract(w.name,
'([^[:space:],][[:space:],])*([^[:space:],]+)', 1, 2))
and -- extract the first name and match it
regexp_like(p.name,
',[[:space:]]*' ||
regexp_extract(w.name, '(^[^[:space:],]+)'))
Since the web form is free-form user input, it's hard to handle abbreviated middle names and other variations. The query above therefore matches on first name and last name only, which, in addition to the matching you are already doing, should help.
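To make the first-name/last-name comparison concrete, here is a rough Python sketch under the same format assumptions (web form: first name first, last name last; patient list: last name, comma, first name, optional middle names). All three function names are hypothetical, and multi-word last names on the web form side are not handled.

```python
def webform_parts(name):
    """'JOHN SMITH' -> ('JOHN', 'SMITH'); middle names are ignored."""
    words = name.upper().split()
    return (words[0], words[-1]) if words else ("", "")


def patient_parts(name):
    """'SMITH,JOHN L' -> ('JOHN', 'SMITH')."""
    last, _, rest = name.upper().partition(",")
    given = rest.split()
    return (given[0] if given else "", last.strip())


def same_person(webform_name, patient_name):
    """True when first and last name agree between the two formats."""
    first, last = webform_parts(webform_name)
    return bool(first) and (first, last) == patient_parts(patient_name)


match_john = same_person("JOHN SMITH", "SMITH,JOHN L")
match_shannon = same_person("SHANNON BROWN", "BROWN,SHANNON MARIE")
match_spouse = same_person("JANE SMITH", "SMITH,JOHN L")
```

Applied after the join on phone number and email, a check like this separates household members who share contact details.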

BigQuery: grouping by similar strings for a large dataset

I have a table of invoice data with over 100k unique invoices and several thousand unique company names associated with them.
I'm trying to group these company names into more general groups to understand how many invoices they're responsible for, how often they receive them, etc.
Currently, I'm using the following code to identify unique company names:
SELECT DISTINCT(company_name)
FROM invoice_data
ORDER BY company_name
The problem is that this only gives me exact matches, when it's obvious that many string values in company_name are similar. For example: McDonalds Paddington, McDonlads Oxford Square, McDonalds Peckham, etc.
How can I make my GROUP BY statement more general?
The issue isn't always as simple as the example above; occasionally there is just an extra space or a PTY/LTD suffix that throws off a GROUP BY match.
EDIT
To give an example of what I'm looking for, I'd be looking to turn the following:
company_name
----------------------
Jim's Pizza Paddington
Jim's Pizza Oxford
McDonald's Peckham
McDonald's Victoria
----------------------
And be able to group by their company name rather than exclusively with an exact string match.
Have you tried using the Soundex function?
SELECT
SOUNDEX(name) AS code,
MAX(name) AS sample_name,
COUNT(name) AS records
FROM (
SELECT "Jim's Pizza Paddington" AS name
UNION ALL
SELECT "Jim's Pizza Oxford" AS name
UNION ALL
SELECT "McDonald's Peckham" AS name
UNION ALL
SELECT "McDonald's Victoria" AS name)
GROUP BY
1
ORDER BY
1
You can then use the Soundex code to create groupings, and either split the string (or use a similar function) to pull out the part that matches the name group, or use a window function to pull back one occurrence as the name string. It's not perfect, but it means you do not need to pull the data into other tools with advanced language recognition.
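To see why these names fall into the same group, here is a small Python sketch of the classic American Soundex rules (first letter plus three digits). It is an illustration only, not BigQuery's implementation, and results may differ from the SOUNDEX function in edge cases.

```python
from collections import defaultdict

# Standard Soundex digit classes; vowels, H, W and Y carry no digit.
_CODES = {}
for _digit, _letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                         "4": "L", "5": "MN", "6": "R"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate equal codes
            prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


groups = defaultdict(list)
for company in ["Jim's Pizza Paddington", "Jim's Pizza Oxford",
                "McDonald's Peckham", "McDonald's Victoria"]:
    groups[soundex(company)].append(company)
```

Because only the first few consonants contribute to the code, the location suffix never reaches it. The flip side is that unrelated names with a similar start will collide as well.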

What are the cases whereby EXCEPT and DISTINCT are different?

Looking through my notes for an introduction to databases course, I stumbled upon a case that I do not understand (the difference between EXCEPT and DISTINCT).
My notes say:
The two queries below have the same results, but this will not be the case in general.
First query:
Select c.first_name,c.last_name,c.email
FROM customers as c
WHERE c.country = 'Japan'
EXCEPT
Select c.first_name,c.last_name,c.email
FROM customers as c
WHERE c.last_name LIKE 'D%';
Second query:
Select DISTINCT c.first_name,c.last_name,c.email
FROM customers as c
WHERE c.country = 'Japan' AND NOT (c.last_name LIKE 'D%');
Could anyone provide me some insights as to what are cases whereby the results would differ?
Number 1 selects first, last & email from customers who are from Japan and whose last names do not start with D.
Number 2 selects first, last & email, where no two records have all 3 fields the same, where the customers are from Singapore and their last names do not begin with D.
I suppose I can imagine a table where these would yield the same results, but I don't think it would ever appear except in very contrived circumstances.
Joe Smith jsmith@abc.com Japan
Joe Smith jsmith@abc.com Singapore
Would be one of them. Both queries would yield Joe Smith jsmith@abc.com. Another case would be if no-one was from either country or everyone's last name started with D, then they would both yield nothing.
None of this is tested, and the EXCEPT statement is something I've read about but never had occasion to use.
The first is looking at Japan, the second at Singapore, so I don't see why these would generally -- or specifically -- return the same data.
Even if the countries were the same you have another issue with NULL values. So, if your data looks like this:
first_name   last_name   email   country
xxx          NULL        a       Japan
Your first query would return the row. The second would not.
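The NULL case can be reproduced with a throwaway database. The sketch below uses Python's built-in `sqlite3` module rather than the asker's database, on the assumption that SQLite follows the same three-valued logic for LIKE and NOT here. It runs both queries from the question against that single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (first_name TEXT, last_name TEXT, email TEXT, country TEXT)"
)
conn.execute("INSERT INTO customers VALUES ('xxx', NULL, 'a', 'Japan')")

# Query 1: the row is in the first set; NULL LIKE 'D%' is unknown,
# so the row is not in the second set and EXCEPT keeps it.
except_rows = conn.execute("""
    SELECT c.first_name, c.last_name, c.email
    FROM customers AS c
    WHERE c.country = 'Japan'
    EXCEPT
    SELECT c.first_name, c.last_name, c.email
    FROM customers AS c
    WHERE c.last_name LIKE 'D%'
""").fetchall()

# Query 2: NOT (NULL LIKE 'D%') is still unknown, so WHERE filters the row out.
distinct_rows = conn.execute("""
    SELECT DISTINCT c.first_name, c.last_name, c.email
    FROM customers AS c
    WHERE c.country = 'Japan' AND NOT (c.last_name LIKE 'D%')
""").fetchall()
```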

SQL Server - copy data across tables , but copy the data only when it match with a specific column name

For example, I have these two tables:
dbo.fc_states
StateId   Name
6316      Alberta
6317      British Columbia
and dbo.fc_Query
Name             StatesName         StateId
Abbotsford       Quebec             NULL
Abee             Alberta            NULL
100 Mile House   British Columbia   NULL
Pretty straightforward: how do I copy the StateId over from fc_states to fc_Query, matching it on StatesName? The result would be
Name             StatesName         StateId
Abee             Alberta            6316
100 Mile House   British Columbia   6317
Thanks. Both state name columns are of type text.
How about:
update fc_Query set StateId =
(select StateId from fc_states where fc_states.Name = fc_Query.StatesName)
That should give you the result you're looking for.
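A quick way to check what the correlated subquery does to rows without a match is to run it on the sample data. This sketch uses Python's `sqlite3` module as a stand-in for SQL Server, so it only illustrates the logic; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fc_states (StateId INTEGER, Name TEXT);
    CREATE TABLE fc_Query (Name TEXT, StatesName TEXT, StateId INTEGER);
    INSERT INTO fc_states VALUES (6316, 'Alberta'), (6317, 'British Columbia');
    INSERT INTO fc_Query VALUES
        ('Abbotsford', 'Quebec', NULL),
        ('Abee', 'Alberta', NULL),
        ('100 Mile House', 'British Columbia', NULL);
""")

# Same statement as in the answer: each row looks up its own StatesName.
conn.execute("""
    UPDATE fc_Query SET StateId =
        (SELECT StateId FROM fc_states WHERE fc_states.Name = fc_Query.StatesName)
""")

state_by_place = dict(conn.execute("SELECT Name, StateId FROM fc_Query"))
```

Rows with no matching state (Quebec here) end up with NULL, because the subquery returns no row for them. An existing non-NULL StateId on such a row would be overwritten with NULL too.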
This is a different approach from Eddie's. I like MERGE for updates that aren't dead simple (and I wouldn't consider yours dead simple). So if you're bored or curious, also try
WITH stateIds AS
(SELECT name, MAX(stateID) AS stID
FROM fc_states
GROUP BY name)
MERGE fc_Query
USING stateIds
ON stateIds.name = fc_Query.StatesName
WHEN MATCHED THEN UPDATE
SET fc_Query.StateId = CONVERT(int, stID)
;
The first part, from WITH to the closing parenthesis after GROUP BY name, is a CTE. It creates a table-like thing, named stateIds, that is usable as a table in the immediately following part of the query and is guaranteed to have only one row per state name. The MERGE then looks for any row in fc_Query with a matching name and, if there's a match, sets StateId as you want. You can make a small edit if you don't want to overwrite existing StateId values in fc_Query:
WITH stateIds AS
(SELECT name, MAX(stateID) AS stID
FROM fc_states
GROUP BY name)
MERGE fc_Query
USING stateIds
ON stateIds.name = fc_Query.StatesName
AND fc_Query.StateId IS NULL
WHEN MATCHED THEN UPDATE
SET fc_Query.StateId = CONVERT(int, stID)
;
And you can have it do something different to rows that don't match. So I think MERGE is good for a lot of applications. You need a semicolon at the end of MERGE statements, and you have to guarantee that there will only be one match or zero matches in the source (that is "stateids", my CTE) for each row in the target; if there's more than one match some horrible thing happens, Satan wins or the US economy falters, I'm not sure what, just never let it happen.
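The one-match-at-most requirement can be checked before merging directly from a raw table. The sketch below, again using Python's `sqlite3` as a stand-in, lists source names that occur more than once. The helper name `duplicate_source_names` and the extra row with StateId 6399 are invented for the demonstration.

```python
import sqlite3


def duplicate_source_names(conn):
    """Names that appear more than once in fc_states and would match a target row twice."""
    rows = conn.execute(
        "SELECT Name FROM fc_states GROUP BY Name HAVING COUNT(*) > 1 ORDER BY Name"
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fc_states (StateId INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO fc_states VALUES (?, ?)",
    [(6316, "Alberta"), (6317, "British Columbia"), (6399, "Alberta")],  # 6399 is made up
)
dupes = duplicate_source_names(conn)
```

An empty list means every target row can match at most one source row. The GROUP BY in the CTE above gives the same guarantee by construction.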