Hey all I'm trying to parse out any duplicates in an access database. I want the database to be usable for the access illiterate and therefore I am trying to set up queries that can be run without any understanding of the program.
My database is setup where there are occasionally special characters attached to the entries in the Name field. I am interested in checking for duplicate entries based of the fields field1 and name. How can I include the counts for entries with special characters with their non-special character counterparts? Is this possible in a single step or do I need to add a step where I clean the data first?
Currently my code (shown below) only returns counts for entries not including special characters.
SELECT
table.[field1],
table.[Name],
Count(table.[Name]) AS [CountOfName]
FROM
table
GROUP BY
table.[field1],
table.[Name]
HAVING
(((table.[Name]) Like "*") AND ((Count(table.[Name]))>1));
I have tried adding a leading space to the Like statement (Like " *"), but that returns zero results.
P.S. I have also tried the Replace statement to replace the special characters, but that did not work.
field1 ##
1234567
1234567
4567890
4567890
name ##
brian
brian
ted
ted‡
Results
field1
1234567
name
brian
countofname
2
GROUP BY works by placing rows into groups where values are the same. So, when you run your query on your data and it groups by field1 and name, you are saying "Put these records into groups where they share a common field1 and name value". If you want 4567890, ted and 4567890, ted† to show in the same group, and thus have a count of 2, both the field1 and name have to be the same.
If you only have one or two possible special characters on the end of the names, you could potentially use Replace() or Substring() to remove all the special chars from the end of the names, but remember you must also GROUP BY the new expression you create; you can't GROUP BY the original name field or you won't get your desired count. You could also create another column that contains a sanitized name, one without any special character on the end.
I don't have Access installed, but something like this should do it:
SELECT
table.[field1],
Replace(table.[Name], "†", "") AS Name,
Count(table.[Name]) AS [CountOfName]
FROM
table
GROUP BY
table.[field1],
Replace(table.[Name], "†", "")
Related
I am trying to filter a column in a dataframe for all the entries which contain a certain substring:
Let's say for the above, i want to filter for all the rows that contain ada in their name. So that would be rows 5 and 6 only. I would write something like this:
SELECT * from MailList where 'ada' in email
But this doesn't work.
If this is a SQL question, you can use LIKE operator as:
... where email like '%ada%'
Note that I also added the % on it that means it will look for the ada string anywhere in the name.
as in:
blaadable#gmail.com --> matches
adable#gmail.com --> matches
ble#gmail.com.ada --> matches
You can also use it (%) only as a prefix and/or suffix. As in
... email like '%ada' or email like 'ada%' It will work accordingly.
]
The name column has both first and last name in one column.
Look at the 'rep' column. How do I filter only those rep names where the last name is starting with 'K'?
The way that table is defined won't allow you to do that query reliably--particularly if you have international names in the table. Not all names come with the given name first and the family name second. In Asia, for example, it is quite common to write names in the opposite order.
That said, assuming you have all western names, it is possible to get the information you need--but your indexes won't be able to help you. It will be slower than if your data had been broken out properly.
SELECT rep,
RTRIM(LEFT(LTRIM(RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep))), CHARINDEX(' ', LTRIM(RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep)))) - 1)) as family_name
WHERE family_name LIKE 'K%'
So what's going on in that query is some string manipulation. The dialect up there is SQL Server, so you'll have to refer to your vendor's string manipulation function. This picks the second word, and assumes the family name is the second word.
LEFT(str, num) takes the number of characters calculated from the left of the string
RIGHT(str, num) takes the number of characters calculated from the right of the string
CHARINDEX(char, str) finds the first index of a character
So you are getting the RIGHT side of the string where the count is the length of the string minus the first instance of a space character. Then we are getting the LEFT side of the remaining string the same way. Essentially if you had a name with 3 parts, this will always pick the second one.
You could probably do this with SUBSTRING(str, start, end), but you do need to calculate where that is precisely, using only the string itself.
Hopefully you can see where there are all kinds of edge cases where this could fail:
There are a couple records with a middle name
The family name is recorded first
Some records have a title (Mr., Lord, Dr.)
It would be better if you could separate the name into different columns and then the query would be trivial--and you have the benefit of your indexes as well.
Your other option is to create a stored procedure, and do the calculations a bit more precisely and in a way that is easier to read.
Assuming that the name is <firstname> <lastname> you can use:
where rep like '% K%'
I'm trying to exclude all names with numbers in it, but still include certain special characters like:
()-'
So far, i've got this query:
SELECT FirstName FROM Client WHERE FirstName NOT LIKE '%[^0-9]%';
This excludes most names with numbers in it, however, a few like the following are still coming through (not being excluded):
How do i exclude these from above, but still keep rows that look like the ones below:
Thank you in advance!
I'm trying to exclude all names with numbers in it.
Assuming you are using SQL Server, your logic has too many negations. You want:
WHERE FirstName NOT LIKE '%[0-9]%';
This translates to FirstName not having any number in it.
If you want names that are only alphabetical and spaces, then use:
WHERE FirstName NOT LIKE '%[^a-zA-Z ]%';
Your version simply states that the string has no character which is not like a digit -- that is, it consists only of digits.
Do you know how to remove below kind of Characters at once on a query ?
Note : .I'm retrieving this data from the Access app and put only the valid data into the SQL.
select DISTINCT ltrim(rtrim(a.Company)) from [Legacy].[dbo].[Attorney] as a
This column is company name column.I need to keep string characters only.But I need to remove numbers only rows,numbers and characters rows,NULL,Empty and all other +,-.
Based on your extremely vague "rules" I am going to make a guess.
Maybe something like this will be somewhere close.
select DISTINCT ltrim(rtrim(a.Company))
from [Legacy].[dbo].[Attorney] as a
where LEN(ltrim(rtrim(a.Company))) > 1
and IsNumeric(a.Company) = 0
This will exclude entries that are not at least 2 characters and can't be converted to a number.
This should select the rows you want to delete:
where company not like '%[a-zA-Z]%' and -- has at least one vowel
company like '%[^ a-zA-Z0-9.&]%' -- has a not-allowed character
The list of allowed characters in the second expression may not be complete.
If this works, then you can easily adapt it for a delete statement.
I have the following data in my "Street_Address_1" column:
123 Main Street
Using Postgresql, how would I write a query to update the "Street_Name" column in my Address table? In other words, "Street_Name" is blank and I'd like to populate it with the street name value contained in the "Street_Address_1" column.
From what I can tell, I would want to use the "regexp_matches" string method. Unfortunately, I haven't had much luck.
NOTE: You can assume that all addresses are in a "StreetNumber StreetName StreetType" format.
If you just want to take Street_Address_1 and strip out any leading numbers, you can do this:
UPDATE table
SET street_name = regexp_replace(street_address_1, '^[0-9]* ','','');
This takes the value in street_address_1 and replaces any leading string of numbers (plus a single space) with an empty string (the fourth parameter is for optional regex flags like "g" (global) and "i" (case-insensitive)).
This version allows things like "1212 15th Street" to work properly.
Something like...:
UPDATE table
SET Street_Name = substring(Street_Address_1 FROM '^[0-9]+ ([a-zAZ]+) ')
See relevant section from PGSQL 8.3.7 docs, the substring form is detailed shortly after the start of the section.