I need to list records where column_A and column_B match; both are of char(10) type. The issue is that column_A has trailing spaces in it, so even though both columns hold the same data, they don't match.
I tried LTRIM(RTRIM(column_A)), REPLACE(column_A, CHAR(32), ''), etc., but none of them works.
Could someone suggest another method to solve this issue?
Note: the methods mentioned above work fine in the SELECT clause.
Thanks in advance.
This should work:
WHERE LTRIM(RTRIM(COLA)) = LTRIM(RTRIM(COLB))
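A minimal sketch of the idea, using SQLite from Python (table and data are invented for illustration; SQLite happens to provide LTRIM/RTRIM too):

```python
import sqlite3

# Hypothetical table: COLA has trailing spaces, COLB does not.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (COLA TEXT, COLB TEXT)")
con.execute("INSERT INTO t VALUES ('abc   ', 'abc'), ('xyz', 'other')")

# A plain equality check misses the padded value...
plain = con.execute("SELECT COUNT(*) FROM t WHERE COLA = COLB").fetchone()[0]

# ...but trimming BOTH columns inside the WHERE clause finds the match.
trimmed = con.execute(
    "SELECT COUNT(*) FROM t WHERE LTRIM(RTRIM(COLA)) = LTRIM(RTRIM(COLB))"
).fetchone()[0]

print(plain, trimmed)  # 0 1
```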
Related
I have a decent size table with 20+ columns and almost 3 million rows, and I want to select all the unique values from a single column and enter them into a newly created table. After research, I have attempted this using both the DISTINCT and GROUP BY approaches, but both are producing duplicate values. Furthermore, I've set the new column in the new table as a Primary Key, which I don't believe should allow duplicate values.
I'm definitely a beginner here, so perhaps there is something simple I'm doing wrong. Here's some sample code:
Using GROUP BY
INSERT INTO ResourceGroups(ResourceGroup)
SELECT ResourceGroup
FROM dbo.UsageData
WHERE ResourceGroup IS NOT NULL
GROUP BY ResourceGroup
Using DISTINCT
INSERT INTO ResourceGroups(ResourceGroup)
SELECT DISTINCT ResourceGroup
FROM dbo.UsageData
WHERE ResourceGroup IS NOT NULL
The results of both of these seem to be the same. Here's a sample of the first few rows:
ResourceGroup
aiiInnovationTime
Api-Default-Central-US
Api-Default-Central-US
applicationinsights
applicationinsights
azurefunctions-southeastasia
azurefunctions-southeastasia
The query resulted in 532 rows, and it clearly eliminated some duplicates after consolidating down from 3 million. However, there are obviously still duplicates here, and they also successfully inserted into a primary key column which shouldn't allow duplicates. Furthermore, there's a blank row despite my attempt to filter out NULLs (though maybe there's a space or something there?). Needless to say, I'm a bit confused about what I'm doing wrong, and would greatly appreciate any help that this community can provide!
Both the queries you mentioned should give you unique results; the anomaly, however, is most likely due to leading or trailing white-space.
Depending on the DB you can modify the query, e.g.:
For Oracle: you can use the TRIM function, which removes both leading and trailing white-space.
SQL Server doesn't have a single function; you have to use LTRIM and RTRIM to remove spaces.
Assuming there are spaces or stray CR/LF characters in your data:
SELECT DISTINCT
       REPLACE(REPLACE(REPLACE(REPLACE(ResourceGroup, CHAR(13) + CHAR(10), ' '),
               CHAR(10) + CHAR(13), ' '), CHAR(13), ' '), CHAR(10), ' ')
FROM dbo.UsageData
WHERE ResourceGroup IS NOT NULL
  AND LTRIM(RTRIM(ResourceGroup)) <> ''
LTRIM trims leading spaces and RTRIM trims trailing spaces. Try this out and see if it works!
As Chetan Ranpariya mentioned, check for leading and trailing spaces. The way you do it depends on the SQL engine. For instance, in MySQL you can use TRIM: https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_trim.
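A small sketch of the failure mode, using SQLite from Python (table and column names borrowed from the question, data invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE UsageData (ResourceGroup TEXT)")
con.executemany(
    "INSERT INTO UsageData VALUES (?)",
    [("applicationinsights",),
     ("applicationinsights ",),  # trailing space: a "duplicate"
     ("   ",)],                  # all spaces: passes IS NOT NULL!
)

# DISTINCT alone keeps the padded copy AND the all-space row.
raw = con.execute(
    "SELECT DISTINCT ResourceGroup FROM UsageData WHERE ResourceGroup IS NOT NULL"
).fetchall()

# Trimming first collapses the duplicates and lets us filter blanks explicitly.
clean = con.execute(
    "SELECT DISTINCT TRIM(ResourceGroup) FROM UsageData "
    "WHERE ResourceGroup IS NOT NULL AND TRIM(ResourceGroup) <> ''"
).fetchall()

print(len(raw), len(clean))  # 3 1
```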
I have a SQL table with name and surname; surname is in its own column. The problem is with users with two surnames, because sometimes they add more than one space between the surnames, and then I have to find and fix them manually.
How to find these surnames with more than one space in between?
If you want to find records which have more than one space then you can use the following trick:
SELECT surname
FROM yourTable
WHERE LENGTH(REPLACE(surname, ' ', '')) < LENGTH(surname) - 1
This query will detect two or more spaces in the surname column. If you want to also do an UPDATE this is possible, but it would be fairly database-specific, and you did not specify your database as of the time I wrote this answer.
First remove those extra spaces. Then add a constraint that makes sure it doesn't happen again:
alter table tablename add constraint surname_verify check (surname not like '%  %') -- two spaces in the pattern
(Or, even better, have a trigger making sure the surnames are properly spaced, cased etc.)
How to remove extra spaces? Depends on the dbms.
You can perhaps do something like:
update tablename set surname = replace(surname, '  ', ' ') -- two spaces replaced by one
where surname like '%  %'
The where clause isn't needed, but makes the transaction much smaller.
(Iterate to get rid of triple or more spaces.) Or use regexp_replace.
Even tidier:
select string = replace(replace(replace('select   single        spaces',' ','<>'),'><',''),'<>',' ')
Output:
select single spaces
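The same three-step replace can be sketched in plain Python to show why it collapses any run of spaces to a single one (it assumes '<' and '>' don't otherwise occur in the data):

```python
def squeeze_spaces(s: str) -> str:
    # 1) mark every space, 2) delete the back-to-back '><' pairs,
    # 3) turn each surviving '<>' marker back into one space
    return s.replace(" ", "<>").replace("><", "").replace("<>", " ")

print(squeeze_spaces("select   single        spaces"))  # select single spaces
```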
I am trying to find out whether a certain column requires the TRIM function.
How can I find out if this column has records with white space either before or after the actual data?
You can check it using the TRIM function itself, not the most efficient but accurate:
Select *
From TableA
Where MyColumn <> TRIM(MyColumn)
Though if you're checking then turning around to trim anyway, you probably want to just do it in the first place, like this:
Select TRIM(MyColumn) as TrimmedMyColumn
From TableA
A quick and dirty way
WHERE LENGTH(TRIM(COL1)) <> LENGTH(COL1)
So why can't you use the following to find the leading spaces? I've been able to identify the records with leading spaces this way and using '% ' to find the trailing spaces.
SELECT mycolumn
FROM my_table
WHERE mycolumn LIKE ' %'
I've also used the following to remove both the leading and trailing spaces
Update My_table set Mycolumn = TRIM(Mycolumn)
which seems to work just fine.
You could use regular expressions in Oracle.
Example:
select * from your_table
where regexp_like(your_column, '^[ ]+.*')
or regexp_like(your_column, '.*[ ]+$')
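The same two patterns can be sketched with Python's re module (sample data invented):

```python
import re

rows = ["  leading", "trailing  ", "clean", " both "]

# Mirrors regexp_like(your_column, '^[ ]+.*') OR regexp_like(your_column, '.*[ ]+$')
flagged = [s for s in rows
           if re.match(r"^[ ]+.*", s) or re.search(r".*[ ]+$", s)]

print(flagged)  # ['  leading', 'trailing  ', ' both ']
```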
select data1, length(data1)-length(replace(data1,' ','')) from t;
The following query retrieves rows where the field T$DSCA ends in a trailing non-breaking space (hex A0):
SELECT * from TABLE_NAME A WHERE RAWTOHEX(SUBSTR(A.T$DSCA, LENGTH(T$DSCA),1)) ='A0' AND TRIM(T$DSCA) is not null;
I came across a weird situation when trying to count the number of rows that DO NOT have varchar values specified by a select statement. Ok, that sounds confusing even to me, so let me give you an example:
Let's say I have a field "MyField" in "SomeTable" and I want to count in how many rows MyField values do not belong to a domain defined by the values of "MyOtherField" in "SomeOtherTable".
In other words, suppose that I have MyOtherField = {1, 2, 3}, I wanna count in how many rows MyField value are not 1, 2 or 3. For that, I'd use the following query:
SELECT COUNT(*) FROM SomeTable
WHERE ([MyField] NOT IN (SELECT MyOtherField FROM SomeOtherTable))
And it works like a charm. Notice, however, that MyField and MyOtherField are int typed. If I try to run the exact same query on varchar typed fields, it returns 0 even though I know that there are wrong values, I put them there! And if I try to count the opposite (how many rows ARE in the domain, as opposed to how many are not) simply by suppressing the "NOT" in the query above... Well, THAT works! ¬¬
Yeah, there must be tons of workarounds to this but I'd like to know why it doesn't work the way it should. Furthermore, I can't simply alter the whole query as most of it is built inside a C# code and basically the only part I have freedom to change that won't have an impact in any other part of the software is the select statement that corresponds to the domain (whatever comes in the NOT IN clause). I hope I made myself clear and someone out there could help me out.
Thanks in advance.
For NOT IN, it is always false if the subquery returns a NULL value. The accepted answer to this question elegantly describes why.
The NULLability of a column value is independent of the datatype, too: most likely your varchar column has NULL values.
To deal with this, use NOT EXISTS. For non-null values it behaves the same as NOT IN, so it's compatible:
SELECT COUNT(*) FROM SomeTable S1
WHERE NOT EXISTS (SELECT * FROM SomeOtherTable S2 WHERE S1.[MyField] = S2.MyOtherField)
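A small sketch of both behaviors, using SQLite from Python (table names borrowed from the question, data invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SomeTable (MyField TEXT)")
con.execute("CREATE TABLE SomeOtherTable (MyOtherField TEXT)")
con.executemany("INSERT INTO SomeTable VALUES (?)", [("a",), ("bad",)])
con.executemany("INSERT INTO SomeOtherTable VALUES (?)",
                [("a",), ("b",), (None,)])  # a single NULL poisons NOT IN

# NOT IN: 'bad' = NULL is UNKNOWN, so the whole NOT IN is never true.
not_in = con.execute(
    "SELECT COUNT(*) FROM SomeTable "
    "WHERE MyField NOT IN (SELECT MyOtherField FROM SomeOtherTable)"
).fetchone()[0]

# NOT EXISTS ignores the NULL row and counts 'bad' as expected.
not_exists = con.execute(
    "SELECT COUNT(*) FROM SomeTable s1 WHERE NOT EXISTS "
    "(SELECT * FROM SomeOtherTable s2 WHERE s1.MyField = s2.MyOtherField)"
).fetchone()[0]

print(not_in, not_exists)  # 0 1
```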
gbn has a more complete answer, but I can't be bothered to remember all that. Instead I have the religious habit of filtering nulls out of my IN clauses:
SELECT COUNT(*)
FROM SomeTable
WHERE [MyField] NOT IN (
SELECT MyOtherField FROM SomeOtherTable
WHERE MyOtherField is not null
)
Specifically, Sql Server 2005/T-Sql. I have a field that is mostly a series of two characters, and they're all supposed to be upper case but there's some legacy data that predates the current DB/System, and I need to figure out which records are in violation of the upper casing covenant.
I thought this would work:
select * from tbl where ascii(field1) <> ascii(upper(field1))
And indeed it returned me a handful of records. They've since been corrected, and now that query returns no data. But I've got people telling me there is still mixed case data in the DB, and I just found an example: 'FS' and 'Fs' are both reporting the same ascii value.
Why is this approach flawed? What is a better way to go about this, or how can I fix this approach to work correctly?
If all the data should have been in upper case, just do an update:
update tbl
set field1 = upper(field1)
But to answer your original question, this query should give you the results that you expect:
select * from tbl
where field1 COLLATE Latin1_General_CS_AS <> upper(field1)
Edit: just noticed that the suggestion to use COLLATE was also posted by Ian
ASCII is only comparing the first letter. You'd have to compare each letter, or change the database collation to be case sensitive.
You can change collation on an entire database level, or just on one column for a specific query, so:
SELECT myColumn
FROM myTable
WHERE myColumn COLLATE Latin1_General_CS_AS <> upper(myColumn)
The ascii() function will only return the ascii number for the first character in an expression if you pass it a multiple character string. To do the comparison you want you need to look at individual characters, not entire fields.
The ASCII() function returns only the ASCII code value of the leftmost character of a character expression. Use UPPER() instead.
This might work:
select * from tbl
where cast(field1 as varbinary(256)) <> cast(upper(field1) as varbinary(256))
The methods described at Case sensitive search in SQL Server queries might be useful to you.
According to the documentation for ASCII(), it only returns the leftmost character.
I think you're going about this wrong.
You could simply:
select * from tbl where field1 <> upper(field1)
if the collation rules were set correctly, so why not fix the collation rules? If you can't change them permanently, try:
select * from tbl where
(field1 collate Latin1_General_CS_AS)
<> upper(field1 collate Latin1_General_CS_AS)
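A quick sketch using SQLite from Python: SQLite's default BINARY collation is already case-sensitive, so the plain comparison plays the role of the CS collation here (table and data invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (field1 TEXT)")
con.executemany("INSERT INTO tbl VALUES (?)", [("FS",), ("Fs",), ("fs",)])

# Under a case-sensitive collation, only the properly upper-cased
# rows satisfy field1 = upper(field1); the rest are violations.
rows = con.execute(
    "SELECT field1 FROM tbl WHERE field1 <> upper(field1)"
).fetchall()

print(rows)  # [('Fs',), ('fs',)]
```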