Match a set of characters from one table against the records of another table - sql

I have two tables (T-SQL):
tblInvalidCharactersList tblMonthsRecords
+-----------+-----------+ +--------+-------------+
| CodePoint | Character | | RecRef | Name |
+-----------+-----------+ +--------+-------------+
| 38 | & | | 21 | Firs> name |
+-----------+-----------+ +--------+-------------+
| 35 | # | | 89 | #Second name|
+-----------+-----------+ +--------+-------------+
| 62 | > | | 321 | Third n«me |
+-----------+-----------+ +--------+-------------+
| 171 | « | | 381 | Fourth name |
+-----------+-----------+ +--------+-------------+
I want to find the records in tblMonthsRecords whose Name contains at least one character from the Character column of tblInvalidCharactersList.
I tried:
SELECT
[RecRef],
[Name]
FROM [tblMonthsRecords]
WHERE [Name] IN (SELECT Character FROM [tblInvalidCharactersList])
but it returns no results at all.
I also tried NOT IN and, as you might guess, it returns all records.
I am not hardcoding the character list in a LIKE clause because I want the list to be updated dynamically.
You can think of tblInvalidCharactersList as a character "black list".

I would use exists:
select mr.*
from tblMonthsRecords mr
where exists (select 1
from tblInvalidCharactersList icl
where charindex(icl.Character, mr.name) > 0
);
You don't seem to care which invalid character was found, so checking that at least one match exists is enough.

IN looks for an exact match: it compares the whole Name value with each character, so it does not search for the character inside Name.
Use the LIKE operator instead:
select Distinct a.*
from tblMonthsRecords a
join tblInvalidCharactersList b
on a.Name like '%' + b.Character + '%'
Another way is to use charindex as the join condition:
on charindex(b.Character, a.Name) > 0
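The difference between the IN attempt and the EXISTS approach can be reproduced on the question's sample rows. This is only a sketch: it runs against an in-memory SQLite database through Python's sqlite3 module (an assumption, since the question targets T-SQL), and SQLite's instr(string, substring) stands in for charindex(substring, string).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblInvalidCharactersList (CodePoint INTEGER, [Character] TEXT);
    CREATE TABLE tblMonthsRecords (RecRef INTEGER, Name TEXT);
    INSERT INTO tblMonthsRecords VALUES
        (21, 'Firs> name'), (89, '#Second name'),
        (321, 'Third n«me'), (381, 'Fourth name');
""")
# Fill the black list from the four characters shown in the question.
conn.executemany(
    "INSERT INTO tblInvalidCharactersList VALUES (?, ?)",
    [(ord(c), c) for c in "&#>«"],
)

# IN compares the whole Name with each character, so nothing matches.
in_rows = conn.execute("""
    SELECT RecRef FROM tblMonthsRecords
    WHERE Name IN (SELECT [Character] FROM tblInvalidCharactersList)
""").fetchall()

# EXISTS with a substring search finds names containing any listed character.
exists_rows = conn.execute("""
    SELECT mr.RecRef
    FROM tblMonthsRecords mr
    WHERE EXISTS (SELECT 1
                  FROM tblInvalidCharactersList icl
                  WHERE instr(mr.Name, icl.[Character]) > 0)
    ORDER BY mr.RecRef
""").fetchall()

print(in_rows)      # no rows
print(exists_rows)  # records 21, 89 and 321
```

Because the black list is read from the table at query time, adding a row to tblInvalidCharactersList changes the result without touching the query.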

Related

Filtering records not containing numbers

I have a table that stores numbers as strings. Ideally each value should be a 10-digit number in string format, but the table contains many junk values. I want to select the records that do not match that ideal format.
Below is the sample table that I have:
+---------------+--------+----------------------------------+
| ID_UID | Length | ##Comment |
+---------------+--------+----------------------------------+
| +112323456705 | 13 | Contains special character |
| 4323456432 | 11 | Contains blank |
| 3423122334 | 10 | As expected, 10 character number |
| 6758439239 | 10 | As expected, 10 character number |
| 58_4323129 | 10 | Contains special character |
| 4567$%6790 | 10 | Contains special character |
| 45684938901 | 11 | Is 11 characters |
| 4568 38901 | 10 | Contains blank |
+---------------+--------+----------------------------------+
Expected Output:
+---------------+--------+----------------------------+
| ID_UID | Length | ##Comment |
+---------------+--------+----------------------------+
| +112323456705 | 13 | Contains special character |
| 4323456432 | 11 | Contains blank |
| 58_4323129 | 10 | Contains special character |
| 4567$%6790 | 10 | Contains special character |
| 45684938901 | 11 | Is 11 characters |
| 4568 38901 | 10 | Contains blank |
+---------------+--------+----------------------------+
Basically, I want all the records whose value is not a 10-digit number.
I have tried the query below:
SELECT *
FROM t1
WHERE ID_UID LIKE '%[^0-9]%'
But this does not return any records.
I have created a fiddle for this.
P.S. The columns length and ##Comment are illustrative in nature.
You want RLIKE not LIKE:
SELECT *
FROM t1
WHERE ID_UID RLIKE '[^0-9]'
Note that % is a LIKE wildcard, not a regular expression wildcard. Also, regular expressions match the pattern anywhere it occurs, so no wildcards are needed for the beginning and end of the string.
If you want to find values that are not ten digits, then be explicit:
SELECT *
FROM t1
WHERE ID_UID NOT RLIKE '^[0-9]{10}$'
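The two patterns select different rows, which a quick check outside the database makes visible. The sketch below uses Python's re module as a stand-in for RLIKE (an assumption; regex dialects differ in details, but these two patterns behave the same way in both) and a subset of the sample values from the question.

```python
import re

ids = ["+112323456705", "3423122334", "58_4323129", "45684938901", "4568 38901"]

# Like RLIKE '[^0-9]': true when a non-digit occurs anywhere in the value.
has_non_digit = [v for v in ids if re.search(r"[^0-9]", v)]

# Like NOT RLIKE '^[0-9]{10}$': true unless the whole value is exactly ten digits.
not_ten_digits = [v for v in ids if not re.search(r"^[0-9]{10}$", v)]

print(has_non_digit)
print(not_ten_digits)
```

The first pattern misses 45684938901, which contains only digits but has eleven of them; the anchored pattern catches it.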

How to get every first result of select query in loop iterating over array of strings?

I have a table (e.g. Users) in a PostgreSQL database. It is relatively large (ca. 4 GB of data), and I would like to get a result consisting of a single row for each select query. The query should be executed for each element in an array of strings (a couple dozen elements).
Example single select for one element:
SELECT * FROM "Users" WHERE "Surname" LIKE 'Smith%' LIMIT 1
The value between ' and %' should be an element of the input array.
EDIT: It doesn't matter to me whether I get record no. 1 or 2 for LIKE 'Smith%'.
How can I achieve this?
I tried to append the query results to an array variable within a FOREACH loop, but with no success.
Example source table:
| Id | Name | Surname |
|---- |-------- |---------- |
| 1 | John | Smiths |
| 2 | Adam | Smith |
| 3 | George | Kowalsky |
| 4 | George | Kowalsky |
| 5 | Susan | Connor |
| 6 | Clare | Connory |
| 7 | Susan | Connor |
And for ['Smith', 'Connor'] the output is:
| Id | Name | Surname |
|----|-------|---------|
| 1 | John | Smiths |
| 5 | Susan | Connor |
In Postgres you can use the ANY operator to compare a single value to all values of an array. This also works together with the LIKE operator.
SELECT *
FROM "Users"
WHERE "Surname" like ANY (array['Smith%', 'Connor%'])
Note that LIKE is case sensitive. If you don't want that, you can use ILIKE.
This will show you the logic. Syntax is up to you.
where 1 = 2
start of loop
or surname like 'Array Element goes here%'
end of loop
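The question asks for one row per array element, while a single filter over all prefixes returns every matching row. The per-element "first match" logic can be sketched in plain Python on the sample table; first_match_per_prefix is a hypothetical helper name, and in-memory tuples stand in for the "Users" table.

```python
users = [
    (1, "John", "Smiths"),
    (2, "Adam", "Smith"),
    (3, "George", "Kowalsky"),
    (4, "George", "Kowalsky"),
    (5, "Susan", "Connor"),
    (6, "Clare", "Connory"),
    (7, "Susan", "Connor"),
]

def first_match_per_prefix(rows, prefixes):
    """Return the first row whose surname starts with each prefix, if any."""
    result = []
    for prefix in prefixes:
        # Equivalent in spirit to: WHERE "Surname" LIKE prefix || '%' LIMIT 1
        match = next((r for r in rows if r[2].startswith(prefix)), None)
        if match is not None:
            result.append(match)
    return result

picked = first_match_per_prefix(users, ["Smith", "Connor"])
print(picked)
```

On the sample data this keeps rows 1 and 5, matching the output shown in the question.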

Efficient Classification of records by common letters in impala

I have a table in Impala (TBL1) that contains names sharing a varying number of leading letters. The table contains about 3M records. I would like to add a new attribute to the table that assigns a class to each group of names with the same leading letters. This works the same way as DENSE_RANK, but with a dynamic number of leading letters. The number of shared leading letters should not be less than p=3 (p is a parameter).
Here is an example for the table and the required results:
| ID | Attr1 | New_Attr1 | Some more attribute...
+-------+--------------+-------------+-----------------------
| 1 | ZXA-12 | 1 |
| 2 | YL3300 | 2 |
| 3 | ZXA-123 | 1 |
| 4 | YL3400 | 2 |
| 5 | YL3-aaa | 2 |
| 6 | TSA 789 | 3 |
...
Does this do what you want?
select t.*,
dense_rank() over (order by strleft(attr1, 3)) as newcol
from . . .;
The "3" is your parameter.
As a note: In your example, you seem to have assigned the new value in reverse alphabetic order. Hence, you would want desc for the order by.
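To see what the window function computes, here is a small Python sketch of a dense rank over the first p characters, in descending prefix order. The function name dense_rank_by_prefix is hypothetical, and the values are the sample rows from the question.

```python
def dense_rank_by_prefix(values, p=3, descending=True):
    """Assign the same class to values sharing their first p characters."""
    # Distinct prefixes, sorted; consecutive ranks without gaps (dense rank).
    prefixes = sorted({v[:p] for v in values}, reverse=descending)
    rank = {prefix: i for i, prefix in enumerate(prefixes, start=1)}
    return [rank[v[:p]] for v in values]

attr1 = ["ZXA-12", "YL3300", "ZXA-123", "YL3400", "YL3-aaa", "TSA 789"]
classes = dense_rank_by_prefix(attr1)
print(classes)
```

With descending order the classes come out as 1, 2, 1, 2, 2, 3, which is the New_Attr1 column in the example.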

Remove invalid data based on particular pattern SQL Server

I have a sample data like shown below
------------------------------------------------
| ID | Column 1 | Column 2 |
------------------------------------------------
| 1 | 0229-10010 |Valid |
------------------------------------------------
| 2 | 20483 |InValid |
------------------------------------------------
| 3 | 319574R06-STAT |Valid |
------------------------------------------------
| 4 | ,,,,,,,,,,,,,,1,,,,,,, |InValid |
------------------------------------------------
| 5 | "PBOM-SSE, CHAMBER" |Valid |
------------------------------------------------
| 6 | ""PBOM-SSE, CHAMBER |InValid |
------------------------------------------------
| 7 | "PBOM-SSE CHAMBER", |InValid |
------------------------------------------------
| 8 | #DRM-1102.Z |InValid |
------------------------------------------------
| 9 | DRM#1102.Z |Valid |
------------------------------------------------
| 10 |OEM-2-202 4079 KALREZ |Valid |
------------------------------------------------
| 11 |-OEM2202 4079 KALREZ# |InValid |
------------------------------------------------
I need a pattern that fetches only the invalid data. The Valid and InValid labels are shown only for illustration; my table has no such flag.
The difficulty is that the same special ("wildcard") character means different things depending on where it appears. Consider records 5 and 6: both contain the same special characters, but their position decides whether the value is valid, and even the position rule is not clear-cut. I think you can work out why each value in Column 1 is valid or invalid. In record 8, a '#' before the item does not make sense, whereas a '#' after a letter does (record 9).
In record 2 there are many blank spaces before the number, which is why it is invalid, but that does not mean a space is itself a special character.
I have written query like below.
SELECT [PartNumber]
FROM [IBSSSystems].[dbo].[Part]
WHERE (PartNumber LIKE '%[?;.,$^#&*{}:"<>/|\ %'']%'
OR PartNumber LIKE '%[%'
OR PartNumber LIKE '%]%')
The query above fetches any record that contains a special character anywhere. What I need is a query that fetches only the invalid data. I suspect the result will involve many ANDs and ORs, but I'm confused. I hope you can help me out. Thanks in advance.
SELECT [PartNumber]
FROM [IBSSSystems].[dbo].[Part]
WHERE (PartNumber LIKE '[^A-Za-z0-9"]%' ESCAPE '\' -- Invalid when the first character is a special character (" is an exception)
OR PartNumber LIKE '%[^A-Za-z0-9" ]' ESCAPE '\' -- Invalid when the last character is a special character (" and trailing spaces are exceptions)
OR PartNumber LIKE '%[^A-Za-z0-9 ][^A-Za-z0-9 ]%' -- Invalid when there are two or more consecutive special characters
OR PartNumber LIKE '%[\^\[\]\\_?;$#&*{}:<>/|''~`]%' ESCAPE '\' -- Add characters here that are not allowed anywhere in the string
)

How to get records from a table where some field's value is in camel-case

I have a table like this,
+----+-----------+
| Id | Value |
+----+-----------+
| 1 | ABC_DEF |
| 31 | AcdEmc |
| 44 | AbcDef |
| 2 | BAA_CC_CD |
| 55 | C_D_EE |
+----+-----------+
I need a query that returns only the records whose Value is in camel case (e.g. AcdEmc, AbcDef, but not ABC_DEF).
Please note that this table has only these two types of string values.
You can use UPPER() for this
select * from your_table
where upper(value) <> value COLLATE Latin1_General_CS_AS
If your default collation is case-insensitive you can force a case-sensitive collation in your where clause. Otherwise you can remove that part from your query.
Based on the sample data, the following will work. I think the issue we're dealing with is checking whether the string contains underscores.
SELECT * FROM [Foo]
WHERE Value NOT LIKE '%[_]%';
See Fiddle
UPDATE: Corrected an error. I forgot that '_' is a LIKE wildcard meaning "any single character", so it has to be wrapped in brackets.
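Both answers rely on the table holding only the two kinds of values shown. A short Python sketch of the two checks on the sample values (an illustration of the logic, not of the SQL collation behavior) shows that they agree here:

```python
values = ["ABC_DEF", "AcdEmc", "AbcDef", "BAA_CC_CD", "C_D_EE"]

# First answer: a value is camel case if uppercasing changes it
# (Python string comparison is case sensitive, like a CS collation).
by_case = [v for v in values if v.upper() != v]

# Second answer: a value is camel case if it contains no underscore.
by_underscore = [v for v in values if "_" not in v]

print(by_case)
print(by_underscore)
```

The checks would diverge on values outside the two stated types, for example an all-uppercase string without underscores, which only the case comparison would exclude.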