Filtering records not containing numbers - sql

I have a table that stores numbers in string format. Ideally each value should be a 10-digit number, but the table contains many junk values. I want to select the records that do not meet that ideal.
Below is the sample table that I have:
+---------------+--------+----------------------------------+
| ID_UID | Length | ##Comment |
+---------------+--------+----------------------------------+
| +112323456705 | 13 | Contains special character |
| 4323456432 | 11 | Contains blank |
| 3423122334 | 10 | As expected, 10 character number |
| 6758439239 | 10 | As expected, 10 character number |
| 58_4323129 | 10 | Contains special character |
| 4567$%6790 | 10 | Contains special character |
| 45684938901 | 11 | Is 11 characters |
| 4568 38901 | 10 | Contains blank |
+---------------+--------+----------------------------------+
Expected Output:
+---------------+--------+----------------------------+
| ID_UID | Length | ##Comment |
+---------------+--------+----------------------------+
| +112323456705 | 13 | Contains special character |
| 4323456432 | 11 | Contains blank |
| 58_4323129 | 10 | Contains special character |
| 4567$%6790 | 10 | Contains special character |
| 45684938901 | 11 | Is 11 characters |
| 4568 38901 | 10 | Contains blank |
+---------------+--------+----------------------------+
Basically, I want all the records whose value is not exactly a 10-digit number.
I have tried out below query:
SELECT *
FROM t1
WHERE ID_UID LIKE '%[^0-9]%'
But this does not return any records.
I have created a fiddle for this.
P.S. The columns length and ##Comment are illustrative in nature.

You want RLIKE not LIKE:
SELECT *
FROM t1
WHERE ID_UID RLIKE '[^0-9]'
Note that % is a LIKE wildcard, not a regular expression wildcard. Also, regular expressions match the pattern anywhere it occurs, so no wildcards are needed for the beginning and end of the string.
If you want to find values that are not ten digits, then be explicit:
SELECT *
FROM t1
WHERE ID_UID NOT RLIKE '^[0-9]{10}$'
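The difference between the two patterns can be checked outside the database. The following is a minimal Python sketch, assuming the sample values from the question; the helper name `is_junk` is hypothetical. It mirrors `NOT RLIKE '^[0-9]{10}$'` by requiring the whole string to be ten digits. Note that a plain `[^0-9]` test would miss `45684938901`, which is all digits but eleven characters long.

```python
import re

# Sample values copied from the question's table.
values = [
    "+112323456705",
    "3423122334",
    "6758439239",
    "58_4323129",
    "4567$%6790",
    "45684938901",
    "4568 38901",
]

# fullmatch anchors the pattern at both ends, like ^...$ in the SQL version.
TEN_DIGITS = re.compile(r"[0-9]{10}")


def is_junk(value: str) -> bool:
    """Return True when the value is anything other than exactly ten digits."""
    return TEN_DIGITS.fullmatch(value) is None


# Values the "be explicit" query would return.
junk = [v for v in values if is_junk(v)]

# Values that merely contain a non-digit somewhere (the RLIKE '[^0-9]' query).
has_non_digit = [v for v in values if re.search(r"[^0-9]", v)]
```

Here `junk` includes the eleven-digit value while `has_non_digit` does not, which is why the anchored pattern fits the expected output better.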

Related

Match a set of characters from one table into the records of an other table

I have two tables (T-SQL):
tblInvalidCharactersList tblMonthsRecords
+-----------+-----------+ +--------+-------------+
| CodePoint | Character | | RecRef | Name |
+-----------+-----------+ +--------+-------------+
| 38 | & | | 21 | Firs> name |
+-----------+-----------+ +--------+-------------+
| 64 | # | | 89 | #Second name|
+-----------+-----------+ +--------+-------------+
| 62 | > | | 321 | Third n«me |
+-----------+-----------+ +--------+-------------+
| 171 | « | | 381 | Fourth name |
+-----------+-----------+ +--------+-------------+
I want to find those records of the tblMonthsRecords which have at least one (or more) character(s) from the Character column of the tblInvalidCharactersList table.
I tried:
SELECT
[RecRef],
[Name]
FROM [tblMonthsRecords]
WHERE [Name] IN (SELECT Character FROM [tblInvalidCharactersList])
and it returns no results at all.
I even tried the NOT IN clause and, as you may guess, it returns all records.
I am not hardcoding the character list within a LIKE clause because I want the list to be dynamically updated.
You can think of tblInvalidCharactersList as a character "black list".
I would use exists:
select mr.*
from tblMonthsRecords mr
where exists (select 1
from tblInvalidCharactersList icl
where charindex(icl.Character, mr.name) > 0
);
You don't seem to care about the actual invalid character.
IN looks for an exact match against the whole value of the Name column; it does not search for the character inside Name.
Use the LIKE operator:
select Distinct a.*
from tblMonthsRecords a
join tblInvalidCharactersList b
on a.Name like '%' + b.Character + '%'
Another way using charindex
charindex(b.Character,a.Name) > 0
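The EXISTS approach can be tried end to end with an in-memory SQLite database. This is only a sketch under assumptions: SQLite has no `charindex`, so it uses `instr(haystack, needle)`, which takes its arguments in the opposite order, and the sample rows are the ones shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblInvalidCharactersList (CodePoint INTEGER, "Character" TEXT);
    CREATE TABLE tblMonthsRecords (RecRef INTEGER, Name TEXT);
""")
conn.executemany(
    "INSERT INTO tblInvalidCharactersList VALUES (?, ?)",
    [(38, "&"), (64, "#"), (62, ">"), (171, "«")],
)
conn.executemany(
    "INSERT INTO tblMonthsRecords VALUES (?, ?)",
    [(21, "Firs> name"), (89, "#Second name"),
     (321, "Third n«me"), (381, "Fourth name")],
)

# A record is flagged if at least one blacklisted character occurs in Name.
# instr(Name, Character) plays the role of charindex(Character, Name).
flagged = conn.execute("""
    SELECT mr.RecRef, mr.Name
    FROM tblMonthsRecords mr
    WHERE EXISTS (SELECT 1
                  FROM tblInvalidCharactersList icl
                  WHERE instr(mr.Name, icl."Character") > 0)
    ORDER BY mr.RecRef
""").fetchall()
```

Because the blacklist is read from the table at query time, adding a row to tblInvalidCharactersList changes the result without touching the query.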

Efficient Classification of records by common letters in impala

I have a table in Impala (TBL1) that contains names sharing different numbers of leading letters. The table contains about 3M records. I would like to add a new attribute to the table, in which each group of common first letters gets its own class. This is the same way DENSE_RANK works, but with a dynamic number of first letters. The number of common first letters should not be less than p=3 letters (p = parameter).
Here is an example for the table and the required results:
| ID | Attr1 | New_Attr1 | Some more attribute...
+-------+--------------+-------------+-----------------------
| 1 | ZXA-12 | 1 |
| 2 | YL3300 | 2 |
| 3 | ZXA-123 | 1 |
| 4 | YL3400 | 2 |
| 5 | YL3-aaa | 2 |
| 6 | TSA 789 | 3 |
...
Does this do what you want?
select t.*,
dense_rank() over (order by strleft(attr1, 3)) as newcol
from . . .;
The "3" is your parameter.
As a note: In your example, you seem to have assigned the new value in reverse alphabetic order. Hence, you would want desc for the order by.
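To see what the dense rank over a fixed-length prefix produces, here is a small pure-Python sketch of the same idea. The function name `dense_rank_by_prefix` is hypothetical, and it assumes a fixed prefix length `p` rather than a truly dynamic one.

```python
def dense_rank_by_prefix(names, p=3, descending=True):
    """Assign each name the dense rank of its first p characters."""
    prefixes = [name[:p] for name in names]
    # Distinct prefixes in rank order; descending mimics ORDER BY ... DESC.
    ordered = sorted(set(prefixes), reverse=descending)
    rank_of = {prefix: rank for rank, prefix in enumerate(ordered, start=1)}
    return [rank_of[prefix] for prefix in prefixes]


# Attr1 values from the example table in the question.
names = ["ZXA-12", "YL3300", "ZXA-123", "YL3400", "YL3-aaa", "TSA 789"]
classes = dense_rank_by_prefix(names)
```

With descending order the classes line up with the New_Attr1 column in the question; ascending order would number the same groups the other way round.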

Most efficient way to query a word & synonym table

I have a WORDTB table with words and their synonyms: ID, WORD1, WORD2, WORD3, WORD4, WORD5. These words are arranged according to their frequency. When any word is given I want to query and retrieve the most frequent synonym of that particular word which is the word in WORD1 column.
This is the query I tried and it works fine, but I think this is inefficient.
SELECT WORD1
FROM WORDTB
WHERE WORD1='xxxx'
OR WORD2='xxxx'
OR WORD3='xxxx'
OR WORD4='xxxx'
OR WORD5='xxxx'
Can anyone suggest a more efficient way of doing this?
A more scalable solution would be to use a single row for each word.
synonym_words(word_id, synonym_id, word, popularity)
Fields:
word_id: The primary key for a word.
synonym_id: The word_id of the first synonym word.
word: The synonym text.
popularity: The sort order for the list of synonyms, 1 being the most popular.
Sample table data:
word_id | synonym_id | word | popularity
==============================================
1 | 1 | start | 1
2 | 1 | begin | 2
3 | 1 | originate | 3
4 | 1 | initiate | 4
5 | 1 | commence | 5
6 | 1 | create | 6
7 | 1 | startle | 7
8 | 1 | leave | 8
9 | 9 | end | 1
10 | 9 | ending | 2
11 | 9 | last | 3
12 | 9 | goal | 4
13 | 9 | death | 5
14 | 9 | conclusion | 6
15 | 9 | close | 7
16 | 9 | closing | 8
Assuming that the words will not change but their popularity may over time, the query should not break if you were to change the popularity order of the words so that the most popular synonym for a word was changed. You want your query to return the most popular word (popularity = 1) which shares the same synonym_id as the word used in the search.
SQL query:
SELECT word FROM synonym_words
WHERE synonym_id = (SELECT synonym_id FROM synonym_words WHERE word = 'conclusion')
AND popularity = 1
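The lookup can be exercised against a few of the sample rows using an in-memory SQLite database. This is a sketch, not a tuned implementation; the helper name `most_popular_synonym` is hypothetical, and only a subset of the sample data is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE synonym_words (
        word_id    INTEGER PRIMARY KEY,
        synonym_id INTEGER,
        word       TEXT,
        popularity INTEGER
    )
""")
conn.executemany(
    "INSERT INTO synonym_words VALUES (?, ?, ?, ?)",
    [
        (1, 1, "start", 1), (2, 1, "begin", 2), (3, 1, "originate", 3),
        (9, 9, "end", 1), (10, 9, "ending", 2), (14, 9, "conclusion", 6),
    ],
)


def most_popular_synonym(word):
    """Return the popularity = 1 word of the group that `word` belongs to."""
    row = conn.execute(
        """
        SELECT word FROM synonym_words
        WHERE synonym_id = (SELECT synonym_id FROM synonym_words WHERE word = ?)
          AND popularity = 1
        """,
        (word,),
    ).fetchone()
    # An unknown word makes the subquery yield NULL, so no row comes back.
    return row[0] if row else None


result = most_popular_synonym("conclusion")
```

On a real table, an index on `word` (and one on `synonym_id`) would presumably be what makes this faster than the five-column OR query.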

Combine column x to n in OpenRefine

I have a table with an unknown number of columns, and I need to combine all columns after a certain point. Consider the following:
| A | B | C | D | E |
|----|----|---|---|---|
| 24 | 25 | 7 | | |
| 12 | 3 | 4 | | |
| 5 | 5 | 5 | 5 | |
Columns A-C are known, and the information in them correct. But column D to N (an unknown number of columns starting with D) needs to be combined as they are all parts of the same string. How can I combine an unknown number of columns in OpenRefine?
As some columns may have empty cells (the string may be of various lengths) I also need to disregard empty cells.
There is a two step approach to this that should work for you.
From the first column you want to merge (Col D in this case) choose Transpose->Transpose cells across columns into rows
You will be asked to set some options. You'll want to choose 'From Column' D and 'To Column' N. Then choose to transpose into One Column, assign a name to that column, and make sure the option to 'Ignore Blank Cells' is checked (it should be checked by default). Then click Transpose.
You'll get the values that were previously in cols D-N appearing in rows. e.g.
| A | B | C | D | E | F |
|----|----|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 |
Transposes to:
| A | B | C | new |
|----|----|---|-----|
| 1 | 2 | 3 | 4 |
| | | | 5 |
| | | | 6 |
You can then use the dropdown menu from the head of the 'new' column to choose
Edit cells->Join multi-value cells
You'll be asked what character you want to use to separate the values in the joined cell. In your use case you can probably delete the separator and combine the cells without any joining character. This will give you:
| A | B | C | new |
|----|----|---|-----|
| 1 | 2 | 3 | 456 |
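The net effect of the two steps (transpose while ignoring blanks, then join with an empty separator) can be sketched per row in plain Python. This only illustrates the resulting data; it is not how OpenRefine implements it, and the function name `combine_trailing_columns` is hypothetical.

```python
def combine_trailing_columns(row, first_index, separator=""):
    """Keep the cells before first_index and join the non-empty rest into one cell."""
    head = row[:first_index]
    # Blank cells are skipped, like the 'Ignore Blank Cells' option.
    tail = [cell for cell in row[first_index:] if cell not in ("", None)]
    return head + [separator.join(tail)]


# Columns A-C are kept; D onwards (index 3) are combined.
combined = combine_trailing_columns(["1", "2", "3", "4", "5", "6"], first_index=3)
```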

Unique string table in SQL and replacing index values with string values during query

I'm working on an old SQL Server database that has several tables that look like the following:
|-------------|-----------|-------|------------|------------|-----|
| MachineName | AlarmName | Event | AlarmValue | SampleTime | ... |
|-------------|-----------|-------|------------|------------|-----|
| 3 | 180 | 8 | 6.780 | 2014-02-24 | |
| 9 | 67 | 8 | 1.45 | 2014-02-25 | |
| ... | | | | | |
|-------------|-----------|-------|------------|------------|-----|
There is a separate table in the database that only contains unique strings, as well as the index for each unique string. The unique string table looks like this:
|----------|--------------------------------|
| Id | String |
|----------|--------------------------------|
| 3 | MyMachine |
| ... | |
| 8 | High CPU Usage |
| ... | |
| 67 | 404 Error |
| ... | |
|----------|--------------------------------|
Thus, when we want to get something out of the database, we get the respective rows out, then lookup each missing string based on the index value.
What I'm hoping to do is to replace all of the string indexes with the actual values in a single query without having to do post-processing on the query result.
However, I can't figure out how to do this in a single query. Do I need to use multiple JOINs? I've only been able to figure out how to replace a single value by doing something like -
SELECT UniqueString.String AS "MachineName" FROM UniqueString
JOIN Alarm ON Alarm.MachineName = UniqueString.Id
Any help would be much appreciated!
Yes, you can join to the UniqueString table multiple times, but change the order so the query starts with the table you are reporting on, and give each join of the lookup table its own alias. Something like:
SELECT MN.String AS 'MachineName', AN.String as 'AlarmName' FROM Alarm A
JOIN UniqueString MN ON A.MachineName = MN.Id
JOIN UniqueString AN ON A.AlarmName = AN.Id
and so on for any other columns.
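A runnable sketch of the multiple-join pattern, using an in-memory SQLite database, might look like this. The Alarm row is made up so that every id it uses appears in the part of the lookup table shown in the question, and the `EV` alias for the Event column is an assumption that extends the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UniqueString (Id INTEGER PRIMARY KEY, String TEXT);
    CREATE TABLE Alarm (MachineName INTEGER, AlarmName INTEGER,
                        Event INTEGER, AlarmValue REAL);
    INSERT INTO UniqueString VALUES
        (3, 'MyMachine'), (8, 'High CPU Usage'), (67, '404 Error');
    -- Hypothetical alarm row that references only the ids listed above.
    INSERT INTO Alarm VALUES (3, 67, 8, 1.45);
""")

# One join per indexed column, each with its own alias of the lookup table.
rows = conn.execute("""
    SELECT MN.String AS MachineName,
           AN.String AS AlarmName,
           EV.String AS Event,
           A.AlarmValue
    FROM Alarm A
    JOIN UniqueString MN ON A.MachineName = MN.Id
    JOIN UniqueString AN ON A.AlarmName = AN.Id
    JOIN UniqueString EV ON A.Event = EV.Id
""").fetchall()
```

With inner joins, an alarm row whose id is missing from UniqueString drops out of the result; a LEFT JOIN would keep such rows with a NULL string instead.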