Teradata SQL: select strings of multiple capital letters in a text field

Any help would be much appreciated in figuring out how to identify acronyms within a text field that has mixed upper- and lowercase letters.
For example, we might have
"we used the BBQ sauce on the Chicken"
I need my query to SELECT "BBQ" and nothing else in the cell.
There could be multiple capitalized strings per row.
The output should include the uppercase string.
Any ideas are much appreciated!!

This is going to be kind of ugly. I tried to use REGEXP_SPLIT_TO_TABLE to pull out just the all-caps words, but couldn't make it work.
I would do it by first using strtok_split_to_table, so each word ends up in its own row.
First, some dummy data:
create volatile table vt
(id integer,
col1 varchar(20))
on commit preserve rows;
insert into vt
values (1,'foo BAR');
insert into vt
values (2,'fooBAR');
insert into vt
values(3,'blah FOO FOO blah');
We can use strtok_split_to_table on this:
select
t.*
from table
(strtok_split_to_table(vt.id ,vt.col1,' ')
returns
(tok_key integer
,tok_num INTEGER
,tok_value VARCHAR(30)
)) AS t
That will split each value into separate rows, using a space as a delimiter.
Finally, we can compare each of those values to that value in upper case:
select
vt.id,
vt.col1,
tok_key,
tok_num,
tok_value,
case when upper(t.tok_value) = t.tok_value (CASESPECIFIC) then tok_value else '0' end as test_output
from
(
select
t.*
from table
(strtok_split_to_table(vt.id ,vt.col1,' ')
returns
(tok_key integer
,tok_num INTEGER
,tok_value VARCHAR(30)
)) AS t
) t
inner join vt
on t.tok_key = vt.id
order by id,tok_num
Taking our lovely sample data, you'll get:
+----+-------------------+---------+---------+-----------+-------------+
| id | col1 | tok_key | tok_num | tok_value | TEST_OUTPUT |
+----+-------------------+---------+---------+-----------+-------------+
| 1 | foo BAR | 1 | 1 | foo | 0 |
| 1 | foo BAR | 1 | 2 | BAR | BAR |
| 2 | fooBAR | 2 | 1 | fooBAR | 0 |
| 3 | blah FOO FOO blah | 3 | 1 | blah | 0 |
| 3 | blah FOO FOO blah | 3 | 2 | FOO | FOO |
| 3 | blah FOO FOO blah | 3 | 3 | FOO | FOO |
| 3 | blah FOO FOO blah | 3 | 4 | blah | 0 |
+----+-------------------+---------+---------+-----------+-------------+
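The same split-and-compare logic can be sketched outside the database. The Python snippet below is only an illustration, not Teradata code; tag_tokens is a name made up for this example. It splits on whitespace and keeps a token when it equals its own upper-cased form, mirroring the CASE expression in the query.

```python
def tag_tokens(text):
    """Split text on whitespace and flag tokens that equal their upper-cased form.

    Returns (tok_num, tok_value, test_output) tuples. test_output is the token
    itself when it is all caps, otherwise the string '0'. As with the SQL
    comparison, a token without any letters (e.g. '123') also equals its
    upper-cased form and is therefore kept.
    """
    rows = []
    for tok_num, tok in enumerate(text.split(), start=1):
        rows.append((tok_num, tok, tok if tok == tok.upper() else "0"))
    return rows


for row in tag_tokens("blah FOO FOO blah"):
    print(row)
```

If purely numeric or punctuation-only tokens should not count as acronyms, an extra check such as tok.isalpha() would be needed; that is an assumption about the data, not something the original question specifies.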

Defining acronyms as all uppercase words with 2 to 5 characters with a '\b[A-Z]{2,5}\b' regex:
WITH cte AS
( -- using @Andrew's Volatile Table
SELECT *
FROM vt
-- only rows containing acronyms
WHERE RegExp_Similar(col1, '.*\b[A-Z]{2,5}\b.*') = 1
)
SELECT
outkey,
tokenNum,
CAST(RegExp_Substr(Token, '[A-Z]*') AS VARCHAR(5)) AS acronym -- 1st uppercase word
--,token
FROM TABLE
( RegExp_Split_To_Table
( cte.id,
cte.col1,
-- split before an acronym, might include additional characters after
-- [^A-Z]*? = any number of non uppercase letters (removed)
-- (?= ) = positive lookahead, i.e. check, but don't remove
'[^A-Z]*?(?=\b[A-Z]{2,5}\b)',
'' -- defaults to case sensitive
) RETURNS
( outKey INT,
TokenNum INT,
Token VARCHAR(30000) -- adjust to match the size of your input column
)
) AS t
WHERE acronym <> ''
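As a quick way to sanity-check the pattern outside Teradata, here is a small Python sketch that applies the same '\b[A-Z]{2,5}\b' definition of an acronym. Python's re module is not the regex engine Teradata uses, so treat this as an illustration of the pattern rather than of RegExp_Split_To_Table; find_acronyms is a name invented for this example.

```python
import re

# Same definition as in the answer: a whole word of 2 to 5 uppercase letters.
ACRONYM = re.compile(r"\b[A-Z]{2,5}\b")


def find_acronyms(text):
    """Return every acronym in text, in order of appearance."""
    return ACRONYM.findall(text)


print(find_acronyms("we used the BBQ sauce on the Chicken"))  # ['BBQ']
print(find_acronyms("blah FOO FOO blah"))                     # ['FOO', 'FOO']
print(find_acronyms("fooBAR"))                                # []
```

"Chicken" is not reported because only its first letter is uppercase, and "fooBAR" is not reported because there is no word boundary before "BAR".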

I am not 100% sure what you are trying to do, but I think you have several options. For example:
Option 1) Check whether a known acronym (like BBQ) exists in the string (basic T-SQL syntax):
SELECT CHARINDEX('BBQ', @string)
In this case you would need a table of all known acronyms you want to check for, and then test each of them against your string and return the ones that match.
DECLARE @string VARCHAR(100)
SET @string = 'we used the BBQ sauce on the Chicken'
-- table [acrs] holds the known acronyms:
--+--- acronym-----+
--+ BBQ +
--+ IBM +
--+ AMD +
--+ ETC +
--+----------------+
SELECT acronym FROM [acrs] WHERE CHARINDEX([acronym], @string) > 0
This should return: 'BBQ'
Option 2) Load all the uppercase characters into a temp table etc. for further logic and processing. I think you could use something like this...
DECLARE @string VARCHAR(100)
SET @string = 'we used the BBQ sauce on the Chicken'
-- make a table of all uppercase letters and process them individually
;WITH cte_loop(position, acrn)
AS (
SELECT 1, SUBSTRING(@string, 1, 1)
UNION ALL
SELECT position + 1, SUBSTRING(@string, position + 1, 1)
FROM cte_loop
WHERE position < LEN(@string)
)
SELECT position, acrn, ascii(acrn) AS [ascii]
FROM cte_loop
WHERE ascii(acrn) > 64 AND ascii(acrn) < 91 -- see the ASCII table for all codes
This would return one row per uppercase letter, with its position and ASCII code.

Related

Replace values in a column for all rows

I have a column with entries like:
column:
156781
234762
780417
and would like to have the following:
column:
0000156781
0000234762
0000780417
For this I use the following query:
Select isnull(replicate('0', 10 - len(column)),'') + rtrim(column) as a from table
However, I don't know how to replace the values in the whole column.
I already tried with:
UPDATE table
SET column= (
Select isnull(replicate('0', 10 - len(column)),'') + rtrim(column) as column from table)
But I get the following error.
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The answer to your question depends on the data type of your column. If it is a text column, for example VARCHAR, then you can modify the value in the table. If it is a number type such as INT, it is the value and not the characters that is stored, so leading zeros cannot be kept.
We can also express this by saying that "0" + "1" = "01" whilst 0 + 1 = 1.
In either case we can format the value in a query.
create table numberz(
val1 int,
val2 varchar(10));
insert into numberz values
(156781,'156781'),
(234762,'234762'),
(780417,'780417');
/* required format
0000156781
0000234762
0000780417
*/
select * from numberz;
GO
val1 | val2
-----: | :-----
156781 | 156781
234762 | 234762
780417 | 780417
UPDATE numberz
SET val1 = isnull(
replicate('0',
10 - len(val1)),'')
+ rtrim(val1),
val2 = isnull(
replicate('0',
10 - len(val2)),'')
+ rtrim(val2);
GO
3 rows affected
select * from numberz;
GO
val1 | val2
-----: | :---------
156781 | 0000156781
234762 | 0000234762
780417 | 0000780417
select isnull(
replicate('0',
10 - len(val1)),'')
+ rtrim(val1)
from numberz
GO
| (No column name) |
| :--------------- |
| 0000156781 |
| 0000234762 |
| 0000780417 |
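The difference between padding text and padding a number can also be illustrated with a short Python sketch. It is an analogy only, not SQL Server code, and pad_left is a hypothetical helper introduced for this example.

```python
def pad_left(value, width=10, fill="0"):
    """Return value as text, left-padded with fill up to width characters.

    Values that are already width characters or longer are returned unchanged,
    which matches the isnull(replicate(...), '') behaviour in the query.
    """
    return str(value).rjust(width, fill)


padded = pad_left(156781)
print(padded)       # 0000156781  (text keeps the leading zeros)
print(int(padded))  # 156781      (a number does not)
```

This is why the UPDATE changes val2 (VARCHAR) but leaves val1 (INT) looking the same: the padded text is converted back to the number 156781 when it is stored.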
Usually, when values need to be shown in a specific format, the formatting is done with CASE or other functions in the select list, that is, without updating the table. That way the format can be changed at any time just by changing the functions, much like a computed field.
For example:
select id, lpad(id::text, 6, '0') as format_id from test.test_table1
order by id
Result:
id format_id
-------------
1 000001
2 000002
3 000003
4 000004
5 000005
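Maybe you really need an UPDATE, so I wrote a sample query for an UPDATE command too. Note that this only keeps the leading zeros if id is a text column; a numeric column cannot store them.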
update test.test_table1
set
id = lpad(id::text, 6, '0');

SQL Server - Ordering Combined Number Strings Prior To Column Insert

I have 2 string columns (thousands of rows) with ordered numbers in each string (there can be zero to ten numbers in each string). Example:
+------------------+------------+
| ColString1 | ColString2 |
+------------------+------------+
| 1;3;5;12; | 4;6; |
+------------------+------------+
| 1;5;10; | 2;26; |
+------------------+------------+
| 4;7; | 3; |
+------------------+------------+
The end result is to combine these 2 columns, sort the numbers in
ascending order and then put each number into individual columns (smallest, 2nd smallest etc).
e.g. ColString1 is 1;3;5;12; and ColString2 is 4;6; which needs to return 1;3;4;5;6;12; which I then use XML to allocate into columns.
Everything works fine using XML apart from the step that orders the numbers (i.e. I'm getting 1;3;5;12;4;6; when I combine the strings, which is not in ascending order).
I've tried putting them into a JSON array first to order them, thinking I could do a top[1] etc., but that did not work.
I'd like help with how to combine the 2 columns and order them before inserting into columns.
Steps so far:
Example data:
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, ColString1 VARCHAR(50), ColString2 VARCHAR(50));
INSERT INTO @tbl (ColString1, ColString2)
VALUES
('1;3;5;12;', '4;6;'),
('1;5;10;', '2;26;'),
('14;', '3;8;');
XML Approach (Combines strings and puts into columns but not in the correct order):
;WITH Split_Numbers (xmlname)
AS
(
SELECT
CONVERT(XML,'<Names><name>'
+ REPLACE ( LEFT(ColString1+ColString2,LEN(ColString1+ColString2) - 1),';', '</name><name>') + '</name></Names>') AS xmlname
FROM @tbl
)
SELECT
xmlname.value('/Names[1]/name[1]','int') AS Number1,
xmlname.value('/Names[1]/name[2]','int') AS Number2,
xmlname.value('/Names[1]/name[3]','int') AS Number3,
xmlname.value('/Names[1]/name[4]','int') AS Number4,
xmlname.value('/Names[1]/name[5]','int') AS Number5
--etc for additional columns
FROM Split_Numbers
Current Output: numbers not in correct order,
+---------+---------+---------+---------+---------+
| Number1 | Number2 | Number3 | Number4 | Number5 |
+---------+---------+---------+---------+---------+
| 1 | 3 | 5 | 12 | 4 |
| 1 | 5 | 10 | 2 | 26 |
| 14 | 3 | 8 | NULL | NULL |
+---------+---------+---------+---------+---------+
Desired Output: numbers in ascending order.
+---------+---------+---------+---------+---------+
| Number1 | Number2 | Number3 | Number4 | Number5 |
+---------+---------+---------+---------+---------+
| 1 | 3 | 4 | 5 | 6 |
| 1 | 2 | 5 | 10 | 26 |
| 3 | 8 | 14 | NULL | NULL |
+---------+---------+---------+---------+---------+
JSON Approach: combines the columns into a JSON array but I still can't order correctly when in JSON format.
REPLACE ( CONCAT('[', LEFT(ColString1+ColString2,LEN(ColString1+ColString2) - 1), ']') ,';',',')
Any help will be greatly appreciated whether there is a way to order the xml or JSON string prior to entry. Happy to consider an alternative way if there is an easier solution.
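On SQL Server 2017 or later you can use string_split() and string_agg():
select t.*, agg.newstring
from t cross apply
(select string_agg(s.value, ';') within group (order by cast(s.value as int)) as newstring
from (select s1.value
from string_split(t.colstring1, ';') s1
union all
select s2.value
from string_split(t.colstring2, ';') s2
) s
where s.value <> '' -- drop the empty item produced by the trailing ';'
) agg;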
That said, you should probably put your effort into fixing the data model. Storing numbers in strings is bad. Storing multiple values in a string is bad, bad. If the numbers are foreign references to other tables, that is bad, bad, bad, bad, bad.
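For comparison, the split, merge, sort and spread-into-columns steps look like this in a short Python sketch. It is only meant to show the logic, not to replace the SQL; merge_sorted and to_columns are names made up for this example, and the column count of 5 is an assumption taken from the sample output in the question.

```python
def merge_sorted(col1, col2, sep=";"):
    """Combine two delimited strings and return their numbers in ascending order.

    Empty items (from trailing separators or NULL/None columns) are ignored.
    """
    parts = (col1 or "").split(sep) + (col2 or "").split(sep)
    return sorted(int(p) for p in parts if p.strip())


def to_columns(numbers, width=5):
    """Spread numbers over a fixed number of columns, padding with None."""
    row = numbers[:width]
    return row + [None] * (width - len(row))


print(merge_sorted("1;3;5;12;", "4;6;"))            # [1, 3, 4, 5, 6, 12]
print(to_columns(merge_sorted("14;", "3;8;")))      # [3, 8, 14, None, None]
```

The important detail is the int() conversion before sorting: sorting the items as text would put "12" before "3".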
While waiting for a DDL and sample data population, etc., here is a conceptual example for you. It is using XQuery and its FLWOR expression.
CTE does most of the heavy lifting:
Concatenates both columns values into one string. CONCAT() function protects against NULL values.
Converts it into XML data type.
Sorts XML elements by converting their values to int data type in the FLWOR expression.
Filters out XML elements with no legit values.
The rest is trivial.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, col1 VARCHAR(100), col2 VARCHAR(100));
INSERT INTO @tbl (col1, col2)
VALUES
('1;3;5;12;', '4;6;'),
('1;5;10;', '2;26;');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = ';';
;WITH rs AS
(
SELECT *
, CAST('<root><r><![CDATA[' +
REPLACE(CONCAT(col1, col2), @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML).query('<root>
{
for $x in /root/r[text()]
order by xs:int($x)
return $x
}
</root>') AS sortedXML
FROM @tbl
)
SELECT ID
, c.value('(r[1]/text())[1]','INT') AS Number1
, c.value('(r[2]/text())[1]','INT') AS Number2
, c.value('(r[3]/text())[1]','INT') AS Number3
-- continue with the rest of the columns
FROM rs CROSS APPLY sortedXML.nodes('/root') AS t(c);
Output
+----+---------+---------+---------+
| ID | Number1 | Number2 | Number3 |
+----+---------+---------+---------+
| 1 | 1 | 3 | 4 |
| 2 | 1 | 2 | 5 |
+----+---------+---------+---------+

Query SQL Server with wildcard IN the database

I need to build a query where the criteria must match against wildcards stored in the database.
An example will make it clearer.
I have a column with a value like this: 963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX.
The ~ char is a wildcard.
So the following criteria must match:
~63-4-AKS~M
963-4-AKS1M
963-4-AKS~M2RN21AXA150AAA
963-4-AKSAM2RN21AXA150AAA
963-4-AKSCM2RN21AXA150A060C1D1DSDXX
963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX
I've tried so many things my head hurts :(
The other way around (with the wildcard in the criteria) is no problem, easy. But in this direction I cannot find the key.
The problem is that when there is a ~ in the field, it doesn't match. So only the first and last criteria match with the following statement:
SELECT myField FROM myTable WHERE myField LIKE REPLACE('%' + myCriteria + '%', '~', '_');
It seems the patterns and the field are aligned to the left, i.e. compared character by character from the first position.
If this is indeed the case, then with my head bowed (full of sadness), here is a function.
create function is_a_match (@myField varchar(100),@myCriteria varchar(100))
returns bit
as
begin
declare @i int = 0
,@is_a_match bit = 1
,@len_myField int = len(@myField)
,@len_myCriteria int = len(@myCriteria)
,@myField_c char(1)
,@myCriteria_c char(1)
While 1=1
begin
set @i += 1
if @i > @len_myCriteria break
if @i > @len_myField
begin
set @is_a_match = 0
break
end
set @myField_c = substring(@myField ,@i,1)
set @myCriteria_c = substring(@myCriteria,@i,1)
if not (@myField_c = '~' or @myCriteria_c = '~' or @myField_c = @myCriteria_c)
begin
set @is_a_match = 0
break
end
end
return @is_a_match
end
GO
select myCriteria
,dbo.is_a_match (myField,myCriteria) as is_a_match
from (values ('~63-4-AKS~M' )
,('963-4-AKS1M' )
,('963-4-AKS~M2RN21AXA150AAA' )
,('963-4-AKSAM2RN21AXA150AAA' )
,('963-4-AKSCM2RN21AXA150A060C1D1DSDXX' )
,('963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX' )
,('963-4-AKS~M2RN21AXA150~~~0C1X1D~~XX' )
,('963-4-AKS~M2RN21AXA150~~~0C1D1D~~XXYY')
) c (myCriteria)
,(values ('963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX' )
) f (myField)
+---------------------------------------+------------+
| myCriteria | is_a_match |
+---------------------------------------+------------+
| ~63-4-AKS~M | 1 |
+---------------------------------------+------------+
| 963-4-AKS1M | 1 |
+---------------------------------------+------------+
| 963-4-AKS~M2RN21AXA150AAA | 1 |
+---------------------------------------+------------+
| 963-4-AKSAM2RN21AXA150AAA | 1 |
+---------------------------------------+------------+
| 963-4-AKSCM2RN21AXA150A060C1D1DSDXX | 1 |
+---------------------------------------+------------+
| 963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX | 1 |
+---------------------------------------+------------+
| 963-4-AKS~M2RN21AXA150~~~0C1X1D~~XX | 0 |
+---------------------------------------+------------+
| 963-4-AKS~M2RN21AXA150~~~0C1D1D~~XXYY | 0 |
+---------------------------------------+------------+
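For readers who want to test the rule outside SQL Server, the same left-aligned, position-by-position comparison can be sketched in Python. This is an illustration rather than a translation of the T-SQL function: it compares characters case-sensitively, whereas the SQL comparison follows the collation in use.

```python
def is_a_match(field, criteria, wildcard="~"):
    """Compare criteria against field position by position from the left.

    A position matches when either side holds the wildcard or both characters
    are equal. Criteria longer than the field never match; shorter criteria
    only need to match the start of the field.
    """
    if len(criteria) > len(field):
        return False
    return all(
        f == wildcard or c == wildcard or f == c
        for f, c in zip(field, criteria)
    )


field = "963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX"
for criteria in (
    "~63-4-AKS~M",
    "963-4-AKSAM2RN21AXA150AAA",
    "963-4-AKS~M2RN21AXA150~~~0C1X1D~~XX",
    "963-4-AKS~M2RN21AXA150~~~0C1D1D~~XXYY",
):
    print(criteria, is_a_match(field, criteria))
```

The first two criteria print True and the last two print False, in line with the result table.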
You are mixing up the field and the patterns.
The field may not hold wildcards.
E.g.
This is not a match because of the 'A's
963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX
963-4-AKSAM2RN21AXA150AAA
If you could tighten the constraints on the wildcard, you might have a fighting chance here. What I mean is that you generate the valid permutations whenever a wildcard is present in the persisted data, and then query the permutations with your existing query.
But if every wildcard has 36 possible options, this becomes exponentially painful.

SQL for comparison of strings comprised of number and text

I need to compare 2 strings that contain a number and possibly text. For example, I have this table:
id | label 1 | label 2 |
1 | 12/H | 1 |
2 | 4/A | 41/D |
3 | 13/A | 3/F |
4 | 8/A | 8/B |
..
I need to determine the direction so that if Label 1 < Label2 then Direction is W (with) else it is A (against). So I have to build a view that presents data this way:
id | Direction
1 | A |
2 | W |
3 | A |
4 | W |
..
I'm using postgres 9.2.
WITH x AS (
SELECT id
,split_part(label1, '/', 1)::int AS l1_nr
,split_part(label1, '/', 2) AS l1_txt
,split_part(label2, '/', 1)::int AS l2_nr
,split_part(label2, '/', 2) AS l2_txt
FROM t
)
SELECT id
,CASE WHEN (l1_nr, l1_txt) < (l2_nr, l2_txt)
THEN 'W' ELSE 'A' END AS direction
FROM x;
I split the two parts with split_part() and compare them as ad-hoc row types to check which label is bigger.
The cases where both labels are equal or where either one is NULL have not been defined.
The CTE is not necessary, it's just to make it easier to read.
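The row comparison works like a tuple comparison: the numeric part is compared first, and the text part only breaks ties. A minimal Python sketch of the same idea follows; parse_label and direction are names invented for this example, and Python compares the text part by code point, which may differ from the collation PostgreSQL uses.

```python
def parse_label(label):
    """Split '12/H' into (12, 'H'); a label without '/' gets an empty text part."""
    number, _, text = label.partition("/")
    return int(number), text


def direction(label1, label2):
    """Return 'W' (with) when label1 sorts before label2, otherwise 'A' (against)."""
    return "W" if parse_label(label1) < parse_label(label2) else "A"


rows = [(1, "12/H", "1"), (2, "4/A", "41/D"), (3, "13/A", "3/F"), (4, "8/A", "8/B")]
for row_id, label1, label2 in rows:
    print(row_id, direction(label1, label2))
```

For the sample table this prints A, W, A, W. Row 4 shows why the text part matters: the numbers are equal, so 'A' < 'B' decides the result.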
You can try something like:
SELECT id, CASE WHEN regexp_replace(label_1,'[^0-9]','','g')::numeric <
regexp_replace(label_2,'[^0-9]','','g')::numeric
THEN 'W'
ELSE 'A'
END
FROM table1
regexp_replace deletes all non-numeric characters from the string; ::numeric then converts the resulting string to a number.
Details here: regexp_replace, pattern matching, CASE WHEN

Selecting a record based on integer being in an array field

I have a database of houses. Each house record in the MSSQL database has a field called AreaID. A house could be in multiple areas, so an entry could look as follows in the database:
+---------+----------------------+-----------+-------------+-------+
| HouseID | AreaID | HouseType | Description | Title |
+---------+----------------------+-----------+-------------+-------+
| 21 | 17, 32, 53 | B | data | data |
+---------+----------------------+-----------+-------------+-------+
| 23 | 23, 73 | B | data | data |
+---------+----------------------+-----------+-------------+-------+
| 24 | 53, 12, 153, 72, 153 | B | data | data |
+---------+----------------------+-----------+-------------+-------+
| 23 | 23, 53 | B | data | data |
+---------+----------------------+-----------+-------------+-------+
If I open a page that calls for houses only in area 53, how would I search for it? I know in MySQL you can use FIND_IN_SET, but I am using Microsoft SQL Server 2005.
If your formatting is EXACTLY
N1, N2 (e.g.) one comma and space between each N
Then use this WHERE clause
WHERE ', ' + AreaID + ',' LIKE '%, 53,%'
The addition of the prefix and suffix makes every number, anywhere in the list, consistently prefixed by comma-space and suffixed by a comma. Otherwise, you may get false positives with 53 appearing as part of another number.
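The effect of wrapping both sides in delimiters can be checked with a small Python sketch. It assumes the exact "N1, N2" formatting described above, and in_area is a hypothetical helper name.

```python
def in_area(area_ids, target):
    """Return True when target appears as a whole item in a 'N1, N2, N3' list."""
    wrapped = ", " + area_ids + ","       # same prefix and suffix as the WHERE clause
    return f", {target}," in wrapped      # same idea as LIKE '%, 53,%'


print(in_area("17, 32, 53", 53))       # True
print(in_area("53, 12, 153, 72", 53))  # True
print(in_area("153, 253", 53))         # False
print("53" in "153, 253")              # True: the naive search gives a false positive
```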
Note
A LIKE expression will be anything but fast, since it will always scan the entire table.
You should consider normalizing the data into two tables:
Tables become
House
+---------+-----------+-------------+-------+
| HouseID | HouseType | Description | Title |
+---------+-----------+-------------+-------+
| 21 | B | data | data |
| 23 | B | data | data |
| 24 | B | data | data |
| 23 | B | data | data |
+---------+-----------+-------------+-------+
HouseArea
+---------+--------+
| HouseID | AreaID |
+---------+--------+
| 21 | 17 |
| 21 | 32 |
| 21 | 53 |
| 23 | 23 |
| 23 | 73 |
+---------+--------+
..etc
Then you can use
select * from house h
where exists (
select *
from housearea a
where h.houseid=a.houseid and a.areaid=53)
Two options: change the IDs of AreaID so that you can use the & operator, or create a table that links the House and the Areas.
What datatype is AreaID?
If it's a text field you could do something like
WHERE (
AreaID LIKE '53,%' -- Covers: multi number seq w/ 53 at beginning
OR AreaID LIKE '% 53,%' -- Covers: multi number seq w/ 53 in middle
OR AreaID LIKE '% 53' -- Covers: multi number seq w/ 53 at end
OR AreaID = '53' -- Covers: single number seq w/ only 53
)
Note: I haven't used SQL-Server in some time, so I'm not sure about the operators. PostgreSQL has a regex function, which would be better at condensing that WHERE statement. Also, I'm not sure if the above example would include numbers like 253 or 531; it shouldn't but you still need to verify.
Furthermore, there are a bunch of functions that iterate through arrays, so storing it as an array vs text might be better. Finally, this might be a good example to use a stored procedure, so you can call your homebrewed function instead of cluttering your SQL.
Use a Split function to convert comma-separated values into rows.
CREATE TABLE Areas (AreaID int PRIMARY KEY);
CREATE TABLE Houses (HouseID int PRIMARY KEY, AreaIDList varchar(max));
GO
INSERT INTO Areas VALUES (84);
INSERT INTO Areas VALUES (24);
INSERT INTO Areas VALUES (66);
INSERT INTO Houses VALUES (1, '84,24,66');
INSERT INTO Houses VALUES (2, '24');
GO
CREATE FUNCTION dbo.Split (@values varchar(512)) RETURNS table
AS
RETURN
WITH Items (Num, Start, [Stop]) AS (
SELECT 1, 1, CHARINDEX(',', @values)
UNION ALL
SELECT Num + 1, [Stop] + 1, CHARINDEX(',', @values, [Stop] + 1)
FROM Items
WHERE [Stop] > 0
)
SELECT Num, SUBSTRING(@values, Start,
CASE WHEN [Stop] > 0 THEN [Stop] - Start ELSE LEN(@values) END) Value
FROM Items;
GO
CREATE VIEW dbo.HouseAreas
AS
SELECT h.HouseID, s.Num HouseAreaNum,
CASE WHEN s.Value NOT LIKE '%[^0-9]%'
THEN CAST(s.Value AS int)
END AreaID
FROM Houses h
CROSS APPLY dbo.Split(h.AreaIDList) s
GO
SELECT DISTINCT h.HouseID, ha.AreaID
FROM Houses h
INNER JOIN HouseAreas ha ON ha.HouseID = h.HouseID
WHERE ha.AreaID = 24
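To make the data flow easier to follow, here is a rough Python sketch of what the Split function and the HouseAreas view do together. It is an illustration, not a replacement for the T-SQL; split_values and house_areas are made-up names, and unlike the view, an empty item becomes None here instead of being cast to a number.

```python
def split_values(values, sep=","):
    """Yield (num, value) pairs, numbering the items from 1 like dbo.Split."""
    for num, value in enumerate(values.split(sep), start=1):
        yield num, value


def house_areas(houses):
    """Flatten {house_id: 'a,b,c'} into (house_id, num, area_id) rows.

    area_id is None when the item is not made up purely of ASCII digits,
    similar to the CASE expression in the view.
    """
    rows = []
    for house_id, area_list in houses.items():
        for num, value in split_values(area_list):
            is_number = value.isascii() and value.isdigit()
            rows.append((house_id, num, int(value) if is_number else None))
    return rows


houses = {1: "84,24,66", 2: "24"}
matches = sorted({house_id for house_id, _, area_id in house_areas(houses) if area_id == 24})
print(matches)  # [1, 2]
```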