Query SQL Server with wildcard IN the database

I need to build a query where the criteria must be matched against wildcards stored in the database.
An example makes it clearer.
I have a column holding a value like this: 963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX.
The ~ character is a wildcard.
So the following criteria must all match:
~63-4-AKS~M
963-4-AKS1M
963-4-AKS~M2RN21AXA150AAA
963-4-AKSAM2RN21AXA150AAA
963-4-AKSCM2RN21AXA150A060C1D1DSDXX
963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX
I've tried so many things that my head hurts :(
The other way round (with the wildcard in the criteria) is no problem, easy. But in this direction I cannot find the key.
The problem is that when the field contains a ~, it doesn't match. So here only the first and last criteria match with the following statement:
SELECT myField FROM myTable WHERE myField LIKE REPLACE('%' + myCriteria + '%', '~', '_');

It seems the patterns and the field are left-aligned, i.e. they are compared character by character starting from the first position.
If this is indeed the case, with my head bowed (full of sadness), here is a function.
create function is_a_match (@myField varchar(100),@myCriteria varchar(100))
returns bit
as
begin
    declare @i int = 0
           ,@is_a_match bit = 1
           ,@len_myField int = len(@myField)
           ,@len_myCriteria int = len(@myCriteria)
           ,@myField_c char(1)
           ,@myCriteria_c char(1)
    while 1=1
    begin
        set @i += 1
        -- every character of the criteria has matched
        if @i > @len_myCriteria break
        -- the criteria is longer than the field: no match
        if @i > @len_myField
        begin
            set @is_a_match = 0
            break
        end
        set @myField_c = substring(@myField ,@i,1)
        set @myCriteria_c = substring(@myCriteria,@i,1)
        -- a position matches if either side is the wildcard or both characters are equal
        if not (@myField_c = '~' or @myCriteria_c = '~' or @myField_c = @myCriteria_c)
        begin
            set @is_a_match = 0
            break
        end
    end
    return @is_a_match
end
GO
select myCriteria
,dbo.is_a_match (myField,myCriteria) as is_a_match
from (values ('~63-4-AKS~M' )
,('963-4-AKS1M' )
,('963-4-AKS~M2RN21AXA150AAA' )
,('963-4-AKSAM2RN21AXA150AAA' )
,('963-4-AKSCM2RN21AXA150A060C1D1DSDXX' )
,('963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX' )
,('963-4-AKS~M2RN21AXA150~~~0C1X1D~~XX' )
,('963-4-AKS~M2RN21AXA150~~~0C1D1D~~XXYY')
) c (myCriteria)
,(values ('963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX' )
) f (myField)
+---------------------------------------+------------+
| myCriteria | is_a_match |
+---------------------------------------+------------+
| ~63-4-AKS~M | 1 |
+---------------------------------------+------------+
| 963-4-AKS1M | 1 |
+---------------------------------------+------------+
| 963-4-AKS~M2RN21AXA150AAA | 1 |
+---------------------------------------+------------+
| 963-4-AKSAM2RN21AXA150AAA | 1 |
+---------------------------------------+------------+
| 963-4-AKSCM2RN21AXA150A060C1D1DSDXX | 1 |
+---------------------------------------+------------+
| 963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX | 1 |
+---------------------------------------+------------+
| 963-4-AKS~M2RN21AXA150~~~0C1X1D~~XX | 0 |
+---------------------------------------+------------+
| 963-4-AKS~M2RN21AXA150~~~0C1D1D~~XXYY | 0 |
+---------------------------------------+------------+
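For anyone who wants to experiment with the rule outside the database, the same left-aligned, position-by-position comparison can be sketched in Python. This is only an illustration under assumptions: the `wildcard` parameter is made up here, the comparison is case-sensitive (in SQL Server that depends on the collation), and trailing spaces are not ignored the way len() ignores them in T-SQL.

```python
def is_a_match(field: str, criteria: str, wildcard: str = "~") -> bool:
    """Return True when criteria matches field from the left, treating the
    wildcard on either side as 'any single character'."""
    # A criteria longer than the field can never match.
    if len(criteria) > len(field):
        return False
    # zip() stops at the end of the criteria, which is the shorter (or equal) string.
    return all(
        f == wildcard or c == wildcard or f == c
        for f, c in zip(field, criteria)
    )


field = "963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX"
for criteria in (
    "~63-4-AKS~M",
    "963-4-AKSCM2RN21AXA150A060C1D1DSDXX",
    "963-4-AKS~M2RN21AXA150~~~0C1X1D~~XX",
    "963-4-AKS~M2RN21AXA150~~~0C1D1D~~XXYY",
):
    print(criteria, is_a_match(field, criteria))
```

The first two criteria print True and the last two print False, in line with the result table above.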
You are mixing up the field and the patterns.
The field may not hold wildcards.
E.g.
This is not a match because of the 'A's
963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX
963-4-AKSAM2RN21AXA150AAA

If you could tighten the constraints on the wildcard, you might have a fighting chance here. What I mean is that you could generate the valid permutations whenever a wildcard is present in the persisted data, and then run your existing query against those permutations.
But if every wildcard has 36 possible options, this becomes exponentially painful.
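To get a feel for how quickly this grows, here is a rough Python sketch. The 36-character alphabet (digits plus uppercase letters) is an assumption taken from the "36 possible options" above, and the helper names are hypothetical.

```python
from itertools import product
from typing import Iterator

# Assumed alphabet: 10 digits + 26 uppercase letters = 36 options per wildcard.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def count_expansions(field: str, alphabet: str = ALPHABET, wildcard: str = "~") -> int:
    """Number of concrete strings a field with wildcards expands to."""
    return len(alphabet) ** field.count(wildcard)


def expand(field: str, alphabet: str = ALPHABET, wildcard: str = "~") -> Iterator[str]:
    """Lazily yield every concrete string obtained by filling in the wildcards."""
    parts = field.split(wildcard)
    for combo in product(alphabet, repeat=len(parts) - 1):
        # Interleave the literal parts with one chosen character per wildcard.
        pieces = [parts[0]]
        for ch, part in zip(combo, parts[1:]):
            pieces.append(ch)
            pieces.append(part)
        yield "".join(pieces)


print(list(expand("A~B~", alphabet="01")))
print(count_expansions("963-4-AKS~M2RN21AXA150~~~0C1D1D~~XX"))
```

The sample field contains six wildcards, so under this assumption it expands to 36^6 = 2,176,782,336 strings for a single row, which is why the approach only looks feasible when the wildcard is tightly constrained.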

Related

Teradata SQL select string of multiple capital letters in a text field

Any help would be much appreciated on figuring out how to identify Acronyms within a text field that has mixed upper and lower case letters.
For example, we might have
"we used the BBQ sauce on the Chicken"
I need my query to SELECT "BBQ" and nothing else in the cell.
There could be multiple capitalized strings per row.
The output should include the uppercase string.
Any ideas are much appreciated!!
This is going to be kind of ugly. I tried to use REGEXP_SPLIT_TO_TABLE to just pull out the all caps words, but couldn't make it work.
I would do it by first using strtok_split_to_table, so each word will end up in its own row.
First, some dummy data:
create volatile table vt
(id integer,
col1 varchar(20))
on commit preserve rows;
insert into vt
values (1,'foo BAR');
insert into vt
values (2,'fooBAR');
insert into vt
values(3,'blah FOO FOO blah');
We can use strtok_split_to_table on this:
select
t.*
from table
(strtok_split_to_table(vt.id ,vt.col1,' ')
returns
(tok_key integer
,tok_num INTEGER
,tok_value VARCHAR(30)
)) AS t
That will split each value into separate rows, using a space as a delimiter.
Finally, we can compare each of those values to that value in upper case:
select
vt.id,
vt.col1,
tok_key,
tok_num,
tok_value,
case when upper(t.tok_value) = t.tok_value (CASESPECIFIC) then tok_value else '0' end as TEST_OUTPUT
from
(
select
t.*
from table
(strtok_split_to_table(vt.id ,vt.col1,' ')
returns
(tok_key integer
,tok_num INTEGER
,tok_value VARCHAR(30)
)) AS t
) t
inner join vt
on t.tok_key = vt.id
order by id,tok_num
Taking our lovely sample data, you'll get:
+----+-------------------+---------+---------+-----------+-------------+
| id | col1 | tok_key | tok_num | tok_value | TEST_OUTPUT |
+----+-------------------+---------+---------+-----------+-------------+
| 1 | foo BAR | 1 | 1 | foo | 0 |
| 1 | foo BAR | 1 | 2 | BAR | BAR |
| 2 | fooBAR | 2 | 1 | fooBAR | 0 |
| 3 | blah FOO FOO blah | 3 | 1 | blah | 0 |
| 3 | blah FOO FOO blah | 3 | 2 | FOO | FOO |
| 3 | blah FOO FOO blah | 3 | 3 | FOO | FOO |
| 3 | blah FOO FOO blah | 3 | 4 | blah | 0 |
+----+-------------------+---------+---------+-----------+-------------+
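The split-then-compare idea can also be sketched in Python for quick experiments. This is an illustration only, with a hypothetical helper name; it splits on any whitespace rather than on a single space, and, like the SQL comparison, it also accepts tokens without letters (such as "123"), because those equal their own upper-cased form.

```python
from typing import List, Tuple

# Same dummy data as the volatile table above.
rows = [(1, "foo BAR"), (2, "fooBAR"), (3, "blah FOO FOO blah")]


def uppercase_tokens(text: str) -> List[Tuple[int, str]]:
    """Return (token number, token) for every token equal to its upper-cased form."""
    return [
        (num, tok)
        for num, tok in enumerate(text.split(), start=1)
        if tok == tok.upper()
    ]


for row_id, col1 in rows:
    print(row_id, uppercase_tokens(col1))
```

Row 1 yields BAR as token 2, row 2 yields nothing, and row 3 yields FOO as tokens 2 and 3, matching the non-zero TEST_OUTPUT rows above.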
Defining acronyms as all uppercase words with 2 to 5 characters with a '\b[A-Z]{2,5}\b' regex:
WITH cte AS
( -- using @Andrew's Volatile Table
SELECT *
FROM vt
-- only rows containing acronyms
WHERE RegExp_Similar(col1, '.*\b[A-Z]{2,5}\b.*') = 1
)
SELECT
outkey,
tokenNum,
CAST(RegExp_Substr(Token, '[A-Z]*') AS VARCHAR(5)) AS acronym -- 1st uppercase word
--,token
FROM TABLE
( RegExp_Split_To_Table
( cte.id,
cte.col1,
-- split before an acronym, might include additional characters after
-- [^A-Z]*? = any number of non uppercase letters (removed)
-- (?= ) = positive lookahead, i.e. check, but don't remove
'[^A-Z]*?(?=\b[A-Z]{2,5}\b)',
'' -- defaults to case sensitive
) RETURNS
( outKey INT,
TokenNum INT,
Token VARCHAR(30000) -- adjust to match the size of your input column
)
) AS t
WHERE acronym <> ''
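The acronym pattern itself can be tried out in isolation with Python's re module. This is just a sketch for testing the regex, not a replacement for the Teradata query; the function name is hypothetical.

```python
import re

# Acronym = a whole word made of 2 to 5 uppercase ASCII letters.
ACRONYM = re.compile(r"\b[A-Z]{2,5}\b")


def find_acronyms(text: str) -> list:
    """Return every acronym found in text, in order of appearance."""
    return ACRONYM.findall(text)


for sample in (
    "we used the BBQ sauce on the Chicken",
    "foo BAR",
    "fooBAR",
    "blah FOO FOO blah",
):
    print(repr(sample), find_acronyms(sample))
```

Note that "fooBAR" produces no match, because there is no word boundary between "foo" and "BAR", and "Chicken" is skipped because only its first letter is uppercase.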
I am not 100% sure what you are trying to do, but I think you have several options, e.g.:
Option 1) Check whether the acronym (like BBQ) exists in the string (basic syntax):
SELECT CHARINDEX ('BBQ',@string)
In this case you would need a table of all known acronyms you want to check for, then test each of them against your string and return the ones that match.
DECLARE @string VARCHAR(100)
SET @string = 'we used the BBQ sauce on the Chicken'
-- assume a table [acrs] with these rows:
--+--- acronym-----+
--+ BBQ +
--+ IBM +
--+ AMD +
--+ ETC +
--+----------------+
SELECT acronym FROM [acrs] WHERE CHARINDEX ([acronym], @string ) > 0
This should return : 'BBQ'
Option 2) Load all the upper case characters into a temp table etc. for further logic and processing. I think you could use something like this...
DECLARE @string VARCHAR(100)
SET @string = 'we used the BBQ sauce on the Chicken'
-- make table of all Upper case letters and process individually
;WITH cte_loop(position, acrn)
AS (
SELECT 1, SUBSTRING(@string, 1, 1)
UNION ALL
SELECT position + 1, SUBSTRING(@string, position + 1, 1)
FROM cte_loop
WHERE position < LEN(@string)
)
SELECT position, acrn, ascii(acrn) AS [ascii]
FROM cte_loop
WHERE ascii(acrn) > 64 AND ascii(acrn) < 91 -- see the ASCII table for all codes
This would return a table like this:
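As a cross-check of what the recursive CTE selects, the same character-by-character scan can be sketched in Python (the function name is hypothetical; positions are 1-based to match SUBSTRING):

```python
def uppercase_letters(text: str) -> list:
    """Return (position, character, ASCII code) for each uppercase ASCII letter."""
    return [
        (pos, ch, ord(ch))
        for pos, ch in enumerate(text, start=1)
        if 64 < ord(ch) < 91  # same range check as the WHERE clause: 'A'..'Z'
    ]


for row in uppercase_letters("we used the BBQ sauce on the Chicken"):
    print(row)
```

For the sample sentence this yields positions 13, 14 and 15 (B, B, Q with codes 66, 66, 81) and position 30 (C with code 67), so the capital C of "Chicken" is included and would still need to be filtered out by later logic.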

How to convert string to number based on units

I am trying to change the following strings into their respective numerical values, by identifying the units (millions or billions) and then multiplying accordingly. I believe I am having issues with the variable types but can't seem to find a solution. Any tips?
1.44B to 1,440,000,000
1.564M to 1,564,000
UPDATE [_ParsedXML_Key_Stats]
SET [Value] = CASE
WHEN right(rtrim([_ParsedXML_Key_Stats].[Value]),1) = 'B' And [_ParsedXML_Key_Stats].[NodeName] = 'EBITDA'
THEN substring(rtrim([_ParsedXML_Key_Stats].[Value]),1,len([_ParsedXML_Key_Stats].[Value])-1) * 1000000000
WHEN right(rtrim([_ParsedXML_Key_Stats].[Value]),1) = 'M' And [_ParsedXML_Key_Stats].[NodeName] = 'EBITDA'
THEN substring(rtrim([_ParsedXML_Key_Stats].[Value]),1,len([_ParsedXML_Key_Stats].[Value])-1) * 1000000
ELSE 0
END
With your original query I got a conversion error, because the multiplication tried to convert the decimal string to an int. I guess you ran into the same problem.
One remedy that fixed it was to turn the factor into a decimal by appending .0 to it.
If you want the number formatted with commas, you can use the FORMAT function like so: FORMAT(CAST(value AS DECIMAL), 'N0') (be sure to specify an appropriate precision and scale for the decimal type).
Sample test data and output from SQL Fiddle below:
SQL Fiddle
MS SQL Server 2014 Schema Setup:
CREATE TABLE [_ParsedXML_Key_Stats] (value VARCHAR(50), NodeName VARCHAR(50));
INSERT [_ParsedXML_Key_Stats] VALUES
('111', 'SOMETHING ELSE'),
('999', 'EBITDA'),
('47.13B', 'EBITDA'),
('1.44B', 'EBITDA'),
('1.564M', 'EBITDA');
WITH cte AS
(
SELECT
Value,
CAST(LEFT([Value],LEN([Value])-1) AS DECIMAL(28,6)) AS newValue,
RIGHT(RTRIM([Value]),1) AS c
FROM [_ParsedXML_Key_Stats]
WHERE [NodeName] = 'EBITDA'
AND RIGHT(RTRIM([Value]),1) IN ('B','M')
)
UPDATE cte
SET [Value] =
CASE
WHEN c = 'B' THEN newValue * 1000000000.0
WHEN c = 'M' THEN newValue * 1000000.0
END;
Query 1:
SELECT *, FORMAT(CAST(Value AS DECIMAL(18,0)),'N0') AS formattedValue
FROM _ParsedXML_Key_Stats
Results:
| value | NodeName | formattedValue |
|--------------------|----------------|----------------|
| 111 | SOMETHING ELSE | 111 |
| 999 | EBITDA | 999 |
| 47130000000.000000 | EBITDA | 47,130,000,000 |
| 1440000000.000000 | EBITDA | 1,440,000,000 |
| 1564000.000000 | EBITDA | 1,564,000 |
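The suffix-to-multiplier logic can also be checked outside SQL. The following Python sketch uses Decimal to avoid the int/float pitfalls discussed above; the function name and the multiplier table are assumptions for illustration, and values without a B or M suffix are simply parsed as they are.

```python
from decimal import Decimal

# Assumed unit suffixes: M = millions, B = billions.
MULTIPLIERS = {"M": Decimal(10) ** 6, "B": Decimal(10) ** 9}


def parse_units(value: str) -> Decimal:
    """Convert strings such as '1.44B' or '1.564M' to their numeric value."""
    text = value.strip()
    suffix = text[-1:]
    if suffix in MULTIPLIERS:
        # Strip the suffix, parse the rest exactly, then scale it.
        return Decimal(text[:-1]) * MULTIPLIERS[suffix]
    return Decimal(text)


for raw in ("1.44B", "1.564M", "47.13B", "999"):
    # ',.0f' adds thousands separators, similar to FORMAT(..., 'N0').
    print(raw, "->", f"{parse_units(raw):,.0f}")
```

With the sample values this prints 1,440,000,000, 1,564,000, 47,130,000,000 and 999, the same figures as the formattedValue column above.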

Aggregates on the right side of an APPLY cannot reference columns from the left side

I am trying to make some sense of a xBase type database with some 2000 tables. Rather than importing them all into a SQL Server database, I wanted to import the tables one-by-one using a 'SELECT INTO tmpDBF' statement, then extract what I want to know like table structure and value ranges for each of the columns. Then, when I import the next table I want to be able to run the same query against a differently structured tmpDBF table.
I was hoping to do this using a cross apply, but I come up against the above error message.
select cols.column_name 'Name', cols.data_type 'Type', mv.minV 'Minimum'
from information_schema.columns cols
cross apply (select MIN(cols.column_name) minV FROM tmpDBF ) mv
where cols.table_name = 'tmpDBF'
Is there a way to restructure the query, or have I turned into a dead-end street?
Added on October 6:
Given tmpDBF
Who | Zip
--------|------
Charlie | 97689
Foxtrot | 92143
Delta | 12011
I would like to see the following result
Name | Type | Minimum | Maximum
-----|---------|---------|--------
Who | varchar | Charlie | Foxtrot
Zip | int | 12011 | 97689
I realise that the Minimum and Maximum columns need to be cast as varchars.
This is not possible for two reasons:
1. You cannot dynamically change a column name in a query.
2. You cannot mix multiple data types in a single column.
But to get you something similar to what you are looking for you can flip the problem around like this:
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE dbo.a(c1 INT, c2 INT, c3 DATE);
INSERT INTO dbo.a VALUES(1,2,'2013-04-05'),(4,5,'2010-11-10'),(7,8,'2012-07-09');
Query 1:
SELECT
MIN(c1) c1_min,MAX(c1) c1_max,
MIN(c2) c2_min,MAX(c2) c2_max,
MIN(c3) c3_min,MAX(c3) c3_max
FROM dbo.a;
Results:
| C1_MIN | C1_MAX | C2_MIN | C2_MAX | C3_MIN | C3_MAX |
|--------|--------|--------|--------|------------|------------|
| 1 | 7 | 2 | 8 | 2010-11-10 | 2013-04-05 |
That gives you all the column minima and maxima in a single row. (It's not dynamic yet. Stay with me...)
To make it a little more readable you can use a sort of UNPIVOT like this:
Query 2:
SELECT
CASE X.FN WHEN 1 THEN 'MIN' ELSE 'MAX' END AS FN,
CASE X.FN WHEN 1 THEN c1_min ELSE c1_max END AS c1,
CASE X.FN WHEN 1 THEN c2_min ELSE c2_max END AS c2,
CASE X.FN WHEN 1 THEN c3_min ELSE c3_max END AS c3
FROM(
SELECT
MIN(c1) c1_min,MAX(c1) c1_max,
MIN(c2) c2_min,MAX(c2) c2_max,
MIN(c3) c3_min,MAX(c3) c3_max
FROM dbo.a)AGG
CROSS JOIN (VALUES(1),(2))X(FN)
ORDER BY X.FN;
Results:
| FN | C1 | C2 | C3 |
|-----|----|----|------------|
| MIN | 1 | 2 | 2010-11-10 |
| MAX | 7 | 8 | 2013-04-05 |
Now to make it dynamic we have to build that query on the fly, like this:
Query 3:
DECLARE @cmd NVARCHAR(MAX);
SET @cmd =
'SELECT CASE X.FN WHEN 1 THEN ''MIN'' ELSE ''MAX'' END AS FN'+
(SELECT ',CASE X.FN WHEN 1 THEN '+name+'_min ELSE '+name+'_max END AS '+name
FROM sys.columns WHERE object_id = OBJECT_ID('dbo.a')
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)')+
' FROM(SELECT '+
STUFF((SELECT ',MIN('+name+') '+name+'_min,MAX('+name+') '+name+'_max'
FROM sys.columns WHERE object_id = OBJECT_ID('dbo.a')
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'),1,1,'')+
' FROM dbo.a)AGG CROSS JOIN (VALUES(1),(2))X(FN) ORDER BY X.FN;';
EXEC(@cmd);
Results:
| FN | C1 | C2 | C3 |
|-----|----|----|------------|
| MIN | 1 | 2 | 2010-11-10 |
| MAX | 7 | 8 | 2013-04-05 |
This query takes the columns of the table at runtime, builds the appropriate query dynamically and executes it. It contains the table name ('dbo.a') in three places. If you want it to work with different tables you need to replace all three.
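The same "read the column list at runtime, then build the MIN/MAX query" idea can be sketched from client code. The example below is a swapped-in illustration, not the SQL Server solution: it uses Python's built-in sqlite3 module with an in-memory SQLite database, `PRAGMA table_info` in place of sys.columns, and a hypothetical helper named `column_ranges`. Because the table name is interpolated into the SQL text, it should only be used with trusted names.

```python
import sqlite3


def column_ranges(conn: sqlite3.Connection, table: str) -> list:
    """Return (name, declared type, minimum, maximum) for each column of table."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    cols = [(row[1], row[2]) for row in conn.execute(f'PRAGMA table_info("{table}")')]
    # Build one MIN/MAX pair per column and run a single aggregate query.
    select_list = ", ".join(f'MIN("{name}"), MAX("{name}")' for name, _ in cols)
    values = conn.execute(f'SELECT {select_list} FROM "{table}"').fetchone()
    # Values arrive as min0, max0, min1, max1, ... in column order.
    return [
        (name, col_type, values[2 * i], values[2 * i + 1])
        for i, (name, col_type) in enumerate(cols)
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmpDBF (Who TEXT, Zip INTEGER)")
conn.executemany(
    "INSERT INTO tmpDBF VALUES (?, ?)",
    [("Charlie", 97689), ("Foxtrot", 92143), ("Delta", 12011)],
)
for row in column_ranges(conn, "tmpDBF"):
    print(row)
```

Doing the reshaping in client code sidesteps the restriction on mixing data types in one result column, since each minimum and maximum keeps its own type.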
Try something like
select cols.column_name 'Name', cols.data_type 'Type', mv.minV 'Minimum'
from information_schema.columns cols
cross apply (select MIN(cols.column_name) minV FROM tmpDBF
WHERE tmpDBF.CommonCol = cols.CommonCol) mv
where cols.table_name = 'tmpDBF'

SQL for comparison of strings comprised of number and text

I need to compare two strings that contain a number and possibly text. For example, I have this table:
id | label 1 | label 2 |
1 | 12/H | 1 |
2 | 4/A | 41/D |
3 | 13/A | 3/F |
4 | 8/A | 8/B |
..
I need to determine the direction so that if Label 1 < Label2 then Direction is W (with) else it is A (against). So I have to build a view that presents data this way:
id | Direction
1 | A |
2 | W |
3 | A |
4 | W |
..
I'm using postgres 9.2.
WITH x AS (
SELECT id
,split_part(label1, '/', 1)::int AS l1_nr
,split_part(label1, '/', 2) AS l1_txt
,split_part(label2, '/', 1)::int AS l2_nr
,split_part(label2, '/', 2) AS l2_txt
FROM t
)
SELECT id
,CASE WHEN (l1_nr, l1_txt) < (l2_nr, l2_txt)
THEN 'W' ELSE 'A' END AS direction
FROM x;
I split each label into its two parts with split_part() and compare the resulting (number, text) pairs as ad-hoc row values to decide which label is bigger.
The cases where both labels are equal or where either one is NULL have not been defined.
The CTE is not necessary; it just makes the query easier to read.
-> sqlfiddle
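The row-value comparison (number first, text as tie-breaker) behaves like tuple comparison in Python, which makes it easy to check against the sample rows. This is a sketch with hypothetical function names; Python compares the text part by code point, whereas PostgreSQL uses the column's collation, so results could differ for non-ASCII text.

```python
def parse_label(label: str) -> tuple:
    """Split '12/H' into (12, 'H'); a label without '/' gets an empty text part."""
    number, _, text = label.partition("/")
    return int(number), text


def direction(label1: str, label2: str) -> str:
    """'W' (with) when label1 sorts before label2, otherwise 'A' (against)."""
    return "W" if parse_label(label1) < parse_label(label2) else "A"


for row_id, l1, l2 in [(1, "12/H", "1"), (2, "4/A", "41/D"), (3, "13/A", "3/F"), (4, "8/A", "8/B")]:
    print(row_id, direction(l1, l2))
```

For the four sample rows this gives A, W, A, W, matching the expected view. Row 4 shows why the text part matters: the numbers are equal, so the letters decide.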
You can try something like:
SELECT id, CASE WHEN regexp_replace(label_1,'[^0-9]','','g')::numeric <
regexp_replace(label_2,'[^0-9]','','g')::numeric
THEN 'W'
ELSE 'A'
END
FROM table1
regexp_replace deletes all non-numeric characters from the string; ::numeric then converts the result to numeric.
Details here: regexp_replace, pattern matching, CASE WHEN

How to merge multiple rows into one row with filtering rules in SQL Server

I have a table like this:
+---------------+---------------+----------------+---------------------+
| MedicalCardId | DiagnosisType | DiagnosisOrder | Symptom |
+---------------+---------------+----------------+---------------------+
| 1 | Main | 1 | Lung Cancer |
| 1 | Secondary | 1 | High Blood Pressure |
| 1 | Secondary | 2 | Heart Attack |
| 1 | Secondary | 3 | Gastritis |
| 2 | Main | 1 | Diabetes |
| 2 | Secondary | 1 | Kidney Malfunction |
| 3 | Main | 1 | Flu |
+---------------+---------------+----------------+---------------------+
The DiagnosisOrder for each 'Main' DiagnosisType is 1, and for 'Secondary' DiagnosisType of the same MedicalCardId, it restarts to increase from 1.
I would like to merge multiple rows of the same MedicalCardId into a single row, and each Symptom becomes a new column depending on its DiagnosisType and DiagnosisOrder
The query result is expected to be like:
+---------------+-------------+---------------------+-------------------+-------------------+
| MedicalCardId | MainSymptom | SecondarySymptom1 | SecondarySymptom2 | SecondarySymptom3 |
+---------------+-------------+---------------------+-------------------+-------------------+
| 1 | Lung Cancer | High Blood Pressure | Heart Attack | Gastritis |
| 2 | Diabetes | Kidney Malfunction | | |
| 3 | Flu | | | |
+---------------+-------------+---------------------+-------------------+-------------------+
I've tried using PIVOT, but I haven't been able to apply it to my case.
You can try with conditional aggregation -
select MedicalCardId,
max(case when DiagnosisType='Main' then Symptom end) as MainSymptom,
max(case when DiagnosisType='Secondary' and DiagnosisOrder=1 then Symptom end) as SecondarySymptom1,
max(case when DiagnosisType='Secondary' and DiagnosisOrder=2 then Symptom end) as SecondarySymptom2,
max(case when DiagnosisType='Secondary' and DiagnosisOrder=3 then Symptom end) as SecondarySymptom3
from tablename
group by MedicalCardId
I believe you need to create a dynamic pivot table. You can't use a normal pivot query because you don't know how many secondary symptoms there are, and therefore you don't know how many columns to create. Below is a stored procedure that works. The first step is to create a VARCHAR variable (@Columns) that stores the dynamic column names: [MainSymptom], [SecondarySymptom1], [SecondarySymptom2], [SecondarySymptom3] and so on (I used a CASE expression to create the column names per your expected query result). The second step is to create another VARCHAR variable (@SQL) that holds the pivot query, which is assembled by string concatenation.
Kris Wenzel has a great tutorial on dynamic pivot tables at essentialsql.com. Here is the link: https://www.essentialsql.com/create-dynamic-pivot-table-sql-server/
Here is the stored procedure.
USE [TestDB]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: <Author,,Name>
-- Create date: <Create Date,,>
-- Description: <Description,,>
-- =============================================
CREATE PROCEDURE [dbo].[GenerateData]
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- Insert statements for procedure here
--GATHER PIVOT COLUMNS DYNAMICALLY
DECLARE @Columns as VARCHAR(MAX)
SELECT @Columns =
COALESCE(@Columns + ', ','') + QUOTENAME([Diagnosis])
FROM
(SELECT DISTINCT case when [DiagnosisOrder] = 1 and [DiagnosisType] = 'Main' then 'MainSymptom' else 'SecondarySymptom' + CAST([DiagnosisOrder] AS VARCHAR) end [Diagnosis] FROM [TestDB].[dbo].[test] ) AS B
ORDER BY B.[Diagnosis]
--CREATE SQL QUERY FOR PIVOT TABLE
DECLARE @SQL as VARCHAR(MAX)
SET @SQL = 'SELECT MedicalCardId, ' + @Columns + '
FROM
(
select [MedicalCardId]
,[Diagnosis]
,[Symptom]
from
(
SELECT [MedicalCardId]
,case when [DiagnosisOrder] = 1 and [DiagnosisType] = ''Main'' then ''MainSymptom'' else ''SecondarySymptom'' + CAST([DiagnosisOrder] AS VARCHAR) end [Diagnosis]
,[Symptom]
FROM [TestDB].[dbo].[test]
) A
) t
PIVOT(
MAX([Symptom])
FOR [Diagnosis] IN (' + @Columns + ')
) AS pivot_table order by [MedicalCardId]'
--EXECUTE SQL
EXEC(@SQL)
END
GO
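The two steps of the procedure (first discover the column names from the data, then pivot on them) can be illustrated with a small Python sketch over the sample rows from the question. The helper names are hypothetical, and missing cells are shown as empty strings here, whereas PIVOT would return NULL.

```python
rows = [
    (1, "Main", 1, "Lung Cancer"),
    (1, "Secondary", 1, "High Blood Pressure"),
    (1, "Secondary", 2, "Heart Attack"),
    (1, "Secondary", 3, "Gastritis"),
    (2, "Main", 1, "Diabetes"),
    (2, "Secondary", 1, "Kidney Malfunction"),
    (3, "Main", 1, "Flu"),
]


def column_name(diagnosis_type: str, diagnosis_order: int) -> str:
    """Same naming rule as the CASE expression in the procedure."""
    if diagnosis_type == "Main" and diagnosis_order == 1:
        return "MainSymptom"
    return f"SecondarySymptom{diagnosis_order}"


def pivot(data):
    # Step 1: collect the distinct column names that actually occur in the data.
    columns = sorted({column_name(dtype, order) for _, dtype, order, _ in data})
    # Step 2: place each symptom in the cell for its card and column.
    cards = {}
    for card_id, dtype, order, symptom in data:
        cards.setdefault(card_id, {})[column_name(dtype, order)] = symptom
    header = ["MedicalCardId"] + columns
    table = [
        [card_id] + [cells.get(col, "") for col in columns]
        for card_id, cells in sorted(cards.items())
    ]
    return header, table


header, table = pivot(rows)
print(header)
for line in table:
    print(line)
```

As in the procedure, the column list is sorted as text, so with ten or more secondary symptoms SecondarySymptom10 would sort before SecondarySymptom2.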