Problem Statement:
I have a Formula column that contains an arithmetic expression. I want to extract the variable names from the formula, separate them with commas, and put the result in a new column, "FormulaComponents".
The variable names follow this pattern - '%[^A-Za-z,_0-9 ]%'
However, I also want to keep the square brackets if they appear in the formula.
To Illustrate,
Input Data:
ID | Formula
------|-------------------------------------------
1 | ([x1] + [x2]) / 100
2 | ([y1] - [x2]) * 100
3 | z1 - z3
4 | [z4] % z3
5 | ((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100
Desired Output
ID | Formula | FormulaComponents
------|------------------------------------------ |-----------------
1 | ([x1] + [x2]) / 100 | [x1],[x2]
2 | ([y1] - [x2]) * 100 | [y1],[x2]
3 | z1 - z3 | [z1],[z3]
4 | [z4] % z3 | [z4],[z3]
5 | ((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100 | [x1],[y1],[z3]
As you can see above,
Row 1. The Formula column contains two variables, so the formula
components are [x1],[x2]
Row 5. Note that x1 appears twice in the formula: once as x1
and once as [x1]. In this case, I want to keep [x1] only once. [x1] could appear any number of
times in the Formula column, but should appear only once in the FormulaComponents column
P.S.: The Order of the variables appearing in the "FormulaComponents" column does not matter. So for example, in Row 5, the order can be [y1], [z3], [x1] OR [z3],[x1],[y1] and so on
To summarize: I want to write a SELECT statement in T-SQL that will create this new column.
You can split the string using string_split() and then carefully reaggregate the results:
select *
from t cross apply
(select string_agg('[' + value + ']', ',') as components
from (select distinct replace(replace(value, '[', ''), ']', '') as value
from string_split(replace(replace(replace(replace(t.formula, '(', ' '), ')', ' '), '*', ' '), '/', ' '), ' ') s
where value like '[[a-z]%'
) s
) s;
Here is a db<>fiddle.
This is made harder than necessary because your formulas do not have a canonical format. It would be simpler if all variables were surrounded by square braces. Or if all operators were surrounded by spaces.
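To make the split-then-reaggregate logic concrete outside the database, here is a minimal Python sketch of the same idea: pull out every token that looks like a variable, drop duplicates, and wrap each name in square brackets. The function name `formula_components` and the regular expression are assumptions for illustration only. The sketch assumes names start with a letter or underscore and contain no spaces, which is narrower than the pattern given in the question.

```python
import re


def formula_components(formula: str) -> str:
    """Return the distinct variable names in a formula as '[a],[b],...'."""
    # Assumption: a variable starts with a letter or underscore and continues
    # with letters, digits, or underscores. Brackets are not part of the match,
    # so x1 and [x1] count as the same variable.
    names = re.findall(r"[A-Za-z_][A-Za-z_0-9]*", formula)
    # dict.fromkeys keeps only the first occurrence of each name, in order.
    unique = dict.fromkeys(names)
    return ",".join(f"[{name}]" for name in unique)


print(formula_components("((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100"))  # [x1],[y1],[z3]
```

Because the brackets are stripped before deduplication, the mixed use of x1 and [x1] in row 5 collapses to a single [x1], which is the behaviour the question asks for.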
EDIT:
SQL Server 2016 has string_split() but not string_agg(). That can be replaced with the XML "stuff" idiom:
select *
from t cross apply
(select stuff( (select distinct ',[' + value + ']'
from (select distinct replace(replace(value, '[', ''), ']', '') as value
from string_split(replace(replace(replace(replace(t.formula, '(', ' '), ')', ' '), '*', ' '), '/', ' '), ' ') s
where value like '[[a-z]%'
) t
order by 1
for xml path ('')
), 1, 1, '') as components
) s;
I am trying to come up with an SQLite query that retrieves all rows whose value lies between two given values (A and B), but only when a condition is met.
if (value B given is greater than the maximum value of B in the table):
- retrieve all values between A and B
Sample Table: inventory
Prod_name | model | location |
tesla | "5.6.1" | CA
toyota | "4.7.1" | WA
kia | "6.8.1" | MD
tesla | "2.6.2" | CA
chev | "7.8.4" | AZ
Input given : model between ("5.0.0" to "8.2.0")
Output : (tesla,5.6.1,CA),(kia,6.8.1,MD),(chev,7.8.4,AZ)
Input given : model between ("5.0.0" to "6.9.0")
Output: Query should not run as "7.8.4" > "6.9.0",
i.e. the max value in the table is greater than the upper limit of the input query.
Also note that the model column is stored as TEXT. I need help retrieving these rows.
I have tried SQLite "CASE" expressions but was not able to retrieve
multiple columns in the subquery.
select
case
when (select 1000000 * replace(model, '.', 'x') +
1000 * replace(substr(model, instr(model, '.') + 1), '.', 'x') +
replace(model, '.', '000') % 1000 as md from inventory ORDER BY md
DESC LIMIT 1) > (select 1000000 * replace('5.0.0', '.', 'x') +
1000 * replace(substr('5.0.0', instr('5.0.0', '.') + 1), '.', 'x') +
replace('5.0.0', '.', '000') % 1000)
THEN (select model from inventory where
1000000 * replace(model, '.', 'x') +
1000 * replace(substr(model, instr(model, '.') + 1), '.', 'x') +
replace(model, '.', '000') % 1000
between
1000000 * replace('5.0.0', '.', 'x') +
1000 * replace(substr('5.0.0', instr('5.0.0', '.') + 1), '.', 'x') +
replace('5.0.0', '.', '000') % 1000
and
1000000 * replace('8.5.0', '.', 'x') +
1000 * replace(substr('8.5.0', instr('8.5.0', '.') + 1), '.', 'x') +
replace('8.5.0', '.', '000') % 1000 )
END from inventory
I believe that the following will do what you want :-
/* Query using model in n.n.n format */
SELECT * FROM inventory
WHERE
((1000000 * substr(model,1,instr(model,'.')-1)) +
(1000 * replace(substr(model,instr(model,'.') + 1),'.','x')) +
replace(model,'.','000') % 1000)
BETWEEN
(
SELECT 1000000 * substr('5.0.0',1,instr('5.0.0','.') -1)
+ (1000 * replace(substr('5.0.0',instr('5.0.0','.') + 1),'.','x'))
+ replace('5.0.0','.','000') % 1000
)
AND
(
SELECT 1000000 * substr('8.5.0',1,instr('8.5.0','.') -1)
+ (1000 * replace(substr('8.5.0',instr('8.5.0','.') + 1),'.','x'))
+ replace('8.5.0','.','000') % 1000
)
/* MAX condition */
AND
(
SELECT 1000000 * substr('8.5.0',1,instr('8.5.0','.') -1)
+ (1000 * replace(substr('8.5.0',instr('8.5.0','.') + 1),'.','x'))
+ replace('8.5.0','.','000') % 1000
)
>
(
SELECT MAX(((1000000 * substr(model,1,instr(model,'.')-1))
+ (1000 * replace(substr(model,instr(model,'.') + 1),'.','x'))
+ replace(model,'.','000') % 1000))
FROM inventory
)
ORDER BY
(1000000 * substr(model,1,instr(model,'.')-1)) +
(1000 * replace(substr(model,instr(model,'.') + 1),'.','x')) +
replace(model,'.','000') % 1000
;
I am curious to know how this could be used in the current solution.
Or do you have any other approach?
I would suggest that you are grossly over-complicating matters by using a model that is formatted as n.n.n.
If you were to convert that model to an integer value, matters could be greatly simplified.
If you really want to keep the model as n.n.n, then perhaps ALTER the table to add a column that stores the model as an integer, e.g. you could, as a one-off, use :-
ALTER TABLE inventory ADD COLUMN model_value INTEGER DEFAULT -1;
This adds the column model_value
The ALTER could be followed by a mass UPDATE to then set the values for existing rows e.g. :-
UPDATE inventory SET model_value =
(1000000 * substr(model,1,instr(model,'.')-1)) +
(1000 * replace(substr(model,instr(model,'.') + 1),'.','x')) +
replace(model,'.','000') % 1000;
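To see what this expression produces, the following Python sketch evaluates it through the standard `sqlite3` module for a single model string. The helper name `model_value` and the use of a named parameter `:m` in place of the column are assumptions made for illustration. The encoding amounts to major * 1000000 + minor * 1000 + patch, so it only orders correctly while the minor and patch parts stay below 1000.

```python
import sqlite3

# The expression from the UPDATE above, with the column replaced by a parameter.
EXPR = """
SELECT (1000000 * substr(:m, 1, instr(:m, '.') - 1))
     + (1000 * replace(substr(:m, instr(:m, '.') + 1), '.', 'x'))
     + replace(:m, '.', '000') % 1000
"""


def model_value(model: str) -> int:
    """Evaluate the SQLite expression for one n.n.n model string."""
    conn = sqlite3.connect(":memory:")
    try:
        (value,) = conn.execute(EXPR, {"m": model}).fetchone()
    finally:
        conn.close()
    return int(value)


print(model_value("5.6.1"))  # 5006001
```

The middle term relies on SQLite converting text such as '6x1' to the number 6 by reading only the leading numeric prefix.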
To circumvent needing to change the insert and pre-calculate the model_value, you could add an AFTER INSERT TRIGGER e.g. :-
CREATE TRIGGER IF NOT EXISTS inventory_generate_modelvalue AFTER INSERT ON inventory
BEGIN
UPDATE inventory
SET model_value = (1000000 * substr(model,1,instr(model,'.')-1)) +
(1000 * replace(substr(model,instr(model,'.') + 1),'.','x')) +
replace(model,'.','000') % 1000
WHERE model_value < 0 OR model_value IS NULL
;
END;
Note that if you currently use INSERT without specifying the columns, then the insert would have to be adjusted to specify the columns to be used for the insert, OR you could hard code -1 or NULL for the new column.
The query would then be simpler as :-
/* Query using model_value */
SELECT * FROM inventory
WHERE model_value
BETWEEN
(
SELECT 1000000 * substr('5.0.0',1,instr('5.0.0','.') -1)
+ (1000 * replace(substr('5.0.0',instr('5.0.0','.') + 1),'.','x'))
+ replace('5.0.0','.','000') % 1000
)
AND
(
SELECT 1000000 * substr('8.5.0',1,instr('8.5.0','.') -1)
+ (1000 * replace(substr('8.5.0',instr('8.5.0','.') + 1),'.','x'))
+ replace('8.5.0','.','000') % 1000
)
AND
(
SELECT 1000000 * substr('8.5.0',1,instr('8.5.0','.') -1)
+ (1000 * replace(substr('8.5.0',instr('8.5.0','.') + 1),'.','x'))
+ replace('8.5.0','.','000') % 1000
)
>
(SELECT MAX(model_value) FROM inventory)
ORDER BY model_value
;
If you wanted to convert the model value back to n.n.n format, you could base this upon :-
SELECT prod_name,
CAST (model_value / 1000000 AS TEXT)
||'.'
|| CAST((model_value % 1000000) / 1000 AS TEXT)
||'.'
||CAST(model_value % 1000 AS TEXT)
AS model,
location
FROM inventory;
Of course if you had a function within your program or used integer values rather than n.n.n then matters would be even simpler.
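As a rough illustration of the "function within your program" route, the Python sketch below registers a small helper with SQLite via `sqlite3`'s `create_function` and uses it in the guarded range query. The names `version_key` and `models_between` are hypothetical, and the sketch assumes every model has exactly three numeric parts, each below 1000 for the minor and patch positions.

```python
import sqlite3


def version_key(model: str) -> int:
    """Encode 'major.minor.patch' as a single comparable integer."""
    major, minor, patch = (int(part) for part in model.split("."))
    return major * 1_000_000 + minor * 1_000 + patch


conn = sqlite3.connect(":memory:")
# Make the Python helper callable from SQL as version_key(x).
conn.create_function("version_key", 1, version_key)
conn.execute("CREATE TABLE inventory (prod_name TEXT, model TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [
        ("tesla", "5.6.1", "CA"),
        ("toyota", "4.7.1", "WA"),
        ("kia", "6.8.1", "MD"),
        ("tesla", "2.6.2", "CA"),
        ("chev", "7.8.4", "AZ"),
    ],
)


def models_between(conn: sqlite3.Connection, low: str, high: str) -> list:
    """Rows with low <= model <= high, but only if high exceeds the table maximum."""
    sql = """
        SELECT prod_name, model, location
        FROM inventory
        WHERE version_key(model) BETWEEN version_key(?) AND version_key(?)
          AND version_key(?) > (SELECT MAX(version_key(model)) FROM inventory)
        ORDER BY version_key(model)
    """
    return conn.execute(sql, (low, high, high)).fetchall()


print(models_between(conn, "5.0.0", "8.5.0"))  # three rows: tesla, kia, chev
print(models_between(conn, "5.0.0", "6.9.0"))  # [] because 7.8.4 > 6.9.0
```

The trade-off is that the function exists only on the connection that registered it, whereas the model_value column approach works from any client.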
Testing
The following code was used for testing the above :-
DROP TABLE IF EXISTS inventory;
DROP TRIGGER IF EXISTS inventory_generate_modelvalue;
CREATE TABLE IF NOT EXISTS inventory (prod_name TEXT ,model TEXT,location TEXT);
INSERT INTO inventory VALUES ('tesla','5.6.1','CA'),('toyota','4.7.1','WA'),('kia','6.8.1','MD'),('tesla','2.6.2','CA'),('chev','7.8.4','AZ') ;
/* Add new column for model as an integer value */
ALTER TABLE inventory ADD COLUMN model_value INTEGER DEFAULT -1;
/* Update existing data for new column */
UPDATE inventory SET model_value =
(1000000 * substr(model,1,instr(model,'.')-1)) +
(1000 * replace(substr(model,instr(model,'.') + 1),'.','x')) +
replace(model,'.','000') % 1000;
CREATE TRIGGER IF NOT EXISTS inventory_generate_modelvalue AFTER INSERT ON inventory
BEGIN
UPDATE inventory
SET model_value = (1000000 * substr(model,1,instr(model,'.')-1)) +
(1000 * replace(substr(model,instr(model,'.') + 1),'.','x')) +
replace(model,'.','000') % 1000
WHERE model_value < 0 OR model_value IS NULL
;
END;
-- INSERT INTO inventory VALUES('my new model','5.0.1','AA',null),('another','0.999.999','ZZ',-1);
SELECT * FROM inventory;
/* Query using model in n.n.n format */
SELECT * FROM inventory
WHERE
((1000000 * substr(model,1,instr(model,'.')-1)) +
(1000 * replace(substr(model,instr(model,'.') + 1),'.','x')) +
replace(model,'.','000') % 1000)
BETWEEN
(
SELECT 1000000 * substr('5.0.0',1,instr('5.0.0','.') -1)
+ (1000 * replace(substr('5.0.0',instr('5.0.0','.') + 1),'.','x'))
+ replace('5.0.0','.','000') % 1000
)
AND
(
SELECT 1000000 * substr('8.5.0',1,instr('8.5.0','.') -1)
+ (1000 * replace(substr('8.5.0',instr('8.5.0','.') + 1),'.','x'))
+ replace('8.5.0','.','000') % 1000
)
/* MAX condition */
AND
(
SELECT 1000000 * substr('8.5.0',1,instr('8.5.0','.') -1)
+ (1000 * replace(substr('8.5.0',instr('8.5.0','.') + 1),'.','x'))
+ replace('8.5.0','.','000') % 1000
)
>
(
SELECT MAX(((1000000 * substr(model,1,instr(model,'.')-1))
+ (1000 * replace(substr(model,instr(model,'.') + 1),'.','x'))
+ replace(model,'.','000') % 1000))
FROM inventory
)
ORDER BY
(1000000 * substr(model,1,instr(model,'.')-1)) +
(1000 * replace(substr(model,instr(model,'.') + 1),'.','x')) +
replace(model,'.','000') % 1000
;
/* Query using model_value */
SELECT * FROM inventory
WHERE model_value
BETWEEN
(
SELECT 1000000 * substr('5.0.0',1,instr('5.0.0','.') -1)
+ (1000 * replace(substr('5.0.0',instr('5.0.0','.') + 1),'.','x'))
+ replace('5.0.0','.','000') % 1000
)
AND
(
SELECT 1000000 * substr('8.5.0',1,instr('8.5.0','.') -1)
+ (1000 * replace(substr('8.5.0',instr('8.5.0','.') + 1),'.','x'))
+ replace('8.5.0','.','000') % 1000
)
AND
(
SELECT 1000000 * substr('8.5.0',1,instr('8.5.0','.') -1)
+ (1000 * replace(substr('8.5.0',instr('8.5.0','.') + 1),'.','x'))
+ replace('8.5.0','.','000') % 1000
)
>
(SELECT MAX(model_value) FROM inventory)
ORDER BY model_value
;
SELECT prod_name,
CAST (model_value / 1000000 AS TEXT)
||'.'
|| CAST((model_value % 1000000) / 1000 AS TEXT)
||'.'
||CAST(model_value % 1000 AS TEXT)
AS model,
location
FROM inventory;
I have a column with data as given below -
I want to remove any [space] before and after the first instance of the '-' character in the data so that I can get the following cleansed data -
How can I write this as a SQL query?
Try this one
CREATE TABLE Spaces(
Value VARCHAR(45)
);
INSERT INTO Spaces VALUES
('B2555 - 30...'),
('Babc30 - 40 ...'),
('B5- 50..'),
('B6AfG066ML -60..');
SELECT CASE WHEN CHARINDEX(' -', Value) > 0 THEN
STUFF(Value, CHARINDEX(' -', Value), 1, '')
ELSE
Value
End Result
FROM
(
SELECT CASE WHEN CHARINDEX('- ', Value) > 0 THEN
STUFF(Value, CHARINDEX('- ', Value) + 1, 1, '')
ELSE
Value
End Value
FROM
(
SELECT CASE WHEN CHARINDEX(' - ', Value) > 0 THEN
STUFF(Value, CHARINDEX(' - ', Value), 1, '')
ELSE
Value
End Value
FROM Spaces
) T1
) T2;
Returns:
+------------------------+
| Result |
+------------------------+
| B2555-30- ABC - ABC... |
| Babc30-40 ... |
| B5-50.. |
| B6AfG066ML-60.. |
+------------------------+
Demo
Here's another option for you.
This assumes the following:
Only the leading or trailing space around the first instance of '-' is removed; all others are preserved.
It accounts for one and only one leading or trailing space.
The data may already be "cleaned".
Give this a try:
DECLARE @TestData TABLE
(
[StringData] NVARCHAR(100)
);
INSERT INTO @TestData (
[StringData]
)
VALUES ( 'ADFADSF- ASDFSADF - Q343243498' )
, ( 'ABC - EFSSADF - 2345234532' )
, ( 'EFGSADFSA -ASDFSADF - 2342345234' )
, ( 'ASDF34 - ASDLFASDJF - 234234 - 34324' )
, ( 'ABC-123 - 465 - 685' );
SELECT *
, STUFF([StringData]
, CHARINDEX('-', [StringData]) - 1
, 3
, REPLACE(SUBSTRING([StringData], CHARINDEX('-', [StringData]) - 1, 3), ' ', '')
) AS [CleanStringData]
FROM @TestData;
Basically, this takes the three-character window that runs from one character before the first '-' to one character after it, and replaces it with those same characters with any spaces removed.
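The same three-character window idea can be sketched in Python to check it against the sample strings. The function name `clean_first_dash` is made up for this illustration, and like the query it handles at most one space on each side of the first '-'.

```python
def clean_first_dash(text: str) -> str:
    """Remove a single space directly before and after the first '-'."""
    dash = text.find("-")
    if dash < 0:
        return text  # no dash, nothing to clean
    # Window: one character before the dash through one character after it.
    start = max(dash - 1, 0)
    end = dash + 2
    window = text[start:end].replace(" ", "")
    return text[:start] + window + text[end:]


print(clean_first_dash("ABC - EFSSADF - 2345234532"))  # ABC-EFSSADF - 2345234532
```

Already clean values such as 'ABC-123 - 465 - 685' pass through unchanged, because the window around the first dash contains no spaces.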
I have a data set (approx. 900k rows) where I need to split the data on '(' or ')'. For example:
Table A data:-
Vendor Is_Active
ABC(1263) 1
efgh (187 1
pqrs 890ag) 1
xyz 1
lmno(488) 1
(9867-12) 1
Output
ID Name
1263 ABC
187 efgh
890ag pqrs
xyz
488 lmno
9867-12
I tried query
SELECT
vendor,
CASE WHEN vendor LIKE '%(%' OR vendor LIKE '%)%'
THEN REPLACE(REPLACE(RIGHT(Vendor, charindex(' ', reverse(vendor)) - 1),'(',''),')','')
END AS 'test'
FROM
tableA
Error: Msg 536, Level 16, State 4, Line 13: Invalid length parameter passed to the RIGHT function.
You can remove the ( and ) characters, then search for the first digit. Check this query:
declare @t table (
vendor varchar(100)
)
insert into @t values
('ABC(1263)')
,('efgh (187')
,('pqrs 890ag)')
,('xyz')
,('lmno(488)')
,('(9867-12)')
select
ID = case when p = 0 then '' else substring(v, p, len(v)) end
, Name = case when p = 0 then v else left(v, p - 1) end
from
@t
cross apply (select v = replace(replace(vendor, '(', ''), ')', '')) q1
cross apply (select p = patindex('%[0-9]%', v)) q2
Output
ID Name
---------------
1263 ABC
187 efgh
890ag pqrs
xyz
488 lmno
9867-12
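For readers who want to check the rule on a few values outside SQL Server, here is a small Python sketch of the same approach: strip the parentheses, then split at the first digit. The name `split_vendor` is hypothetical, and the sketch assumes, as the query does, that vendor names contain no digits. It also trims the name, which the query does not do.

```python
import re
from typing import Tuple


def split_vendor(vendor: str) -> Tuple[str, str]:
    """Return (id, name) by splitting at the first digit after removing parentheses."""
    cleaned = vendor.replace("(", "").replace(")", "")
    match = re.search(r"[0-9]", cleaned)
    if match is None:
        # No digit at all: the whole value is the name.
        return "", cleaned.strip()
    pos = match.start()
    return cleaned[pos:], cleaned[:pos].strip()


print(split_vendor("pqrs 890ag)"))  # ('890ag', 'pqrs')
```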
Hmmm. I'm thinking:
select v.*, v2.name,
replace(stuff(v.x, 1, len(v2.name) + 1, ''), ')', '') as id
from (values ('ABC(1263)'), ('abc'), ('(1234)')) v(x) cross apply
(values (left(v.x, charindex('(', v.x + '(') - 1))) v2(name);
I find apply useful for repetitive string operations.
SELECT
(CASE WHEN Vendor LIKE '%(%)' THEN SUBSTRING(Vendor,CHARINDEX('(',Vendor)+1,CHARINDEX(')',Vendor)-CHARINDEX('(',Vendor)-1)
WHEN Vendor LIKE '%(%' THEN SUBSTRING(Vendor,CHARINDEX('(',Vendor)+1,LEN(Vendor))
WHEN Vendor LIKE '%)%' THEN SUBSTRING(Vendor,CHARINDEX(' ',Vendor)+1,(CHARINDEX(')',Vendor)-CHARINDEX(' ',Vendor))-1)
ELSE ''
END )AS ID ,
(CASE WHEN Vendor LIKE '%(%)' THEN SUBSTRING(Vendor,1,CHARINDEX('(',Vendor)-1)
WHEN Vendor LIKE '%(%' THEN SUBSTRING(Vendor,1,CHARINDEX('(',Vendor)-1)
WHEN Vendor LIKE '%)%' THEN SUBSTRING(Vendor,1,CHARINDEX(' ',Vendor))
ELSE Vendor END ) AS Name
FROM tableA
I'm trying to count how many words there are in a string in SQL.
Select ('Hello To Oracle') from dual;
I want to show the number of words. In the given example it would be 3 words, though there could be more than one space between words.
You can use something similar to this. It takes the length of the string, then subtracts the length of the string with the spaces removed. Adding one to that gives the number of words, provided the words are separated by single spaces:
Select length(yourCol) - length(replace(yourcol, ' ', '')) + 1 NumbofWords
from yourtable
See SQL Fiddle with Demo
If you use the following data:
CREATE TABLE yourtable
(yourCol varchar2(15))
;
INSERT ALL
INTO yourtable (yourCol)
VALUES ('Hello To Oracle')
INTO yourtable (yourCol)
VALUES ('oneword')
INTO yourtable (yourCol)
VALUES ('two words')
SELECT * FROM dual
;
And the query:
Select yourcol,
length(yourCol) - length(replace(yourcol, ' ', '')) + 1 NumbofWords
from yourtable
The result is:
| YOURCOL | NUMBOFWORDS |
---------------------------------
| Hello To Oracle | 3 |
| oneword | 1 |
| two words | 2 |
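The arithmetic behind this query is easy to check in a few lines of Python. The sketch below mirrors the length-difference formula and shows where it stops being accurate; the function name `count_words_by_spaces` is only for illustration.

```python
def count_words_by_spaces(text: str) -> int:
    """Number of spaces plus one, mirroring the length-difference query."""
    return len(text) - len(text.replace(" ", "")) + 1


print(count_words_by_spaces("Hello To Oracle"))  # 3
print(count_words_by_spaces("Hello  To"))        # 3, although there are only 2 words
```

Each extra space between words adds one to the result, so this formula only matches the word count when words are separated by exactly one space and the string has no leading or trailing spaces.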
Since you're using Oracle 11g, it's even simpler:
select regexp_count(your_column, '[^ ]+') from your_table
Here is a sqlfiddle demo
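The pattern '[^ ]+' matches each maximal run of non-space characters, so counting matches counts words regardless of how many spaces separate them. A rough Python equivalent, using `re.findall` as a stand-in for Oracle's regexp_count (the function name `count_words` is an assumption), looks like this:

```python
import re


def count_words(text: str) -> int:
    """Count maximal runs of non-space characters."""
    return len(re.findall(r"[^ ]+", text))


print(count_words("Hello   To  Oracle"))  # 3
```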
If your requirement is to handle multiple spaces between words too, try this, which collapses each run of spaces to a single space before counting:
Select length(REGEXP_REPLACE('500 text Oracle Parkway Redwood Shores CA', '( ){1,}', ' '))
- length(REPLACE('500 text Oracle Parkway Redwood Shores CA', ' ', '')) + 1 NumbofWords
from dual;
Since I have used the dual table you can test this directly in your own development environment.
DECLARE @List NVARCHAR(MAX) = ' ab a
x'; /*Your column/Param*/
DECLARE @Delimiter NVARCHAR(255) = ' ';/*space*/
DECLARE @WordsTable TABLE (Data VARCHAR(1000));
/*convert by XML the string to table*/
INSERT INTO @WordsTable(Data)
SELECT Data = y.i.value('(./text())[1]', 'VARCHAR(1000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
/*Your total words*/
select count(*) NumberOfWords
from @WordsTable
where Data is not null;
/*words list*/
select *
from @WordsTable
where Data is not null;
/*from this logic you can continue alone*/