Oracle RegEx in a Cast Procedure - sql

I have a Cast Procedure for a table with "raw" data. Any time a record comes from any of our locations into the raw table, my procedure "cleans" the data and loads it into a new table. The original raw table is all varchars, and my procedure converts date and number fields to the proper data types. From the clean table, a Java program selects any new records on a daily basis and FTPs them off in a file to another dept. I have just learned that a few of the fields accept input from users and, on rare occasions, someone uses a pipe in what they input. The pipe symbol happens to be the delimiter that the other dept is using, and whenever a pipe shows up in the middle of a field, it throws a wrench in the works on their end.
I've never used REGEX or REGEXP_REPLACE in Oracle before. There are only three fields where the users can input data - MISTINTCOMMENT, PALETTE, COLORID. How do I use REGEX or REGEXP_REPLACE to replace any pipes with a space? Do I want to do it on each field? Or is this something I should "wrap around" the entire statement (in case there's a field I missed where someone might be able to input a pipe)?
Here is the portion of the procedure where the Values are cleaned and inserted into new table. How to best use RegEx with this?
VALUES (CASE
WHEN THECOSTCENTER IS NOT NULL
THEN THECOSTCENTER
ELSE (SUBSTR(TRIM(THESENDING_QMGR), -6))
END,
CASE
WHEN THESTORENBR = '0' AND (SUBSTR(THESENDING_QMGR, 1, 5) = 'PDPOS')
THEN TO_NUMBER(SUBSTR(THESENDING_QMGR, 8, 4))
WHEN THESTORENBR = '0' AND (SUBSTR(THESENDING_QMGR, 1, 8) = 'PROD_POS')
THEN TO_NUMBER(SUBSTR(THESENDING_QMGR, 9, 4))
ELSE TO_NUMBER(NVL(THESTORENBR,'0'))
END,
TO_NUMBER(NVL(THECONTROLNBR,'0')), TO_NUMBER(NVL(THELINENBR,'0')), THESALESNBR, TO_NUMBER(NVL(THEQTYMISTINT,'0')), THEREASONCODE, THEMISTINTCOMMENT,
THESIZECODE, THETINTERMODEL, THETINTERSERIALNBR, TO_NUMBER(NVL(THEEMPNBR,'0')), TO_DATE(THETRANDATE,'YYYY-MM-DD'), THETRANTIME, THECDSADLFLD,
THEPRODNBR, THEPALETTE, THECOLORID, TO_DATE(THEINITTRANDATE,'YYYY-MM-DD'), TO_NUMBER(NVL(THEGALLONSMISTINTED,'0'),'999999999.99'), THEUPDATEEMPNBR,
TO_DATE(THEUPDATETRANDATE,'YYYY-MM-DD'), TO_NUMBER(NVL(THEGALLONS,'0'),'999999999.99'), THEFORMSOURCE, THEUPDATETRANTIME, THESOURCEIND,
TO_DATE(THECANCELDATE,'YYYY-MM-DD'), THECOLORTYPE, TO_NUMBER(NVL(THECANCELEMPNBR,'0')), TO_BOOLEAN(THENEEDEXTRACTED), TO_BOOLEAN(THEMISTINTMQXTR),
THEDATASOURCE, THETRANGUID, TO_NUMBER(NVL(THETERMNBR,'0')), TO_NUMBER(NVL(THETRANNBR,'0')), TO_NUMBER(NVL(THETRANID,'0')), THEID, THETINTABLESALESNBR,
TO_NUMBER(NVL(THERETURNQTY,'0')), THECREATED_TS, THEXMIT_GUID, THESENDING_QMGR, THEMSG_ID, THEPUT_TS,
THEBROKER_NAME, THECHECKSUM);

If you have to use a REGEXP_REPLACE to replace pipes, escape them:
REGEXP_REPLACE(x, '\|', ' ')
This is useful to know when your more complex expressions include a pipe.
In this case, REPLACE, which performs a literal text search and replace, will suffice:
REPLACE(x, '|', ' ')
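One option, assuming only the three user-entered fields can contain pipes, is to wrap each of them individually in the VALUES list. The following is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for Oracle so that it can be run anywhere. The table clean_demo and the sample values are hypothetical. SQLite's replace() takes the same three arguments as Oracle's REPLACE, and the re.sub call shows that an escaped pipe gives the same result as the literal replace.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-column table standing in for the clean table.
conn.execute(
    "CREATE TABLE clean_demo (mistintcomment TEXT, palette TEXT, colorid TEXT)"
)

# Hypothetical raw values, as they might arrive from user input.
raw = ("wrong|base", "SW|Color", "6|119")

# Wrap each user-entered value in replace(), as it would be done
# for THEMISTINTCOMMENT, THEPALETTE and THECOLORID in the VALUES list.
conn.execute(
    "INSERT INTO clean_demo VALUES "
    "(replace(?, '|', ' '), replace(?, '|', ' '), replace(?, '|', ' '))",
    raw,
)
cleaned = conn.execute("SELECT * FROM clean_demo").fetchone()

# Regex equivalent: the pipe is escaped so it is not read as alternation.
regex_cleaned = tuple(re.sub(r"\|", " ", value) for value in raw)

print(cleaned)
print(regex_cleaned)
```

Wrapping only the known free-text fields keeps the numeric and date conversions untouched; if other varchar fields turn out to accept user input, they would need the same wrapper.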

Related

How to search for separated values in columns from a merged values column

I have a database where the data I need to work with is stored in two different columns. I also need to import an Excel file, and the data in this Excel file is stored together in one value, separated only by a dash. So I need to figure out either how to create a query (maybe with an alias) that combines the two columns, or how to split the imported column on the dash and then query with the data split up.
The code I was trying was the following:
SELECT
CAST (dbo_predios.codigo_manzana_predio as nvarchar(55))+'-'+CAST(dbo_predios.codigo_lote_predio as nvarchar(55)) as ROL_AVALUO
FROM dbo_predios
WHERE ROL_AVALUO like '%9132-2%'
That is one way I tried, but I don't really know how to split on a given symbol. The data in the Excel file comes in exactly the form I wrote in the "like" portion of the code.
I believe this is what you are after from the sounds of it:
SELECT
[locateDashInString] = CHARINDEX('-', e.FieldHere, 0) --just showing you where it finds the dash
,[SubstringBeforeItemLocated] =
SUBSTRING(
e.FieldHere --string to search from
,0 --starting index
,CHARINDEX('-', e.FieldHere, 0) --index of found item
)
,[SubstringAfterItemLocated] =
SUBSTRING(
e.FieldHere --string to search from
,CHARINDEX('-', e.FieldHere, 0) + 1 --starting index for substring
,LEN(e.FieldHere) --finish substring at this point
)
FROM ExcelImportedDataTable e
The locateDashInString column is just to show you where it finds the '-' symbol; you don't actually need it. The other two columns are a split of the value, so '9132-2' is split into two values in two columns.
Just note that this will only work if the data always has the format val1-val2. As long as the format is the same, it should be fine.
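The split can then be used to match the imported value against the two stored columns. Below is a minimal sketch in Python with the standard-library sqlite3 module, used here only so the example is runnable; SQLite's instr() and substr() play the role of SQL Server's CHARINDEX and SUBSTRING. The table excel_import, the column types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for dbo_predios; TEXT column types are an assumption.
conn.execute(
    "CREATE TABLE predios (codigo_manzana_predio TEXT, codigo_lote_predio TEXT)"
)
conn.executemany(
    "INSERT INTO predios VALUES (?, ?)",
    [("9132", "2"), ("9132", "20"), ("100", "7")],
)

# Hypothetical table holding the values imported from the spreadsheet.
conn.execute("CREATE TABLE excel_import (rol_avaluo TEXT)")
conn.execute("INSERT INTO excel_import VALUES ('9132-2')")

# Split the imported value on the first dash and join each half
# to its own column.
rows = conn.execute(
    """
    SELECT p.codigo_manzana_predio, p.codigo_lote_predio
    FROM excel_import e
    JOIN predios p
      ON p.codigo_manzana_predio =
         substr(e.rol_avaluo, 1, instr(e.rol_avaluo, '-') - 1)
     AND p.codigo_lote_predio =
         substr(e.rol_avaluo, instr(e.rol_avaluo, '-') + 1)
    """
).fetchall()

print(rows)
```

Comparing the two halves separately avoids the partial matches that LIKE '%9132-2%' would produce, such as a row with lot 20.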

Sqlite - How to remove substring in a field using wildcards

I would like to remove all occurrences of a certain string pattern across all fields in a table.
For example, find all occurrences of the pattern "<XY>*<Xy>" where * represents any possible configuration of characters. I want to remove just those substrings and leave the remainder of the string intact.
This is an example of what I would like to use as my SQL command, but of course this doesn't work:
UPDATE Table SET Field = replace(Field, '<XY>*<Xy>', '');
What is the solution?
Here is an option which attempts to splice around the <XY>...</XY> tags:
UPDATE yourTable
SET Field = SUBSTR(Field, 1, INSTR(Field, '<XY>') - 1) ||
SUBSTR(Field, INSTR(Field, '</XY>') + 5)
WHERE Field LIKE '%<XY>%</XY>%'
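It updates fields containing this pattern with the concatenation of everything coming before the first <XY> and everything coming after the first </XY>.
Note that I used <XY>...</XY> rather than what you had originally, because in MySQL (where the demo runs) INSTR() is not case sensitive for ordinary non-binary strings, so <XY> and <Xy> would look like the same tag. SQLite's instr() is case sensitive, although LIKE is not case sensitive for ASCII letters by default.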
Demo
The demo is for MySQL, but the syntax is almost identical in SQLite.
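Since the statement removes only one tagged substring per row each time it runs, it can be repeated until no row changes. Here is a minimal sketch using Python's standard-library sqlite3 module, assuming well-formed tags in which every <XY> is followed by its </XY>. The sample rows and the cap of 100 passes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Field TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?)",
    [
        ("keep <XY>drop me</XY> this",),
        ("a<XY>1</XY>b<XY>2</XY>c",),
        ("no tags here",),
    ],
)

UPDATE_SQL = """
UPDATE yourTable
SET Field = SUBSTR(Field, 1, INSTR(Field, '<XY>') - 1) ||
            SUBSTR(Field, INSTR(Field, '</XY>') + 5)
WHERE Field LIKE '%<XY>%</XY>%'
"""

# Each pass strips the first <XY>...</XY> span in every matching row.
# The loop is capped as a guard against malformed data.
for _ in range(100):
    cursor = conn.execute(UPDATE_SQL)
    if cursor.rowcount == 0:
        break

fields = [row[0] for row in conn.execute("SELECT Field FROM yourTable ORDER BY rowid")]
print(fields)
```

The text on either side of the removed span is kept exactly as it was, so the first row ends up with two adjacent spaces.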

Using SQL to make specific changes in a database.

I am trying to figure out some commands/code in SQL.
I have a database with names, addresses, IDs, etc., but I have to convert firstname values ending in “jnr” to “(Jnr)” and those ending in “snr” to “(Snr)”.
How do I do this?
update TABLE_NAME set NAMES = '*xyz*Jnr' where NAMES like '%jnr'
Update or select:
PASTE(column, CHAR_LENGTH(column)-2, 1, UPPER(SUBSTRING(column FROM CHAR_LENGTH(column)-2 FOR 1)))
WHERE column LIKE '%jnr' OR column LIKE '%snr'
PASTE is used to put in one character at position 3 from end,
CHAR_LENGTH to get length of column value,
UPPER converts character to upper case,
SUBSTRING is used to pick one character here (j or s),
LIKE is used to find values ending with jnr, or snr.
All ANSI SQL (no dbms specified!)
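For comparison, here is a minimal sketch of an update that also adds the parentheses the question asks for. It uses Python's standard-library sqlite3 module because no DBMS was specified. The table name people and the sample rows are hypothetical, and substr(), length() and || are SQLite's spellings, which may differ in other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (firstname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("John jnr",), ("Bob snr",), ("Ann",)],
)

# Drop the last three characters and append the bracketed,
# capitalised suffix in their place.
for suffix, replacement in (("jnr", "(Jnr)"), ("snr", "(Snr)")):
    conn.execute(
        "UPDATE people "
        "SET firstname = substr(firstname, 1, length(firstname) - 3) || ? "
        "WHERE firstname LIKE ?",
        (replacement, "%" + suffix),
    )

names = [row[0] for row in conn.execute("SELECT firstname FROM people ORDER BY rowid")]
print(names)
```

In SQLite, LIKE ignores case for ASCII letters by default, so a value ending in "JNR" would be rewritten as well.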

how do I retrieve data from a sql table with huge number of inputs for a single column

I have a Company table in SQL Server and I would like to retrieve data for a particular list of companies. The list is very large, around 200 company names. I am trying to use the IN clause of T-SQL, but a few of the companies have special characters in their names, like O'Brien, so the query throws an error.
SELECT *
FROM COMPANY
WHERE COMPANYNAME IN
('Archer Daniels Midland'
'Shell Trading (US) Company - Financial'
'Redwood Fund, LLC'
'Bunge Global Agribusiness - Matt Thibodeaux'
'PTG, LLC'
'Morgan Stanley Capital Group'
'Vitol Inc.'..
.....
....
.....)
Above is the script, which does not work for obvious reasons. Is there any way I can input those company names from an Excel file and retrieve the data?
The easiest way would be to make a table and join it:
CREATE TABLE dbo.IncludedCompanies (CompanyName varchar(1000))
INSERT INTO dbo.IncludedCompanies
VALUES
('Archer Daniels Midland'),
('PTG, LLC')
...
SELECT *
FROM Company C
JOIN IncludedCompanies IC
ON C.CompanyName = IC.CompanyName
I do not think that MySQL knows how to handle the Excel format, but you can fix your query.
Check how complicated names are stored in the database (check whether they have escape characters in them or anything else).
Replace all ' with \' in your query and it will take care of the ' characters (that is the MySQL escape; in SQL Server, double the quote as '' instead).
mysql> select now() as 'O\'Brian'; returns
O'Brian
2014-03-17 15:06:39
So I'm guessing you have an Excel sheet with a column containing these names, and you want to use this in your WHERE clause. In addition, some of the values have special characters in them, which need to be escaped.
First, escape the ' characters. Do this in Excel with a search and replace of all occurrences of ' with '' (the escaped version in SQL Server; it is \' in MySQL). Then create a new column on each side of your companies column, and in the first row enter a ' on the left-hand side and ', on the right. Then use the copy cell functionality (the little square in the bottom right of the cell when you select it) to copy those cells down all the rows, as far as the company list goes (just grab the square and pull it downwards).
Then, take your list, now containing three columns and x rows and paste it into your favorite text editor. It should look something like this:
' Company#1 ',
' Company with special '' char ',
[...]
' Last company ',
Now you will have some whitespace to get rid of. Use search and replace to replace two space characters with nothing, and repeat (or select the whitespace between the first ' and the start of the text and replace that with nothing).
Now, you should have a list of:
'Company#1',
'Company with special '' char',
[...]
'Last company',
Remove the last comma, and you'll have a valid list of parameters to your in-clause (or a (temporary) table if you want to keep your query a bit cleaner.)
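If the list is loaded by a program rather than pasted by hand, parameter binding sidesteps the quote escaping altogether. Below is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for SQL Server (an assumption, so the example runs anywhere). The City column, the sample rows and the name "O'Brien Energy" are hypothetical, and reading the names from the spreadsheet is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (CompanyName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Company VALUES (?, ?)",
    [
        ("Archer Daniels Midland", "Chicago"),
        ("O'Brien Energy", "Houston"),
        ("Not Wanted Inc.", "Dallas"),
    ],
)

# Names as they might be read from the spreadsheet, with no manual escaping.
wanted = ["Archer Daniels Midland", "O'Brien Energy", "PTG, LLC"]

# Load the names into a temporary table through bound parameters,
# then join against it instead of building a long IN list.
conn.execute("CREATE TEMP TABLE IncludedCompanies (CompanyName TEXT)")
conn.executemany(
    "INSERT INTO IncludedCompanies VALUES (?)",
    [(name,) for name in wanted],
)

rows = conn.execute(
    """
    SELECT C.CompanyName, C.City
    FROM Company C
    JOIN IncludedCompanies IC ON C.CompanyName = IC.CompanyName
    ORDER BY C.CompanyName
    """
).fetchall()

print(rows)
```

Because the apostrophe travels as data rather than as part of the SQL text, names such as O'Brien need no special handling.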

Count with multiple where conditions in ms access

I have the query below;
Select count(*) as poor
from records where deviceId='00019' and type='Poor' and timestamp between #14-Sep-2012 01:01:01# and #24-Sep-2012 01:01:01#
table is like;
id, deviceId, type, timestamp
data is like;
1, '00019', 'Poor', '19-Sep-2012 01:01:01'
2, '00019', 'Poor', '19-Sep-2012 01:01:01'
3, '00019', 'Poor', '19-Sep-2012 01:01:01'
4, '00019', 'Poor', '19-Sep-2012 01:01:01'
I am trying to count the devices with a specific type.
Please help. Access always returns wrong data: it returns 1, while 00019 has 4 entries for Poor.
Type and timestamp are both reserved words, so enclose them in square brackets in your query like this: [type] and [timestamp]. I doubt those reserved words are the cause of your problem, but it's hard to predict exactly when reserved words will cause query problems, so just rule out this possibility by using the square brackets.
Beyond that, stored text values sometimes contain extra non-visible characters. Check the lengths of the stored text values to see whether any are longer than expected.
SELECT
Len(deviceId) AS LenOfDeviceId,
Len([type]) AS LenOfType,
Len([timestamp]) AS LenOfTimestamp
FROM records;
In comments you mentioned spaces (ASCII value 32) in your stored values. I had been thinking we were dealing with other non-printable/invisible characters. If you have one or more actual space characters at the beginning and/or end of a stored deviceId value, the Trim() function will discard them. So this query will give you different length numbers in the two columns:
SELECT
Len(deviceId) AS LenOfDeviceId,
Len(Trim(deviceId)) AS LenOfDeviceId_NoSpaces
FROM records;
If the stored values can also include spaces within the string (not just at the beginning and/or end), Trim() will not remove those. In that case, you could use the Replace() function to discard all the spaces. Note, however, that a query which uses Replace() must be run from inside an Access application session; you can't use it from Java code.
SELECT
Len(deviceId) AS LenOfDeviceId,
Len(Replace(deviceId, ' ', '')) AS LenOfDeviceId_NoSpaces
FROM records;
If that query returns the same length numbers in both columns, then we are not dealing with actual space characters (ASCII value 32) ... but some other type of character(s) which look "space-like".
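The effect of such stray characters on the count can be reproduced with a small sketch. It uses Python's standard-library sqlite3 module in place of Access (an assumption, so the example is runnable), and the rows are hypothetical: two values padded with ordinary spaces and one ending in a non-breaking space, which SQLite's trim() does not remove by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, deviceId TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "00019", "Poor"),
        (2, "00019 ", "Poor"),       # trailing space
        (3, " 00019", "Poor"),       # leading space
        (4, "00019\u00a0", "Poor"),  # trailing non-breaking space
    ],
)

# Exact comparison only matches the clean value.
exact = conn.execute(
    "SELECT count(*) FROM records WHERE deviceId = '00019'"
).fetchone()[0]

# trim() recovers the rows padded with ordinary spaces.
trimmed = conn.execute(
    "SELECT count(*) FROM records WHERE trim(deviceId) = '00019'"
).fetchone()[0]

# Rows that are still too long after trim() hold some other character.
suspects = conn.execute(
    "SELECT id FROM records WHERE length(trim(deviceId)) <> 5 ORDER BY id"
).fetchall()

print(exact, trimmed, suspects)
```

A count of 1 from the exact comparison alongside a larger count from the trimmed one points at padding in the stored values rather than at the query itself.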
If you want to count devices with specific type irrespective of deviceids then use this:
Select count(*) as poor
from records where type='Poor'
If you want to count devices with specific deviceid irrespective of types then use this:
Select count(*) as excellent
from records where deviceId='00019'