Converting SQL statement to Regular Expression - can it be done? - sql

I need to convert the SQL statement below into a regular expression.
CASE WHEN TypeCode >= 400
AND TypeCode < 700
THEN Amt * -1
ELSE Amt
END
Background: I am putting a bank transactions file (BAI2 file) into a system for transactional matching (matching bank transactions to GL transactions). In order to get these transactions to match, the fields to match on have to be exactly the same. However, in the GL a $500 check may be input as -500 (because the company's cash account is being reduced by $500 for a utility bill), but BAI files store all amounts as positive values. I need to use the transaction type code from the bank to identify whether an amount should be a debit or a credit (in reference to the GL).
I have SQL developers that can do this using SQL, but this tool I'm using to do the BAI data manipulation requires the logic to be input as Regular Expression.
Can anyone assist in applying the appropriate signage (positive or negative) to these amounts for bank transactions? Can this even be done? I'm a new poster so please bear with any ignorance and let me know if I can provide further details/information.

Maybe try this:
^[4-6][0-9]{2}$
This will look for numbers between 400-699 inclusive.
EDIT: Changed \d to [0-9] as \d may include weird characters as well.
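As a rough illustration of how that pattern reproduces the CASE expression from the question, here is a minimal Python sketch. The function name `signed_amount` and the assumption that type codes arrive as plain digit strings and amounts as unsigned decimals are mine, not part of the question; the matching tool itself may work differently.

```python
import re
from decimal import Decimal

# Same range as the answer's pattern: 400 through 699.
# fullmatch() anchors both ends, so ^ and $ are not needed here.
DEBIT_CODE = re.compile(r"[4-6][0-9]{2}")


def signed_amount(type_code: str, amount: Decimal) -> Decimal:
    """Hypothetical helper: negate the amount when the code is 400-699."""
    if DEBIT_CODE.fullmatch(type_code):
        return -amount
    return amount


print(signed_amount("475", Decimal("500.00")))
print(signed_amount("165", Decimal("500.00")))
```

Using `[0-9]` rather than `\d` keeps the match restricted to ASCII digits, which is the concern raised in the edit note above.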

I feel your pain...no bank makes it easy.
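In ASCII-only matching, '\d' means exactly '[0-9]' and is simply shorter; some Unicode-aware engines also let '\d' match digits from other scripts.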
Banks have done a good job at fooling the masses that the word 'credit' is good and 'debit' is bad (me thinks that is their intent...).
You can't know if the value should be negative or positive unless you KNOW which direction the money is flowing in.
IF the table has separate columns for INWARDS/RECEIVED funds and OUTWARDS/PAID funds, then you do not need positive and negative indication.
IF, however, there is only ONE column in the table for all AMOUNTS, whether moved INTO or taken OUT FROM the account, then you definitely need SIGNED VALUES (positive/negative indication).
Either "-$nnn" or "($nnn)", with or without the "$".
If you have TWO COLUMN tables (PAID IN and PAID OUT) then just use "$nnn" without SIGNS.
If you have a SINGLE COLUMN table, then you can replace "$" with "-$" using:
$value =~ s/\$/-\$/;
The above is a perl example.
'\$' means "literal SIGIL" (i.e. dollar sign).
To match any value from 400 through 699 inclusive (the same range as TypeCode >= 400 AND TypeCode < 700 in the SQL) use the following regex:
^[4-6]\d\d$
That should match exactly what you want.
So you could do something like (in perl):
if ($TypeCode =~ m/^[4-6]\d\d$/) { # if code matches pattern
# -EITHER-
$Amount=~ s/\$/-\$/; # prepend "-" to "$"
# -OR-
$Amount=~ s/\$/-/; # replace "$" with "-"
}
The '^' (caret - start of line or string) and '$' (dollar sign - end of line or string) surrounding the regex stop it from matching anything with 4 or more digits, like 3399 or 47421.
Every three-digit code from 400 to 699 starts with 4, 5 or 6, so no extra alternatives for boundary values are needed.
'[4-6]' = '[456]', which means the digits from '4' to '6'.
TEST THIS ON SAMPLE DATA FIRST!
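One way to do that testing is to compare a candidate pattern against the SQL condition from the question for every code in a range. The sketch below is only an illustration; the helper names `sql_rule` and `mismatches` and the upper limit of 10000 are assumptions of mine.

```python
import re


def sql_rule(code: int) -> bool:
    """The condition from the question: TypeCode >= 400 AND TypeCode < 700."""
    return 400 <= code < 700


def mismatches(pattern: str, limit: int = 10000) -> list:
    """Return every code below `limit` where the regex and the SQL rule disagree."""
    compiled = re.compile(pattern)
    return [
        code
        for code in range(limit)
        if (compiled.fullmatch(str(code)) is not None) != sql_rule(code)
    ]


# An empty list means the pattern agrees with the SQL rule on all tested codes.
print(mismatches(r"[4-6][0-9]{2}"))
```

Any code that appears in the returned list is a boundary or length case the pattern gets wrong.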

Related

How to use Regex to lowercase catalogue values without any logic codes

For a loan domain we pass some catalogue values, e.g. whether a customer is a primary or a secondary customer. So I need to check the values irrespective of uppercase, lowercase, or camel case. The software I am using accepts only regex codes, not Java or JS code (it uses a different scripting language). I am trying to do the conversion with regexp only, but I still get an error.
If catalogue_value ~"(/A-Z/)" then
Catalogue_value ~"/l"
Endif
As I am still learning regex, I am still figuring out the correct expressions to use.
Please tell me the correct regex format for changing a value to lowercase / uppercase.
If I understood your problem, you want to search without worrying about the case; for example, the data is Paul, and you want to find this record when searching by PAUL, paul, PaUl, etc.?
One common technique to do that is to put both sides all in upper or lower case, without regex, for example, in javascript:
"Paul".toLowerCase() === "paUL".toLowerCase()
In SQL:
select case when LOWER('Paul') = LOWER('paUL') then 1 else 0 end
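The same idea can be sketched in Python, together with the regex-side alternative of matching case-insensitively instead of converting the value. Whether the asker's tool supports an ignore-case flag or the inline `(?i)` form is an assumption; the function names are hypothetical.

```python
import re


def same_ignoring_case(a: str, b: str) -> bool:
    """Normalize both sides before comparing, as in the answers above."""
    return a.casefold() == b.casefold()


def matches_ignoring_case(pattern: str, value: str) -> bool:
    """Leave the value untouched and let the regex engine ignore case."""
    return re.fullmatch(pattern, value, flags=re.IGNORECASE) is not None


print(same_ignoring_case("Paul", "paUL"))
print(matches_ignoring_case("primary|secondary", "PRIMARY"))
# Inline flag form, for engines that only accept a pattern string:
print(re.fullmatch(r"(?i)primary|secondary", "Secondary") is not None)
```

A plain regex match does not rewrite text, so case-insensitive matching is usually the closest regex-only equivalent to lowercasing both sides.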

The set of atomic irrational numbers used to express the character table and corresponding (unitary) representations

I want to calculate the irrational number, expressed by the following formula in gap:
3^(1/7). I've read through the related description here, but still can't figure out the trick. Will numbers like this appear in the computation of the character table and corresponding (unitary) representations?
P.S. Basically, I want to figure out the following question: For the computation of the character table and corresponding (unitary) representations, what is the minimum complete set of atomic irrational numbers used to express the results?
Regards,
HZ
You can't do that with GAP's standard cyclotomic numbers, as seventh roots of 3 are not cyclotomic. Indeed, suppose $r$ is such a root, i.e. a root of the polynomial $f = x^7-3 \in \mathbb{Q}[x]$. Then $r$ is cyclotomic if and only if the field $\mathbb{Q}(r)$ is a subfield of a cyclotomic field. By Kronecker-Weber this is equivalent to $\mathbb{Q}(r)$ being an abelian extension of $\mathbb{Q}$, i.e., a Galois extension whose Galois group is abelian. One can check that this is not the case here (the Galois group of $f$ is a semidirect product of $C_7$ with $C_6$, which is not abelian).
So, $r$ is not cyclotomic.
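The check can be sketched as follows, using only standard facts (Eisenstein's criterion, and that every subfield of an abelian extension of $\mathbb{Q}$ is itself Galois over $\mathbb{Q}$). The symbol $\zeta_7$ for a primitive seventh root of unity is notation introduced here, and $r$ is taken to be the real root without loss of generality, since all roots of $f$ are conjugate.

```latex
\begin{align*}
&f = x^7 - 3 \text{ is irreducible over } \mathbb{Q} \text{ (Eisenstein, } p = 3\text{), so } [\mathbb{Q}(r) : \mathbb{Q}] = 7. \\
&\text{The roots of } f \text{ are } r\,\zeta_7^{k},\ k = 0, \dots, 6, \text{ with } r = \sqrt[7]{3} \in \mathbb{R}. \\
&\mathbb{Q}(r) \subset \mathbb{R} \text{ contains only the real root, so } \mathbb{Q}(r)/\mathbb{Q} \text{ is not normal, hence not Galois.} \\
&\text{Subfields of abelian extensions of } \mathbb{Q} \text{ are Galois over } \mathbb{Q}, \text{ so } \mathbb{Q}(r) \not\subset \mathbb{Q}(\zeta_n) \text{ for every } n. \\
&\text{Splitting field: } \mathbb{Q}(r, \zeta_7), \quad [\mathbb{Q}(r, \zeta_7) : \mathbb{Q}] = 7 \cdot 6 = 42, \quad \operatorname{Gal}(f) \cong C_7 \rtimes C_6.
\end{align*}
```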

Using SQL - how do I match an exact number of characters?

My task is to validate existing data in an MSSQL database. I've got some SQL experience, but not enough, apparently. We have a zip code field that must be either 5 or 9 digits (US zip). What we are finding in the zip field are embedded spaces and other oddities that will be prevented in the future. I've searched enough to find the references for LIKE that leave me with this "novice approach":
ZIP NOT LIKE '[0-9][0-9][0-9][0-9][0-9]'
AND ZIP NOT LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
Is this really what I must code? Is there nothing similar to...?
ZIP NOT LIKE '[\d]{5}' AND ZIP NOT LIKE '[\d]{9}'
I will loath validating longer fields! I suppose, ultimately, both code sequences will be equally efficient (or should be).
Thanks for your help
Unfortunately, LIKE is not regex-compatible so nothing of the sort \d. Although, combining a length function with a numeric function may provide an acceptable result:
WHERE ISNUMERIC(ZIP) <> 1 OR LEN(ZIP) NOT IN(5,9)
I would, however, not recommend it, because ISNUMERIC will return 1 for a +, a -, or a valid currency symbol. The minus sign especially may be prevalent in the data set, so I'd still favor your "novice" approach.
Another approach is to use:
ZIP LIKE '%[^0-9]%' OR LEN(ZIP) NOT IN(5,9)
which will find any row where ZIP contains a character that is not 0-9 (i.e. only 0-9 is allowed), or where the length is not 5 or 9.
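For comparison, the rule being validated (exactly 5 or exactly 9 ASCII digits, nothing else) can be sketched outside the database in Python, for example to spot-check an exported sample. The function name `invalid_zips` and the sample values are hypothetical.

```python
import re

# Exactly five or exactly nine ASCII digits; fullmatch() rejects anything extra.
ZIP_OK = re.compile(r"[0-9]{5}|[0-9]{9}")


def invalid_zips(zips):
    """Return the values that are not exactly 5 or 9 digits."""
    return [z for z in zips if not ZIP_OK.fullmatch(z)]


sample = ["12345", "123456789", "1234 ", "12-45", "1234567", "+1234"]
print(invalid_zips(sample))
```

Unlike an ISNUMERIC-style check, this rejects signs, spaces, and currency symbols, because only the characters 0-9 are accepted.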
There are a few ways you could achieve that.
You can replace each [0-9] with _ (note that _ matches any single character, so this checks only the length), like
ZIP NOT LIKE '_____'
Or use LEN(), like
LEN(ZIP) NOT IN(5,9)
You are looking for a length function: LEN() in SQL Server (LENGTH() in some other databases).
select * from table WHERE LEN(ZIP)=5;
select * from table WHERE LEN(ZIP)=9;
To test for non-numeric values you can use ISNUMERIC():
WHERE ISNUMERIC(ZIP) <> 1

Formatted output with leading zeros in Fortran

I have some decimal numbers that I need to write to a text file with leading zeros when appropriate. I've done some research on this, and everything I've seen suggests something like:
REAL VALUE
INTEGER IVALUE
IF (VALUE.LT.0) THEN
IVALUE = CEILING(VALUE)
ELSE
IVALUE = FLOOR(VALUE)
ENDIF
WRITE(*,1) IVALUE, ABS(VALUE)-ABS(IVALUE)
1 FORMAT(I3.3,F5.4)
As I understand it, the IF block and ABS parts should allow this to work for all values in -100 < VALUE < 1000. If I set VALUE = 12.3456, the code above should produce "012.3456" as the output, and it does. However, if I have something like VALUE = -12.3456, I'm getting "(3 asterisks).3456" as my output. I know asterisks usually show up when there are not enough characters provided for in the FORMAT statement, but 3 should be enough in this example (1 character for the "-" and two characters for "12"). I haven't tested this yet with something like VALUE = -9.876, but I'd expect the output to be "-09.8760".
Is there something wrong in my understanding of how this works? Or is there some other limitation of this technique that I'm violating?
UPDATE: Okay I've looked into this some more, and it seems to be a combination of a negative value and the I3.3 format. If VALUE is positive and I have the I3.3, it will put leading zeros as expected. If VALUE is negative and I only have I3 as my format, I get the correct value output, but it will be padded with spaces before the negative sign instead of padded with zeros after the negative (so -9.8765 is output as " -9.8765", but that leading space breaks what I'm using the .txt file for, so it's not acceptable).
The problem is with your integer data edit descriptor. With I3.3 you require at least 3 digits, and the field width is only 3, so there is no place for the minus sign. Use I4.3 or, in Fortran 95 and above, I0.3.
Answer to your edit: Use I0.3, it uses the minimum number of characters necessary.
But finally, you just probably want this: WRITE(*,'(f0.3)') VALUE
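To make the width accounting concrete, here is a small Python emulation of the Iw.m edit descriptor as described in the answer above. It is a simplified sketch, not a full implementation of the Fortran rules (for instance, it ignores the special case of m = 0 with a zero value), and the function name `fortran_int_edit` is hypothetical.

```python
def fortran_int_edit(value: int, w: int, m: int = 1) -> str:
    """Simplified emulation of Fortran's Iw.m integer edit descriptor.

    m is the minimum number of digits (zero padded); w is the total field
    width, which must also hold the minus sign. w == 0 means minimal width.
    """
    digits = str(abs(value)).zfill(m)
    text = ("-" if value < 0 else "") + digits
    if w == 0:
        return text          # I0.m: use exactly as many characters as needed
    if len(text) > w:
        return "*" * w       # field overflow is shown as asterisks
    return text.rjust(w)     # otherwise pad with blanks on the left


print(repr(fortran_int_edit(12, 3, 3)))
print(repr(fortran_int_edit(-12, 3, 3)))
print(repr(fortran_int_edit(-12, 4, 3)))
print(repr(fortran_int_edit(-9, 0, 3)))
```

With I3.3 and a value of -12, three digits plus the sign need four characters, which is why the field overflows; I4.3 or I0.3 leaves room for the sign.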
Of course, I could get what I'm looking for by changing it up a little bit to
REAL VALUE
INTEGER IVALUE
IF (VALUE.LT.0) THEN
WRITE(*,1) FLOOR(ABS(VALUE)), ABS(VALUE)-FLOOR(ABS(VALUE))
1 FORMAT('-',I2.2,F5.4)
ELSE
WRITE(*,2) FLOOR(VALUE), ABS(VALUE)-FLOOR(ABS(VALUE))
2 FORMAT(I3.3,F5.4)
ENDIF
But this feels a lot clunkier, and in reality I'm going to try to be writing multiple values in the same line, which will lead to really messy IF blocks or complex cursor movement, which I'd like to avoid if at all possible.
As another way to skin the cat: I'd prefer not to do arithmetic on the data at all but just work on the format:
character*8 fstring/'(f000.4)'/
val=12.34
if(val.gt.1)then
write(fstring(3:5),'(i0)')6+floor(log10(val))
elseif(val.lt.-1)then
write(fstring(3:5),'(i0)')7+floor(log10(-val))
elseif(val.ge.0)then
write(fstring(3:5),'(i0)')6
else
write(fstring(3:5),'(i0)')7
endif
write(*,fstring)val
Just for fun, with modern Fortran that supports character functions you can roll that up in a function and end up with a construct like this:
write(*,'('//fstring(val1)//','//fstring(val2)//')')val1,val2

Data Cleanup, post conversion from ALLCAPS to Title Case

Converting a database of people and addresses from ALL CAPS to Title Case will create a number of improperly capitalized words/names, some examples follow:
MacDonald, PhD, CPA, III
Does anyone know of an existing script that will clean up all the common problem words? Certainly, it will still leave some mistakes behind (less common names with CamelCase-like spellings, e.g. "MacDonalz").
I don't think it matters much, but the data currently resides in MSSQL. Since this is a one-time job, I'd export out to text if a solution requires it.
There is a thread that posed a related question, sometimes touching on this problem, but not addressing this problem specifically. You can see it here:
SQL Server: Make all UPPER case to Proper Case/Title Case
Don't know if this is of any help
private static function ucNames($surname) {
    // Capitalize the first letter of each word, then fix the letter after a known prefix.
    $replaceValue = ucwords($surname);
    // preg_replace_callback is used because the /e modifier was removed in PHP 7.
    return preg_replace_callback('/
        (?: ^ | \\b )               # assertion: beginning of string or a word boundary
        ( O\' | \- | Ma?c | Fitz )  # attempt to match Irish, Scottish and double-barrelled surnames
        ( [^\W\d_] )                # match next char; we exclude digits and _ from \w
        /x',
        function ($m) { return $m[1] . strtoupper($m[2]); },
        $replaceValue);
}
It's a simple PHP function that I use to set surnames to correct case that works for names like O'Connor, McDonald and MacBeth, FitzPatrick, and double-barrelled names like Hedley-Smythe
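The same approach can be sketched in Python for anyone working on exported text. This is an adaptation, not the answer's code: it lowercases the input first (an added step, assuming ALL CAPS source data), and the name `uc_names` is hypothetical.

```python
import re

# Same prefixes as the PHP pattern: O', hyphen, Mc/Mac, Fitz, followed by a letter.
PREFIX = re.compile(r"(?:^|\b)(O'|-|Ma?c|Fitz)([^\W\d_])")


def uc_names(surname: str) -> str:
    """Title-case a surname and capitalize the letter after a known prefix."""
    words = surname.lower().split(" ")
    # Capitalize only the first character of each space-separated word.
    title = " ".join(w[:1].upper() + w[1:] for w in words)
    return PREFIX.sub(lambda m: m.group(1) + m.group(2).upper(), title)


for name in ["O'CONNOR", "MCDONALD", "MACBETH", "FITZPATRICK", "HEDLEY-SMYTHE"]:
    print(uc_names(name))
```

Like the PHP version, this treats every leading "Mac" as a prefix, so names that merely begin with those letters will also get an inner capital and need manual review.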
Here is the answer I was looking for:
There is a data company, Melissa Data, who publishes some API and applications for database cleanup -- geared mostly around the direct marketing industry.
I was able to use two applications to solve my problem.
StyleList: this app, among other things, converts ALL CAPS to mixed case, and in the process it does not dirty up the data, leaving titles such as CPA, MD, III, etc. intact, as well as natural, common camel-case names such as McDonalds.
Personator: I used Personator to break the Full Name fields into Prefix, First Name, Middle Name, Last Name, and Suffix. To be honest, it was far from perfect, but the data I gave it was pretty challenging (often no space separating a middle name and a suffix). This app does a number of other useful things as well, including assigning gender to most names. It's available as an API you can call, too.
Here is a link to the solutions offered by Melissa Data:
http://www.melissadata.com/dqt/index.htm
For me, the Melissa Data apps did much of the heavy lifting and the remaining dirty data was identifiable and fixable in SQL by reporting on LEFT x or RIGHT x counts -- the dirt typically has the least uniqueness, patterns easily discovered and fixed.
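The "LEFT x or RIGHT x counts" idea can be sketched in Python on an exported column: count how often each leading or trailing fragment occurs and review the most frequent ones. The function name `edge_counts` and the sample values are hypothetical, and this is only an approximation of the SQL reporting described above.

```python
from collections import Counter


def edge_counts(values, n=3, side="right"):
    """Count the first or last n characters of each value, most common first."""
    keys = (v[-n:] if side == "right" else v[:n] for v in values)
    return Counter(keys).most_common()


names = ["Smith Jr", "Jones Jr", "Lee Iii"]
print(edge_counts(names, n=2))
```

Fragments that appear many times, such as a miscapitalized suffix, can then be fixed with one targeted update per pattern.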