I have a column, say LINES, with the string patterns below. I want to extract the date from each string. For example, for each line I need the date, i.e. 20201123 or 20201124, whichever the case may be.
Since the dates are in different positions, I can't really use substring for this. How do I go about this? Is there a simpler regex method that I can apply here?
Here is a simple reproducible example for testing.
create volatile table TEST
(LINES VARCHAR(1000) CHARACTER SET LATIN NOT CASESPECIFIC)
ON COMMIT PRESERVE ROWS;
insert into TEST values('path/to/file/OVERALL_GOTO_Datas.20201123.dat');
insert into TEST values('path/to/file/endartstmov20201124.20201124.dat');
insert into TEST values('path/to/file/TESTDEV20201123.20201123.5.0014.CHK.dat');
insert into TEST values('path/to/file/DEVTOTES20201124.20201124.5.0109.CHK.dat');
insert into TEST values('path/to/file/STORE_PARTNER.20201124.20201124.0.0501.CHK.dat');
SELECT * FROM TEST;
Appreciate your responses. Thanks.
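Using Teradata's REGEXP_SUBSTR
You should be able to use this regex:
SELECT REGEXP_SUBSTR(LINES, '(?:\.([0-9]{8})\.)') FROM TEST;
Note that, as far as I know, REGEXP_SUBSTR returns the whole match, so the result includes the surrounding dots (for example .20201123.); strip them afterwards if you need only the digits.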
see : https://regex101.com/r/WRqEmY/2
Another way is with regexp_extract, a Presto function ( https://teradata.github.io/presto/docs/148t/functions/regexp.html ), which returns only the capture group:
SELECT regexp_extract(LINES, '(?:\.([0-9]{8})\.)', 1)
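To check what the capture group picks up before running it against the table, the same pattern can be tried outside the database. This is only a sketch in Python, not Teradata code; the helper name extract_date is made up for illustration, and the sample paths are the ones from the question.

```python
import re

# Eight digits enclosed by literal dots; the capture group keeps only the digits.
DATE_BETWEEN_DOTS = re.compile(r"\.([0-9]{8})\.")

def extract_date(line):
    """Return the first dot-delimited 8-digit run in line, or None if absent."""
    match = DATE_BETWEEN_DOTS.search(line)
    return match.group(1) if match else None

lines = [
    "path/to/file/OVERALL_GOTO_Datas.20201123.dat",
    "path/to/file/endartstmov20201124.20201124.dat",
    "path/to/file/TESTDEV20201123.20201123.5.0014.CHK.dat",
    "path/to/file/DEVTOTES20201124.20201124.5.0109.CHK.dat",
    "path/to/file/STORE_PARTNER.20201124.20201124.0.0501.CHK.dat",
]

for line in lines:
    print(extract_date(line))
```

The pattern only checks for eight digits between dots; it does not verify that the digits form a valid calendar date.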
I have a large list of SQL commands such as
SELECT * FROM TEST_TABLE
INSERT .....
UPDATE .....
SELECT * FROM ....
etc. My goal is to parse this list into a set of results so that I can easily determine a good count of how many of these statements are SELECT statements, how many are UPDATES, etc.
so I would be looking at a result set such as
SELECT 2
INSERT 1
UPDATE 1
...
I figured I could do this with regex, but I'm a bit lost beyond simply looking at every string and comparing against 'SELECT' as a prefix, which can run into multiple issues. Is there any other way to do this using regex?
You can add the SQL statements to a table and run them through a SQL query. If the SQL text is in a column called SQL_TEXT, you can get the SQL command type using this:
upper(regexp_substr(trim(regexp_replace(SQL_TEXT, '\\s', ' ')),
'^([\\w\\-]+)')) as COMMAND_TYPE
You'll need to do some cleanup to create a column that indicates the type of statement you have. The rest is just basic aggregation:
with cte as
(select *, trim(lower(split_part(regexp_replace(col, '\\s', ' '),' ',1))) as statement
from t)
select statement, count(*) as freq
from cte
group by statement;
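If the statements live in a file or list rather than a table, the same "first word, then aggregate" idea can be sketched outside the database. The following Python is an illustration under that assumption; count_statement_types is a hypothetical helper name, and the sample statements are the ones from the question.

```python
import re
from collections import Counter

# Leading whitespace is skipped, then the first run of word characters
# (or hyphens) is taken as the command type.
FIRST_WORD = re.compile(r"^\s*([\w-]+)")

def count_statement_types(statements):
    """Count statements by their upper-cased first keyword."""
    counts = Counter()
    for sql in statements:
        match = FIRST_WORD.match(sql)
        if match:
            counts[match.group(1).upper()] += 1
    return counts

statements = [
    "SELECT * FROM TEST_TABLE",
    "INSERT .....",
    "UPDATE .....",
    "  select * FROM ....",
]

for command, freq in count_statement_types(statements).items():
    print(command, freq)
```

Like the SQL versions, this only looks at the first token, so a statement that starts with a comment or a WITH clause would be counted under that token rather than under SELECT.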
SQL is a language and needs a parser to turn it from text into a structure. Regular expressions can only do part of the work (such as lexing).
Regular Expression Vs. String Parsing
You will have to limit your ambition if you want to restrict yourself to using regular expressions.
Still you can get some distance if you so want. A quick search found this random example of tokenizing MySQL SQL statements using regex https://swanhart.livejournal.com/130191.html
Table contains data as below
Table Name is REGISTER
Column Name is EXAM_CODE
Values like ('S6TJ','S7','S26','S24')
I want answer like below
Result set - > (6,7,26,24)
Please suggest a solution, since regexp_replace is not a recognized built-in function name in SQL.
The complexity of the answer depends on two things: the RDBMS used and whether the numbers in the EXAM_CODE are contiguous.
I have assumed that the RDBMS is SQL Server and the numbers in EXAM_CODE are always contiguous. If not, please advise and I can revise the answer.
The following SQL shows a way of accomplishing the above using PATINDEX:
CREATE TABLE #REGISTER (EXAM_CODE VARCHAR(10));
INSERT INTO #REGISTER VALUES ('S6TJ'),('S7'),('S26'),('S24');
SELECT LEFT(EXAM_CODE, PATINDEX('%[^0-9]%', EXAM_CODE) - 1)
FROM (
SELECT RIGHT(EXAM_CODE, LEN(EXAM_CODE) - PATINDEX('%[0-9]%', EXAM_CODE) + 1) + 'A' AS EXAM_CODE
FROM #REGISTER
) a
DROP TABLE #REGISTER
This outputs:
6
7
26
24
PATINDEX returns the starting position of the first match of a pattern within a string (or 0 if there is no match).
Using this, the inner query fetches the part of the string that starts at the first digit. The outer query then strips any text that follows the number.
Note: The character A is appended to the result of the inner query in order to ensure that the PATINDEX check in the outer query will make a match. Otherwise, PATINDEX would return 0 and an error would occur.
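For comparison, the two PATINDEX steps (skip to the first digit, then cut at the first non-digit) amount to taking the first contiguous run of digits. A minimal Python sketch of that logic, with the hypothetical helper name first_number and the sample values from the question:

```python
import re

def first_number(code):
    """Return the first contiguous run of digits in code, or None if there is none."""
    match = re.search(r"[0-9]+", code)
    return match.group(0) if match else None

codes = ["S6TJ", "S7", "S26", "S24"]
print([first_number(code) for code in codes])
```

Unlike the T-SQL version, no sentinel character is needed here, because the regex simply stops at the end of the string when nothing follows the digits.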
I am using the following query to get any records that begin with any character, have a bunch of 0s, and end with a number (1 in this case).
where column like '_%[0]1'
But the issue is that it also returns d0101 etc., which I don't want. I just want d0001 or r0001. Can I use LIKE to match the pattern exactly rather than partially?
Any other options in ms-sql?
SQL Server does not really support proper regular expressions, but you can build the search clause you want like this:
where column like '_%1' and column not like '_%[^0]%1'
The second condition will exclude all cases where you have a character other than 0 in the middle of the string.
It will allow strings of all possible lengths, provided they start with an arbitrary character, then have any number of 0s and finish with a 1. All other strings will not satisfy the where clause.
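The rule described here (one arbitrary character, then any number of 0s, then a final 1) is what a regex engine would express as .0*1 matched against the whole string. The Python sketch below is only meant to make that rule concrete and to test candidate values; it is not SQL Server syntax, and the helper name is_zero_padded_one is made up.

```python
import re

# One arbitrary character, zero or more 0s, then a final 1.
ZERO_PADDED_ONE = re.compile(r".0*1")

def is_zero_padded_one(value):
    """True if the whole string is <any char><0s><1>."""
    return ZERO_PADDED_ONE.fullmatch(value) is not None

for value in ["d0101", "d0001", "r0001", "d1"]:
    print(value, is_zero_padded_one(value))
```

Note that "any number of 0s" includes none at all, so a value such as d1 passes both this check and the pair of LIKE conditions.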
create table tst(t varchar(10));
insert into tst values('d0101');
insert into tst values('d0001');
insert into tst values('r0001');
select * from tst where PATINDEX('%00%1', t)>0
or
select * from tst where t like '%00%1'
You use the _ to say that you don't care what char is there (single char) and then use the rest of the string you want:
DECLARE @t TABLE (val VARCHAR(100))
INSERT INTO @t
VALUES
('d0001'),
('f0001'),
('e0005'),
('e0001')
SELECT *
FROM @t
WHERE val LIKE '_0001'
This code only really handles your two simple examples. If it is more complex, add it to your post.
I want to add a condition to my query requiring that the second character of a column be a letter.
How to achieve this?
I've tried with _[A-Z]% in where clause but is not working. I've also tried [A-Z]%.
Any inputs please?
I think you want a MySQL query, like this (the pattern only constrains the second character; anything may follow):
SELECT * FROM table WHERE column REGEXP '^.[A-Za-z]'
or, for SQL Server:
select * from table where column like '_[a-zA-Z]%'
You can use regular expression matching in your query. For example:
SELECT * FROM `test` WHERE `name` REGEXP '^.[a-zA-Z].*';
That matches the name column of the test table against a regex that checks whether the second character is a lowercase or uppercase letter.
Also see this SQL Fiddle for an example of data it does and doesn't match.
Agree with @Gordon Linoff: your '_[A-Z]%' should work.
If it does not, kindly add some sample data to your question.
Declare @Table Table
(
TextCol Varchar(20)
)
Insert Into @Table(TextCol) Values
('23423cvxc43f')
,('2eD97S9')
,('sAgsdsf')
,('3Ss08008')
Select *
From @Table As t
Where t.TextCol Like '_[A-Z]%'
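To see which of these sample rows have a letter in the second position, the check can be reproduced outside the database. The Python sketch below uses the regex form from the MySQL answers rather than LIKE; the helper name second_char_is_letter is hypothetical. It accepts both cases explicitly, whereas whether SQL Server's [A-Z] also matches lowercase letters depends on the column's collation.

```python
import re

# Any first character, then an ASCII letter in the second position.
SECOND_IS_LETTER = re.compile(r"^.[A-Za-z]")

def second_char_is_letter(text):
    """True if the second character of text is an ASCII letter."""
    return SECOND_IS_LETTER.match(text) is not None

rows = ["23423cvxc43f", "2eD97S9", "sAgsdsf", "3Ss08008"]
for row in rows:
    print(row, second_char_is_letter(row))
```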
The use of '%[A-Z]%' suggests that you are using SQL Server. If so, you can do this using LIKE:
where col like '_[A-Z]%'
For LIKE patterns, _ represents any single character. If the first character needs to be a digit:
where col like '[0-9][A-Z]%'
EDIT:
The above doesn't work in DB2. Instead:
where substr(col, 2, 1) between 'A' and 'Z'
I have a table with a column containing strings that look like this:
static-text-here/1abcdefg1abcdefgpxq
In this string, 1abcdefg is repeated twice, so I want to remove one copy of that partial string and return:
static-text-here/1abcdefgpxq
I can make no guarantees about the length of the repeat string. In pure SQL, how can this operation be performed?
regexp_replace('static-text-here/1abcdefg1abcdefgpxq', '/(.*)\1', '/\1')
fiddle
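The backreference does the work here: (.*) captures a candidate after the slash, and \1 requires that the same text follows immediately. The Python sketch below runs the same pattern through re.sub to show the effect; it is an illustration only, the helper name drop_repeat is made up, and regex dialects differ between databases.

```python
import re

def drop_repeat(text):
    """Collapse an immediately repeated substring that follows a slash."""
    # (.*) is greedy, so it settles on the longest text that is
    # directly followed by a copy of itself (possibly the empty string).
    return re.sub(r"/(.*)\1", r"/\1", text)

print(drop_repeat("static-text-here/1abcdefg1abcdefgpxq"))
print(drop_repeat("static-text-here/pxq"))
```

When nothing is repeated, the group matches the empty string and the text comes back unchanged. Any doubled substring right after a slash is collapsed, including short ones such as a doubled letter.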
If you can guarantee a minimum length of the repeated string, something like this would work:
select REGEXP_REPLACE
(input,
'(.{10,})(.*?)\1+',
'\1') "Less one repetition"
from tablename tn where ...;
I believe this can be expanded to meet your case with some cleverness.
It seems to me that you might be pushing SQL beyond what it was designed for. Could you handle this programmatically in another layer of the application, outside the data layer, where this kind of string manipulation is easier?
The REPLACE function should be enough to solve the problem.
Test table:
CREATE TABLE test (text varchar(100));
INSERT INTO test (text) VALUES ('pxq');
INSERT INTO test (text) VALUES ('static-text-here/pxq');
INSERT INTO test (text) VALUES ('static-text-here/1abcdefgpxq');
INSERT INTO test (text) VALUES ('static-text-here/1abcdefg1abcdefgpxq');
Query:
SELECT text, REPLACE(text, '1abcdefg1abcdefg', '1abcdefg') AS text2
FROM test;
Result:
TEXT                                  TEXT2
pxq                                   pxq
static-text-here/pxq                  static-text-here/pxq
static-text-here/1abcdefgpxq          static-text-here/1abcdefgpxq
static-text-here/1abcdefg1abcdefgpxq  static-text-here/1abcdefgpxq
AFAIK the REPLACE function is not in the SQL99 standard, but most DBMSs support it. I tested it here, and it works with MySQL, PostgreSQL, SQLite, Oracle and MS SQL Server.
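The literal-replacement approach only works when the repeated string is known in advance, which the question says cannot be guaranteed. A small Python sketch of the same literal replace, using the test rows above (the helper name collapse_known_repeat and its default argument are assumptions for illustration):

```python
def collapse_known_repeat(text, unit="1abcdefg"):
    """Replace a doubled occurrence of a known unit with a single one."""
    return text.replace(unit + unit, unit)

rows = [
    "pxq",
    "static-text-here/pxq",
    "static-text-here/1abcdefgpxq",
    "static-text-here/1abcdefg1abcdefgpxq",
]
for row in rows:
    print(row, collapse_known_repeat(row))
```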