Extract the number from the Title column in SQL

Please help me with a query that extracts the number after "_tid-" from the Title column.

Use Vertica's regular expression functions. Vertica regular expressions behave like Perl regular expressions.
So, with your input, search for the first group of consecutive digits (\d+) that follows _tid-, and return that captured group:
WITH
-- your input, don't use in final query ...
indata(Title,Extract_ID) AS (
SELECT 'sdffsdvprocessortype%3Alnjklel&text=&textSearch=&pageSize=10&SSid=psj6tcd5b1g7_tid-87945','87945'
UNION ALL SELECT 'https://www.google.com/hk/en/age.html?SSid=ps_c8x4v1r2a3_tid-8952777456','8952777456'
UNION ALL SELECT 'https://www.google.com/hk/en/ge/dhci.html?SSid=ps_7228fk5sbh_tid-5879','5879'
UNION ALL SELECT 'https://www.google.com/fr/fr/c/3328412?q=%3%3AtSearch=&pageSize=10&SSid=pseydgg8h2_tid-9858867','9858867'
UNION ALL SELECT 'https://www.google.com/fr/fr/seas/1011028701?SSid=ps_yne5j6fmqv_tid-6879582','6879582'
UNION ALL SELECT 'https://www.google.com/il/en/sera/p/1010192786?SSid=ps_gydi5nk673_tid-5577484126','5577484126'
UNION ALL SELECT 'pid=ps_98qcokfh3_tid-548965&q=%3Arelevance%3Afacet_Processorstype','548965'
UNION ALL SELECT 'pid=ps_345ey5na9_tid-95861469rq=%3relevance%3Afacet_Processo','95861469'
UNION ALL SELECT 'npyamjhsgx_tid-002154785%20/p/1010192775?SSid=ps_npyamjhsgx_tid-002154785','002154785'
UNION ALL SELECT 'https://www.google.com/us/en/ke.html?ssid=ps_wc998kn__tid-0012889','0012889'
)
-- end of your input, real query starts here ...
SELECT
REGEXP_SUBSTR(
title -- input string
, '_tid-(\d+)' -- regular expression (note the part in parentheses: that's the first group)
, 1 -- starting position
, 1 -- occurrence number
, '' -- modifiers (case-insensitive matching, etc.; see the Perl documentation)
, 1 -- number of the parenthesized subexpression (capture group) to return
) AS calc_extract
, *
FROM indata;
-- out calc_extract | Title | Extract_ID
-- out --------------+------------------------------------------------------------------------------------------------+------------
-- out 87945 | sdffsdvprocessortype%3Alnjklel&text=&textSearch=&pageSize=10&SSid=psj6tcd5b1g7_tid-87945 | 87945
-- out 8952777456 | https://www.google.com/hk/en/age.html?SSid=ps_c8x4v1r2a3_tid-8952777456 | 8952777456
-- out 5879 | https://www.google.com/hk/en/ge/dhci.html?SSid=ps_7228fk5sbh_tid-5879 | 5879
-- out 9858867 | https://www.google.com/fr/fr/c/3328412?q=%3%3AtSearch=&pageSize=10&SSid=pseydgg8h2_tid-9858867 | 9858867
-- out 6879582 | https://www.google.com/fr/fr/seas/1011028701?SSid=ps_yne5j6fmqv_tid-6879582 | 6879582
-- out 5577484126 | https://www.google.com/il/en/sera/p/1010192786?SSid=ps_gydi5nk673_tid-5577484126 | 5577484126
-- out 548965 | pid=ps_98qcokfh3_tid-548965&q=%3Arelevance%3Afacet_Processorstype | 548965
-- out 95861469 | pid=ps_345ey5na9_tid-95861469rq=%3relevance%3Afacet_Processo | 95861469
-- out 002154785 | npyamjhsgx_tid-002154785%20/p/1010192775?SSid=ps_npyamjhsgx_tid-002154785 | 002154785
-- out 0012889 | https://www.google.com/us/en/ke.html?ssid=ps_wc998kn__tid-0012889 | 0012889
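To sanity-check the pattern outside the database, here is a minimal Python sketch using the standard re module, which uses the same Perl-style syntax for this pattern. extract_tid is a hypothetical helper name; it only approximates REGEXP_SUBSTR(title, '_tid-(\d+)', 1, 1, '', 1) and returns None where the SQL function would be expected to return NULL.

```python
import re

# Same pattern as the Vertica query: "_tid-" followed by a captured run of digits.
TID_PATTERN = re.compile(r"_tid-(\d+)")


def extract_tid(title):
    """Hypothetical helper: return the digits after the first "_tid-",
    or None when there is no match."""
    match = TID_PATTERN.search(title)  # search() finds the first occurrence
    return match.group(1) if match else None  # group 1 = the parenthesized part


samples = [
    "pid=ps_345ey5na9_tid-95861469rq=%3relevance%3Afacet_Processo",
    "npyamjhsgx_tid-002154785%20/p/1010192775?SSid=ps_npyamjhsgx_tid-002154785",
    "https://www.google.com/us/en/ke.html?ssid=ps_wc998kn__tid-0012889",
]
for s in samples:
    print(extract_tid(s))
```

Because the result is a string, leading zeros such as in 002154785 are preserved, just as in the calc_extract column above.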


Sort each character in a string from a specific column in Snowflake SQL

I am trying to sort the characters of each value in a column alphabetically in Snowflake. For example, I have:
| NAME |
| ---- |
| abc |
| bca |
| acb |
and want
| NAME |
| ---- |
| abc |
| abc |
| abc |
How would I go about doing that? I've tried using SPLIT and then ordering the rows, but that doesn't seem to work without a specific delimiter.
Use REGEXP_REPLACE to insert a separator after each character, STRTOK_SPLIT_TO_TABLE to turn the individual letters into rows, and LISTAGG to combine them again into a sorted string:
SELECT tab.col, LISTAGG(s.value) WITHIN GROUP (ORDER BY s.value) AS result
FROM tab
, TABLE(STRTOK_SPLIT_TO_TABLE(REGEXP_REPLACE(tab.col, '(.)', '\\1~'), '~')) AS s
GROUP BY tab.col;
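The same three steps can be traced in plain Python as a rough sketch (sort_chars is a hypothetical name, and the string operations only stand in for the Snowflake functions): re.sub plays the role of REGEXP_REPLACE, splitting on "~" stands in for STRTOK_SPLIT_TO_TABLE, and sorting then joining corresponds to LISTAGG ... WITHIN GROUP (ORDER BY ...). As in the SQL, a value that already contains "~" would be split incorrectly.

```python
import re


def sort_chars(value):
    """Hypothetical helper mirroring the REGEXP_REPLACE / split / LISTAGG pipeline."""
    # '(.)' -> '\1~': append the separator after every character, e.g. "bca" -> "b~c~a~"
    separated = re.sub(r"(.)", r"\1~", value)
    # Split on the separator; drop the empty piece after the final "~"
    # (it would add nothing to the concatenation anyway).
    letters = [tok for tok in separated.split("~") if tok]
    # Concatenate the letters in sorted order.
    return "".join(sorted(letters))


for col in ["abc", "bca", "acb"]:
    print(col, sort_chars(col))
```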
For sample data:
CREATE OR REPLACE TABLE tab
AS
SELECT 'abc' AS col UNION
SELECT 'bca' UNION
SELECT 'acb';
A similar implementation to Lukasz's, but using REGEXP_EXTRACT_ALL to extract the individual characters as an array, which LATERAL FLATTEN then splits into rows. LISTAGG stitches them back together in the order specified in the WITHIN GROUP clause.
with cte (col) as
(select 'abc' union
select 'bca' union
select 'acb')
select col, listagg(b.value) within group (order by b.value) as col2
from cte, lateral flatten(regexp_extract_all(col,'.')) b
group by col;

Oracle SQL: Merge two results in 1 row

I'm currently writing a SQL query to get all records from two tables that are connected via an ID. Is there any way to return the results in 1 row when multiple records in a different table are linked to 1 ID? Below are my SQL query, the current result, and the expected result.
Current query:
SELECT
'A' AS "actionIndicator", 'A' AS "target",
crdExpt.CRD_PAN,
acnExpt.ACN_ATP_ID, acnExpt.ACN_ACCOUNT_NUMBER
FROM
tbl1 crdExpt, tbl2 acnExpt, tbl3 crdAcnExpt
Here tbl1 holds the card numbers, tbl2 holds the account numbers, and tbl3 links card numbers to account numbers.
The current result looks like this:
CRD_PAN | ACN_ATP_ID| ACN_ACCOUNT_NUMBER
123456789 | 23 | 99112345678
123456789 | 24 | 99012345678
What I'm trying to achieve: if there are 2 account numbers linked to 1 card, the expected output is:
CRD_PAN | ACN_ATP_ID| ACN_ACCOUNT_NUMBER |ACN_ATP_ID2 | ACN_ACCOUNT_NUMBER2
123456789 | 23 | 99112345678 | 24 | 99012345678
By OP request in the comments:
I used the following example data (the result of your query) on SQL Fiddle:
CREATE TABLE test(
CRD_PAN VARCHAR(256),
ACN_ATP_ID VARCHAR(256),
ACN_ACCOUNT_NUMBER VARCHAR(256)
);
INSERT INTO test(CRD_PAN, ACN_ATP_ID, ACN_ACCOUNT_NUMBER)
SELECT '123456789', '23', '99112345678' FROM DUAL
UNION ALL
SELECT '123456789', '24', '99012345678' FROM DUAL
;
From there, I ran the following query:
SELECT
CRD_PAN,
LISTAGG(ACN_ATP_ID, ', ') WITHIN GROUP (ORDER BY CRD_PAN) AS ACN_ATP_ID,
LISTAGG(ACN_ACCOUNT_NUMBER, ',') WITHIN GROUP (ORDER BY CRD_PAN) AS ACN_ATP_ID
FROM
test
GROUP BY
CRD_PAN
Which gave me:
| CRD_PAN | ACN_ATP_ID | ACN_ATP_ID |
|:---------:|:----------:|:-----------------------:|
| 123456789 | 23, 24 | 99012345678,99112345678 |
So, I believe a solution could be:
WITH
test AS (
SELECT
'A' AS "actionIndicator", 'A' AS "target",
crdExpt.CRD_PAN,
acnExpt.ACN_ATP_ID, acnExpt.ACN_ACCOUNT_NUMBER
FROM tbl1 crdExpt, tbl2 acnExpt, tbl3 crdAcnExpt
),
listdata AS (
SELECT
CRD_PAN,
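LISTAGG(ACN_ATP_ID, ', ') WITHIN GROUP (ORDER BY ACN_ATP_ID) AS ACN_ATP_ID,
-- same ORDER BY in both calls keeps the two lists aligned; each column needs its own alias
LISTAGG(ACN_ACCOUNT_NUMBER, ',') WITHIN GROUP (ORDER BY ACN_ATP_ID) AS ACN_ACCOUNT_NUMBER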
FROM
test
GROUP BY
CRD_PAN
)
SELECT * FROM listdata
The LISTAGG function collapses multiple rows into one, separating the values with a delimiter of your choice (I used a comma). The WITH clause captures your data in a subquery, aggregates it, and then returns it.
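For illustration only, here is a Python sketch of what the grouped LISTAGG produces, assuming both lists are ordered by ACN_ATP_ID so that the n-th account number belongs to the n-th ATP ID. listagg_by_card is a hypothetical name and the rows are the example data above; string sorting is used because the columns are VARCHAR.

```python
from itertools import groupby
from operator import itemgetter

# (CRD_PAN, ACN_ATP_ID, ACN_ACCOUNT_NUMBER), the example data from above
rows = [
    ("123456789", "23", "99112345678"),
    ("123456789", "24", "99012345678"),
]


def listagg_by_card(rows):
    """Hypothetical helper: group by card number and join IDs and account
    numbers, both ordered by ATP ID so the two lists stay aligned."""
    result = {}
    # groupby needs input sorted by the key; the secondary sort on the ATP ID
    # plays the role of WITHIN GROUP (ORDER BY ACN_ATP_ID).
    ordered = sorted(rows, key=itemgetter(0, 1))
    for pan, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        result[pan] = (
            ", ".join(r[1] for r in group),  # LISTAGG(ACN_ATP_ID, ', ')
            ",".join(r[2] for r in group),   # LISTAGG(ACN_ACCOUNT_NUMBER, ',')
        )
    return result


print(listagg_by_card(rows))
```

The input order of the rows does not matter here, because the explicit sort fixes the order within each group.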

Separate phone numbers from string in cell - random order

I have a bunch of data in which each value contains a phone number and a birthday, sometimes along with other values.
{1997-06-28,07742367858}
{07791100873,1996-07-14}
{30/01/1997,07974335488}
{1997-04-04,07701003703}
{1996-03-11,07480227283}
{1998-06-20,07713817233}
{1996-09-13,07435148920}
{"21 03 2000",07548542539,1st}
{1996-03-09,07539248008}
{07484642432,1996-03-01}
I am trying to extract the phone number from this, but I'm unsure how to get it out when the data is not always in the same order.
I would expect one column that returns the phone number, a second that returns the birthday, and a third that returns any other value.
You can sort the parts of each string by the number of digits they contain. The number of digits can be computed with an expression like this:
select length(regexp_replace('1997-06-28', '\D', '', 'g'))
length
--------
8
(1 row)
The query removes curly brackets from strings, splits them by comma, sorts elements by the number of digits and aggregates back to arrays:
with my_data(str) as (
values
('{1997-06-28,07742367858}'),
('{07791100873,1996-07-14}'),
('{30/01/1997,07974335488}'),
('{1997-04-04,07701003703}'),
('{1996-03-11,07480227283}'),
('{1998-06-20,07713817233}'),
('{1996-09-13,07435148920}'),
('{"21 03 2000",07548542539,1st}'),
('{1996-03-09,07539248008}'),
('{07484642432,1996-03-01}')
)
select id, array_agg(elem order by length(regexp_replace(elem, '\D', '', 'g')) desc)
from (
select id, trim(unnest(string_to_array(str, ',')), '"') as elem
from (
select trim(str, '{}') as str, row_number() over () as id
from my_data
) s
) s
group by id
Result:
id | array_agg
----+--------------------------------
1 | {07742367858,1997-06-28}
2 | {07791100873,1996-07-14}
3 | {07974335488,30/01/1997}
4 | {07701003703,1997-04-04}
5 | {07480227283,1996-03-11}
6 | {07713817233,1998-06-20}
7 | {07435148920,1996-09-13}
8 | {07548542539,"21 03 2000",1st}
9 | {07539248008,1996-03-09}
10 | {07484642432,1996-03-01}
(10 rows)
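The ordering idea is easy to check outside Postgres. This is a minimal Python sketch (split_by_digit_count is a hypothetical name) that strips the braces, splits on commas, removes double quotes, and sorts the parts by digit count, most digits first. Like the SQL, it assumes no value contains a comma inside its quotes.

```python
import re


def split_by_digit_count(cell):
    """Hypothetical helper: order the parts of one cell by how many digits
    each contains, descending (phone number, then date, then anything else)."""
    parts = [p.strip('"') for p in cell.strip("{}").split(",")]
    # re.sub(r"\D", "", p) keeps only the digits, like regexp_replace(elem, '\D', '', 'g')
    return sorted(parts, key=lambda p: len(re.sub(r"\D", "", p)), reverse=True)


print(split_by_digit_count('{07791100873,1996-07-14}'))
print(split_by_digit_count('{"21 03 2000",07548542539,1st}'))
```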
See also the answer to "Looking for solution to swap position of date format DMY to YMD" if you want to normalize the dates. The function from that answer needs to be modified as follows:
create or replace function iso_date(text)
returns date language sql immutable as $$
select case
when $1 like '__/__/____' then to_date($1, 'DD/MM/YYYY')
when $1 like '____/__/__' then to_date($1, 'YYYY/MM/DD')
when $1 like '____-__-__' then to_date($1, 'YYYY-MM-DD')
when trim($1, '"') like '__ __ ____' then to_date(trim($1, '"'), 'DD MM YYYY')
end
$$;
and use it:
select id, a[1] as phone, iso_date(a[2]) as birthday, a[3] as comment
from (
select id, array_agg(elem order by length(regexp_replace(elem, '\D', '', 'g')) desc) as a
from (
select id, trim(unnest(string_to_array(str, ',')), '"') as elem
from (
select trim(str, '{}') as str, row_number() over () as id
from my_data
) s
) s
group by id
) s
Result:
id | phone | birthday | comment
----+-------------+------------+---------
1 | 07742367858 | 1997-06-28 |
2 | 07791100873 | 1996-07-14 |
3 | 07974335488 | 1997-01-30 |
4 | 07701003703 | 1997-04-04 |
5 | 07480227283 | 1996-03-11 |
6 | 07713817233 | 1998-06-20 |
7 | 07435148920 | 1996-09-13 |
8 | 07548542539 | 2000-03-21 | 1st
9 | 07539248008 | 1996-03-09 |
10 | 07484642432 | 1996-03-01 |
(10 rows)
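As a rough cross-check of iso_date, the same shape-then-parse logic can be sketched in Python. The name iso_date is reused for illustration only. Each regular expression mirrors one LIKE pattern ("." standing in for LIKE's "_"), and None plays the role of the CASE expression falling through to NULL. Unlike the SQL, this sketch strips double quotes before checking every shape, not only the last one.

```python
import re
from datetime import datetime

# (shape, strptime format) pairs, one per WHEN branch of the SQL function.
_SHAPES = [
    (r"../../....", "%d/%m/%Y"),  # '__/__/____' -> DD/MM/YYYY
    (r"..../../..", "%Y/%m/%d"),  # '____/__/__' -> YYYY/MM/DD
    (r"....-..-..", "%Y-%m-%d"),  # '____-__-__' -> YYYY-MM-DD
    (r".. .. ....", "%d %m %Y"),  # '__ __ ____' -> DD MM YYYY
]


def iso_date(value):
    """Sketch of the SQL iso_date(): return a date, or None if no shape matches."""
    value = value.strip('"')
    for shape, fmt in _SHAPES:
        if re.fullmatch(shape, value):
            return datetime.strptime(value, fmt).date()
    return None


print(iso_date("30/01/1997"))
print(iso_date('"21 03 2000"'))
print(iso_date("1st"))
```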

Sort the digits of a numerical string

I need to sort all the digits of some string values in Postgres.
For instance:
"70005" ==> "00057"
"70001" ==> "00017"
"32451" ==> "12345"
I can't cast the strings to integer or bigint due to my logic limitations. Is it possible to do this?
Use a recursive CTE. Take the first character: if it is '0', ignore it; otherwise prepend it to the target string.
Then use LPAD to left-pad the result with '0' up to a length of 10.
SQL DEMO
WITH RECURSIVE cte (id, source, target) as (
SELECT 1 as id, '70001' as source , '' as target
UNION
SELECT 2 as id, '70005' as source , '' as target
UNION ALL
SELECT id,
substring(source from 2 for length(source)-1) as source,
CASE WHEN substring(source from 1 for 1) = '0' THEN target
ELSE substring(source from 1 for 1) || target
END
FROM cte
WHERE length(source) > 0
), reverse as (
SELECT id,
target,
row_number() over (partition by id
order by length(target) desc) rn
FROM cte
)
SELECT id, LPAD(target::text, 10, '0')
FROM reverse
WHERE rn = 1
OUTPUT
| id | lpad |
|----|------------|
| 1 | 0000000017 |
| 2 | 0000000057 |
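Tracing the recursive CTE step by step in Python shows what it computes. This is a sketch with a hypothetical function name, not part of the original answer. Because every non-zero character is prepended to the target, the digits come out reversed rather than sorted: this matches the sample values '70001' and '70005', but an input like '32451' yields '0000015423', and the fixed width of 10 differs from the 5-character results in the question.

```python
def mirror_recursive_cte(source, width=10):
    """Hypothetical trace of the recursive CTE above: drop '0' characters,
    prepend every other character to the target, then left-pad with '0'."""
    target = ""
    for ch in source:  # each recursion step consumes the first character
        if ch != "0":
            target = ch + target  # substring(source from 1 for 1) || target
    # LPAD(target, 10, '0'); LPAD would also truncate longer strings, rjust does not,
    # which makes no difference for these short inputs.
    return target.rjust(width, "0")


for s in ["70001", "70005", "32451"]:
    print(s, mirror_recursive_cte(s))
```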
Assuming that your data is organized like this:
Table: strings
| id | string |
|----+---------|
| 1 | '70005' |
| 2 | '70001' |
etc...
Then you can use a query like this:
SELECT all_digits.id,
array_to_string(array_agg(all_digits.digit ORDER BY all_digits.digit), '')
FROM (
SELECT strings.id, digits.digit
FROM strings, unnest(string_to_array(strings.string, NULL)) digits(digit)
) all_digits
GROUP BY all_digits.id
This query splits each string into one row per character, sorts those characters, and then aggregates them back into a string.
There's a SQL fiddle here: http://sqlfiddle.com/#!15/7f7fb0/14
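The same split, sort, and aggregate steps fit in one line of Python; this is only a sketch with a hypothetical function name. Because the characters are compared as text, leading zeros are kept and no integer cast is needed.

```python
def sort_digits(value):
    """Hypothetical helper: split into characters (like string_to_array(value, NULL)),
    sort them, and join them back (like array_agg ... ORDER BY + array_to_string)."""
    return "".join(sorted(value))


for s in ["70005", "70001", "32451"]:
    print(s, sort_digits(s))
```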

Regular Expression - Get specific group value in SQL

I have the following regular expression problem.
Input:
123_321_009
3111_00_001
5123_123
555
666_A66
777_B77_777
Output request as below:
123_321
3111_00
5123
555
666_A66
777_B77
Is there any way to produce the output above?
I tried the pattern below, but I don't know how to get the values I need:
^(.*?)\\s?([_0-9])?$
Values appearing after the last underscore are not needed.
You can use REGEXP_REPLACE to remove the last underscore together with the digits that follow it.
SQL Fiddle
Query:
with x(y) as (
select '123_321_009' from dual union all
select '3111_00_001' from dual union all
select '5123_123' from dual union all
select '666_A66' from dual union all
select '777_B77_777' from dual union all
select '555' from dual
)
select y, regexp_replace(y,'_\d+$') substr
from x
Results:
| Y | SUBSTR |
|-------------|---------|
| 123_321_009 | 123_321 |
| 3111_00_001 | 3111_00 |
| 5123_123 | 5123 |
| 666_A66 | 666_A66 |
| 777_B77_777 | 777_B77 |
| 555 | 555 |
Pattern:
_ --matches an underscore
\d+ --matches one or more digits
$ --matches end of the string
Effectively, this matches the last underscore and the digits after it at the end of the string. The third parameter of REGEXP_REPLACE (the replacement string) is omitted, so the match is replaced with nothing, i.e. removed.
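The pattern can be checked against all six sample values with Python's re module; this is a sketch, and drop_numeric_suffix is a hypothetical name. Replacing the match with an empty string corresponds to omitting the replacement argument in Oracle's REGEXP_REPLACE.

```python
import re

samples = ["123_321_009", "3111_00_001", "5123_123", "666_A66", "777_B77_777", "555"]


def drop_numeric_suffix(value):
    """Hypothetical helper: remove "_" plus one or more digits at the end of the string."""
    return re.sub(r"_\d+$", "", value)


for s in samples:
    print(s, drop_numeric_suffix(s))
```

666_A66 is left alone because the characters after its last underscore are not all digits, and 555 has no underscore at all.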
Since you are using Oracle, you can use regexp_replace(val, pattern, '') with either of the following patterns, both of which satisfy the samples you provided:
_[0-9]{3}$
_[0-9]*$
Here is a demonstration of this approach using SQL*Plus:
WITH tab(num_val) AS
( SELECT '123_321_009' FROM dual
UNION ALL
SELECT '3111_00_001' FROM dual
UNION ALL
SELECT '5123_123' FROM dual
UNION ALL
SELECT '555' FROM dual
)
SELECT tab.num_val,
regexp_replace(tab.num_val,'_[0-9]{3}$') approach_1,
regexp_replace(tab.num_val,'_[0-9]*$') approach_2
FROM tab
/
NUM_VAL APPROACH_1 APPROACH_2
=========== ====================== ==================
123_321_009 123_321 123_321
3111_00_001 3111_00 3111_00
5123_123 5123 5123
555 555 555
With a larger sample (or a more specific rule), a more precise solution could be given.
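The two patterns agree on every sample in the question, so it helps to see where they differ. This Python sketch runs both on two hypothetical inputs that are not from the question: _[0-9]{3}$ only strips exactly three trailing digits, while _[0-9]*$ strips any number of them, including zero, so a bare trailing underscore is removed too.

```python
import re

PATTERNS = {
    "approach_1": r"_[0-9]{3}$",  # underscore + exactly three digits at the end
    "approach_2": r"_[0-9]*$",    # underscore + zero or more digits at the end
}


def strip_suffix(value, pattern):
    """Remove the match, like REGEXP_REPLACE with the replacement omitted."""
    return re.sub(pattern, "", value)


# Hypothetical inputs chosen to show where the two patterns disagree.
for value in ["12_3456", "abc_"]:
    print(value, {name: strip_suffix(value, p) for name, p in PATTERNS.items()})
```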