max consecutive digits in a string - google-bigquery

I am trying to count the number of MAX consecutive digits that appear in a string column, let me give an example to illustrate better what I am trying to do. If I have a table called email
email
lucas1234#gmail.com
fer12#gmail.com
lupal#gmail.com
carlos1perez222#gmail.com
carlos11perez222#gmail.com
lucila1#gmail.com
my expected output would be
email count_cons_digits
lucas1234#gmail.com 4
fer12#gmail.com 2
lupal#gmail.com 0
carlos1perez222#gmail.com 3
carlos11perez222#gmail.com 3
lucila1#gmail.com 1
Note that this question is very similar to:
Number of consecutive digits in a column string
The only difference is that the function in that question's answer does not handle emails with a single digit (like lucila1#gmail.com), where the expected result is 1 but it returns 0. It also fails whenever the email contains two separate runs of consecutive digits (like carlos11perez222#gmail.com), where the expected output is 3 but it returns 5.

Consider the approach below:
select *,
ifnull((select length(digits) len
from unnest(regexp_extract_all(email, r'\d+')) digits
order by len desc
limit 1
), 0) as count_cons_digits
from your_table
Applied to the sample data in your question, this returns 4, 2, 0, 3, 3 and 1, matching the expected output.
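To check the logic outside BigQuery, here is a minimal Python sketch of the same idea. It is an illustration only, and the function name max_consecutive_digits is hypothetical. It collects every run of digits (like REGEXP_EXTRACT_ALL with r'\d+'), keeps the length of the longest run, and falls back to 0 when there are none (like IFNULL(..., 0)).

```python
import re


def max_consecutive_digits(s: str) -> int:
    # Every maximal run of digits, e.g. "carlos11perez222" -> ["11", "222"].
    runs = re.findall(r"\d+", s)
    # Longest run wins; default=0 plays the role of IFNULL(..., 0).
    return max((len(run) for run in runs), default=0)


emails = [
    "lucas1234#gmail.com",
    "fer12#gmail.com",
    "lupal#gmail.com",
    "carlos1perez222#gmail.com",
    "carlos11perez222#gmail.com",
    "lucila1#gmail.com",
]

for email in emails:
    print(email, max_consecutive_digits(email))
```

Taking the maximum over separate runs is what keeps carlos11perez222#gmail.com at 3 instead of 5. A single run of one digit still counts as 1.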

You may also try this approach using regex:
WITH email AS
(SELECT 'lucas1234#gmail.com' mail
UNION ALL SELECT 'fer12#gmail.com'
UNION ALL SELECT 'lupal#gmail.com'
UNION ALL SELECT 'carlos1perez222#gmail.com'
UNION ALL SELECT 'carlos11perez222#gmail.com'
UNION ALL SELECT 'lucila1#gmail.com')
SELECT mail AS email,
-- inner REGEXP_REPLACE drops letter-digit-letter runs such as 'carlos11perez';
-- outer REGEXP_REPLACE drops the remaining letters, '.' and '#'
LENGTH(REGEXP_REPLACE(REGEXP_REPLACE(mail, r'[A-Za-z]+\d+[A-Za-z]+', ''), r'[A-Za-z.#]+', '')) AS count_cons_digits
FROM email;
On the sample data this also returns 4, 2, 0, 3, 3 and 1.
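The nested-replace approach depends on the shape of the addresses. The Python sketch below reproduces both REGEXP_REPLACE steps with re.sub and compares them to a longest-run count. It is an illustration only: the helper names are hypothetical, and the address a1b22c333#gmail.com is a made-up example rather than data from the question. The sketch agrees on all six sample emails. On the made-up address, however, only the first letter-digit-letter run is removed, so the leftover digits 22 and 333 are counted together.

```python
import re


def nested_replace_count(s: str) -> int:
    # Inner REGEXP_REPLACE: drop letter-digit-letter runs such as "carlos11perez".
    step1 = re.sub(r"[A-Za-z]+\d+[A-Za-z]+", "", s)
    # Outer REGEXP_REPLACE: drop the remaining letters, '.' and '#'.
    step2 = re.sub(r"[A-Za-z.#]+", "", step1)
    # LENGTH() of whatever is left.
    return len(step2)


def longest_digit_run(s: str) -> int:
    return max((len(run) for run in re.findall(r"\d+", s)), default=0)


sample = [
    "lucas1234#gmail.com",
    "fer12#gmail.com",
    "lupal#gmail.com",
    "carlos1perez222#gmail.com",
    "carlos11perez222#gmail.com",
    "lucila1#gmail.com",
]

# Both methods agree on the question's sample data...
for email in sample:
    assert nested_replace_count(email) == longest_digit_run(email)

# ...but not on a hypothetical address with three digit runs.
print(nested_replace_count("a1b22c333#gmail.com"), longest_digit_run("a1b22c333#gmail.com"))  # 5 3
```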

Related

Get multiple occurrence of string from a column in sql query

I have a table which has the following data
Ticketid created Details
205853669 2020-03-05 #CLOSE# Next action value://346004/ next action value://346002/ or value://346008/
205853670 2020-03-06 #Archive Next action value://346088/ next action value://346077/ or value://346057/
The "value://" pattern is the same in every row; I want to extract those numbers from the string:
ticketid Numbers
205853669 346004
205853669 346002
205853669 346008
205853670 346088
205853670 346077
205853670 346057
I am using standard SQL only.
I have created something like the query below:
-- only ever finds the first value:// in each row
select ticketid, TRIM(REPLACE(SUBSTR(
details, STRPOS(details, "value://"), 14
), "value://", "")) AS number from table
Below is for BigQuery Standard SQL
#standardSQL
SELECT Ticketid, Numbers
FROM `project.dataset.table`,
UNNEST(REGEXP_EXTRACT_ALL(Details, r'value://(\d+)/')) Numbers
Applied to the sample data from your question, the output is:
Row Ticketid Numbers
1 205853669 346004
2 205853669 346002
3 205853669 346008
4 205853670 346088
5 205853670 346077
6 205853670 346057
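As a rough local illustration of what UNNEST(REGEXP_EXTRACT_ALL(...)) does, here is a Python sketch using the sample rows from the question. The helper name explode_numbers is hypothetical. Each capture-group match becomes its own output row, paired with the ticket ID it came from.

```python
import re

# Sample rows from the question: (Ticketid, Details)
rows = [
    (205853669, "#CLOSE# Next action value://346004/ next action value://346002/ or value://346008/"),
    (205853670, "#Archive Next action value://346088/ next action value://346077/ or value://346057/"),
]


def explode_numbers(rows):
    # Like UNNEST(REGEXP_EXTRACT_ALL(Details, r'value://(\d+)/')):
    # with one capture group, re.findall returns just the captured digits,
    # and each match yields one (ticket_id, number) row.
    return [
        (ticket_id, number)
        for ticket_id, details in rows
        for number in re.findall(r"value://(\d+)/", details)
    ]


for ticket_id, number in explode_numbers(rows):
    print(ticket_id, number)
```

Because the capture group is anchored between value:// and the trailing slash, IDs of any length are extracted, not only 6-digit ones.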
The query below would work. It splits each Details string on 'value://' and then extracts the 6-digit ID from each piece.
with `project.dataset.table` as (
select id, split(details, 'value://') AS number from (
select '1' as id, '#CLOSE# Next action value://346004/ next action value://346002/ or value://346008/' as details
union all
select '2' as id, '#Archive Next action value://346088/ next action value://346077/ or value://346057/'
)
)
select id, regexp_extract(number1, "\\d{6}") as number
from `project.dataset.table` ,
UNNEST( number ) number1
where regexp_extract(number1, "\\d{6}") is not null
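The split-then-extract steps can be sketched in Python as follows. This is only an illustration of the logic, and split_then_extract is a hypothetical name. Like the query, it assumes every ID is exactly 6 digits long.

```python
import re


def split_then_extract(details: str):
    found = []
    # Like split(details, 'value://') followed by UNNEST: one piece per segment.
    for piece in details.split("value://"):
        # Like regexp_extract(number1, "\\d{6}"): first 6-digit run, or no match.
        m = re.search(r"\d{6}", piece)
        # Like WHERE ... IS NOT NULL: skip pieces without an ID (e.g. the leading text).
        if m is not None:
            found.append(m.group(0))
    return found


print(split_then_extract(
    "#CLOSE# Next action value://346004/ next action value://346002/ or value://346008/"
))
```

The first piece ('#CLOSE# Next action ') contains no 6-digit run, which is why the query needs its IS NOT NULL filter.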
One remark about the UNNEST function. As per the documentation:
The UNNEST operator takes an ARRAY and returns a table, with one row for each element in the ARRAY.
With only a few 'value://' occurrences per comment this should not cause much of a problem, but with an unbounded number of them it might become a performance bottleneck, so keep that in mind. That said, this is the only way I know to achieve this using CloudSQL.

Dual command in Big Query

I am trying to reproduce Oracle's DUAL in BigQuery.
I tried using temp tables but was not able to achieve it.
Oracle query: SELECT LEVEL - 1 F FROM
DUAL CONNECT BY LEVEL <= 2
I expect the output in the format below:
F
1
2
I have a salary table with salaries of 50$ and 200$.
I want to duplicate each row so that the output is 50$, -50$, 200$ and -200$, i.e. 4 rows in total.
You can use
SELECT
1
from (
select SESSION_USER())
to return a result set with just one row.
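When more than one row is needed, as with CONNECT BY LEVEL <= 2, a recursive CTE is a common substitute in SQL engines that support it. The sketch below runs one in SQLite through Python's sqlite3 module, not in BigQuery, so treat it only as an illustration of the technique. The helper generate_levels is a hypothetical name. As in the Oracle query, LEVEL - 1 for LEVEL <= 2 yields 0 and 1.

```python
import sqlite3


def generate_levels(max_level: int) -> list:
    conn = sqlite3.connect(":memory:")
    try:
        # The recursive CTE plays the role of CONNECT BY LEVEL <= max_level:
        # start at level 1 and keep adding 1 while the level is below max_level.
        rows = conn.execute(
            """
            WITH RECURSIVE levels(lvl) AS (
                SELECT 1
                UNION ALL
                SELECT lvl + 1 FROM levels WHERE lvl < ?
            )
            SELECT lvl - 1 AS f FROM levels ORDER BY lvl
            """,
            (max_level,),
        ).fetchall()
    finally:
        conn.close()
    return [f for (f,) in rows]


print(generate_levels(2))  # [0, 1]
```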
Since BigQuery doesn't support the CONNECT BY clause, and you want both the positive and the negative values from your data, you could try a simple query like this one:
SELECT my_value FROM `project.dataset.table`
UNION ALL
SELECT -my_value FROM `project.dataset.table`
Notice the minus sign in the second SELECT: it negates each value, giving you the negative rows.
Hope it helps.
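To see the UNION ALL trick end to end, here is a minimal sketch that runs it in SQLite via Python's sqlite3 module. The table name salary and its contents (the two values from the question) are assumptions for illustration. The rows are sorted in Python only to make the printed output predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical salary table holding the two values from the question.
conn.execute("CREATE TABLE salary (my_value INTEGER)")
conn.executemany("INSERT INTO salary (my_value) VALUES (?)", [(50,), (200,)])

# Each row once as-is and once negated: 2 rows in, 4 rows out.
rows = conn.execute(
    """
    SELECT my_value FROM salary
    UNION ALL
    SELECT -my_value FROM salary
    """
).fetchall()
conn.close()

values = sorted(v for (v,) in rows)
print(values)  # [-200, -50, 50, 200]
```

UNION ALL (rather than UNION) keeps duplicates, so two employees with the same salary would still produce two positive and two negative rows.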

Extracting number of specific length from a string in Postgres

I am trying to extract a set of numbers from comments like
"on april-17 transactions numbers are 12345 / 56789"
"on april-18 transactions numbers are 56789"
"on may-19 no transactions"
These are stored in a column called "com" in the table comments.
My requirement is to get the numbers of a specific length, in this case 5, so 12345 and 56789 from the first string, each separately. A comment may contain no five-digit numbers at all, or more than two.
I tried using regexp_replace, with the result below. I am trying to find an efficient regex or other method to achieve this.
select regexp_replace(com, '[^0-9]',' ', 'g') from comments;
regexp_replace
----------------------------------------------------
17 12345 56789
I expect the result to be only:
column1 | column2
12345 56789
There is no easy way to write a query that returns an arbitrary number of columns: the column list is fixed when the query is written, so it cannot return one column for one comment and two for the next.
For a fixed number of two columns:
SELECT
matches[1] AS col1,
matches[2] AS col2
FROM (
SELECT
array_agg(regexp_matches[1]) AS matches
FROM
regexp_matches(
'on april-17 transactions numbers are 12345 / 56789',
'\d{5}',
'g'
)
) s
regexp_matches() with the 'g' flag returns one row per match
array_agg() collects all matches into one array
The array elements can then be selected as separate columns.
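The same "all matches, then a fixed number of columns" idea can be sketched in Python with the three sample comments. This is only an illustration. The helper name five_digit_columns is hypothetical, and the \b word boundaries are an addition not present in the Postgres pattern: they stop a 6-digit number from partially matching. Missing matches become None, much as matches[2] is NULL when there is only one match.

```python
import re

comments = [
    "on april-17 transactions numbers are 12345 / 56789",
    "on april-18 transactions numbers are 56789",
    "on may-19 no transactions",
]


def five_digit_columns(comment: str, n_cols: int = 2) -> tuple:
    # Like regexp_matches(com, '\d{5}', 'g') + array_agg: all matches, in order.
    matches = re.findall(r"\b\d{5}\b", comment)
    # Like matches[1], matches[2]: keep a fixed number of columns,
    # padding with None where there is no match and dropping any extra matches.
    head = matches[:n_cols]
    return tuple(head + [None] * (n_cols - len(head)))


for c in comments:
    print(five_digit_columns(c))
```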

Comparing values in oracle when one value is partially masked

Here is what I am trying to do in an Oracle SQL query:
I have an account number that is X characters long (Example: 6001055555). I have a table that has part of the same account number but most of the number is masked (Examples: 600##########, 6001######, 600244####).
I am trying to match the number passed in 6001055555 to one of the following values 600##########, 6001######, 600244####.
In this example, account number 6001055555 should return 6001###### (from the above list). I can get to the point where the lengths are the same, but I am not sure how to do the match. I am looking at using regular expressions but am not sure if that's the correct path.
You can use the regular LIKE comparison in this case:
WITH DATA AS (
SELECT '600##########' acct FROM dual UNION ALL
SELECT '6001######' acct FROM dual UNION ALL
SELECT '600244####' acct FROM dual
)
SELECT *
FROM DATA
WHERE '6001055555' LIKE REPLACE (acct, '#', '_');
ACCT
-------------
6001######
We're used to seeing COLUMN LIKE :var but switching terms is also valid (:var LIKE column).
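The same "value LIKE column" trick can be tried locally. Here it runs in SQLite through Python's sqlite3 module as a stand-in for Oracle. The helper name matching_masks and the in-memory table are assumptions for illustration. Turning '#' into LIKE's single-character wildcard '_' means a mask only matches when every unmasked digit agrees and the lengths are equal.

```python
import sqlite3


def matching_masks(account: str, masks: list) -> list:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE data (acct TEXT)")
        conn.executemany("INSERT INTO data (acct) VALUES (?)", [(m,) for m in masks])
        # The account number is on the left of LIKE and the mask, with '#'
        # replaced by '_', is the pattern on the right.
        rows = conn.execute(
            "SELECT acct FROM data WHERE ? LIKE REPLACE(acct, '#', '_')",
            (account,),
        ).fetchall()
    finally:
        conn.close()
    return [acct for (acct,) in rows]


print(matching_masks("6001055555", ["600##########", "6001######", "600244####"]))
# ['6001######']
```

600########## is rejected because its pattern is 13 characters long, and 600244#### is rejected because its fourth digit differs.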
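If my understanding is right, this is what you may be expecting...
select regexp_substr('6001055555',replace('600##########','#'),1) from dual;
If this query returns a value, you may conclude that the account number matches the masked value. Note that it only looks for the unmasked prefix ('600') anywhere in the account number and ignores the length, so 600########## is also reported as a match.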

SQL Select using distinct and Cast [duplicate]

Possible Duplicate:
SQL Select DISTINCT using CAST
Let me try this one more time... I'm not a sql guy so please bear with me as I try to explain this... I have a table called t_recordkeepingleg with three columns of data. Column1 is named LEGTRIPNUMBER that happens to be a string that starts with the letter Q followed by 4 numbers. I need to strip off the Q and convert the remaining 4 characters (numbers) to an integer. Everyone with me so far? Column2 of this table is named LEGDATE. Column3 is named LEGGROUP.
Here's the input scenario
LEGTRIPNUMBER LEGDATE LEGGROUP
Q1001 08/12/12 0001
Q1001 09/15/12 0002
Q1002 09/01/12 0001
Q1002 09/08/12 0003
Q1002 09/09/12 0002
As you can see the input table has rows where LEGTRIPNUMBER occurs more than once. I only want the first occurrence.
This is my current select statement - it works but returns all rows.
SELECT *,
CAST(
substring("t_RecordkeepingLeg"."LEGTRIPNUMBER",2,4) as INT
) as Num_Trip_Num
FROM "1669"."dbo"."t_RecordkeepingLeg" "t_RecordkeepingLeg"
Where left("t_RecordkeepingLeg"."LEGTRIPNUMBER",1) = 'Q'
I want to modify this so that it only selects ONE occurrence of each Qnnnn. When the row gets selected I want to have LEGDATE and LEGGROUP available to me. How do I do this?
Thank you,
Can it be as simple as below? I've just added a condition on LEGGROUP being 0001.
SELECT *,
CAST(substring("t_RecordkeepingLeg"."LEGTRIPNUMBER",2,4) as INT) as Num_Trip_Num
FROM "1669"."dbo"."t_RecordkeepingLeg" "t_RecordkeepingLeg"
Where left ("t_RecordkeepingLeg"."LEGTRIPNUMBER",1) = 'Q'
and "t_RecordkeepingLeg"."LEGGROUP"='0001'
If you have a unique primary key in your table, you can do something like the below:
SELECT CAST(
substring("t_RecordkeepingLeg"."LEGTRIPNUMBER",2,4) as INT
) as Num_Trip_Num
FROM "1669"."dbo"."t_RecordkeepingLeg" "t_RecordkeepingLeg"
Where "t_RecordkeepingLeg"."ID" In(
Select Min("t_RecordkeepingLeg"."ID")
From "1669"."dbo"."t_RecordkeepingLeg" "t_RecordkeepingLeg"
Where left ("t_RecordkeepingLeg"."LEGTRIPNUMBER",1) = 'Q'
Group By "t_RecordkeepingLeg"."LEGTRIPNUMBER"
)
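The MIN(ID) idea can be tried on the sample rows in SQLite through Python's sqlite3 module. Several assumptions apply. The ID column is hypothetical and stands in for the table's unique primary key. substr(..., 1, 1) replaces LEFT(), which SQLite does not have. The outer SELECT also returns LEGDATE and LEGGROUP, since the question asks for them. "First" here simply means the row with the lowest ID, i.e. insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical ID column standing in for the unique primary key (auto-assigned 1..5).
conn.execute(
    "CREATE TABLE t_RecordkeepingLeg ("
    "ID INTEGER PRIMARY KEY, LEGTRIPNUMBER TEXT, LEGDATE TEXT, LEGGROUP TEXT)"
)
conn.executemany(
    "INSERT INTO t_RecordkeepingLeg (LEGTRIPNUMBER, LEGDATE, LEGGROUP) VALUES (?, ?, ?)",
    [
        ("Q1001", "08/12/12", "0001"),
        ("Q1001", "09/15/12", "0002"),
        ("Q1002", "09/01/12", "0001"),
        ("Q1002", "09/08/12", "0003"),
        ("Q1002", "09/09/12", "0002"),
    ],
)

# Keep only the row with the smallest ID for each LEGTRIPNUMBER.
rows = conn.execute(
    """
    SELECT CAST(substr(LEGTRIPNUMBER, 2, 4) AS INTEGER) AS Num_Trip_Num, LEGDATE, LEGGROUP
    FROM t_RecordkeepingLeg
    WHERE ID IN (
        SELECT MIN(ID) FROM t_RecordkeepingLeg
        WHERE substr(LEGTRIPNUMBER, 1, 1) = 'Q'
        GROUP BY LEGTRIPNUMBER
    )
    ORDER BY Num_Trip_Num
    """
).fetchall()
conn.close()

print(rows)  # [(1001, '08/12/12', '0001'), (1002, '09/01/12', '0001')]
```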
Which values of LEGDATE & LEGGROUP do you want for the distinct LEGTRIPNUMBER? there are multiple non-distinct possibilities and the concept of "first occurrence" is only valid with an explicit order.
For example, to get the values from the row with the earliest LEGDATE:
select Num_Trip_Num, LEGDATE, LEGGROUP from (
select
cast(substring(t_RecordkeepingLeg.LEGTRIPNUMBER, 2, 4) as INT) as Num_Trip_Num,
row_number() over (partition by substring(t_RecordkeepingLeg.LEGTRIPNUMBER, 2, 4) order by t_RecordkeepingLeg.LEGDATE asc) as row,
t_RecordkeepingLeg.LEGDATE,
t_RecordkeepingLeg.LEGGROUP
from t_RecordkeepingLeg
where left (t_RecordkeepingLeg.LEGTRIPNUMBER, 1) = 'Q'
) T
where row = 1
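Here is a sketch of the ROW_NUMBER() approach on the sample rows. It runs in SQLite (3.25 or later, for window functions) through Python's sqlite3 module, so it only illustrates the technique and is not the SQL Server query itself. Two assumptions differ from the answer. First, the dates are rewritten as ISO 'YYYY-MM-DD' strings, assuming the question's MM/DD/YY format, so that ordering the text column is chronological. Second, the row-number alias is renamed rn to avoid SQLite's ROW keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_RecordkeepingLeg (LEGTRIPNUMBER TEXT, LEGDATE TEXT, LEGGROUP TEXT)")
# ISO dates (assuming MM/DD/YY in the question) so text order equals date order.
conn.executemany(
    "INSERT INTO t_RecordkeepingLeg VALUES (?, ?, ?)",
    [
        ("Q1001", "2012-08-12", "0001"),
        ("Q1001", "2012-09-15", "0002"),
        ("Q1002", "2012-09-01", "0001"),
        ("Q1002", "2012-09-08", "0003"),
        ("Q1002", "2012-09-09", "0002"),
    ],
)

# Number the rows of each trip by date, then keep row 1 (the earliest LEGDATE).
rows = conn.execute(
    """
    SELECT Num_Trip_Num, LEGDATE, LEGGROUP FROM (
        SELECT CAST(substr(LEGTRIPNUMBER, 2, 4) AS INTEGER) AS Num_Trip_Num,
               ROW_NUMBER() OVER (PARTITION BY substr(LEGTRIPNUMBER, 2, 4)
                                  ORDER BY LEGDATE ASC) AS rn,
               LEGDATE, LEGGROUP
        FROM t_RecordkeepingLeg
        WHERE substr(LEGTRIPNUMBER, 1, 1) = 'Q'
    ) AS T
    WHERE rn = 1
    ORDER BY Num_Trip_Num
    """
).fetchall()
conn.close()

print(rows)  # [(1001, '2012-08-12', '0001'), (1002, '2012-09-01', '0001')]
```

Unlike the MIN(ID) version, this does not depend on insertion order. Changing the ORDER BY inside OVER (...) changes which row counts as "first".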