regex trim the part of the string sql

My data lives in BigQuery. One column needs regex extraction. Example strings are below:
?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=abb_hc_hr
src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=goal_healthcare
?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=goal_hr
?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=abb_hr_healthcare
My desired output is this:
my_campaign=goal
my_campaign=goal
Basically I need to trim everything but my_campaign=goal
The code I wrote is in SQL, below:
LOWER(REGEXP_EXTRACT(my_column,r'my_campaign=([^&])')) AS my_campaign
It returns every my_campaign value (my_campaign=abb_hc_hr, my_campaign=goal_healthcare, etc.). How should I change the existing code to grab just my_campaign=goal?
Thank you.

The query below is for BigQuery Standard SQL:
SELECT
LOWER(REGEXP_EXTRACT(my_column,r'(my_campaign=[^&]*)&?')) AS my_campaign
FROM your_table
WHERE LOWER(my_column) LIKE '%my_campaign=goal_%'
Applied to the sample data from your question, the output is:
Row my_campaign
1 my_campaign=goal_healthcare
2 my_campaign=goal_hr
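To check the pattern outside BigQuery, the same logic can be sketched in Python. This is only a rough sketch, not BigQuery code: the `re` module stands in for REGEXP_EXTRACT, a plain substring test stands in for the LIKE filter, and the helper name `extract_campaign` is hypothetical. The rows are the sample strings from the question.

```python
import re

# Sample rows copied from the question.
rows = [
    "?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=abb_hc_hr",
    "src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=goal_healthcare",
    "?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=goal_hr",
    "?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=abb_hr_healthcare",
]

# Same pattern as in the query: capture "my_campaign=" plus everything up to the next "&".
CAMPAIGN = re.compile(r"(my_campaign=[^&]*)&?")


def extract_campaign(value):
    """Return the lowercased my_campaign=... pair, or None if it is absent."""
    match = CAMPAIGN.search(value)
    return match.group(1).lower() if match else None


# Stand-in for: WHERE LOWER(my_column) LIKE '%my_campaign=goal_%'
# In SQL LIKE the "_" after "goal" is a single-character wildcard;
# this sketch simply tests for a literal "_".
goal_rows = [r for r in rows if "my_campaign=goal_" in r.lower()]
result = [extract_campaign(r) for r in goal_rows]
print(result)
```

If only the literal my_campaign=goal prefix is wanted, as in the desired output in the question, a narrower pattern such as `(my_campaign=goal)` would return just that part. That reading of the question is an assumption.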

Related

SQL group by middle part of string

I have string column that looks usually approximately like this:
https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554
https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197
https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157
I would like to group the data by source, which is part of the string: the four letters after "source=" (in the first row above: firm), and then simply count the rows per source. Is there a way to achieve this directly in SQL code? I am using Hadoop.
The data is a set of strings that look like the ones above. My expected result is a summary table with two columns: 1) Each type of source (there are about 20 possible values and their position in the string varies, so I cannot use a simple substring). Ideally I am looking for a solution that says: for the grouping, use the four letters that come after "source=". 2) The count of their occurrences across all the strings.
There is just one source type in each string.

You can use regexp_extract():
select substr(regexp_extract(url, 'source[^&]+'), 8)
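For testing the grouping logic outside the cluster, here is a rough Python sketch of the same idea: pull out whatever follows "source=" up to the next "&", then count. It is not Hive code; the helper name `source_of` is made up for illustration, and the URLs are the three from the question.

```python
import re
from collections import Counter

# Sample URLs copied from the question.
urls = [
    "https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554",
    "https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197",
    "https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157",
]

# Capture everything after "source=" up to the next "&" (or the end of the string).
SOURCE = re.compile(r"source=([^&]+)")


def source_of(url):
    """Return the value of the source parameter, or None if it is missing."""
    match = SOURCE.search(url)
    return match.group(1) if match else None


# Equivalent of GROUP BY source with COUNT(*).
counts = Counter(source_of(u) for u in urls)
print(counts)
```

Because the capture runs to the next "&", this does not depend on the source being exactly four letters long.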

You can use CHARINDEX in MSSQL to get the position of the substring and extract the value:
;with cte as (
    SELECT SUBSTRING(
        'https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554',
        charindex('&source=', 'https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554') + 8,
        4) AS ExtractString
)
select ExtractString, count(ExtractString) as count from cte group by ExtractString;
HiveQL has an equivalent function, LOCATE, that can be used in place of CHARINDEX.

MS SQL - Show values of dynamic length

I have a column called command in my table where information is stored about e-mails that should be sent out by the system.
The data looks like this:
<_email><property name="To">test@test.se;test@tester.se;test@test.com</property><property name="From">sender@sender.se</property>
<_email><property name="To">test@test.se</property><property name="From">sender@sender.com</property>
I want to use a select statement to only display the e-mail addresses of those who will receive the e-mail. By doing this, the output should look like this:
Example row 1:
test@test.se;test@tester.se;test@test.com
Example row 2:
test@test.se
I can't use a fixed-length substring since the e-mail addresses vary in length. I assume that it's possible to achieve this somehow by using regular expressions, but I cannot manage to resolve it.
Can you please help me out?
Thanks!
/ Krustofski

You can use the CharIndex function in order to retrieve the start and the end of your string, and use a little math to execute the substring:
Select Substring(MyColumn,
CharIndex('<property name="To">', MyColumn) + 20,
CharIndex('</property>', MyColumn) -
CharIndex('<property name="To">', MyColumn) - 20
)
From MyTable
I tested with your table values, and it works.
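The arithmetic is easier to follow in a small Python sketch: find where the opening tag ends, find the next closing tag, and slice between the two. This is only an illustration of the idea, not T-SQL. The function name `recipients` is hypothetical, and unlike the query above it searches for the closing tag starting from the end of the opening tag, which is a small change on my part.

```python
START_TAG = '<property name="To">'  # 20 characters, hence the "+ 20" in the query
END_TAG = "</property>"


def recipients(command):
    """Return the text between the To tag and the next closing tag, or None."""
    start = command.find(START_TAG)
    if start == -1:
        return None  # no To property in this row
    start += len(START_TAG)
    end = command.find(END_TAG, start)
    if end == -1:
        return None  # opening tag without a closing tag
    return command[start:end]


# Rows modelled on the sample data in the question.
row1 = ('<_email><property name="To">test@test.se;test@tester.se;test@test.com</property>'
        '<property name="From">sender@sender.se</property>')
row2 = ('<_email><property name="To">test@test.se</property>'
        '<property name="From">sender@sender.com</property>')

print(recipients(row1))
print(recipients(row2))
```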

Why does my update query to replace string not work?

I have an Access table where I have transaction IDs in the below format:
Transaction_ID
39296165-1
39296165-2
39296165-3
39284029-1
39284029-2
I am trying to write a query which finds the dash and removes the -1,-2,-3 etc., so I can then de-duplicate based on the string before the dash.
I've written the below:
UPDATE mytable
SET Transaction_ID=Left(Transaction_ID,InStr(1,Transaction_ID,"-")-1)
This works fine; however, when it comes across a Transaction_ID which doesn't have a dash in the string, it gives me a type conversion error and replaces the string with a blank value.
Any advice on error-trapping this?

Add a WHERE clause so that only rows containing a dash are updated (InStr returns 0 when the dash is not found):
WHERE InStr(1,Transaction_ID,"-") > 0

This would also work and would be more efficient.
WHERE Transaction_ID LIKE "*-*"
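To see why the guard matters, here is a rough Python sketch of the same logic. It is not Access code: `instr` is a hypothetical stand-in that mimics InStr (1-based position, 0 when nothing is found), and the last ID without a dash is made up for illustration. Without the guard, a missing dash means Left is asked for a length of 0 - 1 = -1.

```python
def instr(text, needle):
    """Mimic InStr: 1-based position of needle, or 0 if it is not found."""
    return text.find(needle) + 1


def strip_suffix(transaction_id):
    """Drop the dash and everything after it; leave dash-free IDs untouched."""
    pos = instr(transaction_id, "-")
    if pos > 0:  # same condition as the WHERE clause
        return transaction_id[:pos - 1]
    return transaction_id


# IDs from the question, plus one hypothetical ID without a dash.
ids = ["39296165-1", "39296165-2", "39296165-3", "39284029-1", "39284029"]
cleaned = [strip_suffix(t) for t in ids]
print(cleaned)
print(sorted(set(cleaned)))  # de-duplicated
```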

How to Filter WHERE Field Value LIKE any of the values stored in a Multi Value Parameter in SQL

I have a report (built using SSRS) that uses a multi-value parameter.
I want to add a Filter onto my SQL Query WHERE FieldA is LIKE any of the values stored in the parameter.
So FieldA might have the following values:
BOBJAMESLOUISE
MARYBOB
JENNY
JOHNLOUISEJAMES
BOB
JENNYJAMESMIKE
And @ParamA might have the following values:
Bob, Louise
Therefore in this example only records 1, 2, 4 and 5 should be returned.
Thanks in advance for any help :)
P.S I'm using SQL Server 2008

You will want to implement a function like the split function. This can take a comma separated value list and separate it into rows like you want.
Below is a link for a couple of different versions, any of them will work for you. It also tells you how to use it.
Split Function

I am guessing it's not the string-splitting part that is the issue, since just googling for "SQL split string" turns up a lot of examples. In your case, what you want after splitting the string is something like the query below. Assuming that the split function you end up using returns a table of values, here is what the comparison against FieldA would look like:
SELECT *
FROM YourTableWithFieldA
WHERE (@ParamA IS NULL
       OR EXISTS (SELECT *
                  FROM YourSplitFunctionThatReturnsATableOfValues(@ParamA) SplitTable
                  WHERE FieldA LIKE '%' + SplitTable.Value + '%'))
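The split-then-match logic can be tried out in a short Python sketch before wiring up the SQL function. This is an illustration only: `split_param` and `matching_records` are hypothetical names, the values are trimmed after splitting, and the comparison is made case-insensitive on the assumption that the column uses a case-insensitive collation.

```python
# FieldA values from the question, in record order.
field_a = [
    "BOBJAMESLOUISE",
    "MARYBOB",
    "JENNY",
    "JOHNLOUISEJAMES",
    "BOB",
    "JENNYJAMESMIKE",
]


def split_param(param):
    """Split a comma-separated parameter into trimmed, non-empty values."""
    return [part.strip() for part in param.split(",") if part.strip()]


def matching_records(values, param):
    """Return (record number, value) pairs that contain any parameter value."""
    if param is None:  # mirrors the "IS NULL" branch: no filter applied
        return list(enumerate(values, start=1))
    needles = [n.upper() for n in split_param(param)]
    return [
        (number, value)
        for number, value in enumerate(values, start=1)
        if any(needle in value.upper() for needle in needles)  # LIKE '%value%'
    ]


matches = matching_records(field_a, "Bob, Louise")
print(matches)
```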

Convert special chars to RAW format in Oracle

How do I convert special characters like '#' to RAW format in Oracle? I need it for searching in a BLOB like this.
The following code returns all rows in the table:
dbms_lob.instr(gob_a_document, utl_raw.cast_to_raw('C#')) <> 0)
Or is there a better way?

I tried your code out, and I think it's correct for that line. On Oracle 11.2.0.1, I used your code to do something basically the same:
select v.*
from V_INCOMING_MAIL v
where dbms_lob.instr(v.message_text, utl_raw.cast_to_raw('C#'),1,1) <> 0;
This selected 9 rows out of the 15 thousand in the view. From an ad-hoc sampling of those rows and some others, that seems to be working OK.
So, perhaps the problem lies in the other lines in the SQL statement?
