Regex to trim part of a string in SQL

My data lives in BigQuery. There is one column that needs regex extraction. Example strings are below:
?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=abb_hc_hr
src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=goal_healthcare
?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=goal_hr
?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=abb_hr_healthcare
My desired output is this:
my_campaign=goal
my_campaign=goal
Basically I need to trim everything but my_campaign=goal
The code I wrote is in SQL, below:
LOWER(REGEXP_EXTRACT(my_column,r'my_campaign=([^&])')) AS my_campaign
It returns every my_campaign value (my_campaign=abb_hc_hr, my_campaign=goal_healthcare, etc.). How should I change the existing code to grab only my_campaign=goal?
Thank you.

The following is for BigQuery Standard SQL.
Use the query below:
SELECT
LOWER(REGEXP_EXTRACT(my_column,r'(my_campaign=[^&]*)&?')) AS my_campaign
FROM your_table
WHERE LOWER(my_column) LIKE '%my_campaign=goal_%'
Applied to the sample data from your question, the output is:
Row my_campaign
1 my_campaign=goal_healthcare
2 my_campaign=goal_hr
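To sanity-check the pattern outside BigQuery, the same filter-then-extract logic can be sketched in Python. This is only an approximation: Python's `re` module is not the RE2 engine BigQuery uses, although this particular pattern behaves the same in both. The function name `extract_goal_campaigns` is made up for the illustration.

```python
import re

# Sample values copied from the question.
rows = [
    "?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=abb_hc_hr",
    "src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=goal_healthcare",
    "?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=goal_hr",
    "?src=abb_fh_uit*_source=h&_medium=cpm&my_campaign=abb_hr_healthcare",
]

# Same pattern as the query: capture "my_campaign=" plus everything up to the next "&".
PATTERN = re.compile(r"(my_campaign=[^&]*)&?")


def extract_goal_campaigns(values):
    """Mimic the WHERE ... LIKE filter, then the LOWER(REGEXP_EXTRACT(...)) projection."""
    result = []
    for value in values:
        if "my_campaign=goal_" not in value.lower():
            continue  # row is filtered out, like the WHERE clause does
        match = PATTERN.search(value)
        if match:
            result.append(match.group(1).lower())
    return result


campaigns = extract_goal_campaigns(rows)
print(campaigns)
```

The character class `[^&]*` is what stops the capture at the next parameter, so the pattern still works when my_campaign is not the last parameter in the string.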

Related

SQL group by middle part of string

I have a string column whose values usually look approximately like this:
https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554
https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197
https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157
I would like to group data by source, which is part of the string: the four letters after "source=" (in the case above: firm), and then simply count them. Is there a way to achieve this directly in SQL code? I am using Hadoop.
Data is a set of strings that look like the ones above. My expected result is a summary table with two columns: 1) Each type of source (there are about 20 possible values and their lengths differ, so I cannot use a simple substring). Ideally I am looking for a solution that says: for the grouping, use the four letters that come after "source=". 2) The count of their occurrences across all the strings.
There is just one source type in each string.
You can use regexp_extract():
select substr(regexp_extract(url, 'source[^&]+'), 8)
In MSSQL you can use CHARINDEX to get the position of the marker string and then extract the value with SUBSTRING:
;with cte as (
SELECT SUBSTRING('https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554',
charindex('&source=','https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554')
+8,4) AS ExtractString )
select ExtractString,count(ExtractString) as count from cte group by ExtractString;
HiveQL has an equivalent of CHARINDEX: the LOCATE function.
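For comparison, here is the extract-then-count idea sketched in Python, using the three URLs from the question. It captures everything between "source=" and the next "&" instead of a fixed four characters, on the assumption (taken from the question) that source names differ in length. `count_sources` is a hypothetical helper name.

```python
import re
from collections import Counter

# URLs copied from the question.
urls = [
    "https://mapy.cz/zakladni?x=16.3360208&y=49.6718038&z=8&source=firm&id=13123554",
    "https://mapy.cz/turisticka?x=15.9380354&y=50.1990211&z=11&source=base&id=2197",
    "https://mapy.cz/turisticka?x=12.8611357&y=49.8051338&z=16&source=base&id=1703157",
]

# Capture the value of the "source" parameter, up to the next "&" or end of string.
SOURCE_RE = re.compile(r"source=([^&]+)")


def count_sources(values):
    """Count how often each source value occurs; strings without one are skipped."""
    counts = Counter()
    for value in values:
        match = SOURCE_RE.search(value)
        if match:
            counts[match.group(1)] += 1
    return counts


source_counts = count_sources(urls)
print(source_counts.most_common())
```

In SQL the same shape would be the extraction expression in the SELECT list, repeated in a GROUP BY, together with COUNT(*).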

MS SQL - Show values of dynamic length

I have a column called command in my table where information is stored about e-mails that should be sent out by the system.
The data looks like this:
<_email><property name="To">test@test.se;test@tester.se;test@test.com</property><property name="From">sender@sender.se</property>
<_email><property name="To">test@test.se</property><property name="From">sender@sender.com</property>
I want to use a select statement to only display the e-mail addresses of those who will receive the e-mail. By doing this, the output should look like this:
Example row 1:
test@test.se;test@tester.se;test@test.com
Example row 2:
test@test.se
I can't use a fixed-length substring since the e-mail addresses vary in length. I assume that it's possible to achieve this somehow by using regular expressions, but I cannot manage to work it out.
Can you please help me out?
Thanks!
/ Krustofski
You can use the CharIndex function in order to retrieve the start and the end of your string, and use a little math to execute the substring:
Select Substring(MyColumn,
CharIndex('<property name="To">', MyColumn) + 20,
CharIndex('</property>', MyColumn) -
CharIndex('<property name="To">', MyColumn) - 20
)
From MyTable
I tested with your table values, and it works.
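The index arithmetic is easier to see in a small Python sketch. The addresses below are hypothetical placeholders, not the rows from the question, and `extract_recipients` is an invented name. One deliberate difference from the query: the closing tag is searched for starting after the opening tag, which the query does not do (CHARINDEX accepts an optional start position that would allow the same thing).

```python
START_TAG = '<property name="To">'  # 20 characters, hence the "+ 20" in the query
END_TAG = "</property>"


def extract_recipients(command):
    """Return the text between the To tag and its closing tag, or None if absent."""
    start = command.find(START_TAG)
    if start == -1:
        return None  # no To property in this row
    start += len(START_TAG)
    # Search for the closing tag only after the opening tag.
    end = command.find(END_TAG, start)
    if end == -1:
        return None  # malformed row: opening tag without a close
    return command[start:end]


# Hypothetical sample row in the same shape as the question's data.
sample = (
    '<_email><property name="To">a@example.com;b@example.com</property>'
    '<property name="From">sender@example.com</property>'
)
recipients = extract_recipients(sample)
print(recipients)
```

Searching from the start position matters only if another property could appear before the To property; with the layout shown in the question, both approaches pick the same closing tag.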

Why does my update query to replace string not work?

I have an Access table where I have transaction IDs in the below format:
Transaction_ID
39296165-1
39296165-2
39296165-3
39284029-1
39284029-2
I am trying to write a query which finds the dash and removes the -1,-2,-3 etc., so I can then de-duplicate based on the string before the dash.
I've written the below:
UPDATE mytable
SET Transaction_ID=Left(Transaction_ID,InStr(1,Transaction_ID,"-")-1)
This works fine. However, when it comes across a Transaction_ID that doesn't have a dash in the string, it gives me a type conversion error and replaces the string with a blank value.
Any advice on error-trapping this?
Add a WHERE clause so that only rows containing a dash are updated (InStr returns 0 when the dash is not found):
WHERE InStr(1,Transaction_ID,"-") > 0
This would also work and would be more efficient.
WHERE Transaction_ID LIKE "*-*"
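The guard can be illustrated with a short Python sketch. The not-found convention differs: Python's `str.find` returns -1, whereas InStr returns 0, so the unguarded expression ends up asking Left for -1 characters, which is presumably what triggers the failure. The last ID in the list is a made-up value without a dash, and `strip_suffix` is a hypothetical helper name.

```python
def strip_suffix(transaction_id):
    """Drop everything from the first dash onward; leave dash-free IDs untouched."""
    dash = transaction_id.find("-")
    if dash == -1:
        # Equivalent of the WHERE clause: rows without a dash are not modified.
        return transaction_id
    return transaction_id[:dash]


# IDs from the question, plus one hypothetical ID without a dash.
ids = ["39296165-1", "39296165-2", "39296165-3", "39284029-1", "39284029-2", "39284030"]

# De-duplicate on the part before the dash.
unique_ids = sorted({strip_suffix(i) for i in ids})
print(unique_ids)
```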

How to Filter WHERE Field Value LIKE any of the values stored in a Multi Value Parameter in SQL

I have a report (built using SSRS) that uses a multi-value parameter.
I want to add a Filter onto my SQL Query WHERE FieldA is LIKE any of the values stored in the parameter.
So FieldA might have the following values:
BOBJAMESLOUISE
MARYBOB
JENNY
JOHNLOUISEJAMES
BOB
JENNYJAMESMIKE
And @ParamA might have the following values:
Bob, Louise
Therefore, in this example only records 1, 2, 4 and 5 should be returned (MARYBOB contains BOB, while JENNY matches neither value).
Thanks in advance for any help :)
P.S. I'm using SQL Server 2008
You will want to implement a function like the split function. This can take a comma separated value list and separate it into rows like you want.
Below is a link to a couple of different versions; any of them will work for you. It also explains how to use them.
Split Function
I am guessing it's not the string-splitting part that is the issue, since a quick search for "SQL split string" turns up plenty of examples. What you want after splitting is something like the query below. Assuming the split function you end up using returns a table of values, here is what the comparison against FieldA would look like:
SELECT *
FROM YourTableWithFieldA
WHERE (@ParamA IS NULL
    OR EXISTS (
        SELECT *
        FROM YourSplitFunctionThatReturnsATableOfValues(@ParamA) SplitTable
        WHERE FieldA LIKE '%' + SplitTable.Value + '%'))
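The split-then-match logic can be sketched in Python against the sample data from the question. Two assumptions are baked in: matching is case-insensitive (in SQL Server that depends on the column's collation), and each split value is trimmed, since a value list such as "Bob, Louise" would otherwise yield " Louise" with a leading space. The names `split_param` and `matching_row_numbers` are invented for the example.

```python
def split_param(param):
    """Split a comma-separated parameter into trimmed, non-empty values."""
    return [part.strip() for part in param.split(",") if part.strip()]


def matching_row_numbers(values, param):
    """Return 1-based positions of values that contain any of the split parameter values."""
    if param is None:
        # Mirrors the "parameter IS NULL" branch: no filter, every row is returned.
        return list(range(1, len(values) + 1))
    needles = [needle.lower() for needle in split_param(param)]
    return [
        position
        for position, value in enumerate(values, start=1)
        if any(needle in value.lower() for needle in needles)
    ]


# FieldA values copied from the question.
field_a = ["BOBJAMESLOUISE", "MARYBOB", "JENNY", "JOHNLOUISEJAMES", "BOB", "JENNYJAMESMIKE"]

matches = matching_row_numbers(field_a, "Bob, Louise")
print(matches)
```

For the sample data this yields rows 1, 2, 4 and 5: MARYBOB contains BOB, while JENNY and JENNYJAMESMIKE contain neither value.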

Convert special chars to RAW format in Oracle

How do I convert special characters like '#' to RAW format in Oracle? I need it for searching in a BLOB like this.
The following code returns all rows in the table:
dbms_lob.instr(gob_a_document, utl_raw.cast_to_raw('C#')) <> 0)
Or is there a better way?
I tried your code out, and I think it's correct for that line. On Oracle 11.2.0.1, I used your code to do something basically the same:
select v.*
from V_INCOMING_MAIL v
where dbms_lob.instr(v.message_text, utl_raw.cast_to_raw('C#'),1,1) <> 0;
This selected 9 rows out of the 15 thousand in the view. From an ad-hoc sampling of those rows and some others, that seems to be working OK.
So, perhaps the problem lies in the other lines in the SQL statement?
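As a rough illustration of why '#' should not need special treatment: converting text to RAW just produces its bytes, and '#' is an ordinary single byte (0x23) in ASCII-compatible character sets. The Python sketch below mimics the two calls under that assumption. Both function names are invented here, and the encoding parameter stands in for the database character set, which the question does not state.

```python
def cast_to_raw_hex(text, encoding="ascii"):
    """Hex view of the bytes for `text`, similar to how a RAW value is displayed."""
    return text.encode(encoding).hex().upper()


def lob_instr(blob, pattern):
    """1-based position of `pattern` in `blob`, or 0 when absent (instr-style result)."""
    return blob.find(pattern) + 1


print(cast_to_raw_hex("C#"))                   # bytes for "C" and "#"
print(lob_instr(b"using C# today", b"C#"))     # non-zero: pattern found
print(lob_instr(b"plain C only", b"C#"))       # 0: pattern not found
```

If every row really does come back, one thing to check is whether the comparison with 0 is being reached at all, for example because of an OR elsewhere in the WHERE clause.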