VB.net string split get the first value - vb.net

I am working with a string that will always have "Product 1:" in it. I want to get the value that comes after "Product 1:". Here are some examples of the string:
Hello UserName Product 1: Alcatel Product 2: Samsung Truck: Fedex Ending: Canada
Hello UserName Product 1: NOKIA Truck: Fedex Ending: Canada
Hello UserName Product 1: Alcatel Product 2: Samsung Product 3: NOKIA Truck: Canada POST Ending: Brazil
Hello UserName Product 1: Alcatel-55 Special Product 2: Samsung Product 3: NOKIA Truck: Canada POST Ending: Brazil
Hello UserName Product 1: Samsung Galaxy S6-33 Truck: Canada POST Ending: Brazil
The string I am looking for:
Alcatel
NOKIA
Alcatel
Alcatel-55
Samsung Galaxy S6-33
I was having a little bit of luck with
sMessage.Split("Product 1:")(1).Split(":")(1)
But with the above code I still get:
Alcatel
NOKIA Truck
Alcatel
Alcatel-55
Samsung Galaxy S6-33 Truck

You could replace the undesired values with:
sMessage.Split("Product 1:")(1).Split(":")(1).Replace(" Truck", "")
You can also match the desired string with regular expressions:
(?<=Product 1: )((.*)(?= Product 2:)|(.*)(?= Truck:))
Example:
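Imports System.Text.RegularExpressions ' goes at the top of the file
Dim regex As Regex = New Regex("(?<=Product 1: )((.*)(?= Product 2:)|(.*)(?= Truck:))")
Dim match As Match = regex.Match("Hello UserName Product 1: NOKIA Truck: Fedex Ending: Canada")
Console.WriteLine(match.Value) ' the matched product name, NOKIA for this input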
Note that if delimiters other than Product x and Truck can follow the value, you need to add those to the regular expression (or to the replace) as well.
Edit
Updated the regular expression to be more generic with regard to the delimiter words:
(?<=Product 1: )((.*)(?= Product 2:)|(.*?)(?= \w+:)|.*)
Now it stops at Product 2:, at any other delimiter of the form Word:, or runs to the end of the string when no delimiter follows.
Edit 2
More simplifications:
(?<=Product 1: )((.*?)(?= \w+ ?(\d+)?:)|.*)
This last regular expression also matches correctly if there is a delimiter with multiple digits, like Product 123:.
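To see how the final pattern behaves on the sample strings from the question, here is a minimal sketch in Python rather than VB.NET (the pattern itself is unchanged; the function name first_product is made up for illustration). Note that for the fourth sample it returns "Alcatel-55 Special", since nothing in the string marks "Special" as separate from the product name.

```python
import re

# Pattern copied from "Edit 2" above; only the host language differs.
PRODUCT_1 = re.compile(r"(?<=Product 1: )((.*?)(?= \w+ ?(\d+)?:)|.*)")


def first_product(message):
    """Return the text after 'Product 1: ' up to the next 'Word:' tag, or None."""
    match = PRODUCT_1.search(message)
    return match.group(0) if match else None


samples = [
    "Hello UserName Product 1: Alcatel Product 2: Samsung Truck: Fedex Ending: Canada",
    "Hello UserName Product 1: NOKIA Truck: Fedex Ending: Canada",
    "Hello UserName Product 1: Alcatel Product 2: Samsung Product 3: NOKIA Truck: Canada POST Ending: Brazil",
    "Hello UserName Product 1: Alcatel-55 Special Product 2: Samsung Product 3: NOKIA Truck: Canada POST Ending: Brazil",
    "Hello UserName Product 1: Samsung Galaxy S6-33 Truck: Canada POST Ending: Brazil",
]

for sample in samples:
    print(first_product(sample))
```

If a string contains no "Product 1: " at all, search finds nothing and the function returns None, so callers should check for that.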

This assumes you want the value between "Product 1:" and whatever the next "tag" happens to be, which in some cases is "Product 2:" and in other cases is "Truck:". Keep in mind that this is based only on the sample you provided - if you have other "tags" which can appear after the "Product 1:", you'll need to adjust.
The following regex will get you almost what you asked for:
(?:Product 1:)(?<product>.+?)(?=(\sProduct 2:)|(\sTruck:))
Using that regex, and given your sample text, you will get capture groups named "product" as follows (each value starts with a space, because the group begins right after the colon, so trim it):
Alcatel
NOKIA
Alcatel
Alcatel-55 Special
Samsung Galaxy S6-33
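
A rough sketch of the same named-group idea, written in Python for illustration (Python spells the group (?P<product>...) instead of (?<product>...); the function name product_after_tag is hypothetical). It trims the leading space and returns None when neither "Product 2:" nor "Truck:" follows.

```python
import re

# Same pattern as above, with Python's syntax for the named group.
TAGGED = re.compile(r"(?:Product 1:)(?P<product>.+?)(?=(\sProduct 2:)|(\sTruck:))")


def product_after_tag(message):
    """Return the trimmed 'product' group, or None if the pattern does not match."""
    match = TAGGED.search(message)
    if match is None:
        return None
    # The group starts right after the colon, so it carries a leading space.
    return match.group("product").strip()


print(product_after_tag("Hello UserName Product 1: NOKIA Truck: Fedex Ending: Canada"))
```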

Related

Bigquery REGEX getting numbers only that followed by certain text (unit)

I have tables that contain product names. I want to extract only the numbers from them, but only the numbers that are followed by a unit (certain text), e.g. gr, kg, ml, pcs.
product name | Extracted
milk 30ml | 30
Cigarette 20pcs | 20
Sugar 50gr | 50
1990 chocolate 10gr | 10
Is there any way to get only the number that is followed by one of the desired units? I only know how to extract all the numbers, which gives the wrong result for the last product.
Thank you
We can use REGEXP_EXTRACT here with a capture group:
SELECT product, REGEXP_EXTRACT(product, r'([0-9]+)(?:l|ml|gr|kg|g|mg|[a-z]+s)\b') AS Extracted
FROM yourTable;
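
To check the idea outside BigQuery, here is a small sketch in Python using a pattern in the same spirit. It assumes an explicit unit list (kg, gr, g, mg, ml, l, pcs) instead of the looser [a-z]+s, and the function name extract_quantity is made up for illustration.

```python
import re

# A number is captured only when a known unit follows it directly.
# The unit list is an assumption; extend it to match your data.
UNIT_NUMBER = re.compile(r"([0-9]+)(?:kg|gr|g|mg|ml|l|pcs)\b")


def extract_quantity(product_name):
    """Return the number that precedes a unit, or None if there is none."""
    match = UNIT_NUMBER.search(product_name)
    return int(match.group(1)) if match else None


for name in ["milk 30ml", "Cigarette 20pcs", "Sugar 50gr", "1990 chocolate 10gr"]:
    print(name, "|", extract_quantity(name))
```

The leading 1990 in the last name is skipped because no unit follows it.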

Use a SELECT statement to find a parts of a sentence in SQL Server

What SELECT statement in SQL can I use when the search term is made up of parts of more than one word?
For example, I have this material table:
Olive oil
Mineral water
Rice
Watermelon
Fresh juice
Mini wafer
Mini milk
When I search for "Mi w" I want the following results to appear:
Mineral water
Mini wafer
select column_name from material where column_name LIKE 'Mi%[ ]w%';
Try this; it selects every entry in the column that matches the pattern: a value that starts with "Mi" and later contains a space followed by "w". See the SQL Server documentation for LIKE for more info.
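
To make the matching rule concrete, here is a rough Python sketch that emulates the pattern 'Mi%[ ]w%' for an arbitrary search string. The function name search_materials is hypothetical, and the case-insensitive comparison is an assumption (it mirrors a case-insensitive SQL Server collation; yours may differ).

```python
import re


def search_materials(search, materials):
    """Keep names where the first term starts the name and each later term starts a later word."""
    terms = search.split()
    if not terms:
        return []
    # 'Mi w' becomes ^Mi.* w : the equivalent of LIKE 'Mi%[ ]w%'.
    pattern = "^" + re.escape(terms[0])
    pattern += "".join(".* " + re.escape(term) for term in terms[1:])
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in materials if regex.search(name)]


materials = [
    "Olive oil", "Mineral water", "Rice", "Watermelon",
    "Fresh juice", "Mini wafer", "Mini milk",
]
print(search_materials("Mi w", materials))
```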

Adding in missing Country Codes into a dataset (GDP Dataset)

I have downloaded a dataset which has countries, their codes and their GDP by year in 4 columns (5 if you include the unique row number on the far left). I noticed, however, that some of the country codes are missing, and I was wondering if anyone could tell me how to get those codes and add them in, probably from a separate dataset, I imagine. You can see this in the pictures I posted. The second picture shows the rows with missing country codes. Thanks.
Your country codes look like ISO 3166-1, which are only defined for countries and not for the larger entities such as « East Asia » and « Western Offshoots ».
You could roll your own for these entities, see ISO country codes glossary:
User-assigned codes - If users need code elements to represent country names not included in ISO 3166-1, the series of letters [...] AAA to AAZ, QMA to QZZ, XAA to XZZ, and ZZA to ZZZ respectively, and the series of numbers 900 to 999 are available.
I think the easiest approach is to prefix them all with X, so you can tell at a glance that they are your own codes, and then use the initials of the name for the next 2 letters:
East Asia: XEA
Western Offshoots: XWO
etc.
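
As a minimal sketch of that scheme in Python: the rows below are made-up illustrative data (not your dataset), and the function name user_assigned_code is hypothetical. It builds "X" plus two initials and checks that the result falls in the user-assigned range XAA to XZZ. Two entities with the same initials would collide, so the generated codes still need a manual check.

```python
def user_assigned_code(name):
    """Build an 'X' + initials code in the user-assigned range XAA to XZZ."""
    words = name.split()
    if not words:
        raise ValueError("empty entity name")
    if len(words) >= 2:
        initials = words[0][0] + words[1][0]  # first letters of the first two words
    else:
        initials = words[0][:2]               # single word: its first two letters
    code = ("X" + initials).upper()
    if len(code) != 3 or not code.isascii() or not code.isalpha():
        raise ValueError(f"cannot build a code for {name!r}")
    return code


# Illustrative rows only; None marks a missing code.
rows = [
    {"country": "Japan", "code": "JPN"},
    {"country": "East Asia", "code": None},
    {"country": "Western Offshoots", "code": None},
]

for row in rows:
    if not row["code"]:
        row["code"] = user_assigned_code(row["country"])

print(rows)
```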

How to generate a dummy variable in Stata based on a sub-string of an existing string variable?

I am looking for a way to create a dummy variable which checks a variable called text against multiple given substrings like "book, buy, journey".
I want to check whether an observation has book, buy, or journey in it. If one of these keywords is found in the text, the dummy variable should be 1, otherwise 0.
An example:
TEXT
Book your tickets now
Swiss is making your journey easy
Buy your holiday tickets now!
A touch of Austria in your lungs.
The desired outcome should be
dummy variable
1
1
1
0
I tried it with strpos and also regexm with very limited results.
Regards,
Johi
Using strpos may be tedious because you have to take capitalization into account, so I would use regular expressions.
* Example generated by -dataex-. To install: ssc install dataex
clear
input str33 text
"Book your tickets now"
"Swiss is making your journey easy"
"Buy your holiday tickets now!"
"A touch of Austria in your lungs."
end
generate wanted = regexm(text, "[Bb]ook|[Bb]uy|[Jj]ourney")
list
Result:
. list
     +--------------------------------------------+
     |                              text   wanted |
     |--------------------------------------------|
  1. |             Book your tickets now        1 |
  2. | Swiss is making your journey easy        1 |
  3. |     Buy your holiday tickets now!        1 |
  4. | A touch of Austria in your lungs.        0 |
     +--------------------------------------------+
See also the Stata documentation for regexm() for more info on regular expressions.
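
For comparison, the same check can be sketched in Python (not Stata). One difference to keep in mind: this sketch ignores case entirely, so it would also match BOOK, which is slightly broader than the [Bb]ook pattern above.

```python
import re

# Any of the keywords, anywhere in the text, ignoring case.
KEYWORDS = re.compile(r"book|buy|journey", re.IGNORECASE)

texts = [
    "Book your tickets now",
    "Swiss is making your journey easy",
    "Buy your holiday tickets now!",
    "A touch of Austria in your lungs.",
]

# 1 if a keyword occurs in the text, otherwise 0.
wanted = [1 if KEYWORDS.search(text) else 0 for text in texts]
print(wanted)  # [1, 1, 1, 0]
```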

How to find a specific word within a phrase?

I wish to know how to find a specific word within a phrase. I am trying to find the word "Pizza" within a set of keywords, however there is no keyword that only has "Pizza". There are keywords such as "Pizza Delivery" and "Pizza Delivery Boy", however they won't show up! How can I do this?
Desired output:
MOVIE                               KEYWORD
----------------------------------- ----------------------------------
Drive Angry                         Waitress
Taken                               France
Saving Private Ryan                 France
30 Minutes or Less                  Pizza Delivery
30 Minutes or Less                  Pizza Delivery Boy
My script:
SELECT MovieTitle AS "MOVIE", KEYWORDDESC AS "KEYWORD"
FROM TBLMOVIE
JOIN TBLKEYWORDDETAIL ON TBLMOVIE.MOVIEID = TBLKEYWORDDETAIL.MOVIEID
JOIN TBLKEYWORD ON TBLKEYWORDDETAIL.KEYWORDID = TBLKEYWORD.KEYWORDID
WHERE TBLKEYWORD.KEYWORDDESC IN ('France', 'Waitress', 'Pizza');
My output:
MOVIE                               KEYWORD
----------------------------------- ----------------------------------
Drive Angry                         Waitress
Taken                               France
Saving Private Ryan                 France
IN compares whole values, so 'Pizza' never equals 'Pizza Delivery'. To match part of a value, one method uses LIKE:
WHERE TBLKEYWORD.KEYWORDDESC LIKE '%France%' OR
TBLKEYWORD.KEYWORDDESC LIKE '%Waitress%' OR
TBLKEYWORD.KEYWORDDESC LIKE '%Pizza%'
Another method uses REGEXP_LIKE():
WHERE REGEXP_LIKE(TBLKEYWORD.KEYWORDDESC, 'France|Waitress|Pizza')
If you use REGEXP_LIKE() you should spend a little bit of time learning about regular expressions and how to use them.
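
The difference between the three filters can be illustrated with a short Python sketch over the keyword values from the question. This runs on an in-memory list, not on the database, and is only meant to show why IN finds nothing for 'Pizza'.

```python
import re

keywords = ["Waitress", "France", "Pizza Delivery", "Pizza Delivery Boy"]
terms = ["France", "Waitress", "Pizza"]

# Like IN (...): the whole value must equal one of the terms.
exact = [k for k in keywords if k in terms]

# Like LIKE '%term%': the term may appear anywhere inside the value.
contains = [k for k in keywords if any(term in k for term in terms)]

# Like REGEXP_LIKE(value, 'France|Waitress|Pizza').
pattern = re.compile("|".join(re.escape(term) for term in terms))
regex_hits = [k for k in keywords if pattern.search(k)]

print(exact)       # ['Waitress', 'France']
print(contains)    # all four keywords
print(regex_hits)  # all four keywords
```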