If url_chk_str matches an IP-like address of the form xxx.xxx.xxx.xxx, then do something.
Here x denotes any digit from 0 to 9,
and url_chk_str is any string in the input data.
For example: 123.456.789.101
Is the above scenario possible in Apache Pig? If yes, how?
Let me know if any additional information is needed.
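You can do it like this (matches has to match the whole string, so each group needs a quantifier and each dot has to be a literal dot):
a = load '/path/data' using PigStorage() as (url_chk_str:chararray, somethingelse);
b = foreach a generate ((url_chk_str matches '[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}') ? value_if_true : value_if_false);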
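To see why the shape of the pattern matters, here is a small Python sketch (not Pig). It assumes that Pig's matches behaves like a whole-string regex test, which Python's re.fullmatch imitates. The names NAIVE, DOTTED_QUAD and whole_match are made up for this illustration.

```python
import re

# Naive pattern: one digit per group, and a bare '.' matches ANY character.
NAIVE = r"[0-9].[0-9].[0-9].[0-9]"
# Stricter pattern (illustrative): 1-3 digits per group, literal dots only.
DOTTED_QUAD = r"[0-9]{1,3}(?:[.][0-9]{1,3}){3}"


def whole_match(pattern: str, text: str) -> bool:
    """Return True only if the entire string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for s in ["123.456.789.101", "1.2.3.4", "1a2b3c4", "not an ip"]:
    print(s, whole_match(NAIVE, s), whole_match(DOTTED_QUAD, s))
```

The naive pattern rejects 123.456.789.101 (too many digits per group) and accepts 1a2b3c4 (the dot matches any character), while the stricter one does the opposite. Note that neither pattern checks that each group is at most 255.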
How can I stop Splunk considering hostname "host" more important than "host" key?
Let's suppose that I have the following logs:
color = red ; host = localhost
color = blue ; host = newhost
The following query works fine:
index=myindex | stats count by color
but the following doesn't:
index=myindex | stats count by host
because instead of using the "host" key from the log, Splunk uses its own default host field as "host".
How can I deal with this?
When there are two fields with the same name one of them has to "win". In this case, it's the one Splunk defines before it processes the event itself. As you probably know, every event is given 4 fields at input time: index, host, source, and sourcetype. Data from the event won't override these unless specifically told to do so in the config files.
To override the settings, put this in your transforms.conf file:
[sethost]
REGEX = host\s*=\s*(\w+)
DEST_KEY = MetaData:Host
FORMAT = host::$1
You'll also need to reference the transform in your props.conf file:
[mysourcetype]
TRANSFORMS-host = sethost
I would have thought this solution would be more prominent, but I found it buried deep in the Splunk docs.
https://docs.splunk.com/Documentation/Splunk/8.2.6/Metrics/Search
You can use reserved fields such as "source", "sourcetype", or "host" as dimensions. However, when extracted dimension names are reserved names, the name is prefixed with "extracted_" to avoid name collision. For example, if a dimension name is "host", search for "extracted_host" to find it.
So, in your case:
index=myindex | stats count by extracted_host
Is there a way to extract only the IPv4 addresses from a JSON file using VB.NET?
For example, when I open a JSON file from VB, I would like to filter out only the IPv4 addresses from text such as this: https://pastebin.com/raw/S7Vnnxqa
I expect results like this: https://pastebin.com/raw/8L8Ckrwi. I found a website that offers a tool for this, https://www.toolsvoid.com/extract-ip-addresses/ (linked here to show what I mean), but I don't want to use an external tool; I want to do it directly in VB. Thanks in advance for your help.
Your "text" is JSON. Load it using the JSON parser of your choice (google VB.NET parse JSON), loop over the matches array and read the IP address from the http.host property of each element.
Here is an example of how to do it using the Newtonsoft.Json package (see it working here on DotNetFiddle):
' Requires "Imports Newtonsoft.Json.Linq" at the top of the file
' Assume that the variable myJsonString contains the original string
Dim myJObject = JObject.Parse(myJsonString)
For Each match In myJObject("matches")
Console.WriteLine(match("http")("host"))
Next
Output:
62.176.84.198
197.214.169.59
46.234.76.75
122.136.141.67
219.73.94.83
2402:800:621b:33f1:d1e3:5544:4fcf:526e
178.136.75.125
188.167.212.252
...
If you want to extract only IPv4 and not IPv6, you can use a regular expression to check whether it matches:
Dim IPV4Regex = New Regex("^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$")
Dim ip = match("http")("host")
If IPV4Regex.Match(ip).Success Then
Console.WriteLine(ip)
End If
Output:
62.176.84.198
197.214.169.59
46.234.76.75
122.136.141.67
219.73.94.83
178.136.75.125
188.167.212.252
...
Of course, it's always recommended to parse the input data in a structured way, to avoid surprises such as false positives. But if you just want to match anything that looks like an IP address, regardless of the input format (even if you just put hello 1.2.3.4 world in the textbox), then you could use just the regular expression and skip the structured approach (see it working here on DotNetFiddle):
Dim IPV4RegexWithWordBoundary = New Regex("\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b")
Dim match = IPV4RegexWithWordBoundary.Match(myJsonString)
Do While match.Success
Console.WriteLine(match.Value)
match = match.NextMatch()
Loop
Here I modified the regular expression to use \b...\b instead of ^...$ so that it matches at word boundaries instead of at the start/end of the string. Note, however, that we now get each IP address twice with the input that you provided, because each address appears more than once in it:
62.176.84.198
62.176.84.198
197.214.169.59
197.214.169.59
46.234.76.75
46.234.76.75
...
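If the repeats are unwanted, one option is to keep only the first occurrence of each match. The following is a minimal Python sketch of that idea (not VB.NET), reusing the same word-boundary pattern. The function name unique_ipv4 and the sample string, including its ip_str key, are made up for illustration.

```python
import re

# Same word-boundary IPv4 pattern as in the answer above.
IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
)


def unique_ipv4(text: str) -> list:
    """Return each IPv4-looking match once, in order of first appearance."""
    # dict keys preserve insertion order, so fromkeys() drops repeats only.
    return list(dict.fromkeys(IPV4.findall(text)))


# Hypothetical input: the same address appears under two keys.
sample = '{"ip_str": "62.176.84.198", "http": {"host": "62.176.84.198"}} hello 1.2.3.4 world'
print(unique_ipv4(sample))  # ['62.176.84.198', '1.2.3.4']
```

In VB.NET the same effect could presumably be had by collecting match.Value into a HashSet(Of String) or by calling Distinct() on the results.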
I have a requirement where I receive an email address like jon'sram#gmail.com, but when sending it to Salesforce I need to send it as jon\'sram#gmail.com.
This is the query I need to produce:
SELECT Email,Id FROM Contact WHERE Email = 'jon\'sram#gmail.com'
At the moment the query looks like the one below, and it fails because the email address contains a single quote:
SELECT Email,Id FROM Contact WHERE Email = 'jon'sram#gmail.com'
I have tried different approaches, but they all seem complicated.
Please help.
You can use the expression below in "Set Payload" to escape the single quote in the original string:
#[message.payloadAs(java.lang.String).replace("'","\\\'")]
To concatenate strings you can use the ++ operator: "Foo" ++ "bar". Also take into account that in DataWeave (DW) a string can be delimited by either " or '. In your case I would say that you need to type 'jon\\'sram#gmail.com' so that it ends up generating what you want. Regards
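The escaping step itself is language independent. As a rough Python sketch (not MEL or DataWeave), assuming SOQL string literals accept \' and \\ as escape sequences; the helper name escape_soql_literal is made up for this example:

```python
def escape_soql_literal(value: str) -> str:
    """Escape a value for use inside a single-quoted SOQL string literal."""
    # Escape backslashes first so the backslashes added for quotes are not doubled.
    return value.replace("\\", "\\\\").replace("'", "\\'")


email = "jon'sram#gmail.com"
query = (
    "SELECT Email,Id FROM Contact WHERE Email = '"
    + escape_soql_literal(email)
    + "'"
)
print(query)  # SELECT Email,Id FROM Contact WHERE Email = 'jon\'sram#gmail.com'
```

Escaping the backslash before the quote matters: in the opposite order, the backslash inserted in front of each quote would itself be escaped again.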
I have a record as detailed below in Pig:
(e5a22039edba467cb738f3794de577b6,{(Fortnite),(OT4),(Main),(New User),(Manual)},bbeeabd3d3ed42c1a7e65838fabb16e3)
I would like to access the data in {(Fortnite),(OT4),(Main),(New User),(Manual)} along with the other values in the record. Please suggest how I can do this.
Thanks
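You don't give us a lot of information, but I guess you can use positional notation ($X) to access your fields. {(Fortnite),(OT4),(Main),(New User),(Manual)} is a bag in the second position, so you can access it with $1 ($0 is the first field).
A sample of code to access the values in the bag would be:
A = LOAD 'file.txt' USING PigStorage(',') AS [.......];
B = FOREACH A GENERATE $1.$0 as Fortnite;
Note that $1.$0 projects the first field of every tuple in the bag, so B still holds a bag with all five values, not just Fortnite. If you want one row per value alongside the other fields, use FLATTEN($1) in the GENERATE instead. Is that OK?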
I have a CSV file with 3 columns: tweetid, tweet, and Userid. However, the tweet column itself contains commas.
Example of one row of data:
`396124437168537600`,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.",savava143
I want to extract all 3 fields individually, but REGEX_EXTRACT is giving me an error with this code:
a = LOAD tweets USING PigStorage(',') AS (f1,f2,f3);
b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\\"(.*)',1);
The error is:
error: Filter's condition must evaluate to boolean.
In the use case shared, reading the data using PigStorage(',') loses savava143 (the last field value), because the comma inside the tweet is treated as a field separator:
A = LOAD '/Users/muralirao/learning/pig/a.csv' USING PigStorage(',') AS (f1,f2,f3);
DUMP A;
Output : A : Observe that the last field value is missing.
(396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.")
For the use case shared, to extract all the values from a CSV file whose field values contain ',', we can use either CSVExcelStorage or CSVLoader.
Approach 1 : Using CSVExcelStorage
Ref : http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html
Input : a.csv
396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.",savava143
Pig Script :
REGISTER piggybank.jar;
A = LOAD 'a.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage() AS (f1,f2,f3);
DUMP A;
Output : A
(396124437168537600,I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.,savava143)
Approach 2 : Using CSVLoader
Ref : http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/CSVLoader.html
Below script makes use of CSVLoader(), DUMP A will result in the same output seen earlier.
A = LOAD 'a.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (f1,f2,f3);
The error occurs because you do not want to FILTER based on a regex but to GENERATE new fields based on a regex. FILTER needs to know whether each line should be kept, hence the boolean requirement.
Therefore, you have to use:
b = FOREACH a GENERATE REGEX_EXTRACT(FIELD, REGEX, INDEX_OF_GROUP_TO_RETURN);
However, as @Murali Rao said, your values are not just comma separated but CSV (think about how you will handle a comma inside a tweet: it is not a field separator, just content).
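The difference between splitting on commas and parsing CSV can be seen in a few lines of Python (not Pig), using the standard csv module. The row below is a shortened version of the sample tweet, made up for illustration.

```python
import csv
import io

# Shortened sample row: the quoted tweet field contains a comma.
line = '396124437168537600,"I wish I didn\'t give up, I\'m so mad",savava143'

# Naive split treats the comma inside the quotes as a field separator.
naive = line.split(",")
# A CSV parser honours the quotes and keeps the tweet as one field.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))   # 4
print(len(parsed))  # 3
print(parsed[1])    # I wish I didn't give up, I'm so mad
print(parsed[2])    # savava143
```

This is the same distinction as PigStorage(',') versus CSVExcelStorage or CSVLoader: only the CSV-aware loaders respect the quoting.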