Apache Pig: REGEX_EXTRACT second dir name from a string containing a path

Good afternoon,
After extracting data from a NameNode log and filtering it, I get an output like:
DUMP USERS_AND_DIRS;
(UserName,/user/UserName/Dir1,/user/UserName/Dir2)
(UserName2,/user/UserName2/Dir1,/user/UserName2/Dir2)
(UserName,/hdfs/data/Dir1,/hdfs/data/Dir2)
(UserName,/user/UserName/Dir1,/user/UserName2/Dir2)
AS (User, Source, Destination)
Now I want to filter by users using directories that are not their own.
I have :
ONLY_IN_USER_DIR = FILTER USERS_AND_DIRS BY (Source MATCHES '/user/(.*)') OR (Destination MATCHES '/user/(.*)');
Which works fine. But this doesn't work:
USING_DIR_OF_OTHER_USER = FILTER ONLY_IN_USER_DIR BY NOT(Source MATCHES '/user/$User/(.*)') OR NOT (Destination MATCHES '/user/$User/(.*)');
Given the input, I'd like to only get :
(UserName,/user/UserName/Dir1,/user/UserName2/Dir2)
Which are users accessing files in directories other than their own, as a source or destination. I tried different things, yet I can't find how to do this ?
Edit: I think I can do something like
DIR_OF_OTHER_USER = FILTER ONLY_IN_USER_DIR BY User != REGEX_EXTRACT(Source,'/(.*)/',2) OR User != REGEX_EXTRACT(Destination,'/(.*)/',2);
But when Source is "/user/UserDir/Dir1/Dir2", REGEX_EXTRACT(Source,'/(.*)/',1) returns (user) and REGEX_EXTRACT(Source,'/(.*)/',2) returns nothing.
So I guess my question is now : how can I extract the second directory from a string containing a path ? From "/user/UserDir/Dir1/Dir2" and I would like to extract "UserDir"

how can I extract the second directory from a string containing a path ? From "/user/UserDir/Dir1/Dir2" and I would like to extract "UserDir"
How about
STRSPLIT(Source,'/').$2;
 

Ok, found it :
DIR_OF_OTHER_USER = FILTER ONLY_IN_USER_DIR BY (
User != REGEX_EXTRACT(Source, '^\\/([^\\/]*)\\/([^\\/]*)', 2) AND REGEX_EXTRACT(Source, '^\\/([^\\/]*)\\/([^\\/]*)', 1) == 'user'
) OR (
User != REGEX_EXTRACT(Destination, '^\\/([^\\/]*)\\/([^\\/]*)', 2) AND REGEX_EXTRACT(Destination, '^\\/([^\\/]*)\\/([^\\/]*)', 1) == 'user'
);
This returns every record where a user touches a path under /user/* that is not inside their own directory /user/UserName/. There might be a simpler answer, though.
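
To make the logic of that filter easier to check, here is a minimal Python sketch of the same idea, not Pig code. It uses the same pattern (first and second path components) on the sample rows from the question. The function name uses_other_users_dir is made up for illustration, and the sketch checks each path's own first component for 'user'.

```python
import re

# Same pattern as in the Pig filter: capture the first and second path components.
PATH_HEAD = re.compile(r'^/([^/]*)/([^/]*)')

def uses_other_users_dir(user, path):
    """True if path is under /user/<someone> and <someone> is not user."""
    m = PATH_HEAD.match(path)
    return bool(m) and m.group(1) == 'user' and m.group(2) != user

# Sample rows from the question: (User, Source, Destination)
rows = [
    ("UserName", "/user/UserName/Dir1", "/user/UserName/Dir2"),
    ("UserName2", "/user/UserName2/Dir1", "/user/UserName2/Dir2"),
    ("UserName", "/hdfs/data/Dir1", "/hdfs/data/Dir2"),
    ("UserName", "/user/UserName/Dir1", "/user/UserName2/Dir2"),
]

flagged = [r for r in rows
           if uses_other_users_dir(r[0], r[1]) or uses_other_users_dir(r[0], r[2])]
print(flagged)
```

Only the last sample row is flagged, which is the record the question asks for.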

Using a variable content to generate a variable name

I have a script for a game server, and I'm stuck on something that looks easy to solve.
There is a variable whose content is set dynamically based on user action. Let's call this variable example and give it some arbitrary value:
local example = Potato
Then I have a function which sends a message to a discord webhook
SendWebhookMessage(varNAME, "Message content")
Where varname is the variable containing the link of the webhook.
I want to use the content of variable example to generate the variable name like
webhook_ds_example
So in this case it will be
webhook_ds_potato
I hope this is clear and that you can help me solve it. Here is the relevant code:
local menu = { name = "Baú" }
local cb_take = function(idname)
local citem = chest.items[idname]
local amount = vRP.prompt(source,"Quantidade:","")
amount = parseInt(amount)
if amount >= 0 and amount <= citem.amount then
local new_weight = vRP.getInventoryWeight(user_id)+vRP.getItemWeight(idname)*amount
if new_weight <= vRP.getInventoryMaxWeight(user_id) then
citem.amount = citem.amount - amount
local temp = os.date("%x %X")
vRP.logs("savedata/bau.txt","Bau: "..name.." [ID]: "..user_id.." /"..temp.." [FUNÇÃO]: Retirar / [ITEM]: "..idname.." / [QTD]: "..amount)
local webhook_bau_fac1 = ""
local webhook_bau_fac2 = ""
local webhook_bau_fac3 = ""
local webhook_bau_fac4 = ""
SendWebhookMessage(webhook_bau_..name,"```prolog\n[ID]: "..user_id.." "..identity.name.." "..identity.firstname.." \n[GUARDOU]: "..vRP.format(parseInt(amount)).." "..vRP.itemNameList(itemName).." \n[BAU]: "..chestName.." "..os.date("\n[Data]: %d/%m/%Y [Hora]: %H:%M:%S").." \r```")
You are looking for tables, which let you store lots of different named values in one variable.
local webhook_bau -- make a variable
-- create a table with 4 entries, put it in the variable
webhook_bau = {fac1="", fac2="", fac3="", fac4=""}
-- if you want to start with an empty table, use {} instead
-- change one of them based on the name
webhook_bau[name] = "something"
-- use one of the entries based on the name
SendWebhookMessage(webhook_bau[name], "whatever you want to send")
Table entries "magically" appear when you use them, you don't have to create them first. If you access an entry that doesn't exist, you will read the value nil. You can also delete an entry by putting nil in the entry.
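
For readers who know Python better than Lua, the same idea (one container indexed by a string, instead of many separately named variables) can be sketched with a dictionary. This is only an analogy, not Lua code. The URL below is a placeholder and webhook_for is a hypothetical helper name.

```python
# One dictionary replaces webhook_bau_fac1 ... webhook_bau_fac4.
webhook_bau = {"fac1": "", "fac2": "", "fac3": "", "fac4": ""}

def webhook_for(name):
    """Return the webhook stored under name, or None if there is no such entry."""
    return webhook_bau.get(name)

# Change one entry based on the name (placeholder URL, not a real webhook).
webhook_bau["fac1"] = "https://example.invalid/hook-fac1"

print(webhook_for("fac1"))
print(webhook_for("fac9"))  # no such entry
```

The .get call returns None for a missing key, much like reading a missing Lua table entry gives nil.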
In the example you provided, you have:
SendWebhookMessage(webhook_bau_..name,"```prolog\n[ID]: "..user_id.." "..identity.name.." "..identity.firstname.." \n[GUARDOU]: "..vRP.format(parseInt(amount)).." "..vRP.itemNameList(itemName).." \n[BAU]: "..chestName.." "..os.date("\n[Data]: %d/%m/%Y [Hora]: %H:%M:%S").." \r```")
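webhook_bau_ is not in quotes, so Lua treats it as a variable. No such variable exists, so it is nil, and concatenating nil raises an error. Note that adding quotes would only produce the string "webhook_bau_fac1" (for example), not the value of the local variable with that name.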
I'm also not seeing where name is set (only menu.name), so I'm assuming you have that elsewhere in your program, if not, that will also be nil, so just make sure that is set somewhere as well.

Terraform: How Do I Setup a Resource Based on Configuration

So here is what I want as a module in Pseudo Code:
IF UseCustom, Create AWS Launch Config With One Custom EBS Device and One Generic EBS Device
ELSE Create AWS Launch Config With One Generic EBS Device
I am aware that I can use the 'count' meta-argument within a resource to decide whether it is created or not. So I currently have:
resource aws_launch_configuration "basic_launch_config" {
  count = var.boolean ? 0 : 1
  blah
}
resource aws_launch_configuration "custom_launch_config" {
  count = var.boolean ? 1 : 0
  blah
  blah
}
Which is great, now it creates the right Launch configuration based on my 'boolean' variable... But in order to then create the AutoScalingGroup using that Launch Configuration, I need the Launch Configuration Name. I know what you're thinking, just output it and grab it, you moron! Well of course I'm outputting it:
output "name" {
description = "The Name of the Default Launch Configuration"
value = aws_launch_configuration.basic_launch_config.*.name
}
output "name" {
description = "The Name of the Custom Launch Configuration"
value = aws_launch_configuration.custom_launch_config.*.name
}
But the calling module creates the Launch Configuration and then the Auto Scaling Group. From there, how do I know which output to pass into the ASG?
Is there a different way to grab the value I want that I'm overlooking? I'm new to Terraform and the whole no real conditional thing is really throwing me for a loop.
This seemed to be the cleanest way I could find, using a ternary operator:
output "name" {
  description = "The Name of the Launch Configuration"
  value = "${(var.booleanVar) == 0 ? aws_launch_configuration.default_launch_config.*.name : aws_launch_configuration.custom_launch_config.*.name}"
}
Let me know if there is a better way!
You can use the same variable you used to decide which resource to enable to select the appropriate result:
output "name" {
  value = var.boolean ? aws_launch_configuration.custom_launch_config[0].name : aws_launch_configuration.basic_launch_config[0].name
}
Another option, which is a little more terse but arguably also a little less clear to a future reader, is to exploit the fact that you will always have one list of zero elements and one list with one element, like this:
output "name" {
  value = concat(
    aws_launch_configuration.basic_launch_config[*].name,
    aws_launch_configuration.custom_launch_config[*].name,
  )[0]
}
Concatenating these two lists will always produce a single-item list due to how the count expressions are written, and so we can use [0] to take that single item and return it.
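
The list trick is easy to check outside Terraform. Here is a small Python sketch of the same reasoning, not HCL. The function name single_name and the sample names are made up, and the explicit length check is an addition that the Terraform expression does not perform.

```python
def single_name(basic_names, custom_names):
    """Pick the only element from two lists where exactly one is non-empty."""
    combined = basic_names + custom_names  # plays the role of concat(...)
    if len(combined) != 1:
        raise ValueError("expected exactly one launch configuration name")
    return combined[0]  # plays the role of [0]

# count = 0 gives an empty list, count = 1 gives a one-element list.
print(single_name([], ["custom-lc"]))
print(single_name(["basic-lc"], []))
```

Whichever resource was created, the combined list has exactly one element, so indexing with 0 is safe as long as the two count expressions stay complementary.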

How can I create dynamic destination files name based on what is filtered?

For example, if something like [xxx] appears in a log line, I must write that message to a file named xxx.log.
And if the message changes and [xxy] appears, I must create a new log file named xxy.log.
How can I do that in a syslog-ng config file?
To filter for specific messages, you can use filter expressions in syslog-ng.
You can use regular expressions in the filter as well.
To use the results of the match in the filename, try using a named pattern in the filter expression:
filter f_myfilter {message("(?<name>pattern)");};
Then you can use the named match in the destination template:
destination d_file {
file ("/var/log/${name}.log");
};
Let me know if it works, I haven't had the time to test it.
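
Since the answer above is untested, here is a rough Python sketch of just the underlying idea, a named capture group whose value becomes part of the file name. It is not syslog-ng configuration. The bracket pattern, the function name log_path_for and the fallback name "unmatched" are assumptions for illustration. Python spells the named group (?P<name>...) where PCRE also accepts (?<name>...).

```python
import re

# Named group "name" captures the text between square brackets, e.g. [xxx].
TAG = re.compile(r'\[(?P<name>[^\]]+)\]')

def log_path_for(message, default="unmatched"):
    """Build the destination path from the bracketed tag in the message."""
    m = TAG.search(message)
    name = m.group("name") if m else default
    return f"/var/log/{name}.log"

print(log_path_for("Oct 21 host app [xxx] started"))
print(log_path_for("Oct 21 host app [xxy] stopped"))
print(log_path_for("no tag in this line"))
```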
I found this way to solve my problem.
parser p_apache {
csv-parser(columns("MY.ALGO", "MY.MOSTRAR", "MY.OTRA")
delimiters("|")
);
};
destination d_file {
file("/var/log/syslog-ng/$YEAR-$MONTH/$DAY/messages-${MY.ALGO:-nouser}.log");
};
Regex is the answer here.
For example, my source file is named access2018-10-21.log, so my access log source entry becomes:
file("/opt/liferay-portal-6.2-ee-sp13/tomcat-7.0.62/logs/access[0-9][0-9][0-9][0-9]\-[0-9][0-9]\-[0-9][0-9].log" follow_freq(1) flags(no-parse));

filter result by taking out a matching regex in pig latin

I have some data containing URL strings, each with a version substring embedded.
My goal is to get a set of results with that substring removed from the string:
e.g.
rawdata: {
id Long,
url String
}
here's some sample rawdata:
1,/213112341_v1.html
2,43524254243_v2.html
5,/000000_v3.html
5,/000000_v4.html
the result I want is:
1,/213112341.html
2,43524254243.html
5,/000000.html
So basically, remove the subversion number (_v1|_v2|_v3|_v4) from the URL and return unique results.
How do I do that in pig?
Thanks,
Your best bet would be to do something like the following:
FOREACH data GENERATE id, CONCAT(REGEX_EXTRACT(url, '(/?[0-9]*)_',1),'.html');
EDIT:
How about trying the following if the data is more complicated
FOREACH data GENERATE id, CONCAT(STRSPLIT(url, '_v[0-9]', 2).$0, '.html');
That should get everything before the version number, with the CONCAT adding the .html back in. If the sections both before and after the version number are more complicated, you could do something like:
FOREACH data GENERATE id, CONCAT(FLATTEN(STRSPLIT(url, '_v[0-9]',2)))
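
As a quick way to check the expected output, here is a Python sketch (not Pig) that removes the version suffix from the question's sample rows and then drops duplicates. It assumes the suffix always has the form _v followed by digits. In Pig, the deduplication step would typically be done with DISTINCT.

```python
import re

# Sample rows from the question: (id, url)
rows = [
    (1, "/213112341_v1.html"),
    (2, "43524254243_v2.html"),
    (5, "/000000_v3.html"),
    (5, "/000000_v4.html"),
]

def strip_version(url):
    """Remove a _v<digits> suffix wherever it appears in the url."""
    return re.sub(r'_v[0-9]+', '', url)

# Keep the first occurrence of each cleaned row, preserving input order.
seen = set()
result = []
for row_id, url in rows:
    cleaned = (row_id, strip_version(url))
    if cleaned not in seen:
        seen.add(cleaned)
        result.append(cleaned)

print(result)
```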

SSIS filename - file count

I'm currently creating a flat file export for one of our clients. I've managed to get the file into the format they want, and I'm looking for the easiest way to create a dynamic file name. I've got the date in as a variable, the path, etc., but they want a count in the file name. For example:
File name 1: TDY_11-02-2013_{1}_T1.txt, where the {} holds the count. So next week's file would be TDY_17-02-2013_{2}_T1.txt.
I can't see an easy way of doing this. Any ideas?
EDIT:
In my first answer, I thought you meant the count of values returned by a query. My bad!
There are two ways to achieve this. You could loop over the destination folder, select the last file by date, read its count and add 1, which sounds like a lot of trouble. Or why not keep a simple log table in the DB with the last execution date and ID, and compose your file name from the last row of that table?
Where exactly is your problem?
You can make a dynamic file name using expressions.
For the count, you can use a "Row Count" component inside your data flow to assign the result to a variable, and then use that variable in your expression.
Use a Script Task to read the number inside the curly braces of the file name and store it in a variable.
Create a variable (FileNo, of type int) to hold the number for the file.
Pseudo code
// Requires: using System; using System.IO; using System.Text.RegularExpressions;
string location = @"D:\";
/* Get the path from the connection manager like the code below
instead of hard coding like D: above
string flatFileConn =
(string)Dts.Connections["Yourfile"].AcquireConnection(null);
*/
int number = 0;
// Capture the digits between the curly braces, e.g. the 1 in TDY_11-02-2013_{1}_T1
string pattern = @"\{([0-9]+)\}";
foreach (string s in Directory.GetFiles(location, "*.txt"))
{
    string name = Path.GetFileNameWithoutExtension(s);
    Match match = Regex.Match(name, pattern);
    if (match.Success)
    {
        // Keep the highest count seen so far
        number = Math.Max(number, int.Parse(match.Groups[1].Value));
    }
}
// The next file gets the highest existing count plus one
Dts.Variables["User::FileNo"].Value = number + 1;
Once you have the value, use it in the file name expression in the connection manager:
@[User::FilePath] + @[User::FileName]
+ "_{" + (DT_STR,10,1252) @[User::FileNo] + "}_T1.txt"
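
The counting logic can also be tried outside SSIS. Below is a minimal Python sketch of the same approach, assuming the file names follow the TDY_<date>_{<count>}_T1.txt pattern from the question. The helper names next_file_no and build_name are hypothetical, and the sketch works on a plain list of names rather than a real directory.

```python
import re

# Digits between curly braces, e.g. the 1 in TDY_11-02-2013_{1}_T1.txt
COUNTER = re.compile(r'\{([0-9]+)\}')

def next_file_no(existing_names):
    """Return the highest count found in the names plus one (1 if none match)."""
    counts = []
    for name in existing_names:
        m = COUNTER.search(name)
        if m:
            counts.append(int(m.group(1)))
    return max(counts, default=0) + 1

def build_name(date_str, file_no):
    """Compose the export file name in the format requested in the question."""
    return f"TDY_{date_str}_{{{file_no}}}_T1.txt"

existing = ["TDY_11-02-2013_{1}_T1.txt", "notes.txt"]
print(build_name("17-02-2013", next_file_no(existing)))
```

Taking the maximum rather than the last match avoids depending on the order in which the files are listed.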