Adding tags to logstash events based on the md5 of the filename - amazon-s3

I'm trying to link logstash events with the eventual file location on AWS S3. We have the logstash agent indexing files directly, and when the file has finished being written to, we send it to S3.
To increase S3 performance, we're fanning out files by storing them like so:
hex(md5(filename.log))[0..2]/filename.log
This takes the first 3 characters of the md5 hexdigest, and stores the file in the folder with that prefix, providing a fairly solid fan out of files. Unfortunately, I can't work out how to tag each log event with this information.
There is the ruby filter type which allows you to execute ruby code, but I don't think it allows you to use the result of the computation.
filter {
  ruby {
    code => "require 'digest/md5'; Digest::MD5.hexdigest('mylong.file.name')[0..2]"
    # now what?
  }
}
Is there a way of attaching a tag or field based on a prefix of the md5?

Your code will have a variable event which is the event itself.
To add a field "foo" with value "bar", you could write something like this:
event["foo"] = "bar"
See how the file input does it, for example.
If you find your code is a bit unwieldy in a config file, you could write your own input or filter plugin.

Try this:
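filter {
  ruby {
    # Single quotes inside the double-quoted code string avoid ending it early.
    code => "require 'digest/md5';
             event['md5'] = Digest::MD5.hexdigest('mylong.file.name')[0..2]"
  }
}
The "md5" field is what you want. Replace the literal 'mylong.file.name' with the event's own filename.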
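To check that the tag matches where the file ends up, the same calculation can be run outside Logstash. The following is a minimal sketch in plain Ruby; the helper name s3_key_for and the example path are hypothetical, and it assumes the key is built from the file's base name exactly as described in the question.

```ruby
require 'digest/md5'

# Hypothetical helper: builds "<first 3 hex chars of md5(filename)>/<filename>",
# the fan-out layout described in the question.
def s3_key_for(path)
  filename = File.basename(path)                   # drop any directory part
  prefix   = Digest::MD5.hexdigest(filename)[0..2] # first three hex characters
  "#{prefix}/#{filename}"
end

# Example path is made up for illustration.
puts s3_key_for('/var/log/app/filename.log')
```

Whatever uploads the file to S3 and whatever tags the event need to hash the same string (here, the base name), otherwise the prefixes will not line up.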

Related

Mockoon | Configure mockoon with a seed to always return the same random data between requests

Although I am using faker to generate random data I would like the responses of a given endpoint to be the same on every request.
I noticed the general settings have a text input for setting a seed, but it does nothing for what I am aiming at.
Seeding Faker.js does not make it generate the same value on every call.
Instead, it makes it generate the same sequence of values. I'm not sure it's documented on Faker.js' website, but I added this to Mockoon's documentation.
If you want the response to always be the same, you can either:
- use the inline body with "static" data (without templating), something like:
  {
    "username": "john"
  }
- use the data buckets feature, as the buckets are only generated once when the server starts.
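The difference between "same value" and "same sequence" can be illustrated with any seeded generator. The sketch below uses Ruby's built-in Random class purely as a stand-in; it is not Faker.js or Mockoon code, and the seed 42 is arbitrary.

```ruby
# Two generators built from the same (arbitrary) seed, standing in for
# two server starts with the same seed setting.
first_run  = Random.new(42)
second_run = Random.new(42)

seq_a = Array.new(3) { first_run.rand }
seq_b = Array.new(3) { second_run.rand }

puts seq_a == seq_b         # same seed: both runs produce the same sequence
puts seq_a.uniq.length > 1  # within one run, successive values still differ
```

This is why successive requests to the same endpoint see different data even with a seed set: each request consumes the next values in the sequence.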

How to create a custom named time based indexing using logstash?

I am currently trying to create time-based indices in Elasticsearch using Logstash.
I have come across and successfully tried the following output configuration in the Logstash config file.
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "logstash-%{+xxxx.ww}"
  }
  file { path => "C:/output3.txt" }
}
This works fine, but the index name has to be "logstash".
I need to give it a custom name other than "logstash".
Can you please suggest something?
Thanks again

Using a wildcard on S3 Event Notification prefix

I have a Lambda function that creates a thumbnail for every image that gets uploaded to my bucket and then places the thumbnail in another bucket. When I upload a user image (profile pic), I use the user's ID and name as part of the key:
System-images/users/250/john_doe.jpg
Is there a way to use a wildcard in the prefix path? This is what I have so far, but it doesn't work.
No, you can't -- it's a literal prefix.
In your example, you could use either of these prefixes, depending on what else is in the bucket (if there are things sharing the common prefix that you don't want to match):
System-images/
System-images/users/
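A simplified way to picture a literal prefix is as a plain "starts with" comparison on the object key. The sketch below is only a model of that behaviour under this assumption, not AWS code; the helper name prefix_matches? is hypothetical.

```ruby
# Simplified model (an assumption, not AWS code): the prefix is compared
# character by character against the start of the object key.
def prefix_matches?(key, prefix)
  key.start_with?(prefix)
end

key = 'System-images/users/250/john_doe.jpg'

puts prefix_matches?(key, 'System-images/users/')    # literal prefix of the key
puts prefix_matches?(key, 'System-images/users/*/')  # '*' is treated as an ordinary character
```

Under this model the asterisk would only match a key that actually contains a "*" at that position, which is why a wildcard-style prefix never fires for keys like the one above.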
Wildcards in the prefix/suffix filters of S3 event notifications are not supported, and are unlikely ever to be, since the asterisk (*) is a valid character that can be used in S3 object key names. However, you can work around this by adding a filter inside your Lambda function. For example:
First, get the source key:
var srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));
Then, check if it is inside the users folder:
if (srcKey.indexOf('/users/') === -1) {
    callback('Not inside users folder!');
    return;
}

Indexing file paths or URIs in Lucene

Some of the documents I store in Lucene have fields that contain file paths or URIs. I'd like users to be able to retrieve these documents if their query terms contain a path or URI segment.
For example, if the path is
C:\home\user\research\whitepapers\analysis\detail.txt
I'd like the user to be able to find it by querying for path:whitepapers.
Likewise, if the URI is
http://www.stackoverflow.com/questions/ask
A query containing uri:questions would retrieve it.
Do I need to use a special analyzer for these fields, or will StandardAnalyzer do the job? Will I need to do any pre-processing of these fields (to replace the forward slashes or backslashes with spaces, for example)?
Suggestions welcome!
You can use StandardAnalyzer.
I tested this, by adding the following function to Lucene's TestStandardAnalyzer.java:
public void testBackslashes() throws Exception {
    assertAnalyzesTo(a, "C:\\home\\user\\research\\whitepapers\\analysis\\detail.txt",
        new String[]{"c", "home", "user", "research", "whitepapers", "analysis", "detail.txt"});
    assertAnalyzesTo(a, "http://www.stackoverflow.com/questions/ask",
        new String[]{"http", "www.stackoverflow.com", "questions", "ask"});
}
This unit test passed using Lucene 2.9.1. You may want to try it with your specific Lucene distribution. I guess it does what you want, while keeping domain names and file names unbroken. Did I mention that I like unit tests?
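If your Lucene version tokenizes these fields differently, the pre-processing mentioned in the question is a possible fallback: split the value on slashes, backslashes and colons yourself before indexing. The following is a minimal sketch of that idea in Ruby; the helper name path_segments is hypothetical, and this illustrates the splitting only, not anything Lucene does internally.

```ruby
# Hypothetical helper: split a file path or URI into lowercase segments
# on runs of backslashes, forward slashes and colons.
def path_segments(path_or_uri)
  path_or_uri.split(%r{[\\/:]+}).reject(&:empty?).map(&:downcase)
end

p path_segments('C:\home\user\research\whitepapers\analysis\detail.txt')
p path_segments('http://www.stackoverflow.com/questions/ask')
```

For the two examples from the question this yields the same segments as the unit test above, with the domain name and the file name kept whole.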

Get id of file upload control

I am trying to find the name or ID of the input element that corresponds to the
file that is being uploaded...
<input type="file" id="FUtxtval1" name="FUtxtval1"/>
Iterate over the input elements to find the first file input field:
function FindFirstFileFieldId()
{
    var inputFields = document.getElementsByTagName("input");
    for (var i = 0; i < inputFields.length; i++)
    {
        if (inputFields[i].type == "file")
            return inputFields[i].id;
    }
    // Falls through (returns undefined) when the page has no file input.
}
The ID of the element is simply "FUtxtval1" (whatever is in the ID tag)
--
For JavaScript you can access this by using
var element = document.getElementById('FUtxtval1');
So you could then do something like
element.disabled = true;
--
For jQuery (Also JavaScript) you would use
$('#FUtxtval1').whatever
--
For PHP you would use
$_POST['FUtxtval1']
Assuming this is part of a form
For PHP if you actually want the file you use the handle
$_FILES['FUtxtval1']['whateverwanted'];
See http://www.tizag.com/phpT/fileupload.php
If the problem is that there may be many input tags on the form, and you're interested in discovering which one is specifically used for uploading files, this bit of jQuery code would accomplish that:
var id = $('input[type=file]').attr('id');
If the problem is that you know the element's ID but do not know the name of the field, you can use:
var name = $('#FUtxtval1').attr('name');
If you're hoping to find out the filename of the file your visitor has chosen in that field through JavaScript, your options are limited. JavaScript never gets the full local path. In browsers that support the File API, the bare filename is available as files[0].name on the input element; otherwise you'll have to submit the form and let a server-side script determine the filename at that time.
If I understand correctly, you are trying to obtain the ID of the uploaded file using JavaScript? If so, you will have to process the uploaded file using PHP ($_FILES['FUtxtval1']) and then print the ID to a JavaScript variable.
Is that what you wanted?
If not, update your question to provide a bit more info about what you are trying to achieve.