I want to quickly and efficiently analyze my Apache log files.
Is there software that would read in Apache log files and visually display, via a menu and without my writing a parser, statistics such as distinct IPs, request types, ...?
Since you have the (odd) requirement of not writing a parser, you'll need to output your logs in a descriptive format (e.g. JSON). So, update your Apache config to write JSON, then use a shipper like Filebeat to send the logs to a store like Elasticsearch, where you can visualize them with a tool like Kibana.
The parser (Logstash, in ELK's case) will allow you to add more value to your log data, so I wouldn't dismiss it so quickly.
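One way to do the Apache side is a custom LogFormat that emits one JSON object per request. This is only a sketch: the field names, the "json" nickname, and the log path are my own choices, and header values containing unusual characters may need extra care to stay valid JSON.

```
# Sketch: emit one JSON object per request via mod_log_config.
# Field names, the "json" nickname, and the file path are example choices.
LogFormat "{ \"time\":\"%{%Y-%m-%dT%H:%M:%S%z}t\", \"client\":\"%a\", \"method\":\"%m\", \"uri\":\"%U%q\", \"status\":%>s, \"bytes\":%B, \"referer\":\"%{Referer}i\", \"agent\":\"%{User-Agent}i\" }" json
CustomLog /var/log/apache2/access_json.log json
```

Filebeat would then tail access_json.log with its JSON decoding enabled and ship each event to Elasticsearch.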
I don't have much experience with the ELK stack; I basically only know the basics.
Something (e.g. Filebeat) gets data and sends it to Logstash.
Logstash processes it and sends it to Elasticsearch.
Kibana uses Elasticsearch to visualise the data.
(I hope that's correct.)
I need to create an ELK system where data from three different projects is passed, stored and visualised.
Project no. 1 uses MongoDB, and I need to get all the information from one table into Kibana.
Project no. 2 also uses MongoDB, and I need to get all the information from one table into Kibana.
Project no. 3 uses MySQL, and I need to get a few tables from that database into Kibana.
All three of these projects are on the same server.
The thing is, for projects 1 and 2 I need the data flow to be constant (i.e. if a user registers, I can see that in Kibana).
But for project no. 3 I only need the data when I need to generate a report (this project functions as a BI tool of sorts).
So my question is: how does one go about creating an ELK architecture that takes the input from these three sources and combines them into one ELK project?
My best guess is:
Project No1 -> filebeat -> logstash
Project No2 -> filebeat -> logstash
Project No3 -> logstash
(logstash here being a single instance that then feeds into elastic)
Would this be a realistic approach?
I also stumbled upon Redis, and from the looks of it, it can combine all the data sources into one stream and then feed the output to Logstash.
What would be the better approach?
Finally, I mentioned Filebeat, but from what I understand it basically reads data from a log file. Would that mean I would have to re-write all my database entries into a log file in order to feed them into Logstash, or can Logstash tap into the DB without an intermediary?
I tried looking for all of this online, but for some reason the internet is a bit scarce on ELK stack beginner questions.
Thanks
Filebeat is used for shipping logs to Logstash; you can't use it to read items from a DB. But you can read from a DB using Logstash's input plugins.
From what you're describing, you'll need one Logstash instance with three pipelines (one per project).
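The split into pipelines can be declared in Logstash's pipelines.yml, roughly like this (the pipeline IDs and config file paths are placeholders):

```
# pipelines.yml - one pipeline per project (IDs and paths are placeholders)
- pipeline.id: project1-mongo
  path.config: "/etc/logstash/conf.d/project1.conf"
- pipeline.id: project2-mongo
  path.config: "/etc/logstash/conf.d/project2.conf"
- pipeline.id: project3-mysql
  path.config: "/etc/logstash/conf.d/project3.conf"
```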
For project 3 you can use the Logstash JDBC input plugin to connect to your MySQL DB and read new/updated rows based on some "last_updated" column.
The JDBC input plugin has a cron-style schedule option that allows you to set it up to run periodically and read updated rows with an SQL query that you define in the configuration.
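A project 3 input could look roughly like the sketch below; the driver path, connection string, credentials, table and column names are assumptions chosen only to illustrate the shape of the config.

```
input {
  jdbc {
    # Driver jar and connection details are placeholders.
    jdbc_driver_library     => "/opt/jdbc/mysql-connector-j.jar"
    jdbc_driver_class       => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string  => "jdbc:mysql://localhost:3306/project3"
    jdbc_user               => "logstash"
    jdbc_password           => "secret"
    # Run every 5 minutes and only pick up rows changed since the last run.
    schedule  => "*/5 * * * *"
    statement => "SELECT * FROM reports WHERE last_updated > :sql_last_value"
    use_column_value     => true
    tracking_column      => "last_updated"
    tracking_column_type => "timestamp"
  }
}
```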
For projects 1-2 you can also use the JDBC input plugin with MongoDB.
There is also an open-source implementation of a MongoDB input plugin on GitHub. You can check this post for how to use it here.
(see the full list of input plugins here)
If that works for you and you manage to set it up, then the rest will be about the same for all three configurations,
i.e. using filter plugins to modify the data and the Elasticsearch output plugin to push it to an Elasticsearch index.
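The tail end of each pipeline would then look something like this (the removed field and the index name are placeholders):

```
filter {
  # Example only: drop a hypothetical column you don't want indexed.
  mutate { remove_field => ["password_hash"] }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "project3-%{+YYYY.MM.dd}"
  }
}
```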
So I have typical, run-of-the-mill logs from Nginx and Tomcat servers, which are just single-line text files in a typical log format. I have changed the Tomcat access logs to output pipe-delimited fields so I can easily process them using some Unix scripts. I'd like to get rid of my Unix scripts and move to using CloudWatch to process my logs in a similar manner; however, I found out that CloudWatch really doesn't understand anything beyond timestamp, message, and logstream by default.
It will add fields using JSON, but JSON is verbose when it comes to log files. I'd like to just let it process a CSV file, which seems like an obvious alternative to JSON. I'm willing to change my log format to meet a requirement like that, but I can't find any information about how I could do that.
Is my only option to translate my logs into JSON in order to add fields to CloudWatch? I am aware of the parse command, but I find it cumbersome to reconstitute my fields every time I want to build a query, especially since these will mostly be access logs with numerous fields. I have the AWS CloudWatch Logs agent set up on my systems and I'm currently sending these logs to CloudWatch.
The closest thing there is to handling space-delimited log files is to use Metric Filters. Or at least that's how the authors of CloudWatch designed it.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html
The best examples of this are here:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CountOccurrencesExample.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/ExtractBytesExample.html
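For example, a space-delimited metric filter for an access-log style line could be created roughly like this; the log group, filter and metric names, and the field list are assumptions about your format, and the pattern assumes space-separated fields rather than your pipe-delimited ones.

```
aws logs put-metric-filter \
  --log-group-name "tomcat-access" \
  --filter-name "ServerErrors" \
  --filter-pattern '[ip, identity, user, timestamp, request, status = 5*, bytes]' \
  --metric-transformations metricName=ServerErrorCount,metricNamespace=AccessLogs,metricValue=1
```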
Not sure if this is going to work for what I'm trying to do with logs, but it's a start. And it's the closest thing to a proper answer. If you want it done right, you gotta do it yo'self.
Using Apache 2.4, we had logging set to LogLevel info and got lots of lines. Among them were AH00128: File does not exist lines. Since then we reset to LogLevel warn and have minimal entries.
However, I've been approached by people wanting to see the AH00128 lines, to keep abreast of hacking attempts.
Is there a way to configure Apache to always log certain errors, even when LogLevel is set to minimal reporting?
Thanks,
Jerome.
It's not currently possible to filter or promote messages based on ID or text, and it would require some non-trivial work to add that.
The best you could do is use a more liberal LogLevel, at least for the core module (LogLevel warn core:info), and then create a small filtering program to discard core:info entries without the message ID you care about.
Piped loggers can be as simple as a shell or Perl script that reads log entries from stdin and does whatever you want with them.
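A rough sketch of that approach follows; the script path, output path, and the exact matching are assumptions, so adjust them to your ErrorLogFormat.

```
# httpd.conf: raise only core to info and pipe the error log through a filter.
LogLevel warn core:info
ErrorLog "|/usr/local/bin/error-filter.sh"
```

```
#!/bin/sh
# error-filter.sh - keep warn-and-above lines plus AH00128 hits,
# drop the rest of the core:info chatter. Paths are placeholders.
OUT=/var/log/apache2/error.log
while IFS= read -r line; do
  case "$line" in
    *core:info*AH00128*) printf '%s\n' "$line" >> "$OUT" ;;  # the lines you care about
    *core:info*)         ;;                                  # discard other info noise
    *)                   printf '%s\n' "$line" >> "$OUT" ;;  # pass everything else through
  esac
done
```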
I have generated the webgraph DB in Apache Nutch using the command 'bin/nutch webgraph -segmentDir crawl/segments -webgraphdb crawl/webgraphdb'. It generated three folders in crawl/webgraphdb: inlinks, outlinks and nodes. Each of those folders contains two binary files, data and index. How do I get a visual web graph in Apache Nutch? What is the use of the web graph?
The webgraph is intended to be a step in the score calculation based on the link structure; the corresponding commands are sketched after this list:
webgraph will generate the data structure for the specified segment/s
linkrank will calculate the score based on the previous structures
scoreupdater will update the score from the webgraph back into the crawldb
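As far as I remember, the corresponding commands look roughly like this (paths reuse the ones from your question; double-check the options against the usage output of bin/nutch):

```
bin/nutch webgraph -segmentDir crawl/segments -webgraphdb crawl/webgraphdb
bin/nutch linkrank -webgraphdb crawl/webgraphdb
bin/nutch scoreupdater -crawldb crawl/crawldb -webgraphdb crawl/webgraphdb
```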
Be aware that this program is very CPU/IO intensive and that it will ignore the internal links of a website by default.
You could use the nodedumper command to get useful data out of the webgraph data, including the actual score of a node and the highest scored inlinks/outlinks. But this is not intended to be visualized, although you could parse the output of this command and generate any visualization that you may need.
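For instance, dumping the top-scored nodes might look like the line below; the option names are as I remember them from Nutch 1.x, so verify them with nodedumper's usage message.

```
bin/nutch nodedumper -webgraphdb crawl/webgraphdb -scores -topn 25 -output crawl/webgraphdb/dump-scores
```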
That being said, since Nutch 1.11 the plugin index-links has been added, which allows you to index the inlinks and outlinks of each URL into Solr/ES. I've used this plugin to index into Solr and, along with the sigmajs library, generated some graph visualizations of the link structure of my crawls; perhaps this could suit your needs.
I have a server with 10+ virtual domains (most running MediaWiki). I'd like to be able to watch their traffic remotely with something nicer than tail -f. I could cobble something together, but was wondering if something super-deluxe already exists that involves a minimum of hacking and support. This is mostly to understand what's going on, not so much for security (though it could serve that role too). It must:
be able to deal with vhost log files
be able to handle updates every 10 seconds or so
Be free/open source
The nice to haves are:
Browser based display (supported by a web app/daemon on the server)
Support filters (bots, etc)
Features like counters for pages, with click to view history
Show a nice graphical display of a geographic map, timeline, etc
Identify individual browsers
Show link relationships (coming from remote site, to page, to another page)
Be able to identify logfile patterns (editing or creating a page)
I run Debian on the server.
Thanks!
Take a look at Splunk.
I'm not sure if it supports real time (~10 second) updates but there are a ton of features and it's pretty easy to get set up.
The free version has some limitations but there is also an enterprise version.
Logstash is the current answer. (=
Depending on the volume, Papertrail could be free for you. It is the closest thing to a tail -f and is searchable, archivable and also sends alerts based on custom criteria.