ELK stack configuration - redis

I don't have much experience with the ELK stack; I basically only know the basics:
Something (e.g. Filebeat) gets data and sends it to Logstash.
Logstash processes it and sends it to Elasticsearch.
Kibana uses Elasticsearch to visualise the data.
(I hope that's correct.)
I need to create an ELK system where data from three different projects is passed, stored and visualised.
Project no. 1 uses MongoDB, and I need to get all the information from one collection into Kibana.
Project no. 2 also uses MongoDB, and I need to get all the information from one collection into Kibana.
Project no. 3 uses MySQL, and I need to get a few tables from that database into Kibana.
All three of these projects are on the same server.
The thing is, for projects 1 and 2 I need the data flow to be constant (i.e. if a user registers, I can see that in Kibana).
But for project no. 3 I only need the data when I generate a report (this project functions as a BI tool of sorts).
So my question is: how does one go about creating an ELK architecture that takes the inputs from these three sources and combines them into one ELK project?
My best guess is:
Project no. 1 -> Filebeat -> Logstash
Project no. 2 -> Filebeat -> Logstash
Project no. 3 -> Logstash
(Logstash here being a single instance that then feeds into Elasticsearch.)
Would this be a realistic approach?
I also stumbled upon Redis; from the looks of it, it can combine all the data sources into one and then feed the output to Logstash.
Which would be the better approach?
Finally, I mentioned Filebeat, but from what I understand it basically reads data from a log file. Would that mean I would have to re-write all my database entries into a log file in order to feed them into Logstash, or can Logstash tap into the DB without an intermediary?
I tried looking for all of this online, but for some reason the internet is a bit scarce on ELK stack beginner questions.
Thanks

Filebeat is used for shipping log files to Logstash; you can't use it for reading items from a DB. But you can read from a DB using Logstash's input plugins.
From what you're describing, you'll need a single Logstash instance with three pipelines (one per project).
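A minimal sketch of what that could look like in pipelines.yml (the pipeline IDs and config paths are just placeholders, assuming a package-style install under /etc/logstash):

    # /etc/logstash/pipelines.yml - one pipeline per project
    - pipeline.id: project1-mongodb
      path.config: "/etc/logstash/conf.d/project1.conf"
    - pipeline.id: project2-mongodb
      path.config: "/etc/logstash/conf.d/project2.conf"
    - pipeline.id: project3-mysql
      path.config: "/etc/logstash/conf.d/project3.conf"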
For project 3 you can use the Logstash JDBC input plugin to connect to your MySQL DB and read new/updated rows based on some "last_updated" column.
The JDBC input plugin has a schedule configuration option (cron syntax) that lets you run it periodically and read the updated rows with a SQL query that you define in the configuration.
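A minimal sketch of such an input for project 3 (connection string, credentials, table and column names are placeholders, and it assumes a MySQL JDBC driver jar is available on the host):

    input {
      jdbc {
        # Path to the MySQL JDBC driver jar (placeholder path)
        jdbc_driver_library => "/opt/jdbc/mysql-connector-java.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://localhost:3306/project3_db"
        jdbc_user => "logstash"
        jdbc_password => "secret"
        # Cron-style schedule: run the query every 5 minutes
        schedule => "*/5 * * * *"
        # Only fetch rows changed since the last run
        statement => "SELECT * FROM reports WHERE last_updated > :sql_last_value"
        use_column_value => true
        tracking_column => "last_updated"
        tracking_column_type => "timestamp"
      }
    }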
For projects 1 and 2 you can also use the JDBC input plugin with MongoDB (via a MongoDB JDBC driver).
There is also an open-source implementation of a MongoDB input plugin on GitHub. You can check this post for how to use it here.
(See the full list of input plugins here.)
If that works for you and you manage to set it up, then the rest will be about the same for all three configurations,
i.e. using filter plugins to modify the data and the Elasticsearch output plugin to push it to an Elasticsearch index.
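For example, the filter and output part of each pipeline could look roughly like this (the index name and added field are placeholders):

    filter {
      mutate {
        # Tag each event with the project it came from
        add_field => { "project" => "project1" }
      }
    }

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        # Daily index per project, e.g. project1-2024.01.31
        index => "project1-%{+YYYY.MM.dd}"
      }
    }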

Importing RedisTimeSeries data into Grafana

I've got a process storing RedisTimeSeries data in a Redis instance on Docker. I can access the data just fine with the RedisInsight CLI.
I can also add Redis as a data source to Grafana, and I've imported the dashboards.
But when I actually try to import the data into a Grafana dashboard, the query just sits there.
TS.RANGE with a value of - +, or with two timestamps, also produces nothing. (I do get results when entering it into the CLI, but not as a CLI query in Grafana.)
What could I be missing?
The command you should be using in the Grafana dashboard for retrieving and visualising time series stored in Redis with RedisTimeSeries is TS.RANGE for a specific key, or TS.MRANGE in combination with a filter that selects a set of time series matching that filter. The full list of RedisTimeSeries commands is here: https://oss.redislabs.com/redistimeseries/commands/ (you're using TS.INFO, which only retrieves the metadata of a time series key, not the actual samples within it).
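For example, in redis-cli (the key name and label are placeholders for whatever your process writes):

    # All samples of a single series; '-' and '+' mean earliest and latest timestamp
    TS.RANGE my:series:key - +

    # All samples of every series whose labels match the filter
    TS.MRANGE - + FILTER sensor=temperature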
So I looked into this a bit more. Moderators deleted my last answer because it didn't 'answer' the question.
There is a GitHub issue for this, and one of the developers responded: it is broken and has been for a while. Grafana doesn't seem to want to maintain this data source at the moment. IMHO they should remove the RedisTimeSeries support from their plugin library if it isn't fully baked.
Redis data source issue for TS.RANGE: https://github.com/RedisGrafana/grafana-redis-datasource/issues/254
Are you trying to display a graph (e.g., number of people vs. time)? If so, perhaps TS.INFO is not the right command and you should use something like TS.MRANGE.
Take a look at
https://redislabs.com/blog/how-to-use-the-new-redis-data-source-for-grafana-plug-in/
for some more examples.

Adding fields to CloudWatch without using JSON

So I have typical run-of-the-mill logs from Nginx and Tomcat servers, which are just single-line text files in a typical log format. I have changed the Tomcat access logs to output pipe-delimited fields so I can easily process them using some Unix scripts. I'd like to get rid of my Unix scripts and move to using CloudWatch to process my logs in a similar manner; however, I found out that CloudWatch really doesn't understand anything beyond timestamp, message, and log stream by default.
It will add fields using JSON, but JSON is verbose when it comes to log files. I'd like to just let it process a CSV file, which seems like an obvious alternative to JSON. I'm willing to change my log format to meet a requirement like that, but I can't find any information about how I could do that.
Is my only option to translate my logs into JSON in order to add fields to CloudWatch? I am aware of the parse command, but I find it cumbersome to reconstitute my fields every time I want to build a query, especially since these will mostly be access logs with numerous fields. I have the AWS CloudWatch Logs agent set up on my systems and I'm currently sending these logs to CloudWatch.
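For context, this is roughly the kind of Logs Insights query I'd have to repeat for every report if I stick with parse (the field layout here is just an example of a pipe-delimited access log):

    fields @timestamp, @message
    | parse @message "*|*|*|*|*" as client_ip, user, request, status, bytes
    | filter status = "404"
    | sort @timestamp desc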
The closest thing there is to handling space-delimited log files is to use metric filters. Or at least that's how the authors of CloudWatch designed it.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html
The best examples of this are here:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CountOccurrencesExample.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/ExtractBytesExample.html
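For reference, a filter pattern for a space-delimited access log looks roughly like this (the field names are placeholders for whatever your log format actually contains):

    [ip, identity, user, timestamp, request, status_code, bytes]

You can also add conditions on individual fields, for example to only match 4xx responses:

    [ip, identity, user, timestamp, request, status_code = 4*, bytes]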
Not sure if this is going to work for what I'm trying to do with logs, but it's a start. And it's the closest thing to a proper answer. If you want it done right, you gotta do it yo'self.

Elasticsearch access control based on field value

I am currently investigating the ELK (Elasticsearch, Logstash, Kibana) stack for centralized log file analysis.
The plan is to store logs of multiple applications in the same Elasticsearch cluster using Logstash and day-based indices.
All documents contain a field called application, e.g. "application": "superapp".
Now we are looking for a way to implement access control like this:
A) Superuser: is able to see log entries of all applications.
B) Developer: can only see log entries of the applications they are allowed to see. For example, the dev team for application "superapp" should only be able to see the entries for this application.
To wrap it up: we need access control based on the value in the field application.
While reading the documentation for Elasticsearch and Shield, I could not find an obvious way to do it.
Any ideas how we could realize this in a way that would also work with Kibana 3 and 4?
My first idea was to use aliases which are being automatically assigned to documents using index templates. I am wondering if this is the right direction.
I asked this question here on the Elasticsearch Google Group and got this reply:
"You can separate out the different types of logs into their own indices which would make things much easier, you could also setup an alias with a filter and then provide access to that alias to certain users.
Currently KB isn't multi-tenanted but it is a feature that is going to be added, you'd have to setup multiple instances with each going to their own alias."
To sum it up: multi-tenancy needs to be addressed at both the frontend (Kibana) and the backend (Elasticsearch).
Frontend: Use Proxies for Kibana
https://github.com/salyh/elastic-defender
https://github.com/fangli/kibana-authentication-proxy
Backend: Several approaches using filtered aliases and alias templates (a minimal filtered-alias sketch follows the links below):
Limiting Indexes and Operations
Faking Index per User with Aliases
http://engineering.aweber.com/using-elasticsearchs-aliases/
http://opennomad.com/content/controlling-access-elasticsearch-filtered-aliases-nginx-and-tokens
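As a rough sketch, a filtered alias for the "superapp" team could be created via the alias API like this (the index pattern and alias name are placeholders):

    POST /_aliases
    {
      "actions": [
        {
          "add": {
            "index": "logstash-*",
            "alias": "superapp-logs",
            "filter": { "term": { "application": "superapp" } }
          }
        }
      ]
    }

Pointing a per-team Kibana instance (or proxy) at that alias instead of the underlying indices restricts that team to documents where application is "superapp".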

Merge two Endeca Servers (Endeca 3.1) into one, including their current data

Let me explain in more detail:
First: I'm running Endeca 3.1, so "Endeca Server" here refers to 3.0's Data Domain.
I'm required to use an Endeca Server currently present on Endeca (downloaded as a demo VM). All the info on it, including groups, attributes and data, must be merged into our Endeca Server. (It could also be the other way around; I could merge my Endeca Server into this one.)
So far, I've tried to do the following:
1) Clone the Endeca Server.
2) Use the putCollection sconfig operation to create a collection on it with the same name I have on mine.
3) Load configurations using the LoadCollection & LoadAttributes graphs from the OEID POC Template 3.1. I point to the new collection in the Configuration.xls file.
This is where I encounter an issue: the LoadAttributes graph gets a T/O (timeout) message from the server's WS, and then the config WSDL becomes inaccessible for a while. I can't get beyond this point.
I've been able to load data into the collection, but I need to load the attributes first.
Thanks in advance for your replies.
Regards
There are a few techniques.
Have you tried exporting the data domain and then importing it?
You can use the endeca-cmd tools to export to a file, and then import from that file. This would enable you to add two data stores into one server.
If you want to combine two data stores, then that is a different question.
The simplest approach in 3.1, if the data collections are small, is to extract them as CSV (via a data-table), convert to XLS, and add them via self-provisioning into separate collections within a single data store. If you are running in the VM, this is potentially the easiest approach.
This can also be done using Integrator.
You don't need to load the attributes unless you are using multi-value types. You can call against the conversation web service to extract the data and then load it using bulk-load. I would not worry too much about creating the attributes unless this becomes essential due to their type or complexity. If you cannot call against the conversation web service, then again extract as CSV and load using Integrator.

WSO2 Gadget Gen Tool

I have an external Hadoop cluster (CDH4) with Hive. I used the Gadget Gen tool (BAM 2.3.0) to create a simple table gadget, but no data is populated when I add the gadget to a dashboard using the URL supplied by the Gadget Gen tool.
Here are my data source settings from the Gadget Generator Wizard:
jdbc:hive://x.x.x.x:10000/default
org.apache.hadoop.hive.jdbc.HiveDriver
I added the following jar files to make sure I had everything required for the JDBC connection and restarted wso2server:
hive-exec-0.10.0-cdh4.2.0.jar hive-jdbc-0.10.0-cdh4.2.0.jar
hive-metastore-0.10.0-cdh4.2.0.jar hive-service-0.10.0-cdh4.2.0.jar
libfb303-0.9.0.jar commons-logging-1.0.4.jar slf4j-api-1.6.4.jar
slf4j-log4j12-1.6.1.jar hadoop-core-2.0.0-mr1-cdh4.2.0.jar
I see MapReduce jobs running on my cluster during steps 2 and 3 of the wizard (and the wizard shows me previews of the actual data), but I don't see any jobs submitted after the gadget is generated.
Any help appreciated.
The Gadget Gen tool is for RDBMS databases such as MySQL, H2, etc.; you can't provide a Hive URL to the Gadget Gen tool and run it.
Generally in WSO2 BAM, Hive is used to summarize the collected data that was stored in Cassandra and to write the summarized final result to an RDBMS database. Then, from the Gadget Gen tool, the gadget XMLs are created by pointing to the RDBMS database where the final result is stored.
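In other words, the data source settings in the wizard should point at that results RDBMS rather than at Hive, e.g. something along these lines (host, port and database name are placeholders for wherever your summarized results actually live):

    jdbc:mysql://x.x.x.x:3306/bam_summary_db
    com.mysql.jdbc.Driver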
You can find more information in the WSO2 BAM 2.3.0 documentation: http://docs.wso2.org/wiki/display/BAM230/Gadget+Generation+Tool
Make sure the URL generated for the location of the gadget XML has the correct IP/hostname, and check whether the given gadget XML is actually located in the registry location of the generated URL. You do not have to worry about the Hive / Hadoop / Cassandra side, as it is not relevant to the gadget itself; only the RDBMS (H2 by default) data matters. Hopefully your problem will be resolved once the gadget location is corrected.