Importing RedisTimeSeries data into Grafana - redis

I've got a process storing RedisTimeSeries data in a Redis instance on Docker. I can access the data just fine with the RedisInsight CLI:
I can also add Redis as a data source to Grafana:
I've imported the dashboards:
But when I actually try to import the data into a Grafana dashboard, the query just sits there:
TS.RANGE with a value of - +, or two timestamps, also produces nothing: (I do get results when entering it into the CLI, but not as a CLI query in Grafana.
What could I be missing?

The command you should be using in the Grafana dashboard for retrieving and visualising the data in time series stored in Redis with RedisTimeSeries is TS.RANGE for a specific key, or TS.MRANGE in combination with a filter that selects a set of time series matching this filter. List of commands with RedisTimeSeries: https://oss.redislabs.com/redistimeseries/commands/ (you're using TS.INFO which does only retrieve metadata of time series key, not the actual samples within)

So I looked into this a bit more. Moderators deleted my last answer because it didn't 'answer' the question.
There is a github issue for this. One of the developers also responded. It is broken and has been for awhile. Grafana doesn't seem to want to maintain this datasource at the moment. IMHO they should remove the redis timeseries support from their plugin library if it isn't fully baked.
[redis datasource issue for TS.RANGE]
[1]: https://github.com/RedisGrafana/grafana-redis-datasource/issues/254

Are you trying to display a graph (eg, number of people vs time)? If so, perhaps that TS.INFO is not the right command and you should use something like TS.MRANGE.
Take a look at
https://redislabs.com/blog/how-to-use-the-new-redis-data-source-for-grafana-plug-in/
for some more examples.

Related

ELK stack configuration

I dont have much experience with elk stack I basically only know the basics.
Something i.e. filebeat gets data and sends it to logstash
Logstash processes it and sends it Elastic search
Kibana uses elastic search to visualise data
(I hope that thats correct)
I need to create an elk system where data from three different projects is passed, stored and visualised.
Project no1. Uses MongoDB and I need to get all the information from 1 table into kibana
Project no2. Also uses MongoDB and I need to get all the information from 1 table into kibana
Project no3. Uses mysql and I need to get a few tables from that database into kibana
All three of these projects are on the same server
The thing is for Projects 1 and 2 I need the data flow to be constant (i.e. if a user registers I can see that in kabana)
But for Project no3. I only need the data when I need to generate a report (this project functions as a BI of sorts)
So my question is how does one go about creating an elk architecture that gets the inputs from these 3 sources and is able to combine into one elk project.
My best guess is :
Project No1 -> filebeat -> logstash
Project No2 -> filebeat -> logstash
Project No3 -> logstash
(logstash here being a single instance that then feeds into elastic)
Would this be a realistic approach?
I also stumbled upon redis, and from the looks of it it looks like it can combine all the data sources into one and then feed the output to logstash.
What would be the better approach?
Finally, I mentioned filebeat, but from what I understand it basically reads the data from a log file. Would that mean that I would have to re-write all my database entries into a log file in order to feed them into logstash or can logstash tap into the DB without an intermediary.
I tried looking for all of this online, but for some reason the internet is a bit scarce on ELK stack beginner questions.
Thanks
filebeat is used for shipping logs to logstash, you can't use it for reading items from DB. But you can read from DB using logstash's input plugins.
From what you're describing you'll need a logstash instance with 3 pipelines (one per project)
For project 3 you can use Logstash JDBC input plugin to connect to your mysql DB and read new/updated lines based on some "last_updated" column.
JDBC input plugin has a cron confguration value, that allows you to set it up to run periodically and read updated lines with an SQL query that you define in configuration.
For projects 1-2 you can also use the JDBC input plugin with mongoDB.
There is also an Open Source implementation for a mongoDB input plugin on git. You can check this post for how to use it here.
(see the full list of input plugins here)
If that works for you and you manage to set it up, then the rest will be about the same for all three configurations.
i.e. using filter plugins to modify data, and Elasticsearch output plugin to push data to an elastic index.

Dask dataframe read parquet format fails from http

I have been dealing with this problem for a week.
I use the command
from dask import dataframe as ddf
ddf.read_parquet("http://IP:port/webhdfs/v1/user/...")
I got invalid parquet magic.
However ddf.read_parquet is Ok with "webhdfs://"
I would like the ddf.read_parquet works for http because I want to use it in dask-ssh cluster for workers without hdfs access.
Although the comments already partly answer this question, I thought I would add some information as an answer
HTTP(S) is supported by dask (actually fsspec) as a backend filesystem; but to get partitioning within a file, you need to get the size of that file, and to resolve globs, you need to be able to get a list of links, neither of which are necessarily provided by any given server
webHDFS (or indeed httpFS) don't work like HTTP downloads, you need to use a specific API to open a file and fetch a final URL on a cluster member to that file; so the two methods are not interchangeable
webHDFS is normally intended for use outside of the hadoop cluster; within the cluster, you would probably use plain HDFS ("hdfs://"). However, kerberos-secured webHDFS can be tricky to work with, depending on how the security was set up.

Backfill Google Analytics in BigQuery

I'm looking for a workaround on the following issue. Hope someone can help.
I'm unable to backfill data in the ga_sessions_ table in BigQuery through product linking in GA. e.g. partition ga_sessions_20180517 is missing
This specific view has already been linked before. Google documentation says that historical load is only done once per view (hence, the issue) (https://support.google.com/analytics/answer/3416092?hl=en)
Is there any way to work around it?
Kind regards,
Martijn
You can use Google Analytics Reporting API to get the data for that view. This method has lot of restrictions like sometimes the data is sampled/only 7 dimensions can be exported in one call, but at least you will be able to fetch your data in a partitioned manner.
Documentation hereDoc
If you need a lot of dimensions/metrics in hit level format, scitylana.com has a service that can provide this data historically.
If you have a clientId set in a custom dimension the data-quality is near perfect.
It also works without a clientId set.
You can get all history as available through the API.
You can get 100+ dimensions/metrics in one batch into BQ.

Merge two Endeca Servers (Endeca 3.1) into one. Including their current data

Let me explain in more detail:
1st: I'm running endeca 3.1, so Endeca Server here refers to 3.0's Data Domain.
I'm required to use an Endeca Server currently present on Endeca (Downloaded a Demo VM). All the info on it, including, groups, attributes and data, must be merged into out Endeca Server. (It can also be the other way around, i could merge my Endeca Server into this one.)
So far, i've tried to do the following:
1) Clone the Endeca Server
2) use the putCollection sconfig operation to create a collection on it with the same name i have on mine.
3) Load configurations using the LoadCollection & LoadAttributes graphs from OEID POC Template 3.1. I point to the new collection on the Configuration.xls file.
This is where i encounter an issue. The LoadAttributes graph gets a T/O message from the server's WS. Then the config WSDL becomes inaccesible for a while. I can't go beyond this point.
I've been able to load data into the collection, but i need to load the attributes first.
THanks in advance for your replies.
Regards
There are a few techniques.
Have you tried exporting the data domain and then importing it?
You can use the endeca-cmd tools to export to a file, and then import from that file. This would enable you to add 2 datastores into one server.
If you want to combine 2 datastores then that is a different question.
The simplest approach in 3.1 if the data collections are small. Extract then as CSV (via a data-table), convert to XLS and add them via self provisioning into separate collections within a single data store. If you are running in the VM this is potentially the easiest approach.
This can also be done using Integrator.
You don't need to load the attributes unless you are using multi-value types. You can call against the conversation web-service to extract data and then load it using 'bulk-load' I would not worry too much about creating the attributes unless this becomes essential due to their type or complexity. If you cannot call against the conversation web-service, then again extract as csv and load using Integrator.

Deleting rows in datastore by time range

I have a CKAN datastore with a column named "recvTime" of type timestamp (i.e. using "timestamp" as type at datastore_create time, as shown in this link). Example value for this column is "2014-06-12T16:08:39.542000".
I have a large numbers of records in the datastore (thousands) and I would like to delete the rows before a given date in "recvTime". My first thought was doing it using the REST API with the datastore_delete operation using a range filter, but it is not possible as described in the following Q&A.
Is there any other way of solving the issue, please?
Given that I have access to the host where CKAN server is running, I wonder if this could be achieved executing a regular SQL sentence on the Postgresql engine where the datastore is persisted. However, I haven't found information about manipulating the CKAN underlying datamodel in the CKAN documentation, so don't know if this a good idea or if it is risky...
Any workaround or information pointer is highly welcome. Thanks!
You could definitely do this directly on the underlying database if you were willing to dig in there (the structure is pretty simple with tables named after the corresponding resource id). You could even turn this into an API of your own using an extension (though you'd want to be careful about permissions).
You might also be interested in the new support (master only atm) for extending the DataStore API via a plugin in an extension - see https://github.com/ckan/ckan/pull/1725