How to see actual HIVE queries that are running behind YARN applications in Ambari

I have some Hive jobs executing in YARN, so if I list applications in YARN, I see the Hive applications and their corresponding Templeton applications. In Ambari, how do I see the actual Hive query that each of those running Hive applications is executing? Does Ambari provide any option for this?

Not sure if you still need this answered, but if you are running on the Tez engine, you can find the actual Hive query by taking the following steps:
1) From the Ambari home page, hover over the 3x3 grid icon in the top right corner and select "Tez View".
2) Next, you can either search by application ID or by the Hive query itself to find your application.
3) Select your application; the entire Hive query should be displayed there, as well as a nice progress and status indicator.
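If you want to correlate the running applications from the command line first, here is a minimal sketch that lists them via the YARN ResourceManager REST API (the ResourceManager address and the use of the requests library are assumptions; the full query text itself is still best read from Tez View as described above):
import requests

# Minimal sketch: list running YARN applications via the ResourceManager REST API.
# "rm-host:8088" is a placeholder; point it at your cluster's ResourceManager web address.
RM_URL = "http://rm-host:8088"

resp = requests.get(f"{RM_URL}/ws/v1/cluster/apps", params={"states": "RUNNING"})
resp.raise_for_status()

apps = (resp.json().get("apps") or {}).get("app", [])
for app in apps:
    # For Hive-on-Tez queries the applicationType is TEZ; the name identifies the
    # Hive/Tez session, while the full query text is shown in Tez View.
    print(app["id"], app["applicationType"], app["name"])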

Related

Google Cloud, BigQuery assets table, os_inventory.update_time

We have Inventory set up for GCP Compute Engine VMs and run export commands (daily) to create project-level asset tables (one per project) in BigQuery under the my-monitoring project.
We can then query the VMs, for example for installed packages. Here is an example query to check whether the package "xxx" exists on the VMs deployed in test-project:
SELECT pack.value.installed_package.apt_package, os_inventory.update_time
FROM `my-monitoring.InventoryLogs.test-project_compute_googleapis_com_Instance`,
  UNNEST(os_inventory.items) AS pack
WHERE pack.value.installed_package.apt_package.package_name = 'xxx'
The problem is that it always shows the package xxx as existing if it was ever installed. If I remove the package later on, this query still shows it as present. The output looks something like this:
As I understand it, this is showing old data. I looked at some other VMs, and the value of os_inventory.update_time is very old. Does anyone know what this value represents and when it refreshes? I was expecting it to be updated every time we run the export assets command. Any ideas on how to query the inventory table for the latest package values?
Or, for that matter, any other solution for querying whether a particular package exists on all VMs across all projects?
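For reference, here is a minimal sketch of running the same query from Python with the package name parameterised, assuming the google-cloud-bigquery client library and the table name above (this does not by itself address the staleness question, which depends on how the export populates the table):
from google.cloud import bigquery

# Minimal sketch: run the inventory query with a parameterised package name.
# Assumes application default credentials and the dataset/table from the question.
client = bigquery.Client(project="my-monitoring")

sql = """
SELECT pack.value.installed_package.apt_package, os_inventory.update_time
FROM `my-monitoring.InventoryLogs.test-project_compute_googleapis_com_Instance`,
  UNNEST(os_inventory.items) AS pack
WHERE pack.value.installed_package.apt_package.package_name = @package_name
"""

job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("package_name", "STRING", "xxx")]
)

for row in client.query(sql, job_config=job_config):
    print(row)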

BigQuery Error loading location is interrupting scheduled queries

A few days ago, I started receiving an error in my Scheduled Queries dashboard: "Error loading location europe-west8: BigQuery Data Transfer Service does not yet support location: europe-west8."
I'm in the US, all 4 of my storage buckets are set to US or a US region, and I have confirmed their locations.
Datasets are all in the US, and my scheduled queries are all in region "us".
Since this error started, my BigQuery Scheduled Queries that append data to tables have stopped running.
Where can I change the setting that seems to be referencing europe-west8?
You need to check the region of the dataset you are using. The destination table for your scheduled query must be in the same region as the data being queried.
You can see the locations where scheduled queries are supported here.
You specify a location for storing your BigQuery data when you create a dataset. After you create the dataset, the location cannot be changed, but you can copy the dataset to a different location, or manually move (recreate) the dataset in a different location.
You can see more information about how locations work in BigQuery here.
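If you want to double-check the dataset locations programmatically rather than in the console, here is a minimal sketch using the google-cloud-bigquery client (the project ID is a placeholder):
from google.cloud import bigquery

# Minimal sketch: print the location of every dataset in a project.
# "your-project" is a placeholder; assumes application default credentials.
client = bigquery.Client(project="your-project")

for item in client.list_datasets():
    dataset = client.get_dataset(item.reference)  # full metadata, including location
    print(dataset.dataset_id, dataset.location)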
EDIT
This is a known issue with the BigQuery UI; the engineering team is aware of it and is working towards a solution, although so far there isn't a specific ETA. Feel free to star the issue to raise further awareness of it.
There are two possible workarounds you can try to circumvent this. More specifically:
Workaround #1: Using the old UI, you can do it by clicking on "Disable editor tabs".
Workaround #2: In the Scheduled Query Editor, click the SCHEDULE dropdown and choose "Enable scheduled queries". An overlay shows up with the "Enable scheduled queries" message box. Click anywhere on the screen to close the overlay, then click the SCHEDULE dropdown again; the create/update options are now there.
If you are running scheduled queries, check that the processing location is set to the location of your data source and that the destination table is also correct.
Check the docs about setting a query location:
https://cloud.google.com/bigquery/docs/scheduling-queries
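To see programmatically which location your scheduled queries actually live in, here is a minimal sketch using the BigQuery Data Transfer client (the project ID is a placeholder; scheduled queries show up with the data source ID "scheduled_query"):
from google.cloud import bigquery_datatransfer

# Minimal sketch: list scheduled queries (transfer configs) in a given location.
# "your-project" is a placeholder; repeat for any other locations you use.
client = bigquery_datatransfer.DataTransferServiceClient()
parent = "projects/your-project/locations/us"

for config in client.list_transfer_configs(parent=parent):
    if config.data_source_id == "scheduled_query":
        print(config.display_name, config.destination_dataset_id, config.state)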

Importing RedisTimeSeries data into Grafana

I've got a process storing RedisTimeSeries data in a Redis instance on Docker. I can access the data just fine with the RedisInsight CLI:
I can also add Redis as a data source to Grafana:
I've imported the dashboards:
But when I actually try to import the data into a Grafana dashboard, the query just sits there:
TS.RANGE with a range of - +, or with two explicit timestamps, also produces nothing. (I do get results when entering it into the CLI, but not as a CLI query in Grafana.)
What could I be missing?
The command you should be using in the Grafana dashboard for retrieving and visualising time series data stored in Redis with RedisTimeSeries is TS.RANGE for a specific key, or TS.MRANGE combined with a filter that selects the set of time series matching that filter. The list of RedisTimeSeries commands is here: https://oss.redislabs.com/redistimeseries/commands/ (you're using TS.INFO, which only retrieves the metadata of a time series key, not the actual samples within it).
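If you want to sanity-check the data outside Grafana, here is a minimal sketch using redis-py's RedisTimeSeries support (the key name "sensor:1" and the label filter "sensor_id=1" are assumptions; adjust the host and port to your Docker setup):
import redis

# Minimal sketch: read RedisTimeSeries samples with redis-py 4.x or later.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Equivalent of TS.RANGE sensor:1 - + in the CLI: all samples for one key.
samples = r.ts().range("sensor:1", "-", "+")
print(samples)  # list of (timestamp, value) pairs

# Equivalent of TS.MRANGE - + FILTER sensor_id=1: all series matching the filter.
series = r.ts().mrange("-", "+", filters=["sensor_id=1"])
print(series)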
So I looked into this a bit more. Moderators deleted my last answer because it didn't 'answer' the question.
There is a GitHub issue for this, and one of the developers also responded. It is broken and has been for a while. Grafana doesn't seem to want to maintain this data source at the moment. IMHO they should remove the RedisTimeSeries support from their plugin library if it isn't fully baked.
Redis data source issue for TS.RANGE: https://github.com/RedisGrafana/grafana-redis-datasource/issues/254
Are you trying to display a graph (e.g., number of people vs. time)? If so, perhaps TS.INFO is not the right command and you should use something like TS.MRANGE.
Take a look at https://redislabs.com/blog/how-to-use-the-new-redis-data-source-for-grafana-plug-in/ for some more examples.

"Dataset not found in location" error message when running scheduled query

I have a dataset located in europe-west3, and I'm trying to set up scheduled queries on that dataset. However, when setting up the scheduled query, the "processing location" dropdown doesn't contain europe-west3 as an option. Leaving it as "default" makes the processing location US, and then the query is unable to run. There are only about 7 processing locations available; I tried both EU and europe-west2, but neither works.
I don't really know what to do to get my queries to run on a schedule. I can run the queries just fine normally, but when I try to schedule them, the processing location selector simply won't let me pick the correct location.
Any ideas?
Currently, scheduled queries do not support the region europe-west3. Follow (star) this public issue tracker to stay updated.
Right now, if you need to implement scheduled queries, you should create a replica of that dataset in a supported region and run them there. I would suggest creating a copy of that dataset in another region; however, the dataset copy feature is also not available for europe-west3 right now.
I hope you can achieve what you desire without many headaches.
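If you go the replica route, here is a minimal sketch of the first step, creating the dataset in a supported location with the Python client (the project ID, dataset name, and target location are placeholders); you would then copy or reload the data into it and point the scheduled query at it:
from google.cloud import bigquery

# Minimal sketch: create a replica dataset in a location that scheduled queries support.
# "your-project", "my_dataset_eu", and the "EU" location are placeholders.
client = bigquery.Client(project="your-project")

dataset = bigquery.Dataset("your-project.my_dataset_eu")
dataset.location = "EU"

dataset = client.create_dataset(dataset, exists_ok=True)
print(dataset.dataset_id, dataset.location)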

Ubuntu + PBS + Apache? How can I show a list of running jobs as a website?

Is there a plugin/package to display status information for a PBS queue? I am currently running an Apache web server on the login node of my PBS cluster. I would like to display status info and have the ability to perform minimal queries without writing it from scratch (or modifying an age-old Python script, a la jobmonarch). Note: the accepted/bountied solution must work with Ubuntu.
Update: In addition to Ganglia as noted below, I also looked at the Rocks Cluster Toolkit, but I firmly want to stay with Ubuntu. So I've updated the question to reflect that.
Update 2: I've also looked at PBSWeb as well as MyPBS; neither one appears to suit my needs. The first is too out of date with the current system, and the second is more focused on cost estimation and project budgeting. They're both nice, but I'm more interested in resource availability, job completion, and general status updates. So I'm probably just going to write my own from scratch, starting Aug 15th.
Have you tried Ganglia?
I have no personal experience with it, but a few sysadmins I know are using it.
The following pages may help:
http://taos.groups.wuyasea.com/articles/how-to-setup-ganglia-to-monitor-server-stats/3
http://coe04.ucalgary.ca/rocks-documentation/2.3.2/monitoring-pbs.html
My two cents.
Have you tried using Nagios (http://www.nagios.org/)?
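If you do end up writing your own from scratch, here is a minimal sketch of the core idea: a small CGI-style script that shells out to qstat and returns the current queue as a web page (it assumes qstat is on the PATH and that Apache is configured to execute CGI scripts; adjust the qstat flags to taste):
#!/usr/bin/env python3
# Minimal sketch: a CGI script for Apache that shows the current PBS queue.
# Assumes qstat (PBS/TORQUE) is on the PATH and Apache is set up for CGI.
import html
import subprocess

# "qstat -a" prints a one-line-per-job summary of the queue.
output = subprocess.run(["qstat", "-a"], capture_output=True, text=True).stdout

print("Content-Type: text/html")
print()
print("<html><body><h1>PBS queue</h1><pre>")
print(html.escape(output))
print("</pre></body></html>")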