Impala queries stuck in an "Executing" state - impala

On the Impala queries page, I see many queries listed as still Executing even though the corresponding queries have already finished. Do these queries still occupy memory after completion? I'd appreciate it if someone could take a look.

You can filter the list using the search input at the top of that page. You can search on whatever conditions you like, such as "query_duration>500s".
There is also an API that gives you all the detail information about your job.
Have fun!

Related

Displaying data with pagination - get whole data at once or get part for each page?

Which approach performs better if I want to display data with pagination: downloading all the data from the DB and switching between pages locally, or fetching the data from the DB page by page?
I was initially leaning toward the second option, but then I found this article and now I'm unsure.
In my SQL queries I'm using OFFSET and LIMIT, and since I also need to reach the last page of the pagination, the first option would be better as far as I understand? It's worth noting that my database is quite small.
Or would the best option be to keep using OFFSET but avoid reading the last page? Or am I wrong (in the case of larger databases and better performance)?
In the end, I implemented it just as the article suggests. I removed the "move to last page" button so the database is never forced to count all rows. I already have sorting features like ASC/DESC on particular columns, so if a user wants the last page, they can simply click that sort option and get the last elements via an ASC/DESC query, which I hope is faster than a large OFFSET.
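The OFFSET-versus-keyset trade-off discussed above can be sketched with an in-memory SQLite table (the `items` table and its columns are invented for illustration): OFFSET has to scan and discard the skipped rows, keyset pagination seeks straight to them, and the last page can be fetched with a DESC order instead of a huge OFFSET.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [(f"item{i}",) for i in range(100)])

# OFFSET/LIMIT: simple, but the database still reads and discards the
# skipped rows, which gets slower the deeper you page
page3_offset = conn.execute(
    "SELECT id, name FROM items ORDER BY id LIMIT 10 OFFSET 20").fetchall()

# Keyset: remember the last id the client saw and seek past it
last_id = 20
page3_keyset = conn.execute(
    "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT 10",
    (last_id,)).fetchall()
assert page3_offset == page3_keyset

# Last page without OFFSET: flip the sort order, then reverse for display
last_page = conn.execute(
    "SELECT id FROM items ORDER BY id DESC LIMIT 10").fetchall()
```

This is the same trick as the ASC/DESC sort buttons mentioned above: the "last page" under ASC order is just the first page under DESC order.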

Slow Django Page - Doesn't Appear to be caused by SQL

I'm trying to debug a slow Django listview page. The page currently displays all tests (133), as well as the latest result for each test. There are approximately 60,000 results, each result having a FK relationship to a single test.
I've optimized the SQL (I think) by selecting the latest result for each test, and passing it in as a prefetch related to my tests query. Django Debug Toolbar shows that the SQL is taking ~350ms, but the page load is ~3.5s.
If I restrict the list view to a single test though, the SQL is ~7.5ms, and page load is ~100ms.
I'm not fetching anything from S3 etc, and not rendering any images or the like.
So my question is: what could be causing the slowness? Page load grows with the result set, which makes it feel SQL-related, but maybe it's the rendering of each item or something similar. Any guidance on what to look into next would be appreciated.
You can force evaluation of your Django queryset:

    test_list = list(your_queryset)

and return a simple text response from your view:

    return HttpResponse("return this string")

That way you can check the timing without template rendering. Also note that Django Debug Toolbar itself can slow your app down; this may be your case: https://github.com/jazzband/django-debug-toolbar/issues/910
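To narrow down where the ~3.5s goes, it helps to time each stage in isolation, which is what the advice above amounts to. A minimal sketch of that idea in plain Python; `fetch_rows` and `render_page` are invented stand-ins for `list(your_queryset)` and the template render:

```python
import time

def timed(label, fn):
    # run one stage and report how long it took
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")
    return result

# hypothetical stand-ins for the real stages in the view
fetch_rows = lambda: [{"id": i} for i in range(60000)]
render_page = lambda rows: "".join(str(r["id"]) for r in rows)

rows = timed("queryset evaluation", fetch_rows)
html = timed("template rendering", lambda: render_page(rows))
```

If "queryset evaluation" is fast but the total is slow, the time is going into rendering (or, in a real view, into lazy queries triggered per item during rendering).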

Paginate through BigQuery query results using python

I see that the BigQuery REST API allows for pagination through results, and I see that the BigQuery python client allows pagination when listing rows in a table (among other things), but I do not see a way to paginate through query results.
The Job makes a call to client.list_rows, but does not give the caller the option of passing in max_results:
https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/bigquery/google/cloud/bigquery/job.py#L2404
As far as I can tell, it also doesn't expose enough information to build the Table definition outside the Job (I don't see where to get the query schema) and make the list_rows call myself.
Hopefully I'm just missing something...
Help would be greatly appreciated,
--Ben
jobs.getQueryResults does support a page token:
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults
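At the REST level, jobs.getQueryResults returns a pageToken whenever more rows remain, and you pass that token back on the next call. A minimal sketch of the loop in Python; `fake_fetch` is a stub standing in for the authenticated API call, and the token format here is invented for the demo:

```python
def fetch_all_rows(fetch_page, max_results=2):
    # keep calling until the API stops returning a pageToken
    rows, token = [], None
    while True:
        page = fetch_page(pageToken=token, maxResults=max_results)
        rows.extend(page.get("rows", []))
        token = page.get("pageToken")
        if not token:
            return rows

# stub standing in for jobs.getQueryResults over a 5-row result set
DATA = list(range(5))

def fake_fetch(pageToken=None, maxResults=2):
    start = int(pageToken or 0)
    end = min(start + maxResults, len(DATA))
    page = {"rows": DATA[start:end]}
    if end < len(DATA):
        page["pageToken"] = str(end)
    return page

all_rows = fetch_all_rows(fake_fetch)
```

Newer releases of the Python client also accept a page_size argument on QueryJob.result() and let you iterate page by page, which handles this token loop for you; check which version you have installed.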

Splunk query to filter results

I have some code deployed on one of my six servers. I need a Splunk query that pulls data from the other five hosts, something like "all except this one host". I know how to use the host option in Splunk to look at a particular host's logs, but I have no idea how to do all except one. Can someone please assist me?
The one box I'm talking about has my latest code changes and the other five have my old code, so I want to write a query to do a before-vs-after analysis.
Looks like you have your answer, but I use an alternative method that speeds things up for me.
Within your search results, you can quickly eliminate what you want to filter out by ALT-clicking on a value in a selected field. In your case, it would add NOT host="1" to your query and immediately update your results.
I find this particularly helpful when I'm in the preliminary stage of investigating an issue, and don't have enough information to know exactly where to look first. It makes it easy to rapidly eliminate what you don't need.
*Note: This may still be broken in Splunk 6, not sure if the bug has been fixed yet: http://answers.splunk.com/answers/109473/alt-click-not-working-selected-fields
Okay, I got the answer to my question: just use !=. If I want the results for all my hosts except host 1, all I do is: index=blah host!="1"

Ajax autocomplete extender populated from SQL

OK, first let me state that I have never used this control and this is also my first attempt at using a web service.
My dilemma is as follows: I need to query a database for a certain column and use it for my autocomplete. Obviously I don't want the query to run every time a user types another character in the textbox, so my best guess is to run the query once and then use that dataset, array, list, or whatever to filter for the autocomplete extender.
I'm kind of lost. Any suggestions?
Why not keep track of the query executed by the user in a session variable, then use that to filter any further results?
The trick to keeping the database from being overloaded, I think, is really just to limit how frequently the autocompleter is allowed to update; something like once per two seconds seems reasonable to me.
What I would do is this: Store the current list returned by the query for word A server side and tie that to a session variable. This should be basically the entire list I would think. Then, for each new word typed, so long as the original word A exists, you can filter the session info and spit the filtered results out without having to query again. So basically, only query again when word A changes.
I'm using "session" in a PHP sense, you may be using a different language with different terminology, but the concept should be the same.
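The seed-word caching idea above can be sketched in Python (the answer speaks in PHP terms; `fetch` here is an invented stand-in for the real autocomplete query, and the session is just a dict):

```python
calls = []

def fetch(term):
    # stand-in for the real database query; records each round trip
    calls.append(term)
    words = ["apple", "application", "apply", "banana"]
    return [w for w in words if w.startswith(term)]

session = {}

def suggestions(session, term, fetch):
    seed = session.get("seed")
    if seed is None or not term.startswith(seed):
        session["seed"] = term          # word A changed: query again
        session["rows"] = fetch(term)   # one DB round trip per seed word
    # later keystrokes just filter the cached rows
    return [r for r in session["rows"] if r.startswith(term)]

assert suggestions(session, "app", fetch) == ["apple", "application", "apply"]
assert suggestions(session, "appl", fetch) == ["apple", "application", "apply"]
assert suggestions(session, "apply", fetch) == ["apply"]
assert calls == ["app"]  # the two follow-up keystrokes reused the cache
```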
This depends on how transactional your data store is. If you are looking up US states (a data collection that realistically won't change over the life of the application), then I would cache either a System.Collections.Generic.List<> or, if you prefer, a DataTable.
You could easily make the cached data dependent on an XML file or database, so that your extender always queries the object cast from the cache, and the cache is only updated when the data source changes.
RAM is cheap and SQL is harder to scale than IIS, so cache everything in memory:
- your entire data source, if it is not too large to load in a reasonable time,
- precalculated data,
- autocomplete web service responses.
Depending on the desired autocomplete behavior and performance, you may want to precalculate data and create redundant structures optimized for reading. Make use of structures like SortedList (when you need something like 'select top x ... where z like @query+'%''), Hashtable, and so on.
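The SortedList idea above (serving "top x ... like prefix%" from memory) amounts to a sorted list plus a binary search for the first match; a small Python sketch, with a made-up word list:

```python
from bisect import bisect_left

# precalculated, read-optimized structure: a sorted word list
words = sorted(["apple", "application", "apply", "banana", "band", "bandana"])

def autocomplete(prefix, limit=5):
    # binary-search for the first word >= prefix, then walk forward
    # while the words still share that prefix
    i = bisect_left(words, prefix)
    out = []
    while i < len(words) and words[i].startswith(prefix) and len(out) < limit:
        out.append(words[i])
        i += 1
    return out
```

Each lookup is O(log n) to find the start plus O(limit) to collect results, with no database round trip at all.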
While caching everything is certainly a good idea, your question about which data structure to use is an issue that wasn't fully answered here.
The best data structure for an autocomplete extender is a Trie.
You can find a good .NET article and code here.
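For readers outside .NET, here is a minimal Python sketch of the same idea: a Trie that stores words character by character and collects completions under a prefix node.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix, limit=10):
        # walk down to the node for the prefix
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # depth-first collect words beneath it, in alphabetical order
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            cur, word = stack.pop()
            if cur.is_word:
                results.append(word)
            for ch in sorted(cur.children, reverse=True):
                stack.append((cur.children[ch], word + ch))
        return results

t = Trie()
for w in ["apple", "banana", "band", "bandana"]:
    t.insert(w)
```

Lookup cost depends only on the prefix length and the number of completions requested, not on the total vocabulary size, which is why a Trie suits autocomplete so well.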