Get top 3 results - keen-io

I have a query using the count analysis type, grouped by type, and it returns 12 different groups with varying values.
Would it be possible to get only the 3 groups with the highest count from that query?

The Keen API doesn't (as of October 2015) support this directly, although it is a commonly requested feature. It may be added in the future but there is currently no timeline for that.
The best workaround is to do the sorting and trimming on the client side once the response has been received. This should only take a few lines of code in most programming languages. If you're working from a command line (e.g. via curl) then you could use jq to do it:
curl "https://api.keen.io/3.0/projects/...<insert your query URL>..." > result.json
cat result.json | jq '.result | sort_by(.result) | reverse | .[:3]'
Hope that helps! (Disclosure: I'm a platform engineer at Keen.)


How do you run a saved query from the BigQuery CLI and export the result to CSV?

I have a saved query in BigQuery, but its result is too big to export as CSV. I don't have permission to export to a new table, so is there a way to run the query from the bq CLI and export from there?
From the CLI you can't directly access your saved queries, as they are a UI-only feature for now, but, as explained here, there is a feature request for that.
If you just want to run it once to get the results, you can copy the query from the UI and paste it when using bq.
Using the docs example query you can try the following with a public dataset:
QUERY="SELECT word, SUM(word_count) as count FROM publicdata:samples.shakespeare WHERE word CONTAINS 'raisin' GROUP BY word"
bq query "$QUERY" > results.csv
The output of cat results.csv should be:
+---------------+-------+
|     word      | count |
+---------------+-------+
| dispraisingly |     1 |
| praising      |     8 |
| Praising      |     4 |
| raising       |     5 |
| dispraising   |     2 |
| raisins       |     1 |
+---------------+-------+
Just replace the QUERY variable with your saved query.
Also, take into account whether you are using Standard or Legacy SQL and set the --use_legacy_sql flag accordingly.
Reference docs here.
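For reference, here is a sketch of the same example rewritten in Standard SQL (run it with bq query --use_legacy_sql=false); it assumes the sample data that is also published as bigquery-public-data.samples.shakespeare:
-- Standard SQL version of the Legacy SQL example above; note the
-- backtick-quoted table path and LIKE instead of CONTAINS.
SELECT word, SUM(word_count) AS count
FROM `bigquery-public-data.samples.shakespeare`
WHERE word LIKE '%raisin%'
GROUP BY word;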
Despite what you may have understood from the official documentation, you can get large query results from bq query, but there are multiple details you have to be aware of.
To start, here's an example. I got all of the rows of the public table usa_1910_2013, from the usa_names dataset in the bigquery-public-data public project, by using the following commands:
total_rows=$(bq query --use_legacy_sql=false --format=csv "SELECT COUNT(*) AS total_rows FROM \`bigquery-public-data.usa_names.usa_1910_2013\`;" | xargs | awk '{print $2}');
bq query --use_legacy_sql=false --max_rows=$((total_rows + 1)) --format=csv "SELECT * FROM \`bigquery-public-data.usa_names.usa_1910_2013\`;" > output.csv
The result of this command was a CSV file with 5552454 lines, with the first two containing header information. The number of rows in this table is 5552452, so it checks out.
Here's where the caveats come into play:
Regardless of what the documentation might seem to say when it comes to query download limits specifically, those limits seem to only apply to the Web UI, meaning bq is exempt from them;
At first, I was using the Cloud Shell to run this bq command, but the number of rows was so big that streaming the result set into it killed the Cloud Shell instance! I had to use a Compute Engine instance with at least the resources of an n1-standard-4 (4 vCPUs, 16 GiB RAM), and even with all of that RAM, the query took me 10 minutes to finish (note that the query itself runs server-side; it's just a problem of buffering the results);
I'm manually copy-pasting the query itself, as there doesn't seem to be a way to reference saved queries directly from bq;
You don't have to use Standard SQL, but you have to specify max_rows, because otherwise it'll only return you 100 rows (100 is the current default value of this argument);
You'll still be facing the usual quotas & limits associated with BigQuery, so you might want to run this as a batch job; that's up to you. Also, don't forget that the maximum response size for a query is 128 MiB, so you might need to break the query into multiple bq query commands in order not to hit this size limit (see the sketch below). If you want a public table that's big enough to hit this limitation during queries, try the samples.wikipedia one from the bigquery-public-data project.
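One possible way to split a large extraction (a sketch, not the only option) is to shard it deterministically with FARM_FINGERPRINT and run each shard as its own bq query invocation:
-- Sketch: split the usa_1910_2013 extraction into 4 shards so that each
-- response stays below the 128 MiB limit. Run once per shard value (0..3),
-- e.g. bq query --use_legacy_sql=false --format=csv --max_rows=2000000 "..." > shard_0.csv
SELECT *
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE MOD(ABS(FARM_FINGERPRINT(CONCAT(state, gender, CAST(year AS STRING), name))), 4) = 0;
Concatenating the shard files afterwards gives the full result set.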
I think that's about it! Just make sure you're running these commands on a beefy machine and after a few tries it should give you the result you want!
P.S.: There's currently a feature request to increase the size of CSVs you can download from the Web UI. You can find it here.

Error itgensql005 when fetching serial number from Exact Online GoodsDeliveryLines for upload to Freshdesk ticket

I want to exchange information between ExactOnline and Freshdesk based on deliveries (Exact Online Accounts -> Freshdesk Contacts, Exact Online deliveries -> Freshdesk tickets).
The serial number of delivered goods is not available in either the ExactOnlineREST..GoodsDeliveryLines table or the ExactOnlineXML..DeliveryLines table.
The following query lists all columns that are also documented on Exact Online REST API GoodsDeliveryLines:
select * from goodsdeliverylines
All other fields from the REST API documentation are included in GoodsDeliveryLines; only the serial numbers and batch numbers are missing.
I've tried - as with ExactOnlineXML tables, where columns only come into existence when explicitly specified - to use:
select stockserialnumbers from goodsdeliverylines
However, this raises an error:
itgensql005: Unknown identifier 'stockserialnumbers'.
How can I retrieve the serial numbers?
StockSerialNumbers is an array; the Exact Online documentation says:
Collection of batch numbers
So for every delivery line, there can be 0, 1 or more serial numbers included.
These serial numbers were not available until recently; please make sure you upgrade to at least build 16282 of the Exact Online SQL provider. It should then work using a query on a separate table:
select ssrdivision
, ssritemcode
, ssrserialnumber
from GoodsDeliveryLineSerialNumbers
Output:
ssrdivision | ssritemcode | ssrserialnumber
----------- | ----------- | ---------------
    868,035 | OUT30074    | 132
    868,035 | OUT30074    | 456
Use of serial numbers may require additional Exact Online modules from the supplier, such as "Trade", but if you can see the serial numbers in the web user interface, you already have them. If you get an HTTP 401 Unauthorized, you don't have the module for serial numbers.
Since stockserialnumbers is actually a list and not a single field, you have to query it using the entity GoodsDeliveryLineSerialNumbers, which you can find in the latest release.
select * from GoodsDeliveryLineSerialNumbers
If you execute the above query, you will get the fields for GoodsDeliveryLine and those of the underlying serial numbers. The latter fields are prefixed with Ssr to disambiguate both entities. This means you don't need an additional join on GoodsDeliveryLine, which may benefit performance.
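To illustrate that a delivery line can carry 0, 1 or more serial numbers, here is a minimal sketch that only uses the Ssr columns shown in the output above (no other column names are assumed):
-- Sketch: count how many serial numbers were delivered per item code.
select ssritemcode
,      count(*) as serial_number_count
from   GoodsDeliveryLineSerialNumbers
group  by ssritemcode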

CouchDB API query with ?limit=0 returns one row - bug or feature?

I use CouchDB 1.5.0 and noticed a strange thing:
When I query some API action, for example:
curl -X GET "http://localhost:5984/mydb/_changes?limit=1"
I get the same result with limit=1, with limit=0, and with limit=-55. In all cases it is one row from the start of the list.
PostgreSQL, by contrast, returns:
Zero rows when LIMIT 0
The message ERROR: LIMIT must not be negative when LIMIT -55
My question is mainly concerned with the API design. I would like to know your opinions.
Is it a flaw, or is it good/acceptable practice?
This is how the _changes API is designed. If you do not specify the type of feed (i.e. longpoll, continuous, etc.), the default is to return a list of all the changes in a single results array.
If you want a row-by-row result of the changes in the database, specify the type of feed in the URL like so:
curl -X GET "http://localhost:5984/mydb/_changes?feed=continuous"
Another point to note is that in the _changes API, using 0 for the limit parameter has the same effect as using 1.

Doing multiple queries in Postgresql - conditional loop

Let me first start by stating that in the last two weeks I have received ENORMOUS help from just about all of you (ok ok not all... but I think perhaps two dozen people commented, and almost all of these comments were helpful). This is really amazing and I think it shows that the stackoverflow team really did something GREAT altogether. So thanks to all!
Now as some of you know, I am working at a campus right now and I have to use a windows machine. (I am the only one who has to use windows here... :( )
Now I managed to set up (ok, the IT department did that for me) and populate a Postgres database (this I did on my own) with about 400 MB of data. Which perhaps is not so much for most of you heavy Postgres users, but I was more used to SQLite databases for personal use, which rarely ever exceeded 2 MB.
Anyway, sorry for being so chatty - now the queries from that database work nicely. I use Ruby to run the queries, actually.
The entries in the Postgres database are interconnected, insofar as they are like "pointers" - each has one field that points to another entry.
Example:
entry 3667 points to entry 35785 which points to entry 15566. So it is quite simple.
The main entry is 1, so every chain of these queries ends at 1. From any other number, we eventually reach 1 as the last result.
I am using Ruby to make as many individual queries to the database as needed until the last result returned is 1. This can take up to 10 individual queries. I do this by logging into psql with my password and connection data and then performing the SQL query via -c. This probably is not ideal: it takes a little time to do these logins and queries, and ideally I would log in only once, perform ALL the queries in Postgres, and then exit with a result (all of these entries as the result).
Now here comes my question:
- Is there a way to make conditional queries all inside of Postgres?
I know how to do it in a shell script and in Ruby, but I do not know whether this is available in PostgreSQL at all.
I would need to make the query, in literal english, like so:
"Please give me all the entries that point to the parent entry, until the last found entry is eventually 1, then return all of these entries."
I already "solved" it by using ruby to make several queries until 1 is eventually returned, but this strikes me as fairly inelegant and possibly not effective.
Any information is very much appreciated - thanks!
Edit (argh, I fail at pasting...):
Example dataset, the table would be like this:
 id | parent
----+--------
  1 |      1
  2 | 131567
  6 | 335928
  7 |      6
  9 |      1
 10 | 135621
 11 |      9
I hope that works; I tried to narrow it down to a minimal example.
For instance, id 11 points to id 9, and id 9 points to id 1.
It would be great if one could use SQL to return:
11 -> 9 -> 1
Unless you give some example table definitions, what you're asking for vaguely reminds me of a tree structure, which could be traversed with recursive queries: http://www.postgresql.org/docs/8.4/static/queries-with.html
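To make that concrete with the example table from the edit, here is a minimal sketch of such a recursive query; it assumes the table is named entries (with the id and parent columns shown) and that every chain ends at id 1:
-- Walk the parent chain from a starting id (11 here) up to the root id 1.
WITH RECURSIVE chain AS (
    SELECT id, parent
    FROM entries
    WHERE id = 11            -- the starting entry
  UNION ALL
    SELECT e.id, e.parent
    FROM entries e
    JOIN chain c ON e.id = c.parent
    WHERE c.id <> 1          -- stop once the root (1) has been reached
)
SELECT id FROM chain;        -- returns 11, 9, 1
This runs as a single statement inside PostgreSQL (8.4 or later), so the repeated logins from Ruby are no longer needed.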

How to list job IDs from all users?

I'm using the Java API to query for all job IDs using the code below:
Bigquery.Jobs.List list = bigquery.jobs().list(projectId);
list.setAllUsers(true);
but it doesn't list the job IDs that were run by a Client ID for web applications (i.e. Metric Insights). I'm using private key authentication.
Using the command line tool 'bq ls -j' in turn gives me only the Metric Insights job IDs but not the ones run with the private key auth. Is there a way to get all of them?
The reason I'm doing this is to get better visibility into which queries are eating up our data usage. We have multiple sources of queries: Metric Insights, in-house automation, some run manually, etc.
As of version 2.0.10, the bq client has support for API authorization using service account credentials. You can specify using a specific service account with the following flags:
bq --service_account your_service_account_here@developer.gserviceaccount.com \
--service_account_credential_store my_credential_file \
--service_account_private_key_file mykey.p12 <your_commands, etc>
Type bq --help for more information.
My hunch is that listing jobs for all users is broken, and nobody has mentioned it since there is usually a workaround. I'm currently investigating.
Jordan -- It sounds like you're homing in on what we want to do. For all access that we've allowed into our project/dataset, we want to produce an aggregate report of the "totalBytesProcessed" for all queries executed.
The problem we're struggling with is that we have a handful of distinct java programs accessing our data, a 3rd party service (metric insights) and 7-8 individual users who have query access via the web interface. Fortunately the incoming data only has one source so explaining the cost for that is simple. For queries though I am kinda blind at the moment (and it appears queries will be the bulk of the monthly bill).
It would be ideal if I could get the underlying data for this report with just one listing made with some single top-level auth. With that, I think from the timestamps and the actual SQL text I can attribute each query to a source.
One thing that might make this problem far easier is if there were more information in the job record (or some text adornment in the job_id for queries). I don't see that I can assign my own jobIDs on queries (perhaps I missed it?) and perhaps recording some source information in the job record would be possible? Just thinking out loud now...
There are three tables you can query for this.
region-**.INFORMATION_SCHEMA.JOBS_BY_{USER, PROJECT, ORGANIZATION}
Where ** should be replaced by your region.
Example query for JOBS_BY_USER in the eu region:
select
  count(*) as num_queries,
  date(creation_time) as date,
  sum(total_bytes_processed) as total_bytes_processed,
  sum(total_slot_ms) as total_slot_ms_cost
from
  `region-eu.INFORMATION_SCHEMA.JOBS_BY_USER`
group by
  2
order by
  2 desc, total_bytes_processed desc;
Documentation is available at:
https://cloud.google.com/bigquery/docs/information-schema-jobs
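If the goal is to attribute each query to a source, as described above, a per-user and per-query-text breakdown is a small variation. The following is a sketch that assumes the eu region and uses the documented user_email, query, job_type and total_bytes_processed columns of JOBS_BY_PROJECT:
-- Sketch: bytes processed per user and per query text, most expensive first.
select
  user_email,
  query,
  count(*) as num_runs,
  sum(total_bytes_processed) as total_bytes_processed
from
  `region-eu.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
where
  job_type = 'QUERY'
group by
  user_email, query
order by
  total_bytes_processed desc;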