Rally Lookback API: Inequalities with a Task's State value

When I ask for all snapshots in a given timeframe where the previous State value of a Task object is less than "Completed", I get zero results. However, if I switch around the inequality to greater than "Completed", I get the results I expected. I would have assumed that the states "Defined" and "In-Progress" were less than "Completed".
_TypeHierarchy:"Task",
"_PreviousValues.State":{$lt: "Completed"},
State: "Completed",
The above query returns 0 results for a specified time range, but the query below returns 4137 results for the same range (note: the only difference is that the inequality is flipped from less than to greater than):
_TypeHierarchy:"Task",
"_PreviousValues.State":{$gt: "Completed"},
State: "Completed",

The issue is that the LBAPI is not currently treating Task State as a drop-down field as it should. We will enter a defect on it. Thanks for pointing out the problem!
In the meantime, you should be able to get the desired results using either '$ne: null' or '$in: ["Defined", "In-Progress"]'.
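For example, here is a minimal Python sketch of the $in workaround against the Lookback API; the workspace OID, credentials, and field list are placeholders, not values from this thread:

import json
import requests

# Placeholder workspace OID and credentials -- substitute your own.
URL = ("https://rally1.rallydev.com/analytics/v2.0/service/rally/"
       "workspace/12345/artifact/snapshot/query.js")

# Enumerate the states below "Completed" explicitly instead of relying on $lt,
# since Task State is not yet ordered as a drop-down field.
find = {
    "_TypeHierarchy": "Task",
    "_PreviousValues.State": {"$in": ["Defined", "In-Progress"]},
    "State": "Completed",
}
params = {
    "find": json.dumps(find),
    "fields": json.dumps(["ObjectID", "_PreviousValues.State"]),
}

resp = requests.get(URL, params=params, auth=("user", "password")).json()
print(resp.get("TotalResultCount"))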


How to identify records which have clusters or lumps in data?

I have a Tableau table of weekly values for items A through E (table image omitted). The data can be visualized as a line chart per item (chart image omitted).
I'd like to flag cases that have lumps/clusters. This would flag items B, C and D because there are spikes only in certain weeks of the 13 weeks. Items A and E would not be flagged as they mostly have a 'flat' profile.
How can I create such a flag in Tableau or SQL to isolate this kind of a case?
What I have tried so far:
I've tried a logic where for each item I calculate the MAX and MEDIAN. Items that need to be flagged will have a larger (MAX - MEDIAN) value than items that have a fairly 'flat' profile.
Please let me know if there's a better way to create this flag.
Thanks!
Agree with the other commenters that this question could be answered in many different ways and you might need a PhD in Stats to come up with an ideal answer. However, given your basic requirements this might be the easiest/simplest solution you can implement.
Here is what I did to get here:
Create a parameter to define your "spike". If it is always going to be a fixed number you can hardcode this in your formulas. I called mine "Min Spike Value".
Create a formula for the median value in each bucket: {fixed [Buckets]: MEDIAN([Values])} (where A, B, ..., E are the "Buckets"). This gives you one value for each letter/bucket that you can compare against.
Create a formula to calculate the difference of each number against the median: abs(sum([Values])-sum([Median Values])). We use the absolute value here because a spike can be either negative or positive (again, if you want to define it that way...). I called this "Spike to Current Value abs difference".
Create a calculated field that evaluates to a boolean to see if the current value is above the threshold for a spike: [Spike to Current Value abs difference] > min([Min Spike Value])
Set up your viz to use this boolean to highlight the spikes. The beauty of the parameter is that you can change the value for what counts as a spike and the highlighting will update accordingly (in the screenshots above the value was 4, but you could change it to 8).
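If you would rather compute the flag outside Tableau, here is a minimal Python sketch of the same median-plus-threshold idea; the sample data and the threshold of 4 are assumptions for illustration:

from statistics import median

# Hypothetical weekly values per item, shaped like the question's 13-week table.
data = {
    "A": [5, 6, 5, 5, 6, 5, 5, 6, 5, 5, 6, 5, 5],    # flat profile
    "B": [1, 1, 1, 12, 1, 1, 1, 1, 1, 14, 1, 1, 1],  # spiky profile
}

MIN_SPIKE_VALUE = 4  # plays the role of the "Min Spike Value" parameter

def has_spike(values, threshold):
    # Flag the item if any week deviates from the item's median by more
    # than the threshold (absolute value, so dips count too).
    med = median(values)
    return any(abs(v - med) > threshold for v in values)

flagged = [item for item, values in data.items() if has_spike(values, MIN_SPIKE_VALUE)]
print(flagged)  # ['B'] for the sample data above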

Calculating a proxy bit size in a BigQuery table

How does one go about calculating the bit size of each record in BigQuery sharded tables across a range of time?
Objective: how much has it grown over time
Nuances: of the 70-some fields, some records would have nulls for most, some would have long string text grabbed directly from the raw logs, and some could be float/integer/date types.
Wondering if there's an easy way to do a proxy count of the bit size for one day and then I can expand that to a range of time.
Example from my experience:
One of my tables is a daily sharded table with a daily size of 4-5 TB. The schema has around 780 fields. I wanted to understand the cost (bit size) of each data point; it was then used for calculating ROI based on cost/usage.
So, let me give you an idea of how the cost (bit-size) side of it was approached.
The main piece here is the dryRun property of the Jobs: query API.
Setting dryRun to true makes BigQuery return statistics about the job, such as how many bytes would be processed, instead of actually running it. And that's exactly what is needed here!
So, for example, the request below is designed to get the cost of trafficSource.referralPath in the ga_sessions table for 2017-01-05:
POST https://www.googleapis.com/bigquery/v2/projects/yourBillingProject/queries?key={YOUR_API_KEY}
{
  "query": "SELECT trafficSource.referralPath FROM `yourProject.yourDataset.ga_sessions_20170105`",
  "dryRun": true,
  "useLegacySql": false
}
You can get this value by parsing totalBytesProcessed out of the response. See an example of such a response below:
{
  "kind": "bigquery#queryResponse",
  "jobReference": {
    "projectId": "yourBillingProject"
  },
  "totalBytesProcessed": "371385",
  "jobComplete": true,
  "cacheHit": false
}
So, you can write a relatively simple script in the client of your choice that:
reads the schema of your table – you can use the Tables: get API for this, or if the schema is known and readily available you can simply hardcode it
loops through each and every field in the schema
inside the loop – calls the query API, extracts the size of the respective field (as outlined above), and of course logs it (or just collects it in memory)
As a result, you will have a list of all fields with their respective sizes.
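Here is a minimal Python sketch of such a script for a single day. The billing project, table, token, and field list are placeholders; it simply replays the dryRun request shown above once per field:

import requests

# Placeholders -- swap in your own billing project, table, auth token, and schema.
BILLING_PROJECT = "yourBillingProject"
TABLE = "yourProject.yourDataset.ga_sessions_20170105"
URL = f"https://www.googleapis.com/bigquery/v2/projects/{BILLING_PROJECT}/queries"
HEADERS = {"Authorization": "Bearer YOUR_OAUTH_TOKEN"}
FIELDS = ["trafficSource.referralPath", "visitId", "fullVisitorId"]

sizes = {}
for field in FIELDS:
    body = {
        "query": f"SELECT {field} FROM `{TABLE}`",
        "dryRun": True,          # estimate bytes instead of running the query
        "useLegacySql": False,
    }
    stats = requests.post(URL, json=body, headers=HEADERS).json()
    # totalBytesProcessed comes back as a string, per the sample response above.
    sizes[field] = int(stats["totalBytesProcessed"])

for field, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{field}: {size} bytes")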
If you then need to analyze how those sizes change over time – you can wrap the above with yet another loop, iterating through as many days as you need and collecting stats for each day.
If you are not interested in day-by-day analysis – you can just make sure your query covers the whole range you are interested in. This can be done with a wildcard table.
I consider this a relatively easy way to go.
Personally, I remember doing this in Go, but it doesn't matter – you can use any client you are most comfortable with.
Hope this helps!

How can I control the order of results? Lucene range queries in Cloudant

I've got a simple index which outputs a "score" from 1000 to 12000 in increments of 1000. I want to get a range of results from a low to a high score, for example:
q=score:[1000 TO 3000]
However, this always returns a list of matches starting at 3000 and depending on the limit (and number of matches) it might never return any 1000 matches, even though they exist. I've tried to use sort:+- and grouping but nothing seems to have any impact on the returned result.
So; how can the order of results returned be controlled?
What I ideally want is a selection of matches from the range but I assume this isn't possible, given that the query just starts filling the results in from the top?
For reference, the index looks like this:
function(doc) {
  var score = doc.score;
  index("score", score, {
    "store": "yes"
  });
  ...
}
I cannot comment on this, so posting an answer here:
Based on the Cloudant doc on Lucene queries, there isn't a way to sort the results of a query. The sort options given there are for grouping, and even for grouped results I never saw sort work. In any case, it is supposed to sort the sequence of the groups themselves, not the data within.
#pal2ie you are correct, and Cloudant has come back to me confirming it. It does make sense, in some way, but I was hoping I could at least control the direction (lo->hi, hi->lo). The solution I have implemented to get a better distribution across the range is to not use range queries, but instead:
create a distribution of the number of desired results for each score in the range (a simple discrete Gaussian, for example)
execute individual queries for each score in the range with limit set to the number of desired results for that score
execute step 2 from min to max, filling up the result
It's not the most efficient approach, since it means multiple round trips to the server, but at least it gives me full control over the distribution in the range.
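A minimal Python sketch of that approach; the account, database, design document, and index names, as well as the hardcoded per-score counts, are all placeholders:

import requests

# Placeholders -- replace with your own account, database, design doc, and index.
SEARCH_URL = ("https://youraccount.cloudant.com/yourdb/"
              "_design/yourddoc/_search/yourindex")
AUTH = ("user", "password")

# Step 1: desired number of results per score -- a rough discrete bell shape.
desired = {1000: 2, 2000: 5, 3000: 2}

# Steps 2 and 3: one query per score, from min to max, filling up the result.
results = []
for score in sorted(desired):
    params = {"q": f"score:{score}", "limit": desired[score]}
    rows = requests.get(SEARCH_URL, params=params, auth=AUTH).json().get("rows", [])
    results.extend(rows)

print(len(results), "matches collected across the range")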

jsFiddle API to get row count of user's fiddles

So, I had a nice thing going on jsFiddle where I listed all my fiddles on one page:
jsfiddle.net/show
However, they have been changing things slowly this year, and I've already had to make some changes to keep it running. The newest change is rather annoying. Of course, I like to see ALL my fiddles at once; it makes it easier to just hit ctrl+f and find what I might be looking for, but they've made that hard to do now. I used to be able to just set the limit to 99999 and see everything, but now it appears I can't go past how many I actually have (186 atm).
I tried using a start/limit solution, but when it got to the last 10 or 50 (I tried start={x}&limit=10 and start={x}&limit=50) it would die, because the last pull had to be an exact count. For example, I have 186, so with the by-10s approach it would die at start=180&limit=10.
I've searched the API docs but can't seem to find a row count or anything of that nature. Does anyone know of a good, feasible solution that won't have me overloading their servers doing a constant single-row check?
I'm having the same problem as you are. Then I checked the docs (Displaying user's fiddles - Result) and found out that if you include the callback=Api parameter, an additional overallResultSetCount field is included in the JSON response. I checked your fiddles and currently you have a total of 229 public fiddles.
The solution I can think of only requires two requests. The first request's parameters don't matter as long as you include callback=Api. Then you send the second request with your limit set to the overallResultSetCount value.
Edit:
It's not in the documentation; however, I think the result set is limited to 200 entries (hence your start/limit runs from 0 to 199). I tried to query beyond the 200 range but got an Error 500. I couldn't find another user whose fiddle count is more than 200 (most of the usernames I tested, like zalun, oskar, and rpflorence, have fewer than 100 fiddles).
Based on this new observation, you can update your script like this:
1. I have tested that if the total fiddle count is less than 200, adding the start=0&limit=199 parameters will return all the fiddles, so you can add them on your initial call.
2. Check if your total result set is more than 200. If yes, update your parameters to cover the remaining result set (in this case, start=199&limit=229) and add the new result set to your old one. Otherwise, show/print the result set you got from your first query.
3. Repeat steps 1 and 2 if your total count reaches 400, 600, etc. (any multiple of 200).
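A minimal Python sketch of those steps; the endpoint URL and the name of the result array are assumptions based on the docs referenced above, and the start/limit semantics follow the example values in step 2:

import json
import requests

# Placeholder endpoint -- use the "Displaying user's fiddles" URL from the API docs.
URL = "https://jsfiddle.net/api/user/yourUsername/demo/list.json"
PAGE = 200  # observed cap on the result set size

def fetch(start, limit):
    # callback=Api wraps the JSON in "Api(...)" (JSONP), so strip the wrapper.
    text = requests.get(URL, params={"callback": "Api", "start": start, "limit": limit}).text
    return json.loads(text[text.index("(") + 1 : text.rindex(")")])

first = fetch(0, PAGE - 1)             # step 1: initial call
total = int(first["overallResultSetCount"])
fiddles = list(first.get("list", []))  # "list" is an assumed field name

start = PAGE - 1
while start < total:                   # steps 2 and 3: pull any remaining pages
    page = fetch(start, min(start + PAGE, total))
    fiddles.extend(page.get("list", []))
    start += PAGE

print(f"fetched {len(fiddles)} of {total} fiddles")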

Cumulative Flow Data for Unscheduled Items?

I understand how to get cumulative flow data on releases with the ReleaseCumulativeFlowData object; however, this requires a ReleaseObjectID. I am looking for a way to get the same data for all the items that are not scheduled in a release, and it does not appear that I can query for a null ReleaseObjectID.
Is there any way using CumulativeFlow data to get the number of story points for unscheduled stories on a given day, or is my best bet to either parse the revision history logs using the 1.x API, or use the Lookback API?
Basically, what I am trying to get to is being able to represent how the total scope of a project has changed over time, including items that are scheduled as well as items that are estimated in the backlog but not yet scheduled. As far as I can tell, there is not an out-of-the-box way to get this information (without revision logs or diving into learning the Lookback API right now), but I am crossing my fingers that I am wrong.
I recommend learning the Lookback API, as this is exactly the sort of question it was designed to answer.
You can find the docs here: https://rally1.rallydev.com/analytics/doc/
For example, if you say:
find:{
    _ProjectHierarchy: 279050021,
    _TypeHierarchy: "HierarchicalRequirement",
    Children: null,
    ScheduleState: {$lt: "In-Progress"},
    __At: "current"
},
fields: ["ObjectID", "Name", "PlanEstimate"]
This looks for snapshots of items under the project with OID 279050021 that are stories (HierarchicalRequirements) with no children (so leaf stories) and are in a schedule state earlier than "In-Progress", taking snapshots that are valid today ("current"), though you could put any ISO 8601 date in there as a string. The fields parameter then specifies which fields of the snapshots to return. While you're learning what's there, it can be helpful to use fields=true along with this Chrome plugin for pretty-printing the JSON response: https://chrome.google.com/webstore/detail/jsonview-and-jsonlint-for/mfjgkleajnieiaonjglfmanlmibchpam. However, you should specify the exact list of fields you want when going to production, since fields=true is limited to 200 results.
As a full URL this looks like:
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/41529001/artifact/snapshot/query.js?find={_ProjectHierarchy:279050021, _TypeHierarchy: "HierarchicalRequirement", Children: null, ScheduleState:{$lt:"In-Progress"}, __At:"current"}&fields=["ObjectID", "Name", "PlanEstimate"]
But make sure to swap in your own workspace OID (for 41529001) and project OID (for 279050021) or the above URL won't work for you.
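And to get from there to the scope number in the question, here is a minimal Python sketch that runs the same query and sums PlanEstimate over the unscheduled leaf stories; the credentials are placeholders and the single-page fetch is a simplification (a real script should page through Results):

import json
import requests

# OIDs from the example above -- swap in your own, plus real credentials.
WORKSPACE_OID = 41529001
PROJECT_OID = 279050021
URL = (f"https://rally1.rallydev.com/analytics/v2.0/service/rally/"
       f"workspace/{WORKSPACE_OID}/artifact/snapshot/query.js")

find = {
    "_ProjectHierarchy": PROJECT_OID,
    "_TypeHierarchy": "HierarchicalRequirement",
    "Children": None,                        # leaf stories only
    "ScheduleState": {"$lt": "In-Progress"},
    "__At": "current",                       # or any ISO 8601 date string
}
params = {
    "find": json.dumps(find),
    "fields": json.dumps(["ObjectID", "Name", "PlanEstimate"]),
}

resp = requests.get(URL, params=params, auth=("user", "password")).json()
snapshots = resp.get("Results", [])
total_points = sum(s.get("PlanEstimate") or 0 for s in snapshots)
print(f"{len(snapshots)} unscheduled leaf stories, {total_points} points")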