Excel PowerPivot measure conundrum - Average (of average?)

I have a powerpivot table that shows work_tickets and timestamps for each step taken towards resolution:
Ticket | Step | Time | TicketDuration
-------|------|------|---------------
1      | 1    | 5:30 | 15
1      | 2    | 5:33 | 15
1      | 3    | 5:45 | 15
2      | 1    | 6:00 | 10
2      | 2    | 6:05 | 10
2      | 3    | 6:10 | 10
[TicketDuration] is a calculated column I added myself. Now I'm trying to create a measure, [AverageTicketDuration], that returns 12.5 minutes for the table above ((15 + 10) / 2). I haven't got a clue how to use DAX to produce that result. Please help!

What you are looking for is the AVERAGEX function, which has the following definition: AVERAGEX(<table>, <expression>).
The idea is that it iterates through each row of the given table, applies your calculation, and then averages the results.
In the example below, I use Table1 as the table name.
To iterate over tickets, we use VALUES(Table1[Ticket]), which returns the unique values in the Ticket column.
Then, assuming the ticket duration is always the same within a ticket ID, the aggregation used in the expression is AVERAGE(Table1[Ticket Duration]); for ticket 1, for example, (15 + 15 + 15) / 3 = 15.
Put together, the measure looks like this:
measure:=AVERAGEX( VALUES( Table1[ticket]), AVERAGE(Table1[Ticket Duration]))
Dropped into a pivot using your sample data, this returns the expected 12.5.


BigQuery query performance when using STARTS_WITH() on a table of 12 million rows

I have a table company_totals with the following schema:
column_name       | column_data_type
------------------|-----------------
company           | STRING
link              | STRING
full_count        | FLOAT
starts_with_count | FLOAT
Number of rows = 12,000,000. Table size = 1.6 GB. Clustered by: company, link. A SEARCH INDEX has been created on the link column.
I have the following SELECT statement, which runs for more than 6 hours and ends in a timeout: "Operation timed out after 6.0 hours. Consider reducing the amount of work performed by your operation so that it can complete within this limit."
SELECT first_table.company, first_table.link, NULL AS full_count,
       SUM(second_table.full_count) AS starts_with_count
FROM company_totals first_table, company_totals second_table
WHERE STARTS_WITH(second_table.link, first_table.link)
GROUP BY first_table.company, first_table.link
The query fills in the column starts_with_count, which for each row is the sum of full_count over all rows whose link starts with that row's link. In the company_totals table, starts_with_count is the only column I still need to fill; the other columns are already populated. Below I have added the expected values for this column manually to show what I'm after.
company | link                              | full_count | starts_with_count (expected)
--------|-----------------------------------|------------|------------------------------
abc     | http://www.abc.net1               | 1          | 15 (= sum(full_count) where link like 'http://www.abc.net1%')
abc     | http://www.abc.net1/page1         | 2          | 9 (= sum(full_count) where link like 'http://www.abc.net1/page1%')
abc     | http://www.abc.net1/page1/folder1 | 3          | 3 (= sum(full_count) where link like 'http://www.abc.net1/page1/folder1%')
abc     | http://www.abc.net1/page1/folder2 | 4          | 4
abc     | http://www.abc.net1/page2         | 5          | 5
xyz     | http://www.xyz.net1/              | 6          | 21
xyz     | http://www.xyz.net1/page1/        | 7          | 15
xyz     | http://www.xyz.net1/page1/file1   | 8          | 8
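For clarity, here is the same per-row definition written with LIKE instead of STARTS_WITH (just a restatement of the logic above; it performs the same quadratic amount of work, so it does not by itself fix the timeout):
-- restatement only: sum full_count over every row whose link begins with this row's link
SELECT a.company, a.link, SUM(b.full_count) AS starts_with_count
FROM company_totals a
JOIN company_totals b
  ON b.link LIKE CONCAT(a.link, '%')
GROUP BY a.company, a.link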
Any help with this issue is highly appreciated.

InfluxDB v1.8: subquery using MAX selector

I'm using InfluxDB 1.8 and trying to write a slightly more complex query than Influx was really designed for.
I want to retrieve all data that refers to the last month stored, based on tag and field values that my script stores (not the default "time" field that Influx creates). Say we have this infos measurement:
time                | field_month | field_week | tag_month | tag_week | some_data
--------------------|-------------|------------|-----------|----------|----------
1631668119209113500 | 8           | 1          | 8         | 1        | random
1631668119209113500 | 8           | 2          | 8         | 2        | random
1631668119209113500 | 8           | 3          | 8         | 3        | random
1631668119209113500 | 9           | 1          | 9         | 1        | random
1631668119209113500 | 9           | 1          | 9         | 1        | random
1631668119209113500 | 9           | 2          | 9         | 2        | random
Here 8 refers to August, 9 to September, and some_data is whatever was stored during a given week of that month.
I can use the MAX selector on field_month to get the last month stored (I can't use the Flux date package because I'm on v1.8). I also want the data grouped by tag_month and tag_week so I can COUNT how many times some_data was stored in each week of the month; that's why the same values are repeated in both field and tag keys. Something like this:
SELECT COUNT(field_month) FROM infos WHERE field_month = 9 GROUP BY tag_month, tag_week
Replacing 9 by MAX Selector:
SELECT COUNT(field_month) FROM infos WHERE field_month = (SELECT MAX(field_month) FROM infos) GROUP BY tag_month, tag_week
The first query works, but the second does not.
Am I doing something wrong? Is there any other possibility to make this work in v1.8?
NOTE: I know Influx wasn't meant to be used like this. I've already managed it easily with PostgreSQL, using an adapted form of the second query above, but until we finish moving over to Postgres we have to stick with InfluxDB v1.8.
In PostgreSQL you can try:
SELECT COUNT(field_month) FROM infos WHERE field_month =
(SELECT field_month FROM infos ORDER BY field_month DESC limit 1)
GROUP BY tag_month, tag_week;
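If you stay on the Postgres side, a window function can also do it in a single pass over the table (a sketch against the same infos columns as above):
SELECT tag_month, tag_week, COUNT(*) AS cnt
FROM (
  SELECT tag_month, tag_week, field_month,
         MAX(field_month) OVER () AS max_month   -- latest month stored, computed alongside each row
  FROM infos
) t
WHERE field_month = max_month
GROUP BY tag_month, tag_week;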

Counting only the latest instance of multiple items based on filter context

I've got a large table of events that have occurred in an inventory of vehicles, which affect whether they are in service or out of service. I would like to create a measure that would be able to count the number of vehicles in the various inventories at any point in time, based on the events in this table.
This table is pulled from a SQL database into an Excel 2016 sheet, and I'm using PowerPivot to try to come up with the DAX measure.
Here is some example data from event_list:
vehicle_id event_date event event_sequence inventory
100 2018-01-01 purchase 1 in-service
101 2018-01-01 purchase 1 in-service
102 2018-02-04 purchase 1 in-service
100 2018-02-07 maintenance 2 out-of-service
101 2018-02-14 damage 2 out-of-service
101 2018-02-18 repaired 3 in-service
100 2018-03-15 repaired 3 in-service
102 2018-05-01 damage 2 out-of-service
103 2018-06-03 purchase 1 in-service
I'd like to be able to create a pivot table in Excel (or use CUBE functions, etc) to get an output table like this:
date in-service out-of-service
2018-02-04 3 0
2018-02-14 1 2
2018-03-15 3 0
2018-06-03 3 1
Essentially, I want to be able to calculate the inventory based on any date in time. The example only has a few dates, but hopefully provides enough of a picture.
I've basically come up with this so far, but it counts more vehicles than desired - I can't figure out how to only take the latest event_sequence or event_date and use that to count the inventory.
cumulative_vehicles_at_date:=CALCULATE(
COUNTA([vehicle_id]),
IF(IF(HASONEVALUE (event_list[event_date]), VALUES (event_list[event_date]))>=event_list[event_date],event_list[event_date])
)
I tried using MAX() and EARLIER() functions, but they don't seem to work.
Edit: Added the PowerBI tag as I'm now using that software to attempt to solve this as well. See comments on Alexis Olson's answer.
I think I've found a much cleaner method than I gave previously.
Let's add two calculated columns to the event_list table: one that counts vehicles "in-service" as of that date and one that counts vehicles "out-of-service" as of that date.
InService =
VAR Summary = SUMMARIZE(
FILTER(event_list,
event_list[event_date] <= EARLIER(event_list[event_date])),
event_list[vehicle_id],
"MaxSeq", MAX(event_list[event_sequence]))
VAR Filtered = FILTER(event_list,
event_list[event_sequence] =
MAXX(
FILTER(Summary,
event_list[vehicle_id] = EARLIER(event_list[vehicle_id])),
[MaxSeq]))
RETURN SUMX(Filtered, 1 * (event_list[inventory] = "in-service"))
You can create an analogous calculated column for OutOfService or you can just take the total minus the InService count.
OutOfService =
CALCULATE(
DISTINCTCOUNT(event_list[vehicle_id]),
FILTER(event_list,
event_list[event_date] <= EARLIER(event_list[event_date])))
- event_list[InService]
Now all you have to do is put event_date on the matrix visual rows section and add the InService and OutOfService columns to the values section (use Maximum or Minimum for the aggregation option rather than Sum).
Here's the logic behind the calculated column InService:
We first create a Summary table which calculates the maximal event_sequence value for each vehicle. (We filter the event_date to only consider dates up to the current one we are working with.)
Now that we know what the last event_sequence value is for each vehicle, we use that to filter the entire table down to just the rows that correspond to those vehicles and sequence values. The filter goes through the table row by row and checks to see if the sequence value matches the one we calculated in the Summary table. Note that when we filter the Summary table to just the vehicle we are currently working with, we only get a single row. I'm just using MAXX to extract the [MaxSeq] value. (It's kind of like using LOOKUPVALUE, but you can't use that on a variable.)
Now that we've filtered the table just to the most recent events for each vehicle, all we need to do is count how many of them are "in-service". I used a SUMX here where the 1*(True/False) coerces the boolean value to return 1 or 0.
This is pretty difficult. I don't have a great answer, but here's something that kind of works.
You'll create a new calculated table where you'll calculate the status for each vehicle on each date. Start with the base cross join for each vehicle and each date:
= CROSSJOIN(VALUES(event_list[vehicle_id]), VALUES(event_list[event_date]))
Then add a calculated column to find the max sequence number for each vehicle on that date.
Sequence = MAXX(
FILTER(event_list,
event_list[event_date] <= Cross[event_date] &&
event_list[vehicle_id] = Cross[vehicle_id]),
event_list[event_sequence])
Now you can lookup the inventory value for each vehicle/sequence pair with another calculated column:
Inventory = LOOKUPVALUE(
event_list[inventory],
event_list[vehicle_id], Cross[vehicle_id],
event_list[event_sequence], Cross[Sequence])
The result is one row per vehicle and date, with the latest Sequence value and the corresponding Inventory status.
Once you have this, you can create a matrix using this calculated table. Put the event_date on the rows and Inventory on the columns. Filter out blank inventory values in the visual level filter and put the vehicle_id in the values field, using a count or distinct count as the aggregation method (instead of the default sum).

SQL percentage usage calculation using 2 columns

Trying to get the percentage usage for a report based on the following columns:
Dept Ext Sec1 Sec2 StartDate EndDate
---------------------------------------------------------------
1 1234 5 5 2017-05-01:08:00:00 2017-05-04:08:00:10
2 1230 8 8 2017-05-01:09:10:00 2017-05-04:09:10:11
1 1234 15 15 2017-05-02:08:01:00 2017-05-04:08:01:20
I need to display the percentage of time the user spent on the phone, based on the total seconds in Sec1, for the period in question. If need be, I can create a third column with the percentage total as part of the creation job (the final table is generated from a join of two other tables). Thanks
I had to add these lines to my createDB query to get the right results:
ALTER TABLE compinfo.dbo.pabxreport ADD TotalSec INT
ALTER TABLE compinfo.dbo.pabxreport ADD TotalPer DECIMAL(14,8)

UPDATE compinfo.dbo.pabxreport
SET TotalSec = (SELECT SUM(billsec1) FROM pabxreport)

UPDATE compinfo.dbo.pabxreport
SET TotalPer = (billsec1 * 100.00 / TotalSec)
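If you would rather not store the totals at all, the same percentage can be computed at query time with a window aggregate (a sketch, assuming SQL Server 2005 or later and the billsec1 column used above):
-- per-row share of the overall billsec1 total, no extra columns needed
SELECT billsec1,
       billsec1 * 100.00 / SUM(billsec1) OVER () AS TotalPer
FROM compinfo.dbo.pabxreport;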

Group by time span in rails

I want to get output from the users table based on when each record was created. The time is stored in the created_at column.
The output should look like this:
Time user count
2 am - 6 am 10
6 am - 10 am 5
10 am - 2 pm 5
2 pm - 6 pm 5
6 pm - 10 pm 5
10 pm - 2 am 5
I can't simply group by created_at. The solution I've found is to create another column, say time_span, set it to e.g. "2 am - 6 am" whenever the created_at time falls in that span, and then group by time_span. Is there a better solution?
My suggestion is to create another column in the database; that way you avoid calculations on selects at the expense of a simple column.
I'm not sure what you mean by not being able to use group_by, but the following will work:
hours = User.all.collect { |u| u.created_at.hour }   # hour of day (0-23) for every user
ranges = [(2...6), (6...10), (10...14), (14...18), (18...22), (22...26)]
summary = hours.group_by { |h| ranges.find { |r| r === (h < 2 ? h + 24 : h) } }  # 0-1 am roll into the 10 pm - 2 am bucket
ranges.each { |r| puts "#{r} = #{(summary[r] || []).length}" }
There are probably opportunities to simplify this and you could push the grouping up into the database if you'd like, but I thought I'd go ahead and share.
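If you do push the grouping into the database, something along these lines should work in PostgreSQL (a sketch, assuming a standard users.created_at timestamp; the hour is shifted so that 0-1 land in the 10 pm - 2 am bucket):
SELECT ((EXTRACT(HOUR FROM created_at)::int + 22) % 24) / 4 AS bucket,  -- 0 = 2 am - 6 am, ..., 5 = 10 pm - 2 am
       COUNT(*) AS user_count
FROM users
GROUP BY bucket
ORDER BY bucket;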