QlikView - Struggling to produce Bubble / scatter from record-data - qlikview

I would like to make a Bubble-scatter where the data looks like:
Each row is an 'event', with a Day of the event and the event's grade
Day | Grade
------------
1 | A
1 | A
1 | B
1 | (empty)
1 | B
2 | A
I want this to turn into a bubble graph that looks like :
Day along the X-axis ( 1, 2)
On Y-axis I would like to see A,B (vertically)
And I would expect
one big bubble for day-1 A
one big bubble for day-1 B
one little bubble for day-2 A
Given the data above
Instead, it refuses to display anything at all, reporting 'undefined values'.
I'm really struggling to understand how this bubble/scatter works, and the documentation isn't helping
It asks for Dimension, Measure, Measure and I am putting in many variations of Day, Count(Grade) and Avg(Grade)

Bubble charts are a bit tricky.
In your case you need 2 dimensions: dimension 1 (Grade) and dimension 2 (Day).
Expression is as simple as sum(1) (take into account that you are already double-filtering the data because you have 2 dimensions). To avoid extra bubbles due to the Null entries, just select "ignore null values" in the dimension panel options.
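To make the intent concrete, here is a minimal Python sketch (not QlikView script) of what sum(1) over the two dimensions computes for the sample data: one count per (Day, Grade) pair, with the empty grade skipped. The names events and bubble_sizes are hypothetical and used only for illustration.

```python
from collections import Counter

# Sample events from the question: (Day, Grade); None stands for the empty grade.
events = [(1, "A"), (1, "A"), (1, "B"), (1, None), (1, "B"), (2, "A")]

# One counter entry per (Day, Grade) pair, ignoring null grades.
# This mirrors sum(1) evaluated per combination of the two dimensions.
bubble_sizes = Counter(
    (day, grade) for day, grade in events if grade is not None
)

for (day, grade), size in sorted(bubble_sizes.items()):
    print(f"Day {day}, Grade {grade}: bubble size {size}")
```

Each key corresponds to one bubble position (Day on X, Grade on Y), and the count is the bubble size.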

Related

How to find frequency of element list in data frame using pandas?

I have a list and a data frame. I want to count the occurrences of each entry in the list (some entries are pairs of words) for each of the "emotions" in the data frame.
Here is my list:
[(frozenset({'know'}), 16528),
(frozenset({'im'}), 39047),
(frozenset({'feeling'}), 99455),
(frozenset({'like'}), 49332),
(frozenset({'feel', 'im'}), 16602),
(frozenset({'feeling', 'im'}), 23488),
(frozenset({'feel'}), 202985),
(frozenset({'feel', 'like'}), 42162),
(frozenset({'time'}), 17203),
(frozenset({'really'}), 17247)]
and this is my data frame:
Unnamed: 0 id text emotions
0 0 27383 [feel, awful, job, get, position, succeed, hap... sadness
1 1 110083 [im, alone, feel, awful] sadness
2 2 140764 [ive, probably, mentioned, really, feel, proud... joy
3 3 100071 [feeling, little, low, day, back] sadness
4 4 2837 [beleive, much, sensitive, people, feeling, te... love
Here is the expected output:
Six columns, one for each of the six existing emotions, plus a last column for the total count.
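One possible way to build such a table is sketched below. It assumes that "count" means the number of rows whose token list contains every word of an entry, and that the text column already holds Python lists of tokens; the small data frame and the helper name count_itemsets are made up for illustration.

```python
import pandas as pd

# Hypothetical, shortened versions of the list and data frame in the question.
itemsets = [
    frozenset({"feel"}),
    frozenset({"im"}),
    frozenset({"feel", "im"}),
    frozenset({"feeling"}),
]

df = pd.DataFrame({
    "text": [
        ["feel", "awful", "job"],
        ["im", "alone", "feel", "awful"],
        ["really", "feel", "proud"],
        ["feeling", "little", "low", "day", "back"],
    ],
    "emotions": ["sadness", "sadness", "joy", "sadness"],
})


def count_itemsets(frame, entries):
    """Count, per emotion, the rows whose tokens contain all words of each entry."""
    token_sets = frame["text"].map(set)
    rows = {}
    for items in entries:
        hits = token_sets.map(items.issubset)  # True where every word is present
        label = ", ".join(sorted(items))
        rows[label] = frame.loc[hits, "emotions"].value_counts()
    table = pd.DataFrame(rows).T.fillna(0).astype(int)
    table["total"] = table.sum(axis=1)  # last column: total across emotions
    return table


result = count_itemsets(df, itemsets)
print(result)
```

With the full data the table would have one column per emotion plus the total column; emotions that never match any entry would have to be added explicitly.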

LINQ - Select rows based on whether their sum meets a condition

I’ve run into a problem, as I cannot get a proper working LINQ statement here.
Suppose I have a DataTable with x rows and I have to sort based on the sum of the Quantity column. Then I have a condition Requested Quantity = 20. I need to find the rows equal to the exact sum of RequestedQuantity, but only where the combination of 3 rows is equal to it.
+-----+----------+
| Bin | Quantity |
+-----+----------+
| 1 | 10 |
| 2 | 5 |
| 3 | 5 |
| 4 | 10 |
| 5 | 15 |
+-----+----------+
I can’t seem to figure out the proper LINQ syntax to get this to work. My starting point is this:
From row In StorageBins.AsEnumerable.GroupBy( _
Convert.ToDouble(Function (x) x("Quantity"), cultureInfo)).Sum( _
Function (y) Convert.ToDouble(y("Quantity"), cultureInfo) = _
Double.Parse(RequestedQuantity,cultureInfo))
Initially, I am just trying to get any rows that are equal to my condition. My end-goal, however, is getting any three rows that exactly sum up to my Requested quantity.
I’m not an expert in LINQ, unfortunately. I hope some of you might be!
Maybe I'm missing something, but this actually seems like a pretty complicated problem. Pick any 3 records, but only 3, that add up to exactly 20. How many rows are there in the database? Because this could get to be quite a few potential combinations pretty quickly. And what do you do after you get the 3? Do you have to go back through recursively and group up the other records as well? Or you just need the first set of 3 that add up to 20?
Assuming you just need the first 3, I would do something like this:
1. Get the first record that is less than 20. Remove it from your input list and put it into your target set.
2. Then get the first record that is less than 20 minus the first value, i.e. if the first value was a '5', get records that are less than 15 (20 minus 5). This ensures you 'leave room' for the third value. Remove it from the original list and put it into your target set.
3. Then get the first record that is exactly 20 minus number one minus number two. Remove it from the input list and put it into the target set.
Now you would have to do this in iterators. If there is no value that meets the third criterion, release the third value from your target set and put it back in your input list. Then go back to step 2 and pick the next record that matches step 2 (and ideally that is not equal to the previous value). And if you exhaust all of the iterations through step 2, go back to step one and pick the next value there, and start the whole thing over again...
Unless I'm misunderstanding your requirement...
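As a rough illustration of the "first set of 3" case, here is a short Python sketch rather than LINQ. It replaces the hand-written backtracking with itertools.combinations, which walks through the same candidate triples in order without the early pruning. The function name first_triple and the tuple layout (Bin, Quantity) are assumptions for the example.

```python
from itertools import combinations

# (Bin, Quantity) rows from the table in the question.
bins = [(1, 10), (2, 5), (3, 5), (4, 10), (5, 15)]


def first_triple(rows, target, size=3):
    """Return the first combination of `size` rows whose quantities sum to target."""
    for combo in combinations(rows, size):
        if sum(quantity for _, quantity in combo) == target:
            return combo
    return None  # no combination adds up to the target


match = first_triple(bins, 20)
print(match)
```

The number of triples grows roughly with the cube of the row count, so for a large table the pruning described in the steps above (skipping values that leave no room for the remaining rows) becomes worthwhile.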

Hive, pass different row into a function

I have two tables. In the first table, each row represents a polygon. In the second table, each row represents a point. I want to find out whether each point lies within one of the polygons. I am trying to use ST_Contains, ST_MultiPolygon, and ST_Point in Hive. I think there is a way to feed all rows into ST_MultiPolygon, but I am not sure how to do that. The following is my test data.
pid | shape
1 | [2,0,3,0,3,1,2,1]
2 | [0,0,1,0,1,1,0,1]
This is Polygon table.
pid | x | y
1 | 0.5 | 0.5
2 | 2.1 | 0.5
3 | 1.5 | 0.5
This is Point table
I want to get the result like
pid | is_in
1 | true
2 | true
3 | false
Here is how I think this problem could be solved. What I want to do is determine whether a point is in one of the polygons stored in Hive. Suppose I have 2 polygons, [2,0,3,0,3,1,2,1] and [0,0,1,0,1,1,0,1], which really are [(2,0), (3,0), (3,1), (2,1)] and [(0,0), (1,0), (1,1), (0,1)]. The reason I store them in this odd way is that ST_MultiPolygon takes this format as a parameter, like ST_MultiPolygon(array(2,0,3,0,3,1,2,1)). Combining ST_MultiPolygon with ST_Contains and ST_Point, I can get a boolean result that indicates whether a point is in a MultiPolygon. ST_MultiPolygon can even take multiple arrays, such as ST_MultiPolygon(array(2,0,3,0,3,1,2,1), array(0,0,1,0,1,1,0,1)). So if there is a way to feed all the polygons into one MultiPolygon, I will know whether a point is in one of the polygons.
Any comment will be appreciated.
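To pin down the expected result independently of Hive, here is a small Python sketch that checks each point against every polygon using a plain ray-casting test. It does not use the ST_ functions; the helper names to_vertices and contains are hypothetical, and points exactly on an edge are not treated specially.

```python
def to_vertices(flat):
    """Turn [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


def contains(vertices, x, y):
    """Ray-casting test: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Test data from the question, keyed by pid.
polygons = {1: [2, 0, 3, 0, 3, 1, 2, 1], 2: [0, 0, 1, 0, 1, 1, 0, 1]}
points = {1: (0.5, 0.5), 2: (2.1, 0.5), 3: (1.5, 0.5)}

# A point counts as "in" if any polygon contains it.
is_in = {
    pid: any(contains(to_vertices(shape), x, y) for shape in polygons.values())
    for pid, (x, y) in points.items()
}
print(is_in)
```

This is the same "point against all polygons" pairing that a cross join between the two tables would express in a query.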

Libreoffice Calc Finding MAX from a subset of results

I have a Libreoffice Calc workbook for tracking writing, with 3 sheets in it. 'Time Tracking', 'Time Summary' and 'Yearly Stats'. 'Time Tracking' is where user data is entered, 'Time Summary' is a pivot table for 'Time Tracking'; and 'Yearly Stats' shows long-term progress.
Time Summary (running off some test data) looks a bit like this:
|Column A (Weeks) | ... |Column M (Total Words)
-------+-----------------------+-----+----------------------
Row 7 |02/10/17 - 08/10/17 | |3500
Row 8 |13/11/17 - 19/11/17 | |2300
Row 9 |30/04/18 - 06/05/18 | |1000
Row 10 |30/10/17 - 05/11/17 | |700
Yearly Stats looks like this:
|A |B |C
-------+--------------------+--------+----
Row 1 | |2017 |2018
Row 2 |Total Words |6500 |1000
...
Row 7 |Max Words (Week) |3500 |3500
The formula for 'Yearly Stats'.B7:C7 is currently =MAX($'Time Summary'.$M$7:$M$10), but I need to modify it to filter by the year on the column heading.
https://ask.libreoffice.org/en/question/62260/minif-and-maxif-function-in-calc/ looked to be useful, but when I tried it, the MAX from the formula was returning the MAX of ROW - being 10 - rather than ROW returning the position of the MAX value - even though it seems to work in the example file from the link.
The example formula is:
=IFERROR(INDEX($Sheet1.$J$2:$J$13,MAX(ROW($Sheet1.$J$2:$J$13)*($Sheet1.$A$2:$A$13=A2))-1,1),NA())
My formula uses RIGHT() to compare the last two characters of the column heading with the last two chars of the week in $'Time Summary':$A$7:$A$10 and is:
=IFERROR(INDEX($'Time Summary'.$M$7:$M$10,MAX(ROW('Time Summary'.$M$7:$M$10)*(RIGHT($'Time Summary'.$A7:$A$10,2)=RIGHT(B1,2)))-6,1),NA())
I have, of course, remembered to press CTRL+SHIFT+ENTER as the instructions say, to get the array in the formula to work.
So that's the explanation of my problem. What is it that I'm getting wrong?
Ok, this is a bit long-winded, but I've managed to solve the problem by using the following formula:
=IF(MAX(IF(RIGHT(INDIRECT(CONCATENATE("$'Time Summary'.$A7:$A$",COUNTIF($'Time Summary'.$A:$A,"<>''")+2)),2)=RIGHT(B1,2),INDIRECT(CONCATENATE("$'Time Summary'.$Q$",ROW(INDIRECT(CONCATENATE("$'Time Summary'.$Q7:$Q$",COUNTIF($'Time Summary'.$Q:$Q,"<>''")+5))))),0))>0,MAX(IF(RIGHT(INDIRECT(CONCATENATE("$'Time Summary'.$A$7:$A$",COUNTIF($'Time Summary'.$A:$A,"<>''")+2)),2)=RIGHT(B1,2),INDIRECT(CONCATENATE("'Time Summary'.$Q",ROW(INDIRECT(CONCATENATE("$'Time Summary'.$Q$7:$Q$",COUNTIF($'Time Summary'.$Q:$Q,"<>''")+5))))),0)),NA())
It is wrapped in an IF that replaces any 0 result with '#NA' (just for neatness of output).
Also the right half of the ranges specified make use of a calculation to figure out where the bottom row is, leaving out the total, so that's another reason it's so huge.
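For reference, the filtering the formula performs can be written out in a few lines of Python. This is only a sketch of the logic (match the last two characters of the week label against the last two digits of the year, then take the maximum), not a Calc formula; the function name max_words_for_year is hypothetical.

```python
# (week label, total words) rows from the 'Time Summary' test data.
weeks = [
    ("02/10/17 - 08/10/17", 3500),
    ("13/11/17 - 19/11/17", 2300),
    ("30/04/18 - 06/05/18", 1000),
    ("30/10/17 - 05/11/17", 700),
]


def max_words_for_year(rows, year):
    """Largest weekly total among weeks whose label ends in the year's last two digits."""
    suffix = str(year)[-2:]
    matches = [words for label, words in rows if label[-2:] == suffix]
    return max(matches) if matches else None  # None plays the role of #N/A


print(max_words_for_year(weeks, 2017))
print(max_words_for_year(weeks, 2018))
```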

Highlighting Values in a Crystal Reports Crosstab based on sibling values

I have a crosstab whose rows indicate different classes, with people's names across the top.
| | Required | Person 1 | Person 2 | Person 3 |
| Class 1 | 8 6 | 1 6 | 3 6 | 4 6 |
| Class 2 | 6 2 | 3 2 | 2 2 | 1 2 |
Each field contains 2 values. The first value is the number of hours spent in the class; the second value is the number of hours required for certification.
The Required field is my grand total summary.
In the cross tab expert the fields are defined as follows.
Rows:
Command.descr -> a field containing the class names
Columns:
Command.fullname -> a field containing students full names
Summarized Fields:
Sum of Command.evlength -> summation of all time spent in a given course
Max of #required -> this formula returns the number of required hours based on the course name
I am trying to highlight the field Sum of Command.evlength if it is greater than or equal to the value of Max of #required.
My solution was to perform background formatting. Right-Click on the Sum of Command.evlength field, select Format Field. Click the borders tab, check Background, and enter a formula.
The formula I was using is:
if CurrentFieldValue >= {#required} then color(152, 251, 152) else crNoColor
This is not the correct formula. My crosstab has been placed in the footer, which causes {#required} to contain the last value in the grid which in the above example is 2.
From my research I thought I would have to use GridRowColumnValue(row or column name) to access the value of {#required} in the crosstab, but I could not come up with the correct string to represent it.
Does anyone have a way for me to correctly perform this comparison?
Frustratingly I don't think you can use the highlighting expert to compare to a dynamic value. You could swap the columns round then add the following formulas:
To the max_of_required background colour:
whileprintingrecords;
global numbervar required_hrs := currentfieldvalue;
crNoColor;
To the sum_of_command.evlength background colour:
whileprintingrecords;
global numbervar required_hrs;
if currentfieldvalue >= required_hrs then
crRed
else
crNoColor;
I think there are a few other ways, but I'm not as confident with those, so start here.