I want to create a conditional formula for some charts in Qlik Sense.
I want to calculate the average of a KPI, ATD, using only the rows where another column meets a condition (W = 1). For example:
Class W ATD
A 1 1
A 1 3
A 0 1
B 1 1
This should give Condi.Avg = 2 for class A (the average of 1 and 3; the row with W = 0 is ignored).
In general, the result should be a new table (for W = 1):
Class Condi.Avg
A 2
B 1
Right now I have:
Avg({<W= {1}> ATD)
which produces a column in my charts that shows only "-".
How can I change this?
I think there is a typo in your expression: the set expression is missing its closing }. Try:
Avg({<W = {'1'}>} ATD)
This should return a result.
Edit (from the author):
Avg({<[W] = {'1'}>} ATD)
is working.
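To check the expected numbers outside Qlik Sense, here is a minimal plain-Python sketch (not Qlik code; the helper name conditional_avg is hypothetical). It does roughly what the set expression does for a chart with Class as the dimension: keep only the rows with W = 1, then average ATD within each class.

```python
from collections import defaultdict

# Sample rows from the question: (Class, W, ATD)
rows = [("A", 1, 1), ("A", 1, 3), ("A", 0, 1), ("B", 1, 1)]


def conditional_avg(rows, w_value=1):
    """Average ATD per Class over the rows where W == w_value (hypothetical helper)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cls, w, atd in rows:
        if w == w_value:  # the filter that the set modifier <W={'1'}> applies
            sums[cls] += atd
            counts[cls] += 1
    # One average per class that has at least one matching row
    return {cls: sums[cls] / counts[cls] for cls in sums}


print(conditional_avg(rows))  # {'A': 2.0, 'B': 1.0}
```

Class A averages only its two W = 1 rows (1 and 3), which is why the result is 2 rather than the unfiltered 5/3.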
As promised, I tried building my own table; here are my results.
Here is my load script:
LOAD * INLINE [
Class, W, ATD
A, 1, 1
A, 1, 3
A, 0, 1
B, 1, 1
];
Then I added a table object with one dimension (the field Class) and one measure with the expression:
Avg({<W={'1'}>}ATD)
The resulting table is exactly the same as your expected result:
Class Condi.Avg
A 2
B 1
It may be that one of the other dimensions in your chart is interfering with the measure.
Edit from the author:
Avg({<[W]={'1'}>} ATD) is working.
I have a table whose structure looks like the following:
k | i | p | v
Note that the key (k) is not unique; the table has no keys or constraints at all. Each key can have multiple attributes (i = 0, 1, 2, ...), which can be of different types (p) and have different values (v). One attribute type may also appear multiple times for the same key (p(i-1) = p(i)).
What I want to do is pick certain attribute types and their corresponding values and place them in the same row. For example I want to have:
k | attr_name1 | attr_name2
I have managed to make a query that does this and works for all keys (k) for which attr_name1 and attr_name2 appear in the column p of the initial table:
SELECT DISTINCT ON (key) fn.k AS key, fn.v AS attr_name1, a.v AS attr_name2
FROM Table fn
LEFT JOIN Table a ON fn.k = a.k
AND a.p = 'attr_name2'
WHERE fn.p = 'attr_name1'
I would like, however, to handle the case where a key has no attribute named attr_name1 and insert a NULL into the corresponding column of the new table. I am not sure how to achieve that. I have no issue using multiple queries, intermediate tables, etc., but the table has a lot of rows and I need something that scales to millions of rows.
Any help would be appreciated.
Example:
k i p v
1 0 a 10
1 1 b 12
1 2 c 34
1 3 d 44
1 4 e 09
2 0 a 11
2 1 b 13
2 2 d 22
2 3 f 34
This would turn into (assuming I am only interested in attributes a, b and c):
k a b c
1 10 12 34
2 11 13 NULL
I would use conditional aggregation, that is, an aggregate function wrapped around a CASE expression.
SELECT
k,
MAX(CASE WHEN p='a' THEN v END) AS a,
MAX(CASE WHEN p='b' THEN v END) AS b,
MAX(CASE WHEN p='c' THEN v END) AS c
FROM
your_table
GROUP BY
k
This presumes that (k, p) is unique. If it is not, MAX returns the highest v for each (k, p) pair. A key with no row for a given attribute gets NULL in that column, because the CASE expression yields NULL for every one of its rows.
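A quick way to see the NULL behaviour is to run the query against SQLite through Python's standard sqlite3 module. This sketch loads the example rows (v stored as an integer, so 09 becomes 9) plus one hypothetical key 3 that has no 'a' attribute, which is the case the question asks about.

```python
import sqlite3

# In-memory database; "your_table" is the placeholder name used in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (k INTEGER, i INTEGER, p TEXT, v INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (1, 0, "a", 10), (1, 1, "b", 12), (1, 2, "c", 34), (1, 3, "d", 44), (1, 4, "e", 9),
        (2, 0, "a", 11), (2, 1, "b13"[0], 13), (2, 2, "d", 22), (2, 3, "f", 34),
        (3, 0, "b", 5),  # hypothetical key with no 'a' attribute
    ],
)

# Conditional aggregation: one MAX(CASE ...) per wanted attribute.
PIVOT_SQL = """
SELECT
    k,
    MAX(CASE WHEN p = 'a' THEN v END) AS a,
    MAX(CASE WHEN p = 'b' THEN v END) AS b,
    MAX(CASE WHEN p = 'c' THEN v END) AS c
FROM your_table
GROUP BY k
ORDER BY k
"""

for row in conn.execute(PIVOT_SQL):
    print(row)
# (1, 10, 12, 34)
# (2, 11, 13, None)
# (3, None, 5, None)
```

SQLite has no DISTINCT ON, but this query is plain SQL and should run unchanged on PostgreSQL too; it reads the table in a single grouped pass instead of one self-join per attribute.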
As a general rule, this kind of pivoting makes the data harder to process further in SQL. It is usually done for display, because people find the wide layout easier to read. From a software-engineering perspective, however, such formatting belongs in the presentation layer rather than the data layer, so be careful that pivoting here doesn't make your future work harder.
The code below:
df = pd.read_csv('./filename.csv', header='infer').dropna()
df.groupby(['category_code','event_type']).event_type.count().head(20)
Returns a table of event counts for each (category_code, event_type) pair.
For every category_code that has both "purchase" and "view" events, how can I obtain the ratio of the total number of "purchase" events to the total number of "view" events?
In this specific case, for instance, I need a function that returns:
1/57
1/232
3/249
Eventually, I will need to plot the result.
I have been trying for a day, without success. I am still new to pandas, and I searched across every possible forum without finding anything useful.
Next time, please consider adding a sample of your data as text instead of as an image. It helps us test.
Anyway, in your case you can combine DataFrame methods: groupby, as you have already done, and pivot_table. I used the following data as an example:
category_code event_type
0 A purchase
1 A view
2 B view
3 B view
4 C view
5 D purchase
6 D view
7 D view
First, create a new column from your groupby:
df['event_count'] = df.groupby(['category_code', 'event_type'])\
.event_type.transform('count')
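Then create a pivot table. pivot_table aggregates with the mean by default, which here simply returns the count, because every row in a group carries the same event_count: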
my_table = df.pivot_table(values='event_count',
index='category_code',
columns='event_type',
fill_value=0)
Then, finally, you can calculate the purchase_ratio directly:
my_table['purchase_ratio'] = my_table['purchase'] / my_table['view']
Which results in the following DataFrame:
event_type purchase view purchase_ratio
category_code
A 1 1 1.0
B 0 2 0.0
C 0 1 0.0
D 1 2 0.5
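The table above still lists categories with no purchases (B and C), while the question asks only for groups that have both event types. In pandas, that filter would be my_table[(my_table['purchase'] > 0) & (my_table['view'] > 0)]. The same counting and filtering logic is sketched below in plain Python with no pandas dependency (purchase_view_ratio is a hypothetical helper), using the example data from this answer.

```python
from collections import Counter

# Example (category_code, event_type) pairs from the answer above.
events = [
    ("A", "purchase"), ("A", "view"),
    ("B", "view"), ("B", "view"),
    ("C", "view"),
    ("D", "purchase"), ("D", "view"), ("D", "view"),
]


def purchase_view_ratio(events):
    """Return {category: purchases / views} for categories that have both event types."""
    counts = Counter(events)  # (category, event_type) -> number of events; missing keys count as 0
    categories = {cat for cat, _ in events}
    return {
        cat: counts[(cat, "purchase")] / counts[(cat, "view")]
        for cat in sorted(categories)
        if counts[(cat, "purchase")] > 0 and counts[(cat, "view")] > 0
    }


print(purchase_view_ratio(events))  # {'A': 1.0, 'D': 0.5}
```

Requiring both counts to be positive also avoids dividing by zero for a category that has purchases but no views.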
I want to create a DataFrame with two columns, one called 'id' and one called 'SalePrice':
submission = pd.DataFrame({'SalePrice':pre})
It looks like this:
SalePrice
0 183242.025920
1 188796.451732
2 187878.763989
3 179789.672031
I know that I can name the index, but I need id to be a normal column, on the same level as SalePrice. Does anyone know how to do that?
Try creating it with the DataFrame constructor:
import numpy as np
submission = pd.DataFrame({'id': np.arange(len(pre)), 'SalePrice': pre})
Alternatively, just use reset_index, as Andy L. suggested. Here's the full code:
import pandas as pd
submission = pd.DataFrame({'SalePrice':[1,2,3,4]}).reset_index()
submission.rename(columns = {'index':'id'}, inplace=True)
print(submission)
The output:
id SalePrice
0 0 1
1 1 2
2 2 3
3 3 4
I am trying to condense a data table that has separate rows for each ID: one row holds an intent string and the following rows hold one or more log strings. There can be more than one set of intents/logs for each ID. I want to copy each intent string down into a separate column so that it appears on the same rows as its associated log strings.
I've "tried" LAG(tobi_intent, 1,0) OVER (ORDER BY datevalue) as AssociatedIntent
but, first, this isn't valid code, and second, it wouldn't ensure that the associated intent and logs belong to the same ID.
Can anyone advise on the correct SQL to get the output below?
Expected output:
ID | log | intent | associated_intent
1  |     | x      |
1  | b   |        | x
1  | a   |        | x
1  |     | u      |
1  | f   |        | u
2  |     | x      |
2  | f   |        | x
5  |     | e      |
5  | a   |        | e
5  | s   |        | e
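One possible approach, sketched under assumptions: datevalue orders the rows within each ID, and the table and log column are called events and log (both names hypothetical; only tobi_intent and datevalue come from the question). Instead of LAG, which only looks back a fixed number of rows, a running COUNT of intents per ID numbers each intent block, and MAX over each (ID, block) partition copies that block's intent onto its log rows. The sketch runs the query in SQLite (window functions need SQLite 3.25 or later) via Python's sqlite3 module.

```python
import sqlite3

# Hypothetical table "events"; sample rows reproduce the expected output above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ID INTEGER, datevalue INTEGER, log TEXT, tobi_intent TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, 1, None, "x"), (1, 2, "b", None), (1, 3, "a", None),
        (1, 4, None, "u"), (1, 5, "f", None),
        (2, 6, None, "x"), (2, 7, "f", None),
        (5, 8, None, "e"), (5, 9, "a", None), (5, 10, "s", None),
    ],
)

# Inner query: COUNT(tobi_intent) counts non-NULL intents seen so far within the ID,
#   so each intent row starts a new block number (grp) and its log rows share it.
# Outer query: each (ID, grp) partition holds exactly one non-NULL intent,
#   which MAX picks up; the CASE fills it in only on log rows.
QUERY = """
SELECT ID,
       log,
       tobi_intent AS intent,
       CASE WHEN log IS NOT NULL
            THEN MAX(tobi_intent) OVER (PARTITION BY ID, grp)
       END AS associated_intent
FROM (
    SELECT *,
           COUNT(tobi_intent) OVER (
               PARTITION BY ID ORDER BY datevalue
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS grp
    FROM events
) AS t
ORDER BY ID, datevalue
"""

for row in conn.execute(QUERY):
    print(row)
```

Partitioning by ID is what keeps intents from leaking across IDs, which was the second concern with the LAG attempt. The same window-function pattern should carry over to other databases that support ROWS frames, though that has not been checked here.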
Example:
column A column B
A 1
A 2
B 2
B 2
C 1
C 1
I would like to get the following result:
column A column B
A 1.5
B 2
C 1
(the averages of 1 and 2, of 2 and 2, and of 1 and 1)
How do I achieve that?
Thanks
If you're using Excel 2007 or above, you can also use the shorter AVERAGEIF function:
=AVERAGEIF($A$1:$A$6,D1,$B$1:$B$6)
Less typing, and easier to read.
In D1:D3, type A, B and C. Then put this formula in E1:
=SUMIF($A$1:$A$6,D1,$B$1:$B$6)/COUNTIF($A$1:$A$6,D1)
and fill down to E3. If you want to replace the existing data, copy E1:E3 and paste-special-values over itself. Then delete A:C.
Alternatively, you can add headers to your data, say "Letter" and "Number". Then create a Pivot Table from your data. Put Letter in the rows section and Number in the Data section. Change your Data section from SUM to AVERAGE and you'll get the same result.
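The SUMIF/COUNTIF formula divides the sum of the matching column B values by the number of matching rows, which is what AVERAGEIF and the pivot table's AVERAGE do in one step. A small plain-Python sketch of that arithmetic on the question's data (sumif and countif are hypothetical helpers named after the Excel functions, not Excel APIs):

```python
# Rows from the question: (column A, column B)
data = [("A", 1), ("A", 2), ("B", 2), ("B", 2), ("C", 1), ("C", 1)]


def sumif(rows, key):
    """Like =SUMIF(A range, key, B range): sum column B where column A equals key."""
    return sum(b for a, b in rows if a == key)


def countif(rows, key):
    """Like =COUNTIF(A range, key): count rows where column A equals key."""
    return sum(1 for a, _ in rows if a == key)


# One output row per distinct letter, in first-seen order (like typing A, B, C in D1:D3).
letters = list(dict.fromkeys(a for a, _ in data))
averages = {letter: sumif(data, letter) / countif(data, letter) for letter in letters}
print(averages)  # {'A': 1.5, 'B': 2.0, 'C': 1.0}
```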