I am working on a Fitbit data set that has a user Id column (35 users) and a number of other activity columns recorded on different dates. I need to group all of the other columns by the user Id column in order to perform my analysis. Can anyone help with this?
What you're looking for is the function groupby(). You can group by one or more columns, and then perform aggregations per group on the rest of the columns:
df.groupby(['id', 'date']).agg([...])
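For example, a minimal sketch of how that could look for this kind of data (the column names steps, calories and date are just placeholders for your activity columns):
import pandas as pd

# hypothetical Fitbit-style frame: one row per user per day
df = pd.DataFrame({
    'id':       [1, 1, 2, 2],
    'date':     ['2016-04-12', '2016-04-13', '2016-04-12', '2016-04-13'],
    'steps':    [8500, 9200, 4300, 5100],
    'calories': [2100, 2250, 1800, 1900],
})

# one aggregated row per user Id: mean steps and total calories
per_user = df.groupby('id').agg(
    mean_steps=('steps', 'mean'),
    total_calories=('calories', 'sum'),
)
print(per_user)
If you instead want one row per user per date, group by both columns, as in df.groupby(['id', 'date']).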
I have a SQL table and I want to find the average adjusted amount for products, partitioned by store_id.
Here I need to compute adj_amt, which is the product of the previous two columns.
For this, I need to fill the nulls in av_quantity with the first non-null value in the partition. The query I use is below.
select
  case when av_quantity is null then
    -- the boolean argument tells first_value to ignore nulls
    first_value(av_quantity, true) over (
      partition by store_no
      order by product_id
      range between current row and unbounded following
    )
  else av_quantity
  end as adj_av_quantity
I'm having trouble with the SQL required to get the adjusted amount, since it's not pulling the first non-null value for factor but still fetches it based on the same row as adj_av_quantity. Any thoughts on how I could do this?
FYI, I've simplified the data here. The actual dataset is pretty huge (> 125 million rows with 800+ columns), so I won't be able to use joins and will have to do this via window functions. I'm using Spark SQL.
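One way to keep the multiplication aligned with the filled value is to compute adj_av_quantity in a subquery and multiply in the outer query. A sketch (the table name sales is made up, factor is the other column in the product, and coalesce stands in for the CASE WHEN so the same filled value can be reused):
select
  store_no,
  product_id,
  adj_av_quantity,
  adj_av_quantity * factor as adj_amt
from (
  select
    store_no,
    product_id,
    factor,
    coalesce(
      av_quantity,
      first_value(av_quantity, true) over (
        partition by store_no
        order by product_id
        range between current row and unbounded following
      )
    ) as adj_av_quantity
  from sales
) t
Because adj_av_quantity is materialized in the inner query, the outer multiplication always sees the filled value rather than the raw, possibly null, av_quantity from the same row.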
I have a problem with Tableau. In my worksheet I want to be able to use SUM(Field 1) to filter the data I see on my sheet. For example, if SUM(Sales) < 20, only show the sales information whose sum is less than 20. When I try to create a calculated field with the expression above, Tableau converts it into a boolean filter instead of restricting my data to meet the filter criteria.
Is there any solution for the problem?
Thanks
You could filter the result using a HAVING clause:
select my_key, sum(field1)
from my_table
group by my_key
having sum(field1)<20
I am trying to come up with some arithmetic calculations for some survey data. I want to do these calculations for a number of segments and want to figure out how to do it without writing numerous SELECT statements.
This is what I have so far:
FACT table. This table holds survey data at a respondent level - for example, if a survey had 10 questions, this table will have 11 columns: a column to identify the respondent_ID and 10 other columns to hold the responses to those questions.
DIMENSION table. This table holds the segments we want to view the survey data by, at a respondent level - for example, if we want to view survey responses by membership_status and age_bracket, this table will have 3 columns: a column to identify the respondent_ID, and two columns to identify membership_status and age_bracket.
OUTPUT.
I want aggregate calculations that summarize the responses to the survey overall and to each question. I also want to be able to get this information for all possible segments that exist in the DIMENSION table.
I can do the query below, however I'll need to do this for every segment:
SELECT
    COUNT(DISTINCT CASE WHEN f.QUESTION_1 IN ('8', '9', '10') THEN f.RESPONDENT_ID END) * 1.0
    / COUNT(DISTINCT CASE WHEN f.QUESTION_1 IS NOT NULL THEN f.RESPONDENT_ID END) AS CSAT_1
FROM FACT f
JOIN DIMENSION d ON f.RESPONDENT_ID = d.RESPONDENT_ID
WHERE d.MEMBERSHIP_STATUS = 'ACTIVE'
The calculation above gives us something called a top 3 box. That is just one calculation; I will need to do many of them. Additionally, every calculation will need to be done for each segment. To get the calculation for inactive members, I would need to run another query with d.MEMBERSHIP_STATUS = 'INACTIVE', and I would need to run yet another query with no filter to get the overall calculation.
Is there a way I could store all the arithmetic calculations needed for my output as a function (maybe in a temp table or something)? My thought is that it would be better to define the functions somewhere, and then, when I need to calculate the output, I would somehow call the function to do all the calculations I need and give me every calculation for every segment I have.
I can't fully envision how to get there, or whether this is even a good solution, so guidance and detailed SQL code would be extremely helpful. Examples, please!
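One pattern that avoids repeating the query per segment is GROUPING SETS, if your database supports it (SQL Server, PostgreSQL, and most warehouses do). A sketch, reusing the columns from the query above and assuming AGE_BRACKET is the second column described in the DIMENSION table:
SELECT
    d.MEMBERSHIP_STATUS,
    d.AGE_BRACKET,
    COUNT(DISTINCT CASE WHEN f.QUESTION_1 IN ('8', '9', '10') THEN f.RESPONDENT_ID END) * 1.0
    / COUNT(DISTINCT CASE WHEN f.QUESTION_1 IS NOT NULL THEN f.RESPONDENT_ID END) AS CSAT_1
FROM FACT f
JOIN DIMENSION d ON f.RESPONDENT_ID = d.RESPONDENT_ID
GROUP BY GROUPING SETS (
    (d.MEMBERSHIP_STATUS),   -- one row per membership status (ACTIVE, INACTIVE, ...)
    (d.AGE_BRACKET),         -- one row per age bracket
    ()                       -- one overall row with no segment filter
)
Each grouping set produces one slice of the output, so the active, inactive, per-age-bracket, and overall numbers all come back from a single query; the CSAT expressions for the other questions can be added as further columns.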
I have the following data in my Excel sheet, and my requirement is to sort groups of rows based on Forecast Age in descending order. I have already tried the built-in sort function in Excel, but it doesn't meet my requirements.
Let's say, for example, my Forecast Age is in Range("BO54") = 17 and Range("BO88") = 19. Range("BO88")'s value is bigger than the other, so it should be at the top. Range("BO88")'s group is Range("A78:CA91").
In short, I want the 88th row and its group of rows, which is Range("A78:CA91"), to be at the top. Is there a way to do this without sorting the data within Range("A78:CA91")?
Well, I can suggest one way, though it might be lengthy to code.
First, run a loop over all rows and, whenever "Description" = "Forecast Age", paste that row into a new sheet. This will give you a list of rows with every "house" and its age. Sort it by age and add a column that assigns a rank based on age.
After this, add a rank column to your main data, VLOOKUP the ranks obtained above for each house, and then finally sort the entire table based on these ranks.
You should obtain the result you expect.
This is just a heads up; you can optimize and improve the process.
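For instance, the rank and lookup steps could be plain worksheet formulas rather than VBA. A sketch, purely illustrative, assuming the helper sheet is named Helper with house names in column A, Forecast Ages in column B, the rank formula filled down column C, and the house name for each group sitting in column A of the main sheet:
=RANK(B2, $B$2:$B$40)
=VLOOKUP($A54, Helper!$A:$C, 3, FALSE)
The first formula gives the largest Forecast Age rank 1 (RANK sorts descending by default), and the second pulls that rank back into the main data so the whole table can be sorted on it.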
I have a query to pull clickthrough for a funnel, where if a user hit a page it is recorded as "1", else NULL --
SELECT datestamp
      ,COUNT(visits) as Visits
      ,COUNT([QE001]) as firstcount
      ,COUNT([QE002]) as secondcount
      ,COUNT([QE004]) as thirdcount
      ,COUNT([QE006]) as finalcount
      ,user_type
      ,user_loc
FROM dbname.dbo.loggingtable
GROUP BY datestamp, user_type, user_loc
I want to have a column for each ratio, e.g. firstcount/Visits, secondcount/firstcount, etc. as well as a total (finalcount/Visits).
I know this can be done
in an Excel PivotTable by adding a "calculated field"
in SQL by grouping
in PowerPivot by adding a CalculatedColumn, e.g.
=IFERROR(QueryName[finalcount]/QueryName[Visits],0)
BUT I need to give the report consumer the option of slicing by just user_type or just user_loc, etc., and Excel will tend to ADD the proportions, which won't work because
SUM(A/B) != SUM(A)/SUM(B)
Is there a way in DAX/MDX/PowerPivot to add a calculated column/measure, so that it will be calculated as SUM(finalcount)/SUM(Visits), for any user-defined subset of the data (daterange, user type, location, etc.)?
Yes, via calculated measures. Calculated columns are for creating values that you want to see on rows/columns/report headers; calculated measures are for creating values that you want to see in the values section of a pivot table and slice/dice by the columns in the model.
The easiest way would be to create 3 calculated "measures" in the calculation area of the powerpivot sheet.
TotalVisits:=SUM(QueryName[visits])
TotalFinalCount:=SUM(QueryName[finalcount])
TotalFinalCount2VisitsRatio:=[TotalFinalCount]/[TotalVisits]
You can then slice the calculated measure [TotalFinalCount2VisitsRatio] by user_type or just user_loc (or whatever) and the value will be calculated correctly. The difference here is that you are explicitly telling the xVelocity engine to SUM-then-DIVIDE. If you create the calculated column, then the engine thinks you want to DIVIDE-then-SUM.
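The other ratios from the question follow the same shape. A sketch (the measure names are illustrative, and DIVIDE is used instead of the IFERROR wrapper to handle a zero denominator):
TotalFirstCount:=SUM(QueryName[firstcount])
TotalSecondCount:=SUM(QueryName[secondcount])
FirstCount2VisitsRatio:=DIVIDE([TotalFirstCount], [TotalVisits], 0)
SecondCount2FirstCountRatio:=DIVIDE([TotalSecondCount], [TotalFirstCount], 0)
Because each ratio divides two already-summed measures, any slicer on user_type, user_loc, or the date range re-aggregates the numerator and denominator before the division.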
Also, you don't have to break down the measure into 3 separate measures...it's just good practice. If you're interested in learning more, I'd recommend this book...the author is the PowerPivot/DAX guru and the book is very straightforward.