Pandas Pivot table : Get only the value counts and not columns

I am trying to make a pivot table from a data set with many columns.
When I build the pivot table with the code below, I get every column, which I don't want.
I only want the counts, not any of the other columns. Can I achieve this?
table1 = pd.pivot_table(dfCALCNoExcecption,index=['AD Platform','Agent Program'],columns=None,aggfunc='count')
The Excel output of the code above looks like this (I have not pasted all of it, as there are around 50 columns):
The desired output I am trying to get:

You can group your data by the 'AD Platform' and 'Agent Program' columns and then count the values of a single column, such as 'AD Hostname'. Here is my code:
df.groupby(['AD Platform', 'Agent Program'])['AD Hostname'].count()

This is not complete, but part of it can be achieved with groupby. I am not sure how to rename the third column to "Count".
dfAgentTable3 = dfCALCNoExcecption.groupby(['AD Platform', 'Agent Program'])['AD Hostname'].count().sort_index(ascending=True)
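One possible way to get a named "Count" column is sketched below. The sample rows and the extra column are made up for illustration; only the column names 'AD Platform', 'Agent Program' and 'AD Hostname' come from the question.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question.
df = pd.DataFrame({
    "AD Platform": ["Windows", "Windows", "Linux", "Windows"],
    "Agent Program": ["AgentA", "AgentA", "AgentB", "AgentB"],
    "AD Hostname": ["host1", "host2", "host3", "host4"],
    "Other Column": [1, 2, 3, 4],
})

# size() counts the rows in each group; reset_index(name=...) turns the
# result into a DataFrame and names the new column in one step.
counts = (
    df.groupby(["AD Platform", "Agent Program"])
      .size()
      .reset_index(name="Count")
)
print(counts)
```

Note that size() counts every row in a group, while count() on 'AD Hostname' would skip rows where the hostname is missing, so the two can differ on data with gaps.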

Related

How to combine rows in BigQuery that share a similar name

I'm having trouble writing a query that groups together rows that share a similar name and counts a specific response in them.
The table I currently have looks like this:
test_control    values
test            selected
control         selected
test us         not selected
control us      selected
test mom        not selected
control mom     selected
What I'd like is an output like the one below, which counts only the "selected" responses and groups together the rows that have either "control" or "test" in the name:
test_control    values
test            3
control         1
The query below is wrong: it returns nothing. The GROUP BY part is where I'm lost, as I'm not sure how to do this. I tried searching but couldn't find anything. I'd appreciate any help!
SELECT distinct(test_control), values FROM `total_union`
where test_control="%test%" and values="selected"
group by test_control, values

Use the query below:
SELECT
REGEXP_EXTRACT(test_control, r'^(test|control)') AS test_control,
COUNTIF(values = 'selected') AS values
FROM `total_union`
GROUP BY 1
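For readers who want to check the extract-then-count logic outside BigQuery, here is a rough pandas sketch of the same idea. It is not the BigQuery query itself: str.extract stands in for REGEXP_EXTRACT and summing a boolean column stands in for COUNTIF. The rows are the sample rows from the question, and the output column name selected_count is an assumption made for this example.

```python
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    "test_control": ["test", "control", "test us",
                     "control us", "test mom", "control mom"],
    "values": ["selected", "selected", "not selected",
               "selected", "not selected", "selected"],
})

# Stand-in for REGEXP_EXTRACT: keep only the leading "test" or "control".
group = df["test_control"].str.extract(r"^(test|control)", expand=False)

# Stand-in for COUNTIF: summing a boolean Series counts the True values.
result = (
    df["values"].eq("selected")
      .groupby(group)
      .sum()
      .rename("selected_count")  # hypothetical column name
      .reset_index()
)
print(result)
```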

As mentioned by @Mikhail Berlyant, you can use REGEXP_EXTRACT to pull out the matching prefix and COUNTIF to count the rows that satisfy the given condition. Try the code below to get the expected output:
Code
SELECT
REGEXP_EXTRACT(test_control, r'^(test|control)') AS test_control,
COUNTIF(values = "selected") AS values
FROM `project.dataset.testvalues`
group by 1

How to create pivot table from non-numerical dataset by counting the instances from one column?

I have this dataset that looks like this:
I have tried to do this:
df.groupby(['Phase','frames','Origin_Type']).size()
and
pd.pivot_table(india, values = ['frames', 'Phase', 'Origin_Type'], index =['frames'],
columns = ['Phase', 'Origin_Type'], aggfunc = sum)
But neither gave me the right results. I want to transform it into this (see picture below), where the values are the sum of each theme found in each 'Origin_Type' per phase.

You can use crosstab for this:
pd.crosstab(india['Location'],[india['Phase'], india['Origin_Type']])
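To make the shape of the result concrete, here is a small sketch of the same crosstab call. The rows are made up; only the column names 'Location', 'Phase' and 'Origin_Type' come from the question and the answer.

```python
import pandas as pd

# Made-up rows; only the column names come from the question and answer.
india = pd.DataFrame({
    "Location": ["Delhi", "Delhi", "Mumbai", "Mumbai", "Delhi"],
    "Phase": ["P1", "P1", "P1", "P2", "P2"],
    "Origin_Type": ["Local", "National", "Local", "Local", "Local"],
})

# Rows are locations, columns are (Phase, Origin_Type) pairs, and each
# cell is the number of rows with that combination (0 if there are none).
table = pd.crosstab(india["Location"], [india["Phase"], india["Origin_Type"]])
print(table)
```

Because crosstab counts occurrences by default, it works on non-numerical columns without needing a values column or an aggregation function.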

How to use a google sheets pivot query to output strings

I have a (much larger) table like this sample:
I am trying to output a table that looks like this:
The closest I can get with a pivot query returns numerical results in the value fields rather than the desired text strings:
=query(Data, "Select D,count(D) group by D Pivot B")
I resorted to a series of formulas to build my row and column headers, and then fill in the data field - See Version 3 in the sample sheet. But I couldn't figure out how to fill in the data with a single formula - as opposed to copying and pasting in the data field, which is not desirable with a dynamic number of row and column headers based on the original data.
Is there a way to wrap my data field formula (in cell B44 of the sample) in an arrayformula that will fill the data field, with a dynamic number of columns and rows?
Or, even better, is there a single formula that delivers my desired results table?

This should work. It's a bit difficult to explain, but I could demonstrate the various parts if you made your sheet editable.
=ARRAYFORMULA(TRANSPOSE(QUERY(TRIM(SPLIT(TRANSPOSE(QUERY(QUERY({CHAR(10)&A2:A11,B2:B11&"|"&D2:D11&"|"},"select MAX(Col1) group by Col1 pivot Col2"),,9^9)),"|",0,0)),"select Col1,MAX(Col3) where Col1<>'' group by Col1 pivot Col2 order by Col1 desc label Col1'Project'")))

Perform calculation without having to do it manually for each column?

I have the following view set up in SQL Server:
View (left table: population data per year; middle table: municipalities; right table: municipality areas in km²)
Query:
SELECT
dbo.T_GEMEINDE.GKZ, dbo.T_GEMEINDE.NAME,
dbo.T_BASE_DAUERSIEDLUNGSRAUM_GEMEINDE.FLAECHE_KM2 / dbo.T_BASE_DAUERSIEDLUNGSRAUM_GEMEINDE.DAUERSIEDLUNGSRAUM_KM2 AS [ges. Fläche / Dauersiedlungsr.],
dbo.T_BASE_GEMEINDE_BEVOELKERUNG_JAHR_BEGINN.J2017 / dbo.T_BASE_DAUERSIEDLUNGSRAUM_GEMEINDE.FLAECHE_KM2 AS [ges. Bevölkerungsdichte],
dbo.T_BASE_GEMEINDE_BEVOELKERUNG_JAHR_BEGINN.J2017 / dbo.T_BASE_DAUERSIEDLUNGSRAUM_GEMEINDE.DAUERSIEDLUNGSRAUM_KM2 AS [Bevölkerungsdichte Dauersiedlungsraum]
FROM
dbo.T_BASE_DAUERSIEDLUNGSRAUM_GEMEINDE
INNER JOIN
dbo.T_GEMEINDE ON dbo.T_BASE_DAUERSIEDLUNGSRAUM_GEMEINDE.GKZ = dbo.T_GEMEINDE.GKZ
INNER JOIN
dbo.T_BASE_GEMEINDE_BEVOELKERUNG_JAHR_BEGINN ON dbo.T_GEMEINDE.GKZ = dbo.T_BASE_GEMEINDE_BEVOELKERUNG_JAHR_BEGINN.GKZ
The last column in the view contains a calculation (population density for 132 municipalities) for the year 2017 and uses the column J2017 from the left table.
Current output: the rightmost column (Bevölkerungsdichte Dauersiedlungsraum) holds the result of the calculation for the year 2017.
Desired output: the same result for every other year, each as a separate column.
Question: how do I perform the calculation in the last column of the view for all years (J2017-J2050) without writing it out manually for each year column?
Thanks in advance.

If you want someone to provide a complete solution, you will need to supply:
- CREATE TABLE statements for the 3 tables
- INSERT INTO ... statements that provide sample data for all 3 tables
However, if you just want a suggestion on how to approach the problem, I would use an UNPIVOT statement to create a view/table that:
- holds all the columns in dbo.T_BASE_GEMEINDE_BEVOELKERUNG_JAHR_BEGINN apart from the "year" columns (J2017, J2018, J2019, ...)
- adds a single "year" column with values from 2017 to 2050
- adds a single value column to hold the population for each year
By joining your existing tables to this new table/view and grouping by the new "year" column, you should achieve what you want.
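The unpivot idea can be illustrated with a short pandas sketch, where melt plays the role of UNPIVOT. This is not T-SQL and not the asker's schema: the figures are made up, only two year columns are shown, and the result column names year, population and density are assumptions for this example.

```python
import pandas as pd

# Made-up figures; GKZ, J2017, J2018 and DAUERSIEDLUNGSRAUM_KM2 follow
# the column names in the question.
population = pd.DataFrame({
    "GKZ": [10101, 10201],
    "J2017": [1000, 2000],
    "J2018": [1100, 2100],
})
area = pd.DataFrame({
    "GKZ": [10101, 10201],
    "DAUERSIEDLUNGSRAUM_KM2": [10.0, 40.0],
})

# Unpivot: one row per municipality and year instead of one column per year.
long = population.melt(id_vars="GKZ", var_name="year", value_name="population")
long["year"] = long["year"].str.lstrip("J").astype(int)

# The density formula is now written once and applies to every year.
long = long.merge(area, on="GKZ")
long["density"] = long["population"] / long["DAUERSIEDLUNGSRAUM_KM2"]

# Optional: pivot back to one density column per year.
wide = long.pivot(index="GKZ", columns="year", values="density")
print(wide)
```

The point of the long layout is that adding the years up to 2050 only adds rows, so the calculation itself never has to be repeated per column.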

how can I summarize different columns to make totals by row?

In the picture below you can see my statement. Something is definitely wrong with it because it returns NULL, but I don't know what. I want to create a TOTAL column that adds up WOSE, WO, SSSE and SS per row. Could someone help me with that?

This is caused by NULL values in the columns. Use the following instead:
SUM(COALESCE(WOSE,0) + COALESCE(WO,0) + COALESCE(SSSE,0) + COALESCE(SS,0))
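The effect of NULL on addition can be reproduced with a small sketch using Python's built-in sqlite3 module. SQLite is an assumption made for illustration, since the question does not say which database is in use; the table name t and the sample values are made up as well.

```python
import sqlite3

# In-memory database with a hypothetical table t; one row contains a NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (WOSE INTEGER, WO INTEGER, SSSE INTEGER, SS INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 2, 3, 4), (5, None, 7, 8)],
)

# raw_total becomes NULL as soon as any operand is NULL;
# COALESCE replaces each NULL with 0 before the addition.
rows = conn.execute(
    """
    SELECT WOSE + WO + SSSE + SS AS raw_total,
           COALESCE(WOSE, 0) + COALESCE(WO, 0)
             + COALESCE(SSSE, 0) + COALESCE(SS, 0) AS total
    FROM t
    ORDER BY rowid
    """
).fetchall()
print(rows)
conn.close()
```

In the second row the plain sum comes back as None (NULL), while the COALESCE version still adds up the three values that are present.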
