Pandas difficult to add new column with condition? - pandas

I was trying to do multiple group and also adding count to new column.
My input file
OrderDate Region Rep Item Units Unit Cost Total
----------------------------------------------------------
1/6/18 East Jones Pencil 95 1.99 189.05
1/23/18 Central Kivell Binder 50 19.99 999.50
2/9/18 Central Jardine Pencil 36 4.99 179.64
2/26/18 Central Gill Pen 27 19.99 539.73
3/15/18 West Sorvino Pencil 56 2.99 167.44
4/1/18 East Jones Binder 60 4.99 299.40
4/18/18 Central Andrews Pencil 75 1.99 149.25
4/18/18 West Jones Pencil 75 1.99 149.25
I am trying to do like
Region Rep Count same/diff
-------------------------------
east jones 2 2-same
jones
central Kivell 4 >3 differnce
Jardine
Gill
Andrews
West Sorvino 2 2-different
West jones1
My code:
df1 = pd.read_excel(excel_path, sheet_name = 'SalesOrders', index_col=0)
df3 = (df1.groupby('Region')['Rep'].value_counts())
print(df3)
Please help me to do this. Thanks
In rep column, based on Region i have done group by to know Rep values. if Rep member are same then 2 same people, consider central region has 4 different people working so it i greater than 3 .

Related

pandas: how to create new columns based on two columns and aggregate the results

I am trying to perform a sort of aggregation, but with the creation of new columns.
Let's take the example of the dataframe below:
df = pd.DataFrame({'City':['Los Angeles', 'Denver','Denver','Los Angeles'],
'Car Maker': ['Ford','Toyota','Ford','Toyota'],
'Qty': [50000,100000,80000,70000]})
That generates this:
City
Car Maker
Qty
0
Los Angeles
Ford
50000
1
Denver
Toyota
100000
2
Denver
Ford
80000
3
Los Angeles
Toyota
70000
I would like to have one line per city and the Car Maker as a new column with the Qty related to that City:
City
Car Maker
Ford
Toyota
0
Los Angeles
Ford
50000
70000
1
Denver
Toyota
80000
100000
Any hints on how to achieve that?
I've tried some options with transforming it on a dictionary and compressing on a function, but I am looking for a more pandas' like solution.
df.pivot(index='City', columns='Car Maker', values='Qty').reset_index()
Try dataframe.pivot_table()
df.pivot_table(values='Qty', index=['City', 'Car Maker'], columns='Car Maker').reset_index()

ScaNN weight features for similary search

I am using ScaNN to perform similarity searches and would like to place more emphasis on some features than others when performing a similarity search.
for example, if I have the following data
name | age | country | income
John 29 US $47k
Susan 28 US $44k
Bill 26 US $39k
Sarah 35 UK $100k
Jack 34 UK $90k
Maggie 37 UK $95k
and income has more importance, then given the following query:
George, 28, US, $100k
it would return
Sarah, Jack, Maggie
adding more weight to the income feature.
Training data values are normalized before building the similarity index
df_np = preprocessing.normalize(df[features])
and likewise the query values are normalized before performing a search
np_q = preprocessing.normalize([list(query.values())])

Conditionally concatenate rows of a dataframe and process additional columns based on the condition

I have an Input Dataframe that the following :
NAME TEXT START END
Tim Tim Wagner is a teacher. 10 20.5
Tim He is from Cleveland, Ohio. 20.5 40
Frank Frank is a musician. 40 50
Tim He like to travel with his family 50 62
Frank He is a performing artist who plays the cello. 62 70
Frank He performed at the Carnegie Hall last year. 70 85
Frank It was fantastic listening to him. 85 90
Want output dataframe as follows:
NAME TEXT START END
Tim Tim Wagner is a teacher. He is from Cleveland, Ohio. 10 40
Frank Frank is a musician 40 50
Tim He like to travel with his family 50 62
Frank He is a performing artist who plays the cello. He performed at the Carnegie Hall last year. It was fantastic listening to him. 62 90
Appreciate your help on this.
Thanks
Try:
grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
df.groupby(['NAME', grp], sort=False)['TEXT','START','END']\
.agg({'TEXT':lambda x: ' '.join(x), 'START': 'min', 'END':'max'})\
.reset_index().drop('group', axis=1)
Output:
NAME TEXT START END
0 Tim Tim Wagner is a teacher. He is from Cleveland,... 10.0 40.0
1 Frank Frank is a musician. 40.0 50.0
2 Tim He like to travel with his family 50.0 62.0
3 Frank He is a performing artist who plays the cello.... 62.0 90.0

extracting data using beautifulsoup from wiki

I pretty new to this,
What I am trying to accomplished is having a table with distrcits and their various neighborhoods but my final code just list all neighborhoods in a list format without assigning them to a specific district.
url = "https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto"
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
type(soup)
print(soup.prettify())
Toronto_table = soup.find('table',{'class':'wikitable sortable'})
links = Toronto_table.find_all('a')
neighborhoods = []
for link in links:
neighborhoods.append(link.get('title'))
print(neighborhoods)
df_neighborhoods = pd.DataFrame(neighborhoods)
df_neighborhoods
You can simply read_html and print the table.
import pandas as pd
f_states=pd.read_html('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto')
print(f_states[6])
Output :
District Number Neighbourhoods Included
0 C01 Downtown, Harbourfront, Little Italy, Little P...
1 C02 The Annex, Yorkville, South Hill, Summerhill, ...
2 C03 Forest Hill South, Oakwood–Vaughan, Humewood–C...
3 C04 Bedford Park, Lawrence Manor, North Toronto, F...
4 C06 North York, Clanton Park, Bathurst Manor
5 C07 Willowdale, Newtonbrook West, Westminster–Bran...
6 C08 Cabbagetown, St. Lawrence Market, Toronto wate...
7 C09 Moore Park, Rosedale
8 C10 Davisville Village, Midtown Toronto, Lawrence ...
9 C11 Leaside, Thorncliffe Park, Flemingdon Park
10 C13 Don Mills, Parkwoods–Donalda, Victoria Village
11 C14 Newtonbrook East, Willowdale East
12 C15 Hillcrest Village, Bayview Woods – Steeles, Ba...
13 E01 Riverdale, Danforth (Greektown), Leslieville
14 E02 The Beaches, Woodbine Corridor
15 E03 Danforth (Greektown), East York, Playter Estat...
16 E04 The Golden Mile, Dorset Park, Wexford, Maryval...
17 E05 Steeles, L'Amoreaux, Tam O'Shanter – Sullivan
18 E06 Birch Cliff, Oakridge, Hunt Club, Cliffside
19 E08 Scarborough Village, Cliffcrest, Guildwood, Eg...
20 E09 Scarborough City Centre, Woburn, Morningside, ...
21 E10 Rouge (South), Port Union (Centennial Scarboro...
22 E11 Rouge (West), Malvern
23 W01 High Park, South Parkdale, Swansea, Roncesvall...
24 W02 Bloor West Village, Baby Point, The Junction (...
25 W03 Keelesdale, Eglinton West, Rockcliffe–Smythe, ...
26 W04 York, Glen Park, Amesbury (Brookhaven), Pelmo ...
27 W05 Downsview, Humber Summit, Humbermede (Emery), ...
28 W06 New Toronto, Long Branch, Mimico, Alderwood
29 W07 Sunnylea (The Queensway – Humber Bay)
30 W08 The Kingsway, Central Etobicoke, Eringate – Ce...
31 W09 Kingsview Village-The Westway, Richview (Willo...
32 W10 Rexdale, Clairville, Thistletown - Beaumond He...

Combining like minded entries in SQL [duplicate]

This question already has answers here:
SQL Sum Multiple rows into one
(5 answers)
Closed 8 years ago.
What I would like to do is combine like minded sets into one clear entry. Here is some example data:
Item Warehouse Quantity
Apple Northeast 100
Apple Midwest 2000
Apple South 300
Orange Northeast 400
Orange Midwest 800
Orange South 100
Orange West 100
Strawberry Northeast 550
Strawberry Midwest 750
Strawberry South 250
Strawberry East 350
What I would like is for the SQL query to return the total quantity from all the warehouses. The hopeful output would be something such as:
Item Quantity
Apple 2400
Orange 1400
Strawberry 1900
Any help would be amazing, thank you!
select Item, sum(Quantity) as TotalQuantity
from {tablename}
group by Item;