I am trying to perform a sort of aggregation, but with the creation of new columns.
Let's take the example of the dataframe below:
import pandas as pd

df = pd.DataFrame({'City': ['Los Angeles', 'Denver', 'Denver', 'Los Angeles'],
                   'Car Maker': ['Ford', 'Toyota', 'Ford', 'Toyota'],
                   'Qty': [50000, 100000, 80000, 70000]})
That generates this:
          City Car Maker     Qty
0  Los Angeles      Ford   50000
1       Denver    Toyota  100000
2       Denver      Ford   80000
3  Los Angeles    Toyota   70000
I would like to have one line per city and the Car Maker as a new column with the Qty related to that City:
          City Car Maker   Ford  Toyota
0  Los Angeles      Ford  50000   70000
1       Denver    Toyota  80000  100000
Any hints on how to achieve that?
I've tried some options that convert it to a dictionary and collapse it with a function, but I am looking for a more pandas-like solution.
df.pivot(index='City', columns='Car Maker', values='Qty').reset_index()
Try DataFrame.pivot_table(), with only 'City' in the index so that you get one row per city:
df.pivot_table(values='Qty', index='City', columns='Car Maker').reset_index()
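For anyone who wants to try this end to end, here is a minimal sketch on the question's data. The name `wide` and the explicit `aggfunc='sum'` are my own choices for illustration; `pivot_table` defaults to the mean, which gives the same numbers here because each (City, Car Maker) pair appears only once. Note that `df.pivot` raises an error on duplicate pairs, while `pivot_table` aggregates them.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Los Angeles', 'Denver', 'Denver', 'Los Angeles'],
    'Car Maker': ['Ford', 'Toyota', 'Ford', 'Toyota'],
    'Qty': [50000, 100000, 80000, 70000],
})

# One row per City, one column per Car Maker, Qty as the cell values.
wide = (
    df.pivot_table(index='City', columns='Car Maker', values='Qty', aggfunc='sum')
      .reset_index()                 # turn City back into a regular column
      .rename_axis(columns=None)     # drop the leftover 'Car Maker' axis label
)
print(wide)
```

The rows come out sorted by City (Denver first), since the pivot sorts the index.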
I am using ScaNN to perform similarity searches and would like to place more emphasis on some features than others when performing a similarity search.
For example, if I have the following data:
name   | age | country | income
John   | 29  | US      | $47k
Susan  | 28  | US      | $44k
Bill   | 26  | US      | $39k
Sarah  | 35  | UK      | $100k
Jack   | 34  | UK      | $90k
Maggie | 37  | UK      | $95k
and income has more importance, then given the following query:
George, 28, US, $100k
it would return
Sarah, Jack, Maggie
adding more weight to the income feature.
Training data values are normalized before building the similarity index:
df_np = preprocessing.normalize(df[features])
and likewise the query values are normalized before performing a search:
np_q = preprocessing.normalize([list(query.values())])
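One possible way to get this effect, sketched under several assumptions: scale each feature column, then multiply the columns by per-feature weights before building the index, and apply the same transform to the query. The sketch below does not call ScaNN at all; it swaps in a brute-force NumPy Euclidean search so the effect is easy to see. The country encoding (US=0, UK=1), the min-max scaling (used here instead of `preprocessing.normalize`, which by default scales each row to unit norm), the weight values, and the helper names `scale` and `top_k` are all hypothetical choices of mine.

```python
import numpy as np

names = ['John', 'Susan', 'Bill', 'Sarah', 'Jack', 'Maggie']
# Columns: age, country (assumed encoding: US=0, UK=1), income in $k
X = np.array([
    [29, 0, 47],
    [28, 0, 44],
    [26, 0, 39],
    [35, 1, 100],
    [34, 1, 90],
    [37, 1, 95],
], dtype=float)
query = np.array([28, 0, 100], dtype=float)  # George

# Min-max statistics come from the training data only.
col_min, col_max = X.min(axis=0), X.max(axis=0)

def scale(a):
    """Min-max scale columns using the training-set range."""
    return (a - col_min) / (col_max - col_min)

def top_k(weights, k=3):
    """Brute-force k nearest neighbours under column-weighted Euclidean distance."""
    w = np.asarray(weights, dtype=float)
    # Multiplying a column by w multiplies its squared-distance term by w**2.
    Xw, qw = scale(X) * w, scale(query) * w
    dist = np.linalg.norm(Xw - qw, axis=1)
    return [names[i] for i in np.argsort(dist)[:k]]

unweighted = top_k([1, 1, 1])
weighted = top_k([1, 1, 5])   # income counts 5x per unit of scaled difference
print(unweighted)
print(weighted)
```

With equal weights the three nearest rows are John, Susan and Bill; with income weighted 5x they become Sarah, Maggie and Jack. The weighted arrays could then be passed to whichever index you build, provided it uses a Euclidean-style distance.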
I have an input DataFrame like the following:
NAME TEXT START END
Tim Tim Wagner is a teacher. 10 20.5
Tim He is from Cleveland, Ohio. 20.5 40
Frank Frank is a musician. 40 50
Tim He like to travel with his family 50 62
Frank He is a performing artist who plays the cello. 62 70
Frank He performed at the Carnegie Hall last year. 70 85
Frank It was fantastic listening to him. 85 90
I want the output DataFrame to be as follows:
NAME TEXT START END
Tim Tim Wagner is a teacher. He is from Cleveland, Ohio. 10 40
Frank Frank is a musician 40 50
Tim He like to travel with his family 50 62
Frank He is a performing artist who plays the cello. He performed at the Carnegie Hall last year. It was fantastic listening to him. 62 90
Appreciate your help on this.
Thanks
Try:
grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
df.groupby(['NAME', grp], sort=False)[['TEXT', 'START', 'END']]\
  .agg({'TEXT': ' '.join, 'START': 'min', 'END': 'max'})\
  .reset_index().drop('group', axis=1)
Output:
NAME TEXT START END
0 Tim Tim Wagner is a teacher. He is from Cleveland,... 10.0 40.0
1 Frank Frank is a musician. 40.0 50.0
2 Tim He like to travel with his family 50.0 62.0
3 Frank He is a performing artist who plays the cello.... 62.0 90.0
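To see why the shift/cumsum trick works, here is a small self-contained sketch on the question's data. It is a variation on the answer rather than the exact same call: it groups on the run id alone and uses named aggregation, and the names `run_id` and `out` are mine.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Tim', 'Tim', 'Frank', 'Tim', 'Frank', 'Frank', 'Frank'],
    'TEXT': [
        'Tim Wagner is a teacher.',
        'He is from Cleveland, Ohio.',
        'Frank is a musician.',
        'He like to travel with his family',
        'He is a performing artist who plays the cello.',
        'He performed at the Carnegie Hall last year.',
        'It was fantastic listening to him.',
    ],
    'START': [10, 20.5, 40, 50, 62, 70, 85],
    'END': [20.5, 40, 50, 62, 70, 85, 90],
})

# True wherever NAME differs from the previous row; the running sum of those
# flags gives every block of consecutive identical names its own id.
run_id = (df['NAME'] != df['NAME'].shift()).cumsum()
print(run_id.tolist())  # [1, 1, 2, 3, 4, 4, 4]

out = (
    df.groupby(run_id.rename('group'), sort=False)
      .agg(
          NAME=('NAME', 'first'),
          TEXT=('TEXT', lambda s: ' '.join(s)),
          START=('START', 'min'),
          END=('END', 'max'),
      )
      .reset_index(drop=True)
)
print(out)
```

Grouping on the run id rather than on NAME alone is what keeps the second Tim block separate from the first.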
I'm pretty new to this.
What I am trying to accomplish is a table of districts and their various neighborhoods, but my final code just lists all the neighborhoods in a flat list without assigning them to a specific district.
from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto"
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
type(soup)
print(soup.prettify())
Toronto_table = soup.find('table', {'class': 'wikitable sortable'})
links = Toronto_table.find_all('a')
neighborhoods = []
for link in links:
    # Collects every link title in the table, regardless of which row it came from
    neighborhoods.append(link.get('title'))
print(neighborhoods)
df_neighborhoods = pd.DataFrame(neighborhoods)
df_neighborhoods
You can simply use pd.read_html and print the table.
import pandas as pd
f_states=pd.read_html('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto')
print(f_states[6])
Output :
District Number Neighbourhoods Included
0 C01 Downtown, Harbourfront, Little Italy, Little P...
1 C02 The Annex, Yorkville, South Hill, Summerhill, ...
2 C03 Forest Hill South, Oakwood–Vaughan, Humewood–C...
3 C04 Bedford Park, Lawrence Manor, North Toronto, F...
4 C06 North York, Clanton Park, Bathurst Manor
5 C07 Willowdale, Newtonbrook West, Westminster–Bran...
6 C08 Cabbagetown, St. Lawrence Market, Toronto wate...
7 C09 Moore Park, Rosedale
8 C10 Davisville Village, Midtown Toronto, Lawrence ...
9 C11 Leaside, Thorncliffe Park, Flemingdon Park
10 C13 Don Mills, Parkwoods–Donalda, Victoria Village
11 C14 Newtonbrook East, Willowdale East
12 C15 Hillcrest Village, Bayview Woods – Steeles, Ba...
13 E01 Riverdale, Danforth (Greektown), Leslieville
14 E02 The Beaches, Woodbine Corridor
15 E03 Danforth (Greektown), East York, Playter Estat...
16 E04 The Golden Mile, Dorset Park, Wexford, Maryval...
17 E05 Steeles, L'Amoreaux, Tam O'Shanter – Sullivan
18 E06 Birch Cliff, Oakridge, Hunt Club, Cliffside
19 E08 Scarborough Village, Cliffcrest, Guildwood, Eg...
20 E09 Scarborough City Centre, Woburn, Morningside, ...
21 E10 Rouge (South), Port Union (Centennial Scarboro...
22 E11 Rouge (West), Malvern
23 W01 High Park, South Parkdale, Swansea, Roncesvall...
24 W02 Bloor West Village, Baby Point, The Junction (...
25 W03 Keelesdale, Eglinton West, Rockcliffe–Smythe, ...
26 W04 York, Glen Park, Amesbury (Brookhaven), Pelmo ...
27 W05 Downsview, Humber Summit, Humbermede (Emery), ...
28 W06 New Toronto, Long Branch, Mimico, Alderwood
29 W07 Sunnylea (The Queensway – Humber Bay)
30 W08 The Kingsway, Central Etobicoke, Eringate – Ce...
31 W09 Kingsview Village-The Westway, Richview (Willo...
32 W10 Rexdale, Clairville, Thistletown - Beaumond He...
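If you would rather stay with BeautifulSoup and get one row per neighbourhood, a rough sketch is to walk the table row by row and remember the district from the first cell. To keep it runnable offline, the example parses a tiny hand-written HTML snippet (`SAMPLE_HTML`, a made-up stand-in for the Wikipedia table) with the built-in `html.parser`; the real page's markup may differ, so treat the cell positions as assumptions.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the Wikipedia table.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>District Number</th><th>Neighbourhoods Included</th></tr>
  <tr><td>C09</td><td><a title="Moore Park">Moore Park</a>, <a title="Rosedale">Rosedale</a></td></tr>
  <tr><td>E11</td><td><a title="Rouge">Rouge (West)</a>, <a title="Malvern">Malvern</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
table = soup.find('table', {'class': 'wikitable sortable'})

rows = []
for tr in table.find_all('tr'):
    cells = tr.find_all('td')
    if len(cells) < 2:
        continue  # the header row only has <th> cells
    district = cells[0].get_text(strip=True)
    # Every link in the second cell belongs to this row's district.
    for link in cells[1].find_all('a'):
        rows.append({'District': district,
                     'Neighbourhood': link.get_text(strip=True)})

df_neighborhoods = pd.DataFrame(rows)
print(df_neighborhoods)
```

The difference from the code in the question is that the links are collected per `<tr>` instead of from the whole table at once, so each neighbourhood stays attached to its district.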
What I would like to do is combine the rows for each item into one clear entry. Here is some example data:
Item Warehouse Quantity
Apple Northeast 100
Apple Midwest 2000
Apple South 300
Orange Northeast 400
Orange Midwest 800
Orange South 100
Orange West 100
Strawberry Northeast 550
Strawberry Midwest 750
Strawberry South 250
Strawberry East 350
What I would like is for the SQL query to return the total quantity across all the warehouses for each item. The desired output would be something like:
Item Quantity
Apple 2400
Orange 1400
Strawberry 1900
Any help would be amazing, thank you!
select Item, sum(Quantity) as TotalQuantity
from {tablename}
group by Item;
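To check the query against the sample data, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table name `inventory` is made up for the example (the answer leaves it as a placeholder), and the ORDER BY is only there to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE inventory (Item TEXT, Warehouse TEXT, Quantity INTEGER)')
conn.executemany(
    'INSERT INTO inventory VALUES (?, ?, ?)',
    [
        ('Apple', 'Northeast', 100), ('Apple', 'Midwest', 2000), ('Apple', 'South', 300),
        ('Orange', 'Northeast', 400), ('Orange', 'Midwest', 800),
        ('Orange', 'South', 100), ('Orange', 'West', 100),
        ('Strawberry', 'Northeast', 550), ('Strawberry', 'Midwest', 750),
        ('Strawberry', 'South', 250), ('Strawberry', 'East', 350),
    ],
)

# GROUP BY collapses all rows of an item into one; SUM adds up their quantities.
totals = conn.execute(
    'SELECT Item, SUM(Quantity) AS TotalQuantity '
    'FROM inventory GROUP BY Item ORDER BY Item'
).fetchall()
conn.close()

print(totals)  # [('Apple', 2400), ('Orange', 1400), ('Strawberry', 1900)]
```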