Creating a list of nested dict from DataFrame?

Creating a list of nested dict from DataFrame? - pandas

I am trying to create a list of nested dict from the following multiindex dataframe where indices are form the first level {key:value} and the rows under the indices form the nested dict.
[{'date':1980, 'country': 'United States', 'country_id':840
'count':42, 'players' : [{'player_name: xxxx','ranking': 46, 'hand': 'r'}, {'player_name: yyy', 'ranking':20, 'hand': 'r'}...]},
{'date':1980, 'country': 'Czech Republic', 'country_id':203,
'count':42, 'players' : [{'player_name: xxxx','ranking': 46, 'hand':'r'},
{'player_name: yyy', 'ranking':20, 'hand': 'r'}...]},...
{'date':1982, 'country': 'United States', 'country_id':840,
'count':42, 'players' : [{'player_name: xxxx','ranking': 46, 'hand': 'r'},...]
HAND RANKING PLAYER_NAME
DATE COUNTRY COUNTRY_ID COUNT
1980 United States 840 42 R 46 Tim Gullikson
42 L 96 Nick Saviano
42 L 3 Jimmy Connors
42 L 79 Bruce Manson
Czech Republic 203 2 R 23 Tomas Smid
2 R 65 Pavel Slozil
New Zealand 554 3 R 66 Chris Lewis NZL
.
.
1982 United States 840 42 L 46 Van Winitsky
42 R 24 Steve Denton
42 R 26 Mel Purcell
3 R 76 Russell Simpson
.
.

combination of groupby, to_dict('records'), and other nuances
lod = []
for name, group in df.groupby(level=[0, 1, 2, 3]):
d = dict(zip(df.index.names, name))
d['players'] = group.to_dict('records')
lod.append(d)
lod
[{'count': 2,
'country': 'Czech Republic',
'country_id': 203,
'date': 1980,
'players': [{'HAND': 'R', 'PLAYER_NAME': 'Tomas Smid', 'RANKING': 23},
{'HAND': 'R', 'PLAYER_NAME': 'Pavel Slozil', 'RANKING': 65}]},
{'count': 3,
'country': 'New Zealand',
'country_id': 554,
'date': 1980,
'players': [{'HAND': 'R', 'PLAYER_NAME': 'Chris Lewis NZL', 'RANKING': 66}]},
{'count': 42,
'country': 'United States',
'country_id': 840,
'date': 1980,
'players': [{'HAND': 'R', 'PLAYER_NAME': 'Tim Gullikson', 'RANKING': 46},
{'HAND': 'L', 'PLAYER_NAME': 'Nick Saviano', 'RANKING': 96},
{'HAND': 'L', 'PLAYER_NAME': 'Jimmy Connors', 'RANKING': 3},
{'HAND': 'L', 'PLAYER_NAME': 'Bruce Manson', 'RANKING': 79}]},
{'count': 3,
'country': 'United States',
'country_id': 840,
'date': 1982,
'players': [{'HAND': 'R', 'PLAYER_NAME': 'Russell Simpson', 'RANKING': 76}]},
{'count': 42,
'country': 'United States',
'country_id': 840,
'date': 1982,
'players': [{'HAND': 'L', 'PLAYER_NAME': 'Van Winitsky', 'RANKING': 46},
{'HAND': 'R', 'PLAYER_NAME': 'Steve Denton', 'RANKING': 24},
{'HAND': 'R', 'PLAYER_NAME': 'Mel Purcell', 'RANKING': 26}]}]

Related

How do I get API data into a Pandas DataFrame?

I am pulling in betting data from an API and I would like to get it into a DataFrame. I would like the DataFrame to have the following columns: [away_team, home_team, spread, overUnder] I am using the following code:
import cfbd
configuration = cfbd.Configuration()
configuration.api_key['Authorization'] = 'XXX'
configuration.api_key_prefix['Authorization'] = 'Bearer'
from __future__ import print_function
import time
import cfbd
from cfbd.rest import ApiException
from pprint import pprint
# create an instance of the API class
api_instance = cfbd.BettingApi(cfbd.ApiClient(configuration))
game_id = 56 # int | Game id filter (optional)
year = 56 # int | Year/season filter for games (optional)
week = 56 # int | Week filter (optional)
season_type = 'regular' # str | Season type filter (regular or postseason) (optional) (default to regular)
team = 'team_example' # str | Team (optional)
home = 'home_example' # str | Home team filter (optional)
away = 'away_example' # str | Away team filter (optional)
conference = 'conference_example' # str | Conference abbreviation filter (optional)
try:
# Betting lines
api_response = api_instance.get_lines(year=2021, week=7, season_type='regular', conference='SEC')
pprint(api_response)
except ApiException as e:
print("Exception when calling BettingApi->get_lines: %s\n" % e)
API Response:
[{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Auburn',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'Arkansas',
'id': 401282104,
'lines': [{'awayMoneyline': 155,
'formattedSpread': 'Arkansas -3.5',
'homeMoneyline': -180,
'overUnder': '53.5',
'overUnderOpen': '53.0',
'provider': 'Bovada',
'spread': '-3.5',
'spreadOpen': '-3.5'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7},
{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Kentucky',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'Georgia',
'id': 401282105,
'lines': [{'awayMoneyline': 1000,
'formattedSpread': 'Georgia -23.5',
'homeMoneyline': -2200,
'overUnder': '44.5',
'overUnderOpen': '44.5',
'provider': 'Bovada',
'spread': '-23.5',
'spreadOpen': '-23.5'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7},
{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Florida',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'LSU',
'id': 401282106,
'lines': [{'awayMoneyline': -370,
'formattedSpread': 'Florida -10.0',
'homeMoneyline': 285,
'overUnder': '58.5',
'overUnderOpen': '58.0',
'provider': 'Bovada',
'spread': '10.0',
'spreadOpen': '10.0'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7},
{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Alabama',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'Mississippi State',
'id': 401282107,
'lines': [{'awayMoneyline': -950,
'formattedSpread': 'Alabama -17.5',
'homeMoneyline': 600,
'overUnder': '57.5',
'overUnderOpen': '59.0',
'provider': 'Bovada',
'spread': '17.5',
'spreadOpen': '17.0'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7},
{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Texas A&M',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'Missouri',
'id': 401282108,
'lines': [{'awayMoneyline': -310,
'formattedSpread': 'Texas A&M -9.0',
'homeMoneyline': 255,
'overUnder': '60.5',
'overUnderOpen': '61.0',
'provider': 'Bovada',
'spread': '9.0',
'spreadOpen': '9.0'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7},
{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Vanderbilt',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'South Carolina',
'id': 401282109,
'lines': [{'awayMoneyline': 750,
'formattedSpread': 'South Carolina -18.5',
'homeMoneyline': -1400,
'overUnder': '51.0',
'overUnderOpen': '51.0',
'provider': 'Bovada',
'spread': '-18.5',
'spreadOpen': '-20.0'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7},
{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Ole Miss',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'Tennessee',
'id': 401282110,
'lines': [{'awayMoneyline': -150,
'formattedSpread': 'Ole Miss -3.0',
'homeMoneyline': 130,
'overUnder': '80.5',
'overUnderOpen': '78.0',
'provider': 'Bovada',
'spread': '3.0',
'spreadOpen': '3.0'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7}]
I need help getting this output into a DataFrame. Thank you in advance.

You could iterate over the json data, extracting the information that you need and creating a new structure to hold this data. After iterating over all your data, you can create a dataframe from what you extracted.
I made an example using a dataclass to store the data you need:
import json
import pandas as pd
from dataclasses import dataclass
#dataclass
class BettingData:
away_team: str
home_team: str
spread: str
overUnder: str
json_data = json.loads(open('sample_data.json', 'r').read())
content = []
for entry in json_data:
for line in entry['lines']:
data = BettingData(away_team=entry['away_team'],
home_team=entry['home_team'],
spread=line['spread'],
overUnder=line['overUnder'])
content.append(data)
df = pd.DataFrame(content)
print(df)
And the output is:
away_team home_team spread overUnder
0 Auburn Arkansas -3.5 53.5
1 Kentucky Georgia -23.5 44.5
2 Florida LSU 10.0 58.5
3 Alabama Mississippi State 17.5 57.5
4 Texas A&M Missouri 9.0 60.5
5 Vanderbilt South Carolina -18.5 51.0
6 Ole Miss Tennessee 3.0 80.5

Pandas: How to filter rows in dataframe which is not equal to the combination of columns in other dataframe?

Below are the two dataframes. I was trying to filter rows in df_2 which are not equal to the combination of df_count rows. How can I achieve this objective?
import pandas as pd
df_1 = pd.DataFrame({'Name_1':['tom', 'jack', 'tom', 'jack', 'tom', 'nick', 'tom', 'jack', 'tom', 'jack'],
'Name_2':['sam', 'sam', 'ruby', 'sam','sam', 'jack', 'ruby', 'sam','ruby', 'sam']})
df_count = df_1.groupby(['Name_1','Name_2']).size().reset_index().rename(columns={0:'count'}).sort_values(['count'], ascending = False)
df_count = df_count.head(2)
df_count = df_count[['Name_1','Name_2']]
df_2 = pd.DataFrame({'Name_1':['tom', 'nick', 'tom', 'jack', 'tom', 'nick', 'tom', 'jack'],
'Name_2':['sam', 'mike', 'ruby', 'sam', 'sam', 'jack', 'ruby', 'sam'],
'Salary':[200, 500, 1000, 7000, 100, 300, 1200, 900],
'Currency':['AUD', 'CAD', 'JPY', 'USD', 'GBP', 'CAD', 'INR', 'USD']})

pd.merge(df_2,df_count, indicator=True, how='outer').query('_merge=="left_only"').drop('_merge', axis=1)
Output:
Name_1 Name_2 Salary Currency
0 tom sam 200 AUD
1 tom sam 100 GBP
2 nick mike 500 CAD
7 nick jack 300 CAD
Answer taken from here.

Pandas Outerjoin New Rows

I have two dataframes df1 and df2.
df1 = pd.DataFrame({
'Col1': ['abc', 'qrt', 'xyz', 'xam', 'asc', 'yat'],
'Col2': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
'Col3': [100, 120, 130, 200, 190, 210],})
df2 = pd.DataFrame({
'Col1': ['abc', 'qrt', 'xyz', 'mas', 'apc', 'ywt'],
'Col2': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
'Col4': [120, 140, 120, 200, 190, 210],})
I do an outerjoin on the two dataframes:
df = pd.merge(df1, df2[['Col1', 'Col4']], on= 'Col1', how='outer')
I get a new dataframe but I don't get the entries for Col2 for df2. I get
df = pd.DataFrame({
'Col1': ['abc', 'qrt', 'xyz', 'xam', 'asc', 'yat', 'mas', 'apc', 'ywt'],
'Col2': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses', 'NaN', 'NaN', 'NaN'],
'Col3': [100, 120, 130, 200, 190, 210, 'NaN', 'NaN', 'NaN'],
'Col4': [120, 140, 120, 'NaN', 'NaN', 'NaN', '200', '190', '210']})
But what I want is:
df = pd.DataFrame({
'Col1': ['abc', 'qrt', 'xyz', 'xam', 'asc', 'yat', 'mas', 'apc', 'ywt'],
'Col2': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
'Col3': [100, 120, 130, 200, 190, 210, 'NaN', 'NaN', 'NaN'],
'Col4': [120, 140, 120, 'NaN', 'NaN', 'NaN', '200', '190', '210']})
I want to have the entries for Col2 from df2 as new rows in the merged dataframe

Controlling decimal precision after resetting index of unstacked Pandas data frame

My data is as follows:
test_df = pd.DataFrame({'Manufacturer':['Ford', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW'],
'Metric':['Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty'],
'Sector':['Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA'],
'Value':[45000, 70000, 90000, 65000, 40000, 65000, 63000, 2700, 4400, 3400, 3000, 4700, 5700, 1500, 2000, 2500, 1300, 2000, 2450],
'City': ['Frankfurt', 'Bremen', 'Berlin', 'Hamburg', 'New York', 'Chicago', 'Los Angeles', 'Dresden', 'Munich', 'Cologne', 'Miami', 'Atlanta', 'Phoenix', 'Nuremberg', 'Dusseldorf', 'Leipzig', 'Houston', 'San Diego', 'San Francisco']
})
I reset the index and create a pivot table, as follows:
temp_table = test_df.reset_index().pivot_table(values = 'Value', index = ['Manufacturer', 'Metric', 'Sector'], aggfunc='sum')
Then, I create two new data frames:
s1 = temp_table.set_index(['Manufacturer','Sector']).query("Metric=='Orders'").Value
s2 = temp_table.set_index(['Manufacturer','Sector']).query("Metric=='Sales'").Value
Then, I unstack these data frames:
s1.div(s2).unstack()
Which gives me:
Sector Germany USA
Manufacturer
---
BMW 19.117647 11.052632
Ford 42.592593 13.333333
Mercedes 20.454545 13.829787
Then, I reset the index:
df_out = s1.div(s2).reset_index()
Which gives me:
Manufacturer Sector Value
0 BMW Germany 19.117647
1 BMW USA 11.052632
2 Ford Germany 42.592593
3 Ford USA 13.333333
4 Mercedes Germany 20.454545
5 Mercedes USA 13.829787
I would like to be able to round the Value column to 2 decimal places.
I tried to use the round() function, as follows:
df_out['Value'].round(2)
But, this doesn't seem to affect the values when I call df_out again.
What is the best way to control the decimal precision in this case?
Thanks!

Huggingface's BERT tokenizer not adding pad token

It's not entirely clear from the documentation, but I can see that BertTokenizer is initialised with pad_token='[PAD]', so I assume when you encode with add_special_tokens=True then it would automatically pad it. Given that pad_token_id=0, I can't see any 0s in the token_ids however:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
tokens = tokenizer.tokenize(text)
token_ids = tokenizer.encode(text, add_special_tokens=True, max_length=2048)
# Print the original sentence.
print('Original: ', text)
# Print the sentence split into tokens.
print('\nTokenized: ', tokens)
# Print the sentence mapped to token ids.
print('\nToken IDs: ', token_ids)
Output:
Original: Toronto's key stock index ended higher in brisk trading on Thursday, extending Wednesday's rally despite being weighed down by losses on Wall Street.
The TSE 300 Composite Index rose 29.80 points to close at 5828.62, outperforming the Dow Jones Industrial Average which slumped 21.27 points to finish at 6658.60.
Toronto added to Wednesday's 55-point rally while investors took profits in New York after the Dow's 92-point gains, said MMS International analyst Katherine Beattie.
"That shows that the markets are very fragile," Beattie said. "They (investors) want to take advantage of any strength to sell," she said.
Toronto was also buoyed by its heavyweight gold group which jumped nearly 2.2 percent, aided by firmer COMEX gold prices. The key June contract rose $1.00 to $344.30.
Ten of Toronto's 14 sub-indices posted gains, led by golds, transportation, forestry products and consumer products.
The weak side included conglomerates, base metals and utilities.
Trading was heavy at 100 million shares worth C$1.54 billion ($1.1 billion).
Advancing stocks outnumbered declines 556 to 395, with 276 issues flat.
Among hot stocks, Bre-X Minerals Ltd. rose 0.13 to 2.30 on 5.0 million shares as investors continued to consider the viability of its Busang gold discovery in Indonesia.
Kenting Energy Services Inc. rose 0.25 to 9.05 after Precision Drilling Corp. amended its takeover offer
Bakery and foodstuffs maker George Weston Ltd. jumped 4.50 to close at 74.50, the TSE's top gainer.
Tokenized: ['toronto', "'", 's', 'key', 'stock', 'index', 'ended', 'higher', 'in', 'brisk', 'trading', 'on', 'thursday', ',', 'extending', 'wednesday', "'", 's', 'rally', 'despite', 'being', 'weighed', 'down', 'by', 'losses', 'on', 'wall', 'street', '.', 'the', 'ts', '##e', '300', 'composite', 'index', 'rose', '29', '.', '80', 'points', 'to', 'close', 'at', '58', '##28', '.', '62', ',', 'out', '##per', '##form', '##ing', 'the', 'dow', 'jones', 'industrial', 'average', 'which', 'slumped', '21', '.', '27', 'points', 'to', 'finish', 'at', '66', '##58', '.', '60', '.', 'toronto', 'added', 'to', 'wednesday', "'", 's', '55', '-', 'point', 'rally', 'while', 'investors', 'took', 'profits', 'in', 'new', 'york', 'after', 'the', 'dow', "'", 's', '92', '-', 'point', 'gains', ',', 'said', 'mm', '##s', 'international', 'analyst', 'katherine', 'beat', '##tie', '.', '"', 'that', 'shows', 'that', 'the', 'markets', 'are', 'very', 'fragile', ',', '"', 'beat', '##tie', 'said', '.', '"', 'they', '(', 'investors', ')', 'want', 'to', 'take', 'advantage', 'of', 'any', 'strength', 'to', 'sell', ',', '"', 'she', 'said', '.', 'toronto', 'was', 'also', 'bu', '##oy', '##ed', 'by', 'its', 'heavyweight', 'gold', 'group', 'which', 'jumped', 'nearly', '2', '.', '2', 'percent', ',', 'aided', 'by', 'firm', '##er', 'come', '##x', 'gold', 'prices', '.', 'the', 'key', 'june', 'contract', 'rose', '$', '1', '.', '00', 'to', '$', '344', '.', '30', '.', 'ten', 'of', 'toronto', "'", 's', '14', 'sub', '-', 'indices', 'posted', 'gains', ',', 'led', 'by', 'gold', '##s', ',', 'transportation', ',', 'forestry', 'products', 'and', 'consumer', 'products', '.', 'the', 'weak', 'side', 'included', 'conglomerate', '##s', ',', 'base', 'metals', 'and', 'utilities', '.', 'trading', 'was', 'heavy', 'at', '100', 'million', 'shares', 'worth', 'c', '$', '1', '.', '54', 'billion', '(', '$', '1', '.', '1', 'billion', ')', '.', 'advancing', 'stocks', 'outnumbered', 'declines', '55', '##6', 'to', '395', ',', 'with', '276', 'issues', 'flat', '.', 'among', 'hot', 'stocks', ',', 'br', '##e', '-', 'x', 'minerals', 'ltd', '.', 'rose', '0', '.', '13', 'to', '2', '.', '30', 'on', '5', '.', '0', 'million', 'shares', 'as', 'investors', 'continued', 'to', 'consider', 'the', 'via', '##bility', 'of', 'its', 'bus', '##ang', 'gold', 'discovery', 'in', 'indonesia', '.', 'kent', '##ing', 'energy', 'services', 'inc', '.', 'rose', '0', '.', '25', 'to', '9', '.', '05', 'after', 'precision', 'drilling', 'corp', '.', 'amended', 'its', 'takeover', 'offer', 'bakery', 'and', 'foods', '##tu', '##ffs', 'maker', 'george', 'weston', 'ltd', '.', 'jumped', '4', '.', '50', 'to', 'close', 'at', '74', '.', '50', ',', 'the', 'ts', '##e', "'", 's', 'top', 'gain', '##er', '.']
Token IDs: [101, 4361, 1005, 1055, 3145, 4518, 5950, 3092, 3020, 1999, 28022, 6202, 2006, 9432, 1010, 8402, 9317, 1005, 1055, 8320, 2750, 2108, 12781, 2091, 2011, 6409, 2006, 2813, 2395, 1012, 1996, 24529, 2063, 3998, 12490, 5950, 3123, 2756, 1012, 3770, 2685, 2000, 2485, 2012, 5388, 22407, 1012, 5786, 1010, 2041, 4842, 14192, 2075, 1996, 23268, 3557, 3919, 2779, 2029, 14319, 2538, 1012, 2676, 2685, 2000, 3926, 2012, 5764, 27814, 1012, 3438, 1012, 4361, 2794, 2000, 9317, 1005, 1055, 4583, 1011, 2391, 8320, 2096, 9387, 2165, 11372, 1999, 2047, 2259, 2044, 1996, 23268, 1005, 1055, 6227, 1011, 2391, 12154, 1010, 2056, 3461, 2015, 2248, 12941, 9477, 3786, 9515, 1012, 1000, 2008, 3065, 2008, 1996, 6089, 2024, 2200, 13072, 1010, 1000, 3786, 9515, 2056, 1012, 1000, 2027, 1006, 9387, 1007, 2215, 2000, 2202, 5056, 1997, 2151, 3997, 2000, 5271, 1010, 1000, 2016, 2056, 1012, 4361, 2001, 2036, 20934, 6977, 2098, 2011, 2049, 8366, 2751, 2177, 2029, 5598, 3053, 1016, 1012, 1016, 3867, 1010, 11553, 2011, 3813, 2121, 2272, 2595, 2751, 7597, 1012, 1996, 3145, 2238, 3206, 3123, 1002, 1015, 1012, 4002, 2000, 1002, 29386, 1012, 2382, 1012, 2702, 1997, 4361, 1005, 1055, 2403, 4942, 1011, 29299, 6866, 12154, 1010, 2419, 2011, 2751, 2015, 1010, 5193, 1010, 13116, 3688, 1998, 7325, 3688, 1012, 1996, 5410, 2217, 2443, 22453, 2015, 1010, 2918, 11970, 1998, 16548, 1012, 6202, 2001, 3082, 2012, 2531, 2454, 6661, 4276, 1039, 1002, 1015, 1012, 5139, 4551, 1006, 1002, 1015, 1012, 1015, 4551, 1007, 1012, 10787, 15768, 21943, 26451, 4583, 2575, 2000, 24673, 1010, 2007, 25113, 3314, 4257, 1012, 2426, 2980, 15768, 1010, 7987, 2063, 1011, 1060, 13246, 5183, 1012, 3123, 1014, 1012, 2410, 2000, 1016, 1012, 2382, 2006, 1019, 1012, 1014, 2454, 6661, 2004, 9387, 2506, 2000, 5136, 1996, 3081, 8553, 1997, 2049, 3902, 5654, 2751, 5456, 1999, 6239, 1012, 5982, 2075, 2943, 2578, 4297, 1012, 3123, 1014, 1012, 2423, 2000, 1023, 1012, 5709, 2044, 11718, 15827, 13058, 1012, 13266, 2049, 15336, 3749, 18112, 1998, 9440, 8525, 21807, 9338, 2577, 12755, 5183, 1012, 5598, 1018, 1012, 2753, 2000, 2485, 2012, 6356, 1012, 2753, 1010, 1996, 24529, 2063, 1005, 1055, 2327, 5114, 2121, 1012, 102]

No, it would not. There is a different parameter to allow padding:
transformers >=3.0.0 padding (accepts True, max_length and False as values)
transformers < 3.0.0 pad_to_max_length (accepts True or False as Values)
add_special_tokens will add the [CLS] and the [SEP] token (101 and 102 respectively).

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Creating a list of nested dict from DataFrame? - pandas

Related

How do I get API data into a Pandas DataFrame?

Pandas: How to filter rows in dataframe which is not equal to the combination of columns in other dataframe?

Pandas Outerjoin New Rows

Controlling decimal precision after resetting index of unstacked Pandas data frame

Huggingface's BERT tokenizer not adding pad token

Categories

Resources