Scraping data from a website with Selenium and pandas

I'm scraping a website whose table looks like this. I think it is actually two tables, because the rank and names are in a separate table, so I'm not sure how to get everything and put it all together into one CSV.
This is the website I want to scrape; without a membership only part of the table is shown:
https://fantasydata.com/nba/dfs-projections/fanduel?date=02-03-2022&dfsoperator=2&dfsslateid=18504
[Screenshot of the table]
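I'm using:
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, "table")))
dfs = pd.read_html(driver.page_source, header=None)
driver.implicitly_wait(120)
dvp_projections = {}
for idx, table in enumerate(dfs):
    temp_df = table.iloc[1:]
    dvp_projections[idx] = temp_df
    temp_df.to_csv('/home/joe/NBA/Sportsdata_dvp.csv', index=False)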
But I'm only getting this, and I'm also missing the header:
[Screenshot of the output]

What you'll want to do is join/merge/concat the tables. But you'll want to concatenate them along the columns, not the rows (stacking rows, axis=0, is what pd.concat() does by default).
So try:
df = pd.concat(dfs, axis=1)
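A minimal sketch of the whole flow, assuming `dfs` holds the rank/name table and the stats table in the same player order. The two DataFrames below are made-up stand-ins for what `pd.read_html(driver.page_source)` would return, so the example runs without Selenium; the column names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the list returned by pd.read_html(driver.page_source):
# one table holding rank/name and a second holding the projections (made-up values).
dfs = [
    pd.DataFrame({"Rk": [1, 2, 3], "Name": ["Player A", "Player B", "Player C"]}),
    pd.DataFrame({"Pos": ["PG", "SF", "C"], "Salary": [9000, 8500, 7000]}),
]

# axis=1 puts the tables side by side; rows are paired by their 0..n-1 index,
# so this assumes both tables list the players in the same order.
combined = pd.concat(dfs, axis=1)

# Write once, after combining, instead of inside a loop that overwrites the file
# on every pass. header=True is the default, so column names become the first line.
csv_text = combined.to_csv(index=False)  # pass a file path instead to write to disk
print(csv_text)
```

Writing a single file after the concat also avoids slicing with `iloc[1:]`, which drops the first data row of each table rather than a header.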

Related

How can I iterate/transpose/append a data frame to another one?

for row in range(1, len(df)):
    try:
        df_out, orthogroup, len_group = HOG_get_group_stats(df.loc[row, "HOG"])
        temp_df = pd.DataFrame()
        for id in range(len(df_out)):
            print(" ")
            temp_df = pd.concat([df, pd.DataFrame(df_out.iloc[id, :]).T], axis=1)
            temp_df["HOG"] = orthogroup
            temp_df["len_group"] = len_group
            print(temp_df)
    except:
        print(row, "no")
Here I have a script that does the following:
1. Iterate over df, apply the HOG_get_group_stats function to the HOG column and get 3 variables as outputs. (Basically, the function computes some stats as a data frame called df_out and extracts two more pieces of information, orthogroup and len_group.)
2. Create an empty data frame called temp_df.
3. Transpose the df_out data frame into a single row and concatenate it, as columns, with the df we started with.
4. Add orthogroup and len_group as columns at the end of temp_df.
Problem:
It prints out the data; however, when I try to view temp_df as a data frame, it shows only a single row (probably the last one), which means my concatenation of several data frames doesn't work.
Questions:
How can I iterate and then append a data frame as columns?
Is there an easier way to iterate over a data frame (e.g. iterrows)?
Is there a better way to transpose rows to columns in a data frame (e.g. pivot, melt)?
Any help would be appreciated!
You can find sample files for df, df_out, temp_df and the expected output table here:
Sample_files
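In the loop above, temp_df is rebuilt from scratch on every pass, so only the last result survives. A common pattern is to collect one flattened record per input row in a list and build the DataFrame once after the loop. The sketch below assumes a hypothetical stub for HOG_get_group_stats (the real function and sample files are not shown here) and, unlike `range(1, len(df))`, it also processes row 0.

```python
import pandas as pd

# Hypothetical stand-in for the asker's HOG_get_group_stats: returns a small
# stats frame, an orthogroup label and a group size for one HOG id.
def HOG_get_group_stats(hog):
    df_out = pd.DataFrame({"stat_a": [1, 2], "stat_b": [3, 4]})
    return df_out, f"OG_{hog}", len(df_out)

df = pd.DataFrame({"HOG": ["H1", "H2", "H3"]})

records, labels = [], []  # one flattened record per input row; DataFrame built once
for idx, hog in df["HOG"].items():
    try:
        df_out, orthogroup, len_group = HOG_get_group_stats(hog)
    except Exception as exc:  # report the failing row and keep going
        print(idx, "no", exc)
        continue
    # Flatten df_out (n rows x m columns) into one row: stat_a_0, stat_b_0, stat_a_1, ...
    flat = {f"{col}_{i}": df_out.at[i, col] for i in df_out.index for col in df_out.columns}
    flat.update(orthogroup=orthogroup, len_group=len_group)
    records.append(flat)
    labels.append(idx)

wide = pd.DataFrame(records, index=labels)  # one row per processed HOG
result = df.join(wide)  # align on the original row labels; failed rows get NaN
print(result)
```

Joining on the original row labels, rather than merging on HOG, keeps the result correct even if the same HOG appears more than once in df.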

How to index a column with two values in pandas

I have two dataframes:
Dataframe #1
Reads the values (I'm only interested in NodeID and GSE):
sta = pd.read_csv(filename)
Dataframe #2
Reads the file, then uses pivot to get the following result:
sim = pd.read_csv(headout,index_col=0)
sim['Layer'] = sim.groupby('date').cumcount() + 1
sim['Layer'] = 'L' + sim['Layer'].astype(str)
sim = sim.pivot(index = None , columns = 'Layer').T
This gives me an index with two levels (a MultiIndex): the first level has a blank header and the second is named Layer, e.g. (1, L1).
What I need help with:
I can't find a way to rename that first, blank index level to 'NodeID'.
I want to name it so that I can do a lookup on NodeID in both dataframes and bring the 'GSE' values from the first dataframe into the second.
I have been googling ways to rename that first column in the second dataframe and can't find a solution. Any ideas would help at this point. I think my pivot function might be wrong...
This is a picture of dataframe #2 before the pivot; the numbers 1-4 are the Node IDs.
When I export it to csv to see what the dataframe looks like, I get this:
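Try
df = df.rename(columns={"Index": "your preferred name"})
If it is your index, move it into a column first and then rename it (reset_index calls an unnamed single index "index", but an unnamed level of a MultiIndex becomes "level_0"):
df = df.reset_index()
df = df.rename(columns={"index": "your preferred name"})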
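For the MultiIndex produced by the pivot in the question, the levels can also be named directly with `rename_axis` before looking up GSE. This is a sketch on made-up data; it assumes `sim` is indexed by date (from `index_col=0`) with one column per NodeID, and that read_csv gives those headers as strings, which is why NodeID is cast to int before merging with `sta`.

```python
import pandas as pd

# Hypothetical sample data. Assumptions: `sim` is indexed by date (read_csv with
# index_col=0) and has one column per NodeID; read_csv gives those headers as strings.
sta = pd.DataFrame({"NodeID": [1, 2, 3, 4], "GSE": [10.5, 11.0, 9.8, 12.1]})
sim = pd.DataFrame(
    {"1": [0.1, 0.2, 0.3, 0.4], "2": [0.5, 0.6, 0.7, 0.8],
     "3": [0.9, 1.0, 1.1, 1.2], "4": [1.3, 1.4, 1.5, 1.6]},
    index=pd.Index(["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"], name="date"),
)

sim["Layer"] = sim.groupby(level="date").cumcount() + 1
sim["Layer"] = "L" + sim["Layer"].astype(str)
sim = sim.pivot(columns="Layer").T  # omitting index= keeps the existing date index

# The transposed index has two levels: (unnamed node IDs, "Layer"). Name both.
sim = sim.rename_axis(["NodeID", "Layer"])

# Turn the levels into columns, match the NodeID dtype to `sta`, then look up GSE.
long = sim.reset_index()
long["NodeID"] = long["NodeID"].astype(int)
merged = long.merge(sta[["NodeID", "GSE"]], on="NodeID", how="left")
print(merged)
```

Setting `sim.index.names = ["NodeID", "Layer"]` in place is an equivalent alternative to `rename_axis`.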

How to concat 3 dataframes with each into sequential columns

I'm trying to understand how to concat three individual dataframes (e.g. df1, df2, df3) into a new dataframe, say df4, in which each individual dataframe's columns appear in left-to-right order.
I've tried using concat with axis=1 to do this, but it doesn't seem possible to do it in a single step.
Table1_updated = pd.DataFrame(columns=['3P','2PG-3Io','3Io'])
Table1_updated=pd.concat([get_table1_3P,get_table1_2P_max_3Io,get_table1_3Io])
Note that, with the exception of get_table1_2P_max_3Io, which has two columns, all the other dataframes have one column.
For example, here are get_table1_3P, get_table1_2P_max_3Io and get_table1_3Io:
[Screenshots of the three tables]
Ultimately, I would like to see the following:
I believe you need to concat first, with axis=1, and then change the column order with a list of column names:
Table1_updated=pd.concat([get_table1_3P,get_table1_2P_max_3Io,get_table1_3Io], axis=1)
Table1_updated = Table1_updated[['3P','2PG-3Io','3Io']]
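A small runnable sketch of the two steps above. The data and the second column name of the two-column frame ("2P_max") are made up for illustration; the real tables were only shown as images.

```python
import pandas as pd

# Hypothetical stand-ins: two single-column frames and one two-column frame,
# all sharing the same default index (0, 1, 2).
get_table1_3P = pd.DataFrame({"3P": [1.0, 2.0, 3.0]})
get_table1_2P_max_3Io = pd.DataFrame({"2PG-3Io": [4.0, 5.0, 6.0], "2P_max": [7.0, 8.0, 9.0]})
get_table1_3Io = pd.DataFrame({"3Io": [10.0, 11.0, 12.0]})

# axis=1 places the frames side by side in list order, aligning rows on the index.
Table1_updated = pd.concat([get_table1_3P, get_table1_2P_max_3Io, get_table1_3Io], axis=1)
print(Table1_updated.columns.tolist())  # ['3P', '2PG-3Io', '2P_max', '3Io']

# Selecting a list of names reorders the columns; any name left out is dropped.
Table1_updated = Table1_updated[["3P", "2PG-3Io", "3Io"]]
print(Table1_updated)
```

Because concat with axis=1 aligns on the index, frames whose indexes differ may need `reset_index(drop=True)` first to be paired by position.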

Merge two csv files that have a similar row structure but no common index between them

I have two csv files that I want to merge by adding the columns from one csv to the other. However, they have no common index between them, but they do have the same number of rows (and the rows are in order). I have seen many examples of joining csv files based on an index or on matching values, but my csv files have no common index; they are just in the same order. I've tried a few different examples with no luck.
mycsvfile1
"a","1","mike"
"b","2","sally"
"c","3","derek"
mycsvfile2
"boy","63","retired"
"girl","55","employed"
"boy","22","student"
Desired outcome for outcsvfile3
"a","1","mike","boy","63","retired"
"b","2","sally","girl","55","employed"
"c","3","derek","boy","22","student"
Code:
import pandas as pd
df1 = pd.read_csv("mycsvfile1.csv", header=None)
df2 = pd.read_csv("mycsvfile2.csv", header=None)
# Both frames have columns 0, 1, 2, so merge joins on all three and no rows match
df3 = pd.merge(df1, df2)
Using
df3 = pd.concat([df1, df2])
adds the data as new rows, which doesn't help me. Any assistance is greatly appreciated.
If both dataframes have numbered indexes (i.e. starting at 0 and increasing by 1 - which is the default behaviour of pd.read_csv), and assuming that both DataFrames are already sorted in the correct order so that the rows match up, then this should do it:
df3 = pd.merge(df1,df2, left_index=True, right_index=True)
You do not have any common columns between df1 and df2 besides the index, so we can use concat along the columns:
pd.concat([df1, df2], axis=1)
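An end-to-end sketch of the concat approach that reproduces the desired outcome, including the quoted fields. In-memory `io.StringIO` objects stand in for mycsvfile1.csv and mycsvfile2.csv so the example is self-contained; with real files, pass the paths to `pd.read_csv` and `to_csv` instead.

```python
import csv
import io
import pandas as pd

# In-memory stand-ins for mycsvfile1.csv and mycsvfile2.csv.
file1 = io.StringIO('"a","1","mike"\n"b","2","sally"\n"c","3","derek"\n')
file2 = io.StringIO('"boy","63","retired"\n"girl","55","employed"\n"boy","22","student"\n')

df1 = pd.read_csv(file1, header=None)
df2 = pd.read_csv(file2, header=None)

# Both frames get a default 0..n-1 index, so axis=1 pairs rows by position.
# ignore_index=True renumbers the columns 0..5 instead of repeating 0, 1, 2.
df3 = pd.concat([df1, df2], axis=1, ignore_index=True)

# QUOTE_ALL reproduces the quoted style of the input; no header row, no index column.
out = df3.to_csv(header=False, index=False, quoting=csv.QUOTE_ALL)
print(out)
```

To produce outcsvfile3 on disk, the same call becomes `df3.to_csv("outcsvfile3.csv", header=False, index=False, quoting=csv.QUOTE_ALL)`.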

Is there a faster way through list comprehension to iterate through two dataframes?

I have two dataframes: one contains screen names/display names and the other contains individuals. I am trying to create a third dataframe that contains all the data from both dataframes, with a new row for each time a last name appears in a screen name/display name. Functionally, this will create a list of possible matching names. My current code, which works perfectly but very slowly, looks like this:
import pandas as pd

# Original social media screen names
# cols = 'UserID', 'Screen_Name', 'Real_Name'
usernames = pd.read_csv('social_media_accounts.csv')
# List of individuals to match to accounts
# cols = 'First_Name', 'Last_Name'
individuals = pd.read_csv('individuals_list.csv')
userid, screen_name, real_name, last_name, first_name = [], [], [], [], []
for index1, row1 in individuals.iterrows():
    for index2, row2 in usernames.iterrows():
        # Keep the pair if the last name occurs (case-insensitively) in either name field
        if (row2['Screen_Name'].lower().find(row1['Last_Name'].lower()) != -1) | (row2['Real_Name'].lower().find(row1['Last_Name'].lower()) != -1):
            userid.append(row2['UserID'])
            screen_name.append(row2['Screen_Name'])
            real_name.append(row2['Real_Name'])
            last_name.append(row1['Last_Name'])
            first_name.append(row1['First_Name'])
cols = ['UserID', 'Screen_Name', 'Real_Name', 'Last_Name', 'First_Name']
index = range(0, len(userid))
match_list = pd.DataFrame(index=index, columns=cols)
match_list = match_list.fillna('')
match_list['UserID'] = userid
match_list['Screen_Name'] = screen_name
match_list['Real_Name'] = real_name
match_list['Last_Name'] = last_name
match_list['First_Name'] = first_name
Because I need the whole row from each dataframe, the list comprehension methods I have tried do not seem to work.
What you want is to iterate through a dataframe faster. Doing that with a list comprehension means taking data out of a pandas dataframe, handling it with Python operations, and then putting it back into a pandas dataframe. The fastest way (currently, with small data) is to let pandas do the work with its own methods instead of Python-level loops.
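The next thing you want to do is work with 2 dataframes. There is a tool in pandas for that: merge (or join). Since no column has equal values in both dataframes (you are looking for a last name inside a screen name), a cross merge (pandas 1.2 or later) that pairs every account with every individual fits here:
result = pd.merge(usernames, individuals, how='cross')
After the merge you can do your filtering, keeping only the rows where Last_Name appears in Screen_Name or Real_Name.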
Here is the documentation: http://pandas.pydata.org/pandas-docs/stable/merging.html
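A minimal sketch of the merge-then-filter idea, assuming pandas 1.2+ for `how="cross"` and using small made-up samples with the column names from the question's code. `str.contains` takes a single pattern, so the row-wise "is this column's value inside that column's value" test is done with one comprehension over zipped columns, which still avoids the nested `iterrows` loops.

```python
import pandas as pd

# Small hypothetical samples with the column names used in the question's code.
usernames = pd.DataFrame({
    "UserID": [101, 102, 103],
    "Screen_Name": ["jsmith_88", "coolcat", "the_real_doe"],
    "Real_Name": ["John Smith", "Mary Jones", "Jane D."],
})
individuals = pd.DataFrame({
    "First_Name": ["John", "Mary", "Jane"],
    "Last_Name": ["Smith", "Jones", "Doe"],
})

# Cartesian product: every account paired with every individual.
pairs = usernames.merge(individuals, how="cross")

# Case-insensitive test: does the last name occur in the screen name or real name?
screen = pairs["Screen_Name"].str.lower()
real = pairs["Real_Name"].str.lower()
last = pairs["Last_Name"].str.lower()
mask = [ln in sn or ln in rn for ln, sn, rn in zip(last, screen, real)]

cols = ["UserID", "Screen_Name", "Real_Name", "Last_Name", "First_Name"]
match_list = pairs.loc[mask, cols].reset_index(drop=True)
print(match_list)
```

The cross merge has len(usernames) x len(individuals) rows, so for large inputs it may be necessary to process individuals in chunks and concatenate the filtered pieces.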