Appending data into gsheet using google colab without giving a range - google-colaboratory

I have a simple example to test how to append data to a tab in a Google Sheet using Colab. For example, here is the first code snippet, which writes the data for the first time:
from google.colab import auth
auth.authenticate_user()
import gspread
import pandas as pd
from google.auth import default
creds, _ = default()
gc = gspread.authorize(creds)
df = pd.DataFrame({'a': ['apple', 'airplane', 'alligator'], 'b': ['banana', 'ball', 'butterfly'], 'c': ['cantaloupe', 'crane', 'cat']})
df2 = [df.columns.to_list()] + df.values.tolist()
wb = gc.open('test_wbr_feb13')
wsresults2 = wb.worksheet('Sheet2')
wsresults2.update(None, df2)
This works for me, as shown in the screenshot:
First screenshot
Since it is my work account, I am not able to share a link to the Google Sheet; apologies for that. Next I need to check whether we can append data to the existing data. To this end, I use the following code:
from gspread_dataframe import get_as_dataframe, set_with_dataframe
wb = gc.open('test_wbr_feb13')
wsresults2 = wb.worksheet('Sheet2')
set_with_dataframe(wsresults2, df)
Please note that we don't know in advance the row from which the new data needs to be inserted; it can vary depending on the data size. But the output is still the same, please see the screenshot. Can I please get help on how to append data to the Google Sheet using this approach? Thanks.
Second screenshot
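One possible way to append without specifying a range (a sketch based on gspread's append_rows method, not something from the original post) is to let gspread find the first empty row itself:
# A minimal sketch, assuming the same workbook and worksheet as above.
# append_rows() writes the values after the last non-empty row, so the
# insertion row does not need to be known in advance.
wb = gc.open('test_wbr_feb13')
wsresults2 = wb.worksheet('Sheet2')
wsresults2.append_rows(df.values.tolist(), value_input_option='USER_ENTERED')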

Related

How to filter Socrata API dataset by multiple values for a single field?

I am attempting to create a CSV file using Python by reading from this specific API:
https://dev.socrata.com/foundry/data.cdc.gov/5jp2-pgaw
Where I'm running into trouble is that I would like to specify multiple values of "loc_admin_zip" to search for at once, for example returning a CSV file where the zip is either "10001" or "10002". However, I can't figure out how to do this; I can only get it to work if "loc_admin_zip" is set to a single value. Any help would be appreciated. My code so far:
import pandas as pd
from sodapy import Socrata

client = Socrata("data.cdc.gov", None)
results = client.get("5jp2-pgaw", loc_admin_zip=10002)
results_df = pd.DataFrame.from_records(results)
results_df.to_csv('test.csv')
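A possible way to filter on several ZIP codes at once (a sketch, not from the original post) is to pass a SoQL where clause; Socrata's $where supports the in(...) operator and sodapy forwards it via the where keyword:
import pandas as pd
from sodapy import Socrata

client = Socrata("data.cdc.gov", None)
# SoQL "in" filter; keep the quotes if the field is stored as text, drop them if it is numeric
results = client.get("5jp2-pgaw", where="loc_admin_zip in('10001', '10002')")
results_df = pd.DataFrame.from_records(results)
results_df.to_csv('test.csv')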

How to import Pandas data frames in a loop [duplicate]

So what I'm trying to do is the following:
I have 300+ CSVs in a certain folder. What I want to do is open each CSV and take only the first row of each.
What I wanted to do was the following:
import os
list_of_csvs = os.listdir() # puts all the names of the csv files into a list.
The above generates a list for me like ['file1.csv','file2.csv','file3.csv'].
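As a small aside (not part of the original post): os.listdir() returns every entry in the directory, so if the folder contains anything besides CSVs you may want to filter explicitly:
import os

# keep only the .csv files
list_of_csvs = [f for f in os.listdir() if f.endswith('.csv')]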
This is great and all, but where I get stuck is the next step. I'll demonstrate this using pseudo-code:
import pandas as pd
for index, file in enumerate(list_of_csvs):
    df{index} = pd.read_csv(file)
Basically, I want my for loop to iterate over my list_of_csvs object and read the first item into df1, the second into df2, etc. But upon trying to do this I realized I have no idea how to change the variable being assigned when the assignment happens inside an iteration!
That's what prompts my question. I managed to find another way to get my original job done without a problem, but this issue of doing variable assignment over an iteration is something I haven't been able to find clear answers on!
If I understand your requirement correctly, we can do this quite simply. Let's use pathlib, added in Python 3.4+, instead of os:
import pandas as pd
from pathlib import Path

csvs = Path.cwd().glob('*.csv')  # creates a generator of the matching paths
# swap Path.cwd() for Path(your_path) if the script is in a different location
dfs = {}  # let's hold the csvs in this dictionary
for file in csvs:
    dfs[file.stem] = pd.read_csv(file, nrows=3)  # change nrows (number of rows) to your spec

# or with a dict comprehension
dfs = {file.stem: pd.read_csv(file) for file in Path(r'location\of\your\files').glob('*.csv')}
This will return a dictionary of DataFrames keyed by the CSV file name; .stem gives the file name without its extension.
much like
{
'csv_1' : dataframe,
'csv_2' : dataframe
}
If you want to concat these, then do
df = pd.concat(dfs)
and the first level of the index will be the csv file name.
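Since the original goal was just the first row of every file, a compact variant along the same lines (a sketch, not from the answer above) is:
import pandas as pd
from pathlib import Path

# one row per file, stacked into a single frame; the first index level is the file name
first_rows = pd.concat(
    {f.stem: pd.read_csv(f, nrows=1) for f in Path.cwd().glob('*.csv')}
)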

How to properly iterate over a for loop using Dask?

When I run a loop like the one below using Dask and pandas, only the last field in the list gets evaluated. Presumably this is because of "lazy evaluation".
import pandas as pd
import dask.dataframe as ddf

# df is an existing pandas DataFrame; fields is a list of values to match against
df_dask = ddf.from_pandas(df, npartitions=16)
for field in fields:
    df_dask["column__{field}".format(field=field)] = df_dask["column"].apply(
        lambda _: [__ for __ in _ if (__ == field)], meta=list
    )
If I add .compute() to the last line:
df_dask["column__{field}".format(field=field)] = df_dask["column"].apply(lambda _: [__ for __ in _ if (__ == field)], meta=list).compute()
it then works correctly, but is this the most efficient way of doing this operation? Is there a way for Dask to add all the items from the fields list at once, and then run them in one-shot via compute()?
edit ---------------
Please see screenshot below for a worked example
You will want to call .compute() at the end of your computation to trigger work. Warning: .compute() assumes that your result will fit in memory.
Also, watch out: lambdas late-bind in Python, so the field value may end up being the same for all of your columns.
Here's one way to do it, where string_check is just a sample function that returns True/False. The issue was the late binding of lambda functions.
from functools import partial

def string_check(string, search):
    return search in string

search_terms = ['foo', 'bar']

for s in search_terms:
    string_check_partial = partial(string_check, search=s)
    df[s] = df['YOUR_STRING_COL'].apply(string_check_partial)
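To also cover the "one-shot" part of the question: all the columns can be added lazily first and the whole frame materialised with a single compute() at the end. A sketch, assuming df_dask and fields exist as in the question:
from functools import partial

def keep_matches(values, field):
    # keep only the entries equal to the requested field value
    return [v for v in values if v == field]

for field in fields:
    # partial() binds field immediately, avoiding the lambda late-binding issue
    df_dask["column__{field}".format(field=field)] = df_dask["column"].apply(
        partial(keep_matches, field=field), meta=list
    )

result = df_dask.compute()  # one pass over the data; assumes the result fits in memory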

Flask button to save table from query as csv

I have a flask app that runs a query and returns a table. I would like to provide a button on the page so the user can export the data as a csv.
The problem is that the query is generated dynamically based on form input.
@app.route('/report/<int:account_id>', methods=['GET'])
def report(account_id):
    if request.method == 'GET':
        c = g.db.cursor()
        c.execute('SELECT * FROM TABLE WHERE account_id = :account_id', account_id=account_id)
        entries = [dict(title=row[0], text=row[1]) for row in c.fetchall()]
        return render_template('show_results.html', entries=entries)
On the HTML side it's just a simple table, looping over the rows and rendering them. I'm using Bootstrap for styling and included a tablesorter jQuery plugin. None of this is really consequential. I did try one JavaScript exporter I found, but since my content is rendered dynamically, it saves a blank CSV.
Do I need to do some ajax-style trickery to grab a csv object from the route?
I solved this myself; for anyone who comes across this, I think it is valuable for this specific use case within Flask. Here's what I did:
import cx_Oracle  # we are an Oracle shop, and this changes some things
import csv
import StringIO  # allows you to store the response object in memory instead of on disk
from flask import Flask, make_response  # necessary imports, should be obvious

@app.route('/export/<int:identifier>', methods=['GET'])
def export(identifier):
    si = StringIO.StringIO()
    cw = csv.writer(si)
    c = g.db.cursor()
    c.execute('SELECT * FROM TABLE WHERE column_val = :identifier', identifier=identifier)
    rows = c.fetchall()
    cw.writerow([i[0] for i in c.description])
    cw.writerows(rows)
    response = make_response(si.getvalue())
    response.headers['Content-Disposition'] = 'attachment; filename=report.csv'
    response.headers["Content-type"] = "text/csv"
    return response
For anyone using Flask with SQLAlchemy, here's an adjustment to tadamhicks' answer, also with a library update:
import csv
from io import StringIO
from flask import make_response

@app.route('/export', methods=['GET'])  # example route; adjust to your app
def export():
    si = StringIO()
    cw = csv.writer(si)
    records = myTable.query.all()  # or a filtered set, of course
    # any table method that extracts an iterable will work
    cw.writerows([(r.fielda, r.fieldb, r.fieldc) for r in records])
    response = make_response(si.getvalue())
    response.headers['Content-Disposition'] = 'attachment; filename=report.csv'
    response.headers["Content-type"] = "text/csv"
    return response

Deep copy of a pandas panel?

When I try to copy a pandas Panel object using the instructions provided in the online documentation, I do not get the expected behavior.
Maybe this will illustrate the problem:
import numpy as np
import pandas as pd

# make the first panel with some bogus numbers
dates = pd.date_range('20130101', periods=6)
df1 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('EFGH'))
pnl = {}
pnl['alpha'] = df1
pnl['beta'] = df2

# copy pnl into pnl2
# according to the online docs the default is 'deep=True',
# but it chokes when I try to specify deep=True
pnl2 = pnl.copy()

# now delete column C from pnl2['alpha']
del pnl2['alpha']['C']

# now when I try to find column C in the original panel (pnl), it's gone!
I figure there must be a slick solution to this, but I couldn't find it in the online docs, nor in Wes McKinney's book (my only book on pandas...).
Any tips/advice much appreciated!
You didn't make a Panel, just a dict of DataFrames. Add this line to convert it to a Panel object, and it should work as you expect.
pnl = pd.Panel(pnl)
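As a side note (not part of the original answer): if you would rather keep the plain dict, the underlying problem is that dict.copy() is shallow, so both dicts point at the same DataFrame objects. Copying each frame explicitly gives independent copies:
# DataFrame.copy() is deep by default, so deleting a column in pnl2
# no longer touches the frames stored in pnl
pnl2 = {key: frame.copy() for key, frame in pnl.items()}
del pnl2['alpha']['C']  # pnl['alpha'] still has column 'C'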