Flask button to save table from query as csv - sql

I have a flask app that runs a query and returns a table. I would like to provide a button on the page so the user can export the data as a csv.
The problem is that the query is generated dynamically based on form input.
@app.route('/report/<int:account_id>', methods=['GET'])
def report(account_id):
    if request.method == 'GET':
        c = g.db.cursor()
        c.execute('SELECT * FROM TABLE WHERE account_id = :account_id', account_id=account_id)
        entries = [dict(title=row[0], text=row[1]) for row in c.fetchall()]
        return render_template('show_results.html', entries=entries)
On the HTML side it's just a simple table, looping over the rows and rendering them. I'm using Bootstrap for styling and included a tablesorter jQuery plugin, but none of this is really consequential. I did try one JavaScript exporter I found, but since my content is rendered dynamically, it saves a blank CSV.
Do I need to do some ajax-style trickery to grab a csv object from the route?

I solved this myself, and for anyone who comes across this, I think it's valuable for this specific use case within Flask. Here's what I did.
import cx_Oracle  # We are an Oracle shop, and this changes some things
import csv
import StringIO   # allows you to build the response object in memory instead of on disk
from flask import Flask, make_response, g  # g holds the database handle used below

@app.route('/export/<int:identifier>', methods=['GET'])
def export(identifier):
    si = StringIO.StringIO()
    cw = csv.writer(si)
    c = g.db.cursor()
    c.execute('SELECT * FROM TABLE WHERE column_val = :identifier', identifier=identifier)
    rows = c.fetchall()
    cw.writerow([i[0] for i in c.description])  # header row from the cursor metadata
    cw.writerows(rows)
    response = make_response(si.getvalue())
    response.headers['Content-Disposition'] = 'attachment; filename=report.csv'
    response.headers['Content-type'] = 'text/csv'
    return response
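On the HTML side, the "button" can simply be a styled link that points at this route, e.g. an anchor tag whose href comes from url_for('export', identifier=...) wrapped in Bootstrap button classes (the endpoint name and identifier are whatever your page already has, nothing special). No AJAX trickery is needed: the browser follows the link and the Content-Disposition header turns the response into a file download instead of a page navigation.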

For anyone using Flask with SQLAlchemy, here's an adjustment to tadamhicks' answer, also with a library update:
import csv
from io import StringIO
from flask import make_response
si = StringIO()
cw = csv.writer(si)
records = myTable.query.all() # or a filtered set, of course
# any table method that extracts an iterable will work
cw.writerows([(r.fielda, r.fieldb, r.fieldc) for r in records])
response = make_response(si.getvalue())
response.headers['Content-Disposition'] = 'attachment; filename=report.csv'
response.headers["Content-type"] = "text/csv"
return response
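If you also want a header row without hard-coding the field names, one option (a sketch, assuming myTable is a Flask-SQLAlchemy declarative model as above) is to read the column names from the model's table metadata instead of listing r.fielda, r.fieldb, r.fieldc explicitly:

# Header row from the model's column names, then one tuple per record.
columns = myTable.__table__.columns.keys()
cw.writerow(columns)
cw.writerows([tuple(getattr(r, col) for col in columns) for r in records])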

Related

Twint: Twitter Data Extraction

I am extracting data from Twitter using Twint. The following code extracts tweets that contain my target keyword. After extracting all the relevant tweets, it converts the data to a dataframe (Tweets_df). While it is running, it stores all the data in RAM, which makes it slower. I want to export each tweet and append it to the CSV while the code is running. Any help is appreciated.
c = twint.Config()
c.Lang = "en"
c.Pandas = True
c.Store_csv = True
c.Search = "keyword"
twint.run.Search(c)
Tweets_df = twint.storage.panda.Tweets_df
Here is what I am trying to add: I don't want to wait until it finishes running to export the data.
import csv

for index, tweet in Tweets_df.iterrows():
    with open('filename', 'a', newline='') as csvfile:
        f = csv.writer(csvfile)
        f.writerow(tweet.tolist())  # append this tweet's fields as one CSV row
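One way to avoid building the whole dataframe in RAM (a sketch, assuming Twint's Store_csv/Output options behave as documented) is to let Twint append each tweet to a CSV file itself while the search runs, and only load that file afterwards if you still need a dataframe:

import twint

c = twint.Config()
c.Lang = "en"
c.Search = "keyword"
c.Store_csv = True        # write results out as CSV...
c.Output = "tweets.csv"   # ...appending to this file while the search runs
twint.run.Search(c)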

Appending data into gsheet using google colab without giving a range

I have a simple example to test how to append data to a tab in a gsheet using Colab. For example, here is the first code snippet that writes the data the first time:
from google.colab import auth
auth.authenticate_user()
import pandas as pd
import gspread
from google.auth import default
creds, _ = default()
gc = gspread.authorize(creds)
df = pd.DataFrame({'a': ['apple', 'airplane', 'alligator'], 'b': ['banana', 'ball', 'butterfly'], 'c': ['cantaloupe', 'crane', 'cat']})
df2 = [df.columns.to_list()] + df.values.tolist()
wb = gc.open('test_wbr_feb13')
wsresults2 = wb.worksheet('Sheet2')
wsresults2.update(None, df2)
This works for me, as shown in the screenshot:
First screenshot
Since it is my work account, I am not able to share a link to the gsheet, apologies for that. Next, I need to check whether I can append data to the existing data. To this end, I use the following code:
from gspread_dataframe import get_as_dataframe, set_with_dataframe
wb = gc.open('test_wbr_feb13')
wsresults2 = wb.worksheet('Sheet2')
set_with_dataframe(wsresults2, df)
Please note that we don't know the row at which we need to insert the data; it can vary depending on the data size. But the output is still the same, please see the screenshot. Can I please get help on how to append data to the gsheet using this approach? Thanks.
Second screenshot
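One possible approach (a sketch, using plain gspread rather than gspread_dataframe; the worksheet and dataframe names are the ones from the question) is append_rows, which writes the values after the last non-empty row, so you never have to compute the range yourself:

# Append the dataframe's rows (without the header) below the existing data.
wsresults2.append_rows(df.values.tolist(), value_input_option='USER_ENTERED')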

How to filter Socrata API dataset by multiple values for a single field?

I am attempting to create a CSV file using Python by reading from this specific api:
https://dev.socrata.com/foundry/data.cdc.gov/5jp2-pgaw
Where I'm running into trouble is that I would like to specify multiple values of "loc_admin_zip" to search for at once. For example, returning a CSV file where the zip is either "10001" or "10002". However, I can't figure out how to do this, I can only get it to work if "loc_admin_zip" is set to a single value. Any help would be appreciated. My code so far:
import pandas as pd
from sodapy import Socrata
client = Socrata("data.cdc.gov", None)
results = client.get("5jp2-pgaw",loc_admin_zip = 10002)
results_df = pd.DataFrame.from_records(results)
results_df.to_csv('test.csv')
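One way to match several ZIP codes in a single request (a sketch, assuming sodapy passes SoQL parameters such as where straight through to the API) is a SoQL IN clause:

import pandas as pd
from sodapy import Socrata

client = Socrata("data.cdc.gov", None)
# SoQL where clause: rows matching either ZIP come back from one query.
results = client.get("5jp2-pgaw", where="loc_admin_zip in ('10001','10002')")
results_df = pd.DataFrame.from_records(results)
results_df.to_csv('test.csv')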

Flask: how to paginate cx_Oracle data between successive requests?

My Flask app needs to return a huge dataframe to the client application.
I'm using the pandas function read_sql to fetch chunks of data, for example:
import pandas as pd
sql = "select * from huge_table"
iterator = pd.read_sql(sql, con=my_cx_oracle_connection, chunksize=1000)
Where iterator would be used to fetch the whole data divided into small chunks of 1000 records each:
data = next(iterator, None)
while data is not None:
    yield data
    data = next(iterator, None)
With this approach, I guess I can "stream", or at least paginate, the data just as described in the Flask documentation.
However, to be able to do so, I would need to retain the state of the iterator between the HTTP /GET requests. How should one do this? Do I need some sort of global variable? But then, what about multiple clients?!
I'm missing something to make it work properly, and avoid fetching the same part of the data over and over.
Thanks.
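One common pattern that sidesteps server-side iterator state entirely (a sketch, not from the question; it assumes Oracle 12c+ for OFFSET/FETCH, a stable ORDER BY column, and a page query parameter chosen by the client) is to make each request stateless and re-run the query for just its own slice:

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
PAGE_SIZE = 1000

@app.route('/huge_table')
def huge_table():
    page = int(request.args.get('page', 0))
    # Each GET fetches only its own window, so nothing has to survive
    # between requests and concurrent clients cannot interfere.
    sql = ("select * from huge_table order by id "
           "offset :off rows fetch next :n rows only")
    df = pd.read_sql(sql, con=my_cx_oracle_connection,
                     params={'off': page * PAGE_SIZE, 'n': PAGE_SIZE})
    return jsonify(df.to_dict(orient='records'))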

Scrapy pull data from table rows

I'm trying to pull data from this page using Scrapy: https://www.interpol.int/notice/search/woa/1192802
The spider will crawl multiple pages but I have excluded the pagination code here to keep things simple. The problem is that the number of table rows that I want to scrape on each page can change each time.
So I need a way of scraping all the table data from the page no matter how many table rows it has.
First, I extracted all the table rows on the page. Then, I created a blank dictionary. Next, I tried to loop through each row and put its cell data into the dictionary.
But it does not work and it is returning a blank file.
Any idea what's wrong?
# -*- coding: utf-8 -*-
import scrapy

class Test1Spider(scrapy.Spider):
    name = 'test1'
    allowed_domains = ['interpol.int']
    start_urls = ['https://www.interpol.int/notice/search/woa/1192802']

    def parse(self, response):
        table_rows = response.xpath('//*[contains(@class,"col_gauche2_result_datasheet")]//tr').extract()
        data = {}
        for table_row in table_rows:
            data.update({response.xpath('//td[contains(@class, "col1")]/text()').extract(): response.css('//td[contains(@class, "col2")]/text()').extract()})
            yield data
What is this?
response.css('//td[contains(@class, "col2")]/text()').extract()
You are calling the css() method but you are giving it an XPath expression.
Anyways, here is the 100% working code, I have tested it.
table_rows = response.xpath('//*[contains(@class,"col_gauche2_result_datasheet")]//tr')
data = {}
for table_row in table_rows:
    data[table_row.xpath('td[@class="col1"]/text()').extract_first().strip()] = table_row.xpath('td[@class="col2 strong"]/text()').extract_first().strip()
    yield data
EDIT:
To remove the characters like \t\n\r etc, use regex.
import re
your_string = re.sub('\\t|\\n|\\r', '', your_string)
Try this.
I hope it will help you.
# -*- coding: utf-8 -*-
import scrapy

class Test(scrapy.Spider):
    name = 'test1'
    allowed_domains = ['interpol.int']
    start_urls = ['https://www.interpol.int/notice/search/woa/1192802']

    def parse(self, response):
        table_rows = response.xpath('//*[contains(@class,"col_gauche2_result_datasheet")]//tr')
        for table_row in table_rows:
            current_row = table_row.xpath('.//td/text()').extract()
            print(current_row[0] + current_row[1].strip())
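If you would rather have the spider yield items instead of printing them (a sketch building on the answer above; the spider name is a placeholder, and each row is assumed to hold a label cell followed by a value cell), you can collect the pairs into one dict per page:

import scrapy

class WoaSpider(scrapy.Spider):
    name = 'woa_items'
    allowed_domains = ['interpol.int']
    start_urls = ['https://www.interpol.int/notice/search/woa/1192802']

    def parse(self, response):
        table_rows = response.xpath('//*[contains(@class,"col_gauche2_result_datasheet")]//tr')
        item = {}
        for table_row in table_rows:
            cells = table_row.xpath('.//td/text()').extract()
            if len(cells) >= 2:
                # first cell is the field label, second cell is its value
                item[cells[0].strip()] = cells[1].strip()
        yield item  # one dict per page, however many rows the table has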