I would like to read a pivot table with the Google Sheets API using the Python client, in order to reproduce the pivot table on another sheet.
I saw this in the documentation https://developers.google.com/sheets/api/samples/pivot-tables#read_pivot_table_data but I don't know how to access this API.
The Method: spreadsheets.get reference for Google Sheets shows how to perform a GET in Python using the google-api-python-client library:
"""
BEFORE RUNNING:
---------------
1. If not already done, enable the Google Sheets API
and check the quota for your project at
https://console.developers.google.com/apis/api/sheets
2. Install the Python client library for Google APIs by running
`pip install --upgrade google-api-python-client`
"""
from pprint import pprint
from googleapiclient import discovery
# TODO: Change placeholder below to generate authentication credentials. See
# https://developers.google.com/sheets/quickstart/python#step_3_set_up_the_sample
#
# Authorize using one of the following scopes:
# 'https://www.googleapis.com/auth/drive'
# 'https://www.googleapis.com/auth/drive.file'
# 'https://www.googleapis.com/auth/drive.readonly'
# 'https://www.googleapis.com/auth/spreadsheets'
# 'https://www.googleapis.com/auth/spreadsheets.readonly'
credentials = None
service = discovery.build('sheets', 'v4', credentials=credentials)
# The spreadsheet to request.
spreadsheet_id = 'my-spreadsheet-id' # TODO: Update placeholder value.
# The ranges to retrieve from the spreadsheet.
ranges = [] # TODO: Update placeholder value.
# True if grid data should be returned.
# This parameter is ignored if a field mask was set in the request.
include_grid_data = False # TODO: Update placeholder value.
request = service.spreadsheets().get(spreadsheetId=spreadsheet_id, ranges=ranges, includeGridData=include_grid_data)
response = request.execute()
# TODO: Change code below to process the `response` dict:
pprint(response)
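To get at the pivot table itself, request the grid data (set include_grid_data = True, or use a fields mask such as sheets(data(rowData(values(pivotTable))))). The pivot definition then appears under the anchor cell's pivotTable key, and you can copy it to another sheet with an updateCells request, as in the pivot-tables sample you linked. A minimal sketch, assuming the pivot table is anchored at A1 of the first sheet and the destination sheet already exists (the sheet ID and indices below are placeholders):

# Sketch: read the pivot table definition from the anchor cell and
# copy it to another sheet. Requires include_grid_data=True above.
sheet = response['sheets'][0]
anchor_cell = sheet['data'][0]['rowData'][0]['values'][0]
pivot_table = anchor_cell['pivotTable']  # the full pivot table definition

target_sheet_id = 123456789  # TODO: replace with the destination sheet's ID
body = {
    'requests': [{
        'updateCells': {
            'rows': [{'values': [{'pivotTable': pivot_table}]}],
            'start': {'sheetId': target_sheet_id, 'rowIndex': 0, 'columnIndex': 0},
            'fields': 'pivotTable',
        }
    }]
}
service.spreadsheets().batchUpdate(
    spreadsheetId=spreadsheet_id, body=body).execute()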
If I am running a Jupyter notebook server that an analyst can browse to via the server URL to run IPython (typical IPython, but hosted remotely), can the analyst also export the data in CSV form to their local machine? That is, can they download a CSV file they are working on during a pandas data-wrangling process?
For example, suppose a pandas DataFrame is created in the remotely hosted IPython session from an SQL query, something like the method below:
# Example python program to read data from a PostgreSQL table
# and load into a pandas DataFrame
import psycopg2
import pandas as pds
from sqlalchemy import create_engine
# Create an engine instance (the DSN below is a placeholder; supply your own
# user, password, host and database)
alchemyEngine = create_engine(
    'postgresql+psycopg2://user:password@127.0.0.1/test', pool_recycle=3600)
# Connect to the PostgreSQL server
dbConnection = alchemyEngine.connect()
# Read data from a PostgreSQL database table and load it into a DataFrame instance
dataFrame = pds.read_sql('select * from "StudentScores"', dbConnection)
pds.set_option('display.expand_frame_repr', False)
# Print the DataFrame
print(dataFrame)
# Close the database connection
dbConnection.close()
Would JupyterLab have something similar for serving a static file, like a web server can? For example, a Flask server can serve static files (sorry, I am NOT a web developer, but I have experimented with this in Flask). The Flask code using send_file to serve a static file looks like this:
from flask import Flask, request, send_file
import os

app = Flask(__name__)

@app.route('/get_csv')
def get_csv():
    """
    Returns the monthly weather csv file (Montreal, year=2019)
    corresponding to the month passed as a parameter.
    """
    # Check that the month parameter has been supplied
    if "month" not in request.args:
        return "ERROR: value for 'month' is missing"
    # Also make sure that the value provided is numeric
    try:
        month = int(request.args["month"])
    except ValueError:
        return "ERROR: value for 'month' should be between 1 and 12"
    csv_dir = "./static"
    csv_file = "2019_%02d_weather.csv" % month
    csv_path = os.path.join(csv_dir, csv_file)
    # Also make sure the requested csv file exists
    if not os.path.isfile(csv_path):
        return "ERROR: file %s was not found on the server" % csv_file
    # Send the file back to the client
    # (note: in Flask >= 2.0 this parameter is named download_name)
    return send_file(csv_path, as_attachment=True, attachment_filename=csv_file)
Can IPython or pandas serve a static file if IPython is hosted on a remote machine?
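One simple approach worth trying (a sketch, not from the original thread): the Jupyter server already serves the files in the notebook's working directory, so you can write the DataFrame to CSV and render a download link with IPython.display.FileLink. The file name below is a placeholder:

# Write the DataFrame from the example above to a CSV file in the
# notebook's working directory, then render a clickable download link.
from IPython.display import FileLink

dataFrame.to_csv('StudentScores.csv', index=False)
FileLink('StudentScores.csv')  # the analyst clicks this link to download the CSV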
I would like to post a pickled DataFrame file to a FastAPI route. However, I keep getting an error. Could anyone suggest how to fix my scripts?
This matters to me because I would like to preserve the DataFrame's column types and details. Posting the DataFrame directly to a route is therefore not an option, as those details and types would be lost after JSONification.
main.py
import io

import pandas as pd
from fastapi import FastAPI, File

app = FastAPI(debug=True)

@app.post("/luminex_file/")
async def handle_luminex_data(file: bytes = File(...)):
    df = pd.read_pickle(io.BytesIO(file))
    return {"file_size": len(file), "df": df}
request.py
import requests

files = {'file': open('dataframe.pickle', 'rb')}
response = requests.post("http://127.0.0.1:8000/luminex_file/", files=files)
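Since the error text isn't shown, two common culprits are worth checking (both assumptions): File(...) uploads require the python-multipart package to be installed, and a raw DataFrame in the response dict is not JSON-serializable. A sketch of the route with the second issue addressed:

# Sketch: convert the DataFrame to a JSON-safe structure before returning it.
@app.post("/luminex_file/")
async def handle_luminex_data(file: bytes = File(...)):
    df = pd.read_pickle(io.BytesIO(file))
    # A DataFrame cannot be JSON-serialized directly; return plain records
    return {"file_size": len(file), "df": df.to_dict(orient="records")}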
We are developing a speech application using Google's Speech-to-Text API. Our data (audio files) is stored in an S3 bucket on AWS. Is there a way to pass the S3 URI directly to Google's Speech-to-Text API?
From their documentation, it seems this is currently not possible with the Speech-to-Text API.
This is not the case for their Vision and NLP APIs.
Any idea why the speech API has this limitation?
And what is a good workaround?
Currently, Google only accepts audio files from either your local source or from Google Cloud Storage. The documentation gives no real explanation for this restriction:
Passing audio referenced by a URI
More typically, you will pass a uri parameter within the Speech request's audio field, pointing to an audio file (in binary format, not base64) located on Google Cloud Storage.
I suggest you move your files to Google Cloud Storage. If you don't want to, there is a good workaround:
Use the Google Cloud Speech API with its streaming API. You are not required to store anything anywhere: your speech application takes input from any microphone. And don't worry if you don't know how to handle microphone input.
Google provides sample code that does it all:
# [START speech_transcribe_streaming_mic]
from __future__ import division

import re
import sys

from google.cloud import speech
import pyaudio
from six.moves import queue

# Audio recording parameters
RATE = 16000
CHUNK = int(RATE / 10)  # 100ms


class MicrophoneStream(object):
    """Opens a recording stream as a generator yielding the audio chunks."""

    def __init__(self, rate, chunk):
        self._rate = rate
        self._chunk = chunk

        # Create a thread-safe buffer of audio data
        self._buff = queue.Queue()
        self.closed = True

    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=self._rate,
            input=True,
            frames_per_buffer=self._chunk,
            # Run the audio stream asynchronously to fill the buffer object.
            # This is necessary so that the input device's buffer doesn't
            # overflow while the calling thread makes network requests, etc.
            stream_callback=self._fill_buffer,
        )
        self.closed = False
        return self

    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Signal the generator to terminate so that the client's
        # streaming_recognize method will not block the process termination.
        self._buff.put(None)
        self._audio_interface.terminate()

    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        """Continuously collect data from the audio stream, into the buffer."""
        self._buff.put(in_data)
        return None, pyaudio.paContinue

    def generator(self):
        while not self.closed:
            # Use a blocking get() to ensure there's at least one chunk of
            # data, and stop iteration if the chunk is None, indicating the
            # end of the audio stream.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]

            # Now consume whatever other data's still buffered.
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break

            yield b"".join(data)


def listen_print_loop(responses):
    """Iterates through server responses and prints them.

    The responses passed is a generator that will block until a response
    is provided by the server.

    Each response may contain multiple results, and each result may contain
    multiple alternatives; for details, see the documentation. Here we
    print only the transcription for the top alternative of the top result.

    In this case, responses are provided for interim results as well. If the
    response is an interim one, print a line feed at the end of it, to allow
    the next result to overwrite it, until the response is a final one. For the
    final one, print a newline to preserve the finalized transcription.
    """
    num_chars_printed = 0
    for response in responses:
        if not response.results:
            continue

        # The `results` list is consecutive. For streaming, we only care about
        # the first result being considered, since once it's `is_final`, it
        # moves on to considering the next utterance.
        result = response.results[0]
        if not result.alternatives:
            continue

        # Display the transcription of the top alternative.
        transcript = result.alternatives[0].transcript

        # Display interim results, but with a carriage return at the end of the
        # line, so subsequent lines will overwrite them.
        #
        # If the previous result was longer than this one, we need to print
        # some extra spaces to overwrite the previous result.
        overwrite_chars = " " * (num_chars_printed - len(transcript))

        if not result.is_final:
            sys.stdout.write(transcript + overwrite_chars + "\r")
            sys.stdout.flush()
            num_chars_printed = len(transcript)
        else:
            print(transcript + overwrite_chars)

            # Exit recognition if any of the transcribed phrases could be
            # one of our keywords.
            if re.search(r"\b(exit|quit)\b", transcript, re.I):
                print("Exiting..")
                break

            num_chars_printed = 0


def main():
    language_code = "en-US"  # a BCP-47 language tag

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=RATE,
        language_code=language_code,
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config, interim_results=True
    )

    with MicrophoneStream(RATE, CHUNK) as stream:
        audio_generator = stream.generator()
        requests = (
            speech.StreamingRecognizeRequest(audio_content=content)
            for content in audio_generator
        )
        responses = client.streaming_recognize(streaming_config, requests)

        # Now, put the transcription responses to use.
        listen_print_loop(responses)


if __name__ == "__main__":
    main()
# [END speech_transcribe_streaming_mic]
The dependencies are google-cloud-speech and pyaudio.
For AWS S3, you can store your audio files there before/after you get the transcripts from the Google Speech API.
Streaming is super fast as well.
And don't forget to include your credentials: you need to get authorized first by providing GOOGLE_APPLICATION_CREDENTIALS.
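If you want to keep the files in S3, another workaround (a sketch, assuming boto3 credentials are configured, a LINEAR16 WAV file short enough for a synchronous request, and placeholder bucket/key names) is to download the object and pass its bytes as inline content:

import boto3
from google.cloud import speech

# Fetch the audio bytes from S3 (bucket and key are hypothetical)
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-bucket", Key="audio/sample.wav")
audio_bytes = obj["Body"].read()

# Pass the bytes inline to the Speech-to-Text API
client = speech.SpeechClient()
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
audio = speech.RecognitionAudio(content=audio_bytes)
response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)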
I am currently using Kubeflow as my orchestrator, specifically an AI Platform Pipelines instance hosted on GCP. How do I create run-time parameters using the TensorFlow Extended (TFX) SDK? I suspect this is the class I should use, but the documentation is not very meaningful, nor does it provide any examples: https://www.tensorflow.org/tfx/api_docs/python/tfx/orchestration/data_types/RuntimeParameter
Something like the run-time parameters form in the Kubeflow pipelines UI.
Say, for example, you want to add the module file location as a runtime parameter that is passed to the transform component in your TFX pipeline.
Start by setting up your setup_pipeline.py and defining the module file parameter:
# setup_pipeline.py
from typing import Text
from tfx.orchestration import data_types, pipeline
from tfx.orchestration.kubeflow import kubeflow_dag_runner
from tfx.components import Transform
_module_file_param = data_types.RuntimeParameter(
    name='module-file',
    default='/tfx-src/tfx/examples/iris/iris_utils_native_keras.py',
    ptype=Text,
)
Next, define a function that specifies the components used in your pipeline and pass along the parameter.
def create_pipeline(..., module_file):
    # setup components:
    ...
    transform = Transform(
        ...
        module_file=module_file
    )
    ...
    components = [..., transform, ...]
    return pipeline.Pipeline(
        ...,
        components=components
    )
Finally, set up the Kubeflow DAG runner so that it passes the parameter along to the create_pipeline function. See here for a more complete example.
if __name__ == "__main__":
    # instantiate a kfp_runner
    ...
    kfp_runner = kubeflow_dag_runner.KubeflowDagRunner(
        ...
    )
    kfp_runner.run(
        create_pipeline(..., module_file=_module_file_param)
    )
Then you can run python -m setup_pipeline, which will produce the YAML file that specifies the pipeline config. You can then upload that file to the Kubeflow GCP interface.
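Once uploaded, the runtime parameter shows up as a form field when you create a run, with the default value pre-filled. You can also launch a run programmatically; a hedged sketch using the kfp client (the host and file paths below are placeholders, not from the original):

import kfp

# Connect to your AI Platform Pipelines endpoint (placeholder host)
client = kfp.Client(host='https://<your-pipelines-endpoint>')

# Create a run, overriding the runtime parameter defined above
client.create_run_from_pipeline_package(
    'pipeline.yaml',
    arguments={'module-file': '/path/to/my_module.py'},
)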
I have developed a machine learning Python script (let's call it classify_obj, written in Python 3.6) that imports TensorFlow. It was developed initially for bulk analysis, but now I need to run this script repeatedly on smaller datasets to cater to more real-time usage. I am doing this on Linux RHEL 7.
Process flow:
1. Master tool (written in Java) calls classify_obj with the object input to categorize.
2. classify_obj generates the classification result as a CSV (takes about 7-10s).
3. Master tool reads the result from #2.
4. Master tool proceeds to do other logic.
5. Repeat #1 with the next object input.
To break down the time taken, I switched off the main logic and performed just the module imports without any other action. I found that the imports take about 4-5s of the 7-10s run time on the small dataset, and the classification takes about 2s. I am also looking at other ways to reduce the run time, but the bulk seems to come from the imports.
Import time: 4-6s
Classify time: 1s
Read, write and other logic time: 0.2s
What options are there to reduce the import time?
One idea I had was to modify classify_obj into a "stay alive" process, which the master tool would stop after completing all its activity. The intent (not sure if this would be the case) is that all the required libraries are already loaded during process start, so when the master tool calls that process/service, it only incurs the classification time instead of importing the libraries repeatedly.
What do you think about this? Also, how can I set this up on Linux RHEL 7.4? Some reference links would be greatly appreciated.
Other suggestions would also be greatly appreciated.
Thanks and have a great day!
This is the solution I designed to achieve the above.
Reference: https://realpython.com/python-sockets/
I had to create 2 scripts:
1. Client Python script: passes the raw data to be classified to the server script using socket programming.
2. Server Python script: loads the keras (TensorFlow) library and model at launch, and stays alive until a 'stop' request from the client (to exit the while loop). When the client script sends data, the server script processes it and returns an ok/not-ok output back to the client.
In the end, the classification time is reduced to 0.1-0.3s.
Client Script
import socket
from argparse import ArgumentParser

def main():
    parser = ArgumentParser(description='XXXXX')
    parser.add_argument('-i', '--input', default='NA', help='Input txt file path')
    parser.add_argument('-o', '--output', default='NA', help='Output csv path with class')
    parser.add_argument('-stop', '--stop', default='no', help='Stop the server script')
    args = parser.parse_args()

    # Pack the arguments into a single comma-separated message
    # (renamed from `str` to avoid shadowing the built-in)
    message = args.input + ',' + args.output + ',' + args.stop

    HOST = '127.0.0.1'  # The server's hostname or IP address
    PORT = 65432        # The port used by the server

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((HOST, PORT))
    sock.send(message.encode())
    data = sock.recv(1024)
    print('Received', data)

if __name__ == "__main__":
    main()
Server Script
import socket

def main():
    HOST = '127.0.0.1'  # Standard loopback interface address (localhost)
    PORT = 65432        # Port to listen on (non-privileged ports are > 1023)

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((HOST, PORT))
    sock.listen(5)

    stop_process = 'no'
    while stop_process == 'no':
        # print('Waiting for connection')
        conn, addr = sock.accept()
        try:
            # print('Connected by', addr)
            while True:
                data = conn.recv(1024)
                if data:
                    # process_input (defined elsewhere) processes the incoming
                    # data; if the client sent 'yes' for the stop argument, it
                    # returns 'yes' and the outer loop exits.
                    stop_process = process_input(data)
                    conn.sendall(stop_process.encode())  # send reply back to client
                else:
                    break
            # print('Closing connection', addr)
        finally:
            conn.close()

if __name__ == "__main__":
    main()
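For completeness, a hypothetical invocation (the script file names below are assumptions, not from the original):

# python server.py                              -> start once; TensorFlow/keras imports and model load happen here
# python client.py -i input.txt -o output.csv   -> per-object classification request from the master tool
# python client.py -stop yes                    -> shut down the server loop when all objects are done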