Is there a solution to uploading a CSV file to SQL - sql

Any time I try uploading a CSV file to Google Cloud BigQuery, I keep getting an error response. I tried uploading via Google Drive, but the preview button won't show on the table. I need help on how I can resolve this, please.

You may want to try loading the CSV data from Cloud Storage. I used the following Python code and was able to load a CSV file into BigQuery successfully:
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
    skip_leading_rows=1,
    # The source format defaults to CSV, so the line below is optional.
    source_format=bigquery.SourceFormat.CSV,
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)  # Make an API request.
print("Loaded {} rows.".format(destination_table.num_rows))

Related

Loading dataset from external file system (https) directly into Spark dataframe

I'm trying to load a CSV dataset directly from an external file system, but I'm getting a 401 Unauthorized response whenever I call sparkContext.addFile(). Is there a way to add authorization headers to the request before adding the file? Or is there a better way to load a CSV file as a dataframe?
This is what I'm trying now; it throws an exception when I make the addFile() call.
import org.apache.spark.SparkFiles

spark.sparkContext.addFile(urlPath)
val df = spark.read
  .option("header", true)
  .csv("file://" + SparkFiles.get(urlPath))

I want to download BigQuery results from my own Python code and something is wrong with my authentication

I am trying to download BigQuery results from GCP, and I was following the instructions in the GCP documentation on authentication. It tells me to create a service account, which I did; however, the output tells me that this service account has no permission to access the table:
google.api_core.exceptions.Forbidden: 403 Access Denied: Table dbd-sdlc-prod:HKG_NORMALISED.HKG_NORMALISED: User does not have permission to query table dbd-sdlc-prod:HKG_NORMALISED.HKG_NORMALISED.
This reminds me that the table I wish to query was provided by a third party. They granted my account permission to access the data, and the permission was granted only to my Google account. I wish to find a way to authenticate with my own account instead of a service account to download the query results. Will that be possible, and how can I do that exactly?
The following are the roles for my test service account; I believe I have set them up right, with the top role "Owner". Thanks in advance.
from google.cloud import bigquery

bqclient = bigquery.Client()

# Download query results.
query_string = """
SELECT
    Date_Time,
    Price,
    Volume,
    Market_VWAP,
    Qualifiers AS Qualifiers,
    Ex_Cntrb_ID,
    Qualifiers AS TradeCategory
FROM
    `dbd-sdlc-prod.HKG_NORMALISED.HKG_NORMALISED`
WHERE
    RIC = '1606.HK'
    AND (Date_Time BETWEEN TIMESTAMP('2016-07-11 00:00:00.000000') AND
         TIMESTAMP('2016-07-11 23:59:59.999999'))
    AND Type = "Trade"
    AND Volume > 0
    AND Price > 0
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(
        # Optionally, explicitly request to use the BigQuery Storage API. As of
        # google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
        # API is used by default.
        create_bqstorage_client=True,
    )
)
print(dataframe.head())
If you are using the Google Cloud SDK, you can just run gcloud auth login and authenticate with your Google account.
If not, you will have to authenticate as an end user; here are the details and examples of how to do that.
In your code, you will have to add the code that authenticates your application. (Don't forget to do the other steps in the tutorial.)
Your code will look like this:
from google.cloud import bigquery
from google_auth_oauthlib import flow

# --- Authentication
launch_browser = True  # set to False if no local browser is available

appflow = flow.InstalledAppFlow.from_client_secrets_file(
    "client_secrets.json", scopes=["https://www.googleapis.com/auth/bigquery"]
)

if launch_browser:
    appflow.run_local_server()
else:
    appflow.run_console()

credentials = appflow.credentials
project = 'user-project-id'
# ---

bqclient = bigquery.Client(project=project, credentials=credentials)

# Download query results.
query_string = """
SELECT
    Date_Time,
    Price,
    Volume,
    Market_VWAP,
    Qualifiers AS Qualifiers,
    Ex_Cntrb_ID,
    Qualifiers AS TradeCategory
FROM
    `dbd-sdlc-prod.HKG_NORMALISED.HKG_NORMALISED`
WHERE
    RIC = '1606.HK'
    AND (Date_Time BETWEEN TIMESTAMP('2016-07-11 00:00:00.000000') AND
         TIMESTAMP('2016-07-11 23:59:59.999999'))
    AND Type = "Trade"
    AND Volume > 0
    AND Price > 0
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(
        # Optionally, explicitly request to use the BigQuery Storage API. As of
        # google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
        # API is used by default.
        create_bqstorage_client=True,
    )
)
print(dataframe.head())
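Note that the installed-app flow above asks for consent every time the script runs. If you want to avoid re-authorizing on each run, one option is to cache the user credentials on disk and reload them; a rough sketch, assuming a local token.json cache file (the file name and helper function are my own illustration, not part of the original answer):

import os

from google.oauth2.credentials import Credentials
from google_auth_oauthlib import flow

SCOPES = ["https://www.googleapis.com/auth/bigquery"]
TOKEN_FILE = "token.json"  # hypothetical cache file for the user credentials

def get_user_credentials():
    # Reuse previously saved credentials if the cache file exists.
    if os.path.exists(TOKEN_FILE):
        return Credentials.from_authorized_user_file(TOKEN_FILE, scopes=SCOPES)
    # Otherwise run the OAuth flow once and save the result for next time.
    appflow = flow.InstalledAppFlow.from_client_secrets_file(
        "client_secrets.json", scopes=SCOPES
    )
    credentials = appflow.run_local_server()
    with open(TOKEN_FILE, "w") as token:
        token.write(credentials.to_json())
    return credentials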

How to add metadata while using boto3 create_presigned_post?

I want to add custom metadata to a file that I upload using create_presigned_post from boto3. I am running the following code but am getting a 403 response. The code below is borrowed from here. Am I doing something wrong?
import boto3
import requests
from botocore.exceptions import ClientError

def create_presigned_post(bucket_name, object_name,
                          fields=None, conditions=None, expiration=3600):
    # Generate a presigned S3 POST URL
    s3_client = boto3.client('s3')
    try:
        response = s3_client.generate_presigned_post(bucket_name,
                                                     object_name,
                                                     Fields=fields,
                                                     Conditions=conditions,
                                                     ExpiresIn=expiration)
    except ClientError as e:
        print(e)
        return None
    # The response contains the presigned URL and required fields
    return response

# Generate a presigned S3 POST URL
object_name = 'test-file.txt'
response = create_presigned_post('temp', object_name, fields={'x-amz-meta-test_key': 'test_val'})

# Demonstrate how another Python program can use the presigned URL to upload a file
with open('test-file.txt', 'rb') as f:
    files = {'file': (object_name, f)}
    http_response = requests.post(response['url'], data=response['fields'], files=files)

# If successful, returns HTTP status code 204
print(f'File upload HTTP status code: {http_response.status_code}')
As per the documentation, the fields dictionary will not be automatically added to the conditions list. You must specify a condition for the element as well:
response = create_presigned_post(bucket_name, object_name,
                                 fields={'x-amz-meta-test_key': 'test-val'},
                                 conditions=[{'x-amz-meta-test_key': 'test-val'}])
It should work :)
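Once the upload goes through, you can confirm that the metadata was actually stored; a quick check, assuming the same bucket and key as above:

import boto3

# Verify that the custom metadata was stored with the uploaded object.
s3_client = boto3.client('s3')
head = s3_client.head_object(Bucket='temp', Key='test-file.txt')
print(head['Metadata'])  # expected: {'test_key': 'test-val'}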

Streaming data to Bigquery using Appengine

I'm collecting data (derived from cookies installed on some websites) in BigQuery, using a streaming approach with Python code on App Engine.
The function I use to save the data is the following:
# Assumed imports (not shown in the original snippet):
import httplib2
from apiclient import discovery
from oauth2client.contrib import appengine

def stream_data(data):
    PROJECT_ID = "project_id"
    DATASET_ID = "dataset_id"
    _SCOPE = 'https://www.googleapis.com/auth/bigquery'

    credentials = appengine.AppAssertionCredentials(scope=_SCOPE)
    http = credentials.authorize(httplib2.Http())

    table = "table_name"
    body = {
        "ignoreUnknownValues": True,
        "kind": "bigquery#tableDataInsertAllRequest",
        "rows": [
            {
                "json": data,
            },
        ]
    }

    bigquery = discovery.build('bigquery', 'v2', http=http)
    bigquery.tabledata().insertAll(projectId=PROJECT_ID, datasetId=DATASET_ID,
                                   tableId=table, body=body).execute()
I have deployed the solution on two different App Engine instances and I get different results. My question is: how is that possible?
Comparing the results with Google Analytics metrics, I also notice that not all the data ends up stored in BigQuery. Do you have any idea about this problem?
In your code there is no exception handling around the insertAll operation, so if BigQuery can't write the data, you never see the error.
In your last line try this code:
bQreturn = bigquery.tabledata().insertAll(projectId=PROJECT_ID, datasetId=DATASET_ID, tableId=table, body=body).execute()
logging.debug(bQreturn)
In this way, in the Google Cloud Platform logs, you can easily find a possible error in the insertAll operation.
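Keep in mind that insertAll reports per-row problems in the response body rather than raising an exception, so it is also worth inspecting the insertErrors field. A small sketch along those lines, reusing the bQreturn variable above (the logging import is assumed):

# The insertAll call can return HTTP 200 even when individual rows are rejected;
# per-row failures show up in the "insertErrors" field of the response.
if 'insertErrors' in bQreturn:
    for err in bQreturn['insertErrors']:
        logging.error("Row %s was not inserted: %s",
                      err.get('index'), err.get('errors'))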
When using the insertAll() method you have to keep this in mind:
Data is streamed temporarily in the streaming buffer, which has different availability characteristics than managed storage. Certain operations in BigQuery do not interact with the streaming buffer, such as table copy jobs and API methods like tabledata.list {1}
If you are using the table preview, streaming buffered entries may not be visible.
Doing SELECT COUNT(*) on your table should return your total number of entries.
{1}: https://cloud.google.com/bigquery/troubleshooting-errors#missingunavailable-data
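To run that COUNT(*) check programmatically with the same discovery-based client used in the question, a minimal sketch (legacy SQL syntax; the project, dataset, and table names are the placeholders from the snippet above):

# Count all rows, including those still in the streaming buffer.
count_result = bigquery.jobs().query(
    projectId=PROJECT_ID,
    body={
        "query": "SELECT COUNT(*) FROM [project_id:dataset_id.table_name]",
        "useLegacySql": True,
    },
).execute()
print(count_result["rows"][0]["f"][0]["v"])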

How do I upload a video to Youtube directly from my server?

I'm setting up a (headless) web server that lets people build their own custom time-lapse movies.
Several people want to upload the time-lapse videos they make to YouTube.
Rather than downloading the video to that person's laptop and then having that person manually upload it to YouTube, is there a way I can write some software on my web server to take the video file on my web server and upload it directly to that user's account on YouTube?
I've been told that asking my users for their YouTube handle and password is the Wrong Thing To Do, and I should be using the YouTube V3 API with OAuth.
I tried the techniques listed at "I want to upload a video from my web page to youtube by using javascript youtube API", which seem to "work", but every time I had to download the video to that person's laptop and then upload it from the laptop to YouTube. Is there a way to tweak that system to upload directly from my server to YouTube?
I found some Python code that (after I set up my client_secrets.json) lets me upload videos directly from my server to someone's YouTube account after that person does the OAuth authentication.
But the first time some new person tries to upload a video to some new YouTube account that my server has never dealt with before, it either
(a) pops open a web browser on my server, and then if I VNC to the server and type a YouTube handle and password into that web browser, it gets authenticated -- but I'd rather not do that for every user; or
(b) with the "--noauth_local_webserver" option, spits out a URL on the command line and waits. Then if I manually copy that URL, paste it into a web browser, log in to YouTube, and copy-and-paste the token back into this application that is still waiting for input on the command line, that person gets authenticated. But I'd rather not do that for every user either. I guess that would be OK if I could capture that URL in my cgi-bin script and stick it in a web page, and then later somehow get the authentication response and cram it back into this program, but how? I don't even see that print statement or the raw_input statement in this code.
#!/usr/bin/python
# https://developers.google.com/youtube/v3/code_samples/python#upload_a_video
# which is identical to the code sample at
# https://developers.google.com/youtube/v3/docs/videos/insert
import httplib
import httplib2
import os
import random
import sys
import time
from apiclient.discovery import build
from apiclient.errors import HttpError
from apiclient.http import MediaFileUpload
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client.tools import argparser, run_flow
# Explicitly tell the underlying HTTP transport library not to retry, since
# we are handling retry logic ourselves.
httplib2.RETRIES = 1
# Maximum number of times to retry before giving up.
MAX_RETRIES = 10
# Always retry when these exceptions are raised.
RETRIABLE_EXCEPTIONS = (httplib2.HttpLib2Error, IOError, httplib.NotConnected,
                        httplib.IncompleteRead, httplib.ImproperConnectionState,
                        httplib.CannotSendRequest, httplib.CannotSendHeader,
                        httplib.ResponseNotReady, httplib.BadStatusLine)
# Always retry when an apiclient.errors.HttpError with one of these status
# codes is raised.
RETRIABLE_STATUS_CODES = [500, 502, 503, 504]
# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret. You can acquire an OAuth 2.0 client ID and client secret from
# the Google Developers Console at
# https://console.developers.google.com/.
# Please ensure that you have enabled the YouTube Data API for your project.
# For more information about using OAuth2 to access the YouTube Data API, see:
# https://developers.google.com/youtube/v3/guides/authentication
# For more information about the client_secrets.json file format, see:
# https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
CLIENT_SECRETS_FILE = "client_secrets.json"
# This OAuth 2.0 access scope allows an application to upload files to the
# authenticated user's YouTube channel, but doesn't allow other types of access.
YOUTUBE_UPLOAD_SCOPE = "https://www.googleapis.com/auth/youtube.upload"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"
# This variable defines a message to display if the CLIENT_SECRETS_FILE is
# missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0
To make this sample run you will need to populate the client_secrets.json file
found at:
%s
with information from the Developers Console
https://console.developers.google.com/
For more information about the client_secrets.json file format, please visit:
https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
""" % os.path.abspath(os.path.join(os.path.dirname(__file__),
CLIENT_SECRETS_FILE))
VALID_PRIVACY_STATUSES = ("public", "private", "unlisted")
def get_authenticated_service(args):
    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE,
                                   scope=YOUTUBE_UPLOAD_SCOPE,
                                   message=MISSING_CLIENT_SECRETS_MESSAGE)

    storage = Storage("%s-oauth2.json" % sys.argv[0])
    credentials = storage.get()

    if credentials is None or credentials.invalid:
        credentials = run_flow(flow, storage, args)

    return build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                 http=credentials.authorize(httplib2.Http()))
def initialize_upload(youtube, options):
    tags = None
    if options.keywords:
        tags = options.keywords.split(",")

    body = dict(
        snippet=dict(
            title=options.title,
            description=options.description,
            tags=tags,
            categoryId=options.category
        ),
        status=dict(
            privacyStatus=options.privacyStatus
        )
    )

    # Call the API's videos.insert method to create and upload the video.
    insert_request = youtube.videos().insert(
        part=",".join(body.keys()),
        body=body,
        # The chunksize parameter specifies the size of each chunk of data, in
        # bytes, that will be uploaded at a time. Set a higher value for
        # reliable connections as fewer chunks lead to faster uploads. Set a lower
        # value for better recovery on less reliable connections.
        #
        # Setting "chunksize" equal to -1 in the code below means that the entire
        # file will be uploaded in a single HTTP request. (If the upload fails,
        # it will still be retried where it left off.) This is usually a best
        # practice, but if you're using Python older than 2.6 or if you're
        # running on App Engine, you should set the chunksize to something like
        # 1024 * 1024 (1 megabyte).
        media_body=MediaFileUpload(options.file, chunksize=-1, resumable=True)
    )

    resumable_upload(insert_request)
# This method implements an exponential backoff strategy to resume a
# failed upload.
def resumable_upload(insert_request):
    response = None
    error = None
    retry = 0
    while response is None:
        try:
            print "Uploading file..."
            status, response = insert_request.next_chunk()
            if 'id' in response:
                print "Video id '%s' was successfully uploaded." % response['id']
            else:
                exit("The upload failed with an unexpected response: %s" % response)
        except HttpError, e:
            if e.resp.status in RETRIABLE_STATUS_CODES:
                error = "A retriable HTTP error %d occurred:\n%s" % (e.resp.status,
                                                                     e.content)
            else:
                raise
        except RETRIABLE_EXCEPTIONS, e:
            error = "A retriable error occurred: %s" % e

        if error is not None:
            print error
            retry += 1
            if retry > MAX_RETRIES:
                exit("No longer attempting to retry.")

            max_sleep = 2 ** retry
            sleep_seconds = random.random() * max_sleep
            print "Sleeping %f seconds and then retrying..." % sleep_seconds
            time.sleep(sleep_seconds)
if __name__ == '__main__':
    argparser.add_argument("--file", required=True, help="Video file to upload")
    argparser.add_argument("--title", help="Video title", default="Test Title")
    argparser.add_argument("--description", help="Video description",
                           default="Test Description")
    argparser.add_argument("--category", default="22",
                           help="Numeric video category. " +
                                "See https://developers.google.com/youtube/v3/docs/videoCategories/list")
    argparser.add_argument("--keywords", help="Video keywords, comma separated",
                           default="")
    argparser.add_argument("--privacyStatus", choices=VALID_PRIVACY_STATUSES,
                           default=VALID_PRIVACY_STATUSES[0], help="Video privacy status.")
    args = argparser.parse_args()

    if not os.path.exists(args.file):
        exit("Please specify a valid file using the --file= parameter.")

    youtube = get_authenticated_service(args)
    try:
        initialize_upload(youtube, args)
    except HttpError, e:
        print "An HTTP error %d occurred:\n%s" % (e.resp.status, e.content)
use "client_secrets.json"
configure credentials to generate it
https://console.developers.google.com/apis/credentials
{
    "web": {
        "client_id": "xxxxxxxxxxxxxx",
        "project_id": "xxxxxxxxxxxxxx",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://accounts.google.com/o/oauth2/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_secret": "xxxxxxxxxxxxxxxx",
        "redirect_uris": ["http://localhost:8090/", "http://localhost:8090/Callback"],
        "javascript_origins": ["http://localhost"]
    }
}
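With a "web" client like the one above, the asker's cgi-bin scenario can be handled by splitting the OAuth flow across two HTTP requests instead of relying on run_flow()'s local browser or console prompt. A rough sketch using oauth2client's web-server flow; the redirect URI and handler name are illustrative assumptions, not part of the original answer:

from oauth2client.client import flow_from_clientsecrets

# Step 1 (in the page/handler that starts the upload): build the consent URL
# and send the user there.
flow = flow_from_clientsecrets(
    "client_secrets.json",
    scope="https://www.googleapis.com/auth/youtube.upload",
    redirect_uri="http://localhost:8090/Callback")  # must match client_secrets.json
auth_url = flow.step1_get_authorize_url()
# Render auth_url in your web page; the user logs in to YouTube there.

# Step 2 (in the handler behind the redirect URI): exchange the "code" query
# parameter Google sends back for credentials, then store them (e.g. with Storage)
# so the server can upload on the user's behalf later.
def oauth_callback(code):  # hypothetical handler invoked at /Callback
    credentials = flow.step2_exchange(code)
    return credentials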
A very useful step-by-step guide about how to get access and refresh tokens and save them for future use with the YouTube OAuth API v3: PHP server-side YouTube V3 OAuth API video upload guide.
https://www.domsammut.com/code/php-server-side-youtube-v3-oauth-api-video-upload-guide