Is there any other way to insert data into BigQuery via the API apart from streaming data - google-bigquery

Is there any other way to insert data into BigQuery via the API apart from streaming data, i.e. Table.insertAll?
InsertAllResponse response = bigquery.insertAll(
    InsertAllRequest.newBuilder(tableId)
        .addRow("rowId", rowContent)   // row id used for best-effort de-duplication
        .build());

As you can see in the docs, you also have two other possibilities:
Loading from Google Cloud Storage, Bigtable, or Datastore
Just run the jobs.insert method of the Jobs resource and set the field configuration.load.sourceUri in the job configuration.
In the Python client, this is done with LoadTableFromStorageJob.
You can therefore just send your files to GCS, for instance, and then make an API call to load them into BigQuery.
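A minimal sketch with the current google-cloud-bigquery Python client (method names changed from LoadTableFromStorageJob in newer client versions; the bucket, file, and table names here are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON)

# Load a file already sitting in GCS into a table (placeholder names).
load_job = client.load_table_from_uri(
    "gs://my-bucket/data.json",
    "my_dataset.my_table",
    job_config=job_config)
load_job.result()  # waits for the load job to finish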
Media Upload
This is also a load job, but this time the HTTP request also carries the bytes of a file on your machine. So you can send pretty much any file that you have on disk with this request (as long as the format is accepted by BQ).
In the Python client, this is done with Table.upload_from_file.
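For the media upload path, a similar hedged sketch with the newer Python client (again, the file and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV, skip_leading_rows=1)

# Upload the bytes of a local file directly as part of the load job.
with open("data.csv", "rb") as source_file:
    load_job = client.load_table_from_file(
        source_file, "my_dataset.my_table", job_config=job_config)
load_job.result()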

Download scheduled Webi report from File Repository Server

Having launched a scheduled report in SAP BO, is it possible to somehow download it from the File Repository Server?
I am working with the Web Intelligence RESTful API. While it is possible to export a report synchronously using the GET /documents/<documentID>?<optional_parameters> request, I have not found any non-blocking asynchronous way except for using schedules.
Here's the intended workflow:
1. Create a scheduled report ("now") using POST /documents/<documentID>/schedules. Use a custom unique <ReportName> and store the scheduleID.
2. Poll the schedule status using GET /documents/<documentID>/schedules/<scheduleID>.
3. If the schedule status is 1 (success), find the file using a CMS query:
   Send a POST /cmsquery with content {query: "select * from ci_infoObjects where si_instance=1 and si_schedule_status in (1) and si_name = '<ReportName>'"}
   From the result, read "SI_FILES": {"SI_FILE1": "<generatedName>.pdf","SI_VALUE1": 205168,"SI_NUM_FILES":1,"SI_PATH": "frs://Output/<Path>"}
4. Using the browser or the RESTful API, download the file.
Is step 4 possible at all? What would be the URL?
The internal base path can be configured in the CMC, and the file location would be <Path>/<generatedName>.pdf. But how can this file be accessed programmatically OR using a URL, without needing to log into the BO BI interface?
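For reference, the first three steps of that workflow could look roughly like this with Python requests; the endpoint paths and the X-SAP-LogonToken header come from this page, but the host, port, request payload, and response field names are assumptions that vary by BI 4.x version:

import time
import requests

base_url = "http://bi-server:6405/biprws"              # placeholder host/port
token = "<X-SAP-LogonToken from the logon call>"       # placeholder
doc_id = "123456"                                      # placeholder <documentID>
report_name = "MyUniqueReportName"                     # placeholder <ReportName>
headers = {"X-SAP-LogonToken": token,
           "Accept": "application/json",
           "Content-Type": "application/json"}

# 1. Create a "now" schedule for the document (payload shape is illustrative).
resp = requests.post(f"{base_url}/raylight/v1/documents/{doc_id}/schedules",
                     headers=headers,
                     json={"schedule": {"name": report_name,
                                        "format": {"type": "pdf"}}})
schedule_id = resp.json()["schedule"]["id"]            # field names are illustrative

# 2. Poll the schedule until it reports success (status 1).
while True:
    status = requests.get(
        f"{base_url}/raylight/v1/documents/{doc_id}/schedules/{schedule_id}",
        headers=headers).json()
    if status.get("status") == 1:                      # field name is illustrative
        break
    time.sleep(5)

# 3. Locate the generated file via a CMS query.
query = {"query": ("select * from ci_infoObjects where si_instance=1 "
                   "and si_schedule_status in (1) "
                   f"and si_name = '{report_name}'")}
result = requests.post(f"{base_url}/cmsquery", headers=headers, json=query)
print(result.json())   # contains the SI_FILES / SI_PATH entries shown above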
As a workaround, it is possible to use OpenDocument, passing the scheduleID (which is equal to the SI_ID from the InfoStore) as a parameter.
GET /BOE/OpenDocument/opendoc/openDocument.jsp?iDocID=<scheduleID>&sIDType=InfoObjectID&token=<token>
For the PDF file type, the browser's internal PDF viewer is displayed. For XLS, the download starts immediately.
Another option is to generate the report directly into a shared location, for example an FTP server. Here is how:
In the "Folders" management area of the CMC, select an object.
Click Actions > Schedule, and access the "Destination" page.
If you are scheduling a Web Intelligence document, click Formats
and Destinations.
Select FTP Server as the destination.
For Web Intelligence document, select FTP Server under "Output Format Details"and then
click Destination Options and Settings.
Here is the admin guide where it is explained in more detail (p. 858):
https://help.sap.com/doc/24e00820a014406495980dea5d768d52/XI.3.1/en-US/xi31_sp3_bip_admin_en.pdf
Or you can also check the exact steps from someone who has already done this:
https://blogs.sap.com/2015/06/10/scheduling-webi-report-output-to-ftp-shared-file-location/
After that you can expose your FTP server to the internet and construct a URL for download.
I tried the steps below to retrieve the scheduled instance of a Webi report in any format.
1. Get the list of all the schedule instances with their IDs.
Method: GET
Headers: X-SAP-LogonToken: <token>
API: <base_url>/raylight/v1/documents/<Report ID>/schedules
2. From the response of the step 1 API, select the instance ID you want to download and pass it to the API below.
Method: GET
Headers: X-SAP-LogonToken: <token>
API: <base_url>/infostore/folder/<Instance ID>/file
3. Save the response in .wid/.xlsx/.pdf format using the Save response -> Save to a file option on the response body of the step 2 API.
I tried it and this works :)
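A minimal sketch of those two calls with Python requests (host, port, report ID, instance ID, and the output filename are placeholders; the endpoints and header are the ones listed above):

import requests

base_url = "http://bi-server:6405/biprws"       # placeholder host/port
token = "<X-SAP-LogonToken>"                    # placeholder
report_id = "123456"                            # placeholder <Report ID>
headers = {"X-SAP-LogonToken": token, "Accept": "application/json"}

# Step 1: list the schedule instances of the report with their IDs.
schedules = requests.get(
    f"{base_url}/raylight/v1/documents/{report_id}/schedules",
    headers=headers).json()

# Step 2: fetch the chosen instance from the InfoStore and save it.
instance_id = "7654321"                         # SI_ID picked from the list above
resp = requests.get(f"{base_url}/infostore/folder/{instance_id}/file",
                    headers=headers)
with open("report.pdf", "wb") as f:             # extension depends on the format
    f.write(resp.content)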

How to save PDF from HTML in Azure Functions

I'm developing an application which will have a web crawler for some sites.
The application will trigger an Azure Function by URL, where the crawler will start its work.
So far, so good, but we'll have to save some evidence that the crawler visited the site. We're thinking of saving a PDF file of the page the crawler visited, but as Azure Functions doesn't have GDI+, it won't work with Selenium or PhantomJS.
A different approach could be to download the HTML content and somehow save this HTML string (with all its JS and CSS dependencies) into a PDF file.
I'd like a library which can work with Azure Functions to take a screenshot of a URL (or HTML string) and save it to PDF.
Thanks.
Unfortunately the App Service sandbox, whose rules Azure Functions live by, is going to block most GDI+ API calls. We have had success with one third-party library (ByteScout) for some PDF generation needs, but I think in your case that type of operation is explicitly blocked. You can find more details here: https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox#win32ksys-user32gdi32-restrictions
There is no workaround that I'm aware of because at the end of the day most of these solutions are relying on GDI+ in the underlying OS (directly or indirectly).
Your only real option is to offload that workload to a virtual machine without the restriction on the API. That could take the form of a dedicated VM or something like an Azure Container Instance whose lifecycle you can manage more dynamically as needed. We do something similar today: we have a message queue being monitored on a VM, and our Azure Function drops the request into the queue for processing.
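As a hedged sketch of that hand-off, here is what a Python HTTP-triggered Function dropping work onto an Azure Storage queue might look like; the queue name, message shape, and the use of the AzureWebJobsStorage connection string are assumptions, and the VM or container would consume the messages and do the actual rendering:

import json
import os

import azure.functions as func
from azure.storage.queue import QueueClient


def main(req: func.HttpRequest) -> func.HttpResponse:
    url = req.params.get("url")
    if not url:
        return func.HttpResponse("missing url", status_code=400)

    # Drop the rendering request onto a queue watched by a VM/container that
    # has full GDI+ access. The queue name is a placeholder and the queue must
    # already exist (or be created once with create_queue()).
    queue = QueueClient.from_connection_string(
        os.environ["AzureWebJobsStorage"], queue_name="pdf-requests")
    queue.send_message(json.dumps({"url": url}))

    return func.HttpResponse("queued", status_code=202)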

How to set an object in a Play framework session, or how to retrieve the current size transferred in AWS?

There is an upload object which is returned by AWS on file upload, and it contains the bytes transferred so far.
How can I put an object into the Play framework session, so that it can be retrieved in the next AJAX call to get the status of the file upload?
Is there a way to get the bytes transferred from the AWS API by giving the file access key or a unique file key in the next AJAX call after the upload?
Thanks.
1) Play's session doesn't work this way: it's based on cookies, and there is no server-side storage out of the box (everything you set in a user's session ends up in a cookie), so you need to handle that yourself.
I would set a random UUID as the session ID, and use backend storage that stores a data blob keyed on that ID.
2) Sure, but you need to handle that yourself. AWS's API is async, so you get an ID on upload and use it later on to check the status.
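The question is about Play, but the idea is language-agnostic; here is a sketch in Python with boto3, where a progress callback updates a shared store keyed by an upload ID that the status AJAX call can read back (the in-memory dict, file, and bucket names are placeholders; in practice you would use Redis or similar):

import threading
import uuid

import boto3

progress = {}                                   # placeholder for a shared store


class ProgressTracker:
    """Callback invoked by boto3 with the number of bytes sent per chunk."""

    def __init__(self, upload_id):
        self.upload_id = upload_id
        self._lock = threading.Lock()
        self._sent = 0

    def __call__(self, bytes_amount):
        with self._lock:
            self._sent += bytes_amount
            progress[self.upload_id] = self._sent


upload_id = str(uuid.uuid4())                   # return this ID to the browser
s3 = boto3.client("s3")
s3.upload_file("big.bin", "my-bucket", "big.bin",
               Callback=ProgressTracker(upload_id))

# The status endpoint for the next AJAX call simply reads progress[upload_id].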

Memory exception using WCF wsHttpBinding

I have an application that uploads files to a server. I am using netTcpBinding and wsHttpBinding. When a file is larger than 200 MB, I get a memory exception. Looking for a workaround, I have seen people recommend streaming, and of course it works with netTcpBinding for large files (>1 GB), but what would be the approach when using wsHttpBinding? Should I change to basicHttpBinding? Thanks.
I suggest you expose another endpoint just to upload such large data, with a binding that supports streaming. In a previous project we needed to upload files to the server as part of a business process. We ended up creating two endpoints: one dedicated to file upload, and another for all other business functionality.
The streaming data service can be a generic service that streams any data to the server and returns a token identifying the data on the server. For subsequent requests, this token can be passed along to manipulate that data.
If you don't want to (or cannot, for legitimate reasons) change the binding or use streaming, what you can do is have a method with a signature along the lines of the following:
void UploadFile(string fileName, long offset, byte[] data)
Instead of sending the whole file in one call, you send small chunks and tell the service where the data should be placed. You can add more data, of course, such as the total file size, or a CRC of the file to know whether the transfer was successful.

I need Multi-Part DOWNLOADS from Amazon S3 for huge files

I know Amazon S3 added multi-part upload for huge files. That's great. What I also need is similar functionality on the client side for customers who get part way through downloading a gigabyte-plus file and hit errors.
I realize browsers have some level of retry and resume built in, but when you're talking about huge files I'd like to be able to pick up where they left off regardless of the type of error.
Any ideas?
Thanks,
Brian
S3 supports the standard HTTP "Range" header if you want to build your own solution.
S3 Getting Objects
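For example, with boto3 you can resume a partial download by asking only for the missing byte range (bucket, key, and offset are placeholders):

import boto3

s3 = boto3.client("s3")

# Resume from byte 1,000,000 onward using the standard Range header.
resp = s3.get_object(Bucket="my-bucket", Key="big-file.bin",
                     Range="bytes=1000000-")
with open("big-file.bin", "ab") as f:           # append to the partial local copy
    f.write(resp["Body"].read())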
I use aria2c. For private content, you can use "GetPreSignedUrlRequest" to generate temporary private URLs that you can pass to aria2c.
S3 has a feature called byte-range fetches. It's kind of the download complement to multipart upload:
Using the Range HTTP header in a GET Object request, you can fetch a byte-range from an object, transferring only the specified portion. You can use concurrent connections to Amazon S3 to fetch different byte ranges from within the same object. This helps you achieve higher aggregate throughput versus a single whole-object request. Fetching smaller ranges of a large object also allows your application to improve retry times when requests are interrupted. For more information, see Getting Objects.
Typical sizes for byte-range requests are 8 MB or 16 MB. If objects are PUT using a multipart upload, it’s a good practice to GET them in the same part sizes (or at least aligned to part boundaries) for best performance. GET requests can directly address individual parts; for example, GET ?partNumber=N.
Source: https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html
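A sketch of that pattern with boto3, fetching 8 MB ranges concurrently and stitching them into one local file (bucket and key are placeholders):

import concurrent.futures

import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "big-file.bin"       # placeholders
part_size = 8 * 1024 * 1024                     # 8 MB, per the guidance above

size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
ranges = [(start, min(start + part_size, size) - 1)
          for start in range(0, size, part_size)]


def fetch(rng):
    start, end = rng
    body = s3.get_object(Bucket=bucket, Key=key,
                         Range=f"bytes={start}-{end}")["Body"].read()
    return start, body


with open(key, "wb") as out, concurrent.futures.ThreadPoolExecutor(8) as pool:
    for start, body in pool.map(fetch, ranges):
        out.seek(start)
        out.write(body)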
Just updating for the current situation: S3 natively supports multipart GET as well as PUT. https://youtu.be/uXHw0Xae2ww?t=1459
NOTE: For Ruby users only
Try the aws-sdk gem for Ruby and download the object:
object = Aws::S3::Object.new(...)
object.download_file('path/to/file.rb')
It downloads large files with multipart by default: files larger than 5 MB are downloaded using the multipart method.
http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Object.html#download_file-instance_method