BigQuery Backend Errors during upload operation

I want to know the possible errors that can arise on the BigQuery server side during an upload, even though the .CSV file I'm uploading contains perfectly valid data. Can you list those errors?
Thanks.

Some of the common errors are:
Files must be encoded in UTF-8 format.
Source data must be properly escaped within standard guidelines for CSV and JSON.
The structure of records and the data within them must match the schema provided.
Individual files must be under the size limits listed on our quota/limits page.
More information is available in the BigQuery documentation on source data formats.
Check out our Data Loading cookbook for additional tips.
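If it helps, the exact server-side errors for a load job can also be surfaced programmatically. Here is a minimal sketch using the Python client library, assuming a CSV load from GCS; the bucket, dataset, and table names are hypothetical:

# Minimal sketch: run a CSV load and print any server-side errors BigQuery reports.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/data.csv",            # hypothetical source file
    "my_project.my_dataset.my_table",     # hypothetical destination table
    job_config=job_config,
)

try:
    load_job.result()                     # waits for the job to finish
except Exception:
    # Each entry typically carries a "reason", "location", and "message"
    # describing what BigQuery rejected (encoding, escaping, schema, size, ...).
    for err in load_job.errors or []:
        print(err)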

Related

Why is Matillion not loading data from S3?

I have a simple S3 Load with all the correct information. There are no validation errors, and the package executes without a problem. It's just that there is no data in the table. Any tips from someone who is knowledgeable about Matillion?
There are a number of reasons why Matillion might not appear to load any data in an S3 Load.
Firstly, I'd check that the pattern matches the file names in the S3 location, which is a regular expression match.
I believe the match also includes the path, which you may have included in the location parameter, so it may be worth modifying your pattern to look something like .*\/FilePrefix.* or even just .* and then selecting the actual file in the location parameter (see the sketch after this answer).
Secondly, if the files were last modified more than 64 days ago, or they have already been loaded into the table previously, Snowflake won't load them by default, which you can get around by turning the Force Load parameter On.
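To illustrate the pattern point, here is a quick way to check a candidate pattern against a full object key outside of Matillion, assuming the pattern has to match the whole key as suggested above (the key below is a hypothetical example):

# Sanity-check Matillion-style patterns against a full S3 key, assuming full-match semantics.
import re

key = "exports/FilePrefix_2023.csv"   # hypothetical object key, including its path

print(bool(re.fullmatch(r"FilePrefix.*", key)))       # False: the path is not covered
print(bool(re.fullmatch(r".*\/FilePrefix.*", key)))   # True: the pattern allows for the path
print(bool(re.fullmatch(r".*", key)))                 # True: matches everything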

How to add multiple images to a single record in SQL Server using asp.net

I am working on a site, and one of its requirements is to add multiple images to a single column of the database along with other details.
(Screenshots of my webform and my database table are omitted here.)
But I am confused about adding multiple images in a specific field of one row. Is it possible that a single record can have a field with multiple images stored in it? Please suggest a good solution to this problem.
Thank you
You can store multiple files together in a single blob or stream by using MIME Multipart formatting. See this RFC: https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
Note that using a separate table with one record per image/file is a better overall solution, because there is a large overhead in extracting files from a multipart blob, making it slow and inefficient... so if you do take this approach, don't store files larger than a few kilobytes.
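For illustration, here is a minimal sketch of packing and unpacking images with MIME multipart framing using Python's standard email library; the file names are hypothetical, and the resulting bytes could be stored in a single varbinary(max) column:

# Pack several image files into one MIME multipart blob and read them back out.
# Attachments are base64-encoded, which is part of the size/speed overhead noted above.
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

def pack_images(paths):
    """Return a single bytes blob containing all of the given image files."""
    msg = EmailMessage()
    for path in paths:
        with open(path, "rb") as f:
            # subtype is hard-coded for the sketch; derive it from the file in real use
            msg.add_attachment(f.read(), maintype="image", subtype="jpeg", filename=path)
    return msg.as_bytes()

def unpack_images(blob):
    """Yield (filename, bytes) pairs extracted from a multipart blob."""
    msg = message_from_bytes(blob, policy=default)
    for part in msg.iter_attachments():
        yield part.get_filename(), part.get_content()

blob = pack_images(["photo1.jpg", "photo2.jpg"])   # hypothetical file names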

Querying compressed files using BigQuery federated source

According to the BigQuery federated source documentation:
[...]or are compressed must be less than 1 GB each.
This would imply that compressed files are supported types for federated sources in BigQuery.
However, I get the following error when trying to query a gz file in GCS:
I tested with an uncompressed file and it works fine. Are compressed files supported as federated sources in BigQuery, or have I misinterpreted the documentation?
Compression mode defaults to NONE and needs to be explicitly specified in the external table definition.
At the time of the question, this couldn't be done through the UI. This is now fixed and compressed data should be automatically detected.
For more background information, see:
https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.query
The interesting parameter is "configuration.query.tableDefinitions.[key].compression".
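For reference, that setting maps onto the current Python client roughly as follows; the GCS URI and the federated table alias are hypothetical:

# Query a gzip-compressed CSV in GCS as a federated (external) source,
# explicitly setting the compression mode that defaults to NONE.
from google.cloud import bigquery

client = bigquery.Client()

external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://my-bucket/data.csv.gz"]   # hypothetical path
external_config.compression = "GZIP"                           # the tableDefinitions compression key

job_config = bigquery.QueryJobConfig(
    table_definitions={"gcs_data": external_config}            # "gcs_data" is the query alias
)

rows = client.query("SELECT COUNT(*) FROM gcs_data", job_config=job_config).result()
print(list(rows))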

Enrich CSV with metadata from database

I've been looking around for a lightweight, scalable solution to enrich a CSV file with additional metadata from a database. Each line in the CSV represents a data item, and the columns hold the metadata belonging to that item.
Basically I have a CSV extract and I need to add additional metadata from a database. The metadata can be accessed via ODBC or REST API call.
I have a number of options in my head but I'm looking for other ideas. My options are as follows:
Import the CSV into a database table, apply the additional metadata with SQL UPDATE statements by finding the necessary metadata with SELECT statements, and then export the data back into CSV format. For this solution I was thinking of using an ETL tool, which may be a bit heavyweight for this problem.
I also thought about a NodeJS-based solution where I read the CSV in, call a web service to get the metadata, and write the data back into the CSV file. However, the CSV can be quite large, with potentially tens of thousands of rows, so this could be heavy on memory or, in the case of line-by-line processing, not very performant.
If you have a better solution in mind, please post. Many thanks.
I think you've come up with a couple of pretty good ideas here already.
Running with your first suggestion using an ETL tool to enrich your CSV files, you should check out https://github.com/streamsets/datacollector
It's a continuous ingestion approach, so you could even monitor a directory of CSV files to load as you get them. While there's no specific functionality yet for doing lookups in a database, it's certainly possible in a number of ways (including writing your own custom logic in Java, or a script in Python or JavaScript).
*Full disclosure: I work on this project.
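As a point of comparison for the second option in the question, a streaming approach keeps only one row in memory at a time. A minimal Python sketch, where the endpoint URL, key column, and added columns are all hypothetical:

# Stream a CSV row by row, enrich each row via a REST lookup, and write it straight out.
import csv
import requests

def enrich(in_path, out_path, api_url="https://example.com/metadata"):
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["owner", "category"])
        writer.writeheader()
        session = requests.Session()                    # reuse one HTTP connection across rows
        for row in reader:
            meta = session.get(api_url, params={"id": row["item_id"]}).json()
            row["owner"] = meta.get("owner", "")
            row["category"] = meta.get("category", "")
            writer.writerow(row)                        # only the current row is held in memory

enrich("extract.csv", "extract_enriched.csv")           # hypothetical file names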

Failed to read netCDF file. Help needed

I have tried my best to read this file using a few programs (Idrisi, ArcMap, Envi) but failed. The only software that can read this data is Panoply, at http://www.giss.nasa.gov/tools/panoply/
To my surprise, Panoply recognised the data as HDF version 5 rather than netCDF. I can view my data but could not extract a specific 'layer' from it. I then need to open the data in either ArcMap or Idrisi Taiga.
Anybody willing to help? The data can be accessed at https://docs.google.com/file/d/0BzzExM8ZYZwxdmI4bk5rSUw0VVE/edit?usp=sharing
It looks like the issue might be that the file is in netCDF-4 format (which is built on top of HDF5, hence Panoply's identification). In general, you cannot convert netCDF-4 into netCDF-3 unless some very specific constraints are met, as their data models are different (see http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#fv14 for more info). Luckily, your file is pretty simple and can be put into the netCDF-3 format using the following command:
nccopy -k classic tos_Omon_modmean_rcp26_00.nc tos_Omon_modmean_rcp26_00-nc3.nc
The new file will be in the netCDF-3 classic format, which will likely work with the tools you are using. If you need me to, I can post the converted file for you to download (if you do not have netCDF installed, and thus access to nccopy, on your system).
Cheers!
Sean