Can you load a CSV file without opening it in binary format?
with open(file_path, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)
It looks like Google BigQuery will not handle special characters when loading data from a local CSV file that contains them.
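For what it's worth, load_table_from_file requires a file object opened in binary mode, so the "rb" itself is not the problem; special characters usually break because the declared encoding doesn't match the file's actual encoding. A minimal sketch, assuming the CSV is UTF-8 and that file_path and table_id are defined as in the snippet above:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    encoding="UTF-8",  # declare how the CSV bytes are encoded
)

# load_table_from_file expects a binary-mode file object
with open(file_path, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)
job.result()  # wait for the load job to finish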
I have a dataset (DataFrame) which contains numbers and lists; when I save it in CSV format and then read it back, the list cells are converted to strings.
Before saving: df.to_csv("data.csv")
After reading: pd.read_csv("data.csv")
After reading: pd.read_csv("data.csv", converters={"C2_ACP": lambda x: x.strip("[]").split(",")})
df.to_csv("data.csv", index=False, sep=",")
I need to retrieve the original dataset when I read the file.
Have you tried changing the sep argument in df.to_csv? Maybe the standard sep=',' conflicts with your list separator, which is also a comma.
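That said, changing sep alone won't round-trip lists, because to_csv writes each list as its repr inside one quoted field. A sketch that does recover real lists, reusing the C2_ACP column name from the question (the sample values are made up):

import ast
import pandas as pd

df = pd.DataFrame({"C1": [1, 2], "C2_ACP": [[1.0, 2.0], [3.0, 4.0]]})
df.to_csv("data.csv", index=False)

# read_csv hands the quoted string "[1.0, 2.0]" to the converter;
# ast.literal_eval parses it back into a real Python list
df2 = pd.read_csv("data.csv", converters={"C2_ACP": ast.literal_eval})
print(type(df2.loc[0, "C2_ACP"]))  # <class 'list'>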
I have emojis in this format, \U0001f924. Why does BigQuery (Google Data Studio) not display them, even though I have seen examples of this format working for other people?
SAMPLE: a second emoji in this format: \u2614
Ref: Emoji crashed when uploading to Big Query
Based on this article it should work: Google \Uhhhhhhhh Format
UPDATE 1.0:
If I use "" then emojis in this format \U2714 displays emoji, this one \U0001f680 still the same as text U0001f680
If I use '' then emojis in this format \U2714 as well as \U0001f680 display only value U2714 and U0001f680
The emoji in the question works for me with SELECT "\U0001f680":
I stored the results in a table so you can find it:
https://bigquery.cloud.google.com/table/fh-bigquery:public_dump.one_emoji?tab=preview
If you ask BigQuery to export this table to a GCS file, and bring this file into your computer, it will continue to work:
You can download this json file and load it back into BigQuery:
https://drive.google.com/file/d/1hXASPN0J4bP0PVk20x7x6HfkAFDAD4vq/view?usp=sharing
Let's load it into BigQuery:
Everything works fine:
So the problem is in the files you are loading to BigQuery, which are not encoding emojis appropriately.
What I don't know is how you are generating these files, or how to fix that process. But here I have shown that for files that correctly encode emojis, you can load them into BigQuery and the emojis will be preserved.
🙃
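To show what "correctly encode" means on the generating side, here is a sketch of writing a file that contains a real emoji rather than the literal ten-character text U0001f680 (the file name is made up):

import csv

rocket = "\U0001f680"  # in Python source this escape produces the actual emoji
print(len(rocket))     # 1 -- a single code point, not the text "U0001f680"

with open("emoji.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "emoji"])
    writer.writerow([1, rocket])  # the file now holds real UTF-8 emoji bytes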
I have a JSON file which is 48 MB (a collection of tweets I data mined). I need to convert it to CSV so I can import it into a SQL database and cleanse it.
I've tried every JSON to CSV converter but they all come back with the same result of "file exceeds limits" / the file is too large. Is there a good method of converting such a massive JSON file to CSV within a short period of time?
Thank you!
A 48 MB JSON file is pretty small. You should be able to load the data into memory using something like this:
import json

with open('data.json') as data_file:
    data = json.load(data_file)
Depending on how you wrote the JSON file, data may be a list which contains many dictionaries. Try running:
type(data)
If the type is a list, then iterate over each element and inspect it. For instance:
for row in data:
    print(type(row))
    # print(row.keys())
If row is a dict instance, then inspect the keys, and within the loop start building up what each row of the CSV should contain. You can then use pandas, the csv module, or just open a file and write line by line with commas yourself.
So maybe something like:
import json

with open('data.json') as data_file:
    data = json.load(data_file)

with open('some_file.txt', 'w') as f:
    for row in data:
        user = row['username']
        text = row['tweet_text']
        created = row['timestamp']
        joined = ",".join([user, text, created])
        f.write(joined + "\n")  # newline so each tweet gets its own row
You may still run into issues with unicode characters, commas within your data, etc., but this is a general guide.
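If the commas and quotes do bite, the csv module handles the escaping for you. A sketch assuming the same username/tweet_text/timestamp keys as above (tweets.csv is a made-up name):

import csv
import json

with open('data.json') as data_file:
    data = json.load(data_file)

with open('tweets.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)  # quotes any field containing commas or newlines
    writer.writerow(['username', 'tweet_text', 'timestamp'])
    for row in data:
        writer.writerow([row['username'], row['tweet_text'], row['timestamp']])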
The culprit line is as follows. It should be composed of 14 columns, with one of the columns, starting with 'Hi I'm Niger...', covering multiple lines with line feeds.
17935,9a7105ee-30c8-4a6d-9374-10875b7d6288.jpg,"""top""=>""0"", ""left""=>""0"", ""width""=>""180"", ""height""=>""180""",,"",2015-07-26 19:33:57.292058,2015-07-26 20:25:30.068887,fe43876f-1b2c-464a-aa20-bf335ed3ff62,c68c8c70-bc2b-11e4-90a1-22000b21105f,{},2e790350-15fb-0133-2cb8-22000ba51078,"Hi I'm Nigerian so wish to study in sweden.
so I'm Undergraduate student I want study Engineering.
Thanks.","",{}
When loading this CSV data into BigQuery via the command bq load --replace --source_format=CSV -F"," ..., an error is raised. Could anyone give me a solution to this BigQuery load data command?
- File: 0 / Line:17192 / Field:12: Missing close double quote (") character: field starts with: <Hi I'm N>
- File: 0 / Line:17193: Too few columns: expected 14 column(s) but got 1 column(s). For additional help: http://goo.gl/RWuPQ
- File: 0 / Line:17194: Too few columns: expected 14 column(s) but got 3 column(s). For additional help: http://goo.gl/RWuPQ
If you are loading CSV with embedded newlines, you need to specify allowQuotedNewlines (with the bq CLI, pass --allow_quoted_newlines).
https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.allowQuotedNewlines
The BigQuery default is to assume that CSV data does not contain newlines. This allows for a much higher parsing throughput when dealing with large data files since the input files can be split at arbitrary newlines. If your data contains newlines within strings, each file needs to be parsed linearly by a single machine.
Make sure you include this line before loading data to BigQuery: 'job_config.allow_quoted_newlines = True'
job_config = bigquery.LoadJobConfig()
job_config.allow_quoted_newlines = True
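Put together, a minimal load sketch (data.csv is a placeholder, and table_id is assumed to name the destination table):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.allow_quoted_newlines = True  # parse newlines inside quoted fields

with open("data.csv", "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)
job.result()  # raises if the load job failed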
If you are trying to load a CSV file into a table from the BigQuery console, make sure you select Advanced options -> Quoted newlines.
I'm handling data uploaded via the POST method on the web. If the file is text there's no problem, but the trouble comes when it's a binary file, such as a picture: when the system inserts the data into the file, it doesn't come out encoded the right way. Here is all the code, from the part that takes environ['wsgi.input'] to the part that saves the file:
# Here the data comes from environ['wsgi.input'];
# first I convert the bytes into a string, dropping the leading
# 'b' marker, and then I strip the single quotes.
tmpData = str(rawData)[1:].strip("' '")
dat = tmpData.split('\\r')  # Then I split all the data on '\r'
s = open('/home/hidura/test.png', 'w')  # I open the test.png file
for cont in range(5, 150):  # Now begin at the 5th position, up to the 150th
    s.write(dat[cont])  # Insert the piece of the data into the file
s.close()  # Then close it
Where is the mistake?
Thank you in advance.
Why do you convert the binary data to a string? A png file is binary data. Just write the binary data to the file. You need to open the file in binary mode as well.
s = open('/home/hidura/test.png', 'wb')
s.write(data)
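A minimal sketch of the same fix applied to the question's code, assuming rawData already holds the raw bytes read from environ['wsgi.input']:

# rawData is assumed to be the bytes read from environ['wsgi.input'],
# e.g. rawData = environ['wsgi.input'].read(content_length)
with open('/home/hidura/test.png', 'wb') as s:  # 'wb' = write binary
    s.write(rawData)  # write the bytes untouched; no str() round-trip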