Error: Bad character (ASCII 0) encountered in BigQuery - google-bigquery

I'm trying to upload a CSV file to BigQuery, but I get the following error:
Error: Bad character (ASCII 0) encountered (bigquery)
I've tried the following, but none of it worked:
a) Open the file and save it as "UTF-8" in Notepad.
b) Open the file in Notepad++ and use the option "Search characters by type" - Non ASCII. It didn't find any characters.
c) Use Notepad++ with the following regular expressions, which didn't find any characters:
[^\x00-\x7F] and [^\x1F-\x7F].
d) Use the following command:
gsutil cp gs://bucket_987234/compress_file.gz - | gunzip | tr -d '\000' | gsutil cp - gs://bucket_987234/uncompress_and_clean_file
Didn't work: "tr" is not recognized as a command (I'm using Windows 10 and I don't have access to a Google VM).
e) Open the file and delete the first row; then it worked. But I lost a line of data, and I have thousands of files.
The trouble is that I need to automate the "cleaning" of these files.
How can I clean these files on Windows? Any idea what else this "ASCII 0" character could be, or how to get rid of it?
Thanks!!

Can you use Git Bash on your machine? If you can, I would recommend trying this:
cat -v filename
For example, I created a file called TESTINGCAT.txt with three lines, which visually has 7 empty spaces. However, if you count spaces with Notepad++ you will find just 5, because 2 of those empty spaces are "non-breaking spaces" generated by typing ALT + 0160.
However, if I use cat -v TESTINGCAT.txt I can see M-BM- characters, which is how cat -v displays the UTF-8 bytes of a non-breaking space. A NUL byte (ASCII 0) would show up as ^@ in the same output.
Therefore, if you do not have access to a Linux machine, you should try using Git Bash to see the hidden characters in your file.
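Since the asker needs to automate the cleanup on Windows without tr, here is a minimal Python sketch of the same idea as tr -d '\000': read each file as bytes and drop every NUL byte. The folder names csv_in and csv_clean are hypothetical, and the sketch assumes the files fit in memory. Note that the regular expressions tried in the question look for non-ASCII characters, and NUL is itself an ASCII character, which may be why they found nothing.

```python
from pathlib import Path


def strip_nul_bytes(data: bytes) -> bytes:
    """Return data with every NUL byte (ASCII 0) removed."""
    return data.replace(b"\x00", b"")


def clean_file(src: Path, dst: Path) -> int:
    """Write a NUL-free copy of src to dst; return how many bytes were dropped."""
    raw = src.read_bytes()
    cleaned = strip_nul_bytes(raw)
    dst.write_bytes(cleaned)
    return len(raw) - len(cleaned)


if __name__ == "__main__":
    # Hypothetical folder names (assumption); adjust to your own layout.
    in_dir, out_dir = Path("csv_in"), Path("csv_clean")
    if in_dir.is_dir():
        out_dir.mkdir(exist_ok=True)
        for path in in_dir.glob("*.csv"):
            removed = clean_file(path, out_dir / path.name)
            print(f"{path.name}: removed {removed} NUL byte(s)")
```

If the NUL bytes turn out to follow almost every character, the file is probably UTF-16 rather than damaged, and re-encoding it to UTF-8 would likely be a better fix than stripping bytes.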

Related

Rename ttf/woff/woff2 file to PostScript Font Name with Script

I am a typographer working with many fonts that have incorrect or incomplete filenames. I am on a Mac and have been using Hazel, AppleScript, and Automator workflows in an attempt to automate renaming these files*. I need a script that replaces the existing filename of ttf, woff, or woff2 files in Finder with the font's postscriptName. I know of tools (fc-scan/fontconfig, TTX, etc.) that can retrieve the PostScript name values I require, but I lack the programming know-how to code a script for my purposes. I've only managed to set up a watched directory that can run a script when any files matching certain parameters are added.
*To clarify, I am talking about changing the filename only, not the actual names stored within the font. Also I am open to a script of any compatible language or workflow of scripts if possible, e.g. this post references embedding AppleScript within Shell scripts via osascript.
StackExchange Posts I've Consulted:
How to get Fontname from OTF or TTF File?
How to get PostScript name of TTF font in OS X?
How to Change Name of Font?
Automate Renaming Files in macOS
Others:
https://github.com/dtinth/JXA-Cookbook/wiki/Using-JavaScript-for-Automation
https://github.com/fonttools/fonttools
https://github.com/devongovett/fontkit
https://www.npmjs.com/package/rename-js
https://opentype.js.org/font-inspector.html
http://www.fontgeek.net/blog/?p=343
https://www.lantean.co/osx-renaming-fonts-for-free
Edit: Added the following by request.
1) Screenshot of a somewhat typical webfont, illustrating how the form fields for font family and style names are often incomplete, blank, or contain illegal characters.
2) The woff file depicted (also, as base64).
Thank you all in advance!
Since you mentioned Automator in your question, I thought I'd try and solve this while using that to rename the file, along with standard Mac bash to get the font name. Hopefully, it beats learning a whole programming language.
I don't know what your workflow is so I'll leave any deviations to you but here is a method to select a font file and from Services, rename the file to the font's postscript name… based on Apple's metadata, specifically "com_apple_ats_name_postscript". This is one of the pieces of data retrieved using 'mdls' from the Terminal on the font file. To focus on the postscript name, grep the output for name_postscript. For simplicity here, I'll exclude the path to the selected file.
Font Name Acquisition
So… running this command…
mdls GenBkBasBI.ttf | grep -A1 name_postscript
… generates this output, which contains FontBook's Postscript name. The '-A1' option makes grep return the matching line plus the first line after it, which is the one containing the actual font name.
com_apple_ats_name_postscript = (
"GentiumBookBasic-BoldItalic"
Clean this up with some more bash (tr, tail)…
tr -d \ | tail -n 1 | tr -d \"
In order, these strip spaces, drop every line except the last, and strip quotation marks. In the first 'tr' instance, the backslash escapes a literal space, so there is a space immediately after the backslash.
In a single line, it looks like this…
mdls GenBkBasBI.ttf | grep -A1 name_postscript | tr -d \ | tail -n 1 | tr -d \"
…and produces this…
GentiumBookBasic-BoldItalic
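For anyone who would rather script this than chain grep, tr and tail, the same cleanup can be sketched in Python. This is only an illustration: the function names are hypothetical, the parser assumes mdls prints the name on the line after the com_apple_ats_name_postscript key (as in the output above), and the rename helper only works on macOS, where mdls exists.

```python
import subprocess
from pathlib import Path
from typing import Optional


def postscript_name_from_mdls(mdls_output: str) -> Optional[str]:
    """Pull the PostScript name out of mdls text, like grep -A1 | tr | tail."""
    lines = mdls_output.splitlines()
    for i, line in enumerate(lines):
        if "com_apple_ats_name_postscript" in line and i + 1 < len(lines):
            # The next line holds the quoted name; drop spaces, comma, quotes.
            return lines[i + 1].strip().rstrip(",").strip('"')
    return None


def rename_to_postscript_name(font_path: Path) -> Path:
    """Rename font_path to <PostScript name><original extension> (macOS only)."""
    result = subprocess.run(
        ["mdls", str(font_path)], capture_output=True, text=True, check=True
    )
    name = postscript_name_from_mdls(result.stdout)
    if not name:
        raise ValueError(f"no PostScript name found for {font_path}")
    return font_path.rename(font_path.with_name(name + font_path.suffix))
```

Calling rename_to_postscript_name(Path("GenBkBasBI.ttf")) would be the scripted equivalent of the Automator rename step, keeping the extension untouched.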
Now, here is the workflow that includes the above bash command. I got the idea for variable usage from the answer to this question…
Apple Automator “New PDF from Images” maintaining same filename
Automator Workflow
At the top: Service receives selected 'files or folders' in 'Finder'.
1) Get Selected Finder Items
This (or Get Specified…) is there to allow testing. It is obviated by using this as a Service.
2) Set Value of Variable (File)
This is to remember which file you want to rename.
3) Run Shell Script
This is where we use the bash stuff. The $f is the selected/specified file. I'm running 'zsh' for whatever reason. You can set it to whatever you're running, presumably 'bash'.
4) Set Value of Variable (Text)
Assign the bash output to a variable. This will be used by the last action for the new filename.
5) Get Value of Variable (File)
Recall the specified/selected file to rename.
6) Rename Finder Items: Name Single Item
I have it set to 'Basename only' so it will leave the extension alone. Enter the 'Text' variable from action 4 in here.

How to extract the strings in double quotes for localization

I'm trying to extract the strings for localization. There are so many files where some of the strings are tagged as NSLocalizedStrings, and some of them are not.
I'm able to grab the NSLocalizedStrings using ibtool and genstrings, but I'm unable to extract the plain strings without NSLocalizedString.
I'm not good at regex, but I came up with this: "[^(]@\""
and with the help of grep:
grep -i -r -I "[^(]@\"" * > out.txt
It worked, and all the strings were grabbed into a text file, but the problem is this:
if my code has a line like
..... initWithTitle:@"New Sketch".....
I only expect grep to grab the @"New Sketch" part, but it grabs the whole line.
So in the out.txt file, I see initWithTitle:@"New Sketch", along with some unwanted lines.
How can I write the regex to grab only the strings in double quotes?
I tried the grep command with the regex mentioned in here, but it gave me a syntax error.
For ex, I tried:
grep -i -r -I (["'])(?:(?=(\\?))\2.)*?\1 * > out.txt
and it gave me
-bash: syntax error near unexpected token `('
In Xcode, open your project and go to Editor->Export For Localization... It will create a folder of files. Everything that was marked for localization will be extracted there, so there is no need to parse it yourself. The output is in XML format.
If you want to go the hard way, you can then parse those files the way you're trying to do it now. They will also contain the Storyboard strings, by the way.
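If you do go the hard way, the original question (grab only the quoted part, not the whole line) can be sketched with a regular expression in Python. This is a rough illustration rather than a full Objective-C parser: the names OBJC_STRING and extract_plain_strings are made up here, and the pattern only skips a literal that directly follows an opening parenthesis, mirroring the [^(] idea in the question.

```python
import re
from typing import List

# An Objective-C literal @"..." that is not directly preceded by "(".
# Inside the quotes, allow any non-quote character or a backslash escape.
OBJC_STRING = re.compile(r'(?<!\()@"((?:[^"\\]|\\.)*)"')


def extract_plain_strings(source: str) -> List[str]:
    """Return the contents of every matching string literal in source."""
    return OBJC_STRING.findall(source)


if __name__ == "__main__":
    line = '[[UIAlertView alloc] initWithTitle:@"New Sketch" message:nil];'
    for text in extract_plain_strings(line):
        print(text)
```

A second literal inside an NSLocalizedString call (for example a comment argument) would still be matched, so the results need a manual review.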

In texinfo, how to specify a bash single quote?

I am writing a package using the GNU build system. The documentation hence is in the texinfo format. As a result, executing make converts the texinfo file into the info format, and executing make pdf automatically produces a pdf file.
In the texinfo file, I have something like this:
@verbatim
awk '{...}' data.txt
@end verbatim
However, in the pdf, the "basic" single quotes (U+0027) in the awk command above are transformed into "curvy" single quotes (U+2019) so that, if one does a copy-paste of the command from the pdf into a terminal, bash complains ("syntax error"). This forces the user to edit the command he just copy-pasted. The same problem occurs if I replace @verbatim with @example. I searched the texinfo manual but couldn't find a way to specify apostrophes. I am using texinfo version 5.2.
Karl Berry (via the bug-texinfo mailing list) told me to add 2 lines to my texi file (more info):
@codequoteundirected on
@codequotebacktick on
as well as add the latest version of texinfo.tex to my package.

Error from gawk about backslash line ending when running configure script

I am getting an error when compiling sqlcipher. I'm unable to run ./configure
Could someone help fix the error?
gawk: /mkopcodeh.awk:101: backslash not last character on line.
If anyone else runs into this: the awk error seems to be caused by different line endings. If you check out the sources on Windows, take care not to have the line endings changed to Windows CR/LF. Instead, stick to LF (Unix/Mac format) only. You can also convert the line endings of the offending file later on with any decent editor (I just did it with Notepad++ using the menu item Edit->EOL conversion).
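If several files were checked out with CR/LF endings, the conversion can also be scripted. Below is a minimal Python sketch, assuming the file is small enough to read into memory; the function names are hypothetical, and mkopcodeh.awk is simply the file named in the error message.

```python
from pathlib import Path


def to_unix_line_endings(data: bytes) -> bytes:
    """Replace every Windows CR/LF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: Path) -> bool:
    """Rewrite path in place with LF endings; return True if it changed."""
    raw = path.read_bytes()
    fixed = to_unix_line_endings(raw)
    if fixed != raw:
        path.write_bytes(fixed)
        return True
    return False


if __name__ == "__main__":
    target = Path("mkopcodeh.awk")  # file from the error message
    if target.is_file():
        print("converted" if convert_file(target) else "already LF")
```

Working on bytes rather than text avoids any platform newline translation while reading or writing.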

How to print sqlite to file in utf-8?

I've opened sqlite3.exe in windows console and made a database with special characters.
.dump showed me the sql query with special characters.
Then I changed output to file: .output file.sql
And executed the .dump command.
The special characters were missing when I imported the database using .read file.sql.
I used pragma encoding="UTF-8"; but it didn't change anything (I don't know if it should).
The Windows console makes it hard to use UTF-8 correctly, and the Microsoft compiler has lots of bugs that make it impossible to use UTF-8 with portable I/O functions.
If you have entered data in the Windows console, those strings are not valid UTF-8.
If a non-ASCII string is output with correct characters in the Windows console, it is not valid UTF-8.
To ensure that your data is valid UTF-8, you have to go through files.
Alternatively, use any SQLite shell that does not use the console (such as the SQLite Manager Firefox extension).
This works fine for CP852, but it could be used for any codepage known to iconv.
chcp 852 >NUL
echo INSERT into NAMES (name,timestamp) VALUES ('ěščřžýáíé','1429001515'); | iconv.exe -f cp852 -t utf-8 | ..\utilities\sqlite3.exe test.db
Windows can handle Unicode internally, but if you print it to the console (with the 'echo' command, for example), the characters get mismatched. Re-encoding on the fly solves this problem.
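The re-encoding step that iconv performs in the pipeline above can be sketched in Python as well, in case iconv.exe is not available. This is an illustration only: the function name reencode is made up here, and cp852 is assumed as the source codepage to match the chcp 852 example.

```python
def reencode(data: bytes, source: str = "cp852", target: str = "utf-8") -> bytes:
    """Decode bytes from the source codepage and encode them as the target."""
    return data.decode(source).encode(target)


if __name__ == "__main__":
    # Bytes as a CP852 console would produce them for this text.
    console_bytes = "ěščřžýáíé".encode("cp852")
    utf8_bytes = reencode(console_bytes)
    print(len(console_bytes), len(utf8_bytes))
```

The UTF-8 result is longer than the CP852 input because each of these accented letters takes one byte in CP852 and two bytes in UTF-8.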