Is there a way to create a 'generic' CSV file format with FileHelpers?

I would like to use FileHelpers to read a CSV file with an unknown file layout (except that it is comma delimited). Basically, I would like to get a string[] for each row, where each item is a field. Is this possible?

Found it! Use the smart format detector: https://www.filehelpers.net/example/Advanced/SmartFormatDetector/

Related

Does pig support load with no delimiter?

I'd like to load a lot of small files from HDFS with Pig and process them as tuples (filename, filecontent).
a=LOAD 'mydir' USING PigStorage('','-tagPath') AS (filepath:chararray, filecontents:chararray);
However it seems like I cannot omit specifying the delimiter. Is there some sort of a "NULL" in Pig or is there any other way to make sure the content of the file will not be split?
You will have to write your own custom loader by extending LoadFunc.
The short answer to your question is no. To make sure the content is not split, use a delimiter that does not occur in the content. That way, the whole content is loaded into the field filecontents:chararray. So, assuming your input files do not contain the special character '~':
a=LOAD 'mydir' USING PigStorage('~','-tagPath') AS (filepath:chararray, filecontents:chararray);
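Before relying on a delimiter such as '~', it may be worth checking that it really is absent from the data. The following is a minimal Python sketch of that check, run outside Pig on a sample of the file contents; the function name `first_unused_delimiter` and the candidate list are hypothetical and chosen for illustration only.

```python
def first_unused_delimiter(texts, candidates=("~", "|", "\t", "\x01")):
    """Return the first candidate that occurs in none of the texts, or None."""
    remaining = list(candidates)
    for text in texts:
        # Keep only the candidates that this text does not contain.
        remaining = [c for c in remaining if c not in text]
        if not remaining:
            return None  # every candidate occurs somewhere in the data
    return remaining[0] if remaining else None


if __name__ == "__main__":
    samples = ["first file with a ~ in it", "second file | with a pipe"]
    print(repr(first_unused_delimiter(samples)))
```

This only checks the field delimiter. As far as I know, PigStorage still reads its input line by line, so a file that contains line breaks would likely produce one tuple per line rather than one tuple per file.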

PIG LOAD filename

I am just trying to load an unstructured input file and add the filename. So what I want to get is two fields:
filename:chararray, inputrow:chararray.
I can load the filename if I have a field delimiter, using PigStorage(';','-tagFile'), but I do not want to delimit fields at this point; I just want the string and the filename. How can I do this?
The way to load files without applying a delimiter is to choose a delimiter that does not (cannot) occur in the file.
For example, if your file is separated by ; and cannot contain tabs (\t), you could do:
PigStorage('\t','-tagFile')

Import flat file containing commas/quotes into SAP BODS

Hi, I have a row like the following in a .csv file:
12346,abcded,ssadsadc,2013.04.04 08.42.31,8,"I would like to use an
existing project as a template for a new project for another Report
Suite but it just overwrites the existing project rather than creates
new one even when I use the ""Save As"" function.",Analyst,,5,"Hotel
Room,Literature,Open/ Create",,
The text string has " and , as part of the string. Hence I am not able to use " as the text delimiter in the SAP BODS file format.
Could somebody help me with this?
Use a delimiter that is not expected to be in your data (e.g. ~ or |) or a string of multiple characters (e.g. $^$).
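If the source file cannot be re-exported with a different delimiter, one option is to convert it before the import. The sketch below is a Python preprocessing step that runs outside SAP BODS and is not part of it: it parses the row from the question with the standard `csv` module (which understands doubled quotes and quoted line breaks) and writes it back with '~' as the delimiter and no text qualifier. The function name `rewrite_with_delimiter` is hypothetical, and the sketch assumes the line breaks shown in the question really are part of the quoted fields.

```python
import csv
import io

SAMPLE = (
    '12346,abcded,ssadsadc,2013.04.04 08.42.31,8,"I would like to use an\n'
    'existing project as a template for a new project for another Report\n'
    'Suite but it just overwrites the existing project rather than creates\n'
    'new one even when I use the ""Save As"" function.",Analyst,,5,"Hotel\n'
    'Room,Literature,Open/ Create",,\n'
)


def rewrite_with_delimiter(csv_text, delimiter="~"):
    """Parse quoted, comma-separated text and re-join the fields with `delimiter`."""
    out_lines = []
    for fields in csv.reader(io.StringIO(csv_text, newline="")):
        # Collapse line breaks inside a field so each record stays on one line.
        cleaned = [" ".join(field.splitlines()) for field in fields]
        if any(delimiter in field for field in cleaned):
            raise ValueError(f"delimiter {delimiter!r} occurs in the data")
        out_lines.append(delimiter.join(cleaned))
    return "\n".join(out_lines)


if __name__ == "__main__":
    print(rewrite_with_delimiter(SAMPLE))
```

The output keeps the literal " and , characters inside the fields, so the file format can then be defined with ~ as the column delimiter and no text delimiter.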

Cannot upload CSV that starts with an integer

I'm stuck with what seems like a weird BigQuery bug: I cannot upload a CSV file that starts (first line, first column) with an integer.
Here's my schema : COL1:INTEGER,COL2:INTEGER,COL3:STRING
Here's my csv file content :
100,4,XXX
100,4,XXX
If I put the STRING column as first column, the upload is OK.
If I add a header and tell BigQuery to skip it during the import, the upload is ok too.
But with the CSV and schema above, BigQuery always complains : Line:1 / Field:1, Value cannot be converted to expected type.
Does anyone know what the problem is?
Thank you in advance,
David
I could not reproduce this problem--I copied and pasted the content into a file and uploaded it with no problems.
Perhaps the uploaded file is corrupted somehow? If there are extra bytes at the beginning of the file, those would be ignored in a header row but might result in this error if the first value of the first field is expected to be an integer. I'd recommend examining the actual binary data in the file to make sure there's nothing funny going on.
Also, are you doing this import via web UI, command-line tool, or API? Have you tried one of the other methods?
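One way to examine the raw bytes is sketched below in Python. It assumes that the "extra bytes" are a byte order mark (BOM), which is a common cause but only a guess here, not a confirmed diagnosis of this report. The helper names `detect_bom` and `strip_utf8_bom` are hypothetical.

```python
import codecs

# BOMs checked here; other encodings are deliberately left out of this sketch.
BOMS = {
    "UTF-8": codecs.BOM_UTF8,
    "UTF-16 LE": codecs.BOM_UTF16_LE,
    "UTF-16 BE": codecs.BOM_UTF16_BE,
}


def detect_bom(data: bytes):
    """Return the name of the BOM that `data` starts with, or None."""
    for name, bom in BOMS.items():
        if data.startswith(bom):
            return name
    return None


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    head = b"\xef\xbb\xbf100,4,XXX\n"  # stand-in for the first bytes of the CSV file
    print(head[:8], detect_bom(head))
```

For a real file, read the first few bytes in binary mode (`open(path, "rb").read(16)`) and pass them to `detect_bom`. If the first line prints with anything before `100`, that would explain why field 1 cannot be converted to an integer.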

How to get data from a .rtf file or Excel file into a database (SQLite) in the iPhone SDK?

I have lots of data in a .rtf file (usernames and passwords). How can I load that data into a table? I'm using sqlite3.
I have created "userDatabase.sql", and in it a table "usersList" with the fields "username" and "password". I want to get the data from the "list.rtf" file into my table "usersList". Please help me.
Thanks in advance.
Praveena.
I would write a little parser. Re-save the .rtf as a .txt file and assume it looks like this:
user1:pass1
user2:pass2
user5:pass5
Now do this (in your code):
open the .txt file (NSString -stringWithContentsOfFile:usedEncoding:error:)
read line by line
for each line, fetch user and password (NSArray -componentsSeparatedByString)
store user/password into your DB
Best,
Christian
Edit: for parsing Excel sheets, I recommend exporting them as a CSV file and then doing the same.
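The steps in the answer above (open the file, read line by line, split each line, store the pair) can be sketched as follows. This is Python with the standard `sqlite3` module rather than the iPhone SDK, so it only illustrates the logic. The table and column names come from the question; the `user:pass` line format is the answer's assumption, and the function name `load_users` is hypothetical.

```python
import sqlite3


def load_users(lines, conn):
    """Insert 'user:pass' lines into the usersList table; return the number of rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usersList (username TEXT, password TEXT)"
    )
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first ':' only, so a password may itself contain ':'.
        username, sep, password = line.partition(":")
        if not sep:
            continue  # skip lines that have no separator
        rows.append((username, password))
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO usersList (username, password) VALUES (?, ?)", rows
        )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_users(["user1:pass1", "user2:pass2", "user5:pass5"], conn))
```

With a real file, pass the open file object as `lines`, for example `load_users(open("list.txt", encoding="utf-8"), conn)`.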
Parsing RTF files is mostly trivial. They're actually text, not binary (unlike .doc, .pdf, etc.).
Last I used it, I remember the file format wasn't too difficult either.
Example:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\sa200\sl276\slmult1\lang9\f0\fs22 Username Password\par
Username2 Password2\par
UsernameN PasswordN\par
}
Do a regular expression match to get the last { ... } part. Be sure to match { and not \{.
Next, parse the text as you want, but keep in mind that:
everything starting with a \ is escaped; I would write a little function to unescape the text
the special identifier \par is for a new line
there are other special identifiers, such as \b which toggles bolding text
the color change identifier, \cfN changes the text color according to the color table defined in the file header. You would want to ignore this identifier since we're talking about plain text.
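As a rough illustration of these points, here is a minimal Python sketch that pulls the plain-text lines out of the example above. It is not a complete RTF parser: it tokenizes the input instead of using a single regular expression match, skips everything inside nested groups, treats \par as a line break, drops all other control words (including \b and \cfN), and does not decode hex or Unicode escapes. The function name `rtf_to_lines` is hypothetical.

```python
import re

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word, optional number, optional trailing space
    r"|\\(.)"                   # control symbol or escaped character, e.g. \{ \} \\
    r"|([{}])"                  # group delimiters
    r"|([^\\{}]+)",             # plain text
    re.DOTALL,
)


def rtf_to_lines(rtf):
    """Return the non-empty text lines found at the top level of an RTF document."""
    out = []
    depth = 0
    for match in TOKEN.finditer(rtf):
        word, _param, symbol, brace, text = match.groups()
        if brace == "{":
            depth += 1
        elif brace == "}":
            depth -= 1
        elif depth != 1:
            continue  # inside a nested group (font table, generator, ...)
        elif word == "par":
            out.append("\n")  # \par marks the end of a paragraph
        elif word:
            pass  # other control words (\b, \cfN, \fs22, ...) are ignored
        elif symbol is not None:
            if symbol in "{}\\":
                out.append(symbol)  # unescape \{ \} and \\
        elif text:
            # Raw line breaks in the file are not significant in RTF.
            out.append(text.replace("\r", "").replace("\n", ""))
    return [line for line in "".join(out).split("\n") if line.strip()]


SAMPLE = r"""{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\sa200\sl276\slmult1\lang9\f0\fs22 Username Password\par
Username2 Password2\par
UsernameN PasswordN\par
}"""

if __name__ == "__main__":
    for line in rtf_to_lines(SAMPLE):
        print(line.split(None, 1))
```

Each returned line can then be split into a username and a password and stored in the usersList table.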