Creating database tables from text file data - SQL

I have text files, generated by another piece of software, with data about genes in the human body. I need to insert them into a table and structure the table the way I need. I have 15 different text files that go into one table, as 15 different columns.
GFER = 3.58982863
BPIL1 = 3.58982863
BTBD1 = 4.51464898
BTBD2 = 4.40934218
RPLP1 = 3.57462687
PDHA1 = 4.19320066
LRRC7 = 4.50967385
HIGD1A = 4.46876727
Shown above is the data in the text file: gene name and distance. I need to include this in a table, with the gene name in a separate column and the distance in a separate column. This text file has 3,500 lines and I have 14 text files of data. How can I enter this data into a table without manually inserting it? Is there any automated software or tool you know of? Please help me out!
regards,
Rangana

The mysqlimport command ought to load it directly (http://dev.mysql.com/doc/refman/5.0/en/mysqlimport.html) if you use a little trick to tell it that the = sign is the field delimiter.
shell> mysqlimport blah blah --fields-terminated-by==
If that does not work, write yourself a little routine to read the file, split on the = sign, and replace it with a comma or something closer to what mysqlimport expects to see by default.

You need an import wizard. I can't say I've personally used one with MySQL (but I have with other DBMSs); a quick Google search shows this might be what you need. I have a feeling phpMyAdmin used to have a feature that did this.

First create the table as something like:
mysql> create table gene (name varchar(10), distance double);
and then import a file:
mysql> load data infile '/tmp/gene.txt' into table gene columns terminated by ' = ';
The file needs to be in a place that is accessible to the user under which the MySQL server (mysqld) is running.
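If putting the file on the server is awkward, the LOCAL variant reads the file from the client machine instead (this usually requires local_infile to be enabled on both the client and the server):
mysql> load data local infile '/tmp/gene.txt' into table gene columns terminated by ' = ';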
You can also use mysqlimport from outside the mysql shell. It connects to the server and issues the equivalent load data infile command.
I tested the above with your sample data and it worked.

I have 15 different text files that go into one table, as 15 different columns.
Do you mean 30 columns? 2 columns loaded from each file?
You may have to use ' = ' (with spaces on both sides) as the delimiter. And as Ken said, if that doesn't do it, search and replace " = " with just a comma ",".
If you have SSIS, this can be done fairly quickly. Set up the 15 input files and map each file to a pair of columns, like:
File1 ... map to ... Column1 & Column2
File2 ... map to ... Column3 & Column4
etc
Or you can combine the 15 files (can be done easily using Excel) into 1 file with 30 columns and load it in.
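For instance, a rough sketch of that last approach in MySQL, abbreviated to the first two files' columns (the table, column and file names are only placeholders; extend the pattern to all 15 pairs):
mysql> create table gene_wide (name1 varchar(10), distance1 double, name2 varchar(10), distance2 double);
mysql> load data infile '/tmp/combined.csv' into table gene_wide fields terminated by ',' (name1, distance1, name2, distance2);
Each pair of columns then corresponds to one of the original files.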

I have done it. It may seem odd, but I'm adding it here in case someone finds it valuable. I opened those data files using the OpenOffice spreadsheet; OpenOffice has an amazing feature for separating a data file into different columns. I used it to split my data files into columns and saved them as an Excel file (.xls). Then, using "sqlmaestro" as suggested by m.edmondson and its feature for importing data from an Excel file, I was able to achieve my task.
Thank you all for your valuable answers; they surely added new things to my knowledge! Thank you all once again!

Related

Import csv file into SQL Server temptable, without specifying columns

I am currently trying to manipulate a fairly poor CSV file into an SQL database. I would prefer not to use another program to amend the CSV file; it's an output from a locked program, and the business has a general understanding of SQL, which is why I prefer an SQL solution.
What I want to do is take the 60 columns from this CSV and import them all into a table, without specifying the column names. I have seen BULK INSERT, which allows me to put them into a table, but only one that has already been created.
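For reference, the BULK INSERT pattern I've seen looks something like this (the table, columns and path are just placeholders, abbreviated to three columns here; in reality there would be 60):
CREATE TABLE #RawImport (
    Col1 VARCHAR(255),
    Col2 VARCHAR(255),
    Col3 VARCHAR(255)  -- ...and so on, one column per CSV field
);

BULK INSERT #RawImport
FROM 'C:\Imports\export.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);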
My actual end goal is to transform the data in the CSV so I have a table with just five columns, but step one is to get all of the data in, so that I can then grab the information that's relevant.
My CSV file has a header row, but 59 of the 60 headers are blank, so there are no actual column headers in the sheet.
Example
SHEET,,,,
Data1, Data2, Data3, Data4, Data5,
If I could grab just the columns I need (which in my case are columns 7, 31, 55 and 59), that would be even better.
I've scoured the internet and can't find the exact solution I'm looking for.
I've also tried to use SSIS for this, but honestly I find it unreliable sometimes and slight changes seem to break everything, so I gave up!! (My idea was flat file import > Derived Column > data conversion > OLE DB destination, but I got errors with the flat file import which I can't seem to solve.)
I would prefer to do it all through one SQL script if I can anyway, but any suggestions of the best way to achieve this are welcome.
Thank you,
Craig

How to properly load a GCS file into GBQ with double-pipe delimiters

My existing query:
bq load --field_delimiter="||" --skip_leading_rows=1 source.taxassessor gs://taxassessor/OFFRS_5_0_TAXASSESSOR_0001_001.txt taxassessor.txt
the error I get back is:
Not enough positional args, still looking for destination_table
I tried to mimic the command in the Web UI, and I cannot reproduce it because the Web UI doesn't allow double-pipe delimiters (a limitation of the UI, or the solution?).
I have two questions:
How do I repair the current query?
The source file OFFRS_5_0_TAXASSESSOR_0001_001.txt is one of many source files, with the last three characters of the file name showing which file number in the series it is. How do I use wildcards so I can get file 002.txt, 003.txt, etc., with something like OFFRS_5_0_TAXASSESSOR_0001_*.txt?
Thanks
How do I use wildcards so I can get file 002.txt, 003.txt, etc., with something like OFFRS_5_0_TAXASSESSOR_0001_*.txt?
Do as you suggested, for instance:
bq load --field_delimiter="||" --skip_leading_rows=1 source.taxassessor gs://taxassessor/OFFRS_5_0_TAXASSESSOR_0001_*.txt taxassessor.txt
It should already work.
How do I repair the current query?
Not sure why you are getting this message, as everything seems to be correct... but it still shouldn't work, as your delimiter has 2 characters and it is supposed to have just one (imagine, for instance, if your file has the string "abcd|||efg||hijk|||l"; it'd be hard to tell where the delimiter is, whether it's the first two pipes or the last).
If you can't change the delimiter, one thing you could do is save everything in BigQuery as one single STRING field. After that, you can extract the fields as you want, something like:
WITH data AS (
  SELECT "alsdkfj||sldkjf" AS field UNION ALL
  SELECT "sldkfjld|||dlskfjdslk"
)
SELECT SPLIT(field, "||") AS all_fields FROM data
all_fields will have all the columns in your files; you can then save the results to some other table or run any analyses you want.
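If it helps, here is a rough extension of the example above showing how individual columns could be pulled out of that array (the positions and aliases are just for illustration):
WITH data AS (
  SELECT "alsdkfj||sldkjf" AS field UNION ALL
  SELECT "sldkfjld|||dlskfjdslk"
)
SELECT
  all_fields[OFFSET(0)] AS col1,
  all_fields[OFFSET(1)] AS col2
FROM (
  SELECT SPLIT(field, "||") AS all_fields FROM data
)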
As a recommendation, it would probably be better if you could change this delimiter to something else with just one character.

How do I partition a large file into files/directories using only U-SQL and certain fields in the file?

I have an extremely large CSV, where each row contains customer and store IDs, along with transaction information. The current test file is around 40 GB (about 2 days' worth), so partitioning is an absolute must for any reasonable return time on select queries.
My question is this: When we receive a file, it contains multiple store's data. I would like to use the "virtual column" functionality to separate this file into the respective directory structure. That structure is "/Data/{CustomerId}/{StoreID}/file.csv".
I haven't yet gotten it to work with the OUTPUT statement. The statement I used was:
// Output to file
OUTPUT @dt
TO @"/Data/{CustomerNumber}/{StoreNumber}/PosData.csv"
USING Outputters.Csv();
It gives the following error:
Bad request. Invalid pathname. Cosmos Path: adl://<obfuscated>.azuredatalakestore.net/Data/{0}/{1}/68cde242-60e3-4034-b3a2-1e14a5f7343d
Has anyone attempted the same kind of thing? I tried to concatenate the output path from the fields, but that was a no-go. I thought about doing it as a function (UDF) that takes the two IDs and filters the whole dataset, but that seems terribly inefficient.
Thanks in advance for reading/responding!
Currently U-SQL requires that all the file outputs of a script must be understood at compile time. In other words, the output files cannot be created based on the input data.
Dynamic outputs based on data are something we are actively working on for release sometime later in 2017.
In the meantime, until the dynamic output feature is available, the pattern to accomplish what you want requires two scripts.
The first script will use GROUP BY to identify all the unique combinations of CustomerNumber and StoreNumber and write that to a file.
Then, through the use of scripting or a tool written using our SDKs, download the previous output file and programmatically create a second U-SQL script that has an explicit OUTPUT statement for each pair of CustomerNumber and StoreNumber.
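As a rough illustration of the first script only (the input path, schema and column names here are assumptions based on the question):
// Collect the distinct CustomerNumber/StoreNumber pairs present in the input file.
@dt =
    EXTRACT CustomerNumber string,
            StoreNumber string,
            TransactionData string
    FROM "/Data/incoming/PosData.csv"
    USING Extractors.Csv();

@pairs =
    SELECT CustomerNumber,
           StoreNumber
    FROM @dt
    GROUP BY CustomerNumber, StoreNumber;

OUTPUT @pairs
TO "/Data/CustomerStorePairs.csv"
USING Outputters.Csv();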

How to read number of rows and columns in a CSV file using VBA

I have more than 100 CSV/text files (varying in size between 1 MB and 1 GB). I just need to create an Excel sheet for each CSV file, presenting:
name of columns
types of column i.e. numeric or string
number of records in each column
min & max values & length of each column
So the output on a sheet would be something like this (I cannot paste a table image here as I am new on this site, so please consider the dummy table below as the Excel sheet):
    A            B        C         D          E          F           G
1   Column_name  Type     #records  min_value  max_value  min_length  max_length
2   Name         string   123456    Alis       Zomby      4           30
3   Age          numeric  123456    10         80         2           2
Is it possible to create any VBA code for this? I am at a very beginner stage, so if any expert can help me out on the code side, it would be really helpful.
thanks!!!
You could try writing complex VBA file- and string-handling code for this; my advice is: don't.
A better approach is to ask: "What other tools can read a csv file?"
This is tabulated data, and the files are quite large. Larger, really, than you should be reading using a spreadsheet: it's database work, and your best toolkit will be SQL queries with MIN(), MAX() and COUNT() functions to aggregate the data.
Microsoft Access has a good set of 'external data' tools that will read fixed-width files, and if you use 'linked data' rather than 'import table' you'll be able to read the files using SQL queries without importing all those gigabytes into an Access .mdb or .accdb file.
Outside MS-Access, you're looking at intermediate-to-advanced VBA using the ADODB database objects (Microsoft Active-X Data Objects) and a schema.ini file.
Your link for text file schema.ini files is here:
http://msdn.microsoft.com/en-us/library/ms709353%28v=vs.85%29.aspx
...And you'll then be left with the work of creating an ADODB database 'connection' object that sees text files in a folder as 'tables', and writing code to scan the file names and build the SQL queries. All fairly straightforward for an experienced developer who's used the ADO text file driver.
I can't offer anything more concrete than these general hints - and nothing like a code sample - because this is quite a complex task, and it's not really an Excel-VBA task; it's a programming task best undertaken with database tools, except for the very last step of displaying your results in a spreadsheet.
This is not a task I'd give a beginner as a teaching exercise; it demands so many unfamiliar concepts and techniques that they'd get nowhere until it was broken down into a structured series of separate tutorials.

Importing/Pasting Excel data with changing fields into SQL table

I have a table called Animals. I pull data from this table to populate another system.
I get Excel data with lists of animals that need to go in the Animals table.
The Excel data will also have other identifiers, like Breed, Color, Age, Favorite Toy, Veterinarian, etc.
These identifiers will change with each new Excel file. Some may repeat, others are brand new.
Because the fields change, and I never know what new fields will come with each new Excel file, my Animals table only has Animal Id and Animal Name.
I've created a Values table to hold all the other identifier fields. That table is structured like this:
AnimalId
Value
FieldId
DataFileId
And then I have a Fields table that holds the key to each FieldId in the Values table.
I do this because the alternative is to keep a big table with fields that may not even be used each time I need to add data. A big table with a lot of null columns.
I'm not sure my way is a good way either. It can seem overly complex.
But, assuming it is a good way, what is the best way to get this Excel data into my Values table? The list of animals is easy to add to my Animals table. But for each identifier (Breed, Color, etc.) I have to copy or import the values and then update the table to assign a matching FieldId (or create a new FieldId in the Fields table if it doesn't exist yet).
It's a huge pain to load new data if there are a lot of identifiers. I'm really struggling and could use a better system.
Any advice, help, or just pointing me in a better direction would be really appreciated.
Thanks.
Depending on your client (e.g., I use SequelPro on a Mac), you might be able to import CSVs. This is generally pretty shaky, but you can also export your Excel document as a CSV... how convenient.
However, this doesn't really help with your database structure. Granted, using foreign keys is a good idea, but importing that data unobtrusively (and easily) is something that will likely need to be done a row at a time.
However, you could try modifying something like this to suit your needs, by first exporting your Excel document as a CSV, removing the header row (the first one), and then using regular expressions on it to change it into a big chunk of SQL. For example:
Your CSV:
myval1.1,myval1.2,myval1.3,myval1.4
myval2.1,myval2.2,myval2.3,myval2.4
...
At which point, you could do something like:
myCsvText.replace(/^(.+),(.+),(.+),(.+)$/mg, "INSERT INTO table_name(col1, col2, col3, col4) VALUES('$1', '$2', '$3', '$4');")
where you know the number of columns, their names, and how their values are organized (via the regular expression & replacement).
Might be a good place to start.
Your table looks OK. Since you have a variable number of fields, it seems logical to expand vertically. Although you might want to make it easier on yourself by changing DataFileID and FieldID into FieldName and DataFileName, unless you will use them in a lot of other tables too.
Getting data from Excel into SQL Server is unfortunately not as easy as you would expect from two Microsoft products interacting with each other. There are several routes that I know of that you can take:
1. Work with CSV files instead of Excel files. Excel can edit CSV files just as easily as Excel files, but CSV is an infinitely more reliable data source when it comes to importing. You don't get problems with different file formats for different Excel versions, Excel having to be installed on the computer that will run the script, or quirks with automatic datatype recognition. A CSV can be read with the BCP command-line tool, the BULK INSERT command or with SSIS. Then use stored procedures to convert the data from a horizontal bulk of columns into a pure vertical format (see the sketch after this list).
2. Use SSIS to read the data directly from the Excel file(s). It is possible to make a package that loops over several Excel files. A drawback is that the column format and the sheet name of the Excel file have to be known beforehand, so a different template (with a separate loop) has to be made each time a new Excel format arrives. There are third-party SSIS components that claim to be more flexible, but I haven't tested them yet.
3. Write a Visual C# program or PowerShell script that grabs the Excel file, extracts the data and outputs it into your SQL table. Visual C# is a pretty easy language with powerful interfaces into Office and SQL Server. I don't know how big the learning curve is to get started, but once you do, it will be a pretty easy program to write. I have also heard good things about PowerShell.
4. Create an Excel macro that uses VB code to open other Excel files, loop through their data and write the results either to a predefined sheet or as CSV to disk. Once everything is in a standard format it will be easy to import the data using one of the above methods.
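To make the last step of route 1 a bit more concrete, here is a minimal T-SQL sketch of unpivoting a wide staging table into the vertical Values table you described; the staging table, its columns, the FieldName column in Fields and @DataFileId are all assumed names:
-- Assumes the wide CSV was bulk-loaded into dbo.Staging first.
DECLARE @DataFileId INT = 1;

INSERT INTO dbo.[Values] (AnimalId, FieldId, Value, DataFileId)
SELECT a.AnimalId,
       f.FieldId,
       u.Value,
       @DataFileId
FROM dbo.Staging AS s
JOIN dbo.Animals AS a
    ON a.AnimalName = s.AnimalName
CROSS APPLY (VALUES ('Breed', s.Breed),
                    ('Color', s.Color),
                    ('Age',   CAST(s.Age AS VARCHAR(10))))  -- cast numeric fields to text
            AS u(FieldName, Value)
JOIN dbo.Fields AS f
    ON f.FieldName = u.FieldName
WHERE u.Value IS NOT NULL;
Each new identifier then only needs a row in Fields rather than a new column.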
Since I have had headaches with 1) and 2) before, I would advise either 3) or 4). Because of my greater experience with VBA than Visual C# or PowerShell, I'd go for 4) if I were in a hurry. But I think 3) is the better investment for the long term.
(You could also get adventurous and use another scripting language, such as Python, as I once did because Python is cool; unfortunately, Python offers pretty slow and limited interfaces to SQL Server and Excel.)
Good luck!