push query results into array - sql

I have a bash shell script. I have a psql copy command that has captured some rows of data from a database. The data from the database are actual SQL statements that I will use in the bash script. I put the statements in a database because they are of varying length and I want to be able to dynamically call certain statements.
1) I'm unsure what delimiter to use in the copy statement. I can't use a comma or pipe because those appear in my data coming from the database. I have tried a couple of random characters that are not in my data, but copy has a fit and only wants a single ASCII character.
Also, to complicate things, I need to get query_name and query_string for each row.
This is what I currently have. I get all the data fine with the copy but now I just want to push the data into an array so that I will be able to loop over it later:
q="copy (select query_name,query_string from query where active=1)
to stdout delimiter ','"
statements=$(psql -d ${db_name} -c "${q}")
statements_split=(`echo ${statements//,/ }`)
echo ${statements_split[0]};

Looks to me like you actually want to build something like a dictionary (associative array) mapping query_name to query_string. bash isn't really the best choice for handling complex data structures. I'd suggest using Perl for this kind of task if that's an option.
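That said, if you do want to stay in bash (version 4 or later), one workable pattern is to let COPY's default text format do the delimiting for you: it separates columns with a tab and escapes any embedded tabs/newlines, so each row arrives on a single line. A rough, untested sketch, reusing the table and ${db_name} from the question:

#!/usr/bin/env bash
# Sketch only: needs bash 4+ for associative arrays; assumes ${db_name} is set.
declare -A queries

# COPY's text format is tab-delimited by default and escapes embedded
# tabs/newlines in query_string as \t and \n, so one row == one line.
q="copy (select query_name, query_string from query where active=1) to stdout"

while IFS=$'\t' read -r name sql; do
    # printf '%b' turns the \t / \n escapes back into real characters.
    queries["$name"]=$(printf '%b' "$sql")
done < <(psql -d "${db_name}" -c "${q}")

echo "${queries[some_query_name]}"   # look up a statement by name later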

How to trigger a stored package/procedure from SQLLDR control file

I have a sqlldr process (which is triggered from a Windows batch file) that loads around a million records into a staging table. Once the load is complete, I need to trigger SQL packages/procedures to transform the data. Can I invoke the packages from the same control file which I used to import the data? If so, what would be the syntax? If not, what is an alternate method to achieve this?
Short Answer: call the package/procedure that you need with the sqlplus command in your batch file, after your sqlldr command, like the next example.
sqlldr DB_User/DB_Password control=path_of_your_ctl_file.ctl log=path_to_save_log_file.log
echo exec package.procedure | sqlplus DB_User/DB_Password
Long Answer: First, sqlldr's purpose is to load a considerable amount of data in batches; you would normally use the sqlldr command to load data, not to transform it. That said, it doesn't mean you can't transform data while sqlldr reads and loads it (see the next example).
load data
infile 'path_of_source_data_file.csv'
badfile 'path_to_save_bad_file.bad'
discardfile 'path_to_save_discard_file.dsc'
append into table TABLE_NAME_TO_STORAGE_DATA
fields terminated by ';' optionally enclosed by '"' -- depends on the structure of your data file
TRAILING NULLCOLS
(
COL_1 "TRIM(:COL_1)",
COL_2 "COALESCE(:COL_2, 'UNDEFINED')",
ID_COL "SEQ_ID_COL.NEXTVAL"
)
Although you can normalize your data through the sqlldr command, you have to think about your system's architecture. If the normalization step is simple, like the example above, it is fine to put it in the ctl file; otherwise you should separate the normalization step and call it through the sqlplus command once the data load has finished.
Note: you should also think about security, because you are passing the database username and password inside the batch file. I recommend using PowerShell and Export-Clixml to encrypt the sensitive parameters.

PostgreSQL: Execute queries in loop - performance issues

I need to copy data from a file into a PostgreSQL database. For that purpose I parse the file in a bash loop and generate the corresponding insert queries. The trouble is that it takes a lot of time to run that loop.
1) What can I do to speed up that loop? Should I open some kind of connection before the loop and close it afterwards?
2) Should I use a temporary text file inside the loop to write the unique values to, and search it with a text utility, instead of writing them to the database and performing the search there?
Does whatever programming language you use commit after every insert? If so, the easiest thing you can do is commit after inserting all rows rather than after every row.
You might also be able to batch inserts, but using the PostgreSQL copy command is less work and also very fast.
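For example, assuming the file can be turned into a CSV (the database, table, column, and file names below are placeholders), the difference looks roughly like this:

#!/usr/bin/env bash
# Preferred: a single \copy call bulk-loads the whole file in one statement.
psql -d mydb -c "\copy mytable (col1, col2) from 'data.csv' with (format csv)"

# If you must generate INSERTs, at least send them in a single transaction
# over one connection (no quoting/escaping of values is handled here):
{
    echo "BEGIN;"
    while IFS=',' read -r col1 col2; do
        printf "INSERT INTO mytable (col1, col2) VALUES ('%s', '%s');\n" "$col1" "$col2"
    done < data.csv
    echo "COMMIT;"
} | psql -d mydb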
If you insist on using bash, you could split the file into chunks of a defined number of rows and then execute the load commands in parallel by putting & at the end of each line.
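For instance, with hypothetical file and table names, the idea is roughly:

#!/usr/bin/env bash
# Sketch: split the input into 250k-line chunks and load them in parallel.
split -l 250000 data.csv chunk_

for f in chunk_*; do
    psql -d mydb -c "\copy mytable from '$f' with (format csv)" &
done
wait   # block until every background load has finished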
I strongly suggest you try a different approach or programming language since, as Bill said, bash doesn't talk to Postgres directly. You can also use the pg_dump functionality if your file's source is another Postgres database.

Write results of SQL query to multiple files based on field value

My team uses a query that generates a text file over 500MB in size.
The query is executed from a Korn Shell script on an AIX server connecting to DB2.
The results are ordered and grouped by a specific field.
My question: Is it possible, using SQL, to write all rows with this specific field value to its own text file?
For example: All rows with field VENDORID = 1 would go to 1.txt, VENDORID = 2 to 2.txt, etc.
The field in question currently has 1000+ different values, so I would expect the same amount of text files.
Here is an alternative approach that gets each file directly from the database.
You can use the DB2 export command to generate each file. Something like this should be able to create one file:
db2 export to 1.txt of DEL select * from table where vendorid = 1
I would use a shell script or something like Perl to automate the execution of such a command for each value.
Depending on how fancy you want to get, you could just hardcode the range of vendorid values, or you could first get the list of distinct vendorids from the table and loop over that.
This method might scale a bit better than extracting one huge text file first.
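A rough sketch of such a driver script in ksh (the database name, credentials, and table name below are placeholders):

#!/bin/ksh
# Sketch: produce one export file per distinct VENDORID value.
db2 connect to MYDB user myuser using mypassword

# -x suppresses column headings, leaving just the values.
for id in $(db2 -x "select distinct vendorid from mytable order by vendorid"); do
    db2 "export to ${id}.txt of del select * from mytable where vendorid = ${id}"
done

db2 terminate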

Hive: create table and write it locally at the same time

Is it possible in hive to create a table and have it saved locally at the same time?
When I get data for my analyses, I usually create temporary tables to track down possible
mistakes in the queries/scripts. Some of these are just temporary tables, while others contain the data that I actually need for my analyses.
What I do usually is using hive -e "select * from db.table" > filename.tsv to get the data locally; however when the tables are big this can take quite some time.
I was wondering if there is some way in my script to create the table and save it locally at the same time. Probably this is not possible, but I thought it is worth asking.
Honestly, doing it the way you are is the better of the two options, but it is worth noting that you can perform a similar task from an .hql file for automation.
Using syntax like this:
INSERT OVERWRITE LOCAL DIRECTORY '/home/user/temp' select * from table;
You can run a query and store the result somewhere in the local directory (as long as there is enough space and you have the correct privileges).
A disadvantage to this is that with the hive -e ... > file approach you get a nicely delimited, newline-separated file, whereas this method will store the values using Hive's default field delimiter (Ctrl-A, '^A').
A work around is to do something like this:
INSERT OVERWRITE LOCAL DIRECTORY '/home/user/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
But this only works in Hive 0.11 or higher.
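If you do move this into an .hql file as mentioned above, it can then be run non-interactively from a shell script with hive -f (the file name here is just an example):

# Run the export script, then inspect the local output directory.
hive -f export_books.hql
ls /home/user/temp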

Unable to update the table of SQL Server with BCP utility

We have a database table that we pre-populate with data as part of our deployment procedure. Since one of the columns is binary (it's a binary serialized object) we use BCP to copy the data into the table.
So far this has worked very well, however, today we tried this technique on a Windows Server 2008 machine for the first time and noticed that not all of the columns were being updated. Out of the 31 rows that are normally inserted as part of this operation, only 2 rows actually had their binary columns populated correctly. The other 29 rows simply had null values for their binary column. This is the first situation where we've seen an issue like this and this is the same .dat file that we use for all of our deployments.
Has anyone else ever encountered this issue before or have any insight as to what the issue could be?
Thanks in advance,
Jeremy
My guess is that you're using -c or -w to dump as text, and it's choking on a particular combination of characters it doesn't like and subbing in a NULL. This can also happen in Native mode if there's no format file. Try the following and see if it helps. (Obviously, you'll need to add the server and login switches yourself.)
bcp MyDatabase.dbo.MyTable format nul -f MyTable.fmt -n
bcp MyDatabase.dbo.MyTable out MyTable.dat -f MyTable.fmt
This'll dump the table data as straight native binary, described column-by-column by the format file, so everything lines up exactly on the way back in. Then, to insert it back:
bcp MySecondData.dbo.MyTable in MyTable.dat -f MyTable.fmt -k -E -b 1000 -h "TABLOCK"
...which uses the same format file and data file, keeps NULLs (-k) and identity values (-E), and loads in batches of 1000 with a table lock to increase the speed a little. If you need more speed than that, you'll want to look at BULK INSERT, FirstRow/LastRow, and loading in parallel, but that's a bit beyond the scope of this question. :)