I have a collection of SQL queries that need to run in a specific order using Teradata. How can this be done?
I've considered writing an application in some other language (like Python or C++) to sequentially call each query, but am unsure how to get live data there from Teradata. I also want to keep the queries as separate SQL files (like it is currently).
Goal is to minimize the need for human interaction ie. I want to hit "Run" and let it take care of the rest.
BTEQ scripts are your Go-To solution.
Have each query, or at least, logical blocks of several statements, in single bteq script.
Then create a script that will call the the BTEQ with needed settings, i.e. TD logon command and have this script be called in a batch with parameters like this:
start /wait C:\Teradata\BTEQ.bat Script_1.txt
start /wait C:\Teradata\BTEQ.bat Script_2.txt
start /wait C:\Teradata\BTEQ.bat Script_3.txt
pause
Then you can create several batch files, split in logical blocks and have them executed at will, or scheduled.
Related
I need to run the same damn Oracle query daily (and export the output, which I can do from the sql file itself). I'd like to automate using Windows Task Scheduler but it only opens the script, and it doesn't run it.
Is this feasible or is there an easier way?
Your description isn't very descriptive (we don't know how exactly you did that), but - here's a suggestion:
create a .bat script (let's call it a.bat) which will call your .sql (let's call it a.sql) script. Only one line in it:
sqlplus scott/tiger#orcl #a.sql
a.sql is ... well, what it is, for example
prompt Number of rows in EMP table
select count(*) from emp;
Create a job in Task Scheduler which will start a program (a.bat).
I've just tried it (using that simple example I posted above) and got the result (saying that there are 14 rows in the table), so I hope you'll get it as well.
I have a Hive query in Hue with one input variable, a string (for example a date like '20160117').
I'd like to execute this Hive query in Hue and pass it multiple values for that single variable.
Is it possible? If yes, how would you guys do it?
Oozie runs Direct Acyclic Graphs (DAG). And Acyclic comes down to no loop, ever. But of course there are workarounds.
So, if you must run the same HQL script exactly N times with a different parameter value...
either copy/paste the Hive Action N times, in a chain, with a different param value (quick and dirty)
or build a Sub-Workflow with just the Hive action and call it N times, in a chain, with a different param value
On the other hand, if you must adapt dynamically the number and the value of executions, then you must work out the "loop" logic outside of Oozie proper...
for instance, start with a Shell action that creates an empty HQL file, then adds N queries in a loop, then uploads the file to HDFS; next, a Hive action that executes the HQL script as-is (quick and dirty, but not ideal for exception handling)
or develop a Java program that connects to HiveServer2 via JDBC, submits a PreparedStatement with 1 bind variable, and executes the statement N times in a loop with different values of the variable.
And maybe, someday, Hive will support some kind of procedural language similar to PL/SQL, T-SQL, PgSQL etc. and you will be able to pass a comma-separated list of values and process it inside of Hive.
I need to copy data from a file into a PostgreSQL database. For that purpose I parse that file using bash in a loop and generate the corresponding insert queries. The trouble is that it takes a lot of time in order to perform that loop.
1)What can I do to accelerate that loop? Should I open a kind of connection before the loop and close it after?
2)Should I use a temporary text file inside the loop in order to write there the unique values and search in it using the text utility instead of writing them to the database and perform a search there?
Does whatever programming language you use commit after every insert? If so, the easiest thing you can do is commit after inserting all rows rather than after every row.
You might also be able to batch inserts, but using the PostgreSQL copy command is less work and also very fast.
If you insist on using BASH you could split files by defined row numbers and then excecute paralel commands using & at the end of each line.
I strongly suggest you try a different approach or programing language since as Bill said, bash doesn't talk to postgres, you can also use the pg_dump funtionality if your file's source is another postgres database.
I would like to know if there is a way to quere my queries. I am doing some basic text matching in psql and each query (which is saved in a different script) takes about 6 hours to run. I was wondering if there is a way to queue my scripts?
For example;
my database is called; "data"
my scipts are called; cancer, heart, death
and I am doing the following;
data; \i cancer;
data; \i heart;
data; \i death;
But I have to come back every so often and check whether it is running or not etc which doesn't seem very efficient.
I am new to postgresql so appreciate any help.
this is the most easiest/fastest solution I can think of, but should work for your case ;)
When using psql from command line, you can start it with
-f filename
where filename is a SQL script. It will run the query and send the output to stdout. Also you can forward this to a file. Just put your queries into that SQL-file and you got your own queuing.
Assuming you might run Linux, you could use screen to have a simple way to leave your session open when logging of for the night.
The easiest solution was to create a separate sql file which runs through the commands sequentially.
I'm looking to execute a series of queries as part of a migration project. The scripts to be generated are produced from a tool which analyses the legacy database then produces a script to map each of the old entities to an appropriate new record. THe scripts run well for small entities but some have records in the hundreds of thousands which produce script files of around 80 MB.
What is the best way to run these scripts?
Is there some SQLCMD from the prompt which deals with larger scripts?
I could also break the scripts down into further smaller scripts but I don't want to have to execute hundreds of scripts to perform the migration.
If possible have the export tool modified to export a BULK INSERT compatible file.
Barring that, you can write a program that will parse the insert statements into something that BULK INSERT will accept.
BULK INSERT uses BCP format files which come in traditional (non-XML) or XML. Does it have to get a new identity and use it in a child and you can't get away with using SET IDENTITY INSERT ON because the database design has changed so much? If so, I think you might be better off using SSIS or similar and doing a Merge Join once the identities are assigned. You could also load the data into staging tables in SQL using SSIS or BCP and then use regular SQL (potentially within SSIS in a SQL task) with the OUTPUT INTO feature to capture the identities and use them in the children.
Just execute the script. We regularly run backup / restore scripts that are 100's Mb in size. It only takes 30 seconds or so.
If it is critical not to block your server for this amount to time, you'll have to really split it up a bit.
Also look into the -tab option of mysqldump with outputs the data using TO OUTFILE, which is more efficient and faster to load.
It sounds like this is generating a single INSERT for each row, which is really going to be pretty slow. If they are all wrapped in a transaction, too, that can be kind of slow (although the number of rows doesn't sound that big that it would cause a transaction to be nearly impossible - like if you were holding a multi-million row insert in a transaction).
You might be better off looking at ETL (DTS, SSIS, BCP or BULK INSERT FROM, or some other tool) to migrate the data instead of scripting each insert.
You could break up the script and execute it in parts (especially if currently it makes it all one big transaction), just automate the execution of the individual scripts using PowerShell or similar.
I've been looking into the "BULK INSERT" from file option but cannot see any examples of the file format. Can the file mix the row formats or does it have to always be consistent in a CSV fashion? The reason I ask is that I've got identities involved across various parent / child tables which is why inserts per row are currently being used.