python: multiple SQLs to run in parallel

My Python program is supposed to execute approximately 200 SQL statements. The count of 200 (inptLstCnt) is the number of tables received as input to the function.
Instead of running 200 SQL statements one by one, I want to use the same database connection instance, run 10 (or some other pool size) at a time, and iterate until inptLstCnt is reached. I am quite confused about whether to choose multiprocessing pools or threads.
Are there any code references for a similar approach, not necessarily involving SQL execution or a database connection?
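A minimal sketch of one common pattern, under stated assumptions. Each SQL statement mostly waits on the database server, so the work is I/O-bound, and a thread pool is usually a better fit than a process pool (processes mainly help CPU-bound Python code, and connection objects generally cannot be passed between processes). Most DB-API drivers also do not allow one connection to be used by several threads at once (check the driver's module-level `threadsafety` attribute), so instead of literally sharing a single connection, this sketch keeps a small pool of `pool_size` connections and lends each one to one query at a time. `run_in_parallel`, the `connect` factory and the `SELECT COUNT(*)` query are hypothetical placeholders.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(connect, tables, pool_size=10):
    """Run one query per table, at most `pool_size` at a time.

    `connect` is a hypothetical zero-argument callable that returns a new
    DB-API connection (e.g. a lambda around your driver's connect()).
    """
    conns = queue.Queue()

    def count_rows(table):
        conn = conns.get()  # borrow a connection; blocks if all are busy
        try:
            cur = conn.cursor()
            # Identifiers cannot be bound as parameters, so `table` must come
            # from a trusted list, never from user input.
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            return table, cur.fetchone()[0]
        finally:
            conns.put(conn)  # return the connection to the pool

    try:
        # Pre-open a fixed set of connections; each is used by one thread at a time.
        for _ in range(pool_size):
            conns.put(connect())
        # The executor runs at most `pool_size` queries at once and keeps
        # feeding it new tables until the whole input list is processed.
        with ThreadPoolExecutor(max_workers=pool_size) as executor:
            return dict(executor.map(count_rows, tables))
    finally:
        while not conns.empty():
            conns.get().close()
```

For example, with the standard-library `sqlite3` module as a stand-in driver: `run_in_parallel(lambda: sqlite3.connect(path, check_same_thread=False), table_names)`. There, `check_same_thread=False` is needed only because the connections are opened in the main thread and used in the workers; the queue guarantees each one is used by a single thread at a time. Raising `pool_size` also raises the concurrent load on the database server, so it is worth keeping modest.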

Related

Talend's tOracleInput does not read data

My colleague created a project in Talend to read data from an Oracle database.
I used his project, so I have his Job context with the Oracle DB connection parameters, and Talend connects successfully on that computer.
I've created a trivial job composed of two components: tOracleInput, which should read the data, and tLogRow, which should redirect the output to Talend's terminal.
The problem is that when I start the job, no data is output to the terminal, and instead of the rows-per-second count I only see the "Starting ..." status.
Could it be a connection issue, an incompatible Java version on my computer, or something else?

The "Starting..." status means that the query is being executed. A simple query against the database usually takes a few seconds. This is because Oracle can start returning rows before it has completed a full table scan. Queries with joins and filters can take advantage of this, but queries with GROUP BY / ORDER BY cannot.
On the other hand, if you're using a view, executing a complex query, or simply using DISTINCT, the query execution can take a few minutes. This is because the Oracle database builds the ResultSet on the database side before returning the records.
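A toy Python analogy (not Oracle's actual executor) of why some queries produce rows almost immediately while others sit in "Starting...": a filter can pass each matching row on as soon as it reads it, whereas a blocking operator such as the sort behind ORDER BY must consume its whole input before emitting anything. The generators below record how many source rows have been read at the moment the first output row appears; all function names are illustrative.

```python
def source(rows, counter):
    """Simulated table scan that records how many rows it has read."""
    for row in rows:
        counter["read"] += 1
        yield row


def filter_rows(rows, predicate):
    # Pipelined: each matching row is passed on as soon as it is read.
    for row in rows:
        if predicate(row):
            yield row


def order_by(rows, key):
    # Blocking: sorting needs every input row before the first output row.
    yield from sorted(rows, key=key)


def rows_read_before_first_output(pipeline, data):
    """Return (first output row, number of source rows read to produce it)."""
    counter = {"read": 0}
    first = next(pipeline(source(data, counter)))
    return first, counter["read"]


data = list(range(1, 1001))
print(rows_read_before_first_output(lambda r: filter_rows(r, lambda x: x % 2 == 0), data))  # (2, 2)
print(rows_read_before_first_output(lambda r: order_by(r, key=lambda x: -x), data))  # (1000, 1000)
```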

How to optimize the downloading of data to the server in SSIS

Good day.
I need to get records from an Oracle database into a database in SQL Server. The source is ODBC, and the extraction is done with a SQL command that uses all the indexes relevant to my requirement. The process runs fine; the problem is that it takes a long time and I need it to be quick. The process does not involve a Lookup, Merge, or Merge Join; it simply loads a table from Oracle into SQL Server under certain conditions.
Thank you for your help

Check what your limiting factor is. Generally there are 3 points to check:
1. The remote (source) server is slow. The source DB can run low on memory, read speed, or free CPU. Substitute your query with a straight SELECT statement with no WHERE clause or JOINs and see if your SSIS package runs faster.
2. The target DB. You may have indexes enabled, high write latency on the HDD, or not enough CPU. Run an INSERT into your target table and see how long it takes.
3. The problem may be in the middle: the transfer between the 2 servers. The network is usually the main bottleneck. If SSIS is not hosted on the same server as SQL Server, you have 2 network connections plus a possible hardware bottleneck on the dedicated SSIS machine.
Depending on the bottleneck there are different solutions.
If you have network capacity and the bottleneck is 1 CPU per query on Oracle, you can partition your data horizontally (IDs 1 to 100, 101 to 200, etc.), establish multiple connections to Oracle, and load the data in several streams. The number of streams should be 1 less than the number of CPUs on Oracle, SSIS, or SQL Server (whichever is smallest).
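A minimal sketch of the split described above, assuming a numeric, roughly evenly distributed ID column. `id_ranges` cuts the ID span into contiguous, non-overlapping ranges, `stream_count` applies the "one less than the smallest CPU count" rule, and `load_in_streams` runs a hypothetical `load_range(lo, hi)` callable once per range in parallel. In a real job, `load_range` would open its own source connection, run something like `SELECT ... WHERE id BETWEEN ? AND ?`, and write the rows to the target; here it is left as a parameter.

```python
from concurrent.futures import ThreadPoolExecutor


def stream_count(cpus_oracle, cpus_ssis, cpus_sql_server):
    """One less than the smallest CPU count, but never fewer than one stream."""
    return max(1, min(cpus_oracle, cpus_ssis, cpus_sql_server) - 1)


def id_ranges(min_id, max_id, streams):
    """Split [min_id, max_id] into up to `streams` contiguous inclusive ranges."""
    total = max_id - min_id + 1
    streams = max(1, min(streams, total))  # avoid empty ranges
    size, extra = divmod(total, streams)
    ranges, lo = [], min_id
    for i in range(streams):
        hi = lo + size - 1 + (1 if i < extra else 0)  # spread the remainder
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def load_in_streams(ranges, load_range):
    """Run the hypothetical load_range(lo, hi) for every range concurrently."""
    with ThreadPoolExecutor(max_workers=len(ranges)) as executor:
        futures = [executor.submit(load_range, lo, hi) for lo, hi in ranges]
        return [f.result() for f in futures]  # re-raises any stream's error


print(id_ranges(1, 200, 2))     # [(1, 100), (101, 200)]
print(stream_count(16, 8, 12))  # 7
```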

How can I execute an SQL query 100 times at the same time, in parallel?

Is it possible to execute a simple query against a DB 100 times at the same time, in parallel, via T-SQL in Management Studio?
(The idea is to see how it impacts the DB server's performance.)

You can execute your batch in a loop, like this:
select getdate()
go 100
It will not execute in parallel, but 100 times one after another. If this is not enough, you can run it at the same time from another session.

You can use sqlcmd in a batch file to spawn 100 parallel processes. Alternatively, you can use LINQPad with a simple C# script that likewise uses Tasks to run 100 queries concurrently.
If you want parallel stored procedure execution, you need to set up Service Broker and a queue whose activation allows up to 100 concurrent readers (MAX_QUEUE_READERS = 100). Even then, you won't get exactly 100 parallel activations at once, because Service Broker adds readers as the workload increases, so it won't happen instantaneously.
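For comparison, a Python take on the same idea (a sketch under assumptions, not the sqlcmd or LINQPad scripts mentioned above): each of `n` threads opens its own connection, so the server sees `n` separate sessions, and a `threading.Barrier` holds them until all are connected so the queries start together. `fire_concurrently` and the `connect` factory are hypothetical names. With a network driver that releases the GIL while waiting on the server, as common C-based drivers typically do, the threads give real concurrency on the database side.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fire_concurrently(connect, sql, n=100):
    """Run `sql` n times at once, each on its own connection; return latencies in seconds."""
    barrier = threading.Barrier(n)

    def one_call(_):
        try:
            conn = connect()  # one session per call, like one sqlcmd process each
        except Exception:
            barrier.abort()  # wake the other workers instead of deadlocking
            raise
        try:
            barrier.wait()  # start all queries at (nearly) the same moment
            start = time.perf_counter()
            cur = conn.cursor()
            cur.execute(sql)
            cur.fetchall()
            return time.perf_counter() - start
        finally:
            conn.close()

    # One thread per call, so all n queries can be in flight simultaneously.
    with ThreadPoolExecutor(max_workers=n) as executor:
        return list(executor.map(one_call, range(n)))
```

With pyodbc, for example, a call might look like `fire_concurrently(lambda: pyodbc.connect(conn_str), "SELECT GETDATE()", n=100)`, where `conn_str` is your own connection string.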

Reading SQL Server result sets while still executing procedure

I have a SQL Server stored procedure that returns multiple result sets to a .NET app. For performance reasons I don't want to wait for all of them to be returned, but work on a result set as soon as it is returned, so processing and retrieving other result sets happens in parallel.
Is it possible with .NET and SQL Server?

This is not possible. SQL Server cannot start a statement until the previous one finishes. A statement does not finish until it has produced the entire result set it will produce. The result set is a stream that the client must consume.
There are many ways to execute calls in parallel, by sending a separate request for each result you are interested in. But that requires you to code your app appropriately (using multiple connections and async calls), and it absolutely cannot be done by a stored procedure.
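The answer describes the .NET route; as a language-neutral illustration of the same idea, here is a Python sketch in which each result set the procedure used to return is requested by its own query on its own connection, and `concurrent.futures.as_completed` hands each result to the processing step as soon as that query finishes, while the others are still running. Splitting the procedure into separate queries is the key assumption; the `queries` mapping, the `connect` factory and the `process` callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_and_process(connect, queries, process):
    """Run each named query on its own connection; process results as they arrive."""

    def fetch(name, sql):
        conn = connect()  # separate connection per result set
        try:
            cur = conn.cursor()
            cur.execute(sql)
            return name, cur.fetchall()
        finally:
            conn.close()

    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as executor:
        futures = [executor.submit(fetch, name, sql) for name, sql in queries.items()]
        for future in as_completed(futures):
            name, rows = future.result()
            # Runs while the remaining queries are still in flight.
            results[name] = process(name, rows)
    return results
```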

Fastest way to insert in parallel to a single table

My company is cursed by a symbiotic partnership turned parasitic. To get our data from the parasite, we have to use a painfully slow ODBC connection. I did notice recently, though, that I can get more throughput by running queries in parallel (even on the same table).
There is a particularly large table that I want to extract data from and move it into our local table. Running queries in parallel I can get data faster, but I also imagine that this could cause issues with trying to write data from multiple queries into the same table at once.
What advice can you give me on how to best handle this situation so that I can take advantage of the increased speed of using queries in parallel?
EDIT: I've gotten some great feedback here, but I think I wasn't completely clear about the fact that I'm pulling data via a linked server (which uses the ODBC drivers). In other words, I can run normal INSERT statements, and I believe that would provide better performance than either SqlBulkCopy or BULK INSERT (actually, I don't believe BULK INSERT would even be an option).

Have you read "Load 1TB in less than 1 hour"?
Run as many load processes as you have available CPUs. If you have 32 CPUs, run 32 parallel loads. If you have 8 CPUs, run 8 parallel loads.
If you have control over the creation of your input files, make them of a size that is evenly divisible by the number of load threads you want to run in parallel. Also make sure all records belong to one partition if you want to use the switch partition strategy.
Use BULK INSERT instead of BCP if you are running the process on the SQL Server machine.
Use table partitioning to gain another 8-10%, but only if your input files are GUARANTEED to match your partitioning function, meaning that all records in one file must be in the same partition.
Use TABLOCK to avoid row-at-a-time locking.
Use ROWS_PER_BATCH = 2500, or something near this, if you are importing multiple streams into one table.
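A rough Python sketch of the "several streams into one table" shape from the list above, assuming the rows are already in memory. The input is split into `workers` near-equal chunks (one per CPU by default, per the first tip), and each worker sends its chunk in batches of 2,500 rows, echoing the rows-per-batch tip. `chunks`, `parallel_load` and the `insert_batch(rows)` callable are hypothetical. In practice `insert_batch` would run a bulk insert or `executemany` on that worker's own connection. TABLOCK, minimal logging and partition switching are server-side settings this sketch does not model.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunks(rows, parts):
    """Split `rows` into `parts` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(rows), parts)
    out, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        out.append(rows[start:end])
        start = end
    return out


def parallel_load(rows, insert_batch, workers=None, batch_size=2500):
    """Load `rows` through `workers` parallel streams, `batch_size` rows per call."""
    workers = workers or os.cpu_count() or 1  # one stream per CPU by default

    def stream(chunk):
        sent = 0
        for i in range(0, len(chunk), batch_size):
            batch = chunk[i:i + batch_size]
            insert_batch(batch)  # hypothetical: bulk insert on this worker's connection
            sent += len(batch)
        return sent

    with ThreadPoolExecutor(max_workers=workers) as executor:
        return sum(executor.map(stream, chunks(rows, workers)))
```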
For SQL Server 2008, there are certain circumstances where you can utilize minimal logging for a standard INSERT SELECT:
SQL Server 2008 enhances the methods that it can handle with minimal logging. It supports minimally logged regular INSERT SELECT statements. In addition, turning on trace flag 610 lets SQL Server 2008 support minimal logging against a nonempty B-tree for new key ranges that cause allocations of new pages.

If you're looking to do this in code (e.g. C#), there is SqlBulkCopy (in the System.Data.SqlClient namespace), and as this article suggests, it's possible to use it in parallel.
http://www.adathedev.co.uk/2011/01/sqlbulkcopy-to-sql-server-in-parallel.html

If by any chance you've upgraded to SQL Server 2014, SELECT ... INTO can run in parallel (the database compatibility level must be at least 110). See this:
http://msdn.microsoft.com/en-us/library/bb510411%28v=sql.120%29.aspx

We Keep Coding
