Talend's tOracleInput does not read data - sql

My colleague created a project in Talend to read data from Oracle database.
I used his project, so I have his job context with the connection parameters for the Oracle DB, and Talend connects successfully on that computer.
I've created a trivial job which is composed of two components: tOracleInput which should be reading data and tLogRow which should be redirecting output to Talend's terminal.
The problem is that when I start the job, no data is output to the terminal, and instead of the number of rows output per second I just see a Starting... status.
Could it be a connection issue, an incompatible Java version on my computer, or something else?

The Starting... status means that the query is still being executed. A simple query against the database usually takes only a few seconds, because Oracle starts returning data before it has completed a full table scan. You still get this behavior with joins and filters, but not with GROUP BY / ORDER BY.
On the other hand, if you're querying a view, executing a complex query, or simply using DISTINCT, execution can take several minutes, because Oracle has to build the complete result set on the database side before returning any records.
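For illustration only (the orders table and its columns below are hypothetical, not from the original job), the difference looks roughly like this:
-- A filtered query: Oracle can usually start streaming rows
-- as soon as the first matches are found.
SELECT o.order_id, o.order_amount
FROM orders o
WHERE o.customer_id = 42;

-- An aggregate with ORDER BY (or a DISTINCT): the full result set has to be
-- built on the database side before the first row comes back.
SELECT o.customer_id, SUM(o.order_amount) AS total_amount
FROM orders o
GROUP BY o.customer_id
ORDER BY total_amount DESC;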

Related

The transaction log for database 'tempdb' is full due to 'ACTIVE_TRANSACTION'

I'm running the same query in two different windows on the same server.
The only difference is that the query throwing the above error has an index on the temporary tables.
The query without an index on the temporary tables works fine. Please explain how an index could be the reason for this error.
This depends on your query. SQL Server has to maintain indexes during data changes, which can lead you into different wait events.
Try this: on your two SQL Server instances, check what exactly is happening to your running session during query execution.
You can do this by monitoring wait events, creating a monitoring session for a single SPID.
This is my complete procedure for doing that: http://zaboilab.com/sql-server-toolbox/monitoring-wait-events-of-a-single-session-or-query
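The linked article describes the full procedure; as a rough, hedged alternative, the built-in DMVs can already show what a single session is waiting on (replace the session_id of 53 below with the SPID you are watching):
-- Current request and wait information for one session
SELECT r.session_id,
       r.status,
       r.wait_type,
       r.wait_time AS wait_time_ms,
       r.last_wait_type,
       r.cpu_time,
       r.logical_reads
FROM sys.dm_exec_requests AS r
WHERE r.session_id = 53;

-- Tasks currently waiting for that session, including any blocking session
SELECT wt.session_id, wt.wait_type, wt.wait_duration_ms, wt.blocking_session_id
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.session_id = 53;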

"Select data source" popping-up when executing a SQL writed query

I am trying to create a fairly complex database in MS Access 2013, so I wanted to write it directly in SQL. The script has no errors, as other DBMSs can fully build the database from it (for example, phpMyAdmin imports it with no difficulty).
This tutorial shows how to write a SQL query in order to build tables. I thought this approach matched my goal well, as I could copy-paste my script into the query and run it to create the whole thing.
But when I try to open/double-click the query, a pop-up appears saying "Select data source", waiting for me to select an ODBC source, either from a file or a host, before continuing and executing the query.
I tried other types of queries (creating only one table at a time, trying a blank file, or even SELECT * FROM *), but this message keeps showing up and I really don't know how to deal with it, as I don't want to connect to anything but the local database.
Does anyone have a hint about what to do in this case?
Or, even better, how could Access import my SQL script in order to create the database?
You should configure the database connection in the ODBC data source and check whether the connection can be established. Once the connection is established, you can run the query to fetch data or create tables as per your requirement.

SSAS Multidimensional - Table Value Function as a Query for Partition

@GregGalloway was able to answer the question I should have asked. I am adding a more concise question here, while maintaining the original lengthy text.
How do I use a table-valued function as the query for a partition, when the function is in a separate database from my fact and referenced dimensions?
Overview: I am building an SSAS multidimensional cube on top of a single fact table in our application's data warehouse, and I want to use the result set from a table-valued function as my fact table's partition query. We are using SQL Server (and SSAS) 2014.
Condition: For each environment (Dev, Tst, Prd) there are 2 separate databases on the same server, one for the application data warehouse [DW_App], the other for custom objects [DW_Custom]. I cannot create any objects in [DW_App], but I have a lot of freedom in [DW_Custom].
Background info: I have not been able to find much information on using a TVF and partitions in this way. My thinking is that it will help streamline future development by giving me a single place to update the SQL if/when I modify the fact table.
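As a hedged sketch of what such a function in [DW_Custom] might look like (the fact table and column names below are hypothetical, not taken from the original post):
CREATE FUNCTION [dbo].[CubePartition]
(
    @StartDate date,
    @EndDate   date
)
RETURNS TABLE
AS
RETURN
    -- Hypothetical fact table living in the application data warehouse
    SELECT f.FactKey, f.DateKey, f.CustomerKey, f.Amount
    FROM [DW_App].[dbo].[FactSales] AS f
    WHERE f.RecordDate >= @StartDate
      AND f.RecordDate <  @EndDate;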
So in testing out my crazy idea of using a TVF as the query for my partitions, I have run into a bit of a conundrum. I am able to use my TVF when I explicitly state the database in my FROM clause:
SELECT * FROM [DW_Custom].[dbo].[CubePartition](@StartDate, @EndDate)
However, that will not work, because the cube will be deployed in multiple environments before production, and it needs to point to a different DB in each. So I tried adding a new data source, setting my partition query to point to the new data source, and then removing the database name, i.e.:
SELECT * FROM [dbo].[CubePartition](@StartDate, @EndDate)
I get an error that
The SQL syntax is not valid. The relational database returned the following error message: Deferred prepare could not be completed. Invalid object name 'dbo.CubePartition'
If I click through this error and the subsequent warnings that the cube will not be able to process if I continue, I am able to build and deploy the cube. However, I cannot process it, because I get an error that one of my dimensions does not exist.
Looking into the query that was generated, it is clear that it queries my dimensions as well as the fact table, and those do not exist inside [DW_Custom], which explains that error perfectly well.
So I guess 2 questions:
Is it possible to query another DB (on the same server) from inside an SSAS partition query?
If not, is there any way I can use a variable as the database name in the query, and update that variable based on the project configuration (Dev, Tst, Prd)?
Bonus question: Is the reason I cannot find much about doing it this way that it is an obviously bad idea I am overlooking, and if so, why?
How about creating a second SSAS Data Source pointing to the DW_Custom database (or whatever it's called in the particular environment you're deploying to)? Then when you deploy from Dev to Prod, you need only change that connection string. When you create your partitions, then specify the DW_Custom data source and then specify the query without database name:
SELECT * FROM [dbo].[CubePartition](@StartDate, @EndDate)
As long as the query plan for that table-valued function is efficient compared to a plain SELECT statement, then I don't see a problem with that.

Is it possible for SQL Server to report an incorrect number of rows affected?

I have a utility which:
grabs SQL commands from a flat text file
executes them sequentially against SQL Server located on the same machine
and reports an error if an UPDATE command affects ZERO ROWS (there should never be an update command in this file that doesn't affect a record, hence it being recorded as an error)
The utility also logs any failed commands.
Yet the final data in the database seems to be wrong/stale, even though my utility is reporting no failed updates and no failed commands.
I know the first and most obvious culprit is some kind of logic or runtime error in my programming of the utility itself, but I just need to know if it's THEORETICALLY possible for SQL Server to report that at least one row was affected and yet not apply the change.
If it helps, the utility always seems to correctly execute the same number of commands, and the final stale/wrong data is always the same; i.e. it seems to correctly execute a certain number of commands that are successfully applied to the database, and then starts failing.
Thanks.
EDIT:
I should also note that this utility exhibits this behavior across 4 different production servers, each with its own dedicated local database server, and that these are beefy machines with 8-16 GB of RAM each, managed by a professional sysadmin.
Based on what you say...
It is possible for the "xx rows affected" count to be misleading if you have a trigger firing; you may be reading the count from the trigger. If so, add SET NOCOUNT ON to the trigger.
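A minimal, hypothetical sketch of that (the table and trigger names are illustrative only):
CREATE TRIGGER trg_MyTable_AfterUpdate
ON MyTable
AFTER UPDATE
AS
BEGIN
    -- Prevents the trigger's own statements from emitting extra "n row(s) affected" counts
    SET NOCOUNT ON;
    -- trigger logic here, e.g. writing an audit row
END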
Alternatively, the data may already be the same, so you are actually doing a dummy update with the same values. Add a WHERE clause that tests for differences. For example, the following update still reports a rows-affected count even though the transaction is rolled back:
BEGIN TRANSACTION
UPDATE MyTable
SET Message = ''
WHERE ID = 2
ROLLBACK TRANSACTION
Messages:
(1 row(s) affected)
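A hedged sketch of the WHERE-clause difference test mentioned above (the @NewMessage variable and the NULL handling are illustrative simplifications):
DECLARE @NewMessage varchar(100) = 'new value';

UPDATE MyTable
SET Message = @NewMessage
WHERE ID = 2
  AND (Message <> @NewMessage OR Message IS NULL);
-- Reports "(0 row(s) affected)" when the stored value is already equal to @NewMessage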

Retrieving billions of rows from remote server?

I am trying to retrieve around 200 billion rows from a remote SQL Server. To optimize this, I have limited my query to use only an indexed column as a filter and am selecting only a subset of columns to make the query look like this:
SELECT ColA, ColB, ColC FROM <Database> WHERE RecordDate BETWEEN '' AND ''
But it looks like unless I limit my query to a time window of a few hours, the query fails in all cases with the following error:
OLE DB provider "SQLNCLI10" for linked server "<>" returned message "Query timeout expired".
Msg 7399, Level 16, State 1, Server M<, Line 1
The OLE DB provider "SQLNCLI10" for linked server "<>" reported an error. Execution terminated by the provider because a resource limit was reached.
Msg 7421, Level 16, State 2, Server <>, Line 1
Cannot fetch the rowset from OLE DB provider "SQLNCLI10" for linked server "<>".
The timeout is probably an issue because of the time it takes to execute the query plan. As I do not have control over the server, I was wondering if there is a good way of retrieving this data beyond the simple SELECT I am using. Are there any SQL Server specific tricks that I can use? Perhaps tell the remote server to paginate the data instead of issuing multiple queries or something else? Any suggestions on how I could improve this?
This is more of the kind of job SSIS is suited for. Even a simple flow like ReadFromOleDbSource->WriteToOleDbSource would handle this, creating the necessary batching for you.
Why read 200 Billion rows all at once?
You should page them, reading say a few thousand rows at a time.
Even if you do genuinely need to read all 200 Billion rows you should still consider using paging to break up the read into shorter queries - that way if a failure happens you just continue reading where you left off.
See efficient way to implement paging for at least one method of implementing paging using ROW_NUMBER
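As a rough sketch of ROW_NUMBER-based batching over the query from the question (the extra key column used to make the ordering deterministic, and the batch parameters, are assumptions):
;WITH NumberedRows AS
(
    SELECT ColA, ColB, ColC,
           ROW_NUMBER() OVER (ORDER BY RecordDate, SomeUniqueKey) AS RowNum
    FROM <Database>
    WHERE RecordDate BETWEEN @StartDate AND @EndDate
)
SELECT ColA, ColB, ColC
FROM NumberedRows
WHERE RowNum BETWEEN @BatchStart AND @BatchStart + @BatchSize - 1;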
If you are doing data analysis then I suspect you are either using the wrong storage (SQL Server isn't really designed for processing of large data sets), or you need to alter your queries so that the analysis is done on the Server using SQL.
Update: I think the last paragraph was somewhat misinterpreted.
Storage in SQL Server is primarily designed for online transaction processing (OLTP) - efficient querying of massive datasets in massively concurrent environments (for example reading / updating a single customer record in a database of billions, at the same time that thousands of other users are doing the same for other records). Typically the goal is to minimise the amount of data read, reducing the amount of IO needed and also reducing contention.
The analysis you are talking about is almost the exact opposite of this - a single client actively trying to read pretty much all records in order to perform some statistical analysis.
Yes SQL Server will manage this, but you have to bear in mind that it is optimised for a completely different scenario. For example data is read from disk a page (8 KB) at a time, despite the fact that your statistical processing is probably only based on 2 or 3 columns. Depending on row density and column width you may only be using a tiny fraction of the data stored on an 8 KB page - most of the data that SQL Server had to read and allocate memory for wasn't even used. (Remember that SQL Server also had to lock that page to prevent other users from messing with the data while it was being read).
If you are serious about processing / analysis of massive datasets then there are storage formats that are optimised for exactly this sort of thing - SQL Server also has an add on service called Microsoft Analysis Services that adds additional online analytical processing (OLAP) and data mining capabilities, using storage modes more suited to this sort of processing.
Personally I would use a data extraction tool such as BCP to get the data to a local file before trying to manipulate it if I was trying to pull that much data at once.
http://msdn.microsoft.com/en-us/library/ms162802.aspx
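A hedged example of a BCP queryout extract (the server, database, table, and file names here are placeholders, not from the question):
bcp "SELECT ColA, ColB, ColC FROM MyDatabase.dbo.MyBigTable WHERE RecordDate >= '20130101' AND RecordDate < '20130108'" queryout "C:\extract\week_01.dat" -S RemoteServerName -T -c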
This isn't a SQL Server-specific answer, but even when the RDBMS supports server-side cursors, it's considered poor form to use them. Doing so means that you are consuming resources on the server while it waits for you to request more data.
Instead you should reformulate your query usage so that the server can transmit the entire result set as soon as it can, and then completely forget about you and your query to make way for the next one. When the result set is too large for you to process all in one go, you should keep track of the last row returned by the current batch so that you can fetch another batch starting at that position.
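A hedged sketch of that keep-track-of-the-last-row approach (the batch size and the assumption that RecordDate is suitable for resuming on are mine, not the poster's):
-- Fetch the next batch, starting just after the last RecordDate already retrieved
SELECT TOP (100000) ColA, ColB, ColC, RecordDate
FROM <Database>
WHERE RecordDate > @LastRecordDateRetrieved
ORDER BY RecordDate;
-- After processing the batch, set @LastRecordDateRetrieved to the MAX(RecordDate)
-- from this batch and repeat until no rows come back.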
Odds are the remote server has the "Remote Query Timeout" set. How long does it take for the query to fail?
I just ran into the same problem; I also got the message 10:01 after starting the query.
Check this link. There's a remote query timeout setting under Connections that is set to 600 seconds by default, and you need to change it to zero (unlimited) or another value you think is right.
Try changing the remote server's connection timeout property.
To do that, go to SSMS, connect to the server, right-click the server's name in Object Explorer, then select Properties -> Connections and change the value in the "Remote query timeout (in seconds, 0 = no timeout)" text box.
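Equivalently, this can be done in T-SQL (a sketch, assuming you have permission to change server-level configuration on the remote server):
-- 0 = no timeout; the default is 600 seconds
EXEC sp_configure 'remote query timeout', 0;
RECONFIGURE;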