I have multiple database connections in a Kettle (Pentaho Data Integration) transformation.
When trying to connect to the database, Kettle crashes.
The crash happens on a computer with 8 GB of memory running version 8.3. On another computer with the same version of Java and 4 GB of memory, it works normally. I have already cleared the Kettle cache.
I am working with Pentaho products: Pentaho Data Integration (PDI) and Pentaho Server.
From PDI, I can create a connection to the Pentaho Repository running on a Pentaho Server. I have already created some jobs and transformations and stored them in the Repository, and they all execute well.
However, when I configure the input/output sources of jobs/transformations, I can only use data files from my local machine. Is there any way to store data files in the Repository and configure the jobs/transformations to read data from them?
Thank you so much.
P.S.: I am working with PDI v9.3 CE and Pentaho Server v9.3 CE.
This is one of the solutions I found.
For each transformation/job stored in the Pentaho Repository, there are 3 run configurations: Pentaho local, Pentaho server, and Pentaho slave.
Pentaho local: executes the trans/job on the machine being used. That means the trans/job can only read and write data in local storage.
Pentaho server: executes the trans/job on the machine hosting Pentaho Server (and the Pentaho Repository). With this run configuration, the trans/job can read and write data in the server's storage.
Pentaho slave: executes the trans/job using the slave server's resources.
So, in my case, the solution is to switch the run configuration of the trans/job to the Pentaho server configuration.
I am trying to connect to QuickBooks (QODBC) application data through Pentaho (Kettle) for an ETL process. How can I connect to the QuickBooks (QODBC) application data through Pentaho?
Pentaho's ODBC connections use the JDBC-ODBC bridge that is bundled with Java.
The JDBC-ODBC bridge driver was removed in Java 8, preventing ODBC drivers from being usable in newer versions of Pentaho or other Java-based applications.
Generally, in such a scenario, we suggest using MS Access (third-party application --> MS Access --> linked tables --> QRemote --> QODBC --> QuickBooks data), but in this case accessing MS Access also requires ODBC. A JDBC driver called UCanAccess is available as an alternative.
The limitation of UCanAccess is that it cannot query linked tables and can only connect to MS Access base tables.
I would suggest getting in touch with Pentaho to see if you can get a version that allows a JDBC-ODBC connection (Generic ODBC) via DSN.
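For reference, if you still want to try UCanAccess against base tables, the values for a generic JDBC connection would look roughly like the sketch below. The file path is hypothetical, and this assumes the UCanAccess jars have already been made available to Pentaho; the helper only builds the two settings, it does not open a connection.

```python
def ucanaccess_settings(access_file):
    """Return the custom URL and driver class for a generic JDBC
    connection that uses the UCanAccess driver."""
    # UCanAccess URLs use forward slashes, also on Windows.
    path = access_file.replace("\\", "/")
    return {
        "url": "jdbc:ucanaccess://" + path,
        "driver_class": "net.ucanaccess.jdbc.UcanaccessDriver",
    }


# Hypothetical Access file, for illustration only.
settings = ucanaccess_settings(r"C:\data\qb_link.accdb")
print(settings["url"])
print(settings["driver_class"])
```

As noted above, linked tables in that Access file will not be queryable this way.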
I just moved a big SQL Server database (about 25 GB in database file size and 20 GB in log size) from one computer to another. Suddenly, a query that returns in 1 second on the old machine takes more than 1 minute on the newly built machine, which is much more powerful.
The old machine is a dual-core Intel i3 with 4 GB of RAM. The new machine is a quad-core Intel i7 with 16 GB of RAM.
I checked that the indexes are exactly the same.
What could be the reason?
Edits:
Haven't updated the DB statistics yet. Will do that.
Haven't defragmented the indexes. Will do that as well.
OS: The old machine runs Windows Server 2008. The new one runs Windows Server 2012.
Hard drive: SSD RAID 1. Local physical drive, partitioned into two logical drives, one for DB storage and the other for log storage.
The new machine is running on full performance settings. It's a single machine, with nothing balanced to other machines.
It's dedicated to this DB task; nothing else is running on the machine.
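The two pending maintenance steps from the edits above (updating statistics and defragmenting indexes) could be scripted roughly as follows. This is only a sketch: the table names are hypothetical, and the helper merely prints T-SQL to paste into a query window; it does not connect to the server.

```python
def quote_ident(name):
    """Quote a SQL Server identifier with brackets, escaping any ']'."""
    return "[" + name.replace("]", "]]") + "]"


def maintenance_statements(tables, rebuild=False):
    """Return T-SQL that defragments all indexes and refreshes statistics
    for each (schema, table) pair in `tables`."""
    # REORGANIZE is the lighter option; REBUILD recreates the indexes.
    action = "REBUILD" if rebuild else "REORGANIZE"
    statements = []
    for schema, table in tables:
        target = quote_ident(schema) + "." + quote_ident(table)
        statements.append("ALTER INDEX ALL ON " + target + " " + action + ";")
        statements.append("UPDATE STATISTICS " + target + " WITH FULLSCAN;")
    return statements


# Hypothetical table names, for illustration only.
for statement in maintenance_statements([("dbo", "Orders"), ("dbo", "OrderLines")]):
    print(statement)
```

Comparing the query's execution plan before and after running these would show whether stale statistics were the cause.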
It could be a variety of reasons. Is that a local hard drive or a networked hard drive?
The newer hard disk is slow.
Ensure that the DB file and transaction log are defragmented. You would need to stop SQL Server and then perform the defrag. You can use something like Contig from Microsoft (http://technet.microsoft.com/en-in/sysinternals/bb897428.aspx).
Is the newer hard disk's filesystem encrypted?
Check for antivirus software. If real-time filesystem checking is enabled, some antivirus brands will slow disk access down by a significant factor.
The most probable reason would be 2 or 4 from above.
As general advice, for better performance, store the DB file and the log files on separate hard disks (not just different partitions).
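To get a rough idea of whether the new disk itself is the slow part, you could run a crude write test like the sketch below on both machines and compare the numbers. It is not a proper benchmark (no read test, no random I/O, and real-time antivirus scanning or encryption will show up in the result), and the function name and sizes are just assumptions for illustration.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=64, block_kb=64):
    """Write size_mb of random data to a temporary file in `directory`,
    force it to disk, and return the observed throughput in MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure the data really reaches the disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the temporary file
    return size_mb / elapsed


# Point this at a folder on the drive that holds the DB or log files.
print(round(write_throughput_mb_s(tempfile.gettempdir(), size_mb=16), 1), "MB/s")
```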
I wanted to broach the issue of SQL Server's Hadoop distribution called HDInsight.
Given that there is a connection provided to Hadoop, does anyone have experience with HDInsight, and in particular a comparison between the Hadoop / SQL Server connector and HDInsight / SQL Server, from a real-life DTP scenario or a personal one-node installation?
http://sqlmag.com/blog/use-ssis-etl-hadoop
http://www.microsoft.com/en-us/download/details.aspx?id=27584
http://www.microsoft.com/en-us/sqlserver/solutions-technologies/business-intelligence/big-data.aspx
HDInsight is the distribution of Hadoop that Microsoft maintains for use in Azure. You could roughly compare this to Amazon Elastic MapReduce. They both serve the purpose of being a hosted Hadoop service that has almost no management overhead.
The Hortonworks Data Platform for Windows contains the open source changes that Hortonworks and Microsoft have collaborated on to make Hadoop run well on Windows. HDP isn't HDInsight.
In short - you don't need to use HDInsight if you want to run Hadoop in a Windows environment.
While I can't speak directly to using HDInsight and moving data back and forth between it and SQL Server, I've implemented a data processing solution using SQL Server, Hadoop, and Elastic MapReduce. Barring some data quality issues and BULK INSERT weirdness, the process was painless.
Finally, you ask "do we really want to run Hadoop size datasets on Windows servers?" - Windows performs well and has solid tooling around it. I've been somewhat skeptical about running Hadoop and other Java platform software on Windows because of legacy Java I/O issues and a lack of community support, not because of any performance issues.
The largest issue that Windows companies will find when moving to Hadoop is that there will be limited support in community forums and channels when the problem becomes a Hadoop + Windows issue. It's very easy for people to throw their hands up and say "Nope, not helping out, don't have Windows." With time and adoption, this problem goes away. Besides, nothing says you have to finish on the same platform you start with. You could easily deploy with HDP on Windows and move to HDP on Linux at a later date.
I have put together some SQL Server and Hadoop basics for DBAs that should be helpful.
On my current project, SSAS is running on a standalone server, and I'd like to know the hardware specs (CPU, memory, etc.) and OS version.
The catch is that I don't have access to the OS (or even remote access to perfmon or eventvwr), and the DBAs have so far ignored my requests. In the meantime, I'm wondering if there's an XMLA command I can run, or a SQL query against one of the DMVs, that will provide this information.
Also, I have admin rights to the SSAS instance and can run Profiler traces against it, so if there's another way, I'm all ears!
You can't find out hardware or operating system specifications through SSAS.