Write non-SQL dataset to SQL table in DataIku - dataiku

I dont seem to find a way to write the output from a previous step in the flow into a SQL table, using the SQL recipes. When I read the documentation, it seems both types of SQL action can only take as an input a SQL dataset? This cant be write, as you would imagine you would want to create datasets in the flow and then commit them to a database?
https://doc.dataiku.com/dss/latest/code_recipes/sql.html
In the docs above, it describes In\Out parameters as needing to be SQL.

Indeed, it doesn't seem possible with a SQL recipe which executes fully in the database.
That being said you can probably use a sync recipe to put your non-SQL dataset in your SQL db so that you can execute a SQL recipe.

Related

IBM SPSS How to import a Custom SQL Database Query

I am looking to see if the capability is there to have a custom SSMS sql query imported in SPSS (Statistical Package for the Social Sciences). I would want to build syntax that generates this query as my new dataset that I can then continue my scripted analysis. I see the basic query capability of one table from a Sql Server but I would like to create a query that joins to many tables. I anticipate the query to be a bit complex with many joins and perhaps data transformations.
Has anybody had experience or a solution to this situation?
I know I could take the query and make a table of it that SPSS can then connect to but my data changes daily and I would need a job in another application to refresh this table before my SPSS syntax would pull it and I would like to eliminate that first step by just having the query that grabs the data at the beginning of my syntax.
Ultimately I am looking to build out my SPSS syntax and schedule it in the Production Facility to run daily.

Resource Intensive Query

I am using ado.net entities, against a SQL azure database. One of the queries is taking an extremely long time, most likely pulling data it doesn't need. Is there a way to match up the query in C# with the query execution in Azure.
Please enable Query Store on SQL Azure to identify the T-SQL equivalent of the LINQ query. Use this article for more details.
Below command helps you enabled query store
alter database current set query_Store on
Hope this helps.

Stored Procedure vs direct SQL command in SSIS data flow source

I'm providing maintenance support for some SSIS packages. The packages have some data flow sources with complex embedded SQL scripts that need to be modified from time to time. I'm thinking about moving those SQL scripts into stored procedures and call them from SSIS, so that they are easier to modify, test, and deploy. I'm just wondering if there is any negative impact for the new approach. Can anyone give me a hint?
Yes there are issues with using stored procs as data sources (not in using them in Execute SQL tasks though in the control flow)
You might want to read this:
http://www.jasonstrate.com/2011/01/31-days-of-ssis-no-more-procedures-2031/
Basically the problem is that SSIS cannot always figure out the result set and thus the columns from a stored proc. I personally have run into this if you write a stored proc that uses a temp table.
I don't know that I would go as far as the author of the article and not use procs at all, but be careful that you are not trying to do too much with them and if you have to do something complicated, do it in an execute sql task before the dataflow.
I can honestly see nothing but improvements. Stored procedures will offer better security, the possibility for better performance due to cached execution plans, and easier maintenance, like you pointed out.
Refactor away!
You will not face issues using only simple stored procedures as data source. If procedure is using temp tables and CTE - there is no guarantee you will not face issues. Even when you can preview results in design time - you may get errors in a run time.
My experience has been that trying to get a sproc to function as a data source is just not worth the headache. Maybe some simple sprocs are fine, and in some cases TVFs will work well instead, but if you need to do some complex operations there's no alternative to a sproc.
The best workaround I've found is to create an output table for each sproc you need to use in SSIS.
Modify the sproc to truncate the new output table at start, and to write its output to this instead of (or in addition to) ending with a SELECT statement.
Call the sproc with an Exec SQL task before your data flow.
Have your data flow read from the output table - a much simpler task.
If you want to save space, truncate the output table again with another Exec SQL. I prefer to leave it, as it lets me examine the data later and lets me rerun the output data flow if it fails without calling the sproc again.
This is certainly less elegant than reading directly from a sproc's output, but it works. FWIW, this pattern follows the philosophy (obligate in Oracle) that a sproc should not try to be a parameterized view.
Of course, all this assumes that you have privs to adjust the sproc in question. If necessary, you could write a new wrapper sproc which truncates the output table, then calls the old sproc and redirects its output to the new table.

Move Data from Oracle to SQL Server

I would like to copy parts of an Oracle DB to a SQL Server DB. I need to move the data because the Oracle box is being decommissioned. I only need the data for reference purposes so don't need indexes or stored procedures or contstaints, etc. All I need is the data.
I have a link to the Oracle DB in SQL Server. I have tested the following query, which seemed to work just fine:
select
*
into
NewTableName
from
linkedserver.OracleTable
I was wondering if there are any potential issues with using this approach?
Using SSIS (sql integration services) may be a good alternative especially if your table names are the same on both servers. Use the import wizard via and it should create the destination tables for you and let you edit any mappings.
The only issue I see with that is you will need to execute that of course for each and every table you need. Glad you are decommissioning the oracle server :-). Otherwise if you are not concerned with indexes or any of the existing sprocs I don't see any issue in what you are doing.
The "select " approach could be very slow if tables are large. Consider writing pro*C in that case or use Fastreader http://www.wisdomforce.com/products-FastReader.html
A faster and easier approach might be to use the Data Transformation Services, depending on the number of objects you're trying to copy over.

Hibernate complex query

I am trying to execute a query against a MySQL database.
The query is fairly complex it has 5 inner joins, including 1 join to itself and
it returns 3 pieces of information from 2 different tables.
We are using hibernate and till now I have used it for simple queries only.
I have written the sql query and tested it too. I am wondering how to implement this using
hibernate, can I execute plain sql statements with hibernate? If so what do I need, a separate hbm.xml?
If I use hibernate and execute the plain sql query can I still utilize caching later on?
Yes, you can execute plain SQL queries with Hibernate.
No, you don't need a separate hbm.xml mapping file (unless you WANT to separate sql queries from the rest, in which case you can do so). You can map your named SQL query the same way you do with named HQL queries.
Whether you will be able to "utilize caching" depends on what exactly you understand by "caching" and how you're going to map your SQL query; it's impossible to answer without knowing more details.
All that said, you may not need to resort to SQL query; HQL is quite powerful and it may very well be possible (assuming appropriate mappings exist) to write your query as HQL. Can you post relevant mappings / schemas and your SQL query?
I strongly recommend criteria queries over HQL queries. They are much closer to your program code without sacrificing any expression power. They DO however depend on relations to be explicitly mapped, otherwise they get quite complicated.
To speed up development, set property hibernate.show_sql=true, and play with the system in the debugger, using the "reload modified class" and "drop stack frame" features of the IDE+jvm until the SQL emitted looks like the one you've posted.