I am using Apache Hive to create and execute queries, but before a query is executed I need to report the structure of its result set. The queries may involve joins and projections, so parsing the query text would be quite difficult. The solution we are currently working on parses the output of the EXPLAIN command, but that is quite complex itself.
My question is whether there is some simpler way, by setting some Hive properties or query parameters, to run the query so that it selects no data (no map/reduce tasks are started) but creates a table whose schema I can then read from the metastore?
Unfortunately there is no simpler way than using the EXPLAIN or DESCRIBE commands to get the query schema and table schema.
While this doesn't happen before the query returns, if you enable
set hive.cli.print.header=true;
then the schema is printed right before the results.
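One workaround worth noting: in Hive, CREATE VIEW is a metastore-only operation, so you can wrap the query in a view without launching any map/reduce jobs and then read the result schema with DESCRIBE. A minimal sketch (the view, table, and column names are placeholders):

```sql
-- Creating a view only writes metadata; no map/reduce job is started.
CREATE VIEW tmp_schema_probe AS
SELECT a.id, b.total
FROM table_a a JOIN table_b b ON a.id = b.a_id;

-- The view's columns are exactly the query's result schema,
-- served from the metastore.
DESCRIBE tmp_schema_probe;

DROP VIEW tmp_schema_probe;
```

Since the view lives in the metastore, you can also read its columns programmatically through the metastore API instead of DESCRIBE.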
I don't seem to find a way to write the output from a previous step in the flow into a SQL table using the SQL recipes. When I read the documentation, it seems both types of SQL recipe can only take a SQL dataset as input? This can't be right, as you would imagine you would want to create datasets in the flow and then commit them to a database?
https://doc.dataiku.com/dss/latest/code_recipes/sql.html
In the docs above, the In/Out parameters are described as needing to be SQL.
Indeed, it doesn't seem possible with a SQL recipe which executes fully in the database.
That being said, you can use a Sync recipe to copy your non-SQL dataset into your SQL database first, so that you can then run a SQL recipe on it.
I am looking to see whether a custom SQL Server (SSMS) query can be imported into SPSS (Statistical Package for the Social Sciences). I would want to build syntax that runs this query to generate my new dataset, which I can then use to continue my scripted analysis. I see the basic capability to query one table from a SQL Server, but I would like to run a query that joins many tables. I anticipate the query will be fairly complex, with many joins and perhaps some data transformations.
Has anybody had experience or a solution to this situation?
I know I could materialize the query as a table that SPSS then connects to, but my data changes daily, so I would need a job in another application to refresh that table before my SPSS syntax pulls it. I would like to eliminate that first step by having the query itself grab the data at the beginning of my syntax.
Ultimately I am looking to build out my SPSS syntax and schedule it in the Production Facility to run daily.
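For what it's worth, SPSS syntax can run an arbitrary SQL statement against an ODBC source with GET DATA /TYPE=ODBC, which avoids the intermediate table entirely. A sketch, where the DSN, credentials, and query are placeholders for your own:

```
GET DATA
  /TYPE=ODBC
  /CONNECT='DSN=MyServer;UID=myuser;PWD=mypassword'
  /SQL='SELECT o.order_id, o.total, c.name '
       'FROM orders o JOIN customers c ON o.customer_id = c.id'.
DATASET NAME orders_joined.
```

Because the query runs at the start of the syntax, scheduling the job in the Production Facility would pick up fresh data on each daily run.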
I'm running a complicated query against a Redshift cluster that uses 4 tables, some of which have billions of rows, and I get the following error:
failed to make a valid plan
If I limit the data, the query will run successfully.
- The original query was an Oracle query to which I made some modifications, and the data loaded into the Redshift tables was also exported from Oracle.
- The query has a lot of JOINs and subqueries.
That said, going through the subqueries one at a time, I found that one of them didn't return any results, and that was the cause of this error in my case.
After fixing that particular subquery, and the main query accordingly, everything ran successfully.
Let's say we have a large number of SQL queries which take a long time to run. Now we would like to make some changes to the database and re-execute the queries. We could rerun everything, but I would prefer a solution where only those queries affected by the changes are executed.
Do you know of any method to obtain the relevant tables/columns for each query? A simple example would be:
(let's consider a table TABLE1 with columns A, B, C)
SELECT C FROM TABLE1 WHERE B>10;
I would like to know that TABLE1.B is participating in this query.
Edit: the database is HSQLDB and is used from Java via JDBC.
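Since you're on HSQLDB via JDBC, one partial approach: preparing a statement (without executing it) gives you the result-set metadata, i.e. the projected columns and the tables they come from. Note this covers only the SELECT list, not columns used solely in WHERE clauses such as TABLE1.B, so it is a sketch of the output side only. The connection URL and setup are illustrative, and the hsqldb jar must be on the classpath:

```java
import java.sql.*;

public class QueryColumns {
    public static void main(String[] args) throws SQLException {
        // In-memory HSQLDB instance for illustration.
        try (Connection c = DriverManager.getConnection("jdbc:hsqldb:mem:test", "SA", "")) {
            try (Statement s = c.createStatement()) {
                s.execute("CREATE TABLE TABLE1 (A INT, B INT, C INT)");
            }
            // Preparing does not run the query; the metadata comes from the parser.
            try (PreparedStatement ps = c.prepareStatement("SELECT C FROM TABLE1 WHERE B > 10")) {
                ResultSetMetaData md = ps.getMetaData();
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    System.out.println(md.getTableName(i) + "." + md.getColumnName(i));
                }
            }
        }
    }
}
```

To capture WHERE-clause columns as well, you would still need to parse the query text or the database's explain output.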
Are you using any workbench to execute your SQL queries? In MySQL Workbench there is a query optimizer option under which you can check how the query was executed and what actions were performed with the query result, shown as a tree block diagram, which could certainly help here; you can also inspect your query and check your results in the query optimizer. Hope it helps.
I am trying to execute a query against a MySQL database.
The query is fairly complex: it has 5 inner joins, including one join of a table to itself, and
it returns 3 pieces of information from 2 different tables.
We are using Hibernate, and until now I have used it only for simple queries.
I have written and tested the SQL query. I am wondering how to implement this using
Hibernate: can I execute plain SQL statements with Hibernate? If so, what do I need, a separate hbm.xml?
If I use hibernate and execute the plain sql query can I still utilize caching later on?
Yes, you can execute plain SQL queries with Hibernate.
No, you don't need a separate hbm.xml mapping file (unless you WANT to separate sql queries from the rest, in which case you can do so). You can map your named SQL query the same way you do with named HQL queries.
Whether you will be able to "utilize caching" depends on what exactly you understand by "caching" and how you're going to map your SQL query; it's impossible to answer without knowing more details.
All that said, you may not need to resort to SQL query; HQL is quite powerful and it may very well be possible (assuming appropriate mappings exist) to write your query as HQL. Can you post relevant mappings / schemas and your SQL query?
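For reference, a native SQL query can be run inline without any extra mapping file. A minimal sketch, using made-up table and column names (createSQLQuery is the classic API; in Hibernate 5.2+ it was replaced by createNativeQuery):

```java
// Classic Hibernate API; in 5.2+ use session.createNativeQuery(...) instead.
List<Object[]> rows = session.createSQLQuery(
        "SELECT p.name, p.price, c.name " +
        "FROM product p JOIN category c ON p.category_id = c.id")
    .list();

// Each row comes back as an Object[] with one entry per selected column.
for (Object[] row : rows) {
    System.out.println(row[0] + " / " + row[1] + " / " + row[2]);
}
```

If you map the query to an entity (e.g. with addEntity or a named SQL query with a result mapping), you get typed objects back instead of Object[].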
I strongly recommend Criteria queries over HQL queries. They are much closer to your program code without sacrificing any expressive power. They do, however, depend on the relations being explicitly mapped; otherwise they get quite complicated.
To speed up development, set the property hibernate.show_sql=true and experiment in the debugger, using the "reload modified class" and "drop stack frame" features of the IDE and JVM, until the emitted SQL looks like the one you've posted.
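To illustrate the shape of such a query, here is a small sketch with the classic org.hibernate.Criteria API (deprecated in Hibernate 5 in favour of the JPA CriteriaQuery). The entity and property names are placeholders, and it assumes the Product-to-Category association is mapped:

```java
// Join via the mapped association and filter; classic Criteria API.
List<Product> products = session.createCriteria(Product.class)
    .createAlias("category", "c")            // inner join to the mapped relation
    .add(Restrictions.eq("c.name", "books")) // condition on the joined table
    .add(Restrictions.gt("price", 10.0))     // condition on the root entity
    .list();
```

With show_sql enabled you can compare the SQL Hibernate generates for this against your hand-written query.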