AWS Redshift "failed to make a valid plan" when running a complicated query

I'm running a complicated query against a Redshift cluster. It uses 4 tables, some of which have billions of rows, and I get the following error:
failed to make a valid plan
If I limit the data, the query will run successfully.

- The original query was an Oracle query that I modified, and the data loaded into the Redshift tables was also exported from Oracle.
- The query has a lot of JOINs and sub-queries.
With that said, I went through the sub-queries one at a time and found that one of them didn't return any results; that was the cause of this error in my case.
After I fixed that sub-query and adjusted the main query accordingly, the query ran successfully.
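The approach in this answer, running each sub-query on its own and looking for one that returns nothing, can be sketched as below. This is a minimal sketch, not the original poster's code: the helper name find_empty_subqueries, the sub-query labels, and the table names are hypothetical, and sqlite3 only stands in for a Redshift connection so the example runs anywhere. Any Python DB-API connection should work in a similar way.

```python
import sqlite3


def find_empty_subqueries(conn, subqueries):
    """Return the labels of sub-queries that produce zero rows.

    `subqueries` maps a label to the SQL text of one sub-query,
    copied out of the large query (the names below are hypothetical).
    """
    empty = []
    cur = conn.cursor()
    for label, sql in subqueries.items():
        # Wrap the sub-query so only a row count comes back.
        cur.execute(f"SELECT COUNT(*) FROM ({sql}) AS sq")
        (row_count,) = cur.fetchone()
        if row_count == 0:
            empty.append(label)
    return empty


# Stand-in database; with Redshift, `conn` would come from a database driver.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, region TEXT);
    INSERT INTO customers VALUES (10, 'EU'), (11, 'US');
    """
)

subqueries = {
    "eu_customers": "SELECT id FROM customers WHERE region = 'EU'",
    "apac_customers": "SELECT id FROM customers WHERE region = 'APAC'",
}
print(find_empty_subqueries(conn, subqueries))  # prints ['apac_customers']
```

On tables with billions of rows, even a COUNT(*) per sub-query can be slow, so it may be worth running the check against a limited slice of the data first.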

Related

Duplicated execution plan for Query on the Azure SQL

I have a query that I run in SQL Server Management Studio. Usually the studio shows a single execution plan for a query, but sometimes I see duplicated execution plans for a single query on Azure SQL, like below.
When I open the query from this plan, I see the query duplicated, as if a copy had been pasted into the same query. The text is the same in Query 1 and Query 2. See below.
Does anyone know why this happens and how to avoid it? How is that even possible?
P.S. The query's execution time increased from 2 sec to 20 sec and more.
P.P.S. The warning in Query 2
It could be that the queries were run with different settings. I notice that one has a warning and the other doesn't.
Reference:
https://blogs.msdn.microsoft.com/psssql/2014/04/03/i-think-i-am-getting-duplicate-query-plan-entries-in-sql-servers-procedure-cache/

How can I find the SQL query for an execution plan?

A program generates and sends queries to SQL Server (on a high-load production system). I want to get the plan for a specific query against a specific table. I started Profiler with the "Showplan XML" event and set filters on TextData (like %MyTable%) and DatabaseName. It shows rows whose TextData contains XML describing execution plans (for all queries against my table). But I know that there are 5 different SQL queries for this table.
How can I match a specific query with its corresponding plan without using statistics?
Is there a reason this has to be done on the production environment? Most really bad execution plans (missing indexes causing table scans etc.) will be obvious enough on a dev environment where you can use all the diagnostics you want.
Otherwise, running SQL against the query cache (as in the linked question someone else mentioned) will probably have the lowest impact, as it just queries a system table rather than adding diagnostics to every query.
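As a rough illustration of the query-cache approach, the sketch below pairs each cached statement's text with its plan by joining the sys.dm_exec_query_stats view to the sys.dm_exec_sql_text and sys.dm_exec_query_plan functions. The helper name cached_plans_for, the %MyTable% pattern, and the use of a pyodbc-style connection with ? parameter markers are assumptions, not something stated in the answer. It only sees plans that are still in the cache, and it typically needs the VIEW SERVER STATE permission.

```python
# T-SQL that matches cached statement text to its execution plan.
PLAN_CACHE_SQL = """
SELECT st.text AS sql_text,
       qs.execution_count,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE st.text LIKE ?
ORDER BY qs.execution_count DESC;
"""


def cached_plans_for(conn, pattern):
    """Return (sql_text, execution_count, plan_xml) rows from the plan cache.

    `conn` is assumed to be a DB-API connection to SQL Server that uses
    `?` parameter markers (for example one opened with pyodbc).
    """
    cur = conn.cursor()
    cur.execute(PLAN_CACHE_SQL, (pattern,))
    return cur.fetchall()
```

A call such as cached_plans_for(conn, "%MyTable%") would then return one row per cached statement mentioning the table, so each of the 5 queries can be told apart by its text.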

How does SSMS show partial results of a large resultset whilst the query is still running, and can equivalent behaviour be achieved in .NET?

In SQL Server Management Studio, when running a query that produces a very large resultset, it appears to sometimes display the results of the resultset as it's loading them, rather than them all appearing at once.
My normal assumption would be that SSMS is simply populating the grid(s) with the results of the finished query, and that the SQL query itself is finished.
However, if I run the following query:
SELECT 1
SELECT * FROM EnormousTable
INSERT INTO SomeOtherTable([Column1]) SELECT 'Test3'
That last INSERT does not occur until after the results from the larger resultset have been fully returned.
I have two main questions:
1. What is happening here? Is SSMS breaking down the query into separate batches even without GO statements? Please note that I'm not a DBA, so if there's some fundamental reason for this behaviour that 'any DBA would know', there's a good chance I don't know it.
2. Is there a way to attain similar functionality in .NET? What I mean by this is, when running a set of queries that will produce multiple resultsets, whether or not it's possible to have a DataSet get populated with the results of each successive query as it finishes (without waiting for all the queries to finish), without me having to manually break down the query (unless that's what SSMS is actually doing under the hood).

Dealing with enormous datasets with Impala

I have a general question about Impala vs some traditional SQL database systems. I've heard that Impala can take certain SQL statements quite literally and spit out tables with billions of rows (such as what might happen with a join statement with duplicate rows). As a narrower example, suppose I run something like "SELECT * FROM database". As far as immediate console output is concerned, I understand that most traditional SQL databases will stop running when a limit of, say, 1000 entries is reached. Is the same true of Impala? In other words, if I run "SELECT * FROM database" in Impala, is it in theory doing more work, even though it will ultimately spit out a limited number of rows?
I think it depends on what you use to run the query. If you just run it from the command line in Bash or the Impala shell, it will fetch all the results; if you use Hue, it will page through the results like you describe. Actually, the same is true for any database. If you're using a GUI to access it, you can run something like an export-to-CSV command to get the full result set, and if you're accessing it programmatically, you would use fetchall().
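The difference between paging and fetching everything can be sketched with Python's DB-API. In this minimal sketch sqlite3 is only a stand-in so the code runs without a cluster; that an Impala driver exposes the same fetchmany() and fetchall() cursor methods is an assumption here, and the table name events and the page size of 1000 are made up for illustration.

```python
import sqlite3

# Stand-in database with 2500 rows in a hypothetical `events` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(2500)])

cur = conn.cursor()

# GUI-style paging: pull only the first page and stop asking for more.
cur.execute("SELECT * FROM events")
first_page = cur.fetchmany(1000)

# Script-style access: pull every row of the result set into memory.
cur.execute("SELECT * FROM events")
all_rows = cur.fetchall()

print(len(first_page), len(all_rows))  # prints "1000 2500"
```

The sketch only shows what the client receives. How much work the server does for rows that are never fetched depends on the engine and the driver.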

Getting the schema of the query output in Hive

I am using Apache Hive to create and execute some queries, but before a query is executed I need to report the structure of its result set. The queries may involve joins and projections, so parsing the query would be quite difficult. The current solution that we are working on involves parsing the output of the EXPLAIN command, but that is quite complex in itself.
My question is whether there is some simpler way by setting some properties in hive or some query parameters that don't select any data (the map/reduce tasks are not started) but creates a table that I can query via metastore to get the schema?
Unfortunately there is no simpler way other than using EXPLAIN or DESCRIBE commands to get query schema and table schema.
While this is not something you can do before the query returns, if you enable
set hive.cli.print.header=true;
then the Hive CLI will print the column names as a header row right before the results.