Redshift: Simple query is leading to nested loop join - sql

I am using a query to fetch the number of rows deleted for a given queryid:
select stl_delete.query,
listagg(distinct svv_table_info.table,',')
from stl_delete
join svv_table_info on svv_table_info.table_id=stl_delete.tbl
where stl_delete.query=1090750
group by stl_delete.query
The result seems correct.
When I run:
select event,solution from stl_alert_event_log where query = pg_last_query_id();
event solution
================================== ======================================================
Nested Loop Join in the query plan Review the join predicates to avoid Cartesian products
Firstly, why is there nested loop?
How do I fix the nested loop join here? Going through the internet, solution is the join predicate which is present in the query.
Even if I remove the listaggr and group by, I still see the issue:
select stl_delete.query,
svv_table_info.table
from stl_delete
join svv_table_info on svv_table_info.table_id=stl_delete.tbl
where stl_delete.query=1090750

The system view svv_table_info is complex and gather a lot of information about tables most of which you are not using. The loop join is inside this view and is needed to produce the in-depth table report.
Your query just needs the name of the table based on tableid. There is a system table that holds this information and will run quicker and not produce a loop join. pg_class has the tableid in a column called oid and the table name in relname. (FYI if you select * from pg_class oid won't show up, you need to specify it by name)
Or you can just live with the alert. This loop join isn't very big in Redshift terms.

Related

Best way to "SELECT *" from multiple tabs in sql without duplicate?

I am trying to retrieve every data stored in 2 tabs from my database through a SELECT statement.
The problem is there are a lot of columns in each tab and manually selecting each column would be a pain in the ass.
So naturally I thought about using a join :
select * from equipment
join data
on equipment.id = data.equipmentId
The problem is I am getting the equipment ID 2 times in the result.
I thought that maybe some specific join could help me filter out the duplicate key, but I can't manage to find a way...
Is there any way to filter out the foreign key or is there a better way to do the whole thing (I would rather not have to post process the data to manually remove those duplicate columns)?
You can use USING clause.
"The USING clause specifies which columns to test for equality when
two tables are joined. It can be used instead of an ON clause in the
JOIN operations that have an explicit join clause."
select *
from test
join test2 using(id)
Here is a demo
You can also use NATURAL JOIN
select *
from test
natural join test2;

SQL SRVR 2016: Trouble joining to a nested select statement

I'm working in a query window in SSMS.
Using 3 tables:
WORK_ORDER wo
An order to fabricate a part
OPERATION op
An operation in the fabrication of the part (laser, grinding, plating, etc.)
PART pt
A unique record defining the part
My objective is to report on the status of an operation (say #3) (#total parts ordered, #completed parts), but additionally to include the number of parts that have completed the previous operation (#2) in the sequence and are ready for the process. My solution was to use the LAG function, which works perfectly when the nested select statement below is run independently, but I get an avg of 4X duplication in my results, and my Completed_QTY_PREV_OP column is not displayed. I am aware that's because it's not in the parent select statement, but I wanted to correct the join first. I'm guessing the two problems are related.
Footnote: The WHERE contains a filter that you can ignore. The parent select statement works perfectly without the joined subquery.
Here's my sql:
SELECT op.RESOURCE_ID, pt.USER_5 AS PRODUCT, wo.PART_ID, wo.TYPE, wo.BASE_ID,
wo.LOT_ID, wo.SPLIT_ID, wo.SUB_ID, op.SEQUENCE_NO, pt.DESCRIPTION,
wo.DESIRED_QTY, op.FULFILLED_QTY AS QTY_COMP, op.SERVICE_ID, op.DISPATCHED_QTY, wo.STATUS
FROM dbo.WORK_ORDER wo INNER JOIN
dbo.OPERATION op ON wo.TYPE = op.WORKORDER_TYPE
AND wo.BASE_ID = op.WORKORDER_BASE_ID
AND wo.LOT_ID = op.WORKORDER_LOT_ID
AND wo.SPLIT_ID = op.WORKORDER_SPLIT_ID
AND wo.SUB_ID = op.WORKORDER_SUB_ID INNER JOIN
dbo.PART pt ON wo.PART_ID = pt.ID
LEFT OUTER JOIN
--The nested select statement works by itself in a query window,
--but the JOIN throws an error.
(SELECT
pr.WORKORDER_TYPE, pr.WORKORDER_BASE_ID, pr.WORKORDER_LOT_ID,
pr.WORKORDER_SPLIT_ID, pr.WORKORDER_SUB_ID, pr.SEQUENCE_NO,
LAG (COMPLETED_QTY, 1) OVER (ORDER BY pr.WORKORDER_TYPE, pr.WORKORDER_BASE_ID,
pr.WORKORDER_LOT_ID, pr.WORKORDER_SPLIT_ID, pr.WORKORDER_SUB_ID, pr.SEQUENCE_NO) AS COMP_QTY_PREV_OP
FROM dbo.OPERATION AS pr) AS prev
--End of nested select
ON
op.WORKORDER_TYPE = prev.WORKORDER_TYPE AND
op.WORKORDER_BASE_ID = prev.WORKORDER_BASE_ID AND
op.WORKORDER_LOT_ID = prev.WORKORDER_LOT_ID AND
op.WORKORDER_SPLIT_ID = prev.WORKORDER_SPLIT_ID AND
op.WORKORDER_SUB_ID = prev.WORKORDER_SUB_ID
WHERE (NOT (op.SERVICE_ID IS NULL)) AND (wo.STATUS = N'R')
You haven't given enough information for a definitive answer, so instead I will give you an approach to debugging this.
You are getting unexpected rows as a result of a JOIN. This means that your JOIN condition is not matching the two sides of the JOIN on a one-to-one basis. There are multiple rows in the table being JOINed that meet the JOIN conditions.
To find these rows, temporarily change your SELECT list to SELECT *. Do this both in the outer SELECT, and in the derived table. Look through the columns being returned by the JOINed table, and find the values that you didn't expect to be returned.
Since the JOIN that causes the issue is the last one, they will be all the way to right of the result of a SELECT *.
Then add more conditions to the JOIN to eliminate the unwanted rows from the results.
I simplified the whole query by first creating a temp table filled by the previously nested SELECT, and then joining to it from the parent SELECT.
Works perfectly now. Thanks for looking.
PS: I apologize for the confusion about an error message. I noticed after I posted that I had an old comment in the code regarding an error. The error had been resolved before posting, but I neglected to remove the comment.

Translating query from Firebird to PostgreSQL

I have a Firebird query which I should rewrite into PostgreSQL code.
SELECT TRIM(RL.RDB$RELATION_NAME), TRIM(FR.RDB$FIELD_NAME), FS.RDB$FIELD_TYPE
FROM RDB$RELATIONS RL
LEFT OUTER JOIN RDB$RELATION_FIELDS FR ON FR.RDB$RELATION_NAME = RL.RDB$RELATION_NAME
LEFT OUTER JOIN RDB$FIELDS FS ON FS.RDB$FIELD_NAME = FR.RDB$FIELD_SOURCE
WHERE (RL.RDB$VIEW_BLR IS NULL)
ORDER BY RL.RDB$RELATION_NAME, FR.RDB$FIELD_NAME
I understand SQL, but have no idea, how to work with this system tables like RDB$RELATIONS etc. It would be really great if someone helped me with this, but even some links with this tables explanation will be OK.
This piece of query is in C++ code, and when I'm trying to do this :
pqxx::connection conn(serverAddress.str());
pqxx::work trans(conn);
pqxx::result res(trans.exec(/*there is this SQL query*/));//and there is a mistake
it writes that:
RDB$RELATIONS doesn't exist.
Postgres has another way of storing information about system content. This is called System Catalogs.
In Firebird your query basically returns a row for every column of a table in every schema with an additional Integer column that maps to a field datatype.
In Postgres using system tables in pg_catalog schema something similar can be achieved using this query:
SELECT
TRIM(c.relname) AS table_name, TRIM(a.attname) AS column_name, a.atttypid AS field_type
FROM pg_class c
LEFT JOIN pg_attribute a ON
c.oid = a.attrelid
AND a.attnum > 0 -- only ordinary columns, without system ones
WHERE c.relkind = 'r' -- only tables
ORDER BY 1,2
Above query does return system catalogs as well. If you'd like to exclude them you need to add another JOIN to pg_namespace and a where clause with pg_namespace.nspname <> 'pg_catalog', because this is the schema where system catalogs are stored.
If you'd also like to see datatype names instead of their representative numbers add a JOIN to pg_type.
Information schema consists of collection of views. In most cases you don't need the entire SQL query that stands behind the view, so using system tables will give you better performance. You can inspect views definition though, just to get you started on the tables and conditions used to form an output.
I think you are looking for the information_schema.
The tables are listed here: https://www.postgresql.org/docs/current/static/information-schema.html
So for example you can use:
select * from information_schema.tables;
select * from information_schema.columns;

Need Input | SQL Dynamic Query

Have a requirement where I need to build a dynamic query based on user input and send the count of records from result set.
So there are 6 tables which I needs to make a join Inner for sure and rest table join will be based on user input and this should be performance oriented.
Here is the requirement
select count(A.A1) from table A
INNER JOIN table B on B.B1=A.A1
INNER JOIN table B on C.C1=B.B1
INNER JOIN table D on D.D1=C.C1
INNER JOIN table E on E.E1=D.D1
INNER JOIN table F on F.F1=E.E1
Now if user select some value in UI , then have to execute query as
select count(A.A1) from table A
INNER JOIN table B on B.B1=A.A1
INNER JOIN table B on C.C1=B.B1
INNER JOIN table D on D.D1=C.C1
INNER JOIN table E on E.E1=D.D1
INNER JOIN table F on F.F1=E.E1
INNER JOIN table B on G.G1=F.F1
Where G.Name like '%Germany%'
User can send 1- 5 choices and have to build the query and accordingly and send the result set
So if I add all the joins first and then add where clause as per the choice , then query will be easy and serve the purpose, but if user did not select any query then I am creating unnecessary join for the user choices.
So which will be better way to write having all the joins in advance and then filtering it or on demand join and with filters using dynamic query.
Could be great if someone can provide valuable inputs.
When SQL Server executes a query, there is a first step which is planning the query, i.e. deciding an strategy to get the query result.
If you use "inner joins" you're making it compulsory to include all the tables, becasuse "inner join" means that there must be matching rows on both tables of the join, so the query planner can't dicard any tables.
However, if you change the inner joins by left outer joins, it's not compulsory that there are matching rows on both sides of the join, so the query planner can decide if it includes or not the tables on the right. So, if you use left outer joins, and you don't select, or filter, or do any operation on fields on the right side of the joins, the query planner can discard then when executing the query. That's the easiest way to get rid of your concerns.
On the other hand, if you want to control what tables to inclued or not to include, and create a custom query for each case, you can use several techniques:
making a graph that includes the definition of the table relations, and using some graph manipulation library that allows you to get the necessary tables from the graph.I did this one, but is quite hard to achieve if you don't have experience with graps.
using Entity Framework. You must build a simple model including all the tables. And then, to run each query, you can programmatically build the query in LINQ, and EF will take care to generate and execute the SQL query for you.

SQL Repetition, how to avoid in query?

I'm trying to get the names and the addresses that are stored in the table but getting data repetition. I dont know how to avoid it as im a newbie to this field. here's some pictures of the commands and results.
Commands:
Results:
Owner Table:
Addresses Table:
Please help me out :(
P.S. They are all dummy data.
You need to do a join between the two tables owners and addresses using column in tables that referenece each other.
SELECT firstname,lastname,addressline_1
FROM owners o
JOIN addresses a
ON o.colName=a.colName
Your query is performing cartesian product between the two tables which is giving all rows for table address for each row in table owners.
You would have avoided getting meaningless rows had you used recommended ANSI SQL syntax of performing join using ON clause rather than WHERE clause . Although you haven't specified condition for joining between the tables still old syntax of joining using WHERE clause got executed successfully but would have thrown error in case of using ON clause.
See this thread for detailed discussion ON vs WHERE
EDIT
As per your schema of tables the query would be
SELECT firstname,lastname,addressline_1
FROM owners o
JOIN addresses a
ON o.ownerid=a.owners_ownerid
you have to add where clause to your query:-
SELECT OWNERS.first_name, OWNERS.last_name, ADDRESSES.address_line1
FROM OWNERS, ADDRESSES
WHERE OWNERS.ownerid = ADDRESSES.owners_ownerid
or you can use join
SELECT OWNERS.first_name, OWNERS.last_name, ADDRESSES.address_line1
FROM OWNERS
JOIN ADDRESSES
ON OWNERS.ownerid = ADDRESSES.owners_ownerid