Using a column in a CTE that does not exist in another table - sql

I am new to writing recursive queries and am trying to write my first one. I want to determine the levels of a corporate hierarchy and have the result table include two columns: the consultant ID and the level determined by the recursive query. In the CTE, I can't figure out how to reference "level", since it is a column that doesn't yet exist in any table. What am I doing wrong?
WITH consultantsandlevels
(c."ConsultantDisplayID",
consultantsandlevels."level"
)
AS
(
SELECT c."ConsultantDisplayID",
0
FROM flight_export_consultant AS c
WHERE c."ParentPersonDisplayID" IS NULL
UNION all
SELECT c."ConsultantDisplayID",
c."ParentPersonDisplayID",
consultantsandlevels."level" + 1
FROM flight_export_consultant
JOIN consultantsandlevels ON c."ParentPersonDisplayID" = consultantsandlevels."ConsultantDisplayID"
)
SELECT *
FROM consultantsandlevels;

I think you want:
WITH RECURSIVE consultantsandlevels(ConsultantDisplayID, level) AS (
SELECT "ConsultantDisplayID", 0
FROM flight_export_consultant
WHERE "ParentPersonDisplayID" IS NULL
UNION ALL
SELECT fec."ConsultantDisplayID", cal.level + 1
FROM flight_export_consultant fec
INNER JOIN consultantsandlevels cal
ON fec."ParentPersonDisplayID" = cal.ConsultantDisplayID
)
SELECT * FROM consultantsandlevels;
Rationale:
In PostgreSQL, a CTE that refers to itself must be declared with WITH RECURSIVE.
The CTE's column list only enumerates column names; no table prefix belongs there.
The level is set to 0 in the anchor query, and you then increment it at each iteration by referring to the CTE's own level column (cal.level + 1).
The queries on both sides of UNION ALL must return the same number of columns (matching the CTE's column list) with compatible data types. The recursive part in the question returned three columns, and it also used the alias c without declaring it in its FROM clause.
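To check the corrected query without a PostgreSQL server, here is a minimal sketch that runs it against an in-memory SQLite database through Python's built-in sqlite3 module (SQLite also supports WITH RECURSIVE). The sample rows and the consultant_levels helper are hypothetical; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data: consultant 1 is the root (no parent),
# 2 and 3 report to 1, and 4 reports to 2.
ROWS = [(1, None), (2, 1), (3, 1), (4, 2)]

LEVELS_SQL = """
WITH RECURSIVE consultantsandlevels(ConsultantDisplayID, level) AS (
    -- Anchor member: top-level consultants start at level 0
    SELECT "ConsultantDisplayID", 0
    FROM flight_export_consultant
    WHERE "ParentPersonDisplayID" IS NULL
    UNION ALL
    -- Recursive member: each child sits one level below its parent
    SELECT fec."ConsultantDisplayID", cal.level + 1
    FROM flight_export_consultant AS fec
    INNER JOIN consultantsandlevels AS cal
        ON fec."ParentPersonDisplayID" = cal.ConsultantDisplayID
)
SELECT ConsultantDisplayID, level
FROM consultantsandlevels
ORDER BY ConsultantDisplayID
"""


def consultant_levels(rows):
    """Return [(consultant_id, level), ...] computed in a throwaway SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE flight_export_consultant '
            '("ConsultantDisplayID" INTEGER, "ParentPersonDisplayID" INTEGER)'
        )
        conn.executemany("INSERT INTO flight_export_consultant VALUES (?, ?)", rows)
        return conn.execute(LEVELS_SQL).fetchall()
    finally:
        conn.close()


print(consultant_levels(ROWS))  # [(1, 0), (2, 1), (3, 1), (4, 2)]
```

The anchor query supplies the rows where level is 0; every later level only exists because the recursive member reads it back from the CTE itself, which is why "level" never has to exist in a real table.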

Related

How to select the nth column, and order columns' selection in BigQuery

I have this huge table upon which I apply a lot of processing (using CTEs), and I want to perform a UNION ALL on 2 particular CTEs.
SELECT *
, 0 AS orders
, 0 AS revenue
, 0 AS units
FROM secondary_prep_cte WHERE purchase_event_flag IS FALSE
UNION ALL
SELECT *
FROM results_orders_and_revenues_cte
I get a "Column 1164 in UNION ALL has incompatible types : STRING,DATE at [97:5]" error.
Obviously I don't know the name of the column. I'd like to debug this, but I feel I'm going to waste a lot of time if I can't pinpoint which column is number 1164.
I also think this is a problem of the order of columns between the CTEs, so I have 2 questions:
How do I identify the 1164th column
How do I order my columns before performing the UNION ALL
I found a similar question, but it is for MSSQL; I am using BigQuery.
You can get column information (name, data type and ordinal position) from INFORMATION_SCHEMA.COLUMNS, but you'll first need to create a table or view from the CTE:
CREATE OR REPLACE VIEW `project.dataset.secondary_prep_view` as select * from (select 1 as id, "a" as name, "b" as value)
Then:
SELECT * FROM dataset.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'secondary_prep_view';
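In BigQuery you would narrow that query down with a filter on the ordinal_position column of INFORMATION_SCHEMA.COLUMNS. Because the first SELECT is `*` followed by three constants, column 1164 of the union is column 1164 of secondary_prep_cte as long as that CTE has at least 1164 columns (otherwise it is one of orders, revenue or units). The same positional comparison can be sketched locally; this minimal example uses SQLite's PRAGMA table_info through Python's sqlite3 as a stand-in for INFORMATION_SCHEMA, and the tables prep and results and the helper functions are hypothetical.

```python
import sqlite3


def columns(conn, table):
    """Return [(position, name, declared_type), ...] with 1-based positions."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk); cid is 0-based.
    # The table name is interpolated directly, so only pass trusted names.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [(cid + 1, name, decl_type.upper()) for cid, name, decl_type, *_ in rows]


def column_at(conn, table, n):
    """Return (name, declared_type) of the n-th (1-based) column, or None."""
    for pos, name, decl_type in columns(conn, table):
        if pos == n:
            return name, decl_type
    return None


def first_type_mismatch(conn, left, right):
    """Pair columns by position, as UNION ALL does, and report the first type clash."""
    for (pos, lname, ltype), (_, rname, rtype) in zip(columns(conn, left), columns(conn, right)):
        if ltype != rtype:
            return pos, (lname, ltype), (rname, rtype)
    return None


conn = sqlite3.connect(":memory:")
# Hypothetical tables with the same columns in a different order.
conn.execute("CREATE TABLE prep (id INTEGER, name TEXT, event_date DATE)")
conn.execute("CREATE TABLE results (id INTEGER, event_date DATE, name TEXT)")
print(column_at(conn, "results", 2))               # ('event_date', 'DATE')
print(first_type_mismatch(conn, "prep", "results"))
# (2, ('name', 'TEXT'), ('event_date', 'DATE'))
```

The mismatch comes purely from column order, which also answers the second question: instead of SELECT * on both sides, list the columns explicitly, in the same order, in both SELECTs before the UNION ALL.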

Array field in postgres, need to do self-join with results

I have a table that looks like this:
stuff
id integer
content text
score double
children[] (an array of id's from this same table)
I'd like to run a query that selects all the children for a given id, and then right away gets the full row for all these children, sorted by score.
Any suggestions on the best way to do this? I've looked into WITH RECURSIVE but I'm not sure that's workable. Tried posting at postgresql SE with no luck.
The following query will find all rows corresponding to the children of the object with id 14:
SELECT *
FROM unnest((SELECT children FROM stuff WHERE id=14)) t(id)
JOIN stuff USING (id)
ORDER BY score;
This works by first fetching the children of 14 as an array, converting that array into a one-column table with the unnest function, and then joining with stuff to fetch the full rows for those ids.
The ANY construct in the join condition would be simplest:
SELECT c.*
FROM stuff p
JOIN stuff c ON c.id = ANY (p.children)
WHERE p.id = 14
ORDER BY c.score;
It doesn't matter for the query whether the array of child IDs is in the same table or a different one. You just need table aliases so that every column reference (such as c.id) is unambiguous.
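A minimal sketch of the same "expand the array, then join back" idea that runs without PostgreSQL. SQLite has no array type, so this assumes children is stored as a JSON array and swaps in SQLite's json_each() where PostgreSQL uses unnest() or = ANY(...); the sample rows and the children_of helper are hypothetical.

```python
import sqlite3

# json_each() plays the role of unnest(): one output row per element of the array.
CHILDREN_SQL = """
SELECT c.id, c.content, c.score
FROM stuff AS p
JOIN json_each(p.children) AS j       -- one row per child id in the parent's array
JOIN stuff AS c ON c.id = j.value     -- fetch the full row for each child
WHERE p.id = ?
ORDER BY c.score
"""


def children_of(conn, parent_id):
    return conn.execute(CHILDREN_SQL, (parent_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, content TEXT, score REAL, children TEXT)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?, ?)",
    [
        (14, "parent", 1.0, "[3, 7, 9]"),
        (3, "child 3", 0.5, "[]"),
        (7, "child 7", 0.2, "[]"),
        (9, "child 9", 0.9, "[]"),
        (20, "unrelated", 0.1, "[]"),  # not a child of 14, so it must not appear
    ],
)
print([row[0] for row in children_of(conn, 14)])  # [7, 3, 9]
```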
Related:
Check if value exists in Postgres array
Similar solution:
With Postgres you can use a recursive common table expression:
with recursive rel_tree as (
select rel_id, rel_name, rel_parent, 1 as level, array[rel_id] as path_info
from relations
where rel_parent is null
union all
select c.rel_id, rpad(' ', p.level * 2) || c.rel_name, c.rel_parent, p.level + 1, p.path_info||c.rel_id
from relations c
join rel_tree p on c.rel_parent = p.rel_id
)
select rel_id, rel_name
from rel_tree
order by path_info;
Ref: Postgresql query for getting n-level parent-child relation stored in a single table
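The ordering trick in that query is the path_info column: sorting by the path of ancestor ids lists every node directly after its parent (depth-first). A minimal sketch of the same idea in SQLite through Python's sqlite3; since SQLite has no arrays or rpad(), this assumes a '/'-separated string of zero-padded ids as the path (so it sorts correctly for ids below 1,000,000) and carries the indentation in its own column. The sample hierarchy and the tree helper are hypothetical.

```python
import sqlite3

TREE_SQL = """
WITH RECURSIVE rel_tree(rel_id, rel_name, rel_parent, level, indent, path_info) AS (
    SELECT rel_id, rel_name, rel_parent, 1, '', printf('%06d', rel_id)
    FROM relations
    WHERE rel_parent IS NULL
    UNION ALL
    SELECT c.rel_id,
           p.indent || '  ' || c.rel_name,          -- two spaces per level below the root
           c.rel_parent,
           p.level + 1,
           p.indent || '  ',
           p.path_info || '/' || printf('%06d', c.rel_id)
    FROM relations AS c
    JOIN rel_tree AS p ON c.rel_parent = p.rel_id
)
SELECT rel_id, rel_name
FROM rel_tree
ORDER BY path_info
"""


def tree(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE relations (rel_id INTEGER, rel_name TEXT, rel_parent INTEGER)")
        conn.executemany("INSERT INTO relations VALUES (?, ?, ?)", rows)
        return conn.execute(TREE_SQL).fetchall()
    finally:
        conn.close()


# Hypothetical hierarchy: two roots; node 2 has a child of its own.
SAMPLE = [(1, "root", None), (2, "b", 1), (3, "a", 1), (4, "b-child", 2), (10, "root2", None)]
for rel_id, rel_name in tree(SAMPLE):
    print(rel_id, rel_name)
```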

Compare Items in the "IN" Clause and the resultset

I'd like to achieve the following. I have this (simple) query:
SELECT ENT_ID,TP_ID FROM TC_LOGS WHERE ENT_ID IN (1,2,3,4,5)
The table TC_LOGS may not contain all the items in the IN clause. Assuming TC_LOGS has only 1 and 2, I'd like to compare the items in the IN clause (1,2,3,4,5) with the result set (1,2) and get a result like FOUND - 1,2 NOT FOUND - 3,4,5. I have implemented this by applying an XSL transformation to the result set in the application code, but I'd like to do it in a query, which I feel is a more elegant solution. I also tried the following query with NVL, just to separate out the FOUND and NOT FOUND items:
SELECT NVL(ENT_ID,"NOT FOUND") FROM TC_LOGS WHERE ENT_ID IN(1,2,3,4,5)
I was expecting a result like 1,2,NOT FOUND,NOT FOUND,NOT FOUND
But the above query doesn't return any result. I'd appreciate it if someone could point me in the right direction. Thanks in advance.
Assuming that the items in your IN list can come from another query, you can do something like
WITH src AS (
SELECT level id
FROM dual
CONNECT BY level <= 5)
SELECT nvl(to_char(ent_id), 'Not Found')  -- to_char: NVL would otherwise try to convert 'Not Found' to a number
FROM src
LEFT OUTER JOIN tc_logs ON (src.id = tc_logs.ent_id)
In my case, the src query is just generating the numbers 1 through 5. You could just as easily fetch that data from a different table, load the numbers into a collection that you query using the TABLE operator, load the numbers into a temporary table that you query, etc. depending on how the IN list data is determined.
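A minimal sketch of this "generate the list, then outer join" approach that can be run locally. It uses SQLite through Python's sqlite3, so a recursive CTE stands in for Oracle's CONNECT BY level <= 5, and a CASE expression replaces NVL so that numbers and strings are never mixed in one column. The id from the generated list is kept in the output, so you can see which values were missing; the sample log rows and the found_status helper are hypothetical.

```python
import sqlite3

FOUND_SQL = """
WITH RECURSIVE src(id) AS (          -- generates 1..5, like CONNECT BY level <= 5
    SELECT 1
    UNION ALL
    SELECT id + 1 FROM src WHERE id < 5
)
SELECT src.id,
       CASE WHEN tc_logs.ent_id IS NULL THEN 'NOT FOUND' ELSE 'FOUND' END AS status
FROM src
LEFT OUTER JOIN tc_logs ON src.id = tc_logs.ent_id   -- unmatched ids get a NULL ent_id
ORDER BY src.id
"""


def found_status(log_rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tc_logs (ent_id INTEGER, tp_id TEXT)")
        conn.executemany("INSERT INTO tc_logs VALUES (?, ?)", log_rows)
        return conn.execute(FOUND_SQL).fetchall()
    finally:
        conn.close()


# Hypothetical log rows: only 1 and 2 from the IN list are present (7 is not in the list).
print(found_status([(1, "tp-a"), (2, "tp-b"), (7, "tp-c")]))
# [(1, 'FOUND'), (2, 'FOUND'), (3, 'NOT FOUND'), (4, 'NOT FOUND'), (5, 'NOT FOUND')]
```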
NVL isn't going to work because no row at all (not even one full of NULLs) is returned for an IN-list value that has no match.
What you can do is something like this:
SELECT NVL(TO_CHAR(ENT_ID), 'NOT FOUND')
FROM TC_LOGS
RIGHT OUTER JOIN (
SELECT 1 AS TempID FROM dual UNION
SELECT 2 FROM dual UNION
SELECT 3 FROM dual UNION
SELECT 4 FROM dual UNION
SELECT 5 FROM dual) Sub ON ENT_ID = TempID
The outer join will return NULLs for ENT_ID where there are no matches. Note: I'm not an Oracle person, so I can't guarantee that this syntax is perfect.
If you have a table (let's call it src) that contains all the values (1,2,3,4,5) in a column named ent_id, you can use a full join.
You can also use WITH src AS (SELECT level AS ent_id FROM dual CONNECT BY level <= 5) as the src table.
SELECT
ent_id, tl.tp_id
FROM
src
FULL JOIN
tc_logs tl
USING (ent_id)
ORDER BY
ent_id
Oracle FULL JOIN reference: http://psoug.org/snippet/Oracle-PL-SQL-ANSI-Joins-FULL-JOIN_738.htm

SQL left outer join - same named fields as rows instead of columns

I have 14 tables that describe a hierarchical XML data structure. One field is a key for the row and another field is a foreign key to its parent. The remaining 60 fields are the same in each table.
I would like to get one large serial list of rows instead of horizontally adding the columns.
For example the following is set up to give me 6 columns. How can I serialize it into 3 columns and more rows?
SELECT ICN2.ICNID, ICN2.ICNTITLE, ICN2.PANE, ICN3.ICNID AS ICNID2,
ICN3.ICNTITLE AS ICNTITLE2, ICN3.PANE AS PANE2
FROM ICN2
LEFT OUTER JOIN ICN3 ON ICN2.ICN2_PKEY = ICN3.ICN2_FKEY
WHERE ICN2.ICNID = N'65587'
Use UNION ALL to stack the tables into a single (aliased) derived table, then apply the condition:
SELECT * FROM (
SELECT ICNID, ICNTITLE, PANE
FROM ICN2
UNION ALL
SELECT ICNID, ICNTITLE, PANE
FROM ICN3
UNION ALL
SELECT ICNID, ICNTITLE, PANE
FROM ICN4
... ) A
WHERE ICNID = N'65587'
This syntax works in MySQL and PostgreSQL, but the exact syntax may vary depending on which database you are using.
If you need to know which table the rows came from, add a constant to each row:
SELECT * FROM (
SELECT 'ICN2' as SOURCE, ICNID, ICNTITLE, PANE
FROM ICN2
UNION ALL
SELECT 'ICN3', ICNID, ICNTITLE, PANE
FROM ICN3
UNION ALL
... ) A
WHERE ICNID = N'65587'
BTW, UNION ALL keeps duplicate rows, while plain UNION removes them (usually via a sort or hash step, which costs extra). Neither guarantees row order, so add an ORDER BY if order matters. Choose whichever suits you best.
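A minimal sketch of the tagged UNION ALL and of the duplicate-handling difference, run against SQLite through Python's sqlite3 (SQLite has no N'...' literal, so the ICNID is passed as a parameter instead). Only three of the 14 tables are modelled, and the sample rows and the stacked helper are hypothetical.

```python
import sqlite3

# {op} is filled with either "UNION ALL" or "UNION" (validated below, never user input).
STACKED_SQL = """
SELECT * FROM (
    SELECT 'ICN2' AS SOURCE, ICNID, ICNTITLE, PANE FROM ICN2
    {op}
    SELECT 'ICN3', ICNID, ICNTITLE, PANE FROM ICN3
    {op}
    SELECT 'ICN4', ICNID, ICNTITLE, PANE FROM ICN4
) AS A
WHERE ICNID = ?
ORDER BY SOURCE, ICNTITLE  -- explicit order: neither operator guarantees one
"""


def stacked(conn, icnid, op="UNION ALL"):
    if op not in ("UNION ALL", "UNION"):
        raise ValueError(f"unsupported set operator: {op}")
    return conn.execute(STACKED_SQL.format(op=op), (icnid,)).fetchall()


conn = sqlite3.connect(":memory:")
for table in ("ICN2", "ICN3", "ICN4"):
    conn.execute(f"CREATE TABLE {table} (ICNID TEXT, ICNTITLE TEXT, PANE TEXT)")
conn.executemany("INSERT INTO ICN2 VALUES (?, ?, ?)", [("65587", "Top", "p1"), ("11111", "Other", "p2")])
conn.executemany("INSERT INTO ICN3 VALUES (?, ?, ?)", [("65587", "Child", "p3")])
conn.executemany("INSERT INTO ICN4 VALUES (?, ?, ?)", [("65587", "Leaf", "p4"), ("65587", "Leaf", "p4")])

print(len(stacked(conn, "65587")))           # 4: the duplicate ICN4 row is kept
print(len(stacked(conn, "65587", "UNION")))  # 3: the duplicate is removed
```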

optimising/simplifying cursor sql

I've got the code below, and it works fine, except that it takes a couple of seconds to calculate the answer. I was wondering whether there is a quicker/neater way of writing this code, and if so, what am I doing wrong?
Thanks
select case when
(select LSCCert from matterdatadef where ptmatter=$Matter$) is not null then
(
(select case when
(SELECT top 1 dbo.matterdatadef.ptmatter
From dbo.workinprogress, dbo.MatterDataDef
where ptclient=(
select top 1 dbo.workinprogress.ptclient
from dbo.workinprogress
where dbo.workinprogress.ptmatter = $matter$)
and dbo.matterdatadef.LSCCert=(
select top 1 dbo.matterdatadef.LSCCert
from dbo.matterdatadef
where dbo.matterdatadef.ptmatter = $matter$)
)=ptMatter then (
SELECT isnull((DateAdd(mm, 6, (
select top 1 Date
from OfficeClientLedger
where (pttrans=3)
and ptmatter=$matter$
order by date desc))),
(DateAdd(mm, 3, (
SELECT DateAdd
FROM LAMatter
WHERE ptMatter = $Matter$)))
)
)
end
from lamatter
where ptmatter=$matter$)
)
end
It looks like your SQL was generated by a reporting tool. The problem is that you are executing the SELECT top 1 dbo.matterdatadef.ptmatter... query for every row of table lamatter. Further slowing execution, within that query you recalculate the comparison values for both ptclient and LSCCert, values that don't change during execution.
It is better to use proper joins and craft the query so that each part executes only once, by avoiding correlated subqueries (subqueries that reference columns of an outer table and therefore must be re-executed for every row of that table). Calculated values are OK, as long as they are calculated only once, i.e. from within the final WHERE clause.
Here is a trivial example to demonstrate a correlated subquery:
Bad sql:
select a, b from table1
where a = (select c from table2 where d = b)
Here the sub-select may be run once for every row of table1, which will be slow, especially without an index on table2(d).
Better sql:
select a, b from table1, table2
where a = c and d = b
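Here the database can join the two tables directly, typically reading each one at most once, which is usually much faster. Note that the two forms are only equivalent if d is unique in table2; otherwise the subquery fails (SQL Server raises an error when a scalar subquery returns more than one value) while the join returns extra rows.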
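A minimal sketch that checks the two forms return the same rows, using SQLite through Python's sqlite3 as a stand-in for SQL Server, with the hypothetical table1/table2 from the example and table2.d declared unique. It only checks results, not speed; many optimizers can rewrite a simple correlated subquery into a join on their own, so inspect the actual execution plan in your database before relying on a timing difference.

```python
import sqlite3

CORRELATED_SQL = """
SELECT a, b FROM table1
WHERE a = (SELECT c FROM table2 WHERE d = b)   -- re-evaluated per table1 row
ORDER BY a, b
"""

JOIN_SQL = """
SELECT a, b
FROM table1
JOIN table2 ON table2.d = table1.b             -- same condition expressed as a join
WHERE table1.a = table2.c
ORDER BY a, b
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER)")
conn.execute("CREATE TABLE table2 (c INTEGER, d INTEGER UNIQUE)")  # d unique: at most one match per b
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, 10), (2, 20), (3, 30), (5, 40)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, 10), (9, 20), (3, 30)])

correlated = conn.execute(CORRELATED_SQL).fetchall()
joined = conn.execute(JOIN_SQL).fetchall()
print(correlated)            # [(1, 10), (3, 30)]
print(correlated == joined)  # True
```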