postgres recursive query on the same table

I have spent almost a day on this now, and it seems like I am doing something wrong.
OK, here is the relation:
document_urls(doc_id, url_id)
What I want to do is build a sort of graph that shows all the children that have been generated from a document through one of its URLs.
Example:
select * from document_urls where doc_id=1
doc_id url_id
1 2
1 3
If I select all the documents with url_id = 2 or 3, I find:
select * from document_urls where url_id=2 or url_id=3
doc_id url_id
1 2
1 3
2 3
Now I do the same exercise with document 2, since we have covered all the links of document 1, and so forth.
Here is my recursive query now:
WITH RECURSIVE generate_links(document_id,url_id) as(
select document_id,url_id from document_urls where document_id=1
UNION ALL
select du.document_id,du.url_id from generate_links gl,document_urls du
where gl.url_id=du.url_id
)
SELECT * FROM generate_links GROUP BY url_id,document_id limit 10;

I take it you want to move your WHERE document_id = 1 into the outer (final) part of the query.
Be wary of doing so, however: a recursive query does not push that constraint down into the WITH clause. In other words, it will seq scan your whole table, recursively build every possible row, and only then filter out the ones you need.
In practice, you'll be better off with an SQL function that takes the starting document as a parameter, something like this:
create or replace function gen_links(int) returns table (doc_id int, doc_url int) as $$
WITH RECURSIVE generate_links(document_id, url_id) as (
select document_id, url_id from document_urls where document_id = $1
UNION -- UNION (not UNION ALL) drops rows already produced, so the recursion stops on cycles
select du.document_id, du.url_id from generate_links gl, document_urls du
where gl.url_id = du.url_id
)
SELECT document_id, url_id FROM generate_links;
$$ language sql stable;
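To check that the recursion terminates, here is a minimal sketch that runs the same recursive CTE against the question's three sample rows. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. This is an assumption made for portability, and the WITH RECURSIVE syntax is the same for this query. The sketch uses UNION rather than UNION ALL. With UNION ALL, row (1, 3) keeps matching itself through url 3, so the working set never empties and the query never finishes. UNION discards rows that were already produced, so the walk stops.

```python
import sqlite3

def gen_links(conn, start_doc):
    """Recursive walk starting from start_doc, mirroring the gen_links() SQL function."""
    sql = """
        WITH RECURSIVE generate_links(document_id, url_id) AS (
            SELECT document_id, url_id FROM document_urls WHERE document_id = ?
            UNION  -- UNION, not UNION ALL: duplicate rows are dropped, so cycles stop
            SELECT du.document_id, du.url_id
            FROM generate_links gl, document_urls du
            WHERE gl.url_id = du.url_id
        )
        SELECT document_id, url_id FROM generate_links ORDER BY document_id, url_id
    """
    return conn.execute(sql, (start_doc,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document_urls (document_id INTEGER, url_id INTEGER)")
# The sample rows from the question.
conn.executemany("INSERT INTO document_urls VALUES (?, ?)", [(1, 2), (1, 3), (2, 3)])

print(gen_links(conn, 1))
```

Starting from document 1, this returns the three rows (1, 2), (1, 3) and (2, 3). These are the same rows the question arrives at by hand.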

Related

How to select the nth column, and order columns' selection in BigQuery

I have this huge table on which I apply a lot of processing (using CTEs), and I want to perform a UNION ALL on 2 particular CTEs.
SELECT *
, 0 AS orders
, 0 AS revenue
, 0 AS units
FROM secondary_prep_cte WHERE purchase_event_flag IS FALSE
UNION ALL
SELECT *
FROM results_orders_and_revenues_cte
I get a "Column 1164 in UNION ALL has incompatible types: STRING, DATE at [97:5]" error.
Obviously I don't know the name of the column. I'd like to debug this, but I feel like I'm going to waste a lot of time if I can't pinpoint which column is number 1164.
I also think this is a problem with the order of columns between the CTEs, so I have 2 questions:
How do I identify the 1164th column
How do I order my columns before performing the UNION ALL
I found this similar question, but it is for MSSQL; I am using BigQuery.

You can get this information from INFORMATION_SCHEMA.COLUMNS, but you'll need to create a table or view from each CTE first:
CREATE OR REPLACE VIEW `project.dataset.secondary_prep_view` as select * from (select 1 as id, "a" as name, "b" as value)
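Then look up the column by its position (ordinal_position), and do the same for the second CTE's view to compare the data types:
SELECT column_name, data_type FROM dataset.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'secondary_prep_view' AND ordinal_position = 1164;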
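INFORMATION_SCHEMA.COLUMNS also has an ordinal_position column. Once both CTEs are saved as views, you can fetch their column lists in ordinal_position order and compare them position by position. The sketch below assumes you have already fetched each list as (column_name, data_type) pairs, and the sample schemas are made up. It also assumes that the column number in BigQuery's error message is 1-based. The sketch reports every position where the types differ. It then builds an explicit select list so that the second SELECT uses the first one's column order instead of SELECT *.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (column_name, data_type), in ordinal_position order

def type_mismatches(left: List[Column], right: List[Column]) -> List[Tuple[int, Column, Column]]:
    """Return (1-based position, left column, right column) wherever the data types differ."""
    return [
        (pos, l, r)
        for pos, (l, r) in enumerate(zip(left, right), start=1)
        if l[1] != r[1]
    ]

def aligned_select(left: List[Column], right: List[Column], source: str) -> str:
    """Build a SELECT over `source` that lists the columns in left's order."""
    right_names = {name for name, _ in right}
    missing = [name for name, _ in left if name not in right_names]
    if missing:
        raise ValueError(f"columns missing from {source}: {missing}")
    return "SELECT " + ", ".join(name for name, _ in left) + " FROM " + source

# Hypothetical schemas: the second CTE has `name` and `day` swapped.
left = [("id", "INT64"), ("name", "STRING"), ("day", "DATE")]
right = [("id", "INT64"), ("day", "DATE"), ("name", "STRING")]

for pos, l, r in type_mismatches(left, right):
    print(f"column {pos}: {l[0]} {l[1]} vs {r[0]} {r[1]}")
print(aligned_select(left, right, "results_orders_and_revenues_cte"))
```

Listing columns explicitly on both sides of the UNION ALL is what actually fixes the ordering problem. The generated string only saves typing 1000+ names by hand.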

How to get records based on multiple columns from a table

Consider the following table.
From the above table, I want to select the middle BFS_SCORE per LN_LOAN_ID and BR_ID. Some LN_LOAN_IDs have only a single score.
For example, for the above table the output I need is as below.
Please let me know how this can be achieved.

To handle cases where there are two scores for a unique (LN_LOAN_ID, BR_ID) pair, you need a median, as there is no single middle value of BFS_SCORE.
Postgres solution:
Create a median aggregate function following the PostgreSQL wiki:
CREATE OR REPLACE FUNCTION _final_median(NUMERIC[])
RETURNS NUMERIC AS
$$
SELECT AVG(val)
FROM (
SELECT val
FROM unnest($1) val
ORDER BY 1
LIMIT 2 - MOD(array_upper($1, 1), 2)
OFFSET CEIL(array_upper($1, 1) / 2.0) - 1
) sub;
$$
LANGUAGE sql IMMUTABLE;
CREATE AGGREGATE median(NUMERIC) (
SFUNC=array_append,
STYPE=NUMERIC[],
FINALFUNC=_final_median,
INITCOND='{}'
);
Then your query would look as simple as this:
select
ln_loan_id,
median(bfs_score) as bfs_score,
br_id
from yourtable
group by ln_loan_id, br_id;
The tricky part is score_order. If a group has an even number of scores and you really need the median rather than the middle value, no row will match the calculated score, so score_order will be NULL. Otherwise, join back to your table to retrieve score_order for the "middle" row:
select
t1.ln_loan_id, t1.bfs_score, t1.br_id, t2.score_order
from (
select
ln_loan_id,
median(bfs_score) as bfs_score,
br_id
from yourtable
group by ln_loan_id, br_id
) t1
left join yourtable t2 on
t1.ln_loan_id = t2.ln_loan_id
and t1.br_id = t2.br_id
and t1.bfs_score = t2.bfs_score;
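The LIMIT/OFFSET arithmetic in _final_median works like this. For a count n, it skips ceil(n/2) - 1 sorted values and keeps 2 - (n mod 2) of them. That is one middle value when n is odd and the two middle values, averaged, when n is even. The sketch below reproduces that logic as a Python aggregate registered with sqlite3, with SQLite standing in for PostgreSQL, and then runs the grouped query. The sample scores are hypothetical, and the sketch uses the question's LN_LOAN_ID spelling. Unlike the array_append version, it simply skips NULL scores.

```python
import math
import sqlite3

class Median:
    """SQLite aggregate mirroring _final_median: collect values, then average the middle one or two."""

    def __init__(self):
        self.vals = []

    def step(self, value):
        if value is not None:          # this sketch ignores NULL scores
            self.vals.append(value)

    def finalize(self):
        n = len(self.vals)
        if n == 0:
            return None
        ordered = sorted(self.vals)
        offset = math.ceil(n / 2) - 1  # OFFSET CEIL(n / 2.0) - 1
        limit = 2 - n % 2              # LIMIT 2 - MOD(n, 2)
        middle = ordered[offset:offset + limit]
        return sum(middle) / len(middle)

conn = sqlite3.connect(":memory:")
conn.create_aggregate("median", 1, Median)
conn.execute("CREATE TABLE yourtable (ln_loan_id INTEGER, br_id INTEGER, bfs_score INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 10, 600), (1, 10, 650), (1, 10, 700),  # odd count: middle value 650
     (2, 20, 500), (2, 20, 540),                # even count: average of 500 and 540
     (3, 30, 710)],                             # single score
)
rows = conn.execute(
    "SELECT ln_loan_id, median(bfs_score) AS bfs_score, br_id "
    "FROM yourtable GROUP BY ln_loan_id, br_id ORDER BY ln_loan_id"
).fetchall()
print(rows)
```

Group 2 shows the case described above: its median (520) does not exist as a row, so joining back on bfs_score finds no score_order for it.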

Finding max value of multiple columns from multiple tables to update Sequence

I had a problem where the DBAs needed to recreate my sequence (it had to be created with "NOCACHE"). Unfortunately, they dropped the sequence before grabbing its current value! The problem is that, from what I can tell, almost 25 tables use this sequence. My plan was to find the max value of each table's primary key "ID" field, then run a loop on the sequence to bring it back up.
What I'm hoping to do now is clean up my "ugly" process into a more streamlined one that I can put in my documentation (in case this happens again!).
My original solution was to do something like the following:
SELECT 'TABLE_1','TABLE_1_ID', MAX(TABLE_1_ID) from TABLE_1
UNION ALL
SELECT 'TABLE_2','TABLE_2_ID', MAX(TABLE_2_ID) from TABLE_2
UNION ALL
SELECT 'TABLE_3','TABLE_3_ID', MAX(TABLE_3_ID) from TABLE_3
UNION ALL
...... (continue select statements for other 20+ tables)
SELECT 'TABLE_25','TABLE_25_ID', MAX(TABLE_25_ID) from TABLE_25
ORDER BY 3 DESC;
This works, putting the table with the highest MAX at the top, but to clean it up I'd like to:
1. Simplify the query (and eliminate the UNION ALL) if possible.
2. Run a query that returns just a single row.
This would be gravy: I have a loop that runs through the next value of the sequence, and that loop starts off with:
declare
COL_MaxVal pls_integer;
SEQ_Currval pls_integer default -1;
BEGIN
SELECT MAX(TABLE_X_ID) INTO COL_MaxVal
FROM TABLE_X;
while SEQ_Currval < COL_MaxVal
loop
select My_Sequence_SEQ.nextval into SEQ_Currval
from dual;
end loop;
end;
If possible, I'd like to run just the loop script, which would discover which table/column has the highest max value and then use it in the loop to increment the sequence up to that value.
Appreciate any help on this.

Here is a solution that returns one row:
WITH all_data as
(
SELECT 'TABLE_1' as table_name, 'TABLE_1_ID' as column_name, MAX(TABLE_1_ID) as id from TABLE_1
UNION ALL
SELECT 'TABLE_2','TABLE_2_ID', MAX(TABLE_2_ID) from TABLE_2
UNION ALL
SELECT 'TABLE_3','TABLE_3_ID', MAX(TABLE_3_ID) from TABLE_3
UNION ALL
...... (continue select statements for other 20+ tables)
SELECT 'TABLE_25','TABLE_25_ID', MAX(TABLE_25_ID) from TABLE_25
),
max_id as
(
SELECT max(id) as id FROM all_data
)
SELECT
ad.*
FROM
all_data ad
JOIN max_id mi ON (ad.id = mi.id)
I can't see any simpler solution for this.
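The 25 branches differ only in table and column names, so the UNION ALL can be generated from a list instead of typed by hand. This keeps the documented procedure short. The sketch below is a minimal version that assumes the TABLE_n / TABLE_n_ID naming from the question. The three tables and their rows are hypothetical, and sqlite3 stands in for Oracle. It builds the all_data / max_id query from the answer above and returns the single row with the highest ID.

```python
import sqlite3

def max_id_query(tables):
    """Build the all_data / max_id query for a list of (table, id_column) pairs."""
    # Names are interpolated directly into the SQL: only use a fixed, trusted list.
    branches = "\nUNION ALL\n".join(
        f"SELECT '{t}' AS table_name, '{c}' AS column_name, MAX({c}) AS id FROM {t}"
        for t, c in tables
    )
    return (
        f"WITH all_data AS ({branches}), "
        "max_id AS (SELECT MAX(id) AS id FROM all_data) "
        "SELECT ad.* FROM all_data ad JOIN max_id mi ON ad.id = mi.id"
    )

conn = sqlite3.connect(":memory:")
tables = [(f"TABLE_{n}", f"TABLE_{n}_ID") for n in (1, 2, 3)]
for (t, c), ids in zip(tables, ([1, 5, 7], [12, 40], [3])):
    conn.execute(f"CREATE TABLE {t} ({c} INTEGER)")
    conn.executemany(f"INSERT INTO {t} VALUES (?)", [(i,) for i in ids])

print(conn.execute(max_id_query(tables)).fetchall())
```

With this sample data, the query returns one row, ('TABLE_2', 'TABLE_2_ID', 40). That row gives the table and value to use as the loop's target.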

If it's not too late, your DBA might try a flashback query against the data dictionary, e.g.:
SELECT * FROM dba_sequences AS OF TIMESTAMP systimestamp - 1/24;
Your safe value should be last_number + cache size. See details in:
LAST_NUMBER on oracle sequence

PostgreSQL: Serially apply table-valued function to a set of values and UNION ALL results

I have a table-valued PL/pgSQL function that takes one input, an integer ID. The table it returns has fixed columns (say 5) but a varying number of rows.
There is a large table of these unique IDs. I'd like to apply this function to each ID and UNION ALL the results.
Looking online, I keep seeing CROSS APPLY as the solution, but it does not appear to be available in PostgreSQL. How can I do this "apply" operation?
One trivial solution is to rewrite the table-valued function with an additional outer loop. But is there a way to do this directly in SQL?

I think it's impossible to do in the current version of PostgreSQL (9.2). In 9.3 there will be LATERAL joins, which do exactly what you want.
You can, however, apply a set-returning function in the select list:
select id, func(id) as f from tbl1
SQL Fiddle demo:
create table t (id int);
insert into t (id) select generate_series(1, 10);
create or replace function f (i integer)
returns table(id_2 integer, id_3 integer) as $$
select id * 2 as id_2, id * 3 as id_3
from t
where id between i - 1 and i + 1
$$ language sql;
select id, (f(id)).*
from t;
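CROSS APPLY, and PostgreSQL 9.3's `select t.id, x.* from t cross join lateral f(t.id) as x;`, call the function once per outer row and pair each returned row with that outer row. They then concatenate everything, which is the "apply and UNION ALL" the question asks for. The sketch below spells that out in plain Python with a stand-in for the fiddle's f(). The helper name apply_union_all is illustrative and not part of any library.

```python
from typing import Callable, Iterable, List, Tuple

def apply_union_all(ids: Iterable[int],
                    fn: Callable[[int], List[Tuple[int, ...]]]) -> List[Tuple[int, ...]]:
    """Call fn once per id, prefix each returned row with that id, and concatenate (UNION ALL)."""
    result = []
    for i in ids:
        for row in fn(i):
            result.append((i,) + row)
    return result

T = list(range(1, 11))  # contents of table t in the fiddle: 1..10

def f(i: int) -> List[Tuple[int, int]]:
    """Stand-in for the fiddle's f(i): (id * 2, id * 3) for each id in t between i - 1 and i + 1."""
    return [(x * 2, x * 3) for x in T if i - 1 <= x <= i + 1]

rows = apply_union_all(T, f)
print(len(rows), rows[:2])
```

Each id contributes a different number of rows (2 for the ends, 3 in the middle), which is the "fixed columns, varying rows" shape from the question.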

How to give the output of the first query (which has two values) as the input to the second?

I get 2 names as the output of the first query,
e.g. paul, peter.
Now this should be the input for the second query,
which has to display Paul's and Peter's email IDs.

For nested queries, I would strongly recommend the WITH clause. It makes long, complex queries an order of magnitude easier to understand, construct, and modify:
WITH
w_users AS( -- you can name it whatever you want
SELECT id
FROM users
WHERE < long condition here >
),
w_other_subquery AS(
...
)
SELECT email_id
FROM ...
WHERE user_id IN (SELECT id FROM w_users)
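Here is a runnable version of that pattern with the placeholders filled in. The users and emails tables, their columns, and the "active" condition are hypothetical, and sqlite3 stands in for the asker's database. The first query, which returns paul and peter, becomes a named CTE, and the second query filters on it with IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
    CREATE TABLE emails (user_id INTEGER, email_id TEXT);
    INSERT INTO users VALUES (1, 'paul', 1), (2, 'peter', 1), (3, 'mary', 0);
    INSERT INTO emails VALUES (1, 'paul@example.com'), (2, 'peter@example.com'),
                              (3, 'mary@example.com');
""")

query = """
    WITH w_users AS (            -- the "first query": returns paul and peter
        SELECT id FROM users WHERE active = 1
    )
    SELECT email_id
    FROM emails
    WHERE user_id IN (SELECT id FROM w_users)  -- IN accepts many values; = compares against one
    ORDER BY email_id
"""
print(conn.execute(query).fetchall())
```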

You can use it like this:
SELECT USER_ID,EMAIL_ID FROM USERS where user_id IN
(SELECT PRODUCT_MEMBERS FROM PRODUCT WHERE PRODUCT_NAME='ICP/RAA');
Just use the IN clause: '=' only works when the subquery returns a single value.

You can use the IN operator to get the result,
e.g.:
SELECT email FROM tableName WHERE (Name IN ('paul', 'peter'))
