Aggregate multiple select statements without replicating data - sql

How do I aggregate two SELECT clauses without replicating data?
For instance, suppose I have tab_a that contains the data from 1 to 10:
|id|
|1 |
|2 |
|3 |
|. |
|. |
|10|
Then I want to generate the combinations of tab_b and tab_c, making sure the result has 10 rows, and add the column from tab_a to each resulting tuple.
Script:
SELECT tab_b.id, tab_c.id, tab_a.id
from tab_b, tab_c, tab_a;
However, this replicates the data from tab_a for every combination of tab_b and tab_c. What I want is for each combination of tab_b x tab_c to be paired with exactly one row of tab_a.
Example of data from tab_b
|id|
|1 |
|2 |
Example of data from tab_c
|id|
|1 |
|2 |
|3 |
|4 |
|5 |
I would like to get this output:
|tab_b.id|tab_c.id|tab_a.id|
|1 |1 |1 |
|2 |1 |2 |
|1 |2 |3 |
|... |... |... |
|2 |5 |10 |

Your question includes an unstated, invalid assumption: that the position of the values in the table (the row number) is meaningful in SQL. It's not. In SQL, rows have no order. All joins -- everything, in fact -- are based on values. To join tables, you have to supply the values the DBMS should use to determine which rows go together.
You got a hint of that with your attempted join: from tab_b, tab_c, tab_a. You didn't supply any basis for joining the rows, which in SQL means there's no restriction: all rows are "the same" for the purpose of this join. They all match, and voila, you get them all!
To do what you want, redesign your tables with at least one more column: the key that serves to identify the value. It could be a number; for example, your source data might be an array. More commonly each value has a name of some kind.
Once you have tables with keys, I think you'll find the join easier to write and understand.
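As a rough sketch of what that keyed redesign could look like (the columns b_id and c_id are purely illustrative, not something the question prescribes), tab_a would record which (tab_b, tab_c) pair each of its rows belongs to, and the join is then driven by those values:
-- Illustrative schema only: tab_a carries explicit keys to tab_b and tab_c.
CREATE TABLE tab_b (id int PRIMARY KEY);
CREATE TABLE tab_c (id int PRIMARY KEY);
CREATE TABLE tab_a (
    id   int PRIMARY KEY,
    b_id int REFERENCES tab_b (id),
    c_id int REFERENCES tab_c (id)
);

-- The join is now based on values, not on row position.
SELECT a.b_id, a.c_id, a.id
FROM tab_a a
JOIN tab_b b ON b.id = a.b_id
JOIN tab_c c ON c.id = a.c_id;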

Perhaps you're new to SQL, but this is generally not the way things are done with RDBMSs. Anyway, if this is what you need, PostgreSQL can deal with it nicely, using different strategies:
Window Functions:
with
    tab_a (id) as (select generate_series(1, 10)),
    tab_b (id) as (select generate_series(1, 2)),
    tab_c (id) as (select generate_series(1, 5))
select tab_b_id, tab_c_id, tab_a.id
from (select *, row_number() over () from tab_a) as tab_a
left join (
    select tab_b.id as tab_b_id, tab_c.id as tab_c_id, row_number() over ()
    from tab_b, tab_c
    order by 2, 1
) tabs_b_c on (tabs_b_c.row_number = tab_a.row_number)
order by tab_a.id;
Arrays:
with
    tab_a (id) as (select generate_series(1, 10)),
    tab_b (id) as (select generate_series(1, 2)),
    tab_c (id) as (select generate_series(1, 5))
select bc[s][1], bc[s][2], a[s]
from (
    select array(
               select id
               from tab_a
               order by 1
           ) a,
           array(
               select array[tab_b.id, tab_c.id]
               from tab_b, tab_c
               order by tab_c.id, tab_b.id
           ) bc
) arr
join lateral generate_subscripts(arr.a, 1) s on true;

If I understand your question correctly, maybe this is what you are looking for:
SELECT bctable.b_id, bctable.c_id, atable.a_id
FROM (SELECT a_id, ROW_NUMBER() OVER () AS arnum FROM a) atable
JOIN (SELECT p.b_id, p.c_id, ROW_NUMBER() OVER () AS bcrnum
      FROM (SELECT b.b_id, c.c_id
            FROM b CROSS JOIN c
            ORDER BY c.c_id, b.b_id) p) bctable
  ON atable.arnum = bctable.bcrnum;
Please check the SQLFiddle.
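Since the SQLFiddle itself isn't reproduced here, a minimal setup to try the query might look like this (table and column names are inferred from the statement above, and generate_series assumes PostgreSQL):
CREATE TABLE a (a_id int);
CREATE TABLE b (b_id int);
CREATE TABLE c (c_id int);

-- Sample data matching the question: 10 rows in a, 2 in b, 5 in c.
INSERT INTO a SELECT generate_series(1, 10);
INSERT INTO b SELECT generate_series(1, 2);
INSERT INTO c SELECT generate_series(1, 5);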

Related

Concise SQL for joining many tables on a single column

I have about 122 tables that all share a particular column. Is there an elegant/concise method to join all of these tables on that column without having 121 instances of
join on A.id = B.id
in the query?
If the column in question has the same name in all the tables (which it should), then you can use this shorter syntax:
SELECT ... FROM table1 JOIN table2 USING (column)
The column will also appear only once in the result, instead of being present for each table. More details here.
You will still have to do it for each table, though.
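As a small sketch (the tables t1, t2, t3 are hypothetical), chaining the shorter syntax across several tables looks like this; the shared column still has to be named once per join:
-- Each USING (id) joins on id and collapses it to a single output column.
SELECT *
FROM t1
JOIN t2 USING (id)
JOIN t3 USING (id);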
Here goes your solution:
Create table and insert statement:
create table splitUpdate (no int,productname varchar(10),productcrossell varchar(20));
insert into splitUpdate values (1,'a','a(1)');
insert into splitUpdate values (2,null,'c(4),d(5)');
insert into splitUpdate values (3,null,'Z(1),b(2)');
create table eleminate (product varchar(20));
insert into eleminate values('x');
insert into eleminate values('y');
insert into eleminate values('Z');
insert into eleminate values('z');
Update Query:
with cte as (
    select no,
           productname,
           p.product,
           row_number() over (partition by no) rn,
           substring(p.product from 1 for position('(' in p.product) - 1) SplittedProduct
    from splitupdate t,
         unnest(string_to_array(t.productcrossell, ',')) p(product)
    where substring(p.product from 1 for position('(' in p.product) - 1) not in (select product from eleminate)
)
update splitupdate set productname = splittedproduct
from cte
where splitupdate.productname is null
  and splitupdate.no = cte.no
  and cte.rn = 1;
SplitUpdate Table before updating:
|no|productname|productcrossell|
|1 |a    |a(1) |
|2 |null |c(4),d(5) |
|3 |null |Z(1),b(2) |
Result:
|no|productname|productcrossell|
|1 |a |a(1) |
|2 |c |c(4),d(5) |
|3 |b |Z(1),b(2) |

Iterating over groups in table

I have the following data:
cte1
===========================
m_ids |p_id |level
---------|-----------|-----
{123} |98 |1
{123} |111 |2
{432,222}|215 |1
{432,222}|215 |1
{432,222}|240 |2
{432,222}|240 |2
{432,222}|437 |3
{432,222}|275 |3
I have to perform the following operation:
Extract p_id with the following algorithm:
1. Group rows by m_ids.
2. In each group:
2.I. Group records by p_id
2.II. Order records by level, descending
2.III. Select the p_id whose count equals the m_ids length and which has the biggest level
So far I have failed to write this algorithm completely, but I wrote this for the last part of it (probably wrong where I'm getting array_length):
SELECT id
FROM grouped_cte1
GROUP BY id,
level
HAVING Count(*) = array_length(grouped_cte1.m_ids, 1)
ORDER BY level DESC
LIMIT 1
where grouped_cte1 for m_ids={123} is
m_ids |p_id |level
---------|-----------|-----
{123} |98 |1
{123} |111 |2
and for m_ids={432,222} is
m_ids |p_id |level
---------|-----------|-----
{432,222}|215 |1
{432,222}|215 |1
{432,222}|240 |2
{432,222}|240 |2
{432,222}|437 |3
{432,222}|275 |3
etc.
2) Combine the query from part 1 with the following query, which extracts the p_id with level=1 for each m_ids:
select m_ids, p_id from cte1 where level=1 -- also selecting m_ids for joining later
which results in the following:
m_ids |p_id
---------|----
{123} |98
{432,222}|215
Desirable result:
m_ids |result_1 |result_2
---------|-----------|--------
{123} |111 |98
{432,222}|240 |215
So could anyone please help me solve the first part of algorithm and (optionally) combine it in a single query with the second part?
EDIT: So far I fail at:
1. Breaking the presented table into subtables by m_ids while iterating over it.
2. Performing computation of array_length(grouped_cte1.m_ids, 1) for corresponding rows in query.
For the first part of the query you're on the right track, but you need to change the grouping logic and then join back to the table to filter it by the highest level per m_ids, for which you can use the DISTINCT ON clause combined with proper sorting:
select distinct on (t.m_ids)
       t.m_ids, t.p_id, t.level
from cte1 t
join (
    select m_ids, p_id
    from cte1
    group by m_ids, p_id
    having count(*) = array_length(m_ids, 1)
) as g using (m_ids, p_id)
order by t.m_ids, t.level desc;
This would give you:
m_ids | p_id | level
-----------+------+-------
{123} | 111 | 2
{432,222} | 240 | 2
Then, combined with the second query (using a FULL JOIN for display purposes, in case the first query returns no row for some m_ids), and with DISTINCT added since there can be (and in fact is) more than one record per (m_ids, p_id) pair at the first level, it looks like this:
select coalesce(r1.m_ids, r2.m_ids) as m_ids,
       r1.p_id as result_1,
       r2.p_id as result_2
from (
    select distinct on (t.m_ids)
           t.m_ids, t.p_id, t.level
    from cte1 t
    join (
        select m_ids, p_id
        from cte1
        group by m_ids, p_id
        having count(*) = array_length(m_ids, 1)
    ) as g using (m_ids, p_id)
    order by t.m_ids, t.level desc
) r1
full join (
    select distinct m_ids, p_id
    from cte1
    where level = 1
) r2 on r1.m_ids = r2.m_ids;
giving you result:
m_ids | result_1 | result_2
-----------+----------+----------
{123} | 111 | 98
{432,222} | 240 | 215
This may look different from what you expected, but from my understanding of the logic it is the correct one. If I misunderstood anything, please let me know.
Just for the sake of logic explanation, one point:
Why m_ids with {123} returns 111 for result_1?
for group of m_ids = {123} we have two distinct p_id values
both 98 and 111 account for the condition of equality count with the m_ids length
p_id = 111 has a higher level, thus is chosen for the result_1
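A quick way to verify this reasoning (a sketch against the cte1 data above; comparing m_ids to the literal '{123}' assumes it is an integer array column):
-- For m_ids = {123}: array_length is 1, both p_ids reach that count,
-- and p_id 111 has the higher level, so it is chosen as result_1.
select p_id, count(*) as cnt, max(level) as max_level
from cte1
where m_ids = '{123}'
group by p_id
order by max_level desc;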

SQL to combine rows into a single row based on a column

I have a table as follow:
+---+---+---+
|obj|col|Val|
+---+---+---+
|1 |c1 | v1|
+---+---+---+
|1 |c2 | v2|
+---+---+---+
|2 |c1 | v3|
+---+---+---+
|2 |c2 | v4|
+---+---+---+
And I am looking for SQL that will give the result in the following format
+---+---+---+
|obj|c1 |c2 |
+---+---+---+
|1 |v1 | v2|
+---+---+---+
|2 |v3 | v4|
+---+---+---+
In this SQL I check for col = 'c1' / 'c2' and print the corresponding Val. The reason for the GROUP BY is to avoid all the NULL values produced when the condition doesn't match: by grouping on obj, the NULLs are collapsed and the desired result is produced.
SELECT obj,
MAX( CASE WHEN col = 'c1' THEN Val END ) AS c1,
MAX( CASE WHEN col = 'c2' THEN Val END ) AS c2
FROM Table
GROUP BY obj;
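If you want to try it out, here is a minimal sketch in PostgreSQL (the table is named t here because Table is a reserved word; everything else follows the question's data):
create table t (obj int, col text, val text);
insert into t values
    (1, 'c1', 'v1'),
    (1, 'c2', 'v2'),
    (2, 'c1', 'v3'),
    (2, 'c2', 'v4');

-- Conditional aggregation pivots col/val pairs into one column per col value.
select obj,
       max(case when col = 'c1' then val end) as c1,
       max(case when col = 'c2' then val end) as c2
from t
group by obj
order by obj;
-- obj | c1 | c2
--   1 | v1 | v2
--   2 | v3 | v4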
First you need to select all the unique ids from your table:
select distinct id
from a_table_you_did_not_name
Now you can use that to left join to your columns:
select base.id, one.val as c1, two.val as c2
from (
select distinct id
from a_table_you_did_not_name
) base
left join a_table_you_did_not_name one on one.id = base.id and one.col = 'c1'
left join a_table_you_did_not_name two on two.id = base.id and two.col = 'c2'
Note: your case is a relatively simple instance of this kind of join -- I coded it like this because this method can be extended to more complicated cases and still work. There are other ways to meet this particular requirement that might be simpler.
Specifically, the most common complication is joining to multiple tables rather than the same table each time; this method still works in those cases, as sketched below.
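For example, the same shape with two physically separate tables (table_one and table_two are hypothetical names) would be:
-- Assumes at most one row per id in each table; the base set of ids is
-- built from both tables so that no id is lost.
select base.id, one.val as c1, two.val as c2
from (
    select id from table_one
    union
    select id from table_two
) base
left join table_one one on one.id = base.id
left join table_two two on two.id = base.id;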

Fetch data from multiple tables in postgresql

I am working on an application where I want to fetch records from multiple tables that are connected through a foreign key. The query I am using is:
select ue.institute, ue.marks, uf.relation, uf.name
from user_education ue, user_family uf where ue.user_id=12 and uf.user_id=12
You can see in the result of the query that the data is repeating. I only want each record once, with no repetition. I want something like this:
T1
id|name|fid
1 |A   |1
2 |B   |1
2 |B   |1
T2
id|descrip|fid
1 |DA     |1
2 |DB     |1
Result which I want:
id|name|fid|id|descrip|fid
1 |A   |1  |1 |DA     |1
2 |B   |1  |2 |DB     |1
2 |B   |1  |  |       |
The results fetched through this query total 5 rows.
More Information
I want the rows with the same user_id from both tables, but as you can see, T1 has 3 rows and T2 has 2 rows. I do not want repetitions, yet I still want to fetch all the data for that user_id.
I can't see why you would want that, but the solution could be to use the window function row_number():
SELECT ue.institute, ue.marks, uf.relation, uf.name
FROM (SELECT institute, marks, row_number() OVER ()
FROM user_education
WHERE user_id=12) ue
FULL OUTER JOIN
(SELECT relation, name, row_number() OVER ()
FROM user_family
WHERE user_id=12) uf
USING (row_number);
The result would be pretty meaningless though, as there is no ordering defined in the individual result sets.
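If a deterministic pairing is needed, the row numbers require an explicit sort key. A minimal sketch, assuming institute and name are acceptable orderings for the two subqueries (an assumption, since the question doesn't define any order):
SELECT ue.institute, ue.marks, uf.relation, uf.name
FROM (SELECT institute, marks,
             row_number() OVER (ORDER BY institute) AS rn  -- explicit sort key (assumed)
      FROM user_education
      WHERE user_id = 12) ue
FULL OUTER JOIN
     (SELECT relation, name,
             row_number() OVER (ORDER BY name) AS rn       -- explicit sort key (assumed)
      FROM user_family
      WHERE user_id = 12) uf
USING (rn);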

I need to find the missing IDs from the table after checking the min and max IDs from another table

I need to find the missing IDs from the table #a below:
id |SEQ|Text
1 |1 |AA
1 |3 |CC
1 |4 |DD
1 |5 |EE
1 |6 |FF
1 |7 |GG
1 |8 |HH
1 |10 |JJ
2 |1 |KK
2 |2 |LL
2 |3 |MM
2 |4 |NN
2 |6 |PP
2 |7 |QQ
3 |1 |TT
3 |4 |ZZ
3 |5 |XX
The max and min SEQ of the table #a is stored in another table #b:
id| mn| mx
1 | 1 | 12
2 | 1 | 9
3 | 1 | 5
My query below is giving the correct output but the execution is expensive. Is there another way to solve this?
with cte as
(
    select id, mn, mx
    from #b
    union all
    select id, mn, mx - 1
    from cte
    where mx - 1 > 0
)
select cte.id, cte.mx
from cte
left join #a on cte.id = #a.id and cte.mx = #a.seq
where #a.seq is null
order by cte.id, cte.mx
There are mainly two problems with this query:
The query is running very slowly. The records above are just an example; in the real database I have 50,000 rows.
I tried to understand the execution plan to detect the hiccups. However, I could not understand some parts of it, which I have highlighted in red.
It would be great if someone could help me here. I am stuck.
You use a recursive CTE to generate a set of numbers. That is quite an inefficient way to do it (see charts for generating 50K numbers here). I'd recommend having a persisted table of numbers in the database. I personally keep a table Numbers with 100K rows and a single column Number, the primary key, holding the integers from 1 to 100,000.
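One way to build such a table (a sketch only; the table and column names follow the description above, and the cross join of sys.all_objects is just a convenient row generator in SQL Server):
CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY);

-- Generate 1..100000 by numbering rows from a large cross join.
INSERT INTO Numbers (Number)
SELECT TOP (100000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;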
Once you have such a table, your query simplifies to this:
SELECT #b.id, Numbers.Number
FROM #b
INNER JOIN Numbers ON
    #b.mx >= Numbers.Number AND
    #b.mn <= Numbers.Number -- use this clause if mn can be more than 1
LEFT JOIN #a ON
    #a.id = #b.id AND
    #a.seq = Numbers.Number
WHERE #a.seq IS NULL
ORDER BY #b.id, Numbers.Number
Also, it goes without saying that you have to make sure you have an index on #b (id), plus an index on #a (id, seq).
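For the temp tables in the question that would be along these lines (index names are arbitrary):
CREATE INDEX IX_b_id ON #b (id);
CREATE INDEX IX_a_id_seq ON #a (id, seq);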
Two things that come to my mind are:
Use a numbers / tally table, either by creating a normal table or by building a virtual one with a CTE, and use it to find the numbers that don't exist (see the sketch after the row_number() example below).
If there aren't a lot of missing numbers, you can use a trick with row_number() to find the ranges of numbers that have no gaps, with something like this:
select id, min(seq), max(seq)
from (
    select id,
           seq,
           seq - row_number() over (partition by id order by seq asc) grp
    from table1
) x
group by id, grp
order by 1
This will of course need more handling after you have found the ranges of numbers that exist.
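A sketch of the first option: a virtual tally table built with a CTE and used to list the missing seq values per id (the TOP (50000) cap and the sys.all_objects cross join are just one way to generate enough rows):
WITH tally AS (
    SELECT TOP (50000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
)
SELECT b.id, t.n AS missing_seq
FROM #b b
JOIN tally t ON t.n BETWEEN b.mn AND b.mx
LEFT JOIN #a a ON a.id = b.id AND a.seq = t.n
WHERE a.seq IS NULL
ORDER BY b.id, t.n;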
A CTE is just syntax and is most likely getting evaluated multiple times. Materialize the CTE output into a #temp table and join to the #temp instead.
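A sketch of that suggestion, reusing the recursive CTE from the question but writing its output to a temp table once and then joining against it (OPTION (MAXRECURSION 0) is only needed if mx can exceed the default limit of 100):
WITH cte AS (
    SELECT id, mn, mx FROM #b
    UNION ALL
    SELECT id, mn, mx - 1 FROM cte WHERE mx - 1 > 0
)
SELECT id, mx
INTO #expanded
FROM cte
OPTION (MAXRECURSION 0);

SELECT e.id, e.mx
FROM #expanded e
LEFT JOIN #a a ON a.id = e.id AND a.seq = e.mx
WHERE a.seq IS NULL
ORDER BY e.id, e.mx;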