Not able to join two tables with limit in Postgres - sql

I have table A with columns col1, col2, col3 and table B with column col1.
I want to join both tables using a limit.
I want something like
select a.col1,a.col2,a.col3,b.col1
from tableA a, tableB b limit 5 and a.col1 between 1 AND 10;
I have 10 records in table b and 10 in table a. I should get a total of 50 records by limiting the join to only 5 records from table b.

Your description translates to a CROSS JOIN:
SELECT a.col1, a.col2, a.col3, b.b_col1 -- unique column names
FROM tablea a
CROSS JOIN ( SELECT col1 AS b_col1 FROM tableb LIMIT 5 ) b;
-- WHERE a.col1 BETWEEN 1 AND 10; -- see below
... and LIMIT for tableb like a_horse already demonstrated. LIMIT without ORDER BY returns arbitrary rows. The result can change from one execution to the next.
To select random rows from tableb:
...
CROSS JOIN ( SELECT col1 AS b_col1 FROM tableb ORDER BY random() LIMIT 5) b;
If your table is big consider:
Best way to select random rows PostgreSQL
While you "have 10 records in ... table a", the added WHERE condition is either redundant or wrong to get 50 rows.
And while SQL allows it, it rarely makes sense to have multiple result columns of the same name. Some clients throw an error right away. Use a column alias to make names unique.
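The row arithmetic above (10 rows times 5 limited rows gives 50) can be checked with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres, and the sample values are made up for illustration; an ORDER BY is added inside the derived table so the 5 rows picked are deterministic.

```python
import sqlite3

# In-memory database; SQLite stands in for Postgres here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.execute("CREATE TABLE tableb (col1 INTEGER)")

# 10 rows in each table, as in the question; the values are illustrative.
conn.executemany(
    "INSERT INTO tablea VALUES (?, ?, ?)",
    [(i, f"a{i}", f"x{i}") for i in range(1, 11)],
)
conn.executemany("INSERT INTO tableb VALUES (?)", [(i,) for i in range(101, 111)])

# Limit tableb to 5 rows in a derived table, then cross join.
rows = conn.execute(
    """
    SELECT a.col1, a.col2, a.col3, b.b_col1
    FROM tablea a
    CROSS JOIN (SELECT col1 AS b_col1 FROM tableb ORDER BY col1 LIMIT 5) b
    """
).fetchall()

print(len(rows))  # 50
```

Every row of tablea is paired with each of the 5 selected rows of tableb, which is why no join condition is needed.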

You need a derived table (aka "sub-query") for that. In the derived table, you can limit the number of rows.
select a.col1, a.col2, b.col3, b.col1
from tablea a
join (
select col3, col1 -- the alias b is not visible inside the derived table
from tableb
limit 5 -- makes no sense without an ORDER BY
) b on b.some_column = a.some_column --<< you need a join condition
where a.col1 between 1 and 10;
Note that using LIMIT without an ORDER BY usually makes no sense.

Related

SQL function to create a one-to-one match between two tables?

I am trying to join 2 tables. Table_A has ~145k rows whereas Table_B has ~205k rows.
They have two columns in common (i.e. ISIN and date). However, when I execute this query:
SELECT A.*,
B.column_name
FROM Table_A A
JOIN
Table_B B ON A.date = B.date
WHERE A.isin = B.isin
I get a table with more than 147k rows. How is it possible? Shouldn't it return a table with at most ~145k rows?
What you are seeing indicates that, for some of the records in Table_A, there are several records in Table_B that satisfy the join conditions (equality on the (date, isin) tuple).
To exhibit these records, you can do:
select B.date, B.isin
from Table_A A
join Table_B B on A.date = B.date and A.isin = B.isin
group by B.date, B.isin
having count(*) > 1
It's up to you to define how to handle those duplicates. For example:
if the duplicates have different values in column column_name, then you can decide to pull out the maximum or minimum value
or use another column to filter on the top or lower record within the duplicates
if the duplicates are true duplicates, then you can use select distinct in a subquery to dedup them before joining
... other solutions are possible ...
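To see how duplicate keys in Table_B inflate the join, here is a small sketch using Python's sqlite3 module. The sample rows are invented for illustration; only the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_A (isin TEXT, date TEXT)")
conn.execute("CREATE TABLE Table_B (isin TEXT, date TEXT, column_name INTEGER)")

conn.executemany(
    "INSERT INTO Table_A VALUES (?, ?)",
    [("X1", "2020-01-01"), ("X2", "2020-01-01"), ("X3", "2020-01-02")],
)
# ("X1", "2020-01-01") appears twice in Table_B: a duplicate join key.
conn.executemany(
    "INSERT INTO Table_B VALUES (?, ?, ?)",
    [
        ("X1", "2020-01-01", 10),
        ("X1", "2020-01-01", 11),
        ("X2", "2020-01-01", 20),
        ("X3", "2020-01-02", 30),
    ],
)

joined = conn.execute(
    """
    SELECT A.*, B.column_name
    FROM Table_A A
    JOIN Table_B B ON A.date = B.date AND A.isin = B.isin
    """
).fetchall()

dupes = conn.execute(
    """
    SELECT B.date, B.isin, COUNT(*)
    FROM Table_A A
    JOIN Table_B B ON A.date = B.date AND A.isin = B.isin
    GROUP BY B.date, B.isin
    HAVING COUNT(*) > 1
    """
).fetchall()

print(len(joined))  # 4, although Table_A has only 3 rows
print(dupes)        # [('2020-01-01', 'X1', 2)]
```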
If you want one row per table A, then use outer apply:
SELECT A.*,
B.column_name
FROM Table_A a OUTER APPLY
(SELECT TOP (1) b.*
FROM Table_B b
WHERE A.date = B.date AND A.isin = B.isin
ORDER BY ? -- you can specify *which* row you want when there are duplicates
) b;
OUTER APPLY implements a lateral join. The TOP (1) ensures that at most one row is returned. The OUTER (as opposed to CROSS) ensures that nothing is filtered out. In this case, you could also phrase it as a correlated subquery.
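The correlated-subquery phrasing mentioned above can be sketched as follows. SQLite (used here through Python's sqlite3 module) has no OUTER APPLY or TOP, so LIMIT 1 plays that role; the sample data and the choice to order by column_name descending are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_A (isin TEXT, date TEXT)")
conn.execute("CREATE TABLE Table_B (isin TEXT, date TEXT, column_name INTEGER)")

conn.executemany(
    "INSERT INTO Table_A VALUES (?, ?)",
    [("X1", "2020-01-01"), ("X2", "2020-01-01"), ("X3", "2020-01-02")],
)
# X1 has two matches in Table_B, X2 has one, X3 has none.
conn.executemany(
    "INSERT INTO Table_B VALUES (?, ?, ?)",
    [("X1", "2020-01-01", 10), ("X1", "2020-01-01", 11), ("X2", "2020-01-01", 20)],
)

# One row per Table_A row; the scalar subquery picks at most one match.
rows = conn.execute(
    """
    SELECT A.isin, A.date,
           (SELECT B.column_name
            FROM Table_B B
            WHERE B.date = A.date AND B.isin = A.isin
            ORDER BY B.column_name DESC
            LIMIT 1) AS column_name
    FROM Table_A A
    ORDER BY A.isin
    """
).fetchall()

print(rows)
# [('X1', '2020-01-01', 11), ('X2', '2020-01-01', 20), ('X3', '2020-01-02', None)]
```

Like OUTER APPLY, this keeps rows of Table_A that have no match (X3 gets NULL), and the ORDER BY inside the subquery decides which duplicate wins.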
All that said, your data does not seem to be what you really expect. You should figure out where the duplicates are coming from. The place to start is:
select b.date, b.isin, count(*)
from Table_B b
group by b.date, b.isin
having count(*) >= 2;
This will show you the duplicates, so you can figure out what to do about them.
The possibility of duplicates has already been discussed.
When millions of records are involved in a join, a poor cardinality estimate
often leads to a slow plan (it does not change which rows are returned).
For this, put both conditions in the join:
SELECT A.*,
B.column_name
FROM Table_A A
JOIN
Table_B B ON A.isin = B.isin
and
A.date = B.date
Also create a nonclustered index on both tables.
Create NonClustered index isin_date_table_A on Table_A(isin,date)include(*Table_A)
*Table_A = comma-separated list of the Table_A columns that are required in the result set
Create NonClustered index isin_date_table_B on Table_B(isin,date)include(column_name)
Update STATISTICS Table_A
Update STATISTICS Table_B
If you keep the DATE columns of both tables in the same format in the JOIN condition, you should get the expected result.
Select A.*, B.column_name
from Table_A A
join Table_B B on to_date(A.date,'DD-MON-YY') = to_date(B.date,'DD-MON-YY')
where A.isin = B.isin

Query left join without all the right rows from B table

I have 2 tables, A and B.
I need all columns from A + 1 column from B in my select.
Unfortunately, B has multiple rows (all identical) for 1 row in A
on the join condition.
I tried, but I can't get exactly one row from B per row in A (with a left join, for example) while keeping my select.
How can I write this query? I am using Oracle SQL.
Thanks in advance.
This is a good use for outer apply. The structure of the query looks like this:
select a.*, b.col
from a outer apply
(select top 1 b.col
from b
where b.? = a.?
) b;
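Normally, you would only use top 1 with order by. In this case, it doesn't seem to make a difference which row you choose. Note that top 1 is SQL Server syntax; in Oracle 12c or later, the row-limiting clause fetch first 1 row only plays the same role inside the outer apply.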
You can group by on all columns from A, and then use an aggregate (like max or min) to pick any of the identical B values:
select a.*
, b.min_col1
from TableA a
left join
(
select a_id
, min(col1) as min_col1
from TableB
group by
a_id
) b
on b.a_id = a.id
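A minimal sketch of the aggregate approach, run through Python's sqlite3 module rather than Oracle. The table layout (id, name, a_id, col1) and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE TableB (a_id INTEGER, col1 TEXT)")

conn.executemany("INSERT INTO TableA VALUES (?, ?)", [(1, "x"), (2, "y")])
# Three identical TableB rows for TableA.id = 1, none for id = 2.
conn.executemany("INSERT INTO TableB VALUES (?, ?)", [(1, "p"), (1, "p"), (1, "p")])

# A plain left join repeats the TableA row once per matching TableB row.
plain = conn.execute(
    "SELECT a.*, b.col1 FROM TableA a LEFT JOIN TableB b ON b.a_id = a.id"
).fetchall()

# Collapsing TableB to one row per a_id first keeps one row per TableA row.
deduped = conn.execute(
    """
    SELECT a.*, b.min_col1
    FROM TableA a
    LEFT JOIN (
        SELECT a_id, MIN(col1) AS min_col1
        FROM TableB
        GROUP BY a_id
    ) b ON b.a_id = a.id
    ORDER BY a.id
    """
).fetchall()

print(len(plain))  # 4
print(deduped)     # [(1, 'x', 'p'), (2, 'y', None)]
```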

Returning only duplicate rows from two tables

Every thread I've seen so far has been to check for duplicate rows and avoiding them. I'm trying to get a query to only return the duplicate rows. I thought it would be as simple as a subquery, but I was wrong. Then I tried the following:
SELECT * FROM a
WHERE EXISTS
(
SELECT * FROM b
WHERE b.id = a.id
)
Was a bust too. How do I return only the duplicate rows? I'm currently going through two tables, but I'm afraid there are a large amount of duplicates.
Use this query; it may be better to compare only the relevant columns.
SELECT * FROM a
INTERSECT
SELECT * FROM b
I am sure your posted code would work too, like
SELECT * FROM a
WHERE EXISTS
(
SELECT 1 FROM b WHERE id = a.id
)
You can also do an INNER JOIN like
SELECT a.* FROM a
JOIN b on a.id = b.id;
You can also use the IN operator:
SELECT * FROM a where id in (select id from b);
If none of these fit, you can use UNION ALL together with the ROW_NUMBER() function, provided both tables satisfy the union restrictions, like
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id) AS rn
FROM (
select * from a
union all
select * from b) xx ) yy
WHERE rn > 1; -- rn = 1 would return one row per id, not the duplicates
Note: there's an ambiguity as to what you mean by a duplicate row, and whether you're talking about duplicate keys, or all fields being the same. My answer deals with all fields being the same; some of the others are assuming it's just the keys. It's unclear which you intend.
You might try
SELECT a.id, a.col1, a.col2 FROM a INNER JOIN b ON a.id = b.id
WHERE a.col1 = b.col1 AND a.col2 = b.col2
adding in other columns as necessary. The database engine should be intelligent enough to do the comparisons on the indexed columns first, so it'll be efficient as long as you don't have rows that are different only on lots of non-indexed fields. (If you do, then I don't think anything will do it particularly efficiently.)
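The ambiguity between "same key" and "all fields the same" can be made concrete with a small sketch using Python's sqlite3 module. The two-column tables and their contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, col1 TEXT)")
conn.execute("CREATE TABLE b (id INTEGER, col1 TEXT)")

conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")])
# id 1 matches on every field, id 2 matches on the key only, id 4 is new.
conn.executemany(
    "INSERT INTO b VALUES (?, ?)", [(1, "x"), (2, "DIFFERENT"), (4, "w")]
)

# Duplicate means "all fields equal": INTERSECT compares whole rows.
full_row_dupes = conn.execute(
    "SELECT * FROM a INTERSECT SELECT * FROM b"
).fetchall()

# Duplicate means "same key": EXISTS compares only id.
key_dupes = conn.execute(
    """
    SELECT * FROM a
    WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
    ORDER BY a.id
    """
).fetchall()

print(full_row_dupes)  # [(1, 'x')]
print(key_dupes)       # [(1, 'x'), (2, 'y')]
```

Which query is right depends on which definition of duplicate is intended.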

Is there a way to do a multi table query and get result just from specific tables?

I am trying to do a multi query but I don't want to use sub queries i.e:
SELECT column1
FROM table1
WHERE
EXISTS (SELECT column1 FROM table2 WHERE table1.column1 = table2.column1);
I thought of using a JOIN but so far my best result was this:
SELECT *
FROM table1
JOIN table2 ON table1.t1id = table2.t2id
WHERE table1.id = 5;
This would be good except for the fact that I get a duplicate column (the id in table 1 and 2 are foreign keys).
How do I remove the duplicate column if possible?
UPDATE:
Table1:
tableA_ID, TABLEB_ID
1, 1
1, 4
3, 2
4, 3
TableA: ID, COL1, COL2
1, A, B
2, A, B
3, A, B
4, A, B
TableB: ID, Col3, COL4
1, C, D
2, C, D
3, C, D
4, C, D
I want to get all or some of the columns from TableA according to a condition
Sample: Let's say the condition is tableA_ID = 1, which results in the first 2 rows of the table. Then I want to get all or some of the columns in TableA that correspond to the IDs that I got from Table1.
Sample: The result from before was [{1,1}{1,4}] which means I want from TableA the results:
TableA.ID, TableA.COL1, TableA.COL2
1,A,B
4,A,B
The actual result I get is:
Table1.tableA_ID, Table1.TABLEB_ID, TableA.ID, TableA.COL1, TableA.COL2
1,1,1,A,B
1,4,4,A,B
Is this what you're looking for?
select a.id, a.column1, b.column2
from table1 a
left join table2 b on a.id = b.otherid;
You can't change the column list of a query based on the values it returns. It just isn't the way that SQL is designed to operate. At best, you can return all of the columns from the second table and ignore the ones that aren't relevant based on other values in that row.
I'm not even sure how a variable column list would work. In your scenario, you're looking for two discrete values separately. But that's not the only scenario: what if the condition is tableA_ID in (1,2). Would you want different numbers of columns in different rows as part of a single result set?
Getting just the columns you want (just from specific tables, as you say) is the easy part (btw -- don't use '*' if you can help it -- topic for another discussion):
SELECT
A.ID,
A.COL1,
A.COL2
FROM
TABLE1 Bridge
LEFT JOIN TABLEA A
ON Bridge.TABLEA_ID = A.ID
LEFT JOIN TABLEB B
ON Bridge.TABLEB_ID = B.ID
Getting the rows you want will be the harder part (influenced by your choice of joins, among several other things).
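As a rough sketch of selecting columns from only one table, the question's sample data can be loaded into SQLite through Python's sqlite3 module. Judging by the expected output in the question, the TableA rows are matched on Table1.TABLEB_ID; that join column is an assumption read off the sample, not something the question states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (tableA_ID INTEGER, TABLEB_ID INTEGER)")
conn.execute("CREATE TABLE TableA (ID INTEGER, COL1 TEXT, COL2 TEXT)")

# Sample data copied from the question.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)", [(1, 1), (1, 4), (3, 2), (4, 3)]
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)",
    [(1, "A", "B"), (2, "A", "B"), (3, "A", "B"), (4, "A", "B")],
)

# Name only the TableA columns in the select list; the bridge table is
# used for filtering and joining but contributes no output columns.
rows = conn.execute(
    """
    SELECT a.ID, a.COL1, a.COL2
    FROM Table1 t
    JOIN TableA a ON a.ID = t.TABLEB_ID  -- assumed join column
    WHERE t.tableA_ID = 1
    ORDER BY a.ID
    """
).fetchall()

print(rows)  # [(1, 'A', 'B'), (4, 'A', 'B')]
```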
I think you'll need to select only the fields of table A and use a distinct clause. The rest of your query remains as it is, i.e.
SELECT distinct table1.*
FROM table1
JOIN table2 ON table1.t1id = table2.t2id
WHERE table1.id = 5;

Query for a table with big size of columns

I've got a table in which there are some columns with big text data. The query for 10 rows (table has only 31 records) takes more than 20 seconds. If I remove fields with big size, the query is executed quickly. The query for 1 row (by id) always executed quickly.
How can I make the query for many rows run faster?
The query looks like this
SELECT DISTINCT (a.id), a.field_1, a.field_2, a.field_3
, a.field_4, a.field_5, a.field_6, ...
FROM table_a a, table_b b
WHERE a.field_8 = 'o'
ORDER BY a.field_2 DESC
LIMIT 10;
@a_horse already hinted at the likely syntax error. Try:
SELECT DISTINCT ON (a.id) a.id, a.field_1, a.field_2, a.field_3, ...
FROM table_a a
-- JOIN table_b b ON ???
WHERE a.field_8 = 'o'
ORDER BY a.id, a.field_2 DESC
LIMIT 10;
Note DISTINCT ON (a.id) and the leading a.id in ORDER BY, and read up on the DISTINCT clause in the manual.
Also, an index on field_8 might help.
A multicolumn index on (field_8, id, field_2) might help even more, if you can narrow it down to that (and if that is the sort order you want, which I doubt).
If you want the result sorted by a.field_2 DESC first:
In PostgreSQL 9.1 or later, if id is the primary key:
SELECT a.id, a.field_1, a.field_2, a.field_3, ...
FROM table_a a
-- JOIN table_b b ON ???
WHERE a.field_8 = 'o'
GROUP BY a.id -- primary key takes care of all columns in table a
ORDER BY a.field_2 DESC
LIMIT 10;
Why are you selecting from table_b? You don't join these tables!
Make a real join like this:
SELECT DISTINCT
(a.id), a.field_1, a.field_2, a.field_3, a.field_4, a.field_5, a.field_6
FROM table_a a
INNER JOIN table_b b
ON b.field_on_table_b = a.field_on_table_a
WHERE a.field_8 = 'o'
ORDER BY a.field_2 DESC LIMIT 10
Then make sure that field_8 (in the WHERE clause) is indexed.
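To illustrate why the unjoined table_b matters, here is a small sketch using Python's sqlite3 module. The 31 rows in table_a come from the question; the 1000 rows in table_b and the reduced column set are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_a (id INTEGER PRIMARY KEY, field_2 INTEGER, field_8 TEXT)"
)
conn.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY)")

# 31 rows in table_a as in the question; 1000 rows in table_b is made up.
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, 'o')", [(i, i * 10) for i in range(1, 32)]
)
conn.executemany("INSERT INTO table_b VALUES (?)", [(i,) for i in range(1, 1001)])

# A comma join without a join condition is a cross product.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM table_a a, table_b b WHERE a.field_8 = 'o'"
).fetchone()[0]

# Without table_b, only the 31 rows of table_a are touched.
plain_count = conn.execute(
    "SELECT COUNT(*) FROM table_a a WHERE a.field_8 = 'o'"
).fetchone()[0]

top10 = conn.execute(
    """
    SELECT a.id, a.field_2
    FROM table_a a
    WHERE a.field_8 = 'o'
    ORDER BY a.field_2 DESC
    LIMIT 10
    """
).fetchall()

print(cross_count)  # 31000
print(plain_count)  # 31
```

With large text columns, DISTINCT then has to compare every one of those multiplied rows in full, which is a plausible reason the original query is slow.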