How can I join on only a certain number of rows - sql

I want to join two tables on a particular column, but I only want to join on 50 of the rows from the first table. I.e. I want to do the following:
select * from s1.companies c limit 50 join s2.employees e on c.id = e.c_id;
I'm getting a syntax error because of the limit. How can I write this query? The companies table has millions of rows, and I just want to play with some of the data without long-running queries.

You need a subquery.
select * from (select * from s1.companies limit 50) c join s2.employees e on c.id = e.c_id;
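A minimal sketch of the same idea, run against an in-memory SQLite database through Python's sqlite3 module so it can be tried without touching the real tables. The table contents are made up, the s1/s2 schema prefixes are dropped because SQLite has no schemas, and the ORDER BY inside the subquery is an addition: without it the database is free to pick any 50 rows.

```python
import sqlite3

# Hypothetical miniature versions of the companies and employees tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, c_id INTEGER, name TEXT);
""")
conn.executemany("INSERT INTO companies VALUES (?, ?)",
                 [(i, f"company {i}") for i in range(1, 201)])
# Two employees per company.
conn.executemany("INSERT INTO employees (c_id, name) VALUES (?, ?)",
                 [(i, f"emp {i}-{n}") for i in range(1, 201) for n in (1, 2)])

# Limit the companies first (inside the derived table), then join.
rows = conn.execute("""
    SELECT c.id, c.name, e.name
    FROM (SELECT * FROM companies ORDER BY id LIMIT 50) AS c
    JOIN employees AS e ON c.id = e.c_id
""").fetchall()

limited_company_ids = {r[0] for r in rows}
print(len(limited_company_ids), len(rows))
```

The limit applies to companies, not to the joined result, so the output can contain more than 50 rows when a company has several employees.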

Related

Get all columns from other tables with a distinct

I am using DISTINCT to filter by two columns, but I need the query to return all of its columns. As written, it only returns "idMes" and "idAnio"; I need it to show the other columns as well.
How could I do it?
This is my query:
SELECT DISTINCT e.idMes, e.idAnio FROM expensas as e INNER JOIN anios as a on e.idAnio = a.idAnio INNER JOIN meses as m on e.idMes = m.idMes;
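SELECT * gives you all columns. Note that SELECT DISTINCT * then treats two rows as duplicates only when every selected column matches, not just idMes and idAnio.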

Why does my SQL inner join return much more data than table 1?

I need to join three tables to get all the info I need. Table a has 70 million rows; after joining a with b, I get 40 million rows. But after I join table c, which has only 1.7 million rows, the result grows to 300 million rows.
In table c, the same pt_id and fi_id appear in more than one row. One pt_id can connect to many different fi_id values, but one fi_id only ever connects to a single pt_id.
I'm wondering if there is any way to get rid of the duplicate rows, because I join table c only to get the pt_id.
Thanks for any help!
select c.pt_id,b.fi_id,a.zq_id
from a
inner join (select zq_id, fi_id from b) b
on a.zq_id = b.zq_id
inner join (select fi_id,pt_id from c) c
on b.fi_id = c.fi_id
You can use GROUP BY:
select c.pt_id,b.fi_id,a.zq_id
from a
inner join (select zq_id, fi_id from b) b
on a.zq_id = b.zq_id
inner join (select fi_id,pt_id from c) c
on b.fi_id = c.fi_id
group by c.pt_id,b.fi_id,a.zq_id
to remove all duplicate rows, as described in the question below:
How do I (or can I) SELECT DISTINCT on multiple columns?
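A variation on the same idea, sketched here as an assumption rather than as the answer's method: since each fi_id maps to a single pt_id, the duplicates can be removed from c before the join, so the join never multiplies rows in the first place. The toy data below is made up and runs on an in-memory SQLite database via Python's sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (zq_id INTEGER);
    CREATE TABLE b (zq_id INTEGER, fi_id INTEGER);
    CREATE TABLE c (fi_id INTEGER, pt_id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 10), (2, 20);
    -- c repeats each (fi_id, pt_id) pair three times.
    INSERT INTO c VALUES (10, 100), (10, 100), (10, 100),
                         (20, 100), (20, 100), (20, 100);
""")

# Joining c as-is returns one output row per duplicate in c.
plain = conn.execute("""
    SELECT c.pt_id, b.fi_id, a.zq_id
    FROM a
    JOIN b ON a.zq_id = b.zq_id
    JOIN c ON b.fi_id = c.fi_id
""").fetchall()

# Deduplicate c inside the derived table, then join.
deduped = conn.execute("""
    SELECT c.pt_id, b.fi_id, a.zq_id
    FROM a
    JOIN b ON a.zq_id = b.zq_id
    JOIN (SELECT DISTINCT fi_id, pt_id FROM c) AS c ON b.fi_id = c.fi_id
    ORDER BY a.zq_id
""").fetchall()

print(len(plain), len(deduped))
```

On large tables this may also be cheaper than grouping 300 million joined rows, although that depends on the database and its indexes.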

Limit Query Result Using Count

I need to limit the results of my query so that it only pulls results where the total number of lines on the ID is less than 4, and am unsure how to do this without losing the select statement columns.
select fje.journalID, fjei.ItemID, fjei.acccount, fjei.debit, fjei.credit
from JournalEntry fje
inner join JournalEntryItem fjei on fjei.journalID = fje.journalID
inner join JournalEntryItem fjei2 on fjei.journalID = fjei2.journalID and
fjei.ItemID != fjei2.ItemID
order by fje.journalID
So if journalID 1 has 5 lines, it should be excluded, but if it has 4 lines, I should see it in my query. Just need a push in the right direction. Thanks!
A subquery with an alias has many names, but it's effectively a table. In your case, you would do something like this.
select your fields
from your tables
join (
select id, count(*) records
from wherever
group by id ) derivedTable on someTable.id = derivedTable.id
and records < 4
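A concrete sketch of that template against the tables from the question, using an in-memory SQLite database through Python's sqlite3. The sample rows are invented and the acccount column is left out for brevity. The question says "less than 4" but also that a journal with 4 lines should appear, so this sketch assumes the threshold is `<= 4`; switch it to `< 4` if the first reading is the intended one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JournalEntry (journalID INTEGER PRIMARY KEY);
    CREATE TABLE JournalEntryItem (
        ItemID INTEGER PRIMARY KEY,
        journalID INTEGER,
        debit REAL,
        credit REAL
    );
""")
conn.executemany("INSERT INTO JournalEntry VALUES (?)", [(1,), (2,), (3,)])

# Made-up data: journal 1 has 5 lines, journal 2 has 4, journal 3 has 2.
line_counts = {1: 5, 2: 4, 3: 2}
items = [(jid, 10.0, 0.0) for jid, n in line_counts.items() for _ in range(n)]
conn.executemany(
    "INSERT INTO JournalEntryItem (journalID, debit, credit) VALUES (?, ?, ?)",
    items,
)

# The derived table counts lines per journal; the outer query keeps every
# detail column and filters on that count.
rows = conn.execute("""
    SELECT fje.journalID, fjei.ItemID, fjei.debit, fjei.credit
    FROM JournalEntry AS fje
    JOIN JournalEntryItem AS fjei ON fjei.journalID = fje.journalID
    JOIN (
        SELECT journalID, COUNT(*) AS records
        FROM JournalEntryItem
        GROUP BY journalID
    ) AS counts ON counts.journalID = fje.journalID
    WHERE counts.records <= 4
    ORDER BY fje.journalID, fjei.ItemID
""").fetchall()

kept_journals = sorted({r[0] for r in rows})
print(kept_journals, len(rows))
```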

Grouping in select query returns more rows than in actual selecting table?

I need to select a few columns from a table which contains 10 records, and I need to group these 10 records by productId.
If I group all these records with the product table, which contains more than 120 records, I am getting more than 1000 records.
You most likely are not using the JOIN correctly. You need to say where the two (or more) tables have a value which is equal. For example:
SELECT COUNT(a.AppleId) as AppleAmt, b.BasketLocation, c.ContainerColor
FROM Apples as a INNER JOIN
Baskets as b ON a.BasketId = b.BasketId INNER JOIN
Containers as c ON b.ContainerId = c.ContainerId
WHERE a.Rotten IS NULL and a.Eaten IS NULL and a.SoTasty IS NOT NULL
GROUP BY b.BasketLocation, c.ContainerColor
ORDER BY b.BasketLocation DESC
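To see how a missing join condition inflates the row count, here is a small sketch on an in-memory SQLite database via Python's sqlite3. The products and sales tables, their columns, and the 10 and 120 row counts are assumptions chosen to echo the numbers in the question; the asker's real schema is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (productId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (saleId INTEGER PRIMARY KEY, productId INTEGER);
""")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(i, f"product {i}") for i in range(1, 11)])
# 120 sales spread evenly over the 10 products.
conn.executemany("INSERT INTO sales (productId) VALUES (?)",
                 [(i % 10 + 1,) for i in range(120)])

# No join condition: every product is paired with every sale.
no_condition = conn.execute(
    "SELECT COUNT(*) FROM products p, sales s"
).fetchone()[0]

# With the condition, each sale matches exactly one product.
with_condition = conn.execute("""
    SELECT COUNT(*)
    FROM products p
    JOIN sales s ON s.productId = p.productId
""").fetchone()[0]

# Grouping the correctly joined rows gives one row per product.
grouped = conn.execute("""
    SELECT p.productId, COUNT(s.saleId)
    FROM products p
    JOIN sales s ON s.productId = p.productId
    GROUP BY p.productId
    ORDER BY p.productId
""").fetchall()

print(no_condition, with_condition, len(grouped))
```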

"Select Top 10, then Join Tables", instead of "Select Top 10 from Joined Tables"

I have inherited a stored procedure which performs joins across eight tables, some of which contain hundreds of thousands of rows, then selects the top ten entries from the result of that join.
I have enough information at the start of the procedure to select those ten rows from a single table, and then perform those joins on those ten rows, rather than on hundreds of thousands of intermediate rows.
How do I select those top ten rows first and then do the joins on only those ten rows, instead of performing joins on all of the thousands of rows in the table?
I would try:
SELECT * FROM
(SELECT TOP 10 * FROM your_table
ORDER BY your_condition) p
INNER JOIN second_table t
ON p.field = t.field
The optimizer may not be able to perform the top 10 first if you have inner joins, since it can't be sure that the inner joins won't exclude rows later on. It would be a bug if it selected 10 rows from the main table, and then only returned 7 rows at the end because of a join. Using Marco's rewrite may gain you performance for this reason since you're expressly stating that it's safe to limit the rows before the joins.
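The difference between the two orderings can be seen on a toy example. This is a sketch on an in-memory SQLite database via Python's sqlite3, so `LIMIT 10` stands in for T-SQL's `TOP 10`, and main_table, second_table, and their columns are made-up names. Only even ids have a match in the second table, so limiting first loses rows to the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (id INTEGER PRIMARY KEY, score INTEGER);
    CREATE TABLE second_table (main_id INTEGER);
""")
conn.executemany("INSERT INTO main_table VALUES (?, ?)",
                 [(i, i) for i in range(1, 21)])
# Only even ids have a matching row in second_table.
conn.executemany("INSERT INTO second_table VALUES (?)",
                 [(i,) for i in range(2, 21, 2)])

# Top 10 first, then join: unmatched rows among the 10 drop out.
limit_first = conn.execute("""
    SELECT p.id
    FROM (SELECT * FROM main_table ORDER BY score DESC LIMIT 10) AS p
    JOIN second_table AS t ON t.main_id = p.id
""").fetchall()

# Join first, then top 10: always 10 rows if at least 10 rows match.
join_first = conn.execute("""
    SELECT m.id
    FROM main_table AS m
    JOIN second_table AS t ON t.main_id = m.id
    ORDER BY m.score DESC
    LIMIT 10
""").fetchall()

print(len(limit_first), len(join_first))
```

So the rewrite is only equivalent when every limited row is guaranteed exactly one match in each joined table, which is something the query author may know and the optimizer cannot assume.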
If your query is sufficiently complicated, the query plan optimizer may run out of time finding a good plan. It's only given a few hundred milliseconds, and with even a few joins there are probably thousands of different ways it can execute the query (different join orders, etc.). If this is the case, you'll benefit from storing the first 10 rows in a temp table first, and then using that later, like this:
select top 10 *
into #MainResults
from MyTable
order by your_condition;
select *
from #MainResults r
join othertable t
on t.whatever = r.whatever;
I've seen cases where this second approach has made a HUGE difference.
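For anyone who wants to try the temp-table pattern outside SQL Server, here is a rough equivalent on an in-memory SQLite database via Python's sqlite3. SQLite has no `SELECT ... INTO #table` or `TOP`, so this sketch assumes `CREATE TEMP TABLE ... AS SELECT ... LIMIT 10` as the stand-in; the sort_key and detail columns and the row counts are made up. It shows the shape of the technique, not the performance gain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (whatever INTEGER PRIMARY KEY, sort_key INTEGER);
    CREATE TABLE othertable (whatever INTEGER, detail TEXT);
""")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)",
                 [(i, i) for i in range(1, 1001)])
conn.executemany("INSERT INTO othertable VALUES (?, ?)",
                 [(i, f"detail {i}") for i in range(1, 1001)])

# Step 1: materialize the first 10 rows in a temporary table.
conn.execute("""
    CREATE TEMP TABLE MainResults AS
    SELECT * FROM MyTable ORDER BY sort_key LIMIT 10
""")

# Step 2: join only those 10 rows to the other table.
rows = conn.execute("""
    SELECT r.whatever, t.detail
    FROM MainResults AS r
    JOIN othertable AS t ON t.whatever = r.whatever
    ORDER BY r.whatever
""").fetchall()

print(len(rows), rows[0], rows[-1])
```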
You can also use a CTE to define the top X rows and then use it in the main query.
For example, this data.se query limits the result to the top 40 tags:
with top40 as (
select top 40 t.id, t.tagname
from tags t, posttags pt
where pt.tagid = t.id
group by t.tagname, t.id
order by count(pt.postid) desc
),
myanswers as(
select p.parentid, p.score
from posts p
where
p.owneruserid = ##UserID## and
p.communityowneddate is null
)
select t40.tagname as 'Tag', sum(p1.score) as 'Score',
case when sum(p1.score) >= 15 then ':-)' else ':-(' end as 'Status'
from top40 t40, myanswers p1, posttags pt1
where
pt1.postid = p1.parentid and
pt1.tagid = t40.id
group by t40.tagname
order by sum(p1.score) desc
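A scaled-down sketch of the same CTE pattern, runnable on an in-memory SQLite database via Python's sqlite3. The schema is a simplified, hypothetical stand-in for the data.se tables (posts carries only id and score), the data is invented, and `LIMIT 3` replaces `TOP 40` because SQLite does not support `TOP`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tagname TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, score INTEGER);
    CREATE TABLE posttags (postid INTEGER, tagid INTEGER);
""")
tag_names = ["sql", "python", "java", "go", "rust"]
conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 list(enumerate(tag_names, start=1)))

# Tag 1 gets 5 posts, tag 2 gets 4, and so on; every post scores 2.
post_id = 0
for tag_id in range(1, 6):
    for _ in range(6 - tag_id):
        post_id += 1
        conn.execute("INSERT INTO posts VALUES (?, ?)", (post_id, 2))
        conn.execute("INSERT INTO posttags VALUES (?, ?)", (post_id, tag_id))

# The CTE picks the 3 most used tags; the main query joins only to those.
rows = conn.execute("""
    WITH top3 AS (
        SELECT t.id, t.tagname
        FROM tags AS t
        JOIN posttags AS pt ON pt.tagid = t.id
        GROUP BY t.id, t.tagname
        ORDER BY COUNT(pt.postid) DESC, t.id
        LIMIT 3
    )
    SELECT t3.tagname, SUM(p.score) AS score
    FROM top3 AS t3
    JOIN posttags AS pt ON pt.tagid = t3.id
    JOIN posts AS p ON p.id = pt.postid
    GROUP BY t3.tagname
    ORDER BY score DESC
""").fetchall()

print(rows)
```

Whether a CTE is evaluated once up front or inlined into the main query depends on the database, so it is worth checking the query plan before relying on it for performance.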