Join a dynamic number of rows in postgres

Let's say I have the following tables:
Batch
id | size
---+-----
1  | 10
2  | 2

Items
id | batch_id | quality
---+----------+--------
1  | 1        | 9
2  | 1        | 10
3  | 2        | 1
4  | 2        | 2
5  | 2        | 1
6  | 2        | 9
I have batches of items. Items are sent in batches of size batch.size. An item is broken if its quality is <= 3.
I want to know the number of broken items in the last batches sent:
batch_id | broken_item_count
---------+---------------------
1 | 0
2 | 2 (and not 3)
My idea is the following:
SELECT batch.id as batch_id, COUNT(broken_items.*) as broken_item_count
FROM batch
INNER JOIN (
SELECT id
FROM items
WHERE items.quality <= 3
ORDER BY items.id asc
LIMIT batch.size -- invalid reference to FROM-clause entry for table "batch"
) broken_items ON broken_items.batch_id = batch.id
(I would ORDER BY items.shipped_at. But for simplicity, I order by items.id)
But this query fails with the error shown in the comment.
How can I limit the number of joined items based on batch.size, which is different for each row?
Is there any other way to achieve what I want?

SELECT b.id AS batch_id
, count(i.quality < 4 OR NULL) AS broken_item_count
FROM batch b
LEFT JOIN (
SELECT batch_id, quality
, row_number() OVER (PARTITION BY batch_id ORDER BY id DESC) AS rn
FROM items
) i ON i.batch_id = b.id
AND i.rn <= b.size
GROUP BY 1
ORDER BY 1;
SQL Fiddle with added examples.
This is much like @Clodoaldo's answer, but with a couple of differences. Most importantly:
You want to count the broken items in the last batches sent, so we have to ORDER BY id DESC.
If there can be batches without any items at all, you need to use LEFT JOIN instead of a plain JOIN, or those batches are excluded.
Consequently, the check i.rn <= b.size needs to move from the WHERE clause to the JOIN clause; in the WHERE clause it would discard the NULL-extended rows and turn the LEFT JOIN back into an inner join.
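To see how the ordering direction and the LEFT JOIN play out on the sample data from the question, here is a minimal sketch that runs the same row_number() pattern through Python's built-in sqlite3 module. It assumes a SQLite build with window functions (3.25 or later); batch 3 is a hypothetical empty batch added only to show the LEFT JOIN, and the helper name broken_counts is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batch (id INTEGER PRIMARY KEY, size INTEGER);
    CREATE TABLE items (id INTEGER PRIMARY KEY, batch_id INTEGER, quality INTEGER);
    -- batch 3 is hypothetical: it has no items at all
    INSERT INTO batch VALUES (1, 10), (2, 2), (3, 5);
    INSERT INTO items VALUES
        (1, 1, 9), (2, 1, 10), (3, 2, 1), (4, 2, 2), (5, 2, 1), (6, 2, 9);
""")

QUERY = """
    SELECT b.id AS batch_id,
           count(CASE WHEN i.quality <= 3 THEN 1 END) AS broken_item_count
    FROM batch b
    LEFT JOIN (
        SELECT batch_id, quality,
               row_number() OVER (PARTITION BY batch_id ORDER BY id {direction}) AS rn
        FROM items
    ) i ON i.batch_id = b.id
       AND i.rn <= b.size          -- per-batch limit lives in the join condition
    GROUP BY b.id
    ORDER BY b.id
"""

def broken_counts(direction):
    """Count broken items among the first (ASC) or last (DESC) batch.size items."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    return conn.execute(QUERY.format(direction=direction)).fetchall()

first_items = broken_counts("ASC")   # [(1, 0), (2, 2), (3, 0)]
last_items = broken_counts("DESC")   # [(1, 0), (2, 1), (3, 0)]
```

On this data the ascending order reproduces the numbers in the question (2 broken items for batch 2), while the descending order counts only item 5 as broken among the last two items, so which direction is right depends on what "last" is meant to cover. The empty batch still shows up with a count of 0 because of the LEFT JOIN.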

SQL Fiddle
select
b.id as batch_id,
count(quality <= 3 or null) as broken_item_count
from
batch b
inner join (
select
id, quality, batch_id,
row_number() over (partition by batch_id order by id) as rn
from items
) i on i.batch_id = b.id
where rn <= b.size
group by b.id
order by b.id

From what I understand the count of defective items cannot be greater than the batch size.
EDIT: After reading your comments, I think using the RANK() function, and then join by rank and size should work for you. The following query attempts that.
SELECT b.id,
SUM(CASE WHEN i1.quality <= 3 THEN 1 ELSE 0 END) as broken_item_count
FROM BATCH as b
LEFT JOIN (SELECT i.id, i.batch_id, i.quality,
RANK() OVER(PARTITION BY i.batch_id ORDER BY i.id) as RANK
FROM ITEMS as i) as i1 ON b.id = i1.batch_id AND i1.RANK <= b.size
GROUP BY b.id
EDIT2: Updated the query with a LEFT JOIN to cover the case where there are no samples in some batch.

Related

SQL help i need to find the inventory remaining in my office

I have 3 tables. The first is the asset table, which looks as follows:
id | asset_code | asset_name   | asset_group | asset_quantity
---+------------+--------------+-------------+---------------
1  | A001       | demo asset   | 4           | 5
2  | A002       | demo asset 2 | 6           | 3
and another table is asset_allocation
id | asset_id | allocated_quantity | allocated_location
---+----------+--------------------+-------------------
1  | 1        | 2                  | IT office
2  | 1        | 1                  | main hall
the last table is asset_liquidated which will present assets that are no longer going to be used
id | asset_id | liquidated_quantity
---+----------+--------------------
1  | 1        | 2
2  | 1        | 1
Let's say I have 5 computers, I have allocated 3 of them, and 1 is no longer going to be used, so 1 computer should remain. How do I make SQL calculate this for me?

You need to aggregate first and then join your tables -
SELECT id, asset_code, asset_name, asset_group, asset_quantity,
asset_quantity - COALESCE(AA.allocated_quantity, 0) - COALESCE(AL.liquidated_quantity, 0) available_quantity
FROM asset A
LEFT JOIN (SELECT asset_id, SUM(allocated_quantity) allocated_quantity
FROM asset_allocation
GROUP BY asset_id) AA ON A.id = AA.asset_id
LEFT JOIN (SELECT asset_id, SUM(liquidated_quantity) liquidated_quantity
FROM asset_liquidated
GROUP BY asset_id) AL ON A.id = AL.asset_id
This query will give you -1 as available_quantity for asset_id 1 as you have only 5 available, 3 of them are allotted and 3 are liquidated as per your sample data.
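A quick way to check that arithmetic is to load the sample rows into an in-memory database. The sketch below uses Python's sqlite3 module rather than the asker's actual database, and the second query (joining both child tables directly, without pre-aggregating) is added here only as a contrast to show why the subqueries matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE asset (id INTEGER PRIMARY KEY, asset_code TEXT, asset_name TEXT,
                        asset_group INTEGER, asset_quantity INTEGER);
    CREATE TABLE asset_allocation (id INTEGER PRIMARY KEY, asset_id INTEGER,
                                   allocated_quantity INTEGER, allocated_location TEXT);
    CREATE TABLE asset_liquidated (id INTEGER PRIMARY KEY, asset_id INTEGER,
                                   liquidated_quantity INTEGER);
    INSERT INTO asset VALUES (1, 'A001', 'demo asset', 4, 5), (2, 'A002', 'demo asset 2', 6, 3);
    INSERT INTO asset_allocation VALUES (1, 1, 2, 'IT office'), (2, 1, 1, 'main hall');
    INSERT INTO asset_liquidated VALUES (1, 1, 2), (2, 1, 1);
""")

# Aggregate each child table to one row per asset before joining.
available = conn.execute("""
    SELECT a.id,
           a.asset_quantity
             - COALESCE(aa.allocated_quantity, 0)
             - COALESCE(al.liquidated_quantity, 0) AS available_quantity
    FROM asset a
    LEFT JOIN (SELECT asset_id, SUM(allocated_quantity) AS allocated_quantity
               FROM asset_allocation GROUP BY asset_id) aa ON aa.asset_id = a.id
    LEFT JOIN (SELECT asset_id, SUM(liquidated_quantity) AS liquidated_quantity
               FROM asset_liquidated GROUP BY asset_id) al ON al.asset_id = a.id
    ORDER BY a.id
""").fetchall()

# Contrast: joining both child tables directly pairs every allocation row with
# every liquidation row (2 x 2 = 4 rows for asset 1), so both sums are doubled.
naive = conn.execute("""
    SELECT a.id,
           a.asset_quantity
             - COALESCE(SUM(aa.allocated_quantity), 0)
             - COALESCE(SUM(al.liquidated_quantity), 0) AS available_quantity
    FROM asset a
    LEFT JOIN asset_allocation aa ON aa.asset_id = a.id
    LEFT JOIN asset_liquidated al ON al.asset_id = a.id
    GROUP BY a.id, a.asset_quantity
    ORDER BY a.id
""").fetchall()
```

Here available comes out as [(1, -1), (2, 3)], matching the -1 mentioned above, while naive gives -7 for asset 1 because each sum is counted twice.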

Please see if this helps
SELECT
asset_quantity AS Total_Assets
,ISNULL(allocated_quantity, 0) allocated_quantity
,ISNULL(liquidated_quantity, 0) liquidated_quantity
FROM asset
LEFT OUTER JOIN (
SELECT
asset_id, SUM(allocated_quantity) AS allocated_quantity
FROM asset_allocation
GROUP BY asset_id
) asset_allocation2
ON asset_allocation2.asset_id = asset.id
LEFT OUTER JOIN (
SELECT
asset_id, SUM(liquidated_quantity) AS liquidated_quantity
FROM asset_liquidated
GROUP BY asset_id
) asset_liquidated2
ON asset_liquidated2.asset_id = asset.id

Selecting values in columns based on other columns

I have two tables, info and transactions.
info looks like this:
customer ID Postcode
1 ABC 123
2 DEF 456
and transactions looks like this:
customer ID day frequency
1 1/1/12 3
1 3/5/12 4
2 4/6/12 2
3 9/9/12 1
I want to know which day has the highest frequency for each postcode.
I know how to reference two different tables, but I'm not sure how to select values in some columns based on the values in other columns.
The output should be something like this:
customer ID postcode day frequency
1 ABC 123 3/5/12 4
2 DEF 456 4/6/12 2
3 GHI 789 9/9/12 1
and so on.

You can filter with a correlated subquery:
select
i.*,
t.day,
t.frequency
from info i
inner join transactions t on t.customerID = i.customerID
where t.frequency = (
select max(t1.frequency)
from info i1
inner join transactions t1 on t1.customerID = i1.customerID
where i1.postcode = i.postcode
)
Or, if your RDBMS supports window functions, you can use rank():
select *
from (
select
i.*,
t.day,
t.frequency,
rank() over(partition by i.postcode order by t.frequency desc) as rn
from info i
inner join transactions t on t.customerID = i.customerID
) t
where rn = 1
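For reference, here is a small sketch of the rank() variant run against the sample rows, using Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). It assumes the rank column is aliased as rn so the outer filter can refer to it; the text column types are a guess, since the question does not show the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE info (customerID INTEGER PRIMARY KEY, postcode TEXT);
    CREATE TABLE transactions (customerID INTEGER, day TEXT, frequency INTEGER);
    INSERT INTO info VALUES (1, 'ABC 123'), (2, 'DEF 456');
    INSERT INTO transactions VALUES
        (1, '1/1/12', 3), (1, '3/5/12', 4), (2, '4/6/12', 2), (3, '9/9/12', 1);
""")

top_days = conn.execute("""
    SELECT customerID, postcode, day, frequency
    FROM (
        SELECT i.customerID, i.postcode, t.day, t.frequency,
               rank() OVER (PARTITION BY i.postcode ORDER BY t.frequency DESC) AS rn
        FROM info i
        INNER JOIN transactions t ON t.customerID = i.customerID
    ) ranked
    WHERE rn = 1          -- keep the highest-frequency day(s) per postcode
    ORDER BY customerID
""").fetchall()
```

With the rows shown in the question this returns the 3/5/12 row for ABC 123 and the 4/6/12 row for DEF 456. Customer 3 does not appear because the sample info table has no row for it and the join is an inner join. Since rank() is used, postcodes with tied top frequencies return every tied day.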

How to select all records of n groups?

I want to select the records of the top n groups. My data looks like this:
Table 'runner':
id gid status rtime
---------------------------
100 5550 1 2016-08-19
200 5550 2 2016-08-22
300 5550 1 2016-08-30
100 6050 3 2016-09-01
200 6050 1 2016-09-02
100 6250 1 2016-09-11
200 6250 1 2016-09-15
300 6250 3 2016-09-19
Table 'static'
id description env
-------------------------------
100 something 1 somewhere 1
200 something 2 somewhere 2
300 something 3 somewhere 3
The unit id (id) is unique within the group but not unique in its column, because an instance of the group is generated regularly. The group id (gid) is assigned to every unit but will not generate on more than one instance.
Now, combining the tables and selecting everything or filtering by a specific value is easy, but how do I select all records of, for example, the first two groups without directly referring to the group ids?
Expected result would be:
id gid description status rtime
--------------------------------------
300 6250 something 2 3 2016-09-19
200 6250 something 1 1 2016-09-15
100 6250 something 3 1 2016-09-11
200 6050 something 2 1 2016-09-02
100 6050 something 1 3 2016-09-01
Extra Question: When I filter for a timeframe like this:
[...]
WHERE runner.rtime BETWEEN '2016-08-25' AND '2016-09-16'
Is there a simple way of ensuring, that groups are not cut off but either appear with all their records or not at all?

You can use a ROW_NUMBER() to do this. First, create a query to rank groups:
SELECT gid, ROW_NUMBER() over (order by gid desc) as RN
FROM Runner
GROUP BY gid
Then use this as a derived table to get your other info, and use a where clause to filter to the number of groups you want to see. For instance, the query below returns the top 5 groups (RN <= 5):
SELECT R.id, R.gid, description, status, rtime
FROM (SELECT gid, ROW_NUMBER() over (order by gid desc) as RN
FROM Runner
GROUP BY gid) G
INNER JOIN Runner R on R.gid = G.gid
INNER JOIN Static S on S.id = R.id
WHERE RN <= 5 --Change this to see more or fewer groups
For your second question about dates, you can do this with a subquery like so:
SELECT *
FROM Runner
WHERE gid IN (SELECT gid
FROM Runner
WHERE rtime BETWEEN '2016-08-25' AND '2016-09-16')
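Note that the IN subquery keeps every group that has at least one record in the timeframe and then returns all of that group's records, including the ones outside the range. If "not at all" should instead mean that partially covered groups are dropped, one possible variant is to compare each group's min and max rtime against the range. The sketch below shows both readings on the sample data with Python's sqlite3 module; the HAVING variant is an alternative added here, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runner (id INTEGER, gid INTEGER, status INTEGER, rtime TEXT);
    INSERT INTO runner VALUES
        (100, 5550, 1, '2016-08-19'), (200, 5550, 2, '2016-08-22'), (300, 5550, 1, '2016-08-30'),
        (100, 6050, 3, '2016-09-01'), (200, 6050, 1, '2016-09-02'),
        (100, 6250, 1, '2016-09-11'), (200, 6250, 1, '2016-09-15'), (300, 6250, 3, '2016-09-19');
""")

START, END = "2016-08-25", "2016-09-16"

# Reading 1: groups with at least one record in the range (returned whole).
touching = [gid for (gid,) in conn.execute("""
    SELECT DISTINCT gid FROM runner
    WHERE gid IN (SELECT gid FROM runner WHERE rtime BETWEEN ? AND ?)
    ORDER BY gid
""", (START, END))]

# Reading 2: only groups whose records all fall inside the range.
contained_rows = conn.execute("""
    SELECT id, gid, status, rtime FROM runner
    WHERE gid IN (SELECT gid FROM runner
                  GROUP BY gid
                  HAVING min(rtime) >= ? AND max(rtime) <= ?)
    ORDER BY rtime
""", (START, END)).fetchall()
```

On the sample data the first reading keeps all three groups (each has a record between the two dates), while the second keeps only group 6050. The comparison works on text here because the dates are in ISO yyyy-mm-dd form.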

Hmmm. I suspect this might do what you want:
select top (1) with ties r.*
from runner r
order by min(rtime) over (partition by gid), gid;
At least, this will get the complete first group.
In any case, the idea is to include gid as a key in the order by and to use top with ties.

you can do the following
with report as(
select n.id,n.gid,m.description,n.status,n.rtime, dense_rank() over(order by gid desc) as RowNum
from #table1 n
inner join #table2 m on n.id = m.id )
select id,gid,description,status,rtime
from report
where RowNum<=2 -- <-- here n=2
order by gid desc,rtime desc
here a working demo

DENSE_RANK looks like an ideal solution here
Select * From
(
select DENSE_RANK() over (order by gid desc) as D_RN, r.*
from runner r
) A
Where D_RN = 1

No need to use ranking functions (ROW_NUMBER, DENSE_RANK etc).
SELECT r.id, gid, [description], [status], rtime
FROM runner r
INNER JOIN static s ON r.id = s.id
WHERE gid IN (
SELECT TOP 2 gid FROM runner GROUP BY gid ORDER BY gid DESC
)
ORDER BY rtime DESC;
The same using CTE:
WITH grouped
AS
(
SELECT TOP 2 gid
FROM runner GROUP BY gid ORDER BY gid DESC
)
SELECT r.id, grouped.gid, [description], [status], rtime
FROM runner r
INNER JOIN static s ON r.id = s.id
INNER JOIN grouped ON r.gid = grouped.gid
ORDER BY rtime DESC;
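As a sanity check that the ranking and non-ranking approaches agree, here is a rough sketch that runs both against the sample tables with Python's sqlite3 module. SQLite has no TOP, so LIMIT stands in for it, and the two function names are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runner (id INTEGER, gid INTEGER, status INTEGER, rtime TEXT);
    CREATE TABLE static (id INTEGER PRIMARY KEY, description TEXT, env TEXT);
    INSERT INTO runner VALUES
        (100, 5550, 1, '2016-08-19'), (200, 5550, 2, '2016-08-22'), (300, 5550, 1, '2016-08-30'),
        (100, 6050, 3, '2016-09-01'), (200, 6050, 1, '2016-09-02'),
        (100, 6250, 1, '2016-09-11'), (200, 6250, 1, '2016-09-15'), (300, 6250, 3, '2016-09-19');
    INSERT INTO static VALUES
        (100, 'something 1', 'somewhere 1'),
        (200, 'something 2', 'somewhere 2'),
        (300, 'something 3', 'somewhere 3');
""")

def top_groups_dense_rank(n):
    """All records of the n groups with the highest gid, via DENSE_RANK."""
    return conn.execute("""
        SELECT r.id, r.gid, s.description, r.status, r.rtime
        FROM (SELECT id, gid, status, rtime,
                     dense_rank() OVER (ORDER BY gid DESC) AS grp
              FROM runner) r
        INNER JOIN static s ON s.id = r.id
        WHERE r.grp <= ?
        ORDER BY r.gid DESC, r.rtime DESC
    """, (n,)).fetchall()

def top_groups_in_subquery(n):
    """Same result without ranking functions (LIMIT replaces TOP in SQLite)."""
    return conn.execute("""
        SELECT r.id, r.gid, s.description, r.status, r.rtime
        FROM runner r
        INNER JOIN static s ON s.id = r.id
        WHERE r.gid IN (SELECT gid FROM runner GROUP BY gid ORDER BY gid DESC LIMIT ?)
        ORDER BY r.gid DESC, r.rtime DESC
    """, (n,)).fetchall()

top_two = top_groups_dense_rank(2)
```

For n = 2 both functions return the five rows of groups 6250 and 6050, newest first. DENSE_RANK works here because every row of a group gets the same rank, whereas ROW_NUMBER over the raw rows would number each record separately.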

Selecting objects that are associated with similar datasets

I'm trying to select all company rows from a [Company] table that share, with at least one other company, the same number of employees (from an [Employee] table that has a CompanyId column), where the employees of each company have the same set of LocationIds (a column in the [Employee] table) in the same proportion.
So, for instance, two companies with three employees each that have the locationIds 1,2, and 2, would be selected by this query.
[Employee]
EmployeeId | CompanyId | LocationId |
========================================
1 | 1 | 1
2 | 1 | 2
3 | 1 | 2
4 | 2 | 1
5 | 2 | 2
6 | 2 | 2
7 | 3 | 3
[Company]
CompanyId |
============
1 |
2 |
3 |
Returns the CompanyIds:
======================
1
2
CompanyIds 1 and 2 are selected because they share in common with at least one other company: 1. the number of employees (3 employees); and 2. the number/proportion of LocationIds associated with those employees (1 employee has LocationId 1 and 2 employees have LocationId 2).
So far I think I want to use a HAVING COUNT(?) > 1 statement, but I'm having trouble working out the details. Does anyone have any suggestions?

This is ugly, but the only way I can think of to do it:
;with CTE as (
select c.Id,
(
select e.Location, count(e.Id) [EmployeeCount]
from Employee e
where e.IdCompany=c.Id
group by e.Location
order by e.Location
for xml auto
) LocationEmployeeData
from Company c
)
select c.Id
from Company c
join (
select x.LocationEmployeeData, count(x.Id) [CompanyCount]
from CTE x
group by x.LocationEmployeeData
having count(x.Id) >= 2
) y on y.LocationEmployeeData = (select LocationEmployeeData from CTE where Id = c.Id)
See fiddle: http://www.sqlfiddle.com/#!6/6bc16/5
It works by encoding the Employee count per Location data (multiple rows) into an xml string for each Company.
The CTE code on its own:
select c.Id,
(
select e.Location, count(e.Id) [EmployeeCount]
from Employee e
where e.IdCompany=c.Id
group by e.Location
order by e.Location
for xml auto
) LocationEmployeeData
from Company c
Produces data like:
Id LocationEmployeeData
1 <e Location="1" EmployeeCount="2"/><e Location="2" EmployeeCount="1"/>
2 <e Location="1" EmployeeCount="2"/><e Location="2" EmployeeCount="1"/>
3 <e Location="3" EmployeeCount="1"/>
Then it compares companies based on this string (rather than trying to ascertain whether multiple rows match, etc).
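The same idea (collapse each company's per-location counts into one comparable value, then look for values shared by two or more companies) can be sketched outside SQL as well. The Python below uses the Employee rows from the question; the function name is hypothetical, and a sorted tuple of (LocationId, count) pairs plays the role of the xml string.

```python
from collections import Counter, defaultdict

# (EmployeeId, CompanyId, LocationId) rows from the question
employees = [
    (1, 1, 1), (2, 1, 2), (3, 1, 2),
    (4, 2, 1), (5, 2, 2), (6, 2, 2),
    (7, 3, 3),
]

def companies_with_matching_profile(rows):
    """Return ids of companies whose location/count profile is shared by another company."""
    # Step 1: count employees per location for each company.
    counts = defaultdict(Counter)
    for _employee_id, company_id, location_id in rows:
        counts[company_id][location_id] += 1

    # Step 2: turn each company's counts into an order-independent signature.
    by_signature = defaultdict(list)
    for company_id, per_location in counts.items():
        signature = tuple(sorted(per_location.items()))
        by_signature[signature].append(company_id)

    # Step 3: keep companies whose signature occurs at least twice.
    return sorted(
        company_id
        for group in by_signature.values()
        if len(group) >= 2
        for company_id in group
    )

matching = companies_with_matching_profile(employees)
```

For the question's data, companies 1 and 2 both have the signature ((1, 1), (2, 2)) and are returned, while company 3 is alone with ((3, 1),). Companies without any employees never appear in this sketch, because it starts from the Employee rows.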

An alternative solution could look like this. However it also requires performance testing in advance (I don't feel quite confident with <> type join).
with List as
(
select
IdCompany,
Location,
row_number() over (partition by IdCompany order by Location) as RowId,
count(1) over (partition by IdCompany) as LocCount
from
Employee
)
select
A.IdCompany
from List as A
inner join List as B on A.IdCompany <> B.IdCompany
and A.RowID = B.RowID
and A.LocCount = B.LocCount
group by
A.IdCompany, A.LocCount
having
sum(case when A.Location = B.Location then 1 else 0 end) = A.LocCount
Related fiddle: http://sqlfiddle.com/#!6/d9f2e/1

Cross Join with Filter?

I need to write a stored procedure that distributes students to their sections.
The procedure takes 2 string parameters, StuID and SecID.
For example, I send '1,2,3,4,5' as StuID and 'a,b' as SecID.
I'm using a splitting function, which returns these tables:
Tb1 | Tb2
1 | a
2 | b
3 |
4 |
5 |
how can i get the following result
1 a
2 b
3 a
4 b
5 a
....
I've tried to do it via cross join but it did not show the result i want
select US.vItem as UserID,SE.vItem as Section
from split(@pUserID,',') us
cross join split(@pSectionID,',') se

Cross join isn't meant to work like that.
This will give you the results you want, but it's a bodge.
select t1.vItem, t2.VItem from
( select *, ROW_NUMBER() over (order by vItem) r from US ) t1
inner join
( select *, ROW_NUMBER() over (order by vItem desc) -1 r from SE ) t2
on t2.r = t1.r % (select COUNT(*) from SE)
order by t1.vItem
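A variant of the same modulo trick that does not depend on there being exactly two sections is to number both lists from 0 in ascending order and join on the student number modulo the section count. The sketch below tries this with Python's sqlite3 module (SQLite 3.25 or later assumed); the students and sections tables are hypothetical stand-ins for the two split() results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-ins for split(@pUserID, ',') and split(@pSectionID, ',')
    CREATE TABLE students (vItem TEXT);
    CREATE TABLE sections (vItem TEXT);
    INSERT INTO students VALUES ('1'), ('2'), ('3'), ('4'), ('5');
    INSERT INTO sections VALUES ('a'), ('b');
""")

assignment = conn.execute("""
    SELECT st.vItem AS UserID, se.vItem AS Section
    FROM (SELECT vItem, row_number() OVER (ORDER BY vItem) - 1 AS r
          FROM students) st
    INNER JOIN (SELECT vItem, row_number() OVER (ORDER BY vItem) - 1 AS r
                FROM sections) se
            ON se.r = st.r % (SELECT count(*) FROM sections)  -- wrap around the sections
    ORDER BY st.r
""").fetchall()
```

This pairs students 1 to 5 with sections a, b, a, b, a. Because the values are text, multi-digit ids would sort as strings ('10' before '2'), so a real procedure would likely want to cast or order by the split position instead.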
