In an SQL join, how do I get the rows from the left table and only the aggregates of two columns from the right table? - sql

I am trying to SUM the quantity in the tpos table and count the distinct number of stores for each item that is in tpos.
For each row in inv_dtl there could be multiple rows in the tpos table. I would like to put together a script that gives me all the rows from the inv_dtl table plus two aggregate columns, SUM(tpos.quantity) and COUNT(DISTINCT tpos.store_number), computed over the tpos rows that match the join condition.
Here is what I have so far. The aggregates are working, but my output contains one row for every matching row in tpos.
For example, 1 row in inv_dtl could have 100 matching rows in tpos. My output should contain 1 row plus the two aggregate columns, but my current script generates 100 rows.
WITH FT1 As
(
SELECT * FROM inv_dtl WHERE inv_no IN (16084, 23456, 14789)
),
FT2 As
(
SELECT
FT1.*,
SUM(tpos.quantity) OVER (partition by tpos.item_id) As pos_qty,
DENSE_RANK() OVER (partition by tpos.store_number ORDER BY tpos.item_id ASC) +
DENSE_RANK() OVER (partition by tpos.store_number ORDER BY tpos.item_id DESC)
As unique_store_cnt
FROM FT1
LEFT JOIN tpos
ON tpos.item_id = FT1.ITEM_ID
And tpos.movement_date Between FT1.SDATE And FT1.EDATE
And tpos.store_number != 'CMPNY'
)
SELECT * FROM FT2 ORDER BY ITEM_ID

Just use a conventional GROUP BY, which reduces the output to one row per group. Since I have no idea which columns you want from inv_dtl, I have invented four (col1 to col4) as an example.
WITH
FT1 AS (
SELECT
col1, col2, col3, col4, ITEM_ID, SDATE, EDATE -- join columns must be selected too
FROM inv_dtl
WHERE inv_no IN (16084, 23456, 14789)
),
FT3 AS (
SELECT
FT1.col1, FT1.col2, FT1.col3, FT1.col4
, SUM(tpos.quantity) AS pos_qty -- plain aggregate, not a window function
, COUNT(DISTINCT tpos.store_number) AS unique_store_cnt
FROM FT1
LEFT JOIN tpos ON tpos.item_id = FT1.ITEM_ID
AND tpos.movement_date BETWEEN FT1.SDATE AND FT1.EDATE
AND tpos.store_number != 'CMPNY'
GROUP BY
FT1.col1, FT1.col2, FT1.col3, FT1.col4
)
SELECT
*
FROM FT3
ORDER BY col1, col2, col3, col4
Please note that RANK() and DENSE_RANK() can repeat numbers if rows are of "equal rank". To guarantee a unique integer per row, use ROW_NUMBER() instead.
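To see the GROUP BY approach produce one output row per inv_dtl row, here is a minimal sketch that runs the same shape of query through Python's built-in sqlite3 module. SQLite is only a stand-in for whatever database the question uses, and the column list and sample rows are invented for illustration.

```python
import sqlite3

# In-memory database with invented sample data (not from the original question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inv_dtl (inv_no INTEGER, item_id INTEGER, sdate TEXT, edate TEXT);
CREATE TABLE tpos (item_id INTEGER, store_number TEXT, movement_date TEXT, quantity INTEGER);
INSERT INTO inv_dtl VALUES (16084, 1, '2022-01-01', '2022-01-31'),
                           (23456, 2, '2022-01-01', '2022-01-31');
INSERT INTO tpos VALUES (1, 'S1',    '2022-01-05', 10),
                        (1, 'S1',    '2022-01-06', 5),
                        (1, 'S2',    '2022-01-07', 7),
                        (1, 'CMPNY', '2022-01-08', 100),  -- excluded by the join condition
                        (1, 'S3',    '2022-02-15', 9);    -- outside the date range
""")

# One row per inv_dtl row; the aggregates collapse the matching tpos rows.
rows = conn.execute("""
SELECT d.inv_no, d.item_id,
       SUM(t.quantity)                AS pos_qty,
       COUNT(DISTINCT t.store_number) AS unique_store_cnt
FROM inv_dtl AS d
LEFT JOIN tpos AS t
       ON t.item_id = d.item_id
      AND t.movement_date BETWEEN d.sdate AND d.edate
      AND t.store_number != 'CMPNY'
GROUP BY d.inv_no, d.item_id
ORDER BY d.item_id
""").fetchall()

for row in rows:
    print(row)
```

Because the join is a LEFT JOIN, the inv_dtl row with no matching tpos rows is still returned: its SUM is NULL and its distinct count is 0.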

Related

How to select first-n/top-n rows from a query's resultant if its count is more than a given number?

I have a query that returns more than 1000 rows.
Step1:
with total_res as (
select table1.col1, table1.col2, table2.col3,... table2.coln
from table1 join table2
on table1.keycol=table2.keycol
where table1.col4='ABCD' and table2.col5 <= '02-02-2022'
order by table1.col1 desc)
My requirement is to return the first 350 rows, ordered by col3 descending, if the output of the above query contains more than 1000 rows.
So I added a row number column, as below, to assign sequential numbers to the result set from above.
Step2:
select col1, col2, col2...coln, ROW_NUMBER() OVER (ORDER BY col3 desc) as number from total_res;
What I don't understand now is how I can check whether the output from step 2 contains more than 350 rows and, if so, select the first 350 rows.
Could anyone let me know how I can achieve this? Or is there a better way to do it than using ROW_NUMBER?
Try the following for the second step: check whether the highest row number is greater than 350, and then limit the output to 350 rows.
with cte as
(
select col1, col2, col2...coln, ROW_NUMBER() OVER (ORDER BY col3 desc) as number from total_res
) select * from cte
where 350 < ( select max(number) from cte)
order by number
limit 350
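The same pattern can be tried on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in database, a hypothetical helper named top_n_if_over, and an invented five-row total_res table. It takes the row-count threshold and the limit as separate parameters (the question mentions 1000 and 350; the answer above uses 350 for both).

```python
import sqlite3

# Invented stand-in for the total_res result set.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE total_res (col1 TEXT, col3 INTEGER);
INSERT INTO total_res VALUES ('a', 10), ('b', 50), ('c', 30), ('d', 20), ('e', 40);
""")

def top_n_if_over(conn, threshold, limit):
    """Return the first `limit` rows by col3 DESC, but only when the
    result set has more than `threshold` rows; otherwise return no rows."""
    return conn.execute("""
        WITH cte AS (
            SELECT col1, col3,
                   ROW_NUMBER() OVER (ORDER BY col3 DESC) AS rn
            FROM total_res
        )
        SELECT col1, col3
        FROM cte
        WHERE ? < (SELECT MAX(rn) FROM cte)  -- MAX(rn) equals the row count
        ORDER BY rn
        LIMIT ?
    """, (threshold, limit)).fetchall()

top3 = top_n_if_over(conn, threshold=4, limit=3)      # 5 rows > 4, so rows come back
nothing = top_n_if_over(conn, threshold=5, limit=3)   # 5 rows is not > 5, so empty
print(top3, nothing)
```

Note that when the threshold is not exceeded this query returns no rows at all, which is how the answer's query behaves as written.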

Need to sort first column by values in a dataset and then find average

My data set has 3 columns.
Column 1 has the values 1,1,3,4,5,5,6,7,7,7,7. I need to sort by this column and then apply an average.
1,1 means there are two rows with index 1. I need to average the values in the remaining columns, i.e. column 2 and column 3, across those rows.
The same applies to the rows with 5,5 and so on. I am able to sort but cannot manage the averaging.
ROW_NUMBER() should do the sorting for you, and (col1+col2+col3)/3 gives the average. For nullable columns you will need to adjust the code.
SELECT t1.rownumber, (t1.col1 + t2.col2 + t3.col3)/3 as "AVG"
FROM (SELECT ROW_NUMBER() OVER(ORDER BY col1 DESC) AS rownumber, col1 FROM MyTable) t1
INNER JOIN (SELECT ROW_NUMBER() OVER(ORDER BY col2 ASC) AS rownumber, col2 FROM MyTable) as t2 on t1.rownumber = t2.rownumber
INNER JOIN (SELECT ROW_NUMBER() OVER(ORDER BY col3 ASC) AS rownumber, col3 FROM MyTable) as t3 on t1.rownumber = t3.rownumber
Your question sounds like a convoluted way of describing aggregation. Is this what you want?
select col1, avg(col2), avg(col3)
from t
group by col1;
If you want the average on each row, then use window functions:
select col1,
avg(col2) over (partition by col1),
avg(col3) over (partition by col1)
from t;
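The difference between the two queries is easiest to see side by side. This is a small sketch using Python's built-in sqlite3 module as a stand-in database; the table name t matches the answer, while the sample values are invented.

```python
import sqlite3

# Invented sample data: col1 is the grouping key, col2/col3 are averaged.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 INTEGER);
INSERT INTO t VALUES (5, 2, 4), (1, 10, 100), (3, 5, 50), (1, 20, 200), (5, 4, 8);
""")

# GROUP BY: one output row per distinct col1 value.
grouped = conn.execute("""
SELECT col1, AVG(col2), AVG(col3)
FROM t
GROUP BY col1
ORDER BY col1
""").fetchall()

# Window functions: every input row is kept and carries its group's averages.
windowed = conn.execute("""
SELECT col1,
       AVG(col2) OVER (PARTITION BY col1),
       AVG(col3) OVER (PARTITION BY col1)
FROM t
ORDER BY col1
""").fetchall()

print(grouped)
print(windowed)
```

The grouped query returns three rows for five input rows, while the windowed query returns all five.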

How to intersect two tables without losing the duplicate values oracle

How to intersect two tables without losing the duplicate values in Oracle?
TAB1:
A
A
B
C
TAB2:
A
A
B
D
Output:
A
A
B
A subquery will filter the rows:
select *
from tab1
where col in (select col from tab2)
If I understand correctly:
select a.*, row_number() over (partition by col1 order by col1)
from a
intersect
select b.*, row_number() over (partition by col1 order by col1)
from b;
This adds a sequential number to each row within each group of equal values. INTERSECT then keeps each value only as many times as it occurs in both tables.
This uses partition by col1; the choice of col1 is arbitrary. You may need to include all columns in the partition by.
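The ROW_NUMBER() plus INTERSECT idea can be checked against the sample data from the question. The sketch below runs it through Python's built-in sqlite3 module rather than Oracle, so treat it as an illustration of the technique only; the column name col is an assumption.

```python
import sqlite3

# TAB1 and TAB2 as given in the question, each with a single assumed column "col".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab1 (col TEXT);
CREATE TABLE tab2 (col TEXT);
INSERT INTO tab1 VALUES ('A'), ('A'), ('B'), ('C');
INSERT INTO tab2 VALUES ('A'), ('A'), ('B'), ('D');
""")

# Numbering the duplicates makes each (col, rn) pair unique, so INTERSECT
# keeps a value once for every occurrence that exists in both tables.
result = conn.execute("""
SELECT col
FROM (
    SELECT col, ROW_NUMBER() OVER (PARTITION BY col ORDER BY col) AS rn FROM tab1
    INTERSECT
    SELECT col, ROW_NUMBER() OVER (PARTITION BY col ORDER BY col) AS rn FROM tab2
) AS x
ORDER BY col
""").fetchall()

print(result)
```

Here the first 'A' in each table becomes ('A', 1) and the second becomes ('A', 2), so both survive the intersection, while 'C' and 'D' have no counterpart.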

TSQL merge 2 datasets with an equal number of rows next to each other

What I am trying to accomplish:
Dataset 1
Name1
Name2
Name3
Dataset 2
Number1
Number2
Number3
will become 2 columns:
dataset1 dataset2
Name1 Number1
Name2 Number2
Name3 Number3
My datasets 1 & 2 will always have equal rows.
Which name linked to which number I don't care as long as two names are not linked to the same number and vice versa.
How can I solve this with SQL / SQL Server ?
If you don't want to add an identity column to the tables, you can use the ROW_NUMBER() function like this:
SELECT
T1.Col1,
T2.Col1
FROM
(SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS N FROM Table1) T1
INNER JOIN
(SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS N FROM Table2) T2
ON T1.N = T2.N
Here, replace Table1 and Table2 with the name of your tables, and replace Col1 with the name of the column (or columns) that you want to output from the two tables.
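As a quick check of the row-number pairing, here is a sketch that runs the same join through Python's built-in sqlite3 module. SQLite stands in for SQL Server here, and the table contents are the sample values from the question inserted in scrambled order.

```python
import sqlite3

# Table1/Table2 and Col1 are the placeholder names used in the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Col1 TEXT);
CREATE TABLE Table2 (Col1 TEXT);
INSERT INTO Table1 VALUES ('Name2'), ('Name1'), ('Name3');
INSERT INTO Table2 VALUES ('Number3'), ('Number1'), ('Number2');
""")

# Each side gets its own 1..n numbering; joining on that number pairs
# every row of Table1 with exactly one row of Table2.
pairs = conn.execute("""
SELECT T1.Col1, T2.Col1
FROM (SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS N FROM Table1) AS T1
INNER JOIN
     (SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS N FROM Table2) AS T2
  ON T1.N = T2.N
ORDER BY T1.N
""").fetchall()

print(pairs)
```

Since both numberings run from 1 to the same row count, no name is paired with two numbers and no number with two names.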
Add identity columns to both tables and join on those columns:
ALTER TABLE Table1
ADD ID INT IDENTITY(1,1) NOT NULL
ALTER TABLE Table2
ADD ID INT IDENTITY(1,1) NOT NULL
SELECT Table1.dataset1col , Table2.dataset2Col
From Table1 INNER JOIN Table2
ON Table1.ID = Table2.ID
This may work for you :
;WITH cte1 (name, rn)
AS (SELECT Name,
row_number()
OVER(
ORDER BY Name) rn
FROM Dataset1),
cte2 (Number, rn)
AS (SELECT Number,
row_number()
OVER(
ORDER BY Number) rn
FROM Dataset2)
SELECT name,
Number
FROM cte1
JOIN cte2
ON cte1.rn = cte2.rn
WITH Table1 AS
(
SELECT ROW_NUMBER() OVER (ORDER BY Dataset1) as Rnk, Dataset1
FROM TA1
),
Table2 AS
(
SELECT ROW_NUMBER() OVER (ORDER BY Dataset2) as Rnk, Dataset2
FROM TA2
)
Select Table1.Dataset1 as 'DataSet1', Table2.DataSet2 as 'DataSet2'
From Table1
inner join Table2 on Table1.Rnk = Table2.Rnk
Since you haven't given the table names, I have assumed TA1 and TA2.
Another way of writing the query is:
select row_number() over (order by Names asc) as rownum,
Names
into #Temp1
from NameTable
select row_number() over (order by Numbers asc) as rownum,
Numbers
into #Temp2
from NumberTable
select Names, Numbers
from #Temp1
inner join #Temp2 on #Temp1.rownum = #Temp2.rownum
There are 3 possible solutions to this.
First: Use the following trick (warning: use this only for small datasets)
SELECT DISTINCT tbl1.col1, tbl2.col2
FROM
(SELECT FirstName AS col1, ROW_NUMBER() OVER (ORDER BY FirstName) Number FROM dbo.[User]) tbl1
INNER JOIN
(SELECT LastName AS col2, ROW_NUMBER() OVER (ORDER BY LastName) Number FROM dbo.[User]) tbl2
ON tbl1.Number = tbl2.Number
Second: Use table variables to store the results temporarily. This solution is for relatively large datasets. (approximately hundreds of records)
Third: Add an identity field to both tables, as already mentioned by mmhasannn. I prefer this method least, as it requires modifying the DB structure.
RECOMMENDED: Use the table variables approach

Multiple rows match, but I only want one?

Sometimes I wish to perform a join whereby I take the largest value of one column. To do this I have to use MAX() with a GROUP BY, which prevents me from retrieving the other columns of the row that holds the maximum (because they are not contained in the GROUP BY or in an aggregate function).
To fix this, I join the max value back to the original data source to get the other columns. However, my problem is that this sometimes returns more than one row.
So far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2
If the above query returns three rows (which all match the largest value of Col2), I have a bit of a headache.
If there were an extra column, Col3, and of the rows returned by the above query I only wanted the one with, say, the minimum Col3 value, how would I do this?
If you are using SQL Server 2005+, you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
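As I understand it, it could be something like this:
SELECT tab2.* FROM
(SELECT Col1, Max(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2, Col3 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 AND tab1.Col2 = tab2.Col2
AND tab2.Col3 = (SELECT Min(t.Col3) FROM Table t WHERE t.Col1 = tab2.Col1 AND t.Col2 = tab2.Col2)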
Assuming you are using SQL Server 2005 or later, you can make use of window functions here. I have chosen ROW_NUMBER(), but it is not the only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to the GROUP BY in your subquery, and the ORDER BY determines the order in which the rows are numbered. In this case Col2 DESC starts with the highest value of Col2 (equivalent to your MAX).
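The question also asks how to break ties on Col3. One way to sketch that, assuming the smallest Col3 should win, is to add Col3 ASC as a second sort key inside the OVER clause. The example below uses Python's built-in sqlite3 module as a stand-in for SQL Server, a hypothetical table named t, and invented rows.

```python
import sqlite3

# Invented data: group 'x' has two rows tied on the maximum Col2 (5).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (Col1 TEXT, Col2 INTEGER, Col3 INTEGER);
INSERT INTO t VALUES ('x', 5, 9), ('x', 5, 2), ('x', 3, 1), ('y', 7, 4);
""")

# Row 1 in each Col1 partition is the row with the highest Col2;
# among ties on Col2, the lowest Col3 is numbered first.
best = conn.execute("""
WITH numbered AS (
    SELECT Col1, Col2, Col3,
           ROW_NUMBER() OVER (PARTITION BY Col1
                              ORDER BY Col2 DESC, Col3 ASC) AS RowNumber
    FROM t
)
SELECT Col1, Col2, Col3
FROM numbered
WHERE RowNumber = 1
ORDER BY Col1
""").fetchall()

print(best)
```

Because ROW_NUMBER() never repeats a number within a partition, exactly one row per Col1 value is returned even when Col2 is tied.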