Select multiple values from a table but group by one value - sql

I am trying to select multiple values from two tables, but I want to group by a single value. I have tried using max(value) in the select list, but max returns the greatest value in the group, not the specific one I need.
Here are my tables
The result i need is something like this
Result : HeadQuarterId - A, PropertyName - Name1, Amount - 102
HeadQuarterId - B, PropertyName - Name5, Amount - 30
Here is my query
SELECT Headquarterid,Max(PropertyName),sum(Amount)
FROM Table1 A LEFT OUTER JOIN Table2 B
ON A.PropertyId = B.PropertyId
GROUP BY Headquarterid
I used a left outer join so that I get all rows from the left table even when there is no matching row in the right table.
I also cannot use A.HeadquarterID = A.PropertyId in the where condition, since I have other dependencies on that table. Please suggest another way to achieve this result.

I think I understand. You want the headquarters with the maximum value, which happens to be A. If so:
select t1.*, sum(t2.amount) over () as total
from t1 left join
t2
on t2.PropertyId = t1.PropertyId
order by t2.amount desc
fetch first 1 row only;
Note: Not all databases support fetch first. It might be spelled limit or use select top (1) for instance.

I would recommend getting the headquarter name per ID in a CTE / subquery, then joining it back to T1 and left joining T1 to T2 in a second CTE / subquery. This way you can calculate your sums based on a single group:
WITH cte AS(
SELECT ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY CASE WHEN t1.ID = t1.PROPERTYID THEN 0 ELSE 1 END) rn, t1.ID, t1.Name
FROM t1
),
cte2 AS(
SELECT c.name cName, t1.*, t2.Value
FROM t1
INNER JOIN cte c ON c.ID = t1.ID AND c.rn = 1
LEFT JOIN t2 ON t1.Propertyid = t2.propertyid
)
SELECT c2.id, c2.cname, sum(c2.value) value
FROM cte2 c2
GROUP BY c2.id, c2.cname
See SQLFiddle for details: http://sqlfiddle.com/#!18/8bf66/13/2
Of course, you can also build the first CTE without row_number, simply by using WHERE ID = PROPERTYID; a matter of taste, I'd say.

Based on your sample data, you want window functions:
select distinct t1.HeadQuarterId,
max(t1.PropertyName) over (partition by t1.HeadQuarterId) as PropertyName,
sum(t2.amount) over (partition by t1.HeadQuarterId) as amount
from t1 left join
t2
on t2.PropertyId = t1.PropertyId;

This gave me the result I expected.
SELECT HQTRS1 AS headId,Max(LLORD1) AS headName, sum(Amount) AS amount
FROM
(SELECT DISTINCT HeadQuarterId AS HQTRS1, PropertyName AS LLORD1 FROM Table_1 WHERE HeadQuarterId = PropertyId) AS temp
INNER JOIN Table_1 AS A ON A.HeadQuarterId = temp.HQTRS1
LEFT OUTER JOIN Table_2 B
ON B.PropertyId = A.PropertyId
GROUP BY HQTRS1
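The sample tables are not shown in the question, so the following is only a minimal sketch on invented data, run through Python's built-in sqlite3 module so it can be tried without a database server. The rows in Table_1 and Table_2 are assumptions chosen to reproduce the expected result (A / Name1 / 102 and B / Name5 / 30), and the assumed rule is that the headquarter's own row is the one where HeadQuarterId = PropertyId, as in the query above.

```python
import sqlite3

# Invented sample data (an assumption; the question's tables are not shown).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_1 (HeadQuarterId TEXT, PropertyId TEXT, PropertyName TEXT);
CREATE TABLE Table_2 (PropertyId TEXT, Amount INTEGER);
INSERT INTO Table_1 VALUES
  ('A', 'A',  'Name1'), ('A', 'P2', 'Name2'), ('A', 'P3', 'Name3'),
  ('B', 'B',  'Name5'), ('B', 'P6', 'Name6');
INSERT INTO Table_2 VALUES ('A', 2), ('P2', 50), ('P3', 50), ('B', 30);
""")

# MAX(PropertyName) picks the alphabetically greatest name in each group,
# which is not necessarily the headquarter's own name.
naive_rows = conn.execute("""
SELECT a.HeadQuarterId, MAX(a.PropertyName), SUM(b.Amount)
FROM Table_1 AS a
LEFT OUTER JOIN Table_2 AS b ON b.PropertyId = a.PropertyId
GROUP BY a.HeadQuarterId
ORDER BY a.HeadQuarterId
""").fetchall()

# Look up the headquarter's own name first, then aggregate over its properties.
fixed_rows = conn.execute("""
SELECT hq.HeadQuarterId, hq.PropertyName, SUM(b.Amount) AS Amount
FROM (SELECT HeadQuarterId, PropertyName
      FROM Table_1
      WHERE HeadQuarterId = PropertyId) AS hq
INNER JOIN Table_1 AS a ON a.HeadQuarterId = hq.HeadQuarterId
LEFT OUTER JOIN Table_2 AS b ON b.PropertyId = a.PropertyId
GROUP BY hq.HeadQuarterId, hq.PropertyName
ORDER BY hq.HeadQuarterId
""").fetchall()

print(naive_rows)  # [('A', 'Name3', 102), ('B', 'Name6', 30)]
print(fixed_rows)  # [('A', 'Name1', 102), ('B', 'Name5', 30)]
```

On this data the sums are the same in both queries; only the name column differs, which is the symptom described in the question.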

Related

How to implement a LEFT OUTER JOIN CLAUSE after WITH AS?

Currently trying to figure out how to implement a SQL LEFT OUTER JOIN while using the SQL WITH AS clause. My code breaks down into 3 SELECT statements while using the same table, then using LEFT OUTER JOIN to merge another table on the id.
I need 3 SELECT statements before joining because I need a SELECT statement to grab the needed columns, ROW RANK the time, and set WHERE clause for the ROW RANK.
SELECT *
(
WITH employee AS
(
SELECT id, name, department, code, time, reporttime, scheduled_time
FROM table1 AS a
WHERE department = "END"
),
employe_v2 as
(
SELECT address
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY time desc, reporttime desc, scheduled_time desc) AS row_rank
FROM table1 AS b
)
SELECT *
FROM employee, employee_v2
WHERE row_rank = 1
) t1
LEFT OUTER JOIN
(
SELECT b.id, b.new_code, b.date
FROM table2 AS b
WHERE b.newcode != "A"
) t2
ON t1.id = t2.id
Group BY t1.id, t1.name, t1.department, t1.code, t1.time, t1.reporttime,
t1.scheduled_time, t1.row_rank, t2.id, t2.new_code, t2.date
How could I fix my code?
I'm not sure the group by is needed; I see no aggregation whatsoever.
If it is something you need, you can add it at the end of the final select, and of course you then have to take care of the columns/aggregation in the select list.
In any case, you can simplify your query as below:
with employee as (
select * from (
select id, name, department, code, time, reporttime, scheduled_time, address
,row_number() over (partition by id order by time desc, reporttime desc, scheduled_time desc) AS row_rank
from table1
) t where row_rank =1
)
select t1.*, t2.id, t2.new_code, t2.date
from employee t1
left join table2 as t2
on t1.id = t2.id
and t2.newcode != "A"
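One detail worth checking in a query of this shape is where a filter on the right-hand table sits. A condition in the WHERE clause is applied after the join, so it also discards the NULL-padded rows and the left join behaves like an inner join; the same condition in the ON clause keeps every left-hand row. The sketch below illustrates this with Python's built-in sqlite3 module; the table contents and the reduced column lists are invented for the example.

```python
import sqlite3

# Tiny invented data set: employee 3 has no row in table2 at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER, name TEXT);
CREATE TABLE table2 (id INTEGER, new_code TEXT);
INSERT INTO employee VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
INSERT INTO table2 VALUES (1, 'A'), (2, 'B');
""")

# Filter in WHERE: rows whose new_code is 'A' or NULL are removed after the join.
where_rows = conn.execute("""
SELECT e.id, t2.new_code
FROM employee AS e
LEFT JOIN table2 AS t2 ON e.id = t2.id
WHERE t2.new_code != 'A'
ORDER BY e.id
""").fetchall()

# Filter in ON: only the match is suppressed, the employee row survives.
on_rows = conn.execute("""
SELECT e.id, t2.new_code
FROM employee AS e
LEFT JOIN table2 AS t2 ON e.id = t2.id AND t2.new_code != 'A'
ORDER BY e.id
""").fetchall()

print(where_rows)  # [(2, 'B')]
print(on_rows)     # [(1, None), (2, 'B'), (3, None)]
```

The ON form matches the original question's intent, where table2 was filtered in a subquery before the left outer join.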

How to Group By all fields nested tables in a Left Join query in BigQuery?

I have about 10 tables that I combine into one big nested table in rounds, using the following query:
R1 AS(
SELECT ANY_VALUE(Table1).*, ARRAY_AGG(( SELECT AS STRUCT Table2.* EXCEPT(ID))) AS Table2
FROM Table1 LEFT JOIN Table2 USING(ID)
GROUP BY Table1.ID),
R2 AS(
SELECT ANY_VALUE(R1).*, ARRAY_AGG(( SELECT AS STRUCT Table3.* EXCEPT(ID))) AS Table3
FROM R1 LEFT JOIN Table3 USING(ID)
GROUP BY R1.ID),
...
SELECT ANY_VALUE(R9).*, ARRAY_AGG(( SELECT AS STRUCT Table10.* EXCEPT(ID))) AS Table10
FROM R9 LEFT JOIN Table10 USING(ID)
The thing is that for example in my first table I can have two records with the same ID but some other fields will be different and I want to consider them as two distinct records and thus group by all the fields of the table while I join.
Then I want to do the same with all the "sub-tables" (the R tables in the query), so that I will be able to group by all the fields of the nested tables.
How can I do it easily?
I tried GROUP BY Table1.* but it doesn't work...
Thank you in advance
Try to_json_string:
...
FROM Table1 t1
...
GROUP BY to_json_string(t1)
You seem to want something like this:
select *
from table1 t1 left join
(select t2.*
from table2 t2
where true
qualify row_number() over (partition by t2.id order by t2.id) = 1
) t2
using (id)
This uses qualify instead of group by to fetch one row.
If you don't want all rows from table1, you can whittle them down as well:
select *
from (select t1.*
from table1 t1
where true
qualify row_number() over (partition by id, col1, col2 order by id) = 1
) t1 left join
(select t2.*
from table2 t2
where true
qualify row_number() over (partition by t2.id order by t2.id) = 1
) t2
using (id)
How to Group By all fields ...?
I tried GROUP BY Table1.* but it doesn't work...
Consider below example
SELECT ANY_VALUE(t1).*,
ARRAY_AGG(( SELECT AS STRUCT t2.* EXCEPT(ID))) AS Table2
FROM Table1 t1 LEFT JOIN Table2 t2 USING(ID)
GROUP BY FORMAT('%t', t1)

Need assistance in rewriting this query

We have this query in production, which runs daily.
It does a lot of joins and also uses window functions in Hive.
We tried adding a few set options, but that did not help much.
The structure is something like this:
SELECT
C.f1, C.f2, A.f2 ...
FROM (
SELECT * FROM (
SELECT T1.*, B.atid, B.a_id,
ROW_NUMBER() OVER (PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC) AS RANK_
FROM T1 AS T1
JOIN T5 ON T1.t_dt = T5.t_dt
JOIN T2 B ON T1.wtid = B.wtid and T1.b_ts = B.b_ts
LEFT OUTER JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
ON T1.TYP = PV.p_cd
WHERE T1.state not in ("INVALID")
AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
AND ISNULL(PV.p_cd)
) T
WHERE T.rank_ = 1
) A
JOIN (SELECT *, row_number() over (partition by ac_id order by b_ts desc) rank_
FROM T4
WHERE event not in ('CT','UPD')
) AS C
ON A.a_id = C.a_id
AND A.atid = C.ac_id
AND C.rank_ = 1
JOIN T6 ON C.t_dt = T6.t_dt
As I cannot drop any of the tables (or joins), my approach was to replace the window function with another join using the aggregate function max, but I was not able to rewrite it.
I am also not sure whether that would actually improve performance, so any guidance would help.
Analytic functions usually perform better than joins with select max, because with an analytic function you read the same table only once, and the row_number calculation is parallelized by the partition by key.
Try to regroup joins and filtering.
This join
LEFT OUTER JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
ON T1.TYP = PV.p_cd
combined with the where condition ISNULL(PV.p_cd) removes some rows from T1.
These conditions do the same:
WHERE T1.state not in ("INVALID")
AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
Move this join into the subquery. If it filters a lot, this may help to reduce the T1 dataset before all the other joins and row_number():
(select T1.* from T1
left join (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
ON T1.TYP = PV.p_cd
where T1.state not in ("INVALID")
AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
AND ISNULL(PV.p_cd)
) as T1
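The LEFT JOIN plus IS NULL filter in this subquery is an anti-join: it keeps only the T1 rows that have no match in the filtered T3. The sketch below, run with Python's built-in sqlite3 module rather than Hive, shows that behaviour on invented rows and compares it with an equivalent NOT EXISTS form. The id column, the data, and the use of IS NULL in place of Hive's ISNULL() function are assumptions made for the example; it says nothing about performance on a real cluster.

```python
import sqlite3

# Invented rows: only TYP = 'ORIG' has a T3 match with PV_TY_CD = 'ORIG_CD'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (id INTEGER, TYP TEXT, state TEXT);
CREATE TABLE T3 (p_cd TEXT, PV_TY_CD TEXT);
INSERT INTO T1 VALUES (1, 'X', 'OK'), (2, 'ORIG', 'OK'),
                      (3, 'Y', 'INVALID'), (4, 'Z', 'OK');
INSERT INTO T3 VALUES ('ORIG', 'ORIG_CD'), ('Z', 'OTHER');
""")

# Anti-join: keep T1 rows for which the left join found no partner.
anti_join_rows = conn.execute("""
SELECT T1.id
FROM T1
LEFT JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') AS PV
  ON T1.TYP = PV.p_cd
WHERE T1.state NOT IN ('INVALID')
  AND PV.p_cd IS NULL
ORDER BY T1.id
""").fetchall()

# The same filter written with NOT EXISTS.
not_exists_rows = conn.execute("""
SELECT T1.id
FROM T1
WHERE T1.state NOT IN ('INVALID')
  AND NOT EXISTS (SELECT 1 FROM T3
                  WHERE T3.PV_TY_CD = 'ORIG_CD' AND T3.p_cd = T1.TYP)
ORDER BY T1.id
""").fetchall()

print(anti_join_rows)   # [(1,), (4,)]
print(not_exists_rows)  # [(1,), (4,)]
```

Row 2 is dropped by the anti-join and row 3 by the state filter, so only the surviving rows would reach the later joins and the row_number() step.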
Also, the first row_number is calculated only on the T1 and B tables:
PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC
Consider joining the T5 table after the row_number filter. If this join is heavy and the row_number filter reduces the dataset, wrap the row_number and its filter in a subquery again and join the filtered subquery with T5:
(--filtered by row_number
select * from
(
SELECT T1.*, B.atid, B.a_id,
ROW_NUMBER() OVER (PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC) AS RANK_
from
(select T1.* from T1
left join (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
ON T1.TYP = PV.p_cd
where T1.state not in ("INVALID")
AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
AND ISNULL(PV.p_cd)
) as T1 JOIN T2 B ON T1.wtid = B.wtid and T1.b_ts = B.b_ts
) T WHERE T.rank_ = 1
) T --filtered
JOIN T5 ON T.t_dt = T5.t_dt
This may help depending on your data.
Read also: https://stackoverflow.com/a/51061613/2700344

Postgresql SELECT LEFT JOIN with columns on case

SELECT a1,a2,a3,a4,count(a5),b1,b2,b3
FROM table1
LEFT JOIN table2 ON a1=b1 AND a2=b2 (*here I need to also join on
the next column, a3=b3, only if table2 would return more than 1 record;
otherwise the first 2 columns are enough*)
group by a1,a2,a3,a4,a5,b1,b2,b3
Does anybody know how to do this?
Well, if I understand correctly:
FROM table1 t1 LEFT JOIN
(SELECT t2.*, COUNT(*) OVER (PARTITION BY b1, b2) as cnt
FROM table2 t2
) t2
ON t1.a1 = t2.b1 AND t1.a2 = t2.b2 AND
(t2.cnt = 1 OR t1.a3 = t2.b3)
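To see how a count-based join condition of this kind behaves, here is a small sketch using Python's built-in sqlite3 module instead of PostgreSQL (it needs an SQLite build with window functions, 3.25 or later). The rows are invented, and the select list is cut down to three columns for readability; these are assumptions for the illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (a1 INTEGER, a2 INTEGER, a3 TEXT);
CREATE TABLE table2 (b1 INTEGER, b2 INTEGER, b3 TEXT);
INSERT INTO table1 VALUES (1, 1, 'x'), (2, 2, 'y'), (3, 3, 'z');
-- (1,1) has a single candidate; (2,2) has two; (3,3) has none.
INSERT INTO table2 VALUES (1, 1, 'q'), (2, 2, 'y'), (2, 2, 'w');
""")

# cnt is the number of table2 rows sharing (b1, b2); the third column is
# only compared when there is more than one such row.
cond_rows = conn.execute("""
SELECT t1.a1, t1.a3, t2.b3
FROM table1 t1 LEFT JOIN
     (SELECT t2.*, COUNT(*) OVER (PARTITION BY b1, b2) AS cnt
      FROM table2 t2
     ) t2
     ON t1.a1 = t2.b1 AND t1.a2 = t2.b2 AND
        (t2.cnt = 1 OR t1.a3 = t2.b3)
ORDER BY t1.a1
""").fetchall()

print(cond_rows)  # [(1, 'x', 'q'), (2, 'y', 'y'), (3, 'z', None)]
```

The first row joins even though a3 and b3 differ, because it is the only candidate; the second row joins only to the candidate whose b3 matches; the third has no match and keeps NULL.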

tsql: alternative to select subquery in join

this is my table layout simplified:
table1: pID (pkey), data
table2: rowID (pkey), pID (fkey), data, date
I want to select some rows from table1 joining one row from table2 per pID for the most recent date for that pID.
I currently do this with the following query:
SELECT * FROM table1 as a
LEFT JOIN table2 AS b ON b.rowID = (SELECT TOP(1) rowID FROM table2 WHERE pID = a.pID ORDER BY date DESC)
This approach is slow, probably because it has to run the subquery for each row of table1. Is there a way to improve the performance, or another way to do this?
You can try something along these lines: use a subquery to get the latest date per pID (grouping by pID), then join that with the first table. This way the subquery does not have to be executed for each row of Table1, which should result in better performance:
Select *
FROM Table1 a
INNER JOIN
(
SELECT pID, Max(Date) AS MaxDate FROM Table2
GROUP BY pID
) b
ON a.pID = b.pID
I have provided sample SQL for one column using GROUP BY; in case you need additional columns, add them to the GROUP BY clause. Hope this helps.
Use the code below, and note that I added order by Date desc to get the most recent data:
select *
from table1 a
inner join table2 b on a.pID=b.pID
where b.rowID in (select top(1) t.rowID from table2 t where t.pID=a.pID order by Date desc)
I am using the code below in a similar scenario (I adapted it to your example):
SELECT b.*
FROM table1 AS a
left outer join (
SELECT a.*
FROM table2 a
inner join (
SELECT pID, max(date) as date
FROM table2
WHERE date <= <max_date>
group by pID
) b ON a.pID = b.pID AND a.date = b.date
) b ON a.pID = b.pID
The only problem with this approach is that you have to make sure the dates don't repeat within a pID.
You can do this with the row_number() function and a subquery:
SELECT t1.*
FROM table1 t1 LEFT JOIN
(select t2.*, row_number() over (partition by pId order by rowId desc) as seqnum
from table2 t2
) t2
on t1.pId = t2.pId and t2.seqnum = 1;
Use the ROW_NUMBER() function to add a column that marks which row in table2 is the first for each pID (partitioned by pID and ordered by rowDate descending).
Example:
WITH cte AS
(
SELECT
rowID AS t2RowId,
ROW_NUMBER() OVER (PARTITION BY pID ORDER BY rowDate DESC) AS rowNum
FROM table2 t2
) -- gets the t2RowIds + a column which says which is the latest for each pID
SELECT t1.*, t2.*
FROM table1 t1
LEFT JOIN
(
table2 t2
JOIN cte ON t2.rowID = cte.t2RowId AND cte.rowNum = 1
) ON t1.pID = t2.pID
This is guaranteed to return only one row from table2 per pID, even if multiple rows have the same date. You should of course make sure that the date column in table2 is indexed for good performance (ideally with an index that also covers the PrimaryID of table2).
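The difference between the max-date join and the ROW_NUMBER() approach shows up when two rows share the latest date. The following sketch uses Python's built-in sqlite3 module instead of SQL Server (it needs SQLite 3.25 or later for window functions). The rows, the rowDate column name, and the extra rowID DESC tie-breaker are assumptions added for the illustration; without a tie-breaker, which of the tied rows gets number 1 is not defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (pID INTEGER PRIMARY KEY, data TEXT);
CREATE TABLE table2 (rowID INTEGER PRIMARY KEY, pID INTEGER,
                     data TEXT, rowDate TEXT);
INSERT INTO table1 VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
-- pID 2 has two rows with the same latest date; pID 3 has no rows.
INSERT INTO table2 VALUES
  (10, 1, 'old',  '2020-01-01'), (11, 1, 'new',  '2020-02-01'),
  (20, 2, 'tieA', '2020-03-01'), (21, 2, 'tieB', '2020-03-01');
""")

# Join on MAX(rowDate): every row carrying the latest date is returned.
max_join_rows = conn.execute("""
SELECT a.pID, b.rowID
FROM table1 a
LEFT JOIN (SELECT t.*
           FROM table2 t
           JOIN (SELECT pID, MAX(rowDate) AS rowDate
                 FROM table2 GROUP BY pID) m
             ON m.pID = t.pID AND m.rowDate = t.rowDate) b
  ON b.pID = a.pID
ORDER BY a.pID, b.rowID
""").fetchall()

# ROW_NUMBER(): exactly one row per pID is numbered 1.
row_number_rows = conn.execute("""
SELECT a.pID, b.rowID
FROM table1 a
LEFT JOIN (SELECT t.*,
                  ROW_NUMBER() OVER (PARTITION BY pID
                                     ORDER BY rowDate DESC, rowID DESC) AS rn
           FROM table2 t) b
  ON b.pID = a.pID AND b.rn = 1
ORDER BY a.pID
""").fetchall()

print(max_join_rows)    # [(1, 11), (2, 20), (2, 21), (3, None)]
print(row_number_rows)  # [(1, 11), (2, 21), (3, None)]
```

With the max-date join, pID 2 appears twice; with ROW_NUMBER() it appears once, and the left join still keeps pID 3 with NULL.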