Combining records using Hive window frame - sql

I have the following query, which runs in Hive for a single date. The table, however, has a massive number of rows, with dudate values from 2000 onwards. I want to use a window function over dudate to pick up the previous day and the previous week. In other words, I am trying to convert this query so that it covers all dates dynamically using window functions.
SELECT R.rmid as rid, '2021-05-30' as pdate, Sum(odn) as Odn, Sum(gdn) as gdn, Sum(wdn) as wdn,0 as podn,0 as pgdn,0 as pwdn,0 as aodn,0 as agdn,0 as awdn
FROM table1 as cd
inner Join table2 c on c.mid = cd.mid
inner join table3 rsi on rsi.omid = c.mid and rsi.omtype = 1
inner join table4 rs on rs.rsmid = rsi.rsmid
inner join table5 r on r.rmid = rs.rmid
WHERE cd.dudate = date('2021-05-30')
GROUP BY R.rmid,pdate
Union
SELECT R.rmid as rid,'2021-05-30' as pdate, 0 as Odn,0 as gdn,0 as wdn,Sum(odn) as podn, Sum(gdn) as pgdn, Sum(wdn) as pwdn,0 as aodn,0 as agdn,0 as awdn
FROM table1 as cd
inner Join table2 c on c.mid = cd.mid
inner join table3 rsi on rsi.omid = c.mid and rsi.omtype = 1
inner join table4 rs on rs.rsmid = rsi.rsmid
inner join table5 r on r.rmid = rs.rmid
WHERE cd.dudate = date_add('2021-05-30', -1)
GROUP BY R.rmid,pdate
Union
SELECT R.rmid as rid,'2021-05-30' as pdate,0 as Odn,0 as gdn,0 as wdn,0 as podn,0 as pgdn,0 as pwdn,Sum(odn)/7 as aodn, Sum(gdn)/7 as agdn, Sum(wdn)/7 as awdn
FROM table1 cd
inner Join table2 c on c.mid = cd.mid
inner join table3 rsi on rsi.omid = c.mid and rsi.omtype = 1
inner join table4 rs on rs.rsmid = rsi.rsmid
inner join table5 r on r.rmid = rs.rmid
WHERE cd.dudate BETWEEN date_add('2021-05-30', -7) AND date_add('2021-05-30', -1)
GROUP BY R.rmid
Basically, the query combines the rows for the given date, the previous day, and the 7 days before it.
Can anyone suggest how to combine these values using window frames?
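One possible direction, as a rough sketch rather than a tested solution: aggregate once per rid and dudate, then read the previous day and the preceding 7 days through RANGE frames. The CTE names daily and numbered, the helper column day_no, and the anchor date '2000-01-01' are assumptions introduced here for illustration. It is also assumed that the Hive version in use accepts the WINDOW clause and RANGE frames with numeric offsets on a single integer ordering key.

```sql
-- Sketch only: one row per rid and date instead of three UNION branches.
WITH daily AS (
    -- Same joins as the original query, but grouped by every dudate.
    SELECT r.rmid    AS rid,
           cd.dudate AS pdate,
           SUM(odn)  AS odn,
           SUM(gdn)  AS gdn,
           SUM(wdn)  AS wdn
    FROM table1 cd
    INNER JOIN table2 c   ON c.mid = cd.mid
    INNER JOIN table3 rsi ON rsi.omid = c.mid AND rsi.omtype = 1
    INNER JOIN table4 rs  ON rs.rsmid = rsi.rsmid
    INNER JOIN table5 r   ON r.rmid = rs.rmid
    GROUP BY r.rmid, cd.dudate
),
numbered AS (
    -- day_no (hypothetical helper): days since an arbitrary anchor date,
    -- so the RANGE frames below count calendar days, not rows.
    SELECT d.*, datediff(d.pdate, '2000-01-01') AS day_no
    FROM daily d
)
SELECT rid, pdate, odn, gdn, wdn,
       COALESCE(SUM(odn) OVER prev_day, 0)      AS podn,
       COALESCE(SUM(gdn) OVER prev_day, 0)      AS pgdn,
       COALESCE(SUM(wdn) OVER prev_day, 0)      AS pwdn,
       COALESCE(SUM(odn) OVER prev_week, 0) / 7 AS aodn,
       COALESCE(SUM(gdn) OVER prev_week, 0) / 7 AS agdn,
       COALESCE(SUM(wdn) OVER prev_week, 0) / 7 AS awdn
FROM numbered
WINDOW prev_day  AS (PARTITION BY rid ORDER BY day_no
                     RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING),
       prev_week AS (PARTITION BY rid ORDER BY day_no
                     RANGE BETWEEN 7 PRECEDING AND 1 PRECEDING);
```

Two differences from the UNION version are worth checking against the real data. This returns a single row per rid and date rather than up to three. It also produces a row only for dates on which that rid has data, so a date with previous-day rows but no current-day rows would not appear.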

Related

Best Join Strategy/Indexes for SQL Server

What is the best join strategy/indexes for this query:
SELECT
kwk.*, an.AuftragDatum, an.AbgabeDatum, an.BezahltDatum, an.AuftragStatus
FROM
KundenWerbenKunden kwk
INNER JOIN
Auftrag an ON an.AuftragNummer = kwk.AuftragNummer
WHERE
kwk.Deleted = 0
Table KundenWerbenKunden has 103950 rows with 103646 Deleted = 0 ones.
Table Auftrag has 3826552 rows.
In my real query I make some more joins:
INNER JOIN
Filiale fn WITH (NOLOCK) ON an.FilialeID = fn.FilialeID
INNER JOIN
Kunde kn ON an.KundeID = kn.KundeID
OUTER APPLY
(SELECT DISTINCT KSKNr
FROM KdZuordnung
WHERE KundeID = kn.KundeID) zn
LEFT JOIN
Anrede ann WITH (NOLOCK) ON kn.Anrede = ann.Anrede
INNER JOIN
AuftragArt aa WITH (NOLOCK) ON an.AuftragArtID = aa.AuftragArtID
INNER JOIN
AuftragGrund ag WITH (NOLOCK) ON an.AuftragGrundID = ag.AuftragGrundID
INNER JOIN
AuftragType at WITH (NOLOCK) ON an.AuftragTypeID = at.AuftragTypeID
For this query:
SELECT *
FROM KundenWerbenKunden kwk INNER JOIN
Auftrag an
ON an.AuftragNummer = kwk.AuftragNummer
WHERE kwk.Deleted = 0;
Not knowing anything else about the data, I would first try indexes on KundenWerbenKunden(Deleted, AuftragNummer) and Auftrag(AuftragNummer).
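The two suggested indexes could be created roughly as follows. The index names are made up for illustration, and the filter column is written as Deleted, as in the question. This assumes that no equivalent indexes (for example, a primary key on Auftrag.AuftragNummer) already exist.

```sql
-- Hypothetical index names; adjust to the local naming convention.
-- Covers the filter on Deleted and then the join key.
CREATE INDEX IX_KundenWerbenKunden_Deleted_AuftragNummer
    ON KundenWerbenKunden (Deleted, AuftragNummer);

-- Supports the lookup into the large Auftrag table by join key.
CREATE INDEX IX_Auftrag_AuftragNummer
    ON Auftrag (AuftragNummer);
```

Since almost all rows in KundenWerbenKunden have Deleted = 0 (103646 of 103950), the leading Deleted column filters out very little, so the index on Auftrag(AuftragNummer) is likely the one that matters more.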

summation in sql

I have a query result and would like to sum the billed amount so that the query returns one row with the distinct account, balance, summed billed amount, fPreviousReading, fCurrentReading, and fConsumption.
The result should be:
1. account 11074
2. balance 269.49
3. sumbilledamount 520.48
4. fPreviousReading 574
5. fCurrentReading 612
6. fConsumption 38
Thanks
Query:
select
Ten.Account,
ten.DCBalance AS Balance,
SUM(T.fInclusiveAmount)AS BilledAmount,
MRD.fPreviousReading,
MRD.fCurrentReading ,
MRD.fConsumption ,
T.cDescription
from _mtblTransactions T
left join _mtblProperties P ON P.idProperty = T.iPropertyID
left join _mtblPropertyPortions PP ON PP.idPropertyPortions = T.iPortionID
left join _mtblPropertyPortionServices PPS ON PPS.idPropertyPortionServices = T.iPropertyPortionServiceID
left join _mtblCategories Cat ON Cat.idCategory = PP.iPortionUsageID
left join _mtblServices S ON PPS.iPortionServiceID = S.idService
left join _mtblServiceGroups SG ON S.iServiceGroupID = SG.idServiceGroup
left join _mtblRateTariffs RT ON RT.idRateTariffs = PPS.iServiceRateTariffID
left join Client Ten ON T.iCustomerID = Ten.DCLink
left join _mtblMeters M ON PPS.iPropertyPortionMeterID = M.idMeter
left join _mtblWalkDetails WD ON WD.iWalkMeterID = PPS.iPropertyPortionMeterID
left join _mtblWalks W ON WD.iWalkID = W.idWalk
left join Client Own ON P.iPropertyOwnerID = Own.DCLink
left outer join _mtblRegions R on R.idRegions = P.iPropertyRegionID
left outer join _mtblSubRegions SR on SR.idSubRegions = P.iPropertySubRegionID
left outer join _mtblAreas A on A.idAreas = P.iPropertyAreaID
left join _etblPeriod PER ON T.iPeriodID = PER.idPeriod
left join _mtblMeterReadingDetails MRD ON T.iMeterID = MRD.iMeterReadingsMeterID and T.iPeriodID = MRD.iBillingPeriodID
and MRD.iReadingType=0
Where
oWN.Account='11074'
and idPeriod='79'
GROUP BY Ten.Account,ten.DCBalance,MRD.fPreviousReading, MRD.fCurrentReading, MRD.fConsumption, T.cDescription
Since I don't know your data, the query below may return some duplicate rows, but that is easy to handle.
Try it nevertheless:
SELECT t1.Account,
t1.Balance,
t1.BilledAmount,
t1.fPreviousReading,
t1.fCurrentReading,
t1.fConsumption,
t2.cDescription
FROM (SELECT Ten.Account,
ten.DCBalance AS Balance,
SUM (T.fInclusiveAmount) AS BilledAmount,
SUM (MRD.fPreviousReading) AS fPreviousReading,
SUM (MRD.fCurrentReading) AS fCurrentReading,
SUM (MRD.fConsumption) AS fConsumption
FROM _mtblTransactions T
left join _mtblProperties P ON P.idProperty = T.iPropertyID
left join Client Ten ON T.iCustomerID = Ten.DCLink
left join Client Own ON P.iPropertyOwnerID = Own.DCLink AND oWN.Account='11074'
left join _etblPeriod PER ON T.iPeriodID = PER.idPeriod
left join _mtblMeterReadingDetails MRD ON T.iMeterID = MRD.iMeterReadingsMeterID and T.iPeriodID = MRD.iBillingPeriodID
and MRD.iReadingType=0
and idPeriod='79'
GROUP BY Ten.Account,ten.DCBalance) t1
JOIN (SELECT T.cDescription,
Ten.Account,
ten.DCBalance
FROM _mtblTransactions T
left join Client Ten ON T.iCustomerID = Ten.DCLink) t2 ON t2.Account = t1.Account AND t2.DCBalance = t1.Balance
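A simpler alternative, offered only as a sketch: the original query groups by T.cDescription, so each distinct description yields its own row. If the description is not needed in the output, dropping it from both the SELECT list and the GROUP BY may already collapse the result to one row per account. The version below also omits the joins that none of the selected columns or filters depend on. That is an assumption about the data, as is the idea that all transactions for the account share the same meter readings.

```sql
-- Sketch: same filter as the question, without cDescription.
SELECT Ten.Account,
       Ten.DCBalance           AS Balance,
       SUM(T.fInclusiveAmount) AS BilledAmount,
       MRD.fPreviousReading,
       MRD.fCurrentReading,
       MRD.fConsumption
FROM _mtblTransactions T
LEFT JOIN _mtblProperties P ON P.idProperty = T.iPropertyID
LEFT JOIN Client Ten ON T.iCustomerID = Ten.DCLink
LEFT JOIN Client Own ON P.iPropertyOwnerID = Own.DCLink
LEFT JOIN _etblPeriod PER ON T.iPeriodID = PER.idPeriod
LEFT JOIN _mtblMeterReadingDetails MRD
       ON T.iMeterID = MRD.iMeterReadingsMeterID
      AND T.iPeriodID = MRD.iBillingPeriodID
      AND MRD.iReadingType = 0
WHERE Own.Account = '11074'
  AND PER.idPeriod = '79'
GROUP BY Ten.Account, Ten.DCBalance,
         MRD.fPreviousReading, MRD.fCurrentReading, MRD.fConsumption;
```

If some transactions have no matching meter reading, the reading columns will be NULL for those and a second row will appear. In that case wrapping the three reading columns in MAX() and removing them from the GROUP BY is one way to merge the rows.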

SQL Multiple Joins not working as expected

The following query does not work when I try to join all 4 tables. It runs for over an hour, and I eventually have to kill it without any data being returned.
It works when Tables 1, 2 and 3 are joined, and also when Tables 1, 2 and 4 are joined, but not when I attempt to join all 4 tables as below.
Select * From
(Select
R.ID, R.MId, R.RId, R.F_Name, R.F_Value, FE.FullEval, M.Name, RC.CC
FROM Table1 as R
Inner Join Table2 FE
ON R.ID = FE.RClId and R.MId = FE.MId and R.RId = FE.RId
Inner Join Table3 as M
ON R.MId = M.MId and FE.MId = M.MId
Inner Join Table4 as RC
ON R.RId = RC.RId and FE.RId = RC.RId and FE.Date = RC.Date
) AS a
NOTE:
1) RId is not available in table3.
2) MId is not available in table4.
Thanks for help.
Since you mentioned that you don't have permission to view the query plan, try breaking the query down into one join per step. That lets you check which join is slow to retrieve records, and from there you can investigate the data to see why. One possible cause is the missing key columns in Table3 and Table4 (RId and MId respectively).
WITH Tab1_2 AS
(SELECT r.ID, r.MId, r.RId, r.F_Name, r.F_Value, fe.FullEval, fe.date
FROM Table1 as r
INNER JOIN Table2 fe
ON r.ID = fe.RClId
AND r.MId = fe.MId
AND r.RId = fe.RId
WHERE ... -- place your conditions if any
),
Tab12_3 AS
(SELECT t12.*, m.Name
FROM Tab1_2 t12
INNER JOIN Table3 as m
ON t12.MId = m.MId
WHERE ... -- place your conditions if any
),
Tab123_4 AS
(SELECT t123.ID, t123.MId, t123.RId, t123.F_Name, t123.F_Value, t123.FullEval, t123.Name, rc.CC
FROM Tab12_3 t123
INNER JOIN Table4 as rc
ON t123.RId = rc.RId
AND t123.Date = rc.Date
WHERE ... -- place your conditions if any
)
SELECT *
FROM Tab123_4 t1234
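When investigating the data, one thing worth checking is whether the join keys are unique in Table3 and Table4. If MId repeats in Table3 and (RId, Date) repeats in Table4, every extra match multiplies the row count, which could explain why each three-table join finishes but the four-table join does not. The queries below are a sketch of that check and use only the columns mentioned in the question.

```sql
-- Join keys that occur more than once in Table3.
SELECT m.MId, COUNT(*) AS cnt
FROM Table3 m
GROUP BY m.MId
HAVING COUNT(*) > 1;

-- Join keys that occur more than once in Table4.
SELECT rc.RId, rc.Date, COUNT(*) AS cnt
FROM Table4 rc
GROUP BY rc.RId, rc.Date
HAVING COUNT(*) > 1;
```

If either query returns rows with large counts, that join is fanning out, and an extra join condition or a pre-aggregated subquery may be needed.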

why selecting particular columns from same table slows down query performance significantly?

I have a SELECT statement that queries columns from tblQuotes. When I select the columns a.ProducerCompositeCommission and a.CompanyCompositeCommission, the query spins forever. Why?
The execution plans with and without those columns are IDENTICAL!
If I comment them out, the query returns results in 1 second.
SELECT
a.stateid risk_state1,
-- these columns slow down performance
a.ProducerCompositeCommission,
a.CompanyCompositeCommission,
GETDATE() runDate
FROM
tblQuotes a
INNER JOIN
lstlines l ON a.LineGUID = l.LineGUID
INNER JOIN
tblSubmissionGroup tsg ON tsg.SubmissionGroupGUID = a.SubmissionGroupGuid
INNER JOIN
tblUsers u ON u.UserGuid = tsg.UnderwriterUserGuid
INNER JOIN
tblUsers u2 ON u2.UserGuid = a.UnderwriterUserGuid
LEFT OUTER JOIN
tblFin_Invoices tfi ON tfi.QuoteID = a.QuoteID AND tfi.failed <> 1
INNER JOIN
lstPolicyTypes lpt ON lpt.policytypeid = a.policytypeid
INNER JOIN
tblproducercontacts prodC ON prodC.producercontactguid = a.producercontactguid
INNER JOIN
tblProducerLocations pl ON pl.producerlocationguid = prodc.producerlocationguid
INNER JOIN
tblproducers prod ON prod.ProducerGUID = pl.ProducerGUID
LEFT OUTER JOIN
Catalytic_tbl_Model_Analysis aia ON aia.ImsControl = a.controlno
AND aia.analysisid = (SELECT TOP 1 tma2.analysisid
FROM Catalytic_tbl_Model_Analysis tma2
WHERE tma2.imscontrol = a.controlno)
LEFT OUTER JOIN
Catalytic_tbl_RDR_Analysis rdr ON rdr.ImsControl = a.controlno
AND rdr.analysisid = (SELECT TOP 1 tma2.analysisid
FROM Catalytic_tbl_RDR_Analysis tma2
WHERE tma2.imscontrol = a.controlno)
LEFT OUTER JOIN
tblProducerContacts mnged ON mnged.producercontactguid = ProdC.ManagedBy
LEFT OUTER JOIN
lstQuoteStatusReasons r1 ON r1.id = a.QuoteStatusReasonID
WHERE
l.LineName = 'EARTHQUAKE'
AND CAST(a.EffectiveDate AS DATE) >= CAST('2017-01-01' AS DATE)
AND CAST(a.EffectiveDate AS DATE) <= CAST('2017-12-31' AS DATE)
ORDER BY
a.effectiveDate
The execution plan can be found here:
https://www.brentozar.com/pastetheplan/?id=rJawDkTx-
I ran sp_help and this is what I see:
What exactly is wrong with those columns?
I don't use them in a JOIN or anything. Why such behaviour?
Table Size:
Indexes on table tblQuotes

Choose the greater of either left or right side of 2 queries

I have the following union query that queries for the most recent date of a column if it exists:
SELECT TOP 1 m.sentdate AS 'calltreelastsignedoff'
FROM Incidents i
INNER JOIN Plans p ON i.planuid = p.uid
INNER JOIN IncidentMessages im ON i.uid = im.incidentuid
INNER JOIN Messages m ON im.messageuid = m.uid
WHERE p.uid = '031E3346-2921-426E-9494-1111111111'
UNION
SELECT TOP 1 m.sentdate AS 'calltreelastsignedoff'
FROM Incidents i
INNER JOIN PlanExercises pe ON i.planexerciseuid = pe.uid
INNER JOIN IncidentMessages im ON i.uid = im.incidentuid
INNER JOIN Messages m ON im.messageuid = m.uid
WHERE pe.planuid = '031E3346-2921-426E-9494-1111111111'
This will return 2 values if each query returns a top 1 result.
What I really want is to select the top 1 of the combined query.
How can I perform a select on the unioned query?
Try this.
You could do it with a derived table:
select top 1 calltreelastsignedoff from
(
SELECT TOP 1 m.sentdate AS 'calltreelastsignedoff'
FROM Incidents i
INNER JOIN Plans p ON i.planuid = p.uid
INNER JOIN IncidentMessages im ON i.uid = im.incidentuid
INNER JOIN Messages m ON im.messageuid = m.uid
WHERE p.uid = '031E3346-2921-426E-9494-1111111111'
UNION
SELECT TOP 1 m.sentdate AS 'calltreelastsignedoff'
FROM Incidents i
INNER JOIN PlanExercises pe ON i.planexerciseuid = pe.uid
INNER JOIN IncidentMessages im ON i.uid = im.incidentuid
INNER JOIN Messages m ON im.messageuid = m.uid
WHERE pe.planuid = '031E3346-2921-426E-9494-1111111111'
)a
order by calltreelastsignedoff desc
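Note that a TOP 1 without an ORDER BY inside each branch does not guarantee the most recent sentdate from that branch. A variant that avoids this, sketched here using only the tables and columns from the question, is to union all candidate dates and take the maximum. The column alias sentdate in the derived table and the alias a are arbitrary.

```sql
SELECT MAX(a.sentdate) AS calltreelastsignedoff
FROM (
    -- Dates of messages sent for incidents on the plan itself.
    SELECT m.sentdate
    FROM Incidents i
    INNER JOIN Plans p ON i.planuid = p.uid
    INNER JOIN IncidentMessages im ON i.uid = im.incidentuid
    INNER JOIN Messages m ON im.messageuid = m.uid
    WHERE p.uid = '031E3346-2921-426E-9494-1111111111'
    UNION ALL
    -- Dates of messages sent for incidents on the plan's exercises.
    SELECT m.sentdate
    FROM Incidents i
    INNER JOIN PlanExercises pe ON i.planexerciseuid = pe.uid
    INNER JOIN IncidentMessages im ON i.uid = im.incidentuid
    INNER JOIN Messages m ON im.messageuid = m.uid
    WHERE pe.planuid = '031E3346-2921-426E-9494-1111111111'
) AS a;
```

One behavioural difference: when neither branch finds a message, this returns a single row containing NULL, whereas the TOP 1 version returns no rows.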