I am trying to convert a stored procedure written in T-SQL to BigQuery-compatible syntax.
One of the temp tables used inside the procedure is built with a query that uses a WITHIN GROUP clause, as shown below.
SELECT DISTINCT
flr_id, lid, sentinel, liquid, d_id, sent_time, tracker,
(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ((DATE_DIFF(second, (sent_time), (tracker))/3600.0)) ASC) OVER (PARTITION BY flr_id, sentinel, d_id)/24.0) AS mct
FROM
history AS h
INNER JOIN
magent AS mag ON mag.flat = h.flat
INNER JOIN
stepand AS stepand ON stepand.soid = h.soid
INNER JOIN
sv AS st_v ON st_v.stoid = stepand.stoid
INNER JOIN
tvr AS tvr ON tvr.trvid = stepand.trvid
INNER JOIN
sdv AS sdv ON st_v.stoid = sdv.stoid
WHERE
liquid > 0
AND mag.flr_id = '1234'
AND tracker <= GETDATE()
AND sent_time >= DATEADD(WEEK, 1, GETDATE())
AND d_id NOT LIKE ('UNKNOWN')
AND part_type_code NOT IN ('ABCDE')
AND h.lid not like 'B%'
AND h.lid not like 'T%'
AND h.lid not like 'VL%'
AND h.step_deleted_sw <> 'Y'
AND h.lid NOT IN (SELECT lid from test)
I converted all of the query except for this line.
(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ((DATE_DIFF(second, (sent_time), (tracker))/3600.0)) ASC) OVER (PARTITION BY flr_id, sentinel, d_id)/24.0) AS mct
I looked up WITHIN GROUP in SQL Server and found a good explanation of it.
What I don't understand is what the BigQuery equivalent of WITHIN GROUP is.
When I try to run the query as is, I get this error:
Syntax error: Expected end of input but got keyword WITHIN at <PROCNAME>
How can I rewrite the line that uses WITHIN GROUP in BigQuery-compatible syntax?
Any help is much appreciated.
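If I understand what you want to do, BigQuery has no WITHIN GROUP clause. Instead, the value to order by becomes the first argument of PERCENTILE_CONT, the percentile becomes the second argument, and the OVER clause may contain only PARTITION BY:
(PERCENTILE_CONT(TIMESTAMP_DIFF(tracker, sent_time, SECOND) / 3600.0, 0.5)
OVER (PARTITION BY flr_id, sentinel, d_id) / 24.0) AS mct
This assumes sent_time and tracker are TIMESTAMP columns; for DATETIME columns, use DATETIME_DIFF with the same arguments. Note that T-SQL's DATEDIFF(second, start, end) corresponds to TIMESTAMP_DIFF(end, start, SECOND), so the argument order flips. The rest of your query may need similar changes (for example, GETDATE() becomes CURRENT_TIMESTAMP() and DATEADD becomes TIMESTAMP_ADD), but this answers the question you specifically asked.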
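To make the semantics concrete, here is a minimal Python sketch (plain Python, not BigQuery) of what a median computed with PERCENTILE_CONT(x, 0.5) OVER (PARTITION BY ...) returns, assuming the documented behaviour: NULLs are ignored, each partition's values are sorted, and the 0.5 percentile is linearly interpolated between the two middle values. Because it is a window function, every input row keeps its own output row. The sample rows and the helper names percentile_cont and median_per_partition are hypothetical.

```python
from collections import defaultdict
from math import floor


def percentile_cont(values, p):
    """Continuous percentile with linear interpolation; None (NULL) values are skipped."""
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None                      # an all-NULL partition yields NULL
    pos = p * (len(ordered) - 1)         # fractional position in the sorted list
    lo = floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def median_per_partition(rows, key, value):
    """Window-style median: every input row is kept and gets its partition's median."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(value(row))
    medians = {k: percentile_cont(v, 0.5) for k, v in groups.items()}
    return [(row, medians[key(row)]) for row in rows]


# Hypothetical rows: (flr_id, sentinel, d_id, hours between sent_time and tracker)
rows = [
    ("1234", "s1", "d1", 12.0),
    ("1234", "s1", "d1", 36.0),
    ("1234", "s1", "d1", 48.0),
    ("1234", "s1", "d2", 24.0),
    ("1234", "s1", "d2", 72.0),
]

for row, median_hours in median_per_partition(rows, key=lambda r: r[:3], value=lambda r: r[3]):
    print(row, median_hours / 24.0)      # mct in days, as in the query
```

Partition d1 has an odd number of values, so its median is the middle value (36 hours, 1.5 days); partition d2 has two values, so the median is interpolated halfway between them (48 hours, 2.0 days).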
Related
I can't solve a GROUP BY problem in my query, which contains a CASE...WHEN expression.
Could you please help me with it?
select ID,
CODE,
NOM AS TITLE,
level,
ID_PARENT,
CASE ID_PARENT
WHEN 1111 THEN 'MAIN'
ELSE
(
SUBSTR(
(
SELECT CODE FROM units o
INNER JOIN LIB_UNITS_MV oLab on oLab.ID = o.ID WHERE o.ID = units.ID_PARENT AND LNG_CD = 'ENG'
)
, 7)
)
END AS "PARENT_CODE"
from units
INNER JOIN LIB_UNITS_MV orgLab on orgLab.ID = units.ID
WHERE orgLab.LNG ='ENG'
start with units.id = 1111
connect by prior units.ID = units.ID_PARENT
GROUP BY ID, CODE, NOM, level, ID_PARENT
I get the error "not a GROUP BY expression" because of the CASE...WHEN expression.
Thank you in advance for your help.
Regards
This happens because when you use GROUP BY, every column referenced in the SELECT list outside an aggregate function (MIN, MAX, SUM, etc.) must be covered by the GROUP BY. In your case, you can either group by all columns used in your CASE expression, group by the whole CASE expression (just copy it into your GROUP BY), or group by any combination of its sub-expressions that together cover it completely. However, since you are not using any aggregate functions, I would just use DISTINCT and drop the GROUP BY altogether.
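A small sketch of the point about DISTINCT, using Python's built-in sqlite3 module and a hypothetical, simplified units table (no LIB_UNITS_MV join, no CONNECT BY). SQLite is lenient about GROUP BY and will not reproduce Oracle's "not a GROUP BY expression" error; the sketch only shows that, with no aggregate functions, SELECT DISTINCT and a GROUP BY covering every column the CASE uses return the same rows.

```python
import sqlite3

# Hypothetical, simplified version of the units table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE units (id INTEGER PRIMARY KEY, code TEXT, id_parent INTEGER)")
conn.executemany("INSERT INTO units VALUES (?, ?, ?)", [
    (1111, "ORG---ROOT", None),
    (2001, "ORG---SALES", 1111),
    (3001, "ORG---EAST", 2001),
])

# Same shape as the question's CASE: 'MAIN' for children of 1111, otherwise
# the parent's code from character 7 onward.
CASE_EXPR = """CASE u.id_parent WHEN 1111 THEN 'MAIN'
               ELSE substr((SELECT p.code FROM units p WHERE p.id = u.id_parent), 7)
               END"""


def parent_codes(use_distinct):
    if use_distinct:
        # No aggregates are used, so DISTINCT alone removes duplicate rows.
        sql = f"SELECT DISTINCT u.id, u.code, {CASE_EXPR} AS parent_code FROM units u"
    else:
        # GROUP BY covers every column the CASE depends on (id_parent).
        sql = (f"SELECT u.id, u.code, {CASE_EXPR} AS parent_code FROM units u "
               "GROUP BY u.id, u.code, u.id_parent")
    return conn.execute(sql + " ORDER BY u.id").fetchall()


print(parent_codes(use_distinct=True))
```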
I have looked at a few other questions about this problem. We are trying to get a stored procedure working that uses the LAG() function, but the machine we are now installing an instance on runs SQL Server 2008, which doesn't support LAG() (it was added in SQL Server 2012).
SELECT se.SetID,SetName,ParentSetId,
qu.QuestionID,qu.QuestionText,qu.QuestionTypeID,qu.IsPublished,qu.IsFilter,
qu.IsRequired,qu.QueCode,qu.IsDisplayInTable,
Case when (LAG(se.ParentSetId) OVER(ORDER BY se.ParentSetId) <> ParentSetId) then 2 else 1 end level ,
QuestionType
FROM tblSet se
LEFT join tblQuestion qu on qu.SetID=se.SetID
Inner join tblQuestionType qt on qt.QuestionTypeID=qu.QuestionTypeID and qt.IsAnswer=1
where CollectionId=#colID and se.IsDeleted=0
order by se.SetID
What I've tried so far (edited to reflect Zohar Peled's suggestion):
SELECT se.SetID,se.SetName,se.ParentSetId,
qu.QuestionID,qu.QuestionText,qu.QuestionTypeID,qu.IsPublished,qu.IsFilter,
qu.IsRequired,qu.QueCode,qu.IsDisplayInTable,
(case when row_number() over (partition by se.parentsetid
order by se.parentsetid
) = 1
then 1 else 2
end) as level,
QuestionType
FROM tblSet se
left join tblSet se2 on se.ParentSetId = se2.ParentSetId -1
LEFT join tblQuestion qu on qu.SetID=se.SetID
Inner join tblQuestionType qt on qt.QuestionTypeID=qu.QuestionTypeID and qt.IsAnswer=1
where se.CollectionId=#colID and se.IsDeleted=0
order by se.SetID
It does not seem to return all of the same records when I run the two queries side by side, and the level values also seem to differ.
I have put some of the output into an HTML table: the first results are from the version containing LAG(), and the second are from the new version, where the levels don't come out the same.
https://jsfiddle.net/gyn8Lv3u/
LAG() can be implemented using a self-join as Jeroen wrote in his comment, or by using a correlated subquery. In this case, it's a simple lag() so the correlated subquery is also simple:
SELECT se.SetID,SetName,ParentSetId,
qu.QuestionID,qu.QuestionText,qu.QuestionTypeID,qu.IsPublished,qu.IsFilter,
qu.IsRequired,qu.QueCode,qu.IsDisplayInTable,
Case when (
(
SELECT TOP 1 ParentSetId
FROM tblSet seInner
WHERE seInner.ParentSetId < se.ParentSetId
ORDER BY seInner.ParentSetId DESC
)
<> ParentSetId) then 2 else 1 end level ,
QuestionType
FROM tblSet se
LEFT join tblQuestion qu on qu.SetID=se.SetID
Inner join tblQuestionType qt on qt.QuestionTypeID=qu.QuestionTypeID and qt.IsAnswer=1
where CollectionId=#colID and se.IsDeleted=0
order by se.SetID
If you needed an offset other than 1, it would be harder to implement with a correlated subquery, and a self-join would be a much easier solution.
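Here is a sketch of the self-join approach for an arbitrary offset, using Python's sqlite3 module (SQLite 3.25+ for window functions) only so the result can be checked against a real LAG(). The CTE numbers the rows with ROW_NUMBER(), which SQL Server 2005 and 2008 already support, and joins each row to the row `offset` positions earlier. The tbl_set table and its columns are hypothetical stand-ins for tblSet; set_id is added to the ORDER BY as a tie-breaker so that the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_set (set_id INTEGER PRIMARY KEY, parent_set_id INTEGER)")
conn.executemany("INSERT INTO tbl_set VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 20), (4, 30), (5, 30)])


def with_lag(offset):
    # Reference result using the built-in LAG() (not available on SQL Server 2008).
    return conn.execute(
        f"SELECT set_id, LAG(parent_set_id, {int(offset)}) "
        "OVER (ORDER BY parent_set_id, set_id) "
        "FROM tbl_set ORDER BY parent_set_id, set_id").fetchall()


def with_self_join(offset):
    # ROW_NUMBER() numbers the rows once; joining rn to rn - offset fetches
    # the row `offset` positions earlier (NULL when there is none).
    return conn.execute(f"""
        WITH numbered AS (
            SELECT set_id, parent_set_id,
                   ROW_NUMBER() OVER (ORDER BY parent_set_id, set_id) AS rn
            FROM tbl_set)
        SELECT cur.set_id, prev.parent_set_id
        FROM numbered cur
        LEFT JOIN numbered prev ON prev.rn = cur.rn - {int(offset)}
        ORDER BY cur.rn""").fetchall()


print(with_self_join(1))
```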
Sample data and desired results would help. This construct:
(case when (LAG(se.ParentSetId) OVER(ORDER BY se.ParentSetId) <> ParentSetId) then 2 else 1
end) as level
is quite strange. You are lagging the only column used in the order by, which makes sense. But then you compare the lagged value to that same column, which implies that there are duplicates.
If you have duplicates, then order by se.ParentSetId is unstable. That is, the "previous" row is indeterminate because of the duplicate values being ordered. You can run the query twice and get different results.
I am guessing you want one row with the value 1 for each parent set id. If so, then in either database, you would use:
(case when row_number() over (partition by se.parentsetid
order by se.parentsetid
) = 1
then 1 else 2
end) as level
This also has the problem with an unstable ordering. You can fix this by changing the order by to what you really want.
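A minimal sketch of that fix, assuming set_id is an acceptable tie-breaker (an assumption; use whatever ordering you actually want). It runs in Python's sqlite3 module on a hypothetical tbl_set table; the CASE around ROW_NUMBER() has the same shape as above, but partitions on parent_set_id and orders by set_id so that "the first row" of each parent is well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_set (set_id INTEGER PRIMARY KEY, parent_set_id INTEGER)")
conn.executemany("INSERT INTO tbl_set VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 20), (4, 30), (5, 30)])


def levels():
    # set_id breaks ties inside each parent_set_id, so the result is repeatable.
    return conn.execute("""
        SELECT set_id, parent_set_id,
               CASE WHEN ROW_NUMBER() OVER (PARTITION BY parent_set_id
                                           ORDER BY set_id) = 1
                    THEN 1 ELSE 2 END AS level
        FROM tbl_set
        ORDER BY set_id""").fetchall()


print(levels())
```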
I am trying to sum COUNT(IHID.RSID_PROD_N) per IHID.CS_ID, but I'm running into a problem. How can I solve it?
SELECT
IHID.CS_ID ,IHID.RSID_PROD_N,COUNT(IHID.RSID_PROD_N),
RSPF.RSPF_PROD_N,COUNT(RSPF.RSPF_PROD_N),sum(COUNT(IHID.RSID_PROD_N))
from IHIH
JOIN IHID
ON ihih.rsih_invoice_n = ihid.rsih_invoice_n AND ihih.cs_id = ihid.cs_id
JOIN RSPF
ON ihih.cs_id = rspf.cs_id AND ihid.rsid_prod_n=rspf.rspf_prod_n
WHERE rspf_desc LIKE '%SCISSOR LIFT'
GROUP BY IHID.CS_ID, IHID.RSID_PROD_N,RSPF.RSPF_PROD_N,IHID.CS_ID;
The result is something like this:
CS_ID  RSID_PROD_N  COUNT(IHID.RSID_PROD_N)  RSPF_PROD_N  COUNT(RSPF.RSPF_PROD_N)
16     SJIII4626    1                        SJIII4626    1
16     SJIII4632    1                        SJIII4632    1
I want 1 + 1 = 2 for CS_ID 16.
I think you need analytic functions rather than aggregates here. Something like:
SELECT
IHID.CS_ID
,IHID.RSID_PROD_N
,row_number() over (partition by IHID.CS_ID order by IHID.RSID_PROD_N) as IHID_RSID_PROD_N
,RSPF.RSPF_PROD_N
,row_number() over (partition by IHID.CS_ID order by RSPF.RSPF_PROD_N) as RSPF_RSPF_PROD_N
,COUNT(IHID.RSID_PROD_N) over (partition by IHID.CS_ID) as sum_count
from IHIH
JOIN IHID
ON ihih.rsih_invoice_n = ihid.rsih_invoice_n AND ihih.cs_id = ihid.cs_id
JOIN RSPF
ON ihih.cs_id = rspf.cs_id AND ihid.rsid_prod_n=rspf.rspf_prod_n
WHERE rspf_desc LIKE '%SCISSOR LIFT'
;
Not entirely sure because your question lacks a complete test case.
If this answer isn't quite what you want please edit your question to provide table structures and sample input data together with required output derived from that data.
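A reduced sketch of the analytic approach, using Python's sqlite3 module and a hypothetical single items table that stands in for the result of the IHIH/IHID/RSPF joins, loaded with the two rows from the question. COUNT(...) OVER (PARTITION BY cs_id) returns the per-customer total on every detail row instead of collapsing the rows.

```python
import sqlite3

# Hypothetical single table standing in for the IHIH/IHID/RSPF join result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (cs_id INTEGER, rsid_prod_n TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(16, "SJIII4626"), (16, "SJIII4632")])


def count_per_customer():
    # Counts rows per customer but, unlike GROUP BY, keeps every detail row.
    return conn.execute("""
        SELECT cs_id, rsid_prod_n,
               COUNT(rsid_prod_n) OVER (PARTITION BY cs_id) AS sum_count
        FROM items
        ORDER BY cs_id, rsid_prod_n""").fetchall()


print(count_per_customer())
```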
It's not grouping as you would like because IHID.RSID_PROD_N and RSPF.RSPF_PROD_N have a different value on each row. Remove those columns and it will group as expected.
One option is to use your current query (almost unchanged) as a CTE, and then apply SUM to the COUNT in the outer query instead of nesting the two aggregates. Something like this:
with your_current_query as
-- removed nested SUM(COUNT)
(select
ihid.cs_id,
ihid.rsid_prod_n,
rspf.rspf_prod_n,
count(ihid.rsid_prod_n) cnt_rsid,
count(rspf.rspf_prod_n) cnt_rspf
from ihih join ihid on ihih.rsih_invoice_n = ihid.rsih_invoice_n
and ihih.cs_id = ihid.cs_id
join rspf on ihih.cs_id = rspf.cs_id
and ihid.rsid_prod_n=rspf.rspf_prod_n
where rspf_desc like '%SCISSOR LIFT'
group by ihid.cs_id,
ihid.rsid_prod_n,
rspf.rspf_prod_n
)
select cs_id,
rsid_prod_n,
rspf_prod_n,
cnt_rsid,
cnt_rspf,
sum(cnt_rsid) over (partition by cs_id) sum_cnt_rsid --> this represents nested SUM(COUNT), summed per cs_id
from your_current_query;
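The same two-step idea in a runnable form, using Python's sqlite3 module and a hypothetical items table in place of the real joins: the CTE produces one COUNT per customer and product, and the outer query sums those counts per cs_id with SUM(...) OVER (PARTITION BY cs_id). If the outer query instead grouped by every detail column, each group would contain a single row and the sum would simply repeat the count.

```python
import sqlite3

# Hypothetical data; customer 17 is added to show the sum is per cs_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (cs_id INTEGER, rsid_prod_n TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(16, "SJIII4626"), (16, "SJIII4632"), (17, "SJIII4626")])


def summed_counts():
    # Inner query: one COUNT per (cs_id, product).
    # Outer query: SUM of those counts per cs_id, keeping the detail rows.
    return conn.execute("""
        WITH per_product AS (
            SELECT cs_id, rsid_prod_n, COUNT(rsid_prod_n) AS cnt_rsid
            FROM items
            GROUP BY cs_id, rsid_prod_n)
        SELECT cs_id, rsid_prod_n, cnt_rsid,
               SUM(cnt_rsid) OVER (PARTITION BY cs_id) AS sum_cnt_rsid
        FROM per_product
        ORDER BY cs_id, rsid_prod_n""").fetchall()


print(summed_counts())
```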
I am trying to write a query that spots new line items appearing in my data set. For example, I have the following table structure.
The logic needs to identify whether a line item is new since the previous BilledMonth.
TableA
In plain English, the logic would be:
Select the row IF the combination of 'CLI', 'Description' and 'UnitCost' doesn't exist for BilledMonth - 1.
I have managed to create a join showing whether it exists for the previous billing month.
But I am really struggling with the negative logic (i.e. the line item is new this month).
Any help is greatly appreciated.
SELECT t.CLI, t.Description
FROM yourTable t
LEFT JOIN yourTable t2
ON t.CLI = t2.CLI
AND t.Description = t2.Description
AND t.UnitCost = t2.UnitCost
AND t.BilledMonth - 1 = t2.BilledMonth
WHERE t2.CLI is null
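A runnable sketch of this LEFT JOIN ... IS NULL (anti-join) pattern, using Python's sqlite3 module. The line_items table and its rows are hypothetical, and billed_month is assumed to be a consecutive integer so that billed_month - 1 means the previous month; with real dates you would compare against the previous month's date instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE line_items
                (cli TEXT, description TEXT, unit_cost REAL, billed_month INTEGER)""")
conn.executemany("INSERT INTO line_items VALUES (?, ?, ?, ?)", [
    ("0100", "Line rental", 10.0, 1),
    ("0100", "Line rental", 10.0, 2),
    ("0100", "Broadband", 20.0, 2),
    ("0100", "Line rental", 10.0, 3),
    ("0100", "Broadband", 20.0, 4),   # absent in month 3, back in month 4
])


def new_items_anti_join():
    # The LEFT JOIN leaves t2 NULL exactly for rows with no matching
    # CLI/Description/UnitCost in the previous month.
    return conn.execute("""
        SELECT t.cli, t.description, t.billed_month
        FROM line_items t
        LEFT JOIN line_items t2
               ON t.cli = t2.cli
              AND t.description = t2.description
              AND t.unit_cost = t2.unit_cost
              AND t.billed_month - 1 = t2.billed_month
        WHERE t2.cli IS NULL
        ORDER BY t.billed_month, t.description""").fetchall()


print(new_items_anti_join())
```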
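SQL Server supports analytic functions (a running COUNT with ORDER BY inside OVER needs SQL Server 2012 or later), so something like this should work:
select CLI, Description, UnitCost, billedMonth
from (
select CLI, Description, UnitCost, billedMonth,
count(*) over (partition by CLI, Description, UnitCost order by billedMonth) cnt
from mytable
) t where cnt = 1
If this works, it is very likely to be more efficient than a join-based query, since it avoids the self-join. Note that it flags only the first month a combination ever appears; unlike the join, it will not flag an item that disappears and later reappears.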
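A sketch of the window-function variant on hypothetical data, using Python's sqlite3 module (SQLite 3.25+). It also shows the semantic difference from a previous-month check: the Broadband item that is missing in month 3 and returns in month 4 is not reported again, because its running count in month 4 is already 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE line_items
                (cli TEXT, description TEXT, unit_cost REAL, billed_month INTEGER)""")
conn.executemany("INSERT INTO line_items VALUES (?, ?, ?, ?)", [
    ("0100", "Line rental", 10.0, 1),
    ("0100", "Line rental", 10.0, 2),
    ("0100", "Broadband", 20.0, 2),
    ("0100", "Line rental", 10.0, 3),
    ("0100", "Broadband", 20.0, 4),   # absent in month 3, back in month 4
])


def new_items_first_seen():
    # Running count per combination; cnt = 1 marks its first billed month.
    return conn.execute("""
        SELECT cli, description, billed_month
        FROM (SELECT cli, description, billed_month,
                     COUNT(*) OVER (PARTITION BY cli, description, unit_cost
                                    ORDER BY billed_month) AS cnt
              FROM line_items) x
        WHERE cnt = 1
        ORDER BY billed_month, description""").fetchall()


print(new_items_first_seen())
```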
I am looking at a report on policy exceptions based on various criteria such as Beacon Score, Debt to Income, and Loan to Value. This information is kept in multiple different tables, and right now the Loan to Value column is causing multiple entries in my report because a specific loan might have multiple pieces of collateral. For proper exception monitoring, I only need one entry.
With all that said, how can I run the following query so that each dbo.Folders.Id appears only once? Just putting DISTINCT after SELECT does not work, because the rows for a folder still differ in the collateral columns. (Sensitive values are masked with '#'.)
SELECT dbo.Folders.LoanOfficerId,
dbo.Folders.Id,
dbo.CollateralType.Description,
dbo.Customers.CUSTNAME,
dbo.Folders.DateLoanActivated,
dbo.Folders.CurrentAccountBalance,
dbo.Folders.UnadvancedCommitAmount,
dbo.Folders.BeaconScore,
dbo.Folders.DebtToIncome,
dbo.Collateral.LoanToValue
FROM dbo.Folders
INNER JOIN dbo.Customers
ON dbo.Folders.CustomersNAMEKEY = dbo.Customers.NAMEKEY
INNER JOIN dbo.Collateral
ON dbo.Folders.Id = dbo.Collateral.FoldersID
INNER JOIN dbo.CollateralType
ON dbo.Collateral.CollateralTypeCollCode = dbo.CollateralType.CollCode
WHERE ( (dbo.Folders.BeaconScore < ###)
AND (dbo.Folders.BeaconScore > ###)
AND (dbo.Folders.CloseCode = 'O')
AND (dbo.Folders.CollateralCode <> ##)
)
OR ( (dbo.Folders.CloseCode = 'O')
AND (dbo.Folders.CustomerType <> '###')
AND (dbo.Folders.CustomerType <> '###')
AND (dbo.Folders.DebtToIncome > ##)
)
OR ( (dbo.Folders.CloseCode = 'O')
AND (dbo.Folders.CustomerType = '###')
AND (dbo.Folders.DebtToIncome > ##)
)
OR ( (dbo.Folders.CloseCode = 'O')
AND (dbo.Folders.CustomerType = '###')
AND (dbo.Folders.DebtToIncome > ##)
)
OR (dbo.Collateral.LoanToValue > dbo.CollateralType.LTV)
Any constructive criticism of my code is welcome. (The static values in the statement above are on the docket to be replaced later with a thresholds/criteria table.) From what I have seen, others have suggested using ROW_NUMBER() with PARTITION BY, but I am unable to get the syntax to work.
Comment about formatting: learn to use table aliases. They make the query easier to read and write.
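If you only need one row per folder, you can use row_number(). This numbers the rows for each folder (in your case), and you keep just the first one; order by (select NULL) picks that row arbitrarily, so replace it with a real ordering if it matters which row is kept. You can do this using: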
with t as (
<your query here>
)
select t.*
from (select t.*,
row_number() over (partition by Id order by (select NULL)) as seqnum -- Id is dbo.Folders.Id from your query
from t
) t
where seqnum = 1;
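A runnable sketch of the row_number() filter, using Python's sqlite3 module and a hypothetical folder_collateral table standing in for the joined query. Instead of order by (select NULL), it orders by loan_to_value DESC so that the row kept for each folder is the one with the highest loan-to-value; that choice is an assumption about what exception monitoring needs, not something the question specifies.

```python
import sqlite3

# Hypothetical stand-in for the joined Folders/Collateral result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE folder_collateral (Id INTEGER, description TEXT, loan_to_value REAL)")
conn.executemany("INSERT INTO folder_collateral VALUES (?, ?, ?)", [
    (1, "Vehicle", 0.90),
    (1, "Real estate", 0.70),
    (2, "Equipment", 0.80),
])


def one_row_per_folder():
    # Number the rows within each folder, highest LTV first, and keep row 1.
    return conn.execute("""
        WITH t AS (SELECT Id, description, loan_to_value FROM folder_collateral)
        SELECT Id, description, loan_to_value
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY Id
                                        ORDER BY loan_to_value DESC) AS seqnum
              FROM t) ranked
        WHERE seqnum = 1
        ORDER BY Id""").fetchall()


print(one_row_per_folder())
```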
On the other hand, if you needed to aggregate information from the collateral tables, then you would use group by in your query with the appropriate aggregation functions.