I am confused about how to fix it; can anyone help me?
with table_harga_barang as
(
select harga
from `kimia.penjualan` pen
where id_barang >= 'BRG0001'
and id_barang <= 'BRG0010'
)
select
table_harga_barang as harga_barang,
bar.nama_barang, bar.tipe, bar.lini
from
table_harga_barang
inner join
`kimia.barang` bar on bar.kode_barang = pen.id_barang;
How can I fix 'on'?
I already tried but I don't know how I can join on the primary key
The alias 'pen' you used inside the CTE remains in the CTE (not recognised by the the 'calling/main' query (the bottom part). To make it work you have to select pen.id_barang from the CTE, and use that in the join. Also, the 'caller/main' query (not being aware of the alias 'pen') can assign a new alias to the CTE (it can even be 'pen' again).
Also you are using the table name as a column in SELECT ...table_harga_barang as harga_barang, I presume you wanted to retrieve harga here.
with table_harga_barang as
(
select id_barang
, harga
from `kimia.penjualan` pen
where id_barang >= 'BRG0001'
and id_barang <= 'BRG0010'
)
select
pen.harga as harga_barang,
bar.nama_barang, bar.tipe, bar.lini
from
table_harga_barang pen
inner join
`kimia.barang` bar on bar.kode_barang = pen.id_barang;
There is not much benefit to using a CTE in this context (other than, perhaps, you wanted to emphasize id_barang >= 'BRG0001' and id_barang <= 'BRG0010' criteria towards the top of the query?
Related
I am trying to optimize my SQL query below as I am using a very old RDMS called firebird. I tried rearranging the items in my where clause and removing the order by statement but the query still seems to take forever to run. Unfortunately firebird doesn't support Explain Execution Plan Functionalities and therefore I cannot identify the code that is holding up the query.
select T.veh_reg_no,T.CON_NO, sum(T.pos_gpsunlock) as SUM_GPS_UNLOCK,
count(T.pos_gpsunlock) as SUM_REPORTS, contract.con_name
from
(
select veh_reg_no,CON_NO,
case when pos_gpsunlock = upper('T') then 1 else 0 end as pos_gpsunlock
from vehpos
where veh_reg_no in
( select regno
from fleetvehicle
where fleetno in (97)
) --DS5
and pos_timestamp > '2022-07-01'
and pos_timestamp < '2022-08-01'
) T
join contract on T.con_no = contract.con_no
group by T.veh_reg_no, T.con_no,contract.con_name
order by SUM_GPS_UNLOCK desc;
If anyone can help it would be greatly appreciated.
I'd either comment out some of the sub-queries or remove a join or aggregation and see if that improves it. Once you find the offending code maybe you can move it or re-write it. I know nothing of Firebird but I'd approach that query with the below code, wrapping the aggregation outside of the joins and removing the "Where in" clause.
If nothing works can you create an aggregation table or pre-filtered table and use that?
select
x.*
,sum(case when x.pos_gpsunlock = upper('T') then 1 else 0 end) as SUM_GPS_UNLOCK
,count(*) as SUM_REPORTS
FROM (
select
a.veh_reg_no
,a.pos_gpsunlock
,a.CON_NO
,c.con_name
FROM vehpos a
JOIN fleetvehicle b on a.veg_reg_no = b.reg_no and b.fleetno = 97 and b.pos_timestamp between '222-07-01' and '2022-08-01'
JOIN contract c on a.con_no = contract.con_no
) x
Group By....
This might help by converting subqueries to joins and reducing nesting. Also an = instead of IN() operation.
select vp.veh_reg_no,vp.con_no,c.con_name,
count(*) as SUM_REPORTS,
sum(case when pos_gpsunlock = upper('T') then 1 else 0 end) as SUM_GPS_UNLOCK
from vehpos vp
inner join fleetvehicle fv on fv.fleetno = 97 and fv.regno = vp.veh_reg_no
inner join contract c on vp.con_no = c.con_no
where vp.pos_timestamp >= '2022-07-01'
and vp.pos_timestamp < '2022-08-01'
group by vp.veh_reg_no, vp.con_no, c.con_name
Let me open with:
SHOWPLAN permission denied in database 'MyDatabase'.
With that out of the way, I'll layout my situation.
So, The database I work with has a view that executes fairly quickly.
SELECT * FROM MyView
returns 32 rows in 1 second and includes a non-indexed column of values (IDs) I need to filter on.
If I filter on these IDs directly in the view:
SELECT * FROM MyView WHERE MyView.SomeId = 18
Things slow immensely and it takes 21 seconds to return the 20 rows with that ID.
As an experiment I pushed the unfiltered results into a temporary table and executed the filtered query on the the temporary table:
IF OBJECT_ID('tempdb..#TEMP_TABLE') IS NOT NULL
BEGIN
DROP TABLE #TEMP_TABLE
END
SELECT * INTO #TEMP_TABLE
FROM MyView;
SELECT *
FROM #TEMP_TABLE
WHERE #TEMP_TABLE.SomeId = 18
DROP TABLE #TEMP_TABLE
And found that it returns the filtered results far faster (roughly 1 second)
Is there a cleaner syntax or pattern that can be implemented to achieve the same performance?
UPDATE: View Definition and Description
Manually obfuscated, but I was careful so hopefully there aren't many errors. Still waiting on SHOWPLAN permissions, so Execution Plan is still pending.
The view's purpose is to provide a count of all the records that belong to a specific component (CMP.COMPONENT_ID = '100') grouped by location.
"Belonging" is determined by the record's PROC_CODE (mapped through PROC_ID) being within the CMP's inclusion range (CMP_INCs) and not in the CMP's exclusion range (CMP_EXCs).
In practice, exclusion ranges are created for individual codes (the bounds are always equal) making it sufficient to check that the code is not equal a bound.
PROC_CODES can (and don't always) have an alphabetic prefix or suffix, which makes the ISNUMERIC() comparison necessary.
Records store PROC_IDs for their PROC_CODEs, so it's necessary to convert the CMP's PROC_CODE ranges into a set of PROC_IDs for identifying which records belong to that component
The performance issue occurs when trying to filter by DEPARTMENT_ID or LOCATION_ID
[CO_RECORDS] is also a view, but if it's that deep I'm going turf this to someone with less red tape to fight through.
CREATE VIEW [ViewsSchema].[MyView] AS
WITH
CMP_INCs AS (SELECT RNG.*, COALESCE(RNG.RANGE_END, RNG.RANGE_BEG) [SAFE_END] FROM DBEngine.DBO.DB_CMP_RANGE [RNG] WHERE [RNG].COMPONENT_ID = '100'),
CMP_EXCs AS (SELECT CER.* FROM DBEngine.DBO.DB_CMP_EXC_RANGE CER WHERE CER.COMPONENT_ID = '100'),
CMP_PROC_IDs AS (
SELECT
DBEngine_ProcTable.PROC_ID [CMP_PROC_ID],
DBEngine_ProcTable.PROC_CODE [CMP_PROC_CODE],
DB_CmpTable.COMPONENT_ID [CMP_ID],
MAX(DB_CmpTable.COMPONENT_NAME) [CMP_NAME]
FROM [DBEngine].DBO.DBEngine_ProcTable DBEngine_ProcTable
LEFT JOIN CMP_INCs ON ISNUMERIC(DBEngine_ProcTable.PROC_CODE) = ISNUMERIC(CMP_INCs.RANGE_BEG)
AND(DBEngine_ProcTable.PROC_CODE = CMP_INCs.RANGE_BEG
OR DBEngine_ProcTable.PROC_CODE BETWEEN CMP_INCs.RANGE_BEG AND CMP_INCs.SAFE_END)
INNER JOIN DBEngine.DBO.DB_CmpTable ON CMP_INCs.COMPONENT_ID = DB_CmpTable.COMPONENT_ID
LEFT JOIN CMP_EXCs EXCS ON EXCS.COMPONENT_ID = DB_CmpTable.COMPONENT_ID AND EXCS.EXCL_RANGE_END = DBEngine_ProcTable.PROC_CODE
WHERE EXCS.EXCL_RANGE_BEG IS NULL
GROUP BY
DBEngine_ProcTable.PROC_ID,
DBEngine_ProcTable.PROC_CODE,
DBEngine_ProcTable.BILL_DESC,
DBEngine_ProcTable.PROC_NAME,
DB_CmpTable.COMPONENT_ID
)
SELECT
RECORD.LOCATION_NAME [LOCATION_NAME]
, RECORD.LOCATION_ID [LOCATION_ID]
, MAX(RECORD.[Department]) [DEPARTMENT]
, RECORD.[Department ID] [DEPARTMENT_ID]
, SUM(RECORD.PROCEDURE_QUANTITY) [PROCEDURE_COUNT]
FROM DBEngineCUSTOMRPT.ViewsSchema.CO_RECORDS [RECORDS]
INNER JOIN CMP_PROC_IDs [CV] ON [CV].CMP_PROC_ID = [RECORDS].PROC_ID
CROSS JOIN (SELECT DATEADD(M, DATEDIFF(M, 0,GETDATE()), 0) [FIRSTOFTHEMONTH]) VARS
WHERE [RECORDS].TYPE = 1
AND ([RECORDS].VOID_DATE IS NULL OR [RECORDS].VOID_DATE >= VARS.[FIRSTOFTHEMONTH] )
AND [RECORDS].POST_DATE < VARS.[FIRSTOFTHEMONTH]
AND [RECORDS].DOS_MONTHS_BACK = 2
GROUP BY [RECORDS].LOCATION_NAME, [RECORDS].[Department ID]
GO
Based on the swift down votes, the answer to my question is
'No, there is not a clean syntax based solution for the improved
performance, and asking for one is ignorant of the declarative nature
of SQL you simple dirty plebeian'.
From the requests for the view's definition, it's clear that performance issues in simple queries should be addressed by fixing the structure of the objects being queried ('MyView' in this case) rather than syntactical gymnastics.
For interested parties the issue was resolved by adding a Row_Number() column to the final select in the view definition, wrapping it in a CTE, and using the new column in an always true filter while selecting the original columns.
I have no idea if this is the optimal solution. It doesn't feel good to me, but it appears to be working.
CREATE VIEW [ViewsSchema].[MyView] AS
WITH
CMP_INCs AS (SELECT RNG.*, COALESCE(RNG.RANGE_END, RNG.RANGE_BEG) [SAFE_END] FROM DBEngine.DBO.DB_CMP_RANGE [RNG] WHERE [RNG].COMPONENT_ID = '100'),
CMP_EXCs AS (SELECT CER.* FROM DBEngine.DBO.DB_CMP_EXC_RANGE CER WHERE CER.COMPONENT_ID = '100'),
CMP_PROC_IDs AS (
SELECT
DBEngine_ProcTable.PROC_ID [CMP_PROC_ID],
DBEngine_ProcTable.PROC_CODE [CMP_PROC_CODE],
DB_CmpTable.COMPONENT_ID [CMP_ID],
MAX(DB_CmpTable.COMPONENT_NAME) [CMP_NAME]
FROM [DBEngine].DBO.DBEngine_ProcTable DBEngine_ProcTable
LEFT JOIN CMP_INCs ON ISNUMERIC(DBEngine_ProcTable.PROC_CODE) = ISNUMERIC(CMP_INCs.RANGE_BEG)
AND(DBEngine_ProcTable.PROC_CODE = CMP_INCs.RANGE_BEG
OR DBEngine_ProcTable.PROC_CODE BETWEEN CMP_INCs.RANGE_BEG AND CMP_INCs.SAFE_END)
INNER JOIN DBEngine.DBO.DB_CmpTable ON CMP_INCs.COMPONENT_ID = DB_CmpTable.COMPONENT_ID
LEFT JOIN CMP_EXCs EXCS ON EXCS.COMPONENT_ID = DB_CmpTable.COMPONENT_ID AND EXCS.EXCL_RANGE_END = DBEngine_ProcTable.PROC_CODE
WHERE EXCS.EXCL_RANGE_BEG IS NULL
GROUP BY
DBEngine_ProcTable.PROC_ID,
DBEngine_ProcTable.PROC_CODE,
DBEngine_ProcTable.BILL_DESC,
DBEngine_ProcTable.PROC_NAME,
DB_CmpTable.COMPONENT_ID
),
RESULTS as (
SELECT
RECORD.LOCATION_NAME [LOCATION_NAME]
, RECORD.LOCATION_ID [LOCATION_ID]
, MAX(RECORD.[Department]) [DEPARTMENT]
, RECORD.[Department ID] [DEPARTMENT_ID]
, SUM(RECORD.PROCEDURE_QUANTITY) [PROCEDURE_COUNT]
, ROW_NUMBER() OVER (ORDER BY TDL.[Medical Department ID], TDL.[BILL_AREA_ID], TDL.JP_POS_NAME) [ROW]
FROM DBEngineCUSTOMRPT.ViewsSchema.CO_RECORDS [RECORDS]
INNER JOIN CMP_PROC_IDs [CV] ON [CV].CMP_PROC_ID = [RECORDS].PROC_ID
CROSS JOIN (SELECT DATEADD(M, DATEDIFF(M, 0,GETDATE()), 0) [FIRSTOFTHEMONTH]) VARS
WHERE [RECORDS].TYPE = 1
AND ([RECORDS].VOID_DATE IS NULL OR [RECORDS].VOID_DATE >= VARS.[FIRSTOFTHEMONTH] )
AND [RECORDS].POST_DATE < VARS.[FIRSTOFTHEMONTH]
AND [RECORDS].DOS_MONTHS_BACK = 2
GROUP BY [RECORDS].LOCATION_NAME, [RECORDS].[Department ID]
)
SELECT
[LOCATION_NAME]
, [LOCATION_ID]
, [DEPARTMENT]
, [DEPARTMENT_ID]
, [PROCEDURE_COUNT]
FROM RESULTS
WHERE [ROW] > 0
GO
I'm trying to figure out where I could add an index or modify an existing one to make the query go faster.
Main problem is that I cannot change the query itself, it's generated by Business Objects.
The strange thing is that if I change the WHERE clause to use another table, then the query is really fast!
The query is the following:
SELECT Year(listingperformanceindicator.date),
Month(listingperformanceindicator.date),
categoryflattened.parentname,
categoryflattened.groupname,
categoryflattened.categoryname,
Count(listingperformanceindicator.listingid),
( Count(listingperformanceindicator.listingid) ) / 30,
Sum(listingperformanceindicator.listingviews),
CASE
WHEN ( ( Count(listingperformanceindicator.listingid) ) / 30 ) = 0 THEN
0
ELSE ( Sum(listingperformanceindicator.listingviews) ) /
( ( Count(listingperformanceindicator.listingid) ) / 30 )
END
FROM listingperformanceindicator
RIGHT OUTER JOIN categorytolisting
ON ( categorytolisting.siteid = listingperformanceindicator.siteid
AND categorytolisting.listingid = listingperformanceindicator.listingid )
RIGHT OUTER JOIN categoryflattened
ON ( categoryflattened.siteid = categorytolisting.siteid
AND ( categoryflattened.parentid = categorytolisting.categoryid
OR categoryflattened.categoryid = categorytolisting.categoryid
OR categoryflattened.groupid = categorytolisting.categoryid ) )
RIGHT OUTER JOIN site ON (site.id=categoryflattened.siteId)
WHERE --listingperformanceindicator.siteid = 'DED29E78-17B0-423B-A1D1-67E2F3CA864D' --THIS IS REALLY FAST :/
site.id = 'DED29E78-17B0-423B-A1D1-67E2F3CA864D' --THIS IS REALLY SLOW! :(
AND categoryflattened.languagecode ='SE'
GROUP BY Year(listingperformanceindicator.date),
Month(listingperformanceindicator.date),
categoryflattened.parentname,
categoryflattened.groupname,
categoryflattened.categoryname
I've tried looking at the execution plan but it makes no sense to me :(
Here they are:
fast: http://imageshack.us/a/img211/5755/fastquery.png
slow: http://imageshack.us/a/img823/7227/slowquery.png
Any suggestion appreciated!
Thanks!
I have the following query that works fine:
SELECT NomComplet, IIF(Count(FS3.Index) = 0, '0 (RAS)', Count(FS3.Index))
FROM ControleAcces INNER JOIN (
Employes LEFT JOIN (
SELECT FS1.Index, FS1.OTP, FS1.OTP, FS1.Axe, FS1.FaitSaillant, FS1.Utilisateur, FS2.DateInsertion
FROM FaitsSaillants AS FS1 INNER JOIN (
SELECT Axe, Index, Max(FaitsSaillants.DateInsertion) AS DateInsertion
FROM FaitsSaillants
WHERE DateValue(DateInsertion) > #2010-01-01#
AND DateValue(DateInsertion) < #2011-12-31#
GROUP BY Axe, Index
) AS FS2
ON (FS1.DateInsertion = FS2.DateInsertion
AND FS1.Index = FS2.Index)
WHERE FS1.Axe = 'Project' AND FS2.Axe = 'Project'
) AS FS3
ON Employes.CIP = FS3.Utilisateur
)
ON ControleAcces.Valeur = Employes.CIP
GROUP BY NomComplet
ORDER BY NomComplet
Don't bother to fully understand it, all I want it to edit my IIF condition on the first line. Actually, the condition doesn't do much, it checks how many FS3.Index the query returns and concatenate (RAS) if it's 0. However, in fact, I would like it to check if there is any row in FaitsSaillants where Axe = 'RAS'. If the Count() of this is > 0, then the condition is met.
Can I do a subquery into the IIF segment, something like SELECT COUNT(Index) FROM FaitsSaillants WHERE Axe = 'RAS' AND Utilisateur = FS1.Utilisateur? If the result is 0, then I add the RAS to my second field's results. If not, it stays Count(FS3.Index).
I tried it and while the syntax is correct, the problem is it can't check for the Utilisateur = FS1.Utilisateur condition because FS1 is in the main query. However, I must check this because this is the only way to be sure that I'm looking for the right thing: it must be the same Utilisateur whether I'm in the main query or the subquery.
EDIT:
Here is a shorter version of what I tried from the answers/comments below.
SELECT NomComplet, IIf(FS2.AxeCount > 0, "0 (RAS)", count(FS3.index))
FROM ControleAcces INNER JOIN (Employes LEFT JOIN (SELECT FS2.AxeCount, FS1.Index, FS1.OTP, FS1.OTP, FS1.Axe, FS1.FaitSaillant, FS1.Utilisateur, FS2.DateInsertion
FROM FaitsSaillants AS FS1 INNER JOIN (
SELECT Axe, Index, Max(FaitsSaillants.DateInsertion) AS DateInsertion, SUM(IIf(Axe = 'RAS', 1, 0)) As AxeCount
FROM FaitsSaillants
GROUP BY Axe, Index
) AS FS2
ON (FS1.DateInsertion = FS2.DateInsertion
AND FS1.Index = FS2.Index)
) AS FS3 ON Employes.CIP = FS3.Utilisateur) ON ControleAcces.Valeur = Employes.CIP
GROUP BY NomComplet;
I still got an error about FS2.AxeCount that isn't a part of the aggregate function (iff).
I've also tried this:
SELECT NomComplet, IIf((select count(*) from FaitsSaillants where axe='RAS' and Utilisateur=ControleAcces.Valeur) > 0, "0 (RAS)", count(FS3.index))
FROM ControleAcces INNER JOIN (Employes LEFT JOIN (SELECT FS2.AxeCount, FS1.Index, FS1.OTP, FS1.OTP, FS1.Axe, FS1.FaitSaillant, FS1.Utilisateur, FS2.DateInsertion
FROM FaitsSaillants AS FS1 INNER JOIN (
SELECT Axe, Index, Max(FaitsSaillants.DateInsertion) AS DateInsertion, SUM(IIf(Axe = 'RAS', 1, 0)) As AxeCount
FROM FaitsSaillants
GROUP BY Axe, Index
) AS FS2
ON (FS1.DateInsertion = FS2.DateInsertion
AND FS1.Index = FS2.Index)
) AS FS3 ON Employes.CIP = FS3.Utilisateur) ON ControleAcces.Valeur = Employes.CIP
GROUP BY NomComplet, ControleAccess.Valeur;
FS3.Index is NULL if there is no corresponding record, because of the LEFT JOIN. Wouldn't a test
IIf(IsNull(FS3.Index), ..., ...)
... be sufficient? I'm not sure so, since other conditions and joins are involved as well.
UPDATE (recaptulation of comments)
We can get the desired count (AxeCount) from the innermost nested SELECT (FS2):
SELECT
Axe, Index, Max(FaitsSaillants.DateInsertion) AS DateInsertion,
SUM(IIf(Axe = 'RAS', 1, 0)) As AxeCount
FROM FaitsSaillants
...
This intermediate result must be passed to the outermost SELECT by including it in the select list of the intermediate SELECT (FS3):
SELECT FS2.AxeCount, FS1.Index, ...
The outer most SELECT has a GROUP BY clause. In this case, all fields of the select list must either be included in the GROUP BY clause or must be included in an aggregate function. The GROUP BY clause groups rows by the fields listed in this very clause. This usually reduces the number of rows, as several rows similar in their group fields are condensed to form one row. This means that the values of the remaining fields of the select list (not in the group fields) must be combined together. This is what the aggregate function does. Aggregate functions are
Avg (average)
Count
First, Last
Min, Max (minimum, maximum)
StDev, StDevP (standard deviation)
Sum
Var, VarP (variance)
See SQL Aggregate Functions (Access)
Now, we can add this in the outermost select list
IIf(SUM(FS2.AxeCount) > 0, ..., ...)
I'm a little unclear on exactly what you want to check in the FaitsSaillants table; you said:
I would like it to check if there is any row in FaitsSaillants where
Axe = 'RAS'. If the Count() of this is > 0, then the condition is met.
But you also stated:
something like SELECT COUNT(Index) FROM FaitsSaillants WHERE Axe =
'RAS' AND Employes.CIP = FS3.Utilisateur
My guess is that you meant SELECT COUNT(Index) FROM FaitsSaillants WHERE Axe = 'RAS', because in the second SQL statement you're joining to two tables that aren't being referenced in the FROM clause of your subquery.
What about using DCount in your IIF statement?
IIF(DCount("Index", "FaitsSaillants", "Axe='RAS'"), '0 (RAS)', Count(FS3.Index))
This should work if you meant SELECT COUNT(Index) FROM FaitsSaillants WHERE Axe = 'RAS', but it would need to be modified if you had something else in mind.
Just a caveat though: I would try to use the "domain" functions (DCount, DLookup, etc...) sparingly, because they are slow.
By the way, I believe you can also use a subquery in your IIF statement (is it giving you an error? It seems to work for me):
IIF((SELECT COUNT(*) FROM FaitsSaillants WHERE Axe = 'RAS'), '0 (RAS)', Count(FS3.Index))
Just make sure you are putting the subquery in parenthesis.
I have the following SQL query:
SELECT VehicleRegistrations.ID, VehicleRegistrations.VehicleReg,
VehicleRegistrations.Phone, VehicleType.VehicleTypeDescription,
dt.ID AS 'CostID', dt.IVehHire, dt.FixedCostPerYear, dt.VehicleParts,
dt.MaintenancePerMile, dt.DateEffective
FROM VehicleRegistrations
INNER JOIN VehicleType ON VehicleRegistrations.VehicleType = VehicleType.ID
LEFT OUTER JOIN (SELECT TOP (1) ID, VehicleRegID, DateEffective, IVehHire,
FixedCostPerYear, VehicleParts, MaintenancePerMile
FROM VehicleFixedCosts
WHERE (DateEffective <= GETDATE())
ORDER BY DateEffective DESC) AS dt
ON dt.VehicleRegID = VehicleRegistrations.ID
What I basically want to do is always select the top 1 record from the 'VehicleFixedCosts' table, where the VehicleRegID matches the one in the main query. What is happening here is that it's selecting the top row before the join, so if the vehicle registration of the top row doesn't match the one we're joining to it returns nothing.
Any ideas? I really don't want to have use subselects for each of the columns I need to return
Try this:
SELECT vr.ID, vr.VehicleReg,
vr.Phone, VehicleType.VehicleTypeDescription,
dt.ID AS 'CostID', dt.IVehHire, dt.FixedCostPerYear, dt.VehicleParts,
dt.MaintenancePerMile, dt.DateEffective
FROM VehicleRegistrations vr
INNER JOIN VehicleType ON vr.VehicleType = VehicleType.ID
LEFT OUTER JOIN (
SELECT ID, VehicleRegID, DateEffective, IVehHire, FixedCostPerYear, VehicleParts, MaintenancePerMile
FROM VehicleFixedCosts vfc
JOIN (
select VehicleRegID, max(DateEffective) as DateEffective
from VehicleFixedCosts
where DateEffective <= getdate()
group by VehicleRegID
) t ON vfc.VehicleRegID = t.VehicleRegID and vfc.DateEffective = t.DateEffective
) AS dt
ON dt.VehicleRegID = vr.ID
Subquery underneath dt might need some grouping but without schema (and maybe sample data) it's hard to say which column should be involved in that.