How do I aggregate 3 columns that are different with MIN(DATE)? - sql

I'm facing a simple problem here that I can't solve, I have this query:
SELECT
MIN(TEA_InicioTarefa),
PFJ_Id_Analista,
ATC_Id,
SRV_Id
FROM
dbo.TarefaEtapaAreaTecnica
INNER JOIN Tarefa t ON t.TRF_Id = TarefaEtapaAreaTecnica.TRF_Id
WHERE SRV_Id = 88
GROUP BY SRV_Id, ATC_Id, PFJ_Id_Analista
ORDER BY ATC_Id ASC
It returns me this:
I was able to group it a little with GROUP BY SRV_Id, ATC_Id, PFJ_Id_Analista that gave me these 8 records, but as you can see some PFJ_Id_Analista are different.
What I want is to select only the early date of each SRV_Id and ATC_Id, the PFJ_Id_Analista don't need to grup, if I remove PFJ_Id_Analista from the grouping the query works, but I need the column.
For eg.: between row number 2 and 3 I want only the early date, so it will be row 2. The same goes for rows 5 to 8, I want only row 6.
DDL for TarefaEtapaAreaTecnica (important key: TRF_Id)
CREATE TABLE [dbo].[TarefaEtapaAreaTecnica](
[TEA_Id] [int] IDENTITY(1,1) NOT NULL,
**[TRF_Id] [int] NOT NULL,**
[ETS_Id] [int] NOT NULL,
[ATC_Id] [int] NOT NULL,
[TEA_Revisao] [int] NOT NULL,
[PFJ_Id_Projetista] [int] NULL,
[TEA_DoctosQtd] [int] NULL,
[TEA_InicioTarefa] [datetime2](7) NULL,
[PFJ_Id_Analista] [int] NULL,
[TEA_FimTarefa] [datetime2](7) NULL,
[TEA_HorasQtd] [numeric](18, 1) NULL,
[TEA_NcfQtd] [int] NULL,
[PAT_Id] [int] NULL
DDL for Tarefa (important keys TRF_Id and SRV_Id (which I need it)):
CREATE TABLE [dbo].[Tarefa](
**[TRF_Id] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,**
**[SRV_Id] [int] NOT NULL,**
[TRT_Id] [int] NOT NULL,
[TRF_Descr] [varchar](255) NULL,
[TRF_Entrada] [datetime] NOT NULL,
[TRF_DoctosQtd] [int] NOT NULL,
[TRF_Devolucao] [datetime] NULL,
[TRF_NcfQtd] [int] NULL,
[TRF_EhDocInsuf] [bit] NULL,
[TRF_Observ] [varchar](255) NULL,
[TRF_AreasTrfQtd] [int] NULL,
[TRF_AreasTrfLiqQtd] [int] NULL
Thanks a lot.
EDIT:
CORRECT QUERY
Based on #Gordon Linoff post:
select t.TEA_InicioTarefa, t.PFJ_Id_Analista, t.ATC_Id, t.SRV_Id
from (select t.*,
row_number() over (partition by ATC_Id, SRV_Id
order by TEA_InicioTarefa) as seqnum, ta.SRV_Id
from dbo.TarefaEtapaAreaTecnica t
inner join dbo.Tarefa ta on t.TRF_Id = ta.TRF_Id
) t
where seqnum = 1 AND t.SRV_Id = 88

Just use window functions:
select t.*
from (select t.*,
row_number() over (partition by ATC_Id, SRV_Id
order by ini) as seqnum
from dbo.TarefaEtapaAreaTecnica t
) t
where seqnum = 1;
This is really an example of filtering, not aggregation. The problem is getting the right value to filter on.

Then get the grouping first and then do a JOIN with it like
SELECT
x.Min_TEA_InicioTarefa,
t.PFJ_Id_Analista,
t.ATC_Id,
t.SRV_Id
FROM
dbo.TarefaEtapaAreaTecnica t
INNER JOIN Tarefa ta ON ta.TRF_Id = t.TRF_Id
INNER JOIN (
select SRV_Id, MIN(TEA_InicioTarefa) as Min_TEA_InicioTarefa
from dbo.TarefaEtapaAreaTecnica
GROUP BY SRV_Id
) x ON t.SRV_Id = x.SRV_Id
WHERE t.SRV_Id = 88
ORDER BY t.ATC_Id ASC;

Related

SQL Query to update a colums from a table base on a datetime field

I have 2 table called tblSetting and tblPaquets.
I need to update 3 fields of tblPaquets from tblSetting base on a where clause that use a datetime field of tblPaquest and tblSetting.
The sql below is to represent what I am trying to do and I know it make no sense right now.
My Goal is to have One query to achieve this goal.
I need to extract the data from tblSettings like this
SELECT TOP(1) [SupplierID],[MillID],[GradeFamilyID] FROM [tblSettings]
WHERE [DateHeure] <= [tblPaquets].[DateHeure]
ORDER BY [DateHeure] DESC
And Update tblPaquets with this data
UPDATE [tblPaquets]
SET( [SupplierID] = PREVIOUS_SELECT.[SupplierID]
[MillID] = PREVIOUS_SELECT.[MillID]
[GradeFamilly] = PREVIOUS_SELECT.[GradeFamilyID] )
Here the table design
CREATE TABLE [tblSettings](
[ID] [int] NOT NULL,
[SupplierID] [int] NOT NULL,
[MillID] [int] NOT NULL,
[GradeID] [int] NOT NULL,
[TypeID] [int] NOT NULL,
[GradeFamilyID] [int] NOT NULL,
[DateHeure] [datetime] NOT NULL,
[PeakWetEnable] [tinyint] NULL)
CREATE TABLE [tblPaquets](
[ID] [int] IDENTITY(1,1) NOT NULL,
[PaquetID] [int] NOT NULL,
[DateHeure] [datetime] NULL,
[BarreCode] [int] NULL,
[Grade] [tinyint] NULL,
[SupplierID] [int] NULL,
[MillID] [int] NULL,
[AutologSort] [tinyint] NULL,
[GradeFamilly] [int] NULL)
You can do this using CROSS APPLY:
UPDATE p
SET SupplierID = s.SupplierID,
MillID = s.MillID
GradeFamilly = s.GradeFamilyID
FROM tblPaquets p CROSS APPLY
(SELECT TOP (1) s.*
FROM tblSettings s
WHERE s.DateHeure <= p.DateHeure
ORDER BY p.DateHeure DESC
) s;
Notes:
There are no parentheses before SET.
I don't recommend using [ and ] to escape identifiers, unless they need to be escaped.
I presume the query on tblSettings should have an ORDER BY to get the most recent rows.

Subquery with Sum of top X rows

I am trying to get the top 5 results for each person from a table. I am able to get the top result for them however I want the sum of the top 5.
Select Distinct
r.LastName,
r.FirstName ,
r.Class,
r.BibNum,
(Select top 1 r2.points from Results r2 where r2.season=r.Season and r2.Mountain=r.Mountain and r2.Bibnum=r.bibnum Order By r2.Points Desc) as Points
from Results as r Where Season='2015' and Mountain='Ski Resort'
Order By Class, Points Desc
Columns:
[Id] [int] IDENTITY(1,1) NOT NULL,
[RaceNum] [int] NOT NULL,
[BibNum] [int] NOT NULL,
[FirstName] [nvarchar](max) NULL,
[LastName] [nvarchar](max) NULL,
[Sex] [nvarchar](max) NULL,
[Class] [nvarchar](max) NULL,
[Team] [nvarchar](max) NULL,
[FirstRun] [nvarchar](max) NULL,
[SecondRun] [nvarchar](max) NULL,
[Best] [nvarchar](max) NULL,
[Points] [int] NOT NULL,
[Season] [int] NOT NULL,
[Mountain] [nvarchar](max) NULL
You could use row_number() to get the top five rows, and then group by the other fields:
SELECT LastName,
FirstName,
Class,
BibNum,
SUM(points)
FROM (SELECT LastName,
FirstName,
Class,
BibNum,
points,
ROW_NUMBER() OVER (PARTITION BY LastName,
FirstName,
Class,
BibNum
ORDER BY points DESC) AS rn
FROM results
WHERE Season='2015' and Mountain='Ski Resort'
) t
WHERE rn <= 5
GROUP BY LastName, FirstName, Class, BibNum

Speeding up this query

This is my query which takes about 1.5 seconds. Can I lower this?
SELECT *
FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY NAME asc) peta_rn,
peta_query.*
FROM
(SELECT
BOOK, PAGETRIMMED, NAME, TYPE, PDF
FROM
CCWiseDocumentNames2 cdn
INNER JOIN
CCWiseInstr2 cwi ON cwi.ID = cdn.ID) as peta_query) peta_paged
WHERE
peta_rn > 1331900 AND peta_rn <= 1331950
These are my table structures:
CREATE TABLE [dbo].[CCWiseDocumentNames2](
[ID] [int] NULL,
[BK_PG] [varchar](50) NULL,
[NAME] [varchar](100) NULL,
[OTHERNAM] [varchar](100) NULL,
[TYPE] [varchar](50) NULL,
[INDEXNAME] [varchar](50) NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[CCWiseInstr2](
[ID] [int] NULL,
[BK_PG] [varchar](50) NULL,
[DATE] [datetime] NULL,
[ITYPE] [varchar](50) NULL,
[BOOK] [int] NULL,
[PAGE] [varchar](50) NULL,
[NOBP] [varchar](50) NULL,
[DESC] [varchar](240) NULL,
[TIF] [varchar](50) NULL,
[INDEXNAME] [varchar](50) NULL,
[CONFIRM] [varchar](50) NULL,
[PDF] [varchar](50) NULL,
[PAGETRIMMED] [varchar](10) NULL,
[PageINT] [int] NULL,
[PageCHAR] [varchar](2) NULL,
[IdAuto] [int] NOT NULL
) ON [PRIMARY]
This is my execution plan:
As you can see it is 97% clustered index seek and 3% index scan. Any way to improve this query further?
You can't add rownumber on the fly to more than a million rows and expect a where clause will instantly recognize those rows with the newly generated rownumbers.
Because I don't have that volume of data, can only provide some options for your consideration:
Dedicate the clustered index for Name column (other than ID)
Make the join after you get row_number over name.
Include the three columns from CCWiseInstr2 into a non-clustered index on ID column. This could save some hard disk spindle's movement. Perfomance gain could only be observed with large volume of data.
CREATE NONCLUSTERED INDEX [idx2_ID_include] ON [dbo].[CCWiseInstr2] ([ID] ASC) INCLUDE ( [BOOK], [PDF], [PAGETRIMMED])
GO
With a as (
Select *
from ( SELECT ROW_NUMBER() OVER (ORDER BY NAME asc) as peta_rn, ID,
type
from CCWiseDocumentNames2) as Temp
where peta_rn > 1331900 AND peta_rn <= 1331950
)
select a.peta_rn,
a.type,
b.book,
b.PAGETRIMMED,
b.PDF
from a
join CCWiseInstr2 as b on a.id = b.id

SQL fastest 'GROUP BY' script

Is there any difference in how I edit the GROUP BY command?
my code:
SELECT Number, Id
FROM Table
WHERE(....)
GROUP BY Id, Number
is it faster if i edit it like this:
SELECT Number, Id
FROM Table
WHERE(....)
GROUP BY Number , Id
it's better to use DISTINCT if you don't want to aggregate data. Otherwise, there is no difference between the two queries you provided, it'll produce the same query plan
This examples are equal.
DDL:
CREATE TABLE dbo.[WorkOut]
(
[WorkOutID] [bigint] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[TimeSheetDate] [datetime] NOT NULL,
[DateOut] [datetime] NOT NULL,
[EmployeeID] [int] NOT NULL,
[IsMainWorkPlace] [bit] NOT NULL,
[DepartmentUID] [uniqueidentifier] NOT NULL,
[WorkPlaceUID] [uniqueidentifier] NULL,
[TeamUID] [uniqueidentifier] NULL,
[WorkShiftCD] [nvarchar](10) NULL,
[WorkHours] [real] NULL,
[AbsenceCode] [varchar](25) NULL,
[PaymentType] [char](2) NULL,
[CategoryID] [int] NULL
)
Query:
SELECT wo.WorkOutID, wo.TimeSheetDate
FROM dbo.WorkOut wo
GROUP BY wo.WorkOutID, wo.TimeSheetDate
SELECT DISTINCT wo.WorkOutID, wo.TimeSheetDate
FROM dbo.WorkOut wo
SELECT wo.DateOut, wo.EmployeeID
FROM dbo.WorkOut wo
GROUP BY wo.DateOut, wo.EmployeeID
SELECT DISTINCT wo.DateOut, wo.EmployeeID
FROM dbo.WorkOut wo
Execution plan:

Return rows which have same data in three columns

I have a table with the following schema
CREATE TABLE [dbo].[personas](
[id_persona] [int] IDENTITY(1,1) NOT NULL,
[nombres] [nvarchar](50) NOT NULL,
[apellido_paterno] [nvarchar](50) NULL,
[apellido_materno] [nvarchar](50) NULL,
[fecha_nacimiento] [date] NOT NULL,
[sexo] [varchar](1) NOT NULL,
[estado_civil] [nvarchar](50) NOT NULL,
[calle] [nvarchar](200) NULL,
[colonia] [nvarchar](100) NULL,
[codigo_postal] [char](5) NOT NULL,
[telefonos] [varchar](50) NULL,
[celular] [varchar](25) NULL,
[email] [varchar](50) NULL,
)
How do I make a query in SQL Server to return rows where nombre, apellido_paterno and apellido_materno are repeated? I mean two or more rows have the same data in these columns.
I suppose I'm looking something opposite to DISTINCT clause
You would want...
SELECT nombre, apellido_paterno, apellido_materno
FROM dbo.personas
GROUP BY nombre, apellido_paterno, apellido_materno
HAVING COUNT(*) > 1
If you want to look at the actual rows, then use that as an inner query and join onto it. So, something like
SELECT *
FROM personas pOuter INNER JOIN
(SELECT nombre, apellido_paterno, apellido_materno
FROM dbo.personas
GROUP BY nombre, apellido_paterno, apellido_materno
HAVING COUNT(*) > 1) pInner
ON pInner.nombre = pOuter.nombre
AND pInner.apellido_paterno = pOuter.apellido_paterno
AND pInner.apellido_materno = pOuter.apellido_materno
;WITH x AS
(
SELECT id_personas, rn = ROW_NUMBER() OVER
(
PARTITION BY nombre, apellido_paterno, apellido_materno
ORDER BY id_personas
)
FROM dbo.personas
)
SELECT <col list>
FROM dbo.personas AS p
WHERE EXISTS
(
SELECT 1 FROM x
WHERE x.id_personas = p.id_personas
AND x.rn > 1
);