What's the right way of joining two tables, grouping by a column, and selecting only one row for each record?

I have a crew table:
CREATE TABLE crew(crew_id INT, crew_name nvarchar(20))
And a time log table, which is just a very long list of actions performed by the crews:
CREATE TABLE [dbo].[TimeLog](
[time_log_id] [int] IDENTITY(1,1) NOT NULL,
[experiment_id] [int] NOT NULL,
[crew_id] [int] NOT NULL,
[starting] [bit] NULL,
[ending] [bit] NULL,
[exception] [nchar](10) NULL,
[sim_time] [time](7) NULL,
[duration] [int] NULL,
[real_time] [datetime] NOT NULL )
I want a view that shows only one row for each crew: the one with the latest sim_time + duration.
Is a view the way to go? If yes, how do I write it? If not, what's the best way of doing this?
Thanks

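Here is a query to select what you want:
select * from (
select
-- list the columns explicitly: select * would return crew_id twice,
-- which a derived table does not allow
c.crew_id, c.crew_name, l.time_log_id, l.sim_time, l.duration,
row_number() over (partition by c.crew_id order by l.sim_time desc) as rNum
from crew as c
inner join TimeLog as l on c.crew_id = l.crew_id
) as t
where rNum = 1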

It depends on what you need that data for.
Anyway, a simple query to find the latest sim_time would be something like:
select C.*, TL.sim_time
from crew C /*left? right? inner?*/ join TimeLog TL on TL.crew_id = C.crew_id
where TL.sim_time = (select max(timelog_subquery.sim_time) from TimeLog timelog_subquery where timelog_subquery.crew_id = C.crew_id)
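On the question of whether a view is the way to go: a view can wrap the row_number() approach directly. The sketch below is only one possible shape. The view name vLatestCrewAction is made up, it assumes both tables live in the dbo schema, and it assumes duration is stored in seconds (the question does not say), so adjust the DATEADD unit to match your data. Note that DATEADD on a time value wraps around at midnight.

```sql
-- Hypothetical view name; assumes duration is in seconds and crew is in dbo.
CREATE VIEW dbo.vLatestCrewAction
AS
SELECT t.crew_id, t.crew_name, t.time_log_id, t.sim_time, t.duration
FROM (
    SELECT c.crew_id, c.crew_name, l.time_log_id, l.sim_time, l.duration,
           -- rank each crew's log rows by sim_time + duration, latest first;
           -- time_log_id breaks ties so exactly one row gets rNum = 1
           ROW_NUMBER() OVER (
               PARTITION BY c.crew_id
               ORDER BY DATEADD(SECOND, ISNULL(l.duration, 0), l.sim_time) DESC,
                        l.time_log_id DESC) AS rNum
    FROM dbo.crew AS c
    INNER JOIN dbo.TimeLog AS l ON l.crew_id = c.crew_id
) AS t
WHERE t.rNum = 1;
```

It can then be queried like any table, for example with SELECT * FROM dbo.vLatestCrewAction WHERE crew_id = 1. Crews without any log rows do not appear because of the inner join.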

Related

Return list of Students by ZipCode Count

I am trying to get a list of students that live in the same zip code where zip code count > 1.
I tried the following and get nothing in my query. If I remove s.Student, I get results of zipcode and count, but I want to include student also.
SELECT s.Student, z.ZipCode, COUNT(s.ZipCodeId) As 'Zip Code Count'
FROM Students s
INNER JOIN ZipCodes z ON z.ZipCodeId = s.ZipCodeId
GROUP BY s.Student, z.ZipCode
HAVING COUNT(z.ZipCode) > 1
Below are the database tables I am using.
CREATE TABLE [dbo].[Instructors](
[InstructorId] [int] IDENTITY(1,1) NOT NULL,
[Instructor] [varchar](50) NOT NULL,
[ZipCodeId] [int] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Students](
[StudentId] [int] IDENTITY(1,1) NOT NULL,
[Student] [varchar](50) NOT NULL,
[ZipCodeId] [int] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[ZipCodes](
[ZipCodeId] [int] IDENTITY(1,1) NOT NULL,
[ZipCode] [varchar](9) NULL,
[City] [varchar](50) NULL,
[State] [varchar](25) NULL
) ON [PRIMARY]

I think you need to query the Zip Codes which are used more than once, then join Students back on, along with the Zip Code details, e.g.
SELECT S.Student, Z.ZipCode, Z1.Num AS "Zip Code Count"
FROM (
SELECT COUNT(*) Num, ZipCodeId
FROM Students S
GROUP BY ZipCodeId
HAVING COUNT(*) > 1
) Z1
INNER JOIN Students S on S.ZipCodeId = Z1.ZipCodeId
INNER JOIN ZipCodes Z on Z.ZipCodeId = Z1.ZipCodeId;
Note: Prefer double quotes (") or square brackets ([]) over single quotes (') to delimit a column name; single quotes normally denote string literals.
Also, sample data would allow testing of our solutions.
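For anyone who wants to try the query above without the real tables, here is a minimal sketch using table variables. The student names and zip codes are invented purely for illustration, and only the columns the query needs are included.

```sql
-- Hypothetical sample rows: two students share ZipCodeId 1, one is alone in 2.
DECLARE @ZipCodes TABLE (ZipCodeId int, ZipCode varchar(9));
DECLARE @Students TABLE (StudentId int, Student varchar(50), ZipCodeId int);

INSERT INTO @ZipCodes (ZipCodeId, ZipCode)
VALUES (1, '10001'), (2, '94105');
INSERT INTO @Students (StudentId, Student, ZipCodeId)
VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2);

-- Returns Ann and Bob, each with a count of 2; Cy is filtered out.
SELECT S.Student, Z.ZipCode, Z1.Num AS [Zip Code Count]
FROM (
    SELECT COUNT(*) AS Num, ZipCodeId
    FROM @Students
    GROUP BY ZipCodeId
    HAVING COUNT(*) > 1
) Z1
INNER JOIN @Students S ON S.ZipCodeId = Z1.ZipCodeId
INNER JOIN @ZipCodes Z ON Z.ZipCodeId = Z1.ZipCodeId;
```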

You can do this using a window function, without re-joining:
SELECT
S.Student,
Z.ZipCode,
S.Num AS [Zip Code Count]
FROM (
SELECT *,
COUNT(*) OVER (PARTITION BY S.ZipCodeId) Num
FROM Students S
) S
INNER JOIN ZipCodes Z on Z.ZipCodeId = S.ZipCodeId
WHERE S.Num > 1;

How do I aggregate 3 columns that are different with MIN(DATE)?

I'm facing a simple problem here that I can't solve. I have this query:
SELECT
MIN(TEA_InicioTarefa),
PFJ_Id_Analista,
ATC_Id,
SRV_Id
FROM
dbo.TarefaEtapaAreaTecnica
INNER JOIN Tarefa t ON t.TRF_Id = TarefaEtapaAreaTecnica.TRF_Id
WHERE SRV_Id = 88
GROUP BY SRV_Id, ATC_Id, PFJ_Id_Analista
ORDER BY ATC_Id ASC
It returns me this:
I was able to group it a little with GROUP BY SRV_Id, ATC_Id, PFJ_Id_Analista, which gave me these 8 records, but as you can see some PFJ_Id_Analista values are different.
What I want is to select only the earliest date for each SRV_Id and ATC_Id. PFJ_Id_Analista doesn't need to be grouped; if I remove PFJ_Id_Analista from the grouping the query works, but I need the column.
E.g., between rows 2 and 3 I want only the earliest date, so it will be row 2. The same goes for rows 5 to 8, where I want only row 6.
DDL for TarefaEtapaAreaTecnica (important key: TRF_Id):
CREATE TABLE [dbo].[TarefaEtapaAreaTecnica](
[TEA_Id] [int] IDENTITY(1,1) NOT NULL,
[TRF_Id] [int] NOT NULL,
[ETS_Id] [int] NOT NULL,
[ATC_Id] [int] NOT NULL,
[TEA_Revisao] [int] NOT NULL,
[PFJ_Id_Projetista] [int] NULL,
[TEA_DoctosQtd] [int] NULL,
[TEA_InicioTarefa] [datetime2](7) NULL,
[PFJ_Id_Analista] [int] NULL,
[TEA_FimTarefa] [datetime2](7) NULL,
[TEA_HorasQtd] [numeric](18, 1) NULL,
[TEA_NcfQtd] [int] NULL,
[PAT_Id] [int] NULL )
DDL for Tarefa (important keys: TRF_Id and SRV_Id, which I need):
CREATE TABLE [dbo].[Tarefa](
[TRF_Id] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
[SRV_Id] [int] NOT NULL,
[TRT_Id] [int] NOT NULL,
[TRF_Descr] [varchar](255) NULL,
[TRF_Entrada] [datetime] NOT NULL,
[TRF_DoctosQtd] [int] NOT NULL,
[TRF_Devolucao] [datetime] NULL,
[TRF_NcfQtd] [int] NULL,
[TRF_EhDocInsuf] [bit] NULL,
[TRF_Observ] [varchar](255) NULL,
[TRF_AreasTrfQtd] [int] NULL,
[TRF_AreasTrfLiqQtd] [int] NULL )
Thanks a lot.
EDIT:
CORRECT QUERY
Based on @Gordon Linoff's post:
select t.TEA_InicioTarefa, t.PFJ_Id_Analista, t.ATC_Id, t.SRV_Id
from (select t.*,
row_number() over (partition by ATC_Id, SRV_Id
order by TEA_InicioTarefa) as seqnum, ta.SRV_Id
from dbo.TarefaEtapaAreaTecnica t
inner join dbo.Tarefa ta on t.TRF_Id = ta.TRF_Id
) t
where seqnum = 1 AND t.SRV_Id = 88

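Just use window functions:
select t.*
from (select tea.*, ta.SRV_Id,
row_number() over (partition by tea.ATC_Id, ta.SRV_Id
order by tea.TEA_InicioTarefa) as seqnum
from dbo.TarefaEtapaAreaTecnica tea
inner join dbo.Tarefa ta on ta.TRF_Id = tea.TRF_Id
) t
where seqnum = 1;
This is really an example of filtering, not aggregation. The problem is getting the right value to filter on. SRV_Id comes from Tarefa, hence the join inside the subquery.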

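Then get the grouping first and then do a JOIN with it, matching on the minimum date so that PFJ_Id_Analista comes from the earliest row, like:
SELECT
x.Min_TEA_InicioTarefa,
t.PFJ_Id_Analista,
t.ATC_Id,
ta.SRV_Id
FROM
dbo.TarefaEtapaAreaTecnica t
INNER JOIN dbo.Tarefa ta ON ta.TRF_Id = t.TRF_Id
INNER JOIN (
select ta2.SRV_Id, t2.ATC_Id, MIN(t2.TEA_InicioTarefa) as Min_TEA_InicioTarefa
from dbo.TarefaEtapaAreaTecnica t2
INNER JOIN dbo.Tarefa ta2 ON ta2.TRF_Id = t2.TRF_Id
GROUP BY ta2.SRV_Id, t2.ATC_Id
) x ON x.SRV_Id = ta.SRV_Id
AND x.ATC_Id = t.ATC_Id
AND x.Min_TEA_InicioTarefa = t.TEA_InicioTarefa
WHERE ta.SRV_Id = 88
ORDER BY t.ATC_Id ASC;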

SQL get recently modified post for each user

This is my table:
CREATE TABLE [dbo].[posts]
(
[id] [int] IDENTITY(1,1) NOT NULL,
[user_id] [int] NOT NULL,
[date_posted] [datetime] NOT NULL,
[date_modified] [datetime] NOT NULL,
[content] [text] NOT NULL,
CONSTRAINT [PK_posts] PRIMARY KEY CLUSTERED ( [id] ASC )
)
My company needs a single query that will get the post id of the most recently modified post for each user. Can anyone please help me? Thanks,

The rank() function should do the trick. Order by date_modified, since you want the most recently modified post:
SELECT user_id, id AS most_recent_post_id
FROM (SELECT user_id,
id,
RANK() OVER (PARTITION BY user_id ORDER BY date_modified DESC) AS rk
FROM [posts]) p
WHERE rk = 1
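One thing to be aware of: RANK() gives tied rows the same rank, so a user with two posts modified at the same instant would come back twice. If exactly one row per user is required, a possible variation is ROW_NUMBER() with id as a tie-breaker. The sample rows below are invented for illustration, and the table variable only carries the columns needed here.

```sql
-- Hypothetical sample data: user 1 has two posts modified at the same instant.
DECLARE @posts TABLE (id int, user_id int, date_modified datetime);
INSERT INTO @posts (id, user_id, date_modified) VALUES
    (1, 1, '2020-01-01T10:00:00'),
    (2, 1, '2020-01-02T10:00:00'),
    (3, 1, '2020-01-02T10:00:00'),
    (4, 2, '2020-01-03T10:00:00');

-- ROW_NUMBER with id as a tie-breaker keeps exactly one row per user:
-- post 3 for user 1 and post 4 for user 2.
-- With RANK() and no tie-breaker, posts 2 and 3 would both be returned.
SELECT user_id, id AS most_recent_post_id
FROM (SELECT user_id,
             id,
             ROW_NUMBER() OVER (PARTITION BY user_id
                                ORDER BY date_modified DESC, id DESC) AS rn
      FROM @posts) p
WHERE rn = 1;
```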

Query is very very slow for processing 200000 plus records

I have 200,000 rows in the Patient and Person tables, and the query shown takes 30 seconds to execute.
I have defined the primary key (and clustered index) on PersonId in the Person table and on PatientId in the Patient table. What else can I do here to improve the performance of my procedure?
I'm new to the database development side and know only basic SQL. I'm also not sure whether SQL Server can handle 200,000 rows quickly.
You can see the whole dynamic procedure at https://github.com/Padayappa/SQLProblem/blob/master/Performance
Has anyone dealt with this many rows? How do I improve performance here?
DECLARE @return_value int,
@unitRows bigint,
@unitPages int,
@TenantId int,
@unitItems int,
@page int
SET @TenantId = 1
SET @unitItems = 20
SET @page = 1
DECLARE @PatientSearch TABLE(
[PatientId] [bigint] NOT NULL,
[PatientIdentifier] [nvarchar](50) NULL,
[PersonNumber] [nvarchar](20) NULL,
[FirstName] [nvarchar](100) NOT NULL,
[LastName] [nvarchar](100) NOT NULL,
[ResFirstName] [nvarchar](100) NOT NULL,
[ResLastName] [nvarchar](100) NOT NULL,
[AddFirstName] [nvarchar](100) NOT NULL,
[AddLastName] [nvarchar](100) NOT NULL,
[Address] [nvarchar](255) NULL,
[City] [nvarchar](50) NULL,
[State] [nvarchar](50) NULL,
[ZipCode] [nvarchar](20) NULL,
[Country] [nvarchar](50) NULL,
[RowNumber] [bigint] NULL
)
INSERT INTO @PatientSearch SELECT PAT.PatientId
,PAT.PatientIdentifier
,PER.PersonNumber
,PER.FirstName
,PER.LastName
,RES_PER.FirstName AS ResFirstName
,RES_PER.LastName AS ResLastName
,ADD_PER.FirstName AS AddFirstName
,ADD_PER.LastName AS AddLastName
,PER.Address
,PER.City
,PER.State
,PER.ZipCode
,PER.Country
,ROW_NUMBER() OVER (ORDER BY PAT.PatientId DESC) AS RowNumber
FROM dbo.Patient AS PAT
INNER JOIN dbo.Person AS PER
ON PAT.PersonId = PER.PersonId
INNER JOIN dbo.Person AS RES_PER
ON PAT.ResponsiblePersonId = RES_PER.PersonId
INNER JOIN dbo.Person AS ADD_PER
ON PAT.AddedBy = ADD_PER.PersonId
INNER JOIN dbo.Booking AS B
ON PAT.PatientId = B.PatientId
WHERE PAT.TenantId = @TenantId AND B.CategoryId = @CategoryId
GROUP BY PAT.PatientId
,PAT.PatientIdentifier
,PER.PersonNumber
,PER.FirstName
,PER.LastName
,RES_PER.FirstName
,RES_PER.LastName
,ADD_PER.FirstName
,ADD_PER.LastName
,PER.Address
,PER.City
,PER.State
,PER.ZipCode
,PER.Country
;
SELECT @unitRows = @@ROWCOUNT
,@unitPages = (@unitRows / @unitItems) + 1;
SELECT *
FROM @PatientSearch AS IT
WHERE RowNumber BETWEEN (@page - 1) * @unitItems + 1 AND @unitItems * @page

Well, unless I am missing something (like duplicate rows?) you should be able to remove the GROUP BY
GROUP BY PAT.PatientId
,PAT.PatientIdentifier
,PER.PersonNumber
,PER.FirstName
,PER.LastName
,RES_PER.FirstName
,RES_PER.LastName
,ADD_PER.FirstName
,ADD_PER.LastName
,PER.Address
,PER.City
,PER.State
,PER.ZipCode
,PER.Country
because you are grouping by every column in the select list, and the row number is ordered by PAT.PatientId.
Further to that, you should create indexes on the tables, containing the columns that you join or filter on.
So, for instance, I would create an index on table Patient with columns (TenantId, PersonId, ResponsiblePersonId, AddedBy) and included columns (PatientId, PatientIdentifier).
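As a rough sketch, the index described above could be created like this. The index names are made up, and the second index is an assumption based only on how the question's query filters and joins Booking, so check the execution plan before keeping either.

```sql
-- Hypothetical index names; the Patient column lists follow the suggestion above.
CREATE NONCLUSTERED INDEX IX_Patient_Tenant_Persons
ON dbo.Patient (TenantId, PersonId, ResponsiblePersonId, AddedBy)
INCLUDE (PatientId, PatientIdentifier);

-- Assumption: Booking is filtered by CategoryId and joined on PatientId,
-- as in the query from the question.
CREATE NONCLUSTERED INDEX IX_Booking_Category_Patient
ON dbo.Booking (CategoryId, PatientId);
```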

Frankly speaking, 200,000 rows is nothing to SQL Server. First remove the redundant logic: you have a primary key, so why still group by so many columns, and do you really need to join the same table (Person) 3 times? After that, you need to create at least some composite or covering (INCLUDE) indexes. Get the execution plan in SSMS (Ctrl+L for the estimated plan, or Ctrl+M to include the actual plan) to see which indexes are missing. If you need further help, please post your table schema with a few rows of sample data.

create the SQL query

I have two tables, DeviceMaster and DeviceStatus.
DeviceMaster is the master table for devices and DeviceStatus holds the status of each device. Now I want to get the latest DeviceStatus record for each device, one row per device, matched on the DeviceMaster Id and sorted with the most recent first (descending order).
eg.
DeviceName RecordCreatedDate Status
ElectronicRod 14/11/2011 12:00:00 On
ElectronicRod 14/11/2011 11:30:00 Off
even though there are multiple records in DeviceStatus.
Here is the table structure:
DeviceMaster
[Id] [int],
[ROId] [int] ,
[ClientId] [int] ,
[DeviceTypeId] [int] ,
[Label] [varchar](50) ,
[ClientCommChannelId] [int] ,
[ServerCommChannelId] [bigint] ,
[DeviceName] [varchar](50) ,
[Address] [varchar](50) ,
[Attribute1] [varchar](50) ,
[Attribute2] [varchar](50) ,
[Attribute3] [varchar](50) ,
[IsDeleted] [bit] ,
[RecordCreatedDate] [datetime] ,
[RecordUpdatedDate] [datetime] ,
[RecordCreatedBy] [int] ,
[RecordUpdatedBy] [int] ,
[IsTransfered] [bit]
DeviceStatus
[Id] [bigint],
[ROId] [int],
[ClientId] [int],
[ServerDeviceId] [bigint] , --It is the foreign key reference of Device Id
[ClientDeviceId] [int] ,
[Status] [bit] ,
[TimeStamp] [datetime] ,
[Attribute1] [varchar](50) ,
[Attribute2] [varchar](50) ,
[Attribute3] [varchar](50) ,
[RecordCreatedDate] [datetime] ,
[RecordUpdatedDate] [datetime] ,
[RecordCreatedBy] [int] ,
[RecordUpdatedBy] [int] ,
[IsTransfered] [bit]
DeviceStatus has multiple rows for a single device. I need the latest DeviceStatus for each and every device.
Thank you in advance

You can use a CTE (Common Table Expression) with the ROW_NUMBER function:
;WITH LastPerDevice AS
(
SELECT
dm.DeviceName, ds.RecordCreatedDate, ds.Status,
ROW_NUMBER() OVER(PARTITION BY dm.Id
ORDER BY ds.RecordCreatedDate DESC) AS RowNum
FROM dbo.DeviceMaster dm
INNER JOIN dbo.DeviceStatus ds ON ds.ServerDeviceId = dm.Id
)
SELECT
DeviceName, RecordCreatedDate, Status
FROM LastPerDevice
WHERE RowNum = 1
This CTE "partitions" your data by device (dm.Id, which DeviceStatus references through ServerDeviceId), and for each partition, the ROW_NUMBER function hands out sequential numbers, starting at 1 and ordered by RecordCreatedDate DESC - so the most recent row gets RowNum = 1 (for each device) which is what I select from the CTE in the SELECT statement after it.

I'm not positive what you're looking for, but the following uses a subquery in the where clause to filter on the last date. Even if this isn't exactly what you're looking for, it may help you on your way to finding it. Let me know if you need me to clarify anything.
select dm.DeviceName, ds.RecordCreatedDate, ds.Status
from DeviceMaster dm join
DeviceStatus ds on dm.Id = ds.ServerDeviceId
where dm.DeviceName = 'ElectronicRod' and
ds.RecordCreatedDate =
(select max(ds1.RecordCreatedDate)
from DeviceMaster dm1 join
DeviceStatus ds1 on dm1.Id = ds1.ServerDeviceId
where dm1.DeviceName = dm.DeviceName)

select dm.DeviceName, ds1.LatestDate, ds2.Status
from DeviceMaster dm,
(select ServerDeviceId, max(RecordCreatedDate) as LatestDate
from DeviceStatus
group by ServerDeviceId) ds1,
DeviceStatus ds2
where dm.Id = ds1.ServerDeviceId
and dm.Id = ds2.ServerDeviceId
and ds1.LatestDate = ds2.RecordCreatedDate
order by ds1.LatestDate desc

You want to get the status from the latest record in the Status table for each Device.
This means you need to join to the Status table twice: once to find the latest row (grouping by the device), and then again to get the status from that latest row. Like this:
select dm.DeviceName, ds.RecordCreatedDate, ds.Status
from
DeviceMaster dm inner join
(select ServerDeviceId, MAX(Id) DeviceStatusId from DeviceStatus
group by ServerDeviceId) laststatus
on laststatus.ServerDeviceId = dm.Id inner join
DeviceStatus ds
on ds.Id = laststatus.DeviceStatusId
I see you have BIGINTs - this suggests to me that the Status table is large and updated very frequently. Make sure the Status table is indexed on ServerDeviceId.
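The index on ServerDeviceId recommended in this answer might look like the sketch below. The index name is made up and the dbo schema is assumed. Adding Id as a second key column is a design choice of this sketch, not something the answer prescribes: it lets the MAX(Id) per ServerDeviceId lookup be served from the index alone.

```sql
-- Hypothetical index name; assumes DeviceStatus lives in the dbo schema.
-- Keyed on the device first, then on Id, to support
-- SELECT ServerDeviceId, MAX(Id) ... GROUP BY ServerDeviceId.
CREATE NONCLUSTERED INDEX IX_DeviceStatus_ServerDeviceId
ON dbo.DeviceStatus (ServerDeviceId, Id);
```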

Use the max() / group by method.
Your code will probably look a bit like this:
SELECT max("RecordCreatedDate") as "LatestDate" , "ServerDeviceId" FROM "DeviceStatus"
group by "ServerDeviceId";
EDIT:
SELECT max("RecordCreatedDate") as "Date" , "DeviceName" FROM "DeviceMaster"
group by "DeviceName";
