Query is very very slow for processing 200000 plus records

I have 200,000 rows in the Patient and Person tables, and the query shown below takes 30 seconds to execute.
I have defined the primary key (and clustered index) in the Person table on PersonId and on PatientId in the Patient table. What else can I do here to improve performance of my procedure?
I'm new to database development and know only basic SQL. I'm also not sure whether SQL Server can handle 200,000 rows quickly.
The whole dynamic procedure is at https://github.com/Padayappa/SQLProblem/blob/master/Performance
Has anyone dealt with this many rows? How do I improve performance here?
DECLARE @return_value int,
        @unitRows bigint,
        @unitPages int,
        @TenantId int,
        @unitItems int,
        @page int
SET @TenantId = 1
SET @unitItems = 20
SET @page = 1
DECLARE @PatientSearch TABLE(
    [PatientId] [bigint] NOT NULL,
    [PatientIdentifier] [nvarchar](50) NULL,
    [PersonNumber] [nvarchar](20) NULL,
    [FirstName] [nvarchar](100) NOT NULL,
    [LastName] [nvarchar](100) NOT NULL,
    [ResFirstName] [nvarchar](100) NOT NULL,
    [ResLastName] [nvarchar](100) NOT NULL,
    [AddFirstName] [nvarchar](100) NOT NULL,
    [AddLastName] [nvarchar](100) NOT NULL,
    [Address] [nvarchar](255) NULL,
    [City] [nvarchar](50) NULL,
    [State] [nvarchar](50) NULL,
    [ZipCode] [nvarchar](20) NULL,
    [Country] [nvarchar](50) NULL,
    [RowNumber] [bigint] NULL
)
INSERT INTO @PatientSearch SELECT PAT.PatientId
    ,PAT.PatientIdentifier
    ,PER.PersonNumber
    ,PER.FirstName
    ,PER.LastName
    ,RES_PER.FirstName AS ResFirstName
    ,RES_PER.LastName AS ResLastName
    ,ADD_PER.FirstName AS AddFirstName
    ,ADD_PER.LastName AS AddLastName
    ,PER.Address
    ,PER.City
    ,PER.State
    ,PER.ZipCode
    ,PER.Country
    ,ROW_NUMBER() OVER (ORDER BY PAT.PatientId DESC) AS RowNumber
FROM dbo.Patient AS PAT
INNER JOIN dbo.Person AS PER
    ON PAT.PersonId = PER.PersonId
INNER JOIN dbo.Person AS RES_PER
    ON PAT.ResponsiblePersonId = RES_PER.PersonId
INNER JOIN dbo.Person AS ADD_PER
    ON PAT.AddedBy = ADD_PER.PersonId
INNER JOIN dbo.Booking AS B
    ON PAT.PatientId = B.PatientId
-- @CategoryId is not declared in this excerpt; it comes from the full procedure
WHERE PAT.TenantId = @TenantId AND B.CategoryId = @CategoryId
GROUP BY PAT.PatientId
    ,PAT.PatientIdentifier
    ,PER.PersonNumber
    ,PER.FirstName
    ,PER.LastName
    ,RES_PER.FirstName
    ,RES_PER.LastName
    ,ADD_PER.FirstName
    ,ADD_PER.LastName
    ,PER.Address
    ,PER.City
    ,PER.State
    ,PER.ZipCode
    ,PER.Country
;
SELECT @unitRows = @@ROWCOUNT
    ,@unitPages = (@unitRows / @unitItems) + 1;
SELECT *
FROM @PatientSearch AS IT
WHERE RowNumber BETWEEN (@page - 1) * @unitItems + 1 AND @unitItems * @page

Well, unless I am missing something (like duplicate rows?) you should be able to remove the GROUP BY
GROUP BY PAT.PatientId
,PAT.PatientIdentifier
,PER.PersonNumber
,PER.FirstName
,PER.LastName
,RES_PER.FirstName
,RES_PER.LastName
,ADD_PER.FirstName
,ADD_PER.LastName
,PER.Address
,PER.City
,PER.State
,PER.ZipCode
,PER.Country
as you are grouping by every column in the select list, and the ROW_NUMBER() is ordered by PAT.PatientId, which is the primary key.
Further to that, you should create indexes on the tables that contain the columns you join and filter on.
So, for instance, I would create an index on the Patient table on columns (TenantId, PersonId, ResponsiblePersonId, AddedBy) with included columns (PatientId, PatientIdentifier).
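One caveat on removing the GROUP BY: if a patient can have several bookings in the same category, the join to Booking returns that patient once per booking, and the GROUP BY is what collapses those duplicates. A possible alternative is to filter with EXISTS instead of joining. The sketch below is only an illustration: it uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, the tables are trimmed down, and the sample rows and the index name IX_Booking_Category_Patient are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Patient (PatientId INTEGER PRIMARY KEY, TenantId INTEGER, PersonId INTEGER);
    CREATE TABLE Booking (BookingId INTEGER PRIMARY KEY, PatientId INTEGER, CategoryId INTEGER);
    -- Hypothetical index on the filter column and the join column.
    CREATE INDEX IX_Booking_Category_Patient ON Booking (CategoryId, PatientId);
    INSERT INTO Patient VALUES (1, 1, 10), (2, 1, 11), (3, 2, 12);
    -- Patient 1 has two bookings in category 5; patients 2 and 3 have one each.
    INSERT INTO Booking VALUES (1, 1, 5), (2, 1, 5), (3, 2, 5), (4, 3, 5);
""")

# Plain join: a patient appears once per matching booking.
joined = con.execute("""
    SELECT PAT.PatientId
    FROM Patient AS PAT
    INNER JOIN Booking AS B ON B.PatientId = PAT.PatientId
    WHERE PAT.TenantId = 1 AND B.CategoryId = 5
    ORDER BY PAT.PatientId
""").fetchall()

# EXISTS: a patient appears once, however many bookings match.
with_exists = con.execute("""
    SELECT PAT.PatientId
    FROM Patient AS PAT
    WHERE PAT.TenantId = 1
      AND EXISTS (SELECT 1 FROM Booking AS B
                  WHERE B.PatientId = PAT.PatientId AND B.CategoryId = 5)
    ORDER BY PAT.PatientId
""").fetchall()

print(joined)       # [(1,), (1,), (2,)]
print(with_exists)  # [(1,), (2,)]
```

With EXISTS there is nothing left to de-duplicate, so the GROUP BY can go without changing the result. Whether this is faster on the real tables is something the execution plan has to confirm.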

Frankly speaking, 200,000 rows is nothing to SQL Server. First remove the logic redundancy: since you have a primary key, why still group by so many columns, and why do you need to join the same table (Person) three times? After that, you need to create at least some composite or covering (INCLUDE) indexes. Get the execution plan in SSMS (CTRL+L for the estimated plan, or CTRL+M to include the actual plan) to see which indexes you are missing. If you need further help, please paste your table schema with a few rows of sample data.

Related

Repeat data issues SQL

I had a quick browse to see whether any previous questions covered my issue, but couldn't find any.
Basically I'm doing this database for my online Cert IV course, and if I weren't completely stuck (as I have been for the past few months) I wouldn't be asking for major help on this.
I've got an Antiques database that is supposed to show the Customer Name, Sales Date, Product Name and Sales Price, list only the items that were sold between two dates, and order them by those dates. Whatever I try, I still get repeated data.
I've got 4 tables for this particular query: Customers, Sales, Products and ProductSales. The tables are set up like this:
CREATE TABLE [dbo].[Customers](
[CustID] [int] IDENTITY(1,1) NOT NULL,
[firstName] [varchar](50) NOT NULL,
[lastName] [varchar](50) NOT NULL,
CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED
CREATE TABLE [dbo].[Sales](
[SalesNo] [int] IDENTITY(1,1) NOT NULL,
[CustID] [int] NOT NULL,
[salesDate] [date] NOT NULL,
CONSTRAINT [PK_Sales] PRIMARY KEY CLUSTERED
CREATE TABLE [dbo].[Products](
[ProductID] [int] IDENTITY(1,1) NOT NULL,
[prodName] [varchar](50) NOT NULL,
[prodYear] [int] NOT NULL,
[prodType] [varchar](50) NOT NULL,
[salesPrice] [money] NOT NULL,
CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED
CREATE TABLE [dbo].[ProductSales](
[ProductID] [int] NOT NULL,
[SalesNo] [int] NOT NULL
My query looks like this
SELECT (Customers.firstName + ' ' + Customers.lastName) AS Customers_Name,
Sales.salesDate, Products.prodName, Sales.salesPrice
FROM Customers, ProductSales JOIN Products ON ProductSales.ProductID = Products.ProductID
JOIN Sales ON ProductSales.SalesNo = Sales.SalesNo
WHERE Sales.salesDate BETWEEN '2016-06-03' AND '2016-06-06'
ORDER BY Sales.salesDate
This is what shows up when I run this query:
Any help would be appreciated.

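Try the query below. You need to join the Customers table explicitly: listing it with a comma and no join condition pairs every customer with every sale, which is what produces the repeated rows.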
SELECT (Customers.firstName + ' ' + Customers.lastName) AS Customers_Name,
Sales.salesDate, Products.prodName, Sales.salesPrice
FROM ProductSales JOIN Products ON ProductSales.ProductID = Products.ProductID
JOIN Sales ON ProductSales.SalesNo = Sales.SalesNo
JOIN Customers on Sales.[CustID]=Customers.[CustID]
WHERE Sales.salesDate BETWEEN '2016-06-03' AND '2016-06-06'
ORDER BY Sales.salesDate
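To see why the comma caused the repeats, here is a small sketch. It uses SQLite through Python's sqlite3 module rather than SQL Server, cut-down versions of the Customers and Sales tables, and invented sample rows, so treat it as an illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customers (CustID INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
    CREATE TABLE Sales (SalesNo INTEGER PRIMARY KEY, CustID INTEGER, salesDate TEXT);
    -- Made-up sample data: three customers, two sales.
    INSERT INTO Customers VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO Sales VALUES (1, 1, '2016-06-03'), (2, 2, '2016-06-05');
""")

# Comma with no join condition: every customer is paired with every sale.
comma_rows = con.execute("SELECT COUNT(*) FROM Customers, Sales").fetchone()[0]

# Explicit join on CustID: each sale is paired only with its own customer.
joined_rows = con.execute(
    "SELECT COUNT(*) FROM Sales JOIN Customers ON Sales.CustID = Customers.CustID"
).fetchone()[0]

print(comma_rows, joined_rows)  # 6 2
```

Three customers times two sales gives six rows from the comma version, against two rows once the join condition is in place.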

COALESCE vs OR condition for JOIN (SQL)

I have Event table
TABLE Event(
EventId [int] IDENTITY(1,1) NOT NULL,
EventSource1Id [int] NULL,
EventSource2Id [int] NULL
)
that contains info about events from different sources
where one of the event sources can be null
TABLE EventSource1(
Id [int] IDENTITY(1,1) NOT NULL,
Name [nvarchar](50) NULL,
VenueId [int] NOT NULL
)
and
TABLE EventSource2(
Id [int] IDENTITY(1,1) NOT NULL,
Name [nvarchar](50) NULL,
VenueId [int] NOT NULL
)
TABLE Venue(
Id [int] IDENTITY(1,1) NOT NULL,
TimeZone [nvarchar](100) NOT NULL
)
I'd like to create a view, but I'm not sure which is better for the JOIN: COALESCE or an OR condition.
First option:
SELECT
ev.[EventId] AS 'Id',
ven.[Id] AS 'VenueId'
FROM Event ev
LEFT JOIN EventSource1 source1 ON source1.[Id] = ev.EventSource1Id
LEFT JOIN EventSource2 source2 ON source2.[Id] = ev.EventSource2Id
LEFT JOIN Venue AS ven ON ven.[Id] = source1.[VenueId] OR ven.[Id] = source2.[VenueId]
Second option:
SELECT
ev.[EventId] AS 'Id',
ven.[Id] AS 'VenueId'
FROM Event ev
LEFT JOIN EventSource1 source1 ON source1.[Id] = ev.EventSource1Id
LEFT JOIN EventSource2 source2 ON source2.[Id] = ev.EventSource2Id
LEFT JOIN Venue AS ven ON ven.[Id] = COALESCE(source1.[VenueId], source2.[VenueId])
Could you help me please?

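The COALESCE will typically yield a better query plan: it leaves a single equality predicate, which the optimizer can satisfy with a hash or merge join, whereas an OR in the join condition usually limits it to nested loops. You should test with your data.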
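Before comparing plans it is worth checking that the two join conditions return the same rows for your data. A rough sketch of such a check follows. It runs on SQLite through Python's sqlite3 module rather than SQL Server, so it says nothing about SQL Server's plans, and the sample rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Venue (Id INTEGER PRIMARY KEY, TimeZone TEXT NOT NULL);
    CREATE TABLE EventSource1 (Id INTEGER PRIMARY KEY, Name TEXT, VenueId INTEGER NOT NULL);
    CREATE TABLE EventSource2 (Id INTEGER PRIMARY KEY, Name TEXT, VenueId INTEGER NOT NULL);
    CREATE TABLE Event (EventId INTEGER PRIMARY KEY, EventSource1Id INTEGER, EventSource2Id INTEGER);
    -- Made-up sample data: each event has at most one source.
    INSERT INTO Venue VALUES (1, 'UTC'), (2, 'Europe/Paris');
    INSERT INTO EventSource1 VALUES (1, 'a', 1);
    INSERT INTO EventSource2 VALUES (1, 'b', 2);
    INSERT INTO Event VALUES (1, 1, NULL), (2, NULL, 1), (3, NULL, NULL);
""")

# Same query twice; only the Venue join condition differs.
base = """
    SELECT ev.EventId, ven.Id
    FROM Event AS ev
    LEFT JOIN EventSource1 AS s1 ON s1.Id = ev.EventSource1Id
    LEFT JOIN EventSource2 AS s2 ON s2.Id = ev.EventSource2Id
    LEFT JOIN Venue AS ven ON {condition}
    ORDER BY ev.EventId
"""
or_rows = con.execute(
    base.format(condition="ven.Id = s1.VenueId OR ven.Id = s2.VenueId")
).fetchall()
coalesce_rows = con.execute(
    base.format(condition="ven.Id = COALESCE(s1.VenueId, s2.VenueId)")
).fetchall()

print(or_rows)        # [(1, 1), (2, 2), (3, None)]
print(coalesce_rows)  # [(1, 1), (2, 2), (3, None)]
```

The two forms agree here because every event has at most one source. If an event could have both sources pointing at different venues, the OR version would return two rows for it and the COALESCE version only the first source's venue, so the choice is not purely about performance.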

update trigger to update records in another table

I have one User_Table
CREATE TABLE [dbo].[User_TB]
(
[User_Id] [varchar](15) NOT NULL,
[User_FullName] [varchar](50) NULL,
[User_Address] [varchar](150) NULL,
[User_Gender] [varchar](10) NULL,
[User_Joindate] [varchar](50) NULL,
[User_Email] [varchar](50) NULL,
[User_Branch] [varchar](50) NULL,
[User_TeamLeader] [varchar](50) NULL,
[User_Department] [varchar](50) NULL,
[User_Position] [varchar](50) NULL,
[TID] [int] NULL
)
Break_Table
CREATE TABLE [dbo].[Break_TB]
(
[Break_Id] [int] IDENTITY(1,1) NOT NULL,
[User_Id] [varchar](15) NOT NULL,
[Date] [date] NULL,
[Break_Time] [int] NULL,
[Status] [varchar](50) NULL,
[Late_time] [int] NULL,
[TL_Id] [varchar](15) NULL,
[start_Time] [time](7) NULL,
[end_Time] [time](7) NULL,
)
Log_Table
CREATE TABLE [dbo].[Log_TB]
(
[User_Id] [varchar](50) NOT NULL,
[First_Login] [time](0) NULL,
[Logout] [time](0) NULL,
[Date] [date] NULL,
[Working_Hrs] [time](0) NULL,
)
Now, whenever the User_Id in User_Table is updated, I want to update the User_Id in the other two tables as well.
I have written a trigger for that:
ALTER TRIGGER [dbo].[updateUserId] ON [dbo].[User_TB]
FOR UPDATE
AS
DECLARE @Branch_Name varchar(50),
        @User_Id varchar(15)
SELECT @User_Id = i.User_Id FROM inserted i;
UPDATE Break_TB SET User_Id = @User_Id WHERE User_Id = @User_Id;
UPDATE Log_TB SET User_Id = @User_Id WHERE User_Id = @User_Id;
But it only updates records in Break_TB; it does not work for Log_TB.
I'm not very good at triggers, so if I'm wrong please help me.

You would need something like this - a set-based solution that takes into account that in an UPDATE statement, you might be updating multiple rows at once, and therefore your trigger also must deal with multiple rows in the Inserted and Deleted tables.
CREATE TRIGGER [dbo].[updateUserId]
ON [dbo].[User_TB]
FOR UPDATE
AS
    -- update the "Break" table - find the rows based on the *old* User_Id
    -- from the "Deleted" pseudo table, and set it to the *new* User_Id
    -- from the "Inserted" pseudo table
    UPDATE Break_TB
    SET User_Id = i.User_Id
    FROM Inserted i
    INNER JOIN Deleted d ON i.TID = d.TID
    WHERE
        Break_TB.User_Id = d.User_Id
    -- update the "Log" table - find the rows based on the *old* User_Id
    -- from the "Deleted" pseudo table, and set it to the *new* User_Id
    -- from the "Inserted" pseudo table
    UPDATE Log_TB
    SET User_Id = i.User_Id
    FROM Inserted i
    INNER JOIN Deleted d ON i.TID = d.TID
    WHERE
        Log_TB.User_Id = d.User_Id
This code assumes that the TID column in the User_TB table is the primary key which remains the same during updates (so that I can join together the "old" values from the Deleted pseudo table with the "new" values after the update, stored in the Inserted pseudo table)

SQL loop executes but new old values are over written

As my question title says, my program loops but all of the values I updated are being overwritten. Here's the code, posted below. Say minRownum is 1 and max is 12: I see the loop execute 12 times correctly and min gets incremented by 1 each time. But in the end result, only the final row, whose RowNum is 12, has any value.
I'm not exactly sure why overwriting is occurring since I'm saying "Update it where the rownumber = minrownumber" then I increment minrownum.
Can anyone point to what I am doing wrong? Thanks
WHILE (@MinRownum <= @MaxRownum)
BEGIN
    print ' here'
    UPDATE #usp_sec
    set amount=(
        SELECT sum(amount) as amount
        FROM dbo.coverage
        inner join dbo.owner
        on coverage.api=owner.api
        where RowNum=@MinRownum
    );
    SET @MinRownum = @MinRownum + 1
END
PS: I edited this line to say (below) and now every value has the same wrong number (it's not distinct but duplicated to all rows).
set amount = (SELECT sum(amount) as amount
    FROM dbo.coverage
    INNER JOIN dbo.owner ON coverage.api = owner.api
    where RowNum=@MinRownum
) WHERE RowNum = @MinRownum;
Tables:
CREATE TABLE dbo.#usp_sec
(
RowNum int,
amount numeric(20,2),
discount numeric(3,2)
)
CREATE TABLE [dbo].[handler](
[recordid] [int] IDENTITY(1,1) NOT NULL,
[covid] [varchar](25) NULL,
[ownerid] [char](10) NULL
)
CREATE TABLE [dbo].[coverage](
[covid] [varchar](25) NULL,
[api] [char](12) NULL,
[owncovid] [numeric](12, 0) NULL,
[amount] [numeric](14, 2) NULL,
[knote] [char](10) NULL
)
CREATE TABLE [dbo].[owner](
[api] [char](12) NOT NULL,
[owncovid] [numeric](12, 0) NULL,
[ownerid] [char](10) NOT NULL,
[officer] [char](20) NOT NULL,
[appldate] [date] NOT NULL
)

Your UPDATE statement needs its own WHERE clause. Otherwise, each UPDATE will update every row in the table.
And the way you have this written, your subquery still needs its WHERE clause too. In fact, you need to definitively correlate the subquery to your table's (#usp_sec) rows. We cannot tell you how that should be done without more information such as your table definitions.
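Going by the table definitions in the question, neither coverage nor owner has a RowNum column, so the RowNum inside the subquery refers to the outer #usp_sec row. The sum itself is never restricted per row, which is why every row ends up with the same total. A rough sketch of the difference between an uncorrelated and a correlated subquery is below. It uses SQLite through Python's sqlite3 module rather than SQL Server, and the src table with its RowNum link is purely hypothetical, since the question does not show how coverage and owner rows relate to RowNum.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE usp_sec (RowNum INTEGER, amount NUMERIC);
    -- Hypothetical source table; the real link to RowNum is not shown in the question.
    CREATE TABLE src (RowNum INTEGER, amount NUMERIC);
    INSERT INTO usp_sec (RowNum) VALUES (1), (2), (3);
    INSERT INTO src VALUES (1, 10), (1, 5), (2, 7), (3, 1), (3, 2);
""")

# Uncorrelated subquery: every row receives the same grand total.
con.execute("UPDATE usp_sec SET amount = (SELECT SUM(amount) FROM src)")
uncorrelated = con.execute(
    "SELECT RowNum, amount FROM usp_sec ORDER BY RowNum"
).fetchall()

# Correlated subquery: each row receives the sum of its own matching rows,
# in one set-based statement with no loop.
con.execute("""
    UPDATE usp_sec
    SET amount = (SELECT SUM(s.amount) FROM src AS s WHERE s.RowNum = usp_sec.RowNum)
""")
correlated = con.execute(
    "SELECT RowNum, amount FROM usp_sec ORDER BY RowNum"
).fetchall()

print(uncorrelated)  # [(1, 25), (2, 25), (3, 25)]
print(correlated)    # [(1, 15), (2, 7), (3, 3)]
```

Once the subquery is tied to the row being updated, the WHILE loop is no longer needed at all.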

what's the right way of joining two tables, group by a column, and select only one row for each record?

I have a crews table
CREATE TABLE crew(crew_id INT, crew_name nvarchar(20), )
And a time log table, which is just a very long list of actions performed by the crew
CREATE TABLE [dbo].[TimeLog](
[time_log_id] [int] IDENTITY(1,1) NOT NULL,
[experiment_id] [int] NOT NULL,
[crew_id] [int] NOT NULL,
[starting] [bit] NULL,
[ending] [bit] NULL,
[exception] [nchar](10) NULL,
[sim_time] [time](7) NULL,
[duration] [int] NULL,
[real_time] [datetime] NOT NULL )
I want to have a view that shows only one row for each crew with the latest sim_time + duration .
Is a view the way to go? If yes, how do I write it? If not, what's the best way of doing this?
Thanks

Here is a query to select what you want:
select * from (
    select
        c.crew_id, c.crew_name, l.sim_time, l.duration,
        row_number() over (partition by c.crew_id order by l.sim_time desc) as rNum
    from crew as c
    inner join TimeLog as l on c.crew_id = l.crew_id
) as t
where rNum = 1

It depends on what you need that data for.
Anyway, a simple query to find the latest sim_time would be something like
select C.*, TL.sim_time
from crew C /*left? right? inner?*/ join TimeLog TL on TL.crew_id = C.crew_id
where TL.sim_time in (select max(timelog_subquery.sim_time) from TimeLog timelog_subquery where timelog_subquery.crew_id = C.crew_id )
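On the "is a view the way to go" part: a query like the one above can be wrapped in a view. The sketch below is only an illustration under some assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server, sim_time is stored as zero-padded text, the sample rows are invented, and the view name LatestCrewLog is made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE crew (crew_id INTEGER, crew_name TEXT);
    CREATE TABLE TimeLog (time_log_id INTEGER PRIMARY KEY, crew_id INTEGER NOT NULL,
                          sim_time TEXT, duration INTEGER);
    -- Made-up sample data.
    INSERT INTO crew VALUES (1, 'alpha'), (2, 'bravo');
    INSERT INTO TimeLog (crew_id, sim_time, duration) VALUES
        (1, '08:00:00', 30), (1, '09:30:00', 15), (2, '07:45:00', 60);
    -- Hypothetical view: one row per crew, taken from its latest sim_time.
    CREATE VIEW LatestCrewLog AS
        SELECT C.crew_id, C.crew_name, TL.sim_time, TL.duration
        FROM crew AS C
        INNER JOIN TimeLog AS TL ON TL.crew_id = C.crew_id
        WHERE TL.sim_time = (SELECT MAX(T2.sim_time) FROM TimeLog AS T2
                             WHERE T2.crew_id = C.crew_id);
""")

latest = con.execute("SELECT * FROM LatestCrewLog ORDER BY crew_id").fetchall()
print(latest)  # [(1, 'alpha', '09:30:00', 15), (2, 'bravo', '07:45:00', 60)]
```

Note that if two log rows for the same crew share the latest sim_time, this form returns both of them, whereas the row_number() version always returns exactly one row per crew.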
