Optimize aggregating query

I have a view that has suddenly become too slow, and I'm at a loss as to how to optimize it. The tables currently contain 15000 (#dispatchPallet) and 135000 (#pickLog) rows respectively.
I've written a minimized piece of code to show the important parts below.
CREATE TABLE #dispatchPallet
(
[PICK_PALL_NUM] [bigint] NOT NULL,
[PALLET_PLACEMENT] [nvarchar](4) NOT NULL,
[SHIPMENT_ID] [nvarchar](255) NULL
)
CREATE TABLE #pickLog
(
[LINE_NUM] [int] NOT NULL,
[QTY_PRE] [numeric](9, 2) NULL,
[QTY_SUF] [numeric](9, 2) NULL,
[PICK_PALL_NUM] [bigint] NULL,
[ROWID] [uniqueidentifier] NOT NULL,
[WEIGHT_GROSS] [numeric](9, 3) NULL,
[VOLUME] [numeric](9, 3) NULL
)
INSERT INTO #dispatchPallet ([PICK_PALL_NUM], [PALLET_PLACEMENT], [SHIPMENT_ID])
VALUES
(4797753, 'B', 'SHIPMENT-1'),
(4797752, 'B', 'SHIPMENT-2'),
(4797750, 'B', 'SHIPMENT-3'),
(4797749, 'B', 'SHIPMENT-4'),
(4797739, 'B', 'SHIPMENT-5'),
(4797732, 'B', 'SHIPMENT-6'),
(4797731, 'B', 'SHIPMENT-7'),
(4797730, 'B', 'SHIPMENT-7'),
(4797723, 'B', 'SHIPMENT-8'),
(4797713, 'B', 'SHIPMENT-9')
INSERT INTO #pickLog ([LINE_NUM], [QTY_PRE], [QTY_SUF], [PICK_PALL_NUM], [ROWID], [WEIGHT_GROSS])
VALUES
(30, 54, 54, 4797753, NEWID(), 1070.280),
(10, 24, 24, 4797752, NEWID(), 471.360),
(30, 12, 12, 4797750, NEWID(), 237.960),
(320, 25, 25, 4797749, NEWID(), 102.750),
(110, 3, 3, 4797739, NEWID(), 40.650),
(40, 12, 12, 4797732, NEWID(), 238.080),
(50, 4, 4, 4797732, NEWID(), 78.560),
(20, 20, 20, 4797731, NEWID(), 110.000),
(20, 40, 40, 4797730, NEWID(), 220.000),
(1340, 3, 3, 4797723, NEWID(), 14.250),
(410, 2, 2, 4797723, NEWID(), 4.780),
(440, 2, 2, 4797723, NEWID(), 21.000),
(480, 1, 1, 4797723, NEWID(), 3.500),
(1290, 2, 2, 4797723, NEWID(), 39.280),
(470, 1, 1, 4797723, NEWID(), 8.500),
(280, 3, 3, 4797723, NEWID(), 16.500),
(10, 2, 2, 4797723, NEWID(), 10.700),
(500, 2, 2, 4797723, NEWID(), 6.600),
(290, 1, 1, 4797713, NEWID(), 0.540),
(40, 2, 2, 4797713, NEWID(), 33.800)
SELECT
[dispatchPallet].[SHIPMENT_ID],
SUM([pickLog].[QTY_SUF]) AS KOLLI,
COUNT(DISTINCT [pickLog].[LINE_NUM]) AS LINES,
SUM([pickLog].[WEIGHT_GROSS]) AS PICKED_WEIGHT,
COUNT(DISTINCT [pickLog].[PICK_PALL_NUM]) AS PALLETS,
COUNT(DISTINCT CASE WHEN [dispatchPallet].[PALLET_PLACEMENT] = 'B' THEN [dispatchPallet].[PICK_PALL_NUM] ELSE NULL END) AS BOTTOM_PALLETS
FROM
#dispatchPallet dispatchPallet
INNER JOIN #pickLog pickLog ON [dispatchPallet].[PICK_PALL_NUM] = [pickLog].[PICK_PALL_NUM]
GROUP BY
[dispatchPallet].[SHIPMENT_ID]
-- Expected output:
-- SHIPMENT_ID KOLLI LINES PICKED_WEIGHT PALLETS BOTTOM_PALLETS
-- SHIPMENT-1 54.00 1 1070.280 1 1
-- SHIPMENT-2 24.00 1 471.360 1 1
-- SHIPMENT-3 12.00 1 237.960 1 1
-- SHIPMENT-4 25.00 1 102.750 1 1
-- SHIPMENT-5 3.00 1 40.650 1 1
-- SHIPMENT-6 16.00 2 316.640 1 1
-- SHIPMENT-7 60.00 1 330.000 2 2
-- SHIPMENT-8 18.00 9 125.110 1 1
-- SHIPMENT-9 3.00 2 34.340 1 1

You should at least create a primary key constraint on the real dispatchPallet table:
ALTER TABLE dispatchPallet ADD PRIMARY KEY (PICK_PALL_NUM);
and a foreign key constraint on pickLog:
ALTER TABLE pickLog ADD FOREIGN KEY (PICK_PALL_NUM) REFERENCES dispatchPallet (PICK_PALL_NUM);
Also create an index on PALLET_PLACEMENT. It cannot be UNIQUE, because the column repeats values (every sample row is 'B'), and it needs no filter, because the column is declared NOT NULL:
CREATE NONCLUSTERED INDEX idx_PALLET_PLACEMENT
ON dispatchPallet (PALLET_PLACEMENT);

Your query is simple and there isn't much room to optimize. You should check that you at least have indexes on dispatchPallet by SHIPMENT_ID and on pickLog by PICK_PALL_NUM. These would be the best choices for your query:
CREATE NONCLUSTERED INDEX NCI_dispatchPallet_shipment_ID
ON dispatchPallet (SHIPMENT_ID, PICK_PALL_NUM)
INCLUDE (PALLET_PLACEMENT)
CREATE NONCLUSTERED INDEX NCI_pickLog_pick_pall_num
ON pickLog (PICK_PALL_NUM)
INCLUDE (QTY_SUF, LINE_NUM, WEIGHT_GROSS)
You should also validate if you need your COUNT to be DISTINCT or not (distinct is an expensive operation).
Last but not least, you should really check how you access the view; if you are filtering it, joining it, etc. These other conditions might generate different query plans and make your performance go down if not managed correctly (even with the right indexes!).
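As a rough illustration of the DISTINCT point, and only under the assumption that PICK_PALL_NUM is unique in dispatchPallet (the question does not confirm this), the two pallet counts can be computed without DISTINCT by collapsing the pick log to one row per pallet before joining. This sketch runs against the #dispatchPallet and #pickLog sample tables from the question and deliberately leaves out LINES, which still needs COUNT(DISTINCT ...) because the same LINE_NUM can appear on several pallets of one shipment:

```sql
-- Sketch only: assumes one row per PICK_PALL_NUM in #dispatchPallet.
SELECT
    d.SHIPMENT_ID,
    SUM(p.KOLLI)         AS KOLLI,
    SUM(p.PICKED_WEIGHT) AS PICKED_WEIGHT,
    -- After pre-aggregation the join yields one row per pallet,
    -- so a plain COUNT(*) replaces COUNT(DISTINCT PICK_PALL_NUM).
    COUNT(*)             AS PALLETS,
    SUM(CASE WHEN d.PALLET_PLACEMENT = 'B' THEN 1 ELSE 0 END) AS BOTTOM_PALLETS
FROM #dispatchPallet AS d
INNER JOIN (
    -- One row per pallet with its summed quantity and weight.
    SELECT
        PICK_PALL_NUM,
        SUM(QTY_SUF)      AS KOLLI,
        SUM(WEIGHT_GROSS) AS PICKED_WEIGHT
    FROM #pickLog
    GROUP BY PICK_PALL_NUM
) AS p
    ON p.PICK_PALL_NUM = d.PICK_PALL_NUM
GROUP BY d.SHIPMENT_ID;
```

Whether this is actually faster depends on the data and the indexes, so compare the execution plans of both versions before changing the view.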

For starters, there should be primary keys and foreign keys on these tables so that this query can use index seeks/scans (paparazzo's comment above) instead of full table scans.
In addition to the bigint/int columns, what's the purpose of the uniqueidentifier?

Related

How to exclude data if any record is not matching in any of the column in the another Table

I have Table A and Table B
CREATE TABLE [dbo].[TableA]
(
[ID] Int NULL,
[sk] bigint NULL,
[class] int NULL,
[Values] int NULL,
) ON [PRIMARY]
GO
INSERT INTO [dbo].[TableA] ([ID], [sk], [class], [Values])
VALUES (1, 17734, 5, 66443), (2, 17734, 4, 5456),
(3, 17734, 6, 445645), (4, 17734, 7, 4534),
(5, 16601, 4, 5443), (6, 16601, 7, 453434),
(7, 16601, 8, 76645), (8, 16601, 5, 9875)
CREATE TABLE [dbo].[TableB]
(
[ID] Int NULL,
[sk] bigint NULL,
[class] int NULL,
[Values] int NULL,
) ON [PRIMARY]
GO
INSERT INTO [dbo].[TableB] ([ID], [sk], [class], [Values])
VALUES (1, 17734, 5, 66443), (2, 17734, 4, 5456),
(3, 17734, 6, 445645), (4, 17734, 7, 4534),
(5, 16601, 4, 5443), (6, 16601, 7, 453434),
(7, 16601, 8, 76645), (8, 16601, 5, 9875)
I want to join the two tables on all of their columns. If any record for an sk does not match, all rows for that sk must be removed.
For 17734, all the columns match in both tables, so I need to get the values for 17734.
For 16601, one value does not match, so I don't want to return any of the values for 16601.
SELECT DISTINCT
DC.[sk],
DC.class,
DC.[Values],
DB.class AS DCC,
DB.[Values] AS DBC
FROM
[dbo].[TableA] DC
LEFT JOIN
[dbo].[TableA] DB ON DC.[sk] = DB.[sk]
AND DC.class = DB.class
AND DC.[Values] = DB.[Values];
After joining, I get:
sk class Values class values
--------------------------------
16601 3 65567 NULL NULL
16601 4 5443 4 5443
16601 7 453434 7 453434
16601 8 76645 8 76645
17734 4 5456 4 5456
17734 5 66443 5 66443
17734 6 445645 6 445645
17734 7 4534 7 4534
Output :
sk class Values class values
--------------------------------
17734 4 5456 4 5456
17734 5 66443 5 66443
17734 6 445645 6 445645
17734 7 4534 7 4534

You must do the whole join to discover if some of the results should be removed. To reuse the query results, use a CTE and refer to it twice - once for the results and again to filter out the unwanted rows:
with table1 as (
select distinct
DC.sk,
DC.class,
DC.[Values], -- Values is a reserved word, so it must be bracketed
DB.class AS DCC,
DB.[Values] AS DBC
from dbo.TableA DC -- the alias DC was missing
left join dbo.TableB DB on DC.sk = DB.sk
and DC.class = DB.class
and DC.[Values] = DB.[Values]
)
select *
from table1
where sk not in (select sk from table1 where DCC is null)
BTW, you have a bug in your query: You're joining [dbo].[TableA] with itself, but it should be joined to [dbo].[TableB]

You can use the EXCEPT operator to get your expected result. I will explain my query step by step.
Step 1: Find the non-matching rows using the EXCEPT operator:
SELECT A.sk, A.class, A.[Values] FROM TableA A
EXCEPT
SELECT B.sk, B.class, B.[Values] FROM TableB B
Step 2: Select the sk of the non-matching rows:
SELECT T.sk FROM (query of step 1) T
Step 3: Exclude those sk values from TableA:
SELECT * FROM TableA
WHERE sk NOT IN (query of step 2)
So the final query looks like this:
SELECT * FROM TableA
WHERE sk NOT IN
(
SELECT T.sk FROM
(
SELECT A.sk, A.class, A.[Values] FROM TableA A
EXCEPT
SELECT B.sk, B.class, B.[Values] FROM TableB B
)T
)
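One caveat with NOT IN: sk is declared as a nullable bigint, and if the subquery ever returns a NULL sk, NOT IN yields no rows at all. A minimal sketch of the same EXCEPT idea written with NOT EXISTS, which does not have that behavior, could look like this (the aliases A2 and T are just illustrative names):

```sql
-- Same filter as above, expressed with NOT EXISTS instead of NOT IN.
SELECT A.*
FROM dbo.TableA AS A
WHERE NOT EXISTS (
    SELECT 1
    FROM (
        -- Rows of TableA that have no identical row in TableB.
        SELECT A2.sk, A2.class, A2.[Values] FROM dbo.TableA AS A2
        EXCEPT
        SELECT B.sk, B.class, B.[Values] FROM dbo.TableB AS B
    ) AS T
    WHERE T.sk = A.sk
);
```

Note that rows of TableA whose sk is NULL are returned by this version, since NULL never compares equal to anything; add an explicit filter if that is not wanted.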

Insert statement conflicted with foreign key

I'm new to SQL and I'm trying to insert values into my table. I'm currently using SQL Server Management Studio.
CREATE TABLE Materials
(
materials_ID int NOT NULL PRIMARY KEY,
floor_boards int NOT NULL,
power_Points int NOT NULL,
electrical_Wiring int NOT NULL,
stairs_Pack int NOT NULL,
);
SELECT * FROM materials;
-- Creation of the JobCards Table
CREATE TABLE jobCards
(
customer_id VARCHAR(50)
FOREIGN KEY REFERENCES Customers(customer_id),
jobCardID int NOT NULL PRIMARY KEY,
materials_ID int
FOREIGN KEY REFERENCES Materials(materials_ID),
jobType VARCHAR(150) NOT NULL,
rate decimal NOT NULL,
no_of_days int NOT NULL,
city VARCHAR(150) NOT NULL,
);
-- The SELECT statement outputs the values in jobCards
SELECT * FROM jobCards;
-- The INSERT statement adds data to the table
INSERT INTO jobCards (customer_id, jobCardID, materials_ID, jobType, rate, no_of_days, city)
VALUES
('0001', 11000, 1, 'Full Conversion', 120000, 7, 'Pretoria'),
('0002', 10478, 2, 'Semi Conversion', 1080, 2, 'Pretoria'),
('0003', 14253, 3, 'Floor Boarding', 900, 2, 'Pretoria'),
('0004', 11258, 4, 'Full Conversion', 120000, 8, 'Pretoria'),
('0005', 12058, 5, 'Semi Conversion', 1080, 3, 'Pretoria'),
('0006', 13697, 6, 'Full Conversion', 120000, 7, 'Pretoria'),
('0007', 10211, 7, 'Full Conversion', 120000, 7, 'Pretoria'),
('0008', 10471, 8, 'Semi Conversion', 1080, 2, 'Pretoria'),
('0009', 13521, 9, 'Semi Conversion', 1080, 3, 'Pretoria'),
('0010', 10102, 10, 'Floor Boarding', 900, 2, 'Pretoria');
I have inserted the tables that have the issue. Whenever I run my program I get this error
The INSERT statement conflicted with the FOREIGN KEY constraint
"FK__jobCards__materi__2F10007B". The conflict occurred in database
"DomingoRoofWorks", table "dbo.Materials", column 'materials_ID'.

Remember that a foreign key is a field that helps us link two tables together. A foreign key refers to the primary key in another table, so every materials_ID you insert into jobCards must already exist in Materials.
Try inserting the rows into the Materials table before inserting into the jobCards table.
Here is an example:
-- After creating the tables (the zeros are placeholders; substitute your real values):
INSERT INTO Materials (materials_ID, floor_boards, power_Points, electrical_Wiring, stairs_Pack)
VALUES (1, 0, 0, 0, 0);
After inserting all the rows needed, you can use those IDs as foreign keys in the jobCards table. At that point, this query runs without problems:
INSERT INTO jobCards (customer_id, jobCardID, materials_ID, jobType, rate, no_of_days, city)
VALUES
('0001', 11000, 1, 'Full Conversion', 120000, 7, 'Pretoria'),
('0002', 10478, 2, 'Semi Conversion', 1080, 2, 'Pretoria'),
('0003', 14253, 3, 'Floor Boarding', 900, 2, 'Pretoria'),
('0004', 11258, 4, 'Full Conversion', 120000, 8, 'Pretoria'),
('0005', 12058, 5, 'Semi Conversion', 1080, 3, 'Pretoria'),
('0006', 13697, 6, 'Full Conversion', 120000, 7, 'Pretoria'),
('0007', 10211, 7, 'Full Conversion', 120000, 7, 'Pretoria'),
('0008', 10471, 8, 'Semi Conversion', 1080, 2, 'Pretoria'),
('0009', 13521, 9, 'Semi Conversion', 1080, 3, 'Pretoria'),
('0010', 10102, 10, 'Floor Boarding', 900, 2, 'Pretoria');
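Before running the INSERT, it can help to see which parent rows are still missing. The following is a small sketch, not part of the original answer, that lists the materials_ID values used above that do not yet exist in Materials; the derived-table alias v is an illustrative name. The same kind of check applies to customer_id, since jobCards also references Customers.

```sql
-- Hypothetical pre-check: which materials_ID values from the INSERT
-- are not present in Materials yet?
SELECT v.materials_ID
FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) AS v (materials_ID)
WHERE NOT EXISTS (
    SELECT 1
    FROM Materials AS m
    WHERE m.materials_ID = v.materials_ID
);
```

If this returns no rows, the materials_ID foreign key will no longer block the INSERT.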

SQL Pivot Half of table

I have a table that consists of time information. It's basically:
Employee, Date, Seq, Time In, Time Out.
They can clock out multiple times a day, so I'm trying to get all of the clock outs in a day on one row. My result would be something like:
Employee, Date, TimeIn1, TimeOut1, TimeIn2, TimeOut2, TimeIn3, TimeOut3....
Where the 1, 2, and 3 are the sequence numbers. I know I could just do a bunch of left joins to the table itself based on employee=employee, date=date, and seq=seq+1, but is there a way to do it in a pivot? I don't want to pivot the employee and date fields, just the time in and time out.

The short answer is: Yes, it's possible.
The exact code will be updated if/when you provide sample data to clarify some points, but you can absolutely pivot the times out while leaving the employee/work date alone.
Sorry for the wall of code; none of the fiddle sites are working from my current computer
create table #test (
pk int,
workdate date,
seq int,
tIN time,
tOUT time
)
insert into #test values
(1, '2020-11-25', 1, '08:00', null),
(1, '2020-11-25', 2, null, '11:00'),
(1, '2020-11-25', 3, '11:32', null),
(1, '2020-11-25', 4, null, '17:00'),
(2, '2020-11-25', 5, '08:00', null),
(2, '2020-11-25', 6, null, '09:00'),
(2, '2020-11-25', 7, '09:15', null),
-- new date
(1, '2020-11-27', 8, '08:00', null),
(1, '2020-11-27', 9, null, '08:22'),
(1, '2020-11-27', 10, '09:14', null),
(1, '2020-11-27', 11, null, '12:08'),
(1, '2020-11-27', 12, '01:08', null),
(1, '2020-11-27', 13, null, '14:40'),
(1, '2020-11-27', 14, '14:55', null),
(1, '2020-11-27', 15, null, '17:00')
select *
from (
/* this just sets the column header names and condenses their values */
select
pk,
workdate,
colName = case when tin is not null then 'TimeIn' + cast(empDaySEQ as varchar) else 'TimeOut' + cast(empDaySEQ as varchar) end,
colValue = coalesce(tin, tout)
from (
/* main query */
select
pk,
workdate,
/* grab what pair # this clock in or out is; reset by employee & date */
empDaySEQ = (row_number() over (partition by pk, workdate order by seq) / 2) + (row_number() over (partition by pk, workdate order by seq) % 2),
tin,
tout
from #test
) i
) a
PIVOT (
max(colValue)
for colName
IN ( /* replace w/ dynamic if you don't know upper boundary of max in/out pairs */
[TimeIn1],
[TimeOut1],
[TimeIn2],
[TimeOut2],
[TimeIn3],
[TimeOut3],
[TimeIn4],
[TimeOut4]
)
) mypivotTable
generates these results.
(I would provide a fiddle demo but they're not working for me today)
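For comparison, here is a sketch of a different technique, conditional aggregation, which produces the same shape without the PIVOT operator. It assumes the #test sample table above and, like the pivot, a known upper bound on in/out pairs (only two pairs are shown; pairNo and numbered are illustrative names):

```sql
SELECT
    pk,
    workdate,
    MAX(CASE WHEN pairNo = 1 THEN tIN  END) AS TimeIn1,
    MAX(CASE WHEN pairNo = 1 THEN tOUT END) AS TimeOut1,
    MAX(CASE WHEN pairNo = 2 THEN tIN  END) AS TimeIn2,
    MAX(CASE WHEN pairNo = 2 THEN tOUT END) AS TimeOut2
FROM (
    SELECT
        pk,
        workdate,
        tIN,
        tOUT,
        -- Rows 1 and 2 form pair 1, rows 3 and 4 form pair 2, and so on
        -- (integer division), restarting per employee and date.
        (ROW_NUMBER() OVER (PARTITION BY pk, workdate ORDER BY seq) + 1) / 2 AS pairNo
    FROM #test
) AS numbered
GROUP BY pk, workdate;
```

Add one pair of MAX(CASE ...) columns per additional in/out pair you need to show.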

COUNT() with GROUP BY - Query to get number of times a set of values appear in a column SQL

Just struggling with this question.
Write a query that selects the following details about the sides that have been ordered in orders:
• The side ID number and side name.
• How many orders the side has been ordered in (regardless of quantity).
I have created a view already and this is the table for it (ordered_sides_details is the name of the view)
View Table
I've written this query, but I believe it just counts the number of rows instead of how many times each side is ordered.
SELECT ordered_sides_details.side_name, COUNT(*)
FROM ordered_sides_details
GROUP BY ordered_sides_details.side_name;
This is the resulting table
Obviously it's incorrect, as 1.25L Coke has only been in 1 order.
Any help with solving this would be awesome. Thanks.

Solution
There must be something wrong with the view you've created.
This should be enough to yield proper results:
SELECT
side_id
,side_name
,COUNT(*) AS total_count
FROM dbo.orders
GROUP BY side_id, side_name
Bootstrapping (SQL Server)
Scripts for bootstrapping your example:
IF NOT EXISTS (SELECT 1 FROM sys.tables t WHERE t.object_id = OBJECT_ID('dbo.orders'))
BEGIN
CREATE TABLE orders
(
order_id INT,
side_id INT NOT NULL,
side_name NVARCHAR(100) NOT NULL,
ordered_quantity INT NOT NULL,
total_cost MONEY NOT NULL
);
END;
INSERT INTO orders (order_id, side_id, side_name, ordered_quantity, total_cost)
VALUES
(10, 1, '390ml Coke', 1, 3.00),
(5, 2, '1.25l Coke', 2, 10.00),
(8, 3, 'Lava Cake', 3, 8.85),
(7, 4, 'Chicken Wings', 4, 14.00),
(6, 5, 'Garlic Bread', 4, 7.80),
(5, 6, 'Healthy Kale Chips', 3, 16.50),
(5, 6, 'Healthy Kale Chips', 2, 11.00),
(4, 5, 'Garlic Bread', 1, 1.95),
(3, 4, 'Chicken Wings', 1, 3.50),
(2, 3, 'Lava Cake', 2, 5.90);
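One thing to watch with the bootstrap data: 'Healthy Kale Chips' appears twice in order 5, so COUNT(*) counts it twice even though it was part of a single order. If "how many orders" means distinct orders, a sketch along these lines may be closer to what the exercise asks for (row_count and order_count are illustrative column names):

```sql
SELECT
    side_id,
    side_name,
    COUNT(*)                 AS row_count,   -- one per row, repeats included
    COUNT(DISTINCT order_id) AS order_count  -- one per distinct order
FROM dbo.orders
GROUP BY side_id, side_name;
```

With the rows above, 'Healthy Kale Chips' gets a row_count of 2 but an order_count of 1.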

Joining two different tables with a common third table on a common column

Here are the tables
Table: Status
ID, StatusDesc
1, New
2, Active
3, Cancelled
4, Complete
Table: Order (foreign key relationship with Status table above)
ID, OrderNumber, StatusID
1, 1001 , 1
2, 1002, 1
3, 1003, 2
4, 1004, 3
5, 1500, 4
Table: LineItem(foreign key relationship with Order and Status tables above)
ID, OrderNumber, LineItemNumber, StatusID
1, 1001 , 1, 1
2, 1001 , 2, 1
3, 1002 , 1, 2
4, 1002 , 2, 1
5, 1003 , 1, 2
6, 1004 , 1, 3
7, 1004 , 2, 4
8, 1500 , 1, 3
As you can see, the table Status holds the statuses common for both Order and LineItem tables.
I want to produce the result which will include columns like this, status description for both Order and LineItem:
OrderNumber, LineItemNumber, StatusDesc_Order, StatusDesc_LineItem
How to do this?

You could join the Status table twice to achieve this:
SELECT
o.OrderNumber
, li.LineItemNumber
, orderStatus.StatusDesc AS StatusDesc_Order
, lineItemStatus.StatusDesc AS StatusDesc_LineItem
FROM [LineItem] AS li
INNER JOIN [Status] AS lineItemStatus ON li.StatusID = lineItemStatus.ID
INNER JOIN [Order] AS o ON li.OrderNumber = o.OrderNumber
INNER JOIN [Status] AS orderStatus ON o.StatusID = orderStatus.ID
I do suggest, however, that you stay away from table names that clash with keywords, like Order and Status. It is also good practice to explicitly add schema prefixes before the table names in the query (i.e. dbo.Status or another user-defined schema).

If the required results really are that simple then just use a couple of sub-queries e.g.
-- SETUP TEST DATA
create table #Order (id int, OrderNumber int, StatusId int)
insert into #Order (id, OrderNumber, StatusId)
values (1, 1001, 1), (2, 1002, 1), (3, 1003, 2), (4, 1004, 3), (5, 1500, 4)
create table #LineItem (id int, OrderNumber int, LineItemNumber int, StatusId int)
insert into #LineItem (id, OrderNumber, LineItemNumber, StatusId)
values (1, 1001, 1, 1), (2, 1001, 2, 1), (3, 1002, 1, 2), (4, 1002, 2, 1), (5, 1003, 1, 2), (6, 1004, 1, 3), (7, 1004, 2, 4), (8, 1500, 1, 3)
create table #Status (id int, StatusDesc varchar(32))
insert into #Status(id, StatusDesc)
values (1,'New'), (2,'Active'), (3,'Cancelled'), (4,'Complete')
-- QUERY DATA
select LI.OrderNumber, LI.LineItemNumber
-- order status: look up the order's StatusId first, then its description
, (select S.StatusDesc from #Status S where S.id = (select O.StatusId from #Order O where O.OrderNumber = LI.OrderNumber)) [StatusDesc_Order]
-- line item status: use the line item's own StatusId
, (select S.StatusDesc from #Status S where S.id = LI.StatusId) [StatusDesc_LineItem]
from #LineItem LI
order by LI.OrderNumber, LI.LineItemNumber
Note: If you provide your sample data in this format in future questions you make your question much easier to answer.
