Cumulative query with 'group by' causing slowness - sql

We have a table that stores user activity such as the deals performed, dates, and so on. The same user can perform the same deal multiple times. We want to display the cumulative sum of the amount the user has performed for each deal number.
Query with GROUP BY Causing Slowness
The query computes the cumulative sum by joining the table to itself.
The table has 2.7 million records.
The query does use an index scan; the relevant indexes are:
INDEX `IDX_GRPBY2` (`DEAL`, `FIN`),
INDEX `Table1_TEMP` (`CREATE1`, `FEVSTATUS`, `EVE`)
Below is the expected result:
[Image: Expected Result]
Below is the actual query which is used to get deals with cumulative sum if it's same deal:
SELECT MAX(A.PKEY) PKEY,
A.ENT, A.DEAL, A.FIN, A.DATE1, A.TYPE1, A.STEPNO,
A.CREATE1, A.EVE, A.FEVSTATUS, A.STATUSDATE, A.INTAMT,
A.INTPAID, A.INT_TYPE, A.PENALTY_1, A.CONV_PAID_4,
A.INT_PAID_4, A.CONVFIN_4, A.INTTYPE,
B.DEAL AS DEAL_B, B.FIN AS FIN_B,
SUM(B.CONV_PAID_4) AS CONV_PAID_4_OUT
FROM Table1 A, Table1 B
WHERE A.CREATE1 = '0'
AND (A.FEVSTATUS = '1'
OR A.EVE IN ('E06', 'E07', 'E02', 'E15', 'E03', 'E04')
)
AND A.DEAL = B.DEAL
AND A.FIN = B.FIN
AND A.PKEY >= B.PKEY
GROUP BY A.ENT, A.DEAL, A.FIN, A.DATE1, A.TYPE1, A.STEPNO,
A.CREATE1, A.EVE, A.FEVSTATUS, A.STATUSDATE, A.INTAMT,
A.INTPAID, A.INT_TYPE, A.PENALTY_1, A.CONV_PAID_4,
A.INT_PAID_4, A.CONVFIN_4, A.INTTYPE,
B.DEAL, B.FIN;
I tried moving the sum into a correlated subquery, as shown below, but the query still takes very long.
Changed query:
SELECT MAX(A.PKEY) PKEY, A.ENT, A.DEAL, A.FIN, A.DATE1, A.TYPE1,
A.STEPNO, A.CREATE1, A.EVE, A.FEVSTATUS, A.STATUSDATE,
A.INTAMT, A.INTPAID, A.INT_TYPE, A.PENALTY_1, A.CONV_PAID_4,
A.INT_PAID_4, A.CONVFIN_4, A.INTTYPE, A.DEAL AS DEAL_B,
A.FIN AS FIN_B,
(
SELECT SUM(B.CONV_PAID_4)
FROM Table1 B
WHERE A.DEAL = B.DEAL
AND A.FIN = B.FIN
AND A.PKEY >= B.PKEY
) AS PRI_CONV_PAID_4_OUT
FROM Table1 A
WHERE A.CREATE1 = '0'
AND (A.FEVSTATUS = '1'
OR A.EVE IN ('E06', 'E07', 'E02', 'E15', 'E03', 'E04')
)
GROUP BY A.ENT, A.DEAL, A.FIN, A.DATE1, A.TYPE1, A.STEPNO,
A.CREATE1, A.EVE, A.FEVSTATUS, A.STATUSDATE, A.INTAMT,
A.INTPAID, A.INT_TYPE, A.PENALTY_1, A.CONV_PAID_4, A.INT_PAID_4,
A.CONVFIN_4, A.INTTYPE;
Can anyone help me rewrite the query so that it runs faster?
Below is the SHOW CREATE TABLE output:
CREATE TABLE `Table1` (
`PKEY` DECIMAL(10,0) NOT NULL DEFAULT '0',
`ENT` CHAR(3) NOT NULL DEFAULT '',
`DEAL` CHAR(14) NOT NULL DEFAULT '',
`FIN` CHAR(3) NOT NULL DEFAULT '',
`DATE1` DATETIME NULL DEFAULT NULL,
`TYPE1` CHAR(3) NULL DEFAULT NULL,
`STEPNO` CHAR(3) NULL DEFAULT NULL,
`CREATE1` CHAR(1) NULL DEFAULT NULL,
`EVE` CHAR(3) NULL DEFAULT NULL,
`FEVSTATUS` CHAR(1) NULL DEFAULT NULL,
`STATUSDATE` DATETIME NULL DEFAULT NULL,
`INTAMT` DECIMAL(19,5) NULL DEFAULT NULL,
`INTPAID` DECIMAL(19,5) NULL DEFAULT NULL,
`INT_TYPE` CHAR(1) NULL DEFAULT NULL,
`PENALTY_1` DECIMAL(9,6) NULL DEFAULT NULL,
`CONV_PAID_4` DECIMAL(15,2) NULL DEFAULT NULL,
`INT_PAID_4` DECIMAL(19,5) NULL DEFAULT NULL,
`CONV_FIN_4` CHAR(3) NULL DEFAULT NULL,
`INTTYPE` CHAR(1) NULL DEFAULT NULL,
PRIMARY KEY (`PKEY`),
UNIQUE INDEX `IXQDWFEV_PK` (`PKEY`),
UNIQUE INDEX `IXIDWFEV` (`ENT`, `DEAL`, `FIN`, `DATE1`),
INDEX `IXQDWFEV_GI1` (`ENT`, `DEAL`, `CONV_FIN_4`),
INDEX `IXQDWFEV_GI2` (`ENT`, `DEAL`, `TYPE1`, `STEPNO`),
INDEX `IDX_GRPBY2` (`DEAL`, `FIN`),
INDEX `IXQDWFEV1_TEMP` (`CREATE1`, `FEVSTATUS`, `EVE`)
)
COLLATE='utf8_general_ci'
;
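One possible direction, sketched here under the assumption that the server is MySQL 8.0 or later (the question does not state a version), is to replace the self-join with a window function. The running total is computed in a derived table over all rows, so that rows excluded by the filter still contribute to the sum as they do in the self-join, and the filter is applied afterwards. The column list is shortened, the GROUP BY / MAX(PKEY) de-duplication of the original is left out, and the sketch has not been timed against the 2.7 million rows.

```sql
-- Sketch only: assumes MySQL 8.0+ (window functions); column list shortened for readability.
SELECT T.PKEY, T.ENT, T.DEAL, T.FIN, T.DATE1, T.CONV_PAID_4,
       T.CONV_PAID_4_OUT
FROM (
    SELECT PKEY, ENT, DEAL, FIN, DATE1, CREATE1, EVE, FEVSTATUS, CONV_PAID_4,
           -- running total per (DEAL, FIN), ordered by PKEY,
           -- up to and including the current row (same as A.PKEY >= B.PKEY)
           SUM(CONV_PAID_4) OVER (
               PARTITION BY DEAL, FIN
               ORDER BY PKEY
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS CONV_PAID_4_OUT
    FROM Table1
) AS T
-- filter after the running total has been computed
WHERE T.CREATE1 = '0'
  AND (T.FEVSTATUS = '1'
       OR T.EVE IN ('E06', 'E07', 'E02', 'E15', 'E03', 'E04'));
```

Each row is read once per partition instead of being joined to every earlier row of the same deal, which is where the self-join spends its time when a deal has many rows.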

Related

Insert only unique records in SQL

I ran the below query in SQL to insert records (this is just a snippet)
insert into list_member (list_id, list_int_value , list_float_value , list_decimal_value , list_varchar_value, list_datetime_value, modified_by, asof_time, do_not_audit)
select 42, null, null, security_id, null, null, 10, null, null
from security
where not exists(select user_id_4 from list_member.user_id_4 where list_member.user_id_4 = security.user_id_4)
and deleted = 0 and user_id_4 in
(
'ES0125220311',
'ES0132105018',
'ES0167050915'
)
Now I have another list to insert, but only want to insert new records.
I'm unsure where to insert the additional 'where' clause so that it doesn't insert duplicates. I've come up with the below (which in theory should only add the final record), but the additional where clause in bold is likely wrong ...
insert into list_member (list_id, list_int_value , list_float_value , list_decimal_value , list_varchar_value, list_datetime_value, modified_by, asof_time, do_not_audit)
select 42, null, null, security_id, null, null, 10, null, null
from security
**where not exists(select user_id_4 from list_member.user_id_4 where list_member.user_id_4 = security.user_id_4)**
and deleted = 0 and user_id_4 in
(
'ES0125220311',
'ES0132105018',
'ES0167050915',
'ES0123456789'
)
Anyone able to assist?
I changed it to this instead
left join list_member
on list_member.list_id = 42 and list_member.list_decimal_value = security.security_id
Sorry for the trouble.
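The LEFT JOIN fragment above only excludes existing members when it is paired with an IS NULL filter on the joined table. A complete statement might look like the following sketch; the aliases `s` and `lm` are introduced here for illustration, and everything else is taken from the question.

```sql
-- Anti-join sketch: insert only securities that are not yet in list 42.
INSERT INTO list_member (list_id, list_int_value, list_float_value, list_decimal_value,
                         list_varchar_value, list_datetime_value, modified_by, asof_time, do_not_audit)
SELECT 42, NULL, NULL, s.security_id, NULL, NULL, 10, NULL, NULL
FROM security AS s
LEFT JOIN list_member AS lm
       ON lm.list_id = 42
      AND lm.list_decimal_value = s.security_id
WHERE lm.list_id IS NULL          -- no matching row in list_member, so this security is new
  AND s.deleted = 0
  AND s.user_id_4 IN
      (
        'ES0125220311',
        'ES0132105018',
        'ES0167050915',
        'ES0123456789'
      );
```

Running the statement a second time should insert nothing, because every selected security now has a matching row in list 42.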
You can use a correlated subquery as follows:
insert into list_member (list_id, list_int_value , list_float_value , list_decimal_value , list_varchar_value, list_datetime_value, modified_by, asof_time, do_not_audit)
select 42, null, null, security_id, null, null, 10, null, null
from security s
where not exists
(select 1 from list_member l where l.user_id_4 = s.user_id_4)
and deleted = 0 and user_id_4 in
(
'ES0125220311',
'ES0132105018',
'ES0167050915',
'ES0123456789'
)

Converting strange SQL Server JOIN syntax to MySQL syntax

I have a SQL Server query that I am attempting to port to MySQL, but the JOIN syntax is something that I have never seen used before. The query is from a view designed to measure procedure code usage. What the heck is going on with the JOIN syntax just past T.PatID = P.ID, and the third LEFT OUTER JOIN, and what equivalent syntax can we use in MySQL? It does not like this JOIN syntax at all (disregard the ISNULL and CONVERT SQL Server specific syntax)
SELECT
T.Code
, P.LastName
, P.FirstName
, T.TranDate
, CD.DaysUnits
, T.TranAmt
, TD.FullName AS Provider
, ISNULL(TD.ID, ISNULL(AD.ID, PD.ID)) AS DoctorID
FROM
dbo.Doctors AS PD
INNER JOIN
dbo.Transactions AS T
INNER JOIN
dbo.Patients AS P
ON
T.PatID = P.ID
ON
PD.ID = P.DoctorID
LEFT OUTER JOIN
dbo.Doctors AS TD
ON
T.DoctorID = TD.ID
LEFT OUTER JOIN
dbo.Doctors AS AD
LEFT OUTER JOIN
dbo.Appointments
ON
AD.ID = dbo.Appointments.DoctorID
AND CONVERT(varchar(20), dbo.Appointments.ScheduleDateTime, 8) <> '00:00:00'
ON
T.ApptID = dbo.Appointments.ID
LEFT OUTER JOIN
dbo.ChargeDetails AS CD
ON
T.ID = CD.ChargeTranID
WHERE
(
T.Code IS NOT NULL
)
The SHOW CREATE TABLE outputs are as follows:
CREATE TABLE Doctors
(
ID int(10) NOT NULL PRIMARY KEY
, FullName varchar(50) DEFAULT NULL
)
CREATE TABLE Patients
(
LName varchar(50) DEFAULT NULL
, FName varchar(50) DEFAULT NULL
, ID int(10) NOT NULL PRIMARY KEY
)
CREATE TABLE Transactions
(
TranType varchar(2) DEFAULT NULL
, Code varchar(100) DEFAULT NULL
, TranSubType varchar(2) DEFAULT NULL
, Description varchar(2000) DEFAULT NULL
, TranDate datetime
, PatID int(10) DEFAULT NULL
, ID int(10) NOT NULL PRIMARY KEY
, TranAmt decimal(19,4) DEFAULT NULL
, ApptID int(10) DEFAULT NULL
, DoctorID int(10) DEFAULT NULL
)
CREATE TABLE ChargeDetails
(
DaysUnits varchar(50) DEFAULT NULL
-- DaysUnits is just an int ranging from 1 to 2
, ChargeTranID int(10) NOT NULL PRIMARY KEY
)
CREATE TABLE Appointments
(
DoctorID int(10) DEFAULT NULL
, PatientID int(10) DEFAULT NULL
, ScheduleDateTime datetime DEFAULT NULL
, ID int(10) NOT NULL PRIMARY KEY
)
Thank you in advance for your help.
Here is a similar (and simplified) query using the same structure as the first query. The second query moves the joins around to make things easier to read.
set nocount on;
use tempdb;
go
declare @doc table (id int not null);
declare @tran table (id int not null, patid int not null);
declare @patients table (id int not null, docid int not null);
insert @doc (id) values (1);
insert @patients (id, docid) values (25, 1);
insert @tran (id, patid) values (100, 25);
-- nested form: @tran is joined to @patients first, and that result is joined to @doc
select *
from @doc as pd
inner join @tran as t
inner join @patients as p
on t.patid = p.id
on pd.id = p.docid;
-- same result with the joins written one after another
select *
from @tran as t
inner join @patients as p
on t.patid = p.id
inner join @doc as pd
on pd.id = p.docid;
Other things look strange. I don't see a need to join to appointments but I'm not going to spend a lot of time to figure out the logic and the schema. The convert usage seems like a bad way to check for null - unless there is a special "flag" datetime value that is used as the equivalent to null. Again, you need to understand the query, the goal of the query, the schema on which it is based, and how the tables are populated. Quite frankly, this code raises concerns about the quality of the entire system.
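For the MySQL port itself, one option is to keep the nesting but make it explicit with parentheses, which MySQL accepts around a joined table. The sketch below follows the query in the question rather than the abbreviated table definitions (which do not list P.LastName, P.FirstName or P.DoctorID); the alias `A` for Appointments is introduced here for illustration. COALESCE stands in for the nested ISNULL calls and TIME() for the CONVERT(..., 8) call.

```sql
-- Sketch of a MySQL port; table and column names are taken from the question's query.
SELECT
    T.Code
  , P.LastName
  , P.FirstName
  , T.TranDate
  , CD.DaysUnits
  , T.TranAmt
  , TD.FullName AS Provider
  , COALESCE(TD.ID, AD.ID, PD.ID) AS DoctorID        -- replaces ISNULL(TD.ID, ISNULL(AD.ID, PD.ID))
FROM Doctors AS PD
INNER JOIN (
        Transactions AS T
        INNER JOIN Patients AS P ON T.PatID = P.ID
    ) ON PD.ID = P.DoctorID
LEFT OUTER JOIN Doctors AS TD ON T.DoctorID = TD.ID
LEFT OUTER JOIN (
        Doctors AS AD
        LEFT OUTER JOIN Appointments AS A
            ON AD.ID = A.DoctorID
           AND TIME(A.ScheduleDateTime) <> '00:00:00' -- replaces CONVERT(varchar(20), ..., 8)
    ) ON T.ApptID = A.ID
LEFT OUTER JOIN ChargeDetails AS CD ON T.ID = CD.ChargeTranID
WHERE T.Code IS NOT NULL;
```

Each parenthesized group corresponds to one pair of stacked ON clauses in the SQL Server original: the inner ON belongs to the join inside the parentheses, the outer ON to the join that encloses it.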

How can I calculate the duration a contact has been in any of the availability states in this database?

Given the following tables:
CREATE TABLE [Contact]
(
[Id] INTEGER NOT NULL,
[Uri] CHARACTER VARYING(255) NOT NULL,
[CreatedOn] DATETIMEOFFSET NOT NULL
);
CREATE TABLE [Availability]
(
[Id] TINYINT NOT NULL,
[Name] CHARACTER VARYING(255) NOT NULL,
[CreatedOn] DATETIMEOFFSET NOT NULL
);
CREATE TABLE [ContactAvailability]
(
[Id] BIGINT NOT NULL,
[ContactId] INTEGER NOT NULL,
[AvailabilityId] INTEGER NOT NULL,
[CreatedOn] DATETIMEOFFSET NOT NULL
);
I am attempting to get a list of all of the contacts and the durations for which they have been in any of the availabilities for the current day.
The ContactAvailability table ends up having records such as:
(1, 1, 1, '01/01/2014 08:00:23.51 -07:00'),
(2, 1, 3, '01/01/2014 08:15:38.01 -07:00'),
(3, 1, 3, '01/01/2014 08:15:38.02 -07:00'),
(4, 2, 2, '01/01/2014 08:18:33.12 -07:00')
These records represent a Contact's transition from one Availability to another, and also from one Availability to the same. It is essentially a running status that is logged on an interval.
The query I have come up with only queries for a particular user and only gets a list of their availabilities for the current day, but it won't calculate how long the Contact has been in any Availability. I am not sure where to start when it comes to that.
This is that query:
SELECT [Contact].[Uri] AS [ContactUri],
[Availability].[Name] AS [AvailabilityName],
[ContactAvailability].[CreatedOn]
FROM [ContactAvailability]
INNER JOIN [Contact] ON [Contact].[Id] = [ContactAvailability].[ContactId]
INNER JOIN [Availability] ON [Availability].[Id] = [ContactAvailability].[AvailabilityId]
WHERE [Contact].[Uri] = 'sip:contact@example.com' AND
[ContactAvailability].[CreatedOn] >= '06/30/2014 00:00:00 -07:00' AND
[ContactAvailability].[CreatedOn] < '07/01/2014 00:00:00 -07:00'
You can use a window function in combination with a CTE.
I think this should work, but it is not tested, so you might have to adjust the column names.
WITH SourceTable
(ContactID, AvailabilityID, NewDate, OldDate)
AS (
SELECT ContactAvailability.ContactID AS ContactID,
ContactAvailability.AvailabilityID AS AvailabilityID,
[ContactAvailability].[CreatedOn] AS NewDate,
-- previous transition time of the same contact
LAG(ContactAvailability.CreatedOn) OVER (PARTITION BY ContactAvailability.ContactID ORDER BY ContactAvailability.CreatedOn) AS OldDate
FROM [ContactAvailability])
SELECT [Contact].[Uri] AS [ContactUri],
[Availability].[Name] AS [AvailabilityName],
SourceTable.OldDate AS PreviousAvailabilityDate,
SourceTable.NewDate AS CurrentAvailabilityDate,
-- DATETIMEOFFSET values cannot be subtracted with "-", so use DATEDIFF (result in seconds)
DATEDIFF(SECOND, SourceTable.OldDate, SourceTable.NewDate) AS DifferenceBetweenAvailability
FROM SourceTable
INNER JOIN [Contact] ON [Contact].[Id] = SourceTable.[ContactId]
INNER JOIN [Availability] ON [Availability].[Id] = SourceTable.[AvailabilityId]
If you need to calculate the total time somebody has spent in a certain availability (for example, person A is in availability A, then B, then A again, and then C), you will have to add another CTE, partition on ContactAvailability.AvailabilityID, and then sum your calculated field.
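A minimal sketch of that total, assuming SQL Server 2012 or later: it uses LEAD instead of LAG so that each row carries the time at which its own availability ended, which lets the duration be attributed to the state the contact was actually in. The CTE name `Transitions`, the aliases, and the output column `SecondsInAvailability` are made up for this example; the date range is the one from the question.

```sql
WITH Transitions AS (
    SELECT ca.ContactId,
           ca.AvailabilityId,
           ca.CreatedOn AS StartedOn,
           -- next logged status of the same contact marks the end of this one
           LEAD(ca.CreatedOn) OVER (PARTITION BY ca.ContactId
                                    ORDER BY ca.CreatedOn) AS EndedOn
    FROM ContactAvailability AS ca
    WHERE ca.CreatedOn >= '2014-06-30T00:00:00-07:00'
      AND ca.CreatedOn <  '2014-07-01T00:00:00-07:00'
)
SELECT c.Uri  AS ContactUri,
       a.Name AS AvailabilityName,
       -- the last row of each contact has no EndedOn; DATEDIFF then yields NULL,
       -- which SUM ignores, so the still-open interval is not counted
       SUM(DATEDIFF(SECOND, t.StartedOn, t.EndedOn)) AS SecondsInAvailability
FROM Transitions AS t
INNER JOIN Contact AS c ON c.Id = t.ContactId
INNER JOIN Availability AS a ON a.Id = t.AvailabilityId
GROUP BY c.Uri, a.Name;
```

Consecutive rows with the same availability (a status logged on an interval) simply add up, so no separate step is needed to merge them.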

TSQL CASE on Multiple columns

declare @T table
(
ID int identity primary key,
FBK_ID BIGINT null,
TWT_ID BIGINT null,
LNK_ID NVARCHAR(50) null
);
Each record can ONLY have either a FBK_ID or a TWT_ID or a LNK_ID. No records have multiple values on those fields.
So some records will have Facebook ID values, others Twitter IDs, and others LinkedIn IDs.
QUESTION:
What is the fastest and cleanest way to do this?
Select ID, Type from @T
...where Type is an nvarchar(10) equal to 'Facebook', 'Twitter', or 'LinkedIn', depending on which column has a value?
You could do something like this:
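select
ID,
-- cast the BIGINT ids so every branch is NVARCHAR(50); otherwise LNK_ID would be implicitly converted to BIGINT
case when FBK_ID is not null then cast(FBK_ID as nvarchar(50))
when TWT_ID is not null then cast(TWT_ID as nvarchar(50))
else LNK_ID end as LinkID
from @T
where <rest of your conditions if any>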
You will get back the ID and one of the link IDs for the specific social network. If you also want to know which social network the returned LinkID belongs to, you can add an extra column like so:
select
ID,
-- cast the BIGINT ids so every branch is NVARCHAR(50)
case when FBK_ID is not null then cast(FBK_ID as nvarchar(50))
when TWT_ID is not null then cast(TWT_ID as nvarchar(50))
else LNK_ID end as LinkID,
case when FBK_ID is not null then 'F'
when TWT_ID is not null then 'T'
else 'L' end as LinkIDFrom
from @T
where <rest of your conditions if any>
Each record can ONLY have either a FBK_ID or a TWT_ID or a LNK_ID. No
records have multiple values on those fields.
First fix your table:
DECLARE @T TABLE
(
ID INT IDENTITY PRIMARY KEY,
FBK_ID BIGINT NULL,
TWT_ID BIGINT NULL,
LNK_ID NVARCHAR(50) NULL,
CHECK (
(FBK_ID IS NOT NULL AND TWT_ID IS NULL AND LNK_ID IS NULL)
OR (FBK_ID IS NULL AND TWT_ID IS NOT NULL AND LNK_ID IS NULL)
OR (FBK_ID IS NULL AND TWT_ID IS NULL AND LNK_ID IS NOT NULL)
)
);
what is the fastest and cleanest way to do this?
This was quite fast for me to write, employing copy+paste, and looks clean to my eye:
SELECT ID, CAST('Facebook' AS NVARCHAR(10)) AS Type
FROM @T
WHERE FBK_ID IS NOT NULL
UNION
SELECT ID, CAST('Twitter' AS NVARCHAR(10)) AS Type
FROM @T
WHERE TWT_ID IS NOT NULL
UNION
SELECT ID, CAST('LinkedIn' AS NVARCHAR(10)) AS Type
FROM @T
WHERE LNK_ID IS NOT NULL;
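As an alternative sketch, a single CASE expression can return the network name directly in one pass over the table. The three sample rows are invented for illustration only.

```sql
DECLARE @T TABLE
(
    ID INT IDENTITY PRIMARY KEY,
    FBK_ID BIGINT NULL,
    TWT_ID BIGINT NULL,
    LNK_ID NVARCHAR(50) NULL
);

-- hypothetical sample rows, one per network
INSERT INTO @T (FBK_ID, TWT_ID, LNK_ID)
VALUES (1001, NULL, NULL),
       (NULL, 2002, NULL),
       (NULL, NULL, N'abc-3003');

SELECT ID,
       CAST(CASE
                WHEN FBK_ID IS NOT NULL THEN N'Facebook'
                WHEN TWT_ID IS NOT NULL THEN N'Twitter'
                WHEN LNK_ID IS NOT NULL THEN N'LinkedIn'
            END AS NVARCHAR(10)) AS [Type]
FROM @T;
```

A row in which all three columns are NULL would get a NULL Type here, whereas the UNION version would leave that row out altogether.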

Summing a field while excluding certain fields in SQL

I am trying to sum a column in a table, while excluding the records that have the paid field set to true.
I am doing so like this:
SELECT SUM( cost ) AS total
FROM sales
WHERE passport = 'xxxxx'
AND paid <>1
The table is full of data, and I can display the costs by themselves or the entire total. Just adding
AND paid <>1
is what causes it to fail. The query does not fail as such, but it returns NULL, which is quite useless.
This is the SQL for my table
CREATE TABLE sales (
id int(10) unsigned NOT NULL AUTO_INCREMENT,
uuid varchar(64) NOT NULL,
firstname varchar(64) NOT NULL DEFAULT '',
lastname varchar(64) NOT NULL DEFAULT '',
passport varchar(64) DEFAULT NULL,
product varchar(64) NOT NULL,
quantity int(11) DEFAULT NULL,
cost double DEFAULT NULL,
paymenttype varchar(64) NOT NULL DEFAULT '',
paid tinyint(1) DEFAULT NULL,
tabno varchar(64) NOT NULL,
createdby int(10) unsigned DEFAULT NULL,
creationdate datetime DEFAULT NULL,
modifiedby int(10) unsigned DEFAULT NULL,
modifieddate timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (id)
)
And the current data
INSERT INTO sales (id, uuid, firstname, lastname, passport, product, quantity, cost, paymenttype, paid, tabno, createdby, creationdate, modifiedby, modifieddate) VALUES
(20, ':8dcee958-d1ac-6791-6253-0a7344054295', 'Jason', 'Hoff', 'r454545', 'Nicaraguan nachoes', 4, 320, 'credit', 1, '23434', 2, '2010-07-06 04:10:18', 2, '2010-07-06 04:10:18'),
(19, ':3f03cda5-21bf-9d8c-5eaa-664eb2d4f5a6', 'Jason', 'Hoff', 'r454545', 'Nica Libre (doble 4 o 5 anos)', 1, 30, 'cash', NULL, '35', 2, '2010-07-06 03:35:35', 2, '2010-07-06 03:35:35'),
(18, ':f83da33b-2238-94b9-897c-debed0c3815e', 'Jason', 'Hoff', 'r454545', 'Helado con salsa de chocolate', 1, 40, 'cash', 1, '2', 2, '2010-07-05 21:30:58', 2, '2010-07-05 21:30:58');
The 'paid' value is NULL for that row. You would need to do
SELECT SUM( cost ) AS total
FROM test.sales
WHERE passport = 'r454545'
AND (paid IS NULL OR paid = 0)
/*Or paid <> 1 as I see you are using tinyint datatype*/
Or, better would be to not allow NULLS in that column and have paid default to 0.
Your problem is that your condition does not match any rows.
The condition paid <> 1 does not match the row where paid is NULL.
Try this query: SELECT 1 <> NULL. It returns NULL. A WHERE clause filters out rows for which the condition is either false or NULL.
Replace AND paid <> 1 with AND (paid IS NULL OR paid <> 1).
The book SQL Antipatterns describes this problem in detail, section Searching Nullable Columns. Strongly recommended book.
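The behaviour can be reproduced with a small MySQL sketch. The temporary table `sales_demo` is a made-up stand-in that holds only the three relevant columns, filled with the values from the question's sample rows.

```sql
-- Hypothetical scratch table with just the columns needed for the demonstration.
CREATE TEMPORARY TABLE sales_demo (
    passport VARCHAR(64),
    cost     DOUBLE,
    paid     TINYINT(1)
);

INSERT INTO sales_demo (passport, cost, paid) VALUES
    ('r454545', 320, 1),
    ('r454545', 30, NULL),
    ('r454545', 40, 1);

-- Comparing anything with NULL yields NULL, not true or false.
SELECT 1 <> NULL;

-- No row qualifies (paid is 1, NULL, 1), so SUM returns NULL.
SELECT SUM(cost) AS total
FROM sales_demo
WHERE passport = 'r454545'
  AND paid <> 1;

-- The NULL row is included explicitly: total is 30.
SELECT SUM(cost) AS total
FROM sales_demo
WHERE passport = 'r454545'
  AND (paid IS NULL OR paid <> 1);

-- Same result with MySQL's NULL-safe equality operator <=>: total is 30.
SELECT SUM(cost) AS total
FROM sales_demo
WHERE passport = 'r454545'
  AND NOT (paid <=> 1);
```

The parentheses in the third query matter: AND binds more tightly than OR, so without them the passport filter would apply only to the first half of the condition.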
SELECT SUM( cost ) AS total
FROM sales
WHERE passport = 'xxxxx'
AND (paid = false OR paid IS NULL)
EDITED
use
IS NOT
instead of
!=
This should work
SELECT SUM( cost ) AS total
FROM sales
WHERE passport = 'xxxxx'
AND paid IS NOT true
This may mean there are no rows in the table for which paid <> 1. Or maybe there are, but their cost is NULL.
I would not use "<>", but use "!=" instead;
SELECT SUM( cost ) AS total
FROM sales
WHERE passport = 'xxxxx'
AND paid != 1
If that doesn't work, can you post the table definition? And are there any records which do not have paid = 1?