Can't SELECT rows containing Unicode characters from a SQL Server table - sql

I used BULK INSERT to import data from a text file:
BULK INSERT dbo.Infosp1 FROM 'D:\test.txt' WITH ( FIELDTERMINATOR =',', ROWTERMINATOR ='\n', DATAFILETYPE='widechar')
This is my 'D:\test.txt' (saved with Encoding: Unicode):
BácHồng,Giá,3
BácHồng,Hành,2
BácHồng,Lơxanh,3
BácHồng,Xả,3
BácHồng,Ngao,5
BácHồng,Bắptàu,5
CôHòaBính,Giá,5
CôHòaBính,Càrốt,2
CôHòaBính,Chanh,2
This is my table after the import (screenshot omitted).
Why does the following SELECT return 0 rows?
SELECT * FROM dbo.Infosp1 WHERE Khach = 'BácHồng'
or
SELECT * FROM dbo.Infosp1 WHERE Khach = 'CôHòaBính'
I think there is some problem around data type conversion, because saving 'D:\test.txt' with Encoding: ANSI corrupts the values, e.g. BácH?ng, CôH?aBính. In other cases, when a field has no corrupted characters, I can use SELECT and it shows all results. I'd like a solution that makes this work.

Try prepending N to Unicode string literals in SQL Server. Without the N prefix the literal is varchar, so characters outside the database's code page are converted (often to '?') and the comparison never matches:
SELECT * FROM dbo.Infosp1 WHERE Khach = N'BácHồng'
and
SELECT * FROM dbo.Infosp1 WHERE Khach = N'CôHòaBính'
Example:
create table Infosp1 (Khach nvarchar(64), FirstName nvarchar(64), SomeNumber int)
insert into Infosp1 values
(N'BácHồng',N'Giá',3)
,(N'BácHồng',N'Hành',2)
,(N'BácHồng',N'Lơxanh',3)
,(N'BácHồng',N'Xả',3)
,(N'BácHồng',N'Ngao',5)
,(N'BácHồng',N'Bắptàu',5)
,(N'CôHòaBính',N'Giá',5)
,(N'CôHòaBính',N'Càrốt',2)
,(N'CôHòaBính',N'Chanh',2)
SELECT 'WithN' as WithOrWithoutN, * FROM dbo.Infosp1 WHERE Khach = N'BácHồng'
union all
SELECT 'WithoutN',* FROM dbo.Infosp1 WHERE Khach = 'BácHồng'
union all
SELECT 'WithN', * FROM dbo.Infosp1 WHERE Khach = N'CôHòaBính'
union all
SELECT 'WithoutN',* FROM dbo.Infosp1 WHERE Khach = 'CôHòaBính'
rextester demo: http://rextester.com/WSNNX6950
returns:
+----------------+-----------+-----------+------------+
| WithOrWithoutN | Khach     | FirstName | SomeNumber |
+----------------+-----------+-----------+------------+
| WithN          | BácHồng   | Giá       | 3          |
| WithN          | BácHồng   | Hành      | 2          |
| WithN          | BácHồng   | Lơxanh    | 3          |
| WithN          | BácHồng   | Xả        | 3          |
| WithN          | BácHồng   | Ngao      | 5          |
| WithN          | BácHồng   | Bắptàu    | 5          |
| WithN          | CôHòaBính | Giá       | 5          |
| WithN          | CôHòaBính | Càrốt     | 2          |
| WithN          | CôHòaBính | Chanh     | 2          |
| WithoutN       | CôHòaBính | Giá       | 5          |
| WithoutN       | CôHòaBính | Càrốt     | 2          |
| WithoutN       | CôHòaBính | Chanh     | 2          |
+----------------+-----------+-----------+------------+
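To see why the N matters, compare a varchar literal with an nvarchar literal directly. A minimal sketch, assuming the database's default collation maps to a code page that does not cover Vietnamese:
-- Without N the literal is varchar: characters outside the database code page
-- are converted and typically become '?', so a WHERE comparison never matches.
SELECT 'BácHồng' AS WithoutN, N'BácHồng' AS WithN;
-- Likely output under such a collation: BácH?ng | BácHồng
This also suggests why, in the demo output above, the unprefixed 'CôHòaBính' still matched: presumably all of its characters exist in the demo database's code page, while 'BácHồng' contains characters (e.g. ồ) that do not.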

Related

TSQL - Convert Rows per record to Columns

Consider the following table:
+--------+------------+------------+--------+--------+--------+------------+
| GUID   | DeviceGUID | DetailGUID | sValue | iValue | gValue | DateStored |
+--------+------------+------------+--------+--------+--------+------------+
| ENTRY1 | DEVICE1    | Detail1    | SN112  |        |        | 01/01/2020 |
| ENTRY2 | DEVICE1    | Detail4    |        | 1241   |        | 01/01/2020 |
| ENTRY3 | DEVICE1    | Detail7    |        |        | GUID12 | 01/01/2020 |
| ENTRY4 | DEVICE2    | Detail1    | SN111  |        |        | 01/01/2020 |
| ENTRY5 | DEVICE2    | Detail2    | RND123 |        |        | 01/01/2020 |
| ENTRY6 | DEVICE2    | Detail4    |        | 2351   |        | 03/01/2020 |
| ENTRY7 | DEVICE3    | Detail1    | SN100  |        |        | 02/01/2020 |
| [...]  | [...]      | [...]      |        |        |        |            |
+--------+------------+------------+--------+--------+--------+------------+
I have a table which links a DeviceGUID with a DetailGUID, with the idea of having unlimited options for Details (just create a new Detail and it will be fetchable). However, this means I have a finite but unknown number of records per DeviceGUID.
What I want to show to my users is a table like this:
+--------+------------+---------+---------+---------+---------+------------------+
| GUID   | DeviceGUID | Detail1 | Detail2 | Detail4 | Detail7 | DateStored       |
+--------+------------+---------+---------+---------+---------+------------------+
| ENTRY1 | DEVICE1    | SN112   | [NULL]  | 1241    | GUID12  | [MAX DateStored] |
| ENTRY2 | DEVICE2    | SN111   | RND123  | 2351    | [NULL]  | [MAX DateStored] |
| ENTRY3 | DEVICE3    | SN100   |         |         |         |                  |
| [...]  | [...]      |         |         |         |         |                  |
+--------+------------+---------+---------+---------+---------+------------------+
I have been searching a bit and found the PIVOT option, but that only seems to work for one field.
Another option was CROSS APPLY, but that (seems to) need everything converted to the same datatype. As I hope is visible in the column names, I will have three types of data: string (VARCHAR) values, integer values, and GUID (uniqueidentifier) values, and they will not be interchangeable (the Detail with GUID Detail1 will always hold a VARCHAR, the Detail with DetailGUID Detail4 will always hold an integer).
What I was able to find out until now:
DECLARE @columns NVARCHAR(MAX), @sql NVARCHAR(MAX);
SET @columns = N'';
SELECT @columns += N', p.' + QUOTENAME([Name])
FROM
(
    SELECT GUID AS [Name]
    FROM [dbo].Details AS p
) AS x;
SET @sql = N'
SELECT [DeviceObjectGUID], ' + STUFF(@columns, 1, 2, '') + ' FROM (
    SELECT [DeviceObjectGUID], [DateStored], [DetailGUID] as [Name]
    FROM [dbo].[DeviceDetails]) AS j PIVOT (MAX(DateStored) FOR [Name] IN
    (' + STUFF(REPLACE(@columns, ', p.[', ',['), 1, 1, '') + ')) AS p;';
EXEC sp_executesql @sql
to build a dynamic PIVOT for transposing the data, but as mentioned this is limited to one column,
and
select DeviceObjectGUID, value
from DeviceDetails
cross apply
(
    select 'sValue', cast(sValue as varchar(MAX)) union all
    select 'gValue', cast(gValue as varchar(MAX)) union all
    select 'iValue', cast(iValue as varchar(MAX))
) c(col, value)
This, however, means converting all fields to VARCHAR.
One other option I tried (to understand PIVOT) was this:
SELECT *
FROM
(
    SELECT *
    FROM [dbo].[DeviceDetails]
) AS SourceTable
PIVOT
(
    MAX(sValue) FOR [DetailGUID] IN
    (
        [450533BB-43B2-499B-B2F7-094BFAE949B0],
        [7483E518-EB61-4B72-93F7-0F97BBFAFA01],
        [29B1BDE8-3AD4-4576-8B76-3CAE83E10B11],
        [5FC8CC76-12EB-4924-9320-5D09BBE97C10],
        [789AA79B-B1DF-4BA2-860A-7129B39D341F],
        [C90F4EFE-D848-4BAB-96BF-8DC6BF4F6E62],
        [EC6A4ED3-1475-4B0A-8E08-B2F4E095622F],
        [D442B7CA-5825-49D9-9977-D88770304B57],
        [99B70FEE-999B-4D44-83E9-EB8119B15930],
        [3F83ED76-8CC3-4B3D-936A-F528DEB6C045]
    )
) AS PivotTable;
(The GUIDs in the IN clause are the DetailGUIDs.)
This almost gets me what I want, except that it is not dynamic and it is still limited to one data column (MAX(sValue) in this case).
It should be as simple as this:
SELECT *
FROM
(
SELECT DeviceGUID
,DetailGUID
,CONCAT(sValue, iValue, gValue) as [value]
,DateStored
FROM my_table
) DS
PIVOT
(
MAX([value]) FOR DetailGUID IN (......)
) PVT
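Combining this CONCAT trick with the dynamic column list from the question gives a fully dynamic version. This is only a sketch under the question's assumed schema ([dbo].Details holding the detail GUIDs in a column named GUID, [dbo].[DeviceDetails] holding the values):
-- Build the pivot column list from dbo.Details (schema assumed from the question).
DECLARE @columns NVARCHAR(MAX) = N'', @sql NVARCHAR(MAX);

SELECT @columns += N', ' + QUOTENAME(CONVERT(NVARCHAR(36), GUID))
FROM [dbo].Details;

SET @sql = N'
SELECT DeviceGUID, ' + STUFF(@columns, 1, 2, '') + '
FROM (
    SELECT DeviceGUID,
           DetailGUID,
           CONCAT(sValue, iValue, gValue) AS [value]  -- CONCAT treats NULLs as empty strings
    FROM [dbo].[DeviceDetails]
) DS
PIVOT (MAX([value]) FOR DetailGUID IN (' + STUFF(@columns, 1, 2, '') + ')) PVT;';

EXEC sp_executesql @sql;
The result columns are all strings, but since the question guarantees each Detail always carries one datatype, each output column still holds a single kind of value.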

Change value in row based on a lookup from other rows

I have this data in a table (actually the output of a query):
+------------+------+---------+
| Connection | Pin  | Circuit |
+------------+------+---------+
| Value 1    | 1    | 33      |
| Value 1    | 2    | 1004    |
| Value 1    | 3    | 1015    |
| Value 1    | 4    |         |
| Value 2    | SP-A | 1003    |
| Value 2    | SP-A | 1004    |
| Value 2    | SP-A | 1005    |
| Value 2    | SP-B | 1014    |
| Value 2    | SP-B | 1015    |
| Value 2    | SP-B | 1016    |
+------------+------+---------+
I would like to use an SQL query to change it to this:
(changing the Pin based on a matching Circuit)
e.g.:
For each "SP-A", get the list of possible Circuits (1003, 1004, 1005)
Then look for a matching Circuit in another Connection (here this matches 1004, so we get Pin = 2)
Then replace the original value "SP-A" here with the match "2"
+------------+-----+---------+
| Connection | Pin | Circuit |
+------------+-----+---------+
| Value 1    | 1   | 33      |
| Value 1    | 2   | 1004    |
| Value 1    | 3   | 1015    |
| Value 1    | 4   |         |
| Value 2    | *2* | 1003    |
| Value 2    | *2* | *1004*  |
| Value 2    | *2* | 1005    |
| Value 2    | *3* | 1014    |
| Value 2    | *3* | *1015*  |
| Value 2    | *3* | 1016    |
+------------+-----+---------+
My SQL skills are lacking.
I'm doing this in MS-Access.
First of all, try this; if it works, apply it to your main data.
CREATE TABLE #TEMP
(
    Connection nvarchar(50),
    Pin nvarchar(50),
    Circuit nvarchar(50)
)

INSERT INTO #TEMP
SELECT Connection, Pin, Circuit FROM Table_1

UPDATE TU
SET Pin = (
    SELECT '*' + T1.Pin + '*'
    FROM Table_1 T1
    INNER JOIN Table_1 T2
        ON T1.Circuit = T2.Circuit
       AND T1.Connection <> T2.Connection
       AND T2.Pin = TU.Pin
)
FROM #TEMP TU
WHERE Connection = 'Value 2'

UPDATE TU
SET Circuit = '*' + T2.Circuit + '*'
FROM #TEMP TU
INNER JOIN #TEMP T2
    ON TU.Circuit = T2.Circuit
   AND TU.Connection <> T2.Connection
WHERE TU.Connection = 'Value 2'

SELECT * FROM #TEMP

DROP TABLE #TEMP
You can express the logic using a correlated subquery:
update t
set pin = (select top (1) t2.pin
from t as t2
where t2.circuit = t.circuit and
t2.connection <> t.connection
)
where pin in ('SP-A', 'SP-B');
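Note that both answers are written in SQL Server syntax (#TEMP tables, UPDATE ... FROM), while the question is about MS Access, where a subquery inside SET usually fails with "Operation must use an updateable query". A common Access workaround is the DLookUp function. A sketch under explicit assumptions: the table name MyTable is hypothetical, Circuit is stored as text, and (like the correlated subquery above) a row whose Circuit has no match in another Connection would get a Null Pin:
UPDATE MyTable AS t
SET t.Pin = DLookUp("Pin", "MyTable",
    "Circuit='" & t.Circuit & "' AND Connection<>'" & t.Connection & "'")
WHERE t.Pin IN ('SP-A', 'SP-B');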

SQL - Rows that are repetitive with a particular condition

We have a table like this:
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| ID | Name | RecievedService | FirstZoneTeeth | SecondZoneTeeth | ThirdZoneTeeth | FourthZoneTeeth |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 1 | John | SomeService1 | 13 | | 4 | |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 2 | John | SomeService1 | 34 | | | |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 3 | Steve | SomeService3 | | | | 2 |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 4 | Steve | SomeService4 | | | | 12 |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
Every digit in a zone column is a tooth (dental notation), so the two rows below mean "John" has got "SomeService1" twice for tooth #3.
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| ID | Name | RecievedService | FirstZoneTeeth | SecondZoneTeeth | ThirdZoneTeeth | FourthZoneTeeth |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| 1 | John | SomeService1 | 13 | | 4 | |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| 2 | John | SomeService1 | 34 | | | |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
Note that Steve has received services twice for tooth #2 (4th zone), but the services are not the same.
I wrote a query that returns duplicate rows (checking only the patient and the received service, using a GROUP BY clause), but I need to check the zones too.
I've tried this:
select ROW_NUMBER() over(order by vv.ID_sick) as RowNum,
bb.Radif,
bb.VCount as 'Count',
vv.ID_sick 'ID_Sick',
vv.ID_service 'ID_Service',
sick.FNamesick + ' ' + sick.LNamesick as 'Sick',
serv.NameService as 'Service',
vv.Mab_Service as 'MabService',
vv.Mab_daryafti as 'MabDaryafti',
vv.datevisit as 'DateVisit',
vv.Zone1,
vv.Zone2,
vv.Zone3,
vv.Zone4,
vv.ID_dentist as 'ID_Dentist',
dent.FNamedentist + ' ' + dent.LNamedentist as 'Dentist',
vv.id_do as 'ID_Do',
do.FNamedentist + ' ' + do.LNamedentist as 'Do'
from visiting vv inner join (
select ROW_NUMBER() OVER(ORDER BY a.ID_sick ASC) AS Radif,
count(a.ID_sick) as VCount,
a.ID_sick,
a.ID_service
from visiting a
group by a.ID_sick, a.ID_service, a.Zone1, a.Zone2, a.Zone3, a.Zone4
having count(a.ID_sick)>1)bb
on vv.ID_sick = bb.ID_sick and vv.ID_service = bb.ID_service
left join InfoSick sick on vv.ID_sick = sick.IDsick
left join infoService serv on vv.ID_service = serv.IDService
left join Infodentist dent on vv.ID_dentist = dent.IDdentist
left join infodentist do on vv.id_do = do.IDdentist
order by bb.ID_sick, bb.ID_service,vv.datevisit
But this code only returns rows where all teeth repeat. I want rows where even one tooth repeats.
How can I implement that?
I need to compare the individual characters in the zones.
(The zone columns' datatype is varchar.)
This is a bad datamodel for what you are trying to do. By storing the teeth as a varchar, you have in effect decided that you are not interested in single teeth, but only in groups of teeth. Now, however, you are trying to investigate single teeth.
You'd want a datamodel like this:
service
+------------+-------+-----------------+
| service_id | Name  | RecievedService |
+------------+-------+-----------------+
| 1          | John  | SomeService1    |
| 3          | Steve | SomeService3    |
| 4          | Steve | SomeService4    |
+------------+-------+-----------------+
service_detail
+------------+------+-------+
| service_id | zone | tooth |
+------------+------+-------+
| 1          | 1    | 1     |
| 1          | 1    | 3     |
| 1          | 3    | 4     |
+------------+------+-------+
| 1          | 1    | 3     |
| 1          | 1    | 4     |
+------------+------+-------+
| 3          | 4    | 2     |
+------------+------+-------+
| 4          | 4    | 1     |
| 4          | 4    | 2     |
+------------+------+-------+
What you can do with the given datamodel is to create such a table on the fly, using a recursive query and string manipulation:
with unpivoted(service_id, name, zone, teeth) as
(
select recievedservice, name, 1, firstzoneteeth
from mytable where len(firstzoneteeth) > 0
union all
select recievedservice, name, 2, secondzoneteeth
from mytable where len(secondzoneteeth) > 0
union all
select recievedservice, name, 3, thirdzoneteeth
from mytable where len(thirdzoneteeth) > 0
union all
select recievedservice, name, 4, fourthzoneteeth
from mytable where len(fourthzoneteeth) > 0
)
, service_details(service_id, name, zone, tooth, teeth) as
(
select
service_id, name, zone, substring(teeth, 1, 1), substring(teeth, 2, 10000)
from unpivoted
union all
select
service_id, name, zone, substring(teeth, 1, 1), substring(teeth, 2, 10000)
from service_details
where len(teeth) > 0
)
, duplicates(service_id, name) as
(
select distinct service_id, name
from service_details
group by service_id, name, zone, tooth
having count(*) > 1
)
select m.*
from mytable m
join duplicates d on d.service_id = m.recievedservice and d.name = m.name;
A lot of work and a rather slow query due to a bad datamodel, but still feasible.
Rextester demo: http://rextester.com/JVWK49901
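If the schema can be changed, the suggested model could be created along these lines (a sketch; table and column names follow the answer's example, and the types are assumed):
CREATE TABLE service (
    service_id      int         NOT NULL PRIMARY KEY,
    Name            varchar(50) NOT NULL,
    RecievedService varchar(50) NOT NULL
);

CREATE TABLE service_detail (
    service_id int NOT NULL REFERENCES service (service_id),
    zone       int NOT NULL,
    tooth      int NOT NULL
);

-- Finding repeated teeth then becomes a plain GROUP BY:
SELECT s.Name, s.RecievedService, d.zone, d.tooth
FROM service s
JOIN service_detail d ON d.service_id = s.service_id
GROUP BY s.Name, s.RecievedService, d.zone, d.tooth
HAVING COUNT(*) > 1;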

Combine two tables - sql server

I'm trying to combine 2 tables in SQL Server
Table 1: SO
ItemCode     | SONumber | SODate     | SOQTY
---------------------------------------------
TBJ182-01-02 | 0005251  | 29/01/2014 | 5
TBJ184-01-02 | 0005251  | 29/01/2014 | 2
TBJ182-01-02 | 0005554  | 15/02/2014 | 4
TBJ185-01-02 | 0005554  | 15/02/2014 | 5
Table 2: PO
ItemCode     | PONumber | PODate     | POQTY
---------------------------------------------
TBJ182-01-02 | 0009105  | 11/02/2014 | 8
TBJ184-01-02 | 0009208  | 14/02/2014 | 5
TBJ189-01-02 | 0009208  | 14/02/2014 | 5
Result table:
ItemCode     | SONumber | SODate     | SOQTY | PONumber | PODate     | POQTY
------------------------------------------------------------------------------
TBJ182-01-02 | 0005251  | 29/01/2014 | 5     |          |            |
TBJ184-01-02 | 0005251  | 29/01/2014 | 2     |          |            |
TBJ182-01-02 | 0005554  | 15/02/2014 | 4     |          |            |
TBJ185-01-02 | 0005554  | 15/02/2014 | 5     |          |            |
TBJ182-01-02 |          |            |       | 0009105  | 11/02/2014 | 8
TBJ184-01-02 |          |            |       | 0009208  | 14/02/2014 | 5
TBJ189-01-02 |          |            |       | 0009208  | 14/02/2014 | 5
Could you help?
You can do this most easily with a full outer join and a little trick: the join condition 1 = 0 never matches, so every row from each side comes through exactly once, padded with NULLs from the other side:
select coalesce(SO.ItemCode, PO.ItemCode) as ItemCode,
       SO.SONumber, SO.SODate, SO.SOQTY,
       PO.PONumber, PO.PODate, PO.POQTY
from SO full outer join
     PO
     on 1 = 0;
Much the same as above; I have only removed OUTER and used ISNULL in place of COALESCE:
DECLARE #Tbl TABLE (
ITEMCode VARCHAR(100),
SONUMBER VARCHAR(100),
SoQTY INT
)
INSERT INTO #Tbl VALUES ('TBJ182-01-02','0005251',1)
INSERT INTO #Tbl VALUES ('TBJ184-01-02', '0005251', 2)
INSERT INTO #Tbl VALUES ('TBJ182-01-02', '0005554',4)
INSERT INTO #Tbl VALUES ('TBJ182-01-02', '0005554',6)
DECLARE #Tbl1 TABLE (
ITEMCode VARCHAR(100),
PONUMBER VARCHAR(100),
POQTY INT
)
INSERT INTO #Tbl1 VALUES ('TBJ182-01-02','0005251',1)
INSERT INTO #Tbl1 VALUES ('TBJ184-01-02', '0005251', 2)
INSERT INTO #Tbl1 VALUES ('TBJ182-01-02', '0005554',4)
INSERT INTO #Tbl1 VALUES ('TBJ182-01-02', '0005554',6)
select ISNULL(t.ITEMCode,tt.ITEMCode),t.SONUMBER,t.SoQTY,tt.PONUMBER,tt.POQTY from #Tbl t
FULL JOIN #Tbl1 tt
ON 1 = 0
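For comparison, the ON 1 = 0 trick is equivalent to a plain UNION ALL that pads each side with NULLs, which some readers may find clearer (a sketch using the question's SO and PO tables):
SELECT ItemCode, SONumber, SODate, SOQTY,
       NULL AS PONumber, NULL AS PODate, NULL AS POQTY
FROM SO
UNION ALL
SELECT ItemCode, NULL, NULL, NULL,
       PONumber, PODate, POQTY
FROM PO;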

Optimal query to fetch a cumulative sum in MySQL

What is the 'correct' query to fetch a cumulative sum in MySQL?
I have a table where I keep information about files; one column contains the size of each file in bytes (the actual files are kept on disk somewhere).
I would like to get the cumulative file size like this:
+------------+---------+--------+----------------+
| fileInfoId | groupId | size | cumulativeSize |
+------------+---------+--------+----------------+
| 1 | 1 | 522120 | 522120 |
| 2 | 2 | 316042 | 316042 |
| 4 | 2 | 711084 | 1027126 |
| 5 | 2 | 697002 | 1724128 |
| 6 | 2 | 663425 | 2387553 |
| 7 | 2 | 739553 | 3127106 |
| 8 | 2 | 700938 | 3828044 |
| 9 | 2 | 695614 | 4523658 |
| 10 | 2 | 744204 | 5267862 |
| 11 | 2 | 609022 | 5876884 |
| ... | ... | ... | ... |
+------------+---------+--------+----------------+
20000 rows in set (19.2161 sec.)
Right now, I use the following query to get the above results
SELECT
a.fileInfoId
, a.groupId
, a.size
, SUM(b.size) AS cumulativeSize
FROM fileInfo AS a
LEFT JOIN fileInfo AS b USING(groupId)
WHERE a.fileInfoId >= b.fileInfoId
GROUP BY a.fileInfoId
ORDER BY a.groupId, a.fileInfoId
My solution is, however, extremely slow (around 19 seconds without cache).
Explain gives the following execution details
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
| 1 | SIMPLE | a | index | PRIMARY,foreignId | PRIMARY | 4 | NULL | 14905 | |
| 1 | SIMPLE | b | ref | PRIMARY,foreignId | foreignId | 4 | db.a.foreignId | 36 | Using where |
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
My question is:
How can I optimize the above query?
Update
I've updated the question to provide the table structure and a procedure to fill the table with 20,000 records of test data.
CREATE TABLE `fileInfo` (
`fileInfoId` int(10) unsigned NOT NULL AUTO_INCREMENT
, `groupId` int(10) unsigned NOT NULL
, `name` varchar(128) NOT NULL
, `size` int(10) unsigned NOT NULL
, PRIMARY KEY (`fileInfoId`)
, KEY `groupId` (`groupId`)
) ENGINE=InnoDB;
delimiter $$
DROP PROCEDURE IF EXISTS autofill$$
CREATE PROCEDURE autofill()
BEGIN
DECLARE i INT DEFAULT 0;
DECLARE gid INT DEFAULT 0;
DECLARE nam char(20);
DECLARE siz INT DEFAULT 0;
WHILE i < 20000 DO
SET gid = FLOOR(RAND() * 250);
SET nam = CONV(FLOOR(RAND() * 10000000000000), 20, 36);
SET siz = FLOOR((RAND() * 1024 * 1024));
INSERT INTO `fileInfo` (`groupId`, `name`, `size`) VALUES(gid, nam, siz);
SET i = i + 1;
END WHILE;
END;$$
delimiter ;
CALL autofill();
About the possible duplicate question
The question linked by Forgotten Semicolon is not the same question. My question has an extra column. Because of this extra groupId column, the accepted answer there does not work for my problem. (Maybe it can be adapted, but I don't know how, hence my question.)
You could use a variable - it's far quicker than any join:
SELECT
    id,
    size,
    @total := @total + size AS cumulativeSize
FROM table1, (SELECT @total := 0) AS t;
Here's a quick test case on a Pentium III with 128MB RAM running Debian 5.0:
Create the table:
DROP TABLE IF EXISTS `table1`;
CREATE TABLE `table1` (
`id` int(11) NOT NULL auto_increment,
`size` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Fill with 20,000 random numbers:
DELIMITER //
DROP PROCEDURE IF EXISTS autofill//
CREATE PROCEDURE autofill()
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < 20000 DO
INSERT INTO table1 (size) VALUES (FLOOR((RAND() * 1000)));
SET i = i + 1;
END WHILE;
END;
//
DELIMITER ;
CALL autofill();
Check the row count:
SELECT COUNT(*) FROM table1;
+----------+
| COUNT(*) |
+----------+
| 20000 |
+----------+
Run the cumulative total query:
SELECT
    id,
    size,
    @total := @total + size AS cumulativeSize
FROM table1, (SELECT @total := 0) AS t;
+-------+------+----------------+
| id | size | cumulativeSize |
+-------+------+----------------+
| 1 | 226 | 226 |
| 2 | 869 | 1095 |
| 3 | 668 | 1763 |
| 4 | 733 | 2496 |
...
| 19997 | 966 | 10004741 |
| 19998 | 522 | 10005263 |
| 19999 | 713 | 10005976 |
| 20000 | 0 | 10005976 |
+-------+------+----------------+
20000 rows in set (0.07 sec)
UPDATE
I'd missed the grouping by groupId in the original question, and that certainly made things a bit trickier. I then wrote a solution which used a temporary table, but I didn't like it—it was messy and overly complicated. I went away and did some more research, and have come up with something far simpler and faster.
I can't claim all the credit for this—in fact, I can barely claim any at all, as it is just a modified version of Emulate row number from Common MySQL Queries.
It's beautifully simple, elegant, and very quick:
SELECT fileInfoId, groupId, name, size, cumulativeSize
FROM (
    SELECT
        fileInfoId,
        groupId,
        name,
        size,
        @cs := IF(@prev_groupId = groupId, @cs + size, size) AS cumulativeSize,
        @prev_groupId := groupId AS prev_groupId
    FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
    ORDER BY groupId
) AS tmp;
You can remove the outer SELECT ... AS tmp if you don't mind the prev_groupId column being returned. I found that it ran marginally faster without it.
Here's a simple test case:
INSERT INTO `fileInfo` VALUES
( 1, 3, 'name0', '10'),
( 5, 3, 'name1', '10'),
( 7, 3, 'name2', '10'),
( 8, 1, 'name3', '10'),
( 9, 1, 'name4', '10'),
(10, 2, 'name5', '10'),
(12, 4, 'name6', '10'),
(20, 4, 'name7', '10'),
(21, 4, 'name8', '10'),
(25, 5, 'name9', '10');
SELECT fileInfoId, groupId, name, size, cumulativeSize
FROM (
    SELECT
        fileInfoId,
        groupId,
        name,
        size,
        @cs := IF(@prev_groupId = groupId, @cs + size, size) AS cumulativeSize,
        @prev_groupId := groupId AS prev_groupId
    FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
    ORDER BY groupId
) AS tmp;
+------------+---------+-------+------+----------------+
| fileInfoId | groupId | name | size | cumulativeSize |
+------------+---------+-------+------+----------------+
| 8 | 1 | name3 | 10 | 10 |
| 9 | 1 | name4 | 10 | 20 |
| 10 | 2 | name5 | 10 | 10 |
| 1 | 3 | name0 | 10 | 10 |
| 5 | 3 | name1 | 10 | 20 |
| 7 | 3 | name2 | 10 | 30 |
| 12 | 4 | name6 | 10 | 10 |
| 20 | 4 | name7 | 10 | 20 |
| 21 | 4 | name8 | 10 | 30 |
| 25 | 5 | name9 | 10 | 10 |
+------------+---------+-------+------+----------------+
Here's a sample of the last few rows from a 20,000 row table:
| 19481 | 248 | 8CSLJX22RCO | 1037469 | 51270389 |
| 19486 | 248 | 1IYGJ1UVCQE | 937150 | 52207539 |
| 19817 | 248 | 3FBU3EUSE1G | 616614 | 52824153 |
| 19871 | 248 | 4N19QB7PYT | 153031 | 52977184 |
| 132 | 249 | 3NP9UGMTRTD | 828073 | 828073 |
| 275 | 249 | 86RJM39K72K | 860323 | 1688396 |
| 802 | 249 | 16Z9XADLBFI | 623030 | 2311426 |
...
| 19661 | 249 | ADZXKQUI0O3 | 837213 | 39856277 |
| 19870 | 249 | 9AVRTI3QK6I | 331342 | 40187619 |
| 19972 | 249 | 1MTAEE3LLEM | 1027714 | 41215333 |
+------------+---------+-------------+---------+----------------+
20000 rows in set (0.31 sec)
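(Aside: on MySQL 8.0+, which postdates this answer, a window function expresses the same per-group running total directly. A minimal sketch:
SELECT fileInfoId, groupId, size,
       SUM(size) OVER (PARTITION BY groupId ORDER BY fileInfoId) AS cumulativeSize
FROM fileInfo
ORDER BY groupId, fileInfoId;
)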
I think that MySQL is only using one of the indexes on the table. In this case, it's choosing the index on foreignId.
Add a covering compound index that includes both primaryId and foreignId.
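A sketch of that index, mapped onto the fileInfo schema shown in the question (the index name is made up, and size is included on the assumption that a covering index should also satisfy SUM(b.size) without touching table rows):
ALTER TABLE fileInfo
    ADD INDEX idx_group_file_size (groupId, fileInfoId, size);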