Import JSON File into SQL Server Table with nested Arrays - sql

I'm trying to import the Census Block GeoJSON file and am unable to get the "Coordinates" for each block along with its properties. I'm trying to get the ID, BlockGrp, Block, etc. and their associated coordinates. Below is my code, but I'm unable to get the coordinates since they are nested in an array. Can anybody guide me on how to make it work?
Declare @GeoJSON varchar(max)
SELECT @GeoJSON = BulkColumn
FROM OPENROWSET (BULK 'C:\Temp\Census_Blocks__2010.geojson', SINGLE_CLOB) as j
SELECT *
FROM OPENJSON (@GeoJSON,'$.features')
WITH
(
OBJECTID INT N'$.properties.OBJECTID'
, BLKGRP NVARCHAR(10) N'$.properties.BLKGRP'
, BLOCK INT N'$.properties.BLOCK'
, GEOID NVARCHAR(100) N'$.properties.GEOID'
, GEOID10 NVARCHAR(100) N'$.properties.GEOID10'
, ACRES nvarchar(100) N'$.properties.ACRES'
, Shape_Length nvarchar(100) N'$.properties.Shape_length'
, Shape_Area nvarchar(100) N'$.properties.Shape_Area'
, SQMILES nvarchar(100) N'$.properties.SQMILES'
, Longitude nvarchar(100) N'$.geometry.coordinates[0]'
, Latitude nvarchar(100) N'$.geometry.coordinates[1]'
) a
The output is as follows:
+----------+---------+-------+-----------------+-----------------+-------------+--------------------+--------------------+------------+-----------+----------+
| OBJECTID | BLKGRP | BLOCK | GEOID | GEOID10 | ACRES | Shape_Length | Shape_Area | SQMILES | Longitude | Latitude |
+----------+---------+-------+-----------------+-----------------+-------------+--------------------+--------------------+------------+-----------+----------+
| 1 | 0005011 | 1004 | 110010005011004 | 110010005011004 | 92.90825947 | 3646.7801257671467 | 375986.38657525991 | 0.14516916 | NULL | NULL |
| 2 | 0005011 | 1005 | 110010005011005 | 110010005011005 | 4.22602654 | 600.80242048281752 | 17102.122624542077 | 0.00660317 | NULL | NULL |
| 3 | 0005011 | 1006 | 110010005011006 | 110010005011006 | 3.37694114 | 567.78401560218686 | 13665.995959875707 | 0.00527647 | NULL | NULL |
| 4 | 0005011 | 1007 | 110010005011007 | 110010005011007 | 6.2465494 | 784.3194030589018 | 25278.888549948519 | 0.00976023 | NULL | NULL |
| 5 | 0005011 | 1008 | 110010005011008 | 110010005011008 | 0.45035641 | 233.98753402256077 | 1822.5277124594836 | 0.00070368 | NULL | NULL |
| 6 | 0005011 | 1009 | 110010005011009 | 110010005011009 | 2.54391236 | 523.98099364773702 | 10294.848087676977 | 0.00397486 | NULL | NULL |
| 7 | 0005011 | 1010 | 110010005011010 | 110010005011010 | 3.65630529 | 511.54127551683035 | 14796.542550295248 | 0.00571298 | NULL | NULL |
| 8 | 0005011 | 1011 | 110010005011011 | 110010005011011 | 5.64727404 | 689.75830443180621 | 22853.707228554606 | 0.00882387 | NULL | NULL |
| 9 | 0005011 | 1012 | 110010005011012 | 110010005011012 | 7.38896984 | 856.70248366785154 | 29902.100049688841 | 0.01154527 | NULL | NULL |
| 10 | 0005011 | 1013 | 110010005011013 | 110010005011013 | 2.45065536 | 590.21583640085453 | 9917.4503661506897 | 0.00382915 | NULL | NULL |
+----------+---------+-------+-----------------+-----------------+-------------+--------------------+--------------------+------------+-----------+----------+
The GeoJSON file structure is as follows:
{
"type":"FeatureCollection",
"features":[
{
"type":"Feature",
"geometry":{
"type":"Polygon",
"coordinates":[
]
},
"properties":{
}
}
]
}
The GEOJSON file is available here.

I think you are on the right track; you just need to perform a few additional steps:
retrieve the content of the coordinates property using the AS JSON syntax
add two additional OPENJSON calls, via CROSS APPLY, to shred your array down to the [Longitude, Latitude] level
retrieve the Longitude and Latitude values with JSON_VALUE in your SELECT statement.
This is a sample query that should extract what you need:
SELECT
a.OBJECTID
, a.BLKGRP
, a.BLOCK
, a.GEOID
, a.GEOID10
, a.ACRES
, a.Shape_Length
, a.Shape_Area
, a.SQMILES
, JSON_VALUE(array2,'$[0]') as Longitude
, JSON_VALUE(array2,'$[1]') as Latitude
FROM OPENJSON (@GeoJSON,'$.features')
WITH
(
OBJECTID INT N'$.properties.OBJECTID'
, BLKGRP NVARCHAR(10) N'$.properties.BLKGRP'
, BLOCK INT N'$.properties.BLOCK'
, GEOID NVARCHAR(100) N'$.properties.GEOID'
, GEOID10 NVARCHAR(100) N'$.properties.GEOID10'
, ACRES nvarchar(100) N'$.properties.ACRES'
, Shape_Length nvarchar(100) N'$.properties.Shape_length'
, Shape_Area nvarchar(100) N'$.properties.Shape_Area'
, SQMILES nvarchar(100) N'$.properties.SQMILES'
, coordinates NVARCHAR(MAX) N'$.geometry.coordinates' AS JSON
) a
CROSS APPLY OPENJSON(coordinates) WITH (array nvarchar(max) N'$' as json) b
CROSS APPLY OPENJSON(array) WITH (array2 nvarchar(max) N'$' as json) c
Sample output:
You can see this code in action on a subset of your data in this fiddle.
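If the end goal is to land the flattened rows in a table (as the title suggests), the same query can be wrapped in a SELECT ... INTO. This is only a sketch: the target table name dbo.CensusBlockPoints and the float casts are my additions, not part of the original question or answer, and only a few columns are kept for brevity:
Declare @GeoJSON varchar(max)
SELECT @GeoJSON = BulkColumn
FROM OPENROWSET (BULK 'C:\Temp\Census_Blocks__2010.geojson', SINGLE_CLOB) as j

-- Persist the flattened points; SELECT ... INTO creates dbo.CensusBlockPoints on the fly.
SELECT
  a.OBJECTID
, a.GEOID
, CAST(JSON_VALUE(array2,'$[0]') AS float) AS Longitude
, CAST(JSON_VALUE(array2,'$[1]') AS float) AS Latitude
INTO dbo.CensusBlockPoints
FROM OPENJSON (@GeoJSON,'$.features')
WITH
(
  OBJECTID INT N'$.properties.OBJECTID'
, GEOID NVARCHAR(100) N'$.properties.GEOID'
, coordinates NVARCHAR(MAX) N'$.geometry.coordinates' AS JSON
) a
CROSS APPLY OPENJSON(coordinates) WITH (array nvarchar(max) N'$' as json) b
CROSS APPLY OPENJSON(array) WITH (array2 nvarchar(max) N'$' as json) c
Note that a Polygon feature produces one row per coordinate pair, so each block will appear many times in the target table.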

Related

TSQL - Convert Rows per record to Columns

Consider the following table:
+------------------------------------------------------------------------------+
| GUID | DeviceGUID | DetailGUID | sValue | iValue | gValue | DateStored |
| ENTRY1 | DEVICE1 | Detail1 | SN112 | | | 01/01/2020 |
| ENTRY2 | DEVICE1 | Detail4 | | 1241 | | 01/01/2020 |
| ENTRY3 | DEVICE1 | Detail7 | | | GUID12 | 01/01/2020 |
| ENTRY4 | DEVICE2 | Detail1 | SN111 | | | 01/01/2020 |
| ENTRY5 | DEVICE2 | Detail2 | RND123 | | | 01/01/2020 |
| ENRTY6 | DEVICE2 | Detail4 | | 2351 | | 03/01/2020 |
| ENTRY7 | DEVICE3 | Detail1 | SN100 | | | 02/01/2020 |
| [...] | [...] | [...] | | | | |
| | | | | | | |
+------------------------------------------------------------------------------+
I have a table which links a DeviceGUID with a DetailGUID, with the idea of having unlimited options for Details (just create a new detail and it will be fetchable). However, this means I have a finite but unknown number of records per DeviceGUID.
What I want to show to my users is a table like this:
+--------+---------------------------------------------------------------------+
| GUID | DeviceGUID |Detail1 |Detail2 |Detail4 |Detail7 |DateStored |
| ENTRY1 | DEVICE1 |SN112 | [NULL] |1241 |GUID12 | [MAX DateStored] |
| ENTRY2 | DEVICE2 |SN111 | RND123 |2351 | [NULL] | [MAX DateStored] |
| ENTRY3 | DEVICE3 |SN100 | | | | |
| [...] | [...] | | | | | |
+------------------------------------------------------------------------------+
I have been searching a bit and found the PIVOT option, but that only seems to work for one field.
Another option was CROSS APPLY, but that (seems to) need everything converted to the same datatype; as I hope is visible in the column names, I will have 3 types of data: a string (VARCHAR) value, an integer value, and a GUID (uniqueidentifier) value, and they are not interchangeable (meaning the Detail with DetailGUID Detail1 will always hold a VARCHAR, and the Detail with DetailGUID Detail4 will always hold an integer).
what I was able to find out until now:
DECLARE @columns NVARCHAR(MAX), @sql NVARCHAR(MAX);
SET @columns = N'';
SELECT @columns+=N', p.'+QUOTENAME([Name])
FROM
(
SELECT GUID AS [Name]
FROM [dbo].Details AS p
) AS x;
SET @sql = N'
SELECT [DeviceObjectGUID], '+STUFF(@columns, 1, 2, '')+' FROM (
SELECT [DeviceObjectGUID], [DateStored], [DetailGUID] as [Name]
FROM [dbo].[DeviceDetails]) AS j PIVOT (MAX(DateStored) FOR [Name] in
('+STUFF(REPLACE(@columns, ', p.[', ',['), 1, 1, '')+')) AS p;';
EXEC sp_executesql @sql
to make a dynamic PIVOT for transposing the data, but as mentioned this is limited to one column,
and
select DeviceObjectGUID, value
from DeviceDetails
cross apply
(
select 'sValue', cast(sValue as varchar(MAX)) union all
select 'gValue', cast(gValue as varchar(MAX))union all
select 'iValue', cast(iValue as varchar(MAX))
) c(col, value)
This has me converting all fields to VARCHAR.
One other option I tried (to understand PIVOT) was this:
SELECT *
FROM
(SELECT *
FROM [dbo].[DeviceDetails]
) AS SourceTable
PIVOT(
max(sValue)FOR [DetailGUID] IN(
[450533BB-43B2-499B-B2F7-094BFAE949B0],
[7483E518-EB61-4B72-93F7-0F97BBFAFA01],
[29B1BDE8-3AD4-4576-8B76-3CAE83E10B11],
[5FC8CC76-12EB-4924-9320-5D09BBE97C10],
[789AA79B-B1DF-4BA2-860A-7129B39D341F],
[C90F4EFE-D848-4BAB-96BF-8DC6BF4F6E62],
[EC6A4ED3-1475-4B0A-8E08-B2F4E095622F],
[D442B7CA-5825-49D9-9977-D88770304B57],
[99B70FEE-999B-4D44-83E9-EB8119B15930],
[3F83ED76-8CC3-4B3D-936A-F528DEB6C045]
)
) AS PivotTable;
(The GUIDs in the IN clause are the DetailGUIDs.)
This almost gets me what I want, except that it is not dynamic and it is still limited to one data column (max(sValue) in this case).
===================
in response to
It should be as simple as this:
SELECT *
FROM
(
SELECT DeviceGUID
,DetailGUID
,CONCAT(sValue, iValue, gValue) as [value]
,DateStored
FROM my_table
) DS
PIVOT
(
MAX([value]) FOR DetailGUID IN (......)
) PVT
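To fill in the IN (...) list dynamically, the CONCAT trick above can be combined with the dynamic column list from the question. This is only a sketch under the assumption that the table is dbo.DeviceDetails with columns DeviceGUID, DetailGUID, sValue, iValue and gValue (names taken from the question); the DateStored column is left out for brevity:
DECLARE @columns NVARCHAR(MAX) = N'', @sql NVARCHAR(MAX);

-- Build the pivot column list from the DetailGUIDs that actually occur.
SELECT @columns += N', ' + QUOTENAME(CONVERT(NVARCHAR(36), DetailGUID))
FROM (SELECT DISTINCT DetailGUID FROM dbo.DeviceDetails) d;

SET @sql = N'
SELECT DeviceGUID, ' + STUFF(@columns, 1, 2, '') + N'
FROM
(
    SELECT DeviceGUID
          ,DetailGUID
          ,CONCAT(sValue, iValue, gValue) AS [value]
    FROM dbo.DeviceDetails
) DS
PIVOT
(
    MAX([value]) FOR DetailGUID IN (' + STUFF(@columns, 1, 2, '') + N')
) PVT;';

EXEC sp_executesql @sql;
As in the answer above, CONCAT collapses the three typed columns into one string per row, so MAX just picks that single value per cell; everything comes back as a string, which seems unavoidable when mixing VARCHAR, INT and UNIQUEIDENTIFIER values in one pivoted grid.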

How to fetch '/'-separated node/tag names from a given XML in SQL

I want to fetch '/'-separated node names from a given XML, such that only the node/tag names are fetched rather than the node/tag values.
Suppose I have the XML below:
<ns:manageWorkItemRequest>
<ns:wiFocus>
<act:orderDate>2020-03-16T10:30:56.000Z</act:orderDate>
<act:orderItem>
<agr:instance>
<spec1:customerServiceIdentifier>ETHA15302121</spec1:customerServiceIdentifier>
<spec1:instanceCharacteristic>
<spec1:action>
<spec1:code>Modify</spec1:code>
</spec1:action>
<spec1:instanceIdentifier>
<spec1:value>OS014-AHEFV5T9</spec1:value>
</spec1:instanceIdentifier>
</agr:instance>
</act:orderItem>
<act:orderVersion>1</act:orderVersion>
</ns:wiFocus>
<ns:wiAction>Create</ns:wiAction>
<ns:wiVersion>1</ns:wiVersion>
</ns:manageWorkItemRequest>
I want result as :
ns:manageWorkItemRequest/ns:wiFocus/act:orderItem/agr:instance/spec1:customerServiceIdentifier/ETHA15302121
Actually, the requirement is: if I find the value "ETHA15302121" in the above XML, I should show the path, i.e. where exactly in the XML that value sits, in '/'-separated format.
Your XML was not well-formed (missing closing tag in the middle and missing namespace declarations).
After adding the missing parts it looks as below, and you might try something along this route (warning: this won't be fast...):
Your XML
DECLARE @xml XML=
N'<root xmlns:ns="dummy1" xmlns:act="dummy2" xmlns:agr="dummy3" xmlns:spec1="dummy4">
<ns:manageWorkItemRequest>
<ns:wiFocus>
<act:orderDate>2020-03-16T10:30:56.000Z</act:orderDate>
<act:orderItem>
<agr:instance>
<spec1:customerServiceIdentifier>ETHA15302121</spec1:customerServiceIdentifier>
<spec1:instanceCharacteristic>
<spec1:action>
<spec1:code>Modify</spec1:code>
</spec1:action>
<spec1:instanceIdentifier>
<spec1:value>OS014-AHEFV5T9</spec1:value>
</spec1:instanceIdentifier>
</spec1:instanceCharacteristic>
</agr:instance>
</act:orderItem>
<act:orderVersion>1</act:orderVersion>
</ns:wiFocus>
<ns:wiAction>Create</ns:wiAction>
<ns:wiVersion>1</ns:wiVersion>
</ns:manageWorkItemRequest>
</root>';
--the query
WITH AllNamespaces As
(
SELECT CONCAT('ns',ROW_NUMBER() OVER(ORDER BY (B.namespaceUri))) Prefix
,B.namespaceUri
FROM @xml.nodes('//*') A(nd)
CROSS APPLY(VALUES(A.nd.value('namespace-uri(.)','nvarchar(max)')))B(namespaceUri)
WHERE LEN(B.namespaceUri)>0
GROUP BY B.namespaceUri
)
,recCte AS
(
SELECT 1 AS NestLevel
,ROW_NUMBER() OVER(ORDER BY A.nd) AS ElementPosition
,CAST(REPLACE(STR(ROW_NUMBER() OVER(ORDER BY A.nd),5),' ','0') AS VARCHAR(900)) COLLATE DATABASE_DEFAULT AS SortString
,CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS FullName
,CAST(CONCAT('/',ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS NVARCHAR(MAX)) COLLATE DATABASE_DEFAULT AS XPath
,A.nd.value('text()[1]','nvarchar(max)') AS NodeValue
,A.nd.query('./*') NextFragment
FROM @xml.nodes('/*') A(nd)
LEFT JOIN AllNamespaces ns ON ns.namespaceUri=A.nd.value('namespace-uri(.)','nvarchar(max)')
UNION ALL
SELECT r.NestLevel+1
,ROW_NUMBER() OVER(ORDER BY A.nd)
,CAST(CONCAT(r.SortString,REPLACE(STR(ROW_NUMBER() OVER(ORDER BY A.nd),5),' ','0')) AS VARCHAR(900)) COLLATE DATABASE_DEFAULT
,CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS FullName
,CONCAT(r.XPath,'/',ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS XPath
,A.nd.value('text()[1]','nvarchar(max)') AS NodeValue
,A.nd.query('./*') NextFragment
FROM recCte r
CROSS APPLY NextFragment.nodes('*') A(nd)
OUTER APPLY(SELECT Prefix FROM AllNamespaces ns WHERE ns.namespaceUri=A.nd.value('namespace-uri(.)','nvarchar(max)')) ns
)
SELECT XPath
,NodeValue
,NestLevel
,ElementPosition
,SortString
FROM recCte
--WHERE NodeValue IS NOT NULL
ORDER BY SortString;
--The result
/*
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| XPath | NodeValue | NestLevel | ElementPosition | SortString |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderDate[1] | 2020-03-16T10:30:56.000Z | 4 | 1 | 00001000010000100001 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderItem[1]/ns3:instance[1]/ns4:customerServiceIdentifier[1] | ETHA15302121 | 6 | 1 | 000010000100001000020000100001 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderItem[1]/ns3:instance[1]/ns4:instanceCharacteristic[1]/ns4:action[1]/ns4:code[1] | Modify | 8 | 1 | 0000100001000010000200001000020000100001 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderItem[1]/ns3:instance[1]/ns4:instanceCharacteristic[1]/ns4:instanceIdentifier[1]/ns4:value[1] | OS014-AHEFV5T9 | 8 | 1 | 0000100001000010000200001000020000200001 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderVersion[1] | 1 | 4 | 3 | 00001000010000100003 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiAction[1] | Create | 3 | 2 | 000010000100002 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiVersion[1] | 1 | 3 | 3 | 000010000100003 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
*/
--just to show, that the created XPath is working as expected:
WITH XMLNAMESPACES('dummy1' AS ns1,'dummy2' AS ns2,'dummy3' AS ns3,'dummy4' AS ns4,'dummy5' AS ns5)
SELECT #xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderDate[1]','nvarchar(max)')
,@xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderItem[1]/ns3:instance[1]/ns4:customerServiceIdentifier[1]','nvarchar(max)')
,@xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderItem[1]/ns3:instance[1]/ns4:instanceCharacteristic[1]/ns4:action[1]/ns4:code[1]','nvarchar(max)')
,@xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderItem[1]/ns3:instance[1]/ns4:instanceCharacteristic[1]/ns4:instanceIdentifier[1]/ns4:value[1]','nvarchar(max)')
,@xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderVersion[1]','nvarchar(max)')
,@xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiAction[1]','nvarchar(max)')
,@xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiVersion[1]','nvarchar(max)');
The idea in short:
You can choose the namespace prefixes yourself; only the underlying URI matters.
The first CTE builds a set of all occurring namespace URIs and returns each one together with a prefix.
The recursive CTE traverses deeper and deeper into the XML. This continues as long as APPLY with .nodes() can return nested nodes.
The full name is concatenated, as is the full XPath.
The CASTs and COLLATEs help to avoid data type mismatches (recursive CTEs are very picky about this).
The concatenated SortString is needed to ensure the output retains the original document order.
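Since the actual requirement is to find where a given value such as ETHA15302121 sits, the final SELECT of the query above can simply be filtered on NodeValue. A sketch that assumes the AllNamespaces and recCte CTEs exactly as defined above:
--assumes the WITH AllNamespaces ... , recCte AS (...) definitions from the query above
SELECT XPath, NodeValue
FROM recCte
WHERE NodeValue = N'ETHA15302121'
ORDER BY SortString;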
UPDATE: You might think about FROM OPENXML
Just to mention it: There is the absolutely outdated FROM OPENXML, which is - afaik - the only way to get literally everything back:
DECLARE @xml XML=
N'<root xmlns="default" xmlns:ns="dummy">
<a ns:test="blah">blub</a>
<ns:b test2="hugo">blubber</ns:b>
</root>';
DECLARE @DocHandle INT;
EXEC sp_xml_preparedocument @DocHandle OUTPUT, @xml;
SELECT * FROM OPENXML(@DocHandle,'/*');
EXEC sp_xml_removedocument @DocHandle;
the result
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| id | parentid | nodetype | localname | prefix | namespaceuri | datatype | prev | text |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 0 | NULL | 1 | root | NULL | default | NULL | NULL | NULL |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 2 | 0 | 2 | xmlns | xmlns | NULL | NULL | NULL | NULL |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 10 | 2 | 3 | #text | NULL | NULL | NULL | NULL | default |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 3 | 0 | 2 | ns | xmlns | NULL | NULL | NULL | NULL |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 11 | 3 | 3 | #text | NULL | NULL | NULL | NULL | dummy |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 4 | 0 | 1 | a | NULL | default | NULL | NULL | NULL |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 5 | 4 | 2 | test | ns | dummy | NULL | NULL | NULL |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 12 | 5 | 3 | #text | NULL | NULL | NULL | NULL | blah |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 6 | 4 | 3 | #text | NULL | NULL | NULL | NULL | blub |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 7 | 0 | 1 | b | ns | dummy | NULL | 4 | NULL |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 8 | 7 | 2 | test2 | NULL | NULL | NULL | NULL | NULL |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 13 | 8 | 3 | #text | NULL | NULL | NULL | NULL | hugo |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 9 | 7 | 3 | #text | NULL | NULL | NULL | NULL | blubber |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
As you can see, this result contains namespaces, prefixes and content. But it is very clumsy and far away from "today" :-)
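Just as a rough sketch of how that edge table could be turned back into '/'-separated paths: the following is my own addition (not part of the original answer) and reuses the small document above. It materialises the OPENXML output and walks it with a recursive CTE over id/parentid:
DECLARE @xml XML=
N'<root xmlns="default" xmlns:ns="dummy">
<a ns:test="blah">blub</a>
<ns:b test2="hugo">blubber</ns:b>
</root>';
DECLARE @DocHandle INT;
EXEC sp_xml_preparedocument @DocHandle OUTPUT, @xml;

-- Materialise the edge table so it can still be queried after the document handle is released.
SELECT id, parentid, nodetype, localname, prefix, [text]
INTO #edge
FROM OPENXML(@DocHandle,'/*');

EXEC sp_xml_removedocument @DocHandle;

WITH paths AS
(
    SELECT id
          ,CAST('/' + ISNULL(prefix + ':','') + localname AS NVARCHAR(MAX)) AS XPath
    FROM #edge
    WHERE parentid IS NULL                 -- the root element
    UNION ALL
    SELECT e.id
          ,p.XPath + '/' + ISNULL(e.prefix + ':','') + e.localname
    FROM #edge e
    JOIN paths p ON e.parentid = p.id
    WHERE e.nodetype = 1                   -- elements only, skip attributes and text nodes
)
SELECT p.XPath
      ,t.[text] AS NodeValue               -- direct text child, if any
FROM paths p
LEFT JOIN #edge t ON t.parentid = p.id AND t.localname = '#text'
ORDER BY p.XPath;

DROP TABLE #edge;
This loses the namespace URIs (only the prefixes survive), so it is really just a demonstration of what the edge table offers.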

Converting rows from a table into days of the week

What I thought was going to be a fairly easy task is becoming a lot more difficult than I expected. We have several tasks that get performed sometimes several times per day, so we have a table that gets a row added whenever a user performs the task. What I need is a snapshot of the month with the initials and time of the person that did the task like this:
The 'activity log' table is pretty simple, it just has the date/time the task was performed along with the user that did it and the scheduled time (the "Pass Time" column in the image); this is the table I need to flatten out into days of the week.
Each 'order' can have one or more 'pass times' and each pass time can have zero or more initials for that day. For example, for pass time 8:00, it can be done several times during that day or not at all.
I have tried standard joins to get the orders and the scheduled pass times with no issues, but getting the days of the week is escaping me. I tried creating a function to get all the initials for the day and calling it as
'select FuncCall() as 1, FuncCall() as 2', etc. for each day of the week, but that is a real performance suck.
Does anyone know of a better technique?
Update: I think the comment about PIVOT looks promising, but I'm not quite sure, because every example I can find uses an aggregate function in the PIVOT part. So if I have the following table:
create table #MyTable (OrderName nvarchar(10),DateDone date, TimeDone time, Initials nvarchar(4), PassTime nvarchar(8))
insert into #MyTable values('Order 1','2018/6/1','2:00','ABC','1st Pass')
insert into #MyTable values('Order 1','2018/6/1','2:20','DEF','1st Pass')
insert into #MyTable values('Order 1','2018/6/1','4:40','XYZ','2nd Pass')
insert into #MyTable values('Order 1','2018/6/3','5:00','ABC','1st Pass')
insert into #MyTable values('Order 1','2018/6/4','4:00','QXY','2nd Pass')
insert into #MyTable values('Order 1','2018/6/10','2:00','ABC','1st Pass')
select * from #MyTable
pivot () -- Can't figure out what goes here since all examples I see have an aggregate function call such as AVG...
drop table #MyTable
I don't see how to get this output since I am not aggregating anything other than the initials column:
Something like this?
DECLARE @taskTable TABLE(ID INT IDENTITY,Task VARCHAR(100),TaskPerson VARCHAR(100),TaskDate DATETIME);
INSERT INTO @taskTable VALUES
('Task before June 2018','AB','2018-05-15T12:00:00')
,('Task 1','AB','2018-06-03T13:00:00')
,('Task 1','CD','2018-06-04T14:00:00')
,('Task 2','AB','2018-06-05T15:00:00')
,('Task 1','CD','2018-06-06T16:00:00')
,('Task 1','EF','2018-06-06T17:00:00')
,('Task 1','EF','2018-06-06T18:00:00')
,('Task 2','GH','2018-06-07T19:00:00')
,('Task 1','CD','2018-06-07T20:00:00')
,('After June 2018','CD','2018-07-15T21:00:00');
SELECT p.*
FROM
(
SELECT t.Task
,ROW_NUMBER() OVER(PARTITION BY t.Task,CAST(t.TaskDate AS DATE) ORDER BY t.TaskDate) AS Taskindex
,CONCAT(t.TaskPerson,' ',CONVERT(VARCHAR(5),t.TaskDate,114)) AS Content
,DAY(TaskDate) AS ColumnName
FROM @taskTable t
WHERE YEAR(t.TaskDate)=2018 AND MONTH(t.TaskDate)=6
) tbl
PIVOT
(
MAX(Content) FOR ColumnName IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10]
,[11],[12],[13],[14],[15],[16],[17],[18],[19],[20]
,[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31])
) P
ORDER BY P.Task,Taskindex;
The result
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task | Taskindex | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task 1 | 1 | NULL | NULL | AB 13:00 | CD 14:00 | NULL | CD 16:00 | CD 20:00 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task 1 | 2 | NULL | NULL | NULL | NULL | NULL | EF 17:00 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task 1 | 3 | NULL | NULL | NULL | NULL | NULL | EF 18:00 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task 2 | 1 | NULL | NULL | NULL | NULL | AB 15:00 | NULL | GH 19:00 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
The first trick is to use the day's index (DAY()) as the column name. The second trick is the ROW_NUMBER(): it adds a running index per task and day, replicating the rows per index. Otherwise you'd get just one entry per day.
Your input tables will be more complex, but I think this shows the principle...
UPDATE: So we have to get it even slicker :-D
WITH prepareData AS
(
SELECT t.Task
,t.TaskPerson
,t.TaskDate
,CONVERT(VARCHAR(10),t.TaskDate,126) AS TaskDay
,DAY(t.TaskDate) AS TaskDayIndex
,CONVERT(VARCHAR(5),t.TaskDate,114) AS TimeContent
FROM @taskTable t
WHERE YEAR(t.TaskDate)=2018 AND MONTH(t.TaskDate)=6
)
SELECT p.*
FROM
(
SELECT t.Task
,STUFF((
SELECT ', ' + CONCAT(x.TaskPerson,' ',TimeContent)
FROM prepareData AS x
WHERE x.Task=t.Task
AND x.TaskDay= t.TaskDay
ORDER BY x.TaskDate
FOR XML PATH(''),TYPE
).value(N'.',N'nvarchar(max)'),1,2,'') AS Content
,t.TaskDayIndex
FROM prepareData t
GROUP BY t.Task, t.TaskDay,t.TaskDayIndex
) p--tbl
PIVOT
(
MAX(Content) FOR TaskDayIndex IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10]
,[11],[12],[13],[14],[15],[16],[17],[18],[19],[20]
,[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31])
) P
ORDER BY P.Task;
The result
+--------+------+------+----------+----------+----------+------------------------------+----------+------+
| Task | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
+--------+------+------+----------+----------+----------+------------------------------+----------+------+
| Task 1 | NULL | NULL | AB 13:00 | CD 14:00 | NULL | CD 16:00, EF 17:00, EF 18:00 | CD 20:00 | NULL |
+--------+------+------+----------+----------+----------+------------------------------+----------+------+
| Task 2 | NULL | NULL | NULL | NULL | AB 15:00 | NULL | GH 19:00 | NULL |
+--------+------+------+----------+----------+----------+------------------------------+----------+------+
This uses the well-discussed FOR XML PATH trick within a correlated sub-query to concatenate all entries for the same task and day into one value. With this combined content you can go the normal PIVOT path. The aggregate will not actually compute anything, as there is - for sure - just one value per cell.
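On SQL Server 2017 or later, the FOR XML PATH concatenation can be replaced by STRING_AGG. A sketch of just the inner derived table, assuming the @taskTable variable declared above; the PIVOT part stays exactly as in the query above:
-- Inner rowset for the PIVOT, using STRING_AGG instead of FOR XML PATH (SQL Server 2017+).
SELECT t.Task
      ,STRING_AGG(CONVERT(NVARCHAR(MAX), CONCAT(t.TaskPerson,' ',CONVERT(VARCHAR(5),t.TaskDate,114))), ', ')
           WITHIN GROUP (ORDER BY t.TaskDate) AS Content   -- one comma-separated value per task and day
      ,DAY(t.TaskDate) AS TaskDayIndex
FROM @taskTable t
WHERE YEAR(t.TaskDate)=2018 AND MONTH(t.TaskDate)=6
GROUP BY t.Task, CAST(t.TaskDate AS DATE), DAY(t.TaskDate);
The result then feeds into the same PIVOT ... FOR TaskDayIndex IN ([1],...,[31]) as before.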

Can't SELECT rows containing Unicode characters in a SQL Server table

I used BULK INSERT to import data from a text file:
BULK INSERT dbo.Infosp1 FROM 'D:\test.txt' WITH ( FIELDTERMINATOR =',', ROWTERMINATOR ='\n', DATAFILETYPE='widechar')
This is my 'D:\test.txt' (saved with encoding: Unicode):
BácHồng,Giá,3
BácHồng,Hành,2
BácHồng,Lơxanh,3
BácHồng,Xả,3
BácHồng,Ngao,5
BácHồng,Bắptàu,5
CôHòaBính,Giá,5
CôHòaBính,Càrốt,2
CôHòaBính,Chanh,2
This is my table (screenshot omitted).
Why do the following SELECT statements return 0 results?
SELECT * FROM dbo.Infosp1 WHERE Khach = 'BácHồng'
or
SELECT * FROM dbo.Infosp1 WHERE Khach = 'CôHòaBính'
I think there is a data type conversion problem: when I save 'D:\test.txt' with ANSI encoding, the values come through corrupted, e.g. BácH?ng, CôH?aBính. In the cases where the field values are not corrupted, SELECT shows all the results. I'd like a solution that makes this work.
Try prepending N to unicode string literals in SQL Server:
SELECT * FROM dbo.Infosp1 WHERE Khach = N'BácHồng'
and
SELECT * FROM dbo.Infosp1 WHERE Khach = N'CôHòaBính'
Example:
create table Infosp1 (Khach nvarchar(64), FirstName nvarchar(64), SomeNumber int)
insert into Infosp1 values
(N'BácHồng',N'Giá',3)
,(N'BácHồng',N'Hành',2)
,(N'BácHồng',N'Lơxanh',3)
,(N'BácHồng',N'Xả',3)
,(N'BácHồng',N'Ngao',5)
,(N'BácHồng',N'Bắptàu',5)
,(N'CôHòaBính',N'Giá',5)
,(N'CôHòaBính',N'Càrốt',2)
,(N'CôHòaBính',N'Chanh',2)
SELECT 'WithN' as WithOrWithoutN, * FROM dbo.Infosp1 WHERE Khach = N'BácHồng'
union all
SELECT 'WithoutN',* FROM dbo.Infosp1 WHERE Khach = 'BácHồng'
union all
SELECT 'WithN', * FROM dbo.Infosp1 WHERE Khach = N'CôHòaBính'
union all
SELECT 'WithoutN',* FROM dbo.Infosp1 WHERE Khach = 'CôHòaBính'
rextester demo: http://rextester.com/WSNNX6950
returns:
+----------------+-----------+-----------+------------+
| WithOrWithoutN | Khach | FirstName | SomeNumber |
+----------------+-----------+-----------+------------+
| WithN | BácHồng | Giá | 3 |
| WithN | BácHồng | Hành | 2 |
| WithN | BácHồng | Lơxanh | 3 |
| WithN | BácHồng | Xả | 3 |
| WithN | BácHồng | Ngao | 5 |
| WithN | BácHồng | Bắptàu | 5 |
| WithN | CôHòaBính | Giá | 5 |
| WithN | CôHòaBính | Càrốt | 2 |
| WithN | CôHòaBính | Chanh | 2 |
| WithoutN | CôHòaBính | Giá | 5 |
| WithoutN | CôHòaBính | Càrốt | 2 |
| WithoutN | CôHòaBính | Chanh | 2 |
+----------------+-----------+-----------+------------+
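The underlying reason: without the N prefix the literal is a varchar in the database's default collation, and any character that its code page cannot represent is silently replaced (typically with ?), so the comparison no longer matches the stored nvarchar data. A small illustration, assuming a Latin default collation such as SQL_Latin1_General_CP1_CI_AS:
-- The varchar literal loses characters its code page cannot represent;
-- the N-prefixed nvarchar literal keeps them.
SELECT 'BácHồng'  AS varchar_literal   -- may come back as 'BácH?ng'
      ,N'BácHồng' AS nvarchar_literal; -- stays 'BácHồng'
That also matches the output above: every character of CôHòaBính exists in that code page, so the un-prefixed literal still matches, while BácHồng contains ồ, which does not.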

Optimal query to fetch a cumulative sum in MySQL

What is the 'correct' query to fetch a cumulative sum in MySQL?
I have a table where I keep information about files; one column contains the size of the files in bytes. (The actual files are kept on disk somewhere.)
I would like to get the cumulative file size like this:
+------------+---------+--------+----------------+
| fileInfoId | groupId | size | cumulativeSize |
+------------+---------+--------+----------------+
| 1 | 1 | 522120 | 522120 |
| 2 | 2 | 316042 | 316042 |
| 4 | 2 | 711084 | 1027126 |
| 5 | 2 | 697002 | 1724128 |
| 6 | 2 | 663425 | 2387553 |
| 7 | 2 | 739553 | 3127106 |
| 8 | 2 | 700938 | 3828044 |
| 9 | 2 | 695614 | 4523658 |
| 10 | 2 | 744204 | 5267862 |
| 11 | 2 | 609022 | 5876884 |
| ... | ... | ... | ... |
+------------+---------+--------+----------------+
20000 rows in set (19.2161 sec.)
Right now, I use the following query to get the above results
SELECT
a.fileInfoId
, a.groupId
, a.size
, SUM(b.size) AS cumulativeSize
FROM fileInfo AS a
LEFT JOIN fileInfo AS b USING(groupId)
WHERE a.fileInfoId >= b.fileInfoId
GROUP BY a.fileInfoId
ORDER BY a.groupId, a.fileInfoId
My solution is, however, extremely slow (around 19 seconds without cache).
EXPLAIN gives the following execution details:
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
| 1 | SIMPLE | a | index | PRIMARY,foreignId | PRIMARY | 4 | NULL | 14905 | |
| 1 | SIMPLE | b | ref | PRIMARY,foreignId | foreignId | 4 | db.a.foreignId | 36 | Using where |
+----+--------------+-------+-------+-------------------+-----------+---------+----------------+-------+-------------+
My question is:
How can I optimize the above query?
Update
I've updated the question as to provide the table structure and a procedure to fill the table with 20,000 records test data.
CREATE TABLE `fileInfo` (
`fileInfoId` int(10) unsigned NOT NULL AUTO_INCREMENT
, `groupId` int(10) unsigned NOT NULL
, `name` varchar(128) NOT NULL
, `size` int(10) unsigned NOT NULL
, PRIMARY KEY (`fileInfoId`)
, KEY `groupId` (`groupId`)
) ENGINE=InnoDB;
delimiter $$
DROP PROCEDURE IF EXISTS autofill$$
CREATE PROCEDURE autofill()
BEGIN
DECLARE i INT DEFAULT 0;
DECLARE gid INT DEFAULT 0;
DECLARE nam char(20);
DECLARE siz INT DEFAULT 0;
WHILE i < 20000 DO
SET gid = FLOOR(RAND() * 250);
SET nam = CONV(FLOOR(RAND() * 10000000000000), 20, 36);
SET siz = FLOOR((RAND() * 1024 * 1024));
INSERT INTO `fileInfo` (`groupId`, `name`, `size`) VALUES(gid, nam, siz);
SET i = i + 1;
END WHILE;
END;$$
delimiter ;
CALL autofill();
About the possible duplicate question
The question linked by Forgotten Semicolon is not the same question. My question has an extra column; because of this extra groupId column, the accepted answer there does not work for my problem. (Maybe it can be adapted to work, but I don't know how, hence my question.)
You could use a variable - it's far quicker than any join:
SELECT
id,
size,
@total := @total + size AS cumulativeSize
FROM `table`, (SELECT @total:=0) AS t;
Here's a quick test case on a Pentium III with 128MB RAM running Debian 5.0:
Create the table:
DROP TABLE IF EXISTS `table1`;
CREATE TABLE `table1` (
`id` int(11) NOT NULL auto_increment,
`size` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Fill with 20,000 random numbers:
DELIMITER //
DROP PROCEDURE IF EXISTS autofill//
CREATE PROCEDURE autofill()
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < 20000 DO
INSERT INTO table1 (size) VALUES (FLOOR((RAND() * 1000)));
SET i = i + 1;
END WHILE;
END;
//
DELIMITER ;
CALL autofill();
Check the row count:
SELECT COUNT(*) FROM table1;
+----------+
| COUNT(*) |
+----------+
| 20000 |
+----------+
Run the cumulative total query:
SELECT
id,
size,
@total := @total + size AS cumulativeSize
FROM table1, (SELECT @total:=0) AS t;
+-------+------+----------------+
| id | size | cumulativeSize |
+-------+------+----------------+
| 1 | 226 | 226 |
| 2 | 869 | 1095 |
| 3 | 668 | 1763 |
| 4 | 733 | 2496 |
...
| 19997 | 966 | 10004741 |
| 19998 | 522 | 10005263 |
| 19999 | 713 | 10005976 |
| 20000 | 0 | 10005976 |
+-------+------+----------------+
20000 rows in set (0.07 sec)
UPDATE
I'd missed the grouping by groupId in the original question, and that certainly made things a bit trickier. I then wrote a solution which used a temporary table, but I didn't like it—it was messy and overly complicated. I went away and did some more research, and have come up with something far simpler and faster.
I can't claim all the credit for this—in fact, I can barely claim any at all, as it is just a modified version of Emulate row number from Common MySQL Queries.
It's beautifully simple, elegant, and very quick:
SELECT fileInfoId, groupId, name, size, cumulativeSize
FROM (
SELECT
fileInfoId,
groupId,
name,
size,
@cs := IF(@prev_groupId = groupId, @cs+size, size) AS cumulativeSize,
@prev_groupId := groupId AS prev_groupId
FROM fileInfo, (SELECT @prev_groupId:=0, @cs:=0) AS vars
ORDER BY groupId
) AS tmp;
You can remove the outer SELECT ... AS tmp if you don't mind the prev_groupID column being returned. I found that it ran marginally faster without it.
Here's a simple test case:
INSERT INTO `fileInfo` VALUES
( 1, 3, 'name0', '10'),
( 5, 3, 'name1', '10'),
( 7, 3, 'name2', '10'),
( 8, 1, 'name3', '10'),
( 9, 1, 'name4', '10'),
(10, 2, 'name5', '10'),
(12, 4, 'name6', '10'),
(20, 4, 'name7', '10'),
(21, 4, 'name8', '10'),
(25, 5, 'name9', '10');
SELECT fileInfoId, groupId, name, size, cumulativeSize
FROM (
SELECT
fileInfoId,
groupId,
name,
size,
@cs := IF(@prev_groupId = groupId, @cs+size, size) AS cumulativeSize,
@prev_groupId := groupId AS prev_groupId
FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
ORDER BY groupId
) AS tmp;
+------------+---------+-------+------+----------------+
| fileInfoId | groupId | name | size | cumulativeSize |
+------------+---------+-------+------+----------------+
| 8 | 1 | name3 | 10 | 10 |
| 9 | 1 | name4 | 10 | 20 |
| 10 | 2 | name5 | 10 | 10 |
| 1 | 3 | name0 | 10 | 10 |
| 5 | 3 | name1 | 10 | 20 |
| 7 | 3 | name2 | 10 | 30 |
| 12 | 4 | name6 | 10 | 10 |
| 20 | 4 | name7 | 10 | 20 |
| 21 | 4 | name8 | 10 | 30 |
| 25 | 5 | name9 | 10 | 10 |
+------------+---------+-------+------+----------------+
Here's a sample of the last few rows from a 20,000 row table:
| 19481 | 248 | 8CSLJX22RCO | 1037469 | 51270389 |
| 19486 | 248 | 1IYGJ1UVCQE | 937150 | 52207539 |
| 19817 | 248 | 3FBU3EUSE1G | 616614 | 52824153 |
| 19871 | 248 | 4N19QB7PYT | 153031 | 52977184 |
| 132 | 249 | 3NP9UGMTRTD | 828073 | 828073 |
| 275 | 249 | 86RJM39K72K | 860323 | 1688396 |
| 802 | 249 | 16Z9XADLBFI | 623030 | 2311426 |
...
| 19661 | 249 | ADZXKQUI0O3 | 837213 | 39856277 |
| 19870 | 249 | 9AVRTI3QK6I | 331342 | 40187619 |
| 19972 | 249 | 1MTAEE3LLEM | 1027714 | 41215333 |
+------------+---------+-------------+---------+----------------+
20000 rows in set (0.31 sec)
I think that MySQL is only using one of the indexes on the table. In this case, it's choosing the index on foreignId.
Add a covering compound index that includes both primaryId and foreignId.
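A sketch of that suggestion against the fileInfo table from the question (the question's columns are named fileInfoId and groupId, while the EXPLAIN output shows an older foreignId key; the index name below is my own). Including size makes the index covering for the self-join:
-- Compound index so the self-joined side can be resolved from the index alone.
ALTER TABLE fileInfo
  ADD INDEX idx_group_file_size (groupId, fileInfoId, size);
On MySQL 8.0 and later you could also skip both the self-join and the user variables entirely: SUM(size) OVER (PARTITION BY groupId ORDER BY fileInfoId) computes the per-group running total directly.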