SELECT dynamic JSON stored in SQL as part of the select query - sql

I know similar questions have been asked multiple times; however, my scenario seems to be a bit different.
My database table is like this:
App ID | ID | JSONData           | URL           | CreatedOn
-------+----+--------------------+---------------+-------------------
5b5cd8 | 1  | {"F":"B", "S":"D"} | http://local  | Mar 19 2018 13:04
5b5cd8 | 2  | {"F":"C", "S":"K"} | http://remote | Mar 29 2018 09:34
6b9df0 | 3  | {"T":"N", "D":"S"} | http://site   | Apr 04 2018 16:12
The App ID column can have varying values; however, the structure of JSONData is (supposed to be) the same for the same App ID.
Is there any way I can split the JSONData data and get a result like this?
App ID | ID | F | S | URL           | CreatedOn
-------+----+---+---+---------------+-------------------
5b5cd8 | 1  | B | D | http://local  | Mar 19 2018 13:04
5b5cd8 | 2  | C | K | http://remote | Mar 29 2018 09:34
For the next App ID it would look like this:
App ID | ID | T | D | URL           | CreatedOn
-------+----+---+---+---------------+-------------------
6b9df0 | 3  | N | S | http://site   | Apr 04 2018 16:12
Note: The data in the JSONData field will mostly be one level deep, i.e. all the data will be strings with no further nested objects.
The solutions I found for questions like this either used static JSON key names to do the splitting, or created a temp table, which is going to cause performance issues.

You have been told already that the column names of a result set must be known in advance.
The only workaround is dynamic SQL (creating the statement as a string and calling EXEC() to get its result). But this has some major drawbacks (and some advantages)...
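For illustration only, a minimal sketch of that dynamic approach; dbo.YourTable is a placeholder for your real table, and the JSON keys are assumed to be plain names without dots or spaces:
--Hedged sketch: dbo.YourTable(AppID, ID, JSONData, URL, CreatedOn) is an assumed name
DECLARE @AppID VARCHAR(100) = '5b5cd8';
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);
--Build one JSON_VALUE() expression per distinct key found for this AppID
SELECT @cols = (
    SELECT DISTINCT ',JSON_VALUE(t.JSONData,''$.' + k.[key] + ''') AS ' + QUOTENAME(k.[key])
    FROM dbo.YourTable t
    CROSS APPLY OPENJSON(t.JSONData) k
    WHERE t.AppID = @AppID
    FOR XML PATH(''), TYPE).value('.','NVARCHAR(MAX)');
--Assemble and execute the statement
SET @sql = N'SELECT t.AppID, t.ID' + ISNULL(@cols, N'')
         + N', t.URL, t.CreatedOn FROM dbo.YourTable t WHERE t.AppID = @AppID;';
EXEC sp_executesql @sql, N'@AppID VARCHAR(100)', @AppID = @AppID;
The column list is derived from whatever keys actually occur for the given AppID, which is exactly why such a statement cannot be a plain static query.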
You might instead go with something along these lines (needs SQL Server 2016+):
A mockup table:
DECLARE @tbl TABLE(AppID VARCHAR(100),ID INT,JSONData NVARCHAR(MAX));
INSERT INTO @tbl VALUES
('5b5cd8',1,N'{"F":"B", "S":"D"}')
,('5b5cd8',2,N'{"F":"C", "S":"K"}')
,('6b9df0',3,N'{"T":"N", "D":"S"}');
--This query fetches the values using JSON_VALUE
--You'd need to create one statement for each possible list of columns
--Apply a WHERE to filter for appropriate rows
SELECT t.AppID
,t.ID
,JSON_VALUE(t.JSONData,'$.F') AS F
,JSON_VALUE(t.JSONData,'$.S') AS S
FROM @tbl t
WHERE t.AppID='5b5cd8'
--You might include all possible columns
--This works without a filter, but will return a lot of NULLs
SELECT t.AppID
,t.ID
,JSON_VALUE(t.JSONData,'$.F') AS F
,JSON_VALUE(t.JSONData,'$.S') AS S
,JSON_VALUE(t.JSONData,'$.T') AS T
,JSON_VALUE(t.JSONData,'$.D') AS D
FROM @tbl t
--A bit cleaner / easier to read is OPENJSON() in combination with a WITH clause
SELECT t.AppID
,t.ID
,JsonColumns.*
FROM @tbl t
CROSS APPLY OPENJSON(t.JSONData) WITH(F CHAR(1)
,S CHAR(1)
,T CHAR(1)
,D CHAR(1)) JsonColumns
My suggestion: Create the last one as a VIEW or (probably better) an iTVF and use dedicated statements against this, one for each type of structure.
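A rough sketch of that suggestion, assuming the real table is called dbo.YourTable (the function name is made up as well):
CREATE FUNCTION dbo.GetJsonColumns (@AppID VARCHAR(100))
RETURNS TABLE
AS
RETURN
    --same OPENJSON ... WITH approach as above, wrapped as an iTVF
    SELECT t.AppID
          ,t.ID
          ,JsonColumns.*
    FROM dbo.YourTable t
    CROSS APPLY OPENJSON(t.JSONData) WITH(F CHAR(1)
                                         ,S CHAR(1)
                                         ,T CHAR(1)
                                         ,D CHAR(1)) JsonColumns
    WHERE t.AppID = @AppID;
GO
--dedicated statement for the first structure: pick only its columns
SELECT f.AppID, f.ID, f.F, f.S
FROM dbo.GetJsonColumns('5b5cd8') f;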

Related

Selecting/updating single field from duplicates

This is my SQL:
SELECT * FROM parameter1
WHERE parameter1_ID IN (34,11)
This returns over 400 rows (which is normal). Most rows have multiple values in the VALUE column, and some of them have duplicate values in the VALUE column. I want to get rid of the duplicate VALUEs within their respective rows; what can I add to this query to do so?
Using MS SQL studio
example result from query:
ID VALUE
1 100,200
2 100,100,200
3 200,200,300
4 200,200,300
result I want
ID VALUE
1 100,200
2 100,200
3 200,300
4 200,300
As many others have said, the way you are storing data is very bad practice and will almost certainly cause many more headaches in the future.
That said, if you are actually unable to change this there are still options for you. In SQL Server 2017 you have the benefit of string_split and string_agg:
declare @t table(ID int, val varchar(40))
insert into @t values
(1,'100,200')
,(2,'100,100,200')
,(3,'200,200,300')
,(4,'200,200,300');
with s as
(
select distinct t.ID
,s.[value] as val
from @t as t
cross apply string_split(t.val,',') as s
)
select s.ID
,string_agg(s.val,',') as val
from s
group by s.ID;
Output
+----+---------+
| ID | val |
+----+---------+
| 1 | 100,200 |
| 2 | 100,200 |
| 3 | 200,300 |
| 4 | 200,300 |
+----+---------+

Recursive self join over file data

I know there are many questions about recursive self joins, but they're mostly in a hierarchical data structure as follows:
ID | Value | Parent id
-----------------------------
But I was wondering if there was a way to do this in a specific case that I have where I don't necessarily have a parent id. My data will look like this when I initially load the file.
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,...
2 | *,record,abc,efg,hij,...
3 | ,,1,x,y,z,...
4 | ,,2,q,r,s,...
5 | 3,Formula,5,6,7,8,...
6 | *,record,lmn,opq,rst,...
7 | ,,1,t,u,v,...
8 | ,,2,l,m,n,...
Essentially, it's a CSV file where each row in the table is a line in the file. Lines 1 and 5 identify an object header, and lines 3, 4, 7, and 8 identify the rows belonging to the object. The object header lines can have only 40 attributes, which is why the object is broken up across multiple sections in the CSV file.
What I'd like to do is take the table, separate out the record # column, and join it with itself multiple times so it achieves something like this:
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,5,6,7,8,...
2 | *,record,abc,efg,hij,lmn,opq,rst
3 | ,,1,x,y,z,t,u,v,...
4 | ,,2,q,r,s,l,m,n,...
I know it's probably possible, I'm just not sure where to start. My initial idea was to create a view that separates out the first and second columns, and to use that view as a way of joining in a repeated fashion on those two columns. However, I have some problems:
I don't know how many sections will occur in the file for the same object.
The file can contain other objects as well, so joining on the first two columns would be problematic if you have something like:
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,...
2 | *,record,abc,efg,hij,...
3 | ,,1,x,y,z,...
4 | ,,2,q,r,s,...
5 | 3,Formula,5,6,7,8,...
6 | *,record,lmn,opq,rst,...
7 | ,,1,t,u,v,...
8 | ,,2,l,m,n,...
9 | ,4,Data,1,2,3,4,...
10 | *,record,lmn,opq,rst,...
11 | ,,1,t,u,v,...
In the above case, my plan could join rows from the Data object in row 9 with the first rows of the Formula object by matching the record value of 1.
UPDATE
I know this is somewhat confusing. I tried doing this with C# a while back, but I basically had to write a recursive descent parser to parse the specific file format, and it simply took too long because I had to get it into the database afterwards and it was too much for Entity Framework. It was taking hours just to convert one file since these files are excessively large.
Either way, @Nolan Shang has the closest result to what I want. The only difference is this (sorry for the bad formatting):
+----+------------+--------------------------+----------------------------------+
| ID | header     | x                        | value                            |
+----+------------+--------------------------+----------------------------------+
| 1  | 3,Formula, | ,1,2,3,4,5,6,7,8         | 3,Formula,1,2,3,4,5,6,7,8        |
| 2  | ,,         | ,1,x,y,z,t,u,v           | ,1,x,y,z,t,u,v                   |
| 3  | ,,         | ,2,q,r,s,l,m,n           | ,2,q,r,s,l,m,n                   |
| 4  | *,record,  | ,abc,efg,hij,lmn,opq,rst | *,record,abc,efg,hij,lmn,opq,rst |
| 5  | ,4,        | ,Data,1,2,3,4            | ,4,Data,1,2,3,4                  |
| 6  | *,record,  | ,lmn,opq,rst             | ,lmn,opq,rst                     |
| 7  | ,,         | ,1,t,u,v                 | ,1,t,u,v                         |
+----+------------+--------------------------+----------------------------------+
I agree that it would be better to export this to a scripting language and do it there. This will be a lot of work in TSQL.
You've intimated that there are other possible scenarios you haven't shown, so I obviously can't give a comprehensive solution. I'm guessing this isn't something you need to do quickly on a repeated basis. More of a one-time transformation, so performance isn't an issue.
One approach would be to do a LEFT JOIN to a hard-coded table of the possible identifying sub-strings like:
3,Formula,
*,record,
,,1,
,,2,
,4,Data,
Looks like it pretty much has to be human-selected and hard-coded because I can't find a reliable pattern that can be used to SELECT only these sub-strings.
Then you SELECT from this artificially-created table (or derived table, or CTE) and LEFT JOIN to your actual table with a LIKE to get all the rows that use each of these values as their starting substring, strip out the starting characters to get the rest of the string, and use the STUFF..FOR XML trick to build the desired Line.
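A hedged sketch of that approach against a small mockup; the prefix list is hand-picked as described above, and all names here are made up for illustration:
DECLARE @testdata TABLE(ID int, Line varchar(8000));
INSERT INTO @testdata VALUES
 (1,'3,Formula,1,2,3,4,...'),(2,'*,record,abc,efg,hij,...')
,(3,',,1,x,y,z,...'),(4,',,2,q,r,s,...')
,(5,'3,Formula,5,6,7,8,...'),(6,'*,record,lmn,opq,rst,...')
,(7,',,1,t,u,v,...'),(8,',,2,l,m,n,...');
--hand-picked identifying prefixes (the artificially-created table)
DECLARE @prefixes TABLE(ID int IDENTITY(1,1), prefix varchar(100));
INSERT INTO @prefixes(prefix) VALUES ('3,Formula,'),('*,record,'),(',,1,'),(',,2,');
--group-concatenate the remainder of every line that starts with each prefix
SELECT p.ID
      ,p.prefix + STUFF((
           SELECT ',' + REPLACE(SUBSTRING(t.Line, LEN(p.prefix) + 1, 8000), ',...', '')
           FROM @testdata t
           WHERE t.Line LIKE p.prefix + '%'
           ORDER BY t.ID
           FOR XML PATH(''), TYPE).value('.','varchar(8000)'), 1, 1, '') AS Line
FROM @prefixes p;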
How you get the ID column depends on what you want. For instance, in your second example, I don't know what ID you want for the ,4,Data,... line. Do you want 5 because that's the next number in the results, or do you want 9 because that's the ID of the first occurrence of that sub-string? Code accordingly. If you want 5, it's a ROW_NUMBER(). If you want 9, you can add an ID column to the artificial table you created at the start of this approach.
BTW, there's really nothing recursive about what you need done, so if you're still thinking in those terms, now would be a good time to stop. This is more of a "Group Concatenation" problem.
Here is a sample, but it differs somewhat from what you need.
That is because I use the value up to the second comma as the group header, so ,,1 and ,,2 will be treated as the same group; it would be better if you could use a parent id to indicate a group.
DECLARE @testdata TABLE(ID int,Line varchar(8000))
INSERT INTO @testdata
SELECT 1,'3,Formula,1,2,3,4,...' UNION ALL
SELECT 2,'*,record,abc,efg,hij,...' UNION ALL
SELECT 3,',,1,x,y,z,...' UNION ALL
SELECT 4,',,2,q,r,s,...' UNION ALL
SELECT 5,'3,Formula,5,6,7,8,...' UNION ALL
SELECT 6,'*,record,lmn,opq,rst,...' UNION ALL
SELECT 7,',,1,t,u,v,...' UNION ALL
SELECT 8,',,2,l,m,n,...' UNION ALL
SELECT 9,',4,Data,1,2,3,4,...' UNION ALL
SELECT 10,'*,record,lmn,opq,rst,...' UNION ALL
SELECT 11,',,1,t,u,v,...'
;WITH t AS(
SELECT *,REPLACE(SUBSTRING(t.Line,LEN(c.header)+1,LEN(t.Line)),',...','') AS data
FROM @testdata AS t
CROSS APPLY(VALUES(LEFT(t.Line,CHARINDEX(',',t.Line, CHARINDEX(',',t.Line)+1 )))) c(header)
)
SELECT MIN(ID) AS ID,t.header,c.x,t.header+STUFF(c.x,1,1,'') AS value
FROM t
OUTER APPLY(SELECT ','+tb.data FROM t AS tb WHERE tb.header=t.header FOR XML PATH('') ) c(x)
GROUP BY t.header,c.x
+----+------------+------------------------------------------+-----------------------------------------------+
| ID | header | x | value |
+----+------------+------------------------------------------+-----------------------------------------------+
| 1 | 3,Formula, | ,1,2,3,4,5,6,7,8 | 3,Formula,1,2,3,4,5,6,7,8 |
| 3 | ,, | ,1,x,y,z,2,q,r,s,1,t,u,v,2,l,m,n,1,t,u,v | ,,1,x,y,z,2,q,r,s,1,t,u,v,2,l,m,n,1,t,u,v |
| 2 | *,record, | ,abc,efg,hij,lmn,opq,rst,lmn,opq,rst | *,record,abc,efg,hij,lmn,opq,rst,lmn,opq,rst |
| 9 | ,4, | ,Data,1,2,3,4 | ,4,Data,1,2,3,4 |
+----+------------+------------------------------------------+-----------------------------------------------+

SQL Query: Search with list of tuples

I have a following table (simplified version) in SQLServer.
Table Events
-----------------------------------------------------------
| Room | User | Entered | Exited |
-----------------------------------------------------------
| A | Jim | 2014-10-10T09:00:00 | 2014-10-10T09:10:00 |
| B | Jim | 2014-10-10T09:11:00 | 2014-10-10T09:22:30 |
| A | Jill | 2014-10-10T09:00:00 | NULL |
| C | Jack | 2014-10-10T09:45:00 | 2014-10-10T10:00:00 |
| A | Jack | 2014-10-10T10:01:00 | NULL |
.
.
.
I need to create a query that returns a person's whereabouts at given timestamps.
For example: Where was (Jim at 2014-10-09T09:05:00), (Jim at 2014-10-10T09:01:00), (Jill at 2014-10-10T09:10:00), ...
The result set must contain the given User and Timestamp as well as the found room (if any).
------------------------------------------
| User | Timestamp | WasInRoom |
------------------------------------------
| Jim | 2014-10-09T09:05:00 | NULL |
| Jim | 2014-10-09T09:01:00 | A |
| Jim | 2014-10-10T09:10:00 | A |
The number of User-Timestamp tuples can be > 10 000.
The current implementation retrieves all records from Events table and does the search in Java code. I am hoping that I could push this logic to SQL. But how?
I am using MyBatis framework to create SQL queries so the tuples can be inlined to the query.
The basic query is:
select e.*
from events e
where e.user = 'Jim' and '2014-10-09T09:05:00' >= e.entered and ('2014-10-09T09:05:00' <= e.exited or e.exited is NULL) or
e.user = 'Jill' and '2014-10-10T09:10:00' >= e.entered and ('2014-10-10T09:10:00' <= e.exited or e.exited is NULL) or
. . .;
SQL Server can handle ridiculously large queries, so you can continue in this vein. However, if you have the name/time values in a table already (or it is the result of a query), then use a join:
select ut.*, e.*
from usertimes ut left join
events e
on e.user = ut.user and
ut.thetime >= e.entered and (ut.thetime <= e.exited or e.exited is null);
Note the use of a left join here. It ensures that all the original rows are in the result set, even when there are no matches.
Answers from Jonas and Gordon got me on track, I think.
Here is query that seems to do the job:
CREATE TABLE #SEARCH_PARAMETERS([User] VARCHAR(16), "Timestamp" DATETIME)
INSERT INTO #SEARCH_PARAMETERS([User], "Timestamp")
VALUES
('Jim', '2014-10-09T09:05:00'),
('Jim', '2014-10-10T09:01:00'),
('Jill', '2014-10-10T09:10:00')
SELECT #SEARCH_PARAMETERS.*, Events.Room FROM #SEARCH_PARAMETERS
LEFT JOIN Events
ON #SEARCH_PARAMETERS.[User] = Events.[User] AND
#SEARCH_PARAMETERS."Timestamp" > Events.Entered AND
(Events.Exited IS NULL OR Events.Exited > #SEARCH_PARAMETERS."Timestamp")
DROP TABLE #SEARCH_PARAMETERS
By declaring a table valued parameter type for the (user, timestamp) tuples, it should be simple to write a table valued user defined function which returns the desired result by joining the parameter table and the Events table. See http://msdn.microsoft.com/en-us/library/bb510489.aspx
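A minimal sketch of that approach; the type name dbo.UserTimeList, the function name dbo.WhereWasUser, and the dbo schema on Events are assumptions:
--assumed names throughout; adjust to your schema
CREATE TYPE dbo.UserTimeList AS TABLE([User] VARCHAR(16), [Timestamp] DATETIME);
GO
CREATE FUNCTION dbo.WhereWasUser (@params dbo.UserTimeList READONLY)
RETURNS TABLE
AS
RETURN
    SELECT p.[User], p.[Timestamp], e.Room AS WasInRoom
    FROM @params p
    LEFT JOIN dbo.Events e
           ON e.[User] = p.[User]
          AND p.[Timestamp] >= e.Entered
          AND (e.Exited IS NULL OR e.Exited >= p.[Timestamp]);
GO
--usage
DECLARE @p dbo.UserTimeList;
INSERT INTO @p VALUES ('Jim','2014-10-09T09:05:00'), ('Jill','2014-10-10T09:10:00');
SELECT * FROM dbo.WhereWasUser(@p);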
Since you are using MyBatis it may be easier to just generate a table variable for the tuples inline in the query and join with that.
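For example, the SQL generated by MyBatis could inline the tuples as a derived table, roughly like this (sketch only; the VALUES rows would be emitted by MyBatis):
SELECT p.[User], p.[Timestamp], e.Room AS WasInRoom
FROM (VALUES
        ('Jim',  CONVERT(datetime, '2014-10-09T09:05:00')),
        ('Jill', CONVERT(datetime, '2014-10-10T09:10:00'))
     ) AS p([User], [Timestamp])
LEFT JOIN Events e
       ON e.[User] = p.[User]
      AND p.[Timestamp] >= e.Entered
      AND (e.Exited IS NULL OR e.Exited >= p.[Timestamp]);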

Trouble with Pivot Tables

I know there are a lot of pivot table examples on the internet; however, I'm new to SQL and I'm having a bit of trouble, as all the examples seem to pertain to aggregate functions.
Table 1:
|Date | Tag |Value |
|06/10 2:00pm | A | 65 |
|06/10 2:00pm | B | 44 |
|06/10 2:00pm | C | 33 |
|06/10 2:02pm | A | 12 |
|06/10 2:02pm | B | 55 |
|06/10 2:02pm | C | 21 |
....
|06/10 1:58am | A | 23 |
What I would like it to look like is (table 2):
|Date | A | B | C |
|06/10 2:00pm| 65 | 44 | 33 |
|06/10 2:02pm| 12 | 55 | 21 |
.....
|06/10 1:58am| 23 | etc. | etc. |
(sorry for the format)
Some problems that I encounter (it doesn't work with the code I have found online):
I'd like to run this as a stored procedure (or rather a SQL job) every 2 minutes, so that the data from table 1 is constantly being moved to table 2. However, I think I would need to alter the date every single time? (That's the syntax I've seen.)
The pivot table itself seems simple on its own, but the datetime has been causing me grief.
Any code snipets or links would be greatly appreciated.
Thanks.
The pivot itself seems simple:
select *
from table1
pivot (min (Value) for Tag in ([A], [B], [C])) p
As for the stored procedure, I would use the last date saved in table2 as a filter for table1, excluding incomplete groups (I'm assuming that there will be, at some point, all three tags present, and that only the last date can be incomplete. If not, you will need special processing for the last date to update/insert a row).
So, in code:
create proc InsertPivotedTags
as
set NoCount ON
set XACT_ABORT ON
begin transaction
declare @startDate datetime
-- Last date from Table2 or start of time
select @startDate = isnull (max ([Date]), '1753-01-01')
from Table2
insert into Table2
select *
from Table1
pivot (min (Value) for Tag in ([A], [B], [C])) p
where [Date] > @startDate
-- exclude incomplete groups
and a is not null
and b is not null
and c is not null
commit transaction
If groups can be incomplete you should remove the exclude filter and add a delete statement that removes the last date in case it is incomplete, and adjusts @startDate to three milliseconds earlier to get the same rows again, but now in a more filled-up state.
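A sketch of that variant, reusing the names from the procedure above:
declare @lastDate datetime, @startDate datetime
-- last date already saved (NULL while Table2 is still empty)
select @lastDate = max ([Date]) from Table2
-- throw the possibly incomplete last group away ...
delete from Table2 where [Date] = @lastDate
-- ... and read it again together with anything newer
set @startDate = isnull (dateadd (ms, -3, @lastDate), '1753-01-01')
insert into Table2
select *
from Table1
pivot (min (Value) for Tag in ([A], [B], [C])) p
where [Date] > @startDate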

Access 2007 select first value of query results

I am running into a rather annoying thingy in Access (2007) and I am not sure if this is a feature or if I am asking for the impossible.
Although the actual database structure is more complex, my problem boils down to this:
I have a table with data about Units for specific years. This data comes from different sources and might overlap.
Unit | IYR | X1 | Source |
-----------------------------
A | 2009 | 55 | 1 |
A | 2010 | 80 | 1 |
A | 2010 | 101 | 2 |
A | 2010 | 150 | 3 |
A | 2011 | 90 | 1 |
...
Now I would like the user to select certain sources, order them by priority and then extract one data value for each year.
For example, if the user selects source 1, 2 and 3 and orders them by (3, 1, 2), then I would like the following result:
Unit | IYR | X1 | Source |
-----------------------------
A | 2009 | 55 | 1 |
A | 2010 | 150 | 3 |
A | 2011 | 90 | 1 |
I am able to order the initial table, based on a specific order. I do this with the following query
SELECT Unit, IYR, X1, Source
FROM TestTable
WHERE Source In (1,2,3)
ORDER BY Unit, IYR,
IIf(Source=3,1,IIf(Source=1,2,IIf(Source=2,3,4)))
This gives me the following intermediate result:
Unit | IYR | X1 | Source |
-----------------------------
A | 2009 | 55 | 1 |
A | 2010 | 150 | 3 |
A | 2010 | 80 | 1 |
A | 2010 | 101 | 2 |
A | 2011 | 90 | 1 |
Next step is to only get the first value of each year. I was thinking of using the following query:
SELECT X.Unit, X.IYR, first(X.X1) as FirstX1
FROM (...) AS X
GROUP BY X.Unit, X.IYR
Where (…) is the above query.
Now Access goes bananas. Whatever order I give to the intermediate results, the result of this query is:
Unit | IYR | X1 |
--------------------
A | 2009 | 55 |
A | 2010 | 80 |
A | 2011 | 90 |
In other words, for year 2010 it shows the value of source 1 instead of 3. It seems that Access does not care about the ordering of the nested query when it applies the FIRST() function and sticks to the original ordering of the data.
Is this a feature of Access or is there a different way of achieving the desired results?
PS: Next step would be to use a self join to add the source column to the results again, but I first need to resolve the above problem.
Rather than use first it may be better to determine the MIN Priority and then join back e.g.
SELECT
t.UNIT,
t.IYR,
t.X1,
t.Source ,
t.PrioritySource
FROM
(SELECT
Unit,
IYR,
X1,
Source,
SWITCH ( [Source]=3, 1,
[Source]=1, 2,
[Source]=2, 3) as PrioritySource
FROM
TestTable
WHERE
Source In (1,2,3)
) as t
INNER JOIN
(SELECT
Unit,
IYR,
MIN(SWITCH ( [Source]=3, 1,
[Source]=1, 2,
[Source]=2, 3)) as PrioritySource
FROM
TestTable
WHERE
Source In (1,2,3)
GROUP BY
Unit,
IYR ) as MinPriority
ON t.Unit = MinPriority.Unit and
t.IYR = MinPriority.IYR and
t.PrioritySource = MinPriority.PrioritySource
which will produce this result (Note I include Source and priority source for demonstration purposes only)
UNIT | IYR | X1 | Source | PrioritySource
----------------------------------------------
A | 2009 | 55 | 1 | 2
A | 2010 | 150 | 3 | 1
A | 2011 | 90 | 1 | 2
Note the first subquery is to handle the fact that Access won't let you join on a Switch
Yes, FIRST() does use an arbitrary ordering. From the Access Help:
These functions return the value of a specified field in the first or
last record, respectively, of the result set returned by a query. If
the query does not include an ORDER BY clause, the values returned by
these functions will be arbitrary because records are usually returned
in no particular order.
I don't know whether FROM (...) AS X means you are using an ORDER BY inline (assuming that is actually possible) or if you are using a VIEW ('stored Query object') here but either way I assume the ORDER BY is being disregarded (because an ORDER BY should only apply to the final result).
The alternative is to use MIN() (or possibly MAX()).
This is the most concise way I have found to write such queries in Access that require pulling back all columns that correspond to the first row in a group of records that are ordered in a particular way.
First, I added a UniqueID to your table. In this case, it's just an AutoNumber field. You may already have a unique value in your table, in which case you can use that.
This will choose the row with a Source 3 first, then Source 1, then Source 2. If there is a tie, it picks the one with the higher X1 value. If there is a further tie, it is broken by the UniqueID value:
SELECT t.* INTO [Chosen Rows]
FROM TestTable AS t
WHERE t.UniqueID=
(SELECT TOP 1 [UniqueID] FROM [TestTable]
WHERE t.IYR=IYR ORDER BY Choose([Source],2,3,1), X1 DESC, UniqueID)
This yields:
Unit IYR X1 Source UniqueID
A 2009 55 1 1
A 2010 150 3 4
A 2011 90 1 5
I recommend (1) you create an index on the IYR field -- this will dramatically increase your performance for this type of query, and (2) if you have a lot (>~100K) records, this isn't the best choice. I find it works quite well for tables in the 1-70K range. For larger datasets, I like to use my GroupIncrement function to partition each group (similar to SQL Server's ROW_NUMBER() OVER statement).
The Choose() function is a VBA function and may not be clear here. In your case, it sounds like there is some interactivity required. For that, you could create a second table called "Choices", like so:
Rank Choice
1 3
2 1
3 2
Then, you could substitute the following:
SELECT t.* INTO [Chosen Rows]
FROM TestTable AS t
WHERE t.UniqueID=(SELECT TOP 1 [UniqueID] FROM
[TestTable] t2 INNER JOIN [Choices] c
ON t2.Source=c.Choice
WHERE t.IYR=t2.IYR ORDER BY c.[Rank], t2.X1 DESC, t2.UniqueID);
Indexing Source on TestTable and Choice on the Choices table may be helpful here, too, depending on the number of choices required.
Q:
Can you get this to work without the need for a surrogate key? For example, what if the unique key is the composite of {Unit, IYR, X1, Source}?
A:
If you have a compound key, you can do it like this-- however I think that if you have a large dataset, it will totally kill the performance of the query. It may help to index all four columns, but I can't say for sure because I don't regularly use this method.
SELECT t.* INTO [Chosen Rows]
FROM TestTable AS t
WHERE t.Unit & t.IYR & t.X1 & t.Source =
(SELECT TOP 1 Unit & IYR & X1 & Source FROM [TestTable]
WHERE t.IYR=IYR ORDER BY Choose([Source],2,3,1), X1 DESC, Unit, IYR)
In certain cases, you may have to coalesce some of the individual parts of the key as follows (though Access generally will coalesce values automatically):
t.Unit & CStr(t.IYR) & CStr(t.X1) & CStr(t.Source)
You could also use a query in your FROM statements instead of the actual table. The query itself would build a composite of the four fields used in the key, and then you'd use the new key name in the WHERE clause of the top SELECT statement, and in the SELECT TOP 1 [key] of the subquery.
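A rough sketch of that variation; CompositeKey and the derived-table aliases are made up for illustration, and since Access SQL has no comment syntax, treat every name here as adjustable to your schema:
SELECT t.Unit, t.IYR, t.X1, t.Source INTO [Chosen Rows]
FROM (SELECT TestTable.*,
             Unit & CStr(IYR) & CStr(X1) & CStr(Source) AS CompositeKey
      FROM TestTable) AS t
WHERE t.CompositeKey =
      (SELECT TOP 1 Unit & CStr(IYR) & CStr(X1) & CStr(Source)
       FROM TestTable AS t2
       WHERE t.IYR = t2.IYR
       ORDER BY Choose(t2.[Source],2,3,1), t2.X1 DESC, t2.Unit, t2.IYR);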
In general, though, I will either: (a) create a new table with an AutoNumber field, (b) add an AutoNumber field, (c) add an integer and populate it with a unique number using VBA - this is useful when you get a MaxLocks error when trying to add an AutoNumber, or (d) use an already indexed unique key.