What should I use for a default value for a JSON column in SQL Server?

Background: I have been tasked with grafting some simple key/value data pairs onto an existing database table in SQL Server (in Azure). The nature of the KVP data is simply some extended data that may or may not exist for all rows.
Further, the data is somewhat freeform, as not all rows will have the same key/value pairs. This is very much bolt-on data that (in my opinion) doesn't merit the complexity of a related table. Instead, I've decided to try using JSON to hold the data, and so, to get my feet wet, I've tried the following:
First, I created a new column on my table thusly:
ALTER TABLE [TheTable]
ADD [ExtendedData] NVARCHAR(512) NOT NULL DEFAULT('')
Second, I picked a few records at random and added some additional JSON in the newly created column, for example:
{ "Color":"Red", "Size":"Big", "Shape":"Round" }
Finally, I expected to be able to query this extra data, by using the JSON_VALUE function in SQL, like this:
SELECT
    Field1,
    Field2,
    JSON_VALUE(ExtendedData, '$.Color') AS Color,
    JSON_VALUE(ExtendedData, '$.Size') AS Size
FROM
    TheTable
I expected my output to be a result set with 4 columns (Field1, Field2, Color, Size) where some (most) of the Color and Size values were NULL, because the majority of rows simply have no JSON data. Instead, I got an error complaining:
JSON text is not properly formatted
This led me to suspect that ALL of my ExtendedData should be properly formatted JSON for my new query to work, and so replacing my default column value of '' (an empty string) with '{}' seemingly fixes my problem.
But I am left wondering if this is the correct solution. Should I indeed default my new ExtendedData column to use an empty json object '{}', or is it safe to use an empty string '' and I am missing something syntactically in my query?

Without any evidence to the contrary, and working within the rules established for this database, I've decided to use a default value of '{}' for my JSON data.
If anyone else does this, be careful, as some APIs / parsers / IDEs might not like the string '{}' and require you to escape the sequence as '{{}}'.
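If some rows can still contain '' (or any other non-JSON text), a guard in the query avoids the formatting error without relying on the default. A minimal sketch, assuming SQL Server 2016 or later for ISJSON and reusing the names from the question:
SELECT
    Field1,
    Field2,
    JSON_VALUE(NULLIF(ExtendedData, ''), '$.Color') AS Color,
    CASE WHEN ISJSON(ExtendedData) = 1
         THEN JSON_VALUE(ExtendedData, '$.Size')
    END AS Size
FROM
    TheTable
NULLIF turns the empty string into NULL, which JSON_VALUE simply passes through as NULL; the ISJSON check skips any value that isn't valid JSON at all.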

Related

Escaping JSON special characters with JSON_QUERY not working

A project I'm working on involves storing a string of data in a table column. The table will have other columns relevant to the records. We decided to store the string data column using JSON.
From the table, a view will parse the JSON column into separate columns. The view will also have columns derived from the other main table columns. The data from the view is then used to populate parts of a document through SSRS.
When loading data into the main table, I need to utilize separate tables for deriving the other column values and the JSON column. I decided to use common table expressions for this. At the end of the query, I bring together the derived columns from the different common table expressions, including the JSON column, and insert them into the main table.
I had it almost done until I realized that when I use FOR JSON to create the JSON column, it escapes special characters. I did some research and have been trying to use the JSON_QUERY function to get around this but it's not working. Here is a simplification of the problem:
WITH Table1
(
    First_Name_JSON
)
AS
(
    SELECT 'Tim/' AS First_Name
    FOR JSON PATH
)
SELECT JSON_QUERY(Table1.First_Name_JSON) AS first_name
FROM Table1
FOR JSON PATH
Here is the output:
[{"first_name":[{"First_Name":"Tim\/"}]}]
Why is it still escaping? The documentation suggests that passing a column created by FOR JSON should make the JSON_QUERY function return it without escaped characters.
I know that this works:
SELECT JSON_QUERY('{"Firt_Name": "Tim/"}') as first_name
FOR JSON PATH
Output:
[{"first_name":{"Firt_Name": "Tim/"}}]
However, I need to be able to pass a column that's already holding JSON data, because the logic is pretty long with many columns. Using FOR JSON is ideal for making changes versus hard-coding the JSON format around each column.
I must be missing something. Thanks for any help.
It's quite simple:
{"Firt_Name": "Tim/"} is already valid JSON, so JSON_QUERY can return it as is. The bare string Tim/ is not valid JSON, so it needs escaping first; in your CTE, the inner FOR JSON PATH has already applied that escaping before JSON_QUERY ever sees the value.
Quote from the docs:
Using JSON_QUERY with FOR JSON
JSON_QUERY returns a valid JSON fragment. As a result, FOR JSON doesn't escape special characters in the JSON_QUERY return value.
If you're returning results with FOR JSON, and you're including data that's already in JSON format (in a column or as the result of an expression), wrap the JSON data with JSON_QUERY without the path parameter.
Given your use case, is it not possible to pass through the JSON to where you need it and un-escape it there? OPENJSON and JSON_VALUE are capable of this.
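For example, JSON_VALUE un-escapes scalar values as it extracts them, so pulling the name back out on the consuming side recovers the original text. A quick sketch against the escaped output from the question:
SELECT JSON_VALUE('[{"First_Name":"Tim\/"}]', '$[0].First_Name') AS first_name
-- returns: Tim/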

Dynamic type cast in select query

I have totally rewritten my question because of an inaccurate description of the problem!
We have to store a lot of different information about a specific region. For this we need a flexible data structure that does not limit the user's possibilities.
So we've created a key-value table for this additional data, which is described by a meta table containing the datatype of each value.
We already use this information for queries over our REST API. We then automatically wrap the requested field in a cast.
SQL Fiddle
We return this data together with information from other tables as a JSON object. We convert the corresponding rows from the data table into a JSON object with array_agg and json_object:
...
CASE
    WHEN count(prop.name) = 0 THEN '{}'::json
    ELSE json_object(array_agg(prop.name), array_agg(prop.value))
END AS data
...
This works very well. The problem is that if we store something like a floating point number in this field, we get back a string representation of that number:
e.g. 5.231 is returned as "5.231"
Now we would like to CAST this number during our select statement into the right data format so the JSON result would be correctly formatted. We have all the information we need, so we tried the following:
SELECT
    json_object(array_agg(data.name),
        -- here I cast the value into the right datatype!
        -- results in an error
        array_agg(CAST(value AS datatype))) AS data
FROM data
JOIN (
    SELECT name, datatype
    FROM meta) AS info
    ON info.name = data.name
The error message is following:
ERROR: type "datatype" does not exist
LINE 3: array_agg(CAST(value AS datatype))) AS data
^
So is it possible to dynamically cast the text of the data_type column to a postgresql type to return a well-formatted JSON object?
First, that's a terrible abuse of SQL, and ought to be avoided in practically all scenarios. If you have a scenario where this is legitimate, you probably already know your RDBMS so intimately, that you're writing custom indexing plugins, and wouldn't even think of asking this question...
If you tell us what you're actually trying to do, there's about a 99.9% chance we can tell you a better way to do it.
Now with that disclaimer aside:
This is not possible without using dynamic SQL. You can accomplish this with PL/pgSQL's EXECUTE statement, which you can read about in the manual. It basically boils down to building the SQL string and executing it.
Note, however, that even using this method, the result for every row fetched in the same query must have the same data type. In other words, you can't expect that row 1 will have a data type of VARCHAR, and row 2 will have INT. That is completely impossible.
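To make that concrete, here is a minimal PL/pgSQL sketch of such a dynamic cast. cast_to_json is a hypothetical helper, and %I only handles simple one-word type names like float8 or integer:
CREATE OR REPLACE FUNCTION cast_to_json(val text, datatype text)
RETURNS json AS $$
DECLARE
    result json;
BEGIN
    -- Build the cast dynamically: %L quotes the value as a literal,
    -- %I quotes the type name as an identifier.
    EXECUTE format('SELECT to_json(CAST(%L AS %I))', val, datatype)
        INTO result;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
-- Usage: SELECT cast_to_json('5.231', 'float8');  returns 5.231, not "5.231"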
The problem you have is that json_object creates an object out of a string array for the keys and another string array for the values. So if you feed your JSON values into this function, it will always return an error.
So the first problem is that you have to use a JSON or JSONB column for the values. Or you can convert the values from string to JSON with to_json().
The second problem is that you need another function to create your JSON object, because you want to feed it a string array for the keys and an array of JSON values for the values. For this there is a function called json_object_agg.
Then your output should be like the one you expected! Here is the full query:
SELECT
json_object_agg(data.name, to_json(data.value)) AS data
FROM data
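If changing the column type isn't an option, a hedged alternative sketch: branch on the meta table's datatype and cast each value before converting it, so numbers come out as JSON numbers. The type labels 'integer' and 'float' are assumptions here; use whatever your meta table actually stores:
SELECT json_object_agg(
           d.name,
           CASE m.datatype
               WHEN 'integer' THEN to_json(d.value::int)
               WHEN 'float'   THEN to_json(d.value::float8)
               ELSE                to_json(d.value)  -- anything else stays a quoted string
           END) AS data
FROM data d
JOIN meta m ON m.name = d.name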

Split multiple points in text format and switch coordinates in postgres column

I have a PostgreSQL column of type text that contains data like that shown below
(32.85563, -117.25624)(32.855470000000004, -117.25648000000001)(32.85567, -117.25710000000001)(32.85544, -117.2556)
(37.75363, -121.44142000000001)(37.75292, -121.4414)
I want to convert this into another column of type text like shown below
(-117.25624, 32.85563)(-117.25648000000001,32.855470000000004 )(-117.25710000000001,32.85567 )(-117.2556,32.85544 )
(-121.44142000000001,37.75363 )(-121.4414,37.75292 )
As you can see, the values inside the parentheses have been switched around. Also note that I have shown two records here to indicate that not all fields have the same number of parenthesized figures.
What I've tried
I tried extracting the column to Java and performing my operations there, but due to the sheer number of records I have, I will run out of memory. I also cannot do this in batches due to time constraints.
What I want
A SQL query or a sequence of SQL queries that will achieve the result that I have mentioned above.
I am using PostgreSQL 9.4 with pgAdmin III as the client.
This is the type of problem that should not be solved in SQL, but you are lucky to be using Postgres.
I suggest the following steps in defining your algorithm.
The first part will be turning your strings into structured data; the second will transform the structured data back into a string in the format you require.
From string to data
First, you need to turn your bracketed values into an array, which can be done with the string_to_array function.
Now you can turn this array into rows with the unnest function, which will return a row per bracketed value.
Finally, you need to split the values in each row into two fields.
From data to string
You need to group the results of the first query, with the values wrapped in the string_agg function, which will combine the rows back into a single string.
You will need to experiment with brackets to achieve exactly what you want.
PS. I am not providing a query here. Once you have some code that you've tried, let me know.
Assuming you also have a PK or some unique column, and possibly other columns, you can do as follows:
SELECT id, (...), string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
FROM (
    SELECT id, (...), unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
    FROM my_table) sub
GROUP BY id; -- assuming id is PK or no other columns
PostgreSQL has the point type which you can use here. First you need to make sure you can properly divide the long string into individual points (insert ';' between the parentheses), then turn that into an array of individual points in text format, unnest the array into individual rows, and finally cast those rows to the point data type:
unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
You can then create a new point from the point you just created, but with the coordinates reversed, turn that into a string and aggregate into your desired output:
string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
But you might also move away from the text format and make an array of point values as that will be easier and faster to work with:
array_agg(point(pt[1], pt[0])) AS pt_reversed
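A quick way to sanity-check the swap is to run the expressions above against one of the sample strings from the question, with a literal standing in for the column:
SELECT string_agg(point(pt[1], pt[0])::text, '') AS reversed
FROM (
    SELECT unnest(string_to_array(
               replace('(32.85563, -117.25624)(32.85544, -117.2556)', ')(', ');('),
               ';'))::point AS pt
) AS sub;
-- expected: (-117.25624,32.85563)(-117.2556,32.85544)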
As I put in the question, I tried extracting the column to Java and performing my operations there, but due to the sheer number of records I had, I ran out of memory, and I could not do it in batches due to time constraints.
I ran out of memory because I was putting everything in a HashMap of <my_primary_key, the_newly_formatted_text>. As the text was sometimes very long, and given the sheer number of records I had, it wasn't surprising that I got an OOM.
Solution that I used:
As suggested by many folks here, this problem was better solved with code. I wrote a small script that formatted the text to my liking and wrote the primary key and the newly formatted text to a file in TSV format. Then I imported the TSV into a new table and updated the original table from the new one.

SQL Remove Substring From Query Results

I have a query that is returning data from a database. In a single field there is a rather long text comment with a segment that is clearly delimited by marker tags like !markerstart! and !markerend!. I would like the query to return the field with the segment between the two markers removed (and the markers removed too).
I would normally do this client-side after I get the data back; however, the problem is that the query is an INSERT query that gets its data from a SELECT statement. I don't want the text segment to be stored in the archival/reporting table (I'm working with an OLTP application here), so I need the SELECT statement to return exactly what is to be inserted, which, in this case, means getting the SELECT statement to strip out the unwanted phrase instead of doing it in post-processing client-side.
My only thought is to use some convoluted combination of SUBSTRING, CHARINDEX, and CONCAT, but I'm hoping there is a better way, though based on this I don't see how. Anyone have ideas?
Sample:
This is a long string of text in some field in a database that has a segment that needs to be removed. !markerstart! This is the segment that is to be removed. Its length is unknown and variable. !markerend! The part of this field that appears after the marker should remain.
Result:
This is a long string of text in some field in a database that has a segment that needs to be removed. The part of this field that appears after the marker should remain.
SOLUTION USING STUFF:
I really don't like how verbose this is, but I can put it in a function if I really need to. It isn't ideal, but it is easier and faster than a CLR routine.
SELECT STUFF(
           CAST(Description AS varchar(MAX)),
           CHARINDEX('!markerstart!', Description),
           -- length to remove: end of '!markerend!' (11 chars long) minus start of '!markerstart!'
           CHARINDEX('!markerend!', Description) + 11 - CHARINDEX('!markerstart!', Description),
           '') AS Description
FROM MyTable
You may want to consider implementing a CLR user-defined function that returns the parsed data.
The following link demonstrates how to use a CLR UDF RegEx function for pattern matching and data extraction.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
You can use the STUFF function or the REPLACE function and replace your unwanted substring with ''.
STUFF(expression, start_pos, number_of_chars, replace_expression)
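For reference, a tiny sketch of how STUFF behaves (positions are 1-based):
SELECT STUFF('abcdefgh', 3, 4, '')   -- removes 4 chars starting at position 3: 'abgh'
SELECT STUFF('abcdefgh', 3, 4, 'XY') -- replaces them instead: 'abXYgh'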

Is it possible to retrieve the maximum declared size of a varchar(n) column?

Suppose you have a DB table like this:
Table t
....
column_a integer
column_b varchar(255)
....
Now, I want to store a string composed of a list of names in t.column_b, in the following comma-separated format:
Word A, Word B, Word C...
The problem is that the string might be longer than 255 characters, and in my application logic I don't want to blindly trim to 255; instead I want to store as many words as fit, dropping the last word that would exceed the size. I also want to develop this in such a way that if the column changes size, I don't have to change my application. Is it possible to write a SQL query that retrieves the declared size of a column? Or perhaps I should use another column type?
If relevant, I am using Informix.
Informix truncates blindly at the limit unless your database is MODE ANSI.
The DBI defines metadata attributes for columns and DBD::Informix implements them.
For a statement handle, $sth, you can use:
$sth->{PRECISION}->[0]
to get the precision (length) of the first column in the output.
See perldoc DBI under 'Statement Handle Attributes'.
If you need to know the type information for some column, write a SELECT statement, prepare it, then analyze the statement handle.
Because this is defined by DBI, you will get the same behaviour with any driver (DBD::YourDBMS).
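If you would rather get the declared size from SQL itself, the Informix system catalog exposes it. A hedged sketch (note that for VARCHAR columns, syscolumns.collength packs the maximum and minimum sizes together, so the raw value may need decoding):
SELECT c.colname, c.coltype, c.collength
FROM systables t
JOIN syscolumns c ON c.tabid = t.tabid
WHERE t.tabname = 't'
  AND c.colname = 'column_b'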