I've checked Stack Overflow as well as Google but could not find any solution. What would be the proper way to remove duplicate entries from an nvarchar field that contains a JSON string in SQL Server? For my case, let's say I have an nvarchar 'People' field on my table which contains the following data:
[
{
"name":"Jon",
"age": 30
},
{
"name":"Bob",
"age": 30
},
{
"name":"Nick",
"age": 40
},
{
"name":"Bob",
"age": 40
}
]
I need to remove the entries which have duplicate names, which would be 'Bob' in this case. So after executing the query I am expecting this result:
[
{
"name":"Jon",
"age": 30
},
{
"name":"Bob",
"age": 30
},
{
"name":"Nick",
"age": 40
}
]
What would be the proper SQL query to do that? Note that I am trying to achieve no duplicate names rather than no duplicate entries; that's why the two Bobs have different ages in the above example. More specifically, I need to keep only the first item among duplicates, which for this example is the first Bob with age 30. Using ROW_NUMBER() with PARTITION BY would be a solution, but it breaks the existing order, and I need to achieve this without breaking it. So I have a table with Id and PeopleJson fields. The following query achieves what I want, except that it breaks the order in PeopleJson:
SELECT Id, (
    SELECT [Name], [Age] FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY (SELECT NULL)) AS row_num
        FROM OPENJSON(PeopleJson) WITH ([Name] NVARCHAR(1000), [Age] INT)
    ) t WHERE t.row_num = 1
    FOR JSON PATH, INCLUDE_NULL_VALUES
) AS [People]
FROM [TestTable]
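One way to keep the original array order (a sketch, untested, reusing the TestTable and PeopleJson names from above): skip the WITH clause so that OPENJSON exposes each element's array index in its key column, then use that index both to pick the first occurrence per name and to restore the original order.

SELECT Id, (
    SELECT JSON_VALUE(t.value, '$.name') AS [name],
           CAST(JSON_VALUE(t.value, '$.age') AS INT) AS [age]
    FROM (
        -- [key] is the array index when OPENJSON is used without a WITH clause
        SELECT j.[key], j.value,
               ROW_NUMBER() OVER (PARTITION BY JSON_VALUE(j.value, '$.name')
                                  ORDER BY CAST(j.[key] AS INT)) AS row_num
        FROM OPENJSON(PeopleJson) AS j
    ) t
    WHERE t.row_num = 1              -- keep the first occurrence of each name
    ORDER BY CAST(t.[key] AS INT)    -- restore the original array order
    FOR JSON PATH, INCLUDE_NULL_VALUES
) AS [People]
FROM [TestTable];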
If the duplicates are exact (the whole object repeats), you can strip them by taking DISTINCT over the value column that OPENJSON returns for each array element:
WITH cte AS (
    SELECT DISTINCT value
    FROM OPENJSON(@json)
)
SELECT * FROM cte
DECLARE @json NVARCHAR(MAX) = N'[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, {"id": 1, "name": "Alice"}]';

SELECT '[' + STRING_AGG(value, ',') + ']' AS result
FROM (
    SELECT DISTINCT value
    FROM OPENJSON(@json)
) cte;

[{"id": 1, "name": "Alice"},{"id": 2, "name": "Bob"}]
(STRING_AGG requires SQL Server 2017 or later. Reassembling the fragments with string concatenation is deliberate: wrapping the query in FOR JSON would escape each value fragment as a plain string instead of embedding it as JSON.)
Can you provide some more information, please? I understand what you're trying to do, but I have some questions.
Each Bob has a different age, so those aren't duplicate entries, only duplicate names. Either way, it would be hard to decide which entry to remove if each one is different.
You can achieve no duplicate "Bob" entries, but the issue comes in when deciding which Bob record you want to keep.
I'm using Postgres (latest) with Node (latest) and the pg driver (latest). An endpoint is receiving JSON which looks like:
{
"id": 12345,
"total": 123.45,
"items": [
{
"name": "blue shirt",
"url": "someurl"
},
{
"name": "red shirt",
"url": "someurl"
}
]
}
So I'm storing this in two tables:
CREATE TABLE orders (
id INT NOT NULL,
total NUMERIC(10, 2) DEFAULT 0 NOT NULL,
PRIMARY KEY (id)
);
CREATE INDEX index_orders_id ON orders(id);
CREATE TABLE items (
id BIGSERIAL NOT NULL,
order_id INT NOT NULL,
name VARCHAR(128) NOT NULL,
url VARCHAR(128) DEFAULT '' NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE
);
CREATE INDEX index_items_id ON items(id);
The items table has a FK of order_id to relate the id of the order to its respective items.
Now, the issue is I almost always need to fetch the order along with the items.
How do I get an output similar to my input json in one query?
I know it can be done in two queries, but this pattern will be all over the place and needs to be efficient. My last resort would be to store the items as a JSONB column directly in the orders table, but then querying on the items or joining with them won't be as easy.
One of many ways:
SELECT jsonb_pretty(
to_jsonb(o.*) -- taking whole row
|| (SELECT jsonb_build_object('items', jsonb_agg(i))
FROM (
SELECT name, url -- picking columns
FROM items i
WHERE i.order_id = o.id
) i
)
)
FROM orders o
WHERE o.id = 12345;
This returns formatted text similar to the displayed input. (But keys are sorted, so 'total' comes after 'items'.)
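With the sample order from the question, the result would look something like this (note 'total' sorting after 'items'):

{
    "id": 12345,
    "items": [
        {
            "name": "blue shirt",
            "url": "someurl"
        },
        {
            "name": "red shirt",
            "url": "someurl"
        }
    ],
    "total": 123.45
}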
If an order has no items, you get "items": null.
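If you would rather get an empty array in that case, one option (a sketch, not part of the original answer) is to wrap the aggregate in COALESCE:

SELECT jsonb_pretty(
   to_jsonb(o.*)
|| jsonb_build_object('items',
      COALESCE(
         (SELECT jsonb_agg(i)
          FROM (
             SELECT name, url
             FROM items i
             WHERE i.order_id = o.id
          ) i),
         '[]'::jsonb))   -- fall back to an empty array when no items exist
)
FROM orders o
WHERE o.id = 12345;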
For a jsonb value, strip the jsonb_pretty() wrapper.
I chose jsonb for its additional functionality - like the jsonb || jsonb → jsonb operator and the jsonb_pretty() function.
Related:
Return multiple columns of the same row as JSON array of objects
If you want a json value instead, you can cast the jsonb directly (without format) or the formatted text (with format). Or build a json value with rudimentary formatting directly (faster):
SELECT row_to_json(sub, true)
FROM (
SELECT o.*
, (SELECT json_agg(i)
FROM (
SELECT name, url -- pick columns to report
FROM items i
WHERE i.order_id = o.id
) i
) AS items
FROM orders o
WHERE o.id = 12345
) sub;
db<>fiddle here
It all depends on what you need exactly.
Aside:
Consider type text (or varchar) instead of the seemingly arbitrary varchar(128). See:
Should I add an arbitrary length limit to VARCHAR columns?
Say I have this JSON in a SQL column called MyJson in a table called StoreTeams:
{
    "MyTeams": [
        { "id": 1 },
        { "id": 2 },
        { "id": 3 }
    ]
}
I want to take all these ids and then do an inner join against another table.
User Table
- id <pk>
- firstName
- lastName
I am not sure how I would do this; I would probably be running this code via ADO.NET.
You can use openjson(). You don't specify the exact result you want, but the logic is:
select *
from StoreTeams t
cross apply openjson(t.MyJson, '$.MyTeams') with (id int '$.id') as x
inner join users u on u.id = x.id
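For example, with the sample document from the question (a sketch; the users table and its column names are assumptions based on the question):

DECLARE @doc NVARCHAR(MAX) = N'{"MyTeams":[{"id":1},{"id":2},{"id":3}]}';

SELECT u.id, u.firstName, u.lastName
FROM OPENJSON(@doc, '$.MyTeams') WITH (id INT '$.id') AS x
INNER JOIN users u ON u.id = x.id;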
postgres 10.3
I have about 1000 rows inside a table called sites.
If I query like this:
SELECT id, name from sites;
I will get the 1000 rows.
I also have another table called jsonindexdocument, with a single row where the id is 1 and a field called index that is JSONB.
Is it possible, in a single query, to take all 1000 rows from the sites table and then update the field called index under id 1?
The format of the json would be
[
{
"id": 10,
"name": "somename"
},
{
"id": 11,
"name": "another name"
} // and the rest of the 1000 rows
]
I am also okay if it uses more than one raw SQL statement.
UPDATE: I want to add that if the result is an empty set, the JSON field should default to an empty array.
Assuming you're OK with fully replacing the index value in the jsonindexdocument table:
UPDATE jsonindexdocument
SET index = (
-- using json_agg(row_to_json(sites.*)) would also work here, if you want to copy
-- all columns from the sites table into the json value
SELECT COALESCE(json_agg(json_build_object(
'id', id,
'name', name
)), '[]'::json)
FROM sites
)
WHERE id = 1;
As an example:
CREATE TEMP TABLE sites (
id INT,
name TEXT
);
CREATE TEMP TABLE jsonindexdocument (
id INT,
index JSON
);
INSERT INTO sites
VALUES (1, 'name1')
, (2, 'name2');
INSERT INTO jsonindexdocument
VALUES (1, NULL);
UPDATE jsonindexdocument
SET index = (
SELECT COALESCE(json_agg(json_build_object(
'id', id,
'name', name
)), '[]'::json)
FROM sites
)
WHERE id = 1;
SELECT * FROM jsonindexdocument;
returns
+--+------------------------------------------------------------+
|id|index |
+--+------------------------------------------------------------+
|1 |[{"id" : 1, "name" : "name1"}, {"id" : 2, "name" : "name2"}]|
+--+------------------------------------------------------------+
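Note that json_agg does not guarantee any particular element order. If the array should be ordered, say by id, an ORDER BY can go inside the aggregate (a sketch of the same update):

UPDATE jsonindexdocument
SET index = (
    SELECT COALESCE(json_agg(json_build_object(
        'id', id,
        'name', name
    ) ORDER BY id), '[]'::json)   -- ORDER BY inside the aggregate fixes the element order
    FROM sites
)
WHERE id = 1;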
I have two tables that I'm trying to combine.
Table 1:
FAME_ID, FAME_Emblem_Title, FAME_Category
Table 2:
User_ID, FAME_ID, Times_Received
However, some values do not exist in table 2. For example:
Table 1:                     Table 2:
Fame_ID: 1                   Fame_ID: null/does not have a value
FAME_Emblem_Title: test1     User_ID: null/does not have a value
FAME_Category: 1             Times_Received: null/does not have a value

Fame_ID: 2                   Fame_ID: 2
FAME_Emblem_Title: test2     User_ID: user1
FAME_Category: 1             Times_Received: 1
My goal is to filter the SQL query by Category and user but still display all results that match the first filter, even if table 2 does not have a matching row. By the way, my output is in JSON array form.
Result:
[
{
"User_ID": null,
"FAME_ID": 1,
"FAME_Category": "1",
"Times_Received": null
},
{
"User_ID": "user1",
"FAME_ID": 2,
"FAME_Category": "1",
"Times_Received": 1
}
]
I'm honestly not sure if this is possible. Any help is highly appreciated. Thanks!
The JOIN required when you want to return all rows from the LEFT table, even if there is no match in the RIGHT table, is called a LEFT OUTER JOIN. If there is no match, the row from the left table is still returned, with NULL substituted for all columns of the missing right-table row.
You can achieve the join you want, and output it as JSON, as follows:
SELECT t2.User_ID, t1.FAME_ID, t1.FAME_Category, t2.Times_Received
FROM table1 t1
LEFT OUTER JOIN table2 t2 ON t1.FAME_ID = t2.FAME_ID
FOR JSON PATH, INCLUDE_NULL_VALUES;
FOR JSON PATH keeps the output flat (FOR JSON AUTO would nest the second table's columns), and INCLUDE_NULL_VALUES keeps the NULL columns in the result, which FOR JSON omits by default.
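One caveat, since you mentioned filtering by Category and user: a filter on columns of the right table belongs in the ON clause; putting it in WHERE would turn the outer join back into an inner join and drop the unmatched rows. A sketch (the filter values are made up):

SELECT t2.User_ID, t1.FAME_ID, t1.FAME_Category, t2.Times_Received
FROM table1 t1
LEFT OUTER JOIN table2 t2
    ON t1.FAME_ID = t2.FAME_ID
   AND t2.User_ID = 'user1'       -- right-table filter: keep it in ON
WHERE t1.FAME_Category = '1'      -- left-table filter: WHERE is fine
FOR JSON PATH, INCLUDE_NULL_VALUES;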
Let's say I create two tables using the following SQL,
such that post has many comment:
CREATE TABLE IF NOT EXISTS post (
id SERIAL PRIMARY KEY,
title VARCHAR NOT NULL,
text VARCHAR NOT NULL
);
CREATE TABLE IF NOT EXISTS comment (
id SERIAL PRIMARY KEY,
text VARCHAR NOT NULL,
post_id SERIAL REFERENCES post (id)
);
I would like to be able to query these tables so as to serve a response that
looks like this:
{
    "post": [
        { "id": 100,
          "title": "foo",
          "text": "foo foo",
          "comment": [1000, 1001, 1002] },
        { "id": 101,
          "title": "bar",
          "text": "bar bar",
          "comment": [1003] }
    ],
    "comment": [
        { "id": 1000,
          "text": "bla blah foo",
          "post": 100 },
        { "id": 1001,
          "text": "bla foo foo",
          "post": 100 },
        { "id": 1002,
          "text": "foo foo foo",
          "post": 100 },
        { "id": 1003,
          "text": "bla blah bar",
          "post": 101 }
    ]
}
Doing this naively would involve two SELECT statements,
the first along the lines of
SELECT DISTINCT ON (post.id) post.title, post.text, comment.id
FROM post, comment
WHERE post.id = comment.post_id
... and the second something along the lines of
SELECT DISTINCT ON (comment.id) comment.text, post.id
FROM post, comment
WHERE post.id = comment.post_id
However, I cannot help but think that there is a way to do this involving
only one SELECT statement - is this possible?
Notes:
I am using Postgres, but I do not require a Postgres-specific solution. Any standard SQL solution should do.
The queries above are illustrative only; they do not give us exactly what is necessary at the moment.
It looks like what the naive solution here does is perform the same join on the same two tables, just doing a distinct on a different table each time. This definitely leaves room for improvement.
It appears that ActiveModel Serializers in Rails already do this; if someone familiar with them would like to chime in on how they work under the hood, that would be great.
You need two queries to get the form you laid out:
SELECT p.id, p.title, p.text, array_agg(c.id) AS comments
FROM post p
JOIN comment c ON c.post_id = p.id
WHERE p.id = ???
GROUP BY p.id;
Or faster, if you really want to retrieve all or most of your posts:
SELECT p.id, p.title, p.text, c.comments
FROM post p
JOIN (
    SELECT post_id, array_agg(id) AS comments
    FROM comment
    GROUP BY 1
) c ON c.post_id = p.id;
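Note that the inner join drops posts that have no comments. If those should be included, with NULL for comments, a LEFT JOIN variant works (a sketch):

SELECT p.id, p.title, p.text, c.comments
FROM post p
LEFT JOIN (
    SELECT post_id, array_agg(id) AS comments
    FROM comment
    GROUP BY 1
) c ON c.post_id = p.id;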
Plus:
SELECT id, text, post_id
FROM comment
WHERE post_id = ??;
Single query
SQL can only send one result type per query. For a single query, you would have to combine both tables, listing columns for post redundantly. That conflicts with the desired response in your question. You have to give up one of the two conflicting requirements.
SELECT p.id, p.title, p.text AS p_text, c.id, c.text AS c_text
FROM post p
JOIN comment c ON c.post_id = p.id
WHERE p.id = ???
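With the sample data from the desired response above and p.id = 100, this returns rows like the following, repeating the post columns for every comment:

 id  | title | p_text  |  id  | c_text
-----+-------+---------+------+--------------
 100 | foo   | foo foo | 1000 | bla blah foo
 100 | foo   | foo foo | 1001 | bla foo foo
 100 | foo   | foo foo | 1002 | foo foo foo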
Aside: The column comment.post_id should be integer, not serial! Also, the column names are probably just for a quick showcase; you wouldn't use the non-descriptive text as a column name, which also conflicts with a basic data type.
Compare this related case:
Foreign key of serial type - ensure always populated manually
However, I cannot help but think that there is a way to do this involving only one SELECT statement - is this possible?
Technically: yes. If you really want your data in json anyway, you could use PostgreSQL (9.2+) to generate it with the json functions, like:
SELECT row_to_json(sq)
FROM (
SELECT array_to_json(ARRAY(
SELECT row_to_json(p)
FROM (
SELECT *, ARRAY(SELECT id FROM comment WHERE post_id = post.id) AS comment
FROM post
) AS p
)) AS post,
array_to_json(ARRAY(
SELECT row_to_json(comment)
FROM comment
)) AS comment
) sq;
But I'm not sure it's worth it; it's usually not a good idea to dump all your data without limit / pagination.
SQLFiddle