SQL to get count in JSON data - sql

I am new to SQL. Could anyone share some pointers or a solution for my case?
I'll show the JSON data below. Is it possible to return, for each column, how many times each value appears in my JSON data? The result should look like the table below:
+--------+-----------+------------+
| column | value     | totalCount |
+--------+-----------+------------+
| brand  | top-brand | (2)        |
| brand  | low       | (1)        |
| type   | Bobtail   | (1)        |
| type   | Snowshoe  | (2)        |
+--------+-----------+------------+
[
{
"id": 1,
"name": "cat",
"type": {
"id": 2,
"name": "Snowshoe"
},
"brand": {
"id": 3,
"name": "top-brand"
}
},
{
"id": 2,
"name": "cat",
"type": {
"id": 2,
"name": "Snowshoe"
},
"brand": {
"id": 2,
"name": "low"
}
},
{
"id": 3,
"name": "cat",
"type": {
"id": 1,
"name": "Bobtail"
},
"brand": {
"id": 3,
"name": "top-brand"
}
}
]
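One possible approach, as a minimal sketch only: the question does not name a database, so this assumes PostgreSQL 9.4 or later and passes a compact copy of the data in as an inline jsonb literal (in practice the JSON would come from a table column). jsonb_array_elements produces one row per object in the array, the VALUES list names the two keys to count, and a plain GROUP BY does the counting. The names src, doc, e, item, k and col are made up for illustration.

```sql
-- Assumption: PostgreSQL 9.4+; all aliases below are illustrative.
WITH src(doc) AS (
    SELECT '[
      {"id": 1, "name": "cat", "type": {"id": 2, "name": "Snowshoe"}, "brand": {"id": 3, "name": "top-brand"}},
      {"id": 2, "name": "cat", "type": {"id": 2, "name": "Snowshoe"}, "brand": {"id": 2, "name": "low"}},
      {"id": 3, "name": "cat", "type": {"id": 1, "name": "Bobtail"},  "brand": {"id": 3, "name": "top-brand"}}
    ]'::jsonb
)
SELECT k.col                      AS "column",
       e.item -> k.col ->> 'name' AS value,        -- e.g. item -> 'brand' ->> 'name'
       count(*)                   AS "totalCount"
FROM src
CROSS JOIN LATERAL jsonb_array_elements(src.doc) AS e(item)  -- one row per array element
CROSS JOIN (VALUES ('brand'), ('type')) AS k(col)            -- the keys to count
GROUP BY 1, 2
ORDER BY 1, 2;
```

With the sample data this should give brand/low 1, brand/top-brand 2, type/Bobtail 1 and type/Snowshoe 2. Other databases have similar building blocks (OPENJSON in SQL Server, JSON_TABLE in MySQL 8), but the syntax differs.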

Related

Search a JSON column in a database

I'm looking to see if it's possible to search multiple database rows for a specific value that's stored in a json string. For instance I have a table called stashitems that contains a json column items that stores all of a players items. I would like to search the 1500 rows of data for a specific label or name value. Below is a snippet of one players stashitems.
How could I accomplish this for the entire table? Thanks for any help!
{
"8": {
"type": "item",
"slot": 8,
"amount": 948,
"weight": 100,
"name": "glass",
"label": "Glass",
"image": "glass.png",
"useable": false,
"unique": false,
"info": ""
},
"23": {
"type": "item",
"slot": 23,
"amount": 1,
"weight": 200,
"name": "crack_baggy",
"label": "Bag of Crack",
"image": "crack_baggy.png",
"useable": true,
"unique": false,
"info": ""
},
"47": {
"type": "item",
"slot": 47,
"amount": 1,
"weight": 20000,
"name": "diving_gear",
"label": "Diving Gear",
"image": "diving_gear.png",
"useable": true,
"unique": true,
"info": []
},
"48": {
"type": "item",
"slot": 48,
"amount": 1,
"weight": 20000,
"name": "diving_gear",
"label": "Diving Gear",
"image": "diving_gear.png",
"useable": true,
"unique": true,
"info": []
}
}
MariaDB 10.4.22
In SQL Server, you can traverse JSON with the following syntax (the key 8 is quoted in the path here because it starts with a digit):
SELECT JSON_VALUE(items, '$."8".type') AS type
FROM stashitems
See the JSON_VALUE documentation.
It's also possible in MySQL, though the syntax is different.
Unfortunately, SQLite stores only text values, so a workaround is needed.
There are some possibilities, such as:
SELECT JSON_SEARCH(@json, 'all', 'glass');
| JSON_SEARCH(@json, 'all', 'glass') |
| :--------------------------------- |
| ["$.8.name", "$.8.label"] |
SELECT @json Like '%glass%'
| @json Like '%glass%' |
| -------------------: |
| 1 |
db<>fiddle here
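To run the JSON_SEARCH idea over the whole table rather than a single variable, a sketch along these lines may work on MariaDB 10.4. It assumes the table and column names from the question (stashitems, items); the alias s and the output column matches are made up for illustration.

```sql
-- Assumption: MariaDB 10.4, table stashitems with a JSON (LONGTEXT) column items.
SELECT s.*,
       JSON_SEARCH(s.items, 'all', 'glass') AS matches   -- path(s) where the value was found
FROM stashitems AS s
WHERE JSON_SEARCH(s.items, 'all', 'glass') IS NOT NULL;  -- NULL means this row has no match
```

JSON_SEARCH compares against whole string values and treats % and _ as wildcards, so a search string like 'Bag%' should also find the label "Bag of Crack". The LIKE '%glass%' variant is simpler but matches anywhere in the text, including key names and file names such as glass.png.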

How to filter out events before joining datasets with stats

I have some events (2 different sourcetype—process_events and socket_events) that look something like this:
{
"action": "added",
"columns": {
"time": "1527895541",
"success": "1",
"action": "connect",
"auid": "1000",
"family": "2",
"local_address": "",
"local_port": "0",
"path": "/usr/bin/curl",
"pid": "30220",
"remote_address": "127.0.0.2",
"remote_port": "80"
},
"unixTime": 1527895545,
"hostIdentifier": "HOST_ONE",
"name": "socket_events",
"numerics": false
}
{
"action": "added",
"columns": {
"time": "1527895541",
"success": "1",
"action": "connect",
"auid": "1000",
"family": "2",
"local_address": "",
"local_port": "0",
"path": "/usr/bin/curl",
"pid": "30220",
"remote_address": "10.10.10.10",
"remote_port": "12345"
},
"unixTime": 1527895545,
"hostIdentifier": "HOST_ONE",
"name": "socket_events",
"numerics": false
}
{
"action": "added",
"columns": {
"uid": "0",
"time": "1527895541",
"pid": "30220",
"path": "/usr/bin/curl",
"auid": "1000",
"cmdline": "curl google.com",
"ctime": "1503452096",
"cwd": "",
"egid": "0",
"euid": "0",
"gid": "0",
"parent": ""
},
"unixTime": 1527895550,
"hostIdentifier": "HOST_ONE",
"name": "process_events",
"numerics": false
}
Current query:
(name=socket_events OR name=process_Events) columns.path=*bin*
| stats values(*) as * by hostIdentifier, columns.path, columns.pid
Result
+-------------------------------------------------------------------------------------------+
| hostIdentifier | columns.path | columns.pid | cmdline | columns.remote_addr | columns.remote_p
+-------------------------------------------------------------------------------------------+
| HOST_ONE | /usr/bin/curl | 30220 | curl google.com | 127.0.0.2 | 80
| | | | | 10.10.10.10 | 12345
+-------------------------------------------------------------------------------------------+
Is there a way for me to apply filter logic like this?
If columns.remote is multivalue AND one of the values has remote_address!=127.0.0.0/8 AND remote_port>5000, then pipe it to stats
If columns.remote is not multivalue AND remote_address!=127.0.0.0/8 AND remote_port>5000, then pipe it to stats()
Else, ignore
I feel like I need to apply the filter before the | stats ... because I need to exclude all the socket_events events that don't satisfy the condition before the JOIN with process_events.
Any help would be awesome!
Also, sample data taken from https://osquery.readthedocs.io/en/stable/deployment/process-auditing/
One can't filter out multi-value fields before stats because it's stats that makes them multi-value. Try filtering out the undesired IP addresses before joining the events.
(name=socket_events OR name=process_Events) columns.path=*bin*
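| where (isnull('columns.remote_address') OR NOT cidrmatch("127.0.0.0/8", 'columns.remote_address'))
| stats values(*) as * by hostIdentifier, columns.path, columns.pid
The isnull function retains events that don't have a remote_address field, such as the process_events. Field names that contain dots are wrapped in single quotes inside where, because a bare dot is read as the concatenation operator there.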

Postgres combine 3 CTEs causes duplicate rows

I'm trying to combine two select queries on two different tables that share a foreign key, project_id, and return a single result for a given project: the project itself plus a JSON array called sprints and a JSON array of backlog items. The output should look something like this:
{
"id": "1920c79d-69d7-4b63-9662-ed5333e9b735",
"name": "Test backend v1",
"backlog_items": [
{
"id": "961b2438-a16b-4f30-83f1-723a05592d68",
"name": "Another User Story 1",
"type": "User Story",
"backlog": true,
"s3_link": null,
"sprint_id": null
},
{
"id": "a2d93017-ab87-4ec2-9589-71f6cebba936",
"name": "New Comment",
"type": "Comment",
"backlog": true,
"s3_link": null,
"sprint_id": null
}
],
"sprints": [
{
"id": "1cd165c7-68f7-4a1d-b018-609989d62ed4",
"name": "Test name 2",
"sprint_items": [
{
"id": "1285825b-1669-40f2-96b8-de02ec80d8bd",
"name": "As an admin I should be able to delete an organization",
"type": "User Story",
"backlog": false,
"s3_link": null,
"sprint_id": "1cd165c7-68f7-4a1d-b018-609989d62ed4"
}
]
},
{
"id": "1cd165c7-68f7-4a1d-b018-609989d62f44",
"name": "Test name 1",
"sprint_items": []
}
]
}
If there are no backlog items or no sprints associated with the project_id, I want to return an empty list. I figured Postgres's COALESCE function might help in this case, but I'm not sure how to use it to achieve what I want.
Sprint table
id | end_date | start_date | project_id | name
--------------------------------------+----------+------------+--------------------------------------+-------------
1cd165c7-68f7-4a1d-b018-609989d62ed4 | | | 1920c79d-69d7-4b63-9662-ed5333e9b735 | Test name 2
Sprint item table
id | sprint_id | name | type | s3_link | backlog | project_id
--------------------------------------+--------------------------------------+--------------------------------------------------------+------------+---------+---------+--------------------------------------
961b2438-a16b-4f30-83f1-723a05592d68 | | Another User Story 1 | User Story | | t | 1920c79d-69d7-4b63-9662-ed5333e9b735
a2d93017-ab87-4ec2-9589-71f6cebba936 | | New Comment | Comment | | t | 1920c79d-69d7-4b63-9662-ed5333e9b735
1285825b-1669-40f2-96b8-de02ec80d8bd | 1cd165c7-68f7-4a1d-b018-609989d62ed4 | As an admin I should be able to delete an organization | User Story | | f | 1920c79d-69d7-4b63-9662-ed5333e9b735
This is the query I'm using right now, which returns multiple duplicates in the result set:
with si as (
select si.id, si.name, si.backlog, si.project_id
from sprint_items si
), s as (
select s.id, s.name, s.project_id, jsonb_agg(to_jsonb(si) - 'project_id') as sprint_items
from sprints s
left join sprint_items si
on si.sprint_id = s.id
group by s.id, s.name, s.project_id
), p as (
select p.id, p.name, jsonb_agg(to_jsonb(s) - 'project_id') as sprints,
jsonb_agg(to_jsonb(case when si.backlog = true then si end) - 'project_id') as backlog_items
from projects p
left join s
on s.project_id = p.id
left join si
on si.project_id = p.id
group by p.id, p.name
)
select to_jsonb(p) from p
where p.id = '1920c79d-69d7-4b63-9662-ed5333e9b735'
Updated
This is what the above query is producing in terms of duplicating the sprint items and sprints
{
"id": "1920c79d-69d7-4b63-9662-ed5333e9b735",
"name": "Test backend v1",
"sprints": [
{
"id": "1cd165c7-68f7-4a1d-b018-609989d62ed4",
"name": "Test name 2",
"sprint_items": [
{
"id": "1285825b-1669-40f2-96b8-de02ec80d8bd",
"name": "As an admin I should be able to delete an organization",
"type": "User Story",
"backlog": false,
"s3_link": null,
"sprint_id": "1cd165c7-68f7-4a1d-b018-609989d62ed4"
}
]
},
{
"id": "1cd165c7-68f7-4a1d-b018-609989d62ed4",
"name": "Test name 2",
"sprint_items": [
{
"id": "1285825b-1669-40f2-96b8-de02ec80d8bd",
"name": "As an admin I should be able to delete an organization",
"type": "User Story",
"backlog": false,
"s3_link": null,
"sprint_id": "1cd165c7-68f7-4a1d-b018-609989d62ed4"
}
]
},
{
"id": "1cd165c7-68f7-4a1d-b018-609989d62ed4",
"name": "Test name 2",
"sprint_items": [
{
"id": "1285825b-1669-40f2-96b8-de02ec80d8bd",
"name": "As an admin I should be able to delete an organization",
"type": "User Story",
"backlog": false,
"s3_link": null,
"sprint_id": "1cd165c7-68f7-4a1d-b018-609989d62ed4"
}
]
},
{
"id": "1cd165c7-68f7-4a1d-b018-609989d62f44",
"name": "Test name 1",
"sprint_items": [
null
]
},
{
"id": "1cd165c7-68f7-4a1d-b018-609989d62f44",
"name": "Test name 1",
"sprint_items": [
null
]
},
{
"id": "1cd165c7-68f7-4a1d-b018-609989d62f44",
"name": "Test name 1",
"sprint_items": [
null
]
}
],
"backlog_items": [
null,
{
"id": "961b2438-a16b-4f30-83f1-723a05592d68",
"name": "Another User Story 1",
"backlog": true
},
{
"id": "a2d93017-ab87-4ec2-9589-71f6cebba936",
"name": "New Comment",
"backlog": true
},
null,
{
"id": "961b2438-a16b-4f30-83f1-723a05592d68",
"name": "Another User Story 1",
"backlog": true
},
{
"id": "a2d93017-ab87-4ec2-9589-71f6cebba936",
"name": "New Comment",
"backlog": true
}
]
}
Any pointers to what functions I should read up would be greatly appreciated.
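The duplicates most likely come from joining both s and si to projects in the same FROM clause: every sprint row gets paired with every sprint item row of the project before jsonb_agg runs (2 sprints x 3 items = 6 rows, which matches the 6 entries in each array above). A possible fix, sketched under the assumption of the tables shown above (projects, sprints, sprint_items) and PostgreSQL 9.5 or later, is to aggregate each list in its own lateral subquery and wrap it in coalesce so that a missing list becomes []. The aliases b, sp and i are illustrative.

```sql
-- Assumption: PostgreSQL 9.5+ and the projects / sprints / sprint_items tables from the question.
select jsonb_build_object(
         'id', p.id,
         'name', p.name,
         'backlog_items', coalesce(b.items, '[]'::jsonb),
         'sprints', coalesce(sp.sprints, '[]'::jsonb)
       ) as project
from projects p
left join lateral (
    -- backlog items of this project, aggregated on their own
    select jsonb_agg(to_jsonb(si) - 'project_id') as items
    from sprint_items si
    where si.project_id = p.id
      and si.backlog
) b on true
left join lateral (
    -- sprints of this project, each with its own list of items
    select jsonb_agg(
             jsonb_build_object(
               'id', s.id,
               'name', s.name,
               'sprint_items', coalesce(i.items, '[]'::jsonb))
           ) as sprints
    from sprints s
    left join lateral (
        select jsonb_agg(to_jsonb(si) - 'project_id') as items
        from sprint_items si
        where si.sprint_id = s.id
    ) i on true
    where s.project_id = p.id
) sp on true
where p.id = '1920c79d-69d7-4b63-9662-ed5333e9b735';
```

Because each aggregate runs over its own subquery, the row counts no longer multiply, and jsonb_agg over zero rows yields NULL, which coalesce turns into an empty array instead of [null]. Functions worth reading up on: jsonb_agg, jsonb_build_object, coalesce and LATERAL joins.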

User selected values from JSON

When a user fills out a form in a mobile application, a JSON document is created. I load this JSON into a Postgres database and want to pull it apart and select the inputs that the user has selected.
I find this hard to explain; you really need to see the JSON and the expected results. The JSON looks like this:
{
"iso_created_at":"2019-06-25T14:50:59+10:00",
"form_fields":[
{
"field_type":"DateAndTime",
"mandatory":false,
"form_order":0,
"editable":true,
"visibility":"public",
"label":"Time & Date Select",
"value":"2019-06-25T14:50:00+10:00",
"key":"f_10139_64_14",
"field_visibility":"public",
"data":{
"default_to_current":true
},
"id":89066
},
{
"field_type":"Image",
"mandatory":false,
"form_order":6,
"editable":true,
"visibility":"public",
"label":"Photos",
"value":[
],
"key":"f_10139_1_8",
"field_visibility":"public",
"data":{
},
"id":67682
},
{
"field_type":"DropDown",
"mandatory":true,
"form_order":2,
"editable":true,
"visibility":"public",
"label":"Customer ID",
"value":"f_10139_35_13_35_1",
"key":"f_10139_35_13",
"field_visibility":"public",
"data":{
"options":[
{
"is_default":false,
"display_order":0,
"enabled":true,
"value":"f_10139_35_13_35_1",
"label":"27"
}
],
"multi_select":false
},
"id":86039
},
{
"field_type":"CheckBox",
"mandatory":true,
"form_order":3,
"editable":true,
"visibility":"public",
"label":"Measure",
"value":[
"f_7422_10_7_10_1",
"f_7422_10_7_10_2"
],
"key":"f_10139_1_5",
"field_visibility":"public",
"data":{
"options":[
{
"is_default":true,
"display_order":0,
"enabled":true,
"value":"f_7422_10_7_10_1",
"label":"Kg"
},
{
"is_default":true,
"display_order":0,
"enabled":true,
"value":"f_7422_10_7_10_2",
"label":"Mm"
}
],
"multi_select":true
},
"id":67679
},
{
"field_type":"ShortTextBox",
"mandatory":true,
"form_order":4,
"editable":true,
"visibility":"public",
"label":"Qty",
"value":"1000",
"key":"f_10139_9_9",
"field_visibility":"public",
"data":{
},
"id":85776
}
],
"address":"Latitude: -37.811812 Longitude: 144.971745",
"shape_id":6456,
"category_id":75673,
"id":345,
"account_id":778
}
Can anyone help me?
Expected results:
account_id | report_id | field_label | field_value
------------------------------------------------------------------------
778 | 345 | Time & Date Select | 2019-06-25T14:50:00+10:00
778 | 345 | Photos | []
778 | 345 | Customer ID | 27
778 | 345 | Measure | Kg
778 | 345 | Measure | Mm
778 | 345 | Qty | 1000
As @Amadan says, you might need to hardcode each field separately instead of writing a clever loop, especially for the fields "Customer ID" and "Measure", or any other field that requires it.
You can use the JSON function json_array_elements_text. Here is an example built on your data, which you can adjust to your case:
select account_id::text,report_id::text,field_label::text,
-- special cases: "Customer ID" and "Measure"
case
when field_label::text='"Customer ID"' then ((todo->'data'->'options')->0->'label')::text
when field_label::text='"Measure"' then ((todo->'data'->'options')->0->'label')::text ||',' ||((todo->'data'->'options')->1->'label')::text
else
field_value::text
end as field_value
from
(
select dato->'account_id' as account_id,dato->'id' as report_id,
(json_array_elements_text(dato->'form_fields')::json)->'label' as field_label,
(json_array_elements_text(dato->'form_fields')::json)->'value' as field_value,
(json_array_elements_text(dato->'form_fields')::json) as todo
from (
select '{"iso_created_at": "2019-06-25T14:50:59+10:00", "form_fields": [
{"field_type": "DateAndTime", "mandatory": false, "form_order": 0, "editable": true, "visibility": "public", "label": "Time & Date Select", "value": "2019-06-25T14:50:00+10:00", "key": "f_10139_64_14", "field_visibility": "public", "data":
{"default_to_current": true}, "id": 89066},
{"field_type": "Image", "mandatory": false, "form_order": 6, "editable": true, "visibility": "public", "label": "Photos", "value": [], "key": "f_10139_1_8", "field_visibility": "public", "data": {}, "id": 67682},
{"field_type": "DropDown", "mandatory": true, "form_order": 2, "editable": true, "visibility": "public", "label": "Customer ID", "value": "f_10139_35_13_35_1", "key": "f_10139_35_13", "field_visibility": "public", "data": {"options": [{"is_default": false, "display_order": 0, "enabled": true, "value": "f_10139_35_13_35_1", "label": "27"}], "multi_select": false}, "id": 86039},
{"field_type": "CheckBox", "mandatory": true, "form_order": 3, "editable": true, "visibility": "public", "label": "Measure", "value": ["f_7422_10_7_10_1","f_7422_10_7_10_2"], "key": "f_10139_1_5", "field_visibility": "public", "data": {"options": [{"is_default": true, "display_order": 0, "enabled": true, "value": "f_7422_10_7_10_1", "label": "Kg"},{"is_default": true, "display_order": 0, "enabled": true, "value": "f_7422_10_7_10_2", "label": "Mm"}], "multi_select": true}, "id": 67679},
{"field_type": "ShortTextBox", "mandatory": true, "form_order": 4, "editable": true, "visibility": "public", "label": "Qty", "value": "1000", "key": "f_10139_9_9", "field_visibility": "public", "data": {}, "id": 85776}
], "address": "Latitude: -37.811812 Longitude: 144.971745", "shape_id": 6456, "category_id": 75673, "id": 345, "account_id": 778}'::json as dato) as dat
) dat2
With this I get a result similar to yours. Take this example and adjust it to your case.
You need to unnest the values in form_fields and then pick the label and the value from that JSON object:
select fd.account_id,
fd.report_id,
ff.field ->> 'label' as field_label,
ff.field ->> 'value' as field_value
from form_data fd
left join jsonb_array_elements(data -> 'form_fields') as ff(field) on true;
The left join is needed to still see the row from form_data even if no form_fields is available in the main JSON column.
The above assumes a table form_data with the columns account_id, report_id and data (which contains the JSON)
Online example: https://rextester.com/RNIBSB94484
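Note that the query above returns the raw value, so for DropDown and CheckBox fields it shows the option keys (for example f_10139_35_13_35_1) rather than the labels in the expected result. One way to map them, as a sketch that assumes the same form_data table with a jsonb column data and PostgreSQL 9.4 or later: look up each field's data.options and keep the options whose value matches the selected value or values; fields without options fall back to the raw value. The aliases o, opt and the output column label are illustrative.

```sql
-- Assumption: table form_data(account_id, report_id, data jsonb), as in the answer above.
select fd.account_id,
       fd.report_id,
       ff.field ->> 'label'                      as field_label,
       coalesce(opt.label, ff.field ->> 'value') as field_value
from form_data fd
left join jsonb_array_elements(fd.data -> 'form_fields') as ff(field) on true
left join lateral (
    -- options of this field that the user actually selected
    select o.opt ->> 'label' as label
    from jsonb_array_elements(ff.field -> 'data' -> 'options') as o(opt)
    where case jsonb_typeof(ff.field -> 'value')
            -- multi-select: value is an array of option keys
            when 'array' then (ff.field -> 'value') ? (o.opt ->> 'value')
            -- single select: value is one option key
            else ff.field ->> 'value' = o.opt ->> 'value'
          end
) opt on true;
```

For the sample document this should give one row per selected option, so "Measure" appears twice (Kg and Mm), "Customer ID" shows 27, and "Photos" keeps its empty array because it has no options to match.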

Transform Json Nested Object Array To Table Row

I have a json like:
[
{
"Id": "1234",
"stockDetail": [
{
"Number": "10022_1",
"Code": "500"
},
{
"Number": "10022_1",
"Code": "600"
}
]
},
{
"Id": "1235",
"stockDetail": [
{
"Number": "10023_1",
"Code": "100"
},
{
"Number": "10023_1",
"Code": "100"
}
]
}
]
How can I convert it into a SQL table like the one below?
+------+---------+------+
| Id | Number | Code |
+------+---------+------+
| 1234 | 10022_1 | 500 |
| 1234 | 10022_1 | 600 |
| 1235 | 10023_1 | 100 |
| 1235 | 10023_1 | 100 |
+------+---------+------+
If you need to define typed columns you can use OPENJSON with WITH clause:
DECLARE @j nvarchar(max) = N'[
{
"Id": "1234",
"stockDetail": [
{ "Number": "10022_1",
"Code": "500"
},
{ "Number": "10022_1",
"Code": "600"
}
]
},
{
"Id": "1235",
"stockDetail": [
{ "Number": "10023_1",
"Code": "100"
},
{ "Number": "10023_1",
"Code": "100"
}
]
}
]'
select father.Id, child.Number, child.Code
from openjson (@j)
with (
Id int,
stockDetail nvarchar(max) as json
) as father
cross apply openjson (father.stockDetail)
with (
Number nvarchar(100),
Code nvarchar(100)
) as child
Result:
In your case you may try to CROSS APPLY the JSON child node with the parent node:
DECLARE @json nvarchar(max)
SET @json = N'
[
{
"Id": "1234",
"stockDetail": [
{
"Number": "10022_1",
"Code": "500"
},
{
"Number": "10022_1",
"Code": "600"
}
]
},
{
"Id": "1235",
"stockDetail": [
{
"Number": "10023_1",
"Code": "100"
},
{
"Number": "10023_1",
"Code": "100"
}
]
}
]'
SELECT
JSON_Value (i.value, '$.Id') as ID,
JSON_Value (d.value, '$.Number') as [Number],
JSON_Value (d.value, '$.Code') as [Code]
FROM OPENJSON (@json, '$') as i
CROSS APPLY OPENJSON (i.value, '$.stockDetail') as d
Output:
ID Number Code
1234 10022_1 500
1234 10022_1 600
1235 10023_1 100
1235 10023_1 100