How can we use regexp in sql query to fetch data(key/value) from a json field.
For more understanding i have a table market inside that book is json having a field as title .
{
"book": [
{
"category": "reference",
"author": "Nigel Rees",
"title": ["A123"],
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": ["A1235"],
"price": 12.99
}]
The query i have written is :
select * from market where books REGEXP('"title":\s*(\["[A-Za-z0-9]*"\])');
But I do not get output.
there you go - follow the url
MY_Sql_5.7_JSON_FUNCTIONS
JSON_EXTRACT - is the method - what you need -
JSON_EXTRACT(json_doc, path[, path] ...) - is the syntax for it.
On the page you will find the examples also. Check it out.
Related
I have data like this:
[ {
"name": "Apple",
"price": 1,
"type": "Food"
},
{
"name": "Apple",
"price": 0.90,
"type": "Food"
},
{
"name": "Apple",
"price": 1000,
"type": "Computer"
},
{
"name": "Apple",
"price": 900,
"type": "Computer"
}
]
Using the Great Expectations automatic profile, a valid range for price would be 0.90 to 1,000. Is it possible to have it slice on the type dimension, so food would be 0.90 to 1 and computer would be 900 to 1000? Or would I need to transform the data first using dbt? I know the column that will create the dimension, but I don't know the particular values.
Also, same question on differences between rows. Like if they had a timestamp, instead of 900 to 1000, it validates -100 for the change in value.
I used this approach to first load the data in a pandas data frame:
https://discuss.greatexpectations.io/t/how-can-i-use-the-return-format-unexpected-index-list-to-select-row-from-a-pandasdataset/70/2
I have a CSV file with several million rows, and want to load it as a PostgreSQL table. One of the rows in the column 'json_doc' as an example contains:
{"id": <>,
"base":
{"ateco":
[
{
"code": "<>",
"rootCode": "<>",
"description": "<>"
}
],
"founded": "<>",
"legalName": "<>",
"legalForms":
[
{
"name": "<>",
"level": <>
},
{
"name": "<>",
"level": <>
}
]
},
"name": "<>",
"people":
{
"items":
[
{
"name": "<>",
"givenName": "<>",
"familyName": "<>"
}
]
},
"country": "<>",
"locations": {}
}
Which as you can see has many nested dictionaries. And there are several million of these.
I'd like to get this file into an SQL table with even the sub-dictionary values in their own columns. How can I do this? It would seem I have to use some sort of name spacing technique for the nested data as there are some duplicate keys i.e. 'name'.
The data will be analysed using Pandas, but I'd like to get this straight into Postgres if possible. Any assistance greatly appreciated.
The result will look like:
id | base_ateco_code | etc | base_ateco_legalForms_name | etc |
Unless there are any ideas about this - it's a pretty open project from my employer - I just need to be able to use this information as part of a JOIN with another table.
Many thanks.
I have a json string that I need to modify
{"RecordCount":3,"Top":10,"Skip":0,"SelectedSort":"Seed asc","value":[{"AccountProductListId":22091612871138,"Name":"April 4th 2018","AccountId":256813438078643,"IsPublic":false,"Comment":"Test order sheet","Quantity":3},{"AccountProductListId":166305848801939,"Name":"test","AccountId":256813438078643,"IsPublic":false,"Comment":"","Quantity":1},{"AccountProductListId":21177711287586,"Name":"Test Order sheet","AccountId":256813438078643,"IsPublic":true,"Comment":"the very first sheet","Quantity":2}]}
Inside value the array looks like this:
"value": [{
"AccountProductListId": 22091612871138,
"Name": "April 4th 2018",
"IsPublic": false,
"Comment": "Test order sheet",
"Quantity": 3
}, {
"AccountProductListId": 166305848801939,
"Name": "test",
"IsPublic": false,
"Comment": "",
"Quantity": 1
}, {
"AccountProductListId": 21177711287586,
"Name": "Test Order sheet",
"IsPublic": true,
"Comment": "the very first sheet",
"Quantity": 2
}],
What I need to do is append some data from another table:
AccountProductListId ProductID
21177711287586 97096131867163|32721319938943
22091612871138 97096131867163|145461009584740|130005306921282
166305848801939 8744071222157
As you can see the AccountProductListId is already in the JSON result so I should know which array it should go to. The only problem is I don't know the syntax to merge the ProductID data into its specific array index. The JSON array could have more than 3 items.
Essentially ending up with something like this:
"value": [{
"AccountProductListId": 22091612871138,
"Name": "April 4th 2018",
"IsPublic": false,
"Comment": "Test order sheet",
"Quantity": 3,
"ProductID": "97096131867163|145461009584740|130005306921282"
}, {
"AccountProductListId": 166305848801939,
"Name": "test",
"IsPublic": false,
"Comment": "",
"Quantity": 1,
"ProductID": "8744071222157"
}, {
"AccountProductListId": 21177711287586,
"Name": "Test Order sheet",
"IsPublic": true,
"Comment": "the very first sheet",
"Quantity": 2,
"ProductID": "97096131867163|32721319938943"
}],
Any information would be greatly appreciated. Thanks.
Process with SQL Server
Prior to SQL Server 2016, there is not in-built support to read or write JSON.
Starting SQL Server 2016, you can use OPENJSON rowset function to read JSON and FOR JSON clause to write JSON.
See https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server
The approach would be to use OPENJSON to read the JSON string as a rowset, join it with the table to pickup ProductID and use FOR JSON to convert back to JSON.
Process outside SQL Server
Depending on your situation, it might be simpler to parse JSON outside SQL Server. If going that route, then you could
Collect all the AccountProductListIDs from the parsed JSON
Send the collected id to SQL Server via a stored procedure that takes a TVP input and outputs the AccountProductListID -> ProductID mapping
Inject the ProductIDs into JSON object and serialize back to string
I am pushing my logs to elasticsearch which stores a typical doc as-
{
"_index": "logstash-2014.08.11",
"_type": "machine",
"_id": "2tSlN1P1QQuHUkmoJfkmnQ",
"_score": null,
"_source": {
"category": "critical log with list",
"app_name": "attachment",
"stacktrace_array": [
"this is the first line",
"this is the second line",
"this is the third line",
"this is the fourth line",
],
"#timestamp": "2014-08-11T13:30:51+00:00"
},
"sort": [
1407763851000,
1407763851000
]
}
Kibana makes searching substrings very easy. For example searching for "critical" in the dashboard will fetch all logs with the word critical in any string mapped value.
How do i go about searching for something like "second line" which is a string nested in an array within my doc?
It would be a simple field:<search_term> query, like -
"query": {
"query_string": {
"query": "stacktrace_array:*second line*"
}
...
So in layman terms, for Kibana dashboard, put your search query like so -
stacktrace_array:*second line*
is anyone know, how to exclude some topics with specified type(s) using search api and mql?
For example i'm try to find all topics "Voodoo People", and exclude only those, that have composition and release types, and sort result by score desc: http://tinyurl.com/3tjkb7y.
Sorting work perfect, but i can't find functionality for excluding :(
I'm try to use mql_filter: http://tinyurl.com/644xkow, but releases still there.
And one more question: i see in type_strict param possible values: "all", "any", "should". But there is no value "not" or "not in". Is needed result can be obtained in any other way?
The syntax that you're looking for is "optional" : "forbidden". In your query that would look like this:
[{
"search": {
"query": "Voodoo People",
"score": null,
"mql_filter": [{
"type": {
"id": "/music/release",
"optional": "forbidden"
}
}]
},
"name": null,
"id": null,
"type": [],
"/common/topic/notable_for": {
},
"limit": 15,
"sort": "-search.score"
}]