I have a requirement to parse an XML text field in Oracle and remove specific characters/strings from the data.
Input:
{{Value : "Actual: 15' 0" X 7' 0" Opening: 15' 0" X 7' 0"", Description : "Size", PrintCode : "", PrintSequence : 80},
{Value : "Section Color: Desert Tan-,Trim Board Color: White", Description : "Color", PrintCode : "", PrintSequence : 90},
{Value : "Top Section: Standard-,Board Width: Standard", Description : "Design Modifications", PrintCode : "", PrintSequence : 100},
{Value : "Size: 2"-,Mount: Bracket Mount-,Radius: 15"", Description : "Track", PrintCode : "", PrintSequence : 110},
{Value : "Springs: Standard-,Drums: Standard-,Shaft: 16 Gauge Tube", Description : "Counterbalance", PrintCode : "", PrintSequence : 120},
{Value : "Hinge: Standard-,Struts: Standard", Description : "Hardware", PrintCode : "", PrintSequence : 130}}
I need the output like this:
"Actual: 15' 0" X 7' 0" Opening: 15' 0" X 7' 0"", Description : "Size",
"Section Color: Desert Tan-,Trim Board Color: White", Description : "Color",
"Top Section: Standard-,Board Width: Standard", Description : "Design Modifications",
"Size: 2"-,Mount: Bracket Mount-,Radius: 15"", Description : "Track",
"Springs: Standard-,Drums: Standard-,Shaft: 16 Gauge Tube", Description : "Counterbalance",
"Hinge: Standard-,Struts: Standard", Description : "Hardware"
1) I would like to remove all the brackets.
2) I would like to remove all the code that starts with PrintCode, up to the closing bracket.
3) The Value : string should be replaced with null (i.e. removed).
Any help would be greatly appreciated. Thanks. :)
Try this.
The pattern matches the curly braces {}, Value :, or PrintCode up to the closing brace, and replaces each match with a blank. Additional trims are needed to remove the trailing comma and space. .+? is a non-greedy match that searches only until the first occurrence of the next character (} in this case).
I hope there are no newlines in the data, as you said in the comments. Otherwise the results could differ, and the n pattern-matching option could be used to handle them.
Query 1:
WITH t ( input ) AS
(SELECT '{{Value : "Actual: 15'' 0" X 7'' 0" Opening: 15'' 0" X 7'' 0"", Description : "Size", PrintCode : "", PrintSequence : 80}, {Value : "Section Color: Desert Tan-,Trim Board Color: White", Description : "Color", PrintCode : "", PrintSequence : 90}, {Value : "Top Section: Standard-,Board Width: Standard", Description : "Design Modifications", PrintCode : "", PrintSequence : 100},{Value : "Size: 2"-,Mount: Bracket Mount-,Radius: 15"", Description : "Track", PrintCode : "", PrintSequence : 110},{Value : "Springs: Standard-,Drums: Standard-,Shaft: 16 Gauge Tube", Description : "Counterbalance", PrintCode : "", PrintSequence : 120},{Value : "Hinge: Standard-,Struts: Standard", Description : "Hardware", PrintCode : "", PrintSequence : 130}}'
FROM dual
)
SELECT RTRIM ( TRIM ( REGEXP_REPLACE (input,'({|}|Value +:|PrintCode.+?}(,|}))', '')) ,',') AS output
FROM t
Results:
"Actual: 15' 0" X 7' 0" Opening: 15' 0" X 7' 0"", Description : "Size", "Section Color: Desert Tan-,Trim Board Color: White", Description : "Color", "Top Section: Standard-,Board Width: Standard", Description : "Design Modifications", "Size: 2"-,Mount: Bracket Mount-,Radius: 15"", Description : "Track", "Springs: Standard-,Drums: Standard-,Shaft: 16 Gauge Tube", Description : "Counterbalance", "Hinge: Standard-,Struts: Standard", Description : "Hardware"
I have a table with multiple JSON rows. The JSON structure stores various key/value pairs, like:
"official_form_attributes":{"81459" : "Y", "81460" : " ", "81293" : "1~Yes", "80985" : " ", "80953" : "1", "80952" : " ", "80951" : "8~Forward", "81291" : "1~Yes", "81295" : "1~Yes", "81294" : "1~Yes", "80986" : "1~PRED", "81292" : "1~Yes", "80954" : "4", "80950" : " "}
"official_form_attributes":{"81321" : "6", "81315" : "15/06/2020", "81364" : "Approved", "81320" : "100000", "81466" : " ", "81314" : "1~Pucca", "80958" : "9~Forward to Tahsildar", "81318" : "20", "81325" : "20", "81465" : "Y", "81322" : "20000", "81324" : "1~Partially Damaged", "81323" : "20000", "81317" : "30", "81326" : "5200", "81319" : "600", "81316" : " "}
"official_form_attributes":{"82817" : " ", "82818" : " ", "82835" : "4", "81486" : "1~Yes", "82855" : "4", "83240" : "29/10950004/2020/07/09/29006271/10950004_9356416_3914_1594303859111.pdf", "81487" : " ", "80963" : "approved", "81488" : "5200", "80962" : "11~Approve by Tahsildar"}.
I have to find a key in this table. The result of my query returns all rows for that key, including rows with empty/null values.
But my requirement is to return only those rows that actually have a value for that key.
CASE 1
My query:
select application_id, current_process_id, processing_json->'official_form_attributes'->'81488'
from schm_ka.processing_data_json
where application_id = 9356416;
Result:
applid keyvalue
9356416 ""
9356416 ""
9356416 "5200"
But I need only this
applid keyvalue
9356416 "5200"
CASE 2
select application_id, current_process_id, processing_json->'official_form_attributes'->'81488',processing_json->'official_form_attributes'->'81315'
from schm_ka.processing_data_json
where application_id = 9356416;
Result:
applid key1value key2value
9356416 "" ""
9356416 "" "15/06/2020"
9356416 "5200" ""
But I need only this
applid key1value key2value
9356416 "5200" "15/06/2020"
How to do this?
For case 1, just put it into the WHERE clause:
select application_id,
processing_json->'official_form_attributes' ->> '81488'
from schm_ka.processing_data_json
where application_id = 9356416
and processing_json -> 'official_form_attributes' ? '81488';
The ? operator tests whether a key is present in the JSON value.
For case 2 you need aggregation:
select application_id,
max(processing_json->'official_form_attributes'->>'81488') as key_value_1,
max(processing_json->'official_form_attributes'->>'81315') as key_value_2
from schm_ka.processing_data_json
where application_id = 9356416
and processing_json->'official_form_attributes' ?| array['81488', '81315']
group by application_id;
The ?| operator tests whether any of the keys in the array is present in the JSON value. As you apparently get multiple rows matching that condition, the aggregation is needed to collapse them into one row.
I have been trying to export the contents of a JSON file to an SQL Server table. However, despite the presence of multiple rows in the JSON, the output SQL table consists of only the first row from the JSON. The code I am using is as follows:
DROP TABLE IF EXISTS testingtable;
DECLARE @json VARCHAR(MAX) = '{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" },
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }';
SELECT * INTO testingtable FROM OPENJSON(@json) WITH (_id int, city varchar(20), loc float(50), pop int, state varchar(5)
)
SELECT * FROM testingtable
And the output obtained contains only the first row.
A multi-object JSON text must be enclosed in square brackets, for example:
[
{first data set},
{second data set}, .....
]
You can either add the square brackets while passing the data to this query, or add them to your @json variable (e.g. '[' + @json + ']'):
DECLARE @json VARCHAR(MAX) = '{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" },
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }';
SELECT * INTO testingtable FROM OPENJSON ('[' + @json + ']') WITH (_id int, city varchar(20), loc float(50), pop int, state varchar(5)
)
SELECT * FROM testingtable
The string isn't valid JSON. You can't have two root objects in a JSON document. Properly formatted, the JSON string looks like this:
DECLARE @json VARCHAR(MAX) = '{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" },
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }';
It should be:
DECLARE @json VARCHAR(MAX) = '[{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" },
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }
]';
Looks like OPENJSON parsed the first object and stopped as soon as it encountered the invalid text.
The quick & dirty way to fix this would be to add the missing square brackets :
SELECT * FROM OPENJSON('[' + @json + ']') WITH (_id int, city varchar(20), loc float(50), pop int, state varchar(5))
I suspect that string came from a log or event file that stores individual records on separate lines. That's not valid JSON, nor is there any kind of standard or specification for this (name squatters notwithstanding), but a lot of high-traffic applications use it, e.g. in log files or event streaming.
The reason they do this is that there's no need to construct or read the entire array to get a record. It's easy to just append a new line for each record. Reading huge files and processing them in parallel is also easier - just read the text line by line and feed it to workers. Or split the file in N parts to the nearest newline and feed individual parts to different machines. That's how Map-Reduce works.
That's why adding the square brackets is a dirty solution - you have to read the entire text from a multi-MB or GB-sized file before you can parse it. That's not what OPENJSON was built to do.
The proper solution would be to read the file line-by-line using another tool, parse the records and insert the values into the target tables.
If you know the JSON docs will not contain any internal newline characters, you can split the string with string_split. OPENJSON doesn't care about leading whitespace or trailing ,. That way you avoid adding the [ ] characters, and don't have to parse it as one big document.
EG:
DROP TABLE IF EXISTS testingtable;
DECLARE @jsonFragment VARCHAR(MAX) = '{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" },
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }';
SELECT *
INTO testingtable
FROM string_split(@jsonFragment, CHAR(10)) docs
cross apply
(
select *
from openjson(docs.value)
WITH (_id int, city varchar(20), loc float(50), pop int, state varchar(5))
) d
SELECT * FROM testingtable
This format is what you might call a "JSON Fragment", by analogy with XML. And this is another difference between XML and JSON in SQL Server. For XML the engine is happy to parse and store XML Fragments, but not with JSON.
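To illustrate the line-per-record idea outside SQL Server entirely, here is a minimal JavaScript sketch that parses such a fragment line by line; the sample objects are shortened versions of the ones above:

```javascript
// Parse a "JSON fragment" (one object per line, lines separated by ",\n")
// without wrapping the whole text in brackets first.
const fragment =
  '{ "_id" : "01001", "city" : "AGAWAM", "pop" : 15338, "state" : "MA" },\n' +
  '{ "_id" : "01002", "city" : "CUSHMAN", "pop" : 36963, "state" : "MA" }';

const rows = fragment
  .split('\n')                            // one document per line
  .map(line => line.replace(/,\s*$/, '')) // drop the trailing comma
  .map(line => JSON.parse(line));         // parse each object on its own

console.log(rows.map(r => r.city));
```

Because each line is parsed independently, a huge file can be streamed record by record instead of being loaded whole, which is exactly the advantage of the format described above.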
This question already has answers here:
Retrieve only the queried element in an object array in MongoDB collection
(18 answers)
Closed 5 years ago.
I am quite a newbie to MongoDB. My requirement is to filter within the array of objects of a single document.
For example: below is my JSON document. I want to query the Combinations to find all entries with Manufacturer "abc manufacturer".
The query I tried is
db.Product.find({"Combinations": {$elemMatch: {"Manufacturer": "abc manufacturer"}}}). Unfortunately it's not returning only the elements with abc manufacturer, but all of them.
My result should look exactly like the screenshot attached below. That was done in SQL Server; now I want the equivalent query in MongoDB. Could some experts on the forum shed some light?
Screenshot
{
"_id" : ObjectId("59e8c938ab3166800493273f"),
"ProductId" : 26,
"Combinations" : [
{
"#Type" : "S",
"Manufacturer" : "abc manufacturer",
"Model Name" : "Squatting Urinal",
"Size" : "475 x 365 x 105 mm",
"Colour" : "White"
},
{
"#Type" : "S",
"Manufacturer" : "abc manufacturer",
"Model Name" : "Squatting",
"Size" : "430 x 350 x 100 mm"
},
{
"#Type" : "S",
"Manufacturer" : "def manufacturer",
"Model Name" : "Squatting Urinal",
"Size" : "440 x 355 x 102 mm",
"Colour" : "White"
},
{
"#Type" : "S",
"Manufacturer" : "xyz manufacturer",
"Model Name" : "Squatting Urinal",
"Size" : "440 x 355 x 102 mm",
"Colour" : "Ivory"
},
{
"#Type" : "S",
"Manufacturer" : "ghi manufacturer",
"Model Name" : "Squatting Pan - 861"
},
{
"#Type" : "S",
"Manufacturer" : "xyz manufacturer",
"Model Name" : "Mateo",
"Size" : "470 x 365 x 100 mm"
},
{
"#Type" : "S",
"Manufacturer" : "xyz manufacturer",
"Model Name" : "Squatting",
"Size" : "340 x 435 x 100 mm",
"Colour" : "White"
}
]
}
If I understand correctly, what you are searching for is all the Combinations where the Manufacturer is abc.
The query you did gives you all the documents that contain a Combination with the said Manufacturer.
You should check out aggregations for that.
https://docs.mongodb.com/manual/aggregation/
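For example, a pipeline using the $filter operator (available since MongoDB 3.2) can project only the matching array elements. This is an untested sketch based on the document in the question; the plain-JavaScript equivalent of the $filter condition is shown below it:

```javascript
// Sketch of an aggregation that keeps only the array elements whose
// Manufacturer matches. Field names follow the question's document;
// this is illustrative, not run against a live database.
const pipeline = [
  { $match: { 'Combinations.Manufacturer': 'abc manufacturer' } },
  { $project: {
      ProductId: 1,
      Combinations: {
        $filter: {
          input: '$Combinations',
          as: 'c',
          cond: { $eq: ['$$c.Manufacturer', 'abc manufacturer'] }
        }
      }
  } }
];

// What $filter does, expressed in plain JavaScript on a sample document:
const doc = {
  ProductId: 26,
  Combinations: [
    { Manufacturer: 'abc manufacturer', 'Model Name': 'Squatting Urinal' },
    { Manufacturer: 'abc manufacturer', 'Model Name': 'Squatting' },
    { Manufacturer: 'def manufacturer', 'Model Name': 'Squatting Urinal' }
  ]
};
const filtered = doc.Combinations.filter(
  c => c.Manufacturer === 'abc manufacturer'
);
console.log(filtered.length);
```

You would run the pipeline with db.Product.aggregate(pipeline); $elemMatch in find() only selects documents, it does not filter the array inside them.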
I have had much success building my own little search with Elasticsearch in the background. But there is one thing I couldn't find in the documentation.
I'm indexing the names of musicians and bands. There is one band called "The The", and due to the stop-words list this band is never indexed.
I know I can ignore the stop-words list completely, but this is not what I want, since the results for searches for other bands like "the who" would explode.
So, is it possible to keep "The The" in the index without disabling stop words entirely?
You can use the synonym filter to convert The The into a single token, e.g. thethe, which won't be removed by the stopwords filter.
First, configure the analyzer:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
"settings" : {
"analysis" : {
"filter" : {
"syn" : {
"synonyms" : [
"the the => thethe"
],
"type" : "synonym"
}
},
"analyzer" : {
"syn" : {
"filter" : [
"lowercase",
"syn",
"stop"
],
"type" : "custom",
"tokenizer" : "standard"
}
}
}
}
}
'
Then test it with the string "The The The Who".
curl -XGET 'http://127.0.0.1:9200/test/_analyze?pretty=1&text=The+The+The+Who&analyzer=syn'
{
"tokens" : [
{
"end_offset" : 7,
"position" : 1,
"start_offset" : 0,
"type" : "SYNONYM",
"token" : "thethe"
},
{
"end_offset" : 15,
"position" : 3,
"start_offset" : 12,
"type" : "<ALPHANUM>",
"token" : "who"
}
]
}
"The The" has been tokenized as "the the", and "The Who" as "who" because the preceding "the" was removed by the stopwords filter.
To stop or not to stop
Which brings us back to the question: should we include stopwords or not? You said:
I know I can ignore the stop words list completely
but this is not what I want since the results searching
for other bands like "the who" would explode.
What do you mean by that? Explode how? Index size? Performance?
Stopwords were originally introduced to improve search engine performance by removing common words which are likely to have little effect on the relevance of a query. However, we've come a long way since then. Our servers are capable of much more than they were back in the 80s.
Indexing stopwords won't have a huge impact on index size. For instance, to index the word the means adding a single term to the index. You already have thousands of terms - indexing the stopwords as well won't make much difference to size or to performance.
Actually, the bigger problem is that the is very common and thus will have a low impact on relevance, so a search for "The The concert Madrid" will prefer Madrid over the other terms.
This can be mitigated by using a shingle filter, which would result in these tokens:
['the the','the concert','concert madrid']
While the may be common, the the isn't and so will rank higher.
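The bigram tokens above can be sketched in JavaScript (illustrative only; the real Elasticsearch shingle filter can also emit the original unigrams alongside the bigrams):

```javascript
// Generate word bigrams ("shingles" of size 2) from lowercased,
// whitespace-tokenized text, as in the example token list above.
function shingles(text) {
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const out = [];
  for (let i = 0; i < tokens.length - 1; i++) {
    out.push(tokens[i] + ' ' + tokens[i + 1]);
  }
  return out;
}

console.log(shingles('The The concert Madrid'));
// ['the the', 'the concert', 'concert madrid']
```

Because "the the" survives as a single term, its document frequency is low and it contributes real relevance, unlike the ubiquitous single token "the".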
You wouldn't query the shingled field by itself, but you could combine a query against a field tokenized by the standard analyzer (without stopwords) with a query against the shingled field.
We can use a multi-field to analyze the text field in two different ways:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
"mappings" : {
"test" : {
"properties" : {
"text" : {
"fields" : {
"shingle" : {
"type" : "string",
"analyzer" : "shingle"
},
"text" : {
"type" : "string",
"analyzer" : "no_stop"
}
},
"type" : "multi_field"
}
}
}
},
"settings" : {
"analysis" : {
"analyzer" : {
"no_stop" : {
"stopwords" : "",
"type" : "standard"
},
"shingle" : {
"filter" : [
"standard",
"lowercase",
"shingle"
],
"type" : "custom",
"tokenizer" : "standard"
}
}
}
}
}
'
Then use a multi_match query to query both versions of the field, giving the shingled version more "boost"/relevance. In this example the text.shingle^2 means that we want to boost that field by 2:
curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
{
"query" : {
"multi_match" : {
"fields" : [
"text",
"text.shingle^2"
],
"query" : "the the concert madrid"
}
}
}
'
tbar : new Ext.Toolbar({
items : [
'',{
xtype : 'radiofield',
name : 'searchType',
value : 'order_name',
boxLabel : 'Order Name'
},'',{
xtype : 'radiofield',
name : 'searchType',
value : 'order_no',
boxLabel : 'Order No'
},'',{
xtype : 'radiofield',
name : 'searchType',
value : 'status',
boxLabel : 'Status'
},'=',{
xtype : 'textfield',
name : 'keyword',
value : 'Keyword'
},'|',{
xtype : 'datefield',
name : 'order_from',
fieldLabel : 'From ',
labelStyle : 'width:50px',
value : new Date()
},'~',{
xtype : 'datefield',
name : 'order_to',
fieldLabel : "To ",
labelStyle : 'width:50px',
value : new Date()
},'|',{
xtype : 'button',
text : "Search"
}
]
})
I put my questions into the attached image.
(Space between the radio buttons, and removing the strange right-margin space in the datefield.)
Also, the button in the tbar doesn't look like a button; it looks like plain text. Does anybody know how to make it a good-looking button?
Thank you!
To add space you can put HTML (such as a non-breaking space, &nbsp;) inside the quotes; essentially any HTML can be inserted, including images.
The extra space is related to the width the date field is trying to grab. You should not set the width in labelStyle, but set it directly, so that the field can properly calculate the amount of space it needs.
For example:
labelWidth: 50, //label only
width: 200, //label + input
Your live example: http://jsfiddle.net/dbrin/PhAbR/2/