How to make a intersect in SOLR? - apache

I am implementing a solr project, with the structure below of my indexed object.
{
"OBJECT_HEADER_ID": 173604,
"CHARACTERISTIC_VALUE_ID": 143287,
"OBJECT_TYPE_ID": 1,
"SEQUENCE": 0,
"CHARACTERISTIC_ID": 1488,
"OBJECT_VARIANT_ID": 169941,
"ID": "84445897",
"TYPE": 0
},
{
"OBJECT_HEADER_ID": 173604,
"CHARACTERISTIC_VALUE_ID": 23502,
"OBJECT_TYPE_ID": 1,
"SEQUENCE": 0,
"CHARACTERISTIC_ID": 992,
"OBJECT_VARIANT_ID": 169941,
"ID": "84445898",
"TYPE": 0
}
And I need to make a intersect between various results in sub queries, and I don't found nothing on the WEB about how to make the query, for example:
-> Get all results that have (CHARACTERISTIC_ID = 1488 and CHARACTERISTIC_VALUE_ID = 143287) INTERSECT BY OBJECT_VARIANT_ID WITH (CHARACTERISTIC_ID = 992 and CHARACTERISTIC_VALUE_ID = 23502).

Related

Azure Data Factory JSON syntax

In Azure Data Factory, I have a copy activity. The data source is the response body from a REST API POST request.
The sink is a SQL table. The problem is that, even though my JSON data contains multiple rows, only the first row is getting copied.
The source data looks like the following:
{
"offset": 0,
"limit": 1000,
"total": 65,
"loaded": 34,
"unloaded": 31,
"cubeCaches": [
{
"id": "MxMUVDN0Q1MzAk5MDg6RDkxREQxMUU5RDBDNzR2NMTk6YWNsZGxwMTJtc3QuY2952aXppZW50aW5==",
"projectId": "15D91DD11E9D0C74B3319",
"source": {
"name": "12302021",
"id": "07EF95111EC7F954158",
"type": "cube"
},
"state": {
"active": true,
"dirty": false,
"infoDirty": false,
"persisted": true,
"processing": false,
"loadedState": "loaded"
},
"lastUpdateTime": "2022-01-24T14:22:30Z",
"lastHitTime": "2022-02-14T20:02:02Z",
"hitCount": 1,
"size": 798720,
"creatorId": "D4E8BFD56085",
"lastUpdateJob": 18937,
"openViewCount": 0,
"creationTime": "2022-01-24T15:07:24Z",
"historicHitCount": 22,
"dataLanguages": [],
"rowCount": 2726,
"columnCount": 9
},
{
"id": "UYwMTIxMUFNjkxMUU5RDBDMTRCNkMwMDgwRUYzNUQ0MUI6YWNsZjLmNvbQ==",
"projectId": "120D0C1480EF35D41B",
"source": {
"name": "All Clients (YTD)",
"id": "49E5B13466251CD0B54E8F",
"type": "cube"
},
"state": {
"active": true,
"dirty": false,
"infoDirty": false,
"persisted": true,
"processing": false,
"loadedState": "loaded"
},
"lastUpdateTime": "2022-01-03T01:00:01Z",
"hitCount": 0,
"size": 82488152,
"creatorId": "1E2AFB011E80EF35FF14",
"lastUpdateJob": 364091,
"openViewCount": 0,
"creationTime": "2022-02-14T01:04:55Z",
"historicHitCount": 0,
"dataLanguages": [],
"rowCount": 8146903,
"columnCount": 13
}
}
I want to add a row in the Sink table (SQL) for every "id" in the JSON. However, when I run the activity, only the first record gets copied. It's mapped correctly, but I want it to copy all rows in the JSON, not just 1.
My Mapping tab in Azure Data Factory looks like this:
What am I doing wrong here? I'm thinking there is something wrong with my "Source" syntax for each of the columns...
In $cubeCashes[0][...] you're explicitly mapping the first element from this array into columns, and that's why only one row lands in the Sink.
I don’t know a way to achieve what you intend with copy activity only. I would use the Mapping Data Flow here, and inlide I would flatten (Flatten activity) your data to get the array of objects.
Then from this flattened dataset you could use a Derived Column to map the fields in JSON into columns of your target, Select, to remove unwanted original fields, and Sink it into your target location.

Update jsonb object in postgres

One of my column is jsonb and have value in the format. The value of a single row of column is below.
{
"835": {
"cost": 0,
"name": "FACEBOOK_FB1_6JAN2020",
"email": "test.user#silverpush.co",
"views": 0,
"clicks": 0,
"impressions": 0,
"campaign_state": "paused",
"processed":"in_progress",
"modes":["obj1","obj2"]
},
"876": {
"cost": 0,
"name": "MARVEL_BLACK_WIDOW_4DEC2019",
"email": "test.user#silverpush.co",
"views": 0,
"clicks": 0,
"impressions": 0,
"campaign_state": "paused",
"processed":"in_progress",
"modes":["obj1","obj2"]
}
}
I want to update campaign_info(column name) column's the inner key "processed" and "models" of the campaign_id is "876".
I have tried this query:
update safe_vid_info
set campaign_info -> '835' --> 'processed'='completed'
where cid = 'kiywgh';
But it didn't work.
Any help is appreciated. Thanks.
Is this what you want?
jsonb_set(campaign_info, '{876,processed}', '"completed"')
This updates the value at path "876" > "processed" with value 'completed'.
In your update query:
update safe_vid_info
set campaign_info = jsonb_set(campaign_info, '{876,processed}', '"completed"')
where cid = 'kiywgh';

bigquery nested object : No such field

I have a table with this schema :
I'm trying to upload some data from Google Coud Storage using the python client. The file is JSON newline delimited. Most of my lines don't have the field "passenger_origin.accuracy" but when the filed is present I have the following error :
Error while reading
data, error message: JSON parsing error in row starting at position
2122510: No such field: driver_origin.accuracy. (error code: invalid)
Error while reading
data, error message: JSON parsing error in row starting at position
2126317: No such field: passenger_origin.accuracy. (error code:
invalid)
Example of an invalid row :
{
"id": 1479443,
"is_obsolete": 0,
"seat_count": 1,
"is_ticket_checked": 0,
"score": 0.3709318902,
"is_multimodal": 0,
"fake_paths": 0,
"passenger_origin": {
"id": 2204,
"poi_uuid": "15b4e52c-7c58-442c-98df-1eb06079f6bb",
"user_id": 1987,
"accuracy": 250.0,
"disabled": 0,
"last_update": "2017-03-10T15:15:39",
"created": "2016-02-05T17:06:26",
"modified_by_user": 1,
"is_recurrent": 0,
"source": 1,
"hidden_by_user": 0,
"kind": 2,
},
"driver_origin": {
"id": 412491,
"poi_uuid": "47e90b6d-e178-4e02-9f02-f4ea5f8beaa1",
"user_id": 71471,
"disabled": 0,
"last_update": "2017-11-02T10:09:09",
"created": "2017-11-02T10:09:09",
"modified_by_user": 0,
"is_recurrent": 0,
"source": 1,
"hidden_by_user": 0,
"kind": 2,
},
"passenger_destination": {
"id": 2203,
"poi_uuid": "c531c3ca-47f0-4003-8098-1272fee8d018",
"user_id": 1987,
"accuracy": 250.0,
"disabled": 0,
"last_update": "2017-03-10T15:12:42",
"created": "2016-02-05T17:06:19",
"modified_by_user": 1,
"is_recurrent": 0,
"source": 1,
"hidden_by_user": 0,
"kind": 1,
}
}
The table is created before the upload of the data and is not modified since. I don't understand why the upload is failing on theses fields ? Do the RECORD fields have to be REPEATED ?
To ignore the fields that aren't present in the schema, use a combination of:
configuration.load.ignoreUnknownValues
configuration.load.maxBadRecords
Setting the first to true and the second to some arbitrarily-high number, e.g. 100000, will enable the load to succeed even if there are extra fields.
The problem was configuration.load.autodetect was set to True. I set it to False and the problem was fix

RavenDB facet takes to long query time

I am new to ravendb and trying it out to see if it can do the job for the company i work for .
i updated a data of 10K records to the server .
each data looks like this :
{
"ModelID": 371300,
"Name": "6310I",
"Image": "0/7/4/9/28599470c",
"MinPrice": 200.0,
"MaxPrice": 400.0,
"StoreAmounts": 4,
"AuctionAmounts": 0,
"Popolarity": 16,
"ViewScore": 0.0,
"ReviewAmount": 4,
"ReviewScore": 40,
"Cat": "E-Cellphone",
"CatID": 1100,
"IsModel": true,
"ParamsList": [
{
"ModelID": 371300,
"Pid": 188396,
"IntVal": 188402,
"Param": "Nokia",
"Name": "Manufacturer",
"Unit": "",
"UnitDir": "left",
"PrOrder": 0,
"IsModelPage": true
},
{
"ModelID": 371305,
"Pid": 398331,
"IntVal": 1559552,
"Param": "1.6",
"Name": "Screen size",
"Unit": "inch",
"UnitDir": "left",
"PrOrder": 1,
"IsModelPage": false
},.....
where ParamsList is an array of all the attributes of a single product.
after building an index of :
from doc in docs.FastModels
from docParamsListItem in ((IEnumerable<dynamic>)doc.ParamsList).DefaultIfEmpty()
select new { Param = docParamsListItem.IntVal, Cat = doc.Cat }
and a facet of
var facetSetupDoc = new FacetSetup
{
Id = "facets/Params2Facets",
Facets = new List<Facet> { new Facet { Name = "Param" } }
};
and search like this
var facetResults = session.Query<FastModel>("ParamNewIndex")
.Where(x => x.Cat == "e-cellphone")
.ToFacets("facets/Params2Facets");
it takes more than a second to query and that is on only 10K of data . where our company has more than 1M products in DB.
am i doing something wrong ?
In order to generate facets, you have to check for each & every individual value of docParamsListItem.IntVal. If you have a lot of them, that can take some time.
In general, you shouldn't have a lot of facets, since that make no sense, it doesn't help the user.
For integers, you usually use ranges, instead of the actual values.
For example, price within a certain range.
You use just the field for things like manufacturer, or the MegaPixels count, where you have lot number or items (about a dozen or two)
You didn't mention which build you are using, but we made some major improvements there recently.

Is it possible to turn an array returned by the Mongo GeoNear command (using Ruby/Rails) into a Plucky object?

As a total newbie I have been trying to get the geoNear command working in my rails application and it appear to be working fine. The major annoyance for me is that it is returning an array with strings rather than keys which I can call on to pull out data.
Having dug around, I understand that MongoMapper uses Plucky to turn the the query resultant into a friendly object which can be handled easily but I haven't been able to find out how to transform the result of my geoNear query into a plucky object.
My questions are:
(a) Is it possible to turn this into a plucky object and how do i do that?
(b) If it is not possible how can I most simply and systematically extract each record and each field?
here is the query in my controller
#mult = 3963 * (3.14159265 / 180 ) # Scale to miles on earth
#results = #db.command( {'geoNear' => "places", 'near'=> #search.coordinates , 'distanceMultiplier' => #mult, 'spherical' => true})
Here is the object i'm getting back (with document content removed for simplicity)
{"ns"=>"myapp-development.places", "near"=>"1001110101110101100100110001100010100010000010111010", "results"=>[{"dis"=>0.04356444023196527, "obj"=>{"_id"=>BSON::ObjectId('4ee6a7d210a81f05fe000001'),...}}], "stats"=>{"time"=>0, "btreelocs"=>0, "nscanned"=>1, "objectsLoaded"=>1, "avgDistance"=>0.04356444023196527, "maxDistance"=>0.0006301239824196907}, "ok"=>1.0}
Help is much appreciated!!
Ok so lets say you store the results into a variable called places_near:
places_near = t.command( {'geoNear' => "places", 'near'=> [50,50] , 'distanceMultiplier' => 1, 'spherical' => true})
This command returns an hash that has a key (results) which maps to a list of results for the query. The returned document looks like this:
{
"ns": "test.places",
"near": "1100110000001111110000001111110000001111110000001111",
"results": [
{
"dis": 69.29646421910687,
"obj": {
"_id": ObjectId("4b8bd6b93b83c574d8760280"),
"y": [
1,
1
],
"category": "Coffee"
}
},
{
"dis": 69.29646421910687,
"obj": {
"_id": ObjectId("4b8bd6b03b83c574d876027f"),
"y": [
1,
1
]
}
}
],
"stats": {
"time": 0,
"btreelocs": 1,
"btreelocs": 1,
"nscanned": 2,
"nscanned": 2,
"objectsLoaded": 2,
"objectsLoaded": 2,
"avgDistance": 69.29646421910687
},
"ok": 1
}
To iterate over the responses just iterate as you would over any list in ruby:
places_near['results'].each do |result|
# do stuff with result object
end