Update specific field on SOLR index - lucene

I want to use Solr to search articles.
I have 3 tables:
Group (id, group name)
ArticleBase (id, groupId, some other fields)
Article (id, articleBaseId, title, date, ...)
In the Solr schema.xml file I define all the Article fields combined with the ArticleBase fields (so that I can use a single Solr index), like this: (id, articleBaseId, groupId, ...)
Problem: an admin wants to change the group of an ArticleBase, so I must update (or replace) every indexed article in Solr, right? Can I update only groupId in the Solr index?
Is there any solution?
Note: the Article table contains more than 200 million articles, and I use Solr for indexing only (no field data is stored except the article id).

Solr does not support updating individual fields yet, but there is a JIRA issue about this (almost 3 years old as of this writing).
Until this is implemented, you have to update the whole document.
UPDATE: as of Solr 4+ this is implemented, here's the documentation.

Please refer to this document about the "Partial Documents Update" Feature in Solr 4.0
Solr 4.0 is now final and production-ready.
This feature makes it possible to update individual fields and even to add values to multiValued fields.
Mauricio was right with his answer back in 2010, but this is the way things are today.
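As a rough sketch of what the request body for such a partial (atomic) update looks like, the Python snippet below builds the JSON for changing only groupId on one article. The function name and the sample id and value are illustrative assumptions; only the field name groupId comes from the question. One caveat, as far as I know: Solr rebuilds the document from its stored fields when applying an atomic update, so this generally requires the other fields to be stored (or to have docValues). An index that stores nothing but the id may still need a full re-index.

```python
import json


def build_atomic_update(doc_id, field, value, op="set"):
    """Return the JSON body for a Solr atomic update of a single field.

    Only the three operations discussed in this thread are accepted here.
    """
    if op not in ("set", "add", "remove"):
        raise ValueError("unsupported update operation: " + op)
    # Solr expects a JSON array of documents; the field value is a
    # one-entry object mapping the operation name to the new value.
    return json.dumps([{"id": doc_id, field: {op: value}}])


# Hypothetical document id and group id, for illustration only.
body = build_atomic_update("5458", "groupId", 42)
print(body)
```

The resulting string would be POSTed to the collection's /update handler with a Content-Type of application/json, which is what the PHP cURL answer in this thread does.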

SolrPHP doesn't provide any method to update a specific field in Solr.
However, you can make a Curl call in PHP to update a specific field:
<?php
// Update array
$update = array(
'id' => $docId,
$solrFieldName => array(
'set' => $solrFieldValue
)
);
$update = json_encode(array($update));
// Create curl resource and URL
$ch = curl_init('http://'.SOLR_HOSTNAME.':'.SOLR_PORT.'/'.SOLR_COLLECTION.'/update?commit=true');
// Set Login/Password auth (if required)
curl_setopt($ch, CURLOPT_USERPWD, SOLR_LOGIN.':'.SOLR_PASSWORD);
// Set POST fields
curl_setopt($ch, CURLOPT_POST,true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $update);
// Return the transfer as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Set type of data sent
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type:application/json'));
// Get response result
$output = json_decode(curl_exec($ch));
// Get response code
$responseCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
// Close Curl resource
curl_close($ch);
if ($responseCode == 200)
{
echo 'SOLR: Updated successfully field '.$solrFieldName.' for id:'.$docId.' (query time: '.$output->responseHeader->QTime.'ms).';
}
else
{
echo ('SOLR: Can\'t update field '.$solrFieldName.' for id:'.$docId.', response ('.$responseCode.') is: '.print_r($output,true));
}
I use this code to send the update as JSON; you can also provide the data in XML.

My solution was something like the following:
$client = new SolrClient($options);
$query = new SolrQuery();
// Find the old document
$query->setQuery('id:5458');
$query->setStart(0);
$query->setRows(1);
$query_response = $client->query($query);
// I had to set the parse mode to PARSE_SOLR_DOC
$query_response->setParseMode(SolrQueryResponse::PARSE_SOLR_DOC);
$response = $query_response->getResponse();
// Only touch docs[0] once we know the query matched something
if ($response->response->numFound) {
// Use getInputDocument() to get the old document from the query
$doc = $response->response->docs[0]->getInputDocument();
$second_doc = new SolrInputDocument();
$second_doc->addField('cat', "category123");
// Notice I removed the second parameter from merge()
$second_doc->merge($doc);
$updateResponse = $client->addDocument($second_doc);
$client->commit();
}

You can refer to this documentation for partial updates. You can update a field either by replacing its value or by adding more values to it (as with a list), although the latter is not required in your case.

Solr supports several types of update operations:
'add' - add a new value or values to an existing Solr document field, or add a new field and value(s).
'set' - change the value or values in an existing Solr document field.
'remove' - remove all occurrences of the value or values from an existing Solr document field.
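Here is an example of how to do a partial update via Solr's Java client, SolrJ:
import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
// create the SolrJ client
HttpSolrClient solrClient = new HttpSolrClient("http://localhost:8983/solr");
// for cloud there is the CloudSolrClient API
// create the document
SolrInputDocument solrDocument = new SolrInputDocument();
solrDocument.addField("id","12345");
// the map key is the update operation ("set"), the map value is the new field value
Map<String,Object> solrUpdates = new HashMap<>(1);
solrUpdates.put("set","Pune");
// attach the operation to the field being updated
solrDocument.addField("address", solrUpdates);
solrClient.add( solrDocument );
solrClient.close();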
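To make the difference between the three operations concrete, here is a toy model in Python that applies 'add', 'set' and 'remove' to a plain dict standing in for a document. This is only an illustration of the semantics described above, not Solr's implementation; the function name and the sample document are made up.

```python
def apply_update(doc, field, op, value):
    """Apply one update operation to a dict that stands in for a Solr document."""
    values = value if isinstance(value, list) else [value]
    updated = dict(doc)  # leave the caller's document untouched
    existing = updated.get(field, [])
    if not isinstance(existing, list):
        existing = [existing]
    if op == "set":
        # Replace whatever the field held before.
        updated[field] = value
    elif op == "add":
        # Append to the existing values, creating the field if needed.
        updated[field] = existing + values
    elif op == "remove":
        # Drop every occurrence of the given value(s).
        updated[field] = [v for v in existing if v not in values]
    else:
        raise ValueError("unsupported update operation: " + op)
    return updated


doc = {"id": "12345", "cat": ["news", "sport"], "address": "Mumbai"}
doc = apply_update(doc, "address", "set", "Pune")
doc = apply_update(doc, "cat", "add", "tech")
doc = apply_update(doc, "cat", "remove", "sport")
print(doc)
```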

Related

WORDPRESS Database How to return meta value from database?

I have this meta_key in the database: fw:ext:mm:io:primefeed, with this meta_value: a:5:{s:4:"type";s:6:"column";s:3:"row";a:0:{}s:6:"column";a:1:{s:14:"item_thumbnail";a:2:{s:13:"attachment_id";s:2:"11";s:3:"url";s:49:"//primefeed.loc/wp-content/uploads/2020/01/01.jpg";}}s:4:"item";a:0:{}s:7:"default";a:0:{}}
How do I get the link //primefeed.loc/wp-content/uploads/2020/01/01.jpg out of this meta value?
That's a PHP serialized array. You can see the structure using
print_r( unserialize( $value ) );
e.g. repl.it demo
Inside WordPress, you can do
$meta_value = get_post_meta($post_id, "fw:ext:mm:io:primefeed", true);
$url = $meta_value["column"]["item_thumbnail"]["url"];
where get_post_meta does both the database fetch and unserialize.
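If it helps to see how the serialized format is laid out, below is a minimal Python sketch of a reader for the subset of PHP's serialize() output that appears in this value (strings, integers, booleans, floats, null and arrays). It is an illustration written for this answer, not a full or official implementation; inside WordPress, get_post_meta remains the right tool. The function names are my own.

```python
def php_unserialize(data: bytes):
    """Decode a PHP-serialized value (subset: N, b, i, d, s, a)."""
    value, _ = _parse(data, 0)
    return value


def _parse(data, pos):
    kind = data[pos:pos + 1]
    if kind == b"N":  # null is written as N;
        return None, pos + 2
    if kind in (b"i", b"b", b"d"):  # scalars are written as <type>:<value>;
        end = data.index(b";", pos)
        raw = data[pos + 2:end].decode()
        if kind == b"i":
            return int(raw), end + 1
        if kind == b"b":
            return raw == "1", end + 1
        return float(raw), end + 1
    if kind == b"s":  # strings are written as s:<byte length>:"<bytes>";
        colon = data.index(b":", pos + 2)
        length = int(data[pos + 2:colon])
        start = colon + 2  # skip the colon and the opening quote
        text = data[start:start + length].decode("utf-8")
        return text, start + length + 2  # skip the closing quote and semicolon
    if kind == b"a":  # arrays are written as a:<count>:{<key><value>...}
        colon = data.index(b":", pos + 2)
        count = int(data[pos + 2:colon])
        pos = colon + 2  # skip the colon and the opening brace
        result = {}
        for _ in range(count):
            key, pos = _parse(data, pos)
            result[key], pos = _parse(data, pos)
        return result, pos + 1  # skip the closing brace
    raise ValueError("unsupported type marker at offset %d" % pos)


raw = b'a:5:{s:4:"type";s:6:"column";s:3:"row";a:0:{}s:6:"column";a:1:{s:14:"item_thumbnail";a:2:{s:13:"attachment_id";s:2:"11";s:3:"url";s:49:"//primefeed.loc/wp-content/uploads/2020/01/01.jpg";}}s:4:"item";a:0:{}s:7:"default";a:0:{}}'
meta_value = php_unserialize(raw)
url = meta_value["column"]["item_thumbnail"]["url"]
print(url)
```

The lookup on the last lines mirrors the PHP array access shown above: column, then item_thumbnail, then url.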

How to map columns in a database table to Properties and relate them to Fields

I'm studying the Sensenet framework and have installed it successfully on my computer; now I'm developing our website based on it. I read the documents on the wiki and understood the relationship Database <-> Properties <-> Fields <-> View (you can see the image at this link: http://wiki.sensenet.com/Field_-_for_Developers). Suppose I added a new table to Sensenet's database and want to show all the data in this table on our page. I don't know how to develop the flow following this model: Database <=> Property <=> Field <=> View. Can you show me the steps?
Please consider storing your data in the SenseNet Content Repository instead of keeping custom tables in the database. It is much easier to work with regular content items and you will have all the features the repo offers, e.g. indexing, permissions, and of course an existing UI. To do this, you will have to take the following steps:
Define content types in SenseNet for every entity type you have in your existing db (in the example below this is the Car type).
Create a container in the Content Repository where you want to put your content (in this case this is a Cars custom list under the default site).
Create a command line tool using the SenseNet Client library to migrate your existing data to the Content Repository.
To see the example in detail, please check out this article:
How to migrate an existing database to the Content Repository
The core of the example is really a few lines of code that actually saves content items into the Content Repository (through the REST API):
using (var conn = new SqlConnection(ConnectionString))
{
await conn.OpenAsync();
using (var command = new SqlCommand("SELECT * FROM Cars", conn))
{
using (var reader = await command.ExecuteReaderAsync())
{
while (await reader.ReadAsync())
{
var id = reader.GetInt32(0);
var make = reader.GetString(1);
var model = reader.GetString(2);
var price = reader.GetInt32(3);
// Build a new content in memory and fill custom metadata fields. No need to create
// strongly typed objects here as the client Content is a dynamic type.
// Parent path is a Content Repository path, e.g. "/Root/Sites/Default_Site/Cars"
dynamic car = Content.CreateNew(ParentPath, "Car", "Car-" + id);
car.Make = make;
car.Model = model;
car.Price = price;
// save it through the HTTP REST API
await car.SaveAsync();
Console.WriteLine("Car-" + id + " saved.");
}
}
}
}

MailChimp API Get Subscribers amount

I have been looking at the MailChimp API and am wondering how to display the live number of subscribers to a list. Is this possible? And is it possible to have this counter LIVE, i.e. as users join, the number increases in real time?
EDIT:
I have been getting used to the API slightly...
After using Drewm's MailChimp PHP wrapper it's starting to make more sense...
I have so far
// This is to tell WordPress our file requires Drewm/MailChimp.php.
require_once( 'src/Drewm/MailChimp.php' );
// This is for namespacing since Drew used that.
use \Drewm;
// Your Mailchimp API Key
$api = 'APIKEY';
$id = 'LISTID';
// Initializing the $MailChimp object
$MailChimp = new \Drewm\MailChimp($api);
$member_info = $MailChimp->call('lists/members', array(
'apikey' => $api,
'id' => $id // your mailchimp list id here
)
);
But I'm not sure how to display these values; it currently just says 'Array' when I echo $member_info. This may be completely down to my ignorance of PHP. Any advice to s
I know this may be old, but maybe this will help someone else looking for this. This uses the latest versions of the API and PHP files.
use \DrewM\MailChimp\MailChimp;
$MailChimp = new MailChimp($api_key);
$data = $MailChimp->get('lists');
print_r($data);// view output
$total_members = $data['lists'][0]['stats']['member_count'];
$list_id = $data['lists'][0]['id'];
$data['lists'][0] is the first list. If you have more, the next one would be $data['lists'][1], etc.
And to get a list of members from a list:
$data = $MailChimp->get("lists/$list_id/members");
print_r($data['members']);// view output
foreach($data['members'] as $member){
$email = $member['email_address'];
$added = date('Y/m/d',strtotime($member['timestamp_opt']));
// I use reverse dates for sorting in a *datatable* so it properly sorts by date
}
You can view the print_r output to get what you want to get.
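For a language-neutral view of the same lookup, here is a small Python sketch that walks a response shaped like the 'lists' output above and collects member_count per list. The sample data is invented for illustration; only the key names (lists, id, stats, member_count) come from the answer above, and the function name is hypothetical.

```python
def member_counts(data):
    """Map each list id to its subscriber count."""
    return {lst["id"]: lst["stats"]["member_count"] for lst in data.get("lists", [])}


# Made-up sample mirroring the structure of $data in the PHP code above.
sample = {
    "lists": [
        {"id": "abc123", "stats": {"member_count": 120}},
        {"id": "def456", "stats": {"member_count": 35}},
    ]
}

counts = member_counts(sample)
print(counts)
print("total:", sum(counts.values()))
```

The count only changes when the API is called again, so a "live" counter would have to repeat the request periodically.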

RavenDB : Can we query a not yet saved document?

I'm asking my question here because I didn't find any answer in the RavenDB's online documentation.
However, my question is quite simple: can we query an unsaved document in the same session in which it was stored?
using( var session = store.OpenSession() )
{
session.Store( new SampleObject() { Name = "My name is sample" } );
var sample = (from o in session.Query<SampleObject>()
where o.Name == "My name is sample"
select o).FirstOrDefault();
}
Will sample be null?
Do I have to use the "Customize" method on the query to load non-stale data?
Thanks for your help.
The new document hasn't been transmitted to the database yet; you have to call session.SaveChanges() before querying. Additionally, you have to customize your query to wait for the index to catch up with the new documents, but you've spotted that already.

Wikipedia API - grab 'Background Information' Table?

Does MediaWiki provide a way to return the information present in 'Background Information' Table? (usually right of the article page) For example I would like to grab the Origin from Radiohead:
http://en.wikipedia.org/wiki/Radiohead
Or do I need to parse the html page?
You can use the revisions property along with the rvgeneratexml parameter to generate a parse tree for the article. Then you can apply XPath or traverse it and look for the desired information.
Here's some example code:
$page = 'Radiohead';
$api_call_url = 'http://en.wikipedia.org/w/api.php?action=query&titles=' .
urlencode( $page ) . '&prop=revisions&rvprop=content&rvgeneratexml=1&format=json';
You have to identify yourself to the API, see more on Meta Wiki.
$user_agent = 'Your name <your email>';
$curl = curl_init();
curl_setopt_array( $curl, array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_USERAGENT => $user_agent,
CURLOPT_URL => $api_call_url,
) );
$response = json_decode( curl_exec( $curl ), true );
curl_close( $curl );
foreach( $response['query']['pages'] as $page ) {
$parsetree = simplexml_load_string( $page['revisions'][0]['parsetree'] );
Here we use XPath in order to find the Infobox musical artist's parameter Origin and its value. See the XPath specification for the syntax and such. You could as well traverse the tree and look for the nodes manually. Feel free to investigate the parse tree to get a better grip of it.
$infobox_origin = $parsetree->xpath( '//template[contains(string(title),' .
'"Infobox musical artist")]/part[contains(string(name),"Origin")]/value' );
echo trim( strval( $infobox_origin[0] ) );
}
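The same tree walk can be sketched in Python with the standard library's ElementTree. The XML below is a small hand-written stand-in with the template/title/part/name/value layout that the XPath above relies on; it is not real API output (real values contain wikitext markup), and the function name is my own.

```python
import xml.etree.ElementTree as ET


def infobox_param(parsetree_xml, template_name, param_name):
    """Return the value of one named template parameter, or None if absent."""
    root = ET.fromstring(parsetree_xml)
    for template in root.iter("template"):
        title = template.find("title")
        if title is None or template_name.lower() not in "".join(title.itertext()).lower():
            continue
        for part in template.findall("part"):
            name = part.find("name")
            value = part.find("value")
            if name is None or value is None:
                continue
            if "".join(name.itertext()).strip().lower() == param_name.lower():
                return "".join(value.itertext()).strip()
    return None


# Hand-written sample, for illustration only.
sample = """<root><template><title>Infobox musical artist
</title><part><name>Name</name><equals>=</equals><value>Radiohead
</value></part><part><name>Origin</name><equals>=</equals><value>Abingdon, Oxfordshire, England
</value></part></template></root>"""

origin = infobox_param(sample, "Infobox musical artist", "Origin")
print(origin)
```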
MediaWiki as installed on Wikipedia provides no way to get this information (there are extensions such as Semantic MediaWiki that are designed for this sort of thing, but they are not installed on Wikipedia). You can either parse the output HTML or parse the page's wikitext, or in certain cases (e.g. birth/death year) you might be able to look at the page's categories via the API.
It's a steep learning curve but DBpedia does what you want.
The "Background information table" you mention is called an "Infobox" in Wikipedia parlance and DBpedia allows very powerful queries on them. Unfortunately because it's powerful it's not easy to learn and I've mostly forgotten what I learned about it a year or two ago. I'll paste a query here though if I manage to learn it again (-:
In the meantime, here is DBpedia's idea of an introduction in how to use it.
This previous SO question will help: Getting DBPedia Infobox categories
UPDATE
OK here is the SPARQL query:
SELECT ?org
WHERE {
<http://dbpedia.org/resource/Radiohead> dbpprop:origin ?org
}
Here is a URL where you can see it working and play with it.
And here is the output on that page: (you can get output in various formats too)
SPARQL results: org "Abingdon, Oxfordshire, England"@en
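To run a query like this from code rather than from the web form, one option is to send it as a URL parameter. The Python sketch below only builds that URL and makes no network call. It assumes the public endpoint at http://dbpedia.org/sparql and its format parameter, and it spells out the dbpprop: prefix as http://dbpedia.org/property/; treat all three as assumptions to check against DBpedia's current documentation. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed public endpoint; verify before relying on it.
ENDPOINT = "http://dbpedia.org/sparql"


def origin_query_url(resource):
    """Build a GET URL asking DBpedia for the 'origin' property of a resource."""
    query = (
        "SELECT ?org WHERE { "
        "<http://dbpedia.org/resource/%s> "
        "<http://dbpedia.org/property/origin> ?org }" % resource
    )
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


url = origin_query_url("Radiohead")
print(url)
```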