Apache Ignite: can I create cache tables manually, using DDL statements?

I'm asking about the latest version (currently 2.3); the old way seems somewhat pointless now.
If it is possible to create the table(s) manually, another question follows: how do I map my model POJO's fields to column names so I can fill the cache using DataStreamers? (@QuerySqlField's name attribute, isn't it?)

If you are going to populate the cache with data using a DataStreamer, then you should create it using the Java API or by configuring it in the Ignite XML configuration.
Tables can be configured using the indexedTypes or queryEntities properties of CacheConfiguration. Take a look at the documentation: https://apacheignite.readme.io/docs/cache-queries#section-query-configuration-by-annotations
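For example, here is a minimal sketch of the annotation-based approach (the Person class, field names and cache name are placeholders, not anything prescribed by Ignite): fields annotated with @QuerySqlField become SQL columns, the name attribute overrides the column name, and the same POJO can then be loaded through an IgniteDataStreamer.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.configuration.CacheConfiguration;

public class PersonCacheExample {

    /** Model POJO; @QuerySqlField(name = "...") controls the SQL column name. */
    public static class Person {
        @QuerySqlField(index = true, name = "person_name")
        private String name;

        @QuerySqlField(name = "person_age")
        private int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // indexedTypes registers the key/value classes; annotated fields become columns.
            CacheConfiguration<Long, Person> cfg = new CacheConfiguration<>("PersonCache");
            cfg.setIndexedTypes(Long.class, Person.class);

            IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cfg);

            // Bulk-load the cache with a DataStreamer.
            try (IgniteDataStreamer<Long, Person> streamer = ignite.dataStreamer(cache.getName())) {
                streamer.addData(1L, new Person("Alice", 30));
                streamer.addData(2L, new Person("Bob", 42));
            }
        }
    }
}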

Related

Column Deletion from Apache Druid

How can we delete a column from a Druid datasource?
I removed it from the datasource spec, but I can still see it in the datasource.
Please assist if anyone is familiar with this.
Druid is not like a conventional database where you define a structure and that structure is applied to all the data.
The data is stored in segments. Each segment contains the data which was put in this segment, together with the "structure" of that segment.
So, changing it in your dataSource spec will make sure that newly created segments will not include that new column. However, existing segments will still contain the column.
To remove this column, you need to re-index the older segments. During this re-index task, you can read the data from your existing segments and apply your new dataSource spec to it. You can then write it to the same segment where you have read it from.
See this link to read data from existing data sources:
https://druid.apache.org/docs/latest/ingestion/native-batch.html#druid-input-source
In the latest version of Druid (0.17.0) this has changed; it was previously done with an IngestSegmentFirehose.
Please make sure that you process the WHOLE segment. If you only overwrite a part of the segment, all the other data will be lost (at least, in the new version of your data).
Also note: After applying the rewrite, druid will put your new data in a newer version. However, your "old" version still exists. If you are not aware of this, your data storage can grow very quickly.
If you are happy with your result, you should execute a KILL task. This will delete all data (from older versions) which are no longer the "active" version.
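To make the re-index step more concrete, below is a rough sketch of submitting such a task from Java (Java 11+ HttpClient; the text block needs Java 15+). The datasource name, interval, dimension list and Router/Overlord address are placeholders, and the spec is deliberately minimal; consult the native-batch documentation linked above for the exact fields your setup needs.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SubmitReindexTask {
    public static void main(String[] args) throws Exception {
        // Skeleton of a native-batch re-index spec: read the old segments back through
        // the "druid" input source and write them out again with the new dataSchema,
        // which no longer lists the dropped column. Names and intervals are placeholders.
        String taskSpec = """
            {
              "type": "index_parallel",
              "spec": {
                "ioConfig": {
                  "type": "index_parallel",
                  "inputSource": {
                    "type": "druid",
                    "dataSource": "my_datasource",
                    "interval": "2014-01-01/2021-01-01"
                  }
                },
                "dataSchema": {
                  "dataSource": "my_datasource",
                  "timestampSpec": { "column": "__time", "format": "millis" },
                  "dimensionsSpec": { "dimensions": ["keep_this_column", "and_this_one"] },
                  "granularitySpec": { "queryGranularity": "none", "segmentGranularity": "day" }
                }
              }
            }
            """;

        // Submit the task to the indexer task endpoint (adjust host/port for your cluster).
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8888/druid/indexer/v1/task"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(taskSpec))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}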
If you are a PHP user, you can take a look at this package: https://github.com/level23/druid-client
We have implemented these re-index tasks together with easy querying in a class. Maybe it helps.

Merge two Endeca Servers (Endeca 3.1) into one. Including their current data

Let me explain in more detail:
First: I'm running Endeca 3.1, so "Endeca Server" here refers to 3.0's Data Domain.
I'm required to use an Endeca Server currently present on Endeca (I downloaded a Demo VM). All the info on it, including groups, attributes and data, must be merged into our Endeca Server. (It can also be the other way around; I could merge my Endeca Server into this one.)
So far, I've tried to do the following:
1) Clone the Endeca Server
2) Use the putCollection sconfig operation to create a collection on it with the same name I have on mine.
3) Load configurations using the LoadCollection & LoadAttributes graphs from OEID POC Template 3.1. I point to the new collection in the Configuration.xls file.
This is where I encounter an issue: the LoadAttributes graph gets a timeout (T/O) message from the server's web service, and then the config WSDL becomes inaccessible for a while. I can't get beyond this point.
I've been able to load data into the collection, but I need to load the attributes first.
Thanks in advance for your replies.
Regards
There are a few techniques.
Have you tried exporting the data domain and then importing it?
You can use the endeca-cmd tools to export to a file, and then import from that file. This would enable you to add 2 datastores into one server.
If you want to combine 2 datastores, then that is a different question.
The simplest approach in 3.1, if the data collections are small, is to extract them as CSV (via a data-table), convert to XLS, and add them via self-provisioning into separate collections within a single data store. If you are running in the VM, this is potentially the easiest approach.
This can also be done using Integrator.
You don't need to load the attributes unless you are using multi-value types. You can call against the conversation web service to extract data and then load it using 'bulk-load'. I would not worry too much about creating the attributes unless this becomes essential due to their type or complexity. If you cannot call against the conversation web service, then again extract as CSV and load using Integrator.

Deleting rows in datastore by time range

I have a CKAN datastore with a column named "recvTime" of type timestamp (i.e. using "timestamp" as type at datastore_create time, as shown in this link). Example value for this column is "2014-06-12T16:08:39.542000".
I have a large number of records in the datastore (thousands) and I would like to delete the rows before a given date in "recvTime". My first thought was to do it using the REST API with the datastore_delete operation and a range filter, but that is not possible, as described in the following Q&A.
Is there any other way of solving the issue, please?
Given that I have access to the host where the CKAN server is running, I wonder if this could be achieved by executing a regular SQL statement on the PostgreSQL engine where the datastore is persisted. However, I haven't found information about manipulating the underlying CKAN data model in the CKAN documentation, so I don't know if this is a good idea or if it is risky...
Any workaround or information pointer is highly welcome. Thanks!
You could definitely do this directly on the underlying database if you were willing to dig in there (the structure is pretty simple with tables named after the corresponding resource id). You could even turn this into an API of your own using an extension (though you'd want to be careful about permissions).
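If you do go the direct route, a minimal sketch of it could look like the following (JDBC against the datastore's PostgreSQL database; the connection URL, credentials, cutoff date and resource id are placeholders, and you should back the table up first):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

public class DeleteOldDatastoreRows {
    public static void main(String[] args) throws Exception {
        // Placeholders: point these at the CKAN datastore database and the resource.
        String url = "jdbc:postgresql://localhost:5432/datastore_default";
        String resourceId = "00000000-0000-0000-0000-000000000000";

        try (Connection conn = DriverManager.getConnection(url, "ckan_user", "password")) {
            // Datastore tables are named after the resource id; "recvTime" is the timestamp column.
            String sql = "DELETE FROM \"" + resourceId + "\" WHERE \"recvTime\" < ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setTimestamp(1, Timestamp.valueOf("2014-06-01 00:00:00"));
                int deleted = ps.executeUpdate();
                System.out.println("Deleted " + deleted + " rows");
            }
        }
    }
}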
You might also be interested in the new support (master only atm) for extending the DataStore API via a plugin in an extension - see https://github.com/ckan/ckan/pull/1725

eclipselink without persistence.xml

I'm not a big fan of XML files. Therefore I'm wondering if there is a way to use EclipseLink without its persistence.xml configuration file. Why?
Because I want to manage different databases dynamically. It would be much easier to do it without the XML file.
I'm surprised that I couldn't find anything on the web for now.
Not really, but you could create an EclipseLink ServerSession directly and wrap it with an EntityManagerFactoryImpl; I would not suggest it, though.
You would be better off creating a persistence.xml. You can still handle dynamic databases; you just need to pass a properties Map to createEntityManagerFactory(String, Map) that includes your database info.
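As a rough sketch of that second suggestion (the persistence unit name "my-unit" and the PostgreSQL driver are placeholders; the unit must still be declared in persistence.xml, but it can stay nearly empty while the connection details are supplied at runtime):

import java.util.HashMap;
import java.util.Map;

import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import org.eclipse.persistence.config.PersistenceUnitProperties;

public class DynamicDatabase {
    // Builds an EntityManagerFactory for an arbitrary database chosen at runtime.
    public static EntityManagerFactory createFactory(String jdbcUrl, String user, String password) {
        Map<String, String> props = new HashMap<>();
        props.put(PersistenceUnitProperties.JDBC_DRIVER, "org.postgresql.Driver");
        props.put(PersistenceUnitProperties.JDBC_URL, jdbcUrl);
        props.put(PersistenceUnitProperties.JDBC_USER, user);
        props.put(PersistenceUnitProperties.JDBC_PASSWORD, password);

        // "my-unit" is a placeholder persistence unit declared in persistence.xml.
        return Persistence.createEntityManagerFactory("my-unit", props);
    }
}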
Though it is not a direct answer to your question, this will help with the second part of it. For managing multiple database connections, you can define multiple server sessions in sessions.xml and access them where you want.
You may use the following line to access a particular session:
ServerSession aSession = (ServerSession) SessionManager.getManager().getSession("session_2");

sql server global data version

I wonder what is the best way to implement a global data version for a database. I want any modification made to the database to increase the version in a "global version table" by one. I need this so that when I talk to application users I know what version of the data we are talking about.
Should I store this information in table?
Should I use triggers for this?
This version number can be stored in a configuration table or in a dedicated table (with one field).
This parameter should not be automatically updated because you are the owner of the schema and you are responsible for knowing when you need to update it. Basically, you need to update this number every time you deploy a new application package (regardless of the reason for the package: code or database change).
Each and every deployment package should take care of updating the schema version number and the database schema (if necessary).
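For illustration, the deployment step that bumps such a number might look roughly like this (JDBC sketch; the connection string and the SchemaVersion table/column names are invented for the example):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BumpSchemaVersion {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string and table/column names.
        String url = "jdbc:sqlserver://localhost;databaseName=MyDb;user=deploy;password=secret";

        try (Connection conn = DriverManager.getConnection(url);
             Statement st = conn.createStatement()) {

            // Run as the last step of the deployment package, after any schema changes.
            st.executeUpdate("UPDATE dbo.SchemaVersion SET VersionNumber = VersionNumber + 1");

            try (ResultSet rs = st.executeQuery("SELECT VersionNumber FROM dbo.SchemaVersion")) {
                if (rs.next()) {
                    System.out.println("Database is now at version " + rs.getInt(1));
                }
            }
        }
    }
}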
I tend to have a globals or settings table with various pseudo-static values stored.
- Just one row
- Many fields
This can include version numbers.
In terms of maintaining the version number you refer to, would this change when the data content changes? If so, a trigger would be useful. If you mean for the version number to relate to table structures, etc., I'd be more inclined to manage this by hand. (Some changes may be irrelevant as far as the applications are concerned, or there may be several changes wrapped up into a single version upgrade.)
The best way to implement a "global data version for database" is via your source control system and build process. When all the changes have been submitted and have passed testing, your build process will increment the version number.
The version number could be implemented in a stored procedure. The result of the call to the stored proc could be added to a screen in your app so you can avoid users directly accessing a table.
To complete the previous answers: I came across the concept of "Migrations" (from the Ruby on Rails world, apparently) today, and there was already a question on SO that covers existing frameworks in .NET.
The concept is still to store DB versioning information as data in a table somewhere, but for that versioning information to be managed automatically by a framework, rather than manually by your custom deployment processes:
previous SO question with overview of options: https://stackoverflow.com/questions/313/net-migrations-engine