I managed to save and load a Spark DataFrame from Ignite by following the example here: https://apacheignite-fs.readme.io/docs/ignite-data-frame
Following that code example, when the cache is created in Ignite it automatically gets a name like "SQL_PUBLIC_name_of_table_in_spark".
On the other hand, if I want to change some cache configuration, I need to specify that same cache name in XML or code before the Ignite cache is created, because a cache configuration cannot be changed after the cache exists. See the following code.
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="cacheConfiguration">
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            <!-- Set a cache name. -->
            <property name="name" value="SQL_PUBLIC_name_of_table_in_spark"/>
            <!-- Set cache mode. -->
            <property name="cacheMode" value="PARTITIONED"/>
        </bean>
    </property>
</bean>
Then one of the two is rejected with a "cache already exists" error. The result is that I can't change any cache configuration via XML or code.
Is this expected? And how can I change the cache configuration in this case?
The doc page you've linked contains a code piece that creates an SQL table:
CREATE TABLE person (
    id LONG,
    name VARCHAR,
    city_id LONG,
    PRIMARY KEY (id, city_id)
) WITH "backups=1, affinityKey=city_id";
This SQL command is what actually creates the cache. You can change this command to change the parameters of the cache that will be created. Refer to the CREATE TABLE doc.
In particular, the parameter that gives the most flexibility is WITH "template=mytemplate". It lets you create a cache from a pre-existing template configuration. To register a template, specify it in your cacheConfiguration with a name ending in an asterisk, like
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="cacheConfiguration">
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            <property name="name" value="mytemplate*"/>
            <!-- your parameters. -->
        </bean>
    </property>
</bean>
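Templates can also be registered from code. Below is a minimal sketch, assuming a started node referenced by an ignite variable; the template name, the "utilityCache" helper cache, and the backup count are placeholders:

// Register a template: any CREATE TABLE ... WITH "template=mytemplate"
// will inherit this configuration.
CacheConfiguration<?, ?> tpl = new CacheConfiguration<>("mytemplate*")
        .setCacheMode(CacheMode.PARTITIONED)
        .setBackups(2);
ignite.addCacheConfiguration(tpl);

// Create the table from the template via SQL, run through any cache handle.
ignite.getOrCreateCache("utilityCache").query(new SqlFieldsQuery(
        "CREATE TABLE person (id LONG, name VARCHAR, city_id LONG, " +
        "PRIMARY KEY (id, city_id)) " +
        "WITH \"template=mytemplate, affinityKey=city_id\"")
        .setSchema("PUBLIC")).getAll();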
You can also specify the WITH parameters for CREATE TABLE in the OPTION_CREATE_TABLE_PARAMETERS setting if the table is being created automatically by Spark.
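For example, here is a sketch of a Spark write that forwards those parameters. The option constants come from org.apache.ignite.spark.IgniteDataFrameSettings; the config file path, table name, and key field are assumptions to adapt:

import org.apache.ignite.spark.IgniteDataFrameSettings;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

void saveWithTemplate(Dataset<Row> df) {
    df.write()
        .format(IgniteDataFrameSettings.FORMAT_IGNITE())
        .option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), "ignite-config.xml")
        .option(IgniteDataFrameSettings.OPTION_TABLE(), "name_of_table_in_spark")
        // The WITH parameters of the CREATE TABLE that Spark will issue:
        .option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PARAMETERS(), "template=mytemplate,backups=1")
        .option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS(), "id")
        .mode(SaveMode.Overwrite)
        .save();
}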
Related
I am using create-drop as the value for hibernate.hbm2ddl.auto. I have also created a script, since I need to insert some data into my tables before performing any sort of testing. However, because my schema is created automatically by the create-drop mode, the script fails with an error saying that no such table exists for the SQL operations I am trying to perform. I think it's because the script runs before the tables are created. How do I make this work? One option is to use validate and provide a create.sql file to build the schema, and then run the scripts. But I want to use create-drop.
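One workaround, assuming an in-memory H2 database as in the configuration below, is to have H2 itself execute both scripts via the INIT clause of the JDBC URL; H2 runs these whenever a connection to the database is opened, independently of Hibernate's schema handling: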
<properties>
    <!-- Configuring JDBC properties -->
    <property name="javax.persistence.jdbc.url" value="jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;INIT=RUNSCRIPT FROM 'classpath:create.sql'\;RUNSCRIPT FROM 'classpath:data.sql'"/>
    <property name="javax.persistence.jdbc.driver" value="org.h2.Driver"/>
    <!-- Hibernate properties -->
    <property name="hibernate.dialect" value="org.hibernate.dialect.H2Dialect"/>
    <property name="hibernate.hbm2ddl.auto" value="create-drop"/>
    <property name="hibernate.format_sql" value="false"/>
    <property name="hibernate.show_sql" value="true"/>
</properties>
I have 400M records in an Ignite cache, and native persistence is enabled. I want to enable an expiry policy. To do so I have added the following to my XML config.
<!-- Enabling expiry policy -->
<property name="cacheConfiguration">
    <list>
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            <property name="name" value="CACHE_L4_TRIGGER_NOTIFICATION"/>
            <property name="expiryPolicyFactory">
                <bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
                    <constructor-arg>
                        <bean class="javax.cache.expiry.Duration">
                            <constructor-arg value="MINUTES"/>
                            <constructor-arg value="60"/>
                        </bean>
                    </constructor-arg>
                </bean>
            </property>
        </bean>
    </list>
</property>
It works for newly added data, but I still have the old 400M records. I need help removing entries older than 30 days from those 400M records. How can I do this? I have searched but can't find anything. I also can't purge all the data, since it is important.
You can't do this for existing data. Ignite doesn't keep track of when an entry was created or modified in any way if an expiry policy is not set. You have to iterate over all your data and clean it manually based on the contents (e.g. if you have a creation timestamp attribute).
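A minimal sketch of such a manual cleanup, assuming the key type is Long and each value carries a createdAt field holding a millisecond timestamp (both are assumptions; adapt them to your model):

// "ignite" is assumed to be a started node. Binary mode avoids having to
// deploy the value class on the node that runs the cleanup.
IgniteCache<Long, BinaryObject> cache =
        ignite.cache("CACHE_L4_TRIGGER_NOTIFICATION").withKeepBinary();

long cutoff = System.currentTimeMillis() - 30L * 24 * 60 * 60 * 1000;

// The filter executes on the server nodes, so only matching entries travel back.
ScanQuery<Long, BinaryObject> qry = new ScanQuery<>(
        (key, val) -> val.<Long>field("createdAt") < cutoff);

try (QueryCursor<Cache.Entry<Long, BinaryObject>> cur = cache.query(qry)) {
    for (Cache.Entry<Long, BinaryObject> e : cur)
        cache.remove(e.getKey()); // for 400M entries, consider batching with removeAll(keys)
}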
I have created an Ignite cache "contact" and added "Person" object to it.
When I use the Ignite JDBC client mode I am able to query this cache, but when I use the JDBC thin client, it says that the table Person does not exist.
I tried the query this way:
Select * from Person
Select * from contact.Person
Neither worked with the thin client. I am using Ignite 2.1.
I'd appreciate your help on how to query an existing cache using the thin client.
Thank you.
Cache Configuration in default-config.xml
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<!-- Enabling Apache Ignite Persistent Store. -->
<property name="persistentStoreConfiguration">
<bean class="org.apache.ignite.configuration.PersistentStoreConfiguration"/>
</property>
<property name="binaryConfiguration">
<bean class="org.apache.ignite.configuration.BinaryConfiguration">
<property name="compactFooter" value="false"/>
</bean>
</property>
<property name="memoryConfiguration">
<bean class="org.apache.ignite.configuration.MemoryConfiguration">
<!-- Setting the page size to 4 KB -->
<property name="pageSize" value="#{4 * 1024}"/>
</bean>
</property>
<!-- Explicitly configure TCP discovery SPI to provide a list of initial nodes. -->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
<property name="addresses">
<list>
<!-- In distributed environment, replace with actual host IP address. -->
<value>127.0.0.1:55500..55502</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
</bean>
</beans>
Cache Configuration in the Server-Side Code
CacheConfiguration<Long, Person> cc = new CacheConfiguration<>(cacheName);
cc.setCacheMode(CacheMode.REPLICATED);
cc.setRebalanceMode(CacheRebalanceMode.ASYNC);
cc.setIndexedTypes(Long.class, Person.class);
cache = ignite.getOrCreateCache(cc);
Thin Client JDBC URL
Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
// Open the JDBC connection.
Connection conn = DriverManager.getConnection("jdbc:ignite:thin://192.168.1.111:10800");
Statement st = conn.createStatement();
If you want to query data from an existing cache using SQL, you should specify an SQL schema in the cache configuration. Add the following code before the cache creation:
cc.setSqlSchema("PUBLIC");
Note that you have persistence configured, so when you call ignite.getOrCreateCache(cc) the new configuration won't be applied if a cache with this name has already been persisted. You should, for example, remove the persistence data or use the createCache(...) method instead.
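Putting it together, here is a sketch of the fixed server-side code. Note that destroyCache is just one way to get rid of the previously persisted configuration, and it deletes the cache's data:

CacheConfiguration<Long, Person> cc = new CacheConfiguration<>("contact");
cc.setCacheMode(CacheMode.REPLICATED);
cc.setRebalanceMode(CacheRebalanceMode.ASYNC);
cc.setIndexedTypes(Long.class, Person.class);
cc.setSqlSchema("PUBLIC"); // expose the Person table in the PUBLIC schema

ignite.destroyCache("contact"); // drops the persisted config AND its data
IgniteCache<Long, Person> cache = ignite.createCache(cc);

// The thin client can now see the table:
Connection conn = DriverManager.getConnection("jdbc:ignite:thin://192.168.1.111:10800");
try (Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery("SELECT * FROM Person")) {
    while (rs.next())
        System.out.println(rs.getString(1));
}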
I run an example with two Ignite cache nodes in two JVMs. Each JVM runs one Ignite node, and both nodes map to the same cache.
ignite-config.xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    ...
    <property name="cacheConfiguration">
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            <!-- Set a cache name. -->
            <property name="name" value="cacheName"/>
            <!-- Set cache mode. -->
            <property name="cacheMode" value="PARTITIONED"/>
            <!-- Number of backup nodes. -->
            <property name="backups" value="1"/>
            ...
        </bean>
    </property>
</bean>
Test steps:
1. One of the Ignite nodes starts first and writes 10 entries (key-value: 1-1, 2-2, 3-3 ... 10-10).
2. Then the second node starts and maps to the same cache.
3. The nodes then rebalance the data between them: the first node holds 4 entries, the second holds 6.
4. Then I kill the JVM of the first node.
Result: the surviving node doesn't own all 10 entries, as I expected it would. Why?
I'm not sure why ignitevisorcmd.sh reports the keys as lost. I suggest looking directly into the cache by querying it after you kill the node. Or, as Valentin suggests, you can try IgniteCache.size().
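For example, a quick verification sketch; the cache name is taken from your config, and "ignite" is assumed to be a node that is still running:

// Count primary entries across the whole cluster.
IgniteCache<Integer, Integer> cache = ignite.cache("cacheName");
System.out.println("size = " + cache.size(CachePeekMode.PRIMARY));

// Or inspect every entry directly with a scan query.
try (QueryCursor<Cache.Entry<Integer, Integer>> cur = cache.query(new ScanQuery<>())) {
    cur.forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
}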
I changed the discovery.xml file as described in the documentation to add a new facet over dc.type to our DSpace. After reindexing and clearing the cache, I see the new search filter in the advanced search, but not as a facet.
These are the changes I made to discovery.xml:
Added the filter to sidebarFacets and searchFilters:
<ref bean="searchFilterType" />
and this is the filter:
<bean id="searchFilterType" class="org.dspace.discovery.configuration.DiscoverySearchFilterFacet">
<property name="indexFieldName" value="type"/>
<property name="metadataFields">
<list>
<value>dc.type</value>
</list>
</property>
</bean>
Thanks in advance
The following modifications to discovery.xml on the latest DSpace master branch worked on my local setup:
https://github.com/bram-atmire/DSpace/commit/3f084569cf1bbc6c6684d114a09a1617c8d3de5d
One reason the facet wouldn't appear in your setup could be that you omitted adding it to both the "defaultConfiguration" and the specific configuration for the DSpace homepage.
After building and deploying, a forced discovery re-index using the following command made the facet appear:
./dspace index-discovery -f
Here is an example facet that I have configured in our instance. Try setting the facetLimit, sortOrder, and splitter. Re-index and see if that resolves the issue.
<bean id="searchFilterGeographic"
class="org.dspace.discovery.configuration.HierarchicalSidebarFacetConfiguration">
<property name="indexFieldName" value="geographic-region"/>
<property name="metadataFields">
<list>
<value>dc.coverage.spatial</value>
</list>
</property>
<property name="facetLimit" value="5"/>
<property name="sortOrder" value="COUNT"/>
<property name="splitter" value="::"/>
</bean>