How to migrate data from the Magnolia CMS Apache Jackrabbit content repository to a normal SQL Server database - sql-server-2012

I am new to Magnolia CMS and the Apache Jackrabbit content repository concepts.
There is a web application which uses Magnolia CMS. Magnolia uses a SQL Server 2012 database as its persistence manager, and the content repository is implemented with Apache Jackrabbit. There are two separate configurations of the Magnolia CMS used for the application, referred to as the public and author instances.
We are now trying to replace the existing Magnolia CMS with a custom ASP.NET MVC 5 application with all the same functionality.
I analysed the tables in the SQL Server database and found that the data is stored in NODE_ID and BUNDLE_DATA columns, which are very difficult to analyse.
In short, the data is not easy to interpret.
Based on the custom CMS, a new database model for the author instance (SQL Server 2012) has been developed.
Hence, as part of the migration task, I am trying to migrate the old data that is stored in SQL Server under the Apache Jackrabbit content repository implementation into a normal SQL Server 2012 database (as per the new database model).
Can anyone tell me whether there are any proven methods or tools available to accomplish this task?

The question is more on the Jackrabbit side than on the Magnolia side, especially since you want to replace Magnolia entirely, not just the persistence layer:
We are now trying to replace the existing Magnolia CMS with a
custom ASP.NET MVC 5 application with all the same functionality.
My real question, though, is whether you actually want to replace Jackrabbit entirely, or whether you would keep using Jackrabbit with your ASP.NET application, just with an MS SQL Server data store (which would be my personal suggestion). Otherwise you will be giving up all the benefits that Jackrabbit offers.
Jackrabbit does support SQL Server, and I would suggest using it.
https://wiki.apache.org/jackrabbit/DataStore#Configuration-1:
Currently supported are: db2, derby, h2, mssql, mysql, oracle,
sqlserver.
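For reference, a database-backed data store entry in Jackrabbit's repository.xml pointing at SQL Server could look roughly like the following sketch; the connection URL, driver, and credentials are placeholders you would adapt to your environment.

    <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
      <!-- Placeholder connection settings for a SQL Server data store -->
      <param name="url" value="jdbc:sqlserver://localhost:1433;databaseName=jackrabbit"/>
      <param name="driver" value="com.microsoft.sqlserver.jdbc.SQLServerDriver"/>
      <param name="user" value="jackrabbit"/>
      <param name="password" value="secret"/>
      <param name="databaseType" value="sqlserver"/>
      <!-- Binaries smaller than this many bytes stay inline; larger ones go to the data store -->
      <param name="minRecordLength" value="1024"/>
      <param name="maxConnections" value="3"/>
      <param name="copyWhenReading" value="true"/>
    </DataStore>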
Developing a WebCMS with just ASP.NET and SQL Server, without a content repository layer in between, sounds like developing everything that a WebCMS usually comes with from scratch, especially if you want all the functionality that Magnolia offers (versioning, history, search, etc.).
You can check the details regarding the Jackrabbit data store here: http://wiki.apache.org/jackrabbit/DataStore. I am wondering, though, why you or your customer would want to change the data store of the content repository to SQL Server. I assume you are not speaking of using SQL Server just for the persistence of the metadata, but really to store the binary content as well (a mistake that, by the way, OpenCms, another Java-based open source WebCMS, made in its architecture design, imho).
Note that with Magnolia, large files are usually not stored in the database itself, but on the file system.
https://wiki.magnolia-cms.com/display/WIKI/Setting+up+a+Jackrabbit+persistence+manager#SettingupaJackrabbitpersistencemanager-Datastorageandbackup:
BLOBs are not by default stored in the database when they exceed a certain threshold defined in your Jackrabbit configuration - instead they are saved on the file system. The default threshold used by a Magnolia installation is 1024 bytes. All files above the defined threshold are put onto the filesystem and not in the database.
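For illustration, the corresponding file-based data store entry in Jackrabbit's repository.xml might look like this sketch, using the 1024-byte threshold quoted above (the path is a placeholder):

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
      <!-- Binaries above minRecordLength (in bytes) end up under this path, not in the database -->
      <param name="path" value="${rep.home}/repository/datastore"/>
      <param name="minRecordLength" value="1024"/>
    </DataStore>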
In case you really want to get rid of Jackrabbit entirely, use only SQL Server as the persistence layer, and store all binary content in it regardless of size (not recommended), I would write a custom export/import script for it, which queries the Jackrabbit repo (e.g. via the standard CMIS protocol or the JCR API) and takes the content from the file system, reading it as a FileInputStream and writing it to the SQL Server database via JDBC (the example at http://www.java2s.com/Code/Java/Database-SQL-JDBC/StoreBLOBsdataintodatabase.htm shows the same BLOB-insert pattern, there against Oracle). This would be my suggested method.
I don't think there are any out-of-the-box tools for that.
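To make the approach concrete, here is a rough sketch of what such an export script could look like, using the JCR API plus JDBC. The repository URL, credentials, workspace name, and the FILES target table are all illustrative assumptions you would adapt to your author instance and your new database model.

    import java.io.InputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import javax.jcr.Binary;
    import javax.jcr.Node;
    import javax.jcr.NodeIterator;
    import javax.jcr.Repository;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;
    import org.apache.jackrabbit.commons.JcrUtils;

    public class JcrToSqlServerExporter {
        public static void main(String[] args) throws Exception {
            // Assumed remoting URL, credentials and workspace; adapt to your instance.
            Repository repo = JcrUtils.getRepository("http://localhost:8080/server");
            Session session = repo.login(
                    new SimpleCredentials("admin", "admin".toCharArray()), "website");
            try (Connection db = DriverManager.getConnection(
                    "jdbc:sqlserver://localhost:1433;databaseName=NewCms", "user", "password")) {
                exportTree(session.getRootNode(), db);
            } finally {
                session.logout();
            }
        }

        private static void exportTree(Node node, Connection db) throws Exception {
            if (node.getPath().startsWith("/jcr:system")) {
                return; // skip Jackrabbit's internal system nodes
            }
            if (node.isNodeType("nt:file")) {
                // nt:file nodes keep their binary in the jcr:data property of jcr:content
                Binary binary = node.getNode("jcr:content").getProperty("jcr:data").getBinary();
                try (InputStream in = binary.getStream();
                     PreparedStatement ps = db.prepareStatement(
                             // Hypothetical target table from the new database model
                             "INSERT INTO FILES (PATH, DATA) VALUES (?, ?)")) {
                    ps.setString(1, node.getPath());
                    ps.setBinaryStream(2, in);
                    ps.executeUpdate();
                } finally {
                    binary.dispose();
                }
            }
            for (NodeIterator it = node.getNodes(); it.hasNext(); ) {
                exportTree(it.nextNode(), db);
            }
        }
    }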

Related

How does the MySQL repository work in the Pentaho User Console?

Based on the Pentaho guideline (https://help.pentaho.com/Documentation/8.2/Setup/Installation/Archive/MySQL_Repository) I successfully converted the Pentaho file-based repository to a MySQL database repository.
Now, does anyone have any idea how the MySQL repository stores its data in the database? That is, if I create a new folder, a new dashboard, or a new connection, how does Pentaho store this data in the MySQL database? I also need to know which tables are used to store which kinds of data.
Attached are the schema and tables created by default for the MySQL Pentaho repository.
Please provide any input or reference material on this.
Pentaho's repository comprises three third party technologies: Jackrabbit, Hibernate, and Quartz. Reports/Jobs/Transformations and any other artifacts stored inside the Pentaho Server are generally stored in Jackrabbit. Scheduling info and triggers are stored in Quartz. And diagnostic info is stored in Hibernate (such as who accessed what reports, how long a report took to run, etc.).
None of this info is designed to be human-readable directly out of the database tables. These are "black box" third-party technologies that Pentaho simply leverages for its repository functions. If you have additional questions, I'd recommend checking out the technologies themselves on their project pages.

Best practices for continuous integration with a SQL Server project or a local .mdf file in the project

I maintain a project with a really messy database that needs a lot of refactoring and has to be published to client machines.
I know that I could add a SQL Server Database project that contains just the scripts of the database and produces a .dacpac file, which allows me to update client databases automatically.
I also know that I could just add an .mdf file to the App_Data (or even a Solution_Data) folder and have my database there. I suppose the LocalDB that is already installed would allow me to start up my solution without SQL Server.
And lastly, I know that Entity Framework exists with its own migrations. But I don't want to use it, because I can't add and change indexes with its migrations and I don't have enough flexibility when I need to describe difficult migration scenarios.
My goals:
Generate migration scripts for client DBs automatically.
Make my solution self-contained, so that any new programmer who joins the project doesn't even need to install SQL Server on his machine.
Be able to update the local (development) database in 1-2 clicks.
Be able to move back through the history of DB changes (I have a TFS server).
Be able to have a clean DB (containing only dictionary or lookup tables) in the solution, with an up-to-date DB schema.
Additionally, I want to be able to update my DB model (EF or .dbml) automatically or in a very easy way.
So what I want to ask:
What are the strengths and weaknesses of these two approaches, given my goals?
Could it be that I should use some combination of these tools?
Or is there another tool from Microsoft that I don't know about?
Is there a way to update my DAL model from this DB?
What are the strengths and weaknesses of these two approaches, given my goals?
Using a database project allows you to version control all of the database objects. You can publish to various database instances and roll out changes incrementally, rather than having to drop and recreate the database, thus preserving data. These changes can be in the form of a dacpac, a SQL script, or done right through the VS interface. You gain a lot of control over deployments using pre- and post-deployment scripts and publishing profiles. Developers will be required to install SQL Server (the developer/express edition is usually good enough).
LocalDB is a little easier to work with -- you can make your changes directly in the database without having to publish. LocalDB doesn't have a built-in publish process for pushing changes to other instances. No SQL Server installation required.
Use a database project if you need version control for your database objects, if you have multiple users concurrently making changes, or if you have multiple applications that use the same database. Use LocalDB if none of those conditions apply or for small apps that require their own standalone database.
Could it be that I should use some combination of these tools?
Yes. According to Kevin's comment below, "If the Database Project is set as your startup project, hitting F5 will automatically deploy it to LocalDB. You don't even need a publish profile in this case."
Or is there another tool from Microsoft that I don't know about?
Entity Framework's Code First approach comes close.
Is there a way to update my DAL model from this DB?
Entity Framework's POCO generator works well, except that if you make changes to your DAL classes, those changes get lost the next time you run the generator.
There is a new tool called SqlSharpener which can generate classes from the SQL files in a database project. I have not used it so I cannot vouch for it but it looks promising.
One way of generating client scripts for DB changes is to use a database modeling tool like ERwin, which has a free community edition. The best way to meet your database version control requirement with easy script generation is Redgate SQL Source Control. Using the Redgate tool you will meet the first five goals mentioned. Moreover, you can then update the EF model with a single click after changing the DB schema (i.e. the database-first approach), as required by goal 6.
I do not recommend using LocalDB at all. It always causes issues with source control, like "DB file is in use and can't commit...". In addition, the developers on the project will not have a common set of up-to-date data to work on, unless one developer adds test data to the database and asks the others to get the latest version and overwrite their own databases, or generates an update script with the previously mentioned tool and asks every developer to run it on his LocalDB.
In your situation, the best way is to use SQL Server on the network: a master version that all the developers use. Since you have version control on the database using the previously mentioned tool, you can roll back any buggy change on the database server.
If you think the Redgate tool is too expensive for the budget of your project, a second approach is to generate a single SQL file from your database that contains all database objects, and have the other developers update that SQL file in source control as they make their changes. This can be done easily using the schema compare tool in Visual Studio and appending the generated script to the SQL file in source control. With the EF database-first approach, you will not have to add as many migration classes as in EF code-first.

Stop a SQL Server database from being exported, to secure it

I have a VB.NET Windows Forms application with a database on the SQL Server 2008 .\SQLEXPRESS instance.
I have created a setup for my project using the walkthrough below:
http://msdn.microsoft.com/en-US/library/49b92ztk(v=vs.80).aspx
When a user installs my application, the database is available to them, and the user can simply export the SQL Server database.
How can I secure my database so that the user doesn't have an easily available copy of it?
I thought of creating a new password-protected server instance during the installation of my application on the user's PC, other than .\SQLEXPRESS (in which I created the database in the walkthrough above), so that a complete copy of the database used by my application would not simply be available for the user to export.
So could anyone please guide me...
The question is: how far do you want to go to protect your data?
Better protection of your data usually comes at the cost of more development time and likely less user friendliness, for example due to lower performance (encryption is not free). More complex code usually results in more support requests too.
Where the best balance is depends on your business model (if any) and on your user requirements.
Keep in mind that anything you deploy to an end-users machine is in the end vulnerable. If something is valuable enough there will be people trying to steal it.
So, you could argue that the best protection is not to deploy the data at all. You could back your end-user application with a web service and keep the data on your own server, for example in the cloud.
I've found however that you sometimes just need to trust your users. If you build a good product that makes them happy, they have no reason to steal from you. In fact, they are probably glad to pay you.
If you decide that you need to deploy the data and that you need to encrypt it, you should think about why you chose SQL Server.
What database features do you need exactly? Do you need a full-blown database server for that?
Any local admin can gain control over any SQL Server database in seconds so the built-in SQL server authentication will not bring you a lot of benefits.
You could switch to SQL Server CE and keep the database within your application. That would make the database a lot harder to access for a regular user.
If all you're doing is looking up words, you may be better off with a different storage engine like Lucene.
Lucene is actually a search engine, so it's highly optimized for matching words or parts of words.
You can run Lucene inside your .NET application so you don't even need the end-user to install SQL Server. There is a .NET version of Lucene here.
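To give an idea of how compact a word lookup gets, here is a minimal indexing-and-search sketch against the Java Lucene API (the Lucene.NET port mirrors it closely); the index path and the word/definition fields are made up for the example.

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class WordLookup {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(Paths.get("wordindex")); // made-up index location
            // Index one entry: the word is analyzed for matching, the definition is only stored.
            try (IndexWriter writer = new IndexWriter(dir,
                    new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new TextField("word", "example", Field.Store.YES));
                doc.add(new StoredField("definition", "a representative instance"));
                writer.addDocument(doc);
            }
            // Search the index for a word and print the stored definition.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new QueryParser("word", new StandardAnalyzer()).parse("example");
                TopDocs hits = searcher.search(query, 10);
                for (ScoreDoc hit : hits.scoreDocs) {
                    Document found = searcher.doc(hit.doc);
                    System.out.println(found.get("word") + ": " + found.get("definition"));
                }
            }
        }
    }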
Lucene however doesn't protect your data. There's tooling available that will allow anybody to view and extract the data from the stored index files.
Since Lucene is open source though, you could extend it to support encrypted data storage (see this related question).

Best way of migrating customized metadata associated with source components into a Tridion environment

If we are migrating content from a source content management system to Tridion, what is the best way of migrating the customized metadata associated with the components (content) of the source content management system into Tridion? Should we migrate it directly into SQL Server, or is there an option to migrate it in the form of an XML file, etc.?
Migrating directly into SQL Server is unsupported, and the entire system would be unsupported at that point, due to possible data consistency issues.
The most straightforward way is to read the data from the source system, and use the Tridion API to recreate the item.
If migrating metadata, some of the data would likely fit best into a taxonomy, which would mean you'd want to migrate the keywords / structure first, then tag the content as it came into Tridion.
You have a few options when migrating content into Tridion.
I can't tell from the above whether you are talking about migrating to SQL Server as an intermediate format, or directly into the Tridion database. Importing directly into the Tridion database is definitely not a supported solution, and could lead to unpredictable results.
You need to use the API: either the Core Service or the TOM.NET API (if you have Tridion 2011), or the old TOM API if not.
A popular approach is to export all content into an XML format that you can then process with a .NET application.
There's some good articles on migrating content into Tridion by Ryan Durkin here, and Nuno Linhares here.
As mentioned before, migrating directly into the database is not an option if you are planning to use SDL Tridion as the final CMS.
Apart from the supported mechanism chosen for the migration, pay attention to how you are going to structure the metadata in the new CMS; depending on the volume, structure, hierarchy, and relations across metadata items, the process can become complex.
Also pay special attention to the BluePrinting concept, as you can probably merge duplicated values from the old system into a single inherited value.
Don't think only about how to put the metadata into the system, but also about how that metadata will be used and maintained in the new CMS, in this case SDL Tridion.
You can also check a recent post about migration and migration planning in general, in case it adds some more information:
Can we automate migrating to SDL Tridion?

Semantic Application Development

Hi everybody,
I was looking for a solution or a methodology to develop a sample website that utilizes a semantic database back-end.
I have used Protégé in previous steps, and I have successfully converted my OWL project to a MySQL database back-end.
Any help would be appreciated.
I did not see any such resource.
In general you will need a server to host your data. If your data is in a relational database you will need a server like D2R, or an RDF server like Sesame or Virtuoso.
These servers allow access to the RDF data. You can then use different techniques to embed this data in HTML and present it to the user (Google it). There are RDF browsers available as well.
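As an illustration of how little client code such a server needs, here is a hedged sketch that queries a Sesame repository over HTTP with its Java API; the server URL, repository ID, and the SPARQL query are placeholders.

    import org.openrdf.query.BindingSet;
    import org.openrdf.query.QueryLanguage;
    import org.openrdf.query.TupleQueryResult;
    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.http.HTTPRepository;

    public class SesameQueryExample {
        public static void main(String[] args) throws Exception {
            // Placeholder server URL and repository ID; adapt to your Sesame installation.
            Repository repo = new HTTPRepository("http://localhost:8080/openrdf-sesame", "myRepo");
            repo.initialize();
            RepositoryConnection con = repo.getConnection();
            try {
                String sparql = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
                        + "SELECT ?s ?label WHERE { ?s rdfs:label ?label } LIMIT 10";
                TupleQueryResult result =
                        con.prepareTupleQuery(QueryLanguage.SPARQL, sparql).evaluate();
                try {
                    while (result.hasNext()) {
                        BindingSet row = result.next();
                        System.out.println(row.getValue("s") + " -> " + row.getValue("label"));
                    }
                } finally {
                    result.close();
                }
            } finally {
                con.close();
                repo.shutDown();
            }
        }
    }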
There is a very good article on this topic here based on this publication.