Liquibase load data in a format other than CSV - sql

With the load data option that Liquibase provides, one can specify seed data in a CSV format. Is there a way I can provide say, a JSON or XML file with data that Liquibase would understand?
The use case is we are trying to put in some sample data which is hierarchical. E.g. Category - Subcategory relation which would require putting in parent id for all related categories. If there is a way to avoid including the ids in the seed data via say, JSON.
{
"MainCat1": ["SubCat11", "SubCat12"],
"MainCat2": ["SubCat21", "SubCat22"]
}
Very likely to have this as not supported (couldn't make Google help me) but is there a way to write a plugin or something that does this? Pointer to a guide (if any) would help.
NOTE: This is not about specifying the change log in that format.

This not currently supported and supporting it robustly would be pretty difficult. The main difficultly lies in the fact that Liquibase is designed to be database-platform agnostic, combined with the design goal of being able to generate the SQL required to do an operation without actually doing the operation live.
Inserting data like you want without knowing the keys and just generating SQL that could be run later is going to be very difficult, perhaps even impossible. I would suggest approaching Nathan, who is the main developer for Liquibase, more directly. The best way to do that might be through the JIRA bug database for Liquibase.
If you want to have a crack at implementing it, you could start by looking at the code for the LoadDataChange class (source in Github), which is where the CSV support currently lives.

Related

Databricks: Best practice for creating queries for data transformation?

My apologies in advance for sounding like a newbie. This is really just a curiosity question I have as an outsider observing my team clash with our client. Please ask any questions you have, and I will try my best to answer it.
Currently, we are storing our transformation queries in a DynamoDB table. When needed, we pull into Databricks and execute the query. Simple as that. Our client has called this out as “hard coding” (more on that soon)
Our client has come up with an alternative that involves creating JSON config files containing the transformation rules (all tables/attributes required, target table names, Alias names, join keys, etc. etc.). From here, the SQL query is dynamically created. This approach is still “hard coding” since these config files would need to be manually edited anytime there is a change in the rules.
The way I see this: I think storing the transform rules in JSON is more business user friendly, but that’s about where I see the pros end. It brings in much more complexity to the code and likely will need to be continuously developed to support new queries. Also, I don’t see anyway to prevent “hard coding”. The client business leads seem to think there is some magical tool to convert plain English text to complex SQL queries
I just wanted to get some experts thoughts on this. Which solution is better, or is there another approach that should be taken?

Normalizer in cloudconnect for Gooddata

I have some doubts. I'm doing a BI for my company, and I needed to develop a data converter in ETL because the database to which it's connected (PostgreSQL) is bringing me some negative values within the time CSV. It doesn't make much sense to bring from the database (to which we don't have many accesses) negative data like this:
The solution I found so that I don't need to rely exclusively on dealing directly with the database would be to perform a conversion within the cloudconnect. I verified in my researches that the one that most contemplates would be the normalizer, but there are not many explanations available. Could you give me a hand? because I couldn't parameterize how I could convert this data from 00:00:-50 to 00:00:50 with the normalizer.
It might help you to review our CC documentation: https://help.gooddata.com/cloudconnect/manual/normalizer.html
However, I am not sure if normalizer would be able to process timestamps.
Normalizer is basically a generic transform component with a normalization template. You might as well use reformat component that is more universal.
However, what you are trying to do would require some very custom transform script, written in CTL (CloudConnect transformation language) or java.
You can find some templates and examples in the documentation: https://help.gooddata.com/cloudconnect/manual/ctl-templates-for-transformers.html

Extract labels from serialized array using SQL

I do not have control of how this data is stored (I know as normalized data would be better for sql), because it is saved via the WordPress GravityForms plugin. The plugin uses a serialized array to define the question id (field_id), question label (label). My goal is to extract these three values in the following format:
field_id label
1 1. I know my organization’s mission (what it is trying to accomplish).
2 2. I know my organization’s vision (where it is trying to go in the future).
Here is the serialized array.
Can anyone please provide a specific example as to how to parse these values out with sql?
A specific example, no. This kind of stuff is complex. If your are working with straight json-formatted data, here are several options, none of which are simple.
You can build your own parser. Yuck.
You can upgrade everything you have to just-released SQL 2016, and hope that the built-in json tools do what you need (I've heard iffy things about them, but don't know what their final form is like. Too, updating all your database servers right now, oh sure.)
Phil Factor over on SimpleTalk built a json T-SQL parser (https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/). It looks horrible and may run poorly, but it would do the needful.
Buried in the comments of that article are links to a CLR tool that John Galt built (at https://github.com/jgcoding/J-SQL). I have used this successfully, though I haven't done anything too complex. (If you're json is relatively simple, this could do the trick.)
There are other json parsers for SQL out there, some free, some for sale. The key thing would be to not try and write your own, but rather find and use someone else's solution that addresses your requirements.

Migrating RMS to RDB

We're approaching the migration of legacy OpenVMS RMS files into relational database (both MS SQL 2012 and Oracle 10g are available).
I wonder if there are:
Tools to retrieve schema of indexed files
Tools to parse indexed files
Tools to deal with custom RMS data formats (zoned decimals etc)
as a bundle/API/Library
Perhaps I should change the approach?
There are several tools available, notably through ODBC vendors (I work for one: Attunity).
1 >> Tools to retrieve schema of indexed files
Please clarify. Looking for just record/column layout and indexes within the files or also relationships between files.
1a) How are the files currently being used? Cobol, Basic, Fortran programs? Datatrieve?
They will be using some data definition method, so you want a tool which can exploit that.
Connx, and Attunity Connect can 'import' CDD definitions, BASIC - MAP files, Cobol Copybooks. Variants are typically covered as well. I have written many a (perl/awk) script to convert special definition to XML.
1b ) Analyze/RMS, or a program with calling RMS XAB's can get available index information. Atunity connect will know how to map those onto the fields from 1a)
1c ) There is no formal, stored, relationship between (indexed) files on OpenVMS. That's all in the program logic. However, some modestly smart Perl/Awk/DCL script can often generate a tablem of likely foreign/primary keys by looking at filed names and datatypes matches.
How many files / layouts / gigabytes are we talking about?
2 >> Tools to parse indexed files
Please clarify? Once the structure is known (question 1), the parsing is done by reading using that structure right? You never ever want to understand the indexed file internals. Just tell RMS to fetch records.
3 >> Tools to deal with custom RMS data formats (zoned decimals etc) as a bundle/API/Library
Again, please clarify. Once the structure is known just use the 'right' tool to read using that structure and surely it will honor the detailed data definitions.
(I know it is quite simple to write one yourself, just thought there would be something in the industry)
Famous last words... 'quite simple'. Entire companies have been build and thrive doing just that for general cases. I admit that for specific cases it can be relatively straightforward, but 'the devil is in the details'.
In the Attunity Connect case we have a UDT (User Defined data Type) to handle the 'odd' cases, often involving DATES. Dates in integers, in strings, as units since xxx are all available out of the box, but for example some have -1 meaning 'some high date' which needs some help to be stored in a DB.
All the databases have some bulk load tool (BCP, SQL$LOADER).
As long as you can deliver data conforming to what those expect (tabular, comma-seperated, quoted-or-not, escapes-or-not) you should be in good shape.
The EGH tool Vselect may be a handy, and high performance, way to bulk read indexed files, filter and format some and spit out sequential files for the DB loaders. It can read RMS indexed file faster than RMS can! (It has its own metadata language though!)
Attunity offers full access and replication services.
They include a CDC (change data capture) to not a only load the data, but to also keep it up to date in near-real-time. That's useful for 'evolution' versus 'revolution'.
Check out Attunity 'Replicate'. Once you have a data dictionary, just point to the tables desired (include, exlude filters), point to a target DB and click to replicate. Of course there are options for (global or per-table) transformations (like an AREA-CODE+EXHANGE+NUMBER to single phone number, or adding a modified date columns ).
Will this be a single big switch conversion, or is there desire to migrate the data and keep the old systems alive for days, months, years perhaps, all along keeping the data in close sync?
Hope this helps some,
Hein van den Heuvel.
OP: Perhaps I should change the approach? Probably.
You might consider finding data migration vendors, some which likely have off-the-shelf solutions, if not as a COTS tool, more likely packaged as a service (I don't think this is a big market).
What this won't help you with is what I think of as much bigger problem with the application code: who is going to change all the code that is making RMS calls, in the corresponding code that makes relational DB calls? How will the entity ("Joe Programmer", or some tool), know where the data migrated to, so that he can write the correct call? What are you doing to do about the fact that the data representation is like to change?
Ideally you'd like an automated migration tool, that will move the data itself (therefore knows that datalayouts and representation changes), and will make the code changes that correspond. You can look for these kind of vendors, too.

Is there an editor for inserting/editing rows into a Core Data DB?

I've created a Core Data schema in xcode (3.2.5 if it matters) so I have the .xcdatamodel file with the proper entities and relations.
Now - How can I insert data, edit data and/or delete data from it, NOT from within the code ?
Like what phpMyAdmin is for MySql.
Thanks.
Core Data is meant to be used programmatically. Once you run the app once, it should create a file somewhere on disk (exactly where is probably specified in the AppDelegate class). It is likely that this file will be a SQLite database, but it doesn't have to be (the point of Core Data is to abstract your data away from the file format used to store it). It could also be an XML file or a binary file.
If it's a SQLite file, then you can open it in your favorite SQLite editor.
HOWEVER
The schema used in the SQLite format is not documented. If you go mucking around in it, you might get stuff to work, but it's also very likely that you could irreparably screw it up. (If it's an XML file or a binary file, you're probably totally out of luck)
In the end, Core Data is supposed to be used programmatically. To use it in a different way (such as what you're asking for) would be to use it in a way for which it was not intended and therefore not designed.
I don't know if you already solved your problem, but there's this SQLite Manager plug-in for firefox: http://code.google.com/p/sqlite-manager/
I haven't tried importing data or using the INSERT command to insert individual rows, but you could give it a try. It's free and works very well for me as is.
There's quite a few database management tools available for sqlite that allow you to do this. I've tried a few but to be honest none of them have impressed me that much as yet.
Would be great to have something like Toad available.
Anyway, find wherever your database file is, then drop it onto whichever application.
You can then add, delete, and edit rows and columns.
Of course, you will need to maintain any foreign keys and such like.
I find the generated Core Data models to be pretty easy to understand.
Example tools are SQLite Database Browser (free), SQLiteManager (not free), and Base. A quick Google search should reveal those and a few more.
I normally use SQLite Database Browser although it does crash occasionally.
See Christian Kienle's Core data editor. It's not free, but is designed to work directly with core data models and stores via Apple's API, supports binary data, builds relationships and even triggers validation, etc. I've found it's worth the $20.