Converting SQL Result Sets to XML

I am looking for a tool that can serialize and/or transform SQL result sets into XML. Dumbed-down XML generation from SQL result sets is simple and trivial, but that's not what I need.
The solution has to be database neutral and accept only regular SQL query results (no DB-specific XML support used). A particular challenge for this tool is producing nested XML matching an arbitrary schema from row-based results. Intermediate steps are too slow and wasteful: this needs to happen in one single step; no RS->object->XML, and preferably no RS->XML->XSLT->XML. It must support streaming, because the result sets and the XML are large.
Anything out there for this?

With SQL Server you really should consider using the FOR XML construct in the query.
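For illustration, a hedged sketch of running a FOR XML query over JDBC (the contacts table, its columns, and the connection arguments are all hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ForXmlExample {
    public static void main(String[] args) throws Exception {
        // Let SQL Server build the (optionally nested) XML inside the query itself
        String sql = "SELECT name, address FROM contacts "
                   + "FOR XML PATH('Record'), ROOT('Records')";
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {            // long XML may be split across rows
                System.out.print(rs.getString(1));
            }
        }
    }
}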
If you're using .NET, just use a DataAdapter to fill a DataSet. Once it's in a DataSet, just use its WriteXml() method. That breaks your DB->object->XML rule, but it's really how things are done. You might be able to work something out with a DataReader, but I doubt it.

Not that I know of. I would just roll my own. It's not that hard to do, maybe something like this:
#!/usr/bin/env jruby
import java.sql.DriverManager

# TODO: some magic to load the driver
conn = DriverManager.getConnection(ARGV[0], ARGV[1], ARGV[2])
res = conn.create_statement.execute_query ARGV[3]
meta = res.meta_data
puts "<result>"
while res.next
  puts "<row>"
  for n in 1..meta.column_count
    column = meta.getColumnName n
    # NOTE: real code would also escape XML special characters in the value
    puts "<#{column}>#{res.getString(n)}</#{column}>"
  end
  puts "</row>"
end
puts "</result>"
Disclaimer: I just made all of that up, I'm not even bothering to pretend that it works. :-)

In .NET you can fill a DataSet from any source and then have it write itself out to disk as XML, with or without the schema. I can't say what performance for large sets would be like. Simple :)

Another option, depending on how many schemas you need to output, and/or how dynamic this solution is supposed to be, would be to actually write the XML directly from the SQL statement, as in the following simple example...
SELECT
'<Record>' ||
'<name>' || name || '</name>' ||
'<address>' || address || '</address>' ||
'</Record>'
FROM
contacts
You would have to prepend and append the document element (and escape any XML special characters in the column values), but I think this example is easy enough to understand.

DbUnit (www.dbunit.org) does go from SQL to XML and vice versa; you might be able to modify it further for your needs.

Technically, converting a result set to an XML file is straightforward and doesn't need any tool, unless you have a requirement to convert the data structure to fit a specific export schema. In general the result set becomes the top-level element of the XML file, and then you produce a number of record elements containing attributes, which are effectively the fields of a record.
When it comes to Java, for example, you just need an appropriate JDBC driver for interfacing with the DBMS of your choice (usually provided by the DBMS vendor), which addresses the database-independence requirement, and a few lines of code to read the result set and print out an XML string per record, per field. Not a difficult task for an average Java developer, in my opinion.
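As a hedged illustration of that approach, here's a minimal single-pass sketch using JDBC plus StAX; it streams rows straight to the writer with no intermediate object model. The connection arguments and the flat <result>/<row> layout are assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class ResultSetToXml {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(args[3])) {

            XMLStreamWriter w = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(System.out, "UTF-8");
            ResultSetMetaData meta = rs.getMetaData();

            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("result");
            while (rs.next()) {
                w.writeStartElement("row");
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    w.writeStartElement(meta.getColumnLabel(i));
                    String value = rs.getString(i);
                    if (value != null) {
                        w.writeCharacters(value); // StAX escapes &, <, > for us
                    }
                    w.writeEndElement();
                }
                w.writeEndElement();
            }
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        }
    }
}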
Anyway, the more concrete a purpose you state, the more concrete an answer you get.

In Java, you could just fill an object with the data (like an entity bean) and then use XMLEncoder to turn it into XML. From there you can use XSLT for further conversion, or XMLDecoder to bring it back to an object.
Greetz, GHad
PS: See http://ghads.wordpress.com/2008/09/16/java-to-xml-to-java/ for an example of the object-to-XML part... From DB to object, several more ways are possible: JDBC, Groovy DataSets, or GORM. Apache Commons BeanUtils may help to fill up JavaBeans via reflection.
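For the object-to-XML step, a minimal XMLEncoder sketch (the Contact bean and file name are hypothetical):

import java.beans.XMLEncoder;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;

public class BeanToXml {
    public static class Contact {            // a plain JavaBean
        private String name = "";
        public String getName() { return name; }
        public void setName(String n) { name = n; }
    }

    public static void main(String[] args) throws Exception {
        Contact c = new Contact();
        c.setName("Alice");
        try (XMLEncoder enc = new XMLEncoder(
                new BufferedOutputStream(new FileOutputStream("contact.xml")))) {
            enc.writeObject(c);              // writes the bean out as XML
        }
    }
}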

I created a solution to this problem by using the equivalent of a mail merge using the resultset as the source, and a template through which it was merged to produce the desired XML.
The template was standard XML, with a Header element, a Footer element and a Body element. Using a CDATA block in the Body element allowed me to include a complete XML structure that acted as the template for each row. In order to include fields from the result set in the template, I used markers that looked like this: <[FieldName]>. The template was then pre-parsed to isolate the markers so that, in operation, the template requests each of the fields from the result set as the Body is being produced.
The Header and Footer elements are output only once at the beginning and end of the output set. The body could be any XML or text structure desired. In your case, it sounds like you might have several templates, one for each of your desired schemas.
All of the above was encapsulated in a Template class, such that after loading the Template, I merely called merge() on the template passing the resultset in as a parameter.
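A hedged sketch of what that per-row merge step might look like, using the <[FieldName]> marker syntax described above (everything else is an assumption):

import java.sql.ResultSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateMerge {
    // Matches markers of the form <[FieldName]>
    private static final Pattern MARKER = Pattern.compile("<\\[(\\w+)\\]>");

    // Replaces each marker in the body template with the value of the
    // same-named field from the current result set row.
    static String mergeRow(String bodyTemplate, ResultSet rs) throws Exception {
        Matcher m = MARKER.matcher(bodyTemplate);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = rs.getString(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }
}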

Related

How to skip the SQL (part of the SQL) parsing in ANTLR4?

Sorry, this question was closed and cannot be reopened. And please excuse my poor English; it was indeed translated by a website. :)
https://stackoverflow.com/questions/70035964/how-to-skip-sql-parsing-in-antlr4
@BartKiers Thanks for being interested in this question; let me give a detailed example.
There are lots of SQL queries, such as select * from user or update user set field1 = 'value1' where condition = 'value' etc.; let's call them original SQL queries.
There is a Java program which intercepts all the original SQL queries and parses them into parse tree nodes with ANTLR4, then rewrites the query (a step that depends on the parse phase); so the original SQL queries may be parsed and rewritten as
select field1, field1_encrypted, field1_digest, field2 from user
or
update user
set field1 = value1,
field1_encrypted = encrypt_algorithm(value1),
field1_digest = digest_algorithm(value1)
where condition_digest = digest_algorithm(values)
etc.
Once they finish the rewrite phase, they are executed as SQLStatement: a SELECT is executed as a SelectSQLStatement while an UPDATE is executed as an UpdateSQLStatement.
Now I think some of the original SQL queries should skip the parse phase, and likewise the rewrite phase, but those original SQL queries should still be executed as they are.
I thought of marking those with a comment, as
/* PARSE_PHASE_SKIPPED=TRUE */ originalSQL
or with a SKIP prefix, as
SKIP originalSQL
Either way, I wish ANTLR4 to parse the marked but otherwise unmodified SQL query into parse tree nodes and have it executed as a ParsePhaseSkippedSQLStatement.
Can ANTLR4 support this situation, and how should the grammar be written? Thanks in advance.
====================
Thank you for your reply, @Mike Cargal. Yes, almost.
Let me say it again and give a more detailed example.
There is a Java system that we'll call X. X contains lots of SQL queries that the developers write and guarantee can be executed correctly by Ibatis / JPA etc.; let's name those SQL queries the original SQL queries.
Take the original SQL queries below as examples:
insert into user (username, id_no) values ('xyz', '123456')
select username, id_no from user u where u.id_no = '123456'
We say the column id_no in table user holds sensitive data: we should store ciphertext instead of plaintext. So the original SQL queries would be parsed by ANTLR and rewritten by Java code as below; let's name these the rewritten SQL queries. The rewritten SQL queries should also be executed correctly by Ibatis / JPA etc.
insert
into user (username, id_no, id_no_cipher, id_no_digest)
values ('xyz', '', 'encrypted_123456', 'digest_123456')
select username, id_no_cipher as id_no
from user u
where u.id_no_digest = 'digest_123456'
In this case:
1. We see that the rewrite phase depends on the parse phase: original SQL queries need to be correctly parsed before they can be rewritten by the Java code.
2. All original SQL queries are parsed, but only the few matching the sensitive-data rules are rewritten into rewritten SQL queries.
But there are lots of original SQL queries that we know for certain do not need to be rewritten, and thus do not need to be parsed either; parsing them may even raise exceptions in various complex situations, while they would be executed correctly by Ibatis / JPA etc. as they are.
So I planned to use a SQL comment / customized keyword annotation to "turn off" the parse phase for them.
If I understand your question correctly, you wish to use some sort of comment/annotation to "turn off" execution of the following SQL statement.
(NOTE: You can't really skip "parsing" part of the input. This will address ways in which you could skip processing part of the parsed input, which I believe is what you're ultimately wanting to accomplish.)
This would not be an ANTLR concern. ANTLR's responsibility is to parse your input stream and produce a parse tree (not technically an AST) that correctly represents the structure of your input.
Executing the SQL is not what ANTLR does. It does, however, generate utility Listener and Visitor classes that can be used to cleanly and efficiently navigate the resulting parse tree. There can be a lot of code involved in actually executing the SQL from the parse tree. Often, the first step is to produce an AST from the parse tree to make it easier to deal with.
You have a couple of alternatives (as you hint at).
1 - Using the current grammar and putting these annotations inside comments (/* PARSE_PHASE_SKIPPED=TRUE */)
This can be done, but it's a bit "messy". It's most likely that COMMENT tokens are -> skip'ed (or perhaps sent to -> channel(HIDDEN)). This makes it MUCH easier to write the parser rules, as you don't have to include optional COMMENTs everywhere a comment could appear. That said, if you send COMMENT tokens to the HIDDEN channel, they are still in the token stream even though they are ignored by the parser. The COMMENT tokens won't be in the rule context objects that the listeners/visitors deal with, but you can look backwards/forwards in the token stream for COMMENT tokens.
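If you go the hidden-channel route, here's a hedged sketch of that backwards look in the token stream (the marker text and the assumption that COMMENTs sit on the hidden channel are yours to adapt):

import java.util.List;
import org.antlr.v4.runtime.BufferedTokenStream;
import org.antlr.v4.runtime.Token;

public class SkipMarkers {
    // True if a hidden COMMENT token containing the skip marker sits
    // to the left of the statement's first token.
    static boolean hasSkipMarker(BufferedTokenStream tokens, int stmtStartIndex) {
        List<Token> hidden =
                tokens.getHiddenTokensToLeft(stmtStartIndex, Token.HIDDEN_CHANNEL);
        if (hidden == null) return false;
        return hidden.stream()
                     .anyMatch(t -> t.getText().contains("PARSE_PHASE_SKIPPED=TRUE"));
    }
}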
2 - You could introduce some new syntax for annotations (similar to your SKIP idea). To do this you'll have to extend the grammar to recognize these annotations. They'd have to be distinguishable from valid SQL, so a simple SKIP is probably not going to work.
The benefit of this approach is that, when you extend the grammar to recognize annotations, you can be very specific about where annotations are allowed. You'd be able to include them in your parse tree, making them easier to locate.
With either of these approaches, you would use a visitor or listener to go through your parse tree looking for the comment/annotation and then mark the subsequent statement with an indicator that you don't want to execute it. (You might use that information to simply skip the parse-tree-to-AST transformation of the "skipped" nodes.)
Let me see if I understand your question correctly. In your environment you run SQL queries (not "SQLs", btw.), which may contain data that must not reach the server as is. It doesn't matter if that is sensitive data or what else. All what matters is that you want to replace the text in the queries.
For that you parse the queries and rewrite them, before sending them to the server. However, you don't want to do that for all queries, but only for specific ones. And you came up with the idea to mark queries (or query parts) that must not be transformed, with a special comment. Does this match your intention so far?
Now I wonder why you want to accomplish that at the query (parsing) level. It's not the parsing you want to modify but the semantic handling of the parse result (here the parse tree, as Mike Cargal already mentioned). So, in my opinion, you don't need to introduce special markup for your queries; instead, define criteria that indicate which data must be replaced.
When you think about that, you will probably realise that data for specific fields (columns) in specific tables needs to be replaced. You can actually keep a list of schema/table/column tuples which tells your rewriter whether a value must be rewritten. Everything else stays as it is.
ANTLR4 has nothing to do in this process. It's all to be done in the semantic phase (the processing of the parse tree using a parse tree listener). In this phase you have to collect all column references that are used in a query. Then you compare that list with the list for the rewriter. If a column reference matches, the rewriter has to rewrite the text for it in the query.
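A hedged sketch of such a lookup table (the tuple format and the example entry are assumptions):

import java.util.Set;

public class RewriteRules {
    // Fully qualified schema/table/column tuples whose values must be rewritten
    private final Set<String> sensitiveColumns = Set.of("mydb.user.id_no");

    boolean needsRewrite(String schema, String table, String column) {
        return sensitiveColumns.contains(schema + "." + table + "." + column);
    }
}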
That task is, however, non-trivial because of nested queries (subqueries, where inner queries can reference tables from an outer query). This is, btw., pretty similar to the way code completion works, where you have to provide a possible column list for all mentioned tables in a query. That's why I have written (C++) code to collect such references in MySQL Workbench's SQL code completion.

Standard deep nested data type?

I took the nice example clientPrintDescription.py and created an HTML form from the description which matches the input data types for the particular RFC function.
In SAP, data types can contain data types which can contain data types, and I want to test my HTML form generator with a deeply nested data type.
Of course I could create my own custom data type, but it would be more re-usable if I used an existing (RFC-capable) data type.
Which data type in SAP contains a lot of nested data types? And maybe a lot of different data types?
I cannot tell which structure is best for your case, but you could filter the view DD03VV (now that is a meaningful name) using transaction se16h. If you GROUP BY the column TABNAME and filter on WHERE TABCLASS = 'INTTAB', the number of entries is an indicator of the size of the structure.
You could also aggregate and, in a next step, filter on the maximum DEPTH value (like a SQL HAVING, which afaik does not exist in SAP R/3). On my system the maximum depth is 12.
Edit: If you cannot access se16h, here's a workaround: call se37 and execute SE16N_START with I_HANA = 'X'. If you cannot access se37, use sa38 and call RSFUNCTIONBUILDER (the report behind se37).
PS: The requests on DD03VV are awfully slow, probably due to missing optimization for complex requests on ABAP dictionary views.
If I had to give only one DDIC structure, I would give this one:
FDT_TEST_DDIC_BIND_DEEP_S
It contains many elements of miscellaneous types, including nested ones, and it exists in any ABAP-based system (it belongs to the "BASIS" layer).
As it contains some data and object references in sub-levels which are invalid in RFC, you'll have to copy it and remove those reference fields.
There are also these structures (column "TABNAME") with fields of some interest:
TABNAME FIELDNAME Description
-------------------- ------------- ------------------------------------------------
SFW_BF FROM_RELEASE elementary built-in type
SAUNIT_S_ALERT WHEN data element
SAUNIT_S_ALERT HEADER structure
SAUNIT_S_ALERT TEXT_INFOS table type
SAUNIT_PROG_INFO .INCLUDE include structure SAUNIT_S_TADIR_KEY
SKWF_IOFLD .INCLU-FLD include structure SKWF_IO
SWFEXPSTRU2 .INCLU--AP append structure SWFEXPSTRU3
APPEND_BAPI0002_2_2 .APPEND_DU append structure recursive (append of BAPI0002_2) (unique component of APPEND_BAPI0002_2_2)
SOADDRESS Structure with nested structures on 2 levels
Some structures may not be valid in some ABAP releases. They used to exist in ABAP basis 7.02 and 7.52.
Try the function module RFC_METADATA_TEST...
It has some deeply nested parameters.
In SE80 under the Enterprise Services Browser, you will find examples of proxy structures that are complex DDIC structures, with many different types.
Example edo_tw_a0401request
Just browse around, you will find something you like.
I found STFC_STRUCTURE in the docs of test_datatypes of PyRFC.
It works fine for testing, since it is already available in my SAP system; I don't need a dummy RFC for testing. Nice.

How to query documents in MarkLogic and process results

I've been working off of the tutorial pages, but seem to have a fundamental disconnect in my thinking while transitioning off of RDBMS systems. I'm using MarkLogic and handling this database interaction through the Java API, focusing on search access via the POJO method outlined in the tutorial documentation.
My reference up to this point has come from here principally: http://developer.marklogic.com/learn/java/processing-search-results
My scenario is this:
I have a series of documents. We'll call them 'books' for simplicity. I'm writing these books into my DB like this:
jsonDocMgr.write("/" + book.getID() + "/",
    new StringHandle(
        "{\"name\": \"" + book.getID() + "\"," +
        " \"chaps\": " + book.getNumChaps() + "," +
        " \"pages\": " + book.getNumPages() +
        "}"));
What I want is to execute the following type of operation:
- Query all documents with the name "book*" (as the ID is represented by book0, book1, book2, etc.) where chaps > 3. For these documents only, I want to modify the number of pages by reducing it by half.
In an RDBMS, I'd use something like jdbcTemplate and get a result set for me to iterate through. For each iteration I'd know I was working with a single record (aka a book), parse the field values from the result set, make a note of the ID, then update the DB accordingly.
With MarkLogic, I'm awash in a sea of different handlers and managers, none of which seems to follow the pattern of a ResultSet with a cursor abstraction. Ultimately I want to do a two-step operation: check the chapter count, then update the pages field for that specific URI.
What's the most common approach to this? It seems like the most basic of operations...
Try the high-level Java API and see if it works for you. Create a multi-statement transaction with a query by example, then use document operations.
At a lower level, the closest match to a ResultSet is the ResultSequence class. The examples at http://docs.marklogic.com/javadoc/xcc/overview-summary.html are pretty good. For updates the interaction model between Java and MarkLogic is a bit different from JDBC and SQL. There is no SELECT... FOR UPDATE syntax.
The most efficient low-level technique is to select and update in one XQuery transaction, something like a stored procedure. However this requires good knowledge of XQuery. The other low-level approach is to use an XCC multi-statement transaction, which requires a little less knowledge of XQuery.
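For reference, a hedged XCC sketch of that low-level approach (the connection URL is an assumption, the XQuery glosses over JSON vs. XML addressing, and error handling is omitted):

import com.marklogic.xcc.ContentSource;
import com.marklogic.xcc.ContentSourceFactory;
import com.marklogic.xcc.Request;
import com.marklogic.xcc.ResultSequence;
import com.marklogic.xcc.Session;

public class XccExample {
    public static void main(String[] args) throws Exception {
        ContentSource cs = ContentSourceFactory.newContentSource(
                java.net.URI.create("xcc://user:password@localhost:8000"));
        Session session = cs.newSession();
        // Find the URIs of all books with more than 3 chapters
        Request req = session.newAdhocQuery(
                "for $b in fn:collection() " +
                "where $b/chaps > 3 " +
                "return fn:document-uri($b)");
        ResultSequence rs = session.submitRequest(req);
        while (rs.hasNext()) {                   // cursor-style iteration
            System.out.println(rs.next().asString());
        }
        session.close();
    }
}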
A minor issue in your code... you definitely do NOT want to end your JSON document URIs with "/" as you do in your sample code. You should end them with ".json" or some other extension, or no extension at all, but definitely not "/", as that is treated specially in the server.

Sanitizing User Input

I'm writing an R application in which I'm interacting with a SQL database using the RODBC package. I'm looking up items in the database based on columns in a .csv file. Although I have no reason to expect malicious content in these files, I'd much rather be safe than sorry.
How do you sanitize user input for use in a SQL query in R? In most languages I've come across, there were libraries that would accept a string, and return a sanitized string back to you. Does anything like that exist in R?
You could always use regular expressions to construct an accepted (whitelist) pattern and reject the inputs that don't match.
I'd try that, to be in full control. I don't know of any ready-made checks.

Can you free-text search an XML file using SQL Server?

I have an XML feed of a resume. Each part of the resume is broken down into its constituent parts. For example <employment_history>, <education>, <skills>.
I am aware that I could save each section of the XML file into a database, for example columnID = employment_history | education | skills, and then conduct a free-text search just on those individual columns. However, I would prefer not to do this, because it would duplicate data that is already contained within the XML file and may put extra strain on indexing.
Therefore I wondered whether it is possible to conduct a free-text search of an XML file within <employment_history></employment_history> using SQL Server.
If so an example would be appreciated.
Are you aware that SQL Server supports columns with the data type of "XML"? These can contain an entire XML document.
You can also index these columns and you can use XQuery to perform query and data manipulation tasks on those columns.
See Designing and Implementing Semistructured Storage (Database Engine)
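For example, a hedged sketch via JDBC using the XML data type's exist() method with XQuery contains(); note this is a case-sensitive substring test, whereas true free-text search (CONTAINS) needs a full-text index and applies to the whole XML column. The table and column names are assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class XmlElementSearch {
    public static void main(String[] args) throws Exception {
        // Match resumes whose <employment_history> element mentions "developer"
        String sql = "SELECT id FROM resumes "
                   + "WHERE resume_xml.exist("
                   + "'/resume/employment_history[contains(string(.), \"developer\")]') = 1";
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getInt("id"));
            }
        }
    }
}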
Querying XML by doing string searching in SQL is probably going to run into a lot of trouble.
Instead, I would parse it in whatever language you're using to interact with your database and use XPath (most languages/environments have some kind of built-in or popular 3rd-party library) to query it.
I think you could create a function (UDF) that takes the XML text as a parameter, fetches the data inside the tag, and then applies the filter you want.