Build an XML with XMLELEMENT - ORACLE SQL 11g query

Build an XML with XMLELEMENT - ORACLE SQL 11g query - sql

I'm trying to create the following XML using a SQL query (Oracle):
<Changes>
<Description>Some static test</Description>
<Notes>Some static test</Notes>
<UserChange>
<Operation>Static Text</Operation>
<User>VALUE from Table - record #1</User>
<BusinessSource>VALUE from Table #1</BusinessSource>
<ApplicationRole>VALUE from Table #1</ApplicationRole>
</UserChange>
<UserChange>
<Operation>Static Text</Operation>
<User>VALUE from Table - record #2</User>
<BusinessSource>VALUE from Table #2</BusinessSource>
<ApplicationRole>VALUE from Table #2</ApplicationRole>
</UserChange>
<UserChange>
<Operation>Static Text</Operation>
<User>VALUE from Table - record #3</User>
<BusinessSource>VALUE from Table #3</BusinessSource>
<ApplicationRole>VALUE from Table #3</ApplicationRole>
</UserChange>
</Changes>
The table I'm using looks like this:
ID USER SOURCE ROLE
1 test1 src1 role1
2 test1 src1 role1
3 test1 src1 role2
4 user2 src role
5 user3 src role
6 user1 src role
I want to write a query that will create a dynamic XML based on the values in the table.
For example:
The query should only take the values where user='test1' and the output will be the following XML:
<Changes>
<Description>Some static test</Description>
<Notes>Some static test</Notes>
<UserChange>
<Operation>Static Text</Operation>
<User>user1</User>
<BusinessSource>src1</BusinessSource>
<ApplicationRole>role1</ApplicationRole>
</UserChange>
<UserChange>
<Operation>Static Text</Operation>
<User>user1</User>
<BusinessSource>src1</BusinessSource>
<ApplicationRole>role1</ApplicationRole>
</UserChange>
<UserChange>
<Operation>Static Text</Operation>
<User>user1</User>
<BusinessSource>src1</BusinessSource>
<ApplicationRole>role2</ApplicationRole>
</UserChange>
</Changes>
I've started to write the query:
SELECT XMLElement("Changes",
XMLElement("Description", 'sometext'),
XMLElement("Notes", 'sometext'),
XMLElement("FulfillmentDate", 'Some Date'),
XMLElement("UserChange",
XMLElement("Operation", 'sometext'),
XMLElement("User", 'sometext'),
XMLElement("BusinessSource", 'sometext'),
XMLElement("ApplicationRole", 'sometext')
)).GETSTRINGVAL() RESULTs
FROM DUAL;
I need to iterate on the other values and make them part of the complete XML.
Appreciate your help.
Thanks

I was able to find a solution:
select XMLElement("Changes",
XMLElement("Description", 'sometext'),
XMLElement("Notes", 'sometext'),
XMLElement("FulfillmentDate", 'Some Date'),
XMLAgg(XML_CANDIDATE) ).GETSTRINGVAL() RESULTS
from
(
select XMLAGG(
XMLElement("UserChange",
XMLElement("Operation", 'sometext'),
XMLElement("User", 'sometext'),
XMLElement("BusinessSource", 'sometext'),
XMLElement("ApplicationRole", 'sometext'))) XML_CANDIDATE
from
table);

For future readers, here are open source programming solutions to transfer a SQL query to XML document, using OP's data needs as example.
Below code examples are not restricted to any database SQL dialect (i.e., transferrable to other RDBMS using corresponding connection modules as below are Oracle-specific).
For Python (using cx_Oracle and lxml modules):
import os
import cx_Oracle
import lxml.etree as ET
# Set current directory
cd = os.path.dirname(os.path.abspath(__file__))
# DB CONNECTION AND QUERY
db = cx_Oracle.connect("uid/pwd#database")
cur = db.cursor()
cur.execute("SELECT * FROM OracleData where user='test1'")
# WRITING XML FILE
root = ET.Element('Changes')
DescNode = ET.SubElement(root, "Description").text = 'Some static test'
NotesNode = ET.SubElement(root, "Notes").text = 'Some static test'
# LOOPING THROUGH QUERY RESULTS TO WRITE CHILD ELEMENTS
for row in cur.fetchall():
UCNode = ET.SubElement(root, "UserChange")
ET.SubElement(UCNode, "Operation").text = 'Static Text'
ET.SubElement(UCNode, "User").text = row[1]
ET.SubElement(UCNode, "BusinessSource").text = row[2]
ET.SubElement(UCNode, "ApplicationRole").text = row[3]
# CLOSE CURSOR AND DATABASE
cur.close()
db.close()
tree_out = (ET.tostring(root, pretty_print=True, xml_declaration=True, encoding="UTF-8"))
xmlfile = open(os.path.join(cd, 'OracleXML.xml'),'wb')
xmlfile.write(tree_out)
xmlfile.close()
For PHP (using PDO Oracle OCI and DOMDocument)
// Set current directory
$cd = dirname(__FILE__);
// create a dom document with encoding utf8
$domtree = new DOMDocument('1.0', 'UTF-8');
$domtree->formatOutput = true;
$domtree->preserveWhiteSpace = false;
// Opening db connection
$db_username = "your_username";
$db_password = "your_password";
$db = "oci:dbname=your_sid";
try {
$dbh = new PDO($db,$db_username,$db_password);
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$sql = "SELECT * FROM OracleData where user='test1'";
$STH = $dbh->query($sql);
$STH->setFetchMode(PDO::FETCH_ASSOC);
}
catch(PDOException $e) {
echo $e->getMessage();
exit;
}
/* create the root element of the xml tree */
$xmlRoot = $domtree->createElement("Changes");
$xmlRoot = $domtree->appendChild($xmlRoot);
$DescNode = $xmlRoot->appendChild($domtree->createElement('Description', 'Some static test'));
$NotesNode = $xmlRoot->appendChild($domtree->createElement('Notes', 'Some static test'));
/* loop query results through child elements */
while($row = $STH->fetch()) {
$UCNode = $xmlRoot->appendChild($domtree->createElement('UserChange'));
$operationNode = $UCNode->appendChild($domtree->createElement('Operation', 'Some static text'));
$userNode = $UCNode->appendChild($domtree->createElement('User', $row['USER']));
$sourceNode = $UCNode->appendChild( $domtree->createElement('BusienssSource', $row['SOURCE']));
$roleNode = $UCNode->appendChild($domtree->createElement('ApplicationRole', $row['ROLE']));
}
file_put_contents($cd. "/OracleXML.xml", $domtree->saveXML());
# Closing db connection
$dbh = null;
exit;
For R (using ROracle and XML packages):
library(XML)
library(ROracle)
# SET CURRENT DIRECTORY
setwd("C:\\Path\\To\\R\\Script")
# OPEN DATABASE AND QUERY
conn <-dbConnect(drv, username = "", password = "", dbname = "")
df <- dbGetQuery(conn, "select * from OracleData where user= 'test1';")
dbDisconnect(conn)
# CREATE XML FILE
doc = newXMLDoc()
root = newXMLNode("Changes", doc = doc)
descNode = newXMLNode("Description", "Some static test", parent = root)
notesNode = newXMLNode("Notes", "Some static test", parent = root)
# WRITE XML NODES AND DATA
for (i in 1:nrow(df)){
UCNode = newXMLNode("UserChange", parent = root)
operationNode = newXMLNode("Operation", "Some static text", parent = UCNode)
userNode = newXMLNode("User", df$USER[i], parent = UCNode)
sourceNode = newXMLNode("BusinessSource", df$SOURCE[i], parent = UCNode)
roleNode = newXMLNode("ApplicationRole", df$ROLE[i], parent = UCNode)
}
# OUTPUT XML CONTENT TO FILE
saveXML(doc, file="OracleXML.xml")

You can use this query:
select xmlelement("Changes",
xmlforest(
'Some Static Text' "Description"
, 'Some Static Text' "Notes")
, xmlagg(
xmlelement("UserChange",
xmlforest('Static Text' "Operation",
"USER" "User",
SOURCE "BusinessSource",
ROLE "ApplicationRole")
)
)
),getclobval()
from table
where "USER" = 'test1';
But remember that the XMLAGG function is an aggregate function. In this case every selected column from table is included in the aggregate, so no group by is needed. However, if you wanted to include some column from table outside of the XMLAGG you would need to include them in a group by statement. Also since USER is a reserved word it needs to be surrounded by double quotes to be used as a column reference.

Related

How can I automatically infer schemas of CSV files on S3 as I load them?

Context
Currently I am using Snowflake as a Data Warehouse and AWS' S3 as a data lake. The majority of the files that land on S3 are in the Parquet format. For these, I am using a new limited feature by Snowflake (documented here) that automatically detects the schema from the parquet files on S3, which I can use to generate a CREATE TABLE statement with the correct column names and inferred data types. This feature currently only works for Apache Parquet, Avro, and ORC files. I would like to find a way that achieves the same desired objective but for CSV files.
What I have tried to do
This is how I currently infer the schema for Parquet files:
select generate_column_description(array_agg(object_construct(*)), 'table') as columns
from table (infer_schema(location=>'${LOCATION}', file_format=>'${FILE_FORMAT}'))
However if I try specifying the FILE_FORMAT as csv that approach will fail.
Other approaches I have considered:
Transferring all files that land on S3 to parquet (this involves more code, and infra setup so wouldn't be my top choice, especially that I'd like to keep some files in their natural type on s3)
Having a script (using libraries like Pandas in Python for example) that infer the schema for files in S3 (this also involves more code, and will be strange in the sense that parquet files are handled in Snowflake, but non parquet files are handled by some script on aws).
Using a Snowflake UDF to infer the schema. Haven't fully considered my options there yet.
Desired Behaviour
As a new csv file lands on S3 (on a pre-existing STAGE), I would like to infer the schema, and be able to generate a CREATE TABLE statement with the inferred data types. Preferably, I would like to do that within Snowflake as the existing aforementioned schema-inference solution exists there. Happy to add further information if needed.

UPDATE: I modified the SP that infers data types in untyped (all string type columns) tables and it now works directly against Snowflake stages. The project code is available here: https://github.com/GregPavlik/InferSchema
I wrote a stored procedure to assist with this; however, its only goal is to infer the data types of untyped columns. It works as follows:
Load the CSV into a table with all columns defined as varchars.
Call the SP with a query against the new table (main point is to get only the columns you want and limit the row count to keep type inference times reasonable).
Also in the SP call is the DB, schema, and table for the old and new locations -- old with all varchar and new with the inferred types.
The SP will then infer the data types and create two SQL statements. One statement will create the new table with the inferred data types. One statement will copy from the untyped (all varchar) table to the new table with appropriate wrappers such as try_multi_timestamp(), a UDF that extends try_to_timestamp() to try various common formats.
I meant to extend this so that it didn't require the untyped (all varchar) table at all, but haven't gotten around to it. Since it's come up here, I may circle back and update the SP with that capability. You can specify a query that reads directly from the stage, but you'd have to use $1, $2... with aliases for the column names (or else the DDL will try to create column names like $1). If the query runs directly against a stage, for the old DB, schema, and table, you could put in whatever because that's only used to generate an insert from select statement.
-- This shows how to use on the Snowflake TPCH sample, but could be any query.
-- Keep the row count down to reduce the time it take to infer the types.
call infer_data_types('select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.LINEITEM limit 10000',
'SNOWFLAKE_SAMPLE_DATA', 'TPCH_SF1', 'LINEITEM',
'TEST', 'PUBLIC', 'LINEITEM');
create or replace procedure INFER_DATA_TYPES(SOURCE_QUERY string,
DATABASE_OLD string,
SCHEMA_OLD string,
TABLE_OLD string,
DATABASE_NEW string,
SCHEMA_NEW string,
TABLE_NEW string)
returns string
language javascript
as
$$
/****************************************************************************************************
* *
* DataType Classes
* *
****************************************************************************************************/
class Query{
constructor(statement){
this.statement = statement;
}
}
class DataType {
constructor(db, schema, table, column, sourceQuery) {
this.db = db;
this.schema = schema;
this.table = table;
this.sourceQuery = sourceQuery
this.column = column;
this.insert = '"#~COLUMN~#"';
this.totalCount = 0;
this.notNullCount = 0;
this.typeCount = 0;
this.blankCount = 0;
this.minTypeOf = 0.95;
this.minNotNull = 1.00;
}
setSQL(sqlTemplate){
this.sql = sqlTemplate;
this.sql = this.sql.replace(/#~DB~#/g, this.db);
this.sql = this.sql.replace(/#~SCHEMA~#/g, this.schema);
this.sql = this.sql.replace(/#~TABLE~#/g, this.table);
this.sql = this.sql.replace(/#~COLUMN~#/g, this.column);
}
getCounts(){
var rs;
rs = GetResultSet(this.sql);
rs.next();
this.totalCount = rs.getColumnValue("TOTAL_COUNT");
this.notNullCount = rs.getColumnValue("NON_NULL_COUNT");
this.typeCount = rs.getColumnValue("TO_TYPE_COUNT");
this.blankCount = rs.getColumnValue("BLANK");
}
isCorrectType(){
return (this.typeCount / (this.notNullCount - this.blankCount) >= this.minTypeOf);
}
isNotNull(){
return (this.notNullCount / this.totalCount >= this.minNotNull);
}
}
class TimestampType extends DataType{
constructor(db, schema, table, column, sourceQuery){
super(db, schema, table, column, sourceQuery)
this.syntax = "timestamp";
this.insert = 'try_multi_timestamp(trim("#~COLUMN~#"))';
this.sourceQuery = SOURCE_QUERY;
this.setSQL(GetCheckTypeSQL(this.insert, this.sourceQuery));
this.getCounts();
}
}
class IntegerType extends DataType{
constructor(db, schema, table, column, sourceQuery){
super(db, schema, table, column, sourceQuery)
this.syntax = "number(38,0)";
this.insert = 'try_to_number(trim("#~COLUMN~#"), 38, 0)';
this.setSQL(GetCheckTypeSQL(this.insert, this.sourceQuery));
this.getCounts();
}
}
class DoubleType extends DataType{
constructor(db, schema, table, column, sourceQuery){
super(db, schema, table, column, sourceQuery)
this.syntax = "double";
this.insert = 'try_to_double(trim("#~COLUMN~#"))';
this.setSQL(GetCheckTypeSQL(this.insert, this.sourceQuery));
this.getCounts();
}
}
class BooleanType extends DataType{
constructor(db, schema, table, column, sourceQuery){
super(db, schema, table, column, sourceQuery)
this.syntax = "boolean";
this.insert = 'try_to_boolean(trim("#~COLUMN~#"))';
this.setSQL(GetCheckTypeSQL(this.insert, this.sourceQuery));
this.getCounts();
}
}
// Catch all is STRING data type
class StringType extends DataType{
constructor(db, schema, table, column, sourceQuery){
super(db, schema, table, column, sourceQuery)
this.syntax = "string";
this.totalCount = 1;
this.notNullCount = 0;
this.typeCount = 1;
this.minTypeOf = 0;
this.minNotNull = 1;
}
}
/****************************************************************************************************
* *
* Main function *
* *
****************************************************************************************************/
var pass = 0;
var column;
var typeOf;
var ins = '';
var newTableDDL = '';
var insertDML = '';
var columnRS = GetResultSet(GetTableColumnsSQL(DATABASE_OLD, SCHEMA_OLD, TABLE_OLD));
while (columnRS.next()){
pass++;
if(pass > 1){
newTableDDL += ",\n";
insertDML += ",\n";
}
column = columnRS.getColumnValue("COLUMN_NAME");
typeOf = InferDataType(DATABASE_OLD, SCHEMA_OLD, TABLE_OLD, column, SOURCE_QUERY);
newTableDDL += '"' + typeOf.column + '" ' + typeOf.syntax;
ins = typeOf.insert;
insertDML += ins.replace(/#~COLUMN~#/g, typeOf.column);
}
return GetOpeningComments() +
GetDDLPrefixSQL(DATABASE_NEW, SCHEMA_NEW, TABLE_NEW) +
newTableDDL +
GetDDLSuffixSQL() +
GetDividerSQL() +
GetInsertPrefixSQL(DATABASE_NEW, SCHEMA_NEW, TABLE_NEW) +
insertDML +
GetInsertSuffixSQL(DATABASE_OLD, SCHEMA_OLD, TABLE_OLD) ;
/****************************************************************************************************
* *
* Helper functions *
* *
****************************************************************************************************/
function InferDataType(db, schema, table, column, sourceQuery){
var typeOf;
typeOf = new IntegerType(db, schema, table, column, sourceQuery);
if (typeOf.isCorrectType()) return typeOf;
typeOf = new DoubleType(db, schema, table, column, sourceQuery);
if (typeOf.isCorrectType()) return typeOf;
typeOf = new BooleanType(db, schema, table, column, sourceQuery); // May want to do a distinct and look for two values
if (typeOf.isCorrectType()) return typeOf;
typeOf = new TimestampType(db, schema, table, column, sourceQuery);
if (typeOf.isCorrectType()) return typeOf;
typeOf = new StringType(db, schema, table, column, sourceQuery);
if (typeOf.isCorrectType()) return typeOf;
return null;
}
/****************************************************************************************************
* *
* SQL Template Functions *
* *
****************************************************************************************************/
function GetCheckTypeSQL(insert, sourceQuery){
var sql =
`
select count(1) as TOTAL_COUNT,
count("#~COLUMN~#") as NON_NULL_COUNT,
count(${insert}) as TO_TYPE_COUNT,
sum(iff(trim("#~COLUMN~#")='', 1, 0)) as BLANK
--from "#~DB~#"."#~SCHEMA~#"."#~TABLE~#";
from (${sourceQuery})
`;
return sql;
}
function GetTableColumnsSQL(dbName, schemaName, tableName){
var sql =
`
select COLUMN_NAME
from ${dbName}.INFORMATION_SCHEMA.COLUMNS
where TABLE_CATALOG = '${dbName}' and
TABLE_SCHEMA = '${schemaName}' and
TABLE_NAME = '${tableName}'
order by ORDINAL_POSITION;
`;
return sql;
}
function GetOpeningComments(){
return `
/**************************************************************************************************************
* *
* Copy and paste into a worksheet to create the typed table and insert into the new table from the old one. *
* *
**************************************************************************************************************/
`;
}
function GetDDLPrefixSQL(db, schema, table){
var sql =
`
create or replace table "${db}"."${schema}"."${table}"
(
`;
return sql;
}
function GetDDLSuffixSQL(){
return "\n);";
}
function GetDividerSQL(){
return `
/**************************************************************************************************************
* *
* The SQL statement below this attempts to copy all rows from the string tabe to the typed table. *
* *
**************************************************************************************************************/
`;
}
function GetInsertPrefixSQL(db, schema, table){
var sql =
`\ninsert into "${db}"."${schema}"."${table}" select\n`;
return sql;
}
function GetInsertSuffixSQL(db, schema, table){
var sql =
`\nfrom "${db}"."${schema}"."${table}" ;`;
return sql;
}
//function GetInsertSuffixSQL(db, schema, table){
//var sql = '\nfrom "${db}"."${schema}"."${table}";';
//return sql;
//}
/****************************************************************************************************
* *
* SQL functions *
* *
****************************************************************************************************/
function GetResultSet(sql){
cmd1 = {sqlText: sql};
stmt = snowflake.createStatement(cmd1);
var rs;
rs = stmt.execute();
return rs;
}
function ExecuteNonQuery(queryString) {
var out = '';
cmd1 = {sqlText: queryString};
stmt = snowflake.createStatement(cmd1);
var rs;
rs = stmt.execute();
}
function ExecuteSingleValueQuery(columnName, queryString) {
var out;
cmd1 = {sqlText: queryString};
stmt = snowflake.createStatement(cmd1);
var rs;
try{
rs = stmt.execute();
rs.next();
return rs.getColumnValue(columnName);
}
catch(err) {
if (err.message.substring(0, 18) == "ResultSet is empty"){
throw "ERROR: No rows returned in query.";
} else {
throw "ERROR: " + err.message.replace(/\n/g, " ");
}
}
return out;
}
function ExecuteFirstValueQuery(queryString) {
var out;
cmd1 = {sqlText: queryString};
stmt = snowflake.createStatement(cmd1);
var rs;
try{
rs = stmt.execute();
rs.next();
return rs.getColumnValue(1);
}
catch(err) {
if (err.message.substring(0, 18) == "ResultSet is empty"){
throw "ERROR: No rows returned in query.";
} else {
throw "ERROR: " + err.message.replace(/\n/g, " ");
}
}
return out;
}
function getQuery(sql){
var cmd = {sqlText: sql};
var query = new Query(snowflake.createStatement(cmd));
try {
query.resultSet = query.statement.execute();
} catch (err) {
throw "ERROR: " + err.message.replace(/\n/g, " ");
}
return query;
}
$$;

Have you tried STAGES?
Create 2 stages ... one with no header and the other with header .. .
see examples below.
Then a bit of SQL and voila your DDL.
Only issue - you need to know the # of cols to put correct number of t.$'s.
If someone could automate that ... we'd have an almost automatic DDL generator for CSV's.
Obviously once you have the SQL stmt then just add the create or replace table to the front and your table is nicely created with all the names from the CSV.
:-)
-- create or replace stage CSV_NO_HEADER
URL = 's3://xxx-x-dev-landing/xxx/'
STORAGE_INTEGRATION = "xxxLAKE_DEV_S3_INTEGRATION"
FILE_FORMAT = ( TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"' )
-- create or replace stage CSV
URL = 's3://xxx-xxxlake-dev-landing/xxx/'
STORAGE_INTEGRATION = "xxxLAKE_DEV_S3_INTEGRATION"
FILE_FORMAT = ( TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' )
select concat('select t.$1 ', t.$1, ',t.$2 ', t.$2,',t.$3 ', t.$3, ',t.$4 ', t.$4,',t.$5 ', t.$5,',t.$6 ', t.$6,',t.$7 ', t.$7,',t.$8 ', t.$8,',t.$9 ', t.$9,
',t.$10 ', t.$10, ',t.$11 ', t.$11,',t.$12 ', t.$12 ,',t.$13 ', t.$13, ',t.$14 ', t.$14 ,',t.$15 ', t.$15 ,',t.$16 ', t.$16 ,',t.$17 ', t.$17 ,' from #xxxx_NO_HEADER/SUB_TRANSACTION_20201204.csv t') from
--- CHANGE TABLE ---
#xxx/SUB_TRANSACTION_20201204.csv t limit 1;

ATG - How to override RQL for insert statements

I need to insert records to an Oracle DB table that already has records in it by using the table's sequence.
I tried using RQL which creates an auto-generated id for the primary key but sometimes those generated ids already exist in the database and as a result, a constraint violation error is thrown.
ATG documentation provides an alternative named Overriding RQL-Generated SQL but I didn't manage to make it work for insert statements.
GSARepository repo =
(GSARepository)request.resolveName("/examples/TestRepository");
RepositoryView view = repo.getView("canard");
Object params[] = new Object[4];
params[0] = new Integer (25);
params[1] = new Integer (75);
params[2] = "french";
params[3] = "greek";
Builder builder = (Builder)view.getQueryBuilder();
String str = "SELECT * FROM usr_tbl WHERE (age_col > 0 AND age_col < 1
AND EXISTS (SELECT * from subjects_tbl where id = usr_tbl.id AND subject
IN (2, 3)))";
RepositoryItem[] items =
view.executeQuery (builder.createSqlPassthroughQuery(str, params));
Is there any way to use table's sequence for insert statements via ATG Repository API?

Eventually, I did not manage to make it work but I found the following solution.
I retrieved the sequence number as below and then used it in the RQL insert statement.
RepositoryView view = getRestServiceDetailsRepository().getView("wsLog");
String sql = "select log_seq.nextval from dual";
Object[] params = {};
Builder builder = (Builder) view.getQueryBuilder();
Query query = builder.createSqlPassthroughQuery(sql, params);
RepositoryItem[] items = view.executeQuery(query);
if (items != null && items.length > 0) {
items[0].getRepositoryId();
}

Converting StructType to Avro Schema, returns type as Union when using databricks spark-avro

I am using databricks spark-avro to convert a dataframe schema into avro schema.The returned avro schema fails to have a default value. This is causing issues when i am trying to create a Generic record out of the schema. Can, any one help with the right way of using this function ?
Dataset<Row> sellableDs = sparkSession.sql("sql query");
SchemaBuilder.RecordBuilder<Schema> rb = SchemaBuilder.record("testrecord").namespace("test_namespace");
Schema sc = SchemaConverters.convertStructToAvro(sellableDs.schema(), rb, "test_namespace");
System.out.println(sc.toString());
System.out.println(sc.getFields().get(0).toString());
String schemaString = sc.toString();
sellableDs.foreach(
(ForeachFunction<Row>) row -> {
Schema scEx = new Schema.Parser().parse(schemaString);
GenericRecord gr;
gr = new GenericData.Record(scEx);
System.out.println("Generic record Created");
int fieldSize = scEx.getFields().size();
for (int i = 0; i < fieldSize; i++ ) {
// System.out.println( row.get(i).toString());
System.out.println("field: " + scEx.getFields().get(i).toString() + "::" + "value:" + row.get(i));
gr.put(scEx.getFields().get(i).toString(), row.get(i));
//i++;
}
}
);
This is the df schema:
StructType(StructField(key,IntegerType,true), StructField(value,DoubleType,true))
This is the avro converted schema:
{"type":"record","name":"testrecord","namespace":"test_namespace","fields":[{"name":"key","type":["int","null"]},{"name":"value","type":["double","null"]}]}

The problems is that the class SchemaConverters does not include default values as part of the schema creation. You have 2 options, modify the schema adding default values before Record creation or filling the record before building with some value( it could be actually values from your row). For example null. This is an example how create a Record using your schema
import org.apache.avro.generic.GenericRecordBuilder
import org.apache.avro.Schema
var schema = new Schema.Parser().parse("{\"type\":\"record\",\"name\":\"testrecord\",\"namespace\":\"test_namespace\",\"fields\":[{\"name\":\"key\",\"type\":[\"int\",\"null\"]},{\"name\":\"value\",\"type\":[\"double\",\"null\"]}]}")
var builder = new GenericRecordBuilder(schema);
for (i <- 0 to schema.getFields().size() - 1 ) {
builder.set(schema.getFields().get(i).name(), null)
}
var record = builder.build();
print(record.toString())

Multiple SQL statements using Groovy

Delete multiple entries from DB using Groovy in SoapUI
I am able to execute one SQL statement, but when I do a few it just hangs.
How can I delete multiple rows?
def sql = Sql.newInstance('jdbc:oracle:thin:#jack:1521:test1', 'test', 'test', 'oracle.jdbc.driver.OracleDriver')
log.info("SQL connetced")
sql.connection.autoCommit = false
try {
log.info("inside try")
log.info("before")
String Que =
"""delete from table name where user in (select user from user where ID= '123' and type= 262);
delete from table name where user in (select user from user where ID= '1012' and type= 28)
delete from table name where user in (select user from user where ID= '423' and type= 27)
"""
log.info (Que)
def output = sql.execute(Que);
log.info(sql)
log.info(output)
log.info("after")
sql.commit()
println("Successfully committed")
}catch(Exception ex) {
sql.rollback()
log.info("Transaction rollback"+ex)
}
sql.close()

Here is what you are looking for.
I feel it is more effective way if you want bulk number of records using the following way.
Create a map for the data i.e., id, type as key value pair that needs to be removed in your case.
Used closure to execute the query by iterating thru it.
Added comments appropriately.
//Closure to execute the query with parameters
def runQuery = { entry ->
def output = sql.execute("delete from table name where user in (select user from user where ID=:id and type=:type)", [id:entry.key, type:entry.value] )
log.info(output)
}
//Added below two statements
//Create the data that you want to remove in the form of map id, and type
def deleteData = ['123':26, '1012':28, '423':27]
def sql = Sql.newInstance('jdbc:oracle:thin:#jack:1521:test1', 'test', 'test', 'oracle.jdbc.driver.OracleDriver')
log.info("SQL connetced")
sql.connection.autoCommit = false
try {
log.info(sql)
log.info("inside try")
log.info("before")
//Added below two statements
//Call the above closure and pass key value pair in each iteration
deleteData.each { runQuery(it) }
log.info("after")
sql.commit()
println("Successfully committed")
}catch(Exception ex) {
sql.rollback()
log.info("Transaction rollback"+ex)
}
sql.close()
If you are just looking after execution of multiple queries only approach, then you may look at here and not sure if your database supports the same.

How to Build an XML Document Incrementally in PL/pgSQL

What's the best way to incrementally build an XML document/string using PL/pgSQL? Consider the following desired XML output:
<Directory>
<Person>
<Name>Bob</Name>
<Address>1234 Main St</Address>
<MagicalAddressFactor1>3</MagicalAddressFactor1>
<MagicalAddressFactor2>8</MagicalAddressFactor2>
<MagicalAddressFactor3>1</MagicalAddressFactor3>
<IsMagicalAddress>Y</IsMagicalAddress>
</Person>
<Person>
<Name>Joshua</Name>
<Address>100 Broadway Blvd</Address>
<MagicalAddressFactor1>2</MagicalAddressFactor1>
<MagicalAddressFactor2>1</MagicalAddressFactor2>
<MagicalAddressFactor3>4</MagicalAddressFactor3>
<IsMagicalAddress>Y</IsMagicalAddress>
</Person>
</Directory>
Where:
Person name and address is based on a simple person table.
MagicalAddressFactor 1, 2, and 3 are all based on some complex links and calculations to other tables from the Person table.
IsMagicalAddress is based on the sum of the three MagicalAddressFactors being greater than 10.
How could I generate this with PL/pgSQL using XML functions to ensure a well-formed XML element? Without using XML functions the code would look like this:
DECLARE
v_sql text;
v_rec RECORD;
v_XML xml;
v_factor1 integer;
v_factor2 integer;
v_factor3 integer;
v_IsMagical varchar;
BEGIN
v_XML := '<Directory>';
v_sql := 'select * from person;'
FOR v_rec IN v_sql LOOP
v_XML := v_XML || '<Name>' || v_rec.name || '</Name>' ||
'<Address>' || v_rec.Address || '</Address>';
v_factor1 := get_factor_1(v_rec);
v_factor2 := get_factor_2(v_rec);
v_factor3 := get_factor_3(v_rec);
v_IsMagical := case
when (v_factor1 + v_factor2 + v_factor3) > 10 then
'Y'
else
'N'
end;
v_XML := v_XML || '<MagicalAddressFactor1>' || v_factor1 || '</MagicalAddressFactor1>' ||
'<MagicalAddressFactor2>' || v_factor2 || '</MagicalAddressFactor2>' ||
'<MagicalAddressFactor3>' || v_factor3 || '</MagicalAddressFactor3>' ||
'<IsMagicalAddress>' || v_IsMagical || '</IsMagicalAddress>';
v_XML := v_XML || '</Person>'
END LOOP;
v_XML := v_XML || '</Directory>'
END;

For OP and future readers, consider a general purpose language whenever needed to migrate database content to XML documents. Simply connect via ODBC/OLEDB drivers, retrieve query, and output to XML document. Using OP's needs, calculations can be incorporated into one select query or a stored procedure that returns a resultset and have coding language import records for document building.
Below are open-source solutions including Java where each connect using corresponding PostgreSQL drivers (requiring installation). SQL queries assumes get_factor1(), get_factor2(), get_factor3() are inline database functions and Persons maintain a unique ID in first column.
Java (using the Postgre JDBC driver)
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.OutputKeys;
import java.sql.* ;
import java.util.ArrayList;
import java.io.IOException;
import java.io.File;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class SQLtoXML {
public static void main(String[] args) {
String currentDir = new File("").getAbsolutePath();
try {
String url = "jdbc:postgresql://localhost/test";
Properties props = new Properties();
props.setProperty("user","sqluser");
props.setProperty("password","secret");
props.setProperty("ssl","true");
Connection conn = DriverManager.getConnection(url, props);
String url = "jdbc:postgresql://localhost/test?user=sqlduser&password=secret&ssl=true";
Connection conn = DriverManager.getConnection(url);
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT name, address, "
+ "get_factor_1(v_rec) As v_factor1, "
+ "get_factor_2(v_rec) As v_factor2, "
+ "get_factor_3(v_rec) As v_factor3, "
+ " CASE WHEN (get_factor_1(v_rec) + "
+ " get_factor_2(v_rec) + "
+ " get_factor_3(v_rec)) > 10 "
+ " THEN 'Y' ELSE 'N' END As v_isMagical "
+ " FROM Persons;");
// Write to XML document
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
// Root element
Element rootElement = doc.createElement("Directory");
doc.appendChild(rootElement);
// Export table data
ResultSetMetaData rsmd = rs.getMetaData();
int columnsNumber = rsmd.getColumnCount();
while (rs.next()) {
// Data rows
Element personNode = doc.createElement("Person");
rootElement.appendChild(personNode);
Element nameNode = doc.createElement("name");
nameNode.appendChild(doc.createTextNode(rs.getString(2)));
personNode.appendChild(nameNode);
Element addressNode = doc.createElement("address");
addressNode.appendChild(doc.createTextNode(rs.getString(3)));
personNode.appendChild(addressNode);
Element magicaladd1Node = doc.createElement("MagicalAddressFactor1");
magicaladd1Node.appendChild(doc.createTextNode(rs.getString(4)));
personNode.appendChild(magicaladd1Node);
Element magicaladd2Node = doc.createElement("MagicalAddressFactor2");
magicaladd2Node.appendChild(doc.createTextNode(rs.getString(5)));
personNode.appendChild(magicaladd2Node);
Element magicaladd3Node = doc.createElement("MagicalAddressFactor3");
magicaladd3Node.appendChild(doc.createTextNode(rs.getString(6)));
personNode.appendChild(magicaladd3Node);
Element isMagicalNode = doc.createElement("IsMagicalAddress");
isMagicalNode.appendChild(doc.createTextNode(rs.getString(7)));
personNode.appendChild(isMagicalNode);
}
rs.close();
stmt.close();
conn.close();
// Output content to xml file
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(currentDir + "\\PostgreXML_java.xml"));
transformer.transform(source, result);
System.out.println("Successfully created xml file!");
} catch (ParserConfigurationException pce) {
System.out.println(pce.getMessage());
} catch (TransformerException tfe) {
System.out.println(tfe.getMessage());
} catch (SQLException err) {
System.out.println(err.getMessage());
}
}
}
Python (Using the Psycopg module)
import psycopg2
import os
import lxml.etree as ET
cd = os.path.dirname(os.path.abspath(__file__))
# DB CONNECTION AND QUERY
db = psycopg2.connect("dbname=test user=postgres")
cur = db.cursor()
cur.execute("SELECT name, address, \
get_factor_1(v_rec) As v_factor1, \
get_factor_2(v_rec) As v_factor2, \
get_factor_3(v_rec) As v_factor3, \
CASE WHEN (get_factor_1(v_rec) + \
get_factor_2(v_rec) + \
get_factor_3(v_rec)) > 10 \
THEN 'Y' ELSE 'N' END As v_isMagical \
FROM Persons;")
# WRITING XML FILE
root = ET.Element('Directory')
for row in cur.fetchall():
personNode = ET.SubElement(root, "Person")
ET.SubElement(personNode, "Name").text = row[1]
ET.SubElement(personNode, "Address").text = row[2]
ET.SubElement(personNode, "MagicalAddressFactor1").text = row[3]
ET.SubElement(personNode, "MagicalAddressFactor2").text = row[4]
ET.SubElement(personNode, "MagicalAddressFactor3").text = row[5]
ET.SubElement(personNode, "IsMagicalAddress").text = row[6]
# CLOSE CURSOR AND DATABASE
cur.close()
db.close()
# OUTPUT XML
tree_out = (ET.tostring(root, pretty_print=True, xml_declaration=True, encoding="UTF-8"))
xmlfile = open(os.path.join(cd, 'PostgreXML_py.xml'),'wb')
xmlfile.write(tree_out)
xmlfile.close()
print("Successfully migrated SQL to XML data!")
PHP (using Postgre PDO Driver)
<?php
$cd = dirname(__FILE__);
// create a dom document with encoding utf8
$domtree = new DOMDocument('1.0', 'UTF-8');
$domtree->formatOutput = true;
$domtree->preserveWhiteSpace = false;
# Opening db connection
$host="root";
$dbuser = "*****";
try {
$dbh = new PDO("pgsql:dbname=$dbname;host=$host", $dbuser, $dbpass);
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$sql = "SELECT name, address,
get_factor_1(v_rec) As v_factor1,
get_factor_2(v_rec) As v_factor2,
get_factor_3(v_rec) As v_factor3,
CASE WHEN (get_factor_1(v_rec) +
get_factor_2(v_rec) +
get_factor_3(v_rec)) > 10
THEN 'Y' ELSE 'N' END As v_isMagical
FROM Persons;";
$STH = $dbh->query($sql);
$STH->setFetchMode(PDO::FETCH_ASSOC);
}
catch(PDOException $e) {
echo $e->getMessage();
exit;
}
/* create the root element of the xml tree */
$xmlRoot = $domtree->createElement("Directory");
$xmlRoot = $domtree->appendChild($xmlRoot);
/* loop query results through child elements */
while($row = $STH->fetch()) {
$personNode = $xmlRoot->appendChild($domtree->createElement('Person'));
$nameNode = $personNode->appendChild($domtree->createElement('Name', $row['name']));
$addNode = $personNode->appendChild($domtree->createElement('Address', $row['address']));
$magadd1Node = $personNode->appendChild($domtree->createElement('MagicalAddressFactor1', $row['v_factor1']));
$magadd2Node = $personNode->appendChild($domtree->createElement('MagicalAddressFactor2', $row['v_factor2']));
$magadd3Node = $personNode->appendChild($domtree->createElement('MagicalAddressFactor3', $row['v_factor3']));
$ismagicalNode = $personNode->appendChild($domtree->createElement('IsMagicalAddress', $row['v_isMagical']));
}
file_put_contents($cd. "/PostgreXML_php.xml", $domtree->saveXML());
echo "\nSuccessfully migrated SQL data into XML!\n";
# Closing db connection
$dbh = null;
exit;
?>
R (using the RPostgreSQL package)
library(RPostgreSQL)
library(XML)
#setwd("C:/path/to/working/folder")
# OPEN DATABASE AND QUERY
drv <- dbDriver("PostgreSQL")
conn <- dbConnect(drv, dbname="tempdb")
df <- sqlQuery(conn, "SELECT name, address,
get_factor_1(v_rec) As v_factor1,
get_factor_2(v_rec) As v_factor2,
get_factor_3(v_rec) As v_factor3,
CASE WHEN (get_factor_1(v_rec) +
get_factor_2(v_rec) +
get_factor_3(v_rec)) > 10
THEN 'Y' ELSE 'N' END As v_isMagical
FROM Persons;")
close(conn)
# CREATE XML FILE
doc = newXMLDoc()
root = newXMLNode("Directory", doc = doc)
# WRITE XML NODES AND DATA
for (i in 1:nrow(df)){
personNode = newXMLNode("Person", parent = root)
nameNode = newXMLNode("name", df$name[i], parent = personNode)
addressNode = newXMLNode("address", df$address[i], parent = personNode)
magicaladdress1Node = newXMLNode("MagicalAddressFactor1", df$v_factor1[i], parent = personNode)
magicaladdress2Node = newXMLNode("MagicalAddressFactor2", df$v_factor2[i], parent = personNode)
magicaladdress3Node = newXMLNode("MagicalAddressFactor3", df$v_factor3[i], parent = personNode)
ismagicalNode = newXMLNode("IsMagicalAddress", df$v_isMagical[i], parent = personNode)
}
# OUTPUT XML CONTENT TO FILE
saveXML(doc, file="PostgreXML_R.xml")
print("Successfully migrated SQL to XML data!")

Your code has three issues:
FOR IN variable LOOP doesn't work - if you really needs dynamic SQL then you have to use form FOR IN EXECUTE variable, but better is directly write the SQL query,
But, it cannot by fast, if persons are more than few
iteration over expensive cycle body is slow,
string concatenation is expensive
The output XML can be wrong, because you are missing escaping.
Last two points are solved pretty well by SQL/XML functions - I'll write only simple example - but it is really pretty strong ANSI/SQL feature (supported by Postgres).
SELECT xmlelement(NAME "Directory",
xmlagg(xmlelement(NAME "Person",
xmlforest(name AS "Name",
address AS "Address"))))
FROM persons;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Build an XML with XMLELEMENT - ORACLE SQL 11g query - sql

Related

How can I automatically infer schemas of CSV files on S3 as I load them?

ATG - How to override RQL for insert statements

Converting StructType to Avro Schema, returns type as Union when using databricks spark-avro

Multiple SQL statements using Groovy

How to Build an XML Document Incrementally in PL/pgSQL

Categories

Resources