ssis import xml attributes as elements - sql

I have the following XML (this is just a sample), received from a third party, that we need to import into SQL Server; we have no influence over its structure. Each of these files has multiple top-level nodes (excuse me if the terminology is incorrect; I mean elements like "CardAuthorisation"). Others are CardFee, Financial, etc.
The problem is that the details are held in attributes. This file comes from a new vendor. We already receive an XML file from another vendor that is much easier to import, because its data is in elements rather than attributes.
Here is a sample:
<CardAuthorisation>
<RecType>ADV</RecType>
<AuthId>32397275</AuthId>
<AuthTxnID>11606448</AuthTxnID>
<LocalDate>20140612181918</LocalDate>
<SettlementDate>20140612</SettlementDate>
<Card PAN="2009856214560271" product="MCRD" programid="DUMMY1" branchcode=""></Card>
<Account no="985621456" type="00"></Account>
<TxnCode direction="debit" Type="atm" Group="fee" ProcCode="30" Partial="NA" FeeWaivedOff="0"></TxnCode>
<TxnAmt value="0.0000" currency="826"></TxnAmt>
<CashbackAmt value="0.00" currency="826"></CashbackAmt>
<BillAmt value="0.00" currency="826" rate="1.00"></BillAmt>
<ApprCode>476274</ApprCode>
<Trace auditno="305330" origauditno="305330" Retrefno="061200002435"></Trace>
<MerchCode>BOIA </MerchCode>
<Term code="S1A90971" location="PO NORFOLK STR 3372308 CAMBRIDGESHI3 GBR" street="" city="" country="GB" inputcapability="5" authcapability="7"></Term>
<Schema>MCRD</Schema>
<Txn cardholderpresent="0" cardpresent="yes" cardinputmethod="5" cardauthmethod="1" cardauthentity="1"></Txn>
<MsgSource value="74" domesticMaestro="yes"></MsgSource>
<PaddingAmt value="0.00" currency="826"></PaddingAmt>
<Rate_Fee value="0.00"></Rate_Fee>
<Fixed_Fee value="0.20"></Fixed_Fee>
<CommissionAmt value="0.20" currency="826"></CommissionAmt>
<Classification RCC="" MCC="6011"></Classification>
<Response approved="YES" actioncode="0" responsecode="00" additionaldesc=" PO NORFOLK STR 3372308 CAMBRIDGESHI3 GBR"></Response>
<OrigTxnAmt value="0.00" currency="826"></OrigTxnAmt>
<ReversalReason></ReversalReason>
</CardAuthorisation>
We need to import this into several tables (one for each top-level element type).
So, for example, CardAuthorisation should be imported into the "Authorisation" table, CardFinancial into the "Financial" table, etc.
So the question is: what is the best method to import this data?
Having read a bit, I understand XSLT can be used for this and could turn the above into:
<CardAuthorisation>
<RecType>ADV</RecType>
<AuthId>32397275</AuthId>
<AuthTxnID>11606448</AuthTxnID>
<LocalDate>20140612181918</LocalDate>
<SettlementDate>20140612</SettlementDate>
<PAN>2009856214560271</PAN>
<product>MCRD</product>
<programid>DUMMY1</programid>
<branchcode></branchcode>
<Accountno>985621456</Accountno>
<type>00</type>
<TxnCodedirection>debit</TxnCodedirection>
<TxnCodeType>atm</TxnCodeType>
<TxnCodeGroup>fee</TxnCodeGroup>
<TxnCodeProcCode>30</TxnCodeProcCode>
<TxnCodePartial>NA</TxnCodePartial>
<TxnCodeFeeWaivedOff>0</TxnCodeFeeWaivedOff>
<TxnAmtvalue>0.0000</TxnAmtvalue>
<TxnAmtcurrency>826</TxnAmtcurrency>
<CashbackAmtvalue>0.00</CashbackAmtvalue>
<CashbackAmtcurrency>826</CashbackAmtcurrency>
<BillAmtvalue>0.00</BillAmtvalue>
<BillAmtcurrency>826</BillAmtcurrency>
<BillAmtrate>1.00</BillAmtrate>
<ApprCode>476274</ApprCode>
etc etc
</CardAuthorisation>
But the information I read was quite old (4-5 years), and since SSIS keeps improving, I'm not sure whether it is still valid advice today.
Thanks in advance for your thoughts.
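If doing the transformation outside SSIS is an option, the flattening that the XSLT would perform can be sketched in Python with the standard library's xml.etree.ElementTree. This is a minimal sketch under assumptions: the real files wrap all records in a single root element (a well-formed XML file must have one; the name <Records> used here is made up), and the TABLE_FOR_RECORD mapping only contains the two pairs named in the question. Attribute columns are named element name + attribute name (e.g. TxnCodedirection), as in the desired output above, but the prefix is applied to every element, so Card/@PAN becomes CardPAN; prefixing avoids clashes between the many value and currency attributes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Top-level element -> target table, as described in the question.
# Extend this for other record types (CardFee, ...).
TABLE_FOR_RECORD = {
    "CardAuthorisation": "Authorisation",
    "CardFinancial": "Financial",
}


def flatten_record(record: ET.Element) -> Dict[str, str]:
    """Flatten one top-level record into a {column: value} dict.

    An element with text (or with no attributes at all) becomes a column of
    the same name; every attribute becomes a column named <Element><attribute>.
    """
    row: Dict[str, str] = {}
    for child in record:
        if child.text is not None or not child.attrib:
            row[child.tag] = child.text or ""
        for attr, value in child.attrib.items():
            row[child.tag + attr] = value
    return row


def rows_by_table(xml_text: str) -> Dict[str, List[Dict[str, str]]]:
    """Group the flattened records of one file by their target table."""
    root = ET.fromstring(xml_text)  # assumes one root element wrapping all records
    tables: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for record in root:
        table = TABLE_FOR_RECORD.get(record.tag)
        if table is not None:  # unknown record types are silently skipped
            tables[table].append(flatten_record(record))
    return dict(tables)


sample = """<Records>
<CardAuthorisation>
<RecType>ADV</RecType>
<Card PAN="2009856214560271" product="MCRD" programid="DUMMY1" branchcode=""></Card>
<TxnCode direction="debit" Type="atm" Group="fee"></TxnCode>
<TxnAmt value="0.0000" currency="826"></TxnAmt>
<ReversalReason></ReversalReason>
</CardAuthorisation>
</Records>"""

print(rows_by_table(sample)["Authorisation"][0])
```

Each list in the returned dict could then be bulk-inserted into the matching table with a parameterized INSERT. Skipping unknown record types keeps the sketch short; in a real load you would probably want to log them instead.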

Related

Feature and FeatureView versioning

My team is interested in a feature store solution that enables rapid experimentation with features, probably using feature versioning. In the Feast Slack history, I found
Benjamin Tan's post that explains their Feast workflow, including how they version FeatureViews:
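from feast import Feature, FeatureView, ValueType

# Other required FeatureView arguments (name, entities, source, ...) are omitted here.
insights_v1 = FeatureView(
    features=[
        Feature(name="insight_type", dtype=ValueType.STRING),
    ],
)
insights_v2 = FeatureView(
    features=[
        Feature(name="customer_id", dtype=ValueType.STRING),
        Feature(name="insight_type", dtype=ValueType.STRING),
    ],
)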
Is this the recommended best practice for FeatureView versioning? It looks like Features do not have a version field. Is there a recommended strategy for Feature versioning?
Creating a new column for each Feature version is one approach:
driver_rating_v1
driver_rating_v2
But that could get unwieldy if we want to experiment with dozens of permutations of the same Feature.
Featureform appears to have support for feature versions through the "variant" field, but their documentation is a bit unclear.
To add some clarity on Featureform: a variant is analogous to a version. You supply a string, which then becomes an immutable identifier for that version of the transformation, source, etc. Variant is one of the common metadata fields provided in the Featureform API.
Using an e-commerce dataset and Spark as an example, here's how you can use the variant field to version a source (a Parquet file in this case):
orders = spark.register_parquet_file(
name="orders",
variant="default",
description="This is the core dataset. From each order you might find all other information.",
file_path="path_to_file",
)
You can set the variant variable ahead of time:
VERSION="v1"  # Change this to rerun the definitions with new variants
orders = spark.register_parquet_file(
name="orders",
variant=f"{VERSION}",
description="This is the core dataset. From each order you might find all other information.",
file_path="path_to_file",
)
And you can create versions or variants of the transformations -- here I'm taking a dataframe called total_paid_per_customer_per_day and aggregating it.
# Get average order value per day
@spark.df_transformation(inputs=[("total_paid_per_customer_per_day", "default")], variant="skeller88_20220110")
def average_daily_transaction(df):
    from pyspark.sql.functions import mean
    return df.groupBy("day_date").agg(mean("total_customer_order_paid").alias("average_order_value"))
There are some more details on the Featureform CLI here: https://docs.featureform.com/getting-started/interact-with-the-cli
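The idea of a variant string as an immutable identifier can be illustrated in plain Python. This is a hypothetical in-memory registry (VariantRegistry, FeatureDefinition and the two driver_rating transformations are made up for illustration), not the Featureform or Feast API; it only shows how write-once (name, variant) keys let you keep many experimental versions of one feature without inventing driver_rating_v1, driver_rating_v2, ... column names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FeatureDefinition:
    """One immutable (name, variant) definition of a feature."""
    name: str
    variant: str
    transformation: Callable[[float], float]


class VariantRegistry:
    """Hypothetical registry: each (name, variant) pair can be registered once."""

    def __init__(self) -> None:
        self._definitions: Dict[Tuple[str, str], FeatureDefinition] = {}

    def register(self, name: str, variant: str,
                 transformation: Callable[[float], float]) -> FeatureDefinition:
        key = (name, variant)
        if key in self._definitions:
            # Variants are immutable: changing a definition means a new variant.
            raise ValueError(f"{name}:{variant} already exists; use a new variant")
        definition = FeatureDefinition(name, variant, transformation)
        self._definitions[key] = definition
        return definition

    def get(self, name: str, variant: str) -> FeatureDefinition:
        return self._definitions[(name, variant)]

    def variants(self, name: str) -> List[str]:
        return sorted(v for (n, v) in self._definitions if n == name)


registry = VariantRegistry()
# Two experimental variants of the same feature under one feature name.
registry.register("driver_rating", "v1", lambda raw: raw)
registry.register("driver_rating", "v2", lambda raw: min(raw, 5.0) / 5.0)
print(registry.variants("driver_rating"))
```

Consumers then ask for a feature by name plus variant, so dozens of permutations stay addressable without multiplying column names.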

Is there any way to select which values SUMO writes to the output file?

I would like to generate the full-output file in SUMO, but according to https://sumo.dlr.de/docs/Simulation/Output/FullOutput.html the full-output file looks like this:
<full-export>
<data timestep="<TIME_STEP>">
<vehicles>
<vehicle id="<VEHICLE_ID>" eclass="<VEHICLE_ECLASS>" co2="<VEHICLE_CO2>" co="<VEHICLE_CO>" hc="<VEHICLE_HC>"
nox="<VEHICLE_NOX>" pmx="<VEHICLE_PMX>" fuel="<VEHICLE_FUEL>" electricity="<VEHICLE_ELECTRICITY>" noise="<VEHICLE_NOISE>" route="<VEHICLE_ROUTE>" type="<VEHICLE_TYPE>"
waiting="<VEHICLE_WAITING>" lane="<VEHICLE_LANE>" pos_lane="<VEHICLE_POS_LANE>" speed="<VEHICLE_SPEED>"
angle="<VEHICLE_ANGLE>" x="<VEHICLE_POS_X>" y="<VEHICLE_POS_Y>"/>
... more vehicles ...
</vehicles>
<edges>
<edge id="<EDGE_ID>" traveltime="<EDGE_TRAVELTIME>">
<lane id="<LANE_ID>" co="<LANE_CO>" co2="<LANE_CO2>" nox="<LANE_NOX>" pmx="<LANE_CO>"
hc="<LANE_HC>" noise="<LANE_NOISE>" fuel="<LANE_FUEL>" electricity="<LANE_ELECTRICITY>" maxspeed="<LANE_MAXSPEED>" meanspeed="<LANE_MEANSPEED>"
occupancy="<LANE_OCCUPANCY>" vehicle_count="<LANE_VEHICLES_COUNT>"/>
... more lanes of the edge if exists
</edge>
... more edges of the network
</edges>
<tls>
<trafficlight id="0/0" state="GgGr"/>
... more traffic lights
</tls>
</data>
... the next timestep ...
</full-export>
The resulting .xml file is too big (usually more than 1 GB) because it contains many values that I don't need, such as
eclass="<VEHICLE_ECLASS>" co2="<VEHICLE_CO2>" co="<VEHICLE_CO>" hc="<VEHICLE_HC>"
nox="<VEHICLE_NOX>" pmx="<VEHICLE_PMX>" fuel="<VEHICLE_FUEL>"
Is there any way to select only the values I need in the output?
You have different options here:
Use a different output. Maybe fcd-output is already enough. It contains all the vehicle positions and you can usually aggregate it yourself to edges if you want to. Furthermore fcd-output can also be given a list of attributes to write using --fcd-output.attributes (full-output does not have this feature).
Filter the output directly. Instead of giving an output file you can give a socket connection and sumo will direct the output there. See https://github.com/eclipse/sumo/blob/main/tests/complex/sumo/socketout/runner.py for an example.
If you are on Linux, use a named pipe (see "Example of using named pipes in Linux shell (Bash)") and filter the output yourself.
Redirect the output to the xml2csv.py script, which at least removes the XML overhead; depending on your setup, it may also be easier to remove columns from a CSV file.
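For the "filter it yourself" options, a minimal Python sketch could stream the full-output file (or a named pipe, since it is read strictly sequentially) with xml.etree.ElementTree.iterparse and keep only selected vehicle attributes. The KEEP tuple, the semicolon-separated output layout and the function name are assumptions for illustration; this is not part of SUMO's own tools.

```python
import xml.etree.ElementTree as ET

# Vehicle attributes to keep (an assumption; adjust to what you need).
KEEP = ("id", "lane", "pos_lane", "speed", "x", "y")


def filter_full_output(source, out, keep=KEEP):
    """Stream SUMO full-output and write 'timestep;attr1;attr2;...' per vehicle.

    `source` may be a file name, a named pipe path, or a binary file object;
    `out` is any text stream with a write() method.
    """
    timestep = ""
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "data":
            timestep = elem.get("timestep", "")
        elif event == "end" and elem.tag == "vehicle":
            values = [elem.get(name, "") for name in keep]
            out.write(";".join([timestep] + values) + "\n")
        elif event == "end" and elem.tag == "data":
            elem.clear()  # drop the finished timestep so memory stays bounded
```

For example, `filter_full_output("full.xml", open("vehicles.csv", "w"))` would produce one line per vehicle and timestep. Because only one timestep is held in memory at a time, the 1 GB file never has to be loaded completely.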

Issue regarding the Attribute Names

Xml Document
I am having a problem with the XML attribute names coming from SharePoint: the XML document contains attribute names such as ows_Description0 and ows_Long_x0020_Desc.
<z:row ows_LinkFilename="Aerospace Energy.jpg"
ows_Title="Aerospace"
ows_ContentType="Image"
ows__ModerationStatus="0"
ows_PreviewOnForm="Aerospace Energy.jpg"
ows_ThumbnailOnForm="Technology Experience/Aerospace Energy.jpg"
ows_Modified="2011-12-07 12:02:34"
ows_Editor="1073741823;#System Account"
ows_Description0="Honeywell's SmartPath® Ground-Based Augmentation System (GBAS), which offers airports improved efficiency and capacity, greater navigational accuracy, and fewer weather-related delays."
ows_ID="28"
ows_Created="2011-12-02 11:26:01"
ows_Author="1073741823;#System Account"
ows_FileSizeDisplay="6091"
ows_Mode="Energy"
ows_Solution="Business"
ows_Long_x0020_Desc="Honeywell's SmartTraffic™ and IntuVue® 3-D Weather Radar technologies make the skies safer and enable pilots to more efficiently route flights. SmartTraffic ."
ows_Brief_x0020_Desc="Honeywell's Required Navigation Performance (RNP) capabilities enable aircraft to fly more precise approaches through tight corridors and congested airports, leading to fewer delays."
ows_Tags="True"
ows__Level="1"
ows_UniqueId="28;#{928FDA3E-94FA-47A5-A9AD-B5D98C12C18C}"
ows_FSObjType="28;#0"
ows_Created_x0020_Date="28;#2011-12-02 11:26:01"
ows_ProgId="28;#"
ows_FileRef="28;#Technology Experience/Aerospace Energy.jpg"
ows_DocIcon="jpg"
ows_MetaInfo="28;#Solution:SW|Business vti_thumbnailexists:BW|true vti_parserversion:SR|14.0.0.4762 Category:SW|Enter Choice #1 Description0:LW|Honeywell's SmartPath® Ground-Based Augmentation System (GBAS), which offers airports improved efficiency and capacity, greater navigational accuracy, and fewer weather-related delays. vti_stickycachedpluggableparserprops:VX|wic_XResolution Subject vti_lastheight vti_title vti_lastwidth wic_YResolution oisimg_imageparsedversion vti_lastwidth:IW|294 vti_author:SR|SHAREPOINT\\system vti_previewexists:BW|true vti_modifiedby:SR|SHAREPOINT\\system Long Desc:LW|Honeywell's SmartTraffic™ and IntuVue® 3-D Weather Radar technologies make the skies safer and enable pilots to more efficiently route flights. SmartTraffic . Keywords:LW| vti_foldersubfolderitemcount:IR|0 vti_lastheight:IW|172 ContentTypeId:SW|0x0101009148F5A04DDD49CBA7127AADA5FB792B00AADE34325A8B49CDA8BB4DB53328F21400623D4FCEEB2ADC4EA8269BF873F0BB6F _Author:SW| vti_title:SW|Aerospace wic_System_Copyright:SW| Mode:SW|Energy Tags:SW|True wic_YResolution:DW|96.0000000000000 oisimg_imageparsedversion:IW|4 Brief Desc:LW|Honeywell's Required Navigation Performance (RNP) capabilities enable aircraft to fly more precise approaches through tight corridors and congested airports, leading to fewer delays. _Comments:LW| wic_XResolution:DW|96.0000000000000 Subject:SW|Aerospace vti_folderitemcount:IR|0"
ows_Last_x0020_Modified="28;#2011-12-07 12:02:34"
ows_owshiddenversion="6"
ows_FileLeafRef="28;#Aerospace Energy.jpg"
ows_PermMask="0x7fffffffffffffff"
xmlns:z="#RowsetSchema" />
Could you please suggest a solution for this?
When SharePoint returns data as XML, it always uses this format:
Field names are prefixed with ows_
Internal field names are used, not display names.
Internal field names in SharePoint contain Unicode escape sequences for special characters,
e.g. if you create a field named 'Field Name' from the SharePoint UI,
SharePoint will create the internal name 'Field_x0020_Name',
where 0020 is the hexadecimal Unicode code point of a space.
If fields are created through code or a feature, however, you can specify your own internal and display names.
So if you are parsing such XML, your code has to take these rules into account.
SharePoint does not add the x0020 escape sequence to a field's internal name unless the display name contained a space when the field was created from the UI.
Also, once the field has been created, changing its display name has no effect on its internal name.
So if you create a field 'Long Desc' from the UI and later rename it to 'LongDesc', the internal name will still be Long_x0020_Desc.
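Following those rules, a minimal Python sketch can strip the ows_ prefix and decode the _xHHHH_ escapes (the same notation .NET's XmlConvert.EncodeName produces). The function names are made up for illustration. Keep in mind that the decoded name is the name the field had when it was created, which may differ from its current display name.

```python
import re
import xml.etree.ElementTree as ET

# Matches escape sequences such as _x0020_ (four hex digits).
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def field_name(attribute: str) -> str:
    """Turn an ows_ attribute name into the field's (original) readable name."""
    name = attribute[len("ows_"):] if attribute.startswith("ows_") else attribute
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


def row_fields(row_xml: str) -> dict:
    """Parse one <z:row .../> element into {decoded field name: value}."""
    row = ET.fromstring(row_xml)  # xmlns declarations are not part of .attrib
    return {field_name(key): value for key, value in row.attrib.items()}


print(field_name("ows_Long_x0020_Desc"))
```

The mapping from decoded names to your own column names can then be kept in one place, instead of hard-coding escaped names throughout the parsing code.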

Creating, Visualizing and Querying simple Data Structures

Simple and common tree like data structures
Data Structure example
Animated cartoons have 4 extremities (arms, legs, limbs...)
Humans have 4 extremities.
Insects have 6 extremities.
Arachnids have 6 extremities.
Animated cartoons have 4 fingers per extremity.
Humans have 5 fingers per extremity.
Insects have 1 finger per extremity.
Arachnids have 1 finger per extremity.
Some Kind of Implementation
Level/Table0
Quantity, Item
Level/Table1
ItemName, Kingdom
Level/Table2
Kingdom, NumberOfExtremities
Level/Table3
ExtremityName, NumberOfFingers
Example Dataset
1 Homer Simpson, 1 Ralph Wiggum, 2 Jon Skeet, 3 Atomic ant, 2 Shelob (spider)
Querying... "Number of fingers"
Number = 1*4*4 + 1*4*4 + 2*4*5 + 3*6*1 + 2*6*1 = 102 fingers (taking Jon to be a Human)
I wonder whether there is any tool that lets me define this in a parseable form, automatically creates the inherited data, and draws this kind of tree (ideally also allowing this kind of data access, if possible).
It could be drawn manually with, for example, FreeMind, but AFAIK it doesn't let you define data types or structures that automatically create inherited branches of items, so it's really annoying to have to repeat a structure over and over by copying (with the risk of mistakes). Repeated work over repeated data (a human running repeated code) is error-prone.
So I would like to write the data in a suitable language that lets me reuse it
for queries and visualization. If all the data is in XML, Java classes, a database file, etc., is there a tool for viewing the tree and running queries?
P.S. Creating nested folders in a file system and browsing them in Norton Commander's tree view is not an option, I hope (simply because it would have to be built manually).
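The "define once, inherit everywhere" idea can be sketched by keeping the per-kind definitions in one table and letting each dataset entry refer to its kind, so no numbers are ever copied. This is plain Python with made-up names (Kind, KINDS, DATASET, total_fingers), and it collapses Level/Table2 and Level/Table3 into a single fingers-per-extremity number per kind, which is an assumption. Counting every instance in the example dataset, including both Jon Skeets (Humans, 4 extremities with 5 fingers each), the finger query comes to 102.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kind:
    name: str
    extremities: int
    fingers_per_extremity: int


# Defined once; every item of a kind inherits these numbers.
KINDS = {
    "Animated Cartoons": Kind("Animated Cartoons", 4, 4),
    "Human": Kind("Human", 4, 5),
    "Insects": Kind("Insects", 6, 1),
    "Arachnids": Kind("Arachnids", 6, 1),
}

# (quantity, item name, kind)
DATASET = [
    (1, "Homer Simpson", "Animated Cartoons"),
    (1, "Ralph Wiggum", "Animated Cartoons"),
    (2, "Jon Skeet", "Human"),
    (3, "Atomic ant", "Insects"),
    (2, "Shelob", "Arachnids"),
]


def total_fingers(dataset, kinds):
    """Sum quantity * extremities * fingers per extremity over all items."""
    return sum(
        qty * kinds[kind].extremities * kinds[kind].fingers_per_extremity
        for qty, _name, kind in dataset
    )


print(total_fingers(DATASET, KINDS))  # 102
```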
Your answer is mostly going to depend on what programming skills you already have and what skills you are willing to acquire. I can tell you what I would do with what I know.
I think for drawing trees you want a LaTeX package like qtree. If you don't like this one, there are a bunch of others out there. You'd have to write a script in whatever your favorite scripting language is to parse your input into the LaTeX code to generate the trees, but this could easily be done with less than 100 lines in most languages, if I properly understand your intentions. I would definitely recommend storing your data in an XML format using a library like Ruby's REXML, or whatever your favorite scripting language has.
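Such a script could start from something like this minimal Python sketch, which turns a nested (label, children) tuple into qtree's bracket notation. The tree shape and function name are assumptions. Labels are wrapped in braces so that multi-word names stay a single node, and the sketch assumes labels contain no LaTeX special characters such as #, $, % or _. The resulting string would go into a LaTeX document that loads the qtree package.

```python
def to_qtree(label, children=()):
    """Render a (label, children) tree in qtree bracket notation.

    Leaves become {label}; inner nodes become [.{label} child child ... ].
    """
    if not children:
        return "{" + label + "}"
    inner = " ".join(to_qtree(*child) for child in children)
    return "[.{" + label + "} " + inner + " ]"


tree = ("Dataset", [
    ("Animated Cartoons", [("Homer Simpson", []), ("Ralph Wiggum", [])]),
    ("Human", [("Jon Skeet", [])]),
])
print("\\Tree " + to_qtree(*tree))
```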
If you are looking to generate more interactive trees, check out the Adobe Flex Framework. Again, if you don't like this specific framework, there are bunches of others out there (I recommend the blog FlowingData).
Hope this helps and I didn't miserably misunderstand your question.
The data structure you are describing fits naturally into an XML format. Take a look at the eXist XML database; in my opinion it is the most complete XML database. It comes with many tools to get you started quickly, such as the XQuery Sandbox in the admin HTTP interface.
Example Dataset
1 Homer Simpson, 1 Ralph Wiggum, 2 jon skeet, 3 Atomic ant, 2 Shelob (spider)
I am assuming that there are 2 instances of jon skeet, 3 instances of Atomic ant and 2 instances of Shelob
Here is an XQuery example:
let $doc :=
  <root>
    <definition>
      <AnimatedCartoons>
        <extremities>4</extremities>
        <fingers_per_ext>4</fingers_per_ext>
      </AnimatedCartoons>
      <Human>
        <extremities>4</extremities>
        <fingers_per_ext>5</fingers_per_ext>
      </Human>
      <Insects>
        <extremities>6</extremities>
        <fingers_per_ext>1</fingers_per_ext>
      </Insects>
      <Arachnids>
        <extremities>6</extremities>
        <fingers_per_ext>1</fingers_per_ext>
      </Arachnids>
    </definition>
    <subject><name>Homer Simpson</name><kind>AnimatedCartoons</kind></subject>
    <subject><name>Ralph Wiggum</name><kind>AnimatedCartoons</kind></subject>
    <subject><name>jon skeet</name><kind>Human</kind></subject>
    <subject><name>jon skeet</name><kind>Human</kind></subject>
    <subject><name>Atomic ant</name><kind>Insects</kind></subject>
    <subject><name>Atomic ant</name><kind>Insects</kind></subject>
    <subject><name>Atomic ant</name><kind>Insects</kind></subject>
    <subject><name>Shelob</name><kind>Arachnids</kind></subject>
    <subject><name>Shelob</name><kind>Arachnids</kind></subject>
  </root>
let $definitions := $doc/definition/*
let $subjects := $doc/subject
(: here goes some query logic :)
let $fingers := fn:sum(
  for $subject in $subjects
  return (
    for $x in $definitions
    where fn:name($x) = $subject/kind
    return $x/extremities * $x/fingers_per_ext
  )
)
return $fingers
An XML schema editor with visualization is perhaps what I am looking for:
http://en.wikipedia.org/wiki/XML_Schema_Editor
Checking it out...