Is there any way to select which values SUMO writes to the output file?

I would like to generate the FullOutput file in SUMO. According to https://sumo.dlr.de/docs/Simulation/Output/FullOutput.html, the FullOutput file looks like this:
<full-export>
<data timestep="<TIME_STEP>">
<vehicles>
<vehicle id="<VEHICLE_ID>" eclass="<VEHICLE_ECLASS>" co2="<VEHICLE_CO2>" co="<VEHICLE_CO>" hc="<VEHICLE_HC>"
nox="<VEHICLE_NOX>" pmx="<VEHICLE_PMX>" fuel="<VEHICLE_FUEL>" electricity="<VEHICLE_ELECTRICITY>" noise="<VEHICLE_NOISE>" route="<VEHICLE_ROUTE>" type="<VEHICLE_TYPE>"
waiting="<VEHICLE_WAITING>" lane="<VEHICLE_LANE>" pos_lane="<VEHICLE_POS_LANE>" speed="<VEHICLE_SPEED>"
angle="<VEHICLE_ANGLE>" x="<VEHICLE_POS_X>" y="<VEHICLE_POS_Y>"/>
... more vehicles ...
</vehicles>
<edges>
<edge id="<EDGE_ID>" traveltime="<EDGE_TRAVELTIME>">
<lane id="<LANE_ID>" co="<LANE_CO>" co2="<LANE_CO2>" nox="<LANE_NOX>" pmx="<LANE_CO>"
hc="<LANE_HC>" noise="<LANE_NOISE>" fuel="<LANE_FUEL>" electricity="<LANE_ELECTRICITY>" maxspeed="<LANE_MAXSPEED>" meanspeed="<LANE_MEANSPEED>"
occupancy="<LANE_OCCUPANCY>" vehicle_count="<LANE_VEHICLES_COUNT>"/>
... more lanes of the edge if exists
</edge>
... more edges of the network
</edges>
<tls>
<trafficlight id="0/0" state="GgGr"/>
... more traffic lights
</tls>
</data>
... the next timestep ...
</full-export>
The resulting .xml file is too big, usually more than 1 GB, and it contains a lot of values, such as
eclass="<VEHICLE_ECLASS>" co2="<VEHICLE_CO2>" co="<VEHICLE_CO>" hc="<VEHICLE_HC>"
nox="<VEHICLE_NOX>" pmx="<VEHICLE_PMX>" fuel="<VEHICLE_FUEL>"
which I don't need.
So I wonder: is there any way to write only the values I need to the output?

You have different options here:
Use a different output. Maybe fcd-output is already enough. It contains all the vehicle positions and you can usually aggregate it yourself to edges if you want to. Furthermore fcd-output can also be given a list of attributes to write using --fcd-output.attributes (full-output does not have this feature).
Filter the output directly. Instead of giving an output file you can give a socket connection and sumo will direct the output there. See https://github.com/eclipse/sumo/blob/main/tests/complex/sumo/socketout/runner.py for an example.
If you are on Linux, use a named pipe (see "Example of using named pipes in Linux shell (Bash)") and filter the output yourself.
Redirect the output to the xml2csv.py script, which removes at least the XML overhead; depending on your setup, it may be easier to remove columns from a CSV file.
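For the "filter yourself" options (socket or named pipe), the filter can be a small script that reads the XML line by line and drops the unwanted attributes. The following is a minimal sketch, not part of SUMO: the function names are made up for illustration, the attribute list is taken from the question, and it assumes each element arrives on a single line. It works on any file-like object, so the source could be an opened named pipe.

```python
import io
import re

# Attributes the question says are not needed.
UNWANTED = ("eclass", "co2", "co", "hc", "nox", "pmx", "fuel")

def make_attribute_filter(names):
    """Return a function that deletes the named XML attributes from one line."""
    # Requiring whitespace before the name and =" after it keeps "co" from
    # matching inside "co2" or "vehicle_count".
    pattern = re.compile(r'\s+(?:%s)="[^"]*"' % "|".join(map(re.escape, names)))
    return lambda line: pattern.sub("", line)

def filter_stream(source, sink, names=UNWANTED):
    """Copy source to sink line by line, dropping the unwanted attributes."""
    strip = make_attribute_filter(names)
    for line in source:
        sink.write(strip(line))

# Hypothetical input line, used only to demonstrate the filter.
sample = io.StringIO('<vehicle id="v0" eclass="E" co2="1.0" co="0.1" speed="3.5"/>\n')
out = io.StringIO()
filter_stream(sample, out)
print(out.getvalue(), end="")  # <vehicle id="v0" speed="3.5"/>
```

Note that the same attribute names also occur on the lane elements, so this sketch removes them there as well.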

Open Refine: Exporting nested XML with templating

I have a question regarding the templating option for XML in Open Refine. Is it possible to export data from two columns in a nested XML-structure, if both columns contain multiple values, that need to be split first?
Here's an example to illustrate better what I mean. My columns look like this:
Column1 | Column2
https://d-nb.info/gnd/119119110;https://d-nb.info/gnd/118529889 | Grützner, Eduard von;Elisabeth II., Großbritannien, Königin
https://d-nb.info/gnd/1037554086;https://d-nb.info/gnd/1245873660 | Müller, Jakob;Meier, Anina
Each value separated by semicolon in Column1 has a corresponding value in Column2 in the right order and my desired output would look like this:
<rootElement>
<recordRootElement>
...
<edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
<skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
<skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
</edm:Agent>
...
</recordRootElement>
<recordRootElement>
...
<edm:Agent rdf:about="https://d-nb.info/gnd/1037554086">
<skos:prefLabel xml:lang="zxx">Müller, Jakob</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/1245873660">
<skos:prefLabel xml:lang="zxx">Meier, Anina</skos:prefLabel>
</edm:Agent>
...
</recordRootElement>
</rootElement>
(note: in my initial posting, the position of the root element was not indicated and it looked like this:
<edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
<skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
<skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
</edm:Agent>
)
I managed to split the values separated by ";" for both columns like this
{{forEach(cells["Column1"].value.split(";"),v,"<edm:Agent rdf:about=\""+v+"\">"+"\n"+"</edm:Agent>")}}
{{forEach(cells["Column2"].value.split(";"),v,"<skos:prefLabel xml:lang=\"zxx\">"+v+"</skos:prefLabel>")}}
but I can't figure out how to nest the split skos:prefLabel elements inside the edm:Agent elements. Is that even possible? If not, I would work with separate columns or another workaround, but I wanted to check whether there is a more direct way first.
Thank you!
Kristina
I am going to expand the answer from RolfBly using the Templating Exporter from OpenRefine.
I make the following assumptions:
There is some other column to the left of Column1 acting as the record-identifying column (see first screenshot).
The columns actually have proper names (URI and Name in the recipe below).
The columns URI and Name are the only columns with multiple values. Otherwise we might produce empty XML elements with the following recipe.
We will use the information about records available via GREL to determine whether to write a <recordRootElement> or not.
Recipe:
Split first Name and then URI on the separator ";" via "Edit cells" => "Split multi-valued cells".
Go to "Export" => "Templating..."
In the prefix field use the value
<?xml version="1.0" encoding="utf-8"?>
<rootElement>
Please note that I skipped the namespace imports for edm, skos, rdf and xml.
In the row template field use the value:
{{if(row.index - row.record.fromRowIndex == 0, '<recordRootElement>', '')}}
<edm:Agent rdf:about="{{escape(cells['URI'].value, 'xml')}}">
<skos:prefLabel xml:lang="zxx">{{escape(cells['Name'].value, 'xml')}}</skos:prefLabel>
</edm:Agent>
{{if(row.index - row.record.fromRowIndex == row.record.rowCount - 1, '</recordRootElement>', '')}}
The row separator field should just contain a linebreak.
In the suffix field use the value:
</rootElement>
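The recipe works because the n-th URI belongs to the n-th name, so after splitting, each row holds one matching pair. The same pairing can be sketched outside OpenRefine. The Python below is only an illustration of that logic, not something the Templating Exporter runs, and the function name is made up.

```python
from xml.sax.saxutils import escape, quoteattr

def agents_xml(uris, names, sep=";"):
    """Pair the n-th URI with the n-th name and nest the label in the agent."""
    uri_list = uris.split(sep)
    name_list = names.split(sep)
    if len(uri_list) != len(name_list):
        raise ValueError("both cells must hold the same number of values")
    return "\n".join(
        # quoteattr() escapes the value and adds the surrounding quotes.
        f"<edm:Agent rdf:about={quoteattr(uri)}>\n"
        f'<skos:prefLabel xml:lang="zxx">{escape(name)}</skos:prefLabel>\n'
        f"</edm:Agent>"
        for uri, name in zip(uri_list, name_list)
    )

print(agents_xml(
    "https://d-nb.info/gnd/1037554086;https://d-nb.info/gnd/1245873660",
    "Müller, Jakob;Meier, Anina",
))
```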
Disclaimer: If you're keen on using only OpenRefine, this won't be the answer you were hoping for. There may be ways in OR that I don't know of. That said, here's how I would do it.
Edit: The trick is to keep URL and literal side by side on one line. b2m's answer below does just that: go from right to left splitting, not from left to right. You can then skip steps 2 and 3, to get the result in the image.
1. Split each column into 2 columns by the separator ;. You'll get 4 columns: 1 and 3 belong together, and 2 and 4 belong together. I'm assuming this will be the case consistently in your data.
2. Export 1 and 3 to a file, and export 2 and 4 to another file, of any convenient format, using the custom tabular exporter.
3. Concatenate those two files into one single file using an editor (I use Notepad++), or any other method you may prefer. There are several roads to Rome here. The result in OR would be something like this.
You then have all sorts of options to put text strings in front, between and after your two columns.
In OR, you could use transform on column URL to build your XML using the below code
(note the \n for newline, that's probably just a line feed, you may want to use \r\n for carriage return + line feed if you're using Windows).
'<edm:Agent rdf:about="' + value + '">\n<skos:prefLabel xml:lang="zxx">' + cells.Name.value + '</skos:prefLabel>\n</edm:Agent>'
to get your XML in one column, like so
which you can then export using the custom tabular exporter again. Or instead you could use Add column based on this column in a similar manner, if you want to retain your URL column.
You could even do this in the editor without re-importing the file back into OR, but that's beyond the scope of this answer.

imported .owl files have #'s in prefixes vs original rdf4j triplestore

When I import the dump "PathwayCommons12.All.BIOPAX.owl.gz" (linked from this page) of this Virtuoso triplestore, I've noticed that there are "#"s inserted after the prefix of various URIs.
In particular, the following query runs on the original endpoint:
# Query 1
PREFIX pfx: <http://pathwaycommons.org/pc12/>
select ?pw
where {
?pw a bp:Pathway
values ?pw {pfx:Pathway_c2fd3d95c8c65552a0514393ede60c37}
}
But to get it running on the local endpoint (imported owl dump) I have to add a "#" to the end of pfx: like:
# Query 2
PREFIX pfx: <http://pathwaycommons.org/pc12/#>
select ?pw
where {
?pw a bp:Pathway
values ?pw {pfx:Pathway_c2fd3d95c8c65552a0514393ede60c37}
}
Note that Query 1 works only on the original endpoint, while Query 2 works only on the local endpoint.
What is going on here?
If we look at the first few lines of that massive RDF/XML file, we see:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#"
xml:base="http://pathwaycommons.org/pc12/">
<owl:Ontology rdf:about="">
<owl:imports rdf:resource="http://www.biopax.org/release/biopax-level3.owl#" />
</owl:Ontology>
<bp:ExperimentalForm rdf:ID="ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0">
<bp:comment rdf:datatype = "http://www.w3.org/2001/XMLSchema#string">[ExperimentalFormVocabulary_bait]</bp:comment>
<bp:experimentalFormDescription rdf:resource="#ExperimentalFormVocabulary_701737e5cf53d06134cbd3ee59611827" />
</bp:ExperimentalForm>
Note the value of the rdf:ID attribute here: "ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0". This is a relative URI, and needs to be resolved against the base URI (which is declared in the document header: "http://pathwaycommons.org/pc12/"). How this resolution is supposed to happen is described in section 2.14 of the RDF/XML syntax specification:
The rdf:ID attribute on a node element (not property element, that has another meaning) can be used instead of rdf:about and gives a relative IRI equivalent to # concatenated with the rdf:ID attribute value. So for example if rdf:ID="name", that would be equivalent to rdf:about="#name".
(emphasis mine)
Example 16 in the specification illustrates this further.
What it comes down to is that in parsing this RDF/XML, the values supplied as rdf:ID attributes all resolve to http://pathwaycommons.org/pc12/#<ID>. So the result you're getting in GraphDB is correct for the given input. Why it is different in the Virtuoso endpoint I don't know: either they used a different input file, or they have a bug in their parser, or whatever tool was used to produce this dump file contains a bug.
It is probably safe to say that the intent of whoever created the dump file was that
rdf:ID="ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0" would resolve to the IRI http://pathwaycommons.org/pc12/ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0 (that is, without the added # character). There are several ways to fix this in the file: either replace all occurrences of rdf:ID with rdf:about (and drop the leading # from the rdf:resource values that refer to them), or else don't rely on relative URI resolution and write the full URI in rdf:about.
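The difference between the two resolutions can be reproduced with ordinary relative-URI resolution. This is a small sketch using Python's standard library, not an RDF parser; it only shows what the base URI from the file yields with and without the # that rdf:ID implies.

```python
from urllib.parse import urljoin

BASE = "http://pathwaycommons.org/pc12/"  # xml:base from the file header
NODE_ID = "ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0"

# rdf:ID="X" is equivalent to rdf:about="#X", so the # ends up in the IRI.
as_rdf_id = urljoin(BASE, "#" + NODE_ID)

# rdf:about="X" resolves the bare name against the base, without a #.
as_rdf_about = urljoin(BASE, NODE_ID)

print(as_rdf_id)     # http://pathwaycommons.org/pc12/#ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0
print(as_rdf_about)  # http://pathwaycommons.org/pc12/ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0
```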

ssis import xml attributes as elements

I have the following XML (this is just a sample), received from a third party (we have no influence on its structure), that we need to import into SQL Server. Each of these files has multiple top-level nodes (excuse me if the terminology is incorrect; I mean elements such as "CardAuthorisation"). Others are CardFee, Financial, and so on.
The issue is that the detail is in attributes. This file is from a new vendor. There is an xml file currently being received from another vendor which is a lot easier to import as the data is in elements and not in attributes.
Here is a sample:
<CardAuthorisation>
<RecType>ADV</RecType>
<AuthId>32397275</AuthId>
<AuthTxnID>11606448</AuthTxnID>
<LocalDate>20140612181918</LocalDate>
<SettlementDate>20140612</SettlementDate>
<Card PAN="2009856214560271" product="MCRD" programid="DUMMY1" branchcode=""></Card>
<Account no="985621456" type="00"></Account>
<TxnCode direction="debit" Type="atm" Group="fee" ProcCode="30" Partial="NA" FeeWaivedOff="0"></TxnCode>
<TxnAmt value="0.0000" currency="826"></TxnAmt>
<CashbackAmt value="0.00" currency="826"></CashbackAmt>
<BillAmt value="0.00" currency="826" rate="1.00"></BillAmt>
<ApprCode>476274</ApprCode>
<Trace auditno="305330" origauditno="305330" Retrefno="061200002435"></Trace>
<MerchCode>BOIA </MerchCode>
<Term code="S1A90971" location="PO NORFOLK STR 3372308 CAMBRIDGESHI3 GBR" street="" city="" country="GB" inputcapability="5" authcapability="7"></Term>
<Schema>MCRD</Schema>
<Txn cardholderpresent="0" cardpresent="yes" cardinputmethod="5" cardauthmethod="1" cardauthentity="1"></Txn>
<MsgSource value="74" domesticMaestro="yes"></MsgSource>
<PaddingAmt value="0.00" currency="826"></PaddingAmt>
<Rate_Fee value="0.00"></Rate_Fee>
<Fixed_Fee value="0.20"></Fixed_Fee>
<CommissionAmt value="0.20" currency="826"></CommissionAmt>
<Classification RCC="" MCC="6011"></Classification>
<Response approved="YES" actioncode="0" responsecode="00" additionaldesc=" PO NORFOLK STR 3372308 CAMBRIDGESHI3 GBR"></Response>
<OrigTxnAmt value="0.00" currency="826"></OrigTxnAmt>
<ReversalReason></ReversalReason>
</CardAuthorisation>
And what we need to do is be able to import this to various tables (one for each top level element type).
So for example CardAuthorisation should be imported to the "Authorisation" table, the CardFinancial should go to the "Financial" table etc.
So the question is what is the best method to employ to import this data.
Having read a bit, I understand xslt can be used for this and would be able to make the above into:
<CardAuthorisation>
<RecType>ADV</RecType>
<AuthId>32397275</AuthId>
<AuthTxnID>11606448</AuthTxnID>
<LocalDate>20140612181918</LocalDate>
<SettlementDate>20140612</SettlementDate>
<PAN>"2009856214560271"</PAN>
<product>MCRD</product>
<programid>DUMMY1</programid>
<branchcode>1</branchcode>
<Accountno>"985621456"</Accountno>
<type>"00"</type>
<TxnCodedirection>"debit"</TxnCodedirection>
<TxnCodeType>"atm" </TxnCodeType>
<TxnCodeGroup>"fee" </TxnCodeGroup>
<TxnCodeProcCode>"30" </TxnCodeProcCode>
<TxnCodePartial>"NA" </TxnCodePartial>
<TxnCodeFeeWaivedOff>"0"</TxnCodeFeeWaivedOff>
<TxnAmtvalue>"0.0000"</TxnAmtvalue>
<TxnAmtcurrency>"826"</TxnAmtcurrency>
<CashbackAmtvalue>"0.00"</CashbackAmtvalue>
<CashbackAmtcurrency>"826"</CashbackAmtcurrency>
<BillAmtvalue>"0.00" </BillAmtvalue>
<BillAmtcurrency>"826" </BillAmtcurrency>
<BillAmtrate>"1.00"</BillAmtrate>
<ApprCode>476274</ApprCode>
etc etc
</CardAuthorisation>
But the info I read was quite old (4-5 years), and I know SSIS is always being improved, so I'm not sure whether it is still valid advice today.
Thanks in advance for your thoughts.
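To make the intended transformation concrete, here is a rough sketch of the attribute-to-element flattening described above, written in Python rather than XSLT or SSIS. It is only an illustration under assumptions: the function name is made up, every new element is named parent tag plus attribute name (the sample above is not consistent about this), and a child that has attributes is assumed to have no text of its own.

```python
import xml.etree.ElementTree as ET

def flatten_attributes(record):
    """Return a copy of record in which child attributes become child elements."""
    flat = ET.Element(record.tag)
    for child in record:
        if child.attrib:
            # <TxnAmt value="0.0000" currency="826"/> becomes
            # <TxnAmtvalue>0.0000</TxnAmtvalue><TxnAmtcurrency>826</TxnAmtcurrency>
            for name, value in child.attrib.items():
                ET.SubElement(flat, child.tag + name).text = value
        else:
            # Plain elements such as <RecType>ADV</RecType> are copied as they are.
            ET.SubElement(flat, child.tag).text = child.text
    return flat

source = ET.fromstring(
    "<CardAuthorisation>"
    "<RecType>ADV</RecType>"
    '<TxnAmt value="0.0000" currency="826"></TxnAmt>'
    "</CardAuthorisation>"
)
print(ET.tostring(flatten_attributes(source), encoding="unicode"))
```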

How to make LIKE in SQL look for specific string instead of just a wildcard

My SQL Query:
SELECT
[content_id] AS [LinkID]
, dbo.usp_ClearHTMLTags(CONVERT(nvarchar(600), CAST([content_html] AS XML).query('root/Physicians/name'))) AS [Physician Name]
FROM
[DB].[dbo].[table1]
WHERE
[id] = '188'
AND
(content LIKE '%Urology%')
AND
(contentS = 'A')
ORDER BY
--[content_title]
dbo.usp_ClearHTMLTags(CONVERT(nvarchar(600), CAST([content_html] AS XML).query('root/Physicians/name')))
The issue I am having is that a search for Urology also returns rows whose content is Neurology, because the text "Neurology" contains "urology".
Is there any way to make a search for Urology return only Urology results, and a search for Neurology return only Neurology results?
The search term can be Urology, Neurology, Internal Medicine, and so on; the two terms above are the ones causing the issue.
The content column is an ntext column containing XML, for example:
<root><Location><location>Office</location>
<office>Office</office>
<Address><image><img src="Rd.jpg?n=7513" /></image>
<Address1>1 Road</Address1>
<Address2></Address2>
<City>Qns</City>
<State>NY</State>
<zip>14404</zip>
<phone>324-324-2342</phone>
<fax></fax>
<general></general>
<from_north></from_north>
<from_south></from_south>
<from_west></from_west>
<from_east></from_east>
<from_connecticut></from_connecticut>
<public_trans></public_trans>
</Address>
</Location>
</root>
With the update this content column has the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<Physicians>
<name>Doctor #1</name>
<picture>
<img src="phys_lab coat_gradation2.jpg?n=7529" />
</picture>
<gender>M</gender>
<langF1>
English
</langF1>
<specialty>
<a title="Neurology" href="neu.aspx">Neurology</a>
</specialty>
</Physicians>
</root>
If I search for Lab the result appears because there is the text lab in the column.
This is what I would do if you're not into making a CLR proc to use Regexes (SQL Server doesn't have regex capabilities natively)
SELECT
[...]
WHERE
(content LIKE @strService OR
content LIKE '%[^a-z]' + @strService + '[^a-z]%' OR
content LIKE @strService + '[^a-z]%' OR
content LIKE '%[^a-z]' + @strService)
This way you check whether content is equal to @strService, OR the word exists somewhere within content with non-letters around it, OR it is at the very beginning or very end of content with a non-letter following or preceding it, respectively.
[^...] means "a character that is none of these". If there are other characters you don't want to accept before or after the search term, add them inside each of the four bracket expressions (after the ^). For instance, [^a-zA-Z_].
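The same whole-word idea can be tried out quickly outside the database. The sketch below uses a Python regular expression instead of T-SQL LIKE, so it only illustrates the matching logic; the function name and the extra_word_chars parameter are made up for this example.

```python
import re

def contains_word(content, term, extra_word_chars=""):
    """True if term occurs in content without a letter directly before or after it."""
    # Characters that must not surround the term; mirrors [^a-z] in the LIKE patterns.
    word = "a-zA-Z" + re.escape(extra_word_chars)
    pattern = rf"(?<![{word}]){re.escape(term)}(?![{word}])"
    return re.search(pattern, content, re.IGNORECASE) is not None

neuro = '<a title="Neurology" href="neu.aspx">Neurology</a>'
print(contains_word(neuro, "Urology"))    # False
print(contains_word(neuro, "Neurology"))  # True

# "lab" still matches after an underscore unless "_" is excluded as well.
print(contains_word("phys_lab coat", "lab"))                        # True
print(contains_word("phys_lab coat", "lab", extra_word_chars="_"))  # False
```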
As I see it, your options are to either:
Create a function that processes a string and finds a whole match inside it
Create a CLR extension that allows you to call .NET code and leverage the REGEX capabilities of .NET
Aaron's suggestion is a good one IF you can know up front all the terms that could be used for searching. The problem I could see is if someone searches for a specific word combination.
Databases are notoriously bad at semantics (i.e. they don't understand the concept of neurology or urology - everything is just a string of characters).
The best solution would be to create a table which defines the terms (two columns, PK and the name of the term).
The query is then a join:
JOIN terms ON table1.term_id = terms.term_id AND terms.term = 'Urology'
That way, you can avoid the LIKE and search for specific results.
If you can't do this, then SQL is probably the wrong tool. Use LIKE to get a set of results which match and then, in an imperative programming language, clean those results from unwanted ones.
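As a rough illustration of the terms-table approach, here is a self-contained sketch using SQLite from Python instead of SQL Server. The physicians table, its columns, and the sample rows are invented for the example; only the idea of joining on an exact term comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE terms (term_id INTEGER PRIMARY KEY, term TEXT NOT NULL UNIQUE);
CREATE TABLE physicians (name TEXT NOT NULL,
                         term_id INTEGER REFERENCES terms(term_id));
INSERT INTO terms VALUES (1, 'Urology'), (2, 'Neurology');
INSERT INTO physicians VALUES ('Doctor #1', 2), ('Doctor #2', 1);
""")

def physicians_for(term):
    """Return physician names whose specialty is exactly the given term."""
    rows = conn.execute(
        "SELECT p.name FROM physicians AS p "
        "JOIN terms AS t ON p.term_id = t.term_id "
        "WHERE t.term = ? ORDER BY p.name",
        (term,),
    )
    return [name for (name,) in rows]

print(physicians_for("Urology"))    # ['Doctor #2']
print(physicians_for("Neurology"))  # ['Doctor #1']
```

Because the comparison is an equality test on the term, Urology can no longer match inside Neurology.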
Judging from your content, can you not leverage the fact that there are quotes in the string you're searching for?
SELECT
[...]
WHERE
(content LIKE '%"Urology"%')

Modify entry in OpenLDAP directory

I have a large OpenLDAP directory. The displayName attribute is filled for every entry, but I need to modify these entries and set it to "givenName + ' ' + sn". Is there a way to do this directly in the directory, like an SQL UPDATE query? I have read about ldapmodify but could not find a way to use it like this.
Any help in this regard will be appreciated.
There is no way to do this with a single LDAP API call. You'll always have to use one LDAP search operation to get givenname and sn attributes, and one LDAP modify operation to modify the displayName attribute.
If you use the command line ldaptools "ldapsearch" and "ldapmodify", you can do this easily with some shell scripting, but you'll have to be careful: sometimes ldapsearch(1) can return LDIF data in base64 format, with UTF-8 strings that contain characters beyond ascii. For instance: 'sn:: Base64data' (note the double ':')
So, if I were you I would use a simple script in my language of choice, that has an LDAP API, instead of using shell commands. This would save me the troubles of base64 decoding that the ldaptools sometimes impose.
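To show what the base64 issue looks like in practice, here is a minimal sketch that decodes one LDIF line by hand. It is written in Python only for illustration, the function name is made up, and it assumes the line has already been unfolded (no continuation lines) and is not a URL-valued "attr:<" line.

```python
import base64

def parse_ldif_line(line):
    """Split one unfolded LDIF line into (attribute, value)."""
    attr, _, rest = line.partition(":")
    if rest.startswith(":"):
        # "attr:: ..." marks a base64-encoded value, typically UTF-8 text.
        return attr, base64.b64decode(rest[1:].strip()).decode("utf-8")
    return attr, rest.strip()

# Build an encoded example line instead of hard-coding the base64 text.
encoded = base64.b64encode("Müller".encode("utf-8")).decode("ascii")
print(parse_ldif_line("givenName: Jakob"))
print(parse_ldif_line("sn:: " + encoded))
```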
For instance, with php-cli, your script would be roughly like this (perhaps some more error checking would be appropriate):
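<?php
// Connect and bind; host, credentials, and base DN are placeholders.
$ldap = ldap_connect('host');
ldap_bind($ldap, ...);
$sr = ldap_search($ldap, 'ou=people,...', 'objectclass=*');
$entries = ldap_get_entries($ldap, $sr);
for ($i = 0; $i < $entries['count']; $i++) {
    // Skip entries that lack either attribute.
    if (!isset($entries[$i]['givenname'][0], $entries[$i]['sn'][0])) continue;
    // ldap_get_entries() returns each attribute as an array of values; use the first one.
    $modify = array('displayname' => $entries[$i]['givenname'][0] . ' ' . $entries[$i]['sn'][0]);
    ldap_modify($ldap, $entries[$i]['dn'], $modify);
}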
Addendum: if you want to keep this data up to date without any intervention, you will probably need to use a specialized OpenLDAP module that keeps "virtual" attributes, or even a virtual directory, such as Penrose or Oracle Virtual Directory, on top of OpenLDAP. However this might be overkill for a simple concatenation of attributes.