I have a scenario where multiple feature files use a similar set of data, so I want to have one data table within the framework and refer to it from multiple feature files/scenarios.
Example: suppose I have 2 feature files, feature1 and feature2, both expecting signing data as test data. Currently I am writing the signing details in both feature files, but I want a common data table in one place. Is that possible in the Cucumber framework?
Data tables are associated strictly with a single Scenario (or Scenario Outline), so sharing them across feature files is not possible. While it may seem annoying to have to replicate the data, it keeps processing of the feature file simple and makes the content easier for a human reader to take in.
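For illustration, a minimal sketch of what that duplicated table might look like inline (the steps and field names here are made up, not from your project); the same table would simply be repeated in the second feature file:

```gherkin
# feature1.feature -- feature2.feature would repeat the same table verbatim
Feature: Document signing (feature1)

  Scenario: Sign a document with the standard test signer
    Given the following signing details:
      | name      | email              | pin  |
      | Test User | test.user@mail.com | 1234 |
    When the document is signed
    Then the signature should be valid
```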
I hope it’s OK to post a completely naive question here for LO or OO experts.
I’m looking for advice on whether scripting LibreOffice or OpenOffice would be suitable for the following:
General Question
I’m looking to generate PDF reports, based on a combination of a “template” and a set of data (currently in JSON format) and inserted images.
This would act as a headless service that gets invoked when necessary from a web server, when a user requests a PDF report (on Linux).
We have a need to frequently modify/customise/generate new templates, hence the reluctance to go down the route of using something like Reportlab (plus I don't know Reportlab at all, so I'd face a huge learning curve that way).
Background
This is in contrast to the approach of using a PDF library like Reportlab directly within the web server and having to build up the template/report programmatically.
As LibreOffice/OpenOffice is obviously a lot faster for producing good-looking report "templates", this is a question about doing both the template generation and the final template + data -> PDF generation directly within LibreOffice.
Some more specifics
The data values would mostly be substituted into fields in the template, with little to no processing of the values required.
However, there would be situations where some of the data comes in “sets” that would be shown in a table-type view, and the number of fields (and so the number of table rows, for instance) would need to vary per report, based on the number of values in that particular JSON data.
Additionally, I’d need to be able to include (“import”) images into the report. Some of the JSON data would be paths to image files, and I’d like to include those. Again, the number of images may vary from report to report.
This wouldn't be high frequency at all, so I would not need to run LO/OO as a service, but could simply invoke it when required with a system call. Conceptually something like "LibreOffice --template 'make_fancy.report' <data.json> <output_file.pdf>"
If this approach would be reasonable in either LO or OO, what languages are best to script in? (Hopefully python3).
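To make the idea concrete, here is a minimal sketch of the kind of flow I have in mind, assuming Python 3 and some hypothetical file names (report_template.odt containing {placeholder} markers in its text, data.json as a flat object, and soffice on the PATH). It fills the placeholders by editing content.xml inside the .odt (which is just a zip archive), then uses LibreOffice's --headless --convert-to pdf mode for the PDF step:

```python
#!/usr/bin/env python3
"""Sketch only: fill a Writer template with JSON data, then export to PDF.

Assumed (not given in the post above): report_template.odt contains markers
such as {customer} in its text, data.json is a flat JSON object, and the
`soffice` binary is on the PATH.
"""
import json
import subprocess
import zipfile
from pathlib import Path
from xml.sax.saxutils import escape

def fill_template(template: Path, data: dict, filled: Path) -> None:
    """Rewrite the .odt (a zip archive), replacing {key} markers in content.xml."""
    with zipfile.ZipFile(template) as zin:
        content = zin.read("content.xml").decode("utf-8")
        for key, value in data.items():
            content = content.replace("{%s}" % key, escape(str(value)))
        # caveat: strict ODF expects the mimetype entry first and uncompressed;
        # adjust the rewrite below if a consumer complains
        with zipfile.ZipFile(filled, "w", zipfile.ZIP_DEFLATED) as zout:
            for item in zin.infolist():
                if item.filename == "content.xml":
                    zout.writestr(item, content)
                else:
                    zout.writestr(item, zin.read(item.filename))

def export_pdf(document: Path, outdir: Path) -> None:
    """Convert the filled document to PDF with a headless LibreOffice run."""
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "pdf",
         "--outdir", str(outdir), str(document)],
        check=True,
    )

if __name__ == "__main__":
    data = json.loads(Path("data.json").read_text())
    fill_template(Path("report_template.odt"), data, Path("report_filled.odt"))
    export_pdf(Path("report_filled.odt"), Path("."))
```

Variable-length tables and inserted images would be harder with plain text substitution; those presumably need the UNO scripting API (which is available from Python), which is partly why I'm hoping Python 3 is a reasonable choice.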
I'm looking for advice on structuring FlatBuffer schemas for a framework which allows users to extend the data types defined by the framework, but also allows the framework developers to add new fields when new versions of the framework are published.
My original thinking was that when you create a project using this framework, it would generate several FlatBuffer schema files which you could then edit for your specific project. You could then compile the schemas and start developing code using the framework APIs.
However, this becomes a problem when the framework developers decide to add fields to the base types. As you probably know, FlatBuffers requires that any additional fields be appended to the end (or at least have a higher ID than other fields). So there is a conflict between the additions made by the framework developer and the framework user.
One possible solution would be to have a set of 'non-user-extensible' types that are owned by the framework creator, and which should not be modified by users of the framework; and these types would then be embedded within the data types defined by the framework user. However, given the restrictions on fields changing size, I am not sure if this would even work.
I'm also willing to hear alternatives to using FlatBuffers if it turns out that there is no good solution otherwise.
To have open-ended extension like that, you should really have the framework authors and users work in two separate tables, where one can own the other. There is no good way to extend a single table if all contributors aren't sharing the schema in source control.
If these extensions must be in a single object for whatever reason, then Protocol Buffers is more flexible than FlatBuffers, since it doesn't require adjacent field ids. You can simply say that all field ids >=1000 are for framework users, for example.
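To make the two-table split concrete, here is a minimal schema sketch (all names are made up): the framework ships a schema file that only its authors ever append to, user schemas include it and nest the framework-owned table, and each party appends new fields only to the table it owns. Because FlatBuffers tables are referenced by offset rather than stored inline (unlike structs), the nested framework table can grow without disturbing the layout of the user's table.

```
// framework_core.fbs -- shipped with the framework; only framework
// developers append fields here in new releases.
namespace Framework;

table FrameworkHeader {
  schema_version:uint;
  created_at_ms:ulong;
  // future framework fields get appended here
}

// my_project.fbs -- owned by the framework user; the framework never
// touches this file, so user additions cannot conflict.
include "framework_core.fbs";
namespace MyProject;

table MyRecord {
  header:Framework.FrameworkHeader;  // nested framework-owned table
  name:string;
  // future user fields get appended here
}

root_type MyRecord;
```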
In retrospect (answering my own question two years later), it seems that FlatBuffers was not the right choice for my use case. These days I'm using a combination of msgpack (in cases where I care about byte-size) and JSON (in cases where I don't) and I'm pretty happy with each.
I'm new to semantic technologies. I understand what RDF, OWL, ontologies and other basic terminology are, and how semantic search uses them. When we create a semantic search module using Anzo with enterprise search capabilities, it connects to various data sources and creates relationships between them. Now I'm interested in knowing what a semantic tool like Anzo does internally.
1. Does it create a copy of the data on the local machine, or does it hit the data sources every time we execute a SPARQL query?
2. If it stores data, is the data stored in its raw format, or is it stored after cleaning and creating semantic relations between the sources?
3. What happens to the data after a query is executed? How does it get current data every time?
Any thoughts on this would be valuable to me.
Thanks a lot in advance!
Based on your comments, it appears you're using Anzo Graph Query Engine? If so, then the answers to your questions are:
1. A copy of the data is held in memory.
2. Not clear from any of the published information.
3. It doesn't. You need to load in the data using the 'LOAD' command.
A bit more on 3: you would be responsible for implementing a mechanism to keep the in-memory data up to date with the underlying data source (which might be as simple as rebuilding the graph from a nightly dump, or as involved as implementing change data capture against the underlying store that replicates CRUD operations onto the graph).
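For reference, the LOAD mentioned in point 3 is the standard SPARQL 1.1 Update command (the engine's own bulk-loader syntax may differ, so check its documentation). A minimal sketch with placeholder URIs:

```
# Pull an RDF dump from a URL into a named graph (URIs are placeholders)
LOAD <http://example.org/dumps/customers.ttl>
INTO GRAPH <http://example.org/graphs/customers>
```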
My answers are based on the marketing and support information available on the CambridgeSemantics site.
Dacpac is a nice solution for versioning schema, and we have to use pre/post-deployment scripts to amend the reference data.
Is there any better solution for doing that?
The best way I have seen is to use MERGE statements, one table per file, and import them into your post-deploy script using :r imports.
You get version history and easily comparable data, and using sp_generate_merge makes it really simple.
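To sketch what that looks like (table, column and file names here are made up): the post-deployment script pulls in one file per reference table with :r, and each file contains an idempotent MERGE, roughly the shape of what sp_generate_merge emits.

```sql
-- Script.PostDeployment.sql (hypothetical names throughout)
:r .\ReferenceData\OrderStatus.sql

-- ReferenceData\OrderStatus.sql: one idempotent MERGE per reference table
MERGE INTO dbo.OrderStatus AS tgt
USING (VALUES
    (1, N'Pending'),
    (2, N'Shipped'),
    (3, N'Cancelled')
) AS src (OrderStatusId, [Name])
    ON tgt.OrderStatusId = src.OrderStatusId
WHEN MATCHED AND tgt.[Name] <> src.[Name] THEN
    UPDATE SET [Name] = src.[Name]
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OrderStatusId, [Name]) VALUES (src.OrderStatusId, src.[Name])
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```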
Ed
If you're looking for a solution in SSDT to handle reference data that does not involve the use of pre/post-deployment scripts, unfortunately there currently isn't one.
But it is currently one of the most requested features in SSDT so perhaps there's a chance it will get implemented some time in the future.
I am the maintainer of the sp_generate_merge OSS utility Ed mentioned, and at Redgate we recommend this approach to handling reference data in an offline way to our customers, in the following circumstances:
If the data within the table is changing very frequently, as the one-file-per-table approach allows for branching/merging of concurrent changes from multiple developers
If the data in the table contains environment-specific values, like application settings, as this method allows you to use SQLCMD variables and feed the values in from a deployment tool
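A sketch of that second point (the table, column and variable names are made up): the reference-data file references a SQLCMD variable, and the deployment tool supplies a different value per environment at deploy time.

```sql
-- ReferenceData\AppSetting.sql (hypothetical); $(PaymentApiUrl) is a SQLCMD
-- variable defined in the project and supplied per environment at deploy time
MERGE INTO dbo.AppSetting AS tgt
USING (VALUES (N'PaymentApiUrl', N'$(PaymentApiUrl)')) AS src ([Key], [Value])
    ON tgt.[Key] = src.[Key]
WHEN MATCHED THEN
    UPDATE SET [Value] = src.[Value]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Key], [Value]) VALUES (src.[Key], src.[Value]);
```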
Where the offline approach can be problematic:
Non-determinism of the MERGE statement: before actually running the deployment against your target environment, it can be difficult to know what changes will be applied (if any). Worst case scenario, you could hit one of the documented issues in MERGE
The workflow isn't necessarily the most natural way to edit data, as it requires running the utility proc and copying+pasting the output back into the original file. Editing the file directly is an alternative, but isn't the most user-friendly experience especially with large amounts of reference data
Co-ordinating changes to both the schema and the data within the reference table can be a challenge, given that SSDT is still responsible for applying the schema changes; for example, if you want to add a new NOT NULL column without a default.
Another solution involves following an online approach, which is supported by our SSDT-alternative, ReadyRoll database projects. It allows the data to be edited directly in the database and subsequently imported into the project, with the sync script (i.e. containing INSERT, UPDATE, DELETE statements instead of MERGE) generated by its data comparison tool, alongside any schema changes.
You can read more about how the offline and online approaches differ in the ReadyRoll documentation.
I am trying to find a software solution that will allow our group to easily upload datasets (scriptable and/or through some UI), tag those datasets, retrieve those datasets, control access to the datasets, search the tags, and search the files' names/attributes/metadata (e.g. file creation date). The datasets can be anything from CSV files, image (binary) datasets, texts, server logs, folders within folders of images, or zip files of CSV data. It can be anything. We will need to store GBs to potentially PBs of data. A single file can range from a few KB to hundreds of GB. We also need a usable API to retrieve these datasets programmatically.
We just want to have a centralized location for finding information, and we want to be able to answer a question such as "Hey, do you know if we have any lightning strike datasets?" If there is a file/folder/zip file tagged with "lightning", searching for that term should pull back that dataset.
A possible solution would be something like Dataverse, DSpace, Fedora Commons, or CKAN. However, those seem to be really geared towards academia and publications, or towards small datasets. On top of that, they remove any type of complex folder structure that might exist (e.g. Folder1-->subFolder1-->subFolder2). I also question the scalability of having 10 million 100 KB files within one of these systems.
A filesystem share would allow us to simply store whatever we want, but I don't know of a reasonable way of enabling tagging of data.
It is almost like I am looking for a combination of the two. Does someone know of a tool, preferably open source, that would be able to do something like this?
From what you have described so far, DSpace does seem to be a good fit.
With the following examples I want to address the concerns you raised:
Scalability
Here's an example of a multi-terabyte item:
https://ore.exeter.ac.uk/repository/handle/10871/14881
Complex structure
Dryad is based on DSpace and uses a more complex data model, with data files, data packages and the original publication each being represented as separate objects:
http://datadryad.org/resource/doi:10.5061/dryad.322vn
If that's what you want, you can also start your project from the Dryad codebase, since that one is open source as well:
https://github.com/datadryad/dryad-repo