Time Domain Data for DTC storage in Autosar Diagnostic - embedded

Autosar Diagnostics is implemented following the UDS standard (ISO 14229).
As per that, once a DTC is logged, snapshot data is stored as defined by UDS. Snapshot data is implemented via the freeze frame concept in the Autosar Dem module.
But I want to save some additional information about the DTC apart from the snapshot data. I want to store data from 3 seconds before until 1 second after the DTC is confirmed, with a sampling period of 400 milliseconds. So I need to store 10 samples of data every time a DTC is confirmed.
I want to implement this time domain data in Autosar Diagnostics. Can I do that? If yes, how?
Thanks.

We had a customer who wanted almost the same: 15 FreezeFrames, 12 before the failure, one at the failure, and two after it, with a similar cycle. We used a ring buffer updated cyclically, and a callout from Dem (either DemCallbackEventStatusChanged() or DemCallbackDTCStatusChanged()) to stop the ring buffer after counting two more samples. Once they were logged, we stored them in an extra NvM block. You might have several of these NvM blocks and link that number to the DemEvent (FF data?). E.g. the NvM block could be an NVM_DATASET, so you could use an index. When reading out DTCs, look up the assignment and read out the NvM dataset index.
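A language-neutral sketch of that ring-buffer bookkeeping (Python here purely for illustration; in an AUTOSAR project this would be C code driven by a cyclic runnable and the Dem status callback). The 8-before/2-after split and the sample source are assumptions chosen to match the 10-sample, 400 ms example from the question:

```python
from collections import deque

PRE_SAMPLES = 8    # ~3 s before confirmation at 400 ms (assumption)
POST_SAMPLES = 2   # ~1 s after confirmation at 400 ms (assumption)

class TimeDomainRecorder:
    """Keeps a sliding window of samples and freezes it once a DTC confirms."""

    def __init__(self):
        self.window = deque(maxlen=PRE_SAMPLES)  # ring buffer, overwrites the oldest sample
        self.post_remaining = None               # None = no confirmation pending
        self.frozen = None                       # the completed 10-sample record

    def on_cyclic_400ms(self, sample):
        """Call every 400 ms with the current sensor/DID values."""
        if self.frozen is not None:
            return                               # already captured, waiting to be written to NvM
        self.window.append(sample)
        if self.post_remaining is not None:
            self.post_remaining -= 1
            if self.post_remaining == 0:
                self.frozen = list(self.window)  # 8 pre + 2 post samples, ready for NvM
                self.post_remaining = None

    def on_dtc_confirmed(self):
        """Hook for DemCallbackEventStatusChanged / DemCallbackDTCStatusChanged."""
        if self.post_remaining is None and self.frozen is None:
            # widen the window so the next POST_SAMPLES samples are appended, not overwritten
            self.window = deque(self.window, maxlen=PRE_SAMPLES + POST_SAMPLES)
            self.post_remaining = POST_SAMPLES
```

The frozen record would then be written to the extra NvM dataset block and its index linked to the DemEvent, as described above.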
Otherwise, you might find a way with StorageConditions: disable them at first reporting and enable them again after the freeze frames are complete?
I don't know of a Dem feature that supports this directly, though.

I don't really understand where your problem is.
As you mentioned, snapshot data is stored together with the DTC. You can define the content of the snapshot data by referencing DIDs. So you need to define a new (internal) DID (in the Dcm) that provides your time domain data and add this DID to the snapshot data (freeze frame) in the Dem.
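If the whole sample history is to be returned as one freeze-frame record, the buffered samples have to fit into a fixed-length DID payload. A tiny sketch of such a packing, assuming ten unsigned 16-bit samples (the format and length are arbitrary choices, not something prescribed by the standard):

```python
import struct

def pack_time_domain_did(samples):
    """Serialize 10 sampled values into a fixed 20-byte DID payload (big-endian uint16)."""
    assert len(samples) == 10
    return struct.pack(">10H", *samples)

payload = pack_time_domain_did([100, 102, 101, 99, 98, 97, 250, 251, 252, 253])
print(len(payload))  # 20 bytes: the record length to configure for the DID / freeze frame
```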

Related

Storage Use Case "Logging + Images + Metadata"

I have the following use case, for which I'm trying to find the optimal use of either a filesystem or a database (RDBMS or some flavour of NoSQL solution). Any advice is welcome, as I want to see what is optimal.
Client application: will generate logs at intervals of 1-3 seconds. By logs I mean structured log data (about connections, applications used, processes used, screenshots, etc.). Some log data will be structured, some will be unstructured (so the schema can change).
Storage solution: will need to persist all this data very fast. Will sit on one or more servers. It doesn't matter if it's a hybrid solution between a filesystem, an RDBMS and (any suitable flavour of) NoSQL.
Post-processing: the data needs to be queryable, of course. E.g. a plain key-value store would not suffice, that's a given (except maybe for the screenshots).
As a reference, here's a more concrete example:
A user runs the client for 2-3 hours (during a "monitoring period"). It sends log data over the wire to the server (storage). Writing speed and data accuracy are vital here.
A management system accumulates the data and produces a report on certain characteristics. All log data should be retrievable if needed, but the typical query will be for a set of users in a given monitoring period. Reading speed is less critical here, but data accuracy and eventually being able to find all log parts again are necessary.
If I need to give more information, please let me know.
If you prefer to roll your own rather than use logging packages, I would stick with append-only text files. You can certainly encode screenshots in Base64 and keep them in the same file, but I would rather store them separately in the file system, with a generated filename stored in the log.
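A minimal sketch of that layout, assuming JSON-lines records and a separate directory for the screenshots (all file names and fields below are made up for illustration):

```python
import json
import time
import uuid
from pathlib import Path

LOG_FILE = Path("client.log")     # append-only structured log, one JSON object per line
SHOT_DIR = Path("screenshots")    # binary blobs kept outside the log file
SHOT_DIR.mkdir(exist_ok=True)

def log_event(event_type, data, screenshot_bytes=None):
    record = {"ts": time.time(), "type": event_type, "data": data}
    if screenshot_bytes is not None:
        # store the image as its own file; only the generated name goes into the log
        path = SHOT_DIR / f"{uuid.uuid4().hex}.png"
        path.write_bytes(screenshot_bytes)
        record["screenshot"] = str(path)
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_event("connection", {"host": "10.0.0.5", "port": 443})
```

Because every line is self-contained, a crash mid-write costs you at most the last record, and the file can later be streamed line by line into whatever reporting store you choose.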
As for reporting, you can obviously read it through a text editor, but if you need more sophisticated and regular management reporting, you can create an ETL that loads only the info you report on into an RDBMS. You can always go back and rerun the ETL if you decide you want more info later on.

Should the first step of an SSIS solution be to check connections?

In SSIS 2012, how would I check whether a connection and credentials are working?
In a scenario where you must pull and aggregate data from different sources and load it into some target databases, should the whole project be preceded by a special package that checks whether the connections are up and running?
If so, how would I check whether the connections to which the package needs access are up and ready to be connected to and that the credentials are right?
I understand that if I omit the step of checking the connections, the whole thing will just error out, but I feel like the best practice would be to explicitly check whether the connections and credentials are working.
Thank you very much for your kind attention.
Apparently your check would be, as Siyual suggested, a SELECT 1. But the bigger question here is: do all the connections need to be active for you to get the target data, or would you have some data to load even if some connections faulted? That obviously depends on the nature of your requirement. This is where the concepts of staging tables and deltas, and using the Completion workflow instead of the Success workflow, would help. For example, and for simplicity's sake, let's say I am pulling sales data from 3 different point-of-sale locations for my company (Brazil, US and India). I store the dates I last pulled from in a table and update them when the extraction is successful. In this case, if one extraction fails, I would still want to go on to the next location and try to pull its data.
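Outside of SSIS the control flow looks roughly like the sketch below (Python with pyodbc, purely illustrative; the connection strings, watermark table and column names are placeholders). Inside the package the same idea maps to one container per source, joined with Completion precedence constraints:

```python
import pyodbc

SOURCES = {  # placeholder connection strings, one per point-of-sale location
    "Brazil": "DSN=pos_br;UID=etl;PWD=secret",
    "US":     "DSN=pos_us;UID=etl;PWD=secret",
    "India":  "DSN=pos_in;UID=etl;PWD=secret",
}

target = pyodbc.connect("DSN=warehouse;UID=etl;PWD=secret")

for name, conn_str in SOURCES.items():
    try:
        src = pyodbc.connect(conn_str, timeout=5)
        src.cursor().execute("SELECT 1").fetchone()   # cheap connectivity/credential check
    except pyodbc.Error as exc:
        print(f"{name}: unreachable, skipping this run ({exc})")
        continue                                      # still pull from the remaining sources

    cur = target.cursor()
    last = cur.execute(
        "SELECT last_pulled FROM etl.watermarks WHERE source = ?", name
    ).fetchone()[0]
    rows = src.cursor().execute(
        "SELECT * FROM dbo.sales WHERE sale_date > ?", last
    ).fetchall()
    # ... load rows into the staging table here ...
    cur.execute(
        "UPDATE etl.watermarks SET last_pulled = SYSUTCDATETIME() WHERE source = ?", name
    )
    target.commit()   # advance the watermark only after a successful extraction
```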

How should data be provided to a web server using a data warehouse?

We have data stored in a data warehouse as follows:
Price
Date
Product Name (varchar(25))
We currently only have four products. That changes very infrequently (on average once every 10 years). Once every business day, four new data points are added representing the day's price for each product.
On the website, a user can request this information by entering a date range and selecting one or more product names. Analytics shows that the feature is not heavily used (about 10 user requests per week).
It was suggested that the data warehouse should push (via SFTP) a CSV file containing all the data (currently 6718 rows, growing by four each day) to the web server daily. Then, the web server would read the data from the file and display it whenever a user made a request.
Usually, the push would be only once a day, but more than one push could be needed to communicate (infrequent) price corrections. Even in the price correction scenario, all data would be delivered in the file. What are the problems with this approach?
Would it be better to have the web server make a request to the data warehouse per user request? Or does this have issues such as a greater chance for network errors or performance issues?
Would it be better to have the web server make a request to the data warehouse per user request?
Yes, it would. You have very little data, so there is no need to try and 'cache' this in some way (apart from the fact that CSV might not be the best way to do this).
There is nothing stopping you from making these requests from the webserver to the database server. With as little data as this you will not find performance an issue, and even if it became one as everything grows, there is a lot to be gained on the database side (indexes etc.) that will help you survive the next 100 years in this fashion.
The number of requests from your users (also extremely small) does not need any special treatment, so again, a direct query would be best.
Or does this have issues such as a greater chance for network errors or performance issues?
Well, it might, but that would not justify the CSV method. Examples, and why you need not worry:
The connection with the database server is down.
This is an issue for both methods, but with only one connection per day the chance of a 1-in-10000 failure might seem better for the once-a-day method. Still, these issues should not come up very often, and if they do, you should be able to handle them (retry the request, give a message to the user). This is what enormous numbers of websites do, so trust me when I say this will not be an issue. Also, think of what it would mean if your daily update failed. That would be a bigger problem!
Performance issues
As said, given the amount of data and requests, this is not a problem. And even if it becomes one, it is a problem you should be able to catch at a different level: use a caching system (not a CSV) on the database server, use a caching system on the webserver, or fix your indexes to stop performance from being a problem.
BUT:
It is far from strange to want your data warehouse separated from your web system. If this is a requirement, and it surely could be, the best thing you can do is re-create your warehouse database (the one I just defended as being good enough to query directly) on another machine. You might get good results with a master-slave setup:
your data warehouse is the master database: it sends all changes to the slave but is inaccessible otherwise;
your second database (even on your webserver, if you like) gets all updates from the master and is read-only; you can only query it for data;
your webserver cannot connect to the data warehouse, but can connect to your slave to read information. Even if there were an injection hack, it doesn't matter, as the slave is read-only.
Now there is never a moment where you have to update the queried database yourself (the master-slave replication will keep it up to date), and there is no chance that queries from the webserver put your warehouse in danger. Profit!
I don't really see how SQL injection could be a real concern. I assume you have some calendar-type field that the user fills in to get data out. If this is the only form, just ensure that the only field in it is a date; then something like DROP TABLE isn't possible. As for getting access to the database, that is another issue. However, a separate file with just the connection function should do fine in most cases, so that a user can't, say, open your webpage in an HTML viewer and see your database connection string.
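A tiny sketch of that idea: the user input only ever travels as bound parameters, and the connection details live in a module that is never served to the browser (the table, column and module names are made up):

```python
import pyodbc
from db_config import CONNECTION_STRING   # hypothetical module kept outside the web root

def get_prices(start_date, end_date, products):
    """Return daily prices for the selected products within a date range."""
    placeholders = ",".join("?" for _ in products)
    sql = (
        "SELECT product_name, price_date, price "
        "FROM prices "
        f"WHERE price_date BETWEEN ? AND ? AND product_name IN ({placeholders}) "
        "ORDER BY price_date"
    )
    with pyodbc.connect(CONNECTION_STRING) as conn:
        # values are bound, never concatenated, so 'DROP TABLE' in a field stays harmless
        return conn.cursor().execute(sql, start_date, end_date, *products).fetchall()
```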
As for the CSV, I would say querying the database per user request, especially if the feature is only used ~10 times weekly, would be much more efficient. I consider the CSV overkill because, again, you only have ~10 users attempting to get some information; exporting an updated CSV every day would be too much effort for such little payoff.
EDIT:
Also, if an attack is a big concern (which really depends on the nature of the business, the data being stored, and the visitors you receive), you could always keep a backup as another option. I don't really see a reason for this as your question is currently stated, but even with the best security an attack could happen. That mainly depends on whether the attackers want the information you have.

What is this vague accusation of RRD data loss about?

I want to use CollectD to gather some statistics (about storage) and have Graphite display them nicely. Apparently this can be done either by
having CollectD store the data as RRD files and pointing Graphite at those, or
using a CollectD plugin to push the data to Graphite's Carbon API, which will store the data in a Whisper database (which is similar to RRD but not compatible).
I think I want to go with RRDs, but I found this statement in the Whisper docs that concerns me:
In many cases (depending on configuration) if an update is made to an RRD series but is not followed up by another update soon, the original update will be lost.
Hmmm. That's a bit scary, but the accusation is so vague that I don't know what to make of it. What configuration are they talking about, and in what situation does it cause data loss?
My situation is that the metrics data I am gathering will be available in chunks -- periodically I will go get the latest data and make as many entries into the database as there are new samples available. So, for example, I might grab some data and update the database with the values from 3 minutes ago, 2 minutes ago, and 1 minute ago, one right after the other. In fact, I might have dozens of new samples to put in the database at once. Does using RRD this way have anything to do with the Whisper accusation?
NOTE: I do not need to back-fill data; I will always be adding newer data than what has already been stored.
One scenario where I could see this happening would be if you have an AVERAGE RRA set up and have the xff value set to a low percentage. When the data is consolidated over time, you could end up with an unknown value and 'lose' all the data that was averaged. If you are using an RRD for what it was designed for, and have it set up with the proper type and settings, I wouldn't think you will run into a problem.
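For reference, that factor is set per RRA when the file is created. A sketch driving the rrdtool command line from Python (the data source name, step and retention are arbitrary); the third field of each RRA definition is the xff, i.e. the fraction of an averaging window that may be unknown before the consolidated value itself becomes unknown:

```python
import subprocess

# one GAUGE data source sampled every 60 s; xff = 0.5 keeps an averaged point
# as long as at least half of the raw samples in its window were known
subprocess.run([
    "rrdtool", "create", "metrics.rrd", "--step", "60",
    "DS:value:GAUGE:120:U:U",
    "RRA:AVERAGE:0.5:1:1440",     # 1-minute points kept for one day
    "RRA:AVERAGE:0.5:60:720",     # hourly averages kept for 30 days
], check=True)

# the 'chunked' updates from the question are fine as long as timestamps keep increasing;
# several samples can be handed over in a single update call
subprocess.run([
    "rrdtool", "update", "metrics.rrd",
    "1700000180:42", "1700000240:43", "1700000300:41",
], check=True)
```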
I would recommend taking an in-depth look at the RRDtool documentation to answer questions about how RRDs and RRAs handle the data, and about the different storage techniques that are available to you.

SQL Service Broker: Collecting data -- plug-in scenario analysis

(2nd update from 2012/12/06 -- new protocol, a slightly different view)
The question is whether the solution below seems reasonable for you, or whether there is any flaw that I did not notice (being quite new to SQL Server Service Broker)...
I would like to continue the analysis of the problem presented in SQL Service Broker: Collecting data from distributed sources. I would like to focus on the problem of the protocol to be used when collecting data from the satellite SQL servers. The use of SQL Server Service Broker is a must -- it is dictated also by other reasons not presented here. So, please, do not suggest completely alternative solutions.
I would like to focus on the details of what should be done and how to use the Service Broker naturally (in the best possible way) for this exact problem. The overall goal was presented in the above-mentioned question. The picture first:
Now more details to be considered...
Plug-in architecture wanted
The satellite machines are related to real physical production lines. It can happen that a machine is added to the technology process, a machine disappears, or a machine is replaced in the sense that it keeps the same production-line identification but is physically different -- i.e. its SQL Server is a different instance.
The central server knows nothing about a satellite until it gets the first messages from it. There is no centralized database of the satellite servers, and no knowledge of which or how many satellite SQL servers are to be included in the system. That is always decided on the satellite side.
Any activity related to collecting the data should be initiated by events generated by the satellite machines.
Important: The goal is to continually transfer all the newly created data (from sensors), and to discover and fix drop-outs -- independently of whatever could cause them.
To give you the concrete example:
The machine identified as production line number 3 (yellow) was recently added to the environment. Its SQL Server Express was launched and it started to collect the sensor data (a third-party solution, a dedicated table with a special structure). The machine was not connected to the central server yet.
The only configured things are the reliably assigned, fixed identification of the production line (here 3) and all the necessary details to connect to the central SQL server. The central SQL server does not know this information; the central is just ready to accept data from any new source, but never knows when. (This was already tested for one machine using the approach suggested by Remus Rusanu's answer to the question SQL Service Broker — one central SQL and more satelite SQL….)
The piece of SQL software is deployed on machine 3 just a bit later. It starts to talk with the central. The satellite part is not dumb; its own activity is to send the sensor data whenever a new record is inserted into the sensor data table (see point 1 above). From the record, the UTC time is calculated (from the proprietary format), the several sensor values in one record are converted into the same number of normalized records (formatted as one XML message), and sent to the central SQL server.
The central is activated by the message with the sensor data sent from the satellite machine. Failures of the physical connection are masked by the Service Broker queues.
After a reasonable interval (here one hour), the central server checks whether the data collected so far should be processed or not. There is a work unit that takes some production time, and the data should be processed and added to the documentation of that unit. The processing should happen only once the unit is finished.
The central also checks whether it has all the data for the unit. As the sensor sampling is done at known regular intervals (here about 1 minute), the central can check whether there are drop-outs. There is also an initial "drop-out" for the time interval when the satellite was not yet connected to the central via SSB. The mechanism should recover from whatever situation. It can also happen that the sensors were out of order or the data were not collected. A detected drop-out at the central may actually mean that the central asks: "I have no data from you for this time interval. Send me some of them if they exist, or tell me they do not exist."
The satellite should send only as much data as can be sent between sampling times. Recovery from drop-outs can be rather slow. The delay in processing the data at the central server is not critical. However, the central should know when the data is ready (or does not exist) for the detected time interval.
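The drop-out check on the central side is essentially a gap search over the expected 1-minute sample grid. A minimal sketch of that check (the window boundaries and data layout are assumptions):

```python
from datetime import timedelta

EXPECTED_STEP = timedelta(minutes=1)   # known sensor sampling interval

def find_dropouts(sample_times, window_start, window_end):
    """Return (from, to) intervals inside the window for which no sample exists."""
    gaps = []
    expected = window_start
    for t in sorted(sample_times):
        if t < window_start or t > window_end:
            continue
        if t - expected > EXPECTED_STEP:           # hole between the expected and the next sample
            gaps.append((expected, t))
        expected = t + EXPECTED_STEP
    if window_end - expected >= EXPECTED_STEP:     # trailing hole up to the end of the window
        gaps.append((expected, window_end))
    return gaps
```

Each reported gap becomes one "send me data for this interval, or tell me it does not exist" request back to the satellite.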
Some picture, more solution details
I have chosen the "Recycling conversations" approach by Remus Rusanu as the basic framework for the communication between the satellite and the central. It defines the EndOfStream message type to signal that the conversation handle should be thrown away and a new one should be used. The lifetime is limited by the above-mentioned one-hour interval generated by the Service Broker timer.
The message is also (mis)used at the central server to activate the data processing. At about the same time, the central checks for drop-outs. The central keeps track of the time below which drop-outs have already been checked. This way it knows which data are ready to be processed.
Do you consider the scenario reasonable? Can you see any problem with it?
(I am going to refine the question to reflect your suggestions.)
Thanks for your time and experience, and have a nice day.
Petr
All data should be stored in a table. On the satellite side, you should create a table in which the last processed row is stored. When a new request from the Central arrives, a new data pack is sent back to the Central based on the last processed record value.
Note: I recommend limiting the number of rows to be sent, depending on your data, so as not to create very large data packs.
When the Central has processed all rows, an appropriate message should be sent to the Satellite. It should also contain information about any data import errors that occurred.
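The satellite side of that exchange boils down to a watermark plus a bounded batch. A rough sketch of the two handlers (the table names, the 500-row cap, the XML shape and the '?' parameter style are assumptions; the actual Service Broker SEND/RECEIVE plumbing is left out):

```python
import xml.etree.ElementTree as ET

MAX_ROWS_PER_PACK = 500   # keep data packs small, as recommended above

def build_data_pack(conn, line_id):
    """Read the next batch after the stored watermark and return it as one XML message."""
    cur = conn.cursor()
    last_id = cur.execute(
        "SELECT last_sent_id FROM dbo.SendState WHERE line_id = ?", line_id
    ).fetchone()[0]
    rows = cur.execute(
        "SELECT TOP (?) id, sample_utc, sensor, value "
        "FROM dbo.SensorData WHERE id > ? ORDER BY id", MAX_ROWS_PER_PACK, last_id
    ).fetchall()

    pack = ET.Element("DataPack", line=str(line_id))
    for row_id, sample_utc, sensor, value in rows:
        ET.SubElement(pack, "Sample", id=str(row_id), utc=str(sample_utc),
                      sensor=str(sensor), value=str(value))
    return ET.tostring(pack), (rows[-1][0] if rows else last_id)

def on_central_ack(conn, line_id, acked_id):
    """Advance the watermark only once the Central has confirmed the import."""
    conn.cursor().execute(
        "UPDATE dbo.SendState SET last_sent_id = ? WHERE line_id = ?", acked_id, line_id
    )
    conn.commit()
```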
You can start the Service Broker conversation when database activity is registered (using DML/DDL triggers on both the Central and Satellite databases) or on a schedule (using a SQL Server Agent job on the Central).