Best way to store dates in a CDM (Conceptual Data Model)?

I'm currently retrieving data from a vehicle fleet (fuel used, distance traveled...) through the manufacturer's API. Each set of data includes the date when the metric was measured. I will retrieve the data every day through a cron job and store it in my DB. The purpose is to keep the history for each vehicle, so that later on I can get every metric measured for a vehicle between two dates.
I first thought of this:
But it doesn't seem right because of the "1,1" cardinality, so I transformed the three-way relationship into two binary relationships:
At this point I wondered whether I could simply store the date field in the metric entity (because I noticed the API gives me a datetime, so it's very unlikely that two metrics will be measured at the same time):
And finally, I wondered whether putting everything in a single data entity would be even easier (but it feels kind of wrong):
So I'm completely lost as to what would be the best way to do this. Could someone tell me which way is best, or whether there is a better way, and why?
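As a hedged illustration of the third design (the datetime stored directly on each measured value), here is a minimal SQLite sketch in Python. The table and column names (vehicle, metric_type, measurement, measured_at) are hypothetical. The composite primary key (vehicle, metric type, datetime) is an assumption: it allows two different metrics to share a timestamp, and together with INSERT OR IGNORE it makes re-running the daily import harmless.

```python
import sqlite3

SCHEMA = """
CREATE TABLE vehicle (
    vehicle_id INTEGER PRIMARY KEY,
    label      TEXT NOT NULL
);
CREATE TABLE metric_type (
    metric_type_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE,  -- e.g. 'fuel_used', 'distance'
    unit           TEXT
);
-- One row per measured value; the API's datetime lives on the row itself
CREATE TABLE measurement (
    vehicle_id     INTEGER NOT NULL REFERENCES vehicle(vehicle_id),
    metric_type_id INTEGER NOT NULL REFERENCES metric_type(metric_type_id),
    measured_at    TEXT    NOT NULL,      -- ISO-8601 UTC, e.g. '2024-03-01T08:15:00Z'
    value          REAL    NOT NULL,
    PRIMARY KEY (vehicle_id, metric_type_id, measured_at)
);
"""

def save(conn, vehicle_id, metric_name, measured_at, value):
    # INSERT OR IGNORE: importing the same API data twice adds nothing
    conn.execute(
        """INSERT OR IGNORE INTO measurement (vehicle_id, metric_type_id, measured_at, value)
           SELECT ?, metric_type_id, ?, ? FROM metric_type WHERE name = ?""",
        (vehicle_id, measured_at, value, metric_name),
    )

def history(conn, vehicle_id, start, end):
    # ISO-8601 strings in one consistent format sort chronologically,
    # so BETWEEN works directly on the TEXT column
    return conn.execute(
        """SELECT m.measured_at, t.name, m.value
           FROM measurement m JOIN metric_type t USING (metric_type_id)
           WHERE m.vehicle_id = ? AND m.measured_at BETWEEN ? AND ?
           ORDER BY m.measured_at, t.name""",
        (vehicle_id, start, end),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO vehicle VALUES (1, 'truck-01')")
conn.executemany("INSERT INTO metric_type (name, unit) VALUES (?, ?)",
                 [("fuel_used", "L"), ("distance", "km")])
save(conn, 1, "fuel_used", "2024-03-01T08:15:00Z", 12.5)
save(conn, 1, "distance", "2024-03-01T08:15:00Z", 140.0)
save(conn, 1, "fuel_used", "2024-03-02T08:15:00Z", 9.0)
save(conn, 1, "fuel_used", "2024-03-02T08:15:00Z", 9.0)  # repeated import, ignored
```

In this shape the date is simply an attribute of the measurement, so no separate date entity is needed for the "between two dates" query.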


Orderbook matching engine

My question is more conceptual than coding-related, but I also accept code (that would be the ideal answer).
So I have a huge dataset of per-second order book snapshots (that is, for each second, I have the best 200 ask prices (and their volumes) and the best 200 bid prices (and their volumes)). This is real data: real orders that were submitted at some point in time. Each state is represented as a pandas DataFrame with the columns timestamp, side, price, volume. An example row is:
2023-02-14 00:01:01, 'ask', 19874.11, 0.3
But there are many ask and bid orders per state. My question is the following: for a state s_i, if I place a limit order with a specified price and volume, how would that change state s_(i+1) (this is just a simulation)? The same question applies to a market order with some volume.
Purpose:
I am trying to optimize order execution, and there is already existing literature on this subject. The idea is that, when I train my agent, each decision it makes should be reflected in the simulation, so that the next states are updated based on the actions the agent has taken.
Literature:
https://www.econstor.eu/bitstream/10419/216206/1/1696077540.pdf
You could deploy your own exchange and test there, provided you can implement the order-handling logic you need.
There is an open-source crypto exchange project, OpenCEX; here is a link to it:
https://github.com/Polygant/OpenCEX
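For the question of how an order placed at s_i changes s_(i+1), here is a hedged sketch of one common simplification, using only the Python standard library (Decimal avoids float rounding in volumes). It assumes each snapshot has already been converted from the DataFrame into sorted price levels, and it ignores queue position and any reaction of other traders; the function names are hypothetical. A market order walks the opposite side from the best price; a limit order stops at its limit and its unfilled remainder rests on its own side. carry_forward encodes one debatable assumption about s_(i+1): liquidity the agent consumed in s_i is still missing at the same price levels.

```python
from decimal import Decimal

# A snapshot: {"ask": [[price, volume], ...] ascending by price,
#              "bid": [[price, volume], ...] descending by price}

def _copy(book):
    return {side: [level[:] for level in levels] for side, levels in book.items()}

def _rest(levels, price, qty, descending):
    """Add resting volume at `price`, keeping the side sorted."""
    for i, (p, v) in enumerate(levels):
        if p == price:
            levels[i][1] = v + qty
            return
        if (p < price) if descending else (p > price):
            levels.insert(i, [price, qty])
            return
    levels.append([price, qty])

def execute(book, side, volume, limit_price=None):
    """Match a 'buy' or 'sell' order against snapshot s_i (limit_price=None
    means a market order). Returns (fills, remaining, new_book); `book` is
    left unchanged."""
    new_book = _copy(book)
    opposite = new_book["ask" if side == "buy" else "bid"]
    fills, remaining = [], volume
    while remaining > 0 and opposite:
        price, available = opposite[0]
        # A limit order stops at the first level worse than its limit
        if limit_price is not None and (
            (side == "buy" and price > limit_price)
            or (side == "sell" and price < limit_price)
        ):
            break
        qty = min(remaining, available)
        fills.append((price, qty))
        remaining -= qty
        if qty == available:
            opposite.pop(0)
        else:
            opposite[0][1] = available - qty
    # Unfilled limit volume rests on the order's own side of the book
    if limit_price is not None and remaining > 0:
        _rest(new_book["bid" if side == "buy" else "ask"], limit_price, remaining,
              descending=(side == "buy"))
    return fills, remaining, new_book

def carry_forward(next_book, side, fills, resting=None):
    """Assumption: liquidity consumed in s_i stays consumed in s_(i+1), and an
    unfilled limit remainder (price, qty) is still resting there."""
    new_book = _copy(next_book)
    key = "ask" if side == "buy" else "bid"
    for price, qty in fills:
        for level in new_book[key]:
            if level[0] == price:
                level[1] = max(Decimal(0), level[1] - qty)
    new_book[key] = [level for level in new_book[key] if level[1] > 0]
    if resting is not None:
        _rest(new_book["bid" if side == "buy" else "ask"], *resting,
              descending=(side == "buy"))
    return new_book
```

Whether consumed liquidity really stays gone in the next second is exactly the kind of market-impact assumption the literature debates, so it is worth keeping it isolated in one function like this.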

Object condition in multiple places/repeated code (DRY)

This is a fundamental application design question I’ve struggled with and flip-flopped on for years. We have a legacy web app that doesn't really have a solid ORM, if that tidbit might influence your answer. To abstract my question, let’s say we have a class Car and a corresponding table in our database named car. Car has a few properties: color, weight, year, maxspeed. These properties directly correspond to columns in the db table.
In our application, we define the car as “classic old” if year is < 1960 and color = black. And in many places within our app knowing whether the car is "classic old" is extremely important (maybe we’re running a very illogical insurance agency which gives steep discounts and other perks to cars which are “classic old”).
All over our application, we do things like:
--list all classic old cars
--give the current user a discount if their car is classic old
--list all classic old cars with max speed > 100 miles per hour
--email the current user if their car is classic old and weighs more than 1000 pounds
What is the best way to go about this? We have a legacy application that does this in some places:
getOldClassicCars()
select * from car where year < 1960 and color = 'black'
and in other places:
cararray = getAllCars();
for each car in cararray
    if car.year < 1960 and car.color = black
        oldcararray.add(car)
The point being that this very important, fundamental piece of our application – is the car classic old – is “hardcoded” as year < 1960 and color = black in many places. Sometimes in SQL, sometimes in application code, etc. Obviously that is not good, but as we’ve refactored things I’m not sure we’re refactoring things the best way we can.
Well, you are stuck with the fundamental problem that:
- you can't run your code on the database
- you want to be able to use the database's selection functionality on this criterion
- you want the calculation of "classic old" to be defined in a single place (preferably code)
Let's enumerate the solutions:
1: Put the calculation in a stored procedure (sproc) and always use the sproc to retrieve cars.
The problem here is that if you create a new car in code, its classic status is undefined, so you haven't really solved the 'not in two places' problem.
2: Get the DB to run your calculation via an assembly. For example, you can get MSSQL to run functions from a .NET assembly, which you can also use in your code base to perform the same calculation.
Problem: it's hard work. Plus, it's essentially still in two places: you have to keep the DB up to date and ensure that the table is accessed correctly.
3: Persist the calculated value in the DB, but perform the calculation in code.
Problem: if the calculation changes, the DB values will be incorrect and need updating.
3 seems to be the best option, as we will know when the calculation changes and be able to take some action to resolve the situation.
However, it might be best, given the fundamental nature of this calculation, to make that 'out of dateness' implicit in the way we structure the code.
Instead of simply persisting car.IsClassic we could add a CarStatusReport object with a datetime property. We then generate a CarStatusReport(2017) which evaluates all the cars at that point in time and saves that data in a separate table.
Our business logic is then no longer, "Is this car a classic?" but "What does the latest CarStatusReport say the status of this car is?"
Your business logic will then reside in a single CarStatusReportGenerator service, and any other logic accessing the IsClassic calculation will be forced to acknowledge the ephemeral nature of the stored info.
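The report idea can be sketched as follows, using Python's sqlite3 as a stand-in for the real database. The table names (car_status_report, car_status), the function generate_status_report and the query constant LATEST_CLASSICS are hypothetical, and taking the highest report id as "latest" is an assumption. The rule is evaluated only in code, while SQL queries join against the latest stored verdicts, so the database can still do the selecting.

```python
import sqlite3
from datetime import datetime, timezone

def is_classic_old(year, color):
    # The only place where "classic old" is defined
    return year < 1960 and color == "black"

SCHEMA = """
CREATE TABLE car (id INTEGER PRIMARY KEY, color TEXT, weight INTEGER,
                  year INTEGER, maxspeed INTEGER);
CREATE TABLE car_status_report (id INTEGER PRIMARY KEY, generated_at TEXT NOT NULL);
CREATE TABLE car_status (
    report_id  INTEGER NOT NULL REFERENCES car_status_report(id),
    car_id     INTEGER NOT NULL REFERENCES car(id),
    is_classic INTEGER NOT NULL,
    PRIMARY KEY (report_id, car_id)
);
"""

def generate_status_report(conn, generated_at=None):
    """Hypothetical CarStatusReportGenerator: evaluates every car in code and
    stores the verdicts, stamped with the report time, in their own table."""
    generated_at = generated_at or datetime.now(timezone.utc).isoformat()
    report_id = conn.execute(
        "INSERT INTO car_status_report (generated_at) VALUES (?)", (generated_at,)
    ).lastrowid
    cars = conn.execute("SELECT id, year, color FROM car").fetchall()
    conn.executemany(
        "INSERT INTO car_status (report_id, car_id, is_classic) VALUES (?, ?, ?)",
        [(report_id, car_id, int(is_classic_old(year, color))) for car_id, year, color in cars],
    )
    return report_id

# "Classic old according to the latest report", extendable with more conditions
LATEST_CLASSICS = """
SELECT c.* FROM car c
JOIN car_status s ON s.car_id = c.id
WHERE s.is_classic = 1
  AND s.report_id = (SELECT MAX(id) FROM car_status_report)
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO car VALUES (?, ?, ?, ?, ?)", [
    (1, "black", 1200, 1955, 90),
    (2, "black", 1500, 1958, 120),
    (3, "red", 1100, 1950, 110),
    (4, "black", 1300, 1975, 150),
])
generate_status_report(conn, "2017-01-01T00:00:00+00:00")
# "list all classic old cars with max speed > 100"
fast_classics = conn.execute(LATEST_CLASSICS + " AND c.maxspeed > 100").fetchall()
```

A car inserted after the last report simply has no verdict yet, which is the "out-of-dateness" the answer wants to make explicit.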
There is no optimal solution here, but a good start is to move all the business logic into one place. If you can't, hide the inconsistencies under the hood (for example, behind methods or functions that calculate the property, such as isOld()), so that users of the implementation never notice the DRY violation from the outside.
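One way to keep a single definition while still filtering in the database (a technique not taken from either answer) is to describe the rule as data and derive both the SQL fragment and the in-memory predicate from it. A minimal Python sketch; CLASSIC_OLD, classic_old_where and is_classic_old are hypothetical names, and only simple column/operator/value conditions are supported.

```python
import operator
import sqlite3

# Hypothetical single definition of the rule, as data: (column, operator, value)
CLASSIC_OLD = (("year", "<", 1960), ("color", "=", "black"))
_OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

def classic_old_where():
    """SQL fragment plus parameters derived from CLASSIC_OLD.
    Column names come from the constant above, never from user input."""
    clause = " AND ".join(f"{col} {op} ?" for col, op, _ in CLASSIC_OLD)
    return clause, [value for _, _, value in CLASSIC_OLD]

def is_classic_old(car):
    """The same rule applied to an in-memory car (a dict of column -> value)."""
    return all(_OPS[op](car[col], value) for col, op, value in CLASSIC_OLD)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (id INTEGER PRIMARY KEY, color TEXT, year INTEGER, maxspeed INTEGER)")
cars = [
    {"id": 1, "color": "black", "year": 1955, "maxspeed": 90},
    {"id": 2, "color": "black", "year": 1958, "maxspeed": 120},
    {"id": 3, "color": "red", "year": 1950, "maxspeed": 110},
]
conn.executemany("INSERT INTO car VALUES (:id, :color, :year, :maxspeed)", cars)

where, params = classic_old_where()
# "classic old cars with max speed > 100", filtered by the database...
fast_in_db = [row[0] for row in conn.execute(
    f"SELECT id FROM car WHERE {where} AND maxspeed > ? ORDER BY id", params + [100])]
# ...and the same question answered in application code
fast_in_code = [c["id"] for c in cars if is_classic_old(c) and c["maxspeed"] > 100]
```

Changing the year threshold in CLASSIC_OLD then changes both paths at once; the trade-off is that rules too complex for this small vocabulary would need a richer representation.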

Best data structure to store temperature readings over time

I'm used to working with SQL databases like MySQL, Postgres, and MSSQL.
Now I want to play with Redis. I'm working on a small home project that I think is a good choice for getting started with Redis.
I have a machine that reads temperature (indoor and outdoor) and humidity. I need to store the readings into Redis. Can you help me to understand the best data structure to do so?
Besides this data, I need to store the time of each reading (e.g. a Unix timestamp) so I can plot a graph.
I installed Redis and read the documentation, so I understand the commands and data types.
Since this is your first Redis project and it's a home project, I'd be careful about being too careful. Here are a couple of ways to consider designing it (NOTE: I only dug deep into Redis this past weekend, so hopefully others will weigh in).
IDEA 1:
Four sorted sets
KEYS for the sets are "indoor_temps", "outdoor_temps", "indoor_humidity", "outdoor_humidity"
VALUES (members) are the temperatures / humidities
SCORE is the date stored as a Unix epoch timestamp
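One caveat with IDEA 1 as written: sorted-set members are unique, so if the temperature is the member, two identical readings taken at different times collapse into one (ZADD of an existing member just updates its score). The Python sketch below models a single sorted set as a dict (member -> score) so it runs without a server; putting the timestamp into the member is an assumed workaround, not part of the original idea. With redis-py the real calls would be r.zadd("indoor_temps", {member: ts}) and r.zrangebyscore("indoor_temps", start, end).

```python
# Hypothetical in-memory model of one Redis sorted set: member -> score.
# Dict keys are unique exactly like sorted-set members, so adding an
# existing member only moves its score.
def zadd(zset, member, score):
    zset[member] = score

def zrangebyscore(zset, lo, hi):
    # Ascending by score (ties by member), inclusive bounds, like ZRANGEBYSCORE
    ordered = sorted(zset.items(), key=lambda kv: (kv[1], kv[0]))
    return [member for member, score in ordered if lo <= score <= hi]

readings = [(1403197981, "27.2"), (1403198041, "27.3"), (1403198101, "27.2")]

naive, safe = {}, {}
for ts, temp in readings:
    zadd(naive, temp, ts)            # value as member: the 2nd "27.2" replaces the 1st
    zadd(safe, f"{ts}:{temp}", ts)   # timestamp in the member keeps every reading
```

The naive set ends up with only two members, while the timestamped one keeps all three readings and still supports range queries by time through the score.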
IDEA 2:
Four types of keys (best shown by example)
datetime_key = /year:2014/month:07/day:12/hour:07/minute:32/second:54
type_keys = [indoor_temps, outdoor_temps, indoor_humidity, outdoor_humidity]
keys are of form type + "/" + datetime_key
values are the temperature and humidity readings themselves
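A small Python sketch of building IDEA 2's keys from an epoch timestamp. It assumes the keys are built in UTC, and since datetime_key as shown already begins with "/", this sketch appends it directly to the type instead of adding another "/" (which would give "//"). The names TYPE_KEYS and reading_key are hypothetical.

```python
from datetime import datetime, timezone

TYPE_KEYS = ("indoor_temps", "outdoor_temps", "indoor_humidity", "outdoor_humidity")

def datetime_key(epoch_seconds):
    # Assumes keys are built in UTC so they don't shift with daylight saving
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.strftime("/year:%Y/month:%m/day:%d/hour:%H/minute:%M/second:%S")

def reading_key(type_key, epoch_seconds):
    if type_key not in TYPE_KEYS:
        raise ValueError(f"unknown reading type: {type_key}")
    # datetime_key already starts with "/", so it is appended directly
    return type_key + datetime_key(epoch_seconds)
```

Each reading would then be stored with SET under its key, and EXPIRE works per key in this design. Fetching, say, one day means SCAN with a MATCH pattern such as indoor_temps/year:2014/month:07/day:12/*, because plain keys have no range query; that scan over the keyspace is the main trade-off against IDEA 1.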
You probably want to implement some initial design and then work with the data immediately - graph it, do stats, etc. Whatever you plan to do with it. That will expose flaws and if they are major, flush the database and try again. These designs should really only take ~1 hour to implement since the only thing you're really changing is a few Redis commands and some string manipulation to convert the data to keys.
I like Tony's suggestions, but I'll also throw out another possibility.
4 lists
keys are "indoor_temps", "outdoor_temps", "indoor_humidity", "outdoor_humidity"
values are of the form <timestamp>_<reading>, e.g. "1403197981_27.2"
Push items onto the front of the list using LPUSH. Get a set of readings using LRANGE. The list will always be ordered by the time of the reading. Obviously split the value on "_" to get your time and reading...
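The list approach can be sketched in Python, with a plain list standing in for one Redis list (index 0 is the head, where LPUSH adds). The helper names are hypothetical. Because the list is ordered newest first, a "since time T" query can stop at the first older entry instead of scanning the whole list.

```python
def encode(ts, reading):
    return f"{ts}_{reading}"

def decode(entry):
    ts, reading = entry.split("_", 1)
    return int(ts), float(reading)

def readings_since(entries, cutoff):
    """Entries are newest first (LPUSH order), so stop at the first older one."""
    out = []
    for entry in entries:
        ts, reading = decode(entry)
        if ts < cutoff:
            break
        out.append((ts, reading))
    return out

# Hypothetical in-memory stand-in for the "indoor_temps" list
indoor_temps = []
for ts, reading in [(1403197981, 27.2), (1403198041, 27.4), (1403198101, 27.1)]:
    indoor_temps.insert(0, encode(ts, reading))      # like LPUSH indoor_temps <entry>

latest_two = [decode(e) for e in indoor_temps[0:2]]  # like LRANGE indoor_temps 0 1
```

Note that LRANGE's end index is inclusive, so LRANGE 0 1 corresponds to the Python slice [0:2].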
In all honesty, this will give the same properties as Tony's first example, with slightly worse lookup performance but better memory usage. I'm guessing for this project you'll be neither memory- nor CPU-constrained, so the choice is probably not an issue. That said, if you expect to be saving hundreds of thousands of readings or more, I would suggest the list unless you want to consume a large portion of your system's memory.
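Also, it's a good idea to expire old data with some reasonable TTL that covers the length of time you want to keep the readings for. Note that EXPIRE applies to whole keys, not to individual list or sorted-set entries, so with the list design you would prune the lists (for example with LTRIM) rather than expire each reading. If your plan is to have them live in perpetuity, you may want to back them up to an on-disk DB over time and just use Redis as a quick lookup cache for recent readings.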
Thanks to everyone for the answers. I chose this structure:
4 lists: tempIN, tempOut, humidIN and humidOUT
values are: [value]:[timestamp]. For example: "25.4:1403615247"
As wallacer suggested, I want to back up old entries out of Redis.
For the main frontend I only need the last two days of samples.
For example, I could create a Redis RDB snapshot and "trim" the live lists, but this is not convenient if, in the future, I want to recover old values.
Do you have any tips on what kind of procedure to adopt to store the data? Maybe use an SQLite DB?
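One possible procedure, sketched in Python with the standard sqlite3 module: copy entries older than two days into an SQLite table, then trim only that tail from the Redis list. The function archive_and_trim and the table reading are hypothetical, and the Redis list is modeled as a Python list (newest first, entries "value:timestamp") so the sketch runs on its own.

```python
import sqlite3

TWO_DAYS = 2 * 24 * 3600

def archive_and_trim(live, conn, series, now, max_age=TWO_DAYS):
    """live: one Redis list modeled as a Python list, newest first, with
    entries formatted "value:timestamp". Old entries are copied to SQLite;
    the function returns what should remain in Redis."""
    cutoff = now - max_age
    n_keep = 0
    for entry in live:  # newest first, so the old entries form the tail
        if int(entry.split(":")[1]) < cutoff:
            break
        n_keep += 1
    old = live[n_keep:]
    conn.executemany(
        "INSERT OR IGNORE INTO reading (series, ts, value) VALUES (?, ?, ?)",
        [(series, int(ts), float(value)) for value, ts in (e.split(":") for e in old)],
    )
    conn.commit()
    # In Redis, after the commit: LTRIM <series> 0 -(len(old) + 1) drops exactly
    # the archived tail, even if new readings were LPUSHed in the meantime
    return live[:n_keep]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (series TEXT NOT NULL, ts INTEGER NOT NULL, "
             "value REAL NOT NULL, PRIMARY KEY (series, ts))")
tempIN = ["25.4:1403800000", "25.1:1403700000", "24.9:1403600000"]
tempIN = archive_and_trim(tempIN, conn, "tempIN", now=1403800000)
```

Trimming with a negative end index (counted from the tail) rather than keeping the first n_keep entries is a deliberate choice: new readings arrive at the head, so the archived entries are always the last len(old) elements. Running this from a daily job is one option; the schedule is up to you.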

Redis Sorted Set ... store data in "member"?

I am learning Redis and using an existing app (e.g. converting pieces of it) for practice.
I'm really struggling to understand first IF and then (if applicable) HOW to use Redis in one particular use-case ... apologies if this is super basic, but I'm so new that I'm not even sure if I'm asking correctly :/
Scenario:
Images are received by a server, and info like time_taken and resolution is saved in a database entry. Each image is then associated with (i.e. "belongs_to") one Event ... all very straightforward for an RDBMS.
I'd like to use Redis to maintain a list of the 50 most recently uploaded image objects for each Event, to be delivered to the client when requested. I'm thinking that a Sorted Set might be appropriate, but here are my concerns:
First, I'm not sure if a Sorted Set can/should be used in this associative manner? Can it reference other objects in Redis? Or is there just a better way to do this altogether?
Secondly, I need the ability to delete elements that are greater than X minutes old. I know about the EXPIRE command for keys, but I can't use this because not all images need to expire at the same periodicity, etc.
This second part seems more like a query on a field, which makes me think that Redis cannot be used ... but then I've read that I could maybe use the Sorted Set score to store a timestamp and find "older than X" in that way.
Can someone provide some clarity on these two issues? Thank you very much!
UPDATE
Knowing that the amount of data I need to store for each image is small and will be delivered to the client's browser, is there anything wrong with storing it in the member "field" of a sorted set?
For example Sorted Set => event:14:pictures <time_taken> "{id:3,url:/images/3.png,lat:22.8573}"
This saves the data I need and creates a rapidly-updatable list of the last X pictures for a given event with the ability to, if needed, identify pictures that are greater than X minutes old ...
First, I'm not sure if a Sorted Set can/should be used in this
associative manner? Can it reference other objects in Redis?
Why do you need to reference other objects? An event may have n image objects, each with a time_taken and image data, and a sorted set is perfect for this: the key identifies the event (e.g. event:14:pictures), the score is time_taken, and the member is the image data (including the image id) as JSON, XML, whatever. You're good to go there.
Secondly, I need the ability to delete elements that are greater than
X minutes old
If you want to delete elements greater than X minutes old, use ZREMRANGEBYSCORE:
ZREMRANGEBYSCORE event:14:pictures -inf (currentTime - X minutes)
-inf simply means "from the lowest score", so you don't need to know the oldest member's time. The upper bound, however, must be calculated from the current time before running the command (the line above is just an example).
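Both requirements from the question (keep only the 50 newest pictures per event, and drop pictures older than X minutes) can be sketched in Python against the redis-py method names zadd(name, mapping), zrange, zremrangebyrank and zremrangebyscore. So that it runs without a server, the sketch includes a hypothetical in-memory stand-in implementing just those calls; in practice you would pass redis.Redis(decode_responses=True) as r. The helpers add_picture and drop_older_than are also hypothetical, and trimming by rank on every insert is one assumed policy, not the only one.

```python
import json
import time

class InMemorySortedSets:
    """Hypothetical stand-in for the few redis-py sorted-set calls used below."""

    def __init__(self):
        self._sets = {}  # key -> {member: score}

    def _ordered(self, name):
        # Ascending by score, ties broken by member, as Redis orders them
        return sorted(self._sets.get(name, {}).items(), key=lambda kv: (kv[1], kv[0]))

    def _by_rank(self, name, start, end):
        items = self._ordered(name)
        n = len(items)
        # Negative ranks count from the end, as in Redis
        start = start + n if start < 0 else start
        end = end + n if end < 0 else end
        if end < 0 or start > end:
            return []
        return items[max(start, 0):end + 1]

    def zadd(self, name, mapping):
        self._sets.setdefault(name, {}).update(mapping)

    def zrange(self, name, start, end):
        return [member for member, _ in self._by_rank(name, start, end)]

    def zremrangebyrank(self, name, start, end):
        for member, _ in self._by_rank(name, start, end):
            del self._sets[name][member]

    def zremrangebyscore(self, name, min, max):
        lo, hi = float(min), float(max)  # accepts "-inf" / "+inf" like Redis
        for member, score in self._ordered(name):
            if lo <= score <= hi:
                del self._sets[name][member]

def add_picture(r, event_id, picture, time_taken, keep=50):
    key = f"event:{event_id}:pictures"
    # Member: the small JSON payload from the update; score: time_taken
    r.zadd(key, {json.dumps(picture, sort_keys=True): time_taken})
    # Keep only the `keep` newest: ranks 0 .. -(keep + 1) are the oldest extras
    r.zremrangebyrank(key, 0, -(keep + 1))

def drop_older_than(r, event_id, minutes, now=None):
    now = time.time() if now is None else now
    # Removes every picture whose time_taken is at or before the cutoff
    r.zremrangebyscore(f"event:{event_id}:pictures", "-inf", now - minutes * 60)

r = InMemorySortedSets()
base = 1_700_000_000
for i in range(60):
    add_picture(r, 14, {"id": i, "url": f"/images/{i}.png"}, time_taken=base + i * 60)
```

After the 60 inserts only pictures 10 to 59 remain, and a five-minute cutoff at the last picture's time would leave just the five newest.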

Tables with the same structure and similar data

This is a question about best practice really.
I am developing a system where I will be collecting some measurements (call them HR and RR) and calculating average values of these measurements. For the user interface we are only interested in these average values, but for in-depth data analysis later on we need all individual measurements (to export to MATLAB) as well as all the average calculations (don't ask: it's a user requirement; I would just save the individual measurements and calculate averages later if needed).
Here are the details about average calculations etc:
- HR: we get readings every 500-1500 ms (variable). We calculate the average based on 4-12 readings (depending on the time between readings).
- RR: we get readings every 3-17 s (variable). We calculate the average based on 2-3 readings (depending on the time between readings).
For both we save:
- The average value (decimal) together with the timestamp of the first reading used for the average calculation.
- Each individual reading (decimal) together with the timestamp of when it was taken.
As you can see the data is the same for average calculations and individual readings. The same with HR/RR - the data is the same and could be represented as:
+-----------+
| Reading   |
+-----------+
| Timestamp |
| Value     |
+-----------+
Since we compute data at different time intervals etc, we cannot store HR+RR as a single row in the database, we need separate rows or tables.
The questions are:
1. Is it better practice to create separate tables for HR and RR? Or is it better to store them in the same table as separate rows, with a column indicating whether a given row is HR or RR?
2. Is it better to create a separate table for the individual readings? Or is it better to create a self-referencing table, where each individual reading references the row in the same table holding the average calculation it was used in?
I am not that great with DB design and I am not sure what are the best practices used in that situation.
I was also considering using MongoDB (rather than an SQL database, probably MSSQL since the project is C#-based), which would probably make life easier since I could have an array of individual measurements embedded in a document with the average calculation etc. As far as I know, writes to Mongo are very fast...
Any pointers? Thanks.
As wishy-washy as it sounds, it depends. To your first question, one could very legitimately look at this as one table of readings or as two more specific tables. That said, years ago I would’ve said a single table, but over the years I have gravitated toward two tables. For one, your key values become more specific: (Reading) vs. (Reading + Type). Otherwise, you’ll find yourself adding “AND ReadType =…” in your sleep. It also leaves you more flexibility when someone decides one reading needs to be stored at a different precision, or should also store the color of shirt the technician was wearing.
On the second question, again, opinions will vary but I’d lean toward a parent table of reading sets and a detail of individual readings. The self-referencing table feels like it wins some style points but joining back to oneself can get tricky depending on the answers you’re trying to get. Also, your final DB platform choice may or may not include some of the specialized options like MSSQL’s CTEs that address some of these complexities.
Overall, you could probably have:
ReadingSet (ReadingSetID [, other info as needed])
ReadingR (ReadingRID,ReadingSetID, Value, TimeStamp)
ReadingH (ReadingHID, ReadingSetID, Value, TimeStamp)
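A minimal sketch of that layout, using Python's sqlite3 as a stand-in for MSSQL (REAL here where MSSQL would likely use DECIMAL). The AverageValue and TimeStamp columns on ReadingSet are assumptions filling the "other info as needed" slot, and save_set is a hypothetical helper that stores one averaging window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ReadingSet (
    ReadingSetID INTEGER PRIMARY KEY,
    AverageValue REAL NOT NULL,  -- assumed "other info": the stored average
    TimeStamp    TEXT NOT NULL   -- assumed: timestamp of the set's first reading
);
CREATE TABLE ReadingH (
    ReadingHID   INTEGER PRIMARY KEY,
    ReadingSetID INTEGER NOT NULL REFERENCES ReadingSet(ReadingSetID),
    Value        REAL NOT NULL,
    TimeStamp    TEXT NOT NULL
);
CREATE TABLE ReadingR (
    ReadingRID   INTEGER PRIMARY KEY,
    ReadingSetID INTEGER NOT NULL REFERENCES ReadingSet(ReadingSetID),
    Value        REAL NOT NULL,
    TimeStamp    TEXT NOT NULL
);
""")

def save_set(conn, table, readings):
    """Store one averaging window: the individual (timestamp, value) readings
    in ReadingH or ReadingR, and their average in a new ReadingSet row."""
    if table not in ("ReadingH", "ReadingR"):
        raise ValueError(table)
    first_ts = min(ts for ts, _ in readings)
    average = sum(value for _, value in readings) / len(readings)
    set_id = conn.execute(
        "INSERT INTO ReadingSet (AverageValue, TimeStamp) VALUES (?, ?)", (average, first_ts)
    ).lastrowid
    conn.executemany(
        f"INSERT INTO {table} (ReadingSetID, Value, TimeStamp) VALUES (?, ?, ?)",
        [(set_id, value, ts) for ts, value in readings],
    )
    return set_id

hr_set = save_set(conn, "ReadingH", [
    ("2024-01-01T10:00:00.500", 72.0),
    ("2024-01-01T10:00:01.200", 74.0),
    ("2024-01-01T10:00:02.100", 73.0),
    ("2024-01-01T10:00:03.000", 75.0),
])

# Export for MATLAB: each HR reading next to the average it contributed to
export = conn.execute("""
    SELECT s.ReadingSetID, s.AverageValue, h.TimeStamp, h.Value
    FROM ReadingSet s JOIN ReadingH h ON h.ReadingSetID = s.ReadingSetID
    ORDER BY h.TimeStamp
""").fetchall()
```

The join in the export query is a plain parent/child join, which is the simplicity the answer prefers over a self-referencing table.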