Populating a table with data from file based on condition. Correct approach? - sql

I've briefly encountered SQL in the past, but this is the first time I'm trying anything this advanced. The overall goal of the database is to provide step-by-step instructions for assembling a physical part in a factory. The catch, however, is that different variants of different parts can be assembled.
I have a basic data architecture (described below), but I'm not too sure how robust it is. The general idea is that a part is detected and identified via RFID and the required instructions are 'loaded' into the instruction table.
I have a table named 'Assembly', which contains the partID, type and variant. This has a one-to-many relationship with a table called 'Instruction', which contains fields such as scheduledStart, actualStart, status and ID. 'Assembly' also has a 1-1 relationship with a WorkStation table (parts can move between work stations for different stages of assembly), which contains a 'filePathToWorkInstructions' field that links to a file with the work instructions for each part and variant for that station.
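A minimal sketch of that layout as DDL (the column types, and some of the names, are assumptions based on the description above, not a prescribed schema):

    CREATE TABLE WorkStation (
        workStationID              INT PRIMARY KEY,
        filePathToWorkInstructions VARCHAR(255) NOT NULL  -- instructions file for this station
    );

    CREATE TABLE Assembly (
        partID        INT PRIMARY KEY,                    -- identified via RFID
        type          VARCHAR(50) NOT NULL,
        variant       VARCHAR(50) NOT NULL,
        workStationID INT REFERENCES WorkStation (workStationID)
    );

    CREATE TABLE Instruction (                            -- many Instructions per Assembly
        instructionID  INT PRIMARY KEY,
        partID         INT NOT NULL REFERENCES Assembly (partID),
        scheduledStart TIMESTAMP,
        actualStart    TIMESTAMP,
        status         VARCHAR(20)
    );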
I'm not expecting a full step-by-step answer, nor do I really want one (I quite like the challenge). I'd just quite like to know if I'm heading in the correct direction or if there is a better way to approach the problem.

Related

Tool to extract a DB row and ALL its referenced "objects" (recursively)?

For our project it has become increasingly complicated to reproduce certain error conditions that show up in productive use. Extracting and recreating those conditions sometimes takes hours of re-entering data, mainly because the required "graph" of rows can be huge, there are many referential constraints that must be fulfilled, and recreating all of this in an analysis DB (production cannot be used for this, for obvious reasons) in the correct order is often extremely complicated and tedious.
What would ease such analyses enormously would be some tool that - given a specific table and row-id as starting point - would traverse the entire graph as defined by a table's references (foreign keys) and emit all referenced entries recursively.
Ideally it would emit all these rows (table name, column names and their values) as sql-insert statements such that one could execute these as inserts scripts to load a relevant subset into another DB for analysis.
Does such a tool exist? I would imagine this is not such a rare and exotic wish or requirement. Or is this wishful thinking and am I in for a longer programming exercise?
The DB we are using is Oracle (v12) - in case that matters.
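Not a full tool, but a possible starting point: Oracle's data dictionary exposes the foreign-key graph, so a script can discover, for a given table, which parent tables (and join columns) its rows reference, and then recurse from there. A rough sketch for one level (the table name is a placeholder):

    -- For one child table, list the parent tables and join columns
    -- reachable through its foreign keys; a script would recurse from here.
    SELECT c.table_name    AS child_table,
           cc.column_name  AS child_column,
           rc.table_name   AS parent_table,
           rcc.column_name AS parent_column
    FROM   all_constraints  c
    JOIN   all_cons_columns cc  ON cc.owner = c.owner
                               AND cc.constraint_name = c.constraint_name
    JOIN   all_constraints  rc  ON rc.owner = c.r_owner
                               AND rc.constraint_name = c.r_constraint_name
    JOIN   all_cons_columns rcc ON rcc.owner = rc.owner
                               AND rcc.constraint_name = rc.constraint_name
                               AND rcc.position = cc.position
    WHERE  c.constraint_type = 'R'
    AND    c.table_name = 'YOUR_START_TABLE';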
Hope I could make myself clear and convey the intention.
Years ago I did this and it was very easy because I had object-relational mapping to Java objects. I could just pick up the "parent" object from the source DB (which would recursively traverse all the relationships and pick up all the children), open a connection to the target DB and save the fully instantiated parent object tree. I don't know of any other tools. Folks in my company try to keep a pre-prod DB periodically refreshed from prod.
Taking hours to manually reproduce the problem conditions in the data seems like a long time, but so would hand building a custom solution.
If you are having data problems, it is likely a bug in the code, so if you can get developers to "write the test that fails" based on the conditions, you'll be better off.

How should I (if I should at all) implement Generic DB Tables without falling into the Inner-platform effect?

I have a db model like this:
tb_Computer (N - N) tb_Computer_Peripheral (N - 1) tb_Peripheral
Each computer has N peripherals. But each peripheral is different in nature and will have different fields. A keyboard will have model, language, etc., and a network card has specifications about speed and such.
But I don't think it's viable to create as many tables as there are peripherals. Because one day someone will come up with a very specific peripheral and I don't want him to be unable to add it just because it is not a keyboard neither a network card.
Is it a bad practice to create a field data inside tb_Peripheral which contains JSON data about a specific peripheral?
I could even create a tb_PeripheralType with specific information about which data a specific type of peripheral has.
I read about this in many places and found everywhere that this is a bad practice, but I can't think of any other way to implement this the way I want, completely dynamic.
What is the best way to achieve what I want? Is the current model wrong? What would you do?
It's not a question of "good practices" or "bad practices". Making things completely dynamic has an upside and a downside. You have outlined the upside fairly well.
The downside of a completely dynamic design is that the process of turning the data into useful information is not nearly as routine as it is with a database that pins down the semantics of the data within the scope of the design.
Can you build a report and a report generating process that will adapt itself to the new structure of the data when you begin to add data about a new kind of peripheral? If you end up stuck with doing maintenance on the application when requirements change, what have you gained by making the database design completely dynamic?
PS: If the changes to the database design consist only of adding new tables, the "ripple effect" on your existing applications will be negligible.
I can think of four options.
The first is to create a table peripherals that would have all the information you could want about peripherals. This would have NULLs in the columns where the field is not appropriate to the type. When a new peripheral is added, you would have to add the descriptive columns.
The second is to create a separate table for each peripheral.
The third is to encode the information in something like JSON.
The fourth is to store the data as pairs. So each peripheral would have many different rows.
There are also hybrids for these approaches. For instance, you could store common fields in a single table (ala (1)) and then have key value pairs for other values.
The question is how this information is going to be used. I do most of my work directly in SQL, so the worst option for me is (3). I don't want to parse strange information formats to get something potentially useful to a SQL query.
Option (4) is the most flexible, but it also requires more work to get a complete picture of all the possible attributes.
If I were starting from scratch, and I had a pretty good idea of what fields I wanted, then I would start with (1), a single table for peripherals. If I had requirements where peripherals and attributes would be changing fairly regularly, then I would seriously consider (4). If the tables are only being used by applications, then I might consider (3), but I would probably reject it anyway.
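To make the hybrid mentioned above concrete, here is a rough sketch (the names are invented for illustration): common fields in tb_Peripheral, anything type-specific as key/value rows.

    CREATE TABLE tb_Peripheral (
        peripheral_id   INT PRIMARY KEY,
        peripheral_type VARCHAR(50) NOT NULL,      -- 'keyboard', 'network card', ...
        model           VARCHAR(100),              -- common fields live here
        manufacturer    VARCHAR(100)
    );

    CREATE TABLE tb_Peripheral_Attribute (
        peripheral_id   INT NOT NULL REFERENCES tb_Peripheral (peripheral_id),
        attribute_name  VARCHAR(50) NOT NULL,      -- e.g. 'language', 'speed_mbps'
        attribute_value VARCHAR(255),
        PRIMARY KEY (peripheral_id, attribute_name)
    );

    -- "All keyboards and their language", option (4) style:
    SELECT p.peripheral_id, p.model, a.attribute_value AS language
    FROM   tb_Peripheral p
    JOIN   tb_Peripheral_Attribute a
           ON a.peripheral_id = p.peripheral_id AND a.attribute_name = 'language'
    WHERE  p.peripheral_type = 'keyboard';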
There is only one question to answer when you do this sort of design; JSON, a serialised object, XML, or heaven forbid a CSV, doesn't really matter.
Do you want to consume them outside of the API that knows the structure?
For instance, suppose you want to use SQL to get all peripherals of type keyboard with a number-of-keys property >= 102.
If you do, it gets messy, much messier than extra tables.
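For example, that keyboard query ends up looking something like this when the details sit in a JSON column (assuming a DBMS with a JSON_VALUE function, such as SQL Server 2016+, and an invented tb_Peripheral.data column; the path expressions and casts are what get messy):

    -- peripherals of type keyboard with a number_of_keys property >= 102,
    -- where the per-type details are stored as JSON text in tb_Peripheral.data
    SELECT peripheral_id, data
    FROM   tb_Peripheral
    WHERE  peripheral_type = 'keyboard'
    AND    CAST(JSON_VALUE(data, '$.number_of_keys') AS INT) >= 102;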
It's no different from, say, having a table of PDFs or docs and trying to find all the ones which have more than 10 pages.
It gets even funnier if you want to version the content as your application evolves.
Have a look at a NoSQL back end; it's designed for stuff like this, and a relational database is not.

Tree Structure With History in SQL Server

I'm not sure if it's a duplicate question, but I couldn't find anything on this subject.
What's the best way to store a tree structure in a table, with the ability to store the history of changes to that structure?
Thank you for any help.
UPDATE:
I have an Employees table and I need to store the structure of branches, departments, sections, sub-sections, etc. in a table.
I need to store historical information on employees' branches, departments and so on, so that I can retrieve an employee's branch, department, section and sub-section even if the structure has since changed.
UPDATE 2:
One option is to save the whole structure in a history table on every change to the structure, but is that the best approach?
UPDATE 3:
There's also an Orders table. I must store the employee's position, branch, department, section, sub-section and more on every order. That's the main reason for storing history, and it will be used very often. In other words, I should be able to show the DB data for any past day.
UPDATE 4:
Maybe using hierarchyid is an option?
What if a node is renamed? What should I do if I need the old name on old orders?
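For the structural part, a minimal SQL Server sketch with hierarchyid (names are illustrative; the rename/history question is separate and is what the answers below address):

    CREATE TABLE OrgNode (
        NodeId   hierarchyid   PRIMARY KEY,
        NodeName nvarchar(100) NOT NULL
    );

    -- all descendants of a given branch node ('Sales' is a placeholder)
    DECLARE @branch hierarchyid = (SELECT NodeId FROM OrgNode WHERE NodeName = N'Sales');
    SELECT NodeId.ToString() AS NodePath, NodeName
    FROM   OrgNode
    WHERE  NodeId.IsDescendantOf(@branch) = 1;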
I think you are looking for something like this. It provides a complete tree structure. This is used for a directory, but it can be used for anything: divisions, departments, sections, etc.
The History is separate, and it is best if you get your head around the Node structure before contemplating the history. For the history of any table, all that is required is the addition of a DateTime or TimeStamp column to the PK. The history stores before-images of the current rows.
Functions such as (a) resolving the path of the tree and (b) finding the relevant history rows that were current at some point in time, are performed using pure SQL. With MS you can use recursive CTEs for (a), or write a simpler form in a stored proc.
For (b), I would implement a derived version of the table (Node, Department, whatever) that returns the rows that were current at the relevant point in time; then use that for the historic version of (a).
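A sketch of (a) and (b) in SQL Server syntax, assuming a Node table (NodeId, ParentId, Name) and a NodeHistory table keyed on (NodeId, AuditedDtm); the names are placeholders, not the actual model:

    -- (a) resolve the full path of every node in the current tree
    WITH NodePath AS (
        SELECT NodeId, ParentId, Name,
               CAST(Name AS varchar(1000)) AS Path
        FROM   Node
        WHERE  ParentId IS NULL
        UNION ALL
        SELECT n.NodeId, n.ParentId, n.Name,
               CAST(p.Path + '/' + n.Name AS varchar(1000))
        FROM   Node n
        JOIN   NodePath p ON p.NodeId = n.ParentId
    )
    SELECT NodeId, Path FROM NodePath;

    -- (b) the version of one node that was current at a point in time
    -- (assuming each history row is stamped with the datetime it became current)
    DECLARE @NodeId int = 42, @AsOf datetime = '2015-06-01';
    SELECT TOP (1) *
    FROM   NodeHistory
    WHERE  NodeId = @NodeId
    AND    AuditedDtm <= @AsOf
    ORDER  BY AuditedDtm DESC;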
It is not necessary to copy and save the entire tree structure every time it changes.
Feel free to ask questions if you need any clarification.
Data Model
▶Tree Structure with History Data Model◀
Readers who are unfamiliar with the Relational Modelling Standard may find ▶IDEF1X Notation◀ useful.
Depending on how the historical information will be used will determine whether you need a temporal solution or simply an auditing solution. In a temporal solution, you would store a date range over which a given node applies in its given position and the "current" hierarchy is derived by using the current date and time to query for active nodes in order to report on the hierarchy. To say that this is a complicated solution is an understatement.
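A sketch of that temporal idea (table and column names are illustrative only): each placement of a node carries the date range over which it applied, and the hierarchy for any date is derived by filtering on that range.

    CREATE TABLE OrgUnitNode (
        NodeId    INT          NOT NULL,
        ParentId  INT          NULL,
        Name      VARCHAR(100) NOT NULL,
        ValidFrom DATETIME     NOT NULL,
        ValidTo   DATETIME     NULL,          -- NULL = still current
        PRIMARY KEY (NodeId, ValidFrom)
    );

    -- the hierarchy as it stood on a given date
    DECLARE @AsOf datetime = '2015-06-01';
    SELECT NodeId, ParentId, Name
    FROM   OrgUnitNode
    WHERE  ValidFrom <= @AsOf
    AND   (ValidTo IS NULL OR ValidTo > @AsOf);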
Given that we are talking about an employee hierarchy, my bet is that an auditing solution will suffice. In an auditing solution, you have a table (or tables) for the current hierarchy and store changes somewhere else that is accessible. It should be noted that the "somewhere else" doesn't necessarily need to be data. If changes to the company hierarchy are infrequent, then you could even use a seriously low-tech solution of creating a report (or series of reports) of the company hierarchy and storing those in PDF format. When someone wanted to know what the hierarchy looked like last May, they could go find the corresponding PDF printout.
However, if it is desired to have the audit trail be queryable, then you could consider something like SQL Server 2008's Change Tracking feature or a third-party solution which does something similar.
Remember that there is more to the question of "best" than the structure itself. There is a cost-benefit analysis of how much effort is required vs. the functionality it provides. If stakeholders need to query for the hierarchy at any moment in time and it fluctuates frequently (wouldn't want to work there :)) then a temporal solution may be best but will be more costly to implement. If the hierarchy changes infrequently and granular auditing is not necessary, then I'd be inclined to simply store PDFs of the hierarchy on a periodic basis. If there is a real need to query for changes or granular auditing is needed, then I'd look at an auditing tool.
Change Tracking
Doing a quick search, here's a couple of third-party solutions for auditing:
ApexSQL
Lumigent Technologies
There are several approaches; however, with your limited information I would say to use a single table with a parent relationship, and for the history you can simply implement auditing of the table.
UPDATE: Based on your new information I would not use a single-table data store. It looks like your hierarchical structure is much more complex than a simple tree, and it also looks like you have some well-defined nodes of the structure, especially at the top, before getting into the sections and sub-sections; so multi-table relations might be a better fit.
As far as the audit tables go, there are plenty of resources to find out what will work best for you; there are per-row and per-column audits, etc.
One thing to note is that you don't have to historize the tree structure. It only ever grows; it never gets smaller.
What changes is the USE of the nodes. Their data may change.
For example, a site:
/a/b/c
will be there forever. a may be a folder, b too, c a file. Later a may still be a folder, b a tombstone (no data here at the moment), as is c. But the tree in itself never changes.
Then add a version number, and for every node a history / list of uses (node types) with a start and possibly (can be null) end version.
The code for showing a version X can then easily build out the "real" tree for that moment.
To support moves of nodes, have a "rehook" type that indicates the node was moved to another item at this version.
Voila.
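A sketch of that scheme (names are illustrative): one table for the tree, which only ever grows, and one for the versioned use of each node.

    CREATE TABLE TreeNode (
        NodeId   INT PRIMARY KEY,
        ParentId INT NULL REFERENCES TreeNode (NodeId)  -- structure only; rows are never removed
    );

    CREATE TABLE NodeUse (
        NodeId       INT NOT NULL REFERENCES TreeNode (NodeId),
        StartVersion INT NOT NULL,
        EndVersion   INT NULL,                          -- NULL = still in use
        NodeType     VARCHAR(20) NOT NULL,              -- 'folder', 'file', 'tombstone', 'rehook', ...
        Name         VARCHAR(100),
        PRIMARY KEY (NodeId, StartVersion)
    );

    -- build the "real" tree as of version @X
    DECLARE @X int = 7;
    SELECT t.NodeId, t.ParentId, u.NodeType, u.Name
    FROM   TreeNode t
    JOIN   NodeUse  u ON u.NodeId = t.NodeId
    WHERE  u.StartVersion <= @X
    AND   (u.EndVersion IS NULL OR u.EndVersion > @X);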

How do I structure my database so that two tables that constitute the same "element" link to another?

I read up on database structuring and normalization and decided to remodel the database behind my learning thingie to reduce redundancy.
I have different types of entries that can be learned. Gap texts/cloze tests (one text, many gaps) and simple known-unknown (one question, one answer) types.
Now I'm in a bit of a pickle:
gaps need exactly the same columns in the user table as question-answer types
but they need fewer columns than question-answer types (all that info is in the clozetests table)
I'm wishing for a "magic" foreign key that can point to both the gaps and the terms table. Of course their IDs would overlap, though. I don't like having both a term_id and a gap_id in user_terms; that seems inelegant (but it is the most elegant I can come up with after googling for a while, not knowing what name this pickle goes by).
I don't want a user_gaps analogue to user_terms, because then I'd be in the same pickle when it comes to the table user_terms_answers.
I put up this cardboard cutout collage of my schema. I didn't remove the stuff that isn't relevant for this question, but I can do that if anyone's confusion can be remedied like that. I think it looks super tidy already. Tidier than my mental concept of this at least.
Did I say any help would be greatly appreciated? Answerers might find themselves adulated for their wisdom.
Background story if you care, it's not really relevant to the question.
Before remodeling I had them all in one table (because I added the gap texts in a hurry), so that the gap texts were "normal" items without answers, while the gaps were items without questions. The application linked them together.
Edit
I added an answer after SO coughed up some helpful posts. I'm not yet 100% satisfied: as I try to write views for common queries against this setup, now and again I feel like I'll have to pull in application logic for something that is database turf.
As mentioned in the comment, it is hard to answer without knowing the whole story. So, here is a story and a model to match. See if you can adapt this to your example.
School of (foreign) languages offers exams for several levels of language proficiency. The school maintains many pre-made tests for each level of each language (LangLevelTestNo).
Each test contains several (many) questions. Each question can be simple or of the cloze-text type. Correct answers are stored for each simple question. Correct terms are stored for each gap of each cloze-text question.
A student can take an exam for a language level and is presented with one of the pre-made tests. For each student exam, an exam form is maintained which stores the student's answers for each question of the exam. Like a question, an answer may be of a simple or of a cloze-text type.
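One possible set of tables for that story (a sketch with invented names, just to make the shape visible):

    CREATE TABLE Test (
        test_id           INT PRIMARY KEY,
        language          VARCHAR(30) NOT NULL,
        proficiency_level VARCHAR(10) NOT NULL,
        test_no           INT NOT NULL               -- the LangLevelTestNo
    );

    CREATE TABLE Question (
        question_id    INT PRIMARY KEY,
        test_id        INT NOT NULL REFERENCES Test (test_id),
        question_type  CHAR(1) NOT NULL,             -- 'S' = simple, 'C' = cloze text
        question_text  VARCHAR(1000) NOT NULL,
        correct_answer VARCHAR(255) NULL             -- only for simple questions
    );

    CREATE TABLE Gap (                               -- one row per gap of a cloze question
        gap_id       INT PRIMARY KEY,
        question_id  INT NOT NULL REFERENCES Question (question_id),
        position     INT NOT NULL,
        correct_term VARCHAR(100) NOT NULL
    );

    CREATE TABLE ExamAnswer (                        -- the student's exam form
        answer_id   INT PRIMARY KEY,
        exam_id     INT NOT NULL,
        question_id INT NOT NULL REFERENCES Question (question_id),
        gap_id      INT NULL REFERENCES Gap (gap_id),   -- set only for cloze answers
        answer_text VARCHAR(255)
    );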
After editing my question, Stack Overflow started relating the right questions to me.
I knew this was a common problem, but I really couldn't find it, just couldn't come up with the right search terms, I guess.
The following threads address similar problems and I'll try to apply that logic to my own design. They all propose adding a higher-level description for (in my case terms and gaps) like items. That makes sense and reflects the logic behind my application.
Relation Database Design
Foreign Key on multiple columns in one of several tables
Foreign Key refering to primary key across multiple tables
And this good person illustrates how to retrieve the data once it's broken up across tables. He also clues me in to the keyword class table inheritance, so now I know what to google.
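For reference, a minimal class-table-inheritance sketch applied to this case (the names are guesses, for illustration only): one supertype row per learnable item, with term and gap as subtype tables, so the user progress table needs only a single foreign key.

    CREATE TABLE learnable_item (
        item_id   INT PRIMARY KEY,
        item_type CHAR(1) NOT NULL           -- 'T' = term, 'G' = gap
    );

    CREATE TABLE term (
        item_id  INT PRIMARY KEY REFERENCES learnable_item (item_id),
        question VARCHAR(255) NOT NULL,
        answer   VARCHAR(255) NOT NULL
    );

    CREATE TABLE gap (
        item_id      INT PRIMARY KEY REFERENCES learnable_item (item_id),
        clozetest_id INT NOT NULL,           -- FK to the clozetests table
        position     INT NOT NULL,
        answer       VARCHAR(255) NOT NULL
    );

    -- user progress now needs only one "magic" foreign key
    CREATE TABLE user_items (
        user_id INT NOT NULL,
        item_id INT NOT NULL REFERENCES learnable_item (item_id),
        PRIMARY KEY (user_id, item_id)
    );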
I'll post back with my edited schema once I've applied this. It does seem more elegant like this.
Edited schema

How to document a database [closed]

(Note: I realize this is close to How do you document your database structure? , but I don't think it's identical.)
I've started work at a place with a database with literally hundreds of tables and views, all with cryptic names with very few vowels, and no documentation. They also don't allow gratuitous changes to the database schema, nor can I touch any database except the test one on my own machine (which gets blown away and recreated regularly), so I can't add comments that would help anybody.
I tried using "Toad" to create an ER diagram, but after leaving it running for 48 hours straight it still hadn't produced anything visible, and I needed my computer back. I was talking to some other recent hires and we all suggested that whenever we've puzzled out what a particular table or some of its columns mean, we should write it up in the developers' wiki.
So what's a good way to do this? Just list tables/views and their columns and fill them in as we go? The basic tools I've got to hand are Toad, Oracle's "SQL Developer", MS Office, and Visio.
In my experience, ER (or UML) diagrams aren't the most useful artifact - with a large number of tables, diagrams (especially reverse engineered ones) are often a big convoluted mess that nobody learns anything from.
For my money, some good human-readable documentation (perhaps supplemented with diagrams of smaller portions of the system) will give you the most mileage. This will include, for each table:
Descriptions of what the table means and how it's functionally used (in the UI, etc.)
Descriptions of what each attribute means, if it isn't obvious
Explanations of the relationships (foreign keys) from this table to others, and vice-versa
Explanations of additional constraints and / or triggers
Additional explanation of major views & procs that touch the table, if they're not well documented already
With all of the above, don't document for the sake of documenting - documentation that restates the obvious just gets in people's way. Instead, focus on the stuff that confused you at first, and spend a few minutes writing really clear, concise explanations. That'll help you think it through, and it'll massively help other developers who run into these tables for the first time.
As others have mentioned, there are a wide variety of tools to help you manage this, like Enterprise Architect, Red Gate SQL Doc, and the built-in tools from various vendors. But while tool support is helpful (and even critical, in bigger databases), doing the hard work of understanding and explaining the conceptual model of the database is the real win. From that perspective, you can even do it in a text file (though doing it in Wiki form would allow several people to collaborate on adding to that documentation incrementally - so, every time someone figures out something, they can add it to the growing body of documentation instantly).
One thing to consider is the COMMENT facility built into the DBMS. If you put comments on all of the tables and all of the columns in the DBMS itself, then your documentation will be inside the database system.
Using the COMMENT facility does not make any changes to the schema itself, it only adds data to the USER_TAB_COMMENTS catalog table.
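For example, in Oracle (emp and dept_cd are placeholder names; other DBMSs have similar facilities):

    COMMENT ON TABLE  emp         IS 'Current employees, one row per person';
    COMMENT ON COLUMN emp.dept_cd IS 'Department code, FK to dept.dept_cd';

    -- read the documentation back out of the data dictionary
    SELECT table_name, comments
    FROM   user_tab_comments
    WHERE  comments IS NOT NULL;

    SELECT table_name, column_name, comments
    FROM   user_col_comments
    WHERE  table_name = 'EMP';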
In our team we came to a useful approach for documenting large legacy Oracle and SQL Server databases. We use Dataedo for documenting database schema elements (a data dictionary) and creating ERD diagrams. Dataedo comes with a documentation repository so your whole team can work on the documentation and read the most recent version online. And you don't need to interfere with the database (Oracle comments or SQL Server MS_Description).
First you import the schema (all tables, views, stored procedures and functions – with triggers, foreign keys etc.). Then you define logical domains/modules and group all objects into them (drag & drop) so you can analyze and work on smaller chunks of the database. For each module you create an ERD diagram and write a top-level description. Then, as you discover the meaning of tables and views, you write a short description for each. Do the same for each column. Dataedo lets you add a meaningful title for each object and column – useful if object names are vague or invalid. The Pro version lets you describe foreign keys, unique keys/constraints and triggers – which is useful but not essential to understanding a database.
You can access documentation through UI or you can export it to PDF or interactive HTML (the latter is available only in Pro version).
What is described here is a continuous process rather than a one-time job. If your database changes (e.g. new columns or views), you should sync your documentation on a regular basis (a couple of clicks with Dataedo).
See sample documentation:
http://dataedo.com/download/Dataedo%20repository.pdf
Some guidelines on documentation process:
Diagrams:
Keep your diagrams small and readable – just include important tables, relations and columns, only the ones that matter for understanding the big picture – primary/business keys, important attributes and relations,
Use a different color for key tables in a diagram,
You can have more than one diagram per module,
You can add a diagram to the description of the most important tables / those with the most relations.
Descriptions:
Don't document the obvious – don't write a description like "Document date" for a document.date column. If there's nothing meaningful to add, just leave it blank,
If objects stored in a table have types or statuses, it's good to list them in the general description of the table,
Define the format that is expected, e.g. "mm/dd/yy" for a date stored in a text field,
List all known/important values and their meaning, e.g. for a status column it could be something like: "Document status: A – Active, C – Cancelled, D – Deleted",
If there's an API to a table – a view that should be used to read data, and functions/procedures to insert/update data – list it in the description of the table,
Describe where rows' and columns' values come from (procedure, form, interface etc.),
Use a "[deprecated]" mark (or similar) for columns that should not be used (the title field is useful for this; explain which field should be used instead in the description field).
We use Enterprise Architect for our DB definitions. We include stored procedures, triggers, and all table definitions defined in UML. The three brilliant features of the program are:
Import UML Diagrams from an ODBC Connection.
Generate SQL Scripts (DDL) for the entire DB at once
Generate Custom Templated Documentation of your DB.
You can edit your class/table definitions within the UML tool, and generate a fully descriptive document with pictures included. The autogenerated document can be in multiple formats, including MS Word. We have just under 100 tables in our schema, and it's quite manageable.
I've never been more impressed with any other tool in my 10+ years as a developer. EA supports Oracle, MySQL, SQL Server (multiple versions), PostGreSQL, Interbase, DB2, and Access in one fell swoop. Any time I've had problems, their forums have answered my problems promptly. Highly recommended!!
When DB changes come in, we make them in EA, generate the SQL, and check it into our version control (svn). We use Hudson for building, and it auto-builds the database from scripts when it sees you've modified the checked-in SQL.
(Mostly stolen from another answer of mine)
This answer extends Kieveli's above, which I upvoted. If your version of EA supports Object Role Modeling (conceptual design, vs. logical design = ERD), reverse engineer to that and then fill out the model with the expressive richness it gives you.
The cheap and lighter-weight option is to download Visiomodeler for free from MS, and do the same with that.
The ORM (call it ORMDB) is the only tool I've ever found that supports and encourages database design conversations with non-IS stakeholders about BL objects and relationships.
Reality check - on the way to generating your DDL, it passes through a full-stop ERD phase where you can satisfy your questions about whether it does anything screwy. It doesn't. It will probably show you weaknesses in the ERD you designed yourself.
ORMDB is a classic case of the principle that the more conceptual the tool, the smaller the market. Girls just want to have fun, and programmers just want to code.
A wiki solution supports hyperlinks and collaborative editing, but a wiki is only as good as the people who keep it organized and up to date. You need someone to take ownership of the document project, regardless of what tool you use. That person may involve other knowledgeable people to fill in the details, but one person should be responsible for organizing the information.
If you can't use a tool to generate an ERD by reverse engineering, you'll have to design one by hand using TOAD or VISIO.
Any ERD with hundreds of objects is probably useless as a guide for developers, because it'll be unreadable with so many boxes and lines. In a database with so many objects, it's likely that there are "sub-systems" of a few dozen tables and views each. So you should make custom diagrams of these sub-systems, instead of expecting a tool to do it for you.
You can also design a pseudo-ERD, where groups of tables are represented by a single object in one diagram, and that group is expanded in another diagram.
A single ERD or set of ERD's are not sufficient to document a system of this complexity, any more than a class diagram would be adequate to document an OO system. You'll have to write a document, using the ERD's as illustrations. You need text descriptions of the meaning and use of each table, each column, and the relationships between tables (especially where such relationships are implicit instead of represented by referential integrity constraints).
All of this is a lot of work, but it will be worth it. If there's a clear and up-to-date place where the schema is documented, the whole team will benefit from it.
Since you have the luxury of working with fellow developers that are in the same boat, I would suggest asking them what they feel would convey the needed information most easily. My company has over 100 tables, and my boss gave me an ERD for a specific set of tables that all connect. So you might also want to try breaking one massive ERD into a bunch of smaller, manageable ERDs.
Well, a picture tells a thousand words so I would recommend creating ER diagrams where you can view the relationship between tables at a glance, something that is hard to do with a text-only description.
You don't have to do the whole database in one diagram, break it up into sections. We use Visual Paradigm at work but EA is a good alternative as is ERWIN, and no doubt there are lots of others that are just as good.
If you have the patience, then using html to document the tables and columns makes your documentation easier to access.
If describing your databases to your end users is your primary goal Ooluk Data Dictionary Manager can prove useful. It is a web-based multi-user software that allows you to attach descriptions to tables and columns and allows full text searches on those descriptions. It also allows you to logically group tables using labels and browse tables using those labels. Tables as well as columns can be tagged to find similar data items across your database/databases.
The software allows you to import metadata information such as table name, column name, column data type and foreign keys into its internal repository using an API. Support for JDBC data sources comes built-in and can be extended further, as the API source is distributed under ASL 2.0. It is coded to read the COMMENTS/REMARKS from many RDBMSs. You can always manually override the imported information. The information you can store about tables and columns can be extended using custom fields.
The Data Dictionary Manager uses the "data object" and "attribute" terminology instead of table and column because it isn't designed specifically for relational databases.
Notes
If describing technical aspects of your database such as triggers, indexes and statistics is important, this software isn't the best option. It is, however, possible to combine a technical solution with this software using hyperlink custom fields.
The software doesn't produce an ERD.
Disclosure: I work at the company that develops this product.