I have a SQL Server database with an Access front end where I need users to be able to allocate resources to multiple projects at once. For example, we have a stock of a particular part that is used in a number of mechanical assemblies and we need to allocate these parts to the particular assemblies for production.
How do people normally represent data with these requirements?
Currently my data is stored as follows:
Resource | A | B | C (etc.)
---------+-------+-------+------
a | 10 | 20 | NULL
b | 11 | NULL | 31
c | 12 | NULL | NULL
d | NULL | 40 | NULL
Where A, B, C are different projects.
Advantages:
Easy visualisation and updating of Resources across all projects
Disadvantages:
Database structure changes every time a project is added or finished...
...therefore many queries need to be rewritten or made dynamic
Difficult to get aggregate resource allocation summaries
If old projects are retained, the table could easily exceed the column count limit
Alternatives
It seems to me that a more 'standard' representation would be a table as below. However, I have found it more or less impossible to present this to the user in a way that will allow easy visualisation and resource allocation over multiple projects.
ID | Project | Resource | Quantity
-----+---------+----------+----------
1 | A | a | 10
2 | A | b | 11
3 | A | c | 12
4 | B | a | 20
5 | B | d | 40
6 | C | b | 31
Advantages:
No structural changes when adding/removing projects
Easy resource summaries
Easy archiving of old projects
Disadvantages:
Views that recreate the interface of the top example using JOINs will only allow editing of one column at a time and will not allow insertion or deletion by updating from/to NULL:
e.g.
-- 'Resources' table has resource ID as primary key (& other info about resource),
-- 'ProjectResources' is the 'standard' table above
SELECT Resources.ID, ProjA.Quantity AS A, ProjB.Quantity AS B, ProjC.Quantity AS C
FROM Resources
LEFT JOIN (SELECT ProjectResources.Quantity, ProjectResources.Resource
FROM ProjectResources
WHERE ProjectResources.Project = 'A') AS ProjA
ON Resources.ID = ProjA.Resource
LEFT JOIN (SELECT ProjectResources.Quantity, ProjectResources.Resource
FROM ProjectResources
WHERE ProjectResources.Project = 'B') AS ProjB
ON Resources.ID = ProjB.Resource
LEFT JOIN (SELECT ProjectResources.Quantity, ProjectResources.Resource
FROM ProjectResources
WHERE ProjectResources.Project = 'C') AS ProjC
ON Resources.ID = ProjC.Resource
Using an INSTEAD OF UPDATE trigger on the above view can make it fully editable, but using this view in any further queries to add information (e.g. the 'stock' of the resource that we're allocating) makes those extra fields read-only (with an error along the lines of 'field cannot be updated because it participates in a JOIN and has an INSTEAD OF UPDATE trigger').
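For illustration, a minimal sketch of such a trigger, assuming the SELECT above is saved as a view named ProjectGrid (the view name is an assumption, and only column A is handled; B and C would follow the same pattern, and a MERGE could additionally turn NULL-to-value updates into inserts and value-to-NULL updates into deletes):

-- Redirect updates against the view to the underlying table
CREATE TRIGGER trg_ProjectGrid_Update ON ProjectGrid
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Write changes in column A back to ProjectResources
    UPDATE pr
    SET pr.Quantity = i.A
    FROM ProjectResources pr
    INNER JOIN inserted i ON pr.Resource = i.ID
    WHERE pr.Project = 'A';
END;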
Both of the above options require that the front end can cope with a varying number of table columns, which is a bit awkward, though in the second case this requirement can be limited only to specific circumstances.
Are there any other options on how to represent this data and allow easy editing that I have missed?
Your current database schema is not desirable for the reasons you listed. You are probably better off making the schema sensible and then dealing with the view in your UI rather than producing a bad schema to suit a particular view (if you want other views too, you'll be back in the same bucket with a hard-to-use schema for the new view).
While it's possible to produce the view as a crosstab query in Access, you will not be able to edit it (same as you found for your query; you are essentially building a crosstab query manually there).
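For reference, such a crosstab in Access SQL would look something like this (a sketch using the ProjectResources table from the question):

TRANSFORM Sum(Quantity)
SELECT Resource
FROM ProjectResources
GROUP BY Resource
PIVOT Project;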
A potential solution is to generate a temporary table in the front-end with the desired projects as columns (perhaps a subset of all projects), populate it with the resource data from the real table and display it as a datasheet. It will be editable and you can write changes back to the real table. It requires coding but it resolves the disadvantages.
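As a sketch of that approach on the SQL Server side (the temp table name and the fixed project list are placeholders; in practice the front end would splice in whichever projects the user selects):

-- Materialise the grid for the chosen projects into a scratch table
SELECT Resource, [A], [B], [C]
INTO #ProjectGrid
FROM (SELECT Resource, Project, Quantity
      FROM ProjectResources) AS src
PIVOT (SUM(Quantity) FOR Project IN ([A], [B], [C])) AS p;

After editing, each changed cell can then be written back to ProjectResources with ordinary UPDATE/INSERT/DELETE statements keyed on (Resource, Project).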
Related
Much ink has been spilled on the topic of sum types in SQL. The standard solutions are called absorption, separation, and partition; see, e.g.: https://www.inf.unibz.it/~montali/teaching/1415/dpm/slides/4.relational-mapping.pdf .
I want to ask about how to encode open sums. Normal sums allow a field to be one of a fixed set of several different types; with open sums, this set is not fixed.
The basic setup in our program: There is a list of "triggers," where each trigger can be one of many different things. Plugins can be written defining new trigger types, although the set of trigger types can be assumed to be known at compile time.
We want a table of all triggers.
Our current best idea:
1. Dynamically create a materialized view of the following form:
id | id_in_plugin_table | thing_in_main_program_it_refs | plugin_name
---------------------------------------------------------------------
1 | 27 | 8 | RegexTrigger
2 | 27 | 12 | RidiculouslyUnsafeCustomJSTrigger
This relation is automatically generated from the various plugin tables, each of which has its own ID and a thing_in_main_program_it_refs field.
For illustration, here's what the referenced tables may look like.
RegexTrigger table:
id | thing_in_main_program_it_refs | regex
---------------------------------------------------------------------
27 | 8 | hel*o
RidiculouslyUnsafeCustomJSTrigger table:
id | thing_in_main_program_it_refs | custom_js
---------------------------------------------------------------------
27 | 12 | (x) => isPrime(x.length())
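The view above could plausibly be generated as a UNION ALL over the plugin tables (a sketch in PostgreSQL syntax, using the illustrative names above; the surrogate id is just a row number here):

CREATE MATERIALIZED VIEW all_triggers AS
SELECT ROW_NUMBER() OVER (ORDER BY plugin_name, id_in_plugin_table) AS id,
       t.*
FROM (SELECT id AS id_in_plugin_table,
             thing_in_main_program_it_refs,
             'RegexTrigger' AS plugin_name
      FROM RegexTrigger
      UNION ALL
      SELECT id,
             thing_in_main_program_it_refs,
             'RidiculouslyUnsafeCustomJSTrigger'
      FROM RidiculouslyUnsafeCustomJSTrigger) AS t;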
2. Either use two round trips (look up the plugin table, then query it), or combine them into a single SQL program which uses EXEC.
I'm happy with part 1, but not with part 2. Neither option sounds efficient, and the latter option uses EXEC.
So, we're looking for either (a) a better way to dynamically select a table in a query, or (b) a different approach to open sums.
Background
I need to compare two tables in two different datacenters to make sure they're the same. The tables can run to hundreds of millions of rows, even a billion.
An example of this is having a production data pipeline and a development data pipeline. I need to verify that the tables at the end of each pipeline are the same, however, they're located in different datacenters.
The tables are the same if all the values and datatypes for each row and column match. There are primary keys for each table.
Here's an example input and output:
Input
table1:
Name | Age |
Alice| 25.0|
Bob | 49 |
Jim | 45 |
Cal | 52 |
table2:
Name | Age |
Bob | 49 |
Cal | 42 |
Alice| 25 |
Output:
table1 missing rows (empty):
Name | Age |
| |
table2 missing rows:
Name | Age |
Jim | 45 |
mismatching rows:
Name | Age | table |
Alice| 25.0| table1|
Alice| 25 | table2|
Cal | 52 | table1|
Cal | 42 | table2|
Note: The output doesn't need to be exactly like the above format, but it does need to contain the same information.
Question
Is it faster to import these tables into a new, common SQL environment, then use SQL to produce my desired output?
OR
Is it faster to use something like JDBC, retrieve all rows for each table, sort each table, then compare them line by line to produce my desired output?
Edits:
The above solutions would be executed at a datacenter that's hosting one of the tables. In the first solution, the only purpose for creating a new database would be to compare these tables using SQL; there are no other uses.
You should definitely start with the database option. Especially if the databases are connected with a database link, you can easily set up the transfer of the data.
Such a comparison usually leads to a full outer join of the two sources, and experience tells us that DIY joins are notoriously less performant than the native database implementation (which can, for example, deploy a parallel option).
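For the example tables above, that comparison could look like this (a sketch that assumes Name is the primary key; nullable columns would need NULL-safe comparisons):

SELECT t1.Name AS t1_Name, t1.Age AS t1_Age,
       t2.Name AS t2_Name, t2.Age AS t2_Age
FROM table1 t1
FULL OUTER JOIN table2 t2
    ON t1.Name = t2.Name
WHERE t1.Name IS NULL      -- row missing from table1
   OR t2.Name IS NULL      -- row missing from table2
   OR t1.Age <> t2.Age;    -- same key, different values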
Alternatively, you could implement a more sophisticated algorithm that performs the comparison without transferring the whole table.
One example is based on Merkle trees: first scan both sources in their own locations to identify which parts are identical (and can be ignored), then transfer and compare only the parts that differ.
So if you expect the tables to be nearly identical, and they have keys that allow some hierarchical partitioning, such an approach could work out better than a brute-force full compare.
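A rough sketch of that bucketing idea in SQL Server syntax (CHECKSUM and CHECKSUM_AGG are used purely for illustration; a real implementation would want a stronger hash such as HASHBYTES, and the bucket count of 1024 is arbitrary):

-- One hash per bucket of rows, computed locally on each side;
-- only buckets whose hashes differ need a detailed row-level compare.
SELECT ABS(CHECKSUM(Name) % 1024)        AS bucket,
       CHECKSUM_AGG(CHECKSUM(Name, Age)) AS bucket_hash
FROM table1
GROUP BY ABS(CHECKSUM(Name) % 1024);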
The faster solution is to load both tables into variables (memory) in your programming language and then compare them with your favourite algorithm.
Copying them first into a new table more than doubles the read/write operations to disk, especially the writes.
I have the following model, where the relation tables create a many to many relationship, as usual.
Table car (id_car, car things)
    1 ---- * relation car_worker (id_car, id_worker) * ---- 1
Table worker (id_worker, worker things)
    1 ---- * relation worker_building (id_worker, id_building) * ---- 1
Table building (id_building, building things)
When I load this model in Power BI, it can build a table visualization (and others) containing one of the following:
Option 1:
Car things
Worker things
Option 2:
Worker things
Building things
But it totally fails when I try to put fields from the two ends of the model into the same table visualization:
Car things
Building things
At this point Power BI raises an error (the screenshot of the message is not reproduced here).
What's going on here? Why the error?
Basically I need to see which cars visit which buildings (and do one or two summarizations)
From what I can understand of your model, you need to set your cross-filter direction to Both rather than Single, so that filters can propagate back up through the tables.
Here is an example which may help you understand.
Row 1 from tableA can relate to rows 1 or 2 from tableB; since row 1 from tableB can in turn relate to rows 1 or 2 from tableC, Power BI cannot determine, for row 1 of tableA, whether row 1 or row 2 of tableC should be referred to:
*----------------------------*
|Table A Table B Table C |
*----------------------------*
| 1 1 1,2 |
| 1 2 1,2 |
*----------------------------*
You can create a new table with the formula
tableName = CROSSJOIN(tableA,tableB,tableC....tableZ)
I had to create a new table as if I were doing an inner join in SQL (which works just as expected).
Click "modeling", "create table".
For the formula:
NewTable = NATURALINNERJOIN(tb_cars,
NATURALINNERJOIN(tb_rel_car_worker,
NATURALINNERJOIN(tb_workers,
NATURALINNERJOIN(tb_rel_worker_building, tb_buildings))))
Then I start using everything from this table instead of the others.
In order to create metrics that include zero counts grouped by one of the edge tables, I had to leave this table separate:
Partial Table = NATURALINNERJOIN(tb_rel_car_worker,
                NATURALINNERJOIN(tb_workers,
                NATURALINNERJOIN(tb_rel_worker_building, tb_buildings)))
Then in the relationship editor, I added a relation between Partial Table[id_car] and tb_cars[id_car].
Now the metrics grouped by car can use the technique of adding +0 to the DAX formulas, so that charts also show cars whose metric sums to zero.
I'm creating a simple directory listing page where you can specify what kind of thing you want to list in the directory e.g. a person or a company.
Each user has an UserTypeID and there is a dbo.UserType lookup table. The dbo.UserType lookup table is like this:
UserTypeID | UserTypeParentID | Name
1 NULL Person
2 NULL Company
3 2 IT
4 3 Accounting Software
In the dbo.Users table we have records like this:
UserID | UserTypeID | Name
1 1 Jenny Smith
2 1 Malcolm Brown
3 2 Wall Mart
4 3 Microsoft
5 4 Sage
My SQL (so far) is very simple (excuse the pseudo-code style):
DECLARE @UserTypeID int

SELECT
    *
FROM
    dbo.Users u
    INNER JOIN dbo.UserType ut ON ut.UserTypeID = u.UserTypeID
WHERE
    ut.UserTypeID = @UserTypeID
The problem here is that when people want to search for companies they will enter '2' as the UserTypeID. But neither Microsoft nor Sage will show up, because their UserTypeIDs are 3 and 4 respectively; it's the final UserTypeParentID in the chain which tells me that they're both companies.
How could I rewrite the SQL to return records where the UserTypeID = @UserTypeID, or where the final UserTypeParentID in the chain equals @UserTypeID? Or am I going about this the wrong way?
Schema Change
I would suggest breaking this schema down a little more, to make your queries and your life simpler. With the current schema you will end up writing a recursive query every time you want even the simplest data from your Users table, and trust me, you don't want to do that to yourself.
I would break the schema down into the following tables:
dbo.Users
UserID | UserName
1 | Jenny
2 | Microsoft
3 | Sage
dbo.UserTypes_Type
TypeID | TypeName
1 | Person
2 | IT
3      | Company
4 | Accounting Software
dbo.UserTypes
UserID | TypeID
1 | 1
2 | 2
2 | 3
3 | 2
3 | 3
3 | 4
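With this layout, finding all companies no longer needs recursion; for example:

SELECT u.UserID, u.UserName
FROM dbo.Users u
INNER JOIN dbo.UserTypes ut ON ut.UserID = u.UserID
INNER JOIN dbo.UserTypes_Type t ON t.TypeID = ut.TypeID
WHERE t.TypeName = 'Company';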
You say that you are "creating" this - excellent because you have the opportunity to reconsider your whole approach.
Dealing with hierarchical data in a relational database is problematic because it is not designed for it - the model you choose to represent it will have a huge impact on the performance and ease of construction of your queries.
You have opted for an Adjacency List model, which is great for inserts (and deletes) but a bugger for selects, because the query has to effectively reconstruct the hierarchy path. By the way, an Adjacency List is the model almost everyone goes for on their first attempt.
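To make that concrete, answering the original question against the existing adjacency list takes a recursive CTE along these lines (a sketch using the tables and columns from the question):

DECLARE @UserTypeID int = 2;  -- 'Company'

WITH TypeTree AS (
    -- Anchor: the requested type itself
    SELECT UserTypeID
    FROM dbo.UserType
    WHERE UserTypeID = @UserTypeID
    UNION ALL
    -- Recurse: every type whose parent is already in the tree
    SELECT ut.UserTypeID
    FROM dbo.UserType ut
    INNER JOIN TypeTree tt ON ut.UserTypeParentID = tt.UserTypeID
)
SELECT u.*
FROM dbo.Users u
INNER JOIN TypeTree tt ON u.UserTypeID = tt.UserTypeID;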
Everything is a trade-off, so you should decide which queries will be most common: selects (and updates) or inserts (and deletes). See this question for starters. Also, since SQL Server 2008 there is a native hierarchyid datatype (see this), which may be of assistance.
Of course, you could store your data in an XML file (in SQL Server or not) which is designed for hierarchical data.
I'm going to start work on a medium-sized application, and I'm planning its DB design.
One thing that I'm not sure about is this.
I will have many tables which will need internationalization, such as membership_options, gender_options, language_options, etc.
Each of these tables will share common i18n fields, like:
"title, alternative_title, short_description, description"
In your opinion which is the best way to do it?
Have an i18n table with the same fields for each of the tables that will need them?
or do something like:
Membership table Gender table
---------------- --------------
id | created_at id | created_at
1 - 22.03.2001 1 - 14.08.2002
2 - 22.03.2001 2 - 14.08.2002
General translation table
-------------------------
record_id | table_name | string_name | alternative_title | .... | id_language
1         | membership | regular     | null              |      | 1 (English)
1         | membership | normale     | null              |      | 2 (Italian)
1         | gender     | man         | null              |      | 1 (English)
1         | gender     | uomo        | null              |      | 2 (Italian)
This would avoid me repeating something like:
membership_translation table
-----------------------------
membership_id | name | alternative_title | id_lang
1 regular null 1
1 normale null 2
gender_translation table
-----------------------------
gender_id | name | alternative_title | id_lang
1 man null 1
1 uomo null 2
and so on, so I would probably reduce the number of DB tables, but I'm not sure about the performance. I'm not much of a DB designer, so please let me know.
The most common way I've seen this done is with two tables, membership and membership_ml, with one storing the base values and the ml table storing the localized strings. This is similar to your second option. Most of the systems I see like this are made that way because they weren't designed with internationalization in mind from the get go, so the extra _ml tables were "tacked on" later.
What I think is a better option is similar to your first option, but a little bit different. You would have a central table for storing all the translations, but instead of putting the table name and field name in there, you would use tokens and a central "Content" table to store all the translations. That way you can enforce some kind of RI between the tokens in the base table and the translations in the Content table if you want as well.
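As a minimal sketch of that token idea (all table and column names here are illustrative, not from any particular system):

-- One row per translatable string ("token")
CREATE TABLE Content (
    ContentID int PRIMARY KEY
);

-- All translations live in one place, keyed by token and language
CREATE TABLE ContentTranslation (
    ContentID   int NOT NULL REFERENCES Content (ContentID),
    LanguageID  int NOT NULL,
    Title       nvarchar(255),
    Description nvarchar(max),
    PRIMARY KEY (ContentID, LanguageID)
);

-- Base tables reference tokens instead of storing strings directly,
-- so ordinary foreign keys give you the referential integrity
CREATE TABLE Membership (
    MembershipID   int PRIMARY KEY,
    TitleContentID int NOT NULL REFERENCES Content (ContentID)
);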
I actually asked a question about this very thing a while back, so you can have a look at that for some more info (rather than repasting the schema examples here).
I also think the best solution is to keep translations in a separate table. This is the approach used by Open Cart, which is open source, so you can take a look at the way it deals with the problem. Another source of information is http://www.gsdesign.ro/blog/multilanguage-database-design-approach/, especially the comments section.