Dimension Lookup-Update create null data on dimension table [closed] - pentaho

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
The Dimension Lookup-Update step creates a row of null data in the dimension table. Does anyone know how to avoid that?
Thanks.

Don't.
Keep it.
It avoids a lot of future referential integrity problems by providing a default row for cases where the value of the natural key is not found.
It does no harm, because you can always query with technical key > 0 if you need a list of existing cases.
Now you are warned. If you fancy lots of NullPointerException and foreign key integrity violations, nothing prevents you from removing the row with technical key = 0 from the dimension table with a Delete step.
And an additional recommendation: do not alter the slowly changing dimension table to manually add an auto-increment on the technical key. Let PDI do its job as programmed.
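To make the role of the technical key = 0 row concrete, here is a minimal sketch in plain Python with SQLite. It is not PDI code: the table dim_customer, its columns, and the helper lookup_tk are hypothetical names chosen for illustration. It only mimics the idea that a natural key which is not found falls back to the default row, and that technical key > 0 filters that row out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_tk INTEGER PRIMARY KEY,  -- technical (surrogate) key
        customer_id TEXT,                 -- natural key, NULL on the default row
        name        TEXT
    );
    -- Row 0 plays the role of the default "unknown" row (assumed layout).
    INSERT INTO dim_customer VALUES (0, NULL, NULL);
    INSERT INTO dim_customer VALUES (1, 'C-100', 'Alice');
    INSERT INTO dim_customer VALUES (2, 'C-200', 'Bob');
""")


def lookup_tk(natural_key):
    """Return the technical key for a natural key, or 0 when it is not found."""
    row = conn.execute(
        "SELECT customer_tk FROM dim_customer WHERE customer_id = ?",
        (natural_key,),
    ).fetchone()
    return row[0] if row else 0


known_tk = lookup_tk("C-100")    # matches an existing dimension row
missing_tk = lookup_tk("C-999")  # no match, so it falls back to row 0

# Listing only the real cases: skip the default row with technical key > 0.
existing = conn.execute(
    "SELECT customer_tk, customer_id FROM dim_customer "
    "WHERE customer_tk > 0 ORDER BY customer_tk"
).fetchall()
```

A fact row loaded with missing_tk still points at a valid dimension row, which is the referential integrity benefit described above.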

Related

SQL column naming best practice, should I use abbreviation? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I want to know which is, in your opinion, the best practice for naming SQL columns.
Example: Let's say that I have two columns named referenceTransactionId and source.
Another way to write these is refTxnId and src.
Which one is the better way, and why? Is it because of memory usage or because of readability?
Although this is a matter of opinion, I am going to answer anyway. If you are going to design a new database, write out all the names completely. Why?
The name is unambiguous. You'll notice that sites such as Wikipedia spell out complete names, as do standards such as time zones ("America/New_York").
Using a standard like the complete name means that users don't have to "think" about what the column might be called.
Nowadays, people type much faster than they used to. For those that don't, type ahead and menus provide assistance.
Primary keys and foreign keys, to the extent possible, should have the same name. So, I suspect that referenceTransactionId should simply be transactionId if it is referencing the Transactions table.
This comes from the "friction" of using multiple databases and having to figure out what a column name is.
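As a small illustration of the point about giving primary and foreign keys the same name, here is a sketch using Python's sqlite3 module. The transactions and refunds tables are hypothetical; only the column names transactionId and source come from the question. When both sides share the name, the join can be written with USING and nobody has to remember two spellings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        transactionId INTEGER PRIMARY KEY,
        source        TEXT NOT NULL
    );
    -- Hypothetical child table: the foreign key reuses the primary key's name.
    CREATE TABLE refunds (
        refundId      INTEGER PRIMARY KEY,
        transactionId INTEGER NOT NULL REFERENCES transactions (transactionId),
        amount        REAL NOT NULL
    );
    INSERT INTO transactions VALUES (1, 'web'), (2, 'mobile');
    INSERT INTO refunds VALUES (10, 2, 5.0);
""")

# The shared, fully spelled-out name makes the join condition obvious.
rows = conn.execute(
    "SELECT refundId, source FROM refunds JOIN transactions USING (transactionId)"
).fetchall()
```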

Is it good to use id of object from a third-party server as my PK at SQL database? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am writing an app that analyzes Instagram and Twitter posts (the posts are stored in separate tables), and I load comments and likes too. Is it good to use their ids as my primary keys, or is it better to create my own ids that are not related to the third-party ids?
Create your own ids in your database. In general you want these properties to be true about your primary keys:
Unique. This one the database management system will enforce for you.
Unrelated to the data they identify. This means that you shouldn't be able to calculate the primary key to any row based on the info in the row. For example, first name+last name would be a bad primary key for a People table, and credit card number would be a bad primary key for BillingInfo table.
By using the id generated by a third party service as your PK, you are unnecessarily coupling your database with their service.
Instead, there is a common pattern of using an altId column to store an extra id. You could even name the column better by calling it twitterId or something similar.
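A minimal sketch of that pattern, assuming SQLite through Python's sqlite3 module. The table names and the helper upsert_post are hypothetical; twitterId is the column name suggested above. The third-party id is stored and kept unique, but other tables reference our own id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE twitter_posts (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- our own key
        twitterId TEXT NOT NULL UNIQUE,               -- id assigned by the third party
        body      TEXT
    );
    CREATE TABLE twitter_comments (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        post_id INTEGER NOT NULL REFERENCES twitter_posts (id),  -- points at our key
        body    TEXT
    );
""")


def upsert_post(twitter_id, body):
    """Insert the post if it is new, then return our own primary key for it."""
    conn.execute(
        "INSERT OR IGNORE INTO twitter_posts (twitterId, body) VALUES (?, ?)",
        (twitter_id, body),
    )
    return conn.execute(
        "SELECT id FROM twitter_posts WHERE twitterId = ?", (twitter_id,)
    ).fetchone()[0]


first = upsert_post("1234567890", "hello")
again = upsert_post("1234567890", "hello")  # same external id, same local row
other = upsert_post("9876543210", "world")
post_count = conn.execute("SELECT COUNT(*) FROM twitter_posts").fetchone()[0]
```

If the third party ever changes its id format, only the twitterId column is affected; the foreign keys in twitter_comments stay as they are.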
Apart from uniqueness and minimality, three sensible criteria for deciding on keys are stability, simplicity and familiarity.
Above all, there is the business requirement: the need to represent the external reality with some acceptable degree of accuracy. If your database is intended to represent accurately things sourced from some given domain then you will need an identifier also sourced from that domain.

Unique constraint on 42 Columns? Why should or shouldn't I consider doing this? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Theoretical:
I've been tasked with building a database to contain guidelines for manufacturing purposes. The guidelines to be returned are based on 42 input values and are all specific to the particular combination of inputs.
I plan on indexing all of these columns and realize that it will be resource intensive if I have to rebuild or re-index.
What design approaches have I not considered? What potential problems exist with the approach of creating a unique constraint on 42 columns? Does anyone have any experience with this sort of a design or any insights?
Thanks for any help!
A good reason for not doing it is that SQL Server doesn't support it:
Up to 32 columns can be combined into a single composite index key.
(documentation here).
It seems unlikely that you really need a single composite index with 42 columns. But, you can't have one even if that were desirable.
Put indexes only on the columns that will be searched or sorted.
Add a simple auto-increment index.
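The two suggestions above could look roughly like this. It is a sketch in Python with SQLite rather than SQL Server, only four of the 42 inputs are shown, and the names guidelines, in_01 to in_04 and idx_guidelines_search are placeholders. It also assumes that only in_01 and in_02 are ever searched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE guidelines (
        guideline_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- simple auto-increment key
        in_01 TEXT, in_02 TEXT, in_03 TEXT, in_04 TEXT,  -- stand-ins for the 42 inputs
        guideline TEXT NOT NULL
    );
    -- Index only the columns assumed to be searched or sorted.
    CREATE INDEX idx_guidelines_search ON guidelines (in_01, in_02);
""")
conn.executemany(
    "INSERT INTO guidelines (in_01, in_02, in_03, in_04, guideline) "
    "VALUES (?, ?, ?, ?, ?)",
    [("a", "x", "p", "q", "Use jig A"), ("b", "y", "p", "q", "Use jig B")],
)

# Look up a guideline through the indexed columns.
match = conn.execute(
    "SELECT guideline_id, guideline FROM guidelines WHERE in_01 = ? AND in_02 = ?",
    ("b", "y"),
).fetchall()
```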

Is column order in a table relevant for version control? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
A version control system compares the scripted definition of a table to the checked-in state. So I guess many version control systems will see a column reordering of a table as a change.
Since T-SQL does not support adding a new column in the middle of a table, and because the ordering should not matter in a relational DB, what are good practices for version control of table definitions if the column order could change?
Sometimes you might need to undo the drop of a column in the middle of a table.
You should be storing scripts to setup your database in source control, not trying to have something reverse-engineer those scripts from the state of the database. Column-order then becomes a non-issue.
Specifically, I've seen two schemes that work well. In the first, each database schema update script is given a sequential number, and the database tracks which sequence number is the last applied. In the second, each database schema update script is given a UUID, and the database tracks all UUIDs that have been applied.
I would check out the book Refactoring Databases for more details and examples of how to manage database changes.
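The first scheme (sequentially numbered scripts, with the database tracking the last one applied) can be sketched as follows. This is an illustration in Python with SQLite, not a full migration tool: the MIGRATIONS dictionary, the schema_version table and the migrate function are all hypothetical names, and in practice the scripts would be files kept in source control.

```python
import sqlite3

# Hypothetical update scripts, keyed by their sequence number.
MIGRATIONS = {
    1: "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
    2: "ALTER TABLE customer ADD COLUMN email TEXT",
}


def migrate(conn):
    """Apply every script newer than the last recorded one, in order."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    last = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version"
    ).fetchone()[0]
    applied = []
    for number in sorted(MIGRATIONS):
        if number > last:
            conn.execute(MIGRATIONS[number])
            # Record the sequence number so the script is never run twice.
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (number,)
            )
            applied.append(number)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies scripts 1 and 2
second_run = migrate(conn)  # nothing left to apply
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

Because the table is always built by replaying the same scripts in the same order, every environment ends up with the same column order, and the order never has to be diffed.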

Performance tradeoffs with junction table as opposed to a foreign key array? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
How does one know when to use an array of foreign keys vs. a junction table (that is, which will achieve better performance in which scenarios)? For example, if I know I only have the 12 months as my foreign keys, I would assume that the array would perform better. Does anyone know how to justify when one would perform better than the other? Are there things to look for?
For instance, what if a many-to-many relationship exists where each row typically has only 5 foreign keys? What about 10? How about 1000? When do you make the call to stop using a junction table and use an array?
I am not looking for opinions. I am looking for ways to measure, or otherwise decide, which is the better choice for performance.
If you have a many-to-many relationship, you should always use a junction table. If you have a one-to-many relationship, you should have parent and child tables. What you should virtually never have is a table with fields like Phone1, Phone2, Phone3.
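For reference, a junction table for the months example from the question might look like the sketch below, written in Python with SQLite. The product, month and product_month tables are hypothetical. One thing the junction table gives you that an array column generally does not is that the database can enforce each foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE month   (month_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Junction table: one row per (product, month) pair.
    CREATE TABLE product_month (
        product_id INTEGER NOT NULL REFERENCES product (product_id),
        month_id   INTEGER NOT NULL REFERENCES month (month_id),
        PRIMARY KEY (product_id, month_id)
    );
    INSERT INTO product VALUES (1, 'Skis'), (2, 'Sunscreen');
    INSERT INTO month VALUES (1, 'January'), (7, 'July'), (12, 'December');
    INSERT INTO product_month VALUES (1, 1), (1, 12), (2, 7);
""")

# Months linked to product 1, resolved through the junction table.
ski_months = [
    row[0]
    for row in conn.execute(
        "SELECT m.name FROM month AS m "
        "JOIN product_month AS pm ON pm.month_id = m.month_id "
        "WHERE pm.product_id = ? ORDER BY m.month_id",
        (1,),
    )
]

# A link to a month that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO product_month VALUES (1, 13)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```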