Introducing an array into a very big flat table (terabytes of data): good or bad? - data-science

I have a flat table in a Couch database that holds terabytes (TB) of data. The table has a column called "supervisor" that currently holds a single value, but the client now wants supervisor to be an array. Is it a good idea to keep the supervisor field as an array in this flat table?
That is where I am coming from.
What challenges should I expect if I introduce it as an array in this flat table? And what would be a good approach if it is not stored as an array?
Subject specialists, your thoughts are warmly welcome. I am currently researching this and would like your input.
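One practical challenge with changing a field's shape on this much data is that old (single value) and new (array) documents are likely to coexist for a while. A minimal sketch of a reader that tolerates both shapes, assuming the documents are plain JSON objects; the helper name `supervisors_of` and the sample documents are hypothetical, not taken from the question:

```python
def supervisors_of(doc: dict) -> list:
    """Return the supervisors of a document as a list, whichever shape it has."""
    value = doc.get("supervisor")
    if value is None:
        # Field missing or explicitly null: treat as "no supervisor".
        return []
    if isinstance(value, list):
        # New shape: already an array.
        return [str(v) for v in value]
    # Old shape: a single scalar value.
    return [str(value)]


# Hypothetical sample documents in the old and the new shape.
old_doc = {"_id": "emp-1", "supervisor": "alice"}
new_doc = {"_id": "emp-2", "supervisor": ["alice", "bob"]}
```

Reading through one helper like this means the existing documents would not all have to be rewritten in a single pass.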

Related

Primary key is also a foreign key in a star schema. Good or bad?

I am creating a database following star schema. This is my schema:
The question is: is it good or bad practice that my side tables don't have an ID column of their own but use the same column, ApsilankymoID, as both PK and FK?
The relationship between 1Apsilankymas and the other tables is 1:0..1.
Thanks
That's called vertical partitioning. Let's say you have one table with mostly small data types (INT, DATE, etc.) and one column that's a big data type like NVARCHAR(8000). By moving that large column into its own table, sharing the same PK as the other table, you can reduce IO on the server, especially if you don't use the large field that often. If there's not much data in the table, you probably won't get much bang for your buck, but if there's a lot of data, it can help a lot. Here's a site with more info. There are other intricacies that you should read about. As for whether it's good for a star schema... not sure. I'm sure someone else will have a good answer there.
https://www.sqlshack.com/database-table-partitioning-sql-server/
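A minimal sketch of that shared-key layout, using SQLite through Python's sqlite3 module just so it runs anywhere. Only the key name ApsilankymoID comes from the question; the table names and other columns are made up for illustration:

```python
import sqlite3

# Hypothetical tables: only the shared key name ApsilankymoID is from the question.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked to
conn.executescript("""
CREATE TABLE main_table (
    ApsilankymoID INTEGER PRIMARY KEY,
    visit_date    TEXT NOT NULL
);
-- The side table's primary key is also its foreign key, which gives 1:0..1.
CREATE TABLE side_table (
    ApsilankymoID INTEGER PRIMARY KEY
                  REFERENCES main_table (ApsilankymoID),
    long_notes    TEXT NOT NULL
);
INSERT INTO main_table VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO side_table VALUES (1, 'a long free-text note');
""")

# The wide column is only read when a query actually joins the side table.
rows = conn.execute("""
    SELECT m.ApsilankymoID, m.visit_date, s.long_notes
    FROM main_table AS m
    LEFT JOIN side_table AS s ON s.ApsilankymoID = m.ApsilankymoID
    ORDER BY m.ApsilankymoID
""").fetchall()
```

Because the side table's key is both primary key and foreign key, each main row can have at most one side row, and a side row cannot exist without its main row.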

Normalizing DBs - Can this be normalized further?

I have this mockup for a database I will be creating. I'm wondering how I can normalize it further; so far my only thought is to break date out into its own table. What would be common practice?
The answer is: probably yes. But without having an exact definition of every field, i.e. what do they mean in the context of your data model, it's hard for us to give a good answer on this.
Looking at the trips table, I'm seeing the column zip_code, which looks out of place. The zip_code field is not directly related to the primary key of the trips table (AFAICT anyway). A zip code is a property of a city. I would say that zip_code should be stored in the city table.
What you are aiming for is probably to end up in a database normalized to the third normal form (3NF). You should read up on normalization and apply the rules up to 3NF. To go further into what this entails would be duplicating numerous tutorials, courses and books. You could take this question on SO as a starting point and try to apply this to your data model.
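To make the zip_code suggestion concrete, here is a rough sketch, assuming a city table keyed by city_id that trips refers to. The mockup itself isn't reproduced here, so every column other than zip_code is a guess:

```python
import sqlite3

# Hypothetical schema: only the trips/city tables and zip_code are named in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE city (
    city_id  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    zip_code TEXT NOT NULL          -- stored once, with the thing it describes
);
CREATE TABLE trips (
    trip_id   INTEGER PRIMARY KEY,
    city_id   INTEGER NOT NULL REFERENCES city (city_id),
    trip_date TEXT NOT NULL
);
INSERT INTO city VALUES (1, 'Springfield', '12345');
INSERT INTO trips VALUES (10, 1, '2024-03-01'), (11, 1, '2024-03-02');
""")


def zip_for_trip(trip_id):
    """Look up a trip's zip code through its city instead of storing it per trip."""
    row = conn.execute(
        "SELECT c.zip_code FROM trips AS t "
        "JOIN city AS c ON c.city_id = t.city_id "
        "WHERE t.trip_id = ?",
        (trip_id,),
    ).fetchone()
    return row[0] if row else None
```

With this layout a zip code correction is a single UPDATE on city, and every trip picks it up through the join.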

Database design variable amount of fields

I'm making a website for a client but I stumbled upon a problem and I need some advice on it.
For each project, they want to have the possibility to set a variable amount of images and (sometimes) some corresponding text.
I was thinking about storing all of the information in one field, instead of making field_1 to field_99 just in case they need 99 fields.
// database column
'../fotos/foto1.png',
'text goes here',
'../fotos/foto2.png',
'', (empty text)
'../fotos/foto3.png'
This solution has some disadvantages; there must be better ways to achieve this.
What's the preferred way to do this?
Create another table (e.g. FOTO_CODES) with all possible values of foto and generate an ID for each of them.
Create another child table that holds the master table record ID, the ID from the FOTO_CODES table, and the FOTO data (image).
It's called normalization.
The solution you described violates the principle of atomicity and therefore the 1NF. You'd have trouble maintaining and querying data in this format.
This is a classic 1-to-many relationship, that can be modeled in two ways:
1) Identifying relationship:
2) Non-identifying relationship:
Both have pros and cons; StackOverflow already has plenty of discussions on this topic.
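The two variants can be sketched roughly like this (SQLite via Python's sqlite3; all table and column names are hypothetical, only the image paths come from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE project (
    project_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
-- 1) Identifying: the parent key is part of the child's primary key.
CREATE TABLE project_image (
    project_id INTEGER NOT NULL REFERENCES project (project_id),
    position   INTEGER NOT NULL,
    path       TEXT NOT NULL,
    caption    TEXT,                 -- NULL when an image has no text
    PRIMARY KEY (project_id, position)
);
-- 2) Non-identifying: the child has its own key; the parent is a plain FK.
CREATE TABLE project_image_alt (
    image_id   INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project (project_id),
    path       TEXT NOT NULL,
    caption    TEXT
);
INSERT INTO project VALUES (1, 'demo project');
INSERT INTO project_image VALUES
    (1, 1, '../fotos/foto1.png', 'text goes here'),
    (1, 2, '../fotos/foto2.png', NULL),
    (1, 3, '../fotos/foto3.png', NULL);
""")


def images_for(project_id):
    """Return (path, caption) pairs for one project, in display order."""
    return conn.execute(
        "SELECT path, caption FROM project_image "
        "WHERE project_id = ? ORDER BY position",
        (project_id,),
    ).fetchall()
```

Either way, a project can have as many images as needed, one row each, with no field_1 to field_99 columns.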

Small Help with Table Script?

I am new to SQL and I have a small question. I am writing a table script and I have a question about two fields in that table. Here is the table structure:
Billing
CustomerName
CustomerPhone
BGFlag (Y/N)
UpdateIndicator (B=Before,A=After)
My question is: do I have to write anything in the create table script for (Y/N) in BGFlag and (B=Before, A=After) in UpdateIndicator? What I am thinking is that I just have to create the table with these column names, and that (Y/N) and (B=Before, A=After) are simply the data for those two columns, which I will get in a sample file. Any suggestions?
Thanks
Sounds like that is just application-specific metadata about those columns. You could put that in extended properties of the table, but nobody except a curious DBA is going to see it.
Keep in mind, even if the data you are importing into your database uses Y/N and B/A, you can always transform that into a bit value (0/1), which seems a better idea from a field design perspective.
Or, if you literally want it to hold those text values (Y/N and B/A), then just use a CHAR(1) field. The risk, though, is that anyone could put any single-character text value in these columns.
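If you do keep the single-character text values, a CHECK constraint is one way to limit that risk. A small sketch, run against SQLite through Python's sqlite3 module; the column types, NOT NULL choices and the `try_insert` helper are my assumptions, and the syntax may need small changes on your database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Billing (
    CustomerName    TEXT NOT NULL,
    CustomerPhone   TEXT,
    -- Only the documented codes are accepted.
    BGFlag          TEXT NOT NULL CHECK (BGFlag IN ('Y', 'N')),
    UpdateIndicator TEXT NOT NULL CHECK (UpdateIndicator IN ('B', 'A'))
);
""")


def try_insert(name, phone, bg_flag, update_indicator):
    """Insert one row; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Billing VALUES (?, ?, ?, ?)",
            (name, phone, bg_flag, update_indicator),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

So the allowed values do end up in the create table script, but as constraints rather than as data.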

Optimal DB structure for additional fields entity

I have a table in a DB (Postgres based) which acts like a superclass in object-oriented programming. It has a column 'type' which determines which additional columns should be present in the table (sub-class properties). But I don't want the table to include all possible columns (all properties of all possible types).
So I decided to make a table containing 'key' and 'value' columns (e.g. 'filename' = '/file', or 'some_value' = '5'), which can hold any property of the object that is not included in the superclass table. I also made one related table to contain the available 'key' values.
But there is a problem with such architecture - the 'value' column should be of a string data type by default, to be able to contain anything. But I don't think converting to and from strings is a good decision. What is the best way to bypass this limitation?
The design you're experimenting with is a variation of Entity-Attribute-Value, and it comes with a whole lot of problems and inefficiencies. It's not a good solution for what you're doing, except as a last resort.
What could be a better solution is what fallen888 describes: create a "subtype" table for each of your subtypes. This is okay if you have a finite number of subtypes, which sounds like what you have. Then your subtype-specific attributes can have data types, and also a NOT NULL constraint if appropriate, which is impossible if you use the EAV design.
One remaining weakness of the subtype-table design is that you can't enforce that a row exists in the subtype table just because the main row in the superclass table says it should. But that's a milder weakness than those introduced by the EAV design.
edit: Regarding your additional information about comments-to-any-entity, yes this is a pretty common pattern. Beware of a broken solution called "polymorphic association" which is a technique many people use in this situation.
How about this instead... each sub-type gets its own DB table. And the base/super table just has a varchar column that holds the name of the sub-type DB table. Then you can have something like this...
Entity
------
ID
Name
Type
SubTypeName (value of this column will be 'Dog')
Dog
---
VetName
VetNumber
etc
If you don't want your (sub-)table names to be varchar values in the base table, you can also just have a SubType table whose primary key will be in the base table.
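A rough sketch of the subtype-table layout, reusing the Entity and Dog names from above. It is run against SQLite via Python's sqlite3 module rather than Postgres, and the column types and sample rows are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Entity (
    ID   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    Type TEXT NOT NULL
);
-- Subtype table: shares the primary key of the base table.
CREATE TABLE Dog (
    ID        INTEGER PRIMARY KEY REFERENCES Entity (ID),
    VetName   TEXT NOT NULL,        -- typed and mandatory, unlike an EAV value
    VetNumber TEXT
);
INSERT INTO Entity VALUES (1, 'Rex', 'Dog'), (2, 'Fido', 'Dog');
INSERT INTO Dog VALUES (1, 'Dr. Smith', '555-0101');
""")

# The remaining weakness: nothing forces a Dog row to exist for entity 2.
dogs_without_details = conn.execute("""
    SELECT e.ID
    FROM Entity AS e
    LEFT JOIN Dog AS d ON d.ID = e.ID
    WHERE e.Type = 'Dog' AND d.ID IS NULL
""").fetchall()
```

The query at the end is one way to find base rows whose subtype row is missing, since the schema alone does not prevent that.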
The only workaround (while retaining your structure) is to have separate tables:
create table IntProps(...);
create table StringProps(...);
create table CurrencyProps(...);
But I do not think that this is a good idea...
One common approach is having the key-value table contain multiple columns, one for each data type, i.e. StringValue, DecimalValue, etc.
Just know you're trading queryability and performance for a database schema you don't need to change. You could also consider ORM mapping or an object database.
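For what it's worth, the one-column-per-data-type variant could look something like this. It is a sketch on SQLite via Python's sqlite3 module; the table name EntityProps, the PropKey column, the CHECK rule and the `get_value` helper are my own assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EntityProps (
    EntityID     INTEGER NOT NULL,
    PropKey      TEXT    NOT NULL,
    StringValue  TEXT,
    DecimalValue REAL,
    PRIMARY KEY (EntityID, PropKey),
    -- Exactly one of the typed value columns must be filled in.
    CHECK ((StringValue IS NOT NULL) + (DecimalValue IS NOT NULL) = 1)
);
INSERT INTO EntityProps (EntityID, PropKey, StringValue)
    VALUES (1, 'filename', '/file');
INSERT INTO EntityProps (EntityID, PropKey, DecimalValue)
    VALUES (1, 'some_value', 5);
""")


def get_value(entity_id, key):
    """Return the property value from whichever typed column holds it."""
    row = conn.execute(
        "SELECT StringValue, DecimalValue FROM EntityProps "
        "WHERE EntityID = ? AND PropKey = ?",
        (entity_id, key),
    ).fetchone()
    if row is None:
        return None
    return row[0] if row[0] is not None else row[1]
```

Values keep a real data type this way, but every reader still has to know which column to look in, which is part of the queryability cost mentioned above.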
You could have a per type key/value table. The available table would need to encode the availability of a specific key/type pair to point to the correctly typed key/value table.
This seems like a highly inefficient architecture for a row-based relational database, however.
Perhaps you should take a look at a column oriented relational database?
Thanks for the answers. I'll explain a little more specifically what I need.
We need to build a blog+forum website, and I've been looking at the WordPress DB structure.
There's a strong need for the ability to attach comments to any kind of 'object', like a blog entry or a video file attached to it. The DB structure above was chosen because it is very easy to scale and fulfills all our needs.
But it's not too late to change it, because we are still at an early engineering stage. Our model also now smells like a completely tree-hierarchy-based DB. For now I'll accept Bill Karwin's and fallen888's answers, but maybe I'm going in a totally wrong direction?
About the user being able to add a new field to the table:
I admire all these people making comments.
I used to be interested in this kind of thing a few years ago, but have written little code recently (apart from a little bit of PHP and MySQL).
I think it's fine if you want to keep going - you may end up with something new.
Sorry to pour any cold water on the scheme - I admire your efforts. My personal belief is that if you go far enough in this direction, you will end up with a system that interprets more of natural language than SQL does. (Around 1970, SQL was actually spelt Sequel, and it stood for "Structured English Query Language", but after they standardized it in the 1970s - I think someone said that Oracle was the first commercial implementation, in 1979 - the "English" got dropped, because I guess they decided that it was only a tiny subset of English.)
I have run out of steam in this area, because I haven't got a job. Without an easy job that pays the bills, where I can experiment with these ideas, it's a bit hard to concentrate on this area.
Best wishes to all.
Oracle got their first contract writing a database for the CIA.