SQL Architecture design - sql

I have the following tables:
+---+---------+
|id | name |
+---+---------+
|1 | White |
|2 | Black |
+---+---------+
(foreign_key1 in the table below references this table's id)
+----+------+-------------+
|id | name | foreign_key1|
+----+------+-------------+
|1 | Grey | 1 |
|2 | Grey | 2 |
+----+------+-------------+
Is there a way to store the last table's information in a single row, so that one row represents that grey is both white and black?

You could use an array-like column type (for example, a string) and make it a one-row record, but I wouldn't suggest that; it's better to keep them as separate rows. Your approach is fine, but (if I've understood your idea) I'd suggest a slightly different schema:
You can make two tables: Colors, and Related_Colors, like this:
Colors
+---+---------+
|id | name |
+---+---------+
|1 | White |
|2 | Black |
|3 | Gray |
+---+---------+
Related_Colors
+---+---------+---------+
|id |color1_id|color2_id|
+---+---------+---------+
|1 |3 |1 |
|2 |3 |2 |
+---+---------+---------+
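
As a minimal sketch of the two-table schema above, here it is built in an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite, the column types, and the helper name related_colors are assumptions made for illustration; only the table names, column names, and sample rows come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Colors (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per relationship: color1_id "is made of" color2_id.
    CREATE TABLE Related_Colors (
        id        INTEGER PRIMARY KEY,
        color1_id INTEGER NOT NULL REFERENCES Colors(id),
        color2_id INTEGER NOT NULL REFERENCES Colors(id)
    );
    INSERT INTO Colors VALUES (1, 'White'), (2, 'Black'), (3, 'Gray');
    INSERT INTO Related_Colors VALUES (1, 3, 1), (2, 3, 2);
""")

def related_colors(conn, color_name):
    """Return the names of all colors related to color_name (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT c2.name
        FROM Related_Colors rc
        JOIN Colors c1 ON c1.id = rc.color1_id
        JOIN Colors c2 ON c2.id = rc.color2_id
        WHERE c1.name = ?
        ORDER BY c2.id
        """,
        (color_name,),
    ).fetchall()
    return [name for (name,) in rows]

print(related_colors(conn, "Gray"))
```

Adding a third related color later only needs one more row in Related_Colors; the schema itself does not change.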

You could; it's called denormalization (https://en.wikipedia.org/wiki/Denormalization). Essentially you would need to create (in the second table) a column for each one of the possible IDs in the first table. So the schema of the second table would be:
ID
Name
White
Black
This also explains why you probably should not do it: what happens when you want to add another ID in the first table (e.g. purple)? As things are, you would just need to add another row in table 1 and reference it from the relevant rows. If you denormalize this way, you would need to change the schema to accommodate the new possible value. The new column would of course be empty for most rows.
Another possibility would be to maintain the values as a concatenated string; so the schema would be
ID
Name
List Of IDs From Table1
In this case the last field would contain "White, Black". The drawback of this approach is that you can no longer query efficiently by the values from table1, because you can't properly index that field.
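
To make the querying drawback concrete, here is a small sketch comparing the concatenated-string layout with one row per value. It uses an in-memory SQLite database; the table names colors_flat and colors_rows, and the index name, are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Concatenated layout: all related names packed into one text column.
    CREATE TABLE colors_flat (id INTEGER PRIMARY KEY, name TEXT, related TEXT);
    INSERT INTO colors_flat VALUES (1, 'Grey', 'White, Black');

    -- Row-per-value layout: the related column holds exactly one value.
    CREATE TABLE colors_rows (id INTEGER PRIMARY KEY, name TEXT, related TEXT);
    CREATE INDEX idx_colors_rows_related ON colors_rows(related);
    INSERT INTO colors_rows VALUES (1, 'Grey', 'White'), (2, 'Grey', 'Black');
""")

# Concatenated layout: finding "Black" needs a substring match with a
# leading wildcard, which an ordinary index on the column cannot serve.
flat = conn.execute(
    "SELECT name FROM colors_flat WHERE related LIKE ?", ("%Black%",)
).fetchall()

# Row-per-value layout: a plain equality test that the index can serve.
rows = conn.execute(
    "SELECT name FROM colors_rows WHERE related = ?", ("Black",)
).fetchall()

print(flat, rows)
```

Both queries find the same row on this tiny data set; the difference only shows up in how the database has to search as the table grows.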
Ultimately, the question is what your needs are. If you need to read rows quickly and have them in a 'reporting friendly' format, denormalization may work for you. But in most DB design cases it would not be required.

Related

Pentaho Data Integration De-normalize many row values as field names

I am reading data from a survey in a table that has 3 fields:
- record
- question
- answer
For each record there are many rows, one per question, each with its corresponding answer:
|record|question|answer|
------------------------
|1 |q1. |a1. |
|1 |q2. |a2. |
|2 |q1. |a1. |
|2 |q2. |a2. |
What I want to do in Pentaho is transform this table into one that has the record field followed by one field per question, so that each row contains the record id and the answer values:
|record|q1 |q2 |
------------------------
|1 |a1 |a2 |
|2 |a1 |a2 |
I would do it with the de-normalization step, but in my case I have a lot of questions and they may change, so I was wondering if there is an automatic way to map the values in the input question field to the output field names.
You can try Metadata Injection to inject these values at runtime.
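
For readers who want to see the target transformation itself, here is a plain-Python sketch of a pivot whose output field names are taken from the data. This is not Pentaho and does not use Metadata Injection; the function name pivot and the sample rows (question names written without the trailing dots) are assumptions for illustration only.

```python
def pivot(rows):
    """Turn (record, question, answer) rows into a header plus one row per record.

    Field names are discovered from the question values, in first-seen order.
    """
    fields = []      # question names, in the order they first appear
    by_record = {}   # record id -> {question: answer}
    for record, question, answer in rows:
        if question not in fields:
            fields.append(question)
        by_record.setdefault(record, {})[question] = answer

    header = ["record"] + fields
    # Missing answers become None, so every output row has the same width.
    table = [
        [record] + [answers.get(q) for q in fields]
        for record, answers in by_record.items()
    ]
    return header, table

rows = [(1, "q1", "a1"), (1, "q2", "a2"), (2, "q1", "a1"), (2, "q2", "a2")]
header, table = pivot(rows)
print(header)
for line in table:
    print(line)
```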

Select a large number of ids from a Hive table

I have a large table with format similar to
+-----+------+------+
|ID |Cat |date |
+-----+------+------+
|12 | A |201602|
|14 | B |201601|
|19 | A |201608|
|12 | F |201605|
|11 | G |201603|
+-----+------+------+
and I need to select entries based on a list with around 5000 thousand IDs. The straightforward way would be to use the list in a WHERE clause, but that would perform really badly and probably would not even work. How can I do this selection?
With a partitioned table, things run fast. Once you have partitioned the table, add your ids to the WHERE clause.
You can also extract a subtable from the original one by selecting all the rows whose ids are between the min and the max of your id list.
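
The min/max idea can be sketched as follows. This runs against an in-memory SQLite database rather than Hive, so it only illustrates the shape of the queries, not Hive syntax or performance. Loading the id list into a helper table and joining against it is an extra assumption of this sketch, not something the answer prescribes; the names big, subtable, and wanted_ids are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big (ID INTEGER, Cat TEXT, date TEXT);
    INSERT INTO big VALUES
        (12, 'A', '201602'), (14, 'B', '201601'), (19, 'A', '201608'),
        (12, 'F', '201605'), (11, 'G', '201603');
    CREATE TEMP TABLE subtable (ID INTEGER, Cat TEXT, date TEXT);
    CREATE TEMP TABLE wanted_ids (ID INTEGER PRIMARY KEY);
""")

wanted = [12, 19]  # stand-in for the real, much longer id list

# Step 1: extract the rows whose ID lies between the min and max wanted id.
conn.execute(
    "INSERT INTO subtable SELECT ID, Cat, date FROM big WHERE ID BETWEEN ? AND ?",
    (min(wanted), max(wanted)),
)

# Step 2 (assumption): put the ids in a helper table and join,
# instead of writing thousands of literals into the WHERE clause.
conn.executemany("INSERT INTO wanted_ids VALUES (?)", [(i,) for i in wanted])
result = conn.execute(
    """
    SELECT s.ID, s.Cat, s.date
    FROM subtable s
    JOIN wanted_ids w ON w.ID = s.ID
    ORDER BY s.date
    """
).fetchall()

print(result)
```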

Database Sql Chain Integers in row

At the moment I have one integer for each user in my database.
Which datatype would I have to use if I want to chain multiple integers for a single user in my database?
Now:
__________________________________
Users Numbers
Tom 2
__________________________________
What I want:
__________________________________
Users Numbers
Tom 2,12
__________________________________
As @jarlh stated, you shouldn't design your database so that a single column holds a set of values.
A column in a relational database should hold a single value of a single kind per row, not a set of values or different kinds of data from row to row.
To fix this, you can create another table named Numbers and associate it with your Users table through a 1:N (one-to-many) relation, as shown here:
_Users___ _Numbers________________
|ID |name | |NumberID |UserID |value |
|1 |Tom | |1 |3 |243 |
|2 |Jess | |2 |1 |12 |
|3 |Luis | |3 |2 |87 |
In the Users table you have an ID and the name; in the Numbers table you associate each number with its owner through the foreign key UserID (each new number is a new row).
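
A minimal sketch of this one-to-many layout, using an in-memory SQLite database through Python's sqlite3 module. The column types and the helper name numbers_for are assumptions; the sample data is Tom's "2,12" from the question rather than the rows in the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (
        ID   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE Numbers (
        NumberID INTEGER PRIMARY KEY,
        UserID   INTEGER NOT NULL REFERENCES Users(ID),
        value    INTEGER NOT NULL
    );
    INSERT INTO Users VALUES (1, 'Tom');
    -- One row per number; both rows point back at Tom.
    INSERT INTO Numbers (UserID, value) VALUES (1, 2), (1, 12);
""")

def numbers_for(conn, user_name):
    """Return all numbers stored for a user, in insertion order (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT n.value
        FROM Numbers n
        JOIN Users u ON u.ID = n.UserID
        WHERE u.name = ?
        ORDER BY n.NumberID
        """,
        (user_name,),
    ).fetchall()
    return [value for (value,) in rows]

values = numbers_for(conn, "Tom")
print(",".join(str(v) for v in values))
```

The comma-separated form is produced only at display time; the database keeps each integer in its own row.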

Database design: I want a column value to determine which table to query

I don't have much experience in designing databases. I want a column value to determine which table to query, and I don't know if there is a better method for this. Here is the concrete problem for better understanding:
I am designing a database for a survey creator application. I want to store different kinds of questions (for example, multiple choice questions and basic text questions). I have the following tables:
QUESTION
| ID | Title | TypeID |
----------------------------------------------
| 1 | "Pick a num from 1-10" | 1 |
| 2 | "Choose some from the list:" | 2 |
TYPE
| ID | Name | ExtraValues |
--------------------------------------------
|1 |Scale Question |ScaleValues |
|2 |Multiple Choice |MultiValues |
SCALE VALUES
|Question_ID | Min | Max |
--------------------------
|1 | 1 |10 |
MULTI VALUES
|Question_ID | Name | Value |
--------------------------------
|2 | Sugar | 10 |
|2 | Milk | 20 |
|2 | Egg | 14 |
So from now on, if a question is of the "Multiple choice" type, then I want to check the MULTI VALUES table, otherwise the SCALE VALUES table. I can do it with a stored procedure, or I can just query all the SOMETHING VALUES tables for the question_ID. But is there a better way to do it?
You can certainly design your database that way. However, you can't grab the "ExtraValues" column in a query and have that automagically pull that table into the query, not without dynamically executed SQL. Your best bet is to use branching logic on the question type to determine where to get the other related data.
You could also move the min and max fields into the QUESTION table and do away with the ScaleValues table completely. You could just set them to NULL if it's a multiple choice question.
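
The branching approach from this answer could look roughly like the sketch below, with the branch done in application code. It assumes an in-memory SQLite database, lower-case table names, and columns renamed to min_value and max_value; the EXTRA_QUERIES mapping and the extra_values helper are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT, type_id INTEGER);
    CREATE TABLE scale_values (question_id INTEGER, min_value INTEGER, max_value INTEGER);
    CREATE TABLE multi_values (question_id INTEGER, name TEXT, value INTEGER);
    INSERT INTO question VALUES
        (1, 'Pick a num from 1-10', 1),
        (2, 'Choose some from the list:', 2);
    INSERT INTO scale_values VALUES (1, 1, 10);
    INSERT INTO multi_values VALUES (2, 'Sugar', 10), (2, 'Milk', 20), (2, 'Egg', 14);
""")

# The branch: each question type maps to a fixed, known query.
# No table name is ever read from the data and spliced into SQL.
EXTRA_QUERIES = {
    1: "SELECT min_value, max_value FROM scale_values WHERE question_id = ?",
    2: "SELECT name, value FROM multi_values WHERE question_id = ? ORDER BY name",
}

def extra_values(conn, question_id):
    """Look up the question's type, then run the query registered for that type."""
    row = conn.execute(
        "SELECT type_id FROM question WHERE id = ?", (question_id,)
    ).fetchone()
    if row is None:
        raise KeyError(question_id)
    return conn.execute(EXTRA_QUERIES[row[0]], (question_id,)).fetchall()

print(extra_values(conn, 1))
print(extra_values(conn, 2))
```

Supporting a new question type then means adding one table and one entry in the mapping.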
I think there is definitely a better way to do it. Set up a many-to-many relationship between questions and available answers. Add a third column, named Points. So your three tables would be:
Question - QuestionId and Text
Answer - AnswerId and Text
QuestionAnswer - QuestionId, AnswerId, and Points.
Award 0 points for wrong answers.
This design might be too simple. You might need a Test table as well. Then you would need a TestId field in that many-to-many table, which would now be called TestQuestionAnswer.

complex'ish SQL joins across multiple tables with multiple conditions across all tables

Given the following tables:
labels tags_labels
|id |name | |url |labelid |
|-----|-------| |/a/b |1 |
|1 |punk | |/a/c |2 |
|2 |ska | |/a/b |3 |
|3 |stuff | |/a/z |4 |
artists tags
|id |name | |url |artistid |albumid |
|----|--------| |------|-----------|---------|
|1 |Foobar | |/a/b |1 |2637 |
|2 |Barfoo | |/a/z |2 |23 |
|3 |Spongebob| |/a/c |1 |32 |
I would like to get a list of urls that match a couple of conditions (which can be entered by the user into the script that uses these statements).
For example, the user might want to list all urls that have the labels "(1 OR 2) AND 3", but only if they are by the artists "Spongebob OR Whatever".
Is it possible to do this within a single statement using inner/harry potter/cross/self JOINs?
Or would I have to spread the query across multiple statements and buffer the results inside my script?
Edit:
And if it is possible, what would the statement look like? :p
Yes, you can do this in one query. And maybe an efficient way would be to dynamically generate the SQL statement, based on the conditions the user entered.
This query would allow you to filter by label name or artist name.
Building the SQL dynamically to concatenate the user parameters, or passing the desired parameters into a stored procedure, would obviously change the WHERE clauses, but that really depends on how dynamic your 'script' must be...
SELECT tl.url
FROM labels l INNER JOIN tags_labels tl ON l.id = tl.labelid
WHERE l.name IN ('ska','stuff')
UNION (
SELECT t.url
FROM artists a INNER JOIN tags t ON a.id = t.artistid
WHERE a.name LIKE '%foo%'
)
Good Luck!
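
The UNION query above returns URLs that match either the label filter or the artist filter. If both conditions must hold, as in the "(1 OR 2) AND 3" example from the question, one possible single-statement form is sketched below. It assumes an in-memory SQLite database and a hypothetical helper named urls_for; the label ids are hard-coded for brevity, while the artist names are bound as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE labels (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags_labels (url TEXT, labelid INTEGER);
    CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (url TEXT, artistid INTEGER, albumid INTEGER);
    INSERT INTO labels VALUES (1, 'punk'), (2, 'ska'), (3, 'stuff');
    INSERT INTO tags_labels VALUES ('/a/b', 1), ('/a/c', 2), ('/a/b', 3), ('/a/z', 4);
    INSERT INTO artists VALUES (1, 'Foobar'), (2, 'Barfoo'), (3, 'Spongebob');
    INSERT INTO tags VALUES ('/a/b', 1, 2637), ('/a/z', 2, 23), ('/a/c', 1, 32);
""")

def urls_for(conn, artist_names):
    """URLs labelled (1 OR 2) AND 3, restricted to the given artists."""
    # One "?" per artist name; assumes artist_names is non-empty.
    placeholders = ",".join("?" * len(artist_names))
    sql = f"""
        SELECT tl.url
        FROM tags_labels tl
        WHERE tl.url IN (
            SELECT t.url
            FROM tags t
            JOIN artists a ON a.id = t.artistid
            WHERE a.name IN ({placeholders})
        )
        GROUP BY tl.url
        -- Each SUM counts how many of the URL's labels satisfy one condition.
        HAVING SUM(CASE WHEN tl.labelid IN (1, 2) THEN 1 ELSE 0 END) > 0
           AND SUM(CASE WHEN tl.labelid = 3 THEN 1 ELSE 0 END) > 0
        ORDER BY tl.url
    """
    return [url for (url,) in conn.execute(sql, artist_names).fetchall()]

print(urls_for(conn, ["Foobar", "Whatever"]))
```

Only the placeholders are generated dynamically; the user-supplied names themselves are always passed as bound parameters rather than concatenated into the statement.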