I have to fix a very poorly designed database.
The problem:
One Job Advertisment has one jobtitle, but many qualifying degrees.
(e.g., JobTitle:Analyst, Qualifications: Accounting Degree, or Finance Degree or Business Degree)
The tables:
TableName: UniqueJobName Columns: jobName(char) uniqueJobUid(bigint)
TableName: UniqueDegree Columns: degreeName(char) degreeUid(bigint)
TableName: Jobs Columns: jobName(char) jobUid(bigint),uniqueJobUid(bigint)
TableName: Job_Degree: jobUid(char) degreeName(char)
Relations
onetomany UniqueJobName.uniqueJobUid -> Jobs.uniqueJobUid
onetomany Jobs.jobUid-> Job_Degree.jobUid
There is NO relation between Jobs and UniqueDegree.
Technical Requirement
Rather than creating a column in Job_Degree for degreeUid, I want to create a new table: UniqueJob_UniqueDegree_Job (There are reasons for this that I won't explain here)
UniqueJob_UniqueDegree_Job will have three columns:
uniqueJobUid
jobId
degreeId
The trouble is that the Job table is already very big, 500,000 rows (and the Job_Degree table even bigger)
QUESTION:
What is the most efficient SQL statement for creating the UniqueJob_UniqueDegree_Job table given that part of the statement will be comparing the char column of UniqueDegree.degreeName and Job_Degree.degreeName?
Any hints would be most appreciated.
select j.jobname, j.jobuid, ud.degreeid
into UniqueJob_UniqueDegree_Job
from jobs j
join job_degree jd on j.jobuid = jd.jobuid
join uniquedegree ud on ud.jobname = jd.jobname
Having a hard time with getting uppercase letters etc because I use a worthless cellphone.
This should however do it. Note in order to do select Into... From the table cannot be created already (you can use convert or cast on each attribute in the select statement to get the data types correct with certainty.
If the table already exist then alter the query into
insert Into ..
select ...
from ....
500k rows is rather small as well. This shouldn't take more than a couple of seconds I'd estimate.
Related
Assuming that the database is already populated with data, and that each of the following SQL statements is the one and only query that an application will perform, why is it better to use row-wise or column-wise record storage for the following queries?...
1) SELECT * FROM Person
2) SELECT * FROM Person WHERE id=5
3) SELECT AVG(YEAR(DateOfBirth)) FROM Person
4) INSERT INTO Person (ID,DateOfBirth,Name,Surname) VALUES(2e25,’1990-05-01’,’Ute’,’Muller’)
In those examples Person.id is the primary key.
The article Row Store and Column Store Databases gives a general discussion on this, but I am specifically concerned about the four queries above.
SELECT * FROM ... queries are better for row stores since it has to access numerous files.
Column store is good for aggregation over large volume of date or when you have quesries that only need a few fields from a wide table.
Therefore:
1st querie: row-wise
2nd query: row-wise
3rd query: column-wise
4th query: row-wise
I have no idea what you are asking. You have this statement:
INSERT INTO Person (ID, DateOfBirth, Name, Surname)
VALUES('2e25', '1990-05-01', 'Ute', 'Muller');
This suggests that you have a table with four columns, one of which is an id. Each person is stored in their own column.
You then have three queries. The first cannot be optimized. The second is optimized, assuming that id is a primary key (a reasonable assumption). The third requires a full table scan -- although that could be ameliorated with an index only on DateOfBirth.
If the data is already in this format, why would you want to change it?
This is a very simple data structure. Three of your four query examples access all columns. I see no reason why you would not use a regular row-store table structure.
I am working on a generalized problem where I am given only schema definition of multiple tables that i have.
Now i have to retrieve certain columns by joining multiple tables such that number of joins are minimized.
Example: Suppose i have 3 tables and here is the list of columns that they have.
Table 1:(1,2,3,4,5),
Table 2:(5,6,7),
Table 3:(5,6,7,8)
Now suppose I have a query in which i want all the columns 1,2,3,4,5,6,7,8.
Now i can join either table 1,table 2 and table 3 OR
table 1 and table 3.I would get the required information in both the cases but joining table 1 and table 3 would require only 1 join rather than 2 join in other case.
What i was trying was a greedy algorithm in which first i would consider table that has maximum number of required columns then eliminate the common columns between the query and table(from both query and table) and then consider updated required columns and update tables and so on.But i guess it would be slow.
So is there a generalized algorithm or if anyone can give me any hint in this direction?
first of all, I have to mention that it's not "join", but "union".
Then I have to mention that if you want to use the greedy algorithm, you have to first join the 2 most short, cause when you join a table 2 times, it would be of o(n), and so you will have 2n operations to do, and so it would be better if n be as smaller as possible.
Beside these, the following link may be useful for you:
Merging 3 tables/queries using MS Access Union Query
Ok, so I have a Student table that has 6 fields, (StudentID, HasBamboo, HasFlower, HasAloe, HasFern, HasCactus) the "HasPlant" fields are boolean, so 1 for having the plant, 0 for not having the plant.
I want to find the average number of plants that a student has. There are hundreds of students in the table. I know this could involve transposing of some sort and of course counting the boolean values and getting an average. I did look at this question SQL to transpose row pairs to columns in MS ACCESS database for information on Transposing (never done it before), but I'm thinking there would be too many columns perhaps.
My first thought was using a for loop, but I'm not sure those exist in SQL in Access. Maybe a SELECT/FROM/WHERE/IN type structure?
Just hints on the logic and some possible reading material would be greatly appreciated.
you could just get individual totals per category:
SELECT COUNT(*) FROM STUDENTS WHERE HasBamboo
add them all up, and divide by
SELECT COUNT(*) FROM STUDENTS
It's not a great database design though... Better normalized would be:
Table Students; fields StudentID, StudentName
Table Plants; fields PlantID, PlantName
Table OwnedPlants; fields StudentID,PlantID
The last table then stores records for each student that owns a particular plant; but you could easily add different information at the right place (appartment number to Students; Latin name to Plants; date aquired to OwnedPlants) without completely redesigning table structure and add lots of fields. (DatAquiredBamboo, DateAquiredFlower, etc etc)
I have a very large database I would like to split up into tables. I would like to make it so when I run a distinct, it will make a table for every distinct name. The name of the table will be the data in one of the fields.
EX:
A --------- Data 1
A --------- Data 2
B --------- Data 3
B --------- Data 4
would result in 2 tables, 1 named A and another named B. Then the entire row of data would be copied into that field.
select distinct [name] from [maintable]
-make table for each name
-select [name] from [maintable]
-copy into table name
-drop row from [maintable]
Any help would be great!
I would advise you against this.
One solution is to create indexes, so you can access the data quickly. If you have only a handful of names, though, this might not be particularly effective because the index values would have select almost all records.
Another solution is something called partitioning. The exact mechanism differs from database to database, but the underlying idea is the same. Different portions of the table (as defined by name in your case) would be stored in different places. When a query is looking only for values for a particular name, only that data gets read.
Generally, it is bad design to have multiple tables with exactly the same data columns. Here are some reasons:
Adding a column, changing a type, or adding an index has to be done times instead of one time.
It is very hard to enforce a primary key constraint on a column across the tables -- you lose the primary key.
Queries that touch more than one name become much more complicated.
Insertions and updates are more complex, because you have to first identify the right table. This often results in overuse of dynamic SQL for otherwise basic operations.
Although there may be some simplifications (security comes to mind), most databases have other mechanisms that are superior to splitting the data into separate tables.
what you want is
CREATE TABLE new_table
AS (SELECT .... //the data that you want in this table);
i am tracking exercises. i have a workout table with
id
exercise_id (foreign key into exercise table)
now, some exercises like weight training would have the fields:
weight, reps (i just lifted 10 times # 100 lbs.)
and other exercises like running would have the fields: time, distance (i just ran 5 miles and it took 1 hours)
should i store these all in the same table and just have some records have 2 fields filled in and the other fields blank or should this be broken down into multiple tables.
at the end of the day, i want to query for all exercises in a day (which will include both types of exercises) so i will have to have some "switch" somewhere to differentiate the different types of exercises
what is the best database design for this situation
There are a few different patterns for modelling object oriented inheritance in database tables. The most simple being Single table inheritance, which will probably work great in this case.
Implementing it is mostly according to your own suggestion to have some fields filled in and the others blank.
One way to do it is to have an "exercise" table with a "type" field that names another table where the exercise-specific details are, and a foreign key into that table.
if you plan on keeping it only 2 types, just have exercise_id, value1, value2, type
you can filter the type of exercise in the where clause and alias the column names in the same statment so that the results don't say value1 and value2, but weight and reps or time and distance