I have the following table security of securities:
id | bbg_ticker | ticker | currency | exchange | type
-------+------------------+----------+----------+----------+--------
1762 | 653 HK Equity | 653 | | HK | Equity
5734 | LO US Equity | LO | | US | Equity
6175 | H US Equity | H | | US | Equity
4563 | BDTY Index | BDTY | | | Index
6253 | MOA Comdty | MOA | | | Comdty
7414 | 1333 JP Equity | 1333 | | JP | Equity
7538 | 2377 TT Equity | 2377 | | TT | Equity
As you can guess, the Bloomberg ticker (column bbg_ticker) is actually the concatenation of:
ticker + exchange + type if exchange is not null
ticker + type else
This is obviously an irrelevant duplicating of data. For instance, one could one day change the exchange of a security without changing its bbg_ticker, which would lead to a failure in the data integrity.
As I am currently refactoring my database, I was wondering if I could do something like a "view" but with a column (not a whole table). This view would replace the bbg_tickercolumn (would be named the same), and would be defined for instance with (I know the following request is totally wrong):
CREATE COLUMN VIEW bbg_ticker
ON security AS (
ticker
|| (IF exchange IS NOT NULL THEN (' ' || exchange) ELSE '')
|| ' '
|| type
)
;
It seems than VIEWS cannot be defined this way, but would you see another solution which would lead to the same result?
I know I could use check constraints and regex, which would solved the integrity question, but not the duplicating one. My question is more about having a better solution.
I am on PostgreSQL 9.5.
Thank you
You can just create a view from the whole table:
CREATE VIEW improved_security AS
SELECT id, concat_ws(' ', ticker, exchange, type) AS bbg_ticker,
ticker, currency, exchange, type
FROM security;
The concat_ws() function does all the nasty string concatenation for you, with the appropriate separator (a space in this case) and ignoring NULLs.
If you need to maintain the name security in order not to break any applications, then you should rename the table (which will require a table lock for a brief period of time) and then create the view with the name security and grant appropriate permissions to the view and revoke the same from the table. Wrap this all inside of a transaction so you will not get any errors on the client side, at best a brief delay in putting or pulling data.
Related
I am new to Access and am trying to develop a query that will allow me to count the number of occurrences of one word in each field from a table with 15 fields.
The table simply stores test results for employees. There is one table that stores the employee identification - id, name, etc.
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Is there an answer through Query Design, or is code required?
The solution, whether Query Design, or code, would be greatly appreciated!
Firstly, one of the reasons that you are struggling to obtain the desired result for what should be a relatively straightforward request is because your data does not follow database normalisation rules, and consequently, you are working against the natural operation of a RDBMS when querying your data.
From your description, I assume that the fields A1 through A15 are answers to questions on a test.
By representing these as separate fields within your database, aside from the inherent difficulty in querying the resulting data (as you have discovered), if ever you wanted to add or remove a question to/from the test, you would be forced to restructure your entire database!
Instead, I would suggest structuring your table in the following way:
Results
+------------+------------+-----------+
| EmployeeID | QuestionID | Result |
+------------+------------+-----------+
| 1 | 1 | correct |
| 1 | 2 | incorrect |
| ... | ... | ... |
| 1 | 15 | correct |
| 2 | 1 | correct |
| 2 | 2 | correct |
| ... | ... | ... |
+------------+------------+-----------+
This table would be a junction table (a.k.a. linking / cross-reference table) in your database, supporting a many-to-many relationship between the tables Employees & Questions, which might look like the following:
Employees
+--------+-----------+-----------+------------+------------+-----+
| Emp_ID | Emp_FName | Emp_LName | Emp_DOB | Emp_Gender | ... |
+--------+-----------+-----------+------------+------------+-----+
| 1 | Joe | Bloggs | 01/01/1969 | M | ... |
| ... | ... | ... | ... | ... | ... |
+--------+-----------+-----------+------------+------------+-----+
Questions
+-------+------------------------------------------------------------+--------+
| Qu_ID | Qu_Desc | Qu_Ans |
+-------+------------------------------------------------------------+--------+
| 1 | What is the meaning of life, the universe, and everything? | 42 |
| ... | ... | ... |
+-------+------------------------------------------------------------+--------+
With this structure, if ever you wish to add or remove a question from the test, you can simply add or remove a record from the table without needing to restructure your database or rewrite any of the queries, forms, or reports which depends upon the existing structure.
Furthermore, since the result of an answer is likely to be a binary correct or incorrect, then this would be better (and far more efficiently) represented using a Boolean True/False data type, e.g.:
Results
+------------+------------+--------+
| EmployeeID | QuestionID | Result |
+------------+------------+--------+
| 1 | 1 | True |
| 1 | 2 | False |
| ... | ... | ... |
| 1 | 15 | True |
| 2 | 1 | True |
| 2 | 2 | True |
| ... | ... | ... |
+------------+------------+--------+
Not only does this consume less memory in your database, but this may be indexed far more efficiently (yielding faster queries), and removes all ambiguity and potential for error surrounding typos & case sensitivity.
With this new structure, if you wanted to see the number of correct answers for each employee, the query can be something as simple as:
select results.employeeid, count(*)
from results
where results.result = true
group by results.employeeid
Alternatively, if you wanted to view the number of employees answering each question correctly (for example, to understand which questions most employees got wrong), you might use something like:
select results.questionid, count(*)
from results
where results.result = true
group by results.questionid
The above are obviously very basic example queries, and you would likely want to join the Results table to an Employees table and a Questions table to obtain richer information about the results.
Contrast the above with your current database structure -
Per your original question:
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Assuming that you want to view the number of incorrect answers by employee, you are forced to use an incredibly messy query such as the following:
select
employeeid,
iif(A1='incorrect',1,0)+
iif(A2='incorrect',1,0)+
iif(A3='incorrect',1,0)+
iif(A4='incorrect',1,0)+
iif(A5='incorrect',1,0)+
iif(A6='incorrect',1,0)+
iif(A7='incorrect',1,0)+
iif(A8='incorrect',1,0)+
iif(A9='incorrect',1,0)+
iif(A10='incorrect',1,0)+
iif(A11='incorrect',1,0)+
iif(A12='incorrect',1,0)+
iif(A13='incorrect',1,0)+
iif(A14='incorrect',1,0)+
iif(A15='incorrect',1,0) as IncorrectAnswers
from
YourTable
Here, notice that the answer numbers are also hard-coded into the query, meaning that if you decide to add a new question or remove an existing question, not only would you need to restructure your entire database, but queries such as the above would also need to be rewritten.
I have two Excel tables- the first one is the data table and the second one is a look up table. Here is how they are structured-
Data Table
+----------+-------------+----------+----------+
| Category | Subcategory | Division | Business |
+----------+-------------+----------+----------+
| A | Red | Home | Q |
| B | Blue | Office | R |
| C | Green | City | S |
| D | Yellow | State | T |
| D | Red | State | T |
| D | Green | Office | Q |
+----------+-------------+----------+----------+
Lookup Table Lookup Table
+----------+-------------+----------+----------+--------------+
| Category | Subcategory | Division | Business | LookUp Value |
+----------+-------------+----------+----------+--------------+
| 0 | 0 | 0 | Q | ABC |
| B | 0 | Office | 0 | DEF |
| C | Green | 0 | 0 | MNO |
| D | 0 | State | T | RST |
+----------+-------------+----------+----------+--------------+
So I want to add the lookup value column to the data table based on the criteria given in the lookup table. Eg, for the first row in the lookup table, I dont want to lookup on Category, Subcategory, or Division. but if the Business is Q, then I want to populate the lookup value as ABC. Similarly, for the second row I dont want to consider the Subcategory. and Business. but if the Category. is "B" and Division is "Office", I want it to populate DEF. So the result should look like this-
[Final Resulting Data Table]
+----------+-------------+----------+----------+--------------+
| Category | Subcategory | Division | Business | LookUp Value |
+----------+-------------+----------+----------+--------------+
| A | Red | Home | Q | ABC |
| B | Blue | Office | R | DEF |
| C | Green | City | S | MNO |
| D | Yellow | State | T | RST |
| D | Red | State | T | RST |
| D | Green | Office | Q | ABC |
+----------+-------------+----------+----------+--------------+
I am very new to SQL and the actual data set is very complex wih multiple lookup values based on different criteria. IF you think any other scripting language would work better, I am open to that too. My data is in Excel currently
If the data is so complex, you should first consider if you want to put it in a (relational) database (like MS Access, MySQL, etc.) instead of in a spreadsheet (like MS Excel).
Both kind of programs are used for structured data handling, but databases focus primarily on efficient data storage and data integrity (including guarding type safety, required fields, unique fields, required references between various datasets/tables, etc.) and spreadsheets focus primarily on data analysis and calculations.
Relational databases support Structured Query Language (SQL) to let clients query their data. Spreadsheets normally do not use or support SQL (as far as I know).
It is possible to let MS Excel import or reference data in an external data source (like a relational database) to perform analysis and calculations on it.
The other way around is (sometimes) possible too: to link to spreadsheet worksheets as external tables inside a relational database system to - within certain limits - allow that data to be queried using SQL. But using a database to store the data and a spreadsheet (as a database client) to perform analysis on the data in the database would be a more logical design in my opinion.
However, creating such an integrated solution using multiple MS Office applications and/or external databases can be a complex challenge, especially when you are just starting to learn about them.
To be honest, I am not experienced with designing MS Office based solutions, so I cannot guide you around any pitfalls. I do hope, that this answer helps you a little with finding the right way to go here...
Is it possible to convert table with many columns to many tables of two columns without losing data?
I will show what I mean:
Let say I have a table
+------------+----------+-------------+
|country code| site | advertiser |
+------------+----------+-------------|
| US | facebook | Cola |
| US | yahoo | Pepsi |
| FR | facebook | BMW |
| FR | yahoo | BMW |
+------------+----------+-------------+
The number of rows = [(number of countries) X (number of sites)] and the advertiser column is a variable that gets a value from a list with a limited number of advertisers
Is it possible to transform the 3 columns table to several tables with 2 columns without losing data?
If create two tables likes this I will surly lose data:
+------------+------------+
|country code| advertiser |
+------------+------------+
| US | Cola,Pepsi |
|-------------------------|
| FR | BMW |
+-------------------------+
+------------+------------+
| site | advertiser |
+------------+------------+
| facebook | Cola,BMW |
|-------------------------|
| yahoo | Pepsi,BMW |
+-------------------------+
But is I add a third "connection" table this will it help keep all the data and have the ability to recreate the original table?
+--------------+--------------------+
| country code | site |
+--------------+--------------------+
| US | facebook,yahoo |
|-----------------------------------|
| FR | facebook,yahoo |
+-----------------------------------+
Whether the table you specify can be 'converted' into into multiple tables is determined by whether the table is in fifth normal form i.e. if and only if every non-trivial join dependency in it is implied by the candidate keys.
If the table is in fifth normal form then it cannot be converted into multiple tables. If the table is not in fifth normal form then it is in one of the four lower normal forms and can be further normalized into fifth normal form by 'converting' it into multiple tables.
A table's normal form is determined by the column dependencies. These are determined by the meaning of the table i.e. what this table represents in the real world. You have not stated what the meaning of this table is and so whether this particular table can be converted into multiple tables is unknown.
You need to understand the process of normalization and using this you should be able to determine if it is possible to convert table with many columns to many tables of two columns without losing data? based on the column dependencies in the table.
You may be looking for Entity-Attribute-Value. Certainly it is much better than your proposal for keeping field values organized and not requiring a search of the field to determine if a value is present.
I've designed a few Vehicle Identification Number (VIN) decoders for different OEMs. The thing about VIN numbers...despite being somewhat standardized, each OEM can overload the character position codes and treat them differently, add "extra" metadata (i.e. asterisks pointing to more data outside the VIN number), etc., etc. Despite all that, I've been able to build several different OEM VIN decoders, and now I'm trying to build a GM VIN decoder, and it is giving me a headache.
The gist of the problem is that GM treats the vehicle attributes section (position 4,5,6,7) differently depending on whether it is a truck or a car. Here is the breakdown:
GM Passenger Car VIN breakdown
GM Truck VIN breakdown
Normally what I do is design my own crude ETL process to import the data into an RDMBS - each table roughly correlates with the major VIN breakdown. For example, there will be a WMI table, EngineType table, ModelYear table, AssemblyPlant table, etc. Then I construct a View that joins on some contextual data that may or not be gleaned directly from the character codes in the VIN number itself (e.g. some vehicle types only have certain vehicle engines).
To look up a VIN is simply a matter of querying the VIEW with each major character code position breakdown of the VIN string. For example, an example VIN of 1FAFP53UX4A162757 breaks down like this in a different OEM's VIN structure:
| WMI | Restraint | LineSeriesBody | Engine | CheckDigit | Year | Plant | Seq |
| 123 | 4 | 567 | 8 | 9 | 10 | 11 | 12-17 |
---------------------------------------------------------------------------------
| 1FA | F | P53 | U | X | 4 | A | ... |
GM has thrown a wrench into this...depending on whether it is a car or truck, the character code positions mean different things.
Example of what I mean - each ASCII table below correlates somewhat to a SQL table. etc.. means there is a whole lot of other columnar data
Passenger Car
Here's an example of position 4,5 (corresponds to vehicle line/series). These really go together, the VIN source data doesn't really differentiate between position 4 and 5 despite the breakdown illustrated above.
| Code (45)| Line | Series | etc..
--------------------------------------
| GA | Buick | Lacrosse | etc..
..and position 6 corresponds to body style
| Code (6) | Style | etc..
--------------------------------------
| 1 | Coupe, 2-Door | etc..
Trucks
..but for trucks, the structure is completely different. Consider position 4 stands on its own as Grosse Vehicle Weight Restriction GVWR.
| Code (4) | GVWR | etc..
-------------------------------
| L | 6000 lbs | etc..
..and positions 5,6 (Chassis/Series) now mean something similar to position 4,5 of passenger car:
| Code (56) | Line | Series | etc..
---------------------------------------
| RV | Buick | Enclave | etc..
I'm looking for a crafty way to resolve this in the relational design. I would like to return a common structure when a VIN is decoded -- if possible (i.e. not returning a different structure for cars vs. trucks)
Based on your answer to my comment regarding if you can identify the type of vehicle by using other values, a possible approach could be to have a master table with the common fields and 2 detail tables, each one with the appropriate fields for either cars or trucks.
Approximately something like the following (here I am guessing WMI is the PK):
Master table
| WMI | Restraint | Engine | CheckDigit | Year | Plant | Seq |
| 123 | 4 | 8 | 9 | 10 | 11 | 12-17 |
Car detail table
| WMI | Veh Line | Series | Body Type |
| 123 | 2 | 3 | 4 |
Truck detail table
| WMI | GWVR | Chassis |Body Type |
| 123 | 7 | 8 | 9 |
Having this, you could use a unique select to retrieve the needed data like following:
Select *
From
(
Select M.*,
C.*,
Null GWVR,
Null Chassis,
Null Truck_Body_Type
From Master_Table M
Left Join Car_Table C
on M.WMI = C.WMI
Union
Select M.*,
Null Veh_Line,
Null Series,
Null Car_Body_Type
T.*
From Master_Table M
Left Join Truck_Table T
on M.WMI = T.WMI
)
As for DML SQL you would only need to control prior to insert or update sentences whether you have a car or a truck model.
Of course you would need to make sure that only one detail exists for each master row, either on the car detail table or on the truck detail table.
HTH
Why you do not define both of these rules for the decoding; only one will resolve a valid result.
I have been working to build a more abstract schema, where there had been several tables modeling remarkably similar relationships, I want to model just the "essence". Due to the environment I am working with (Drupal 7), I can't change the nature of the issue: that a relationship of the same essential type could reference one of two different tables for the object in one role. Let's bring in some example to clarify (this is not my actual problem domain, but a similar problem). Here are the requirements:
First, if you are unfamiliar with Drupal, here's the gist: Users in one table, every other entity in a single second table (gross generalization, but enough).
Let's say we want to model the "works for" relationship, and lets have the given be that "companies" are of type "entity" and "supervisor" is of type "user" (and by "type" I mean that's the table in the database where their tuples reside). Here are the simplified requirements:
A user can work for a company
A company can work for a company
These "works for" relationships should be in the same table.
I have two ideas, and both don't exactly sit well with my current disposition toward schema quality, and this is where I would like some insight.
One foreign-key column paired with a 'type' column
Two foreign-key columns, always at most one utilized (ick!)
In case you are a visual thinker, here are the two options representing the fact that users 123 and 632, as well as entity 123 all work for entity 435:
Option 1
+---------------+-------------+---------------+-------------+
| employment_id | employee_id | employee_type | employer_id |
+---------------+-------------+---------------+-------------+
| 1 | 123 | user | 435 |
+---------------+-------------+---------------+-------------+
| 2 | 123 | entity | 435 |
+---------------+-------------+---------------+-------------+
| 3 | 632 | user | 435 |
+---------------+-------------+---------------+-------------+
Option 2
+---------------+------------------+--------------------+-------------+
| employment_id | employee_user_id | employee_entity_id | employer_id |
+---------------+------------------+--------------------+-------------+
| 1 | 123 | <NULL> | 435 |
+---------------+------------------+--------------------+-------------+
| 2 | <NULL> | 123 | 435 |
+---------------+------------------+--------------------+-------------+
| 3 | 632 | <NULL> | 435 |
+---------------+------------------+--------------------+-------------+
Thoughts on option 1: I like that the employee_id column has concrete role, but I despise that it has ambiguous target. Option 2 has ambiguous role (which column is the employee?), but has concrete target for any given FK, so I can think of it this way:
+-----------+-----------+----------+
| | ROLE |
| | ambiguous | concrete |
+-----------+-----------+----------+
| T | | |
| A ambig. | | 1 |
| R | | |
| G -------+-----------+----------+
| E | | |
| T concr. | 2 | ? |
| | | |
+-----------+-----------+----------+
Option two has very pragmatic benefits for my project, but I do not feel comfortable with so many nulls (you might not even call it 1NF!)
So here's the crux of my question for SO: How can option 1 be improved, or else what knowledge gap might I have that leaves me unsettled? While I can't bring to mind a specific rule which it violates, the design clearly is not in keeping with the intentions of normalization (requiring two columns to uniquely identify a relationship is not doing me any favors for safeguarding against anomalies).
I do understand that the ideal solution would be to redesign the users entity to be the same as what I have been calling "entity" here, but please consider that beside the point/circumstantial (or at least let's draw the pragmatic line right exactly there for this question).
Again, the essential question: What, in terms of normalization, is wrong with schema option 1, and how might you model this relationship given the constraint of not refactoring "user" into "entity"?
note: For this, I am more interested in theoretical purity than a pragmatic solution
The solutions you present contravene 4th normal form as #podiluska says. If this is recast into the form below, then the solution removes this difficult and is in 5NF (and even 6NF?).
Adopt one of the patterns for sub/super types. This uses the relation definitions set out below, plus the super/subtype constraint. This constraint is that each tuple in the super type relation must correspond exactly to one sub type tuple. In other words, the subtypes must form a disjoint, covering set over the supertype.
I suspect the performance of this in a real situation might require some heavy tuning:
Table: Employment
+---------------+-------------+
| employee_id | employer_id |
+---------------+-------------+
| 1 | 435 |
+---------------+-------------+
| 2 | 435 |
+---------------+-------------+
| 3 | 435 |
+---------------+-------------+
Table: Employee (SuperType)
+---------------+
| employee_id |
+---------------+
| 1 |
+---------------+
| 2 |
+---------------+
| 3 |
+---------------+
Table: User employee (SubType)
+---------------+-------------+
| employee_id | user_id |
+---------------+-------------+
| 1 | 123 |
+---------------+-------------+
| 3 | 632 |
+---------------+-------------+
Table: Entity employee (SubType)
+---------------+-------------+
| employee_id | entity_id |
+---------------+-------------+
| 2 | 123 |
+---------------+-------------+
What is wrong with option 1 ( and option 2) is that it is a multivalued dependency, and as such, a breach of 4th normal form. However, within the constraints you have given, there's not a lot you can do about that.
If you could replace the worksfor table with a view, then you could keep user-company and company-company relations separate.
Of your two choices, Option 2 has the advantage that it may be easier to enforce the referential integrity, depending on your platform.
One potential, if icky, pragmatic solution within you current constraints could be to give companies positive IDs and users negative IDs which eliminates the empty column of option 2 and turns the type column of option 1 into an implication, but I feel dirty even suggesting it.
Similarly, if you don't need to know what type the entity is as long as you can determine it via joining, then using Guids as IDs would eliminate the need for the type column