How to deal with a 'self' relationship in SQL?

In our application, we have clients, and each client has a list of customers.
client table:
id | name
-------------
1 | happy
2 | bashful
customer table:
id | client_id | name
----------------------------------------
50 | 1 | happys first customer
51 | 1 | happys second customer
52 | 2 | bashfuls first customer
Without going into too much detail, each client is going to have a list of prices that apply to them. For simplicity's sake, we'll say we also have a product table with product ids 1,2 and 3, and every customer will have a unique price against each item. So customer 50 will have 3 rows, customer 51 will have 3 rows, and customer 52 will have 3 rows in this price table.
price table:
id | customer_id | product_id | price
----------------------------------------
50 | 50 | 1 | 4.99
51 | 50 | 2 | 6.20
52 | 50 | 3 | 8.00
...
Now here's the kicker: each client should also have their own rows on this price table. We'll refer to this client price list as the 'base list', because in the context of the app it's what all the customer prices will be compared against.
There are three immediately obvious solutions to me, but I'm not sure if any of them are right, or which one is optimal:
Solution 1
Add a row into the customer table where the name is something like 'self', so that 'self' can be treated almost like a client
Solution 2
Make the price table have two foreign key columns, one with customer_id and one with client_id, and allow customer_id to be null -- if customer_id is null, I know that row is the client row.
Solution 3
Have 2 price tables that are basically identical, one to foreign-key into customers and one to foreign-key into clients.

It is a good idea to declare foreign key relationships. No database that I know of supports conditional foreign key relationships, so that eliminates having one column for both clients and their customers.
You have not specified if customers are unique to clients, so let me assume that they are not.
That suggests that Options 2 and 3 are the most reasonable. There is actually little to separate them. With a single table, you want a check constraint that exactly one of the ids is set -- unless you have customers shared across clients and you are allowing client-specific, customer-specific, and customer-client specific prices.
The more important consideration, I think, is that prices and relationships change over time. You should be thinking about how to incorporate effective and end dates into the data model to capture this information.
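As a rough sketch of the single-table variant (Option 2) with the check constraint and the date columns mentioned above: the product table, the price, effective_date, and end_date column names, and all column types are assumptions for illustration, not part of the original schema.

```sql
-- Sketch of Option 2: one price table whose rows belong either to a
-- client (base list) or to a customer. Names and types are assumptions.
CREATE TABLE price (
    id             INTEGER PRIMARY KEY,
    client_id      INTEGER REFERENCES client (id),    -- set for base-list rows
    customer_id    INTEGER REFERENCES customer (id),  -- set for customer rows
    product_id     INTEGER NOT NULL REFERENCES product (id),
    price          DECIMAL(10, 2) NOT NULL,
    effective_date DATE NOT NULL,
    end_date       DATE,                              -- NULL while the price is current
    -- Exactly one of the two owner columns must be filled in.
    CONSTRAINT price_one_owner CHECK (
        (client_id IS NOT NULL AND customer_id IS NULL)
        OR (client_id IS NULL AND customer_id IS NOT NULL)
    )
);
```

Check constraint support varies by engine. MySQL, for example, only enforces CHECK from version 8.0.16 onward, so on older versions the rule would have to be enforced some other way.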

Related

Auto generate columns in Microsoft Access table

How can we auto-generate columns/fields in a Microsoft Access table?
Scenario:
I have a table with the personal details of my employees (EmployDetails).
I want to record their daily attendance in another table.
Rather than using a separate record for each day, I want to use a single record per employee.
E.g., I want to create a table with fields like these:
EmployID, 01Jan2020, 02Jan2020, 03Jan2020, ... 25May2020, and so on.
That means I would have to generate a new column automatically every day.
Can anybody help me?
Generally you would define columns manually (whether that is through a UI or SQL).
With the information given I think the proper solution is to have two tables.
You have your "EmployDetails" table, in which you would put their general info (name, contact information, etc.) and the key, which would be the employee ID (it can be autogenerated or manual; it just needs to be unique).
You would have a second table with a foreign key to the employee ID in "EmployDetails", a column called Date, and another called details (or whatever you are trying to capture in your date column idea).
Then you simply add a row for each day, and you do a join query between the tables to look up all the "days" for an employee. This is called normalisation, and it is how relational databases (such as Access) are designed to be used.
Employee Table:
EmpID | NAME | CONTACT
----------------------
1 | Jim | 222-2222
2 | Jan | 555-5555
Detail table:
DetailID | EmpID (foreign key) | Date | Hours_worked | Notes
-------------------------------------------------------------
10231 | 1 | 01Jan2020| 5 | Lazy Jim took off early
10233 | 2 | 02Jan2020| 8 | Jan is a hard worker
10240 | 1 | 02Jan2020| 7.5 | Finally he stays a full day
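To find what Jim worked you do a join (Access requires the INNER keyword, and Name and Date are bracketed because they are reserved words there):
SELECT Employee.EmpID, Employee.[Name], Details.[Date], Details.Hours_worked, Details.Notes
FROM Employee
INNER JOIN Details ON Employee.EmpID = Details.EmpID
WHERE Employee.[Name] = 'Jim';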
Of course this will give you a normalised result (which is generally what's wanted so you can iterate over it):
EmpID | NAME | Date | Hours_worked | Notes
-----------------------------------------------
1 | Jim | 01Jan2020 | 5 | ......
1 | Jim | 02Jan2020 | 7.5 | ......
If you want the results denormalised you'll have to look into pivot tables.
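In Access, one way to get that denormalised layout is a crosstab query. A minimal sketch, assuming the Details table and the column names from the example above:

```sql
-- Access crosstab: one row per employee, one column per distinct date,
-- each cell holding the hours worked on that date.
TRANSFORM Sum(Details.Hours_worked)
SELECT Details.EmpID
FROM Details
GROUP BY Details.EmpID
PIVOT Details.[Date];
```

The date columns are produced when the query runs, so the stored table keeps its one-row-per-day shape and no column ever has to be added by hand.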
See more on creating foreign keys

Why is INNER JOIN producing more records than original file?

I have two tables. Table A & Table B. Table A has 40516 rows, and records sales by seller_id. The first column in Table A is the seller_id that repeats every time a sale is made.
Example: Table A (40516 rows)
seller_id | item | cost
------------------------
1 | dog | 5000
1 | cat | 50
4 |lizard| 80
5 |bird | 20
5 |fish | 90
The seller_id is also present in Table B, and also contains the corresponding name of the seller.
Example: Table B (5851 rows)
seller_id | seller_name
-------------------------
1 | Dog and Cat World INC
4 | Reptile Love.com
5 | Ocean Dogs Inc
I want to join these two tables, but only display the seller name from Table B and all other columns from Table A. When I do this with an INNER JOIN I get 40864 rows (348 extra rows). Shouldn't the query produce only the original 40516 rows?
Also not sure if this matters, but the seller_id can contain several zeros before the number (e.g., 0000845, 0000549).
I've looked around on here and haven't really found an answer. I've tried LEFT and RIGHT joins and get the same results for one and way more results for the other.
SQL Code Example:
SELECT public.table_B.seller_name, *
FROM public.table_A
INNER JOIN public.table_B ON public.table_A.seller_id = public.table_B.seller_id;
Expected Results:
seller_name | seller_id | item | cost
------------------------------------------------
Dog and Cat World INC | 1 | dog | 5000
Dog and Cat World INC | 1 | cat | 50
Reptile Love.com | 4 |lizard| 80
Ocean Dogs Inc | 5 |bird | 20
Ocean Dogs Inc | 5 |fish | 90
I expected the results to contain the same number of rows as Table A. Instead I get the names matching up and an additional 348 rows...
Update:
I changed "unique_id" to "seller_id" in the question.
I guess I should have chosen a better name for unique_id in the original example. I didn't mean it to be unique in the sense of a key. It is just the seller's id that repeats every time there is a sale (in Table A). The seller's ID does repeat in Table A because it is supposed to. I simply want to pair up the seller IDs with the seller names.
Thanks again everyone for their help!
unique_id is already not correctly named in the first table, so there is no reason to assume it is unique in the second table either.
Run this query to find the duplicates:
select unique_id
from table_b
group by unique_id
having count(*) > 1;
You can fix the query using distinct on:
SELECT b.seller_name, a.*
FROM public.table_A a JOIN
(SELECT DISTINCT ON (b.unique_id) b.*
FROM public.table_B b
ORDER BY b.unique_id
) b
ON a.unique_id = b.unique_id;
In this case, you may get fewer records, if there are no matches. To fix that, use a LEFT JOIN.
Because unique id column is not unique.
Gordon Linoff was correct. The seller_id (formerly listed as unique_id) was indeed duplicated throughout the data set. I foolishly assumed otherwise. The seller_name had many duplicates too! In the end I had to use the CONCAT() function to join the seller_id with a second identifier to create a type of foreign key. After I did this the join worked as expected. Thanks everyone!
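A common alternative to concatenating the two identifiers is to join on both columns directly, which avoids ambiguous strings (for example '1' plus '23' and '12' plus '3' both concatenate to '123'). The sketch below assumes a hypothetical column branch_id standing in for the second identifier, which the post does not name, and assumes it exists in both tables.

```sql
-- branch_id is a hypothetical stand-in for the unnamed second identifier.
-- First confirm the pair is unique in table_B (this should return no rows):
SELECT seller_id, branch_id, COUNT(*) AS copies
FROM public.table_B
GROUP BY seller_id, branch_id
HAVING COUNT(*) > 1;

-- Then join on both columns instead of a concatenated string.
-- LEFT JOIN keeps every row of table_A even when no seller matches.
SELECT b.seller_name, a.*
FROM public.table_A AS a
LEFT JOIN public.table_B AS b
       ON a.seller_id = b.seller_id
      AND a.branch_id = b.branch_id;
```

If the first query returns nothing, the second should return exactly as many rows as table_A.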

Proper Way to Key Data Warehouse Fact Table

When keying a FACT table in a data warehouse, is it better to use the primary key from the foreign table or the unique key or identifier used by the business?
For example (see below illustration), assume you have two dimension tables "DimStores" and "DimCustomers" and one FACT table named "FactSales". Both of the dimension tables have an indexed primary key field that is an integer data type and is named "ID". They also have an indexed unique business key field that is an alphanumeric text data type named "Number".
Typically you'd use the primary key of dimension tables as the foreign keys in the FACT table. However, I'm wondering if that is the best approach.
By using the primary key, in order to look up or do calculations on the facts in the FACT table, you'd likely have to always do a join query on the primary key and use the business key as your look up. The reason is because most users won't know the primary key value to do a lookup in the FACT table. They will, however, likely know the business key. Therefore to use that business key you'd have to do a join query to make the relationship.
Since the business key is indexed anyway, would it be better to just use that as the foreign key in the FACT table? That way you wouldn't have to do a join and just do your lookup or calculations directly?
I guess it boils down to whether join queries are that expensive? Imagine you're dealing with a billion record FACT table and dimensions with tens of millions of records.
Example tables:
DimStores:
+------------+-------------+-------------+
| StoreId | StoreNumber | StoreName |
+------------+-------------+-------------+
| 1 | S001 | Los Angeles |
| 2 | S002 | New York |
+------------+-------------+-------------+
DimCustomers:
+------------+----------------+--------------+
| CustomerId | CustomerNumber | CustomerName |
+------------+----------------+--------------+
| 1 | S001 | Michael |
| 2 | S002 | Kareem |
| 3 | S003 | Larry |
| 4 | S004 | Erving |
+------------+----------------+--------------+
FactSales:
+---------+------------+------------+
| StoreId | CustomerId | SaleAmount |
+---------+------------+------------+
| 1 | 1 | $400 |
| 1 | 2 | $300 |
| 2 | 3 | $200 |
| 2 | 4 | $100 |
+---------+------------+------------+
In the above, to get the total sales for the Los Angeles store I'd have to do this:
Select Sum(SaleAmount)
From FactSales FT
Inner Join DimStores D1 ON FT.StoreId = D1.StoreId
Where D1.StoreNumber = 'S001'
Had I used the "StoreNumber" and "CustomerNumber" fields as the foreign keys in the "FactSales" table instead, I wouldn't have had to do a join query and could have done this directly:
Select Sum(SaleAmount)
From FactSales
Where StoreNumber = 'S001'
The reason you use artificial primary keys is to isolate the data warehouse from business decisions.
Your business grows. Now you have more than 1000 stores. The keys for the stores change. How do you handle this?
If the store key is spread throughout your data warehouse, this is a painful operation. If the store key is just an attribute on a dimension table, then this is easy.
I should also note that in many cases, the dimensions might be type 2 dimensions -- meaning that they change over time. For instance, customers can change their names, but you might want to know what their name was at a particular point in time.
And a third reason. Artificial primary keys are usually integers. These are better for indexing than strings (particularly strings with variable lengths). The difference in performance is minor, but it is a reason to use the primary keys. In fact, if the keys are strings and are longer than integers, it might be more efficient to use the artificial keys in terms of space.
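To make the first two points concrete, here is a minimal sketch of the layout described above. The ValidFrom and ValidTo columns, the column types, and the new store number 'S0001' are assumptions for illustration, and DimCustomers is assumed to exist already with CustomerId as its primary key.

```sql
-- Dimension: surrogate key for joins, business key as a plain attribute.
CREATE TABLE DimStores (
    StoreId     INTEGER PRIMARY KEY,      -- artificial (surrogate) key
    StoreNumber VARCHAR(20) NOT NULL,     -- business key, free to change
    StoreName   VARCHAR(100) NOT NULL,
    ValidFrom   DATE NOT NULL,            -- hypothetical type 2 columns
    ValidTo     DATE                      -- NULL marks the current version
);

-- Fact table: references dimensions only through the surrogate keys.
CREATE TABLE FactSales (
    StoreId    INTEGER NOT NULL REFERENCES DimStores (StoreId),
    CustomerId INTEGER NOT NULL REFERENCES DimCustomers (CustomerId),
    SaleAmount DECIMAL(12, 2) NOT NULL
);

-- Renumbering a store touches only the dimension, never the fact table.
UPDATE DimStores
SET StoreNumber = 'S0001'
WHERE StoreNumber = 'S001';
```

With the business key stored in FactSales instead, the same renumbering would have to rewrite every matching fact row.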

Trying to find non-duplicate entries in mostly identical tables(access)

I have 2 different databases. They track different things about inventory. In essence they share 3 common fields: location, item number, and quantity. I've extracted these into 2 tables with only those fields. Every time I find an answer, it doesn't cover all the test cases, just some of the fields.
Items can be in multiple locations, and in turn each location can have multiple items. The primary key would be location and item number.
I need to flag when an entry doesn't match on all three fields.
I've only been able to find answers that match on a single ID, or whose queries are beyond my comprehension. In the example below, I'd need a query that shows that rows 1, 2, and 5 have issues. I'd run it on each table and have to verify it with a physical inventory.
Please refrain from commenting on it being silly to have information in 2 different databases. All I get in response is to deal with it =P
Table A
Location | ItemNum | QTY
-------------------------
1a1a | as1001 | 5
1a1b | as1003 | 10
1a1b | as1004 | 2
1a1c | as1005 | 15
1a1d | as1005 | 15
Table B
Location | ItemNum | QTY
-------------------------
1a1a | as1001 | 10
1a1d | as1003 | 10
1a1b | as1004 | 2
1a1c | as1005 | 15
1a1e | as1005 | 15
This article seemed to do what I wanted but I couldn't get it to work.
To find entries in Table A that don't have an exactly matching entry in Table B:
select A.*
from A
left join B on A.location = B.location and A.ItemNum = B.ItemNum and A.qty = B.qty
where B.location Is Null
Just swap all the A's and B's to get the list of entries in B with no matching entry in A.
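Both directions can also be combined into one result with UNION ALL. This is a sketch under the same assumption as the query above, namely that the two tables are named A and B; the SourceTable column is added here only to show which side each row came from.

```sql
-- Rows in A with no exact match in B ...
SELECT 'A' AS SourceTable, A.Location, A.ItemNum, A.QTY
FROM A
LEFT JOIN B
       ON (A.Location = B.Location)
      AND (A.ItemNum = B.ItemNum)
      AND (A.QTY = B.QTY)
WHERE B.Location IS NULL

UNION ALL

-- ... followed by rows in B with no exact match in A.
SELECT 'B' AS SourceTable, B.Location, B.ItemNum, B.QTY
FROM B
LEFT JOIN A
       ON (B.Location = A.Location)
      AND (B.ItemNum = A.ItemNum)
      AND (B.QTY = A.QTY)
WHERE A.Location IS NULL;
```

On the sample data this flags rows 1, 2, and 5 of each table. Note that a NULL in any of the three columns never compares equal, so such rows would always be reported as unmatched.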

MySQL query for initial filling of order column

Sorry for vague question title.
I've got a table containing huge list of, say, products, belonging to different categories. There's a foreign key column indicating which category that particular product belongs to. I.e. in "bananas" row category might be 3 which indicates "fruits".
Now I added an additional column "order", which is for the display order within that particular category. I need to do the initial ordering. Since the list is big, I don't want to change every row by hand. Is it possible to do with one or two queries? I don't care what the initial order is as long as it starts with 1 and goes up.
I can't do something like SET order = id because id counts from 1 up regardless of product category, and order must start anew from 1 up for every different category.
Example of what I need to achieve:
ID | product | category | Order
1 | bananas | fruits | 1
2 | chair | furniture | 1
3 | apples | fruits | 2
4 | cola | drinks | 1
5 | mango | fruits | 3
6 | pepsi | drinks | 2
(category is actually a number because it's foreign key, in example I put names just for clarification)
As you see, order numbers start anew from 1 for each different category.
Sounds like something a SQL procedure would be handy for.
Why not just set the order to the category? That is, why not:
update Table
set SortOrder = Category;
As an aside, order is a reserved word in SQL, so a column with that name has to be quoted (with backticks in MySQL) everywhere it is used. A name such as SortOrder avoids the problem.
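If the numbering does need to restart at 1 within each category, as in the example table in the question, one possible approach on MySQL 8.0 or later is a window function joined back to the table. This is a sketch, not a tested migration: the table name products is an assumption, and the column is called SortOrder as in the query above.

```sql
-- Requires MySQL 8.0+ for ROW_NUMBER(). The table name is hypothetical.
-- Number the rows 1, 2, 3, ... inside each category, ordered by id,
-- then copy that number into SortOrder.
UPDATE products AS p
JOIN (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY id) AS rn
    FROM products
) AS r ON r.id = p.id
SET p.SortOrder = r.rn;
```

On the sample rows this gives bananas, apples, and mango the values 1, 2, and 3, chair the value 1, and cola and pepsi the values 1 and 2, which matches the table in the question.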