Using 'LIKE' in a join (Replacing CASE with dimension table) - sql

I've successfully prototyped a system that is working quite well and it is now time for me to go back and clean some things up before proceeding - as per suggestion from my senior.
In a generic sense, we're using views to only give us customers from one company and group them into one parent company. For example, grouping 'Tesco UK & Ireland' as company 'Tesco'.
I do this with:
CASE
WHEN CustName = 'Tesco UK & Ireland' THEN 'Tesco'
ELSE CustName
END
However, there is one issue with this approach (which works until you need to incorporate the grouping as a dimension table). Some companies have many more specific names, and covering all of them would take a CASE statement hundreds of lines long; at other times the customer names aren't uploaded correctly. Another example with a random company: 'PC World' is what I'd expect, although sometimes I'm given 'Currys PC World', 'PC World Glasgow' and other variations of that. So to combat this I've tried:
CASE
WHEN CustName LIKE 'Tesco UK & Ireland' THEN 'Tesco'
WHEN CustName LIKE '%PC World%' THEN 'PC World Other'
ELSE CustName
END
However, I was wondering if there is a way to incorporate this in a dimension/mapping table?
Ideally, I'd like to join on CustName to a dimension table and be given the generic name.
Any ideas?
Paul.

I would suggest that you maintain a mapping table with two (important) columns, one for the original name and one for the mapped name. In queries, you would use a left join.
Here is an example:
create table CompanyNameSynonyms (
CompanyNameSynonymId int identity(1, 1) primary key,
CompanyName varchar(255) unique,
MappedName varchar(255),
CreatedAt datetime default getdate()
);
Then a query would look like:
select coalesce(cns.MappedName, t.CompanyName) as Name, count(*)
from t left join
CompanyNameSynonyms cns
on t.CompanyName = cns.CompanyName
group by coalesce(cns.MappedName, t.CompanyName);
You do need to populate this with all examples of the alternative names, and then keep the data up-to-date. However, I consider this a benefit for three reasons. First, being explicit is usually a good idea for such reports, to avoid unnecessary confusion. Second, the join using = is faster than join using like with wildcards. Third, all the code that uses this table will be updated when the table is updated.
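To try the mapping-table idea end to end, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The sample rows and the second table, CompanyNamePatterns, are assumptions made up for illustration; the pattern table shows how a LIKE join could work if you prefer wildcards over listing every variation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CompanyNameSynonyms (
    CompanyName TEXT PRIMARY KEY,   -- name exactly as it arrives
    MappedName  TEXT NOT NULL       -- generic name to report under
);
-- Hypothetical alternative: one wildcard pattern per group.
CREATE TABLE CompanyNamePatterns (
    Pattern    TEXT PRIMARY KEY,
    MappedName TEXT NOT NULL
);
CREATE TABLE t (CompanyName TEXT);  -- stand-in for the customer view

INSERT INTO CompanyNameSynonyms VALUES
    ('Tesco UK & Ireland', 'Tesco'),
    ('Currys PC World',    'PC World'),
    ('PC World Glasgow',   'PC World');
INSERT INTO CompanyNamePatterns VALUES
    ('Tesco%',     'Tesco'),
    ('%PC World%', 'PC World');
INSERT INTO t VALUES
    ('Tesco UK & Ireland'), ('Currys PC World'),
    ('PC World Glasgow'), ('PC World'), ('Asda');
""")

# Exact-match join: unmapped names fall back to themselves via COALESCE.
exact_rows = conn.execute("""
    SELECT COALESCE(cns.MappedName, t.CompanyName) AS Name, COUNT(*) AS n
    FROM t
    LEFT JOIN CompanyNameSynonyms cns ON t.CompanyName = cns.CompanyName
    GROUP BY COALESCE(cns.MappedName, t.CompanyName)
    ORDER BY Name
""").fetchall()

# LIKE join: each row is matched against every pattern in the table.
like_rows = conn.execute("""
    SELECT COALESCE(p.MappedName, t.CompanyName) AS Name, COUNT(*) AS n
    FROM t
    LEFT JOIN CompanyNamePatterns p ON t.CompanyName LIKE p.Pattern
    GROUP BY COALESCE(p.MappedName, t.CompanyName)
    ORDER BY Name
""").fetchall()

print(exact_rows)
print(like_rows)
```

With the LIKE variant, a name that matches more than one pattern is counted once per match, so the patterns have to be kept mutually exclusive.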

Related

Display correct results in many to many relationship tables

I currently have three tables.
master_tradesmen
trades
master_tradesmen_trades (joins the previous two together in a many-to-many relationship). The 'trade_id' and 'master_tradesman_id' are the foreign keys.
Here is what I need to happen. A user performs a search and types in a trade. I need a query that displays all of the information from the master_tradesmen table whose trade in the master_tradesmen_trade table matches the search. For example, if 'plumbing' is typed in the search bar (trade_id 1), all of the columns for Steve Albertsen (master_tradesman_id 6) and Brian Terry (master_tradesman_id 8) would be displayed from the master_tradesmen table. As a beginner to SQL, trying to grasp the logic of this is about to make my head explode. I'm hoping that someone with more advanced SQL knowledge can wrap their head around this much easier than I can. Note: the 'trades' column in master_tradesmen is for display purposes only, not for querying. Thank you so much in advance!
You have a catalog for the tradesmen, & another catalog for the trades.
The trades should only appear once in the trades catalog in order to make your DB more consistent
Then you have your many-to-many table which connects the trades & master tradesmen tables.
If we want to get the tradesmen for the trade given in the input, we first need to
know the id of that trade, which has to be unique, so in your DB you would have something
like the image below:
Now we can write a query to select the id of the trade:
DECLARE @id_trade int = (SELECT trade_id FROM trades WHERE trade_name LIKE '%plumbing%');
Once we know the trade id, we can go to the 'master_tradesmen_trades' table to find out who works in that trade:
SELECT * FROM master_tradesmen_trades WHERE trade_id = @id_trade
You will get the following result:
You may say, 'But there is still something wrong with it, as I am still not able to see the tradesmen'. This is the moment when we make an inner join:
SELECT * FROM master_tradesmen_trades trades_and_tradesmen
INNER JOIN master_tradesmen tradesmen
ON tradesmen.id = trades_and_tradesmen.master_tradesman_id
WHERE trade_id = @id_trade
If you need to see specific columns, you can do:
SELECT first_name, last_name, city, state FROM master_tradesmen_trades trades_and_tradesmen
INNER JOIN master_tradesmen tradesmen
ON tradesmen.id = trades_and_tradesmen.master_tradesman_id
WHERE trade_id = @id_trade
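The lookup and the join can also be done in a single query. Below is a small sketch with Python's sqlite3 module so it can be run without a server. The column names (id, first_name, last_name, trade_name), the third tradesman and the 'roofing' trade are assumptions; only the two plumbers and their ids come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master_tradesmen (
    id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT
);
CREATE TABLE trades (
    trade_id INTEGER PRIMARY KEY, trade_name TEXT UNIQUE
);
CREATE TABLE master_tradesmen_trades (
    master_tradesman_id INTEGER REFERENCES master_tradesmen(id),
    trade_id            INTEGER REFERENCES trades(trade_id),
    PRIMARY KEY (master_tradesman_id, trade_id)
);
INSERT INTO master_tradesmen VALUES
    (6, 'Steve', 'Albertsen'), (7, 'Ann', 'Example'), (8, 'Brian', 'Terry');
INSERT INTO trades VALUES (1, 'plumbing'), (2, 'roofing');
INSERT INTO master_tradesmen_trades VALUES (6, 1), (7, 2), (8, 1), (8, 2);
""")

def tradesmen_for(search):
    """Return the tradesmen whose trade name contains the search text."""
    return conn.execute("""
        SELECT tm.id, tm.first_name, tm.last_name
        FROM trades t
        JOIN master_tradesmen_trades mtt ON mtt.trade_id = t.trade_id
        JOIN master_tradesmen tm ON tm.id = mtt.master_tradesman_id
        WHERE t.trade_name LIKE ?
        ORDER BY tm.id
    """, (f"%{search}%",)).fetchall()

plumbers = tradesmen_for("plumbing")
print(plumbers)
```

Passing the search text as a bound parameter (the ? placeholder) keeps user input out of the SQL string.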

Which of these is preferable? Adding a column to the table or using sub-query to get data?

What I meant was, there's a Table A having 5 columns. I have a SP that I use to get 3 columns from Table A and one column from Table B. Now, would it be better to add the column from Table B to Table A or use a sub-query in that SP to get that column from Table B?
Your question is still confusing and it very much looks like you haven't understood how to use a relational database. So let me try to explain:
Let's say you have two tables:
client
client_number
first_name
last_name
date_of_birth
...
order
order_number
client_number
order_date
...
These are two separate tables, so as to have a normalized relational database. The order table contains the client number, so you can look up the client's name and date of birth in the client table. The date of birth may be important in order to know whether the client is allowed to order certain articles. But you don't want to store the date of birth with every order - it doesn't change.
If you want to look up the age you can use a sub query:
select
order_number,
order_date,
quantity,
(
select date_of_birth
from client c
where c.client_number = o.client_number
)
from order o
where item = 'whisky';
but most often you would simply join the tables:
select
o.order_number,
o.order_date,
o.quantity,
c.date_of_birth,
c.first_name,
c.last_name
from order o
join client c on c.client_number = o.client_number;
You would however not change your tables and invite redundancy with all its problems only not to have to join. You design your database such that it is a well-formed relational database, not such that it makes your latest query easy to write. It is very, very common to use joins and subqueries, and having to use them usually shows that you built your database well.
I think this is a good time to look up database normalization, e.g. in Wikipedia https://en.wikipedia.org/wiki/Database_normalization.
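To see that the subquery and the join give the same answer without touching the table design, here is a small sketch using Python's sqlite3 module. The sample rows are made up, and the table is called orders here (an assumption) because order is a reserved word in most databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (
    client_number INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, date_of_birth TEXT
);
CREATE TABLE orders (
    order_number  INTEGER PRIMARY KEY,
    client_number INTEGER REFERENCES client(client_number),
    order_date TEXT, item TEXT, quantity INTEGER
);
INSERT INTO client VALUES
    (1, 'Ann', 'Lee', '1980-05-01'), (2, 'Bob', 'Ray', '2010-09-30');
INSERT INTO orders VALUES
    (10, 1, '2024-01-02', 'whisky', 1),
    (11, 2, '2024-01-03', 'juice',  3),
    (12, 1, '2024-01-04', 'whisky', 2);
""")

# Correlated subquery: the date of birth is looked up once per order row.
via_subquery = conn.execute("""
    SELECT o.order_number,
           (SELECT c.date_of_birth
            FROM client c
            WHERE c.client_number = o.client_number) AS date_of_birth
    FROM orders o
    WHERE o.item = 'whisky'
    ORDER BY o.order_number
""").fetchall()

# Join: same result, and more client columns can be added cheaply.
via_join = conn.execute("""
    SELECT o.order_number, c.date_of_birth
    FROM orders o
    JOIN client c ON c.client_number = o.client_number
    WHERE o.item = 'whisky'
    ORDER BY o.order_number
""").fetchall()

print(via_subquery)
print(via_join)
```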

SQL Access with Table

It won't let me upload an image, but the columns are OrderID, CustomerName, CustomerAddress, ProductNumber, SellDate, ProductDescription.
I am trying to teach myself SQL. Could someone please help me identify a few things?
1) I want to write a SQL statement that retrieves the customer name and address of the customer that placed order 7.
Is this right?
Select CustomerName, Address
From Order
Where OrderID = ‘7’
2)Next I want to write an SQL statement that adds a new order to the Order table.
Is this right?
INSERT INTO order(OrderID, CustomerName, CustomerAddress, ProductNumber, SellDate, ProductDescription)
VALUES (8, 'Ben C', '12 Kents Road', 01/15/2012, Clay :));
3) What is wrong with this data model and how would you redesign it? I really need help here. Does it need to be sorted? How could I describe a new high level design?
4) How would I move this data from an old model to a new model?
5)Using the new data model, I need to write a JOIN that retrieves the customer name and address of the customer that placed order 7. I have not gotten here yet because I am not sure why the old data model is bad.
First, you need to answer a question:
Can a customer place more than one order? If your answer is 'yes', would you like to have a customer catalog?
In this scenario, you need to normalize your database. First of all, you need to separate the data into logical sets; in this case, Customers, Products and Orders... I will assume that an order can have one or more products.
Then, design your tables (I will use MySQL style for the code):
Your customers catalog:
create table tbl_customers (
customerId int not null primary key,
customerName varchar(100),
customerAdress varchar(200)
);
Your products catalog:
create table tbl_products (
productNumber int not null primary key,
productName varchar(100)
);
Your orders catalog:
create table tbl_orders (
orderId int not null primary key,
orderDate date,
customerId int not null
);
For each order, you will need to know how many 'units' of which products you will be ordering:
create table tbl_orders_products (
orderProductId int not null primary key,
orderId int not null,
productNumber int not null,
units int
);
After this, you will populate your tables with your data, and then you can perform whichever query fits you.
A few notes:
tbl_orders is related with tbl_customers... your customer's data will have to be inserted in tbl_customers before he can place an order.
Before you insert the order's details, you will need to create the order
Before you insert the order's details, you will need to populate tbl_products
This is just a way to solve it.
Hope this helps you
Now, if you want to move to this model, you have some work to do:
Populate your products catalog: insert into tbl_products values (1,'productA'), (2, 'productB'), ...
Populate your customers catalog
Then you can start placing your orders. I'll assume that you have the following customers:
customerId | customerName | customerAdress
---------------------------------------------
1 | John Doe | 31 elm street
2 | Jane Doe | 1223 park street
... and products:
productNumber | productName
------------------------------
1000 | Pencil
2000 | Paper clip
3000 | Bottled water
Now, placing an order is a two-step process: first, create the order record, and then insert the order details:
The order (Customer John Doe): insert into tbl_orders values (1, '2012-10-17', 1);
The order details (one pencil, ten paper clips): insert into tbl_orders_products values (1, 1, 1000, 1), (2, 1, 2000, 10);
Now, to select the customer for order seven (as stated in your question), you can use the following query:
select c.*
from tbl_customers as c
inner join tbl_orders as o on c.customerId = o.customerId
where o.orderId = 7;
This is just a start point. You should look for good books or online tutorials (w3 tutorials can be a good online 'place' to start).
Although I don't quite like MS Access, it's a good program to learn the basics of sql, and there're a lot of books and learning resources for it. Note: I don't like Access, and I don't mean to advertise it, but it might be a good learning tool.
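If you want to play with this model without installing anything, the sketch below builds it in an in-memory SQLite database through Python's sqlite3 module. The foreign key clauses and the order number 7 placed by Jane Doe are my additions for illustration; SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
CREATE TABLE tbl_customers (
    customerId INTEGER NOT NULL PRIMARY KEY,
    customerName TEXT, customerAdress TEXT
);
CREATE TABLE tbl_products (
    productNumber INTEGER NOT NULL PRIMARY KEY, productName TEXT
);
CREATE TABLE tbl_orders (
    orderId INTEGER NOT NULL PRIMARY KEY,
    orderDate TEXT,
    customerId INTEGER NOT NULL REFERENCES tbl_customers(customerId)
);
CREATE TABLE tbl_orders_products (
    orderProductId INTEGER NOT NULL PRIMARY KEY,
    orderId INTEGER NOT NULL REFERENCES tbl_orders(orderId),
    productNumber INTEGER NOT NULL REFERENCES tbl_products(productNumber),
    units INTEGER
);
INSERT INTO tbl_customers VALUES
    (1, 'John Doe', '31 elm street'), (2, 'Jane Doe', '1223 park street');
INSERT INTO tbl_products VALUES (1000, 'Pencil'), (2000, 'Paper clip');
-- Step 1: the order itself; step 2: its detail lines.
INSERT INTO tbl_orders VALUES (7, '2012-10-17', 2);
INSERT INTO tbl_orders_products VALUES (1, 7, 1000, 1), (2, 7, 2000, 10);
""")

customer_of_order_7 = conn.execute("""
    SELECT c.*
    FROM tbl_customers AS c
    INNER JOIN tbl_orders AS o ON c.customerId = o.customerId
    WHERE o.orderId = 7
""").fetchall()
print(customer_of_order_7)

# An order for a customer that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO tbl_orders VALUES (8, '2012-10-18', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```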
First you need to normalise. There's a lot of stuff around on that, but loads of the tutorials take some common sense and make it really obscure.
Looking at your column names I see three tables
Customers(CustomerID, CustomerName, CustomerAddress)
CustomerOrders(OrderID, CustomerID, SellDate, ProductNumber) Try not to name your tables and columns and such the same as Sql keywords.
Products(ProductNumber, ProductDescription)
Normalisation says things like, you should be able to uniquely identify any records in the table, you had that with OrderId. When I split the tables up I added CustomerID, because you could have more than one customer with the same name.
Another simple rule is in your structure, if you had more than one order for a customer, you would be storing their name and address more than once, which is wasteful, but the real problem, is what if that customer changes address? How do you find which rows to change, you could do Where name = "Fred" and Address = "Here", but you don't know if you have more than one customer called Fred with an address of Here.
So your first query would be a join
Select Customers.CustomerName,Customers.CustomerAddress From Customers
Inner join CustomerOrders On Customers.CustomerID = CustomerOrders.CustomerID
Where CustomerOrders.OrderID = 7
Or if you want to skip past learning joins for now, you could do it with two queries
Select CustomerID From CustomerOrders Where OrderID = 7
then
Select CustomerName,CustomerAddress From Customers Where CustomerID = ?
You should be using joins, but you might find a sub-query a little easier to get your head round. You can do both queries at once with
Select CustomerName,CustomerAddress From Customers
Where CustomerID In (Select CustomerID From CustomerOrders Where OrderID = 7)
Don't know how far you've got with sql table creation, but Primary and Foreign keys are two things to look at. That will let you put some rules in the database. A primary Key on CustomerOrders will stop you having two orders with the same ID, which would be bad.
A foreign Key would stop you creating a CustomerOrder for a customer that did not exist, and one to products for a product that doesn't.
Assuming you went down that route and you were looking to write an application to order things.
You'd probably have a function to maintain Customers which would add them with something like
Insert Into Customers(CustomerID,CustomerName,CustomerAddress) Values(1,'Fred Bloggs','England')
And one For Products
Insert Into Products(ProductNumber,ProductDescription) Values(1,'A thingamabob')
Then you'd choose a customer, so you have its id, select a product so you have its number, so you don't need to mess with CustomerName, CustomerAddress or ProductDescription
and
Insert Into CustomerOrders(OrderID,CustomerID,ProductNumber,SellDate) Values(1,1,1,'20121017')
Note the date format, if you are going to pass dates as strings to sql, (another topic this) do them in YYYYMMDD, when you get them back out with a select, you'll get them in the format your machine/database is set up for which in your case seems to be mm/dd/yyyy. The problem is I deduced that because I know there are only twelve months in the year. (One of the things that makes me a senior dev :) If your example selldate had been 1/5/2012, I'd have read that as the 1st May, because I'm configured for English. Avoid that ambiguity at all costs it will reach out and hurt you on a regular basis.
PS the way you did it 1/15/2012 would be treated as a mathematical expression as in 1 divided by 15 ...
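Both points are easy to check. This is a small sketch using Python with SQLite as a stand-in engine (an assumption; other databases may return a fraction rather than 0 for the division), plus the standard datetime parser to show the day/month ambiguity.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")

# Without quotes, the "date" is just integer arithmetic: 1 / 15 / 2012.
unquoted = conn.execute("SELECT 1/15/2012").fetchone()[0]
print(unquoted)

# The same text read with US and with English (UK) conventions.
as_us = datetime.strptime("1/5/2012", "%m/%d/%Y").date()
as_uk = datetime.strptime("1/5/2012", "%d/%m/%Y").date()
print(as_us, as_uk)

# YYYYMMDD can only be read one way.
unambiguous = datetime.strptime("20121017", "%Y%m%d").date()
print(unambiguous)
```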
So the reason you couldn't write a join is basically you only had one table. Join is join one table to another. Well actually it's a bit more complex than that, but that's a good way past where you are in the learning curve.
As for moving the data, be quicker to start again I should think. Unlikely you have created two different customers with the same name, but the queries to move the data, would have to take into account that you could have.
To move the data, assuming CustomerID is an Identity (Autonumber) column
Something like
Insert into Customers(CustomerName,CustomerAddress)
Select Distinct CustomerName,CustomerAddress From [Order]
Would do the job for Customers.
Then for products
Insert into Products(ProductDescription)
Select Distinct ProductDescription From [Order]
Then
Insert into CustomerOrders(OrderID,CustomerID,ProductNumber,SellDate)
Select old.OrderID,c.CustomerID,p.ProductNumber,old.SellDate
From [Order] old
Inner Join Products p On p.ProductDescription = old.ProductDescription
Inner Join Customers c On c.CustomerName = old.CustomerName And c.CustomerAddress = old.CustomerAddress
might do CustomerOrders I think
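Here is a runnable sketch of that migration, using Python's sqlite3 module instead of Access. The three sample rows are invented, and INTEGER PRIMARY KEY plays the role of the Autonumber column. As in the statements above, product numbers are regenerated from the distinct descriptions rather than copied from the old table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Old flat table, one row per order.
CREATE TABLE [Order] (
    OrderID INTEGER PRIMARY KEY, CustomerName TEXT, CustomerAddress TEXT,
    ProductNumber INTEGER, SellDate TEXT, ProductDescription TEXT
);
INSERT INTO [Order] VALUES
    (7, 'Ben C', '12 Kents Road', 1, '20120115', 'Clay'),
    (8, 'Ben C', '12 Kents Road', 2, '20120116', 'Glaze'),
    (9, 'Amy D', '3 Mill Lane',   1, '20120117', 'Clay');

-- New model; INTEGER PRIMARY KEY auto-assigns ids in SQLite.
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, CustomerAddress TEXT
);
CREATE TABLE Products (
    ProductNumber INTEGER PRIMARY KEY, ProductDescription TEXT
);
CREATE TABLE CustomerOrders (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customers(CustomerID),
    ProductNumber INTEGER REFERENCES Products(ProductNumber),
    SellDate TEXT
);

INSERT INTO Customers (CustomerName, CustomerAddress)
SELECT DISTINCT CustomerName, CustomerAddress FROM [Order];

INSERT INTO Products (ProductDescription)
SELECT DISTINCT ProductDescription FROM [Order];

INSERT INTO CustomerOrders (OrderID, CustomerID, ProductNumber, SellDate)
SELECT old.OrderID, c.CustomerID, p.ProductNumber, old.SellDate
FROM [Order] old
INNER JOIN Products p ON p.ProductDescription = old.ProductDescription
INNER JOIN Customers c ON c.CustomerName = old.CustomerName
                      AND c.CustomerAddress = old.CustomerAddress;
""")

counts = tuple(
    conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("Customers", "Products", "CustomerOrders")
)
order_7 = conn.execute("""
    SELECT c.CustomerName, c.CustomerAddress
    FROM Customers c
    INNER JOIN CustomerOrders o ON c.CustomerID = o.CustomerID
    WHERE o.OrderID = 7
""").fetchall()
print(counts, order_7)
```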
A simple tip. When modelling a data solution, try to write down simple sentences that describe the scenario. For example (ok, it is just a basic one):
An order is made up of many order lines
An order line refers to a product
A customer creates many orders
Here, the nouns describe the entities of your scenario. Then, for each entity, try to describe its properties:
An order is characterized by a unique serial number, a date, a total. It refers to a customer.
An order line refers to a product, and is characterized by a quantity, a unit price, a sub total
A customer....
And so on.
Well, in your model you roughly have to create a table for each entity. The table fields are taken from the properties of each entity. For each field remember to define the proper data type.
Ok, this is NOT a modelling tutorial, but it is a starting point, just to approach the solution.

Where are Cartesian Joins used in real life?

Where are Cartesian Joins used in real life?
Can someone please give examples of such a Join in any SQL database.
Just a random example. You have a table of cities: Id, Lat, Lon, Name. You want to show the user a table of distances from one city to another. You will write something like
SELECT c1.Name, c2.Name, SQRT( (c1.Lat - c2.Lat) * (c1.Lat - c2.Lat) + (c1.Lon - c2.Lon)*(c1.Lon - c2.Lon))
FROM City c1, City c2
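A quick way to try this is the sketch below, which runs the query against three made-up cities in SQLite through Python's sqlite3 module. SQRT is registered from Python because not every SQLite build ships the math functions; the coordinates are toy values on a flat grid, not real latitudes and longitudes.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Provide SQRT ourselves in case this SQLite build lacks math functions.
conn.create_function("SQRT", 1, math.sqrt)
conn.executescript("""
CREATE TABLE City (Id INTEGER PRIMARY KEY, Lat REAL, Lon REAL, Name TEXT);
INSERT INTO City VALUES (1, 0, 0, 'A'), (2, 3, 4, 'B'), (3, 0, 8, 'C');
""")

# Every city paired with every city, including itself: 3 x 3 = 9 rows.
rows = conn.execute("""
    SELECT c1.Name, c2.Name,
           SQRT((c1.Lat - c2.Lat) * (c1.Lat - c2.Lat)
              + (c1.Lon - c2.Lon) * (c1.Lon - c2.Lon)) AS Distance
    FROM City c1 CROSS JOIN City c2
    ORDER BY c1.Id, c2.Id
""").fetchall()

distance = {(a, b): d for a, b, d in rows}
print(len(rows), distance[("A", "B")], distance[("A", "C")])
```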
Here are two examples:
To create multiple copies of an invoice or other document you can populate a temporary table with names of the copies, then cartesian join that table to the actual invoice records. The result set will contain one record for each copy of the invoice, including the "name" of the copy to print in a bar at the top or bottom of the page or as a watermark. Using this technique the program can provide the user with checkboxes letting them choose what copies to print, or even allow them to print "special copies" in which the user inputs the copy name.
CREATE TEMP TABLE tDocCopies (CopyName TEXT(20))
INSERT INTO tDocCopies (CopyName) VALUES ('Customer Copy')
INSERT INTO tDocCopies (CopyName) VALUES ('Office Copy')
...
INSERT INTO tDocCopies (CopyName) VALUES ('File Copy')
SELECT * FROM InvoiceInfo, tDocCopies WHERE InvoiceDate = TODAY()
To create a calendar matrix, with one record per person per day, cartesian join the people table to another table containing all days in a week, month, or year.
SELECT People.PeopleID, People.Name, CalDates.CalDate
FROM People, CalDates
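The calendar matrix is easy to check on a toy data set. This sketch uses Python's sqlite3 module with two invented people and three invented dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (PeopleID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CalDates (CalDate TEXT PRIMARY KEY);
INSERT INTO People VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO CalDates VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
""")

# No join condition: each person is paired with each date.
matrix = conn.execute("""
    SELECT People.PeopleID, People.Name, CalDates.CalDate
    FROM People, CalDates
    ORDER BY People.PeopleID, CalDates.CalDate
""").fetchall()

print(len(matrix))   # 2 people x 3 days
print(matrix[0], matrix[-1])
```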
I've noticed this being done to try to deliberately slow down the system either to perform a stress test or an excuse for missing development deliverables.
Usually, to generate a superset for the reports.
In PostgreSQL:
SELECT d.id, m.month, COALESCE(SUM(s.sales), 0) AS total_sales
FROM generate_series(1, 12) AS m(month)
CROSS JOIN
department d
LEFT JOIN
sales s
ON s.department = d.id
AND s.month = m.month
GROUP BY
d.id, m.month
This is the only time in my life that I've found a legitimate use for a Cartesian product.
At the last company I worked at, there was a report that was requested on a quarterly basis to determine what FAQs were used at each geographic region for a national website we worked on.
Our database described geographic regions (markets) by a tuple (4, x), where 4 represented a level number in a hierarchy, and x represented a unique marketId.
Each FAQ is identified by an FaqId, and each association to an FAQ is defined by the composite key marketId tuple and FaqId. The associations are set through an admin application, but given that there are 1000 FAQs in the system and 120 markets, it was a hassle to set initial associations whenever a new FAQ was created. So, we created a default market selection, and overrode a marketId tuple of (-1,-1) to represent this.
Back to the report - the report needed to show every FAQ question/answer and the markets that displayed this FAQ in a 2D matrix (we used an Excel spreadsheet). I found that the easiest way to associate each FAQ to each market in the default market selection case was with this query, unioning the exploded result with all other direct FAQ-market associations.
The Faq2LevelDefault table holds all of the markets that are defined as being in the default selection (I believe it was just a list of marketIds).
SELECT FaqId, fld.LevelId, 1 [Exists]
FROM Faq2Levels fl
CROSS JOIN Faq2LevelDefault fld
WHERE fl.LevelId=-1 and fl.LevelNumber=-1 and fld.LevelNumber=4
UNION
SELECT Faqid, LevelId, 1 [Exists] from Faq2Levels WHERE LevelNumber=4
You might want to create a report using all of the possible combinations from two lookup tables, in order to create a report with a value for every possible result.
Consider bug tracking: you've got one table for severity and another for priority and you want to show the counts for each combination. You might end up with something like this:
select severity_name, priority_name, count(e.severity_id)
from (select severity_id, severity_name,
priority_id, priority_name
from severity, priority) sp
left outer join
errors e
on e.severity_id = sp.severity_id
and e.priority_id = sp.priority_id
group by severity_name, priority_name
In this case, the cartesian join between severity and priority provides a master list that you can create the later outer join against.
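The sketch below runs that pattern on a tiny made-up data set with Python's sqlite3 module. The error_id column is an assumption. Counting a column from errors instead of count(*) makes the empty combinations come out as 0, since the outer join still produces one all-NULL row for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE severity (severity_id INTEGER PRIMARY KEY, severity_name TEXT);
CREATE TABLE priority (priority_id INTEGER PRIMARY KEY, priority_name TEXT);
CREATE TABLE errors (
    error_id INTEGER PRIMARY KEY, severity_id INTEGER, priority_id INTEGER
);
INSERT INTO severity VALUES (1, 'low'), (2, 'high');
INSERT INTO priority VALUES (1, 'p1'), (2, 'p2');
INSERT INTO errors (severity_id, priority_id) VALUES (1, 1), (1, 1), (2, 2);
""")

counts = conn.execute("""
    SELECT sp.severity_name, sp.priority_name, COUNT(e.error_id) AS n
    FROM (SELECT s.severity_id, s.severity_name,
                 p.priority_id, p.priority_name
          FROM severity s CROSS JOIN priority p) sp
    LEFT OUTER JOIN errors e
           ON e.severity_id = sp.severity_id
          AND e.priority_id = sp.priority_id
    GROUP BY sp.severity_name, sp.priority_name
    ORDER BY sp.severity_name, sp.priority_name
""").fetchall()
print(counts)
```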
When running a query for each date in a given range. For example, for a website, you might want to know for each day, how many users were active in the last N days. You could run a query for each day in a loop, but it's simplest to keep all the logic in the same query, and in some cases the DB can optimize the Cartesian join away.
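As a rough sketch of the single-query version, the example below pairs every day with every activity row and keeps the pairs that fall inside the window. It uses SQLite through Python's sqlite3 module; the days and activity tables, their columns and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE days (day TEXT PRIMARY KEY);
CREATE TABLE activity (user_id INTEGER, day TEXT);
INSERT INTO days VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
INSERT INTO activity VALUES
    (1, '2024-01-01'), (2, '2024-01-02'), (1, '2024-01-03');
""")

def active_users(window_days):
    """For each day, count distinct users active in the last window_days days."""
    offset = f"-{window_days - 1} days"   # the window includes the day itself
    return conn.execute("""
        SELECT d.day, COUNT(DISTINCT a.user_id) AS active
        FROM days d
        LEFT JOIN activity a
               ON a.day BETWEEN date(d.day, ?) AND d.day
        GROUP BY d.day
        ORDER BY d.day
    """, (offset,)).fetchall()

two_day = active_users(2)
print(two_day)
```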
To create a list of related words in text mining, using similarity functions, e.g. Edit Distance

Query to get all revisions of an object graph

I'm implementing an audit log on a database, so everything has a CreatedAt and a RemovedAt column. Now I want to be able to list all revisions of an object graph but the best way I can think of for this is to use unions. I need to get every unique CreatedAt and RemovedAt id.
If I'm getting a list of countries with provinces the union looks like this:
SELECT c.CreatedAt AS RevisionId from Countries as c where localId=@Country
UNION
SELECT p.CreatedAt AS RevisionId from Provinces as p
INNER JOIN Countries as c ON p.CountryId=c.LocalId AND c.LocalId = @Country
UNION
SELECT c.RemovedAt AS RevisionId from Countries as c where localId=@Country
UNION
SELECT p.RemovedAt AS RevisionId from Provinces as p
INNER JOIN Countries as c ON p.CountryId=c.LocalId AND c.LocalId = @Country
For more complicated queries this could get quite complicated and possibly perform very poorly so I wanted to see if anyone could think of a better approach. This is in MSSQL Server.
I need them all in a single list because this is being used in a from clause and the real data comes from joining on this.
You have most likely already implemented your solution, but to address a few issues; I would suggest considering Aleris's solution, or some derivative thereof.
In your tables, you have a "removed at" field -- well, if that field were active (populated), technically the data shouldn't be there -- or perhaps your implementation has it flagged for deletion, which will break the logging once it is removed.
What happens when you have multiple updates during a reporting period -- the previous log entries would be overwritten.
Having a separate log allows for archival of the log information and allows you to set a different log analysis cycle from your update/edit cycles.
Add whatever "linking" fields required to enable you to get back to your original source data OR make the descriptions sufficiently verbose.
The fields contained in your log are up to you but Aleris's solution is direct. I may create an action table and change the field type from varchar to int, as a link into the action table -- forcing the developers to some standardized actions.
Hope it helps.
An alternative would be to create an audit log that might look like this:
AuditLog table
EntityName varchar(2000),
Action varchar(255),
EntityId int,
OccuranceDate datetime
where EntityName is the name of the table (eg: Countries, Provinces), the Action is the audit action (eg: Created, Removed etc) and the EntityId is the primary key of the modified row in the original table.
The table would need to be kept synchronized on each action performed to the tables. There are a couple of ways to do this:
1) Make triggers on each table that will add rows to AuditLog
2) From your application, add rows to AuditLog each time a change is made to the respective tables
With this solution it is very simple to get a list of logs in the audit.
If you need columns from the original table, that is also possible using joins like this:
select *
from
Countries C
join AuditLog L on C.Id = L.EntityId and L.EntityName = 'Countries'
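Option 1 can be sketched as follows. This uses SQLite trigger syntax through Python's sqlite3 module, which is an assumption made so the example runs anywhere; SQL Server triggers are written differently (they read from the inserted and deleted pseudo-tables). The trigger names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Countries (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE AuditLog (
    EntityName    TEXT,
    Action        TEXT,
    EntityId      INTEGER,
    OccuranceDate TEXT DEFAULT CURRENT_TIMESTAMP
);

-- One trigger per audited action on the table.
CREATE TRIGGER trg_countries_created AFTER INSERT ON Countries
BEGIN
    INSERT INTO AuditLog (EntityName, Action, EntityId)
    VALUES ('Countries', 'Created', NEW.Id);
END;
CREATE TRIGGER trg_countries_removed AFTER DELETE ON Countries
BEGIN
    INSERT INTO AuditLog (EntityName, Action, EntityId)
    VALUES ('Countries', 'Removed', OLD.Id);
END;

INSERT INTO Countries VALUES (1, 'France'), (2, 'Spain');
DELETE FROM Countries WHERE Id = 2;
""")

log = conn.execute("""
    SELECT EntityName, Action, EntityId
    FROM AuditLog
    ORDER BY rowid
""").fetchall()
print(log)
```

Because the log lives in its own table, the 'Removed' entry for Spain survives even though the row itself is gone.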
You could probably do it with a full outer join that never matches plus coalesce, but the union is probably still better from a performance standpoint. You can try testing each though.
SELECT
COALESCE(C.CreatedAt, P.CreatedAt) AS RevisionId
FROM
dbo.Countries C
FULL OUTER JOIN dbo.Provinces P ON
1 = 0
WHERE
C.LocalID = @Country OR
P.CountryId = @Country