Process data representing relationships between a web of entities - sql

Here are the facts:
There are many companies.
Each company can have many businesses.
There are many addresses.
You don't know which businesses are owned by which companies (or the name of the company).
However, you do know the address of each business and you know a business might trade at more than one address.
Forming relationships between addresses:
If a business has the same address as another then, for the purpose of this question, we will say that they are owned by the same company.
ie A link is formed between two addresses when a business uses both addresses.
So, an address "A" might be linked to many other addresses.
Note that:
6a. the addresses that address "A" links to might also be linked to one OR MORE addresses.
6b. One of the addresses "A" links to might link back to "A" via a third address (ie two businesses that use both these addresses)
A complex example of this is shown in the picture attached. In this picture, there are only two companies. One has the red business, the other has the blue, green and black business.
Here is some example data in tableBA ( I have attached a photo to describe these relationships)
BUSINESS Address
A 1
A 2
B 1
C 3
D 4 < four businesses sharing the same address
E 4
F 4
G 4
W 2
W 5
X 5
X 6
So I want to create code that will produce the following output. The output has one row per company and lists the business names that are in the company.
ie there is one row for every complete chain of addresses
A,B,W,X
D,E,F,G
C
This question is a simplification/improvement on another SO question here.
This answer to the other question uses a combination of SQL and VBA code to solve the problem, because MS Access doesn't support recursive joining.
How can this be done with pure SQL, either with recursive joining or some other technique (not with a stored procedure)?

This is an SQL Server answer.
Quote from this answer:
To clarify:
A business can have multiple addresses
Any business in a business group (AKA company) shares an address with any other business in the group
A business group can own multiple businesses
Each business is only associated with one business group (corollary to second point)
SQL Fiddle with sample data
We can refer to each business group by the first (smallest name in
alphabetical order) business in the group. Let's call this the key
business. After we've identified the key business for each business,
we can group by the key business and get the results.
In order to get the key business:
Generate a list of pairs of businesses where both businesses are in the same group, based on any shared address. This list should exclude
the following (see next point for why):
A -> B, when we have B -> A
A -> A
The left side of the pairs should be unique: each business should appear on the left side of the pair no more than once, if at all.
For each business, follow the pairs from one to the next until the right business is never the left business in any other pair. That is
the key business.
That is the reason for the exclusions in the first point. If we have both A -> B and B -> A, we'll get to a never-ending loop.
Same goes for A -> A.
The following query will return the key business for each business:
WITH Pairs AS (
SELECT Businesses.Business AS Business2, MIN(Businesses_1.Business) AS Business1
FROM Businesses
INNER JOIN Businesses AS Businesses_1 ON Businesses.Address = Businesses_1.Address
WHERE Businesses.Business > Businesses_1.Business
GROUP BY Businesses.Business
),
KeyBusinesses AS (
SELECT Business2 AS Business, Business1 AS KeyBusiness
FROM Pairs
UNION ALL
SELECT Pairs.Business2, KeyBusinesses.KeyBusiness
FROM Pairs
INNER JOIN KeyBusinesses ON Pairs.Business1 = KeyBusinesses.Business
)
SELECT Businesses.*, ISNULL(KeyBusinesses.KeyBusiness, Businesses.Business) AS KeyBusiness
FROM Businesses
LEFT JOIN KeyBusinesses ON Businesses.Business = KeyBusinesses.Business
SQL Fiddle
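As a cross-check on the expected output, here is a minimal sketch of the same idea (label each business with the smallest business name it can reach through shared addresses), run against an in-memory SQLite database from Python. The table name tableBA comes from the question; the reach CTE, the column names and the company_groups function are assumptions made for illustration. It relies on SQLite accepting UNION inside a recursive CTE, which drops duplicate rows and so ends the recursion; SQL Server only accepts UNION ALL there, so this is not a drop-in replacement for the query above.

```python
import sqlite3

ROWS = [("A", 1), ("A", 2), ("B", 1), ("C", 3), ("D", 4), ("E", 4),
        ("F", 4), ("G", 4), ("W", 2), ("W", 5), ("X", 5), ("X", 6)]

QUERY = """
WITH RECURSIVE reach(business, other) AS (
    -- every business reaches itself
    SELECT business, business FROM tableBA
    UNION
    -- one more hop: any business sharing an address with 'other'
    SELECT r.business, b2.business
    FROM reach AS r
    JOIN tableBA AS b1 ON b1.business = r.other
    JOIN tableBA AS b2 ON b2.address = b1.address
)
SELECT MIN(other) AS key_business, business
FROM reach
GROUP BY business
ORDER BY key_business, business
"""

def company_groups(rows):
    """Return one comma-separated string of business names per company."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableBA (business TEXT, address INTEGER)")
    conn.executemany("INSERT INTO tableBA VALUES (?, ?)", rows)
    groups = {}
    for key, business in conn.execute(QUERY):
        groups.setdefault(key, []).append(business)
    conn.close()
    return [",".join(members) for members in groups.values()]

print(company_groups(ROWS))
```

Because the closure follows every shared address rather than only the smallest neighbour, it does not need the assumption that every business in a group shares an address with every other one.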

This is a crude and crappy solution (probably not efficient), but it gets the correct results for the sample data. It works by maintaining two maps to a Company (where a Company contains a list of business names). The first maps business address to company, the second maps business name to company. If the company is NOT found in either map, a new company is created:
package test;
import org.junit.Assert;
import org.junit.Test;
import java.util.*;
import java.util.stream.Collectors;
public class Companies {
public static List<Company> listCompanies(List<Business> businesses) {
List<Company> companies = new ArrayList<>();
Map<String, Company> companyByAddress = new HashMap<>(); // map allows many addresses to map to same company
Map<String, Company> companyByBusiness = new HashMap<>(); // map allows many businesses to map to same company
for (Business business : businesses) {
Company company = companyByAddress.get(business.address);
if (company == null)
company = companyByBusiness.get(business.name);
if (company != null) {
company.addAddress(business.name);
} else {
company = new Company();
company.addAddress(business.name);
companies.add(company);
}
companyByBusiness.put(business.name, company);
companyByAddress.put(business.address, company);
}
return companies;
}
@Test
public void testOne() {
List<Business> businesses = new ArrayList<>();
businesses.add(new Business("A", "1"));
businesses.add(new Business("A", "2"));
businesses.add(new Business("B", "1"));
businesses.add(new Business("C", "3"));
businesses.add(new Business("D", "4"));
businesses.add(new Business("E", "4"));
businesses.add(new Business("F", "4"));
businesses.add(new Business("G", "4"));
businesses.add(new Business("W", "2"));
businesses.add(new Business("W", "5"));
businesses.add(new Business("X", "5"));
businesses.add(new Business("X", "6"));
List<Company> companies = listCompanies(businesses);
Assert.assertEquals("A, B, W, X", companies.get(0).toString());
Assert.assertEquals("C", companies.get(1).toString());
Assert.assertEquals("D, E, F, G", companies.get(2).toString());
}
static class Business {
private final String name;
private final String address;
Business(String business, String address) {
this.name = business;
this.address = address;
}
}
static class Company {
private final Set<String> addresses; // Being a "Set", each address will occur only once
Company() {
this.addresses = new LinkedHashSet<>(); // A "LinkedHashSet" preserves insertion order
}
void addAddress(String address) {
addresses.add(address);
}
@Override
public String toString() {
return addresses.stream().collect(Collectors.joining(", "));
}
}
}
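One thing to watch with the single pass above: two companies that were created separately are never merged when a later row links them, so the result can depend on row order. For example, the rows A-1, B-2, B-1 give "A, B" plus a leftover "B". A possible alternative is a union-find (disjoint-set) structure, sketched here in Python rather than Java. The names list_companies, find and union are illustrative and not part of the answer above.

```python
def list_companies(pairs):
    """Group business names that are linked through shared addresses."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Businesses and addresses share one forest; tag them so that a business
    # called "1" cannot collide with an address called "1".
    for business, address in pairs:
        union(("business", business), ("address", address))

    groups = {}
    for business in sorted({b for b, _ in pairs}):
        groups.setdefault(find(("business", business)), []).append(business)
    return sorted(", ".join(members) for members in groups.values())


sample = [("A", "1"), ("A", "2"), ("B", "1"), ("C", "3"), ("D", "4"), ("E", "4"),
          ("F", "4"), ("G", "4"), ("W", "2"), ("W", "5"), ("X", "5"), ("X", "6")]
print(list_companies(sample))
```

Each row triggers one union, so the input order no longer matters; the final sort is only there to make the output stable.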

Related

How to choose how to fetch relationships from a relational database?

I know of at least 4 ways of fetching relationships from a relational database.
I tried to make the examples generic to any language.
I'd like to know some algorithm of choosing one over the other besides manually testing the query.
Method A: Has the problem of having to loop twice to build the result. It also can't process rows one at a time without first building arrays.
for contact in query("SELECT id, name FROM contact") {
contacts[contact.id]["name"] = contact.name
push(ids, contact.id)
}
for email in query("SELECT contact_id, address FROM email WHERE contact_id IN ?", ids) {
push(contacts[email.contact_id]["emails"], email.address)
}
Method B: Has the problem of a cartesian join in case of more joins
for contact in query("SELECT c.id, c.name, e.address FROM contact c JOIN email e ON e.contact_id = c.id") {
contacts[contact.id]["name"] = contact.name
push(contacts[contact.id]["emails"], contact.address)
}
Method C: Has a problem in that GROUP_CONCAT is limited to a certain number of bytes and might be cut off
for contact in query("SELECT c.id, c.name, GROUP_CONCAT(e.address) AS addresses FROM contact c JOIN email e ON e.contact_id = c.id GROUP BY c.id") {
contacts[contact.id]["name"] = contact.name
contacts[contact.id]["emails"] = split(",", contact.addresses)
}
Method D: Has the N+1 query problem, runs a query for each contact
contactStatement = prepare("SELECT id, name FROM contact")
emailStatement = prepare("SELECT address FROM email WHERE contact_id = ?")
for contact in contactStatement.query() {
contacts[contact.id]["name"] = contact.name
// this uses a prepared query
for email in emailStatement.query(contact.id) {
push(contacts[contact.id]["emails"], email.address)
}
}
Or maybe there is a way that is ideal in every case?
I have many years of experience using various ORMs and I'd like to avoid using one.
Method A is what most ORMs default to.
Method D is what seems like the solution that SQL designers expect us to use.
I'm asking for some help in how to find a logical way of picking one method over another for a specific SQL query.
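For a single one-to-many relationship, the drawback listed for Method A (building arrays before anything can be processed) can be sidestepped by sorting the joined rows and grouping them as they stream past. A minimal sketch with Python's sqlite3 and itertools.groupby follows. The contact and email tables mirror the pseudocode above; the sample rows and the function name iter_contacts are made up for illustration. A LEFT JOIN is used so that contacts without emails still appear.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

def iter_contacts(conn):
    """Yield one (id, name, emails) tuple per contact from a single joined query."""
    rows = conn.execute(
        "SELECT c.id, c.name, e.address "
        "FROM contact AS c LEFT JOIN email AS e ON e.contact_id = c.id "
        "ORDER BY c.id, e.address"
    )
    # Rows arrive sorted by contact id, so each contact's rows are adjacent.
    for (contact_id, name), group in groupby(rows, key=itemgetter(0, 1)):
        emails = [address for _, _, address in group if address is not None]
        yield contact_id, name, emails

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE email (contact_id INTEGER, address TEXT);
    INSERT INTO contact VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO email VALUES (1, 'ann@example.com'), (1, 'a@example.org'),
                             (3, 'cy@example.com');
""")
contacts = list(iter_contacts(conn))
```

Once a second child table is joined in, the row count multiplies, which is the Cartesian problem noted for Method B; at that point one extra query per child table, as in Method A, keeps the number of rows linear.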

How do I flatten a hierarchy in LINQ to Entities?

I have a class Org, which has ParentId (which points to a Consumer) and Orgs properties, to enable a hierarchy of Org instances. I also have a class Customer, which has an OrgId property. Given any Org instance, named Owner, how can I retrieve all Customer instances for that org? That is, before LINQ I would do a 'manual' traversal of the Org tree with Owner as its root. I'm sure something simpler exists though.
Example: If I have a root level Org called 'Film', with Id '1', and sub-Org called 'Horror' with ParentId of '1', and Id of 23, I want to query for all Customers under Film, so I must get all customers with OrgId's of both 1 and 23.
Linq won't help you with this but SQL Server will.
Create a CTE to generate a flattened list of Org Ids, something like:
CREATE PROCEDURE [dbo].[OrganizationIds]
@rootId int
AS
WITH OrgCte AS
(
SELECT OrganizationId FROM Organizations WHERE OrganizationId = @rootId
UNION ALL
SELECT parent.OrganizationId FROM Organizations parent
INNER JOIN OrgCte child ON parent.Parent_OrganizationId = Child.OrganizationId
)
SELECT * FROM OrgCte
RETURN 0
Now add a function import to your context mapped to this stored procedure. This results in a method on your context (the returned values are nullable int since the original Parent_OrganizationId is declared as INT NULL):
public partial class TestEntities : ObjectContext
{
public ObjectResult<int?> OrganizationIds(int? rootId)
{
...
Now you can use a query like this:
// get all org ids for specific root. This needs to be a separate
// query or LtoE throws an exception regarding nullable int.
var ids = OrganizationIds(2);
// now find all customers
Customers.Where (c => ids.Contains(c.Organization.OrganizationId)).Dump();
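To see what such a recursive CTE produces on the question's Film/Horror example, here is a small sketch that runs an equivalent query against in-memory SQLite from Python. The table and column names (org, customer, parent_id, org_id), the extra sample rows and the function name customers_under are assumptions for illustration, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, org_id INTEGER);
    INSERT INTO org VALUES (1, 'Film', NULL), (23, 'Horror', 1), (40, 'Music', NULL);
    INSERT INTO customer VALUES (100, 'Ann', 1), (101, 'Bob', 23), (102, 'Cy', 40);
""")

def customers_under(conn, root_id):
    """Return names of customers attached to root_id or any descendant org."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM org WHERE id = ?          -- anchor: the root org
            UNION ALL
            SELECT o.id FROM org AS o
            JOIN subtree AS s ON o.parent_id = s.id  -- step: direct children
        )
        SELECT c.name FROM customer AS c
        WHERE c.org_id IN (SELECT id FROM subtree)
        ORDER BY c.id
        """,
        (root_id,),
    )
    return [name for (name,) in rows]

print(customers_under(conn, 1))
```

Folding the customer lookup into the same statement keeps everything in one round trip; the two-step version in the answer exists only to work around the nullable int issue it mentions.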
Unfortunately, not natively in Entity Framework. You need to build your own solution. Probably you need to iterate up to the root. You can optimize this algorithm by asking EF to get a certain number of parents in one go like this:
...
select new { x.Customer, x.Parent.Customer, x.Parent.Parent.Customer }
You are limited to a statically fixed number of parent with this approach (here: 3), but it will save you 2/3 of the database roundtrips.
Edit: I think I did not get your data model right but I hope the idea is clear.
Edit 2: In response to your comment and edit I have adapted the approach like this:
var rootOrg = ...;
var orgLevels = new [] {
from o in db.Orgs where o == rootOrg select o, //level 0
from o in db.Orgs where o.ParentOrg == rootOrg select o, //level 1
from o in db.Orgs where o.ParentOrg.ParentOrg == rootOrg select o, //level 2
from o in db.Orgs where o.ParentOrg.ParentOrg.ParentOrg == rootOrg select o, //level 3
};
var setOfAllOrgsInSubtree = orgLevels.Aggregate((a, b) => a.Union(b)); //query for all org levels
var customers = from c in db.Customers where setOfAllOrgsInSubtree.Contains(c.Org) select c;
Notice that this only works for a bounded maximum tree depth. In practice, this is usually the case (like 10 or 20).
Performance will not be great but it is a LINQ-to-Entities-only solution.

Convert multi-rows values into Collection(List) in LINQ

I am struggling to convert multi-row values that belong to the same user into a collection.
Here is a simple scenario.
users Table:userid, password
address Table:address, userid
The users table and the address table have a one-to-many relationship: one user might have multiple addresses.
Assume the user's ID is 1001 and he/she has two addresses, one in Auckland and another in Wellington.
I would like to select both of them together with the user's id.
1001 Auckland
1001 Wellington
So the question is: is there any approach that can put these two values into a collection such as a list?
public class UserDetails{
private List<String> _Address;
public string userid{get;set;}
public List<String> Address{
get{return _Address;}
set{_Address=value;}
}
}
var user_address= from _user in users
join _address in address on _user.userid=_address.userid
select new userDetails{
userid=_user.userid
**Address.add()**
};
Does anyone know how to construct the List in the LINQ query and call the add method?
I want to put the list object into one row so as to avoid repeating the userid.
Thanks for your help.
Maybe something like this:
var user_address= from _user in users
select new UserDetails{
userid=_user.userid,
Address=(from _address in address
where _user.userid==_address.userid
select _address.address
).ToList()
};
You do not have to join the address table.
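The shape being asked for (one entry per user holding a list of addresses) is a plain group-by on the user id. Here is a language-neutral sketch in Python; the third sample row and the name group_addresses are made up for illustration.

```python
from collections import defaultdict

def group_addresses(rows):
    """Collapse (userid, address) rows into {userid: [addresses]}."""
    grouped = defaultdict(list)
    for userid, address in rows:
        grouped[userid].append(address)
    return dict(grouped)

rows = [(1001, "Auckland"), (1001, "Wellington"), (1002, "Dunedin")]
result = group_addresses(rows)
print(result)
```

The nested query in the answer above plays the same role as the inner list here: the user id appears once and its addresses hang off it.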

Simple Linq-to-entities query involving .Include I believe

I have a Linq-to-Entities query that is not complicated but requires an .include and/or projection and/or join because it must be executed in one pass.
Here is my database (Microsoft SQL Server 2008):
Table A (Customers) (contains CustomerID (customer IDs), and ZipCode (zip codes) as strings.
Table C (Categories) (contains CategoryID (categories) like "food", "shelter","clothing", "housing" (primary keys).
Table A_C is a linking table, since Tables A and C are linked as many-to-many: contains just two fields: CustomerID "customer IDs" and CategoryID (Categories), in combination as primary keys. This table is a linking table between tables A and C.
Here is my query, that must be executed in just one trip to the database: I need to select all records in Table A that satisfy a condition, then filter these records depending on a 'list of parameters' that are found in the linking Table A_C--and do this all in one trip to the database. But I don't know what the length or composition of the list of parameters for Table A_C is, ahead of time--it varies from call to call. Thus this list of parameters varies method call by method call.
To give a more concrete example:
Table A has a list of customer IDs. I find the customers that live in a certain Zip code. Then, in the same SQL query, I need to find which of these customers have selected certain categories: Food, Clothing, Housing, etc, but my web method does not know ahead of time what these categories are, rather, they are passed as a list to the method: List myCategoryList (which could be 1 category or 100 categories, and varies method call by method call).
How do I write the projection using Linq-to-Entities? When the list of parameters varies? And do it all in one pass?
List<string> CategoryList = new List<string>() { "Food", "Shelter", "Housing" }; // in one call to the web service method
List<string> CategoryList = new List<string>() { "Food", "Clothing" }; //could be a second call--it varies and I don't know ahead of time what the List will be
So how can I do the SQL query using Linq-to-Entities? In one pass? (Of course I could loop through the list, and make repeated trips to the database, but that's not an optimal solution I am told). Projection,.Include are keywords but surfing the net yielded nothing.
Here is a crude guess, just to get ball rolling:
public void WebMethod1 (CategoryList)
{
using (EntityFramework1 context = new EntityFramework1())
{
/* assume CategoryList is a list of strings passed into the method and is,for this particular call,something like: List<string> CategoryList = new List<string>() { "Food", "Clothing" }; for this call, but in the next call it could be: List<string> CategoryList = new List<string>() { "Food", "Shelter", "Housing" } */
string ZipCodeString = "12345";
string customerIDString = "E12RJ55";
var CustomersFromZipCodeHavingSelectedCertainCategories = from x in context.A_C
where x.A.CustomerID == customerIDString
where x.A.StartsWith(ZipCodeString)
where x.A_C.Contains(CategoryList) //???? This is clearly not grammatical, but what is?
select x;
}
/*
my problem is: I want to filter all records from A that contain a zipcode 12345, and that also have a certain CustomerID "E12RJ55" from table A, but further filter this set with all such CustomerIDs in linking table A_C that contain the categories "Food" and "Clothing".
How to do this in one pass? I can do this quite easily in multiple passes and trips to the database using code, but somebody in this thread here http://bit.ly/rEG2AM suggested I do a Join/projection and do it all in one fell swoop.
*/
I will also accept SQL answers since it might help yield a solution. This question btw is not difficult I believe--but I could not find an answer on the net.
EDIT: with answer and credit to david s.
I thank you for the answer david.s. Here is what worked, slightly different than the answer by david.s, in that I am using the linking table (bridge table) called “Customer_Categories” that is between the table Customer and Categories and contains the primary key of each (as is required for many-to-many relationships). This bridge table is what I called "A_C" in my original question, and here has ints rather than strings but is the same thing. Intellisense picked up this table and I used it, and it works. Also keep in mind that CategoryList is a list of ints, List<int> CategoryList = new List<int>();, yet amazingly it automagically works inside this Linq-to-Entities query:
var CustomersFromZipCOde = context.Customers.Where (custo => custo.CustomerID==customerIDString && custo.ZipCode.StartsWith(ZipCodeString) && custo.Customer_Categories.Any(categ => CategoryList.Contains(categ.CategoryID)));
//gives the right output, incredible.
First of all I would like to say that even if your explanation is very long, it is not very clear. You would like a simple Linq-to-Entities query but you don't give the entities; you only speak of tables in your database.
Assuming you have the following entities:
public class Customer
{
public string CustomerID { get; set; }
public string ZipCode { get; set; }
public virtual ICollection<Category> Categories { get; set; }
}
public class Category
{
public string CategoryID { get; set; }
public virtual ICollection<Customer> Customers { get; set; }
}
Your query might look like this:
var CustomersFromZipCodeHavingSelectedCertainCategories =
context.Customers.Where(
customer => customer.CustomerID == customerIDString &&
customer.ZipCode.StartsWith(ZipCodeString) &&
customer.Categories.Any(
category => CategoryList.Contains(category.CategoryID)));
More info on other ways to do this here:
http://smehrozalam.wordpress.com/2010/06/29/entity-framework-queries-involving-many-to-many-relationship-tables/
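In SQL terms, the Any/Contains predicate becomes a filter on the link table with an IN list built from however many categories were passed, all in one statement. The sketch below uses Python with sqlite3; the schema, the sample rows and the function name customers_with are assumptions for illustration. It also shows a variant requiring all listed categories, since Any matches customers that have at least one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, zip_code TEXT);
    CREATE TABLE customer_category (customer_id TEXT, category_id TEXT,
                                    PRIMARY KEY (customer_id, category_id));
    INSERT INTO customer VALUES ('E1', '12345'), ('E2', '12345-6789'), ('E3', '99999');
    INSERT INTO customer_category VALUES
        ('E1', 'Food'), ('E1', 'Clothing'), ('E2', 'Food'), ('E3', 'Food');
""")

def customers_with(conn, zip_prefix, categories, require_all=False):
    """Customers in a zip prefix having any (or all) of the given categories."""
    categories = list(dict.fromkeys(categories))  # de-duplicate, keep order
    if not categories:
        return []
    marks = ", ".join("?" for _ in categories)    # one placeholder per category
    needed = len(categories) if require_all else 1
    sql = f"""
        SELECT c.customer_id
        FROM customer AS c
        JOIN customer_category AS cc ON cc.customer_id = c.customer_id
        WHERE c.zip_code LIKE ? || '%' AND cc.category_id IN ({marks})
        GROUP BY c.customer_id
        HAVING COUNT(DISTINCT cc.category_id) >= ?
        ORDER BY c.customer_id
    """
    params = [zip_prefix, *categories, needed]
    return [cid for (cid,) in conn.execute(sql, params)]

print(customers_with(conn, "12345", ["Food", "Clothing"]))
```

Only the placeholders are interpolated into the SQL text; the category values themselves are always bound as parameters. Note that a zip prefix containing % or _ would be treated as a LIKE wildcard in this sketch.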

Sub-optimal queries over many-to-many relations with HQL

I have two entities, Location and Industry, and a link-table between them. I've configured a many-to-many relationship, in both directions, between the two entities.
In a search query, I'm trying to select Locations that are associated with a list of industries.
After days and days of trying to wrangle the criteria API, I've decided to drop down to HQL and abandon the criteria API. But even that isn't going well for me - it seems, regardless of whether I hand-write this HQL query, or let the criteria API do it, I end up with the same result.
I managed to produce the right result in two ways - like this:
var q = Data.Query("select distinct loc from Location loc join loc.Industries ind where ind in (:ind)");
q.SetParameterList("ind", new Industry[] { Data.GetIndustry(4), Data.GetIndustry(5) });
And (better) like that:
var q = Data.Query("select distinct loc from Location loc join loc.Industries ind where ind.id in (:ind)");
q.SetParameterList("ind", new int[] { 4, 5 });
Unfortunately, both result in a sub-optimal query:
select distinct
location0_.Id as Id16_,
location0_.Name as Name16_,
(etc.)
from Location location0_
inner join LocationIndustry industries1_
on location0_.Id=industries1_.LocationId
inner join Industry industry2_
on industries1_.IndustryId=industry2_.Id
where
industry2_.Id in (? , ?)
Why the extra join?
Is NH not smart enough to know that the Industry.Id property, being the only Industry-property involved in the query, is stored in the LocationIndustry link-table, and there is no need for the extra join to the Industry table itself?
Or am I doing something wrong?
Ideally, the most intuitive thing for me would be to write:
from Location loc where loc.Industries in (:ind)
This does not work - it throws an error and says it does not know about the Industries property. I guess because Industries, being a "property" in programming terms, is actually a "relationship" in terms of DBMS.
What is the simplest and most efficient way to write this query in HQL?
Thanks!
I'm not sure you can avoid this extra join given the mapping strategy you have used.
You could avoid it by using an intermediary class but this would mean you would need a class structure like this:
public class Industry {
//Other stuff
public virtual IList<LocationIndustry> LocationIndustries {get; set;}
}
public class LocationIndustry {
public virtual Location Location {get; set;}
public virtual Industry Industry {get; set;}
}
public class Location {
//normal stuff
public virtual IList<LocationIndustry> LocationIndustries {get; set;}
}
Then you can query on the LocationIndustry class and avoid the join to Location.
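For reference, the SQL the asker appears to be after touches only Location and the link table. Here is a sketch in Python with sqlite3; the table and column names are taken from the generated SQL in the question, while the sample rows and the function name locations_in are invented. Whether NHibernate can be made to emit this shape depends on the mapping, as the answer says.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Location (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Industry (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE LocationIndustry (LocationId INTEGER, IndustryId INTEGER);
    INSERT INTO Location VALUES (1, 'North'), (2, 'South'), (3, 'East');
    INSERT INTO Industry VALUES (4, 'Mining'), (5, 'Retail'), (6, 'Farming');
    INSERT INTO LocationIndustry VALUES (1, 4), (1, 5), (2, 6), (3, 5);
""")

def locations_in(conn, industry_ids):
    """Locations linked to any of the given industry ids, without joining Industry."""
    industry_ids = list(industry_ids)
    if not industry_ids:
        return []
    marks = ", ".join("?" for _ in industry_ids)  # one placeholder per id
    sql = f"""
        SELECT DISTINCT l.Id, l.Name
        FROM Location AS l
        JOIN LocationIndustry AS li ON li.LocationId = l.Id
        WHERE li.IndustryId IN ({marks})
        ORDER BY l.Id
    """
    return conn.execute(sql, industry_ids).fetchall()

print(locations_in(conn, [4, 5]))
```

The Industry table is never read because the ids being filtered on already live in LocationIndustry.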