I am learning more raw SQL after years of Rails and other ORMs, so I still have a long way to go in learning how to write complex queries efficiently. What I'm wondering here is how to find all users that are missing some fields and associations, and return which fields/associations they are missing.
I have a rough idea of how to write this in SQL (for PostgreSQL), but not the exact query.
I have something like this data model, a table of users, a table of "social media links", and an association mapping the link to the user, so a user can have many social media links, but there could be more than one user associated with one link (i.e. organizations):
CREATE TABLE users (
id INT GENERATED BY DEFAULT AS IDENTITY,
slug VARCHAR(255) NOT NULL,
name VARCHAR(255),
description TEXT,
PRIMARY KEY (id)
);
CREATE TABLE sociallinks (
id INT GENERATED BY DEFAULT AS IDENTITY,
type VARCHAR(255),
value TEXT,
PRIMARY KEY (id)
);
CREATE TABLE usersociallinks (
id INT GENERATED BY DEFAULT AS IDENTITY,
user_id INTEGER REFERENCES users,
sociallink_id INTEGER REFERENCES sociallinks,
PRIMARY KEY (id)
);
The question is, how do you perform the query (with this pseudocode):
select from users
join on usersociallinks.user_id = users.id
join on sociallinks.id = usersociallinks.sociallink_id
where name = null
or description = null
or missing linkedin? (sociallinks.type == linkedin)
or missing facebook? (sociallinks.type == facebook)
return slug from users table
return has_name = false if name is null from users table
return has_description = false if description is null from users table
return has_linkedin = false if linkedin is null from sociallinks table
return has_facebook = false if facebook is null from sociallinks table
In natural language, "select the users which either don't have a name or a description, or are missing a linkedin or facebook link (sociallinks.value), and return what fields they are missing".
I can do this the naive, long and convoluted way of querying one thing at a time, but I'm wondering how to do this efficiently, possibly in just one query (or in as few queries as possible).
SELECT * FROM users
WHERE name IS NULL
OR description IS NULL
LIMIT 1
Fetch those, then do:
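// With the pg driver, knex.raw resolves to a result object whose rows array holds the records
const { rows } = await knex.raw(SQL)
const record = rows[0]
const output = {}
if (!record.name) output.hasName = false
if (!record.description) output.hasDescription = false
return output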
Then the next query is more complex, but I would do one at a time.
How do you do it somewhat efficiently and in as few queries as possible? Fetching with a limit of say 100 users per query.
Input data would be:
users:
id,slug,name,description
1,foo,Foo,Im a description
2,bar,,Im a description too
3,baz,,
4,hello,Hello,
5,world,,
6,food,Foo,Im a descriptiond
7,bard,,Im a description tood
8,bazd,asdf,fdsa
9,hellod,,
10,worldd,,A worldd description
sociallinks:
id,type,value
1,facebook,foo
2,facebook,bar
3,facebook,baz
4,facebook,hello
5,facebook,world
6,linkedin,foo
7,linkedin,bar
8,linkedin,baz
9,linkedin,hello
10,linkedin,world
usersociallinks:
id,user_id,sociallink_id
1,1,1
2,2,2
3,2,6
4,3,7
5,5,3
6,8,4
7,8,8
8,9,9
Output data would be:
user_id,slug,has_name,has_description,has_linkedin,has_facebook
1,foo,true,true,false,true
2,bar,false,true,true,true
3,baz,false,false,true,false
4,hello,true,false,false,false
5,world,false,false,false,true
6,food,true,true,false,false
7,bard,false,true,false,false
// 8 has everything, so it is omitted from the output
9,hellod,false,false,true,false
10,worldd,false,true,false,false
EXISTS() to the rescue:
SELECT u.id, u.slug
, COALESCE(u.name > '', false) AS has_name -- NULL > '' is NULL, so coalesce to false
, COALESCE(u.description > '', false) AS has_description
, EXISTS(SELECT 1 FROM usersociallinks usl JOIN sociallinks s ON s.id = usl.sociallink_id
WHERE usl.user_id = u.id AND s.type = 'facebook') AS has_facebook
, EXISTS(SELECT 1 FROM usersociallinks usl JOIN sociallinks s ON s.id = usl.sociallink_id
WHERE usl.user_id = u.id AND s.type = 'linkedin') AS has_linkedin
FROM users u
ORDER BY u.id
;
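The query above returns flags for every user. As a rough sketch of how it might be narrowed to only the users that are missing something, and paged 100 at a time as the question asks, the following runs the same EXISTS idea against SQLite through Python's sqlite3 module, purely so the example is self-contained. Assumptions: SQLite stands in for PostgreSQL, the empty fields in the sample data are stored as NULL, only part of the sample data is loaded, and the find_incomplete_users helper with its after_id paging parameter is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL, and the empty CSV fields
# from the sample data are stored as NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, slug TEXT NOT NULL, name TEXT, description TEXT);
CREATE TABLE sociallinks (id INTEGER PRIMARY KEY, type TEXT, value TEXT);
CREATE TABLE usersociallinks (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users,
    sociallink_id INTEGER REFERENCES sociallinks
);
""")
# A subset of the sample data from the question.
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (1, "foo", "Foo", "Im a description"),
    (2, "bar", None, "Im a description too"),
    (3, "baz", None, None),
    (4, "hello", "Hello", None),
    (8, "bazd", "asdf", "fdsa"),
])
conn.executemany("INSERT INTO sociallinks VALUES (?, ?, ?)", [
    (1, "facebook", "foo"), (2, "facebook", "bar"), (4, "facebook", "hello"),
    (6, "linkedin", "foo"), (7, "linkedin", "bar"), (8, "linkedin", "baz"),
])
conn.executemany("INSERT INTO usersociallinks VALUES (?, ?, ?)", [
    (1, 1, 1), (2, 2, 2), (3, 2, 6), (4, 3, 7), (6, 8, 4), (7, 8, 8),
])

# Compute the flags in a derived table, then keep only incomplete users.
INCOMPLETE_USERS = """
SELECT id, slug, has_name, has_description, has_linkedin, has_facebook
FROM (
    SELECT u.id, u.slug,
           u.name IS NOT NULL AS has_name,
           u.description IS NOT NULL AS has_description,
           EXISTS (SELECT 1 FROM usersociallinks usl
                   JOIN sociallinks s ON s.id = usl.sociallink_id
                   WHERE usl.user_id = u.id AND s.type = 'linkedin') AS has_linkedin,
           EXISTS (SELECT 1 FROM usersociallinks usl
                   JOIN sociallinks s ON s.id = usl.sociallink_id
                   WHERE usl.user_id = u.id AND s.type = 'facebook') AS has_facebook
    FROM users u
) AS flags
WHERE id > ?
  AND NOT (has_name AND has_description AND has_linkedin AND has_facebook)
ORDER BY id
LIMIT ?
"""


def find_incomplete_users(conn, after_id=0, limit=100):
    """Return one page of users missing at least one field or link (hypothetical helper)."""
    return conn.execute(INCOMPLETE_USERS, (after_id, limit)).fetchall()


for row in find_incomplete_users(conn):
    print(row)
```

To fetch the next page, pass the last id of the previous page as after_id. SQLite reports the flags as 1 and 0; in PostgreSQL the same expressions yield booleans.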
You can join the tables and use conditional aggregation:
SELECT u.id, u.slug,
MAX(u.name) IS NOT NULL AS has_name,
MAX(u.description) IS NOT NULL AS has_description,
MAX(CASE WHEN s.type = 'linkedin' THEN 1 ELSE 0 END)::boolean AS has_linkedin,
MAX(CASE WHEN s.type = 'facebook' THEN 1 ELSE 0 END)::boolean AS has_facebook
FROM users u
LEFT JOIN usersociallinks usl ON usl.user_id = u.id
LEFT JOIN sociallinks s ON s.id = usl.sociallink_id AND s.type IN ('facebook', 'linkedin')
GROUP BY u.id, u.slug
ORDER BY u.id
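As a separate, minimal sketch of the conditional-aggregation approach, the snippet below adds a HAVING clause so that only users missing at least one field or link are returned. Assumptions: SQLite (via Python's sqlite3) stands in for PostgreSQL, so the ::boolean casts are dropped and flags come back as 1 and 0; empty fields are stored as NULL; and only three of the sample users are loaded.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; empty sample fields are NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, slug TEXT NOT NULL, name TEXT, description TEXT);
CREATE TABLE sociallinks (id INTEGER PRIMARY KEY, type TEXT, value TEXT);
CREATE TABLE usersociallinks (id INTEGER PRIMARY KEY, user_id INTEGER, sociallink_id INTEGER);
INSERT INTO users VALUES
    (5, 'world', NULL, NULL), (8, 'bazd', 'asdf', 'fdsa'), (9, 'hellod', NULL, NULL);
INSERT INTO sociallinks VALUES
    (3, 'facebook', 'baz'), (4, 'facebook', 'hello'),
    (8, 'linkedin', 'baz'), (9, 'linkedin', 'hello');
INSERT INTO usersociallinks VALUES (5, 5, 3), (6, 8, 4), (7, 8, 8), (8, 9, 9);
""")

QUERY = """
SELECT u.id, u.slug,
       u.name IS NOT NULL AS has_name,
       u.description IS NOT NULL AS has_description,
       MAX(CASE WHEN s.type = 'linkedin' THEN 1 ELSE 0 END) AS has_linkedin,
       MAX(CASE WHEN s.type = 'facebook' THEN 1 ELSE 0 END) AS has_facebook
FROM users u
LEFT JOIN usersociallinks usl ON usl.user_id = u.id
LEFT JOIN sociallinks s ON s.id = usl.sociallink_id
GROUP BY u.id, u.slug, u.name, u.description
-- Keep only the groups where something is missing.
HAVING u.name IS NULL
    OR u.description IS NULL
    OR MAX(CASE WHEN s.type = 'linkedin' THEN 1 ELSE 0 END) = 0
    OR MAX(CASE WHEN s.type = 'facebook' THEN 1 ELSE 0 END) = 0
ORDER BY u.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

User 8 has a name, a description and both links, so it does not appear in the result.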
"select the users which either don't have a name or a description, or are missing a linkedin or facebook link (sociallinks.value), and return what fields they are missing"
I would think two left joins:
with usl as (
select usl.*, sl.type
from usersociallinks usl join
sociallinks sl
on sl.id = usl.sociallink_id
)
select u.id, u.slug, concat_ws(',',
(case when u.name is null then 'name' end),
(case when u.description is null then 'description' end),
(case when usl_f.user_id is null then 'facebook' end),
(case when usl_l.user_id is null then 'linkedin' end)
) as missing_values
from users u left join
usl usl_f
on usl_f.user_id = u.id and usl_f.type = 'facebook' left join
usl usl_l
on usl_l.user_id = u.id and usl_l.type = 'linkedin'
where u.name is null or u.description is null or
usl_f.user_id is null or usl_l.user_id is null
I am trying to get the Country from my factory's AddressID directly in one query with an inner join. My problem is that the AddressID could be an empty string.
How do I check for that and handle it as a special case? If the AddressID is empty, I need an empty string as the result for f.Country.
This is my query for now, but if AddressID = '' then the row is not showing up in my result.
SELECT o.FactoryID,o.Name,o.Rating,o.ProductCategory,o.Emissions,o.LatestChanges,o.AddressID,o.ProductTags,f.Country
FROM FactoryHistory o
INNER JOIN AddressHistory f on f.AddressID = o.AddressID
WHERE NOT o.LatestChanges = 'Deleted' AND o.IsCurrent = 1 AND f.IsCurrent = 1
If something in my question is missing or unclear, just let me know!
Just left join:
SELECT o.FactoryID, o.Name, o.Rating, o.ProductCategory, o.Emissions, o.LatestChanges, o.AddressID, o.ProductTags, f.Country
FROM FactoryHistory o
LEFT JOIN AddressHistory f on f.AddressID = o.AddressID AND f.IsCurrent = 1
WHERE NOT o.LatestChanges = 'Deleted' AND o.IsCurrent = 1
If you still want to filter out factories whose AddressID is not null (assuming that by empty you mean null) but does not exist in the address table, then you can add a condition to the WHERE clause:
WHERE
NOT o.LatestChanges = 'Deleted'
AND o.IsCurrent = 1
AND (o.AddressID IS NULL OR f.AddressID IS NOT NULL)
It might be clearer with a negation:
WHERE
NOT o.LatestChanges = 'Deleted'
AND o.IsCurrent = 1
AND NOT (o.AddressID IS NOT NULL AND f.AddressID IS NULL)
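The question describes AddressID as an empty string rather than NULL. A minimal sketch of how the left join behaves in that case: an empty-string AddressID simply finds no match, so f.Country comes back as NULL, and COALESCE can turn that into the empty string the question asks for. Assumptions: SQLite (via Python's sqlite3) stands in for the asker's database, the column list is trimmed, and all sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactoryHistory (
    FactoryID INTEGER, Name TEXT, AddressID TEXT, LatestChanges TEXT, IsCurrent INTEGER
);
CREATE TABLE AddressHistory (AddressID TEXT, Country TEXT, IsCurrent INTEGER);
-- Hypothetical sample rows.
INSERT INTO FactoryHistory VALUES
    (1, 'Alpha', 'A1', 'Created', 1),
    (2, 'Beta',  '',   'Updated', 1),   -- empty-string AddressID
    (3, 'Gamma', 'A1', 'Deleted', 1);   -- removed by the WHERE clause
INSERT INTO AddressHistory VALUES
    ('A1', 'Germany', 1),
    ('A1', 'France',  0);               -- outdated row, excluded in the ON clause
""")

rows = conn.execute("""
SELECT o.FactoryID, o.Name, COALESCE(f.Country, '') AS Country
FROM FactoryHistory o
LEFT JOIN AddressHistory f ON f.AddressID = o.AddressID AND f.IsCurrent = 1
WHERE NOT o.LatestChanges = 'Deleted' AND o.IsCurrent = 1
ORDER BY o.FactoryID
""").fetchall()

for row in rows:
    print(row)
```

Keeping f.IsCurrent = 1 in the ON clause rather than the WHERE clause matters: in the WHERE clause it would discard the unmatched rows again and turn the left join back into an inner join.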
In SQL, when you do a bunch of joins, it treats all of the joined objects as one "super-object" to be selected from. This remains the case when you group by a particular column, as long as you include anything you select in the grouping (unless it is produced by the grouping, such as summing a bunch of int columns).
In LINQ, you can similarly do a bunch of joins in a row, and select from them. However, when you perform a grouping, it behaves differently. The syntax in query-style LINQ only allows for grouping a single table (i.e., one of your joins), discarding the others.
For an example case suppose we have a few tables:
Request
-------
int ID (PK)
datetime Created
int StatusID (FK)
Item
----
int ID (PK)
string Name
RequestItem
-----------
int ID (PK)
int ItemID (FK)
int RequestID (FK)
int Quantity
Inventory
---------
int ID (PK)
int ItemID (FK)
int Quantity
LU_Status
---------
int ID (PK)
string Description
In our example, LU_Status has three values in the database:
1 - New
2 - Approved
3 - Completed
This is a simplified version of the actual situation that led me to this question. Given this schema, the need is to produce a report that shows the number of requested items (status not "Completed"), approved items (status "Approved"), distributed items (status "Completed"), and the number of items in stock (from Inventory), all grouped by the item. If this is a bit vague, take a look at the SQL or let me know and I'll try to make it clearer.
In SQL I might do this:
select i.Name,
Requested = sum(ri.Quantity),
Approved = sum(case when r.StatusID = 2 then ri.Quantity else 0 end),
Distributed = sum(case when r.StatusID = 3 then ri.Quantity else 0 end),
Storage = sum(Storage)
from RequestItem as ri
inner join Request as r on r.ID = ri.RequestID
inner join Item as i on i.ID = ri.ItemID
inner join (select ItemID, Storage = sum(Quantity)
from Inventory
group by ItemID)
as inv on inv.ItemID = ri.ItemID
group by i.Name
This produces the desired result.
I began to rewrite this in LINQ, and got so far as:
var result = from ri in RequestItem
join r in Request on ri.RequestID equals r.ID
join i in Item on ri.ItemID equals i.ID
join x in (from inv in Inventory
group inv by inv.ItemID into g
select new { ItemID = g.Key, Storage = g.Sum(x => x.Quantity) })
on ri.ItemID equals x.ItemID
group...????
At this point everything had been going smoothly, but I realized that I couldn't simply group by i.Name like I did in SQL. In fact, there seemed to be no way to group all of the joined things together so that I could select the necessary things from them, so I was forced to stop there. I understand how to use the group syntax in simpler situations (see the subquery), but if there's a way to do this sort of grouping in LINQ, I'm not seeing it, and searching around here and elsewhere has not enlightened me.
Is this a shortcoming of LINQ, or am I missing something?
You can create an anonymous type in a grouping that contains all data you need:
var result = from ri in RequestItem
join r in Request on ri.RequestID equals r.ID
join i in Item on ri.ItemID equals i.ID
join x in (from inv in Inventory
group inv by inv.ItemID into g
select new { ItemID = g.Key, Storage = g.Sum(x => x.Quantity) })
on ri.ItemID equals x.ItemID
group new
{
i.Name,
r.StatusID,
ri.Quantity,
x.Storage,
}
by i.Name into grp
select new
{
grp.Key,
Requested = grp.Sum(x => x.Quantity),
Approved = grp.Where(x => x.StatusID == 2).Sum(x => x.Quantity),
Distributed = grp.Where(x => x.StatusID == 3).Sum(x => x.Quantity),
Storage = grp.Sum(x => x.Storage)
}
(not tested, obviously, but it should be close).
The easiest way is to use the group new { ... } by ... construct and include everything from the joins that you need later inside the { ... }, like this:
var query =
from ri in db.RequestItem
join r in db.Request on ri.RequestID equals r.ID
join i in db.Item on ri.ItemID equals i.ID
join x in (from inv in db.Inventory
group inv by inv.ItemID into g
select new { ItemID = g.Key, Storage = g.Sum(x => x.Quantity) }
) on ri.ItemID equals x.ItemID
group new { ri, r, i, x } by i.Name into g
select new
{
Name = g.Key,
Requested = g.Sum(e => e.ri.Quantity),
Approved = g.Sum(e => e.r.StatusID == 2 ? e.ri.Quantity : 0),
Distributed = g.Sum(e => e.r.StatusID == 3 ? e.ri.Quantity : 0),
Storage = g.Sum(e => e.x.Storage)
};
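To make the idea behind group new { ri, r, i, x } by i.Name concrete outside of LINQ, here is a rough Python sketch of the same join-then-group shape: every element stored in a group carries all of the joined rows, so the aggregates can reach any of them afterwards. The sample rows and the item_report function are hypothetical; only the table and column names come from the question.

```python
from collections import defaultdict

# Hypothetical in-memory rows standing in for the tables in the question.
requests = [{"ID": 1, "StatusID": 1}, {"ID": 2, "StatusID": 2}, {"ID": 3, "StatusID": 3}]
items = [{"ID": 10, "Name": "Widget"}]
request_items = [
    {"ItemID": 10, "RequestID": 1, "Quantity": 5},
    {"ItemID": 10, "RequestID": 2, "Quantity": 3},
    {"ItemID": 10, "RequestID": 3, "Quantity": 2},
]
inventory = [{"ItemID": 10, "Quantity": 40}, {"ItemID": 10, "Quantity": 10}]


def item_report(request_items, requests, items, inventory):
    # Subquery: total stock per item (group inv by inv.ItemID).
    storage = defaultdict(int)
    for inv in inventory:
        storage[inv["ItemID"]] += inv["Quantity"]

    requests_by_id = {r["ID"]: r for r in requests}
    items_by_id = {i["ID"]: i for i in items}

    # Join, then group: each group element keeps every joined row.
    groups = defaultdict(list)
    for ri in request_items:
        r = requests_by_id.get(ri["RequestID"])
        i = items_by_id.get(ri["ItemID"])
        if r is None or i is None or ri["ItemID"] not in storage:
            continue  # inner join: skip rows without a match
        groups[i["Name"]].append(
            {"ri": ri, "r": r, "i": i, "storage": storage[ri["ItemID"]]}
        )

    return {
        name: {
            "Requested": sum(e["ri"]["Quantity"] for e in rows),
            "Approved": sum(e["ri"]["Quantity"] for e in rows if e["r"]["StatusID"] == 2),
            "Distributed": sum(e["ri"]["Quantity"] for e in rows if e["r"]["StatusID"] == 3),
            "Storage": sum(e["storage"] for e in rows),
        }
        for name, rows in groups.items()
    }


report = item_report(request_items, requests, items, inventory)
print(report)
```

Note that Storage is summed once per joined request row, exactly as in the SQL and LINQ versions, so an item with three request rows counts its stock three times; taking the maximum instead of the sum would give the per-item stock.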
I've got a sql select query which returns two rows:
SELECT contacts_patientcontact.contact_id, patient_firstname, recent_mailouts
FROM contacts_patientcontact
INNER JOIN patients_patientcore
ON contacts_patientcontact.patient_id = patients_patientcore.patient_id
LEFT JOIN (SELECT contact_id, COUNT(*) as recent_mailouts
FROM contacts_communicationinstance
WHERE communication_type = 'questionnaire mailout'
GROUP BY contact_id) mail_outs
ON contacts_patientcontact.contact_id = mail_outs.contact_id
WHERE contact_date BETWEEN '2012/03/05' AND '2012/03/12'
AND contact_type = 'Postal Questionnaire'
AND patient_dead != 1
AND consent_withdrawn IS NULL
AND lost_follow_up != 1
AND (key = 'A' OR key = 'C')
AND (recent_mailouts < 1
OR recent_mailouts IS NULL);
However, when I add it to Django using the raw method, the queryset doesn't seem to be iterable.
import datetime


def weekly_questionnaire_mailout_query(monday):
    """
    Returns a query set of PatientContact objects for patients
    due a mailout in the week following the parameter 'monday'.
    """
    nxt_monday = monday + datetime.timedelta(weeks=1)
    nxt_monday_str = nxt_monday.strftime('%Y/%m/%d')
    monday_str = monday.strftime('%Y/%m/%d')
    contacts = PatientContact.objects.raw("""
        SELECT contacts_patientcontact.contact_id
        FROM contacts_patientcontact
        INNER JOIN patients_patientcore
        ON contacts_patientcontact.patient_id = patients_patientcore.patient_id
        LEFT JOIN (SELECT contact_id, COUNT(*) as recent_mailouts
                   FROM contacts_communicationinstance
                   WHERE communication_type = 'questionnaire mailout'
                   GROUP BY contact_id) mail_outs
        ON contacts_patientcontact.contact_id = mail_outs.contact_id
        WHERE contact_date BETWEEN '%s' AND '%s'
        AND contact_type = 'Postal Questionnaire'
        AND patient_dead != 1
        AND consent_withdrawn IS NULL
        AND lost_follow_up != 1
        AND (cora = 'A' OR cora = 'C')
        AND (recent_mailouts < 1
        OR recent_mailouts IS NULL);
        """ % (monday_str, nxt_monday_str)
    )
    return contacts


contacts = weekly_questionnaire_mailout_query(monday)
for contact in contacts:
    patients.add(contact.patient_id)
That last line is never reached. (I've checked the dates are correct, and I've included the PatientContact model below).
class PatientContact(models.Model):
    contact_id = models.AutoField(primary_key=True)
    patient_id = models.ForeignKey(PatientCore, db_column="patient_id",
                                   verbose_name="patient")
    # additional fields..
I'm at a loss with this: instead of showing the items in a queryset, my (pydevd) debugger shows a RawQuerySet object. The same function (with the same parameter) returns an object that django-tables2 handles fine (producing the table I'd expect from the SQL output).
EDIT
That's embarrassing - it was the dates after all - I wasn't actually running the same SQL query (I thought I'd checked and rechecked them last week). Apologies to anyone who's spent any time on this.
This was a mistake. I was running the wrong piece of code.
The solution is to be more careful!
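Since the root cause turned out to be the date range, one way to make that kind of mistake easier to spot is to build the two boundary strings in a small function that can be checked on its own, away from the query. This is only a sketch: week_bounds is a hypothetical helper, and the check that the argument really is a Monday is an added assumption, not something the original function does.

```python
import datetime


def week_bounds(monday):
    """Return (start, end) date strings for the week starting on `monday`.

    Hypothetical helper; uses the same '%Y/%m/%d' format as the original query.
    """
    # Assumption: callers are expected to pass a Monday, so fail loudly otherwise.
    if monday.weekday() != 0:
        raise ValueError("expected a Monday, got %s" % monday.isoformat())
    next_monday = monday + datetime.timedelta(weeks=1)
    return monday.strftime('%Y/%m/%d'), next_monday.strftime('%Y/%m/%d')


start, end = week_bounds(datetime.date(2012, 3, 5))
print(start, end)
```

Printing or asserting these two values before running the raw query makes it quick to confirm that the SQL tested by hand and the SQL Django runs cover the same week.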
We had an issue where our workflows have not been creating activities.
I now need to report which accounts have not had their workflows invoked.
I've tried Advanced Find and then moved to SQL.
My question is: can someone provide a simple starter query to pull which 'entity' has NOT had a specific activity associated with it?
Please let me know if the question is not clear enough or if more info is needed.
Below is a solution using SQL where I step through my thought process, and below that is a solution that gets started with the C# API (edit: just realized this is for a report, so that part can be disregarded). I've commented in most places, so I hope my methods are fairly straightforward.
SQL
1.
--get all the entities that aren't activities and aren't intersect entities (N:N tables)
--Put in your own where conditions to further filter this list,
--which is still probably far too expansive
SELECT
A.name EntityName
FROM MetadataSchema.Entity A
WHERE
A.IsActivity = 0
AND A.IsIntersect = 0
2.
--CROSS JOIN the non-activity entities with the activity entities
--to get a list of all possible entity/activity pairings
SELECT DISTINCT
A.name EntityName
, B.Name ActivityName
FROM MetadataSchema.Entity A
CROSS JOIN MetadataSchema.Entity B
WHERE
A.IsActivity = 0
AND A.IsIntersect = 0
AND B.IsActivity = 1
3.
--LEFT JOIN the partial cartesian join above against the Activity table,
--making a note of which entities actually have activity records.
--This will provide a complete list of which entity/activity pairings
--exist and don't exist
SELECT
A.name EntityName
, B.Name ActivityName
--if there is a matching activity, the unique key,
--ActivityTypeCode (int), will be positive.
--So, if there is a positive sum for an entity/activity
--pairing, you know there is a valid pair; otherwise
--no pair
, CAST(CASE WHEN sum(coalesce(C.ActivityTypeCode, 0)) > 0
THEN 1
ELSE 0
END AS BIT) EntityOwnsActivity
FROM MetadataSchema.Entity A
CROSS JOIN MetadataSchema.Entity B
LEFT JOIN dbo.ActivityPointer C ON
--ObjectTypeCode is a unique identifier for Entities;
--RegardingObjectTypeCode is the code for the entity type
--associated with a particular activity
A.ObjectTypeCode = C.RegardingObjectTypeCode
--ActivityTypeCode is the code for the particular activity
AND B.ObjectTypeCode = C.ActivityTypeCode
WHERE
A.IsActivity = 0
AND A.IsIntersect = 0
AND B.IsActivity = 1
GROUP BY
A.name
, B.Name
4.
--Putting it all together, using the above master table,
--filter out the entities/activities you're interested in
--(in this case, all entities that aren't associated with
--any emails)
SELECT
EntityName
FROM
(
SELECT
A.name EntityName
, B.Name ActivityName
, CAST(CASE WHEN sum(coalesce(C.ActivityTypeCode, 0)) > 0
THEN 1
ELSE 0
END AS BIT) EntityOwnsActivity
FROM MetadataSchema.Entity A
CROSS JOIN MetadataSchema.Entity B
LEFT JOIN dbo.ActivityPointer C ON
A.ObjectTypeCode = C.RegardingObjectTypeCode
AND B.ObjectTypeCode = C.ActivityTypeCode
WHERE
A.IsActivity = 0
AND A.IsIntersect = 0
AND B.IsActivity = 1
GROUP BY
A.name
, B.Name
) EntityActivities
WHERE ActivityName = 'Email'
AND EntityOwnsActivity = 0
ORDER BY
EntityName
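For the narrower question as asked (which accounts have had no activity of a given type), a NOT EXISTS anti-join is one common starting point. The sketch below shows only the shape of such a query. Assumptions: the account and activity tables, their columns, and the sample rows are all hypothetical and are not the CRM schema; SQLite (via Python's sqlite3) is used just to keep the example self-contained; and accounts_without is an illustrative helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables, not the CRM schema.
CREATE TABLE account (account_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE activity (
    activity_id INTEGER PRIMARY KEY,
    regarding_account_id INTEGER REFERENCES account,
    activity_type TEXT
);
INSERT INTO account VALUES (1, 'Contoso'), (2, 'Fabrikam'), (3, 'Northwind');
INSERT INTO activity VALUES (1, 1, 'email'), (2, 1, 'task'), (3, 2, 'task');
""")


def accounts_without(conn, activity_type):
    """Return names of accounts that have no activity of the given type."""
    rows = conn.execute("""
        SELECT a.name
        FROM account a
        WHERE NOT EXISTS (
            SELECT 1
            FROM activity act
            WHERE act.regarding_account_id = a.account_id
              AND act.activity_type = ?
        )
        ORDER BY a.name
    """, (activity_type,))
    return [name for (name,) in rows]


print(accounts_without(conn, "email"))
print(accounts_without(conn, "task"))
```

Against the real database, the same pattern would presumably use the account table in the outer query and dbo.ActivityPointer in the subquery, matched on whichever column links an activity to its regarding record.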
C#.NET API
using (OrganizationServiceProxy _serviceProxy =
new OrganizationServiceProxy(
new Uri(".../XRMServices/2011/Organization.svc"), null, null, null))
{
_serviceProxy.EnableProxyTypes();
RetrieveAllEntitiesRequest request = new RetrieveAllEntitiesRequest()
{
EntityFilters = EntityFilters.Entity,
RetrieveAsIfPublished = true
};
// Retrieve the MetaData.
EntityMetadata[] entities =
((RetrieveAllEntitiesResponse)_serviceProxy.Execute(request)).EntityMetadata;
var ents = from e1 in entities.Where(x => x.IsActivity != true)
.Where(x => x.IsIntersect != true)
from e2 in entities.Where(x => x.IsActivity == true)
select new
{
entityName = e1.SchemaName
,
activityName = e2.SchemaName
};
//at this point, because of the limited nature of the Linq provider for left joins
//and sums, probably the best approach is to do a fetch query on each entity/activity
//combo, do some sort of sum and find out which combos have matches
// in the activity pointer table
//API = very inefficient; maybe improved in next CRM release? Let's hope so!
}