VB.NET LINQ group join followed by another group

I have two collections of objects in VB.NET that I want to link together using a join, and possibly group together. Basically my objects look like this:
Institution (ID, Name)
Visit (ID, InstitutionID, Date, IsFollowUp, IsSelfScheduled)
A visit always belongs to an institution, but an institution can have no visits. I can link the two together using LINQ, but I can't get the institutions that have no visits to appear in the list as well (which is what I am looking for). I know I have to use a group join, but I can't work out how to combine it with my existing query:
From p In visitList Where p.IsFollowUp = False AndAlso p.IsSelfScheduled = False
Group p By Key = p.InstitutionID Into grp = Group
Select InstitutionID = Key, Visits = grp, LastVisitDate = grp.FirstOrDefault().ProvisionalDate
If the worst comes to the worst, I can implement a private class to do what I want, but it seems like it should be something simple in LINQ.
Edit:
Okay, using the link that Tim posted, I managed to come up with something like this:
From i In institutionList
Select InstitutionID = i.ID, Name = i.Name, Inspections =
(
From p In visitList Where p.InstitutionID = i.ID Select p
)
It uses a subquery to pull out the visits for each institution. It does seem to return the right information, but I have no idea whether it is efficient.
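To make the target shape concrete, here is a minimal Python sketch standing in for LINQ; the snake_case classes, the `visit_date` field and the sample data are hypothetical. It does what a left outer group join does: one pass builds a lookup from InstitutionID to the filtered visits, then every institution is walked, so institutions without visits come out with an empty list. In VB.NET this is the shape a `Group Join ... On ... Equals ... Into ... = Group` clause produces, and LINQ to Objects builds a similar lookup internally, whereas the subquery version rescans visitList once per institution.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Institution:
    id: int
    name: str


@dataclass
class Visit:
    id: int
    institution_id: int
    visit_date: date  # hypothetical name for the question's Date property
    is_follow_up: bool
    is_self_scheduled: bool


Row = Tuple[int, str, List[Visit], Optional[date]]


def group_join(institutions: List[Institution], visits: List[Visit]) -> List[Row]:
    """Left outer group join: every institution appears, with a possibly empty visit list."""
    # One pass over the visits builds InstitutionID -> matching visits,
    # using the question's filter (no follow-ups, no self-scheduled visits).
    by_institution = defaultdict(list)
    for v in visits:
        if not v.is_follow_up and not v.is_self_scheduled:
            by_institution[v.institution_id].append(v)

    rows: List[Row] = []
    for inst in institutions:
        grp = by_institution.get(inst.id, [])  # empty list for institutions with no visits
        last_visit = max((v.visit_date for v in grp), default=None)
        rows.append((inst.id, inst.name, grp, last_visit))
    return rows


institutions = [Institution(1, "North Clinic"), Institution(2, "South Clinic")]
visits = [
    Visit(10, 1, date(2024, 1, 5), False, False),
    Visit(11, 1, date(2024, 3, 1), False, False),
    Visit(12, 1, date(2024, 4, 1), True, False),  # follow-up: filtered out
]
for inst_id, name, grp, last_visit in group_join(institutions, visits):
    print(inst_id, name, [v.id for v in grp], last_visit)
```

Taking the maximum date, rather than the first element of the group, gives the last visit regardless of the order the visits arrive in.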

Related

SQL - join three tables based on (different) latest dates in two of them

Using Oracle SQL Developer, I have three tables with some common data that I need to join.
Appreciate any help on this!
Please refer to https://i.stack.imgur.com/f37Jh.png for the input and desired output (table formatting doesn't work on all tables).
These tables are made up in order to anonymize them, and in reality contain other data with millions of entries, but you could think of them as representing:
Product = Main product categories in a grocery store.
Subproduct = Subcategories of the above products. Each time the table is updated, a main product category may lose some subproducts or gain new ones. For example, from May to June Pulled pork was added while Fishsoup was dropped.
Issues = Status of the products; for example, an apple is bad if it has brown spots on it.
What I need to find is: for each P_NAME, find the latest updated set of subproducts (SP_ID and SP_NAME), and append that information with the latest updated issue status (STATUS_FLAG).
Please note that each main product category gets its set of subproducts updated on separate occasions, i.e. 1234 and 5678 might be "latest updated" on different dates.
I have tried multiple queries but failed each time, using combinations of SELECT, LEFT OUTER JOIN, JOIN, MAX and GROUP BY.
Latest attempt, which gives me the combo of the first two tables, but missing the third:
SELECT
PRODUCT.P_NAME,
SUBPRODUCT.SP_PRODUCT_ID, SUBPRODUCT.SP_NAME, SUBPRODUCT.SP_ID, SUBPRODUCT.SP_VALUE_DATE
FROM SUBPRODUCT
LEFT OUTER JOIN PRODUCT ON PRODUCT.P_ID = SUBPRODUCT.SP_PRODUCT_ID
JOIN(SELECT SP_PRODUCT_ID, MAX(SP_VALUE_DATE) AS latestdate FROM SUBPRODUCT GROUP BY SP_PRODUCT_ID) sub ON
sub.SP_PRODUCT_ID = SUBPRODUCT.SP_PRODUCT_ID AND sub.latestDate = SUBPRODUCT.SP_VALUE_DATE;
Finding the row with a max value is a common SQL pattern. You can do it with a join, as in your example, but it is usually clearer to use a correlated subquery or a window function.
Correlated subquery example
select
PRODUCT.P_NAME,
SUBPRODUCT.SP_PRODUCT_ID, SUBPRODUCT.SP_NAME, SUBPRODUCT.SP_ID, SUBPRODUCT.SP_VALUE_DATE,
ISSUES.STATUS_FLAG, ISSUES.STATUS_LAST_UPDATED
from PRODUCT
join SUBPRODUCT
on PRODUCT.P_ID = SUBPRODUCT.SP_PRODUCT_ID
and SUBPRODUCT.SP_VALUE_DATE = (select max(S2.SP_VALUE_DATE) as latestDate
from SUBPRODUCT S2
where S2.SP_PRODUCT_ID = SUBPRODUCT.SP_PRODUCT_ID)
join ISSUES
on ISSUES.ISSUE_ID = SUBPRODUCT.SP_ID
and ISSUES.STATUS_LAST_UPDATED = (select max(I2.STATUS_LAST_UPDATED) as latestDate
from ISSUES I2
where I2.ISSUE_ID = ISSUES.ISSUE_ID)
Window function / inline view example
select
PRODUCT.P_NAME,
S.SP_PRODUCT_ID, S.SP_NAME, S.SP_ID, S.SP_VALUE_DATE,
I.STATUS_FLAG, I.STATUS_LAST_UPDATED
from PRODUCT
join (select SUBPRODUCT.*,
max(SP_VALUE_DATE) over (partition by SP_PRODUCT_ID) as latestDate
from SUBPRODUCT) S
on PRODUCT.P_ID = S.SP_PRODUCT_ID
and S.SP_VALUE_DATE = S.latestDate
join (select ISSUES.*,
max(STATUS_LAST_UPDATED) over (partition by ISSUE_ID) as latestDate
from ISSUES) I
on I.ISSUE_ID = S.SP_ID
and I.STATUS_LAST_UPDATED = I.latestDate
This often performs a bit better, but window functions can be tricky to understand.
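Both patterns can be sanity-checked on toy data. This sketch uses Python's built-in sqlite3 module in place of Oracle (window functions need SQLite 3.25 or later), keeps only the columns the queries touch, and fills the tables with hypothetical rows, since the real data is only in the linked image. It runs slightly trimmed versions of the two queries above and checks that they agree.

```python
import sqlite3

# Hypothetical toy data; dates are ISO strings so text comparison orders them correctly.
SCHEMA = """
CREATE TABLE PRODUCT (P_ID INTEGER, P_NAME TEXT);
CREATE TABLE SUBPRODUCT (SP_ID INTEGER, SP_PRODUCT_ID INTEGER, SP_NAME TEXT, SP_VALUE_DATE TEXT);
CREATE TABLE ISSUES (ISSUE_ID INTEGER, STATUS_FLAG TEXT, STATUS_LAST_UPDATED TEXT);
INSERT INTO PRODUCT VALUES (1234, 'Meat'), (5678, 'Fish');
INSERT INTO SUBPRODUCT VALUES
  (1, 1234, 'Ham', '2021-05-31'), (2, 1234, 'Fishsoup', '2021-05-31'),
  (1, 1234, 'Ham', '2021-06-30'), (3, 1234, 'Pulled pork', '2021-06-30'),
  (4, 5678, 'Salmon', '2021-04-30');
INSERT INTO ISSUES VALUES
  (1, 'OK', '2021-06-01'), (1, 'BAD', '2021-06-15'),
  (3, 'OK', '2021-06-20'), (4, 'BAD', '2021-05-01');
"""

# Correlated subqueries pick the latest date per product and per issue.
CORRELATED = """
SELECT P.P_NAME, S.SP_ID, S.SP_NAME, I.STATUS_FLAG
FROM PRODUCT P
JOIN SUBPRODUCT S
  ON P.P_ID = S.SP_PRODUCT_ID
 AND S.SP_VALUE_DATE = (SELECT MAX(S2.SP_VALUE_DATE) FROM SUBPRODUCT S2
                        WHERE S2.SP_PRODUCT_ID = S.SP_PRODUCT_ID)
JOIN ISSUES I
  ON I.ISSUE_ID = S.SP_ID
 AND I.STATUS_LAST_UPDATED = (SELECT MAX(I2.STATUS_LAST_UPDATED) FROM ISSUES I2
                              WHERE I2.ISSUE_ID = I.ISSUE_ID)
ORDER BY P.P_NAME, S.SP_ID
"""

# Window functions attach the per-partition max to every row instead.
WINDOWED = """
SELECT P.P_NAME, S.SP_ID, S.SP_NAME, I.STATUS_FLAG
FROM PRODUCT P
JOIN (SELECT SUBPRODUCT.*,
             MAX(SP_VALUE_DATE) OVER (PARTITION BY SP_PRODUCT_ID) AS latestDate
      FROM SUBPRODUCT) S
  ON P.P_ID = S.SP_PRODUCT_ID AND S.SP_VALUE_DATE = S.latestDate
JOIN (SELECT ISSUES.*,
             MAX(STATUS_LAST_UPDATED) OVER (PARTITION BY ISSUE_ID) AS latestDate
      FROM ISSUES) I
  ON I.ISSUE_ID = S.SP_ID AND I.STATUS_LAST_UPDATED = I.latestDate
ORDER BY P.P_NAME, S.SP_ID
"""


def run(query):
    """Run a query against a fresh in-memory copy of the toy tables."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


correlated = run(CORRELATED)
windowed = run(WINDOWED)
assert correlated == windowed
print(correlated)
```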

Select Count of one table into another

I have one SQL statement:
SELECT ARTICLES.NEWS_ARTCL_ID, ARTICLES.NEWS_ARTCL_TTL_DES,
ARTICLES.NEWS_ARTCL_CNTNT_T, ARTICLES.NEWS_ARTCL_PUB_DT,
ARTICLES.NEWS_ARTCL_AUTH_NM, ARTICLES.NEWS_ARTCL_URL, ARTICLES.MEDIA_URL,
ARTICLES.ARTCL_SRC_ID, SOURCES.ARTCL_SRC_NM, MEDIA.MEDIA_TYPE_DESCRIP
FROM
RSKLMOBILEB2E.NEWS_ARTICLE ARTICLES,
RSKLMOBILEB2E.MEDIA_TYPE MEDIA,
RSKLMOBILEB2E.ARTICLE_SOURCE SOURCES
WHERE ARTICLES.MEDIA_TYPE_IDENTIF = MEDIA.MEDIA_TYPE_IDENTIF
AND ARTICLES.ARTCL_SRC_ID = SOURCES.ARTCL_SRC_ID
AND ARTICLES.ARTCL_SRC_ID = 1
ORDER BY ARTICLES.NEWS_ARTCL_PUB_DT
Now I need to combine it with another SQL statement:
SELECT COUNT ( * )
FROM RSKLMOBILEB2E.NEWS_LIKES LIKES, RSKLMOBILEB2E.NEWS_ARTICLE ARTICLES
WHERE LIKES.NEWS_ARTCL_ID = ARTICLES.NEWS_ARTCL_ID
Basically I have one table which contains articles, and I need to include the number of user likes, which are stored in another table.
Use a correlated subquery to add the likes count to your first query, like this:
SELECT ARTICLES.NEWS_ARTCL_ID
,ARTICLES.NEWS_ARTCL_TTL_DES
,ARTICLES.NEWS_ARTCL_CNTNT_T
,ARTICLES.NEWS_ARTCL_PUB_DT
,ARTICLES.NEWS_ARTCL_AUTH_NM
,ARTICLES.NEWS_ARTCL_URL
,ARTICLES.MEDIA_URL
,ARTICLES.ARTCL_SRC_ID
,SOURCES.ARTCL_SRC_NM
,MEDIA.MEDIA_TYPE_DESCRIP
,(
SELECT COUNT(*)
FROM RSKLMOBILEB2E.NEWS_LIKES LIKES
WHERE LIKES.NEWS_ARTCL_ID = ARTICLES.NEWS_ARTCL_ID
) AS LikesCount
FROM RSKLMOBILEB2E.NEWS_ARTICLE ARTICLES
,RSKLMOBILEB2E.MEDIA_TYPE MEDIA
,RSKLMOBILEB2E.ARTICLE_SOURCE SOURCES
WHERE ARTICLES.MEDIA_TYPE_IDENTIF = MEDIA.MEDIA_TYPE_IDENTIF
AND ARTICLES.ARTCL_SRC_ID = SOURCES.ARTCL_SRC_ID
AND ARTICLES.ARTCL_SRC_ID = 1
ORDER BY ARTICLES.NEWS_ARTCL_PUB_DT;
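A quick way to check the correlated-subquery approach is to run it against toy data. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, with hypothetical, trimmed-down tables (no RSKLMOBILEB2E schema prefix, no MEDIA_TYPE or ARTICLE_SOURCE joins) and made-up rows. It compares the scalar subquery with a LEFT JOIN plus GROUP BY; both should report 0 for an article nobody has liked.

```python
import sqlite3

# Hypothetical, trimmed-down versions of the tables (schema prefix dropped).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NEWS_ARTICLE (NEWS_ARTCL_ID INTEGER PRIMARY KEY, NEWS_ARTCL_TTL_DES TEXT,
                           ARTCL_SRC_ID INTEGER, NEWS_ARTCL_PUB_DT TEXT);
CREATE TABLE NEWS_LIKES (ID INTEGER PRIMARY KEY, NEWS_ARTCL_ID INTEGER);
INSERT INTO NEWS_ARTICLE VALUES (1, 'First', 1, '2020-01-01'), (2, 'Second', 1, '2020-01-02'),
                                (3, 'Other source', 2, '2020-01-03');
INSERT INTO NEWS_LIKES (NEWS_ARTCL_ID) VALUES (1), (1), (3);
""")

# Scalar (correlated) subquery: one count per article row, 0 when there are no likes.
scalar = conn.execute("""
SELECT A.NEWS_ARTCL_ID,
       (SELECT COUNT(*) FROM NEWS_LIKES L
        WHERE L.NEWS_ARTCL_ID = A.NEWS_ARTCL_ID) AS LikesCount
FROM NEWS_ARTICLE A
WHERE A.ARTCL_SRC_ID = 1
ORDER BY A.NEWS_ARTCL_PUB_DT
""").fetchall()

# LEFT JOIN + GROUP BY: COUNT(L.ID) ignores the NULLs produced for articles without likes.
joined = conn.execute("""
SELECT A.NEWS_ARTCL_ID, COUNT(L.ID) AS LikesCount
FROM NEWS_ARTICLE A
LEFT JOIN NEWS_LIKES L ON L.NEWS_ARTCL_ID = A.NEWS_ARTCL_ID
WHERE A.ARTCL_SRC_ID = 1
GROUP BY A.NEWS_ARTCL_ID, A.NEWS_ARTCL_PUB_DT
ORDER BY A.NEWS_ARTCL_PUB_DT
""").fetchall()
conn.close()

print(scalar)  # [(1, 2), (2, 0)]
print(joined)  # [(1, 2), (2, 0)]
```

Counting a column from the likes table (not `COUNT(*)`) matters in the join version: `COUNT(*)` would count the single NULL-extended row and report 1 for an article with no likes.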
I'm not sure what you are trying to achieve, but it seems you want to count the joined rows from the two tables. You can edit your query to something like this:
SELECT COUNT(*) FROM RSKLMOBILEB2E.NEWS_LIKES LIKES
JOIN RSKLMOBILEB2E.NEWS_ARTICLE ARTICLES
ON LIKES.NEWS_ARTCL_ID = ARTICLES.NEWS_ARTCL_ID
I think the solution is to use analytic functions. Please have a look at https://oracle-base.com/articles/misc/analytic-functions
Please check the following query (keep in mind I have no idea about your table structures). Because of the left join, records might be duplicated; this is why DISTINCT is added. DISTINCT is used instead of GROUP BY because the analytic function's argument, LIKES.ID, is not a GROUP BY expression.
SELECT DISTINCT ARTICLES.NEWS_ARTCL_ID, ARTICLES.NEWS_ARTCL_TTL_DES,
ARTICLES.NEWS_ARTCL_CNTNT_T, ARTICLES.NEWS_ARTCL_PUB_DT,
ARTICLES.NEWS_ARTCL_AUTH_NM, ARTICLES.NEWS_ARTCL_URL, ARTICLES.MEDIA_URL,
ARTICLES.ARTCL_SRC_ID, SOURCES.ARTCL_SRC_NM, MEDIA.MEDIA_TYPE_DESCRIP,
count(LIKES.ID) over ( partition by ARTICLES.NEWS_ARTCL_ID ) as num_likes
FROM RSKLMOBILEB2E.NEWS_ARTICLE ARTICLES
join RSKLMOBILEB2E.MEDIA_TYPE MEDIA
on ARTICLES.MEDIA_TYPE_IDENTIF = MEDIA.MEDIA_TYPE_IDENTIF
join RSKLMOBILEB2E.ARTICLE_SOURCE SOURCES
on ARTICLES.ARTCL_SRC_ID = SOURCES.ARTCL_SRC_ID
LEFT JOIN RSKLMOBILEB2E.NEWS_LIKES LIKES
ON LIKES.NEWS_ARTCL_ID = ARTICLES.NEWS_ARTCL_ID
WHERE
ARTICLES.ARTCL_SRC_ID = 1
ORDER BY ARTICLES.NEWS_ARTCL_PUB_DT
I also changed the comma-separated list of tables, with the join conditions in the WHERE clause, into explicit joins. I think this is more readable, since the join conditions are separated from the result filtering in the WHERE clause.

How to get Django QuerySet 'exclude' to work right?

I have a database that contains tables for skus, kits, kit_contents, and checklists. Here is a query for "Give me all the SKUs defined for kit_contents records defined for kit records defined in checklist 1":
SELECT DISTINCT s.* FROM skus s
JOIN kit_contents kc ON kc.sku_id = s.id
JOIN kits k ON k.id = kc.kit_id
JOIN checklists c ON c.id = k.checklist_id
WHERE k.checklist_id = 1;
I'm using Django, and I mostly really like the ORM because I can express that query by:
skus = SKU.objects.filter(kitcontent__kit__checklist_id=1).distinct()
which is such a slick way to navigate all those foreign keys. Django's ORM produces basically the same SQL as the query above. The trouble is that it's not clear to me how to get all the SKUs not defined for checklist 1. In the SQL query above, I'd do this by replacing the "=" with "!=". But Django's query lookups have no not-equals operator. You're supposed to use the exclude() method, which one might guess would look like this:
skus = SKU.objects.filter().exclude(kitcontent__kit__checklist_id=1).distinct()
but Django produces this query, which isn't the same thing:
SELECT distinct s.* FROM skus s
WHERE NOT ((s.id IN
(SELECT kc.sku_id FROM kit_contents kc
INNER JOIN kits k ON (kc.kit_id = k.id)
WHERE (k.checklist_id = 1 AND kc.sku_id IS NOT NULL))
AND s.id IS NOT NULL))
(I've cleaned up the query for easier reading and comparison.)
I'm a beginner to the Django ORM, and I'd like to use it when possible. Is there a way to get what I want here?
EDIT:
karthikr gave an answer that doesn't work, for the same reason the original .exclude() solution doesn't work: a SKU can be in kit_contents of kits on both checklist_id=1 and checklist_id=2. Using the by-hand query I opened my post with, "checklist_id = 1" produces 34 results, "checklist_id = 2" produces 53 results, and the following query produces 26 results:
SELECT DISTINCT s.* FROM skus s
JOIN kit_contents kc ON kc.sku_id = s.id
JOIN kits k ON k.id = kc.kit_id
JOIN checklists c ON c.id = k.checklist_id
JOIN kit_contents kc2 ON kc2.sku_id = s.id
JOIN kits k2 ON k2.id = kc2.kit_id
JOIN checklists c2 ON c2.id = k2.checklist_id
WHERE k.checklist_id = 1
AND k2.checklist_id = 2;
I think this is one reason why people don't seem to find the .exclude() solution a reasonable replacement for some kind of not_equals filter -- the latter allows you to say, succinctly, exactly what you mean. Presumably the former could also allow the query to be expressed, but I increasingly despair of such a solution being simple.
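The gap between the "!=" idea and what .exclude() generates shows up clearly on a four-SKU example. This is a hypothetical SQLite reproduction (Python's sqlite3, made-up rows, checklists table omitted because neither query needs it): the join with != returns SKUs that have at least one kit on another checklist, while the NOT IN form returns SKUs with no kit on checklist 1 at all, including SKUs that sit in no kit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skus (id INTEGER PRIMARY KEY);
CREATE TABLE kits (id INTEGER PRIMARY KEY, checklist_id INTEGER);
CREATE TABLE kit_contents (sku_id INTEGER, kit_id INTEGER);
INSERT INTO skus VALUES (1), (2), (3), (4);
INSERT INTO kits VALUES (10, 1), (20, 2);
-- SKU 1: checklist 1 only; SKU 2: both; SKU 3: checklist 2 only; SKU 4: no kit at all
INSERT INTO kit_contents VALUES (1, 10), (2, 10), (2, 20), (3, 20);
""")

# "!=" join: SKUs with at least one kit on some checklist other than 1.
not_equal = conn.execute("""
SELECT DISTINCT s.id FROM skus s
JOIN kit_contents kc ON kc.sku_id = s.id
JOIN kits k ON k.id = kc.kit_id
WHERE k.checklist_id != 1
ORDER BY s.id
""").fetchall()

# The shape .exclude() generates: SKUs with no kit on checklist 1 at all.
excluded = conn.execute("""
SELECT s.id FROM skus s
WHERE s.id NOT IN (SELECT kc.sku_id FROM kit_contents kc
                   JOIN kits k ON k.id = kc.kit_id
                   WHERE k.checklist_id = 1 AND kc.sku_id IS NOT NULL)
ORDER BY s.id
""").fetchall()
conn.close()

print(not_equal)  # [(2,), (3,)]
print(excluded)   # [(3,), (4,)]
```

SKU 2 (on both checklists) only appears in the "!=" result, and SKU 4 (in no kit) only in the exclude() result. If the "!=" meaning is the one wanted, one ORM spelling that should produce it is to filter through a kit subquery, e.g. SKU.objects.filter(kitcontent__kit__in=Kit.objects.exclude(checklist_id=1)).distinct(), where the Kit model name is an assumption.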
You could do this: get all the objects for checklist 1, and exclude them from the complete list.
skus = SKU.objects.filter(kitcontent__kit__checklist_id=1).distinct()  # SKUs on checklist 1
sku_ids = skus.values_list('pk', flat=True)
non_checklist_1 = SKU.objects.exclude(pk__in=sku_ids).distinct()

JOIN; only one record please!

OK, I have a complicated query against a poorly designed DB... In one query I need to get a count from one table and information from another, linked through a third. Here goes:
Each blog has a type (news, report, etc.) and a section ID for a certain part of the site, but it can also be linked to multiple computer games and sections.
type (blog_id, title, body, etc...) // yes, I know the table is named after the blog type instead of the type being an ID column; not my design
blog_link ( blog_id, blog_type, section_id, game_id )
blog_comments (blog_id, blog_type, comment, etc...)
So the query goes a little like this:
SELECT bl.`blog_id`, count(bc.`blog_id`) AS 'comment_count', t.`added`
FROM blog_link bl
JOIN type t ON t.`id` = bl.`blog_id`
JOIN blog_comments bc ON (bc.`item_id` = bl.`blog_id` AND bc.`blog_type` = '[$type]')
WHERE bl.`section_id` = [$section_id] AND bl.`blog_type` = '[$type]'
GROUP BY bl.`blog_id`
ORDER BY `added` DESC
LIMIT 0,20
Now this is fine so long as I do not have multiple games associated with one blog.
Edit: So currently, if more than one game is associated, comment_count is multiplied by the number of games associated... not good.
I have no idea how I could do this... It just isn't working! If I could somehow group by blog_id before I join, it would be gold... anyone got an idea?
Many thanks in advance
Dorjan
edit2: I've offered a bounty as this problem surely can be solved!! Come on guys!
It seems like you just want a DISTINCT count, so add DISTINCT inside the COUNT. You will need some sort of unique identifier for each comment, though. Ideally each comment would have a unique ID (i.e. an auto-increment column), but if not, you could probably use blog_id+author+timestamp.
SELECT bl.`blog_id`, COUNT(DISTINCT CONCAT_WS('|', bc.`blog_id`, bc.`author`, bc.`timestamp`)) AS 'comment_count',...
That should give you a unique comment count.
I think you need to get the blogs of type "X" first, then do a count of comments for those blogs.
SELECT
EXPR1.blog_id,
count(bc.`blog_id`) AS 'comment_count'
FROM
(
SELECT
bl.blog_id, t.added
FROM
blog_link bl
JOIN
type t ON t.id = bl.blog_id
WHERE
bl.`section_id` = [$section_id]
AND
bl.`blog_type` = '[$type]'
GROUP BY
bl.`blog_id`
ORDER BY
`added` DESC
LIMIT 0,20
) AS EXPR1
JOIN
blog_comments bc ON
(
bc.item_id = EXPR1.blog_id
AND bc.blog_type = '[$type]'
)
GROUP BY
EXPR1.blog_id, EXPR1.added
ORDER BY
EXPR1.added DESC
Not tested:
SELECT bl.`blog_id`, count(bc.`blog_id`) AS 'comment_count', t.`added`
FROM
(
SELECT DISTINCT blog_id, blog_type
FROM blog_link
WHERE
`section_id` = [$section_id]
AND `blog_type` = '[$type]'
) bl
INNER JOIN blog_comments bc ON (
bc.`item_id` = bl.`blog_id` AND bc.`blog_type` = bl.`blog_type`
)
INNER JOIN type t ON t.`id` = bl.`blog_id`
GROUP BY bl.`blog_id`
ORDER BY t.`added` DESC
LIMIT 0,20
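A small reproduction of the inflated count, run with Python's sqlite3 as a stand-in for MySQL on hypothetical data: blog 1 is linked to two games and has three comments, blog 2 has one game and one comment. The schema keeps only the columns the queries touch; `item_id` follows the question's query rather than the table listing, `?` placeholders replace `[$section_id]` and `[$type]`, and MAX(t.added) is used in ORDER BY so the query does not depend on MySQL's relaxed GROUP BY handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "type" (id INTEGER PRIMARY KEY, added TEXT);
CREATE TABLE blog_link (blog_id INTEGER, blog_type TEXT, section_id INTEGER, game_id INTEGER);
CREATE TABLE blog_comments (item_id INTEGER, blog_type TEXT, comment TEXT);
INSERT INTO "type" VALUES (1, '2020-01-02'), (2, '2020-01-01');
-- blog 1 is linked to two games, blog 2 to one
INSERT INTO blog_link VALUES (1, 'news', 5, 100), (1, 'news', 5, 101), (2, 'news', 5, 100);
INSERT INTO blog_comments VALUES (1, 'news', 'a'), (1, 'news', 'b'), (1, 'news', 'c'), (2, 'news', 'd');
""")
params = (5, "news")  # hypothetical stand-ins for [$section_id] and [$type]

# Joining blog_link directly repeats every comment once per linked game.
naive = conn.execute("""
SELECT bl.blog_id, COUNT(bc.item_id) AS comment_count
FROM blog_link bl
JOIN "type" t ON t.id = bl.blog_id
JOIN blog_comments bc ON bc.item_id = bl.blog_id AND bc.blog_type = bl.blog_type
WHERE bl.section_id = ? AND bl.blog_type = ?
GROUP BY bl.blog_id
ORDER BY MAX(t.added) DESC
LIMIT 20
""", params).fetchall()

# Collapsing blog_link to one row per blog before the join removes the fan-out.
fixed = conn.execute("""
SELECT bl.blog_id, COUNT(bc.item_id) AS comment_count
FROM (SELECT DISTINCT blog_id, blog_type FROM blog_link
      WHERE section_id = ? AND blog_type = ?) bl
JOIN blog_comments bc ON bc.item_id = bl.blog_id AND bc.blog_type = bl.blog_type
JOIN "type" t ON t.id = bl.blog_id
GROUP BY bl.blog_id
ORDER BY MAX(t.added) DESC
LIMIT 20
""", params).fetchall()
conn.close()

print(naive)  # [(1, 6), (2, 1)]
print(fixed)  # [(1, 3), (2, 1)]
```

With an inner join to blog_comments, blogs that have no comments drop out entirely; a LEFT JOIN there would keep them with a count of 0.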

How to write a query returning non-chosen records

I have written a psychological testing application in which the user is presented with a list of words and has to choose ten words which describe them very well, then words which partially describe them, and words which do not describe them. The application itself works fine, but I was interested in exploring the metadata possibilities: which words have been most frequently chosen in the first category, and which words have never been chosen in the first category. The first query was not a problem, but the second (which words have never been chosen) leaves me stumped.
The table structure is as follows:
table words: id, name
table choices: pid (person id), wid (word id), class (value between 1-6)
Presumably the answer involves a left join between words and choices, but there has to be a modifying condition (where choices.class = 1), and this is causing me problems. Writing something like
select words.name
from words left join choices
on words.id = choices.wid
where choices.class = 1
and choices.pid = null
causes the database manager to go on a long trip to nowhere. I am using Delphi 7 and Firebird 1.5.
TIA,
No'am
Maybe this is a bit faster:
SELECT w.name
FROM words w
WHERE NOT EXISTS
(SELECT 1
FROM choices c
WHERE c.class = 1 and c.wid = w.id)
Something like this should do the trick:
SELECT name
FROM words
WHERE id NOT IN
(SELECT DISTINCT wid -- DISTINCT is actually redundant
FROM choices
WHERE class = 1)
SELECT words.name
FROM
words
LEFT JOIN choices ON words.id = choices.wid AND choices.class = 1
WHERE choices.pid IS NULL
Make sure you have an index on choices (class, wid).
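To compare the three answers side by side, here is a hypothetical reproduction run through Python's sqlite3 (SQLite stands in for Firebird 1.5; the rows are made up). All three should list the words never chosen in class 1, while the question's own query returns nothing: `= null` is never true, and a WHERE condition on choices.class throws away the NULL rows the LEFT JOIN was supposed to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE choices (pid INTEGER, wid INTEGER, class INTEGER);
CREATE INDEX choices_class_wid ON choices (class, wid);
INSERT INTO words VALUES (1, 'calm'), (2, 'curious'), (3, 'stubborn'), (4, 'shy');
-- word 1 is chosen in class 1; words 2 and 3 only in other classes; word 4 never
INSERT INTO choices VALUES (1, 1, 1), (1, 2, 3), (2, 1, 1), (2, 3, 2);
""")

queries = {
    "not_exists": """SELECT w.name FROM words w
        WHERE NOT EXISTS (SELECT 1 FROM choices c WHERE c.class = 1 AND c.wid = w.id)
        ORDER BY w.id""",
    "not_in": """SELECT name FROM words
        WHERE id NOT IN (SELECT wid FROM choices WHERE class = 1)
        ORDER BY id""",
    "left_join": """SELECT words.name FROM words
        LEFT JOIN choices ON words.id = choices.wid AND choices.class = 1
        WHERE choices.pid IS NULL
        ORDER BY words.id""",
    # The question's version: the WHERE filter discards the NULL-extended rows,
    # and "= null" is never true, so nothing comes back.
    "question": """SELECT words.name FROM words
        LEFT JOIN choices ON words.id = choices.wid
        WHERE choices.class = 1 AND choices.pid = null""",
}
results = {key: [row[0] for row in conn.execute(sql)] for key, sql in queries.items()}
conn.close()

for key, names in results.items():
    print(key, names)
```

One difference worth knowing: if the subquery could ever return a NULL wid, the NOT IN version would return no rows at all, whereas NOT EXISTS and the LEFT JOIN form are unaffected.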