How to group within categories - sql

I'm trying to eliminate duplicated IDs within each of several categories. Is it possible to eliminate the duplicates for every category in one query? With a single category it would be as simple as adding a GROUP BY ID.
INSERT INTO TABLE_PROFILES(CATEGORY,ID,REGION_ID)
SELECT D.category_id, C.ID
FROM MATCH_DATA C JOIN
CATEGORY_TABLE D
ON c.EXTERNAL_ID = d.device_id;

Try using DISTINCT:
INSERT INTO TABLE_PROFILES(CATEGORY,ID)
SELECT distinct D.category_id, C.ID
FROM MATCH_DATA C JOIN
CATEGORY_TABLE D
ON c.EXTERNAL_ID = d.device_id;

Is that what you were looking for? Distinct?
INSERT INTO TABLE_PROFILES(CATEGORY,ID)
SELECT DISTINCT D.category_id, C.ID
FROM MATCH_DATA C JOIN
CATEGORY_TABLE D
ON c.EXTERNAL_ID = d.device_id;
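To see what DISTINCT does here, the following is a minimal sketch that runs the same INSERT ... SELECT DISTINCT against an in-memory SQLite database through Python's sqlite3 module. The column types and sample rows are assumptions made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas: only the names are taken from the question.
    CREATE TABLE match_data (id INTEGER, external_id INTEGER);
    CREATE TABLE category_table (category_id INTEGER, device_id INTEGER);
    CREATE TABLE table_profiles (category INTEGER, id INTEGER);

    -- ID 1 matches category 10 twice and category 20 once; ID 2 matches category 10 once.
    INSERT INTO match_data VALUES (1, 100), (1, 101), (1, 102), (2, 100);
    INSERT INTO category_table VALUES (10, 100), (10, 101), (20, 102);
""")

# DISTINCT applies to the whole (category_id, id) pair, so an ID is kept
# once per category rather than once overall.
conn.execute("""
    INSERT INTO table_profiles (category, id)
    SELECT DISTINCT d.category_id, c.id
    FROM match_data c
    JOIN category_table d ON c.external_id = d.device_id
""")

rows = conn.execute(
    "SELECT category, id FROM table_profiles ORDER BY category, id"
).fetchall()
print(rows)
```

ID 1 still appears under both categories, but only once in each, which is the per-category de-duplication the question asks for.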

Related

Three table join in sql

I'm trying to join three tables.
Tables:
HumanResources.Employees: Employee_ID (Primary Key), First_Name, Title (also known as Employee_Title)
ProjectDetails.TimeCards: Employee_ID (Foreign Key), Project_ID (Foreign Key)
ProjectDetails.Projects: Project_Name, Project_ID (Primary Key)
I tried joining them using a subquery (derived table).
select b.First_Name, b.Title, c.Project_ID from HumanResources.Employees b -- select statement
inner join ProjectDetails.TimeCards c on b.Employee_ID = c.Employee_ID -- first join seems to be the only one working.
inner join (select d.Project_Name as Project_Name, Project_ID from ProjectDetails.Projects d) as d on d.Project_ID=c.Project_ID -- second join doesn't seem to work.
The subquery is redundant at best, and reusing the alias "d" both inside and outside it may be a source of error.
Just do:
select b.First_Name, b.Title, c.Project_ID
from HumanResources.Employees b
inner join ProjectDetails.TimeCards c on b.Employee_ID = c.Employee_ID
inner join ProjectDetails.Projects d on d.Project_ID=c.Project_ID
In the inner query, give Project_ID a column alias and then join on that alias. Also try using a different alias name for the subquery.
I don't see the benefit of using a subquery in your query. Could you try the query below?
Make sure the Projects table has rows with matching Project_ID values; otherwise, of course, nothing will come up.
SELECT b.First_Name
,b.Title
,c.Project_ID
FROM HumanResources.Employees b
INNER JOIN ProjectDetails.TimeCards c ON b.Employee_ID = c.Employee_ID
INNER JOIN ProjectDetails.Projects d ON d.Project_ID = c.Project_ID
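The point about matching Project_ID values can be checked with a small sketch. It uses Python's sqlite3 module and flat, hypothetical table names (employees, time_cards, projects) because SQLite has no HumanResources/ProjectDetails schemas; the sample rows are invented. It also selects Project_Name, on the assumption that this is why the third table was joined in the first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the three tables in the question.
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, title TEXT);
    CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);
    CREATE TABLE time_cards (employee_id INTEGER, project_id INTEGER);

    INSERT INTO employees VALUES (1, 'Ann', 'Engineer'), (2, 'Bob', 'Analyst');
    INSERT INTO projects VALUES (10, 'Apollo'), (20, 'Gemini');
    -- Bob's only time card points at project 30, which has no row in projects.
    INSERT INTO time_cards VALUES (1, 10), (1, 20), (2, 30);
""")

# Plain three-table inner join, no subquery needed.
joined = conn.execute("""
    SELECT b.first_name, b.title, d.project_name
    FROM employees b
    INNER JOIN time_cards c ON b.employee_id = c.employee_id
    INNER JOIN projects d ON d.project_id = c.project_id
    ORDER BY b.first_name, d.project_name
""").fetchall()
print(joined)
```

Bob disappears from the result because the second inner join finds no matching project, which is the "nothing would come up" case described above.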

Counting associations from multiple tables

I want to see how many associations each record in a given table has. Some of these associations have conditions attached to them.
So far I have:
-- Count app associations
SELECT
distinct a.name,
COALESCE(v.count, 0) as visitors,
COALESCE(am.count, 0) AS auto_messages,
COALESCE(c.count, 0) AS conversations
FROM apps a
LEFT JOIN (SELECT app_id, count(*) AS count FROM visitors GROUP BY 1) v ON a.id = v.app_id
LEFT JOIN (SELECT app_id, count(*) AS count FROM auto_messages GROUP BY 1) am ON a.id = am.app_id
LEFT JOIN (
SELECT DISTINCT c.id, app_id, count(c) AS count
FROM conversations c LEFT JOIN messages m ON m.conversation_id = c.id
WHERE m.visitor_id IS NOT NULL
GROUP BY c.id) c ON a.id = c.app_id
WHERE a.test = false
ORDER BY visitors DESC;
I run into a problem with the last join, the one for conversations. I want to count the number of conversations that have at least one message where visitor_id is not null. For some reason I get multiple records for each app, i.e. the conversations are not being grouped properly.
Any ideas?
My gut feeling, based on limited understanding of the big picture: in the nested query selecting from conversations,
remove DISTINCT
remove c.id from SELECT list
GROUP BY c.app_id instead of c.id
EDIT: try this
...
LEFT JOIN (
SELECT app_id, count(*) AS count
FROM conversations c1
WHERE
EXISTS (
SELECT *
FROM messages m
WHERE m.conversation_id = c1.id
AND m.visitor_id IS NOT NULL
)
GROUP BY c1.app_id) c
ON a.id = c.app_id
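As a rough check of the EXISTS approach, here is a sketch on an in-memory SQLite database via Python's sqlite3 module. The schemas are cut down to the columns the query touches and the rows are invented; the alias conversation_count is also an assumption, chosen to avoid naming a column count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced schemas.
    CREATE TABLE apps (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE conversations (id INTEGER PRIMARY KEY, app_id INTEGER);
    CREATE TABLE messages (id INTEGER PRIMARY KEY, conversation_id INTEGER, visitor_id INTEGER);

    INSERT INTO apps VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO conversations VALUES (1, 1), (2, 1), (3, 2);
    -- Conversation 1 has two visitor messages, 2 has one, 3 has none.
    INSERT INTO messages VALUES (1, 1, 7), (2, 1, 8), (3, 2, 7), (4, 3, NULL);
""")

# EXISTS keeps each conversation at most once, however many messages match,
# so grouping by app_id yields one row per app.
counts = conn.execute("""
    SELECT a.name, COALESCE(c.conversation_count, 0) AS conversations
    FROM apps a
    LEFT JOIN (
        SELECT c1.app_id, count(*) AS conversation_count
        FROM conversations c1
        WHERE EXISTS (
            SELECT 1
            FROM messages m
            WHERE m.conversation_id = c1.id
              AND m.visitor_id IS NOT NULL
        )
        GROUP BY c1.app_id
    ) c ON a.id = c.app_id
    ORDER BY a.name
""").fetchall()
print(counts)
```

Conversation 1 has two qualifying messages but is counted once, which a plain join to messages followed by count(*) would not guarantee.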

Query returning too many results

This SQL query returns the expected 29 results for a.id = 366:
select a.name, c.name, MAX(B.date), MAX(b.renew_date) as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
where a.id = 366
GROUP BY a.name, c.name
order by MAX(b.renew_date), MAX(b.date) desc;
The SQL below returns 34 results: there are multiple rows wherever two different Providers supplied the same course. I know the extra rows appear because I added e.name to the select list. But all that is needed is the 29 entries, each with the latest date and the Provider's name.
select a.name, c.name, e.name, MAX(B.date), MAX(b.renew_date) as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
inner join boson_provider e on b.provider_id = e.id
where a.id = 366
GROUP BY a.name, c.name, e.name
order by MAX(b.renew_date), MAX(b.date) desc;
Can anyone rework this code to return a single DISTINCT Provider name with the MAX(renew_date) for each course?
This returns exactly one row per distinct combination of (a.name, c.name):
The one with the latest renew_date.
Among these, the one with the latest date (may differ from global max(date)!).
Among these, the one with the alphabetically first e.name:
SELECT DISTINCT ON (a.name, c.name)
a.name AS a_name, c.name AS c_name, e.name AS e_name
, b.renew_date, b.date
FROM boson_course c
JOIN boson_coursedetail b on c.id = b.course_id
JOIN boson_coursedetail_attendance d on d.coursedetail_id = b.id
JOIN boson_employee a on a.id = d.employee_id
JOIN boson_provider e on b.provider_id = e.id
WHERE a.id = 366
ORDER BY a.name, c.name
, b.renew_date DESC NULLS LAST
, b.date DESC NULLS LAST
, e.name;
The result is sorted by a_name, c_name first. If you need your original sort order, wrap this in a subquery:
SELECT *
FROM (<query from above>) sub
ORDER BY renew_date DESC NULLS LAST
, date DESC NULLS LAST
, a_name, c_name, e_name;
Explanation for DISTINCT ON:
Select first row in each GROUP BY group?
Why DESC NULLS LAST?
PostgreSQL sort by datetime asc, null first?
Aside: Don't use basic type names like date as column names. Also, name is hardly ever a good name. As you can see, we have to use aliases to make this query useful. Some general advice on naming conventions:
How to implement a many-to-many relationship in PostgreSQL?
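DISTINCT ON is specific to PostgreSQL. For engines without it, the same "first row per group" selection can be sketched with ROW_NUMBER(). The example below runs on SQLite through Python's sqlite3 module and assumes SQLite 3.25 or later for window-function support. The single flattened table course_runs and its rows are hypothetical and contain no NULL dates, so the NULLS LAST handling is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened result of the joins in the question.
    CREATE TABLE course_runs (course TEXT, provider TEXT, renew_date TEXT, run_date TEXT);
    INSERT INTO course_runs VALUES
        ('First Aid',   'Acme',   '2015-01-01', '2014-01-01'),
        ('First Aid',   'Zenith', '2016-06-01', '2015-06-01'),
        ('Fire Safety', 'Beta',   '2016-03-01', '2015-03-01'),
        ('Fire Safety', 'Acme',   '2016-03-01', '2015-03-01');
""")

# Rank the rows within each course with the same tie-breakers as the
# DISTINCT ON query (latest renew_date, then latest date, then provider
# name), and keep only the top-ranked row.
latest = conn.execute("""
    SELECT course, provider, renew_date
    FROM (
        SELECT course, provider, renew_date,
               ROW_NUMBER() OVER (
                   PARTITION BY course
                   ORDER BY renew_date DESC, run_date DESC, provider
               ) AS rn
        FROM course_runs
    ) AS ranked
    WHERE rn = 1
    ORDER BY course
""").fetchall()
print(latest)
```

For 'Fire Safety' both providers tie on both dates, so the alphabetically first provider wins, mirroring the last ORDER BY item in the DISTINCT ON version.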
Try using DISTINCT ON:
-- one row per (employee, course); the ORDER BY decides which row is kept
select distinct on (a.name, c.name) a.name, c.name, e.name,
b.date, b.renew_date as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
inner join boson_provider e on b.provider_id = e.id
where a.id = 366
order by a.name, c.name, b.renew_date desc, b.date desc;

Complicated SQL query

Does anyone have an idea about how to structure the following query:
Tables :
TBL_GAME id, name
TBL_CATEGORY id, name
LU_GAME_TO_CATEGORY gameid, catid
LU_GAME_TO_EVENT eventid, gameid
So, basically Categories have many games.
Events have many games.
I want to generate a report that lists the Categories by how many of their Games were used in Events, ordered by that count descending.
Is this possible?
SELECT
c.id as catId,
c.name as catName,
g.id as gameId,
g.name as gameName,
sum(ge.INT_QUANTITY) as totalQuantities
FROM
TBL_CATEGORY as c,
TBL_GAME as g,
LU_GAME_TO_CATEGORY as gc,
LU_GAME_TO_EVENT as ge
WHERE
c.id = gc.catId AND
g.id = gc.gameId AND
g.id = ge.gameId
GROUP BY
c.id,
g.id
ORDER BY
totalQuantities desc,
c.name,
g.name
SELECT A.ID, SUM(D.INT_QUANTITY) AS YourSum
FROM TBL_CATEGORY A -- adapt the selected columns as needed
INNER JOIN LU_GAME_TO_CATEGORY B ON A.id = B.catid
INNER JOIN TBL_GAME C ON B.gameid = C.id
INNER JOIN LU_GAME_TO_EVENT D ON C.id = D.gameid
GROUP BY A.ID
ORDER BY YourSum DESC;
I did not adjust the amount here, because that column does not exist in the tables you listed.
You must add an amount column to the target table.
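If no quantity column is available, the report can also be built by counting rows. The sketch below, run on in-memory SQLite through Python's sqlite3 module, is one possible reading of the question: games_used counts the distinct games of a category that appear in any event, and total_uses counts every game-to-event link. Both aliases and all sample rows are assumptions; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TBL_GAME (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE TBL_CATEGORY (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE LU_GAME_TO_CATEGORY (gameid INTEGER, catid INTEGER);
    CREATE TABLE LU_GAME_TO_EVENT (eventid INTEGER, gameid INTEGER);

    -- Invented sample data.
    INSERT INTO TBL_GAME VALUES (1, 'Chess'), (2, 'Go'), (3, 'Poker');
    INSERT INTO TBL_CATEGORY VALUES (1, 'Board'), (2, 'Card');
    INSERT INTO LU_GAME_TO_CATEGORY VALUES (1, 1), (2, 1), (3, 2);
    -- Chess is used in two events, Go in one, Poker in one.
    INSERT INTO LU_GAME_TO_EVENT VALUES (100, 1), (101, 1), (100, 2), (101, 3);
""")

# TBL_GAME is not needed for the counts: the two lookup tables share gameid.
report = conn.execute("""
    SELECT c.name,
           COUNT(DISTINCT ge.gameid) AS games_used,
           COUNT(*) AS total_uses
    FROM TBL_CATEGORY c
    JOIN LU_GAME_TO_CATEGORY gc ON gc.catid = c.id
    JOIN LU_GAME_TO_EVENT ge ON ge.gameid = gc.gameid
    GROUP BY c.id, c.name
    ORDER BY games_used DESC, c.name
""").fetchall()
print(report)
```

Because these are inner joins, a category whose games were never used in an event is left out; a LEFT JOIN from TBL_CATEGORY would list it with a count of zero instead.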

Optimal SQLite Query Remove Duplicates

I have two tables with the following setup:
category: (id, name)
item: (id, name, category_id) - category_id is foreign key to category table
Now I am writing a query to retrieve a subset from the category table of only used categories:
SELECT c.id, c.name
FROM category c
WHERE c.id IN (SELECT DISTINCT category_id FROM item)
The above query works fine. I'm just wondering whether this is the optimal way of writing the query, or whether there's something else I could do, for example with a join.
Transforming the IN (SELECT) to EXISTS (SELECT ... WHERE ) might help:
SELECT c.id, c.name
FROM category c
WHERE EXISTS (SELECT 1 FROM item WHERE item.category_id = c.id)
Another possibility (I expect it to be slower, but it always depends on your db):
SELECT c.id, c.name
FROM category c
INNER JOIN item ON item.category_id = c.id
GROUP BY c.id
Or you could use DISTINCT instead of GROUP BY:
SELECT DISTINCT c.id, c.name
FROM category c
INNER JOIN item ON item.category_id = c.id
And if speed is that important, don't forget to call ANALYZE from time to time:
http://www.sqlite.org/lang_analyze.html
Some other variants for fun:
SELECT c.id, c.name
FROM category c
INNER JOIN (SELECT DISTINCT category_id FROM item) AS i_c ON i_c.category_id = c.id
Another:
SELECT c.id, c.name
FROM category c
EXCEPT
SELECT c.id, c.name
FROM category c
LEFT JOIN item ON item.category_id = c.id
WHERE item.category_id IS NULL
Use Join:
SELECT c.id, c.name
FROM category c
JOIN item i on c.id=i.category_id
GROUP BY c.id, c.name
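Whichever variant turns out fastest on a given database, it is worth confirming that the rewrites return the same rows. The sketch below does that for the IN, EXISTS and JOIN forms on an in-memory SQLite database via Python's sqlite3 module. The sample rows and the index item_category_idx are assumptions added for illustration; the question does not say whether such an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    -- Hypothetical index on the lookup column.
    CREATE INDEX item_category_idx ON item (category_id);

    INSERT INTO category VALUES (1, 'tools'), (2, 'toys'), (3, 'unused');
    INSERT INTO item VALUES (1, 'hammer', 1), (2, 'saw', 1), (3, 'ball', 2);
""")

# The same question asked three ways; ORDER BY makes the results comparable.
queries = {
    "in": """
        SELECT c.id, c.name FROM category c
        WHERE c.id IN (SELECT category_id FROM item)
        ORDER BY c.id
    """,
    "exists": """
        SELECT c.id, c.name FROM category c
        WHERE EXISTS (SELECT 1 FROM item WHERE item.category_id = c.id)
        ORDER BY c.id
    """,
    "join": """
        SELECT DISTINCT c.id, c.name FROM category c
        INNER JOIN item ON item.category_id = c.id
        ORDER BY c.id
    """,
}

results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}
for label, rows in results.items():
    print(label, rows)
```

All three return only the categories that have at least one item. Which one is fastest depends on the data and the SQLite version, so it is best measured on the real tables.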