SQL Count on multiple joins with dynamic WHERE - sql

My issue is that I have a Select statement that has a where clause that is generated on the fly. It is joined across 5 tables.
I basically need a count of each distinct USER ID in table 1 that falls within the scope of the WHERE. This also has to be executable in one statement. Essentially, I can't do a global GROUP BY because of the data I need returned from the other 4 tables.
If I could get a column that had the count that was duplicated where the primary key column is that would be perfect. Right now this is what I'm looking at as my query:
SELECT *
FROM TBL1 1
INNER JOIN TBL2 2 On 2.FK = 1.FK
INNER JOIN TBL3 3 On 3.PK = 2.PK
INNER JOIN TBL4 4 On 4.PK = 3.PK
LEFT OUTER JOIN TBL5 5 ON 4.PK = 5.PK
WHERE 1.Date_Time_In BETWEEN '2010-11-15 12:00:00' AND '2010-11-30 12:00:00'
ORDER BY
4.Column
, 3.Column
, 3.Column2
, 1.Date_Time_In DESC
So instead of selecting all columns, I will be filtering it down to about 5 or 6 but with that I need something like a Total column that is the Distinct count of TBL1's Primary Key that applies the WHERE clause that has a possibility of growing and shrinking in size.
I almost wish there was a way to apply the same WHERE clause to a subselect because I realize that would work but don't know of a way other than creating a variable and just placing it in both places which I can't do either.

If you are using SQL Server 2005 or higher, you could use an aggregate function with an OVER clause (a window aggregate). Note that this counts the rows returned for each UserID:
SELECT *
, COUNT(UserID) OVER(PARTITION BY UserID) AS 'Total'
FROM TBL1 1
INNER JOIN TBL2 2 On 2.FK = 1.FK
INNER JOIN TBL3 3 On 3.PK = 2.PK
INNER JOIN TBL4 4 On 4.PK = 3.PK
LEFT OUTER JOIN TBL5 5 ON 4.PK = 5.PK
WHERE 1.Date_Time_In BETWEEN '2010-11-15 12:00:00' AND '2010-11-30 12:00:00'
ORDER BY
4.Column, 3.Column, 3.Column2, 1.Date_Time_In DESC

Something like adding a derived table that computes the count under the same filter:
cross join (select count(distinct user_id) as Total from tbl1 WHERE Date_Time_In BETWEEN '2010-11-15 12:00:00' AND '2010-11-30 12:00:00') as tbl1too
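The wish in the question, applying the same WHERE clause to a subselect without writing it twice, can be sketched with a common table expression. This is only an illustration under assumptions: the two-table schema, the alias names (t1, t2, filtered, total_users) and the sample rows are made up, and SQLite is used through Python so the statement can be run. SQL Server 2005 and later also accept WITH.

```python
import sqlite3

# Hypothetical, simplified schema: only two of the five tables are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (pk INTEGER PRIMARY KEY, user_id INTEGER, fk INTEGER, date_time_in TEXT);
    CREATE TABLE tbl2 (pk INTEGER PRIMARY KEY, fk INTEGER, label TEXT);
    INSERT INTO tbl1 VALUES
        (1, 10, 100, '2010-11-16 09:00:00'),
        (2, 10, 100, '2010-11-20 09:00:00'),
        (3, 20, 200, '2010-11-21 09:00:00'),
        (4, 30, 200, '2010-12-05 09:00:00');
    INSERT INTO tbl2 VALUES (1, 100, 'a'), (2, 200, 'b');
""")

# The dynamic WHERE is written once, inside the CTE. Both the detail rows
# and the distinct count read from the same filtered set.
query = """
WITH filtered AS (
    SELECT t1.pk, t1.user_id, t1.date_time_in, t2.label
    FROM tbl1 AS t1
    INNER JOIN tbl2 AS t2 ON t2.fk = t1.fk
    WHERE t1.date_time_in BETWEEN ? AND ?
)
SELECT f.*,
       (SELECT COUNT(DISTINCT user_id) FROM filtered) AS total_users
FROM filtered AS f
ORDER BY f.date_time_in DESC
"""
rows = conn.execute(query, ("2010-11-15 12:00:00", "2010-11-30 12:00:00")).fetchall()
for row in rows:
    print(row)
```

Every returned row carries the same total_users value, which is the duplicated count column the question asks for.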

Related

Union 2 table with diff schema

Following is the requirement and given table schema
Write a query in Canvas using Table 1 ,Table 2 & Table 3 only for books available in both the tables with below conditions
It has to be non-fiction and have rating above 4.2 .
Introduce a column using price to show grouping.
Union Table 3.
Table Schema
I was able to solve the first point. However, I am not able to wrap my head around points 2 and 3. Do you think it's an incorrect question?
My Query:
Select table2.book, table1.genre, table2.ratings, table2.reviews, table2.type, table2.price
from table1 inner join table2 on table1.book_name = table2.book
where table1.genre = 'non-fiction'
and table2.ratings > 4.2
It is not possible to UNION two tables with a different number of columns.
UNION is a set operator that combines data vertically across the tables, so both sides must have the same number of columns, with compatible data types, in the same order.
The trick here is to use NULL for the columns that one side is missing if you want to use UNION. That way you can match the number of columns.
(select table2.book, table2.author, NULL as noOfCopies, table1.genre, table2.ratings, table2.reviews, table2.type, sum(table2.price) as totalprice
from table1 inner join table2 on table1.book_name = table2.book
where table1.genre = 'non-fiction' and table2.ratings > 4.2
group by table2.book, table2.author, table1.genre, table2.ratings, table2.reviews, table2.type)
UNION
select NULL as book, author_name, noOfCopies, NULL as genre, NULL as ratings, NULL as reviews, NULL as type, NULL as totalprice from table3;
To bring in table3, I used an inner join that links table2 and table3 on the author name:
Select table2.book, table1.genre, table2.ratings, table2.reviews, table2.type, sum(table2.price), sum(noofcopies)
from table1
inner join table2 on table1.book_name = table2.book
inner join table3 on table2.author=table3.author_name
where table1.genre = 'non-fiction' and table2.ratings > 4.2
group by table2.book, table1.genre, table2.ratings, table2.reviews,table2.type
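As a rough sketch of points 2 and 3 together, the query below adds a price grouping column with CASE and pads the missing columns with NULL so the two SELECTs line up for the union. The reduced column set, the price threshold of 10, the 'low'/'high' labels and the sample rows are all assumptions made for illustration. SQLite is used through Python so it can be run.

```python
import sqlite3

# Hypothetical cut-down versions of the three tables with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (book_name TEXT, genre TEXT);
    CREATE TABLE table2 (book TEXT, author TEXT, ratings REAL, price REAL);
    CREATE TABLE table3 (author_name TEXT, noOfCopies INTEGER);
    INSERT INTO table1 VALUES ('Book A', 'non-fiction'), ('Book B', 'fiction'), ('Book C', 'non-fiction');
    INSERT INTO table2 VALUES ('Book A', 'Author X', 4.5, 15.0),
                              ('Book B', 'Author Y', 4.6, 9.0),
                              ('Book C', 'Author Z', 4.0, 8.0);
    INSERT INTO table3 VALUES ('Author X', 3);
""")

# Both SELECTs expose four columns in the same order; NULL fills the gaps.
query = """
SELECT t2.book, t2.author, NULL AS noOfCopies,
       CASE WHEN t2.price < 10 THEN 'low' ELSE 'high' END AS price_group
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.book_name = t2.book
WHERE t1.genre = 'non-fiction' AND t2.ratings > 4.2
UNION ALL
SELECT NULL, author_name, noOfCopies, NULL
FROM table3
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

UNION ALL keeps every row from both sides. Plain UNION would additionally remove duplicate rows, which makes no difference for this sample data.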

Remove duplicates from result in sql

I have the following SQL in a Java project:
select distinct * from drivers inner join licenses on drivers.user_id=licenses.issuer_id
inner join users on drivers.user_id=users.id
where (licenses.state='ISSUED' or drivers.status='WAITING')
and users.is_deleted=false
The result in the database looks like this:
I would like to get only one row instead of two duplicated rows.
How can I do that?
Solution 1 - That's because some of the data has duplicate values. Use the DISTINCT keyword once, followed by only the columns you want, like this:
Select distinct id, creation_date, modification_date from
YourTable
Solution 2 - Apply DISTINCT only to the ID, and once you have the IDs, fetch all the data with an IN query:
select * from yourtable where id in (select distinct id from drivers inner join
licenses
on drivers.user_id=licenses.issuer_id
inner join users on drivers.user_id=users.id
where (licenses.state='ISSUED' or drivers.status='WAITING')
and users.is_deleted=false )
Enumerate the field names in the SELECT, using COALESCE for fields whose value is null.
Usually you don't query DISTINCT with * (all columns), because rows that share a value in one column but differ in any other column are still treated as different rows. So apply DISTINCT only to the columns you care about, then fetch the rest of the data.
I suspect that you want left joins like this:
select *
from users u left join
drivers d
on d.user_id = u.id and d.status = 'WAITING' left join
licenses l
on d.user_id = l.issuer_id and l.state = 'ISSUED'
where u.is_deleted = false and
(d.user_id is not null or l.issuer_id is not null);

SQL Table Joining

I'm joining these three tables, but the same information gets displayed 3 times ... Any idea how to have only the unique rows to be displayed, as determined by unique shipment id's?
SELECT S.SHIPMENT_ID, S.CREATION_DATE, S.BUSINESS_ID, B.BUS_ID, S.SHIPMENT_STATUS, S.BUSINESS_NAME, S.SHIPMENT_MODES, S.CUSTOMER_NAME
FROM "SHIPMENT" S
INNER JOIN "BUSINESS" B ON S.BUSINESS_ID=B.BUS_ID
INNER JOIN "SHIPMENT_GROUP" SG ON S.SHIPMENT_ID=SG.SHIPMENT_ID
INNER JOIN "DATA_GROUP" DG ON DG.ID=SG.GROUP_ID
Try SELECT DISTINCT:
SELECT DISTINCT column1, column2, ...
FROM table_name;
w3schools
You are selecting rows from the first table only, so this suggests that you are using the joins for filtering.
If so, you can rewrite this with exists, which will avoid duplicates if there are multiple matches. Starting from your existing query, the logic would be:
select s.*
from shipment s
where
exists (
select 1
from business b
where b.bus_id = s.business_id
) and exists (
select 1
from shipment_group sg
inner join data_group dg on dg.id = sg.group_id
where sg.shipment_id = s.shipment_id
)
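A small sketch of why the EXISTS form avoids the repeated rows: a join emits one output row per match, while EXISTS only tests whether at least one match exists. The cut-down tables and sample rows here are assumptions for illustration, run in SQLite through Python.

```python
import sqlite3

# Hypothetical data: shipment 1 belongs to three groups, shipment 2 to one,
# shipment 3 to none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipment (shipment_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE shipment_group (shipment_id INTEGER, group_id INTEGER);
    INSERT INTO shipment VALUES (1, 'c1'), (2, 'c2'), (3, 'c3');
    INSERT INTO shipment_group VALUES (1, 10), (1, 11), (1, 12), (2, 10);
""")

# Inner join: shipment 1 is repeated once per matching group row.
joined = conn.execute("""
    SELECT s.shipment_id
    FROM shipment AS s
    INNER JOIN shipment_group AS sg ON sg.shipment_id = s.shipment_id
    ORDER BY s.shipment_id
""").fetchall()

# EXISTS: each shipment appears at most once, however many groups match.
filtered = conn.execute("""
    SELECT s.shipment_id
    FROM shipment AS s
    WHERE EXISTS (SELECT 1 FROM shipment_group AS sg
                  WHERE sg.shipment_id = s.shipment_id)
    ORDER BY s.shipment_id
""").fetchall()

print(joined)
print(filtered)
```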

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the group by on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10
I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id
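The inflated sums and the pre-aggregation fix can be reproduced on a tiny data set. The sketch below uses the A, B, C naming from the question with made-up rows; the derived-table aliases bs and cs are hypothetical. It runs in SQLite through Python.

```python
import sqlite3

# One row in a, two cost rows in b, three click rows in c.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, keyword_id INTEGER);
    CREATE TABLE b (a_id INTEGER, cost INTEGER);
    CREATE TABLE c (keyword_id INTEGER, clicks INTEGER);
    INSERT INTO a VALUES (1, 7);
    INSERT INTO b VALUES (1, 5), (1, 6);
    INSERT INTO c VALUES (7, 2), (7, 3), (7, 4);
""")

# Joining both child tables first yields 2 x 3 = 6 rows for a.id = 1,
# so each cost is counted three times and each click twice.
naive = conn.execute("""
    SELECT a.id, SUM(b.cost), SUM(c.clicks)
    FROM a
    JOIN b ON b.a_id = a.id
    LEFT JOIN c ON c.keyword_id = a.keyword_id
    GROUP BY a.id
""").fetchall()

# Aggregating each child table to one row per key before joining keeps
# the sums independent of each other.
pre_aggregated = conn.execute("""
    SELECT a.id, bs.cost, cs.clicks
    FROM a
    JOIN (SELECT a_id, SUM(cost) AS cost FROM b GROUP BY a_id) AS bs
        ON bs.a_id = a.id
    LEFT JOIN (SELECT keyword_id, SUM(clicks) AS clicks FROM c GROUP BY keyword_id) AS cs
        ON cs.keyword_id = a.keyword_id
    WHERE bs.cost > 10
""").fetchall()

print(naive)
print(pre_aggregated)
```

Because each derived table already has one row per key, the filter on the summed cost can be an ordinary WHERE condition rather than a HAVING clause.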

Join and compare 2 queries of 2 tables

This is probably quite a trivial question for many here, but I am not used to writing subqueries and joins, so I hope someone is willing to help.
I have two tables: new_road and old_roads.
These two queries sum up the length of the roads belonging to a specific road number.
SELECT new_road.nummer, SUM(new_road.length) FROM road_table.road GROUP BY new_road.nummer
SELECT old_road.nummer, SUM(ST_length(old_road.geom)) FROM old_road_table.old_road GROUP BY old_road.nummer
I wish to have a result table where these two queries are joined so I can compare the new and old summed length for each road number.
Like
old.nummer  old.length  new.nummer  new.length
2345        10.3        2345        10.5
2346        578.2       2346        600
2347        54.2        NULL        NULL
NULL        NULL        2546        32.2
I think some version of an outer join is needed, because there will be road numbers in the old_road table that do not exist in the new_road table, and I would like to see them too.
Appreciate any advice
Edit:
After the advice below, I came up with this:
SELECT * FROM
(SELECT new_road.nummer, SUM(new_road.length) FROM road_table.road GROUP BY new_road.nummer) new_table
FULL OUTER JOIN
(SELECT old_road.nummer, SUM(ST_length(old_road.geom)) FROM old_road_table.old_road GROUP BY old_road.nummer) old_table
ON new_road.nummer = old_road.nummer
But each time I run it I get "missing FROM-clause entry". When I run each subquery individually they work. I have cross-checked with the documentation and it looks OK to me, but clearly I am missing something here.
Consider using a FULL OUTER JOIN
This is not the exact output you requested but you don't need to display the nummer twice.
SELECT
COALESCE(new_road.nummer, old_road.nummer) AS nummer,
new_road.length,
old_road.length
FROM (
SELECT new_road.nummer
,SUM(new_road.length) length
FROM road_table.road
GROUP BY new_road.nummer
) new_road
FULL OUTER JOIN (
SELECT old_road.nummer
,SUM(ST_length(old_road.geom))length
FROM old_road_table.old_road
GROUP BY old_road.nummer
) old_road ON
old_road.nummer = new_road.nummer
The following query should serve the purpose. I didn't run it, but the basic idea is that the result of a query on a table is another table, which you can query again.
Select *
FROM (SELECT new_road.nummer, SUM(new_road.length) FROM road_table.road GROUP BY new_road.nummer) table1
JOIN (SELECT old_road.nummer, SUM(ST_length(old_road.geom)) FROM old_road_table.old_road GROUP BY old_road.nummer) table2
ON table1.nummer = table2.nummer
The tricky bit here is that you want to make sure you include all of the keys from both lists. My favorite way to do this kind of thing is:
select * from (
SELECT distinct new_road.nummer as nummer from road_table.road
union
SELECT distinct old_road.nummer as nummer FROM old_road_table.old_road
) allkeys
left join
(
SELECT new_road.nummer as nummer, SUM(new_road.length) as nlen
FROM road_table.road GROUP BY new_road.nummer
) n
on allkeys.nummer = n.nummer
left join
(
SELECT old_road.nummer as nummer, SUM(ST_length(old_road.geom)) as olen
FROM old_road_table.old_road GROUP BY old_road.nummer
) o
on allkeys.nummer = o.nummer
The first subquery builds a list of all keys, then you join to both of your queries from there. There's nothing wrong with an outer join, but I find this easier to manage if you have to include 3 or more tables. If you had to include another table it would just be one more union in allkeys and one more left join to that table.
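A runnable sketch of the all-keys pattern described above, under assumptions: SQLite stands in for Postgres, a plain numeric column seg_length replaces ST_length(geom) because PostGIS is not available there, and the table layout and sample rows are made up.

```python
import sqlite3

# Hypothetical data: 2345 exists in both tables, 2347 only in the old one,
# 2546 only in the new one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE new_road (nummer INTEGER, seg_length REAL);
    CREATE TABLE old_road (nummer INTEGER, seg_length REAL);
    INSERT INTO new_road VALUES (2345, 10.5), (2546, 32.0);
    INSERT INTO old_road VALUES (2345, 4.0), (2345, 6.0), (2347, 54.0);
""")

# allkeys holds every road number from either table; each summed subquery
# is then left-joined to it, so no key is lost.
query = """
SELECT allkeys.nummer, n.nlen, o.olen
FROM (
    SELECT nummer FROM new_road
    UNION
    SELECT nummer FROM old_road
) AS allkeys
LEFT JOIN (
    SELECT nummer, SUM(seg_length) AS nlen FROM new_road GROUP BY nummer
) AS n ON n.nummer = allkeys.nummer
LEFT JOIN (
    SELECT nummer, SUM(seg_length) AS olen FROM old_road GROUP BY nummer
) AS o ON o.nummer = allkeys.nummer
ORDER BY allkeys.nummer
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

UNION already removes duplicate keys, so DISTINCT inside the two key subqueries is optional.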