Rewrite SQL and use of GROUP BY

I have written the SQL below for one of my requirements, and it returns the results I need. However, I am wondering whether there is a better way to write this query without using the derived table aliased as A.
SELECT A.*,B.OPRDEFNDESC FROM
( select OPRID_ENTERED_BY ,COUNT(*)
from ps_req_hdr
where entered_dt > '01-JUL-2012'
GROUP BY OPRID_ENTERED_BY
ORDER BY COUNT(*) DESC) A, PSOPRDEFN B
WHERE A.OPRID_ENTERED_BY=B.OPRID

You may be able to use a simple INNER JOIN to do the same thing:
SELECT A.OPRID_ENTERED_BY, COUNT(*), B.OPRDEFNDESC
FROM ps_req_hdr A
JOIN PSOPRDEFN B ON A.OPRID_ENTERED_BY = B.OPRID
WHERE A.entered_dt > '01-JUL-2012'
GROUP BY A.OPRID_ENTERED_BY, B.OPRDEFNDESC
ORDER BY COUNT(*) DESC
NOTE
As pointed out in the comments, the COUNT(*) in this query will NOT include rows that have no matching row in table B, and it will be inflated when a row matches more than one row in table B. In other words, if B.OPRID is not unique, or if A.OPRID_ENTERED_BY is not a foreign key referencing B.OPRID, this answer may not return the same results as the original query.
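To make the note concrete, here is a small runnable sketch using Python's built-in sqlite3 module. The rows are made up so that OPRID 'ALICE' appears twice in PSOPRDEFN (that is, B.OPRID is not unique), and it compares the original aggregate-then-join shape with the join-then-aggregate rewrite. The data and the ISO-style date strings (SQLite has no Oracle date literal) are assumptions for illustration only.

```python
import sqlite3

# Hypothetical miniature versions of the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ps_req_hdr (oprid_entered_by TEXT, entered_dt TEXT);
CREATE TABLE psoprdefn (oprid TEXT, oprdefndesc TEXT);
INSERT INTO ps_req_hdr VALUES
  ('ALICE', '2012-08-01'), ('ALICE', '2012-09-15'), ('BOB', '2012-07-20');
-- 'ALICE' appears twice here, so psoprdefn.oprid is NOT unique.
INSERT INTO psoprdefn VALUES
  ('ALICE', 'Alice Smith'), ('ALICE', 'Alice Smith'), ('BOB', 'Bob Jones');
""")

# Aggregate first, then join (shape of the original derived-table query).
aggregate_then_join = conn.execute("""
SELECT a.oprid_entered_by, a.req_count, b.oprdefndesc
FROM (SELECT oprid_entered_by, COUNT(*) AS req_count
      FROM ps_req_hdr
      WHERE entered_dt > '2012-07-01'
      GROUP BY oprid_entered_by) a
JOIN psoprdefn b ON a.oprid_entered_by = b.oprid
ORDER BY a.oprid_entered_by
""").fetchall()

# Join first, then aggregate (shape of the INNER JOIN rewrite).
join_then_aggregate = conn.execute("""
SELECT a.oprid_entered_by, COUNT(*) AS req_count, b.oprdefndesc
FROM ps_req_hdr a
JOIN psoprdefn b ON a.oprid_entered_by = b.oprid
WHERE a.entered_dt > '2012-07-01'
GROUP BY a.oprid_entered_by, b.oprdefndesc
ORDER BY a.oprid_entered_by
""").fetchall()

print(aggregate_then_join)
print(join_then_aggregate)
```

With the duplicated OPRID, the original shape returns two rows for ALICE, each with the correct count of 2, while the rewrite returns a single ALICE row with a count of 4 (2 requisitions × 2 matching PSOPRDEFN rows).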

Related

SQL for a query with several input IDs, how to get the first 5 results for each ID

I have a query that accepts several IDs as filters in a WHERE clause.
It's formatted something like this:
SELECT a.ID, a.VOLUMETRY, b.ANNOY_DISTANCE
FROM PRODUCT a
JOIN RECOMMENDATIONS b on a.ID = b.ID
WHERE a.ID in ('0001','0002', ...., '0099')
ORDER BY b.ANNOY_DISTANCE
Now this query can return several thousand results for each ID, but I only need the first 5 for each ID after ordering them by the ANNOY_DISTANCE column. The rest aren't needed and would only slow post-processing of the data.
How can I change this so that the query result only gives the first 5 rows for each ID?
Use window functions, which you can filter using a QUALIFY clause (available in databases such as Snowflake, BigQuery, and Teradata):
SELECT p.ID, p.VOLUMETRY, r.ANNOY_DISTANCE
FROM PRODUCT p JOIN
RECOMMENDATIONS r
ON p.ID = r.ID
WHERE p.ID in ('0001','0002', ...., '0099')
QUALIFY ROW_NUMBER() OVER (PARTITION BY p.ID ORDER BY r.ANNOY_DISTANCE) <= 5
ORDER BY r.ANNOY_DISTANCE;
Notice that I changed your table aliases to be meaningful abbreviations for the table names. That is a best practice.
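QUALIFY is not supported everywhere (SQLite, for instance, has no QUALIFY clause). A portable equivalent computes ROW_NUMBER() in a subquery and filters on it in the outer query. The sketch below shows this with Python's sqlite3 module (window functions need SQLite 3.25 or later); the table contents and the parameter N are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id TEXT, volumetry INTEGER);
CREATE TABLE recommendations (id TEXT, annoy_distance REAL);
""")
# Hypothetical data: two products, eight candidate rows each (distances 0.8 down to 0.1).
conn.executemany("INSERT INTO product VALUES (?, ?)", [("0001", 10), ("0002", 20)])
conn.executemany(
    "INSERT INTO recommendations VALUES (?, ?)",
    [(pid, d / 10) for pid in ("0001", "0002") for d in range(8, 0, -1)],
)

N = 5  # rows to keep per ID

# Rank rows inside each ID by distance, then keep ranks 1..N in the outer query.
rows = conn.execute("""
SELECT id, volumetry, annoy_distance
FROM (
  SELECT p.id, p.volumetry, r.annoy_distance,
         ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY r.annoy_distance) AS rn
  FROM product p
  JOIN recommendations r ON p.id = r.id
  WHERE p.id IN ('0001', '0002')
) ranked
WHERE rn <= ?
ORDER BY id, annoy_distance
""", (N,)).fetchall()

for row in rows:
    print(row)
```

ROW_NUMBER() numbers tied distances arbitrarily; if ties should all be kept, RANK() is the usual alternative.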

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B on B.a_id = A.id
left join C on C.keyword_id = A.keyword_id
group by A.id
having sum(B.cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the summed fields are larger than the totals of the individual tables? I expect the sums to match, with each row of tables B and C counted only once. I also tried adding DISTINCT, but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the GROUP BY on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
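The Cartesian-product effect can be reproduced on a tiny hypothetical schema shaped like the question's tables A, B and C (the column names a_id and keyword_id are borrowed from the queries in this thread; the rows are invented). With two rows in B and three rows in C for the same A row, the two joins produce six rows before grouping, so every B.cost is counted three times and every C.clicks twice. The correlated subqueries sum each child table on its own. This is a sketch run through Python's sqlite3 module, not Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one row in a, two matching rows in b, three in c.
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, keyword_id INTEGER);
CREATE TABLE b (a_id INTEGER, cost INTEGER);
CREATE TABLE c (keyword_id INTEGER, clicks INTEGER);
INSERT INTO a VALUES (1, 100);
INSERT INTO b VALUES (1, 5), (1, 7);                  -- true total cost   = 12
INSERT INTO c VALUES (100, 3), (100, 4), (100, 10);   -- true total clicks = 17
""")

# Both joins fan out: 2 rows of b x 3 rows of c = 6 rows before grouping.
naive = conn.execute("""
SELECT a.id, SUM(b.cost), SUM(c.clicks)
FROM a
JOIN b ON b.a_id = a.id
LEFT JOIN c ON c.keyword_id = a.keyword_id
GROUP BY a.id
""").fetchone()

# Correlated subqueries: each child table is summed independently per row of a.
correlated = conn.execute("""
SELECT a.id,
       (SELECT SUM(b.cost)   FROM b WHERE b.a_id = a.id)             AS cost,
       (SELECT SUM(c.clicks) FROM c WHERE c.keyword_id = a.keyword_id) AS clicks
FROM a
""").fetchone()

print(naive)       # inflated sums
print(correlated)  # sums of each table counted once
```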
I found this (MySQL joining tables group by sum issue) and wrote a query like this:
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
where B.cost > 10
I'm not sure how efficient it is, or whether it is more or less efficient than Gordon's. I ran both queries and this one seemed faster: 27 s vs. 2 min 35 s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
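The derived-table approach avoids the fan-out because each subquery returns at most one row per join key, so each row of A meets at most one row from each side. That also means no outer GROUP BY A.id is needed, and a HAVING sum(cost) > 10 condition becomes a plain WHERE on the pre-aggregated cost. Here is a minimal sketch using Python's sqlite3 module with invented data (table and column names follow the query above); it illustrates correctness, not performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: a.id 3 has no matching rows in c.
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, keyword_id INTEGER);
CREATE TABLE b (a_id INTEGER, cost INTEGER);
CREATE TABLE c (keyword_id INTEGER, clicks INTEGER);
INSERT INTO a VALUES (1, 100), (2, 200), (3, 300);
INSERT INTO b VALUES (1, 5), (1, 7), (2, 4), (3, 20);
INSERT INTO c VALUES (100, 3), (100, 4), (200, 9);
""")

# Pre-aggregate each child table to one row per key, then join.
# No fan-out is possible, so the filter is a WHERE on the aggregated column.
rows = conn.execute("""
SELECT a.id, bs.cost, cs.clicks
FROM a
JOIN (SELECT a_id, SUM(cost) AS cost FROM b GROUP BY a_id) bs
  ON bs.a_id = a.id
LEFT JOIN (SELECT keyword_id, SUM(clicks) AS clicks FROM c GROUP BY keyword_id) cs
  ON cs.keyword_id = a.keyword_id
WHERE bs.cost > 10
ORDER BY a.id
""").fetchall()

print(rows)  # a.id 2 is filtered out (cost 4); a.id 3 has no clicks (NULL -> None)
```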
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

How to find the most frequent value in a select statement as a subquery?

I am trying to get the most frequent Zip_Code for each Location_ID from table B. Table A (Transaction) has one A.Zip_Code per transaction, but table B (Location) has multiple Zip_Code values for one area or city. I am trying to get the most frequent B.Zip_Code for the account using Location_ID, which is present in both tables. I have simplified my code and renamed the columns for easier understanding, but this is the logic of my query so far. Any help would be appreciated. Thanks in advance.
Select
A.Account_Number,
A.Utility_Type,
A.Sum(usage),
A.Sum(Cost),
A.Zip_Code,
( select B.zip_Code from B where A.Location_ID= B.Location_ID having count(*)= max(count(B.Zip_Code)) as Location_Zip_Code,
A.Transaction_Date
From
Transaction_Table as A Left Join
Location Table as B On A.Location_ID= B.Location_ID
Group By
A.Account_Number,
A.Utility_Type,
A.Zip_Code,
A.Transaction_Date
This is what I came up with:
Select tt.Account_Number, tt.Utility_Type, Sum(tt.usage), Sum(tt.Cost),
tt.Zip_Code,
(select TOP 1 l.zip_Code
from Location_Table l
where tt.Location_ID = l.Location_ID
group by l.zip_code
order by count(*) desc
) as Location_Zip_Code,
tt.Transaction_Date
From Transaction_Table tt
Group By tt.Account_Number, tt.Utility_Type, tt.Zip_Code, tt.Transaction_Date, tt.Location_ID;
Notes:
Table aliases are a good thing. However, they should be abbreviations for the tables referenced, rather than arbitrary letters.
The table alias qualifies the column name, not the function. Hence sum(tt.usage) rather than tt.sum(usage).
There is no need for a join in the outer query. You are doing all the work in the subquery.
An order by with top seems the way to go to get the most common zip code (which, incidentally, is called the mode in statistics).
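The same mode-per-location idea can be tried in SQLite, which has no TOP; LIMIT 1 plays that role. The sketch below also adds l.zip_code as a secondary sort key so that ties between equally frequent zip codes are broken deterministically (with TOP 1 alone, a tie can return any of the tied values). The table names, columns, and rows are simplified hypothetical versions of the question's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: location 1 has zip 10002 twice and 10001 once.
conn.executescript("""
CREATE TABLE location_table (location_id INTEGER, zip_code TEXT);
INSERT INTO location_table VALUES
  (1, '10001'), (1, '10002'), (1, '10002'),
  (2, '20001');
CREATE TABLE transaction_table (account_number TEXT, location_id INTEGER, cost REAL);
INSERT INTO transaction_table VALUES
  ('ACC1', 1, 10.0), ('ACC1', 1, 5.0), ('ACC2', 2, 7.5);
""")

rows = conn.execute("""
SELECT tt.account_number,
       SUM(tt.cost) AS total_cost,
       (SELECT l.zip_code
        FROM location_table l
        WHERE l.location_id = tt.location_id
        GROUP BY l.zip_code
        ORDER BY COUNT(*) DESC, l.zip_code  -- most frequent first; tie-break by zip
        LIMIT 1) AS location_zip_code
FROM transaction_table tt
GROUP BY tt.account_number, tt.location_id
ORDER BY tt.account_number
""").fetchall()

print(rows)
```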

SQL Query - Return rows from table A matching any row in table B

I am trying to write a query that uses a list of unique LOCATIONs obtained from a first query as the criteria for selecting rows from a second table.
For example:
SELECT
TABLE_A."LOCATION",
MIN(TABLE_A.WORKDATE) AS MIN_WORK_DATE
FROM
DB.TABLE_A
WHERE
MIN_WORK_DATE > '201201'
Then somehow:
SELECT
TABLE_B."LOCATION",
(other fields of interest)
FROM
DB.TABLE_B
WHERE
TABLE_B."LOCATION" (is contained in the result above)
Thanks in advance for any help!
You can do it with a join:
SELECT
b.location,
(other fields of interest)
FROM
tableB b
JOIN
(SELECT a.location, min(a.workdate) as min_workdate
FROM tableA a
GROUP BY a.location) c
ON b.location = c.location
WHERE c.min_workdate > '201201'
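An alternative to the join is a semi-join with IN. The aggregate condition has to go in HAVING after a GROUP BY, because WHERE is evaluated before aggregation and cannot reference MIN(...) or its alias. IN also never repeats rows of the second table, however many first-table rows match. This is a sketch using Python's sqlite3 module; the table names, the detail column, and the YYYYMM text dates are assumptions for illustration (fixed-width YYYYMM strings compare correctly as text).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: LA's earliest workdate is before 201201, CHI is not in table_a.
conn.executescript("""
CREATE TABLE table_a (location TEXT, workdate TEXT);
CREATE TABLE table_b (location TEXT, detail TEXT);
INSERT INTO table_a VALUES
  ('NYC', '201203'), ('NYC', '201205'),
  ('LA',  '201112'), ('LA',  '201206'),
  ('SF',  '201301');
INSERT INTO table_b VALUES
  ('NYC', 'b1'), ('NYC', 'b2'), ('LA', 'b3'), ('SF', 'b4'), ('CHI', 'b5');
""")

# Keep table_b rows whose location's earliest workdate is after 201201.
rows = conn.execute("""
SELECT b.location, b.detail
FROM table_b b
WHERE b.location IN (
  SELECT a.location
  FROM table_a a
  GROUP BY a.location
  HAVING MIN(a.workdate) > '201201'
)
ORDER BY b.detail
""").fetchall()

print(rows)
```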
You can match a column in one table against a column in another table.
For example:
SELECT
TABLE_A."LOCATION",
TABLE_A.WORKDATE AS WORK_DATE
FROM
DB.TABLE_A, DB.TABLE_B
WHERE
TABLE_A.SOME_COLUMN > 'some_value_given' and TABLE_A.SOME_COLUMN = TABLE_B.SOME_COLUMN

Self Join bringing too many records

I have this query to express a set of business rules.
To get the information I need, I tried joining the table to itself, but that returns many more records than are actually in the table. Below is the query I've tried. What am I doing wrong?
SELECT DISTINCT a.rep_id, a.rep_name, count(*) AS 'Single Practitioner'
FROM [SE_Violation_Detection] a inner join [SE_Violation_Detection] b
ON a.rep_id = b.rep_id and a.hcp_cid = b.hcp_cid
group by a.rep_id, a.rep_name
having count(*) >= 2
You can accomplish this with a HAVING clause instead of a self-join:
select a, b, count(*) c
from etc
group by a, b
having count(*) >= some number
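The self-join inflates the count because each row is paired with every row that shares its (rep_id, hcp_cid), including itself, so a group of n rows becomes n × n rows. The sketch below (Python's sqlite3 module, invented rows) shows this: rep 1 has three rows for one HCP and gets a count of 9, and rep 2, who never sees the same HCP twice, still reaches 2. Grouping by hcp_cid directly with HAVING gives the per-HCP count without a self-join; treating "same rep, same HCP at least twice" as the intended rule is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: rep 1 sees HCP 10 three times; rep 2 sees two different HCPs once each.
conn.executescript("""
CREATE TABLE se_violation_detection (rep_id INTEGER, rep_name TEXT, hcp_cid INTEGER);
INSERT INTO se_violation_detection VALUES
  (1, 'Ann', 10), (1, 'Ann', 10), (1, 'Ann', 10),
  (2, 'Ben', 20), (2, 'Ben', 21);
""")

# Self-join: every row matches every row with the same key, itself included (n * n).
self_join = conn.execute("""
SELECT a.rep_id, a.rep_name, COUNT(*)
FROM se_violation_detection a
JOIN se_violation_detection b
  ON a.rep_id = b.rep_id AND a.hcp_cid = b.hcp_cid
GROUP BY a.rep_id, a.rep_name
ORDER BY a.rep_id
""").fetchall()

# Assumed rule: the same rep with the same HCP at least twice.
grouped = conn.execute("""
SELECT rep_id, rep_name, hcp_cid, COUNT(*)
FROM se_violation_detection
GROUP BY rep_id, rep_name, hcp_cid
HAVING COUNT(*) >= 2
ORDER BY rep_id
""").fetchall()

print(self_join)
print(grouped)
```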
I figured out a simpler way to get the information I need for one of the queries. The one above is still wrong.
-- Reps with violations for 5 or more different HCPs (count > 4)
select rep_id, rep_name, count(distinct hcp_cid)
AS 'Multiple Practitioners'
from dbo.SE_Violation_Detection
group by rep_id, rep_name
having count(distinct hcp_cid) > 4
order by count(distinct hcp_cid)