How to select the first row from group by date [duplicate] - sql

This question already has answers here:
Select first row in each GROUP BY group?
(20 answers)
Closed 8 years ago.
I am writing a program for amateur radio. Some callsigns will appear more than once in the data but the qsodate will be different. I only want the first occurrence of a call sign after a given date.
The query
select distinct
a.callsign,
a.SKCC_Number,
a.qsodate,
b.name,
a.SPC,
a.Band
from qso a, skccdata b
where SKCC_Number like '%[CTS]%'
AND QSODate > = '2014-08-01'
and b.callsign = a.callsign
order by a.QSODate
The problem:
Because contacts occur on different dates, I get all of the contacts - I have tried adding min(a.qsodate) to get only the first but then I run into all sorts of issues regarding grouping.
This query will be in a stored procedure, so creating temp tables or cursors will not be a problem.

You can use the ROW_NUMBER() to get the first row with the first date, like this:
WITH CTE
AS
(
select
a.callsign,
a.SKCC_Number,
a.qsodate,
b.name,
a.SPC,
a.Band,
ROW_NUMBER() OVER(PARTITION BY a.callsign ORDER BY a.QSODate) AS RN
from qso a,skccdata b
where SKCC_Number like '%[CTS]%'
AND QSODate > = '2014-08-01'
and b.callsign = a.callsign
)
SELECT *
FROM CTE
WHERE RN = 1;
ROW_NUMBER() OVER(PARTITION BY a.callsign ORDER BY a.QSODate) will give you a ranking number for each group of callsign ordered by QSODate, then the WHERE RN = 1 will eliminate all the rows except the first one which has the minimum QSODate.

Have you tried starting your query with SELECT TOP 1 ...(fields) Then you will only get one row. You can use TOP x .... for x number of rows, or TOP 50 PERCENT for the top half of the rows, etc. Then you can eliminate DISTINCT in this case
EDIT: misunderstood question. How about this?
select
a.callsign,
a.SKCC_Number,
a.qsodate,
(SELECT TOP 1 b.name FROM skccdata b WHERE b.callsign = a.callsign) as NAME,
a.SPC,
a.Band
from qso a
where SKCC_Number like '%[CTS]%'
AND QSODate > = '2014-08-01'
GROUP BY a.QSODate, a.callsign, a.SKCC_Number, a.SPC, a.Band
order by a.QSODate
and add callsign to your where clause to isolate callsigns

Related

Can some expert please explain what this query is doing? [duplicate]

This question already has answers here:
Get top 1 row of each group
(19 answers)
Closed 6 months ago.
This is the query in one of the Reports that I am trying to fix. What is being done here?
select
*
from
(
SELECT
[dbo].[RegistrationHistory].[AccountNumber],
[dbo].[RegistrationHistory].[LinkedBP],
[dbo].[RegistrationHistory].[SerialNumber],
[dbo].[RegistrationHistory].[StockCode],
[dbo].[RegistrationHistory].[RegistrationDate],
[dbo].[RegistrationHistory].[CoverExpiry],
[dbo].[RegistrationHistory].[LoggedDate] as 'CoverExpiryNew',
ROW_NUMBER() OVER(PARTITION BY [dbo].[RegistrationHistory].[SerialNumber]
ORDER BY
LoggedDate asc) AS seq,
[dbo].[Registration].[StockCode] as 'CurrentStockCode'
FROM
[SID_Repl].[dbo].[RegistrationHistory]
LEFT JOIN
[SID_Repl].[dbo].[Registration]
on [dbo].[RegistrationHistory].[SerialNumber] = [dbo].[Registration].[SerialNumber]
where
[dbo].[RegistrationHistory].[StockCode] in
(
'E4272HL1',
'E4272HL2',
'E4272HL3',
'E4272H3',
'OP45200HA',
'OP45200HM',
'EOP45200HA',
'EOP45200HM',
'4272HL1',
'4272HL2',
'4272HL3',
'4272H3'
)
)
as t
where
t.seq = 1
and CurrentStockCode in
(
'E4272HL1',
'E4272HL2',
'E4272HL3',
'E4272H3',
'OP45200HA',
'OP45200HM',
'EOP45200HA',
'EOP45200HM',
'4272HL1',
'4272HL2',
'4272HL3',
'4272H3'
)
I am looking for a simplified way of splitting this query into step by step, so that I can see where it is going wrong.
ROW_NUMBER in a subquery combined with a filter on it in the outer query is an idiom to filter out all but the first row in a group. So here
ROW_NUMBER() OVER(PARTITION BY [dbo].[RegistrationHistory].[SerialNumber])
Assigns the row with the lowest SerialNumber 1, the next lowest, 2, etc. Then later
where
t.seq = 1
removes all but the row with the lowest serial number from the result.

First two records of each group based on rank

I've created a part of the query that returns me the data like in the picture below:
Now, I am trying to select First 2 records (1 and 2) of each group (sap_id, wr_nbr) where "rn" has more than 1.
So, my final table should look like:
I've tried with TOP 2 WITH TIES but it returns me only two records of the whole table.
Any idea how to achieve this?
Thank you in advance.
SELECT b.*
FROM
(SELECT a.[sap_id]
,a.[wr_nbr]
,a.[start_date]
,a.[end_date]
,a.[vs_ind]
,a.[rn]
,COUNT(*) OVER (PARTITION BY a.sap_id, a.wr_nbr) as count_rows
FROM
(SELECT [sap_id]
,[ts_nbr]
,[wr_nbr]
,[check_line]
,[check_nbr]
,[start_date]
,[end_date]
,[vs_ind]
,[rn]
,[rank_ind]
FROM [dbo].[first_two]) a) b
WHERE b.count_rows > 1
AND b.rn <= 2
Final table looks like:

Oracle SQL: Get the max record for each duplicate ID in self join table [duplicate]

This question already has answers here:
Fetch the rows which have the Max value for a column for each distinct value of another column
(35 answers)
GROUP BY with MAX(DATE) [duplicate]
(6 answers)
Oracle SQL query: Retrieve latest values per group based on time [duplicate]
(2 answers)
Closed 5 years ago.
It's been marked as a duplicate and seems to be explained a bit in the linked questions, but I'm still trying to get the separate DEBIT and CREDIT columns on the same row.
I've created a View and I am currently self joining it. I'm trying to get the max Header_ID for each date.
My SQL is currently:
SELECT DISTINCT
TAB1.id,
TAB1.glperiods_id,
MAX(TAB2.HEADER_ID),
TAB1.batch_date,
TAB1.debit,
TAB2.credit,
TAB1.descrip
FROM
IQMS.V_TEST_GLBATCH_GJ TAB1
LEFT OUTER JOIN
IQMS.V_TEST_GLBATCH_GJ TAB2
ON
TAB1.ID = TAB2.ID AND TAB1.BATCH_DATE = TAB2.BATCH_DATE AND TAB1.GLPERIODS_ID = TAB2.GLPERIODS_ID AND TAB1.DESCRIP = TAB2.DESCRIP AND TAB1.DEBIT <> TAB2.CREDIT
WHERE
TAB1.ACCT = '3648-00-0'
AND
TAB1.DESCRIP NOT LIKE '%INV%'
AND TAB1.DEBIT IS NOT NULL
GROUP BY
TAB1.id,
TAB1.glperiods_id,
TAB1.batch_date,
TAB1.debit,
TAB2.credit,
TAB1.descrip
ORDER BY TAB1.batch_date
And the output for this is (37 rows in total):
I'm joining the table onto itself to get DEBIT and CREDIT on the same line. How do I select only the rows with the max HEADER_ID per BATCH_DATE ?
Update
For #sagi
Those highlighted with the red box are the rows I want and the ones in blue would be the ones I'm filtering out.
Fixed mistake
I recently noticed I had joined my table onto itself without making sure TAB2 ACCT='3648-00-0'.
The corrected SQL is here:
SELECT DISTINCT
TAB1.id,
TAB1.glperiods_id,
Tab1.HEADER_ID,
TAB1.batch_date,
TAB1.debit,
TAB2.credit,
TAB1.descrip
FROM
IQMS.V_TEST_GLBATCH_GJ TAB1
LEFT OUTER JOIN
IQMS.V_TEST_GLBATCH_GJ TAB2
ON
TAB1.ID = TAB2.ID AND TAB1.BATCH_DATE = TAB2.BATCH_DATE AND TAB2.ACCT ='3648-00-0'AND TAB1.GLPERIODS_ID = TAB2.GLPERIODS_ID AND TAB1.DESCRIP = TAB2.DESCRIP AND TAB1.DEBIT <> TAB2.CREDIT
WHERE
TAB1.ACCT = '3648-00-0'
AND
TAB1.DESCRIP NOT LIKE '%INV%'
AND TAB1.DEBIT IS NOT NULL
ORDER BY TAB1.BATCH_DATE
Use window function like ROW_NUMBER() :
SELECT s.* FROM (
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.batch_id ORDER BY t.header_id DESC) as rnk
FROM YourTable t
WHERE t.ACCT = '3648-00-0'
AND t.DESCRIP NOT LIKE '%INV%'
AND t.DEBIT IS NOT NULL) s
WHERE s.rnk = 1
This is an analytic function that rank your record by the values provided in the OVER clause.
PARTITION - is the group
ORDER BY - Who's the first of this group (first gets 1, second 2, ETC)
It is a lot more efficient then joins(Your problem could have been solved in many ways) , and uses the table only once.

How do I get the top 10 results of a query?

I have a postgresql query like this:
with r as (
select
1 as reason_type_id,
rarreason as reason_id,
count(*) over() count_all
from
workorderlines
where
rarreason != 0
and finalinsdate >= '2012-12-01'
)
select
r.reason_id,
rt.desc,
count(r.reason_id) as num,
round((count(r.reason_id)::float / (select count(*) as total from r) * 100.0)::numeric, 2) as pct
from r
left outer join
rtreasons as rt
on
r.reason_id = rt.rtreason
and r.reason_type_id = rt.rtreasontype
group by
r.reason_id,
rt.desc
order by r.reason_id asc
This returns a table of results with 4 columns: the reason id, the description associated with that reason id, the number of entries having that reason id, and the percent of the total that number represents.
This table looks like this:
What I would like to do is only display the top 10 results based off the total number of entries having a reason id. However, whatever is leftover, I would like to compile into another row with a description called "Other". How would I do this?
with r2 as (
...everything before the select list...
dense_rank() over(order by pct) cause_rank
...the rest of your query...
)
select * from r2 where cause_rank < 11
union
select
NULL as reason_id,
'Other' as desc,
sum(r2.num) over() as num,
sum(r2.pct) over() as pct,
11 as cause_rank
from r2
where cause_rank >= 11
As said above Limit and for the skipping and getting the rest use offset... Try This Site
Not sure about Postgre but SELECT TOP 10... should do the trick if you sort correctly
However about the second part: You might use a Right Join for this. Join the TOP 10 Result with the whole table data and use only the records not appearing on the left side. If you calculate the sum of those you should get your "Sum of the rest" result.
I assume that vw_my_top_10 is the view showing you the top 10 records. vw_all_records shows all records (including the top 10).
Like this:
SELECT SUM(a_field)
FROM vw_my_top_10
RIGHT JOIN vw_all_records
ON (vw_my_top_10.Key = vw_all_records.Key)
WHERE vw_my_top_10.Key IS NULL

How can I select adjacent rows to an arbitrary row (in sql or postgresql)?

I want to select some rows based on certain criteria, and then take one entry from that set and the 5 rows before it and after it.
Now, I can do this numerically if there is a primary key on the table, (e.g. primary keys that are numerically 5 less than the target row's key and 5 more than the target row's key).
So select the row with the primary key of 7 and the nearby rows:
select primary_key from table where primary_key > (7-5) order by primary_key limit 11;
2
3
4
5
6
-=7=-
8
9
10
11
12
But if I select only certain rows to begin with, I lose that numeric method of using primary keys (and that was assuming the keys didn't have any gaps in their order anyway), and need another way to get the closest rows before and after a certain targeted row.
The primary key output of such a select might look more random and thus less succeptable to mathematical locating (since some results would be filtered, out, e.g. with a where active=1):
select primary_key from table where primary_key > (34-5)
order by primary_key where active=1 limit 11;
30
-=34=-
80
83
100
113
125
126
127
128
129
Note how due to the gaps in the primary keys caused by the example where condition (for example becaseu there are many inactive items), I'm no longer getting the closest 5 above and 5 below, instead I'm getting the closest 1 below and the closest 9 above, instead.
There's a lot of ways to do it if you run two queries with a programming language, but here's one way to do it in one SQL query:
(SELECT * FROM table WHERE id >= 34 AND active = 1 ORDER BY id ASC LIMIT 6)
UNION
(SELECT * FROM table WHERE id < 34 AND active = 1 ORDER BY id DESC LIMIT 5)
ORDER BY id ASC
This would return the 5 rows above, the target row, and 5 rows below.
Here's another way to do it with analytic functions lead and lag. It would be nice if we could use analytic functions in the WHERE clause. So instead you need to use subqueries or CTE's. Here's an example that will work with the pagila sample database.
WITH base AS (
SELECT lag(customer_id, 5) OVER (ORDER BY customer_id) lag,
lead(customer_id, 5) OVER (ORDER BY customer_id) lead,
c.*
FROM customer c
WHERE c.active = 1
AND c.last_name LIKE 'B%'
)
SELECT base.* FROM base
JOIN (
-- Select the center row, coalesce so it still works if there aren't
-- 5 rows in front or behind
SELECT COALESCE(lag, 0) AS lag, COALESCE(lead, 99999) AS lead
FROM base WHERE customer_id = 280
) sub ON base.customer_id BETWEEN sub.lag AND sub.lead
The problem with sgriffinusa's solution is that you don't know which row_number your center row will end up being. He assumed it will be row 30.
For similar query I use analytic functions without CTE. Something like:
select ...,
LEAD(gm.id) OVER (ORDER BY Cit DESC) as leadId,
LEAD(gm.id, 2) OVER (ORDER BY Cit DESC) as leadId2,
LAG(gm.id) OVER (ORDER BY Cit DESC) as lagId,
LAG(gm.id, 2) OVER (ORDER BY Cit DESC) as lagId2
...
where id = 25912
or leadId = 25912 or leadId2 = 25912
or lagId = 25912 or lagId2 = 25912
such query works more faster for me than CTE with join (answer from Scott Bailey). But of course less elegant
You could do this utilizing row_number() (available as of 8.4). This may not be the correct syntax (not familiar with postgresql), but hopefully the idea will be illustrated:
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY primary_key) AS r, *
FROM table
WHERE active=1) t
WHERE 25 < r and r < 35
This will generate a first column having sequential numbers. You can use this to identify the single row and the rows above and below it.
If you wanted to do it in a 'relationally pure' way, you could write a query that sorted and numbered the rows. Like:
select (
select count(*) from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
Then use that as a common table expression. Write a select which filters it down to the rows you're interested in, then join it back onto itself using a criterion that the index of the right-hand copy of the table is no more than k larger or smaller than the index of the row on the left. Project over just the rows on the right. Like:
with numbered_emps as (
select (
select count(*)
from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
)
select b.*
from numbered_emps a, numbered_emps b
where a.name like '% Smith' -- this is your main selection criterion
and ((b.idx - a.idx) between -5 and 5) -- this is your adjacency fuzzy-join criterion
What could be simpler!
I'd imagine the row-number based solutions will be faster, though.