Nested SQL Server Query Max Date - sql

Ladies and Gents,
I need to write a query that grabs data from a view, but I'm not sure how to go about this. The issue is there is really no key and there are two fields I'm concerned with that will control what rows I need to retrieve.
The view looks something like this:
Category columna columnb uploaddate
-----------------------------------------------------
a value value 1/30/2013 04:04:04:000
a value value 1/29/2013 04:04:04:000
b value value 1/28/2013 01:23:04:000
b value value 1/30/2013 04:04:04:000
b value value 1/30/2013 04:04:04:000
c value value 1/30/2013 01:01:01:000
c value value 1/30/2013 01:01:01:000
What I need to retrieve is all rows for each unique category and the newest uploaddate. So in the example above I would get 1 row for category a which would have the newest uploaddate. Category b would have 2 rows which have the 1/30/2013 date. Category c would have two rows also.
I also need to compare only the date of upload, not the time, as the loading can take a couple of seconds. I was trying to use max date but it would only match the time to the second.
Any guidance/thoughts would be great.
Thanks!
EDIT:
Here is what I threw together so far and I think it's close but it's not working yet and I doubt this is the most efficient way to do this.
select
*
from
VIEW c
INNER JOIN
(
SELECT
Category,
MAX(CONVERT(DateTime, Convert(VarChar, UploadDate, 101))) as maxuploaddate
FROM
View
GROUP BY
Category,
UploadDate
) temp ON temp.Category = c.Category AND CONVERT(VarChar, UploadDate, 101) = temp.maxuploaddate
The problem lies in the nested select statement, as it's still grabbing all combinations of Category and UploadDate. Is there a way to do a distinct on the Category and UploadDate, just getting the newest combination?
Thanks Again

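Your query is close, but you have a mistake in the GROUP BY: grouping by UploadDate as well returns one row per combination of Category and UploadDate instead of one row per Category. I'd also get rid of the date conversions; date comparisons work fine.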
select
*
from
VIEW c
INNER JOIN
(
SELECT
Category,
MAX(UploadDate) as maxuploaddate
FROM
View
GROUP BY
Category
) temp ON temp.Category = c.Category AND UploadDate = temp.maxuploaddate

If you want to do this to the nearest date, you need to convert to a date first. In SQL Server syntax:
select *
from (select category, columna, columnb, uploaddate,
rank() over ( partition by category order by cast(uploaddate as date) desc) as seqnum
from view
) v
where seqnum = 1
In Oracle syntax:
select *
from (select category, columna, columnb, uploaddate,
rank() over ( partition by category order by to_char(uploaddate, 'YYYY-MM-DD') desc) as seqnum
from view
) v
where seqnum = 1
Because you want ties, these use rank() instead of row_number().
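To see the date-only rank() pattern run end to end, here is a minimal sketch using Python's built-in sqlite3 module (it needs SQLite 3.25 or later for window functions). The table name uploads, the columna labels and the few-seconds-apart timestamps are made up for illustration, and SQLite's date() stands in for cast(uploaddate as date).

```python
import sqlite3

# Hypothetical table standing in for the view; timestamps stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (category TEXT, columna TEXT, uploaddate TEXT)")
conn.executemany(
    "INSERT INTO uploads VALUES (?, ?, ?)",
    [
        ("a", "a1", "2013-01-30 04:04:04"),
        ("a", "a2", "2013-01-29 04:04:04"),
        ("b", "b1", "2013-01-28 01:23:04"),
        ("b", "b2", "2013-01-30 04:04:04"),
        ("b", "b3", "2013-01-30 04:04:06"),  # loaded two seconds later
        ("c", "c1", "2013-01-30 01:01:01"),
        ("c", "c2", "2013-01-30 01:01:03"),
    ],
)

# date() drops the time part, so rows loaded seconds apart tie for rank 1.
latest = conn.execute(
    """
    SELECT category, columna
    FROM (SELECT category, columna,
                 rank() OVER (PARTITION BY category
                              ORDER BY date(uploaddate) DESC) AS seqnum
          FROM uploads) v
    WHERE seqnum = 1
    ORDER BY category, columna
    """
).fetchall()
print(latest)
```

Category a keeps one row, while b and c keep two each, because the ordering only looks at the calendar date.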

In Oracle you can use Rank() to achieve this. Rank() gives rows the same number when they tie on the ordering criteria.
Edit: And you can use Trunc() to "trim" the time from the uploaddate.
select *
from (select category, columna, columnb, uploaddate,
rank() over ( partition by category order by trunc(uploaddate) desc) rank
from view)
where rank = 1
Dense_Rank() also exists. It gives tied rows the same number as well, but it does not leave gaps in the numbering after a tie. Since only rank 1 is needed here, the difference does not matter. See this question for more info on the differences.

Related

SQL MAX Query Multiple Columns

Trying to populate multiple columns based on one MAX value but the grouping returns multiple results. Is there a way I can tell SQL to only pull the values based on the MAX that I want?
Query:
Select a.ID, MAX(a.PayDate), a.AgencyName
From a
Group By a.ID, a.AgencyName
What I need is the latest paydate per ID, then I want additional information in reference to that entry such as AgencyName (& more columns I want to add) but because of the grouping - SQL returns the latest paydate for each of the AgencyNames that the person has had - but I only want the AgencyName associated with the record that is Max Paydate for that ID. I know it's the grouping that does this but I am unsure how to proceed - any help greatly appreciated.
Thanks
Select a.ID,a.PayDate, a.AgencyName
From a
where exists (select 1 from a a1 where a1.id = a.id
having a.payDate = max(a1.paydate))
I would just use a correlated subquery like this:
select a.*
from a
where a.paydate = (select max(a2.paydate) from a a2 where a2.id = a.id);
Note that this could return multiple rows if an id has duplicates on the most recent paydate. An alternative that guarantees one row is row_number():
select a.*
from (select a.*,
row_number() over (partition by id order by paydate desc) as seqnum
from a
) a
where seqnum = 1;
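The difference between the two queries only shows up when an id has a tie on its latest paydate. A small sketch with Python's sqlite3 module (SQLite 3.25+ for row_number()); the sample rows are invented, and the extra agencyname tiebreaker in the ORDER BY is my addition so that the row kept is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, paydate TEXT, agencyname TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "North"),
        (1, "2020-03-01", "South"),
        (2, "2020-02-01", "East"),
        (2, "2020-02-01", "West"),  # id 2 has a tie on its latest paydate
    ],
)

# Correlated subquery: keeps every row that matches the max paydate of its id.
with_ties = conn.execute(
    """
    SELECT id, paydate, agencyname
    FROM a
    WHERE paydate = (SELECT max(a2.paydate) FROM a a2 WHERE a2.id = a.id)
    ORDER BY id, agencyname
    """
).fetchall()

# row_number(): exactly one row per id; agencyname breaks the tie.
one_per_id = conn.execute(
    """
    SELECT id, paydate, agencyname
    FROM (SELECT a.*,
                 row_number() OVER (PARTITION BY id
                                    ORDER BY paydate DESC, agencyname) AS seqnum
          FROM a) ranked
    WHERE seqnum = 1
    ORDER BY id
    """
).fetchall()

print(with_ties)
print(one_per_id)
```

Without a tiebreaker column, which of the tied rows row_number() keeps is up to the database.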

Get minimum without using row number/window function in Bigquery

I have a table like the one shown below
What I would like to do is get the minimum of each subject. Though I am able to do this with row_number function, I would like to do this with groupby and min() approach. But it doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see, just the two columns (subject_id and id) are enough to group the items together. They help differentiate the groups. But why am I not able to use the other columns in the select clause? If I add the other columns to the grouping, I may not get the expected output because time_1 has different values.
I expect my output to look like the one shown below
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(0)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After the records are aggregated into the array, you want its first element. The .* expands the record referred to by a into its component columns.
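If it helps to see what the aggregation does per group, here is a rough plain-Python analogy (not BigQuery code). BigQuery array offsets are zero-based, so the single element left by LIMIT 1 sits at offset 0. The sample rows and the function name min_row_per_subject are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows; only the columns needed for the illustration.
rows = [
    {"subject_id": 1, "id": 10, "value": 7},
    {"subject_id": 1, "id": 10, "value": 3},
    {"subject_id": 2, "id": 20, "value": 5},
    {"subject_id": 2, "id": 20, "value": 9},
]


def min_row_per_subject(records):
    """Per subject_id: order the group's rows by value, keep one, take index 0."""
    out = []
    by_subject = sorted(records, key=itemgetter("subject_id"))
    for _, group in groupby(by_subject, key=itemgetter("subject_id")):
        array = sorted(group, key=itemgetter("value"))[:1]  # ORDER BY value LIMIT 1
        out.append(array[0])  # zero-based, like OFFSET(0)
    return out


result = min_row_per_subject(rows)
print(result)
```

Each output row is a whole record, which is why the other columns come along without being listed in a GROUP BY.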
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you can easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like below-
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
If you do not need to select the Time_1 column's value, the following query will work (as far as I can see, the values in min_time and max_time are the same within each group):
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, the best approach, if you can, is to apply something like CAST(Time_1 AS DATE) to your time column. This considers only the date part and ignores the time part. The query will be
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Check whether the CAST AS DATE syntax
-- in BigQuery is as written here or slightly different.
Below is for BigQuery Standard SQL and is the most efficient way for cases like the one in your question
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases leads to a Resources exceeded error.
Note: a self join is also a very inefficient way of achieving your objective
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id

how to get latest date column records when result should be filtered with unique column name in sql?

I have a table as below:
I want to write a SQL query to get the output below:
The query should select all the records from the table, but when multiple records have the same Id column value, it should take only the one record with the latest Date.
E.g., Rudolf (id 1211) is present three times in the input; in the output only the one Rudolf record with date 06-12-2010 is selected. The same goes for James.
I tried to write a query but it was not successful. Please help me form the query in SQL.
Thanks in advance
You can partition your data by Id, order each partition by Date descending, and take the first row of each partition
SELECT A.Id, A.Name, A.Place, A.Date FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date DESC) AS rn
FROM [Table]
) A WHERE A.rn = 1
You can use WITH TIES:
select top 1 WITH TIES * from t
order by (row_number() over(partition by id order by date desc))
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=280b7412b5c0c04c208f2914b44c7ce3
As I can see from your example, the duplicate rows differ only in Date. If that's the case, then a simple GROUP BY with the MAX aggregate function will do the job for you.
SELECT Id, Name, Place, MAX(Date)
FROM [TABLE_NAME]
GROUP BY Id, Name, Place
Here is working example: http://sqlfiddle.com/#!18/7025e/2
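The GROUP BY shortcut depends on the duplicates really being identical apart from the date. A quick sketch with Python's sqlite3 module (SQLite 3.25+), assuming a made-up people table: the places, James's id and the ISO-formatted dates are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, place TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1211, "Rudolf", "Berlin", "2010-12-06"),
        (1211, "Rudolf", "Berlin", "2009-01-02"),
        (1212, "James", "Oslo", "2011-03-04"),
        (1212, "James", "Rome", "2008-05-06"),  # place differs between duplicates
    ],
)

# GROUP BY on every non-date column: one row per distinct (id, name, place).
grouped = conn.execute(
    """
    SELECT id, name, place, max(date)
    FROM people
    GROUP BY id, name, place
    ORDER BY id, place
    """
).fetchall()

# row_number(): one row per id, whatever the other columns contain.
latest = conn.execute(
    """
    SELECT id, name, place, date
    FROM (SELECT *, row_number() OVER (PARTITION BY id ORDER BY date DESC) AS rn
          FROM people)
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(grouped)
print(latest)
```

Here the GROUP BY version returns James twice because his two rows differ in place, while the row_number() version still returns one row per id.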

Filter SQL data by repetition on a column

Very simple basic SQL question here.
I have this table:
Row Id __________Hour__Minute__City_Search
1___1409346767__23____24_____Balears (Illes)
2___1409346767__23____13_____Albacete
3___1409345729__23____7______Balears (Illes)
4___1409345729__23____3______Balears (Illes)
5___1409345729__22____56_____Balears (Illes)
What I want to get is only one distinct row by ID and select the last City_Search made by the same Id.
So, in this case, the result would be:
Row Id __________Hour__Minute__City_Search
1___1409346767__23____24_____Balears (Illes)
3___1409345729__23____7______Balears (Illes)
What's the easiest way to do it?
Obviously I don't want to delete any data just query it.
Thanks for your time.
SELECT T.Row,
T.Id,
T.Hour,
T.Minute,
T.City_Search
FROM Table T
JOIN
(
SELECT MIN(Row) AS Row,
ID
FROM Table
GROUP BY ID
) AS M
ON M.Row = T.Row
AND M.ID = T.ID
Can you change hour/minute to a timestamp?
What you want in this case is to first select what uniquely identifies your row:
Select id, max(time) from [table] group by id
Then use that query to add the data to it.
SELECT tdata.id, tdata.city_search, tdata.time
FROM (SELECT id, max(time) as lasttime FROM [table] GROUP BY id) as Tkey
INNER JOIN [table] as tdata
ON tkey.id = tdata.id AND tkey.lasttime = tdata.time
That should do it.
Two options to do it without a join:
Use the Row_Number function to find the last one:
Select * FROM
(Select *,
row_number() over(Partition BY ID Order BY Hour desc, Minute Desc) as RNB
from table) t
Where RNB=1
Or manipulate the string and use a simple Max function (zero-pad Hour and Minute so the strings sort chronologically):
Select ID,Right(MAX(Concat(LPAD(Hour,2,'0'),LPAD(Minute,2,'0'),RPAD(City_Search,20,' '))),20)
From Table
Group by ID
Avoiding joins is often faster, though that depends on the indexes and the optimizer.
Hope this helps
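One thing to watch with the string trick is that the hour and minute have to be zero-padded before concatenating, otherwise 23:07 becomes "237" and sorts after "2313". A minimal sketch with Python's sqlite3 module, using the rows from the question; the table name searches is made up, and SQLite's printf() and substr() stand in for the padding and Right() calls:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE searches (id INTEGER, hour INTEGER, minute INTEGER, city_search TEXT)"
)
conn.executemany(
    "INSERT INTO searches VALUES (?, ?, ?, ?)",
    [
        (1409346767, 23, 24, "Balears (Illes)"),
        (1409346767, 23, 13, "Albacete"),
        (1409345729, 23, 7, "Balears (Illes)"),
        (1409345729, 23, 3, "Balears (Illes)"),
        (1409345729, 22, 56, "Balears (Illes)"),
    ],
)

# As plain strings, 23:07 ("237") compares greater than 23:13 ("2313").
unpadded_sorts_wrong = "237" > "2313"

# Build a fixed-width HHMM prefix, take the max per id, then strip the prefix.
last = conn.execute(
    """
    SELECT id,
           substr(max(printf('%02d%02d', hour, minute) || city_search), 5) AS city_search
    FROM searches
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(unpadded_sorts_wrong)
print(last)
```

Putting the fixed-width time first means the string comparison is decided by the time before the city name is ever looked at.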

How to select multiple rows in SQL Server while filling one column with the first value

Each of my rows have a date. I want the database to keep the good date. But I am in a situation where I want only the first date. But I still want all the other rows. So I would like to fill the date column with all the same date in my result.
For an example (Because I don't think I expressed myself well)
I have this:
name value date
a 10 5/13
b 14 2/13
c 20 1/13
a 11 7/13
a 5 8/13
b 8 9/13
I want it to become like this in the result:
name value date
a 26 5/13
b 22 5/13
c 20 5/13
I searched for this information but I only find the way to select the first row.
for now I'm doing
SELECT name, SUM(value), date FROM table
ORDER BY name
And I'm kind of clueless for what to do next.
Thanks :)
Databases don't have a concept of "first". Here is an attempt, but no guarantees unless you have a way of ordering to determine first:
select name, sum(value), const.date
from table cross join
(select top 1 date from table) const
group by name, const.date
If you only want to do this for a query, to provide this aggregated data for some specific client requirement, then @freshPrince's answer is appropriate. But if you want to actually modify the data in the table itself, and prevent the issue from arising again, then you need to change the schema.
Create Table newTable(
name varChar(30) not null,
date datetime not null,
value decimal(10,2) not null default(0),
primary key (name, date) )
Insert newTable (name, date, value)
Select name, Min(date), SUM(value)
FROM currentTable
Group By Name
and delete the old table, then rename the new table to whatever you like.
You will also have to modify the process used to insert new rows so that instead of always inserting a new row, it updates the existing row for a specified name and date if it already exists.
Your question is slightly confusing, since your desired result shows a date that does not exist for either b or c, but if that is the result you want, you could use something similar to the following:
select name, sum(value) value, d.date
from yt
cross join
(
select min(date) date
from yt
where name = (select min(name)
from yt)
) d
group by name, d.date;
See SQL Fiddle with Demo
But it seems like you actually would want the min(date) for each name:
select name, sum(value) value, min(date)
from yt
group by name;
See SQL Fiddle with Demo.
If the order of the date should be determined by the name then you could use:
select t.name, sum(value) value, d.date
from yt t
cross join
(
select top 1 name, date
from yt
order by name, date
) d
group by t.name, d.date;
See Demo
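For comparison, here is a small sketch of the per-name min(date) query next to the single "first" date version, run through Python's sqlite3 module. It assumes the dates are month/year and stores them as sortable YYYY-MM text (the question's 5/13 becomes 2013-05), and it uses LIMIT 1 where SQL Server would use TOP 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yt (name TEXT, value INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO yt VALUES (?, ?, ?)",
    [
        ("a", 10, "2013-05"),
        ("b", 14, "2013-02"),
        ("c", 20, "2013-01"),
        ("a", 11, "2013-07"),
        ("a", 5, "2013-08"),
        ("b", 8, "2013-09"),
    ],
)

# Earliest date per name.
per_name = conn.execute(
    """
    SELECT name, sum(value) AS value, min(date) AS date
    FROM yt
    GROUP BY name
    ORDER BY name
    """
).fetchall()

# One date for every row: the first row when ordered by name, then date.
single_date = conn.execute(
    """
    SELECT t.name, sum(t.value) AS value, d.date
    FROM yt t
    CROSS JOIN (SELECT date FROM yt ORDER BY name, date LIMIT 1) d
    GROUP BY t.name, d.date
    ORDER BY t.name
    """
).fetchall()

print(per_name)
print(single_date)
```

With this data the second query reproduces the result asked for in the question, where every name carries the 5/13 date.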