Second Max Date by ID - sql

I have a table that looks like this
ID Date
123 2/1/2017
123 4/1/2017
123 6/5/2017
123 7/8/2017
456 3/8/2017
456 3/9/2017
456 3/10/2017
Dates are in American format.
I want to pull a list of IDs, each with its SECOND max date. So I would like the results to be:
ID Date
123 6/5/2017
456 3/9/2017
I do not know how to do this. I have googled, but to no avail. Any help is greatly appreciated.
I have tried this, but it's not working:
select *
from (
select ROW_NUMBER() over (partition by ID order by DATE desc ) as 'rowNum', ID, DATE
from table1 ) withRowNum
where rowNum = 2

For SQL Server:
If your dates are stored as varchar and your session's date format is not mdy, you could use set dateformat, e.g.:
set dateformat mdy;
select *
from (
select ROW_NUMBER() over (
partition by ID
order by convert(date,DATE) desc
) as 'rowNum', ID, DATE
from table1 ) withRowNum
where rowNum = 2
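To see why the conversion matters, here is a minimal sketch that runs the same ROW_NUMBER idea against an in-memory SQLite database from Python. It is not SQL Server: the column name dt, the helper second_latest_per_id and the ISO-date conversion are assumptions made for the demo, and it needs a SQLite build with window functions (3.25 or later). If the column is text in m/d/yyyy form, plain string ordering puts '3/9/2017' after '3/10/2017', which may be why the original attempt returns the wrong row.

```python
import sqlite3
from datetime import datetime


def second_latest_per_id(rows):
    """rows: iterable of (id, 'm/d/yyyy') pairs. Returns [(id, 'yyyy-mm-dd')]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, dt TEXT)")
    # Store ISO dates so that text ordering matches chronological ordering.
    iso_rows = [
        (row_id, datetime.strptime(text, "%m/%d/%Y").date().isoformat())
        for row_id, text in rows
    ]
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", iso_rows)
    query = """
        SELECT id, dt
        FROM (
            SELECT id, dt,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS row_num
            FROM table1
        ) AS with_row_num
        WHERE row_num = 2          -- second most recent date per id
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [
    (123, "2/1/2017"), (123, "4/1/2017"), (123, "6/5/2017"), (123, "7/8/2017"),
    (456, "3/8/2017"), (456, "3/9/2017"), (456, "3/10/2017"),
]
print(second_latest_per_id(sample))
# [(123, '2017-06-05'), (456, '2017-03-09')]
```

An ID with only one row has no row number 2, so it simply does not appear in the result.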

Related

How to get the last day of a month on different items? - SQL

Suppose I have some sample data in table_name_a as below:
code val_a date
-------------------------
1 00001 500 20191101
2 00001 1000 20191130
3 00002 200 20191101
4 00002 400 20191130
5 00003 200 20191101
6 00003 600 20191130
There are several val_a values for each code between 20191101 and 20191130. I would like to get the value on the last day of the month for every code. My SQL query is below (it needs to work in both Hive and Impala):
SELECT code, max(date) AS date, val_a
FROM table_a
WHERE date BETWEEN '20090601'
AND '20090630'
GROUP BY code, val_a
But the query above is wrong (the val_a returned for each code is not the one from the last day of the month). My expected output is below:
code val_a date
--------------------------
1 00001 1000 20191130
2 00002 400 20191130
3 00003 600 20191130
Thanks so much for any advice.
We could try using a ROW_NUMBER solution here:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY code ORDER BY date DESC) rn
FROM table_a t
-- WHERE date BETWEEN '20090601' AND '20090630'
-- your current WHERE clause is dubious
)
SELECT code, date, val_a
FROM cte
WHERE rn = 1;
Note that it is not best practice to be storing dates as text. That being said, given that you are storing your dates in an ISO format with fixed width, we can still work with these dates in this case. Also, your current WHERE clause does not make sense, so I commented it out.
You can try the following code. The subquery gets the max date for each code, and the WHERE ... IN clause uses those (code, date) pairs to filter your data.
SELECT code, val_a, date
FROM table_a
WHERE (code, date) IN
(SELECT code, MAX(date)
FROM table_a
GROUP BY code)
More generally, you can use a correlated subquery:
select a.*
from table_a a
where a.date = (select max(a1.date) from table_a a1 where a1.code = a.code);
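As a quick check, the ROW_NUMBER and correlated-subquery approaches can be run side by side on the sample data. This is only a sketch using Python's built-in sqlite3 module (not Hive or Impala, and it assumes a SQLite build with window functions, 3.25 or later); the variable names are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (code TEXT, val_a INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?)",
    [
        ("00001", 500, "20191101"), ("00001", 1000, "20191130"),
        ("00002", 200, "20191101"), ("00002", 400, "20191130"),
        ("00003", 200, "20191101"), ("00003", 600, "20191130"),
    ],
)

# Latest row per code: number the rows newest-first and keep number 1.
ROW_NUMBER_SQL = """
    WITH cte AS (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY code ORDER BY date DESC) AS rn
        FROM table_a t
    )
    SELECT code, val_a, date FROM cte WHERE rn = 1 ORDER BY code
"""

# Latest row per code: keep rows whose date equals that code's maximum date.
CORRELATED_SQL = """
    SELECT a.code, a.val_a, a.date
    FROM table_a a
    WHERE a.date = (SELECT MAX(a1.date) FROM table_a a1 WHERE a1.code = a.code)
    ORDER BY a.code
"""

by_row_number = conn.execute(ROW_NUMBER_SQL).fetchall()
by_correlated = conn.execute(CORRELATED_SQL).fetchall()
print(by_row_number)
# [('00001', 1000, '20191130'), ('00002', 400, '20191130'), ('00003', 600, '20191130')]
print(by_row_number == by_correlated)
# True
```

The two only differ when a code has several rows on its latest date: ROW_NUMBER keeps exactly one of them, while the subquery comparison keeps all of the tied rows.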
Use row_number:
with your_data as (
select stack(6,
'00001',500 ,'20191101',
'00001',1000,'20191130',
'00002',200 ,'20191101',
'00002',400 ,'20191130',
'00003',200 ,'20191101',
'00003',600 ,'20191130' ) as (code,val_a,date)
)
select code,val_a,date
from
(
select code,val_a,date,
--partition by code and months, max date first
row_number() over(partition by code, substr(date, 1,6) order by date desc) rn
from your_data d
)s where rn=1
;
Result:
OK
code val_a date
00001 1000 20191130
00002 400 20191130
00003 600 20191130
Time taken: 54.641 seconds, Fetched: 3 row(s)
If you only need the rows for the last day of the month, you can use the LAST_DAY and TRUNC functions on the date in the WHERE clause as follows:
SELECT
CODE,
DATE AS "DATE", -- removed MAX
VAL_A
FROM
TABLE_A
WHERE
DATE BETWEEN '20090601' AND '20090630'
AND TRUNC(LAST_DAY(DATE)) = TRUNC(DATE); -- added this condition (no aggregate: MAX is not allowed in WHERE)
-- removed the GROUP BY clause
Cheers!!

Combining multiple scalar bigquery queries into a single query to generate one table

I have a BigQuery query that basically takes a date as a parameter and calculates the number of active users our app had near that date.
Right now, if I want to make a graph over a year of active users, I have to run the query 12 times (once per month) and collate the results manually, which is error-prone and time consuming.
Is there a way to make a single bigquery query that runs the subquery 12 times and puts the results on 12 different rows?
For example, if my query is
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-01-01'
How can I get a table like
| Date | Count |
|------------|---------|
| 2017-01-01 | 50000 |
| 2017-02-01 | 40000 |
| 2017-03-01 | 30000 |
| 2017-04-01 | 20000 |
| 2017-05-01 | 10000 |
Supposing that you have a column called date and one called user_id and you want to calculate distinct users on a monthly basis, you can run a query such as:
#standardSQL
SELECT
DATE_TRUNC(date, MONTH) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM YourTable
GROUP BY month
ORDER BY month ASC;
(Here you can replace YourTable with the subquery that you want to run). As a self-contained example:
#standardSQL
WITH YourTable AS (
SELECT DATE '2017-06-25' AS date, 10 AS user_id UNION ALL
SELECT DATE '2017-05-04', 11 UNION ALL
SELECT DATE '2017-06-20', 10 UNION ALL
SELECT DATE '2017-04-01', 11 UNION ALL
SELECT DATE '2017-06-02', 12 UNION ALL
SELECT DATE '2017-04-13', 10
)
SELECT
DATE_TRUNC(date, MONTH) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM YourTable
GROUP BY month
ORDER BY month ASC;
Elliot taught me UNION ALL and it seemed to do the trick:
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-01-01'
UNION ALL
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-02-01'
UNION ALL
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-03-01'
Maybe there's a nicer way to parameterize the dates in the WHERE clause, but this did the trick for me.
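One way to avoid repeating the query per date is to put the cutoff dates in their own table (or CTE) and join against it, so each cutoff becomes one output row. Below is a minimal sketch of that idea using Python's sqlite3 module rather than BigQuery; the cutoffs table and the counts_before_cutoffs helper are hypothetical names for the demo. In BigQuery the list of dates could presumably come from UNNEST(GENERATE_DATE_ARRAY(...)) instead of a hand-filled table.

```python
import sqlite3


def counts_before_cutoffs(activity_times, cutoffs):
    """Return [(cutoff, number of activity rows strictly before cutoff)]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (activityTime TEXT)")
    conn.executemany(
        "INSERT INTO MyTable VALUES (?)", [(t,) for t in activity_times]
    )
    conn.execute("CREATE TABLE cutoffs (cutoff TEXT)")
    conn.executemany("INSERT INTO cutoffs VALUES (?)", [(c,) for c in cutoffs])
    query = """
        SELECT c.cutoff, COUNT(m.activityTime) AS n
        FROM cutoffs c
        -- LEFT JOIN keeps cutoffs that match no rows (they get a count of 0).
        LEFT JOIN MyTable m ON m.activityTime < c.cutoff
        GROUP BY c.cutoff
        ORDER BY c.cutoff
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


times = ["2016-12-15", "2017-01-10", "2017-01-20", "2017-02-05"]
print(counts_before_cutoffs(times, ["2017-01-01", "2017-02-01", "2017-03-01"]))
# [('2017-01-01', 1), ('2017-02-01', 3), ('2017-03-01', 4)]
```

The dates are stored as ISO text here, so the string comparison in the join condition matches chronological order.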

group a set of records by date in teradata

Currently I have data in a table as shown below:
date id value
1-Jan-13 1 100
2-Jan-13 1 100
3-Jan-13 1 100
4-Jan-13 1 200
5-Jan-13 1 200
6-Jan-13 1 100
7-Jan-13 1 100
I am trying to group the records by id and value, and to version each run of consecutive records with a start date and an end date.
Desired output:
start date end date id value
1-Jan-13 3-Jan-13 1 100
4-Jan-13 5-Jan-13 1 200
6-Jan-13 7-Jan-13 1 100
I'm not an expert in Teradata, but since windowing functions are supported (specifically ROW_NUMBER), you should most likely be able to do something like this:
SELECT MIN(date) start_date, MAX(date) end_date, id, value
FROM
(
SELECT date, id, value,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) -
ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY date) island
FROM table1
) q
GROUP BY id, value, island
ORDER BY start_date, end_date
Sample output:
| START_DATE | END_DATE | ID | VALUE |
|------------|------------|----|-------|
| 2013-01-01 | 2013-01-03 | 1 | 100 |
| 2013-01-04 | 2013-01-05 | 1 | 200 |
| 2013-01-06 | 2013-01-07 | 1 | 100 |
Here is a SQLFiddle demo (it's a SQL Server demo, but it should work as expected in Teradata)
The ROW_NUMBER version can be further simplified: modified SQL Fiddle
For Teradata:
SELECT
id,val,MIN(dt),MAX(dt)
FROM
(
SELECT
id,val,dt,
dt - ROW_NUMBER() OVER (PARTITION BY id ORDER BY val, dt) AS dummy
FROM table1
) AS dt
GROUP BY 1,2,dummy
And there are some little-known functions in TD13.10 for processing time series data:
WITH cte(id,val,pd) AS
(
SELECT id, val, PERIOD(dt, dt+1) AS pd
FROM table1
)
SELECT
id, val,
BEGIN(pd) AS start_dt,
LAST(pd) AS end_dt
FROM
TABLE (TD_NORMALIZE_MEET
(NEW VARIANT_TYPE(cte.id,cte.val)
,cte.pd)
RETURNS (id INTEGER
,val INTEGER
,pd PERIOD(DATE)
,Nrm_Count INTEGER)
HASH BY id
LOCAL ORDER BY id, val, pd
) A
ORDER BY start_dt, end_dt
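The difference-of-two-ROW_NUMBERs trick is easier to follow when you can run it. Here is a small sketch on the question's sample data using Python's sqlite3 module (so SQLite, not Teradata, and it assumes window-function support, SQLite 3.25 or later). The column is named dt and the dates are stored as ISO text for the demo; those are assumptions, not the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (dt TEXT, id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("2013-01-01", 1, 100), ("2013-01-02", 1, 100), ("2013-01-03", 1, 100),
        ("2013-01-04", 1, 200), ("2013-01-05", 1, 200),
        ("2013-01-06", 1, 100), ("2013-01-07", 1, 100),
    ],
)

# Within one run of equal values both row numbers grow in step, so their
# difference stays constant; it changes whenever a run is interrupted.
ISLANDS_SQL = """
    SELECT MIN(dt) AS start_date, MAX(dt) AS end_date, id, value
    FROM (
        SELECT dt, id, value,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt)
             - ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY dt) AS island
        FROM table1
    ) q
    GROUP BY id, value, island
    ORDER BY start_date, end_date
"""

islands = conn.execute(ISLANDS_SQL).fetchall()
for row in islands:
    print(row)
# ('2013-01-01', '2013-01-03', 1, 100)
# ('2013-01-04', '2013-01-05', 1, 200)
# ('2013-01-06', '2013-01-07', 1, 100)
```

For value 100 the difference is 0 for the first three days and 2 for the last two, which is what separates the two runs that share the same id and value.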

SQL Statement to Get The Minimum DateTime from 2 String Fields

I've got a bit of a messy table on my hands that has two fields, a date field and a time field that are both strings. What I need to do is get the minimum date from those fields, or just the record itself if there is no date/time attached to it. Here's some sample data:
ID First Last Date Time
1 Joe Smith 2013-09-06 04:00
1 Joe Smith 2013-09-06 02:00
2 Jack Jones
3 John Jack 2013-09-05 06:00
3 John Jack 2013-09-15 15:00
What I would want from a query is to get the following:
ID First Last Date Time
1 Joe Smith 2013-09-06 02:00
2 Jack Jones
3 John Jack 2013-09-05 06:00
That is, the min date/time for IDs 1 and 3, and then just ID 2 back because he doesn't have a date/time. I came up with the following query that gives me IDs 1 and 3 exactly as I would want them:
SELECT *
FROM test as t
where
cast(t.date + ' ' + t.time as Datetime ) = (select top 1 cast(p.date + ' ' + p.time as Datetime ) as dtime from test as p where t.ID = p.ID order by dtime)
But it doesn't return row number 2 at all. I imagine there's a better way to go about doing this. Any ideas?
You can do this with row_number():
select ID, First, Last, Date, Time
from (select t.*,
row_number() over (partition by id order by date, time) as seqnum
from test t
) t
where seqnum = 1;
Although storing dates and times as strings is not recommended, you at least do it right. The values use the ISO standard format (or close enough) so alphabetic sorting is the same as date/time sorting.
Assuming [Date] and [Time] are the types I think they are, and not strings:
SELECT ID,[First],[Last],[Date],[Time] FROM
(
SELECT ID,[First],[Last],[Date],[Time],rn = ROW_NUMBER()
OVER (PARTITION BY ID ORDER BY [Date], [Time])
FROM dbo.test
) AS t WHERE rn = 1;
Example:
DECLARE @x TABLE
(
ID INT,
[First] VARCHAR(32),
[Last] VARCHAR(32),
[Date] DATE,
[Time] TIME(0)
);
INSERT @x VALUES
(1,'Joe ','Smith','2013-09-06','04:00'),
(1,'Joe ','Smith','2013-09-06','02:00'),
(2,'Jack','Jones',NULL, NULL ),
(3,'John','Jack ','2013-09-05','06:00'),
(3,'John','Jack ','2013-09-15','15:00');
SELECT ID,[First],[Last],[Date],[Time] FROM
(
SELECT ID, [First],[Last],[Date],[Time],rn = ROW_NUMBER()
OVER (PARTITION BY ID ORDER BY [Date], [Time])
FROM @x
) AS x WHERE rn = 1;
Results:
ID First Last Date Time
-- ----- ----- ---------- --------
1 Joe Smith 2013-09-06 02:00:00
2 Jack Jones NULL NULL
3 John Jack 2013-09-05 06:00:00
Try:
SELECT
*
FROM
test as t
WHERE
CAST(t.date + ' ' + t.time as Datetime) =
(
select top 1 cast(p.date + ' ' + p.time as Datetime ) as dtime
from test as p
where t.ID = p.ID
order by dtime
)
OR (t.date='' AND t.time='')
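The reason the ROW_NUMBER approach keeps ID 2 is that every row gets a row number within its partition, even when the date and time are missing, whereas an equality test against a NULL is never true. A minimal sketch with Python's sqlite3 module (SQLite, not SQL Server; window functions need SQLite 3.25 or later). The columns are renamed first_name and last_name for the demo, and the missing values are assumed to be NULL rather than empty strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER, first_name TEXT, last_name TEXT,"
    " date TEXT, time TEXT)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Joe", "Smith", "2013-09-06", "04:00"),
        (1, "Joe", "Smith", "2013-09-06", "02:00"),
        (2, "Jack", "Jones", None, None),  # no date/time recorded
        (3, "John", "Jack", "2013-09-05", "06:00"),
        (3, "John", "Jack", "2013-09-15", "15:00"),
    ],
)

# Earliest row per id; a lone row with NULL date/time is still row number 1.
EARLIEST_SQL = """
    SELECT id, first_name, last_name, date, time
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY date, time) AS seqnum
        FROM test t
    ) AS numbered
    WHERE seqnum = 1
    ORDER BY id
"""

earliest = conn.execute(EARLIEST_SQL).fetchall()
for row in earliest:
    print(row)
# (1, 'Joe', 'Smith', '2013-09-06', '02:00')
# (2, 'Jack', 'Jones', None, None)
# (3, 'John', 'Jack', '2013-09-05', '06:00')
```

If an ID had both dated and undated rows, which one comes first would depend on how the database sorts NULLs, so that case would need an explicit rule.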

Select rows with nearest date

I have a SQL statement.
SELECT ID, LOCATION, CODE,MAX(DATE) FROM TABLE1 WHERE
DATE <= CONVERT(DATETIME,'11-11-2012') AND
EXISTS(SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE =#TEMP_CODE.CODE)
AND ID IN (14,279)
GROUP BY ID, LOCATION, CODE,(DATE)
I need the rows with the nearest date to 11-11-2012, but the query returns all the values. What am I doing wrong? Thanks
ID LOCATION CODE DATE
-------------------------------------------------------------------
14 CAR STREET,UDUPI 234 2012-08-08 00:00:00.000
14 CAR STREET,UDUPI 234 2012-08-10 00:00:00.000
14 CAR STREET,UDUPI 234 2012-08-14 00:00:00.000
279 MADHUGIRI 234 2012-08-08 00:00:00.000
279 MADHUGIRI 234 2012-08-11 00:00:00.000
I need to select the row with the max date. The required result is
ID LOCATION CODE DATE
-------------------------------------------------------------------
14 CAR STREET,UDUPI 234 2012-08-10 00:00:00.000
279 MADHUGIRI 234 2012-08-11 00:00:00.000
Remove (DATE) from the GROUP BY Clause.
Change
SELECT ID, LOCATION, CODE,MAX(DATE) FROM TABLE1 WHERE
DATE <= CONVERT(DATETIME,'11-11-2012') AND
EXISTS(SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE =#TEMP_CODE.CODE)
AND ID IN ('KBL01005','KBL05020')
GROUP BY ID, LOCATION, CODE,(DATE)
to
SELECT ID, LOCATION, CODE,MAX(DATE) FROM TABLE1 WHERE
DATE <= CONVERT(DATETIME,'11-11-2012') AND
EXISTS(SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE =#TEMP_CODE.CODE)
AND ID IN ('KBL01005','KBL05020')
GROUP BY ID, LOCATION, CODE
Try using an unambiguous date format:
SELECT ID, LOCATION, CODE,MAX(DATE) FROM TABLE1
WHERE DATE <= '20121111' AND
EXISTS(SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE =#TEMP_CODE.CODE)
AND ID IN ('KBL01005','KBL05020')
GROUP BY ID, LOCATION, CODE
Also see why you need to use unambiguous date formats: http://beyondrelational.com/modules/2/blogs/70/posts/10898/understanding-datetime-column-part-ii.aspx
No need to use Group by (Date). Try this:
SELECT ID, LOCATION, CODE,MAX(DATE) FROM TABLE1
WHERE DATE <= CONVERT(DATETIME,'11-11-2012') AND
EXISTS(SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE =#TEMP_CODE.CODE)
AND ID IN (14,279)
GROUP BY ID, LOCATION, CODE
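To see the effect of the GROUP BY change, here is a small sketch that runs both variants on the question's sample rows with Python's sqlite3 module. It is SQLite rather than SQL Server, the dates are stored as ISO text, the EXISTS check against #TEMP_CODE is left out, and latest_rows is a hypothetical helper; all of that is simplification for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, location TEXT, code INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (14, "CAR STREET,UDUPI", 234, "2012-08-08"),
        (14, "CAR STREET,UDUPI", 234, "2012-08-10"),
        (14, "CAR STREET,UDUPI", 234, "2012-08-14"),
        (279, "MADHUGIRI", 234, "2012-08-08"),
        (279, "MADHUGIRI", 234, "2012-08-11"),
    ],
)


def latest_rows(group_by_date):
    """Run the MAX(date) query with or without date in the GROUP BY."""
    group_cols = "id, location, code" + (", date" if group_by_date else "")
    query = f"""
        SELECT id, location, code, MAX(date) AS max_date
        FROM table1
        WHERE date <= '2012-11-11' AND id IN (14, 279)
        GROUP BY {group_cols}
        ORDER BY id, max_date
    """
    return conn.execute(query).fetchall()


# Grouping by date makes every distinct date its own group: all rows return.
print(len(latest_rows(group_by_date=True)))
# 5
# Without date in the GROUP BY there is one row per (id, location, code).
print(latest_rows(group_by_date=False))
# [(14, 'CAR STREET,UDUPI', 234, '2012-08-14'), (279, 'MADHUGIRI', 234, '2012-08-11')]
```

On these sample rows the latest date on or before 2012-11-11 for ID 14 is 2012-08-14.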