Select records using max values for two columns - sql

I have a table laid out similar to this. For each distinct vendor number, I need to select the row that has the highest year value and, within that year, the highest month value.
VENDORMONTHLY:
id Vendor Year month More stuff(More columns)
---|---------|-------|-------|---------|
1 | 93000 | 2017 | 3 | sadf |
2 | 93000 | 2017 | 2 | asdf |
5 | 93000 | 2017 | 1 | asdf |
3 | 93000 | 2016 | 12 | fff |
4 | 93000 | 2016 | 11 | ffff |
6 | 40000 | 2017 | 2 | fff |
7 | 40000 | 2017 | 1 | fff |
8 | 40000 | 2016 | 12 | fff |
The result would look like this. I can not for the life of me come up with a query that will give me what I need.
id Vendor Year month More stuff(More columns)
---|---------|-------|-------|---------|
1 | 93000 | 2017 | 3 | sadf |
6 | 40000 | 2017 | 2 | fff |
Any help would be greatly appreciated!

Quick answer: use NOT EXISTS to verify that the same vendor has no other row with a later year, or with the same year but a later month:
select v1.*
from VENDORMONTHLY v1
where not exists (select 1 from VENDORMONTHLY v2
where v2.Vendor = v1.Vendor
and (v2.Year > v1.year
or (v2.Year = v1.Year and v2.Month > v1.Month)))
This returns both rows if two rows tie for the latest.
It uses only Core ANSI SQL-99 features, so it should run on practically any DBMS.
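To check the NOT EXISTS query against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory SQLite database and the trimmed column list (no "More stuff" columns) are assumptions made for illustration, not part of the original setup.

```python
import sqlite3

# In-memory SQLite database standing in for the real DBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE VENDORMONTHLY (id INTEGER, Vendor INTEGER, Year INTEGER, Month INTEGER)"
)
conn.executemany(
    "INSERT INTO VENDORMONTHLY VALUES (?, ?, ?, ?)",
    [
        (1, 93000, 2017, 3), (2, 93000, 2017, 2), (5, 93000, 2017, 1),
        (3, 93000, 2016, 12), (4, 93000, 2016, 11),
        (6, 40000, 2017, 2), (7, 40000, 2017, 1), (8, 40000, 2016, 12),
    ],
)

# Keep a row only if no other row of the same vendor is later in (year, month).
latest = conn.execute("""
    SELECT v1.id, v1.Vendor, v1.Year, v1.Month
    FROM VENDORMONTHLY v1
    WHERE NOT EXISTS (
        SELECT 1
        FROM VENDORMONTHLY v2
        WHERE v2.Vendor = v1.Vendor
          AND (v2.Year > v1.Year
               OR (v2.Year = v1.Year AND v2.Month > v1.Month)))
    ORDER BY v1.id
""").fetchall()

print(latest)  # [(1, 93000, 2017, 3), (6, 40000, 2017, 2)]
```

The output matches the two rows the question asks for (ids 1 and 6).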

If you are using a database that supports window functions (SQL Server, Oracle, Postgres, etc.), you can use rank() (or row_number() if you need exactly one row per vendor even when several rows tie on year and month):
select *
from (
select v.*,
rank() over (
partition by vendor order by year desc,
month desc
) rn
from vendormonthly v
) v
where rn = 1;
In SQL Server, the same can be written more compactly using TOP 1 WITH TIES:
Select top 1 with ties *
From vendormonthly
Order by rank() over (
partition by vendor
order by year desc, month desc
)
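The difference between rank() and row_number() only shows up when rows tie. The sketch below illustrates it with Python's sqlite3 module, assuming a SQLite build with window function support (3.25 or later). The row with id 9 is a hypothetical duplicate added to force a tie; it is not in the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendormonthly (id INTEGER, vendor INTEGER, year INTEGER, month INTEGER)"
)
conn.executemany(
    "INSERT INTO vendormonthly VALUES (?, ?, ?, ?)",
    [
        (1, 93000, 2017, 3),
        (9, 93000, 2017, 3),  # hypothetical duplicate of the latest month, forces a tie
        (2, 93000, 2017, 2),
        (6, 40000, 2017, 2),
        (7, 40000, 2017, 1),
    ],
)

def latest_ids(ranking_function):
    """Return ids ranked first per vendor; pass 'rank()' or 'row_number()'."""
    sql = f"""
        SELECT id
        FROM (
            SELECT v.id,
                   {ranking_function} OVER (
                       PARTITION BY vendor ORDER BY year DESC, month DESC
                   ) AS rn
            FROM vendormonthly v
        ) AS ranked
        WHERE rn = 1
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql)]

with_ties = latest_ids("rank()")             # tied rows all get rank 1
one_per_vendor = latest_ids("row_number()")  # exactly one row per vendor

print(with_ties)            # [1, 6, 9]
print(len(one_per_vendor))  # 2
```

With row_number(), which of the tied rows (id 1 or id 9) survives is not determined unless the ORDER BY includes a tie-breaker such as id.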

Related

How to get last value for each user_id (postgreSQL)

Current ratio of user is his last inserted ratio in table "Ratio History"
user_id | year | month | ratio
For example if user with ID 1 has two rows
1 | 2019 | 2 | 10
1 | 2019 | 3 | 15
his ratio is 15.
Here is a sample from the development table:
user_id | year | month | ratio
1 | 2018 | 7 | 10
2 | 2018 | 8 | 20
3 | 2018 | 8 | 30
1 | 2019 | 1 | 40
2 | 2019 | 2 | 50
3 | 2018 | 10 | 60
2 | 2019 | 3 | 70
I need a query which will select grouped rows by user_id and their last ratio.
As a result of the request, the following entries should be selected
user_id | year | month | ratio
1 | 2019 | 1 | 40
2 | 2019 | 3 | 70
3 | 2018 | 10 | 60
I tried to use this query:
select rh1.user_id, ratio, rh1.year, rh1.month from ratio_history rh1
join (
select user_id, max(year) as maxYear, max(month) as maxMonth
from ratio_history group by user_id
) rh2 on rh1.user_id = rh2.user_id and rh1.year = rh2.maxYear and rh1.month = rh2.maxMonth
but I got only one row.
Use distinct on:
select distinct on (user_id) rh.*
from ratio_history rh
order by user_id, year desc, month desc;
distinct on is a very convenient Postgres extension. It returns one row for each combination of the key values in parentheses. Which row? The first one according to the sort criteria. Note that the sort criteria need to start with the expressions in parentheses.
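It may help to see why the attempted join returned only one row: max(year) and max(month) are computed independently, so the pair (max year, max month) usually does not correspond to any real row. The sketch below reproduces that with Python's sqlite3 module and shows one portable workaround, comparing a combined year*100 + month key. SQLite has no distinct on, so the workaround is a swapped-in technique and not the answer's method; the alias max_period is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratio_history (user_id INTEGER, year INTEGER, month INTEGER, ratio INTEGER)"
)
conn.executemany(
    "INSERT INTO ratio_history VALUES (?, ?, ?, ?)",
    [
        (1, 2018, 7, 10), (2, 2018, 8, 20), (3, 2018, 8, 30),
        (1, 2019, 1, 40), (2, 2019, 2, 50), (3, 2018, 10, 60),
        (2, 2019, 3, 70),
    ],
)

# The attempted query: for user 1 it looks for (2019, 7), which does not exist.
independent_max = conn.execute("""
    SELECT rh1.user_id, rh1.year, rh1.month, rh1.ratio
    FROM ratio_history rh1
    JOIN (SELECT user_id, MAX(year) AS max_year, MAX(month) AS max_month
          FROM ratio_history
          GROUP BY user_id) rh2
      ON rh1.user_id = rh2.user_id
     AND rh1.year = rh2.max_year
     AND rh1.month = rh2.max_month
    ORDER BY rh1.user_id
""").fetchall()

# Workaround: take the maximum of a single combined key per user.
combined_max = conn.execute("""
    SELECT rh1.user_id, rh1.year, rh1.month, rh1.ratio
    FROM ratio_history rh1
    JOIN (SELECT user_id, MAX(year * 100 + month) AS max_period
          FROM ratio_history
          GROUP BY user_id) rh2
      ON rh1.user_id = rh2.user_id
     AND rh1.year * 100 + rh1.month = rh2.max_period
    ORDER BY rh1.user_id
""").fetchall()

print(independent_max)  # [(3, 2018, 10, 60)]
print(combined_max)     # [(1, 2019, 1, 40), (2, 2019, 3, 70), (3, 2018, 10, 60)]
```

Only user 3 happens to have a row matching both independent maxima, which explains the single row in the question.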

Is it possible to find the MAX value of an already aggregated calculation inside the same view?

I have created a calculation in Microsoft SQL Server Management Studio that creates a running total per company and quarter, but at a monthly level and this part works fine.
So if company X sold 40 apples, hypothetically, in Jan and then 60 in Feb, then the running total in Feb would be 100 and if they sold 30 in March, then March's running total would be 130 and then in April it would reset for the new quarter.
What I need now is to find the MAX of these values, per month across all companies. So if Company 'X' sold 100 in Feb, but Company 'Y' sold 150, I want to return 150.
The calculation I use to get the rolling values per quarter calls on two functions to calculate the quarter each month falls into, as well as the relevant Fiscal Period / year ('GetQuarter' and 'GetFiscalPeriod' being the functions).
So my question is, is there any way to find the max at a different level of detail (in this case across ALL Companies) when the value you are looking at is already aggregated at Company level?
I'm told Stored Procedures would make this a lot simpler but the software I use can't call on Stored Procedures, only views and tables.
SELECT
cm.Company_Code,
cm.[Date],
cm.Measure,
SUM(cm.Actual) OVER (
PARTITION BY (
SELECT dbo.GetQuarter(SUBSTRING(cm.[Date], 5, 2))),
cm.Measure,
cm.Company_Code,
(LEFT((SELECT dbo.GetFiscalPeriod(cm.[Date])), 4))
ORDER BY cm.[Date]
) AS Current_QTD_Actual
FROM mytable cm
Desired Output would look like the "MAX" field below:
+--------------+--------+-----+-----+----------+---------+-----+------------+
| Company_Code | Actual | QTD | MAX | Date | Measure | QTR | FiscalYear |
| AAA | 40 | 40 | 40 | 20180701 | Bananas | Q1 | 2019 |
| BBB | 35 | 35 | 40 | 20180701 | Bananas | Q1 | 2019 |
| AAA | 60 | 100 | 105 | 20180801 | Bananas | Q1 | 2019 |
| BBB | 70 | 105 | 105 | 20180801 | Bananas | Q1 | 2019 |
| AAA | 30 | 130 | 150 | 20180901 | Bananas | Q1 | 2019 |
| BBB | 45 | 150 | 150 | 20180901 | Bananas | Q1 | 2019 |
| AAA | 25 | 25 | 45 | 20181001 | Bananas | Q2 | 2019 |
| BBB | 45 | 45 | 45 | 20181001 | Bananas | Q2 | 2019 |
| AAA | 30 | 55 | 85 | 20181101 | Bananas | Q2 | 2019 |
| BBB | 40 | 85 | 85 | 20181101 | Bananas | Q2 | 2019 |
+--------------+--------+-----+-----+----------+---------+-----+------------+
As the QTD calculation I currently have is already a rolled up SUM, simply wrapping this in a MAX function does not work for obvious reasons.
I tried creating a temporary table within the calculation using examples I've seen online, which I then call back into the original table and max that value but I think my syntax is wrong because it never comes out right (I'm still a novice so temporary table syntaxes still elude me quite a bit).
You seem to want the cumulative sum of the maximum values for each month. If this is correct, you can use two levels of window functions:
select measure, fiscalyear, qtr, date, actual,
sum(actual) over (partition by measure, fiscalyear, qtr order by date) as running_actual
from (select t.*,
row_number() over (partition by measure, date order by actual desc) as seqnum
from t
) t
where seqnum = 1;
You can't stack aggregates in the same SELECT, with the one exception of applying a windowed aggregate (with an OVER clause) over a regular aggregate. For example:
SELECT
T.GroupedColumn,
RowsByGroup = COUNT(*), -- Regular aggregate
SumOfAllRows = SUM(COUNT(*)) OVER () -- Windowed aggregate of a regular one
FROM
MyTable AS T
GROUP BY
T.GroupedColumn
You can, however, apply them if you wrap the former in a subquery or CTE, which also makes the query more readable IMO. I believe you are looking for something like the following:
;WITH RunningSumPerQuarterPerCompany AS
(
SELECT
cm.Company_Code,
cm.[Date],
cm.Measure,
Current_QTD_Actual = SUM(cm.Actual) OVER (
PARTITION BY
dbo.GetQuarter(SUBSTRING(cm.[Date], 5, 2)),
cm.Measure,
cm.Company_Code,
LEFT(dbo.GetFiscalPeriod(cm.[Date]), 4)
ORDER BY
cm.[Date]),
-- Add additional PARTITION BY columns for the GROUP BY later on
Quarter = dbo.GetQuarter(SUBSTRING(cm.[Date], 5, 2)),
FiscalPeriod = LEFT(dbo.GetFiscalPeriod(cm.[Date]), 4)
FROM
mytable cm
),
MaxRunningSumPerQuarter AS
(
SELECT
R.Quarter,
R.FiscalPeriod,
Max_Current_QTD_Actual = MAX(R.Current_QTD_Actual)
FROM
RunningSumPerQuarterPerCompany AS R
GROUP BY
R.Quarter,
R.FiscalPeriod -- GROUP BY whichever dimension you need
)
SELECT
R.*,
M.Max_Current_QTD_Actual
FROM
RunningSumPerQuarterPerCompany AS R
LEFT JOIN MaxRunningSumPerQuarter AS M ON
R.Quarter = M.Quarter AND
R.FiscalPeriod = M.FiscalPeriod -- Join by the GROUP BY columns to display the MAX
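If the maximum is needed per month rather than per quarter, as in the desired output table, one option is to put the running sum in a derived table and apply a windowed MAX partitioned by measure and date on top of it. The following is a sketch under assumptions: it runs on SQLite (3.25 or later) through Python's sqlite3 module, the table name sales is hypothetical, and the qtr and fiscal_year columns are stored directly in place of the dbo.GetQuarter and dbo.GetFiscalPeriod calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# qtr and fiscal_year are precomputed here; in the original they come from
# dbo.GetQuarter and dbo.GetFiscalPeriod (assumption for this sketch).
conn.execute("""
    CREATE TABLE sales (company_code TEXT, actual INTEGER, date TEXT,
                        measure TEXT, qtr TEXT, fiscal_year INTEGER)
""")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, 'Bananas', ?, 2019)",
    [
        ("AAA", 40, "20180701", "Q1"), ("BBB", 35, "20180701", "Q1"),
        ("AAA", 60, "20180801", "Q1"), ("BBB", 70, "20180801", "Q1"),
        ("AAA", 30, "20180901", "Q1"), ("BBB", 45, "20180901", "Q1"),
        ("AAA", 25, "20181001", "Q2"), ("BBB", 45, "20181001", "Q2"),
        ("AAA", 30, "20181101", "Q2"), ("BBB", 40, "20181101", "Q2"),
    ],
)

result = conn.execute("""
    SELECT company_code, date, qtd,
           -- highest running total across all companies for this month
           MAX(qtd) OVER (PARTITION BY measure, date) AS max_qtd
    FROM (
        SELECT company_code, date, measure,
               -- quarter-to-date running total per company
               SUM(actual) OVER (
                   PARTITION BY company_code, measure, fiscal_year, qtr
                   ORDER BY date
               ) AS qtd
        FROM sales
    ) AS running
    ORDER BY date, company_code
""").fetchall()

for row in result:
    print(row)  # first two rows: ('AAA', '20180701', 40, 40), ('BBB', '20180701', 35, 40)
```

Because both levels are window functions, no GROUP BY or join back to the running-sum query is needed, and the query can still live inside a view.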

SQL query to select today and previous day's price

I have historic stock price data that looks like the below. I want to generate a new table that has one row for each ticker with the most recent day's price and its previous day's price. What would be the best way to do this? My database is Postgres.
+---------+------------+------------+
| ticker | price | date |
+---------+------------+------------+
| AAPL | 6 | 10-23-2015 |
| AAPL | 5 | 10-22-2015 |
| AAPL | 4 | 10-21-2015 |
| AXP | 5 | 10-23-2015 |
| AXP | 3 | 10-22-2015 |
| AXP | 5 | 10-21-2015 |
+---------+------------+------------+
You can do something like this:
with ranking as (
select ticker, price, dt,
rank() over (partition by ticker order by dt desc) as rank
from stocks
)
select * from ranking where rank in (1,2);
Example: http://sqlfiddle.com/#!15/e45ea/3
Results for your example will look like this:
| ticker | price | dt | rank |
|--------|-------|---------------------------|------|
| AAPL | 6 | October, 23 2015 00:00:00 | 1 |
| AAPL | 5 | October, 22 2015 00:00:00 | 2 |
| AXP | 5 | October, 23 2015 00:00:00 | 1 |
| AXP | 3 | October, 22 2015 00:00:00 | 2 |
If your table is large and you run into performance issues, use a WHERE clause to restrict the data to the last 30 days or so.
Best bet is to use a window function with an aggregated case statement which is used to create a pivot on the data.
You can see more on window functions here: http://www.postgresql.org/docs/current/static/tutorial-window.html
Below is a pseudocode version of where you may need to head to answer your question (sorry, I couldn't validate it because I don't have a Postgres database set up).
Select
ticker,
SUM(CASE WHEN rank = 1 THEN price ELSE 0 END) today,
SUM(CASE WHEN rank = 2 THEN price ELSE 0 END) yesterday
FROM (
SELECT
ticker,
price,
date,
rank() OVER (PARTITION BY ticker ORDER BY date DESC) as rank
FROM your_table) p
WHERE rank in (1,2)
GROUP BY ticker;
Edit - Updated the case statement with an 'else'
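Since the pivot above was not validated, here is a small runnable sketch of the same idea using Python's sqlite3 module (SQLite 3.25 or later for window functions). Two details differ from the pseudocode and are deliberate substitutions: row_number() replaces rank() so that duplicate dates cannot produce two rows with the same rank, and MAX without an ELSE replaces SUM, so a ticker with only one day of data gets NULL instead of 0 for yesterday. The table name stocks and ISO-formatted dates are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (ticker TEXT, price INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO stocks VALUES (?, ?, ?)",
    [
        ("AAPL", 6, "2015-10-23"), ("AAPL", 5, "2015-10-22"), ("AAPL", 4, "2015-10-21"),
        ("AXP", 5, "2015-10-23"), ("AXP", 3, "2015-10-22"), ("AXP", 5, "2015-10-21"),
    ],
)

# Number each ticker's rows from newest to oldest, then pivot rows 1 and 2
# into two columns with conditional aggregation.
latest_two = conn.execute("""
    SELECT ticker,
           MAX(CASE WHEN rn = 1 THEN price END) AS today,
           MAX(CASE WHEN rn = 2 THEN price END) AS yesterday
    FROM (
        SELECT ticker, price,
               row_number() OVER (PARTITION BY ticker ORDER BY dt DESC) AS rn
        FROM stocks
    ) AS p
    WHERE rn IN (1, 2)
    GROUP BY ticker
    ORDER BY ticker
""").fetchall()

print(latest_two)  # [('AAPL', 6, 5), ('AXP', 5, 3)]
```

This gives one row per ticker with the most recent price and the one before it, which is the shape the question asks for.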

Using ORDER BY and getting most recent version of records

I have a database structured in the following way
ID | DATE | col_0 |
--------------------------
1 | 2014 | A_Ver2_data0 |
2 | 2014 | A_Ver2_data1 |
3 | 2014 | A_Ver2_data2 |
4 | 2013 | A_Ver1_data0 |
5 | 2013 | A_Ver1_data1 |
6 | 2012 | A_Ver0_data0 |
7 | 2012 | A_Ver0_data1 |
8 | 2013 | B_Ver3_data0 |
9 | 2013 | B_Ver3_data1 |
10 | 2013 | B_Ver3_data2 |
11 | 2010 | B_Ver2_data0 |
12 | 2010 | B_Ver2_data1 |
13 | 2009 | B_Ver1_data0 |
14 | 2007 | B_Ver0_data0 |
I need to write a query that will return the most recent version of the A_ and B_ prefixed data sets. I was thinking of something like SELECT * FROM db.table ORDER BY DATE DESC, but I want to filter out expired versions. The desired output should be:
ID | DATE | col_0 |
--------------------------
1 | 2014 | A_Ver2_data0 |
2 | 2014 | A_Ver2_data1 |
3 | 2014 | A_Ver2_data2 |
8 | 2013 | B_Ver3_data0 |
9 | 2013 | B_Ver3_data1 |
10 | 2013 | B_Ver3_data2 |
Any Ideas?
I think this does what you want. It parses the column to get the first and last parts and then finds the maximum "DATE" for each. It returns the row that matches the date:
select id, "DATE", COL_A
from (select v.*,
max("DATE") over (partition by substr(col_A, 1, 1),
substr(col_A, 8)
) as maxdate
from versiones v
) v
where "DATE" = maxdate;
The SQL Fiddle is here.
I am not sure, but I think this would work: "HAVING date >= MAX(date)-1"
MAX(date)-1 returns 2014-1 = 2013, which filters the results to date >= 2013.
But this would list all the 2013 and 2014 entries.
You could use an analytic function to get the maximum version, and then select the corresponding records, as below:
SELECT
*
FROM
db.table
WHERE
Col_0 IN
(
Select Distinct
Max(Col_0) Over (Partition By Replace(Col_0, Replace(Regexp_Substr(Col_0, '_[^,]+_'), '_', ''), '')
Order By REPLACE(Regexp_Substr(Col_0, '_[^,]+_'), '_', '') DESC) AS Col_0
FROM
db.table
);
Also, please note that you would not be able to name a column DATE without quoting it, because DATE is a reserved word.
Here is my answer:
select * from
versiones
where SUBSTR(COL_A,0,6)
in(
select version from
(
select SUBSTR(COL_A,0,1) letra,max(SUBSTR(COL_A,6,1)) maximo,
SUBSTR(COL_A,0,1)||'_Ver'||max(SUBSTR(COL_A,6,1)) version
from versiones
group by SUBSTR(COL_A,0,1)
)
cs
)
Sqlfiddle: http://sqlfiddle.com/#!4/84a8f/13
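The answers above all rely on parsing the prefix letter and the version digit out of the column value. A compact way to see that idea in action is the sketch below, which uses Python's sqlite3 module and a correlated subquery in place of the analytic functions. It assumes, as the substring-based answers do, a one-character prefix at position 1 and a single-digit version at position 6; the table name versions is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE versions (id INTEGER, "DATE" INTEGER, col_0 TEXT)')
conn.executemany(
    "INSERT INTO versions VALUES (?, ?, ?)",
    [
        (1, 2014, "A_Ver2_data0"), (2, 2014, "A_Ver2_data1"), (3, 2014, "A_Ver2_data2"),
        (4, 2013, "A_Ver1_data0"), (5, 2013, "A_Ver1_data1"),
        (6, 2012, "A_Ver0_data0"), (7, 2012, "A_Ver0_data1"),
        (8, 2013, "B_Ver3_data0"), (9, 2013, "B_Ver3_data1"), (10, 2013, "B_Ver3_data2"),
        (11, 2010, "B_Ver2_data0"), (12, 2010, "B_Ver2_data1"),
        (13, 2009, "B_Ver1_data0"), (14, 2007, "B_Ver0_data0"),
    ],
)

# substr(col_0, 1, 1) is the prefix letter, substr(col_0, 6, 1) the version digit.
# Keep a row only if its version digit is the highest one for its prefix.
current_ids = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM versions a
    WHERE substr(a.col_0, 6, 1) = (
        SELECT MAX(substr(b.col_0, 6, 1))
        FROM versions b
        WHERE substr(b.col_0, 1, 1) = substr(a.col_0, 1, 1))
    ORDER BY a.id
""")]

print(current_ids)  # [1, 2, 3, 8, 9, 10]
```

Versions with two or more digits would need a different parse, since comparing digits as text stops working past Ver9.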

sql for finding most recent record in a group

I have a table like this:
table_id | series_id | revision_id | year
------------------------------------------
1 | 1 | 1 | 2010
2 | 2 | 1 | 2009
3 | 2 | 2 | 2008
4 | 2 | 2 | 2009
5 | 2 | 3 | 2010
6 | 2 | 3 | 2008
7 | 3 | 2 | 2007
8 | 3 | 3 | 2010
9 | 3 | 3 | 2010
I need to find the table_id for the max(year) when grouped by revision_id when series_id = X
in PostgreSQL.
E.g., when X = 2 I expect this result:
table_id | series_id | revision_id | year
------------------------------------------
2 | 2 | 1 | 2009
4 | 2 | 2 | 2009
5 | 2 | 3 | 2010
This doesn't work:
SELECT * from table
WHERE series_id = 2
AND table_id IN (
SELECT table_id
FROM table
WHERE series_id = 2
GROUP by revision
ORDER BY year DESC
)
I cannot figure out a way to do this in PostgreSQL since I need to return a field I am not grouping by.
Here are some similar problems in other SQL flavors:
MySQL
SQL Query, Selecting 5 most recent in each group
SQL SERVER
SQL Server - How to select the most recent record per user?
Query:
SELECT table_id, series_id, revision_id, year
FROM tableName t INNER JOIN
(SELECT revision_id, max(year) AS year
FROM tableName
WHERE series_id = 2
GROUP BY revision_id) s
USING (revision_id, year)
WHERE series_id = 2;
Result:
table_id | series_id | revision_id | year
----------+-----------+-------------+------
2 | 2 | 1 | 2009
4 | 2 | 2 | 2009
5 | 2 | 3 | 2010
(3 rows)
Hmm...
Try this:
SELECT *
FROM table as a
WHERE series_id = ?
AND year = (SELECT MAX(year)
FROM table as b
WHERE a.series_id = b.series_id
AND a.revision_id = b.revision_id)
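The correlated subquery above can be tried against the question's data with Python's sqlite3 module, as in the sketch below. The table name revisions is hypothetical (the question calls it table, which is a reserved word), and SQLite stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revisions (table_id INTEGER, series_id INTEGER, revision_id INTEGER, year INTEGER)"
)
conn.executemany(
    "INSERT INTO revisions VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 2010), (2, 2, 1, 2009), (3, 2, 2, 2008),
        (4, 2, 2, 2009), (5, 2, 3, 2010), (6, 2, 3, 2008),
        (7, 3, 2, 2007), (8, 3, 3, 2010), (9, 3, 3, 2010),
    ],
)

def latest_per_revision(series_id):
    """Return table_ids whose year is the maximum for their (series, revision)."""
    sql = """
        SELECT a.table_id
        FROM revisions AS a
        WHERE a.series_id = ?
          AND a.year = (SELECT MAX(b.year)
                        FROM revisions AS b
                        WHERE a.series_id = b.series_id
                          AND a.revision_id = b.revision_id)
        ORDER BY a.table_id
    """
    return [row[0] for row in conn.execute(sql, (series_id,))]

print(latest_per_revision(2))  # [2, 4, 5]
print(latest_per_revision(3))  # [7, 8, 9]
```

For series 3, both rows of revision 3 share the year 2010, so both are returned; an extra tie-breaker would be needed if exactly one row per revision is required.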