I have a SQL Server table like this:
http://sqlfiddle.com/#!2/a15dd/1
What I want to do is display the latest year and month where trades were made.
In this case, I want to display:
ID: 1
Year: 2013
Month: 11
Trades: 2
I've tried to use:
select
id, MAX(year), MAX(month)
from
ExampleTable
where
trades > 0
group by
id
Do I have to concatenate the columns?
You can use ROW_NUMBER to assign each row a number based on its relative position (as defined by your ORDER BY):
SELECT ID,
Year,
Month,
Trades,
RowNum = ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Year DESC, Month DESC)
FROM ExampleTable
WHERE Trades > 0;
With your example data this gives:
ID YEAR MONTH TRADES RowNum
1 2013 11 2 1
1 2013 4 42 2
Then you can just limit this to where RowNum is 1:
SELECT ID, Year, Month, Trades
FROM ( SELECT ID,
Year,
Month,
Trades,
RowNum = ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Year DESC, Month DESC)
FROM ExampleTable
WHERE Trades > 0
) AS t
WHERE t.RowNum = 1;
If, as in your example, Year and Month are stored as VARCHAR you will need to convert to an INT before ordering:
RowNum = ROW_NUMBER() OVER(PARTITION BY ID
ORDER BY
CAST(Year AS INT) DESC,
CAST(Month AS INT) DESC)
Example on SQL Fiddle
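To see why the cast matters, here is a small Python sketch (not SQL Server code). It takes the two rows from the output above and holds year and month as strings, on the assumption that the columns are VARCHAR. Compared as text, "4" sorts after "11", so the wrong month wins unless the values are converted first.

```python
# Year and month as strings, mimicking VARCHAR columns (an assumption)
rows = [("2013", "4"), ("2013", "11")]

# Text comparison goes character by character, so "4" > "11"
latest_as_text = max(rows)

# Converting to integers first gives the chronological order
latest_as_int = max(rows, key=lambda r: (int(r[0]), int(r[1])))

print(latest_as_text)  # ('2013', '4')
print(latest_as_int)   # ('2013', '11')
```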
If you are only bothered about records where ID is 1, you can do it simply using TOP:
SELECT TOP 1 ID, Year, Month, Trades
FROM ExampleTable
WHERE ID = 1
AND Trades > 0
ORDER BY CAST(Year AS INT) DESC, CAST(MONTH AS INT) DESC;
Why store "year" and "month" as separate columns? In any case, the basic logic is to combine the two values to get the latest one. This is awkward because you are storing numbers as strings and the months are not zero-padded. But it is not so hard:
select id,
max(year + right('00' + month, 2))
from ExampleTable
where trades > 0
group by id;
To separate them out:
select id,
left(max(year + right('00' + month, 2)), 4) as year,
right(max(year + right('00' + month, 2)), 2) as month
from ExampleTable
where trades > 0
group by id;
Here is a SQL Fiddle. Note when you use SQL Fiddle that you should set the database to the correct database.
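The zero-padding trick can be sketched outside SQL as well. The Python snippet below is only an illustration with made-up rows: the helper name sort_key is hypothetical, and it mirrors year + right('00' + month, 2) so that a plain string maximum picks the latest period.

```python
def sort_key(year: str, month: str) -> str:
    # Same idea as year + right('00' + month, 2) in the query above
    return year + ("00" + month)[-2:]

# Hypothetical rows with year and month stored as strings
rows = [("2013", "4"), ("2013", "11"), ("2012", "12")]
latest = max(sort_key(y, m) for y, m in rows)

# Split the combined key back into its parts, like left(..., 4) and right(..., 2)
year, month = latest[:4], latest[-2:]
print(latest, year, month)  # 201311 2013 11
```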
I'm not sure whether I get your question right, but shouldn't the following work?
SELECT TOP 1 year + '-' + month AS Last, trades
FROM ExampleTable
WHERE CAST(trades AS INTEGER) > 0
ORDER BY CAST(year AS integer) DESC, CAST(month AS integer) DESC
SQLFiddle
Try this
SELECT TOP 1 ID, [year], trades,
MAX(Convert(INT,[month])) OVER(PARTITION BY [year]) AS [Month]
FROM ExampleTable
WHERE trades > 0
Code to get the test data:
create table SalesCalls
(
EmpId INT NOT NULL,
EmpName nvarchar(20),
month INT,
Year INT,
CallsMade INT
)
GO
Insert into SalesCalls values
(1,'ABC',12,2018,10),
(1,'ABC',1,2019,15),
(1,'ABC',2,2019,20),
(2,'DEF',12,2018,12),
(2,'DEF',1,2019,14),
(2,'DEF',2,2019,26)
GO
The objective is to compare an employee's sales calls in the current month with the same employee's calls in the previous month and find the percentage change. I achieved that using the query below:
With SalesCTE as
(
select EmpId,EmpName,
Month As CurrentMonth,
Year as CurrentMonthYear,
Case When month = 1 then 12 Else (Month-1) End AS PrevMonth,
Case when month = 1 then (Year - 1) Else Year End As PrevMonthYear,
CallsMade
from SalesCalls
)
select
S1.EmpId, S1.EmpName, S1.CurrentMonth, S1.CurrentMonthYear, S1.CallsMade as CurrentMonthCalls,
S2.CurrentMonth as PrevMonth,
S2.CurrentMonthYear as PrevMonthYear,
S2.CallsMade as PrevMonthCalls,
( CONVERT(numeric(5,2),S1.CallsMade) / S2.CallsMade) * 100 As PercentageChange
from SalesCTE S1
JOIN SalesCTE S2 ON S1.EmpId = S2.EmpId
AND S1.PrevMonth = S2.CurrentMonth
AND S1.PrevMonthYear = S2.CurrentMonthYear
ORDER BY S1.EmpId, S1.CurrentMonth, S1.CurrentMonthYear
The above query worked as long as there were no duplicate records for an employee in the same month.
Later, however, data started coming in from multiple sources, so the table can now hold multiple records for the same employee and month. These records are still valid, because the employee could be making calls in different ways. As an example, the following record is inserted into the table:
Insert into SalesCalls values
(1,'ABC',1,2019,1)
With this row in place, the query above no longer compares the current month's SalesCalls with the previous month's correctly.
Phase 2 of the use case:
To fix this, I have built an intermediate temp table that contains the aggregated data. The query used is:
Select EmpId, EmpName, month, Year, SUM(CallsMade) as CallsMade
into #SalesCalls
from SalesCalls
group by EmpId, EmpName, month, Year
Now the SalesCalls table inside the CTE is replaced with #SalesCalls and then the above query works fine.
But this #SalesCalls table needs to be dropped and recreated every time to see the latest comparison data.
The question is: is it possible to get the comparison data using a single query, with no intermediate temp tables or views?
Just use window functions:
select EmpId, EmpName, month, Year,
sum(CallsMade) as CallsMade,
(case when lag(year * 12 + month) over (partition by empId order by year, month) = year * 12 + month - 1
then lag(sum(callsMade)) over (partition by empId order by year, month)
end) as prevMonthCalls,
(case when lag(year * 12 + month) over (partition by empId order by year, month) = year * 12 + month - 1
then sum(callsMade) * 100.0 / lag(sum(callsMade)) over (partition by empId order by year, month)
end) as percentageChange
from SalesCalls
group by EmpId, EmpName, month, Year;
No joins, CTEs, subqueries, or temporary tables are needed at all.
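The logic behind the query (aggregate per month, then look at the directly preceding month only) can be traced in a short Python sketch. It uses the sample rows from the question, including the extra January 2019 row; the variable names are illustrative, and the percentage is the same ratio times 100 as in the SQL.

```python
from collections import defaultdict

# Sample rows from the question: (EmpId, month, Year, CallsMade),
# including the extra January 2019 row for employee 1
rows = [
    (1, 12, 2018, 10), (1, 1, 2019, 15), (1, 2, 2019, 20),
    (2, 12, 2018, 12), (2, 1, 2019, 14), (2, 2, 2019, 26),
    (1, 1, 2019, 1),
]

# Step 1: aggregate per employee and month (the GROUP BY);
# year * 12 + month gives consecutive integers for consecutive months
totals = defaultdict(int)
for emp, month, year, calls in rows:
    totals[(emp, year * 12 + month)] += calls

# Step 2: look up the directly preceding month (the guarded LAG)
report = {}
for (emp, idx), calls in sorted(totals.items()):
    prev = totals.get((emp, idx - 1))  # None when that month has no rows
    pct = calls * 100.0 / prev if prev else None
    report[(emp, idx)] = (calls, prev, pct)

print(report[(1, 2019 * 12 + 1)])  # (16, 10, 160.0)
```

Because December 2018 and January 2019 map to adjacent integers, the year boundary needs no special case.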
One of the simplest solutions:
select EmpId, EmpName, month, Year,
sum(CallsMade) as CallsMade,
lag(sum(callsMade)) over (partition by empId order by year, month) AS prevMonthCalls,
sum(CallsMade) * 100.0 / lag(sum(CallsMade)) over (partition by empId order by year, month) as PercentageChange
from SalesCalls
group by EmpId, EmpName, month, Year
order by EmpId, Year,month;
tbl_totalMonth has id, time, date and kwh columns.
I want to get the last recorded reading of each month, grouped by month, so the result would be the month name and the kwh value.
The result should look something like this:
month | kwh
------------
January | 150
February | 400
The query I tried (it returns the max kwh, not the last kwh recorded):
SELECT DATENAME(MONTH, a.date) as monthly, max(a.kwh) as kwh
from tbl_totalMonth a
WHERE date >= DATEADD(yy,DATEDIFF(yy,0, GETDATE() -1 ),0)
group by DATENAME(MONTH, a.date)
I suspect you need something quite different:
select *
from (
select *
, row_number() over(partition by month(a.date), year(a.date) order by a.date DESC) as rn
from tbl_totalMonth a
WHERE date >= DATEADD(yy,DATEDIFF(yy,0, GETDATE() -1 ),0)
) d
where rn = 1
To get "the last kwh recorded (per month)" you need to use row_number() which - per month - will order the rows (descending) and give each one a row number. When that number is 1 you have "the most recent" row for that month, and you won't need group by at all.
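The difference between "max kwh" and "last kwh recorded" is easy to see in a small Python sketch. The readings below are hypothetical (the question does not show its data); the loop keeps the latest-dated row per month, which is the row that rn = 1 selects.

```python
from datetime import date

# Hypothetical readings: (date, kwh); kwh does not always grow
readings = [
    (date(2024, 1, 5), 180),
    (date(2024, 1, 31), 150),
    (date(2024, 2, 10), 90),
    (date(2024, 2, 28), 400),
]

# Keep the row with the latest date in each (year, month)
last_per_month = {}
for day, kwh in readings:
    key = (day.year, day.month)
    if key not in last_per_month or day > last_per_month[key][0]:
        last_per_month[key] = (day, kwh)

result = {key: kwh for key, (day, kwh) in last_per_month.items()}
print(result)  # {(2024, 1): 150, (2024, 2): 400}
```

January yields 150 rather than the maximum of 180, because only the date decides which row is kept.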
You could use GROUP BY with the month:
select datename(month, date), sum(kwh)
from tbl_totalMonth
where date = (select max(date) from tbl_totalMonth )
group by datename(month, date)
If you need only the last row for each month, then you should use:
select datename(month, date), kwh
from tbl_totalMonth a
inner join (
select max(date) as max_date
from tbl_totalMonth
group by month(date)) t on t.max_date = a.date
I am trying to count unique users on a monthly basis that were not present in the previous month. So if a user has a record for January and then another one for February, then I would only count January for that user.
user_id time
a1 1/2/17
a1 2/10/17
a2 2/18/17
a4 2/5/17
a5 3/25/17
My results should look like this
Month User Count
January 1
February 2
March 1
I'm not really familiar with BigQuery, but here's how I would solve the problem using TSQL. I imagine that you'd be able to use similar logic in BigQuery.
1). Order the data by user_id first, and then time. In TSQL, you can accomplish this with the following and store it in a common table expression, which you will query in the step after this.
;WITH cte AS
(
select ROW_NUMBER() OVER (PARTITION BY [user_id] ORDER BY [time]) AS rn,*
from dbo.employees
)
2). Next query for only the rows with rn = 1 (the first occurrence for a particular user) and group by the month.
select DATENAME(month, [time]) AS [Month], count(*) AS user_count
from cte
where rn = 1
group by DATENAME(month, [time])
This is assuming that 2017 is the only year you're dealing with. If you're dealing with more than one year, you probably want step #2 to look something like this:
select year([time]) as [year], DATENAME(month, [time]) AS [month],
count(*) AS user_count
from cte
where rn = 1
group by year([time]), DATENAME(month, [time])
First aggregate by the user id and the month. Then use lag() to see if the user was present in the previous month:
with du as (
select date_trunc(time, month) as yyyymm, user_id
from t
group by date_trunc(time, month), user_id
)
select yyyymm, count(*)
from (select du.*,
lag(yyyymm) over (partition by user_id order by yyyymm) as prev_yyyymm
from du
) du
where prev_yyyymm is null or
prev_yyyymm < date_sub(yyyymm, interval 1 month)
group by yyyymm;
Note: This uses the date functions, but similar functions exist for timestamp.
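As a cross-check of the "not present in the previous month" rule, here is a minimal Python sketch over the five sample rows from the question. It is not BigQuery code; the month index year * 12 + month is just a convenient way to test adjacency.

```python
from datetime import date

# Sample data from the question: (user_id, time)
events = [
    ("a1", date(2017, 1, 2)),
    ("a1", date(2017, 2, 10)),
    ("a2", date(2017, 2, 18)),
    ("a4", date(2017, 2, 5)),
    ("a5", date(2017, 3, 25)),
]

# One entry per user and month, like the GROUP BY in the CTE
active = {(user, d.year * 12 + d.month) for user, d in events}

# Count a user in a month only if the same user has no row in the month before
counts = {}
for user, idx in active:
    if (user, idx - 1) not in active:
        counts[idx] = counts.get(idx, 0) + 1

# Turn the month index back into (year, month)
result = {((idx - 1) // 12, (idx - 1) % 12 + 1): n for idx, n in sorted(counts.items())}
print(result)  # {(2017, 1): 1, (2017, 2): 2, (2017, 3): 1}
```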
As I understand the question, a user should be excluded from a given month's count only if the same user was present in the previous month. If the user was present several months earlier, but not in the immediately preceding month, the user should be counted.
If this is correct, try the following for BigQuery Standard SQL:
#standardSQL
SELECT Year, Month, COUNT(DISTINCT user_id) AS User_Count
FROM (
SELECT *,
DATE_DIFF(time, LAG(time) OVER(PARTITION BY user_id ORDER BY time), MONTH) AS flag
FROM (
SELECT
user_id,
DATE_TRUNC(PARSE_DATE('%x', time), MONTH) AS time,
EXTRACT(YEAR FROM PARSE_DATE('%x', time)) AS Year,
FORMAT_DATE('%B', PARSE_DATE('%x', time)) AS Month
FROM yourTable
GROUP BY 1, 2, 3, 4
)
)
WHERE IFNULL(flag, 0) <> 1
GROUP BY Year, Month, time
ORDER BY time
You can test and play with the above using the example below, which uses the dummy data from your question:
#standardSQL
WITH yourTable AS (
SELECT 'a1' AS user_id, '1/2/17' AS time UNION ALL
SELECT 'a1', '2/10/17' UNION ALL
SELECT 'a2', '2/18/17' UNION ALL
SELECT 'a4', '2/5/17' UNION ALL
SELECT 'a5', '3/25/17'
)
SELECT Year, Month, COUNT(DISTINCT user_id) AS User_Count
FROM (
SELECT *,
DATE_DIFF(time, LAG(time) OVER(PARTITION BY user_id ORDER BY time), MONTH) AS flag
FROM (
SELECT
user_id,
DATE_TRUNC(PARSE_DATE('%x', time), MONTH) AS time,
EXTRACT(YEAR FROM PARSE_DATE('%x', time)) AS Year,
FORMAT_DATE('%B', PARSE_DATE('%x', time)) AS Month
FROM yourTable
GROUP BY 1, 2, 3, 4
)
)
WHERE IFNULL(flag, 0) <> 1
GROUP BY Year, Month, time
ORDER BY time
The output is
Year Month User_Count
2017 January 1
2017 February 2
2017 March 1
Try this query:
SELECT
t1.d,
count(DISTINCT t1.user_id)
FROM
(
SELECT
EXTRACT(MONTH FROM time) AS d,
--EXTRACT(MONTH FROM time)-1 AS d2,
user_id
FROM nbitra.tmp
) t1
LEFT JOIN
(
SELECT
EXTRACT(MONTH FROM time) AS d,
user_id
FROM nbitra.tmp
) t2
ON t1.d = t2.d+1
AND t1.user_id = t2.user_id --Match the same user in the previous month
WHERE
t2.user_id IS NULL --No match: the user was absent in the previous month (this also covers January)
GROUP BY t1.d;
So I'm working with the following postgresql table:
[Screenshot: 10 rows from the PostgreSQL table]
For each business_id, I want to filter out those businesses where the review_count isn't above a specific threshold for 2 consecutive months (or rows). The threshold depends on the city the business_id is in. For example, in the screenshot above, we can assume rows with city = Charlotte have a review_count threshold of >= 2, and those with city = Las Vegas have a review_count threshold of >= 3. If a business_id does not have at least one instance of consecutive months with review_counts at or above the specified threshold, I want to filter it out.
I want this query to return only the business_ids that meet this condition (as well as all the other columns in the table that go along with that business_id). The composite primary key on this table is (business_id, year, month).
Some months, as you may notice, are missing from the data (month 9 of the second business_id). If that is the case, I do NOT want to count 2 rows as 'consecutive months'. For example, for the business in Las Vegas, I do NOT want to consider month 8 to 10 as 'consecutive months', even though they appear in consecutive rows.
I've tried something like this, but have kind of run into a wall and don't think it's getting me far:
SELECT *
FROM us_business_monthly_review_growth
WHERE business_id IN (SELECT DISTINCT(business_id)
FROM us_business_monthly_review_growth
GROUP BY business_id, year, month
HAVING (city = 'Las Vegas'
AND (CASE WHEN COUNT(review_count >= 2 * 2.21) >= 2))
OR (city = 'Charlotte' AND (CASE WHEN COUNT(review_count >= 2 * 1.95) >= 2))
I'm new to Postgre and StackOverflow, so if you have any feedback on the way I asked this question please don't hesitate to let me know! =)
UPDATE:
Thanks to some help from @Gordon Linoff, I found the following solution:
SELECT *
FROM us_businesses_monthly_growth_and_avg
WHERE business_id IN (SELECT distinct(business_id)
FROM (SELECT *,
lag(year) OVER (PARTITION BY business_id ORDER BY year, month) AS prev_year,
lag(month) OVER (PARTITION BY business_id ORDER BY year, month) AS prev_month,
lag(review_count) OVER (PARTITION BY business_id ORDER BY year, month) AS prev_review_count
FROM us_businesses_monthly_growth_and_avg
) AS usga
WHERE (city = 'Charlotte' AND review_count >= 4 * 1.95 AND prev_review_count >= 4 * 1.95 AND (YEAR * 12 + month) = (prev_year * 12 + prev_month) + 1)
OR (city = 'Las Vegas' AND review_count >= 4 * 3.31 AND prev_review_count >= 4 * 3.31 AND (YEAR * 12 + month) = (prev_year * 12 + prev_month) + 1));
You can do this with lag():
select distinct business_id
from (select t.*,
lag(year) over (partition by business_id order by year, month) as prev_year,
lag(month) over (partition by business_id order by year, month) as prev_month,
lag(review_count) over (partition by business_id order by year, month) as prev_review_count
from us_business_monthly_review_growth t
) t
where review_count >= $threshold and prev_review_count >= $threshold and
(year * 12 + month) = (prev_year * 12 + prev_month) + 1;
The only trick is setting the threshold value. I have no idea how you plan on doing that.
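One way to picture a per-city threshold is a lookup keyed by city. The Python sketch below is only an illustration: the threshold values follow the question's example (>= 2 for Charlotte, >= 3 for Las Vegas), while the business ids and rows are made up. It applies the same "both months pass and the months are adjacent" test as the query.

```python
# Hypothetical per-city thresholds, taken from the question's example
thresholds = {"Charlotte": 2, "Las Vegas": 3}

# Hypothetical rows: (business_id, city, year, month, review_count)
rows = [
    ("b1", "Charlotte", 2017, 7, 2),
    ("b1", "Charlotte", 2017, 8, 3),
    ("b2", "Las Vegas", 2017, 8, 4),
    ("b2", "Las Vegas", 2017, 10, 5),   # month 9 is missing
    ("b2", "Las Vegas", 2017, 11, 1),
]

# Index the rows that meet their city's threshold by (business, month index)
passing = {
    (biz, year * 12 + month)
    for biz, city, year, month, count in rows
    if count >= thresholds[city]
}

# A business qualifies if a passing month directly follows another passing month
qualifying = sorted({biz for biz, idx in passing if (biz, idx - 1) in passing})
print(qualifying)  # ['b1']
```

Business b2 is filtered out: months 8 and 10 both pass but are not adjacent, and month 11 falls below the threshold.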
Please try...
SELECT business_id
FROM
(
SELECT business_id AS business_id,
LAG( business_id, -1 ) OVER ( ORDER BY business_id, year, month ) AS lag_in_business_id,
city,
( LAG( year, -1 ) OVER ( ORDER BY business_id, year, month ) * 12 + LAG( month, -1 ) OVER ( ORDER BY business_id, year, month ) ) - ( year * 12 + month ) AS diffInDates,
review_count AS review_count,
LAG( review_count, -1 ) OVER ( ORDER BY business_id, year, month ) AS next_review_count
FROM us_business_monthly_review_growth
order BY business_id,
year,
month
) tempTable
JOIN tblCityThresholds ON tblCityThresholds.city = tempTable.city
WHERE business_id = lag_in_business_id
AND diffInDates = 1
AND tblCityThresholds.threshold <= review_count
AND tblCityThresholds.threshold <= next_review_count
GROUP BY business_id;
In formulating this answer I first used the following code to test that LAG() performed as hoped...
SELECT business_id,
LAG( business_id, 1 ) OVER ( ORDER BY business_id, year, month ) AS lag_in_business_id,
year,
month,
LAG( year, 1 ) OVER ( ORDER BY business_id, year, month ) * 12 + LAG( month, 1 ) OVER ( ORDER BY business_id, year, month ) AS diffInDates
FROM mytable
ORDER BY business_id,
year,
month;
Here I was trying to get LAG() to refer to values on the next row, but the output showed that it was referring to the previous row. Since I wanted to compare the current values with the next row's (to see whether the next record had the same business_id, etc.), I changed the 1 in LAG() to -1, giving me...
SELECT business_id,
LAG( business_id, -1 ) OVER ( ORDER BY business_id, year, month ) AS lag_in_business_id,
year,
month,
LAG( year, -1 ) OVER ( ORDER BY business_id, year, month ) * 12 + LAG( month, -1 ) OVER ( ORDER BY business_id, year, month ) AS diffInDates
FROM mytable
ORDER BY business_id,
year,
month;
As this gave me the desired results I added city, to allow a JOIN between the results and an assumed table holding the details of each city and its corresponding threshold. I chose the name tblCityThresholds as a suggestion since I am not sure what you have / would call it. This completed the inner SELECT statement.
I then joined the results of the inner SELECT statement to tblCityThresholds and refined the output as per your criteria. Note: it is assumed that the city field will always have a corresponding entry in tblCityThresholds.
I then used GROUP BY to ensure no repetition of a business_id.
If you have any questions or comments, then please feel free to post a Comment accordingly.
Further Reading
https://www.postgresql.org/docs/8.4/static/functions-window.html (in regards LAG())
I have a table:
create table remote (account int ,datecreated datetime,status int)
insert into remote (account , datecreated,status)
values
(123,'2015-08-25',1),
(123,'2015-08-25',1),
(123,'2015-09-26',1),
(1238,'2015-08-25',1),
(123,'2014-08-25',1),
(123,'2014-08-26',1),
(1238,'2014-08-25',1),
(1238,'2014-08-25',1),
(1235,'2014-08-25',1),
(1234,'2014-09-22',1),
(1234,'2014-09-22',1),
(1234,'2014-10-29',1),
(1236,'2014-10-25',1);
From here I would like to get the unique account count for each month/year where status=1
For example using the data above:
the output would be
count | month
-------------
1 |9/2015
2 |8/2015
2 |10/2014
1 |9/2014
3 |8/2014
How can I make this work?
I use SQL Server 2012.
Group by the month and year of datecreated so that the day part is ignored in the count, and use the same month and year in ORDER BY ... DESC. Then concatenate the month and year to get the result:
SELECT [Count],
[Mon/Year]= CONVERT(VARCHAR(2), [Month]) + '/' + CONVERT(VARCHAR(4), [year])
FROM (SELECT [year]=Year(datecreated),
[month]= Month(datecreated),
[Count]= Count(distinct account)
FROM remote
WHERE status = 1
GROUP BY Year(datecreated),
Month(datecreated)) a
ORDER BY [year] DESC,[Month] DESC
Result
Count Mon/Year
----- --------
1 9/2015
2 8/2015
2 10/2014
1 9/2014
3 8/2014
This is a group by query with a filter and some datetime logic:
select year(datecreated) as yr, month(datecreated) as mon, count(distinct account) as cnt
from remote
where status = 1
group by year(datecreated), month(datecreated)
order by yr desc, mon desc;
This puts the year and month into separate columns. You can concatenate them together into a single value if you really want to.
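To check the expected numbers independently of SQL Server, here is a short Python sketch over the rows from the question. It counts distinct accounts per (year, month) for status = 1 and prints them in the count | month layout the question asks for; the variable names are illustrative.

```python
from collections import defaultdict
from datetime import date

# Rows from the question: (account, datecreated, status)
rows = [
    (123, date(2015, 8, 25), 1), (123, date(2015, 8, 25), 1),
    (123, date(2015, 9, 26), 1), (1238, date(2015, 8, 25), 1),
    (123, date(2014, 8, 25), 1), (123, date(2014, 8, 26), 1),
    (1238, date(2014, 8, 25), 1), (1238, date(2014, 8, 25), 1),
    (1235, date(2014, 8, 25), 1), (1234, date(2014, 9, 22), 1),
    (1234, date(2014, 9, 22), 1), (1234, date(2014, 10, 29), 1),
    (1236, date(2014, 10, 25), 1),
]

# A set per (year, month) makes repeated accounts count once
accounts = defaultdict(set)
for account, created, status in rows:
    if status == 1:
        accounts[(created.year, created.month)].add(account)

# Newest month first, like ORDER BY year DESC, month DESC
counts = {key: len(accts) for key, accts in sorted(accounts.items(), reverse=True)}
for (year, month), n in counts.items():
    print(f"{n} | {month}/{year}")
```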