Using MIN on a DATEPART with GROUP BY not working, returns different dates - sql

Can anyone help with an aggregate function, MIN?
I have a car table, and I want to return the minimum sale price and minimum year from a table that has identical cars with different years and prices.
Basically, if I remove Registration (which contains the year) from the GROUP BY and the SELECT, the query works. If I leave it in, I get 3 cars returned that are exactly the same model, make, etc. but with different years.
But I am using MIN, so it should return 1 car with the year 2006 (the minimum year among the 3 cars).
MIN(SalePrice) is working perfectly; it's the registration that's not working.
Any ideas?
SELECT
MIN(datepart(year,[Registration])) AS YearRegistered,
MIN(SalePrice), Model, Make
FROM
[VehicleSales]
GROUP BY
datepart(year,[Registration]), Model, Make

If I have understood correctly what you are looking for, you should query:
SELECT Model, Make, MIN(datepart(year,[Registration])) AS YearRegistered, MIN(SalePrice)
FROM [VehicleSales]
GROUP BY Model, Make
Hope it helps.
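To see the difference between the two GROUP BY clauses, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented for illustration, and SQLite's strftime('%Y', ...) stands in for SQL Server's datepart(year, ...), which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE VehicleSales (Make TEXT, Model TEXT, Registration TEXT, SalePrice INTEGER)"
)
# Hypothetical sample data: the same car registered in three different years.
conn.executemany(
    "INSERT INTO VehicleSales VALUES (?, ?, ?, ?)",
    [
        ("Ford", "Focus", "2006-03-01", 4000),
        ("Ford", "Focus", "2007-05-15", 3500),
        ("Ford", "Focus", "2008-09-30", 5000),
    ],
)

# Original query shape: the year is part of the grouping key,
# so every distinct year forms its own group.
by_year = conn.execute(
    """
    SELECT MIN(CAST(strftime('%Y', Registration) AS INTEGER)) AS YearRegistered,
           MIN(SalePrice), Model, Make
    FROM VehicleSales
    GROUP BY strftime('%Y', Registration), Model, Make
    ORDER BY YearRegistered
    """
).fetchall()

# Suggested query shape: group by Model and Make only.
by_car = conn.execute(
    """
    SELECT Model, Make,
           MIN(CAST(strftime('%Y', Registration) AS INTEGER)) AS YearRegistered,
           MIN(SalePrice)
    FROM VehicleSales
    GROUP BY Model, Make
    """
).fetchall()

print(by_year)
print(by_car)
```

With these rows the first query returns three rows, one per year, while the second returns the single row ('Focus', 'Ford', 2006, 3500). The 2006 and the 3500 come from different rows, since each MIN is computed independently.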

Turro's answer will return the lowest registration year and the lowest price for each (Model, Make), but that doesn't mean the lowest price belongs to the car with the lowest year.
Is that what you need?
Or do you need one of these:
the lowest price among the cars with the lowest year
the lowest year among the cars with the lowest price
-- EDITED ---
You are correct about the query, but I want to find the car make/model that gets cheaper the next year ;)
That's why I made a comment. Imagine the following situation:
Porsche 911 2004 2000
Porsche 911 2004 3000
Porsche 911 2005 1000
Porsche 911 2005 5000
You'll get a result that doesn't really tell you whether this car gets cheaper from year to year:
Porsche 911 2004 1000
I don't know how you would tell whether a car gets cheaper the next year from a single row, without at least comparing it with the previous year.
P.S. I'd like to buy one of the cars above for the listed price :D

You're getting what you're asking for: the cars are put into different groups whenever their model, make, or year is different, and the (minimum, i.e. only) year and the minimum price for each of those groups are returned.
Why are you using GROUP BY?

You are correct about the query, but I want to find the car make/model that gets cheaper the next year ;)
You should find the cheapest (or average) price per make/model per year and compare it with the cheapest (or average) price from the previous year for the same make/model.
Then you can see which of them get cheaper the next year (I suppose most of them do).
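The year-over-year comparison could look roughly like this. It is only a sketch using Python's sqlite3 module and the Porsche 911 rows quoted earlier in this thread. The Sales table and its integer Year column are assumptions made to keep the example short, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the registration year is stored directly as an integer.
conn.execute("CREATE TABLE Sales (Make TEXT, Model TEXT, Year INTEGER, SalePrice INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("Porsche", "911", 2004, 2000),
        ("Porsche", "911", 2004, 3000),
        ("Porsche", "911", 2005, 1000),
        ("Porsche", "911", 2005, 5000),
    ],
)

rows = conn.execute(
    """
    WITH yearly AS (
        -- Cheapest price per make/model per year.
        SELECT Make, Model, Year, MIN(SalePrice) AS MinPrice
        FROM Sales
        GROUP BY Make, Model, Year
    )
    -- Pair each year with the previous year of the same make/model
    -- and keep only the pairs where the price dropped.
    SELECT cur.Make, cur.Model, cur.Year, prev.MinPrice AS PrevMinPrice, cur.MinPrice
    FROM yearly AS cur
    JOIN yearly AS prev
      ON prev.Make = cur.Make
     AND prev.Model = cur.Model
     AND prev.Year = cur.Year - 1
    WHERE cur.MinPrice < prev.MinPrice
    """
).fetchall()

print(rows)
```

For the sample rows this yields ('Porsche', '911', 2005, 2000, 1000): the cheapest 2005 car costs less than the cheapest 2004 car. Swapping MIN for AVG gives the average-based variant.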

Related

How do I group nodes by a different node property and order by a different one in Neo4j?

I am working with the Crimes in Boston dataset, with each crime as a node in Neo4j. I want to query and display the most committed crime each year to get a result like this:
Year | offense_code_group | count(offense_code_group)
2015 | Aggravated Assault | 5827
2016 | Larceny From Motor Vehicle | 11534
2017 | Verbal Disputes | 12049
2018 | Investigate Person | 8724
Note: The result is just an example of how I want it to group and look, not the actual result I'll get after querying the dataset.
But this is the best I have been able to do so far:
[Image: query and output in Neo4j Desktop]
I know there is no GROUP BY clause in Neo4j and I have tried using collect(), but I can't get it to work.
How about something like this:
MATCH (c:Crime)
WITH c.year AS year, c.offense_code_group AS code, count(c.offense_code_group) AS count
ORDER BY year, count DESC
WITH year, COLLECT(code) AS codes, COLLECT(count) AS counts
RETURN year, codes[0], counts[0]
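The logic of that query (count per year and code, sort by year and descending count, keep the first entry per year) can be mirrored in plain Python. This is only an illustration of the steps, not Neo4j code, and the sample pairs are made up.

```python
from collections import Counter

# Hypothetical (year, offense_code_group) pairs standing in for Crime nodes.
crimes = [
    (2015, "Aggravated Assault"),
    (2015, "Aggravated Assault"),
    (2015, "Larceny"),
    (2016, "Larceny"),
    (2016, "Verbal Disputes"),
    (2016, "Verbal Disputes"),
    (2016, "Verbal Disputes"),
]

# First WITH: count occurrences per (year, code).
counts = Counter(crimes)

# ORDER BY year, count DESC.
ordered = sorted(counts.items(), key=lambda kv: (kv[0][0], -kv[1]))

# Second WITH + RETURN: the first entry seen for each year is its most frequent code.
top_per_year = {}
for (year, code), n in ordered:
    top_per_year.setdefault(year, (code, n))

print(top_per_year)
```

With these sample pairs, 2015 maps to ('Aggravated Assault', 2) and 2016 to ('Verbal Disputes', 3).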

find mean batting average ("Ave") and total number of centuries by country but only including records for which starting year is 2010 or later

I have a dataset of cricket players and want to find the mean batting average “Ave” and the total number of centuries “Hundreds” by country (the “Country” column), but only including records whose starting year “From” is 2010 or later.
The column names are Ave, Hundreds, Country, and From.
new_data.groupby(['Country'])['Ave'].mean()
new_data.groupby(['Country'])['Hundreds']
I want to apply these two in a single line and also want to use the condition that starting Year should be 2010 or later
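Assuming From holds the starting year as a number and the columns to aggregate are Ave and Hundreds, you can do it with the pandas .agg method:
# If From is a datetime column, filter on new_data['From'].dt.year >= 2010 instead.
grouped_data = new_data[new_data['From'] >= 2010].groupby(['Country'])
grouped_data.agg({'Ave': 'mean', 'Hundreds': 'sum'})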
Let me know if it doesn't work.
new_data[new_data['From'] >= 2010].groupby(['Country'])['Ave'].mean()
You can do the same for 'Hundreds', using .sum() instead of .mean().

identifying trends and classifying using sql

I have a table xyz with three columns: rcvr_id, mth_id and tpv. rcvr_id is an ID given to a customer. mth_id stores the month number, calculated as (year - 1900) * 12 + month (1, 2, 3, ... depending on the month), so for example Dec 2011 has an mth_id of 1344, Jan 2012 has 1345, etc. tpv is the customer's transaction amount.
Example table
rcvr_id mth_id tpv
1 1344 23
2 1344 27
3 1344 54
1 1345 98
3 1345 102
... and so on
P.S. If a customer does not have a transaction in a given month, their row for that month won't exist.
Now, the question: based on transactions for the months 1327 to 1350, I need to classify a customer as steady or sporadic.
Here is a description.
The above image is for 1 customer; I have millions of customers.
How do I go about it? I have no clue how to identify trends in SQL, or rather how to do it in the best way possible.
Also, I am working on Teradata.
OK, I have found out how to get the standard deviation. Now the important question is: how do I set a standard deviation limit on my own? I can't just arbitrarily say "if the standard deviation is above 40% he is sporadic, else steady". I thought of calculating the average standard deviation across all customers and classifying a customer as sporadic if theirs is above that, else steady. But I feel there could be better logic.
I would suggest the STDDEV_POP function - a higher value indicates a greater variation in values.
select
rcvr_id, STDDEV_POP(tpv)
from yourtable
group by rcvr_id
STDDEV_POP is the function for the population standard deviation.
If this doesn't differentiate enough, you may need to look at regression functions and variance.
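As a rough illustration of what that query computes, here is the same per-customer population standard deviation in plain Python, using the five rows from the example table. The coefficient of variation (standard deviation divided by mean) is added as one possible way to express spread as a percentage. It is an assumption about what might be useful here, not something the Teradata query returns.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Rows from the example table: (rcvr_id, mth_id, tpv).
rows = [
    (1, 1344, 23),
    (2, 1344, 27),
    (3, 1344, 54),
    (1, 1345, 98),
    (3, 1345, 102),
]

# Group the transaction amounts by customer.
tpv_by_customer = defaultdict(list)
for rcvr_id, _mth_id, tpv in rows:
    tpv_by_customer[rcvr_id].append(tpv)

stats = {}
for rcvr_id, values in tpv_by_customer.items():
    sd = pstdev(values)      # population standard deviation, like STDDEV_POP
    cv = sd / mean(values)   # relative spread; assumes a non-zero mean
    stats[rcvr_id] = (sd, cv)

for rcvr_id, (sd, cv) in sorted(stats.items()):
    print(rcvr_id, round(sd, 2), round(cv, 2))
```

Like the SQL query, this only sees months that have a row. A customer with a single transaction gets a standard deviation of 0, so months without transactions would need to be filled in as zeros first if they should count as variation.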

DB2 - Ranking data by timeframe

I am trying to write a report (DB2 9.5 on Solaris) to do the following:
I have a set of data, let's say it's an order table. I want to run a report which will give me, for each month, the number of orders per customer and their "rank" that month. The rank would be based on the number of orders. I was playing around with the RANK() OVER clauses, but I can't seem to get it to give me a rank per month (or other "group by"). If there are 100 customers and 12 months of data, I would expect 1200 rows in the report, 100 per month, each with a rank between 1 and 100. Let me know if more detail would be helpful. Thanks in advance.
The solution is to use the PARTITION BY clause.
For example, see page 5 here: http://cmsaville.ca/documents/MiscDocs/TopNQueries.pdf
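A minimal sketch of RANK() with PARTITION BY, run through Python's sqlite3 module rather than DB2. The orders table, its columns, and the sample rows are hypothetical, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical order table: one row per order.
conn.execute("CREATE TABLE orders (customer TEXT, order_month TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("alice", "2009-01"), ("alice", "2009-01"), ("alice", "2009-01"),
        ("bob", "2009-01"),
        ("alice", "2009-02"),
        ("bob", "2009-02"), ("bob", "2009-02"),
    ],
)

rows = conn.execute(
    """
    WITH monthly AS (
        -- Number of orders per customer per month.
        SELECT order_month, customer, COUNT(*) AS order_count
        FROM orders
        GROUP BY order_month, customer
    )
    -- PARTITION BY restarts the ranking for every month.
    SELECT order_month, customer, order_count,
           RANK() OVER (PARTITION BY order_month ORDER BY order_count DESC) AS month_rank
    FROM monthly
    ORDER BY order_month, month_rank
    """
).fetchall()

for row in rows:
    print(row)
```

Each month gets its own ranking starting at 1: alice ranks first in 2009-01 with 3 orders, and bob ranks first in 2009-02 with 2.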

First and last measure dates for sets in MDX

I'm looking for some guidance on how to approach an MDX query. My situation is that I have sales occurring, which make up the grain of the fact table and are measures. I have a products dimension and a customer dimension. I also have a date dimension and a time dimension; I made them separate to keep member counts low on the dimensions.
The query I'm trying to write, is one that asks for the first and last purchase, per customer per product. So, an example result set may look like:
Car - Bob - 2008-12-10 - 15:39 - 2008-12-11 - 16:44
Car - Bill - 2008-12-12 - 09:16 - 2008-12-12 - 09:16
Van - Jim - 2008-12-11 - 14:02 - 2008-12-12 - 22:01
So, Bob bought two cars, and we have the first and last purchases, Bill bought one car so the first and last purchases are the same, Jim may have bought three vans but we only show the first and last.
I've tried using TAIL, but can't seem to get the sets correct to show the last purchase per customer. Even then, experiments with HEAD for the first purchase showed I couldn't use the same dimension twice on the same axis. It's also made harder by the fact that there may be several purchases per day, so the query I need is the last time for the last date for each customer for each product, and the first time for the first date for each customer for each product.
I'm not necessarily asking for an exact query answer, although that would help, but I am interested in the approach and best methods to use. The platform is SQL Server Analysis Services 2005.
Can't you just use the min and max aggregations on purchase date? Or have I completely missed the problem?
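The min/max idea is easiest to show in relational terms. The sketch below uses Python's sqlite3 module, not MDX, with a hypothetical sales table built from the example result set. Jim's middle purchase is invented. Date and time are combined before aggregating, because taking the minimum date and the minimum time separately could pair values from different purchases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table; in the cube these would be separate date and time dimensions.
conn.execute("CREATE TABLE sales (product TEXT, customer TEXT, sale_date TEXT, sale_time TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Car", "Bob", "2008-12-10", "15:39"),
        ("Car", "Bob", "2008-12-11", "16:44"),
        ("Car", "Bill", "2008-12-12", "09:16"),
        ("Van", "Jim", "2008-12-11", "14:02"),
        ("Van", "Jim", "2008-12-12", "08:30"),  # invented middle purchase
        ("Van", "Jim", "2008-12-12", "22:01"),
    ],
)

rows = conn.execute(
    """
    -- ISO-formatted date and time strings sort chronologically,
    -- so MIN/MAX on the concatenation give the first and last purchase.
    SELECT product, customer,
           MIN(sale_date || ' ' || sale_time) AS first_purchase,
           MAX(sale_date || ' ' || sale_time) AS last_purchase
    FROM sales
    GROUP BY product, customer
    ORDER BY product, customer
    """
).fetchall()

for row in rows:
    print(row)
```

For Jim this gives 2008-12-11 14:02 and 2008-12-12 22:01, whereas the minimum time taken on its own would be 08:30, which belongs to a purchase on his last date.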