DB2 - Ranking data by timeframe - sql

I am trying to write a report (DB2 9.5 on Solaris) to do the following:
I have a set of data, let's say it's an order table. I want to run a report which will give me, for each month, the number of orders per customer, and their "rank" that month. The rank would be based on the number of orders. I was playing around with the RANK() OVER clauses, but I can't seem to get it to give me a rank per month (or per any other "group by"). If there are 100 customers and 12 months of data, I would expect 1200 rows in the report, 100 per month, each with a rank between 1 and 100. Let me know if more detail would be helpful. Thanks in advance.

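The solution is to use the PARTITION BY clause inside the OVER specification, so that the rank restarts for every month: RANK() OVER (PARTITION BY <month> ORDER BY <order count> DESC).
For an example, see page 5 here: http://cmsaville.ca/documents/MiscDocs/TopNQueries.pdf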
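A minimal sketch of the PARTITION BY idea, run against an in-memory SQLite database so it can be tried without DB2 (window functions need SQLite 3.25 or later). The table `orders` and its columns are assumptions made up for the illustration, not the asker's schema; on DB2 the month would more likely come from YEAR(order_date) and MONTH(order_date) than from substr.

```python
import sqlite3

# Hypothetical order table: one row per order (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2009-01-05"), (1, "2009-01-20"), (1, "2009-01-25"),
        (2, "2009-01-07"),
        (3, "2009-01-09"), (3, "2009-01-11"),
        (1, "2009-02-02"),
        (2, "2009-02-03"), (2, "2009-02-14"),
    ],
)

# Inner query: orders per customer per month.
# Outer query: rank customers within each month (PARTITION BY restarts the rank).
RANK_PER_MONTH = """
SELECT order_month,
       customer_id,
       order_count,
       RANK() OVER (PARTITION BY order_month
                    ORDER BY order_count DESC) AS month_rank
FROM (
    SELECT substr(order_date, 1, 7) AS order_month,
           customer_id,
           COUNT(*) AS order_count
    FROM orders
    GROUP BY substr(order_date, 1, 7), customer_id
) AS monthly
ORDER BY order_month, month_rank, customer_id
"""

rows = conn.execute(RANK_PER_MONTH).fetchall()
for row in rows:
    print(row)
```

Note that a customer with no orders in a given month produces no row here (customer 3 is missing from February). Getting exactly 100 rows per month would presumably need an outer join against the full list of customers and months.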

Related

Google Sheets Query Function. How can I get only Unique or Distinct Rows?

I am trying to answer a question on a case using the Query function on Google Sheets and am stuck on a particular problem.
I need to get the total number of unique orders per year. I used the formula below and managed to get the total orders per year.
=QUERY(raw_data!$A$1:$U$9995, "select YEAR(C), COUNT(B) group by YEAR(C)", 1)
Where column C is the date and B is the order_id.
The problem is that this returns a total of 9994 orders and includes duplicates of the same order. For example, if a customer purchased 3 different products, they would each be given a line in the database and would count as 3 of the 9994 orders. However, they all have the same order_id.
I need to get the number of unique orders per year. I know this number is 5009 since I did some manual research through Excel, but wanted to find that same total, separated by year, using the Query Function since this is a case to test my SQL Knowledge.
Is this possible? Does the Query Function have a way to get the count for unique order_ids? Thank you very much for your help!
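See if this helps. UNIQUE removes the duplicate (order_id, date) rows before QUERY counts them, and because QUERY then receives an array rather than a range, the columns are referenced as Col1 and Col2: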
=QUERY(UNIQUE(raw_data!$B$1:$C$9995), "select YEAR(Col2), COUNT(Col1) where Col2 is not null group by YEAR(Col2)", 1)
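Since the case is meant to test SQL knowledge, here is how the same count might look in an actual SQL engine, where COUNT(DISTINCT ...) does the de-duplication that UNIQUE does in the formula above. This is only a sketch against an in-memory SQLite table; the table layout and sample rows are invented for the illustration.

```python
import sqlite3

# Hypothetical line-item data: one row per product, so an order_id can repeat.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data (order_id TEXT, order_date TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?)",
    [
        ("A-1", "2014-03-01", "chair"),
        ("A-1", "2014-03-01", "desk"),
        ("A-1", "2014-03-01", "lamp"),
        ("A-2", "2014-07-15", "chair"),
        ("B-1", "2015-01-09", "desk"),
        ("B-1", "2015-01-09", "lamp"),
    ],
)

# COUNT(*) counts line items; COUNT(DISTINCT order_id) counts each order once.
ORDERS_PER_YEAR = """
SELECT strftime('%Y', order_date) AS order_year,
       COUNT(*) AS line_items,
       COUNT(DISTINCT order_id) AS unique_orders
FROM raw_data
GROUP BY order_year
ORDER BY order_year
"""

yearly = conn.execute(ORDERS_PER_YEAR).fetchall()
print(yearly)  # [('2014', 4, 2), ('2015', 2, 1)]
```

One difference worth keeping in mind: the UNIQUE approach de-duplicates on the (order_id, date) pair, so it matches COUNT(DISTINCT order_id) only if every order_id always carries the same date.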

How to NTILE over distinct values in BigQuery?

I have a query that I'm trying to put together in Google BigQuery that would decile sales for each customer. The problem I'm running into is that if a decile breaks at the point where many customers have the same sales value, they could end up in different deciles despite having the same sales.
For example, if there were twenty customers in total, and one spent $100, 18 spent $50, and one spent $25, the 18 customers who spent $50 will still be broken out across all the deciles due to equal groups being created, whereas in reality I would want them to be placed in the same decile.
The data that I'm using is obviously a bit more complex -- there are about 10 million customers, and the sales are deciled within a particular group to which each customer belongs.
Example code:
NTILE(10) OVER (PARTITION BY customer_group ORDER BY yearly_sales asc) as current_sales_decile
The NTILE function works, but I just run into the problem described above and haven't figured out how to fix it. Any suggestions welcome.
Calculate the ntile yourself:
select ceiling(rank() over (partition by customer_group order by yearly_sales) * 10.0 /
               count(*) over (partition by customer_group)
              ) as current_sales_decile
This gives you more control over how the tiles are formed. In particular, all rows with the same value go in the same tile.
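To see the difference on the twenty-customer example from the question, here is a small sketch using an in-memory SQLite database (3.25 or later) rather than BigQuery. The table `sales` is an assumption for the illustration. Because some SQLite builds have no CEILING function, the sketch swaps in the integer equivalent (rank * 10 + n - 1) / n, which gives the same result as ceiling(rank * 10.0 / n) for positive integers.

```python
import sqlite3

# One customer at $25, eighteen at $50, one at $100, all in the same group.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer_id INTEGER, customer_group TEXT, yearly_sales INTEGER)"
)
customers = [(1, "g1", 25)] + [(i, "g1", 50) for i in range(2, 20)] + [(20, "g1", 100)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", customers)

# ntile_decile: plain NTILE(10), which splits ties to keep the tiles equal-sized.
# tie_safe_decile: integer form of ceiling(rank * 10.0 / count); ties share a rank,
# so they always land in the same tile.
DECILES = """
SELECT yearly_sales,
       NTILE(10) OVER (PARTITION BY customer_group ORDER BY yearly_sales) AS ntile_decile,
       (RANK() OVER (PARTITION BY customer_group ORDER BY yearly_sales) * 10
        + COUNT(*) OVER (PARTITION BY customer_group) - 1)
       / COUNT(*) OVER (PARTITION BY customer_group) AS tie_safe_decile
FROM sales
"""

ntile_by_sales, tie_safe_by_sales = {}, {}
for sales, ntile_decile, tie_safe_decile in conn.execute(DECILES):
    ntile_by_sales.setdefault(sales, set()).add(ntile_decile)
    tie_safe_by_sales.setdefault(sales, set()).add(tie_safe_decile)

print(sorted(ntile_by_sales[50]))  # the $50 customers are spread over several tiles
print(tie_safe_by_sales)           # each sales value maps to exactly one tile
```

The trade-off is that the tiles are no longer equal in size, and some tile numbers may not be used at all (here the $100 customer lands in tile 10 while everyone else is in tile 1).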

SQL GROUPING SETS averages with multiple many-to-many dimensions

I have a table of data with the following:
User,Platform,Dt,Activity_Flag,Total_Purchases
1,iOS,05/05/2016,1,1
1,Android,05/05/2016,1,2
2,iOS,05/05/2016,1,0
2,Android,05/05/2016,1,2
3,iOS,05/05/2016,1,1
3,Android,06/05/2016,1,3
1,iOS,06/05/2016,1,2
4,Android,06/05/2016,1,2
1,Android,06/05/2016,1,0
3,iOS,07/05/2016,1,2
2,iOS,08/05/2016,1,0
I want to do a GROUPING SETS (Platform,Dt,(Platform,Dt),()) aggregation to be able to find for each combination of Platform and Dt the following:
Total Purchases
Total Unique Users
Average Purchases per User per Day
The first two are simple as these can be achieved via a sum(Total_Purchases) and count(distinct user) respectively.
The problem I have is with the last metric. The result set should look like this but I don't know how to get the last column to be calculated correctly:
Platform,Dt,Total_Purchases,Total_Unique_Users,Average_Purchases_Per_User_Per_Day
Android,05/05/2016,4,2,2.0
iOS,05/05/2016,2,3,0.7
Android,06/05/2016,5,3,1.7
iOS,06/05/2016,2,1,2.0
iOS,07/05/2016,2,1,2.0
iOS,08/05/2016,0,1,0.0
,05/05/2016,6,3,2.0
,06/05/2016,7,3,2.3
,07/05/2016,2,1,2.0
,08/05/2016,0,1,0.0
Android,,9,4,1.8
iOS,,6,3,1.2
,,15,4,1.6
For the first ten rows, the average purchases per user per day is simply Total_Purchases divided by Total_Unique_Users, because each of these rows covers a single date. For the final three rows, that division does not give the desired result: the metric has to be computed for each day in turn and then averaged across the days to get the overall per-day amount.
If this isn't clear please let me know and I'll be happy to explain better. This is my first post on this site!
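To make the definition of the last metric concrete, here is a sketch of the two-step calculation (purchases per unique user for each day, then the average of those daily values), using the sample rows above in an in-memory SQLite table. SQLite has no GROUPING SETS, so this only reproduces the per-platform and grand-total rows with separate queries; it is an illustration of the arithmetic, not a finished GROUPING SETS answer. The table name `activity` and the column name `user_id` are assumptions.

```python
import sqlite3

# Sample rows from the question: (user, platform, dt, activity_flag, total_purchases).
ROWS = [
    (1, "iOS", "05/05/2016", 1, 1), (1, "Android", "05/05/2016", 1, 2),
    (2, "iOS", "05/05/2016", 1, 0), (2, "Android", "05/05/2016", 1, 2),
    (3, "iOS", "05/05/2016", 1, 1), (3, "Android", "06/05/2016", 1, 3),
    (1, "iOS", "06/05/2016", 1, 2), (4, "Android", "06/05/2016", 1, 2),
    (1, "Android", "06/05/2016", 1, 0), (3, "iOS", "07/05/2016", 1, 2),
    (2, "iOS", "08/05/2016", 1, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (user_id INTEGER, platform TEXT, dt TEXT,"
    " activity_flag INTEGER, total_purchases INTEGER)"
)
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?, ?)", ROWS)

# Step 1 (inner query): purchases per unique user for each platform and day.
# Step 2 (outer query): average those daily values per platform.
PER_PLATFORM = """
SELECT platform, ROUND(AVG(per_user), 1) AS avg_per_user_per_day
FROM (
    SELECT platform, dt,
           SUM(total_purchases) * 1.0 / COUNT(DISTINCT user_id) AS per_user
    FROM activity
    GROUP BY platform, dt
) AS daily
GROUP BY platform
ORDER BY platform
"""

# Same idea across all platforms: one value per day, then the average over days.
OVERALL = """
SELECT ROUND(AVG(per_user), 1)
FROM (
    SELECT dt,
           SUM(total_purchases) * 1.0 / COUNT(DISTINCT user_id) AS per_user
    FROM activity
    GROUP BY dt
) AS daily
"""

per_platform = conn.execute(PER_PLATFORM).fetchall()
overall = conn.execute(OVERALL).fetchone()[0]
print(per_platform)  # Android 1.8, iOS 1.2
print(overall)       # 1.6
```

These values agree with the last three rows of the expected result (1.8, 1.2 and 1.6), which suggests that "average of the daily averages" is the intended definition.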

Query to find a weekly average

I have an SQLite database with the following fields for example:
date (yyyymmdd format)
total (0.00 format)
There are typically 2 months of records in the database. Does anyone know a SQL query to find a weekly average?
I could easily just execute:
SELECT COUNT(1) as total_records, SUM(total) as total FROM stats_adsense
Then just divide total by 7, but unless the number of days in the db is an exact multiple of 7, I don't think it will be very accurate, especially if there are fewer than 7 days of records.
To get a daily summary it's obviously just total / total_records.
Can anyone help me out with this?
You could try something like this:
SELECT strftime('%W', thedate) theweek, avg(total) theaverage
FROM table GROUP BY strftime('%W', thedate)
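One caveat with the query above: SQLite's strftime expects dates like 'YYYY-MM-DD', so a compact yyyymmdd value would first have to be reshaped. Here is a sketch of that, assuming the dates are stored as text in the `stats_adsense` table from the question (the sample rows are invented). It also groups by year and week together, so that the same week number from two different years is not merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE stats_adsense ("date" TEXT, total REAL)')
conn.executemany(
    "INSERT INTO stats_adsense VALUES (?, ?)",
    [
        ("20240101", 10.0),  # a Monday
        ("20240103", 20.0),  # same week
        ("20240108", 30.0),  # Monday of the following week
    ],
)

# Inner query: turn 'yyyymmdd' into 'yyyy-mm-dd' so strftime can parse it.
# Outer query: one row per (year, week), where %W counts weeks starting on Monday.
WEEKLY = """
SELECT strftime('%Y-%W', iso_date) AS year_week,
       COUNT(*) AS days_recorded,
       AVG(total) AS daily_average,
       SUM(total) AS week_total
FROM (
    SELECT substr("date", 1, 4) || '-' || substr("date", 5, 2) || '-' || substr("date", 7, 2)
               AS iso_date,
           total
    FROM stats_adsense
) AS reshaped
GROUP BY year_week
ORDER BY year_week
"""

weekly = conn.execute(WEEKLY).fetchall()
for row in weekly:
    print(row)
```

The `days_recorded` column shows how many days each week actually contributed, which makes it easy to spot (or filter out) partial weeks at either end of the data.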
I'm not sure how the syntax would work in SQLite, but one way would be to parse out the date parts of each [date] field, specify the week and day boundaries you want in your WHERE clause, and then GROUP BY the week. This will give you a true average per week, regardless of how many rows each week has.
Something like this (using T-SQL):
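-- Use 'week' (or wk / ww) for the week number; 'w' is an abbreviation for weekday.
SELECT DATEPART(week, theDate) AS theWeek, AVG(theAmount) AS Average
FROM Table
GROUP BY DATEPART(week, theDate)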
This will return a row for every week. You could filter it in your WHERE clause to restrict it to a given date range.
Hope this helps.
Your weekly average is
daily * 7
Obviously this doesn't take into account specific weeks, but you can get that by narrowing the result set to a date range.
You'll have to omit those records in the addition which don't belong to a full week. So, prior to summing up, you'll have to find the min and max of the dates, manipulate them such that they form "whole" weeks, and then run your original query with a WHERE that limits the date values according to the new range. Maybe you can even put all this into one query. I'll leave that up to you. ;-)
The values that get "truncated" this way are simply not used. If there aren't enough values to fill even one whole week, there is no result at all, and I don't see a solution to that.

MySQL: ORDER BY stat / age?

I have an int field in my product table called product_stat which is incremented every time a product is looked at. I also have a date field called product_date_added.
In order to figure out the average visits per day a product gets, you need to calculate how many days the product has existed by using the current date and the date the product was added. Then divide the product stat by the number of days it has existed to get the average visits per day.
OK, but what I want to do is select a number of products and order them by visits per day DESC.
How can I do that?
Thanks!!!
Something like this should do the trick, using DATEDIFF to get the difference between two dates, and then dividing the product_stat column by that difference.
SELECT
    p.*,
    p.product_stat / DATEDIFF(CURDATE(), p.product_date_added) AS visits_per_day
FROM products p
ORDER BY visits_per_day DESC
Although note that DATEDIFF only came around as of MySQL 4.1.1. If you're using an earlier version you should do "TO_DAYS(CURDATE()) - TO_DAYS(p.product_date_added)" instead.
You will want something like this:
SELECT product_name,
       product_stat / DATEDIFF(NOW(), product_date_added) AS VisitPerDay
FROM product
ORDER BY VisitPerDay DESC
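One edge case in both queries: for a product added today, DATEDIFF returns 0, and in MySQL a division by zero yields NULL, so such products get no visits-per-day value. A possible guard is to treat the age as at least one day, e.g. with GREATEST(DATEDIFF(...), 1) in MySQL. The sketch below shows the same idea in an in-memory SQLite database, where julianday differences stand in for DATEDIFF and the two-argument MAX stands in for GREATEST. The sample rows and the fixed "today" value are assumptions to keep the example reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_name TEXT, product_stat INTEGER, product_date_added TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("old-popular", 300, "2024-01-01"),
        ("new-hot", 40, "2024-01-29"),
        ("added-today", 5, "2024-01-31"),
    ],
)

TODAY = "2024-01-31"  # fixed stand-in for the current date

# Age in whole days, but never less than 1, so a same-day product
# is divided by 1 instead of by 0.
VISITS_PER_DAY = """
SELECT product_name,
       product_stat * 1.0
         / MAX(CAST(julianday(:today) - julianday(product_date_added) AS INTEGER), 1)
         AS visits_per_day
FROM products
ORDER BY visits_per_day DESC
"""

ranked = conn.execute(VISITS_PER_DAY, {"today": TODAY}).fetchall()
for name, visits in ranked:
    print(name, visits)
```

Whether a brand-new product should count as one day old or be left out of the ranking entirely is a judgment call; the sketch only shows the first option.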