Having difficulty trying to ORDER BY SUM(Revenue) with this query - SQL

I would like to order the result by SUM(Revenue). Below is my code; kindly help me fix it. Thank you.
SELECT
EXTRACT(YEAR FROM Release_date) AS year_released, COUNT(Genre) AS number_of_comedy, SUM(Revenue) AS total_revenue
FROM
Movie_data.movie
WHERE
Genre='Comedy'
GROUP BY
EXTRACT(YEAR FROM Release_date)
ORDER BY
SUM(Revenue)
LIMIT
1000
The error message I get is "SELECT list expression references column Release_date which is neither grouped nor aggregated at [2:19]".

You should be able to use what you have written. You can also write:
ORDER BY total_revenue
Often when ordering by revenue, you want the largest values first:
ORDER BY total_revenue DESC
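Putting it together, the whole query would look something like this (a sketch reusing the table and columns from your question, with the descending sort applied):
SELECT
EXTRACT(YEAR FROM Release_date) AS year_released,
COUNT(Genre) AS number_of_comedy,
SUM(Revenue) AS total_revenue
FROM
Movie_data.movie
WHERE
Genre = 'Comedy'
GROUP BY
EXTRACT(YEAR FROM Release_date)
ORDER BY
total_revenue DESC -- highest-grossing years first
LIMIT
1000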

How to aggregate rows on BigQuery

I need to group different years in my dataset so that I can see the total number of login_log_id each year has (BigQuery).
SELECT login_log_id,
DATE(login_time) as login_date,
EXTRACT(YEAR FROM login_time) as login_year,
TIME(login_time) as login_time,
FROM `steel-time-347714.flex.logs`
GROUP BY login_log_id
I want to make a group by so that I can see the total number of login_log_id generated in different years.
My columns are login_log_id, login_time
I am getting the following error:
SELECT list expression references column login_time which is neither grouped nor aggregated at [2:6]
The error occurs because every column you refer to in the SELECT needs to be aggregated or be in the GROUP BY.
If you want the total logins by year, you can do:
SELECT
EXTRACT(YEAR FROM login_time) as login_year,
COUNT(1) as total_logins,
COUNT(DISTINCT login_log_id) as total_unique_logins
FROM `steel-time-347714.flex.logs`
GROUP BY login_year
But if you want the total by login_log_id and year:
SELECT
login_log_id,
EXTRACT(YEAR FROM login_time) as login_year,
COUNT(1) as total_logins
FROM `steel-time-347714.flex.logs`
GROUP BY login_log_id, login_year

Find the average lowest item in a collection grouped by date in SQL

My SQL isn't the best. I can get this working in C#, but it seems more efficient to do it in my data layer. I've got a table Prices:
ID
Price
DateTime
Each row is exactly 1 hour from the next, so I have a snapshot of a price every hour.
I'm trying to work out which hour in a day over the entire dataset has the lowest price (on average).
So ideally I'm after a list of each hour in the day ranked by how cheap on average that hour is over the entire dataset - so a maximum of 24 rows (one for each hour in the day).
Any help would be greatly appreciated!
Thanks :D
Which database are you on?
Different DBs have different ways to extract date from a timestamp column.
Postgres has date(timestamp); in Oracle, you can use trunc(timestamp). Most DBs also have to_char/to_date, so you can try those.
Once you have extracted the date, you can try something like this:
select ID,
Price,
DateTime,
trunc(DateTime) as day,
rank() over (partition by trunc(DateTime) order by Price asc) as least_for_day
from Prices
Now you can use the "least_for_day" ranked column and select by day.
Again, depending on the DB, you can either directly qualify on the ranked column in the same SQL or use the above as a sub-query and filter for the rank.
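For example, using the query above as a sub-query and filtering on the rank returns the cheapest snapshot for each day (a sketch in the same Oracle-flavoured syntax as above):
select day, ID, Price, DateTime
from (
select ID,
Price,
DateTime,
trunc(DateTime) as day,
rank() over (partition by trunc(DateTime) order by Price asc) as least_for_day
from Prices
)
where least_for_day = 1 -- keep only the cheapest hour(s) of each day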
You can use a query like the one below:
select
hour,
avg(daily_rank) as avg_rank
from
(
select *,
hour = format(cast([DateTime] as datetime), 'HH'), -- hour of day as 'HH'
daily_rank = dense_rank() over (partition by cast([DateTime] as date) order by Price asc) -- 1 = cheapest price that day
from Prices
) t
group by hour
Thank you very much to #Many Manjunath and #DhruvJoshi. Final solution below:
WITH prices AS
(
SELECT
[Price],
[DateTime],
CAST([DateTime] AS TIME) 'Time',
CAST([DateTime] as date) 'Date',
rank() over (partition by cast([DateTime] as date) order by [Price] asc) as least_for_day
FROM [dbo].[Prices]
)
SELECT [Time], count(*) 'Qty Cheapest' FROM prices
WHERE least_for_day = 1
GROUP BY [Time]
ORDER BY 2 DESC
That returns 24 rows.

Is there a way to count how many strings in a specific column are seen for the 1st time?

The value in column 2 sometimes gets repeated because some clients make several transactions at different times (a client can make a transaction in the first month and then another one the next year).
Is there a way for me to count how many IDs are completely new per month through a group by (never seen before)?
Please let me know if you need more context.
Thanks!
A simple way is two levels of aggregation. The inner level gets the first date for each customer. The outer summarizes by year and month:
select year(min_date), month(min_date), count(*) as num_firsts
from (select customerid, min(date) as min_date
from t
group by customerid
) c
group by year(min_date), month(min_date)
order by year(min_date), month(min_date);
Note that date/time functions depend on the database you are using, so the syntax for getting the year/month from the date may differ in your database.
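For example, in a database that uses EXTRACT rather than year()/month(), such as Postgres, the same two-level aggregation might look like this (same placeholder table and column names as above):
select extract(year from min_date) as yr,
extract(month from min_date) as mon,
count(*) as num_firsts
from (select customerid, min(date) as min_date
from t
group by customerid
) c
group by extract(year from min_date), extract(month from min_date)
order by yr, mon;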
You can do the following, which will assign a rank to each transaction within a particular customer_id (rank 1 therefore means it is the first order for that customer_id).
The above is placed in an inline view, and the inline view is then queried to give you the month and the count of customer_id for that month ONLY where the rank = 1.
I have tested this on Oracle and it works as expected.
SELECT DISTINCT
EXTRACT(MONTH FROM date_of_transaction) AS month,
COUNT(customer_id)
FROM
(
SELECT
date_of_transaction,
customer_id,
RANK() OVER(PARTITION BY customer_id
ORDER BY
date_of_transaction ASC
) AS rank
FROM
table_1
)
WHERE
rank = 1
GROUP BY
EXTRACT(MONTH FROM date_of_transaction)
ORDER BY
EXTRACT(MONTH FROM date_of_transaction) ASC;
First you should associate every ID with the year and month in which it is completely new, then count while grouping by year and month:
SELECT count(*) as new_customers, extract(year from t1.date) as year,
extract(month from t1.date) as month FROM table t1
WHERE not exists (SELECT 1 FROM table t2 WHERE t1.id = t2.id AND t2.date < t1.date)
GROUP BY year, month;
Your results will contain the new customer count, year, and month.

Sum over partition not working

I've got some code with the partition function, but it's not working.
I get an error message that says
Incorrect syntax near 'Sales'
Does anyone know why? I looked at the other partition questions but didn't find an answer.
The code (below) is supposed to select PriceZoneID and Sales from the Aggregated Sales History table, then sum up the total sales using the OVER function and put that data in a new column called Total Sales.
It should then sum up the sales for each zone using the OVER (PARTITION BY) expression in a new column called TotalSalesByZone, then order the data by Price Zone ID and Sales.
Select PriceZoneID,
Sales,
SUM(Sales) OVER () AS Total Sales,
SUM(Sales) OVER (PARTITION BY PriceZoneID) AS TotalSalesByZone
From AggregatedSalesHistory
ORDER BY PriceZoneID AND Sales;
(PARTITION BY divides the result into partitions, e.g. zones.)
If you could post the code with the correct answer, it would be greatly appreciated!
Coming out of the comments now, as it's getting a little silly to correct the errors in there. There is 1 typographical error in your code, and 1 syntax error:
Select PriceZoneID,
Sales,
SUM(Sales) OVER () AS Total Sales, --There's a space in the alias
SUM(Sales) OVER (PARTITION BY PriceZoneID) AS TotalSalesByZone
FROM AggregatedSalesHistory
ORDER BY PriceZoneID AND Sales; --AND is not valid in an ORDER BY clause
The correct query would be:
Select PriceZoneID,
Sales,
SUM(Sales) OVER () AS TotalSales, --Removed Space
SUM(Sales) OVER (PARTITION BY PriceZoneID) AS TotalSalesByZone
FROM AggregatedSalesHistory
ORDER BY PriceZoneID, Sales; --Comma delimited

How query works with missing Group by values

I am trying to write a query to pull the total discount as well as revenue of my customers' orders by day. Note that for each ship_id there will be several items, so I want the SUM clauses as below, and at the end I have that GROUP BY and I want to group by ship_id (which is unique for each order).
I got a result of 2M rows (this could be correct because we are big), but I am not sure if my GROUP BY is correct. What if I only put ship_id there? How would my query interpret that?
SELECT
d.ship_id
,to_char(d.order_datetime,'yyyy/mm/dd hh24:mi:ss') as datetime_order
,to_char(d.order_datetime, 'D') as day_order
,to_char(d.order_datetime, 'MM') as month_order
,sum(di.discount) as discount
,sum(di.price * di.units) AS price
FROM
table1 d
JOIN
table2 di
ON
d.ship_id = di.ship_id
GROUP BY
d.ship_id
,to_char(d.order_datetime,'yyyy/mm/dd hh24:mi:ss')
,to_char(d.order_datetime, 'D')
,to_char(d.order_datetime, 'MM')
In this query you are grouping the result by:
ship_id (the order)
to_char(d.order_datetime, 'yyyy/mm/dd hh24:mi:ss') (I think this is wrong if you only need to group by day, since it groups by year, month, day, hour, minute and second; see the sketch below)
to_char(d.order_datetime, 'D') (grouping by day, correct)
to_char(d.order_datetime, 'MM') (grouping by month, correct; it doesn't change the results of grouping by day)
P.S.: I don't see grouping by customer anywhere.
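If the goal is one row per day rather than one per order, a sketch along these lines (reusing the question's tables and aliases) groups on a day-level format only:
SELECT
to_char(d.order_datetime, 'yyyy/mm/dd') AS day_order, -- day only, no time-of-day component
sum(di.discount) AS discount,
sum(di.price * di.units) AS price
FROM
table1 d
JOIN
table2 di
ON
d.ship_id = di.ship_id
GROUP BY
to_char(d.order_datetime, 'yyyy/mm/dd')
Adding d.ship_id back into the SELECT and the GROUP BY would return one row per order again.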
Your query will summarize all values grouped by the items in the GROUP BY.
If you put something like:
select d.ship_id, sum(di.discount) as discount,
sum(di.price * di.units) as price
from table1 d join table2 di on d.ship_id = di.ship_id
group by d.ship_id
the query will return price and discount grouped by ship_id. When you use GROUP BY, you must put all non-aggregate* fields in the GROUP BY clause.
*Aggregate functions (SUM, AVG, COUNT, ...)
Try to explain your problem in more detail so I can understand your situation.