Using a timestamp function in a GROUP BY - google-bigquery

I'm working with a large transaction data set and would like to count individual customer transactions grouped by month. I am unable to use the timestamp function in the GROUP BY; the query returns the following error:
BAD_QUERY (expression STRFTIME_UTC_USEC([DATESTART], '%b') in GROUP BY is invalid)
Is there a simple workaround to achieve this or should I build a calendar table (which may be the simplest option)?

You have to use an alias:
SELECT STRFTIME_UTC_USEC(DATESTART, '%b') as month, COUNT(TRANSACTION)
FROM datasetId.tableId
GROUP BY month

@Charles is correct, but as an aside you can also group by column number.
SELECT STRFTIME_UTC_USEC(DATESTART, '%b') as month, COUNT(TRANSACTION) as count
FROM [datasetId.tableId]
GROUP BY 1
ORDER BY 2 DESC
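For anyone who wants to try both forms locally, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for BigQuery, so strftime('%m', ...) replaces STRFTIME_UTC_USEC, and the transactions table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory table standing in for datasetId.tableId.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (datestart TEXT, transaction_id INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("2014-01-05", 1), ("2014-01-20", 2), ("2014-02-11", 3)],
)

# Group by the alias given to the date expression.
by_alias = conn.execute(
    "SELECT strftime('%m', datestart) AS month, COUNT(transaction_id) AS n "
    "FROM transactions GROUP BY month ORDER BY n DESC"
).fetchall()

# Group by the position of the expression in the select list.
by_position = conn.execute(
    "SELECT strftime('%m', datestart) AS month, COUNT(transaction_id) AS n "
    "FROM transactions GROUP BY 1 ORDER BY 2 DESC"
).fetchall()

print(by_alias)
print(by_position)
```

Both queries return the same rows here; the alias form tends to be easier to read, while the positional form saves typing when the expression is long.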

Related

Identify “Start” and “End” date of a trip in Netezza?

I want to get how many times a pack has been used in any given month. Each time the pack is activated, it can be used for 7 days.
Expected Result
I have tried Lag and lead along with nesting the query.
Here's what you need.
select
max([Date]) as Month_END,
Line_Number,
Pack_Code,
Product_Code,
count(1) as [Number Of Packs]
from
Table
group by
datepart(mm, [Date]),
Line_Number,
Pack_Code,
Product_Code
Some issue with my laptop: it posted my response before I had completed my answer :).
My previous answer gives you, in the count column, the total number of times a pack has been used in a month. You can then divide the aggregated value by 7 to identify the number of active packs. Hope it helps.
You can try converting the date to year month using the below syntax and then apply a group by on the required columns to get the count of the pack_code
to_char(to_Date(LOADED_DT,'YYYY-MM-DD HH24:MI:SS'),'YYYYMM') year_month
Sample query:
select a.year_month,a.line_number,a.pack_code,a.product_code,count(a.pack_code)
from (
select to_char(to_Date(LOADED_DT,'YYYY-MM-DD HH24:MI:SS'),'YYYYMM') year_month,line_number,pack_code,product_code
from <table_name> ) a
group by a.year_month,a.line_number,a.pack_code,a.product_code;

Distinct count and group by in HIVE

I am very new to HIVE and have an issue with distinct count and GROUP BY.
I want to calculate the maximum temperature from the temperature_data table for those years that have at least 2 entries in the table.
I tried the query below, but it is not working:
select
SUBSTRING(full_date,7,4) as year,
MAX(temperature) as temperature
from temperature_data
where count(distinct(SUBSTRING(full_date,7,4))) >= 2
GROUP BY SUBSTRING(full_date,7,4);
I am getting an error:
FAILED: SemanticException [Error 10128]: Line 2:0 Not yet supported place for UDAF 'count'
Below is the input:
year,zip,temperature
10-01-1990,123112,10
14-02-1991,283901,11
10-03-1990,381920,15
10-01-1991,302918,22
12-02-1990,384902,9
10-01-1991,123112,11
14-02-1990,283901,12
10-03-1991,381920,16
10-01-1990,302918,23
12-02-1991,384902,10
10-01-1993,123112,11
You should use the HAVING keyword instead to set a condition on an aggregate computed over each group.
You can also benefit from using a subquery. See below.
SELECT
year,
MAX(t1.temperature) as temperature
FROM
(select SUBSTRING(full_date,7,4) year, temperature from temperature_data) t1
GROUP BY
year
HAVING
count(t1.year) >= 2;
@R.Gold, we can simplify the above query by dropping the sub-query, as below:
SELECT substring(full_date,7) as year, max(temperature)
FROM your-hive-table
GROUP BY substring(full_date,7)
HAVING COUNT(substring(full_date,7)) >= 2
And, FYI: we can't use aggregate functions in a WHERE clause, because WHERE filters individual rows before any grouping happens.
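To check the HAVING approach against the sample input from the question, here is a small sketch. It assumes SQLite (via Python's sqlite3) in place of Hive, so substr stands in for SUBSTRING; the column name full_date is taken from the queries above rather than from the input header.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temperature_data (full_date TEXT, zip TEXT, temperature INTEGER)"
)
# Sample rows copied from the question (dates are DD-MM-YYYY).
conn.executemany(
    "INSERT INTO temperature_data VALUES (?, ?, ?)",
    [
        ("10-01-1990", "123112", 10), ("14-02-1991", "283901", 11),
        ("10-03-1990", "381920", 15), ("10-01-1991", "302918", 22),
        ("12-02-1990", "384902", 9), ("10-01-1991", "123112", 11),
        ("14-02-1990", "283901", 12), ("10-03-1991", "381920", 16),
        ("10-01-1990", "302918", 23), ("12-02-1991", "384902", 10),
        ("10-01-1993", "123112", 11),
    ],
)

# HAVING filters whole groups after aggregation; WHERE cannot do this.
max_by_year = conn.execute(
    "SELECT substr(full_date, 7, 4) AS year, MAX(temperature) AS temperature "
    "FROM temperature_data "
    "GROUP BY substr(full_date, 7, 4) "
    "HAVING COUNT(*) >= 2 "
    "ORDER BY year"
).fetchall()

print(max_by_year)
```

The year 1993 has a single row, so the HAVING condition drops it from the result.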

group by date part of datetime and get number of records for each

I have this so far:
select created_at,
DATEDIFF(TO_DATE(current_date()), TO_DATE(sales_flat_order.created_at)) as delay,
count(*) over() as NumberOfOrders
FROM
magentodb.sales_flat_order
WHERE
status IN ( 'packed' , 'cod_confirmed' )
GROUP BY TO_DATE(created_at)
But this is not working.
syntax error:
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or column reference 'created_at': (possible column names are: (tok_function to_date (tok_table_or_col created_at)))
count(*) over() does not give the count for each date group; it returns the count of all rows instead.
Note: I am actually using Hive, but its query syntax is very close to SQL.
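Try this. Every non-aggregated column in the select list has to appear in the GROUP BY, so select TO_DATE(created_at) rather than created_at:
select TO_DATE(created_at) as created_day,
DATEDIFF(current_date(), TO_DATE(created_at)) as delay,
count(*) as NumberOfOrders
FROM
magentodb.sales_flat_order
WHERE
status IN ( 'packed' , 'cod_confirmed' )
GROUP BY TO_DATE(created_at)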
I think you want to group by the date part (year, month, and day) of created_at.
select
date(created_at) as created_at_day,
datediff(curdate(), sales_flat_order.created_at) as delay,
count(*) as numberOfOrders
from magentodb.sales_flat_order
WHERE status IN ('packed', 'cod_confirmed' ) GROUP BY created_at_day
This query will show the delay of only one order per day, because you are grouping by the day. You can use AVG to find the average delay of the orders created on each day.
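Here is a rough, runnable sketch of grouping by the date part. It assumes SQLite (via Python's sqlite3) instead of Hive or MySQL, so date() and julianday() stand in for TO_DATE and DATEDIFF; the rows and the fixed "today" value are invented so the output is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_flat_order (created_at TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO sales_flat_order VALUES (?, ?)",
    [
        ("2015-04-01 09:00:00", "packed"),
        ("2015-04-01 17:30:00", "cod_confirmed"),
        ("2015-04-02 08:15:00", "packed"),
        ("2015-04-02 10:00:00", "shipped"),  # filtered out by the WHERE clause
    ],
)

today = "2015-04-05"  # fixed reference date, an assumption for this example

# One row per calendar day: the day, its age in days, and the order count.
# The delay depends only on the grouped day, so it is the same for every
# order in the group.
orders_per_day = conn.execute(
    "SELECT date(created_at) AS created_day, "
    "       CAST(ROUND(julianday(?) - julianday(date(created_at))) AS INTEGER) AS delay, "
    "       COUNT(*) AS number_of_orders "
    "FROM sales_flat_order "
    "WHERE status IN ('packed', 'cod_confirmed') "
    "GROUP BY created_day "
    "ORDER BY created_day",
    (today,),
).fetchall()

print(orders_per_day)
```

Because the delay is computed from the day itself rather than from the full timestamp, it does not matter which order in the group the engine picks.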
My phone won't allow me to post comments, but try this link; it might point you the right way.
stackoverflow.com/questions/29704904/invalid-table-alias-or-column-reference-b

Cannot Group by Year

Beginner SQL Question:
I'm trying to do a group by, by year and I'm getting funny results. I am using SQL Server 2008.
First, I tried
select count(applicationkey) , approveddate from ida.applications group by approveddate
to get a count of applications by date. However, I am interested in applications by year instead of by day, so I tried
select count(applicationkey) , approveddate from ida.applications group by year(approveddate)
When I do this, I get the error message "Column 'ida.applications.ApprovedDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
However, if I do this
select count(applicationkey) from ida.applications group by year(approveddate)
I get results. It's just that I want to be able to see which year matches which count, which I cannot do for some reason. Does anyone know why I am having this problem?
select count(applicationkey),
year(approveddate)
from ida.applications
group by year(approveddate)
The GROUP BY must match every field in the SELECT that is not inside an aggregate.
You have all the correct parts there, just include the year(approveddate) in your select like so
select count(applicationkey), year(approveddate)
from ida.applications
group by year(approveddate)
In a group query, columns selected have to be aggregate functions or appear in the group-by list, because otherwise SQL wouldn't know which of the multiple values for the column in the group to use.
You can fix your query easily by using
select count(applicationkey) , year(approveddate)
from ida.applications group by year(approveddate)
-- the year displayed is from the group by list
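A quick way to see the rule in action is the sketch below. It assumes SQLite (via Python's sqlite3) rather than SQL Server 2008, so strftime('%Y', ...) stands in for year(); the applications table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applications (applicationkey INTEGER, approveddate TEXT)")
conn.executemany(
    "INSERT INTO applications VALUES (?, ?)",
    [(1, "2010-03-01"), (2, "2010-07-15"), (3, "2011-01-09")],
)

# The same year expression appears in both the select list and the GROUP BY,
# so each output row carries the year that its count belongs to.
count_by_year = conn.execute(
    "SELECT strftime('%Y', approveddate) AS approved_year, COUNT(applicationkey) "
    "FROM applications "
    "GROUP BY strftime('%Y', approveddate) "
    "ORDER BY approved_year"
).fetchall()

print(count_by_year)
```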

Not using group by in second part of sum

This is my sum clause
Select *, sum(current_bal-curr_bal_now)/
current_bal from base
Group by month
This gives me an error because I'm not using current_bal in the group by.
Is there a way to avoid grouping by current_bal as well as month? Doing so completely messes up the output layout.
Thanks
Another guess...
SELECT *,
( sum(current_bal) OVER (PARTITION BY month) ) / current_bal
FROM base
The problem is that the sum returns only one value per month, while current_bal is different on every row in that month. Which one should it choose?
If what you want is to add up every division, something like this:
Select month, sum((current_bal - curr_bal_now) / current_bal)
from base
Group by month
will work.
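The two readings of the question can be compared side by side in the sketch below. It assumes SQLite 3.25 or newer (needed for window functions) through Python's sqlite3; the rows in base are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (month TEXT, current_bal REAL, curr_bal_now REAL)")
conn.executemany(
    "INSERT INTO base VALUES (?, ?, ?)",
    [("jan", 100.0, 80.0), ("jan", 200.0, 150.0), ("feb", 50.0, 25.0)],
)

# Reading 1: keep every row and divide the monthly total by that row's balance.
# The window function computes the per-month sum without collapsing rows.
per_row = conn.execute(
    "SELECT month, current_bal, "
    "       SUM(current_bal - curr_bal_now) OVER (PARTITION BY month) / current_bal "
    "FROM base ORDER BY month, current_bal"
).fetchall()

# Reading 2: one row per month, summing the per-row ratios.
per_month = conn.execute(
    "SELECT month, SUM((current_bal - curr_bal_now) / current_bal) "
    "FROM base GROUP BY month ORDER BY month"
).fetchall()

print(per_row)
print(per_month)
```

The window version keeps the original row layout, which seems to be what the question is after; the GROUP BY version returns a single figure per month.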