Group by date part of datetime and get number of records for each - Hive

I have this so far:
select created_at,
DATEDIFF(TO_DATE(current_date()), TO_DATE(sales_flat_order.created_at)) as delay,
count(*) over() as NumberOfOrders
FROM
magentodb.sales_flat_order
WHERE
status IN ( 'packed' , 'cod_confirmed' )
GROUP BY TO_DATE(created_at)
But this is not working. It fails with this error:
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or column reference 'created_at': (possible column names are: (tok_function to_date (tok_table_or_col created_at)))
Also, count(*) over() does not give the count for each date; it counts all of the rows instead.
Note: I am actually using Hive, but it is much like SQL when it comes to queries.

Try this. Select the same expression you group by, and use a plain count(*) instead of count(*) over():
select TO_DATE(created_at) as created_date,
DATEDIFF(current_date(), TO_DATE(created_at)) as delay,
count(*) as NumberOfOrders
FROM
magentodb.sales_flat_order
WHERE
status IN ( 'packed' , 'cod_confirmed' )
GROUP BY TO_DATE(created_at)

I think you want to group by the date part (year, month and day) of created_at.
select
date(created_at) as created_at_day,
datediff(curdate(), sales_flat_order.created_at) as delay,
count(*) as numberOfOrders
from magentodb.sales_flat_order
WHERE status IN ('packed', 'cod_confirmed')
GROUP BY created_at_day
Because the query groups by day, it returns one row per day, so the delay shown comes from a single order of that day rather than from each order. Use avg() if you want the average delay of the orders created on that day.
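The grouping idea can be checked on a few rows. Below is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for Hive. The rows and the fixed reference date '2016-01-10' are made up for illustration, and SQLite's date() and julianday() take the place of Hive's TO_DATE and DATEDIFF, so treat it as an illustration of the pattern rather than Hive syntax.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_flat_order (created_at TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO sales_flat_order VALUES (?, ?)",
    [
        # Hypothetical sample rows.
        ("2016-01-05 09:15:00", "packed"),
        ("2016-01-05 17:40:00", "cod_confirmed"),
        ("2016-01-07 11:00:00", "packed"),
        ("2016-01-07 12:30:00", "canceled"),  # removed by the WHERE filter
    ],
)

# Group by the date part only; the selected expressions are built from
# the grouping expression, and COUNT(*) counts the rows in each group.
rows = conn.execute(
    """
    SELECT date(created_at) AS created_day,
           CAST(julianday('2016-01-10') - julianday(date(created_at)) AS INTEGER) AS delay,
           COUNT(*) AS number_of_orders
    FROM sales_flat_order
    WHERE status IN ('packed', 'cod_confirmed')
    GROUP BY date(created_at)
    ORDER BY created_day
    """
).fetchall()

for row in rows:
    print(row)
```

A fixed reference date is used instead of the current date only to keep the output reproducible.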

I can't post comments from my phone, but try this link; it might point you the right way:
stackoverflow.com/questions/29704904/invalid-table-alias-or-column-reference-b

Related

Distinct count and group by in HIVE

I am very new to Hive and have an issue with distinct count and GROUP BY.
I want to calculate the maximum temperature from the temperature_data table for each year that has at least 2 entries in the table.
I tried the query below, but it is not working:
select
SUBSTRING(full_date,7,4) as year,
MAX(temperature) as temperature
from temperature_data
where count(distinct(SUBSTRING(full_date,7,4))) >= 2
GROUP BY SUBSTRING(full_date,7,4);
I am getting this error:
FAILED: SemanticException [Error 10128]: Line 2:0 Not yet supported place for UDAF 'count'
The input is below:
year,zip,temperature
10-01-1990,123112,10
14-02-1991,283901,11
10-03-1990,381920,15
10-01-1991,302918,22
12-02-1990,384902,9
10-01-1991,123112,11
14-02-1990,283901,12
10-03-1991,381920,16
10-01-1990,302918,23
12-02-1991,384902,10
10-01-1993,123112,11
You should use the HAVING keyword instead to put a condition on an aggregate computed over each group.
You can also benefit from using a subquery. See below.
SELECT
year,
MAX(t1.temperature) as temperature
FROM
(select SUBSTRING(full_date,7,4) year, temperature from temperature_data) t1
GROUP BY
year
HAVING
count(t1.year) >= 2;
@R.Gold, we can simplify the above query and drop the subquery, as below:
SELECT substring(full_date,7) as year, max(temperature) as temperature
FROM temperature_data
GROUP BY substring(full_date,7)
HAVING COUNT(substring(full_date,7)) >= 2
Also, FYI: aggregate functions cannot be used in a WHERE clause, because WHERE filters individual rows before any grouping happens.
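To see the HAVING filter at work on the input from the question, here is a minimal sketch in Python with the standard-library sqlite3 module. SQLite is only a stand-in for Hive here, and its substr() plays the role of SUBSTRING; the column names follow the query in the question.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temperature_data (full_date TEXT, zip TEXT, temperature INTEGER)"
)
conn.executemany(
    "INSERT INTO temperature_data VALUES (?, ?, ?)",
    [
        ("10-01-1990", "123112", 10),
        ("14-02-1991", "283901", 11),
        ("10-03-1990", "381920", 15),
        ("10-01-1991", "302918", 22),
        ("12-02-1990", "384902", 9),
        ("10-01-1991", "123112", 11),
        ("14-02-1990", "283901", 12),
        ("10-03-1991", "381920", 16),
        ("10-01-1990", "302918", 23),
        ("12-02-1991", "384902", 10),
        ("10-01-1993", "123112", 11),
    ],
)

# WHERE cannot see aggregates; HAVING runs after grouping, so it can
# keep only the years that have at least 2 rows.
rows = conn.execute(
    """
    SELECT substr(full_date, 7, 4) AS year,
           MAX(temperature) AS temperature
    FROM temperature_data
    GROUP BY substr(full_date, 7, 4)
    HAVING COUNT(*) >= 2
    ORDER BY year
    """
).fetchall()

for row in rows:
    print(row)
```

The year 1993 has a single entry, so it does not appear in the result.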

Percentage using two different columns, using GROUP BY in subquery

I have already written code for it, but it returns no response. I am using PostgreSQL.
What I ultimately want is the error percentage for each day (see the status column, which holds values such as 200 OK or NOT FOUND).
For example: 2016/07/22 - 1.5% error
P.S. The database is really big, with different statuses and dates; I want the result by date.
In the code below I am trying to compute (NOT FOUND count / total count) for each day.
The code is:
SELECT status, date(time), round(coun/total) AS percent
FROM log,
(SELECT count(*) AS coun
FROM log
WHERE status NOT LIKE '200 OK'
GROUP BY date(time)
ORDER BY date(time)) c,
(SELECT count(*) AS total
FROM log GROUP BY
date(time)
ORDER BY date(time)) t
GROUP BY date(time), status, percent
ORDER BY date(time);
The database I have looks like this: [image]
SELECT DISTINCT status, date(time) AS day,
round(100.0 * count(1) over (partition by date(time), status) / count(1) over (partition by date(time)), 2) AS percent
FROM log
This gives you the percentage of each status per day: not only the non-200 statuses but all of them, which makes more sense because the query does not filter on status.
Or, if you still want to count only the errors, you can add FILTER (WHERE status NOT LIKE '200 OK') before OVER on the first count and partition it by date(time) alone.
The general idea is to avoid scanning the same table three times and joining the results.
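The single-scan idea can also be written with plain conditional aggregation instead of window functions. The following is a minimal sketch in Python with the standard-library sqlite3 module standing in for PostgreSQL; the sample rows are hypothetical, and the CASE expression is a portable substitute for FILTER.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (status TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [
        # Hypothetical sample rows.
        ("200 OK", "2016-07-21 10:00:00"),
        ("200 OK", "2016-07-21 11:00:00"),
        ("200 OK", "2016-07-21 12:00:00"),
        ("404 NOT FOUND", "2016-07-21 13:00:00"),
        ("200 OK", "2016-07-22 09:00:00"),
        ("404 NOT FOUND", "2016-07-22 10:00:00"),
    ],
)

# One pass over the table: count the non-200 rows and all rows per day.
# Multiplying by 100.0 forces floating-point division.
rows = conn.execute(
    """
    SELECT date(time) AS day,
           ROUND(100.0 * SUM(CASE WHEN status <> '200 OK' THEN 1 ELSE 0 END)
                 / COUNT(*), 2) AS error_percent
    FROM log
    GROUP BY date(time)
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
```

Without the 100.0 factor, dividing one integer count by another truncates to 0 whenever the errors are fewer than the total.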

How do I query the sum of a field for rows with the same date?

I'm trying the code below
SELECT Sum(Price) FROM Faktor WHERE date=date
but it shows the total of all prices. I want to show the sum per day, like:
date ----- sum
2015/5/1 12345
2015/5/2 54124
I have also tried the code below, but I get an error:
SELECT date,Sum(Price) FROM Faktor WHERE date=date
[Err] 42000 - [SQL Server]Column 'Faktor.date' is invalid in the
select list because it is not contained in either an aggregate
function or the GROUP BY clause.
Try
SELECT [date],SUM([Price])
FROM Faktor
GROUP BY [date]
Not sure why you'd use date=date, so I left it out.
Just as the error message tells you, you need to use a group by clause and the date column needs to be in it.
SELECT date, SUM(Price)
FROM Faktor
WHERE date=date -- this looks a bit odd... maybe you want a range of dates or something?
GROUP BY date
With SQL Server, all non-aggregated columns in the select list need to be in the GROUP BY (unlike some versions of MySQL, for instance).
SELECT
date
,SUM(Price)
FROM
Faktor
--WHERE date >= 'someDate' -- uncomment and adapt if you have date criteria
GROUP BY
date
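As a quick check of the GROUP BY answers, here is a minimal sketch in Python with the standard-library sqlite3 module standing in for SQL Server. The individual prices are hypothetical and chosen so that the daily sums match the figures in the question; the column name is quoted because date is also a function name.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Faktor ("date" TEXT, Price INTEGER)')
conn.executemany(
    "INSERT INTO Faktor VALUES (?, ?)",
    [
        # Hypothetical sample rows.
        ("2015/5/1", 12000),
        ("2015/5/1", 345),
        ("2015/5/2", 54124),
    ],
)

# WHERE date = date is true for every non-NULL row, so it filters nothing
# and SUM runs over the whole table.
total = conn.execute(
    'SELECT SUM(Price) FROM Faktor WHERE "date" = "date"'
).fetchone()[0]

# GROUP BY produces one sum per distinct date.
per_day = conn.execute(
    'SELECT "date", SUM(Price) FROM Faktor GROUP BY "date" ORDER BY "date"'
).fetchall()

print(total)
for row in per_day:
    print(row)
```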

Using a timestamp function in a GROUP BY

I'm working with a large transaction data set and would like to group a count of individual customer transactions by month. I am unable to use the timestamp function in the GROUP BY and get the following error:
BAD_QUERY (expression STRFTIME_UTC_USEC([DATESTART], '%b') in GROUP BY is invalid)
Is there a simple workaround to achieve this or should I build a calendar table (which may be the simplest option)?
You have to use an alias:
SELECT STRFTIME_UTC_USEC(DATESTART, '%b') as month, COUNT(TRANSACTION)
FROM datasetId.tableId
GROUP BY month
@Charles is correct, but as an aside, you can also group by column number.
SELECT STRFTIME_UTC_USEC(DATESTART, '%b') as month, COUNT(TRANSACTION) as count
FROM [datasetId.tableId]
GROUP BY 1
ORDER BY 2 DESC
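Both forms, grouping by the alias and grouping by column position, can be tried locally. This is a minimal sketch in Python with the standard-library sqlite3 module standing in for BigQuery; the table and rows are hypothetical, and SQLite's strftime('%m', ...) (numeric month) replaces STRFTIME_UTC_USEC(..., '%b'), which SQLite does not have.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (datestart TEXT, customer TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        # Hypothetical sample rows.
        ("2015-01-03 08:00:00", "a"),
        ("2015-01-20 14:30:00", "b"),
        ("2015-02-11 19:45:00", "a"),
    ],
)

# Group by the alias given to the month expression.
by_alias = conn.execute(
    """
    SELECT strftime('%m', datestart) AS month, COUNT(*) AS n
    FROM transactions
    GROUP BY month
    ORDER BY month
    """
).fetchall()

# Group by column position: 1 refers to the first select-list column.
by_position = conn.execute(
    """
    SELECT strftime('%m', datestart) AS month, COUNT(*) AS n
    FROM transactions
    GROUP BY 1
    ORDER BY 1
    """
).fetchall()

print(by_alias)
print(by_position)
```

Grouping by month alone merges the same month of different years; include the year in the expression if the data spans more than one.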

Get the value just before the max

I have the following SQL query, and I want to get the value just before the max from the table.
select max(card_no),vehicle_number
FROM WBG.WBG_01_01
group by vehicle_number
With this query I get the maximum card number of each vehicle, but I want the one just before that max. For example,
if a vehicle has card numbers 21, 19, 17, 10, 5, 6, 1, I want to get 19.
Can anyone please tell me how to do this in SQL?
Another idea would be to use analytics, something like this:
select
vehicle_number,
prev_card_no
from (
select
card_no,
vehicle_number,
lag(card_no) over
(partition by vehicle_number order by card_no) as prev_card_no,
max(card_no) over
(partition by vehicle_number) as max_card_no
FROM WBG.WBG_01_01
)
where max_card_no = card_no;
Of course, this doesn't take into account your seemingly arbitrary ordering from your question, nor would it work with duplicate maximum numbers.
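Try this one, which compares each row against the max of its own vehicle:
select max(t1.card_no), t1.vehicle_number
FROM WBG.WBG_01_01 t1
where t1.card_no < (select max(t2.card_no) from WBG.WBG_01_01 t2 where t2.vehicle_number = t1.vehicle_number)
group by t1.vehicle_number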
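The "largest value below the maximum" approach needs the subquery to be evaluated per vehicle. Below is a minimal sketch in Python with the standard-library sqlite3 module as a stand-in for the original database; the table name wbg, the vehicle labels, and the rows for V2 and V3 are hypothetical, while V1 uses the card numbers from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the original one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wbg (vehicle_number TEXT, card_no INTEGER)")
conn.executemany(
    "INSERT INTO wbg VALUES (?, ?)",
    [("V1", n) for n in (21, 19, 17, 10, 5, 6, 1)]
    + [("V2", 8), ("V2", 3)]
    + [("V3", 4)],  # only one card, so there is no previous value
)

# The correlated subquery returns the max for the current row's vehicle;
# the outer MAX then picks the largest card number strictly below it.
rows = conn.execute(
    """
    SELECT t1.vehicle_number, MAX(t1.card_no) AS prev_card_no
    FROM wbg t1
    WHERE t1.card_no < (SELECT MAX(t2.card_no)
                        FROM wbg t2
                        WHERE t2.vehicle_number = t1.vehicle_number)
    GROUP BY t1.vehicle_number
    ORDER BY t1.vehicle_number
    """
).fetchall()

for row in rows:
    print(row)
```

A vehicle with a single card number, like V3 here, drops out of the result because no row passes the WHERE filter.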