My query won't run, although I think it is correct. Does anyone know what I did wrong? - google-bigquery

SELECT
usertype CONCAT(start_station_name ,"to", end_station_name) AS route,
COUNT (*) AS num_trips,
ROUND(AVG(cast(tripduration as int64/60),2) AS duration
FROM bigquery-public-data.new_york_citibike.citibike_trips
GROUP BY start_station, end_station, usertype
ORDER BY num_trips DESC LIMIT 10
BigQuery underlined this part of the query as a syntax error: (start_station_name,). I copied it exactly the way my instructor wrote it in a course, but it didn't return a result.

fixed the query for you:
SELECT
usertype,
CONCAT(start_station_name, "to", end_station_name) AS route,
COUNT(*) AS num_trips,
ROUND(AVG(CAST(tripduration AS INT64) / 60), 2) AS duration
FROM bigquery-public-data.new_york_citibike.citibike_trips
GROUP BY start_station_name, end_station_name, usertype
ORDER BY num_trips DESC
LIMIT 10
There was a missing comma after usertype, a missing closing parenthesis after int64, and the GROUP BY used the wrong column names (start_station and end_station instead of start_station_name and end_station_name).
The query now runs and produces results.

Related

Oracle SQL group by to_char - not a group by expression

I want to group by dd-mm-yyyy format to show working_hours per employee (person) per day, but I get the error message ORA-00979: not a GROUP BY expression. When I remove TO_CHAR from the GROUP BY it works fine, but that's not what I want, because I need to group by day regardless of the hours. What am I doing wrong here?
SELECT papf.person_number emp_id,
to_char(sh21.start_time,'dd/mm/yyyy') start_time,
to_char(sh21.stop_time,'dd/mm/yyyy') stop_time,
SUM(sh21.measure) working_hours
FROM per_all_people_f papf,
hwm_tm_rec sh21
WHERE ...
GROUP BY
papf.person_number,
to_char(sh21.start_time,'dd/mm/yyyy'),
to_char(sh21.stop_time,'dd/mm/yyyy')
ORDER BY sh21.start_time
ORDER BY sh21.start_time
needs to either be just the column alias defined in the SELECT clause:
ORDER BY start_time
or use the expression in the GROUP BY clause:
ORDER BY to_char(sh21.start_time,'dd/mm/yyyy')
If you use sh21.start_time, the table_alias.column_name syntax refers to the underlying column from the table, and you are neither selecting nor grouping by that column.
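One thing to watch with either form: ordering by the TO_CHAR result sorts the dates as text, so with 'dd/mm/yyyy' the rows come out by day of month rather than chronologically. Below is a minimal sketch of one alternative, which groups and orders by TRUNC(start_time) and formats only in the SELECT list. The sample_rec rows and the single-table shape are made up for illustration (the join and WHERE clause in the question are not shown), so treat it as an example under those assumptions, not a drop-in replacement:

```sql
-- Oracle SQL. sample_rec is hypothetical data standing in for the
-- per_all_people_f / hwm_tm_rec join from the question.
WITH sample_rec AS (
  SELECT 101 AS person_number,
         TO_DATE('2023-02-02 09:00', 'yyyy-mm-dd hh24:mi') AS start_time,
         4 AS measure
  FROM dual
  UNION ALL
  SELECT 101, TO_DATE('2023-02-02 14:00', 'yyyy-mm-dd hh24:mi'), 3 FROM dual
  UNION ALL
  SELECT 101, TO_DATE('2023-03-01 09:00', 'yyyy-mm-dd hh24:mi'), 8 FROM dual
)
SELECT person_number AS emp_id,
       -- Format for display only; the grouping key stays a DATE.
       TO_CHAR(TRUNC(start_time), 'dd/mm/yyyy') AS start_day,
       SUM(measure) AS working_hours
FROM sample_rec
-- TRUNC drops the time of day, so both 2 February rows fall in one group.
GROUP BY person_number, TRUNC(start_time)
-- Ordering by the DATE value sorts 02/02/2023 before 01/03/2023.
ORDER BY TRUNC(start_time)
```

Ordering by the formatted string instead would put 01/03/2023 first, because '01' sorts before '02' as text.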

Getting Error Running Query with Concatenation

I'm trying to run a query that joins two columns together, counts the number of trips, and rounds the average trip duration. I'm getting an error.
SELECT
usertype,
CONCAT(start_station_name, "to", end_station_name) AS Route,
COUNT(*) AS Num_Trips,
ROUND(AVG(CAST(tripduration AS INT64)/60), 2) AS Duration
FROM
`bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY
start_station_name, end_station_name, usertype
ORDER BY
Num_Trips DESC
LIMIT 10

How to filter multiple conditions in BigQuery?

I want to filter my table with 2 conditions:
ride_length must be greater than 31
For trips where start_station_name and end_station_name are the same, ride_length must be greater than 60
I tried a query with a subquery, as below:
SELECT
TIMESTAMP_DIFF(ended_at,started_at,SECOND) AS ride_length,
start_station_name,
end_station_name
FROM
divvy_stations_trips.all_trips
WHERE
TIMESTAMP_DIFF(ended_at,started_at,SECOND) > 31 AND
(SELLECT
TIMESTAMP_DIFF(ended_at,started_at,SECOND)
FROM
divvy_stations_trips.all_trips
WHERE
TIMESTAMP_DIFF(ended_at,started_at,SECOND)>60 AND
start_station_name = end_station_name
)
I get this error: Syntax error: Parenthesized expression cannot be parsed as an expression, struct constructor, or subquery at [10:9]
It seems my subquery has an issue. Any help would be much appreciated!
Thanks in advance!
The subquery you have created cannot be used in the WHERE clause.
One approach would be to use a UNION to combine two sets of results.
This should work:
(SELECT
TIMESTAMP_DIFF(ended_at,started_at,SECOND) AS ride_length,
start_station_name,
end_station_name
FROM
divvy_stations_trips.all_trips
WHERE
TIMESTAMP_DIFF(ended_at,started_at,SECOND) > 31 AND
start_station_name != end_station_name)
UNION ALL
(SELECT
TIMESTAMP_DIFF(ended_at,started_at,SECOND) AS ride_length,
start_station_name,
end_station_name
FROM
divvy_stations_trips.all_trips
WHERE
TIMESTAMP_DIFF(ended_at,started_at,SECOND) > 60 AND
start_station_name = end_station_name)
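If you would rather scan the table once, the two conditions can also be folded into a single WHERE clause by choosing the threshold per row. This is only a sketch: the all_trips CTE below holds made-up rows so the example runs on its own, and the with_length name is mine, not from the question.

```sql
-- BigQuery Standard SQL. all_trips is hypothetical sample data standing in
-- for divvy_stations_trips.all_trips.
WITH all_trips AS (
  SELECT TIMESTAMP '2023-01-01 08:00:00' AS started_at,
         TIMESTAMP '2023-01-01 08:00:45' AS ended_at,
         'A' AS start_station_name,
         'B' AS end_station_name
  UNION ALL
  SELECT TIMESTAMP '2023-01-01 09:00:00', TIMESTAMP '2023-01-01 09:00:45', 'A', 'A'
  UNION ALL
  SELECT TIMESTAMP '2023-01-01 10:00:00', TIMESTAMP '2023-01-01 10:02:00', 'C', 'C'
),
with_length AS (
  -- Compute ride_length once so the WHERE clause can refer to it by name.
  SELECT
    TIMESTAMP_DIFF(ended_at, started_at, SECOND) AS ride_length,
    start_station_name,
    end_station_name
  FROM all_trips
)
SELECT ride_length, start_station_name, end_station_name
FROM with_length
-- Round trips (same start and end station) must exceed 60 seconds,
-- all other trips must exceed 31 seconds.
WHERE ride_length > IF(start_station_name = end_station_name, 60, 31)
```

With these sample rows, the 45-second A to B trip and the 120-second C to C trip are kept, and the 45-second A to A trip is filtered out. One difference from the UNION ALL version: if either station name is NULL, the comparison is NULL and IF falls through to the 31-second threshold, whereas the UNION ALL version drops such rows entirely.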

Finding some difficulty trying to ORDER BY the SUM(Revenue) with this query

I would like to order the result by SUM(Revenue). Below is my code; kindly help me fix it. Thank you.
SELECT
EXTRACT(YEAR FROM Release_date) AS year_released, COUNT(Genre) AS number_of_comedy,SUM(Revenue)AS total_revenue
FROM
Movie_data.movie
WHERE
Genre='Comedy'
GROUP BY
EXTRACT(YEAR FROM Release_date)
ORDER BY
SUM(Revenue)
LIMIT
1000
The error message I get is "SELECT list expression references column Release_date which is neither grouped nor aggregated at [2:19]"
You should be able to use what you have written. You can also write:
ORDER BY total_revenue
Often when ordering by revenue, you want the largest values first:
ORDER BY total_revenue DESC
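For reference, here is a self-contained sketch of the whole query using the aliases in both GROUP BY and ORDER BY. The movie CTE contains made-up rows in place of Movie_data.movie, so the numbers mean nothing. Grouping by the alias is just one way to guarantee that the SELECT expression and the GROUP BY expression cannot drift apart, which is what the reported error complains about.

```sql
-- BigQuery Standard SQL. movie is hypothetical sample data standing in
-- for Movie_data.movie.
WITH movie AS (
  SELECT DATE '2015-06-01' AS Release_date, 'Comedy' AS Genre, 100 AS Revenue
  UNION ALL
  SELECT DATE '2015-09-01', 'Comedy', 50
  UNION ALL
  SELECT DATE '2016-03-01', 'Comedy', 300
  UNION ALL
  SELECT DATE '2016-04-01', 'Drama', 999
)
SELECT
  EXTRACT(YEAR FROM Release_date) AS year_released,
  COUNT(Genre) AS number_of_comedy,
  SUM(Revenue) AS total_revenue
FROM movie
WHERE Genre = 'Comedy'
-- BigQuery lets GROUP BY refer to a SELECT-list alias.
GROUP BY year_released
-- Largest total first.
ORDER BY total_revenue DESC
LIMIT 1000
```

With the sample rows this returns 2016 (1 comedy, revenue 300) ahead of 2015 (2 comedies, revenue 150).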

Aggregate a variable derived from a timestamp in BigQuery

I want to find the most frequent part_of_day for each user. First I encode the timestamp as a part_of_day, then I aggregate to get the most frequent part_of_day. I use ARRAY_AGG to calculate the mode. However, I'm not sure how to handle the timestamp with ARRAY_AGG; I get an error, so my code structure might be wrong.
SELECT User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day,
case
when time BETWEEN '04:00:00' AND '12:00:00'
then "morning"
when time < '04:00:00' OR time > '20:00:00'
then "night"
end AS part_of_day
FROM (
SELECT User_ID,
TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS Time
COUNT(*) AS cnt
Error received:
Syntax error: Expected ")" but got identifier "COUNT" at [19:9]
Even though you did not share any sample data, I was able to identify some issues in your code.
I created some sample data based on the formats and functions you used, to keep things consistent. Below is the code, without any errors:
WITH data AS (
SELECT 98 as User_ID,DATETIME "2008-12-25 05:30:00.000000" AS Request_Timestamp, "something!" AS channel UNION ALL
SELECT 99 as User_ID,DATETIME "2008-12-25 22:30:00.000000" AS Request_Timestamp, "something!" AS channel
)
SELECT User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day1,
case
when time BETWEEN '04:00:00' AND '12:00:00'
then "morning"
when time < '04:00:00' OR time > '20:00:00'
then "night"
end AS part_of_day
FROM (
SELECT User_ID,
TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS time,
COUNT(*) AS cnt
FROM data
GROUP BY User_ID, Channel, Request_Timestamp
#order by Request_Timestamp
)
GROUP BY User_ID, Time;
First, notice that I changed the alias of the ARRAY_AGG() column; otherwise it would cause the error "Duplicate column name". Second, a comma was missing after your TIME_TRUNC() function, before COUNT(*). Third, in the inner GROUP BY you needed to group by Request_Timestamp as well, because it was neither aggregated nor grouped. Lastly, in the outer GROUP BY you needed to aggregate or group by time. After these corrections, your code will execute without any errors.
Note: the Syntax error: Expected ")" but got identifier "COUNT" at [19:9] error you experienced is due to the missing comma. The others would be shown after correcting this one.
If you want the most frequent part of each day, you need to use the day part in the aggregation:
SELECT User_ID,
ARRAY_AGG(part_of_day ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day
FROM (SELECT User_ID,
(case when time BETWEEN '04:00:00' AND '12:00:00' then 'morning'
when time < '04:00:00' OR time > '20:00:00' then 'night'
end) AS part_of_day,
COUNT(*) AS cnt
FROM cognitivebot2.chitchaxETL.conversations
GROUP BY User_ID, part_of_day
) u
GROUP BY User_ID;
Obviously, if you want the channel as well, then you need to include that in the queries.
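To try the per-user mode without access to the real table, here is a self-contained sketch of the same idea. The data CTE, the labelled and counted names, and the 'other' bucket for times between 12:00 and 20:00 are assumptions added for illustration; the question does not say how those hours should be labelled.

```sql
-- BigQuery Standard SQL. data is hypothetical sample data.
WITH data AS (
  SELECT 98 AS User_ID, DATETIME '2008-12-25 05:30:00' AS Request_Timestamp
  UNION ALL
  SELECT 98, DATETIME '2008-12-26 06:10:00'
  UNION ALL
  SELECT 98, DATETIME '2008-12-26 22:30:00'
  UNION ALL
  SELECT 99, DATETIME '2008-12-25 22:30:00'
),
labelled AS (
  -- Step 1: turn each request time into a part-of-day label.
  SELECT
    User_ID,
    CASE
      WHEN TIME(Request_Timestamp) BETWEEN TIME '04:00:00' AND TIME '12:00:00' THEN 'morning'
      WHEN TIME(Request_Timestamp) < TIME '04:00:00'
        OR TIME(Request_Timestamp) > TIME '20:00:00' THEN 'night'
      ELSE 'other'  -- assumption: the original CASE leaves these hours unlabelled
    END AS part_of_day
  FROM data
),
counted AS (
  -- Step 2: count how often each label occurs per user.
  SELECT User_ID, part_of_day, COUNT(*) AS cnt
  FROM labelled
  GROUP BY User_ID, part_of_day
)
-- Step 3: keep the label with the highest count for each user.
SELECT
  User_ID,
  ARRAY_AGG(part_of_day ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] AS most_frequent_part_of_day
FROM counted
GROUP BY User_ID
```

With the sample rows, user 98 comes out as 'morning' (two morning requests against one at night) and user 99 as 'night'. If two labels tie for a user, which one is returned is not defined unless you add a tie-breaker to the ORDER BY inside ARRAY_AGG.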