Getting Error Running Query with Concatenation - sql

I'm trying to run a query that concatenates two columns, counts the number of trips, and rounds the average trip duration, but I'm getting an error:
SELECT
usertype,
CONCAT(start_station_name, "to", end_station_name) AS Route,
COUNT(*) AS Num_Trips,
ROUND(AVG(CAST(tripduration AS INT64)/60), 2) AS Duration
FROM
`bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY
start_station_name, end_station_name, usertype
ORDER BY
Num_Trips DESC
LIMIT 10

Related

My query won't run, although I think it is correct. Does anyone know what I did wrong?

SELECT
usertype CONCAT(start_station_name ,"to", end_station_name) AS route,
COUNT (*) AS num_trips,
ROUND(AVG(cast(tripduration as int64/60),2) AS duration
FROM bigquery-public-data.new_york_citibike.citibike_trips
GROUP BY start_station, end_station, usertype
ORDER BY num_trips DESC LIMIT 10
BigQuery underlined this part of the query as a syntax error: (start_station_name,). I copied it exactly as my instructor wrote it in a course, but it didn't return a result.
I fixed the query for you:
SELECT
usertype,
CONCAT(start_station_name, "to", end_station_name) AS route,
COUNT(*) AS num_trips,
ROUND(AVG(CAST(tripduration AS INT64)/60), 2) AS duration
FROM bigquery-public-data.new_york_citibike.citibike_trips
GROUP BY start_station_name, end_station_name, usertype
ORDER BY num_trips DESC
LIMIT 10
There was a missing comma after usertype, a missing closing parenthesis after int64, and the GROUP BY used the wrong column names (start_station and end_station instead of start_station_name and end_station_name).
The query now runs and produces results.
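To sanity-check the corrected grouping logic without a BigQuery project, the sketch below runs a query of the same shape in Python's built-in sqlite3 module. The rows are invented for illustration, and SQLite's `||` operator stands in for BigQuery's CONCAT, so treat it as a stand-in under those assumptions rather than the BigQuery query itself.

```python
import sqlite3

# Toy stand-in for citibike_trips; the rows are invented for illustration.
rows = [
    ("Subscriber", "A St", "B St", 600),
    ("Subscriber", "A St", "B St", 900),
    ("Customer", "A St", "B St", 1200),
    ("Subscriber", "B St", "C St", 300),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE citibike_trips ("
    "usertype TEXT, start_station_name TEXT, "
    "end_station_name TEXT, tripduration INTEGER)"
)
conn.executemany("INSERT INTO citibike_trips VALUES (?, ?, ?, ?)", rows)

# SQLite divides integers as integers, so 60.0 is used to get minutes
# as a float (BigQuery's / operator already returns a float).
query = """
SELECT
  usertype,
  start_station_name || ' to ' || end_station_name AS route,
  COUNT(*) AS num_trips,
  ROUND(AVG(tripduration / 60.0), 2) AS duration
FROM citibike_trips
GROUP BY start_station_name, end_station_name, usertype
ORDER BY num_trips DESC
LIMIT 10
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Grouping by both station columns and usertype is what allows the concatenated route to appear in the SELECT list: every value it is built from is a grouping key.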

How to filter multiple conditions in BigQuery?

I want to filter my table with 2 conditions:
ride_length must be greater than 31
For the trips that have both same start_station_name and end_station_name, the ride_length must be greater than 60
I tried a query with a subquery, as below:
SELECT
TIMESTAMP_DIFF(ended_at,started_at,SECOND) AS ride_length,
start_station_name,
end_station_name
FROM
divvy_stations_trips.all_trips
WHERE
TIMESTAMP_DIFF(ended_at,started_at,SECOND) > 31 AND
(SELLECT
TIMESTAMP_DIFF(ended_at,started_at,SECOND)
FROM
divvy_stations_trips.all_trips
WHERE
TIMESTAMP_DIFF(ended_at,started_at,SECOND)>60 AND
start_station_name = end_station_name
)
I get this error: Syntax error: Parenthesized expression cannot be parsed as an expression, struct constructor, or subquery at [10:9]
It seems my subquery has an issue. Any help would be much appreciated!
Thanks in advance!
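The subquery you have created cannot be used as a condition in the WHERE clause: it returns a column of ride lengths, not a single boolean value. The misspelled SELLECT is also likely what triggers the syntax error at that position.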
One approach would be to use a UNION to combine two sets of results.
This should work:
(SELECT
TIMESTAMP_DIFF(ended_at,started_at,SECOND) AS ride_length,
start_station_name,
end_station_name
FROM
divvy_stations_trips.all_trips
WHERE
TIMESTAMP_DIFF(ended_at,started_at,SECOND) > 31 AND
start_station_name != end_station_name)
UNION ALL
(SELECT
TIMESTAMP_DIFF(ended_at,started_at,SECOND) AS ride_length,
start_station_name,
end_station_name
FROM
divvy_stations_trips.all_trips
WHERE
TIMESTAMP_DIFF(ended_at,started_at,SECOND) > 60 AND
start_station_name = end_station_name)
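The two UNION ALL branches can also be folded into a single WHERE clause, since the rule amounts to "longer than 31 seconds, and longer than 60 if the trip starts and ends at the same station". The sketch below checks on invented rows that both forms return the same trips. It uses Python's sqlite3 with a precomputed ride_length column in place of TIMESTAMP_DIFF, which is an assumption made to keep the example self-contained.

```python
import sqlite3

# Invented rows: (ride_length in seconds, start station, end station).
trips = [
    (20, "A", "B"),   # dropped: too short
    (45, "A", "B"),   # kept: different stations, > 31
    (45, "A", "A"),   # dropped: same station, not > 60
    (90, "A", "A"),   # kept: same station, > 60
    (31, "B", "C"),   # dropped: not strictly greater than 31
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE all_trips ("
    "ride_length INTEGER, start_station_name TEXT, end_station_name TEXT)"
)
conn.executemany("INSERT INTO all_trips VALUES (?, ?, ?)", trips)

union_query = """
SELECT ride_length, start_station_name, end_station_name
FROM all_trips
WHERE ride_length > 31 AND start_station_name != end_station_name
UNION ALL
SELECT ride_length, start_station_name, end_station_name
FROM all_trips
WHERE ride_length > 60 AND start_station_name = end_station_name
"""

single_where_query = """
SELECT ride_length, start_station_name, end_station_name
FROM all_trips
WHERE ride_length > 31
  AND (start_station_name != end_station_name OR ride_length > 60)
"""

union_rows = sorted(conn.execute(union_query).fetchall())
single_rows = sorted(conn.execute(single_where_query).fetchall())
print(union_rows)
print(single_rows)
```

The single WHERE scans the table once instead of twice. The two forms can differ when a station name is NULL, because a comparison with NULL is neither true nor false.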

How to avoid displaying duplicate values in rows

I have this query in which I try to display the most frequent trip between two stations by the day of the week:
SELECT startday AS Day, start_station_name, start_station_id, end_station_name, end_station_id, count(*) AS trip_counts
FROM table
GROUP BY startday, start_station_name, end_station_name, start_station_id, end_station_id
ORDER BY trip_counts DESC
And this is my current output:
I don't want rows with the same 'day' to repeat. I just want to display the row with the largest trip_counts for each day.
Row 8 should not be displayed, since row 1 already has the value 'Monday' and its trip_counts is larger.
You can use window functions:
SELECT s.*
FROM (SELECT startday AS Day, start_station_name, start_station_id,
end_station_name, end_station_id,
COUNT(*) AS trip_counts,
ROW_NUMBER() OVER (PARTITION BY startday ORDER BY COUNT(*) DESC) as seqnum
FROM table
GROUP BY startday, start_station_name, end_station_name, start_station_id, end_station_id
) s
WHERE seqnum = 1
ORDER BY trip_counts DESC
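The ROW_NUMBER approach can be tried out on a handful of invented rows. The sketch below uses Python's sqlite3 (window functions need SQLite 3.25 or newer) and computes the counts in a first step before ranking them, which is a restructuring of the query above for portability, not a different technique. Table and column contents are assumptions for illustration.

```python
import sqlite3

# Invented trips: (day, start station, end station).
trips = (
    [("Monday", "A", "B")] * 3
    + [("Monday", "B", "C")] * 2
    + [("Tuesday", "A", "C")] * 1
    + [("Tuesday", "C", "A")] * 4
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips ("
    "startday TEXT, start_station_name TEXT, end_station_name TEXT)"
)
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", trips)

query = """
WITH counted AS (
  -- one row per (day, route) with its trip count
  SELECT startday, start_station_name, end_station_name,
         COUNT(*) AS trip_counts
  FROM trips
  GROUP BY startday, start_station_name, end_station_name
),
ranked AS (
  -- number the routes within each day, busiest first
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY startday ORDER BY trip_counts DESC
         ) AS seqnum
  FROM counted
)
SELECT startday, start_station_name, end_station_name, trip_counts
FROM ranked
WHERE seqnum = 1
ORDER BY trip_counts DESC
"""
top_routes = conn.execute(query).fetchall()
for row in top_routes:
    print(row)
```

If two routes tie for first place on a day, ROW_NUMBER keeps only one of them, chosen arbitrarily; RANK() would keep all tied routes.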
I would simply turn your query into a view (so it is easier to access next time):
CREATE VIEW view_name
AS
SELECT startday AS Day, start_station_name, start_station_id, end_station_name, end_station_id, COUNT(*) AS trip_counts
FROM table
GROUP BY startday, start_station_name, end_station_name, start_station_id, end_station_id
ORDER BY trip_counts DESC
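Then you can use the query below to access only the entries with the maximum trip_counts. Note that it selects columns that are neither grouped nor aggregated, so it only runs on engines that permit this (for example MySQL with ONLY_FULL_GROUP_BY disabled), and the other columns are not guaranteed to come from the row with the maximum:
SELECT *, MAX(trip_counts) trip_counts
FROM view_name
GROUP BY day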

Bigquery - Select a column with not grouping them in group by clause

I have daily tables with Google Analytics data that is split by device_category (desktop/mobile/tablet) and user_type (new user/returning user).
My requirement is to query for the top-performing product in the month and simply see the device type and user type alongside it. I do not want to group by device_category and user_type.
When I exclude them from the GROUP BY, the query gives an error: "Query error: SELECT list expression references column device_category which is neither grouped nor aggregated at [3:21]"
QUERY THAT DOES NOT WORK(this is my requirement)
SELECT
month,
year,
device_category,
user_type,
product_name,
round(sum(item_revenue),2) as item_revenue
FROM
`ProjectName.DatasetName.GA_REPORT_3_*`
where
_table_suffix between '20201101' and '20210131'
and channel_grouping = 'Organic Search'
group by
month,
year,
channel_grouping,
product_name
order by
item_revenue desc;
QUERY THAT WORKS
SELECT
month,
year,
device_category,
user_type,
product_name,
round(sum(item_revenue),2) as item_revenue
FROM
`ProjectName.DatasetName.GA_REPORT_3_*`
where
_table_suffix between '20201101' and '20210131'
and channel_grouping = 'Organic Search'
group by
month,
year,
channel_grouping,
product_name,
device_category,
user_type
order by
item_revenue desc;
Sample Data
I know that some SQL tools let you select a column that is not in the GROUP BY clause, but the same does not work on BigQuery.
Could you help me with a workaround for this?
Technically, you can wrap device_category and user_type in ANY_VALUE, MAX, or MIN:
SELECT
month,
year,
ANY_VALUE(device_category) AS device_category,
ANY_VALUE(user_type) AS user_type,
product_name,
round(sum(item_revenue),2) as item_revenue
FROM
`ProjectName.DatasetName.GA_REPORT_3_*`
where
_table_suffix between '20201101' and '20210131'
and channel_grouping = 'Organic Search'
group by
month,
year,
channel_grouping,
product_name
order by
item_revenue desc;
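What wrapping a column in an aggregate does to the output is easier to see on a small invented table. SQLite has no ANY_VALUE, so the sketch below uses MAX as the stand-in (one of the options mentioned above) and GROUP_CONCAT to list every device that contributed to the total. The table name and rows are assumptions for illustration.

```python
import sqlite3

# Invented rows: (product_name, device_category, item_revenue).
sales = [
    ("Mug", "desktop", 10.0),
    ("Mug", "mobile", 5.5),
    ("Pen", "tablet", 2.25),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ga_report ("
    "product_name TEXT, device_category TEXT, item_revenue REAL)"
)
conn.executemany("INSERT INTO ga_report VALUES (?, ?, ?)", sales)

query = """
SELECT
  product_name,
  MAX(device_category) AS one_device,                     -- a single value per group
  GROUP_CONCAT(DISTINCT device_category) AS all_devices,  -- every value in the group
  ROUND(SUM(item_revenue), 2) AS total_revenue
FROM ga_report
GROUP BY product_name
ORDER BY total_revenue DESC
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

The revenue is summed across all devices, while one_device shows only one of them, so that column does not say which device earned the revenue. In BigQuery, STRING_AGG(DISTINCT device_category) plays the role GROUP_CONCAT plays here.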
You can use a subquery to achieve this:
SELECT
x.month,
x.year,
x.device_category,
x.user_type,
x.product_name,
ROUND(SUM(x.item_revenue),2) as item_revenue
FROM
(SELECT
month,
year,
device_category,
user_type,
product_name,
item_revenue
FROM `ProjectName.DatasetName.GA_REPORT_3_*`
WHERE _table_suffix BETWEEN '20201101' and '20210131'
AND channel_grouping = 'Organic Search'
) x
GROUP BY
x.month,
x.year,
x.product_name,
x.device_category,
x.user_type
ORDER BY ROUND(SUM(x.item_revenue),2) DESC;

Sessions per Hour and Minute: SQL - BigQuery / GA360

I want to measure the impact of TV-Advertising on website sessions and transactions.
Therefore, I need all (new) sessions per hour and minute to compare them with the TV airing times. I thought about selecting hits.hitNumber and filtering for values equal to 1 (the first hit). However, I get too few sessions, around one third of what GA displays.
Any help is appreciated!
SELECT
date,
hits.hour,
hits.minute,
hits.hitNumber,
totals.transactions,
hits.item.productName,
FROM [XXXXXXXX.ga_sessions_20180221]
WHERE hits.hitNumber =1
GROUP BY date, hits.hour, hits.minute, hits.hitNumber ,totals.transactions, hits.item.productName
There may be several sessions with the same values of the fields you are grouping by, so you should add session identifiers (in BQ, this is the combination of fullVisitorId and visitId) at least in the GROUP BY clause:
SELECT
fullVisitorId,
visitId,
date,
hits.hour,
hits.minute,
hits.hitNumber,
totals.transactions,
hits.item.productName,
FROM [XXXXXXXX.ga_sessions_20180221]
WHERE hits.hitNumber =1
GROUP BY
fullVisitorId, visitId, date, hits.hour, hits.minute, hits.hitNumber,
totals.transactions, hits.item.productName
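If the goal is a session count per hour and minute, the grouped rows still have to be counted. The sketch below, on invented first-hit rows, counts distinct (fullVisitorId, visitId) pairs per minute using Python's sqlite3. The flat table layout is an assumption made for illustration, since real GA360 exports nest the hits inside each session row.

```python
import sqlite3

# Invented first-hit rows: (fullVisitorId, visitId, hour, minute).
first_hits = [
    ("u1", 100, 20, 15),
    ("u2", 200, 20, 15),
    ("u1", 101, 20, 16),   # same visitor, new session
    ("u2", 200, 20, 15),   # duplicate row for the same session
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE first_hits ("
    "fullVisitorId TEXT, visitId INTEGER, hour INTEGER, minute INTEGER)"
)
conn.executemany("INSERT INTO first_hits VALUES (?, ?, ?, ?)", first_hits)

# A session is identified by the pair (fullVisitorId, visitId), so the
# two values are joined into one key before counting distinct sessions.
query = """
SELECT hour, minute,
       COUNT(DISTINCT fullVisitorId || '-' || visitId) AS sessions
FROM first_hits
GROUP BY hour, minute
ORDER BY hour, minute
"""
sessions_per_minute = conn.execute(query).fetchall()
print(sessions_per_minute)
```

Counting distinct session keys keeps duplicate rows for one session from inflating the total, and keeps two sessions that share every other grouped value from collapsing into one.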