How to filter on multiple conditions in BigQuery?

I want to filter my table on two conditions:
ride_length must be greater than 31.
For trips whose start_station_name and end_station_name are the same, ride_length must be greater than 60.
I tried the following query with a subquery:
SELECT
TIMESTAMP_DIFF(ended_at,started_at,SECOND) AS ride_length,
start_station_name,
end_station_name
FROM
divvy_stations_trips.all_trips
WHERE
TIMESTAMP_DIFF(ended_at,started_at,SECOND) > 31 AND
(SELLECT
TIMESTAMP_DIFF(ended_at,started_at,SECOND)
FROM
divvy_stations_trips.all_trips
WHERE
TIMESTAMP_DIFF(ended_at,started_at,SECOND)>60 AND
start_station_name = end_station_name
)
I get this error: Syntax error: Parenthesized expression cannot be parsed as an expression, struct constructor, or subquery at [10:9]
It seems my subquery has an issue. Any help would be much appreciated!
Thanks in advance!

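The subquery cannot be used in the WHERE clause this way: AND expects a boolean condition, but the subquery returns a set of ride lengths rather than true or false. SELECT is also misspelled as SELLECT inside the parentheses, which may be what triggers the parse error.
One approach is to use UNION ALL to combine two sets of results, one per condition.
This should work: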
(SELECT
TIMESTAMP_DIFF(ended_at,started_at,SECOND) AS ride_length,
start_station_name,
end_station_name
FROM
divvy_stations_trips.all_trips
WHERE
TIMESTAMP_DIFF(ended_at,started_at,SECOND) > 31 AND
start_station_name != end_station_name)
UNION ALL
(SELECT
TIMESTAMP_DIFF(ended_at,started_at,SECOND) AS ride_length,
start_station_name,
end_station_name
FROM
divvy_stations_trips.all_trips
WHERE
TIMESTAMP_DIFF(ended_at,started_at,SECOND) > 60 AND
start_station_name = end_station_name)
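As a possible alternative to the UNION ALL, the two conditions can probably be folded into a single WHERE clause with OR. The sketch below checks that idea on toy data, using Python's sqlite3 as a stand-in for BigQuery. The four sample rows and the stored ride_length column (in place of TIMESTAMP_DIFF) are assumptions made for illustration; only the table and column names come from the question.

```python
import sqlite3

# Toy stand-in for divvy_stations_trips.all_trips. ride_length is stored
# directly (in seconds) instead of being derived with TIMESTAMP_DIFF.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE all_trips ("
    "start_station_name TEXT, end_station_name TEXT, ride_length INTEGER)"
)
conn.executemany(
    "INSERT INTO all_trips VALUES (?, ?, ?)",
    [
        ("A", "B", 20),  # different stations, not > 31: dropped
        ("A", "B", 45),  # different stations, > 31: kept
        ("A", "A", 45),  # same station, not > 60: dropped
        ("A", "A", 90),  # same station, > 60: kept
    ],
)

# Both conditions expressed in one WHERE clause.
single_where = """
SELECT start_station_name, end_station_name, ride_length
FROM all_trips
WHERE (start_station_name != end_station_name AND ride_length > 31)
   OR (start_station_name =  end_station_name AND ride_length > 60)
ORDER BY ride_length
"""

# The UNION ALL form from the answer, for comparison.
union_all = """
SELECT start_station_name, end_station_name, ride_length
FROM all_trips
WHERE ride_length > 31 AND start_station_name != end_station_name
UNION ALL
SELECT start_station_name, end_station_name, ride_length
FROM all_trips
WHERE ride_length > 60 AND start_station_name = end_station_name
ORDER BY ride_length
"""

rows_single = conn.execute(single_where).fetchall()
rows_union = conn.execute(union_all).fetchall()
print(rows_single)
print(rows_single == rows_union)
```

In BigQuery the same WHERE clause would use TIMESTAMP_DIFF(ended_at, started_at, SECOND) in place of ride_length. Either form drops rows where a station name is NULL, because neither = nor != is true for NULL.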

Related

My query won't run, although I think it is correct. Does anyone know what I did wrong?

SELECT
usertype CONCAT(start_station_name ,"to", end_station_name) AS route,
COUNT (*) AS num_trips,
ROUND(AVG(cast(tripduration as int64/60),2) AS duration
FROM bigquery-public-data.new_york_citibike.citibike_trips
GROUP BY start_station, end_station, usertype
ORDER BY num_trips DESC LIMIT 10
BigQuery underlined this part of the query (start_station_name,) as a syntax error. I copied it exactly the way my instructor wrote it in a course, but it didn't return a result.
I fixed the query for you:
SELECT
usertype,
CONCAT(start_station_name, "to", end_station_name) AS route,
COUNT(*) AS num_trips,
ROUND(AVG(CAST(tripduration AS INT64)/60), 2) AS duration
FROM bigquery-public-data.new_york_citibike.citibike_trips
GROUP BY start_station_name, end_station_name, usertype
ORDER BY num_trips DESC LIMIT 10
There was a missing comma after usertype, a missing closing parenthesis after int64, and the GROUP BY used the wrong column names (start_station and end_station instead of start_station_name and end_station_name).
The query now runs and produces results.

Running a query returns previously filled cells as null

When I run a query that just brings back the columns, it works as intended, i.e.:
SELECT
start_station_name,
start_station_id,
end_station_name,
end_station_id,
FROM `casestudycyclistic-359515.DIvvyData.November2021` LIMIT 1000
This returns the proper station names and IDs.
However, when I run a query intended to add two columns to the table, those values come back as null in the result.
CREATE TABLE `casestudycyclistic-359515.DIvvyData.November2021rev2` AS (
SELECT
ride_id,
ended_at,
(ended_at - started_at) AS ride_length,
EXTRACT(DAYOFWEEK FROM ended_at) AS WEEKDAY,
started_at,
start_station_name,
start_station_id
end_station_name,
end_station_id,
start_lat,
start_lng,
end_lat,
end_lng,
member_casual,
FROM
`casestudycyclistic-359515.DIvvyData.November2021`
)
Most, but not all, of the values in those four columns come back as null. What have I done wrong here?
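One thing visible in the query as posted is that there is no comma after start_station_id, so the next name, end_station_name, is parsed as an alias for it (AS is optional in SQL, including in BigQuery). The sketch below reproduces that effect on a one-row toy table, using Python's sqlite3 as a stand-in for BigQuery. The table name trips and its sample values are assumptions, and this may not account for every null in the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (start_station_id TEXT, end_station_name TEXT)")
conn.execute("INSERT INTO trips VALUES ('S1', 'Clark St')")

# No comma between the two names: the second one is read as an alias,
# so a single column comes back, named end_station_name but holding
# the start_station_id values.
cur = conn.execute("SELECT start_station_id end_station_name FROM trips")
missing_comma_cols = [d[0] for d in cur.description]
missing_comma_rows = cur.fetchall()

# With the comma, both columns come back under their own names.
cur = conn.execute("SELECT start_station_id, end_station_name FROM trips")
with_comma_cols = [d[0] for d in cur.description]
with_comma_rows = cur.fetchall()

print(missing_comma_cols, missing_comma_rows)
print(with_comma_cols, with_comma_rows)
```

If this is what happened, the new table has no start_station_id column at all, and its end_station_name column holds station IDs, so adding the comma and re-creating the table would be a reasonable first thing to try.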

Getting Error Running Query with Concatenation

I'm trying to run a query that joins two columns together, counts the number of trips, and rounds the average trip duration. I'm getting an error:
SELECT
usertype,
CONCAT(start_station_name, "to", end_station_name) AS Route,
COUNT(*) AS Num_Trips,
ROUND(AVG(CAST(tripduration AS INT64)/60), 2) AS Duration
FROM
`bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY
start_station_name, end_station_name, usertype
ORDER BY
Num_Trips DESC
LIMIT 10

How to avoid displaying duplicate values in rows

I have this query in which I try to display the most frequent trip between two stations by the day of the week:
SELECT startday AS Day, start_station_name, start_station_id, end_station_name, end_station_id, count(*) AS trip_counts
FROM table
GROUP BY startday, start_station_name, end_station_name, start_station_id, end_station_id
ORDER BY trip_counts DESC
And this is my current output:
I don't want rows with the same day to repeat; I only want to display the row with the largest trip_counts for each day.
Row 8 should not be displayed, since row 1 already has the value 'Monday' and its trip_counts is larger.
You can use window functions:
SELECT s.*
FROM (SELECT startday AS Day, start_station_name, start_station_id,
end_station_name, end_station_id,
COUNT(*) AS trip_counts,
ROW_NUMBER() OVER (PARTITION BY startday ORDER BY COUNT(*) DESC) as seqnum
FROM table
GROUP BY startday, start_station_name, end_station_name, start_station_id, end_station_id
) s
WHERE seqnum = 1
ORDER BY trip_counts DESC
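To see the ROW_NUMBER approach on concrete data, here is a small sketch that uses Python's sqlite3 as a stand-in for the real database (window functions need SQLite 3.25 or newer). The table name trips, the seven sample rows, and the reduced column list are assumptions for illustration. The aggregation is split into a separate CTE before ranking, which is a variation on the query above rather than a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips ("
    "startday TEXT, start_station_name TEXT, end_station_name TEXT)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("Monday", "A", "B"), ("Monday", "A", "B"), ("Monday", "A", "B"),
        ("Monday", "C", "D"),
        ("Tuesday", "C", "D"), ("Tuesday", "C", "D"),
        ("Tuesday", "A", "B"),
    ],
)

query = """
WITH counted AS (
  -- one row per (day, route) with its trip count
  SELECT startday, start_station_name, end_station_name,
         COUNT(*) AS trip_counts
  FROM trips
  GROUP BY startday, start_station_name, end_station_name
),
ranked AS (
  -- number the routes within each day, busiest first
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY startday
                            ORDER BY trip_counts DESC) AS seqnum
  FROM counted
)
SELECT startday, start_station_name, end_station_name, trip_counts
FROM ranked
WHERE seqnum = 1
ORDER BY trip_counts DESC
"""

top_routes = conn.execute(query).fetchall()
print(top_routes)
```

ROW_NUMBER keeps exactly one row per day even when two routes tie for the highest count; RANK could be used instead if tied routes should all be shown.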
I would simply turn your query into a view (so it is easier to access next time):
CREATE VIEW view_name
AS
SELECT startday AS Day, start_station_name, start_station_id, end_station_name, end_station_id, COUNT(*) AS trip_counts
FROM table
GROUP BY startday, start_station_name, end_station_name, start_station_id, end_station_id
ORDER BY trip_counts DESC
Then you can use the query below to access only the entries with the maximum trip_counts:
SELECT *, MAX(trip_counts) trip_counts
FROM view_name
GROUP BY day

PostgreSQL: how to add WHERE to a query with count() and GROUP BY

How can I add a WHERE clause to my query?
SELECT count(*), date_trunc('year', "createdAt") AS txn_year
FROM tables
WHERE active = 1 // not working I don't know why
GROUP BY txn_year;
Thanks for any suggestions.
SELECT count(*), date_trunc('year', "createdAt") AS txn_year
FROM tables
WHERE column_active = 1
GROUP BY txn_year;
active is of type character varying, i.e. a string type. This should work:
SELECT count(*), date_trunc('year', "createdAt") AS txn_year
FROM tables
WHERE active = '1'
GROUP BY txn_year;