WITH
longest_used_bike AS (
SELECT
bikeid,
SUM(duration_minutes) AS trip_duration
FROM
`bigquery-public-data.austin_bikeshare.bikeshare_trips`
GROUP BY
bikeid
ORDER BY
trip_duration DESC
LIMIT 1
)
-- find station at which longest_used bike leaves most often
SELECT
trips.start_station_id,
COUNT(*) AS trip_ct
FROM
longest_used_bike AS longest
INNER JOIN
`bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
ON longest.bikeid = trips.bikeid
GROUP BY
trips.start_station_id
ORDER BY
trip_ct DESC
LIMIT 1
this query will give you a result thats 2575 but why does the result change to 3798 when you use full join instead of inner join? im trying to figure that one what but i am not sure what to think
A full join will include all entries from the trips table - regardless of whether or not they are joinable to the longest_used_bike ID (they will have a NULL value for the columns in longest)
Also see here for an explanation on join-types.
A tip: If you encounter things like these try to look at the queries unaggregated (omit the GROUP BY clause and the COUNT function) - you would then notice here that you'll suddenly have more (unwanted) rows in the FULL JOIN query.
An INNER JOIN will return only rows where the JOIN condition is satisfied. So only rows where there us a natch in both tables.
A FULL JOIN will return ALL rows from the left and all rows from the right with null values in the fields where there is not a natch.
Related
I am trying to figure out the station from which the bike begins a trip most frequently. The tutorial walks through the following code:
WITH
longest_used_bike AS (
SELECT
bikeid,
SUM(duration_minutes) AS trip_duration
FROM
bigquery-public-data.austin_bikeshare.bikeshare_trips
GROUP BY
bikeid
ORDER BY
trip_duration DESC
LIMIT 1
)
## find station at which the longest-used bike leaves
most often
SELECT
trips.start_station_id,
COUNT(*) AS trip_ct
FROM
longest_used_bike AS longest
INNER JOIN
`bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
ON longest.bikeid = trips.bikeid
GROUP BY
trips.end_station_id
ORDER BY
trip_ct DESC
LIMIT 1
I am getting error: "SELECT list expression references trips.start_station_id which is neither grouped nor aggregated at [17:5]"
How should I go about fixing it and still get the expected answer for the question,"In this activity, you wrote a query with an INNER JOIN to join your temporary table with the original bikeshare_trips table. Which station ID would your query return if you used a FULL JOIN instead of an INNER JOIN?" TIA
I am using the basic chinook database and I am trying to get a query that will display the worst selling genres. I am mostly getting the answer, however there is one genre 'Opera' that has 0 sales, but the query result is ignoring that and moving on to the next lowest non-zero value.
I tried using left join instead of inner join but that returns different values.
This is my query currently:
create view max
as
select distinct
t1.name as genre,
count(*) as Sales
from
tracks t2
inner join
invoice_items t3 on t2.trackid == t3.trackid
left join
genres as t1 on t1.genreid == t2.genreid
group by
t1.genreid
order by
2
limit 10;
The result however skips past the opera value which is 0 sales. How can I include that? I tried using left join but it yields different results.
Any help is appreciated.
If you want to include genres with no sales then you should start the joins from genres and then do LEFT joins to the other tables.
Also, you should not use count(*) which counts any row in the resultset.
SELECT g.name Genre,
COUNT(i.trackid) Sales
FROM genres g
LEFT JOIN tracks t ON t.genreid = g.genreid
LEFT JOIN invoice_items i ON i.trackid = t.trackid
GROUP BY g.genreid
ORDER BY Sales LIMIT 10;
There is no need for the keyword DISTINCT, since the query returns 1 row for each genre.
When asking for the top n one must always state how to deal with ties. If I am looking for the top 1, but there are three rows in the table, all with the same value, shall I select 3 rows? Zero rows? One row arbitrarily chosen? Most often we don't want arbitrary results, which excludes the last option. This excludes LIMIT, too, because LIMIT has no clause for ties in SQLite.
Here is an example with DENSE_RANK instead. You are looking for the worst selling genres, so we must probably look at the revenue per genre, which is the sum of price x quantity sold. In order to include genres without invoices (and maybe even without tracks?) we outer join this data to the genre table.
select total, genre_name
from
(
select
g.name as genre_name,
coalesce(sum(ii.unit_price * ii.quantity), 0) as total
dense_rank() over (order by coalesce(sum(ii.unit_price * ii.quantity), 0)) as rnk
from genres g
left join tracks t on t.genreid = g.genreid
left join invoice_items ii on ii.trackid = t.trackid
group by g.name
) aggregated
where rnk <= 10
order by total, genre_name;
I am making up a SQL query which will get all the transaction types from one table, and from the other table it will count the frequency of that transaction type.
My query is this:
with CTE as
(
select a.trxType,a.created,b.transaction_key,b.description,a.mode
FROM transaction_data AS a with (nolock)
RIGHT JOIN transaction_types b with (nolock) ON b.transaction_key = a.trxType
)
SELECT COUNT (trxType) AS Frequency, description as trxType,mode
from CTE where created >='2017-04-11' and created <= '2018-04-13'
group by trxType ,description,mode
The transaction_types table contains all the types of transactions only and transaction_data contains the transactions which have occurred.
The problem I am facing is that even though it's the RIGHT join, it does not select all the records from the transaction_types table.
I need to select all the transactions from the transaction_types table and show the number of counts for each transaction, even if it's 0.
Please help.
LEFT JOIN is so much easier to follow.
I think you want:
select tt.transaction_key, tt.description, t.mode, count(t.trxType)
from transaction_types tt left join
transaction_data t
on tt.transaction_key = t.trxType and
t.created >= '2017-04-11' and t.created <= '2018-04-13'
group by tt.transaction_key, tt.description, t.mode;
Notes:
Use reasonable table aliases! a and b mean nothing. t and tt are abbreviations of the table name, so they are easier to follow.
t.mode will be NULL for non-matching rows.
The condition on dates needs to be in the ON clause. Otherwise, the outer join is turned into an inner join.
LEFT JOIN is easier to follow (at least for people whose native language reads left-to-right) because it means "keep all the rows in the table you have already read".
MDW_CUSTOMER_ACCOUNTS has the following fields: ACCOUNT_ID, MEAL_ID.
MDW_MEALS_MENU has the following fields: MEAL_ID, MEAL_NAME.
I am trying to generate a report on the number of times a particular meal has been subscribed to by a customer using the query,
SELECT count(a.account_id), b.meal_id, b.meal_name
FROM mdw_meals_menu b LEFT JOIN mdw_customer_accounts a
on b.meal_id=a.meal_id
WHERE
a.start_date BETWEEN to_date('01-APR-2013','DD-MON-YYYY')
AND to_date('30-JUN-2013','DD-MON-YYYY')
GROUP BY b.meal_id, b.meal_name
ORDER BY count(a.account_id) desc, b.meal_id;
This only lists the MEAL_IDs that has been subscribed to at least once. But it is not displaying the Ids that have not been subscribed to.
How do I get these MEAL_IDs to print with the count being 0?
i have modified the code, but still i get the same result.
Your where clause is effectively turning your outer join back into an inner join - conditions on an outer-joined table should generally be in the join clause, like so:
SELECT count(a.account_id), b.meal_id, b.meal_name
FROM mdw_meals_menu b
LEFT JOIN mdw_customer_accounts a
on b.meal_id=a.meal_id and
a.start_date BETWEEN to_date('01-APR-2013','DD-MON-YYYY')
AND to_date('30-JUN-2013','DD-MON-YYYY')
GROUP BY b.meal_id, b.meal_name
ORDER BY count(a.account_id) desc, b.meal_id;
You should use a left outer join .
I am trying to get information from 3 tables in my database. I am trying to get 4 fields. 'kioskid', 'kioskhours', 'videotime', 'sessiontime'. In order to do this, i am trying a join in a subquery. This is what I have so far:
SELECT k.kioskid, k.hours, v.time, s.time
FROM `nsixty_kiosks` as k
LEFT JOIN (SELECT time
FROM `nsixty_videos`
ORDER BY videoid) as v
ON kioskid = k.kioskid LEFT JOIN
(SELECT kioskid, time
FROM `sessions`
ORDER BY pingid desc LIMIT 1) as s ON s.kioskid = k.kioskid
WHERE hours is NOT NULL
When I run this query, it works but it shows every row instead of just showing the last row of each kiosk id. Which is meant to show based on the line 'ORDER BY pingid desc LIMIT 1'.
Any body have some ideas?
Instead of joining to s, you can use a correlated subquery:
SELECT k.kioskid,
k.hours,
v.time,
( SELECT time
FROM sessions
WHERE sessions.kioskid = k.kioskid
ORDER
BY pingid DESC
LIMIT 1
)
FROM nsixty_kiosks AS k
LEFT
JOIN ( SELECT time
FROM `nsixty_videos`
ORDER BY videoid
) AS v
ON kioskid = k.kioskid
WHERE hours IS NOT NULL
;
N.B. I didn't fix your LEFT JOIN (...) AS v, because I don't understand what it's trying to do, but it too is broken; the ON clause doesn't refer to any of its columns, and there's no point in having an ORDER BY in a subquery unless you also have a LIMIT or whatnot in there.
Well, your join on the 'v' subquery doesn't actually reference the 'v' subquery, nor does the 'v' subquery even contain a kioskid field to JOIN on, so that's undoubtedly part of the problem.
To go much further we'd need to see schema and sample data.