Joining a table on itself - sql

Is there a better way to write this SQL query?
SELECT *, (SELECT TOP 1 columnB FROM mytable WHERE mytable.columnC = T1.columnC ORDER BY columnD) as firstRecordOfColumnB
FROM
(SELECT * FROM mytable WHERE columnA = 'apple') as T1
Notice that columnC is not the primary key.

If keyColumn really is a key column (i.e. unique), then the query can definitely be written more elegantly and efficiently:
SELECT
*, columnB
FROM
mytable
WHERE
columnA = 'apple'

This might perform better:
SELECT
*,
(SELECT TOP 1 myLookupTable.columnB FROM mytable AS myLookupTable WHERE myLookupTable.keyColumn = mytable.keyColumn) as firstRecordOfColumnB
FROM
mytable
WHERE
columnA = 'apple'
But for the TOP 1 part I don't know any better solution.
Edit:
If the keyColumn is unique, the data in firstRecordOfColumnB would be the same as in mytable.columnB.
If it is not unique, you at least need to sort the data to get a meaningful TOP 1, for example:
SELECT
*,
(SELECT TOP 1 myLookupTable.columnB FROM mytable AS myLookupTable WHERE myLookupTable.keyColumn = mytable.keyColumn
ORDER BY myLookupTable.sortColumn) as firstRecordOfColumnB
FROM
mytable
WHERE
columnA = 'apple'
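To try the idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented for illustration, SQLite's LIMIT 1 stands in for TOP 1, and the window-function variant (FIRST_VALUE) is an alternative that was not part of the answers above. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (columnA TEXT, columnB TEXT, columnC INTEGER, columnD INTEGER);
-- Invented sample data: two groups of columnC, one 'apple' row in each.
INSERT INTO mytable VALUES
  ('apple',  'b1', 10, 3),
  ('pear',   'b2', 10, 1),
  ('apple',  'b3', 20, 5),
  ('banana', 'b4', 20, 7);
""")

# Correlated subquery, as in the question (LIMIT 1 instead of TOP 1).
correlated = conn.execute("""
SELECT t1.columnA, t1.columnB, t1.columnC,
       (SELECT t2.columnB
        FROM mytable AS t2
        WHERE t2.columnC = t1.columnC
        ORDER BY t2.columnD
        LIMIT 1) AS firstRecordOfColumnB
FROM mytable AS t1
WHERE t1.columnA = 'apple'
ORDER BY t1.columnC
""").fetchall()

# Window-function variant: compute the first columnB per columnC over the
# whole table, and only then filter on columnA.
windowed = conn.execute("""
SELECT columnA, columnB, columnC, firstRecordOfColumnB
FROM (SELECT *,
             FIRST_VALUE(columnB) OVER (PARTITION BY columnC
                                        ORDER BY columnD) AS firstRecordOfColumnB
      FROM mytable)
WHERE columnA = 'apple'
ORDER BY columnC
""").fetchall()

print(correlated)
print(windowed)
```

Note that the filter on columnA has to stay outside the windowed subquery; otherwise the "first" record would be picked among the apple rows only.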

Related

SQL select rows based on CASE statement

I want to get each ID's 'good' value if it exists, and its 'bad' value if it doesn't.
If an ID has index='good', want to return that row for the ID.
If an ID ONLY has index='bad', want to get that one.
How would you go about it?
You can try the below -
with cte as
(
select id, index, value, row_number() over(partition by id order by case when index='good' then 1 else 2 end) as rn
from tablename
)
select id,index, value
from cte where rn=1
Another option is a FULL OUTER JOIN between the 'good' rows and the 'bad' rows:
SELECT
COALESCE(good.id, bad.id) AS id,
COALESCE(good.index, bad.index) AS index,
COALESCE(good.value, bad.value) AS value
FROM (SELECT * FROM data WHERE index='good') AS good
FULL OUTER JOIN (SELECT * FROM data WHERE index='bad') AS bad on good.id=bad.id
You can use EXISTS to check if there is index = 'good' for each id:
SELECT t1.*
FROM tablename t1
WHERE t1.index = 'good'
OR NOT EXISTS (
SELECT 1
FROM tablename t2
WHERE t2.id = t1.id AND t2.index = 'good'
);
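The EXISTS and ROW_NUMBER approaches can be compared side by side with a small sketch in Python's sqlite3 module. The sample rows are invented, and the column is named idx here (an assumption for this sketch) because index is a reserved word in SQLite and several other dialects. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablename (id INTEGER, idx TEXT, value INTEGER);
-- Invented data: id 1 has both rows, id 2 only 'bad', id 3 only 'good'.
INSERT INTO tablename VALUES
  (1, 'good', 100),
  (1, 'bad',  101),
  (2, 'bad',  200),
  (3, 'good', 300);
""")

# Keep 'good' rows, plus any row whose id has no 'good' row at all.
exists_rows = conn.execute("""
SELECT t1.id, t1.idx, t1.value
FROM tablename AS t1
WHERE t1.idx = 'good'
   OR NOT EXISTS (SELECT 1
                  FROM tablename AS t2
                  WHERE t2.id = t1.id AND t2.idx = 'good')
ORDER BY t1.id
""").fetchall()

# Rank rows per id so that 'good' sorts first, then keep rank 1.
ranked_rows = conn.execute("""
WITH cte AS (
  SELECT id, idx, value,
         ROW_NUMBER() OVER (PARTITION BY id
                            ORDER BY CASE WHEN idx = 'good' THEN 1 ELSE 2 END) AS rn
  FROM tablename
)
SELECT id, idx, value FROM cte WHERE rn = 1 ORDER BY id
""").fetchall()

print(exists_rows)
print(ranked_rows)
```

The two agree on this data. They differ when an id has several 'good' rows (or several 'bad' rows and no 'good' one): EXISTS returns all of them, while ROW_NUMBER returns exactly one row per id.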

GoogleSQL - SELECT IF

I'm working with a dataset - structured like this
I want to exclude all records with ReviewRound being "a" if they have gone through review round "b" - If a set of unique ID's has an associated round "b" review, the round "a" review should not be included.
Some records have not gone to round "b". The issues I'm running into are as a result of there being multiple records for each unique ID.
Ideally this could be done in GoogleBigQuery, if not, filtering through GoogleScripts may also be an option!
Any suggestions would be appreciated!
If a set of unique ID's has an associated round "b" review, the round "a" review should not be included.
If I followed you correctly, you could express this as a not condition with a correlated subquery that ensures that, if the current record has ReviewRound = 'a', there is no other record that has the same id and ReviewRound = 'b'.
select t.*
from mytable t
where not (
t.ReviewRound = 'a'
and exists (
select 1
from mytable t1
where t1.id = t.id and t1.ReviewRound = 'b'
)
)
You can do this with window functions as well:
select t.* except (num_bs)
from (select t.*,
countif(reviewround = 'b') over (partition by id) as num_bs
from t
) t
where num_bs = 0 or reviewround = 'b';
By using window functions, you can solve it with this query
SELECT ID, Score
FROM (
SELECT *,
MAX(CASE WHEN ReviewRound = 'b' THEN 1 ELSE 0 END) OVER (partition by ID) as has_b
FROM mytable
) t
WHERE has_b = 0 OR ReviewRound = 'b'
Re-conceptualizing as keeping only the latest review round, I would try:
select * from mytable join
(select ID, max(ReviewRound) as ReviewRound from mytable group by ID)
using (ID, ReviewRound)
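As a quick way to check the filtering logic outside BigQuery, here is a sketch in Python's sqlite3 module. The rows and the Score column values are invented, and SUM(CASE ...) replaces BigQuery's COUNTIF, which SQLite does not have. Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (ID INTEGER, ReviewRound TEXT, Score INTEGER);
-- Invented data: IDs 1 and 3 reached round 'b', ID 2 did not.
INSERT INTO mytable VALUES
  (1, 'a', 10), (1, 'b', 12),
  (2, 'a', 7),
  (3, 'a', 5),  (3, 'a', 6), (3, 'b', 9);
""")

# Correlated subquery: drop a round 'a' row when the same ID has a round 'b' row.
not_exists = conn.execute("""
SELECT t.ID, t.ReviewRound, t.Score
FROM mytable AS t
WHERE NOT (t.ReviewRound = 'a'
           AND EXISTS (SELECT 1
                       FROM mytable AS t1
                       WHERE t1.ID = t.ID AND t1.ReviewRound = 'b'))
ORDER BY t.ID, t.Score
""").fetchall()

# Window function: count the round 'b' rows per ID, then filter.
windowed = conn.execute("""
SELECT ID, ReviewRound, Score
FROM (SELECT *,
             SUM(CASE WHEN ReviewRound = 'b' THEN 1 ELSE 0 END)
               OVER (PARTITION BY ID) AS num_bs
      FROM mytable)
WHERE num_bs = 0 OR ReviewRound = 'b'
ORDER BY ID, Score
""").fetchall()

print(not_exists)
print(windowed)
```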

How to optimize a SQL Query containing a max?

Please note that both T and T1 refer to same table.
We are trying to retrieve a maximum value and while retrieving max value, we
are interested in those rows, which have equal columnC values.
select *
from table T
where T.columnA in (0,1,2,3)
and T.columnB = (select max(T1.columnB)
from table T1
where T1.columnC = T.columnC)
This type of query is typically more efficient using window functions:
select *
from (
select *,
max(columnb) over (partition by columnc) as max_b
from the_table
where columna in (0,1,2,3)
) t
where columnb = max_b;
If the condition on columna is very selective, an index on that column would help. Some optimizers might generate more efficient plans if you change columna in (0,1,2,3) into columna between 0 and 3 (the two are equivalent only if columna holds integers).
a_horse_with_no_name is correct that window functions are generally a better approach. Regardless of whether you use window functions or your original query, indexes will help.
In particular, you want indexes on T(columnc, columnb) and T(columnA). That is two separate indexes. The SQL optimizer should be able to take advantage of the indexes both for your query and for the window functions approach.
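One detail worth checking before swapping queries: the original query takes the maximum over every row with the same columnC, while a window-function rewrite that filters on columnA inside the subquery takes the maximum over the filtered rows only. The sketch below, in Python's sqlite3 module with invented data and a table named the_table, shows the two can return different rows. Which behaviour is wanted depends on the requirement, so treat this as an illustration rather than a verdict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_table (columnA INTEGER, columnB INTEGER, columnC TEXT);
-- Invented data: in group 'x' the largest columnB sits on a row with columnA = 7,
-- which the filter columnA IN (0,1,2,3) excludes.
INSERT INTO the_table VALUES
  (0, 5, 'x'), (1, 9, 'x'), (7, 20, 'x'),
  (2, 4, 'y'), (3, 4, 'y'), (9, 1, 'y');
""")

# Original form: max over ALL rows of the group, filter applied to the outer rows.
correlated = conn.execute("""
SELECT columnA, columnB, columnC
FROM the_table AS T
WHERE T.columnA IN (0, 1, 2, 3)
  AND T.columnB = (SELECT MAX(T1.columnB)
                   FROM the_table AS T1
                   WHERE T1.columnC = T.columnC)
ORDER BY columnC, columnA
""").fetchall()

# Window form with the filter INSIDE: max over the filtered rows only.
filter_inside = conn.execute("""
SELECT columnA, columnB, columnC
FROM (SELECT *, MAX(columnB) OVER (PARTITION BY columnC) AS max_b
      FROM the_table
      WHERE columnA IN (0, 1, 2, 3))
WHERE columnB = max_b
ORDER BY columnC, columnA
""").fetchall()

# Window form with the filter OUTSIDE: matches the original query.
filter_outside = conn.execute("""
SELECT columnA, columnB, columnC
FROM (SELECT *, MAX(columnB) OVER (PARTITION BY columnC) AS max_b
      FROM the_table)
WHERE columnB = max_b AND columnA IN (0, 1, 2, 3)
ORDER BY columnC, columnA
""").fetchall()

print(correlated)
print(filter_inside)
print(filter_outside)
```

On this data the filter-inside variant additionally returns the row (1, 9, 'x'), because 9 is the largest columnB among the filtered rows of group 'x'.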
I'm not sure at which level you want the columnA filter applied, but maybe something like this:
Select tt1.* from table tt1
inner join
(
select * from
table t1
inner join
( select max(T0.columnB) max_columnB
from table t0 ) t2
on t1.columnB = t2.max_columnB
) tt2
on tt1.columnC = tt2.columnC
Where tt1.columnA in (0,1,2,3)
For this to run fast, you need indexes on columnA, columnB and columnC.

Big Query view (table without duplicate rows)

I need to create a view that is pretty much just like some table with some simple transformations and I want to make sure the values in a particular column are not duplicate.
So let's say the table looks like this:
ID, ColumnA, ColumnB
-------------------
1 cars shirts
2 tvs dogs
1 fingers computers
And the resulting view would look like this:
ID, ColumnA, ColumnB
-------------------
1 cars shirts
2 tvs dogs
So, is there an equivalent to SELECT DISTINCT(ID), ColumnA, ColumnB?
What's the most efficient way to do it?
If you just want an arbitrary row for each ID, use ANY_VALUE:
#standardSQL
WITH Input AS (
SELECT 1 AS ID, 'cars' AS ColumnA, 'shirts' AS ColumnB UNION ALL
SELECT 2 AS ID, 'tvs' AS ColumnA, 'dogs' AS ColumnB UNION ALL
SELECT 1 AS ID, 'fingers' AS ColumnA, 'computers' AS ColumnB
)
SELECT
ANY_VALUE(t).*
FROM Input AS t
GROUP BY t.ID;
Or you can use the ARRAY_AGG trick to select the latest row based on a condition.
Below is for BigQuery Standard SQL
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, 'cars' AS columnA, 'shirts' AS columnB UNION ALL
SELECT 2, 'tvs', 'dogs' UNION ALL
SELECT 1, 'fingers', 'computers'
)
SELECT r.*
FROM (
SELECT ARRAY_AGG(t ORDER BY columnA LIMIT 1)[OFFSET (0)] AS r
FROM yourTable t
GROUP BY id
)
-- ORDER BY id
Note: you need some rule for choosing the row with cars over the row with fingers.
The version above, as an example, keeps the first row in ascending order of columnA.
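The same "one row per id, chosen by an explicit ordering" rule can also be written with ROW_NUMBER. The sketch below uses Python's sqlite3 module rather than BigQuery, so it only illustrates the logic; the view name deduped is hypothetical, and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourTable (id INTEGER, columnA TEXT, columnB TEXT);
INSERT INTO yourTable VALUES
  (1, 'cars',    'shirts'),
  (2, 'tvs',     'dogs'),
  (1, 'fingers', 'computers');

-- Hypothetical view: keep, per id, the row that sorts first by columnA.
CREATE VIEW deduped AS
SELECT id, columnA, columnB
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY columnA) AS rn
      FROM yourTable)
WHERE rn = 1;
""")

rows = conn.execute("SELECT id, columnA, columnB FROM deduped ORDER BY id").fetchall()
print(rows)
```

Changing the ORDER BY inside the OVER clause changes which duplicate survives, for example a timestamp column in descending order to keep the latest row.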

SQL Server 2005 - Select top N plus "Other"

I have a table from which I want to select the top 5 rows by some column A. I also want a 6th row titled 'Other' which sums the values in column A for all but the top 5 rows.
Is there an easy way to do this? I'm starting with:
select top 5
columnB, columnA
from
someTable t
order by
columnA desc
Not tested, but try something like this:
select * from (
select columnB, columnA from (
select top 5
columnB, columnA
from
someTable t
order by
columnA desc
) top5
union all
select
null, sum(columnA)
from
someTable t
where primaryKey not in (
select top 5
primaryKey
from
someTable t
order by
columnA desc
)
) a
select top 5 columnB, columnA
from someTable
order by columnA desc
select SUM(columnA) as Total
from someTable
Do the subtraction on the client side.
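A minimal sketch of that client-side subtraction, written in Python with the built-in sqlite3 module as a stand-in for the real client and database. The sample rows are invented, and LIMIT 5 replaces TOP 5 because SQLite has no TOP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE someTable (columnB TEXT, columnA INTEGER);
-- Invented data: seven rows, so two of them fall outside the top 5.
INSERT INTO someTable VALUES
  ('a', 50), ('b', 40), ('c', 30), ('d', 20), ('e', 10), ('f', 5), ('g', 3);
""")

# Query 1: the top 5 rows by columnA.
top5 = conn.execute(
    "SELECT columnB, columnA FROM someTable ORDER BY columnA DESC LIMIT 5"
).fetchall()

# Query 2: the grand total of columnA.
(total,) = conn.execute("SELECT SUM(columnA) FROM someTable").fetchone()

# Client side: 'Other' is whatever the top 5 do not account for.
other = total - sum(value for _, value in top5)
report = top5 + [("Other", other)]

print(report)
```

This costs two round trips but keeps both queries simple enough for the optimizer.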
100% untested, and off the top of my head, but you can give something like this a go. If I have a chance to test tonight I'll update the post, but there's a bottle of wine open for dinner and it's Friday night... :)
WITH CTE AS
(
SELECT
ColumnB,
ColumnA,
ROW_NUMBER() OVER (ORDER BY ColumnA DESC) AS RowNumber
FROM
dbo.SomeTable
)
SELECT
CASE WHEN RowNumber <= 5 THEN ColumnB ELSE 'Other' END AS ColumnB,
SUM(ColumnA) AS ColumnA
FROM
CTE
GROUP BY
CASE WHEN RowNumber <= 5 THEN ColumnB ELSE 'Other' END
ORDER BY
MIN(RowNumber)
EDIT: Looks like this worked after a couple of silly syntax errors. I've corrected those, so it should work as listed above now. I can't speak to performance on a large data set though, but it's worth giving it a shot.
This is off the top of my head, and I guarantee it will be horribly inefficient:
SELECT columnB, columnA
FROM (SELECT TOP 5 columnB, columnA
FROM someTable
ORDER BY columnA DESC) top5
UNION ALL
SELECT 'Other', (A.Total - B.Total) AS Summary
FROM (SELECT SUM(columnA) AS Total FROM someTable) A
CROSS JOIN (SELECT SUM(columnA) AS Total FROM
(SELECT TOP 5 columnA FROM someTable ORDER BY columnA DESC) t) B
I have recently used the EXCEPT statement a lot (not tested, but I'll give it a go):
select columnB, columnA
from (
select top 5
columnB, columnA
from
someTable t
order by
columnA desc
) top5
UNION ALL
SELECT 'OTHER' ColumnB, SUM(ColumnA)
FROM
(SELECT ColumnB, ColumnA
FROM someTable t
EXCEPT
select columnB, columnA
from (
select top 5
columnB, columnA
from
someTable t
order by
columnA desc
) top5
) others