Use case after order by - sql

I was reading an SQL book, and one of the questions is:
Write a query against the Sales.Customers table that returns for each customer the customer ID and region. Sort the rows in the output by region, having NULL marks sort last (after non-NULL values). Note that the default sort behavior for NULL marks in T-SQL is to sort first (before non-NULL values).
And the answer is:
SELECT custid, region
FROM Sales.Customers
ORDER BY
CASE WHEN region IS NULL THEN 1 ELSE 0 END, region;
I kind of get the idea but I am still confused. Take the record with custid = 9, for instance:
since custid 9 has a NULL region, the CASE expression returns 1, so the query is something like:
ORDER BY 1, region
which is equivalent to:
ORDER BY custid, region --because custid is the first column
So how come custid 9 is not before custid 10 (the second record in the output)? Doesn't the output need to be ordered by custid first, so that 9 comes before 10?

Your interpretation is incorrect. The 1 is simply a number, not a column reference.
The query is equivalent to:
SELECT custid, region
FROM (SELECT c.*,
(CASE WHEN region IS NULL THEN 1 ELSE 0 END) as region_is_null
FROM Sales.Customers c
) c
ORDER BY region_is_null, region;
This is an important distinction about numbers in the ORDER BY. The expression:
ORDER BY 1
refers to the first column. However,
ORDER BY 1 + 0
is simply a numeric expression that returns the constant 1 -- and will result in an error in SQL Server (which does not allow constants in ORDER BY).
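The difference between a positional 1 and a 1 produced by CASE can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is runnable; SQLite also treats a bare integer in ORDER BY as a column position and also sorts NULLs first by default). The table is called Customers because there is no Sales schema here, and the five rows are the ones used as sample data elsewhere in this thread.

```python
import sqlite3

# In-memory stand-in for Sales.Customers (no Sales schema in this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (custid INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(45, "CA"), (9, None), (10, "BC"), (8, None), (42, "BC")],
)

# A bare integer literal is a column position: this sorts by custid.
by_position = conn.execute(
    "SELECT custid, region FROM Customers ORDER BY 1, region"
).fetchall()

# The 1 and 0 returned by CASE are per-row sort values, not positions.
# custid is appended only as a tie-breaker so the output is repeatable.
by_case = conn.execute(
    """
    SELECT custid, region
    FROM Customers
    ORDER BY CASE WHEN region IS NULL THEN 1 ELSE 0 END, region, custid
    """
).fetchall()

print(by_position)  # custid order: 8, 9, 10, 42, 45
print(by_case)      # NULL regions last: 10, 42, 45, 8, 9
```

With the positional form, custid 9 does come before custid 10; with the CASE form it comes after, because its sort value is 1.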

so the query is something like
ORDER BY 1, region
No, this is incorrect. The expression CASE WHEN region IS NULL THEN 1 ELSE 0 END is evaluated per row, and the 1 it returns is a value, not a column position. A column position inside ORDER BY can only be specified as a literal, not as an expression. So this:
custid region
8 NULL
9 NULL
10 BC
42 BC
45 CA
Becomes:
custid region case...
8 NULL 1
9 NULL 1
10 BC 0
42 BC 0
45 CA 0
And the sorted results could be:
custid region case...
10 BC 0
42 BC 0
45 CA 0
8 NULL 1
9 NULL 1
Or:
custid region case...
42 BC 0
10 BC 0
45 CA 0
9 NULL 1
8 NULL 1
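The two possible outputs above differ only in how ties are ordered. A minimal sketch of the same idea, again using Python's sqlite3 as a stand-in engine (an assumption), selects the CASE value as a visible column and adds custid as a final sort key. The tie-breaker is an addition for illustration, not part of the book's query; region_is_null is just an alias for the computed value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (custid INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(8, None), (9, None), (10, "BC"), (42, "BC"), (45, "CA")],
)

rows = conn.execute(
    """
    SELECT custid,
           region,
           CASE WHEN region IS NULL THEN 1 ELSE 0 END AS region_is_null
    FROM Customers
    ORDER BY region_is_null, region, custid  -- custid only breaks ties
    """
).fetchall()

for row in rows:
    print(row)
```

With the tie-breaker in place only the first of the two orderings shown above can be returned.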

You can try the query below. The CASE values sort in ascending order, so rows with 0 come first and rows with 1 come after them. To change where the NULLs land, either swap the two values, as here, or keep the values and sort the CASE expression with DESC.
SELECT custid, region
FROM Sales.Customers
ORDER BY
CASE WHEN region IS NULL THEN 0 ELSE 1 END, region
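A quick way to compare the two options is to run both next to the book's query. This is a sketch using Python's sqlite3 as a stand-in engine (an assumption); sorted_rows is a hypothetical helper and custid is appended only to make the order repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (custid INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(8, None), (9, None), (10, "BC"), (42, "BC"), (45, "CA")],
)


def sorted_rows(first_key):
    # first_key is a hard-coded ORDER BY fragment, never user input.
    sql = (
        "SELECT custid, region FROM Customers "
        f"ORDER BY {first_key}, region, custid"
    )
    return conn.execute(sql).fetchall()


swapped = sorted_rows("CASE WHEN region IS NULL THEN 0 ELSE 1 END")
descending = sorted_rows("CASE WHEN region IS NULL THEN 1 ELSE 0 END DESC")
book = sorted_rows("CASE WHEN region IS NULL THEN 1 ELSE 0 END")

print(swapped)
print(descending)
print(book)
```

Swapping the values and sorting the original values with DESC give the same order, with the NULL regions first; the book's query, with 1 for NULL in ascending order, is the one that puts them last.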

The idea is to use a CASE expression to create a calculated virtual column that marks the NULLs as 0 and the non-NULLs as 1, and then to sort on it. (The book's query uses the opposite values, 1 for NULL, which is what sends the NULLs to the end.)
If you use a literal 0 in the ORDER BY clause you will get an error, because there is no column at position 0. Also, if you reorder the selected columns the result will be the same.
So the output of the CASE expression is not a column position; it is a calculated column.
customer_id | region | marker
not important | if null | 0

ORDER BY CASE
WHEN region IS NULL THEN
1
ELSE
0
END,
region
is not equivalent to
ORDER BY 1,
region
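because in the second one the 1 is a literal, which ORDER BY treats as a reference to the first column of the select list, whereas in the first the value comes from the CASE expression and changes from row to row.
So
ORDER BY 1,
region
does sort the same way as
ORDER BY custid,
region
but neither is what the book's query does, because a 1 returned by a CASE expression is a value and never a column position.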
What
ORDER BY CASE
WHEN region IS NULL THEN
1
ELSE
0
END,
region
does is "generate" a new column to sort by, depending on the content of region. That new column gets 1 when region is NULL and 0 otherwise. If you imagine this new column in the table, it would look like
custid | region | new column
...
10 | BC | 0
...
9 | NULL | 1
...
Now if this gets sorted by the new column and then by region, the customer with ID 10 comes before the customer with ID 9, because the one with ID 10 has the lower value in the new column: 0 against the 1 of the customer with ID 9.

Related

Ignoring records for certain criteria

I have data as below:
ACCOUNT FLAG
asdf 1
asdf 2
asdf 3
kjhj 1
qwer 1
qwer 1
I need to get this output:
ACCOUNT FLAG
kjhj 1
qwer 1
I need only the accounts that have "1" in the second column on every row. If an account has any value other than "1", all of its records should be ignored.
Can you please suggest a query? I tried GROUP BY but didn't find a way.

Group to a single account per output row, then assert that all rows in a group have flag = 1 by using HAVING with both MIN and MAX.
SELECT
account,
MIN(flag) flag
FROM
your_table
GROUP BY
account
HAVING
MIN(flag) = 1
AND MAX(flag) = 1
Some people prefer the following as more understandable; it also makes a row with a NULL flag exclude the whole group...
HAVING
MIN(CASE WHEN flag=1 THEN 1 ELSE 0 END) = 1

SQL: create another column that calculates ratio

So I have a table that looks like the following:
car owner | non car owner | have dog | num ppl
1 | 0 | 1 | 60
0 | 1 | 1 | 80
1 | 0 | 0 | 90
1 | 0 | 0 | 98
I am trying to add another column to find the ratios. For example, the total number of car owners is 110. If I want to find the ratio of people who own a car and have a dog, then I have to divide 60/110 for the first row. Also, the total number of non car owners is 98. Therefore, if I want to find that ratio, I need to divide 80 by 98 for the second row, and so on.
So far, I have tried the following code:
with a as (
select
id,
case when car_owner = 1 then 1 else 0 end car_owner,
case when non_car_owner = 1 then 1 else 0 end as non_car_owner = 1
from `xyz_table`
),
b as (select
car_owner,
non_car_owner,
case when have_dog = 1 then 1 else 0 end have_dog,
count(distinct id) num_ppl
from `xyz_table`
join a using (id)
group by 1,2,3
order by 4 desc
)
select *, num_ppl/(select (case when dog_owner = 1 then 110 else 0 end) as ratio
from a)
from b
Unfortunately, it throws the following error:
Scalar subquery produced more than one element
Any help would be appreciated.
PS. I am running this code on Google BigQuery.

"If I want to find the ratio of people who own car and have dog,"
You can use avg():
select avg(car_owner * have_dog)
from t;
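The error in the question comes from the scalar subquery, which selects from a without any aggregation and so returns one row per row of a. A different approach from the avg() suggestion, offered only as a sketch, is to compute the total per group once and join it back, so each row is divided by the total of its own group. The example uses Python's sqlite3 as a stand-in for BigQuery (an assumption), a hypothetical table name people, and the four rows as posted, so the totals differ from the 110 and 98 quoted in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people "
    "(car_owner INTEGER, non_car_owner INTEGER, have_dog INTEGER, num_ppl INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [(1, 0, 1, 60), (0, 1, 1, 80), (1, 0, 0, 90), (1, 0, 0, 98)],
)

rows = conn.execute(
    """
    SELECT p.car_owner,
           p.have_dog,
           p.num_ppl,
           1.0 * p.num_ppl / g.total AS ratio   -- 1.0 forces real division
    FROM people AS p
    JOIN (
        SELECT car_owner, SUM(num_ppl) AS total  -- one total per group
        FROM people
        GROUP BY car_owner
    ) AS g ON g.car_owner = p.car_owner
    ORDER BY p.car_owner, p.num_ppl
    """
).fetchall()

for row in rows:
    print(row)
```

In an engine with window functions, the per-group total could presumably also be written as SUM(num_ppl) OVER (PARTITION BY car_owner) instead of the join.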

Get previous value from column A when column B is not null in Hive

I have a table tableA below
ID number Estimate Client
---- ------
1 3 8 A
1 NULL 10 Null
1 5 11 A
1 NULL 19 Null
2 NULL 20 Null
2 2 70 A
.......
I would like to select the previous row's value of the Estimate column when the number column is not null. For instance, when number = 3, pre_estimate = NULL; when number = 5, pre_estimate = 10; and when number = 2, pre_estimate = 20.
The query below does not seem to return the correct answer in Hive. What is the correct way to do it?
select lag(Estimate, 1) OVER (partition by ID) as prev_estimate
from tableA
where number is not null

Consider a table with the following structure:
number - int
estimate - int
order_column - int
order_column is the column that defines the order of the table rows.
Data in table:
number estimate order_column
3 8 1
NULL 10 2
5 11 3
NULL 19 4
NULL 20 5
2 70 6
I used the following query and got the result you mentioned.
SELECT *
FROM (SELECT number, estimate,
lag(estimate, 1) over (order by order_column) as prev_estimate
FROM tableA) tbl
WHERE tbl.number is not null;
I did not see a reason to partition by ID, which is why I left ID out of the table.
You were getting wrong results because the WHERE clause of the main query first keeps only the rows where number is not null, and only then is the lag function computed over those rows. You need to compute lag over all the rows and afterwards select the rows where number is not null.
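The effect of filtering before or after the window function can be seen side by side in a small sketch. It uses Python's sqlite3 as a stand-in for Hive (an assumption; window functions need SQLite 3.25 or later) and the six rows listed in this answer, with order_column being the ordering column the answer introduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (number INTEGER, estimate INTEGER, order_column INTEGER)"
)
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [(3, 8, 1), (None, 10, 2), (5, 11, 3), (None, 19, 4), (None, 20, 5), (2, 70, 6)],
)

# LAG is computed over all rows first; the filter is applied afterwards.
lag_then_filter = conn.execute(
    """
    SELECT number, estimate, prev_estimate
    FROM (
        SELECT number, estimate, order_column,
               LAG(estimate, 1) OVER (ORDER BY order_column) AS prev_estimate
        FROM tableA
    ) AS tbl
    WHERE number IS NOT NULL
    ORDER BY order_column
    """
).fetchall()

# WHERE runs before the window function, so LAG only sees the kept rows.
filter_then_lag = conn.execute(
    """
    SELECT number, estimate,
           LAG(estimate, 1) OVER (ORDER BY order_column) AS prev_estimate
    FROM tableA
    WHERE number IS NOT NULL
    ORDER BY order_column
    """
).fetchall()

print(lag_then_filter)
print(filter_then_lag)
```

The first result has the previous estimates NULL, 10 and 20 that the question asks for; the second has NULL, 8 and 11, because the rows with a NULL number were removed before LAG looked back.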

how can i alternate between 0 and 1 values in sql server?

I want to create a select which alternates between 1 and 0.
My table looks like this:
id1 id2 al
11 1 1
40 1 0
12 1 0
237 1 1
but I want it to look like this:
id1 id2 al
40 1 0
11 1 1
12 1 0
237 1 1
I want to keep the same values in my table; I just want to reorder the rows so that al alternates between 0 and 1.

Consider:
select *
from mytable
order by row_number() over(partition by al order by id1), al
This alternates 0 and 1 values - if the groups have a different number of rows, then, once the smallest group exhausts, all remaining rows in the other group appear at the end of the resultset.
I am unsure which column you want to use to order the rows within each group - I assumed id1, but you might want to change that to your actual requirement.
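The ordering can be tried on the four rows from the question. This sketch uses Python's sqlite3 as a stand-in for SQL Server (an assumption; window functions need SQLite 3.25 or later) and computes the row number in a derived table under the hypothetical alias rn, which should be equivalent to putting row_number() directly in ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id1 INTEGER, id2 INTEGER, al INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(11, 1, 1), (40, 1, 0), (12, 1, 0), (237, 1, 1)],
)

rows = conn.execute(
    """
    SELECT id1, id2, al
    FROM (
        SELECT id1, id2, al,
               ROW_NUMBER() OVER (PARTITION BY al ORDER BY id1) AS rn
        FROM mytable
    ) AS numbered
    ORDER BY rn, al   -- first 0 row, first 1 row, second 0 row, ...
    """
).fetchall()

for row in rows:
    print(row)
```

The al values come out as 0, 1, 0, 1. Within each group the rows follow id1, so 12 comes before 40 here, unlike the output in the question; a different ORDER BY inside the OVER clause would change that.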

One column condition in sql

I have a table:
[letter] [Name] [status] [price]
A row1 1 11
A row1 1 15
B row2 2 9
B row2 3 23
B row2 3 30
And want to select data something like this:
SELECT letter, Name,
COUNT(*),
CASE WHEN price>10 THEN COUNT(*) ELSE NULL END
GROUP BY letter, Name
the result is:
A row1 2 2
B row2 1 null
B row2 2 2
But I want this format:
A row1 2 2
B row2 3 2
Please help me modify my query.

Close. You probably want this instead:
SELECT letter, Name,
COUNT(*),
SUM(CASE WHEN price>10 THEN 1 ELSE 0 END)
FROM TableThatShouldHaveAppearedInTheQuestionInTheFromClause
GROUP BY letter, Name
This should work, assuming the intention of the fourth column is to return the number of rows, within each group, with a price greater than 10. It's also possible to do this as a COUNT() over a CASE that returns non-NULL and NULL results for the rows that should and should not be counted, but I find the above form easier to quickly reason about.

Since NULLs are ignored by aggregate functions such as COUNT(expression):
SELECT letter
, name
, count(*)
, count(
case when price > 10 then 1
end
)
FROM t
GROUP BY letter, name

You were very close.
Looking at the other answers, this is probably not the best way, but it will work.
The count of the prices over 10 is made with a subquery that has a condition on price > 10 and is correlated with the current TAB record, aliased A, on the same letter and Name.
SELECT letter,
Name,
COUNT(*),
(SELECT COUNT(*) FROM TAB WHERE letter = A.letter AND Name = A.Name AND price > 10)
FROM TAB A
GROUP BY letter, Name
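The three ways of counting the rows with a price over 10 can be compared on the data from the question. This is a sketch using Python's sqlite3 as a stand-in engine (an assumption). The table name TAB is taken from the last answer, and the inner alias B in the subquery form is added here only to make the correlation explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TAB (letter TEXT, Name TEXT, status INTEGER, price INTEGER)"
)
conn.executemany(
    "INSERT INTO TAB VALUES (?, ?, ?, ?)",
    [
        ("A", "row1", 1, 11),
        ("A", "row1", 1, 15),
        ("B", "row2", 2, 9),
        ("B", "row2", 3, 23),
        ("B", "row2", 3, 30),
    ],
)

# Three expressions for "number of rows in the group with price > 10".
over_10 = {
    "sum_case": "SUM(CASE WHEN price > 10 THEN 1 ELSE 0 END)",
    "count_case": "COUNT(CASE WHEN price > 10 THEN 1 END)",
    "subquery": (
        "(SELECT COUNT(*) FROM TAB B "
        "WHERE B.letter = A.letter AND B.Name = A.Name AND B.price > 10)"
    ),
}

results = {}
for label, expr in over_10.items():
    sql = (
        f"SELECT letter, Name, COUNT(*), {expr} "
        "FROM TAB A GROUP BY letter, Name ORDER BY letter, Name"
    )
    results[label] = conn.execute(sql).fetchall()
    print(label, results[label])
```

All three return one row per (letter, Name) group, matching the format the question asks for.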