Sqlite - get numbered rows

I am retrieving a list of persons from a database, and each person has some points. What I want is to get all the person information along with the person's points and rank. Points are calculated on the fly, because they are not stored within the entity, and the query looks something like this:
SELECT p.<some person attributes>, s.points, [here I need rank] as rank
FROM Persons p LEFT JOIN <subquery calculating points> s
ON p.id = s.personId
ORDER BY s.points DESC
In the select part I need the person's position in the ranking (which is basically the order of the returned rows, since I order by points, right?).
Is there any SQL/SQLite column or function that returns that?

This is exactly what window functions are for. Specifically, dense_rank will also take care of pesky edge-cases where several users have the same number of points:
SELECT p.<some person attributes>, s.points,
DENSE_RANK() OVER (ORDER BY points DESC) as "rank"
FROM Persons p
LEFT JOIN <subquery calculating points> s ON p.id = s.personId
ORDER BY s.points DESC

Unfortunately, SQLite versions before 3.25.0 do not support window functions. On those versions you pretty much need to resort to a correlated subquery:
with s as (
<subquery calculating points>
)
select p.<some person attributes>, s.points,
(select 1 + count(*)
from s s2
where s2.points > s.points
) as rank
from Persons p left join
s
on p.id = s.personId
order by s.points desc;
This specifically implements rank() over (order by points desc). Similar logic can be used for dense_rank() or row_number() if that is what you really need.
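As a rough check of the correlated-subquery approach, here is a minimal sketch using Python's built-in sqlite3 module. The Scores table, the names and the sample data are hypothetical stand-ins for the points subquery; the dense_rank variant simply counts distinct higher scores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical table standing in for <subquery calculating points>
    CREATE TABLE Scores (personId INTEGER, points INTEGER);
    INSERT INTO Persons VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
    INSERT INTO Scores VALUES (1, 50), (2, 70), (3, 50), (4, 20);
""")

RANK_SQL = """
WITH s AS (SELECT personId, points FROM Scores)
SELECT p.name, s.points,
       -- rank(): 1 + number of rows with strictly more points
       (SELECT 1 + COUNT(*) FROM s s2 WHERE s2.points > s.points) AS rnk,
       -- dense_rank(): 1 + number of distinct higher point values
       (SELECT 1 + COUNT(DISTINCT s2.points) FROM s s2 WHERE s2.points > s.points) AS dense_rnk
FROM Persons p
LEFT JOIN s ON p.id = s.personId
ORDER BY s.points DESC, p.name
"""

ranked = conn.execute(RANK_SQL).fetchall()
for row in ranked:
    print(row)
```

One caveat of this sketch: a person with no points row gets NULL points, the comparison then matches nothing, and the subquery reports rank 1 for them, so such rows may need separate handling.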

Related

How to avoid duplicates between two tables on using join?

I have two tables work_table and progress_table.
work_table has following columns:
id[primary key],
department,
dept_name,
dept_code,
created_time,
updated_time
progress_table has following columns:
id[primary key],
project_id,
progress,
progress_date
I need only the most recently updated progress value for each project, but right now I am getting duplicates.
Here is the code I tried:
select
row_number() over (order by a.dept_code asc) AS sno,
a.dept_name,
b.project_id,
p.physical_progress,
DATE(b.updated_time) as updated_date,
b.created_time
from
masters.dept_users as a,
work_table as b
LEFT JOIN
progress as p on b.id = p.project_id
order by
a.dept_name asc
It shows duplicate values for progress with the same id. How can I resolve this? (The progress values are integers whose values are fed in from a form.)

Having reformatted your query, some things become clear...
You've mixed , and JOIN syntax (why!?)
You start with the masters.dept_users table, but don't mention it in your description
You have no join predicate between dept_users and work_table
You calculate an sno, but have no partition by and never use it
Your query includes columns not mentioned in the table descriptions above
And to top it off, you use meaningless aliases like a and b? Please, for the love of others and your future self (who will try to read this one day), make the aliases meaningful in some way.
You possibly want something like...
WITH
sorted_progress AS
(
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY project_id
ORDER BY progress_date DESC -- This may need to be updated_time, your question is very unclear
)
AS seq_num
FROM
progress
)
SELECT
<whatever>
FROM
masters.dept_users AS u
INNER JOIN
work_table AS w
ON w.user_id = u.id -- This is a GUESS, but you need to do SOMETHING here
LEFT JOIN
sorted_progress AS p
ON p.project_id = w.id -- Even this looks suspect, are you SURE that w.id is the project_id?
AND p.seq_num = 1
That at least shows how to get that latest progress record (p.seq_num = 1), but whether the other joins are correct is something you'll have to double (and triple) check for yourself.
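The "latest row per project" part can be tried in isolation. The following is a minimal sketch in Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (needed for window functions). The sample rows are invented, the dept_users join is left out, and it assumes, like the query above, that progress.project_id refers to work_table.id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the tables from the question
    CREATE TABLE work_table (id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE progress (id INTEGER PRIMARY KEY, project_id INTEGER,
                           progress INTEGER, progress_date TEXT);
    INSERT INTO work_table VALUES (1, 'Roads'), (2, 'Water'), (3, 'Parks');
    INSERT INTO progress VALUES
        (1, 1, 10, '2020-01-01'),
        (2, 1, 40, '2020-02-01'),
        (3, 2, 25, '2020-01-15');
""")

LATEST_SQL = """
WITH sorted_progress AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY project_id
                              ORDER BY progress_date DESC) AS seq_num
    FROM progress
)
SELECT w.id, w.dept_name, p.progress, p.progress_date
FROM work_table AS w
LEFT JOIN sorted_progress AS p
       ON p.project_id = w.id
      AND p.seq_num = 1      -- keep only the newest progress row per project
ORDER BY w.id
"""

latest = conn.execute(LATEST_SQL).fetchall()
for row in latest:
    print(row)
```

Keeping the seq_num = 1 condition in the ON clause rather than in WHERE means projects without any progress rows still appear, with NULL progress.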

SQL Query with row_number() not returning expected output

my goal is to write a query that should return the cities which produced the highest avg. sales for each item-category.
This is the expected output:
item_category|city
books |los_angeles
toys |austin
electronics |san_fransisco
My 3 table schemas look like this:
users
user_id|city
sales
user_id|item_id|sales_amt
items
item_id|item_category
These are further notes to consider:
1. sales_amt is the only column that may have Null values. If no users have placed a sale for a particular item-category (no rows in sales with a non-Null sales_amt), then the city name should be Null.
2. Only 1 row per distinct item-category. If more than 1 city qualifies, then pick the first one alphabetically.
The attempt I took looks like this but it does not produce the right output:
select a.item_category,a.city from (
select
i.item_category,
u.city,
row_number() over (partition by i.item_category,u.city order by avg(s.sales_amt) desc)rk
from sales s
join users u on s.user_id=u.user_id
join items i on i.item_id=s.item_id
group by i.item_category,u.city)a
where a.rk=1
My output does not return the Null cases for sales_amt. Also, I get non-unique rows. Therefore, I am worried that I am not properly incorporating the 2 notes.
I hope someone can help.

my goal is to write a query that should return the cities which produced the highest avg. sales for each item-category.
This can be calculated using aggregation and window functions:
select ic.*
from (select i.item_category, u.city,
row_number() over(partition by i.item_category order by avg(s.sales_amt) desc, u.city) as seqnum
from users u join
sales s
on s.user_id = u.user_id join
items i
on i.item_id = s.item_id
group by i.item_category, u.city
) ic
where seqnum = 1;
Your question explicitly says "average" which is why this uses avg(). However, I suspect that you really want the sum in each city, which would be sum().
Notes:
You want one row so row_number() instead of rank().
You need sales to calculate the average, so join, instead of left join.
You want one row per item_category, so that is used for partitioning.

Aaaand my take on it is a mix of GMB's and Gordon's advice. GMB points out that left joins are needed, but I think his starting table, partition and choice of rank() are wrong (his query cannot generate null city names as requested, and could generate duplicates tied on the same avg). Gordon picked up on things like ordering by city on a tied avg, which GMB did not, but missed the "if no sales of any items in category X put null for the city" requirement. Both left cancelled orders floating round the system, which introduces errors:
select *
from (
select
i.item_category,
u.city,
row_number() over(partition by i.item_category order by avg(s.sales_amt) desc, u.city asc) rn
from items i
left join (select * from sales where sales_amt is not null) s on i.item_id = s.item_id
left join users u on s.user_id = u.user_id
group by i.item_category, u.city
) t
where rn = 1
We start from items so that categories having no sales get nulls for their sale amount and city.
We also need to consider that any sales that didn't fulfil will have null in their amount. We exclude these with a subquery, otherwise they will link through to users, giving a false positive (even though the avg will calculate as null for a category that only has cancelled orders, the city will still show when it should not). I could also have done this with an "and sales_amt is not null" predicate in the join, but I think this way is clearer. This should not be done with a predicate in the where clause, because that will eliminate the sale-less categories we are trying to preserve.
Row number is used on avg, with city name to break any ties. It's a simpler function than rank and cannot generate duplicate values.
Finally we pull the rn 1s to get the top averaging cities.
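To see how the two notes play out, here is a minimal sketch in Python's sqlite3 module (assuming a bundled SQLite of 3.25 or newer). The sample rows are invented. Unlike the query above, it computes the averages in a CTE first and ranks them in a second step; that split is just a choice made for readability here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_category TEXT);
    CREATE TABLE sales (user_id INTEGER, item_id INTEGER, sales_amt REAL);
    INSERT INTO users VALUES (1, 'los_angeles'), (2, 'austin'), (3, 'boston');
    INSERT INTO items VALUES (1, 'books'), (2, 'toys'), (3, 'electronics');
    INSERT INTO sales VALUES
        (1, 1, 10.0),   -- books, los_angeles
        (2, 1, 10.0),   -- books, austin (ties with los_angeles)
        (3, 1, 4.0),    -- books, boston
        (3, 2, NULL);   -- toys, but the amount is missing
""")

TOP_CITY_SQL = """
WITH city_avg AS (
    SELECT i.item_category, u.city, AVG(s.sales_amt) AS avg_amt
    FROM items i
    LEFT JOIN (SELECT * FROM sales WHERE sales_amt IS NOT NULL) s
           ON s.item_id = i.item_id
    LEFT JOIN users u ON u.user_id = s.user_id
    GROUP BY i.item_category, u.city
)
SELECT item_category, city
FROM (
    SELECT item_category, city,
           ROW_NUMBER() OVER (PARTITION BY item_category
                              ORDER BY avg_amt DESC, city ASC) AS rn
    FROM city_avg
) t
WHERE rn = 1
ORDER BY item_category
"""

top_cities = conn.execute(TOP_CITY_SQL).fetchall()
for row in top_cities:
    print(row)
```

With this data, books resolves the tie alphabetically, while toys (only a NULL amount) and electronics (no sales at all) both come back with a NULL city.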

I think you want left joins starting from users in the inner query to preserve cities without sales.
As for the ranking: if you want one record per city, then do not put columns other than city in the partition (your current partition gives you one record per city and per category, which is not what you want).
Consider:
select *
from (
select
i.item_category,
u.city,
rank() over(partition by u.city order by avg(s.sales_amt) desc) rk
from users u
left join sales s on s.user_id = u.user_id
left join items i on i.item_id = s.item_id
group by i.item_category, u.city
) t
where rk = 1

Subtracting values of columns from two different tables

I would like to take values from one table column and subtract those values from another column from another table.
I was able to achieve this by joining those tables and then subtracting both columns from each other.
Data from first table:
SELECT max_participants FROM courses ORDER BY id;
Data from second table:
SELECT COUNT(id) FROM participations GROUP BY course_id ORDER BY course_id;
Here is some code:
SELECT max_participants - participations AS free_places FROM
(
SELECT max_participants, COUNT(participations.id) AS participations
FROM courses
INNER JOIN participations ON participations.course_id = courses.id
GROUP BY courses.max_participants, participations.course_id
ORDER BY participations.course_id
) AS course_places;
In general it works, but I was wondering whether there is a way to make it simpler, or whether my approach is incorrect and this code will fail under some conditions. Maybe it needs to be optimized.
I've read that you should not rely on the natural order of a result set in databases, and that raised my doubts.

If you want the values per course, I would recommend:
SELECT c.id, (c.max_participants - COUNT(p.id)) AS free_places
FROM courses c LEFT JOIN
participations p
ON p.course_id = c.id
GROUP BY c.id, c.max_participants
ORDER BY 1;
Note the LEFT JOIN to be sure all courses are included, even those with no participants.
The overall number is a little trickier. One method is to use the above as a subquery. Alternatively, you can pre-aggregate each table:
select c.max_participants - p.num_participants
from (select sum(max_participants) as max_participants from courses) c cross join
(select count(*) as num_participants from participations) p;
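The difference between the INNER JOIN from the question and the LEFT JOIN version shows up as soon as a course has no participants. Below is a minimal sketch in Python's sqlite3 module with invented sample data (two courses, only one of which has participations).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courses (id INTEGER PRIMARY KEY, max_participants INTEGER);
    CREATE TABLE participations (id INTEGER PRIMARY KEY, course_id INTEGER);
    INSERT INTO courses VALUES (1, 10), (2, 5);
    INSERT INTO participations (course_id) VALUES (1), (1), (1);
""")

# LEFT JOIN keeps course 2 even though nobody has signed up for it
per_course = conn.execute("""
    SELECT c.id, c.max_participants - COUNT(p.id) AS free_places
    FROM courses c
    LEFT JOIN participations p ON p.course_id = c.id
    GROUP BY c.id, c.max_participants
    ORDER BY c.id
""").fetchall()

# The INNER JOIN from the question silently drops the empty course
inner_only = conn.execute("""
    SELECT c.id, c.max_participants - COUNT(p.id) AS free_places
    FROM courses c
    INNER JOIN participations p ON p.course_id = c.id
    GROUP BY c.id, c.max_participants
    ORDER BY c.id
""").fetchall()

# Overall number of free places, pre-aggregating each table
overall = conn.execute("""
    SELECT c.max_participants - p.num_participants
    FROM (SELECT SUM(max_participants) AS max_participants FROM courses) c
    CROSS JOIN (SELECT COUNT(*) AS num_participants FROM participations) p
""").fetchone()[0]

print(per_course, inner_only, overall)
```

Returning c.id alongside free_places also removes the reliance on row order that the question worries about.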

Distinct on multi-columns in sql

I have this query in sql
select cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
I want the rows to be distinct by pageId, so that in the end no pageId appears more than once (no duplicates).
Any ideas?
Thanks
Baaroz

Going by what you're expecting in the output and your comment that says "...if there rows in output that contain same pageid only one will be shown...," it sounds like you're trying to get the top record for each page ID. This can be achieved with ROW_NUMBER() and PARTITION BY:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID) rowNumber,
c.id,
c.pageId,
c.quantity,
c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
) a
WHERE a.rowNumber = 1
You can also use ROW_NUMBER() OVER(PARTITION BY ... along with TOP 1 WITH TIES, but it runs a little slower (despite being WAY cleaner):
SELECT TOP 1 WITH TIES c.id, c.pageId, c.quantity, c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
ORDER BY ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID)

If you wish to remove rows with all columns duplicated this is solved by simply adding a distinct in your query.
select distinct cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
If however, this makes no difference, it means the other columns have different values, so the combinations of column values creates distinct (unique) rows.
As Michael Berkowski stated in comments:
DISTINCT - does operate over all columns in the SELECT list, so we
need to understand your special case better.
In the case that simply adding distinct does not cover you, you need to also remove the columns that are different from row to row, or use aggregate functions to get aggregate values per cartlines.
Example - total quantity per distinct pageId:
select cartlines.pageId, sum(cartlines.quantity) as total_quantity
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
group by cartlines.pageId
If this is still not what you wish, you need to give us data and specify better what it is you want.
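To illustrate why DISTINCT alone may change nothing while grouping does, here is a minimal sketch in Python's sqlite3 module. The table layouts and rows are assumptions made up for the demonstration; only the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, userId INTEGER);
    CREATE TABLE cartlines (id INTEGER PRIMARY KEY, orderId INTEGER,
                            pageId INTEGER, quantity INTEGER, price REAL);
    INSERT INTO orders VALUES (1, 5), (2, 5), (3, 6);
    INSERT INTO cartlines VALUES
        (1, 1, 100, 2, 9.5),
        (2, 1, 200, 1, 3.0),
        (3, 2, 100, 4, 9.5),   -- same pageId as line 1, different order
        (4, 3, 100, 7, 9.5);   -- belongs to another user
""")

# DISTINCT compares whole rows; cartlines.id is unique, so nothing collapses
distinct_rows = conn.execute("""
    SELECT DISTINCT c.id, c.pageId, c.quantity, c.price
    FROM orders o
    INNER JOIN cartlines c ON c.orderId = o.id
    WHERE o.userId = 5
""").fetchall()

# Grouping by pageId yields exactly one row per page
totals = conn.execute("""
    SELECT c.pageId, SUM(c.quantity) AS total_quantity
    FROM orders o
    INNER JOIN cartlines c ON c.orderId = o.id
    WHERE o.userId = 5
    GROUP BY c.pageId
    ORDER BY c.pageId
""").fetchall()

print(len(distinct_rows), totals)
```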

PostgreSQL ORDER BY with VIEWs

Let's say I want to write a simple SELECT query that uses a VIEW:
CREATE TEMP VIEW people AS
SELECT
p.person_id
,p.full_name
,p.phone
FROM person p
ORDER BY p.last_name;
SELECT
p.*
,h.address
,h.appraisal
FROM people p
LEFT JOIN homes h
ON h.person_id = p.person_id
ORDER BY p.last_name, h.appraisal;
The obvious problem here is that p.last_name is no longer available when I go to perform the final ORDER BY.
How can I sort the final query so that the original sequence of the people view follows through to the final query?
The simple solution here, is to just include p.last_name with the view. I don't want to do that - my real world example (much more complicated) makes that a problem.
I've done similar things with temp tables in the past. For example, I create the table with CREATE TEMP TABLE testing WITH OIDS and then do an ORDER BY testing.oid to pass through the original sequence.
Is it possible to do the same with views?

This is possible if you use row_number() over().
Here is an example:
SELECT
p.*
,h.address
,h.appraisal
FROM (SELECT *, row_number() over() rn FROM people) p
LEFT JOIN homes h
ON h.person_id = p.person_id
ORDER BY p.rn, h.appraisal;
And here is the SQL Fiddle you can test with.
As @Erwin Brandstetter correctly points out, using rank() will produce the correct results and allow for sorting on additional fields (in this case, appraisal).
SELECT
p.*
,h.address
,h.appraisal
FROM (SELECT *, rank() over() rn FROM people) p
LEFT JOIN homes h
ON h.person_id = p.person_id
ORDER BY p.rn, h.appraisal;
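Think about it this way: row_number() gives every row of the view a distinct value, so ties never occur and the additional sort field only orders the homes of a single person. With rank(), tied rows share a value, so the additional fields decide the order among them. Be aware that rank() only produces distinct values when the window has an ORDER BY; over an empty OVER () every row is a peer and gets rank 1, so check the result against your data.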
Good luck.

Building on the idea of @sgeddes, but use rank() instead:
SELECT p.*
, h.address
, h.appraisal
FROM (SELECT *, rank() OVER () AS rnk FROM people) p
LEFT JOIN homes h ON h.person_id = p.person_id
ORDER BY p.rnk, h.appraisal;
db<>fiddle here - demonstrating the difference
Old sqlfiddle

Create a row_number column and use it in the select.
CREATE TEMP VIEW people AS
SELECT
row_number() over(order by p.last_name) as i
,p.person_id
,p.full_name
,p.phone
FROM person p;
SELECT
p.*
,h.address
,h.appraisal
FROM people p
LEFT JOIN homes h
ON h.person_id = p.person_id
ORDER BY p.i, h.appraisal
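Exposing the sort position as a column of the view can be tried out as follows. This is a minimal sketch that uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL (assuming SQLite 3.25 or newer); the sample people and homes are invented. It also checks what rank() returns over a window without an ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, full_name TEXT,
                         last_name TEXT, phone TEXT);
    CREATE TABLE homes (person_id INTEGER, address TEXT, appraisal INTEGER);
    INSERT INTO person VALUES
        (1, 'Zed Young', 'Young', '555-0001'),
        (2, 'Amy Adams', 'Adams', '555-0002');
    INSERT INTO homes VALUES
        (1, '1 Elm St', 300),
        (2, '9 Oak Ave', 500),
        (2, '2 Pine Rd', 200);

    -- The view exposes its intended order as column i instead of last_name
    CREATE TEMP VIEW people AS
    SELECT row_number() OVER (ORDER BY p.last_name) AS i,
           p.person_id, p.full_name, p.phone
    FROM person p;
""")

rows = conn.execute("""
    SELECT p.full_name, h.address, h.appraisal
    FROM people p
    LEFT JOIN homes h ON h.person_id = p.person_id
    ORDER BY p.i, h.appraisal
""").fetchall()

# Without ORDER BY in the window, all rows are peers for rank()
empty_window_ranks = [r[0] for r in conn.execute("SELECT rank() OVER () FROM person")]

for row in rows:
    print(row)
print(empty_window_ranks)
```

Because the ordering lives in an explicit column, the outer query does not depend on the view's rows arriving in any particular sequence.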
