Amount based on a priority - SQL

Please feel free to change my title, or suggest a change, so it better reflects what I am trying to ask.
I have a query that gives the following result:
select
Customer.customer_id,
Transaction.amount
from Customer inner join Transaction on Customer.customer_id = Transaction.customer_id
Result:
customer_id| amount
01456 |50
01456 |100
01456 |400
01456 |0
01963 |50
01963 |100
01963 |221
01963 |0
Now, I want to add a priority field that assigns a priority of 1, 2, or 3. The lower the amount, the higher the priority value. Note: I want to replace 0 with the text 'Negative' and rank the amounts except 0.
This is what I want.
customer_id| amount| priority
01456| 50| 3
01456| 100| 2
01456| 400| 1
01456| 0| Negative
01963| 50| 3
01963| 100| 2
01963| 221| 1
01963| 0| Negative
Is this achievable? Your help will be greatly appreciated.

Window functions like ROW_NUMBER() are perfect for this:
SELECT c.customer_id,
t.amount,
ROW_NUMBER() OVER (PARTITION BY c.customer_id ORDER BY t.amount DESC) AS priority
FROM Customer c
JOIN [Transaction] t ON c.customer_id = t.customer_id
The PARTITION BY resets the numbering for each unique customer_id, and the ORDER BY decides the direction in which the rows are numbered.
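The question also asks for 0 to be displayed as the text 'Negative', which the ranking alone does not produce. A minimal sketch, assuming amount is numeric and that the priority column may be returned as text, wraps the ranking in a CASE expression:
SELECT c.customer_id,
       t.amount,
       CASE
           WHEN t.amount = 0 THEN 'Negative'
           -- rows with amount 0 sort last under DESC, so the non-zero rows still get 1..3
           ELSE CAST(ROW_NUMBER() OVER (PARTITION BY c.customer_id
                                        ORDER BY t.amount DESC) AS varchar(10))
       END AS priority
FROM Customer c
JOIN [Transaction] t ON c.customer_id = t.customer_id
The CAST is needed because a CASE expression must return a single type, and 'Negative' is text.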

Use row_number() or rank():
select customer_id, amount,
row_number() over (partition by customer_id order by amount desc) as priority
from t;

Related

Oracle SQL - How to return the name with the highest ID ending in a certain number

I have a table structured like this, and I need to get the last digit of each ID, how many people's IDs end with that digit, and the name of the person with the highest such ID:
Members:
|ID  |Name|
-----------
|123 |foo |
|456 |bar |
|789 |boo |
|1226|far |
The result I need to get looks something like this
|LAST_NUMBER |OCCURENCES |HIGHEST_ID_GUY |
---------------------------------------------
|3 |1 |foo |
|6 |2 |far |
|9 |1 |boo |
However, while I can get the first two results to display correctly, I have no idea how to display HIGHEST_ID_GUY. My code looks like this:
SELECT DISTINCT SUBSTR(id, LENGTH(id - 1), LENGTH(id)) AS LAST_NUMBER,
COUNT(*) AS OCCURENCES
/* This is where I need to add HIGHEST_ID_GUY */
FROM Members
GROUP BY SUBSTR(id, LENGTH(id - 1), LENGTH(id))
ORDER BY LAST_NUMBER
Any help appreciated :)
If id is a number, then use arithmetic operations:
select mod(id, 10) as last_digit,
count(*),
max(name) keep (dense_rank first order by id desc) as name_at_biggest
from t
group by mod(id, 10);
If id is a string, then you need to convert to a number or something similar to define the "highest id". For instance:
select substr(id, -1) as last_digit,
count(*),
max(name) keep (dense_rank first order by to_number(id) desc) as name_at_biggest
from t
group by substr(id, -1);

Value from previous row in GROUP BY as column

I have this table:
+----------+-------------+-------------------+------------------+
| userId| testId| date| note|
+----------+-------------+-------------------+------------------+
| 123123123| 1|2019-01-22 02:03:00| aaa|
| 123123123| 1|2019-02-22 02:03:00| bbb|
| 123456789| 2|2019-03-23 02:03:00| ccc|
| 123456789| 2|2019-04-23 02:03:00| ddd|
| 321321321| 3|2019-05-23 02:03:00| eee|
+----------+-------------+-------------------+------------------+
I would like to get the newest note (the whole row) for each userId and testId group:
SELECT
n.userId,
n.testId,
n.date,
n.note
FROM
notes n
INNER JOIN (
SELECT
userId,
testId,
MAX(date) as maxDate
FROM
notes
GROUP BY
userId,
testId
) temp ON n.userId = temp.userId AND n.testId = temp.testId AND n.date = temp.maxDate
It works.
But now I'd like to also have previous note in each row:
+----------+-------------+-------------------+-------------+------------+
| userId| testId| date| note|previousNote|
+----------+-------------+-------------------+-------------+------------+
| 123123123| 1|2019-02-22 02:03:00| bbb| aaa|
| 123456789| 2|2019-04-23 02:03:00| ddd| ccc|
| 321321321| 3|2019-05-23 02:03:00| eee| null|
+----------+-------------+-------------------+-------------+------------+
I have no idea how to do it. I heard about the LAG() function, which might be useful, but found no good examples for my case.
I'd like to use it on a DataFrame in PySpark (if that's important).
Use the lag() and row_number() analytic functions:
select userid, testid, date, note, previous_note
from
(select userid, testid, date, note,
        lag(note) over(partition by userid, testid order by date) as previous_note,
        row_number() over(partition by userid, testid order by date desc) rn
 from table_name
) a where a.rn = 1
select userid, testid, date, note, previous_note
from
(select userid, testid, date, note,
        lead(note) over(partition by userid, testid order by date desc) as previous_note,
        row_number() over(partition by userid, testid order by date desc) srno
 from Table_Name
) a where a.srno = 1
I hope this gives you the answer you want: it returns the latest-dated note as the new record and the previous date's note as previous_note.

Iterating over groups in a table

I have the following data:
cte1
===========================
m_ids |p_id |level
---------|-----------|-----
{123} |98 |1
{123} |111 |2
{432,222}|215 |1
{432,222}|215 |1
{432,222}|240 |2
{432,222}|240 |2
{432,222}|437 |3
{432,222}|275 |3
I have to perform the following operation:
1. Extract p_id by the following algorithm.
2. For every row with the same m_ids, in each group:
2.I. Group records by p_id
2.II. Order the records by level descending
2.III. Select the p_id whose count exactly equals the m_ids length and which has the biggest level
So far I have failed to write this algorithm completely, but I wrote this (probably wrong where I'm getting array_length) for the last part of it:
SELECT id
FROM grouped_cte1
GROUP BY id,
level
HAVING Count(*) = array_length(grouped_cte1.m_ids, 1)
ORDER BY level DESC
LIMIT 1
where grouped_cte1 for m_ids={123} is
m_ids |p_id |level
---------|-----------|-----
{123} |98 |1
{123} |111 |2
and for m_ids={432,222} is
m_ids |p_id |level
---------|-----------|-----
{432,222}|215 |1
{432,222}|215 |1
{432,222}|240 |2
{432,222}|240 |2
{432,222}|437 |3
{432,222}|275 |3
etc.
2) Combine the query from point 1 with the following, which extracts the p_id with level=1 for each m_ids:
select m_ids, p_id from cte1 where level=1 --also selecting m_ids for joining later
which results in the following:
m_ids |p_id
---------|----
{123} |98
{432,222}|215
Desired result:
m_ids |result_1 |result_2
---------|-----------|--------
{123} |111 |98
{432,222}|240 |215
So could anyone please help me solve the first part of the algorithm and (optionally) combine it with the second part in a single query?
EDIT: So far I am failing at:
1. Breaking the presented table into subtables by m_ids while iterating over it.
2. Computing array_length(grouped_cte1.m_ids, 1) for the corresponding rows in the query.
For the first part of the query you're on the right track, but you need to change the grouping logic and then join back to the table to filter by the highest level per m_ids, for which you can use the DISTINCT ON clause combined with proper sorting:
select
distinct on (t.m_ids)
t.m_ids, t.p_id, t.level
from cte1 t
join (
select
m_ids,
p_id
from cte1
group by m_ids, p_id
having count(*) = array_length(m_ids, 1)
) as g using (m_ids, p_id)
order by t.m_ids, t.level DESC;
This would give you:
m_ids | p_id | level
-----------+------+-------
{123} | 111 | 2
{432,222} | 240 | 2
Then, combined with the second query (using a FULL JOIN for display purposes, in case the first query has no row for some m_ids), which I modified by adding DISTINCT since there can be (and in fact is) more than one record per (m_ids, p_id) pair at level 1, it looks like this:
select
coalesce(r1.m_ids, r2.m_ids) as m_ids,
r1.p_id AS result_1,
r2.p_id AS result_2
from (
select
distinct on (t.m_ids)
t.m_ids, t.p_id, t.level
from cte1 t
join (
select
m_ids,
p_id
from cte1
group by m_ids, p_id
having count(*) = array_length(m_ids, 1)
) as g using (m_ids, p_id)
order by t.m_ids, t.level DESC
) r1
full join (
select distinct m_ids, p_id
from cte1
where level = 1
) r2 on r1.m_ids = r2.m_ids
giving you the result:
m_ids | result_1 | result_2
-----------+----------+----------
{123} | 111 | 98
{432,222} | 240 | 215
which looks different from what you expected, but from my understanding of the logic it is the correct one. If I misunderstood anything, please let me know.
Just to explain the logic on one point:
Why does m_ids = {123} return 111 for result_1?
For the group m_ids = {123} we have two distinct p_id values.
Both 98 and 111 satisfy the condition that their count equals the m_ids length.
p_id = 111 has the higher level, so it is chosen for result_1.

SQL query to return a grouped result as a single row

If I have a jobs table like:
|id|created_at |status |
----------------------------
|1 |01-01-2015 |error |
|2 |01-01-2015 |complete |
|3 |01-01-2015 |error |
|4 |01-02-2015 |complete |
|5 |01-02-2015 |complete |
|6 |01-03-2015 |error |
|7 |01-03-2015 |on hold |
|8 |01-03-2015 |complete |
I want a query that will group them by date and count the occurrences of each status, plus the total number of statuses for that date.
SELECT created_at, status, count(status)
FROM jobs
GROUP BY created_at, status;
Which gives me
|created_at |status |count|
-------------------------------
|01-01-2015 |error |2
|01-01-2015 |complete |1
|01-02-2015 |complete |2
|01-03-2015 |error |1
|01-03-2015 |on hold |1
|01-03-2015 |complete |1
I would now like to condense this down to a single row per unique created_at date, with a separate column for each status. One constraint: status is one of 5 possible words, but each date might not have every status. I would also like a total of all statuses for each day. So the desired result would look like:
|date |total |errors|completed|on_hold|
----------------------------------------------
|01-01-2015 |3 |2 |1 |null
|01-02-2015 |2 |null |2 |null
|01-03-2015 |3 |1 |1 |1
The columns could be built dynamically from something like
SELECT DISTINCT status FROM jobs;
with a null result for any day that doesn't contain that status. I am no SQL expert, but I am trying to do this in a DB view so that I don't get bogged down running multiple queries in Rails.
I am using PostgreSQL but would like to keep it straight SQL. I have tried to understand aggregate functions enough to use some other tools, but am not succeeding.
The following should work in any RDBMS:
SELECT created_at, count(status) AS total,
sum(case when status = 'error' then 1 end) as errors,
sum(case when status = 'complete' then 1 end) as completed,
sum(case when status = 'on hold' then 1 end) as on_hold
FROM jobs
GROUP BY created_at;
The query uses conditional aggregation to pivot the grouped data. It assumes that the status values are known beforehand. If you have additional status values, just add the corresponding sum(case ...) expression.
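For instance, with a hypothetical extra status 'pending' (not present in the sample data), the query would simply grow by one conditional sum:
SELECT created_at, count(status) AS total,
       sum(case when status = 'error' then 1 end) as errors,
       sum(case when status = 'complete' then 1 end) as completed,
       sum(case when status = 'on hold' then 1 end) as on_hold,
       sum(case when status = 'pending' then 1 end) as pending -- hypothetical status, not in the sample data
FROM jobs
GROUP BY created_at;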
An actual crosstab query would look like this:
SELECT * FROM crosstab(
$$SELECT created_at, status, count(*) AS ct
FROM jobs
GROUP BY 1, 2
ORDER BY 1, 2$$
,$$SELECT unnest('{error,complete,"on hold"}'::text[])$$)
AS ct (date date, errors int, completed int, on_hold int);
Should perform very well.
Basics:
PostgreSQL Crosstab Query
The above does not yet include the total per date.
Postgres 9.5 introduces the ROLLUP clause, which is perfect for this case:
SELECT * FROM crosstab(
$$SELECT created_at, COALESCE(status, 'total'), ct
FROM (
SELECT created_at, status, count(*) AS ct
FROM jobs
GROUP BY created_at, ROLLUP(status)
) sub
ORDER BY 1, 2$$
,$$SELECT unnest('{total,error,complete,"on hold"}'::text[])$$)
AS ct (date date, total int, errors int, completed int, on_hold int);
Up to Postgres 9.4, use this query instead:
WITH cte AS (
SELECT created_at, status, count(*) AS ct
FROM jobs
GROUP BY 1, 2
)
TABLE cte
UNION ALL
SELECT created_at, 'total', sum(ct)
FROM cte
GROUP BY 1
ORDER BY 1
Related:
Grouping() equivalent in PostgreSQL?
If you want to stick to a simple query, this is a bit shorter:
SELECT created_at
, count(*) AS total
, count(status = 'error' OR NULL) AS errors
, count(status = 'complete' OR NULL) AS completed
, count(status = 'on hold' OR NULL) AS on_hold
FROM jobs
GROUP BY 1;
count(status) for the total per date is error-prone, because it would not count rows with NULL values in status. Use count(*) instead, which is also shorter and a bit faster.
Here is a list of techniques:
For absolute performance, is SUM faster or COUNT?
In Postgres 9.4+ use the new aggregate FILTER clause, like #a_horse mentioned:
SELECT created_at
, count(*) AS total
, count(*) FILTER (WHERE status = 'error') AS errors
, count(*) FILTER (WHERE status = 'complete') AS completed
, count(*) FILTER (WHERE status = 'on hold') AS on_hold
FROM jobs
GROUP BY 1;
Details:
How can I simplify this game statistics query?

Confusing SQL Query, Group By? Having?

I have a relational database in SQL Server which I use to store Products, Competitor Companies and Competitor Prices. I regularly add new records to the Competitor Prices table rather than updating existing records, so I can track price changes over time.
I want to build a query which, given a particular product, finds the most recent price from each of the competitors. It is possible that some competitors don't have a price recorded.
Data Example
tblCompetitorPrices
+-----+----------+-------------+-----+----------+
|cp_id|product_id|competitor_id|price|date_added|
+-----+----------+-------------+-----+----------+
|1 |1 |3 |70.00|15-01-2014|
+-----+----------+-------------+-----+----------+
|2 |1 |4 |65.10|15-01-2014|
+-----+----------+-------------+-----+----------+
|3 |2 |3 |15.20|15-01-2014|
+-----+----------+-------------+-----+----------+
|4 |1 |3 |62.30|19-01-2014|
+-----+----------+-------------+-----+----------+
And I want the query to return...
+-----+----------+-------------+-----+----------+
|cp_id|product_id|competitor_id|price|date_added|
+-----+----------+-------------+-----+----------+
|4 |1 |3 |62.30|19-01-2014|
+-----+----------+-------------+-----+----------+
|2 |1 |4 |65.10|15-01-2014|
+-----+----------+-------------+-----+----------+
I can currently access all the prices for the product, but I'm not able to filter the results so that only the most recent price for each competitor is shown. I'm really unsure... here is what I have so far:
SELECT cp_id, product_id, competitor_id, price, date_added
FROM tblCompetitorPrices
WHERE product_id = '1'
ORDER BY date_added DESC
Thanks for any help!
Try this:
SELECT cp_id, product_id, competitor_id, price, date_added
FROM tblCompetitorPrices cp
WHERE product_id = '1'
  AND date_added = (SELECT MAX(date_added)
                    FROM tblCompetitorPrices
                    WHERE product_id = cp.product_id
                      AND competitor_id = cp.competitor_id)
ORDER BY date_added DESC
As an alternative, you can also use ROW_NUMBER(), a window function that generates a sequential number:
SELECT cp_id,
product_id,
competitor_id,
price,
date_added
FROM (
SELECT cp_id,
product_id,
competitor_id,
price,
date_added,
ROW_NUMBER() OVER (PARTITION BY competitor_id
ORDER BY date_added DESC) rn
FROM tblCompetitorPrices
WHERE product_ID = 1
) a
WHERE a.rn = 1
This query can easily be modified to return the latest record for each competitor in every product.
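For example, a sketch of that modification (assuming the intended change is to drop the product filter and partition by product_id as well as competitor_id):
SELECT cp_id, product_id, competitor_id, price, date_added
FROM (
    SELECT cp_id, product_id, competitor_id, price, date_added,
           -- rn = 1 marks the most recent row per (product_id, competitor_id)
           ROW_NUMBER() OVER (PARTITION BY product_id, competitor_id
                              ORDER BY date_added DESC) rn
    FROM tblCompetitorPrices
) a
WHERE a.rn = 1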
It took a while since I had to test the query myself, so here it is. Try it; it may help you a bit with clause combinations. :) It's shorter.
SELECT product_id, competitor_id, MAX(date_added) AS last_date
FROM tblCompetitorPrices
WHERE product_id = '1'
GROUP BY product_id, competitor_id