Get occurrence count of specific categories in a table - sql

I'm looking to get the transition count of categories from a table, where the starting category counts as the first transition. For Name B, the category goes from Good to Bad, so the count is 2. For Name A, it goes Good, Moderate, Good, Moderate, Bad, so the count is 5.
Any help would be appreciated.
This is my input data:
Name  order_no  category
A     1         Good
A     2         Good
A     3         MODERATE
A     4         Good
A     5         MODERATE
A     6         Bad
A     7         Bad
B     1         Good
B     2         Good
B     3         Good
B     4         BAD
And this is my desired output:
Name  category_transition_count
A     5
B     2

select name
      ,count(cnt) as category_transition_count
from
  (select name
         ,case when category <> lag(category) over (partition by name order by order_no)
                 or lag(category) over (partition by name order by order_no) is null
               then 1 end as cnt
   from t) t
group by name
Result:
name  category_transition_count
A     5
B     2

You could use the lag window function to get the category of the previous row, compare it with the current row to see whether it changed, and count those occurrences. Note that the lag of the first row in each partition is null, and comparing null with <> yields null rather than true, so the first category is never counted as a change. You'll need to handle that explicitly, here by adding 1 to the count:
SELECT name, COUNT(changed) + 1
FROM (SELECT name,
             CASE WHEN category <> LAG(category) OVER (PARTITION BY name ORDER BY order_no ASC)
                  THEN 1
             END AS changed
      FROM mytable) t
GROUP BY name
SQLFiddle (PostgreSQL) demo
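To check the LAG approach against the sample data from the question, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. The table name t, the column name order_no, and the use of SQLite (3.25 or later, for window functions) are assumptions made only to keep the example self-contained; the SQL itself is the query from the answer above with an alias added.

```python
import sqlite3

# Sample rows copied from the question; the table name "t" is an assumption.
rows = [
    ("A", 1, "Good"), ("A", 2, "Good"), ("A", 3, "MODERATE"), ("A", 4, "Good"),
    ("A", 5, "MODERATE"), ("A", 6, "Bad"), ("A", 7, "Bad"),
    ("B", 1, "Good"), ("B", 2, "Good"), ("B", 3, "Good"), ("B", 4, "BAD"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, order_no INTEGER, category TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# LAG returns NULL for the first row of each partition, so the comparison is
# NULL there and the CASE yields NULL; COUNT skips it, hence the "+ 1".
query = """
SELECT name, COUNT(changed) + 1 AS category_transition_count
FROM (SELECT name,
             CASE WHEN category <> LAG(category) OVER (PARTITION BY name ORDER BY order_no)
                  THEN 1
             END AS changed
      FROM t) AS sub
GROUP BY name
ORDER BY name
"""
transition_counts = conn.execute(query).fetchall()
print(transition_counts)  # [('A', 5), ('B', 2)]
```

The comparison is case-sensitive here, so 'Bad' and 'BAD' would count as different categories if both appeared for the same name.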

Related

SQL query which will extract conditionally the values from top categories the first and the 2nd where CATEGORY is OTHER

I have this table. It is just a small example; the real table has many more rows.
id  CATEGORY  AMOUNT
1   TECH      120
1   FUN       220
2   OTHER     340
2   PARENTS   220
The table records, per ID and category, the amount spent in each category. I want to select each ID together with the category in which that ID spends the most; but if that category is OTHER, I want the second-highest spending category instead.
I have a constraint: I CANNOT use a subquery or a select with the filter WHERE CATEGORY <> 'OTHER'. It makes my machine run out of memory (I don't know why).
This is what I have tried.
I created a row number with row_number() over (partition by id order by amount desc) rn
and then
select id, category from table where rn = 1 group by 1,2
But I don't know how to tell the query: if CATEGORY is OTHER, then take row number 2.
id  CATEGORY  AMOUNT  ROW NUM
1   TECH      120     2
1   FUN       220     1
2   OTHER     340     1
2   PARENTS   220     2
Another thing I was thinking of doing is to use a QUALIFY clause:
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AMOUNT DESC) = 1
But here too I get only the first record per ID, which may be OTHER. I would like to exclude OTHER within QUALIFY, i.e. say that if CATEGORY is 'OTHER', the row should not be considered.
I am using Databricks.
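One possible way to express "skip OTHER unless nothing else exists" without a WHERE CATEGORY <> 'OTHER' filter is to push OTHER to the end of the window ordering. The sketch below is only an illustration run on SQLite through Python's sqlite3 module; the table name spending is hypothetical, and it has not been tested on Databricks. On Databricks the same ORDER BY expression could presumably be placed inside the QUALIFY clause instead of the outer rn = 1 filter.

```python
import sqlite3

# Sample rows from the question; the table name "spending" is hypothetical.
rows = [(1, "TECH", 120), (1, "FUN", 220), (2, "OTHER", 340), (2, "PARENTS", 220)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spending (id INTEGER, category TEXT, amount INTEGER)")
conn.executemany("INSERT INTO spending VALUES (?, ?, ?)", rows)

# The CASE expression sorts every OTHER row after all non-OTHER rows, so
# row number 1 is the top non-OTHER category whenever one exists.
query = """
SELECT id, category
FROM (SELECT id, category,
             ROW_NUMBER() OVER (
                 PARTITION BY id
                 ORDER BY CASE WHEN category = 'OTHER' THEN 1 ELSE 0 END,
                          amount DESC) AS rn
      FROM spending) AS ranked
WHERE rn = 1
ORDER BY id
"""
top_categories = conn.execute(query).fetchall()
print(top_categories)  # [(1, 'FUN'), (2, 'PARENTS')]
```

An ID whose only category is OTHER would still return OTHER with this ordering; whether that is acceptable depends on the requirement.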

Is there a way to display the first two results of each unique id?

I work in healthcare. In a Postgres database, we have a table of member IDs and dates. I'm trying to pull the latest two dates for each member ID.
Simplified sample data:
A 1
B 1
B 2
C 1
C 5
C 7
D 1
D 2
D 3
D 4
Desired result:
A 1
B 1
B 2
C 1
C 5
D 1
D 2
I get a strong feeling this is for a homework assignment and would recommend that you look into partitioning, and specifically the rank() function, by yourself first before looking at my solution.
Moreover, you have not specified how you received the initial result you provided, so I'll have to assume you just did select letter_column, number_column from my_table; to achieve the result.
So, what you actually want here is to partition the initial query result into groups by letter_column and select the first two rows in each. The rank() function lets you assign each row a number, counting within groups:
select letter_column,
       number_column,
       rank() over (partition by letter_column order by number_column) as rank
from my_table;
Window functions are evaluated after the WHERE clause, so you can't filter on rank at the same query level. You'll have to wrap this query in another one and filter there, keeping only the rows whose rank is at most 2:
with ranked_results as (select letter_column,
                               number_column,
                               rank() over (partition by letter_column order by number_column asc) as rank
                        from my_table mt)
select letter_column,
       number_column
from ranked_results
where rank < 3;
Here's an SQLFiddle to play around: http://sqlfiddle.com/#!15/e90744/1/0
Hope this helps!
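For anyone who wants to try the ranked CTE without a Postgres instance, here is a small sketch that runs it on SQLite through Python's sqlite3 module. The table and column names follow the answer above; the alias rnk (instead of rank) and the use of SQLite are choices made only for this illustration.

```python
import sqlite3

# Simplified sample data from the question.
rows = [("A", 1), ("B", 1), ("B", 2), ("C", 1), ("C", 5), ("C", 7),
        ("D", 1), ("D", 2), ("D", 3), ("D", 4)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (letter_column TEXT, number_column INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)

# The window function is computed in the CTE; the outer query filters on it.
query = """
WITH ranked_results AS (
    SELECT letter_column,
           number_column,
           RANK() OVER (PARTITION BY letter_column ORDER BY number_column) AS rnk
    FROM my_table
)
SELECT letter_column, number_column
FROM ranked_results
WHERE rnk < 3
ORDER BY letter_column, number_column
"""
first_two = conn.execute(query).fetchall()
print(first_two)
```

Note that rank() gives tied values the same rank, so a group with duplicates can return more than two rows; row_number() always returns at most two. To get the latest two values instead of the first two, order the window by number_column desc.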

SQL Query getting the latest record of the Group and calculate the value of those particular records

I have the following table (just a sample) and would like to subtract the Points of Record 1 from the Points of Record 2 (Record2 - Record1), using the latest records of both. Records are entered per Match: one Match consists of two records, Record 1 and Record 2.
The output here will be 3, as the newest records are IDs 3 and 4 from Match 2.
ID  Name      Points  TimeRecorded  Match
1   Record 1  3       2-Mar 2pm     1
2   Record 2  5       2-Mar 2pm     1
3   Record 1  5       4-Mar 5pm     2
4   Record 2  8       4-Mar 5pm     2
I tried to get the value by subtracting the results of two queries, as below. But I feel this is not a good way, as the match and the record names are hard-coded. How can I construct a better query that gets the latest records of the grouped match and calculates the points by subtracting Record1 from Record2?
SELECT
(select Points from RunRecord where Name= 'Record2' AND Match = 2)
- (select Points from RunRecord where Name= 'Record1' AND Match = 2)
You could use:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TimeRecorded DESC) rn
    FROM yourTable
)
SELECT
    MAX(CASE WHEN Name = 'Record 2' THEN Points END) -
    MAX(CASE WHEN Name = 'Record 1' THEN Points END) AS diff
FROM cte
WHERE rn = 1;
The CTE assigns a row number for each group of records of the same name, with 1 being assigned to the most recent record. Then, we aggregate over the entire table and pivot out the points to find the difference.
You can use the rank() window function to rank the records by match descending. Then take the top of the ranked records and use conditional aggregation to control the sign of the points added.
SELECT sum(CASE x.name
             WHEN 'Record2' THEN
               x.points
             WHEN 'Record1' THEN
               -x.points
           END)
       FROM (SELECT rr.name,
                    rr.points,
                    rank() OVER (ORDER BY rr.match DESC) r
                    FROM runrecord rr
                    WHERE name IN ('Record1',
                                   'Record2')) x
       WHERE x.r = 1;
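The rank-then-aggregate idea can be tried on the sample data with the sketch below, run on SQLite through Python's sqlite3 module. It assumes the names are stored as 'Record 1' and 'Record 2' (with a space, as in the question's table) and renames the Match column to match_no, because MATCH is a keyword in SQLite; both are assumptions made for this illustration.

```python
import sqlite3

# Sample rows from the question (the timestamp column is omitted because the
# ranking below uses the match number only).
rows = [(1, "Record 1", 3, 1), (2, "Record 2", 5, 1),
        (3, "Record 1", 5, 2), (4, "Record 2", 8, 2)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runrecord (id INTEGER, name TEXT, points INTEGER, match_no INTEGER)")
conn.executemany("INSERT INTO runrecord VALUES (?, ?, ?, ?)", rows)

# Rank 1 covers every row of the highest match number; the CASE flips the
# sign of Record 1 so that the SUM is Record 2 minus Record 1.
query = """
SELECT SUM(CASE x.name
             WHEN 'Record 2' THEN x.points
             WHEN 'Record 1' THEN -x.points
           END) AS diff
FROM (SELECT name, points,
             RANK() OVER (ORDER BY match_no DESC) AS r
      FROM runrecord
      WHERE name IN ('Record 1', 'Record 2')) AS x
WHERE x.r = 1
"""
diff = conn.execute(query).fetchone()[0]
print(diff)  # 3
```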

Complex SQL query or queries

I looked at other examples, but I don't know enough about SQL to adapt it to my needs. I have a table that looks like this:
ID Month NAME COUNT First LAST TOTAL
------------------------------------------------------
1 JAN2013 fred 4
2 MAR2013 fred 5
3 APR2014 fred 1
4 JAN2013 Tom 6
5 MAR2014 Tom 1
6 APR2014 Tom 1
This could be in separate queries, but I need 'First' to equal the first month in which a particular name is used, so every row with fred would have JAN2013 in the First field, for example. I need the 'Last' column to equal the month of the last record for each name, and finally I need the 'Total' column to be the sum of all the counts for each name, so in each row that has fred the total would be 10 in this sample data. This is over my head. Can one of you assist?
This is crude but should do the trick. I renamed your fields a bit because you are using a bunch of "RESERVED" sql words and that is bad form.
;WITH cte AS
(
    SELECT
         [NAME]
        ,[txtMONTH]
        ,[nmCOUNT]
        -- Caution: if txtMONTH is text such as 'JAN2013' it sorts alphabetically, not
        -- chronologically; convert it to a date (or order by ID) before relying on this.
        ,ROW_NUMBER() OVER (PARTITION BY [NAME] ORDER BY [txtMONTH] ASC)  AS FirstMonth
        ,ROW_NUMBER() OVER (PARTITION BY [NAME] ORDER BY [txtMONTH] DESC) AS LastMonth
        -- A windowed SUM totals per name without collapsing rows, so no GROUP BY is needed
        ,SUM([nmCOUNT]) OVER (PARTITION BY [NAME]) AS TotNameCount
    FROM [Table]
)
,cteFirst AS
(
    SELECT
         [NAME]
        ,[nmCOUNT]
        ,[TotNameCount]
        ,[txtMONTH] AS ansFirst
    FROM cte
    WHERE FirstMonth = 1
)
,cteLast AS
(
    SELECT
         [NAME]
        ,[txtMONTH] AS ansLast
    FROM cte
    WHERE LastMonth = 1
)
SELECT c.[NAME], c.nmCOUNT, c.ansFirst, l.ansLast, c.TotNameCount
FROM cteFirst c
LEFT JOIN cteLast l ON c.[NAME] = l.[NAME]
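If the goal is to show First, Last and Total on every row rather than one row per name, window functions alone may be enough. The following is a rough sketch on SQLite through Python's sqlite3 module; the table name monthly, the column name cnt, and the assumption that ID order matches chronological order are all made up for this illustration and are not part of the answer above.

```python
import sqlite3

# Sample rows from the question; "monthly" and "cnt" are hypothetical names.
rows = [(1, "JAN2013", "fred", 4), (2, "MAR2013", "fred", 5), (3, "APR2014", "fred", 1),
        (4, "JAN2013", "Tom", 6), (5, "MAR2014", "Tom", 1), (6, "APR2014", "Tom", 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (id INTEGER, month TEXT, name TEXT, cnt INTEGER)")
conn.executemany("INSERT INTO monthly VALUES (?, ?, ?, ?)", rows)

# Ordering by id stands in for chronological order (an assumption), which
# avoids sorting month strings such as 'APR2014' alphabetically.
query = """
SELECT id, month, name, cnt,
       FIRST_VALUE(month) OVER (PARTITION BY name ORDER BY id)      AS first_month,
       FIRST_VALUE(month) OVER (PARTITION BY name ORDER BY id DESC) AS last_month,
       SUM(cnt)           OVER (PARTITION BY name)                  AS total
FROM monthly
ORDER BY id
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

Every fred row comes back with JAN2013, APR2014 and a total of 10, and every Tom row with JAN2013, APR2014 and a total of 8.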

SQL view with a column that shows top result of relationship with multiple weightings

I have three tables: an Objects table, a Status table and a StatusTypes table.
An Object has multiple Statuses, each of which has a status type. I would like to create a view that gives me the object's ID, the Most Important Status Description (found in the StatusTypes table), and the most important status Date (found in the Status table).
The part I am getting hung up on is finding the most important status. It must be sorted first by the latest date, then by an integer weighting (Priority) in the Status table, and then by another weighting (Weighting) in the StatusTypes table.
What would be the best SQL statement to deliver these results quickly?
Objects
ID      Acquisition Date  Serial Number
127237  1997-04-21        2151513515
127239  1997-10-31        2151513523
127242  1998-01-20        2165588481
127272  1998-10-20        2195689842
127286  1999-06-15        2231549489
127291  1999-06-01        2229564978
Status
ID  ObjectID  Priority  StatusMessage  Date               Status
1   127237    1         Online         22.02.12 07.01.00  1
2   127237    3         Job Received   22.02.12 07.01.00  3
3   127237    5         Job Started    22.02.12.07.01.00  3
4   127237    5         Jam            22.02.12.07.01.00  2
5   127286    1         Online         22.02.12.07.09.00  1
Status Types
ID  Description  Weighting
1   Idle         0
2   Error        9
3   Working      5
Expected Output
ID      Status  Date
127237  Error   22.02.12 07.01.00
127286  Idle    22.02.12.07.09.00
Sounds like you could use ROW_NUMBER():
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date DESC, Priority, Weighting) 'RowRank'
      FROM YourTable a
     ) sub
WHERE RowRank = 1
Obviously, replace YourTable with the relevant JOINs.
The ROW_NUMBER() function assigns a number to each row. PARTITION BY is optional, but is used to start the numbering over for each value in that group, i.e. if you PARTITION BY ID, then for each unique ID value the numbering will start over at 1. ORDER BY defines the order in which rows are numbered, and is required in the ROW_NUMBER() function.
Updated with your data:
SELECT ObjectID, Description, Date
FROM (SELECT a.*, b.Description,
             ROW_NUMBER() OVER (PARTITION BY a.ObjectID
                                ORDER BY CONVERT(DATE, LEFT([Date], 8), 4) DESC, Priority DESC, Weighting DESC) 'RowRank'
      FROM Status a
      JOIN Status_Types b
        ON a.Status = b.ID
     ) sub
WHERE RowRank = 1
Demo: SQL Fiddle
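The same ordering logic can be checked against the sample rows with the sketch below, run on SQLite through Python's sqlite3 module. To keep it portable, the dates are stored as ISO text, reading 22.02.12 as 22 February 2012 (the dd.mm.yy style that the CONVERT call above implies); that normalisation and the snake_case table and column names are assumptions for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE status (id INTEGER, object_id INTEGER, priority INTEGER,
                                     status_message TEXT, status_date TEXT, status INTEGER)""")
conn.execute("CREATE TABLE status_types (id INTEGER, description TEXT, weighting INTEGER)")

# Sample rows from the question, with dates rewritten as sortable ISO text (assumption).
conn.executemany("INSERT INTO status VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 127237, 1, "Online",       "2012-02-22 07:01:00", 1),
    (2, 127237, 3, "Job Received", "2012-02-22 07:01:00", 3),
    (3, 127237, 5, "Job Started",  "2012-02-22 07:01:00", 3),
    (4, 127237, 5, "Jam",          "2012-02-22 07:01:00", 2),
    (5, 127286, 1, "Online",       "2012-02-22 07:09:00", 1),
])
conn.executemany("INSERT INTO status_types VALUES (?, ?, ?)",
                 [(1, "Idle", 0), (2, "Error", 9), (3, "Working", 5)])

# Latest date first, then highest Priority, then highest Weighting.
query = """
SELECT object_id, description, status_date
FROM (SELECT s.object_id, t.description, s.status_date,
             ROW_NUMBER() OVER (PARTITION BY s.object_id
                                ORDER BY s.status_date DESC,
                                         s.priority DESC,
                                         t.weighting DESC) AS row_rank
      FROM status s
      JOIN status_types t ON s.status = t.id) AS sub
WHERE row_rank = 1
ORDER BY object_id
"""
top_status = conn.execute(query).fetchall()
print(top_status)
```

For object 127237 all four rows share the same date, the two rows with Priority 5 tie, and the Weighting of 9 for Error breaks the tie, which matches the expected output in the question.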