SQL, label users based on the similarity of their products

Is the following possible in SQL?
Let's say I have a table like this:
user_id product_id
1 123
1 122
1 121
2 124
2 125
2 121
3 123
3 122
3 122
4 123
4 212
4 222
5 124
5 125
5 121
I want to give users the same label if they have the same product_ids, regardless of order, so the output looks like this:
user_id product_id label
1 123 a
1 122 a
1 121 a
2 124 b
2 125 b
2 121 b
3 123 a
3 121 a
3 122 a
4 123 c
4 212 c
4 222 c
5 124 b
5 125 b
5 121 b
Please advise

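You can use the string_agg function to build the list of product_ids for each user as a single string (ordered by product_id, so the original row order does not matter), then apply dense_rank to that string so that each distinct product list gets its own rank. chr(rank + 96) then turns rank 1 into 'a', 2 into 'b', and so on.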
select T.user_id, T.product_id, D.label
from table_name T join
(
select user_id,
chr(dense_rank() over (order by user_products) + 96) label
from
(
select user_id,
string_agg(cast(product_id as string), ',' order by product_id) user_products
from table_name
group by user_id
) lbl
) D
on T.user_id = D.user_id
order by T.user_id
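The string_agg/chr syntax above differs between databases (for example, `cast(... as string)` is BigQuery syntax; PostgreSQL would use `text`). As a hedged alternative, here is a sketch for SQLite 3.25+ that does not depend on string ordering at all: it compares users' product sets pairwise, maps each user to the lowest user_id with an identical set, and ranks those representatives. The table name `purchases` is hypothetical, and the sample rows follow the expected output (user 3 owns 121, 122 and 123).

```sql
-- Hypothetical table; rows follow the expected output in the question
-- (assumption: user 3 owns products 121, 122 and 123).
CREATE TABLE purchases (user_id INTEGER, product_id INTEGER);
INSERT INTO purchases VALUES
  (1, 123), (1, 122), (1, 121),
  (2, 124), (2, 125), (2, 121),
  (3, 123), (3, 121), (3, 122),
  (4, 123), (4, 212), (4, 222),
  (5, 124), (5, 125), (5, 121);

CREATE VIEW user_labels AS
WITH sets AS (          -- distinct products per user (duplicate rows ignored)
  SELECT DISTINCT user_id, product_id FROM purchases
), sizes AS (           -- number of distinct products per user
  SELECT user_id, COUNT(*) AS n FROM sets GROUP BY user_id
), same_set AS (        -- (u1, u2) pairs whose product sets are identical
  SELECT a.user_id AS u1, b.user_id AS u2
  FROM sets a
  JOIN sets b   ON b.product_id = a.product_id
  JOIN sizes sa ON sa.user_id = a.user_id
  JOIN sizes sb ON sb.user_id = b.user_id
  WHERE sa.n = sb.n
  GROUP BY a.user_id, b.user_id, sa.n
  HAVING COUNT(*) = sa.n  -- all n products shared and both sets have n items
), canon AS (           -- lowest user_id that has the same set
  SELECT u1 AS user_id, MIN(u2) AS rep FROM same_set GROUP BY u1
)
SELECT user_id,
       char(96 + DENSE_RANK() OVER (ORDER BY rep)) AS label  -- rank 1 -> 'a'
FROM canon;

SELECT p.user_id, p.product_id, l.label
FROM purchases p
JOIN user_labels l ON l.user_id = p.user_id
ORDER BY p.user_id;
```

Because only distinct products are compared, repeated rows for the same user and product do not change the label.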

Related

How to select distinct values for two and return all columns?

I want to select distinct values from two columns.
Example data:
ID TITLE SOURCE TARGET
1 asd 12 2
2 asd1 123 125
3 asd1 123 56
4 asd2 123 125
5 asd3 164 146
I want distinct data for the SOURCE and TARGET columns; ID 2 and ID 4 are duplicates, so the expected output is:
ID TITLE SOURCE TARGET
1 asd 12 2
2 asd1 123 125
3 asd1 123 56
5 asd3 164 146
If you just want the distinct values, use select distinct:
select distinct source, target
from example t;
If you want the rows where the source/target only appears on one row, then one method uses window functions:
select t.*
from (select t.*,
count(*) over (partition by source, target) as cnt
from example t
) t
where cnt = 1;
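Note that the cnt = 1 filter removes both ID 2 and ID 4, while the expected output keeps ID 2. To keep the first row (lowest ID) of each SOURCE/TARGET pair instead, one option is row_number(). A minimal sketch, assuming SQLite 3.25+ and the sample data from the question; the view name `first_per_pair` is hypothetical:

```sql
CREATE TABLE example (ID INTEGER, TITLE TEXT, SOURCE INTEGER, TARGET INTEGER);
INSERT INTO example VALUES
  (1, 'asd', 12, 2), (2, 'asd1', 123, 125), (3, 'asd1', 123, 56),
  (4, 'asd2', 123, 125), (5, 'asd3', 164, 146);

-- Number the rows of each (SOURCE, TARGET) pair by ID and keep only the first.
CREATE VIEW first_per_pair AS
SELECT ID, TITLE, SOURCE, TARGET
FROM (SELECT e.*,
             ROW_NUMBER() OVER (PARTITION BY SOURCE, TARGET ORDER BY ID) AS seqnum
      FROM example e) AS t
WHERE seqnum = 1;

SELECT * FROM first_per_pair ORDER BY ID;  -- IDs 1, 2, 3 and 5
```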

Find 3 or more consecutive transaction records where the transaction amount is greater than 100 and the records belong to the same category

I have a customer transaction table with three columns: Id, Category and TranAmount. I want to find 3 or more consecutive transaction records that belong to the same category and have a TranAmount greater than 100.
Below is the sample table:
Id Category TranAmount
1 A 190
2 A 160
3 A 169
4 B 190
5 A 90
6 B 219
7 B 492
8 B 129
9 B 390
10 B 40
11 A 110
12 A 130
And the output should be:
Id Category TranAmount
1 A 190
2 A 160
3 A 169
6 B 219
7 B 492
8 B 129
9 B 390
Look up "gaps and islands" for a deeper understanding of the approach. Here is one of many articles you could read: https://www.red-gate.com/simple-talk/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/
In this specific problem you have two conditions that cause a break in a consecutive series, those being a change in category or an amount that doesn't meet the threshold.
with data as (
select *,
row_number() over (order by Id) as rn,
row_number() over (partition by
Category, case when TranAmount > 100 then 1 else 0 end order by Id) as cn
from Transactions
), grp as (
-- rn - cn is only unique within a category, so partition by both
select *, count(*) over (partition by Category, rn - cn) as num
from data
where TranAmount > 100
)
select * from grp where num >= 3;
https://rextester.com/DUM44618
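A common variant of gaps and islands replaces the row_number difference with a running sum of "break" flags: a row starts a new island unless it passes the threshold and the previous row (by Id) also passed and has the same category. A sketch, assuming SQLite 3.25+ (for LAG and window frames) and the sample data; the view name `long_runs` is hypothetical:

```sql
CREATE TABLE Transactions (Id INTEGER, Category TEXT, TranAmount INTEGER);
INSERT INTO Transactions VALUES
  (1, 'A', 190), (2, 'A', 160), (3, 'A', 169), (4, 'B', 190),
  (5, 'A', 90),  (6, 'B', 219), (7, 'B', 492), (8, 'B', 129),
  (9, 'B', 390), (10, 'B', 40), (11, 'A', 110), (12, 'A', 130);

CREATE VIEW long_runs AS
WITH marked AS (        -- ok = 1 when the amount passes the threshold
  SELECT *, CASE WHEN TranAmount > 100 THEN 1 ELSE 0 END AS ok
  FROM Transactions
), breaks AS (          -- new_island = 1 where the current run cannot continue
  SELECT *,
         CASE WHEN ok = 1
               AND LAG(ok) OVER (ORDER BY Id) = 1
               AND LAG(Category) OVER (ORDER BY Id) = Category
              THEN 0 ELSE 1 END AS new_island
  FROM marked
), islands AS (         -- running total of breaks gives an island number
  SELECT *, SUM(new_island) OVER (ORDER BY Id ROWS UNBOUNDED PRECEDING) AS island
  FROM breaks
)
SELECT Id, Category, TranAmount
FROM (SELECT *, COUNT(*) OVER (PARTITION BY island) AS len
      FROM islands
      WHERE ok = 1) AS runs
WHERE len >= 3;

SELECT * FROM long_runs ORDER BY Id;  -- Ids 1, 2, 3, 6, 7, 8, 9
```

On the first row LAG returns NULL, so the CASE falls through to 1 and the row starts an island, which is the intended behaviour.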
This will work if there are no gaps between the ids:
select distinct t.*
from tablename t inner join (
select t.id from tablename t
where t.tranamount > 100
and
exists (
select 1 from tablename
where id = t.id - 1 and category = t.category and tranamount > 100
)
and
exists (
select 1 from tablename
where id = t.id + 1 and category = t.category and tranamount > 100
)
) tt on t.id in (tt.id - 1, tt.id, tt.id + 1)
Results:
Id | Category | TranAmount
-: | :------- | ---------:
1 | A | 190
2 | A | 160
3 | A | 169
6 | B | 219
7 | B | 492
8 | B | 129
9 | B | 390
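I can't really test this out yet, but give this a try. Note that it only checks that a category has at least three rows with TranAmount greater than 100 overall, not that those rows are consecutive, so on the sample data it also returns Ids 4, 11 and 12.
SELECT Id, Category, TranAmount FROM Transactions
WHERE TranAmount > 100
and Category IN
(SELECT Category FROM Transactions
WHERE TranAmount > 100
GROUP BY Category HAVING COUNT(Category) >= 3)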

how to find the total

I have a table A and the output expected is below.
Table A
Id patientId PID
1 123 p1
1 123 p2
1 124 p3
1 124 p4
1 125 p5
2 126 p6
2 126 p7
2 126 p8
2 127 p9
2 127 p10
Count of pid is, for each patientId, the number of PIDs present; Total count of IDs is the total number of rows for each Id (for example, 5 for Id 1).
Expecting an output like this:
id patientId Count of pid Total count of IDs
1 123 2 5
1 124 2 5
1 125 1 5
2 126 3 5
2 127 2 5
I am not sure how to go beyond this:
select Id,patientId,count(PID)
from A
group by 1,2
Because you want to count over two different fields, you need two separate GROUP BY subqueries, which can be JOINed on Id:
SELECT A1."Id",
A1."patientId",
A1.num_pids,
A2.total_ids
FROM (SELECT "Id", "patientId", COUNT(*) AS num_pids
FROM A
GROUP BY "Id", "patientId") A1
JOIN (SELECT "Id", COUNT(*) AS total_ids
FROM A
GROUP BY "Id") A2 ON A2."Id" = A1."Id"
ORDER BY "Id", "patientId"
Output:
Id patientId num_pids total_ids
1 123 2 5
1 124 2 5
1 125 1 5
2 126 3 5
2 127 2 5
select a.Id,a.patientId,count(a.patientId), a2.IdCount
from A a
left join (select Id, count(Id) as IdCount
from A
group by Id) a2
on a.Id = a2.Id
group by a.Id,a.patientId, a2.IdCount
I think you just want a window function:
select Id, patientId, count(*),
count(*) over ()
from A
group by 1, 2;
The second count(*) counts the number of rows in the result set, which appears to be what you want.
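On this sample, count(*) over () returns 5 only because the result happens to have five rows. If the intended total is the number of rows per Id, a window over the grouped counts partitioned by Id gives that directly. A sketch, assuming SQLite 3.25+ (which allows an aggregate inside a window function in a grouped query); the view name `pid_counts` is hypothetical:

```sql
CREATE TABLE A (Id INTEGER, patientId INTEGER, PID TEXT);
INSERT INTO A VALUES
  (1, 123, 'p1'), (1, 123, 'p2'), (1, 124, 'p3'), (1, 124, 'p4'), (1, 125, 'p5'),
  (2, 126, 'p6'), (2, 126, 'p7'), (2, 126, 'p8'), (2, 127, 'p9'), (2, 127, 'p10');

-- COUNT(*) is per (Id, patientId) group; SUM(COUNT(*)) adds those group
-- counts back up across all groups that share the same Id.
CREATE VIEW pid_counts AS
SELECT Id, patientId,
       COUNT(*) AS num_pids,
       SUM(COUNT(*)) OVER (PARTITION BY Id) AS total_ids
FROM A
GROUP BY Id, patientId;

SELECT * FROM pid_counts ORDER BY Id, patientId;
```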

sql for Access Database

I am dealing with a huge volume of traffic data in an MS Access database, and I want to identify the vehicles that have changed lanes. I only want the records where the lane changed (the two adjacent records: the one before and the one after the lane change).
Traffic Data:
Vehicle_ID Lane_ID Frame_ID Distance
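1 2 12 100
1 2 13 103
1 2 14 105
2 1 15 107
2 1 16 130   <- before lane change
2 2 17 135   <- after lane change
2 2 18 136
3 1 19 140   <- before lane change
3 2 20 141   <- after lane change
3 2 21 147
4 2 22 149
4 2 23 151   <- before lane change
4 1 24 154   <- after lane change
4 1 25 159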
With some help from here, I have found the Vehicle_IDs that changed lanes:
SELECT t.Vehicle_ID, COUNT(t.Lane_ID) AS [Lane Count]
FROM (
SELECT DISTINCT Vehicle_ID, Lane_ID FROM Table1
) AS t
GROUP BY t.Vehicle_ID
HAVING COUNT(t.Lane_ID) > 1
Shown Result:
Vehicle_ID Lane Count
2 2
3 2
4 2
Now I want to analyse the lane-changing records further by extracting the two adjacent records: the one before and the one after the lane change. My desired output is:
Desired Result:
Vehicle_ID Lane_ID Frame_ID Distance
2 1 16 130
2 2 17 135
3 1 19 140
3 2 20 141
4 2 23 151
4 1 24 154
Assuming the frame ids have no gaps, you can do this using joins:
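select t1.*
from (table1 as t1 left join
table1 as t1prev
on (t1prev.Vehicle_ID = t1.Vehicle_ID and
t1prev.frame_id = t1.frame_id - 1)
) left join
table1 as t1next
on (t1next.Vehicle_ID = t1.Vehicle_ID and
t1next.frame_id = t1.frame_id + 1)
where t1prev.lane_id <> t1.lane_id or
t1next.lane_id <> t1.lane_id;
Left joins keep a vehicle's first and last frames, which have no previous or next frame; with inner joins, frame 19 of vehicle 3 would be missed. If the frame ids do have gaps, this approach does not work and the query becomes much more expensive.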
You can do it with EXISTS:
select t.* from Table1 t
where
exists (
select 1 from Table1
where
vehicle_id = t.vehicle_id
and
frame_id in (t.frame_id - 1, t.frame_id + 1)
and
lane_id <> t.lane_id
)
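Both queries above rely on Frame_ID having no gaps. On engines with window functions (MS Access does not support them), LAG and LEAD compare each row with the previous and next frame of the same vehicle regardless of gaps. A sketch, assuming SQLite 3.25+ and the sample data; the view name `lane_changes` is hypothetical:

```sql
CREATE TABLE Table1 (Vehicle_ID INTEGER, Lane_ID INTEGER, Frame_ID INTEGER, Distance INTEGER);
INSERT INTO Table1 VALUES
  (1, 2, 12, 100), (1, 2, 13, 103), (1, 2, 14, 105),
  (2, 1, 15, 107), (2, 1, 16, 130), (2, 2, 17, 135), (2, 2, 18, 136),
  (3, 1, 19, 140), (3, 2, 20, 141), (3, 2, 21, 147),
  (4, 2, 22, 149), (4, 2, 23, 151), (4, 1, 24, 154), (4, 1, 25, 159);

-- Keep a row when the previous or next frame of the same vehicle is in a
-- different lane. NULL (no previous/next frame) never satisfies <>.
CREATE VIEW lane_changes AS
SELECT Vehicle_ID, Lane_ID, Frame_ID, Distance
FROM (SELECT t.*,
             LAG(Lane_ID)  OVER (PARTITION BY Vehicle_ID ORDER BY Frame_ID) AS prev_lane,
             LEAD(Lane_ID) OVER (PARTITION BY Vehicle_ID ORDER BY Frame_ID) AS next_lane
      FROM Table1 t) AS x
WHERE prev_lane <> Lane_ID OR next_lane <> Lane_ID;

SELECT * FROM lane_changes ORDER BY Vehicle_ID, Frame_ID;
```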

Simple data, Complex query on SQL Server

I need to make a query over an SQL Server table but I don't know exactly how.
Consider this table (the real table is much more complex; Ord1 and Ord2 are dates that can be null, but I simplified it for this case):
Data of MyTable
ID MaqID Ord1 Ord2
------------------------
1 144 4 3
2 144 2 1
3 12 2 3
4 144 3 5
5 12 3 1
6 144 4 2
7 12 2 4
8 144 2 3
9 12 1 5
10 12 3 2
I need the records for a specific MaqID in a specific order. I get that with this query:
SELECT * FROM myTable WHERE MaqID=144 ORDER BY MaqID, Ord1 DESC, Ord2
Which gives me:
ID MaqID Ord1 Ord2
------------------------
6 144 4 2
1 144 4 3
4 144 3 5
2 144 2 1
8 144 2 3
Now I need a single query that, for each MaqID, returns the first ID according to the order above. The result should be:
Expected result
MaqID ID
-----------
144 6
12 5
I have already tried various combinations of TOP and MAX, but TOP returns only one row and I need one per MaqID, and with MAX I have no field to maximize.
To summarize: I need the first ID for each MaqID, following a specific order.
Any ideas? Thanks!
You can do this using row_number():
select t.MaqID, t.ID
from (select t.*,
row_number() over (partition by MaqID order by Ord1 desc, Ord2) as seqnum
from mytable t
) t
where seqnum = 1;
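Where window functions are not available, a correlated NOT EXISTS can express "no other row of the same MaqID sorts before this one". A sketch, assuming SQLite and the simplified sample data; it assumes Ord1 and Ord2 are not NULL (the real nullable date columns would need extra handling) and adds ID as a hypothetical tie-breaker so that exactly one row per MaqID is returned. The view name `first_per_maq` is also hypothetical.

```sql
CREATE TABLE MyTable (ID INTEGER, MaqID INTEGER, Ord1 INTEGER, Ord2 INTEGER);
INSERT INTO MyTable VALUES
  (1, 144, 4, 3), (2, 144, 2, 1), (3, 12, 2, 3), (4, 144, 3, 5), (5, 12, 3, 1),
  (6, 144, 4, 2), (7, 12, 2, 4), (8, 144, 2, 3), (9, 12, 1, 5), (10, 12, 3, 2);

-- A row is first when no other row of the same MaqID comes before it
-- under ORDER BY Ord1 DESC, Ord2, ID (ID only breaks exact ties).
CREATE VIEW first_per_maq AS
SELECT t.MaqID, t.ID
FROM MyTable t
WHERE NOT EXISTS (
  SELECT 1
  FROM MyTable o
  WHERE o.MaqID = t.MaqID
    AND (o.Ord1 > t.Ord1
         OR (o.Ord1 = t.Ord1 AND o.Ord2 < t.Ord2)
         OR (o.Ord1 = t.Ord1 AND o.Ord2 = t.Ord2 AND o.ID < t.ID))
);

SELECT * FROM first_per_maq;  -- (144, 6) and (12, 5)
```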