SQL, label user based on the similarity

Is the following possible in SQL?
Let's say I have a table like this:
user_id  product_id
1        123
1        122
1        121
2        124
2        125
2        121
3        123
3        122
3        122
4        123
4        212
4        222
5        124
5        125
5        121
I want to label users who have the same set of product_ids, regardless of order, so the output looks like this:
user_id  product_id  label
1        123         a
1        122         a
1        121         a
2        124         b
2        125         b
2        121         b
3        123         a
3        121         a
3        122         a
4        123         c
4        212         c
4        222         c
5        124         b
5        125         b
5        121         b
Please advise

You can use the string_agg function to get the list of product_ids for each user (as a single string), then use the dense_rank function on that string to get unique labels for each product_ids list.
select T.user_id, T.product_id, D.label
from table_name T
join (
    select user_id,
           chr(dense_rank() over (order by user_products) + 96) label
    from (
        select user_id,
               string_agg(cast(product_id as string), ',' order by product_id) user_products
        from table_name
        group by user_id
    ) lbl
) D
on T.user_id = D.user_id
order by T.user_id
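
To try the idea without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the query above: SQLite has no string_agg or chr, so it uses group_concat as a window function and char instead (window functions need SQLite 3.25 or later). The table name `purchases` is hypothetical, and user 3's rows are taken from the expected output in the question (123, 121, 122).

```python
import sqlite3

# Hypothetical table `purchases`; user 3's products follow the expected
# output in the question (123, 121, 122).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, product_id INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 123), (1, 122), (1, 121),
     (2, 124), (2, 125), (2, 121),
     (3, 123), (3, 121), (3, 122),
     (4, 123), (4, 212), (4, 222),
     (5, 124), (5, 125), (5, 121)],
)

query = """
WITH sets AS (
    -- One sorted, comma-separated product list per user.
    SELECT DISTINCT user_id,
           group_concat(product_id, ',') OVER (
               PARTITION BY user_id ORDER BY product_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS user_products
    FROM purchases
),
labelled AS (
    -- Identical lists share a dense_rank; 96 + rank maps 1 -> 'a', 2 -> 'b', ...
    SELECT user_id,
           char(96 + dense_rank() OVER (ORDER BY user_products)) AS label
    FROM sets
)
SELECT user_id, label FROM labelled ORDER BY user_id
"""
labels = dict(conn.execute(query).fetchall())
print(labels)
```

Joining `labelled` back to the base table on user_id, as in the answer, attaches the label to every row. Single-letter labels only cover 26 distinct product sets; beyond that, the numeric rank itself is probably a safer label.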

Related

How to select distinct values for two and return all columns?

I want to select distinct values from two columns.
Example data:
ID  TITLE  SOURCE  TARGET
1   asd    12      2
2   asd1   123     125
3   asd1   123     56
4   asd2   123     125
5   asd3   164     146
I want rows that are distinct on the source and target columns. ID 2 and ID 4 are duplicates, so the expected result is:
ID  TITLE  SOURCE  TARGET
1   asd    12      2
2   asd1   123     125
3   asd1   123     56
5   asd3   164     146

If you just want the distinct values, use select distinct:
select distinct source, target
from example t;
If you want the rows where the source/target only appears on one row, then one method uses window functions:
select t.*
from (select t.*,
             count(*) over (partition by source, target) as cnt
      from example t
     ) t
where cnt = 1;
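
The expected output in the question keeps ID 2 and drops only ID 4, which suggests keeping the first row of each source/target pair rather than dropping every duplicated pair. A minimal sketch of that variant, assuming "first" means lowest ID and using Python's sqlite3 (SQLite 3.25 or later) purely as a test harness:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (id INTEGER, title TEXT, source INTEGER, target INTEGER)"
)
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?, ?)",
    [(1, "asd", 12, 2),
     (2, "asd1", 123, 125),
     (3, "asd1", 123, 56),
     (4, "asd2", 123, 125),
     (5, "asd3", 164, 146)],
)

# Number the rows inside each (source, target) pair by id and keep the first.
# Assumption: the lowest id is the row to keep.
query = """
SELECT id, title, source, target
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY source, target ORDER BY id) AS seqnum
      FROM example t)
WHERE seqnum = 1
ORDER BY id
"""
kept_ids = [row[0] for row in conn.execute(query)]
print(kept_ids)
```

With the sample data this keeps IDs 1, 2, 3 and 5, whereas the cnt = 1 filter would drop both ID 2 and ID 4.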

Find 3 or more consecutive transaction record where the transaction amount greater than 100 and the records belong to the same category

I have a customer transaction table with three columns: Id, Category, and TranAmount. I want to find runs of 3 or more consecutive transaction records that belong to the same category and have a TranAmount greater than 100.
Below is the sample table:
Id  Category  TranAmount
1   A         190
2   A         160
3   A         169
4   B         190
5   A         90
6   B         219
7   B         492
8   B         129
9   B         390
10  B         40
11  A         110
12  A         130
And the output should be:
Id  Category  TranAmount
1   A         190
2   A         160
3   A         169
6   B         219
7   B         492
8   B         129
9   B         390

Look up the "gaps and islands" technique for a deeper understanding of this approach. Here's one of many references you could read: https://www.red-gate.com/simple-talk/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/
In this specific problem you have two conditions that cause a break in a consecutive series, those being a change in category or an amount that doesn't meet the threshold.
with data as (
    select *,
        row_number() over (order by Id) as rn,
        row_number() over (partition by
            Category, case when TranAmount >= 100 then 1 else 0 end order by Id) as cn
    from Transactions
), grp as (
    select *, count(*) over (partition by rn - cn) as num
    from data
    where TranAmount >= 100
)
select * from grp where num >= 3;
https://rextester.com/DUM44618
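
To see why rn - cn identifies a run, it helps to print the intermediate values. The sketch below runs the same logic on the sample data through Python's sqlite3 (SQLite 3.25 or later). Adding Category to the final partition is my own safeguard, not part of the answer: rn - cn values from different categories can coincide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions (Id INTEGER, Category TEXT, TranAmount INTEGER)"
)
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [(1, "A", 190), (2, "A", 160), (3, "A", 169), (4, "B", 190),
     (5, "A", 90), (6, "B", 219), (7, "B", 492), (8, "B", 129),
     (9, "B", 390), (10, "B", 40), (11, "A", 110), (12, "A", 130)],
)

query = """
WITH data AS (
    SELECT Id, Category, TranAmount,
           -- rn: position in the whole table
           row_number() OVER (ORDER BY Id) AS rn,
           -- cn: position among rows of the same category and threshold flag
           row_number() OVER (
               PARTITION BY Category,
                            CASE WHEN TranAmount >= 100 THEN 1 ELSE 0 END
               ORDER BY Id) AS cn
    FROM Transactions
), grp AS (
    -- rn - cn stays constant while a run is unbroken, so it names the island.
    SELECT Id, Category, rn - cn AS island,
           count(*) OVER (PARTITION BY Category, rn - cn) AS num
    FROM data
    WHERE TranAmount >= 100
)
SELECT Id, Category, island, num FROM grp ORDER BY Id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

# Ids that belong to an island of at least three rows.
island_ids = [r[0] for r in rows if r[3] >= 3]
```

Rows 1 to 3 share island value 0 and rows 6 to 9 share island value 4, so only those two groups reach the size threshold.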

This will work if there are no gaps between the ids:
select distinct t.*
from tablename t inner join (
    select t.id from tablename t
    where t.tranamount > 100
    and exists (
        select 1 from tablename
        where id = t.id - 1 and category = t.category and tranamount > 100
    )
    and exists (
        select 1 from tablename
        where id = t.id + 1 and category = t.category and tranamount > 100
    )
) tt on t.id in (tt.id - 1, tt.id, tt.id + 1)
See the demo.
Results:
Id | Category | TranAmount
-: | :------- | ---------:
1 | A | 190
2 | A | 160
3 | A | 169
6 | B | 219
7 | B | 492
8 | B | 129
9 | B | 390

I can't really test this out yet but give this a try.
SELECT Id, Category, Amount FROM Table
WHERE Amount > 100
  AND Category IN
      (SELECT Category FROM Table
       WHERE Amount > 100
       GROUP BY Category HAVING COUNT(Category) >= 3)

how to find the total

I have a table A and the output expected is below.
Table A
Id  patientId  PID
1   123        p1
1   123        p2
1   124        p3
1   124        p4
1   125        p5
2   126        p6
2   126        p7
2   126        p8
2   127        p9
2   127        p10
"Count of pid" is the number of PIDs present for each patientId, and "Total count of IDs" is the total number of Ids (for example, 5 for Id 1).
Expecting an output like this:
id  patientId  Count of pid  Total count of IDs
1   123        2             5
1   124        2             5
1   125        1             5
2   126        3             5
2   127        2             5
I am not sure how to go beyond this
select Id,patientId,count(PID)
from A
group by 1,2

Because you want to count over two different fields, you need two separate GROUP BY subqueries, which can be JOINed on Id:
SELECT A1."Id",
       A1."patientId",
       A1.num_pids,
       A2.total_ids
FROM (SELECT "Id", "patientId", COUNT(*) AS num_pids
      FROM A
      GROUP BY "Id", "patientId") A1
JOIN (SELECT "Id", COUNT(*) AS total_ids
      FROM A
      GROUP BY "Id") A2 ON A2."Id" = A1."Id"
ORDER BY "Id", "patientId"
Output:
Id patientId num_pids total_ids
1 123 2 5
1 124 2 5
1 125 1 5
2 126 3 5
2 127 2 5

select a.Id, a.patientId, count(a.patientId), a2.IdCount
from A a
left join (select Id, count(Id) as "IdCount"
           from A
           group by Id) a2
on a.Id = a2.Id
group by a.Id, a.patientId, a2.IdCount

I think you just want a window function:
select Id, patientId, count(*),
count(*) over ()
from A
group by 1, 2;
The second count(*) counts the number of rows in the result set, which appears to be what you want.
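
If the total is meant per Id instead (the question mentions 5 for Id 1), a window sum partitioned by Id is one way to get it. This is a sketch under that assumption, run through Python's sqlite3 (SQLite 3.25 or later); the aliases `num_pids` and `total_ids` are borrowed from the join-based answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (Id INTEGER, patientId INTEGER, PID TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [(1, 123, "p1"), (1, 123, "p2"), (1, 124, "p3"), (1, 124, "p4"),
     (1, 125, "p5"), (2, 126, "p6"), (2, 126, "p7"), (2, 126, "p8"),
     (2, 127, "p9"), (2, 127, "p10")],
)

# Inner query: PIDs per (Id, patientId).
# Outer query: add those counts up within each Id.
query = """
SELECT Id, patientId, num_pids,
       sum(num_pids) OVER (PARTITION BY Id) AS total_ids
FROM (SELECT Id, patientId, count(PID) AS num_pids
      FROM A
      GROUP BY Id, patientId)
ORDER BY Id, patientId
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

On this sample both readings give 5 on every row, because the result set has five rows and each Id also has five rows; they would diverge on other data.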

sql for Access Database

I am dealing with a huge volume of traffic data in an MS Access database. I want to identify the vehicles that have changed lanes, and keep only the two records around each lane change: the one immediately before it and the one immediately after it.
Traffic Data:
Vehicle_ID  Lane_ID  Frame_ID  Distance
1           2        12        100
1           2        13        103
1           2        14        105
2           1        15        107
2           1        16        130
2           2        17        135
2           2        18        136
3           1        19        140
3           2        20        141
3           2        21        147
4           2        22        149
4           2        23        151
4           1        24        154
4           1        25        159
(Frames 16-17, 19-20, and 23-24 are the lane-change pairs.)
With assistance from here, I have identified the Vehicle_IDs that have changed lanes:
SELECT t.Vehicle_ID, COUNT(t.Lane_ID) AS [Lane Count]
FROM (
    SELECT DISTINCT Vehicle_ID, Lane_ID FROM Table1
) AS t
GROUP BY t.Vehicle_ID
HAVING COUNT(t.Lane_ID) > 1
Shown Result:
Vehicle_ID Lane Count
2 2
3 2
4 2
Now I want to analyse the lane changes further by isolating the two records immediately before and after each lane change. My desired output would be:
Desired Result:
Vehicle_ID  Lane_ID  Frame_ID  Distance
2           1        16        130
2           2        17        135
3           1        19        140
3           2        20        141
4           2        23        151
4           1        24        154

Assuming the frame ids have no gaps, you can do this using joins:
select t1.*
from (table1 as t1 left join
      table1 as t1prev
      on t1prev.Vehicle_ID = t1.Vehicle_ID and
         t1prev.frame_id = t1.frame_id - 1
     ) left join
     table1 as t1next
     on t1next.Vehicle_ID = t1.Vehicle_ID and
        t1next.frame_id = t1.frame_id + 1
where t1prev.lane_id <> t1.lane_id or
      t1next.lane_id <> t1.lane_id;
Otherwise, this will be a very expensive query.

You can do it with EXISTS:
select t.* from Table1 t
where exists (
    select 1 from Table1
    where vehicle_id = t.vehicle_id
      and frame_id in (t.frame_id - 1, t.frame_id + 1)
      and lane_id <> t.lane_id
)
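
As a quick check of the EXISTS approach against the sample data, here is a small sketch using Python's sqlite3 rather than Access, so syntax details may differ from what Access accepts. Like the query above, it assumes frame ids are consecutive within a vehicle; the inner alias `n` is my own addition for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 "
    "(Vehicle_ID INTEGER, Lane_ID INTEGER, Frame_ID INTEGER, Distance INTEGER)"
)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [(1, 2, 12, 100), (1, 2, 13, 103), (1, 2, 14, 105),
     (2, 1, 15, 107), (2, 1, 16, 130), (2, 2, 17, 135), (2, 2, 18, 136),
     (3, 1, 19, 140), (3, 2, 20, 141), (3, 2, 21, 147),
     (4, 2, 22, 149), (4, 2, 23, 151), (4, 1, 24, 154), (4, 1, 25, 159)],
)

# Keep a row when the same vehicle has an adjacent frame in a different lane.
query = """
SELECT t.Vehicle_ID, t.Lane_ID, t.Frame_ID, t.Distance
FROM Table1 t
WHERE EXISTS (
    SELECT 1 FROM Table1 n
    WHERE n.Vehicle_ID = t.Vehicle_ID
      AND n.Frame_ID IN (t.Frame_ID - 1, t.Frame_ID + 1)
      AND n.Lane_ID <> t.Lane_ID
)
ORDER BY t.Frame_ID
"""
lane_changes = conn.execute(query).fetchall()
for row in lane_changes:
    print(row)
```

The six rows returned match the desired result in the question, including frame 19, which has no earlier frame for vehicle 3.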

Simple data, Complex query on SQL Server

I need to write a query against a SQL Server table, but I don't know exactly how.
Consider this table (the real table is much more complex, and Ord1 and Ord2 are dates that could be null, but I have simplified it to this case):
Data of MyTable
ID MaqID Ord1 Ord2
------------------------
1 144 4 3
2 144 2 1
3 12 2 3
4 144 3 5
5 12 3 1
6 144 4 2
7 12 2 4
8 144 2 3
9 12 1 5
10 12 3 2
I need the records for a specific MaqID in a specific order. I get that with this query:
SELECT * FROM myTable WHERE MaqID=144 ORDER BY MaqID, Ord1 DESC, Ord2
Which gives me:
ID MaqID Ord1 Ord2
------------------------
6 144 4 2
1 144 4 3
4 144 3 5
2 144 2 1
8 144 2 3
Now I need a single query that, for each MaqID, returns the first ID under the ordering above. The result should be:
Expected result
MaqID ID
-----------
144 6
12 5
I have already tried different combinations of TOP and MAX, but TOP returns only one row and I need one for each MaqID, and for MAX I have no field to maximize.
To summarize: I need the first ID for each MaqID from a subquery in a specific order.
Any ideas? Thanks!

You can do this using row_number():
select t.*
from (select t.*,
             row_number() over (partition by MaqID order by Ord1 desc, Ord2) as seqnum
      from mytable t
     ) t
where seqnum = 1;
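
A small sketch of the row_number() approach on the sample data, run through Python's sqlite3 (SQLite 3.25 or later) rather than SQL Server. It uses the column names from the sample table (MaqID, Ord1, Ord2) and does not cover the NULL dates mentioned for the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (ID INTEGER, MaqID INTEGER, Ord1 INTEGER, Ord2 INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [(1, 144, 4, 3), (2, 144, 2, 1), (3, 12, 2, 3), (4, 144, 3, 5),
     (5, 12, 3, 1), (6, 144, 4, 2), (7, 12, 2, 4), (8, 144, 2, 3),
     (9, 12, 1, 5), (10, 12, 3, 2)],
)

# Rank rows inside each MaqID by Ord1 descending, then Ord2 ascending,
# and keep only the top-ranked row per MaqID.
query = """
SELECT MaqID, ID
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY MaqID
                                ORDER BY Ord1 DESC, Ord2) AS seqnum
      FROM myTable t)
WHERE seqnum = 1
"""
first_ids = dict(conn.execute(query).fetchall())
print(first_ids)
```

This yields ID 6 for MaqID 144 and ID 5 for MaqID 12, as in the expected result. If two rows tie on both Ord1 and Ord2, row_number() picks one of them arbitrarily, so adding ID as a final sort key would make the choice deterministic.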
