I'm trying to write a query that will return the closest match from a table. The table represents a hierarchy, and looks like this:
Hier_Code Group1 Group2 Group3
_________ ______ ______ ______
A 1 1 5
AA 1 2 5
AAA 1 2 5
AAB 1 3 5
AB 2 3 5
ABA 2 3 5
ABB 2 3 5
The values I'm searching for may not match perfectly, in which case I'd want them to 'roll-up' to the next level to find the match. For example:
Value Returns:
A 1 1 5 (Perfect match)
AAC 1 2 5 (AA is closest)
AABB 1 3 5 (AAB is closest)
The idea is that you would strip off one character at a time from the search term until you find a match. I've tried doing this with a recursive CTE, and also with a WHILE loop to no avail.
Any suggestions?
Thanks in advance
Try this:
SELECT top 1 Group1, Group2, Group3
FROM temp
WHERE 'AABB' like Hier_Code + '%'
GROUP BY Group1, Group2, Group3
ORDER BY MAX(len(Hier_Code)) desc
Here's a starter that you could use to create some dynamic SQL:
select id from t
where g1 = 1 and g2 = 1 and g3 = 5
union
select id from t
where g1 = 1 and g2 = 1
union
select id from t
where g1 = 1
The default behavior of UNION is to eliminate duplicates, so that after these three selects are executed, no id would occur more than once in the result set.
The problem is that there are several permutations:
{g1, g2, g3}, {g1, g3}, {g1, g2}, {g2, g3}
so you cannot simply pluck the final condition off as I did in my example.
Related
Given two tables or dataframes. One will be having datasets and corresponding tables. Other will be having source and target.
I need a solution for the below condition:
Once we find ft.dataset = st.source, we need to replace ft.table in st.source and neglect the remaining records.
For example: Here in first block of second table which is from seq_no 1 to 6, we have a match at Abc, so we replaced with db.table1 and neglect the remaining records in that block. Similarly we need to do the same for each and every block of second table.
Note that Target is same in all the rows of second table.
Please help me with a possible solution in pyspark or Hive.
First table(ft):
Dataset | Table
_________________
Abc db.table1
Xyz db.table2
Def db.table3
Second table(st):
Target| seq_no| source
______________________
A 1 A
A 2 B1
A 3 C1
A 4 D1
A 5 Abc
A 6 Xyz
A 1 A
A 2 B1
A 3 C1
A 4 D1
A 5 Def
A 6 Abc
A 7 Xyz
Expected output:
Target| seq_no | source
_______________________
A 1 A
A 2 B1
A 3 C1
A 4 D1
A 5 db.table1
A 1 A
A 2 B1
A 3 C1
A 4 D1
A 5 db.table3
In Hive, you can use a left join to search for a match in the first table, and a window min() to identify the sequence of the first match
select target, seq_no, source
from (
select
st.target,
st.seq_no,
coalesce(st.source, ft.table) as source,
min(case when ft.dataset is not null then st.seq_no end) over(partition by st.target) first_matched_seq_no
from st
left join ft on ft.dataset = st.source
) t
where first_matched_seq_no is null or seq_no <= first_matched_seq_no
order by target, seq_no
I have a table A and the output expected is below.
Table A
Id patientId PID
1 123 p1
1 123 p2
1 124 p3
1 124 p4
1 125 p5
2 126 p6
2 126 p7
2 126 p8
2 127 p9
2 127 p10
Count of pid is the count for every patientId how many pids are present and Total count of IDs is the total number of Ids(lets say 5 for Id 1 for example)
Expecting an output like this:
id patientId Count of pid Total count of IDs
1 123 2 5
1 124 2 5
1 125 1 5
2 126 3 5
2 127 2 5
I am not sure how to go beyond this
select Id,patientId,count(PID)
from A
group by 1,2
Because you want to count over two different fields, you need two separate GROUP BY subqueries, which can be JOINed on Id:
SELECT A1."Id",
A1."patientId",
A1.num_pids,
A2.total_ids
FROM (SELECT "Id", "patientId", COUNT(*) AS num_pids
FROM A
GROUP BY "Id", "patientId") A1
JOIN (SELECT "Id", COUNT(*) AS total_ids
FROM A
GROUP BY "Id") A2 ON A2."Id" = A1."Id"
ORDER BY "Id", "patientId"
Output:
Id patientId num_pids total_ids
1 123 2 5
1 124 2 5
1 125 1 5
2 126 3 5
2 127 2 5
select a.Id,a.patientId,count(a.patientId), a2.IdCount
from A a
left join (select Id, count(Id) as "IdCount"
from A
group by Id) a2
on a.Id = a2.Id
group by a.Id,a.patientId, a2.IdCount
I think you just want a window function:
select Id, patientId, count(*),
count(*) over ()
from A
group by 1, 2;
The second count(*) counts the number of rows in the result set, which appears to be what you want.
I have a specific scenario to group data in a result set based on a specific format. Below is how my data looks like.
--------------------------------
ID Value
--------------------------------
1 2
2 1
3 1
4 3
5 1
6 1
7 6
8 9
9 1
10 1
I need to group the result set value based on 'Value' column. Data to be grouped from the first instance of non '1' till the last instance of '1'. Individual non '1's need have its own group value. My expected result should be something like this.
------------------------------------
ID Value Group
------------------------------------
1 2 Group1
2 1 Group1
3 1 Group1
4 3 Group2
5 1 Group2
6 1 Group2
7 6 Group3
8 9 Group4
9 1 Group4
10 1 Group4
Groups start with a non-1 value. You can define them by using a cumulative sum:
select t.*,
sum(case when value <> 1 then 1 else 0 end) over (order by id) as grp
from t;
I have Following 3 tables :
SHIFT_MASTER,PATTERN_MASTER,PATTERN_DETAILS
S_ID ,P_ID,P_D_ID are the priamry keys of SHIFT_MASTER,PATTERN_MASTER,PATTERN_DETAILS tables respectively.
SHIFT_MASTER
S_ID | S_NUMBER| S_Name
---------------------------------
1 A MORNING
2 B AFTERNOON
3 C NIGHT
PATTERN_MASTER
P_ID | P_NAME
----------------
1 Pattern 1
2 Pattern 2
PATTERN_DETAILS
P_D_ID|P_ID | S_ID| ...
---------------------
1 1 1
2 1 2
3 1 3
4 1 2
5 1 1
6 2 3
7 2 2
8 2 1
9 2 3
I GOT OUTPUT AS
P_ID | S_ID
1 1,2,3,2,1
2 3,2,1,3
USING QUERY
SELECT PATTERN_DETAILS.P_ID "PATTERN",
LISTAGG(PATTERN_DETAILS.S_ID, ', ')
WITHIN GROUP (ORDER BY PATTERN_DETAILS.P_D_ID) "SHIFT"
FROM PATTERN_DETAILS
GROUP BY PATTERN_DETAILS.P_ID;
WHAT I WANT IS
P_NAME | S_NUMBER
Pattern 1 A,B,C,B,A
Pattern 2 C,B,A,C
Any suggestion ??? Instead of P_ID i want to show pattern name and instead of shift id i want to show shift number .How to perform join operation along with listagg function ?
You need to join all three tables to get this,
SELECT pm.p_name "P_NAME",
listagg(sm.s_number, ', ') WITHIN GROUP (ORDER BY pd.p_d_id) "S_NUMBER"
FROM pattern_master pm,
pattern_details pd,
shift_master sm
WHERE sm.s_id= pd.s_id
AND pm.p_id = pd.p_id
GROUP BY pm.p_name;
We have a table like this:
id mid mult tr result
----------------------------
1 1 1,35 0 1
2 1 5,85 0 2
3 1 4 1 X
50 2 1,3 1 1
51 2 7 0 2
52 2 4 0 X
99 3 2,45 0 1
100 3 2,2 0 2
101 3 3,3 1 X
105 4 2,3 0 1
106 4 2,4 0 2
107 4 3,2 1 X
111 5 3 1 1
112 5 1,9 0 2
113 5 3,25 0 X
What we need is a query that will bring us a table which will count how many times before, for a specific mid(match), have the same mults(multipliers) for result = 1 and result = x have occurred and group them so that we can count them. Something like
select mult(where result = 1), mult(where result = x), count(1)
The result will be like this
mult_1 mult_x count
------------------------------
1,35 4 33
1,3 4 112
The above states that we have seen 33 matches where the mult for result = 1 was 1,35 AND mult for result = x was 4. Also, we found 112 matches where multi for result = 1 was 1,3 AND for result = x mult was 4 (results are not dependant on the first table).
I find it rather complex myself, but hopefully I made sense. I use SQL Server 2008, so any method is more than welcome.
Here is my (final) solution in action: http://www.sqlfiddle.com/#!3/3a516/7
There are a number of assumptions I'm making here.
I'm assuming that every mid value is going to have at most 1 result value of '1' and 1 result of '2'.
I'm assuming that if you want to specify a mid and have the query run off of that.
This assumption turned out to be wrong.
I'm assuming that you want to include the data for the mid you specify.
This assumption turned out to be wrong, it was every match.
I'm assuming that you want to include every mid (or match) before the mid you specify.
This assumption turned out to be wrong, it was every match.
I'm assuming that for any mult_1 and mult_x combination you find in this way, you want the count of how many times that combination occurs, even if it occurs after the match(mid) that you specified.
This assumption turned out to be wrong, it was just for every match.
If all of these assumptions sound right, then this might create the result you're looking for:
DECLARE #Mid INT = 2
;WITH MatchResults AS
(
SELECT
Mid
, [1] AS MultWithResult1
, [X] AS MultWithResultX
FROM
(
SELECT
Mid
, mult
, result
FROM Matches
WHERE result IN ('1', 'X')
) Base
PIVOT
(
MAX(mult)
FOR result
IN
(
[1]
, [X]
)
) Pivoted
)
SELECT
mult.MultWithResult1 AS mult_1
, mult.MultWithResultX AS mult_x
, COUNT(*) AS [count]
FROM MatchResults mult
GROUP BY mult.MultWithResult1
, mult.MultWithResultX
EDIT: I have edited it based on the response to my answer to what I think he means.
Use this query:
select mult_1, mult_x, count() as count
from Matches M
inner join (select distinct M1.mult as mult_1, MX.mult as mult_x from
matches as M1, matches as MX
where M1.result=1 and MX.result=x) M1x
on (M.mult=M1x.mult_1 and M.result=1) or (M.mult=M1x.mult_x and result=x)
group by mult_1, mult_x
EDIT:
I'm supposing you wnat to get the count of all possible combinations of multipliers of matches with result=1 and matches with result = x.
If this is the case, M1x gives all this possible combinations. And you join all the possible matchea with have any of those combinations, and count them, grouping by the possible set of combinations defined by M1x.