SELECTing Related Rows Based on a Single Row Match - sql

I have the following table running on PostgreSQL 9.5:
+---+------------+-------------+
|ID | trans_id | message |
+---+------------+-------------+
| 1 | 1234567 | abc123-ef |
| 2 | 1234567 | def234-gh |
| 3 | 1234567 | ghi567-ij |
| 4 | 8902345 | ced123-ef |
| 5 | 8902345 | def234-bz |
| 6 | 8902345 | ghi567-ij |
| 7 | 6789012 | abc123-ab |
| 8 | 6789012 | def234-cd |
| 9 | 6789012 | ghi567-ef |
|10 | 4567890 | abc123-ab |
|11 | 4567890 | gex890-aj |
|12 | 4567890 | ghi567-ef |
+---+------------+-------------+
I am looking for the rows for each trans_id based on a LIKE query, like this:
SELECT * FROM table
WHERE message LIKE '%def234%'
This, of course, returns just three rows, the three that match my pattern in the message column. What I am looking for, instead, is all the rows matching that trans_id in groups of messages that match. That is, if a single row matches the pattern, get all the rows with the trans_id of that matching row.
That is, the results would be:
+---+------------+-------------+
|ID | trans_id | message |
+---+------------+-------------+
| 1 | 1234567 | abc123-ef |
| 2 | 1234567 | def234-gh |
| 3 | 1234567 | ghi567-ij |
| 4 | 8902345 | ced123-ef |
| 5 | 8902345 | def234-bz |
| 6 | 8902345 | ghi567-ij |
| 7 | 6789012 | abc123-ab |
| 8 | 6789012 | def234-cd |
| 9 | 6789012 | ghi567-ef |
+---+------------+-------------+
Notice rows 10, 11, and 12 were not SELECTed because none of them matched the %def234% pattern.
I have tried (and failed) to write a sub-query to get all the related rows when a single message matches a pattern:
SELECT sub.*
FROM (
SELECT DISTINCT trans_id FROM table WHERE message LIKE '%def234%'
) sub
WHERE table.trans_id = sub.trans_id
I could easily do this with two queries, but the list of matching trans_ids returned by the first query, which I would then have to include in a WHERE trans_id IN (<huge list of trans_ids>) clause, would be very large. That would be a very inefficient way of doing this, and I believe there is a way to do it with a single query.
Thank you!

I think this will do the job:
WITH sub AS (
-- DISTINCT avoids duplicated output rows when several messages of one trans_id match
SELECT DISTINCT trans_id
FROM table
WHERE message LIKE '%def234%'
)
SELECT *
FROM table JOIN sub USING (trans_id);
Hope this helps.

Try this:
SELECT ID, trans_id, message
FROM (
SELECT ID, trans_id, message,
COUNT(*) FILTER (WHERE message LIKE '%def234%')
OVER (PARTITION BY trans_id) AS pattern_cnt
FROM mytable) AS t
WHERE pattern_cnt >= 1
Using a FILTER clause in the windowed version of COUNT function we can get the number of records matching the predefined pattern within each trans_id slice. The outer query uses this count to filter out irrelevant slices.

You can do this.
WITH trans
AS
(SELECT DISTINCT trans_id
FROM t1
WHERE message LIKE '%def234%')
SELECT t1.*
FROM t1,
trans
WHERE t1.trans_id = trans.trans_id;
I think this will perform better. If you have enough data, you can run EXPLAIN on both the subquery and the CTE versions and compare the plans.
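For comparison, the same "one match pulls in the whole trans_id group" idea can also be written as a plain IN subquery, which avoids building the list of trans_ids on the client. Below is a minimal sketch that runs the sample data through Python's built-in sqlite3 module. SQLite only stands in for Postgres here, and the table name messages is made up for the example (table itself is a reserved word).

```python
import sqlite3

# In-memory SQLite stands in for the Postgres table; "messages" is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, trans_id INTEGER, message TEXT)")
sample_rows = [
    (1, 1234567, "abc123-ef"), (2, 1234567, "def234-gh"), (3, 1234567, "ghi567-ij"),
    (4, 8902345, "ced123-ef"), (5, 8902345, "def234-bz"), (6, 8902345, "ghi567-ij"),
    (7, 6789012, "abc123-ab"), (8, 6789012, "def234-cd"), (9, 6789012, "ghi567-ef"),
    (10, 4567890, "abc123-ab"), (11, 4567890, "gex890-aj"), (12, 4567890, "ghi567-ef"),
]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", sample_rows)

# Keep every row whose trans_id appears on at least one row matching the pattern.
query = """
    SELECT id, trans_id, message
    FROM messages
    WHERE trans_id IN (
        SELECT trans_id FROM messages WHERE message LIKE '%def234%'
    )
    ORDER BY id
"""
related = conn.execute(query).fetchall()
for row in related:
    print(row)
```

Because IN only tests membership, a trans_id with several matching messages still returns each of its rows once, so no DISTINCT is needed in the subquery.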

Related

Can't figure out sql join

I'm using Nextcloud to track data via the Forms app. The table oc_forms_v2_submissions contains the entries:
SELECT * FROM `oc_forms_v2_submissions` WHERE `form_id` = 3;
+----+---------+--------------------------------------------+------------+
| id | form_id | user_id | timestamp |
+----+---------+--------------------------------------------+------------+
| 8 | 3 | anon-user-96684f301d22e7be44f07780a9bffe06 | 1663789158 |
| 9 | 3 | anon-user-a1eaa4f939b59e00b403c046410788aa | 1663835954 |
| 10 | 3 | anon-user-440d0dbe9c107492b6ec1a06d98004a8 | 1663942458 |
+----+---------+--------------------------------------------+------------+
the second table is oc_forms_v2_answers
SELECT * FROM `oc_forms_v2_answers`;
+----+---------------+-------------+-----------------------+
| id | submission_id | question_id | text |
+----+---------------+-------------+-----------------------+
| 10 | 8 | 7 | foo |
| 11 | 9 | 7 | bar |
| 12 | 10 | 7 | foo |
+----+---------------+-------------+-----------------------+
So basically I need to take all the id entries from the submissions table, match them with submission_id from the answers table, and get the data from the text column.
SELECT oc_forms_v2_submissions.id as submission_id
FROM `oc_forms_v2_submissions`
RIGHT JOIN `oc_forms_v2_answers` ON submission_id=oc_forms_v2_answers.submission_id;
This is all I could come up with so far, but it returns only the submission_id field, with everything in triplicate :-D
+---------------+
| submission_id |
+---------------+
| 8 |
| 8 |
| 8 |
| 9 |
| 9 |
| 9 |
| 10 |
| 10 |
| 10 |
+---------------+
Edit:
The updated query still does not get me the text field from oc_forms_v2_answers:
SELECT oc_forms_v2_submissions.id as submission_id
FROM `oc_forms_v2_submissions`
RIGHT JOIN `oc_forms_v2_answers` ON oc_forms_v2_submissions.id=oc_forms_v2_answers.submission_id where form_id="3";
That is because your ON clause compares a column with itself: the unqualified submission_id resolves to oc_forms_v2_answers.submission_id, so the condition is always true and every submission is paired with every answer. The ON clause has to link the key columns of both tables.
You can also use table aliases to reduce typing.
A RIGHT JOIN returns every answer together with its submission. Since an answer never exists without a submission, a LEFT JOIN is more useful here: it gives you all submissions, even those without an answer. To get the answer data, the text column also has to be in the SELECT list:
SELECT s.id AS submission_id, a.`text`
FROM `oc_forms_v2_submissions` AS s
LEFT JOIN `oc_forms_v2_answers` AS a ON s.id = a.submission_id
WHERE s.form_id = 3;
This should do the trick (just substitute your actual table and column names):
SELECT s.id as submission_id, a.text FROM submissions s
LEFT JOIN answers a
ON s.id=a.submission_id;
You can check this here in db-fiddle. I've used your info to create the DB, so the WHERE clause is missing, but the rest should give you the results you're after.
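To see why the first attempt tripled every row, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The shortened table names submissions and answers are assumptions for the example, and the always-true condition is spelled out explicitly as a.submission_id = a.submission_id.

```python
import sqlite3

# SQLite stands in for MySQL; table names are shortened for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submissions (id INTEGER, form_id INTEGER);
    CREATE TABLE answers (id INTEGER, submission_id INTEGER, question_id INTEGER, text TEXT);
    INSERT INTO submissions VALUES (8, 3), (9, 3), (10, 3);
    INSERT INTO answers VALUES (10, 8, 7, 'foo'), (11, 9, 7, 'bar'), (12, 10, 7, 'foo');
""")

# The column is compared with itself, so every submission pairs with every answer.
broken = conn.execute("""
    SELECT s.id
    FROM submissions s
    LEFT JOIN answers a ON a.submission_id = a.submission_id
    ORDER BY s.id
""").fetchall()

# Linking the key columns of both tables gives one row per submission, with its text.
fixed = conn.execute("""
    SELECT s.id AS submission_id, a.text
    FROM submissions s
    LEFT JOIN answers a ON s.id = a.submission_id
    WHERE s.form_id = 3
    ORDER BY s.id
""").fetchall()

print(len(broken), "rows from the self-comparison")
print(fixed)
```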

How to count the unique rows after aggregating to array

Trying to solve the problem in a read-only manner.
My table (answers) looks like the one below:
| user_id | value |
+----------------+-------------+
| 6 | pizza |
| 6 | tosti |
| 9 | fries |
| 9 | tosti |
| 10 | pizza |
| 10 | tosti |
| 12 | pizza |
| 12 | tosti |
| 13 | sushi | -> did not finish the quiz.
NOTE: the actual table has 15+ different possible values. (Answers to questions).
I've been able to create the table below:
| value arr | count | user_id |
+----------------+--------------+-----------+
| pizza, tosti | 2 | 6 |
| fries, tosti | 2 | 9 |
| pizza, tosti | 2 | 10 |*
| pizza, tosti | 2 | 12 |*
| sushi | 1 | 13 |
I'm not sure if the * rows show up in my current query (DB has 30k rows and 15+ value options). The problem here is that "count" is counting the number of answers and not the number of unique outcomes.
Current query looks a bit like:
select string_agg(DISTINCT value, ',' order by value) AS value, user_id,
COUNT(DISTINCT value)
FROM answers
GROUP BY user_id;
Looking for the unique answer combinations like the table shown below:
| value arr | count unique |
+----------------+--------------+
| pizza, tosti | 3 |
| fries, tosti | 1 |
| sushi | 1 | --> Hidden in perfect situation.
Tried a bunch of queries, both hand-written and generated by tools. From super simplified to quite complex, I keep ending up with the number of answers being counted instead of the unique combinations across users.
If this is a duplicate question, please re-direct me to it. Learned a lot these last few days, but haven't been able to find the answer yet.
Any help would be highly appreciated.
Here's what you need. You're almost there.
select t1.value, count(1) From (
select string_agg(DISTINCT value, ',' order by value) AS value, user_id
FROM answers
GROUP BY user_id) t1
group by t1.value;
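You can try this (for SQL Server 2017 or later, where STRING_AGG is available). The values are aggregated per user first, because you cannot GROUP BY an aggregate directly:
select combo as value_arr, count(*) as count_unique
from (
select user_id, string_agg(value, ',') within group (order by value) as combo
from answers
group by user_id
) t
group by combo;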
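Both answers rely on the same two-step idea: first build one combination per user, then count how many users share each combination. The sketch below models that logic in plain Python on the sample data from the question; it is only an illustration of the grouping, not a replacement for the SQL.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (user_id, value).
answers = [
    (6, "pizza"), (6, "tosti"),
    (9, "fries"), (9, "tosti"),
    (10, "pizza"), (10, "tosti"),
    (12, "pizza"), (12, "tosti"),
    (13, "sushi"),
]

# Step 1: collect the distinct values per user (the inner GROUP BY user_id).
per_user = defaultdict(set)
for user_id, value in answers:
    per_user[user_id].add(value)

# Step 2: count identical, sorted combinations across users (the outer GROUP BY).
combo_counts = Counter(",".join(sorted(values)) for values in per_user.values())

for combo, count in combo_counts.items():
    print(combo, count)
```

Sorting the values before joining them matters: without a fixed order, "pizza,tosti" and "tosti,pizza" would be counted as different combinations, which is why the SQL versions order the values inside the aggregate.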

SQL GROUPING with conditional

I am sure this is easy to accomplish but after spending the whole day trying I had to give up and ask for your help.
I have a table that looks like this
| PatientID | VisitId | DateOfVisit | FollowUp(Y/N) | FollowUpWks |
----------------------------------------------------------------------
| 123456789 | 2222222 | 20180802 | Y | 2 |
| 123456789 | 3333333 | 20180902 | Y | 4 |
| 234453656 | 4443232 | 20180506 | N | NULL |
| 455344243 | 2446364 | 20180618 | Y | 12 |
----------------------------------------------------------------------
Basically I have a list of PatientIDs, each patient can have multiple visits (VisitID and DateOfVisit). FollowUp(Y/N) specifies whether the patients has to be seen again and in how many weeks (FollowUpWks).
Now, what I need is a query that extracts PatientID, DateOfVisit (the most recent one and only if FollowUp is YES) and the FollowUpWks field.
Final result should look like this
| PatientID | VisitId | DateOfVisit | FollowUp(Y/N) | FollowUpWks |
----------------------------------------------------------------------
| 123456789 | 3333333 | 20180902 | Y | 4 |
| 455344243 | 2446364 | 20180618 | Y | 12 |
----------------------------------------------------------------------
The closest I could get was with this code
SELECT PatientID,
Max(DateOfVisit) AS LastVisit
FROM mytable
WHERE FollowUp = True
GROUP BY PatientID;
The problem is that when I try adding the FollowUpWks field to the SELECT I get the following error: "The query does not include the specified expression as part of an aggregate function." However, if I add FollowUpWks to the GROUP BY clause, then I get all visits, not just the most recent ones.
You need to match back to the most recent visit. One method uses a correlated subquery:
SELECT t.*
FROM mytable as t
WHERE t.FollowUp = True AND
t.DateOfVisit = (SELECT MAX(t2.DateOfVisit)
FROM mytable as t2
WHERE t2.PatientID = t.PatientID
);
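Here is a minimal sketch of the correlated subquery above, run on the sample data with Python's built-in sqlite3 module. The table name visits and the storage of FollowUp as 'Y'/'N' text are assumptions for the example; the original table appears to use a boolean column.

```python
import sqlite3

# SQLite stands in for the original database; "visits" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (
        PatientID INTEGER, VisitId INTEGER, DateOfVisit TEXT,
        FollowUp TEXT, FollowUpWks INTEGER
    );
    INSERT INTO visits VALUES
        (123456789, 2222222, '20180802', 'Y', 2),
        (123456789, 3333333, '20180902', 'Y', 4),
        (234453656, 4443232, '20180506', 'N', NULL),
        (455344243, 2446364, '20180618', 'Y', 12);
""")

# For each row, the subquery finds that patient's latest visit date;
# the row is kept only if it is that latest visit and needs a follow-up.
latest_followups = conn.execute("""
    SELECT t.PatientID, t.VisitId, t.DateOfVisit, t.FollowUpWks
    FROM visits AS t
    WHERE t.FollowUp = 'Y'
      AND t.DateOfVisit = (SELECT MAX(t2.DateOfVisit)
                           FROM visits AS t2
                           WHERE t2.PatientID = t.PatientID)
    ORDER BY t.PatientID
""").fetchall()

for row in latest_followups:
    print(row)
```

Note that the subquery looks at all of a patient's visits, so a patient whose most recent visit has no follow-up is dropped entirely, even if an earlier visit required one.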

In Hive, what is the difference between explode() and lateral view explode()

Assume there is a table employee:
+-----------+------------------+
| col_name | data_type |
+-----------+------------------+
| id | string |
| perf | map<string,int> |
+-----------+------------------+
and the data inside this table:
+-----+------------------------------------+--+
| id | perf |
+-----+------------------------------------+--+
| 1 | {"job":80,"person":70,"team":60} |
| 2 | {"job":60,"team":80} |
| 3 | {"job":90,"person":100,"team":70} |
+-----+------------------------------------+--+
I tried the following two queries, but they both return the same result:
1. select explode(perf) from employee;
2. select key,value from employee lateral view explode(perf) as key,value;
The result:
+---------+--------+--+
| key | value |
+---------+--------+--+
| job | 80 |
| team | 60 |
| person | 70 |
| job | 60 |
| team | 80 |
| job | 90 |
| team | 70 |
| person | 100 |
+---------+--------+--+
So, what is the difference between them? I did not find suitable examples. Any help is appreciated.
For your particular case both queries are OK. But you can't use multiple explode() functions without lateral view. So, the query below will fail:
select explode(array(1,2)), explode(array(3, 4))
You'll need to write something like:
select
a_exp.a,
b_exp.b
from (select array(1, 2) as a, array(3, 4) as b) t
lateral view explode(t.a) a_exp as a
lateral view explode(t.b) b_exp as b
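To make the behaviour concrete, the sketch below models a lateral view in plain Python: each input row is paired with every element produced by explode(), and a second lateral view is applied to the rows produced by the first. The helper lateral_view_explode is made up for this illustration and is not part of Hive.

```python
def lateral_view_explode(rows, column, alias):
    """Pair each input row with every element of row[column] (one LATERAL VIEW)."""
    out = []
    for row in rows:
        value = row[column]
        # A map explodes into (key, value) pairs, an array into its elements.
        items = list(value.items()) if isinstance(value, dict) else list(value)
        for item in items:
            out.append({**row, alias: item})
    return out

# Two lateral views: the second one runs on the output of the first.
base = [{"a": [1, 2], "b": [3, 4]}]
exploded = lateral_view_explode(lateral_view_explode(base, "a", "a_exp"), "b", "b_exp")
pairs = [(r["a_exp"], r["b_exp"]) for r in exploded]
print(pairs)

# Because each generated row stays joined to its source row,
# other columns such as id remain available next to the exploded values.
employee = [
    {"id": "1", "perf": {"job": 80, "person": 70, "team": 60}},
    {"id": "2", "perf": {"job": 60, "team": 80}},
]
with_id = [(r["id"],) + r["kv"] for r in lateral_view_explode(employee, "perf", "kv")]
print(with_id)
```

The second part shows the other practical difference: with lateral view you can select id together with key and value, whereas a bare explode() in the SELECT list does not allow other expressions alongside it.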

Need Oracle Query to Retrieve info from tables with 1 to Many Relationship

I have tables with information similar to the following:
Table A is a list of circuits:
Circuit | CktType | CktSize
--------------------------------
CKT1 | ABC123 | 10
CKT2 | ABC123 | 12
CKT3 | XYZ789 | 10
Table B is a list of Raceway:
Raceway | RwyType | RwySize
--------------------------------
RWY1 | C | 4
RWY2 | T | 4x6
RWY3 | T | 8x12
Table C is a list of how the circuits go through the Raceway:
Circuit | Sequence | Raceway
--------------------------------
CKT1 | 1 | RWY1
CKT1 | 2 | RWY2
CKT1 | 3 | RWY3
CKT2 | 1 | RWY2
Table C may or may not have entries for all items in tables A and B. There is not a set number or a maximum number of entries in table C for each item in tables A and B.
I would like to write 2 queries in Oracle to retrieve the following data (clearly the queries would be very similar so only really looking for help writing one of them).
All Circuit information with the raceways the circuit goes through
Results Desired:
Circuit | CktType | CktSize | Raceway
----------------------------------------------
CKT1 | ABC123 | 10 | RWY1, RWY2, RWY3
CKT2 | ABC123 | 12 | RWY2
CKT3 | XYZ789 | 10 | (null)
All Raceway information with the circuits in the raceway:
Results Desired:
Raceway | RwyType | RwySize | Circuit
----------------------------------------------
RWY1 | C | 4 | CKT1
RWY2 | T | 4x6 | CKT1, CKT2
RWY3 | T | 8x12 | CKT1
Thanks in advance.
This would be one of your two queries. It returns each circuit's details together with its raceways, in sequence order and separated by commas:
SELECT Circuit,
       CktType,
       CktSize,
       RTRIM (
          XMLAGG (XMLELEMENT (e, Raceway || ', ') ORDER BY Sequence).EXTRACT ('//text()'),
          ', ')
          Raceways
  FROM (SELECT t_A.Circuit,
               t_A.CktType,
               t_A.CktSize,
               t_C.Raceway,
               t_C.Sequence
          FROM tableA t_A
               LEFT OUTER JOIN tableC t_C
                  ON t_A.Circuit = t_C.Circuit)
-- every non-aggregated column in the SELECT list must also appear in GROUP BY
GROUP BY Circuit, CktType, CktSize;
EDIT: After re-reading your post, I realized this would not work for you. Try the "For XML PATH".
Here is a great example: sql-query-concatenating-results-into-one-string
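The desired result is a left join from the circuits to the routing table, followed by an ordered string aggregation per circuit. The plain Python sketch below models that logic on the sample data; the function name circuits_with_raceways is made up for the illustration. In Oracle 11g Release 2 and later, the same aggregation can usually be expressed with LISTAGG(...) WITHIN GROUP (ORDER BY ...).

```python
# Sample data from the question: Table A (circuits) and Table C (routing).
circuits = [("CKT1", "ABC123", 10), ("CKT2", "ABC123", 12), ("CKT3", "XYZ789", 10)]
routing = [("CKT1", 1, "RWY1"), ("CKT1", 2, "RWY2"), ("CKT1", 3, "RWY3"), ("CKT2", 1, "RWY2")]

def circuits_with_raceways(circuits, routing):
    """Left-join circuits to routing and join the raceways in sequence order."""
    by_circuit = {}
    # Sorting by (circuit, sequence) plays the role of ORDER BY Sequence in the aggregate.
    for circuit, _sequence, raceway in sorted(routing, key=lambda r: (r[0], r[1])):
        by_circuit.setdefault(circuit, []).append(raceway)
    return [
        # Circuits without routing rows keep a NULL-like None, as in a left join.
        (name, ckt_type, size, ", ".join(by_circuit[name]) if name in by_circuit else None)
        for name, ckt_type, size in circuits
    ]

result = circuits_with_raceways(circuits, routing)
for row in result:
    print(row)
```

The raceway-centric report is the mirror image: group the routing rows by raceway instead of by circuit and join the circuit names.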