Teradata SQL code to join against a string - sql

I have a table A that has the below values
+----+----------+-----------------------+
| ID | Date | Name |
+----+----------+-----------------------+
| 1 | 1/4/2019 | Kara,Sara,John |
| 2 | 3/2/2018 | Sara |
| 3 | 4/3/2019 | Lynn,John,Chris,Agnes |
| 4 | 2/1/2020 | Phillip, Anton |
| 5 | 5/1/2020 | Quinn |
| 6 | 7/6/2020 | Idie,John |
+----+----------+-----------------------+
And a table B that has the below values
+-------+
| Name |
+-------+
| John |
| Sara |
| Chris |
+-------+
I would like the output to be as below:
+----+----------+-----------------------+--------+-----------------+
| ID | Date | Name | B.Name | Exists in List? |
+----+----------+-----------------------+--------+-----------------+
| 1 | 1/4/2019 | Kara,Sara,John | Sara | Yes |
| 1 | 1/4/2019 | Kara,Sara,John | John | Yes |
| 2 | 3/2/2018 | Sara | Sara | Yes |
| 3 | 4/3/2019 | Lynn,John,Chris,Agnes | John | Yes |
| 3 | 4/3/2019 | Lynn,John,Chris,Agnes | Chris | Yes |
| 4 | 2/1/2020 | Phillip, Anton | | No |
| 5 | 5/1/2020 | Quinn | | No |
| 6 | 7/6/2020 | Idie,John | John | Yes |
+----+----------+-----------------------+--------+-----------------+
I tried using CONTAINS, but it looks like Teradata SQL does not accept it. I also tried CSVLD to convert the text to columns. However, the string can hold any number of commas, so I cannot use CSVLD without knowing beforehand exactly how many columns I need to create from the text.
Is there an alternative way to join a column against a string of values? I appreciate your kind input.

You should really fix your data model -- storing multiple values in a string is a bad, bad data design. SQL has a great way of storing lists -- it is called a table.
Assuming you are stuck with someone else's really, really bad data model, you can use a left join:
select a.*, b.name,
(case when b.name is not null then 'Yes' else 'No' end) as in_list
from a left join
b
on ',' || a.name || ',' like '%,' || b.name || ',%';
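To see what the delimiter-wrapped LIKE join returns on the sample data, here is a minimal sketch that runs the same query in SQLite through Python's sqlite3 module. SQLite is only a stand-in for Teradata here, and the column name dt (for Date) is an assumption made to avoid a reserved word.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Teradata (assumption).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (id INTEGER, dt TEXT, name TEXT);
    CREATE TABLE b (name TEXT);
    INSERT INTO a VALUES
        (1, '1/4/2019', 'Kara,Sara,John'),
        (2, '3/2/2018', 'Sara'),
        (3, '4/3/2019', 'Lynn,John,Chris,Agnes'),
        (4, '2/1/2020', 'Phillip, Anton'),
        (5, '5/1/2020', 'Quinn'),
        (6, '7/6/2020', 'Idie,John');
    INSERT INTO b VALUES ('John'), ('Sara'), ('Chris');
""")

# Wrapping both sides in commas makes every list element look like ',x,',
# so 'John' cannot accidentally match inside a longer name such as 'Johnny'.
rows = con.execute("""
    SELECT a.id, a.name, b.name,
           CASE WHEN b.name IS NOT NULL THEN 'Yes' ELSE 'No' END AS in_list
    FROM a
    LEFT JOIN b
      ON ',' || a.name || ',' LIKE '%,' || b.name || ',%'
    ORDER BY a.id, b.name
""").fetchall()

for row in rows:
    print(row)
```

Note that row 4 stores its list with a space after the comma. If such a name also appeared in table B it would not match, so removing the spaces before comparing may be necessary, depending on how the data is actually stored.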

Related

SQL, query to check and list distinct entries that occur in another table within a specific time frame

I'm using Oracle.
I have two tables. One contains users and the other is an access log of sorts. I need to list all users whose latest log entry appears in the log within a specified time frame including the timestamp of the latest entry. A single user can have several entries in the log.
Here are simplified versions of the tables:
Users
|----------------------------------|
| userid| username | name |
|----------------------------------|
| 1 | josm | John Smith |
| 2 | lajo | Laura Jones |
| 3 | miwi | Mike Williams |
| 4 | subo | Susan Brown |
| 5 | peda | Peter Davis |
| 6 | jami | Jane Miller |
|----------------------------------|
Log
|----------------------------------|
| userid| action | timestamp |
|----------------------------------|
| 3 | a | 20-01-2020 |
| 2 | v | 19-11-2019 |
| 2 | y | 02-11-2019 |
| 4 | b | 15-09-2019 |
| 1 | a | 23-05-2019 |
| 6 | y | 22-05-2019 |
| 3 | b | 16-04-2019 |
| 2 | a | 07-01-2019 |
| 5 | v | 18-11-2018 |
| 6 | a | 12-09-2018 |
|----------------------------------|
Desired result if the time frame is set to last six months:
|---------------------------------------|
| username | name | timestamp |
|--------------------------|------------|
| miwi | Mike Williams | 20-01-2020 |
| lajo | Laura Jones | 19-11-2019 |
| subo | Susan Brown | 15-09-2019 |
|---------------------------------------|
Any help will be greatly appreciated.
You can use aggregation:
select u.username, u.name, max(l.timestamp) as latest_entry
from log l join
users u
on l.userid = u.userid
group by u.userid, u.username, u.name
having max(l.timestamp) >= add_months(sysdate, -6)
order by max(l.timestamp) desc;
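To check the aggregation against the sample data, here is a minimal sketch in SQLite through Python's sqlite3 module. It is a stand-in for Oracle, not Oracle syntax: the dates are stored as ISO text, the column is named ts instead of timestamp, and a fixed reference date of 2020-02-01 replaces sysdate so the result is reproducible. All of these are assumptions made for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (userid INTEGER, username TEXT, name TEXT);
    CREATE TABLE log (userid INTEGER, action TEXT, ts TEXT);
    INSERT INTO users VALUES
        (1, 'josm', 'John Smith'), (2, 'lajo', 'Laura Jones'),
        (3, 'miwi', 'Mike Williams'), (4, 'subo', 'Susan Brown'),
        (5, 'peda', 'Peter Davis'), (6, 'jami', 'Jane Miller');
    INSERT INTO log VALUES
        (3, 'a', '2020-01-20'), (2, 'v', '2019-11-19'),
        (2, 'y', '2019-11-02'), (4, 'b', '2019-09-15'),
        (1, 'a', '2019-05-23'), (6, 'y', '2019-05-22'),
        (3, 'b', '2019-04-16'), (2, 'a', '2019-01-07'),
        (5, 'v', '2018-11-18'), (6, 'a', '2018-09-12');
""")

# Hypothetical "today"; in Oracle this role is played by sysdate.
REFERENCE_DATE = "2020-02-01"

# One row per user: HAVING filters on the latest entry, not on every entry.
rows = con.execute("""
    SELECT u.username, u.name, MAX(l.ts) AS latest
    FROM log l
    JOIN users u ON l.userid = u.userid
    GROUP BY u.userid, u.username, u.name
    HAVING MAX(l.ts) >= date(?, '-6 months')
    ORDER BY latest DESC
""", (REFERENCE_DATE,)).fetchall()

for row in rows:
    print(row)
```

Filtering in HAVING rather than WHERE is what makes the condition apply to each user's most recent entry as a whole.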

SQL Server - Pivot Out Delimited Column Data Into Rows

I have two columns of delimited data that I would like to pivot out into individual rows for each data item. In the starting table below, the delimited data is represented in the DataPointA and DataPointB columns. Also, note that each ID is a unique identifier for each person. The starting table looks like this:
----------------------------------------------------------
| ID | FirstName | LastName | DataPointA | DataPointB |
----------------------------------------------------------
| A1234 | Bill | Jones | 1,3,7,8 | 1,4 |
| B5678 | Jane | Smith | 2,4,6,9 | 1,5 |
----------------------------------------------------------
I would like to take the DataPoint column data that is delimited by commas and create one row for each DataPoint value, while also condensing into one field. So the end result will look like this:
-------------------------------------------------------------
| ID | FirstName | LastName | DataPoint | DataPointType |
-------------------------------------------------------------
| A1234 | Bill | Jones | 1 | A |
| A1234 | Bill | Jones | 3 | A |
| A1234 | Bill | Jones | 7 | A |
| A1234 | Bill | Jones | 8 | A |
| A1234 | Bill | Jones | 1 | B |
| A1234 | Bill | Jones | 4 | B |
| B5678 | Jane | Smith | 2 | A |
| B5678 | Jane | Smith | 4 | A |
| B5678 | Jane | Smith | 6 | A |
| B5678 | Jane | Smith | 9 | A |
| B5678 | Jane | Smith | 1 | B |
| B5678 | Jane | Smith | 5 | B |
-------------------------------------------------------------
My first instinct was to use UNPIVOT but I am not able to get it to work on two columns. Is there another method I should be using? Thank you in advance.
You don't need pivoting. You need string splitting:
select t.ID, t.FirstName, t.LastName, ab.DataPoint, ab.DataPointType
from t cross apply
(select 'A' as DataPointType, a.value as DataPoint
from string_split(t.DataPointA, ',') a
union all
select 'B' as DataPointType, b.value as DataPoint
from string_split(t.DataPointB, ',') b
) ab;
string_split() is available in SQL Server 2016 and later, and requires database compatibility level 130 or higher. In older versions, you can use your own split function; implementations can readily be found on the web.
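Where no built-in split function is available, one possible alternative is a recursive common table expression that peels one item off the front of the list at each step. The sketch below is not the string_split() approach from the answer; it shows the recursive idea in SQLite through Python's sqlite3 module. The CTE names unpivoted and pieces are hypothetical, and SQL Server would need its own string functions in place of SQLite's instr and substr.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (ID TEXT, FirstName TEXT, LastName TEXT,
                    DataPointA TEXT, DataPointB TEXT);
    INSERT INTO t VALUES
        ('A1234', 'Bill', 'Jones', '1,3,7,8', '1,4'),
        ('B5678', 'Jane', 'Smith', '2,4,6,9', '1,5');
""")

rows = con.execute("""
    WITH RECURSIVE
    -- Stack both delimited columns into one, tagged with their type.
    -- A trailing comma is appended so every item ends with a delimiter.
    unpivoted(ID, FirstName, LastName, DataPointType, rest) AS (
        SELECT ID, FirstName, LastName, 'A', DataPointA || ',' FROM t
        UNION ALL
        SELECT ID, FirstName, LastName, 'B', DataPointB || ',' FROM t
    ),
    -- Each step takes the text before the first comma as the item
    -- and keeps the remainder for the next step.
    pieces(ID, FirstName, LastName, DataPointType, DataPoint, rest) AS (
        SELECT ID, FirstName, LastName, DataPointType,
               substr(rest, 1, instr(rest, ',') - 1),
               substr(rest, instr(rest, ',') + 1)
        FROM unpivoted
        UNION ALL
        SELECT ID, FirstName, LastName, DataPointType,
               substr(rest, 1, instr(rest, ',') - 1),
               substr(rest, instr(rest, ',') + 1)
        FROM pieces
        WHERE rest <> ''
    )
    SELECT ID, FirstName, LastName, DataPoint, DataPointType
    FROM pieces
    ORDER BY ID, DataPointType, CAST(DataPoint AS INTEGER)
""").fetchall()

for row in rows:
    print(row)
```

The split values come back as text, so the ORDER BY casts them to integers for a numeric sort.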

SQL 'Sum' Text Fields, Delim with commas

I have a table like this:
+----+-------+-----------------+
| ID | Name | Email |
+----+-------+-----------------+
| 1 | Jane | Jane#doe.com |
| 2 | Will | Will#gmail.com |
| 3 | Will | wsj#example.com |
| 4 | Jerry | jj2#test.com |
+----+-------+-----------------+
Unfortunately I have records that are duplicates due to multiple emails. I would like to run a sql query to generate this:
+----+-------+---------------------------------+
| ID | Name | Email |
+----+-------+---------------------------------+
| 1 | Jane | Jane#doe.com |
| 2 | Will | Will#gmail.com, wsj#example.com |
| 4 | Jerry | jj2#test.com |
+----+-------+---------------------------------+
I know with numbers you'd do something like this, but I don't know how to 'sum' text fields:
SELECT Name,
SUM(Number_Field) AS Number_Field
FROM table
GROUP BY Name
Thanks!
Edit: I am using MS Access

Use max of column and if null use min

I have a table with 10 milestones in the column milestone. The column milestone_achieved has either the value OK or NULL.
The name column has just names, whenever someone new enters, all the milestones are entered in the database with NULL.
Here is what a typical table looks like:
+------+-----------+--------------------+
| name | milestone | milestone_achieved |
+------+-----------+--------------------+
| John | 1 | OK |
| John | 2 | OK |
| John | 3 | NULL |
| John | 4 | NULL |
| John | 5 | NULL |
| John | 6 | NULL |
| Mary | 1 | OK |
| Mary | 2 | OK |
| Mary | 3 | OK |
| Mary | 4 | OK |
| Mary | 5 | OK |
| Mary | 6 | OK |
| Tim | 1 | NULL |
| Tim | 2 | NULL |
| Tim | 3 | NULL |
| Tim | 4 | NULL |
| Tim | 5 | NULL |
| Tim | 6 | NULL |
+------+-----------+--------------------+
Now I want the SQL query to return:
+------+-----------+--------------------+
| name | milestone | milestone_achieved |
+------+-----------+--------------------+
| John | 2 | OK |
| Mary | 6 | OK |
| Tim | 1 | NULL |
+------+-----------+--------------------+
My query right now looks like this:
SELECT name, MAX(milestone) FROM table HAVING milestone_achieved = 'OK' GROUP BY name
UNION ALL
SELECT name, MIN(milestone) FROM table HAVING milestone_achieved IS NULL AND MIN(milestone) = 1 GROUP BY name
This works in 90% of the cases. The problem occurs when, for example, milestones 1 and 2 were completed, but milestone 1 was later "uncompleted" because it didn't meet some specific criterion. (Imagine an assembly line where cars are assembled: a screw, milestone 1, isn't tight enough, but the paint, milestone 2, is already on it.)
I am now looking for a way to properly display those 10% of cases.
One method is:
SELECT name, MAX(milestone)
FROM table
WHERE milestone_achieved = 'OK'
GROUP BY name
UNION ALL
SELECT name, MIN(milestone)
FROM table
GROUP BY name
HAVING MIN(milestone_achieved) IS NULL;
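This follows the structure of your logic. The WHERE clause filters rows before grouping, and HAVING MIN(milestone_achieved) IS NULL keeps only the names with no OK at all, because aggregate functions ignore NULL values. You can also do this with a single SELECT: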
SELECT name,
COALESCE(MAX(CASE WHEN milestone_achieved = 'OK' THEN milestone END),
MIN(milestone)
)
FROM table
GROUP BY name
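The single-SELECT version can be checked on the sample data plus the problematic case. The sketch below uses SQLite through Python's sqlite3 module; the table name milestones (since table is a reserved word), the extra MAX(milestone_achieved) column, and the user Ann, whose milestone 1 was "uncompleted" after milestone 2 was achieved, are all assumptions added for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE milestones (name TEXT, milestone INTEGER, milestone_achieved TEXT)"
)

# Number of milestones marked OK per user, following the question's sample data.
achieved = {"John": {1, 2}, "Mary": {1, 2, 3, 4, 5, 6}, "Tim": set()}
# Hypothetical edge case: milestone 2 is OK, but milestone 1 was reset to NULL.
achieved["Ann"] = {2}

for name, done in achieved.items():
    for m in range(1, 7):
        con.execute(
            "INSERT INTO milestones VALUES (?, ?, ?)",
            (name, m, "OK" if m in done else None),
        )

# The CASE yields NULL for unachieved rows, so MAX sees only achieved milestones.
# If a user has none, MAX is NULL and COALESCE falls back to the first milestone.
rows = con.execute("""
    SELECT name,
           COALESCE(MAX(CASE WHEN milestone_achieved = 'OK' THEN milestone END),
                    MIN(milestone)) AS milestone,
           MAX(milestone_achieved) AS milestone_achieved
    FROM milestones
    GROUP BY name
    ORDER BY name
""").fetchall()

for row in rows:
    print(row)
```

Ann is reported at milestone 2, which is the behaviour the original UNION ALL query missed.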

SQL only select rows with max date within each user

SQL beginner here. I've got a simple test that users take, and each row is the answer to one of their questions. They're allowed to take the exam once per day, so some people take it a second time on another day, and thus will have many rows with different test dates. What I'm basically trying to do is get each user's most recent score.
Here is what my data looks like (table name is dumdum):
+----------+----------------+----------+------------------+
| USERNAME | CORRECT_ANSWER | RESPONSE | DATE_TAKEN |
+----------+----------------+----------+------------------+
| matt | 1 | 1 | 3/23/15 1:04:26 |
| matt | 2 | 2 | 3/23/15 1:04:28 |
| matt | 3 | 3 | 3/23/15 1:04:23 |
| david | 1 | 3 | 3/20/15 1:04:25 |
| david | 2 | 2 | 3/20/15 1:04:28 |
| david | 3 | 1 | 3/20/15 1:04:30 |
| david | 1 | 1 | 3/21/15 11:03:14 |
| david | 2 | 3 | 3/21/15 11:03:17 |
| david | 3 | 2 | 3/21/15 11:03:19 |
| chris | 1 | 2 | 3/17/15 12:45:52 |
| chris | 2 | 2 | 3/17/15 12:45:56 |
| chris | 3 | 3 | 3/17/15 12:45:59 |
| peter | 1 | 1 | 3/19/15 2:45:33 |
| peter | 2 | 3 | 3/19/15 2:45:35 |
| peter | 3 | 2 | 3/19/15 2:45:38 |
| peter | 1 | 1 | 3/20/15 12:32:04 |
| peter | 2 | 2 | 3/20/15 12:32:05 |
| peter | 3 | 3 | 3/20/15 12:32:05 |
+----------+----------------+----------+------------------+
and what I'm trying to get in the end...
+----------+------------------+-------+
| USERNAME | MOST_RECENT_TEST | SCORE |
+----------+------------------+-------+
| matt | 3/23/2015 | 100 |
| david | 3/21/2015 | 33 |
| chris | 3/17/2015 | 67 |
| peter | 3/20/2015 | 100 |
+----------+------------------+-------+
I ran into some trouble because I need to go by day, and not by day/time, so I had to do a weird maneuver where I went to character and back to date... This is what I have so far, but I can't figure out how to use only the scores from the most recent test (right now it's factoring in all scores from every test ever taken)...
SELECT username, to_date(substr(max(date_taken),1,9),'dd-MON-yy') as most_recent_test, round((sum(case when response=correct_answer then 1 end)/3)*100,0) as score
FROM dumdum group by username
Any help would be appreciated! Thanks!
There are several solutions to this problem. This one uses the WITH clause and the RANK function.
It also uses the TRUNC function, rather than to_date(substr(...)), to drop the time portion of the date.
with mxDate as
(SELECT USERNAME,
TRUNC(DATE_TAKEN) as MOST_RECENT_TEST,
CASE WHEN CORRECT_ANSWER = RESPONSE THEN 1 else 0 END as SCORE,
RANK () OVER (PARTITION BY USERNAME
ORDER BY TRUNC(DATE_TAKEN) DESC) Rk
FROM dumdum)
SELECT
USERNAME,
MOST_RECENT_TEST,
ROUND(SUM(SCORE)/3 * 100) AS SCORE
FROM
mxDate
WHERE
rk = 1
GROUP BY
USERNAME,
MOST_RECENT_TEST
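The ranking approach can be tried on the sample data with the following sketch, which uses SQLite through Python's sqlite3 module as a stand-in for Oracle. It assumes a SQLite build with window function support (3.25 or later), stores the timestamps as ISO text, uses date() in place of TRUNC, and divides by COUNT(*) rather than a hard-coded 3. These choices are assumptions made for the illustration, not part of the original answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dumdum (username TEXT, correct_answer INTEGER,
                         response INTEGER, date_taken TEXT);
    INSERT INTO dumdum VALUES
        ('matt',  1, 1, '2015-03-23 01:04:26'),
        ('matt',  2, 2, '2015-03-23 01:04:28'),
        ('matt',  3, 3, '2015-03-23 01:04:23'),
        ('david', 1, 3, '2015-03-20 01:04:25'),
        ('david', 2, 2, '2015-03-20 01:04:28'),
        ('david', 3, 1, '2015-03-20 01:04:30'),
        ('david', 1, 1, '2015-03-21 11:03:14'),
        ('david', 2, 3, '2015-03-21 11:03:17'),
        ('david', 3, 2, '2015-03-21 11:03:19'),
        ('chris', 1, 2, '2015-03-17 12:45:52'),
        ('chris', 2, 2, '2015-03-17 12:45:56'),
        ('chris', 3, 3, '2015-03-17 12:45:59'),
        ('peter', 1, 1, '2015-03-19 02:45:33'),
        ('peter', 2, 3, '2015-03-19 02:45:35'),
        ('peter', 3, 2, '2015-03-19 02:45:38'),
        ('peter', 1, 1, '2015-03-20 12:32:04'),
        ('peter', 2, 2, '2015-03-20 12:32:05'),
        ('peter', 3, 3, '2015-03-20 12:32:05');
""")

# RANK over the day (not the full timestamp) gives rank 1 to every answer
# from the user's most recent test day, so all of them survive the filter.
rows = con.execute("""
    WITH ranked AS (
        SELECT username,
               date(date_taken) AS most_recent_test,
               CASE WHEN correct_answer = response THEN 1 ELSE 0 END AS is_correct,
               RANK() OVER (PARTITION BY username
                            ORDER BY date(date_taken) DESC) AS rk
        FROM dumdum
    )
    SELECT username,
           most_recent_test,
           CAST(ROUND(SUM(is_correct) * 100.0 / COUNT(*)) AS INTEGER) AS score
    FROM ranked
    WHERE rk = 1
    GROUP BY username, most_recent_test
    ORDER BY username
""").fetchall()

for row in rows:
    print(row)
```

Ranking by the truncated date matters: ranking by the full timestamp would keep only the single latest answer per user instead of the whole test.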