Keeping part of a string that overlaps based on a condition BigQuery - sql

I have two tables that look like this:
table_1
store_no  store_loc  ID
1234      CAL        ID123
6789      LAL        ID947
5678      PAA        ID456
5678      PAA        ID654
9876      LAS        ID789
table_2
ID     client_no  client_name  product
ID123  1029       John Doe     tent blue
ID947  1029       John Doe     tent red
ID456  4538       Jane Doe     skates 42
ID654  4538       Jane Doe     skates black red
ID789  9234       John Smith   bag green
I am trying to remove the parts of the 'product' that don't overlap if the 'store_no' and 'store_loc' match. So given these two tables I'm looking to get the following as a result:
ID     client_no  client_name  product
ID123  1029       John Doe     tent blue
ID947  1029       John Doe     tent red
ID456  4538       Jane Doe     skates
ID789  9234       John Smith   bag green
As the example shows, I don't have a defined set of strings to remove; the part to drop could be a number or a word. That's why I need a way to extract only the part that overlaps.
I think I need to use IF and REGEXP, but I'm not sure how. I don't know how to make sure I keep only the overlapping part of the string when the condition holds.

Consider the simple approach below:
select t.*
from table_1
join table_2 t using (ID)
qualify row_number() over(partition by store_no, store_loc order by ID) = 1
If applied to the sample data in your question, the output is:
Row  ID     client_no  client_name  product
1    ID123  1029       John Doe     tent blue
2    ID456  4538       Jane Doe     skates 42
3    ID947  1029       John Doe     tent red
4    ID789  9234       John Smith   bag green
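Note that the query above keeps the first row per store_no/store_loc pair but leaves the product untouched ("skates 42" rather than "skates"). The trimming step the question asks for can be sketched as follows. This is a minimal illustration in plain Python, not BigQuery SQL, and it assumes that "overlap" means the whitespace-separated words shared by every product in the same store group; the function name overlapping_products is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (store_no, store_loc, ID) and ID -> product.
table_1 = [
    (1234, "CAL", "ID123"),
    (6789, "LAL", "ID947"),
    (5678, "PAA", "ID456"),
    (5678, "PAA", "ID654"),
    (9876, "LAS", "ID789"),
]
products = {
    "ID123": "tent blue",
    "ID947": "tent red",
    "ID456": "skates 42",
    "ID654": "skates black red",
    "ID789": "bag green",
}


def overlapping_products(stores, product_by_id):
    """Map the first ID of each store group to the words all its products share."""
    groups = defaultdict(list)
    for store_no, store_loc, row_id in stores:
        groups[(store_no, store_loc)].append(row_id)

    result = {}
    for ids in groups.values():
        ids.sort()
        first_words = product_by_id[ids[0]].split()
        shared = set(first_words)
        for other in ids[1:]:
            shared &= set(product_by_id[other].split())
        # Keep the word order of the first product in the group.
        result[ids[0]] = " ".join(w for w in first_words if w in shared)
    return result


overlap = overlapping_products(table_1, products)
print(overlap)
```

Groups with a single row keep their product unchanged, because a set intersected with nothing else is itself.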

Related

SQL Db2 - How to unify two rows in one using datetime

I've got a table that records employees and where they have worked. Each row holds the employee's start date at that branch. It looks something like this:
Employee ID  Name      Branch  Start Date
1            John Doe  234     2018-01-20
1            John Doe  300     2019-03-20
1            John Doe  250     2022-01-19
2            Jane Doe  200     2019-02-15
2            Jane Doe  234     2020-05-20
I need a query that looks at the next row for each employee and uses the start date at the next branch as the end date of the current one. E.g.:
Employee ID  Name      Branch  Start Date  End Date
1            John Doe  234     2018-01-20  2019-03-20
1            John Doe  300     2019-03-20  2022-01-19
1            John Doe  250     2022-01-19  ---
2            Jane Doe  200     2019-02-15  2020-05-20
2            Jane Doe  234     2020-05-20  ---
When there is no later record, we assume the employee is still working at that branch, so we can leave the end date blank or use a default value such as "9999-01-01".
Is there any way we can achieve a result like this using only SQL?
Another approach to my problem would be a query that returns only the row that is in a range. For example, if I look for what branch John Doe worked in 2020-12-01, the query should return the row that shows the branch 300.
You can use LEAD() to peek at the next row, according to a subgroup and ordering within it.
For example:
select
t.*,
lead(start_date) over(partition by employee_id order by start_date) as end_date
from t
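To try this against the sample data, here is a small sketch that runs the same LEAD() query through Python's built-in sqlite3 module instead of Db2. It assumes a SQLite build with window-function support (3.25 or later). The COALESCE default and the date-range lookup are additions covering the two follow-up wishes in the question; the table name t and the snake_case column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (employee_id int, name text, branch int, start_date text)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, "John Doe", 234, "2018-01-20"),
        (1, "John Doe", 300, "2019-03-20"),
        (1, "John Doe", 250, "2022-01-19"),
        (2, "Jane Doe", 200, "2019-02-15"),
        (2, "Jane Doe", 234, "2020-05-20"),
    ],
)

# LEAD() returns NULL on the last row of each partition; COALESCE swaps in
# the default end date mentioned in the question.
ranges_sql = """
    select employee_id, branch, start_date,
           coalesce(
               lead(start_date) over (partition by employee_id order by start_date),
               '9999-01-01'
           ) as end_date
    from t
"""
ranges = conn.execute(
    ranges_sql + " order by employee_id, start_date"
).fetchall()

# Second request: which branch was employee 1 at on a given day?
# ISO-formatted date strings compare correctly as text.
branch_on_day = conn.execute(
    "select branch from (" + ranges_sql + ") r "
    "where employee_id = ? and start_date <= ? and ? < end_date",
    (1, "2020-12-01", "2020-12-01"),
).fetchall()

print(ranges)
print(branch_on_day)
```

The lookup treats each range as half-open (start inclusive, end exclusive), so a day on which an employee changes branch maps to the new branch only.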

SQL - How to return entire row sets where some rows match a given list

Let's say there is a table of medical records. Each visit has a unique ID but is made up of several rows corresponding to various codes/services rendered for the visit.
For example, there could be 3 rows with claimID "John" for each unique procedure code "123", "456", and "789"; 15 rows for "Jane" with codes; 6 rows for "David"...
ID Code
John 123
John 456
John 789
Jane 123
Jane 456
Jane 789
Jane 321
Jane 654
David 123
David 456
David 789
David 987
I have a list of 50 unique procedure codes and want to return the entire set of claim lines (i.e. all rows of "John") where any combination of these 50 codes has been billed with another, but not with itself ("123" with "321", but not "123" with "123"). If "123" is in my list of 50 but "456" and "789" are not, it should not return the set of "John" claims, since only one of my 50 codes is present. I hope this makes sense.
Positive Result Codes
123
321
987
The query should return all 5 Jane rows (123 and 321) and all 4 David rows (123 & 987).
ID Code
Jane 123
Jane 456
Jane 789
Jane 321
Jane 654
David 123
David 456
David 789
David 987
Try this code:
;WITH Visits as (
    -- claims that contain at least two different codes from the list
    SELECT claimID, COUNT(DISTINCT Code) as CNT
    FROM tbl_Visits
    WHERE Code in (123, 321, 987)
    GROUP by claimID
    HAVING COUNT(DISTINCT Code) > 1
)
SELECT * FROM tbl_Visits
WHERE claimID in (SELECT claimID FROM Visits);
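As a quick check, the same logic can be run on the sample rows from the question. The sketch below uses Python's sqlite3 module rather than SQL Server, so the leading semicolon is dropped; the table and column names (tbl_Visits, claimID, Code) are taken from the answer, and the CTE is renamed qualifying_claims purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl_Visits (claimID text, Code int)")
conn.executemany(
    "insert into tbl_Visits values (?, ?)",
    [
        ("John", 123), ("John", 456), ("John", 789),
        ("Jane", 123), ("Jane", 456), ("Jane", 789), ("Jane", 321), ("Jane", 654),
        ("David", 123), ("David", 456), ("David", 789), ("David", 987),
    ],
)

# Keep only claims with two or more *different* listed codes, then return
# every row of those claims, listed or not.
matched = conn.execute("""
    with qualifying_claims as (
        select claimID
        from tbl_Visits
        where Code in (123, 321, 987)
        group by claimID
        having count(distinct Code) > 1
    )
    select claimID, Code
    from tbl_Visits
    where claimID in (select claimID from qualifying_claims)
""").fetchall()

print(matched)
```

John is excluded because only one listed code (123) appears on his claim; COUNT(DISTINCT ...) is what prevents a code from pairing with itself.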

SQL query: selecting only NULL values from a group?

I'm working with a data table that looks something like this:
Names         City          Date      Color  Shape
John Smith    Baltimore     8/1/2015  Blue
John Smith    Baltimore     8/1/2015  Green
John Smith    Baltimore     8/1/2015         Rectangle
John Smith    Baltimore     8/1/2015
John Smith    Baltimore     8/1/2015         Square
John Smith    Baltimore     8/1/2015
Rob Johnson   Baltimore     8/1/2015
Rob Johnson   Baltimore     8/1/2015
Rob Johnson   Baltimore     8/1/2015
Rob Johnson   Baltimore     8/1/2015
Rob Johnson   Baltimore     8/1/2015
Greg Jackson  Philadelphia  8/1/2015
Greg Jackson  Philadelphia  8/1/2015
Greg Jackson  Philadelphia  8/1/2015
Greg Jackson  Philadelphia  8/1/2015         Circle
Greg Jackson  Philadelphia  8/1/2015
Tom Green     Philadelphia  8/1/2015
Tom Green     Philadelphia  8/1/2015
Tom Green     Philadelphia  8/1/2015  Red
Tom Green     Philadelphia  8/1/2015
Tom Green     Philadelphia  8/1/2015
My goal with the query is to SELECT all five columns, but to isolate those values in the Names field whose rows all have NULL values in the Color and Shape fields. I'm writing this SQL in MS Access. My query so far looks like this:
SELECT [Names], [City], [Date], [Color], [Shape]
FROM [databasename]
WHERE
(
([Color] IS NULL)
AND
([Shape] IS NULL)
);
From the sample data table, I'd like for the results to only include Rob Johnson, since all rows associated with that Name entry have NULL values for the Color and Shape fields. However, with this query, I'm getting all of the other names as well, with the specific rows with NULL values in the Color and Shape fields being returned.
So, the expected output would look like this:
Names City Date Color Shape
Rob Johnson Baltimore 8/1/2015
I suspect that I need to use a GROUP operator here, but I'm not quite sure how to do that. Any ideas?
I think you want this:
SELECT
DISTINCT [Names], [City], [Date], [Color], [Shape]
FROM [table]
WHERE [Names] NOT IN (
SELECT [Names] FROM [table] WHERE ([Color] IS NOT NULL) OR
([Shape] IS NOT NULL)
);
It can be done in other ways, but this should be close to your original query.
You can use an aggregate and inner join:
SELECT d1.* FROM [database-name] d1
INNER JOIN (
select Names,MAX(Color) as mc,MAX(Shape) as ms
from [database-name]
group by Names
) d2
ON d1.Names = d2.Names
WHERE mc IS NULL
AND ms IS NULL
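Both answers can be checked against the sample data. The sketch below runs them through Python's sqlite3 module rather than MS Access, with a stand-in table name (records). It assumes the blank cells are NULLs and infers which blank belongs to which column from the values themselves (Blue, Green and Red as colors; Rectangle, Square and Circle as shapes).

```python
import sqlite3

# Sample rows from the question; None stands for a blank (NULL) cell.
sample = {
    ("John Smith", "Baltimore"): [
        ("Blue", None), ("Green", None), (None, "Rectangle"),
        (None, None), (None, "Square"), (None, None),
    ],
    ("Rob Johnson", "Baltimore"): [(None, None)] * 5,
    ("Greg Jackson", "Philadelphia"): [(None, None)] * 3 + [(None, "Circle"), (None, None)],
    ("Tom Green", "Philadelphia"): [(None, None)] * 2 + [("Red", None)] + [(None, None)] * 2,
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table records (Names text, City text, [Date] text, Color text, Shape text)"
)
for (name, city), cells in sample.items():
    conn.executemany(
        "insert into records values (?, ?, '8/1/2015', ?, ?)",
        [(name, city, color, shape) for color, shape in cells],
    )

# First answer: exclude any name that has a non-NULL Color or Shape anywhere.
not_in_rows = conn.execute("""
    select distinct Names, City, [Date], Color, Shape
    from records
    where Names not in (
        select Names from records
        where Color is not null or Shape is not null
    )
""").fetchall()

# Second answer: MAX() ignores NULLs, so it is NULL only if every value is NULL.
join_rows = conn.execute("""
    select d1.*
    from records d1
    inner join (
        select Names, max(Color) as mc, max(Shape) as ms
        from records
        group by Names
    ) d2 on d1.Names = d2.Names
    where mc is null and ms is null
""").fetchall()

print(not_in_rows)
print(len(join_rows))
```

The first query collapses Rob Johnson's five identical rows into one because of DISTINCT, while the join version returns all five.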

SQL Server Extract overlapping date ranges (return dates that cross other dates)

How would I go about extracting the overlapping dates from the following table?
ID Name StartDate EndDate Type
==============================================================
1 John Smith 01/01/2014 31/01/2014 A
2 John Smith 20/01/2014 20/02/2014 B
3 John Smith 01/03/2014 28/03/2014 A
4 John Smith 18/03/2014 24/03/2014 B
5 John Smith 01/07/2014 31/07/2014 A
6 John Smith 15/07/2014 31/07/2014 B
7 John Smith 25/07/2014 25/08/2014 C
Based on the first example for John Smith, the dates 01/01/2014 to 31/01/2014 overlap with 20/01/2014 to 20/02/2014, so I am expecting just overlapping period back which is 20/01/2014 to 31/01/2014.
The final result would be:
ID Name StartDate EndDate
==================================================
8 John Smith 20/01/2014 31/01/2014
9 John Smith 18/03/2014 24/03/2014
10 John Smith 15/07/2014 31/07/2014
11 John Smith 25/07/2014 31/07/2014
HELP REQUIRED 10 August 2014
In addition to the above request, I am looking for help or guidance on how to get the following results which should include the dates that overlap and the dates that don't. The ID column is irrelevant.
ID Name StartDate EndDate Type
==================================================
1 John Smith 01/01/2014 19/01/2014 A
8 John Smith 20/01/2014 31/01/2014 AB
2 John Smith 01/02/2014 20/02/2014 B
3 John Smith 01/03/2014 17/03/2014 A
9 John Smith 18/03/2014 24/03/2014 AB
3 John Smith 25/03/2014 28/03/2014 A
5 John Smith 01/07/2014 14/07/2014 A
10 John Smith 15/07/2014 31/07/2014 AB
11 John Smith 25/07/2014 31/07/2014 ABC
7 John Smith 01/08/2014 25/08/2014 C
Although the following image is not an exact reflection of the above, for illustration purposes, I am interested in seeing the dates that overlap (red) and the dates that don't (sky blue) in the same result set.
http://imgur.com/SeR9sY1
If you want just overlapping periods, you can get this with a self join. Do note that the results might be redundant if more than two periods overlap on certain dates.
select ft.name,
       -- the overlap starts at the later of the two start dates
       (case when max(ft.startdate) > max(ft2.startdate) then max(ft.startdate)
             else max(ft2.startdate)
        end) as startdate,
       -- the overlap ends at the earlier of the two end dates
       (case when min(ft.enddate) < min(ft2.enddate) then min(ft.enddate)
             else min(ft2.enddate)
        end) as enddate
from followingtable ft join
     followingtable ft2
     on ft.name = ft2.name and
        ft.id < ft2.id and
        ft.startdate <= ft2.enddate and
        ft.enddate > ft2.startdate
group by ft.name, ft.id, ft2.id;
This doesn't assign the ids. You can do that with row_number() and an offset.
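The self-join can be tried on the sample rows with the sketch below, which uses Python's sqlite3 module rather than SQL Server. It assumes the dates are stored as ISO yyyy-mm-dd text so that string comparison matches date order, takes the later start and the earlier end of each pair, and adds DISTINCT (not part of the answer) to collapse the redundant pair that appears when three periods overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table followingtable "
    "(id int, name text, startdate text, enddate text, type text)"
)
conn.executemany(
    "insert into followingtable values (?, 'John Smith', ?, ?, ?)",
    [
        (1, "2014-01-01", "2014-01-31", "A"),
        (2, "2014-01-20", "2014-02-20", "B"),
        (3, "2014-03-01", "2014-03-28", "A"),
        (4, "2014-03-18", "2014-03-24", "B"),
        (5, "2014-07-01", "2014-07-31", "A"),
        (6, "2014-07-15", "2014-07-31", "B"),
        (7, "2014-07-25", "2014-08-25", "C"),
    ],
)

# Each overlapping pair (ft.id < ft2.id) yields the later start date and the
# earlier end date. Rows 5/7 and 6/7 give the same period, hence DISTINCT.
overlaps = conn.execute("""
    select distinct ft.name,
           case when ft.startdate > ft2.startdate
                then ft.startdate else ft2.startdate end as startdate,
           case when ft.enddate < ft2.enddate
                then ft.enddate else ft2.enddate end as enddate
    from followingtable ft
    join followingtable ft2
      on ft.name = ft2.name
     and ft.id < ft2.id
     and ft.startdate <= ft2.enddate
     and ft.enddate > ft2.startdate
    order by 2, 3
""").fetchall()

for row in overlaps:
    print(row)
```

On this data the four distinct periods match the first expected result in the question, apart from the ID column.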

SQL query to return all columns from table, but with a max of 3 duplicate id's

Can someone please lend a hand with this query? I've been fooling with LIMIT or TOP, but I think I'm off track. I want to return all fields from a table, but with a max of 3 duplicate id's in the new table.
Table
id first last
===================
1 John Doe
1 John Doe
1 John Doe
1 John Doe
2 Mary Green
2 Mary Green
3 Stacy Kirk
3 Stacy Kirk
3 Stacy Kirk
3 Stacy Kirk
3 Stacy Kirk
Desired Results (up to 3 ids)
id first last
====================
1 John Doe
1 John Doe
1 John Doe
2 Mary Green
2 Mary Green
3 Stacy Kirk
3 Stacy Kirk
3 Stacy Kirk
Thanks!
Since you mentioned TOP, this is for SQL Server:
SELECT id, first, last
FROM
(
SELECT id, first, last,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LAST) rn
FROM TABLE1
) s
WHERE s.rn <= 3
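The same ROW_NUMBER() pattern can be checked on the sample rows with Python's sqlite3 module, as a sketch outside SQL Server. It assumes a SQLite build with window-function support (3.25 or later), and the columns are renamed first_name and last_name here to stay clear of SQLite keywords.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (id int, first_name text, last_name text)")
rows = (
    [(1, "John", "Doe")] * 4
    + [(2, "Mary", "Green")] * 2
    + [(3, "Stacy", "Kirk")] * 5
)
conn.executemany("insert into table1 values (?, ?, ?)", rows)

# Number the rows within each id, then keep at most the first three.
limited = conn.execute("""
    select id, first_name, last_name
    from (
        select id, first_name, last_name,
               row_number() over (partition by id order by last_name) as rn
        from table1
    ) s
    where s.rn <= 3
    order by id
""").fetchall()

counts = Counter(row[0] for row in limited)
print(counts)
```

Because the duplicate rows are identical, it does not matter which three survive; with distinguishable rows, the ORDER BY inside OVER decides.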