Get min and max datetimes from each subgroup order by starttimes - sql

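I basically want to group the rows by timeline in SQL: consecutive rows with the same description should collapse into one row holding the earliest start and the latest end.
I am out of ideas right now; a plain GROUP BY obviously does not work, and neither does ROW_NUMBER.
Any ideas for SQL are really appreciated. This is my attempt: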
select shift_date,associate_id,name,description , min(START_TRAN_DATE) as startdate, max(end_tran_date) as end_date
from ltu_vt
group by shift_date,associate_id,name,description
**SHIFT_DATE ID NAME DESC START_TRAN_DATE END_TRAN_DATE**
2022-11-13 42 John Doe ADP 2022-11-13 06:31:00.000 2022-11-13 06:31:22.000
2022-11-13 42 John Doe LINE 2022-11-13 06:31:22.000 2022-11-13 06:50:13.000
2022-11-13 42 John Doe HJ 2022-11-13 06:50:13.000 2022-11-13 06:50:13.000
2022-11-13 42 John Doe HJ 2022-11-13 06:52:13.000 2022-11-13 06:52:13.000
2022-11-13 42 John Doe HJ 2022-11-13 06:52:20.000 2022-11-13 06:52:20.000
2022-11-13 42 John Doe HJ 2022-11-13 06:52:25.000 2022-11-13 06:52:25.000
2022-11-13 42 John Doe HJ 2022-11-13 06:52:46.000 2022-11-13 06:52:46.000
2022-11-13 42 John Doe BG 2022-11-13 06:53:58.000 2022-11-13 06:53:58.000
2022-11-13 42 John Doe BG 2022-11-13 06:54:01.000 2022-11-13 06:54:01.000
2022-11-13 42 John Doe HJ 2022-11-13 07:13:49.000 2022-11-13 07:13:49.000
2022-11-13 42 John Doe P2L 2022-11-13 07:14:09.000 2022-11-13 07:14:09.000
2022-11-13 42 John Doe P2L 2022-11-13 07:19:48.000 2022-11-13 07:19:48.000
2022-11-13 42 John Doe ADP 2022-11-13 07:20:00.000 2022-11-13 07:20:00.000
expected output is
**SHIFT_DATE ID NAME DESC START_TRAN_DATE END_TRAN_DATE**
2022-11-13 42 John Doe ADP 2022-11-13 06:31:00.000 2022-11-13 06:31:22.000
2022-11-13 42 John Doe LINE 2022-11-13 06:31:22.000 2022-11-13 06:50:13.000
2022-11-13 42 John Doe HJ 2022-11-13 06:50:13.000 2022-11-13 06:52:46.000
2022-11-13 42 John Doe BG 2022-11-13 06:53:58.000 2022-11-13 06:54:01.000
2022-11-13 42 John Doe HJ 2022-11-13 07:13:49.000 2022-11-13 07:13:49.000
2022-11-13 42 John Doe P2L 2022-11-13 07:14:09.000 2022-11-13 07:19:48.000
2022-11-13 42 John Doe ADP 2022-11-13 07:20:00.000 2022-11-13 07:20:00.000

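Please try the following solution.
This is a well-known "gaps and islands" problem: LAG fetches the previous row's description, and a running sum of the changes gives every run of identical descriptions its own number.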
-- DDL and sample data population, start
-- A table variable must be named with @ (a # name would need CREATE TABLE)
DECLARE @tbl TABLE (SHIFT_DATE DATE, ID INT, [NAME] VARCHAR(20), [DESC] VARCHAR(10), START_TRAN_DATE DATETIME, END_TRAN_DATE DATETIME);
INSERT @tbl (SHIFT_DATE, ID, NAME, [DESC], START_TRAN_DATE, END_TRAN_DATE) VALUES
('2022-11-13', 42, 'John Doe', 'ADP', '2022-11-13 06:31:00.000', '2022-11-13 06:31:22.000'),
('2022-11-13', 42, 'John Doe', 'LINE','2022-11-13 06:31:22.000', '2022-11-13 06:50:13.000'),
('2022-11-13', 42, 'John Doe', 'HJ', '2022-11-13 06:50:13.000', '2022-11-13 06:50:13.000'),
('2022-11-13', 42, 'John Doe', 'HJ', '2022-11-13 06:52:13.000', '2022-11-13 06:52:13.000'),
('2022-11-13', 42, 'John Doe', 'HJ', '2022-11-13 06:52:20.000', '2022-11-13 06:52:20.000'),
('2022-11-13', 42, 'John Doe', 'HJ', '2022-11-13 06:52:25.000', '2022-11-13 06:52:25.000'),
('2022-11-13', 42, 'John Doe', 'HJ', '2022-11-13 06:52:46.000', '2022-11-13 06:52:46.000'),
('2022-11-13', 42, 'John Doe', 'BG', '2022-11-13 06:53:58.000', '2022-11-13 06:53:58.000'),
('2022-11-13', 42, 'John Doe', 'BG', '2022-11-13 06:54:01.000', '2022-11-13 06:54:01.000'),
('2022-11-13', 42, 'John Doe', 'HJ', '2022-11-13 07:13:49.000', '2022-11-13 07:13:49.000'),
('2022-11-13', 42, 'John Doe', 'P2L', '2022-11-13 07:14:09.000', '2022-11-13 07:14:09.000'),
('2022-11-13', 42, 'John Doe', 'P2L', '2022-11-13 07:19:48.000', '2022-11-13 07:19:48.000'),
('2022-11-13', 42, 'John Doe', 'ADP', '2022-11-13 07:20:00.000', '2022-11-13 07:20:00.000');
-- DDL and sample data population, end
SELECT series, SHIFT_DATE, ID, NAME, [DESC], MIN(START_TRAN_DATE) AS Date_Start, MAX(END_TRAN_DATE) AS Date_End, COUNT(SHIFT_DATE) AS Shift_Counter
FROM
(
    -- running count of description changes = island number
    SELECT *,
        SUM(IIF([DESC] <> ns, 1, 0)) OVER (ORDER BY START_TRAN_DATE) AS series
    FROM
    (
        -- ns = description of the previous row in time order (NULL for the first row)
        SELECT series.*,
            LAG([DESC]) OVER (ORDER BY START_TRAN_DATE) AS ns
        FROM @tbl AS series
    ) q
) q
GROUP BY series, SHIFT_DATE, ID, NAME, [DESC]
ORDER BY series;
Output
series SHIFT_DATE ID NAME DESC Date_Start Date_End Shift_Counter
0 2022-11-13 42 John Doe ADP 2022-11-13 06:31:00.000 2022-11-13 06:31:22.000 1
1 2022-11-13 42 John Doe LINE 2022-11-13 06:31:22.000 2022-11-13 06:50:13.000 1
2 2022-11-13 42 John Doe HJ 2022-11-13 06:50:13.000 2022-11-13 06:52:46.000 5
3 2022-11-13 42 John Doe BG 2022-11-13 06:53:58.000 2022-11-13 06:54:01.000 2
4 2022-11-13 42 John Doe HJ 2022-11-13 07:13:49.000 2022-11-13 07:13:49.000 1
5 2022-11-13 42 John Doe P2L 2022-11-13 07:14:09.000 2022-11-13 07:19:48.000 2
6 2022-11-13 42 John Doe ADP 2022-11-13 07:20:00.000 2022-11-13 07:20:00.000 1
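For readers who want to check the idea outside SQL Server, here is a minimal pandas sketch of the same change-flag plus running-sum technique. The lowercase column names and the `islands` frame are illustrative choices, not part of the original answer, and only the description and timestamp columns are kept.

```python
import pandas as pd

# Sample rows from the question: (description, start, end)
rows = [
    ("ADP",  "2022-11-13 06:31:00", "2022-11-13 06:31:22"),
    ("LINE", "2022-11-13 06:31:22", "2022-11-13 06:50:13"),
    ("HJ",   "2022-11-13 06:50:13", "2022-11-13 06:50:13"),
    ("HJ",   "2022-11-13 06:52:13", "2022-11-13 06:52:13"),
    ("HJ",   "2022-11-13 06:52:20", "2022-11-13 06:52:20"),
    ("HJ",   "2022-11-13 06:52:25", "2022-11-13 06:52:25"),
    ("HJ",   "2022-11-13 06:52:46", "2022-11-13 06:52:46"),
    ("BG",   "2022-11-13 06:53:58", "2022-11-13 06:53:58"),
    ("BG",   "2022-11-13 06:54:01", "2022-11-13 06:54:01"),
    ("HJ",   "2022-11-13 07:13:49", "2022-11-13 07:13:49"),
    ("P2L",  "2022-11-13 07:14:09", "2022-11-13 07:14:09"),
    ("P2L",  "2022-11-13 07:19:48", "2022-11-13 07:19:48"),
    ("ADP",  "2022-11-13 07:20:00", "2022-11-13 07:20:00"),
]
df = pd.DataFrame(rows, columns=["desc", "start", "end"])
df["start"] = pd.to_datetime(df["start"])
df["end"] = pd.to_datetime(df["end"])
df = df.sort_values("start").reset_index(drop=True)

# Flag rows whose description differs from the previous row (the LAG step),
# then take a running sum of the flags (the SUM ... OVER step).
# The first row is always flagged, so subtract 1 to start numbering at 0.
changed = df["desc"].ne(df["desc"].shift())
df["series"] = changed.cumsum() - 1

# One row per island: earliest start, latest end, number of source rows
islands = (
    df.groupby("series")
      .agg(desc=("desc", "first"),
           date_start=("start", "min"),
           date_end=("end", "max"),
           shift_counter=("desc", "size"))
      .reset_index()
)
print(islands)
```

With several associates or shift dates in the data, the shift and the running sum would need to be done per group (the equivalent of adding PARTITION BY to both window functions).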

From your expected output it seems that you want to group by description, so this is how you could do it:
SELECT SHIFT_DATE,
       associate_id AS ID,
       name,
       description AS [DESC],
       MIN(START_TRAN_DATE) AS START_TRAN_DATE,
       MAX(END_TRAN_DATE) AS END_TRAN_DATE
FROM ltu_vt
GROUP BY SHIFT_DATE, associate_id, name, description
Note that the only information I have is what you have given, so if I misunderstood the problem, please provide more information, ideally with a SQL fiddle.

Related

Keeping part of a string that overlaps based on a condition BigQuery

I have two tables that look like this
table_1
store_no store_loc ID
1234 CAL ID123
6789 LAL ID947
5678 PAA ID456
5678 PAA ID654
9876 LAS ID789
table_2
ID client_no client_name product
ID123 1029 John Doe tent blue
ID947 1029 John Doe tent red
ID456 4538 Jane Doe skates 42
ID654 4538 Jane Doe skates black red
ID789 9234 John Smith bag green
I am trying to remove the parts of the 'product' that don't overlap if the 'store_no' and 'store_loc' match. So given these two tables I'm looking to get the following as a result:
ID client_no client_name product
ID123 1029 John Doe tent blue
ID947 1029 John Doe tent red
ID456 4538 Jane Doe skates
ID789 9234 John Smith bag green
As the example shows, I don't have a defined set of strings to remove; the part to drop could be a number or a word. That's why I need a way to extract only the part that overlaps.
I think I need to use IF and REGEXP, but I'm not sure how to do it. I don't know how to make sure I'm only keeping the part of the string that overlaps given a condition.
Consider below simple approach
select t.*
from table_1
join table_2 t using (ID)
qualify row_number() over(partition by store_no, store_loc order by ID) = 1
if applied to sample data in your question - output is
Row ID client_no client_name product
1 ID123 1029 John Doe tent blue
2 ID456 4538 Jane Doe skates 42
3 ID947 1029 John Doe tent red
4 ID789 9234 John Smith bag green
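As a cross-check of what the query above does, here is a small pandas sketch of the same "first row per (store_no, store_loc), ordered by ID" selection. The variable names are illustrative. Note that, like the query, it keeps the whole product string of the first row in each store ("skates 42") rather than trimming it to the shared word.

```python
import pandas as pd

table_1 = pd.DataFrame({
    "store_no":  [1234, 6789, 5678, 5678, 9876],
    "store_loc": ["CAL", "LAL", "PAA", "PAA", "LAS"],
    "ID":        ["ID123", "ID947", "ID456", "ID654", "ID789"],
})
table_2 = pd.DataFrame({
    "ID":          ["ID123", "ID947", "ID456", "ID654", "ID789"],
    "client_no":   [1029, 1029, 4538, 4538, 9234],
    "client_name": ["John Doe", "John Doe", "Jane Doe", "Jane Doe", "John Smith"],
    "product":     ["tent blue", "tent red", "skates 42", "skates black red", "bag green"],
})

# JOIN ... USING (ID)
joined = table_1.merge(table_2, on="ID")

# QUALIFY ROW_NUMBER() OVER (PARTITION BY store_no, store_loc ORDER BY ID) = 1:
# sort by ID, then keep the first row of every (store_no, store_loc) pair.
first_per_store = (
    joined.sort_values("ID")
          .drop_duplicates(subset=["store_no", "store_loc"], keep="first")
          [["ID", "client_no", "client_name", "product"]]
)
print(first_per_store)
```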

I'd like to have a bigquery query code to join and calculate 2 tables

I have 2 tables in BigQuery that I'd like to join and run a calculation on.
Here are the tables :
Table A
ID Name Score
12-2112 John 23844
12-2310 Matthew 21881
13-6205 Matthew 16721
12-1710 Sonia 13344
18-1130 Sonia 8187
Table B
ID Name Games Score
12-2112 John Soccer 10291
12-2112 John Soccer 2271
12-2112 John Soccer 3211
12-2112 John Soccer 1625
12-2310 Matthew Tennis 11551
12-2310 Matthew Volley 2232
12-2310 Matthew karate 1861
12-2310 Matthew Judo 2081
13-6205 Matthew MMA 5281
13-6205 Matthew Racing 8681
13-6205 Matthew Volley 1921
12-1710 Sonia football 3324
12-1710 Sonia Volley 2716
12-1710 Sonia Judo 6718
18-1130 Sonia football 4281
18-1130 Sonia Tennis 3199
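The score in Table A is the total score across all games, but not every game is identified in Table B, so the Table B scores for an ID add up to less than the Table A score.
The final table should therefore add, for each ID, a row with a null game that holds the difference: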
Combined
ID Name Games Score
12-2112 John null 6446
12-2112 John Soccer 10291
12-2112 John Soccer 2271
12-2112 John Soccer 3211
12-2112 John Soccer 1625
12-2310 Matthew null 4156
12-2310 Matthew Tennis 11551
12-2310 Matthew Volley 2232
12-2310 Matthew karaté 1861
12-2310 Matthew Judo 2081
13-6205 Matthew null 838
13-6205 Matthew MMA 5281
13-6205 Matthew Racing 8681
13-6205 Matthew Volley 1921
12-1710 Sonia null 586
12-1710 Sonia football 3324
12-1710 Sonia Volley 2716
12-1710 Sonia Judo 6718
18-1130 Sonia null 707
18-1130 Sonia football 4281
18-1130 Sonia Tennis 3199
I've tried every join statement I know, but the output is not what I want.
The best I found was a UNION ALL (or UNION DISTINCT) with this query:
select
ID,
Name,
null as Games,
Score
from Table A
Union ALL
Select
ID,
Name,
Games,
Score
from Table B
and here is the output :
ID Name Games Score
12-2112 John null 23844
12-2112 John Soccer 10291
12-2112 John Soccer 2271
12-2112 John Soccer 3211
12-2112 John Soccer 1625
12-2310 Matthew null 21881
12-2310 Matthew Tennis 11551
12-2310 Matthew Volley 2232
12-2310 Matthew karaté 1861
12-2310 Matthew Judo 2081
13-6205 Matthew null 16721
13-6205 Matthew MMA 5281
13-6205 Matthew Racing 8681
13-6205 Matthew Volley 1921
12-1710 Sonia null 13344
12-1710 Sonia football 3324
12-1710 Sonia Volley 2716
12-1710 Sonia Judo 6718
18-1130 Sonia null 8187
18-1130 Sonia football 4281
18-1130 Sonia Tennis 3199
The score in the null rows is still the Table A total, not the Table A total minus the sum of the game scores in Table B as expected.
Could you please help me out?
Thanks
I'm trying to join Table A and Table B so that, in the rows where Games is null, the score is Table A's score minus the sum of the Table B scores for that ID.
Consider below simple approach
select id, name, games,
if(not games is null, score, 2 * score - sum(score) over(partition by id, name)) as score
from (
select * from tableB union all
select id, name, null, score from tableA
)
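Applied to the sample data in your question, this returns every Table B row unchanged plus one null-Games row per ID whose score is the Table A score minus the sum of that ID's Table B scores.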
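The `2 * score - sum(score) over(...)` trick works because, inside each (id, name) partition, the window sum already contains the Table A score once plus all Table B scores, so 2·A − (A + ΣB) = A − ΣB. Here is a minimal pandas sketch of that arithmetic. It assumes Sonia's second Table A row belongs to ID 18-1130, which is what the expected output implies; the frame and column names are illustrative.

```python
import pandas as pd

table_a = pd.DataFrame(
    [("12-2112", "John", 23844), ("12-2310", "Matthew", 21881),
     ("13-6205", "Matthew", 16721), ("12-1710", "Sonia", 13344),
     ("18-1130", "Sonia", 8187)],  # ID assumed from the expected output
    columns=["id", "name", "score"],
)
table_b = pd.DataFrame(
    [("12-2112", "John", "Soccer", 10291), ("12-2112", "John", "Soccer", 2271),
     ("12-2112", "John", "Soccer", 3211), ("12-2112", "John", "Soccer", 1625),
     ("12-2310", "Matthew", "Tennis", 11551), ("12-2310", "Matthew", "Volley", 2232),
     ("12-2310", "Matthew", "karate", 1861), ("12-2310", "Matthew", "Judo", 2081),
     ("13-6205", "Matthew", "MMA", 5281), ("13-6205", "Matthew", "Racing", 8681),
     ("13-6205", "Matthew", "Volley", 1921), ("12-1710", "Sonia", "football", 3324),
     ("12-1710", "Sonia", "Volley", 2716), ("12-1710", "Sonia", "Judo", 6718),
     ("18-1130", "Sonia", "football", 4281), ("18-1130", "Sonia", "Tennis", 3199)],
    columns=["id", "name", "games", "score"],
)

# UNION ALL: Table B rows plus Table A rows with a null game
combined = pd.concat([table_b, table_a.assign(games=None)], ignore_index=True)

# Window sum per (id, name): Table A score + all Table B scores
total = combined.groupby(["id", "name"])["score"].transform("sum")

# Keep the score where a game is named; otherwise 2*A - (A + sum(B)) = A - sum(B)
combined["score"] = combined["score"].where(
    combined["games"].notna(), 2 * combined["score"] - total
)
print(combined.sort_values(["id", "games"], na_position="first"))
```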

Presenting Data uniformly between two different table presentations with SQL

Hello Everyone I have a problem…
Table 1 (sorted) is laid out like this:
User ID Producer ID Company Number
JWROSE 23401 234
KXPEAR 23903 239
LMWEEM 27902 279
KJMORS 18301 183
Table 2 (unsorted) looks like this:
Client Name City Company Number
Rajat Smith London JWROSE
Robert Singh Cleveland KXPEAR
Alberto Johnson New York City LMWEEM
Betty Lee Dallas KJMORS
Chase Galvez Houston 23401
Hassan Jackson Seattle 23903
Tooti Fruity Boise 27902
Joe Trump Tokyo 18301
Donald Biden Cairo 234
Mike Harris Rome 239
Kamala Pence Moscow 279
Adolf Washington Bangkok 183
Table 1 has every User ID and Producer ID correctly paired with its Company Number.
I want to pull all the data from Table 2, matched to the right Table 1 row and correctly sorted:
Client Name City User ID Producer ID Company Number
Rajat Smith London JWROSE 23401 234
Robert Singh Cleveland KXPEAR 23903 239
Alberto Johnson New York City LMWEEM 27902 279
Betty Lee Dallas KJMORS 18301 183
Chase Galvez Houston JWROSE 23401 234
Hassan Jackson Seattle KXPEAR 23903 239
Tooti Fruity Boise LMWEEM 27902 279
Joe Trump Tokyo KJMORS 18301 183
Donald Biden Cairo JWROSE 23401 234
Mike Harris Rome KXPEAR 23903 239
Kamala Pence Moscow LMWEEM 27902 279
Adolf Washington Bangkok KJMORS 18301 183
Query:
Select
b.client_name,
b.city,
a.user_id,
a.producer_id,
a.company_number
From Table 1 A
Left Join Table 2 B On a.company….
And this is where I don't know what to do, because the Company Number column in Table 2 is a mix of User IDs, Producer IDs and Company Numbers. However, we know which Company Number each of those IDs is associated with.
As I mention in the comments, and others do, the real problem is your design. "The fact that UserID is clearly a varchar, while the other 2 columns are an int really does not make this any better", and makes this not simple (and certainly not SARGable).
To get the data in the correct order, as well, you need a column to order it on which the data lacks. I have therefore added a pseudo column, MissingIDColumn, to represent this missing column you need to add to your data; which you can do when you fix the design:
SELECT T2.ClientName,
T2.City,
T1.UserID,
T1.ProducerID,
T1.CompanyNumber
FROM (VALUES('JWROSE',23401,234),
('KXPEAR',23903,239),
('LMWEEM',27902,279),
('KJMORS',18301,183))T1(UserID,ProducerID,CompanyNumber)
JOIN (VALUES(1,'Rajat Smith ','London ','JWROSE'),
(2,'Robert Singh ','Cleveland ','KXPEAR'),
(3,'Alberto Johnson ','New York City','LMWEEM'),
(4,'Betty Lee ','Dallas ','KJMORS'),
(5,'Chase Galvez ','Houston ','23401'),
(6,'Hassan Jackson ','Seattle ','23903'),
(7,'Tooti Fruity ','Boise ','27902'),
(8,'Joe Trump ','Tokyo ','18301'),
(9,'Donald Biden ','Cairo ','234'),
(10,'Mike Harris ','Rome ','239'),
(11,'Kamala Pence ','Moscow ','279'),
(12,'Adolf Washington','Bangkok ','183'))T2(MissingIDColumn,ClientName,City,CompanyNumber)
ON T2.CompanyNumber IN (T1.UserID,
                        CONVERT(varchar(6),T1.ProducerID),
                        CONVERT(varchar(6),T1.CompanyNumber))
ORDER BY MissingIDColumn;
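The `IN (...)` join condition amounts to "match Table 2's mixed column against any of the three Table 1 columns, compared as text". A minimal pandas sketch of that idea follows; `MixedKey` is a hypothetical name for Table 2's "Company Number" column, `key` and `lookup` are illustrative helpers, and the original row order of Table 2 stands in for the missing ID column.

```python
import pandas as pd

t1 = pd.DataFrame({
    "UserID":        ["JWROSE", "KXPEAR", "LMWEEM", "KJMORS"],
    "ProducerID":    [23401, 23903, 27902, 18301],
    "CompanyNumber": [234, 239, 279, 183],
})
t2 = pd.DataFrame({
    "ClientName": ["Rajat Smith", "Robert Singh", "Alberto Johnson", "Betty Lee",
                   "Chase Galvez", "Hassan Jackson", "Tooti Fruity", "Joe Trump",
                   "Donald Biden", "Mike Harris", "Kamala Pence", "Adolf Washington"],
    "City":       ["London", "Cleveland", "New York City", "Dallas",
                   "Houston", "Seattle", "Boise", "Tokyo",
                   "Cairo", "Rome", "Moscow", "Bangkok"],
    # Hypothetical name for the column that mixes all three identifiers
    "MixedKey":   ["JWROSE", "KXPEAR", "LMWEEM", "KJMORS",
                   "23401", "23903", "27902", "18301",
                   "234", "239", "279", "183"],
})

# One lookup row per (Table 1 row, identifier), with the identifier as text.
# This plays the role of the CONVERT(varchar(6), ...) calls in the SQL.
lookup = pd.concat(
    [t1.assign(key=t1[col].astype(str))
     for col in ["UserID", "ProducerID", "CompanyNumber"]],
    ignore_index=True,
)

# A left merge keeps Table 2's row order, so no extra sort is needed here
result = t2.merge(lookup, left_on="MixedKey", right_on="key", how="left")[
    ["ClientName", "City", "UserID", "ProducerID", "CompanyNumber"]
]
print(result)
```

This only works because no value appears in more than one of the three identifier columns; if a Producer ID could equal another row's Company Number, the lookup would return duplicate matches, which is one more argument for fixing the design.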

Pandas split string on matching substring from list

I have not been able to find an answer on how to split strings in rows whose substrings match values in a list (one that is not part of the dataframe). In other words, I need to strip out the substrings that match any value in a dynamic list from the rows of a Series. There are many answers on how to mark such rows as True/False, or how to split on a match against a static list, but I am stuck trying to combine both tasks in one. Any help will be greatly appreciated.
Example:
Series - Mr. John Doe, Ms. Jane Smith, Dr. Who, Dr. No, Doctor Doolittle, Mister X, Batman
List 1 - Dr., Doctor
Output - Mr. John Doe, Ms. Jane Smith, Who, No, Doolittle, Mister X, Batman
List 2 - Mr, Mister
Output - John Doe, Ms. Jane Smith, Dr. Who, Dr. No, Doctor Doolittle, X, Batman
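import re
import pandas as pd

s = pd.Series('Mr. John Doe, Ms. Jane Smith, Dr. Who, Dr. No, Doctor Doolittle, Mister X, Batman'.split(', '))
l = ['Dr. ', 'Doctor ']
# Escape each entry so '.' matches a literal dot, then join into one alternation.
# regex=True is required: recent pandas versions treat the pattern literally by default.
pattern = '|'.join(map(re.escape, l))
list(s.str.replace(pattern, '', regex=True))
Out:
['Mr. John Doe',
'Ms. Jane Smith',
'Who',
'No',
'Doolittle',
'Mister X',
'Batman']
l = ['Mr. ', 'Mister ']
pattern = '|'.join(map(re.escape, l))
list(s.str.replace(pattern, '', regex=True))
Out:
['John Doe',
'Ms. Jane Smith',
'Dr. Who',
'Dr. No',
'Doctor Doolittle',
'X',
'Batman']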

Finding user names with particular characters in their last name

In Oracle SQL, I want to find all users whose last name contains specific characters such as 'Z', 'X', 'D' or 'F', or any character from a range such as 'A' to 'F'.
I tried this:
SELECT user_last_name
FROM userbase
WHERE user_last_name LIKE '%A%' OR user_last_name LIKE '%F%';
How can I go about it?
I mentioned Facebook in the question, although it is not directly related, because it is the example my teacher gave.
UPDATE: The code above works if I chain many ORs, but how can I define a set or a range to search for, something like IN ('Z','X','D','F') or IN ('A-F')?
If you are using a recent version of Oracle, you should be able to use regular expressions
SQL> ed
Wrote file afiedt.buf
1 select first_name, last_name
2 from employees
3* where regexp_like( last_name, 'Z|X|D|[A-F]' )
SQL> /
FIRST_NAME LAST_NAME
-------------------- -------------------------
Ellen Abel
Sundar Ande
Mozhe Atkinson
David Austin
Hermann Baer
Shelli Baida
Amit Banda
Elizabeth Bates
Sarah Bell
David Bernstein
Laura Bissot
Harrison Bloom
Alexis Bull
Anthony Cabrio
Gerald Cambrault
Nanette Cambrault
John Chen
Kelly Chung
Karen Colmenares
Curtis Davies
Lex De Haan
Julia Dellinger
Jennifer Dilly
Louise Doran
Bruce Ernst
Alberto Errazuriz
Britney Everett
Daniel Faviet
Pat Fay
Kevin Feeney
Jean Fleaur
Tayler Fox
Adam Fripp
Samuel McCain
Allan McEwen
Donald OConnell
Eleni Zlotkey
37 rows selected.
If you want the search to be case-insensitive
SQL> ed
Wrote file afiedt.buf
1 select first_name, last_name
2 from employees
3* where regexp_like( last_name, 'Z|X|D|[A-F]', 'i' )
SQL> /
FIRST_NAME LAST_NAME
-------------------- -------------------------
Ellen Abel
Sundar Ande
Mozhe Atkinson
David Austin
Hermann Baer
Shelli Baida
Amit Banda
Elizabeth Bates
Sarah Bell
David Bernstein
Laura Bissot
Harrison Bloom
Alexis Bull
Anthony Cabrio
Gerald Cambrault
Nanette Cambrault
John Chen
Kelly Chung
Karen Colmenares
Curtis Davies
Lex De Haan
<<snip>>
Matthew Weiss
Jennifer Whalen
Eleni Zlotkey
93 rows selected.
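To see what the pattern matches without an Oracle session, here is a small Python sketch using the standard `re` module. The pattern has the same meaning in both regex dialects. The name list is illustrative: some names come from the output above, while "King" and "Hunold" are added purely as test cases. It also shows that 'Z|X|D|[A-F]' can be written as the single character class '[A-FXZ]', since D already falls inside A-F.

```python
import re

# Illustrative sample; not the full employees table
last_names = ["Abel", "Chen", "Weiss", "Zlotkey", "McCain", "King", "Hunold"]

original = re.compile(r"Z|X|D|[A-F]")   # pattern from the answer
compact = re.compile(r"[A-FXZ]")        # equivalent single character class

# Case-sensitive: only uppercase A-F, X, Z count
case_sensitive = [n for n in last_names if original.search(n)]
print(case_sensitive)

# Case-insensitive, like passing 'i' as the third argument of regexp_like
insensitive = re.compile(r"[A-FXZ]", re.IGNORECASE)
case_insensitive = [n for n in last_names if insensitive.search(n)]
print(case_insensitive)
```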
You can use INSTR to test whether characters exist in your field, like so:
SELECT user_last_name
FROM userbase
WHERE 0 < INSTR(user_last_name, 'A')
   OR 0 < INSTR(user_last_name, 'B')
   -- ...repeat for each character you want to test for...