SQL: how to "join" two tables

I can't think of a simple solution. I have two tables.
table 1 (about 300 rows)
id   name   time
ID1  peter  12:00:00
ID2  alice  12:33:00
ID3  tom    08:00:00
table 2 (about 3'000'000 rows)
id   time      arg1
ID1  12:00:00  23
ID1  11:00:00  34
ID2  12:45:00  21
ID2  12:33:00  22
ID2  08:00:00  12
ID3  08:00:00  21
ID1  08:00:00  23
I need an output table like this:
id   name   time      arg1
ID1  peter  12:00:00  23
ID2  alice  12:33:00  22
ID3  tom    08:00:00  21

Select t1.ID, t1.time, t1.name, t2.arg1 from table1 t1
Inner join table2 t2 on t2.id = t1.id
Where t1.time=t2.time
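
To check the join above against the sample rows, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the index name idx_table2_id_time are assumptions made for illustration; the original post does not say which database is used or which indexes exist.

```python
import sqlite3

# In-memory SQLite database, used only to illustrate the join (assumption:
# the asker's real database engine is unknown).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT, name TEXT, time TEXT)")
conn.execute("CREATE TABLE table2 (id TEXT, time TEXT, arg1 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("ID1", "peter", "12:00:00"), ("ID2", "alice", "12:33:00"), ("ID3", "tom", "08:00:00")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [
        ("ID1", "12:00:00", 23), ("ID1", "11:00:00", 34), ("ID2", "12:45:00", 21),
        ("ID2", "12:33:00", 22), ("ID2", "08:00:00", 12), ("ID3", "08:00:00", 21),
        ("ID1", "08:00:00", 23),
    ],
)
# Hypothetical composite index; with millions of rows in table2 it lets the
# engine look up each (id, time) pair instead of scanning the whole table.
conn.execute("CREATE INDEX idx_table2_id_time ON table2 (id, time)")

# Same logic as the answer, with both conditions written in the ON clause.
rows = conn.execute(
    """
    SELECT t1.id, t1.name, t1.time, t2.arg1
    FROM table1 t1
    INNER JOIN table2 t2
            ON t2.id = t1.id
           AND t2.time = t1.time
    ORDER BY t1.id
    """
).fetchall()

for row in rows:
    print(row)
```

For an inner join, putting the time comparison in ON or in WHERE gives the same rows; the ON form just keeps the whole match condition in one place.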

Related

SQL Query a Table grouped by some columns but taking the first row ordered by another column [duplicate]

Given the summary table below:
CardNumber  Name   LastName   LastEntrance
123         Name1  Lastname1  2021-12-01 18:00:00
123         Name2  Lastname2  2021-12-01 17:00:00
234         Name3  Lastname3  2021-12-01 10:00:00
234         Name5  Lastname5  2021-12-01 09:00:00
567         Name4  Lastname4  2021-12-01 16:00:00
I want one row per CardNumber, with the Name, LastName and LastEntrance of that card's most recent entrance. My result table should be:
CardNumber  Name   LastName   LastEntrance
123         Name1  Lastname1  2021-12-01 18:00:00
234         Name3  Lastname3  2021-12-01 10:00:00
567         Name4  Lastname4  2021-12-01 16:00:00
Could I query this table with a simple SQL query?
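I like the approach of joining the table to itself and matching each row against the rows for the same card that have a later LastEntrance. Rows with no later match (t2 is NULL) are the most recent ones, so only those are returned. It needs no subquery and is easy to read; whether it is faster depends on the database and its indexes. Join on CardNumber only: if Name and LastName were part of the join condition, each row would be compared only with rows of the same name, and every row of the sample would be returned.
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2
ON (t1.CardNumber = t2.CardNumber
AND t1.LastEntrance < t2.LastEntrance)
WHERE t2.CardNumber IS NULL;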
You don't need GROUP BY at all. Since you only want the most recent row, look up the latest LastEntrance for each card in a correlated subquery and use it to filter the main query:
Select t.CardNumber,t.Name,t.LastName,t.LastEntrance from tableX t
where t.LastEntrance= (select Max(a.LastEntrance) from tableX a where a.CardNumber=t.CardNumber)
order by t.CardNumber
Use a subquery to find the max(LastEntrance) per card to get the desired results:
select *
from cards cards1
where LastEntrance = (select Max(LastEntrance)
from cards cards2
where cards1.CardNumber = cards2.CardNumber)
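
The two techniques in these answers (a self anti-join and a correlated MAX subquery) can be compared on the sample data. This is a minimal sketch using Python's sqlite3 module; the table name cards and the in-memory database are assumptions for illustration, and the anti-join here matches on CardNumber only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cards (CardNumber INTEGER, Name TEXT, LastName TEXT, LastEntrance TEXT)"
)
conn.executemany(
    "INSERT INTO cards VALUES (?, ?, ?, ?)",
    [
        (123, "Name1", "Lastname1", "2021-12-01 18:00:00"),
        (123, "Name2", "Lastname2", "2021-12-01 17:00:00"),
        (234, "Name3", "Lastname3", "2021-12-01 10:00:00"),
        (234, "Name5", "Lastname5", "2021-12-01 09:00:00"),
        (567, "Name4", "Lastname4", "2021-12-01 16:00:00"),
    ],
)

# Anti-join: keep a row only if no row for the same card has a later entrance.
anti_join_rows = conn.execute(
    """
    SELECT t1.*
    FROM cards t1
    LEFT OUTER JOIN cards t2
           ON t1.CardNumber = t2.CardNumber
          AND t1.LastEntrance < t2.LastEntrance
    WHERE t2.CardNumber IS NULL
    ORDER BY t1.CardNumber
    """
).fetchall()

# Correlated subquery: keep a row if its entrance equals the card's maximum.
subquery_rows = conn.execute(
    """
    SELECT c1.*
    FROM cards c1
    WHERE c1.LastEntrance = (SELECT MAX(c2.LastEntrance)
                             FROM cards c2
                             WHERE c2.CardNumber = c1.CardNumber)
    ORDER BY c1.CardNumber
    """
).fetchall()

for row in anti_join_rows:
    print(row)
```

Both forms return more than one row for a card if two rows tie on the latest LastEntrance; a tie-breaker column would be needed to guarantee exactly one row per card.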

Date and value conversion

I'm inserting the IDs from table2 that do not exist in table1. Table1 has the correct data types, date (datetime) and Gen (varchar(1)), but table2 uses different types for the same columns: date (varchar(255)) and Gen (float), where 1 = M and 2 = F.
Here is the problem as a sample set.
Table1
ID date Gen
193 1996-03-26 00:00:00 M
446 1997-09-20 00:00:00 F
689 1997-02-21 00:00:00 F
612 1993-10-19 00:00:00 M
Table2
ID date Gen
123 1993-03-02 00:00:00 1
456 2019-10-19 11:50:13.913 2
689 1997-02-21 00:00:00 2
789 2019-11-04 08:06:36.71 1
012 2000-10-02 07:11:19 1
I need to append the new IDs to table1. In the insert query, how can I convert the date and Gen columns to table1's format?
Result:
Table1
ID date Gen
193 1996-03-26 00:00:00 M
446 1997-09-20 00:00:00 F
689 1997-02-21 00:00:00 F
612 1993-10-19 00:00:00 M
123 1993-03-02 00:00:00 M
456 2019-10-19 00:00:00 F
789 2019-11-04 00:00:00 M
012 2000-10-02 00:00:00 M
If you want to insert the rows in table2 that are not in table1, you can use insert with filtering logic. The case expression maps Gen 1 to 'M' and any other value to 'F':
insert into table1 (id, date, gen)
select t2.id, t2.date, (case when t2.gen = 1 then 'M' else 'F' end)
from table2 t2
where not exists (select 1 from table1 t1 where t2.id = t1.id);
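
The expected result in the question also drops the time of day from the date, which the insert above does not do by itself. The following is a minimal sketch of the same insert-where-not-exists pattern with an explicit truncation, using Python's sqlite3 module. SQLite stands in for the asker's database here, so the types are simplified to TEXT and REAL, and the substr-based truncation is an assumption about the desired format rather than the conversion function of any particular engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT, date TEXT, gen TEXT)")
conn.execute("CREATE TABLE table2 (id TEXT, date TEXT, gen REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("193", "1996-03-26 00:00:00", "M"),
        ("446", "1997-09-20 00:00:00", "F"),
        ("689", "1997-02-21 00:00:00", "F"),
        ("612", "1993-10-19 00:00:00", "M"),
    ],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [
        ("123", "1993-03-02 00:00:00", 1.0),
        ("456", "2019-10-19 11:50:13.913", 2.0),
        ("689", "1997-02-21 00:00:00", 2.0),
        ("789", "2019-11-04 08:06:36.71", 1.0),
        ("012", "2000-10-02 07:11:19", 1.0),
    ],
)

# Insert only the ids missing from table1. The first 10 characters of the
# text date are the YYYY-MM-DD part; midnight is appended to match table1.
conn.execute(
    """
    INSERT INTO table1 (id, date, gen)
    SELECT t2.id,
           substr(t2.date, 1, 10) || ' 00:00:00',
           CASE WHEN t2.gen = 1 THEN 'M' ELSE 'F' END
    FROM table2 t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 t1 WHERE t1.id = t2.id)
    """
)

new_rows = conn.execute(
    "SELECT id, date, gen FROM table1 "
    "WHERE id IN ('123', '456', '789', '012') ORDER BY id"
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]

for row in new_rows:
    print(row)
```

Id 689 exists in both tables, so it is skipped and table1 ends up with eight rows.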

T-SQL max date and min date between two date

First, thanks for your time and your help!
I have two tables:
Table 1
PersId name lastName city
---------------------------------------
1 John Smith Tirana
2 Leri Nice Tirana
3 Adam fortsan Tirana
Table 2
Id PersId salesDate
--------------------------------------------
1 1 2017-01-22 08:00:40 000
2 2 2017-01-22 09:00:00 000
3 1 2017-01-22 10:00:00 000
4 1 2017-01-22 20:00:00 000
5 3 2017-01-15 09:00:00 000
6 1 2017-01-21 09:00:00 000
7 1 2017-01-21 10:00:00 000
8 1 2017-01-21 18:55:00 000
For each person and each day, I would like to see the first and the most recent sale between two dates, together with the city. If there is no sale, the columns should be empty (null).
SalesDate > '2017-01-17 09:00:00 000'
and SalesDate < '2017-01-23 09:00:00 000'
The Table 2 row with Id = 5 is excluded because it is not in the specified date range.
I would like my results to look like this:
Id PersId MinSalesDate MaxSalesDate City
-----------------------------------------------------------------------------
1 1 2017-01-22 08:00:40 000 2017-01-22 20:00:00 000 Tirana
2 2 2017-01-22 09:00:00 000 null Tirana
3 3 null null Tirana
4 1 2017-01-21 09:00:00 000 2017-01-21 18:55:00 000 Tirana
You don't say how the Id in the result is derived; it looks like you just want Row_Number(). I will leave that out, but this should get you started. You may have to work out conversion issues in the date range check, and I haven't tested the query, so check it before relying on it.
Select T1.PersId, T1.City
, Min(T2.salesDate) MinSalesDate
, Max(T2.salesDate) MaxSalesDate
From Table1 T1
Left Join Table2 T2
On T1.PersId = T2.PersId
And T2.salesDate Between '2017-01-17 09:00:00 000' And '2017-01-23 09:00:00 000'
Group By T1.PersId, T1.City
Try the following, using row_number to get the min and max sale dates:
SELECT
    T2.Id, T1.PersId, T2.MIN_salesDate, T2.MAX_salesDate, T1.City
FROM Table1 T1
LEFT JOIN
(
    SELECT MIN(Id) as Id, PersId, MIN(salesDate) as MIN_salesDate, MAX(salesDate) as MAX_salesDate
    FROM
    (
        SELECT
            *
            ,ROW_NUMBER() OVER (PARTITION BY PersId ORDER BY salesDate ASC) as RNKMIN
            ,ROW_NUMBER() OVER (PARTITION BY PersId ORDER BY salesDate DESC) as RNKMAX
        FROM Table2 T2
        WHERE salesDate Between '2017-01-17 09:00:00 000' And '2017-01-23 09:00:00 000'
    ) temp
    WHERE RNKMIN = 1 or RNKMAX = 1
    GROUP BY PersId
) T2
on T1.PersId = T2.PersId
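
The expected output in the question has one row per person per day, with a null MaxSalesDate when a day has a single sale. Neither query above groups by day, so here is a minimal sketch of that variant using Python's sqlite3 module. SQLite is only a stand-in for T-SQL; the timestamps are written as 'YYYY-MM-DD HH:MM:SS' without the trailing ' 000', and grouping on the first 10 characters of the timestamp is an assumption about how the day should be derived.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (PersId INTEGER, name TEXT, lastName TEXT, city TEXT)")
conn.execute("CREATE TABLE table2 (Id INTEGER, PersId INTEGER, salesDate TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, "John", "Smith", "Tirana"), (2, "Leri", "Nice", "Tirana"), (3, "Adam", "fortsan", "Tirana")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [
        (1, 1, "2017-01-22 08:00:40"), (2, 2, "2017-01-22 09:00:00"),
        (3, 1, "2017-01-22 10:00:00"), (4, 1, "2017-01-22 20:00:00"),
        (5, 3, "2017-01-15 09:00:00"), (6, 1, "2017-01-21 09:00:00"),
        (7, 1, "2017-01-21 10:00:00"), (8, 1, "2017-01-21 18:55:00"),
    ],
)

# The date range sits in the ON clause so that people without a sale in the
# range still appear, with NULLs, thanks to the LEFT JOIN.
rows = conn.execute(
    """
    SELECT t1.PersId,
           MIN(t2.salesDate) AS MinSalesDate,
           CASE WHEN COUNT(t2.Id) > 1 THEN MAX(t2.salesDate) END AS MaxSalesDate,
           t1.city
    FROM table1 t1
    LEFT JOIN table2 t2
           ON t2.PersId = t1.PersId
          AND t2.salesDate > ?
          AND t2.salesDate < ?
    GROUP BY t1.PersId, t1.city, substr(t2.salesDate, 1, 10)
    ORDER BY t1.PersId, MinSalesDate
    """,
    ("2017-01-17 09:00:00", "2017-01-23 09:00:00"),
).fetchall()

for row in rows:
    print(row)
```

The string comparisons work here because every timestamp uses the same fixed-width format; with a real datetime column the comparison would be done on the native type instead.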

Replace Null with Previous Known Row Value in the Same Column in Pig/Hive

I have one data set with null values in one column:
price time id
1 12:00:00 id1
10 12:00:00 id2
NULL 12:05:00 id1
NULL 12:05:00 id2
NULL 12:10:00 id2
2 12:10:00 id1
3 12:15:00 id1
NULL 12:20:00 id1
NULL 12:25:00 id1
4 12:30:00 id1
For each id, I want to replace the null prices with the previous known price (ordered by time), in Pig or Hive.
So, the output should be:
price time id
1 12:00:00 id1
10 12:00:00 id2
**1** 12:05:00 id1
**10** 12:05:00 id2
**10** 12:10:00 id2
2 12:10:00 id1
3 12:15:00 id1
**3** 12:20:00 id1
**3** 12:25:00 id1
4 12:30:00 id1
Many Thanks in advance.
Edit: This is what I am running in hive:
Select price,time, id,last_value(price,true) over (partition by id order by time) as LatestPrice from table;
It works fine for a small set (thousands of rows), but for a larger set (24 M rows) the job has still been running for a day after the mappers and reducers reached 100%. Any suggestions?
You can try something like this.
select notNullTmp.price, tmp.time, tmp.id
from
(
-- for each null-price row, the latest earlier time that has a known price
select b.id as id, b.time as time, max(a.time) as prev_time
FROM
(
select price, time, id
from table
where price is NOT NULL
) a
JOIN
(
select price, time, id
from table
where price is NULL
) b
ON (a.id = b.id)
where a.time < b.time
group by b.id, b.time
) tmp
JOIN
(
select price, time, id
from table
where price is NOT NULL
) notNullTmp
ON (tmp.id = notNullTmp.id AND tmp.prev_time = notNullTmp.time)
UNION ALL
select price, time, id
from table
where price is NOT NULL;
The idea is to separate the null-price records from the non-null ones and, for each null-price record, look up the latest earlier non-null record with the same id. After picking up a price for each null-price record this way, we union the result with the non-null price data.
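
The fill-forward idea described here (find the latest earlier non-null row per id, then join back for its price) can be tried on the question's sample data. This is a minimal sketch using Python's sqlite3 module rather than Hive, so it only illustrates the logic, not Hive's performance on 24 M rows; the table name prices is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (price INTEGER, time TEXT, id TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (1, "12:00:00", "id1"), (10, "12:00:00", "id2"),
        (None, "12:05:00", "id1"), (None, "12:05:00", "id2"),
        (None, "12:10:00", "id2"), (2, "12:10:00", "id1"),
        (3, "12:15:00", "id1"), (None, "12:20:00", "id1"),
        (None, "12:25:00", "id1"), (4, "12:30:00", "id1"),
    ],
)

filled = conn.execute(
    """
    -- Null rows: take the price from the latest earlier non-null row of the same id.
    SELECT f.price, n.time, n.id
    FROM (SELECT b.id, b.time, MAX(a.time) AS prev_time
          FROM prices a
          JOIN prices b ON a.id = b.id
          WHERE a.price IS NOT NULL
            AND b.price IS NULL
            AND a.time < b.time
          GROUP BY b.id, b.time) n
    JOIN prices f
      ON f.id = n.id AND f.time = n.prev_time
    UNION ALL
    -- Non-null rows pass through unchanged.
    SELECT price, time, id
    FROM prices
    WHERE price IS NOT NULL
    ORDER BY 3, 2
    """
).fetchall()

for row in filled:
    print(row)
```

A null row with no earlier non-null row for its id is dropped by the inner join; a LEFT JOIN would keep it with a null price if that case matters.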

Compare with Dates and ID?

Using SQL Server 2000
I want to get Table2.TimeIn and Table2.TimeOut for each Table1.ID, but if Table1.Date = Table3.Date for the same ID, the query should take Table3.TimeIn and Table3.TimeOut instead.
3 Tables
Table1
ID Date
001 20090503
001 20090504
001 20090506
002 20090505
002 20090506
and so on.
Table2
ID TimeIn TimeOut
001 08:00:00 18:00:00
002 08:00:00 21:00:00
and so on.
Table3
ID Date TimeIn TimeOut
001 20090504 10:00:00 22:00:00
001 20090505 17:00:00 23:00:00
002 20090505 12:00:00 21:00:00
and so on.
Select Table1.ID,
Table1.Date,
Table2.TimeIn,
Table2.TimeOut
from Table1
Inner Join Table2 on Table1.ID = Table2.ID
If Table1.Date = Table3.Date, the query should take Table3.TimeIn and Table3.TimeOut; otherwise it should take Table2.TimeIn and Table2.TimeOut.
Expected Output
ID Date TimeIn TimeOut
001 20090503 08:00:00 18:00:00
001 20090504 10:00:00 22:00:00
001 20090506 08:00:00 18:00:00
002 20090505 12:00:00 21:00:00
002 20090506 08:00:00 21:00:00
and so on.
How to write a query for this condition?
Use the employee's default schedule as a fallback when there is no dated entry:
SELECT Table1.ID
,Table1.Date
,COALESCE(Table3.TimeIn, Table2.TimeIn) AS TimeIn
,COALESCE(Table3.TimeOut, Table2.TimeOut) AS TimeOut
FROM Table1
INNER JOIN Table2 -- Always have an expected schedule for an employee
ON Table1.ID = Table2.ID
LEFT JOIN Table3 -- May or may not have an actual schedule for an employee
ON Table3.ID = Table1.ID
AND Table3.Date = Table1.Date
/*
ORDER BY Table1.ID
,Table1.Date
*/
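
The COALESCE fallback above can be checked against the question's sample rows. This is a minimal sketch using Python's sqlite3 module; SQLite stands in for SQL Server 2000 here, and the ORDER BY is enabled so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID TEXT, Date TEXT)")
conn.execute("CREATE TABLE Table2 (ID TEXT, TimeIn TEXT, TimeOut TEXT)")
conn.execute("CREATE TABLE Table3 (ID TEXT, Date TEXT, TimeIn TEXT, TimeOut TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("001", "20090503"), ("001", "20090504"), ("001", "20090506"),
     ("002", "20090505"), ("002", "20090506")],
)
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?, ?)",
    [("001", "08:00:00", "18:00:00"), ("002", "08:00:00", "21:00:00")],
)
conn.executemany(
    "INSERT INTO Table3 VALUES (?, ?, ?, ?)",
    [("001", "20090504", "10:00:00", "22:00:00"),
     ("001", "20090505", "17:00:00", "23:00:00"),
     ("002", "20090505", "12:00:00", "21:00:00")],
)

# COALESCE returns the dated Table3 value when the LEFT JOIN found one,
# and falls back to the default Table2 value otherwise.
schedule = conn.execute(
    """
    SELECT Table1.ID,
           Table1.Date,
           COALESCE(Table3.TimeIn, Table2.TimeIn)   AS TimeIn,
           COALESCE(Table3.TimeOut, Table2.TimeOut) AS TimeOut
    FROM Table1
    INNER JOIN Table2
            ON Table1.ID = Table2.ID
    LEFT JOIN Table3
           ON Table3.ID = Table1.ID
          AND Table3.Date = Table1.Date
    ORDER BY Table1.ID, Table1.Date
    """
).fetchall()

for row in schedule:
    print(row)
```

The Table3 row for 001 on 20090505 has no matching Table1 row, so it does not appear in the result.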