An alternative to self-join - sql

I have a table that has a reference to itself like this:
Id | Total | Prev | Session
1 | 10 | NULL | 1
2 | 15 | 1 | 1
3 | 11 | NULL | 2
4 | 29 | 2 | 1
5 | 19 | 3 | 2
6 | 47 | 4 | 1
And I need to get the differences for the specific sessions.
Like this for session 1:
1. 10 -- None to 10
2. 5 -- 10 to 15
3. 14 -- 15 to 29
4. 18 -- 29 to 47
To do this, I use:
SELECT F.Total - P.Total AS Difference
FROM Foo F LEFT OUTER JOIN
Foo P ON F.Prev = P.Id
WHERE F.Session = @Session
Which is extremely slow.
How can I retrieve these differences faster without altering the table?

You cannot: the self-join is already the most direct query for this. It may become a lot faster, though, if you add an index on Session, Prev and Id.
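As a rough sketch of the index suggestion, the following uses Python's built-in sqlite3 module with an in-memory copy of the sample table. SQLite and the index name ix_foo_session_prev are assumptions made for illustration; the question does not name the database engine, and its query planner may behave differently.

```python
import sqlite3

SAMPLE_ROWS = [
    (1, 10, None, 1),
    (2, 15, 1, 1),
    (3, 11, None, 2),
    (4, 29, 2, 1),
    (5, 19, 3, 2),
    (6, 47, 4, 1),
]


def build_db():
    """Create an in-memory Foo table holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Foo (Id INTEGER PRIMARY KEY, Total INTEGER, "
        "Prev INTEGER, Session INTEGER)"
    )
    conn.executemany("INSERT INTO Foo VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    # Hypothetical index name; Id is already covered by the primary key.
    conn.execute("CREATE INDEX ix_foo_session_prev ON Foo (Session, Prev)")
    return conn


def session_differences(conn, session):
    """Return Total minus the previous row's Total for every row of one session."""
    rows = conn.execute(
        """
        SELECT F.Total - P.Total AS Difference
        FROM Foo F
        LEFT OUTER JOIN Foo P ON F.Prev = P.Id
        WHERE F.Session = ?
        ORDER BY F.Id
        """,
        (session,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_db()
print(session_differences(conn, 1))
```

The first row of each session comes back as NULL (None in Python) because it has no previous row; wrapping P.Total in COALESCE(P.Total, 0) would return the row's own Total instead.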

Related

Select all the records in the first table that match each of the records in the second

I'm working with an Access database and have two tables:
ID_1 | Number | Some other data
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 4 | Data
5 | 3 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 1 | Data
10 | 1 | Data
11 | 2 | Data
12 | 3 | Data
13 | 4 | Data
14 | 1 | Data
15 | 2 | Data
16 | 3 | Data
17 | 4 | Data
18 | 3 | Data
19 | 3 | Data
ID_2 | Number | Some other data
1 | 3 | Data
2 | 1 | Data
3 | 2 | Data
4 | 3 | Data
5 | 2 | Data
As you see, both tables have duplicate data. I need a query that selects all the records in the first table that match any of the records in the second; the tables are related by the Number field. The records must also not be repeated (that is, the query must not return the same row more than once). For the given example I should get this result:
ID | ID_1 | Number | Some other data
1 | 3 | 3 | Data
2 | 5 | 3 | Data
3 | 8 | 3 | Data
4 | 12 | 3 | Data
5 | 16 | 3 | Data
6 | 18 | 3 | Data
7 | 19 | 3 | Data
8 | 1 | 1 | Data
9 | 6 | 1 | Data
10 | 9 | 1 | Data
11 | 10 | 1 | Data
12 | 14 | 1 | Data
13 | 2 | 2 | Data
14 | 7 | 2 | Data
15 | 11 | 2 | Data
16 | 15 | 2 | Data
I was thinking that maybe I could use Join, but I still don't know how; tried Where, but also didn't find a use for it. Could you please help me with that?
I don't see where you're generating your output ID field from - or where you're picking your Data field from so here's the best guess.
SELECT Table1.ID_1, Table1.Number, Table1.[Some other data]
FROM Table1
WHERE (Table1.Number In (SELECT Number From Table2))
ORDER BY Table1.Number, Table1.ID_1;
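To check the IN (subquery) approach against the sample data, here is a small sketch using Python's sqlite3 module instead of Access. The "Some other data" column is left out, and the auto-generated keys standing in for ID_1 and ID_2 are an assumption of this example.

```python
import sqlite3

# Number columns of the two sample tables, in ID order.
TABLE1_NUMBERS = [1, 2, 3, 4, 3, 1, 2, 3, 1, 1, 2, 3, 4, 1, 2, 3, 4, 3, 3]
TABLE2_NUMBERS = [3, 1, 2, 3, 2]


def matching_rows():
    """Return (ID_1, Number) for Table1 rows whose Number appears in Table2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (ID_1 INTEGER PRIMARY KEY, Number INTEGER)")
    conn.execute("CREATE TABLE Table2 (ID_2 INTEGER PRIMARY KEY, Number INTEGER)")
    conn.executemany(
        "INSERT INTO Table1 (Number) VALUES (?)", [(n,) for n in TABLE1_NUMBERS]
    )
    conn.executemany(
        "INSERT INTO Table2 (Number) VALUES (?)", [(n,) for n in TABLE2_NUMBERS]
    )
    # IN tests membership only, so duplicates in Table2 do not multiply rows.
    return conn.execute(
        """
        SELECT ID_1, Number
        FROM Table1
        WHERE Number IN (SELECT Number FROM Table2)
        ORDER BY Number, ID_1
        """
    ).fetchall()


for row in matching_rows():
    print(row)
```

A plain inner join on Number would return a Table1 row once per matching Table2 row, which is why the membership test is the simpler fit here.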
Looks like this:
MySql DB data structure
create table tbl1(ID_1 serial, Number int);
create table tbl2(ID_2 serial, Number int);
insert into tbl1(Number) values (1),(2),(3),(4),(3),(1),(2),(3),(1),(1),(2),(3),(4),(1),(2),(3),(4),(3),(3);
insert into tbl2(Number) values (3),(1),(2),(3),(2);
Query: the CTE s removes the duplicates from tbl2.
The window function count(tbl1.Number) OVER(PARTITION BY tbl1.Number) sorts the result by the number of matched rows per Number.
The @rownum variable is needed to number the rows.
with s as (select distinct Number from tbl2),
f as (select ID_1, tbl1.Number from tbl1 left join s on
(tbl1.Number=s.Number) where s.Number is not null order by
count(tbl1.Number) OVER(PARTITION BY tbl1.Number) desc)
select @rownum := @rownum + 1 AS ID, ID_1, Number from f, (SELECT @rownum := 0) r;
results
+------+------+--------+
| ID | ID_1 | Number |
+------+------+--------+
| 1 | 3 | 3 |
| 2 | 5 | 3 |
| 3 | 8 | 3 |
| 4 | 12 | 3 |
| 5 | 16 | 3 |
| 6 | 18 | 3 |
| 7 | 19 | 3 |
| 8 | 1 | 1 |
| 9 | 6 | 1 |
| 10 | 9 | 1 |
| 11 | 10 | 1 |
| 12 | 14 | 1 |
| 13 | 2 | 2 |
| 14 | 7 | 2 |
| 15 | 11 | 2 |
| 16 | 15 | 2 |
+------+------+--------+
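The same ordering can be sketched without user variables or window functions. This is a swapped-in variant, not the query above: it joins each row to a per-Number count computed with GROUP BY, and numbers the rows in Python. SQLite (through the sqlite3 module) and the helper name numbered_matches are assumptions of this example.

```python
import sqlite3

TBL1_NUMBERS = [1, 2, 3, 4, 3, 1, 2, 3, 1, 1, 2, 3, 4, 1, 2, 3, 4, 3, 3]
TBL2_NUMBERS = [3, 1, 2, 3, 2]


def numbered_matches(tbl1_numbers, tbl2_numbers):
    """Return (ID, ID_1, Number) rows, most frequent Number first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl1 (ID_1 INTEGER PRIMARY KEY, Number INTEGER)")
    conn.execute("CREATE TABLE tbl2 (ID_2 INTEGER PRIMARY KEY, Number INTEGER)")
    conn.executemany(
        "INSERT INTO tbl1 (Number) VALUES (?)", [(n,) for n in tbl1_numbers]
    )
    conn.executemany(
        "INSERT INTO tbl2 (Number) VALUES (?)", [(n,) for n in tbl2_numbers]
    )
    rows = conn.execute(
        """
        SELECT t.ID_1, t.Number
        FROM tbl1 t
        JOIN (SELECT Number, COUNT(*) AS cnt
              FROM tbl1
              WHERE Number IN (SELECT Number FROM tbl2)
              GROUP BY Number) c ON c.Number = t.Number
        ORDER BY c.cnt DESC, t.Number, t.ID_1
        """
    ).fetchall()
    # Number the rows client-side instead of using a session variable.
    return [(i, id_1, number) for i, (id_1, number) in enumerate(rows, start=1)]


result = numbered_matches(TBL1_NUMBERS, TBL2_NUMBERS)
for row in result:
    print(row)
```

Putting the ORDER BY on the outermost query makes the row order explicit, rather than relying on the order produced inside a CTE.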

full outer join in redshift

I have 2 tables A and B with columns, containing some details of students (all columns are integer):
A:
st_id,
st_subject_id,
B:
st_id,
st_subject_id,
st_count1,
st_count2
st_id means student id, st_subject_id is subject id.
For student id 15, there are following entries:
A:
15 | 1
15 | 2
15 | 3
B:
15 | 1 | 31 | 11
15 | 2 | 30 | 14
15 | 4 | 21 | 6
15 | 5 | 26 | 9
There are 3 subjects in table A and 4 subjects in table B (2 matching table A and 2 extra).
I want to display the final result as:
15 | 1 | 31 | 11
15 | 2 | 30 | 14
15 | 3 | null | null
15 | 4 | 21 | 6
15 | 5 | 26 | 9
Can this be done using full outer join in SQL, or by another method?
I think something like this would suffice, but I can't test it right now.
COALESCE returns its first non-null argument, so st_id and st_subject_id are taken from whichever table has the row.
select
coalesce(A.st_id, B.st_id) st_id,
coalesce(A.st_subject_id, B.st_subject_id) st_subject_id,
B.st_count1,
B.st_count2
from A
full outer join B
on A.st_id = B.st_id and A.st_subject_id = B.st_subject_id
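As for "another method", the sketch below emulates the full outer join with a LEFT JOIN plus a UNION ALL of the rows that exist only in B. It runs on SQLite through Python's sqlite3 module, which is an assumption made so that the example is self-contained; it is not Redshift, and the function name full_outer_emulated is hypothetical.

```python
import sqlite3

A_ROWS = [(15, 1), (15, 2), (15, 3)]
B_ROWS = [(15, 1, 31, 11), (15, 2, 30, 14), (15, 4, 21, 6), (15, 5, 26, 9)]


def full_outer_emulated():
    """Return every (st_id, st_subject_id) from A or B with B's counts, if any."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (st_id INTEGER, st_subject_id INTEGER)")
    conn.execute(
        "CREATE TABLE B (st_id INTEGER, st_subject_id INTEGER, "
        "st_count1 INTEGER, st_count2 INTEGER)"
    )
    conn.executemany("INSERT INTO A VALUES (?, ?)", A_ROWS)
    conn.executemany("INSERT INTO B VALUES (?, ?, ?, ?)", B_ROWS)
    return conn.execute(
        """
        -- All rows of A, with counts where B has a match.
        SELECT A.st_id, A.st_subject_id, B.st_count1, B.st_count2
        FROM A
        LEFT JOIN B
          ON A.st_id = B.st_id AND A.st_subject_id = B.st_subject_id
        UNION ALL
        -- Rows of B that have no partner in A.
        SELECT B.st_id, B.st_subject_id, B.st_count1, B.st_count2
        FROM B
        LEFT JOIN A
          ON A.st_id = B.st_id AND A.st_subject_id = B.st_subject_id
        WHERE A.st_id IS NULL
        ORDER BY 1, 2
        """
    ).fetchall()


for row in full_outer_emulated():
    print(row)
```

Where FULL OUTER JOIN is available, the COALESCE version is shorter; the emulation is mainly useful on engines that lack it.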

How to use outer join, conditionally on a value in joined table [duplicate]

I have a table. It has a pk of id and an index of [service, check, datetime].
id service check datetime score
---|-------|-------|----------|-----
1 | 1 | 4 |4/03/2009 | 399
2 | 2 | 4 |4/03/2009 | 522
3 | 1 | 5 |4/03/2009 | 244
4 | 2 | 5 |4/03/2009 | 555
5 | 1 | 4 |4/04/2009 | 111
6 | 2 | 4 |4/04/2009 | 322
7 | 1 | 5 |4/05/2009 | 455
8 | 2 | 5 |4/05/2009 | 675
Given a service 2 I need to select the rows for each unique check where it has the max date. So my result would look like this table.
id service check datetime score
---|-------|-------|----------|-----
6 | 2 | 4 |4/04/2009 | 322
8 | 2 | 5 |4/05/2009 | 675
Is there a short query for this? The best I have is this, but it returns too many checks. I just need each unique check at its latest datetime.
SELECT * FROM table where service=?;
First you need to find the latest date for each check:
SELECT `check`, MAX(`datetime`)
FROM YourTable
WHERE `service` = 2
GROUP BY `check`
Then join back to get the rest of the data.
SELECT Y.*
FROM YourTable Y
JOIN ( SELECT `check`, MAX(`datetime`) as m_date
FROM YourTable
WHERE `service` = 2
GROUP BY `check`) as `filter`
ON Y.`check` = `filter`.`check`
AND Y.`datetime` = `filter`.m_date
WHERE Y.`service` = 2
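Here is a minimal runnable sketch of the two-step approach on the sample rows, using Python's sqlite3 module. Several details are assumptions of the example: SQLite stands in for the original database, the dates are stored as ISO strings so that MAX compares them chronologically, identifiers are quoted with double quotes, and the subquery alias is latest.

```python
import sqlite3

SAMPLE_ROWS = [
    (1, 1, 4, "2009-04-03", 399),
    (2, 2, 4, "2009-04-03", 522),
    (3, 1, 5, "2009-04-03", 244),
    (4, 2, 5, "2009-04-03", 555),
    (5, 1, 4, "2009-04-04", 111),
    (6, 2, 4, "2009-04-04", 322),
    (7, 1, 5, "2009-04-05", 455),
    (8, 2, 5, "2009-04-05", 675),
]


def latest_per_check(service):
    """Return the row with the latest datetime for each check of one service."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE YourTable (id INTEGER PRIMARY KEY, service INTEGER, '
        '"check" INTEGER, "datetime" TEXT, score INTEGER)'
    )
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn.execute(
        """
        SELECT Y.id, Y.service, Y."check", Y."datetime", Y.score
        FROM YourTable Y
        JOIN (SELECT "check", MAX("datetime") AS m_date
              FROM YourTable
              WHERE service = ?
              GROUP BY "check") AS latest
          ON Y."check" = latest."check"
         AND Y."datetime" = latest.m_date
        WHERE Y.service = ?
        ORDER BY Y.id
        """,
        (service, service),
    ).fetchall()


for row in latest_per_check(2):
    print(row)
```

The outer WHERE on service matters: without it, a row from another service that happens to share the same check and datetime would also be returned.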

Sum and distinct in Access SQL

I already have a query that returns this result:
7 | 3
8 | 4
8 | 2
8 | 1
10 | 3
12 | 4
12 | 1
13 | 3
I need a new query that takes this result and returns this:
7 | 3
8 | **7**
10 | 3
12 | **5**
13 | 3
In the left column, every number should appear only once,
and the right column should hold the sum of the right-hand values for that left-hand number, as shown above.
How can I do this?
SELECT leftField, SUM(rightField) as rightField
FROM YourResult
GROUP BY leftField
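A quick way to confirm the GROUP BY on the sample rows is the sketch below, which uses Python's sqlite3 module rather than Access. The table name YourResult follows the answer; leftField and rightField are placeholder column names, since the question does not give the real ones.

```python
import sqlite3

SAMPLE_ROWS = [(7, 3), (8, 4), (8, 2), (8, 1), (10, 3), (12, 4), (12, 1), (13, 3)]


def summed_by_left():
    """Return one row per leftField value with the sum of its rightField values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourResult (leftField INTEGER, rightField INTEGER)")
    conn.executemany("INSERT INTO YourResult VALUES (?, ?)", SAMPLE_ROWS)
    return conn.execute(
        """
        SELECT leftField, SUM(rightField) AS rightTotal
        FROM YourResult
        GROUP BY leftField
        ORDER BY leftField
        """
    ).fetchall()


print(summed_by_left())
```

In Access, the earlier query can take the place of YourResult, either as a saved query or as a subquery in the FROM clause.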

In hive, how to do a calculation among 2 rows?

I have this table.
+------------------------------------------------------------+
| ks | time | val1 | val2 |
+-------------+---------------+---------------+--------------+
| A | 1 | 1 | 1 |
| B | 1 | 3 | 5 |
| A | 2 | 6 | 7 |
| B | 2 | 10 | 12 |
| A | 4 | 6 | 7 |
| B | 4 | 20 | 26 |
+------------------------------------------------------------+
What I want to get is for each row,
ks | time | val1 | val1 of next ts of same ks |
To be clear, result of above example should be,
+------------------------------------------------------------+
| ks | time | val1 | next.val1 |
+-------------+---------------+---------------+--------------+
| A | 1 | 1 | 6 |
| B | 1 | 3 | 10 |
| A | 2 | 6 | 6 |
| B | 2 | 10 | 20 |
| A | 4 | 6 | null |
| B | 4 | 20 | null |
+------------------------------------------------------------+
(I need the same next for value2 as well)
I tried a lot to come up with a hive query for this, but still no luck. I was able to write a query for this in sql as mentioned here (Quassnoi's answer), but couldn't create the equivalent in hive because hive doesn't support subqueries in select.
Can someone please help me achieve this?
Thanks in advance.
EDIT:
Query I tried was,
SELECT ks, time, val1, next[0] as next.val1 from
(SELECT ks, time, val1
COALESCE(
(
SELECT Val1, time
FROM myTable mi
WHERE mi.val1 > m.val1 AND mi.ks = m.ks
ORDER BY time
LIMIT 1
), CAST(0 AS BIGINT)) AS next
FROM myTable m
ORDER BY time) t2;
Your query seems quite similar to the "year ago" reporting that is ubiquitous in financial reporting. I think a LEFT OUTER JOIN is what you are looking for.
We join table myTable to itself, naming the two instances of the same table m and n. For every entry in the first table m we will attempt to find a matching record in n with the same ks value but an incremented value of time. If this record does not exist, all column values for n will be NULL.
SELECT
m.ks,
m.time,
m.val1,
n.val1 as next_val1,
m.val2,
n.val2 as next_val2
FROM
myTable m
LEFT OUTER JOIN
myTable n
ON (
m.ks = n.ks
AND
m.time + 1 = n.time
);
For sample data whose time values are consecutive (1, 2, 3) within each ks, this returns the following.
ks time val1 next_val1 val2 next_val2
A 1 1 6 1 7
A 2 6 6 7 7
A 3 6 NULL 7 NULL
B 1 3 10 5 12
B 2 10 20 12 26
B 3 20 NULL 26 NULL
Hope that helps.
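The join can be tried out on a small in-memory table. The sketch below uses SQLite through Python's sqlite3 module, which is an assumption made for runnability and is not Hive; the function name next_by_self_join is hypothetical. It also shows what happens when the time values have a gap, as in the question's data (1, 2, 4).

```python
import sqlite3

CONSECUTIVE = [
    ("A", 1, 1, 1), ("B", 1, 3, 5),
    ("A", 2, 6, 7), ("B", 2, 10, 12),
    ("A", 3, 6, 7), ("B", 3, 20, 26),
]
# Same values, but with the gap in time used in the question (1, 2, 4).
WITH_GAP = [(ks, 4 if t == 3 else t, v1, v2) for ks, t, v1, v2 in CONSECUTIVE]


def next_by_self_join(rows):
    """Pair each row with the row of the same ks whose time is exactly one larger."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (ks TEXT, time INTEGER, val1 INTEGER, val2 INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)
    return conn.execute(
        """
        SELECT m.ks, m.time, m.val1, n.val1 AS next_val1,
               m.val2, n.val2 AS next_val2
        FROM myTable m
        LEFT OUTER JOIN myTable n
          ON m.ks = n.ks AND m.time + 1 = n.time
        ORDER BY m.ks, m.time
        """
    ).fetchall()


for row in next_by_self_join(CONSECUTIVE):
    print(row)
```

With WITH_GAP as input, the rows at time 2 get NULL for the next values, because no row has time 3; the equality join only finds the next row when time increases by exactly one.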
I find that using Hive custom map/reduce functionality works great to solve queries similar to this. It gives you the opportunity to consider a set of input and "reduce" to one (or more) results.
This answer discusses the solution.
The key is to use CLUSTER BY so that all rows with the same key value go to the same reducer, and hence to the same reduce script. The script collects rows for the current key, outputs the reduced results when the key changes, and then starts collecting for the new key.
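A reduce script along these lines could look like the sketch below. It is only an illustration under stated assumptions: the rows arrive as tab-separated lines (ks, time, val1, val2) already grouped by ks, NULL is written as \N, and the function name next_values is hypothetical. The Hive side that would feed the script is not shown.

```python
import sys
from itertools import groupby

NULL = "\\N"  # assumed text representation of NULL in the script's output


def next_values(lines):
    """Yield ks, time, val1, next_val1, val2, next_val2 for each input line.

    Lines must be tab-separated and grouped by ks, which is what clustering
    by ks provides; rows within a group are sorted by time here.
    """
    parsed = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for _ks, group in groupby(parsed, key=lambda fields: fields[0]):
        rows = sorted(group, key=lambda fields: int(fields[1]))
        # Pair every row with its successor; the last row has none.
        for current, following in zip(rows, rows[1:] + [None]):
            next_val1 = following[2] if following else NULL
            next_val2 = following[3] if following else NULL
            ks, time, val1, val2 = current
            yield "\t".join([ks, time, val1, next_val1, val2, next_val2])


if __name__ == "__main__":
    # When run as a script, read rows from stdin and write results to stdout.
    for out_line in next_values(sys.stdin):
        print(out_line)
```

Because the successor is taken from the sorted group rather than from time + 1, gaps in the time values (1, 2, 4) do not matter here.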