LIMIT the return of non-unique values - sql

I have two tables. Posts and Replies. Think of posts as a blog entry while replies are the comments.
I want to display X number of posts and then the latest three comments for each of the posts.
My replies has a foreign key "post_id" which matches the "id" of every post.
I am trying to create a main page that has something along the lines of
Post
--Reply
--Reply
--Reply
Post
--Reply
so on and so fourth. I can accomplish this by using a for loop in my template and discarding the unneeded replies but I hate grabbing data from a db I won't use. Any ideas?

This is actually a pretty interesting question.
HA HA DISREGARD THIS, I SUCK
On edit: this answer works, but on MySQL it becomes tediously slow when the number of parent rows is as few as 100. However, see below for a performant fix.
Obviously, you can run this query once per post: select * from comments where id = $id limit 3 That creates a lot of overhead, as you end up doing one database query per post, the dreaded N+1 queries.
If you want to get all posts at once (or some subset with a where) the following will surprisingly work. It assumes that comments have a monotonically increasing id (as a datetime is not guaranteed to be unique), but allows for comment ids to be interleaved among posts.
Since an auto_increment id column is monotonically increasing, if comment has an id, you're all set.
First, create this view. In the view, I call post parent and comment child:
create view parent_top_3_children as
select a.*,
(select max(id) from child where parent_id = a.id) as maxid,
(select max(id) from child where id < maxid
and parent_id = a.id) as maxidm1,
(select max(id) from child where id < maxidm1
and parent_id = a.id) as maxidm2
from parent a;
maxidm1 is just "max id minus 1"; maxidm2, "max id minus 2" -- that is, the second and third greatest child ids within a particular parent id.
Then join the view to whatever you need from the comment (I'll call that text):
select a.*,
b.text as latest_comment,
c.text as second_latest_comment,
d.text as third_latest_comment
from parent_top_3_children a
left outer join child b on (b.id = a.maxid)
left outer join child c on (c.id = a.maxidm1)
left outer join child d on (c.id = a.maxidm2);
Naturally, you can add whatever where clause you want to that, to limit the posts: where a.category = 'foo' or whatever.
Here's what my tables look like:
mysql> select * from parent;
+----+------+------+------+
| id | a | b | c |
+----+------+------+------+
| 1 | 1 | 1 | NULL |
| 2 | 2 | 2 | NULL |
| 3 | 3 | 3 | NULL |
+----+------+------+------+
3 rows in set (0.00 sec)
And a portion of child. Parent 1 has noo children:
mysql> select * from child;
+----+-----------+------+------+------+------+
| id | parent_id | a | b | c | d |
+----+-----------+------+------+------+------+
. . . .
| 18 | 3 | NULL | NULL | NULL | NULL |
| 19 | 2 | NULL | NULL | NULL | NULL |
| 20 | 2 | NULL | NULL | NULL | NULL |
| 21 | 3 | NULL | NULL | NULL | NULL |
| 22 | 2 | NULL | NULL | NULL | NULL |
| 23 | 2 | NULL | NULL | NULL | NULL |
| 24 | 3 | NULL | NULL | NULL | NULL |
| 25 | 2 | NULL | NULL | NULL | NULL |
+----+-----------+------+------+------+------+
24 rows in set (0.00 sec)
And the view gives us this:
mysql> select * from parent_top_3;
+----+------+------+------+-------+---------+---------+
| id | a | b | c | maxid | maxidm1 | maxidm2 |
+----+------+------+------+-------+---------+---------+
| 1 | 1 | 1 | NULL | NULL | NULL | NULL |
| 2 | 2 | 2 | NULL | 25 | 23 | 22 |
| 3 | 3 | 3 | NULL | 24 | 21 | 18 |
+----+------+------+------+-------+---------+---------+
3 rows in set (0.21 sec)
The explain plan for the view is only slightly hairy:
mysql> explain select * from parent_top_3;
+----+--------------------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 2 | DERIVED | a | ALL | NULL | NULL | NULL | NULL | 3 | |
| 5 | DEPENDENT SUBQUERY | child | ALL | PRIMARY | NULL | NULL | NULL | 24 | Using where |
| 4 | DEPENDENT SUBQUERY | child | ALL | PRIMARY | NULL | NULL | NULL | 24 | Using where |
| 3 | DEPENDENT SUBQUERY | child | ALL | NULL | NULL | NULL | NULL | 24 | Using where |
+----+--------------------+------------+------+---------------+------+---------+------+------+-------------+
However, if we add an index for parent_fks,it gets a better:
mysql> create index pid on child(parent_id);
mysql> explain select * from parent_top_3;
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 2 | DERIVED | a | ALL | NULL | NULL | NULL | NULL | 3 | |
| 5 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | util.a.id | 2 | Using where |
| 4 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | util.a.id | 2 | Using where |
| 3 | DEPENDENT SUBQUERY | child | ref | pid | pid | 5 | util.a.id | 2 | Using where |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
5 rows in set (0.04 sec)
As noted above, this begins to fall apart when the number of parent rows is few as 100, even if we index into parent using its primary key:
mysql> select * from parent_top_3 where id < 10;
+----+------+------+------+-------+---------+---------+
| id | a | b | c | maxid | maxidm1 | maxidm2 |
+----+------+------+------+-------+---------+---------+
| 1 | 1 | 1 | NULL | NULL | NULL | NULL |
| 2 | 2 | 2 | NULL | 25 | 23 | 22 |
| 3 | 3 | 3 | NULL | 24 | 21 | 18 |
| 4 | NULL | 1 | NULL | 65 | 64 | 63 |
| 5 | NULL | 2 | NULL | 73 | 72 | 71 |
| 6 | NULL | 3 | NULL | 113 | 112 | 111 |
| 7 | NULL | 1 | NULL | 209 | 208 | 207 |
| 8 | NULL | 2 | NULL | 401 | 400 | 399 |
| 9 | NULL | 3 | NULL | 785 | 784 | 783 |
+----+------+------+------+-------+---------+---------+
9 rows in set (1 min 3.11 sec)
(Note that I intentionally test on a slow machine, with data saved on a slow flash disk.)
Here's the explain, looking for exactly one id (and the first one, at that):
mysql> explain select * from parent_top_3 where id = 1;
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
| 2 | DERIVED | a | ALL | NULL | NULL | NULL | NULL | 1000 | |
| 5 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | util.a.id | 179 | Using where |
| 4 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | util.a.id | 179 | Using where |
| 3 | DEPENDENT SUBQUERY | child | ref | pid | pid | 5 | util.a.id | 179 | Using where |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
5 rows in set (56.01 sec)
Over 56 seconds for one row, even on my slow machine, is two orders of magnitude unacceptable.
So can we save this query? It works, it's just too slow.
Here's the explain plan for the modified query. It looks as bad or worse:
mysql> explain select * from parent_top_3a where id = 1;
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 100 | Using where |
| 2 | DERIVED | <derived4> | ALL | NULL | NULL | NULL | NULL | 100 | |
| 4 | DERIVED | <derived6> | ALL | NULL | NULL | NULL | NULL | 100 | |
| 6 | DERIVED | a | ALL | NULL | NULL | NULL | NULL | 100 | |
| 7 | DEPENDENT SUBQUERY | child | ref | pid | pid | 5 | util.a.id | 179 | Using where |
| 5 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | a.id | 179 | Using where |
| 3 | DEPENDENT SUBQUERY | child | ref | PRIMARY,pid | pid | 5 | a.id | 179 | Using where |
+----+--------------------+------------+------+---------------+------+---------+-----------+------+-------------+
7 rows in set (0.05 sec)
But it completes three orders of magnitude faster, in 1/20th of a second!
How do we get to the much speedier parent_top_3a? We create three views, each one dependent on the previous one:
create view parent_top_1 as
select a.*,
(select max(id) from child where parent_id = a.id)
as maxid
from parent a;
create view parent_top_2 as
select a.*,
(select max(id) from child where parent_id = a.id and id < a.maxid)
as maxidm1
from parent_top_1 a;
create view parent_top_3a as
select a.*,
(select max(id) from child where parent_id = a.id and id < a.maxidm1)
as maxidm2
from parent_top_2 a;
Not only does this work much more quickly, it's legal on RDBMSes other than MySQL.
Let's increase the number of parent rows to 12800, the number of child rows to 1536 (most blog posts don't get comments, right? ;) )
mysql> select * from parent_top_3a where id >= 20 and id < 40;
+----+------+------+------+-------+---------+---------+
| id | a | b | c | maxid | maxidm1 | maxidm2 |
+----+------+------+------+-------+---------+---------+
| 39 | NULL | 2 | NULL | NULL | NULL | NULL |
| 38 | NULL | 1 | NULL | NULL | NULL | NULL |
| 37 | NULL | 3 | NULL | NULL | NULL | NULL |
| 36 | NULL | 2 | NULL | NULL | NULL | NULL |
| 35 | NULL | 1 | NULL | NULL | NULL | NULL |
| 34 | NULL | 3 | NULL | NULL | NULL | NULL |
| 33 | NULL | 2 | NULL | NULL | NULL | NULL |
| 32 | NULL | 1 | NULL | NULL | NULL | NULL |
| 31 | NULL | 3 | NULL | NULL | NULL | NULL |
| 30 | NULL | 2 | NULL | 1537 | 1536 | 1535 |
| 29 | NULL | 1 | NULL | 1529 | 1528 | 1527 |
| 28 | NULL | 3 | NULL | 1513 | 1512 | 1511 |
| 27 | NULL | 2 | NULL | 1505 | 1504 | 1503 |
| 26 | NULL | 1 | NULL | 1481 | 1480 | 1479 |
| 25 | NULL | 3 | NULL | 1457 | 1456 | 1455 |
| 24 | NULL | 2 | NULL | 1425 | 1424 | 1423 |
| 23 | NULL | 1 | NULL | 1377 | 1376 | 1375 |
| 22 | NULL | 3 | NULL | 1329 | 1328 | 1327 |
| 21 | NULL | 2 | NULL | 1281 | 1280 | 1279 |
| 20 | NULL | 1 | NULL | 1225 | 1224 | 1223 |
+----+------+------+------+-------+---------+---------+
20 rows in set (1.01 sec)
Note that these timings are for MyIsam tables; I'll leave it to someone else to do timings on Innodb.
But using Postgresql, on a similar but not identical data set, we get similar timings on where predicates involving parent's columns:
postgres=# select (select count(*) from parent) as parent_count, (select count(*)
from child) as child_count;
parent_count | child_count
--------------+-------------
12289 | 1536
postgres=# select * from parent_top_3a where id >= 20 and id < 40;
id | a | b | c | maxid | maxidm1 | maxidm2
----+---+----+---+-------+---------+---------
20 | | 18 | | 1464 | 1462 | 1461
21 | | 88 | | 1463 | 1460 | 1457
22 | | 72 | | 1488 | 1486 | 1485
23 | | 13 | | 1512 | 1510 | 1509
24 | | 49 | | 1560 | 1558 | 1557
25 | | 92 | | 1559 | 1556 | 1553
26 | | 45 | | 1584 | 1582 | 1581
27 | | 37 | | 1608 | 1606 | 1605
28 | | 96 | | 1607 | 1604 | 1601
29 | | 90 | | 1632 | 1630 | 1629
30 | | 53 | | 1631 | 1628 | 1625
31 | | 57 | | | |
32 | | 64 | | | |
33 | | 79 | | | |
34 | | 37 | | | |
35 | | 60 | | | |
36 | | 75 | | | |
37 | | 34 | | | |
38 | | 87 | | | |
39 | | 43 | | | |
(20 rows)
Time: 91.139 ms

Sounds like you just want the LIMIT clause for a SELECT statement:
SELECT comment_text, other_stuff FROM comments WHERE post_id = POSTID ORDER BY comment_time DESC LIMIT 3;
You'll have to run this query once per post you want to show comments for. There are a few ways to get around that, if you're willing to sacrifice maintainability and your sanity in the Quest for Ultimate Performance:
As above, one query per post to retrieve comments. Simple, but probably not all that fast.
Retrieve a list of post_ids that you want to show comments for, then retrieve all comments for those posts, and filter them client-side (or you could do it server-side if you had windowing functions, I think, though those aren't in MySQL). Simple on the server side, but the client-side filtering will be ugly, and you're still moving a lot of data from server to client, so this probably won't be all that fast either.
As #1, but use an unholy UNION ALL of as many queries as you have posts to display, so you're running one abominable query instead of N small ones. Ugly, but it'll be faster than options 1 or 2. You'll still have to do a bit of filtering client-side, but careful writing of the UNION will make that much easier than the filtering required for #2, and no wasted data will be sent over the wire. It'll make for an ugly query, though.
Join the posts and comments table, partially pivoting the comments. This is pretty clean if you only need one comment, but if you want three it'll get messy quickly. Great on the client side, but even worse SQL than #3, and probably harder for the server, to boot.
At the end of the day, I'd go with option 1, the simple query above, and not worry about the overhead of doing it once per post. If you only needed one comment, then the join option might be acceptable, but you want three and that rules it out. If windowing functions ever get added to MySQL (they're in release 8.4 of PostgreSQL), option 2 might become palatable or even preferable. Until that day, though, just pick the simple, easy-to-understand query.

Although there may be a clever way to get this in one query with no schema changes I'm guessing it wouldn't be performant anyway. Edit: Looks like tpdi has the clever solution. It looks potentially pretty fast, but I'd be curious to see a benchmark on specific databases.
Given the constraints of high performance and minimal data transfer I have two suggestions.
Solution with no schema changes or maintenance
First:
SELECT * FROM Posts
Collect the ids, then:
SELECT id FROM Replies WHERE post_id IN (?) ORDER BY id DESC
Finally, loop through those ids, grabbing only the first 3 for each post_id, then do:
SELECT * FROM Replies WHERE post_id IN (?)
More efficient solution if you are willing to maintain a few cache columns
The second solution is assuming that there are far more reads than writes, you can minimize lookups by storing the last three comment ids on the Posts table every time you add a Reply. In that case you would simply add three columns last_reply_id, second_reply_id, third_reply_id or some such. Then you can look up with either two queries like:
SELECT * FROM Posts
Collect the ids from those fields, then:
SELECT * FROM Replies WHERE post_id IN (?)
If you have those fields you could also manually construct a triple join, which would get the data in one query, although the field list would be quite verbose. Something like
SELECT posts.*, r1.title, r2.title ... FROM Posts
LEFT JOIN Replies as r1 ON Posts.last_reply_id = Replies.id
LEFT JOIN Replies as r2 ON Posts.second_reply_id = Replies.id
...
Which you prefer probably depends on your ORM or language.

Related

How to fill forward time series data in Postgres

I am looking to join three tables together and fill forward null values on the resulting table.
Three tables:
Table 1 (raw.fb_historical_data) - this is the main table on which I would like to join the other two on to. Each row of this table is related to one or more rows in the other two tables through a combination of columns id, clk and timestamp (mkt_id and row_id in the other tables).
+---------------------+-----+-----+--------------+
| timestamp | clk | id | some_columns |
+---------------------+-----+-----+--------------+
| 2016-06-19 06:11:13 | 123 | 126 | a |
| 2016-06-19 06:16:13 | 124 | 127 | b |
| 2016-06-19 06:21:13 | 234 | 126 | c |
| 2016-06-19 06:41:13 | 456 | 127 | d |
| ... | ... | ... | ... |
+---------------------+-----+-----+--------------+
Table 2 (raw.fb_runner_changes) - this table essentially gives price changes for a wide range of different markets
+---------------------+--------+--------+-------+
| timestamp | row_id | mkt_id | price |
+---------------------+--------+--------+-------+
| 2016-06-19 06:11:13 | 123 | 126 | 1 |
| 2016-06-19 06:21:13 | 123 | 126 | 2 |
| 2016-06-19 06:41:13 | 123 | 126 | 3 |
| 2016-06-06 18:54:06 | 124 | 127 | 1 |
| 2016-06-06 18:56:06 | 124 | 127 | 2 |
| 2016-06-06 18:57:06 | 124 | 127 | 3 |
| ... | ... | ... | ... |
+---------------------+--------+--------+-------+
Table 3 (raw.fb_runners) - a table with extra information about market changes that I would like to join
+---------------------+--------+--------+---------------+
| timestamp | row_id | mkt_id | other_columns |
+---------------------+--------+--------+---------------+
| 2016-06-19 06:15:13 | 234 | 126 | ab |
| 2016-06-19 06:31:13 | 234 | 126 | cd |
| 2016-06-19 06:56:13 | 234 | 126 | ef |
| 2016-06-06 18:54:06 | 456 | 127 | gh |
| 2016-06-06 18:56:06 | 456 | 127 | jk |
| 2016-06-06 18:57:06 | 456 | 127 | lm |
| ... | ... | ... | ... |
+---------------------+--------+--------+---------------+
Essentially what I want to do is fill NULL information forward (ordered by timestamp) while grouping by market id.
So far, I have tried to join the tables together using
SELECT *
FROM raw.fb_historical_data AS h
LEFT JOIN raw.fb_runner_changes AS rc
ON rc.row_id = h.clk
AND rc.timestamp = h.timestamp
AND rc.mkt_id = h.id
LEFT JOIN raw.fb_runners AS r
ON r.row_id = h.clk
AND r.timestamp = h.timestamp
AND r.mkt_id = h.id
Which has worked as intended, though now there are nulls in the resulting dataset which i'd like to fill in with the last available value for that market.
With some of the other SQL dialects, fill forward could be done using the window function last_value in combination with the instruction ignore nulls.
Since this is not supported in PostgreSQL (check the note at the bottom of this page), we are using a 2 steps work-around.
select ts, val, val_seq, min(val) over (partition by val_seq) val_fill_fw
from (select ts, val, count(val) over(order by ts) as val_seq
from t
) t
-
+----+----------+---------+-------------+
| ts | val | val_seq | val_fill_fw |
+----+----------+---------+-------------+
| 1 | (null) | 0 | (null) |
| 2 | (null) | 0 | (null) |
| 3 | hello | 1 | hello |
| 4 | (null) | 1 | hello |
| 5 | (null) | 1 | hello |
| 6 | darkness | 2 | darkness |
| 7 | my | 3 | my |
| 8 | (null) | 3 | my |
| 9 | old | 4 | old |
| 10 | (null) | 4 | old |
| 11 | (null) | 4 | old |
| 12 | (null) | 4 | old |
| 13 | friend | 5 | friend |
| 14 | (null) | 5 | friend |
+----+----------+---------+-------------+
SQL Fiddle
This seems to correctly do 'forward fill' in postgres. However I am a postgres newbie so I would appreciate feedback if it's wrong.
DROP TABLE IF EXISTS example;
create temporary table example(id int, str text, val integer);
insert into example values
(1, 'a', null),
(1, null, 1),
(2, 'b', 2),
(2,null ,null );
select * from example
select id, (case
when str is null
then lag(str,1) over (order by id)
else str
end) as str,
(case
when val is null
then lag(val,1) over (order by id)
else val
end) as val
from example

Oracle SQL Left join same table unknown amount of times

I have this table
| old | new |
|------|-------|
| a | b |
| b | c |
| d | e |
| ... | ... |
| aa | bb |
| bb | ff |
| ... | ... |
| 11 | 33 |
| 33 | 523 |
| 523 | 4444 |
| 4444 | 21444 |
The result I want to achieve is
| old | newest |
|------|--------|
| a | e |
| b | e |
| d | e |
| ... | |
| aa | ff |
| bb | ff |
| ... | |
| 11 | 21444 |
| 33 | 21444 |
| 523 | 21444 |
| 4444 | 21444 |
I can hard code the query to get the result that I want.
SELECT
older.old,
older.new,
newer.new firstcol,
newer1.new secondcol,
…
newerX-1.new secondlastcol,
newerX.new lastcol
from Table older
Left join Table newer
on older.old = newer.new
Left join Table newer1
on newer.new = newer1.old
…
Left join Table newerX-1
on newerX-2.new = newerX-1.old
Left join Table newerX
on newerX-1.new = newerX.old;
and then just take the first value from the right that is not null.
Illustrated here:
| old | new | firstcol | secondcol | thirdcol | fourthcol | | lastcol |
|------|-------|----------|-----------|----------|-----------|-----|---------|
| a | b | c | e | null | null | ... | null |
| b | c | e | null | null | null | ... | null |
| d | e | null | null | null | null | ... | null |
| ... | ... | ... | ... | ... | ... | ... | null |
| aa | bb | ff | null | null | null | ... | null |
| bb | ff | null | null | null | null | ... | null |
| ... | ... | ... | ... | ... | ... | ... | null |
| 11 | 33 | 523 | 4444 | 21444 | null | ... | null |
| 33 | 523 | 4444 | 21444 | null | null | ... | null |
| 523 | 4444 | 21444 | null | null | null | ... | null |
| 4444 | 21444 | null | null | null | null | ... | null |
The problem is that the length of "the replacement chain" is always changing (Can vary from 10 to 100).
There must be a better way to do this?
What you are looking for is a recursive query. Something like this:
with cte (old, new, lev) as
(
select old, new, 1 as lev from mytable
union all
select m.old, cte.new, cte.lev + 1
from mytable m
join cte on cte.old = m.new
)
select old, max(new) keep (dense_rank last order by lev) as new
from cte
group by old
order by old;
The recursive CTE creates all iterations (you can see this by replacing the query by select * from cte). And in the final query we get the last new per old with Oracle's KEEP LAST.
Rextester demo: http://rextester.com/CHTG34988
I'm trying to understand how you group your rows to determine different "newest" values. Are these the groupings you want based on the old field?
Group 1 - one letter (a, b, d)
Group 2 - two letters (aa, bb)
Group 3 - any number (11, 33, 523, 4444)
Is this correct? If so, you just need to group them by an expression and then use a window function MAX(). Something like this:
SELECT
"old",
MAX() OVER(PARTITION BY MyGrouping) AS newest
FROM (
SELECT
"old",
CASE
WHEN NOT IS_NUMERIC("old") THEN 'string' || CHAR_LENGTH("old") -- If string, group by string length
ELSE 'number' -- Otherwise, group as a number
END AS MyGrouping
FROM MyTable
) src
I don't know if Oracle has equivalents of the IS_NUMERIC and CHAR_LENGTH functions, so you need to check on that. If not, replace that expression with something similar, like this:
https://www.techonthenet.com/oracle/questions/isnumeric.php

db2, roll up unknown number of rows from case statement result

I am trying to write a query where I can concatenate some rows into a single column based on the result of the case statement in DB2 v9.5
The contractId can be a variable number of rows as well.
Given I have the following table structure
Table1
+------------+------------+------+
| ContractId | Reference | Code |
+------------+------------+------+
| 12 | P123456789 | A |
| 12 | A987654321 | B |
| 12 | 9995559971 | C |
| 12 | 3215654778 | D |
| 13 | abcdef | A |
| 15 | asdfa | B |
| 37 | 282jd | B |
| 89 | asdf82 | C |
+------------+------------+------+
I would like to get the output of the result like so
+-------------+-----------------------+------------------------------------+
| ContractId | Reference with Code A | Other References |
+-------------+-----------------------+------------------------------------+
| 12 | P123456789 | A987654321, 9995559971, 3215654778 |
| 13 | abcdef | asdfa, 282jd, asdf82 |
+-------------+-----------------------+------------------------------------+
I've tried queries like
select t1.contract_id,
max(case when t1.code = A then t1.reference end) as "reference with code a",
max(case when t1.code in ('B','C','D') then t1.reference end) as 'other references
from table t1
group by t1.contractId
however, this is still giving me an output like
+-------------+-----------------------+------------------+
| ContractId | Reference with Code A | Other References |
+-------------+-----------------------+------------------+
| 12 | P123456789 | null |
| 12 | null | A987654321 |
| 12 | null | 9995559971 |
| 12 | null | 3215654778 |
+-------------+-----------------------+------------------+
I've also attempted using some of the XML Agg functions but can't seem to get it to format the way I want it too.

Joining two tables and calculating divide-SUM from the resulting table in SQL Server

I have one table that looks like this:
+---------------+---------------+-----------+-------+------+
| id_instrument | id_data_label | Date | Value | Note |
+---------------+---------------+-----------+-------+------+
| 1 | 57 | 1.10.2010 | 200 | NULL |
| 1 | 57 | 2.10.2010 | 190 | NULL |
| 1 | 57 | 3.10.2010 | 202 | NULL |
| | | | | |
+---------------+---------------+-----------+-------+------+
And the other that looks like this:
+----------------+---------------+---------------+--------------+-------+-----------+------+
| id_fundamental | id_instrument | id_data_label | quarter_code | value | AnnDate | Note |
+----------------+---------------+---------------+--------------+-------+-----------+------+
| 1 | 1 | 20 | 20101 | 3 | 28.2.2010 | NULL |
| 2 | 1 | 20 | 20102 | 4 | 1.8.2010 | NULL |
| 3 | 1 | 20 | 20103 | 5 | 2.11.2010 | NULL |
| | | | | | | |
+----------------+---------------+---------------+--------------+-------+-----------+------+
What I would like to do is to merge/join these two tables in one in a way that I get something like this:
+------------+--------------+--------------+----------+--------------+
| Date | Table1.Value | Table2.Value | AnnDate | quarter_code |
+------------+--------------+--------------+----------+--------------+
| 1.10.2010. | 200 | 3 | 1.8.2010 | 20102 |
| 2.10.2010. | 190 | 3 | 1.8.2010 | 20102 |
| 3.10.2010. | 202 | 3 | 1.8.2010 | 20102 |
| | | | | |
+------------+--------------+--------------+----------+--------------+
So the idea is to order them by Date from Table1 and since Table2 Values only change on the change of AnnDate we populate the Resulting table with same values from Table2.
After that I would like to go through the resulting table and create another (Final table) with the following.
On Date 1.10.2010. take last 4 AnnDates (so it would be 1.8.2010. and f.e. 20.3.2010. 30.1.2010. 15.11.2009) and Table2 values on those AnnDate. Make SUM of those 4 values and then divide the Table1 Value with that SUM.
So we would get something like:
+-----------+---------------------------------------------------------------+
| Date | FinalValue |
+-----------+---------------------------------------------------------------+
| 1.10.2010 | 200/(Table2.Value on 1.8.2010+Table2.Value on 20.3.2010 +...) |
| | |
+-----------+---------------------------------------------------------------+
Is there any way this can be done?
EDIT:
Hmm yes now I see that I really didn't do a good job explaining it.
What I wanted to say is
I try INNER JOIN like this:
SELECT TableOne.Date, TableOne.Value, TableTwo.Value, TableTwo.AnnDate, TableTwo.quarter_code
FROM TableOne
INNER JOIN TableTwo ON TableOne.id_intrument=TableTwo.id_instrument WHERE TableOne.id_data_label = somevalue AND TableTwo.id_data_label = somevalue AND date > xxx AND date < yyy
And this inner join returns 2620*40 rows which means for every AnnDate from table2 it returns all Date from table1.
What I want is to return 2620 values with Dates from Table1
Values from table1 on that date and Values from table2 that respond to that period of dates
f.e.
Table1:
+-------+-------+
| Date | Value |
+-------+-------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
+-------+-------+
Table2
+-------+---------+
| Value | AnnDate |
+-------+---------+
| x | 1 |
| y | 4 |
+-------+---------+
Resulting table:
+-------+---------+---------+
| Date | ValueT1 | ValueT2 |
+-------+---------+---------+
| 1 | a | x |
| 2 | b | x |
| 3 | c | x |
| 4 | d | y |
+-------+---------+---------+
You need a JOIN statement for your first query. Try:
SELECT TableOne.Date, TableOne.Value, TableTwo.Value, TableTwo.AnnDate, TableTwo.quarter_code FROM TableOne
INNER JOIN TableTwo
ON TableOne.id_intrument=TableTwo.id_instrument;

INSERT INTO..SELECT..ON DUPLICATE KEYS ambiguous ids

I have the following table:
mysql> SELECT * FROM `bright_promotion_earnings`;
+----+----------+------------+----------+-------+
| id | promoter | generation | turnover | payed |
+----+----------+------------+----------+-------+
| 1 | 4 | 1 | 10 | 0 |
| 3 | 4 | 5 | 100 | 0 |
| 4 | 4 | 3 | 10000 | 1 |
| 5 | 4 | 3 | 200 | 0 |
+----+----------+------------+----------+-------+
4 rows in set (0.00 sec)
There is one unique key(promoter, generation, payed):
+---------------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| bright_promotion_earnings | 0 | promoter_2 | 1 | promoter | A | 2 | NULL | NULL | YES | BTREE | |
| bright_promotion_earnings | 0 | promoter_2 | 2 | generation | A | 4 | NULL | NULL | | BTREE | |
| bright_promotion_earnings | 0 | promoter_2 | 3 | payed | A | 4 | NULL | NULL | | BTREE | |
+---------------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)
Now I want to mark every earning for a promoter as paid by updating the same entry with paid=1 (if it exists).
So if I wanted to mark the earnings of promoter 4 as paid this is what the table should look like:
+----+----------+------------+----------+-------+
| id | promoter | generation | turnover | payed |
+----+----------+------------+----------+-------+
| 4 | 4 | 3 | 10200 | 1 |
| 6 | 4 | 5 | 100 | 1 |
| 7 | 4 | 1 | 10 | 1 |
+----+----------+------------+----------+-------+
3 rows in set (0.00 sec)
This is my current approach(without the DELETE which is trivial):
INSERT INTO
bright_promotion_earnings
(
promoter,
generation,
turnover,
payed
)
SELECT
commission.promoter,
commission.generation,
commission.turnover as turnover2,
'1' as payed
FROM
bright_promotion_earnings as commission
WHERE
promoter=4
AND payed=0
ON DUPLICATE KEY UPDATE turnover=turnover+turnover2;
But mysql keeps telling me that turnover is ambiguous:
#1052 - Column 'turnover' in field list is ambiguous
Does anybody have a hint as I can't alias the table that I'm inserting to.
How can I give the table I'm inserting to a name so that mysql can identify the column?
Thanks in advance.
you have a turnover field in both tables, so mysql can't decide which one you mean at the last row.