SQL index relative to another field - sql

I have, as part of a query, a bunch of distinct pairs of values:
a d
a e
a f
b g
b h
c i
I'd like to be able to calculate a counter relative to the first field:
a d 1
a e 2
a f 3
b g 1
b h 2
c i 1
I can't use the position in the temporary table: apart from anything else it goes too high, whereas the value I need can't go over 2 digits (and there aren't going to be more than 50 entries with the same first field). Are there any methods or techniques to help?
Thanks for any help!

select t1.c1 , t1.c2 , count(t2.c1) cnt
from mytable t1
join mytable t2 on t1.c1 = t2.c1 and t1.c2 >= t2.c2
group by t1.c1, t1.c2
order by t1.c1, cnt
Explanation
This query assumes that the pair (c1,c2) is unique.
To rank each row (c1,c2), the query counts the rows within the same c1 group whose c2 is less than or equal to that row's c2. For example, for (a,e), there are 2 rows within the group a that are less than or equal to e (namely d and e).
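The counting logic can be tried out with a minimal sketch like the one below. It assumes Python's built-in sqlite3 module and an in-memory table named mytable (as in the query above) loaded with the sample pairs from the question; it is only an illustration, not the linked demo.

```python
import sqlite3

# In-memory table with the question's sample pairs (illustrative setup).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (c1 TEXT, c2 TEXT)")
pairs = [("a", "d"), ("a", "e"), ("a", "f"), ("b", "g"), ("b", "h"), ("c", "i")]
conn.executemany("INSERT INTO mytable VALUES (?, ?)", pairs)

# Self join: each row t1 is paired with every row t2 of the same c1 whose
# c2 is <= t1.c2, so the number of pairs is the row's 1-based rank in its group.
ranked = conn.execute(
    """
    SELECT t1.c1, t1.c2, COUNT(t2.c1) AS cnt
    FROM mytable t1
    JOIN mytable t2 ON t1.c1 = t2.c1 AND t1.c2 >= t2.c2
    GROUP BY t1.c1, t1.c2
    ORDER BY t1.c1, cnt
    """
).fetchall()

for row in ranked:
    print(row)
```

Because the count relies on (c1,c2) being unique, duplicate pairs would receive the same, inflated counter.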

You didn't specify your DBMS, so this is ANSI SQL:
select a,
b,
row_number() over (partition by a order by b) as idx
from the_table;
SQLFiddle: http://sqlfiddle.com/#!15/4cf96/1
row_number() is a window function which will generate a unique number within each "grouping" defined by the partition by clause, following the ordering given by the order by clause. The Postgres manual has a nice introduction to window functions: http://www.postgresql.org/docs/current/static/tutorial-window.html
This is likely to be much faster than a self join, which has to compare every pair of rows within each group.
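As a rough check of the window-function approach, here is a small sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (the first version with window functions) and reuses the_table, a and b from the query above; the rows are inserted out of order on purpose.

```python
import sqlite3

# In-memory stand-in for the_table, filled in scrambled order (illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?)",
    [("b", "h"), ("a", "f"), ("c", "i"), ("a", "d"), ("b", "g"), ("a", "e")],
)

# row_number() restarts at 1 for every distinct value of a and counts
# upwards in the order of b within that partition.
numbered = conn.execute(
    """
    SELECT a, b, row_number() OVER (PARTITION BY a ORDER BY b) AS idx
    FROM the_table
    ORDER BY a, idx
    """
).fetchall()

for row in numbered:
    print(row)
```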

Related

Aggregate and concatenate from two tables based on comparison between columns

I have two tables like this:
Table1
A T
a1 t1
a2 t2
a3 t3
a4 t4
a5 t5
...
...
Table2
E T
e1 t1
e2 t2
e3 t3
e4 t4
e5 t5
...
...
what I wanted to achieve is this:
Table 3
E A'
e1 a1,a2,a3
e2 a4,a5,a6
...
...
The aggregation A' is done like this: In table 2 for each e there is a value in column T : t and with that t you look for the last 3 values in Table 1 that are less than the t in question. So a1, a2, a3 are values of A whose t values are less than t1 in Table 2 whose E is e1.
I know that I could write two queries for this like this:
ResultSet (rt) -> select t from e
and then iterate ResultSet and do something like this :
select A from Table1 where t < rt[i] limit 3 - not sure how to concatenate here :)
but I'm pretty sure this is utterly inefficient. There should be a better way to do this.
I'm working with PostgreSQL.
If it had been a dataframe from a file I would use Python's pandas. I also know that pandas has read_sql, but the tables are huge and I don't want to load a whole table into memory. I don't think it would, but I'm not sure either; anyway, that's a separate story.
How do we solve this in SQL? Any ideas please.
In table 2 for each e there is a value in column T : t and with that t you look for the last 3 values in Table 1 that are less than the t in question.
I don't understand how the results follow this logic. But based on your description, you can use a lateral join:
select t2.*, t1.the_as
from t2 left join lateral
(select array_agg(t1.a) as the_as
from (select t1.*
from t1
where t1.T <= t2.T
order by t1.T desc
limit 3
) t1
) t1
on 1=1;
Note that this uses arrays rather than strings because I think arrays are a better data structure for storing multiple values. That said, you can just use string_agg() instead, if you really want a string. The syntax would be string_agg(t1.a, ',').
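For experimenting without a Postgres instance, the same "latest three rows at or before each t" idea can be sketched with Python's built-in sqlite3 module. SQLite has no lateral join, so this sketch swaps in a different technique: an ordinary join plus row_number() (SQLite 3.25 or newer), with group_concat() standing in for string_agg(). The integer T values and the two e rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a TEXT, t INTEGER)")
conn.execute("CREATE TABLE t2 (e TEXT, t INTEGER)")
# Made-up sample data: integer t values stand in for the real T column.
conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                 [("a1", 1), ("a2", 2), ("a3", 3), ("a4", 4), ("a5", 5)])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [("e1", 3), ("e2", 5)])

# Not a lateral join: number the candidate t1 rows per e from newest to
# oldest, keep the first three, then concatenate them per e.
rows = conn.execute(
    """
    SELECT e, group_concat(a) AS the_as
    FROM (
        SELECT t2.e AS e, t1.a AS a,
               row_number() OVER (PARTITION BY t2.e ORDER BY t1.t DESC) AS rn
        FROM t2
        JOIN t1 ON t1.t <= t2.t
    )
    WHERE rn <= 3
    GROUP BY e
    ORDER BY e
    """
).fetchall()

# group_concat() does not promise an order, so sort the pieces in Python.
result = {e: sorted(the_as.split(",")) for e, the_as in rows}
print(result)
```

Unlike the left join lateral version, the inner join here drops any e that has no qualifying row in t1.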

Joining two tables together strictly by order

If I have two tables(t1, t2), each with one column
t1
letters
a
b
c
t2
nums
1
2
3
Is it possible to "join" the two together in a way that produces a two-column result set that looks like this:
letters nums
a 1
b 2
c 3
Requirements for the solution:
- Must combine each table's data in a specified order, so being able to order each table's data before joining
- Doesn't use any functions, like row_number, to add an extra column to join on
Bonus points:
- If two tables have different row counts, final result set is count of the max of the two tables, and the "missing" data is nulls.
Just wondering if this is possible given the constraints.
You want to use row_number(). However, SQL tables represent unordered sets, so you need a column that specifies the ordering.
The idea is:
select l.letter, n.number
from (select l.*, row_number() over (order by ?) as seqnum
from letters l
) l join
(select n.*, row_number() over (order by ?) as seqnum
from numbers n
) n
on l.seqnum = n.seqnum;
The ? is for the column that specifies the ordering.
If you want all rows in both tables, use full join rather than an inner join.
EDIT:
row_number() is the obvious solution, but you can do this with a correlated subquery assuming the values are unique:
select l.letter, n.number
from (select l.*,
(select count(*) from letters l2 where l2.letter <= l.letter) as seqnum
from letters l
) l join
(select n.*,
(select count(*) from numbers n2 where n2.number <= n.number) as seqnum
from numbers n
) n
on l.seqnum = n.seqnum;
I find the restriction on not using row_number() to be rather absurd, given that it is an ISO/ANSI standard function supported by almost all databases.
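The correlated-subquery variant can be exercised with a small sketch using Python's built-in sqlite3 module. The tables letters(letter) and numbers(number) follow the naming of the queries above; the values 10, 20, 30 are made up to show that the pairing is by position rather than by value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE letters (letter TEXT)")
conn.execute("CREATE TABLE numbers (number INTEGER)")
# Inserted out of order on purpose; the values must be unique per table.
conn.executemany("INSERT INTO letters VALUES (?)", [("c",), ("a",), ("b",)])
conn.executemany("INSERT INTO numbers VALUES (?)", [(20,), (30,), (10,)])

# seqnum = how many values in the same table are <= the current value,
# i.e. the row's 1-based position in sorted order, without row_number().
paired = conn.execute(
    """
    SELECT l.letter, n.number
    FROM (SELECT l1.letter,
                 (SELECT COUNT(*) FROM letters l2
                  WHERE l2.letter <= l1.letter) AS seqnum
          FROM letters l1) l
    JOIN (SELECT n1.number,
                 (SELECT COUNT(*) FROM numbers n2
                  WHERE n2.number <= n1.number) AS seqnum
          FROM numbers n1) n
      ON l.seqnum = n.seqnum
    ORDER BY l.seqnum
    """
).fetchall()

print(paired)
```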
If your version of SQL supports an ASCII function (which can generate an ASCII code for each lowercase letter), then you may join on the ASCII code shifted downwards by 96:
SELECT
t1.letters,
t2.nums
FROM table1 t1
INNER JOIN table2 t2
ON t2.nums = ASCII(t1.letters) - 96;

Query left join without all the right rows from B table

I have 2 tables, A and B.
I need all columns from A + 1 column from B in my select.
Unfortunately, B has multiple rows (all identical) for one row in A on the join condition.
I tried, but with a left join, for example, I can't get exactly one row from B for each row in A while keeping my select list.
How can I write this query? I'm using Oracle SQL.
Thanks in advance.
This is a good use for outer apply, which Oracle supports from 12c onward. The structure of the query looks like this:
select a.*, b.col
from a outer apply
(select b.col
from b
where b.? = a.?
fetch first 1 row only
) b;
Normally, you would only use fetch first with order by. In this case, it doesn't seem to make a difference which row you choose.
You can group by on all columns from A, and then use an aggregate (like max or min) to pick any of the identical B values:
select a.*
, b.min_col1
from TableA a
left join
(
select a_id
, min(col1) as min_col1
from TableB
group by
a_id
) b
on b.a_id = a.id
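The effect of pre-aggregating B can be seen in a minimal sketch with Python's built-in sqlite3 module. The name column and the sample rows are hypothetical; TableA, TableB, a_id and col1 follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE TableB (a_id INTEGER, col1 TEXT)")
conn.executemany("INSERT INTO TableA VALUES (?, ?)",
                 [(1, "x"), (2, "y"), (3, "z")])
# a_id 1 has three identical rows in B; a_id 3 has none.
conn.executemany("INSERT INTO TableB VALUES (?, ?)",
                 [(1, "p"), (1, "p"), (1, "p"), (2, "q")])

# Collapsing B to one row per a_id before joining means the left join
# can no longer multiply the rows of A.
rows = conn.execute(
    """
    SELECT a.id, a.name, b.min_col1
    FROM TableA a
    LEFT JOIN (SELECT a_id, MIN(col1) AS min_col1
               FROM TableB
               GROUP BY a_id) b
      ON b.a_id = a.id
    ORDER BY a.id
    """
).fetchall()

print(rows)
```

Rows of A without any match in B are kept, with NULL (None in Python) in the B column.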

Matching rows: Do I need a cursor? (SQL Server 2005)

I have 2 tables, call them G and T, on which I am selecting records based upon matching on a number of fields.
SELECT
g.ID, t.ID
FROM
g JOIN t
ON (g.Field1 = t.Field1
AND g.Field2 = t.Field2
AND .... )
There can be more than one record that matches each side e.g. rows t1 and t2 are identical on the fields used for matching, as are g1 and g2 and they match each other, giving
t1 g1
t1 g2
t2 g1
t2 g2
(the actual ids are ints, but you get the idea)
What we want is for each T record to match to only one G record (we don't care which as long as they are different ones) e.g. either of
t1 g1
t2 g2
OR
t1 g2
t2 g1
would be acceptable, but NOT
t1 g1
t2 g1
And not both resultsets - we only want the 2 rows total (in this example).
There might be (say) 30,000 rows in the initial selection from each table. Not everything will have a match, this is fine.
Can this be done set-wise or do I have to use a cursor?
EDITED in response to answer.
You can use ROW_NUMBER() to assign some arbitrary identifiers to do the matching on:
;With TOrdered as (
select ID,Field1,Field2,
ROW_NUMBER() OVER (PARTITION BY Field1,Field2 ORDER BY ID) as rn
from T
), GOrdered as (
select ID,Field1,Field2,
ROW_NUMBER() OVER (PARTITION BY Field1,Field2 ORDER BY ID) as rn
from G
)
SELECT
g.ID, t.ID
FROM
GOrdered g
JOIN
TOrdered t
ON (g.Field1 = t.Field1
AND g.Field2 = t.Field2
AND g.rn = t.rn )
(If there are mismatches on counts between the two tables, some rows will not appear in the final result at all - but you haven't really indicated whether they should or not, or how they should be dealt with)
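A small sketch of the pairing, using Python's built-in sqlite3 module (SQLite 3.25 or newer for row_number()) instead of SQL Server. The ids and field values are made up: T has two identical rows plus one unmatched row, and G has two identical rows plus one unmatched row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER, Field1 TEXT, Field2 TEXT)")
conn.execute("CREATE TABLE G (ID INTEGER, Field1 TEXT, Field2 TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)",
                 [(1, "x", "y"), (2, "x", "y"), (3, "p", "q")])
conn.executemany("INSERT INTO G VALUES (?, ?, ?)",
                 [(10, "x", "y"), (11, "x", "y"), (12, "m", "n")])

# Number the duplicates inside each (Field1, Field2) group on both sides,
# then require equal numbers so each T row pairs with a different G row.
matches = conn.execute(
    """
    WITH TOrdered AS (
        SELECT ID, Field1, Field2,
               row_number() OVER (PARTITION BY Field1, Field2 ORDER BY ID) AS rn
        FROM T
    ), GOrdered AS (
        SELECT ID, Field1, Field2,
               row_number() OVER (PARTITION BY Field1, Field2 ORDER BY ID) AS rn
        FROM G
    )
    SELECT g.ID, t.ID
    FROM GOrdered g
    JOIN TOrdered t
      ON g.Field1 = t.Field1 AND g.Field2 = t.Field2 AND g.rn = t.rn
    ORDER BY g.ID
    """
).fetchall()

print(matches)
```

Only two rows come back for the duplicated group instead of the four produced by the plain join, and the unmatched rows on either side drop out.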

Is it possible to have different conditions for each row in a query?

How can I select a set of rows where each row matches a different condition?
Example:
Suppose I have a table with a column called name. I want the result ONLY IF the first row's name matches 'A', the second row's name matches 'B' and the third row's name matches 'C'.
Edit:
I want this to work without a fixed size, so that I can define a sequence like R,X,V,P,T and the rows must match it, one value per row, in that order.
You can, but probably not in a way you would want:
If your table has a numeric id field that is incremented with each row, you can self join that table 3 times (let's say as "a", "b" and "c"), use the join condition a.id + 1 = b.id and b.id + 1 = c.id, and put your filter in a where clause like: a.name = 'A' AND b.name = 'B' AND c.name = 'C'
But don't expect good performance ...
Assuming that you know how to assign a row number to your rows (ROW_NUMBER() in SQL Server, for instance), you can create a lookup (match) table and join on it. See below for an explanation:
LookupTable:
RowNum Name
1 A
2 B
3 C
Your SourceTable (assuming you have already added RowNum to it; if not, add it with a subquery, or with a CTE on SQL Server 2005 or newer):
RowNum Name
-----------
1 A
2 B
3 C
4 D
Now you need to inner join LookupTable with your SourceTable on LookupTable.RowNum = SourceTable.RowNum AND LookupTable.Name = SourceTable.Name. Then left join LookupTable to this result on RowNum only. If any row of the final result has NULL in the columns that come from SourceTable, then you know that at least one row does not match.
Here is code for joins:
SELECT LT2.*, T.RowNum AS Matched
FROM LookupTable LT2
LEFT JOIN
(
SELECT ST.*
FROM SourceTable ST
INNER JOIN LookupTable LT ON LT.RowNum = ST.RowNum AND LT.Name = ST.Name
) T
ON LT2.RowNum = T.RowNum
The result set of the above query contains one row per LookupTable row; Matched IS NULL marks a LookupTable row whose condition is not met by the corresponding SourceTable row.
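A condensed sketch of the lookup-table idea, using Python's built-in sqlite3 module (SQLite 3.25 or newer for row_number()). It folds the two joins into a single left join that counts the lookup rows left unmatched. The helper name first_rows_match, the id column used for ordering and the sample data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SourceTable (id INTEGER, Name TEXT)")
conn.executemany("INSERT INTO SourceTable VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "D")])

def first_rows_match(conn, sequence):
    """Return True if the first len(sequence) rows, ordered by id, carry
    exactly the given names in the given order (hypothetical helper)."""
    # Rebuild the lookup table: one (RowNum, Name) row per expected value.
    conn.execute("DROP TABLE IF EXISTS LookupTable")
    conn.execute("CREATE TABLE LookupTable (RowNum INTEGER, Name TEXT)")
    conn.executemany("INSERT INTO LookupTable VALUES (?, ?)",
                     list(enumerate(sequence, start=1)))
    # A lookup row without a partner at the same position and with the
    # same name means the sequence is broken at that position.
    (unmatched,) = conn.execute(
        """
        WITH Numbered AS (
            SELECT Name, row_number() OVER (ORDER BY id) AS RowNum
            FROM SourceTable
        )
        SELECT COUNT(*)
        FROM LookupTable LT
        LEFT JOIN Numbered ST
          ON ST.RowNum = LT.RowNum AND ST.Name = LT.Name
        WHERE ST.RowNum IS NULL
        """
    ).fetchone()
    return unmatched == 0

print(first_rows_match(conn, ["A", "B", "C"]))
print(first_rows_match(conn, ["A", "C"]))
```

Because the expected sequence lives in a table, its length is not fixed, which is what the edit to the question asks for.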
I suppose you could do a sub query for each row, but it wouldn't perform well or scale well at all and would be hard to maintain.
This may be close to what you're after... but I need to know where you're getting your values for A, B, C etc...
Select [insert your fields here]
FROM
(Select T1.Name, T1.Age, RowNum as t1RowNum from T T1 order by name) T1O
Full Outer JOIN
(Select T2.Name, T2.Age, RowNum as T2rowNum From T T2 order By name) T2O
ON T1O.T1RowNum+1 = T2O.T2RowNum