How to match variable data in SQL Server

I need to map a many-to-many relationship between two flat tables. Table A contains a list of possible configurations (where each column is a question and the cell value is the answer). NULL values denote that the answer does not matter. Table B contains actual configurations with the same columns.
Ultimately, I need the final results to show which configurations are mapped between table B and A:
Example
ActualId | ConfigId
---------+---------
5 | 1
6 | 1
8 | 2
. | .
. | .
. | .
N | M
To give a simple example of the tables and data I'm working with, the first table would look like this:
Table A
--------
ConfigId | Size | Color | Cylinders | ... | ColumnN
---------+------+-------+-----------+-----+--------
1 | 3 | | 4 | ... | 5
2 | 4 | 5 | 5 | ... | 5
3 | | 5 | | ... | 5
And Table B would look like this:
Table B
-------
ActualId | Size | Color | Cylinders | ... | ColumnN
---------+------+-------+-----------+-----+--------
1 | 3 | 1 | 4 | ... | 5
2 | 3 | 8 | 4 | ... | 5
3 | 4 | 5 | 5 | ... | 5
4 | 7 | 5 | 6 | ... | 5
Since the NULL values denote that any value can work, the expected result would be:
Expected
---------
ActualId | ConfigId
---------+---------
1 | 1
2 | 1
3 | 2
3 | 3
4 | 3
I'm trying to figure out the best way to match the actual data, which has over a hundred columns. I suspect that checking each and every column for NULL values is the wrong approach and will not perform well. I'm really fascinated by this problem and would love some help finding the best way to tackle it.

The query below joins Table A to Table B on size, color and cylinders.
For the size match, A is compared against B:
If A.SIZE is NULL, the comparison becomes B.SIZE = B.SIZE, which is true whenever B.SIZE is not NULL.
If A.SIZE is not NULL, the comparison is A.SIZE = B.SIZE, which is true only if they match.
The matching on color and cylinders works the same way.
SELECT * FROM TABLEA A
INNER JOIN TABLEB B ON ISNULL(A.SIZE, B.SIZE)=B.SIZE
AND ISNULL(A.COLOR, B.COLOR)=B.COLOR
AND ISNULL(A.CYLINDERS, B.CYLINDERS)=B.CYLINDERS
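With over a hundred columns, the ON clause can be generated from a column list rather than typed by hand. The following is a minimal sketch in Python using the standard sqlite3 module and the question's sample data. The three-column list, the table names TableA and TableB, and the helper build_match_query are assumptions for illustration. COALESCE stands in for ISNULL because SQLite has no two-argument ISNULL; COALESCE is also available in SQL Server.

```python
import sqlite3

# Hypothetical column list; the real tables would list all 100+ columns here.
COLUMNS = ["Size", "Color", "Cylinders"]


def build_match_query(columns):
    """Build the wildcard join: a NULL in TableA matches any non-NULL value in TableB."""
    conditions = " AND ".join(
        f"COALESCE(a.{col}, b.{col}) = b.{col}" for col in columns
    )
    return (
        "SELECT b.ActualId, a.ConfigId "
        "FROM TableB b "
        f"JOIN TableA a ON {conditions} "
        "ORDER BY b.ActualId, a.ConfigId"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ConfigId INTEGER, Size INTEGER, Color INTEGER, Cylinders INTEGER);
CREATE TABLE TableB (ActualId INTEGER, Size INTEGER, Color INTEGER, Cylinders INTEGER);
INSERT INTO TableA VALUES (1, 3, NULL, 4), (2, 4, 5, 5), (3, NULL, 5, NULL);
INSERT INTO TableB VALUES (1, 3, 1, 4), (2, 3, 8, 4), (3, 4, 5, 5), (4, 7, 5, 6);
""")

matches = conn.execute(build_match_query(COLUMNS)).fetchall()
print(matches)  # [(1, 1), (2, 1), (3, 2), (3, 3), (4, 3)]
```

The printed pairs are the (ActualId, ConfigId) rows from the question's expected result. Column names are interpolated into the SQL text, so the list should come from trusted metadata, not from user input.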

Related

Selecting rows of a table in which some special values exist

The first column of a table contains Ids, and the values in the other columns are the numbers corresponding to those Ids. Given some special numbers, we want to select the rows in which all of the special numbers appear among the numbers corresponding to the Id. For example, take the following table, with 2 and 5 as the special numbers. We want to select the rows in which both 2 and 5 appear in the columns other than Id:
| Id | corresponded numbers
|----|----------------------
| 1 | 2 | 3 | 5 |
| 2 | 1 | 5 |
| 3 | 1 | 2 | 4 | 5 | 7 |
| 4 | 3 | 5 | 6 |
Therefore, we want to have the following table as the result:
| Id | corresponded numbers
|----|----------------------
| 1 | 2 | 3 | 5 |
| 3 | 1 | 2 | 4 | 5 | 7 |
Could you please suggest an Excel function or a SQL query to do the above selection?
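-- Assumes [corresponded numbers] is stored as a single delimited string.
-- Note: charIndex matches substrings, so '2' would also match '12' or '25'.
SELECT id,
       [corresponded numbers]
FROM TableName
WHERE (charIndex('2', [corresponded numbers]) > 0
  AND charIndex('5', [corresponded numbers]) > 0)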
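A plain substring search can produce false positives once multi-digit numbers appear. The sketch below, in Python with the standard sqlite3 module, compares the substring test with a delimiter-wrapped test. It assumes the numbers are stored as a comma-separated string in a column named nums (both the delimiter and the name are hypothetical), and it adds a row with Id 5 that is not in the question to show the difference. SQLite's instr plays the role of charIndex here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableName (id INTEGER, nums TEXT);
INSERT INTO TableName VALUES
    (1, '2,3,5'),
    (2, '1,5'),
    (3, '1,2,4,5,7'),
    (4, '3,5,6'),
    (5, '12,5');  -- hypothetical extra row: contains 12, but not 2
""")

# Substring test, analogous to the charIndex query: '2' also matches inside '12'.
naive_ids = [row[0] for row in conn.execute("""
    SELECT id FROM TableName
    WHERE instr(nums, '2') > 0 AND instr(nums, '5') > 0
    ORDER BY id
""")]

# Delimiter-wrapped test: search for ',2,' inside ',<nums>,' to match whole numbers only.
exact_ids = [row[0] for row in conn.execute("""
    SELECT id FROM TableName
    WHERE instr(',' || nums || ',', ',2,') > 0
      AND instr(',' || nums || ',', ',5,') > 0
    ORDER BY id
""")]

print(naive_ids)  # [1, 3, 5]
print(exact_ids)  # [1, 3]
```

In SQL Server the same idea would use charIndex with + for string concatenation instead of instr and ||.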

SQL: Add values according to index columns only for lines sharing an id

Yesterday I asked this question: SQL: How to add values according to index columns but I found out that my problem is a bit more complicated:
I have a table like this:
id | value| position | relates_to_position |type
19 | 100 | 2 | NULL | 1
19 | 50 | 6 | NULL | 2
19 | 20 | 7 | 6 | 3
20 | 30 | 3 | NULL | 2
20 | 10 | 4 | 3 | 3
From this I need to create the resulting table, which adds all the lines where the relates_to_position value matches the position value, but only for lines sharing the same id!
The resulting table should be
id | value| position |type
19 | 100 | 2 | 1
19 | 70 | 6 | 2
20 | 40 | 3 | 2
I am using Oracle 11. There is only one level of recursion, meaning a line never refers to a line that itself has the relates_to_position field set.
I think the following query will do this:
select id, coalesce(relates_to_position, position) as position,
sum(value) as value, min(type) as type
from t
group by id, coalesce(relates_to_position, position);
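The query can be checked against the question's sample data. The following is a small sketch in Python with the standard sqlite3 module; SQLite is used here only as a stand-in for Oracle, and the ORDER BY is added so the output order is predictable. Rows with a relates_to_position are folded into the row they point to, because both share the same COALESCE value within an id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, value INTEGER, position INTEGER,
                relates_to_position INTEGER, type INTEGER);
INSERT INTO t VALUES
    (19, 100, 2, NULL, 1),
    (19,  50, 6, NULL, 2),
    (19,  20, 7, 6,    3),
    (20,  30, 3, NULL, 2),
    (20,  10, 4, 3,    3);
""")

# A dependent row groups under the position it relates to; other rows group under their own.
rows = conn.execute("""
    SELECT id,
           COALESCE(relates_to_position, position) AS position,
           SUM(value) AS value,
           MIN(type)  AS type
    FROM t
    GROUP BY id, COALESCE(relates_to_position, position)
    ORDER BY id, 2
""").fetchall()

for row in rows:
    print(row)
# (19, 2, 100, 1)
# (19, 6, 70, 2)
# (20, 3, 40, 2)
```

MIN(type) returns the parent's type only because the parent rows in the sample have the smaller type value; that is a property of this data, not a general guarantee.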

Joining tables in oracle

I have a master data table like this:
tableA
ID | tainfo1 | tainfo2
----------------------
1 | me | 100
2 | you | 200
3 | they | 300
and an attribute table like that:
tableB:
ID | type | tbinfo1 | tbinfo2
------------------------------
1 | 1 | good | 7
1 | 2 | bad | 5
2 | 2 | so&so | 6
3 | 1 | awesome | 10
In the attribute table I have a very small set of types, and I would like to know if there is any way to produce output like this:
ID | tainfo1 | tainfo2 | tbinfo1_type1 | tbinfo2_type1 | tbinfo1_type2 | tbinfo2_type2
-----------------------------------------------------------------------------------------
1 | me | 100 | good | 7 | bad | 5
2 | you | 200 | | | so&so | 6
3 | they | 300 | awesome | 10 | |
If all the attributes exist, all the columns are filled, as for record 1; otherwise the missing _typeX columns appear blank, as for type1 in record 2.
I hope the question is clear,
Regards.
Join both tables and pivot result:
select *
from (
select id, tainfo1, tainfo2, type, tbinfo1, tbinfo2
from tableA join tableB using (id))
pivot (max(tbinfo1) t1, max(tbinfo2) t2 for type in (1 info1, 2 info2))
Output:
ID    TAINFO1    TAINFO2    INFO1_T1   INFO1_T2   INFO2_T1   INFO2_T2
----- ---------- ---------- ---------- ---------- ---------- ----------
1     me         100        good       7          bad        5
2     you        200                              so-so      6
3     they       300        awesome    10
This works for a fixed, known set of values in the type column. PIVOT is available from Oracle 11g onward; for older versions, use max(decode(...)) instead. If you need a fully dynamic solution, please read the linked articles: link1, link2.
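Where PIVOT is not available, the same shape can be produced with conditional aggregation. The sketch below uses Python's standard sqlite3 module with the question's data, and MAX(CASE ...), which is the portable counterpart of the max(decode(...)) approach mentioned above. The output column names follow the question's desired layout. The LEFT JOIN is a design choice of this sketch, not part of the answer: it keeps master rows even if they have no attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (ID INTEGER, tainfo1 TEXT, tainfo2 INTEGER);
CREATE TABLE tableB (ID INTEGER, type INTEGER, tbinfo1 TEXT, tbinfo2 INTEGER);
INSERT INTO tableA VALUES (1, 'me', 100), (2, 'you', 200), (3, 'they', 300);
INSERT INTO tableB VALUES
    (1, 1, 'good', 7), (1, 2, 'bad', 5), (2, 2, 'so&so', 6), (3, 1, 'awesome', 10);
""")

# One MAX(CASE ...) per (attribute, type) pair; a missing type yields NULL.
pivoted = conn.execute("""
    SELECT a.ID, a.tainfo1, a.tainfo2,
           MAX(CASE WHEN b.type = 1 THEN b.tbinfo1 END) AS tbinfo1_type1,
           MAX(CASE WHEN b.type = 1 THEN b.tbinfo2 END) AS tbinfo2_type1,
           MAX(CASE WHEN b.type = 2 THEN b.tbinfo1 END) AS tbinfo1_type2,
           MAX(CASE WHEN b.type = 2 THEN b.tbinfo2 END) AS tbinfo2_type2
    FROM tableA a
    LEFT JOIN tableB b ON b.ID = a.ID
    GROUP BY a.ID, a.tainfo1, a.tainfo2
    ORDER BY a.ID
""").fetchall()

for row in pivoted:
    print(row)
# (1, 'me', 100, 'good', 7, 'bad', 5)
# (2, 'you', 200, None, None, 'so&so', 6)
# (3, 'they', 300, 'awesome', 10, None, None)
```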

ActiveRecord select records based on uniqueness of two attributes in combination

My table looks like this:
ID | Multiple | Itemlist_ID | Inventory_ID
----------------------------------------------------------------
1 | 1 | 1 | 1
2 | 1 | 1 | 2
3 | 1 | 1 | 3
4 | 1 | 4 | 2
5 | 1 | 4 | 3
6 | 2 | 4 | 2
7 | 2 | 4 | 3
How do I retrieve records with unique combo of Multiple and Itemlist_ID? For example below:
ID | Multiple | Itemlist_ID | Inventory_ID
----------------------------------------------------------------
1 | 1 | 1 | 1
4 | 1 | 4 | 2
6 | 2 | 4 | 2
Note, this is retrieving for a View where the Inventory_ID won't be shown, so I'm not concerned whether I get back records [1,4,6] or [1,5,7] or [2,4,6]. Using a first command is fine.
You can perform a GROUP BY query in ActiveRecord using this syntax (replace Model with your class name):
Model.group('Multiple', 'Itemlist_ID').select('Multiple', 'Itemlist_ID')
This would retrieve all unique combinations of (Multiple, Itemlist_ID).
You could optionally add aggregate operations on columns other than the two grouped columns, such as SUM, AVG, COUNT, etc. For example, if you wanted to know how many records are in each group:
Model.group('Multiple', 'Itemlist_ID').select('Multiple', 'Itemlist_ID', 'COUNT(1) AS group_count')
This would add an attribute called 'group_count' to each result, holding the number of records in that group.
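The grouping itself can be tried outside Rails. The following sketch runs a comparable GROUP BY directly in SQL through Python's standard sqlite3 module, using the question's data. The table name items and the extra MIN(ID) AS first_id column are assumptions added for illustration; MIN(ID) gives one representative record id per combination, which is what the question's example output shows. This is not claimed to be the exact SQL that ActiveRecord generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (ID INTEGER, Multiple INTEGER, Itemlist_ID INTEGER, Inventory_ID INTEGER);
INSERT INTO items VALUES
    (1, 1, 1, 1), (2, 1, 1, 2), (3, 1, 1, 3),
    (4, 1, 4, 2), (5, 1, 4, 3),
    (6, 2, 4, 2), (7, 2, 4, 3);
""")

# One output row per unique (Multiple, Itemlist_ID) combination.
groups = conn.execute("""
    SELECT MIN(ID) AS first_id, Multiple, Itemlist_ID, COUNT(1) AS group_count
    FROM items
    GROUP BY Multiple, Itemlist_ID
    ORDER BY first_id
""").fetchall()

for row in groups:
    print(row)
# (1, 1, 1, 3)
# (4, 1, 4, 2)
# (6, 2, 4, 2)
```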

In hive, how to do a calculation among 2 rows?

I have this table.
+------------------------------------------------------------+
| ks | time | val1 | val2 |
+-------------+---------------+---------------+--------------+
| A | 1 | 1 | 1 |
| B | 1 | 3 | 5 |
| A | 2 | 6 | 7 |
| B | 2 | 10 | 12 |
| A | 4 | 6 | 7 |
| B | 4 | 20 | 26 |
+------------------------------------------------------------+
What I want to get is for each row,
ks | time | val1 | val1 of next ts of same ks |
To be clear, result of above example should be,
+------------------------------------------------------------+
| ks | time | val1 | next.val1 |
+-------------+---------------+---------------+--------------+
| A | 1 | 1 | 6 |
| B | 1 | 3 | 10 |
| A | 2 | 6 | 6 |
| B | 2 | 10 | 20 |
| A | 4 | 6 | null |
| B | 4 | 20 | null |
+------------------------------------------------------------+
(I need the same next value for val2 as well.)
I tried hard to come up with a Hive query for this, but had no luck. I was able to write a query for this in SQL as mentioned here (Quassnoi's answer), but couldn't create the equivalent in Hive because Hive doesn't support subqueries in the SELECT clause.
Can someone please help me achieve this?
Thanks in advance.
EDIT:
Query I tried was,
SELECT ks, time, val1, next[0] as next.val1 from
(SELECT ks, time, val1
COALESCE(
(
SELECT Val1, time
FROM myTable mi
WHERE mi.val1 > m.val1 AND mi.ks = m.ks
ORDER BY time
LIMIT 1
), CAST(0 AS BIGINT)) AS next
FROM myTable m
ORDER BY time) t2;
Your query seems quite similar to the "year ago" reporting that is ubiquitous in financial reporting. I think a LEFT OUTER JOIN is what you are looking for.
We join table myTable to itself, naming the two instances of the same table m and n. For every entry in the first table m we will attempt to find a matching record in n with the same ks value but an incremented value of time. If this record does not exist, all column values for n will be NULL.
SELECT
m.ks,
m.time,
m.val1,
n.val1 as next_val1,
m.val2,
n.val2 as next_val2
FROM
myTable m
LEFT OUTER JOIN
myTable n
ON (
m.ks = n.ks
AND
m.time + 1 = n.time
);
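With consecutive time values (1, 2, 3), this returns the following. Note that the join assumes time increases by exactly 1; in the question's data, time jumps from 2 to 4, so the rows at time 2 would find no match and get NULL.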
ks time val1 next_val1 val2 next_val2
A 1 1 6 1 7
A 2 6 6 7 7
A 3 6 NULL 7 NULL
B 1 3 10 5 12
B 2 10 20 12 26
B 3 20 NULL 26 NULL
Hope that helps.
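To see how the time + 1 join behaves on the question's own data, where time takes the values 1, 2 and 4, here is a small sketch using Python's standard sqlite3 module. SQLite stands in for Hive only to run the join; the ORDER BY is added so the output order is predictable, and val2 is left out of the select list for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (ks TEXT, time INTEGER, val1 INTEGER, val2 INTEGER);
INSERT INTO myTable VALUES
    ('A', 1, 1, 1),  ('B', 1, 3, 5),
    ('A', 2, 6, 7),  ('B', 2, 10, 12),
    ('A', 4, 6, 7),  ('B', 4, 20, 26);
""")

# Same join condition as above: same ks, and n.time exactly one greater than m.time.
joined = conn.execute("""
    SELECT m.ks, m.time, m.val1, n.val1 AS next_val1
    FROM myTable m
    LEFT OUTER JOIN myTable n
      ON m.ks = n.ks AND m.time + 1 = n.time
    ORDER BY m.ks, m.time
""").fetchall()

for row in joined:
    print(row)
# ('A', 1, 1, 6)
# ('A', 2, 6, None)
# ('A', 4, 6, None)
# ('B', 1, 3, 10)
# ('B', 2, 10, None)
# ('B', 4, 20, None)
```

The rows at time 2 get no next value because no row exists at time 3, so this join fits data without gaps in time.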
I find that using Hive custom map/reduce functionality works great to solve queries similar to this. It gives you the opportunity to consider a set of input and "reduce" to one (or more) results.
This answer discusses the solution.
The key is to use CLUSTER BY to send all rows with the same key value to the same reducer, and hence to the same reduce script. The script collects rows for the current key, outputs the reduced results when the key changes, and then starts collecting for the new key.
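The reduce step described here can be sketched in Python. This is only an illustration of the logic under stated assumptions: the function name with_next_values is hypothetical, rows are passed in as tuples of (ks, time, val1, val2), and they are assumed to arrive grouped by ks and ordered by time within each ks. A real reduce script would read delimited lines from standard input and write them to standard output instead.

```python
def with_next_values(rows):
    """Yield (ks, time, val1, next_val1, val2, next_val2) for each input row.

    Assumes rows are grouped by ks and sorted by time within each ks.
    The "next" values are None for the last row of each ks.
    """
    prev = None
    for row in rows:
        if prev is not None:
            # The current row is the successor of prev only if the key is unchanged.
            same_key = prev[0] == row[0]
            yield (
                prev[0], prev[1],
                prev[2], row[2] if same_key else None,
                prev[3], row[3] if same_key else None,
            )
        prev = row
    if prev is not None:
        # The final row of the input has no successor.
        yield (prev[0], prev[1], prev[2], None, prev[3], None)


rows = [
    ("A", 1, 1, 1), ("A", 2, 6, 7), ("A", 4, 6, 7),
    ("B", 1, 3, 5), ("B", 2, 10, 12), ("B", 4, 20, 26),
]
result = list(with_next_values(rows))
for r in result:
    print(r)
# ('A', 1, 1, 6, 1, 7)
# ('A', 2, 6, 6, 7, 7)
# ('A', 4, 6, None, 7, None)
# ('B', 1, 3, 10, 5, 12)
# ('B', 2, 10, 20, 12, 26)
# ('B', 4, 20, None, 26, None)
```

Because it only looks at the following row, this approach does not depend on time increasing by exactly 1, so the gap between 2 and 4 in the question's data is handled.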