How to sum rows in groups of 3? - sql

I have a table that looks like this:
id | amount
1 | 8
2 | 3
3 | 9
3 | 2
4 | 5
5 | 3
5 | 1
5 | 7
6 | 3
7 | 3
8 | 5
I need a query that returns the summed amount of rows, grouped by every 3 consecutive IDs. The result should be:
ids (not a necessary column, just to explain better) | amount
1,2,3 | 22
4,5,6 | 19
7,8 | 8
In my table, you can assume IDs are always consecutive, so there can't be a 10 without a 9 also existing. However, the same ID can appear multiple times with different amounts (as in my example above).

Assuming ID is a numeric data type.
Demo
SELECT max(id) maxID, SUM(Amount) as Amount
FROM TBLNAME
GROUP BY Ceiling(id/3.0)
ORDER BY maxID
Giving us:
+-------+--------+
| maxid | amount |
+-------+--------+
| 3 | 22 |
| 6 | 19 |
| 8 | 8 |
+-------+--------+
Doc Link: Ceiling
MaxID is included just so the order by makes sense and validation of totals can occur.
I used 3.0 instead of 3 to force an implicit cast to a decimal data type (a hack, I know, but it works). Otherwise integer division takes place, and the truncation it performs produces an incorrect result.
Without the .0 on the 3.0 divisor we'd get:
+-------+--------+
| maxid | amount |
+-------+--------+
| 2 | 11 |
| 5 | 27 |
| 8 | 11 |
+-------+--------+
Ceiling() is used over floor() since floor() would not allow aggregation of 1-3 in the same set.
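As a cross-check of the grouping above, here is a minimal sketch that runs the question's sample data through an in-memory SQLite database from Python. Note that it swaps in a different technique, integer division on (id - 1), rather than Ceiling(id/3.0), so no decimal cast is needed. The table name t and the choice of SQLite are assumptions made only for illustration.

```python
import sqlite3

# Sample data from the question; the table name "t" is an arbitrary choice.
rows = [(1, 8), (2, 3), (3, 9), (3, 2), (4, 5), (5, 3),
        (5, 1), (5, 7), (6, 3), (7, 3), (8, 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# With integer operands, (id - 1) / 3 truncates:
# ids 1-3 -> 0, ids 4-6 -> 1, ids 7-8 -> 2.
query = """
    SELECT MAX(id) AS maxid, SUM(amount) AS amount
    FROM t
    GROUP BY (id - 1) / 3
    ORDER BY maxid
"""
result = conn.execute(query).fetchall()
print(result)
```

Shifting the id by one before dividing is what keeps 1-3 in the same bucket; plain id / 3 with integer math would reproduce the incorrect 1-2, 3-5, 6-8 split shown above.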

Related

Selecting rows of a table in which some special values exist

The first column of a table contains IDs, and the other columns contain the numbers that correspond to each ID. Given some special numbers, we want to select the rows in which all of those special numbers appear among the numbers corresponding to the ID. For example, suppose we have the following table and the special numbers are 2 and 5. We want to select the rows in which both 2 and 5 appear in the columns other than Id:
| Id | corresponded numbers
|----|----------------------
| 1 | 2 | 3 | 5 |
| 2 | 1 | 5 |
| 3 | 1 | 2 | 4 | 5 | 7 |
| 4 | 3 | 5 | 6 |
Therefore, we want to have the following table as the result:
| Id | corresponded numbers
|----|----------------------
| 1 | 2 | 3 | 5 |
| 3 | 1 | 2 | 4 | 5 | 7 |
Could you please suggest a function in Excel or a query in SQL to do the above selection?
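-- Caveat: CHARINDEX matches substrings, so '2' also matches values such as 12 or 25.
-- This is only reliable while every stored number is a single digit.
SELECT id,
[corresponded numbers]
FROM TableName
WHERE (charIndex('2', [corresponded numbers]) > 0
AND charIndex('5', [corresponded numbers]) > 0)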
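If the numbers can be stored one per row rather than in a single text column, the selection can be done without substring matching. The following is only a sketch under that assumption: the id_numbers table and its layout are hypothetical and not part of the original question, and SQLite is used from Python purely so the example is runnable.

```python
import sqlite3

# Hypothetical normalized layout: one (id, num) row per corresponding number.
data = {1: [2, 3, 5], 2: [1, 5], 3: [1, 2, 4, 5, 7], 4: [3, 5, 6]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_numbers (id INTEGER, num INTEGER)")
conn.executemany(
    "INSERT INTO id_numbers VALUES (?, ?)",
    [(i, n) for i, nums in data.items() for n in nums],
)

# Keep only the special numbers, then require that an id has both of them.
query = """
    SELECT id
    FROM id_numbers
    WHERE num IN (2, 5)
    GROUP BY id
    HAVING COUNT(DISTINCT num) = 2
    ORDER BY id
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)
```

Because the comparison is on whole numbers, a value such as 12 or 25 is never mistaken for 2 or 5.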

How to match variable data in SQL Server

I need to map a many-to-many relationship between two flat tables. Table A contains a list of possible configurations (where each column is a question and the cell value is the answer). NULL values denote that the answer does not matter. Table B contains actual configurations with the same columns.
Ultimately, I need the final results to show which configurations are mapped between table B and A:
Example
ActualId | ConfigId
---------+---------
5 | 1
6 | 1
8 | 2
. | .
. | .
. | .
N | M
To give a simple example of the tables and data I'm working with, the first table would look like this:
Table A
--------
ConfigId | Size | Color | Cylinders | ... | ColumnN
---------+------+-------+-----------+-----+--------
1 | 3 | | 4 | ... | 5
2 | 4 | 5 | 5 | ... | 5
3 | | 5 | | ... | 5
And Table B would look like this:
Table B
-------
ActualId | Size | Color | Cylinders | ... | ColumnN
---------+------+-------+-----------+-----+--------
1 | 3 | 1 | 4 | ... | 5
2 | 3 | 8 | 4 | ... | 5
3 | 4 | 5 | 5 | ... | 5
4 | 7 | 5 | 6 | ... | 5
Since the NULL values denote that any value can work, the expected result would be:
Expected
---------
ActualId | ConfigId
---------+---------
1 | 1
2 | 1
3 | 2
3 | 3
4 | 3
I'm trying to figure out the best way to go about matching the actual data which has over a hundred columns. I know trying to check each and every column for NULL values is absolutely wrong and will not perform well. I'm really fascinated with this problem and would love some help to find the best way to tackle this.
This joins table A to table B on size, color and cylinders.
The size match compares A against B:
If A.SIZE is NULL, the comparison becomes B.SIZE = B.SIZE, which is true whenever B.SIZE is not NULL.
If A.SIZE is not NULL, the comparison is A.SIZE = B.SIZE, which is true only if they match.
The matching on color and cylinders is similar.
SELECT * FROM TABLEA A
INNER JOIN TABLEB B ON ISNULL(A.SIZE, B.SIZE)=B.SIZE
AND ISNULL(A.COLOR, B.COLOR)=B.COLOR
AND ISNULL(A.CYLINDERS, B.CYLINDERS)=B.CYLINDERS
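To check the join against the expected result, here is a small sketch that loads the three example columns into an in-memory SQLite database from Python. It uses COALESCE in place of SQL Server's ISNULL (for two arguments they behave the same way here), and the lowercase table names are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (ConfigId INTEGER, Size INTEGER, Color INTEGER, Cylinders INTEGER)")
conn.execute("CREATE TABLE table_b (ActualId INTEGER, Size INTEGER, Color INTEGER, Cylinders INTEGER)")

# None is stored as NULL, meaning "any value is acceptable" in table_a.
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?, ?)",
                 [(1, 3, None, 4), (2, 4, 5, 5), (3, None, 5, None)])
conn.executemany("INSERT INTO table_b VALUES (?, ?, ?, ?)",
                 [(1, 3, 1, 4), (2, 3, 8, 4), (3, 4, 5, 5), (4, 7, 5, 6)])

# A NULL in table_a falls back to table_b's own value, so that column always matches.
query = """
    SELECT b.ActualId, a.ConfigId
    FROM table_a a
    INNER JOIN table_b b
        ON  COALESCE(a.Size, b.Size) = b.Size
        AND COALESCE(a.Color, b.Color) = b.Color
        AND COALESCE(a.Cylinders, b.Cylinders) = b.Cylinders
    ORDER BY b.ActualId, a.ConfigId
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

The pairs match the Expected table from the question for these three columns. With over a hundred columns the ON clause grows accordingly, so generating it from the column list may be worth considering.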

SQL: Add values according to index columns only for lines sharing an id

Yesterday I asked this question: SQL: How to add values according to index columns but I found out that my problem is a bit more complicated:
I have a table like this
id | value| position | relates_to_position |type
19 | 100 | 2 | NULL | 1
19 | 50 | 6 | NULL | 2
19 | 20 | 7 | 6 | 3
20 | 30 | 3 | NULL | 2
20 | 10 | 4 | 3 | 3
From this I need to create the resulting table, which adds all the lines where the relates_to_position value matches the position value, but only for lines sharing the same id!
The resulting table should be
id | value| position |type
19 | 100 | 2 | 1
19 | 70 | 6 | 2
20 | 40 | 3 | 2
I am using Oracle 11. There is only one level of recursion, meaning a line never refers to a line that itself has the relates_to_position field set.
I think the following query will do this:
select id, coalesce(relates_to_position, position) as position,
sum(value) as value, min(type) as type
from t
group by id, coalesce(relates_to_position, position);
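The query can be tried against the question's sample rows with a quick sketch like the one below. It runs on an in-memory SQLite database from Python rather than Oracle 11, which is an assumption made only to keep the example self-contained; the table name t comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (
        id INTEGER, value INTEGER, position INTEGER,
        relates_to_position INTEGER, type INTEGER
    )
""")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (19, 100, 2, None, 1),
    (19, 50, 6, None, 2),
    (19, 20, 7, 6, 3),
    (20, 30, 3, None, 2),
    (20, 10, 4, 3, 3),
])

# Rows that relate to another position are folded into that position's group;
# grouping by id as well keeps positions of different ids apart.
query = """
    SELECT id,
           COALESCE(relates_to_position, position) AS position,
           SUM(value) AS value,
           MIN(type) AS type
    FROM t
    GROUP BY id, COALESCE(relates_to_position, position)
    ORDER BY 1, 2
"""
summed = conn.execute(query).fetchall()
print(summed)
```

Taking MIN(type) relies on the parent line having the smaller type value, as it does in the sample data.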

ActiveRecord select records based on uniqueness of two attributes in combination

My table looks like this:
ID | Multiple | Itemlist_ID | Inventory_ID
----------------------------------------------------------------
1 | 1 | 1 | 1
2 | 1 | 1 | 2
3 | 1 | 1 | 3
4 | 1 | 4 | 2
5 | 1 | 4 | 3
6 | 2 | 4 | 2
7 | 2 | 4 | 3
How do I retrieve records with unique combo of Multiple and Itemlist_ID? For example below:
ID | Multiple | Itemlist_ID | Inventory_ID
----------------------------------------------------------------
1 | 1 | 1 | 1
4 | 1 | 4 | 2
6 | 2 | 4 | 2
Note, this is retrieving for a View where the Inventory_ID won't be shown, so I'm not concerned whether I get back records [1,4,6] or [1,5,7] or [2,4,6]. Using a first command is fine.
You can perform a GROUP BY query in ActiveRecord using this syntax (replace Model with your class name):
Model.group('Multiple', 'Itemlist_ID').select('Multiple', 'Itemlist_ID')
This would retrieve all unique combinations of (Multiple, Itemlist_ID).
You could optionally add aggregate operations on columns other than the two grouped columns, such as SUM, AVG, COUNT, etc. For example, if you wanted to know how many records are in each group:
Model.group('Multiple', 'Itemlist_ID').select('Multiple', 'Itemlist_ID', 'COUNT(1) AS group_count')
This would add an attribute called 'group_count' to the result, holding the number of records in each group.
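For readers who want to see the grouping at the SQL level, the sketch below runs a GROUP BY of the same shape on the question's data using SQLite from Python. This is not the SQL that ActiveRecord emits, only an approximation; the items table name and the extra MIN(ID) column (to pick one representative record per group) are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        ID INTEGER, Multiple INTEGER, Itemlist_ID INTEGER, Inventory_ID INTEGER
    )
""")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 1), (2, 1, 1, 2), (3, 1, 1, 3), (4, 1, 4, 2),
    (5, 1, 4, 3), (6, 2, 4, 2), (7, 2, 4, 3),
])

# One output row per distinct (Multiple, Itemlist_ID) pair.
# MIN(ID) picks the first record of each group; COUNT(1) is the group size.
query = """
    SELECT MIN(ID) AS first_id, Multiple, Itemlist_ID, COUNT(1) AS group_count
    FROM items
    GROUP BY Multiple, Itemlist_ID
    ORDER BY first_id
"""
groups = conn.execute(query).fetchall()
print(groups)
```

The first_id values 1, 4 and 6 correspond to the records the question lists as an acceptable result.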

In hive, how to do a calculation among 2 rows?

I have this table.
+------------------------------------------------------------+
| ks | time | val1 | val2 |
+-------------+---------------+---------------+--------------+
| A | 1 | 1 | 1 |
| B | 1 | 3 | 5 |
| A | 2 | 6 | 7 |
| B | 2 | 10 | 12 |
| A | 4 | 6 | 7 |
| B | 4 | 20 | 26 |
+------------------------------------------------------------+
What I want to get is for each row,
ks | time | val1 | val1 of next ts of same ks |
To be clear, result of above example should be,
+------------------------------------------------------------+
| ks | time | val1 | next.val1 |
+-------------+---------------+---------------+--------------+
| A | 1 | 1 | 6 |
| B | 1 | 3 | 10 |
| A | 2 | 6 | 6 |
| B | 2 | 10 | 20 |
| A | 4 | 6 | null |
| B | 4 | 20 | null |
+------------------------------------------------------------+
(I need the same next for value2 as well)
I tried hard to come up with a Hive query for this, but had no luck. I was able to write a query for this in SQL as mentioned here (Quassnoi's answer), but couldn't create the equivalent in Hive because Hive doesn't support subqueries in the SELECT clause.
Can someone please help me achieve this?
Thanks in advance.
EDIT:
Query I tried was,
SELECT ks, time, val1, next[0] as next.val1 from
(SELECT ks, time, val1
COALESCE(
(
SELECT Val1, time
FROM myTable mi
WHERE mi.val1 > m.val1 AND mi.ks = m.ks
ORDER BY time
LIMIT 1
), CAST(0 AS BIGINT)) AS next
FROM myTable m
ORDER BY time) t2;
Your query seems quite similar to the "year ago" reporting that is ubiquitous in financial reporting. I think a LEFT OUTER JOIN is what you are looking for.
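We join table myTable to itself, naming the two instances of the same table m and n. For every entry in the first table m we attempt to find a matching record in n with the same ks value and a time value exactly one greater. If this record does not exist, all column values for n will be NULL. Note that this only finds the next row when the time values within each ks are consecutive; where there is a gap (such as 2 to 4 in the question's data), the join finds no match and returns NULL for that row.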
SELECT
m.ks,
m.time,
m.val1,
n.val1 as next_val1,
m.val2,
n.val2 as next_val2
FROM
myTable m
LEFT OUTER JOIN
myTable n
ON (
m.ks = n.ks
AND
m.time + 1 = n.time
);
For data whose time values are consecutive (1, 2, 3), this returns the following.
ks time val1 next_val1 val2 next_val2
A 1 1 6 1 7
A 2 6 6 7 7
A 3 6 NULL 7 NULL
B 1 3 10 5 12
B 2 10 20 12 26
B 3 20 NULL 26 NULL
Hope that helps.
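When the time values have gaps, as in the question's data (1, 2, 4), one alternative to the self-join is the LEAD window function, which Hive supports from version 0.11. The sketch below is a different technique from the join above and is run on an in-memory SQLite database from Python (SQLite 3.25 or later is needed for window functions); the my_table name and the SQLite setup are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (ks TEXT, time INTEGER, val1 INTEGER, val2 INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", [
    ("A", 1, 1, 1), ("B", 1, 3, 5), ("A", 2, 6, 7),
    ("B", 2, 10, 12), ("A", 4, 6, 7), ("B", 4, 20, 26),
])

# LEAD looks at the following row within the same ks, ordered by time,
# so it does not matter whether time increases by 1 or by more.
query = """
    SELECT ks, time,
           val1, LEAD(val1) OVER (PARTITION BY ks ORDER BY time) AS next_val1,
           val2, LEAD(val2) OVER (PARTITION BY ks ORDER BY time) AS next_val2
    FROM my_table
    ORDER BY ks, time
"""
with_next = conn.execute(query).fetchall()
for row in with_next:
    print(row)
```

The last row of each ks has no successor, so its next_val1 and next_val2 come back as NULL (None in Python).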
I find that using Hive custom map/reduce functionality works great to solve queries similar to this. It gives you the opportunity to consider a set of input and "reduce" to one (or more) results.
This answer discusses the solution.
The key is to use CLUSTER BY to send all rows with the same key value to the same reducer, and hence to the same reduce script. The script collects rows for the current key, outputs the reduced result when the key changes, and then starts collecting for the new key.
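A reduce script along those lines might look roughly like the sketch below. It is only an illustration: the function name attach_next, the tab-separated column order (ks, time, val1) and the use of \N for NULL are assumptions, and it expects rows to arrive grouped by ks and sorted by time within each ks (for example via DISTRIBUTE BY ks SORT BY ks, time, since CLUSTER BY alone sorts only on the key).

```python
NULL = "\\N"  # assumption: the two characters \N stand for NULL in the script's output


def attach_next(lines):
    """Yield each input row with the val1 of the following row of the same ks appended.

    Input rows are tab-separated (ks, time, val1), grouped by ks and sorted by time.
    """
    prev = None
    for line in lines:
        ks, time, val1 = line.rstrip("\n").split("\t")[:3]
        if prev is not None:
            # Same key: the current row supplies the previous row's "next" value.
            # Different key: the previous row was the last one for its key.
            next_val = val1 if prev[0] == ks else NULL
            yield "\t".join([prev[0], prev[1], prev[2], next_val])
        prev = (ks, time, val1)
    if prev is not None:
        # The final row overall has no successor.
        yield "\t".join([prev[0], prev[1], prev[2], NULL])


# Sample input in the order a reducer would see it.
sample = ["A\t1\t1", "A\t2\t6", "A\t4\t6", "B\t1\t3", "B\t2\t10", "B\t4\t20"]
output = list(attach_next(sample))
for row in output:
    print(row)
```

In an actual reduce script the sample list would be replaced by sys.stdin, with each yielded line printed to standard output.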