Merging rows with the same id - sql

I have many tables in my database, each with the columns RowID, ProductID and Numeric Value (Numeric Value can be NULL, which means infinity). A table can contain many rows with the same ProductID. Is it possible to write a function that takes some of these tables (not all, only the ones I choose) and returns a new table that contains every ProductID from those tables exactly once, with Numeric Value being the sum over all rows for that ProductID across the tables? E.g.
Table1:
RowID ProductID Numeric Value
0 1 1.5
1 1 3.5
2 2 4
Table2:
RowID ProductID Numeric Value
0 1 6
1 3 1.25
2 3 NULL
Return Table:
ProductID Numeric Value
1 11 (1.5+3.5+6)
2 4
3 NULL (1.25 + NULL)
*It can also return 0 instead of NULL: all Numeric Values are positive, so 0 can represent infinity.

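I think you can first combine the rows of the two tables and then group them:
SELECT PID, CASE WHEN COUNT(*) = COUNT(NV) THEN SUM(NV) ELSE NULL END AS "Numeric Value"
FROM (SELECT ProductID AS PID, "Numeric Value" AS NV
FROM TABLE_1
UNION ALL
SELECT ProductID, "Numeric Value"
FROM TABLE_2) T
GROUP BY PID
COUNT(*) counts every row in the group while COUNT(NV) counts only the non-NULL ones, so the sum is returned only when no row for that product is NULL.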
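To check the query against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite is only a stand-in (the question does not name a database engine), and the table names Table1 and Table2 plus the alias Total are assumptions made for the illustration.

```python
import sqlite3

# In-memory SQLite database used purely as a scratch pad (assumption:
# the real tables live in some other engine).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (RowID INTEGER, ProductID INTEGER, "Numeric Value" REAL);
    CREATE TABLE Table2 (RowID INTEGER, ProductID INTEGER, "Numeric Value" REAL);
    INSERT INTO Table1 VALUES (0, 1, 1.5), (1, 1, 3.5), (2, 2, 4);
    INSERT INTO Table2 VALUES (0, 1, 6), (1, 3, 1.25), (2, 3, NULL);
""")

# Stack the chosen tables with UNION ALL, then group per product.
# The CASE yields NULL as soon as one row in the group is NULL.
merged = conn.execute("""
    SELECT PID,
           CASE WHEN COUNT(*) = COUNT(NV) THEN SUM(NV) END AS Total
    FROM (SELECT ProductID AS PID, "Numeric Value" AS NV FROM Table1
          UNION ALL
          SELECT ProductID, "Numeric Value" FROM Table2) AS T
    GROUP BY PID
    ORDER BY PID
""").fetchall()

print(merged)  # [(1, 11.0), (2, 4.0), (3, None)]
```

Adding another table to the result is just one more UNION ALL branch inside the subquery.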

Related

Grouping sequence number in SQL

I have a table like below.
DECLARE @Table TABLE (
[Text] varchar(100),
[Order] int,
[RequiredResult] int
);
INSERT INTO @Table
VALUES
('A',1,1),
('B',2,1),
('C',3,1),
('D',1,2),
('A',2,2),
('B',3,2),
('G',4,2),
('H',1,3),
('B',2,3);
I have used dense_rank, but the results are not correct.
select [Text], [Order], RequiredResult
, DENSE_RANK() OVER (ORDER BY [text],[Order]) AS ComputedResult
from @Table;
Results:
Text  Order  RequiredResult  ComputedResult
A     1      1               1
A     2      2               2
B     2      1               3
B     2      3               3
B     3      2               4
C     3      1               5
D     1      2               6
G     4      2               7
H     1      3               8
Please help me to calculate the RequiredResult column.
It looks like the RequiredResult column is simply a group number that increases by one each time the sequence in the Order column restarts, when you process the records in the order they were inserted.
This is a typical data-island analysis task, except that in this case the islands are the sets of sequentially numbered rows and the boundary is where the numbering resets back to 1.
Record the input sequence by adding an IDENTITY column to the table variable.
Calculate an island identifier.
Because the rows within an island are in sequence based on the Order column, we can calculate a number that is unique per island by subtracting Order from the IDENTITY column, in this case Id.
We can then use DENSE_RANK(), ordering by the island number.
Putting all that together:
DECLARE @Table TABLE (
[Id] int IDENTITY(1,1),
[Text] varchar(100),
[Order] int,
[RequiredResult] int
);
INSERT INTO @Table
VALUES
('A',1,1),
('B',2,1),
('C',3,1),
('D',1,2),
('A',2,2),
('B',3,2),
('G',4,2),
('H',1,3),
('B',2,3);
SELECT [Text],[Order]
, [Id]-[Order] as Island
, RequiredResult
, DENSE_RANK() OVER (ORDER BY [Id]-[Order]) AS CalculatedResult
FROM @Table
ORDER BY [Id]
Text  Order  Island  RequiredResult  CalculatedResult
A     1      0       1               1
B     2      0       1               1
C     3      0       1               1
D     1      3       2               2
A     2      3       2               2
B     3      3       2               2
G     4      3       2               2
H     1      7       3               3
B     2      7       3               3
The key here is that we need to record the input sequence so we can use it in the calculation. It doesn't matter what actual values the Id column has, only that they are also in sequence. If that number sequence is broken, you could use the result of the ROW_NUMBER() function to calculate the island number instead, but the specifics would depend on the initial query that provides the basic sequential dataset.
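The Id minus Order trick can be tried outside SQL Server too. The following is a rough sketch using Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (needed for window functions). The table name t and the column names txt and ord are stand-ins chosen to avoid quoting reserved words, and an AUTOINCREMENT key plays the role of the IDENTITY column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- id records the insertion order, like the IDENTITY column above
    CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, txt TEXT, ord INTEGER);
    INSERT INTO t (txt, ord) VALUES
        ('A', 1), ('B', 2), ('C', 3),
        ('D', 1), ('A', 2), ('B', 3), ('G', 4),
        ('H', 1), ('B', 2);
""")

# id - ord is constant inside one run of consecutive ord values,
# so ranking by it numbers the runs 1, 2, 3, ...
rows = conn.execute("""
    SELECT txt, ord, id - ord AS island,
           DENSE_RANK() OVER (ORDER BY id - ord) AS calculated
    FROM t
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

The last column comes out as 1, 1, 1, 2, 2, 2, 2, 3, 3, which matches RequiredResult in the question.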
You seem to have an ordering in mind for the rows. SQL tables represent unordered (multi)sets. The only column in your data that has the appropriate ordering is text, but your real data might have another column with this information.
Basically, you just want a cumulative sum of the number of 1s up to each row. That would be:
select t.*,
sum(case when ord = 1 then 1 else 0 end) over (order by text)
from t
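As a hedged illustration of the cumulative-sum idea, the sketch below assumes the table does have an explicit ordering column. Here it is a hypothetical seq column, which is not in the question's data. It runs on SQLite through Python's sqlite3 module (SQLite 3.25+ assumed for window functions), and the names t, txt and ord are again stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- seq is a hypothetical column that carries the intended row order
    CREATE TABLE t (seq INTEGER PRIMARY KEY, txt TEXT, ord INTEGER);
    INSERT INTO t VALUES
        (1, 'A', 1), (2, 'B', 2), (3, 'C', 3),
        (4, 'D', 1), (5, 'A', 2), (6, 'B', 3), (7, 'G', 4),
        (8, 'H', 1), (9, 'B', 2);
""")

# Every ord = 1 starts a new group, so a running count of the 1s
# seen so far is the group number.
groups = [g for (g,) in conn.execute("""
    SELECT SUM(CASE WHEN ord = 1 THEN 1 ELSE 0 END) OVER (ORDER BY seq) AS grp
    FROM t
    ORDER BY seq
""")]

print(groups)  # [1, 1, 1, 2, 2, 2, 2, 3, 3]
```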

Get previous value from column A when column B is not null in Hive

I have a table tableA below
ID number Estimate Client
---- ------
1 3 8 A
1 NULL 10 Null
1 5 11 A
1 NULL 19 Null
2 NULL 20 Null
2 2 70 A
.......
I would like to select the Estimate value of the previous row whenever the number column is not null. For instance, when number = 3, then pre_estimate = NULL; when number = 5, then pre_estimate = 10; and when number = 2, then pre_estimate = 20.
The query below does not seem to return the correct answer in Hive. What would be the correct way to do it?
select lag(Estimate, 1) OVER (partition by ID) as prev_estimate
from tableA
where number is not null
Consider the table with following structure:
number - int
estimate - int
order_column - int
order_column is taken as a column on which you want to sort your table rows.
Data in table:
number estimate order_column
3 8 1
NULL 10 2
5 11 3
NULL 19 4
NULL 20 5
2 70 6
I used the following query and got the result you have mentioned.
SELECT *
FROM (SELECT number, estimate,
lag(estimate, 1) over (order by order_column) as prev_estimate
from tableA) tbl
where tbl.number is not null;
As far as I can tell there is no reason to partition by ID, which is why I left ID out of the table.
You were getting wrong results because the WHERE clause in the main query is applied first, so only the rows with a non-null number remain when the lag function is computed. You need to compute lag over all the rows and only then select the rows where number is not null.
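The difference between the two evaluation orders is easy to see side by side. This is a small sketch on SQLite via Python's sqlite3 module, used here only as a stand-in for Hive (window functions need SQLite 3.25+). It reuses the tableA layout and data from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (number INTEGER, estimate INTEGER, order_column INTEGER);
    INSERT INTO tableA VALUES
        (3, 8, 1), (NULL, 10, 2), (5, 11, 3),
        (NULL, 19, 4), (NULL, 20, 5), (2, 70, 6);
""")

# WHERE runs before the window function, so LAG only sees the
# three rows that survived the filter.
filter_then_lag = conn.execute("""
    SELECT number, LAG(estimate, 1) OVER (ORDER BY order_column) AS prev_estimate
    FROM tableA
    WHERE number IS NOT NULL
    ORDER BY order_column
""").fetchall()

# LAG is computed in the subquery over all six rows; the filter
# is applied afterwards.
lag_then_filter = conn.execute("""
    SELECT number, prev_estimate
    FROM (SELECT number, order_column,
                 LAG(estimate, 1) OVER (ORDER BY order_column) AS prev_estimate
          FROM tableA) AS tbl
    WHERE number IS NOT NULL
    ORDER BY order_column
""").fetchall()

print(filter_then_lag)  # [(3, None), (5, 8), (2, 11)]
print(lag_then_filter)  # [(3, None), (5, 10), (2, 20)]
```

Only the second result matches the values asked for in the question (NULL, 10 and 20).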

Use case after order by

I was reading an SQL book, and one of the questions is:
Write a query against the Sales.Customers table that returns for each customer the customer ID and region. Sort the rows in the output by region, having NULL marks sort last (after non-NULL values). Note that the default sort behavior for NULL marks in T-SQL is to sort first (before non-NULL values).
And the answer is:
SELECT custid, region
FROM Sales.Customers
ORDER BY
CASE WHEN region IS NULL THEN 1 ELSE 0 END, region;
I can kind of get the idea but am still confused. Let's take the record with custid = 9 for instance:
since custid 9 has a null region, the CASE expression returns 1, so the query is something like:
ORDER BY 1, region
which is equivalent to:
ORDER BY custid, region --because custid is the first column
So how come custid 9 does not come before custid 10 (the second record in the output)? Shouldn't the output be ordered by custid first, so that 9 comes before 10?
Your interpretation is incorrect. The 1 is simply a number, not a column reference.
The query is equivalent to:
SELECT custid, region
FROM (SELECT c.*,
(CASE WHEN region IS NULL THEN 1 ELSE 0 END) as region_is_null
FROM Sales.Customers c
) c
ORDER BY region_is_null, region;
This is an important distinction about numbers in the ORDER BY. The expression:
ORDER BY 1
refers to the first column. However,
ORDER BY 1 + 0
is simply a numeric expression that returns the constant 1 -- and will result in an error in SQL Server (which does not allow constants in ORDER BY).
so the query is something like
ORDER BY 1, region
No, this is incorrect. The expression CASE WHEN region IS NULL THEN 1 ELSE 0 END is evaluated per row, and the 1 is a value, not a column position. A column position inside ORDER BY can only be specified as a literal, not as an expression. So this:
custid region
8 NULL
9 NULL
10 BC
42 BC
45 CA
Becomes:
custid region case...
8 NULL 1
9 NULL 1
10 BC 0
42 BC 0
45 CA 0
And the sorted results could be:
custid region case...
10 BC 0
42 BC 0
45 CA 0
8 NULL 1
9 NULL 1
Or:
custid region case...
42 BC 0
10 BC 0
45 CA 0
9 NULL 1
8 NULL 1
You can try the query below. In your case 0 will come first and then 1, so you need to swap the values, or you can sort in descending order if you don't want to change the values.
SELECT custid, region
FROM Sales.Customers
ORDER BY
CASE WHEN region IS NULL THEN 0 ELSE 1 END, region
The idea is to use a CASE expression to create a calculated virtual column that marks the NULLs as 0 and the non-NULLs as 1, and then sort accordingly.
If you use 0 in the ORDER BY clause you will get an error because there is no column at position 0. Also, if you reorder the selected columns the result will be the same.
So the output of the CASE expression is not a column position; it is a calculated column.
customer_id region marker
not important if null 0
ORDER BY CASE
WHEN region IS NULL THEN
1
ELSE
0
END,
region
is not equivalent to
ORDER BY 1,
region
because in the second one the first column to sort by is always constant, whereas in the first it can change depending on the CASE.
And
ORDER BY 1,
region
is also not equivalent to
ORDER BY custid,
region
again in the first the 1 is constant but custid is variable.
What
ORDER BY CASE
WHEN region IS NULL THEN
1
ELSE
0
END,
region
does is to "generate" a new column to sort by, depending on the content of region. That new column gets 1 when region is NULL and 0 otherwise. If you imagine this new column in the table, it would look like
custid | region | new column
...
10 | BC | 0
...
9 | NULL | 1
...
Now if this gets sorted by the new column and the region the customer with ID 10 comes before the customer with ID 9 because the one with ID 10 has the lower value for the new column -- 0 against the 1 from the customer with the ID 9.
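To see the effect concretely, here is a minimal sketch on SQLite through Python's sqlite3 module. SQLite is only a stand-in for SQL Server here; like T-SQL, it sorts NULLs first by default. The table is called Customers (no Sales schema), the five rows are the ones used in the example above, and custid is added as a last sort key purely to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (custid INTEGER, region TEXT);
    INSERT INTO Customers VALUES
        (8, NULL), (9, NULL), (10, 'BC'), (42, 'BC'), (45, 'CA');
""")

# Default behaviour: NULL regions sort before everything else.
plain = conn.execute("""
    SELECT custid, region FROM Customers
    ORDER BY region, custid
""").fetchall()

# The CASE expression is computed for every row (0 or 1) and is the
# first sort key, so rows with a NULL region are pushed to the end.
nulls_last = conn.execute("""
    SELECT custid, region FROM Customers
    ORDER BY CASE WHEN region IS NULL THEN 1 ELSE 0 END, region, custid
""").fetchall()

print(plain)       # [(8, None), (9, None), (10, 'BC'), (42, 'BC'), (45, 'CA')]
print(nulls_last)  # [(10, 'BC'), (42, 'BC'), (45, 'CA'), (8, None), (9, None)]
```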

Find values which are present in all columns in a Table

I would like a SQL Server query that finds the IDs whose rows, taken together, fill all of the value columns. For example, if I have this table:
ID Value1 Value2 Value3
1 2 NULL NULL
1 NULL 3 NULL
1 NULL NULL 4
1 3.4 NULL NULL
2 NULL 3 NULL
2 NULL NULL NULL
3 NULL NULL 91
As in the table above, at most 2 of the columns are filled in any row (the first is ID and the second is one of Value1, Value2 or Value3), and an ID can be repeated multiple times.
I want to return only the ID 1, because 1 is the only ID that populates all three of the other columns. ID 2 fills only Value2 (all the values in its second row are NULL), whereas ID 3 appears only with Value3. Is there some way to find the IDs that fill all the other columns?
I would prefer to do this without a cursor, but I can use a cursor if it's necessary. Thanks
EDIT
Desired Table:
ID
1
The Statement should return only the filtered IDs which populate all the other columns.
Try this
SELECT id
FROM TableName
GROUP BY id
HAVING MAX(value1) IS NOT NULL AND
MAX(value2) IS NOT NULL AND
MAX(value3) IS NOT NULL
Something for you to try if you want fewer lines of code:
select ID from dbo.Table_1 group by ID having count(Value1) > 0 AND count(Value2) > 0 AND count(Value3) > 0
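Both answers rely on aggregates ignoring NULLs: COUNT(Value1) counts only the non-NULL entries in a group. A quick sketch on the question's sample data, run on SQLite from Python as a stand-in for SQL Server (the table name Table_1 without the dbo schema is an assumption for the illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_1 (ID INTEGER, Value1 REAL, Value2 REAL, Value3 REAL);
    INSERT INTO Table_1 VALUES
        (1, 2,    NULL, NULL),
        (1, NULL, 3,    NULL),
        (1, NULL, NULL, 4),
        (1, 3.4,  NULL, NULL),
        (2, NULL, 3,    NULL),
        (2, NULL, NULL, NULL),
        (3, NULL, NULL, 91);
""")

# An ID qualifies only if each value column has at least one
# non-NULL entry somewhere among that ID's rows.
ids = [i for (i,) in conn.execute("""
    SELECT ID
    FROM Table_1
    GROUP BY ID
    HAVING COUNT(Value1) > 0 AND COUNT(Value2) > 0 AND COUNT(Value3) > 0
    ORDER BY ID
""")]

print(ids)  # [1]
```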

Find all ids which lie between Range and are present in a common foreign key id

I want all the specific Ids which have a common other Id. I am sending the data through a user-defined table type.
CREATE TYPE rangeType AS TABLE (
ID2 int NOT NULL,
StartRange int NULL,
EndRange int NULL
);
The table looks like the following:
ID1 ID2 Value
11 2 3
12 2 4
12 3 8.9
15 3 10
15 2 4
The value I will send will be of the form
DECLARE @temp_table rangeType
INSERT INTO @temp_table values (2,4,10)
INSERT INTO @temp_table values (3,5,10)
So I want the output to be all those ID1s that have rows for both ID2 = 2 and ID2 = 3, where the rows with ID2 = 2 have a value between 4 and 10 and the rows with ID2 = 3 have a value between 5 and 10.
So my output, in this case, should be
ID1
12
15
as ID1 12 and 15 both map to ID2 2 and 3, with values inside the respective specified ranges.
I tried an inner join on the table followed by a BETWEEN operator. It returns values, but the conditions are combined as an OR operation rather than the AND operation I want.
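You can use the query below. The GROUP BY and HAVING keep only the ID1 values that matched both ID2 conditions:
SELECT ID1 FROM TABLE
WHERE ID2 IN (2,3)
AND
CASE WHEN ID2 = 2 AND VALUE >= 4 AND VALUE <= 10 THEN 1
WHEN ID2 = 3 AND VALUE >= 5 AND VALUE <= 10 THEN 1
ELSE 0
END = 1
GROUP BY ID1
HAVING COUNT(DISTINCT ID2) = 2;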
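If the ranges should come from the table-valued parameter rather than being hard-coded, one possible approach is to join against it and require that every range row was matched. The sketch below is an illustration under assumptions, not the asker's actual schema: it runs on SQLite from Python, uses an ordinary table named ranges in place of the rangeType parameter, and calls the data table data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (ID1 INTEGER, ID2 INTEGER, Value REAL);
    INSERT INTO data VALUES
        (11, 2, 3), (12, 2, 4), (12, 3, 8.9), (15, 3, 10), (15, 2, 4);

    -- stands in for the rangeType table parameter
    CREATE TABLE ranges (ID2 INTEGER NOT NULL, StartRange INTEGER, EndRange INTEGER);
    INSERT INTO ranges VALUES (2, 4, 10), (3, 5, 10);
""")

# The join keeps rows whose value lies inside the range for their ID2.
# HAVING then demands that an ID1 matched every ID2 listed in ranges,
# which turns the per-row OR into the required AND.
matches = [i for (i,) in conn.execute("""
    SELECT d.ID1
    FROM data AS d
    JOIN ranges AS r
      ON r.ID2 = d.ID2
     AND d.Value BETWEEN r.StartRange AND r.EndRange
    GROUP BY d.ID1
    HAVING COUNT(DISTINCT d.ID2) = (SELECT COUNT(*) FROM ranges)
    ORDER BY d.ID1
""")]

print(matches)  # [12, 15]
```

ID1 11 drops out because its only row (ID2 = 2, value 3) falls below the start of the range.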