Hive SQL: loop through a table comparing values

I have a table in hive that looks like the below
fruit value
apple 2
apple 3
apple 4
plum 2
plum 3
plum 4
I want to loop through the table, compare each row's fruit with the previous row's, and create a new column (total) based on that comparison. This would be the logic:
if [fruit] = previous[fruit] then total = previous[value]
The new table should look like this
fruit value total
apple 2
apple 3 2
apple 4 3
plum 2
plum 3 2
plum 4 3
How can I achieve this using SQL in Hive?
I have also ordered the results in my query, so they are grouped by fruit with ascending values.

SQL tables represent unordered sets. There is no "previous" row unless a column specifies the ordering. Assuming you have such a column, then you can use lag():
select t.*,
lag(value) over (partition by fruit order by ?) as prev_value
from t;
The ? is for the name of the column that specifies the ordering.
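As a minimal sketch of the lag() approach, here is a runnable version that assumes value itself is the ordering column (the asker mentions sorting by fruit and ascending value). It uses Python's built-in sqlite3 module as a stand-in for Hive, and the function name previous_values is only illustrative. The window clause should carry over to Hive largely unchanged, but that is an assumption to verify against your Hive version.

```python
import sqlite3


def previous_values(rows):
    """Return (fruit, value, total), where total is the previous value
    within the same fruit, or None for the first row of each fruit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (fruit TEXT, value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT fruit,
               value,
               LAG(value) OVER (PARTITION BY fruit ORDER BY value) AS total
        FROM t
        ORDER BY fruit, value
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


rows = [("apple", 2), ("apple", 3), ("apple", 4),
        ("plum", 2), ("plum", 3), ("plum", 4)]
for row in previous_values(rows):
    print(row)
```

The first row of each fruit gets NULL (None in Python) because the partition restarts there, which matches the blank cells in the desired output.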

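Adding to the previous answer, you can artificially create an ordering column by writing to a temp table. This example uses SQL Server syntax (the # temp table and identity column would need adapting for Hive):
create table #holding (rowid int identity, fruit varchar(max), value int);
insert into #holding (fruit, value)
select fruit, value from your_table
order by fruit, value;
This records the order of the original table in the rowid column, which you can then use in the order by of the lag() call from the previous answer.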

Related

Updating uniqueidentifier column with same value for rows with matching column value

I need a little help. I have this (simplified) table:
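| ID | Title      | Subtype | RelatedUniqueID |
|----|------------|---------|-----------------|
| 1  | My Title 1 | 1       | NULL            |
| 2  | My Title 2 | 1       | NULL            |
| 3  | My Title 3 | 2       | NULL            |
| 4  | My Title 4 | 2       | NULL            |
| 5  | My Title 5 | 2       | NULL            |
| 6  | My Title 6 | 3       | NULL            |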
What I am trying to accomplish is generating the same uniqueidentifier for all rows having the same subtype.
So result would be this:
| ID | Title      | Subtype | RelatedUniqueID                      |
|----|------------|---------|--------------------------------------|
| 1  | My Title 1 | 1       | 439753d3-9103-4d0e-9dd0-569dc71fd6a3 |
| 2  | My Title 2 | 1       | 439753d3-9103-4d0e-9dd0-569dc71fd6a3 |
| 3  | My Title 3 | 2       | d0f08203-1197-4cc7-91bb-c4ca34d7cb0a |
| 4  | My Title 4 | 2       | d0f08203-1197-4cc7-91bb-c4ca34d7cb0a |
| 5  | My Title 5 | 2       | d0f08203-1197-4cc7-91bb-c4ca34d7cb0a |
| 6  | My Title 6 | 3       | 055838c6-a814-4bd1-a859-63d4544bb449 |
Requirements
One query to update all rows at once
The actual table has many more rows with hundreds of subtypes, so manually building a query for each subtype is not an option
Using SQL Server 2017
Thanks for any assist.
Because newid() is evaluated per row, you have to generate the values first, which means using a temporary or permanent table to store the Subtype-to-ID mapping.
So first you need to generate one GUID value per Subtype:
with subtypes as (
select distinct subtype
from t
)
select Subtype, NewId() RelatedId into #Id
from subtypes
And then you can use an updatable CTE to apply these to your base table:
with r as (
select t.*, id.RelatedId
from #id id
join t on t.subtype=id.Subtype
)
update r
set relatedUniqueId=RelatedId
See example DB<>Fiddle
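The two steps above (build a subtype-to-ID mapping, then copy it onto the base table) can be sketched in a runnable form. This is only an illustration under assumptions: it uses Python's sqlite3 module instead of SQL Server, uuid.uuid4() stands in for newid(), and the names assign_group_ids and id_map are hypothetical.

```python
import sqlite3
import uuid


def assign_group_ids(conn):
    """Give every row of t the same random UUID as the other rows of its subtype."""
    subtypes = [r[0] for r in conn.execute("SELECT DISTINCT subtype FROM t")]
    # Step 1: one UUID per subtype, stored in a mapping table (like #Id above).
    conn.execute(
        "CREATE TEMP TABLE id_map (subtype INTEGER PRIMARY KEY, related_id TEXT)")
    conn.executemany("INSERT INTO id_map VALUES (?, ?)",
                     [(s, str(uuid.uuid4())) for s in subtypes])
    # Step 2: copy the mapped value onto every matching row of the base table.
    conn.execute("""
        UPDATE t
        SET related_unique_id = (SELECT related_id FROM id_map
                                 WHERE id_map.subtype = t.subtype)
    """)
    conn.execute("DROP TABLE id_map")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, title TEXT, "
             "subtype INTEGER, related_unique_id TEXT)")
conn.executemany(
    "INSERT INTO t (id, title, subtype) VALUES (?, ?, ?)",
    [(i, f"My Title {i}", s) for i, s in enumerate([1, 1, 2, 2, 2, 3], start=1)])
assign_group_ids(conn)
result = conn.execute(
    "SELECT id, subtype, related_unique_id FROM t ORDER BY id").fetchall()
for row in result:
    print(row)
```

Because each identifier is generated exactly once before the update runs, it no longer matters how often the engine would have evaluated the random function per row.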
You can use an updatable CTE with a window function to get this data:
with r as (
select t.*,
RelatedId = first_value(newid()) over (partition by t.Subtype order by ID rows unbounded preceding)
from t
)
update r
set relatedUniqueId = RelatedId;
db<>fiddle
Be warned, though, that it is somewhat unpredictable when newid() gets evaluated, so don't try a joined update (unless you pre-save the IDs as @Stu has done).
For example, see this fiddle, the IDs were calculated differently for every row.
I have found the single query solution.
The prerequisite for this to work is that RelatedUniqueID already contains random values (e.g. set the column's default value to newid()).
UPDATE TestTable
SET RelatedUniqueID = TG.RelatedUniqueID
FROM TestTable TG
INNER JOIN TestTable ON TestTable.SubType = TG.SubType
Update
As Stu mentions in the comments, this solution might affect performance on large datasets. Please keep that in mind.

How can I remove duplicate rows from a table while keeping the sum of a column's values

Suppose there is a table which has several identical rows. I can copy the distinct values by
SELECT DISTINCT * INTO DESTINATIONTABLE FROM SOURCETABLE
But suppose the table has a column named value and, for simplicity, its value is 1 for one particular item. That row has another 9 duplicates, so the sum of the value column for that item is 10. I want to remove the 9 duplicates (or copy the distinct rows as above) so that the remaining row for that item shows a value of 10 rather than 1. How can this be achieved?
item| value
----+----------------
A | 1
A | 1
A | 1
A | 1
B | 1
B | 1
I want to show this as below
item| value
----+----------------
A | 4
B | 2
Thanks in advance
You can try to use SUM and group by
SELECT item,SUM(value) value
FROM T
GROUP BY item
SQL Fiddle: http://sqlfiddle.com/#!18/fac26/1
[Results]:
| item | value |
|------|-------|
| A | 4 |
| B | 2 |
Broadly speaking, you can just use a SUM and a GROUP BY clause.
Something like:
SELECT column1, SUM(column2) AS Count
FROM SOURCETABLE
GROUP BY column1
Here it is in action: Sum + Group By
Since your table probably isn't just two columns of data, here is a slightly more complex example showing how to do this to a larger table: SQL Fiddle
Note that I've selected my columns individually so that I can access the necessary data, rather than using
SELECT *
And I have achieved this result without the need for selecting data into another table.
EDIT 2:
Further to your comments, it sounds like you want to alter the actual data in your table rather than just query it. There may be a more elegant way to do this, but a simple way is to use the above query to populate a temporary table, delete the contents of the existing table, then move all the data back. To do this in my existing example:
WITH MyQuery AS (
SELECT name, type, colour, price, SUM(number) AS number
FROM MyTable
GROUP BY name, type, colour, price
)
SELECT * INTO MyTable2 FROM MyQuery;
DELETE FROM MyTable;
INSERT INTO MyTable(name, type, colour, price, number)
SELECT * FROM MyTable2;
DROP TABLE MyTable2;
WARNING: If you're going to try this, please use a development environment first (i.e. one you don't mind breaking!) to ensure it does exactly what you want it to do. It's imperative that your initial query captures ALL the data you want.
Here is the SQL Fiddle of this example in action: SQL Fiddle
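One way to reduce the risk mentioned in the warning is to run the delete and re-insert inside a transaction, so a failure part-way through does not leave the table empty. The following is a minimal sketch of that idea using the question's item/value table. It runs on Python's sqlite3 module rather than SQL Server, and collapse_duplicates and summed are hypothetical names.

```python
import sqlite3


def collapse_duplicates(conn):
    """Replace duplicate rows in t with one row per item holding the summed value."""
    with conn:  # commits on success, rolls back the open transaction on error
        conn.execute("CREATE TEMP TABLE summed AS "
                     "SELECT item, SUM(value) AS value FROM t GROUP BY item")
        conn.execute("DELETE FROM t")
        conn.execute("INSERT INTO t (item, value) "
                     "SELECT item, value FROM summed")
        conn.execute("DROP TABLE summed")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (item TEXT, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("A", 1)] * 4 + [("B", 1)] * 2)
collapse_duplicates(conn)
result = conn.execute("SELECT item, value FROM t ORDER BY item").fetchall()
print(result)  # [('A', 4), ('B', 2)]
```

In SQL Server the equivalent would presumably be wrapping the DELETE and INSERT in BEGIN TRANSACTION ... COMMIT. Treat that as a suggestion to test, not a verified script.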

Updating Relational Tables using merge

In a hypothetical example, say I have two tables: FARM and FRUIT
FARM is organized like:
FARM_ID Size
1 50
2 100
3 200
...
and FRUIT is organized like:
Reference_ID FRUIT
1 Banana
1 Grape
1 Orange
2 Banana
2 Strawberry
The FRUIT table is populated from a parameter @fruit passed in from Excel, which is a string delimited by '/'.
For example, @fruit = 'Banana/Grape/Orange'
And using a statement like:
INSERT INTO FRUIT(
Fruit,
Reference_ID
)
SELECT Fruit, Scope_IDENTITY() from split_string(@fruit, '/')
Where split_string is a function.
My goal is to check for updates. I want to take in a Farm_ID and @fruit and check whether any changes have been made to the fruit.
1) If the values haven't changed, don't do anything
2) If a new fruit was added, add it to the FRUIT table with the Farm_ID
3) If there is a fruit in the FRUIT table that is not in the new delimited list for the respective FARM_ID, remove it from the FRUIT table.
I think a Merge statement would probably work but open to suggestions. Let me know if anything is unclear. Thank you
EDIT
I'm fairly new to SQL but have tried using a merge...
Declare @foo tinyint
Merge Fruit as Target
Using (Select Fruit, @workingID From split_string(@fruit, '/')) As source (fruit, ID)
--@workingID is just a way to get the ID from other parts of the sproc.
ON (TARGET.fruit = source.fruit)
WHEN MATCHED THEN
SET @foo = 1
WHEN NOT MATCHED
THEN DELETE
WHEN NOT MATCHED THEN
INSERT INTO FRUIT(
Reference_ID,
Fruit
)
VALUES(
Then I am a bit stuck on how to get unique, new values
Since your input already contains the complete new fruit list for the farm ID, a simpler option is to delete the existing rows and insert the new list for that farm ID.
A sample script is given below.
-- load the input into a temp table
SELECT Fruit, @referenceid AS ReferenceId -- farm ID corresponding to the fruit list
INTO #temp
FROM Split_string(@fruit,'/')
-- delete the existing data for the given farm ID
DELETE f
FROM fruit f
WHERE EXISTS ( SELECT 1 FROM #temp t
WHERE f.Reference_id=t.ReferenceId)
-- insert the new list
INSERT INTO fruit (Fruit, Reference_ID)
SELECT fruit,referenceId
FROM #temp
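If unchanged rows should be left untouched (rule 1 in the question), an alternative to delete-and-reinsert is to compare the new list with the stored one and apply only the differences. This is a different technique from the script above, sketched here in Python with sqlite3 under the assumption that fruit names are compared exactly. The function name sync_fruit is hypothetical.

```python
import sqlite3


def sync_fruit(conn, farm_id, fruit_list, sep="/"):
    """Make the fruit rows for farm_id match the delimited fruit_list.

    Returns (added, removed) so the caller can see what changed."""
    wanted = {name.strip() for name in fruit_list.split(sep) if name.strip()}
    current = {r[0] for r in conn.execute(
        "SELECT fruit FROM fruit WHERE reference_id = ?", (farm_id,))}
    added = sorted(wanted - current)    # in the new list, not yet stored
    removed = sorted(current - wanted)  # stored, but dropped from the new list
    with conn:  # apply both changes in one transaction
        conn.executemany(
            "INSERT INTO fruit (reference_id, fruit) VALUES (?, ?)",
            [(farm_id, f) for f in added])
        conn.executemany(
            "DELETE FROM fruit WHERE reference_id = ? AND fruit = ?",
            [(farm_id, f) for f in removed])
    return added, removed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit (reference_id INTEGER, fruit TEXT)")
conn.executemany("INSERT INTO fruit VALUES (?, ?)",
                 [(1, "Banana"), (1, "Grape"), (1, "Orange"),
                  (2, "Banana"), (2, "Strawberry")])
added, removed = sync_fruit(conn, 1, "Banana/Grape/Kiwi")
print(added, removed)  # ['Kiwi'] ['Orange']
```

Rows for other farms, and fruits that appear in both lists, are never written to. The three branches of a MERGE would express the same idea in a single statement.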

Update SQL column with sequenced number

I have a table SL_PROD which has the following columns, NUMBER, DEPTCODE, DISP_SEQ AND SL_PROD_ID.
SL_PROD_ID is an identity column which incrementally increases with each row.
I need to write a query which updates the DISP_SEQ column with sequential numbers (1-X) for the rows which have a DEPTCODE of '725'. I've tried several things with no luck, any ideas?
Try this:
A common table expression can be used in updates. This is extremely useful if you want to use the values of window functions (with OVER) as update values.
Attention: Check carefully what you are ordering by. I used NUMBER, but you might need some other sort column (maybe your IDENTITY column).
CREATE TABLE #SL_PROD(NUMBER INT,DEPT_CODE INT,DISP_SEQ INT,SL_PROD_ID INT IDENTITY);
INSERT INTO #SL_PROD(NUMBER,DEPT_CODE,DISP_SEQ) VALUES
(1,123,0)
,(2,725,0)
,(3,725,0)
,(4,123,0)
,(5,725,0);
WITH UpdateableCTE AS
(
SELECT ROW_NUMBER() OVER(ORDER BY NUMBER) AS NewDispSeq
,DISP_SEQ
FROM #SL_PROD
WHERE DEPT_CODE=725
)
UPDATE UpdateableCTE SET DISP_SEQ=NewDispSeq;
SELECT * FROM #SL_PROD;
GO
--Clean up
--DROP TABLE #SL_PROD;
The result (look at the lines with 725)
1 123 0 1
2 725 1 2
3 725 2 3
4 123 0 4
5 725 3 5
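For engines where a CTE cannot be updated directly, the same numbering can be applied by joining the ranked rows back on the identity column. Below is a minimal sketch of that variant using Python's sqlite3 module (not SQL Server), with the same sample data as above. The function name renumber and the lowercase column names are illustrative only.

```python
import sqlite3


def renumber(conn, dept_code):
    """Set disp_seq to 1..N for the rows of one department, ordered by number."""
    conn.execute("""
        WITH ranked AS (
            SELECT sl_prod_id,
                   ROW_NUMBER() OVER (ORDER BY number) AS new_seq
            FROM sl_prod
            WHERE dept_code = ?
        )
        UPDATE sl_prod
        SET disp_seq = (SELECT new_seq FROM ranked
                        WHERE ranked.sl_prod_id = sl_prod.sl_prod_id)
        WHERE dept_code = ?
    """, (dept_code, dept_code))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sl_prod (number INTEGER, dept_code INTEGER, "
             "disp_seq INTEGER, sl_prod_id INTEGER PRIMARY KEY AUTOINCREMENT)")
conn.executemany(
    "INSERT INTO sl_prod (number, dept_code, disp_seq) VALUES (?, ?, ?)",
    [(1, 123, 0), (2, 725, 0), (3, 725, 0), (4, 123, 0), (5, 725, 0)])
renumber(conn, 725)
result = conn.execute(
    "SELECT number, dept_code, disp_seq FROM sl_prod ORDER BY number").fetchall()
for row in result:
    print(row)
```

Rows from other departments keep their existing disp_seq because the outer WHERE clause restricts the update to the chosen department.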

Fruits are running away ! Stored Procedure SQL?

I have a table like this,
Table - Fruits
ID
Fruit_Truck_ID
Fruit_Crate_ID (can be null)
Mango (column exists but might get deleted) (these columns are bit)
Apple
etc etc...
Now, what I want is to create a stored procedure that is given either (truck ID + fruit name) or (truck ID + crate ID + fruit name).
Fruit columns can be added and deleted, so the stored procedure needs to do the following:
If the fruit crate ID in the table is not null, select using the truck ID and crate ID plus the given column name. E.g. if I call GetFruitMango(truck_ID, Crate_ID, "Mango"), it should return the value of the Mango column, which is a bit (0 or 1).
If the Crate_ID provided is not in the table or is null, use the truck_ID and the column name to get what's in the Mango column, etc.