SQL - Is this possible? - sql

I have a table that looks like this:
--------------------------------------------
| Date | House# | Subscription |
--------------------------------------------
| 3/02/10 | x | Monthly |
--------------------------------------------
| 3/03/10 | y | Weekly |
--------------------------------------------
| 3/04/10 | z | Daily |
--------------------------------------------
I need a command that will take a column name and an int and shift the values in those columns up so many levels. So (house, 1) would put z where y is, y where x is, and z would go to 0/Null. Whereas (house, 2) would put z where x is and y and z would go to 0/null.
I understand that SQL does not actually extract tables row by row, so is this possible?
Thanks ahead of time!

You can do this in a stored procedure using cursors.

You should use PL/SQL. Here is a cursor example (written for a different table, not for your case):
DECLARE
    -- Cursor over the rows to process
    CURSOR cpaises IS
        SELECT CO_PAIS, DESCRIPCION, CONTINENTE
        FROM PAISES;
    co_pais     VARCHAR2(3);
    descripcion VARCHAR2(50);
    continente  VARCHAR2(25);
BEGIN
    OPEN cpaises;
    LOOP
        -- Fetch one row at a time until the cursor is exhausted
        FETCH cpaises INTO co_pais, descripcion, continente;
        EXIT WHEN cpaises%NOTFOUND;
        dbms_output.put_line(descripcion);
    END LOOP;
    CLOSE cpaises;
END;
I think you could use a variable to indicate which column to select and update, and keep an array of the last n values inside the loop.

You can use a PL/SQL routine. Take the column name and the number as input and then implement the logic as you want. Cursors, as suggested above, are one of the options you have.
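The row-by-row idea from the answers above can be sketched outside PL/SQL as well. The following is only an illustration in Python with the standard sqlite3 module, not Oracle code: the table name `subscriptions`, the lower-case column names, the ISO date strings and the helper `shift_column` are all assumptions made for the example.

```python
import sqlite3

# Hypothetical whitelist: column names cannot be bound as SQL parameters,
# so they are validated before being interpolated into the statement.
SHIFTABLE_COLUMNS = {"house", "subscription"}

def shift_column(conn, column, n):
    """Move every value in `column` up by n rows (ordered by date); the tail becomes NULL."""
    if column not in SHIFTABLE_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    if n < 0:
        raise ValueError("n must be non-negative")
    rows = conn.execute(
        f"SELECT rowid, {column} FROM subscriptions ORDER BY date"
    ).fetchall()
    values = [value for _, value in rows]
    # Drop the first n values and pad the end with NULLs to keep the length.
    shifted = values[n:] + [None] * min(n, len(values))
    conn.executemany(
        f"UPDATE subscriptions SET {column} = ? WHERE rowid = ?",
        [(value, rowid) for (rowid, _), value in zip(rows, shifted)],
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (date TEXT, house TEXT, subscription TEXT)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?)",
    [("2010-03-02", "x", "Monthly"),
     ("2010-03-03", "y", "Weekly"),
     ("2010-03-04", "z", "Daily")],
)
shift_column(conn, "house", 1)
houses = [h for (h,) in conn.execute("SELECT house FROM subscriptions ORDER BY date")]
print(houses)
```

The shift only makes sense relative to an explicit ordering, which is why the sketch sorts by date before moving anything.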

I would think about adding a column that contains a value to use as a sort order. You could then update that column as needed and order by it. If it is not possible to change that table, perhaps you could create a new table to hold the sort column and join the two.

Related

In Postgres: Select columns from a set of arrays of columns and check a condition on all of them

I have a table like this:
I want to perform a count on different sets of columns (all subsets with at least one element from X and one element from Y). How can I do that in Postgres?
For example, I may have {x1,x2,y3}, {x4,y1,y2,y3}, etc. I want to count the number of ids that have 1 in every column of a set. So for the first set:
SELECT COUNT(id) FROM table WHERE x1=1 AND x2=1 AND y3=1;
and the same for the second set:
SELECT COUNT(id) FROM table WHERE x4=1 AND y1=1 AND y2=1 AND y3=1;
Is it possible to write a loop that goes over all these sets and query the table accordingly? The array will have more than 10000 sets, so it cannot be done manually.
You should be able to convert the table columns to an array using ARRAY[col1, col2,...], then use the array_positions function, setting the second parameter to the value you're checking for. So, given your example above, this query:
SELECT id, array_positions(array[x1,x2,x3,x4,y1,y2,y3,y4], 1)
FROM tbl
ORDER BY id;
Will yield this result:
+----+-------------------+
| id | array_positions |
+----+-------------------+
| a | {1,4,5} |
| b | {1,2,4,7} |
| c | {1,2,3,4,6,7,8} |
+----+-------------------+
Here's a SQL Fiddle.
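The loop over column sets that the question asks about could also be driven from client code. This is a rough sketch in Python using sqlite3 rather than Postgres; the sample rows are reconstructed from the array_positions output shown above, and the helper name `count_for_set` is hypothetical.

```python
import sqlite3

COLUMNS = ["x1", "x2", "x3", "x4", "y1", "y2", "y3", "y4"]

def count_for_set(conn, column_set):
    """Count ids whose value is 1 in every column of column_set."""
    if not column_set:
        raise ValueError("column set must not be empty")
    unknown = set(column_set) - set(COLUMNS)
    if unknown:
        # Column names are interpolated, so only known names are accepted.
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    where = " AND ".join(f"{col} = 1" for col in sorted(column_set))
    (count,) = conn.execute(f"SELECT COUNT(id) FROM tbl WHERE {where}").fetchone()
    return count

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id TEXT, " + ", ".join(f"{c} INTEGER" for c in COLUMNS) + ")"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [("a", 1, 0, 0, 1, 1, 0, 0, 0),
     ("b", 1, 1, 0, 1, 0, 0, 1, 0),
     ("c", 1, 1, 1, 1, 0, 1, 1, 1)],
)

column_sets = [{"x1", "x2", "y3"}, {"x4", "y1", "y2", "y3"}]
counts = [count_for_set(conn, s) for s in column_sets]
print(counts)
```

With 10000 sets this issues one query per set, which is simple but not necessarily the fastest option.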

PostgreSQL - Start A Transaction block IN Function

I'm trying to create a transaction block inside a function. My goal is for the function to be used by one caller at a time: if someone is running it and another caller wants to use it, the second caller has to wait until the first one has finished. I created this function:
CREATE OR REPLACE FUNCTION my_job(time_to_wait integer) RETURNS INTEGER AS $$
DECLARE
    max INT;
BEGIN
    BEGIN;
    SELECT MAX(max_value) INTO max FROM sch_lock.table_concurente;
    INSERT INTO sch_lock.table_concurente(max_value, date_insertion) VALUES(max + 1, now());
    -- Sleep a while
    PERFORM pg_sleep(time_to_wait);
    RETURN max;
    COMMIT;
END;
$$
LANGUAGE plpgsql;
But it does not seem to work; I get a syntax error at BEGIN;
Without BEGIN; and COMMIT; the function runs. I use these queries to check:
-- First user waits 10 seconds
SELECT my_job(10) as max_value;
-- Second user waits 3 seconds
SELECT my_job(3) as max_value;
So the result is :
+-----+----------------------------+------------+
| id | date | max_value |
+-----+----------------------------+------------+
| 1 | 2017-02-13 13:03:58.12+00 | 1 |
+-----+----------------------------+------------+
| 2 | 2017-02-13 13:10:00.291+00 | 2 |
+-----+----------------------------+------------+
| 3 | 2017-02-13 13:10:00.291+00 | 2 |
+-----+----------------------------+------------+
But the result should be :
+-----+----------------------------+------------+
| id | date | max_value |
+-----+----------------------------+------------+
| 1 | 2017-02-13 13:03:58.12+00 | 1 |
+-----+----------------------------+------------+
| 2 | 2017-02-13 13:10:00.291+00 | 2 |
+-----+----------------------------+------------+
| 3 | 2017-02-13 13:10:00.291+00 | 3 |
+-----+----------------------------+------------+
So the third row (id = 3) should have max_value = 3 and not 2. This happens because the first user selects max = 1 and waits 10 seconds, while the second user also selects max = 1 and waits 3 seconds before inserting. What I want is that nobody can use the function until the first caller has finished, so that it is safe and protected.
My questions are:
How can I make a transaction block inside a function?
Do you have any suggestion for how to do this in a safe way?
Thank you.
OK, so you cannot COMMIT in a function. You can, however, have a savepoint and roll back to the savepoint.
Your smallest possible transaction is a single statement parsed and executed by the server from the client, so every function call runs inside a transaction. Within a transaction, however, you can have savepoints. In this case you would look at the exception handling portions of PostgreSQL to handle this.
However that is not what you want here. You want (I think?) data to be visible during a long-running server-side operation. For that you are kind of out of luck. You cannot really increment your transaction ids while running a function.
You have a few options, in order of what I would consider to be good practice (best to worst):
1. Break down your logic into smaller slices that each move the db from one consistent state to another, and run those in separate transactions.
2. Use a message queue (like pg_message_queue) in the db, plus an external worker, and something which runs a step and yields a message for the next step. The disadvantage is that this adds more maintenance.
3. Use a function or framework like dblink, pl/python, or pl/perlu to connect back to the db and run transactions there. ick....
You can use dblink for this. Something like:
CREATE OR REPLACE FUNCTION my_job(time_to_wait integer) RETURNS INTEGER AS $$
DECLARE
    max INT;
    res TEXT;  -- receives the status text returned by the dblink calls
BEGIN
    SELECT INTO res dblink_connect('con','dbname=local');
    SELECT INTO res dblink_exec('con', 'BEGIN');
    ...
    SELECT INTO res dblink_exec('con', 'COMMIT');
    SELECT INTO res dblink_disconnect('con');
END;
$$
LANGUAGE plpgsql;
I don't know if this is a good way or not, but what if we use LOCK TABLE, for example like this:
CREATE OR REPLACE FUNCTION my_job(time_to_wait integer) RETURNS INTEGER AS $$
DECLARE
    max INT;
BEGIN
    -- Lock the table so no one else can use it until the first caller has finished
    LOCK TABLE sch_lock.table_concurente IN ACCESS EXCLUSIVE MODE;
    SELECT MAX(max_value) INTO max FROM sch_lock.table_concurente;
    INSERT INTO sch_lock.table_concurente(max_value, date_insertion) VALUES(max + 1, now());
    PERFORM pg_sleep(time_to_wait);
    RETURN max;
END;
$$
LANGUAGE plpgsql;
It gives me the right result.
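The reason the lock helps is that it is taken before MAX is read, so a second caller cannot read a stale maximum. The sketch below illustrates that ordering with Python's sqlite3 module. It is an SQLite analogue, not PostgreSQL: `BEGIN IMMEDIATE` stands in for LOCK TABLE, the temporary database file and the helper `next_value` are assumptions, and `timeout=0` makes the second caller fail immediately instead of waiting as PostgreSQL would.

```python
import os
import sqlite3
import tempfile

def next_value(conn):
    """Insert max + 1, taking the write lock before reading the current max."""
    conn.execute("BEGIN IMMEDIATE")  # fails if another connection holds the lock
    (current,) = conn.execute(
        "SELECT COALESCE(MAX(max_value), 0) FROM table_concurente"
    ).fetchone()
    conn.execute("INSERT INTO table_concurente (max_value) VALUES (?)", (current + 1,))
    return current + 1  # the caller commits, which releases the lock

path = os.path.join(tempfile.mkdtemp(), "lock_demo.db")
# isolation_level=None: transactions are controlled explicitly with BEGIN/COMMIT.
first_user = sqlite3.connect(path, isolation_level=None, timeout=0)
second_user = sqlite3.connect(path, isolation_level=None, timeout=0)
first_user.execute("CREATE TABLE table_concurente (max_value INTEGER)")

first = next_value(first_user)       # first user now holds the write lock
try:
    next_value(second_user)          # second user cannot start yet
    blocked = False
except sqlite3.OperationalError:
    blocked = True
first_user.execute("COMMIT")         # first user finishes

second = next_value(second_user)     # now sees the first user's row
second_user.execute("COMMIT")
print(first, blocked, second)
```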

sql server 2008 r2 cursor is not forwarded by fetch

I stumbled upon a very strange behaviour while working on some T-SQL Code.
I am working on a SQL Server 2008 R2 SP2 (build nr.: 10.50.4000).
My question to you guys is if anybody has seen such a behaviour before or if anybody might be able to explain it to me.
So,
What's the situation?
We have a table, which looks like that:
product_number | id_object | position_in_product
---------------------------------------------------
1 | 101 | 1
1 | 102 | 1
1 | 103 | 1
2 | 201 | 1
2 | 202 | 1
2 | 203 | 1
Multiple object ids are allocated to one product number. The order should be defined by the position_in_product column. The funny part lies exactly in establishing that order.
Of course, after doing that the table should look like this:
product_number | id_object | position_in_product
---------------------------------------------------
1 | 101 | 1
1 | 102 | 2
1 | 103 | 3
2 | 201 | 1
2 | 202 | 2
2 | 203 | 3
What's going on?
To update the order column we create a cursor with the following statement:
DECLARE
    table_runner CURSOR LOCAL FORWARD_ONLY FOR
        SELECT id_object, product_number
        FROM table
        WHERE ident = @ident
        ORDER BY product_number
By using this cursor and counting the rows with the same product_number we should be able to update the position_in_product column. (This has worked in every installation until now)
To move the cursor to the next row we use this:
FETCH next from table_runner
    INTO @table_runner$id_object, @table_runner$product_number
The whole function looks like this:
OPEN table_runner
FETCH next from table_runner
    INTO @table_runner$id_object, @table_runner$product_number
while @@FETCH_STATUS = 0
BEGIN
    /* update_logic */
    FETCH next from table_runner
        INTO @table_runner$id_object, @table_runner$product_number
END
CLOSE table_runner
And that is the part, that does not work as expected.
The fetch does not give me the next row; I always get the same result row.
The while loop never ends: the fetch status is always 0, but the result stays the same.
The Workaround
After searching the web for quite a while without any results, I decided to try a more pragmatic approach and put another FETCH statement in.
I know that the id_object value is unique and has to change in every loop cycle,
so I remembered the last fetched id and put this under the loop's FETCH statement:
if @id_object_memory = @table_runner$id_object
begin
    FETCH next from table_runner
        INTO @table_runner$id_object, @table_runner$product_number
    set @id_object_memory = @table_runner$id_object
end
else
    set @id_object_memory = @table_runner$id_object
With that the loop works as expected, the column in question is updated as it should and the cursor will reach the end of the result set.
The big ?
Has anyone any explanation for that?
There are more cursors defined in the same procedure and they all work as expected.
I have absolutely no clue how to explain this.
So, thanks for reading ;)
I can't help with the cursor issue, I've never seen this before, but should point out you don't need a cursor at all to do this update. You can simply use:
WITH CTE AS
(   SELECT  Product_Number,
            ID_Object,
            Position_in_Product,
            RowNumber = ROW_NUMBER() OVER(PARTITION BY Product_Number
                                          ORDER BY id_object)
    FROM    T
    WHERE   ident = @ident
)
UPDATE  CTE
SET     Position_in_Product = RowNumber;
Example on SQL Fiddle
You possibly don't even need to store this column, and can just use ROW_NUMBER in a query where the position_in_product is required.
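To see a set-based renumbering run end to end without SQL Server, here is a small sketch in Python with sqlite3. It is not the CTE above: it swaps in a correlated COUNT(*) as a portable stand-in for ROW_NUMBER, and the table name `t` and the sample rows are assumptions taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (product_number INTEGER, id_object INTEGER, position_in_product INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, 1)",
    [(1, 101), (1, 102), (1, 103), (2, 201), (2, 202), (2, 203)],
)

# One statement, no cursor: a row's position is the number of rows in the
# same product whose id_object is less than or equal to its own.
conn.execute(
    """
    UPDATE t
    SET position_in_product = (
        SELECT COUNT(*)
        FROM t AS other
        WHERE other.product_number = t.product_number
          AND other.id_object <= t.id_object
    )
    """
)

positions = conn.execute(
    "SELECT product_number, id_object, position_in_product "
    "FROM t ORDER BY product_number, id_object"
).fetchall()
print(positions)
```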
Cursors are so 2000 ;-)
Seriously though; avoid cursors at all costs. Set-based operations > looping.
Just create a view with the following:
CREATE VIEW your_view
AS
SELECT product_number
, id_object
, Row_Number() OVER (PARTITION BY product_number ORDER BY id_object) As position_in_product
FROM your_table
;
No need to ever perform the update; the row numbers will "automatically" recalculate.
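A quick way to check the "recalculates automatically" claim is to insert a row and query the view again. The sketch below does that with Python's sqlite3 module instead of SQL Server; it assumes an SQLite build with window-function support (3.25 or later) and uses made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (product_number INTEGER, id_object INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)", [(1, 101), (1, 103), (2, 201)])
conn.execute(
    """
    CREATE VIEW your_view AS
    SELECT product_number,
           id_object,
           ROW_NUMBER() OVER (PARTITION BY product_number ORDER BY id_object)
               AS position_in_product
    FROM your_table
    """
)

def positions_for(product_number):
    """Return (id_object, position) pairs for one product, as seen through the view."""
    return conn.execute(
        "SELECT id_object, position_in_product FROM your_view "
        "WHERE product_number = ? ORDER BY id_object",
        (product_number,),
    ).fetchall()

before = positions_for(1)
conn.execute("INSERT INTO your_table VALUES (1, 102)")  # lands between 101 and 103
after = positions_for(1)
print(before, after)
```

The positions are computed at query time, so nothing has to be updated when rows are added or removed.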

Is there a way to transpose data in Hive?

Can data in Hive be transposed? As in, the rows become columns and columns are the rows? If there is no function straight up, is there a way to do it in a couple of steps?
I have a table like this:
| ID | Names | Proc1 | Proc2 | Proc3 |
| 1 | A1 | x | b | f |
| 2 | B1 | y | c | g |
| 3 | C1 | z | d | h |
| 4 | D1 | a | e | i |
I want it to be like this:
| A1 | B1 | C1 | D1 |
| x | y | z | a |
| b | c | d | e |
| f | g | h | i |
I have been looking up other related questions and they all mention using lateral views and explode, but is there a way to selectively choose columns for lateral(ly) view(ing) and explod(ing)?
Also, what might be the rough process to achieve what I would like to do? Please help me out. Thanks!
Edit: I have been reading this link: https://cwiki.apache.org/Hive/languagemanual-lateralview.html and it shows me half of what I want to achieve. The first example in the link is basically what I'd like except that I don't want the rows to repeat and want them as column names. Any ideas on how to get the data to a form such that if I do an explode, it would result in my desired output, or the other way, ie, explode first to lead to another step that would then lead to my desired output table. Thanks again!
I don't know of a way out of the box in hive to do this, sorry. You get close with explode etc. but I don't think it can get the job done.
Overall, conceptually, I think it's hard to do a transpose without knowing in advance what the columns of the destination table are going to be. This is true for Hive in particular, because the metadata about the columns (how many there are, their types, their names, etc.) is stored in a database, the metastore. And it's true in general, because not knowing the columns beforehand would require holding data in memory (OK, sure, with spills), and users would need to be careful not to overflow memory (just like dynamic partitioning in Hive).
In any case, long story short, if you know the columns of the destination table beforehand, life is good. There isn't a set command in hive per se, to the best of my knowledge, but you could use a bunch of if clauses and case statements (ugly I know, but that's how I have done the same in the past) in the select clause to transpose the data. Something along the lines of SQL - How to transpose?
Do let me know how it goes!
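For readers who want to see the "bunch of case statements" idea concretely, here is a rough sketch run through Python's sqlite3 module rather than Hive. It assumes the four destination columns (A1 to D1) are known in advance, unpivots the proc columns with UNION ALL, and then picks values with MAX(CASE ...); the table name `input_table` and the column name `name` follow the other answer's query, not the question's header.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE input_table (id INTEGER, name TEXT, proc1 TEXT, proc2 TEXT, proc3 TEXT)"
)
conn.executemany(
    "INSERT INTO input_table VALUES (?, ?, ?, ?, ?)",
    [(1, "A1", "x", "b", "f"),
     (2, "B1", "y", "c", "g"),
     (3, "C1", "z", "d", "h"),
     (4, "D1", "a", "e", "i")],
)

# Step 1 (inner query): turn each proc column into its own row.
# Step 2 (outer query): one output row per proc, one CASE per known name.
transposed = conn.execute(
    """
    SELECT MAX(CASE WHEN name = 'A1' THEN v END) AS A1,
           MAX(CASE WHEN name = 'B1' THEN v END) AS B1,
           MAX(CASE WHEN name = 'C1' THEN v END) AS C1,
           MAX(CASE WHEN name = 'D1' THEN v END) AS D1
    FROM (
        SELECT name, 1 AS proc_no, proc1 AS v FROM input_table
        UNION ALL
        SELECT name, 2, proc2 FROM input_table
        UNION ALL
        SELECT name, 3, proc3 FROM input_table
    ) AS unpivoted
    GROUP BY proc_no
    ORDER BY proc_no
    """
).fetchall()
print(transposed)
```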
As Mark pointed out, there's no easy way to do this in Hive since PIVOT isn't available in Hive, and you may also run into issues with the case/when 'trick' since you have multiple values (proc1, proc2, proc3).
For testing purposes, you may try a different approach:
select v, o1, o2, o3 from (
select k,
v,
LEAD(v,3) OVER() as o1,
LEAD(v,6) OVER() as o2,
LEAD(v,9) OVER() as o3
from (select transform(name,proc1,proc2,proc3) using 'python strm.py' AS (k, v)
from input_table) q1
) q2 where k = 'A1';
where strm.py:
import sys

# Emit one (name, value) row per proc column of each input row.
for line in sys.stdin:
    line = line.strip()
    name, proc1, proc2, proc3 = line.split('\t')
    print('%s\t%s' % (name, proc1))
    print('%s\t%s' % (name, proc2))
    print('%s\t%s' % (name, proc3))
The trick here is to use a Python script in the map phase that emits each column of a row as a distinct row. Then every third row (since we have 3 proc columns) forms the resulting row, which we get by peeking forward (lead).
Although this query does the job, it has a drawback: as the input grows, the query has to keep peeking ahead to every 3rd element, which may lead to a performance hit. Anyway, you may evaluate it for testing purposes.
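The emit-then-lead trick can be traced in plain Python to see why the offsets are multiples of 3. This is only a simulation of the query's logic, not Hive code; the helper names are hypothetical, and it assumes the names are unique and the rows arrive in the order shown in the question.

```python
def emit_pairs(rows):
    """Mimic strm.py: one (name, value) pair per proc column of each row."""
    for name, *procs in rows:
        for value in procs:
            yield name, value

def lead(values, index, offset):
    """Value `offset` positions ahead of `index`, or None past the end (like LEAD)."""
    target = index + offset
    return values[target] if target < len(values) else None

def transpose(rows):
    """Keep the pairs of the first name and peek ahead by multiples of the proc count."""
    if not rows:
        return []
    width = len(rows[0]) - 1          # number of proc columns (3 in the question)
    pairs = list(emit_pairs(rows))
    values = [value for _, value in pairs]
    first_name = rows[0][0]
    return [
        tuple(lead(values, i, width * step) for step in range(len(rows)))
        for i, (name, _) in enumerate(pairs)
        if name == first_name
    ]

rows = [
    ("A1", "x", "b", "f"),
    ("B1", "y", "c", "g"),
    ("C1", "z", "d", "h"),
    ("D1", "a", "e", "i"),
]
result = transpose(rows)
print(result)
```

Each extra input row adds one more lead with a larger offset, which is where the growth in work comes from.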

cloning hierarchical data

Let's assume I have a self-referencing hierarchical table built the classical way, like this one:
CREATE TABLE test
(name text, id serial primary key, parent_id integer references test);
insert into test (name,id,parent_id) values
('root1',1,NULL),('root2',2,NULL),('root1sub1',3,1),('root1sub2',4,1),
('root2sub1',5,2),('root2sub2',6,2);
testdb=# select * from test;
name | id | parent_id
-----------+----+-----------
root1 | 1 |
root2 | 2 |
root1sub1 | 3 | 1
root1sub2 | 4 | 1
root2sub1 | 5 | 2
root2sub2 | 6 | 2
What I need now is a function (preferably in plain SQL) that would take the id of a test record and clone all attached records (including the given one). The cloned records need to have new ids, of course. The desired result would look like this, for example:
Select * from cloningfunction(2);
name | id | parent_id
-----------+----+-----------
root2 | 7 |
root2sub1 | 8 | 7
root2sub2 | 9 | 7
Any pointers? I'm using PostgreSQL 8.3.
Pulling this result in recursively is tricky (although possible). However, it's typically not very efficient and there is a much better way to solve this problem.
Basically, you augment the table with an extra column which traces the tree to the top - I'll call it the "Upchain". It's just a long string that looks something like this:
name | id | parent_id | upchain
root1 | 1 | NULL | 1:
root2 | 2 | NULL | 2:
root1sub1 | 3 | 1 | 1:3:
root1sub2 | 4 | 1 | 1:4:
root2sub1 | 5 | 2 | 2:5:
root2sub2 | 6 | 2 | 2:6:
root1sub1sub1 | 7 | 3 | 1:3:7:
It's very easy to keep this field updated by using a trigger on the table. (Apologies for terminology but I have always done this with SQL Server). Every time you add or delete a record, or update the parent_id field, you just need to update the upchain field on that part of the tree. That's a trivial job because you just take the upchain of the parent record and append the id of the current record. All child records are easily identified using LIKE to check for records with the starting string in their upchain.
What you're doing effectively is trading a bit of extra write activity for a big saving when you come to read the data.
When you want to select a complete branch in the tree it's trivial. Suppose you want the branch under node 1. Node 1 has an upchain '1:' so you know that any node in the branch of the tree under that node must have an upchain starting '1:...'. So you just do this:
SELECT *
FROM table
WHERE upchain LIKE '1:%'
This is extremely fast (index the upchain field of course). As a bonus it also makes a lot of activities extremely simple, such as finding partial trees, level within the tree, etc.
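A minimal sketch of the upchain idea, using Python's sqlite3 module instead of a trigger: the helper `add_node` is hypothetical and maintains the upchain in application code on insert only (no handling of deletes or parent changes), which is an assumption made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (name TEXT, id INTEGER PRIMARY KEY, "
    "parent_id INTEGER REFERENCES test, upchain TEXT)"
)

def add_node(conn, name, parent_id=None):
    """Insert a node and set its upchain to the parent's upchain plus its own id."""
    parent_chain = ""
    if parent_id is not None:
        (parent_chain,) = conn.execute(
            "SELECT upchain FROM test WHERE id = ?", (parent_id,)
        ).fetchone()
    node_id = conn.execute(
        "INSERT INTO test (name, parent_id) VALUES (?, ?)", (name, parent_id)
    ).lastrowid
    conn.execute(
        "UPDATE test SET upchain = ? WHERE id = ?", (f"{parent_chain}{node_id}:", node_id)
    )
    return node_id

root1 = add_node(conn, "root1")
root2 = add_node(conn, "root2")
sub1 = add_node(conn, "root1sub1", root1)
add_node(conn, "root1sub2", root1)
add_node(conn, "root2sub1", root2)
add_node(conn, "root2sub2", root2)
add_node(conn, "root1sub1sub1", sub1)

# Whole branch under root1: every upchain that starts with root1's upchain.
branch = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM test WHERE upchain LIKE ? ORDER BY id", (f"{root1}:%",)
    )
]
print(branch)
```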
I've used this in applications that track large employee reporting hierarchies but you can use it for pretty much any tree structure (parts breakdown, etc.)
Notes (for anyone who's interested):
I haven't given a step-by-step of the SQL code but once you get the principle, it's pretty simple to implement. I'm not a great programmer so I'm speaking from experience.
If you already have data in the table you need to do a one time update to get the upchains synchronised initially. Again, this isn't difficult as the code is very similar to the UPDATE code in the triggers.
This technique is also a good way to identify circular references which can otherwise be tricky to spot.
Joe Celko's method, which is similar to njreed's answer but more generic, can be found here:
Nested-Set Model of Trees (at the middle of the article)
Nested-Set Model of Trees, part 2
Trees in SQL -- Part III
@Maximilian: You are right, we forgot your actual requirement. How about a recursive stored procedure? I am not sure if this is possible in PostgreSQL, but here is a working SQL Server version:
CREATE PROCEDURE CloneNode
    @to_clone_id int, @parent_id int
AS
    SET NOCOUNT ON
    DECLARE @new_node_id int, @child_id int
    INSERT INTO test (name, parent_id)
        SELECT name, @parent_id FROM test WHERE id = @to_clone_id
    SET @new_node_id = @@IDENTITY
    DECLARE @children_cursor CURSOR
    SET @children_cursor = CURSOR FOR
        SELECT id FROM test WHERE parent_id = @to_clone_id
    OPEN @children_cursor
    FETCH NEXT FROM @children_cursor INTO @child_id
    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXECUTE CloneNode @child_id, @new_node_id
        FETCH NEXT FROM @children_cursor INTO @child_id
    END
    CLOSE @children_cursor
    DEALLOCATE @children_cursor
Your example is accomplished by EXECUTE CloneNode 2, null (the second parameter is the new parent node).
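The same recursion can be tried without SQL Server. Below is a sketch in Python with sqlite3 that mirrors the procedure's steps (copy the node, then recurse into its children with the new id as parent); the function name `clone_node` and the use of `lastrowid` in place of the identity value are assumptions of this example.

```python
import sqlite3

def clone_node(conn, to_clone_id, new_parent_id=None):
    """Copy the node and its whole subtree; return the id of the new copy."""
    row = conn.execute("SELECT name FROM test WHERE id = ?", (to_clone_id,)).fetchone()
    if row is None:
        raise LookupError(f"no node with id {to_clone_id}")
    new_id = conn.execute(
        "INSERT INTO test (name, parent_id) VALUES (?, ?)", (row[0], new_parent_id)
    ).lastrowid
    # Read the children up front so the recursion works on a fixed list.
    children = conn.execute(
        "SELECT id FROM test WHERE parent_id = ? ORDER BY id", (to_clone_id,)
    ).fetchall()
    for (child_id,) in children:
        clone_node(conn, child_id, new_id)
    return new_id

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (name TEXT, id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES test)"
)
conn.executemany(
    "INSERT INTO test (name, id, parent_id) VALUES (?, ?, ?)",
    [("root1", 1, None), ("root2", 2, None), ("root1sub1", 3, 1),
     ("root1sub2", 4, 1), ("root2sub1", 5, 2), ("root2sub2", 6, 2)],
)

new_root = clone_node(conn, 2)
cloned = conn.execute(
    "SELECT name, id, parent_id FROM test WHERE id >= ? ORDER BY id", (new_root,)
).fetchall()
print(cloned)
```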
This sounds like an exercise from "SQL For Smarties" by Joe Celko...
I don't have my copy handy, but I think it's a book that'll help you quite a bit if this is the kind of problems you need to solve.