PostgreSQL: select rows and update a column

I have a SQL SELECT query with a WHERE clause, e.g.:
select * from table where status = 1
How can I update a single column of the selected rows at the same time as I select them? I want to mark the selected rows so they are not selected again on the next loop. Something like:
select * from table where status = 1; update table set proc = 1 where id in (select id from table where status = 1)
But this query will not return results.

Use the returning clause:
update table
set proc = 1
where id in (select id from table where status = 1)
returning *;
(By the way: I assume the inner select is not actually selecting from the same table, because then the statement would not make much sense; it could be rewritten with a simple where status = 1.)
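A minimal sketch of the mark-and-fetch loop from the question, run against SQLite through Python's sqlite3 module so it is self-contained. It assumes SQLite 3.35 or later (the version that added UPDATE ... RETURNING). The table name jobs and the extra proc = 0 filter, which keeps already-marked rows from being returned on the next loop, are assumptions for this illustration, not part of the answer above.

```python
import sqlite3


def claim_pending(conn: sqlite3.Connection) -> list:
    """Mark all unprocessed rows with status = 1 and return them in one statement."""
    # UPDATE ... RETURNING needs SQLite >= 3.35; PostgreSQL supports it natively.
    # Hypothetical: the table "jobs" and the "proc = 0" filter are for this sketch only.
    cur = conn.execute(
        "UPDATE jobs SET proc = 1 "
        "WHERE status = 1 AND proc = 0 "
        "RETURNING id, status, proc"
    )
    rows = cur.fetchall()  # fetch before committing so every returned row is read
    conn.commit()
    return sorted(rows)  # RETURNING order is not guaranteed, so sort for stable output


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status INTEGER, proc INTEGER DEFAULT 0)"
)
conn.executemany("INSERT INTO jobs (id, status) VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)])
conn.commit()

first = claim_pending(conn)   # claims rows 1 and 3
second = claim_pending(conn)  # nothing is left to claim
print(first)   # [(1, 1, 1), (3, 1, 1)]
print(second)  # []
```

Because the marking and the fetching happen in one statement, a second caller cannot see the same rows as unprocessed between a separate SELECT and UPDATE.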

Related

Stored Procedure calling variables from table

I have a stored procedure that uses a variable ID, and I have a list of valid IDs in a table.
I'm trying to write a stored procedure that runs a specific piece of code if the ID exists in the table. I'm just not sure of the syntax.
Below is my pseudo-code of what I'm attempting to do.
IF
@ID = possible id IN (SELECT DISTINCT ID FROM [dbo].ID_TABLE WHERE ID = 'valid')
SELECT * FROM dbo.[results]
ELSE
SELECT * FROM dbo.[otherresults]
I'm using SQL Server.
Typically, this is a case where you would use EXISTS, as in:
IF EXISTS (SELECT * FROM ID_TABLE WHERE ID = @ID) SELECT * FROM dbo.[results]
ELSE SELECT * FROM dbo.[otherresults]
While @ID IN (SELECT DISTINCT ID ...) would work, that query conceptually requires going through the table data to assemble a result set, which is then checked for @ID. An EXISTS query does not create a result set and can return early, on the first row that fits the criteria.
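SQLite has no IF ... ELSE statement, so this sketch keeps the check itself as EXISTS and moves the branch into Python. The table names id_table, results and otherresults mirror the question; their columns and contents are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data standing in for ID_TABLE, results and otherresults.
conn.executescript("""
CREATE TABLE id_table (id INTEGER PRIMARY KEY);
CREATE TABLE results (note TEXT);
CREATE TABLE otherresults (note TEXT);
INSERT INTO id_table (id) VALUES (10), (20);
INSERT INTO results VALUES ('from results');
INSERT INTO otherresults VALUES ('from otherresults');
""")


def run_for(id_value: int) -> list:
    # SELECT EXISTS (...) yields 1 or 0; only the first matching row is needed.
    (found,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM id_table WHERE id = ?)", (id_value,)
    ).fetchone()
    table = "results" if found else "otherresults"  # fixed names, never user input
    return conn.execute(f"SELECT note FROM {table}").fetchall()


print(run_for(10))  # [('from results',)]
print(run_for(99))  # [('from otherresults',)]
```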
If your query
SELECT DISTINCT ID FROM [dbo].ID_TABLE WHERE ID = 'valid'
always returns exactly one ID, the following condition is enough:
IF @ID = (SELECT DISTINCT ID FROM [dbo].ID_TABLE WHERE ID = 'valid')
If it returns a list of IDs, you need to create a temp table and store those IDs, like this:
create table #temp_table(id int)
insert into #temp_table SELECT DISTINCT ID FROM [dbo].ID_TABLE WHERE ID = 'valid'

Hive Query with a large WHERE Condition

I am writing a Hive query to pull about 2,000 unique keys from a table.
I keep getting this error: java.lang.StackOverflowError
My query is basic and looks like this:
SELECT * FROM table WHERE (Id = 1 or Id = 2 or Id = 3 or Id = 4)
My WHERE clause goes all the way up to 2,000 unique IDs, and I receive the error above. Does anyone know a more efficient way to do this, or how to get this query to work?
Thanks!
You can use split and explode to convert the comma-separated string into rows and then use IN or EXISTS.
Using IN:
SELECT * FROM yourtable t WHERE
t.ID IN
(
SELECT
explode(split('1,2,3,4,5,6,1998,1999,2000',',')) as id
) ;
Using EXISTS:
SELECT * FROM yourtable t WHERE
EXISTS
(
SELECT 1 FROM (
SELECT
explode(split('1,2,3,4,5,6,1998,1999,2000',',')) as id
) s
WHERE s.id = t.id
);
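SQLite has no split or explode, so this sketch swaps in a recursive CTE that turns the comma-separated string into one row per ID and then filters with IN, the same shape as the Hive query above. The table yourtable, its name column and the sample rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with ids 1..2000.
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 2001)]
)

# Recursive CTE standing in for explode(split(...)): each step peels one value
# off the front of "rest" (which always ends with a comma) until "rest" is empty.
QUERY = """
WITH RECURSIVE split(id, rest) AS (
    SELECT NULL, :ids || ','
    UNION ALL
    SELECT CAST(substr(rest, 1, instr(rest, ',') - 1) AS INTEGER),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT t.id, t.name
FROM yourtable t
WHERE t.id IN (SELECT id FROM split WHERE id IS NOT NULL)
ORDER BY t.id
"""

rows = conn.execute(QUERY, {"ids": "1,2,3,1999,2000"}).fetchall()
print(rows)  # [(1, 'row1'), (2, 'row2'), (3, 'row3'), (1999, 'row1999'), (2000, 'row2000')]
```

The point is the same as in Hive: the IDs become rows of a single set that IN can probe, instead of a 2,000-term OR expression the parser has to nest.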
If the IDs form a contiguous range, use the BETWEEN clause instead of specifying every ID:
SELECT ID FROM table WHERE ID BETWEEN 1 AND 2000 GROUP BY ID;
You can also create a table for these IDs and then use an EXISTS condition against the new table to get only your specific IDs.
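A sketch of that last suggestion: load the wanted IDs into a small table once and filter the big table with EXISTS against it. In Hive this would be a regular or temporary table; here SQLite's TEMP table, the table names and the sample keys are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source table with ids 1..5000.
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 5001)]
)

wanted = [7, 42, 1999, 4000]  # in practice, the ~2,000 keys from the question

# One row per wanted key instead of a long OR chain.
conn.execute("CREATE TEMP TABLE wanted_ids (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_ids (id) VALUES (?)", [(k,) for k in wanted])

rows = conn.execute(
    """
    SELECT t.id, t.name
    FROM yourtable t
    WHERE EXISTS (SELECT 1 FROM wanted_ids w WHERE w.id = t.id)
    ORDER BY t.id
    """
).fetchall()
print(rows)  # [(7, 'row7'), (42, 'row42'), (1999, 'row1999'), (4000, 'row4000')]
```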

Unusual UNION behavior

I'm trying to union two tables with the same fields into one master table, but for some reason I'm getting a weird result.
select count(*)
from staging.sandoval_parcels
where parcel_id = 0;
returns 0
select count(*)
from staging.bernalillo_parcels
where parcel_id = 0;
returns 0
But when I merge the tables using
CREATE TABLE staging.master_parcels
AS
SELECT * FROM bernalillo_parcels
UNION ALL
SELECT * FROM sandoval_parcels
;
then
select count(*)
from staging.master_parcels
where parcel_id = 0;
returns 85553
Both tables have the same fields, and the fields have the same data types. Also, no values are missing in any field, so there are no NULLs. Why am I getting IDs = 0 when neither table has parcel_id = 0?
The order of the fields matters. Replace the * with explicit column names; otherwise each column of the second query is placed by position into the corresponding column of the first query, which is not necessarily the column you want.
CREATE TABLE staging.master_parcels
AS
SELECT parcel_id, field1 ... FROM bernalillo_parcels
UNION ALL
SELECT parcel_id, field1 ... FROM sandoval_parcels
;
UNION matches columns by position, not by name, so it will merge the tables even if their column order is not the same, as long as the data types are compatible. If all of the columns match and are in the same order, plain UNION (unlike UNION ALL) also removes duplicate rows, so identical rows from both tables appear only once. Having the same column order and data types is important.
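A small reproduction of the positional matching described above, using SQLite. The column zone_code and the values are invented; the only assumption that matters is that the two tables declare the same columns in a different order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: same columns, different declaration order.
CREATE TABLE bernalillo_parcels (parcel_id INTEGER, zone_code INTEGER);
CREATE TABLE sandoval_parcels   (zone_code INTEGER, parcel_id INTEGER);
INSERT INTO bernalillo_parcels VALUES (101, 0);
INSERT INTO sandoval_parcels   VALUES (0, 202);

-- SELECT * matches columns by position: sandoval's zone_code lands in parcel_id.
CREATE TABLE master_star AS
    SELECT * FROM bernalillo_parcels
    UNION ALL
    SELECT * FROM sandoval_parcels;

-- Naming the columns in the same order in both branches fixes it.
CREATE TABLE master_named AS
    SELECT parcel_id, zone_code FROM bernalillo_parcels
    UNION ALL
    SELECT parcel_id, zone_code FROM sandoval_parcels;
""")

count_sql = "SELECT count(*) FROM {} WHERE parcel_id = 0"
star_zero = conn.execute(count_sql.format("master_star")).fetchone()[0]
named_zero = conn.execute(count_sql.format("master_named")).fetchone()[0]
print(star_zero, named_zero)  # 1 0
```

Neither source table contains parcel_id = 0, yet the SELECT * version produces one, which is the symptom from the question.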

Doing a join only if count is greater than one

I wonder whether the following, somewhat contrived, example is possible without intermediary variables and a conditional statement.
Consider an intermediary query that can produce a result set containing no rows, one row or multiple rows. Most of the time it produces just one row, but when it produces multiple rows, they should be joined to another table to prune them down to either one row or no rows. After this, if there is one row (as opposed to no rows), I want to return multiple columns as produced by the original intermediary query.
I have something like the following in mind. It obviously won't work (multiple columns in a CASE expression, no join, etc.), but maybe it illustrates the point. What I would like is to return what is currently in the SELECT clause if @@ROWCOUNT = 1, or, if it is greater, do an INNER JOIN to Auxilliary, which prunes x down to either one row or no rows, and then return that. I want to search Main only once, and Auxilliary only when x contains more than one row.
SELECT x.MainId, x.Data1, x.Data2, x.Data3,
CASE
WHEN @@ROWCOUNT IS NOT NULL AND @@ROWCOUNT = 1 THEN
1
WHEN @@ROWCOUNT IS NOT NULL AND @@ROWCOUNT > 1 THEN
-- Use here @id or MainId to join to Auxilliary and there
-- FilteringCondition = @filteringCondition to prune x to either
-- one or zero rows.
END
FROM
(
SELECT
MainId,
Data1,
Data2,
Data3
FROM Main
WHERE
MainId = @id
) AS x;
CREATE TABLE Main
(
-- This Id may introduce more than one row, so it is joined to
-- Auxilliary for further pruning with the given conditions.
MainId INT,
Data1 NVARCHAR(MAX) NOT NULL,
Data2 NVARCHAR(MAX) NOT NULL,
Data3 NVARCHAR(MAX) NOT NULL,
AuxilliaryId INT NOT NULL
);
CREATE TABLE Auxilliary
(
AuxilliaryId INT IDENTITY(1, 1) PRIMARY KEY,
FilteringCondition NVARCHAR(1000) NOT NULL
);
Would this be possible in one query without a temporary table variable and a conditional? Without using a CTE?
Some sample data would be
INSERT INTO Auxilliary(FilteringCondition)
VALUES
(N'SomeFilteringCondition1'),
(N'SomeFilteringCondition2'),
(N'SomeFilteringCondition3');
INSERT INTO Main(MainId, Data1, Data2, Data3, AuxilliaryId)
VALUES
(1, N'SomeMainData11', N'SomeMainData12', N'SomeMainData13', 1),
(1, N'SomeMainData21', N'SomeMainData22', N'SomeMainData23', 2),
(2, N'SomeMainData31', N'SomeMainData32', N'SomeMainData33', 3);
And here is a sample query that behaves as I'd like, with the caveat that I want to do the join only if querying Main directly with the given ID produces more than one row.
DECLARE @id AS INT = 1;
DECLARE @filteringCondition AS NVARCHAR(1000) = N'SomeFilteringCondition1';
SELECT *
FROM
Main
INNER JOIN Auxilliary AS aux ON aux.AuxilliaryId = Main.AuxilliaryId
WHERE MainId = @id AND aux.FilteringCondition = @filteringCondition;
You don't usually use a join to reduce the result set of the left table. To limit a result set you'd use the where clause instead. In combination with another table this would be WHERE [NOT] EXISTS.
So let's say this is your main query:
select * from main where main.col1 = 1;
It returns one of the following results:
no rows, then we are done
one row, then we are also done
more than one row, then we must extend the where clause
The query with the extended where clause:
select * from main where main.col1 = 1
and exists (select * from other where other.col2 = main.col3);
which returns one of the following results:
no rows, which is okay
one row, which is okay
more than one row - you say this is not possible
So the task is to do this in one step instead. I count records and look for a match in the other table for every record. Then ...
if the count is zero we get no result anyway
if it is one I take that row
if it is greater than one, I take the row for which a match exists in the other table, or none when there is no match
Here is the full query:
select *
from
(
select
main.*,
count(*) over () as cnt,
case when exists (select * from other where other.col2 = main.col3) then 1 else 0 end
as other_exists
from main
where main.col1 = 1
) counted_and_checked
where cnt = 1 or other_exists = 1;
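The full query above uses generic names (main, other, col1..col3). This sketch maps it onto the question's Main/Auxilliary schema and sample data and runs it in SQLite (window functions need SQLite 3.25 or later). The mapping of col1/col2/col3 to MainId/AuxilliaryId plus the FilteringCondition test is an assumption about how the generic query applies here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question, translated to SQLite types.
conn.executescript("""
CREATE TABLE Auxilliary (AuxilliaryId INTEGER PRIMARY KEY, FilteringCondition TEXT NOT NULL);
CREATE TABLE Main (MainId INT, Data1 TEXT NOT NULL, Data2 TEXT NOT NULL,
                   Data3 TEXT NOT NULL, AuxilliaryId INT NOT NULL);
INSERT INTO Auxilliary (FilteringCondition) VALUES
    ('SomeFilteringCondition1'), ('SomeFilteringCondition2'), ('SomeFilteringCondition3');
INSERT INTO Main VALUES
    (1, 'SomeMainData11', 'SomeMainData12', 'SomeMainData13', 1),
    (1, 'SomeMainData21', 'SomeMainData22', 'SomeMainData23', 2),
    (2, 'SomeMainData31', 'SomeMainData32', 'SomeMainData33', 3);
""")

# cnt tags every candidate row with the size of the candidate set; other_exists
# marks rows that have a matching Auxilliary row. The outer WHERE keeps a row if
# it is the only candidate or if it has a match.
QUERY = """
SELECT MainId, Data1
FROM (
    SELECT m.*,
           count(*) OVER () AS cnt,
           CASE WHEN EXISTS (SELECT 1 FROM Auxilliary a
                             WHERE a.AuxilliaryId = m.AuxilliaryId
                               AND a.FilteringCondition = :cond)
                THEN 1 ELSE 0 END AS other_exists
    FROM Main m
    WHERE m.MainId = :id
) counted_and_checked
WHERE cnt = 1 OR other_exists = 1
"""


def pick(main_id, cond):
    return conn.execute(QUERY, {"id": main_id, "cond": cond}).fetchall()


print(pick(1, "SomeFilteringCondition1"))  # [(1, 'SomeMainData11')]
print(pick(2, "SomeFilteringCondition1"))  # [(2, 'SomeMainData31')]
print(pick(1, "NoSuchCondition"))          # []
```

Note that the EXISTS flag is still computed for every candidate row, which is exactly the concern the update below addresses.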
UPDATE: I understand that you want to avoid unnecessary access to the other table. This is rather difficult to do, however.
In order to only use the subquery when necessary, we could move it outside:
select *
from
(
select
main.*,
count(*) over () as cnt
from main
where main.col1 = 1
) counted_and_checked
where cnt = 1 or exists (select * from other where other.col2 = counted_and_checked.col3);
This looks much better in my opinion. However, SQL does not guarantee an evaluation order for the two operands of an OR, so the DBMS may still execute the subselect on every record before evaluating cnt = 1.
The only operation I know of that evaluates left to right, i.e. doesn't look further once an argument on the left-hand side yields a non-null value, is COALESCE. So we could do the following:
select *
from
(
select
main.*,
count(*) over () as cnt
from main
where main.col1 = 1
) counted_and_checked
where coalesce( case when cnt = 1 then 1 else null end ,
(select count(*) from other where other.col2 = counted_and_checked.col3)
) > 0;
This may look a bit strange, but it should prevent the subquery from being executed when cnt is 1.
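A sketch applying the COALESCE filter to the question's sample data in SQLite (3.25 or later for the window function). It only shows which rows come back; whether the subquery is actually skipped when cnt = 1 depends on the optimizer and is not something this example can demonstrate. The mapping from the generic main/other/col names to Main/Auxilliary is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question, translated to SQLite types.
conn.executescript("""
CREATE TABLE Auxilliary (AuxilliaryId INTEGER PRIMARY KEY, FilteringCondition TEXT NOT NULL);
CREATE TABLE Main (MainId INT, Data1 TEXT NOT NULL, Data2 TEXT NOT NULL,
                   Data3 TEXT NOT NULL, AuxilliaryId INT NOT NULL);
INSERT INTO Auxilliary (FilteringCondition) VALUES
    ('SomeFilteringCondition1'), ('SomeFilteringCondition2'), ('SomeFilteringCondition3');
INSERT INTO Main VALUES
    (1, 'SomeMainData11', 'SomeMainData12', 'SomeMainData13', 1),
    (1, 'SomeMainData21', 'SomeMainData22', 'SomeMainData23', 2),
    (2, 'SomeMainData31', 'SomeMainData32', 'SomeMainData33', 3);
""")

# COALESCE returns its first non-NULL argument: 1 when cnt = 1, otherwise
# the number of matching Auxilliary rows for this candidate row.
QUERY = """
SELECT MainId, Data1
FROM (
    SELECT m.*, count(*) OVER () AS cnt
    FROM Main m
    WHERE m.MainId = :id
) counted_and_checked
WHERE coalesce(
    CASE WHEN cnt = 1 THEN 1 ELSE NULL END,
    (SELECT count(*) FROM Auxilliary a
     WHERE a.AuxilliaryId = counted_and_checked.AuxilliaryId
       AND a.FilteringCondition = :cond)
) > 0
"""


def pick(main_id, cond):
    return conn.execute(QUERY, {"id": main_id, "cond": cond}).fetchall()


print(pick(1, "SomeFilteringCondition1"))  # [(1, 'SomeMainData11')]
print(pick(2, "NoSuchCondition"))          # [(2, 'SomeMainData31')]
print(pick(1, "NoSuchCondition"))          # []
```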
You may try something like
select * from Main m
where mainId = @id
and @filteringCondition = case when (select count(*) from Main m2 where m2.mainId = @id) > 1
then (select filteringCondition from Auxilliary a where a.AuxilliaryId = m.AuxilliaryId) else @filteringCondition end
but it's hardly a fast query. I'd rather use a temp table, or just an IF and two queries.

Select rows and Update same rows for locking?

I need to write a procedure that will allow me to select x number of rows and at the same time update those rows, so the calling application will know those records are locked and in use. I have a column in the table named "locked". The next time the procedure is called, it should only pull the next x records that do not have the "locked" column checked. I have read a little about the OUTPUT clause in SQL Server, but I'm not sure that is what I want to do.
As you suggested, you can use the OUTPUT clause effectively:
Live demo: https://data.stackexchange.com/stackoverflow/query/8058/so3319842
UPDATE #tbl
SET locked = 1
OUTPUT INSERTED.*
WHERE id IN (
SELECT TOP 1 id
FROM #tbl
WHERE locked = 0
ORDER BY id
)
Also see this article:
http://www.sqlmag.com/article/tsql3/more-top-troubles-using-top-with-insert-update-and-delete.aspx
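SQLite has no OUTPUT clause; its closest analogue is RETURNING (SQLite 3.35 or later). This sketch claims a batch of unlocked rows per call, using LIMIT where the T-SQL above uses TOP ... ORDER BY id. The table name tbl, the batch size of 2 and the five sample rows are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with five unlocked rows.
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, locked INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO tbl (id) VALUES (?)", [(i,) for i in range(1, 6)])
conn.commit()


def claim_batch(batch_size: int) -> list:
    """Lock the next batch_size unlocked rows and return their ids (needs SQLite >= 3.35)."""
    cur = conn.execute(
        """
        UPDATE tbl SET locked = 1
        WHERE id IN (SELECT id FROM tbl WHERE locked = 0 ORDER BY id LIMIT ?)
        RETURNING id
        """,
        (batch_size,),
    )
    ids = sorted(row[0] for row in cur.fetchall())  # RETURNING order is unspecified
    conn.commit()
    return ids


batches = [claim_batch(2) for _ in range(3)]
print(batches)  # [[1, 2], [3, 4], [5]]
```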
Vote for Cade Roux's answer, using OUTPUT:
UPDATE #tbl
SET locked = 1
OUTPUT INSERTED.*
WHERE id IN (SELECT TOP 1 id
FROM #tbl
WHERE locked = 0
ORDER BY id)
Previously:
This is one of the few times I can think of using a temp table:
ALTER PROCEDURE temp_table_test
AS
BEGIN
SELECT TOP 5000 *
INTO #temp_test
FROM your_table
WHERE locked != 1
ORDER BY ?
UPDATE your_table
SET locked = 1
WHERE id IN (SELECT id FROM #temp_test)
SELECT *
FROM #temp_test
IF EXISTS (SELECT NULL
FROM tempdb.dbo.sysobjects
WHERE ID = OBJECT_ID(N'tempdb..#temp_test'))
BEGIN
DROP TABLE #temp_test
END
END
This:
Fetches the rows you want, stuffs them into a local temp table
Uses the temp table to update the rows to be "locked"
SELECTs from the temp table to give you your resultset output
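Drops the temp table (a local temp table created inside a stored procedure is dropped automatically when the procedure finishes, so this explicit drop is mainly housekeeping)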
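The same steps (copy the batch into a temp table, lock those rows, return the copy, drop the temp table), sketched in SQLite with Python. The explicit BEGIN IMMEDIATE, which takes the write lock before the batch is read so that two callers cannot copy the same rows, is an addition for this sketch; the table name your_table, the batch size and the seven sample rows are placeholders.

```python
import sqlite3

# isolation_level=None: the sketch issues BEGIN/COMMIT itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
# Hypothetical table with seven unlocked rows.
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, locked INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO your_table (id) VALUES (?)", [(i,) for i in range(1, 8)])


def fetch_and_lock(batch_size: int) -> list:
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the batch
    try:
        # 1. Copy the next batch of unlocked rows into a temp table.
        conn.execute("CREATE TEMP TABLE temp_test (id INTEGER, locked INTEGER)")
        conn.execute(
            "INSERT INTO temp_test "
            "SELECT id, locked FROM your_table WHERE locked != 1 ORDER BY id LIMIT ?",
            (batch_size,),
        )
        # 2. Mark those rows as locked.
        conn.execute(
            "UPDATE your_table SET locked = 1 WHERE id IN (SELECT id FROM temp_test)"
        )
        # 3. Read the copied rows as the result set.
        rows = conn.execute("SELECT id FROM temp_test ORDER BY id").fetchall()
        # 4. Drop the temp table so the next call can recreate it.
        conn.execute("DROP TABLE temp_test")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return [r[0] for r in rows]


batches = [fetch_and_lock(3) for _ in range(3)]
print(batches)  # [[1, 2, 3], [4, 5, 6], [7]]
```

Without the transaction around all four steps, another caller could select the same unlocked rows between step 1 and step 2.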