How to find a continuous series using PL/SQL - SQL

I am a PL/SQL programmer and I am facing a problem in finding continuous runs within a series of numbers for the same date.
Suppose I have a series like:
1000, 1001, 1002, 1003, 1004, 1005,
1016, 1017, 1018, 1019, 1020, 1021,
1035, 1036, 1037, 1038, 1039, 1040
and I am looking for output like:
from_series | to_series
------------+----------
1000        | 1005
1016        | 1021
1035        | 1040
I tried it with the two queries below, but they run into a problem:
SELECT *
FROM   retort_t r
WHERE  NOT EXISTS
       ( SELECT 'X'
         FROM   retort_t
         WHERE  r.series_NO - ISSUE_NO = 1 );

SELECT *
FROM   retort_t r
WHERE  NOT EXISTS
       ( SELECT 'X'
         FROM   retort_t
         WHERE  ISSUE_NO = r.series_NO + 1 );
I get the result by joining the two queries above and lining them up. That is fine for a few records, but my table holds records in the lakhs (hundreds of thousands), and joining these two queries takes a long time to fetch the data.
Please suggest an appropriate way to group the data into the correct intervals.

Assuming a simple table structure such as:
CREATE TABLE T (x INT);
INSERT INTO T (x) VALUES
(1000), (1001), (1002), (1003),
(1004), (1005), (1016), (1017),
(1018), (1019), (1020), (1021),
(1035), (1036), (1037), (1038),
(1039), (1040);
You can use ROW_NUMBER() to derive a value that stays constant across sequential numbers; you can then group by that value to get the minimum and maximum of each range:
SELECT MIN(x) AS RangeStart, MAX(x) AS RangeEnd
FROM ( SELECT x,
              x - ROW_NUMBER() OVER (ORDER BY x) AS GroupBy
       FROM   T
     ) t
GROUP BY GroupBy;
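Since the question is about Oracle, here is a minimal sketch of the same gap-and-islands idea applied directly to the retort_t table; it assumes the numbers live in a column named issue_no, so adjust the column name to whatever actually holds the series:
SELECT MIN(issue_no) AS from_series,
       MAX(issue_no) AS to_series
FROM  ( SELECT issue_no,
               issue_no - ROW_NUMBER() OVER (ORDER BY issue_no) AS grp
        FROM   retort_t
      )
GROUP BY grp
ORDER BY from_series;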

Related

Given a table of numbers, can I get all the rows which add up to less than or equal to a number?

Say I have a table with an incrementing id column and a random positive non-zero number:
id | rand
---+-----
 1 |   12
 2 |    5
 3 |   99
 4 |   87
Write a query to return the rows which add up to a given number.
A couple of rules:
Rows must be "consumed" in order, even if a later row would make a perfect match. For example, querying for 104 would be a perfect match for rows 1, 2, and 4, but rows 1-3 would still be returned.
You can use a row partially if it holds more than is necessary to make up whatever is left of the number. E.g. rows 1, 2, and 3 would be returned if your target is 50, because 12 + 5 + 33 equals 50, with only 33 of row 3's 99 being consumed (a partial use).
If there are not enough rows to satisfy the amount, then return ALL the rows. E.g. in the above example a query for 1,000 would return rows 1-4. In other words, the sum of the rows should be less than or equal to the queried number.
It's possible for the answer to be "no, this is not possible with SQL alone" and that's fine, but I was just curious. This would be a trivial problem in a programming language, but I was wondering what SQL provides out of the box, as a thought experiment and learning exercise.
You didn't mention which RDBMS, but assuming SQL Server:
DROP TABLE #t;
CREATE TABLE #t (id int, rand int);
INSERT INTO #t (id, rand)
VALUES (1,12),(2,5),(3,99),(4,87);

DECLARE @target int = 104;

WITH dat AS
(
    SELECT id, rand, SUM(rand) OVER (ORDER BY id) AS runsum
    FROM #t
),
dat2 AS
(
    SELECT id, rand,
           runsum,
           COALESCE(LAG(runsum, 1) OVER (ORDER BY id), 0) AS prev_runsum
    FROM dat
)
SELECT id, rand
FROM dat2
WHERE @target >= runsum
   OR @target BETWEEN prev_runsum AND runsum;
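As a quick sanity check of the logic, here is the intermediate data for the sample #t rows above (a sketch; the expected values are worked out by hand):
-- runsum per row:       12, 17, 116, 203
-- prev_runsum per row:   0, 12,  17, 116
-- With @target = 104, rows 1 and 2 satisfy "@target >= runsum", and row 3
-- satisfies "@target BETWEEN prev_runsum AND runsum", so rows 1-3 come back,
-- matching rule 2 (row 3 is only partially consumed).
SELECT id, rand, runsum,
       COALESCE(LAG(runsum, 1) OVER (ORDER BY id), 0) AS prev_runsum
FROM  (SELECT id, rand, SUM(rand) OVER (ORDER BY id) AS runsum FROM #t) AS d
ORDER BY id;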

Sample certain number of result rows from a postgres table based on given proportions

Let's say I have a table named population with 1000 rows, each tagged with a Group_Name (A, B, C, ...).
And I have another table named proportions that holds the desired proportion of each Group_Name that I want to extract.
I want to randomly sample 100 rows from the population table so that the proportions of the Group_Names within the sample are in line with the Proportion field of the proportions table. So in that 100-row sample, 50 rows should be Group A, 30 rows should be Group B and 20 rows should be Group C.
I can manually sample like:
CREATE EXTENSION tsm_system_rows;
SELECT * FROM population TABLESAMPLE SYSTEM_ROWS(100);
But I do not know how to sample from population programmatically based on the proportions table, especially if the proportions table has many more Group_Names than the 3 shown in the example.
The main problem that you will be facing is that TABLESAMPLE takes the sample before applying your group filter. Say that you want 20 rows from group C. The chances of getting those 20 by running
SELECT *
FROM population TABLESAMPLE system_rows(20)
WHERE group_name = 'C'
are pretty slim if group C is small relative to other groups in population.
I'd solve this by writing a stored function that receives as parameters the group name and wanted amount of rows, and samples the table until reaching the wanted amount of rows.
You should also limit the number of iterations, in case the group is very sparse or there are not enough rows to fulfill the request.
So the function could look like so
CREATE OR REPLACE FUNCTION sample_group (p_group_name text, sample_size int, max_iterations int)
   RETURNS int[]
   LANGUAGE plpgsql AS
$$
DECLARE
   result int[];
   i      int := 0;
BEGIN
   WHILE i < max_iterations AND coalesce(array_length(result, 1), 0) < sample_size LOOP
      WITH sample AS (
         SELECT group_name, value
         FROM   population TABLESAMPLE BERNOULLI (1)
         LIMIT  10 * sample_size
      ), add_rows AS (
         SELECT result || array_agg(value) AS arr
         FROM   sample
         WHERE  group_name = p_group_name
      )
      SELECT array_agg(DISTINCT value), i + 1
      INTO   result, i
      FROM   add_rows, unnest(arr) AS t(value);
   END LOOP;

   RETURN result[1:sample_size];
END;
$$;
I'm using BERNOULLI sampling to avoid getting the same rows over and over.
The function does most of the work for you; all that remains is to call it. In this example I'm setting an upper limit of 500 iterations.
SELECT group_name,
       unnest(sample_group(group_name, (100 * proportion)::int, 500)) AS value
FROM   proportions;
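An optional sanity check (a sketch, assuming the population and proportions tables described above): count how many sampled values each group actually received.
SELECT group_name, count(*) AS sampled_rows
FROM  ( SELECT group_name,
               unnest(sample_group(group_name, (100 * proportion)::int, 500)) AS value
        FROM   proportions
      ) s
GROUP BY group_name
ORDER BY group_name;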
You can sample based on randomly assigned row numbers:
select *
from (
    select *,
           case
              when row_number() over (partition by pop.group_name
                                      order by random()) <= pr.proportion * 100  -- sample size
              then 1
              else 0
           end as flag
    from population as pop
    join proportions as pr
      on pop.group_name = pr.group_name
) as dt
where flag = 1
Edit:
If the table is large, taking a SAMPLE before applying ROW_NUMBER might greatly reduce the number of rows processed. Of course, the SAMPLE size must be large enough to contain at least the required number of rows per group, i.e. well over 100 rows in total.
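A minimal sketch of that idea, assuming the same population and proportions tables and that a 10 % system sample is comfortably larger than the 100 rows needed:
select *
from (
    select pop.*,
           pr.proportion,
           row_number() over (partition by pop.group_name
                              order by random()) as rn
    from population as pop tablesample system (10)  -- shrink the input before ranking
    join proportions as pr
      on pop.group_name = pr.group_name
) as dt
where rn <= proportion * 100;  -- 100 = total sample size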

Postgres union of queries in loop

I have a table with two columns. Let's call them
array_column and text_column
I'm trying to write a query to find out, for K ranging from 1 to 10, in how many rows the value in text_column appears within the first K elements of array_column.
I'm expecting results like:
k | count
________________
1 | 70
2 | 85
3 | 90
...
I did manage to get these results by simply repeating the query 10 times and uniting the results, which looks like this:
SELECT 1 AS k, count(*) FROM table WHERE array_column[1:1] @> ARRAY[text_column]
UNION ALL
SELECT 2 AS k, count(*) FROM table WHERE array_column[1:2] @> ARRAY[text_column]
UNION ALL
SELECT 3 AS k, count(*) FROM table WHERE array_column[1:3] @> ARRAY[text_column]
...
But that doesn't look like the correct way to do it. What if I wanted a very large range for K?
So my question is, is it possible to perform queries in a loop, and unite the results from each query? Or, if this is not the correct approach to the problem, how would you do it?
Thanks in advance!
You could use array_positions() which returns an array of all positions where the argument was found in the array, e.g.
select t.*,
array_positions(array_column, text_column)
from the_table t;
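For a quick feel of what array_positions() returns (the values here are made up purely for illustration):
select array_positions(array['a','b','a','c'], 'a');
-- returns {1,3}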
This returns a different result but is a lot more efficient as you don't need to increase the overall size of the result. To only consider the first ten array elements, just pass a slice to the function:
select t.*,
array_positions(array_column[1:10], text_column)
from the_table t;
To limit the result to only rows that actually contain the value you can use:
select t.*,
array_positions(array_column[1:10], text_column)
from the_table t
where text_column = any(array_column[1:10]);
To get your desired result, you could use unnest() to turn that into rows:
select k, count(*)
from the_table t, unnest(array_positions(array_column[1:10], text_column)) as k
where text_column = any(array_column[1:10])
group by k
order by k;
You can use the generate_series function to generate a table with the expected number of rows with the expected values and then join to it within the query, like so:
SELECT t.k AS k, count(array_column) AS count
FROM table
--right join ensures that you will get a value of 0 if there are no records meeting the criteria
right join (select generate_series(1,10) as k) t
  on array_column[1:t.k] @> ARRAY[text_column]
group by t.k
This is probably the closest thing to looping over the results without using something like PL/pgSQL to do an actual loop in a user-defined function.
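If the range of K should grow with the data rather than being hard-coded (a sketch, assuming the table is actually called the_table), the upper bound can be taken from the longest array:
SELECT t.k, count(array_column) AS count
FROM  (SELECT generate_series(1, (SELECT max(cardinality(array_column)) FROM the_table)) AS k) t
LEFT JOIN the_table
  ON array_column[1:t.k] @> ARRAY[text_column]
GROUP BY t.k
ORDER BY t.k;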

WHERE clause with = sign matches multiple records while only one record is expected

I have a simple inline view that contains 2 columns.
-----------------
rn | val
-----------------
0 | A
... | ...
25 | Z
I am trying to select a val by matching rn against a random value generated with dbms_random.value(), as in:
with d (rn, val) as
(
select level-1, chr(64+level) from dual connect by level <= 26
)
select * from d
where rn = floor(dbms_random.value()*25)
;
My expectation is it should return one row only without failing.
But now and then I get multiple rows returned or no rows at all.
On the other hand,
select floor(dbms_random.value()*25) from dual connect by level < 1000
returns a whole number for each row, and I failed to see any abnormality.
What am I missing here?
The problem is that the random value is recalculated for each row. So, you might get two random values that match the value -- or go through all the values and never get a hit.
One way to get around this is:
select d.*
from  ( select d.*
        from   d
        order  by dbms_random.value()
      ) d
where rownum = 1;
There are more efficient ways to calculate a random number, but this is intended to be a simple modification to your existing query.
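For completeness, here is that wrapper folded together with the CTE from the question into one runnable statement (a sketch):
with d (rn, val) as
(
  select level-1, chr(64+level) from dual connect by level <= 26
)
select rn, val
from  ( select rn, val
        from   d
        order  by dbms_random.value()
      )
where rownum = 1;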
You also might want to ask another question. This question starts with a description of a table that is not used, and then the question is about a query that doesn't use the table. Ask another question, describing the table and the real problem you are having -- along with sample data and desired results.

Dynamic pivot for thousands of columns

I'm using pgAdmin III / PostgreSQL 9.4 to store and work with my data. Sample of my current data:
x | y
--+--
0 | 1
1 | 1
2 | 1
5 | 2
5 | 2
2 | 2
4 | 3
6 | 3
2 | 3
How I'd like it to be formatted:
1, 2, 3 -- column names are unique y values
0, 5, 4 -- the first respective x values
1, 5, 6 -- the second respective x values
2, 2, 2 -- etc.
It would need to be dynamic because I have millions of rows and thousands of unique values for y.
Is using a dynamic pivot approach correct for this? I have not been able to successfully implement this:
DECLARE @columns VARCHAR(8000)
SELECT @columns = COALESCE(@columns + ',[' + cast(y as varchar) + ']',
                           '[' + cast(y as varchar) + ']')
FROM tableName
GROUP BY y

DECLARE @query VARCHAR(8000)
SET @query = '
SELECT x
FROM tableName
PIVOT
(
  MAX(x)
  FOR [y]
  IN (' + @columns + ')
)
AS p'
EXECUTE(@query)
It stops on the first line, giving the error:
syntax error at or near "@"
All dynamic pivot examples I've seen use this, so I'm not sure what I've done wrong. Any help is appreciated. Thank you for your time.
Note: It is important for the x values to be stored in the correct order, as sequence matters. I can add another column to indicate sequential order if necessary.
The term "first row" assumes a natural order of rows, which does not exist in database tables. So, yes, you need to add another column to indicate sequential order like you suspected. I am assuming a column tbl_id for the purpose. Using the ctid would be a measure of last resort. See:
Deterministic sort order for window functions
The code you present looks like MS SQL Server code; invalid syntax for Postgres.
For millions of rows and thousands of unique values for Y it wouldn't even make sense to try and return individual columns. Postgres has generous limits, but not nearly generous enough for that. According to the source code or the manual, the absolute maximum number of columns is 1600.
So we don't even get to discuss the restrictive characteristics of SQL, which demands to know columns and data types at execution time, not dynamically adjusted during execution. You would need two separate calls, like we discussed in great detail under this related question.
Dynamic alternative to pivot with CASE and GROUP BY
Another answer by Clodoaldo under the same question returns arrays. That can actually be completely dynamic. And that's what I suggest here, too. The query is actually rather simple:
WITH cte AS (
   SELECT *, row_number() OVER (PARTITION BY y ORDER BY tbl_id) AS rn
   FROM   tbl
   ORDER  BY y, tbl_id
   )
SELECT text 'y' AS col, array_agg(y) AS values
FROM   cte
WHERE  rn = 1
UNION ALL
(  -- parentheses required
SELECT text 'x' || rn, array_agg(x)
FROM   cte
GROUP  BY rn
ORDER  BY rn
);
Result:
col | values
----+--------
y | {1,2,3}
x1 | {0,5,4}
x2 | {1,5,6}
x3 | {2,2,2}
Explanation
The CTE computes a row_number rn for each row (each x) per group of y. We are going to use it twice, hence the CTE.
The 1st SELECT in the outer query generates the array of y values.
The 2nd SELECT in the outer query generates all arrays of x values in order. Arrays can have different length.
Why the parentheses for UNION ALL? See:
Sum results of a few queries and then find top 5 in SQL
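As a quick illustration of why the parentheses matter (a made-up example): they scope the ORDER BY to a single branch of the UNION ALL instead of the whole result.
SELECT 1 AS v
UNION ALL
(  -- parentheses keep the ORDER BY local to this branch
   SELECT v
   FROM  (VALUES (3), (2)) AS t(v)
   ORDER  BY v
);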