SQL: Populate table with random data

I have a table with two fields:
id (UUID), which is the primary key, and
description (varchar(255)).
I want to insert random data with a SQL statement, and I would like the description to be something random.
PS: I am using PostgreSQL.

I don't know whether this fits the requirement for a "random description", and it's not clear whether you want to generate the full data set, but, for example, this generates 10 records with consecutive ids and random text:
test=# SELECT generate_series(1,10) AS id, md5(random()::text) AS descr;
id | descr
----+----------------------------------
1 | 65c141ee1fdeb269d2e393cb1d3e1c09
2 | 269638b9061149e9228d1b2718cb035e
3 | 020bce01ba6a6623702c4da1bc6d556e
4 | 18fad4813efe3dcdb388d7d8c4b6d3b4
5 | a7859b3bcf7ff11f921ceef58dc1e5b5
6 | 63691d4a20f7f23843503349c32aa08c
7 | ca317278d40f2f3ac81224f6996d1c57
8 | bb4a284e1c53775a02ebd6ec91bbb847
9 | b444b5ea7966cd76174a618ec0bb9901
10 | 800495c53976f60641fb4d486be61dc6
(10 rows)

The following worked for me:
create table t_random as select s, md5(random()::text) from generate_series(1,5) s;
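For the original table layout (UUID primary key plus description), the same idea can be combined with a UUID generator. A minimal sketch, assuming the table is called mytable and that gen_random_uuid() is available (built in since PostgreSQL 13, via the pgcrypto extension before that):

```sql
-- insert 10 rows with random UUIDs and random 32-character descriptions
INSERT INTO mytable (id, description)
SELECT gen_random_uuid(), md5(random()::text)
FROM generate_series(1, 10);
```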

Here is a more elegant way using more recent features. I will use the Unix dictionary (/usr/share/dict/words) and copy it into my PostgreSQL data directory:
cp /usr/share/dict/words data/pg95/words.list
Then you can easily create a ton of nonsense descriptions, still searchable via dictionary words, with the following steps:
1) Create the table and function. getNArrayS picks random elements from an array and concatenates them the given number of times.
CREATE TABLE randomTable(id serial PRIMARY KEY, description text);

CREATE OR REPLACE FUNCTION getNArrayS(el text[], count int) RETURNS text AS $$
  -- the ::int cast is needed because array subscripts must be integers
  SELECT string_agg(el[(random() * (array_length(el, 1) - 1))::int + 1], ' ')
  FROM generate_series(1, count) g(i)
$$ VOLATILE LANGUAGE SQL;
Once everything is in place, run the insert using a CTE:
WITH t(ray) AS (
  SELECT string_to_array(pg_read_file('words.list')::text, E'\n')
)
INSERT INTO randomTable(description)
SELECT getNArrayS(t.ray, 3) FROM t, generate_series(1, 10000);
And now, select as usual:
postgres=# select * from randomtable limit 3;
id | description
----+---------------------------------------------
1 | ultracentenarian splenodiagnosis manurially
2 | insequent monopolarity funipendulous
3 | ruminate geodic unconcludable
(3 rows)

I assume "sentence" means "statement"? You could use Perl or PL/Perl, as Perl has some good random data generators. Check out the CPAN module Data::Random to start.
Here's a sample Perl script that generates some different random data, taken from CPAN:
use Data::Random qw(:all);

my @random_words = rand_words( size => 10 );
my @random_chars = rand_chars( set => 'all', min => 5, max => 8 );
my @random_set = rand_set( set => \@set, size => 5 );
my $random_enum = rand_enum( set => \@set );
my $random_date = rand_date();
my $random_time = rand_time();
my $random_datetime = rand_datetime();

open(FILE, ">rand_image.png") or die $!;
binmode(FILE);
print FILE rand_image( bgcolor => [0, 0, 0] );
close(FILE);
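To call such a generator from inside PostgreSQL, a hypothetical PL/Perl wrapper could look like the sketch below. It assumes the plperlu extension and the Data::Random module are installed on the server; plperlu is an untrusted language, so superuser rights are needed to create the function, and the function name rand_words_sql is made up for this example.

```sql
CREATE EXTENSION IF NOT EXISTS plperlu;

-- returns n random dictionary words joined by spaces
CREATE OR REPLACE FUNCTION rand_words_sql(n int) RETURNS text AS $$
    use Data::Random qw(rand_words);
    return join ' ', rand_words( size => $_[0] );
$$ LANGUAGE plperlu;
```

Usage would then be, for example, SELECT rand_words_sql(5);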

Related

Postgres json column with versions, return the oldest one

I have the following table:
create table subscriptions (id int, data jsonb);
filled with the following data:
insert into subscriptions (id, data)
values
(1, '{"versions": {"10.2.3": 3, "9.2.3": 4, "12.2.3": 5, "1.2.3": 5}}'),
(2, '{"versions": {"0.2.3": 3, "2.2.3": 4, "3.2.3": 5}}');
And I am trying to write a query that will result in:
 id | minVersion
----+------------
  1 | 1.2.3
  2 | 0.2.3
What I have tried:
select
(string_to_array(jsonb_object_keys((data->'versions')::jsonb), '.'))[1] as total_prices
from subscriptions;
(does not work but I think those methods will be useful here)
I would create a function that extracts the version keys and sorts them to pick the smallest.
create or replace function smallest_version(p_version jsonb)
returns text
as
$$
select version
from jsonb_each_text(p_version) as x(version, num)
order by string_to_array(x.version, '.')::int[]
limit 1
$$
language sql
immutable;
The above will only work if there are no versions like 1.2.3-beta or other non-numeric characters in your version number.
Then use it like this:
select smallest_version(data->'versions')
from subscriptions;
Online example
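If you prefer not to create a function, the same logic can be inlined with a LATERAL subquery. This is a sketch under the same assumption of purely numeric version components:

```sql
SELECT s.id, v.version AS "minVersion"
FROM subscriptions s
CROSS JOIN LATERAL (
    SELECT k.version
    FROM jsonb_object_keys(s.data -> 'versions') AS k(version)
    ORDER BY string_to_array(k.version, '.')::int[]  -- numeric, per-component sort
    LIMIT 1
) v;
```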

SQL composite key value vs string

I have a list of integers with 1 to N elements (N < 24).
At the moment, there are two solutions to manage these values in a SQL database (I think it is the same for MySQL and Microsoft SQL Server).
Solution 1: use a VARCHAR and commas to separate the integer values:
aaa | 40,50,50,10,600,200
aab | 40,50,600,200
aac | 40,50,50,10,600,200,500,1
Solution 2: create a new table with a composite primary key (key, id) (id = index of the element in the list) and a value column:
aaa | 0 | 40
aaa | 1 | 50
aaa | 2 | 50
....
aab | 0 | 40
aab | 1 | 50
aab | 2 | 600
....
Which is the better solution, considering that I have many items of data to load and I need to refresh this data many times?
Thanks.
Edit:
My operative case is: I need to refresh/read all the data (the list for a key) in the same call, and I never access values one by one; this is why I think the first approach is better.
And all the math, like avg or max, I want to do on the client.
Usually the second approach is preferable. One advantage is ease of access:
-- Value at position 3 of aaa
select value from mytable where key = 'aaa' and pos = 3;
-- Average value of aaa
select avg(value) from mytable where key = 'aaa';
-- Average number of values per key
select avg(cnt) from (select count(*) as cnt from mytable group by key) counted;
Another is data consistency. You can add simple constraints to your columns, such as to allow only integers from, say, 1 to 700 and positions only up to 23.
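For example, the constraints mentioned above could be sketched like this (the table and column names are illustrative):

```sql
CREATE TABLE mytable (
    key   text    NOT NULL,
    pos   integer NOT NULL CHECK (pos BETWEEN 0 AND 23),    -- at most 24 positions
    value integer NOT NULL CHECK (value BETWEEN 1 AND 700),
    PRIMARY KEY (key, pos)
);
```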
There is an exception to the above, though. If you use the database only to store the list as is and you don't want to select separate values or even aggregate them, i.e. if this is just a string to the DBMS and your queries don't care about its content, then store it as a simple string. Why not?
The second solution you propose is the classic way of doing this; I would recommend it.
The first solution scales quite terribly and has a hundred other problems.

Spatial Clustering - associate cluster attribute (id) to the geometry that is part of the cluster

I'm having some issues associating a clustered set of geometries with their own properties.
Data
I have a table with a set of geometries:
buildings {
  gid integer,
  geom geometry(MultiPolygon, 4326)
}
I have run the function ST_ClusterWithin with a certain threshold over the buildings table. From that cluster analysis, I got a table named clusters:
clusters {
  cid integer,
  geom geometry(GeometryCollection, 4326)
}
Question
I would love to extract into a table every geometry together with its own cluster information:
clustered_building {
  gid integer,
  cid integer,
  geom geometry(MultiPolygon, 4326)
}
 gid | cid | geom
-----+-----+-------------------
   1 |   1 | multipolygon(...)
   2 |   1 | multipolygon(...)
   3 |   1 | multipolygon(...)
   4 |   2 | multipolygon(...)
   5 |   3 | multipolygon(...)
   6 |   3 | multipolygon(...)
What I did (but does not work)
I have been trying to use the two functions ST_GeometryN / ST_NumGeometries to parse each multi-geometry and extract the cluster information, with this query derived from one of the standard examples on the ST_Geometry manual page:
INSERT INTO clustered_building (cid, c_item, geom)
SELECT sel.cid, n, ST_GeometryN(sel.geom, n) AS singlegeom
FROM (SELECT cid, geom, ST_NumGeometries(geom) AS num
      FROM clusters) AS sel
CROSS JOIN generate_series(1, sel.num) n
WHERE n <= ST_NumGeometries(sel.geom);
The query takes a few seconds if I force it to use a series of 10:
CROSS JOIN generate_series(1,10)
But it gets stuck when I ask it to generate a series according to the number of items in each GeometryCollection.
Also, this query does not allow me to link each single geometry to its own features in the buildings table, because I am losing the gid.
Could someone please help me?
Thanks,
Stefano
I don't have your data, but using some dummy values, where ids 1, 2 and 3 intersect and 4 and 5 intersect, you can do something like the following:
WITH
temp (id, geom) AS
(VALUES (1, ST_Buffer(ST_Makepoint(0, 0), 2)),
(2, ST_Buffer(ST_MakePoint(1, 1), 2)),
(3, ST_Buffer(ST_MakePoint(2, 2), 2)),
(4, ST_Buffer(ST_MakePoint(9, 9), 2)),
(5, ST_Buffer(ST_MakePoint(10, 10), 2))),
clusters(geom) as
(SELECT
ST_Makevalid(
ST_CollectionExtract(
unnest(ST_ClusterIntersecting(geom)), 3))
FROM temp
)
SELECT array_agg(temp.id), cl.geom
FROM clusters cl, temp
WHERE ST_Intersects(cl.geom, temp.geom)
GROUP BY cl.geom;
If you wrap the final cl.geom in ST_AsText, you will see something like:
{1,2,3} | MULTIPOLYGON(((2.81905966523328 0.180940334766718,2.66293922460509 -0.111140466039203,2.4142135623731 -0.414213562373094,2.11114046603921 -0.662939224605089,1.81905966523328 -0.819059665233282,1.84775906502257 -0.765366864730179,1.96157056080646 -0.390180644032256,2 0,2 3.08780778723872e-16,2 0,2.39018064403226 0.0384294391935396,2.76536686473018 0.152240934977427,2.81905966523328 0.180940334766718))......
{4,5} | MULTIPOLYGON(((10.8190596652333 8.18094033476672,10.6629392246051 7.8888595339608,10.4142135623731 7.58578643762691,10.1111404660392 7.33706077539491,9.76536686473018 7.15224093497743,9.39018064403226 7.03842943919354,9 7,8.60981935596775 7.03842943919354,8.23463313526982 7.15224093497743,7.8888595339608 7.33706077539491,7.58578643762691 7.5857864376269,7.33706077539491 7.88885953396079,7.15224093497743 8.23463313526982
where you can see that ids 1, 2, 3 belong to the first multipolygon, and 4, 5 to the other.
The general idea is that you cluster the data and then intersect the returned clusters with the original data, using array_agg to group the ids together, so that the returned multipolygons are now linked to the original ids. The use of ST_CollectionExtract with 3 as the second parameter, in conjunction with unnest, which splits the geometry collection returned by ST_ClusterIntersecting back into rows, returns each contiguous cluster as a (Multi)Polygon. The ST_MakeValid is there because sometimes, when you intersect geometries with other related geometries, such as the original polygons with your clustered polygons, you get strange rounding effects and GEOS errors about non-noded intersections and the like.
I answered a similar question on gis.stackexchange recently that you might find useful.
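As an alternative sketch: if your PostGIS version is 2.3 or later, ST_ClusterDBSCAN is a window function that assigns a cluster id directly to each input row, so the original gid is never lost. The distance threshold 0.5 below is a placeholder; use the same threshold you passed to ST_ClusterWithin:

```sql
-- one output row per building, with its cluster id attached
INSERT INTO clustered_building (gid, cid, geom)
SELECT gid,
       ST_ClusterDBSCAN(geom, 0.5, 1) OVER () AS cid,  -- (eps, minpoints)
       geom
FROM buildings;
```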

Parse records by version number in PostgreSQL

I have a table of records that have a 'Version' property. The version is stored as text. I know how to match on specific versions (e.g. WHERE version = '1.2.3'), but I want to be able to use comparison operators (<, >, =, etc.) on the version number (e.g. WHERE version > '1.2.3').
Does anyone know of a good way to achieve this?
For version sorting, casting to cidr might be a solution. It works for the trivial cases; non-numeric characters will probably spoil the party.
-- some data
DROP table meuk;
CREATE table meuk
( id SERIAL NOT NULL PRIMARY KEY
, version text
);
INSERT INTO meuk(version) VALUES
( '1.1.1'), ( '1.1.2'), ( '1.1.11'), ( '1.1.102')
, ( '0.1.1'), ( '2.1.2'), ( '22.1.11'), ( '0.1.102')
;
SELECT * FROM meuk
ORDER BY version::cidr -- cast to cidr (less restrictive than inet)
;
Result:
DROP TABLE
CREATE TABLE
INSERT 0 8
id | version
----+---------
5 | 0.1.1
8 | 0.1.102
1 | 1.1.1
2 | 1.1.2
3 | 1.1.11
4 | 1.1.102
6 | 2.1.2
7 | 22.1.11
(8 rows)
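For the comparison operators asked about in the question, another option (again assuming purely numeric dotted versions) is to cast the version to an integer array, which PostgreSQL compares component by component:

```sql
-- rows with version greater than 1.1.2, compared numerically per component
SELECT *
FROM meuk
WHERE string_to_array(version, '.')::int[] > string_to_array('1.1.2', '.')::int[]
ORDER BY string_to_array(version, '.')::int[];
```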

Custom sorting (order by) in PostgreSQL, independent of locale

Let's say I have a simple table with two columns: id (int) and name (varchar). In this table I store some names which are in Polish, e.g.:
1 | sępoleński
2 | świecki
3 | toruński
4 | Włocławek
Now, let's say I want to sort the results by name:
SELECT * FROM table ORDER BY name;
If I have C locale, I get:
4 | Włocławek
1 | sępoleński
3 | toruński
2 | świecki
which is wrong, because "ś" should be after "s" and before "t". If I use Polish locale (pl_PL.UTF-8), I get:
1 | sępoleński
2 | świecki
3 | toruński
4 | Włocławek
which is also not what I want, because I would like names starting with capital letters to be first just like in C locale, like this:
4 | Włocławek
1 | sępoleński
2 | świecki
3 | toruński
How can I do this?
If you want a custom sort, you must define some function that modifies your values in such a way that the natural ordering of the modified values fits your requirement.
For example, you can prepend some character or string if the value starts with an uppercase letter:
CREATE OR REPLACE FUNCTION mysort(text) RETURNS text IMMUTABLE AS $$
  SELECT CASE WHEN substring($1 from 1 for 1) = upper(substring($1 from 1 for 1))
              THEN 'AAAA' || $1
              ELSE $1
         END;
$$ LANGUAGE SQL;
And then
SELECT * FROM table ORDER BY mysort(name);
This is not foolproof (you might want to change 'AAAA' for something more apt), and it hurts performance, of course.
If you want it efficient, you'll need to create another column that "naturally" sorts correctly (e.g. even in the C locale), and use that as a sorting criterion. For that, you should use the approach of the strxfrm C library function. As a straight-forward strxfrm table for your approach, replace each letter with two ASCII letters: 's' would become 's0' and 'ś' would become 's1'. Then 'świecki' becomes 's1w0i0e0c0k0i0', and the regular ASCII sorting will sort it correctly.
If you don't want to create a separate column, you can try to use a function in the ORDER BY clause:
SELECT * FROM table ORDER BY strxfrm(name);
Here, strxfrm needs to be replaced with a proper function. Either you write one yourself, or you use the standard translate function (although this doesn't support replacing one character with two, so you'll need a somewhat more involved transformation).
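A minimal sketch of such a strxfrm-style key, covering only s/ś and l/ł as examples (a real version would map every Polish letter and handle capitals as well; the function name pl_sortkey is made up for this example):

```sql
CREATE OR REPLACE FUNCTION pl_sortkey(txt text) RETURNS text IMMUTABLE AS $$
  -- map each base letter to X0 and its diacritic variant to X1, so that
  -- plain byte-wise (C locale) ordering matches the Polish alphabet
  SELECT replace(replace(replace(replace(txt,
             's', 's0'), 'ś', 's1'),
             'l', 'l0'), 'ł', 'l1');
$$ LANGUAGE SQL;

SELECT * FROM mytable ORDER BY pl_sortkey(name);
```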