Changing a column type from integer to string - sql

Using PostgreSQL, what's the command to migrate an integer column type to a string column type?
Obviously I'd like to preserve the data, by converting the old integer data to strings.

You can convert from INTEGER to CHARACTER VARYING out of the box; all you need is an ALTER TABLE query changing the column type:
SQL Fiddle
PostgreSQL 9.3 Schema Setup:
CREATE TABLE tbl (col INT);
INSERT INTO tbl VALUES (1), (10), (100);
ALTER TABLE tbl ALTER COLUMN col TYPE CHARACTER VARYING(10);
Query 1:
SELECT col, pg_typeof(col) FROM tbl
Results:
| col | pg_typeof         |
|-----|-------------------|
| 1   | character varying |
| 10  | character varying |
| 100 | character varying |

I suggest a four-step process (sketched below):
Create a new string column; name it temp for now. See http://www.postgresql.org/docs/9.3/static/ddl-alter.html for details.
Populate the string column with something like update myTable set temp = cast(intColumn as text). See http://www.postgresql.org/docs/9.3/static/functions-formatting.html for more interesting number-to-string conversions. Make sure everything in temp looks the way you want it.
Remove your old integer column. Once again, see http://www.postgresql.org/docs/9.3/static/ddl-alter.html for details.
Rename temp to the old column name. Again: http://www.postgresql.org/docs/9.3/static/ddl-alter.html
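Put together, a minimal sketch of those four steps, using the hypothetical names myTable and intColumn from the second step, might look like this:
-- a sketch only; myTable and intColumn are placeholder names
ALTER TABLE myTable ADD COLUMN temp text;               -- add the new string column
UPDATE myTable SET temp = cast(intColumn AS text);      -- populate it from the integer column
-- (inspect the contents of temp here)
ALTER TABLE myTable DROP COLUMN intColumn;              -- drop the old integer column
ALTER TABLE myTable RENAME COLUMN temp TO intColumn;    -- rename temp to the old column name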
This assumes you can perform the operation offline, while no clients are connected. If you need to make this (drastic) change on a live table, look at setting up a new table with triggers for live updates, then swap to the new table in an atomic operation. See ALTER TABLE without locking the table?

Related

Adding column to sqlite database and distribute rows based on primary key

I have some data elements containing a timestamp and information about Item X sales related to this timestamp.
e.g.
timestamp | items X sold
------------------------
        1 | 10
        4 | 40
        7 | 20
I store this data in an SQLite table. Now I want to add to this table, especially when I get data about another item Y.
The item Y data might or might not have the same timestamps, but I want to insert it into the existing table so that it looks like this:
timestamp | items X sold | items Y sold
---------------------------------------
        1 | 10           | 5
        2 | NULL         | 10
        4 | 40           | NULL
        5 | NULL         | 3
        7 | 20           | NULL
Later on, additional sales data (columns) must be added following the same scheme.
Is there an easy way to accomplish this with SQLite?
In the end I want to fetch data by timestamp and get an overview of which items were sold at that time. Most examples cover the use case of adding a complete row (one record), or a complete column if it perfectly matches the other columns.
Or is SQLite the wrong tool altogether? Should I rather use CSV or Excel?
(I am using Python's sqlite3 package to create and manipulate the DB.)
Thanks!
Dynamically adding columns is not a good design. You could add them using
ALTER TABLE your_table ADD COLUMN the_column_name TEXT
The column, for existing rows, would be populated with NULLs, although you could specify a DEFAULT value, in which case the existing rows would be populated with that value.
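If you wanted the DEFAULT behaviour instead, a sketch (using the same placeholder names as above) might be the following; note that the demonstration further down sticks with the NULL behaviour:
-- a sketch only; your_table and the_column_name are placeholder names
ALTER TABLE your_table ADD COLUMN the_column_name INTEGER DEFAULT 0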
e.g. the following demonstrates the above :-
DROP TABLE IF EXISTS soldv1;
CREATE TABLE IF NOT EXISTS soldv1 (timestamp INTEGER PRIMARY KEY, items_sold_x INTEGER);
INSERT INTO soldv1 VALUES(1,10),(4,40),(7,20);
SELECT * FROM soldv1 ORDER BY timestamp;
ALTER TABLE soldv1 ADD COLUMN items_sold_y INTEGER;
UPDATE soldv1 SET items_sold_y = 5 WHERE timestamp = 1;
INSERT INTO soldv1 VALUES(2,null,10),(5,null,3);
SELECT * FROM soldv1 ORDER BY timestamp;
resulting in the first query returning :-
timestamp | items_sold_x
------------------------
        1 |           10
        4 |           40
        7 |           20
and the second query returning :-
timestamp | items_sold_x | items_sold_y
---------------------------------------
        1 |           10 |            5
        2 |         NULL |           10
        4 |           40 |         NULL
        5 |         NULL |            3
        7 |           20 |         NULL
However, as stated, the above is not considered a good design as the schema is dynamic.
You could alternatively manage an equivalent of the above by adding either a new column (which also becomes part of the primary key) or by prefixing/suffixing the timestamp with a type.
Consider, as an example, the following :-
DROP TABLE IF EXISTS soldv2;
CREATE TABLE IF NOT EXISTS soldv2 (type TEXT, timestamp INTEGER, items_sold INTEGER, PRIMARY KEY(timestamp,type));
INSERT INTO soldv2 VALUES('x',1,10),('x',4,40),('x',7,20);
INSERT INTO soldv2 VALUES('y',1,5),('y',2,10),('y',5,3);
INSERT INTO soldv2 VALUES('z',1,15),('z',2,5),('z',9,25);
SELECT * FROM soldv2 ORDER BY timestamp;
This has replicated, data-wise, your original data and additionally added another type ('z', which in the first design would have needed a new items_sold_z column) without having to change the table's schema, and without the extra complication of needing to update rather than insert (as was required for timestamp 1, items_sold_y = 5).
The query returns all nine rows, ordered by timestamp.
Or is SQLite the wrong tool altogether? Should I rather use CSV or Excel?
SQLite is a valid tool. What you then do with the data can probably be done as easily as in Excel (perhaps more simply), and probably much more simply than trying to process the data in CSV format.
For example, say you wanted the total items sold per timestamp and how many types were sold then :-
SELECT timestamp, count(items_sold) AS number_of_item_types_sold, sum(items_sold) AS total_sold FROM soldv2 GROUP BY timestamp ORDER BY timestamp;
would result in :-
timestamp | number_of_item_types_sold | total_sold
---------------------------------------------------
        1 |                          3 |         30
        2 |                          2 |         15
        4 |                          1 |         40
        5 |                          1 |          3
        7 |                          1 |         20
        9 |                          1 |         25

Distance calculation in a trigger postgresql

I've got two tables "modulo1_cella" and "modulo2_campionamento".
The first, "modulo1_cella" contains polygons, while the latter, "modulo2_campionamento", contains points (samples). Now, I need to assign to each polygon the nearest sample, and the identificative of the sampler itself.
Table "public.modulo1_cella"
Column | Type | Modifiers
-------------------+-------------------+------------------------------------------------------------------
cella_id | integer | not null default nextval('modulo1_cella_cella_id_seq'::regclass)
nome_cella | character varying |
geometria | geometry |
campione_id | integer |
dist_camp | double precision |
Table "public.modulo2_campionamento"
Column | Type | Modifiers
--------------------------+-----------------------------+----------------------------------------------------------------------------------
campione_id | integer | not null default nextval('modulo2_campionamento_aria_campione_id_seq'::regclass)
x_campionamento | double precision |
y_campionamento | double precision |
codice_campione | character varying(10) |
cella_id | integer |
geometria | geometry(Point,4326) |
I'm looking for an INSERT/UPDATE trigger that, for each row of the "modulo1_cella" table, i.e. for each polygon, returns:
the nearest sample, "campione_id";
the corresponding distance, "dist_camp".
I created a query that works, but I'm not able to convert it to a trigger.
CREATE TEMP TABLE TemporaryTable
(
cella_id int,
campione_id int,
distanza double precision
);
INSERT INTO TemporaryTable(cella_id, campione_id, distanza)
SELECT
DISTINCT ON (m1c.cella_id) m1c.cella_id, m2cmp.campione_id, ST_Distance(m2cmp.geometria::geography, m1c.geometria::geography) as dist
FROM modulo1_cella As m1c, modulo2_campionamento As m2cmp
WHERE ST_DWithin(m2cmp.geometria::geography, m1c.geometria::geography, 50000)
ORDER BY m1c.cella_id, m2cmp.campione_id, ST_Distance(m2cmp.geometria::geography, m1c.geometria::geography);
UPDATE modulo1_cella as mc
SET campione_id=tt.campione_id, dist_camp=tt.distanza
from TemporaryTable as tt
where tt.cella_id=mc.cella_id;
DROP TABLE TemporaryTable;
Any help? Thank you in advance.
First, if "geometria" is not geography and is instead geometry, you should make it a geography type on the table.
ALTER TABLE modulo2_campionamento
ALTER COLUMN geometria
SET DATA TYPE geography(POINT, 4326)
USING (geometria::geography);
ALTER TABLE modulo1_cella
ALTER COLUMN geometria
SET DATA TYPE geography(POLYGON, 4326)  -- POLYGON assumed; use the cells' actual geometry type (e.g. MULTIPOLYGON)
USING (geometria::geography);
Now, I need to assign to each polygon the nearest sample, together with the identifier of the sample itself.
You would not normally do this because it's very fast to find the nearest sample using a KNN search anyway.
CREATE INDEX ON modulo1_cella USING gist (geometria);
CREATE INDEX ON modulo2_campionamento USING gist (geometria);
VACUUM FULL ANALYZE modulo1_cella;
VACUUM FULL ANALYZE modulo2_campionamento;
SELECT *
FROM modulo1_cella As m1c
CROSS JOIN LATERAL (
SELECT *
FROM modulo2_campionamento As m2cmp
WHERE ST_DWithin(m2cmp.geometria, m1c.geometria, 50000)
ORDER BY m2cmp.geometria <-> m1c.geometria,
m1c.cella_id,
m2cmp.campione_id
FETCH FIRST ROW ONLY
) AS closest_match;
That's much faster than the DISTINCT ON query you wrote.
If that is fast enough, I suggest using a VIEW. If that's not fast enough, I suggest using a MATERIALIZED VIEW. If it's still not fast enough, you have a very niche load and it may be worth investigating a solution with triggers. But only then.
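For instance, a sketch of such a view, reusing the lateral query above; the view name cella_nearest_campione is made up, and this assumes geometria has been converted to geography as shown earlier, so ST_Distance returns metres:
-- sketch only; cella_nearest_campione is a made-up name
CREATE VIEW cella_nearest_campione AS
SELECT m1c.cella_id,
       closest_match.campione_id,
       ST_Distance(closest_match.geometria, m1c.geometria) AS dist_camp
FROM modulo1_cella AS m1c
CROSS JOIN LATERAL (
    SELECT m2cmp.campione_id, m2cmp.geometria
    FROM modulo2_campionamento AS m2cmp
    WHERE ST_DWithin(m2cmp.geometria, m1c.geometria, 50000)
    ORDER BY m2cmp.geometria <-> m1c.geometria
    FETCH FIRST ROW ONLY
) AS closest_match;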

What is the best way to change the type of a column in a SQL Server database, if there is data in said column?

If I have the following table:
| name | value |
-----------------
| A    | 1     |
| B    | NULL  |
At the moment name is of type varchar(10) and value is of type bit.
However, I want to change this table so that value is an nvarchar(3), and I don't want to lose any of the information during the change. So in the end I want to end up with a table that looks like this:
| name | value |
-----------------
| A    | Yes   |
| B    | No    |
What is the best way to convert this column from one type to another, and also convert all of the data in it according to a pre-determined translation?
NOTE: I am aware that if I were converting, say, a varchar(50) to a varchar(200), or an int to a bigint, then I could just alter the table. But I need a similar procedure for a bit to an nvarchar, which will not work in this manner.
The best option is to ALTER the bit column to nvarchar and then run an update to change 1 to 'Yes' and 0 or NULL to 'No'.
This way you don't have to create a new column and then rename it later.
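A sketch of that in-place approach (the table name MyTable is made up, since the question doesn't give one):
-- MyTable is a placeholder name; after the ALTER, bit 1 becomes the string '1'
ALTER TABLE MyTable ALTER COLUMN value nvarchar(3);
UPDATE MyTable
SET value = CASE WHEN value = '1' THEN 'Yes' ELSE 'No' END;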
Alex K's comment on my question was the best:
Simplest and safest: add a new column, update with transform, drop the existing column, rename the new column.
Transforming each item with a simple:
UPDATE Table
SET temp_col = CASE
WHEN value=1
THEN 'yes'
ELSE 'no'
END
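The full sequence around that update might look like this (a sketch only; MyTable is a placeholder name and value is the existing bit column):
-- sketch of add / transform / drop / rename; MyTable is a placeholder name
ALTER TABLE MyTable ADD temp_col nvarchar(3);
UPDATE MyTable
SET temp_col = CASE
    WHEN value = 1
    THEN 'Yes'
    ELSE 'No'
END;
ALTER TABLE MyTable DROP COLUMN value;
EXEC sp_rename 'MyTable.temp_col', 'value', 'COLUMN';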
You should be able to change the data type from bit to nvarchar(3) without issue; the values will just turn from a bit 1 into the string "1". After that you can run some SQL to update "1" to "Yes" and "0" to "No".
I don't have SQL Server 2008 locally, but I did try this on 2012. Create a small table and test first, and take a backup of your data to be safe.

Create column from other columns in Database

I have a table named test:
ID | Prefix | ACCID
ID's type is INTEGER which is selected from ID_SEQ
Prefix's type is VARCHAR(6)
ACCID is the combination of Prefix + ID
I want ACCID to be created automatically when I insert the ID and Prefix values, such as
INSERT INTO TEST (PREFIX) VALUES ('A01407V');
and have the database store the ACCID as 'A01407V000001'.
I create the sequence as
CREATE SEQUENCE ID_SEQ AS INT MAXVALUE 999999 CYCLE;
How do I implement an SQL statement to produce this result?
Thank you for all solutions and suggestions.
P.S. I use Apache Derby as my SQL server.
As documented in the manual, Derby supports generated columns (since version 10.5).
The real problem is formatting a number with leading zeros, as Derby has no function for that.
If you really, really think you need to store a value that can always be determined by the values already stored in the table, you can use something like this:
create table test
(
id integer,
prefix varchar(6),
accid generated always as (prefix||substr('000000', 1, 6 - length(rtrim(char(id))))||rtrim(char(id)))
);
The expression substr('000000', 1, 6 - length(rtrim(char(id))))||rtrim(char(id)) is just a complicated way to format the ID with leading zeros.
I would highly recommend not storing this value, though. It is much cleaner to create a view that exposes this value if you need access to it in SQL.
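A sketch of such a view, reusing the same formatting expression (the view name test_with_accid is made up):
-- sketch only; test_with_accid is a made-up name
create view test_with_accid as
select id,
       prefix,
       prefix||substr('000000', 1, 6 - length(rtrim(char(id))))||rtrim(char(id)) as accid
from test;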
You can use a computed column.
A computed column is based on other columns in the table. Its data can be physically saved or not; the table automatically keeps the value of this column up to date.
Syntax:
columnname AS expression [PERSISTED]
--PERSISTED makes it physically saved; otherwise it is calculated every time.
We can create indexes on computed columns.
You add the following to the table CREATE script:
ACCID AS Prefix + CAST(ID AS CHAR(6)) [PERSISTED]
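A sketch of what that CREATE script might look like, in SQL Server-style computed-column syntax; note that RIGHT('000000' + ..., 6) is used here for the zero-padding, because CAST(ID AS CHAR(6)) on its own pads with spaces rather than zeros:
-- a sketch only, using SQL Server computed-column syntax
CREATE TABLE TEST
(
    ID INTEGER,
    PREFIX VARCHAR(6),
    ACCID AS PREFIX + RIGHT('000000' + CAST(ID AS VARCHAR(6)), 6) PERSISTED
);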

Making PostgreSQL a little more error tolerant?

This is sort of a general question that has come up in several contexts, the example below is representative but not exhaustive. I am interested in any ways of learning to work with Postgres on imperfect (but close enough) data sources.
The specific case -- I am using Postgres with PostGIS for working with government data published in shapefiles and XML. Using the shp2pgsql module distributed with PostGIS (for example on this dataset) I often get a schema like this:
Column | Type |
------------+-----------------------+-
gid | integer |
st_fips | character varying(7) |
sfips | character varying(5) |
county_fip | character varying(12) |
cfips | character varying(6) |
pl_fips | character varying(7) |
id | character varying(7) |
elevation | character varying(11) |
pop_1990 | integer |
population | character varying(12) |
name | character varying(32) |
st | character varying(12) |
state | character varying(16) |
warngenlev | character varying(13) |
warngentyp | character varying(13) |
watch_warn | character varying(14) |
zwatch_war | bigint |
prog_disc | bigint |
zprog_disc | bigint |
comboflag | bigint |
land_water | character varying(13) |
recnum | integer |
lon | numeric |
lat | numeric |
the_geom | geometry |
I know that at least 10 of those varchars -- the fips, elevation, population, etc., should be ints; but when trying to cast them as such I get errors. In general I think I could solve most of my problems by allowing Postgres to accept an empty string as a default value for a column -- say 0 or -1 for an int type -- when altering a column and changing the type. Is this possible?
If I create the table before importing with the type declarations generated from the original data source, I get better types than with shp2pgsql, and can iterate over the source entries feeding them to the database, discarding any failed inserts. The fundamental problem is that if I have 1% bad fields, evenly distributed over 25 columns, I will lose 25% of my data since a given insert will fail if any field is bad. I would love to be able to make a best-effort insert and fix any problems later, rather than lose that many rows.
Any input from people having dealt with similar problems is welcome -- I am not a MySQL guy trying to batter PostgreSQL into making all the same mistakes I am used to -- just dealing with data I don't have full control over.
Could you produce a SQL file from shp2pgsql and do some massaging of the data before executing it? If the data is in COPY format, it should be easy to parse and change "" to "\N" (insert as null) for columns.
Another possibility would be to use shp2pgsql to load the data into a staging table where all the fields are defined as just 'text' type, and then use an INSERT...SELECT statement to copy the data to your final location, with the possibility of massaging the data in the SELECT to convert blank strings to null etc.
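A sketch of that staging approach; the table and column names here (places_staging, places, elevation, population) are made up for illustration:
-- staging table: everything is text, so every row loads
CREATE TABLE places_staging (name text, elevation text, population text);
-- ... load the shp2pgsql output / COPY into places_staging ...

-- final table with the types you actually want
CREATE TABLE places (name text, elevation int, population int);
INSERT INTO places (name, elevation, population)
SELECT name,
       NULLIF(elevation, '')::int,    -- blank strings become NULL instead of failing the cast
       NULLIF(population, '')::int
FROM places_staging;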
I don't think there's a way to override the behaviour of how strings are converted to ints and so on: possibly you could create your own type or domain, and define an implicit cast that was more lenient... but this sounds pretty nasty, since the types are really just artifacts of how your data arrives in the system and not something you want to keep around after that.
You asked about fixing it up when changing the column type: you can do that too, for example:
steve#steve#[local] =# create table test_table(id serial primary key, testvalue text not null);
NOTICE: CREATE TABLE will create implicit sequence "test_table_id_seq" for serial column "test_table.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_table_pkey" for table "test_table"
CREATE TABLE
steve#steve#[local] =# insert into test_table(testvalue) values('1'),('0'),('');
INSERT 0 3
steve#steve#[local] =# alter table test_table alter column testvalue type int using case testvalue when '' then 0 else testvalue::int end;
ALTER TABLE
steve#steve#[local] =# select * from test_table;
id | testvalue
----+-----------
1 | 1
2 | 0
3 | 0
(3 rows)
Which is almost equivalent to the "staging table" idea I suggested above, except that now the staging table is your final table. Altering a column type like this requires rewriting the entire table anyway: so actually, using a staging table and reformatting multiple columns at once is likely to be more efficient.