sql query to truncate columns which are above specified length - sql

I have the following table in postgres:
create table table1 (col1 character varying, col2 character varying);
My table has the following data:
col1 col2
Questions Tags Users
Value1 Value2 Val
I want to find the length of col1 and col2, and when a value in either column is longer than 6 characters, truncate it and discard the rest. That is, I want my final table to look like the following:
col1 col2
Questi Tags U
Value1 Value2
The actual reason I want to do this is that when I create an index on table1, I get the following error:
ERROR: index row size 2744 exceeds maximum 2712 for index "allstrings_string_key"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
I know I can do this by importing the values into some programming language and truncating them there. Is there a way to achieve the same thing with an SQL query in Postgres?

Couldn't you just update them to contain only strings of length 6 at most?
I am no Postgres pro, so this is probably not the best method, but it should do the job anyway:
UPDATE table1 SET col1 = SUBSTRING(col1, 1, 6) WHERE LENGTH(col1) > 6;
UPDATE table1 SET col2 = SUBSTRING(col2, 1, 6) WHERE LENGTH(col2) > 6;
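For anyone who wants to try the truncating UPDATE without a Postgres server at hand, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. SQLite is an assumption here, not the asker's database: it spells the functions substr and length, and the TEXT column type replaces character varying.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("Questions", "Tags Users"), ("Value1", "Value2 Val")],
)

# Keep only the first 6 characters of any value longer than 6.
conn.execute("UPDATE table1 SET col1 = substr(col1, 1, 6) WHERE length(col1) > 6")
conn.execute("UPDATE table1 SET col2 = substr(col2, 1, 6) WHERE length(col2) > 6")

rows = conn.execute("SELECT col1, col2 FROM table1 ORDER BY rowid").fetchall()
print(rows)
```

The WHERE clause is optional for correctness, since taking the first 6 characters of a short value leaves it unchanged, but it avoids rewriting rows that do not need it.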

I'd suggest that you actually follow the advice from Postgres, rather than changing your data. Clearly, that column with a 2k character long string shouldn't be indexed -- or not with a btree index anyway.
If the idea behind the index is searching, use full text search instead:
http://www.postgresql.org/docs/current/static/textsearch.html
If the idea behind the need is for sorting, use a functional index instead. For instance:
create index tbl_sort on tbl (substring(col from 1 for 20));
Then, instead of ordering by col, order by substring(col from 1 for 20).
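A rough sketch of the prefix-index idea, run against SQLite through Python's sqlite3 module rather than Postgres (an assumption made so the snippet is self-contained). SQLite writes the expression as substr(col, 1, 20); the table name tbl and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col TEXT)")
# Two very long strings and one short one (hypothetical sample data).
conn.executemany(
    "INSERT INTO tbl VALUES (?)",
    [("b" * 3000,), ("a" * 3000,), ("c",)],
)

# Index only the first 20 characters instead of the whole value.
conn.execute("CREATE INDEX tbl_sort ON tbl (substr(col, 1, 20))")

# Order by the same expression that the index was built on.
prefixes = [
    row[0]
    for row in conn.execute(
        "SELECT substr(col, 1, 20) FROM tbl ORDER BY substr(col, 1, 20)"
    )
]
print(prefixes)
```

The ORDER BY has to repeat the indexed expression; ordering by the bare column would not be able to use this index.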

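Have you tried changing the type of the column to CHAR instead of VARCHAR? Without a USING clause Postgres rejects the change if an existing value is too long, so cast explicitly to truncate:
ALTER TABLE table1
ALTER COLUMN col1 SET DATA TYPE CHAR(6) USING col1::CHAR(6),
ALTER COLUMN col2 SET DATA TYPE CHAR(6) USING col2::CHAR(6);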
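If you need the column to be variable length, you can specify a limit (note that CHARACTER VARYING without a length, as in your table definition, is a PostgreSQL extension):
ALTER TABLE table1
ALTER COLUMN col1 SET DATA TYPE CHARACTER VARYING(6) USING col1::CHARACTER VARYING(6),
ALTER COLUMN col2 SET DATA TYPE CHARACTER VARYING(6) USING col2::CHARACTER VARYING(6);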

Related

SQL UPDATE value based on row and column location without ID or key

In SQL (I'm using postgres, but am open to other variations), is it possible to update a value based on a row location and a column name when the table doesn't have unique rows or keys? ...without adding a column that contains unique values?
For example, consider the table:
col1 col2 col3
1 1 1
1 1 1
1 1 1
I would like to update the table based on the row number or numbers. For example, change the values of rows 1 and 3, col2 to 5 like so:
col1 col2 col3
1 5 1
1 1 1
1 5 1
I can start with the example table:
CREATE TABLE test_table (col1 int, col2 int, col3 int);
INSERT INTO test_table (col1, col2, col3) values(1,1,1);
INSERT INTO test_table (col1, col2, col3) values(1,1,1);
INSERT INTO test_table (col1, col2, col3) values(1,1,1);
Now, I could add an additional column, say "id" and simply:
UPDATE test_table SET col2 = 5 WHERE id = 1
UPDATE test_table SET col2 = 5 WHERE id = 3
But can this be done just based on row number?
I can select based on row number using something like:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER() FROM test_table
) as sub
WHERE row_number BETWEEN 1 AND 2
But this doesn't seem to play well with UPDATE (at least in Postgres). Likewise, I have tried using some subsets or common table expressions, but again I run into difficulties with the UPDATE part. How can I do something like this pseudocode?
UPDATE <my table> SET <col name> = <new value> WHERE row_number = 1 or 3, or...
This is trivial in other languages like R or Python (e.g., using pandas's .iloc indexer). It would be interesting to know how to do this in SQL.
Edit: in my table example, I should have specified the column types to something like int.
This is one of the many instances where you should embrace the lesser evil that is Surrogate Keys. Whichever table has a primary key of (col1,col2,col3) should have an additional key created by the system, such as an identity or GUID.
You don't specify the data type of (col1,col2,col3), but if for some reason you're allergic to surrogate keys you can embrace the slightly greater evil of a "combined key", where instead of a database-created value your unique key field is derived from some other fields. (In this instance, it'd be something like CONCAT(col1, '-', col2, '-', col3) ).
Should neither of the above be practical, you will be left with the greatest evil of having to manually specify all three columns each time you query a record, which means that any other object or table which references this one will need not one but three distinct fields to identify which record you're talking about.
Ideally, btw, you would have some business key in the actual data which you can guarantee by design will be unique, never-changing, and never-blank. (Or at least changing so infrequently that the db can handle cascade updates reasonably well.)
You may wind up using a surrogate key for performance in such a case anyway, but that's an implementation detail rather than a data modeling requirement.
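To make the surrogate-key suggestion concrete, here is a small sketch using SQLite through Python's sqlite3 module (an assumption; the question is about Postgres, where an identity or serial column would play the same role). The id column is the system-generated key, and SQLite fills it in as 1, 2, 3 for the three identical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is a surrogate key assigned by the database, not part of the data.
conn.execute(
    "CREATE TABLE test_table (id INTEGER PRIMARY KEY, col1 INT, col2 INT, col3 INT)"
)
conn.executemany(
    "INSERT INTO test_table (col1, col2, col3) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 1), (1, 1, 1)],
)

# The otherwise identical rows can now be addressed individually.
conn.execute("UPDATE test_table SET col2 = 5 WHERE id IN (1, 3)")

rows = conn.execute(
    "SELECT col1, col2, col3 FROM test_table ORDER BY id"
).fetchall()
print(rows)
```

Without such a key the table has no stable notion of "row 1" or "row 3", since SQL does not guarantee any row order unless one is specified.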

How to update certain columns in table to truncate records after certain length and regex replace string?

I have a list of columns in a table that I want to truncate to 255 characters max, and I also want to remove any percent signs from the columns. How would I do this?
old Table
col1 col2
adfaadfadfadfdfdafdjf;kldjf;adjsfjads;f 60%
new Table
col1 col2
adfaadfadfadf 60
col1 is not the full string; it is shortened here for illustration purposes.
I am using sql server 2012.
code so far:
SELECT
case
when len(col)=255
then left(col, 255)
else col end col
from table
Is this not as simple as...?
UPDATE YourTable
SET StringColumn = LEFT(StringColumn,255),
PercentColumn = REPLACE(PercentColumn,'%','')
GO
--You then probably want to fix that column's datatype.
ALTER TABLE YourTable ALTER COLUMN PercentColumn int; --Assuming integer values only.
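As a quick check of what the UPDATE does, here is a sketch that runs the same idea against SQLite via Python's sqlite3 module. This is a stand-in, not SQL Server 2012: SQLite has no LEFT function, so substr(…, 1, 255) is used instead, and the sample row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (StringColumn TEXT, PercentColumn TEXT)")
# One hypothetical row: a 300-character string and a percentage.
conn.execute("INSERT INTO YourTable VALUES (?, ?)", ("x" * 300, "60%"))

# Truncate to 255 characters and strip percent signs in one statement.
conn.execute(
    "UPDATE YourTable "
    "SET StringColumn = substr(StringColumn, 1, 255), "
    "    PercentColumn = replace(PercentColumn, '%', '')"
)

length, percent = conn.execute(
    "SELECT length(StringColumn), PercentColumn FROM YourTable"
).fetchone()
print(length, percent)
```

No length check is needed before truncating, because taking the first 255 characters of a shorter string returns it unchanged.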

apache hive column comment with CTAS

Sorry for all the setup. This is a hive datatype and comment question.
I have a single file in HDFS which combines 4 sets of table data. Breaking the data out ahead of time is not my preferred option. The first 4 rows specify the column headers:
*1 col1, col2, col3
*2 cola, colb, colc, cold, col5e
etc....
data rows begin with matching number at position 1 of the header.
1 data, data, data,
2 data, data, data, data, data,
etc...
The base hive table is just col0 - col60 for the raw file. I've tried creating a CTAS table to hold all of the "1" columns and one for the "2" columns where I can specify data type, and comments. Since the column names vary, I cannot give the columns names on the base table nor can I comment them with column based metadata.
This DDL didn't work, but it gives an example of what I'm hoping to do. Any thoughts?
CREATE TABLE foo (
col1 as meaningful_name string comment 'meaningful comment')
as
SELECT col1
FROM base_hive table
WHERE col1 = 1;
CREATE TABLE foo
as
SELECT col1 string comment 'meaningful comment'
FROM base_hive table
WHERE col1 = 1;
thanks TD
I don't fully understand what you are trying to achieve here, but looking at your DDL, I can see some errors. For a correct CREATE TABLE AS SELECT, please use the DDL below:
CREATE TABLE foo (
col1 STRING COMMENT 'meaningful comment')
AS
SELECT col1 AS meaningful_name
FROM base_hive table
WHERE col1 = 1;

archive one table date in another table with archive date in Oracle

I have a table, test, with 10 columns and 20 rows.
I need to move this data to the archive_test table, which has 11 columns (the same 10 as test plus an archive date column).
When I tried to insert as below, it shows an error because the number of columns does not match.
insert into archive_test
select * from test;
Please suggest a better way to do this. Thanks!
Well, obviously you need to supply values for all the columns, and although you can avoid doing so, you should also explicitly state which value is going to be inserted into which column. If you have an extra column in the target table you either:
Do not mention it
Specify a default value as part of its column definition in the table
Have a trigger to populate it
Specify a value for that column.
e.g.
insert into archive_test (col1, col2, col3 ... col11)
select col1,
col2,
col3,
...
sysdate
from test;
Assuming that archive_date is the last column:
INSERT INTO archive_test
SELECT test.*, sysdate
FROM test
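Here is a small sketch of the explicit-column-list variant, run against SQLite through Python's sqlite3 module instead of Oracle (an assumption for the sake of a self-contained example). The tables are cut down to two data columns, and SQLite's date('now') stands in for Oracle's sysdate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical versions of the two tables.
conn.execute("CREATE TABLE test (col1 INT, col2 TEXT)")
conn.execute("CREATE TABLE archive_test (col1 INT, col2 TEXT, archive_date TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [(1, "a"), (2, "b")])

# Name every target column and supply the archive date explicitly.
conn.execute(
    "INSERT INTO archive_test (col1, col2, archive_date) "
    "SELECT col1, col2, date('now') FROM test"
)

archived = conn.execute(
    "SELECT col1, col2, archive_date FROM archive_test ORDER BY col1"
).fetchall()
print(archived)
```

Listing the columns keeps the statement working even if a column is later added to either table or the column order changes.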

Can I set a formula for a particular column in SQL?

I want to implement something like Col3 = Col2 + Col1 in SQL.
This is somewhat similar to Excel, where every value in column 3 is sum of corresponding values from column 2 and column 1.
Have a look at Computed Columns
A computed column is computed from an expression that can use other columns in the same table. The expression can be a noncomputed column name, constant, function, and any combination of these connected by one or more operators.
Also from CREATE TABLE point J
Something like
CREATE TABLE dbo.mytable
( low int, high int, myavg AS (low + high)/2 ) ;
Yes, you can do it in SQL using the UPDATE command:
UPDATE table_name
SET col3 = col1 + col2
WHERE <SOME CONDITION>
This assumes that you already have a table with populated col1 and col2 and you want to populate col3.
Yes, provided it is not aggregating data across rows.
Assuming that col1 and col2 are integers:
SELECT col1, col2, (col1 + col2) as col3 FROM mytable
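If the sum should always be available without repeating the expression, one option is to wrap that SELECT in a view. The sketch below uses SQLite through Python's sqlite3 module as an assumption, and the view name mytable_with_sum and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INT, col2 INT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 2), (10, 20)])

# The view computes col3 on every read, so it never goes stale.
conn.execute(
    "CREATE VIEW mytable_with_sum AS "
    "SELECT col1, col2, col1 + col2 AS col3 FROM mytable"
)

# Changing a base column is reflected in col3 automatically.
conn.execute("UPDATE mytable SET col1 = 5 WHERE col2 = 2")

rows = conn.execute(
    "SELECT col1, col2, col3 FROM mytable_with_sum ORDER BY col2"
).fetchall()
print(rows)
```

Unlike a one-off UPDATE that fills col3, the view behaves like the Excel formula: the value follows col1 and col2 whenever they change.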