Add column with substring of other column in SQL (Snowflake) - sql

I feel like this should be simple but I'm relatively unskilled in SQL and I can't seem to figure it out. I'm used to wrangling data in python (pandas) or Spark (usually pyspark) and this would be a one-liner in either of those. Specifically, I'm using Snowflake SQL, but I think this is probably relevant to a lot of flavors of SQL.
Essentially I just want to trim the first character off of a specific column. More generally, what I'm trying to do is replace a column with a substring of the same column. I would even settle for creating a new column that's a substring of an existing column. I can't figure out how to do any of these things.
On obvious solution would be to create a temporary table with something like
CREATE TEMPORARY TABLE tmp_sub AS
SELECT id_col, substr(id_col, 2, 10) AS id_col_sub FROM table1
and then join it back and write a new table
CREATE TABLE table2 AS
SELECT
b.id_col_sub as id_col,
a.some_col1, a.some_col2, ...
FROM table1 a
JOIN tmp_sub b
ON a.id_col = b.id_col
My tables have roughly a billion rows though and this feels extremely inefficient. Maybe I'm wrong? Maybe this is just the right way to do it? I guess I could replace the CREATE TABLE table2 AS... to INSERT OVERWRITE INTO table1 ... and at least that wouldn't store an extra copy of the whole thing.
Any thoughts and ideas are most welcome. I come at this humbly from the perspective of someone who is baffled by a language that so many people seem to have mastery over.

I'm not sure the exact syntax/functions in Snowflake but generally speaking there's a few different ways of achieving this.
I guess the general approach that would work universally is using the SUBSTRING function that's available in any database.
Assuming you have a table called Table1 with the following data:
+-------+-----------------------------------------+
Code | Desc
+-------+-----------------------------------------+
0001 | 1First Character Will be Removed
0002 | xCharacter to be Removed
+-------+-----------------------------------------+
The SQL code to remove the first character would be:
select SUBSTRING(Desc,2,len(desc)) from Table1
Please note that the "SUBSTRING" function may vary according to different databases. In Oracle for example the function is "SUBSTR". You just need to find the Snowflake correspondent.
Another approach that would work at least in SQLServer and MySQL would be using the "RIGHT" function
select RIGHT(Desc,len(Desc) - 1) from Table1
Based on your question I assume you actually want to update the actual data within the table. In that case you can use the same function above in an update statement.
update Table1 set Desc = SUBSTRING(Desc,2,len(desc))

You didn't try this?
UPDATE tableX
SET columnY = substr(columnY, 2, 10 ) ;
-Paul-

There is no need to specify the length, as is evidenced from the following simple test harness:
SELECT $1
,SUBSTR($1, 2)
,RIGHT($1, -2)
FROM VALUES
('abcde')
,('bcd')
,('cdef')
,('defghi')
,('e')
,('fg')
,('')
;
Both expressions here - SUBSTR(<col>, 2) and RIGHT(<col>, -2) - effectively remove the first character of the <col> column value.
As for the strategy of using UPDATE versus INSERT OVERWRITE, I do not believe that there will be any difference in performance or outcome, so I might opt for the UPDATE since it is simpler. So, in conclusion, I would use:
UPDATE tableX
SET columnY = SUBSTR(columnY, 2)
;

Related

Sql loop through the values on a table

first off, noob alert! :))
I need to construct a query that runs on many tables. The tables vary on name just on the last digits as per client code. The thing is, the values that change aren't sequential so looping as in i=1,2,3,... does not work. A possible solution would be to have those values on a given field on an other table.
Here is the code for the first two clients 015 and 061. The leading zero(s) must are essential.
SELECT LnMov2017015.CConta, RsMov2017015.DR, RsMov2017015.NInt, "015" AS CodCli
FROM LnMov2017015 INNER JOIN RsMov2017015 ON LnMov2017015.NReg = RsMov2017015.NReg
WHERE (((LnMov2017015.CConta)="6" And (LnMov2017015.CConta)="7") AND ((RsMov2017015.DR)=9999))
UNION SELECT LnMov2017061.CConta, RsMov2017061.DR, RsMov2017061.NInt, "061" AS CodCli
FROM LnMov2017061 INNER JOIN RsMov2017061 ON LnMov2017061.NReg = RsMov2017061.NReg
WHERE (((LnMov2017061.CConta)="6" And (LnMov2017061.CConta)="7") AND ((RsMov2017061.DR)=9999))
...
So for the first SELECT the table Name is LnMov2017015, the ending 015 being the value, the client code, that changes from table to table e.g. in the second SELECT the table name is LnMov2017061 (061) being what distinguishes the table.
For each client code there are two tables e.g. LnMov2017015 and RsMov2017015 (LnMov2017061 and RsMov2017061 for the second set client shown).
Is there a way that I can build the SQL, based upon the example SQL above?
Does anyone have an idea for a solution? :)
Apparently it is possible to build a query object to read data in another db without establishing table link. Just tested and to my surprise it works. Example:
SELECT * FROM [SoilsAgg] IN "C:\Users\Owner\June\DOT\Lab\Editing\ConstructionData.accdb";
I was already using this structure in VBA to execute DELETE and UPDATE action statements.
Solution found :)
Thank you all for your input.
Instead of linking 100 tables (password protected), I'll access them with SLQ
FROM Table2 IN '' ';database=C:\db\db2.mdb;PWD=mypwd'
And merge them all with a query, before any other thing!

SQL-92 (Filemaker): How can I UPDATE a list of sequential numbers?

I need to re-assign all SortID's, starting from 1 until MAX (SortID) from a subset of records of table Beleg, using SQL-92, after one of the SortID's has changed (for example from 444 to 444.1). I have tried several ways (for example SET #a:=0; UPDATE table SET field=#a:=#a+1 WHERE whatever='whatever' ORDER BY field2), but it didn't work, as these solutions all need a special kind of SQL, like SQLServer or Oracle, etc.
The SQL that I use is SQL-92, implemented in FileMaker (INSERT and UPDATE are available, though, but nothing fancy).
Thanks for any hint!
Gary
From what I know, SQL-92 is a standard and not a language. So you can say you are using T-SQL, which is mostly SQL-92 compliant, but you can't say I program SQL Server in SQL-92. The same applies to FileMaker.
I suppose you are trying to update your table through ODBC? The Update statement looks OK, but there are no variables if FileMaker SQL (and I am not sure using a variable inside query will give you result you expect, I think you will set SortId in every row to 1). You are thinking about doing something like Window functions with row() in TSQL, but I do not think this functionality is available.
The easiest solution is to use FileMaker, resetting the numbering for a column is really a trivial task which takes seconds. Do you need help with this?
Edit:
I was referring to TSQL functions rank() and row_number(), there is no row() function in TSQL
I finally got the answer from Ziggy Crueltyfree Zeitgeister on the Database Administrators copy of my question.
He suggested to break this down into multiple steps using a temporary table to store the results:
CREATE TABLE sorting (sid numeric(10,10), rn int);
INSERT INTO sorting (sid, rn)
SELECT SortID, RecordNumber FROM Beleg
WHERE Year ( Valuta ) = 2016
AND Ursprungskonto = 1210
ORDER BY SortID;
UPDATE Beleg SET SortID = (SELECT rn FROM sorting WHERE sid=Beleg.SortID)
WHERE Year ( Valuta ) = 2016
AND Ursprungskonto = 1210;
DROP TABLE sorting;
Of course! I just keep the table definition in Filemaker (let the type coercion be done by Filemaker this way), and filling and deleting from it with my function: RenumberSortID ().

One select for multiple records by composite key

Such a query as in the title would look like this I guess:
select * from table t where (t.k1='apple' and t.k2='pie') or (t.k1='strawberry' and t.k2='shortcake')
... --10000 more key pairs here
This looks quite verbose to me. Any better alternatives? (Currently using SQLite, might use MYSQL/Oracle.)
You can use for example this on Oracle, i assume that if you use regular concatenate() instead of Oracle's || on other DB, it would work too (as it is simply just a string comparison with the IN list). Note that such query might have suboptimal execution plan.
SELECT *
FROM
TABLE t
WHERE
t.k1||','||t.k2 IN ('apple,pie',
'strawberry,shortcake' );
But if you have your value list stored in other table, Oracle supports also the format below.
SELECT *
FROM
TABLE t
WHERE (t.k1,t.k2) IN ( SELECT x.k1, x.k2 FROM x );
Don't be afraid of verbose syntax. Concatenation tricks can easily mess up the selectivity estimates or even prevent the database from using indexes.
Here is another syntax that may or may not work in your database.
select *
from table t
where (k1, k2) in(
('apple', 'pie')
,('strawberry', 'shortcake')
,('banana', 'split')
,('raspberry', 'vodka')
,('melon', 'shot')
);
A final comment is that if you find yourself wanting to submit 1000 values as filters you should most likely look for a different approach all together :)
select * from table t
where (t.k1+':'+t.k2)
in ('strawberry:shortcake','apple:pie','banana:split','etc:etc')
This will work in most of the cases as it concatenate and finds in as one column
off-course you need to choose a proper separator which will never come in the value of k1 and k2.
for e.g. if k1 and k2 are of type int you can take any character as separator
SELECT * FROM tableName t
WHERE t.k1=( CASE WHEN t.k2=VALUE THEN someValue
WHEN t.k2=otherVALUE THEN someotherValue END)
- SQL FIDDLE

How can I delete a set of records from sql table

I have a table with the name column containing values like 'arthur_01', 'arthur_02', 'arthur_03'...'arthur_10' and many other values.
Is there a way to delete all the 'arthur_0x' in one query?
I know that the BETWEEN operator takes a range, but is there a way for it to take in a range where half the string is numeric?
You need the LIKE operator: http://www.w3schools.com/sql/sql_like.asp
DELETE FROM table WHERE column LIKE 'arthur_0%';
This solution makes somse assumptions about what data is beyond 'arthur_0.....'. Otherwise you may need RDBS specific functions.
There are different approaches to this and unfortunately most of them are vendor specific.
One approach is to extract the character of interest and compare to target value
DELETE FROM table1
WHERE SUBSTRING(column_name, 9, 1) = x;
or for a range of values
DELETE FROM table1
WHERE SUBSTRING(column_name, 9, 1) BETWEEN x AND y
Here is SQLFiddle demo (MySQL)
Here is SQLFiddle demo (SQL Server)
Note: the drawback of this approach is that the optimizer won't be able to use any indices you might have on this column.
Another approach is to leverage pattern matching, but it's not implemented the same way in different RDBMSes.
In SQL Server for example you can do
DELETE FROM table1
WHERE column_name LIKE 'arthur_0[345]%';
Which will effectively delete any rows with values in column_name that start with
arthur_03 or arthur_04 or arthur_05.
Here is SQLFiddle demo
Same thing in MySQL might look like
DELETE FROM table1
WHERE column_name RLIKE '^arthur_0[345]';
Here is SQLFiddle demo

informix check if table exists and then read the value

I have a table in informix (Version 11.50.UC4) called NextRecordID with just one column called id and it will have one row. What I want to do is copy this value into another table. But don't want my query to fail if this table does not exist. Something like
if table NextRecordID exists
then insert into sometable values ('NextRecordID', (select id from NextRecordID))
else insert into sometable values ('NextRecordID', 1)
I ended up using the below SQL query. Its not ANSI SQL but works the informix server I am using.
insert into sometable values ('NextRecordID',
select case (select 1 from systables where tabname='nextrecordid')
when 1 then (select nextid from nextrecordid)
else (select 1 from systables where tabname='systables') end
from systables where tabname='systables');
What is happening here is within insert query I get the value to be inserted by using select query. Now that select query is interesting. It uses case statement of Informix. I have written a select query to check if the table nextrecordid exists in systables and return 1 if it exists. If this query returns 1, I query the table nextrecordid for the value or else I wrote a query to return the default value 1. This work for me.
You should be able to do this by checking the systables table.
Thank you for including server version information - it makes answering your question easier.
You've not indicated which language(s) you are using.
Normally, though, you design a program to expect a certain schema (certain tables to be present), and then fail - preferably under control - if those tables are not present. Also, it is not clear whether you would get into problems because of repeated execution of the second INSERT statement. Nor is it clear when the NextRecordID table is updated - presumably, once the value has been used, it must be updated.
You should look at SERIAL (BIGSERIAL) and see whether that is appropriate for you.
You should also look at whether a SEQUENCE would be appropriate to use here - it certainly looks rather like it might be applicable.
As Adam Hughes points out, if you want to check whether the NextRecordID table is present in the database, you would look in the systables table. Be aware, though, that your search will need to be against an all lower-case name (nextrecordid).
Also, MODE ANSI databases complicate life - you have to worry about the table's owner (because there could be multiple tables called nextrecordid in a MODE ANSI database). Most likely, you don't have to worry about that - any more than you are likely to have to worry about delimited identifiers for table "someone"."NextRecordID" (which is a different table from someone.NextRecordID).