How can I delete rows that have 2 characters or less in a particular column? - sql

I'm cleansing a database table so that I can build a data warehouse for my coursework, however first I need to make sure that the data is of a quality.
There are a lot of name entries have only a single letter. I want to delete those rows with a single script.

You can get the number of chars in the column name with the function length()
delete from tablename
where length(trim(name)) < 2
The function trim() could also be useful in this case.

Delete from #your_table
Where length(#your_name_column)=1
This will remove all the rows with a 1 character length name

This has not been tested but REGEXP_LIKE can be very usefull for things like this.
delete from your_table
where regexp_like (column, '[A-Z]|[a-z]')

Related

How do I remove the first two characters of every row in this column using SQL?

How do I remove the first two characters of every row in this column?
Ticket Number
J2F4T45T
J2J3J3J2
J25TGYHJ2
J2FFJ2J2
J2MG8NGJ2
If you are using MSSQL You can use STUFF like this to see how it works:
SELECT STUFF('0123456789',1,2,'')
And then with an update
UPDATE YOUR_TABLE SET YOUR_COLUMN = STUFF('YOUR_COLUMN',1,2,'')
if you were using mysql there is an similar function INSERT()
There are a few different ways to do this, you can play with SUBSTRING or even regular expressions depending on your database.

Add column with substring of other column in SQL (Snowflake)

I feel like this should be simple but I'm relatively unskilled in SQL and I can't seem to figure it out. I'm used to wrangling data in python (pandas) or Spark (usually pyspark) and this would be a one-liner in either of those. Specifically, I'm using Snowflake SQL, but I think this is probably relevant to a lot of flavors of SQL.
Essentially I just want to trim the first character off of a specific column. More generally, what I'm trying to do is replace a column with a substring of the same column. I would even settle for creating a new column that's a substring of an existing column. I can't figure out how to do any of these things.
On obvious solution would be to create a temporary table with something like
CREATE TEMPORARY TABLE tmp_sub AS
SELECT id_col, substr(id_col, 2, 10) AS id_col_sub FROM table1
and then join it back and write a new table
CREATE TABLE table2 AS
SELECT
b.id_col_sub as id_col,
a.some_col1, a.some_col2, ...
FROM table1 a
JOIN tmp_sub b
ON a.id_col = b.id_col
My tables have roughly a billion rows though and this feels extremely inefficient. Maybe I'm wrong? Maybe this is just the right way to do it? I guess I could replace the CREATE TABLE table2 AS... to INSERT OVERWRITE INTO table1 ... and at least that wouldn't store an extra copy of the whole thing.
Any thoughts and ideas are most welcome. I come at this humbly from the perspective of someone who is baffled by a language that so many people seem to have mastery over.
I'm not sure the exact syntax/functions in Snowflake but generally speaking there's a few different ways of achieving this.
I guess the general approach that would work universally is using the SUBSTRING function that's available in any database.
Assuming you have a table called Table1 with the following data:
+-------+-----------------------------------------+
Code | Desc
+-------+-----------------------------------------+
0001 | 1First Character Will be Removed
0002 | xCharacter to be Removed
+-------+-----------------------------------------+
The SQL code to remove the first character would be:
select SUBSTRING(Desc,2,len(desc)) from Table1
Please note that the "SUBSTRING" function may vary according to different databases. In Oracle for example the function is "SUBSTR". You just need to find the Snowflake correspondent.
Another approach that would work at least in SQLServer and MySQL would be using the "RIGHT" function
select RIGHT(Desc,len(Desc) - 1) from Table1
Based on your question I assume you actually want to update the actual data within the table. In that case you can use the same function above in an update statement.
update Table1 set Desc = SUBSTRING(Desc,2,len(desc))
You didn't try this?
UPDATE tableX
SET columnY = substr(columnY, 2, 10 ) ;
-Paul-
There is no need to specify the length, as is evidenced from the following simple test harness:
SELECT $1
,SUBSTR($1, 2)
,RIGHT($1, -2)
FROM VALUES
('abcde')
,('bcd')
,('cdef')
,('defghi')
,('e')
,('fg')
,('')
;
Both expressions here - SUBSTR(<col>, 2) and RIGHT(<col>, -2) - effectively remove the first character of the <col> column value.
As for the strategy of using UPDATE versus INSERT OVERWRITE, I do not believe that there will be any difference in performance or outcome, so I might opt for the UPDATE since it is simpler. So, in conclusion, I would use:
UPDATE tableX
SET columnY = SUBSTR(columnY, 2)
;

SQL - just view the description for explanation

I would like to ask if it is possible to do this:
For example the search string is '009' -> (consider the digits as string)
is it possible to have a query that will return any occurrences of this on the database not considering the order.
for this example it will return
'009'
'090'
'900'
given these exists on the database. thanks!!!!
Use the Like operator.
For Example :-
SELECT Marks FROM Report WHERE Marks LIKE '%009%' OR '%090%' OR '%900%'
Split the string into individual characters, select all rows containing the first character and put them in a temporary table, then select all rows from the temporary table that contain the second character and put these in a temporary table, then select all rows from that temporary table that contain the third character.
Of course, there are probably many ways to optimize this, but I see no reason why it would not be possible to make a query like that work.
It can not be achieved in a straight forward way as there is no sort() function for a particular value like there is lower(), upper() functions.
But there is some workarounds like -
Suppose you are running query for COL A, maintain another column SORTED_A where from application level you keep the sorted value of COL A
Then when you execute query - sort the searchToken and run select query with matching sorted searchToken with the SORTED_A column

Oracle - How to make auto-increment column with varchar type?

In my assignment with Oracle 11g, I am asked to make a table with column has this structure:
[NL|TE|][0-9]^10
Where NL or TE is inputed when INSERT row and [0-9]^10 is an auto-increment 10 digits number.
Example:
NL1234567890 or TE0253627576
When INSERT, the user should only write this:
INSERT INTO TableA VALUES ('NL');
And the DBMS take care of the rest. So how can I do so? Im still a newbie in this thing.
CREATE SEQUENCE your_seq;
/
CREATE OR REPLACE TRIGGER your_tablename_BI
BEFORE INSERT
ON your_tablename
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
BEGIN
:NEW.your_col := :NEW.your_col || trim(to_char(your_seq.nextval, '0000000000'));
END your_tablename_BI;
/
Sample code?
'NL' || to_char(yoursequence.nextval)
I would keep them as separate columns. One is a VARCHAR2 that takes NL or whatever, the other is a NUMBER which is populated by the sequence.
You can then concatenate them at query time (put it in a view if you want) or use a virtual column.
Why? I can almost guarantee you that at some point you'll have a requirement to query the table on the character portion, or the numeric portion, or to sort on one or the other. Since you kept them separate, this is easy. If you had squashed them into a single column, you would have had to parse the values out at query time which leads to more complicated code than you need.

How to delete a common word from large number of datas in a Postgres table

I have a table in Postgres. In that table more than 1000 names are there. Most of the names are start with SHRI or SMT. I want to delete this SHRT and SMT from the names and to save original name only. How can I do that with out any database function?
I'll step you through the logic:
Select left(name,3) from table
This select statement will bring back the first 3 chars of a column (the 'left' three). If we are looking for SMT in the first three chars, we can move it to the where statement
select * from table where left(name,3) = 'SMT'
Now from here you have a few choices that can be used. I'm going to keep to the left/right style, though replace could likely be used. We want the chars to the right of the SMT, but we don't know how long each string is to pick out those chars. So we use length() to determine that.
select right(name,length(name)-3) from table where left(name,3) = 'SMT'
I hope my syntax is right there, I'm lacking a postgres environment to test it. The logic is 'all the chars on the right of the string except the last 3 (the minus 3 excludes the 3 chars on the left. change this to 4 if you want all but the last 4 on the left)
You can then change this to an update statement (set name = right(name,length(name)-3) ) to update the table, or you can just use the select statement when you need the name without the SMT, but leave the SMT in the actual data.