Postgres Data type conversion - sql

I have this dataset in SQL format (a series of INSERT statements). However, the date values need to be converted to a different format, because I get the following error:
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
ERROR: date/time field value out of range: "28-10-96"
LINE 58: ...040','2','10','','P13-00206','','','','','1-3-95','28-10-96'...
^
HINT: Perhaps you need a different "datestyle" setting.
I have read the documentation on date/time types:
http://www.postgresql.org/docs/current/static/datatype-datetime.html
But my question is: how do I convert all of the dates into a proper format without going through all 500 or so rows and checking each one by hand before inserting them into the database? The backend is handled by Rails, but I figured that cleaning the data up in SQL would be best here.
I have a CREATE TABLE statement above this dataset. Note that the dataset was given to me via a DBF converter/external source.
Here's part of my dataset:
INSERT INTO winery_attributes
(ID,NAME,STATUS,BLDSZ_ORIG,BLDSZ_CURR,HAS_CAVE,CAVESIZE,PROD_ORIG,PROD_CURR,TOUR_TASTG,VISIT_DAY,VISIT_WEEK,VISIT_YR,VISIT_MKTG,VISIT_NMEV,VISIT_ALL,EMPLYEENUM,PARKINGNUM,WDO,LAST_UP,IN_CITYBDY,IN_AIASP,NOTES,SMLWNRYEXM,APPRV_DATE,ESTAB_DATE,TOTAL_SIZE,SUBJ_TO_75,GPY_AT_75,AVA,SUP_DIST)
VALUES
(1,'ACACIA WINERY','PROD','8000','34436','','0','50000','250000','APPT','75','525','27375','3612','63','30987','22','97','x','001_02169-MOD_AcaciaWinery','','','','','1-11-79','1-9-82','34436','x','125000','Los Carneros','1');
INSERT INTO winery_attributes
(ID,NAME,STATUS,BLDSZ_ORIG,BLDSZ_CURR,HAS_CAVE,CAVESIZE,PROD_ORIG,PROD_CURR,TOUR_TASTG,VISIT_DAY,VISIT_WEEK,VISIT_YR,VISIT_MKTG,VISIT_NMEV,VISIT_ALL,EMPLYEENUM,PARKINGNUM,WDO,LAST_UP,IN_CITYBDY,IN_AIASP,NOTES,SMLWNRYEXM,APPRV_DATE,ESTAB_DATE,TOTAL_SIZE,SUBJ_TO_75,GPY_AT_75,AVA,SUP_DIST)
VALUES
('2','AETNA SPRING CELLARS','PROD','2500','2500','','0','2000','20000','TST APPT','0','3','156','0','0','156','1','10','x','','','','','x','1-4-86','1-6-86','2500','','0','Napa Valley','3');
INSERT INTO winery_attributes
(ID,NAME,STATUS,BLDSZ_ORIG,BLDSZ_CURR,HAS_CAVE,CAVESIZE,PROD_ORIG,PROD_CURR,TOUR_TASTG,VISIT_DAY,VISIT_WEEK,VISIT_YR,VISIT_MKTG,VISIT_NMEV,VISIT_ALL,EMPLYEENUM,PARKINGNUM,WDO,LAST_UP,IN_CITYBDY,IN_AIASP,NOTES,SMLWNRYEXM,APPRV_DATE,ESTAB_DATE,TOTAL_SIZE,SUBJ_TO_75,GPY_AT_75,AVA,SUP_DIST)
VALUES
('3','ALTA VINEYARD CELLAR','PROD','480','480','','0','5000','5000','NO','0','4','208','0','0','208','4','6','x','003_U-387879','','','','','2-5-79','1-9-80','480','','0','Diamond Mountain District','3');
INSERT INTO winery_attributes
(ID,NAME,STATUS,BLDSZ_ORIG,BLDSZ_CURR,HAS_CAVE,CAVESIZE,PROD_ORIG,PROD_CURR,TOUR_TASTG,VISIT_DAY,VISIT_WEEK,VISIT_YR,VISIT_MKTG,VISIT_NMEV,VISIT_ALL,EMPLYEENUM,PARKINGNUM,WDO,LAST_UP,IN_CITYBDY,IN_AIASP,NOTES,SMLWNRYEXM,APPRV_DATE,ESTAB_DATE,TOTAL_SIZE,SUBJ_TO_75,GPY_AT_75,AVA,SUP_DIST)
VALUES
('4','BLACK STALLION','PROD','43600','43600','','0','100000','100000','PUB','50','350','18200','0','0','18200','2','45','x','P13-00391','','','','','1-5-80','1-9-85','43600','','0','Oak Knoll District of Napa Valley','3');
INSERT INTO winery_attributes
(ID,NAME,STATUS,BLDSZ_ORIG,BLDSZ_CURR,HAS_CAVE,CAVESIZE,PROD_ORIG,PROD_CURR,TOUR_TASTG,VISIT_DAY,VISIT_WEEK,VISIT_YR,VISIT_MKTG,VISIT_NMEV,VISIT_ALL,EMPLYEENUM,PARKINGNUM,WDO,LAST_UP,IN_CITYBDY,IN_AIASP,NOTES,SMLWNRYEXM,APPRV_DATE,ESTAB_DATE,TOTAL_SIZE,SUBJ_TO_75,GPY_AT_75,AVA,SUP_DIST)
VALUES
('5','ALTAMURA WINERY','PROD','11800','11800','x','3115','50000','50000','APPT','0','20','1040','0','0','1040','2','10','','P13-00206','','','','','1-3-95','28-10-96','14915','x','50000','Napa Valley','4');

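The dates in your data set are strings. With the usual default DateStyle (ISO, MDY), an ambiguous string such as '1-11-79' is read month-first (as 11 January 1979, which is silently wrong), and '28-10-96' fails because 28 is not a valid month. Instead of relying on DateStyle, convert the strings to dates explicitly: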
to_date('1-5-80', 'DD-MM-YY')
If you store the data in a timestamp instead, use
to_timestamp('1-5-80', 'DD-MM-YY')
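To see why the import fails, here is a small Python sketch that parses two of the dates from the dataset both ways. Python's datetime.strptime stands in for the two orderings. It is not Postgres's date parser, and the helper parse is hypothetical. Read month-first (like DateStyle MDY), '1-11-79' is accepted but gives the wrong date, and '28-10-96' is rejected. Read day-first (like to_date(..., 'DD-MM-YY')), both come out as intended.

```python
from datetime import datetime

# Day-first strings taken from the APPRV_DATE / ESTAB_DATE columns above.
samples = ["1-11-79", "28-10-96"]

def parse(value: str, fmt: str):
    """Hypothetical helper: return an ISO date string, or None if value does not fit fmt."""
    try:
        return datetime.strptime(value, fmt).date().isoformat()
    except ValueError:
        return None

for s in samples:
    month_first = parse(s, "%m-%d-%y")  # analogous to DateStyle MDY
    day_first = parse(s, "%d-%m-%y")    # analogous to to_date(s, 'DD-MM-YY')
    print(f"{s}: MDY -> {month_first}, DMY -> {day_first}")
```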
If you are given the data set in the form of the INSERT statements that you show, then first load all the data as simple strings into varchar columns, then add date columns and do an UPDATE (and similarly for integer and boolean columns):
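ALTER TABLE winery_attributes
ADD COLUMN apprv date,   -- new typed columns
ADD COLUMN estab date;
UPDATE winery_attributes
SET apprv = to_date(NULLIF(APPRV_DATE, ''), 'DD-MM-YY'),  -- NULLIF turns '' into NULL
    estab = to_date(NULLIF(ESTAB_DATE, ''), 'DD-MM-YY');
-- ... and so on for any other text columns that hold dates
When the update is done, you can use ALTER TABLE to drop the original text columns that held the dates (and likewise for the integer and boolean columns).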
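The staging workflow (load everything as text, add typed columns, fill them with an UPDATE) can be tried out locally with this Python sketch. It makes a few assumptions. SQLite, through the standard sqlite3 module, stands in for Postgres. The registered function dd_mm_yy_to_iso is a hypothetical replacement for to_date(..., 'DD-MM-YY'). Dates are stored as ISO text because SQLite has no separate date type. The final DROP COLUMN step is left out because SQLite only supports it from version 3.35.

```python
import sqlite3
from datetime import datetime

def dd_mm_yy_to_iso(value):
    """Hypothetical stand-in for to_date(value, 'DD-MM-YY'); '' and NULL become NULL."""
    if not value:
        return None
    # Python's %y maps 69-99 to 19xx, while Postgres's YY maps 70-99 to 19xx.
    return datetime.strptime(value, "%d-%m-%y").date().isoformat()

conn = sqlite3.connect(":memory:")
conn.create_function("dd_mm_yy_to_iso", 1, dd_mm_yy_to_iso)

# Step 1: a staging table in which every incoming value is plain text.
conn.execute("CREATE TABLE winery_staging (id TEXT, apprv_date TEXT, estab_date TEXT)")
conn.executemany(
    "INSERT INTO winery_staging VALUES (?, ?, ?)",
    [("1", "1-11-79", "1-9-82"), ("2", "1-4-86", "1-6-86"), ("5", "1-3-95", "28-10-96")],
)

# Step 2: add the typed columns and fill them with a single UPDATE.
conn.execute("ALTER TABLE winery_staging ADD COLUMN apprv TEXT")
conn.execute("ALTER TABLE winery_staging ADD COLUMN estab TEXT")
conn.execute(
    "UPDATE winery_staging "
    "SET apprv = dd_mm_yy_to_iso(apprv_date), estab = dd_mm_yy_to_iso(estab_date)"
)

rows = conn.execute("SELECT id, apprv, estab FROM winery_staging ORDER BY id").fetchall()
for row in rows:
    print(row)
```

Converting in one set-based UPDATE means a malformed value stops the statement and can be found and fixed, rather than being checked row by row beforehand.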

Related

How to validate the date in a varchar column in SQL Server

I have a staging table in which every column is varchar, so all of the CSV records get inserted into it regardless of their content.
After loading the staging table, I need to validate a specific date column to check whether each value is a proper date. Any row where that column holds a non-date string needs to be removed from the staging table.
Building on @Larnu's comment, you can use TRY_CONVERT to select only the records that contain proper dates, and then use those records for any further action. Consider the following example:
-- Using a table variable as an example of the source data
DECLARE @SampleTable TABLE
(
Id int,
SomePossibleDateField varchar(20)
)
-- Now insert some sample data into the table variable, just for illustration
INSERT INTO @SampleTable
VALUES (1, '2021-05-04'),
(2, '2021-05-05'),
(3, 'not a date'),
(4, NULL),
(5, ''),
(6, '2021-05-06')
-- Now select all the records that contain proper dates:
SELECT * FROM @SampleTable WHERE TRY_CONVERT(DATE, [SomePossibleDateField], 120) > '1900-01-01'
The results of the final SELECT statement above are:
Id SomePossibleDateField
1 2021-05-04
2 2021-05-05
6 2021-05-06
Some things to note:
First, for simplicity, all the dates in this sample are in style 120 (ODBC canonical), so you may need to try different styles depending on your data. See the list of date and time styles in the CAST and CONVERT documentation.
Second, the SELECT statement keeps only dates later than 1 January 1900, but you can change that cutoff to any date that makes sense for your data.
Finally, if you are looking specifically for the records that contain bad data, change the SELECT statement to something like:
SELECT * FROM @SampleTable
WHERE TRY_CONVERT(DATE, [SomePossibleDateField], 120) = ''
OR TRY_CONVERT(DATE, [SomePossibleDateField], 120) IS NULL
This returns:
Id SomePossibleDateField
3 not a date
4 NULL
5
Note that an empty string does not produce NULL the way bad data does: SQL Server converts '' to the default date 1900-01-01. That is why the first query filters on > '1900-01-01', and why the query for bad records checks for = '' (the '' is likewise converted to 1900-01-01 for the comparison) in addition to IS NULL.
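The same validate-then-eliminate idea can be sketched in Python with SQLite. This is only a rough analogue. The registered function try_convert_date is a hypothetical stand-in for TRY_CONVERT(DATE, value, 120) that accepts only yyyy-mm-dd text, and unlike SQL Server it treats an empty string as invalid, so a single IS NULL test catches every bad row. The last step deletes those rows, as the question asks.

```python
import sqlite3
from datetime import datetime

def try_convert_date(value):
    """Hypothetical stand-in for TRY_CONVERT(DATE, value, 120).

    Returns an ISO date string, or None for NULL, '' or non-date text.
    """
    if value is None:
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%d").date().isoformat()
    except ValueError:
        return None

conn = sqlite3.connect(":memory:")
conn.create_function("try_convert_date", 1, try_convert_date)
conn.execute("CREATE TABLE staging (id INTEGER, some_possible_date TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [(1, "2021-05-04"), (2, "2021-05-05"), (3, "not a date"),
     (4, None), (5, ""), (6, "2021-05-06")],
)

good = [r[0] for r in conn.execute(
    "SELECT id FROM staging WHERE try_convert_date(some_possible_date) IS NOT NULL ORDER BY id")]
bad = [r[0] for r in conn.execute(
    "SELECT id FROM staging WHERE try_convert_date(some_possible_date) IS NULL ORDER BY id")]

# Eliminate the rows whose date column is not a proper date.
conn.execute("DELETE FROM staging WHERE try_convert_date(some_possible_date) IS NULL")
remaining = [r[0] for r in conn.execute("SELECT id FROM staging ORDER BY id")]
print(good, bad, remaining)
```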

Copying data from one column to another in the same table sets data to null in original column

I created a new column [LastLoginDate-NoTime] with the data type Date. I already have another column [LastLoginDate] that is of Datetime datatype.
I am trying to copy values from the LastLoginDate column to the LastLoginDate-NoTime column using this query:
UPDATE [dbo].[SapUsersExt]
SET [LastLoginDate] = [LastLoginDate-NoTime]
But the problem I am having is that when I execute this query, it sets the data to null in the original column.
I am also trying to convert the data from the LastLoginDate to just date format in the new column LastLoginDate-NoTime so that I can use it in my application. How would I do that?
I am trying to copy values from the LastLoginDate column to the LastLoginDate-NoTime column using this query
In that case, you're doing it exactly backwards - you should use this SQL instead:
UPDATE [dbo].[SapUsersExt]
SET [LastLoginDate-NoTime] = [LastLoginDate]
The first column - right after the SET - is the target column into which your values will be written.
The second column, after the = symbol, is where the data comes from (column or expression).
You had it backwards: you were overwriting the column that holds the actual values with the new column, which is still all NULL.
This of course only works as a one-time update; it will not keep your columns in sync as new data is inserted. For that, you'd need a computed column
ALTER TABLE dbo.SapUsersExt
ADD LastLoginDateOnly AS CAST(LastLoginDate AS DATE) PERSISTED;
or a trigger.
Or maybe you don't need to store the date-only value at all; just use
SELECT
CAST(LastLoginDate AS DATE),
.......
whenever you need the date-only value from LastLoginDate.
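Here is a minimal Python/SQLite sketch of the corrected one-time copy. It shows that the target column goes on the left of = and the source expression on the right. SQLite stands in for SQL Server, and its date() function plays the role of CAST(LastLoginDate AS DATE). The miniature SapUsersExt table and its timestamps are made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of SapUsersExt; SQLite stores datetimes as ISO text.
conn.execute('CREATE TABLE SapUsersExt (LastLoginDate TEXT, "LastLoginDate-NoTime" TEXT)')
conn.executemany(
    "INSERT INTO SapUsersExt (LastLoginDate) VALUES (?)",
    [("2021-03-14 09:26:53",), ("2021-03-15 17:02:11",)],
)

# Target column on the left of '=', source expression on the right.
conn.execute('UPDATE SapUsersExt SET "LastLoginDate-NoTime" = date(LastLoginDate)')

rows = conn.execute(
    'SELECT LastLoginDate, "LastLoginDate-NoTime" FROM SapUsersExt ORDER BY LastLoginDate'
).fetchall()
print(rows)
```

The original LastLoginDate values are left untouched. Swapping the two sides of = would instead overwrite them with the still-empty column.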

Snowflake - insert date

I have a value of 12/31/18 and created a table in Snowflake:
create table my_date (a date);
insert into my_date values ('12/31/18');
select * from my_date;
Result: 0018-12-31
I want to get: 2018-12-31
I read about the two-digit century start parameter:
https://docs.snowflake.net/manuals/sql-reference/parameters.html#label-two-digit-century-start
but I'm not sure whether it is part of the column type specification, or whether the data needs to be transformed before the insert.
The parameter two_digit_century_start seems not to be used when parameter date_input_format is set to AUTO. You can get your example working correctly by setting the date format with a parameter ("alter session..." statement on line 2 below). Your complete working example would look like this:
create table my_date (a date);
alter session set DATE_INPUT_FORMAT = 'MM/DD/YY';
insert into my_date values ('12/31/18');
select * from my_date;
This results in 2018-12-31.
Snowflake best practices recommend specifying the format explicitly, either with to_date(value, 'format') or by setting the format parameters. You can find the best practices for date/time functions in the Snowflake documentation: https://docs.snowflake.net/manuals/user-guide/date-time-input-output.html#date-time-function-format-best-practices
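To make the two-digit-year issue concrete, here is a Python sketch of how a century-start setting such as TWO_DIGIT_CENTURY_START is documented to work: a two-digit year is mapped into the 100-year window that begins at the century-start year (1970 by default). Both helpers are hypothetical. They model the parameter's semantics, not Snowflake's actual code. parse_mm_dd_yy corresponds to parsing with an explicit 'MM/DD/YY' format.

```python
from datetime import date

def expand_two_digit_year(yy: int, century_start: int = 1970) -> int:
    """Map a two-digit year into the window [century_start, century_start + 99]."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    year = century_start - century_start % 100 + yy
    if year < century_start:
        year += 100
    return year

def parse_mm_dd_yy(text: str, century_start: int = 1970) -> date:
    """Hypothetical helper: parse 'MM/DD/YY' with an explicit format."""
    month, day, yy = (int(part) for part in text.split("/"))
    return date(expand_two_digit_year(yy, century_start), month, day)

print(parse_mm_dd_yy("12/31/18"))        # window 1970-2069
print(parse_mm_dd_yy("12/31/18", 1900))  # window 1900-1999
```

With the default window, '18' becomes 2018, which is the result the question is after. The 0018-12-31 in the question comes from treating '18' as a literal year instead of expanding it.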

SQL Server Add Default datetime column for existing table

I want to have a new column in the table that will show the date and time of the inserts, but without modifying the queries to include the column itself.
I have added the new column in the following way:
ALTER TABLE DBO.HOURLYMODULETIMES
ADD CreateTime datetime NOT NULL DEFAULT GETDATE()
This adds the values to previous entries, but when I try to INSERT INTO the table without including the new column
INSERT INTO DBO.HOURLYMODULETIMES VALUES
(99999999,11111,2222,'JA')
The table has 5 columns: ID, AVGMODULETIME, SUMHOURS, USERNAME and CreateTime (newly added). I get the following error:
Column name or number of supplied values does not match table definition.
Is it possible to create such a column without modifying the queries?
Because you now want to omit one of the columns, you have to list the columns explicitly in the INSERT:
INSERT INTO DBO.HOURLYMODULETIMES (ID, AVGMODULETIME, SUMHOURS, USERNAME)
VALUES (99999999,11111,2222,'TEST')
It's good programming practice to always do this, since table definitions may change over time - as you have noticed!
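The behaviour can be reproduced in a small Python/SQLite sketch. SQLite stands in for SQL Server here, and CURRENT_TIMESTAMP plays the role of GETDATE() as the column default. An INSERT without a column list must supply a value for every column and is rejected. Listing the columns lets the omitted CreateTime fall back to its default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HOURLYMODULETIMES ("
    " ID INTEGER, AVGMODULETIME INTEGER, SUMHOURS INTEGER, USERNAME TEXT,"
    " CreateTime TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)

# Without a column list, a value is expected for every column, so 4 values fail.
try:
    conn.execute("INSERT INTO HOURLYMODULETIMES VALUES (99999999, 11111, 2222, 'JA')")
    positional_insert_failed = False
except sqlite3.Error as exc:
    positional_insert_failed = True
    print("positional insert rejected:", exc)

# Naming the columns lets CreateTime take its DEFAULT.
conn.execute(
    "INSERT INTO HOURLYMODULETIMES (ID, AVGMODULETIME, SUMHOURS, USERNAME) "
    "VALUES (99999999, 11111, 2222, 'TEST')"
)
row = conn.execute("SELECT USERNAME, CreateTime FROM HOURLYMODULETIMES").fetchone()
print(row)
```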

Incorrect value inserted into SQL server table when inserted value is greater than column length

Situation:
I have an existing legacy table [dbo].[Values] on SQL Server 2017.
In this table I have 3 columns.
int tableid (Primary Key)
char (8) code
char (7) description
Code and description are both custom data types, but they are just char(8) and char(7) with no additional logic.
Action
If I insert a value longer than 8 characters into column code, * gets stored in that column.
No error or warning is given.
I have looked in triggers, constraints, policies, the table's creation script and the custom data types. I cannot find any logic anywhere that says "if truncated, set value = *".
Question
What part of SQL Server modifies values before they are saved into the table?
You would see this if you are also using the wrong data type in the INSERT: passing a number rather than a string, so that SQL Server performs an implicit conversion from int to char.
CREATE TABLE #T(C CHAR(8))
INSERT INTO #T VALUES (1111111111);
SELECT *
FROM #T
This is documented behaviour (see "Truncating and rounding results" in the CAST and CONVERT documentation).
Solution: use a string literal instead.
INSERT INTO #T VALUES ('1111111111');
/*String or binary data would be truncated.*/
This is legacy behaviour that is unlikely to change
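The difference between the two INSERTs can be summarised with a small Python model. to_char_column is a hypothetical function, not SQL Server code. It encodes the documented int-to-char rule (a result too long for the target becomes *), the truncation error raised when a string literal is too long, and CHAR's right-padding with spaces.

```python
def to_char_column(value, length: int) -> str:
    """Hypothetical model of storing a value in a SQL Server CHAR(length) column.

    - An int whose digits do not fit becomes '*' (documented int -> char result).
    - A string that is too long is rejected, like the truncation error on INSERT.
    - Values that fit are right-padded with spaces, as CHAR does.
    """
    if isinstance(value, int):
        text = str(value)
        if len(text) > length:
            return "*".ljust(length)
        return text.ljust(length)
    if len(value) > length:
        raise ValueError("String or binary data would be truncated.")
    return value.ljust(length)

print(repr(to_char_column(1111111111, 8)))  # implicit int conversion: '*'
print(repr(to_char_column(11111111, 8)))    # fits exactly
try:
    to_char_column("1111111111", 8)
except ValueError as exc:
    print(exc)
```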