Error writing dataframe to SQL database table using dbWriteTable in R

I'm attempting to create a new table in a SQL database from a series of dataframes that I've made available to the global environment (dataframes created via a function in R).
I'm able to connect to the server:
#libraries
library(odbc)
library(dplyr)
library(DBI)
#set up a connection to the database
staging_database <- dbConnect(odbc::odbc(), "staging_db")
#Write dataframe to table in database (table does not exist yet in SQL!)
dbWriteTable(staging_database, "test_database", demographics_dataframe, row.names = FALSE)
However, I'm getting the following error:
Error: <SQL> 'CREATE TABLE "test_database" (
"field1" varchar(255),
"field2" BIT,
"field3" BIT,
"field4" varchar(255),
"field5" varchar(255),
"field6" varchar(255),
"field7" varchar(255),
"field8" INT,
"field9" INT,
Very unhelpful error here - is there something I'm missing? I've followed the documentation for dbWriteTable. Something I'm noticing, which I believe may be part of the problem, is that I can't view any existing tables within "staging_db".
dbListTables(staging_database) reveals a bunch of metadata, but none of the actual tables (I can verify they exist by logging into Microsoft SQL Server).
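One way to sanity-check what the connection can actually see is to query INFORMATION_SCHEMA directly; a minimal sketch, assuming the connection from the question and a SQL Server backend:
# List the base tables visible to this connection (SQL Server);
# if these don't match what Management Studio shows, the DSN may
# point at a different database than you think.
dbGetQuery(staging_database,
           "SELECT TABLE_SCHEMA, TABLE_NAME
              FROM INFORMATION_SCHEMA.TABLES
             WHERE TABLE_TYPE = 'BASE TABLE'")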

I recently encountered the same error. In my case, there was no problem with special characters or unexpected symbols anywhere in the table, since I was able to write the table “piece by piece”, each time writing only a subset of columns.
I am guessing it has something to do with the number of columns to be written. In a single go, I was able to write up to 173 columns.
Rather than assume this is the hard limit, I would say there is an internal limit on the length of the “CREATE TABLE” string hidden somewhere along the way.
The workaround that worked for me was to split the table into two sets of columns, with the ID column in both, and join them later.
Edit
So the truth, in my case, was that two columns had the same name. The error indeed comes from the "CREATE TABLE" statement - one cannot create a table with duplicated column names.
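A quick way to check for this before writing - a minimal sketch using the dataframe from the question:
# Duplicated column names make the generated CREATE TABLE invalid,
# so catch them in R before calling dbWriteTable.
dupes <- names(demographics_dataframe)[duplicated(names(demographics_dataframe))]
if (length(dupes) > 0) {
  stop("Duplicated column names: ", paste(unique(dupes), collapse = ", "))
}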
Even a year after asking the question, I hope this can help someone dealing with the same problem ;)

Azure Data Studio does not respect specified casing with PostgreSQL [duplicate]

I'm writing a Java application to automatically build and run SQL queries. For many tables my code works fine, but on a certain table it gets stuck, throwing the following exception:
Exception in thread "main" org.postgresql.util.PSQLException: ERROR: column "continent" does not exist
Hint: Perhaps you meant to reference the column "countries.Continent".
Position: 8
The query that has been run is the following:
SELECT Continent
FROM network.countries
WHERE Continent IS NOT NULL
AND Continent <> ''
LIMIT 5
This essentially returns 5 non-empty values from the column.
I don't understand why I'm getting the "column does not exist" error when it clearly does in pgAdmin 4. I can see that there is a schema with the name Network which contains the table countries and that table has a column called Continent just as expected.
Since all column, schema and table names are retrieved by the application itself, I don't think there has been a spelling or semantic error, so why does PostgreSQL cause problems regardless? Neither running the query in pgAdmin 4 nor using the suggested countries.Continent works.
My PostgreSQL version is the newest as of now:
$ psql --version
psql (PostgreSQL) 9.6.1
How can I successfully run the query?
Try putting it in double quotes - like "Continent" - in the query:
SELECT "Continent"
FROM network.countries
...
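The underlying rule: PostgreSQL folds unquoted identifiers to lowercase, so Continent is looked up as continent, while double-quoted identifiers keep the exact case they were created with. A sketch of the difference, shown here through R's DBI against a hypothetical connection con (the same applies from Java/JDBC):
# Unquoted: PostgreSQL folds this to "continent", which does not exist.
# dbGetQuery(con, "SELECT Continent FROM network.countries LIMIT 5")

# Quoted: matches the mixed-case name the table was created with.
dbGetQuery(con, 'SELECT "Continent" FROM network.countries LIMIT 5')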
Working in a SQLAlchemy environment, I got this error with SQL like this:
db.session.execute(
    text('SELECT name,type,ST_Area(geom) FROM buildings WHERE type == "plaza" '))
ERROR: column "plaza" does not exist
Well, I changed == to =, but the error persisted; then I interchanged the quotes, as follows, and it worked. Weird! (Or not so weird: in PostgreSQL, double quotes denote identifiers and single quotes denote string literals, so "plaza" was being parsed as a column name.)
....
    text("SELECT name,type,ST_Area(geom) FROM buildings WHERE type = 'plaza' "))
This problem also occurs in Postgres when the table name is not actually tablename but "tablename", i.e. it was created with quotes.
For example, if user shows as the table name, the table name may actually be "user".
Such an error can appear when you add a space in the name of a column by mistake (for example "users ").
QUICK FIX (TRICK)
If you recently added a field, deleted it, and are now trying to add the same field back, let me share a simple trick - I did this and the problem was gone!
Delete the app's migrations folder entirely. Then, instead of re-adding the field under its old name, add it under a name you have never used in this app before (for example, if you were trying to add a title field, create it as heading). Run the migration process for the app and runserver, go to the admin page, find that model and delete all of its objects, then go back to models, rename the field you just made to the name you originally wanted, and run the migrations again. Your problem should be gone!
This occurs when objects already exist in the database but you added a field that wasn't there when the earlier objects were made; by deleting those objects we can make fresh ones again.
I got the same error when doing a PIVOT in Redshift.
My code is similar to
SELECT *
INTO output_table
FROM (
SELECT name, year_month, sales
FROM input_table
)
PIVOT
(
SUM(sales)
FOR year_month IN ('nov_2020', 'dec_2020', 'jan_2021', 'feb_2021', 'mar_2021', 'apr_2021', 'may_2021', 'jun_2021', 'jul_2021', 'aug_2021',
'sep_2021', 'oct_2021', 'nov_2021', 'dec_2021', 'jan_2022', 'feb_2022', 'mar_2022', 'apr_2022', 'may_2022', 'jun_2022',
'jul_2022', 'aug_2022', 'sep_2022', 'oct_2022', 'nov_2022')
)
I tried year_month without any quotes (got the error), year_month with double quotes (got the error), and finally the year_month values in single quotes (that worked). In Redshift's PIVOT, the values in the IN list are literals, so they take single quotes; double-quoting them turns them into identifiers. This may help someone in the same situation as my example.

Dropping XML columns in DB2

I'm trying to drop an XML column in DB2 in the following manner, but every time I do this it results in an error:
create table One(name int, address XML);
alter table One add column age xml;
alter table One drop column age;
Error starting at line : 5 in command -
alter table One drop column age
Error report -
DB2 SQL Error: SQLCODE=-1242, SQLSTATE=42997, SQLERRMC=7, DRIVER=4.11.77
The official DB2 documentation suggests that this issue is fixed in DB2 9.7. I'm currently checking the 10.5 and 11.5 versions but am still facing the same issue.
https://www.ibm.com/support/knowledgecenter/en/SSEPEK_11.0.0/xml/src/tpc/db2z_alterxml.html
The DB2 documentation suggests resolving the check-pending status on the table after a reorg, but I could not find the commands to do so.
Is there a way to resolve this drop-column issue for the XML datatype? Or does DB2 simply not allow dropping XML columns from a table by default?
Can someone advise on this issue?
https://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.wn.doc/doc/c0055038.html
https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.messages.sql.doc/com.ibm.db2.luw.messages.sql.doc-gentopic5.html#sql1242n
IBM suggests that all the XML columns in a table need to be dropped in a single ALTER statement. Is this still a restriction in 10.5 and higher versions of DB2?
This remains a restriction in Db2-Linux/Unix/Windows up to and including Db2 v11.5.
IBM states (in the help for SQL1242N, reason code 7):
"For a table containing multiple columns of data type XML, either do not drop any XML columns or drop all of the XML columns in the table using a single ALTER TABLE statement."
This restriction only applies to tables with more than one XML column.
You can work around it in various ways, for example by creating a new table and copying the existing data into it, or by arranging your physical data model differently.
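To make the quoted restriction concrete with the table from the question (which ends up with two XML columns, address and age): the drop only succeeds if both are removed in one statement. A sketch, shown through R's DBI for consistency with the rest of this page; con is a hypothetical connection to the DB2 database, and the REORG step is an assumption based on DB2's usual post-ALTER housekeeping:
# Drop ALL XML columns in a single ALTER TABLE statement,
# as SQL1242N reason code 7 requires.
dbExecute(con, "ALTER TABLE One DROP COLUMN age DROP COLUMN address")
# The table is typically left in reorg-pending state afterwards:
dbExecute(con, "CALL SYSPROC.ADMIN_CMD('REORG TABLE One')")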

Create non-quoted identifier column names when creating sql developer table in python

After connecting to the Oracle DB, I created a SQL Developer table via Python. I then load my CSV into a dataframe and convert the datatypes to VARCHAR for a faster load into the table (because the default str type takes an unreasonable amount of time). The data loads fast and it is all present, but when interrogating the data in SQL Developer I am forced to put quotes around the column names for them to be recognised, and when trying to perform simple operations on the data, such as SELECT * FROM new_table ORDER BY 'CATEGORY' asc, SQL cannot seem to sort my data at all. Does anyone have any suggestions? Below is a snippet of the Python code I have used.
os.chdir('C:\\Oracle\\instantclient_19_5')
dataload = Table('new_table', meta,
    Column('Category', VARCHAR(80)),
    Column('Code', VARCHAR(80)),
    Column('Medium', VARCHAR(80)),
    Column('Source', VARCHAR(80)),
    Column('Start_date', VARCHAR(80)),
    Column('End_date', VARCHAR(80))
)
meta.create_all(engine)
df3= pd.read_csv(fname)
dtyp = {c:types.VARCHAR(df3[c].str.len().max())
for c in df3.columns[df3.dtypes == 'object'].tolist()}
df3.to_sql('new_table', engine, index=False, if_exists='replace', dtype=dtyp)
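One hedged observation on the ORDER BY behaviour: in Oracle, single quotes create string literals and double quotes reference case-sensitive identifiers, so ORDER BY 'CATEGORY' sorts every row by the same constant string (i.e., it appears not to sort at all). Sketched through R's DBI for consistency with the rest of this page; con is a hypothetical connection to the same Oracle schema:
# 'CATEGORY' is a constant string literal - every row compares equal,
# so the output order is effectively arbitrary:
# dbGetQuery(con, "SELECT * FROM new_table ORDER BY 'CATEGORY' ASC")

# A double-quoted name references the column, but must match the
# exact case the column was created with:
dbGetQuery(con, 'SELECT * FROM new_table ORDER BY "CATEGORY" ASC')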

RODBC sqlSave table creation problems

I'm having trouble creating a table using RODBC's sqlSave (or, more accurately, writing data to the created table).
This is different from the existing sqlSave questions/answers, as:
the problems they were experiencing were different - I can create tables, whereas they could not;
I've already unsuccessfully incorporated their solutions, such as closing and reopening the connection before running sqlSave; and
the error message is different, with the only exception being a post that differed in the above two ways.
I'm using MS SQL Server 2008 and 64-bit R on a Windows RDP.
I have a simple data frame with only 1 column full of 3, 4, or 5-digit integers.
> head(df)
colname
1 564
2 4336
3 24810
4 26206
5 26433
6 26553
When I try to use sqlSave, no data is written to the table. Additionally, an error message makes it sound like the table can't be created though the table does in fact get created with 0 rows.
Based on a suggestion I found, I've tried closing and re-opening the RODBC connection right before running sqlSave. I've also tried dropping the table before doing this, even though I use append = TRUE, but it doesn't affect anything.
> sqlSave(db3, df, table = "[Jason].[dbo].[df]", append = TRUE, rownames = FALSE)
Error in sqlSave(db3, df, table = "[Jason].[dbo].[df]", :
42S01 2714 [Microsoft][ODBC SQL Server Driver][SQL Server]There is already
an object named 'df' in the database.
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE [Jason].[dbo].[df]
("df" int)'
I've also tried using sqlUpdate() on the table once it's been created. It doesn't matter if I create it in R or in SQL Server Management Studio; I get the error "table not found on channel".
Finally, note that I have also tried this without append = TRUE and when creating a new table, as well as with and without the rownames option.
MrFlick from Freenode's #R had me check whether I could read in the empty table using sqlQuery, and indeed I can.
Update
I've gotten a bit closer with the following steps:
1. I created an ODBC connection that goes directly to my database within SQL Server, instead of just to the default (Master) DB and then specifying the path to the table within the table = or tablename = arguments.
2. I created the table in SQL Server Management Studio as follows:
GO
CREATE TABLE [dbo].[testing123](
[Person_DIMKey] [int] NULL
) ON [PRIMARY]
GO
3. In R I used sqlUpdate with my new ODBC connection and no brackets around the table name.
4. Now sqlUpdate() sees the table; however, it complains that it needs a unique column.
5. Indicating that the only column in the table is the unique column with index = colname results in an error saying that the column does not exist.
6. I dropped and recreated the table specifying a primary key:
GO
CREATE TABLE [dbo].[jive_BNR_Person_DIMKey](
[jive_BNR_Person_DIMKey] [int] NOT NULL PRIMARY KEY
) ON [PRIMARY]
GO
which generated both a primary key and an index (according to the GUI of SQL Server Management Studio) named PK__jive_BNR__2754EC2E30F848ED.
7. I specified this index/key as the unique column in sqlUpdate(), but I get the following error:
Error in sqlUpdate(db4, jive_BNR_Person_DIMKey, tablename = "jive_BNR_Person_DIMKey", :
index column(s) PK__jive_BNR__2754EC2E30F848ED not in database table
For the record, I was specifying the correct column name (not "colname") for index; thanks to MrFlick for requesting clarification.
After hours of working on this, I was finally able to get sqlSave to work while specifying the table name - deep breath, where to start. Here is the list of things I did to get this to work:
Open the 32-bit ODBC Administrator and create a User DSN configured for your specific database. In my case I am creating a global temp table, so I linked to tempdb. Use this connection name in odbcConnect(). Here is my code: myconn2 <- odbcConnect("SYSTEMDB").
Then I defined my data types with the following code: columnTypes <- list(Record = "VARCHAR(10)", Case_Number = "VARCHAR(15)", Claim_Type = "VARCHAR(15)", Block_Date = "datetime", Claim_Processed_Date = "datetime", Status ="VARCHAR(100)").
I then updated my data frame class types using as.character and as.Date to match the data types listed above.
Since I had already created the table (I'd been working on this for hours), I had to drop it using sqlDrop(myconn2, "##R_Claims_Data").
I then ran: sqlSave(myconn2, MainClmDF2, tablename = "##R_Claims_Data", verbose=TRUE, rownames= FALSE, varTypes=columnTypes)
Then my head fell off because it worked! I really hope this helps someone going forward. Here are the links that helped me get to this point:
Table not found
sqlSave in R
RODBC
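Putting those steps together, a minimal end-to-end sketch (the DSN, table name, and column types are the ones from this answer; MainClmDF2 is assumed to exist already, and the as.character/as.Date coercions shown are only examples):
library(RODBC)
# 1. DSN created in the 32-bit ODBC Administrator, pointing at tempdb
myconn2 <- odbcConnect("SYSTEMDB")
# 2. Explicit SQL types for each column
columnTypes <- list(Record = "VARCHAR(10)", Case_Number = "VARCHAR(15)",
                    Claim_Type = "VARCHAR(15)", Block_Date = "datetime",
                    Claim_Processed_Date = "datetime", Status = "VARCHAR(100)")
# 3. Coerce the data frame classes to match the types above
MainClmDF2$Record <- as.character(MainClmDF2$Record)
MainClmDF2$Block_Date <- as.Date(MainClmDF2$Block_Date)
# 4. Drop any previous copy, then 5. write the table
sqlDrop(myconn2, "##R_Claims_Data", errors = FALSE)
sqlSave(myconn2, MainClmDF2, tablename = "##R_Claims_Data",
        verbose = TRUE, rownames = FALSE, varTypes = columnTypes)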
After re-reading the RODBC vignette, here's the simple solution that worked:
sqlDrop(db, "df", errors = FALSE)
sqlSave(db, df)
Done.
After experimenting with this a lot more over several days, it seems that the problems stemmed from the use of the additional options, particularly table = or, equivalently, tablename =. Those should be valid options, but somehow they manage to cause problems with my particular versions of RStudio (Windows, 64-bit, desktop version, current build), R (Windows, 64-bit, v3), and/or MS SQL Server 2008.
sqlSave(db, df) will also work without sqlDrop(db, "df") if the table has never existed, but as a best practice I'm writing try(sqlDrop(db, "df", errors = FALSE), silent = TRUE) before all sqlSave statements in my code.
We have had this same problem, which after a bit of testing we solved simply by not using square brackets in the schema and table name reference.
i.e. rather than writing
table = "[Jason].[dbo].[df]"
instead write
table = "Jason.dbo.df"
I appreciate this is now long past the original question, but just for anyone else who subsequently trips up on this problem, this is how we solved it. For reference, we found this out by writing a simple one-item dataframe to a new table, which, when inspected in SQL, contained the square brackets in the table name.
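Using the call from the original question, that means (a sketch):
# The bracketed form creates a table literally named [Jason].[dbo].[df];
# the plain dotted name resolves to the intended schema and table.
sqlSave(db3, df, tablename = "Jason.dbo.df", append = TRUE, rownames = FALSE)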
Here are a few rules of thumb:
If things aren't working out, then manually specify the column types just as #d84_n1nj4 suggested.
columnTypes <- list(Record = "VARCHAR(10)", Case_Number = "VARCHAR(15)", Claim_Type = "VARCHAR(15)", Block_Date = "datetime", Claim_Processed_Date = "datetime", Status ="VARCHAR(100)")
sqlSave(myconn2, MainClmDF2, tablename = "##R_Claims_Data", verbose=TRUE, rownames= FALSE, varTypes=columnTypes)
If #1 doesn't work, then continue to specify the columns, but specify them all as VARCHAR(255). Treat this as a temp or staging table, and move the data over with sqlQuery as your next step, just as #danas.zuokas suggested. This should work, but even if it doesn't, it gets you closer to the metal and puts you in a better position to debug the problem with SQL Server Profiler if you need it. And yes, if you still have a problem, it's likely due to either a parsing error or a type conversion.
columnTypes <- list(Record = "VARCHAR(255)", Case_Number = "VARCHAR(255)", Claim_Type = "VARCHAR(255)", Block_Date = "VARCHAR(255)", Claim_Processed_Date = "VARCHAR(255)", Status ="VARCHAR(255)")
sqlSave(myconn2, MainClmDF2, tablename = "##R_Claims_Data", verbose=TRUE, rownames= FALSE, varTypes=columnTypes)
sqlQuery(channel, 'insert into real_table select * from R_Claims_Data')
Due to RODBC's implementation, and not due to any inherent limitation in T-SQL, R's logical type (i.e. TRUE/FALSE) will not convert to T-SQL's BIT type (i.e. 1/0), so don't try it. Either convert the logical column to 0/1 in the R layer, or take it down to the SQL layer as a VARCHAR(5) and convert it to a BIT in the SQL layer.
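For the R-layer conversion, a small sketch (the data frame name is just an example):
# RODBC will not map R's logical type to T-SQL's BIT, so convert
# every logical column to integer 0/1 before calling sqlSave.
is_lgl <- vapply(df, is.logical, logical(1))
df[is_lgl] <- lapply(df[is_lgl], as.integer)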
In addition to some of the answers posted earlier, here's my workaround. NOTE: I use this as part of a small ETL process, and the destination table in the DB is dropped and recreated each time.
Basically, you want to name your dataframe what your destination table is named:
RodbcTest <- read.xlsx('test.xlsx', sheet = 4, startRow = 1, colNames = TRUE, skipEmptyRows = TRUE)
Then make sure your connection string includes the target database (not just server):
conn <- odbcDriverConnect(paste("DRIVER={SQL Server};Server=localhost\\sqlexpress;Database=Charter;Trusted_Connection=TRUE"))
After that, I run a simple sqlQuery that conditionally drops the table if it exists:
sqlQuery(conn, "IF OBJECT_ID('Charter.dbo.RodbcTest') IS NOT NULL DROP TABLE Charter.dbo.RodbcTest;")
Then finally, run sqlSave without the tablename param, which will create the table and populate it with your dataframe:
sqlSave(conn, RodbcTest, safer = FALSE, fast = TRUE)
I've encountered the same problem - the way I found around it is to create an empty table using regular CREATE TABLE SQL syntax, and then append to it via sqlSave. For some reason, when I tried it your way, I could actually see the table name in the MSSQL database - even after R threw the error message you showed above - but it would be empty.
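With the one-column dataframe from the question, that sequence would look something like this (a sketch; schema and column type assumed):
# Create the empty table yourself, then let sqlSave only append rows.
sqlQuery(db3, "CREATE TABLE dbo.df (colname INT)")
sqlSave(db3, df, tablename = "dbo.df", append = TRUE, rownames = FALSE)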

VS 2005 SSIS Error value origin

I have an SSIS package created in VS 2005 that has started giving me the following error:
[Lawson Staging Table [4046]] Error: There was an error with input column "JOB_CODE" (4200) on input "OLE DB Destination Input" (4059). The column status returned was: "The value violated
the integrity constraints for the column.".
My first question is: what are the 4046, 4200 & 4059 values following my table, column and destination?
My second question is about the integrity constraint message. The destination table is a heap (no keys or indexes) with no constraints. The destination column is defined as a varchar(10). The input column is from oracle, is defined as char(9) and is called job_code. So - where is there an integrity constraint defined?
The final question is about the select statement, which looks like the following:
Select ...
,lpad(trim(e.job_code),10,'0') as job_code ...
If I take the lpad and trim functions out it works, but I need these functions in place because my spec calls for a fixed-length column padded with leading zeros. This column returns data as expected in TOAD but fails in the SSIS package. Does anyone see an issue with how the functions are being used?
Since this package worked in the past but suddenly started to throw this error, I'm assuming that new, invalid data has come into play. However, recently added rows don't seem to be any different than historical records.
Those numbers are most likely the IDs assigned to each task/table/column, etc.
You could go to the advanced editor of the data flow task and look at the input and output properties; you can see that there is an ID assigned to each input and each column.
Next: the error that you are getting usually occurs when the "Allow Nulls" option is unchecked.
Try this (a SQL-side equivalent is sketched after these steps):
Look at the name of the column for this error/warning.
Go to SSMS and find the table
Allow Nulls for that Column
Save the table
Rerun the SSIS
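If you prefer to make the change in SQL rather than through the SSMS table designer, the equivalent statement is below, sketched through R's DBI for consistency with the rest of this page. The connection con and the staging-table name are hypothetical; the varchar(10) width is taken from the question:
# Allow NULLs on the destination column so the constraint is no
# longer violated (table name is an assumption; adjust to yours).
dbExecute(con, "ALTER TABLE dbo.Lawson_Staging ALTER COLUMN JOB_CODE varchar(10) NULL")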