How to fix "ERROR: extra data after last expected column" - SQL

I tried to create a table to import a CSV file, as you can see below, but the error underneath came up. What do I do to fix it?
CREATE TABLE public.test_dataset (
id VARCHAR(255),
race_ethnicity VARCHAR(255),
sex CHAR(1),
date_of_svc DATE,
icd10_category VARCHAR(3),
billed_amt DOUBLE PRECISION,
allowed_amt DOUBLE PRECISION,
claim_status VARCHAR(255),
cpt INT
);
COPY public.test_dataset FROM 'C:\Users\marca\Downloads\Test dataset - SQL - Sep. 2021 - HCDA.csv' WITH CSV HEADER;
ERROR: extra data after last expected column
CONTEXT: COPY test_dataset, line 2: "A100B9111,african american,F,01/01/2020,F10, $350.00 , $250.00 ,Paid,99281,"
SQL state: 22P04
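Judging from the CONTEXT line, each data row ends with a trailing comma, so COPY sees ten fields for a nine-column table; the currency fields also contain "$" and spaces that will not cast to DOUBLE PRECISION. A minimal sketch of one workaround, assuming those are the only problems (the staging table name here is made up): stage every field as text, with an extra dummy column to absorb the trailing comma, then clean and cast into the real table.
-- Hypothetical staging table: all text, plus a dummy column for the
-- empty field produced by the trailing comma.
CREATE TABLE staging_test_dataset (
    id TEXT, race_ethnicity TEXT, sex TEXT, date_of_svc TEXT,
    icd10_category TEXT, billed_amt TEXT, allowed_amt TEXT,
    claim_status TEXT, cpt TEXT, dummy TEXT
);
COPY staging_test_dataset FROM 'C:\Users\marca\Downloads\Test dataset - SQL - Sep. 2021 - HCDA.csv' WITH CSV HEADER;
-- Strip the "$" and surrounding spaces from the money columns, then cast across.
INSERT INTO public.test_dataset
SELECT id, race_ethnicity, sex,
       to_date(date_of_svc, 'MM/DD/YYYY'),
       icd10_category,
       replace(trim(billed_amt), '$', '')::DOUBLE PRECISION,
       replace(trim(allowed_amt), '$', '')::DOUBLE PRECISION,
       claim_status, cpt::INT
FROM staging_test_dataset;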


Add two columns of a row in Apache Pig

Find the top 5 countries by the sum of bars and stripes in a flag.
The input is the flag dataset loaded below.
I tried the following (code 1):
grunt> A =load 'mapreduce/flagdata.txt' using PigStorage(',') as (name: chararray, landmass: int, zon: int, area: int, population: int, language: int, religion: int, bars: int, stripes: int, colours: int, red: int, green: int, blue: int, gold: int, white: int, black: int, orange: int, mainhue: chararray, circles: int, crosses: int, saltires: int, quarters: int, sunstairs: int, crescent: int, triangle: int, icon: int, animate: int, text: int, topleft:chararray, botleft: chararray);
grunt> cnt = foreach A generate A.$0, (A.$7+A.$8); -- same output even when column names such as A.name, A.bars are used
grunt> ord = order cnt by $1 desc;
grunt> lm = limit ord 5;
grunt> dump lm;
Actual output of code1:
ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 0: Scalar has more than one row in the output. 1st : (Afghanistan,5,1,648,16,10,2,0,3,5,1,1,0,1,1,1,0,green,0,0,0,0,1,0,0,1,0,0,black,green), 2nd :(Albania,3,1,29,3,6,6,0,0,3,1,0,0,1,0,1,0,red,0,0,0,0,1,0,0,0,1,0,red,red)
[main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
Code2:
grunt> cnt = foreach A generate A::$0, (A::$7+A::$8) as total;
<line 6, column 28> Unexpected character '$'
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 6, column 28> Unexpected character '$'
grunt> cnt = foreach A generate A::name, (A::bars+A::stripes) as total;
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 6, column 25> Invalid field projection. Projected field [A::name] does not exist in schema: name:chararray,landmass:int,zon:int,area:int,population:int,language:int,religion:int,bars:int,stripes:int,colours:int,red:int,green:int,blue:int,gold:int,white:int,black:int,orange:int,mainhue:chararray,circles:int,crosses:int,saltires:int,quarters:int,sunstairs:int,crescent:int,triangle:int,icon:int,animate:int,text:int,topleft:chararray,botleft:chararray.
Expected output:
I need to display the names of the top 5 countries with the greatest sum of bars + stripes (the separate column is just for reference).
I get different outputs, and sometimes errors (Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.), while modifying the above code. Please help me obtain the sum of the two columns.
If the datatype of bars and stripes is int, then just use '+' on the bare field names; SUM operates on bags produced by grouping, not on two fields of one row. Referencing A.$0 inside the foreach treats A as a scalar, which is what caused the "Scalar has more than one row" error, and the A:: prefix is only valid on fields that carry a relation prefix after operations such as joins. There is also no need to group if the country list is unique.
cnt = foreach A generate name,(bars + stripes) as total;
ord = order cnt by $1 desc;
lm = limit ord 5;
dump lm;

Conversion failed when converting the varchar to data type int

I have this query; I need to save Pubname, ISBNname, copiesname and createdname as integers:
DECLARE @a TABLE (bkID INT, bkname VARCHAR(100), bkpub INT, bkISBN INT, bkcopies INT, bkcreatedby INT)
INSERT INTO @a (bkID, bkname, bkpub, bkISBN, bkcopies, bkcreatedby)
SELECT A.nameID,
A.bkname,
CAST((B.Pubname) AS INT),
CAST((C.ISBNname) AS int),
CAST((D.copiesname) AS int),
CAST((F.createdname) AS int)
FROM #bkname AS A
LEFT JOIN #bkPub AS B
ON (A.nameID = PubID)
LEFT JOIN #bkISBN AS C
ON (A.nameID = C.ISBNID)
LEFT JOIN #bkcopies AS D
ON (A.nameID = D.copiesID)
LEFT JOIN #bkcraeted AS F
ON (A.nameID = F.craetedID)
It returns this error:
Msg 245, Level 16, State 1, Procedure LR_InsertBookArray, Line 46
[Batch Start Line 2] Conversion failed when converting the varchar
value ''3' to data type int.
It seems highly unlikely that columns whose names end in name are actually integers in disguise. My guess is that you simply want the ids from the reference table, but your question doesn't have enough information to know if that is true.
In the meantime, you can use TRY_CAST():
TRY_CAST(B.Pubname AS int),
TRY_CAST(C.ISBNname AS int),
TRY_CAST(D.copiesname AS int),
TRY_CAST(F.createdname AS int)
This will return NULL if the values cannot be converted -- but it avoids the error.
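A quick demonstration of that behaviour, using the value '3 (with its stray leading quote) straight from the error message:
-- TRY_CAST returns NULL instead of raising Msg 245 when the cast fails.
SELECT TRY_CAST('3' AS int) AS clean_value,    -- 3
       TRY_CAST('''3' AS int) AS dirty_value;  -- NULL, because '3 is not a valid int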

How to identify rows with missing column data due to a hidden # in the .txt file

I have the .txt files below, exported from the source system. Because of a # in one field in the source system, some of the fields after the # have no data in the exported .txt file.
For example:
LINE|PANO| INOW|DEL|EASLN|EBSAP|LIM1IT|NOMIT|VALUE|KTE1|
1|7870|1000000||40500369|10|25624.0||0.00|SERVI TORNG|33277|
2|294|1000000||500324|10|590.84 ||0.00|REFUDIAL GATNGWAM|30448|
3|9410|1000000||200500325|10|5905.61||0.00|SUPLIVER EXTRACNS|37478|
4|573|1000000||600004075|10||||||||
5|739|1000000||700500290|10|40917.37|||||||
6|741|1000000||50500289|10|2782.53 ||0.00|SECUERVIC LUWE|29161|
7|948|1000000||||||||||||
8|996|1000000||960050035|10|7497.3||0.00|SCOUOUT URBISH IDM647 |38271|
9|1320|1000000||800500319|10|1395.93||0.00|TUATO AIRS|36427|
10|12054|1000000||9000287|10|458.42||0.00|SECURICE GOLA|||||
In the above example, lines 4, 5, 7 and 10 are missing data after certain fields because of the # in the source system field, even though the data does exist in the source system.
How can I recognise these line items as having missing information/records when dealing with large .txt files of around 10 million line items?
Please share a SQL query, or any other way, to identify the line items with the missing data.
Another example:
LINE|PANO| INOW|DEL|EASLN|EBSAP|LIM1IT|NOMIT|VALUE|KTE1|
1|7870|1000000||40500369|10|25624.0||0.00|SERVI TORNG|33277|
2|294|1000000||500324|10|590.84 ||0.00|REFUDIAL GATNGWAM|30448|
3|9410|1000000||200500325|10|5905.61||0.00|SUPLIVER EXTRACNS|37478|
4|573|1000000||600004075|10
5|739|1000000||700500290|10|40917.37
6|741|1000000||50500289|10|2782.53 ||0.00|SECUERVIC LUWE|29161|
7|948|1000000
8|996|1000000||960050035|10|7497.3||0.00|SCOUOUT URBISH IDM647 |38271|
9|1320|1000000||800500319|10|1395.93||0.00|TUATO AIRS|36427|
10|12054|1000000||9000287|10|458.42||0.00|SECURICE GOLA
The data is truncated wherever a # exists.
Would the following do what you require?
I created a temporary table #HiddenHash and populated it with some of your example data; you will obviously have the data from a BULK INSERT or whatever mechanism you are using.
CREATE TABLE
#HiddenHash
(
LINE VARCHAR (2)
,PANO VARCHAR (25)
,INOW VARCHAR (25)
,DEL VARCHAR (25)
,EASLN VARCHAR (25)
,EBSAP VARCHAR (25)
,LIM1IT VARCHAR (25)
,NOMIT VARCHAR (25)
,VALUE VARCHAR (25)
,KTE1 VARCHAR (25)
)
INSERT INTO #HiddenHash
VALUES
('1','7870','1000000','','40500369','10','25624.0','0.00','SERVI TORNG','33277')
,('2','294','1000000','',' 500324','10','590.84 ','0.00','REFUDIAL GATNGWAM','30448')
,('3','9410','1000000','','200500325','10','5905.61','0.00','SUPLIVER EXTRACNS','37478')
,('4','573','1000000','','600004075','10','','','','')
,('5','739','1000000','','700500290','10','40917.37','','','')
,('6','741','1000000','','50500289','10','2782.53 ','0.00','SECUERVIC LUWE','29161')
,('7','948','1000000','','','','','','','')
,('8','996','1000000','','960050035','10','7497.3','0.00','SCOUOUT URBISH IDM647 ','38271')
,('9','1320','1000000','','800500319','10','1395.93','0.00','TUATO AIRS','36427')
,('10','12054','1000000','','9000287','10','458.42','0.00','SECURICE GOLA','')
Then I count how many columns there are in the table.
DECLARE @CountColumns INT
SET @CountColumns = (SELECT COUNT (*)
FROM TEMPDB.SYS.COLUMNS
WHERE NAME <> 'DEL' AND
object_id = object_id('tempdb.dbo.#HiddenHash')
)
Then, for each row, count how many columns are not blank, and show the rows where that count does not match the number of columns held in the variable.
SELECT LINE,PANO,INOW,EASLN,EBSAP,LIM1IT,NOMIT,VALUE,KTE1
FROM (
SELECT
LINE,PANO,INOW,EASLN,EBSAP,LIM1IT,NOMIT,VALUE,KTE1,
(
SELECT COUNT(*)
FROM (VALUES (LINE),(PANO),(INOW),(EASLN),(EBSAP),(LIM1IT),(NOMIT),
(VALUE),(KTE1)) AS Cnt(col)
WHERE Cnt.Col <> ''
) AS NotBlank
FROM #HiddenHash)cc
WHERE cc.NotBlank <> @CountColumns
Which gives the following result
LINE PANO INOW EASLN EBSAP LIM1IT NOMIT VALUE KTE1
4 573 1000000 600004075 10
5 739 1000000 700500290 10 40917.37
7 948 1000000
10 12054 1000000 9000287 10 458.42 0.00 SECURICE GOLA
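If the raw file is still available (as in the second example, where short lines are physically truncated), a variant of the same idea is to load each raw line into a single-column staging table and compare delimiter counts per line. A sketch, assuming most lines are complete so the maximum pipe count can serve as the expected one (the staging table name is hypothetical):
-- One raw line per row, loaded e.g. by BULK INSERT with only a row terminator.
CREATE TABLE #RawLines (RawLine VARCHAR(MAX));
-- Complete lines should all carry the same number of '|' delimiters;
-- truncated lines carry fewer.
DECLARE @ExpectedPipes INT =
    (SELECT MAX(LEN(RawLine) - LEN(REPLACE(RawLine, '|', ''))) FROM #RawLines);
SELECT RawLine
FROM #RawLines
WHERE LEN(RawLine) - LEN(REPLACE(RawLine, '|', '')) < @ExpectedPipes;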

How to import a PSQL dump file: pgAdmin console or psql console?

Hello guys, as the title says, I am having trouble trying to import (create) a new database from a dump file. When I try to run the SQL query I get an error regarding COPY. When I run it through the psql console I get wrong command \n.
The SQL file looks like this (just a sample, of course, as the whole lot is quite big):
--
-- PostgreSQL database dump
--
-- Dumped from database version 9.1.12
-- Dumped by pg_dump version 9.3.3
-- Started on 2014-04-01 17:05:29
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
--
-- TOC entry 209 (class 1259 OID 32258844)
-- Name: stats_call; Type: TABLE; Schema: public; Owner: postgres; Tablespace:
--
CREATE TABLE bensonsorderlystats_call (
id integer,
callerid text,
entry timestamp with time zone,
oqid integer,
oqnumcalls integer,
oqannounced double precision,
oqentrypos integer,
oqexitpos integer,
oqholdtime double precision,
acdcallid text,
acdentry timestamp with time zone,
acdqueueid integer,
acdagents integer,
acdentrypos integer,
acdexitpos integer,
acdholdtime double precision,
holdtime double precision,
exit timestamp with time zone,
agentid integer,
talktime double precision,
calltime double precision,
callend timestamp with time zone,
reason integer,
wraptime double precision,
acdsubqueueid integer,
working integer,
calledback integer,
accountid integer,
needed integer,
ringingagentid integer,
ringtime double precision,
presented integer,
notecode integer,
note text,
recording text,
wrapcode integer
);
ALTER TABLE public.stats_call OWNER TO postgres;
--
-- TOC entry 2027 (class 0 OID 32258844)
-- Dependencies: 209
-- Data for Name: stats_call; Type: TABLE DATA; Schema: public; Owner: postgres
--
COPY stats (id, callerid, entry, oqid, oqnumcalls, oqannounced, oqentrypos, oqexitpos, oqholdtime, acdcallid, acdentry, acdqueueid, acdagents, acdentrypos, acdexitpos, acdholdtime, holdtime, exit, agentid, talktime, calltime, callend, reason, wraptime, acdsubqueueid, working, calledback, accountid, needed, ringingagentid, ringtime, presented, notecode, note, recording, wrapcode) FROM stdin;
1618693 unknown 2014-02-01 02:59:48.297+00 2512 \n \n \n \n 0 1391223590.58579 2014-02-01 02:59:48.297+00 1872 \n
When I run the import with
\i C:<path>/file.sql with delimiter \n
I get wrong command \n. I have also tried:
\i C:<path>/file.sql delimiter \n
\i C:<path>/file.sql
Can anyone please tell me how to get this DB into the server? Help appreciated. Thanks.
In general, you can issue \set ON_ERROR_STOP in psql before including a SQL file, to stop at the first error and not be flooded by subsequent errors.
When trying to copy into a non-existing table, COPY fails and all the data after it is rejected with a lot of error messages.
Looking at the beginning of your dump, there seems to be a few problems indeed.
It creates a table named bensonsorderlystats_call, but then assigns ownership to postgres for a different table, public.stats_call, which is not supposed to exist.
And then it tries to COPY data into a table named stats which is also never created, assuming you're restoring into an empty database.
It looks as if someone manually edited the dump and messed up.
Try psql -U username -f dump.sql database_name.
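Putting those two suggestions together, a typical invocation (username, database and file names are placeholders) would be:
psql -v ON_ERROR_STOP=1 -U username -d database_name -f dump.sql
or, from inside psql:
\set ON_ERROR_STOP on
\i C:/path/to/file.sql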

Arithmetic overflow error converting expression to data type float

Working on an analysis of bonds, I have attempted to make a payment function that replicates the PMT function of Excel. For the bonds, the "Cusip" is their identifier, their "PASS THRU RATE" is their annual interest rate, the "ORIGINAL WA MATURITY" is the total number of periods, and the "ORIGINAL BALANCE" is the original face value of the bond.
The equation for calculating a monthly payment on paper is:
M = [OF * i * (1+i)^n] / [(1+i)^n - 1]
M=Monthly payment
OF=Original Face
i=annual interest rate/12
n=number of periods
I have a table with all the columns needed for this function, as well as different tables for different months that I will try and use this for. This is what I have so far, creating the function and trying to fix for data types:
if object_id('dbo.PMT') > 0
drop function dbo.PMT
go
create function dbo.PMT(@rate numeric(15,9), @periods smallint, @principal numeric(20,2))
returns numeric (38,9)
as
begin
declare @pmt numeric (38,9)
select @pmt = @principal
/ (power(1+@rate,@periods)-1)
* (@rate*power(1+@rate,@periods))
return @pmt
end
go
drop function dbo.PMT
go
create function dbo.PMT
(
@rate float,
@periods smallint,
@principal numeric(20,2)
)
returns numeric (38,9)
as
begin
declare @pmt numeric (38,9)
declare @WK_periods float,
@WK_principal float,
@wk_One float,
@WK_power float
select @WK_periods = @periods,
@WK_principal = @principal,
@WK_One = 1
select @pmt =
round(
( @WK_principal * (@rate*power(@WK_One+@rate,@WK_periods)))
/ (power(@WK_One+@rate,@WK_periods)-@WK_One)
,9)
return @pmt
end
go
select ALL [CUSIP NUMBER]
,[PASS THRU RATE]
,[ORIGINAL WA MATURITY]
,[ORIGINAL BALANCE],
dbo.pmt((mbs012013.[PASS THRU RATE]),mbs012013.[ORIGINAL WA MATURITY],mbs012013.[ORIGINAL BALANCE])
FROM
[MBS_STATS].[dbo].[mbs012013]
However, I receive
(502882 row(s) affected)
Msg 8115, Level 16, State 2, Line 2
Arithmetic overflow error converting expression to data type float.
when I attempt to execute it. I cannot figure out what is causing this. Any help would be great!
In the line below, you have @WK_principal as a FLOAT, and you're assigning the value of @principal which is a NUMERIC(20,2).
@WK_principal = @principal,
That seems to be the most likely culprit. We'd need to be able to see your data to help otherwise. Also, I'm not clear on why you're creating the function one way, then dropping it and recreating it differently. Or are you just showing two different attempts?
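One more thing worth checking, though it is only a guess without seeing the data: the formula in the question defines i as the annual rate divided by 12, but the query passes [PASS THRU RATE] to dbo.PMT unconverted. If the rate is stored as an annual percentage such as 4.5, then POWER(1 + 4.5, 360) is on the order of 10^266, far beyond what the numeric(38,9) return type can hold. A hypothetical call with the conversion, using illustrative numbers:
-- 4.5% annual rate, 360 monthly periods, $100,000 original face.
-- The /1200.0 turns an annual percentage into a monthly decimal rate.
SELECT dbo.PMT(4.5 / 1200.0, 360, 100000.00) AS monthly_payment;  -- about 506.69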