Uploading to BigQuery GIS: "Invalid nesting: loop 1 should not contain loop 0" - google-bigquery

I'm uploading a CSV file to BigQuery GIS but it fails with "Invalid nesting: loop 1 should not contain loop 0".
This is the error in full:
Upload complete.
BigQuery error in load operation: Error processing job 'mytable-1176:bqjob_rc625a5098ae5fb_0000017289ff6c01_1': Error while reading data,
error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
Failure details:
- Error while reading data, error message: Could not parse 'POLYGON
((-0.02679766923455 51.8338973454281,-0.02665013926462
51.83390841216...' as geography for field geom (position 0)
starting at location 23 Invalid nesting: loop 1 should not contain
loop 0
I've pasted the offending row of the CSV file in full below. It appears to be valid WKT (or at any rate, if I do SELECT ST_GeomFromText('POLYGON(....)') in Postgres, or the equivalent in BigQuery itself, I don't get an error).
I can upload other rows of the CSV file just fine, so this isn't a problem with the schema etc.
My CSV file, in full:
"POLYGON ((-0.02679766923455 51.8338973454281,-0.02665013926462 51.8339084121668,-0.026560668613456 51.8339151296535,-0.026487799104821 51.8339205969954,-0.026347243837181 51.8339311509483,-0.026281189190482 51.8339361120738,-0.026195955952021 51.8339425137374,-0.026119198116154 51.8339482752767,-0.026003486068282 51.8339569617883,-0.025952488850356 51.8339607906816,-0.025861270469634 51.8339676401587,-0.025858880441179 51.833967860811,-0.025828348676774 51.8339706933666,-0.025711206268708 51.8339815496773,-0.025689593686264 51.8339835517849,-0.025640538635981 51.8339884382609,-0.025529205899202 51.8339995358046,-0.025488766227802 51.8340035687339,-0.02540224517472 51.8340121962521,-0.025345346891212 51.8340178681142,-0.025335200706554 51.8340188757231,-0.025340349282672 51.834037216355,-0.025358081339574 51.8341162858309,-0.02530239219927 51.8342041307738,-0.025137161521987 51.8344648008,-0.025126602922354 51.8344835251042,-0.025117917332667 51.8344989177765,-0.025112142226516 51.8345091618299,-0.025029432267333 51.8347162751015,-0.025024329062831 51.8347290752289,-0.025009526856435 51.8347661622769,-0.025093294071485 51.8348450548926,-0.025096355164552 51.8348576953871,-0.02509500197774 51.8348697132258,-0.02509108994899 51.8348834865336,-0.025088461171685 51.8348894941415,-0.025083736509161 51.834900277382,-0.025072148729253 51.8349192991243,-0.025066276262158 51.8349271226369,-0.025058952287815 51.8349372687386,-0.025042611052947 51.834953135315,-0.025001680789442 51.8349786422223,-0.024920260658812 51.8350331883359,-0.024784590322317 51.8351240689296,-0.024592664989898 51.8352526349419,-0.024589543101483 51.8352549834202,-0.024520606248935 51.8353068434611,-0.024501351755 51.8353358705759,-0.024471109202685 51.8353814835663,-0.024447518472049 51.8354170560924,-0.024415970917911 51.8354646254313,-0.024417975754184 51.835564067945,-0.024418971386853 51.8356132810276,-0.02441636899482 51.8356607970846,-0.024412069417915 51.8357391278945,-0.024405890624276 51.835851768366,-0.024404645018278 51.8358745876421,-0.024405581774846 51.835921497729,-0.024408752345367 51.8360808568968,-0.024410061546665 51.8361464500484,-0.024413180002599 51.8363030207515,-0.024413726767404 51.8363306539892,-0.0244128839888 51.8363572028036,-0.024408747608483 51.8364878440004,-0.024407687807063 51.8365213401423,-0.024406476329442 51.8365599592925,-0.024391641544557 51.836678011519,-0.024386208097079 51.8367212266961,-0.024342690041741 51.8367069893255,-0.024319288279565 51.8366997621065,-0.024292175040315 51.8366914384262,-0.024256993656756 51.8366804074215,-0.024209378619192 51.8366665867489,-0.024155315566661 51.8366508502892,-0.024105536982117 51.8366373798885,-0.024053478095738 51.8366236103787,-0.024004689328762 51.8366107410593,-0.023970131471096 51.836601716714,-0.023909047139602 51.8365858891387,-0.023872509708533 51.8365763279375,-0.023813366264893 51.8365609285831,-0.023800608538063 51.8365575849119,-0.023784986418606 51.8365536895414,-0.023742454842103 51.8365431193584,-0.023711439553104 51.8365354043849,-0.023694010323261 51.8365310110389,-0.023665520720037 51.8365239589599,-0.023639661080915 51.8365175085891,-0.023621685803881 51.8365129891569,-0.023606995500241 51.8365093612042,-0.023594058239185 51.836506140384,-0.023579988098348 51.8365026037803,-0.023576657648261 51.836501756494,-0.023573212341634 51.8365008803008,-0.023570943037101 51.8365003296066,-0.023566131987943 51.8364991517017,-0.023560820359479 51.836497803524,-0.023550842839 51.836494614457,-0.023534152361942 
51.8364892623447,-0.023519322307527 51.8364845169988,-0.02348978408578 51.8364729061891,-0.023475831119834 51.8364673572881,-0.023454784263057 51.8364590634327,-0.023440395773423 51.8364535161996,-0.023420399054051 51.8364457705282,-0.023412745105461 51.8364422248457,-0.023381943124318 51.8364279850281,-0.023357997010233 51.8364146337661,-0.023352306254349 51.8364113368901,-0.023317752798112 51.8363896244224,-0.023286469209969 51.836365503042,-0.023270902824062 51.8363503952582,-0.023271184375594 51.8363476124069,-0.023383603050238 51.836110354259,-0.023465777469187 51.8359369179285,-0.023472651192556 51.8358651407666,-0.023528809877392 51.8357004122102,-0.023529686057616 51.835690679384,-0.023538063914411 51.83565382625,-0.023543689808186 51.835629102312,-0.023572573948308 51.8355021681838,-0.023577786720198 51.8354792537265,-0.023618788068211 51.8352990106071,-0.023627360729621 51.8352613514407,-0.023661334277455 51.8351119863322,-0.023669026973139 51.8350781790229,-0.023688225716162 51.8349937501145,-0.023717627975541 51.8348645765991,-0.023722804894815 51.8348418143991,-0.023756486300809 51.8346938021715,-0.023763907778347 51.8346612132372,-0.023790901015912 51.834542628006,-0.023795341167441 51.8345230996081,-0.023800489727373 51.8345105970367,-0.023838123662149 51.834419158237,-0.023854300226396 51.8343798733486,-0.023881261233424 51.8343143955334,-0.02360425553657 51.8343556996006,-0.023715342137444 51.8343230723896,-0.023748777209506 51.8343116206823,-0.023788430888355 51.8342980344079,-0.023870547343144 51.8342676988052,-0.023942069257488 51.8342416452568,-0.023977465644738 51.8342284909458,-0.024007122432287 51.8342174702501,-0.024015599076738 51.8342144833934,-0.024058719278701 51.8341992917174,-0.024087038370323 51.834189318598,-0.024105922723606 51.8341826669339,-0.024107853809983 51.8341819889956,-0.024165165953312 51.8341609030475,-0.024196280263318 51.834154052201,-0.024216549190068 51.834149590904,-0.02430529023458 51.8341300490434,-0.024364346544379 51.8341175259067,-0.024380183743731 51.8341141681084,-0.024434405357854 51.8341026697566,-0.024448191552983 51.8340997450887,-0.02450384848189 51.8340892779444,-0.024534730584179 51.8340827558198,-0.024571252512039 51.834075051532,-0.024617624661644 51.8340664695984,-0.02465593398552 51.8340593798021,-0.024659169226995 51.8340587507378,-0.024698227325573 51.8340511339728,-0.024715633647968 51.8340477395406,-0.024790094189827 51.8340371025226,-0.024838827806997 51.8340302776774,-0.024900748436463 51.8340216150771,-0.024978773610225 51.8340117751699,-0.025070395982381 51.8340016689899,-0.025162402800403 51.833992063766,-0.02519959947995 51.8339881743187,-0.025258434658431 51.8339820224963,-0.025299537926511 51.8339784144308,-0.025322340623602 51.8339764143945,-0.025367654362725 51.8339724453802,-0.025433566943196 51.833966754013,-0.02546940835365 51.8339636510231,-0.025478048048634 51.8339629058588,-0.025545930176343 51.8339589740019,-0.025596525520343 51.8339560377582,-0.025600745391944 51.8339557938799,-0.025724272639777 51.833947958345,-0.025744834562549 51.8339467478911,-0.025819588005694 51.8339423467392,-0.025865426412415 51.8339396542424,-0.025940870479972 51.8339357232096,-0.026005709012821 51.8339323514664,-0.026011573315888 51.8339320092857,-0.026097210950677 51.833926981359,-0.026163087127967 51.8339231143903,-0.026182935984022 51.8339219458446,-0.02626063552581 51.8339182143234,-0.026381722979675 51.8339123963649,-0.026433135483049 51.8339100220004,-0.026570879650513 51.833903674085,-0.026606068864626 
51.8339025380898,-0.026689092338078 51.8338998667835,-0.026732668330249 51.8338984578396,-0.02679766923455 51.8338973454281),(-0.026648986091666 51.8339075655333,-0.026350862668855 51.833928055419,-0.026484565304541 51.8339188701762,-0.026564475458192 51.8339133771113,-0.026648986091666 51.8339075655333),(-0.026648986091666 51.8339075655333,-0.026649000595502 51.8339075657766,-0.02679766923455 51.8338973454281,-0.026648986091666 51.8339075655333),(-0.024985527558019 51.8340129946412,-0.024837331236428 51.8340326354761,-0.024948153517607 51.834017951115,-0.024985527558019 51.8340129946412),(-0.025153445104997 51.8339934420137,-0.025006535825553 51.8340102091839,-0.025133197780121 51.8339957546967,-0.025153445104997 51.8339934420137))"
Could this be a winding order problem?
For reference, my source data is an EPSG:27700 shapefile, and I converted it to EPSG:4326 WKT CSV using ogr2ogr.

This is a problem with polygon orientation; for details see: https://cloud.google.com/bigquery/docs/gis-data#polygon_orientation
When loading WKT geography data, BigQuery assumes the polygons are oriented as described in the link above (to allow loading polygons larger than a hemisphere).
ST_GeogFromText has a different default, which allows it to consume this data. If you pass the second parameter oriented => TRUE to the ST_GeogFromText function, you get the same error as the load job.
The workaround is to either load the data as STRING and then convert it to GEOGRAPHY using ST_GeogFromText, or to load GeoJSON format instead of WKT. GeoJSON is a planar map format, so there is no polygon orientation ambiguity.
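A minimal sketch of the STRING workaround (the dataset, table and column names here are assumptions, not anything from the question): load the WKT into a STRING column and convert it in a query, where the default interpretation parses this data; passing oriented => TRUE instead would reproduce the load error.
CREATE OR REPLACE TABLE mydataset.parcels AS
SELECT
  ST_GEOGFROMTEXT(geom_wkt) AS geom  -- default (oriented = FALSE): orientation-agnostic parsing
FROM mydataset.parcels_raw;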
You can make ogr2ogr produce a CSV with GeoJSON geometry using something like
ogr2ogr -f csv -dialect sqlite \
  -sql "select AsGeoJSON(ST_Transform(geometry, 4326)) geom, * from shp" \
  out.csv ./input/
See also https://medium.com/@mentin/loading-large-spatial-features-to-bigquery-geography-2f6ceb6796df

Related

Snowflake COPY INTO from JSON - ON_ERROR = CONTINUE - Weird Issue

I am trying to load a JSON file from the staging area (S3) into a stage table using the COPY INTO command.
Table:
create or replace TABLE stage_tableA (
RAW_JSON VARIANT NOT NULL
);
Copy Command:
copy into stage_tableA from @stgS3/filename_45.gz file_format = (format_name = 'file_json')
Got the below error when executing the above (sample provided)
SQL Error [100069] [22P02]: Error parsing JSON: document is too large, max size 16777216 bytes If you would like to continue loading
when an error is encountered, use other values such as 'SKIP_FILE' or
'CONTINUE' for the ON_ERROR option. For more information on loading
options, please run 'info loading_data' in a SQL client.
When I set ON_ERROR = CONTINUE, records were partially loaded, i.e. only up to the record that exceeds the max size. No records after the error record were loaded.
Was ON_ERROR = CONTINUE supposed to skip only the record that exceeds the max size and load the records before and after it?
Yes, the ON_ERROR = CONTINUE option skips the offending line and continues to load the rest of the file.
To help us provide more insight, can you answer the following:
How many records are in your file?
How many got loaded?
At what line was the error first encountered?
You can find this information using the COPY_HISTORY() table function
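For example (a sketch; the table name comes from your CREATE TABLE, and the 24-hour window is an arbitrary choice):
select *
from table(information_schema.copy_history(
    table_name => 'STAGE_TABLEA',
    start_time => dateadd(hours, -24, current_timestamp())));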
Try setting the option strip_outer_array = true on the file format and attempt the load again (see the sketch after the link below).
The considerations for loading large semi-structured data are documented in the article below:
https://docs.snowflake.com/en/user-guide/semistructured-considerations.html
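A sketch of that change, assuming 'file_json' is a JSON file format you own and can alter:
alter file format file_json set strip_outer_array = true;
copy into stage_tableA
from @stgS3/filename_45.gz
file_format = (format_name = 'file_json')
on_error = continue;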
I partially agree with Chris. The ON_ERROR = CONTINUE option only helps if there is in fact more than one JSON object in the file. If it's one massive object, then with ON_ERROR = CONTINUE you simply get neither an error nor the record loaded.
If you know each JSON object is smaller than 16 MB, then definitely try strip_outer_array = true. Also, if your JSON has a lot of nulls ("NULL") as values, use STRIP_NULL_VALUES = TRUE (see the one-liner below), as this will slim your payload as well. Hope that helps.
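A one-line sketch of that option (again assuming the 'file_json' format is yours to change):
alter file format file_json set strip_null_values = true;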

Text was truncated or one or more characters had no match in the target code page ole db source to flat file destination

I'm exporting a table's output to a CSV file. I'm doing it using an SSIS package which has an OLE DB Source and a Flat File Destination. I'm getting the following errors:
[Flat File Destination [2]] Error: Data conversion failed. The data conversion for column "Address" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
[Flat File Destination [2]] Error: Cannot copy or convert flat file data for column "Address".
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "Flat File Destination" (2) failed with error code 0xC02020A0 while processing input "Flat File Destination Input" (6). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
[OLE DB Source [9]] Error: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on OLE DB Source returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
Can anyone please advise?
The output column for Address is specified as smaller than your original table column.
See this SO: SSIS data conversion failed
Summary:
(1) Right-click the Flat File Source and choose "Show Advanced Editor". Go to the "Input and Output Properties" tab, expand "Flat File Source Output" and choose "External Columns".
(2) Select the "Address" column and, on the right-hand side, increase the length to the same size as the column in your original table.
Double-check anywhere in your export wizard that allows you to set column sizes, and make sure those of your output file match those of your original table columns.
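As a quick way to find the width the destination column actually needs, you can run something like this (a sketch; the source table name is an assumption):
SELECT MAX(LEN(Address)) AS MaxAddressChars,
       MAX(DATALENGTH(Address)) AS MaxAddressBytes
FROM dbo.SourceTable;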
@user7396598
Thank you for pointing me in the right direction. I ran a comparison and the records seem to be inserted in the same order only up to a point; after that they no longer match. I was able to capture the bad data by running the following:
select * from table where address != cast(address as varchar(1000))
When I removed the bad data, my SSIS package worked.
Now I need to figure out how to convert the bad data into a format acceptable for the CSV.
Reference - https://stackoverflow.com/a/2683496/8452633
I had a similar problem: bad data in one of my columns was causing this error even after increasing the size of the output column. In my case I solved it by replacing the bad data using the REPLACE function.
I exported the data by writing a query, and in that query, instead of SELECT *, I listed all the column names and used REPLACE on the columns that were causing the problems. I replaced all the characters that could potentially cause truncation, e.g. commas, pipes, tabs, etc., with a space.
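A minimal sketch of that kind of query (the table and column names are assumptions):
SELECT Id,
       REPLACE(REPLACE(REPLACE(Address, ',', ' '), '|', ' '), CHAR(9), ' ') AS Address  -- commas, pipes, tabs
FROM dbo.SourceTable;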

SSIS XML Source Error - Input string was not in a correct format

I have an attribute tlost with the definition below in the XSD file. I have tried both use="required" and use="optional".
<xs:attributeGroup name="defense">
<xs:attribute name="tlost" use="required" type="xs:decimal"/>
</xs:attributeGroup>
In the XML document I am trying to import I will get a value like the following:
<defense ast="0" category="special_team" tlost="0" int="0"/>
I am executing an SSIS package that takes the tlost value and inserts it into a sql database table. The column in the database table has a datatype of DECIMAL(28,10) and allows nulls.
When I execute the package, the previous values work perfectly and the data is inserted. However, when I get a value where tlost="" in the XML file, the package fails and the record is not inserted.
In the data flow path editor, the data type for tlost is DT_DECIMAL. When I check the Advanced Editor for the XML Source, the Input and Output properties have a data type for tlost as decimal [DT_DECIMAL].
I can't figure out why this is failing. I tried to create a derived column and cast it as a (DT_DECIMAL, 10) data type. That didn't work. I tried to check for a null value and replace it with 0 if null; that didn't work. So I ignored the column altogether and, in the Derived Column task, replaced the tlost column value with (DT_DECIMAL, 10) 0 to just insert a 0 value and ignore whatever is in the XML file, and the job still failed with the following error message:
Error: 0xC020F444 at Load Play Summary Tables, XML Source [1031]: The error "Input string was not in a correct format." occurred while processing "XML Source.Outputs[defense].Columns[tlost]".
Error: 0xC02090FB at Load Play Summary Tables, XML Source [1031]: The "XML Source" failed because error code 0x80131537 occurred, and the error row disposition on "XML Source.Outputs[defense].Columns[tlost]" at "XML Source.Outputs[defense]" specifies failure on error. An error occurred on the specified object of the specified component.
Error: 0xC02092AF at Load Play Summary Tables, XML Source [1031]: The XML Source was unable to process the XML data. Pipeline component has returned HRESULT error code 0xC02090FB from a method call.
Error: 0xC0047038 at Load Play Summary Tables, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on XML Source returned error code 0xC02092AF. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
Please help. I have exhausted everything I can think of to fix this issue. I am processing hundreds of files, and I can't keep fixing bad data files every time this issue occurs.
Can you please try these:
1 - Change the data type to string in the XSD and take care of the data type conversion before loading into the tables (see the sketch below).
2 - If possible, generate the XSD by passing in your XML, then verify the data type it infers and use it accordingly; the rest of the XSD can be changed to match.
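If the string value lands in a staging table first (per option 1), the conversion could look something like this: a sketch, where the staging table, target table and column list are assumptions and TRY_CAST requires SQL Server 2012+.
INSERT INTO dbo.PlaySummaryDefense (tlost)
SELECT TRY_CAST(NULLIF(tlost, '') AS DECIMAL(28, 10))  -- empty string becomes NULL instead of a parse error
FROM dbo.Staging_Defense;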
Below is a screen grab of what I tried; hope it helps.

ADLA/U-SQL Error: Vertex user code error

I have a simple U-SQL script that extracts a CSV using Extractors.Csv(encoding:Encoding.[Unicode]) and outputs it into a Data Lake Store table. The file size is small, around 600 MB, and the file is Unicode-encoded. The number of rows is 700K+.
These are the columns:
UserId int,
Email string,
AltEmail string,
CreatedOn DateTime,
IsDeleted bool,
UserGuid Guid,
IFulfillmentContact bool,
IsBillingContact bool,
LastUpdateDate DateTime,
IsTermsOfUse string,
UserTypeId string
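A minimal sketch of the kind of script described, using these columns (the input path, database context and table name are assumptions, not the asker's actual script):
@users =
    EXTRACT UserId              int,
            Email               string,
            AltEmail            string,
            CreatedOn           DateTime,
            IsDeleted           bool,
            UserGuid            Guid,
            IFulfillmentContact bool,
            IsBillingContact    bool,
            LastUpdateDate      DateTime,
            IsTermsOfUse        string,
            UserTypeId          string
    FROM "/Data/User.csv"
    USING Extractors.Csv(encoding: Encoding.[Unicode]);

// Assumes a managed table dbo.Users with a matching schema already exists.
INSERT INTO dbo.Users
SELECT * FROM @users;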
When I run this job locally, it works great without any issues. Once I submit it to ADLA, I get the following error:
Vertex failure triggered quick job abort. Vertex failed: SV1_Extract_Partition[0][0] with error: Vertex user code error.
Vertex failed with a fail-fast error
Vertex SV1_Extract_Partition[0][0].v1 {BA7B2378-597C-4679-AD69-07413A143E47} failed
Error:
Vertex user code error
exitcode=CsExitCode_StillActive Errorsnippet=An error occurred while processing adl://lakestore.azuredatalakestore.net/Data/User.csv
Any help is appreciated!
Since the file is larger than 250 MB, you need to make sure that you upload it as a row-oriented file and not a binary file.
Also, please check the answer to the following question to see how you can currently find more details on the error: Debugging u-sql Jobs

CSV file input not working together with set field value step in Pentaho Kettle

I have a very simple Pentaho Kettle transformation that causes a strange error. It consists of reading a field X from a CSV, adding a field Y, setting Y=X, and finally writing it back to another CSV.
Here you can see the steps and the configuration for them:
You can also download the ktr file from here. The input data is just this:
1
2
3
When I run this transformation, I get this error message:
ERROR (version 5.4.0.1-130, build 1 from 2015-06-14_12-34-55 by buildguy) : Unexpected error
ERROR (version 5.4.0.1-130, build 1 from 2015-06-14_12-34-55 by buildguy) : org.pentaho.di.core.exception.KettleStepException:
Error writing line
Error writing field content to file
Y Number : There was a data type error: the data type of [B object [[B@b4136a] does not correspond to value meta [Number]
at org.pentaho.di.trans.steps.textfileoutput.TextFileOutput.writeRowToFile(TextFileOutput.java:273)
at org.pentaho.di.trans.steps.textfileoutput.TextFileOutput.processRow(TextFileOutput.java:195)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
at java.lang.Thread.run(Unknown Source)
Caused by: org.pentaho.di.core.exception.KettleStepException:
Error writing field content to file
Y Number : There was a data type error: the data type of [B object [[B@b4136a] does not correspond to value meta [Number]
at org.pentaho.di.trans.steps.textfileoutput.TextFileOutput.writeField(TextFileOutput.java:435)
at org.pentaho.di.trans.steps.textfileoutput.TextFileOutput.writeRowToFile(TextFileOutput.java:249)
... 3 more
Caused by: org.pentaho.di.core.exception.KettleValueException:
Y Number : There was a data type error: the data type of [B object [[B@b4136a] does not correspond to value meta [Number]
at org.pentaho.di.core.row.value.ValueMetaBase.getBinaryString(ValueMetaBase.java:2185)
at org.pentaho.di.trans.steps.textfileoutput.TextFileOutput.formatField(TextFileOutput.java:290)
at org.pentaho.di.trans.steps.textfileoutput.TextFileOutput.writeField(TextFileOutput.java:392)
... 4 more
All of the above lines start with 2015/09/23 12:51:18 - Text file output.0 -, but I edited it out for brevity. I think the relevant, and confusing, part of the error message is this:
Y Number : There was a data type error: the data type of [B object [[B@b4136a] does not correspond to value meta [Number]
Some further notes:
If I bypass the set field value step by using the lower hop instead, the transformation finishes without errors. This leads me to believe that it is the set field value step that causes the problem.
If I replace the CSV file input with a data grid containing the same data (1,2,3), everything works just fine.
If I replace the file output step with a dummy step, the transformation finishes without errors. However, if I preview the dummy, it causes a similar error and the field Y has the value <null> in all three rows.
Before I created this MCVE, I got the error on all sorts of seemingly random steps, even when there was no file output present. So I do not think this is related to the file output.
If I change the format from Number to Integer, nothing changes. But if I change it to String, the transformation finishes without errors, and I get this output:
X;Y
1;[B@49e96951
2;[B@7b016abf
3;[B@1a0760b0
Is this a bug? Am I doing something wrong? How can I make this work?
It's because of lazy conversion. Turn it off. This is behaving exactly as designed - although admittedly the error and user experience could be improved.
Lazy conversion must not be used when you need to access the field value in your transformation, and that is exactly what the set field value step does. The default should probably be off rather than on.
If your field is going directly to a database, then use it and it will be faster.
You can even have "partially lazy" streams, where you use lazy conversion for speed, but then use select values step, to "un-lazify" the fields you want to access, whilst the remainder remain lazy.
Cunning huh?