NiFi CompressContent: "IOException thrown from CompressContent: java.io.IOException: Input is not in the .gz format" - gzip

I am new to NiFi and am trying to do a POC on the flow below.
I get XML messages from a Kafka topic. I need to consume each XML message, pull out a few attributes plus a data element that holds GZIP-compressed content, decompress that data (which is again XML), and then load it into a MySQL DB. I got stuck at the step below.
(1) ConsumeKafka
(2) EvaluateXPath (Destination = flowfile-attribute): set a few XML elements as flow file attributes that are useful downstream
(3) EvaluateXPath (Destination = flowfile-content): extract the gzip data with the XPath expression string(//ABC/data)
(4) UpdateAttribute: set mime.type = application/gzip
(5) CompressContent: Compression Format = use mime.type attribute, Mode = decompress
My CompressContent processor is failing with the exception below:
org.apache.nifi.processor.exception.ProcessException: IOException thrown from CompressContent[id=be4b9583-016e-1000-7cce-b9d822334c4c]: java.io.IOException: java.io.IOException: Input is not in the .gz format
Could it be because the data type of the flowfile content coming out of (3) EvaluateXPath is String? Do I need to convert the String to bytes before feeding it to CompressContent? If yes, can I do that within the same (3) EvaluateXPath using some kind of toBytes() function?
Thanks in advance for your help!!!

Got the solution for this issue: the data is Base64 encoded, which is why the gzip decompression could not read it. I added a Base64EncodeContent processor (with Mode set to Decode) before the CompressContent (gzip decompress) processor, and that solved the issue.

Related

Uploading to BigQuery GIS: "Invalid nesting: loop 1 should not contain loop 0"

I'm uploading a CSV file to BigQuery GIS but it fails with "Invalid nesting: loop 1 should not contain loop 0".
This is the error in full:
Upload complete.
BigQuery error in load operation: Error processing job 'mytable-1176:bqjob_rc625a5098ae5fb_0000017289ff6c01_1': Error while reading data,
error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
Failure details:
- Error while reading data, error message: Could not parse 'POLYGON
((-0.02679766923455 51.8338973454281,-0.02665013926462
51.83390841216...' as geography for field geom (position 0)
starting at location 23 Invalid nesting: loop 1 should not contain
loop 0
I've pasted the offending row of the CSV file in full below. It appears to be valid WKT (or at any rate, if I run SELECT ST_GeomFromText('POLYGON(....)') in Postgres, or the equivalent in BigQuery itself, I don't get an error).
I can upload other rows of the CSV file just fine, so this isn't a problem with the schema etc.
My CSV file, in full:
"POLYGON ((-0.02679766923455 51.8338973454281,-0.02665013926462 51.8339084121668,-0.026560668613456 51.8339151296535,-0.026487799104821 51.8339205969954,-0.026347243837181 51.8339311509483,-0.026281189190482 51.8339361120738,-0.026195955952021 51.8339425137374,-0.026119198116154 51.8339482752767,-0.026003486068282 51.8339569617883,-0.025952488850356 51.8339607906816,-0.025861270469634 51.8339676401587,-0.025858880441179 51.833967860811,-0.025828348676774 51.8339706933666,-0.025711206268708 51.8339815496773,-0.025689593686264 51.8339835517849,-0.025640538635981 51.8339884382609,-0.025529205899202 51.8339995358046,-0.025488766227802 51.8340035687339,-0.02540224517472 51.8340121962521,-0.025345346891212 51.8340178681142,-0.025335200706554 51.8340188757231,-0.025340349282672 51.834037216355,-0.025358081339574 51.8341162858309,-0.02530239219927 51.8342041307738,-0.025137161521987 51.8344648008,-0.025126602922354 51.8344835251042,-0.025117917332667 51.8344989177765,-0.025112142226516 51.8345091618299,-0.025029432267333 51.8347162751015,-0.025024329062831 51.8347290752289,-0.025009526856435 51.8347661622769,-0.025093294071485 51.8348450548926,-0.025096355164552 51.8348576953871,-0.02509500197774 51.8348697132258,-0.02509108994899 51.8348834865336,-0.025088461171685 51.8348894941415,-0.025083736509161 51.834900277382,-0.025072148729253 51.8349192991243,-0.025066276262158 51.8349271226369,-0.025058952287815 51.8349372687386,-0.025042611052947 51.834953135315,-0.025001680789442 51.8349786422223,-0.024920260658812 51.8350331883359,-0.024784590322317 51.8351240689296,-0.024592664989898 51.8352526349419,-0.024589543101483 51.8352549834202,-0.024520606248935 51.8353068434611,-0.024501351755 51.8353358705759,-0.024471109202685 51.8353814835663,-0.024447518472049 51.8354170560924,-0.024415970917911 51.8354646254313,-0.024417975754184 51.835564067945,-0.024418971386853 51.8356132810276,-0.02441636899482 51.8356607970846,-0.024412069417915 51.8357391278945,-0.024405890624276 51.835851768366,-0.024404645018278 51.8358745876421,-0.024405581774846 51.835921497729,-0.024408752345367 51.8360808568968,-0.024410061546665 51.8361464500484,-0.024413180002599 51.8363030207515,-0.024413726767404 51.8363306539892,-0.0244128839888 51.8363572028036,-0.024408747608483 51.8364878440004,-0.024407687807063 51.8365213401423,-0.024406476329442 51.8365599592925,-0.024391641544557 51.836678011519,-0.024386208097079 51.8367212266961,-0.024342690041741 51.8367069893255,-0.024319288279565 51.8366997621065,-0.024292175040315 51.8366914384262,-0.024256993656756 51.8366804074215,-0.024209378619192 51.8366665867489,-0.024155315566661 51.8366508502892,-0.024105536982117 51.8366373798885,-0.024053478095738 51.8366236103787,-0.024004689328762 51.8366107410593,-0.023970131471096 51.836601716714,-0.023909047139602 51.8365858891387,-0.023872509708533 51.8365763279375,-0.023813366264893 51.8365609285831,-0.023800608538063 51.8365575849119,-0.023784986418606 51.8365536895414,-0.023742454842103 51.8365431193584,-0.023711439553104 51.8365354043849,-0.023694010323261 51.8365310110389,-0.023665520720037 51.8365239589599,-0.023639661080915 51.8365175085891,-0.023621685803881 51.8365129891569,-0.023606995500241 51.8365093612042,-0.023594058239185 51.836506140384,-0.023579988098348 51.8365026037803,-0.023576657648261 51.836501756494,-0.023573212341634 51.8365008803008,-0.023570943037101 51.8365003296066,-0.023566131987943 51.8364991517017,-0.023560820359479 51.836497803524,-0.023550842839 51.836494614457,-0.023534152361942 
51.8364892623447,-0.023519322307527 51.8364845169988,-0.02348978408578 51.8364729061891,-0.023475831119834 51.8364673572881,-0.023454784263057 51.8364590634327,-0.023440395773423 51.8364535161996,-0.023420399054051 51.8364457705282,-0.023412745105461 51.8364422248457,-0.023381943124318 51.8364279850281,-0.023357997010233 51.8364146337661,-0.023352306254349 51.8364113368901,-0.023317752798112 51.8363896244224,-0.023286469209969 51.836365503042,-0.023270902824062 51.8363503952582,-0.023271184375594 51.8363476124069,-0.023383603050238 51.836110354259,-0.023465777469187 51.8359369179285,-0.023472651192556 51.8358651407666,-0.023528809877392 51.8357004122102,-0.023529686057616 51.835690679384,-0.023538063914411 51.83565382625,-0.023543689808186 51.835629102312,-0.023572573948308 51.8355021681838,-0.023577786720198 51.8354792537265,-0.023618788068211 51.8352990106071,-0.023627360729621 51.8352613514407,-0.023661334277455 51.8351119863322,-0.023669026973139 51.8350781790229,-0.023688225716162 51.8349937501145,-0.023717627975541 51.8348645765991,-0.023722804894815 51.8348418143991,-0.023756486300809 51.8346938021715,-0.023763907778347 51.8346612132372,-0.023790901015912 51.834542628006,-0.023795341167441 51.8345230996081,-0.023800489727373 51.8345105970367,-0.023838123662149 51.834419158237,-0.023854300226396 51.8343798733486,-0.023881261233424 51.8343143955334,-0.02360425553657 51.8343556996006,-0.023715342137444 51.8343230723896,-0.023748777209506 51.8343116206823,-0.023788430888355 51.8342980344079,-0.023870547343144 51.8342676988052,-0.023942069257488 51.8342416452568,-0.023977465644738 51.8342284909458,-0.024007122432287 51.8342174702501,-0.024015599076738 51.8342144833934,-0.024058719278701 51.8341992917174,-0.024087038370323 51.834189318598,-0.024105922723606 51.8341826669339,-0.024107853809983 51.8341819889956,-0.024165165953312 51.8341609030475,-0.024196280263318 51.834154052201,-0.024216549190068 51.834149590904,-0.02430529023458 51.8341300490434,-0.024364346544379 51.8341175259067,-0.024380183743731 51.8341141681084,-0.024434405357854 51.8341026697566,-0.024448191552983 51.8340997450887,-0.02450384848189 51.8340892779444,-0.024534730584179 51.8340827558198,-0.024571252512039 51.834075051532,-0.024617624661644 51.8340664695984,-0.02465593398552 51.8340593798021,-0.024659169226995 51.8340587507378,-0.024698227325573 51.8340511339728,-0.024715633647968 51.8340477395406,-0.024790094189827 51.8340371025226,-0.024838827806997 51.8340302776774,-0.024900748436463 51.8340216150771,-0.024978773610225 51.8340117751699,-0.025070395982381 51.8340016689899,-0.025162402800403 51.833992063766,-0.02519959947995 51.8339881743187,-0.025258434658431 51.8339820224963,-0.025299537926511 51.8339784144308,-0.025322340623602 51.8339764143945,-0.025367654362725 51.8339724453802,-0.025433566943196 51.833966754013,-0.02546940835365 51.8339636510231,-0.025478048048634 51.8339629058588,-0.025545930176343 51.8339589740019,-0.025596525520343 51.8339560377582,-0.025600745391944 51.8339557938799,-0.025724272639777 51.833947958345,-0.025744834562549 51.8339467478911,-0.025819588005694 51.8339423467392,-0.025865426412415 51.8339396542424,-0.025940870479972 51.8339357232096,-0.026005709012821 51.8339323514664,-0.026011573315888 51.8339320092857,-0.026097210950677 51.833926981359,-0.026163087127967 51.8339231143903,-0.026182935984022 51.8339219458446,-0.02626063552581 51.8339182143234,-0.026381722979675 51.8339123963649,-0.026433135483049 51.8339100220004,-0.026570879650513 51.833903674085,-0.026606068864626 
51.8339025380898,-0.026689092338078 51.8338998667835,-0.026732668330249 51.8338984578396,-0.02679766923455 51.8338973454281),(-0.026648986091666 51.8339075655333,-0.026350862668855 51.833928055419,-0.026484565304541 51.8339188701762,-0.026564475458192 51.8339133771113,-0.026648986091666 51.8339075655333),(-0.026648986091666 51.8339075655333,-0.026649000595502 51.8339075657766,-0.02679766923455 51.8338973454281,-0.026648986091666 51.8339075655333),(-0.024985527558019 51.8340129946412,-0.024837331236428 51.8340326354761,-0.024948153517607 51.834017951115,-0.024985527558019 51.8340129946412),(-0.025153445104997 51.8339934420137,-0.025006535825553 51.8340102091839,-0.025133197780121 51.8339957546967,-0.025153445104997 51.8339934420137))"
Could this be a winding order problem?
For reference my source data is an EPSG:27700 shapefile, and I converted it to EPSG:4326 WKT CSV using ogr2ogr.
This is a problem with polygon orientation; for details see: https://cloud.google.com/bigquery/docs/gis-data#polygon_orientation
When loading WKT geography data, BigQuery assumes the polygons are oriented as described in the link above (to allow loading polygons larger than a hemisphere).
ST_GeogFromText has a different default, which is why it can consume this data. If you pass the second parameter oriented => TRUE to ST_GeogFromText, you reproduce the same error as the load job.
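For illustration, a hedged sketch of the difference (the WKT literal is abbreviated with '...'; substitute the full polygon from the question):

-- Default parsing: oriented defaults to FALSE, so each ring is taken as the
-- smaller of its two possible loops and the polygon parses successfully.
SELECT ST_GeogFromText('POLYGON ((-0.02679766923455 51.8338973454281, ...))');

-- With oriented => TRUE the ring winding order is honored, and this WKT should
-- fail with the same "Invalid nesting: loop 1 should not contain loop 0" error
-- that the load job reports.
SELECT ST_GeogFromText('POLYGON ((-0.02679766923455 51.8338973454281, ...))', oriented => TRUE);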
The workaround is either to load the data as STRING and then convert it to GEOGRAPHY using ST_GeogFromText, or to load it in GeoJSON format instead of WKT. GeoJSON is a planar map format, so there is no polygon orientation ambiguity.
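A minimal sketch of the first workaround, assuming the CSV has already been loaded into a staging table whose geometry column is a plain STRING (the dataset, table, and column names are illustrative):

-- Convert the STRING column to GEOGRAPHY; the default oriented = FALSE
-- interpretation avoids the nesting error seen at load time.
CREATE OR REPLACE TABLE mydataset.parcels AS
SELECT
  ST_GeogFromText(geom_wkt) AS geom,
  * EXCEPT (geom_wkt)
FROM mydataset.parcels_raw;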
You can make ogr2ogr produce CSV with GeoJSON using something like
ogr2ogr -f csv -dialect sqlite \
  -sql "select AsGeoJSON(ST_Transform(geometry, 4326)) geom, * from shp" \
  out.csv ./input/
See also https://medium.com/@mentin/loading-large-spatial-features-to-bigquery-geography-2f6ceb6796df

Snowflake COPY INTO from JSON - ON_ERROR = CONTINUE - Weird Issue

I am trying to load a JSON file from the staging area (S3) into a stage table using the COPY INTO command.
Table:
create or replace TABLE stage_tableA (
RAW_JSON VARIANT NOT NULL
);
Copy Command:
copy into stage_tableA from @stgS3/filename_45.gz file_format = (format_name = 'file_json')
I got the error below when executing the above (sample provided):
SQL Error [100069] [22P02]: Error parsing JSON: document is too large, max size 16777216 bytes If you would like to continue loading
when an error is encountered, use other values such as 'SKIP_FILE' or
'CONTINUE' for the ON_ERROR option. For more information on loading
options, please run 'info loading_data' in a SQL client.
When I put ON_ERROR=CONTINUE, records were partially loaded, i.e. only up to the record that exceeds the max size; no records after the error record were loaded.
Was ON_ERROR=CONTINUE supposed to skip only the oversized record and load the records before and after it?
Yes, ON_ERROR=CONTINUE skips the offending record and continues to load the rest of the file.
To help us provide more insight, can you answer the following:
How many records are in your file?
How many got loaded?
At what line was the error first encountered?
You can find this information using the COPY_HISTORY() table function, for example:
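A hedged sketch of such a query (the table name is taken from the question; the time window is arbitrary):

-- Show per-file load results for the last 24 hours, including rows loaded,
-- rows parsed, the first error message, and the line it occurred on.
select *
from table(information_schema.copy_history(
    table_name => 'STAGE_TABLEA',
    start_time => dateadd(hours, -24, current_timestamp())));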
Try setting the option strip_outer_array = true on the file format and attempt the load again.
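A hedged sketch, reusing the stage and file format names from the question (adjust for your environment):

-- Recreate the file format so each element of the outer JSON array becomes
-- its own row, keeping every individual VARIANT value under the 16 MB limit.
create or replace file format file_json
  type = 'json'
  strip_outer_array = true;

copy into stage_tableA
  from @stgS3/filename_45.gz
  file_format = (format_name = 'file_json')
  on_error = 'CONTINUE';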
The considerations for loading large semi-structured data are documented in the article below:
https://docs.snowflake.com/en/user-guide/semistructured-considerations.html
I partially agree with Chris. The ON_ERROR=CONTINUE option only helps if there is in fact more than one JSON object in the file. If it's one massive object, then with ON_ERROR=CONTINUE you simply get neither an error nor the record loaded.
If you know your JSON payload is smaller than 16 MB, then definitely try strip_outer_array = true. Also, if your JSON has a lot of nulls ("NULL") as values, use STRIP_NULL_VALUES = TRUE, as this will slim down your payload as well. Hope that helps.

Importing data from multi-value D3 database into SQL issues

Trying to use mv.NET by Bluefinity Tools. I made some integration packages with it for importing data from a D3 multi-value database into MS SQL 2012, but I seem to be having some trouble with the mapping.
The VOYAGES table has some commentX fields in the D3 application that are quite unwieldy, and the INSERT fails after a certain number of rows with the following message:
>Error: 0xC0047062 at INSERT, mvNET Source[354]: System.Exception: Error #8: dataReader[0] = LTPAC002 ci.BufferColumnIndex = 52, ci.ColumnName = COMMGROUP(Error #8: dataReader[0] = LTPAC002 ci.BufferColumnIndex = 52, ci.ColumnName = COMMGROUP(The value is too large to fit in the column data area of the buffer.))
at mvNETDataSource.mvNETSource.PrimeOutput(Int32 outputs, Int32[] outputIDs, PipelineBuffer[] buffers)
at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPrimeOutput(IDTSManagedComponentWrapper100 wrapper, Int32 outputs, Int32[] outputIDs, IDTSBuffer100[] buffers, IntPtr ppBufferWirePacket)
Error: 0xC0047038 at INSERT, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED.The PrimeOutput method on mvNET Source returned error code 0x80131500.The component returned a failure code when the pipeline engine called PrimeOutput().The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.There may be error messages posted before this with more information about the failure.
"The value is too large to fit in the column data area of the buffer." I tried changing the input/output types but can't seem to get it right.
In the SQL table the columns are of type ntext.
In the .dtsx job the data type for the columns are of type Unicode String [DT_WSTR] with length 4000 , I guess these are auto-detected.
The import worked for other D3 files like this; I'm not sure why it fails for these comment fields.
Running the query in the mv.NET Data Manager (on the D3 server) times out after 240 seconds, so maybe this is the underlying issue?
Any ideas how to proceed? Thank you ~
The most likely reason is that the COMMGROUP column does not have the correct data type, or that some records in the source do not fit into the output type.
To find the offending record, set the failing component's error output to "Redirect row", write the redirected rows to a .txt, .csv, or .tsv file, and then inspect that data.
The exception is being thrown from mv.NET, so I suggest you call Bluefinity support (or ask your reseller to call them) and ask them about this. You're paying for support, so you might as well use it. Those programs shouldn't be allowed to throw exceptions like that.
D3 doesn't export Unicode; that might be one issue. But if the Data Manager times out, then I suspect something is wrong with the connectivity into D3. Open a Connection Monitor from the Session Monitor and watch the connection when you make the request. I'm guessing it's either hanging or, more probably, falling into BASIC Debug.
Make sure all D3-side programs related to this are either all Flash-compiled, or all Not Flashed. Your app code will fall into Debug if it's not Flashed but MVNET.BP is.
If it's your program that's in Debug, fix it. If you're not sure which program it is, LIST-RUNTIME-ERRORS in DM.
If it's a MVNET.BP program, again work with Bluefinity. If you are using MVSP for connectivity then the Connection Monitor may be useless, you'll need to change that to an IP (Telnet) connection to see the raw data exchange.

ADLA/U-SQL Error: Vertex user code error

I just have a simple U-SQL script that extracts a CSV using Extractors.Csv(encoding:Encoding.[Unicode]) and outputs it into a Data Lake Store table. The file is small, around 600 MB, and is Unicode. The number of rows is 700K+.
These are the columns (a sketch of the script follows the list):
UserId int,
Email string,
AltEmail string,
CreatedOn DateTime,
IsDeleted bool,
UserGuid Guid,
IFulfillmentContact bool,
IsBillingContact bool,
LastUpdateDate DateTime,
IsTermsOfUse string,
UserTypeId string
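A minimal sketch of the kind of script described above (the input path comes from the error message further down; the output table name is illustrative):

// Extract the Unicode CSV with the schema listed above.
@users =
    EXTRACT UserId int,
            Email string,
            AltEmail string,
            CreatedOn DateTime,
            IsDeleted bool,
            UserGuid Guid,
            IFulfillmentContact bool,
            IsBillingContact bool,
            LastUpdateDate DateTime,
            IsTermsOfUse string,
            UserTypeId string
    FROM "/Data/User.csv"
    USING Extractors.Csv(encoding: Encoding.[Unicode]);

// Write into a managed Data Lake Store table (assumes the table already exists
// with a matching schema).
INSERT INTO MyDb.dbo.Users
SELECT * FROM @users;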
When I submit this job locally, it works great without any issues. Once I submit it to ADLA, I get the following error:
Vertex failure triggered quick job abort. Vertex failed: SV1_Extract_Partition[0][0] with error: Vertex user code error.
Vertex failed with a fail-fast error
Vertex SV1_Extract_Partition[0][0].v1 {BA7B2378-597C-4679-AD69-07413A143E47} failed
Error:
Vertex user code error
exitcode=CsExitCode_StillActive Errorsnippet=An error occurred while processing adl://lakestore.azuredatalakestore.net/Data/User.csv
Any help is appreciated!
Since the file is larger than 250 MB, you need to make sure that you upload it as a row-oriented file and not as a binary file.
Also, please check the reply to the following question to see how you can currently find more details on the error: Debugging U-SQL jobs

Parsing structured syslog with syslog-ng

I am trying to leverage the parsing of structured data feature in syslog-ng. From my firewall, I am forwarding the following message:
<14>1 2012-10-06T11:03:56.493 SRX100 RT_FLOW - RT_FLOW_SESSION_CLOSE [junos@2636.1.1.1.2.36 reason="TCP FIN" source-address="192.168.199.207" source-port="59292" destination-address="184.73.190.157" destination-port="80" service-name="junos-http" nat-source-address="50.193.12.149" nat-source-port="19230" nat-destination-address="184.73.190.157" nat-destination-port="80" src-nat-rule-name="source-nat-rule" dst-nat-rule-name="None" protocol-id="6" policy-name="trust-to-untrust" source-zone-name="trust" destination-zone-name="untrust" session-id-32="9375" packets-from-client="9" bytes-from-client="4342" packets-from-server="7" bytes-from-server="1507" elapsed-time="1" application="UNKNOWN" nested-application="UNKNOWN" username="N/A" roles="N/A" packet-incoming-interface="vlan.0"]
Based on the IETF syslog format, the message appears to be correct, but for some reason the structured data is being parsed as the message portion of the log rather than as structured data.
On the syslog-ng side, you need to use either a syslog() source, or a tcp() source with flags(syslog-proto) set; the structured data will then end up in macros like ${.SDATA.junos@2636.1.1.1.2.36.reason} and so on, which you can use as you see fit.
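For example, a minimal syslog-ng configuration sketch along those lines (the port, file path, and template contents are illustrative assumptions):

# RFC5424-aware source so the SDATA block is parsed rather than left in the message.
source s_net {
    syslog(ip(0.0.0.0) port(601) transport("tcp"));
};

# Write selected SDATA fields to a file; any other destination works the same way.
destination d_flows {
    file("/var/log/srx-flows.log"
        template("${HOST} ${.SDATA.junos@2636.1.1.1.2.36.reason} ${.SDATA.junos@2636.1.1.1.2.36.source-address}\n"));
};

log { source(s_net); destination(d_flows); };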