I have a big file (say 100K records) with a header and a trailer. The trailer contains the number of records in the file. Is there any way for WSO2 ESB to load this entire file for processing (say, reading each row, performing some validation, and then sending the validated data to an external endpoint) and to validate that the number of records matches the count in the trailer?
I'll get a list of coupons by mail. That needs to be stored somewhere (BigQuery?) from which I can fetch a coupon and send it to the user. Each user should only be able to get one unique code that was not used beforehand.
I need the ability to get a code and record that it was used, so the next request gets the next code...
I know this is a fairly vague question, but I'm not sure how to implement it. Does anyone have any ideas?
Thanks in advance.
There can be multiple solutions for the same requirement; one of them is given below:
Step 1. Get the coupons as a file (CSV, JSON, etc.) as per your preference/requirement.
Step 2. Load the source file to GCS (Cloud Storage).
Step 3. Write a Dataflow job that reads the file from GCS and loads the data into a BigQuery table (tentative name: New_data). Sample code. (A sketch of steps 3 and 4 follows this list.)
Step 4. Create a Dataflow job that reads data from the BigQuery table New_data, compares it with History_data to identify new coupons, and writes the results to a file on GCS or to a BigQuery table. Sample code.
Step 5. Schedule the entire process with an orchestrator, Cloud Scheduler, or a cron job.
Step 6. Once you have the data, you can send it to consumers through any communication channel.
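A minimal sketch of steps 3 and 4 using the Apache Beam Python SDK is below. All project, bucket, dataset, and table names (my-project, my-bucket, coupons.New_data, coupons.History_data) are placeholders, and the file is assumed to be a one-column CSV of coupon codes:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Placeholder project / bucket / table names -- adjust to your environment.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )

    # Step 3: load the coupon file from GCS into the New_data table.
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadCouponFile" >> beam.io.ReadFromText(
                "gs://my-bucket/coupons.csv", skip_header_lines=1)
            | "ToRow" >> beam.Map(lambda line: {"coupon_code": line.strip()})
            | "LoadNewData" >> beam.io.WriteToBigQuery(
                "my-project:coupons.New_data",
                schema="coupon_code:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)
        )

    # Step 4: anti-join New_data against History_data so only unseen coupons remain.
    with beam.Pipeline(options=options) as p:
        (
            p
            | "FindNewCoupons" >> beam.io.ReadFromBigQuery(
                query="""
                    SELECT n.coupon_code
                    FROM `my-project.coupons.New_data` n
                    LEFT JOIN `my-project.coupons.History_data` h USING (coupon_code)
                    WHERE h.coupon_code IS NULL
                """,
                use_standard_sql=True)
            | "ExtractCode" >> beam.Map(lambda row: row["coupon_code"])
            | "WriteNewCoupons" >> beam.io.WriteToText("gs://my-bucket/new_coupons")
        )

Whether the comparison lives in SQL (as above) or in a Beam-side join is a design choice; the SQL anti-join keeps the pipeline small.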
I have some sample data that I've been loading into Google BigQuery. I have been importing the data in ndjson format. If I load the data all in one file, the rows show up in a different order in the table's preview tab than when I import them sequentially, one ndjson line at a time.
When importing sequentially I wait till I see the following output:
Waiting on bqjob_XXXX ... (2s) Current status: RUNNING
Waiting on bqjob_XXXX ... (2s) Current status: DONE
The order in which the rows show up seems to match the order I append them, since the job importing each one seems to finish before I move on to the next. But when loading them all from one file, they show up in a different order than they exist in my data file.
So why do the data entries show up in a different order when loading in bulk? How are the data entries queued to be loaded, and how are they indexed in the table?
BigQuery has no notion of indexes. Data in BigQuery tables has no particular order that you can rely on. If you need to get ordered data out of BigQuery, you will need to use an explicit ORDER BY in your query - which, by the way, is not recommended for large results, as it increases resource cost and can end with a Resources Exceeded error.
BigQuery's internal storage can "shuffle" your data rows internally for the best / most optimal query performance. So again - there is no such thing as a physical order of data in BigQuery tables.
The official language in the docs is: line ordering is not guaranteed for compressed or uncompressed files.
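If you do need deterministic ordering, sort at query time on a column you control. A minimal sketch with the BigQuery Python client; the project, dataset, table, and created_at column names are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT *
        FROM `my-project.my_dataset.my_table`
        ORDER BY created_at
        LIMIT 1000   -- keep ordered result sets small to avoid Resources Exceeded errors
    """
    for row in client.query(query).result():
        print(dict(row))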
I have a file of more than 5 GB on Google Cloud Storage, and I am not able to load that file into a BigQuery table.
The errors thrown are:
1) too many errors
2) Too many values in row starting at position: 2213820542
I searched and found it could be because the maximum file size was reached, so my question is: how can I upload a file larger than the quota policy allows? Please help me. I have a billing account on BigQuery.
5 GB is OK. The error says that the row starting at position 2213820542 has more columns than specified, e.g. you provided a schema of n columns, and that row has more than n columns after splitting.
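If only a handful of rows are malformed, one option is to let the load job skip them via max_bad_records instead of failing the whole load. A minimal sketch with the BigQuery Python client; the bucket, table, and threshold are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        max_bad_records=10,  # tolerate a few malformed rows instead of failing the whole load
    )
    load_job = client.load_table_from_uri(
        "gs://my-bucket/big_file.csv",       # hypothetical source file
        "my-project.my_dataset.my_table",    # hypothetical destination table
        job_config=job_config,
    )
    load_job.result()       # blocks until the job finishes; raises if it still fails
    print(load_job.errors)  # may list details of any rows that were skipped

If the extra columns are systematic rather than occasional, fix the schema or the file itself; skipping rows only hides the problem.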
I am stuck on how to identify the different connections (flows) in a trace file.
The following is the format in which the trace file is created:
event
time
Source node
Destination node
Packet type
Packet size
flags
fid
Source address
Dest. address
Seq. number
Packet id
If you take a look at the frame format of a trace file, the 8th column is the flow id, which you can extract using an awk file.
Read more about awk and how you can isolate or count sent and received packets along with the flow id. Once you have those counts, just divide the two.
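The answer suggests awk; the same logic in a short Python sketch is below. It assumes the trace fields are space-separated, column 1 is the event (s = sent, r = received), and column 8 is the flow id, with out.tr as a hypothetical trace file name:

    from collections import defaultdict

    sent = defaultdict(int)
    received = defaultdict(int)

    with open("out.tr") as trace:  # hypothetical trace file name
        for line in trace:
            fields = line.split()
            if len(fields) < 8:
                continue
            event, fid = fields[0], fields[7]
            if event == "s":
                sent[fid] += 1
            elif event == "r":
                received[fid] += 1

    for fid in sorted(sent):
        ratio = received[fid] / sent[fid] if sent[fid] else 0.0
        print(f"flow {fid}: sent={sent[fid]} received={received[fid]} "
              f"delivery_ratio={ratio:.3f}")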
I have created a web test which is a series of web service requests. My data source contains a list of mobile numbers, and these mobile numbers can be of two types: A and B. The problem is that the data source contains a mix of A and B. When the test runs, it loads one mobile number from the data source (an XML file). While the test is running, I want to determine the type of the mobile number (A or B), because depending on that I will send the appropriate message to the web server.
It is, however, possible for me to create a text file containing key-value pairs (mobile number, type) before running the tests. However, adding a plugin that reads the whole file and then looks up the mobile number type would be too slow. Is it possible to keep these mappings in memory for the entire duration of the test, so that I can just query them?
Thanks
Amare
Instead of using the XML file as the data source, use your new text file as the data source.
For example, if your data source is DataSource1, your file is numbers.csv, and you have columns mobile number and type, then in your test you can refer to the following context parameters:
DataSource1.numbers#csv.mobile#number
DataSource1.numbers#csv.type
Use a pair of String Comparison Conditional Rules to decide which request to execute depending on the value of DataSource1.numbers#csv.type.
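For reference, a numbers.csv laid out like this would expose those two context parameters (the values below are made up, and the column headers match the names used above):

    mobile number,type
    447700900123,A
    447700900456,B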