Google BigQuery Data Extraction - google-bigquery

I'm trying to extract data from BigQuery, but I'm not able to extract all of the data from the beginning, since it is aggregated per day into separate daily tables.
I tried to delete the final part of the source table reference "FROM independent-tea-354108.152182944.**ga_sessions_20230126**", but it returns an error message.
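For reference, BigQuery's wildcard-table syntax is the usual way to query every daily ga_sessions_ table in one statement; simply removing the date suffix fails because no table with that shorter name exists. A minimal sketch (the date range below is only an example and can be omitted to scan all daily tables):

SELECT *
FROM `independent-tea-354108.152182944.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20230126'  -- optional example range; omit to read every daily table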

Related

BigQuery - JSON_QUERY - Need to find a common path for multiple rows

I am new to BigQuery and Stack Overflow as well. I am dealing with multiple rows of JSON strings in a given column; they are all similar, but they have different IDs. I am trying to extract DAYS from them.
Currently I am using the following query to extract DAYS.
SELECT JSON_QUERY(Column,'$.data.extract.b2ed07ab-8f70-47e1-9550-270a23ec5e37.sections[0].TIME[0].DAYS') FROM DEFAULT_TABLE_LOCATION
This gives me what I want, but there are multiple rows of data where the only difference is the ID which I've mentioned above, b2ed07ab-8f70-47e1-9550-270a23ec5e37.
Is there a way for me to use something like
SELECT JSON_QUERY(Column,'$.data.extract.????????-????-????-????-????????????.sections[0].TIME[0].DAYS') FROM DEFAULT_TABLE_LOCATION
to get the same data stored in different rows?
In summary, is it possible for me to have a common JSON path to extract values stored in multiple rows, given that the IDs have the same character length, to find the DAYS?
Here's a sample of the data. I have omitted most of the irrelevant parts, as it was too big to paste here.
{"data":{"CONFDETAILS":[...Some XYZ code...},"extract":{"b2ed07ab-8f70-47e1-9550-270a23ec5e37":{.......},"entities":null,"sections":[{.......,"TIME":[{"DAYS":[false,false,false,false,false,false,true],"end":"23:59","start":"00:00"},{"DAYS":[true,false,false,false,false,false,false],"end":"23:59","start":"00:00"},{"DAYS":[false,true,false,false,false,false,false],"end":"23:59","start":"00:00"},{"DAYS":[false,false,true,false,false,false,false],"end":"23:59","start":"00:00"},{"DAYS":[false,false,false,true,false,false,false],"end":"23:59","start":"00:00"},{"DAYS":[false,false,false,false,true,false,false],"end":"23:59","start":"00:00"},{"DAYS":[false,false,false,false,false,true,false],"end":"23:59","start":"00:00"}],........}
And to give some more perspective, the data in the following rows looks like this, with just the ID being different.
{"data":{"CONFDETAILS":[...Some XYZ code...},"extract":{"e520ab02-6ec1-4fdf-b810-0d1b74fc719c":{.......},"entities":null,"sections":[{.......,"TIME":[{"DAYS":[false,false,false,false,false,false,true],"end":"23:59","start":"00:00"},{"DAYS":[true,false,false,false,false,false,false],"end":"23:59","start":"00:00"},{"DAYS":[false,true,false,false,false,false,false],"end":"23:59","start":"00:00"},{"DAYS":[false,false,true,false,false,false,false],"end":"23:59","start":"00:00"},{"DAYS":[false,false,false,true,false,false,false],"end":"23:59","start":"00:00"},{"DAYS":[false,false,false,false,true,false,false],"end":"23:59","start":"00:00"},{"DAYS":[false,false,false,false,false,true,false],"end":"23:59","start":"00:00"}],........}

Trouble uploading single column csv to bigquery with split columns

I'm trying to upload a dataset to BigQuery so that I can query the data. The dataset is currently in a CSV, with all the data for each row in one column, split by commas. I want to have the data split into columns, using the comma as a delimiter.
When trying to upload using the auto-detect schema option, 10 columns are detected, but they are called 'string_0, string_1, string_2', etc., and the rows still have all the data in the first column.
When trying to upload by manually inputting the schema, I get these errors:
CSV table encountered too many errors, giving up. Rows: 1; errors: 1.
CSV table references column position 9, but line starting at position:117 contains only 1 columns.
On both occasions I set header rows to skip = 1
Here's an image of the dataset.
Any help would be really appreciated!
I see three potential reasons for the error you're hitting:
1. A structural problem in the source CSV file: the file does not meet the RFC 4180 specification's prerequisites, e.g. it uses atypical line breaks (line delimiters);
2. A BigQuery sink table schema mismatch, i.e. a dedicated column is missing for a particular piece of input data;
3. A BigQuery schema type mismatch: a table column is declared with a type that differs from the input one.
Please also see the documentation on BigQuery schema auto-detection and on loading CSV data, which can help you solve the above-mentioned issue.
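As an illustration of controlling the delimiter handling outside the web UI, here is a minimal sketch using BigQuery's LOAD DATA statement; the dataset, table and gs:// path are placeholders, and the CSV would need to be staged in Cloud Storage first:

-- Placeholders: mydataset, mytable and the bucket path are assumptions.
-- An explicit field_delimiter ensures the commas are treated as column
-- separators rather than ending up inside a single string column.
LOAD DATA INTO mydataset.mytable
FROM FILES (
  format = 'CSV',
  uris = ['gs://my-bucket/my-file.csv'],
  skip_leading_rows = 1,
  field_delimiter = ','
);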

Google Data Studio - Max Record Limit?

I have a table in Google BigQuery with 1.4 million records and parcel number as a unique field, and I need to be able to extract the data as a CSV.
However, when I explore it in Data Studio and break it down by parcel, Data Studio puts a limit of exactly 1.1 million records. Even worse, when I export it as a .csv there are only 750k lines.
Is there a limit in data studio?
Please help!!
Yes. Currently (March 2019), there's a limit of ~1m rows when fetching data from BigQuery.
If you are trying to extract 1m+ rows as CSV, ideally, you should be doing it from the BigQuery end. See Exporting BigQuery table data. Data Studio should work as a data exploration tool on top of BigQuery.
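For example, in more recent BigQuery releases the EXPORT DATA statement gives a SQL-only route for that export; a rough sketch with placeholder project, dataset, table and bucket names (the table extract described in the linked documentation works just as well):

-- Placeholders: the project, dataset, table and bucket names are assumptions.
EXPORT DATA OPTIONS (
  uri = 'gs://my-bucket/parcels-*.csv',  -- the wildcard lets BigQuery shard large outputs
  format = 'CSV',
  overwrite = true,
  header = true
) AS
SELECT *
FROM `my-project.my_dataset.parcels`;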

SQL-Pandas: transform with Sum on past and new data

I have some user records in an xlsx file, and I am storing those records in a SQL database, applying the pandas groupby-transform-Sum() functionality when there are multiple instances of a particular combination on a monthly basis (e.g. Id-Location-Designation-Day-PolicySold).
Until now, all of the past and newly added data was maintained in the xlsx file only, but going forward only the most recent 3 months of data will be available. I need to store this new data in the SQL DB while keeping the past data that is already present there (for several months/years) intact and ensuring no duplicate entries.
Can anyone suggest an efficient approach to handle this?
My current approach is:
1. Read the past data from the SQL table before performing the new write operation.
2. Read the new data from the xlsx file.
3. Merge both.
4. Apply groupby-transform with Sum() to convert the daily data to monthly data.
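If it is easier to enforce the no-duplicates rule on the database side instead, a generic-SQL sketch of the final write step could look like the following; the table names, key columns and staging approach are all hypothetical, and the exact upsert syntax (MERGE, INSERT ... ON CONFLICT, etc.) depends on the database in use:

-- Hypothetical example: insert only Id/Location/Designation/month combinations
-- that are not already stored. All table and column names are placeholders.
INSERT INTO monthly_summary (id, location, designation, month, policy_sold)
SELECT s.id, s.location, s.designation, s.month, s.policy_sold
FROM staging_monthly AS s
WHERE NOT EXISTS (
  SELECT 1
  FROM monthly_summary AS m
  WHERE m.id = s.id
    AND m.location = s.location
    AND m.designation = s.designation
    AND m.month = s.month
);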

Copy a field from one data source to another in Tableau

I want to get rid of one data source in Tableau; that's why, instead of using 2 different data sources for one dashboard, I want to copy all relevant fields from one data source to the other. Is there any way in Tableau by which I can copy-paste this field from one data source to the other?
In the attached screenshot, I want to copy the advisor sales field in the data source biadvisorSalesmonth24 to bitransactionPartnerDay365:
You cannot make schema or structure changes to a table / datasource from within Tableau. If advisor sales is not in the bitransactionPartnerDay365 data source, then you will have to keep both data sources in the workbook and join them together.
Now, if you are familiar with the datasets and know the necessary table layout, you could write a custom SQL command and use that SQL command to retrieve the desired data as a single data source.
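To illustrate that custom SQL route, a rough sketch might look like this; the join key and column names are guesses based on the data source names in the question and would need to be adapted to the actual schema:

-- Hypothetical custom SQL for a single combined data source.
-- The join key (partner_id) and the advisor_sales column name are assumptions.
SELECT t.*, a.advisor_sales
FROM bitransactionPartnerDay365 AS t
LEFT JOIN biadvisorSalesmonth24 AS a
  ON t.partner_id = a.partner_id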