I have two projects in Google BigQuery.
select * FROM `project1.analytics_1.Table1`
select * FROM `project2.analytics_2.Table2`
When I get data from Google BigQuery, the first query works, but for the second one I get the error "The key didn't match any rows in the table."
I am looking for a way to merge three separate datasets (.csv format) into one in Azure Synapse and then store the result as a new .csv in Azure Blob Storage. I am using the Union data flow based on this tutorial: https://www.youtube.com/watch?v=vFCNbHqWct8
Generally speaking, the extraction and saving of the new file work. However, when merging the files I receive three times the number of rows in the source datasets. Each source dataset has 36 entries, with CustomerID ranging from 1-36.
Dataset 1 has 2 columns: CustomerID, loyalty_level
Dataset 2 has 3 columns: CustomerID, name, email
Dataset 3 has 2 columns: CustomerID, salestotal
When I run it, I get a dataset with 108 rows instead of the expected 36. Where is my mistake? Am I approaching the process incorrectly?
You are getting 108 rows because the union transformation stacks the 3 separate datasets into 1. The video on the union transformation documentation page describes this behavior.
To get your desired results you need to use the join transformation. Using CustomerID as your join condition will combine the datasets while keeping your row count at 36.
One thing to watch out for is the type of join you choose. If you have customers in one file that are not in another, you can drop records. This post describes the different types of joins very well; I suggest you get a firm understanding of them.
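As a rough sketch in SQL (the data flow UI expresses the same idea; dataset names here are placeholders for your three sources), joining on CustomerID keeps one row per customer:

```sql
-- Inner join keeps one row per CustomerID (36 rows),
-- assuming every customer appears in all three datasets.
SELECT d1.CustomerID,
       d2.name,
       d2.email,
       d1.loyalty_level,
       d3.salestotal
FROM dataset1 d1
JOIN dataset2 d2 ON d2.CustomerID = d1.CustomerID
JOIN dataset3 d3 ON d3.CustomerID = d1.CustomerID;
```

Note that an inner join drops customers missing from any file; a full outer join would keep them, with NULLs in the columns from the files where they are absent.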
I have created the following table, called GDN All accounts, which resulted from the following query:
SELECT * FROM `GDNA`
UNION ALL
SELECT * FROM `GDNB`
UNION ALL
SELECT * FROM `GDNC`
UNION ALL
SELECT * FROM `GDND`
UNION ALL
SELECT * FROM `GDNE`
However, when I open the table in preview mode it does not show any values; they only appear when I re-run the query.
Moreover, my final aim is to connect this table to Power BI, but once in Power BI and connected to the data source, no values show up, only nulls.
Can someone help me with this?
Thanks
Connect to and collect data from each table separately. Once this is done, first check whether each table contains the data you expect.
If all tables contain the expected data, you can then create a new table using the Append option in Power Query. This new table will contain all of the data together, as you expect.
Remember, preview mode does not always show all rows when there is a large amount of data in the source. You will get the complete list in a table visual in the report.
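One other cause worth ruling out (an assumption, since the question doesn't say how GDN All accounts was saved): if the query result was saved as a view rather than a table, BigQuery's preview shows no rows, because a view stores only the query text. Materializing it as a table makes the rows visible in preview and to Power BI:

```sql
-- Hypothetical dataset name; replace with your own.
CREATE OR REPLACE TABLE `mydataset.GDN_All_accounts` AS
SELECT * FROM `GDNA`
UNION ALL
SELECT * FROM `GDNB`
UNION ALL
SELECT * FROM `GDNC`
UNION ALL
SELECT * FROM `GDND`
UNION ALL
SELECT * FROM `GDNE`;
```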
On Google Cloud Platform, I have multiple billing accounts. For each billing account, I created a scheduled export to BigQuery that executes multiple times a day.
However, I'd like to have an overview of all of my billing accounts. I want to create a master data table with all of my billing accounts combined.
All of the data tables have the exact same schema. Some sample fields:
cost:FLOAT
sku:STRING
service:STRING
I have already successfully combined my two data tables with a UNION ALL query:
SELECT * FROM `TABLE 1`
UNION ALL
SELECT * FROM `TABLE 2`
After I've made this query, I clicked "Save results" --> "BigQuery Table." However, I believe this is just a one-time export.
I'd like to update this on a regular basis (say, once every 3 hours) without duplicating the entries.
How do I continuously combine these data tables while making sure I don't have duplicate rows? In other words, for new entries that come into both tables, how do I just append only those new entries to my new master table?
Use a view:
create view v_t as
select * from `TABLE 1`
union all
select * from `TABLE 2`;
This will always be up-to-date, because the tables are referenced when you query them.
Note: You can create the view using the BQ query interface by running the query and selecting "create view". Actually, you don't need to run the query, but I always do just to be sure.
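If you do want a materialized master table rather than a view (say, for Power BI performance), a scheduled query with MERGE can append only new rows. This is a sketch under an assumption the question doesn't settle: that some column, here called export_row_id (hypothetical), uniquely identifies a row in the billing export.

```sql
-- Run on a schedule (e.g. every 3 hours); inserts only rows
-- whose (hypothetical) export_row_id is not yet in the master table.
MERGE `mydataset.master_billing` m
USING (
  SELECT * FROM `TABLE 1`
  UNION ALL
  SELECT * FROM `TABLE 2`
) s
ON m.export_row_id = s.export_row_id
WHEN NOT MATCHED THEN
  INSERT ROW;
```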
I'm planning to build a new ads system and we are considering using Google BigQuery.
I'll quickly describe my data flow:
Each user will be able to create multiple ads (1 user, N ads).
I would like to store the ad impressions, and I thought of 2 options.
1- Create a table for impressions. For example, table name: Impressions, fields: (userid, adsid, datetime, metadata fields...)
In this option all my impressions will be stored in a single table.
Main pros: I'll be able to run big-data queries quite easily.
Main cons: the table will be huge, and with multiple queries I'll end up paying too much (:
Option 2 is to create a table per ad.
For example, ad id 1 will create
Impression_1 with fields (datetime, metadata fields)
Pros: queries are cheaper, each data table is smaller.
Cons: to do big-data queries I'll sometimes have to create a union, and things will get complex.
I wonder what your thoughts are regarding this?
In BigQuery this is easy to do, because you can create a table per day, and you have the possibility to query only those tables.
And you have Table wildcard functions, which are a cost-effective way to query data from a specific set of tables. When you use a table wildcard function, BigQuery only accesses and charges you for tables that match the wildcard. Table wildcard functions are specified in the query's FROM clause.
Assuming you have some tables like:
mydata.people20140325
mydata.people20140326
mydata.people20140327
You can query like:
SELECT name
FROM (TABLE_DATE_RANGE(mydata.people,
      TIMESTAMP('2014-03-25'),
      TIMESTAMP('2014-03-27')))
WHERE age >= 35
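TABLE_DATE_RANGE belongs to legacy SQL; the standard-SQL equivalent today is a wildcard table with the _TABLE_SUFFIX pseudo-column, which likewise scans (and bills) only the matching tables:

```sql
-- Standard SQL: only tables whose suffix falls in the range are read.
SELECT name
FROM `mydata.people*`
WHERE _TABLE_SUFFIX BETWEEN '20140325' AND '20140327'
  AND age >= 35;
```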
Also there are Table Decorators:
Table decorators support relative and absolute <time> values. Relative values are indicated by a negative number, and absolute values are indicated by a positive number.
To get a snapshot of the table at one hour ago:
SELECT COUNT(*) FROM [data-sensing-lab:gartner.seattle#-3600000]
There is also TABLE_QUERY, which you can use for more complex queries.
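For example (legacy SQL again; the table prefix follows the sample tables above), TABLE_QUERY selects tables whose ids match an arbitrary expression:

```sql
-- Legacy SQL: query every table in mydata whose id contains "people".
SELECT name
FROM (TABLE_QUERY(mydata, 'table_id CONTAINS "people"'))
WHERE age >= 35;
```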
Currently I have a web application (JSF 2.0 and SQL Server) where values for tDataTable are retrieved using Query A, and then, based on the retrieved result, we run another query to retrieve the associated details.
For example:
Query A: returns all restaurants in the user-selected area (let's imagine there are 1000s).
Query B: based on the returned result (above), Query B retrieves reviews for each restaurant using the RestaurantPK value.
So the way it's done now, if there are 200 rows once Query A has executed, we call Query B 200 times to retrieve review information for each restaurant.
So my question is: how can I make this more efficient? What is the standard practice in these cases? I chose the title "Best approach to embed sub-query result" as I suspect a sub-query will need to be used in Query A, but I cannot figure out how it would work, given that for each restaurant (row) there could be 10-15 reviews.
UPDATE:
I had tried a JOIN before posting my question here, but the problem was I was only getting one review for each restaurant instead of all available reviews. I'm starting to think the only way to do this would be to write a stored procedure: once Query A is executed, I'll store the result in a #temp table, loop through the result, retrieve all the reviews for each restaurant, and insert them back into the #temp table, so that if one restaurant has 10 reviews, they will be separated by ";", and back in Java I'll split them into a meaningful format. That way I am still returning just one row for each restaurant, but with the reviews embedded as one of the columns.
so i would have:
Restaurant, Location, Phone, ..., Reviews
X,Sydney,1234,.....,AAAAAAA;BBBBBBBB;CCCCCCCCC;
Comments?
Use a join in SQL, so you can get all the data in one query:
Select rest.id, reviews.text
from restaurants rest
inner join reviews on reviews.rest_id = rest.id
-- where <your conditions for restaurants>
You haven't said how you are 'displaying' this data, so I can't recommend an absolute best approach. I would say your 2 options are:
A join that returns all reviews - the restaurant information will be duplicated for each row, so when displaying you should loop through and check whether the current row's RestaurantPK differs from the previous one, and only display the restaurant info then.
Return 2 result sets from SQL Server. Load the results of Query A into a #temp table and then do something like
SELECT * from #temp
SELECT * FROM reviews WHERE RestaurantPK IN (SELECT RestaurantPK from #temp)
although obviously you should select your columns by name rather than using *.
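If you do want the one-row-per-restaurant shape described in the question's update (reviews joined with ";"), the loop-and-concatenate stored procedure can be replaced on SQL Server 2017+ by STRING_AGG. The table and column names here follow the join example above and are assumptions:

```sql
-- One row per restaurant; reviews collapsed into a single
-- semicolon-separated column (requires SQL Server 2017+).
SELECT rest.id,
       STRING_AGG(reviews.text, ';') AS Reviews
FROM restaurants rest
INNER JOIN reviews ON reviews.rest_id = rest.id
GROUP BY rest.id;
```

Any other restaurant columns you select (Location, Phone, ...) must also appear in the GROUP BY.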