I'm writing code to load data from the Google Ads API into a BigQuery table using Cloud Functions. The process queries a resource called ad_group_ad, but I'm struggling to validate whether there are duplicated rows at my destination.
Reading the docs, I was expecting to find some attribute that identifies a column, or group of columns, representing the table key. This question may seem obvious, but I'm not making any progress trying to google it.
Is there a way to identify whether there are duplicated rows? I'm not using any GROUP BY instruction when collecting, just a simple select like the example below:
SELECT
segments.ad_network_type,
campaign.name,
ad_group.name,
ad_group.id,
...
FROM ad_group_ad
WHERE segments.date = ?
The combination of ad ID and ad group ID is guaranteed to be unique.
However, as soon as you include segments in your SELECT clause, you'll get multiple rows with the same IDs. So the combination of ad_group.id and ad.id, plus whatever segment fields you select, should be a candidate key.
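If you just want to check the destination table after a load, one option is to group by that candidate key in BigQuery and look for counts greater than one. A minimal sketch, assuming the destination table is my_project.my_dataset.ad_group_ad and that the selected fields were flattened into underscore-separated column names (both are assumptions):
-- Any row_count > 1 means the candidate key is duplicated in the destination.
-- Table and column names below are assumptions based on the SELECT list in the question.
SELECT
  ad_group_id,
  ad_id,
  segments_date,
  segments_ad_network_type,
  COUNT(*) AS row_count
FROM `my_project.my_dataset.ad_group_ad`
GROUP BY
  ad_group_id,
  ad_id,
  segments_date,
  segments_ad_network_type
HAVING COUNT(*) > 1;
If that returns no rows, the load did not duplicate any (ID + segments) combination for the date you queried.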
I want to identify the bad/invalid records so that I can add them to a separate SQL table. For example, we have an Account object, and I want to find bad accounts, but I need to apply some filters on the Contact object. If the conditions on the contacts are satisfied, I want to insert those invalid account records into the SQL table.
I don't want to query Contact directly. I want to query Account, but the conditions should come from Contact.
Does anyone know the best way to perform this loop in Pentaho? Check each contact record; if all of an account's contacts satisfy the condition, add the Account ID to the table. If even one contact record doesn't satisfy the condition, that account should not be added to the SQL table.
For example:
Account "A" has 10 contacts.
If the email field is empty on all 10 contacts, add the account to the SQL table (as bad data).
If two of the contact records have the email field populated but the other 8 are blank, the Account ID shouldn't be added to the SQL table.
How can we best implement this scenario in Pentaho? Any help matters.
Thanks
So you can create a transformation similar to this:
You have a query with each account's contacts.
Order the query data by account.
Group the information by account and calculate the maximum ContactMail (so if all of an account's contact mails are null, the max will be null; the result of that step is shown in the Preview data part of my screenshot).
Filter rows by MaxContactMail IS NOT NULL.
These could be the basic steps; you'll need to add more steps or perform more than one transformation depending on the complexity of your data.
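If it helps to see the grouping logic outside Pentaho, the same check expressed in SQL, with assumed table and column names (contacts, account_id, email), would be roughly:
-- Accounts where every contact email is NULL or blank are the "bad" ones.
-- Table and column names are assumptions for illustration only.
SELECT account_id
FROM contacts
GROUP BY account_id
HAVING MAX(NULLIF(TRIM(email), '')) IS NULL;
An account survives the HAVING clause only when no contact has a non-blank email, which mirrors the MaxContactMail filter in the transformation.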
I'm using Tableau to show some schools data.
My data structure gives a table that has all the school classes in the country. The thing is, I need to count, for example, how many schools have both Primary and Preschool.
A simplified version of my table would look like this:
In that table, if I want to know the number needed in the example, the result should be 1, because both Primary and Preschool exist in only one school.
I want to have a multiple filter in Tableau that gives me that information.
I was thinking about the SQL query that would be needed, and it requires a GROUP BY statement. An example of the query is here in a fiddle: Database example query
In the SQL query I group by id all the schools that meet either one of the conditions inside the IN (...) and then count how many of them meet both (c = 2), roughly like the sketch below.
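A sketch of that kind of query, with placeholder table and column names (school_classes, school_id, class_type), since the fiddle itself isn't reproduced here:
-- Count schools that offer BOTH of the selected class types.
-- Table and column names are placeholders; the real ones are in the fiddle.
SELECT COUNT(*) AS schools_with_both
FROM (
  SELECT school_id
  FROM school_classes
  WHERE class_type IN ('Primary', 'Preschool')
  GROUP BY school_id
  HAVING COUNT(DISTINCT class_type) = 2
) AS matching_schools;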
Is there a way to do something like this in Tableau? Either using groups or sets, using advanced filters, or programming a RAW SQL calculated field?
Thanks!
Dubafek
PS: I'm adding a link to my question in Tableau's forum because you can download my testing workbook there: Tableau's forum question
I've solved the issue using LODs (specifically INCLUDE and EXCLUDE statements).
I created two calculated fields with the aggregations I needed:
Then I made a calculated field that keeps only the School IDs whose number of types (according to the filtering) matches the number of types selected in the multiple filter (both of the fields shown above):
Finally, I used COUNTD([Condition]) to display the amounts of schools matching with at least the School types selected.
Hope this helps someone with a similar issue.
PS: If someone wants the workbook with the solution, I've uploaded it in an answer on the Tableau Forum
So, basically, I have two tables called "dadoscatalogo" and "palavras_chave", with a common field, "patrimonio" which is the primary key of "dadoscatalogo".
I'm using a servlet to connect to the database with these tables, and passing a query to search for entries based on some search criteria that's defined by the user.
Now, since the user can search for entries based on information present in both tables, I need to do an INNER JOIN, and then use WHERE to search for that info. I'm also using LIKE, because the user may pass just part of the information, and not all of it.
So, to test it all out, I tried passing it a few parameters to see how it went. After some debugging, I found out that there was some mistake in the query, but I can't seem to pinpoint exactly what it is.
Here's the test query:
SELECT dadoscatalogo.patrimonio
FROM dadoscatalogo
INNER JOIN palavras_chave
ON dadoscatalogo.patrimonio=palavras_chave.patrimonio
WHERE dadoscatalogo.patrimonio LIKE '%'
AND dadoscatalogo.titulo LIKE '%tons%'
OR palavras_chave.palchave LIKE '%programming%';
So, basically, what I'm trying to do with this query is get all the primary keys from "dadoscatalogo" that are linked to a record with a "titulo" containing "tons" or a "palchave" containing "programming".
PS. Sorry for the names not being in English, hopefully it won't be too much of a distraction.
EDIT: Right now, the tables don't have much:
This is the dadoscatalogo table:
http://gyazo.com/fdc848da7496cea4ea2bcb6fbe81cb25
And this is the palavras_chave table:
http://gyazo.com/6bb82f844caebe819f380e515b1f504e
When they are joined, I'm expecting the result to have 4 records, and the query should get the one with patrimonio=2 in dadoscatalogo (which has "tons" in titulo) and the one with palchave=programming (which has patrimonio=1).
As per my understanding, run the query below; the dadoscatalogo.patrimonio LIKE '%' condition is redundant (patrimonio is the join key, so it is never NULL) and can simply be dropped:
SELECT dadoscatalogo.patrimonio
FROM dadoscatalogo
INNER JOIN palavras_chave
ON dadoscatalogo.patrimonio=palavras_chave.patrimonio
WHERE dadoscatalogo.titulo LIKE '%tons%'
OR palavras_chave.palchave LIKE '%programming%';
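One thing to keep in mind for later: AND binds more tightly than OR, so if you add further filters to this WHERE clause, group the OR conditions explicitly. A sketch of what that would look like (the data_registo column is hypothetical, just to illustrate the grouping):
-- Parentheses keep the OR pair together once another condition is ANDed on.
-- data_registo is a made-up column name used only for this example.
SELECT dadoscatalogo.patrimonio
FROM dadoscatalogo
INNER JOIN palavras_chave
ON dadoscatalogo.patrimonio = palavras_chave.patrimonio
WHERE (dadoscatalogo.titulo LIKE '%tons%'
       OR palavras_chave.palchave LIKE '%programming%')
  AND dadoscatalogo.data_registo >= '2015-01-01';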
I'm trying to find the best way to query both news feed and wall using a single request.
First attempt:
Query me/home and me/feed in batch request.
Problem: querying me/home gives me bad results due to Graph API bugs (showing blocked items and, conversely, not showing some items that should be shown), so I decided to change to FQL, which seems to handle it much better.
Second attempt:
Use single batch request to query:
(1) me/feed directly.
(2) fql.query for stream table with filter_key set to 'others'.
Problem: it also needs to query for user names, because the stream table contains only IDs.
Third attempt:
Use batch request to query:
(1) me/feed directly
(2) fql.multiquery for stream table with filter_key set to 'others' and the names table with "WHERE id IN (SELECT actor_id FROM #stream)".
Problem: it fails, returning "Error: batch parameter must be a JSON array" although it is a JSON array.
Fourth Attempt:
Use fql.multiquery to get news feed stream, wall stream and names.
Problem: I have no idea how to get a view similar to me/feed using FQL. The best I could get is a list of all my own posts but it doesn't show photos the user is tagged in (so I guess more things are missing).
Appreciate any hints.
Because FQL doesn't do SQL-style joins, getting information from multiple tables in one query is currently impossible.
Use FQL on the stream table to get the list of posts you want to display; be sure to grab the source_id. The source_id can be a user ID, page ID, event ID or group ID, and there may be more object types as well. (You may also want to cache the actor_id, target_id and viewer_id in the same way.)
Cache the source_ids in a dictionary-style data cache with source_id as the key.
Loop through the cache for the IDs you don't have information on yet.
Try grabbing the information from the user table based on the ID, then the page table, then the event table, then the group table, until you find what that ID belongs to. Store the information in your cache.
For display, merge the stream table items with the source_id information.
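A rough sketch of the queries involved, in FQL's SQL-like syntax; the column choices are assumptions based on the standard stream/user/page tables, and the same name-lookup pattern would be repeated for the event and group tables:
-- Step 1: pull the posts and keep each source_id for later resolution.
SELECT post_id, source_id, actor_id, message, created_time
FROM stream
WHERE filter_key = 'others'
LIMIT 50
-- Step 2: for each source_id not yet in the cache, try the candidate tables
-- in turn (12345 is a placeholder ID, not a real one).
SELECT uid, name FROM user WHERE uid = 12345
SELECT page_id, name FROM page WHERE page_id = 12345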
I'm developing a website with a custom search function and I want to collect statistics on what the users search for.
It is not a full text search of the website content, but rather a search for companies with search modes like:
by company name
by area code
by provided services
...
How to design the database for storing statistics about the searches?
What information is most relevant, and how should I query for it?
Well, it depends on how the different search modes work, but generally I would say that a table with 3 columns would work:
SearchType | SearchValue | Count
Whenever someone does a search, say they search for "Company Name: Initech", first query to see if there are any rows in the table with SearchType = "Company Name" (or whatever enum/id value you've given this search type) and SearchValue = "Initech". If there is already a row for this, UPDATE the row by incrementing the Count column. If there is not already a row for this search, insert a new one with a Count of 1.
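A minimal sketch of that table and the increment-or-insert step, assuming a SQL Server-style dialect and a table named SearchStats (both of those are assumptions, not requirements):
-- Hypothetical statistics table; one row per (type, value) pair.
CREATE TABLE SearchStats (
    SearchType  varchar(50)  NOT NULL,
    SearchValue varchar(200) NOT NULL,
    [Count]     int          NOT NULL,
    CONSTRAINT PK_SearchStats PRIMARY KEY (SearchType, SearchValue)
);
-- Increment-or-insert for one search, e.g. "Company Name: Initech".
UPDATE SearchStats
SET [Count] = [Count] + 1
WHERE SearchType = 'Company Name' AND SearchValue = 'Initech';
IF @@ROWCOUNT = 0
    INSERT INTO SearchStats (SearchType, SearchValue, [Count])
    VALUES ('Company Name', 'Initech', 1);
In practice you'd wrap those two statements in a transaction so two identical searches arriving at the same time don't insert the row twice.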
By doing this, you'll have a fair amount of flexibility for querying it later. You can figure out what the most popular searches for each type are:
... WHERE SearchType = 'Some Search Type' ORDER BY Count DESC
You can figure out the most popular search types:
... GROUP BY SearchType ORDER BY SUM(Count) DESC
Etc.
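Written out in full against the hypothetical SearchStats table sketched above, those two reports could look like this:
-- Most popular values for one search type (SearchStats is the assumed table from the sketch).
SELECT TOP 10 SearchValue, [Count]
FROM SearchStats
WHERE SearchType = 'Company Name'
ORDER BY [Count] DESC;
-- Most popular search types overall.
SELECT SearchType, SUM([Count]) AS TotalSearches
FROM SearchStats
GROUP BY SearchType
ORDER BY TotalSearches DESC;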
This is a pretty general question but here's what I would do:
Option 1
If you want to strictly separate all three search types, then create a table for each. For company name, you could simply store the CompanyID (assuming your website maintains a list of companies) and a search count. For area code, store the area code and a search count, inserting the area code if it doesn't exist yet. Provided services depends the most on your setup; the most general way would be to store keywords and a search count, again inserting a keyword if it isn't already there.
Optionally, you could store search-date information as well. As an example, you'd have one table with a Provided Services keyword and a unique ID, and another table with an FK to that ID and a SearchDate, as in the sketch below. That way you can make sense of the data over time while minimizing storage.
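A sketch of that two-table layout for the provided-services mode (all of the names here are placeholders):
-- One row per distinct search keyword.
CREATE TABLE ServiceKeyword (
    KeywordID int IDENTITY(1,1) PRIMARY KEY,
    Keyword   varchar(200) NOT NULL UNIQUE
);
-- One row per individual search, so the data can be analyzed over time.
CREATE TABLE ServiceKeywordSearch (
    SearchID   int IDENTITY(1,1) PRIMARY KEY,
    KeywordID  int NOT NULL REFERENCES ServiceKeyword (KeywordID),
    SearchDate datetime NOT NULL
);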
Option 2
Treat all searches the same: one table with a Keyword column and a Count column, incorporating SearchDate if needed.
You may want to check this:
http://www.microsoft.com/sqlserver/2005/en/us/express-starter-schemas.aspx