How to use loop to find related object using Pentaho Data Integration - migration

I want to identify the bad/invalid records so that i can add in a separate SQL Table. For example, we have an account object. And i want to find bad accounts. But i need to apply some filters on contact object. If conditions satisfy based on contact then i want to inserts those invalid account records in SQL Table.
I don't want to directly query from contact. I want to query using account but conditions should be used from contact.
Do anyone knows what is the best way to perform loop in Pentaho? Check each record for contact , if all contact's condition satisfy then add Account id in table. If one of the contact record doesn't satisfy condition. The relevant account should not be added in SQL Table
For Example:
On Account "A" we have 10 contacts
if the email field is empty on all 10 contacts then add Account in SQL table(As bad data)
if on two of contact rcords has email field populated but 8 of them are blank then Account id shouldn't be added in SQL table
How we can better implement this scenario using Pentaho? Any help matters
Thanks

So you can create a transformation similar to this:
You have a query with the different account contacts
Order the query data by account
Group the information by accounts and calculate the maximum ContactMail (so if all mails in contacts are null, the max will be a null, is the result of that step is shown in the Preview data part of my screenshot)
Filter rows by MaxContactMail IS NOT NULL
These could be the basic steps, you'll need to add more steps or perform more than one transformation depending on the complexity of your data.

Related

Does the Google Ads ad_group_ad has an unique atribute?

I'm writing a code to load data from Google Ads api to BigQuery table by using Cloud Functions, the process query a table called ad_group_ad but i'm struggling when trying to validate if there's duplicated rows at my destination.
By reading the docs I was expecting to find some attribute used to identifier a column or a group of columns that represents the table key. May this question seem obviously but i ain't having progress when trying to google this.
Is there a way to identifies if there's is duplicated rows? I'm not using any group by instruction when collecting, just a simple select like the example below:
SELECT
segments.ad_network_type,
campaign.name,
ad_group.name,
ad_group.id,
so
on,
and,
so,
forth
FROM ad_group_ad
WHERE segments.date = ?
The combination of ad ID and ad group ID is guaranteed to be unique.
However, as soon as you include segments in your select clause, you'll get multiple rows with the same IDs. So the combination of ad_group.id, ad.id plus whatever segment fields you need should be a candidate key.

Tableau count values after a GROUP BY in SQL

I'm using Tableau to show some schools data.
My data structure gives a table that has all de school classes in the country. The thing is I need to count, for example, how many schools has Primary and Preschool (both).
A simplified version of my table should look like this:
In that table, if I want to know the number needed in the example, the result should be 1, because in only one school exists both Primary and Preschool.
I want to have a multiple filter in Tableau that gives me that information.
I was thinking in the SQL query that should be made and it needs a GROUP BY statement. An example of the consult is here in a fiddle: Database example query
In the SQL query I group by id all the schools that meet either one of the conditions inside de IN(...) and then count how many of them meet both (c=2).
Is there a way to do something like this in Tableau? Either using groups or sets, using advanced filters or programming a RAW SQL calculated fiel?
Thanks!
Dubafek
PS: I add a link to my question in Tableu's forum because you can download my testing workbook there: Tableu's forum question
I've solved the issue using LODs (specifically INCLUDE and EXCLUDE statements).
I created two calculated fields having the aggregation I needed:
Then I made a calculated field that leaves only the School IDs that matches the number of types they have (according with the filtering) with the number of types selected in the multiple filter (both of the fields shown above):
Finally, I used COUNTD([Condition]) to display the amounts of schools matching with at least the School types selected.
Hope this helps someone with similar issue.
PS: If someone wants the Workbook with the solution I've uploaded it in an answer in the Tableau Forum

Querying via a form with value from textbox as criteria

The situation: metadata about biological specimens are collected in an Access table. The specimens come from human patients and patient data are collected in a separate table. To limit the amount of private health information we have hanging around, the patient database must be updated with new patients only when we actually receive samples from them.
So that the data entry workers know when they need to update the patient table, I want a button in the specimen data entry form that will pass an entered patient id value as criteria to a query.
The query looks like this right now:
SELECT Patients.[Patient id]
FROM Patients
WHERE (((Patients.[Patient id])=[Forms]![Specimen entry]![patient id]));
but it never has results, even when I run it from records that I know correspond to patients in the patient table. How do I fix this?
Suggestions about what to call this situation so that I can make better searches about it would also be appreciated. I'm an Access novice.
The query looks correct, but make sure the WHERE clause is comparing numbers to numbers or strings to strings (not a number to a string). Also confirm that the form and textbox names are correct. A quick test using your query worked for me.
Depending on how you plan to present the information, you can also dynamically create the query in VBA and then pass the information to the form.
For searching, I'd recommend some combination of access, dynamic, query, and vba.
alternative option
If you're only looking to see if a single patient exists in the table, it may be simpler to use the dlookup function:
If IsNull(DLookup("[Patient ID]", "Patients", "[Patient ID]='" & Me.Patient_ID & "'")) Then MsgBox "does not exist"
This will check to see if the patient exists (return a number) or does not exist (returns NULL).
https://support.office.com/en-us/article/DLookup-Function-8896cb03-e31f-45d1-86db-bed10dca5937

SQL Server 2012 Managment Studio Basic SELECT queries

I am new to SQL Server. I have been assigned to do some simple queries to start off, then eventually move on to more complex queries.
I have spent a lot of time on this website: http://www.w3schools.com and I understand it, I think, but then when I go back to my company's database, I find myself searching from many, many, different tables with different information.
For example, a table would say [Acct_Name] and the query comes back with not the correct account name (s) that I need. Any advice that you think might help me? Thank you.
It sounds like you are looking to limit your results to specific accounts. There are many ways to go about this, so no one will be able to give you a all encompassing answer but if you are looking to just pull a single account
SELECT * FROM (your table name) WHERE Acct_Name = 'the account name'
The * means you are selecting all columns in the table and your WHERE clause is where you set your search conditionals, like account name or by account ID. If you had a account creation date, you could get all accounts created on or before a date like this
SELECT * FROM (your table name) WHERE Created < '2016-06-01 00:00:00'
Replace the column name 'Created' with the column that holds the date field of account creation
Learning the WHERE clause and what you can do there to limit your results will get you on a solid footing to start, from there you will want to learn JOINs and how to link tables by primary keys.
Code academy has some great tutorials https://www.codecademy.com/learn/learn-sql

Return rows that have every kind of instances that appear in a table for a given attribute

E.g, given the schema:
account(id, type, cname)
I want to return the cnames of customers with every type of account that appears in account.
The following query will do:
SELECT cname, type
FROM account AS cust_account
WHERE NOT EXISTS (
SELECT type
FROM account
EXCEPT
SELECT type
FROM account
WHERE account.cname = cust_account.cname
);
However, I'm having trouble understanding the logic behind it. In particular, I don't see what's going on with the EXCEPT statement. Can anybody clarify this? Thank you.
As the WHERE NOT EXIST clause processes each record, it runs the following steps for the customer in that record. If it sees an empty list from step #3 it will include the customer in the query results (NOT EXISTS) .
Make a list of all the possible account types.
Make list of all accounts types that THIS customer has.
Subtract the list in step #2 from the list in step #1. This will create a list of any account types that THIS customer does NOT have. This will be empty for customers that have all accounts.
To be clear, since the WHERE NOT EXIST clause contains a correlated-sub query, steps 1-3 (which are performed by the sub query) are run one time for EACH row in the account table. For example, if the accounts table contains 100,000, the sub query will be run 100,000 times.