How do I combine two tables that have only a few similar columns? - sql

I'm attempting to combine two tables, both of which aren't related in any way except for a few columns (ID, Created Date, Country, etc.). In essence, I simply want to append one table to another. However, I would like to combine the columns that are similar and add on the columns that are not similar. I've attempted a Union, but my tables don't have the same number of columns. Currently, I'm working with this:
SELECT * FROM `leads`, `opportunity`
where `leads`.`Id` = `opportunity`.`Id`
which doesn't really work when I want to use this new query as a subquery elsewhere. Additionally, the fields in each table can change at any time, so I’m never sure which columns are matching or non-matching. I simply want to append the rows from one table onto the other while automatically combining columns with identical names. I feel like I'm missing something obvious...
NOTE: I am doing this within DOMO, so I have a few more limitations than I normally would.

You can use a join:
SELECT * FROM `leads` JOIN `opportunity`
on `leads`.`Id` = `opportunity`.`Id`
To get only selected columns:
SELECT leads.column_name, opportunity.column_name FROM `leads` JOIN `opportunity`
on `leads`.`Id` = `opportunity`.`Id`
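Since the question asks about appending rows rather than matching them, another option worth trying is UNION ALL with NULL placeholders for the columns that one table lacks. Here is a minimal sketch, run through Python's built-in sqlite3 so it can be tried anywhere. The columns `LeadSource` and `Amount` and the sample rows are made up for illustration, and whether DOMO accepts this exact syntax is an assumption to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas: Id and Country are shared, the third column differs.
conn.executescript("""
    CREATE TABLE leads (Id INTEGER, Country TEXT, LeadSource TEXT);
    CREATE TABLE opportunity (Id INTEGER, Country TEXT, Amount REAL);
    INSERT INTO leads VALUES (1, 'US', 'Web');
    INSERT INTO opportunity VALUES (2, 'DE', 500.0);
""")

# Shared columns line up; a column missing from one table is padded with NULL.
appended = conn.execute("""
    SELECT Id, Country, LeadSource, NULL AS Amount FROM leads
    UNION ALL
    SELECT Id, Country, NULL AS LeadSource, Amount FROM opportunity
    ORDER BY Id
""").fetchall()
print(appended)
```

Note that the column lists are written out by hand, so this does not adapt automatically when the fields in either table change.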

Related

How can I add blank columns to the results of a SQL query?

I'm working on a query that pulls demographic information for people who have visited a location. However, the fields required for this report aren't all available in the DB, and some will need to be added manually from a separate Excel file after I've exported the results of my query. The person who will be responsible for merging the two files asked if it would be possible to create blank columns so they can more easily see where the missing data needs to go, and I wasn't sure how to go about that. Obviously those blank columns could just be created in the exported spreadsheet, but I wondered if there was a way to add them in the SQL query itself.
My SELECT statement currently looks something like this—I've just added comments to mark the missing fields so I can keep track of what order all the fields for this report need to be in.
SELECT DISTINCT
PersonID,
PersonFName,
PersonLName,
PersonDOB,
VisitID,
--StaffFName,
--StaffLName,
FacilityPhone,
FacilityAddress,
...and so on
Since those two staff name fields don't exist in my DB, I obviously can't actually include them in the SELECT list. But is there a way to still include those two fields as blank columns in my query results? Something along the lines of "select [nothing] as StaffFName"?
Just add literal nulls to the select clause:
SELECT DISTINCT
PersonID,
PersonFName,
PersonLName,
PersonDOB,
VisitID,
null as StaffFName,
null as StaffLName,
FacilityPhone,
FacilityAddress,
...
Or, if you prefer, you can use empty strings instead:
...
'' as StaffFName,
'' as StaffLName,
...
But null is the canonical way to represent the absence of data.
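To see what the exported result would look like, here is a small sketch using Python's built-in sqlite3. The `visits` table and its single row are invented for the demo; only the column names come from the question. It mixes both variants so the difference between NULL and an empty string is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding a few of the columns from the question.
conn.executescript("""
    CREATE TABLE visits (PersonID INTEGER, PersonFName TEXT, VisitID INTEGER);
    INSERT INTO visits VALUES (1, 'Ada', 100);
""")

cursor = conn.execute("""
    SELECT DISTINCT
        PersonID,
        PersonFName,
        VisitID,
        NULL AS StaffFName,  -- placeholder column, to be filled in later
        '' AS StaffLName     -- empty-string variant, for comparison
    FROM visits
""")
headers = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(headers)
print(rows)
```

The placeholder columns show up in the header like any other column, so the person merging the files can see where the missing data goes.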

Bigquery - remove duplicates of certain columns, but not all

I have two tables I am left joining together. The first table has transaction-level detail, which causes the key I join on to the second table to be duplicated. When I left join the second table, the measure "company_spend" is highly inflated.
I need a way to keep only a single value of the duplicated data. My thought was to apply DISTINCT to only those columns, but BigQuery does not seem to support DISTINCT on some columns and not others.
SELECT UPPER(cwnextt.Current_Contract_Number) AS Current_Contract_Number,
UPPER(cwnextt.Replacement_Contract_Number) AS Replacement_Contract_Number,
UPPER(cwnextt.Current_Contract_Name) AS Current_Contract_Name,
UPPER(cwnextt.Supplier_Top_Parent_Entity_Code) AS Supplier_Top_Parent_Entity_Code,
UPPER(cwnextt.Supplier_Top_Parent_Name) AS Supplier_Top_Parent_Name,
UPPER(cwnextt.company_Entity_Code) AS company_Entity_Code,
UPPER(cwnextt.Facility_Name) AS Facility_Name,
smart.company_Spend AS companySpend
FROM `test_etl_field.contracts_with_member_entity_codes_test_view_2` cwnextt
--this table is what is causing the below table to duplicate,
--but I need all of this data AS well in its current format.
LEFT JOIN `test.trans_analysis` tsa
ON TRIM(UPPER(cwnextt.company_entity_code)) = TRIM(UPPER(tsa.company_entity_code))
AND TRIM(UPPER(cwnextt.Supplier_Top_Parent_Entity_Code)) = TRIM(UPPER(tsa.manufacturer_top_parent_entity_code))
AND TRIM(UPPER(cwnextt.Current_Contract_Name)) = TRIM(UPPER(tsa.contract_category))
AND cwnextt.spend_period_yyyyqmm = tsa.spend_period_yyyyqmm
--this table contains "company_spend" which is now duplicated
LEFT JOIN `test_etl_field.ecr_smart_data` smart
ON smart.company_entity_code = cwnextt.company_entity_code
AND (smart.contract_number = cwnextt.current_contract_number
OR smart.contract_number = cwnextt.replacement_contract_number)
AND smart.month_key = cwnextt.spend_period_yyyyqmm
If something can be created that will keep company_spend from duplicating on the second left join, that is what I am after.
I'm not sure I understand all the details of your problem, but here's a fact from the BigQuery docs:
SELECT DISTINCT
A SELECT DISTINCT statement discards duplicate rows
and returns only the remaining rows.
You can't apply DISTINCT to specific columns because it doesn't make sense. Say you have 4 columns and call DISTINCT on 3 of them: what is SQL supposed to do with the last one?
You must tell SQL which value to keep for the remaining column, and GROUP BY is the right solution here.
So if you want to:
Remove a column that has been duplicated: just adjust your SELECT to get only the columns you want.
Remove lines that have the same value in specific columns: I would suggest a GROUP BY on the targeted columns, taking the aggregation you want (first, avg, sum or whatever) for the remaining ones.
Remove the value from a row if another row has the same one: you may not want to do that. A row has to keep its value and you won't get it back. Besides, it's the same problem: which row do you want to keep?
Hope this helps! Feel free to clarify your problem if you want more specific answers.
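The GROUP BY suggestion can be sketched on a toy version of the problem. This runs on Python's built-in sqlite3 rather than BigQuery, and the `contracts` and `spend` tables, their columns, and the sample values are simplified stand-ins for the tables in the question. Choosing MAX as the aggregate is an assumption that only makes sense when the repeated values are identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Three transaction-level rows share one key; the spend table has one row for it.
conn.executescript("""
    CREATE TABLE contracts (entity_code TEXT, contract_number TEXT, txn_id INTEGER);
    CREATE TABLE spend (entity_code TEXT, contract_number TEXT, company_spend REAL);
    INSERT INTO contracts VALUES ('E1', 'C1', 1), ('E1', 'C1', 2), ('E1', 'C1', 3);
    INSERT INTO spend VALUES ('E1', 'C1', 100.0);
""")

join_sql = """
    FROM contracts c
    LEFT JOIN spend s
      ON s.entity_code = c.entity_code
     AND s.contract_number = c.contract_number
"""

# Naive total: the single spend row is repeated once per transaction row.
inflated = conn.execute("SELECT SUM(s.company_spend)" + join_sql).fetchone()[0]

# GROUP BY the key columns and pick one value for the repeated measure.
deduped = conn.execute(
    "SELECT c.entity_code, c.contract_number, MAX(s.company_spend)"
    + join_sql
    + "GROUP BY c.entity_code, c.contract_number"
).fetchall()
print(inflated, deduped)
```

The trade-off is that the grouped result no longer carries the transaction-level rows, so it suits a spend summary rather than the full detail output.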
While I couldn't resolve this issue in SQL, I used a FIXED LOD expression in Tableau to aggregate the data past the duplicates so the end user could visualize the output accurately. Not ideal, but the SQL route wasn't making sense.

Update JOIN table contents

I have a table joined from two other tables. I would like this table to stay updated with entries in the other two tables.
First Table is "employees"
I am using the ID, Last_Name, and First_Name.
And the second Table is "EmployeeTimeCardActions"
using columns ID, ActionTime, ActionDate, ShiftStart, and ActionType.
ID is the common column that the join was created on.
Because I usually get a comment saying I did not include enough information: I do not need an exact code sample, and I think I have included everything needed. If there is a good reason to include more I will; I just try to keep as little company information public as possible.
Sounds like you're having your data duplicated across tables. Not a smart idea at all. You can update data in one table when a row is updated in a different one via triggers but this is a TERRIBLE approach. If you want to display data joined from 2 tables, the right approach here is using an SQL VIEW which will display the current data.
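A rough sketch of the view approach, using Python's built-in sqlite3 for a runnable demo. The view name `employee_actions` and the sample rows are made up, only some of the columns from the question are included, and the CREATE VIEW syntax may need small adjustments for your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (ID INTEGER PRIMARY KEY, Last_Name TEXT, First_Name TEXT);
    CREATE TABLE EmployeeTimeCardActions (ID INTEGER, ActionDate TEXT, ActionType TEXT);
    -- The view stores the query, not the data, so it never goes stale.
    CREATE VIEW employee_actions AS
        SELECT e.ID, e.Last_Name, e.First_Name, a.ActionDate, a.ActionType
        FROM employees e
        JOIN EmployeeTimeCardActions a ON a.ID = e.ID;
    INSERT INTO employees VALUES (1, 'Smith', 'Jo');
""")

# No time card actions yet, so the view is empty.
before = conn.execute("SELECT COUNT(*) FROM employee_actions").fetchone()[0]

# A new row in a base table appears in the view without any extra update step.
conn.execute("INSERT INTO EmployeeTimeCardActions VALUES (1, '2024-01-02', 'ClockIn')")
after = conn.execute("SELECT * FROM employee_actions").fetchall()
print(before, after)
```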

Filter Rows - Pentaho

We are getting input from two different tables and passing it to the Filter rows step.
But we are getting the below error.
The DATE_ADDED Table has only one column DATE_ADDED and similarly the TODAYS_DATE Table has a single column TODAYS_DATE .
The condition given in the Filter is DATE_ADDED < TODAYS_DATE .
The transformation is
Can someone tell me where I am making a mistake?
It won't work like this. You expect a join of two streams (like SQL JOIN of two tables) but actually you will have a union (like SQL UNION).
When two streams are intersected on a step they must have identical columns - names, order and types - and the result will be the union of both streams with the same structure as origins.
When you intersect streams with different structures - different column names in your case - you will have unpredictable column names and actually only one column - nothing to compare with.
To do what you need use the Merge Join step (do not forget to sort streams on the joining key)
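The union-versus-join distinction can be illustrated in plain SQL. This is only an analogy run through Python's built-in sqlite3, not a model of how Pentaho steps behave. The sample dates are invented, and a CROSS JOIN is used because the two tables in this toy version share no key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_added (DATE_ADDED TEXT);
    CREATE TABLE todays_date (TODAYS_DATE TEXT);
    INSERT INTO date_added VALUES ('2024-01-01'), ('2024-03-01');
    INSERT INTO todays_date VALUES ('2024-02-01');
""")

# Union: rows are stacked into ONE column, so there is nothing to compare against.
stacked = conn.execute(
    "SELECT DATE_ADDED FROM date_added UNION ALL SELECT TODAYS_DATE FROM todays_date"
)
stacked_columns = [col[0] for col in stacked.description]

# Join: each row carries BOTH columns, so the filter condition can be evaluated.
joined = conn.execute("""
    SELECT DATE_ADDED, TODAYS_DATE
    FROM date_added CROSS JOIN todays_date
    WHERE DATE_ADDED < TODAYS_DATE
""").fetchall()
print(stacked_columns, joined)
```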
Both the column names and types must be identical if you want to merge the columns in a single step. Right-click on both steps and click output fields to verify the data types.
If data type issues arise or you want to rename the columns, you can place a select step after each table step, select the Date type (in your case) in the Meta-data tab, and rename the fields as well.
Hope this helps... :)

SQL Server join and wildcards

I want to get the results of a left join between two tables, with both having a column of the same name, the column on which I join. The following query is seen as valid by the import/export wizard in SQL Server, but it always gives an error. I have some more conditions, so the size wouldn't be too much. We're using SQL Server 2000 iirc and since we're using an externally developed program to interact with the database (except for some information we can't retrieve that way), we can not simply change the column name.
SELECT table1.*, table2.*
FROM table1
LEFT JOIN table2 ON table1.samename = table2.samename
At least, I think the column name is the problem, or am I doing something else wrong?
Do more columns than just your join key have the same name? If only your join key has the same name then simply select one of them since the values will be equivalent except for the non-matching rows (which will be NULL). You will have to enumerate all your other columns from one of the tables though.
SELECT table1.*, table2.othercolumns
FROM table1
LEFT JOIN table2 ON table1.samename = table2.samename
You may need to explicitly list the columns from one of the tables (the one with fewer fields), and leave out the second instance of what would be the duplicate field.
select Table1.*, {skip the field Table2.sameName} Table2.fld2, Table2.Fld3, Table2.Fld4... from
Since it's a common column, it APPEARS it's trying to create it twice in the result set, thus choking your process.
Since you should never use select *, simply replace it with the names of the columns you want. The join column has the same value (or null) on both sides of the join, so only select one of them: the one from table1, which will always have the value.
If you want to select all the columns from both tables just use Select * instead of including the tables separately. That will however leave you with duplicate column names in the result set, so even reading them out by name will not work and reading them by index will give inconsistent results, as changing the columns in the database will change the resultset, breaking any code depending on the ordinals of the columns.
Unfortunately the best solution is to specify exactly the columns you need and create aliases for the duplicates so they are unique.
I quickly get the column headings by setting the query to text mode and copying the top row ...
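The advice to list columns explicitly and alias the duplicates can be sketched like this. It runs on Python's built-in sqlite3 rather than SQL Server 2000, and the columns `fld1` and `fld2`, the aliases, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (samename INTEGER, fld1 TEXT);
    CREATE TABLE table2 (samename INTEGER, fld2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (1, 'x');
""")

# Each output column gets a unique name, so consumers can read it by name.
cursor = conn.execute("""
    SELECT table1.samename AS t1_samename,
           table1.fld1,
           table2.samename AS t2_samename,
           table2.fld2
    FROM table1
    LEFT JOIN table2 ON table1.samename = table2.samename
    ORDER BY table1.samename
""")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

The second row shows the non-matching case: the table2 columns come back as NULL while the table1 key is still populated.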