SQL Server join and wildcards - sql

I want the results of a left join between two tables that both have a column with the same name, which is the column I join on. The SQL Server Import/Export Wizard accepts the following query as valid, but running it always gives an error. (I have some more conditions in the real query, so the result size isn't the problem.) We're using SQL Server 2000, if I recall correctly, and because we interact with the database through an externally developed program (except for some information we can't retrieve that way), we cannot simply rename the column.
SELECT table1.*, table2.*
FROM table1
LEFT JOIN table2 ON table1.samename = table2.samename
At least, I think the column name is the problem, or am I doing something else wrong?

Do any columns other than your join key have the same name? If only the join key does, simply select one copy of it: the values are equal in matching rows, and in non-matching rows table2's copy will be NULL, so keep table1's. You will have to enumerate the other columns from one of the tables, though.
SELECT table1.*, table2.othercolumns
FROM table1
LEFT JOIN table2 ON table1.samename = table2.samename
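A minimal sketch of why the table1 copy of the key is the safe one to keep. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server; the tables, the extra column `b` and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (samename INTEGER, a TEXT);
    CREATE TABLE table2 (samename INTEGER, b TEXT);
    INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO table2 VALUES (1, 'matched');
""")

# Select both copies of the join key to compare them row by row.
rows = conn.execute("""
    SELECT table1.samename, table2.samename, table2.b
    FROM table1
    LEFT JOIN table2 ON table1.samename = table2.samename
    ORDER BY table1.samename
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 'matched')
# (2, None, None)   <- table2's copy of the key is NULL for the unmatched row
```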

You may need to explicitly list the columns from one of the tables (the one with fewer fields) and leave out the second instance of what would be the duplicate field.
select Table1.*, /* skip Table2.sameName */ Table2.Fld2, Table2.Fld3, Table2.Fld4, ... from ...
Since it's a common column, it APPEARS to be created twice in the result set, which is what chokes your process.
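To see the duplicate this answer describes, the sketch below compares the column names each query produces, read from the cursor's `description`. SQLite (via Python's sqlite3) stands in for SQL Server, and the tables and columns `Fld2`/`Fld3` are hypothetical.

```python
import sqlite3

# Hypothetical tables that share the join column "samename".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (samename INTEGER, fld1 TEXT);
    CREATE TABLE table2 (samename INTEGER, fld2 TEXT, fld3 TEXT);
    INSERT INTO table1 VALUES (1, 'a');
    INSERT INTO table2 VALUES (1, 'b', 'c');
""")

def column_names(sql):
    """Return the result-set column names a query produces."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]

# table1.* and table2.* both contribute a column named "samename".
both = column_names("""
    SELECT table1.*, table2.*
    FROM table1 LEFT JOIN table2 ON table1.samename = table2.samename
""")
print(both)  # ['samename', 'fld1', 'samename', 'fld2', 'fld3']

# Listing table2's columns explicitly leaves out its copy of the key.
explicit = column_names("""
    SELECT table1.*, table2.fld2, table2.fld3
    FROM table1 LEFT JOIN table2 ON table1.samename = table2.samename
""")
print(explicit)  # ['samename', 'fld1', 'fld2', 'fld3']
```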

Since you should never use SELECT *, simply replace it with the names of the columns you want. The join column has the same value (or NULL) on both sides of the join, so select only one of them: the one from table1, which will always have a value.

If you want all the columns from both tables, you can just use SELECT * instead of listing each table separately. However, that still leaves duplicate column names in the result set, so reading them by name will not work, and reading them by index is fragile: changing the columns in the database changes the result set and breaks any code that depends on column ordinals.
Unfortunately the best solution is to specify exactly the columns you need and create aliases for the duplicates so they are unique.
I quickly get the column headings by setting the query to text mode and copying the top row ...
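A small sketch of aliasing the duplicates so every column can be read by name. SQLite via Python's sqlite3 stands in for SQL Server; the second shared column `status` and the alias names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
    CREATE TABLE table1 (samename INTEGER, status TEXT);
    CREATE TABLE table2 (samename INTEGER, status TEXT);
    INSERT INTO table1 VALUES (1, 'open');
    INSERT INTO table2 VALUES (1, 'shipped');
""")

# Keep one copy of the join key and give each duplicate a unique alias.
row = conn.execute("""
    SELECT table1.samename,
           table1.status AS table1_status,
           table2.status AS table2_status
    FROM table1
    LEFT JOIN table2 ON table1.samename = table2.samename
""").fetchone()

print(row.keys())                                # ['samename', 'table1_status', 'table2_status']
print(row["table1_status"], row["table2_status"])  # open shipped
```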

Related

sql - perform join excluding repeated fields

I have the following problem: I use SQL Server and need to join two tables on a field, but the join duplicates the key field. My query is as follows:
select A.*, B.*
from Database.dbo.Module1 A
LEFT JOIN RRHH.dbo.Module2 B on A.key1 = B.key1
Is it possible to exclude Module2's key1 field from the select?
The tables have a few other duplicate fields as well. I could write every field I need in the select, but it would be easier to exclude the fields I don't need. Keep in mind that each table has hundreds of fields that are needed.
It is impossible. Specify fields you need.
There is no "all columns except <these>" syntax in T-SQL, sorry.
Of course, it's very easy to generate the list of columns for any table by dragging its Columns node onto a query window. This works in both SSMS and Azure Data Studio, as I describe in my Bad Habits post.
Then just prefix the ones you need with the table alias, and delete the ones you don't.
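The column list can also be generated from the catalog instead of typed by hand. A sketch under assumptions: SQLite's `PRAGMA table_info` stands in for the catalog (in SQL Server, `INFORMATION_SCHEMA.COLUMNS` should serve the same purpose), the `column_list` helper is hypothetical, and Module1/Module2 are cut down to a few made-up columns.

```python
import sqlite3

# Hypothetical stand-ins for Module1/Module2 (the real ones have hundreds of columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Module1 (key1 INTEGER, name TEXT, dept TEXT);
    CREATE TABLE Module2 (key1 INTEGER, salary REAL, bonus REAL);
""")

def column_list(table, alias, exclude=()):
    """Return 'alias.column' for every column of table except those in exclude."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column,
    # in declaration order. The table name is interpolated, so only pass trusted names.
    cols = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
    return [f"{alias}.{c}" for c in cols if c not in exclude]

select_list = column_list("Module1", "A") + column_list("Module2", "B", exclude={"key1"})
sql = (
    "SELECT " + ", ".join(select_list)
    + " FROM Module1 A LEFT JOIN Module2 B ON A.key1 = B.key1"
)
print(sql)
conn.execute(sql)  # the generated query is valid
```

The printed statement can then be pasted into the real query and trimmed further by hand.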

SQL Exclude a specific column from SQL query result

I have a query where I process columns from two tables, and at the end I want ALL columns from one temporary table and ONLY ONE column from the other table. I also do not want the KEY column to appear twice after the join.
I cannot find a clean efficient way to do it. I found these solutions:
Specify all columns explicitly. Bad for obvious reasons if you have to type many columns.
Get all columns and then DROP the ones you don't need. Not efficient, because you carry loads of data and then throw it away.
Is there a one liner SQL command that leaves out a single column?
Is there an SQL command that removes duplicate KEY column after joining?
Thanks!!
How about selecting all columns from one table and one from the other?
select t1.*, t2.col
from t1 join
t2
on . . .
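A minimal sketch of that answer's pattern with the join condition filled in. SQLite via Python's sqlite3 stands in for the real engine; the `id` key and the other columns are hypothetical, and t1 is created as a temporary table to mirror the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TEMP TABLE t1 (id INTEGER, a TEXT, b TEXT);
    CREATE TABLE t2 (id INTEGER, col TEXT, other TEXT);
    INSERT INTO t1 VALUES (1, 'a1', 'b1'), (2, 'a2', 'b2');
    INSERT INTO t2 VALUES (1, 'c1', 'o1'), (2, 'c2', 'o2');
""")

# All of t1 (including its key) plus a single column from t2: the key appears once.
cur = conn.execute("""
    SELECT t1.*, t2.col
    FROM t1
    JOIN t2 ON t1.id = t2.id
    ORDER BY t1.id
""")
cols = [d[0] for d in cur.description]
rows = cur.fetchall()
print(cols)  # ['id', 'a', 'b', 'col']
print(rows)  # [(1, 'a1', 'b1', 'c1'), (2, 'a2', 'b2', 'c2')]
```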

Bigquery - remove duplicates of certain columns, but not all

I have two tables I am left joining together. The first table has transaction-level detail, which causes the key I join to the second table on to be duplicated. When I left join the second table, the measure "company_spend" is highly inflated.
I need a way to keep only a single value of the duplicated data. My thought was to apply DISTINCT to only those columns, but I don't see a way in BigQuery to apply DISTINCT to some columns rather than all of them.
SELECT UPPER(cwnextt.Current_Contract_Number) AS Current_Contract_Number,
UPPER(cwnextt.Replacement_Contract_Number) AS Replacement_Contract_Number,
UPPER(cwnextt.Current_Contract_Name) AS Current_Contract_Name,
UPPER(cwnextt.Supplier_Top_Parent_Entity_Code) AS Supplier_Top_Parent_Entity_Code,
UPPER(cwnextt.Supplier_Top_Parent_Name) AS Supplier_Top_Parent_Name,
UPPER(cwnextt.company_Entity_Code) AS company_Entity_Code,
UPPER(cwnextt.Facility_Name) AS Facility_Name,
smart.company_Spend AS companySpend
FROM `test_etl_field.contracts_with_member_entity_codes_test_view_2` cwnextt
--this table is what is causing the below table to duplicate,
--but I need all of this data AS well in its current format.
LEFT JOIN `test.trans_analysis` tsa
ON TRIM(UPPER(cwnextt.company_entity_code)) = TRIM(UPPER(tsa.company_entity_code))
AND TRIM(UPPER(cwnextt.Supplier_Top_Parent_Entity_Code)) = TRIM(UPPER(tsa.manufacturer_top_parent_entity_code))
AND TRIM(UPPER(cwnextt.Current_Contract_Name)) = TRIM(UPPER(tsa.contract_category))
AND cwnextt.spend_period_yyyyqmm = tsa.spend_period_yyyyqmm
--this table contains "company_spend" which is now duplicated
LEFT JOIN `test_etl_field.ecr_smart_data` smart
ON smart.company_entity_code = cwnextt.company_entity_code
AND (smart.contract_number = cwnextt.current_contract_number
OR smart.contract_number = cwnextt.replacement_contract_number)
AND smart.month_key = cwnextt.spend_period_yyyyqmm
If something can be created that will keep company_spend from duplicating on the second left join, that is what I am after.
I'm not sure I understand all the details of your problem, but here's a fact from the BigQuery docs:
SELECT DISTINCT: A SELECT DISTINCT statement discards duplicate rows and returns only the remaining rows.
You can't apply DISTINCT to specific columns because it doesn't make sense. Say you have 4 columns and apply DISTINCT to 3 of them: what is SQL supposed to do with the last one?
You must tell SQL which value to keep for the remaining column, and GROUP BY is the right solution here.
So if you want to:
Remove a column that has been duplicated: just adjust your SELECT to get only the columns you want.
Remove rows that have the same value in specific columns: I would suggest a GROUP BY on the targeted columns, taking the aggregation you want (first, avg, sum or whatever) for the remaining ones.
Remove the value from a row if another row has the same value: you may not want to do that. A row has to keep its value, and you won't get it back. Besides, it's the same problem: which row do you want to keep?
Hope this helps! Feel free to clarify your problem if you want more specific answers.
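One way to apply the GROUP BY advice to the inflated company_spend is to aggregate the duplicating side to one row per join key before joining. A sketch under assumptions: SQLite via Python's sqlite3 stands in for BigQuery, and the cut-down `contracts` and `smart` tables and their values are hypothetical, not the question's real schema.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the detail view and the spend table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contracts (company TEXT, txn_amount INTEGER);  -- many rows per company
    CREATE TABLE smart (company TEXT, company_spend INTEGER);   -- one row per company
    INSERT INTO contracts VALUES ('C1', 10), ('C1', 20), ('C1', 30);
    INSERT INTO smart VALUES ('C1', 100);
""")

# Joining the detail rows directly repeats company_spend once per detail row.
inflated = conn.execute("""
    SELECT SUM(s.company_spend)
    FROM contracts c
    LEFT JOIN smart s ON s.company = c.company
""").fetchone()[0]
print(inflated)  # 300, not 100

# GROUP BY the join key first, so each company contributes one row to the join.
per_company = conn.execute("""
    WITH c AS (
        SELECT company, COUNT(*) AS txn_count, SUM(txn_amount) AS txn_total
        FROM contracts
        GROUP BY company
    )
    SELECT c.company, c.txn_count, c.txn_total, s.company_spend
    FROM c
    LEFT JOIN smart s ON s.company = c.company
""").fetchall()
print(per_company)  # [('C1', 3, 60, 100)]
```

This trades the row-level detail for correct totals; if the detail rows must stay, the spend has to be aggregated separately rather than summed across the joined rows.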
I couldn't resolve this issue in SQL, so I used a FIXED LOD expression in Tableau to aggregate the data past the duplicates, so that the end user could visualize the output accurately. Not ideal, but the SQL route wasn't making sense.

SQL multiple tables join and pivot with column name and value

I'm looking for a way to join two (sometimes more) tables.
I'll start with two and add as I get the pieces working.
Table1 has two columns that identify it
T1ContainerID
T1ObjectID
Table2 has similar columns, prefixed with T2 instead, and the values will match:
T2ContainerID
T2ObjectID
In Table2 there are two columns I am targeting
ObjectName
ObjectValue
There can be any number of ObjectName entries for a given record.
For instance, one record may have Name, Address, and Date;
another may have Name, Address, Port, Date, ServerName, Device, and Status.
What I need is a way to pivot all of the potential ObjectName values in Table2 into columns in line with Table1, and if a value is not in Table2 for a Table1 row, just make it NULL. I want each column's header to be the ObjectName and its value to be the ObjectValue. If I can't get a wildcard to grab all potential values, I can settle for calling out each column manually. I was only hoping for a wildcard because the set of names may change as new records with different values get added. Worst case, I just adjust the code to add anything new.
I do have a bunch of queries that rebuild the database every night and dump it into a different database but I'd like to have a query to pull the results from the main database to get current values rather than something that was run every morning.
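A sketch of one common approach, conditional aggregation: LEFT JOIN Table2 to Table1 on both ID columns, then build one `MAX(CASE ...)` column per ObjectName, with missing names coming out as NULL. To approximate the "wildcard", the column list is generated from the distinct ObjectName values currently in Table2. Assumptions: SQLite via Python's sqlite3 stands in for SQL Server (which also offers a PIVOT operator and dynamic SQL, not used here), the sample rows are made up, and ObjectName values are assumed not to contain double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (T1ContainerID INTEGER, T1ObjectID INTEGER);
    CREATE TABLE Table2 (T2ContainerID INTEGER, T2ObjectID INTEGER,
                         ObjectName TEXT, ObjectValue TEXT);
    INSERT INTO Table1 VALUES (1, 10), (1, 11), (2, 20);
    INSERT INTO Table2 VALUES
        (1, 10, 'Name', 'srv-a'), (1, 10, 'Address', '10.0.0.1'),
        (1, 11, 'Name', 'srv-b'), (1, 11, 'Port', '8080');
""")

# One output column per distinct ObjectName, so new names are picked up automatically.
names = [r[0] for r in conn.execute(
    "SELECT DISTINCT ObjectName FROM Table2 ORDER BY ObjectName")]
pivot_cols = ", ".join(
    # The value is bound as a parameter; the header is a quoted identifier.
    f'MAX(CASE WHEN t2.ObjectName = ? THEN t2.ObjectValue END) AS "{n}"'
    for n in names
)

sql = f"""
    SELECT t1.T1ContainerID, t1.T1ObjectID, {pivot_cols}
    FROM Table1 t1
    LEFT JOIN Table2 t2
      ON t2.T2ContainerID = t1.T1ContainerID
     AND t2.T2ObjectID = t1.T1ObjectID
    GROUP BY t1.T1ContainerID, t1.T1ObjectID
    ORDER BY t1.T1ContainerID, t1.T1ObjectID
"""
cur = conn.execute(sql, names)
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['T1ContainerID', 'T1ObjectID', 'Address', 'Name', 'Port']
for row in rows:
    print(row)
# (1, 10, '10.0.0.1', 'srv-a', None)
# (1, 11, None, 'srv-b', '8080')
# (2, 20, None, None, None)
```

If the generated list is not wanted, the `MAX(CASE ...)` columns can be written out by hand for the names you care about.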

When is it required to give a table name an alias in SQL?

I noticed when doing a query with multiple JOINs that my query didn't work unless I gave one of the table names an alias.
Here's a simple example to explain the point:
This doesn't work:
SELECT subject
from items
join purchases on items.folder_id=purchases.item_id
join purchases on items.date=purchases.purchase_date
group by folder_id
This does:
SELECT subject
from items
join purchases on items.folder_id=purchases.item_id
join purchases as p on items.date=p.purchase_date
group by folder_id
Can someone explain this?
You are using the same table Purchases twice in the query. You need to differentiate them by giving a different name.
You need to give an alias:
When the same table name is referenced multiple times
Imagine two people who are both named John Doe. If you call "John", both will respond. You can't give two people the same name and expect them to know which one you are calling. Similarly, when two result sets have exactly the same name, SQL cannot tell which one to take values from. You need to give them different names so that the SQL engine doesn't get confused.
Script 1: t1 and t2 are the alias names here
SELECT t1.col2
FROM table1 t1
INNER JOIN table1 t2
ON t1.col1 = t2.col1
When there is a derived table/sub query output
If a person doesn't have a name, you can't call them, so they won't respond to you. Similarly, when you generate a derived table or subquery output, it is unknown to the SQL engine, and the engine won't know what to call it. So you need to give the derived output a name so that the SQL engine can refer to it.
Script 2: t1 is the alias name here.
SELECT col1
FROM
(
SELECT col1
FROM table1
) t1
The only time it is REQUIRED to provide an alias is when you reference the same table multiple times, or when you have derived outputs (subqueries acting as tables) (thanks to Siva for catching that). This removes ambiguity about which table reference to use in the rest of your query.
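A minimal reproduction of the question's two queries, using SQLite via Python's sqlite3 as a stand-in for the asker's engine (the error text differs between engines; SQL Server reports its own message). The sample rows are hypothetical, and the `group by` is dropped because it doesn't affect the name resolution being shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (folder_id INTEGER, subject TEXT, date TEXT);
    CREATE TABLE purchases (item_id INTEGER, purchase_date TEXT);
    INSERT INTO items VALUES (1, 'book', '2024-01-01');
    INSERT INTO purchases VALUES (1, '2024-01-01');
""")

# Same table twice with no alias: "purchases.<col>" matches both instances.
unaliased = """
    SELECT subject FROM items
    JOIN purchases ON items.folder_id = purchases.item_id
    JOIN purchases ON items.date = purchases.purchase_date
"""
error = None
try:
    conn.execute(unaliased)
except sqlite3.OperationalError as exc:
    error = exc
print("unaliased query failed:", error)

# Aliasing the second instance, as in the question, makes every reference unique.
aliased = """
    SELECT subject FROM items
    JOIN purchases ON items.folder_id = purchases.item_id
    JOIN purchases AS p ON items.date = p.purchase_date
"""
print(conn.execute(aliased).fetchall())  # [('book',)]
```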
To elaborate further, in your example:
SELECT subject
from items
join purchases on items.folder_id=purchases.item_id
join purchases on items.date=purchases.purchase_date
group by folder_id
I assume you expect each JOIN's ON clause to refer only to the table introduced by that JOIN; however, an ON clause can refer to any table reference already in scope. So when you write on items.date=purchases.purchase_date, the SQL engine can't tell whether you mean the first purchases table or the second one.
By adding the alias, you remove the ambiguity by being more explicit. The SQL engine can now tell with certainty which instance of purchases you want to use. If it had to guess between two equal choices, it would always throw an error asking you to be more explicit.
It is required to give them a name when the same table is used twice in a query. In your case, the query wouldn't know what table to choose purchases.purchase_date from.
In this case it's simply that you've specified purchases twice and the SQL engine needs to be able to refer to each dataset in the join in a unique way, hence the alias is needed.
As a side point, do you really need to join into purchases twice? Would this not work:
SELECT
subject
from
items
join purchases
on items.folder_id=purchases.item_id
and items.date=purchases.purchase_date
group by folder_id
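Whether the single join is a drop-in replacement depends on what the query should mean: with one join, the same purchase row must satisfy both conditions, while two aliased joins let each condition be met by a different purchase. A small sketch (SQLite via Python's sqlite3 as a stand-in; hypothetical rows, `group by` omitted) shows the two forms disagreeing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (folder_id INTEGER, subject TEXT, date TEXT);
    CREATE TABLE purchases (item_id INTEGER, purchase_date TEXT);
    INSERT INTO items VALUES (1, 'book', '2024-01-01');
    -- One purchase matches on the id, a different one matches on the date.
    INSERT INTO purchases VALUES (1, '2024-02-01'), (2, '2024-01-01');
""")

# Single join: the SAME purchase row must satisfy both conditions.
single = conn.execute("""
    SELECT subject FROM items
    JOIN purchases
      ON items.folder_id = purchases.item_id
     AND items.date = purchases.purchase_date
""").fetchall()

# Two aliased joins: each condition may be satisfied by a DIFFERENT purchase row.
double = conn.execute("""
    SELECT subject FROM items
    JOIN purchases p1 ON items.folder_id = p1.item_id
    JOIN purchases p2 ON items.date = p2.purchase_date
""").fetchall()

print(single)  # []
print(double)  # [('book',)]
```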
Aliases are necessary to disambiguate which table a column comes from.
So, if a column's name is unique among all the columns available in the tables in the FROM list, you can use the column name directly.
If the column's name is repeated in several of the tables in the FROM list, the DB server has no way to guess which table to take the column from.
In your sample query, all the column names are duplicated because you're getting "two instances" of the same table (purchases), so the server needs to know which instance to take each column from. So you must specify it.
In fact, I'd recommend always using an alias unless there's a single table. This way you'll avoid lots of problems and make the query much easier to understand.
You can't use the same table name twice in the same query UNLESS it is aliased as something else to prevent an ambiguous join condition. That's why it's not allowed. I should note that it's also better to always qualify columns as table.field or alias.field, so other developers who come after you don't have to guess which columns come from which tables.
When writing a query, YOU know what you are working with, but what about the person after you in development? If someone doesn't know which columns come from which table, the query can be hard to follow, especially out here on S/O. By always qualifying with the table reference and field, or the alias reference and field, it's much easier to follow.
select
SomeField,
AnotherField
from
OneOfMyTables
Join SecondTable
on SomeID = SecondID
compare that to
select
T1.SomeField,
T2.AnotherField
from
OneOfMyTables T1
JOIN SecondTable T2
on T1.SomeID = T2.SecondID
In these two scenarios, which would you prefer reading? Notice that I've used the short aliases "T1" and "T2", but they could be anything, even an acronym or abbreviation of the table names, such as "oomt" (OneOfMyTables) and "st" (SecondTable). Or compare something super long, as has appeared in other posts:
Select * from ContractPurchaseOffice_AgencyLookupTable
vs
Select * from ContractPurchaseOffice_AgencyLookupTable AgencyLkup
If you had to keep qualifying joins or columns, which would you prefer looking at?
Hope this clarifies your question.