Change SQL code to eliminate repetition - sql

I have a question about SQL and ColdFusion: I can't get my SQL code right, so the variables keep showing up twice. So far I have:
<cfquery name="get_partner_all" datasource="#dsn#">
SELECT
C.COMPANY_ID,
C.FULLNAME,
CP.MOBILTEL,
CP.MOBIL_CODE,
CP.IMCAT_ID,
CP.COMPANY_PARTNER_TEL,
CP.COMPANY_PARTNER_TELCODE,
CP.COMPANY_PARTNER_TEL_EXT,
CP.MISSION,
CP.DEPARTMENT,
CP.TITLE,
CP.COMPANY_PARTNER_SURNAME,
CP.COMPANY_PARTNER_NAME,
CP.PARTNER_ID,
CP.COMPANY_PARTNER_EMAIL,
CP.HOMEPAGE,
CP.COUNTY,
CP.COUNTRY,
CP.COMPANY_PARTNER_ADDRESS,
CP.COMPANY_PARTNER_FAX,
CC.COMPANYCAT,
CRM.BAKIYE,
CRM.BORC,
CRM.ALACAK
FROM
COMPANY_PARTNER CP,
COMPANY C,
COMPANY_CAT CC,
#DSN2_ALIAS#.COMPANY_REMAINDER_MONEY CRM
WHERE
C.COMPANY_ID = CP.COMPANY_ID
AND C.COMPANY_ID = CRM.COMPANY_ID
AND C.COMPANYCAT_ID = CC.COMPANYCAT_ID
</cfquery>
As you can see, C.COMPANY_ID is used twice in the join conditions, and the values are also shown twice in the output. But I need the CRM join to display some money-related figures.
Can anyone show me how I can define it in a different way so that the output of this code won't repeat the variables?

I assume you mean that you get multiple columns in the result set, each with the name "COMPANY_ID". The solution is to list explicit columns from every table instead of using SELECT * (not just for the COMPANY_CAT table, alias CC).
If you're getting "repeated" rows, then you need to examine the contents of those rows. What's happening is that one or more rows from another table are matching a single row from the COMPANY table, and each matching pair of rows generates a row in the output. Now that you've expanded your column list, compare a pair of rows that have the same COMPANY_ID: in which columns do they differ? If it's, say, the last 3 columns, then there are multiple rows in CRM that match the same COMPANY_ID.
Once you've identified the table that is causing the duplicates, you need to decide how to limit them: should you be aggregating values from that table (e.g. SUM or MAX), or is there a way to further refine which row from the other table you want to match to the row in COMPANY?
At a guess though, I'd speculate that one company could have multiple partners...
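To make the diagnosis above concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the real database. The tables are trimmed, hypothetical versions of the question's (one company with two partners and two money rows); it only shows how a second matching table multiplies rows, and how aggregating that table first (here with SUM, an assumed choice) leaves one row per partner.

```python
import sqlite3

# Hypothetical miniature of the question's schema: one company, two partners,
# and two rows in COMPANY_REMAINDER_MONEY for the same COMPANY_ID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (company_id INTEGER PRIMARY KEY, fullname TEXT);
    CREATE TABLE company_partner (partner_id INTEGER PRIMARY KEY,
                                  company_id INTEGER, partner_name TEXT);
    CREATE TABLE company_remainder_money (company_id INTEGER, bakiye REAL);
    INSERT INTO company VALUES (1, 'Acme');
    INSERT INTO company_partner VALUES (10, 1, 'Ali'), (11, 1, 'Ayse');
    INSERT INTO company_remainder_money VALUES (1, 100.0), (1, 50.0);
""")

# Joining all three tables directly: every partner row pairs with every
# money row of the same company, so 2 x 2 = 4 rows come back.
fanned_out = conn.execute("""
    SELECT c.company_id, cp.partner_name, crm.bakiye
    FROM company c
    JOIN company_partner cp ON cp.company_id = c.company_id
    JOIN company_remainder_money crm ON crm.company_id = c.company_id
""").fetchall()

# Aggregating the money table per company *before* joining leaves one
# money row per company, so only the partner rows multiply the output.
per_partner = conn.execute("""
    SELECT c.company_id, cp.partner_name, crm.total_bakiye
    FROM company c
    JOIN company_partner cp ON cp.company_id = c.company_id
    JOIN (SELECT company_id, SUM(bakiye) AS total_bakiye
          FROM company_remainder_money
          GROUP BY company_id) crm ON crm.company_id = c.company_id
    ORDER BY cp.partner_name
""").fetchall()

print(len(fanned_out))  # 4
print(per_partner)      # [(1, 'Ali', 150.0), (1, 'Ayse', 150.0)]
```

The two remaining rows differ only in the partner columns, which is exactly the "one company, multiple partners" case.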

Don't use select table.*. Instead, name each column explicitly and don't repeat columns, as follows:
select
c.company_id,
c.blah_blah,
-- don't select cp.company_id
cp.foo_bar,
-- etc

You just need to replace * with an explicit column list. Listing columns instead of * is generally advisable from a performance point of view. Also, if you add a column to a table and query it with *, the new column sometimes won't appear in the query result because of caching.
In your case, just keep company_id from one of the tables. That's it.
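A quick way to see the duplicate-column problem is to ask the driver for the result-set column names. This sketch uses Python's sqlite3 module with a hypothetical two-table schema; other databases and drivers may report names differently, but the duplicated company_id from `c.*, cp.*` is the point.

```python
import sqlite3

# Hypothetical two-table schema; both tables carry a company_id column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (company_id INTEGER, fullname TEXT);
    CREATE TABLE company_partner (partner_id INTEGER, company_id INTEGER,
                                  partner_name TEXT);
    INSERT INTO company VALUES (1, 'Acme');
    INSERT INTO company_partner VALUES (10, 1, 'Ali');
""")

def column_names(sql):
    """Return the result-set column names reported by the driver."""
    return [d[0] for d in conn.execute(sql).description]

# c.* and cp.* both contribute company_id, so the name appears twice.
star_cols = column_names("""
    SELECT c.*, cp.*
    FROM company c JOIN company_partner cp ON cp.company_id = c.company_id
""")

# An explicit list that takes company_id from one table only.
explicit_cols = column_names("""
    SELECT c.company_id, c.fullname, cp.partner_id, cp.partner_name
    FROM company c JOIN company_partner cp ON cp.company_id = c.company_id
""")

print(star_cols.count("company_id"))  # 2
print(explicit_cols)  # ['company_id', 'fullname', 'partner_id', 'partner_name']
```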

Related

BigQuery - remove duplicates of certain columns, but not all

I have two tables I am left joining together. The first table has transaction-level detail, which causes the key I use to join to the second table to be duplicated. When I left join the second table, the measure "company_spend" is highly inflated.
I need a way to keep only a single value of the duplicated data. My thought was to run DISTINCT on only those columns, but I don't see a way in BigQuery to apply DISTINCT to only some of the columns.
SELECT UPPER(cwnextt.Current_Contract_Number) AS Current_Contract_Number,
UPPER(cwnextt.Replacement_Contract_Number) AS Replacement_Contract_Number,
UPPER(cwnextt.Current_Contract_Name) AS Current_Contract_Name,
UPPER(cwnextt.Supplier_Top_Parent_Entity_Code) AS Supplier_Top_Parent_Entity_Code,
UPPER(cwnextt.Supplier_Top_Parent_Name) AS Supplier_Top_Parent_Name,
UPPER(cwnextt.company_Entity_Code) AS company_Entity_Code,
UPPER(cwnextt.Facility_Name) AS Facility_Name,
smart.company_Spend AS companySpend
FROM `test_etl_field.contracts_with_member_entity_codes_test_view_2` cwnextt
--this table is what is causing the below table to duplicate,
--but I need all of this data as well in its current format.
LEFT JOIN `test.trans_analysis` tsa
ON TRIM(UPPER(cwnextt.company_entity_code)) = TRIM(UPPER(tsa.company_entity_code))
AND TRIM(UPPER(cwnextt.Supplier_Top_Parent_Entity_Code)) = TRIM(UPPER(tsa.manufacturer_top_parent_entity_code))
AND TRIM(UPPER(cwnextt.Current_Contract_Name)) = TRIM(UPPER(tsa.contract_category))
AND cwnextt.spend_period_yyyyqmm = tsa.spend_period_yyyyqmm
--this table contains "company_spend" which is now duplicated
LEFT JOIN `test_etl_field.ecr_smart_data` smart
ON smart.company_entity_code = cwnextt.company_entity_code
AND (smart.contract_number = cwnextt.current_contract_number
OR smart.contract_number = cwnextt.replacement_contract_number)
AND smart.month_key = cwnextt.spend_period_yyyyqmm
If something can be created that will keep company_spend from duplicating on the second left join, that is what I am after.
I'm not sure I understand all the details of your problem, but here's what the BigQuery documentation says:
SELECT DISTINCT: "A SELECT DISTINCT statement discards duplicate rows and returns only the remaining rows."
You can't apply DISTINCT to specific columns because it wouldn't make sense. Say you have 4 columns and call DISTINCT on 3 of them: what is SQL supposed to do with the last one?
You must tell SQL which value to keep for the remaining column, and GROUP BY is the right solution here.
So if you want to:
Remove a column that has been duplicated: just adjust your SELECT to return only the columns you want.
Remove rows that have the same value in specific columns: I would suggest a GROUP BY on the targeted columns, taking the aggregation you want (first, avg, sum or whatever) for the remaining ones.
Remove the value from a row if another row has the same value: you may not want to do that. A row has to keep its value, and once it's removed you won't get it back. Besides, it's the same problem: which row do you want to keep?
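A minimal sketch of the GROUP BY option, using Python's sqlite3 module in place of BigQuery. The table and its values are hypothetical: company_spend has already been repeated once per transaction row. Picking MAX for the spend and MIN for the facility is an assumed choice; SQLite has no FIRST aggregate, and in BigQuery ANY_VALUE is another option.

```python
import sqlite3

# Hypothetical rows: company_spend was repeated once per transaction row
# after a join, but the facility name differs between the copies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE joined (contract_number TEXT, facility_name TEXT,
                         company_spend REAL);
    INSERT INTO joined VALUES
        ('C1', 'North', 500.0),
        ('C1', 'South', 500.0),
        ('C2', 'North', 80.0);
""")

# Summing straight over the joined rows double-counts C1's spend.
inflated = conn.execute("SELECT SUM(company_spend) FROM joined").fetchone()[0]

# GROUP BY the column that identifies a spend record, and choose an explicit
# aggregate for each remaining column: MAX for the repeated spend, and MIN
# as a stand-in for "first" facility.
deduped = conn.execute("""
    SELECT contract_number,
           MIN(facility_name) AS facility_name,
           MAX(company_spend) AS company_spend
    FROM joined
    GROUP BY contract_number
    ORDER BY contract_number
""").fetchall()

total = sum(row[2] for row in deduped)
print(inflated)  # 1080.0
print(deduped)   # [('C1', 'North', 500.0), ('C2', 'North', 80.0)]
print(total)     # 580.0
```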
Hope this helps! Feel free to clarify your problem if you want a more specific answer.
I couldn't resolve this issue in SQL, so I used a FIXED LOD expression in Tableau to aggregate the data past the duplicates, so the end user could visualize the output accurately. Not ideal, but the SQL route wasn't making sense.

SQL to identify duplicate columns from table having hundreds of column

I have 250+ columns in a customer table. According to my process there should be only one row per customer; however, I've found a few customers who have more than one entry in the table.
After running DISTINCT on the entire table for that customer, it still returns two rows. I suspect one of the columns may have a trailing space or junk from the source tables, resulting in two rows with the same information.
select distinct * from ( select * from customer_table where customer = '123' ) a;
The above query returns two rows. Looking at the results with the naked eye, there is no difference in any column.
I could identify which column is causing the duplicates by running a DISTINCT query for each column in turn, but that would be a very manual task for 250+ columns.
This sounds like a very dumb question, but I'm kind of stuck here. Please suggest a better way to identify this if you have one. Thank you.
Solving this one-time issue with SQL is too much effort. Simply copy-paste the rows to Excel, transpose the data into columns and use a simple function like "if a==b then 1 else 0".
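If you'd rather script the same column-by-column comparison than do it in Excel, a short client-side loop works for any number of columns. This is a sketch using Python's sqlite3 module as a stand-in for your database; the table, its three columns and the trailing space are hypothetical. `repr()` is used so invisible differences such as trailing spaces show up.

```python
import sqlite3

# Hypothetical customer table: two rows for customer '123' that look
# identical on screen, but one city value carries a trailing space.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_table (customer TEXT, name TEXT, city TEXT);
    INSERT INTO customer_table VALUES
        ('123', 'Jane', 'Paris'),
        ('123', 'Jane', 'Paris ');
""")

cur = conn.execute(
    "SELECT DISTINCT * FROM customer_table WHERE customer = ?", ("123",))
names = [d[0] for d in cur.description]
rows = cur.fetchall()

# Compare the two rows column by column and keep only the columns that
# differ; repr() makes trailing spaces and control characters visible.
first, second = rows[0], rows[1]
diffs = {name: (repr(a), repr(b))
         for name, a, b in zip(names, first, second) if a != b}
print(list(diffs))  # ['city']
```

With 250+ columns the loop does the same work; only the columns that actually differ end up in `diffs`.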

How do you complex join a number table with an actual table with many clauses dependent on the data from the number table?

I have a table of numbers (a PL/SQL collection containing some_table_line_ids passed in from a website).
Then I have some_table, which also has the columns config_data and config_state.
I want to pull in all lines whose table_id matches one of the table_ids in the number table.
I also want to pull in all lines that have the same config_data as each record pulled in from the first part.
So it's a parent/child relationship. This can be done with two FOR loops: one selecting a line by its id in a cursor, then another selecting each line whose config_data equals the parent's config_data. In each loop I perform data manipulation on each line.
I would like to combine both these into a single cursor having all table ids that I need.
What would that look like?
You just want to do a complicated join on different factors. Something like:
select st2.*
from numbers n join
some_table st
on st.table_id = n.table_id join
some_table st2
on st2.config_data = st.config_data
Quite possibly, you actually want:
select distinct st2.*
since you might otherwise have duplicates. Or, you might want:
select n.table_id, st.config_data, st2.*
That way you know which of the original values was responsible for bringing in each row.
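Here is the join above run end to end, as a sketch with Python's sqlite3 module standing in for Oracle. The `numbers` table plays the role of the PL/SQL collection, and the rows are hypothetical. It shows both effects described: row 4 comes in only as a "child" via shared config_data, and DISTINCT removes the copies produced when several parents share the same config_data.

```python
import sqlite3

# Hypothetical data: rows 1 and 2 share config_data 'A', rows 3 and 4
# share 'B', and row 5 is unrelated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_table (table_id INTEGER PRIMARY KEY,
                             config_data TEXT, config_state TEXT);
    INSERT INTO some_table VALUES (1, 'A', 'on'), (2, 'A', 'off'),
                                  (3, 'B', 'on'), (4, 'B', 'off'),
                                  (5, 'C', 'on');
    -- Stand-in for the collection of ids passed in from the website.
    CREATE TABLE numbers (table_id INTEGER);
    INSERT INTO numbers VALUES (1), (2), (3);
""")

join_sql = """
    FROM numbers n
    JOIN some_table st ON st.table_id = n.table_id
    JOIN some_table st2 ON st2.config_data = st.config_data
"""

# Without DISTINCT: parents 1 and 2 each bring in rows 1 and 2 again.
raw_count = conn.execute("SELECT COUNT(*) " + join_sql).fetchone()[0]

# With DISTINCT: each parent or child row appears once.
rows = conn.execute(
    "SELECT DISTINCT st2.* " + join_sql + " ORDER BY st2.table_id").fetchall()

print(raw_count)  # 6
print(rows)       # [(1, 'A', 'on'), (2, 'A', 'off'), (3, 'B', 'on'), (4, 'B', 'off')]
```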
You describe the array as a PL/SQL collection. If you use a SQL type instead, you can include it in the FROM clause by using the TABLE function.
create type some_table_line_id_nt as table of number;
Something like:
select s.*
from some_table s
join table(some_table_line_ids) t
on s.id = t.column_value
(I haven't offered a complete solution as you haven't given enough details of table structure and data.)
I solved the issue using START WITH and CONNECT BY PRIOR.

SQL Query for filtering columns returned

I want to return columns based on some metadata in another table. That is, I have my table, which contains 10 columns, and another table that lists those columns (denormalized) along with metadata about them.
For example:
Table - Car:
columns - Make, Model, Colour
and another table called "Flags", which has a row for each of the above columns, and each row has columns such as "IsSearchable" and "ShowOnGrid".
The query I want is one that returns all columns from the Car table that are flagged in the "Flags" table as "ShowOnGrid".
----EDIT
Apologies, I should have stated that this is on SQL Server 2008.
Also, I don't want to have to explicitly list the columns I'd like to return. That is, if I add a column to the Car table, then add it to the Flags table and declare it searchable, I don't want to have to state in the SQL query that I want to return that column; I want it to pull through automatically.
You need to use dynamic SQL; this can easily be done with a stored procedure.
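The core of the dynamic SQL approach is: read the metadata, build the column list as a string, then execute it. In SQL Server this would typically be done in T-SQL by concatenating QUOTENAME'd column names and running the string with sp_executesql. The sketch below shows the same idea with Python's sqlite3 module instead. It assumes a hypothetical Flags layout with a ColumnName column naming each Car column.

```python
import sqlite3

# Hypothetical schema from the question: a Car table plus a Flags table
# holding one row of metadata per Car column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Car (CarID INTEGER PRIMARY KEY, Make TEXT, Model TEXT,
                      Colour TEXT);
    INSERT INTO Car VALUES (1, 'Ford', 'Focus', 'Blue');
    CREATE TABLE Flags (ColumnName TEXT, IsSearchable INTEGER,
                        ShowOnGrid INTEGER);
    INSERT INTO Flags VALUES ('Make', 1, 1), ('Model', 1, 0),
                             ('Colour', 0, 1);
""")

def quote_ident(name):
    """Quote an identifier so a column name taken from Flags can't inject SQL."""
    return '"' + name.replace('"', '""') + '"'

# Step 1: read the metadata to decide which columns to return.
grid_cols = [r[0] for r in conn.execute(
    "SELECT ColumnName FROM Flags WHERE ShowOnGrid = 1 ORDER BY rowid")]

# Step 2: build the SELECT list from that metadata and run it. Adding a
# column to Car plus a matching Flags row changes the query automatically.
sql = "SELECT {} FROM Car".format(", ".join(quote_ident(c) for c in grid_cols))
cur = conn.execute(sql)
print([d[0] for d in cur.description])  # ['Make', 'Colour']
print(cur.fetchall())                    # [('Ford', 'Blue')]
```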
Something like this might work:
Select
D.CarID,
Case When D.ShowMake = 1 Then D.Make Else NULL End AS Make
...
From
(Select
C.CarID, C.Make, C.Model, C.Colour, F.IsSearchable, F.ShowOnGrid, F.ShowMake
From
Cars C
Inner Join
Flags F
On C.CarID = F.CarID) D
I didn't write out all the CASE expressions and don't know how many flags you're working with, but you can give it a try. It would require filtering out NULL values in your application. If you actually want the columns omitted based on the flag column's value, the other answer and comment are both right on: either use dynamic SQL or build your query in another language first.

SQL Server join and wildcards

I want to get the results of a left join between two tables that both have a column with the same name, the column on which I join. The following query is accepted as valid by the Import/Export Wizard in SQL Server, but it always gives an error. I have some more conditions, so the result wouldn't be too large. We're using SQL Server 2000, IIRC, and since we use an externally developed program to interact with the database (except for some information we can't retrieve that way), we cannot simply change the column name.
SELECT table1.*, table2.*
FROM table1
LEFT JOIN table2 ON table1.samename = table2.samename
At least, I think the column name is the problem, or am I doing something else wrong?
Do more columns than just your join key have the same name? If only your join key has the same name, then simply select one of them, since the values are equal except in the non-matching rows (where table2's copy will be NULL). You will have to enumerate all your other columns from one of the tables, though.
SELECT table1.samename, table1.othercolumns, table2.othercolumns
FROM table1
LEFT JOIN table2 ON table1.samename = table2.samename
You may need to explicitly list the columns from one of the tables (the one with fewer fields), and leave out the second instance of what would be the duplicate field.
select Table1.*, /* skip Table2.sameName */ Table2.Fld2, Table2.Fld3, Table2.Fld4... from
Since it's a common column, it APPEARS the query is trying to create it twice in the result set, which chokes your process.
Since you should never use SELECT *, simply replace it with the names of the columns you want. The join column has the same value (or NULL) on both sides of the join, so select only one of them: the one from table1, which will always have the value.
If you want to select all the columns from both tables, just use SELECT * instead of listing the tables separately. However, that will leave you with duplicate column names in the result set, so reading them by name will not work, and reading them by index is fragile: changing the columns in the database changes the result set, breaking any code that depends on the column ordinals.
Unfortunately the best solution is to specify exactly the columns you need and create aliases for the duplicates so they are unique.
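The sketch below puts the last few answers together: take the join key from table1 (always populated in a LEFT JOIN) and alias table2's copy so every result column has a unique name. It uses Python's sqlite3 module rather than SQL Server 2000; the tables, data and the alias table2_samename are hypothetical.

```python
import sqlite3

# Hypothetical tables that share the join column name "samename".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (samename INTEGER, a TEXT);
    CREATE TABLE table2 (samename INTEGER, b TEXT);
    INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO table2 VALUES (1, 'z');
""")

# Every output column gets a unique name: the join key comes from table1,
# and table2's copy is aliased so it can still be read by name.
cur = conn.execute("""
    SELECT table1.samename, table1.a,
           table2.samename AS table2_samename, table2.b
    FROM table1
    LEFT JOIN table2 ON table1.samename = table2.samename
    ORDER BY table1.samename
""")
names = [d[0] for d in cur.description]
rows = cur.fetchall()
print(names)  # ['samename', 'a', 'table2_samename', 'b']
print(rows)   # [(1, 'x', 1, 'z'), (2, 'y', None, None)]
```

The second row shows why table1's key is the one to keep: for a non-matching row, table2's copy is NULL.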
I quickly get the column headings by setting the query to text mode and copying the top row ...