I'm trying to build a calculated table containing the mapping between different datasets. The keys I'm using for the lookup can be repeated, and I would like to generate the list of all possible combinations. In SQL, this would be a join that generates additional rows. I'm looking to do the same in DAX with a calculated table; however, LOOKUPVALUE can only return a single value and raises an error if it finds more than one match:
A table of multiple values was supplied where a single value was expected
I feel like it should be possible with SUMMARIZECOLUMNS and a virtual relationship; however, when I try this, I also get an error:
=SUMMARIZECOLUMNS (
Label[LabelText],
User[Dim_CustomerUser_Skey],
Computer[Dim_Computer_Skey]
,FILTER ( Computer, Label[Device] = Computer[Device name])
, FILTER ( User, Label[UserName] =User[UserName])
)
but this also gives:
Calculated table 'CalculatedTable 1': A single value for column 'Device' in table 'Label' cannot be determined. This can happen when a measure formula refers to a column that contains many values without specifying an aggregation such as min, max, count, or sum to get a single result
How do I produce a calculated table for a many-to-many mapping?
In SQL, there are joins. Luckily for us, DAX provides joins between tables as well.
But first of all, which function should you use for which join? Here is the mapping:
Left Outer: GENERATEALL, NATURALLEFTOUTERJOIN
Right Outer: GENERATEALL, NATURALLEFTOUTERJOIN (with the two tables swapped)
Full Outer: CROSSJOIN, GENERATE, GENERATEALL
Inner: GENERATE, NATURALINNERJOIN
Left Anti: EXCEPT
Right Anti: EXCEPT
See: https://www.sqlbi.com/articles/from-sql-to-dax-joining-tables/
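To make the row-multiplying behaviour concrete, here is a minimal plain-Python model of what GENERATE and GENERATEALL do: the second argument is evaluated once per row of the first table, and each outer row is paired with every row it returns. This is only a sketch of the semantics, not DAX; the helper `generate`, the sample rows and the key values are all made up for illustration.

```python
# Plain-Python model of DAX GENERATE / GENERATEALL semantics (not DAX itself).
# All table contents and the helper name are hypothetical sample data.

def generate(outer_rows, inner_for_row, keep_unmatched=False):
    """Pair each outer row with every row returned by inner_for_row(outer).

    keep_unmatched=False models GENERATE (inner-join behaviour).
    keep_unmatched=True models GENERATEALL (left-outer-join behaviour).
    """
    result = []
    for outer in outer_rows:
        matches = inner_for_row(outer)
        if matches:
            # One output row per match, so repeated keys multiply rows.
            result.extend({**outer, **inner} for inner in matches)
        elif keep_unmatched:
            # Outer row kept; the inner columns are simply absent (blank).
            result.append(dict(outer))
    return result


labels = [
    {"LabelText": "Finance", "Device": "PC-1"},
    {"LabelText": "HR", "Device": "PC-2"},
    {"LabelText": "Lab", "Device": "PC-9"},      # no matching computer
]
computers = [
    {"DeviceName": "PC-1", "ComputerKey": 10},
    {"DeviceName": "PC-1", "ComputerKey": 11},   # repeated key
    {"DeviceName": "PC-2", "ComputerKey": 20},
]


def computers_for(label):
    """Plays the role of FILTER(Computer, ...) evaluated for one Label row."""
    return [c for c in computers if c["DeviceName"] == label["Device"]]


inner_join = generate(labels, computers_for)
left_join = generate(labels, computers_for, keep_unmatched=True)

for row in inner_join:
    print(row["LabelText"], row["ComputerKey"])
```

The "Finance" label appears twice in `inner_join` because its key matches two computers, which is the many-to-many expansion the question asks for.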
I have two tables that I am left joining together. The first table has transactional-level detail, which causes the key I use to join to the second table to be duplicated. When I left join the second table, the measure "company_spend" is highly inflated.
I need a way to keep only a single value of the duplicated data. My thought was to run a DISTINCT on only those columns, but as far as I can see BigQuery does not support DISTINCT on only some of the columns rather than all of them.
SELECT UPPER(cwnextt.Current_Contract_Number) AS Current_Contract_Number,
UPPER(cwnextt.Replacement_Contract_Number) AS Replacement_Contract_Number,
UPPER(cwnextt.Current_Contract_Name) AS Current_Contract_Name,
UPPER(cwnextt.Supplier_Top_Parent_Entity_Code) AS Supplier_Top_Parent_Entity_Code,
UPPER(cwnextt.Supplier_Top_Parent_Name) AS Supplier_Top_Parent_Name,
UPPER(cwnextt.company_Entity_Code) AS company_Entity_Code,
UPPER(cwnextt.Facility_Name) AS Facility_Name,
smart.company_Spend AS companySpend
FROM `test_etl_field.contracts_with_member_entity_codes_test_view_2` cwnextt
--this table is what is causing the table below to duplicate,
--but I need all of this data as well in its current format.
LEFT JOIN `test.trans_analysis` tsa
ON TRIM(UPPER(cwnextt.company_entity_code)) = TRIM(UPPER(tsa.company_entity_code))
AND TRIM(UPPER(cwnextt.Supplier_Top_Parent_Entity_Code)) = TRIM(UPPER(tsa.manufacturer_top_parent_entity_code))
AND TRIM(UPPER(cwnextt.Current_Contract_Name)) = TRIM(UPPER(tsa.contract_category))
AND cwnextt.spend_period_yyyyqmm = tsa.spend_period_yyyyqmm
--this table contains "company_spend" which is now duplicated
LEFT JOIN `test_etl_field.ecr_smart_data` smart
ON smart.company_entity_code = cwnextt.company_entity_code
AND (smart.contract_number = cwnextt.current_contract_number
OR smart.contract_number = cwnextt.replacement_contract_number)
AND smart.month_key = cwnextt.spend_period_yyyyqmm
If something can be created that will keep company_spend from duplicating on the second left join, that is what I am after.
I'm not sure I understand all the details of your problem, but here's a fact from the BigQuery docs:
SELECT DISTINCT
A SELECT DISTINCT statement discards duplicate rows
and returns only the remaining rows.
You can't apply DISTINCT to specific columns only, because it wouldn't make sense. Say you have 4 columns and call DISTINCT on 3 of them: what is SQL supposed to do with the last one?
You must tell SQL which value to keep for the remaining column, and GROUP BY is the right solution here.
So if you want to:
Remove a column that has been duplicated: just adjust your SELECT to return only the columns you want.
Remove rows that have the same value in specific columns: I would suggest a GROUP BY on the targeted columns, taking the aggregation you want (first, avg, sum or whatever) for the remaining ones.
Remove the value from a row if another row has the same value: you may not want to do that. A row has to keep its value, and once it's removed you won't get it back. Besides, it's the same problem: which row do you want to keep?
Hope this helps! Feel free to clarify your problem if you want more specific answers.
While I couldn't resolve this issue in SQL, I used a FIXED LOD expression in Tableau to aggregate the data past the duplicates so the end user could visualize the output accurately. Not ideal, but the SQL route wasn't making sense.
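The GROUP BY suggestion can be sketched with Python's built-in sqlite3 module. The table `joined` stands in for the already-joined result, and its columns and rows are made up; picking MAX assumes the duplicated rows carry the same spend value, which you would need to confirm for your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the result of the join, with one duplicated row.
    CREATE TABLE joined (contract_number TEXT, facility_name TEXT, company_spend REAL);
    INSERT INTO joined VALUES
        ('C1', 'NORTH', 100.0),
        ('C1', 'NORTH', 100.0),
        ('C2', 'SOUTH', 250.0);
""")

# Group on the columns that identify a row and tell SQL explicitly which
# value to keep for the remaining column (here: MAX, an assumption).
rows = conn.execute("""
    SELECT contract_number, facility_name, MAX(company_spend) AS company_spend
    FROM joined
    GROUP BY contract_number, facility_name
    ORDER BY contract_number
""").fetchall()

for row in rows:
    print(row)
```

If the duplicates can legitimately differ, MAX silently hides that, so a different aggregate (or a check that MIN equals MAX per group) may be the safer choice.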
I want to retrieve some data from a table based on two columns; see the table structure below.
Update
I want the output data based on two conditions:
1. The code value is 'Web' or 'Offline'.
2. The Memo column has the same data as the Pre_memo column.
Output should be as shown below
So far I got the output by using the same table twice, but I want to get the result using the table only once to avoid performance issues, as this table holds a huge amount of data.
select distinct OrderTable.Memo,
max(OrderTable.Memo_Date) as Date1,
max(ot.Pre_Memo_Date) as Date2
from OrderTable,
OrderTable ot
where OrderTable.code in ('Web')
and ot.code in ('Offline')
and OrderTable.Memo = ot.Pre_Memo
group by OrderTable.Memo
Can anyone help with this? I'd like to use OrderTable only once in the query and filter on the Memo and Pre_Memo columns, since they hold the same data.
You can use UNION ALL and do conditional aggregation:
select Memo, max(case when code = 'Offline' then Date end) as Memo_date,
max(case when code = 'Web' then Date end) as Per_Memo_date
from (select Date, 'Web' as code, Pre_memo as Memo
from OrderTable o
where code = 'Web'
union all
select Date, 'Offline', Memo
from OrderTable o
where code = 'Offline'
) t
group by Memo;
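As a sanity check on the conditional-aggregation idea, here is a small sqlite3 sketch. It is a variant rather than the query above: it takes the column names (Memo, Pre_Memo, Memo_Date, Pre_Memo_Date) and the matching rule (Web.Memo = Offline.Pre_Memo) from the question's self-join, groups on a CASE-derived key so the table is read once, and uses HAVING to mimic the inner join. The sample rows are made up, and it assumes the key columns are never NULL on the rows of interest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data; the real table structure is not shown in the question.
    CREATE TABLE OrderTable (code TEXT, Memo TEXT, Pre_Memo TEXT,
                             Memo_Date TEXT, Pre_Memo_Date TEXT);
    INSERT INTO OrderTable VALUES
        ('Web',     'M1', NULL, '2020-01-05', NULL),
        ('Web',     'M1', NULL, '2020-01-09', NULL),
        ('Offline', 'X7', 'M1', NULL, '2020-02-01'),
        ('Web',     'M2', NULL, '2020-03-01', NULL),   -- no Offline partner
        ('Offline', 'X8', 'M3', NULL, '2020-04-01');   -- no Web partner
""")

# The question's approach: the table is referenced twice.
self_join = conn.execute("""
    SELECT w.Memo, MAX(w.Memo_Date), MAX(o.Pre_Memo_Date)
    FROM OrderTable w
    JOIN OrderTable o ON w.Memo = o.Pre_Memo
    WHERE w.code = 'Web' AND o.code = 'Offline'
    GROUP BY w.Memo
""").fetchall()

# Single reference: derive one matching key per row, aggregate each side
# conditionally, and keep only keys that have both a Web and an Offline row.
single_scan = conn.execute("""
    SELECT CASE WHEN code = 'Web' THEN Memo ELSE Pre_Memo END AS memo_key,
           MAX(CASE WHEN code = 'Web' THEN Memo_Date END) AS Date1,
           MAX(CASE WHEN code = 'Offline' THEN Pre_Memo_Date END) AS Date2
    FROM OrderTable
    WHERE code IN ('Web', 'Offline')
    GROUP BY CASE WHEN code = 'Web' THEN Memo ELSE Pre_Memo END
    HAVING COUNT(CASE WHEN code = 'Web' THEN 1 END) > 0
       AND COUNT(CASE WHEN code = 'Offline' THEN 1 END) > 0
""").fetchall()

print(self_join)
print(single_scan)
```

Both queries return the same single row on this sample. Whether the single-reference form is actually faster depends on the engine and the indexes, so it is worth comparing execution plans before committing to it.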
"I wanted to retrieve some data from a table based on two columns see the below table structure"
Providing a sample is sufficient to illustrate the problem (and it is desirable to do so on SO) but it is not sufficient and thus not a replacement for defining the problem, which you have failed to do.
Absent such a definition of the problem, we can only guess what you're trying to achieve. For example:
From the subset of tuples that have 'Offline' for the 'code' value, take the MAX() 'Date' value per appearing value of 'Memo'.
Match that (using some matching condition) to the subset of tuples that have 'Web' for the 'code' value and retain the 'Date' value from those as 'Memo_date' in the result set.
The matching condition being that the 'Memo' value of [a tuple in] the former is equal to the 'Pre_memo' value in [the matching tuple in] the latter.
If all that is correct, then that explains why it is impossible to do this in SQL without having at least two references. You cannot avoid doing some kind of matching, and matching by definition takes two distinct things to match (even if the two distinct things are distinct subsets of one and the same thing). In fact it is almost certainly a fundamental design mistake for you to have those two distinct things in one single table, probably under the totally misguided belief that "having everything in one table makes things easier".
"So far i got the output by using same table two times but i wanted to get the output result by using the table only 1 time to avoid performance related issues as this table is having huge data"
From the way you have presented the question, I suspect that you were hoping for some means to exploit the fact that those 'Offline' tuples are "the next" after a 'Web' tuple, and that you could write the SQL in such a way that the engine could then derive a sort of "single pass" algorithm (which you probably assume would go faster).
It does not work like that. SQL tables have no inherent ordering and as a consequence there simply ain't no such thing as "the next" in a table.
I'm trying to achieve 2 joins. If I run the 1st join alone it pulls 4 lots of results, which is correct. However, when I add the 2nd join, which queries the same reference table using the results from the select statement, it pulls in additional results. Please see attached. The squared section should not be returned.
So I removed the 2nd join to try and explain better. See pic2. I'm trying to get another column which looks up InvolvedInternalID against the initial reference table IRIS.Practice.idvClient.
Your database is simply doing as you tell it. When you add in the second join (confusingly aliased as tb1 in a 3-table query), the database finds matching rows that obey the predicate/truth statement in the ON part of the join.
If you don't want those rows in there, then one of two things must be the case:
1) The truth you specified in the ON clause is faulty; for example, saying SELECT * FROM person INNER JOIN shoes ON person.age = shoes.size is faulty: two people with age 13 and two shoes with size 13 will produce 4 results, and shoe size has nothing to do with age anyway.
2) There were rows in the joined table that didn't apply to the results you were looking for, but you forgot to filter them out with a WHERE clause (or an additional restriction in the ON clause). For example, a table holds all historical data as well as current data, and the current record is the one with a NULL in the DeletedOn column. If you forget to say WHERE deletedon IS NULL, then your data will multiply as all the past rows that don't apply to your query are brought in.
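Both cases are easy to reproduce with sqlite3. The `person` and `shoes` tables follow the example above; the `client` and `address` tables, their columns and all sample rows are made up to illustrate the history-table case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Case 1: a join predicate that matches unrelated rows.
    CREATE TABLE person (name TEXT, age INTEGER);
    CREATE TABLE shoes (model TEXT, size INTEGER);
    INSERT INTO person VALUES ('Ann', 13), ('Bob', 13);
    INSERT INTO shoes VALUES ('Sneaker', 13), ('Boot', 13);

    -- Case 2: a table holding history as well as the current record
    -- (hypothetical tables; the current row has deletedon IS NULL).
    CREATE TABLE client (client_id INTEGER, name TEXT);
    CREATE TABLE address (client_id INTEGER, city TEXT, deletedon TEXT);
    INSERT INTO client VALUES (1, 'Ann');
    INSERT INTO address VALUES
        (1, 'Leeds', '2019-06-30'),
        (1, 'York',  '2021-01-15'),
        (1, 'Bath',  NULL);
""")

# Two people aged 13 joined to two shoes of size 13 give 2 x 2 rows.
faulty = conn.execute(
    "SELECT person.name, shoes.model FROM person "
    "INNER JOIN shoes ON person.age = shoes.size"
).fetchall()

address_sql = (
    "SELECT c.name, a.city FROM client c "
    "JOIN address a ON a.client_id = c.client_id"
)
unfiltered = conn.execute(address_sql).fetchall()
filtered = conn.execute(address_sql + " WHERE a.deletedon IS NULL").fetchall()

print(len(faulty), len(unfiltered), len(filtered))
```

The single client comes back three times until the historical rows are filtered out, which is the same multiplication effect described in point 2.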
Don't alias tables with tbX, tbY, etc. Make the names meaningful! Not only do aliases like tbX have no relation to the original table name (so you encounter tbX and then have to go searching the rest of the query to find where it's declared so you can say "ah, it's the addresses table"), but in this case you join idvclient in twice and give the two copies unhelpful aliases like tb1 and tb3, when really you should have aliased them with something that describes the relationship between them and the rest of the query's tables.
For example, ParentClient and SubClient, or OriginatingClient and HandlingClient, would be better names if these tables are in some relationship with each other.
Whatever the purpose of joining this table in twice is, alias it in relation to that purpose. It may make what you've done wrong easier to spot, for example "oh, of course, I'm missing a WHERE parentclient.type = 'parent'" (or WHERE handlingclient.handlingdate IS NOT NULL, etc.).
The first step to wisdom is by calling things their proper names
I am trying to calculate hours flowing in and out of a cost center. When the cost center lends out an employee for an hour it's +1 and when they borrow an employee for an hour it's -1.
Right now I'm using a query that says
select
columns
from dbo.table
where EmployeeCostCenter <> ProjectCostCenter
So when ProjectCostCenter = ID_CostCenter, it returns +HoursQuantity.
Then I set ID_CostCenter = EmployeeCostCenter, and where ID_CostCenter = EmployeeCostCenter it takes -HoursQuantity.
That works fine. The problem is that when I import it into Spotfire, I can't filter on the main table even after I added the table relations. Can anyone explain why?
I can upload the actual code if needed, but I use 4 queries and a couple of them are quite lengthy. The main table, a temp table to calculate incoming hours, and a temp table to calculate outgoing hours are the only ones involved in this problem I think.
(moved to answer to avoid lengthy discussion)
Essentially, data relations are used to propagate filtering / marking between different datasets. Just like in an RDBMS, the relation is what Spotfire uses as the link between datasets; it's the same as the column or columns you join on. Thus, any column that you wish to filter in TableA and have the result set limited in TableB (or vice versa) must be a relation.
Column matches aren't related columns, but are associated for aggregations, category axis, etc within each visualization. So if TableA has "amount" and TableB has "amount debit" and you wanted to use both of these in an expression, say Sum([TableA].[amount],[TableB].[amount debit]), they would need to be matched in order to not produce erroneous results.
Lastly, once you set up your relations, you should check your filter panel to set up how you want the filtering to work. You can have the rows included, excluded, or ignored altogether. Here is a link explaining that.
Got a question regarding SQL and ColdFusion: I can't write the SQL properly so that it won't repeat the variables. So far I've got:
<cfquery name="get_partner_all" datasource="#dsn#">
SELECT
C.COMPANY_ID,
C.FULLNAME,
CP.MOBILTEL,
CP.MOBIL_CODE,
CP.IMCAT_ID,
CP.COMPANY_PARTNER_TEL,
CP.COMPANY_PARTNER_TELCODE,
CP.COMPANY_PARTNER_TEL_EXT,
CP.MISSION,
CP.DEPARTMENT,
CP.TITLE,
CP.COMPANY_PARTNER_SURNAME,
CP.COMPANY_PARTNER_NAME,
CP.PARTNER_ID,
CP.COMPANY_PARTNER_EMAIL,
CP.HOMEPAGE,
CP.COUNTY,
CP.COUNTRY,
CP.COMPANY_PARTNER_ADDRESS,
CP.COMPANY_PARTNER_FAX,
CC.COMPANYCAT,
CRM.BAKIYE,
CRM.BORC,
CRM.ALACAK
FROM
COMPANY_PARTNER CP,
COMPANY C,
COMPANY_CAT CC,
#DSN2_ALIAS#.COMPANY_REMAINDER_MONEY CRM
WHERE
C.COMPANY_ID = CP.COMPANY_ID
AND C.COMPANY_ID = CRM.COMPANY_ID
AND C.COMPANYCAT_ID = CC.COMPANYCAT_ID
</cfquery>
As you can see, C.COMPANY_ID is referenced twice, so the variable is also shown twice, but I need this (CRM) definition to display some money figures.
Can anyone show me how to define it differently so that the output of this code won't repeat the variables?
I assume you mean that you get multiple columns in the result set, each with the name "COMPANY_ID". The solution to this is to specify specific columns from all of the tables, instead of SELECT * (not just for the COMPANY_CAT table, alias CC).
If you're getting "repeated" rows, then you need to examine the contents of these rows. What's happening there is that one or more rows from another table are matching one row from the "COMPANY" table. Each matching pair of rows generates a row in the output. Now that you've expanded your column list, compare a pair of rows which have the same COMPANY_ID: in which columns do they differ? If it's in, say, the last 3 columns, then there are multiple rows in CRM which match the same COMPANY_ID.
Once you've identified the other table that is causing duplicates to occur, you need to decide how to limit them - should you be aggregating values from that table (e.g. SUM or MAX), or is there a way to further refine which row from the other table you want to match to the row in COMPANY.
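The two steps above (find the table that matches more than once, then decide how to collapse it) can be sketched with sqlite3. The table and column names are simplified stand-ins for COMPANY and COMPANY_REMAINDER_MONEY, the rows are made up, and SUM is only one possible choice of aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical stand-ins for COMPANY and COMPANY_REMAINDER_MONEY.
    CREATE TABLE company (company_id INTEGER, fullname TEXT);
    CREATE TABLE crm (company_id INTEGER, bakiye REAL);
    INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO crm VALUES (1, 10.0), (1, 15.0), (2, 7.0);
""")

# Step 1: which keys match more than one row in the suspected table?
multi = conn.execute("""
    SELECT company_id, COUNT(*) AS matches
    FROM crm
    GROUP BY company_id
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: collapse that table to one row per key in a derived table,
# then join, so each company appears exactly once.
rows = conn.execute("""
    SELECT c.company_id, c.fullname, m.total_bakiye
    FROM company c
    JOIN (SELECT company_id, SUM(bakiye) AS total_bakiye
          FROM crm
          GROUP BY company_id) m
      ON m.company_id = c.company_id
    ORDER BY c.company_id
""").fetchall()

print(multi)
print(rows)
```

Running the step 1 query against each joined table in turn shows which one is responsible for the repeated rows.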
At a guess though, I'd speculate that one company could have multiple partners...
Don't use select table.*. Instead, name each column explicitly and don't repeat columns, as follows:
select
c.company_id,
c.blah_blah,
-- don't select cp.company_id
cp.foo_bar,
-- etc
You just need to remove * and replace it with a list of column names. It is always advisable to write a column list instead of * from a performance point of view. Also, if you add a column to the database table and use * to get the data, the query result will sometimes not reflect the new column due to caching.
In your case, just keep company_id from any one of the tables. That's it.