Is there a way that after comparing two tables and then use the Case function?
I am trying to have a new column base on Exists transformation. In sql I do it like this:
(isnull (select 'YES' from sales where salesperson = t1.salesperson group by salesperson), 'NO')) AS registeredSales
T1 is personal.
Or should I include the table into the stream of the joins and then use the case() function to compare the two columns?
If there's another way to work around to compare these two streams, I would be pleased to hear.
Thanks.
Flat files in a datalake can also be compared. We can use the derived column in dataflow to gernerate a new column.
I create a dataflow demo cotains two sources: CustomerSource(customer.csv stored in datalake2) and SalesSource(sales.csv stored in datalake2 and it contains only one column) as follows
Then I join the two sources with the column CustomerId
Then I use Select activity to give an alias to the CustomerId from SalesSource
In the DerivedColumn, I select the Add column and enter the expression iifNull(SalesCustomerID, 'NO', 'YES') to generate a new column named 'registeredSales' as follows:
The last column of the result shows:
Related
I have a dataset that contains several tables that have suffixes in their name:
table_src1_serie1
table_src1_serie2
table_src2_opt1
table_src2_opt2
table_src3_type1_v1
table_src3_type2_v1
table_src3_type2_v2
I know that i can use this type of queries in BQ:
select * from `project.dataset.table_*`
to get all the rows from theses different tables.
What i am trying to achieve is to have a column that will contain for instance the type of source (src1, src2, src3)
Assuming the schema of all tables the same - you can add below to your select list (for BigQuery Standard SQL)
SPLIT(_TABLE_SUFFIX, '_')[SAFE_OFFSET(0)] AS src
My overall goal is to create new tables by selecting columns in existing tables with certain patterns/tags in their column names. This is in SQL Server.
For example, I have a common need to get all contact information out of a company table, and into its own contact table.
I haven't been able to find a programmatic approach in SQL to express excluding columns from a SELECT statement based on string.
When looking to options like the COL_NAME function, those require an ID arg, which kills that option for me.
Wishing there was something built in that could work like the following:
SELECT *
FROM TABLE
WHERE COL_NAME() LIKE 'FLAG%'
Any ideas? Open to anything! Thanks!!
One trick could be to firstly using the following code to get the column names you desire:
select * from information_schema.columns
where table_name='tbl' and column_name like 'FLAG%'
Then concatenate them into a comma-delimited string and finally create a dynamic sql query using the created string as the column names.
It is straight forward to create a calculated field in a table that uses data IN the table... due to the fact that the expression builder is straight forward to use. However, it appears to me that the expression builder for the calculated field only works with data IN the table;
i.e: expression builder in table MYTABLE works with fields FIELD1.MYTABLE, FIELD2.MYTABLE etc.
Inventory Problem
My problem is that I have two 'count' fields that result from my queries that apply to INPUTQUERY and OUTPUTQUERY (gives me a count of all input data added and a count of all output data added) and now I want to subtract the two to get a stock.
I can't link the table that was created from my query because it wont be able to continually update do the relationship itself, and thus i'm stuck either using the expression builder/SQL.
First question:
Is it possible to have the expression builder reference data from other tables?
i.e expressionbuilder for:
MAINTABLE CALCULATEDFIELD.MAINTABLE = INPUTSUM.INPUTTABLE - OUTPUTSUM.OUTPUTTABLE
(which gives a difference of the two)?
Second question:
if the above isn't possible, can I do this through an SQL code ?
i.e
SELECT(data from INPUTSUM)
FROM(INPUTTABLE)
-
SELECT(data from OUTPUTSUM)
FROM(OUTPUTTABLE)
Try this:
SELECT SUM(T.INPUTSUM) - SUM(T.OUTPUTSUM) AS RESULTSUM
FROM
(
SELECT INPUTSUM, 0 AS OUTPUTSUM
FROM INPUTTABLE
UNION
SELECT 0 AS INPUTSUM, OUTPUTSUM
FROM OUTPUTTABLE
) AS T
I am stuck in the following query. This was working properly on mySQL but it gives error on MSSQL-2005. The main purpose of the query is to copy data from one table to another without duplicates based on multiple columns comparison from both tables.
I can do this to compare one column for duplication, but I can't do when I compare more then one column for duplication.
Here is my query.
INSERT INTO eBayStockTaking (OrderLineItemID,Qty,SKU,SubscriberID,eBayUserID)
SELECT OrderLineItemID,Qty,SKU,SubscriberID,eBayUserID
FROM tempEBayStockTaking WHERE (OrderLineItemID,SubscriberID,eBayUserID)
Not In (SELECT OrderLineItemID,SubscriberID,eBayUserID FROM eBayStockTaking)
Note: I have been through many similar questions but all in vain.
Thanks
Rather try NOT EXISTS
Something like
INSERT INTO eBayStockTaking (OrderLineItemID,Qty,SKU,SubscriberID,eBayUserID)
SELECT OrderLineItemID,
Qty,
SKU,
SubscriberID,
eBayUserID
FROM tempEBayStockTaking t
WHERE Not EXISTS (
SELECT *
FROM eBayStockTaking e
WHERE e.OrderLineItemID = t.OrderLineItemID
AND e.SubscriberID = t.SubscriberID
AND e.eBayUserID = t.eBayUserID)
)
I know MySQL allows Row Subqueries, nut SQL Server does not allow this.
I am using Pentaho Spoon to do some transformation. I am using 'Table Input' and joining multiple tables to get final output table.
I need to achieve:
SELECT COUNT(distinct ID)
FROM TBLA join TBLB ON TBLA.ID=TBLB.ID
WHERE
TBLA.ID=334
AND TBLA.date = '2013-1-9'
AND TBLB.date BETWEEN '2012-11-15' AND '2013-1-9';
I am manually inserting '2012-11-15' but I am using Get System Data to insert '2012-1-9'. I am using 1 Get System Data.
My query is:
SELECT COUNT(distinct ID)
FROM TBLA join TBLB ON TBLA.ID=TBLB.ID
WHERE
TBLA.ID=334
AND TBLA.date='?'
AND TBLB.date BETWEEN '2012-11-15' AND '?';
I get error message in Table Input saying No value specified for parameter 2
Any suggestion will be appreciated.
Thank you.
Simple one this; You need to "duplicate" the system date. So add another line in "get system data" called "date2" or something, make it the same as the first line, and then it will fill in the 2nd parameter or ?
OR simply change the query to say between '2012-11-15' and TBLA.date
then you dont need the 2nd parameter
Personally I prefer the pattern of a Get System Info/Add Constants step to create one row with multiple columns that feeds into a Database Join step. Then you replace parameters in your query with columns instead of rows, and you can specify a column more than once.