Left Joining Tables in QlikView - qlikview

I have loaded two tables in QlikView, TEAM and RESOURCE, and stored them in QVD format on disk. The tables can be seen in the Table Viewer.
Now I want to make another table, TEAM_RESOURCE, by left joining the two initial tables, and I am having problems with that. What is the correct syntax? Is it better to use the tables loaded directly in QV, or the same tables stored in QVD format?
On Google I did not find best practices nor straightforward syntax examples.

Store QVDs when you need to keep information for later use, or when multiple models/QVWs use the same data. For the latter, use a generator QVW to extract and store the QVDs, and load them in the models.
If you only have one model, rather don't store QVDs, as it defeats the purpose.
The correct syntax for joining tables within QlikView is the following:
LOAD
Field1,
Field2,
Field3,
KeyField
FROM aaa;
LEFT JOIN
LOAD
Field4,
Field5,
Field6,
KeyField
FROM bbb;
The above would left join the last table to the previous one, on the fields that have exactly the same name (case sensitive).
The example below joins to a specific, previously loaded table:
Table1:
LOAD
Field1,
Field2,
Field3,
KeyField
FROM aaa;
LOAD
[Field 6],
[Field 7]
FROM ccc;
LEFT JOIN (Table1)
LOAD
Field4,
Field5,
Field6,
KeyField
FROM bbb;
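To make the matching rule concrete (the join uses every field whose name is identical in both tables, compared case-sensitively), here is a minimal Python sketch of the semantics. It is not QlikView script and not how QlikView is implemented; the function name `left_join` and the sample rows are hypothetical, and every row in a table is assumed to have the same fields.

```python
def left_join(left_rows, right_rows):
    """Left join two lists of dicts on all identically named fields.

    Field names are compared case-sensitively, so "KeyField" and
    "keyfield" would NOT be treated as the same field.
    """
    left_fields = set().union(*(row.keys() for row in left_rows))
    right_fields = set().union(*(row.keys() for row in right_rows))
    keys = sorted(left_fields & right_fields)    # shared names = join keys
    extra = sorted(right_fields - left_fields)   # fields added by the join

    # Index the right-hand rows by their key values for quick lookup.
    index = {}
    for row in right_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    result = []
    for row in left_rows:
        matches = index.get(tuple(row[k] for k in keys))
        if matches:
            for match in matches:
                result.append({**row, **{f: match[f] for f in extra}})
        else:
            # No match: keep the left row and fill the new fields with None.
            result.append({**row, **{f: None for f in extra}})
    return result


team = [{"KeyField": 1, "Field1": "A"}, {"KeyField": 2, "Field1": "B"}]
resource = [{"KeyField": 1, "Field4": "x"}, {"KeyField": 3, "Field4": "y"}]
joined = left_join(team, resource)
print(joined)
```

Every row of the first table survives; the row with KeyField 2 has no partner, so its Field4 is empty.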

Related

How to JOIN ON something that isn't an 'EQUAL' value?

I was wondering how to JOIN on something that isn't an equals sign. For example, I have a few tables, all with IDs, and I can easily do the following (for equals):
LEFT JOIN ON ID1 = ID2
The above example works perfectly when columns have an exact match.
But some columns, instead of having a single ID, have multiple IDs and a weird separator, for example:
Table A
ID
ID7523
ID8891
ID7463
ID5234
ID7562
As you can see, Table A has individual IDs only - works great for exact join matches (=). There are no "splits" in table A, all exact matches.
TableB
ID
ID5234 -- ID7562
ID7523
ID8891
ID7463
ID5234 -- ID7562
ID7562 -- ID5234
There's a space and two dashes and another space between some of these IDs, called 'splits', and to make matters worse, sometimes they list one ID first, sometimes they list it last (not sure if that matters yet).
I do not have the ability to edit any of the tables.
Is there any way to join the ones with the dashes also?
Thanks!
LEFT JOIN ID1 -- ID2
Received error: An expression of non-boolean type specified in a context where a condition is expected
At this point, I'm not worried about all of the logic, but just connecting the tables together.
Fuzzymatch thinking...
Is it possible to think outside the box and use something like a JOIN where the ON has a CONTAINS or LIKE statement?
UPDATE... it is possible.
Ref: Using JOIN Statement with CONTAINS function
That is 100% possible, although inefficient from a querying perspective.
Firstly, a JOIN implies equality, so something like JOIN ON ColumnA LIKE ColumnB is not going to be permissible (at least not with ANSI SQL; there may be some proprietary commands I'm not aware of). What you can do, however, is create a brand new set for Table2 in memory, including a user-defined Column B, and use this new altered foreign key to JOIN your tables.
So for instance instead of:
SELECT TABLE1.*, TABLE2.*
FROM TABLE1
JOIN TABLE2 ON ID1 = ID2
Do something like:
SELECT TABLE1.*, TABLE2_MODIFIED.*
FROM TABLE1
JOIN (SELECT TABLE2.*, LEFT(ID2, 6) AS new_id FROM TABLE2) TABLE2_MODIFIED ON ID1 = new_id
So what this does is create a temporary in-memory subset of TABLE2 (called a derived table) with a user-defined field that trims everything to the right of the first 6 characters of the ID2 field. At that point you have two keys that are ready for a typical JOIN.
If the RDBMS type you are using doesn't have a LEFT function, see if SUBSTRING, TRIM or even a CASE function will work for you. But, ultimately, if you need to join two sets and your foreign keys aren't equal, you want to redefine one of your sets to make them equal as needed.
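The derived-table idea above can be tried out with a small sketch in Python using the standard-library `sqlite3` module. SQLite is only a stand-in for whatever RDBMS the question uses; it has no LEFT function, so `substr(ID2, 1, 6)` plays that role here. The table layout and the handful of rows are adapted from the question's sample values and are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (ID1 TEXT);
    CREATE TABLE TABLE2 (ID2 TEXT);
    INSERT INTO TABLE1 VALUES ('ID7523'), ('ID8891'), ('ID7463'),
                              ('ID5234'), ('ID7562');
    INSERT INTO TABLE2 VALUES ('ID5234 -- ID7562'), ('ID7523'), ('ID8891'),
                              ('ID7463'), ('ID7562 -- ID5234');
""")

# The derived table adds new_id (the first 6 characters of ID2),
# which can then be compared to ID1 with a plain equality join.
rows = conn.execute("""
    SELECT TABLE1.ID1, TABLE2_MODIFIED.ID2
    FROM TABLE1
    LEFT JOIN (SELECT TABLE2.*, substr(ID2, 1, 6) AS new_id
               FROM TABLE2) AS TABLE2_MODIFIED
      ON TABLE1.ID1 = TABLE2_MODIFIED.new_id
    ORDER BY TABLE1.ID1, TABLE2_MODIFIED.ID2
""").fetchall()

for row in rows:
    print(row)
```

Note that this matches only the first ID of each split value; matching the second ID as well would need a second derived key taken from the end of the string.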

BigQuery how to automatically handle "duplicate column names" on left join

I am working with a dataset of tables that (a) often requires joining tables together, but also (b) frequently has duplicate column names. Any time I write a query along the lines of:
SELECT
t1.*, t2.*
FROM t1
LEFT JOIN t2 ON t1.this_id = t2.matching_id
...I get the error Duplicate column names in the result are not supported. Found duplicate(s): this_col, that_col, another_col, more_cols, dupe_col, get_the_idea_col
I understand that with BigQuery it is better to avoid using * when selecting from tables. However, my data tables aren't too big, my BigQuery budget is high, and doing these joins with all columns helps significantly with data exploration.
Is there any way BigQuery can automatically handle / rename columns in these situations (e.g. prefix the column with the table name), as opposed to not allowing the query altogether?
Thanks!
The simplest way is to select records rather than columns:
SELECT t1, t2
FROM t1 LEFT JOIN
t2
ON t1.this_id = t2.matching_id;
This is pretty much what I do for ad hoc queries.
If you want the results as columns and not records (they don't look much different in the results), you can use EXCEPT:
SELECT t1.* EXCEPT (duplicate_column_name),
t2.* EXCEPT (duplicate_column_name),
t1.duplicate_column_name as t1_duplicate_column_name,
t2.duplicate_column_name as t2_duplicate_column_name
FROM t1 LEFT JOIN
t2
ON t1.this_id = t2.matching_id;
Is there any way BigQuery can automatically handle / rename columns in these situations (e.g. prefix the column with the table name), as opposed to not allowing the query altogether?
This is possible with BigQuery Legacy SQL, which can be handy for data exploration unless you are dealing with data types or using some functions/features specific to Standard SQL.
So the query below
#legacySQL
SELECT t1.*, t2.*
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t1.this_id = t2.matching_id
will produce output where all column names are prefixed with the respective alias, like t1_this_id and t2_matching_id.
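If you stay on Standard SQL, another option is to generate the aliased select list outside BigQuery. The following is a minimal Python sketch of that idea, not a BigQuery feature: the helper name `prefixed_select_list` is hypothetical, and the column lists are hard-coded assumptions (where you obtain them from is up to you).

```python
def prefixed_select_list(columns_by_alias):
    """Build 'alias.col AS alias_col' entries for every column.

    columns_by_alias maps a table alias to its list of column names.
    """
    parts = []
    for alias, columns in columns_by_alias.items():
        for col in columns:
            parts.append(f"{alias}.{col} AS {alias}_{col}")
    return ",\n  ".join(parts)


# Hypothetical column lists; both tables share the name this_col.
columns = {
    "t1": ["this_id", "this_col"],
    "t2": ["matching_id", "this_col"],
}

query = (
    "SELECT\n  " + prefixed_select_list(columns) + "\n"
    "FROM t1\n"
    "LEFT JOIN t2 ON t1.this_id = t2.matching_id"
)
print(query)
```

The generated text can be pasted into the query editor; because every output column carries its alias prefix, the duplicate-name error no longer applies.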

Is there any reason why joining two views over quadruples their combined run times

I have 2 views that are selecting large datasets from an external source.
They aren't doing any calculations or aggregations, just a long select statement.
I am using an INNER JOIN to link the two views based on a GUID.
The individual selections from each view are as follows.
view1, 3:08 Run time, 174,842 Records retrieved
view2, 0:02 Run Time, 93,493 Records retrieved
When I Join them, I get the following
Join, 14:32 Run Time, 177,753 records retrieved
So far, I've tried
LEFT JOIN
RIGHT JOIN
INNER JOIN
JOIN
I've tried joining view1 to view2 vs joining view2 to view1.
I've tried calling one view then selecting from that while joining to the other view.
Nothing seems to impact it.
SQL below for reference
SELECT
v1.guid,
CONVERT(DATE, v1.CreatedOn) AS CreatedOn,
field1,
field2,
field3,
field4,
field5
FROM
View1 v1
INNER JOIN View2 V2 ON v1.guid = v2.guid
WHERE
field6 = 'value'
(Obligatory: those aren't the actual field names.)
I'm getting the expected result, it's just taking way too long for its purpose.
Any help optimising would be appreciated.
Try the following:
SELECT *
INTO #view1
FROM view1
SELECT *
INTO #view2
FROM view2
SELECT
v1.guid,
CONVERT(DATE, v1.CreatedOn) AS CreatedOn,
field1,
field2,
field3,
field4,
field5
FROM #View1 v1
INNER JOIN #View2 V2
ON v1.guid = v2.guid
WHERE
field6 = 'value'
The first two statements materialize the view data in temporary tables. If the engine is not able to build a good execution plan for your original query, the above should help.
If the above does not help, try defining the temporary tables explicitly, with primary keys. Something like this:
CREATE TABLE #view1
(
guid UNIQUEIDENTIFIER PRIMARY KEY
....
)
INSERT INTO #view1
SELECT *
FROM view1
This way the data should be ordered by GUID and, in theory, the join should be faster.
The above can lead to better performance, but there is a bigger issue here: you are joining on a UNIQUEIDENTIFIER. I know you may see people using this as a primary key, but you will find joining on int or bigint faster. If you need such a GUID column in order not to expose internal IDs in your application, or for some other reason, this does not mean you can't have an integer column to perform the joins in SQL.
Also, if you are not able to store the data from the view in temporary tables, you can check how indexed views are created. And if you can, store only the data that is needed (apply the filtering criteria in advance), for example:
INSERT INTO #view1
SELECT *
FROM view1
WHERE field6 = 'value'
So, now the table has fewer rows, right?
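The materialize-then-join approach above can be sketched end to end with Python's standard-library `sqlite3` module. SQLite stands in for SQL Server here, so `CREATE TEMP TABLE` replaces the `#view1` syntax and TEXT replaces UNIQUEIDENTIFIER; the base tables `src1` and `src2`, the column set, and the rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sources behind the two views.
    CREATE TABLE src1 (guid TEXT, CreatedOn TEXT, field1 TEXT, field6 TEXT);
    CREATE TABLE src2 (guid TEXT, field2 TEXT);
    INSERT INTO src1 VALUES ('a', '2020-01-01 10:00:00', 'x', 'value'),
                            ('b', '2020-01-02 11:00:00', 'y', 'other'),
                            ('c', '2020-01-03 12:00:00', 'z', 'value');
    INSERT INTO src2 VALUES ('a', 'p'), ('b', 'q'), ('d', 'r');
    CREATE VIEW view1 AS SELECT * FROM src1;
    CREATE VIEW view2 AS SELECT * FROM src2;

    -- Temporary tables keyed on guid, holding only the needed columns.
    CREATE TEMP TABLE tmp_view1 (guid TEXT PRIMARY KEY, CreatedOn TEXT, field1 TEXT);
    CREATE TEMP TABLE tmp_view2 (guid TEXT PRIMARY KEY, field2 TEXT);

    -- Materialize each view once; the filter is applied before the join.
    INSERT INTO tmp_view1
        SELECT guid, CreatedOn, field1 FROM view1 WHERE field6 = 'value';
    INSERT INTO tmp_view2
        SELECT guid, field2 FROM view2;
""")

direct = conn.execute("""
    SELECT v1.guid, date(v1.CreatedOn), v1.field1, v2.field2
    FROM view1 v1
    INNER JOIN view2 v2 ON v1.guid = v2.guid
    WHERE v1.field6 = 'value'
    ORDER BY v1.guid
""").fetchall()

materialized = conn.execute("""
    SELECT v1.guid, date(v1.CreatedOn), v1.field1, v2.field2
    FROM tmp_view1 v1
    INNER JOIN tmp_view2 v2 ON v1.guid = v2.guid
    ORDER BY v1.guid
""").fetchall()

print(direct == materialized, materialized)
```

Both queries return the same rows; whether the materialized version is actually faster depends on the engine and the views, so compare the timings on your own data.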
It turns out the answer was: don't select 100+ columns into a view and then select 1 column out of it! Who would've thought it?

sql query to Combine different fields from 2 different tables with no relations except a common field - SQL Server Compact 3.5 SP2

This is my first question here. This has been a really helpful platform so far. I am somewhat of a newbie in SQL, but I have a freelance project in hand which I should release this month (a reporting application with no database writes).
To the point now: I have been provided with data (Excel sheets with up to 135,000 rows). The requirement is to implement a standalone application. I decided to use SQL Server Compact 3.5 SP2 and C#. Due to time pressure (and I thought it made sense too), I created tables based on each XLS module, with the fields of each table matching the names of the headers in the XLS, so that the data can be easily imported via CSV import using SDF Viewer or the SQL Server Compact Toolbox added in Visual Studio (so no further table normalization was done).
I have a UI design for a typical form1 in which inputs from its controls are to be checked in an SQL query spanning 2 or 3 tables (e.g. I have groupbox1 with checkboxes whose names match field1, field2, ... of table1, and groupbox2 with checkboxes matching field3, field4 of table2). There are also date controls, based on which a common 'DateTimeField' is checked in each of the tables.
There are no foreign keys defined on the tables for linking (the need did not arise, since the data are different for each). The only common field is a 'DateTimeField' (same name) which exists in each table. (Basically, these are readings at a datetime stamp from locations; field1, field2, etc. are locations. For a particular datetime there may or may not be readings from table 1 or table 2.)
How can I write an SQL select query (using UNION/joins/nested selects, if SQL Server Compact 3.5 supports them) to return fields from the 2 tables based on datetime (in the WHERE clause)? For a given datetime there can even be empty values for fields in table 2. I have done a lot of research on this and tried as well, but I have not yet found a good solution, probably also due to my lack of experience. Apologies!
I would really appreciate any help! I can provide a sample of the data if you need it. Thanks in advance.
Edit:
Sample Data (simple as that)
Table 1
t1Id xDateTime loc1 loc2 loc3
(I could not format the tabular schema here, sorry, but this is self-explanatory.)
... and so on up to 135000 records existing imported from xls
Table 2
t2Id xDateTime loc4 loc5 loc6
... and so on up to 100000 records imported from XLS. Merging table 1 and table 2 would result in a huge number of blank rows/values for a datetime, hence I am leaving them as they are.
But a UI multiselect event (loc1, loc2, loc4, loc5 from both t1 and t2) from the WinForm needs to combine the results from both tables based on a datetime.
... and so on
I managed to write a query that comes very close. I say very close because I still have to test it in detail with different combinations of inputs. Thanks to No'am for the hint. I will mark it as the answer if everything goes well.
SELECT T1.xDateTime, T1.loc2, T2.loc4 FROM Table1 T1
INNER JOIN Table2 T2 ON T1.xDateTime = T2.xDateTime
WHERE (T1.xDateTime BETWEEN 'somevalue1' AND 'somevalue2')
UNION
SELECT T2.xDateTime, T1.loc2, T2.loc4 FROM Table1 T1
RIGHT JOIN Table2 T2 ON T1.xDateTime = T2.xDateTime
WHERE (T1.xDateTime BETWEEN 'somevalue1' AND 'somevalue2')
UNION
SELECT T1.xDateTime, T1.loc2, T2.loc4 FROM Table1 T1
LEFT JOIN Table2 T2 ON T1.xDateTime = T2.xDateTime
WHERE (T1.xDateTime BETWEEN 'somevalue1' AND 'somevalue2')
If 't1DateTime' and 't2DateTime' are the common fields, then apparently you need a query such as
SELECT table1.t1DateTime, table1.t1Id, table1.loc2, table2.t2id, table2.loc4
FROM table1
INNER JOIN table2 ON table2.t2DateTime = table1.t1DateTime
This will give you values from rows which match in both tables, according to DateTime. If there is also supposed to be a match on the locations, then you will have to add the desired condition to the 'ON' clause.
Based on your comment:
For a given date time there can be even empty values for fields in table 2
my understanding would be that you are not interested in orphaned records in table 2 (based on date) so in that case a LEFT JOIN would do it:
SELECT table1.t1DateTime, table1.t1Id, table1.loc2, table2.t2id, table2.loc4
FROM table1
LEFT JOIN table2 ON table2.t2DateTime = table1.t1DateTime
However if there are also entries in table2 with no matching dates in table1 that you need to return you could try this:
SELECT table1.t1DateTime, table1.t1Id, table1.loc2, ISNULL(table2.t2id, 0), ISNULL(table2.loc4, 0.0)
FROM table1
LEFT JOIN table2 ON table2.t2DateTime = table1.t1DateTime
WHERE (table1.t1DateTime BETWEEN 'somevalue1' AND 'somevalue2')
UNION ALL
SELECT table2.t2DateTime, '0', '0.0', table2.t2id, table2.loc4
FROM table2
LEFT OUTER JOIN table1 ON table1.t1DateTime = table2.t2DateTime
WHERE table1.t1DateTime IS NULL AND table2.t2DateTime BETWEEN 'somevalue1' AND 'somevalue2'
Thanks a lot to #kbbucks.
Works with this so far.
SELECT T1.MonitorDateTime, T1.loc2, T2.loc4
FROM Table1 T1
LEFT JOIN Table2 T2 ON T2.MonitorDateTime = T1.MonitorDateTime
WHERE T1.MonitorDateTime BETWEEN '04/05/2011 15:10:00' AND '04/05/2011 16:00:00'
UNION ALL
SELECT T2.MonitorDateTime, '', T2.loc4
FROM Table2 T2
LEFT OUTER JOIN Table1 T1 ON T1.MonitorDateTime = T2.MonitorDateTime
WHERE T1.MonitorDateTime IS NULL AND T2.MonitorDateTime BETWEEN '04/05/2011 15:10:00' AND '04/05/2011 16:00:00'
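The pattern used above (a LEFT JOIN for everything in the first table, plus a UNION ALL of the second table's rows that found no partner) can be checked on a toy data set. This is a sketch in Python with the standard-library `sqlite3` module, which stands in for SQL Server Compact; the three-column tables, the ISO-formatted timestamps, and the rows are made up for illustration, and NULL is used instead of '' for the missing reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (t1Id INTEGER, xDateTime TEXT, loc2 REAL);
    CREATE TABLE Table2 (t2Id INTEGER, xDateTime TEXT, loc4 REAL);
    INSERT INTO Table1 VALUES (1, '2011-04-05 15:10:00', 1.5),
                              (2, '2011-04-05 15:20:00', 2.5);
    INSERT INTO Table2 VALUES (1, '2011-04-05 15:20:00', 7.0),
                              (2, '2011-04-05 15:30:00', 8.0);
""")

sql = """
    -- All Table1 rows in range, with the Table2 reading where one exists.
    SELECT T1.xDateTime, T1.loc2, T2.loc4
    FROM Table1 T1
    LEFT JOIN Table2 T2 ON T2.xDateTime = T1.xDateTime
    WHERE T1.xDateTime BETWEEN :start AND :end
    UNION ALL
    -- Table2 rows in range that have no Table1 row at the same time.
    SELECT T2.xDateTime, NULL, T2.loc4
    FROM Table2 T2
    LEFT JOIN Table1 T1 ON T1.xDateTime = T2.xDateTime
    WHERE T1.xDateTime IS NULL
      AND T2.xDateTime BETWEEN :start AND :end
    ORDER BY 1
"""
params = {"start": "2011-04-05 15:10:00", "end": "2011-04-05 16:00:00"}
rows = conn.execute(sql, params).fetchall()

for row in rows:
    print(row)
```

Each timestamp appears exactly once: readings present in only one table come back with an empty value on the other side, and the shared timestamp carries both.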

What would be the best way to write this query

I have a table in my database that has 1.1MM records. I have another table in my database that has about 2000 records under the field name "NAME". What I want to do is search Table 1 using the smaller table and pull the records that match the smaller table's records. For example, Table 1 has First Name and Last Name, and Table 2 has Name. I want to find every record in Table 1 that contains any of the Table 2 names in either the first name field or the last name field. I tried just making an Access query, but my computer just froze. Any thoughts would be appreciated.
Have you considered the following:
Select Table1.FirstName, Table1.LastName
from Table1
where EXISTS(Select * from Table2 WHERE Name = Table1.FirstName)
or EXISTS(Select * from Table2 WHERE Name = Table1.LastName)
I have found before that on large tables this might work better than an inner join.
Be sure to create indexes on Table1.first_name, Table1.last_name, and Table2.name. They will dramatically speed up your query.
Edit: For Microsoft Access 2007, see CREATE INDEX.
See the previous notes about indexes, but I believe from your description, you want something like:
select table1.* from table1
inner join
table2 on (table1.first_name = table2.name OR table1.last_name = table2.name);
It should go something like this:
Select Table1.FirstName, Table1.LastName
from Table1
where Table1.FirstName IN (Select Distinct Name from Table2)
or Table1.LastName IN (Select Distinct Name from Table2)
There are various other ways to write this same query; I would suggest you look at the execution plan for each of them to find out which one is the fastest. In addition, creating indexes on the columns used in the WHERE condition will also speed up the query.
I agree with astander. Based on my experience, using EXISTS instead of IN is a lot faster.
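The EXISTS, IN, and join variants suggested above can be compared side by side on a toy data set. This is a sketch in Python with the standard-library `sqlite3` module, standing in for Access; the rows and index names are made up, and nothing here says which variant is fastest on 1.1MM rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (FirstName TEXT, LastName TEXT);
    CREATE TABLE Table2 (Name TEXT);
    INSERT INTO Table1 VALUES ('Ann', 'Lee'), ('Bob', 'Ann'),
                              ('Cy', 'Dow'), ('Lee', 'Lee');
    INSERT INTO Table2 VALUES ('Ann'), ('Lee');
    -- Indexes on the compared columns, as recommended above.
    CREATE INDEX ix_t1_first ON Table1 (FirstName);
    CREATE INDEX ix_t1_last ON Table1 (LastName);
    CREATE INDEX ix_t2_name ON Table2 (Name);
""")

exists_rows = conn.execute("""
    SELECT FirstName, LastName FROM Table1
    WHERE EXISTS (SELECT 1 FROM Table2 WHERE Name = Table1.FirstName)
       OR EXISTS (SELECT 1 FROM Table2 WHERE Name = Table1.LastName)
    ORDER BY FirstName, LastName
""").fetchall()

in_rows = conn.execute("""
    SELECT FirstName, LastName FROM Table1
    WHERE FirstName IN (SELECT Name FROM Table2)
       OR LastName IN (SELECT Name FROM Table2)
    ORDER BY FirstName, LastName
""").fetchall()

join_rows = conn.execute("""
    SELECT Table1.FirstName, Table1.LastName
    FROM Table1
    INNER JOIN Table2
      ON (Table1.FirstName = Table2.Name OR Table1.LastName = Table2.Name)
    ORDER BY Table1.FirstName, Table1.LastName
""").fetchall()

print(exists_rows)
print(in_rows)
print(join_rows)
```

EXISTS and IN return each matching Table1 row once. The join returns a row once per matching name, so a person whose first and last names both appear in Table2 shows up twice unless you add DISTINCT.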