I have a question regarding QlikView Direct Discovery (DD).
First, I import a whole database table into QlikView's memory via:
SQL SELECT `customer_id`, `customer`, `run_id` FROM `db_customer`.`qry_qlikview_customer`;
Afterwards, I run the DD statement:
DIRECT QUERY
DIMENSION customer_id, run_id
MEASURE deal_id, type_id
DETAIL kri1, kri2, kri3
FROM db_customer.qry_qlikview_direct_discovery;
The run_id represents a date, and each customer has several run_id values.
The script runs without any errors, and the table relationship looks fine, e.g. $Syn 1 = customer_id and run_id.
However, if I select a customer and a run_id, QlikView tells me that the Direct Discovery query cannot run.
From the ODBC error log:
1497001997:SELECT customer_id, customer, run_id FROM db_customer.qry_qlikview_customer;
1497001997:Using direct execution;
1497001998:query has been executed;
1497001998:SELECT DISTINCT customer_id FROM db_customer.qry_qlikview_direct_discovery;
1497001998:Using direct execution;
1497002100:query has been executed;
1497002101:SELECT DISTINCT run_id FROM db_customer.qry_qlikview_direct_discovery;
1497002101:Using direct execution;
1497002198:query has been executed;
If I move run_id from DIMENSION to MEASURE, the DD statement returns results, but unfortunately for all run_ids and not only for the selected one.
Can anybody help, or does anyone have an idea?
Thank you very much in advance
Best regards
Andreas
A synthetic key ($Syn 1 = customer_id and run_id) on a Direct Discovery table is a known limitation; Qlik does not support it.
Create a new key field in the database table that concatenates customer_id and run_id, like customer_id|run_id, and build the same key in Qlik, so that this single field is the only connection between your tables.
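A minimal sketch of the database side, using Python's built-in sqlite3 as a stand-in for the real database; the table dd_detail, the view dd_detail_keyed and the field name customer_run_key are hypothetical. The view exposes only the combined key (not customer_id and run_id separately), so the Direct Discovery table shares exactly one field with the in-memory table. The source tables look like MySQL (backtick quoting), where || is a logical OR by default, so there the expression would be CONCAT(customer_id, '|', run_id). On the Qlik side the same string can be built in the load script with the & operator, e.g. customer_id & '|' & run_id AS customer_run_key.

```python
import sqlite3

# In-memory stand-in for the detail table behind qry_qlikview_direct_discovery.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dd_detail (customer_id INTEGER, run_id TEXT, deal_id INTEGER)"
)
conn.executemany(
    "INSERT INTO dd_detail VALUES (?, ?, ?)",
    [(1, "2017-06-01", 10), (1, "2017-06-02", 11), (2, "2017-06-01", 12)],
)

# Expose one concatenated key and drop the separate customer_id / run_id
# columns, so Qlik links the tables on a single field instead of a synthetic key.
conn.execute(
    """
    CREATE VIEW dd_detail_keyed AS
    SELECT customer_id || '|' || run_id AS customer_run_key,
           deal_id
    FROM dd_detail
    """
)

rows = conn.execute(
    "SELECT customer_run_key, deal_id FROM dd_detail_keyed ORDER BY deal_id"
).fetchall()
for row in rows:
    print(row)
```

The in-memory customer table would then carry the same customer_run_key field, and the DIMENSION in the DIRECT QUERY would reference it instead of customer_id and run_id.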
The table contains the following columns:
timestamp, date, customer_id, page_id
For example, the query is:
for each customer, identify the first page_id that the customer visited on their most recent day.
If the database has read-only access, are queries written differently?
Thanks in advance.
Read-only access means you can only run queries that read from the database.
You won't be able to insert or update rows, or drop tables.
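A sketch of the example query using Python's built-in sqlite3, with a hypothetical table visits and made-up rows. It opens the database with SQLite's read-only URI mode (mode=ro) to show that a pure SELECT runs unchanged while a write is refused; it assumes an SQLite build with window functions (3.25 or later). In a server database, read-only access usually comes from granted permissions rather than a connection flag, but the query text is the same.

```python
import sqlite3
import tempfile
from pathlib import Path

# First page each customer visited on their own most recent day.
QUERY = """
WITH last_day AS (
    SELECT customer_id, MAX(date) AS max_date    -- most recent day per customer
    FROM visits
    GROUP BY customer_id
),
ranked AS (
    SELECT v.customer_id, v.page_id,
           ROW_NUMBER() OVER (PARTITION BY v.customer_id
                              ORDER BY v.timestamp) AS rn  -- 1 = earliest visit that day
    FROM visits AS v
    JOIN last_day AS d
      ON v.customer_id = d.customer_id AND v.date = d.max_date
)
SELECT customer_id, page_id FROM ranked WHERE rn = 1 ORDER BY customer_id
"""

ROWS = [
    ("2024-01-01 09:00", "2024-01-01", 1, 10),
    ("2024-01-02 09:00", "2024-01-02", 1, 12),
    ("2024-01-02 08:00", "2024-01-02", 1, 13),
    ("2024-01-01 10:00", "2024-01-01", 2, 20),
    ("2024-01-01 09:30", "2024-01-01", 2, 21),
]

with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "visits.db"

    # Build the sample database with a normal read-write connection.
    rw = sqlite3.connect(db_path)
    rw.execute(
        "CREATE TABLE visits (timestamp TEXT, date TEXT, customer_id INTEGER, page_id INTEGER)"
    )
    rw.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", ROWS)
    rw.commit()
    rw.close()

    # Reopen it read-only: the SELECT works, a write raises OperationalError.
    ro = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
    first_pages = ro.execute(QUERY).fetchall()
    try:
        ro.execute("DELETE FROM visits")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True
    ro.close()

print(first_pages)    # [(1, 13), (2, 21)]
print(write_blocked)  # True
```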
I have a table that contains stop times for a transit system. The details aren't important, but my table essentially looks like this:
I am importing the data from a CSV file which contains everything except the next stop ID. I want to generate Next Stop ID to speed up some data processing I am going to do in my app.
For each row, the Next Stop ID should be the Stop ID from the next row with matching Trip ID and Service ID. The ordering should be based on the Stop Sequence, which will be increasing but not necessarily consecutive (1, 20, 21, 23, etc., rather than 1, 2, 3, 4, ...).
Here is an example of what I'm hoping it will look like. For simplicity, I kept all the service IDs the same and there are two Trip IDs. If there is no next stop I want that entry to just be blank.
I think it makes sense to do this entirely in SQL, but I'm not sure how best to do it. I know how I would do it in a standard programming language, but not SQL. Thank you for your help.
You can use lead():
select
t.*,
lead(stop_id)
over(partition by trip_id, service_id order by stop_sequence) next_stop_id
from mytable t
It is not necessarily a good idea to actually store that derived information, since you can compute it on the fly when needed (you can put the query in a view to make it easier to access). But if you want this in an update, then, assuming that stop_id is the primary key of the table, that would look like:
update mytable
set next_stop_id = t.next_stop_id
from (
select
stop_id,
lead(stop_id) over(partition by trip_id, service_id order by stop_sequence) next_stop_id
from mytable
) t
where mytable.stop_id = t.stop_id
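A runnable sketch of the view-based option, using Python's sqlite3 (window functions need SQLite 3.25 or later). The table layout and rows are assumptions modeled on the question, with non-consecutive stop_sequence values and one trip inserted out of order; the view name stop_times_with_next is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (trip_id TEXT, service_id TEXT, stop_sequence INTEGER, stop_id INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("A", "S1", 1, 100),
        ("A", "S1", 20, 200),
        ("A", "S1", 21, 300),
        ("B", "S1", 5, 400),
        ("B", "S1", 3, 500),  # inserted out of order on purpose
    ],
)

# Compute next_stop_id on the fly instead of storing it.
conn.execute(
    """
    CREATE VIEW stop_times_with_next AS
    SELECT t.*,
           LEAD(stop_id) OVER (
               PARTITION BY trip_id, service_id
               ORDER BY stop_sequence
           ) AS next_stop_id
    FROM mytable AS t
    """
)

rows = conn.execute(
    "SELECT trip_id, stop_sequence, stop_id, next_stop_id "
    "FROM stop_times_with_next ORDER BY trip_id, stop_sequence"
).fetchall()
for row in rows:
    print(row)
```

Because LEAD() orders by stop_sequence inside each (trip_id, service_id) partition, the physical insertion order does not matter, and the last stop of each trip gets NULL, which is the blank the question asks for.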
Given that I have two tables
Customer (id int, username varchar)
Orders (customer_id int, order_date datetime)
Now I want to insert into the Orders table based on customer information that is available in the Customer table.
There are a couple of ways I can approach this problem.
First - I can query the customer information into a variable and then use it in an INSERT statement.
DECLARE @Customer_ID int
SELECT @Customer_ID = id FROM Customer WHERE username = 'john.smith'
INSERT INTO Orders (customer_id, order_date) VALUES (@Customer_ID, GETDATE())
The second approach is to use a combination of INSERT and SELECT in one query.
INSERT INTO Orders (customer_id, order_date)
SELECT id, GETDATE() FROM Customer
WHERE username = 'john.smith'
So my question is: which approach is better in terms of speed and overhead, and why? I know that if a lot of information is queried from the Customer table, the second approach is much better.
P.S. I was asked this question in a technical interview.
The second approach is better.
The first approach misbehaves if the customer is not found: the variable stays NULL, so the INSERT either adds an order with a NULL customer_id or fails on a NOT NULL or foreign key constraint. No check is made that a customer id was actually returned.
The second approach will do nothing if the customer is not found.
From an overhead perspective, why create variables if they are not needed? Set-based SQL is usually the better approach.
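A small sketch of the difference when the username does not exist, using Python's sqlite3 as a stand-in for SQL Server: datetime('now') replaces GETDATE(), a Python variable plays the role of @Customer_ID, and the helper functions insert_via_variable and insert_via_select are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("CREATE TABLE Orders (customer_id INTEGER, order_date TEXT)")
conn.execute("INSERT INTO Customer (id, username) VALUES (1, 'john.smith')")


def insert_via_variable(username):
    # Approach 1: look the id up first, then insert it.
    row = conn.execute(
        "SELECT id FROM Customer WHERE username = ?", (username,)
    ).fetchone()
    customer_id = row[0] if row else None  # no match leaves the "variable" NULL
    conn.execute(
        "INSERT INTO Orders (customer_id, order_date) VALUES (?, datetime('now'))",
        (customer_id,),
    )


def insert_via_select(username):
    # Approach 2: one set-based INSERT ... SELECT; returns rows inserted.
    cur = conn.execute(
        "INSERT INTO Orders (customer_id, order_date) "
        "SELECT id, datetime('now') FROM Customer WHERE username = ?",
        (username,),
    )
    return cur.rowcount


insert_via_variable("jane.doe")           # unknown user: adds an order with NULL customer_id
missing = insert_via_select("jane.doe")   # unknown user: adds nothing
found = insert_via_select("john.smith")   # known user: adds one order

orphans = conn.execute(
    "SELECT COUNT(*) FROM Orders WHERE customer_id IS NULL"
).fetchone()[0]
print(missing, found, orphans)  # 0 1 1
```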
In a typical real-world order-entry system, the user has already looked up the customer via a search interface, or has chosen the customer from an alphabetical list of customers, so your client program already knows the CustomerID when it goes to insert an order for that customer.
Furthermore, the order date is typically defaulted to getdate() as part of the ORDERS table definition, and your query can usually ignore that column.
But to handle multiple line items on an order, your insert into ORDER_HEADER needs to return the order header id, so that the id can be written into the child ORDER_DETAIL line-item rows.
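A sketch of that header/detail flow in Python's sqlite3. The column names order_id, line_no and item and the helper place_order are hypothetical. DEFAULT CURRENT_TIMESTAMP plays the role of a getdate() default, and cursor.lastrowid returns the new header id; in SQL Server the equivalents would be a DEFAULT GETDATE() column and SCOPE_IDENTITY() or an OUTPUT INSERTED.order_id clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ORDER_HEADER (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        order_date  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP  -- filled in by the table
    );
    CREATE TABLE ORDER_DETAIL (
        order_id INTEGER NOT NULL REFERENCES ORDER_HEADER(order_id),
        line_no  INTEGER NOT NULL,
        item     TEXT NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );
    """
)


def place_order(customer_id, items):
    with conn:  # header and lines commit together, or not at all
        cur = conn.execute(
            "INSERT INTO ORDER_HEADER (customer_id) VALUES (?)", (customer_id,)
        )
        order_id = cur.lastrowid  # id of the header row just inserted
        conn.executemany(
            "INSERT INTO ORDER_DETAIL (order_id, line_no, item) VALUES (?, ?, ?)",
            [(order_id, n, item) for n, item in enumerate(items, start=1)],
        )
    return order_id


order_id = place_order(1, ["widget", "gadget"])
lines = conn.execute(
    "SELECT line_no, item FROM ORDER_DETAIL WHERE order_id = ? ORDER BY line_no",
    (order_id,),
).fetchall()
date_set = conn.execute(
    "SELECT order_date IS NOT NULL FROM ORDER_HEADER WHERE order_id = ?", (order_id,)
).fetchone()[0]
print(order_id, lines, date_set)
```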
I don't recommend either approach. Why do you have the customer name and not the id in the first place? Don't you have a user interface that maintains a reference to the current customer by holding the ID in its state? Doing the lookup by name exposes you to potentially selecting the wrong customer.
If you must do this for reasons unknown to me, the second approach is certainly more efficient because it only contains one statement.
Make customer_id in the Orders table a foreign key that references the Customer table.
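A sketch in Python's sqlite3 showing what the constraint buys. A foreign key by itself still accepts NULL, so it is paired here with NOT NULL. SQLite also only enforces foreign keys after PRAGMA foreign_keys = ON, whereas SQL Server enforces them as soon as they are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute(
    """
    CREATE TABLE Orders (
        customer_id INTEGER NOT NULL REFERENCES Customer(id),
        order_date  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)
conn.execute("INSERT INTO Customer (id, username) VALUES (1, 'john.smith')")

conn.execute("INSERT INTO Orders (customer_id) VALUES (1)")  # accepted

rejected = []
for bad_id in (None, 999):  # a NULL id, and an id with no matching customer
    try:
        conn.execute("INSERT INTO Orders (customer_id) VALUES (?)", (bad_id,))
    except sqlite3.IntegrityError:
        rejected.append(bad_id)

print(rejected)  # [None, 999]
```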
I'd appreciate any help you can offer - I'm currently trying to decide on a schema for a voting app I'm building with PHP / MySQL, but I'm completely stuck on how to optimise it. The key elements are to allow only one vote per user per item, and be able to build a chart detailing the top items of the month – based on votes received that month.
So far the initial schema is:
Items_table
item_id
total_points
(lots of other fields unrelated to voting)
Voting_table
voting_id
item_id
user_id
vote (1 = up; 0 = down)
month_cast
year_cast
So I'm wondering if it's going to be a case of selecting all information from voting table where month = currentMonth & year = currentYear, somehow running a count and grouping by item_id; if so, how would I go about doing so? Or would I be better off creating a separate table for monthly charts which is updated with each vote, but then should I be concerned with the requirement to update 3 database tables per vote?
I'm not particularly competent – if it shows – so would really love any help / guidance someone could provide.
Thanks,
_just_me
I wouldn't add separate tables for monthly charts; to prevent users from casting more than one vote per item, you could use a unique key on voting_table(item_id, user_id).
As for the summary, you should be able to use a simple query like
select item_id, vote, count(*), month_cast, year_cast
from voting_table
group by item_id, vote, month_cast, year_cast
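A sketch of both suggestions in Python's sqlite3, using the question's column names (month_cast, year_cast) and made-up votes: the UNIQUE (item_id, user_id) constraint rejects a second vote, and a WHERE clause narrows the grouped counts to one month for the chart. MySQL accepts the same UNIQUE (item_id, user_id) table constraint in CREATE TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE voting_table (
        voting_id  INTEGER PRIMARY KEY,
        item_id    INTEGER NOT NULL,
        user_id    INTEGER NOT NULL,
        vote       INTEGER NOT NULL,   -- 1 = up, 0 = down
        month_cast INTEGER NOT NULL,
        year_cast  INTEGER NOT NULL,
        UNIQUE (item_id, user_id)      -- one vote per user per item
    )
    """
)
INSERT = (
    "INSERT INTO voting_table (item_id, user_id, vote, month_cast, year_cast) "
    "VALUES (?, ?, ?, ?, ?)"
)
conn.executemany(
    INSERT,
    [
        (1, 10, 1, 6, 2024),
        (1, 11, 1, 6, 2024),
        (1, 12, 0, 6, 2024),
        (2, 10, 1, 6, 2024),
        (2, 11, 1, 5, 2024),  # previous month, excluded below
    ],
)

try:
    conn.execute(INSERT, (1, 10, 1, 6, 2024))  # user 10 votes on item 1 again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Monthly chart: up-votes and total votes per item for one month.
chart = conn.execute(
    """
    SELECT item_id, SUM(vote) AS up_votes, COUNT(*) AS num_votes
    FROM voting_table
    WHERE year_cast = ? AND month_cast = ?
    GROUP BY item_id
    ORDER BY up_votes DESC, item_id
    """,
    (2024, 6),
).fetchall()
print(duplicate_rejected, chart)  # True [(1, 2, 3), (2, 1, 1)]
```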
I would use a voting table similar to this:
create table votes(
item_id int not null
,user_id int not null
,vote_dtm datetime not null
,vote tinyint not null
,primary key(item_id, user_id)
,foreign key(item_id) references item(item_id)
,foreign key(user_id) references users(user_id)
)Engine=InnoDB;
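Using a composite primary key on an InnoDB table will cluster the data around the items, because InnoDB stores rows in primary-key order; this makes it much faster to find the votes related to an item, and the same key allows only one vote per user per item. I added a column vote_dtm, which holds the timestamp of when the user voted.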
Then I would create one or several views, used for reporting purposes.
create view votes_monthly as
select item_id
,year(vote_dtm) as year
,month(vote_dtm) as month
,sum(vote) as score
,count(*) as num_votes
from votes
group
by item_id
,year(vote_dtm)
,month(vote_dtm);
If you start having performance issues, you can replace the view with a table containing pre-computed values without even touching the reporting code.
Note that I used both count(*) and sum(vote). The count(*) returns the number of votes cast, whereas the sum returns the number of up-votes. However, if you changed the vote column to use +1 for up-votes and -1 for down-votes, sum(vote) would return a score much like the one Stack Overflow calculates from votes.
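A sketch of the +1/-1 variant in Python's sqlite3 with made-up votes. SQLite has no YEAR() or MONTH(), so strftime() stands in for them, and the CHECK constraint limiting vote to 1 or -1 is an added assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE votes (
        item_id  INTEGER NOT NULL,
        user_id  INTEGER NOT NULL,
        vote_dtm TEXT NOT NULL,
        vote     INTEGER NOT NULL CHECK (vote IN (1, -1)),
        PRIMARY KEY (item_id, user_id)
    )
    """
)
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2024-06-03 12:00:00", 1),
        (1, 11, "2024-06-04 08:30:00", 1),
        (1, 12, "2024-06-05 17:45:00", -1),
        (2, 10, "2024-06-10 09:00:00", -1),
        (2, 11, "2024-05-28 21:15:00", 1),
    ],
)

# SQLite stand-in for the MySQL view: strftime() replaces YEAR()/MONTH().
conn.execute(
    """
    CREATE VIEW votes_monthly AS
    SELECT item_id,
           CAST(strftime('%Y', vote_dtm) AS INTEGER) AS year,
           CAST(strftime('%m', vote_dtm) AS INTEGER) AS month,
           SUM(vote) AS score,      -- up-votes minus down-votes
           COUNT(*)  AS num_votes   -- all votes cast
    FROM votes
    GROUP BY item_id, year, month
    """
)

top_june = conn.execute(
    "SELECT item_id, score, num_votes FROM votes_monthly "
    "WHERE year = 2024 AND month = 6 ORDER BY score DESC, item_id"
).fetchall()
print(top_june)  # [(1, 1, 3), (2, -1, 1)]
```

The reporting query only touches votes_monthly, so swapping the view for a pre-computed table later would not change it.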
I am trying to develop a SQL Server 2005 query, but I have been unsuccessful so far. I have tried every approach I know, like derived tables, sub-queries, CTEs, etc., but I couldn't solve the problem. I won't post the queries I tried here because they involve many other columns and tables, but I will try to explain the problem with a simpler example:
There are two tables: PARTS_SOLD and PARTS_PURCHASED. The first contains products that were sold to customers, and the second contains products that were purchased from suppliers. Both tables contain a foreign key to the movement record itself, which holds the dates, etc.
Here is the simplified schema:
Table PARTS_SOLD:
part_id
date
other columns
Table PARTS_PURCHASED
part_id
date
other columns
What I need is to join every row in PARTS_SOLD with a unique row from PARTS_PURCHASED, chosen by part_id and the maximum "date", where that "date" is equal to or before the "date" column from PARTS_SOLD. In other words, for every sale of an item I need to collect some information from the most recent purchase of that item.
The problem is that I couldn't find a way of joining PARTS_PURCHASED to PARTS_SOLD that uses the "date" column from PARTS_SOLD to limit the MAX(date) taken from PARTS_PURCHASED.
I could solve the problem with a cursor, using the tools I know, but every table has millions of rows, and I suspect that cursors or sub-queries evaluated once per row would make the process very slow.
You aren't going to like my answer. Your database is designed incorrectly, which is why you can't get the data back out the way you want. Even using a cursor, you would not get good data from this. Assume that you purchased 5 of part 1 on May 31, 2010, and that on June 1 you sold ten of part 1. Matching just on date, you would match all ten to the May 31 purchase even though that is clearly not correct: some of the parts might have been purchased on May 23, and some may have been purchased on July 19, 2008.
If you want to know which purchased part relates to which sold part, your database design should include the PartPurchasedID as part of the PartsSold record and this should be populated at the time of the purchase, not later for reporting when you have 1,000,000 records to sort through.
Perhaps the following would help:
SELECT S.*
FROM PARTS_SOLD S
INNER JOIN (SELECT PART_ID, MAX(DATE) AS MAX_DATE
FROM PARTS_PURCHASED
GROUP BY PART_ID) D
ON (D.PART_ID = S.PART_ID)
WHERE D.MAX_DATE <= S.DATE
Share and enjoy.
I'll toss this out there, but it's likely to contain all kinds of mistakes... both because I'm not sure I understand your question and because my SQL is... weak at best. That being said, my thought would be to try something like:
SELECT * FROM PARTS_SOLD
INNER JOIN (SELECT part_id, max(date) AS max_date
FROM PARTS_PURCHASED
GROUP BY part_id) AS subtable
ON PARTS_SOLD.part_id = subtable.part_id
AND PARTS_SOLD.date < subtable.max_date
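Both queries above take one MAX(date) per part over all purchases, so they do not restrict the purchase date per sale. Below is a sketch of the per-sale lookup the question describes (the latest purchase of the same part on or before each sale date), using Python's sqlite3 and made-up rows; purchase_id, sale_id and the index name are hypothetical. In SQL Server 2005 the same idea is commonly written with OUTER APPLY (SELECT TOP 1 ... ORDER BY date DESC). The index on (part_id, date) is what should keep the correlated lookup from scanning the whole table for every sale; treat that as an assumption to verify against the execution plan. If a part has two purchases on the same date, this join returns both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PARTS_PURCHASED (purchase_id INTEGER PRIMARY KEY, part_id INTEGER, date TEXT);
    CREATE TABLE PARTS_SOLD (sale_id INTEGER PRIMARY KEY, part_id INTEGER, date TEXT);
    INSERT INTO PARTS_PURCHASED (purchase_id, part_id, date) VALUES
        (1, 1, '2010-05-01'), (2, 1, '2010-05-31'), (3, 1, '2010-06-15'), (4, 2, '2010-06-01');
    INSERT INTO PARTS_SOLD (sale_id, part_id, date) VALUES
        (10, 1, '2010-05-20'), (11, 1, '2010-06-01'), (12, 1, '2010-06-20'), (13, 2, '2010-05-15');
    CREATE INDEX ix_purchased_part_date ON PARTS_PURCHASED (part_id, date);
    """
)

query = """
SELECT s.sale_id, s.date AS sold_on, p.purchase_id, p.date AS purchased_on
FROM PARTS_SOLD AS s
LEFT JOIN PARTS_PURCHASED AS p
  ON p.part_id = s.part_id
 AND p.date = (
        -- latest purchase of this part on or before this sale
        SELECT MAX(p2.date)
        FROM PARTS_PURCHASED AS p2
        WHERE p2.part_id = s.part_id AND p2.date <= s.date
     )
ORDER BY s.sale_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Sale 13 has no earlier purchase of part 2, so the LEFT JOIN keeps it with NULL purchase columns; an INNER JOIN would drop it instead.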