In QlikView, I have a table Data and one database table A. Table A should be used twice (A_Left, A_Right). (Table A can have thousands of entries.)
My load script is:
A_Left:
Load a_id_left,
a_name_left
inline [
a_id_left, a_name_left
1, 'nwsnd'
2, 'dcsdcws'
3, 'fsdf' ];
A_Right:
Load a_id_right,
a_name_right
inline [
a_id_right, a_name_right
1, 'nwsnd'
2, 'dcsdcws'
3, 'fsdf' ];
Data:
Load id,
a_id_left,
a_name_left as 'Name_Left',
a_id_right,
a_name_right as 'Name_Right',
data
inline [
id, a_id_left, a_id_right, data
1, 1, 2, 37
1, 1, 3, 18
1, 2, 3, 62
];
So my question is: What is the best way to use lookup tables in QlikView?
(Should I use MAPPING and/or ApplyMap? Why? Is that faster?)
One other part of the question is: Would it help to change the data structure from star to table?
(I know that would cost more memory.) And, by the way: How could I put all data in one table
so that I can store it completely in one QVD file?
Thanks for help and ideas.
For simple lookups where you wish to look up a single value from another value you can use a MAPPING load and then use the ApplyMap() function. For example, say I have the following table:
LOAD
*
INLINE [
UserID, System
1, Windows
2, Linux
3, Windows
];
I have another table that contains UserID and UserName as follows:
LOAD
*
INLINE [
UserID, UserName
1, Alice
2, Bob
3, Carol
];
I can then combine the above tables with ApplyMap as follows:
UserNameMap:
MAPPING LOAD
*
INLINE [
UserID, UserName
1, Alice
2, Bob
3, Carol
];
SystemData:
LOAD
UserID,
ApplyMap('UserNameMap', UserID, 'MISSING') as UserName,
System
INLINE [
UserID, System
1, Windows
2, Linux
3, Windows
];
ApplyMap is very fast and should not significantly slow down your load time (although it will not be as fast as a direct QVD load). However, as mentioned, ApplyMap can only map a single value into your table. To bring in more fields, you will need to use a join (which is similar to a SQL JOIN) if you wish to combine your results into a single table.
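As a rough analogy only (this is Python, not QlikView script), the lookup-with-default behaviour described above resembles a dictionary lookup. The helper `apply_map` and the extra row with UserID 4 are hypothetical, added to show the fallback value.

```python
# Rough Python analogy of a QlikView mapping table plus ApplyMap().
# apply_map is a hypothetical helper, not a QlikView API.
user_name_map = {1: "Alice", 2: "Bob", 3: "Carol"}


def apply_map(mapping, key, default):
    """Return the mapped value for key, or default when key is absent."""
    return mapping.get(key, default)


system_data = [
    {"UserID": 1, "System": "Windows"},
    {"UserID": 2, "System": "Linux"},
    {"UserID": 4, "System": "Windows"},  # hypothetical row with no map entry
]

# One mapped value is added per row, just as ApplyMap adds one field.
for row in system_data:
    row["UserName"] = apply_map(user_name_map, row["UserID"], "MISSING")

print([row["UserName"] for row in system_data])
```

The sketch also shows the limitation: each lookup yields exactly one value per key, which is why several fields call for a join instead.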
If you do not wish to join them into a single table (but keep it as a "star" schema), just make sure that the fields that you wish to link are named the same. For example:
A_Left:
Load a_id_left,
a_name_left as [Name_Left]
inline [
a_id_left, a_name_left
1, 'nwsnd'
2, 'dcsdcws'
3, 'fsdf' ];
A_Right:
Load a_id_right,
a_name_right as [Name_Right]
inline [
a_id_right, a_name_right
1, 'nwsnd'
2, 'dcsdcws'
3, 'fsdf' ];
Data:
Load id,
a_id_left,
a_id_right,
data
inline [
id, a_id_left, a_id_right, data
1, 1, 2, 37
1, 1, 3, 18
1, 2, 3, 62
];
(I have removed your "name" fields from "Data", as it would otherwise fail to load.)
This will then work in your QlikView document due to QlikView's automatic field associativity.
However, if you wish to have the data in a single table (e.g. for output to QVD), then in your case you will need to JOIN your two tables into Data. We can rearrange the tables to make life a bit easier: if we put your Data table first, we can then join the other two tables onto it:
Data:
Load id,
a_id_left,
a_id_right,
data
inline [
id, a_id_left, a_id_right, data
1, 1, 2, 37
1, 1, 3, 18
1, 2, 3, 62
];
LEFT JOIN (Data)
Load a_id_left,
a_name_left as [Name_Left]
inline [
a_id_left, a_name_left
1, 'nwsnd'
2, 'dcsdcws'
3, 'fsdf' ];
LEFT JOIN (Data)
Load a_id_right,
a_name_right as [Name_Right]
inline [
a_id_right, a_name_right
1, 'nwsnd'
2, 'dcsdcws'
3, 'fsdf' ];
This will then result in a single table named "Data", which you can then output to QVD etc.
You may wish to think about optimising your "Table A" extract, since it is effectively being loaded twice. This may take some time (e.g. from a distant server), so it may be better to grab your data in one go and then slice it once it is in memory (much faster). A quick example could look like the below:
TableA:
LOAD
a_id_left,
a_id_right,
a_name_left,
a_name_right
FROM ...;
Data:
Load id,
a_id_left,
a_id_right,
data
inline [
id, a_id_left, a_id_right, data
1, 1, 2, 37
1, 1, 3, 18
1, 2, 3, 62
];
LEFT JOIN (Data)
LOAD DISTINCT
a_id_left,
a_name_left as [Name_Left]
RESIDENT TableA;
LEFT JOIN (Data)
LOAD DISTINCT
a_id_right,
a_name_right as [Name_Right]
RESIDENT TableA;
DROP TABLE TableA;
To answer the last part, you can use the CONCATENATE prefix to append the contents of one table to another, i.e.
Final:
Load *
from
Table1;
CONCATENATE (Final)
Load *
from
Table2;
will give you one table, called Final, with the merged contents of Table1 and Table2.
I have a model that looks like this:
Requests: user, req_time, req_text
In the DB, the records can look like this:
id, user_id, req_time, req_text
1 1 TIMESTAMP YES
2 1 TIMESTAMP NO
3 2 TIMESTAMP YES
etc.
How do I write a Django ORM query that groups the Requests by user, filters them on req_text, and selects the max id within each resulting group? For each user, I want one row that matches the filter condition and has the greatest id.
from django.db.models.aggregates import Max
request_values = Requests.objects.filter(req_text='YES') \
.values('user') \
.annotate(max_id=Max('id'))
Then request_values will look like this:
[
{'user': 1, 'max_id': 1},
{'user': 2, 'max_id': 4},
{'user': 3, 'max_id': 5},
{'user': 4, 'max_id': 12},
...
]
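For intuition, the queryset above corresponds roughly to a filtered GROUP BY with MAX. The following is a minimal sketch using Python's sqlite3 module rather than Django itself; the table name `requests` and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory table standing in for the Requests model (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (id INTEGER PRIMARY KEY, user_id INTEGER, req_text TEXT)"
)
conn.executemany(
    "INSERT INTO requests (id, user_id, req_text) VALUES (?, ?, ?)",
    [(1, 1, "YES"), (2, 1, "NO"), (3, 2, "YES"), (4, 2, "YES")],
)

# Filter first, then group by user and keep the greatest id per group.
rows = conn.execute(
    "SELECT user_id, MAX(id) AS max_id FROM requests "
    "WHERE req_text = 'YES' GROUP BY user_id ORDER BY user_id"
).fetchall()

request_values = [{"user": user, "max_id": max_id} for user, max_id in rows]
print(request_values)
```

The filter is applied before grouping, so the row with id 2 (req_text 'NO') never competes for the maximum.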
I'm working on a kanban board using Postgres as my database. Normally I would use MongoDB for something like this, but I'm trying to practice and improve my understanding of relational databases.
So I have boards, which can have many lanes, and lanes can have many cards. Boards, lanes, and cards are 3 tables in the database. Lanes have a boardId which is a board.id value, cards have a laneId which is a lane.id value.
I want to query for a board by its id, but also have it include an array of its lanes. Then, I want each of the lanes to include an array of its cards. Like this:
{
id: 123,
title: 'Board title',
lanes: [{id: 0, title: 'Lane title', cards: [{id: 0, text: 'Card text'}]}, {id: 1, title: 'Lane title', cards: [{id: 1, text: 'Card text'}, {id: 2, text: 'Card text'}]}]
}
I have the first part down with a query that gets a board, then creates an array of lanes. Not sure if this is the 'right' way to do it, but here it is, with an example looking for a board with id '123':
select "boards"."id" as "boardId",
       "boards"."title" as "boardTitle",
       ARRAY_AGG(json_build_object('id', lanes.id, 'title', lanes.title)) as lanes
from "boards"
inner join "lanes" on "boards"."id" = "lanes"."boardId" and "boards"."id" = 123
group by "boards"."id"
But I'm not sure how I would get the cards to be included as a cards array for each element in the lanes array. My guess is that I could add another join like "cards" on "lanes"."id" = "cards"."laneId"... but then I don't know how I would include the cards for each lane in the json_build_object.
It is easiest to have the database return JSON, so that rows can be nested inside arrays.
I would build this up using CTEs to make it as clear as possible:
with agg_lanes as (
select lane_id as id, jsonb_agg(cards) as "cards"
from cards
group by lane_id
), agg_boards as (
select b.id, b.title, jsonb_agg(a) as "lanes"
from agg_lanes a
join lanes l on l.id = a.id
join boards b on b.id = l.board_id
group by b.id, b.title
)
select to_json(a)
from agg_boards a
;
Working fiddle here.
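The two-level grouping (cards per lane, then lanes per board) can also be pictured outside the database. Below is a minimal Python sketch, assuming the three tables were fetched as flat rows; the sample data and the helper `nest_board` are hypothetical.

```python
from collections import defaultdict

# Hypothetical flat rows, as three separate SELECTs might return them.
boards = [{"id": 123, "title": "Board title"}]
lanes = [
    {"id": 0, "board_id": 123, "title": "Lane title"},
    {"id": 1, "board_id": 123, "title": "Lane title"},
]
cards = [
    {"id": 0, "lane_id": 0, "text": "Card text"},
    {"id": 1, "lane_id": 1, "text": "Card text"},
    {"id": 2, "lane_id": 1, "text": "Card text"},
]


def nest_board(board_id):
    """Build the nested board -> lanes -> cards structure for one board."""
    # First level: group cards by the lane they belong to.
    cards_by_lane = defaultdict(list)
    for card in cards:
        cards_by_lane[card["lane_id"]].append({"id": card["id"], "text": card["text"]})

    # Second level: attach each lane (with its cards) to the board.
    board = next(b for b in boards if b["id"] == board_id)
    return {
        "id": board["id"],
        "title": board["title"],
        "lanes": [
            {
                "id": lane["id"],
                "title": lane["title"],
                "cards": cards_by_lane.get(lane["id"], []),
            }
            for lane in lanes
            if lane["board_id"] == board_id
        ],
    }


result = nest_board(123)
print(result)
```

One difference worth noting: this sketch keeps lanes that have no cards (with an empty list), whereas the inner joins in the CTE version would leave such lanes out.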
I have a table visitors(id, email, first_seen, sessions, etc.)
and another table trackings(id, visitor_id, field, value) that stores custom, user supplied data.
I want to query these and merge the visitor data columns and the trackings into a single column called data.
For example, say I have two trackings
(id: 3, visitor_id: 1, field: "orders_made", value: 2)
(id: 4, visitor_id: 1, field: "city", value: 'new york')
and a visitor
(id: 1, email: 'hello#gmail.com', sessions: 5)
I want the result to be of the form
(id: 1, data: {email: 'hello#gmail.com', sessions: 5, orders_made: 2, city: 'new york'})
What's the best way to accomplish this using Postgres 9.4?
I'll start by saying trackings is a bad idea. If you don't have many things to track, just store json instead; that's what it's made for. If you have a lot of things to track, you'll become very unhappy with the performance of trackings over time.
First you need a json object from trackings:
-- WARNING: Behavior of this with duplicate field names is undefined!
SELECT json_object(array_agg(field), array_agg(value)) FROM trackings WHERE ...
Getting json for visitors is relatively easy:
SELECT row_to_json(v) FROM (SELECT email, sessions FROM visitors WHERE ...) v;
I recommend you do not just squash all those together. What happens if you have a field called email? Instead:
SELECT row_to_json(t) FROM (
SELECT
(
SELECT row_to_json(v) FROM (SELECT email, sessions FROM visitors WHERE ...) v
) AS visitor
, (
SELECT json_object(array_agg(field), array_agg(value)) FROM trackings WHERE ...
) AS trackings
) t;
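The key-collision concern can be illustrated with plain Python dictionaries. This is only a sketch of the two result shapes, not of the SQL itself; the third tracking row (a user-supplied field named email) is a hypothetical addition.

```python
# Hypothetical rows mirroring the example in the question.
visitor = {"id": 1, "email": "hello#gmail.com", "sessions": 5}
trackings = [
    {"visitor_id": 1, "field": "orders_made", "value": "2"},
    {"visitor_id": 1, "field": "city", "value": "new york"},
    # Hypothetical user-supplied field that collides with a visitor column.
    {"visitor_id": 1, "field": "email", "value": "other#example.com"},
]

tracked = {
    t["field"]: t["value"] for t in trackings if t["visitor_id"] == visitor["id"]
}
visitor_cols = {k: v for k, v in visitor.items() if k != "id"}

# Squashed together: the tracked "email" silently overwrites the real column.
flat = {**visitor_cols, **tracked}

# Kept apart under separate keys: both values survive.
nested = {"visitor": visitor_cols, "trackings": tracked}

print(flat["email"])
print(nested["visitor"]["email"], nested["trackings"]["email"])
```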
I have a collection called survey_data which has following fields
topic_id
indicator_id
population_id
The survey_data collection has over 2.8M records. I want to fetch the populations from the populations collection for a given set of indicator_id and topic_id values. But the query below takes 20 seconds, even after adding a compound index on all the fields.
db.survey_data.find({"topic_id": 60, "indicator_id": 16})
How can I improve the performance? A single query using Mongoid for Rails 3 would be preferred.
Explain
{
"cursor" : "BtreeCursor data_source_index",
"isMultiKey" : false,
"n" : 2261852,
"nscannedObjects" : 2261852,
"nscanned" : 2261852,
"nscannedObjectsAllPlans" : 2261852,
"nscannedAllPlans" : 2261852,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 21,
"nChunkSkips" : 0,
"millis" : 19952,
"indexBounds" : {
"data_source_id" : [
[
60,
60
]
]
}
}
SurveyData Index:
index({data_source_id: 1,
data_source_year_id: 1,
indicator_id: 1,
indicator_option_id: 1,
country_id: 1,
provinces_state_id: 1,
health_regions_id: 1,
health_regions_type_id: 1,
other_administrative_boundary_id: 1,
sub_population_options_id: 1,
reportability_id: 1},
{name:"survey_data_index",
background: true
})
Three things to look at:
topic_id doesn't appear to be in your index.
Try creating an index with just the fields that you are querying on, in the same order as your query.
Do you need to grab 100,000 records all at once? If you pull the first 100 records using limit does it speed things up?
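The first two points rest on the prefix rule for compound indexes: a query is served best when it constrains the leading fields of the index. The sketch below is a deliberately simplified Python model of that rule, not a description of how the query planner actually works; the helper `index_prefix_used` and the two-field index are hypothetical, and the field names are taken literally as posted.

```python
def index_prefix_used(index_fields, query_fields):
    """Return the leading index fields that the query constrains.

    Simplified model: stop at the first index field missing from the query.
    """
    used = []
    for field in index_fields:
        if field not in query_fields:
            break
        used.append(field)
    return used


# Field order copied from the posted index definition.
survey_data_index = [
    "data_source_id", "data_source_year_id", "indicator_id",
    "indicator_option_id", "country_id", "provinces_state_id",
    "health_regions_id", "health_regions_type_id",
    "other_administrative_boundary_id", "sub_population_options_id",
    "reportability_id",
]
query = {"topic_id": 60, "indicator_id": 16}

wide = index_prefix_used(survey_data_index, query)
# Hypothetical narrower index containing only the queried fields.
narrow = index_prefix_used(["topic_id", "indicator_id"], query)

print(wide)
print(narrow)
```

Under this model the posted index offers no usable prefix for the query, while an index on just the two queried fields is covered in full.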
There are some really great resources on query tuning. Here are a couple:
Optimize Query Performance with Indexes and Projections
Automated Slow Query Analysis: Dex the Index Robot
If you can't do it, how would SQL do it?
Basically I want to select all my question objects where at least two share the same attribute. That attribute is called, let's say, word_id.
So how would I select only the objects that share a common attribute with exactly one other object?
If I have these objects:
# Question(id: 1, word_id: 1)
# Question(id: 2, word_id: 2)
# Question(id: 3, word_id: 2)
# Question(id: 4, word_id: 1)
# Question(id: 5, word_id: 1)
# Question(id: 6, word_id: 1)
I would want to return just ids 2 and 3, since they are the only ones whose word_id appears exactly twice.
Is that possible? I crudely do this by making two calls to the DB where first I call all the objects in question, add them to an array, and subtract from that array objects that match my requirements. I was just curious if there was a more elegant way to do it all at once.
Just SQL:
SELECT * FROM questions WHERE word_id IN (
SELECT word_id FROM questions GROUP BY word_id HAVING count(*) = 2
)
Rails:
Question.where("word_id IN (?)", Question.find(:all, select: "word_id",
group: "word_id HAVING count(*) = 2").map(&:word_id))
I guess that's still two queries though...
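To check the SQL form against the sample data from the question, here is a minimal sketch using Python's sqlite3 module with an in-memory table. The table layout is an assumption based on the listed objects, and the column is named word_id as in the question.

```python
import sqlite3

# In-memory copy of the six Question objects listed in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, word_id INTEGER)")
conn.executemany(
    "INSERT INTO questions (id, word_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 2), (4, 1), (5, 1), (6, 1)],
)

# The subquery finds word_ids that occur exactly twice; the outer query
# returns the rows carrying those word_ids.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM questions WHERE word_id IN ("
        "  SELECT word_id FROM questions GROUP BY word_id HAVING COUNT(*) = 2"
        ") ORDER BY id"
    )
]
print(ids)
```

As sent to the database, this is one statement containing a subquery, so the grouping and the final selection happen in a single round trip.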