Row-level security in Cloudera Impala - sql

I need to implement row-level security based on user id in Impala. The approach I am following right now is that I have a user-to-role mapping, and I use it to form a master query as follows:
create view derived_view as
select *, 1 as roleid from src_table where a = 1 and b = 2
union
select *, 2 as roleid from src_table where a = 1 and b = 3
...
...
And then, have another query as follows:
create view well_known_named_view as
select * from derived_view where roleid in
(select roleid from role_mapping where userid = effective_user());
This way, whenever a user logs in, they just query the well-known view, without the need to create a view per user or role. The problem is that this query times out in Hue (which is where it will be used most often), and even a basic query against it takes at least 10 minutes in the shell. Is there a better way to make this work?
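For comparison, the same shape can be sketched in SQLite (standing in for Impala, whose syntax and planner differ), with UNION ALL in place of UNION (skipping the deduplication pass) and a join to the role mapping in place of the IN-subquery. The sample tables, the `current_user` variable (a stand-in for `effective_user()`, which SQLite lacks), and any performance benefit on a real cluster are all assumptions to verify:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE src_table (a INT, b INT, val TEXT);
CREATE TABLE role_mapping (roleid INT, userid TEXT);
INSERT INTO src_table VALUES (1, 2, 'row-for-role-1'), (1, 3, 'row-for-role-2');
INSERT INTO role_mapping VALUES (1, 'alice'), (2, 'bob');
-- Same per-role branches as the question, but UNION ALL avoids a dedup pass.
CREATE VIEW derived_view AS
  SELECT *, 1 AS roleid FROM src_table WHERE a = 1 AND b = 2
  UNION ALL
  SELECT *, 2 AS roleid FROM src_table WHERE a = 1 AND b = 3;
""")

current_user = "alice"  # placeholder for effective_user()
rows = cur.execute("""
  SELECT d.val FROM derived_view d
  JOIN role_mapping m ON m.roleid = d.roleid
  WHERE m.userid = ?
""", (current_user,)).fetchall()
print(rows)  # [('row-for-role-1',)]
```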

Related

How to give access and partial access to specific Users in ORACLE database

I am trying to create a view in an Oracle database whereby if a USER is in the PLAYER field of the base table, only the rows of this player are outputted, but if a USER matches the MANAGER role of another base table, all rows of this first base table are outputted.
So far I have this, which I'm not sure even works:
CREATE VIEW Player_view AS
SELECT (CASE WHEN USER IN PT.PLAYER THEN (SELECT * FROM PT WHERE USER = PT.PLAYER)
ELSE WHEN USER IN PM.MANAGER THEN (SELECT * FROM PT)
END
FROM Player_Table PT, Player_Managers PM
Otherwise, I have tried with grant permissions - however, how do I give grant permission to SELECT over just one row vs. all rows?
Probably something like the following should suffice:
create view player_view as
select * from player_table
where player = user
union all
select * from player_table
where exists ( select 1 from player_manager where manager = user )
If the two are not mutually exclusive, you could exclude the user's own rows from the second query to avoid duplicates:
create view player_view as
select * from player_table
where player = user
union all
select * from player_table
where exists ( select 1 from player_manager where manager = user )
and player != user
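A quick sanity check of the pattern above, run in SQLite from Python; the `current_user` parameter stands in for Oracle's `USER` pseudo-column, and the sample tables and names are made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE player_table (player TEXT, score INT);
CREATE TABLE player_manager (manager TEXT);
INSERT INTO player_table VALUES ('ann', 10), ('bob', 20);
INSERT INTO player_manager VALUES ('meg');
""")

def visible_rows(current_user):
    # First branch: the user's own rows; second branch: everything else,
    # but only if the user appears in player_manager.
    return cur.execute("""
      SELECT player, score FROM player_table WHERE player = ?
      UNION ALL
      SELECT player, score FROM player_table
      WHERE EXISTS (SELECT 1 FROM player_manager WHERE manager = ?)
        AND player != ?
    """, (current_user, current_user, current_user)).fetchall()

print(visible_rows("ann"))  # [('ann', 10)] - a player sees only their rows
print(visible_rows("meg"))  # [('ann', 10), ('bob', 20)] - a manager sees all
```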

How to apply a security policy on a table which depends on information from an inner-joined table?

I want to implement Row-Level Security in my database. There are 2 tables. The first table consists of id, name and status columns; the status column determines the level of a record. The second table consists of userid, wage and month columns.
User Table
id   name    status
10   james   non-vip
11   mark    vip
12   edward  non-vip
Note: id is unique
Wage Table
userid  wage  month
10      100   jan
11      500   jan
12      250   jan
Normally when I run "select * from wagetable where wage > 200", it returns records 11 and 12. However, I want VIP people's wages to be hidden from HR analysts. To accomplish this I will create a security policy on the wage table, but the wage table does not hold the status information; I only have the user id. How can I overcome this problem?
I've solved the problem by creating a view and applying the policy on the view.
First, I created a view that inner joins the wage table with the user table to obtain the user's status information.
Second, I defined a function:
CREATE FUNCTION [dbo].[fn_RowLevelSecurity] (@status varchar(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS fn_SecureData
WHERE @status = user_name()
And finally, I applied the security policy to my view using the function:
CREATE SECURITY POLICY RowFilter
ADD FILTER PREDICATE dbo.fn_RowLevelSecurity(status)
ON dbo.MyView
WITH (STATE = ON);
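SQLite has no security policies, so the following only emulates the logic of this answer: a view that inner joins wages to user status and filters out 'vip' rows, which is what the filter predicate enforces in SQL Server. Table and column names here are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE users (id INT, name TEXT, status TEXT);
CREATE TABLE wages (userid INT, wage INT, month TEXT);
INSERT INTO users VALUES (10,'james','non-vip'), (11,'mark','vip'), (12,'edward','non-vip');
INSERT INTO wages VALUES (10,100,'jan'), (11,500,'jan'), (12,250,'jan');
-- The join brings in the status column the wage table lacks;
-- the WHERE clause plays the role of the filter predicate.
CREATE VIEW hr_wage_view AS
  SELECT w.userid, w.wage, w.month
  FROM wages w JOIN users u ON u.id = w.userid
  WHERE u.status != 'vip';
""")

rows = cur.execute("SELECT userid FROM hr_wage_view WHERE wage > 200").fetchall()
print(rows)  # [(12,)] - record 11 (vip) is filtered out
```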

Massive Delete statement - How to improve query execution time?

I have a Spring batch that runs every day to:
Read CSV files and import them into our database
Aggregate this data and save these aggregated data into another table.
We have a table BATCH_LIST that contains information about all the batches that were already executed.
BATCH_LIST has the following columns :
1. BATCH_ID
2. EXECUTION_DATE
3. STATUS
Among the CSV files that are imported, we have one CSV file to feed an APP_USERS table, and another one to feed the ACCOUNTS table.
APP_USERS has the following columns :
1. USER_ID
2. BATCH_ID
-- more columns
ACCOUNTS has the following columns :
1. ACCOUNT_ID
2. BATCH_ID
-- more columns
In step 2, we aggregate data from ACCOUNTS and APP_USERS to insert rows into a USER_ACCOUNT_RELATION table. This table has exactly two columns: ACCOUNT_ID (referring to ACCOUNTS.ACCOUNT_ID) and USER_ID (referring to APP_USERS.USER_ID).
Now we want to add another step in our Spring batch. We want to delete all the data from the USER_ACCOUNT_RELATION table, but also from APP_USERS and ACCOUNTS, that is no longer relevant (i.e. data that was imported before sysdate - 2).
What has been done so far :
Get all the BATCH_ID that we want to remove from the database
SELECT BATCH_ID FROM BATCH_LIST WHERE trunc(EXECUTION_DATE) < sysdate - 2
For each BATCH_ID, we are calling the following methods :
public void deleteAppUsersByBatchId(Connection connection, long batchId) throws SQLException {
    // prepared statements to delete user/account relations and users
}
And here are the two prepared statements :
DELETE FROM USER_ACCOUNT_RELATION
WHERE USER_ID IN (
SELECT USER_ID FROM APP_USERS WHERE BATCH_ID = ?
);
DELETE FROM APP_USERS WHERE BATCH_ID = ?
My issue is that it takes too long to delete data for one BATCH_ID (more than 1 hour).
Note: I only mentioned the APP_USERS, ACCOUNTS and USER_ACCOUNT_RELATION tables, but I actually have around 25 tables to delete from.
How can I improve the query time ?
(I've just tried to change the WHERE USER_ID IN (...) into an EXISTS. It is better, but still way too long.)
If that will be your regular process, i.e. you want to keep only the last 2 days, you don't need indexes, since every time you will delete about a third of all rows.
It's better to use just 3 deletes instead of 3*7 separate deletes:
DELETE FROM USER_ACCOUNT_RELATION
WHERE USER_ID IN
(
SELECT u.ID
FROM {USER} u
join {FILE} f
on u.FILE_ID = f.FILE_ID
WHERE trunc(f.IMPORT_DATE) < (sysdate - 2)
);
DELETE FROM {USER}
WHERE FILE_ID in (select FILE_ID from {file} where trunc(IMPORT_DATE) < (sysdate - 2));
DELETE FROM {ACCOUNT}
WHERE FILE_ID in (select FILE_ID from {file} where trunc(IMPORT_DATE) < (sysdate - 2));
Just replace {USER}, {FILE}, {ACCOUNT} with your real table names.
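Here is the set-based idea translated back to the question's own table names (BATCH_LIST / APP_USERS / USER_ACCOUNT_RELATION) and checked in SQLite; dates are stored as ISO strings for the demo, and the cutoff value stands in for sysdate - 2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE BATCH_LIST (BATCH_ID INT, EXECUTION_DATE TEXT, STATUS TEXT);
CREATE TABLE APP_USERS (USER_ID INT, BATCH_ID INT);
CREATE TABLE USER_ACCOUNT_RELATION (ACCOUNT_ID INT, USER_ID INT);
INSERT INTO BATCH_LIST VALUES (1, '2024-01-01', 'DONE'), (2, '2024-01-09', 'DONE');
INSERT INTO APP_USERS VALUES (100, 1), (200, 2);
INSERT INTO USER_ACCOUNT_RELATION VALUES (900, 100), (901, 200);
""")

cutoff = '2024-01-07'  # stands in for sysdate - 2

# One delete per table, child table first, instead of one delete per batch id.
cur.execute("""
  DELETE FROM USER_ACCOUNT_RELATION WHERE USER_ID IN (
    SELECT u.USER_ID FROM APP_USERS u
    JOIN BATCH_LIST b ON u.BATCH_ID = b.BATCH_ID
    WHERE b.EXECUTION_DATE < ?)
""", (cutoff,))
cur.execute("""
  DELETE FROM APP_USERS WHERE BATCH_ID IN (
    SELECT BATCH_ID FROM BATCH_LIST WHERE EXECUTION_DATE < ?)
""", (cutoff,))

remaining = cur.execute("SELECT USER_ID FROM APP_USERS").fetchall()
print(remaining)  # [(200,)] - only the user from the recent batch survives
```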
Obviously, with the partitioning option it would be much easier: daily interval partitioning, so you could simply drop old partitions.
But even in your case there is another, more difficult but really fast, solution: "partition views". For example, for ACCOUNT you can create 3 different tables ACCOUNT_1, ACCOUNT_2 and ACCOUNT_3, then create a partition view:
create view ACCOUNT as
select 1 table_id, a1.* from ACCOUNT_1 a1
union all
select 2 table_id, a2.* from ACCOUNT_2 a2
union all
select 3 table_id, a3.* from ACCOUNT_3 a3;
Then you can use an INSTEAD OF trigger on this view to insert each day's data into its own table: the first day into ACCOUNT_1, the second into ACCOUNT_2, etc., and truncate the oldest table each midnight. You can easily get the table name using:
select 'ACCOUNT_' || (mod(to_number(to_char(sysdate, 'J')), 3) + 1) tab_name from dual;
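The same round-robin scheme can be checked in plain Python; `date.toordinal()` uses a different day-number base than Oracle's 'J' format, so the particular table chosen on a given date may differ from Oracle's, but the 3-day rotation behaves identically:

```python
from datetime import date

def daily_table(d: date, n_tables: int = 3) -> str:
    # Day number modulo the table count gives a repeating 1..n_tables cycle.
    return f"ACCOUNT_{d.toordinal() % n_tables + 1}"

# Three consecutive days hit three different tables; day 4 wraps to day 1's.
for day in (date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 4)):
    print(day, daily_table(day))
```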

Get downlines from a particular user in Hibernate or SQL

I have a User table which looks like below
UserID  Name  SponsorID (FK)
1       A     null
2       B     1
3       C     1
4       D     3
The SponsorID refers to UserID. Now I need to write a query that returns all users who are descendants of a given UserID.
Example
For UserID 1 the query returns all 4 users
For UserID 3 the query should return 1 user
The current implementation gets the user list by looping over each direct downline, and I am looking for a better solution if one is possible.
UPDATE
Current code
public void findDownlineSponsorByUserBO(UserBO rootBO) throws Exception {
List<UserBO> downlines = businessOperationService.findUserBySponsorId(rootBO.getId(), "createdDate", false);
memberList.addAll(downlines);
for (UserBO memberBO : downlines) {
findDownlineSponsorByUserBO(memberBO);
}
}
You're going to have to use an iterative or recursive solution here, unless (perhaps) you're limited to one level of sponsorship and can relate UserID to SponsorID in one join. You could load the table into a tree structure in memory and then query that: loading it would be O(n log n), but traversing it would be O(log n).
This other SO question might give you some useful ideas: Is it possible to query a tree structure table in MySQL in a single query, to any depth?
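If the database supports recursive common table expressions (PostgreSQL, SQLite, MySQL 8+, SQL Server), the whole subtree can be fetched in a single query instead of the recursive Java loop. A SQLite sketch (note that the anchor row includes the starting user itself; filter it out if you only want strict descendants):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE users (UserID INT, Name TEXT, SponsorID INT);
INSERT INTO users VALUES (1,'A',NULL), (2,'B',1), (3,'C',1), (4,'D',3);
""")

def downline(user_id):
    # Anchor: the starting user; recursive step: anyone sponsored by a
    # user already in the tree, repeated to any depth.
    return cur.execute("""
      WITH RECURSIVE tree(UserID) AS (
        SELECT UserID FROM users WHERE UserID = ?
        UNION ALL
        SELECT u.UserID FROM users u JOIN tree t ON u.SponsorID = t.UserID
      )
      SELECT UserID FROM tree ORDER BY UserID
    """, (user_id,)).fetchall()

print(downline(1))  # [(1,), (2,), (3,), (4,)]
print(downline(3))  # [(3,), (4,)]
```

In Hibernate this would typically run as a native query, since JPQL has no recursive CTE support.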

How to design the Tables / Query for (m:n relation?)

I am sorry if the term m:n is not correct; if you know a better term I will correct it. I have the following situation. My original data has columns gameID, participID and result, and looks like this:
gameID  participID  result
1       5           10
1       4           -10
2       5           150
2       2           -100
2       1           -50
When extracted, this table will easily have some 100 million rows and around 1 million participIDs or more.
I will need: show me all results of all games from participant x where participant y was present.
Luckily this is only for a very limited number of participants, but those are subject to change, so I need a complete table and can reduce it in a second step.
My idea is the following; it just looks very unoptimized.
1) get the list of games where the "point of view" participant is included:
insert into consolidatedtable (gameid, participid, result)
select gameID, participID, sum(result) from mastertable
where participID = x and result <> 0
group by gameID, participID
2) get all games where the other participant is included:
insert into consolidatedtable (gameid, participid, result)
select gameID, participID, sum(result) from mastertable
where gameID in (select gameID from consolidatedtable)
and participID = y and result <> 0
group by gameID, participID
3) delete all games from the consolidated table that ended up with fewer than 2 participants:
delete from consolidatedtable where gameID in
(select gameID from consolidatedtable group by gameID having count(distinct participID) < 2)
The whole thing looks like a children's solution to me:
- I need a consolidated table for each player
- I insert way too many games into this table and delete them later on
- the whole thing needs to be run participant by participant over the whole master table; it would not work if I did this for several participants at the same time
Any better ideas? There must be, this one is just so bad. The master table will be PostgreSQL on the DW server; the consolidated view will be MySQL (but the number crunching will be done in PostgreSQL).
My problems:
1) How do I build the consolidated table(s) (do I need more than one?) without having to run a single query for each player over the whole master table (I need the data for players x, y, z no matter who else is playing)? This is the consolidation task for the DW server; it should create the condensed table for the webserver.
2) How can I then extract the data at the webserver fast? The table design of (1) should take this into consideration. We are not talking about a lot of players here, maybe 100, so I could either partition by player ID or just create single tables.
Datawarehouse: postgreSQL 9.2 (48GB, SSD)
Webserver: mySQL 5.5 (4GB Ram, SSD)
master table: gameid BIGINT, participID, result INT, foreign key on participID (to the participants table)
The DW server will hold the master table; the DW server should also prepare the consolidated/extracted tables (processing power and SSD space are not an issue).
The webserver should hold the consolidated tables (only for the ~100 players where I need the info) and query this data in a very efficient manner.
So: efficient querying at the webserver >> workload of the DW server. I think this is important; sorry that I didn't include it at the beginning.
The data at the DW server updates daily, but I do not need to query the whole master table completely every day. The setup allows me to consolidate only newer values, e.g. yesterday's consolidation was up to ID 500 and the current ID is 550, so today I only consolidate 501-550.
Here is another idea that might work, depending on your database (and my understanding of the question):
SELECT *
FROM table a
WHERE participID = 'x'
AND EXISTS (
SELECT 1 FROM table b
WHERE b.participID = 'y'
AND b.gameID=a.gameID
);
Assuming you have indexes on the two columns (participID and gameID), the performance should be good.
I'd compare it to this and see which runs faster:
SELECT *
FROM table a
JOIN (
SELECT gameID
FROM table
WHERE participID = 'y'
GROUP BY gameID
) b
ON a.gameID=b.gameID
WHERE a.participID = 'x';
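Both formulations can be checked side by side in SQLite with the sample data from the question (participants 5 and 4 standing in for x and y; the table name `results` is invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE results (gameID INT, participID INT, result INT);
INSERT INTO results VALUES (1,5,10),(1,4,-10),(2,5,150),(2,2,-100),(2,1,-50);
""")

# EXISTS formulation: games of participant 5 where participant 4 also played.
exists_q = cur.execute("""
  SELECT gameID, result FROM results a
  WHERE participID = 5
    AND EXISTS (SELECT 1 FROM results b
                WHERE b.participID = 4 AND b.gameID = a.gameID)
""").fetchall()

# JOIN formulation against a deduplicated list of participant 4's games.
join_q = cur.execute("""
  SELECT a.gameID, a.result
  FROM results a
  JOIN (SELECT gameID FROM results WHERE participID = 4 GROUP BY gameID) b
    ON a.gameID = b.gameID
  WHERE a.participID = 5
""").fetchall()

print(exists_q)  # [(1, 10)] - only game 1 has both participants
assert exists_q == join_q
```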
Sounds like you just want a self join:
For all participants:
SELECT x.gameID, x.participID, x.results, y.participID, y.results
FROM table as x
JOIN table as y
ON x.gameID = y.gameID
WHERE x.participID <> y.participID
The downside of that is you'd get each participant on each side of each game.
For 2 specific participants:
SELECT x.gameID, x.results, y.results
FROM (SELECT gameID, participID, results
FROM table
WHERE participID = 'x'
and results <> 0)
as x
JOIN (SELECT gameID, participID, results
FROM table
WHERE participID = 'y'
and results <> 0)
as y
ON x.gameID = y.gameID
You might not need to select participID in your query, depending on what you're doing with the results.