SparkSQL in EMR to fetch Data from AWS Glue (Cross Account) - apache-spark-sql

I have an EMR cluster on which I run a SparkSQL job that fetches data from the AWS Glue Data Catalog (S3). The cluster and the catalog are in different accounts.
My query is of the form:
CREATE OR REPLACE VIEW employee AS
SELECT
pay.recordid,
pay.employeeid,
pay.amount,
pay.paycode,
pay.paydate,
pay.paycycle,
pay.updatetime
FROM database.table pay
WHERE
pay.partition_0 in (var1)
and pay.partition_1 in (var2)
and pay.partition_2 in (var3)
and paycode = 'P1'
AND paycycle = 'M'
AND country = 'test'
AND paydate = ( SELECT DISTINCT paydate FROM default.table2
WHERE CURRENT_DATE < DATE(paydate) AND CURRENT_DATE > DATE(payperiodstart)
AND paycycle = 'M')
AND amount > 0;
In the Glue Catalog settings I also have to grant glue:CreateTable and glue:DeleteTable. If I remove them, my query fails. Why are these two permissions required for creating views? Can I remove them and run the query using only these permissions:
"glue:GetDatabase", "glue:GetUserDefinedFunctions", "glue:GetTable", "glue:GetPartitions"
I ask because this can pose a security risk. I own the Glue account and am giving someone else read-only access, so I can't grant glue:DeleteTable or glue:CreateTable.

You need to provide those permissions, since a view is behind the scenes just a regular Glue Table with special table properties:

Related

Filter SQL results with 1 to many relationship

There are two tables, an account table and a tenant table. An account has multiple tenants. I want to find a list of accounts.
For example, an account XXX can have multiple tenants: sandbox, implementation, development, etc.
I want to extract a list of accounts without an implementation tenant, using SQL.
I tried something like this:
select a.accountname, t.tenanttype
from account a
inner join tenant t
using (accountid)
WHERE
t.tenanttype NOT IN ('Implementation%');
I get all accounts with their tenant types, but the query only filters out the Implementation tenant row, even though that tenant exists for the account.
E.g. an account XXX has 4 tenants: Sandbox, Dev, Implementation and Preview.
My code returns the account XXX, but with just three values: Sandbox, Dev, and Preview.
I want to get a list of accounts that don't have the Implementation tenant AT ALL.
Based on your final statement, it sounds like you need to use NOT EXISTS. Does the following work for you?
select a.accountname, t.tenanttype
from account a
join tenant t on a.accountid = t.accountid
where not exists (
select * from tenant t2
where t2.accountid = t.accountid and t2.tenanttype like 'Implementation%'
);
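The NOT EXISTS idea above can be tried out with a small sketch using Python's built-in sqlite3 module. The sample rows and the helper name `accounts_without_tenant` are made up for illustration, and this variant returns one row per account (names only) rather than account/tenant pairs. Note that an account with no tenants at all would also be returned.

```python
import sqlite3

def accounts_without_tenant(conn, pattern):
    """Return names of accounts that have no tenant matching the LIKE pattern."""
    rows = conn.execute(
        """
        SELECT a.accountname
        FROM account a
        WHERE NOT EXISTS (
            SELECT 1 FROM tenant t
            WHERE t.accountid = a.accountid
              AND t.tenanttype LIKE ?
        )
        ORDER BY a.accountname
        """,
        (pattern,),
    ).fetchall()
    return [r[0] for r in rows]

# Hypothetical sample data: XXX has an Implementation tenant, YYY does not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (accountid INTEGER PRIMARY KEY, accountname TEXT);
    CREATE TABLE tenant (accountid INTEGER, tenanttype TEXT);
    INSERT INTO account VALUES (1, 'XXX'), (2, 'YYY');
    INSERT INTO tenant VALUES
        (1, 'Sandbox'), (1, 'Dev'), (1, 'Implementation'), (1, 'Preview'),
        (2, 'Sandbox'), (2, 'Dev');
""")

result = accounts_without_tenant(conn, "Implementation%")
print(result)  # ['YYY']
```

The subquery is correlated on accountid, so the filter applies to the account as a whole and not to individual tenant rows, which is what the original NOT IN attempt was missing.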

Notify user about not allowed results via ASPECT pfcg_auth?

In "classic" ABAP authority checks, you would sometimes loop over a result list. If for at least one item the check fails, you'd notify the user about this and show only the items he's entitled to.
My question is: How would you do this in CDS using the pfcg_auth aspect?
For example:
define role my_role {
grant select
on vbak
where ( vkorg ) = aspect pfcg_auth ( v_vbak_vko, vkorg, actvt = '03' );
}
How would you tell that the selection found, say, 50 sales orders but the user is only authorized to display 40 of them?
For a SELECT on a CDS view, there is the addition WITH PRIVILEGED ACCESS, which lets the select bypass the DCL access control.
You can select count(*) WITH PRIVILEGED ACCESS and compare it with the count from the normal, authorization-filtered select. If the numbers are not equal, you can raise the message.

How to display different interactive grid queries, depending on user privileges

I use a Master-Detail page. I want to display the whole table only to users who are administrators. For users who aren't administrators, I want to use a SQL query that restricts the Master-Detail view.
I tried creating a second Master-Detail on the same page that is visible only to users without the Administrator role, while the first one is hidden from them (I used a Server-side Condition). Is there some other way to display a different query depending on the user's role?
I hope I explained the problem clearly. Thanks in advance.
Have a look at the package APEX_ACL; you can use the related views in your WHERE clause.
Example: admins see all rows, other users only see the row for KING.
SELECT *
FROM emp
WHERE
-- user is admin ?
( EXISTS ( SELECT 1
FROM apex_appl_acl_user_roles
WHERE application_id = :APP_ID AND
user_name = :APP_USER AND
role_static_id = 'ADMINISTRATOR'
) ) OR
-- user is no admin
( ename = 'KING' );

Can you tell me a better approach in designing a table for banned users than this?

I am designing a web application and I need to give the administrators and moderators the right to allow and deny other users access to the application. I am thinking of having a table with the following columns:
OperationType (Ban / Access regained).
BannedUser
User (admin/mod that gave access or banned another user)
EventDate
Reason (optional)
I can just have a table, storing all banned users, but I want to keep track of what is actually happening in the app and make sure that the administrators and moderators are not misbehaving as well.
So, if my table doesn't include an OperationType column, a list of all the banned users could be retrieved simply by writing the following query:
select BannedUser from UserBan;
But if I leave the table with an OperationType column, as shown above, my simple select query could become something like this:
select o3.BannedUser
from
(
select o1.BannedUser, max(o1.EventDate) EventDate
from UserBan o1
group by o1.BannedUser
) o2, UserBan o3
where o3.EventDate = o2.EventDate and
o3.BannedUser = o2.BannedUser and
o3.OperationType = 1
Assume that OperationType = 1 is ban.
So, can someone give me a better solution for my case? :)
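To make the trade-off concrete, here is a minimal sketch of the event-log variant using Python's built-in sqlite3 module. It runs essentially the same "latest event per user" query as above, written with an explicit JOIN. The sample rows, the ISO-string dates, and the value 0 for "access regained" are assumptions made only for this illustration.

```python
import sqlite3

# Same logic as the query in the question: take each user's most recent
# event and keep the user only if that event is a ban (OperationType = 1).
CURRENTLY_BANNED = """
    SELECT o3.BannedUser
    FROM (
        SELECT o1.BannedUser, MAX(o1.EventDate) AS EventDate
        FROM UserBan o1
        GROUP BY o1.BannedUser
    ) o2
    JOIN UserBan o3
      ON o3.BannedUser = o2.BannedUser
     AND o3.EventDate = o2.EventDate
    WHERE o3.OperationType = 1
    ORDER BY o3.BannedUser
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserBan (
        OperationType INTEGER,  -- 1 = ban; 0 = access regained (assumed value)
        BannedUser    TEXT,
        User          TEXT,     -- admin/mod who performed the operation
        EventDate     TEXT,     -- ISO dates sort correctly as text
        Reason        TEXT
    );
    -- Hypothetical history: alice was banned and later unbanned, bob is still banned.
    INSERT INTO UserBan VALUES
        (1, 'alice', 'mod1',   '2024-01-01', 'spam'),
        (0, 'alice', 'admin1', '2024-01-05', NULL),
        (1, 'bob',   'mod1',   '2024-01-02', NULL);
""")

banned = [row[0] for row in conn.execute(CURRENTLY_BANNED)]
print(banned)  # ['bob']
```

The full history stays in the table for auditing, while the query reports only users whose latest event is a ban.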

Difference between views and SELECT queries

If views are used to show selected columns to the user, and the same can be done by using
SELECT col1, col2
FROM xyz
, what is the point of using views?
Using a view saves you copying and pasting your queries and adds code reusability, so you can change a single view instead of 10 queries in the different places of your code.
Different permissions can be granted on views and tables, so that you can show only a portion of data to a user
A view can be materialized, which means caching the results of the underlying query
As Quassnoi said, it's useful for granting permission to certain rows in a table.
For example, let's say a lecturer at a university needs access to information on their students. The lecturer shouldn't have access to the "students" table because they could look up or modify information for any student in the whole university. The database admin makes a view that only shows students from the lecturer's classes, and gives the lecturer the appropriate permissions for the view. Now the lecturer has access to their own students' data but not the whole "students" table.
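The lecturer scenario can be sketched with Python's built-in sqlite3 module. The table layout, the names, and the view `smith_students` are hypothetical. SQLite has no GRANT statement, so the sketch only shows the row and column restriction. In a server database, the admin would additionally grant SELECT on the view and not on the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, lecturer TEXT);
    INSERT INTO students VALUES
        (1, 'Ann', 'smith'),
        (2, 'Ben', 'jones'),
        (3, 'Cid', 'smith');

    -- The view exposes only lecturer smith's students and hides the lecturer column.
    CREATE VIEW smith_students AS
        SELECT id, name FROM students WHERE lecturer = 'smith';
""")

names = [r[0] for r in conn.execute("SELECT name FROM smith_students ORDER BY id")]
print(names)  # ['Ann', 'Cid']

# A plain view stores no data: a row added to the base table is visible
# through the view immediately.
conn.execute("INSERT INTO students VALUES (4, 'Dee', 'smith')")
names_after = [r[0] for r in conn.execute("SELECT name FROM smith_students ORDER BY id")]
print(names_after)  # ['Ann', 'Cid', 'Dee']
```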
A view can be described as a virtual table, created from a SQL query stored in the database.
Therefore, the following are aspects to consider in using VIEWS
Performance: a materialized view can improve data access performance, because the result of a query involving several tables is already computed and stored. A plain view is typically expanded into its underlying query each time it is used.
Simplicity: most of the views I work with are arrangements of columns from 4+ tables, joined by a bunch of inner joins. Once the view is created, your application developers only have to write SELECT statements against the columns of that one view, hence the term virtual table.
Security: or just call it access control. Most relational database management systems allow properties on the view object that control the type of access. For instance, one can allow users to update a view while only the DBA can make modifications to the tables that compose the view.
A view can be more complicated than just showing certain columns. It is a stored query. Wikipedia has much more detail.
Views make SQL easier to write (and read).
You can also use views to control access permissions.
Views:
1. View will not store any data
2. Used for Security purpose
3. When the base table is dropped, then the view is no longer accessible
4. One can perform DML operations directly on the view
Materialized view:
1. Materialized view stores the result of its query physically
2. It is used for better performance
3. When the base table is dropped, a materialized view is still accessible
4. One cannot perform DML operations on materialized view.
Reference: https://www.youtube.com/watch?v=8ySsyZlixuE
All answers above provide an excellent explanation for the difference between a view and a query.
The query in the question is simple to an extreme degree, and creating a view for it might be overkill.
However, most queries are more complex, for example:
;with Orders2016 as (
select Customers.CustomerID
, Customers.CompanyName
, TotalOrderAmount = sum(OD.Quantity * OD.UnitPrice)
from Customers
join Orders O on Customers.CustomerID = O.CustomerID
join OrderDetails OD on O.OrderID = OD.OrderID
where OrderDate >= '2016-01-01'
and OrderDate < '2017-01-01'
group by Customers.CustomerID, Customers.CompanyName
)
, CustomersGroups as
(
select CustomerID
, CompanyName
, TotalOrderAmount
, CustomerGroup =
(
case
when
TotalOrderAmount >= 0 and TotalOrderAmount < 1000
then 'Low'
when TotalOrderAmount >= 1000 and TotalOrderAmount < 5000
then 'Medium'
when TotalOrderAmount >= 5000 and TotalOrderAmount < 10000
then 'High'
when TotalOrderAmount >= 10000 then 'VeryHigh'
end
)
from Orders2016
)
select CustomerGroup
, TotalInGroup = Count(*)
, PercentageInGroup = Count(*) * 1.0 / (select count(*) from CustomersGroups)
from CustomersGroups
group by CustomerGroup
order by TotalInGroup desc;
Imagine rewriting it each time you want to access data. (Or even searching through files, copy & paste). Poor time management.
Views also save us tons of time and are a way to be DRY.
Views are very useful for access permissions. There are multiple other advantages (as stated in those links provided above), but for me the main advantage is reusability. As Quassnoi writes, you can have a single point where you can edit your query, instead of editing a list of methods.
Perfect!