How can I find the users that queried a view in Redshift? - sql

Hello everyone and thank you in advance!
I'm having trouble finding a query that returns the list of users who have queried some specific views.
An example to clarify: suppose I have a couple of views,
user_activity_last_6_months &
user_compliance_last_month
I need to know who is querying those 2 views and, if possible, other statistics. This could be a desired output:
+--------+-----------------------------+----------+----------------------------+----------------------------+----------------+-------------------+----------------------+------------------+
| userid | view_name                   | queryid  | starttime                  | endtime                    | query_cpu_time | query_blocks_read | query_execution_time | return_row_count |
+--------+-----------------------------+----------+----------------------------+----------------------------+----------------+-------------------+----------------------+------------------+
|    293 | user_activity_last_6_months | 88723456 | 2018-05-08 13:08:08.727686 | 2018-05-08 13:08:12.423532 |              4 |              1023 |                    6 |              435 |
|    345 | user_compliance_last_month  | 99347882 | 2018-05-10 00:00:03.049967 | 2018-05-10 00:00:09.177362 |              6 |               345 |                    8 |              214 |
|    345 | user_activity_last_6_months | 99347883 | 2018-05-10 12:27:36.637483 | 2018-05-10 12:27:44.502705 |              8 |                14 |                    9 |               13 |
|    293 | user_compliance_last_month  | 99347884 | 2018-05-10 12:31:00.433556 | 2018-05-10 12:31:30.090183 |             30 |                67 |                   35 |             7654 |
+--------+-----------------------------+----------+----------------------------+----------------------------+----------------+-------------------+----------------------+------------------+
I have developed a query that gets this info for tables in the database, using the system tables and views, but I can't find any clue about how to get the same results for views.
As I've said, the first 3 columns are mandatory and the others would be nice to have. Plus, any further information is welcome!!
Thank you all!!

If you need that level of auditing for table and view access then I recommend you start by enabling Database Audit Logging for your Redshift cluster. This will generate a number of log files in S3.
The "User Activity Log" contains the text of all queries run on the cluster. It can then either be loaded back into Redshift or exposed as a Spectrum external table so that the query text can be parsed for table and view names.

Related

SQL or Pandas: Join/Pivot information from two tables

I have three relational Postgres tables (TimescaleDB hypertables) and need to get my data into a CSV file, but I am struggling to get it into the format I want. I am using Django as the framework, but I need to solve this with raw SQL.
Imagine I have 2 tables: drinks and consumption_data.
The drinks table looks like this:
name   | fieldx | fieldy
-------+--------+-------
test-0 |        |
test-1 |        |
test-2 |        |
The consumption_data table looks like this:
time                   | drink_id | consumption
-----------------------+----------+------------
2018-12-15 00:00:00+00 |        2 |         123
2018-12-15 00:01:00+00 |        2 |         122
2018-12-15 00:02:00+00 |        2 |         125
My target table should join these two tables and give me all consumption data with the drink names back.
time                   | test-0 | test-1 | test-2
-----------------------+--------+--------+-------
2018-12-15 00:00:00+00 |    123 |    123 |     22
2018-12-15 00:01:00+00 |    334 |    122 |     32
2018-12-15 00:02:00+00 |    204 |    125 |     24
I do have all the drink ids and all the names, but there are hundreds or thousands of them.
I tried this by first querying the consumption data for a single drink and renaming the column (the alias needs double quotes, since test-0 contains a hyphen): SELECT time, "consumption" AS "test-0" FROM heatflowweb_timeseriestestperformance WHERE drink_id = 1;
Result:
time                   | test-0 |
-----------------------+--------+
2018-12-15 00:00:00+00 |    123 |
2018-12-15 00:01:00+00 |    334 |
2018-12-15 00:02:00+00 |    204 |
But now I would have to add hundreds of columns, and I am not sure how to do this. With UNION? But I don't want to write thousands of UNION statements...
Maybe there is an easier way to achieve what I want? I am not an SQL expert, so what I need could be super easy to achieve or impossible... Thanks in advance for any help, really appreciated.
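For reference, one common alternative to stacking UNIONs is conditional aggregation, which produces one column per drink in a single pass over the table. A minimal sketch, using the table name from the question and illustrative drink ids; for hundreds of drinks the SELECT list would be generated dynamically (e.g. from the drinks table) or built with the tablefunc extension's crosstab:

-- Minimal sketch: pivot consumption into one column per drink using
-- conditional aggregation (Postgres 9.4+ FILTER syntax).
-- The drink ids 1, 2, 3 are illustrative.
SELECT time,
       max(consumption) FILTER (WHERE drink_id = 1) AS "test-0",
       max(consumption) FILTER (WHERE drink_id = 2) AS "test-1",
       max(consumption) FILTER (WHERE drink_id = 3) AS "test-2"
FROM heatflowweb_timeseriestestperformance
GROUP BY time
ORDER BY time;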

How to manage relationships between a main table and a variable number of secondary tables in Postgresql

I am trying to create a PostgreSQL database to store the performance specifications of wind turbines and their characteristics.
The way I have structured this in my head is the following:
A main table with a unique id for each turbine model as well as basic information about it (rotor size, max power, height, manufacturer, model id, design date, etc.)
example structure of the "main" table holding all of the main turbine characteristics
turbine_model | rotor_size | height | max_power | etc.
--------------+------------+--------+-----------+-----
model_x1      |        200 |    120 |        15 | etc.
model_b7      |        250 |    145 |        18 | etc.
A lookup table for each turbine model storing how much power it produces at a given wind speed, with one column for wind speed and another column for power output. There would be as many of these tables as there are rows in the main table.
example table "model_x1":
wind_speed | power_output
-----------+-------------
         1 |          0.5
         2 |          1.5
         3 |          2.0
         4 |          2.7
         5 |          3.2
         6 |          3.9
         7 |          4.9
         8 |          7.0
         9 |         10.0
However, I am struggling to find a way to implement this, as I cannot see how to build relationships between each row of the "main" table and the lookup tables. I am starting to think this approach is not suited to a relational database.
How would you design a database to solve this problem?
A relational database is perfect for this, but you will want to learn a little bit about normalization to design the layout of the tables.
Basically, you'll want to add a third column to your poweroutput reference table so that each model is just more rows (grow long, not wide).
Here is an example of what I mean. I have even taken it a step further to show how you might reference other metrics in addition to wind speed (rpm in this case):
PowerOutput Reference Table
+----------+--------+------------+-------------+
| model_id | metric | metric_val | poweroutput |
+----------+--------+------------+-------------+
| model_x1 | wind | 1 | 0.5 |
| model_x1 | wind | 2 | 1.5 |
| model_x1 | wind | 3 | 3 |
| ... | ... | ... | ... |
| model_x1 | rpm | 1250 | 1.5 |
| model_x1 | rpm | 1350 | 2.5 |
| model_x1 | rpm | 1450 | 3.5 |
| ... | ... | ... | ... |
| model_bg | wind | 1 | 0.7 |
| model_bg | wind | 2 | 0.9 |
| model_bg | wind | 3 | 1.2 |
| ... | ... | ... | ... |
| model_bg | rpm | 1250 | 1 |
| model_bg | rpm | 1350 | 1.5 |
| model_bg | rpm | 1450 | 2 |
+----------+--------+------------+-------------+
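In DDL terms, the relationship between the two tables is then just a plain foreign key. A minimal sketch of this layout, with illustrative names and types:

-- Minimal sketch of the normalized layout: one row per model in the
-- main table, one row per (model, metric, value) measurement in the
-- reference table. Names and types are illustrative.
CREATE TABLE turbine_model (
    model_id   text PRIMARY KEY,
    rotor_size numeric,
    height     numeric,
    max_power  numeric
);

CREATE TABLE power_output (
    model_id    text    NOT NULL REFERENCES turbine_model (model_id),
    metric      text    NOT NULL,  -- e.g. 'wind' or 'rpm'
    metric_val  numeric NOT NULL,
    poweroutput numeric NOT NULL,
    PRIMARY KEY (model_id, metric, metric_val)
);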

Merging some columns from two postgres tables into a new table based on row value

Hello PostgreSQL experts (and maybe this is also a task for Perl's DBI, since I also happen to be working with it, but...). I might also have some terminology misused here, so bear with me.
I have a set of 32 tables, each one structured exactly like the others. The first column of every table always contains a date, while the second column contains values (integers) that can change once every 24 hours; some samples get back-dated. In many cases, a table may never contain data for a particular date. So here's an example of two such tables:
date_list  | sum          date_list  | sum
-----------+-----         -----------+-----
2020-03-12 | 4            2020-03-09 | 1
2020-03-14 | 5            2020-03-11 | 3
                          2020-03-12 | 5
                          2020-03-13 | 9
                          2020-03-14 | 12
The idea is to merge the separate tables into one, sort of like a grid, but with the samples placed in the correct row in its own column and ensuring that the date column (always the first column) is not missing any dates, looking like this:
date_list  | sum1 | sum2 | sum3 | ... | sum32
-----------+------+------+------+-----+------
2020-03-08 |      |      |      |     |
2020-03-09 |      |    1 |      |     |
2020-03-10 |      |      |    5 |     |
2020-03-11 |      |    3 |   25 |     |
2020-03-12 |    4 |    5 |   35 |     |
2020-03-13 |      |    9 |   37 |     |
2020-03-14 |    5 |   12 |   40 |     |
And so on: 33 columns, with rows from 2020-01-01 to date.
Now, I have tried doing a FULL OUTER JOIN and it succeeds. It's the subsequent attempts that get me into trouble, creating a long cascading table with the values in the wrong place, or accidentally clobbering data. I know the basic step works: as a test of my theory, using baby steps, I joined a single-column table containing a date sequence with the first data table:
SELECT date_table.date_list, sums_1.sum FROM date_table FULL OUTER JOIN sums_1 ON date_table.date_list = sums_1.date_list
2020-03-07 | 1
2020-03-08 |
2020-03-09 |
2020-03-10 | 2
2020-03-11 |
2020-03-12 | 4
Encouraged, I thought I'd get a little more ambitious with my testing, this time trying USING as an alternative. But that places some rows out of sequence at the bottom of the table, and I'm not sure whether I'm losing data or not:
SELECT * FROM sums_1 FULL OUTER JOIN sums_2 USING (date_list);
Result:
 fecha_sintomas |  sum  |  sum
----------------+-------+-------
 2020-03-09     |       |     1
 2020-03-11     |       |     3
 2020-03-12     |     4 |     5
 2020-03-13     |       |     9
 2020-03-14     |     5 |    12
 2020-03-15     |     6 |    15
 2020-03-16     |     8 |    20
 :              |     : |     :
 2020-10-29     | 10053 | 22403
 2020-10-30     | 10066 | 22407
 2020-10-31     | 10074 | 22416
 2020-11-01     | 10076 | 22432
 2020-11-02     | 10077 | 22434
 2020-03-07     |     1 |
 2020-03-10     |     2 |
(240 rows)
I think I'm getting close. In any case, how do I get to what I want, which is my grid of data described above? Maybe this is an iterative process that could benefit from using DBI?
Thanks,
You can full join like so:
select date_list, s1.sum as sum1, s2.sum as sum2, s3.sum as sum3
from sums_1 s1
full join sums_2 s2 using (date_list)
full join sums_3 s3 using (date_list)
order by date_list;
The using syntax makes the unqualified column date_list unambiguous in the select and order by clauses. Then we just need to enumerate the sum columns, providing an alias for each of them.
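If you also need every calendar date to appear even when none of the tables has a row for it, one variation is to start from a generated date spine and left join each sums table to it. A sketch, with an illustrative date range:

-- Sketch: a generated date spine guarantees that no date is missing,
-- regardless of gaps in the individual sums tables.
select g.day::date as date_list,
       s1.sum as sum1,
       s2.sum as sum2
from generate_series(date '2020-01-01', current_date, interval '1 day') as g(day)
left join sums_1 s1 on s1.date_list = g.day::date
left join sums_2 s2 on s2.date_list = g.day::date
order by date_list;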

How to select all columns of a bigquery table

I have the following BigQuery table:
+---------------------+-----------+-------------------------+-----------------+
| links.href          | links.rel | dados.dataHora          | dados.sequencia |
+---------------------+-----------+-------------------------+-----------------+
| https://www.url.com | self      | 2017-03-16 16:27:10 UTC |               2 |
|                     |           | 2017-03-16 16:35:34 UTC |               1 |
|                     |           | 2017-03-16 19:50:32 UTC |               3 |
+---------------------+-----------+-------------------------+-----------------+
and I want to select all rows. So I try the following query:
SELECT * FROM [my_project:a_import.my_table] LIMIT 100
But I get a bad (and sad) error:
Error: Cannot output multiple independently repeated fields at the same time. Found links_rel and dados_dataHora
Please, can anybody help me?
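For context, this error comes from BigQuery legacy SQL (the [project:dataset.table] syntax), which cannot output two independently repeated records at once. A hedged sketch of the usual workaround is to FLATTEN one of the repeated records before selecting; table and field names follow the question:

-- Legacy SQL sketch: flatten one of the repeated records first.
SELECT *
FROM FLATTEN([my_project:a_import.my_table], links)
LIMIT 100

Alternatively, standard SQL does not have this restriction: a plain SELECT * FROM my_project.a_import.my_table returns the repeated fields as arrays.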

How to assign event counts to relative date values in SQL?

I want to line up multiple series so that all milestone dates are set to month zero, allowing me to measure the before-and-after effect of the milestone. I'm hoping to be able to do this using SQL Server.
You can see an approximation of what I'm starting with at this data.stackexchange.com query. The sample query returns a table that basically looks like this:
+------------+-------------+---------+---------+---------+---------+---------+
| UserID     | BadgeDate   | 2014-01 | 2014-02 | 2014-03 | 2014-04 | 2014-05 |
+------------+-------------+---------+---------+---------+---------+---------+
|          7 | 2014-01-02  |     232 |      22 |      19 |      77 |      11 |
|         89 | 2014-04-02  |     345 |      45 |     564 |      13 |     122 |
|        678 | 2014-03-11  |      55 |      14 |      17 |     222 |     109 |
|        897 | 2014-03-07  |     234 |      56 |     201 |      19 |      55 |
|        789 | 2014-02-22  |     331 |      33 |      67 |     108 |     111 |
|        989 | 2014-01-09  |      12 |      89 |      97 |     125 |     323 |
+------------+-------------+---------+---------+---------+---------+---------+
This is not what I'm ultimately looking for. The values in the month columns are counts of answers per month. What I want is a table with the counts under relative month numbers as defined by BadgeDate: the BadgeDate month set to month 0 for each user, earlier months given negative relative month numbers, and later months given positive relative month numbers.
Is this possible in SQL? Or is there a way to do it in Excel with the above table?
After generating this table I plan on averaging relative month totals to plot a line graph that will hopefully show a noticeable inflection point at relative month zero. If there's no apparent bend, I can probably assume the milestone has a negligible effect on the Y-axis metric. (I'm not even quite sure what this kind of chart is called. I think Google might have been more helpful if I knew the proper terms for what I'm talking about.)
Any ideas?
This is precisely what the aggregate functions and case when ... then ... else ... end construct are for:
select
    UserID
    ,BadgeDate
    ,sum(case when format(AnswerDate, 'yyyy-MM') = '2014-01' then 1 else 0 end) as [2014-01]
    -- etc., one sum(case ...) per month
from Answers  -- illustrative name for the table of answer events
group by
    UserID
    ,BadgeDate
The PIVOT clause is also available in some flavours and versions of SQL, but it is less flexible in general, so the traditional mechanism is worth understanding.
Likewise, a pivot table in Excel can produce the same report, but there is value in aggregating the data as far as possible on the server in bandwidth-constrained environments.
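To get the relative month numbering the question actually asks for, the same construct can key off DATEDIFF instead of fixed month labels. A hedged SQL Server sketch, with illustrative table and column names (Answers, AnswerDate):

-- Sketch: DATEDIFF(month, BadgeDate, AnswerDate) is 0 in the badge
-- month, negative before it, and positive after it.
select
    UserID
    ,BadgeDate
    ,sum(case when datediff(month, BadgeDate, AnswerDate) = -1 then 1 else 0 end) as [month -1]
    ,sum(case when datediff(month, BadgeDate, AnswerDate) = 0  then 1 else 0 end) as [month 0]
    ,sum(case when datediff(month, BadgeDate, AnswerDate) = 1  then 1 else 0 end) as [month +1]
    -- extend for the range of relative months needed
from Answers
group by
    UserID
    ,BadgeDate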