Excel: filtering a time series graph - vba

I have data that looks like the following:
ID | Location | Attendees | StartDate | EndDate
---------------------------------------------
Event1 | Bldg 1 | 10 | June 1 | June 5
Event2 | Bldg 2 | 15 | June 3 | June 6
Event3 | Bldg 1 | 5 | June 3 | June 10
I'd like to create a time series graph showing, for every given date, how many events were active on that date (i.e. started but haven't ended yet). For example, on June 1, there was 1 active event, and on June 4, there were 4 active events.
This should be simple enough to do by creating a new range where my first column consists of consecutive dates, and the second column consists of formulas like the following (I hardcoded June 8 in this example):
=COUNTIFS(Events[StartDate],"<=6/8/2009", Events[EndDate],">6/8/2009")
However, the challenge is that I'd like to be able to dynamically filter the time series graph based on various criteria. For example, I'd like to be able to quickly switch between seeing the above time series only for events in Bldg 1; or for Events with more than 10 attendees. I have at least 10 different criteria I'd like to be able to filter on.
What is the best way to do this? Does Excel have a built-in way to do this, or should I write the filtering code in VBA?

Apart from that my answer is not programming related: That's prime example for using a pivot table. Use this to show data consolidated for e.g. each day. Then you can play around with filtering as you like.
Your question is exactly what pivot tables are made for.

Related

Find current data set using two SQL tables storing separately historical insertions and deletions

Problem
I need to do daily syncs of our latest internal data to an external audit database that does not offer an update interface. In order to update some records, I need to first generate and send in a deletion file to remove those records, and then follow by an insertion file with the same but updated records in it.
An important detail is that all of the records in deletion files must match the external records verbatim, in order to be deleted.
Proposed approach
Currently I use two separate SQL tables to version control what I have inserted/deleted.
Let's say that right now the inserted_records table looks like this:
id | file_version | contract_id | customer_name | start_year
9 | 6 | 1 | Alice | 2015
10 | 6 | 2 | Bob | 2015
11 | 6 | 3 | Charlie | 2015
Accompanied by a separate and empty deleted_records table with identical columns.
Now, if I want to
change the customer_name from Alice to Dave on line id 9
change the start_year for Bob from 2015 to 2020 on line id 10
Two new lines in inserted_records would be generated, line 12 and 13, in turn creating a new insertion file 7.
id | file_version | contract_id | customer_name | start_year
9 | 6 | 1 | Alice | 2015
10 | 6 | 2 | Bob | 2015
11 | 6 | 3 | Charlie | 2015
12 | 7 | 1 | Dave | 2015
13 | 7 | 2 | Bob | 2020
Then their original column values in line 9 and 10 are then copied onto the previously empty deleted_records, in turn creating a new deletion file 1.
id | file_version | contract_id | customer_name | start_year
1 | 1 | 1 | Alice | 2015
2 | 1 | 2 | Bob | 2015
Now, if I were to send in the deletion file 1 first followed by the insertion file 7, I would get the result that I wanted.
Question
How can I query the current set of records, considering all insertions and deletions that have occurred? Assuming all records in deleted_records always have matches in inserted_records and if multiple, we always delete records with smaller file version numbers first.
I have tried by first writing one to query the inserted_records for the latest records grouped by contract_id.
select top 1 with ties *
from insertion_record
order by row_number() over (partition by contract_id order by file_version desc)
This would give me line 11, 12 and 13, which is what I wanted in this particular example. But if we also wanted to delete the record line 11 with Charlie, then my query wouldn't work anymore as it doesn't take deleted_records into account, and I have no idea how to do it in SQL.
Furthermore, my nut tells me that this approach isn't solid as there are two separate and moving parts, perhaps there is a better approach to solve this?
How can I query the current set of records
I don't understand your question. Every SQL query is against the current set of records, if by that you mean the data currently in the database.
I do see a couple of problems.
Unless the table you're deleting from has a key defined, even an exact match on every column risks deleting more than one row.
You're performing an ad hoc update with UPDATE's transaction guarantee. I suppose the table you're updating is otherwise idle, and as a practical matter you don't have to worry about someone else (or you) re-inserting the deleted rows before your inserts arrive. But it's problem waiting to happen.
If what you're trying to do is produce the set of rows that will be the result of a series of inserts and deletions, you haven't provided enough information to say how that could be done, or even if it's possible. There would have to be some way to uniquely identify rows, so that deletions and insertions can be associated. (They don't match on all columns, after all.) And you'd need some indication of order of operation, because it matters whether INSERT follows or precedes DELETE.

Google Data Studio: Average Number of Sessions based on selected country values

Let's say that for the dimension country I have 4 values and for each of the 4 I have the respective number of Sessions. E.g.
+---------+----------+
| country | Sessions |
+---------+----------+
| Italy | 10 |
| France | 12 |
| Germany | 14 |
| Spain | 16 |
+---------+----------+
I want to compute and output in a scorecard the average number of Sessions, only for those specific countries. So, in the example, the output should be 13.
I tried with the following calculated field but it doesn't work:
Sessions * AVG(CASE
WHEN REGEXP_MATCH(country, '^Italy|France|Germany|Spain.*') THEN 1
ELSE 0 END)
Create a filter based on the country dimension using the Matching RegEx operator.
Then apply this to a scorecard with the metric sessions. In the Data tab on the right hand side, you should be able to click on a little pencil icon for the metric, and choose the aggregation method as average instead of sum.
You may not have this option if you're using the GA connector. In this case, there should be an Average Session metric in the data source.
One way it can be achieved is by using a Filter Control and a Calculated Field:
1) Filter Control
Add the component with the Dimension set to Country and then add a Default Selection (a comma separated list of the required countries):
Italy, France, Germany, Spain
2) Calculated Field (Scorecard)
Sessions / COUNT_DISTINCT(Country)
Google Data Studio Report and a GIF to elaborate:

How to group rows vertically in PowerBuilder?

I have this sample rows of plate nos with bay nos:
Plate no | Bay no
------------------
AAA111 | 1
AAA222 | 1
AAA333 | 2
BBB111 | 3
BBB222 | 3
CCC111 | 1
Is there a way to make it look like this in a datawindow in powerbuilder?
1 | 2 | 3
------------------------
AAA111 | AAA333 | BBB111
AAA222 BBB222
CCC111
There isn't an simple answer, especially if you need cells to be update-able.
Variable Column Count Strategy
If the number of columns across the top is unknown at development time than you might get by with a "Crosstab" style datawindow but it would be a display only. If you need updates you'll need to do manual data manipulations & updates as each cell would probably represent one row.
Fixed Column Count Strategy
If the number of columns is known (fixed) you could flatten the data at the database and use a standard tabular (or grid) datawindow control but you'll still need to get creative if updates are needed.
If you use Oracle to obtain the data you can use the Pivot and Unpivot function to perform what you are looking for. Here is an example of how to do it:
http://www.oracle.com/technetwork/es/articles/sql/caracteristicas-database11g-2108415-esa.html

Use excel and vba pivot tables to summarize data before and after dates

I have an excel document that is an dumped output of all service tickets(with statuses, assigned to, submitted by...etc) from our ticket tracking software. There is one row per ticket.
I am trying to make a flexible report generator in vba that will allow me to take in the ticket dump and output a report which will have a copy of the data in one sheet, a summary in another sheet, and a line graph in another sheet.
I feel like a pivot table is the perfect approach for this, the only problem is in the summary.
The data from the ticket dump looks something like this:
| Submitted_On | Priority | Title | Status | Closed_On |
10/10/2016 1 Ticket 1 New
10/11/2016 1 Ticket 2 Fixed 11/10/2016
10/12/2016 3 Ticket 3 Rejected 11/9/2016
10/15/2016 1 Ticket 4 In Review
The problem is the way I want the summary to look. Basically the summary should show all tickets that were opened and closed at the first of every month at exactly midnight within the past three years. In other words, if this report was a time machine, at that exact time X would be open and Y would be closed. Furthermore, The summary table should break that down by priority.
The hard part is that these simulated report dates (first of every month within the last 3 years) are extraneous values and are not within each data row.
So the report would be like this:
| Open | Closed |
| Reporting Date | P1 | P2 | P3 | P1 | P2 | P3 |
1-Oct-2016 6 10 0 3 2 0
1-Nov-2016 4 10 0 5 2 0
1-Dec-2016 6 3 0 5 9 0
Basically the formula for the Open section would be something like:
priority=1 AND Submitted_On<Reporting Date AND (Closed_On>Reporting Date OR Closed_On="")
and the formula for the closed section would be something like:
priority=1 AND Submitted_On<Reporting Date AND Closed_On<Reporting Date
It would be needed where I can filter the data so that its only coming assigned to x or only with these statuses...etc. which is why I don't think a regular sheet with formulas would work.
I thought pivot tables would work but Reporting Date isn't a field.
Do you have any advice as to what I can do to make this report work and be very flexible as far as filtering goes?
Thank you!
P.S. I am using excel 2010, so I do not have access to queries

SQL: What is a value?

The Question
One thing that I am confused about is the technical definition of possibly the most basic component of a database: a single value.
Some Examples
I understand and follow (at a minimum) the first three normal forms of database normalization - or so I think. That said, with the introduction of RANGE in PostgreSQL 9.2 I started thinking about what makes a single value.
From the docs:
Range types are useful because they represent many element values in a single range value
So, what are you? Several values, or a single value... nothingness... 42?
Why does this matter?
Because is speaks directly to the Second Normal Form:
Create separate tables for sets of values that apply to multiple records.
Relate these tables with a foreign key.
#1 Ranges
For example, in Postgres 9.1 I had some tables structured like this:
"SomeSchema"."StatusType"
"StatusTypeID" | "StatusType"
--------------------|----------------
1 | Start
2 | Stop
"SomeSchema"."Statuses"
"StatusID" | "Identifier" | "StatusType" | "Value" | "Timestamp"
---------------|----------------|----------------|---------|---------------------
1 | 1 | 1 | 0 | 2000-01-01 00:00:00
2 | 1 | 2 | 5 | 2000-01-02 12:00:00
3 | 2 | 1 | 1 | 2000-01-01 00:00:00
4 | 3 | 1 | 2 | 2000-01-01 00:00:00
5 | 2 | 2 | 7 | 2000-01-01 18:30:00
6 | 1 | 2 | 3 | 2000-01-02 12:00:00
This enabled me to keep an historical record of how things were configured at any given point in time.
This structure takes the position that the data in the "Value" column were all separate values.
Now, in Postgres 9.2 if I do the same thing with a RANGE value it would look like this:
"SomeSchema"."Statuses"
"StatusID" | "Identifier" | "Value" | "Timestamp"
---------------|----------------|-------------|---------------------
1 | 1 | (0, NULL) | 2000-01-01 00:00:00
2 | 1 | (0, 5) | 2000-01-02 12:00:00
3 | 2 | (1, NULL) | 2000-01-01 00:00:00
4 | 3 | (2, NULL) | 2000-01-01 00:00:00
5 | 2 | (1, 7) | 2000-01-01 18:30:00
6 | 1 | (0, 3) | 2000-01-02 12:00:00
Again, this structure would enable me to keep an historical record of how things were configured, but I would be storing the same value several times in separate places. It makes updating (technically inserting a new record) more tricky because I have to make sure the data rolls over from the original record.
#2 Arrays
Arrays have been around for a long time, and while they can be abused, I tend to use them for things like color codes. For example, my project stores information and at times needs to know how to display it. I could create three columns to store red, green, and blue values; but that just seems silly. When would I ever create a foreign key (or even just filter) based on one of the given color codes.
When I created the field it was from the perspective that I needed to store a color in a neutral format so that I could feed anything that accepts a color value. I made the column an array and filled it with the appropriate codes to make the color I want.
#3 PostGIS: Geometry & Geography
When storing a polygon in PostGIS, it stores all the points that make the boundary in a single field. If one point were to change and I wanted to keep an historical record, I would have to store all of the points that have not changed twice in order to store the new polygon along with the old.
So, what is a value? and... if RANGE, ARRAY, and GEOGRAPHY are values do they really break the second normal form?
The fact that some operation can derive new values from X that appear to be components of X's value doesn't mean X itself isn't "single valued". Thus "range" values and "geography" values should be single values as far as the DBMSs type system is concerned. I don't know enough about Postgresql's implementation to know whether "arrays" can be considered as single values in themselves. SQL DBMSs like Postgresql are not truly relational DBMSs and SQL supports various structures that certainly aren't proper relation variables, values or types (pointers, nulls and other exotica).
This is a difficult and sometimes controversial topic however. If you haven't read it then I recommend the book Databases, Types, and the Relational Model - The Third Manifesto by Date and Darwen. It addresses exactly the kind of questions you are asking about.
I don't like your description of 2NF but it's not very relevant here.