Grid of counts by row and column in SQL

I'm really struggling to create a simple grid which I can provide to non-technical team members so they can easily look up the number of people in our SQL customer database based on any two demographic characteristics.
For example, we frequently get questions like "how many people in the database are Male aged 36-45" or "how many people aged 18-25 in our database are emailable". With a lookup grid they will be able to answer these for themselves.
The actual data has many variables, so the real grid is likely to be large. A simplified version of the source table has one row per customer (~3M rows) with the following columns:
party_id -- Customer ID #
, gender -- 'Male'/'Female'/'Unknown gender'
, age -- '18-25'/'26-35'/'36-45'/'46-55'/'56-65'/'66-80'/'Unknown age'
, emailability -- 'Emailable'/'Not Emailable'
The results grid will have the same rows and columns with a count of customers in each cell (i.e. the count of people who satisfy both the row and the column criteria). In this example the rows and columns would be:
Male
Female
Unknown gender
18-25
26-35
36-45
46-55
56-65
66-80
Unknown age
Emailable
Not emailable
To look up the number of people who satisfy any two criteria, you would just need to find the intersection on the grid (for the first question above, this is the intersection of row "Male" and column "36-45", or of row "36-45" and column "Male").
This didn't sound like it should be a difficult problem, but I'm completely stumped. I thought it would be solved by pivots, but I couldn't figure out a way to cross-tabulate more than 2 variables. SQL is probably not the right tool for this job either, but there aren't many other tools available to me right now, so if possible I'd like to find a SQL solution. Let me know if you know of better options, though.
I was surprised not to find an existing solution here but I might not be using the right search terms so apologies if this has already been answered.
Thanks!
--EDIT--
As requested here's some sample data:
party_id |gender |age |Emailability |
---------|-------|--------|--------------|
1 |Male |18-25 |Not Emailable |
2 |Female |Unknown |Emailable |
3 |Unknown|36-45 |Emailable |
4 |Male |36-45 |Not Emailable |
5 |Male |56-65 |Emailable |
6 |Female |26-35 |Emailable |
7 |Male |18-25 |Emailable |
8 |Unknown|18-25 |Not Emailable |
9 |Male |66-80 |Emailable |
10 |Female |26-35 |Emailable |
Based on this example data the first few rows of the grid filled in would look like this (I've had to drop the emailability columns because of the page width but they would be included too):
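               |Male |Female |Unknown gender |18-25 |26-35 |36-45 |46-55 |56-65 |66-80 |
---------------|-----|-------|---------------|------|------|------|------|------|------|
Male           |5    |0      |0              |2     |0     |1     |0     |1     |1     |
Female         |0    |3      |0              |0     |2     |0     |0     |0     |0     |
Unknown gender |0    |0      |2              |1     |0     |1     |0     |0     |0     |
18-25          |
26-35          |
36-45          |
46-55          |
56-65          |
66-80          |
Unknown age    |
Emailable      |
Not emailable  |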
To clarify, since I think this caused some confusion: I'm not trying to create a view for other database users. I'm hoping to create an output that I can export to Excel for non-technical people to use as a handout/cheat sheet. This grid may not be achievable in SQL alone, but given the size of the dataset and the limited tools I have, I'm hoping it can be done in SQL. Thanks again for any suggestions.
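One possible approach (a suggestion, not something from the question) is to "unpivot" each customer into one row per characteristic, self-join that long table on party_id so every pair of characteristics held by the same customer becomes a row, and count customers per pair. The sketch below runs this in SQLite via Python on the sample data. The `customers` table name, the "gender: " / "age: " / "email: " prefixes (which keep "Unknown" gender and "Unknown" age apart) and the `build_grid` helper are all illustrative assumptions.

```python
import sqlite3

# Sample rows from the question: (party_id, gender, age, emailability)
CUSTOMERS = [
    (1, "Male", "18-25", "Not Emailable"),
    (2, "Female", "Unknown", "Emailable"),
    (3, "Unknown", "36-45", "Emailable"),
    (4, "Male", "36-45", "Not Emailable"),
    (5, "Male", "56-65", "Emailable"),
    (6, "Female", "26-35", "Emailable"),
    (7, "Male", "18-25", "Emailable"),
    (8, "Unknown", "18-25", "Not Emailable"),
    (9, "Male", "66-80", "Emailable"),
    (10, "Female", "26-35", "Emailable"),
]

# Step 1: unpivot into one (party_id, attr) row per characteristic.
# Step 2: self-join on party_id so each pair of characteristics held by the
# same customer becomes one row, then count customers per pair.
PAIR_COUNTS_SQL = """
WITH long_form AS (
    SELECT party_id, 'gender: ' || gender AS attr FROM customers
    UNION ALL
    SELECT party_id, 'age: ' || age FROM customers
    UNION ALL
    SELECT party_id, 'email: ' || emailability FROM customers
)
SELECT a.attr AS row_attr, b.attr AS col_attr, COUNT(*) AS n
FROM long_form AS a
JOIN long_form AS b ON a.party_id = b.party_id
GROUP BY a.attr, b.attr
"""


def build_grid(rows):
    """Return (labels, grid) where grid[r][c] counts customers having both labels."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE customers "
        "(party_id INTEGER PRIMARY KEY, gender TEXT, age TEXT, emailability TEXT)"
    )
    con.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)
    counts = {(r, c): n for r, c, n in con.execute(PAIR_COUNTS_SQL)}
    con.close()
    labels = sorted({r for r, _ in counts})
    # Step 3: pivot the long (row, col, n) result into a square grid;
    # pairs that never occur get 0.
    grid = [[counts.get((r, c), 0) for c in labels] for r in labels]
    return labels, grid


if __name__ == "__main__":
    labels, grid = build_grid(CUSTOMERS)
    print("," + ",".join(labels))  # CSV-style output that Excel can open
    for label, row in zip(labels, grid):
        print(label + "," + ",".join(str(n) for n in row))
```

The diagonal holds the plain count for each characteristic (e.g. 5 for Male). Categories with no customers (46-55 in the sample) simply do not appear; a fixed layout would need the labels driven from a full list of categories. In SQL Server the `||` concatenation would be written with `+` or `CONCAT`, and the final pivot step could be left to Excel. With k characteristics each customer contributes k×k joined rows, so on ~3M customers with many variables the intermediate result grows quickly; how well that performs is worth testing.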

Related

In CloudWatch Insights, how do I build a histogram of an aggregation function (second level query)?

I'm not sure I'm asking this correctly, which is probably why I can't find the solution, so I'll provide an example.
Suppose I have a log of employees hired by managers in a given time period. I can create a query that groups by manager and shows the number of employees hired:
filter @message like /hired employee/
| stats count() as numEmployees by managerId
| sort numEmployees desc
Let's suppose that generates the following table
Mngr | numHires
Jack | 4
Judy | 3
May | 3
John | 2
Jake | 2
Mary | 1
Sam | 1
Alan | 1
I'd like to further refine my result so that I can produce a second histogram of numHires and count, like so:
numHires | count
4 | 1
3 | 2
2 | 2
1 | 3
This table means there was 1 instance of 4 hires, 2 instances of 3 hires, 2 instances of 2 hires, and 3 instances of 1 hire.
Is there a way to do this?
PS: I know I can download the CSV and do this in Excel. However, CloudWatch returns at most 10,000 results.
I needed to do the same type of aggregation and raised a support case with AWS to ask how it could be done. The response from the AWS team was that, unfortunately, it is not currently possible using Insights:
"Insights does not have the capabilities for second level aggregations currently. So an alternate workaround is to use AWS Quick Sights or MS Excel to plot the required graphs."
In my case Excel is not an option, because the resulting dataset for a single day has millions of records. In the end, my solution was to sample just a few minutes of data, export it to Excel, and build a pivot table to aggregate it. This gave me a rough idea of how my system behaves.
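The second-level aggregation itself (what the Excel pivot table does) is small enough to script. The sketch below assumes the first-level Insights result was exported as CSV with columns named `managerId` and `numEmployees` (an assumption about the export format), uses the sample values from the question, and counts how many managers share each hire count. Because the first-level result has one row per manager, the 10,000-row export limit would only bite with more than 10,000 managers.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export of the first-level query; values from the question.
EXPORTED_CSV = """managerId,numEmployees
Jack,4
Judy,3
May,3
John,2
Jake,2
Mary,1
Sam,1
Alan,1
"""


def hires_histogram(csv_text):
    """Return (numHires, managers) pairs: how many managers hired each amount."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(int(row["numEmployees"]) for row in reader)
    # Largest hire count first, matching the table in the question.
    return sorted(counts.items(), reverse=True)


if __name__ == "__main__":
    for num_hires, managers in hires_histogram(EXPORTED_CSV):
        print(f"{num_hires} | {managers}")
```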
I have not looked into AWS QuickSight.
There may be third-party alternatives to CloudWatch Logs Insights, such as Datadog, that provide more powerful log analysis. I have not used Datadog personally, so I cannot vouch for it, but I have read good things about it.
References:
[1] https://docs.aws.amazon.com/quicksight/latest/user/histogram-charts.html

Retrieving records with partially matching field values using SQL

I have a table, Chapters, which contains three columns:
id, QuestionId, QuestionText
Sample data:
Id QuestionId QuestionText
---------------------------------------------------
1 12 Biology Nuclear Lining
2 13 Chemistry Molecules
3 14 Molecular dividing
4 15 Biology Planting
4 15 Physics Elevator
Now I need to retrieve rows based on the QuestionText column, where the text partially (not fully) matches the QuestionText of another row.
My output needs to be like this:
Id QuestionId QuestionText
---------------------------------------------------
1 12 Biology Nuclear Lining
4 15 Biology Planting
2 13 Chemistry Molecules
3 14 Molecular dividing
Matching rows contain similar words: if at least 40% of the words are the same, I need to retrieve the row. I am using SQL Server. Can anyone help me write the query, or tell me which method I should use?
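Since "40% of the words are the same" can be defined several ways, here is a minimal Python sketch under one explicit assumption: texts are split into lower-cased words, and the overlap is the number of shared words divided by the word count of the shorter text. The helper names are hypothetical. Under exact-word comparison, "Chemistry Molecules" and "Molecular dividing" share no word, so matching that pair as in the expected output would need stemming or some other fuzzy comparison. In SQL Server 2016 or later, STRING_SPLIT could produce the word lists for a similar self-join; that translation is not shown here.

```python
import re
from itertools import combinations

# Sample rows from the question: (Id, QuestionId, QuestionText)
ROWS = [
    (1, 12, "Biology Nuclear Lining"),
    (2, 13, "Chemistry Molecules"),
    (3, 14, "Molecular dividing"),
    (4, 15, "Biology Planting"),
    (4, 15, "Physics Elevator"),
]


def words(text):
    """Lower-cased set of words (assumption: words are runs of letters)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(a, b):
    """Shared words relative to the shorter text (an assumed metric)."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))


def matching_pairs(rows, threshold=0.4):
    """Return pairs of rows whose QuestionText overlap meets the threshold."""
    return [
        (r1, r2)
        for r1, r2 in combinations(rows, 2)
        if overlap(r1[2], r2[2]) >= threshold
    ]


if __name__ == "__main__":
    for r1, r2 in matching_pairs(ROWS):
        print(r1, "<->", r2)
```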

How to group rows vertically in PowerBuilder?

I have these sample rows of plate numbers with bay numbers:
Plate no | Bay no
------------------
AAA111 | 1
AAA222 | 1
AAA333 | 2
BBB111 | 3
BBB222 | 3
CCC111 | 1
Is there a way to make it look like this in a DataWindow in PowerBuilder?
1      | 2      | 3
------------------------
AAA111 | AAA333 | BBB111
AAA222 |        | BBB222
CCC111 |        |
There isn't a simple answer, especially if you need the cells to be updatable.
Variable Column Count Strategy
If the number of columns across the top is unknown at development time, then you might get by with a "Crosstab" style DataWindow, but it would be display-only. If you need updates, you'll have to do the data manipulation and updates manually, as each cell would probably represent one row.
Fixed Column Count Strategy
If the number of columns is known (fixed), you could flatten the data in the database and use a standard tabular (or grid) DataWindow control, but you'll still need to get creative if updates are needed.
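For the fixed-column case, one common way to flatten the data in the database is to number the plates within each bay with ROW_NUMBER() and then spread the bays across columns with conditional aggregation. The sketch runs that SQL in SQLite through Python on the sample rows; the `plates` table name and the `bay_1`..`bay_3` column list are assumptions, and window functions need SQLite 3.25 or newer. The same pattern is standard SQL and should carry over to Oracle or SQL Server with little change.

```python
import sqlite3

# Sample rows from the question: (plate_no, bay_no)
PLATES = [
    ("AAA111", 1), ("AAA222", 1), ("AAA333", 2),
    ("BBB111", 3), ("BBB222", 3), ("CCC111", 1),
]

# rn = position of the plate within its bay; each output row holds the
# rn-th plate of every bay, one bay per (fixed) column.
FLATTEN_SQL = """
WITH numbered AS (
    SELECT plate_no, bay_no,
           ROW_NUMBER() OVER (PARTITION BY bay_no ORDER BY plate_no) AS rn
    FROM plates
)
SELECT rn,
       MAX(CASE WHEN bay_no = 1 THEN plate_no END) AS bay_1,
       MAX(CASE WHEN bay_no = 2 THEN plate_no END) AS bay_2,
       MAX(CASE WHEN bay_no = 3 THEN plate_no END) AS bay_3
FROM numbered
GROUP BY rn
ORDER BY rn
"""


def flatten(plates):
    """Return (rn, bay_1, bay_2, bay_3) rows; empty cells are None."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE plates (plate_no TEXT, bay_no INTEGER)")
    con.executemany("INSERT INTO plates VALUES (?, ?)", plates)
    result = con.execute(FLATTEN_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in flatten(PLATES):
        print(row)
```

The result is display-oriented: each cell comes from a different source row, which is why updates would still need custom handling.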
If you use Oracle to obtain the data, you can use the PIVOT and UNPIVOT operators to do what you are looking for. Here is an article (in Spanish) showing how:
http://www.oracle.com/technetwork/es/articles/sql/caracteristicas-database11g-2108415-esa.html

Getting sum of items for each unique tag in database (Access)

I have a database with a table of devices. Each device requires a number of tests each year. The database is populated using a spreadsheet which is updated monthly. What I would like to be able to do is sum the number of pass or fails for each device and then compare it to the total number of tests the device was supposed to have. Does anyone know how this can be done?
Example: let's say Horn A requires 2 tests per year, and these are some of the rows for Horn A.
Device | Pass | Fail
Horn A | 1 | 0 (Jan)
Horn A | 0 | 0 (Feb)
Horn A | 1 | 0 (March)
Horn B | 1 | 0
And so on
By summing the number of passes and fails with Horn A, one can see that it had two tests. I'm just not sure how to make this into a proper query.
If this isn't clear, let me know.
Hmmm, if you want to get "2" from the above data, you can do:
select device, count(*)
from table
where pass > 0 or fail > 0
group by device;
This should work for you:
SELECT Device, Sum(Pass + Fail) AS [Total Tests]
FROM table
GROUP BY Device;
This also handles values greater than 1 in Pass or Fail, for example a single monthly entry with 2 passes and 1 fail, which counts as 3 tests.
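The queries above total the tests per device; the question also asks to compare that total with the number of tests each device should have had. A minimal sketch, assuming a separate (hypothetical) `required_tests` table with a `tests_per_year` column, LEFT JOINs it to the summed results. It runs in SQLite via Python; only Horn A's requirement (2) comes from the question, so Horn B is left without one and shows up as NULL. In Access the comparison column would more naturally be written with IIf(...), and a real query would also need to restrict the rows to the current year, which the sample data has no column for.

```python
import sqlite3

# Sample rows from the question: (device, pass, fail); months omitted.
RESULTS = [
    ("Horn A", 1, 0),
    ("Horn A", 0, 0),
    ("Horn A", 1, 0),
    ("Horn B", 1, 0),
]

# Hypothetical requirements table; only Horn A's value is known.
REQUIRED = [("Horn A", 2)]

COMPARE_SQL = """
SELECT r.device,
       SUM(r."pass" + r."fail") AS tests_done,
       q.tests_per_year AS tests_required,
       SUM(r."pass" + r."fail") >= q.tests_per_year AS requirement_met
FROM results AS r
LEFT JOIN required_tests AS q ON q.device = r.device
GROUP BY r.device, q.tests_per_year
ORDER BY r.device
"""


def compare(results, required):
    """Return (device, tests_done, tests_required, requirement_met) per device."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE results (device TEXT, "pass" INTEGER, "fail" INTEGER)')
    con.execute(
        "CREATE TABLE required_tests (device TEXT PRIMARY KEY, tests_per_year INTEGER)"
    )
    con.executemany("INSERT INTO results VALUES (?, ?, ?)", results)
    con.executemany("INSERT INTO required_tests VALUES (?, ?)", required)
    rows = con.execute(COMPARE_SQL).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in compare(RESULTS, REQUIRED):
        print(row)
```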

Pairwise testing: How to create the table?

Hello, I have a question about how to create the table for pairwise testing.
For example, suppose I have three parameters, each of which can take two different values. How do I then create an input table with all possible combinations? Would it look something like this?
| 1 2 3
-----------
1 | 1 1 1
2 | 1 2 2
3 | 1 1 2
4 | 1 2 1
Does each parameter correspond to a column?
However, since I have 3 parameters that can each take 2 different values, shouldn't the number of test cases be 2^3?
There's a good article with links to some useful tools here:
http://blog.josephwilk.net/ruby/pairwise-testing-with-cucumber.html
For the parameters: each column is a parameter, and each row is a possible combination. Here is the table:
| 1 2 3
-----------
1 | 1 1 1
2 | 2 1 1
3 | 1 2 1
4 | 1 1 2
5 | 2 2 1
6 | 2 1 2
7 | 1 2 2
8 | 2 2 2
so 2^3=8 possible combinations as you can see :)
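The 8 exhaustive rows can be generated with `itertools.product`. For comparison, pairwise (all-pairs) testing is usually defined as requiring only that, for every pair of parameters, every combination of their values appears in at least one row; for three two-valued parameters that needs just 4 rows. The sketch below checks that property; the function names are illustrative, and the 4-row set is one valid choice among several. It also shows that the table proposed in the question does not qualify, because parameter 1 never takes the value 2.

```python
from itertools import combinations, product


def exhaustive(levels):
    """All combinations of parameter values (levels[i] = values of parameter i)."""
    return list(product(*levels))


def covers_all_pairs(rows, levels):
    """True if every pair of parameters sees every pair of their values."""
    for i, j in combinations(range(len(levels)), 2):
        needed = set(product(levels[i], levels[j]))
        seen = {(row[i], row[j]) for row in rows}
        if not needed <= seen:
            return False
    return True


LEVELS = [(1, 2), (1, 2), (1, 2)]  # three parameters, two values each

# One 4-row set in which every pair of parameters sees all four value pairs.
PAIRWISE_ROWS = [(1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1)]
# The table proposed in the question.
QUESTION_ROWS = [(1, 1, 1), (1, 2, 2), (1, 1, 2), (1, 2, 1)]

if __name__ == "__main__":
    print(len(exhaustive(LEVELS)), "exhaustive rows")
    print("pairwise set covers all pairs:", covers_all_pairs(PAIRWISE_ROWS, LEVELS))
    print("question table covers all pairs:", covers_all_pairs(QUESTION_ROWS, LEVELS))
```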
For the values: each column is a value, and each row is a possible combination:
| 1 2
--------
1 | 1 1
2 | 2 1
3 | 1 2
4 | 2 2
There are 2^2 = 4 possible combinations. Hope it helps.
1) Please note that pairwise testing is not about exhaustively scanning all possible combinations of values of all parameters. Firstly, such a scan would produce so many test cases that almost no system could run them all.
Secondly, pairwise testing of a software system rests on the hope that the two parameters with the highest number of possible values are responsible for the highest percentage of the system's faults.
This is of course only a hope, and so far almost no rigorous scientific research exists to prove it.
2) What I often see in documentation discussing pairwise testing, like this, is that the list of all possible values (aka the pairwise test table) is not constructed in a well-thought-out way. This creates confusion.
In your case, all the parameters have the same number of possible values (2), so you could choose any two of the three parameters to build the table. What you should pay attention to is the ordering of the combinations: iterate the rightmost parameter first, then the next parameter to its left, and so on.
Say you have two parameters, p1 and p2: p1 has two possible values, apple and orange, and p2 has two possible values, red and blue. Then your pairwise test table would be:
index| p1 p2
------------------
1 | apple red
2 | apple blue
3 | orange red
4 | orange blue
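The ordering described above (rightmost parameter varies fastest) is exactly the order in which Python's `itertools.product` yields combinations, so a small sketch like this one reproduces the apple/orange table. The parameter lists are taken from the example; the variable names are only illustrative.

```python
from itertools import product

p1 = ["apple", "orange"]
p2 = ["red", "blue"]

# product() advances the rightmost input first, so rows come out as
# apple/red, apple/blue, orange/red, orange/blue.
table = list(product(p1, p2))

for index, (v1, v2) in enumerate(table, start=1):
    print(f"{index} | {v1} {v2}")
```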