SQL: Efficient way to locate an example of each combination

Is there a more efficient way of querying a table (or a collection of tables) for all possible combinations of a few columns? I'm currently running GROUP BY and then MAX, but this doesn't seem to be the most efficient way.
SQL Fiddle for the below example: http://sqlfiddle.com/#!2/25f8b/3
Example Table
ID | Name | Age | City | Color
--------------------------------
1 | Dave | 10 | London | Red
2 | Dave | 11 | London | Purple
3 | Dave | 10 | Paris | Orange
4 | Jim | 10 | London | Red
5 | Jim | 10 | London | Green
6 | Jim | 11 | London | Lazer
etc. (around 500,000 rows)
Currently doing:
SELECT max(ID), Name, Age, City, Color
from People
group by Name, Age, City
To produce:
MAX(ID) NAME AGE CITY COLOR
1 Dave 10 London Red
3 Dave 10 Paris Orange
2 Dave 11 London Purple
5 Jim 10 London Red
6 Jim 11 London Lazer
Note that 4 is missing because it duplicates 5 on Name, Age and City.
3 is included because it has a different city from 1, even though the name and age are the same.
However, on this massive database the current query takes around ten minutes to return the results (note that it's actually a join of a few tables).
Is there a more efficient way to return the same results? I was imagining a mass collection of queries like SELECT * WHERE name = % AND age = % AND city = % LIMIT 1, or something similar.
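One way to get a whole representative row per combination is the ROW_NUMBER() window function: number the rows inside each (Name, Age, City) partition by descending ID and keep row 1. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine. The question's dialect looks like MySQL, where window functions need version 8.0 or later, so details may differ. Unlike a bare Color next to MAX(ID), this keeps Color consistent with the returned ID. MySQL (with ONLY_FULL_GROUP_BY off) may take a non-aggregated column's value from any row in the group, which is why the output above pairs ID 5 with Red, the color of row 4. The index name idx_people_combo is hypothetical. Whether it speeds up the real multi-table join has to be checked with EXPLAIN.

```python
import sqlite3

ROWS = [
    (1, "Dave", 10, "London", "Red"),
    (2, "Dave", 11, "London", "Purple"),
    (3, "Dave", 10, "Paris", "Orange"),
    (4, "Jim", 10, "London", "Red"),
    (5, "Jim", 10, "London", "Green"),
    (6, "Jim", 11, "London", "Lazer"),
]


def latest_per_combination(conn):
    """Return the full row with the highest ID for each (Name, Age, City)."""
    query = """
        SELECT ID, Name, Age, City, Color
        FROM (
            SELECT p.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY Name, Age, City
                       ORDER BY ID DESC
                   ) AS rn
            FROM People AS p
        ) AS ranked
        WHERE rn = 1          -- keep only the highest ID in each combination
        ORDER BY ID
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE People (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, "
    "City TEXT, Color TEXT)"
)
conn.executemany("INSERT INTO People VALUES (?, ?, ?, ?, ?)", ROWS)
# Hypothetical composite index on the grouping columns plus ID.
conn.execute("CREATE INDEX idx_people_combo ON People (Name, Age, City, ID)")

for row in latest_per_combination(conn):
    print(row)
```

For the sample data this returns IDs 1, 2, 3, 5 and 6. Row 5 comes back with its own color, Green.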

To get the different combinations, there is the reserved word DISTINCT:
SELECT DISTINCT Name, Age, City
FROM People
This gives the same result as:
SELECT Name, Age, City
FROM People
GROUP BY Name, Age, City
However, it is limited:
If you add a column (like Color), it is included in the combination analysis.
You can't use aggregate functions like MAX to pick a representative row per combination.
I don't know whether it's any better performance-wise.
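The equivalence and the first limitation can be checked quickly on the sample data. This is a small sketch that again uses Python's sqlite3 as a stand-in engine. It shows that DISTINCT and GROUP BY return the same five combinations, and that adding Color yields six rows, because rows 4 and 5 differ only in color.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE People (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, "
    "City TEXT, Color TEXT)"
)
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Dave", 10, "London", "Red"),
        (2, "Dave", 11, "London", "Purple"),
        (3, "Dave", 10, "Paris", "Orange"),
        (4, "Jim", 10, "London", "Red"),
        (5, "Jim", 10, "London", "Green"),
        (6, "Jim", 11, "London", "Lazer"),
    ],
)

# Both forms collapse the table to unique (Name, Age, City) tuples.
distinct_rows = set(conn.execute("SELECT DISTINCT Name, Age, City FROM People"))
grouped_rows = set(
    conn.execute("SELECT Name, Age, City FROM People GROUP BY Name, Age, City")
)
# Adding Color makes it part of the uniqueness test.
with_color = conn.execute(
    "SELECT DISTINCT Name, Age, City, Color FROM People"
).fetchall()

print(distinct_rows == grouped_rows)
print(len(distinct_rows), len(with_color))
```

The script prints True, then 5 and 6. DISTINCT alone still cannot say which ID represents each combination. That still needs GROUP BY with MAX, or a window function.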

Related

How can I sort a table by two columns in sisense?

I use Sisense Version: 20.21.6.10054 on Windows.
I need to sort a table widget in Sisense by two columns: first by name, and second by the number of the behavior that person demonstrates.
The result should look like this:
id first_name last_name behavior_NO behavior_link
1 Ben Smith 1 behavior_1
1 Ben Smith 2 behavior_2
1 Ben Smith 3 behavior_3
2 Sam Johns 1 behavior_1
2 Sam Johns 2 behavior_2
3 Martha Star 1 behavior_1
3 Martha Star 2 behavior_2
3 Martha Star 3 behavior_3
3 Martha Star 4 behavior_4
Now, when I sort by last_name, behavior_NO is not sorted in the correct order; instead it looks like this:
id first_name last_name behavior_NO behavior_link
1 Ben Smith 1 behavior_1
1 Ben Smith 3 behavior_3
1 Ben Smith 2 behavior_2
2 Sam Johns 2 behavior_2
2 Sam Johns 1 behavior_1
3 Martha Star 4 behavior_4
3 Martha Star 2 behavior_2
3 Martha Star 1 behavior_1
3 Martha Star 3 behavior_3
Sisense does not allow sorting a table by two columns.
I tried using a pivot instead, but one column contains hyperlinks, and in a pivot the hyperlinks are displayed as text (<a href="https://https://stackoverflow.com/ ) rather than as links.
Can anyone advise on how to solve this, either by sorting the table by two columns or by inserting a hyperlink in a pivot?
Thanks in advance.
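For comparison only (this is not Sisense syntax), the two-level sort the question asks for is a two-key ORDER BY in plain SQL. The desired output above is grouped by id rather than alphabetically by last name, since Johns would sort before Smith. So this sketch orders by id, then behavior_NO. It uses Python's sqlite3 as a stand-in engine, and the table name behaviors is hypothetical.

```python
import sqlite3

# Rows in the scrambled order shown in the question: (id, first, last, behavior_NO).
SCRAMBLED = [
    (1, "Ben", "Smith", 1), (1, "Ben", "Smith", 3), (1, "Ben", "Smith", 2),
    (2, "Sam", "Johns", 2), (2, "Sam", "Johns", 1),
    (3, "Martha", "Star", 4), (3, "Martha", "Star", 2),
    (3, "Martha", "Star", 1), (3, "Martha", "Star", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE behaviors (id INTEGER, first_name TEXT, last_name TEXT, "
    "behavior_NO INTEGER, behavior_link TEXT)"
)
conn.executemany(
    "INSERT INTO behaviors VALUES (?, ?, ?, ?, ?)",
    [(i, first, last, n, f"behavior_{n}") for (i, first, last, n) in SCRAMBLED],
)

# First key groups each person together; second key orders behaviors within a person.
sorted_rows = conn.execute(
    "SELECT id, first_name, last_name, behavior_NO, behavior_link "
    "FROM behaviors ORDER BY id, behavior_NO"
).fetchall()

for row in sorted_rows:
    print(row)
```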
Maybe you have already found a better way than the following, but yesterday I had a request to do a rank while also ordering by three columns. I needed to order first by Target, then by Rank, and then by Sales, so that the pivot table looks like this:
Sales_Person | Target | Sales | Rank
Joe | 100% | 12 | 1
Chris | 100% | 12 | 1
Maria | 98% | 11 | 2
Peter | 97% | 10 | 3
Because the Sisense front end does not allow sorting by two or more columns, I used a built-in function called "ORDERING".
You will find the function under "Other Functions" in the following reference:
Function References Sisense
The only disadvantage is that implementing this function creates an additional column for the ordering, so in the end I obtained the following results:
Sales_Person | Target | Sales | Rank | Ordering
Joe | 100% | 12 | 1 | 0
Chris | 100% | 12 | 1 | 1
Maria | 98% | 11 | 2 | 2
Peter | 97% | 10 | 3 | 3
Also, keep in mind that all the different columns should be dimensions.
By the way, the version I have is Sisense L2022.4.0.222
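Again for comparison only, and not Sisense syntax: the Rank and Ordering columns above can be approximated in plain SQL with window functions. This sketch uses Python's sqlite3 and stores Target as an integer percentage. It assumes Rank is a dense rank over Target, then Sales, which matches the numbers shown. It breaks the Joe/Chris tie by name, so the Ordering values for those two rows come out swapped relative to the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Sales_Person TEXT, Target INTEGER, Sales INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Joe", 100, 12), ("Chris", 100, 12), ("Maria", 98, 11), ("Peter", 97, 10)],
)

query = """
    SELECT Sales_Person, Target, Sales,
           -- ties share a rank and the next rank is not skipped
           DENSE_RANK() OVER (ORDER BY Target DESC, Sales DESC) AS "Rank",
           -- zero-based position, with the name as a deterministic tie-breaker
           ROW_NUMBER() OVER (ORDER BY Target DESC, Sales DESC, Sales_Person) - 1
               AS "Ordering"
    FROM sales
    ORDER BY "Ordering"
"""
ranked = conn.execute(query).fetchall()

for row in ranked:
    print(row)
```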

How to run a GROUP BY in AWS CloudWatch Logs Insights

I have CloudWatch Logs (CWL) entries like the ones below, shown as an SQL-style table for clarity:
Name City
1 Chicago
2 Wuhan
3 Chicago
4 Wuhan
5 Los Angeles
Now I want to get below output
City Count
Chicago 2
Wuhan 2
Los Angeles 1
Is there a way I can run a GROUP BY in CWL Insights?
Pseudo Query
Select Count(*), City From {TableName} GROUP BY City
You can use the aggregation function count with the by statement: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html
Here is a full example for your case, assuming the logs contain the entries exactly as in your example (the regex for the city name is very simple; you may want to refine it).
fields @timestamp, @message
| parse @message /^(?<number>\d+)\s+(?<city>[a-zA-Z\s]+)$/
| filter ispresent(city)
| stats count(*) by city
Result:
---------------------------
| city | count(*) |
|--------------|----------|
| Chicago | 2 |
| Wuhan | 2 |
| Los Angeles | 1 |
---------------------------
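To sanity-check the regex before running it in Logs Insights, the parse, filter and count pipeline can be mimicked locally. This is a sketch, not the Logs Insights engine. It assumes each log message is a single line such as "1 Chicago". Python's re module spells named groups (?P<name>...) instead of (?<name>...), and the helper name count_by_city is hypothetical.

```python
import re
from collections import Counter

# Same pattern as the parse step, rewritten with Python's named-group syntax.
LINE_RE = re.compile(r"^(?P<number>\d+)\s+(?P<city>[a-zA-Z\s]+)$")


def count_by_city(messages):
    """Mimic `parse ... | filter ispresent(city) | stats count(*) by city`."""
    counts = Counter()
    for message in messages:
        match = LINE_RE.match(message)
        if match:  # lines the regex cannot parse have no city, so skip them
            counts[match.group("city")] += 1
    return counts


messages = ["1 Chicago", "2 Wuhan", "3 Chicago", "4 Wuhan", "5 Los Angeles",
            "unrelated log line"]
print(count_by_city(messages))
```

The unrelated line does not start with digits, so it is dropped, just as filter ispresent(city) drops it in the query.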

Recursive SQL that gets the first instance of a value up a hierarchy

I have to do this in SQL.
I have a table called 'locations'. It contains a list of locations ranging from houses, to streets, to cities all the way up to continents.
locationId | name | desiredValue
1 | Wimbledon |
2 | Peckham |
3 | London |
4 | UK |
5 | France | 123
6 | Europe | 456
7 | Australia |
8 | Paris |
I have a second table called 'links', which contains links between locations and the type of relation:
Location1 | Location2 | Linktype
3 | 1 | 5
3 | 2 | 5
4 | 3 | 5
6 | 4 | 5
5 | 8 | 5
Linktype 5 indicates that Location2 is situated 'in' Location1. In the example above, locationId 1 (Wimbledon) is located 'in' locationId 3 (London). LocationId 3 (London) is located 'in' locationId 4 (UK), and so on.
The linktype just describes this 'in' relationship. The links table contains other relations as well, which are not pertinent to this question; I just mention it in case it needs to be in a WHERE clause.
For a given location, I want to get the first instance in its location hierarchy that has a 'desiredValue'.
For example:
If I was interested in Peckham, I'd like to see that Peckham has no value, that London has no value, that UK has no value, but that Europe does (456).
If I was interested in London, I'd see that it has no value, nor does the UK, but that Europe does (456)
If I was interested in Europe, I'd see that it has a value (456)
If I was interested in Paris, I'd see that it has no value, but France does (123)
I know I should probably be using recursive CTEs for this, but I'm stumped. Any help would be gratefully received!
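A recursive CTE can do this. The anchor member starts at the requested location with depth 0. The recursive member follows Linktype 5 links from Location2 up to Location1. The outer query keeps the shallowest ancestor whose desiredValue is not NULL. This is a minimal sketch that runs the SQL through Python's sqlite3. It assumes the blank desiredValue cells are NULL and that the 'in' hierarchy has no cycles, because a cycle would keep the recursion from terminating. The function name first_desired_value is hypothetical. In SQL Server you would drop the RECURSIVE keyword and use TOP 1 instead of LIMIT 1.

```python
import sqlite3


def first_desired_value(conn, location_id):
    """Return (locationId, name, desiredValue) of the nearest location, starting
    with location_id itself, that has a desiredValue; None if there is none."""
    query = """
        WITH RECURSIVE ancestors(locationId, depth) AS (
            SELECT ?, 0                       -- start at the requested location
            UNION ALL
            SELECT l.Location1, a.depth + 1   -- step to the containing location
            FROM links AS l
            JOIN ancestors AS a ON l.Location2 = a.locationId
            WHERE l.Linktype = 5
        )
        SELECT loc.locationId, loc.name, loc.desiredValue
        FROM ancestors AS a
        JOIN locations AS loc ON loc.locationId = a.locationId
        WHERE loc.desiredValue IS NOT NULL
        ORDER BY a.depth
        LIMIT 1
    """
    return conn.execute(query, (location_id,)).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations (locationId INTEGER PRIMARY KEY, name TEXT, "
    "desiredValue INTEGER)"
)
conn.executemany("INSERT INTO locations VALUES (?, ?, ?)", [
    (1, "Wimbledon", None), (2, "Peckham", None), (3, "London", None),
    (4, "UK", None), (5, "France", 123), (6, "Europe", 456),
    (7, "Australia", None), (8, "Paris", None),
])
conn.execute("CREATE TABLE links (Location1 INTEGER, Location2 INTEGER, Linktype INTEGER)")
conn.executemany("INSERT INTO links VALUES (?, ?, ?)", [
    (3, 1, 5), (3, 2, 5), (4, 3, 5), (6, 4, 5), (5, 8, 5),
])

print(first_desired_value(conn, 2))  # Peckham
print(first_desired_value(conn, 8))  # Paris
```

To see the whole chain, including the levels without a value, drop the WHERE and LIMIT clauses and select a.depth alongside the location columns.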

PostgreSQL: conditional statement to refer to other columns

I currently have a dataset that looks like this:
Personid | Question | Response
1 | Name | Daniel
1 | Gender | Male
1 | Address | New York, NY
2 | Name | Susan
2 | Gender | Female
2 | Address | Boston, MA
3 | Name | Leonard
3 | Gender | Male
3 | Address | New York, NY
I also have another table that looks like this (just the person id):
Personid
1
1
1
2
2
2
3
3
3
I want to write a query to return something like this:
Personid | Name | Gender | Address
1 | Daniel | Male | New York, NY
2 | Susan | Female | Boston, MA
3 | Leonard | Male | New York, NY
I think it's a mix of some sort of "transpose" (not sure if that's even available in SQL) and a conditional statement on just the gender, but I'm having trouble getting the end result. Could anyone offer any advice?
The easiest way is just to join to the question table three times with different aliases.
select
p.personid,
n.response as name,
g.response as gender,
a.response as address
from
person p
join question n
on n.personid = p.personid and n.question = 'Name'
join question g
on g.personid = p.personid and g.question = 'Gender'
join question a
on a.personid = p.personid and a.question = 'Address'
I'm assuming that your person table only has 3 rows, not the 9 you've listed. If there are really 9, then just do a SELECT DISTINCT.
This is a textbook example of a pivot table. In PostgreSQL it is implemented by the CROSSTAB function, which is available from the tablefunc extension module.
If your need is really as simple as the provided MCVE, multiple JOINs might be enough, but in more complicated situations CROSSTAB is really the way to go, and worth the pain of installing an additional module if it is not installed by default by your distro. In short, if your initial table is called dataset and personid is an INT:
-- To execute as superuser. Be sure you have installed the extension
-- package. Execute once to install; it will stay in your database
-- from then on.
CREATE EXTENSION TABLEFUNC;
-- As normal user. The two-argument form pins the column order
-- (Name, Gender, Address) instead of relying on row order.
SELECT * FROM CROSSTAB(
  $$SELECT personid, question, response FROM dataset ORDER BY 1$$,
  $$VALUES ('Name'), ('Gender'), ('Address')$$
) AS ct(person INT, name TEXT, gender TEXT, address TEXT);
person | name | gender | address
--------+----------+---------+---------------
1 | Daniel | Male | New York, NY
2 | Susan | Female | Boston, MA
3 | Leonard | Male | New York, NY
(3 rows)
You can add WHERE clauses, JOIN with other tables, etc., according to your needs.
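A third option, not taken from either answer above, is conditional aggregation. It uses one GROUP BY personid with a MAX(CASE ...) per question. It needs neither an extension nor three joins, and the same SELECT is valid in PostgreSQL. This sketch runs it through Python's sqlite3 as a stand-in engine. It reuses the table name dataset from the CROSSTAB answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (personid INTEGER, question TEXT, response TEXT)")
conn.executemany("INSERT INTO dataset VALUES (?, ?, ?)", [
    (1, "Name", "Daniel"), (1, "Gender", "Male"), (1, "Address", "New York, NY"),
    (2, "Name", "Susan"), (2, "Gender", "Female"), (2, "Address", "Boston, MA"),
    (3, "Name", "Leonard"), (3, "Gender", "Male"), (3, "Address", "New York, NY"),
])

# Each CASE is NULL except on the matching question row; MAX ignores NULLs,
# so it picks out that single response per person.
PIVOT_QUERY = """
    SELECT personid,
           MAX(CASE WHEN question = 'Name' THEN response END) AS name,
           MAX(CASE WHEN question = 'Gender' THEN response END) AS gender,
           MAX(CASE WHEN question = 'Address' THEN response END) AS address
    FROM dataset
    GROUP BY personid
    ORDER BY personid
"""
pivoted = conn.execute(PIVOT_QUERY).fetchall()

for row in pivoted:
    print(row)
```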

SQL select / join performance (PHP, PDO)

What is the most efficient way to get data from two tables set up in the following way:
Table 1:
ID(PK) | Name | Age
--------------------------
1 | Jim | 44
2 | Jane | 35
3 | John | 22
Table 2:
Name(PK) | Pet(PK)
-----------------
Jim | Cat
Jim | Dog
Jane | Fish
There is a foreign key constraint on "Name" in Table 2 referencing Table 1.
Results
I want the age and all the pets for a specific person.
Name | Age | Pet
---------------------
Jim | 44 | Cat
Jim | 44 | Dog
As I see it these are my options:
1) Left join table 2 on Name and end up with redundant data in my resulting array for Name and Age (as above).
2) Use a function that turns the pets into a comma separated list.
3) Use 2 separate selects.
My question is about the performance of the 3 options above. I don't specifically need SQL (unless you want to suggest another method).
Thanks!
select
tb01.name, tb01.age, tb02.pet
from
table01 tb01
left join table02 tb02 on tb02.name = tb01.name
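The first two options from the question are sketched below with Python's sqlite3. The question uses PHP/PDO with an unspecified engine. MySQL calls the aggregate GROUP_CONCAT, and PostgreSQL uses string_agg. The ? placeholder plays the role of a PDO prepared-statement parameter, and the WHERE clause restricts the result to one specific person. This shows only the shape of each result, not which is faster. That depends on the engine and data volume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table01 (id INTEGER PRIMARY KEY, name TEXT UNIQUE, age INTEGER)")
conn.execute(
    "CREATE TABLE table02 (name TEXT REFERENCES table01(name), pet TEXT, "
    "PRIMARY KEY (name, pet))"
)
conn.executemany("INSERT INTO table01 VALUES (?, ?, ?)",
                 [(1, "Jim", 44), (2, "Jane", 35), (3, "John", 22)])
conn.executemany("INSERT INTO table02 VALUES (?, ?)",
                 [("Jim", "Cat"), ("Jim", "Dog"), ("Jane", "Fish")])

# Option 1: one row per pet; name and age repeat on every row.
option1 = conn.execute(
    "SELECT tb01.name, tb01.age, tb02.pet FROM table01 tb01 "
    "LEFT JOIN table02 tb02 ON tb02.name = tb01.name "
    "WHERE tb01.name = ? ORDER BY tb02.pet",
    ("Jim",),
).fetchall()

# Option 2: one row per person; pets collapsed into a comma-separated string.
option2 = conn.execute(
    "SELECT tb01.name, tb01.age, group_concat(tb02.pet) FROM table01 tb01 "
    "LEFT JOIN table02 tb02 ON tb02.name = tb01.name "
    "WHERE tb01.name = ? GROUP BY tb01.id, tb01.name, tb01.age",
    ("Jim",),
).fetchone()

print(option1)
print(option2)
```

Option 1 returns one row per pet. Option 2 returns a single row whose pet list still has to be split in PHP. The order of names inside that string is not guaranteed unless the engine supports ordering within the aggregate.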