How to Group By 2 fields in SQL Query?

I have two tables in PostgreSQL and I'm trying to count how many times each hashtag is repeated per place.
I've written this query:
SELECT tweets_with_location.user_location,
tweets_with_location.my_new_id,
all_hashtags_with_location.regexp_split_to_table
FROM tweets_with_location, all_hashtags_with_location
WHERE tweets_with_location.my_new_id = all_hashtags_with_location.my_new_id;
Which returns the Location, the tweet id and the hashtag:
USER_LOCATION | MY_NEW_ID | HASHTAG
New York, NY | 33 | Happy
New York, NY | 40 | BigApple
Bronx, NY | 12 | Happy
Bronx, NY | 45 | Happy
Queens, NY | 23 | Trump
Queens, NY | 20 | Trump
Then I wrote another SQL query, but it doesn't sum up the number of times a hashtag appears per place; the count is always 1:
SELECT tweets_with_location.user_location,
all_hashtags_with_location.regexp_split_to_table,
COUNT(DISTINCT all_hashtags_with_location.regexp_split_to_table) AS CountOf
FROM tweets_with_location, all_hashtags_with_location
WHERE tweets_with_location.my_new_id = all_hashtags_with_location.my_new_id
GROUP BY tweets_with_location.user_location,
all_hashtags_with_location.regexp_split_to_table
ORDER BY CountOf DESC;
What I need is this result:
USER_LOCATION | HASHTAG | COUNT
New York, NY | Happy | 1
Bronx, NY | Happy | 2
Queens, NY | Trump | 2
New York, NY | BigApple | 1
How do I do this? What is wrong with my SQL Query?

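Just remove the DISTINCT qualifier in the COUNT() function. Every row in a (location, hashtag) group carries the same hashtag, so COUNT(DISTINCT hashtag) can only ever return 1.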

You were really close; you are just counting the wrong field:
SELECT tweets_with_location.user_location,
all_hashtags_with_location.regexp_split_to_table,
COUNT(DISTINCT tweets_with_location.my_new_id) AS CountOf
FROM tweets_with_location, all_hashtags_with_location
WHERE tweets_with_location.my_new_id = all_hashtags_with_location.my_new_id
GROUP BY tweets_with_location.user_location,
all_hashtags_with_location.regexp_split_to_table
ORDER BY CountOf DESC;
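The difference between the two counts can be checked on the sample rows from the question. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example runs anywhere); the aliases t, h, distinct_hashtags and count_of are illustrative names, not part of the original schema.

```python
import sqlite3

# In-memory SQLite stand-in for the two PostgreSQL tables (sample rows from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweets_with_location (my_new_id INTEGER, user_location TEXT);
CREATE TABLE all_hashtags_with_location (my_new_id INTEGER, regexp_split_to_table TEXT);
INSERT INTO tweets_with_location VALUES
  (33, 'New York, NY'), (40, 'New York, NY'), (12, 'Bronx, NY'),
  (45, 'Bronx, NY'), (23, 'Queens, NY'), (20, 'Queens, NY');
INSERT INTO all_hashtags_with_location VALUES
  (33, 'Happy'), (40, 'BigApple'), (12, 'Happy'),
  (45, 'Happy'), (23, 'Trump'), (20, 'Trump');
""")

# distinct_hashtags reproduces the original mistake; count_of is the intended count.
QUERY = """
SELECT t.user_location,
       h.regexp_split_to_table AS hashtag,
       COUNT(DISTINCT h.regexp_split_to_table) AS distinct_hashtags,
       COUNT(*) AS count_of
FROM tweets_with_location AS t
JOIN all_hashtags_with_location AS h ON t.my_new_id = h.my_new_id
GROUP BY t.user_location, h.regexp_split_to_table
ORDER BY count_of DESC, t.user_location, hashtag
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The third column is 1 in every row, which is the behaviour described in the question, while the fourth column holds the per-place totals.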

How to run a group by in AWS Cloud Watch Logs Insights

I have CloudWatch Logs (CWL) entries like the ones below, shown as a SQL-style table for clarity:
Name City
1 Chicago
2 Wuhan
3 Chicago
4 Wuhan
5 Los Angeles
Now I want to get the output below:
City Count
Chicago 2
Wuhan 2
Los Angeles 1
Is there a way to run a GROUP BY in CWL Insights?
Pseudo Query
Select Count(*), City From {TableName} GROUP BY City
You can use the aggregation function count with the by statement: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html
Here is a full example for your case, assuming the logs contain the entries exactly as you have in the example (regex for city name is very simple, you may want to refine that).
fields @timestamp, @message
| parse @message /^(?<number>\d+)\s+(?<city>[a-zA-Z\s]+)$/
| filter ispresent(city)
| stats count(*) by city
Result:
---------------------------
| city | count(*) |
|--------------|----------|
| Chicago | 2 |
| Wuhan | 2 |
| Los Angeles | 1 |
---------------------------
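To see what the parse, filter and stats steps do to each message, the pipeline can be imitated locally. This is only a rough Python emulation under the assumption that each log message looks exactly like the sample lines; it is not how Logs Insights runs the query, and the variable names are made up for illustration.

```python
import re
from collections import Counter

# Sample log messages, copied from the question.
messages = ["1 Chicago", "2 Wuhan", "3 Chicago", "4 Wuhan", "5 Los Angeles"]

# Same pattern as the parse step; Python spells named groups (?P<name>...).
pattern = re.compile(r"^(?P<number>\d+)\s+(?P<city>[a-zA-Z\s]+)$")

counts = Counter()
for message in messages:
    match = pattern.match(message)
    if match:  # plays the role of: filter ispresent(city)
        counts[match.group("city")] += 1  # plays the role of: stats count(*) by city

for city, count in counts.items():
    print(city, count)
```

Messages that do not match the pattern are skipped, which mirrors the filter step dropping entries where city was not extracted.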

Postgresql - conditional statement to refer to other columns

I currently have a dataset that looks like this:
Personid | Question | Response
1 | Name | Daniel
1 | Gender | Male
1 | Address | New York, NY
2 | Name | Susan
2 | Gender | Female
2 | Address | Boston, MA
3 | Name | Leonard
3 | Gender | Male
3 | Address | New York, NY
I also have another table that looks like this (just the person id):
Personid
1
1
1
2
2
2
3
3
3
I want to write a query to return something like this:
Personid | Name | Gender | Address
1 |Daniel | Male | New York, NY
2 | Susan | Female | Boston, MA
3 |Leonard | Male | New York, NY
I think it's a mix of some sort of "transpose" (not sure if it's even available in SQL) and conditional statement on just the gender, but I'm having issues with getting the end result. Could anyone offer any advice?
The easiest way is to join to the question table three times, each time with a different alias.
select
p.personid,
n.response as name,
g.response as gender,
a.response as address
from
person p
join question n
on n.personid = p.personid and n.question = 'Name'
join question g
on g.personid = p.personid and g.question = 'Gender'
join question a
on a.personid = p.personid and a.question = 'Address'
I'm assuming that your person table only has 3 rows, not the 9 you've listed. If there really are 9, just use select distinct.
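The triple join can be tried out on the sample data. The sketch below uses Python's sqlite3 module rather than PostgreSQL (an assumption for portability), keeps the table names person and question from the answer, and loads the person table with the nine rows from the question to show the effect of DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (personid INTEGER, question TEXT, response TEXT);
CREATE TABLE person (personid INTEGER);
INSERT INTO question VALUES
  (1, 'Name', 'Daniel'),  (1, 'Gender', 'Male'),   (1, 'Address', 'New York, NY'),
  (2, 'Name', 'Susan'),   (2, 'Gender', 'Female'), (2, 'Address', 'Boston, MA'),
  (3, 'Name', 'Leonard'), (3, 'Gender', 'Male'),   (3, 'Address', 'New York, NY');
-- Nine rows, as listed in the question, to show why DISTINCT is needed.
INSERT INTO person VALUES (1), (1), (1), (2), (2), (2), (3), (3), (3);
""")

# One join per output column; each alias picks out a single question type.
PIVOT = """
SELECT DISTINCT p.personid,
       n.response AS name,
       g.response AS gender,
       a.response AS address
FROM person AS p
JOIN question AS n ON n.personid = p.personid AND n.question = 'Name'
JOIN question AS g ON g.personid = p.personid AND g.question = 'Gender'
JOIN question AS a ON a.personid = p.personid AND a.question = 'Address'
ORDER BY p.personid
"""
pivoted = conn.execute(PIVOT).fetchall()
for row in pivoted:
    print(row)
```

Because these are inner joins, a person who is missing any one of the three answers would drop out of the result entirely.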
This is a textbook example of a pivot table. In PostgreSQL it is implemented by the CROSSTAB function, which is available from the additional extension module TABLEFUNC.
If your need is really as simple as the provided MCVE, multiple JOINs might be enough, but in more complicated situations CROSSTAB is really the way to go, and worth the pain of installing an additional module if your distro does not install it by default. In short, if your initial table is called dataset, and personid is an INT:
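-- Run as a superuser (or a user allowed to create extensions) once the
-- extension package is installed. Execute it once per database; the
-- extension stays installed from then on.
CREATE EXTENSION IF NOT EXISTS tablefunc;
-- As a normal user. The second argument lists the categories in output-column
-- order, so each response lands in the column that matches its question.
SELECT * FROM CROSSTAB(
$$SELECT personid, question, response FROM dataset ORDER BY 1$$,
$$VALUES ('Name'), ('Gender'), ('Address')$$
) AS ct(person INT, name TEXT, gender TEXT, address TEXT);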
person | name | gender | address
--------+----------+---------+---------------
1 | Daniel | Male | New York, NY
2 | Susan | Female | Boston, MA
3 | Leonard | Male | New York, NY
(3 rows)
You can add WHERE clauses, JOIN with other tables, etc., according to your needs.
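Where the tablefunc extension cannot be installed, a pivot can also be written with conditional aggregation. Note that this is a different technique from CROSSTAB, shown here as a hedged alternative and run in SQLite through Python's sqlite3 module (an assumption, since SQLite has no crosstab function). The table name dataset follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (personid INTEGER, question TEXT, response TEXT);
INSERT INTO dataset VALUES
  (1, 'Name', 'Daniel'),  (1, 'Gender', 'Male'),   (1, 'Address', 'New York, NY'),
  (2, 'Name', 'Susan'),   (2, 'Gender', 'Female'), (2, 'Address', 'Boston, MA'),
  (3, 'Name', 'Leonard'), (3, 'Gender', 'Male'),   (3, 'Address', 'New York, NY');
""")

# Each CASE keeps the response for one question type and yields NULL otherwise;
# MAX then collapses the group to the single non-NULL value.
CONDITIONAL_PIVOT = """
SELECT personid AS person,
       MAX(CASE WHEN question = 'Name'    THEN response END) AS name,
       MAX(CASE WHEN question = 'Gender'  THEN response END) AS gender,
       MAX(CASE WHEN question = 'Address' THEN response END) AS address
FROM dataset
GROUP BY personid
ORDER BY personid
"""
people = conn.execute(CONDITIONAL_PIVOT).fetchall()
for row in people:
    print(row)
```

Unlike the inner-join version, a person with a missing answer still appears here, with NULL in the corresponding column.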

Access SQL - Return unique combinations in field

I have a table with data stored vertically, I have shown a simplified example below which has a record for each city a customer has lived in:
| CUSTOMER | CITY |
------------------------------
| John | London |
| John | Manchester |
| Sarah | Cardiff |
| Sarah | Edinburgh |
| Sarah | Liverpool |
| Craig | Manchester |
| Craig | London |
I am trying to come up with an SQL query that will return all unique combinations of cities so in the example above, John and Craig have both lived in London and Manchester but Sarah has lived in different cities (Cardiff, Edinburgh and Liverpool) so I would like an output as below (which can handle any amount of cities)
| CITY1 | CITY2 | CITY3 |
--------------------------------------------
| London | Manchester | |
| Cardiff | Edinburgh | Liverpool |
I have tried using a crosstab query to view the data horizontally like this:
TRANSFORM Max(City)
SELECT Customer
FROM tblCities
GROUP BY Customer
PIVOT City
but it just returns a field for every city for each customer. Does anyone know if this is possible using SQL?
P.S. Ideally it should ignore the order of the cities.
This was a nice challenge! The query below gets the groupings per customer. It doesn't discard the duplicates where multiple customers have lived in the same combination of cities ... I'll let you or others find a way to handle that.
TRANSFORM Min(OrderedList.City) AS MinOfCity
SELECT OrderedList.Customer
FROM (SELECT CustomerCities.Customer, CustomerCities.City, Count(1) AS CityNo
FROM CustomerCities INNER JOIN CustomerCities AS CustomerCities_1 ON CustomerCities.Customer = CustomerCities_1.Customer
WHERE (((CustomerCities.City)>=[CustomerCities_1].[City]))
GROUP BY CustomerCities.Customer, CustomerCities.City) OrderedList
GROUP BY OrderedList.Customer
PIVOT "CITY" & [CityNo];
Is this what you want?
select distinct c1.city, c2.city
from tblCities as c1 inner join
tblCities as c2
on c1.customer = c2.customer and c1.city < c2.city;
This returns all pairs of cities that appear for any single customer.
Here is a query which might work assuming each customer is only associated with two cities:
SELECT DISTINCT t.city_1, t.city_2
FROM
(
SELECT MIN(CITY) AS city_1, MAX(CITY) AS city_2
FROM tblCities
GROUP BY CUSTOMER
) t
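None of the queries above fully covers the original requirement (any number of cities, duplicates removed, order ignored). As a point of comparison, here is a minimal sketch of that logic outside Access, in plain Python, assuming the rows have already been fetched as (customer, city) pairs; the variable names are illustrative.

```python
from collections import defaultdict

# (customer, city) rows, copied from the question.
rows = [
    ("John", "London"), ("John", "Manchester"),
    ("Sarah", "Cardiff"), ("Sarah", "Edinburgh"), ("Sarah", "Liverpool"),
    ("Craig", "Manchester"), ("Craig", "London"),
]

# Collect the set of cities each customer has lived in.
cities_by_customer = defaultdict(set)
for customer, city in rows:
    cities_by_customer[customer].add(city)

# Sorting each customer's cities makes the combination order-independent;
# the outer set then drops combinations shared by several customers.
combinations = sorted({tuple(sorted(cities)) for cities in cities_by_customer.values()})

for combination in combinations:
    print(" | ".join(combination))
```

John and Craig collapse into a single London/Manchester combination even though their rows list the cities in a different order.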

Counting fields in a group by, and generating a report with MS Access

So I have this table
City | Status | District | Revenue
------------------------------------------
Oakland | Executed | North | $9.50
Los Angeles| Cancelled| South | $0.05
Oakland | Executed | North | $0.99
Oakland | Cancelled| North | $98.40
Sacramento | Executed | North | $43.50
Sacramento | Cancelled| North | $5.40
Los Angeles| Cancelled| South | $5.30
So I need this report that reads like this:
North District | Executed | Cancelled | Revenue
--------------------------------------------------------
Oakland | 2 | 1 | Sum of revenue
Sacramento | 1 | 1 | Sum of revenue
--------------------------------------------------------
South District | Executed | Cancelled | Revenue
--------------------------------------------------------
Los Angeles | 0 | 2 | Sum of revenue
But I'm stuck on how to create a query that groups rows and counts instances of specific values inside each group.
I know the syntax of GROUP BY and COUNT, but counting only the rows with a specific value inside a group seems quite different from a regular count.
Can anyone guide me in the right direction? I'm not asking anyone to do my work (this isn't even a full sample of what I have to do) but if someone can help me with a statement that groups and counts specific rows in the group, with a SQL statement or an Access function, that would be awesome. From there I'd be able to figure out everything else.
I actually ran across an answer: I just had to use Sum(IIf()), which counts only the rows that match a condition.
SELECT
Test.City,
Sum(IIf(Status="Cancelled",1,0)) AS Cancelled
FROM Test
GROUP BY Test.City
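The same conditional-sum idea extends to the whole report. The sketch below runs in SQLite through Python's sqlite3 module, so it uses CASE WHEN in place of Access's IIf (an assumption: the two are used here as equivalents for this simple condition). Revenue is stored as a plain number without the dollar sign.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Test (City TEXT, Status TEXT, District TEXT, Revenue REAL);
INSERT INTO Test VALUES
  ('Oakland',     'Executed',  'North', 9.50),
  ('Los Angeles', 'Cancelled', 'South', 0.05),
  ('Oakland',     'Executed',  'North', 0.99),
  ('Oakland',     'Cancelled', 'North', 98.40),
  ('Sacramento',  'Executed',  'North', 43.50),
  ('Sacramento',  'Cancelled', 'North', 5.40),
  ('Los Angeles', 'Cancelled', 'South', 5.30);
""")

# Each SUM adds 1 for rows matching the status and 0 otherwise,
# so it counts matching rows within the (District, City) group.
REPORT = """
SELECT District, City,
       SUM(CASE WHEN Status = 'Executed'  THEN 1 ELSE 0 END) AS Executed,
       SUM(CASE WHEN Status = 'Cancelled' THEN 1 ELSE 0 END) AS Cancelled,
       ROUND(SUM(Revenue), 2) AS Revenue
FROM Test
GROUP BY District, City
ORDER BY District, City
"""
report = conn.execute(REPORT).fetchall()
for row in report:
    print(row)
```

Grouping by District as well as City gives the rows needed for the per-district sections of the report; the section headers themselves would still be produced by the report layout.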

SQL Efficient way to locate an example of each combination

Is there a more efficient way of querying a table (or collection of tables) for all possible combinations of a few columns? I'm currently running GROUP BY and then MAX, but this doesn't seem to be the most efficient way.
SQL Fiddle for the below example: http://sqlfiddle.com/#!2/25f8b/3
Example Table
ID | Name | Age | City | Color
--------------------------------
1 | Dave | 10 | London | Red
2 | Dave | 11 | London | Purple
3 | Dave | 10 | Paris | Orange
4 | Jim | 10 | London | Red
5 | Jim | 10 | London | Green
6 | Jim | 11 | London | Lazer
etc... (around 500,000 rows)
Currently doing:
SELECT max(ID), Name, Age, City, Color
from People
group by Name, Age, City
To produce:
MAX(ID) NAME AGE CITY COLOR
1 Dave 10 London Red
3 Dave 10 Paris Orange
2 Dave 11 London Purple
5 Jim 10 London Red
6 Jim 11 London Lazer
Note that 4 is missing, as it's an exact duplicate of 5.
3 is included, as it has a different city from 1, even with the same age/name.
However, on this massive database it currently takes around ten minutes to return the results (note it's actually a join of a few tables).
Is there a more efficient way to return the same results? I was imagining a mass collection of SELECT * WHERE name = %, age = % and city = % LIMIT 1 or something similar
To get the different combinations there is a reserved word, DISTINCT:
SELECT DISTINCT Name, Age, City
FROM People
This gives the same result as:
SELECT Name, Age, City
FROM People
GROUP BY Name, Age, City
However, it is limited:
If you add a column (like Color), it is included in the combination analysis.
You can't pair it with a per-combination aggregate function, like MAX(ID).
I don't know if it's any better performance-wise.
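The equivalence and the limitation can both be checked on the six sample rows. This is a small sketch using Python's sqlite3 module rather than the MySQL instance from the fiddle (an assumption); it says nothing about performance on the real 500,000-row data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, City TEXT, Color TEXT);
INSERT INTO People VALUES
  (1, 'Dave', 10, 'London', 'Red'),
  (2, 'Dave', 11, 'London', 'Purple'),
  (3, 'Dave', 10, 'Paris',  'Orange'),
  (4, 'Jim',  10, 'London', 'Red'),
  (5, 'Jim',  10, 'London', 'Green'),
  (6, 'Jim',  11, 'London', 'Lazer');
""")

ORDER = " ORDER BY Name, Age, City"

# DISTINCT and GROUP BY over the same columns return the same combinations.
distinct_rows = conn.execute(
    "SELECT DISTINCT Name, Age, City FROM People" + ORDER).fetchall()
grouped_rows = conn.execute(
    "SELECT Name, Age, City FROM People GROUP BY Name, Age, City" + ORDER).fetchall()

# Only the GROUP BY form can also return an aggregate per combination.
with_max_id = conn.execute(
    "SELECT MAX(ID), Name, Age, City FROM People GROUP BY Name, Age, City" + ORDER
).fetchall()

print(distinct_rows == grouped_rows)
for row in with_max_id:
    print(row)
```

If an example row ID per combination is needed, as in the original query, the GROUP BY form with MAX(ID) is the one to keep.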