compare two columns in PostgreSQL show only highest value - sql

This is my table.
I'm trying to find which urban area has the highest girls-to-boys ratio.
Thank you in advance for your help.
| urban | allgirls | allboys |
| :---- | :------: | :-----: |
| Ran | 100 | 120 |
| Ran | 110 | 105 |
| dhanr | 80 | 73 |
| dhanr | 140 | 80 |
| mohan | 180 | 73 |
| mohan | 25 | 26 |
This is the query I used, but I did not get the expected results
SELECT urban, Max(allboys) as high_girls,Max(allgirls) as high_boys
from table_urban group by urban
Expected results
| urban | allgirls | allboys |
| :---- | :------: | :-----: |
| dhanr | 220 | 153 |

First of all, your expected result doesn't seem correct: the girls-to-boys ratio is highest in "mohan" (205/99), not in "dhanr" (220/153). That assumes you are really looking for the highest ratio and not the highest number of girls.
You need to group first and sum each column, then compute the ratio (divide one sum by the other), sort by it, and take the first row. If the columns are integers, cast one operand to numeric, otherwise PostgreSQL performs integer division.
select foo.urban as urban, foo.girls::numeric / foo.boys as ratio from (
SELECT urban, SUM(allboys) as boys, SUM(allgirls) as girls
FROM table_urban
GROUP BY urban) as foo order by ratio desc limit 1

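SELECT urban, SUM(allboys) AS boys, SUM(allgirls) AS girls
FROM table_urban
GROUP BY urban
-- PostgreSQL does not accept output aliases inside an ORDER BY expression, so repeat the sums
ORDER BY SUM(allgirls)::numeric / SUM(allboys) DESC -- or ASC for the lowest girls-to-boys ratio
LIMIT 1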
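To check the grouping and ratio logic against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for PostgreSQL here (an assumption made so the example runs anywhere), which is why it uses CAST(... AS REAL) instead of ::numeric. The function name highest_girls_to_boys is made up for this illustration.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_urban (urban TEXT, allgirls INTEGER, allboys INTEGER)")
conn.executemany(
    "INSERT INTO table_urban VALUES (?, ?, ?)",
    [("Ran", 100, 120), ("Ran", 110, 105), ("dhanr", 80, 73),
     ("dhanr", 140, 80), ("mohan", 180, 73), ("mohan", 25, 26)],
)

def highest_girls_to_boys(conn):
    """Return (urban, girls, boys, ratio) for the area with the highest ratio."""
    return conn.execute(
        """
        SELECT urban,
               SUM(allgirls) AS girls,
               SUM(allboys)  AS boys,
               -- cast to a float first so the division is not truncated
               CAST(SUM(allgirls) AS REAL) / SUM(allboys) AS ratio
        FROM table_urban
        GROUP BY urban
        ORDER BY ratio DESC
        LIMIT 1
        """
    ).fetchone()

top = highest_girls_to_boys(conn)
print(top)
```

With the sample data this picks "mohan" (205 girls to 99 boys), which matches the remark above that the expected result in the question does not show the highest ratio.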

Related

How to transform data into a map using group by in Hive SQL?

I have data like below
|-----------|-------|-------|
| grade |lecture| count |
|-----------|-------|-------|
| freshman | eng1 | 3 |
|-----------|-------|-------|
| freshman | eng2 | 4 |
|-----------|-------|-------|
| freshman | eng3 | 5 |
|-----------|-------|-------|
| senior | eng2 | 4 |
|-----------|-------|-------|
| senior | eng3 | 4 |
|-----------|-------|-------|
...and I want to create a map with lecture as the key and count as a value.
How can I get an output like below?
|-----------|----------------------------|
| grade | lecture per count |
|-----------|----------------------------|
| freshman | {eng1:3, eng2:4, eng3:5} |
|-----------|----------------------------|
| senior | {eng2:4, eng3:4} |
|-----------|----------------------------|
If you can live with count being a string, you will probably be able to use Hive's str_to_map() function to get the desired map. That requires a couple of preliminary steps to reformat the column values into a form it accepts. Something like this:
select
    grade,
    str_to_map(course_list, ',', ':') lecture_count_map
from (
    select
        grade,
        concat_ws(',',
            collect_list(concat_ws(':', lecture, cast(count as string)))
        ) course_list
    from courses
    group by grade
) T;
Output:
grade lecture_count_map
1 freshman {"eng1":"3","eng2":"4","eng3":"5"}
2 senior {"eng2":"4","eng3":"4"}
Otherwise, you're looking at writing your own UDAF or using one of the existing ones built by third parties, at least until JIRA-4966 is resolved (although the chances of that are quite low after 7 years).
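If it helps to see what the two steps of that query do to the data, here is a rough Python emulation (not Hive code). The str_to_map function below is a hand-written stand-in that only mimics the splitting behaviour described above, and lecture_maps is a name invented for this sketch.

```python
from collections import defaultdict

# Sample rows from the question: (grade, lecture, count).
rows = [
    ("freshman", "eng1", 3), ("freshman", "eng2", 4), ("freshman", "eng3", 5),
    ("senior", "eng2", 4), ("senior", "eng3", 4),
]

def str_to_map(text, pair_delim=",", kv_delim=":"):
    """Stand-in for Hive's str_to_map: split into pairs, then into key and value."""
    return dict(pair.split(kv_delim, 1) for pair in text.split(pair_delim))

def lecture_maps(rows):
    # Inner query: build "lecture:count" strings per grade and join them with commas.
    collected = defaultdict(list)
    for grade, lecture, count in rows:
        collected[grade].append(f"{lecture}:{count}")
    course_lists = {grade: ",".join(pairs) for grade, pairs in collected.items()}
    # Outer query: parse each joined string back into a map (values stay strings).
    return {grade: str_to_map(text) for grade, text in course_lists.items()}

result = lecture_maps(rows)
print(result)
```

The values come out as strings, which is the limitation mentioned at the start of the answer.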

How to optimize a nested inner Hive query

I have a table with the following stock data, with columns such as date, ticker, open and close (stock prices).
I want to know the date on which each stock gave its highest margin. So if I have 516 different stocks, my query should return 516 rows of ticker, date, open, close and a new column Margin (which will be max(close-open)).
| deep_stocks.date_ | deep_stocks.ticker | deep_stocks.open | deep_stocks.close |
+--------------------+---------------------+-------------------+--------------------+--+
| 20100721 | A | 27.68 | 27.58 |
| 20100722 | A | 27.95 | 28.72 |
| 20100723 | A | 28.56 | 29.3 |
| 20100726 | A | 29.22 | 29.64 |
| 20100727 | A | 29.73 | 28.87 |
| 20100728 | A | 28.79 | 28.78 |
| 20100729 | A | 28.97 | 28.15 |
| 20100730 | A | 27.78 | 27.93 |
| 20100802 | A | 28.35 | 28.82 |
| 20100803 | A | 28.7 | 27.84 |
I have written a query where my approach was:
Step 1 - Get the difference between Close and Open prices (Inner/Sub query)
Step 2 - Get the maximum of margin for every stock (used group by with max function)
Step 3 - Join the results with Main Table and get the data.
My query is below. It takes a long time to run, so I would appreciate corrections, and I would also like to know whether there is an alternative approach.
Here is the query that follows the approach above:
SELECT ds.ticker, ds.date_, ds.close, ds.open, ds.Margin
FROM (SELECT ticker, date_, close, open,
             CASE WHEN close - open > 0 THEN round(close - open, 2) ELSE 0 END AS Margin
      FROM DataStocks) ds
JOIN (SELECT dsIn.ticker, max(dsIn.Margin) AS mxMargin
      FROM (SELECT ticker,
                   CASE WHEN close - open > 0 THEN round(close - open, 2) ELSE 0 END AS Margin
            FROM DataStocks) dsIn
      GROUP BY dsIn.ticker) dsEx
  ON ds.ticker = dsEx.ticker AND ds.Margin = dsEx.mxMargin
ORDER BY ds.Margin;
Are there any alternatives to this query, or is it possible to optimize it?
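One possible alternative, sketched here under assumptions, is to rank the rows per ticker with a window function so the table is scanned once instead of being joined to itself. The sketch runs on Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions) purely as a stand-in. Whether the same shape is faster on Hive depends on your version and data, so treat it as something to test rather than a confirmed optimization. Only the first five sample rows from the question are loaded.

```python
import sqlite3

# SQLite stands in for Hive here (assumption); table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataStocks (date_ TEXT, ticker TEXT, open REAL, close REAL)")
conn.executemany(
    "INSERT INTO DataStocks VALUES (?, ?, ?, ?)",
    [("20100721", "A", 27.68, 27.58), ("20100722", "A", 27.95, 28.72),
     ("20100723", "A", 28.56, 29.30), ("20100726", "A", 29.22, 29.64),
     ("20100727", "A", 29.73, 28.87)],
)

QUERY = """
SELECT ticker, date_, close, open, Margin
FROM (
    SELECT ticker, date_, close, open, Margin,
           -- rank 1 marks the row(s) with the highest margin for each ticker
           RANK() OVER (PARTITION BY ticker ORDER BY Margin DESC) AS rnk
    FROM (
        SELECT ticker, date_, close, open,
               CASE WHEN close - open > 0 THEN round(close - open, 2) ELSE 0 END AS Margin
        FROM DataStocks
    ) m
) ranked
WHERE rnk = 1
ORDER BY Margin
"""

best_days = conn.execute(QUERY).fetchall()
print(best_days)
```

RANK() keeps every row that ties for the maximum, which is the same behaviour as the join on ds.Margin = dsEx.mxMargin. Using ROW_NUMBER() instead would return exactly one row per ticker.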

Count numbers in single row - SQL

Is it possible to return the count of values in a single row?
For example, this is a test table and I want to count daily_typing_pages:
SQL> SELECT * FROM employee_tbl;
+------+------+------------+--------------------+
| id | name | work_date | daily_typing_pages |
+------+------+------------+--------------------+
| 1 | John | 2007-01-24 | 250 |
| 2 | Ram | 2007-05-27 | 220 |
| 3 | Jack | 2007-05-06 | 170 |
| 3 | Jack | 2007-04-06 | 100 |
| 4 | Jill | 2007-04-06 | 220 |
| 5 | Zara | 2007-06-06 | 300 |
| 5 | Zara | 2007-02-06 | 350 |
+------+------+------------+--------------------+
The result of this count should be 1610; however, if I simply wrap the column in COUNT(), it returns:
SQL>SELECT COUNT(daily_typing_pages) FROM employee_tbl ;
+---------------------------+
| COUNT(daily_typing_pages) |
+---------------------------+
| 7 |
+---------------------------+
1 row in set (0.01 sec)
So it returns the number of rows instead of a total for the column.
Is there a way to do what I want without using an external programming language to add the values up for me?
Thanks
You want SUM instead of COUNT. COUNT merely counts the number of records; you want the values summed.
You didn't mention your DBMS, but SUM is a standard aggregate function that SQL Server, for example, supports.
Did you mean that you want to sum up all the numbers in daily_typing_pages?
If so, you can use sum(daily_typing_pages):
SELECT SUM(daily_typing_pages) FROM employee_tbl
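To see the two aggregates side by side on the sample table, here is a small sketch using Python's built-in sqlite3 module. SQLite is an assumption made for the sake of a runnable example, since the question does not say which DBMS is in use.

```python
import sqlite3

# In-memory copy of the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_tbl "
    "(id INTEGER, name TEXT, work_date TEXT, daily_typing_pages INTEGER)"
)
conn.executemany(
    "INSERT INTO employee_tbl VALUES (?, ?, ?, ?)",
    [(1, "John", "2007-01-24", 250), (2, "Ram", "2007-05-27", 220),
     (3, "Jack", "2007-05-06", 170), (3, "Jack", "2007-04-06", 100),
     (4, "Jill", "2007-04-06", 220), (5, "Zara", "2007-06-06", 300),
     (5, "Zara", "2007-02-06", 350)],
)

# COUNT gives the number of non-NULL values, SUM adds the values themselves.
row_count, total_pages = conn.execute(
    "SELECT COUNT(daily_typing_pages), SUM(daily_typing_pages) FROM employee_tbl"
).fetchone()
print(row_count, total_pages)  # 7 1610
```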

How to add column with the value of another dimension?

I apologize if the title does not make sense. I am trying to do something that is probably simple, but I have not been able to figure it out, and I'm not sure how to search for the answer. I have the following MDX query:
SELECT
event_count ON 0,
TOPCOUNT(name.children, 10, event_count) ON 1
FROM
events
which returns something like this:
| | event_count |
+---------------+-------------+
| P Davis | 123 |
| J Davis | 123 |
| A Brown | 120 |
| K Thompson | 119 |
| R White | 119 |
| M Wilson | 118 |
| D Harris | 118 |
| R Thompson | 116 |
| Z Williams | 115 |
| X Smith | 114 |
I need to include an additional column (gender). Gender is not a metric. It's just another dimension on the data. For instance, consider this query:
SELECT
gender.children ON 0,
TOPCOUNT(name.children, 10, event_count) ON 1
FROM
events
But this is not what I want! :(
| | female | male | unknown |
+--------------+--------+------+---------+
| P Davis | | | 123 |
| J Davis | | 123 | |
| A Brown | | 120 | |
| K Thompson | | 119 | |
| R White | 119 | | |
| M Wilson | | | 118 |
| D Harris | | | 118 |
| R Thompson | | | 116 |
| Z Williams | | | 115 |
| X Smith | | | 114 |
Nice try, but I just want three columns: name, event_count, and gender. How hard can it be?
Obviously this reflects lack of understanding about MDX on my part. Any pointers to quality introductory material would be appreciated.
It's important to understand that in MDX you are building sets of members on each axis, and not specifying column names like a tabular rowset. You are describing a 2-dimensional grid of results, not a linear rowset. If you imagine each dimension as a table, the member set is the set of unique values from a single column in that table.
When you choose a Measure as the member (as in your first example), it looks as if you're selecting from a table, so it's easy to misunderstand. When you choose a Dimension, you get many members, and a cross-join between the rows and columns (which is sparse in this case because the names and genders are 1-to-1).
So, you could crossjoin these two dimensions on a single axis, and then filter out the null cells:
SELECT
event_count ON 0,
TOPCOUNT(
NonEmptyCrossJoin(name.children, gender.children),
10,
event_count) ON 1
FROM
events
Which should give you results that have a single column (event_count) and 10 rows, where each row is composed of the tuple (name, gender).
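As a rough analogy in plain Python (this is not MDX, and the helper names are invented for the illustration), the query above crosses the two member sets, keeps only the tuples that actually have a measure value, and then takes the top N by that measure. The numbers are the first four rows of the grid in the question.

```python
from itertools import product

# event_count per (name, gender) tuple, taken from the grid in the question.
event_count = {
    ("P Davis", "unknown"): 123,
    ("J Davis", "male"): 123,
    ("A Brown", "male"): 120,
    ("R White", "female"): 119,
}
names = ["P Davis", "J Davis", "A Brown", "R White"]
genders = ["female", "male", "unknown"]

def non_empty_crossjoin(set_a, set_b, measure):
    """All (a, b) tuples from the cross product that have a value for the measure."""
    return [pair for pair in product(set_a, set_b) if pair in measure]

def topcount(tuples, n, measure):
    """First n tuples when sorted by the measure, highest first."""
    return sorted(tuples, key=lambda t: measure[t], reverse=True)[:n]

rows = topcount(non_empty_crossjoin(names, genders, event_count), 3, event_count)
for name, gender in rows:
    print(name, gender, event_count[(name, gender)])
```

The full cross product has 12 tuples, but only 4 carry a value, which is why the original grid was mostly empty cells.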
I hope that sets you on the right path; please feel free to ask if you want me to clarify.
For general introductory material, I think the book "MDX Solutions" is a good place to start:
http://www.amazon.ca/MDX-Solutions-Microsoft-Analysis-Services/dp/0471748080/
For online introductory MDX material, you can have a look at this gentle introduction, which presents the main MDX concepts.

Problem with MySQL query for high scores leaderboard

I have a MySQL high scores table for a game that shows the daily high score for each of the past days of the year. Right now I am doing a PHP for-loop and making a separate query for each day, but the table is becoming too large to do that so I would like to condense it into one simple MySQL statement.
Here is my new query right now (date_submitted is a timestamp):
SELECT date(date_submitted) as subDate, name, score FROM highScores WHERE date_submitted > "2009-07-16" GROUP BY subDate ORDER BY subDate DESC, score DESC LIMIT 10;
output:
+------------+------------+--------+
| subDate | name | score |
+------------+------------+--------+
| 2010-07-18 | krissy | 959976 |
| 2010-07-10 | claire | 260261 |
| 2010-07-05 | krissy | 771416 |
| 2010-06-19 | krissy | 698031 |
| 2010-06-18 | otli | 264898 |
| 2010-06-15 | robbie | 82303 |
| 2010-06-01 | dad | 480469 |
| 2010-05-29 | vicente | 124149 |
| 2010-05-27 | dad | 564007 |
| 2010-05-26 | caleb | 502623 |
+------------+------------+--------+
My problem is that when it grouped by subDate, it took the highest score for the earliest timestamp of that day, as you can see in the next query:
SELECT name, score, date_submitted FROM highScores WHERE date(date_submitted)='2010-06-15' GROUP BY name ORDER BY score DESC;
output:
+--------+--------+---------------------+
| name | score | date_submitted |
+--------+--------+---------------------+
| john | 304095 | 2010-06-15 22:58:02 |
| april | 247126 | 2010-06-15 21:25:31 |
| orli | 166021 | 2010-06-15 21:25:31 |
| robbie | 82303 | 2010-06-15 11:38:39 |
+--------+--------+---------------------+
As you can see, poor john should have been the leader for 2010-06-15. Can anyone help? Hopefully it is something real simple I am overlooking. I tried using max(score) before the FROM part in the 1st query and it gave me the correct score but didn't carry over the name.
Thank you for any help.
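-- userName, userScore and submitDate correspond to name, score and date_submitted in the question
SELECT userName, userScore, subDate FROM (
    SELECT
        userName,
        userScore,
        DATE(submitDate) as subDate,
        @rn := CASE WHEN @subDate = DATE(submitDate)
                    THEN @rn + 1
                    ELSE 1
               END AS rn,
        @subDate := DATE(submitDate)
    FROM (SELECT @subDate := NULL, @rn := 0) vars, highScores
    -- sort by day, then by score, so that rn = 1 is the day's highest score
    ORDER BY DATE(submitDate), userScore DESC
) deriv
WHERE rn=1;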
See also the answer to another 'highest record per something'-question
Add a
ORDER BY userScore DESC
at the end of the second query.
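Another common way to get the top row per day, without user variables, is to join the table to its own per-day maximum. Below is a minimal sketch of that idea using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for the sake of a runnable example). It uses the column names from the question. The 2010-06-15 rows are the ones shown there, while the time of day on the 2010-07-10 row is made up.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); names follow the question's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE highScores (name TEXT, score INTEGER, date_submitted TEXT)")
conn.executemany(
    "INSERT INTO highScores VALUES (?, ?, ?)",
    [("john", 304095, "2010-06-15 22:58:02"),
     ("april", 247126, "2010-06-15 21:25:31"),
     ("orli", 166021, "2010-06-15 21:25:31"),
     ("robbie", 82303, "2010-06-15 11:38:39"),
     ("claire", 260261, "2010-07-10 09:00:00")],  # time of day is hypothetical
)

QUERY = """
SELECT date(h.date_submitted) AS subDate, h.name, h.score
FROM highScores h
JOIN (
    -- highest score for each calendar day
    SELECT date(date_submitted) AS d, MAX(score) AS top
    FROM highScores
    GROUP BY date(date_submitted)
) m ON date(h.date_submitted) = m.d AND h.score = m.top
ORDER BY subDate DESC
"""

leaders = conn.execute(QUERY).fetchall()
print(leaders)
```

If two players tie for the top score on the same day, this form returns both of them.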