Creating indvidual columns with SQL - sql

I'm new to sql and very novice with this package. SQL is running in Watson DashDB. For the past few hours I've been struggling to find the correct code.
The code is trying to accomplish a few things.
Create a new view called SENTIMENT
Join two tables together
Have the new table show 4 columns with A. USER_SCREEN_NAME, B. Total Tweets, C. Postive SENTIMENT Count D. Negative SENTIMENT Count
The code below only creates 2 columns, I am needing 4. SPACEX_SENTIMENTS.SENTIMENT_POLARITY contain both Negative and Positive.
CREATE VIEW SENTIMENT
AS
(SELECT SPACEX_TWEETS.USER_SCREEN_NAME, SPACEX_SENTIMENTS.SENTIMENT_POLARITY
FROM dash015214.SPACEX_TWEETS
LEFT JOIN dash015214.SPACEX_SENTIMENTS ON
SPACEX_TWEETS.MESSAGE_ID=SPACEX_SENTIMENTS.MESSAGE_ID);
SELECT USER_SCREEN_NAME, COUNT(1) tweetsCount
FROM dash015214.SENTIMENT
GROUP BY USER_SCREEN_NAME
HAVING COUNT (1)>1
ORDER BY COUNT (USER_SCREEN_NAME) DESC
FETCH FIRST 20 ROWS ONLY;

It looks like your view SENTIMENT has one row per tweet, with two columns: the user name and then a polarity column. From your comment, I assume the polarity column can have the values of either 'POSITIVE' or 'NEGATIVE'. I think you can get what you want with this query:
SELECT
USER_SCREEN_NAME,
COUNT(1) AS "Total Tweets",
COUNT(CASE SENTIMENT_POLARITY WHEN 'POSITIVE' THEN 1 ELSE NULL END) AS "Positive Tweets",
COUNT(CASE SENTIMENT_POLARITY WHEN 'NEGATIVE' THEN 1 ELSE NULL END) AS "Negative Tweets"
FROM
SENTIMENT
GROUP BY USER_SCREEN_NAME
HAVING COUNT(1) > 1
ORDER BY COUNT(1) DESC;
This will give you everyone with at least 2 tweets (is that what you want?), and tell you the number of tweets per user and how many were positive and how many were negative. Replace " with whatever your SQL uses to indicate column names.

Related

SQL multiple constrained counts in query

I am trying to get with 1 query multiple count results where each one is a subset of the previous one.
So my table would be called Recipe and has these columns:
recipe_num(Primary_key) decimal,
recipe_added date,
is_featured bit,
liked decimal
And what I want is to make a query that will return the amount of likes grouped by day for any particular month with
total recipes as total_recipes,
total recipes that were featured as featured_recipes,
total number of recipes that were featured and had more than 100 likes liked_recipes
So as you can see each they are all counts with each being a subset of the previous one.
Ideally I don't want to run separate select count's where that query the whole table but rather get from the previous one.
I am not very good at using count with Where, Having, etc... and not exactly sure how to do it, so far I have the following which I managed via digging around here.
select
recipe_added,
count(*) total_recipes,
count(case is_featured when 1 then 1 else null end) total_featured_recipes
from
RECIPES
group by
recipe_added
I am not exactly sure why I have to use case inside the count but I wasn't able to get it to work using WHERE, would like to know if this is possible as well.
Thanks
With a CASE expression inside COUNT() you are doing conditional aggregation and this is exactly what you need for this requirement:
select recipe_added,
count(*) total_recipes,
count(case when is_featured = 1 then 1 end) total_featured_recipes,
count(case when is_featured = 1 and liked > 100 then 1 end) liked_recipes
from Recipes
group by recipe_added
There is no need for ELSE null because the default behavior of a CASE expression is to return null when no other branch returns a value.
If you want results for a specific month, say October 2020, you can add a WHERE clause before the GROUP BY:
where format(recipe_added, 'yyyyMM') = '202010'
This will work for SQL Server.
If you are using a different database then you can use a similar approach.

PostgreSQL - Handling empty query result

I am quite new to SQL and I am currently working on some survey results with PostgreSQL. I need to calculate percentages of each option from 5-point scale for all survey questions. I have a table with respondentid, questionid, question response value. Demographic info needed for filtering datacut is retrieved from another table. Then query is passed to result table. All queries texts for specific datacuts are generated by VBA script.
It works OK in general, however there's one problematic case - when there are no respondents for specific cut and I receive empty table as query result. If respondent count is greater than 0 but lower than calculation threshold (5 respondents) I am getting table full of NULLs which is OK. For 0 respondents I get 0 rows as result and nothing is passed to result table and it causes some displacement in final table. I am able to track such cuts as I am also calculating respondent number for overall datacut and storing it in another table. But is there anything I can do at this point - generate somehow table full of NULLs which could be inserted into result table when needed?
Thanks in advance and sorry for clumsiness in code.
WITH ItemScores AS (
SELECT
rsp.questionid,
CASE WHEN SUM(CASE WHEN rsp.respvalue >= 0 THEN 1 ELSE 0 END) < 5 THEN
NULL
ELSE
ROUND(SUM(CASE WHEN rsp.respvalue = 5 THEN 1 ELSE 0 END)/CAST(SUM(CASE
WHEN rsp.respvalue >= 0 THEN 1 ELSE 0 END) AS DECIMAL),2)
END AS 5spercentage,
... and so on for frequencies of 1s,2s,3s and 4s
SUM(CASE WHEN rsp.respvalue >= 0 THEN 1 ELSE 0 END) AS QuestionTotalAnswers
FROM (
some filtering applied here [...]
) AS rsp
GROUP BY rsp.questionid
ORDER BY rsp.questionid;
INSERT INTO results_items SELECT * from ItemScores;
If you want to ensure that the questionid column won't be empty, then you must call a cte with its plain values and then left join with the table that actually you are using to make the aggregations, calcs etc. So it will generate for sure the first list and then join its values.
The example of its concept would be something like:
with calcs as (
select questionid, sum(respvalue) as sum_per_question
from rsp
group by questionid)
select distinct rsp.questionid, calcs.sum_per_question
from rsp
left join calcs on rsp.questionid = calcs.questionid

Transposing and summing the top 5 results in Teradata SQL Assistant

I have a query that I converted from Access and is currently working correctly in Teradata SQL Assistant. The data pulled is just a standard table full of all of the data I need.
What I am wondering is: Can something be added to this query that will essentially sum up all of the Exposure values and then only show the top 5 Divisions by greatest to smallest sum (of those Top 5). Also, transposing the data so that my Topics are the left most column.
Here is the working code, details omitted.
SELECT
A.AS_OF_DT
, B.DIVISION
, B.CLASS
, Sum(A.BALANCE/1000000) AS "Bal in MMs"
, Sum(A.EXPOSURE/1000000) AS "Exp in MMs"
, Sum(CASE WHEN A.STATUS = 'NACC' THEN (B.BALANCE/1000000) ELSE 0 END) AS "NPL Bal as MMs"
FROM DB.TABLE1 A LEFT JOIN DB.TABLE2 B ON A.NAICS = B.NAICS_CD
WHERE A.AS_OF_DT= '2017-03-31'
GROUP BY
A.AS_OF_DT,
B.DIVISION,
B.CLASS
ORDER BY SUM (A.EXPOSURE/1000000) DESC
Essentially I want the columns to be the following:
DIVISION|DATE|
Below DIVISION would only be the Top 5 DIVISIONS summarized by EXPOSURE (under DATE)
I can try and clarify if needed. Just let me know.
Thanks!
End result is to have a datapaste I can throw into Excel without the manual work of transposing the data in Excel along with writing formulas to rummage through the 1000's of results of the base query to find summarize the individual Divisions and then picking the top 5 each month.
Thanks!
Shill
To get the 5 top for each division, you can use QUALIFY.
Add this to the end of you query:
QUALIFY ROW_NUMBER() over (PARTITION BY AS_OF_DATE,DIVISION order by (SUM (A.EXPOSURE/1000000))
For your other questions, SQL Assistant isn't much of a presentation tool, it won't do what you are asking for.
If your query already work,
try replacing:
SELECT
By:
SELECT top 10
(line 1)

Want to display (and count) rows that have zero out - SQL/SAS

Visual
I have tried the following SQL query already in SAS (if there is a better way to use SAS please let me know):
Select *
From [Database]
Having COUNT(Month)>1 and SUM(Amount)=0;
This then only shows User_ID 003 and User_ID 004. The problem is I need it to show User_ID 001's first and third row since they offset (zero-out) in the same month along with 003 and 004. Basically, I am looking for duplicate values in certain columns with amounts that offset in a dataset that has 20,000+ rows (the Visual link above is not related to the dataset I am working on).
I have also tried separating the data so only positive numbers are in the Amount column and negative numbers are in a second Amount2 column and ran the query to find -Amount in Amount2. This didn't work either.
Thanks in advance.
This will the user_ids where a duplicate case exists in a month
select a.user_id , a.month , amount
from my_table as a
inner join my_table_a as b
on a.user_id = b.user_id
and a.month = b.month
and a.amount = -1 * b.amount

Identifying parent records for many transactions

This is related to a question I asked previously for which lag/lead was suggested. However the data I'm working with are more complex than I first thought so I need a more robust solution. This screen shot shows an issue I need to tackle:
Within a single serial number, a shipment event defines a new reference window. So records 2,3,4 relate to 1. Record 6 relates to 5 and so forth. I need to mark the records for which the BillToId doesn't match the parent shipment.
I'm trying to understand if I could even use the LAG function to compare records 2,3,4 back to 1 when the number of post-shipment events varies (duplicates are allowed). I was thinking I might be better off with another fact table that identifies the parent rowid along each record first?
So then my question becomes how do I efficiently identify which shipment each row belongs to? Am I forced to run a subquery for each record? I'm working right now with over 2 million total rows. I would later make this query part of the ETL process so it would be processing smaller chunks of data.
Here is an approach that uses the cumulative sum functionality in SQL Server. The idea is to assign each "ship" activity a value of "1" and "0" for everything else. Then do a cumulative sum to identify each group that should have the same billtoid. After that, the ship information can be assigned to all records in the same group:
select rowid, dateid, billtoid, activitytypeid, serialnumber
from (select t.*,
max(case when activitytypeid = 'Ship' then billtoid end) over
(partition by serialnumber, cumships) as ship_billtoid
from (select t.*,
sum(case when activitytypeid = 'Ship' then 1 else 0 end) over
(partition by serialnumber order by rowid) as cumships
from t
) t
) t
where billtoid <> ship_billtoid;