VERTICA Database: how to get distinct "first names" when first name and last name are stored in a single column? - sql

I'm new to Vertica DB and have run into a problem.
Vertica is mostly like other SQL databases. I have a Customer table:
Customer table
NAME         | AGE | SEX
JOHN KENY    | 26  | M
JOHN CENA    | 32  | M
JOHN MCCAIN  | 35  | M
PETER PAN    | 33  | M
SELENA GOMEZ | 24  | F
Now I would like a query, to run on Vertica, that fetches the DISTINCT customer first names, i.e.:
NAME
JOHN
PETER
SELENA
I'm trying the SPLIT_PART() function in Vertica, but I can't get the query to run:
SELECT DISTINCT NAME FROM
(SELECT SPLIT_PART(NAME,' ',1) from Customer );
gives
ERROR SYNTAX error at or near "Select"
I also tried
SELECT SPLIT_PART(SELECT DISTINCT NAME FROM Customer,' ',1);
resulting in
ERROR SYNTAX error at or near "Select"
but
SELECT SPLIT_PART('JOHN KENY',' ',1) ;
outputs
JOHN

The following query should do the job:
select distinct SPLIT_PART(NAME,' ',1) from Customer
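However, note that this is fragile. If this is a production environment (and not a simple exercise), I bet you'll end up with names containing spaces (for example, a two-word first name) that will break your query, because SPLIT_PART(NAME,' ',1) only returns the text before the first space.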
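A minimal, runnable sketch of the answer's query, using Python's built-in sqlite3 module as a stand-in for Vertica. This is an assumption-laden substitute: SQLite has no SPLIT_PART, so substr/instr extract the text before the first space instead. It also shows the derived-table form the question attempted, with the computed column given an alias so the outer query has a name to refer to (the question's outer query asked for NAME, which the subquery never produced). In Vertica the equivalent would presumably be SELECT DISTINCT first_name FROM (SELECT SPLIT_PART(NAME, ' ', 1) AS first_name FROM Customer) AS t;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (NAME TEXT, AGE INTEGER, SEX TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [("JOHN KENY", 26, "M"), ("JOHN CENA", 32, "M"), ("JOHN MCCAIN", 35, "M"),
     ("PETER PAN", 33, "M"), ("SELENA GOMEZ", 24, "F")],
)

# SQLite stand-in for SPLIT_PART(NAME, ' ', 1): everything before the first space.
# Appending ' ' means a single-word name returns the whole name.
FIRST = "substr(NAME, 1, instr(NAME || ' ', ' ') - 1)"

# Direct form, as in the answer.
direct = conn.execute(
    f"SELECT DISTINCT {FIRST} AS first_name FROM Customer ORDER BY first_name"
).fetchall()

# Derived-table form: alias the computed column and the subquery,
# then select the alias (not NAME) in the outer query.
nested = conn.execute(
    f"SELECT DISTINCT first_name FROM "
    f"(SELECT {FIRST} AS first_name FROM Customer) AS t ORDER BY first_name"
).fetchall()

print(direct)  # [('JOHN',), ('PETER',), ('SELENA',)]
print(nested)  # same rows as direct
```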

Related

In proc sql when using SELECT * and GROUP BY, the result is not collapsed

When using the asterisk in combination with SUM and GROUP BY, the duplicates are not removed as I expect (and as it works in, for example, MySQL):
col1 | country
-----------------
5 | sweden
20 | sweden
30 | denmark
select *, sum(col1) as s from table
group by country
the data returned is:
col1 | country | s
--------------------
5 | sweden | 25
20 | sweden | 25
30 | denmark | 30
instead of what I expected:
col1 | country | s
------------------------
5 | sweden | 25
30 | denmark | 30
If I don't use the asterisk (*), the data returned is as I expect it to be.
SELECT country, sum(col1) as s from table
You are correct: SAS does not collapse the result when you have variables in the SELECT statement that are not in the GROUP BY clause.
There will be a note to that effect in the log, saying that the query requires remerging summary statistics back with the original data.
If you want just the grouped variables, you'll unfortunately have to list them, but since you have to list them in GROUP BY anyway, it's not extra work per se.
Different SQL implementations handle this differently, and this is one way in which SAS differs. It's handy when you do want to merge a summary statistic back with the main data set, though.
If you don't want this behaviour, add the NOREMERGE option to your PROC SQL statement; note, though, that it makes the query fail with an error rather than returning the collapsed result you want.
See the documentation for the reference
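To make the remerge concrete, here is a minimal sketch in standard SQL, run through Python's sqlite3 as a stand-in (SQLite is not SAS, and the table name t is hypothetical because "table" is a reserved word). The first query spells out explicitly what PROC SQL does implicitly for SELECT *, SUM(col1) ... GROUP BY country: compute the per-country sum, then join it back onto every detail row. The second query lists only the grouped column and the aggregate, which gives the collapsed result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, country TEXT)")  # hypothetical table name
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(5, "sweden"), (20, "sweden"), (30, "denmark")])

# Explicit remerge: per-country sums joined back to each detail row.
remerged = conn.execute("""
    SELECT t.col1, t.country, g.s
    FROM t
    JOIN (SELECT country, SUM(col1) AS s FROM t GROUP BY country) AS g
      ON g.country = t.country
    ORDER BY t.col1
""").fetchall()

# Collapsed: only grouped columns and aggregates in the SELECT list.
collapsed = conn.execute("""
    SELECT country, SUM(col1) AS s
    FROM t
    GROUP BY country
    ORDER BY country
""").fetchall()

print(remerged)   # [(5, 'sweden', 25), (20, 'sweden', 25), (30, 'denmark', 30)]
print(collapsed)  # [('denmark', 30), ('sweden', 25)]
```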
Don't use SELECT *, ever. It's bad practice, risky, and hard to maintain; read up on why.
What flavor of SQL?
Your first query shouldn't work. You're basically saying...
select col1
, country
, sum(col1) as s
from table
group by country
...which will return an error:
Column 'table.col1' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
SELECT country, sum(col1) as s from table
...also should not work:
Column 'table.country' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Given your expected output, I suspect what you are looking for is...
select min(col1) as col1
, country
, sum(col1) as s
from table
group by country
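A quick check of this suggestion, sketched with Python's sqlite3 rather than the asker's unknown engine (the table name t is hypothetical, since "table" is reserved). Aggregating col1 with MIN keeps every non-grouped column inside an aggregate, so the query is valid standard SQL and returns one row per country, matching the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, country TEXT)")  # hypothetical table name
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(5, "sweden"), (20, "sweden"), (30, "denmark")])

rows = conn.execute("""
    SELECT MIN(col1) AS col1, country, SUM(col1) AS s
    FROM t
    GROUP BY country
    ORDER BY MIN(col1)
""").fetchall()

print(rows)  # [(5, 'sweden', 25), (30, 'denmark', 30)]
```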

How to use the distinct and count function to get distinct users and number of messages sent

There is a table I'm currently trying to query. The data looks as follows:
--------------------------------
| Message_Time | User | Message|
--------------------------------
|y-m-d H:M:S |User1 | msg-body
|y-m-d H:M:S |User1 | msg-body
|y-m-d H:M:S |User1 | msg-body
|y-m-d H:M:S |User2 | msg-body
|y-m-d H:M:S |User2 | msg-body
I'm trying to select the users and count the number of messages sent by each individual user.
I've tried selecting from this table with a DISTINCT on the User column and a COUNT on the Message column, but I keep getting an error. I'm sure I'm not approaching it the correct way.
I tried the following query:
SELECT DISTINCT(USER) AS [User_ID], COUNT(Message) AS [Messages_Sent]
FROM [dbo].[Table]
This is my desired output:
--------------------------
| User_ID | Messages_Sent|
--------------------------
|User1 | 3 |
|User2 | 2 |
But unfortunately this is the error that occurs:
Operand data type ntext is invalid for count operator.
Any help would be great, thanks.
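Use GROUP BY and COUNT(*). Put the column name in brackets, because a bare USER is a reserved word in T-SQL that returns the current database user name rather than your column:
SELECT [User] AS [User_ID], COUNT(*) AS [Messages_Sent]
FROM [dbo].[Table]
GROUP BY [User];
Presumably Message is never NULL, so this gives the same counts as COUNT(Message) would.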
By the way, replace ntext with nvarchar(max) (or text with varchar(max)); ntext and text are deprecated. As the documentation states:
IMPORTANT! ntext, text, and image data types will be removed in a future version of SQL Server. Avoid using these data types in new development work, and plan to modify applications that currently use them. Use nvarchar(max), varchar(max), and varbinary(max) instead.
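A minimal sketch of the GROUP BY approach, using Python's sqlite3 as a stand-in for SQL Server (so the ntext problem does not arise, and the table name messages is hypothetical). Each group produces exactly one row, which is why no DISTINCT is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for [dbo].[Table]; "User" is quoted as an identifier.
conn.execute('CREATE TABLE messages ("User" TEXT, Message TEXT)')
conn.executemany("INSERT INTO messages VALUES (?, ?)", [
    ("User1", "msg-body"), ("User1", "msg-body"), ("User1", "msg-body"),
    ("User2", "msg-body"), ("User2", "msg-body"),
])

# One row per distinct user, with the number of rows (messages) in each group.
rows = conn.execute('''
    SELECT "User" AS User_ID, COUNT(*) AS Messages_Sent
    FROM messages
    GROUP BY "User"
    ORDER BY "User"
''').fetchall()

print(rows)  # [('User1', 3), ('User2', 2)]
```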
Fix your data type, and then your query will work fine:
ALTER TABLE dbo.[Table] ALTER COLUMN Message nvarchar(MAX);
As mentioned in the comments, ntext has been deprecated for years; it's past time to stop using it and use the correct type.
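Use GROUP BY for aggregates (with [User] bracketed, since a bare USER is a T-SQL keyword):
SELECT [User] AS [User_ID], COUNT(*) AS [Messages_Sent]
FROM [dbo].[Table]
GROUP BY [User]
This counts every row per user, including rows where Message is NULL.
GROUP BY already returns one row per distinct user, so DISTINCT is not needed.
SELECT [User] AS [User_ID], COUNT(Message) AS [Messages_Sent]
FROM [dbo].[Table]
GROUP BY [User]
This counts only the rows where Message is not NULL. Note that COUNT(Message) still fails with the ntext error from the question until the column is changed to nvarchar(max).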
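The difference between COUNT(*) and COUNT(Message) only shows up when some messages are NULL. A small sketch with Python's sqlite3 (hypothetical table name messages, and one NULL message added to the sample data purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE messages ("User" TEXT, Message TEXT)')
conn.executemany("INSERT INTO messages VALUES (?, ?)", [
    ("User1", "msg-body"), ("User1", "msg-body"), ("User1", None),  # one NULL message
    ("User2", "msg-body"), ("User2", "msg-body"),
])

# COUNT(*) counts rows; COUNT(Message) skips rows where Message IS NULL.
rows = conn.execute('''
    SELECT "User", COUNT(*) AS all_rows, COUNT(Message) AS non_null_messages
    FROM messages
    GROUP BY "User"
    ORDER BY "User"
''').fetchall()

print(rows)  # [('User1', 3, 2), ('User2', 2, 2)]
```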

Find missing entries in a SQL table conditional on criteria

I have modest SQL experience (using MS SQL Server 2012 here), but this evades me. I wish to output distinct names from a table (previously created successfully from a join) which have some required entries missing, but conditional on the existence of another similar entry. For anyone who has location 90, I want to check that they also have locations 10 and 20...
For example, consider this table:
Name |Number |Location
--------|-------|--------
Alice |136218 |90
Alice |136218 |10
Alice |136218 |20
Alice |136218 |40
Bob |121478 |10
Bob |121478 |90
Chris |147835 |20
Chris |147835 |90
Don |138396 |20
Don |138396 |10
Emma |136412 |10
Emma |136412 |20
Emma |136412 |90
Fred |158647 |90
Gay |154221 |90
Gay |154221 |10
Gay |154221 |30
So formally, I would like to obtain the Names (and Numbers) of those entries in the table who:
Have an entry at location 90
AND do not have all the other required location entries - in this case also 10 and 20.
So in the example above
Alice and Emma are not output by this query, they have entries for 90, 10 & 20 (all present and correct, we ignore the location 40 entry).
Don is not output by this query, he does not have an entry for location 90.
Bob and Gay are output by this query, they are both missing location 20 (we ignore Gay's location 30 entry).
Chris is output by this query, he is missing location 10.
Fred is output by this query, he is missing locations 10 & 20.
The desired query output is therefore something like:
Name |Number |Location
--------|-------|--------
Bob |121478 |20
Chris |147835 |10
Fred |158647 |10
Fred |158647 |20
Gay |154221 |20
I've tried a few approaches with left/right joins where B.Key is null, and select from ... except but so far I can't quite get the logical approach correct. In the original table there are hundreds of thousands of entries and only a few tens of valid missing matches. Unfortunately I can't use anything that counts entries as the query has to be locations specific and there are other valid table entries at other locations outside of the desired ones.
I feel that the correct way to do this is something like a left outer join, but as the starting table is the output of another join, does this require declaring an intermediate table and then outer joining the intermediate table with itself? Note there is no requirement to fill in any gaps or enter items into the table.
Any advice would be very much appreciated.
===Answered and used code pasted here===
--STEP 0: Create a CTE of all valid actual data in the ranges that we want
WITH ValidSplits AS
(
SELECT DISTINCT C.StartNo, S.ChipNo, S.TimingPointId
FROM Splits AS S INNER JOIN Competitors AS C
ON S.ChipNo = C.ChipNo
AND (
S.TimingPointId IN (SELECT TimingPointId FROM #TimingPointCheck)
OR
S.TimingPointId = @TimingPointMasterCheck
)
),
--STEP 1: Create a CTE of the actual data that is specific to the precondition of passing @TimingPointMasterCheck
MasterSplits AS
(
SELECT DISTINCT StartNo, ChipNo, TimingPointId
FROM ValidSplits
WHERE TimingPointId = @TimingPointMasterCheck
)
--STEP 2: Create table of the other data we wish to see, i.e. a representation of the StartNo, ChipNo and TimingPointId of the finishers at the locations in #TimingPointCheck
--The key part here is the CROSS JOIN which makes a copy of every Start/ChipNo for every TimingPointId
SELECT StartNo, ChipNo, Missing.TimingPointId
FROM MasterSplits
CROSS JOIN (SELECT * FROM #TimingPointCheck) AS Missing(TimingPointId)
EXCEPT
SELECT StartNo, ChipNo, TimingPointId FROM ValidSplits
ORDER BY StartNo
Welcome to Stack Overflow.
What you need is a bit challenging, since you want to see data that does not exist.
Thus, we first must create all possible rows, then subtract the ones that exist:
select ppl_with_90.Name,ppl_with_90.Number,search_if_miss.Location
from
(
select distinct Name,Number
from yourtable t
where Location=90
)ppl_with_90 -- All Name/Numbers that have the 90
cross join (values (10),(20)) as search_if_miss(Location) -- For all the previous, combine them with both 10 and 20
except -- remove the lines already existing
select *
from yourtable
where Location in (10,20)
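The cross join plus EXCEPT approach can be checked against the sample data from the question. This sketch uses Python's sqlite3 rather than SQL Server; because SQLite does not accept the AS search_if_miss(Location) column-alias form on a VALUES list, the required locations are declared in a CTE instead (a dialect substitution, not part of the original answer).

```python
import sqlite3

ROWS = [
    ("Alice", 136218, 90), ("Alice", 136218, 10), ("Alice", 136218, 20), ("Alice", 136218, 40),
    ("Bob", 121478, 10), ("Bob", 121478, 90),
    ("Chris", 147835, 20), ("Chris", 147835, 90),
    ("Don", 138396, 20), ("Don", 138396, 10),
    ("Emma", 136412, 10), ("Emma", 136412, 20), ("Emma", 136412, 90),
    ("Fred", 158647, 90),
    ("Gay", 154221, 90), ("Gay", 154221, 10), ("Gay", 154221, 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (Name TEXT, Number INTEGER, Location INTEGER)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", ROWS)

missing = conn.execute("""
    WITH search_if_miss(Location) AS (VALUES (10), (20))
    -- every (Name, Number) that has location 90, paired with 10 and with 20
    SELECT p.Name, p.Number, s.Location
    FROM (SELECT DISTINCT Name, Number FROM yourtable WHERE Location = 90) AS p
    CROSS JOIN search_if_miss AS s
    EXCEPT
    -- minus the combinations that already exist
    SELECT Name, Number, Location FROM yourtable WHERE Location IN (10, 20)
    ORDER BY 1, 3
""").fetchall()

for row in missing:
    print(row)
```

The rows printed are exactly the five in the desired output: Bob 20, Chris 10, Fred 10, Fred 20 and Gay 20.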
You need to generate the combinations (Name, Number, 10) and (Name, Number, 20) for every row where Location = 90. You can then use your favorite method (LEFT JOIN + IS NULL, NOT EXISTS, NOT IN) to filter out the rows that already exist:
WITH name_number_location AS (
SELECT t.Name, t.Number, v.Location
FROM #yourdata AS t
CROSS JOIN (VALUES (10), (20)) AS v(Location)
WHERE t.Location = 90
)
SELECT *
FROM name_number_location AS r
WHERE NOT EXISTS (
SELECT *
FROM #yourdata AS t
WHERE r.Name = t.Name AND r.Location = t.Location
)
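The NOT EXISTS variant can be run the same way. This is a sketch on the question's sample data using Python's sqlite3 instead of a SQL Server temp table (the table name yourtable is hypothetical, and the (10), (20) list is moved into a CTE because SQLite lacks the v(Location) alias syntax).

```python
import sqlite3

ROWS = [
    ("Alice", 136218, 90), ("Alice", 136218, 10), ("Alice", 136218, 20), ("Alice", 136218, 40),
    ("Bob", 121478, 10), ("Bob", 121478, 90),
    ("Chris", 147835, 20), ("Chris", 147835, 90),
    ("Don", 138396, 20), ("Don", 138396, 10),
    ("Emma", 136412, 10), ("Emma", 136412, 20), ("Emma", 136412, 90),
    ("Fred", 158647, 90),
    ("Gay", 154221, 90), ("Gay", 154221, 10), ("Gay", 154221, 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (Name TEXT, Number INTEGER, Location INTEGER)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", ROWS)

missing = conn.execute("""
    WITH v(Location) AS (VALUES (10), (20)),
    name_number_location AS (
        -- required (Name, Number, Location) combinations for people with location 90
        SELECT t.Name, t.Number, v.Location
        FROM yourtable AS t
        CROSS JOIN v
        WHERE t.Location = 90
    )
    SELECT r.Name, r.Number, r.Location
    FROM name_number_location AS r
    WHERE NOT EXISTS (        -- keep only combinations with no matching row
        SELECT 1 FROM yourtable AS t
        WHERE t.Name = r.Name AND t.Location = r.Location
    )
    ORDER BY r.Name, r.Location
""").fetchall()

for row in missing:
    print(row)
```

If a person could have several location-90 rows, adding DISTINCT to the inner SELECT would avoid duplicate output rows.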

Data value "0" has invalid format error in redshift

We are facing a weird problem with one of our queries.
Below is the query we are running
INSERT into test
SELECT
member.name as mem_name,
CASE
  WHEN ( member.dob>0 AND length (member.dob)=8 ) THEN (DATEDIFF(year,to_date("dob",'YYYYMMDD'), to_date(20140716,'YYYYMMDD')))
  WHEN ( member.dob=0 ) Then 0
END As Age,
20140716021501
FROM
member
Below is the sample data present in our table.
|name |dob |
|Ajitsh |0 |
|rk |51015 |
|s_thiagarajan |19500130 |
|madhav_7 |19700725 |
|1922 |0 |
|rekha |25478 |
|vmkurup |0 |
|ravikris |19620109 |
|ksairaman |0 |
|sruthi |0 |
|rrbha |19630825 |
|sunilsw |0 |
|sunilh |0 |
|venky_pmv |19701207 |
|malagi |0 |
|an752001 |0 |
|edsdf |19790201 |
|anuanand |19730724 |
|fresh |19720821 |
|ampharcopharma |19590127 |
|Nanze |19621123 |
The date of birth is stored as a bigint in YYYYMMDD format.
Some rows contain invalid dates, such as 0 or 51015.
Sometimes this query raises the following error:
INSERT INTO test not successful
An error occurred when executing the SQL command:
INSERT into test
SELECT
member.name as mem_name,
CASE WHEN ( member.dob>0 AND length (member.dob)=8 ) THEN (DATEDIFF(y...
ERROR: Data value "0" has invalid format
Detail:
-----------------------------------------------
error: Data value "0" has invalid format
code: 1009
context: PG ERROR
query: 92776
location: pg_utils.cpp:2731
process: query1_30 [pid=1434]
-----------------------------------------------
Execution time: 3.99s
1 statement failed.
The strange thing is that the error occurs randomly, not every time.
Many times it works without any change to the query or the dataset.
Sometimes it also works on the second or third attempt.
I suspect the to_date function is causing this error, but then why does it fail randomly
rather than on every run?
To test this assumption, I also tried this small query:
SELECT to_date(20140716,'YYYYMMDD'), to_date(0,'YYYYMMDD');
But this shows the same behaviour: it raises the error randomly and runs smoothly
the rest of the time.
If it is fine to ignore this type of value and just convert it to a date, you can do it the following way:
SELECT to_date('20140716','YYYYMMDD'), to_date('0','FMYYYYMMDD');
Here FM suppresses leading zeroes and trailing blanks that would otherwise be added to make the output of a pattern be fixed-width.
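As a cross-check of which dob values the query's CASE expression actually converts, here is a small Python sketch that mirrors its branches; it does not explain the intermittent error, but it can help spot rows such as 51015 before they reach to_date. It assumes DATEDIFF(year, ...) counts calendar-year boundaries (so the result is the difference of the year numbers), and the function name age_from_dob is hypothetical. Eight-digit values that are not real calendar dates simply yield None here; how to_date treats them is not modelled.

```python
from datetime import date, datetime

# Reference date hard-coded in the query: to_date(20140716, 'YYYYMMDD')
REFERENCE = date(2014, 7, 16)

def age_from_dob(dob):
    """Mirror the query's CASE: an 8-digit positive YYYYMMDD value gives a year
    difference, 0 gives 0, anything else gives None (the CASE has no ELSE, so
    SQL would return NULL)."""
    if dob > 0 and len(str(dob)) == 8:
        try:
            born = datetime.strptime(str(dob), "%Y%m%d").date()
        except ValueError:
            return None  # 8 digits, but not a valid calendar date
        # Assumed DATEDIFF(year, ...) semantics: count year boundaries crossed.
        return REFERENCE.year - born.year
    if dob == 0:
        return 0
    return None

for value in (0, 51015, 19500130, 25478, 19701207):
    print(value, age_from_dob(value))
```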

Delphi 7 - error when getting values using the SQL IN operator

I got no answer on a related thread, so I'm asking this question. I searched for how to retrieve record values using a WHERE clause with multiple values and came up with this.
Example table:
|ID |PRICE|
|1 |3000 |
|2 |2000 |
|3 |1000 |
|4 |5000 |
|5 |4000 |
SQL query:
DM.Zread.Close;
DM.Zread.SQL.CommaText := 'select PRICE from DVD where ID in (1, 2, 3)';
DM.Zread.Open;
The above gives me an error; when I put only one value, (1) or (2), it works fine.
My questions are:
How do I fix it, so that I can get the values from 3 different
records?
How do I apply it to string values instead?
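SQL is a TStrings subclass. When you set CommaText, the string is split into separate lines at each comma (unquoted spaces also act as delimiters), so the commas themselves are dropped and, apart from line breaks, your query becomes: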
select PRICE from DVD where ID in (1
2
3)
This obviously won't work.
You want to set the Text property or use Add() method to add separate lines.
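To see why the commas vanish, here is a simplified Python model of assigning to CommaText. It is only an approximation under stated assumptions: it follows the classic Delphi 7 rule that unquoted commas and whitespace both separate items, and it does not model double-quoted items or the empty items that ",," would produce. The helper names comma_text_lines and as_sql_text are hypothetical, and Python's "\n" stands in for Delphi's line break.

```python
import re

def comma_text_lines(text):
    """Approximate TStrings.CommaText assignment: split on runs of commas
    and whitespace (quoted items and empty items are not modelled)."""
    return [item for item in re.split(r"[,\s]+", text) if item]

def as_sql_text(text):
    # The query component then sees the lines joined by line breaks.
    return "\n".join(comma_text_lines(text))

multi = as_sql_text("select PRICE from DVD where ID in (1, 2, 3)")
single = as_sql_text("select PRICE from DVD where ID in (1)")

print(multi)   # the IN list arrives as "(1", "2", "3)" on separate lines, no commas
print(single)  # still valid SQL once the line breaks are read as whitespace
```

This matches the symptom in the question: a single value survives the split intact, while a list loses the commas the SQL parser needs.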
Try setting the whole statement through a property that does not split on commas, rather than CommaText, on your SQL call. SQL is a TStrings, which has no CommandText property, so the property to use here is Text:
DM.Zread.Close;
DM.Zread.SQL.Text := 'select PRICE from DVD where ID in (1, 2, 3)';
DM.Zread.Open;