SQL help to find a username in a data set

Data
row data
1 (172.32.313.20:5892) User 'ant\john' requested
2 User ant\john logged on from 172.31.13.2129
3 user=ant\john domain=ant.amazon.com server=172.31.19.541 protocol=LDAPS result=0:Success
I need to pull the username (john) out of each row of this dataset.
select message,
replace(trim(split_part(split_part(message, 'requested', 1), 'User ', 2)), 'ant\\', '') as username1,
replace(trim(split_part(split_part(message, 'requested', 1), 'user=', 2)), 'ant\\', '') as username2
from test_kemp_log.archive
where message like '%john%'
Is there a better way to extract the name that follows ant\ (after "User" or "user=") from this dataset?

The following could work in your case.
SELECT regexp_substr(message, 'ant\\\\' || '([a-z]+)', 1, 1, 'e') AS username, message
FROM test_kemp_log.archive
WHERE message LIKE '%john%';
The 'e' parameter tells regexp_substr to return the first capture group instead of the whole match. In our case, that is whatever ([a-z]+) matches right after "ant\".
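To check the pattern without a database, the same capture-group idea can be sketched in Python. This is only an illustration: the sample rows are copied from the question, while the names ROWS, USERNAME and extract_username are made up for the example, and Python's re module stands in for the database's regexp_substr.

```python
import re

# Sample rows copied from the question.
ROWS = [
    r"(172.32.313.20:5892) User 'ant\john' requested",
    r"User ant\john logged on from 172.31.13.2129",
    r"user=ant\john domain=ant.amazon.com server=172.31.19.541 protocol=LDAPS result=0:Success",
]

# A literal "ant\" followed by one captured run of lowercase letters.
USERNAME = re.compile(r"ant\\([a-z]+)")


def extract_username(message):
    """Return the lowercase name captured right after the domain prefix, or None."""
    match = USERNAME.search(message)
    return match.group(1) if match else None


usernames = [extract_username(row) for row in ROWS]
print(usernames)
```

Like the SQL version, this only captures lowercase letters, so names containing digits, dots or uppercase characters would need a wider character class.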

Related

How to combine multiple rows into a STRUCT in SQL from a query on Hive?

I have an output table that resembles:
User    Preference
User A  Pref A
User A  Pref B
I'd like to get the data into the following format:
User    Preferences (an array of structs)
User A  [{pref => "Pref A"}, {pref => "Pref B"}]
I attempted the following, but to no avail:
SELECT
User,
ARRAY_AGG(SELECT AS STRUCT(Preference)
) as Preferences
FROM
users
GROUP BY User
Curious if anyone might have any pointers? Thank you in advance
If you are looking to do it in Presto, then you can do it like this:
with raw as (
SELECT
t.*
FROM ( VALUES
('User A', 'Pref A'),
('User A', 'Pref B')
) as t (user_name, pref)
)
SELECT
user_name
, ARRAY_AGG(CAST(ROW(pref) AS ROW(pref VARCHAR))) as y
FROM raw
GROUP BY 1
If you are doing it in Hive:
Check that the named_struct function is available (it should be in any version after Hive 0.8).
Then aggregate the structs per user. Hive has no built-in array_agg, so collect_list takes its place:
collect_list(named_struct('pref', pref))
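For readers who want to see the target shape outside SQL, here is a small Python sketch of the same reshaping: one entry per user, holding a list of single-field records. The 'User B' row and the function name group_preferences are hypothetical additions for illustration.

```python
from collections import defaultdict

# ('User B', 'Pref C') is a made-up extra row to show more than one group.
rows = [
    ("User A", "Pref A"),
    ("User A", "Pref B"),
    ("User B", "Pref C"),
]


def group_preferences(rows):
    """Collect each user's preferences into a list of {'pref': ...} records."""
    grouped = defaultdict(list)
    for user_name, pref in rows:
        grouped[user_name].append({"pref": pref})
    return dict(grouped)


result = group_preferences(rows)
print(result)
```

The dict of lists plays the role of GROUP BY plus the array aggregate, and each {'pref': ...} record corresponds to one struct.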

SQL: Consolidating results

I am a relative beginner in SQL (learned and forgotten many times) and entirely self-taught, so please excuse my likely lack of proper terminology. I made a query that pulls items that have been returned; each item's row has a return code. Here is a result sample:
In the final report (created with Visual Studio), I would like to have a count of returns by return type, but I need to consolidate the 40 or so return codes into 4 or 5 return type groups. For example, the RET_CODE values ')' and '.' are both product quality issues and would count in the "Product Quality" row.
I could use some help finding the best way to accomplish this.
Thank You
Andrew
The bad way!
You could do this by creating the grouping within your SQL statement with something like
SELECT
*,
CASE
WHEN RET_CODE IN ('.', ')') THEN 'Quality Error'
WHEN RET_CODE IN ('X', 'Y', 'Z') THEN 'Some Other error'
ELSE [Desc1]
END AS GroupDescription
FROM myTable
The problem with this approach is that you have to repeat the CASE expression every time you want to do something similar.
The better option. (but not perfect!)
Assuming you do not already have such a table...
Create a table that contains the grouping. You can use this in the future whenever you need to do this kind of thing.
For example.
CREATE TABLE dbo.MyErrorGroupTable (RET_CODE varchar(10), GroupDescription varchar(50))
INSERT INTO dbo.MyErrorGroupTable VALUES
('.', 'Quality Error'),
(')', 'Quality Error'),
('X', 'Some Other Error'),
('Y', 'Some Other Error'),
('Z', 'Some Other Error'),
('P', 'UPS Error'),
('A', 'Pack/Pick Error')
etc.
Then you can simply join to this table and use the GroupDescription in your report
e.g.
SELECT
a.*, b.GroupDescription
FROM myTable a
JOIN MyErrorGroupTable b ON a.RET_CODE = b.RET_CODE
Hope this helps...
You're looking for the GROUP BY clause.
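Putting the two answers together, the lookup table can be joined and then grouped to get one count per return type. The sketch below uses Python's built-in sqlite3 module purely so that it runs anywhere; the table names, the item column and the sample rows are assumptions, not the asker's real schema.

```python
import sqlite3


def count_returns_by_group(returns, code_groups):
    """Join returns to the code-to-group lookup and count rows per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE returns (item TEXT, ret_code TEXT)")
    conn.execute(
        "CREATE TABLE error_groups (ret_code TEXT PRIMARY KEY, group_description TEXT)"
    )
    conn.executemany("INSERT INTO returns VALUES (?, ?)", returns)
    conn.executemany("INSERT INTO error_groups VALUES (?, ?)", code_groups)
    rows = conn.execute(
        """
        SELECT g.group_description, COUNT(*) AS return_count
        FROM returns r
        JOIN error_groups g ON r.ret_code = g.ret_code
        GROUP BY g.group_description
        ORDER BY g.group_description
        """
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sample data: four returned items and a small lookup table.
sample_returns = [("item1", "."), ("item2", ")"), ("item3", "X"), ("item4", "P")]
sample_groups = [
    (".", "Quality Error"),
    (")", "Quality Error"),
    ("X", "Some Other Error"),
    ("P", "UPS Error"),
]

counts = count_returns_by_group(sample_returns, sample_groups)
print(counts)
```

Because this is an inner join, a return code that is missing from the lookup table silently drops out of the counts; a LEFT JOIN with a default group would keep such rows visible.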

How could I delete sql functions format in awk?

I've got a sql query which looks like this:
SELECT test1(func1(MYFIELD)),
test2(MAX(MYFIELD), LOWER("NOPE")),
test3(MAX(MYFIELD), 1234),
AVG(test1(test2(MYFIELD, func1(4)))),
func2(UPPER("stack"))
SUBSTR(MYFIELD, 2, 4),
test2(MIN(MYFIELD), SUBSTR(LOWER(UPPER("NOPE")), 1, 7)),
SUBSTR('func1(', 2, 4)
FROM MYTABLE;
Then I'm trying to remove all functions called:
test1
test2
test3
func1
func2
But preserving the AVG, MAX, UPPER, SUBSTR... and all native functions.
So the desired output would be:
SELECT MYFIELD,
MAX(MYFIELD),
MAX(MYFIELD),
AVG(MYFIELD),
UPPER("stack")
SUBSTR(MYFIELD, 2, 4),
MIN(MYFIELD)
SUBSTR('func1(', 2, 4)
FROM MYTABLE;
I want to remove the LOWER on the second line because it is an argument of one of the functions to delete (test2, which takes two parameters). When a function is deleted, the rest of its parameters should be deleted as well.
I've tried doing it this way in awk:
{
print gensub(/(test1|test2|func1|func2)\(/,"","gi", $0);
}
But the output doesn't take the closing parentheses into account, and it doesn't delete the remaining parameters of the custom functions either:
SELECT MYFIELD)),
MAX(MYFIELD), LOWER("NOPE")),
MAX(MYFIELD), 1234),
AVG(MYFIELD, 4)))),
UPPER("stack"))
SUBSTR(MYFIELD, 2, 4),
MIN(MYFIELD), SUBSTR(LOWER(UPPER("NOPE")), 1, 7)),
SUBSTR('', 2, 4)
FROM MYTABLE;
Any idea or clue to handle this situation?
You could just rename the custom functions to the built-in function COALESCE while keeping the brackets ( ) and the other parameters of the user functions.
The result is not syntactically the same, but it will behave the same as long as the first argument of each renamed call is never NULL, because COALESCE returns its first non-NULL argument. It is much easier to achieve because you don't have to worry about brackets.
If file is the input you provide, then:
sed 's#\(test1\|test2\|test3\|func1\|func2\)(#COALESCE(#g' file
will produce:
SELECT COALESCE(COALESCE(MYFIELD)),
COALESCE(MAX(MYFIELD), 4),
AVG(COALESCE(COALESCE(MYFIELD, COALESCE(4)))),
COALESCE(UPPER("stack"))
FROM MYTABLE;
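If the calls really have to be removed rather than renamed, a regular expression alone cannot do it, because the parentheses must be matched. A minimal Python sketch of one way to do that follows. It assumes that each custom call should be replaced by its first argument (which is what the desired output in the question suggests) and that quoted literals contain no escaped quotes; CUSTOM, IDENT and the function names are made up for the example.

```python
import re

# Functions to strip, taken from the question; matching is case-insensitive.
CUSTOM = {"test1", "test2", "test3", "func1", "func2"}
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _skip_string(s, i):
    """Return the index just past the quoted literal that starts at s[i]."""
    quote = s[i]
    j = i + 1
    while j < len(s) and s[j] != quote:
        j += 1
    return min(j + 1, len(s))


def _split_call(s, open_idx):
    """Split the call whose '(' is at open_idx into top-level arguments.

    Returns (args, index just past the matching ')').
    """
    depth = 0
    args = []
    start = open_idx + 1
    i = open_idx
    while i < len(s):
        c = s[i]
        if c in "'\"":
            i = _skip_string(s, i)
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                args.append(s[start:i])
                return args, i + 1
        elif c == "," and depth == 1:
            args.append(s[start:i])
            start = i + 1
        i += 1
    raise ValueError("unbalanced parentheses")


def strip_custom_calls(s, names=CUSTOM):
    """Replace every call to a function in names by its first argument."""
    out = []
    i = 0
    while i < len(s):
        c = s[i]
        if c in "'\"":
            # Copy string literals untouched, so 'func1(' is left alone.
            j = _skip_string(s, i)
            out.append(s[i:j])
            i = j
            continue
        m = IDENT.match(s, i)
        if m:
            word, j = m.group(0), m.end()
            if word.lower() in names and j < len(s) and s[j] == "(":
                args, end = _split_call(s, j)
                # Keep only the first argument; it may contain custom calls too.
                out.append(strip_custom_calls(args[0].strip(), names))
                i = end
            else:
                out.append(word)
                i = j
            continue
        out.append(c)
        i += 1
    return "".join(out)


print(strip_custom_calls('test2(MAX(MYFIELD), LOWER("NOPE")),'))
```

Unlike the sed substitution, this leaves the text inside string literals alone and removes the extra arguments together with the function name.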

Set alias for column values

Is it possible to set an alias for column values, the way we set one for a column header in SQL Server?
Or is there any other way to convert my column values to a readable format for clients?
I have the following system-generated values:
BILL_DETAILS
BILLING_MENU
ComplaintNumberInput
CUSTOMER_ACCOUNT_NUMBER_INPUT
DEFAULTER
FAULTS_SHUTDOWN_MENU
KUNDA_CONNECTION
LOAD_SHEDDING_MENU
LOAD_SHEDDING_SCHEDULED
loadSheddingScheduleReplayer
loadSheddingStatus
loadSheddingStatusReplayer
MENU_CONTEXT_EVAL
POWER_COMPLAINTS_MENU
repaetComplaintStatus
Is it possible to change them into the following:
BILL DETAILS
BILLING MENU
COMPLAINT NUMBER INPUT
CUSTOMER ACCOUNT NUMBER INPUT
DEFAULTER
FAULTS SHUTDOWN MENU
KUNDA CONNECTION
LOAD SHEDDING MENU
LOAD SHEDDING SCHEDULED
LOAD SHEDDING SCHEDULE REPLAYER
LOAD SHEDDING STATUS
LOAD SHEDDING STATUS REPLAYER
MENU CONTEXT EVAL
POWER COMPLAINTS MENU
REPEAT COMPLAINT STATUS
In SQL, an alias is a different name for a database object. Values do not fall under this category, so it's impossible to alias them. You can, however, format the output of your query, though formatting is usually best done in the presentation layer and not in the data layer.
Having said that, there is a T-SQL solution for your question:
SELECT REPLACE(ColumnName, '_', ' ') AS ColumnName
FROM TableName
This will convert all underscores to spaces.
To handle the other format, you can thank Jeff Moden for solving that problem as well (see this link).
SELECT COALESCE(STUFF(ColumnName, NULLIF(patindex('%[a-z][A-Z]%', ColumnName COLLATE Latin1_General_BIN), 0) + 1, 0, ' '), ColumnName) AS ColumnName
FROM TableName
So combining the 2 solutions, your final SQL should be something like this:
SELECT REPLACE(COALESCE(STUFF(ColumnName, NULLIF(patindex('%[a-z][A-Z]%', ColumnName COLLATE Latin1_General_BIN), 0) + 1, 0, ' '), ColumnName), '_', ' ') AS ColumnName
FROM TableName
This way you can handle these 2 formats in pure T-SQL without having to change your query whenever a new value is added to the table. Note that patindex returns only the first lower-to-upper boundary, so a value with several camel-case words gets a single space inserted.
Here is a test case with the values you posted:
DECLARE @t TABLE (Col VARCHAR(40))
INSERT INTO @t VALUES
('BILL_DETAILS'),
('BILLING_MENU'),
('ComplaintNumberInput'),
('CUSTOMER_ACCOUNT_NUMBER_INPUT'),
('DEFAULTER'),
('FAULTS_SHUTDOWN_MENU'),
('KUNDA_CONNECTION'),
('LOAD_SHEDDING_MENU'),
('LOAD_SHEDDING_SCHEDULED'),
('loadSheddingScheduleReplayer'),
('loadSheddingStatus'),
('loadSheddingStatusReplayer'),
('MENU_CONTEXT_EVAL'),
('POWER_COMPLAINTS_MENU'),
('repaetComplaintStatus')
SELECT Col
,UPPER(REPLACE(COALESCE(STUFF(Col, NULLIF(patindex('%[a-z][A-Z]%', Col COLLATE Latin1_General_BIN), 0) + 1, 0, ' '), Col), '_', ' ')) AS NewCol
FROM @t
Results:
Col                            NewCol
BILL_DETAILS                   BILL DETAILS
BILLING_MENU                   BILLING MENU
ComplaintNumberInput           COMPLAINT NUMBERINPUT
CUSTOMER_ACCOUNT_NUMBER_INPUT  CUSTOMER ACCOUNT NUMBER INPUT
DEFAULTER                      DEFAULTER
FAULTS_SHUTDOWN_MENU           FAULTS SHUTDOWN MENU
KUNDA_CONNECTION               KUNDA CONNECTION
LOAD_SHEDDING_MENU             LOAD SHEDDING MENU
LOAD_SHEDDING_SCHEDULED        LOAD SHEDDING SCHEDULED
loadSheddingScheduleReplayer   LOAD SHEDDINGSCHEDULEREPLAYER
loadSheddingStatus             LOAD SHEDDINGSTATUS
loadSheddingStatusReplayer     LOAD SHEDDINGSTATUSREPLAYER
MENU_CONTEXT_EVAL              MENU CONTEXT EVAL
POWER_COMPLAINTS_MENU          POWER COMPLAINTS MENU
repaetComplaintStatus          REPAET COMPLAINTSTATUS
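The results above split each camel-case value only once. If the formatting is moved to the presentation layer, every boundary can be split in one pass. A minimal Python sketch, with CAMEL_BOUNDARY and humanize as made-up names:

```python
import re

# Zero-width position between a lowercase letter and the uppercase letter after it.
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def humanize(value):
    """Insert a space at every camel-case boundary, swap '_' for ' ', upper-case."""
    spaced = CAMEL_BOUNDARY.sub(" ", value)
    return spaced.replace("_", " ").upper()


for raw in ("BILL_DETAILS", "ComplaintNumberInput", "loadSheddingScheduleReplayer"):
    print(raw, "->", humanize(raw))
```

This only reformats the text. A misspelled source value such as repaetComplaintStatus stays misspelled, so an explicit mapping is still needed for corrections like that.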
Use case statements for each value, like:
case old_column_name
when 'LOAD_SHEDDING_MENU'
then 'LOAD SHEDDING MENU'
when 'loadSheddingScheduleReplayer'
then 'LOAD SHEDDING SCHEDULE REPLAYER'
when ...........
then ...........
end as column_name

SQL server query on json string for stats

I have this SQL Server database that holds contest participations. In the Participation table, I have various fields and a special one called ParticipationDetails. It's a varchar(MAX). This field is used to throw in all contest specific data in json format. Example rows:
Id,ParticipationDetails
1,"{'Phone evening': '6546546541', 'Store': 'StoreABC', 'Math': '2', 'Age': '01/01/1951'}"
2,"{'Phone evening': '6546546542', 'Store': 'StoreABC', 'Math': '2', 'Age': '01/01/1952'}"
3,"{'Phone evening': '6546546543', 'Store': 'StoreXYZ', 'Math': '2', 'Age': '01/01/1953'}"
4,"{'Phone evening': '6546546544', 'Store': 'StoreABC', 'Math': '3', 'Age': '01/01/1954'}"
I'm trying to get a query running that will yield this result:
Store, Count
StoreABC, 3
StoreXYZ, 1
I used to run this query:
SELECT TOP (20) ParticipationDetails, COUNT(*) Count FROM Participation GROUP BY ParticipationDetails ORDER BY Count DESC
This works as long as I want unique ParticipationDetails. How can I change this to "sub-query" into my JSON strings? I've gotten to this query, but I'm kind of stuck here:
SELECT 'StoreABC' Store, Count(*) Count FROM Participation WHERE ParticipationDetails LIKE '%StoreABC%'
This query gets me the results I want for a specific store, but I want the store value to be "anything that was put in there".
Thanks for the help!
First of all, I suggest avoiding any JSON handling in T-SQL, since it is not natively supported (built-in JSON functions only arrived with SQL Server 2016). If you have an application layer, let it manage this kind of formatted data (the .NET Framework and non-MS frameworks have JSON serializers available).
However, you can convert your JSON strings using the function described in this link.
You can also write your own query that works on the strings directly. Something like the following one:
SELECT
T.Store,
COUNT(*) AS [Count]
FROM
(
SELECT
STUFF(
STUFF(ParticipationDetails, 1, CHARINDEX('"Store"', ParticipationDetails) + 9, ''),
CHARINDEX('"Math"',
STUFF(ParticipationDetails, 1, CHARINDEX('"Store"', ParticipationDetails) + 9, '')) - 3, LEN(STUFF(ParticipationDetails, 1, CHARINDEX('"Store"', ParticipationDetails) + 9, '')), '')
AS Store
FROM
Participation
) AS T
GROUP BY
T.Store
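If the counting is moved to the application layer, as suggested at the start of this answer, it becomes a few lines with a real JSON parser. The Python sketch below assumes the column holds valid JSON with double-quoted keys and values (the query above also searches for "Store" in double quotes); the sample rows mirror the question and count_by_store is a made-up name.

```python
import json
from collections import Counter

# Assumption: ParticipationDetails contains valid, double-quoted JSON.
participation_details = [
    '{"Phone evening": "6546546541", "Store": "StoreABC", "Math": "2", "Age": "01/01/1951"}',
    '{"Phone evening": "6546546542", "Store": "StoreABC", "Math": "2", "Age": "01/01/1952"}',
    '{"Phone evening": "6546546543", "Store": "StoreXYZ", "Math": "2", "Age": "01/01/1953"}',
    '{"Phone evening": "6546546544", "Store": "StoreABC", "Math": "3", "Age": "01/01/1954"}',
]


def count_by_store(json_rows):
    """Parse each JSON document and count rows per 'Store' value."""
    stores = (json.loads(row).get("Store") for row in json_rows)
    return Counter(store for store in stores if store is not None)


store_counts = count_by_store(participation_details)
print(store_counts.most_common())
```

Unlike the string-slicing query, this does not depend on the "Math" key following "Store" or on the exact spacing inside the JSON text.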