I'm trying to clean some varchar data in Snowflake and am having trouble with this column. I'd like every value where information is missing to display as null, rather than e.g. 'Unknown'. The data looks like this:
+---------------------------+
| Entity |
+---------------------------+
| Walgreens |
+---------------------------+
| Apple |
+---------------------------+
| Microsoft |
+---------------------------+
| 2018 Quora hack |
+---------------------------+
| Unknown government agency |
+---------------------------+
| Unknown |
+---------------------------+
I'd like to standardise it, either by changing the original column or by adding a revised one, so it looks like this:
+-----------+
| Entity |
+-----------+
| Walgreens |
+-----------+
| Apple |
+-----------+
| Microsoft |
+-----------+
| Quora |
+-----------+
| null |
+-----------+
| null |
+-----------+
Here's what I've tried so far. The plan was to find something that would work for the 'Unknown' bits of data and then apply it to more specific cases like simplifying the '2018 Quora hack.'
1
select *
from data_breaches
order by case when "Entity" like '%nknown%'
then NULL else "Entity" end
This returned the data, but it only moved the entities containing 'Unknown' to the end of the result; it didn't change them to null.
2
select "Sector", "Records Number", "Method"
if "Entity" IN('Unknown'), NULL, "Entity") as Enclean
from data_breaches
Returned this error: Syntax error: unexpected '"Entity"'. (line 2)
I think maybe Snowflake doesn't support this syntax?
3
select "Year", "Records", "Organization type","Method"
iff("Entity" like '%nknown', NULL,"Entity")
from data_breaches
Returned this error: Syntax error: unexpected '('. (line 2)
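Use a CASE expression with ILIKE to null out any value that contains 'unknown', whatever its case. A CASE with no ELSE branch returns NULL for rows that fail the test. Attempt 3 appears to fail only because the comma after "Method" is missing; IFF itself is supported by Snowflake. Since the column was created as "Entity" (quoted, so case-sensitive), keep the quotes:
SELECT CASE WHEN "Entity" NOT ILIKE '%unknown%' THEN "Entity" END AS "Entity"
FROM data_breaches;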
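To try the idea without a Snowflake account, here is a minimal sketch that runs the same kind of CASE expression against an in-memory SQLite table. SQLite is only a stand-in (an assumption for the sake of a runnable example); it has no ILIKE, so the sketch uses lower() with LIKE instead, and the alias entity_clean is made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for Snowflake here (an assumption made only so
# the example runs anywhere); the CASE expression is the part that carries over.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE data_breaches ("Entity" TEXT)')
conn.executemany(
    "INSERT INTO data_breaches VALUES (?)",
    [("Walgreens",), ("Apple",), ("Microsoft",),
     ("Unknown government agency",), ("Unknown",)],
)

# SQLite has no ILIKE, so lower() + LIKE gives the case-insensitive match.
# A CASE with no ELSE branch yields NULL for every row that fails the test.
query = """
    SELECT CASE WHEN lower("Entity") NOT LIKE '%unknown%'
                THEN "Entity" END AS entity_clean
    FROM data_breaches
    ORDER BY rowid
"""
cleaned = [row[0] for row in conn.execute(query)]
print(cleaned)  # ['Walgreens', 'Apple', 'Microsoft', None, None]
```

Simplifying a value such as '2018 Quora hack' to 'Quora' is a separate step that this sketch does not attempt.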
I'm using Spark and I found that my data is not being correctly interpreted. I've tried using decode and encode built-in functions but they can be applied only to one column at a time.
Update:
An example of the behaviour I am having:
+-----------+
| Pa�s |
+-----------+
| Espa�a |
+-----------+
And the one I'm expecting:
+-----------+
| País |
+-----------+
| España |
+-----------+
The statement is just a simple
SELECT * FROM table
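One possible cause of output like this, sketched in plain Python rather than Spark: the stored bytes are in a single-byte encoding such as Latin-1 but are being decoded as UTF-8. That is an assumption about the data, not something the question confirms. The variable names are illustrative.

```python
# Assumption: the source bytes are Latin-1 (ISO-8859-1) but get decoded as UTF-8.
original = "España"
raw_bytes = original.encode("latin-1")  # the ñ becomes the single byte 0xF1

# 0xF1 is not a valid standalone UTF-8 sequence, so a lenient decoder
# substitutes U+FFFD, the replacement character shown in the output above.
misread = raw_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were actually written in restores the text.
correct = raw_bytes.decode("latin-1")

print(misread)
print(correct)
```

If this is what is happening, the fix belongs where the data is read rather than in per-column decode calls; for file-backed tables, the reader's encoding setting is the usual place to look.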
I am trying to visualise data from TimescaleDB in Grafana with the following query:
SELECT $__timeGroup(timestamp,'30m'), sum(error) as Error
FROM userCounts
WHERE serviceid IN ($Service) AND ciclusterid IN ($CiClusterId)
AND environment IN ($environment) AND filterid IN ($filterId)
AND $__timeFilter("timestamp")
GROUP BY timestamp;
However, it gives an error and no data shows when I add the filterid IN ($filterId) part.
I have checked the variable names a thousand times but can't see what the error is. Logically, if the variable filters work in the other conditions, they should work here too. Can anyone give input?
Edit:
The schema is like
timestamp   | timestamp without time zone |  | not null |
measurement | character varying(150)      |  |          |
filterid    | character varying(150)      |  |          |
environment | character varying(150)      |  |          |
iscanary    | boolean                     |  |          |
servicename | character varying(150)      |  |          |
serviceid   | character varying(150)      |  |          |
ciclusterid | character varying(150)      |  |
--more--
In Grafana, it gives the error:
pq: column "in_orgs_that_have_had_an_operational_connector" does not exist
The error appears when filterId = IN_ORGS_THAT_HAVE_HAD_AN_OPERATIONAL_CONNECTOR is selected. That is a value, not a column, so I'm not sure why the message refers to a column. The message also shows it in lower case, although the value is in upper case.
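The error message is consistent with the variable being substituted into the SQL without quotes, so that the database parses the bare word as a column name (PostgreSQL folds unquoted identifiers to lower case, which would explain the casing). Whether Grafana substitutes the value this way in this dashboard is an assumption. The sketch below reproduces the mechanism with SQLite and a shortened, hypothetical value IN_ORGS; the table is cut down to two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userCounts (filterid TEXT, error INTEGER)")
conn.execute("INSERT INTO userCounts VALUES ('IN_ORGS', 3)")  # hypothetical value

# Plain string templating, standing in for dashboard variable substitution.
template = "SELECT sum(error) FROM userCounts WHERE filterid IN ({})"

# Raw substitution: the bare word is parsed as a column name and fails.
try:
    conn.execute(template.format("IN_ORGS"))
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

# Quoted substitution: the same text is now a string literal and matches.
quoted_total = conn.execute(template.format("'IN_ORGS'")).fetchone()[0]

print(unquoted_error)
print(quoted_total)
```

If that is the cause here, making sure the variable reaches the query as a quoted string (for example through one of Grafana's variable format options, if your version supports them) would be the thing to check. The other variables may work simply because their values are numeric or already quoted.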
I searched and found that I could use either substr with || or a printf statement with format specifiers to add padding to the results, but neither seems to work when I have DISTINCT in the SQLite query.
I've a table called timeLapse that looks like so:
+----+-------+-----------+
| ID | Time | Status |
+----+-------+-----------+
| 1 | 0.001 | Initiated |
| 1 | 0.002 | Cranked |
| 3 | 0.002 | Initiated |
| 2 | 0.002 | Initiated |
| 2 | 0.003 | Cranked |
+----+-------+-----------+
I could query the distinct IDs with something like SELECT DISTINCT(ID) AS IDs FROM timeLapse, which returns this:
+-----+
| IDs |
+-----+
| 1 |
| 2 |
| 3 |
+-----+
However, I would like to pad the resultant distinct rows like so:
+----------+
| IDs |
+----------+
| Object-1 |
| Object-2 |
| Object-3 |
+----------+
My query SELECT substr('Object-' || DISTINCT(ID), 10, 10) as IDs FROM timeLapse results in an error:
"[17:22:47] Error while executing SQL query on database 'machining': near "distinct": syntax error"
Could someone please help me understand what I am doing wrong here? I am enormously thankful for your time and help.
Get the distinct IDs first, in a subquery, and only then apply substr():
select substr('Object-' || t1.ID, 1, 10) as IDs
from (SELECT DISTINCT(ID) ID FROM timeLapse) t1
see sqlfiddle
All credits to the user named ϻᴇᴛᴀʟ, as I only understood from their answer that I should have a sub-query within this query where the DISTINCT should go into.
This resolves my problem:
select printf('Object-%s', t1.ID) as IDs
FROM (SELECT DISTINCT(id) ID FROM timeLapse) t1
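The subquery approach can be checked end to end with Python's built-in sqlite3 module. This is a minimal sketch using the sample rows from the question; it uses plain || concatenation instead of printf, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timeLapse (ID INTEGER, Time REAL, Status TEXT)")
conn.executemany(
    "INSERT INTO timeLapse VALUES (?, ?, ?)",
    [(1, 0.001, "Initiated"), (1, 0.002, "Cranked"), (3, 0.002, "Initiated"),
     (2, 0.002, "Initiated"), (2, 0.003, "Cranked")],
)

# DISTINCT is a modifier of SELECT, not a function, so it cannot appear in the
# middle of an expression. Deduplicate in a subquery, then build the label.
query = """
    SELECT 'Object-' || t1.ID AS IDs
    FROM (SELECT DISTINCT ID FROM timeLapse) AS t1
    ORDER BY t1.ID
"""
labels = [row[0] for row in conn.execute(query)]
print(labels)  # ['Object-1', 'Object-2', 'Object-3']
```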
I have an Oracle table which is being loaded by a function - whenever it finds "LOW_MEMORY" in best_status, it will add the systimestamp in low_mem_timestamp column.
+----------+-------------------+-------+-------------------------------+
| device_id| best_status | job_id| low_mem_timestamp |
+----------+-------------------+-------+-------------------------------+
| 715016 | OPERATION_FAILURE | 511008|(null) |
| 715009 | LOW_MEMORY | 511008|10-MAY-17 11.13.22.143122000 AM|
| 715014 | DOWNLOAD_COMPLETE | 740004|(null) |
| 941015 | LOW_MEMORY | 740004|10-MAY-17 11.13.22.143122000 AM|
+----------+-------------------+-------+-------------------------------+
After this, I have another table in which I want to record the changes from the table above.
Whenever low_mem_timestamp changes for any device_id:
if it had a timestamp and has now been updated to null, it should record "1"
if it had a null value and has now been updated to a timestamp, it should record "0"
Output table:
Condition:
device_id='715009': best_status moved from "LOW_MEMORY" to "UPDATE_DEFERRED", so low_mem_timestamp was updated to null, and low_mem_toggle should be "1"
device_id='715014': best_status moved from "DOWNLOAD_COMPLETE" to "LOW_MEMORY", so low_mem_timestamp was updated to some timestamp (any timestamp), and low_mem_toggle should be "0"
device_id='941015': best_status remains the same and is not updated, so low_mem_toggle should be "NA"
The final output table should then look like this:
+----------+-------------------+-------+---------------+
| device_id| best_status | job_id| low_mem_toggle|
+----------+-------------------+-------+---------------+
| 715009 | UPDATE_DEFERRED | 511008|1 |
| 715014 | LOW_MEMORY | 740004|0 |
| 941015 | LOW_MEMORY | 740004|NA |
+----------+-------------------+-------+---------------+
Please suggest a sql query to implement this functionality.
Thanks in advance.
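The three rules above reduce to a comparison of the previous and current low_mem_timestamp for each device. Below is a minimal sketch of that comparison in Python, not an Oracle query; the function name and the idea that the previous value is available (for instance from a snapshot or a trigger) are assumptions. A timestamp that changes to a different timestamp is not covered by the rules, so the sketch treats it as "NA".

```python
from typing import Optional


def low_mem_toggle(old_ts: Optional[str], new_ts: Optional[str]) -> str:
    """Map a before/after pair of low_mem_timestamp values to the toggle value."""
    if old_ts is not None and new_ts is None:
        return "1"   # had a timestamp, now null
    if old_ts is None and new_ts is not None:
        return "0"   # was null, now has a timestamp
    return "NA"      # unchanged (or timestamp -> timestamp, an assumption)


ts = "10-MAY-17 11.13.22.143122000 AM"
print(low_mem_toggle(ts, None))   # device 715009
print(low_mem_toggle(None, ts))   # device 715014
print(low_mem_toggle(ts, ts))     # device 941015
```

In SQL the same branching would typically be a CASE expression over the old and new values.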
I have the following data:
| ID | TYPE | USER_ID |
|----------|----------|----------|
| 1 | A | 7 |
| 1 | A | 8 |
| 1 | B | 6 |
| 2 | A | 9 |
| 2 | B | 5 |
I'm trying to create a query to return
| ID | RESULT |
|----------|----------|
| 1 | 7, 8, 6 |
| 2 | 9, 5 |
The USER_ID values must be ordered by the TYPE attribute.
Since I'm using MS ACCESS, I'm trying to pivot. What I've tried:
TRANSFORM first(user_id)
SELECT id, type
FROM mytable
GROUP BY id, type
ORDER BY type
PIVOT user_id
Error:
Too many crosstab column headers (4547).
I'm probably missing something in the syntax. The approach itself also seems wrong, since the first() aggregate would need to be replaced by something that concatenates the results.
PS: I'm using MS-ACCESS 2007. If you know a solution for SQL-Server or Oracle using only SQL (without vendor functions or stored procedures), I'll probably accept your answer since it will help me to find a solution for this problem.
You don't want to use PIVOT. Pivot will create a column named after each of your user IDs (1 - 7). Your TYPE field doesn't seem to do anything either.
Unfortunately, doing this in SQL Server requires a construct (FOR XML PATH) that's not available in Access.
Here's a link to an Access function that does something similar.
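The underlying operation is "order the rows by ID and TYPE, then join the USER_ID values within each ID". Here is a minimal sketch of that logic using Python and SQLite rather than Access (a swapped-in environment, chosen only so the example runs anywhere). Breaking ties within the same TYPE by user_id is an assumption, since the question does not say how they should be ordered.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, type TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A", 7), (1, "A", 8), (1, "B", 6), (2, "A", 9), (2, "B", 5)],
)

# Let the database do the ordering; ties within a type fall back to user_id.
rows = conn.execute(
    "SELECT id, user_id FROM mytable ORDER BY id, type, user_id"
).fetchall()

# groupby relies on the rows already being sorted by id.
result = {
    key: ", ".join(str(user_id) for _, user_id in group)
    for key, group in groupby(rows, key=lambda row: row[0])
}
print(result)  # {1: '7, 8, 6', 2: '9, 5'}
```

An Access helper function of the kind linked above would do the same loop in VBA, called once per ID.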