Selecting NA values from sql file in R

I have sql tables being read in R with the following code
library(RSQLite)
setwd("C:/Users/Cat/Downloads")
drv = dbDriver("SQLite")
# Use the driver to connect to a particular sqlite database file
con = dbConnect(drv, "cartype")
dbListTables(con)
The table Sale has columns named ID and credit. Some credit values are missing, and I can select the IDs whose credit is not missing with the following code.
wow = dbGetQuery(con, 'SELECT DISTINCT ID FROM Sale WHERE
credit IS NOT \"NA\";')
The question is: how can I select the IDs whose credit is NA? I tried the code
wow = dbGetQuery(con, 'SELECT DISTINCT ID FROM Sale WHERE
credit IS \"NA\";')
OR
wow = dbGetQuery(con, 'SELECT DISTINCT ID FROM Sale WHERE
credit == \"NA\";')
The code runs but gives an incorrect result: 0 rows match the condition, while more than 100 IDs should have NA credit.
Can anyone help me out and show me how I can get the IDs with NA credit?
Thanks!

I think you're coming up against a misapprehension about how missing values are stored in databases. They are usually stored as NULL, not as NA.
Your first statement appears to work because it compares a column of some type (e.g. date, int, varchar) against the string "NA". None of your stored values equals "NA", so every non-missing record passes the test. One caveat: in SQLite, IS NOT treats NULL as just another value that differs from "NA", so that statement returns the NULL records as well. An ordinary comparison such as != would exclude them, because any comparison with NULL yields NULL rather than true.
Your second and third statements return 0 records because they again compare against the string "NA": NULL never matches it, and no stored value equals it either. To select the missing records, test for NULL directly with credit IS NULL.
For SQLite, there is a great page on how NULLs are handled which might help you out with more detail on this topic: http://sqlite.org/nulls.html
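A minimal sketch of the behaviour described above, using Python's built-in sqlite3 module rather than RSQLite so that it runs without the original database. The in-memory Sale table and its rows are made-up sample data, and credit is assumed to be a numeric column. The same SQL strings could be passed to dbGetQuery in R.

```python
import sqlite3

# Hypothetical stand-in for the asker's database: missing credit is stored as NULL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sale (ID INTEGER, credit REAL)")
con.executemany(
    "INSERT INTO Sale VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 25.5), (4, None), (4, None)],
)


def ids(where):
    """Return the distinct IDs matching a WHERE condition, in ascending order."""
    sql = f"SELECT DISTINCT ID FROM Sale WHERE {where} ORDER BY ID"
    return [row[0] for row in con.execute(sql)]


equals_na = ids("credit = 'NA'")     # compares against the string 'NA': matches nothing
missing = ids("credit IS NULL")      # the test that actually finds missing values
present = ids("credit IS NOT NULL")  # and its complement

print(equals_na, missing, present)
```

Single quotes are used for the string literal here because SQLite reads double-quoted text as an identifier first and only falls back to a string when no such column exists.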

Related

Multiply values of two variables according to id

I have a problem with the multiplication of two variables.
I have two SQL statements: one gets the number of hours that each employee has worked during a specific month, and the other gets each employee's hourly cost.
They query two different tables in the database.
What I would like is to multiply each employee's hours by that employee's hourly cost, and then add up the results.
Both statements work fine: if I call print_r, I see the results for each employee.
I guess I have to do it through the ID, but I don't know how to do it.
This is what I have so far, and I only get it to multiply the first person it finds.
$SumaHoras = $DB->Sql ("SELECT Id_Persona, CONCAT(SUM(Horas)* 3600)/36000000 as Hora FROM table1");
$SumaCosteHora = $DB->Sql("SELECT Id_CostPers, CONCAT(Coste_Hora) as Coste FROM table2");
$suma1 = [];
while($totalSumaHoras = $DB->Row($SumaHoras)){
$suma1 = $totalSumaHoras;
print_r($suma1);
}
$suma2 = [];
while($totalSumaCoste = $DB->Row($SumaCosteHora)){
$suma2 = $totalSumaCoste;
print_r($suma2);
}
$total4 = $suma1->Hora * $suma2->Coste;
I don't know if I explained myself well, but I appreciate any help you can offer me.

Your first line of code does not work; it generates the error "The concat function requires 2 to 254 arguments." If you remove the CONCAT, you then get "Column 'table1.Id_Persona' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.", as pointed out by @isamu.
You should not use CONCAT for numbers, since it returns a string. Always use the correct type for your data.
To get all the information you require at once, you need to use a JOIN.
E.g.
SELECT t1.Id_Persona, t2.Coste_Hora * (SUM(t1.Horas) * 3600.0)/36000000.0
FROM table1 t1
INNER JOIN table2 t2 ON t1.Id_Persona = t2.Id_CostPers
GROUP BY t1.Id_Persona, t2.Coste_Hora
I'm not sure why you are using that particular calculation, but note that I have appended .0 to the integer constants to ensure that decimal arithmetic takes place. To see what I mean, try running these two lines of SQL:
SELECT 10 * 3600/36000000;
SELECT 10 * 3600.0/36000000.0;
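The two SELECTs and the JOIN can be tried end to end with the sketch below. It uses Python's sqlite3 module as a stand-in, since the asker's database is not identified in the thread. The table and column names come from the question, but the rows and the column types are invented sample data. The final sum is the "add the results" step the question asks for.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Integer operands give integer division; a decimal point forces real arithmetic.
int_result = con.execute("SELECT 10 * 3600/36000000").fetchone()[0]
dec_result = con.execute("SELECT 10 * 3600.0/36000000.0").fetchone()[0]

# Hypothetical sample data for the two tables named in the question.
con.executescript("""
CREATE TABLE table1 (Id_Persona INTEGER, Horas INTEGER);
CREATE TABLE table2 (Id_CostPers INTEGER, Coste_Hora REAL);
INSERT INTO table1 VALUES (1, 1000), (1, 2000), (2, 5000);
INSERT INTO table2 VALUES (1, 20.0), (2, 30.0);
""")

# One row per employee: hourly cost times the converted sum of hours.
rows = con.execute("""
    SELECT t1.Id_Persona,
           t2.Coste_Hora * (SUM(t1.Horas) * 3600.0) / 36000000.0
    FROM table1 t1
    INNER JOIN table2 t2 ON t1.Id_Persona = t2.Id_CostPers
    GROUP BY t1.Id_Persona, t2.Coste_Hora
    ORDER BY t1.Id_Persona
""").fetchall()

# Add up the per-employee results.
total = sum(cost for _, cost in rows)

print(int_result, dec_result, rows, total)
```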

Write SQL from SAS

I have this code in SAS, I'm trying to write SQL equivalent. I have no experience in SAS.
data Fulls Fulls_Dupes;
set Fulls;
by name, coeff, week;
if rid = 0 and ^last.week then output Fulls_Dupes;
else output Fulls;
run;
I tried the following, but it didn't produce the same output:
Select * from Fulls where rid = 0 groupby name,coeff,week
Is my SQL query correct?

SQL has no concept of observation order, so there is no direct equivalent of SAS's LAST. variables. If you have some variable that increases monotonically within the groups defined by the distinct values of name, coeff, and week, then you can find the LAST observation by selecting the one with the maximum value of that variable.
So, for example, if you also had a variable named DAY that uniquely identified and ordered the observations in the same way as they currently exist in the FULLS dataset, you could use the test DAY=MAX(DAY) to find the last observation. In PROC SQL you can use that test directly, because SAS automatically remerges the aggregate value back onto all of the detail observations. In other SQL implementations you might need an extra query to get the max.
create table new_FULLS as
select * from FULLS
group by name, coeff, week
having day=max(day) or rid ne 0
;
SQL also has no concept of writing two datasets at once. But in this example the two generated datasets are disjoint and together include all of the original observations, so you can generate the second from the first using EXCEPT.
So once you have built the new FULLS, you can get FULLS_DUPES from the new FULLS and the old FULLS.
create table FULLS_DUPES as
select * from FULLS
except
select * from new_FULLS
;
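Outside PROC SQL there is no automatic remerge, so the "extra query to get the max" might look like the sketch below, run here against SQLite through Python's sqlite3 module. The day column is the hypothetical ordering variable assumed in the answer, and the rows are invented sample data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE fulls (name TEXT, coeff REAL, week INTEGER, day INTEGER, rid INTEGER)"
)
con.executemany(
    "INSERT INTO fulls VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 1.0, 1, 1, 0),  # rid = 0 and not last in its group: goes to fulls_dupes
        ("a", 1.0, 1, 2, 0),  # last in its group: kept
        ("a", 1.0, 2, 1, 5),  # rid <> 0: kept
        ("a", 1.0, 2, 2, 0),  # last in its group: kept
        ("b", 2.0, 1, 1, 0),  # only row in its group, so it is last: kept
    ],
)

# Join each row to the maximum day of its (name, coeff, week) group.
con.execute("""
    CREATE TABLE new_fulls AS
    SELECT f.*
    FROM fulls f
    JOIN (SELECT name, coeff, week, MAX(day) AS last_day
          FROM fulls
          GROUP BY name, coeff, week) m
      ON f.name = m.name AND f.coeff = m.coeff AND f.week = m.week
    WHERE f.day = m.last_day OR f.rid <> 0
""")

# Everything in the original table that was not kept.
con.execute("""
    CREATE TABLE fulls_dupes AS
    SELECT * FROM fulls
    EXCEPT
    SELECT * FROM new_fulls
""")

cols = "name, week, day, rid"
kept = con.execute(f"SELECT {cols} FROM new_fulls ORDER BY name, week, day").fetchall()
dupes = con.execute(f"SELECT {cols} FROM fulls_dupes ORDER BY name, week, day").fetchall()
print(kept, dupes)
```

Note that EXCEPT also removes duplicate rows, so this only reproduces the SAS output when the rows of the original table are unique.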

Combine Rows but concatenate on a certain field in Excel Power Query or Microsoft SQL

I have brought a table from an Authority database into Excel via a Power Query ODBC connection. It includes fields like:
Date - various
Comments - mem_txt
Sequence - seq_num
The Comments field has a length restriction, and if a longer string is entered, it returns multiple rows with the Comments field being chopped into suitable lengths and the order returned in the Sequence field as per extract below. All other parts of the records are the same.
I want to collapse the rows and concatenate the various Comments into a single entry. There is a date/time column just outside of the screenshot above that can be used to group the rows (it is the same for each set of rows, but unique across the data set).
For example:
I did try bringing the data in through a query connection, using GROUP_CONCAT(Comments SEPARATOR ', ') and GROUP BY date, but that command isn't available in Microsoft Query.

Assuming the date/time column you refer to is named date_time, the M code would be:
let
    Source = Excel.CurrentWorkbook(){[Name = "Table1"]}[Content],
    // Concatenate the comment fragments of each date_time group
    #"Grouped Rows" = Table.Group(
        Source,
        {"date_time"},
        {{"NewCol", each Text.Combine([mem_txt])}}
    )
in
    #"Grouped Rows"
Amend the Source line as required.
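For readers who want to check the grouping logic outside Power Query, here is a rough Python equivalent. The rows are invented sample data, and the column names date_time, seq_num and mem_txt follow the question and the assumption above. Unlike the M code, this sketch sorts by seq_num explicitly before concatenating.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: one long comment split into two fragments, plus a short one.
rows = [
    {"date_time": "2020-01-01 09:00", "seq_num": 2, "mem_txt": "second part."},
    {"date_time": "2020-01-01 09:00", "seq_num": 1, "mem_txt": "First part, "},
    {"date_time": "2020-01-02 10:30", "seq_num": 1, "mem_txt": "Short comment."},
]

# groupby only merges adjacent rows, so sort by the group key and then the sequence.
rows.sort(key=itemgetter("date_time", "seq_num"))

# Join the fragments of each group in sequence order.
combined = {
    key: "".join(row["mem_txt"] for row in group)
    for key, group in groupby(rows, key=itemgetter("date_time"))
}

print(combined)
```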

How to count unique occurrences of a string in a table for separate records in APEX 5

I am trying to automatically count the unique occurrences of a string saved in the table. Currently I have a count of a string but only when a user selects the string and it gives every record the same count value.
For example
Below is an image of my current table:
From the image you can see that there is a Requirement column and a count column. I have got it to the point where, when the user selects a requirement record (each requirement record has a link), it inserts the requirement text into a requirement item called 'P33_REQUIREMENT' so the count has a value to compare to.
This is the SQL that I have at current:
SELECT (SELECT COUNT(*)
FROM DIA_ASSOCIATED_QMS_DOCUMENTS
WHERE REQUIREMENT = :P33_REQUIREMENT
group by REQUIREMENT
) AS COUNT,
DPD.DIA_SELECTED,
DPD.Q_NUMBER_SELECTED,
DPD.SECTION_SELECTED,
DPD.ASSIGNED_TO_PERSON,
DAQD.REFERENCE,
DAQD.REQUIREMENT,
DAQD.PROGRESS,
DAQD.ACTION_DUE_DATE,
DAQD.COMPLETION_DATE,
DAQD.DIA_REF,
DA.DIA,
DA.ORG_RISK_SCORE
FROM DIA_PROPOSED_DETAIL DPD,
DIA_ASSOCIATED_QMS_DOCUMENTS DAQD,
DIA_ASSESSMENTS DA
WHERE DPD.DIA_SELECTED = DAQD.DIA_REF
AND DPD.DIA_SELECTED = DA.DIA
This is the SQL used to make the table in the image.
The issue with this is that it gives every record the same count when the user selects a requirement value. I can partly fix this by also adding AND DIA_SELECTED = :P33_DIA to the WHERE clause of the count, DIA_SELECTED being the first column in the table and :P33_DIA being the item that stores the DIA ref number of the chosen record.
The output of this looks like:
As you can see, there is only one count. It still doesn't fix the entire issue, but it is a bit better.
So, to sum up: is there a way to have the count tally the occurrences individually and show them against the requirements that are the same? If there are three 'test' records, as in the images, there would be a '3' in the count column where requirement = 'test', and if there is one record with 'test the system' there would be a '1' in the count column.
Also, for more context, I won't know what the user will enter as the requirement, so I can't compare against predetermined strings.
I'm new to Stack Overflow; I hope I have explained enough and that it's not too confusing.

The following extract:
SELECT (SELECT COUNT(*)
FROM DIA_ASSOCIATED_QMS_DOCUMENTS
WHERE REQUIREMENT = :P33_REQUIREMENT group by REQUIREMENT ) AS COUNT
Could be replaced by
SELECT (SELECT COUNT(*)
FROM DIA_ASSOCIATED_QMS_DOCUMENTS
WHERE REQUIREMENT = DAQD.REQUIREMENT ) AS COUNT
which would give you, for each row, the number of records with an identical requirement.
I'm not completely certain this is what you are after, but if it isn't, it should give you some ideas on how to progress (or allow you to indicate where I failed to understand your request).
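The effect of the correlated subquery can be checked with a small sketch, run against SQLite through Python's sqlite3 module rather than Oracle. The table name and the REFERENCE and REQUIREMENT columns come from the question. The rows, the column types, the D2 alias and the REQ_COUNT alias are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE DIA_ASSOCIATED_QMS_DOCUMENTS (REFERENCE INTEGER, REQUIREMENT TEXT)"
)
con.executemany(
    "INSERT INTO DIA_ASSOCIATED_QMS_DOCUMENTS VALUES (?, ?)",
    [(1, "test"), (2, "test"), (3, "test"), (4, "test the system")],
)

# The inner query is re-evaluated for each outer row, using that row's REQUIREMENT.
rows = con.execute("""
    SELECT DAQD.REFERENCE,
           DAQD.REQUIREMENT,
           (SELECT COUNT(*)
            FROM DIA_ASSOCIATED_QMS_DOCUMENTS D2
            WHERE D2.REQUIREMENT = DAQD.REQUIREMENT) AS REQ_COUNT
    FROM DIA_ASSOCIATED_QMS_DOCUMENTS DAQD
    ORDER BY DAQD.REFERENCE
""").fetchall()

for row in rows:
    print(row)
```

Because the comparison uses the outer row's own value instead of the page item, no user selection is needed and each requirement gets its own count.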

Literal Does Not Match 01861

select case
when sale not in TO_date(sale,'MMYY') then 'N'
else sale
end test
from daily sales
I would like to see if the data in the column meets the following criteria using the above select. I am receiving the following error:
LITERAL DOES NOT MATCH FORMAT STRING
small data set
0109
0106
0409
column is a varchar

to_date('0109', 'mmyy') is correct; it will return 01-01-2009 00:00:00 AM (in one possible format). So the problem is likely in the column: you may have one or more values that are not, in fact, four digits. You need to do some troubleshooting.
Something I would try:
select max(length(sale)) as max_len, min(length(sale)) as min_len from table_name
(daily sales cannot be a table name as written; unquoted table names cannot contain spaces)
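The same troubleshooting idea can be sketched in Python with sqlite3 as a stand-in for Oracle. The table name daily_sales and the two malformed values are hypothetical; the first three values are the sample data from the question. The length query mirrors the one above, and the helper then lists the values that are not a valid MMYY string.

```python
import re
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily_sales (sale TEXT)")
con.executemany(
    "INSERT INTO daily_sales VALUES (?)",
    [("0109",), ("0106",), ("0409",), ("109",), ("1309",)],  # last two are bad on purpose
)

# Step 1: do all values have the expected length of four characters?
max_len, min_len = con.execute(
    "SELECT MAX(LENGTH(sale)), MIN(LENGTH(sale)) FROM daily_sales"
).fetchone()


def is_mmyy(value):
    """True if value is exactly a month 01-12 followed by a two-digit year."""
    return re.fullmatch(r"(0[1-9]|1[0-2])[0-9]{2}", value) is not None


# Step 2: list the offending values themselves.
bad = [
    sale
    for (sale,) in con.execute("SELECT sale FROM daily_sales ORDER BY sale")
    if not is_mmyy(sale)
]

print(max_len, min_len, bad)
```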
