My employer has switched data systems and reporting tools. We used to use Report Builder with a nicely built data model that allowed me to do some complex filtering easily. Then we used Business Objects, and though I didn't like it very much, it also let me do some complex filtering. Now we're back to Report Builder, but the data model is different, and the only filtering I seem to be able to do is a string of AND operators.
(Note: I'm self-taught on both Report Builder and Business Objects. I have minimal experience with the SQL coding language itself. Also, actual data labels have been changed in this example.)
I'm pulling from a large amount of data, so I need to filter at the query level. First, I need to include only rows that meet all five of these criteria (combined with AND):
    SYSTEM.REGION.REGION_STATUS_CODE = N'1'
    SYSTEM.STATE.STATE_STATUS_CODE = N'1'
    SYSTEM.ORDERS.DISCARDED_DATE IS NULL
    SYSTEM.SERVICE.SERVICE_DISCARDED_DATE IS NULL
    SYSTEM.SERVICE.SERVICE_STATUS_CODE = N'01'
Then, among those rows, I need to include only the ones that match either of two pairings (AND within each pairing, OR between them):
        SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Retail'
        AND SYSTEM.ORDERS.DISCOUNT_CODE = N'N/A'
    OR
        SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Wholesale'
        AND SYSTEM.ORDERS.DISCOUNT_CODE != N'N/A'
After I built my query using the query designer and switched to text mode, it gave me this.
WHERE
SYSTEM.REGION.REGION_STATUS_CODE = N'1'
AND SYSTEM.STATE.STATE_STATUS_CODE = N'1'
AND SYSTEM.ORDERS.DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_DISCARDED_DATE IS NULL
AND SYSTEM.SERVICE.SERVICE_STATUS_CODE = N'01'
AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Retail'
AND SYSTEM.ORDERS.DISCOUNT_CODE = N'N/A'
AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Wholesale'
AND SYSTEM.ORDERS.DISCOUNT_CODE != N'N/A'
I've tried putting parentheses in, but I must have done it wrong because the query ran for ages before essentially giving me the entire database.
Anybody care to help a SQL newbie?
Presuming everything else is right, it should just be a matter of adding parentheses to get the logic right. Using slightly exaggerated whitespace to try to make it clear:
WHERE
    SYSTEM.REGION.REGION_STATUS_CODE = N'1'
    AND SYSTEM.STATE.STATE_STATUS_CODE = N'1'
    AND SYSTEM.ORDERS.DISCARDED_DATE IS NULL
    AND SYSTEM.SERVICE.SERVICE_DISCARDED_DATE IS NULL
    AND SYSTEM.SERVICE.SERVICE_STATUS_CODE = N'01'
    AND (
            (SYSTEM.ORDERS.DISCOUNT_CODE = N'N/A'
             AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Retail')
        OR
            (SYSTEM.ORDERS.DISCOUNT_CODE != N'N/A'
             AND SYSTEM.SERVICE.SERVICE_CONTRACT_CODE = N'Wholesale')
    )
(It may still take a long time to run, but that depends more on database size and indexing.)
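A quick way to see why the parentheses matter is to run the different shapes of the WHERE clause against toy data. The sketch below uses Python's built-in sqlite3 module and a single hypothetical table `t` standing in for the joined SYSTEM.* tables, keeping only one status column for brevity. In SQL, AND binds more tightly than OR, so without the outer parentheses the status filter only applies to the first pairing; that is one plausible way a mis-parenthesized query ends up returning far more rows than expected.

```python
import sqlite3

# Hypothetical one-table stand-in for the joined SYSTEM.* tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (status TEXT, contract TEXT, discount TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("01", "Retail", "N/A"),       # active, pairing 1
        ("01", "Wholesale", "10PCT"),  # active, pairing 2
        ("99", "Wholesale", "10PCT"),  # pairing 2, but inactive
        ("99", "Retail", "N/A"),       # pairing 1, but inactive
        ("01", "Retail", "10PCT"),     # matches neither pairing
    ],
)

# What the query designer produced: contradictory, so nothing can match.
ALL_AND = """SELECT COUNT(*) FROM t WHERE status = '01'
  AND contract = 'Retail' AND discount = 'N/A'
  AND contract = 'Wholesale' AND discount != 'N/A'"""

# AND binds tighter than OR, so this means (status AND pairing 1) OR (pairing 2):
# inactive wholesale rows leak through.
NO_PARENS = """SELECT COUNT(*) FROM t WHERE status = '01'
  AND contract = 'Retail' AND discount = 'N/A'
  OR contract = 'Wholesale' AND discount != 'N/A'"""

# The grouping from the answer: common filters AND (pairing 1 OR pairing 2).
GROUPED = """SELECT COUNT(*) FROM t WHERE status = '01'
  AND ((contract = 'Retail' AND discount = 'N/A')
    OR (contract = 'Wholesale' AND discount != 'N/A'))"""

def count(sql: str) -> int:
    return conn.execute(sql).fetchone()[0]

print(count(ALL_AND), count(NO_PARENS), count(GROUPED))  # 0 3 2
```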
I have a table like this:
ID (int) | DATA (bytea)
1 | \x800495356.....
The contents of the data column were stored by a Python script:
import pickle
result_dict = {'datapoint1': 100, 'datapoint2': 2.334}
table.data = pickle.dumps(result_dict)
I can easily read the data back using
queried_dict = pickle.loads(table.data)
But I don't know how to query it directly as JSON, or even as plain text, using PostgreSQL alone. I have tried the following query and many variations of it, but none of them work:
-- I don't know what should come between SELECT and FROM
SELECT encode(data, 'escape') AS res FROM table WHERE id = 1;
-- I need to get this or somewhere close to this as the query result
res |
{"datapoint1": 100, "datapoint2": 2.33}
Thanks a lot in advance to everyone trying to help.
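The thread has no answer here, but one point is certain: the bytes in the column are in Python's pickle format, which core PostgreSQL has no built-in function to decode, so `encode(data, 'escape')` can only show the escaped binary. A workable route (our suggestion, not something from the thread) is to convert in Python: unpickle each row and write the result out as JSON, for example into a new jsonb column (hypothetical) that PostgreSQL can then query with its JSON operators. The sketch shows only the conversion step and uses nothing but the standard library; `pickled_to_json` is a hypothetical helper name.

```python
import json
import pickle

def pickled_to_json(blob: bytes) -> str:
    """Decode a pickled dict (e.g. a bytea value) and re-encode it as JSON text.

    pickle.loads can run arbitrary code, so only use this on data you wrote yourself.
    """
    return json.dumps(pickle.loads(blob))

result_dict = {'datapoint1': 100, 'datapoint2': 2.334}
# protocol=4 is pinned so the header matches the \x8004... prefix seen in the column.
blob = pickle.dumps(result_dict, protocol=4)

print(blob[:2])               # b'\x80\x04' (PROTO opcode, then the protocol number)
print(pickled_to_json(blob))  # {"datapoint1": 100, "datapoint2": 2.334}
```

Writing the JSON back would be a parameterized UPDATE through whatever driver or ORM sits behind `table.data`; that part is left out because the question doesn't say which one it is.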
I've looked all over the Azure Data Explorer documentation for migration scenarios and couldn't find an article on this.
What I'm trying to do is apply a migration to incoming data, and I thought of putting it in an update policy. I don't know whether that's a good idea, so let me know. Beyond that, I'm not sure whether what I'm doing is good enough or could be improved.
I have a table Target and a table Source. Source has a dynamic Payload column, and I map that column into Target only if it has a certain property. I did it like this:
let new_data = Source
| where Payload.Name == 'NameImLookingFor'
;
let good_data = new_data
| where isnull(Payload.DeprecatedField)
| project
FieldA = todouble(Payload.FieldA),
FieldB = todouble(Payload.FieldB),
FieldC = todouble(Payload.FieldC)
;
let migrated_data = new_data
| where isnotnull(Payload.DeprecatedField)
| project
FieldA = iff(toint(Payload.DeprecatedField)==0,todouble(Payload.DeprecatedFieldValue), Payload.UndefinedMemeber),
FieldB = iff(toint(Payload.DeprecatedField)==1,todouble(Payload.DeprecatedFieldValue), Payload.UndefinedMemeber),
FieldC = iff(toint(Payload.DeprecatedField)==2,todouble(Payload.DeprecatedFieldValue), Payload.UndefinedMemeber)
;
good_data
| union migrated_data
I have some questions and uncertainties:
iff requires an else value. I want it to be null, but I couldn't find a null literal, so I'm using Payload.UndefinedMemeber, a field I'm sure doesn't exist on the object, to get an empty value. Is this good enough? Could it be better?
I'm calling that iff three times. Could it be turned into a function? If so, how and where? Should I put it in the update policy too, or define it somewhere else?
Could it be done in a single query? I looked into case(), but it didn't seem like it would make my life easier.
Thanks.
Using an update policy is valid (though, ideally, you'd fix the data at its source, if possible, before ingestion into Kusto/ADX).
You could replace your logic with the following:
Source
| where Payload.Name == 'NameImLookingFor'
| extend df = toint(Payload.DeprecatedField)
| project FieldA = case(isnull(df), todouble(Payload.FieldA), df == 0, todouble(Payload.DeprecatedFieldValue), double(null)),
          FieldB = case(isnull(df), todouble(Payload.FieldB), df == 1, todouble(Payload.DeprecatedFieldValue), double(null)),
          FieldC = case(isnull(df), todouble(Payload.FieldC), df == 2, todouble(Payload.DeprecatedFieldValue), double(null))
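To check expected outputs before wiring this into an update policy, it can help to model the per-row rule in plain Python. This is a hedged sketch of what the single case() query above computes, not KQL: the field order (FieldA = 0, FieldB = 1, FieldC = 2) comes from the question, `None` plays the role of `double(null)` (the typed null the answer uses as the else value), and the Name filter is assumed to have been applied already. `migrate` and `to_double` are hypothetical names. In KQL itself, the repeated expression could be factored into a `let` function or a stored function, which is also what an update policy's query typically calls.

```python
from typing import Any, Dict, Optional

# Index in Payload.DeprecatedField -> target column, as in the question's queries.
FIELDS = ("FieldA", "FieldB", "FieldC")

def to_double(value: Any) -> Optional[float]:
    """Rough analogue of KQL todouble(): missing values become None (null).

    Unlike todouble(), an unparsable string raises ValueError here.
    """
    return None if value is None else float(value)

def migrate(payload: Dict[str, Any]) -> Dict[str, Optional[float]]:
    df = payload.get("DeprecatedField")
    row: Dict[str, Optional[float]] = {}
    for index, name in enumerate(FIELDS):
        if df is None:              # new-format payload: copy the field as-is
            row[name] = to_double(payload.get(name))
        elif int(df) == index:      # old format: the value belongs to this column
            row[name] = to_double(payload.get("DeprecatedFieldValue"))
        else:                       # old format, other column: null
            row[name] = None
    return row

print(migrate({"DeprecatedField": "1", "DeprecatedFieldValue": 7}))
# {'FieldA': None, 'FieldB': 7.0, 'FieldC': None}
```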
I have a complex schema that I'm trying to pull data out of for a report. The query joins a lot of tables together, and I specifically need a subset of data where everything might be null. The table relationships look like this:
Location.DeptFK
Dept.PK
Section.DeptFK
Subsection.SectionFK
Question.SubsectionFK
Answer.QuestionFK, SubmissionFK
Submission.PK, LocationFK
From here my problems begin to compound a little.
SELECT Section.StepNumber + '-' + Question.QuestionNumber AS QuestionNumberVar,
       Question.Question,
       Subsection.Name AS Subsection,
       Section.Name AS Section,
       SUM(CASE WHEN (Answer.Answer = 0) THEN 1 ELSE 0 END) AS NA,
       SUM(CASE WHEN (Answer.Answer = 1) THEN 1 ELSE 0 END) AS AnsNo,
       SUM(CASE WHEN (Answer.Answer = 2) THEN 1 ELSE 0 END) AS AnsYes,
       (SELECT COUNT(DISTINCT Location.Abbreviation)
        FROM Department INNER JOIN Location ON Location.DepartmentFK = Department.PK
        WHERE Department.Name = 'InsertParameter') AS total
FROM Department
     INNER JOIN Section ON Department.PK = Section.DepartmentFK
     INNER JOIN Subsection ON Subsection.SectionFK = Section.PK
     INNER JOIN Question ON Question.SubsectionFK = Subsection.PK
     INNER JOIN Answer ON Answer.QuestionFK = Question.PK
     INNER JOIN Submission ON Submission.PK = Answer.SubmissionFK
     INNER JOIN Location ON Location.DepartmentFK = Department.PK AND Location.PK = Submission.PlantFK
WHERE Department.Name = 'InsertParameter' AND Submission.MonthTested = '1/1/2017'
-- SQL Server can't GROUP BY a SELECT alias, so group by the underlying columns
GROUP BY Question.Question, Question.QuestionNumber, Subsection.Name, Section.Name, Section.StepNumber
ORDER BY QuestionNumberVar;
There are 15 locations in total, but this query returns only 12. If I remove one of the join conditions on Location, I get all 15 locations, but my answer counts get multiplied by 15. The issue is that not all locations are required to test at the same time, so their answers should default to NA. Those locations get no records in the database, so the Location/Submission relationship is missing for them.
I have a workaround almost in place via the SELECT COUNT(DISTINCT ...) subquery, but the second part of the report is a query showing what each location answered instead of a sum, which brings the problem right back. It also has to be dynamic, because a department parameter won't return a fixed number of locations each time.
I'm still learning SQL, so pointers to any additional material for building this query would also be appreciated. So the big question is: how do I create default data in this query whenever the Location/Submission relationship is null?
Edit: Dummy Data
QuestionNumberVar | Section | Subsection | Question | AnsYes | AnsNo | NA (expected value in parentheses)
1-1.1 | Math | Algebra | Did you do your homework? | 10 | 1 | 1 (4)
1-1.2 | Math | Algebra | Did your dog eat it? | 9 | 3 | 0 (3)
2-1.1 | English | Greek | Did you do your homework? | 8 | 0 | 4 (7)
I have tried adding LEFT JOINs at various applicable points in the query, to no avail; none of them changed the output. This query feeds the dataset for an SSRS report. There are a couple of workarounds for this particular section, such as an expression that takes the total number of locations and subtracts AnsYes and AnsNo to get the true NA value, but as explained above, that doesn't help with my next query.
Edit: SQL Server 2012 for those who asked
Edit: my attempt at using ISNULL() on the missing data returns nothing, I suspect because the query has already eliminated the null/missing rows. Left joining while doing this also failed. The point of failure is Submission: if we bind it to Location, some locations are missing, but if we don't, we get multiplied duplicates, because Department has a one-to-many relationship with Location and not vice versa. I can't make any schema changes to improve this process.
There is a previous report that I'm trying to emulate/update. It used C# logic to process the data and ran multiple queries to get the same results. I don't have that luxury (the previous report exports directly to Excel instead of going through SSRS). Here is the previous logic:
select PK from Department where Name = 'InsertParameter';
select PK from Submission where LocationFK = 'Location.PK_var' and MonthTested = '1/1/2017'
It then loops over those results and turns nulls into NA in C#.
EDIT (mediocre solution): I ended up using the workaround: a calculated field that subtracts Yes and No from the total number of locations in the department. It's a mediocre solution because it doesn't solve the original problem, and I ended up with three datasets that should have been a single one: one for question info, one for each location's answers, and one for the locations that didn't participate. If a real answer comes up I'll check its validity, but for now the problem is pseudo-solved.
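A common pattern for this kind of report, offered as a suggestion rather than something the thread settled on, is to build every location/question pair for the department first and only then LEFT JOIN submissions and answers, keeping the month filter in the ON clause. A test on the right-hand table in WHERE (such as `Submission.MonthTested = ...`) throws away the NULL rows a LEFT JOIN produces, which would explain why the left joins seemed to have no effect. The sketch uses Python's sqlite3 and a deliberately simplified, hypothetical schema (Section/Subsection folded into a `Question.DepartmentFK` column, at most one submission per location per month); the same join shape should carry over to T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Location   (PK INTEGER PRIMARY KEY, DepartmentFK INTEGER);
CREATE TABLE Question   (PK INTEGER PRIMARY KEY, DepartmentFK INTEGER);
CREATE TABLE Submission (PK INTEGER PRIMARY KEY, LocationFK INTEGER, MonthTested TEXT);
CREATE TABLE Answer     (QuestionFK INTEGER, SubmissionFK INTEGER, Answer INTEGER);
INSERT INTO Location   VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO Question   VALUES (10, 1), (11, 1);
-- Location 3 did not test this month, so it has no submission at all.
INSERT INTO Submission VALUES (100, 1, '2017-01-01'), (101, 2, '2017-01-01');
INSERT INTO Answer     VALUES (10, 100, 2), (11, 100, 1), (10, 101, 2), (11, 101, 0);
""")

# Every (location, question) pair in the department, with optional answers.
PAIRS = """
FROM Location l
JOIN Question q ON q.DepartmentFK = l.DepartmentFK
LEFT JOIN Submission s ON s.LocationFK = l.PK AND s.MonthTested = :month
LEFT JOIN Answer a ON a.SubmissionFK = s.PK AND a.QuestionFK = q.PK
WHERE l.DepartmentFK = :dept
"""
PARAMS = {"month": "2017-01-01", "dept": 1}

# Totals per question: a missing answer counts as NA.
totals = conn.execute("""
SELECT q.PK,
       SUM(CASE WHEN a.Answer = 2 THEN 1 ELSE 0 END) AS AnsYes,
       SUM(CASE WHEN a.Answer = 1 THEN 1 ELSE 0 END) AS AnsNo,
       SUM(CASE WHEN a.Answer = 0 OR a.Answer IS NULL THEN 1 ELSE 0 END) AS NA
""" + PAIRS + "GROUP BY q.PK ORDER BY q.PK", PARAMS).fetchall()

# What each location answered, defaulting to 'NA' (NULL falls through to ELSE).
per_location = conn.execute("""
SELECT l.PK, q.PK, CASE a.Answer WHEN 2 THEN 'Yes' WHEN 1 THEN 'No' ELSE 'NA' END
""" + PAIRS + "ORDER BY l.PK, q.PK", PARAMS).fetchall()

# For contrast: moving the month test into WHERE drops location 3 again.
filtered_in_where = conn.execute(
    "SELECT COUNT(*) " + PAIRS.replace(" AND s.MonthTested = :month", "")
    + "AND s.MonthTested = :month", PARAMS).fetchone()[0]

print(totals)             # [(10, 2, 0, 1), (11, 0, 1, 2)]
print(per_location[-2:])  # [(3, 10, 'NA'), (3, 11, 'NA')]
print(filtered_in_where)  # 4 rows instead of 6
```

Because both result sets come from the same FROM/JOIN core, the per-question totals and the per-location answers stay consistent with each other, which is the part the subtraction workaround couldn't cover.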
The problem: I have a list of objects, some of which share the same PlanId value. I want to keep only the first occurrence for each PlanId and ignore any later objects with that PlanId. The root problem is a view in the database, but it's used everywhere and I don't know whether changing it will break a ton of stuff this close to a deadline, so I'm putting in a hack for now.
So, say I have a list of PlanObjects like this:
Plan1.PlanId = 1
Plan2.PlanId = 1
Plan3.PlanId = 2
Plan4.PlanId = 3
Plan5.PlanId = 4
Plan6.PlanId = 4
I want to take a sub-list from that with LINQ, leaving out the items marked "(excluded)":
Plan1.PlanId = 1
Plan2.PlanId = 1 (excluded)
Plan3.PlanId = 2
Plan4.PlanId = 3
Plan5.PlanId = 4
Plan6.PlanId = 4 (excluded)
For my needs, it doesn't matter which one is taken first. The Id is used to update a database record.
If I didn't explain that well enough, let me know and I'll edit the question. I think it makes sense though.
PlanObjects.GroupBy(p => p.PlanId).Select(r => r.First());
The other answer (and its comments) supplies the fluent (method syntax) solution. Here's the query syntax, in VB.NET:
From p In PlanObjects Group By p.PlanId Into First Select First
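For comparison, the same "group by key, keep the first item" idea can be sketched in Python. This only illustrates the technique behind `GroupBy(p => p.PlanId).Select(r => r.First())`; the `Plan` class and `first_per_plan_id` function are hypothetical stand-ins for the question's PlanObjects.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

@dataclass
class Plan:
    name: str
    plan_id: int

def first_per_plan_id(plans: Iterable[Plan]) -> List[Plan]:
    firsts: Dict[int, Plan] = {}
    for plan in plans:
        # setdefault only stores the plan if its id hasn't been seen yet
        firsts.setdefault(plan.plan_id, plan)
    return list(firsts.values())  # dicts keep insertion order (Python 3.7+)

plans = [Plan("Plan1", 1), Plan("Plan2", 1), Plan("Plan3", 2),
         Plan("Plan4", 3), Plan("Plan5", 4), Plan("Plan6", 4)]
print([p.name for p in first_per_plan_id(plans)])  # ['Plan1', 'Plan3', 'Plan4', 'Plan5']
```

LINQ's GroupBy yields groups in the order their keys first appear and keeps source order inside each group, so `First()` picks the earliest object per PlanId, which is the same result this sketch produces.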
Django 1.3-dev provides several ways to query the database using raw SQL. They are covered here and here. The recommended ways are the .raw() and .extra() methods. The advantage is that if the retrieved data fits the model, you can still use some of its features directly.
The page I'm trying to display is fairly complex because it uses a lot of information spread across multiple tables with different relationships (one-to-one, one-to-many). With the current approach the server has to run about 4,000 queries per page, which is slow because of the round trips between the web server and the database.
A possible solution is to use raw SQL to retrieve the relevant data, but because of the query's complexity I couldn't translate it into an equivalent Django query.
The query is:
SELECT clin.iso as iso,
(SELECT COUNT(*)
FROM clin AS a
LEFT JOIN clin AS b
ON a.pat_id = b.pat_id
WHERE a.iso = clin.iso
) AS multiple_iso,
(SELECT COUNT(*)
FROM samptopat
WHERE samptopat.iso_id = clin.iso
) AS multiple_samp,
(SELECT GROUP_CONCAT(value ORDER BY snp_id ASC)
FROM samptopat
RIGHT JOIN samptosnp
USING(samp_id)
WHERE iso_id = clin.iso
GROUP BY samp_id
LIMIT 1 -- Return 1st samp only
) AS snp
FROM clin
WHERE iso IN (...)
or alternatively WHERE iso = ....
Sample output looks like:
+-------+--------------+---------------+-------------+
| iso | multiple_iso | multiple_samp | snp |
+-------+--------------+---------------+-------------+
| 7 | 19883 | 0 | NULL |
| 8 | 19883 | 0 | NULL |
| 21092 | 1 | 2 | G,T,C,G,T,G |
| 31548 | 1 | 0 | NULL |
+-------+--------------+---------------+-------------+
4 rows in set (0.00 sec)
The documentation explains how one can do a query using WHERE col = %s but not the IN syntax.
One part of this question is: how do I perform raw SQL queries in Django using the IN operator?
The other part is, considering the following models:
from django.db import models

class Clin(models.Model):
    iso = models.IntegerField(primary_key=True)
    pat = models.IntegerField(db_column='pat_id')
    class Meta:
        db_table = u'clin'

class SampToPat(models.Model):
    samptopat_id = models.IntegerField(primary_key=True)
    # 'Samp' is given as a string because that model is defined further down
    samp = models.OneToOneField('Samp', db_column='samp_id')
    pat = models.IntegerField(db_column='pat_id')
    iso = models.ForeignKey(Clin, db_column='iso_id')
    class Meta:
        db_table = u'samptopat'

class Samp(models.Model):
    samp_id = models.IntegerField(primary_key=True)
    samp = models.CharField(max_length=8)
    class Meta:
        db_table = u'samp'

class SampToSnp(models.Model):
    samptosnp_id = models.IntegerField(primary_key=True)
    samp = models.ForeignKey(Samp, db_column='samp_id')
    snp = models.IntegerField(db_column='snp_id')
    value = models.CharField(max_length=2)
    class Meta:
        db_table = u'samptosnp'
Is it possible to rewrite the above query into something more ORM oriented?
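On the IN part: DB-API drivers take one placeholder per value, so a common approach is to generate the right number of placeholders and pass the values as the parameter list. The sketch below builds that fragment with a small hypothetical helper, `in_placeholders`, and runs it against sqlite3 (which uses `?`); Django's `raw()` and `cursor.execute()` use `%s` placeholders instead. For the simple filtering case, the ORM already does this for you with `Clin.objects.filter(iso__in=ids)`.

```python
import sqlite3
from typing import Sequence

def in_placeholders(values: Sequence[object], placeholder: str = "%s") -> str:
    """Return e.g. '%s, %s, %s' for three values (hypothetical helper)."""
    if not values:
        raise ValueError("empty IN () lists are a syntax error in many databases")
    return ", ".join([placeholder] * len(values))

isos = [7, 21092]

# Only placeholders are formatted into the string; the values go in as parameters,
# e.g. Clin.objects.raw(django_sql, isos) or cursor.execute(django_sql, isos).
django_sql = "SELECT iso FROM clin WHERE iso IN (%s)" % in_placeholders(isos)

# The same idea against an in-memory SQLite table, which uses '?' placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clin (iso INTEGER PRIMARY KEY, pat_id INTEGER)")
conn.executemany("INSERT INTO clin VALUES (?, ?)", [(7, 1), (8, 1), (21092, 2)])
sqlite_sql = "SELECT iso FROM clin WHERE iso IN (%s) ORDER BY iso" % in_placeholders(isos, "?")

print(django_sql)                                 # SELECT iso FROM clin WHERE iso IN (%s, %s)
print(conn.execute(sqlite_sql, isos).fetchall())  # [(7,), (21092,)]
```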
For a problem like this one, I'd split the query into a small number of simpler ones; I think that's quite feasible. I've also found that MySQL may actually return results faster with this approach.
Edit: Actually, after thinking a bit more, I see that you need to "annotate on subqueries", which isn't possible in the Django ORM (not in 1.2, at least). You may have to use plain SQL here, or some other tool to build the query.
I tried to rewrite your models in a more conventional Django style; maybe it will help make the problem clearer. The Pat and Snp models are missing, though:
from django.db import models

# Pat and Snp are not shown; string references let Django resolve them lazily.
class Clin(models.Model):
    pat = models.ForeignKey('Pat')
    class Meta:
        db_table = u'clin'

class SampToPat(models.Model):
    samp = models.ForeignKey('Samp')
    pat = models.ForeignKey('Pat')
    iso = models.ForeignKey(Clin)
    class Meta:
        db_table = u'samptopat'
        unique_together = ['samp', 'pat']

class Samp(models.Model):
    samp = models.CharField(max_length=8)
    snp_set = models.ManyToManyField('Snp', through='SampToSnp')
    pat_set = models.ManyToManyField('Pat', through='SampToPat')
    class Meta:
        db_table = u'samp'

class SampToSnp(models.Model):
    samp = models.ForeignKey(Samp)
    snp = models.ForeignKey('Snp')
    value = models.CharField(max_length=2)
    class Meta:
        db_table = u'samptosnp'
The following seems to mean "get the count of unique patients per clinic":
(SELECT COUNT(*)
FROM clin AS a
LEFT JOIN clin AS b
ON a.pat_id = b.pat_id
WHERE a.iso = clin.iso
) AS multiple_iso,
Sample count per clinic:
(SELECT COUNT(*)
FROM samptopat
WHERE samptopat.iso_id = clin.iso
) AS multiple_samp,
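Before translating these two subqueries into ORM calls, it may help to run them on a few hand-made rows and see exactly what they count. The sketch uses sqlite3 with hypothetical data; the subqueries are the question's, with the outer table aliased as `c` to keep name resolution unambiguous. On this data, multiple_iso comes out as the number of clin rows sharing the current row's pat_id (including the row itself), and multiple_samp as the number of samptopat rows pointing at the iso, which is worth comparing against the readings above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clin (iso INTEGER PRIMARY KEY, pat_id INTEGER);
CREATE TABLE samptopat (samptopat_id INTEGER PRIMARY KEY, samp_id INTEGER, iso_id INTEGER);
-- isos 7 and 8 belong to the same patient; 21092 has two samples
INSERT INTO clin VALUES (7, 500), (8, 500), (21092, 600);
INSERT INTO samptopat VALUES (1, 11, 21092), (2, 12, 21092);
""")

rows = conn.execute("""
SELECT c.iso,
       (SELECT COUNT(*) FROM clin AS a LEFT JOIN clin AS b ON a.pat_id = b.pat_id
        WHERE a.iso = c.iso) AS multiple_iso,
       (SELECT COUNT(*) FROM samptopat WHERE samptopat.iso_id = c.iso) AS multiple_samp
FROM clin AS c
ORDER BY c.iso
""").fetchall()

print(rows)  # [(7, 2, 0), (8, 2, 0), (21092, 1, 2)]
```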
This part is harder to understand, but in any case there is no way to do GROUP_CONCAT in the plain Django ORM.
(SELECT GROUP_CONCAT(value ORDER BY snp_id ASC)
FROM samptopat
RIGHT JOIN samptosnp
USING(samp_id)
WHERE iso_id = clin.iso
GROUP BY samp_id
LIMIT 1 -- Return 1st samp only
) AS snp
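If the snp column is fetched as plain rows instead (one simple query returning samp_id, snp_id and value for an iso, in line with the split-into-simpler-queries suggestion above), the GROUP_CONCAT can be rebuilt in Python. `first_samp_snp` is a hypothetical helper, and it assumes "1st samp" means the lowest samp_id; the original `GROUP BY samp_id LIMIT 1` has no ORDER BY, so the database doesn't guarantee which sample's group comes first.

```python
from itertools import groupby
from typing import Iterable, Optional, Tuple

Row = Tuple[int, int, str]  # (samp_id, snp_id, value) for one iso

def first_samp_snp(rows: Iterable[Row]) -> Optional[str]:
    """Mimic GROUP_CONCAT(value ORDER BY snp_id) for the lowest samp_id, or None."""
    ordered = sorted(rows)  # sorts by samp_id, then snp_id
    for _samp_id, group in groupby(ordered, key=lambda row: row[0]):
        return ",".join(value for _, _, value in group)
    return None  # no samples: the subquery would yield NULL

print(first_samp_snp([(5, 2, "T"), (5, 1, "G"), (9, 1, "C")]))  # G,T
print(first_samp_snp([]))                                        # None
```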
Could you explain exactly what you're trying to extract with the snp subquery? I see you're joining the two tables, but it looks like what you really want is Snp objects that have an associated Clin with the given id. If so, this becomes almost as straightforward to do as a separate query as the other two:
Snp.objects.filter(samp__pat__clin__pk=given_clin)
or some such thing ought to do the trick. You may have to rewrite that a bit due to all the ways you're violating the conventions, unfortunately.
The others are something like:
Pat.objects.filter(clin__pk=given_clin).count()
and
Samp.objects.filter(clin__pk=given_clin).count()
if @Evgeny's reading is correct (which is how I read it as well).
Often, with Django's ORM, I find I get better results if I try to think about directly what I want in terms of the ORM, instead of trying to translate to or from the SQL I might use if I wasn't using the ORM.