CrateDB cannot query data in a shard - cratedb

I have an instance of Crate 1.0.2. I dropped a table from it, then re-created the table with the same name and a slightly modified schema, and imported data using the COPY FROM command. The file passed to COPY FROM contains 10,000 records, and the command runs OK. When I check the table tab in the Crate web console, it shows many partitions, each holding a few records. If I add up the number-of-records column on this tab, the total comes close to 10k, but when I run "select count(*) from mytable" it returns only around 8,000 records. On further investigation I found that there are certain partitions whose data cannot be queried at all.

Has anyone seen this problem? Does it have anything to do with dropping the table and re-creating it with the same name? I also observed that when a table is dropped, not all files related to that table are deleted from path.data. Are these leftover directories the reason those partitions have become non-queryable? While importing, I also saw a "Document already exists" exception, even though I know my data has no duplicate values for the primary key column.

Some questions to clarify the issue:

- Have you run REFRESH TABLE mytable after your COPY FROM command finished? (See the sketch below.)
- Are you sure that with the new schema of the table there are no duplicate records?
- Since 1.x versions are no longer supported, could you try CrateDB 2.1.6, the current stable version, to see whether the problem persists?
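If the count is still short after a refresh, a quick sanity check along these lines might help. This is only a sketch, with mytable standing in for your actual table name:

REFRESH TABLE mytable;
SELECT count(*) FROM mytable;

-- CrateDB exposes per-shard state in sys.shards; unassigned or still-recovering
-- shards would explain partitions whose data cannot be queried.
SELECT schema_name, table_name, id, state
FROM sys.shards
WHERE table_name = 'mytable' AND state <> 'STARTED';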

Related

How to stop BigQuery using the old schema when creating a new table with the same name as a deleted one from Google Sheets

I am using a Google Sheet as the source of a table in BigQuery. Since I am unable to rename field names in the schema of an existing table, I deleted the table and attempted to re-create it after amending the column names in the source Google Sheet. I need to keep the table name the same, as I already have analysis files connecting to the table; however, when I create the new table and ask BigQuery to auto-detect the schema, it uses the schema of the previous table. Even if I enter the new schema as text when creating the table, it ignores what I enter and uses the schema from the old table.
Any ideas how I can get BigQuery to detect the new schema from the Google Sheet whilst using the same table name as the deleted table?
Thanks in advance!
After trying this multiple times with several tables and it not working, it randomly worked and let me create a table with the new schema (manually). I'm not sure why this didn't work before, as I'm pretty sure I didn't do anything differently. If anyone has any insight into what might have caused the initial errors, I'd love to hear it for future reference, but my current problem is solved.
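For anyone else hitting this: one way to take auto-detect out of the picture entirely is to declare the schema explicitly in DDL. This is only a sketch; the dataset, table, and column names and the sheet URL are placeholders, not taken from the question:

CREATE OR REPLACE EXTERNAL TABLE mydataset.mytable (
  customer_name STRING,   -- placeholder columns; use the renamed ones from the sheet
  order_total   FLOAT64
)
OPTIONS (
  format = 'GOOGLE_SHEETS',
  uris = ['https://docs.google.com/spreadsheets/d/<sheet-id>'],
  skip_leading_rows = 1   -- treat the header row as column names, not data
);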

Access Macro Creating a Duplicate Table that then Breaks the Macro

I inherited an Access-based dashboard which uses a series of SQL queries and Make Table actions to write the results of those queries into tables located in two other Access files.
There is a macro that runs all of the make-table commands, which in turn run the SQL queries. All of them work when I run this on my machine, but when I run it from our VM, which handles scheduled refreshes, it creates a duplicate table for one of the queries.
If I delete that table from the Tables1 database, the Queries database will run successfully and create both the correct table and the duplicate in Tables1. However, each subsequent time, the macro fails with an error saying that the duplicate table already exists.
This is the make-table SQL:
SELECT [SQLQueryName].*
INTO [TableName]
IN 'filepath\Tables1.accdb'
FROM SQLQueryName;
This is the same structure that all the other make-table queries use, and they are not having this issue. I'm not sure if this matters or will tell someone something, but the duplicate table has the same name with a 1 added to the end. We also recently had to get a new VM set up, and there have been a lot of weird issues with things not working as they did on the previous VM, where this had been running without issue for quite a long time.
So far I've tried compacting both the Queries and Tables1 database files. I've tried deleting the make-table query and making a new version. I've tried deleting the table and its duplicate from Tables1. I've tried rolling back the database files to versions several weeks old. We also made sure that the version of Access is the same as on my PC.
I am currently attempting to change it to a delete-rows/append-rows method rather than a make-table (sketched below), but even if that works I would still love to know why this is happening.
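For reference, a rough sketch of that delete/append pattern in Access SQL, reusing the names from the failing query shown below. I have not verified that the DELETE ... IN form behaves identically across Access versions, so treat the exact syntax as an assumption:

-- clear the destination table in the external database, keeping its structure
DELETE FROM [HHtoER-Obs] IN '\\filepath\MiscTablesB.accdb';

-- append the query results instead of re-creating the table
INSERT INTO [HHtoER-Obs] IN '\\filepath\MiscTablesB.accdb'
SELECT [012_HHtoERorOBS].*
FROM 012_HHtoERorOBS;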
Edit:
Actual code from the failing make-table query, with the file path redacted:
SELECT [012_HHtoERorOBS].*
INTO [HHtoER-Obs]
IN '\\filepath\MiscTablesB.accdb'
FROM 012_HHtoERorOBS;
Here is code from two other make-table queries that are working. Each make-table query follows an identical format: SELECT * FROM the SQL query to get the data into the table in the destination .accdb files.
SELECT [010_ScheduledVisitsQuery].*
INTO ScheduledVisits
IN '\\filepath\MiscTablesB.accdb'
FROM 010_ScheduledVisitsQuery;
SELECT [020_HH_HO].*
INTO HH_Referred_to_HO
IN '\\filepath\MiscTablesB.accdb'
FROM 020_HH_HO;
All of these tables exist in the destination .accdb files when the make-table queries are run. The macro does not include any commands to delete tables. Here is a screenshot of the top of the macro: it repeats all the make-table queries, then ends with a command to quit Access.

Cannot find the object % because it does not exist or you do not have permissions

I am trying to write data to an Azure SQL DB with Azure Data Factory. I'm using a Copy Data task within a ForEach that loops through all the rows in an ETL table in the DB. In the pre-copy script, I have
TRUNCATE TABLE [master].[dbo].[#{item().DestinationObjectName}]
DestinationObjectName is the name of the table being loaded, taken from the ETL table. The problem I'm having is that for some of the tables (not all; some work perfectly fine) I get the error 'Cannot find the object % because it does not exist or you do not have permissions'. The account I'm using has all the necessary privileges as well. I am able to see the script that is sent by ADF, which I have copied into the DB and confirmed works sometimes, but not every time. If I do a SELECT TOP 1000 from the table in question and substitute that object for the one in the TRUNCATE TABLE script, it works. I'm really at a loss here. Like I said, the truncate works for the majority of the tables, but not all. I have also double-checked that the object names are exactly the same.
Any help is appreciated.
This issue has been solved. I had to drop the affected tables, remove the brackets surrounding each name in the CREATE TABLE statements, and re-create them without the brackets. Very strange issue.
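In case it helps someone diagnose the same thing: if a table is created with brackets embedded in the name, the brackets become part of the stored name, so [dbo].[MyTable] in a script no longer resolves to it. A quick check against the catalog (a sketch; in T-SQL LIKE, [[] escapes a literal open bracket):

SELECT name
FROM sys.tables
WHERE name LIKE '%[[]%';   -- tables whose stored name literally contains '['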

How to identify deleted records in SQL Server while importing to Hadoop using Sqoop

While importing data from SQL Server (or any RDBMS) to Hadoop using Sqoop, we can pick up newly appended or modified records using incremental append, last-modified mode, or free-form queries.
Is there any way to identify deleted records, considering that once a record is deleted it no longer exists in the SQL table?
One workaround is to load the full table using Sqoop and compare it with the previous table in Hive.
Is there any better way to do this?
No, you cannot get deleted records using Sqoop.
A better workaround could be:

1. Create a boolean status field (default true) in your SQL Server table.
2. Whenever you need to delete a record, don't delete it; just update it, marking status false (see the sketch below).
3. If you are using last-modified incremental import, you will get this changed data in HDFS.
4. Later (after the Sqoop import) you can delete all the records with status false.
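A minimal T-SQL sketch of steps 1 and 2; the table, key, and last_modified column names are illustrative, not from the original question:

-- one-time schema change: soft-delete flag, defaulting to active
ALTER TABLE orders ADD status BIT NOT NULL DEFAULT 1;

-- "delete" by flipping the flag; bumping last_modified lets the
-- last-modified incremental import pick the change up
UPDATE orders
SET status = 0, last_modified = GETDATE()
WHERE order_id = 42;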
If you are syncing an entire partition or table, you can identify deleted records after the Sqoop import, before merging, by doing a full join with the existing target partition or table: records that exist in the target table/partition but not in the imported data are the ones deleted on the source database since the last sync.
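A sketch of that comparison in Hive SQL, with made-up names (target is the existing table, staging the fresh Sqoop import, id the key); a left join is enough to surface the deleted side:

SELECT t.id
FROM target t
LEFT JOIN staging s ON t.id = s.id
WHERE s.id IS NULL;   -- present in target, absent from the import: deleted on source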
Incremental Sqooping does not handle deleted records out of the box. There are two approaches you may want to consider.
Please look at this post.

Controlling the updates in my Database

I came here today to see if someone could give me a suggestion to improve the way I update my database.
Here is the problem: I have one file where I store new scripts every time I need to change something. For instance, let's say I need to add a new column to a table. I would add the following lines to my file called script1.sql:
alter table CLIENTS
add AGE integer
After doing that, I send it to the client with an updated application and ask them to run script1.sql on their database. That works just fine for me.
The problem shows up when this file starts to get bigger and the client needs to receive new updates.
The client would run the script1.sql file again, but now with more updates in it, and would get errors indicating that a column named AGE already exists in the database.
The biggest problem is when I change the version of my application. If I update my application from Application1 to Application2, I also change the script from script1.sql to script2.sql.
Now my client will need to run both to get to the correct version without conflicts, and will get lots of errors, since almost everything from script1.sql has already been applied to their database.
What I want is to eliminate the chance of facing conflicts. This process has been working for me, but it always causes some sort of trouble. Therefore, if anyone has any idea how I could make it work better, please help me out.
SQL usually provides something called IF EXISTS (and also IF NOT EXISTS), so, for example, you can write a statement such as:
CREATE TABLE IF NOT EXISTS users ...
which will only create the users table if it hasn't already been created.
There is usually a variant of this that can be added to all your statements (including updates such as renaming columns, etc.).
Then if the table has already been added (or the column updated, etc.), it won't try to run that SQL command again, which means you can run the same file over and over, as many times as you like.
(Note: this property is called idempotency.)
You will need to google the details of how to use these EXISTS checks in SQL Server.
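For the AGE example above, a minimal sketch, assuming the target is SQL Server (which has no CREATE TABLE IF NOT EXISTS; the T-SQL idiom is to guard with a catalog check instead):

-- COL_LENGTH returns NULL when the column does not exist, so the ALTER
-- only runs once and script1.sql can be re-run safely
IF COL_LENGTH('dbo.CLIENTS', 'AGE') IS NULL
    ALTER TABLE dbo.CLIENTS ADD AGE integer;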