Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sumeet_Dora
by New Contributor II
  • 2278 Views
  • 1 reply
  • 3 kudos

Resolved! Write mode features in BigQuery using a Databricks notebook.

Currently, using df.write.format("bigquery"), Databricks only supports the append and overwrite modes for writing to BigQuery tables. Does Databricks have any option for executing DML statements like MERGE into BigQuery from Databricks notebooks?

Latest Reply
mathan_pillai
Databricks Employee

@Sumeet Dora, unfortunately there is no direct "merge into" option for writing to BigQuery from a Databricks notebook. You could write to an intermediate Delta table using the MERGE INTO option on the Delta table, then read from the Delta table and pe...
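
A minimal sketch of that two-step workaround, assuming a hypothetical staging path, join key, source DataFrame (updates_df), and BigQuery coordinates, none of which come from the thread:

from delta.tables import DeltaTable

# 1) MERGE the incoming changes into an intermediate Delta table.
staging = DeltaTable.forPath(spark, "dbfs:/tmp/bq_staging")  # hypothetical path
(staging.alias("t")
    .merge(updates_df.alias("s"), "t.id = s.id")  # hypothetical join key
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# 2) Read the merged result back and overwrite the BigQuery table.
(spark.read.format("delta").load("dbfs:/tmp/bq_staging")
    .write.format("bigquery")
    .option("table", "my_project.my_dataset.my_table")  # hypothetical table
    .option("temporaryGcsBucket", "my-gcs-bucket")      # hypothetical bucket
    .mode("overwrite")
    .save())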

gbrueckl
by Contributor II
  • 6831 Views
  • 8 replies
  • 9 kudos

Slow performance of VACUUM on Azure Data Lake Storage Gen2

We need to run VACUUM on one of our biggest tables to free up storage. According to our analysis using VACUUM bigtable DRY RUN, this affects 30M+ files that need to be deleted. If we run the final VACUUM, the file listing takes up to 2h (which is OK) ...

Latest Reply
Deepak_Bhutada
Contributor III

@Gerhard Brueckl we have seen roughly 80k-120k file deletions per hour in Azure while doing a VACUUM on Delta tables; VACUUM is simply slower on Azure and S3. It might take some time, as you said, when deleting the files from the Delta path. ...
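
One knob worth trying (an assumption on our side, not something the thread confirms): Databricks documents a flag that parallelizes the delete phase of VACUUM across the cluster instead of deleting serially from the driver, which can matter at the 30M-file scale described above.

# Sketch: enable parallel deletion, then vacuum with the default 7-day retention.
spark.conf.set("spark.databricks.delta.vacuum.parallelDelete.enabled", "true")
spark.sql("VACUUM bigtable RETAIN 168 HOURS")  # table name taken from the question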

7 More Replies
Erik
by Valued Contributor III
  • 5875 Views
  • 6 replies
  • 2 kudos

Run more concurrent tasks than the number of cores.

We are using the Terraform Databricks provider, which starts a cluster and checks every mount (since there is no mount REST API!). Each mount takes 20 seconds to check, 99.9% of that time is idle waiting, and it starts a job per mount. If w...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Erik Parmann, it is possible to do, but you might also need to enable dynamic allocation at the cluster level to make sure your settings are applied at cluster creation. You can find more details here. As a best practice, we do not recom...
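
Since each check is almost entirely idle waiting, a driver-side thread pool is one hedged alternative to tuning task slots; this sketch (names illustrative, not from the thread) checks all mounts concurrently without touching cluster configuration:

from concurrent.futures import ThreadPoolExecutor

mounts = [m.mountPoint for m in dbutils.fs.mounts()]

def check_mount(path):
    # a cheap listing is enough to verify the mount responds
    return path, bool(dbutils.fs.ls(path))

# far more threads than cores is fine here because the work is I/O-bound
with ThreadPoolExecutor(max_workers=32) as pool:
    results = dict(pool.map(check_mount, mounts))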

5 More Replies
Jon
by New Contributor II
  • 17635 Views
  • 3 replies
  • 5 kudos

How can I use custom python library in Azure Databricks?

I am trying to access functions in my coreapi.py by importing it in the main notebook, but I get the error ModuleNotFoundError: No module named 'coreapi'. I tried uploading the file into the same folder, and I tried creating a Python egg and uploading it...

Latest Reply
-werners-
Esteemed Contributor III

There is also the possibility to use the Repos files functionality: https://databricks.com/blog/2021/10/07/databricks-repos-is-now-generally-available.html
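
A minimal sketch of what that looks like in practice, assuming coreapi.py has been committed to a Repo (the repo path below is hypothetical):

import sys

# If the notebook itself lives in the same Repo folder as coreapi.py,
# a plain `import coreapi` may already resolve. Otherwise, put the repo
# root on the module search path first:
sys.path.append("/Workspace/Repos/your.name@example.com/your-repo")  # hypothetical

import coreapi  # now importable like any regular Python module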

2 More Replies
giacomosachs
by New Contributor
  • 1455 Views
  • 0 replies
  • 0 kudos

apt-get install texlive error 404

Hi everybody, I'm trying to install texlive-full on a cluster (Azure Databricks, DBR 7.3 LTS) using apt-get install texlive-full in an init script. The issue is that most of the time (not always) I get a 404 when downloading packages from security.u...
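
The thread has no replies; one hedged possibility (our assumption, not a confirmed fix) is that the image's apt package lists are stale, so refreshing them at the top of the init script may avoid the 404s. The script path below is hypothetical:

# Sketch: regenerate the init script so it refreshes apt lists before installing.
dbutils.fs.put(
    "dbfs:/init-scripts/install-texlive.sh",  # hypothetical location
    """#!/bin/bash
set -e
apt-get update                    # stale package lists are a common cause of 404s
apt-get install -y texlive-full
""",
    True,  # overwrite
)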

kjoth
by Contributor II
  • 12441 Views
  • 9 replies
  • 7 kudos

Resolved! Delete row from table is not working.

I have created an external table using Spark via the below command (using Data Science & Engineering): df.write.mode("overwrite").format("parquet").saveAsTable(name=f'{db_name}.{table_name}', path="dbfs:/reports/testing") I have tried to delete a row based on...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @karthick J, can you try to delete the row and execute your command on a non-High-Concurrency cluster? The reason I'm asking is that we first need to isolate the error message and understand why it is happening to be able to find the best ...
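
For context, row-level DELETE is only supported on Delta tables, not on plain parquet tables like the one created above; a hedged sketch of the usual fix (not the thread's accepted answer, and the predicate is hypothetical) is to convert the table in place first:

# Convert the registered parquet table to Delta in place.
spark.sql(f"CONVERT TO DELTA {db_name}.{table_name}")

# Row-level deletes then work as expected (hypothetical predicate).
spark.sql(f"DELETE FROM {db_name}.{table_name} WHERE id = 42")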

8 More Replies
User16835756816
by Valued Contributor
  • 1066 Views
  • 0 replies
  • 5 kudos

Never miss a beat! Stay up to date with all Databricks news, including product updates, and helpful product tips by signing up for our monthly newslet...

Never miss a beat! Stay up to date with all Databricks news, including product updates and helpful product tips, by signing up for our monthly newsletter. Note: the newsletter is currently available only to AWS & GCP customers.

User16835756816
by Valued Contributor
  • 1071 Views
  • 0 replies
  • 5 kudos

Learn the basics with these resources: Register for an AWS Onboarding Webinar or an Azure Quickstart Lab- Learn the fundamentals from a Customer Succe...

Learn the basics with these resources: register for an AWS Onboarding Webinar or an Azure Quickstart Lab - learn the fundamentals from a Customer Success Engineer & get all your onboarding questions answered live. Started using Databricks, but have que...

User16835756816
by Valued Contributor
  • 1178 Views
  • 0 replies
  • 6 kudos

Welcome to Databricks! Here you will find resources for a successful onboarding experience. In this group you can ask quick questions and have them an...

Welcome to Databricks! Here you will find resources for a successful onboarding experience. In this group you can ask quick questions and have them answered by experts to unblock and accelerate your ramp up with Databricks.

magy
by New Contributor
  • 2950 Views
  • 3 replies
  • 0 kudos

Display, count and write commands stuck after 1st job

Hi, I have problems with displaying and saving a table in Databricks. A simple command can run for hours without any progress. Before that I am not doing any rocket science - the code runs in less than a minute, and I have one join at the end. I am using 7.3 ...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Just Magy, what is your data source? What types of lazy transformations and actions do you have in your code? Do you partition your data? Please provide more details.

2 More Replies
amitdatabricksc
by New Contributor II
  • 7988 Views
  • 2 replies
  • 0 kudos

AttributeError: 'NoneType' object has no attribute 'repartition'

I am using a framework and I have a query where I am doing df = seg_df.select(*).write.option("compression", "gzip") and I am getting the below error. When I don't do the write.option I am not getting the error. Why is it giving me a repartition error? Wh...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @AMIT GADHAVI, could you provide more details? For example, what is your data source? How do you repartition? etc.
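
A hedged guess at the failure mode (the thread never confirms it): DataFrameWriter actions such as .parquet() or .save() return None, so chaining .repartition() after a completed write raises exactly this AttributeError. A sketch of the working order, with a hypothetical partition count and output path:

df = seg_df.select("*")  # select("*") as a string; a bare select(*) is a syntax error

(df.repartition(8)                        # repartition the DataFrame before writing
   .write.option("compression", "gzip")
   .mode("overwrite")
   .parquet("dbfs:/tmp/seg_out"))         # hypothetical path; this returns None,
                                          # so nothing can be chained after it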

1 More Replies
eq
by New Contributor III
  • 5122 Views
  • 7 replies
  • 7 kudos

Resolved! Multi-task Jobs orchestration - simulating onComplete status

Currently, we are investigating how to effectively incorporate Databricks' latest feature for orchestration of tasks - multi-task jobs. The default behaviour is that a downstream task will not be executed if the previous one has failed for some reason...

Latest Reply
User16844513407
New Contributor III

Hi @Stefan V, my name is Jan and I'm a product manager working on job orchestration. Thank you for your question. At the moment this is not something directly supported yet; it is, however, on our radar. If you are interested in having a short conve...

6 More Replies
snoeprol
by New Contributor II
  • 5529 Views
  • 3 replies
  • 2 kudos

Resolved! Unable to open files with python, but filesystem shows files exist

Dear community, I have the following problem: I have uploaded an ML-model file and transferred it to the directory with %fs mv '/FileStore/Tree_point_classification-1.dlpk' '/dbfs/mnt/group22/Tree_point_classification-1.dlpk' When I now check ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

There is dbfs:/dbfs/ displayed, so maybe the file is in the /dbfs/dbfs directory? Please check it and try to open it with open('/dbfs/dbfs. You can also use "Data" from the left menu to check what is in the DBFS file system more easily.
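
A short sketch of the path mix-up, assuming the file really did land under dbfs:/dbfs/ as the reply suspects: %fs paths are already DBFS-rooted, so a destination of '/dbfs/mnt/...' inside %fs mv creates dbfs:/dbfs/mnt/... instead.

# Move the file back to the intended location (dbfs: scheme, no /dbfs prefix).
dbutils.fs.mv(
    "dbfs:/dbfs/mnt/group22/Tree_point_classification-1.dlpk",  # where it landed
    "dbfs:/mnt/group22/Tree_point_classification-1.dlpk",       # intended spot
)

# Python's local file API sees the same file under the /dbfs mount point.
with open("/dbfs/mnt/group22/Tree_point_classification-1.dlpk", "rb") as f:
    header = f.read(16)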

2 More Replies
alonisser
by Contributor II
  • 2818 Views
  • 2 replies
  • 1 kudos

Resolved! Accessing confluent schema registry from databricks with scala fails with 401 (just for scala, not python, just in databricks)

Note, I've tested with the same connection variables: locally with Scala - works (via the same prod schema registry); in the cluster with Python - works; in the cluster with Scala - fails with a 401 auth error. def setupSchemaRegistry(schemaRegistryUrl: String...

Latest Reply
alonisser
Contributor II

Found the issue: it's the uber package mangling some dependency resolution, which I fixed. Another issue is that currently you can't use the 6.* branch of the Confluent schema registry client in Databricks, because the Avro version is different than the one su...

1 More Replies
kjoth
by Contributor II
  • 19614 Views
  • 5 replies
  • 5 kudos

Resolved! Databricks default python libraries list & version

We are using Databricks. How do we know which default libraries are installed in Databricks & which versions are installed? I have run pip list, but couldn't find pyspark in the returned list.

Latest Reply
jose_gonzalez
Databricks Employee

Hi @karthick J, if you would like to see all the libraries installed in your cluster and their versions, then I recommend checking the "Environment" tab. There you will be able to find all the libraries installed in your cluster. Please follow t...
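
As a quick complement to the Environment tab, the versions can also be read from a notebook; pyspark does not appear in pip list because the runtime puts it on the Python path rather than installing it via pip, but it is still importable:

import pyspark

print(pyspark.__version__)  # PySpark version bundled with the runtime
print(spark.version)        # Spark version of the live session (spark is predefined)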

4 More Replies
