Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jose_gonzalez
by Databricks Employee
  • 6972 Views
  • 1 replies
  • 0 kudos

Resolved! How can I read a specific Delta table part file?

Is there a way to read a specific part file of a Delta table? When I try to read the Parquet file as Parquet, I get an error in the notebook saying that I'm using the incorrect format because the file is part of a Delta table. I just want to read a single Parquet file, not ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

To read the file as Parquet, disable the Delta format check by setting the following Spark setting to false:
>> SET spark.databricks.delta.formatCheck.enabled=false
OR
>> spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false")
It's not recommended to re...
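A minimal sketch of this approach, assuming an active `SparkSession` is passed in as `spark` (as is available in a Databricks notebook). The helper name `read_part_file` and the path argument are hypothetical; only the setting name `spark.databricks.delta.formatCheck.enabled` comes from the reply. The sketch restores the previous value afterwards so the check is relaxed for this one read only.

```python
def read_part_file(spark, part_file_path):
    """Read one Parquet part file of a Delta table with the format check disabled.

    `spark` is assumed to be an active SparkSession; `part_file_path` is a
    hypothetical path to a single .parquet file inside the table directory.
    """
    key = "spark.databricks.delta.formatCheck.enabled"
    # Remember the current value (assume "true" if it was never set explicitly).
    previous = spark.conf.get(key, "true")
    spark.conf.set(key, "false")
    try:
        return spark.read.parquet(part_file_path)
    finally:
        # Put the setting back so later reads are checked again.
        spark.conf.set(key, previous)
```

This assumes the format check runs when the DataFrame is defined rather than when an action executes. If that does not hold on your runtime, restore the setting only after the action has completed.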

jose_gonzalez
by Databricks Employee
  • 2129 Views
  • 1 replies
  • 0 kudos

Resolved! Should I run ANALYZE TABLE on Delta tables?

I would like to know whether it is recommended to run ANALYZE TABLE on Delta tables. If not, why?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

You can run ANALYZE TABLE on Delta tables only on Databricks Runtime 8.3 and above. For more details, please refer to the docs: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-aux-analyze-table.html
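For illustration, here is a small sketch that builds the statement from Python. The helper name `analyze_table_sql` and the table and column names are hypothetical; the `COMPUTE STATISTICS` and `FOR COLUMNS` clauses follow the Spark SQL syntax described in the linked docs.

```python
def analyze_table_sql(table_name, columns=None):
    """Build an ANALYZE TABLE statement for Spark SQL.

    With no columns, only table-level statistics are requested; with a list
    of column names, column-level statistics are requested for those columns.
    """
    stmt = f"ANALYZE TABLE {table_name} COMPUTE STATISTICS"
    if columns:
        stmt += " FOR COLUMNS " + ", ".join(columns)
    return stmt


# In a notebook with an active SparkSession named `spark`, one might run:
# spark.sql(analyze_table_sql("sales.orders", ["order_date", "customer_id"]))
```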

User16753724663
by Valued Contributor
  • 2050 Views
  • 1 replies
  • 1 kudos

Download private repo from GitHub Enterprise in Databricks notebook

We are trying to download our repository, which is hosted on GitHub Enterprise, to use its Python libraries in our notebooks. Earlier we had issues with downloading our repository using the Repos feature in the Databricks platform since only notebooks can b...

Latest Reply
User16753724663
Valued Contributor
  • 1 kudos

To fix the issue, pass the token in the clone URL itself:
git clone https://<token>:x-oauth-basic@github.com/owner/repo.git
Example:
%sh
git clone https://<token>@github.com/darshanbargal4747/databricks.git
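The same clone can be sketched from a Python cell so the token is read from the environment instead of being pasted into the notebook. The function names and the `GITHUB_TOKEN` variable name are assumptions for illustration; the URL shape is the one given in the reply.

```python
import os
import subprocess
from urllib.parse import quote


def build_clone_url(token, owner, repo, host="github.com"):
    """Return an HTTPS clone URL that embeds the token as the user name."""
    # Percent-encode the token in case it contains URL-reserved characters.
    return f"https://{quote(token, safe='')}:x-oauth-basic@{host}/{owner}/{repo}.git"


def clone_private_repo(owner, repo, dest, host="github.com", token_env="GITHUB_TOKEN"):
    """Clone the repository into `dest`, reading the token from an env variable."""
    token = os.environ[token_env]  # raises KeyError if the variable is not set
    subprocess.run(["git", "clone", build_clone_url(token, owner, repo, host), dest], check=True)
```

For a GitHub Enterprise installation, `host` would be that server's hostname rather than `github.com`.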

User16753724663
by Valued Contributor
  • 1249 Views
  • 1 replies
  • 0 kudos

Unable to use on-prem MySQL server as we are not able to resolve the hostname

While connecting from a notebook, it returns an error saying the name cannot be resolved.

Latest Reply
User16753724663
Valued Contributor
  • 0 kudos

Since the hostname cannot be resolved, this points to a DNS issue. We can configure a custom DNS using an init script and add it to the cluster:
%scala
dbutils.fs.put("/databricks/<directory>/dns-masq.sh",""" #!/bin/bash #####################################...
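Before and after attaching the init script, it can help to confirm from a notebook whether the name resolves at all. This is a rough diagnostic sketch only, not part of the reply; the function name `can_resolve` and the example hostname are hypothetical.

```python
import socket


def can_resolve(hostname):
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Raised when the resolver cannot find (or reach a server for) the name.
        return False


# Example (hypothetical hostname):
# can_resolve("mysql.internal.example.com")
```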

User16783853906
by Contributor III
  • 1079 Views
  • 0 replies
  • 0 kudos

Verify auto-optimize from delta history

How can I verify from the Delta history whether auto-optimize is activated in the two scenarios below? Will DESC HISTORY show the details in both cases?
1) Auto-optimize set in the table properties
2) Auto-optimize enabled in the Spark session
P.S. - I'm...

User16753724663
by Valued Contributor
  • 1525 Views
  • 1 replies
  • 0 kudos

Resolved! Unable to create a token while deploying the workspace using Terraform

We have automated our deployment with Python APIs; however, we are caught in a situation that we cannot yet solve. We are looking to collect a token during the first deployment within the environment. Currently our API requires a token. Is there...

Latest Reply
User16753724663
Valued Contributor
  • 0 kudos

We can use the API below to create a token, authenticating with the username and password:
curl -X POST -u "admin_email":"xxxx" https://host/api/2.0/token/create -d '{ "lifetime_seconds": 100, "comment": "this is an example token" }'
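Since the deployment in the question is driven from Python, the same call can be sketched with the standard library. The function name `build_token_request` is hypothetical; the endpoint and payload fields are the ones in the curl command above. Whether basic authentication is accepted depends on how the workspace is configured, so treat this as a sketch under that assumption.

```python
import base64
import json
import urllib.request


def build_token_request(host, username, password,
                        lifetime_seconds=100, comment="this is an example token"):
    """Build (but do not send) a POST request to the token/create endpoint."""
    payload = json.dumps(
        {"lifetime_seconds": lifetime_seconds, "comment": comment}
    ).encode("utf-8")
    # Equivalent of curl's -u flag: HTTP basic authentication.
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/token/create",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
    )


# To send it, something like:
# with urllib.request.urlopen(build_token_request(host, user, pw)) as resp:
#     body = json.load(resp)
```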

User16826992666
by Valued Contributor
  • 9813 Views
  • 1 replies
  • 1 kudos

Resolved! Can you import a Jupyter notebook to a Databricks workspace?

Also curious if you can export a notebook created in Databricks as a Jupyter notebook

Latest Reply
User16826992666
Valued Contributor
  • 1 kudos

Yes, the .ipynb format is a supported file type which can be imported to a Databricks workspace. Note that some special configurations may need to be adjusted to work in the Databricks environment. Additional accepted file formats which can be import...
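Besides the workspace UI, an import can also be scripted. The sketch below is not from the reply: it assumes the Workspace REST API's import endpoint (`/api/2.0/workspace/import`) with `format` set to `JUPYTER`, and the helper name and workspace path are hypothetical. It only builds the JSON payload; sending it requires an authenticated POST to your workspace.

```python
import base64


def build_import_payload(ipynb_bytes, workspace_path, overwrite=False):
    """Build the JSON body for importing a Jupyter notebook into a workspace.

    `ipynb_bytes` is the raw content of the .ipynb file; the API expects it
    base64-encoded in the `content` field.
    """
    return {
        "path": workspace_path,
        "format": "JUPYTER",
        "content": base64.b64encode(ipynb_bytes).decode("ascii"),
        "overwrite": overwrite,
    }


# Example (hypothetical paths):
# with open("analysis.ipynb", "rb") as f:
#     payload = build_import_payload(f.read(), "/Users/someone@example.com/analysis")
```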

User16826992666
by Valued Contributor
  • 1979 Views
  • 1 replies
  • 0 kudos

Resolved! What should I be looking for when evaluating the performance of a Spark job?

Where do I start when performance tuning my queries? Are there particular things I should be looking out for?

Latest Reply
Srikanth_Gupta_
Databricks Employee
  • 0 kudos

A few things off the top of my head:
1) Check the Spark UI to see which stage is taking the most time.
2) Check for data skew.
3) Data skew can severely degrade query performance. Spark SQL accepts skew hints in queries; also make sure to use proper join h...
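One rough way to put a number on the skew check is to compare the largest key group against the typical one. This is an illustrative sketch, not something from the reply: the function name `skew_ratio` is hypothetical, and the input is assumed to be a mapping from join key to row count (which could be collected from something like `df.groupBy("key").count()`).

```python
from statistics import median


def skew_ratio(rows_per_key):
    """Return max group size divided by median group size.

    A value near 1 suggests evenly sized groups; a large value suggests that
    a few keys hold most of the rows.
    """
    counts = list(rows_per_key.values())
    if not counts:
        return 0.0
    return max(counts) / median(counts)
```

What counts as a "large" ratio is a judgment call that depends on the data volume and cluster size.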

User16826992666
by Valued Contributor
  • 803 Views
  • 1 replies
  • 0 kudos

Does Databricks SQL support any kind of custom visuals?

Wondering if I can make any kind of custom visuals or are the ones that come built in the only options?

Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

At this time the only available visuals are the ones that are included in the Databricks SQL environment. There is no way to import or create custom visuals.

User16826992666
by Valued Contributor
  • 847 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

No, you do not. Although Delta is the default file format when writing data using Databricks, any file type supported by Spark can be used when writing data.
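As a brief sketch of what choosing another format looks like, the helper below wraps the standard `DataFrameWriter` chain. The helper name `write_as` and the example path are hypothetical, and `df` is assumed to be an existing Spark DataFrame.

```python
def write_as(df, file_format, path, mode="error"):
    """Write a DataFrame in an explicitly chosen format instead of the default.

    `file_format` is any format name Spark supports, e.g. "parquet", "csv",
    or "json". The default mode "error" fails if the path already exists.
    """
    df.write.format(file_format).mode(mode).save(path)


# Example (hypothetical path):
# write_as(df, "parquet", "/mnt/output/events_parquet")
```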

User16826992666
by Valued Contributor
  • 2500 Views
  • 1 replies
  • 0 kudos

What happens if a spot instance worker is lost in the middle of a query?

Does the query have to be re-run from the start, or can it continue? I'm trying to evaluate the risk of using spot instances for production jobs.

Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

If a spot instance is reclaimed in the middle of a job, Spark will treat it as a lost worker. The Spark engine will automatically retry the tasks from the lost worker on other available workers. So the query does not have to start over if indivi...

User16826992666
by Valued Contributor
  • 962 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

No, the HTML is a point-in-time snapshot of the notebook from when you perform the export. Visuals and data results do not update in the HTML when updates are made on the notebook still in the workspace.

User16826992666
by Valued Contributor
  • 802 Views
  • 1 replies
  • 0 kudos

Which MLlib library am I supposed to use - pyspark.mllib or pyspark.ml?

Both of these libraries seem to be available, and both are for MLlib. How do I know which one to use?

Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

The pyspark.mllib library is built for RDDs, and the pyspark.ml library is built for DataFrames. The RDD-based mllib library is currently in maintenance mode, while the DataFrame-based library will continue to receive updates and active development. For t...

