Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by a2_ish (New Contributor II)
  • 3064 Views
  • 1 replies
  • 0 kudos

Where are Delta Lake files stored for a given path?

I have the code below, which works for the path below but fails when the path is an Azure storage account path. I have enough access to write to and update the storage account. I would like to know what I am doing wrong and, for the path below that works, how can I phys...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Ankit Kumar: The error message you received indicates that the user does not have sufficient permission to access the Azure Blob Storage account. You mentioned that you have enough access to write and update the storage account, but it's possible t...
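For reference, an Azure storage account path used with Spark is normally an ADLS Gen2 URI of the form `abfss://<container>@<account>.dfs.core.windows.net/<path>`, and account-key access is configured through a per-account Spark setting. The sketch below only builds those two strings; the container, account, secret scope, and paths are hypothetical placeholders, and account-key auth is just one option (service principals or Unity Catalog external locations are others).

```python
def abfss_path(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def account_key_conf(account: str) -> str:
    """Spark config key that holds the storage account key for `account`."""
    return f"fs.azure.account.key.{account}.dfs.core.windows.net"


# Hypothetical names, for illustration only.
target = abfss_path("lake", "mystorageacct", "delta/events")
conf_key = account_key_conf("mystorageacct")
print(target)    # abfss://lake@mystorageacct.dfs.core.windows.net/delta/events
print(conf_key)  # fs.azure.account.key.mystorageacct.dfs.core.windows.net

# In a Databricks notebook (not runnable outside one), these would be used as:
# spark.conf.set(conf_key, dbutils.secrets.get(scope="my-scope", key="storage-key"))
# df.write.format("delta").save(target)
```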

by vicusbass (New Contributor II)
  • 19872 Views
  • 3 replies
  • 1 kudos

How to extract values from a JSON array field?

Hi, while creating a SQL notebook, I am struggling to extract some values from a JSON array field. I need to create a view where a field would be an array of values extracted from a field like the one below; specifically, I need the `value` fi...

Latest Reply
vicusbass
New Contributor II
  • 1 kudos

Maybe I didn't explain it correctly. The JSON snippet from the description is a cell from a row from a table.
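The extraction logic itself is small: parse the array, then pull `value` out of each element. The sketch below assumes each cell holds a JSON string shaped like `[{"key": ..., "value": ...}, ...]`; the actual field from the question is not shown, so that shape is a guess. In Spark SQL the analogous expression would be something like `transform(from_json(col, 'array<struct<key:string,value:string>>'), x -> x.value)`, under the same schema assumption.

```python
import json


def extract_values(cell: str, field: str = "value") -> list:
    """Return the `field` entry of every object in a JSON array string."""
    items = json.loads(cell)
    # Skip elements that are not objects or that lack the requested field.
    return [item[field] for item in items if isinstance(item, dict) and field in item]


# Hypothetical cell contents (assumed shape, not taken from the question).
cell = '[{"key": "color", "value": "red"}, {"key": "size", "value": "L"}]'
values = extract_values(cell)
print(values)  # ['red', 'L']
```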

by labromb (Contributor)
  • 3497 Views
  • 2 replies
  • 0 kudos

Getting a Databricks SQL dashboard to recognise a change to an underlying query

Hi Community, Scenario: I have created a query in Databricks SQL, built a number of visualisations from it, and published them to a dashboard. I then realise that I need to add another field to the underlying query, which I then want to leverage as a dashb...

Latest Reply
youssefmrini
Databricks Employee
  • 0 kudos

Can you take a screenshot?

by Tim_T (New Contributor)
  • 1241 Views
  • 0 replies
  • 0 kudos

Are training/ecommerce data tables available as CSVs?

The course "Apache Spark™ Programming with Databricks" requires data sources such as training/ecommerce/events/events.parquet. Are these available as CSV files? My company's Databricks configuration does not allow me to mount to such repositories, bu...

by Hitesh_goswami (New Contributor)
  • 1606 Views
  • 1 replies
  • 0 kudos

Upgrading the IPython version without changing the LTS version

I am using a specific PyDeequ function called ColumnProfilerRunner, which is only supported with Spark 3.0.1, so I must use 7.3 LTS. Currently, I am trying to install the "great_expectations" library in Python, which requires IPython version==7.16.3, an...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Hitesh Goswami: please check whether the following helps. To upgrade the IPython version on a Databricks 7.3 LTS cluster, you can follow these steps: Create a new library installation command using the Databricks CLI by running the following command in your l...

by JGil (New Contributor III)
  • 3828 Views
  • 5 replies
  • 0 kudos

Installing Bazel on databricks cluster

I am new to Azure Databricks. I want to install a library on a cluster, and to do that I need to install the Bazel build tool first. I checked the Bazel site, but I am still not sure how to do it in Databricks. I would appreciate it if anyone could help me and write me...

Latest Reply
Avinash_94
New Contributor III
  • 0 kudos

Databricks migrated from the standard Scala Build Tool (SBT) to Bazel to build, test, and deploy its Scala code. Follow this doc: https://www.databricks.com/blog/2019/02/27/speedy-scala-builds-with-bazel-at-databricks.html

by afzi (New Contributor II)
  • 3264 Views
  • 1 replies
  • 1 kudos

Pandas DataFrame error when using to_csv

Hi everyone, I would like to write a Pandas DataFrame to /dbfs/FileStore/ using the to_csv method. Usually it just writes the DataFrame to the given path, but it has been giving me "FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStor...

Latest Reply
Avinash_94
New Contributor III
  • 1 kudos

f = open("/dbfs/mnt/blob/myNames.txt", "r")
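One frequent cause of that `FileNotFoundError` is that the parent directory of the target file does not exist yet: plain `open()` raises it, and pandas' `to_csv` also refuses to create missing directories. The sketch below creates the directory first and then writes with the standard `csv` module; with pandas, the same `os.makedirs` call would precede `df.to_csv(path, index=False)`. It runs against a temporary local directory; the `/dbfs/...` path in the comment is hypothetical and assumes the `/dbfs` FUSE mount is available on the cluster.

```python
import csv
import os
import tempfile


def write_csv(path, header, rows):
    """Write rows to a CSV file, creating missing parent directories first."""
    parent = os.path.dirname(path)
    if parent:
        # open() fails with FileNotFoundError if this directory is missing.
        os.makedirs(parent, exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


# Demo against a local temp directory; on a cluster a path such as
# "/dbfs/FileStore/exports/names.csv" (hypothetical) would be used instead.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "exports", "2023", "names.csv")
    write_csv(target, ["name"], [["alice"], ["bob"]])
    with open(target, newline="") as f:
        written = list(csv.reader(f))

print(written)  # [['name'], ['alice'], ['bob']]
```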

by User16826992783 (New Contributor II)
  • 1571 Views
  • 1 replies
  • 0 kudos

Why are some of my AWS EBS volumes in my workspace unencrypted?

I noticed that 30 GB of my EBS volumes are unencrypted. Is there a reason for this, and is there a way to encrypt these volumes?

Latest Reply
Abishek
Databricks Employee
  • 0 kudos

https://docs.databricks.com/security/keys/customer-managed-keys-storage-aws.html#introduction
The Databricks cluster’s EBS volumes (optional): For Databricks Runtime cluster nodes and other compute resources in the Classic data plane, you can option...

by wb (New Contributor II)
  • 1514 Views
  • 1 replies
  • 2 kudos

Import paths using repos and installed libraries get confused

We use Azure DevOps and Azure Databricks and have custom Python libraries. I placed my notebooks in the same repo, and the structure is like this:
mylib/
mylib/__init__.py
mylib/code.py
notebooks/
notebooks/job_notebook.py
setup.py
Azure pipelines buil...

Latest Reply
Avinash_94
New Contributor III
  • 2 kudos

It looks for the configs locally, I suppose. If you can share your requirements.txt, I can elaborate.
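A likely source of this kind of confusion is that Python resolves `import mylib` by walking `sys.path` front to back, so whichever copy appears first (the repo checkout or the wheel installed from `setup.py`) wins, and an already-imported copy stays cached in `sys.modules`. The sketch below forces the copy under a given root to win; `import_from` is a hypothetical helper, and on Databricks the root would be the repo checkout's directory (its exact path depends on the workspace, so treat that as an assumption).

```python
import importlib
import os
import sys
import tempfile


def import_from(root: str, module_name: str):
    """Import module_name so that the copy under `root` wins over other copies."""
    if root in sys.path:
        sys.path.remove(root)
    sys.path.insert(0, root)  # sys.path is searched front to back
    # Forget any copy (and its submodules) that was already imported from elsewhere.
    for name in list(sys.modules):
        if name == module_name or name.startswith(module_name + "."):
            del sys.modules[name]
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Demo: a throwaway "repo" containing a package named mylib.
with tempfile.TemporaryDirectory() as repo_root:
    pkg_dir = os.path.join(repo_root, "mylib")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write('SOURCE = "repo checkout"\n')
    mod = import_from(repo_root, "mylib")
    source = mod.SOURCE
    loaded_from = mod.__file__
    sys.path.remove(repo_root)  # tidy up the demo path entry

print(source)  # repo checkout
```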

by User16826990884 (New Contributor III)
  • 2827 Views
  • 1 replies
  • 1 kudos

Delta log retention

Is there an impact on performance if I increase the Delta log retention to 3000?

Latest Reply
DD_Sharma
New Contributor III
  • 1 kudos

There will be no performance impact if you keep the Delta log retention at 3000. However, it will increase storage cost, so it's not advisable to use a large number unless the business use case really needs it. The default delta.logRetenti...
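If the retention is raised, it is set as the `delta.logRetentionDuration` table property. The sketch below only builds the statement string to pass to `spark.sql`; the table name is hypothetical, and reading "3000" as days is an assumption, since the question does not give a unit.

```python
def log_retention_sql(table_name: str, days: int) -> str:
    """Build a statement that sets delta.logRetentionDuration on a Delta table."""
    if days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval {days} days')"
    )


stmt = log_retention_sql("main.analytics.events", 3000)  # hypothetical table
print(stmt)
# In a Databricks notebook: spark.sql(stmt)
```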

by Saurabh98290 (New Contributor II)
  • 1183 Views
  • 1 replies
  • 2 kudos

Best Suited Language To Parallelize Notebook

If we are writing code for parallel execution in a notebook, which language is better suited for that: Python or Scala?

Latest Reply
User16756723392
Databricks Employee
  • 2 kudos

You need to test in both Python and Scala; depending on the complexity, one may outperform the other. In some cases Python was faster, whereas in others Scala was. It is all about the efficiency of the code.
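Whichever language is chosen, a common way to run independent pieces of work concurrently from a notebook's driver is a thread pool. Threads fit best when each task mostly waits on Spark jobs or child notebooks; in Python the GIL limits CPU-bound pure-Python work in threads, which is one place Scala on the JVM can behave differently. `run_in_parallel` is a hypothetical helper, and the `dbutils.notebook.run` usage in the comment assumes a Databricks notebook with placeholder notebook paths.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(task, inputs, max_workers=4):
    """Run task(x) for every x concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order; if a task raised, the same
        # exception is re-raised here when its result is reached.
        return list(pool.map(task, inputs))


squares = run_in_parallel(lambda x: x * x, range(5))
print(squares)  # [0, 1, 4, 9, 16]

# Hypothetical Databricks usage (notebook paths are placeholders):
# results = run_in_parallel(
#     lambda path: dbutils.notebook.run(path, 3600),
#     ["./ingest_orders", "./ingest_customers"],
# )
```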

by NimaiAhl (New Contributor II)
  • 1515 Views
  • 1 replies
  • 0 kudos

External Tables - SQL

To create external tables, we need to use the LOCATION keyword with the URL of the storage location. In that case, does the user need permission on the storage location? If not, will we use storage credentials to provide the ac...

Latest Reply
Shikamaru
Databricks Employee
  • 0 kudos

Hi Nimai, That's partially right. You can grant permissions directly on the storage credential, but Databricks recommends that you reference it in an external location and grant permissions to that instead. An external location combines a storage cre...
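The recommended setup (grant on an external location rather than on the storage credential) can be sketched as the Unity Catalog SQL statements below, generated from Python so each can be passed to `spark.sql` in a notebook. All object names (`sales_landing`, `my_storage_cred`, `data-engineers`, `main.sales.orders`) and the URL are hypothetical, and the statements assume a Unity Catalog-enabled workspace where the storage credential already exists.

```python
def external_table_setup_sql(location, url, credential, principal, table, subdir):
    """Return SQL statements for an external location, a grant, and an external table."""
    base = url.rstrip("/")
    return [
        # Register the path once, bound to an existing storage credential.
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {location} "
        f"URL '{base}' WITH (STORAGE CREDENTIAL {credential})",
        # Grant on the external location rather than on the credential itself.
        f"GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION {location} TO `{principal}`",
        # An external table whose LOCATION falls under that external location.
        f"CREATE TABLE IF NOT EXISTS {table} (id INT) LOCATION '{base}/{subdir}'",
    ]


statements = external_table_setup_sql(
    location="sales_landing",
    url="abfss://landing@mystorageacct.dfs.core.windows.net/",
    credential="my_storage_cred",
    principal="data-engineers",
    table="main.sales.orders",
    subdir="orders",
)
for stmt in statements:
    print(stmt)
    # In a Databricks notebook: spark.sql(stmt)
```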

by Kotofosonline (New Contributor III)
  • 2781 Views
  • 1 replies
  • 2 kudos

Query with distinct sort and alias produces error column not found

I’m trying to use a SQL query on Azure Databricks with DISTINCT, sorting, and aliases:
SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId
The problem is that if I add an alias, I cannot use the non-aliased name in the ORDER BY cla...

Latest Reply
User16756723392
Databricks Employee
  • 2 kudos

SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY my_alias
Can you try this?
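Ordering by the alias is the usual way around this. Python's built-in `sqlite3` is a quick stand-in for trying the query shape; SQLite's name-resolution rules are not identical to Spark SQL's, so this only illustrates the pattern and does not test Databricks behavior. The table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (AlbumId INTEGER, ArtistId INTEGER)")
# Made-up sample rows; ArtistId 3 appears twice so DISTINCT has work to do.
conn.executemany("INSERT INTO album VALUES (?, ?)", [(1, 3), (2, 1), (3, 3), (4, 2)])

# DISTINCT plus ORDER BY on the alias that appears in the select list.
rows = conn.execute(
    "SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY my_alias"
).fetchall()
conn.close()
print(rows)  # [(1,), (2,), (3,)]
```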

by UmaMahesh1 (Honored Contributor III)
  • 2789 Views
  • 1 replies
  • 2 kudos

Checkpoint issue when loading data from confluent kafka

I have a streaming notebook that fetches messages from a Confluent Kafka topic and loads them into ADLS. It is a streaming notebook with continuous processing as the trigger. Before loading the message (which is in Avro format), I'm flattening out the...

Latest Reply
Avinash_94
New Contributor III
  • 2 kudos

The best approach is not to depend on Kafka’s commit mechanism! We can store the processing result and the message offset in an external data store within the same database transaction. So, if the database transaction fails, both the commit and the processing will fail and ...

by Himanshu1 (New Contributor II)
  • 2938 Views
  • 1 replies
  • 3 kudos

How to read XML files in Delta Live Tables?

Even after installing the Maven library using auto-installation, the following is not working:
spark.read.option("rowTag", "tag").xml("dbfs:/mnt/dev/bronze/xml/fileName.xml")

Latest Reply
DD_Sharma
New Contributor III
  • 3 kudos

At present, DLT does not support installing Maven libraries from the DLT pipeline. This feature should come in the future, so please wait and keep checking the Databricks Runtime release notes: https://docs.databricks.com/release-note...

