Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

scvbelle
by New Contributor III
  • 4812 Views
  • 3 replies
  • 3 kudos

Resolved! DLT failure: ABFS does not allow files or directories to end with a dot

In my DLT pipeline outlined below, which generically cleans identifier tables, the initial streaming tables are created successfully from the append-only sources, but the pipeline then fails when trying to create the second set of cleaned tables with the following: It's cl...

Data Engineering
abfss
azure
dlt
engineering
Latest Reply
Priyanka_Biswas
Databricks Employee
  • 3 kudos

Hi @scvbelle The error message you're seeing is caused by an IllegalArgumentException error due to the restriction in Azure Blob File System (ABFS) that does not allow files or directories to end with a dot. This error is thrown by the trailingPeriod...
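A minimal sketch of the kind of pre-write sanitization this reply points to, assuming the offending dots come from identifier values in the source data (the helper and column names are illustrative, not from the thread):

from pyspark.sql import functions as F

def strip_trailing_dots(df, cols):
    # Trim trailing dots from values that later become ABFS file or directory
    # names, since ABFS rejects names ending with a dot.
    for c in cols:
        df = df.withColumn(c, F.regexp_replace(F.col(c), r"\.+$", ""))
    return df

# e.g. cleaned_df = strip_trailing_dots(raw_df, ["identifier"])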

2 More Replies
kinsun
by New Contributor II
  • 25065 Views
  • 5 replies
  • 2 kudos

Resolved! DBFS and Local File System Doubts

Dear Databricks Expert, I have some doubts when dealing with DBFS and the local file system. Case 01: Copy a file from ADLS to DBFS. I am able to do so with the Python code below:
#spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.a...
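For reference, a hedged sketch of the ADLS-to-DBFS copy the post describes, assuming OAuth with an Azure service principal; every <placeholder> is illustrative:

# Configure OAuth for the storage account (values are placeholders).
spark.conf.set("fs.azure.account.auth.type.<storage>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage>.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage>.dfs.core.windows.net", "<app-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage>.dfs.core.windows.net",
               dbutils.secrets.get(scope="<scope>", key="<key>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage>.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# Copy a single file from ADLS Gen2 into DBFS.
dbutils.fs.cp("abfss://<container>@<storage>.dfs.core.windows.net/path/file.csv",
              "dbfs:/tmp/file.csv")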

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @KS LAU, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your q...

4 More Replies
Madison
by New Contributor II
  • 12542 Views
  • 1 reply
  • 0 kudos

AnalysisException: [ErrorClass=INVALID_PARAMETER_VALUE] Missing cloud file system scheme

I am trying to follow along with the Apache Spark Programming training module, where the instructor creates an events table from a parquet file like this:
%sql CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "/mnt/training/ecommerce/events/events.par...

Data Engineering
Databricks SQL
Latest Reply
Madison
New Contributor II
  • 0 kudos

@Retired_mod Thanks for your response. I didn't provide a cloud file system scheme in the path while creating the table using the DataFrame API, but I was still able to create the table.
%python
# File location and type
file_location = "/mnt/training/ecom...
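One way to make the SQL variant work is to spell out the scheme explicitly. A hedged sketch (the path is the training one from the post; the dbfs:/ prefix is the assumption):

spark.sql("""
  CREATE TABLE IF NOT EXISTS events
  USING parquet
  OPTIONS (path "dbfs:/mnt/training/ecommerce/events/events.parquet")
""")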

Nino
by Contributor
  • 9770 Views
  • 8 replies
  • 5 kudos

Resolved! Where in Hive Metastore can the s3 locations of Databricks tables be found?

I have a few Databricks clusters, some share a single Hive Metastore (HMS), call them PROD_CLUSTERS, and an additional cluster, ADHOC_CLUSTER, which has its own HMS. All my data is stored in S3, as Databricks delta tables: PROD_CLUSTERS have read-wri...

Data Engineering
HMS
metastore
Latest Reply
Nino
Contributor
  • 5 kudos

Something went wrong there; here's the last sentence: I expected "location" to be the s3 path, but it's not always so (elaborated in the original posting). Thanks!
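For readers who want the locations programmatically, a sketch that works for Delta tables (the database name is illustrative; DESCRIBE DETAIL exposes a location column):

# List every table in a database together with its storage location.
for t in spark.catalog.listTables("my_db"):
    detail = spark.sql(f"DESCRIBE DETAIL my_db.{t.name}").collect()[0]
    print(t.name, detail["location"])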

7 More Replies
sourander
by New Contributor III
  • 18209 Views
  • 13 replies
  • 7 kudos

Resolved! Protobuf deserialization in Databricks

Hi, let's assume I have these things: a binary column containing protobuf-serialized data, and the .proto file including the message definition. What different approaches have Databricks users chosen to deserialize the data? Python is the programming language that...

Latest Reply
Amou
Databricks Employee
  • 7 kudos

We've now added a native connector that parses protobuf directly into Spark DataFrames: https://docs.databricks.com/en/structured-streaming/protocol-buffers.html
from pyspark.sql.protobuf.functions import to_protobuf, from_protobuf
schema_registry_options = ...
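A hedged completion of that snippet, following the linked docs; the payload column name, subject, and registry address are assumptions:

from pyspark.sql.protobuf.functions import from_protobuf

schema_registry_options = {
    "schema.registry.subject": "my-topic-value",
    "schema.registry.address": "https://schema-registry:8081",
}

# Decode the protobuf-serialized binary column into a struct column.
decoded = df.select(
    from_protobuf("payload", options=schema_registry_options).alias("event")
)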

12 More Replies
mjbobak
by Contributor
  • 31384 Views
  • 5 replies
  • 9 kudos

Resolved! How to import a helper module that uses databricks specific modules (dbutils)

I have a main Databricks notebook that runs a handful of functions. In this notebook, I import a helper.py file that is in the same repo, and when I execute the import everything looks fine. Inside my helper.py there's a function that leverages built-i...

Latest Reply
amitca71
Contributor II
  • 9 kudos

Hi, I'm facing a similar issue when deploying via dbx. I have a helper notebook that works fine when executed via Jobs (without any includes), but when I deploy it via dbx (to the same cluster), the helper notebook fails with: dbutils.fs.ls(path) NameEr...
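One widely used workaround (an assumption here, not the thread's confirmed fix) is to construct dbutils inside the helper module instead of relying on the notebook-injected global:

from pyspark.sql import SparkSession

def get_dbutils(spark: SparkSession):
    # On a Databricks cluster, DBUtils can be built from the SparkSession;
    # the fallback covers notebook contexts where dbutils already exists.
    try:
        from pyspark.dbutils import DBUtils
        return DBUtils(spark)
    except ImportError:
        import IPython
        return IPython.get_ipython().user_ns["dbutils"]

# In helper.py: dbutils = get_dbutils(SparkSession.builder.getOrCreate())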

4 More Replies
jagger9919
by New Contributor II
  • 9701 Views
  • 6 replies
  • 5 kudos

Unable to login to community edition

Hello there, I have successfully created a Databricks account and tried to log in to Community Edition with the exact same login credentials as my account, but it tells me that the email/password are invalid. I can log in with these same exact credenti...

Latest Reply
xxl4tomxu98
New Contributor III
  • 5 kudos

I have the same issue. I had previously been able to log in; the problem started only a few months ago. Can anyone help?

5 More Replies
lnights
by New Contributor II
  • 6180 Views
  • 5 replies
  • 2 kudos

High cost of storage when using structured streaming

Hi there, I read data from Azure Event Hub and, after manipulating the data, write the dataframe back to Event Hub (I use this connector for that):
#read data
df = (spark.readStream
      .format("eventhubs")
      .options(**ehConf)
      ...

[Attached chart: transactions in Azure storage]
Latest Reply
PetePP
New Contributor II
  • 2 kudos

I had the same problem when starting with Databricks. As outlined above, it is the shuffle partitions setting that results in a number of files equal to the number of partitions. Thus, you are writing a low data volume but get taxed on the amount of write (a...
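A sketch of that mitigation, assuming the df and ehConf from the original post; the partition count and checkpoint path are illustrative:

# Fewer shuffle partitions -> fewer files (and storage transactions) per micro-batch.
spark.conf.set("spark.sql.shuffle.partitions", "8")

query = (
    df.coalesce(1)  # collapse to a single output partition before writing
      .writeStream
      .format("eventhubs")
      .options(**ehConf)
      .option("checkpointLocation", "/tmp/checkpoints/eventhub-out")
      .start()
)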

4 More Replies
Gustavo_Az
by Contributor
  • 16460 Views
  • 1 reply
  • 1 kudos

Resolved! Access Azure KeyVault from all executors in Databricks

Hello, I suspect that this can't be done out of the box and want to know a way of doing it; I have been trying without success. So far I have tried this: based on this link, I have created a class and an object (companion and not, both ways) for cip...

Data Engineering
KeyVault
Scala
Latest Reply
Gustavo_Az
Contributor
  • 1 kudos

I found a workaround that makes the secrets from the KeyVault usable on all the executors. So far I have only tested this in notebooks; I want to try it later in a JAR job. First, here is a link to the official documentation that highlights...
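A minimal sketch of the broadcast flavor of this workaround (an assumption based on the description above, not the author's exact code); the scope and key names are illustrative:

# Fetch the secret once on the driver (dbutils is driver-only) ...
secret_value = dbutils.secrets.get(scope="my-keyvault-scope", key="cipher-key")

# ... then broadcast it so executor-side code can read it.
secret_bc = spark.sparkContext.broadcast(secret_value)

def process_partition(rows):
    key = secret_bc.value  # resolved locally on each executor
    for row in rows:
        yield row  # decrypt/encrypt with key here

result = df.rdd.mapPartitions(process_partition)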

Policepatil
by New Contributor III
  • 2705 Views
  • 0 replies
  • 0 kudos

Writing data to RDS table taking more time

Hi, cluster configuration and RDS configuration details are in the attached screenshots. I have 30 files, each with 540,000 records. I read all the files and created one dataframe. When I write the dataframe (16,200,000 records) to a table it takes a long time, more than 1 hour (so...
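Since the post has no replies, here is a hedged sketch of JDBC write tuning that is commonly suggested for this symptom; all values, placeholders, and the MySQL-specific property are assumptions:

(df.repartition(8)  # one JDBC connection per partition
   .write
   .format("jdbc")
   .option("url", "jdbc:mysql://<rds-endpoint>:3306/<db>")
   .option("dbtable", "target_table")
   .option("user", "<user>")
   .option("password", "<password>")
   .option("batchsize", 10000)  # rows per batched INSERT
   .option("rewriteBatchedStatements", "true")  # MySQL driver optimization
   .mode("append")
   .save())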

RC
by Contributor
  • 3879 Views
  • 1 reply
  • 1 kudos

Error while creating table with Glue catalog

Hi, I have a Databricks cluster that was earlier connected to the Hive metastore, and we have started migrating to the Glue catalog. I'm facing an issue while creating a table: Path must be absolute: <table-name>-__PLACEHOLDER__ We have provided full access to Glue and S3 in...
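One commonly suggested mitigation (an assumption, not a confirmed resolution of this post) is to give the table an explicit absolute location rather than relying on a default database path; names and bucket are illustrative:

spark.sql("""
  CREATE TABLE my_db.my_table (id INT, name STRING)
  USING delta
  LOCATION 's3://my-bucket/warehouse/my_db/my_table'
""")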

erigaud
by Honored Contributor
  • 2406 Views
  • 1 reply
  • 0 kudos

Cannot update databricks repos in Devops Pipeline

Hello, I am creating a DevOps pipeline to run unit tests on my notebooks using the Nutter library. When a commit is pushed to a branch, a pipeline triggers that should update my repo in a Staging folder (/Repos/Staging/MyRepo). For that I...

Latest Reply
User16539034020
Databricks Employee
  • 0 kudos

Hello, thanks for contacting Databricks Support. The error message indicates that there is an issue with the URL or endpoint you are using with the Databricks repos update command. It appears that one or more required parameters are not being set corr...
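For reference, a hedged sketch of the Repos API call such a pipeline typically makes (PATCH /api/2.0/repos/{repo_id}); the host, token, and repo id are placeholders:

import requests

host = "https://<workspace-url>"
token = "<databricks-token>"
repo_id = "<repo-id>"  # discover via GET /api/2.0/repos

resp = requests.patch(
    f"{host}/api/2.0/repos/{repo_id}",
    headers={"Authorization": f"Bearer {token}"},
    json={"branch": "main"},  # branch that triggered the pipeline
)
resp.raise_for_status()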

ywaihong6123
by New Contributor
  • 6624 Views
  • 1 reply
  • 0 kudos

Libraries Not Working on Shared Cluster 13.3 LTS

I am facing this error while installing the spark-excel library on the cluster. Does anyone know how to add a library to the artifact allowlist? Jars and Maven Libraries on Shared Clusters must be on the allowlist. Failed Libraries: com.crealytics:spark...

Latest Reply
User16752239289
Databricks Employee
  • 0 kudos

You can add the jar by following these steps.
How to add items to the allowlist: You can add items to the allowlist with Data Explorer or the REST API.
To open the dialog for adding items to the allowlist in Data Explorer, do the following: In your Databricks...
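A hedged sketch of the REST variant mentioned in the reply, using the Unity Catalog artifact-allowlists endpoint; the host and token are placeholders and the matcher details are assumptions:

import requests

host = "https://<workspace-url>"
token = "<databricks-token>"

# Allow Maven artifacts whose coordinates start with the given prefix.
resp = requests.put(
    f"{host}/api/2.1/unity-catalog/artifact-allowlists/LIBRARY_MAVEN",
    headers={"Authorization": f"Bearer {token}"},
    json={"artifact_matchers": [
        {"artifact": "com.crealytics:spark-excel", "match_type": "PREFIX_MATCH"}
    ]},
)
resp.raise_for_status()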

