Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Pbarbosa154
by New Contributor III
  • 8263 Views
  • 7 replies
  • 2 kudos

Ingest Data into Databricks with Kafka

I am trying to ingest data into Databricks with Kafka. I have Kafka installed on a virtual machine, where the data I need is already stored as JSON in a Kafka topic. In Databricks, I have the following code: ```df = (spark.readStream .format("kaf...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

You need to check the driver's logs while your stream is initializing. Please check the log4j output for the driver's logs; if there is an issue connecting to your Kafka broker, you will be able to see it there.

6 More Replies
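For context, a minimal sketch of the kind of Kafka read the post describes, assuming a reachable broker and a topic carrying JSON messages (the host, topic name, and schema below are illustrative, not from the thread):

```python
# Minimal Structured Streaming read from Kafka; `spark` is the ambient
# SparkSession in a Databricks notebook.
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([                      # assumed message schema
    StructField("id", StringType()),
    StructField("payload", StringType()),
])

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka-vm:9092")  # assumed broker
      .option("subscribe", "events")                        # assumed topic
      .option("startingOffsets", "earliest")
      .load())

# Kafka delivers the message body as bytes; cast to string and parse the JSON.
parsed = (df.select(from_json(col("value").cast("string"), schema).alias("j"))
            .select("j.*"))
```

If the stream hangs at initialization, connectivity problems with the broker surface in the driver's log4j output, as the reply notes.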
marcuskw
by Contributor II
  • 8385 Views
  • 1 reply
  • 1 kudos

Resolved! whenNotMatchedBySourceUpdate ConcurrentAppendException Partition

Avoiding ConcurrentAppendException requires a good partitioning strategy. Here my logic works without fault for "whenMatchedUpdate" and "whenNotMatchedInsert" logic; when using "whenNotMatchedBySourceUpdate", however, it seems that the condition doesn't isolate...

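For readers hitting the same conflict, a hedged sketch of scoping every merge clause to the partition being written, so concurrent jobs touching other partitions don't collide. The table name, partition column "region", key "id", and the `updates` source DataFrame are all assumptions; whenNotMatchedBySourceUpdate requires a recent Delta Lake runtime (Delta 2.3+ / DBR 12.2+).

```python
# Restricting a Delta MERGE to one partition to avoid ConcurrentAppendException.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "target_table")   # assumed table name

(target.alias("t")
 .merge(updates.alias("s"), "t.region = 'EU' AND t.id = s.id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 # whenNotMatchedBySourceUpdate scans target rows with no source match;
 # without its own partition predicate it can read the whole table and
 # conflict with appends to unrelated partitions.
 .whenNotMatchedBySourceUpdate(
     condition="t.region = 'EU'",
     set={"active": "false"})
 .execute())
```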
pvm26042000
by New Contributor III
  • 7961 Views
  • 4 replies
  • 2 kudos

Benefit of using vectorized pandas UDFs instead of the standard PySpark UDFs?

What is the benefit of using vectorized pandas UDFs instead of the standard PySpark UDFs?

Latest Reply
Sai1098
New Contributor II
  • 2 kudos

Vectorized pandas UDFs offer improved performance compared to standard PySpark UDFs by leveraging the power of pandas and operating on entire columns of data at once, rather than row by row. They provide a more intuitive and familiar programming inter...

3 More Replies
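To make the comparison concrete, a small illustrative example (the column and function names are made up):

```python
# A standard UDF is invoked once per row; a vectorized pandas UDF is
# invoked once per Arrow batch on a whole pandas Series.
import pandas as pd
from pyspark.sql.functions import udf, pandas_udf
from pyspark.sql.types import DoubleType

@udf(DoubleType())
def plus_one_row(v):                 # row-at-a-time Python call
    return v + 1.0

@pandas_udf(DoubleType())
def plus_one_vec(v: pd.Series) -> pd.Series:   # batch-at-a-time
    return v + 1.0

df = spark.range(1_000_000).selectExpr("cast(id as double) as x")
df.select(plus_one_vec("x").alias("y")).write.format("noop").mode("overwrite").save()
```

The pandas version avoids per-row serialization between the JVM and Python, which is where most of the speedup comes from.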
MUA
by New Contributor
  • 5532 Views
  • 2 replies
  • 1 kudos

OSError: [Errno 7] Argument list too long

Getting this error in Databricks and don't know how to solve it: OSError: [Errno 7] Argument list too long: '/dbfs/databricks/aaecz/dev/w000aaecz/etl-framework-adb/0.4.31-20230503.131701-1/etl_libraries/utils/datadog/restart_datadog.sh'. If anyone can help

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

@MUA Just a friendly follow-up: did any of the responses help you to resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

1 More Replies
lawrence009
by Contributor
  • 15919 Views
  • 3 replies
  • 1 kudos

Troubleshooting Spill

I am trying to troubleshoot why spill occurred during DeltaOptimizeWrite. I am running a 64-core cluster with 256 GB RAM, which I expect to handle this amount of data (see attached DAG).

(Attachment: IMG_1085.jpeg)
Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

You can resolve the spill to memory by increasing the number of shuffle partitions, but 16 GB of spill should not have a major impact on your job execution. Could you share more details on the actual source code that you are running?

2 More Replies
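The knob the reply refers to, as a sketch (the value is an assumption; size it to your data volume):

```python
# More shuffle partitions -> smaller per-task shuffle blocks -> less spill.
spark.conf.set("spark.sql.shuffle.partitions", "512")  # assumed value
```

On recent Databricks runtimes, adaptive query execution can also coalesce shuffle partitions automatically, so an explicit value mainly matters when AQE is off or the default count is far too low for the data volume.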
JKR
by Contributor
  • 6176 Views
  • 4 replies
  • 1 kudos

Resolved! Got Failure: com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException error

Got the below failure on a scheduled job on an interactive cluster, and the next scheduled run executed fine. I want to know why this error occurred, how I can prevent it from happening again, and how to debug these errors in the future. com.databricks.backend.commo...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

@JKR Just a friendly follow-up: did any of the responses help you to resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

3 More Replies
mbejarano89
by New Contributor III
  • 2273 Views
  • 1 reply
  • 1 kudos

Resolved! Cloning content of Repos into shared Workspace

Hello, I have a git repository on Databricks with notebooks that are meant to be shared with other users. The reason these notebooks are in git, as opposed to already being in the "shared" workspace, is that they are continuously improved and need sepa...

Latest Reply
User16539034020
Databricks Employee
  • 1 kudos

Hello, thanks for contacting Databricks Support. I presume you're looking to transfer files from external repositories to the Databricks workspace. I'm afraid there is currently no direct support for it. You may consider using the REST API, which allows for...

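A hedged sketch of the REST-API route the reply mentions, importing a notebook into a shared workspace path with the Workspace API (host, token, and paths are placeholders):

```python
# POST /api/2.0/workspace/import uploads a base64-encoded notebook source.
import base64
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
TOKEN = "<personal-access-token>"                         # placeholder

with open("notebook.py", "rb") as f:
    content = base64.b64encode(f.read()).decode()

resp = requests.post(
    f"{HOST}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "path": "/Shared/team/notebook",   # assumed destination
        "format": "SOURCE",
        "language": "PYTHON",
        "content": content,
        "overwrite": True,
    },
)
resp.raise_for_status()
```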
Bagger
by New Contributor II
  • 2090 Views
  • 1 reply
  • 0 kudos

Databricks Connect - driver error

Hi, we are experiencing instability when executing queries using Databricks Connect. Sometimes we are unable to receive the full result set without encountering an error with the message "Driver is up but is not responsive...". When we run the same query...

Data Engineering
databricks connect
driver
RohitKulkarni
by Contributor II
  • 2618 Views
  • 2 replies
  • 3 kudos

Databricks Libraries

Hello Team, I am trying to use the below libraries in Databricks, but they are not supported: import com.microsoft.spark.sqlanalytics / from com.microsoft.spark.sqlanalytics.Constants import Constants. Please advise the correct library names. Regards, Rohit

Latest Reply
User16752242622
Databricks Employee
  • 3 kudos

Hi @Rohit Kulkarni​, yes, this module is not supported in Databricks. May I know the use case behind using this library in Databricks? FYI: to access Azure Synapse from Databricks, use the Azure Synapse connector. Check the below doc: https://docs.data...

1 More Replies
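A hedged sketch of the Azure Synapse connector the reply points to, which replaces the unsupported com.microsoft.spark.sqlanalytics import (the JDBC URL, tempDir, and table name are placeholders):

```python
# Databricks' built-in Synapse connector stages data through ADLS.
df = (spark.read
      .format("com.databricks.spark.sqldw")
      .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
      .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tmp")
      .option("forwardSparkAzureStorageCredentials", "true")
      .option("dbTable", "dbo.my_table")   # assumed table
      .load())
```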
Soma
by Valued Contributor
  • 9015 Views
  • 11 replies
  • 2 kudos

spark streaming listener is lagging

We use the PySpark streaming listener and it is lagging by 10 hours: data streamed at 10 AM IST is logged at 10 PM IST. Can someone explain how the logging listener interface works?

Latest Reply
michalforgusion
New Contributor II
  • 2 kudos

If you're experiencing lag in a Spark Streaming application, there are several potential reasons and corresponding solutions you can try: 1. **Resource Allocation**: **Insufficient Resources**: Make sure that you have allocated enough resources (CPU,...

10 More Replies
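For anyone measuring this kind of lag, a minimal listener sketch (requires PySpark 3.4+; the print-based logging is illustrative):

```python
# Compare the micro-batch completion timestamp in progress events with
# wall-clock time to see whether the listener itself is falling behind.
from pyspark.sql.streaming import StreamingQueryListener

class ProgressLogger(StreamingQueryListener):
    def onQueryStarted(self, event):
        print(f"query started: {event.id}")

    def onQueryProgress(self, event):
        # event.progress.timestamp is when the batch finished on the engine.
        print(f"batch done at {event.progress.timestamp}")

    def onQueryTerminated(self, event):
        print(f"query terminated: {event.id}")

spark.streams.addListener(ProgressLogger())
```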
Gilg
by Contributor II
  • 3207 Views
  • 2 replies
  • 0 kudos

APPLY_CHANGES with json data

Hi Team, I am building a DLT pipeline and planning to use APPLY_CHANGES from Bronze to Silver. In the bronze table, a column has a JSON value. This value contains questions and answers as key-value pairs and can change depending on the list of questions h...

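A hedged sketch of one way to structure this: parse the free-form JSON into a map in a bronze-to-silver view, then feed that view to APPLY CHANGES. Table names, the business key, the sequencing column, and the JSON column name are all assumptions.

```python
import dlt
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import MapType, StringType

@dlt.view
def bronze_parsed():
    # A map keeps the schema stable even as the question set changes.
    qa_type = MapType(StringType(), StringType())
    return (dlt.read_stream("bronze")
            .withColumn("qa", from_json(col("answers_json"), qa_type)))

dlt.create_streaming_table("silver")

dlt.apply_changes(
    target="silver",
    source="bronze_parsed",
    keys=["id"],                   # assumed business key
    sequence_by=col("event_ts"),   # assumed ordering column
)
```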
SaraCorralLou
by New Contributor III
  • 38053 Views
  • 5 replies
  • 2 kudos

Resolved! Error: The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.

What is the problem? I am getting this error every time I run a Python notebook in my Repo in Databricks. Background: the notebook where I am getting the error creates a dataframe, and the last step is to write the dataframe to a Delta ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Sara Corral​, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

4 More Replies
Kratik
by New Contributor III
  • 2444 Views
  • 0 replies
  • 0 kudos

Spark submit job running python file

I have a spark-submit job which runs one Python file called main.py. The other file is alert.py, which is imported in main.py. main.py also uses multiple config files. alert.py is passed in --py-files and the other config files are passed as ...

Data Engineering
pyfiles
spark
submit
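For reference, a hedged sketch of the usual pattern: files shipped with --files (or modules with --py-files) can be resolved at runtime via SparkFiles (the file names are assumptions):

```python
# spark-submit --py-files alert.py --files app.conf main.py
from pyspark import SparkFiles

conf_path = SparkFiles.get("app.conf")   # resolves the --files copy locally
with open(conf_path) as f:
    config = f.read()
```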
Anonymous
by Not applicable
  • 5602 Views
  • 3 replies
  • 2 kudos
Latest Reply
sajith_appukutt
Databricks Employee
  • 2 kudos

User sessions automatically time out after six hours of idle time. This is not configurable, as @Kunal Gaurav​ mentioned. Please raise a feature request if you have a requirement to configure this. Now, in Azure you could configure the AAD refresh token ...

2 More Replies
alexkit
by New Contributor II
  • 3749 Views
  • 4 replies
  • 3 kudos

ASP1.2 Error create database in Spark Programming with Databricks training

I'm on the Demo and Lab in the Dataframes section. I've imported the dbc into my company cluster and have run "%run ./Includes/Classroom-Setup" successfully. When I run the first SQL command %sql CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "/m...

Latest Reply
KDOCKX
New Contributor II
  • 3 kudos

I had the same issue and solved it like this: in the Includes folder there is a reset notebook; run its first command, which unmounts all mounted databases. Go back to the ASP 1.2 notebook and run the %run ./Includes/Classroom-Setup code block. Then run ...

3 More Replies
