Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Pbarbosa154 (New Contributor III)
  • 6168 Views
  • 7 replies
  • 2 kudos

Ingest Data into Databricks with Kafka

I am trying to ingest data into Databricks with Kafka. I have Kafka installed on a virtual machine, where the data I need is already in a Kafka topic stored as JSON. In Databricks, I have the following code:```df = (spark.readStream .format("kaf...

Latest Reply: jose_gonzalez (Databricks Employee) • 2 kudos

You need to check the driver's logs while your stream is initializing. Look at the log4j output in the driver's logs: if there is an issue connecting to your Kafka broker, you will see it there.
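For context, a minimal Structured Streaming read from Kafka that parses JSON message values might look like the sketch below. It assumes the broker on the VM is reachable from the cluster; the address `my-kafka-vm:9092`, the topic `events`, and the two-field JSON schema are hypothetical placeholders. If the connection fails, the error shows up in the driver's log4j output, as the reply describes.

```python
def kafka_read_options(bootstrap_servers: str, topic: str, starting_offsets: str = "earliest") -> dict:
    """Options for Spark's built-in Kafka source (format "kafka")."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # host:port of the broker(s) on the VM
        "subscribe": topic,                           # topic holding the JSON messages
        "startingOffsets": starting_offsets,          # "earliest" also reads messages already in the topic
    }


def read_kafka_json(spark, bootstrap_servers: str, topic: str):
    """Return a streaming DataFrame with each Kafka value parsed as JSON.

    The DDL schema below is a hypothetical example; replace it with your message layout.
    """
    from pyspark.sql.functions import col, from_json

    schema = "id STRING, amount DOUBLE"  # hypothetical schema of the JSON payload
    raw = (
        spark.readStream
        .format("kafka")
        .options(**kafka_read_options(bootstrap_servers, topic))
        .load()
    )
    # Kafka delivers key and value as binary; cast the value to a string before parsing.
    return (
        raw.select(from_json(col("value").cast("string"), schema).alias("data"))
        .select("data.*")
    )
```

Usage would be `df = read_kafka_json(spark, "my-kafka-vm:9092", "events")` followed by a `writeStream`. Note that `startingOffsets` only applies when a query starts without an existing checkpoint.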

by marcuskw (Contributor II)
  • 4703 Views
  • 1 replies
  • 1 kudos

Resolved! whenNotMatchedBySourceUpdate ConcurrentAppendException Partition

Avoiding ConcurrentAppendException requires a good partitioning strategy, and my logic works without fault for "whenMatchedUpdate" and "whenNotMatchedInsert". When using "whenNotMatchedBySourceUpdate", however, it seems that the condition doesn't isolate...
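One approach worth testing is to put an explicit partition predicate both in the merge condition and in the `whenNotMatchedBySource` clause; without it, the not-matched-by-source clause applies to target rows in every partition, including partitions other writers own. The sketch assumes the `delta-spark` Python API (Delta Lake 2.3 or later, where `whenNotMatchedBySourceUpdate` exists), a target partitioned by a hypothetical `region` column, a hypothetical key `id`, and a hypothetical `is_active` flag. Whether conflict detection then ignores the other partitions depends on the Delta version, so treat this as something to verify rather than a guaranteed fix.

```python
def partition_predicate(alias: str, column: str, values) -> str:
    """Build e.g. "t.region IN ('eu', 'us')" for the partitions this job owns."""
    quoted = ", ".join("'" + str(v).replace("'", "''") + "'" for v in values)
    return f"{alias}.{column} IN ({quoted})"


def merge_partitions(spark, target_table: str, source_df, partition_col: str, partition_values):
    """Hypothetical MERGE scoped to a known set of partitions."""
    from delta.tables import DeltaTable

    target_scope = partition_predicate("t", partition_col, partition_values)
    # Scope the match itself to this job's partitions ...
    condition = f"t.id = s.id AND t.{partition_col} = s.{partition_col} AND {target_scope}"
    (
        DeltaTable.forName(spark, target_table).alias("t")
        .merge(source_df.alias("s"), condition)
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        # ... and the not-matched-by-source clause too; otherwise it targets
        # rows in every partition of the table.
        .whenNotMatchedBySourceUpdate(condition=target_scope, set={"is_active": "false"})
        .execute()
    )
```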

by Ajay-Pandey (Esteemed Contributor III)
  • 6017 Views
  • 5 replies
  • 0 kudos

How can we send Databricks logs to Azure Application Insights?

Hi all, I want to send Databricks logs to Azure Application Insights. Is there any way to do this? Any blog or doc would help.

Latest Reply: floringrigoriu (New Contributor II) • 0 kudos

Hi @Debayan, https://learn.microsoft.com/en-us/azure/architecture/databricks-monitoring/application-logs mentions a GitHub repository, https://github.com/mspnp/spark-monitoring, but that repository is marked as being in maintenance mode. Just...

by pvm26042000 (New Contributor III)
  • 5785 Views
  • 4 replies
  • 2 kudos

Benefit of using vectorized pandas UDFs instead of standard PySpark UDFs?


Latest Reply: Sai1098 (New Contributor II) • 2 kudos

Vectorized pandas UDFs offer better performance than standard PySpark UDFs because they use Apache Arrow to transfer data and operate on whole batches of column values as pandas Series, rather than row by row. They provide a more intuitive and familiar programming inter...
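To make the difference concrete, here is a minimal sketch of the same computation as a row-at-a-time UDF and as a Series-to-Series pandas UDF. The Celsius-to-Fahrenheit conversion is a hypothetical example, and the plain function is kept separate so its vectorized logic can be checked with pandas alone. Pandas UDFs also need PyArrow, which Databricks Runtime includes.

```python
import pandas as pd


def celsius_to_fahrenheit(temps: pd.Series) -> pd.Series:
    # One call handles a whole Arrow batch as a pandas Series; the arithmetic
    # is vectorized instead of running once per row in the Python worker.
    return temps * 9.0 / 5.0 + 32.0


def build_udfs():
    """Return (row_at_a_time_udf, vectorized_udf) wrapping the same conversion."""
    from pyspark.sql.functions import pandas_udf, udf

    # Standard UDF: Spark passes each value to Python and calls this once per row.
    row_udf = udf(lambda t: None if t is None else t * 9.0 / 5.0 + 32.0, "double")
    # Pandas UDF: the Series -> Series type hints select the vectorized variant.
    vectorized_udf = pandas_udf(celsius_to_fahrenheit, "double")
    return row_udf, vectorized_udf
```

On a DataFrame with a hypothetical `temp_c` column, both are applied the same way, e.g. `df.select(vectorized_udf("temp_c"))`; the difference is only in how data crosses the JVM/Python boundary.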

by MUA (New Contributor)
  • 4513 Views
  • 2 replies
  • 1 kudos

OSError: [Errno 7] Argument list too long

I am getting this error in Databricks and don't know how to solve it:
OSError: [Errno 7] Argument list too long: '/dbfs/databricks/aaecz/dev/w000aaecz/etl-framework-adb/0.4.31-20230503.131701-1/etl_libraries/utils/datadog/restart_datadog.sh'
I would appreciate it if anyone can help.

Latest Reply: jose_gonzalez (Databricks Employee) • 1 kudos

@MUA Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.
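For background: on Linux, `[Errno 7] Argument list too long` is `E2BIG`, raised by `exec` when the arguments plus the environment passed to a new process exceed the kernel limit (`ARG_MAX`), or when a single argument or environment string exceeds the per-string limit (128 KiB on typical Linux systems). A very large environment variable is therefore a common culprit, rather than the script path itself. The stdlib-only sketch below estimates both; the helper names are hypothetical, and the byte count is approximate because it ignores the pointer arrays.

```python
import os


def exec_payload_bytes(argv, env) -> int:
    """Approximate bytes exec() copies for argv and env strings (NUL-terminated)."""
    total = sum(len(a.encode()) + 1 for a in argv)
    # Each environment entry is passed as "KEY=VALUE\0".
    total += sum(len(k.encode()) + len(v.encode()) + 2 for k, v in env.items())
    return total


def largest_env_vars(env, n: int = 3):
    """Return the n largest environment entries as (name, size_in_bytes)."""
    sizes = [(k, len(k.encode()) + len(v.encode()) + 2) for k, v in env.items()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    script = "restart_datadog.sh"  # placeholder; use the real script path
    arg_max = os.sysconf("SC_ARG_MAX")  # POSIX limit (Linux/macOS only)
    used = exec_payload_bytes(["bash", script], dict(os.environ))
    print(f"ARG_MAX={arg_max}, approx. used={used}")
    for name, size in largest_env_vars(os.environ):
        print(f"{name}: {size} bytes")
```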

by lawrence009 (Contributor)
  • 13064 Views
  • 3 replies
  • 1 kudos

Troubleshooting Spill

I am trying to troubleshoot why spill occurred during DeltaOptimizeWrite. I am running a 64-core cluster with 256 GB of RAM, which I expect to be able to handle this amount of data (see attached DAG).

Attachment: IMG_1085.jpeg (DAG)
Latest Reply: jose_gonzalez (Databricks Employee) • 1 kudos

You can resolve the spill to memory by increasing the number of shuffle partitions, but 16 GB of memory spill should not have a major impact on your job's execution. Could you share more details on the actual source code you are running?
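The reply's suggestion of increasing shuffle partitions can be made concrete with a rule of thumb: aim for roughly 128 MiB of shuffle data per partition (an assumed target, not a measured optimum) and round up to a multiple of the cluster's total cores so tasks run in full waves. This is a heuristic sketch; it is worth checking in the Spark UI whether the optimized-write shuffle actually follows `spark.sql.shuffle.partitions` on your runtime.

```python
import math


def suggest_shuffle_partitions(shuffle_bytes: int, total_cores: int,
                               target_partition_bytes: int = 128 * 1024 * 1024) -> int:
    """Heuristic: ~target_partition_bytes per partition, rounded up to a multiple of total_cores."""
    needed = max(1, math.ceil(shuffle_bytes / target_partition_bytes))
    waves = math.ceil(needed / total_cores)  # number of full task waves across all cores
    return waves * total_cores


# Usage on the cluster (values are illustrative):
# spark.conf.set("spark.sql.shuffle.partitions",
#                suggest_shuffle_partitions(64 * 1024**3, total_cores=64))
```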

by JKR (Contributor)
  • 4550 Views
  • 4 replies
  • 1 kudos

Resolved! Got Failure: com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException error

I got the failure below on a scheduled job running on an interactive cluster, and the next scheduled run executed fine. I want to know why this error occurred, how I can prevent it from happening again, and how to debug these errors in the future. com.databricks.backend.commo...

Latest Reply: jose_gonzalez (Databricks Employee) • 1 kudos

@JKR Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

by mbejarano89 (New Contributor III)
  • 1617 Views
  • 1 replies
  • 1 kudos

Resolved! Cloning content of Repos into shared Workspace

Hello, I have a Git repository in Databricks with notebooks that are meant to be shared with other users. These notebooks are in Git rather than already in the "Shared" workspace because they will be continuously improved and need sepa...

Latest Reply: User16539034020 (Databricks Employee) • 1 kudos

Hello, thanks for contacting Databricks Support. I presume you're looking to transfer files from external repositories to the Databricks workspace. I'm afraid there is currently no direct support for this. You may consider using the REST API, which allows for...
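One REST-based approach is to take each notebook's source from the repo and import it into a shared folder with the Workspace API's `POST /api/2.0/workspace/import` endpoint. The sketch below assumes a personal access token, the notebook source already read into a string, and hypothetical host and target paths; it uses only the Python standard library and is not an official Databricks client.

```python
import base64
import json
import urllib.request


def build_import_payload(target_path: str, source_code: str,
                         language: str = "PYTHON", overwrite: bool = True) -> dict:
    """Request body for /api/2.0/workspace/import with format SOURCE."""
    return {
        "path": target_path,   # e.g. a notebook path under /Shared
        "format": "SOURCE",
        "language": language,  # needed when format is SOURCE
        "content": base64.b64encode(source_code.encode("utf-8")).decode("ascii"),
        "overwrite": overwrite,  # replace the previously published copy
    }


def import_notebook(host: str, token: str, payload: dict) -> dict:
    """POST the payload; raises urllib.error.HTTPError on a non-2xx response."""
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/workspace/import",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else {}
```

Running this from CI after each merge to the main branch would keep the shared copy in sync, e.g. `import_notebook(host, token, build_import_payload("/Shared/team/etl_notebook", code))`, where the path is a placeholder.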

by Bagger (New Contributor II)
  • 1697 Views
  • 1 replies
  • 0 kudos

Databricks Connect - driver error

Hi, we are experiencing instability when executing queries using Databricks Connect. Sometimes we are unable to receive the full result set without encountering an error with the message "Driver is up but is not responsive...". When we run the same query...

Labels: Data Engineering, databricks connect, driver
by vgupta (New Contributor II)
  • 6865 Views
  • 5 replies
  • 4 kudos

DLT | Cluster terminated by System-User | INTERNAL_ERROR: Communication lost with driver. Cluster 0312-140502-k9monrjc was not reachable for 120 seconds

Dear Community, I hope you are doing well. For the last couple of days I have been seeing very strange issues with my DLT pipeline: every 60-70 minutes it fails in continuous mode with the error INTERNAL_ERROR: Communication lost with driver. Clu...

Attachments: DLT_ERROR, DLT_Cluster_events

Latest Reply: Reddy-24 (New Contributor II) • 4 kudos

Hello @Debayan, I am facing the same issue while running a Delta Live Tables pipeline. This job runs in production, but it's not working in dev. I have tried increasing the number of worker nodes, but it didn't help. Can you please help with this?

by RohitKulkarni (Contributor II)
  • 1957 Views
  • 2 replies
  • 3 kudos

Databricks Libraries

Hello Team, I am trying to use the libraries below in Databricks, but they are not supported:
    import com.microsoft.spark.sqlanalytics
    from com.microsoft.spark.sqlanalytics.Constants import Constants
Please advise the correct library names.
Regards,
Rohit

Latest Reply: User16752242622 (Valued Contributor) • 3 kudos

Hi @Rohit Kulkarni, yes, this module is not supported in Databricks. May I know the use case behind using this library in Databricks? FYI: to access Azure Synapse from Databricks, use the Azure Synapse connector. Check the doc below: https://docs.data...
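For reference, the Azure Synapse connector mentioned in the reply is used through the `com.databricks.spark.sqldw` data source rather than through an `import` of Synapse-specific modules. A minimal read sketch, assuming a Synapse JDBC URL, an ABFSS staging directory whose storage credentials are already configured in the Spark session, and hypothetical table and storage names:

```python
def synapse_read_options(jdbc_url: str, temp_dir: str, table: str) -> dict:
    """Options for the Azure Synapse connector (format "com.databricks.spark.sqldw")."""
    return {
        "url": jdbc_url,            # Synapse JDBC connection string
        "tempDir": temp_dir,        # staging area in ADLS Gen2 / Blob storage
        # Forward the storage credentials Spark uses for tempDir to Synapse.
        "forwardSparkAzureStorageCredentials": "true",
        "dbTable": table,           # table to read
    }


def read_synapse_table(spark, jdbc_url: str, temp_dir: str, table: str):
    """Load a Synapse table as a DataFrame via the connector."""
    return (
        spark.read
        .format("com.databricks.spark.sqldw")
        .options(**synapse_read_options(jdbc_url, temp_dir, table))
        .load()
    )
```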

by Soma (Valued Contributor)
  • 6290 Views
  • 11 replies
  • 2 kudos

Spark streaming listener is lagging

We use a PySpark streaming listener and it is lagging by 10 hours: data streamed at 10 AM IST is logged at 10 PM IST. Can someone explain how the logging listener interface works?

Latest Reply: michalforgusion (New Contributor II) • 2 kudos

If you're experiencing lag in a Spark Streaming application, there are several potential reasons and corresponding solutions you can try:
1. **Resource Allocation**:
   - **Insufficient Resources**: Make sure that you have allocated enough resources (CPU,...
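On how the listener interface works: `StreamingQueryListener` callbacks run on the driver and events are delivered to them sequentially, so one plausible cause of this kind of lag is an `onQueryProgress` that does slow work (for example, a synchronous network call per event), letting events queue up behind it. A minimal sketch of keeping the callback cheap, assuming PySpark 3.4 or later (where Python listeners are available) and a sink function of your own; `ProgressForwarder` is a hypothetical helper, not a Spark API.

```python
import queue
import threading
import traceback


class ProgressForwarder:
    """Hypothetical helper: moves slow work off the listener callback thread."""

    def __init__(self, sink, maxsize: int = 10_000):
        self._queue = queue.Queue(maxsize=maxsize)
        self._sink = sink  # slow function, e.g. a write to an external log store
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, record: dict) -> bool:
        """Enqueue without blocking; returns False if the buffer is full."""
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            return False  # drop rather than stall the listener

    def flush(self) -> None:
        """Block until every submitted record has been passed to the sink."""
        self._queue.join()

    def _drain(self) -> None:
        while True:
            record = self._queue.get()
            try:
                self._sink(record)
            except Exception:
                traceback.print_exc()  # a failing sink must not kill the drain thread
            finally:
                self._queue.task_done()


def make_listener(forwarder: ProgressForwarder):
    """Build a StreamingQueryListener whose callbacks only enqueue (PySpark 3.4+)."""
    from pyspark.sql.streaming import StreamingQueryListener

    class CheapProgressListener(StreamingQueryListener):
        def onQueryStarted(self, event):
            pass

        def onQueryProgress(self, event):
            p = event.progress
            forwarder.submit({
                "id": str(p.id),
                "batchId": p.batchId,
                "timestamp": p.timestamp,  # trigger start time, useful for measuring lag
                "numInputRows": p.numInputRows,
            })

        def onQueryTerminated(self, event):
            pass

    return CheapProgressListener()
```

Register it once on the driver with `spark.streams.addListener(make_listener(ProgressForwarder(my_sink)))`, where `my_sink` stands for your logging function. Comparing each record's `timestamp` with the time the sink runs shows whether the delay is in event delivery or in the sink itself.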

by Gilg (Contributor II)
  • 2723 Views
  • 2 replies
  • 0 kudos

APPLY_CHANGES with JSON data

Hi Team, I am building a DLT pipeline and planning to use APPLY_CHANGES from Bronze to Silver. In the Bronze table, one column holds a JSON value. This value contains questions and answers as key-value pairs and can change depending on the list of questions h...
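Because the set of questions varies, one option is to parse the JSON column into a `map<string,string>` before applying changes, so new question keys do not require schema changes in Silver. A minimal sketch using the DLT Python API, assuming hypothetical table names (`bronze_responses`, `silver_responses`), a key column `response_id`, a sequencing column `updated_at`, and a JSON column `answers_json` whose values are strings; it only runs inside a DLT pipeline, where the `dlt` module is available.

```python
ANSWERS_SCHEMA = "map<string,string>"  # variable question -> answer pairs


def define_silver_pipeline(bronze="bronze_responses", silver="silver_responses",
                           key_col="response_id", seq_col="updated_at", json_col="answers_json"):
    """Register a parsed view and an APPLY CHANGES flow; call from a DLT pipeline notebook."""
    import dlt
    from pyspark.sql.functions import col, from_json

    # The decorator registers this view with the pipeline when the function is defined.
    @dlt.view(name=f"{bronze}_parsed")
    def parsed():
        return (
            dlt.read_stream(bronze)
            .withColumn("answers", from_json(col(json_col), ANSWERS_SCHEMA))
            .drop(json_col)
        )

    dlt.create_streaming_table(silver)
    dlt.apply_changes(
        target=silver,
        source=f"{bronze}_parsed",
        keys=[key_col],
        sequence_by=col(seq_col),
        stored_as_scd_type=1,  # keep only the latest answers per key
    )
```

Individual answers can then be read as `answers['<question>']`, or exploded into one row per question downstream if Silver needs a fixed shape.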

by SaraCorralLou (New Contributor III)
  • 28586 Views
  • 5 replies
  • 2 kudos

Resolved! Error: The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.

What is the problem?
I am getting this error every time I run a Python notebook in my Repo in Databricks.
Background
The notebook where I am getting the error creates a DataFrame, and the last step is to write the DataFrame to a Delta ...

Latest Reply: Anonymous (Not applicable) • 2 kudos

Hi @Sara Corral, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

by Kratik (New Contributor III)
  • 2056 Views
  • 0 replies
  • 0 kudos

Spark submit job running python file

I have a spark-submit job that runs one Python file, main.py. Another file, alert.py, is imported in main.py. main.py also uses multiple config files. alert.py is passed in --py-files and the other config files are passed as ...
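A common pattern, sketched here under assumptions, is to pass `alert.py` with `--py-files` and the config files with `--files`, then resolve the configs inside `main.py` with `pyspark.SparkFiles.get`, which returns the local path of a file distributed that way. The DBFS paths, the JSON config format, and the helper names are hypothetical; since where `--files` lands can differ between environments, the loader falls back to the path as given.

```python
import json
import os


def build_spark_submit_params(main_py: str, py_files, config_files, app_args=()):
    """Parameter list for a spark-submit task: options first, then the main file, then its args."""
    params = []
    if py_files:
        params += ["--py-files", ",".join(py_files)]  # importable modules such as alert.py
    if config_files:
        params += ["--files", ",".join(config_files)]  # shipped alongside the job
    return params + [main_py, *app_args]


def load_config(name: str) -> dict:
    """In main.py: read a JSON config shipped with --files, else use the path as given."""
    path = name
    try:
        from pyspark import SparkFiles
        candidate = SparkFiles.get(name)
        if os.path.exists(candidate):
            path = candidate
    except Exception:
        pass  # PySpark missing or no active SparkContext
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Note that options such as `--py-files` must come before the main file; anything after `main.py` is passed to the application as its own arguments.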

Labels: Data Engineering, pyfiles, spark, submit
