Data Engineering
Forum Posts

Kratik
by New Contributor III
  • 966 Views
  • 1 reply
  • 0 kudos

Spark submit job running python file

I have a spark-submit job which runs one Python file called main.py. The other file is alert.py, which is imported in main.py. main.py also uses multiple config files. alert.py is passed in --py-files and the other config files are passed as ...

Data Engineering
pyfiles
spark
submit
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Kratik, To run the Spark submit job in Databricks and pass the --py-files and --files options, you can use the dbx command-line tool.
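In plain Apache Spark terms, the setup described in the thread corresponds to a spark-submit invocation along these lines (main.py and alert.py are from the post; the config file names are placeholders):

```shell
# Ship alert.py to the executors' Python path and attach the config
# files to each node's working directory; config names are illustrative.
spark-submit \
  --py-files alert.py \
  --files app.conf,db.conf \
  main.py
```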

TimB
by New Contributor II
  • 2742 Views
  • 1 reply
  • 0 kudos

Create external table using multiple paths/locations

I want to create an external table from more than a single path. I have configured my storage creds and added an external location, and I can successfully create a table using the following code: create table test.base.Example using csv options ( h...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @TimB, you can import data from multiple paths using wildcards or similar patterns when creating an external table in Databricks. To import data from multiple paths using wildcards, you can modify the location parameter in the CREATE TABLE stateme...
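A sketch of the wildcard approach the reply describes, using the glob support of Spark's file sources via the path option (table name follows the post; the storage path and CSV options are placeholders):

```sql
-- Spark's file-based sources accept glob patterns in the path option,
-- so one external table can span several sibling directories.
CREATE TABLE test.base.Example
USING CSV
OPTIONS (
  header = 'true',
  path 'abfss://container@account.dfs.core.windows.net/landing/2023-*/'
);
```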

marcuskw
by Contributor
  • 1453 Views
  • 2 replies
  • 1 kudos

Resolved! whenNotMatchedBySourceUpdate ConcurrentAppendException Partition

ConcurrentAppendException requires a good partitioning strategy; here my logic works without fault for "whenMatchedUpdate" and "whenNotMatchedInsert" logic. When using "whenNotMatchedBySourceUpdate", however, it seems that the condition doesn't isolate...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @marcuskw, based on the provided information and the given code snippet, it seems that the condition in the whenNotMatchedBySourceUpdate clause does not isolate the specific partition in the Delta table. This can lead to a ConcurrentAppendExc...
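A sketch of one common fix: repeat the partition predicate in the whenNotMatchedBySourceUpdate condition so the whole transaction stays inside one partition. Everything here is an assumption, not from the thread: delta-spark >= 2.3, an active SparkSession named spark, a source DataFrame named source, and a target table partitioned by region.

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

target = DeltaTable.forName(spark, "main.default.example")

(target.alias("t")
 .merge(source.alias("s"),
        # Pin the partition on both sides so Delta can prove the
        # transaction only touches one partition.
        "t.region = 'EU' AND s.region = 'EU' AND t.id = s.id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 # whenNotMatchedBySourceUpdate scans target rows with no source match,
 # so the partition predicate has to be repeated here as well.
 .whenNotMatchedBySourceUpdate(
     condition="t.region = 'EU'",
     set={"active": F.lit(False)})
 .execute())
```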

1 More Replies
Ajay-Pandey
by Esteemed Contributor III
  • 2719 Views
  • 5 replies
  • 0 kudos

How can we send Databricks logs to Azure Application Insights?

Hi All, I want to send Databricks logs to Azure Application Insights. Is there any way we can do it? Any blog or doc will help me.

Latest Reply
floringrigoriu
New Contributor II
  • 0 kudos

Hi @Debayan, in https://learn.microsoft.com/en-us/azure/architecture/databricks-monitoring/application-logs there is a GitHub repository mentioned: https://github.com/mspnp/spark-monitoring. That repository is marked as maintenance mode. Just...

4 More Replies
pvm26042000
by New Contributor III
  • 1728 Views
  • 4 replies
  • 2 kudos

Benefit of using vectorized pandas UDFs instead of standard PySpark UDFs?

benefit of using vectorized pandas UDFs instead of the standard Pyspark UDFs?

Latest Reply
Sai1098
New Contributor II
  • 2 kudos

Vectorized pandas UDFs offer improved performance compared to standard PySpark UDFs by leveraging the power of pandas and operating on entire columns of data at once, rather than row by row. They provide a more intuitive and familiar programming inter...
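The row-at-a-time vs. batch-at-a-time difference can be illustrated outside Spark with plain pandas (a minimal sketch; in a real pandas_udf the function receives a whole pd.Series per Arrow batch in the same way):

```python
import pandas as pd

def plus_one_rowwise(values):
    # Standard PySpark UDF behaviour: one Python-level call per row.
    return [v + 1.0 for v in values]

def plus_one_vectorized(s: pd.Series) -> pd.Series:
    # pandas UDF behaviour: one call per column batch; the arithmetic
    # runs in compiled code over the whole Series.
    return s + 1.0

data = [1.0, 2.0, 3.0]
assert plus_one_rowwise(data) == plus_one_vectorized(pd.Series(data)).tolist()
```

The per-row Python call overhead (plus serialization in real Spark) is what the vectorized form avoids.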

3 More Replies
pranavyadavbugy
by New Contributor
  • 1710 Views
  • 2 replies
  • 0 kudos

Regarding Discount on certifications for students

Hi team, I'm a student. Are there any student discounts on certification? If yes, please let me know. Thanks

Latest Reply
FeliciaWilliam
New Contributor III
  • 0 kudos

Exciting news for students! Enjoy special discounts on certifications. If you need more study resources, check out Chegg Study alternatives on https://edureviewer.com/sites-like-chegg-study/ for extra support in the middle of your academic journey. It...

1 More Replies
User15787040559
by New Contributor III
  • 1035 Views
  • 2 replies
  • 0 kudos

How to translate Apache Pig FILTER statement to Spark?

If you have the following Apache Pig FILTER statement:

XCOCD_ACT_Y = FILTER XCOCD BY act_ind == 'Y';

the equivalent code in Apache Spark is:

XCOCD_ACT_Y_DF = (XCOCD_DF.filter(col("act_ind") == "Y"))

Latest Reply
FeliciaWilliam
New Contributor III
  • 0 kudos

Translating an Apache Pig FILTER statement to Spark requires understanding the differences in syntax and functionality between the two processing frameworks. While both aim to filter data, Spark uses a different syntax and approach, typically involvi...
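A sketch filling out the Spark side of the translation (assumes an existing DataFrame named XCOCD_DF; the col helper comes from pyspark.sql.functions, which the original excerpt leaves implicit):

```python
from pyspark.sql.functions import col

# Pig:   XCOCD_ACT_Y = FILTER XCOCD BY act_ind == 'Y';
# Spark: keep only rows whose act_ind column equals 'Y'.
XCOCD_ACT_Y_DF = XCOCD_DF.filter(col("act_ind") == "Y")
```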

1 More Replies
narvinya
by New Contributor
  • 1540 Views
  • 1 reply
  • 0 kudos

Resolved! What is the best approach to use Delta tables without Unity Catalog enabled?

Hello! I would like to work with Delta tables outside of a Databricks UI notebook. I know that the best option would be to use databricks-connect, but I don't have Unity Catalog enabled. What would be the most effective way to do so? I know that via JDBC ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @narvinya,
  • Delta tables can be accessed outside of the Databricks UI notebook without using databricks-connect or Unity Catalog.
  • Three options are available for working with Delta tables outside of the Databricks UI notebook:
    1. Using JDBC: Read and...

MUA
by New Contributor
  • 2157 Views
  • 2 replies
  • 1 kudos

OSError: [Errno 7] Argument list too long

Getting this error in Databricks and I don't know how to solve it: OSError: [Errno 7] Argument list too long: '/dbfs/databricks/aaecz/dev/w000aaecz/etl-framework-adb/0.4.31-20230503.131701-1/etl_libraries/utils/datadog/restart_datadog.sh'. If anyone can help...
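For context on the error itself: errno 7 is E2BIG, raised when the combined size of the arguments and environment handed to the kernel's exec call exceeds the ARG_MAX limit. A quick stdlib-only way to inspect that limit on the driver:

```python
import os

# E2BIG ("Argument list too long") fires when argv plus the environment
# passed to exec exceeds the kernel's ARG_MAX limit.
arg_max = os.sysconf("SC_ARG_MAX")
print(f"ARG_MAX on this machine: {arg_max} bytes")
```

A common workaround is to shorten what ends up on the command line, e.g. point the script at a config file instead of inlining long values.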

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

@MUA  Just a friendly follow-up. Did any of the responses help you to resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.

1 More Replies
lawrence009
by Contributor
  • 1177 Views
  • 4 replies
  • 2 kudos

Troubleshooting Spill

I am trying to troubleshoot why spill occurred during DeltaOptimizeWrite. I am running a 64-core cluster with 256 GB RAM, which I expect to handle this amount of data (see attached DAG).

IMG_1085.jpeg
Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

You can resolve the spill to memory by increasing the shuffle partitions, but 16 GB of spill memory should not have a major impact on your job execution. Could you share more details on the actual source code that you are running?
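The knob the reply refers to is spark.sql.shuffle.partitions; a minimal sketch, assuming an active SparkSession named spark (400 is an arbitrary illustrative value, not a recommendation from the thread):

```python
# More shuffle partitions -> smaller partitions per task -> less chance
# that a task's sort buffer spills to disk.
spark.conf.set("spark.sql.shuffle.partitions", "400")
```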

3 More Replies
JKR
by New Contributor III
  • 1686 Views
  • 4 replies
  • 1 kudos

Resolved! Got Failure: com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException error

Got the below failure on a scheduled job on an interactive cluster, and the next scheduled run executed fine. I want to know why this error occurred and how I can prevent it from happening again. And how do I debug these errors in the future? com.databricks.backend.commo...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

@JKR Just a friendly follow-up. Did any of the responses help you to resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.

3 More Replies
mbejarano89
by New Contributor III
  • 630 Views
  • 1 reply
  • 1 kudos

Resolved! Cloning content of Repos into shared Workspace

Hello, I have a git repository on Databricks with notebooks that are meant to be shared with other users. The reason these notebooks are in git as opposed to the "shared" workspace already is because they are to be continuously improved and need sepa...

Latest Reply
User16539034020
Contributor II
  • 1 kudos

Hello, thanks for contacting Databricks Support. I presume you're looking to transfer files from external repositories to a Databricks workspace. I'm afraid there is currently no direct support for it. You may consider using the REST API, which allows for...

Bagger
by New Contributor II
  • 681 Views
  • 2 replies
  • 1 kudos

Databricks Connect - driver error

Hi, we are experiencing instability when executing queries using Databricks Connect. Sometimes we are unable to receive the full result set without encountering an error with the message "Driver is up but is not responsive...". When we run the same query...

Data Engineering
databricks connect
driver
Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Bagger,
  • Error message: "Driver is up but is not responsive..."
  • Potential causes:
    • Unreachable cluster
      • Check workspace instance name and cluster ID
      • Verify environment variables on the local development machine
    • Python ve...
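The environment-variable check can be scripted; a minimal sketch that verifies the variables databricks-connect commonly reads (these names are the usual ones for token-based Databricks Connect setups, an assumption; adjust to your auth method):

```python
import os

REQUIRED = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_CLUSTER_ID")

def missing_vars(env=os.environ):
    # Return the required variables that are absent or empty.
    return [name for name in REQUIRED if not env.get(name)]

# Example with a fake environment: only the host is set,
# so the token and cluster ID are reported missing.
print(missing_vars({"DATABRICKS_HOST": "https://adb-123.azuredatabricks.net"}))
```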

1 More Replies
vgupta
by New Contributor II
  • 2945 Views
  • 5 replies
  • 4 kudos

DLT | Cluster terminated by System-User | INTERNAL_ERROR: Communication lost with driver. Cluster 0312-140502-k9monrjc was not reachable for 120 seconds

Dear Community, hope you are doing well. For the last couple of days I have been seeing very strange issues with my DLT pipeline: every 60-70 minutes it fails in continuous mode with the error INTERNAL_ERROR: Communication lost with driver. Clu...

DLT_ERROR DLT_Cluster_events
Latest Reply
Reddy-24
New Contributor II
  • 4 kudos

Hello @Debayan, I am facing the same issue while running a Delta Live Table. This job is running in production, but it's not working in dev. I have tried to increase the worker nodes, but no use. Can you please help on this?

4 More Replies
alonisser
by Contributor
  • 3049 Views
  • 6 replies
  • 3 kudos

Resolved! Changing shuffle.partitions with spark.conf in a spark stream - isn't respected even after a checkpoint

Question about Spark checkpoints and offsets in a running stream: when the stream started I needed tons of partitions, so we set it with spark.conf to 5000. As expected, offsets in the checkpoint contain this info and the job used this value. Then we'...

Latest Reply
Leszek
Contributor
  • 3 kudos

@Jose Gonzalez​ thanks for that information! This is super useful. I was struggling with why my streaming job was still using 200 partitions. This is quite a pain for me, because changing the checkpoint will re-insert all data from the source. Do you know where this can...

5 More Replies