Data Engineering

Forum Posts

User16783853906
by Contributor III
  • 1862 Views
  • 5 replies
  • 5 kudos

Resolved! Update code for a streaming job in Production

How do you update a streaming job in production with minimal or no downtime when significant code changes are incompatible with the existing checkpoint state, so the stream cannot simply resume?
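A common cutover pattern is sketched below under stated assumptions: the query name, table names, checkpoint path, and Delta version are all hypothetical. The idea is to drain the old query, then start the new code on a fresh checkpoint pinned to a known starting point.

```python
# A sketch, not a drop-in fix: query name, table names, checkpoint path,
# and the Delta version number are all hypothetical.

# 1. Gracefully stop the running query so the last micro-batch commits.
for q in spark.streams.active:
    if q.name == "orders_stream":            # hypothetical query name
        q.stop()

# 2. Start the updated code against a NEW checkpoint location, seeding the
#    Delta source from the version the old job had reached.
df = (spark.readStream
      .option("startingVersion", 123)        # last version processed by the old job (assumed)
      .table("source_table"))

query = (df.writeStream
         .option("checkpointLocation", "/chk/orders_v2")  # fresh checkpoint path
         .toTable("target_table"))
```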

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Thanks for the information, I will try to figure it out. Keep sharing such informative posts.

4 More Replies
Megan05
by New Contributor III
  • 1270 Views
  • 4 replies
  • 1 kudos

Trying to write to an S3 bucket, but the submitted code shows no progress

I am trying to write data from Databricks to an S3 bucket, but when I submit the code, it runs and runs and does not make any progress. I am not getting any errors, and the logs don't seem to recognize I've submitted anything. The cluster also looks un...

Latest Reply
User16753725469
Contributor II
  • 1 kudos

Can you please check the driver log4j to see what is happening?
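To narrow it down, a tiny smoke-test write can separate configuration problems from data-volume problems. A sketch with a hypothetical bucket path:

```python
# Hypothetical bucket/path: if even this tiny write hangs, the cause is likely
# cluster, IAM, or network configuration rather than data volume.
test_df = spark.range(10)
test_df.write.mode("overwrite").parquet("s3a://my-bucket/tmp/write_test")
```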

3 More Replies
Dicer
by Valued Contributor
  • 11055 Views
  • 13 replies
  • 13 kudos

Resolved! Failed to convert a Spark SQL result to a pandas DataFrame using .toPandas()

I wrote the following code: data = spark.sql("SELECT A_adjClose, AA_adjClose, AAL_adjClose, AAP_adjClose, AAPL_adjClose FROM deltabase.a_30min_delta, deltabase.aa_30min_delta, deltabase.aal_30min_delta, deltabase.aap_30min_delta, deltabase.aapl_30m...

Latest Reply
Dicer
Valued Contributor
  • 13 kudos

I just discovered a solution. Today I opened Azure Databricks, and when I imported Python libraries, Databricks told me that toPandas() was deprecated and suggested using toPandas. The following solution works: use toPandas instead of toPandas(). da...
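For reference, the standard PySpark conversion is the toPandas() method call; when it fails on large results, capping rows and enabling Arrow often helps. A minimal sketch, with table and column names borrowed from the question and an arbitrary row cap:

```python
# Minimal sketch; table/column names are from the question, the row cap is arbitrary.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")  # faster conversion

pdf = (spark.sql("SELECT A_adjClose, AA_adjClose FROM deltabase.a_30min_delta")
       .limit(100000)   # keep the result small enough to fit on the driver
       .toPandas())
```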

12 More Replies
Andyfcx
by New Contributor
  • 1363 Views
  • 3 replies
  • 2 kudos

Resolved! Is it possible to clone a private repository and use it in Databricks Repos?

As the title says, I need to clone code from my private git repo and use it in my notebook. I do something like: def cmd(command, cwd=None): process = subprocess.Popen(command.split(), stdout=subprocess.PIPE, cwd=cwd) output, error = process.communicate(...
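One workable approach is to run git via subprocess with a personal access token pulled from a Databricks secret. A sketch completing the helper above; the secret scope/key and repository URL are hypothetical:

```python
import subprocess

def cmd(command, cwd=None):
    # Run a shell command and return (stdout, stderr).
    process = subprocess.Popen(command.split(), stdout=subprocess.PIPE, cwd=cwd)
    output, error = process.communicate()
    return output, error

# Hypothetical secret scope/key and repo URL; dbutils is available in notebooks.
token = dbutils.secrets.get(scope="git", key="pat")
cmd(f"git clone https://{token}@github.com/your-org/your-repo.git", cwd="/tmp")
```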

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Andy Huang, just a friendly follow-up. Do you still need help, or did @Prabakar Ammeappin's response help you find the solution? Please let us know.

2 More Replies
Jackie
by New Contributor II
  • 4290 Views
  • 4 replies
  • 6 kudos

Resolved! Speed up a for loop in Python (Azure Databricks)

Code example: # a list of file paths list_files_path = ["/dbfs/mnt/...", ..., "/dbfs/mnt/..."] # copy all files above to this folder dest_path = "/dbfs/mnt/..." for file_path in list_files_path: # copy function copy_file(file_path, dest_path) I am runni...

Latest Reply
Hemant
Valued Contributor II
  • 6 kudos

@Jackie Chan, what's the data size you want to copy? If it's large, then use ADF.
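If the files stay on DBFS and each copy is I/O-bound, a thread pool alone often speeds up the loop considerably. A sketch with placeholder paths; the question's copy_file is replaced by shutil.copy here:

```python
import shutil
from concurrent.futures import ThreadPoolExecutor

list_files_path = ["/dbfs/mnt/src/a.csv", "/dbfs/mnt/src/b.csv"]  # placeholders
dest_path = "/dbfs/mnt/dest"

# Each copy is I/O-bound, so threads overlap the waiting time.
with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(lambda p: shutil.copy(p, dest_path), list_files_path))
```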

3 More Replies
Syed1
by New Contributor III
  • 8546 Views
  • 9 replies
  • 13 kudos

Resolved! Python Graph not showing

Hi, I have run this code: import matplotlib.pyplot as plt import numpy as np plt.style.use('bmh') %matplotlib inline x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6]) y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86]) p = plt.scatter(x, y) display command r...

Latest Reply
User16725394280
Contributor II
  • 13 kudos

@Syed Ubaid, I tried with 7.3 LTS and it works fine.
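For anyone hitting the same issue, the pattern below usually renders in a Databricks notebook; it reuses the question's data and passes the figure object to display(). A sketch, not a guaranteed fix for every runtime:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])

fig, ax = plt.subplots()
ax.scatter(x, y)
display(fig)   # display() is a Databricks notebook built-in; plt.show() also works on newer runtimes
```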

8 More Replies
p42af
by New Contributor
  • 3086 Views
  • 4 replies
  • 1 kudos

Resolved! rdd.foreachPartition() does nothing?

I expected the code below to print "hello" for each partition and "world" for each record, but when I ran it, the code completed with no printouts of any kind. No errors either. What is happening here? %scala val rdd = spark.sparkContext.parallelize(S...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

It is lazily evaluated, so you need to trigger an action, I guess.
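For what it's worth, rdd.foreachPartition is itself an action, so laziness is usually not the culprit; the print statements run on the executors, so their output lands in the executor logs rather than the notebook. A Python sketch of one way to see the output on the driver instead:

```python
# Two partitions, as in the question; mapPartitions is lazy, collect() triggers it.
rdd = spark.sparkContext.parallelize(range(10), 2)
out = rdd.mapPartitions(lambda it: ["hello"] + [f"world {x}" for x in it]).collect()
for line in out:
    print(line)   # printed on the driver, so it shows up in the notebook
```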

3 More Replies
s_plank
by New Contributor III
  • 1860 Views
  • 6 replies
  • 5 kudos

Resolved! Databricks-Connect shows different partitions than Databricks for the same delta table

Hello, here is a small code snippet: from pyspark.sql import SparkSession spark = SparkSession.builder.appName('example_app').getOrCreate() spark.sql('SHOW PARTITIONS database.table').show() The output inside the Databricks notebook: +-------------+--...

Latest Reply
s_plank
New Contributor III
  • 5 kudos

Hi @Jose Gonzalez, yes, the SQL connector works fine. Thank you!
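For readers looking for the same workaround, querying partitions through the databricks-sql-connector package looks roughly like this; the hostname, HTTP path, and token are placeholders:

```python
# pip install databricks-sql-connector; hostname, HTTP path, and token are placeholders.
from databricks import sql

with sql.connect(server_hostname="adb-1234567890123456.7.azuredatabricks.net",
                 http_path="/sql/1.0/warehouses/abc123",
                 access_token="dapi-REDACTED") as conn:
    with conn.cursor() as cur:
        cur.execute("SHOW PARTITIONS database.table")
        for row in cur.fetchall():
            print(row)
```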

5 More Replies
Databricks_7045
by New Contributor III
  • 1330 Views
  • 4 replies
  • 0 kudos

Resolved! Encapsulate Databricks PySpark/Spark SQL code

Hi all, I have custom code (PySpark & Spark SQL notebooks) which I want to deploy at a customer location, encapsulated so that end customers don't see the actual code. Currently we have all the code in notebooks (PySpark/Spark SQL). Could you please l...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

With notebooks that is not possible. You can write your code in Scala/Java and build a jar, which you then run with spark-submit (example). Or use Python and deploy a wheel (example). This can become quite complex when you have dependencies. Also: a jar et...
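As a starting point for the wheel route, the packaging skeleton can be as small as this; the package name and layout are hypothetical. Build the wheel locally, then attach the .whl to the cluster as a library:

```python
# setup.py -- minimal packaging sketch; "customer_etl" is a hypothetical package name.
from setuptools import setup, find_packages

setup(
    name="customer_etl",
    version="0.1.0",
    packages=find_packages(),   # picks up customer_etl/ with an __init__.py
)

# Build:   python setup.py bdist_wheel   (or: python -m build --wheel)
# Deploy:  upload dist/customer_etl-0.1.0-py3-none-any.whl as a cluster library
```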

3 More Replies
BorislavBlagoev
by Valued Contributor III
  • 2572 Views
  • 9 replies
  • 3 kudos

Resolved! Trying to create an incremental pipeline, but it fails when I use outputMode "update"

def upsertToDelta(microBatchOutputDF, batchId): microBatchOutputDF.createOrReplaceTempView("updates")   microBatchOutputDF._jdf.sparkSession().sql(""" MERGE INTO old o USING updates u ON u.id = o.id WHEN MATCHED THEN UPDATE SE...
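For context, the documented foreachBatch upsert pattern looks roughly like the sketch below. Table names come from the question, while streamingDF, the checkpoint path, and the use of the DeltaTable API instead of raw SQL are assumptions:

```python
from delta.tables import DeltaTable

def upsertToDelta(microBatchOutputDF, batchId):
    # Merge each micro-batch into the target table "old", keyed on id.
    target = DeltaTable.forName(microBatchOutputDF.sparkSession, "old")
    (target.alias("o")
           .merge(microBatchOutputDF.alias("u"), "u.id = o.id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

# streamingDF is the incremental source (assumed to exist).
(streamingDF.writeStream
    .foreachBatch(upsertToDelta)
    .outputMode("update")
    .option("checkpointLocation", "/chk/upsert_demo")  # hypothetical path
    .start())
```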

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

The Delta table/file version is too old. Please try to upgrade it as described here: https://docs.microsoft.com/en-us/azure/databricks/delta/versioning

8 More Replies
wyzer
by Contributor II
  • 5589 Views
  • 3 replies
  • 3 kudos

Resolved! Why are database/table names in lower case?

Hello, when I run this code: CREATE DATABASE BackOffice, I see the database like this: backoffice. Why is everything in lower case? Is it possible to configure Databricks to keep the original name? Thanks.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

It is managed by the Hive metastore, which stores identifiers in lower case. It is safer this way, as some databases are case-sensitive and some are not (you can easily test this with standard WHERE syntax). You could probably change it with some Hive settings, but i...
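A tiny illustration of the behaviour, assuming a notebook with a spark session and a throwaway database name:

```python
# The metastore stores identifiers lower-cased.
spark.sql("CREATE DATABASE IF NOT EXISTS BackOffice")
spark.sql("SHOW DATABASES LIKE 'backoffice'").show()  # listed as 'backoffice'
```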

2 More Replies
JK2021
by New Contributor III
  • 2395 Views
  • 5 replies
  • 3 kudos

Resolved! Exception handling in Databricks

We are planning to customise code on Databricks to call the Salesforce Bulk API 2.0 to load data from a Databricks delta table into Salesforce. My question is: can all the exception handling, retries and everything around the Bulk API be coded explicitly in Databricks...

Latest Reply
Prabakar
Esteemed Contributor III
  • 3 kudos

Hi @Jazmine Kochan, I haven't tried the Salesforce Bulk API 2.0 to load data, but in theory it should be fine.
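As a generic starting point, retries around any HTTP-based bulk load can be wrapped like this. The sketch below is not Salesforce-specific; the URL, payload, and retry policy are all assumptions:

```python
import time
import requests

def post_with_retries(url, payload, headers, max_retries=5):
    """Retry an HTTP POST with exponential backoff (generic sketch)."""
    for attempt in range(max_retries):
        try:
            resp = requests.post(url, json=payload, headers=headers, timeout=60)
            if resp.status_code < 500:
                return resp          # success, or a client error not worth retrying
        except requests.RequestException:
            pass                     # transient network error: fall through and retry
        time.sleep(2 ** attempt)     # 1s, 2s, 4s, ...
    raise RuntimeError(f"Giving up after {max_retries} attempts")
```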

4 More Replies
Orianh
by Valued Contributor II
  • 4985 Views
  • 7 replies
  • 3 kudos

Resolved! Reading JSON containing backslashes.

Hello guys. I'm trying to read a JSON file which contains backslashes, and PySpark fails to read it. I've tried a lot of options but haven't solved this yet. I thought about reading all the JSON as text and replacing all "\" with "/", but PySpark fails to read it as te...
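Two approaches that often work for this, sketched with a hypothetical file path: Spark's JSON reader has an allowBackslashEscapingAnyCharacter option, and failing that, the file can be read as text, cleaned, and re-parsed:

```python
from pyspark.sql import functions as F

# Option 1: let the JSON reader tolerate unusual backslash escapes.
df = (spark.read
      .option("allowBackslashEscapingAnyCharacter", "true")
      .json("/mnt/data/file.json"))            # hypothetical path

# Option 2: read as plain text, replace backslashes, then re-parse.
raw = spark.read.text("/mnt/data/file.json")   # assumes one JSON record per line
cleaned = raw.select(F.regexp_replace("value", r"\\", "/").alias("value"))
df2 = spark.read.json(cleaned.rdd.map(lambda r: r.value))
```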

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@orian hindi - Would you be happy to post the solution you came up with and then mark it as best? That will help other members.

6 More Replies