Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

624398
by New Contributor III
  • 2852 Views
  • 7 replies
  • 0 kudos

Is there a read-only option in the JDBC driver?

Is there a "read only" option when using Databricks SQL with the JDBC driver? I'm looking for an equivalent to this: https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-configuration-options.html#jdbc20-readonly-option Thanks!

Latest Reply
Vartika
Moderator
  • 0 kudos

Hi @Nativ Issac, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

6 More Replies
shelly
by New Contributor
  • 1014 Views
  • 2 replies
  • 0 kudos

take() operation is throwing an error

Traceback (most recent call last):
  File "/usr/local/spark/python/pyspark/serializers.py", line 458, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/spark/python/pyspa...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Shelly Bhardwaj: The error message you provided seems to be incomplete, as it only shows the traceback of a serialization error. Can you provide the full error message or describe the issue in more detail? Regarding the code you provided, it looks c...

1 More Replies
KayCon86
by New Contributor
  • 1762 Views
  • 3 replies
  • 0 kudos

Creating API links from a URL and a list from a saved df

I have 106,000+ APIs I need to call, so instead of calling them one by one I would like to create a loop, as I have the list of location IDs, which I've pulled from their API locations list, and these will sit at the end of the URL to get more info o...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Kay Connolly: It looks like you are trying to concatenate a string with a column object, which is causing the error. You need to convert the column object to a string first before concatenating it to the URL. Here's a modified code snippet that sho...
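
A minimal sketch of the idea described in this reply (it is not the truncated snippet itself): take the location IDs as plain Python values, convert each one to a string, and append it to the base URL. The base URL `https://api.example.com/locations/`, the helper name `build_urls`, and the sample IDs are all hypothetical. It also assumes the IDs have already been pulled out of the saved df into a Python list.

```python
from urllib.parse import quote

# Hypothetical base URL; replace with the real API endpoint.
BASE_URL = "https://api.example.com/locations/"


def build_urls(location_ids, base_url=BASE_URL):
    """Return one URL per location ID, with the ID appended to the base URL."""
    urls = []
    for location_id in location_ids:
        # Convert to str first: concatenating a str with a non-string value
        # (an int, or a DataFrame column object) raises an error.
        urls.append(base_url + quote(str(location_id), safe=""))
    return urls


# Hypothetical sample IDs standing in for values taken from the saved df.
sample_ids = [101, "A-202", 303]
for url in build_urls(sample_ids):
    print(url)
```

With 106,000+ IDs, the returned list can then be looped over, or split into batches, to make the actual calls.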

2 More Replies
Anonymous
by Not applicable
  • 1245 Views
  • 2 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

@ppatel: If you are using insertInto with overwrite=True on a Hive external table in PySpark, it might not work as expected. This is because Hive external tables are not managed by Hive and the table data is stored externally. When you use overwrite=T...

1 More Replies
maartenvr
by New Contributor III
  • 17040 Views
  • 9 replies
  • 2 kudos

Resolved! Unable to clear cache using a pyspark session

Hi all, I am using a persist call on a Spark dataframe inside an application to speed up computations. The dataframe is used throughout my application, and at the end of the application I am trying to clear the cache of the whole Spark session by calli...

Latest Reply
maartenvr
New Contributor III
  • 2 kudos

No solution yet: Hi @Suteja Kanuri, thank you for thinking along and replying! Unfortunately, I have not found a solution yet. I am getting an error that there exists no `.getCache()` method on a Spark context. Also note that I have tried to do som...

8 More Replies
Jyo777
by Contributor
  • 3535 Views
  • 4 replies
  • 4 kudos

Need help with Azure Databricks questions on CTE and SQL syntax within notebooks

Hi amazing community folks, feel free to share your experience or knowledge regarding the questions below: 1.) Can we pass a CTE SQL statement into Spark JDBC? I tried to do it and couldn't, but I can pass normal SQL (Select * from ) and it works. I heard th...

Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Hi @Jyoti j, we haven't heard from you since the last response from @Suteja Kanuri, and I was checking back to see if her suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to others....

3 More Replies
darthdickhead
by New Contributor III
  • 6374 Views
  • 5 replies
  • 3 kudos

Best way to install and manage a private Python package that has a continuously updating Wheel

I'm trying to set up a Workspace Library that is used internally within our organization. This is a Python package, where the source is available on a private GitHub repository, and not accessible on PyPI or the wider internet / surface web. I managed...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Eshwaran Venkat, we haven't heard from you since the last response from @Suteja Kanuri, and I was checking back to see if her suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpfu...

4 More Replies
YSDPrasad
by New Contributor III
  • 3700 Views
  • 5 replies
  • 2 kudos

Resolved! Facing an issue while executing DDL and DML queries in 12.0 cluster runtime version

Hi all, currently we are using a Driver: Standard_D32s_v3 · Workers: Standard_D32_v3 · 2-8 workers · 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11) cluster. On this we are running a 24/7 streaming notebook with a trigger of every minute and 5...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Someswara Durga Prasad Yaralgadda, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love t...

4 More Replies
Chhaya
by New Contributor III
  • 2896 Views
  • 6 replies
  • 2 kudos

Using Great Expectations with Auto Loader

Hi everyone, I have implemented a data pipeline using Auto Loader: bronze --> silver --> gold. While I do this, I want to perform some data quality checks, and for that I'm using the Great Expectations library. However, I'm stuck with the error below when trying...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Chhaya Vishwakarma, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your fe...

5 More Replies
ckwan48
by New Contributor III
  • 1726 Views
  • 4 replies
  • 1 kudos

Different results in Databricks using SARIMAX

In Databricks, using the 11.3 ML runtime gives different results with general purpose vs. memory-optimized workers. I used SARIMAX to forecast the results, but I'm getting different results when I change the driver and worker types to these options...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Kevin Kim, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

3 More Replies
brendanc19
by New Contributor III
  • 2620 Views
  • 5 replies
  • 2 kudos

Resolved! Does cancelling a job run rollback any actions performed by query plan?

If I were to stop a rather large job run, say halfway through execution, will any actions performed on our Delta tables persist, or will they be rolled back? Are there any other risks that I need to be aware of in terms of cancelling a job run halfway t...

Latest Reply
brendanc19
New Contributor III
  • 2 kudos

Will do, thank you Vartika

4 More Replies
Sid1805
by New Contributor II
  • 5199 Views
  • 5 replies
  • 0 kudos

Resolved! Calling Delta Tables using JDBC

Hi team, if we kill clusters every time, will the connection details change? If yes, is there a way we can mask this so that the end users are not impacted due to any changes in clusters? Also, if I want to call a Delta table from an API using JDBC - s...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Siddharth Krishna, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell u...

4 More Replies
Arnold_Souza
by New Contributor III
  • 4974 Views
  • 4 replies
  • 1 kudos

Connect Databricks to a database protected by a firewall

We are facing a situation and I would like to understand from the Databricks side what the best practice is. Question: Is it possible to have a cluster with a fixed global IP on Databricks? Details: We have a vendor that has a SQL Server da...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Arnold Souza If you file a support ticket with Azure support, they can help customize the VNet by unlocking it, as the Azure Databricks resources are deployed in a managed resource group. Your plan B should also be the way to go if option 1 does not work as e...

3 More Replies
Mado
by Valued Contributor II
  • 2284 Views
  • 4 replies
  • 0 kudos

Medallion architecture, how to update Gold tables?

Assume that I have a data source that is ingested to a few bronze tables, and transformed to a silver table. Next, a gold table is created by aggregating the silver table. If new records arrive in the data source, bronze and silver tables are upd...

Latest Reply
Mado
Valued Contributor II
  • 0 kudos

Hi @Vidula Khanna, the answer didn't fit my question. In the case of using Merge, I found a good article here: https://medium.com/@avnishjain22/simplify-optimise-and-improve-your-data-pipelines-with-incremental-etl-on-the-lakehouse-61b279afadea

3 More Replies
hv
by New Contributor
  • 2989 Views
  • 1 replies
  • 0 kudos

Error: "'Column' object is not callable"

I am trying to lowercase one of the columns (A_description) of a dataframe (df) and getting the error "'Column' object is not callable".
Code:
def new_desc():
  for line in df:
    line = df['A_description'].map(str.lower)
  return line

new_desc()

Have used...

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 0 kudos

Hi @Himadri Verma, I hope the suggestion below will help you in PySpark. Please let me know if you are looking for something else. Happy Learning!!
