cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Arnold_Souza
by New Contributor III
  • 5080 Views
  • 4 replies
  • 1 kudos

Connect Databricks to a database protected by a firewall

We a facing a situation and I would like to understand from the Databricks side what is the best practice regarding that. Question: Is it possible to have a cluster with a fixed Global IP on Databricks?DetailsWe have a vendor that has a SQL Server da...

Diagram
  • 5080 Views
  • 4 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Arnold Souza​ If you file a support to Azure support they can help customize the Vnet by unlocking it as the Azure Databricks resources are deployed in a managed resource group. Your plan B also should be the way to go if option 1 does not work as e...

  • 1 kudos
3 More Replies
Mado
by Valued Contributor II
  • 2328 Views
  • 4 replies
  • 0 kudos

Medallion architecture, how to update Gold tables?

Assume that I have a data source that is ingested to a few bronze tables, and transformed to a silver table. Ans next, a gold table is created by aggregating the silver table. If new records arrive in the data source, bronze and silver tables are upd...

  • 2328 Views
  • 4 replies
  • 0 kudos
Latest Reply
Mado
Valued Contributor II
  • 0 kudos

Hi @Vidula Khanna​ The answer didn't fit my question. In the case of using Merge, I found a good article here:https://medium.com/@avnishjain22/simplify-optimise-and-improve-your-data-pipelines-with-incremental-etl-on-the-lakehouse-61b279afadea

  • 0 kudos
3 More Replies
hv
by New Contributor
  • 3042 Views
  • 1 replies
  • 0 kudos

Error-"'Column' object is not callable".

I am trying to lowercase one of the columns(A_description) of a dataframe(df) and getting the error-"'Column' object is not callable".Code: def new_desc():  for line in df:    line = df['A_description'].map(str.lower)  return line new_desc()Have used...

  • 3042 Views
  • 1 replies
  • 0 kudos
Latest Reply
Chaitanya_Raju
Honored Contributor
  • 0 kudos

Hi @Himadri Verma​ Hope this below suggestion will help you in pyspark.Please let me know if you are looking for something elseHappy Learning!!

  • 0 kudos
CM1
by New Contributor
  • 5318 Views
  • 1 replies
  • 0 kudos

Can you migrate me from Customer Academy to Partner Academy

HelloI registered using my work email on the Customer Academy, but I should be on Partner Academy.Can you migrate my account as you have done on other posts, iehttps://community.databricks.com/s/question/0D53f00001fcieKCAQ/cannot-sign-in-at-databrick...

  • 5318 Views
  • 1 replies
  • 0 kudos
Latest Reply
Chaitanya_Raju
Honored Contributor
  • 0 kudos

Hi @Chris M​ For any issue with Academy learnings/certifications, you can raise a ticket in the below link, sharing it with you for your future reference as well.https://help.databricks.com/s/contact-us?ReqType=trainingHappy Learning!!

  • 0 kudos
Vladif1
by New Contributor II
  • 5183 Views
  • 4 replies
  • 1 kudos

Error when reading delta lake files with Auto Loader

Hi,When reading Delta Lake file (created by Auto Loader) with this code: df = (    spark.readStream    .format('cloudFiles')    .option("cloudFiles.format", "delta")    .option("cloudFiles.schemaLocation", f"{silver_path}/_checkpoint")    .load(bronz...

  • 5183 Views
  • 4 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Vlad Feigin​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

  • 1 kudos
3 More Replies
RafaelGomez61
by New Contributor
  • 2581 Views
  • 2 replies
  • 0 kudos

Can't access delta tables under SQL Warehouse cluster. Getting Error while using path .../_delta_log/000000000.checkpoint

In our Databricks workspace, we have several delta tables available in the hive_metastore catalog. we are able to access and query the data via Data Science & Engineering persona clusters with no issues. The cluster have the credential passthrough en...

  • 2581 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Rafael Gomez​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...

  • 0 kudos
1 More Replies
jerry-xu-sa
by New Contributor II
  • 1900 Views
  • 2 replies
  • 1 kudos

Order of a dataframe is not perserved after calling cache() and limit()

Here are the simple steps to reproduce it. Note that col "foo" and "bar" are just redundant cols to make sure the dataframe doesn't fit into a single partition. // generate a random df val rand = new scala.util.Random val df = (1 to 3000).map(i => (r...

  • 1900 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Jerry Xu​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your feedback wil...

  • 1 kudos
1 More Replies
wschoi
by New Contributor III
  • 2191 Views
  • 4 replies
  • 1 kudos

Resolved! How can I cluster-install a c-Python library (pyRFC)?

If possible, how can one go about installing a Python library with SDK dependencies like pyRFC? (https://github.com/SAP/PyRFC)The SDK dependencies depend on the type of OS, and since we're running Databricks out of AWS, I assume one would have to mat...

  • 2191 Views
  • 4 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Wonseok Choi​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your feedback...

  • 1 kudos
3 More Replies
ramz
by New Contributor II
  • 2480 Views
  • 4 replies
  • 1 kudos

High driver memory usage on loading parquet file

Hi, I am using pyspark and i am reading a bunch of parquet files and doing the count on each of them. Driver memory shoots up about 6G to 8G. My setup:I have a cluster of 1 driver node and 2 worker node (all of them 16 core 128 GB RAM). This is th...

  • 2480 Views
  • 4 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @ramz siva​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your feedback wi...

  • 1 kudos
3 More Replies
pepe
by New Contributor II
  • 8066 Views
  • 2 replies
  • 1 kudos

Why can't I install python libraries when i update cluster runtime from 10.1 to 12.1?

This same question was asked here 9 months ago without any answer:https://community.databricks.com/s/question/0D58Y000096VjKrSAK/managedlibraryinstallfailed-when-changing-databricks-runtime-version-from-91-to-110I was using runtime 9.1, and then upgr...

  • 8066 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @JOSE RODRIGUEZ​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

  • 1 kudos
1 More Replies
Ondrej_Lostak
by New Contributor
  • 1023 Views
  • 2 replies
  • 0 kudos

Visulization only from sample of data

When I display dataframe and add visualization, I can see a preview from only a sample of data, and when I confirm it, it is counted from all of the data. Until now, everything is fine. However, when I change the dataframe, the visualization is incon...

  • 1023 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ondrej Lostak​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...

  • 0 kudos
1 More Replies
thushar
by Contributor
  • 2371 Views
  • 4 replies
  • 0 kudos

Delta file partitions

Have one function to create files with partitions, in that the partitions are created based on metadata (getPartitionColumns) that we are keeping. In a table we have two columns that are mentioned as partition columns, say 'Team' and 'Speciality'. Wh...

  • 2371 Views
  • 4 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Thushar R​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

  • 0 kudos
3 More Replies
sedat
by New Contributor II
  • 3463 Views
  • 2 replies
  • 0 kudos

Rust support (?) in databricks

Hi, for kafka streams and integration, I have seen some presentations and documents Rust is a good alternative to Spark. Is there a native support for RUST in databricks or what is best method to connect to kafka resources within Databricks.thanks fo...

  • 3463 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Sedat EKSI​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

  • 0 kudos
1 More Replies
Anjum
by New Contributor II
  • 3624 Views
  • 6 replies
  • 1 kudos

PGP encryption and decryption using gnupg

Hi,We are using python-gnupg==0.4.8 package for encryption and decryption and this was working as expected when we are using Databricks runtime : 9.1 LTS but when we upgarded our runtime to 12.1, it stopped working with error "gnupghome should be a d...

  • 3624 Views
  • 6 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Anjum Aara​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

  • 1 kudos
5 More Replies
Prasann_gupta
by New Contributor
  • 5635 Views
  • 3 replies
  • 0 kudos

SQL CONTAINS Function is not working on Databricks

I am trying to use sql CONTAINS function in my sql query but it is throwing the below error :AnalysisException: Undefined function: 'CONTAINS'. This function is neither a registered temporary function nor a permanent function registered in the databa...

  • 5635 Views
  • 3 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Prasann Gupta​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Than...

  • 0 kudos
2 More Replies
Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!

Labels
Top Kudoed Authors