Data Engineering

Forum Posts

Chhaya
by New Contributor III
  • 2003 Views
  • 6 replies
  • 2 kudos

Using Great Expectations with Auto Loader

Hi everyone, I have implemented a data pipeline using Auto Loader (bronze --> silver --> gold). While doing this I want to perform some data quality checks, and for that I'm using the Great Expectations library. However, I'm stuck with the below error when trying...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Chhaya Vishwakarma Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your fe...
5 More Replies
ckwan48
by New Contributor III
  • 1217 Views
  • 4 replies
  • 1 kudos

Different results in Databricks using SARIMAX

In Databricks, using the 11.3 ML runtime gives different results when using general purpose vs memory-optimized workers. I used SARIMAX to forecast the results, but I'm getting different results when I change the driver and worker types to these options...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Kevin Kim Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...
3 More Replies
brendanc19
by New Contributor III
  • 1780 Views
  • 5 replies
  • 2 kudos

Resolved! Does cancelling a job run rollback any actions performed by query plan?

If I were to stop a rather large job run, say halfway through execution, will any actions performed on our Delta tables persist or will they be rolled back? Are there any other risks that I need to be aware of in terms of cancelling a job run half way t...

Latest Reply
brendanc19
New Contributor III
  • 2 kudos

Will do, thank you Vartika
4 More Replies
Sid1805
by New Contributor II
  • 1949 Views
  • 5 replies
  • 0 kudos

Resolved! Calling Delta Tables using JDBC

Hi team, if we kill clusters every time, will the connection details change? If yes, is there a way we can mask this so that the end users are not impacted due to any changes in clusters? Also, if I want to call a Delta table from an API using JDBC - s...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Siddharth Krishna Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell u...
4 More Replies
Arnold_Souza
by New Contributor III
  • 3536 Views
  • 4 replies
  • 1 kudos

Connect Databricks to a database protected by a firewall

We are facing a situation, and I would like to understand from the Databricks side what the best practice is. Question: Is it possible to have a cluster with a fixed global IP on Databricks? Details: We have a vendor that has a SQL Server da...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Arnold Souza If you file a support ticket with Azure support, they can help customize the VNet by unlocking it, as the Azure Databricks resources are deployed in a managed resource group. Your plan B should also be the way to go if option 1 does not work as e...
3 More Replies
Mado
by Valued Contributor II
  • 1768 Views
  • 4 replies
  • 0 kudos

Medallion architecture, how to update Gold tables?

Assume that I have a data source that is ingested into a few bronze tables and transformed into a silver table. Next, a gold table is created by aggregating the silver table. If new records arrive in the data source, bronze and silver tables are upd...

Latest Reply
Mado
Valued Contributor II
  • 0 kudos

Hi @Vidula Khanna The answer didn't fit my question. In the case of using Merge, I found a good article here: https://medium.com/@avnishjain22/simplify-optimise-and-improve-your-data-pipelines-with-incremental-etl-on-the-lakehouse-61b279afadea
3 More Replies
hv
by New Contributor
  • 2093 Views
  • 1 reply
  • 0 kudos

Error-"'Column' object is not callable".

I am trying to lowercase one of the columns (A_description) of a dataframe (df) and getting the error "'Column' object is not callable". Code:

    def new_desc():
        for line in df:
            line = df['A_description'].map(str.lower)
        return line

    new_desc()

Have used...

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 0 kudos

Hi @Himadri Verma Hope this below suggestion will help you in pyspark. Please let me know if you are looking for something else. Happy Learning!!
CM1
by New Contributor
  • 977 Views
  • 1 reply
  • 0 kudos

Can you migrate me from Customer Academy to Partner Academy

Hello, I registered using my work email on the Customer Academy, but I should be on Partner Academy. Can you migrate my account as you have done on other posts, i.e. https://community.databricks.com/s/question/0D53f00001fcieKCAQ/cannot-sign-in-at-databrick...

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 0 kudos

Hi @Chris M For any issue with Academy learnings/certifications, you can raise a ticket at the link below, sharing it with you for your future reference as well. https://help.databricks.com/s/contact-us?ReqType=training Happy Learning!!
Vladif1
by New Contributor II
  • 4032 Views
  • 4 replies
  • 1 kudos

Error when reading delta lake files with Auto Loader

Hi, when reading a Delta Lake file (created by Auto Loader) with this code:

    df = (
        spark.readStream
        .format('cloudFiles')
        .option("cloudFiles.format", "delta")
        .option("cloudFiles.schemaLocation", f"{silver_path}/_checkpoint")
        .load(bronz...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Vlad Feigin Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...
3 More Replies
RafaelGomez61
by New Contributor
  • 1881 Views
  • 2 replies
  • 0 kudos

Can't access delta tables under SQL Warehouse cluster. Getting Error while using path .../_delta_log/000000000.checkpoint

In our Databricks workspace, we have several Delta tables available in the hive_metastore catalog. We are able to access and query the data via Data Science & Engineering persona clusters with no issues. The clusters have the credential passthrough en...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Rafael Gomez Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...
1 More Replies
jerry-xu-sa
by New Contributor II
  • 1315 Views
  • 2 replies
  • 1 kudos

Order of a dataframe is not preserved after calling cache() and limit()

Here are the simple steps to reproduce it. Note that cols "foo" and "bar" are just redundant cols to make sure the dataframe doesn't fit into a single partition.

    // generate a random df
    val rand = new scala.util.Random
    val df = (1 to 3000).map(i => (r...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Jerry Xu Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback wil...
1 More Replies
wschoi
by New Contributor III
  • 1494 Views
  • 4 replies
  • 1 kudos

Resolved! How can I cluster-install a c-Python library (pyRFC)?

If possible, how can one go about installing a Python library with SDK dependencies like pyRFC? (https://github.com/SAP/PyRFC) The SDK dependencies depend on the type of OS, and since we're running Databricks out of AWS, I assume one would have to mat...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Wonseok Choi Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback...
3 More Replies
ramz
by New Contributor II
  • 1826 Views
  • 4 replies
  • 1 kudos

High driver memory usage on loading parquet file

Hi, I am using PySpark and I am reading a bunch of parquet files and doing a count on each of them. Driver memory shoots up about 6G to 8G. My setup: I have a cluster of 1 driver node and 2 worker nodes (all of them 16 core 128 GB RAM). This is th...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @ramz siva Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback wi...
3 More Replies
pepe
by New Contributor II
  • 2161 Views
  • 2 replies
  • 1 kudos

Why can't I install Python libraries when I update cluster runtime from 10.1 to 12.1?

This same question was asked here 9 months ago without any answer: https://community.databricks.com/s/question/0D58Y000096VjKrSAK/managedlibraryinstallfailed-when-changing-databricks-runtime-version-from-91-to-110 I was using runtime 9.1, and then upgr...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @JOSE RODRIGUEZ Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...
1 More Replies
Ondrej_Lostak
by New Contributor
  • 691 Views
  • 2 replies
  • 0 kudos

Visualization only from a sample of data

When I display a dataframe and add a visualization, I can see a preview from only a sample of the data, and when I confirm it, it is counted from all of the data. Until now, everything is fine. However, when I change the dataframe, the visualization is incon...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ondrej Lostak Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...
1 More Replies